Abstract:
Biochemical test development can significantly benefit from combinatorial optimization.
Multiplex assays require complex planning decisions during implementation and subsequent
validation. Because setups are increasingly complex and resources are limited,
working efficiently is key to the success of biochemical research and
test development.
The first problem addressed was to systematically pool samples in order to create a
multi-positive control sample. We show that pooled samples exhibit a predictable
serological profile and that this prediction can be used to design a pooled sample with
the desired properties.
For serological assay validation it must be shown that low, medium, and high levels
can be reliably measured. It is shown how to optimally choose a few samples that meet
these requirements. Finally, the two methods were merged to validate multiplexed assays
using a set of pooled samples. A novel algorithm combining fast enumeration and a set
cover formulation is introduced.
The major part of the thesis deals with optimization and data analysis for Triple X Proteomics
(TXP): immunoaffinity assays using antibodies that bind short linear, terminal epitopes
of peptides. It has been shown that the problem of choosing a minimal set of epitopes
for TXP setups, which combine mass spectrometry with immunoaffinity enrichment, is
equivalent to the well-known set cover problem.
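To make the set cover view concrete, the following is a minimal sketch in which each candidate epitope covers the set of proteins whose peptides carry it, and a small set of epitopes covering all target proteins is sought. It uses the textbook greedy approximation for set cover, which is not necessarily the method used in the thesis and does not guarantee a minimum solution. The function name, the epitope strings, and the protein identifiers are hypothetical and chosen for illustration only.

```python
def greedy_set_cover(universe, candidates):
    """Greedy approximation for set cover.

    universe:   iterable of elements to cover (here: hypothetical protein IDs)
    candidates: dict mapping a candidate (here: a hypothetical terminal epitope)
                to the set of elements it covers
    Returns the chosen candidates in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered elements;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(candidates), key=lambda e: len(candidates[e] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some elements cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Illustrative data only: four made-up epitopes and five made-up proteins.
proteins = {"P1", "P2", "P3", "P4", "P5"}
epitope_coverage = {
    "LVGK": {"P1", "P2", "P3"},
    "AEFR": {"P3", "P4"},
    "SDLK": {"P4", "P5"},
    "TTYR": {"P1", "P5"},
}
selected = greedy_set_cover(proteins, epitope_coverage)
print(selected)
```

In this toy instance two epitopes suffice to cover all five proteins. An exact minimum would in general require an integer programming formulation rather than this heuristic.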
TXP Sandwich immunoassays capture and detect peptides by combining C-terminal
and N-terminal binders. A greedy heuristic and a meta-heuristic using local search are presented,
which prove to be more efficient than pure ILP formulations.
All models were implemented in the novel Java framework SCPSolver, which is applicable
to many problems that can be formulated as integer programs. While the main
design goal of the software was usability, it also provides a basic modelling language,
easy deployment and platform independence.
One question arising when analyzing TXP data was: How likely is it to observe multiple
peptides sharing the same terminus? The algorithms TXP-TEA and MATERICS
were able to identify binding characteristics of TXP antibodies from data obtained in
immunoaffinity MS experiments, reducing the cost of such analyses.
A multinomial statistical model explains the distributions of short sequences observed
in protein databases. This allows deducing the average optimal length of the targeted
epitope. Furthermore, a closed-form scoring function for epitope enrichment in sequence lists
is derived.