Abstract:
Over the last decade, high-throughput methods have become increasingly popular in many fields of the life sciences. Today, a wide range of technologies exists for gathering detailed quantitative insights into biological systems. Improved instrumentation and technological advances have led to a massive growth in the volume of data these techniques produce. Bioinformatics copes with this volume by providing computational methods that process raw data to extract biological knowledge. Computational mass spectrometry is a research field in bioinformatics that collects and analyzes data from mass-spectrometric high-throughput experiments.
In this thesis, we present two new methods as well as a new data format for computational mass spectrometry. The first method addresses a problem from structural biology: determining spatial interactions between proteins and nucleic acids. For this purpose, we develop experimental protocols, programs, and analysis workflows that allow identifying UV-induced cross-links in (ribo-)nucleoprotein complexes from mass spectrometry data. An outstanding feature of our method is its ability to exactly localize the amino acids and (ribo-)nucleotides that are in contact with each other. Applied to data from yeast and human, we identify new interaction partners with, to date, unmatched resolution.
The second method applies to metaproteomic studies of complex communities of microorganisms. Bacteria, simple fungi, and plants populate the most varied habitats in vast numbers. They engage in a large number of symbiotic or parasitic relationships, which serve predominantly for the uptake of nutrients. Organisms differ in their biochemical repertoire, which allows them to decompose a wide range of substrates. Remarkably, this enables functional groups of soil bacteria to feed even on environmental toxins.
We present a metaproteomic method that identifies the organisms involved in substrate degradation and groups them according to their function in the degradation process. To this end, we use substrates labeled with stable isotopes, which are metabolized by the organisms. The isotope abundance in proteins serves as an indicator for the conversion of the substrate. Our novel computational method determines this abundance automatically and assigns it to the individual organisms. Automating this process reduces the manual work from several months to a few minutes and thus enables large study sizes.
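To make the idea of isotope abundance as an indicator concrete, the following is a minimal sketch and not the method developed in this thesis. It assumes a single labeled element, ignores the natural isotopes of all other elements, and estimates the heavy-isotope fraction of a peptide as the intensity-weighted mean number of heavy atoms divided by the number of atoms of that element. The function name and its parameters are hypothetical.

```python
def relative_isotope_abundance(intensities, n_atoms):
    """Estimate the heavy-isotope fraction of one peptide.

    intensities[i] is the measured intensity of the isotopologue peak
    carrying i heavy atoms (i = 0 is the monoisotopic peak).
    n_atoms is the number of atoms of the labeled element in the peptide.

    Simplifying assumption: only the labeled element contributes to the
    isotope pattern, so the mean number of heavy atoms equals
    n_atoms * fraction.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    total = sum(intensities)
    if total <= 0:
        raise ValueError("intensities must sum to a positive value")
    # Intensity-weighted mean number of heavy atoms per molecule.
    mean_heavy = sum(i * x for i, x in enumerate(intensities)) / total
    return mean_heavy / n_atoms


# Illustrative values only: a peptide with 4 atoms of the labeled element
# and equal intensity in the peaks with 0 and 4 heavy atoms.
fraction = relative_isotope_abundance([1.0, 0.0, 0.0, 0.0, 1.0], n_atoms=4)
```

In practice, the natural isotope distribution of the remaining elements and overlapping signals would have to be accounted for, which this sketch deliberately leaves out.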
The third part of this work contributes to better communication and processing of results from metabolomics and proteomics studies. We present mzTab, a tabular, standardized, human-readable, and machine-processable data format that complements existing data formats. We provide software components for processing the format and demonstrate how it can be integrated into complex proteomic and metabolomic workflows. The recent acceptance of mzTab by the largest proteomic data repositories represents a significant success. We also see widespread adoption by academic software developers and first support by a commercial software vendor. Our novel format facilitates meta-analyses and makes research results from proteomics and metabolomics available to scientists from other research areas.
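Because mzTab is a tab-separated text format in which every line starts with a three-letter code naming its section, it can be read with very little code. The sketch below is an illustration and not one of the software components provided with this work. It handles only metadata lines (MTD) and the header/row pairs of the protein, peptide, PSM, and small molecule sections, and the function name and sample content are hypothetical.

```python
# Maps each section's header line prefix to its data row prefix.
HEADER_TO_ROW = {"PRH": "PRT", "PEH": "PEP", "PSH": "PSM", "SMH": "SML"}


def parse_mztab(text):
    """Split mzTab text into a metadata dict and per-section row lists."""
    metadata = {}
    columns = {}  # row prefix -> list of column names
    tables = {row: [] for row in HEADER_TO_ROW.values()}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        prefix = fields[0]
        if prefix == "MTD" and len(fields) >= 3:
            metadata[fields[1]] = fields[2]
        elif prefix in HEADER_TO_ROW:
            columns[HEADER_TO_ROW[prefix]] = fields[1:]
        elif prefix in tables and prefix in columns:
            # Pair each value with the column name from the section header.
            tables[prefix].append(dict(zip(columns[prefix], fields[1:])))
        # Comment lines (COM) and unknown prefixes are ignored.
    return metadata, tables


# Illustrative content only; the accession and description are made up.
sample = (
    "MTD\tmzTab-version\t1.0.0\n"
    "COM\tminimal example\n"
    "PRH\taccession\tdescription\n"
    "PRT\tP12345\texample protein\n"
)
metadata, tables = parse_mztab(sample)
```

Keeping each section as a list of plain dictionaries makes it straightforward to pass the rows on to spreadsheet or data-frame tools, which is the kind of reuse a tabular format is meant to enable.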