Abstract:
The study of proteoforms, the distinct molecular forms of a protein, is key to understanding the complexity of biological systems and their underlying molecular mechanisms. For the analysis of proteoforms, top-down proteomics (TDP) based on mass spectrometry (MS) is currently the most powerful technology. It allows intact proteoforms to be measured and characterized directly, avoiding the loss of information that occurs in the conventional bottom-up approach. Still, quantitative TDP measurement remains an ongoing challenge. Accurate quantification of individual proteoforms is a critical step in identifying alterations in the proteome under various biological conditions. Several quantitative experimental methods for TDP have been introduced, but they still face significant challenges, especially in data analysis.
In this thesis, we introduce several algorithms to tackle data analysis problems in quantitative TDP. First, we developed FLASHDeconv, because deconvolution is a key first step in TDP data analysis that reduces the complexity of MS signals. FLASHDeconv is an algorithm for fast and robust spectral deconvolution that employs the novel idea of transforming mass spectra for decharging. It offers not only unprecedented speed but also more reliable deconvolution results than existing methods.
With FLASHDeconv paving the way for further data analysis steps, we developed FLASHQuant specifically to quantify proteoforms quickly and accurately. By using a key algorithm of FLASHDeconv, FLASHQuant finds proteoform features rapidly; it then resolves and quantifies coeluting proteoforms, providing accurate quantification results. Moreover, FLASHQuant showed highly reproducible quantification on both a simple targeted protein dataset and a complex proteome-level dataset.
To meet the strong need for a graphical user interface that visualizes results from FLASHDeconv and FLASHQuant, we implemented FLASHViewer, a web application for visualizing deconvolved or quantified signals. To accommodate the diverse uses of both tools, its informative plots are modularized to enable configurable layouts.
All algorithms discussed in this thesis were developed and implemented on top of OpenMS, an open-source platform for computational mass spectrometry. As contributions to the OpenMS community, the developed tools are all publicly available and platform-independent.