Computational methods for pangenomics and multiomics integration

Citable link (URI): http://hdl.handle.net/10900/165044
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1650448
http://dx.doi.org/10.15496/publikation-106373
Document type: Dissertation
Publication date: 2025-05-05
Language: German, English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Referee: Nahnsen, Sven (Prof. Dr.)
Date of oral examination: 2025-03-28
DDC classification: 500 - Natural sciences
Subject headings: Bioinformatics
Free keywords:
pangenomics
pangenome graphs
multiomics
License: https://creativecommons.org/licenses/by/4.0/legalcode.de https://creativecommons.org/licenses/by/4.0/legalcode.en http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en

Abstract:

Biomedical research models often simplify complex biological processes, each focusing on a single molecular mechanism. For example, genomics examines the heritable genotype of an organism, whereas the phenotype comprises the observable traits that result from the interaction of the genotype with the environment. A single data source, however, is often insufficient to explain complex genotype-phenotype relationships because of analysis bias. Integrating multiple omics data sources, known as multiomics, provides a more comprehensive approach. Nevertheless, some omics analysis techniques still rely on reference-based methods, which can introduce reference bias and complicate the discovery of accurate genotype-phenotype relationships. Pangenome models address this by relating a representative set of genomic sequences within a population. Pangenome graphs, in particular, store both the shared and the variant regions of a set of genomes in one data structure.

The contributions of this thesis lie in two fields: multiomics and pangenomics. On the multiomics side, this thesis demonstrates the exploratory power of integrative multiomics for genotype-phenotype validation and discovery in cancer immunotherapy. By profiling cell surface molecules in cancer cell panel data and integrating these profiles with transcriptomics and proteomics data, I identified potential cancer-specific markers. I validated the biomarker candidates using public data, highlighting the importance of comprehensive multiomics analysis and data integration for discovering and validating cancer-specific biomarkers.

On the pangenomics side, this thesis explores two main research questions. First, to overcome the reference bias and implementation limitations of existing pangenome graph construction pipelines, I developed a cluster-efficient, reference-free pipeline for building pangenome graphs, enabling comprehensive studies of genomic diversity. Second, to address the need to visualize and analyze pangenome graphs efficiently, I developed a new layout algorithm that enables efficient visualization of pangenome graphs at the gigabase scale. Additionally, I implemented methods for detecting complex regions, manipulating graph structure, annotation, and exploratory analysis, which enable comprehensive analysis of these graphs at the same scale. Together, these allow researchers to examine the genotype-phenotype relationships encoded in gigabase-scale pangenome graphs in an unbiased manner.

The results of this work show that integrating data from different biological origins improves interpretation and uncovers relationships that single data sources cannot reveal, effectively mitigating analysis bias. The models proposed and the results presented in this doctoral thesis advance current knowledge towards improved genotype-phenotype discovery in biomedical research.
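The abstract describes a pangenome graph as one data structure that holds both the shared and the variant regions of several genomes. A minimal, hypothetical Python sketch of that idea follows, in the style of a variation graph: sequence segments are nodes, and each genome is a path (an ordered list of node ids) through them, loosely modeled on the segments and paths of the GFA format. The class name `PangenomeGraph`, its methods, and the toy sequences are illustrative assumptions, not the pipeline or data model developed in the thesis, and the sketch ignores strand orientation, which real pangenome graphs track.

```python
from dataclasses import dataclass, field


@dataclass
class PangenomeGraph:
    """Hypothetical toy variation graph (forward strand only, no orientation)."""

    # node id -> DNA sequence of that segment
    segments: dict[int, str] = field(default_factory=dict)
    # genome name -> ordered list of node ids (the genome's path through the graph)
    paths: dict[str, list[int]] = field(default_factory=dict)

    def add_segment(self, node_id: int, seq: str) -> None:
        self.segments[node_id] = seq

    def add_path(self, genome: str, node_ids: list[int]) -> None:
        # every node on a path must already exist as a segment
        missing = [n for n in node_ids if n not in self.segments]
        if missing:
            raise KeyError(f"unknown segments: {missing}")
        self.paths[genome] = list(node_ids)

    def sequence(self, genome: str) -> str:
        # spell out a genome by concatenating its segments in path order
        return "".join(self.segments[n] for n in self.paths[genome])

    def edges(self) -> set[tuple[int, int]]:
        # links are implied by consecutive nodes on each path
        return {(a, b) for p in self.paths.values() for a, b in zip(p, p[1:])}

    def shared_segments(self) -> set[int]:
        # segments traversed by every genome (the shared regions)
        node_sets = [set(p) for p in self.paths.values()]
        return set.intersection(*node_sets) if node_sets else set()

    def variant_segments(self) -> set[int]:
        # segments missing from at least one genome (the variant regions)
        return set(self.segments) - self.shared_segments()


# Toy example: three genomes that differ by a single-base variant (A vs. G)
g = PangenomeGraph()
for node_id, seq in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTCA")]:
    g.add_segment(node_id, seq)
g.add_path("genome_a", [1, 2, 4])
g.add_path("genome_b", [1, 3, 4])
g.add_path("genome_c", [1, 2, 4])

print(g.sequence("genome_b"))   # ACGTGTTCA
print(g.shared_segments())      # {1, 4}
print(g.variant_segments())     # {2, 3}
```

Storing each genome as a path over shared nodes means the common sequence (`ACGT`, `TTCA`) is stored once, while the variant site appears as a "bubble" of alternative nodes (2 and 3) that no single reference sequence privileges.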

Unless otherwise indicated, the license of this item is described as: CC BY