Computational Analysis for Medical Research in Genomics and Metagenomics



Document type: Dissertation
Date: 2018-12-17
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Huson, Daniel H. (Prof. Dr.)
Day of Oral Examination: 2018-12-12
DDC Classification: 004 - Data processing and computer science
500 - Natural sciences and mathematics
570 - Life sciences; biology
610 - Medicine and health
Keywords: Bioinformatics, Genomics
License: Publishing license including print on demand


The investigation of microbes in research has been changing with the rise of environmental sequencing, shifting from a view centered on isolated microbes grown in a laboratory setting to a broader view of microbial communities (microbiomes) as they thrive in their natural environment. Sequencing an isolated organism generates essential insights and enables us to produce reference genomes and genome annotations, which in turn let us compare different organisms through their genomes and study the metabolic pathways they use. Environmental sequencing, by contrast, answers entirely different questions: it gives us a much better view of how microbes live in their environment and what changes they undergo through their interactions with their host, their environment, or each other. It also enables us to study the genomes of microbes that previously could not be cultured in isolation, and it opens up many new possibilities for the life sciences interested in the microbial communities on Earth. One of these sciences, with a significant influence on our daily lives, is medicine. In this work, I present multiple projects in which genomics, 16S rDNA analysis and metagenomics have helped medical research gain insights into diverse topics under varying conditions. Over the course of these projects, I identified a recurring need for primary analysis of environmental data, which currently can often be done only by bioinformatics specialists or otherwise requires scientists from other fields to spend considerable time learning the necessary tools. Because those scientists also spend much of their time planning and conducting the experiments that generate the data and verifying the findings, they often lack the time to acquire the knowledge required for even basic steps of environmental sequence analysis.
To make metagenomic analysis more approachable for a variety of users, I have developed three pipelines for the fundamental analysis of environmental data, collected in the CommunAl toolkit; they require minimal hands-on effort and infrastructure to run the necessary analyses. The toolkit includes the alignment-based 16S rDNA analysis tool STARA, which is applicable to data from any available sequencing technology and can process all samples of a dataset in one run, from preprocessing through alignment to taxonomic assignment. The second member of the toolkit is MAPle, a pipeline for analyzing paired-end short-read whole-genome shotgun (WGS) metagenomic sequencing data. Like STARA, it analyzes a dataset from preprocessing through to taxonomic and functional assignment. The taxonomic abundances determined by these two tools can be investigated further using TaxCo, which computes correlations between taxonomic abundances and numeric metadata and presents the results in tabular and graphical form. With these tools, I hope to enable everyone involved in environmental data analysis to generate insights from their datasets, which in turn will hopefully help make environmental sequencing, and metagenomics in particular, a more feasible choice for everyone who could profit from the possibilities and insights it provides.
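To illustrate the kind of computation TaxCo performs (correlating taxonomic abundances with numeric metadata), here is a minimal sketch. It is not TaxCo's implementation: the abstract does not say which correlation measure TaxCo uses, so this sketch assumes Spearman rank correlation on per-sample relative abundances, and the input layout and the function names `rank`, `pearson`, `spearman` and `correlate_taxa` are hypothetical.

```python
import math
from statistics import mean


def rank(values):
    """Average ranks (1-based); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def correlate_taxa(counts, metadata):
    """Correlate each taxon's relative abundance with one numeric metadata variable.

    counts: hypothetical layout, taxon name -> read counts per sample,
            in the same sample order as metadata.
    metadata: one numeric value per sample (e.g. a clinical measurement).
    """
    n_samples = len(metadata)
    # Total assigned reads per sample, used to normalize counts to relative abundances.
    totals = [sum(c[s] for c in counts.values()) for s in range(n_samples)]
    result = {}
    for taxon, c in counts.items():
        rel = [c[s] / totals[s] if totals[s] else 0.0 for s in range(n_samples)]
        result[taxon] = spearman(rel, metadata)
    return result
```

For example, `correlate_taxa({"A": [10, 20, 30, 40], "B": [40, 30, 20, 10]}, [1, 2, 3, 4])` yields a correlation of 1.0 for taxon A and -1.0 for taxon B. A rank-based measure is used here because abundance data are typically skewed; a real analysis would also need significance testing and multiple-testing correction, which this sketch omits.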
