Abstract:
Since the dawn of evolutionary biology, it has been the dream of scientists to obtain a meaningful genealogy of species, a "tree of life". Ernst Haeckel coined the term "phylogenetics" for this area of research, which studies the history of the evolutionary relationships between species. Early phylogenetic approaches focused on morphological differences between species. However, the analysis of the phylogeny of microbial organisms is hindered by the limited number of observable morphological differences. With the discovery of the structure of DNA by Francis Crick and James Watson, and the development of the Sanger sequencing technology, it became feasible to use genetic information for phylogenetic inference.
For the prokaryotic universe (Bacteria and Archaea), a central question of phylogenetics is whether a prokaryotic "tree of life" actually exists. These organisms possess mechanisms for the direct exchange of genetic material between cells, which may belong to different species (horizontal gene transfer). Consequently, genes can be acquired from other organisms rather than inherited solely through clonal reproduction, the mode of descent that a phylogenetic tree represents. In this thesis, we introduce the GBDP ("Genome BLAST Distance Phylogeny") framework for inferring phylogenies based on whole genomes, and we compare the results with a current taxonomic tree based on single genes. Furthermore, we investigate the amount of horizontal gene transfer in a common set of prokaryotic genes using a state-of-the-art method as well as two newly developed approaches. Additionally, we propose a new method for species delineation that builds on the GBDP method for deriving whole-genome phylogenies.
In the last part of the thesis, several software packages are presented. CopyCat, together with AxParafit and AxPcoords, represents the first Grid-enabled software package that is optimized for large-scale cophylogenetic studies. With these tools, large host and parasite phylogenies can be screened for correlations. Furthermore, we present MEGAN, a user-friendly software application for the analysis of metagenomic datasets. Metagenomics is the study of microbial communities by direct extraction of DNA from environmental samples. To aid the development and testing of metagenomic software, we developed MetaSim, a tool that generates simulated metagenomic datasets.