Abstract:
Microorganisms, which include all Bacteria and Archaea and some Eukaryotes, inhabit virtually
every habitat on the planet, from hydrothermal vents in the deep ocean to extreme environments of
high temperature and salinity. Microbes also constitute the most diverse group of organisms in terms
of genetic information, metabolic function, and taxonomy. Furthermore, many of these microbes
establish complex interactions with each other and with many multicellular organisms. The
collection of microbes that share a body space with a plant or animal is called the microbiota, and
its collective genetic information is called the microbiome.
The microbiota has emerged as a crucial determinant of a host’s overall health, and
understanding it has become important in many biological fields. In mammals, the gut microbiota has
been linked to diseases such as diabetes, inflammatory bowel disease, and dementia. In
plants, the microbiota can provide protection against certain pathogens or confer resistance to
harsh environmental conditions such as drought. Furthermore, the leaves of plants represent one of
the largest surface areas that can potentially be colonized by microbes.
The advent of sequencing technologies has allowed researchers to study microbial communities
at unprecedented resolution and scale. By targeting individual loci such as the 16S rDNA locus in
bacteria, many species, and properties such as their relative abundance, can be studied
simultaneously without the need to isolate individual target taxa. Decreasing costs of DNA
sequencing have also led to whole-genome shotgun sequencing, in which random fragments of DNA
are sequenced instead of a single locus or a small number of loci. This approach, referred to as
metagenomics, effectively renders the entire microbiome accessible to study. Consequently, many
more areas of investigation are open, such as the exploration of within-host genetic diversity,
functional analysis, and the assembly of individual genomes from metagenomes.
In this study, I describe the analysis of metagenomic sequencing data from microbial
communities in leaves of wild Arabidopsis thaliana individuals from southwest Germany. As a model
organism, A. thaliana is not only accessible in the wild but also supported by a rich body of previous
research on plant-microbe interactions. In the first section, I describe how whole-genome shotgun
sequencing of leaf DNA extracts can be used to accurately describe the taxonomic composition of the
microbial community of individual hosts. Because shotgun sequencing samples all DNA in an extract
rather than a single amplified locus, it can be used to estimate true microbial abundances, which
cannot be done with amplicon sequencing. I show that this community varies across hosts but that
some trends hold, such as the dominance of the bacterial genera Pseudomonas and Sphingomonas.
Given this variation between individuals, I then explore the influence of site of origin and host
genotype on community composition. Finally, metagenomic assembly is applied to individual
samples, revealing the limitations of whole-genome shotgun sequencing in plant leaves.
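As a rough illustration of why shotgun data allow abundance estimates that amplicon data do not, the sketch below converts read counts into per-genome coverage and expresses each microbial taxon relative to host coverage. It is only a minimal sketch under stated assumptions: the function names, the read length, and all counts and genome sizes are hypothetical and are not taken from this study.

```python
def coverage(read_count, read_length, genome_size):
    """Average sequencing depth: total sequenced bases divided by genome size."""
    return read_count * read_length / genome_size


def load_relative_to_host(microbe_reads, genome_sizes, host_reads,
                          host_genome_size, read_length=150):
    """Per-taxon coverage divided by host coverage.

    All inputs are hypothetical placeholders: read counts per taxon,
    genome sizes in base pairs, and a fixed read length.
    """
    host_cov = coverage(host_reads, read_length, host_genome_size)
    if host_cov == 0:
        raise ValueError("host coverage is zero; cannot normalize")
    return {
        taxon: coverage(n_reads, read_length, genome_sizes[taxon]) / host_cov
        for taxon, n_reads in microbe_reads.items()
    }


# Illustrative numbers only (not measured values).
example = load_relative_to_host(
    microbe_reads={"Pseudomonas": 40_000, "Sphingomonas": 10_000},
    genome_sizes={"Pseudomonas": 6_000_000, "Sphingomonas": 4_000_000},
    host_reads=900_000,
    host_genome_size=135_000_000,
)
```

Normalizing by host coverage is one possible design choice; it puts samples with different sequencing depths on a common scale, which relative abundances from a single amplified locus cannot provide.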
In the second section, I explore the genomic diversity of the most abundant genera,
Pseudomonas and Sphingomonas. I use a core genome approach in which a set of common genes is
obtained from previously sequenced and assembled genomes. The gene sequences of
the core genome are then used as a reference for short-read mapping. From these mappings,
individual strain mixtures are inferred from the frequency distribution of non-reference bases at
each detected single nucleotide polymorphism (SNP). Finally, SNPs are used to infer the
population structure of strain mixtures across samples and relative to known reference genomes.
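The idea of reading strain mixtures from non-reference base frequencies can be sketched as follows. This is a simplified illustration rather than the procedure used in the thesis: the input format (a dictionary of observed bases per position), the function names, and the depth threshold are assumptions made for the example.

```python
from collections import Counter


def non_reference_frequencies(pileup, reference, min_depth=10):
    """Fraction of non-reference bases at each variable, well-covered site.

    pileup:    hypothetical dict mapping position -> string of observed bases
    reference: hypothetical dict mapping position -> reference base
    """
    freqs = {}
    for pos, bases in pileup.items():
        depth = len(bases)
        if depth < min_depth:
            continue  # too few reads to estimate a frequency
        non_ref = sum(1 for base in bases if base != reference[pos])
        if non_ref:  # keep only sites where a non-reference base was seen
            freqs[pos] = non_ref / depth
    return freqs


def frequency_histogram(freqs, n_bins=10):
    """Count sites per frequency bin; peaks hint at strain proportions."""
    counts = Counter(min(int(f * n_bins), n_bins - 1) for f in freqs.values())
    return [counts.get(i, 0) for i in range(n_bins)]


# Toy data: two informative sites, one invariant site, one low-depth site.
reference = {1: "A", 2: "A", 3: "G", 4: "A"}
pileup = {
    1: "AAAAAAAAAAAA",   # no variation
    2: "AAAAAAAAACCC",   # 3 of 12 reads are non-reference
    3: "GGGTTTTTTTTT",   # 9 of 12 reads are non-reference
    4: "AC",             # below the depth threshold
}
freqs = non_reference_frequencies(pileup, reference)
hist = frequency_histogram(freqs)
```

Under this simplification, a sample dominated by a single strain would show frequencies clustered near 1, whereas a mixture of two strains would tend to produce peaks near the two mixing proportions, as with the 0.25 and 0.75 sites in the toy data.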
In conclusion, this thesis provides insights into the use of metagenomic sequencing to study
microbial populations in wild plants. I identify the strengths and weaknesses of using whole-genome
shotgun sequencing for this purpose and present a way to study strain-level dynamics of prevalent
taxa within a single host.