Abstract:
Our ability to sequence entire genomes cost-effectively has revolutionized many areas of science, from applied fields such as medicine and breeding to evolutionary biology and archaeology. Although the sequencing of extant genomes has led to significant breakthroughs, our understanding in many fields can be further enhanced by sequencing the genomes of historical and ancient specimens such as mammalian skeletal remains, archaeobotanical remains or museum specimens. Studying such ancient DNA has greatly improved our understanding of human history, has allowed the reconstruction of genomes of extinct species, and has enabled us to track genetic changes during processes such as the domestication of plants and animals. Making meaningful inferences from ancient DNA sequencing data, however, requires an understanding of the unique characteristics of ancient DNA molecules extracted from different types of specimens. In particular, the post-mortem degradation of DNA molecules poses challenges for data generation and analysis that must be addressed. This work contributes to overcoming several of these challenges. To improve our understanding of the kinetics of DNA degradation and to aid experimental design, we studied degradation processes in a time-series dataset of herbarium specimens. This allowed us to identify patterns of age-associated DNA damage that accumulate over time, most notably those caused by random fragmentation of the DNA backbone and by the deamination of cytosines. Next, we focused on the implications of these damage patterns for ancient DNA data analysis. One consequence of DNA degradation is an increased risk of contamination with modern DNA, which is often present at a much higher concentration than the highly fragmented endogenous DNA of an ancient or historical specimen. This necessitates authenticating DNA sequences by providing proof of their ancient origin. 
Motivated by the constant risk of exogenous contamination in ancient DNA research, we investigated several approaches to aid authentication, ranging from the application of novel laboratory procedures to the development of a statistical method for data analysis. These methods rely primarily on the presence of age-associated damage patterns, which we and others have shown to be ubiquitous in authentic ancient DNA sequences. Once positive evidence of authenticity has been provided, the aim is often to study the genetic variation within a collected sample set, or between newly acquired samples and reference panels of genetic variation. To facilitate this, we present methods designed to assess nucleotide variation from the low-coverage sequencing data typical of ancient DNA. We also developed a method to investigate intraspecific ploidy variation directly from sequencing data. All of these methods were designed with ancient DNA applications in mind but can also be applied more broadly. Finally, we applied what we learned about the characteristics of ancient DNA, together with the methods we developed, to study ancient DNA sequences from archaeological sediments. We show how specialized experimental procedures and analytical methods permit meaningful evolutionary inference from such sequences, allowing us to shed light on the domestication history of cultivated grape, an important fruit crop. Altogether, the work presented here contributes to our understanding of many aspects of working with DNA from ancient and historical specimens and opens up opportunities to apply the experimental and analytical procedures described to a wider variety of sample types. This will allow ancient DNA sequencing to be used for an increasing diversity of organisms, especially plants and microbes, to enhance evolutionary inference. 
Furthermore, we anticipate that our contributions will help to continuously improve the standards applied when working with ancient DNA, especially regarding the authenticity of the sequences on which subsequent inferences are based.