Insights into Mutational Processes in Arabidopsis thaliana from Single-Molecule Long-Read Sequencing

Citable link (URI): http://hdl.handle.net/10900/179757
http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1797570
Document type: Dissertation
Date of publication: 2026-05-19
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Biology
Reviewer: Weigel, Detlef (Prof. Dr.)
Date of oral examination: 2026-05-07
Keywords:
Arabidopsis thaliana
License: http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en

Abstract:

Mutation has long been a central topic in biology, as it provides the raw material for genetic diversity and is the ultimate source of adaptation. The genome sequences observed in present-day individuals have already been shaped by mutation, together with recombination and selection, over the course of evolution. Over the past decade, short-read sequencing has been widely used to study mutations and mutation rates, leading to a strong understanding of genetic variation at the scale of single-nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). However, the investigation of mutations in other genomic regions, such as repetitive sequences, had to await new technologies. Over the same period, long-read single-molecule sequencing has seen increasing application in genomics. Among these technologies, Pacific Biosciences (PacBio) High-Fidelity (HiFi) sequencing employs circular consensus sequencing, merging multiple passes of a single molecule into one high-accuracy read. Both chapters of this thesis detect mutations using HiFi sequencing, each focusing on different features of the technology.

In Chapter One, I exploit the long-read and high-accuracy characteristics of HiFi sequencing to describe telomeric repeat diversity in Arabidopsis thaliana. Telomeres, located at the ends of linear chromosomes, protect chromosomes from degradation. In many species, telomeric regions consist of short repeat units; in A. thaliana, the canonical unit is a seven-base-pair (bp) sequence, TTTAGGG. While the very ends are composed of canonical repeats, more proximal regions contain variant repeat types, indicating that mutations have occurred. I comprehensively characterized sequence variation in telomeric repeat arrays at the chromosome ends across 74 genetically diverse A. thaliana accessions. I identified several distinct types of telomeric repeat units, uncovered evolutionary processes such as local homogenization and higher-order repeat formation, quantified telomeric repeat number changes at both germline and somatic levels, and revealed chromosome end-specific patterns in the distribution of variant repeats. These findings provide a detailed view of telomeric repeat variation in A. thaliana at multiple levels, expanding our knowledge of the evolution of chromosome ends.

In Chapter Two, I leverage the features of circular consensus sequencing and amplification-free library preparation, which allow multiple passes of each DNA strand for a single molecule. This approach enables high-accuracy sequencing of both strands, allowing the detection of sequence differences between them. Such differences can reflect unrepaired errors or misrepair events during DNA replication. I present a method to identify sequence differences between the two strands within single molecules. By analyzing polymerase kinetic information, specifically pulse width and interpulse duration, I confirmed the authenticity of these events. To validate this pipeline, I analyzed one A. thaliana accession with 3,747,759 molecules and identified three molecules exhibiting sequence differences greater than 50 bp. Further examination of the detailed sequence characteristics suggested that these events could result from template switching, slippage-mediated strand mispairing, or palindrome-mediated deletions. This study provides a proof-of-concept approach for capturing differences between strands immediately after new strand synthesis but prior to repair, or following erroneous repair, thereby deepening our understanding of how mutations arise.

Together, these two studies uncover mutational processes in A. thaliana at different scales using the latest single-molecule long-read sequencing technology. Specifically, these findings enhance our understanding of germline and somatic mutations at telomeric regions, as well as ongoing mutations during DNA replication in leaf tissue. Moreover, the approaches developed here can be applied to study mutational processes in other tissues and species.
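
Illustrative sketch (not part of the dissertation): the abstract describes telomeric arrays as runs of the canonical 7 bp unit TTTAGGG interspersed with variant units. The Python snippet below shows one simple way such an array could be tallied into canonical and variant units. The function names `tally_repeat_units` and `variant_fraction` are hypothetical, and the fixed 7 bp reading frame is a simplifying assumption: it ignores insertions, deletions, and strand orientation, all of which a real analysis would need to handle. It is not the method used in the thesis.

```python
from collections import Counter

# Canonical A. thaliana telomeric repeat unit, as stated in the abstract.
CANONICAL = "TTTAGGG"


def tally_repeat_units(array_seq: str, unit_len: int = len(CANONICAL)) -> Counter:
    """Cut a repeat array into consecutive fixed-length units and count each type.

    Assumption: the array starts in frame and contains no indels, so every
    unit is exactly `unit_len` bases long. Trailing partial units are dropped.
    """
    seq = array_seq.upper()
    units = (
        seq[i:i + unit_len]
        for i in range(0, len(seq) - unit_len + 1, unit_len)
    )
    return Counter(units)


def variant_fraction(counts: Counter) -> float:
    """Return the share of units that differ from the canonical unit."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - counts[CANONICAL]) / total
```

For a toy array made of three canonical units, one TTCAGGG unit, and one further canonical unit, `tally_repeat_units` would count four canonical units and one variant, and `variant_fraction` would report one variant unit out of five.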
