The Sequence Space of Natural Proteins

Weidmann-Krebs, Laura

dc.contributor.advisor	Lupas, Andrei (Prof. Dr.)
dc.contributor.author	Weidmann-Krebs, Laura
dc.date.accessioned	2020-06-19T08:11:55Z
dc.date.available	2020-06-19T08:11:55Z
dc.date.issued	2020-06-19
dc.identifier.other	1701144999	de_DE
dc.identifier.uri	http://hdl.handle.net/10900/101671
dc.identifier.uri	http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-1016714	de_DE
dc.identifier.uri	http://dx.doi.org/10.15496/publikation-43050
dc.identifier.uri	http://nbn-resolving.org/urn:nbn:de:bsz:21-dspace-1016719	de_DE
dc.description.abstract	Proteins carry out the majority of functions at the molecular level of all organisms. They are composed of amino acid sequences that upon folding assume specific structures, which are essential to perform their function. In contrast, the great majority of randomly generated amino acid sequences fail to fold into a defined structure and are not functional. To better understand functional proteins, this thesis aims to determine general features of natural protein sequences by contrasting them with random sequence models. For this, three different approaches are applied. The first approach focuses on sequence features that are shared among all proteins, resulting in a global consideration of natural proteins. For this, the pairwise similarity between sequence fragments derived from a large data set of bacterial proteomes is analyzed. These similarities are interpreted as distances, indicative of how sequences are distributed over the space of all possible sequences. The results show that the great majority of distances between natural sequences coincide with those between random sequences of the same amino acid composition. The global occupation of sequence space by natural proteins is thus almost random, an observation that contrasts with the widespread concept of sequences organized into dense clusters defined by common descent. In fact, most related sequences share a similarity that is expected from the random sequence model. They are thus not more similar than random sequences, resulting in their wide distribution across sequence space. Most distances between natural sequences that remain unaccounted for by the random sequence model can be associated with the different use of amino acids in individual proteins. Only a few distances are found to be affected by common sequence motifs in non-related proteins. 
With this, the amino acid composition of individual proteins is demonstrated to be the most distinctive feature that characterizes natural protein sequences globally. Furthermore, common descent and divergent evolution are demonstrated to have no impact on the global occupation of sequence space, while convergent evolution is responsible for specific sequence motifs that are common in natural proteins. The second approach analyzes the range of sequence similarities that is associated with common descent. In contrast to the first approach, which studies the global occupation of sequence space, here the local occupation is of interest. For this, sequences in close proximity to individual query sequences are studied. With increasing distance to the query, the likelihood of common descent decreases, becoming uncertain at a range that has been coined the ‘twilight zone’. Previous studies validated common descent by structural similarity in order to estimate the boundaries of the twilight zone. The approach applied in this thesis determines these boundaries from the statistical significance of sequence similarity, thereby refining the definition of the twilight zone. With the third approach, the characteristic amino acid composition of individual proteins is studied further at a local level. Given that proteins are generally composed of distinct structural and functional parts, their amino acid composition was expected to fluctuate accordingly along the sequence. However, the results of a random model based on the amino acid composition of domain-sized fragments are comparable to those of the model based on the composition of whole proteins. In contrast to the initial expectation, this finding suggests a homogeneous amino acid composition along individual protein sequences. Several possible reasons for this homogeneity are considered, such as fold-specific recombination, topology and genomic context, but none of them could be associated with this finding. 
An analysis of the codon composition of protein domains shows that this homogeneity of amino acids is correlated with a homogeneous usage of codons. This suggests that amino acid composition may be modulated by codon bias, an effect that has been associated with expression level and translation efficiency in other studies. With this approach, structural constraints on amino acid composition could be contrasted with constraints that cause codon bias, two features of proteins that have been analyzed extensively before and are studied here jointly.	en
dc.language.iso	en	de_DE
dc.publisher	Universität Tübingen	de_DE
dc.rights	ubt-podno	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=de	de_DE
dc.rights.uri	http://tobias-lib.uni-tuebingen.de/doku/lic_ohne_pod.php?la=en	en
dc.subject.classification	Proteine , Evolution , Homologie , Konvergenz	de_DE
dc.subject.ddc	004	de_DE
dc.subject.ddc	570	de_DE
dc.subject.other	Zufallssequenz	de_DE
dc.subject.other	homology	en
dc.subject.other	protein sequence	en
dc.subject.other	convergence	en
dc.subject.other	random sequence	en
dc.title	The Sequence Space of Natural Proteins	en
dc.type	PhDThesis	de_DE
dcterms.dateAccepted	2020-05-12
utue.publikation.fachbereich	Informatik	de_DE
utue.publikation.fakultaet	7 Mathematisch-Naturwissenschaftliche Fakultät	de_DE

Files:	Dissertation_LWK_publication.pdf 14.3 MB PDF Description: PhDThesis

This document appears in:

7 Mathematisch-Naturwissenschaftliche Fakultät [5092]
