Abstract:
Proteins are the most abundant and diverse class of biomolecules and mediate the vast majority of biochemical processes. The functional units within a protein are its "domains", which fold independently of the rest of the linear amino acid sequence. Novelty in protein function often arises through the gain, loss or reshuffling of existing domains. Protein domains can therefore arguably be seen as stable units of evolution. The evolutionary origin of the domains themselves, however, is more difficult to trace and remains a largely unexplored area of research.
Despite the seemingly endless diversity of proteins, domains adopt only a limited number of structural forms, called folds. These folds are largely built from a limited "vocabulary" of recurring supersecondary structural elements, often by repetition of the same element, and elements that are similar in both structure and sequence are increasingly being discovered. This suggests that modern protein domains evolved by fusion and recombination from a more ancient peptide world, and that many of the core folds observed today may contain homologous building blocks.
Solenoid repeat proteins containing tetratricopeptide repeat (TPR) domains represent an attractive model for exploring this issue. TPR domains are formed by repetition of an alpha-hairpin, a supersecondary structural element. Since alpha-hairpins are frequent in proteins, TPR-like domains might have arisen by the repetition of protein fragments that were originally used in a different structural context.
To explore this question, we need a better way to judge which alpha-hairpins are TPR-like. Several resources are currently available for the prediction of TPRs, but they often fail to detect divergent repeat units. We therefore developed "TPRpred", a profile-based method that uses a P-value-dependent score offset to include divergent repeat units and also exploits the tendency of the repeats to occur in tandem. We benchmarked TPRpred against currently available resources, both in detecting TPR-containing proteins and in delineating the individual repeats within a protein. TPRpred not only performed significantly better in detecting TPR-containing proteins with divergent repeats, but also detected a larger number of individual repeat units.
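The idea of combining a score offset with a preference for tandem arrangement can be illustrated with a minimal sketch. It is not the TPRpred implementation: the function name `best_repeat_chain`, the parameters `offset` and `tandem_bonus`, and the toy scores are all hypothetical, and the offset is passed in as a plain number because the way it is derived from a P-value is not described here. The sketch assumes that a profile score has already been computed for every window of the sequence, and uses dynamic programming to select non-overlapping repeat units with the highest total adjusted score.

```python
from typing import List, Tuple

TPR_LEN = 34  # a TPR unit is 34 residues long


def best_repeat_chain(
    window_scores: List[float],
    offset: float,
    tandem_bonus: float,
    repeat_len: int = TPR_LEN,
) -> Tuple[float, List[int]]:
    """Select non-overlapping repeat windows with maximal total score.

    window_scores[s] is the (precomputed) profile score of the window that
    starts at residue s. Every selected window contributes its score plus
    `offset`; a window that starts exactly where the previously selected
    window ends additionally earns `tandem_bonus`. Both parameters are
    illustrative assumptions, not values taken from TPRpred.
    """
    n = len(window_scores) + repeat_len - 1  # sequence length
    # best[i]: best (total, starts) using residues [0, i)
    # ends[i]: best (total, starts) whose last window ends exactly at i
    best = [(0.0, [])] * (n + 1)
    ends = [None] * (n + 1)
    for i in range(1, n + 1):
        best[i] = best[i - 1]
        s = i - repeat_len  # start of the window ending at i
        if s < 0:
            continue
        gain = window_scores[s] + offset
        # Option 1: append to the best solution on [0, s), no tandem bonus.
        cand = (best[s][0] + gain, best[s][1] + [s])
        # Option 2: extend a chain whose last window ends exactly at s.
        if ends[s] is not None:
            tandem = (ends[s][0] + gain + tandem_bonus, ends[s][1] + [s])
            if tandem[0] > cand[0]:
                cand = tandem
        ends[i] = cand
        if cand[0] > best[i][0]:
            best[i] = cand
    return best[n]


# Toy example with 3-residue "repeats": the window at position 3 scores
# poorly on its own but sits in tandem between two strong windows.
toy_scores = [5, -1, -1, -2, -1, -1, 4]
strict = best_repeat_chain(toy_scores, offset=0, tandem_bonus=0, repeat_len=3)
lenient = best_repeat_chain(toy_scores, offset=1, tandem_bonus=2, repeat_len=3)
```

In this toy case, `strict` keeps only the two strong windows, while `lenient` also includes the weak window at position 3 because it bridges its neighbours in tandem, which is the kind of divergent unit that a plain score threshold would miss.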
Using TPRpred in conjunction with structure-structure comparisons, we identified several promising alpha-hairpins in non-TPR proteins that resemble the repeating unit of TPR. From these we selected the five best hairpins, namely those of the mitochondrial outer membrane translocase Tom20, the ribosomal protein S20 (RPS20), the phospholipase C (PLC), the heat shock protein 20 (HSC) and the bacterial glucoamylase (BGA), to construct new TPR-like domains experimentally by repetition. For each of these hairpins, we constructed three artificial genes coding for one, two and three copies, respectively. The resulting artificial proteins were expressed, purified and then characterised by circular dichroism, thermal denaturation and fluorescence spectroscopy. The biophysical properties of these TPR-like domains can also be correlated with the statistical significance with which the parental hairpin is predicted to be a repeating unit of TPR. Although high-resolution structures have not yet been determined, the proteins made from the hairpins of Tom20 and RPS20 appear to have native-like properties. The hairpin of RPS20 is of particular interest in our study because ribosomal proteins are among the most ancient proteins known; since many modern non-ribosomal proteins contain fragments derived from ribosomal proteins, such fragments might have been building blocks in early protein domain evolution.