Abstract:
Specialised metabolites are chemical compounds with a major impact on human life: they are used in pharmaceuticals, in food and even in industrial production processes. Given their relevance, the discovery of new molecules with desired bioactivities is imperative. From a bioinformatic point of view, these efforts are supported by genome mining tools, which detect the genes relevant to the biosynthesis of specialised metabolites in the producers’ genomes. These genes are often found in close proximity to one another, forming biosynthetic gene clusters (BGCs), especially in bacteria. Advances in sequencing methods have led to a rapid increase in available bacterial genomes, which can be explored for their biosynthetic capacity. Although a large number of bacterial BGCs, and the compounds whose biosynthesis they encode, have already been detected, much about them remains unclear. Their evolutionary history can be especially complex, with horizontal gene transfer (HGT) events more common in BGCs than in the rest of the genome. Nevertheless, evolutionary studies of biosynthetic genes have led to the development of certain genome mining and bioengineering methods and are necessary for the advancement of the field.
In this dissertation, I describe efforts to promote the discovery of specialised metabolites by increasing our understanding of their distribution and evolution. A global analysis of bacterial BGCs, exploiting the available volume of sequencing information, revealed that biosynthetic capacity differs greatly among taxa. Next, in an attempt to understand the observed distribution, the focus shifted to one specific system, glycopeptide antibiotics (GPAs), whose biosynthetic pathway is explained in detail. Phylogenetic reconstruction of the evolutionary history of the related BGCs then revealed an important inaccuracy in the current classification system. Finally, I attempted to identify significant associations between the presence of these and other kinds of BGCs. Although still at a preliminary stage, this last analysis revealed a promising lead that may constitute an adaptation mechanism to HGT of BGCs, though this hypothesis requires further investigation. Beyond the insights already gained, the methodologies and datasets presented here are expected to be the focus of various future studies.