Abstract:
Transcriptome analyses are an important tool for studying the biological mechanisms that allow organisms to react to changes in their environment, and for elucidating which genes play important roles in diseases such as cancer. They can be used to find targets for drug design, to optimize the output of biochemical production, and, most importantly, to gain an understanding of how living cells fundamentally function.
Microarrays have opened the door to high-throughput expression experiments covering thousands of transcripts. Recently, they have been complemented by RNA sequencing (RNA-seq) methods, which produce new types of data and a significantly larger data volume. Bioinformaticians face many integration challenges: data of different types must be integrated, methods for different analysis steps must be combined, and visualizations of primary data and metadata must be coupled with statistical approaches to derive meaningful results. In addition, specialized data structures are required for efficient computation.
This dissertation presents solutions to several of these challenges. Mayday, a framework for the visual inspection and analysis of microarray data, was largely redesigned to create a strong platform for transcriptome analysis.
The new Mayday includes a flexible plugin system, a framework for handling meta information associated with transcripts, experiments, or whole datasets, and an interactive system for filtering lists of transcripts according to a wide variety of criteria. A new visualization package was implemented as the basis for the highly interactive, linked views that are vital for the analysis and inspection of complex datasets. Furthermore, interactive scripting and querying capabilities were added for several programming languages, most notably the statistical computing language R. With these, bioinformaticians can quickly test ideas and perform non-standard analyses directly inside Mayday. Mayday's integration into the Gaggle communications system represents a first step toward online collaborative analysis.
Building on the new Mayday as a solid foundation, the SeaSight extension was developed; it is the main focus of this dissertation. SeaSight provides a generic framework for raw data processing, both for the new RNA-seq data types and for data generated by different microarray platforms.
In addition, an algorithm for the efficient processing of RNA-seq data is presented that allows this new technology to be applied to samples from species for which no reference genome sequence is currently available, adding a further method to the transcriptomics researcher's toolkit.
Together, the new Mayday and SeaSight provide the community with the first software tool that offers a one-stop solution for transcriptome data analysis, spanning the whole pipeline from raw data import, via filtering and statistical testing, to higher-level analyses and interactive visualization. They also provide a solid foundation for further development in transcriptomics in particular, and in systems biology in general, where the multitude of 'omics' data increases the need for integrated approaches to data interpretation.