Abstract:
Today, gene expression data is acquired with increasing speed, quality and depth. High-throughput technologies like DNA microarrays and next-generation sequencing have led to a rising pace of new discoveries in the biomedical field. These technologies are complemented by high-throughput pipelines for proteomics and metabolomics profiling. Altogether, vast amounts of primary measured data, complementary data from other omics and meta information from many sources are available to researchers. This data needs to be jointly analyzed and visualized in the context of external data and meta information.
In this thesis, new tools and concepts are introduced for the purpose of visualizing gene expression data in the context of meta information and complementary data from other "omics" experiments.
First, the application of generic visualization tools to resequencing microarrays, which are used for finding mutations in single genes, is discussed. For this final step of gene expression analysis, an application called ResqMi ("Resequencing using Microarrays") is presented. It allows generic and adapted visualization tools to be used on resequencing microarrays in order to improve quality control, data analysis and the revision of problematic base calls.
The focus of this work is on the visualization of gene expression data. Here, new tools are introduced for visualizing gene expression data in the context of meta information from processing results and external sources, such as functional annotations. For the visualization of clustered gene expression data, profile logos extend the concept of sequence logos to expression data. Chromograms and tag clouds, tools for visualizing different properties of collections of nominal data, are applied in combination in order to explore temporal, spatial and other patterns in annotations of gene expression data. Furthermore, enhanced tabular views of summarized gene annotations and of genes ranked by statistical values are discussed for the comparative visualization of textual and numeric meta data.
Graph-based visualizations of gene expression and meta data are more generic and are investigated in greater detail. Most tools for visualizing biological pathways do not make full use of gene expression data or meta information.
Here, a variety of ways to include gene expression data in biological network visualizations is investigated and implemented, based both on the node rendering and on the layout of the graph. This allows dense, high-dimensional visualizations. Specialized tools that are optimized for working with pathways in KEGG and BioPax formats are presented, as well as MGV (the Mayday Graph Viewer), a general tool for visualizing a wide range of biological networks that offers a full range of options within a rich, extensible user interface. Options for the integration and creation of network data, and for data organization and analysis within the graph framework, are investigated. MGV furthermore incorporates tools for integrating data from several datasets, which allows multiple "omics" data to be combined in one visualization. With dynamic groups that can contain nodes with data from all sources, cross-dataset analyses can be performed. Further applications include the integration of metabolomics data, clustering comparisons and the visualization of gene models.