### abstract ###
Network analysis transcends conventional pairwise approaches to data analysis as the context of components in a network graph can be taken into account.
Such approaches are increasingly being applied to genomics data, where functional linkages are used to connect genes or proteins.
However, while microarray gene expression datasets are now abundant and of high quality, few approaches have been developed for analysis of such data in a network context.
We present a novel approach for 3-D visualisation and analysis of transcriptional networks generated from microarray data.
These networks consist of nodes representing transcripts connected by virtue of their expression profile similarity across multiple conditions.
Analysing genome-wide gene transcription across 61 mouse tissues, we describe the unusual topography of the large and highly structured networks produced, and demonstrate how they can be used to visualise, cluster, and mine large datasets.
This approach is fast, intuitive, and versatile, and allows the identification of biological relationships that may be missed by conventional analysis techniques.
This work has been implemented in a freely available open-source application named BioLayout Express 3D.
### introduction ###
Complete genome sequencing of hundreds of pathogenic and model organisms has provided the parts list required for large-scale studies of gene function CITATION.
Enormous amounts of data pertaining to the nature of genes and proteins and their interactions in the cell have now been generated by techniques including, but not limited to: gene coexpression analysis, yeast two-hybrid assays, mass spectrometry, and RNA interference CITATION.
Such functional genomics and proteomics approaches, when combined with computational biology and the emerging discipline of systems biology, finally allow us to begin comprehensive mapping of cellular and molecular networks and pathways CITATION, CITATION.
One of the main difficulties we currently face is how best to integrate these disparate data sources and use them to better understand biological systems in health and disease CITATION.
Visualisation and analysis of biological data as networks is becoming an increasingly important approach to explore a wide variety of biological relationships.
Such approaches have already been used successfully in the study of sequence similarity, protein structure, protein interactions, and evolution CITATION CITATION.
Shifting biological data into a graph/network paradigm allows one to use algorithms, techniques, ideas, and statistics previously developed in graph theory, engineering, computer science, and computational systems biology.
In classical graph theory, a graph or network consists of nodes connected by edges.
For biological networks, nodes are usually genes, transcripts, or proteins, while edges tend to represent experimentally determined similarities or functional linkages between them CITATION.
Conventional analysis techniques are generally pairwise, where an individual relationship between two biological entities is studied without considering higher-order interactions with their neighbours.
Graph and network analysis techniques allow the exploration of the position of a biological entity in the context of its local neighbourhood in the graph, and the network as a whole CITATION.
Another important advantage of such techniques is that for noisy datasets, spurious edges tend to link nodes at random rather than form structure in the resultant graph, although this may not hold for data generated by techniques with inherent technical biases.
Because many network analysis techniques exploit local structure in networks between biologically related nodes, they are far less troubled by inherent noise, which may confound conventional pairwise approaches CITATION.
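The contrast drawn above between a pairwise view and a neighbourhood view can be illustrated with a minimal sketch. The node names, the edge list, and the helper functions `build_adjacency` and `shared_neighbours` are hypothetical and chosen purely for illustration; they are not taken from any tool described here.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def shared_neighbours(adjacency, a, b):
    """Nodes linked to both a and b: context that a pairwise comparison ignores."""
    return adjacency[a] & adjacency[b]


# Hypothetical edges: three mutually linked nodes plus one peripheral link.
edges = [("p1", "p2"), ("p2", "p3"), ("p1", "p3"), ("p3", "p4")]
adjacency = build_adjacency(edges)

# Local neighbourhood of a single node.
print(sorted(adjacency["p3"]))
# Neighbours that two nodes have in common.
print(sorted(shared_neighbours(adjacency, "p1", "p2")))
```

An edge supported by shared neighbours sits inside local structure, whereas an edge linking two otherwise unrelated nodes has none, which is one simple way to see why randomly placed spurious edges rarely build structure.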
One example of network analysis is the clustering of protein-protein similarity and interaction networks.
These techniques illustrate that graph clustering performs extremely well and allows the discovery of novel aspects of biological function CITATION.
Such techniques can hence provide insight into both the local and the global features of a network.
Although network analysis of biological data has shown great promise, little attention has been paid to microarray gene expression data.
These data are now abundant, generally of high quality, and consist of the type of high-dimensional data for which such approaches are well-suited.
In principle, transforming gene expression data into a network graph poses few challenges.
The similarity between individual expression profiles may be determined by one of a number of possible statistical techniques, e.g., the Pearson and Spearman correlation coefficients CITATION.
Networks can be constructed by connecting transcripts with edges that imply varying degrees of coexpression, based on an arbitrary correlation threshold CITATION.
Indeed, a number of groups have previously sought to apply the network paradigm to microarray data, establishing relationships between genes based on correlation of expression CITATION CITATION.
While these studies have suggested the power of this approach, limitations in the functionality and visualisation capabilities of the supporting tools have severely restricted their general application.
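The construction outlined above (pairwise correlation of expression profiles, followed by thresholding) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used by any tool discussed here: the gene names, expression values, the 0.9 threshold, and the function names `pearson` and `build_network` are all invented for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation information
    return cov / sqrt(var_x * var_y)


def build_network(profiles, threshold=0.9):
    """Return weighted edges (a, b, r) for transcript pairs with r >= threshold."""
    edges = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


# Toy expression profiles across five hypothetical tissues (invented values).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "gene_c": [5.0, 1.0, 4.0, 2.0, 3.0],
}

edges = build_network(profiles, threshold=0.9)
for a, b, r in edges:
    print(f"{a} -- {b} (r = {r:.3f})")
```

Only the pair with closely matching profiles is joined by an edge. Raising the threshold yields a sparser graph of stronger coexpression relationships, while lowering it admits weaker links; the all-pairs loop is quadratic in the number of transcripts, so a genome-scale dataset would need a vectorised or otherwise optimised calculation.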
In this manuscript, we describe the development and application of a new network analysis tool, BioLayout Express 3D, that facilitates the construction, visualisation, clustering, and analysis of microarray gene expression data.
Specifically, we chose to analyse the Genomics Institute of the Novartis Research Foundation mouse tissue gene expression atlas to demonstrate the efficacy of this approach CITATION.
The GNF data were generated to provide a genome-wide analysis of transcript abundance across a wide range of normal tissue and cell types.
This dataset represents one of the most complete systematic studies of tissue-specific expression in the mammalian transcriptome to date.
However, in common with other large datasets, analysis of these data presents significant challenges.
Certain genes are known to only be expressed by a single cell type, at specific times during development, or in response to explicit stimuli.
Others are thought to be expressed by all cells simultaneously at about the same level.
Between these two extremes, there are many other genes that are expressed in most or a number of cell types, but whose transcription may be regulated to give a specific temporal and spatial pattern of expression.
It is also known that genes that play distinct roles in a common pathway or biological process are often expressed in a similar manner; i.e., they are coexpressed CITATION.
Hence, when genes are found to have analogous expression profiles, this may indicate that the genes have linked functional activities.
To better understand aspects of gene regulation and the functional role of the encoded proteins, we chose to use network analysis to explore the innate structure of this dataset.
We demonstrate that this approach can accurately locate clusters of genes sharing similar network connectivity, reveal the relationships between these clusters, and support statistical analysis of functional annotations.
