Visualization of Ciona intestinalis
Co-expression Network
by
Hang Zhong
A dissertation submitted in partial fulfillment
of the requirements for the degree of
Master of Science
Department of Biology
New York University
May, 2012
ACKNOWLEDGEMENTS
I would like to thank my advisor, Richard Bonneau, for giving me
the opportunity to participate in this project and for his ongoing
guidance and support. I am also indebted to Professor Lionel
Christiaen for inspiring the project. This thesis could not have come to
fruition without the help of Florian Razy, who offered insightful and
thought-provoking input.
I am also everlastingly grateful to Duncan Penfold-Brown for
teaching me programming. I would also like to thank Kieran Mace,
Aviv Madar, Kevin Drew, Maximilian Haeussler and Claudia Racioppi,
who so patiently offered their time and support. Many thanks to Todd
Heiniger and Joel Rodriguez for revising the thesis.
Finally, I would like to thank my family for the invaluable
support they have given me in the course of my life and studies.
ABSTRACT
Abnormalities in heart development cause the most frequent
congenital diseases in humans. The conservation of the Gene
Regulatory Network (GRN) involved in heart development, together with
cellular simplicity, low genetic redundancy and a relevant evolutionary
position, has led researchers to study the ascidian Ciona intestinalis.
To extract information from microarray data that can help researchers
infer the heart network in Ciona, this thesis not only applies standard
statistical approaches to find differentially expressed genes, but also
explores network-based approaches to find functional groups. By
visualizing the co-expression network in Gaggle, the lists of atrial
siphon muscle (ASM) and heart candidate genes are fine-tuned. In
addition, the modules containing candidate and known marker genes may
deserve further study.
TABLE OF CONTENTS
ABSTRACT
1. INTRODUCTION
1.1 GENE REGULATORY NETWORK OF CARDIOGENIC PRECURSORS IN CIONA
1.2 MICROARRAY DATA ANALYSIS
1.3 NETWORK VISUALIZATION THROUGH GAGGLE
2. METHODS
2.1 MICROARRAY EXPERIMENTAL DESIGN
2.2 GENE EXPRESSION DATA
2.2.1 QUALITY CONTROL
2.2.2 PREPROCESSING
2.3 STATISTICAL TEST
2.4 CLUSTER ANALYSIS
2.5 FUNCTIONAL ENRICHMENT ANALYSIS
2.6 GENERATION OF NETWORKS
2.6.1 STRING PROTEIN NETWORK
2.6.2 UNWEIGHTED CO-EXPRESSION NETWORK
2.6.3 WEIGHTED CO-EXPRESSION NETWORK
2.7 NETWORK VISUALIZATION
2.7.1 FILE FORMAT
2.7.2 ANALYZING NETWORK BY PLUGIN IN CYTOSCAPE
3. RESULTS
3.1 DIFFERENTIAL EXPRESSION
3.1.1 EXPECTATION OF THE MICROARRAY DATA
3.1.2 ASM AND HEART CANDIDATE GENES
3.2 NETWORK VISUALIZATION IN GAGGLE
3.2.1 NETWORKS
3.2.2 FINDINGS FROM THE NETWORK VISUALIZATION IN GAGGLE
3.2.2.1 GAGGLE AS INFORMATION INTEGRATION CENTER
3.2.2.2 MODULE FROM ALLEGROMCODE
3.2.2.3 MODULE FROM WEIGHTED NETWORK
3.2.2.4 FINE-TUNED LIST
4. DISCUSSION
4.1 ASM CANDIDATE GENES
4.2 ANNOTATION IN CIONA INTESTINALIS
4.3 FUNCTIONAL RIBOSOME GROUP AND COE
4.4 TIME-SERIES
4.5 LIMITATIONS OF THE CO-EXPRESSION NETWORK
FIGURES AND TABLES
FIGURE 1 PIPELINE.
FIGURE 2 NORMALIZED UNSCALED STANDARD ERROR (NUSE).
FIGURE 3 HEAT-MAP OF ASM AND HEART CANDIDATE GENES.
FIGURE 4 OUTPUT OF THE SHORT TIME-SERIES EXPRESSION MINER.
FIGURE 5 SELECTING SOFT POWER.
FIGURE 6 CIONA INTESTINALIS WEIGHTED CO-EXPRESSION NETWORK.
FIGURE 7 MODULE SIGNIFICANCE.
FIGURE 8 INTRAMODULAR CONNECTIVITY AND MODULE SIGNIFICANCE.
FIGURE 9 STRING PROTEIN NETWORK.
FIGURE 10 LABELING IN WEIGHTED NETWORK.
FIGURE 11 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR UNWEIGHTED CO-EXPRESSION NETWORK.
FIGURE 12 THE 1ST MODULE OF UNWEIGHTED CO-EXPRESSION NETWORK ENRICHMENT.
FIGURE 13 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR WEIGHTED CO-EXPRESSION NETWORK.
FIGURE 14 THE 1ST MODULE OF WEIGHTED NETWORK ENRICHMENT.
FIGURE 15 RIBOSOME GROUP IN THE STRING.
FIGURE 16 RIBOSOME GROUP IN STRING NETWORK ENRICHMENT.
FIGURE 17 RIBOSOME GROUP AND COE.
FIGURE 18 GREY COLOR GENES.
FIGURE 19 TAN MODULE
FIGURE 20 BROWN MODULE
FIGURE 21 TURQUOISE MODULE ENRICHMENT.
FIGURE 22 GENES IN TURQUOISE PLUS STEM CONDITION.
FIGURE 23 GENES OF TURQUOISE PLUS STEM CONDITION ENRICHMENT.
FIGURE 24 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK.
FIGURE 25 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK ENRICHMENT.
FIGURE 26 ASM CANDIDATE GENES IN WEIGHTED NETWORK ENRICHMENT.
FIGURE 27 ASM AND HEART CANDIDATE GENES
REFERENCES
1. INTRODUCTION
1.1 Gene regulatory network of cardiogenic precursors in Ciona
Abnormalities in heart development cause the most frequent
congenital diseases in humans. The conservation of the Gene
Regulatory Network (GRN) involved in heart development, together with
cellular simplicity, low genetic redundancy and a relevant evolutionary
position, has led researchers to study the ascidian Ciona intestinalis
(Davidson 2007). In Ciona, a single pair of blastomeres called B7.5
gives rise to the anterior tail muscle (ATM) and to the trunk ventral
cells (TVC) (Figure 27). Following migration from the tail, the TVC
undergo asymmetric cell divisions at the ventral midline of the trunk.
The medial TVC give rise to the heart, while the lateral TVC migrate
toward the atrial placode, where they form the atrial siphon
muscles (ASM). Thus, the TVC are similar to the multipotent
cardiopharyngeal progenitors found in vertebrates, while the ASM are
likely equivalent to the jaw muscles of vertebrates.
A few years ago, the first cardiogenic Gene Regulatory Network
(GRN) in Ciona was proposed (Christiaen, Davidson et al. 2008),
decoupling genes necessary for heart specification from genes
necessary for cell migration. A later study showed that ASM
precursors express the transcription factor COE (Stolfi, Gainous et al.
2010), which is necessary and sufficient to specify ASM fate.
Misexpression of COE in the whole TVC lineage blocks heart
development and imposes an ASM fate on all cells. Conversely,
misexpression of a constitutive repressor form of COE provokes the
opposite phenotype, blocking ASM formation and causing all cells to
form heart tissue. A genome-wide microarray analysis of these COE
perturbations, aimed at identifying the downstream effectors of COE,
is therefore expected to provide insights into the gene regulatory
network of the heart.
1.2 Microarray data analysis
Most existing studies have focused on differential expression to
identify genes that distinguish different sets of samples. It is
common to rank thousands of genes with a testing method, such as the
t-test, the F-test, or the nonparametric Wilcoxon test, and then to
select the most significant genes (Gentleman 2005). Other statistical
methods designed for microarray data are also commonly used, such as
Significance Analysis of Microarrays (SAM) (Tusher, Tibshirani et al.
2001) and LIMMA (Wettenhall, Smyth 2004), which uses a Bayesian
mixture model.
Another way of using microarray data is to study co-expression in
order to understand the network properties of an individual gene or
protein: genes with similar expression patterns across a set of
samples are hypothesized to be functionally related. This co-expression
network-based approach is consistent with an important concept that
has emerged over the past decade, namely that genes and their protein
products carry out cellular processes in the context of functional
modules (Barabasi, Bonabeau 2003, Barabasi, Oltvai 2004).
1.3 Network visualization through Gaggle
It is well recognized that visualization plays a key role in
understanding biological systems, particularly in the era of
high-throughput studies that produce a wealth of ‘omics’-scale data
(Gehlenborg, O'Donoghue et al. 2010). This thesis uses Gaggle
(Shannon, Reiss et al. 2006), a simple, open-source Java software
system, for co-expression network visualization. Gaggle is a
cross-platform system that integrates diverse databases (KEGG, BioCyc,
and STRING) and software (Cytoscape, DataMatrixViewer, the R
statistical environment, and the TIGR Microarray Expression Viewer).
Using four simple data types (names, matrices, networks, and
associative arrays), researchers can send information from one data
source or software tool to the Gaggle Boss, which transfers it to the
other connected tools, so that many data sources and software tools
can be explored together.
2. METHODS
The analysis pipeline of this thesis is shown in Figure 1.
2.1 Microarray experimental design
The microarray data used in this study were kindly provided by
Dr. Lionel Christiaen. They consist of 30,969 probe sets from Affymetrix
GeneChips. The perturbation group includes the LacZ control and the
overexpression and loss of function of the transcription factor
Collier/EBF/Olf (COE) in sorted TVC at 21 hours post fertilization
(hpf), after the asymmetric divisions of the TVC but before completion
of ASM migration. The time-series group consists of 11 time points in
TVC, taken every 2 hours from 8 to 28 hours.
2.2 Gene expression data
2.2.1 Quality control
This thesis uses arrayQualityMetrics (Kauffmann, Gentleman et al.
2009), a Bioconductor package for quality control that produces an
HTML report with several diagnostic plots. In general, an array is
discarded if the report identifies it as an outlier both before and
after normalization.
The microarray data are first imported into the statistical
programming language R, and quality control is then performed with
arrayQualityMetrics. The sample LacZ.3 is removed because it is
reported as an outlier both before and after normalization (Figure 2).
2.2.2 Preprocessing
The CEL files of the microarray are normalized with the RMA method
(Gentleman 2005). The expression matrix contains 30,969 probes and 48
arrays. After non-specific filtering by variance (interquartile range,
IQR, with a cutoff of 0.5), the matrix contains 15,484 probes and 48
arrays.
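A minimal Python sketch of the variance filter, for illustration only (the thesis performed this step in R). It assumes that "IQR=0.5" means probes are ranked by interquartile range and only those above the 0.5 quantile (the median) of all probe IQRs are kept, an interpretation consistent with the probe count dropping from 30,969 to 15,484. The function names and the dictionary layout are hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range; method="inclusive" matches R's default quantile (type 7)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def nonspecific_filter(expr):
    """Keep probes whose IQR is above the median IQR of all probes.

    expr: dict mapping a probe id to its list of normalized log2 values.
    Assumption: the 0.5 cutoff is a quantile of the IQR distribution.
    """
    iqrs = {probe: iqr(values) for probe, values in expr.items()}
    cutoff = statistics.median(iqrs.values())
    return [probe for probe, value in iqrs.items() if value > cutoff]


# Hypothetical toy matrix: three probes measured on four arrays.
expr = {
    "probe1": [1.0, 1.0, 1.0, 1.0],
    "probe2": [1.0, 2.0, 3.0, 4.0],
    "probe3": [0.0, 5.0, 0.0, 5.0],
}
print(nonspecific_filter(expr))  # ['probe3']
```

Filtering on a quantile rather than on a fixed IQR value keeps roughly half of the probes regardless of the overall spread of the data.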
Using the collapseRows function in WGCNA, the probe with maximum
variance is selected to represent each gene. After merging the probes,
the merged matrix contains 10,079 probes (one per gene) and 48 arrays.
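The collapsing rule can be sketched in a few lines of Python. This is not the WGCNA code; we assume that the rule described above corresponds to collapseRows' maximum-row-variance option, and the function and variable names are hypothetical.

```python
import statistics


def collapse_by_max_variance(expr, probe_to_gene):
    """Pick, for every gene, the probe with the largest variance across arrays.

    expr: dict probe id -> list of expression values.
    probe_to_gene: dict probe id -> gene name; unmapped probes are skipped.
    Returns a dict gene name -> chosen probe id.
    """
    best = {}  # gene -> (variance, probe)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        variance = statistics.variance(values)  # sample variance, like R's var()
        if gene not in best or variance > best[gene][0]:
            best[gene] = (variance, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Hypothetical probes: p1 and p2 both map to GENE_A.
expr = {
    "p1": [1.0, 2.0, 3.0],
    "p2": [1.0, 1.0, 2.0],
    "p3": [5.0, 5.0, 5.0],
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
print(collapse_by_max_variance(expr, probe_to_gene))  # {'GENE_A': 'p1', 'GENE_B': 'p3'}
```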
2.3 Statistical test
The merged matrix is ranked by a moderated F-test using the Limma
package (Smyth 2004), and genes with a significant p-value (< 0.05,
after Benjamini-Hochberg adjustment) are selected. After ranking, the
top-ranked matrix contains 4,307 probes and 48 arrays.
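The moderated F-test itself relies on limma's empirical Bayes variance shrinkage and is not reproduced here. The Benjamini-Hochberg adjustment applied afterwards is simple enough to sketch in Python; the version below is intended to mirror R's `p.adjust(p, method = "BH")`, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value of ascending rank i is scaled to p * n / i, and a running
    minimum taken from the largest p-value downwards keeps the result monotone.
    """
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, index in enumerate(order):
        rank = n - position  # 1-based rank in ascending order
        running_min = min(running_min, pvalues[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(adjusted) if q < 0.05]
print(selected)  # [0]
```

Only the first gene survives the 0.05 cutoff after adjustment, although three of the four raw p-values are below 0.05.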
The top-ranked matrix is imported into the MultiExperiment Viewer
(MeV), one of the Gaggle geese, and analyzed with Significance Analysis
of Microarrays (SAM) (COE versus COEW group, p-value < 0.05, 1,000
permutations, FDR = 0.9).
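At the core of SAM is the relative difference d for each gene (Tusher, Tibshirani et al. 2001). The Python sketch below computes d for one gene only. SAM itself chooses the fudge factor s0 from the data and estimates the FDR by recomputing d over permuted class labels (1,000 permutations here); both steps are omitted, and s0 is simply passed in. The function name and values are hypothetical.

```python
import math
import statistics


def sam_d(group_a, group_b, s0=0.0):
    """SAM relative difference d = (mean_a - mean_b) / (s + s0) for one gene.

    s is the pooled standard error of Tusher et al. (2001):
    s = sqrt((1/n_a + 1/n_b) / (n_a + n_b - 2) * (SS_a + SS_b)).
    """
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
    ss = sum((x - mean_a) ** 2 for x in group_a) + sum((x - mean_b) ** 2 for x in group_b)
    a = (1 / n_a + 1 / n_b) / (n_a + n_b - 2)
    s = math.sqrt(a * ss)
    return (mean_a - mean_b) / (s + s0)


# Hypothetical log2 values for one gene in two classes (three replicates each).
d = sam_d([3.0, 4.0, 5.0], [1.0, 2.0, 3.0])
print(round(d, 4))  # 2.4495
```

The sign of d depends on which class is treated as the first group, which is why the candidate lists in the Results are described as "positive" or "negative" in SAM.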
2.4 Cluster analysis
Hierarchical clustering of the ASM and Heart candidate genes is
performed in MeV, using the Pearson correlation metric and average
linkage.
The time-series data, totaling 36 arrays, are averaged for each time
point and imported into the Short Time-series Expression Miner (STEM),
using the STEM clustering method.
2.5 Functional enrichment analysis
Blast2GO (B2G) (Conesa, Götz et al. 2005) is a comprehensive
bioinformatics tool for annotation, visualization and analysis in
functional genomics research. It offers a suitable platform for
functional research in non-model species such as Ciona intestinalis.
DNA sequences in FASTA format were loaded into Blast2GO, and 15,629
genes were retained. BLAST searching followed by GO mapping yielded GO
terms for 3,964 genes. Each test list is tested against this reference
group (3,964 genes) using Fisher's exact test (p-value < 0.05, FDR
correction).
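To make the enrichment test concrete, here is a minimal Python sketch of a one-sided Fisher's exact test for a single GO term, written as the upper tail of the hypergeometric distribution. It assumes that the test genes are a subset of the reference group. Whether Blast2GO applied a one- or two-sided test here is not stated, and its FDR correction over all terms is omitted. The function name and counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(k, n, K, N):
    """One-sided Fisher's exact test (hypergeometric upper tail) for one GO term.

    N: genes in the reference group   K: reference genes annotated with the term
    n: genes in the test list         k: test genes annotated with the term
    Returns P(X >= k) when n genes are drawn at random from the reference group.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: all 5 test genes carry a term held by 5 of 10 reference genes.
print(enrichment_pvalue(5, 5, 5, 10))  # 1/252
```

Python's exact integer arithmetic in `math.comb` keeps the computation exact even for a reference group of 3,964 genes.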
2.6 Generation of networks
2.6.1 String protein network
Using the Ensembl gene names in the filt.gene matrix as input, the
genes of interest are extracted from the Search Tool for the Retrieval
of Interacting Genes (STRING) database (Szklarczyk, Franceschini et al.
2011) website in Text Summary format and converted to the Cytoscape
simple interaction format (SIF) (Shannon, Markiel et al. 2003) with
Python.
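The original conversion script is not described. The Python sketch below shows one way to do the conversion, assuming a tab-separated STRING export with a header row whose first two columns hold the names of the interacting proteins. That column layout is an assumption about the export, not a documented STRING contract. Each SIF line has the form "source, interaction type, target", and "pp" is used here as a conventional label for protein-protein edges.

```python
import csv
import io


def string_to_sif(text, interaction="pp"):
    """Convert a tab-separated STRING interaction export into SIF lines.

    Assumes a header row followed by rows whose first two fields are the
    interacting protein names (an assumption about the export format).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    seen = set()
    sif = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        a, b = row[0].strip(), row[1].strip()
        edge = frozenset((a, b))
        if a == b or edge in seen:
            continue  # drop self-loops and duplicate undirected edges
        seen.add(edge)
        sif.append(f"{a}\t{interaction}\t{b}")  # SIF: source, type, target
    return sif


# Hypothetical export: the MYOD-SNAIL edge appears in both directions.
text = "#node1\tnode2\tcombined_score\nMYOD\tSNAIL\t0.912\nSNAIL\tMYOD\t0.912\nCOE\tCOE\t0.999\n"
for line in string_to_sif(text):
    print(line)
```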
2.6.2 Unweighted co-expression network
The Pearson correlation coefficient is calculated in R for all pairwise
comparisons of genes in the filt.gene matrix. Highly correlated gene
pairs are selected with a cutoff of 0.9 and converted to simple
interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.
2.6.3 Weighted co-expression network
2.6.3.1 Network construction
The procedure can be found in the WGCNA website (Horvath
2011).
2.6.3.2 Module detection
Pearson correlation coefficients are calculated for all pairwise
comparisons of genes across all samples. The resulting Pearson
correlation matrix is transformed into a weighted adjacency matrix
using the soft-thresholding power beta = 6 selected during network
construction (Figure 5). Average linkage hierarchical clustering is then
used to group genes on the basis of the topological overlap
dissimilarity of their network connection strengths (Zhang, Horvath
2005). Using a dynamic tree-cutting algorithm (Langfelder, Zhang et al.
2008) with a minimum cluster size of 70, 13 modules are found
(Figure 6). Genes that are not assigned to any module are colored
grey.
2.6.3.3 Module significance
Module significance is based on the p-values of the moderated t-test,
obtained with the topTable function of the affylmGUI package in R
(Smyth 2004).
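Section 3.2.2.3 defines gene significance as -log10 of the moderated t-test p-value. The Python sketch below summarizes a module by averaging this value over the module's genes. Averaging is the convention of the WGCNA tutorials and is assumed here. Gene names and p-values are hypothetical.

```python
import math
from collections import defaultdict


def module_significance(pvalues, module_of):
    """Average gene significance, GS = -log10(p), over the genes of each module.

    pvalues: dict gene -> moderated t-test p-value.
    module_of: dict gene -> module color (e.g. "tan", "brown", "grey").
    """
    scores = defaultdict(list)
    for gene, p in pvalues.items():
        scores[module_of[gene]].append(-math.log10(p))
    return {module: sum(gs) / len(gs) for module, gs in scores.items()}


# Hypothetical p-values for three genes in two modules.
pvalues = {"geneA": 0.01, "geneB": 0.001, "geneC": 0.1}
module_of = {"geneA": "tan", "geneB": "tan", "geneC": "grey"}
print(module_significance(pvalues, module_of))
```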
2.7 Network visualization
2.7.1 File format
The output files from WGCNA are converted to simple interaction
format (SIF) (Shannon, Markiel et al. 2003) with Python.
2.7.2 Analyzing network by plugin in Cytoscape
The AllegroMCODE and Network Analysis plugins in Cytoscape are used
to analyze the networks. AllegroMCODE is used to find clusters
automatically.
3. RESULTS
3.1 Differential expression
3.1.1 Expectation of the Microarray data
Genes that are up-regulated upon overexpression of COE or
down-regulated upon loss of function of COE are considered ASM
candidate genes downstream of COE, while genes that are
down-regulated upon overexpression of COE or up-regulated upon loss
of function of COE are considered Heart candidate genes repressed by
COE (Stolfi, Gainous et al. 2010).
Using the COE and COEW groups as the two classes in Significance
Analysis of Microarrays (SAM), the contrast is expected to yield the
ASM and Heart candidate genes.
3.1.2 ASM and Heart candidate genes
3.1.2.1 Lists from SAM
SAM yields 336 significant genes, which are separated into 206 ASM
candidate genes (negative in SAM: expression in the COE group lower
than in the COEW group) and 130 Heart candidate genes (positive in
SAM: expression in the COE group higher than in the COEW group). These
two groups can be distinguished by the first three columns of the
heat-map (Figure 3, Figure 27).
Based on hierarchical clustering and visual inspection, the ASM
candidate genes can be roughly divided into three large groups:
A1. The first group (up-down-up-ASM, 61 genes) shows a “U”-shaped
curve across the time series, with the earliest up-regulation at the
first time point (8 hours). This group contains Snail (‘SNAIL’ in the
thesis), SET and MYND Domain 1 (SMYD1) and Myoblast determination
protein (Myod, ‘MYOD’ in the thesis).
A2. The second group (early-ASM, 45 genes), which includes COE and
Myosin Regulatory Light Chain (MRLC5, ‘MYL5’ in the thesis), shows
early up-regulation around 14 hours.
A3. The third group (late-ASM, 100 genes) shows relatively late
up-regulation after 18 hours and includes the myosin heavy chain gene
MHC3, tropomyosin 1 (TPM1, ‘CTM1’ in the thesis) and muscle-like
actin 2 (MA2).
The Heart candidate genes can be divided into two large
groups:
H1. The first group (early-Heart, 99 genes) shows early
up-regulation (before 20 hours) and contains the heart markers
BMP2/4, NK4, NOTRLC/HAND-LIKE, and ETS/POINTED2.
H2. The second group (late-Heart, 31 genes) displays relatively
late up-regulation (after 20 hours) and includes mesenchyme-specific
gene 3 (MECH3).
As expected, the two lists contain several important markers and
show distinct temporal expression patterns. However, neither the ASM
nor the Heart candidate genes showed GO-term enrichment in Blast2GO,
which may indicate that the lists need to be fine-tuned, although the
small number of GO terms available in Blast2GO is another concern.
Further work would be needed to assess how the non-specific filtering,
the selection of the maximum-variance probe for each gene, and the SAM
ranking affect the ASM and Heart candidate gene lists.
3.1.2.2 Clusters from STEM
The STEM output shows 7 significant model profiles (Figure 4).
Twenty-three of the 206 ASM candidate genes fall into significant
profiles, most of them into profile 20, which resembles the late-ASM
group and includes the MHC3, MA2 and MYL5 genes. Of the 130 Heart
candidate genes, 13 fall into significant profiles.
3.2 Network Visualization in Gaggle
3.2.1 Networks
3.2.1.1 STRING protein network
The STRING (Szklarczyk, Franceschini et al. 2011) protein network
is created to take advantage of existing data resources. It provides
both experimentally determined interactions and interactions predicted
by computational techniques, shown as different edge colors
(Figure 9).
3.2.1.2 Co-expression network
Network-based approaches, also termed graph-based approaches, aim to
extract recurrent expression patterns or conserved modules from the
rapidly accumulating microarray datasets. A microarray dataset is
modeled as a graph in which each node represents one gene and two
genes are connected by an edge based on an expression correlation
parameter (Zhang, Horvath 2005) that measures the similarity between
expression profiles (the Pearson correlation coefficient is used in
this thesis). The graph, or network, can be represented by an
adjacency matrix that encodes whether each pair of nodes is connected.
For unweighted networks, the entries are 1 or 0. For weighted
networks, the adjacency matrix reports the connection strength of each
gene pair, between 0 and 1 (Zhang, Horvath 2005). Connectivity, also
termed degree in graph theory, is the row sum of the adjacency matrix:
it counts the direct neighbors of a node in an unweighted network and
sums its connection strengths in a weighted network.
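A toy three-gene example (hypothetical values) makes the row-sum definition concrete. In Python:

```python
def connectivity(adj):
    """Connectivity (degree) of each node: the row sums of the adjacency matrix."""
    return [sum(row) for row in adj]


# Unweighted: entries are 0 or 1, so k counts direct neighbors.
unweighted = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
# Weighted: entries lie between 0 and 1, so k sums connection strengths.
weighted = [
    [0.0, 0.8, 0.3],
    [0.8, 0.0, 0.1],
    [0.3, 0.1, 0.0],
]
print(connectivity(unweighted))  # [2, 1, 1]
print(connectivity(weighted))
```

In the weighted case the second gene has connectivity of about 0.9 even though, with a hard threshold, it might have only one neighbor or none, which illustrates how weighted networks keep graded information.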
Two co-expression networks are generated in this thesis.
The unweighted co-expression network is formed by the gene pairs
with a Pearson correlation coefficient higher than 0.9. It contains
766 nodes and has a clustering coefficient of 0.311 (computed by the
Network Analysis plugin in Cytoscape; the clustering coefficient
measures the cohesiveness of the neighborhood of a node).
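The clustering coefficient reported above was computed by Cytoscape. The Python sketch below uses the standard local definition averaged over all nodes: for a node with d neighbors, C = 2 × (links among the neighbors) / (d × (d - 1)). The sketch assumes that nodes with fewer than two neighbors contribute 0, which is how we understand the plugin to treat them. The example graph is hypothetical.

```python
from itertools import combinations


def average_clustering(edges):
    """Mean local clustering coefficient of an undirected graph.

    For a node with d >= 2 neighbors, C = 2 * links / (d * (d - 1)), where
    links counts edges among its neighbors; nodes with fewer than two
    neighbors are assumed to contribute 0.
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    total = 0.0
    for node, nbrs in neighbors.items():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        total += 2 * links / (d * (d - 1))
    return total / len(neighbors)


# A triangle (a, b, c) with one pendant node d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(average_clustering(edges))
```

Isolated two-node components, which are common in the unweighted network, contribute zeros and therefore pull the average down. This is consistent with its lower coefficient compared with the weighted network.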
The gene pairs with the 5,000 strongest weights (weight cutoff 0.23)
are exported to build the weighted co-expression network, which
contains 814 nodes and has a clustering coefficient of 0.728.
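Keeping a fixed number of strongest connections, rather than a fixed threshold, can be sketched as follows in Python. The weight of the last pair retained then acts as the effective cutoff (0.23 in the network above). The text does not say whether the weights came from the adjacency or the topological overlap matrix, and the sketch works with either. Names and values are hypothetical.

```python
import heapq
from itertools import combinations


def strongest_edges(names, weights, n_edges):
    """Return the n_edges gene pairs with the largest weights, strongest first.

    weights: symmetric matrix (list of lists) of connection strengths.
    The weight of the last returned pair is the effective cutoff.
    """
    pairs = (
        (weights[i][j], names[i], names[j])
        for i, j in combinations(range(len(names)), 2)
    )
    return heapq.nlargest(n_edges, pairs)


names = ["a", "b", "c"]
weights = [
    [0.0, 0.9, 0.2],
    [0.9, 0.0, 0.5],
    [0.2, 0.5, 0.0],
]
top = strongest_edges(names, weights, 2)
print(top)  # [(0.9, 'a', 'b'), (0.5, 'b', 'c')]
print("cutoff:", top[-1][0])  # cutoff: 0.5
```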
The unweighted network has more isolated components consisting of
only two nodes linked by a single edge. The weighted network is denser,
contains some hubs (highly connected nodes), and its nodes are colored
by the modules detected with WGCNA.
Although the two networks differ in their adjacency matrices, both are
based on the Pearson correlation coefficient and place genes with
highly similar expression close to each other in the graph. In other
words, genes with similar expression profiles across all experiments
lie close together in the network. These network-based approaches
allow a biological entity to be examined in the context of its local
neighborhood and of the network as a whole, and they are less affected
by the inherent noise that confounds conventional pairwise approaches
(Freeman, Goldovsky et al. 2007).
3.2.2 Findings from the network visualization in Gaggle
3.2.2.1 Gaggle as information integration center
In this post-‐genomic era, biologists often face the challenge to
freely explore the experimental and computational data from many
different sources and diverse software tools, such as storing different
data for genes, retrieving data from a list of genes, and mapping one
list of genes with another. Once the network has been loaded in the
Cytoscape, Gaggle, as an information integration center, can help to
solve these problems with respect to Microarray data.
Storing different data for genes can be achieved by labeling. As
shown in Figures 9 and 10, the two networks present six different
kinds of data: node color for module, node label for ASM or Heart
candidate genes, node shape for significance in the moderated F-test,
node size for connectivity, edge color for interaction type, and
distance between nodes for closeness. The network in Cytoscape
therefore functions as a visual database.
Retrieving data for a list of genes, such as an expression matrix, is
also feasible through the basic “broadcast” function in Gaggle. For
example, a list of genes of interest in Cytoscape can be sent to the
Gaggle Boss and then broadcast to the Data Matrix Viewer (DMV), which
can output the expression matrix.
Mapping one list of genes onto another can be done conveniently
through the many functions that Gaggle offers. In the
MultiExperiment Viewer (MeV), a sub-list of genes can be launched in a
new viewer. In Cytoscape, the function “Create new network from
selected nodes” can be used for this task. Between different tools,
the “broadcast” function serves as a bridge to transfer a list and map
it in the existing tools.
3.2.2.2 Module from AllegroMCODE
The main goal of the co-expression network visualization is to find
groups of highly correlated genes (modules) related to the ASM or
Heart network, specifically to infer targets of the transcription
factor COE.
In the unweighted network, which has no predefined modules, modules
can be detected automatically by AllegroMCODE, a Cytoscape plugin that
finds highly interconnected groups of nodes in a large, complex
network. The 1st module detected by AllegroMCODE in the unweighted
network is shown in Figure 11. This module is significantly enriched
in biological processes (Figure 12) such as biosynthetic process and
cellular biosynthetic process.
For the weighted network, the 1st module detected by AllegroMCODE
(Figure 13) consists largely of turquoise module genes (with only 1
grey gene). This module is significantly enriched in intracellular
process (Figure 14).
The 1st modules of the unweighted and weighted networks both contain
ribosome-related genes (gene names starting with “RP”). Because both
networks are generated from the same microarray data, an external
reference is necessary to determine whether this ribosome group is
found by chance. Comparing the 1st module of the weighted network with
all turquoise module genes in the STRING network yields a common list
of 23 genes, 16 of which are ribosome-related.
3.2.2.3 Module from weighted network
Weighted gene co-expression network analysis (WGCNA) has advantages
in identifying candidate targets because of its unique mathematical
features (Langfelder, Horvath 2008). While highly correlated genes are
grouped into different modules, genes that lie far from the modules
are colored grey. Figure 18 shows that these grey genes in the
weighted network often have fewer edges and are associated with
miRNA, which reasonably sets them apart from the other functional
modules.
In Figures 7 and 8, the tan and brown modules show strong module
significance (defined as -log10 of the p-value in the moderated
t-test). Visualizing the top 50 genes by intramodular connectivity in
each of these two modules shows that both are enriched in ASM and
Heart candidate genes. Interestingly, the NK4 gene is in the tan
module together with other genes (Figure 19). Islet (ISL), which is not
in the candidate list but has been reported to be an ASM gene, is in
the brown module with some known markers, such as MA2, MHC3,
NOTRLC/HAND-LIKE, and ETS/POINTED2 (Figure 20). These results could
serve as a starting point for forming hypotheses about the heart
network in Ciona.
The turquoise module is the largest module in the weighted network and
is enriched in broad terms such as cellular process (Figure 21), so it
is natural to narrow the list of turquoise module genes with additional
conditions. The genes shared by the turquoise module and the STEM
condition show a clear temporal expression pattern and enrichment in
muscle- and heart-related GO terms (Figure 22, Figure 23), although
only four of them are found in the candidate list.
3.2.2.4 Fine-tuned list
The network in Gaggle can serve as a visualization center as well as a
fine-tuning filter for a list of genes, because the network is built
from highly correlated gene pairs with reduced noise. This does not
mean that genes absent from the network should be discarded, but
finding the expected GO-term enrichment helps to confirm the list.
Because GO-term enrichment depends on the proportion of genes sharing
the same GO terms, the number of noisy genes in the whole list has a
great impact on the enrichment. Mapping the candidate list onto the
co-expression network should therefore reduce the noise and yield a
better enrichment result.
Using the “broadcast” function in MeV, Cytoscape can receive the 336
significant genes, label them in yellow in the unweighted network, and
create a sub-network of the candidate genes. A subgroup of the
candidate genes (Figure 24) is significantly enriched in muscle- and
heart-related GO terms (Figure 25), which could not previously be
detected with Blast2GO. The ASM candidate genes in the network are also
enriched in muscle and heart GO terms (Figure 26), whereas the Heart
candidate genes in the network still show no enrichment in Blast2GO.
4. DISCUSSION
4.1 ASM candidate genes
COE is necessary and sufficient to specify ASM fate (Stolfi,
Gainous et al. 2010). It is therefore consistent that COE is expressed
earlier than the late-ASM genes (A3 group), such as MHC3, TPM1 and MA2.
The up-down-up-ASM group (A1), however, which includes MYOD, shows the
earliest up-regulation. In Xenopus, cross-regulatory interactions of
COE orthologs with genes of the Myogenic Regulatory Factor (MRF)
family, such as MYOD and MYF5, are crucial for muscle commitment and
differentiation (Green, Vetter 2011). However, how COE may repress the
cardiac fate and promote cell migration in Xenopus has never been
studied. One possible hypothesis is that in Ciona, the early functions
controlled by COE in ASM precursors are independent of MRF activation,
since the MRF gene in the A1 group is up-regulated earlier than COE in
the A2 group.
And the A1 group genes are more likely to be TVC genes, which
also can explain the fact that there are heart related go-‐terms in the
enrichment of the ASM genes in the weighted network (Figure 26).
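The argument above rests on the order in which the groups are first up-regulated. The sketch below shows one way to make that comparison. It assumes that each gene is summarized by a log2 fold change relative to a control at each time point, and that "up-regulated" means reaching a threshold of 1.0. Both are assumptions, not the criteria used in this thesis, and the profiles are hypothetical.

```python
def onset_index(log2fc, threshold=1.0):
    """First time-point index whose log2 fold change reaches `threshold`, else None."""
    for i, value in enumerate(log2fc):
        if value >= threshold:
            return i
    return None


def onset_key(profile):
    """Sort key that places profiles never reaching the threshold last."""
    onset = onset_index(profile)
    return float("inf") if onset is None else onset


# Hypothetical log2 fold changes over six time points (not measured data).
profiles = {
    "late_ASM_like": [0.0, 0.1, 0.2, 0.6, 1.3, 2.4],
    "early_ASM_like": [0.0, 0.4, 1.1, 1.8, 2.0, 2.1],
    "up_down_up_like": [0.2, 1.4, 0.3, 0.1, 1.2, 1.5],
    "flat": [0.1, 0.0, 0.2, 0.1, 0.0, 0.1],
}
order = sorted(profiles, key=lambda name: onset_key(profiles[name]))
print(order)
```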
4.2 Annotation in Ciona intestinalis
The draft genome sequence of the ascidian Ciona intestinalis
(Dehal, Satou et al. 2002) has been a valuable research resource.
However, the gene models contain numerous inconsistencies because
of intrinsic limitations of gene-prediction programs and the
fragmented nature of the assembly (Satou, Mineta et al. 2008).
The probe annotation in this study therefore combines resources from
several databases, including Aniseed (Tassy, Dauga et al.), Ensembl
Genome Browser (Kersey, Lawson et al. 2010), CIPRO (Endo, Ueno et al.),
STRING (Szklarczyk, Franceschini et al. 2011), UCSC Genome Browser
(Karolchik, Hinrichs et al. 2011), and internal files from Dr. Lionel
Christiaen’s lab. The 30,969 probes correspond to 16,250 non-redundant
genes, and this correspondence is used to map each probe to a gene.
Some differences between the gene annotation in this thesis and other
sources are unavoidable.
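When several probes map to the same gene, one common choice is to keep a single representative probe per gene. The thesis does not state which rule it used. The sketch below assumes the "highest mean expression" rule, which is similar in spirit to the default `MaxMean` method of WGCNA's `collapseRows`. The probes and values are hypothetical.

```python
from statistics import mean


def collapse_probes(expr, probe_to_gene):
    """Pick one representative probe per gene: the one with the highest mean.

    expr: dict probe_id -> list of expression values across samples
    probe_to_gene: dict probe_id -> gene_id; unmapped probes are skipped
    Returns dict gene_id -> probe_id.
    """
    best = {}  # gene_id -> (mean expression, probe_id)
    for probe, values in expr.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue
        m = mean(values)
        if gene not in best or m > best[gene][0]:
            best[gene] = (m, probe)
    return {gene: probe for gene, (_, probe) in best.items()}


# Hypothetical probes and expression values (log2 scale).
expr = {
    "p1": [6.0, 6.2, 5.9],
    "p2": [8.1, 8.0, 8.3],
    "p3": [7.0, 7.1, 7.2],
    "p4": [9.0, 9.1, 9.2],
}
probe_to_gene = {"p1": "gene_A", "p2": "gene_A", "p3": "gene_B"}  # p4 unmapped
print(collapse_probes(expr, probe_to_gene))
```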
4.3 Functional ribosome group and COE
The highly linked ribosome genes in the STRING network (Figure 15),
which are enriched in ribosome-related processes (Figure 16), naturally
raise the question of how this functional ribosome group relates to
COE. When this list of ribosome genes and COE is broadcast to MeV,
the heat map and expression plot show that the ribosome group and
COE have similar profiles across the time-series experiments
(Figure 17). This group of ribosome genes also has a fairly stable
expression profile. More housekeeping genes are likely to be found in
the same module as the ribosome group, but they are not the focus of
this thesis.
4.4 Time series
Although clustering algorithms such as hierarchical clustering (Eisen,
Spellman et al. 1998), K-means, and self-organizing maps (SOM)
(Tamayo, Slonim et al. 1999) can be used to analyze microarray data
and yield many biological insights, they are not designed for
time-series data. They assume that the data at each time point are
collected independently of each other, and they ignore the sequential
nature of time-series data (Ernst, Nau et al. 2005). This thesis
therefore applies the Short Time-series Expression Miner (STEM),
which is designed for the analysis of short time-series microarray gene
expression data (Ernst, Bar Joseph 2006), to the time-series
experiments in the hope of finding clues to the true biological
patterns. The STEM algorithm (Ernst, Nau et al. 2005) starts by
selecting a set of potential model expression profiles that covers the
space of possible expression profiles that can be generated by the
genes in the experiment; each profile represents a distinct temporal
expression pattern. Next, each gene is assigned to one of the profiles,
and a permutation test identifies the model profiles with significantly
more assigned genes than expected. Similar significant profiles are
then grouped into larger clusters by a greedy algorithm (Ernst, Nau et
al. 2005), and these clusters are colored in the top list of the user
interface.
Note that STEM is designed for short time series (defined on the
STEM website as 3 to 8 time points), whereas this microarray dataset
has 11 time points, so the STEM results should be interpreted with
some caution.
4.5 Limitations of the co-expression network
The co-expression network approach has several limitations. First,
network similarity is based on the Pearson correlation coefficient,
which is sensitive to outliers, so the quality of the input matrix is
important for the final result. A data transformation, or Spearman’s
rank correlation coefficient, might help.
A second limitation is that a co-expression network based on the
Pearson correlation coefficient is better suited to finding globally
co-expressed genes (Qian, Dolled Filhart et al. 2001). It cannot
accurately detect time-delayed or transient responses of downstream
effectors in time-series experiments. Local clustering (Qian, Dolled
Filhart et al. 2001) would be better for finding time-delayed or locally
co-expressed genes. Tools specialized for long time-series experiments,
such as the Graphical Query Language (GQL) (Costa, Schönhuth et al.
2005), would also be suitable.
A third limitation is that it is difficult to choose thresholds for a
biological network. The hard threshold of the unweighted network may
arbitrarily cut off some biologically meaningful edges. Similarly,
modules with weak connections may be cut off in the weighted network,
even though such weak linkages may be biologically meaningful.
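The contrast between the two thresholding schemes can be made concrete. The sketch below assumes the unsigned form of the WGCNA adjacency, |r|^beta; the network type used here is an assumption. The hard cut-off tau = 0.8 is a hypothetical value, not the one used in this thesis. The power beta = 6 is the one reported in Figure 5.

```python
def hard_adjacency(r, tau):
    """Unweighted network: an edge exists only if |r| reaches the cut-off tau."""
    return 1 if abs(r) >= tau else 0


def soft_adjacency(r, beta):
    """Unsigned weighted network (WGCNA-style): a = |r| ** beta."""
    return abs(r) ** beta


TAU = 0.8  # hypothetical hard cut-off, for illustration only
BETA = 6   # soft-thresholding power chosen in Figure 5

for r in (0.95, 0.79, 0.50):
    print(f"r = {r:.2f}: hard = {hard_adjacency(r, TAU)}, soft = {soft_adjacency(r, BETA):.3f}")
```

Under the hard cut-off, a correlation of 0.79 vanishes entirely. In the weighted network it keeps a weight of about 0.24, while 0.50 shrinks to about 0.016. Weak but real linkages can therefore be lost, or nearly lost, under either scheme.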
Figures and tables
Figure 1 Pipeline.
Figure 2 Normalized unscaled standard error (NUSE).
One of the tests in arrayQualityMetrics, NUSE, detected sample
LacZ3 as an outlier.
Figure 3 Heat-map of ASM and Heart candidate genes.
ASM candidate genes are red in the first and third columns. A1:
up-down-up-ASM. A2: early-ASM. A3: late-ASM. Heart candidate genes
are red in the second column. H1: early-Heart. H2: late-Heart.
Figure 4 Output of the Short Time-series Expression Miner.
Significant clusters are colored at the top row.
[Plot panels: "Scale independence" (Soft Threshold (power) vs. Scale Free Topology Model Fit, signed R^2) and "Mean connectivity" (Soft Threshold (power) vs. Mean Connectivity), for powers 1 to 20]
Figure 5 Selecting soft power.
A soft-thresholding power of beta = 6 was chosen for calculating the
adjacency matrix because it reaches a high scale-free topology model
fit (R^2) while retaining a relatively high mean connectivity.
Figure 6 Ciona intestinalis weighted co-expression network.
The dendrogram results from average-linkage hierarchical clustering.
The color band below the dendrogram denotes the modules, which
are defined as branches of the dendrogram. Of the 10,079 genes,
6,162 were clustered into 13 modules, and the remaining genes are
colored grey.
[Plot: bar chart "Dynamic-cutree Module Significance (COE-COEW modt), p = 3.1e-86" showing module significance for each dynamic module (black, blue, brown, green, greenyellow, grey, magenta, pink, purple, red, tan, turquoise, yellow), with a second panel showing gene counts per module]
Figure 7 Module significance.
Module significance is defined as the average absolute gene
significance (minus the log of a p-value) over all genes in a given
module.
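The definition in this caption translates directly into code. The sketch below assumes base-10 logarithms, since the caption does not state the base, and uses hypothetical p-values and module labels. Because -log p is already non-negative, taking the absolute value has no further effect here.

```python
import math
from collections import defaultdict


def module_significance(p_values, module_of):
    """Mean gene significance GS = -log10(p) over the genes of each module."""
    gs = defaultdict(list)
    for gene, p in p_values.items():
        gs[module_of[gene]].append(-math.log10(p))
    return {module: sum(v) / len(v) for module, v in gs.items()}


# Hypothetical p-values and module labels, for illustration only.
p_values = {"g1": 1e-3, "g2": 1e-2, "g3": 0.5, "g4": 0.2}
module_of = {"g1": "brown", "g2": "brown", "g3": "grey", "g4": "grey"}
print(module_significance(p_values, module_of))
```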
[Plot panels: Gene COE-COEW Significance (y-axis) versus Connectivity (x-axis), one panel per module: grey cor = -0.023, p = 0.14; pink cor = -0.066, p = 0.36; turquoise cor = -0.0093, p = 0.75; magenta cor = 0.11, p = 0.19; red cor = -0.09, p = 0.036; blue cor = 0.28, p = 2.3e-22; tan cor = 0.5, p = 1e-07; brown cor = 0.61, p = 5.9e-79; black cor = 0.24, p = 2.6e-06; greenyellow cor = -0.13, p = 0.14; yellow cor = -0.044, p = 0.27; green cor = -0.079, p = 0.054; purple cor = -0.094, p = 0.27]
Figure 8 Intramodular connectivity and module significance.
Intramodular connectivity measures how connected, or co-expressed,
a given node is with respect to the nodes of a particular module. It is
the connectivity in the subnetwork defined by the module.
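Following the caption's definition, the intramodular connectivity of a gene is the sum of its adjacencies, restricted to the other genes of its own module. A minimal sketch with a hypothetical adjacency matrix:

```python
def intramodular_connectivity(adj, genes, module_of):
    """k_in(i): sum of adjacencies a_ij to the other genes j in i's module.

    adj: symmetric adjacency matrix (list of lists) ordered like `genes`.
    """
    k_in = {}
    for i, gene in enumerate(genes):
        k_in[gene] = sum(
            adj[i][j]
            for j, other in enumerate(genes)
            if j != i and module_of[other] == module_of[gene]
        )
    return k_in


# Hypothetical 4-gene weighted adjacency matrix and module labels.
genes = ["a", "b", "c", "d"]
adj = [
    [1.0, 0.5, 0.2, 0.9],
    [0.5, 1.0, 0.4, 0.1],
    [0.2, 0.4, 1.0, 0.3],
    [0.9, 0.1, 0.3, 1.0],
]
module_of = {"a": "blue", "b": "blue", "c": "blue", "d": "grey"}
print(intramodular_connectivity(adj, genes, module_of))
```

Gene d is strongly linked to gene a (0.9), but that link does not count toward either gene's intramodular connectivity because the two genes lie in different modules.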
Figure 9 STRING protein network.
Edge colors represent different types of evidence. Neighborhood: green;
Gene Fusion: red; Co-occurrence: blue; Coexpression: black;
Experimental: magenta; Databases: cyan; Textmining: greenyellow;
Homology: light blue.
Figure 10 Labeling in weighted network.
Different labels in the network represent different data. Node
color: module color; node border color: significant clusters in STEM;
node shape: genes significant in the moderated F-test are diamonds,
and non-significant genes are circles; node label color: ASM
candidate genes are blue, Heart candidate genes are red.
Figure 11 The 1st module inferred by AllegroMCODE for the
unweighted co-expression network.
Figure 12 The 1st module of unweighted co-expression network
enrichment.
Figure 13 The 1st module inferred by AllegroMCODE for the weighted
co-expression network.
Figure 14 The 1st module of weighted network enrichment.
Figure 15 Ribosome group in STRING.
[Plot: "Differential GO-term Distribution" bar chart of % Sequences for the Test Set and Reference Set across GO terms, led by ribosome, structural constituent of ribosome, translation, and ribonucleoprotein complex]
Figure 16 Ribosome group in STRING network enrichment.
Figure 17 Ribosome group and COE.
Figure 18 Grey color genes.
Figure 19 Tan module
Figure 20 Brown module
Figure 21 Turquoise module enrichment.
Figure 22 Genes in turquoise plus STEM condition.
Figure 23 Genes of Turquoise plus STEM condition enrichment.
Figure 24 Sub-group of candidate genes in unweighted network.
[Plot: "Differential GO-term Distribution" bar chart of % Sequences for the Test Set and Reference Set across GO terms, led by cardiac muscle tissue development, heart process, heart contraction, and positive regulation of heart contraction]
Figure 25 Sub-group of candidate genes in unweighted network
enrichment.
Figure 26 ASM candidate genes in weighted network enrichment.
Figure 27 ASM and Heart candidate genes.
Part A illustrates the generation of ASM and Heart cells from the TVCs.
Part B summarizes the different temporal expression groups of ASM and
Heart candidate genes, with gene counts and known markers. Arrows
represent the trend of their temporal expression.
References
1. BARABASI, A. and BONABEAU, E., 2003. Scale-‐free networks. Scientific American, 288(5), pp. 60-‐69.
2. BARABASI, A. and OLTVAI, Z., 2004. Network biology: Understanding the cell's functional organization. Nature Reviews Genetics, 5(2), pp. 101-‐U15.
3. CHRISTIAEN, L., DAVIDSON, B., KAWASHIMA, T., POWELL, W., NOLLA, H., VRANIZAN, K. and LEVINE, M., 2008. The transcription/migration interface in heart precursors of Ciona intestinalis. Science, 320(5881), pp. 1349-‐1352.
4. CONESA, A., GTZ, S., GARCA-‐GMEZ, J., TEROL, J., TALN, M. and ROBLES, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Oxford: Oxford University Press.
5. COSTA, I., SCHNHUTH, A. and SCHLIEP, A., 2005. The Graphical Query Language: a tool for analysis of gene expression time-‐courses. Oxford: Oxford University Press.
6. DAVIDSON, B., 2007. Ciona intestinalis as a model for cardiac development. London, UK: Academic Press.
7. EISEN, M.B., SPELLMAN, P.T., BROWN, P.O. and BOTSTEIN, D., 1998. Cluster analysis and display of genome-‐wide expression patterns. Washington, D.C.: National Academy of Sciences.
8. ERNST, J. and BAR JOSEPH, Z., 2006. STEM: a tool for the analysis of short time series gene expression data. London: BioMed Central.
9. ERNST, J., NAU, G. and BAR JOSEPH, Z., 2005. Clustering short time series gene expression data. Oxford: Oxford University Press.
10. FREEMAN, T., GOLDOVSKY, L., BROSCH, M., VAN DONGEN, S., MAZIRE, P., GROCOCK, R., FREILICH, S., THORNTON, J. and ENRIGHT, A., 2007. Construction, visualisation, and clustering of transcription networks from microarray expression data. San Francisco, CA: Public Library of Science.
11. GENTLEMAN, R., 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-‐Verlag.
46
12. GREEN, Y. and VETTER, M., 2011. EBF proteins participate in transcriptional regulation of Xenopus muscle development. San Diego [etc.]: Academic Press.
13. HORVATH, S., 2011. Weighted Network Analysis : Applications in Genomics and Systems Biology. New York: Springer.
14. KAUFFMANN, A., GENTLEMAN, R. and HUBER, W., 2009. arrayQualityMetrics-‐-‐a bioconductor package for quality assessment of microarray data. Oxford: Oxford University Press.
15. LANGFELDER, P. and HORVATH, S., 2008. WGCNA: an R package for weighted correlation network analysis. Bmc Bioinformatics, 9, pp. 559.
16. LANGFELDER, P., ZHANG, B. and HORVATH, S., 2008. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Oxford: Oxford University Press.
17. QIAN, J., DOLLED FILHART, M., LIN, J., YU, H. and GERSTEIN, M., 2001. Beyond synexpression relationships: local clustering of time-‐shifted and inverted gene expression profiles identifies new, biologically relevant interactions. London,: Academic Press.
18. SHANNON, P., MARKIEL, A., OZIER, O., BALIGA, N., WANG, J., RAMAGE, D., AMIN, N., SCHWIKOWSKI, B. and IDEKER, T., 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.
19. SHANNON, P., REISS, D., BONNEAU, R. and BALIGA, N., 2006. The Gaggle: an open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics, 7, pp. 176.
20. SMYTH, G., 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. [Berkeley, CA]: Berkeley Electronic Press.
21. STOLFI, A., GAINOUS, T.B., YOUNG, J.J., MORI, A., LEVINE, M. and CHRISTIAEN, L., 2010. Early Chordate Origins of the Vertebrate Second Heart Field. Science, 329(5991), pp. 565-568.
22. SZKLARCZYK, D., FRANCESCHINI, A., KUHN, M., SIMONOVIC, M., ROTH, A., MINGUEZ, P., DOERKS, T., STARK, M., MULLER, J., BORK, P., JENSEN, L. and VON MERING, C., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. [London]: Information Retrieval Ltd.
23. TAMAYO, P., SLONIM, D., MESIROV, J., ZHU, Q., KITAREEWAN, S., DMITROVSKY, E., LANDER, E.S. and GOLUB, T.R., 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Washington, D.C.: National Academy of Sciences.
24. TUSHER, V., TIBSHIRANI, R. and CHU, G., 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98(9), pp. 5116-5121.
25. WETTENHALL, J. and SMYTH, G., 2004. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics, 20(18), pp. 3705-3706.
26. ZHANG, B. and HORVATH, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4, pp. 17.