Visualization of Ciona intestinalis Co-expression Network
by Hang Zhong
A dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science
Department of Biology
New York University
May 2012



ACKNOWLEDGEMENTS  

I would like to thank my advisor, Richard Bonneau, for giving me the opportunity to participate in this project and for his ongoing guidance and support. I am also indebted to Professor Lionel Christiaen for inspiring the project. This thesis could not have come to fruition without the help of Florian Razy, who offered insightful and thought-provoking input.

I am also everlastingly grateful to Duncan Penfold-Brown for teaching me programming. I would also like to thank Kieran Mace, Aviv Madar, Kevin Drew, Maximilian Haeussler and Claudia Racioppi, who so patiently offered their time and support. Many thanks to Todd Heiniger and Joel Rodriguez for revising the thesis.

Finally, I would like to thank my family for the invaluable support they have given me in the course of my life and studies.

 

 

 

 

 

 


ABSTRACT  

Abnormalities of heart development cause the most frequent congenital diseases in humans. The conservation of the Gene Regulatory Network (GRN) involved in heart development, together with cellular simplicity, low genetic redundancy and a relevant evolutionary position, has led researchers to study the ascidian Ciona intestinalis. To extract useful information from the microarray data so that researchers can infer the heart network in Ciona, this thesis not only applies standard approaches to find differentially expressed genes, but also explores network-based approaches to find functional groups. By visualizing the co-expression network in Gaggle, the lists of ASM and heart candidate genes are fine-tuned. In addition, the modules containing candidate and known marker genes may deserve further study.


 

TABLE OF CONTENTS

ABSTRACT
1. INTRODUCTION
1.1 GENE REGULATORY NETWORK OF CARDIOGENIC PRECURSORS IN CIONA
1.2 MICROARRAY DATA ANALYSIS
1.3 NETWORK VISUALIZATION THROUGH GAGGLE
2. METHODS
2.1 MICROARRAY EXPERIMENTAL DESIGN
2.2 GENE EXPRESSION DATA
2.2.1 QUALITY CONTROL
2.2.2 PREPROCESSING
2.3 STATISTICAL TEST
2.4 CLUSTER ANALYSIS
2.5 FUNCTIONAL ENRICHMENT ANALYSIS
2.6 GENERATION OF NETWORKS
2.6.1 STRING PROTEIN NETWORK
2.6.2 UNWEIGHTED CO-EXPRESSION NETWORK
2.6.3 WEIGHTED CO-EXPRESSION NETWORK
2.7 NETWORK VISUALIZATION
2.7.1 FILE FORMAT
2.7.2 ANALYZING NETWORK BY PLUGIN IN CYTOSCAPE
3. RESULTS
3.1 DIFFERENTIAL EXPRESSION
3.1.1 EXPECTATION OF THE MICROARRAY DATA
3.1.2 ASM AND HEART CANDIDATE GENES
3.2 NETWORK VISUALIZATION IN GAGGLE
3.2.1 NETWORKS
3.2.2 FINDINGS FROM THE NETWORK VISUALIZATION IN GAGGLE
3.2.2.1 GAGGLE AS INFORMATION INTEGRATION CENTER
3.2.2.2 MODULE FROM ALLEGROMCODE
3.2.2.3 MODULE FROM WEIGHTED NETWORK
3.2.2.4 FINE-TUNED LIST
4. DISCUSSION
4.1 ASM CANDIDATE GENES
4.2 ANNOTATION IN CIONA INTESTINALIS
4.3 FUNCTIONAL RIBOSOME GROUP AND COE
4.4 TIME-SERIES
4.5 LIMITATIONS OF THE CO-EXPRESSION NETWORK

FIGURES AND TABLES
FIGURE 1 PIPELINE.
FIGURE 2 NORMALIZED UNSCALED STANDARD ERROR (NUSE).
FIGURE 3 HEAT-MAP OF ASM AND HEART CANDIDATE GENES.
FIGURE 4 OUTPUT OF THE SHORT TIME-SERIES EXPRESSION MINER.
FIGURE 5 SELECTING SOFT POWER.
FIGURE 6 CIONA INTESTINALIS WEIGHTED CO-EXPRESSION NETWORK.
FIGURE 7 MODULE SIGNIFICANCE.
FIGURE 8 INTRAMODULAR CONNECTIVITY AND MODULE SIGNIFICANCE.
FIGURE 9 STRING PROTEIN NETWORK.
FIGURE 10 LABELING IN WEIGHTED NETWORK.
FIGURE 11 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR UNWEIGHTED CO-EXPRESSION NETWORK.
FIGURE 12 THE 1ST MODULE OF UNWEIGHTED CO-EXPRESSION NETWORK ENRICHMENT.
FIGURE 13 THE 1ST MODULE INFERRED BY ALLEGROMCODE FOR WEIGHTED CO-EXPRESSION NETWORK.
FIGURE 14 THE 1ST MODULE OF WEIGHTED NETWORK ENRICHMENT.
FIGURE 15 RIBOSOME GROUP IN THE STRING.
FIGURE 16 RIBOSOME GROUP IN STRING NETWORK ENRICHMENT.
FIGURE 17 RIBOSOME GROUP AND COE.
FIGURE 18 GREY COLOR GENES.
FIGURE 19 TAN MODULE.
FIGURE 20 BROWN MODULE.
FIGURE 21 TURQUOISE MODULE ENRICHMENT.
FIGURE 22 GENES IN TURQUOISE PLUS STEM CONDITION.
FIGURE 23 GENES OF TURQUOISE PLUS STEM CONDITION ENRICHMENT.
FIGURE 24 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK.
FIGURE 25 SUB-GROUP OF CANDIDATE GENES IN UNWEIGHTED NETWORK ENRICHMENT.
FIGURE 26 ASM CANDIDATE GENES IN WEIGHTED NETWORK ENRICHMENT.
FIGURE 27 ASM AND HEART CANDIDATE GENES.

REFERENCES

 

 

 

 


 

1. INTRODUCTION  

1.1  Gene  regulatory  network  of  cardiogenic  precursors  in  Ciona  

Abnormalities of heart development cause the most frequent congenital diseases in humans. The conservation of the Gene Regulatory Network (GRN) involved in heart development, together with cellular simplicity, low genetic redundancy and a relevant evolutionary position, has led researchers to study the ascidian Ciona intestinalis (Davidson 2007). In Ciona, a single pair of blastomeres called B7.5 gives rise to the anterior tail muscle (ATM) and to the trunk ventral cells (TVC) (Figure 27). Following migration from the tail, the TVCs undergo asymmetric cell divisions at the ventral midline of the trunk. The medial TVCs give rise to the heart, while the lateral TVCs migrate toward the atrial placode, where they form the atrial siphon muscles (ASM). Thus, the TVCs are similar to the multipotent cardio-pharyngeal progenitors found in vertebrates, while the ASM are likely equivalent to the jaw muscles in vertebrates.

A few years ago, the first cardiogenic Gene Regulatory Network (GRN) in Ciona was proposed (Christiaen, Davidson et al. 2008), decoupling genes necessary for heart specification from genes necessary for cell migration. A later study showed that ASM precursors express the transcription factor COE (Stolfi, Gainous et al. 2010), which is necessary and sufficient to specify ASM fate. Misexpression of COE in the whole TVC lineage blocks heart development and imposes an ASM fate on all cells. Conversely, misexpression of a constitutive repressor form of COE provokes the opposite phenotype, blocking ASM formation and causing all cells to form heart tissue. Genome-wide microarray analysis of this crucial COE gene and of its downstream effectors is expected to yield insights into the gene regulatory network of the heart.

1.2 Microarray  data  analysis  

Most existing studies have focused on differential expression to identify genes that distinguish different sets of samples. It is common to apply a test such as the t-test, the F-test, or a nonparametric alternative such as the Wilcoxon test to rank thousands of genes and then select the most significant ones (Gentleman 2005). Other statistical methods designed for microarray data are also commonly used, such as Significance Analysis of Microarrays (SAM) (Tusher, Tibshirani et al. 2001) and LIMMA (Wettenhall, Smyth 2004), which uses an empirical Bayes model.

Another way of using microarray data is to understand the network properties of an individual gene or protein by studying co-expression, where genes that have similar expression patterns across a set of samples are hypothesized to have a functional relationship. This co-expression network-based approach is consistent with an important concept that has emerged over the past decade: genes and their protein products carry out cellular processes in the context of functional modules and are related (Barabasi, Bonabeau 2003, Barabasi, Oltvai 2004).

1.3 Network  visualization  through  Gaggle  

It is well recognized that visualization plays a key role in helping to understand biological systems, particularly in the era of high-throughput studies with a wealth of ‘omics’-scale data (Gehlenborg, O'Donoghue et al. 2010). This thesis applies the simple, open-source Java software system Gaggle (Shannon, Reiss et al. 2006) for co-expression network visualization. Gaggle is a cross-platform system integrated with diverse databases (KEGG, BioCyc, and STRING) and software (Cytoscape, DataMatrixViewer, the R statistical environment, and the TIGR Microarray Expression Viewer). With four simple data types (names, matrices, networks, and associative arrays), researchers can explore many different data sources and a variety of software tools by sending this information to the Gaggle Boss, which transfers it to the other tools.

 

 

 


 

2. METHODS  

The pipeline of this thesis is shown in Figure 1.

2.1  Microarray  experimental  design  

The microarray data used in this study were kindly provided by Dr. Lionel Christiaen. They consist of 30,969 probe sets from Affymetrix GeneChips. The perturbation group includes the LacZ control and the over-expression and loss of function of the transcription factor Collier/EBF/Olf (COE) in sorted TVC cells at 21 hours post fertilization (hpf), after the asymmetric divisions of the TVCs but before completion of the ASM migration. The time-series group comprises 11 time points, one every 2 hours from 8 to 28 hours, in TVC cells.

2.2  Gene  expression  data  

2.2.1 Quality  control    

This thesis uses arrayQualityMetrics (Kauffmann, Gentleman et al. 2009), a Bioconductor package, for quality control. It produces an HTML report with several diagnostic plots. In general, an array is discarded if the report identifies it as an outlier both before and after normalization.

The microarray data are first imported into the statistical programming language R and then passed through quality control with arrayQualityMetrics. The sample LacZ.3 is removed because it was reported as an outlier both before and after normalization (Figure 2).

2.2.2 Preprocessing  

The CEL files of the microarrays are normalized by the RMA method (Gentleman 2005). The expression matrix contains 30,969 probes and 48 arrays. After non-specific filtering by variance (IQR = 0.5), the matrix contains 15,484 probes and 48 arrays.
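The non-specific filtering step can be sketched as follows. This is a minimal Python illustration rather than the R code used in the thesis. It assumes that "IQR = 0.5" means ranking probes by interquartile range and keeping the upper half, which is consistent with the drop from 30,969 to 15,484 probes; the function names and the toy data are hypothetical.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range (Q3 - Q1) of one probe's expression values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def filter_by_iqr(matrix, keep_fraction=0.5):
    """Keep the probes whose IQR ranks in the top `keep_fraction`.

    `matrix` maps probe id -> list of expression values (one per array).
    Assumption: the cutoff is a quantile of the IQR distribution.
    """
    ranked = sorted(matrix, key=lambda probe: iqr(matrix[probe]), reverse=True)
    n_keep = int(len(ranked) * keep_fraction)
    kept = set(ranked[:n_keep])
    return {probe: values for probe, values in matrix.items() if probe in kept}

# Hypothetical toy matrix: two variable probes and two nearly flat ones.
toy = {
    "probe_a": [5.0, 5.1, 5.0, 5.1],
    "probe_b": [2.0, 8.0, 3.0, 9.0],
    "probe_c": [6.0, 6.5, 6.2, 6.4],
    "probe_d": [1.0, 4.0, 2.0, 5.0],
}
filtered = filter_by_iqr(toy)  # keeps probe_b and probe_d
```

Filtering on a variance-like statistic that ignores the sample labels removes uninformative probes without biasing the later group comparison.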

Using the collapseRows function in WGCNA, the probe with maximum variance is selected to represent each gene. After merging the probes, the merged matrix contains 10,079 probes and 48 arrays.
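The idea behind this merging step can be illustrated with a short Python sketch. It is not the WGCNA implementation; it only mimics the rule described above (one probe per gene, chosen by maximum variance), and the probe-to-gene mapping and data are hypothetical.

```python
from statistics import pvariance

def collapse_by_max_variance(expression, probe_to_gene):
    """Pick, for each gene, the probe with the largest variance across arrays.

    `expression` maps probe id -> list of values; `probe_to_gene` maps
    probe id -> gene id. Returns gene id -> (chosen probe id, values).
    """
    best = {}
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # probes without a gene annotation are dropped
        var = pvariance(values)
        if gene not in best or var > best[gene][0]:
            best[gene] = (var, probe)
    return {gene: (probe, expression[probe]) for gene, (_, probe) in best.items()}

# Hypothetical example: two probes map to geneA, one to geneB.
expression = {"p1": [1.0, 1.0, 1.2], "p2": [1.0, 3.0, 5.0], "p3": [2.0, 2.5, 2.0]}
probe_to_gene = {"p1": "geneA", "p2": "geneA", "p3": "geneB"}
collapsed = collapse_by_max_variance(expression, probe_to_gene)
```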

2.3  Statistical  test  

The merged matrix is ranked by the moderated F-test (limma package) (Smyth 2004), and genes with a significant p-value (< 0.05) after adjustment by the Benjamini-Hochberg method are selected. After ranking, the top-ranked matrix contains 4,307 probes and 48 arrays.
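For readers unfamiliar with the Benjamini-Hochberg adjustment, the following Python sketch shows how adjusted p-values are obtained from raw ones. It is an illustration only; the thesis performed this step with limma in R, and the raw p-values below are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical raw p-values
adj = benjamini_hochberg(raw)
selected = [i for i, q in enumerate(adj) if q < 0.05]  # indices passing the 0.05 cutoff
```

In this toy input the third and fourth genes have raw p-values below 0.05 but are no longer selected after adjustment, which is the behavior the correction is meant to provide.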

The top-ranked matrix is imported into one of the Gaggle geese, the MultiExperiment Viewer (MeV), and subjected to the Significance Analysis of Microarrays (SAM) test (COE versus COEW group, p-value < 0.05, 1000 permutations, FDR = 0.9).

2.4  Cluster  analysis  


Hierarchical clustering is performed for the ASM and Heart candidate genes in MeV, using the Pearson correlation metric and average linkage clustering.
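The clustering settings named above can be sketched in plain Python. This is a naive illustration, not MeV's implementation: it uses 1 minus the Pearson correlation as the distance and merges the two clusters with the smallest average pairwise distance until a requested number of clusters remains. The gene names, profiles and the stopping rule are assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with distance 1 - r and average linkage."""
    clusters = [[name] for name in sorted(profiles)]

    def distance(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(1.0 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical profiles: g1/g2 rise together, g3/g4 fall together.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 5.0, 2.0],
}
groups = average_linkage(profiles, n_clusters=2)
```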

The time-series data, totaling 36 arrays, are averaged for each time point and imported into the Short Time-series Expression Miner (STEM), using the STEM clustering method.
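A minimal Python sketch of the averaging step is given below. It assumes that each array is labeled with its sampling time and that replicate counts may differ between time points; the function name and the values are hypothetical.

```python
from collections import defaultdict

def average_by_time_point(values, time_points):
    """Average one gene's replicate arrays within each time point.

    `values[i]` is the expression on array i and `time_points[i]` is the
    sampling time (hpf) of that array. Returns (time, mean) pairs sorted by time.
    """
    groups = defaultdict(list)
    for value, time in zip(values, time_points):
        groups[time].append(value)
    return [(time, sum(vals) / len(vals)) for time, vals in sorted(groups.items())]

# The sampling grid described in the experimental design: every 2 hours from 8 to 28 hpf.
time_grid = list(range(8, 29, 2))

# Hypothetical gene measured on two arrays at 8 hpf and three arrays at 10 hpf.
profile = average_by_time_point([1.0, 3.0, 2.0, 4.0, 6.0], [8, 8, 10, 10, 10])
```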

2.5  Functional  enrichment  analysis  

Blast2GO (B2G) (Conesa, Götz et al. 2005) is a comprehensive bioinformatics tool for annotation, visualization and analysis in functional genomics research. It offers a suitable platform for functional research in non-model species, such as Ciona intestinalis.

DNA sequences in FASTA format were loaded into Blast2GO. 15,629 genes remained in Blast2GO; BLAST searching and GO mapping then yielded GO terms for 3,964 genes. Each test group from the different lists is tested against the reference group (3,964 genes) using Fisher's Exact Test (p-value < 0.05, FDR correction).
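The enrichment test can be illustrated with a small Python function. This is a sketch under stated assumptions, not Blast2GO's code: it computes a one-sided (over-representation) Fisher's exact p-value from the hypergeometric distribution and treats the test list as a subset of the reference set. Whether the thesis used a one-sided or two-sided test is not stated, and the counts in the example are made up.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test for over-representation of a GO term.

    N: genes in the reference set, K: reference genes carrying the term,
    n: genes in the test list, k: test genes carrying the term.
    Returns P(X >= k) under the hypergeometric distribution.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return tail / total

# Hypothetical small example: 10 reference genes, 4 with the term,
# 3 genes in the test list, 2 of which carry the term.
p = fisher_enrichment_p(k=2, n=3, K=4, N=10)  # (36 + 4) / 120 = 1/3
```

Because one such test is run per GO term, the resulting p-values still need a multiple-testing correction such as the FDR correction mentioned above.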

2.6 Generation  of  networks  

2.6.1 String  protein  network  

Using the Ensembl gene names in the filt.gene matrix as input, the genes of interest in the Search Tool for the Retrieval of Interacting Genes (STRING) database (Szklarczyk, Franceschini et al. 2011) are extracted from the STRING website in Text Summary format and parsed to Cytoscape simple interaction format (SIF) (Shannon, Markiel et al. 2003) with the Python programming language.

2.6.2 Unweighted  co-­‐expression  network  

The Pearson correlation coefficient for all pairwise comparisons of genes is calculated from the filt.gene matrix in R. Highly correlated gene pairs are selected with a cutoff of 0.9 and parsed to simple interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.
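The construction of the unweighted network can be sketched end to end in Python. The thesis computed the correlations in R and used a separate Python parser, neither of which is reproduced here; this self-contained sketch only shows the logic. The relation label "coexp", the gene names and the profiles are hypothetical, and each SIF line is written as source node, relation, target node separated by tabs.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_sif(expression, cutoff=0.9, relation="coexp"):
    """Return SIF lines for every gene pair whose correlation exceeds `cutoff`."""
    lines = []
    for g1, g2 in combinations(sorted(expression), 2):
        if pearson(expression[g1], expression[g2]) > cutoff:
            lines.append(f"{g1}\t{relation}\t{g2}")
    return lines

# Hypothetical profiles: geneA and geneB rise together, geneC falls.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.5],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
sif_lines = coexpression_sif(expression)
```

Genes that have no partner above the cutoff produce no line, so they do not appear as nodes in the resulting network.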

2.6.3 Weighted  co-­‐expression  network  

2.6.3.1 Network  construction  

The procedure can be found on the WGCNA website (Horvath 2011).

2.6.3.2 Module  detection  

Pearson correlation coefficients are calculated for all pairwise comparisons of genes across all samples. The resulting Pearson correlation matrix is transformed into the weighted adjacency matrix with the soft-thresholding power β = 6 chosen during network construction. Average linkage hierarchical clustering is used to group genes on the basis of the topological overlap dissimilarity measure of their network connection strengths (Zhang, Horvath 2005). Using a dynamic tree-cutting algorithm (Langfelder, Zhang et al. 2008), 13 modules are found with a minimum cluster size of 70 (Figure 6). Genes that are not assigned to any module are colored grey.
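The two transformations named above can be written out in a few lines of Python. This is a sketch, not the WGCNA code: it assumes an unsigned network, in which the adjacency is the absolute correlation raised to the power β, and it uses the topological overlap formula of Zhang and Horvath (2005). The 3-gene correlation matrix is made up.

```python
def soft_threshold(correlation, beta=6):
    """Weighted adjacency a_ij = |r_ij| ** beta, with a zero diagonal."""
    n = len(correlation)
    return [[0.0 if i == j else abs(correlation[i][j]) ** beta
             for j in range(n)] for i in range(n)]

def topological_overlap(adjacency):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij is the sum over other genes u of a_iu * a_uj, and k_i is the
    connectivity (row sum) of gene i. The diagonal is set to 1.
    """
    n = len(adjacency)
    k = [sum(row) for row in adjacency]
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shared = sum(adjacency[i][u] * adjacency[u][j]
                         for u in range(n) if u != i and u != j)
            a = adjacency[i][j]
            tom[i][j] = (shared + a) / (min(k[i], k[j]) + 1.0 - a)
    return tom

# Hypothetical correlation matrix for three genes.
correlation = [[1.0, 0.9, 0.5],
               [0.9, 1.0, 0.5],
               [0.5, 0.5, 1.0]]
adjacency = soft_threshold(correlation, beta=6)
tom = topological_overlap(adjacency)
# The dissimilarity passed to hierarchical clustering is 1 - TOM.
dissimilarity = [[1.0 - value for value in row] for row in tom]
```

Raising correlations to a power keeps strong correlations (0.9 becomes about 0.53) while pushing weak ones toward zero (0.5 becomes about 0.016), which is why no hard cutoff is needed in the weighted network.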


2.6.3.3 Module  significance  

The p-value of the moderated t-test is taken from the topTable output of the affylmGUI package in R (Smyth 2004).

2.7 Network  visualization  

2.7.1 File  format    

The output files from WGCNA are parsed to simple interaction format (SIF) (Shannon, Markiel et al. 2003) with Python.

2.7.2 Analyzing  network  by  plugin  in  Cytoscape  

The AllegroMCODE and Network Analysis plugins in Cytoscape are used to analyze the network. AllegroMCODE finds clusters automatically.


 

3. RESULTS  

3.1 Differential  expression    

3.1.1 Expectation  of  the  Microarray  data  

Genes that are up-regulated when COE is overexpressed or down-regulated when COE function is lost are considered ASM candidate genes downstream of COE, while genes that are down-regulated when COE is overexpressed or up-regulated when COE function is lost are considered Heart candidate genes repressed by COE (Stolfi, Gainous et al. 2010).

Using the COE and COEW groups as two classes in the Significance Analysis of Microarrays (SAM), the contrast yields ASM and Heart candidate genes.

3.1.2 ASM  and  Heart  candidate  genes  

3.1.2.1   Lists  from  SAM  

336 significant genes are derived from SAM and separated into 206 ASM candidate genes (negative in SAM, expression of COE group lower than that of COEW group) and 130 Heart candidate genes (positive in SAM, expression of COE group higher than that of COEW group). These two groups can be distinguished by the first three columns in the heat-map (Figure 3, Figure 27).


Based on hierarchical clustering and inspection, the ASM candidate genes can be roughly divided into three large groups:

A1. The first group (up-down-up-ASM, 61 genes) shows a “U”-shaped curve through the time-series experiments, with the earliest up-regulation right at the first experimental time point of 8 hours. This group contains Snail (‘SNAIL’ in the thesis), SET and MYND Domain 1 (SMYD1) and Myoblast determination protein (MyoD, ‘MYOD’ in the thesis).

A2. The second group (early-ASM, 45 genes), which includes COE and the Myosin Regulatory Light Chain (MRLC5, ‘MYL5’ in the thesis) gene, shows early up-regulation around 14 hours.

A3. The third group (late-ASM, 100 genes) has relatively late up-regulation after 18 hours and includes the myosin heavy chain gene MHC3, tropomyosin 1 (TPM1, ‘CTM1’ in the thesis) and muscle-like actin 2 (MA2).

The Heart candidate genes can be divided into two large groups:

H1. The first group (early-Heart, 99 genes) shows early up-regulation (before 20 hours) and contains the heart markers BMP2/4, NK4, NOTRLC/HAND-LIKE, and ETS/POINTED2.

H2. The second group (late-Heart, 31 genes) displays relatively late up-regulation (after 20 hours) and includes mesenchyme specific gene 3 (MECH3).

As expected, the two gene lists contain some important markers and show noticeable temporal expression patterns. However, these ASM and Heart candidate genes did not show GO-term enrichment in Blast2GO, which might indicate the need to fine-tune the lists, although the small number of GO terms available in Blast2GO is another concern. To improve the ASM and Heart candidate gene lists further, it would be necessary to know the effect of the non-specific filtering, of selecting the probe for each gene by maximum variance, and of the SAM ranking.

3.1.2.2   Clusters  from  STEM  

A total of 7 significant model profiles appear in the STEM output. 23 of the 206 ASM candidate genes are in the significant profiles. Most of them are in profile 20, which is similar to the late-ASM group, including the MHC3, MA2 and MYL5 genes. Of the Heart candidate genes, 13 of 130 are in the significant profiles.

3.2 Network  Visualization  in  Gaggle  

3.2.1 Networks  

3.2.1.1 STRING  protein  network    

The STRING (Szklarczyk, Franceschini et al. 2011) protein network is created to make good use of existing data resources. It provides both experimental interaction information and interactions predicted by computational techniques, shown as different edge colors (Figure 9).

3.2.1.2 Co-­‐expression  network  

Network-based approaches, also termed graph-based approaches, aim to extract recurrent expression patterns or conserved modules from the rapidly accumulating microarray datasets. The microarray dataset is modeled as a relation graph in which each node represents one gene and two genes are connected by an edge based on an expression correlation parameter (Zhang, Horvath 2005) that measures the similarity between expression profiles (the Pearson correlation coefficient is used in this thesis). The graph, or network, can be represented by an adjacency matrix that encodes whether a pair of nodes is connected. For unweighted networks, entries are 1 or 0. For weighted networks, the adjacency matrix reports the connection strength of each gene pair, between 0 and 1 (Zhang, Horvath 2005). The concept of connectivity in graph theory, also termed degree, is the row sum of the adjacency matrix; it measures the number of direct neighbors of a node in an unweighted network and the sum of connection strengths in a weighted network.
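The definition of connectivity as a row sum can be made concrete with a small Python sketch. The matrices are hypothetical and the helper names are introduced here for illustration only.

```python
def hard_threshold(similarity, tau=0.9):
    """Unweighted adjacency: 1 if the similarity exceeds tau, else 0 (zero diagonal)."""
    n = len(similarity)
    return [[1 if i != j and similarity[i][j] > tau else 0 for j in range(n)]
            for i in range(n)]

def connectivity(adjacency):
    """Connectivity (degree) of each node: the row sums of the adjacency matrix."""
    return [sum(row) for row in adjacency]

# Hypothetical pairwise correlations between three genes.
similarity = [[1.0, 0.95, 0.92],
              [0.95, 1.0, 0.40],
              [0.92, 0.40, 1.0]]
unweighted = hard_threshold(similarity)
degrees = connectivity(unweighted)      # number of direct neighbors per gene

# Hypothetical weighted adjacency with connection strengths between 0 and 1.
weighted = [[0.0, 0.8, 0.3],
            [0.8, 0.0, 0.1],
            [0.3, 0.1, 0.0]]
strengths = connectivity(weighted)      # summed connection strength per gene
```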

Two  co-­‐expression  networks  are  generated  in  this  thesis.    


The unweighted co-expression network is formed by the gene pairs with a Pearson correlation coefficient higher than 0.9. A total of 766 nodes are in this unweighted network, with a clustering coefficient of 0.311 (output from the Network Analysis plugin in Cytoscape; the clustering coefficient measures the cohesiveness of the neighborhood of a node).

The 5000 gene pairs with the strongest weights are exported to build the weighted co-expression network (the weight cutoff is 0.23), giving a total of 814 nodes and a clustering coefficient of 0.728.
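The clustering coefficient reported for both networks can be illustrated with the Python sketch below. It is not the Cytoscape plugin's code. It assumes the common definition in which the network value is the average of the local coefficients of all nodes and nodes with fewer than two neighbors contribute 0; the edge list is hypothetical.

```python
from itertools import combinations

def clustering_coefficient(edges):
    """Average local clustering coefficient of an undirected graph.

    For a node with k >= 2 neighbors, the local coefficient is the number of
    edges among its neighbors divided by k * (k - 1) / 2. Nodes with fewer
    than two neighbors count as 0 (assumption).
    """
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    if not neighbors:
        return 0.0
    total = 0.0
    for nbrs in neighbors.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(neighbors)

# Hypothetical network: a triangle (a, b, c) with one extra gene d attached to c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
cc = clustering_coefficient(edges)  # (1 + 1 + 1/3 + 0) / 4
```

A higher value means that the neighbors of a gene tend to be connected to each other as well, which matches the denser appearance of the weighted network described here.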

The unweighted network has more isolated clusters consisting of only 2 nodes linked by 1 edge. The weighted network is denser, with some hubs (nodes of high connectivity), and its nodes are colored according to the modules detected by WGCNA.

Although these two networks differ in their adjacency matrices, both are based on the Pearson correlation coefficient and place highly similar genes close together in the graph. In other words, genes with the same expression profile across all experiments are close to each other in the network. These network-based approaches allow the position of a biological entity to be explored in the context of its local neighborhood in the graph and of the network as a whole, and they are less troubled by the inherent noise that confounds conventional pairwise approaches (Freeman, Goldovsky et al. 2007).


3.2.2 Findings  from  the  network  visualization  in  Gaggle    

3.2.2.1 Gaggle  as  information  integration  center  

In this post-genomic era, biologists often face the challenge of freely exploring experimental and computational data from many different sources and diverse software tools: storing different data for genes, retrieving data for a list of genes, and mapping one list of genes onto another. Once the network has been loaded in Cytoscape, Gaggle, as an information integration center, can help to solve these problems for microarray data.

Storing different data for genes can be achieved by labeling. As shown in Figures 9 and 10, the two networks present data from 6 different sources: node color for module, node label for ASM or Heart candidate genes, node shape for significance in the moderated F test, node size for connectivity, edge color for type of interaction, and distance between nodes for closeness. The network in Cytoscape therefore functions as a visual database.

Retrieving data for a list of genes, such as an expression matrix, is also feasible through the basic “broadcast” function in Gaggle. For example, a list of genes of interest in Cytoscape can be sent to the Gaggle Boss and then broadcast to the Data Matrix Viewer (DMV), which can output the expression matrix.


Mapping one list of genes onto another can be done conveniently through the many functions that Gaggle offers. In the MultiExperiment Viewer (MeV), a sub-list of genes can be launched in a new viewer. In Cytoscape, the function “Create new network from selected nodes” can be used for this task. Between different tools, the “broadcast” function serves as a bridge that transfers the list and maps it in the receiving tool.

3.2.2.2 Module  from  AllegroMCODE  

  The  main  goal  of  the  co-­‐expression  network  visualization  is  to  

find  the  highly  correlated  genes  (module)  related  to  the  ASM  or  Heart  

network,  specifically  aiming  to  infer  targets  of  the  transcription  factor  

COE.    

In the unweighted network, which has no predefined modules, the modules can be detected automatically by AllegroMCODE, a Cytoscape plugin that finds highly interconnected groups of nodes in a large, complex network. The 1st module detected by AllegroMCODE for the unweighted network is shown in Figure 11. This module is significantly enriched in biological process terms (Figure 12), such as biosynthetic process and cellular biosynthetic process.

For the weighted network, the 1st module detected by AllegroMCODE (Figure 13) consists largely of turquoise module genes (with only 1 grey gene). This module is significantly enriched in intracellular process (Figure 14).

The 1st modules of the unweighted and weighted networks both contain ribosome-related genes (gene names starting with “RP”). Because both networks are generated from the same Microarray data, an external reference is necessary to determine whether this ribosome group was found by chance. Comparing the 1st module of the weighted network with all turquoise module genes in the STRING network gives a common list of 23 genes, 16 of which are ribosome-related.

3.2.2.3 Module  from  weighted  network  

Weighted correlation network analysis (WGCNA) has advantages in identifying candidate targets because of its distinctive mathematical features (Langfelder, Horvath 2008). While highly correlated genes are grouped into different modules, genes that are far from every module are depicted in grey. Figure 18 shows that these grey genes in the weighted network often have fewer edges and are targeted at miRNA, which sets them reasonably apart from the other functional modules.

In Figure 7 and Figure 8, the tan and brown modules show strong module significance (the significance is defined as -log10 of the p-value in the moderated t test). Visualizing the top 50 genes by intramodular connectivity in each of these two modules shows that both are enriched in ASM and Heart candidate genes. Interestingly, the NK4 gene is in the tan module with other genes (Figure 19). The Islet (ISL) gene, which is not in the candidate list but has been reported to be an ASM gene, is in the brown module together with some known markers, such as MA2, MHC3, NOTRLC/HAND-LIKE, and ETS/POINTED2 (Figure 20). These results provide a useful starting point for forming hypotheses about the Heart network in Ciona.

Because the turquoise module is the largest module in the weighted network and is enriched in cellular process and other terms (Figure 21), it is natural to consider narrowing its gene list with additional conditions. The genes that satisfy both the turquoise module and the STEM condition show a clear temporal expression pattern and enrichment in muscle and heart related go-terms (Figure 22, Figure 23), while containing only four genes found in the list.

3.2.2.4 Fine-­‐tuned  list  

The network in Gaggle can serve as a visualization center as well as a fine-tuning filter for a list of genes, because the network is built on highly correlated pairs of genes with reduced noise. This by no means implies that genes absent from the network should be discarded, but it is good to see the expected go-term enrichment as confirmation of the list. Because go-term enrichment depends on the proportion of genes that share the same go-terms, the number of noisy genes in the whole list has a great impact on the enrichment. Importing the candidate list into the co-expression network reduces the noise and yields a better enrichment result.
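The effect of noisy genes on enrichment can be sketched with a one-sided hypergeometric test. This test is used here only as a stand-in, since the exact statistic behind the reported enrichments is not restated in this section, and all counts below (background size, annotated genes, list sizes) are hypothetical.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) when n genes are drawn from N genes, K of which carry the go-term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical background: 10000 annotated genes, 200 with a muscle-related term.
# The same 10 annotated genes sit in a focused list and in a noisy list.
clean = hypergeom_pvalue(k=10, n=50, K=200, N=10000)    # 50-gene list
noisy = hypergeom_pvalue(k=10, n=300, K=200, N=10000)   # diluted by 250 extra genes
```

With these made-up numbers the focused list gives a far smaller p-value than the diluted list, which is the behavior the paragraph above describes.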

Through the “broadcast” function in MeV, Cytoscape can receive the 336 significant genes, label them in yellow in the unweighted network, and then create a sub-network for the candidate genes. A subgroup of the candidate genes (Figure 24) is significantly enriched in muscle and heart related go-terms (Figure 25), an enrichment that Blast2GO previously could not report. The ASM candidate genes in the network are also enriched in muscle and heart go-terms (Figure 26), while for the Heart candidate genes in the network Blast2GO still reports no enrichment.

 


 

4. DISCUSSION  

4.1 ASM  candidate  genes  

COE is necessary and sufficient to specify ASM fate (Stolfi, Gainous et al. 2010). It is understandable that COE is expressed earlier than the late-ASM genes (A3 group), such as MHC3, TPM1, and MA2. The up-down-up-ASM genes (A1 group), which include MYOD, have the earliest up-regulation. In Xenopus, the cross-regulatory interactions of COE orthologs with genes of the Myogenic Regulatory Factor (MRF) family, such as MYOD and MYF5, are crucial for muscle commitment and differentiation (Green, Vetter 2011). However, how COE may repress the cardiac fate and promote cell migration in Xenopus has never been studied. A possible hypothesis is that in Ciona the early functions controlled by COE in ASM precursors are independent of MRF activation, since the MRF in the A1 group is up-regulated earlier than COE in the A2 group.

The A1 group genes are also more likely to be TVC genes, which can explain why heart related go-terms appear in the enrichment of the ASM genes in the weighted network (Figure 26).

4.2 Annotation  in  Ciona  intestinalis    

The draft genome sequence of the ascidian Ciona intestinalis (Dehal, Satou et al. 2002) has been a valuable research resource. However, there are numerous inconsistencies among the gene models because of the intrinsic limitations of gene prediction programs and the fragmented nature of the assembly (Satou, Mineta et al. 2008). The probe annotation in this study therefore focuses on combining available resources from various databases, such as Aniseed (Tassy, Dauga et al.), Ensembl Genome Browser (Kersey, Lawson et al. 2010), CIPRO (Endo, Ueno et al.), STRING (Szklarczyk, Franceschini et al. 2011), and UCSC Genome Browser (Karolchik, Hinrichs et al. 2011), as well as internal files from Dr. Lionel Christiaen’s lab. The 30,969 probes cover 16,250 non-redundant genes, which serve as the criterion for mapping a probe to a gene. Differences between the gene annotation in this thesis and other sources are unavoidable.

4.3 Functional  ribosome  group  and  COE    

The highly linked ribosome genes in the STRING network (Figure 19), which are enriched in ribosome process (Figure 20), naturally raise a question: what is the relationship between this functional ribosome group and COE? After this list of ribosome genes and COE is broadcast to MeV, the heat-map and expression plot show that the ribosome group and COE behave similarly across the time-series experiments. This group of ribosome genes also has quite a stable expression profile. More housekeeping genes are likely to be found in the same module as the ribosome group, but they are not the focus of this thesis.


4.4 Time-­‐series  

Clustering algorithms such as Hierarchical clustering (Eisen, Spellman et al. 1998), K-means, and Self-organizing Maps (SOM) (Tamayo, Slonim et al. 1999) can be used to analyze Microarray data and have yielded many biological insights. However, they are not designed for time-series data, since they assume that the data at each time point are collected independently of each other and they ignore the sequential nature of time-series data (Ernst, Nau et al. 2005). This thesis applies the Short Time-series Expression Miner (STEM), which is designed for the analysis of short time-series Microarray gene expression data (Ernst, Bar Joseph 2006), to the time-series experiments in the hope of finding clues about the true biological pattern. The STEM algorithm (Ernst, Nau et al. 2005) starts by selecting a set of potential expression profiles that covers the entire space of expression profiles the genes in the experiment can generate, with each profile representing a unique temporal expression pattern. Next, each gene is assigned to one of the profiles, and a permutation test identifies the significant model profiles, which a greedy algorithm groups into larger clusters (Ernst, Nau et al. 2005). These clusters are colored in the top row of the user interface.
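The assignment step described above can be sketched as follows. This is not the STEM implementation: it only shows the idea of assigning each gene to the model profile it correlates with best, and the three model profiles, the expression values, and the function name `assign_to_profiles` are hypothetical. The permutation test and the greedy grouping of profiles are left out.

```python
import numpy as np

def assign_to_profiles(expr, profiles):
    """For each gene, return the index of the model profile with the highest Pearson r."""
    n_genes = expr.shape[0]
    # corrcoef stacks genes and profiles; keep only the gene-by-profile block.
    corr = np.corrcoef(expr, profiles)[:n_genes, n_genes:]
    return corr.argmax(axis=1)

# Hypothetical model profiles over 4 time points.
profiles = np.array([
    [0, 1, 2, 3],      # steadily up
    [0, -1, -2, -3],   # steadily down
    [0, 1, 0, -1],     # up, then down
])
# Hypothetical expression changes relative to the first time point.
expr = np.array([
    [0.0, 0.9, 2.1, 2.8],
    [0.0, -1.2, -1.9, -3.3],
    [0.0, 1.1, 0.2, -0.8],
])
assignment = assign_to_profiles(expr, profiles)
```

Each toy gene lands on the profile whose shape it follows, which is the starting point for counting how many genes each profile attracts.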

It is worth mentioning that STEM is designed for short time-series (defined as 3 to 8 time points on its website), while this Microarray dataset has 11 time points.


4.5 Limitations  of  the  co-­‐expression  network    

Co-expression network approaches have several limitations. First, the network similarity is based on the Pearson Correlation Coefficient, which is sensitive to outliers. The quality of the input matrix is therefore important to the final result. It would be helpful to try a data transformation or to use Spearman’s rank correlation coefficient.
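The sensitivity to outliers can be seen in a small sketch with made-up data: nine points fall steadily and a single outlier is added. The helper names `pearson` and `spearman` are illustrative, and the rank function assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks (assumes no ties)."""
    rank = lambda v: np.argsort(np.argsort(v))
    return pearson(rank(x), rank(y))

x = np.arange(1, 11, dtype=float)
# y decreases with x except for one extreme value at the last point.
y = np.array([10, 9, 8, 7, 6, 5, 4, 3, 2, 100], dtype=float)

r_pearson = pearson(x, y)     # pulled positive by the single outlier
r_spearman = spearman(x, y)   # stays negative, reflecting the overall trend
```

The single extreme value is enough to flip the sign of the Pearson coefficient, while the rank-based coefficient still reports the decreasing trend of the other nine points.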

A second limitation is that a co-expression network based on the Pearson Correlation Coefficient is better suited to finding globally co-expressed genes (Qian, Dolled Filhart et al. 2001), and it cannot accurately detect the time-delayed or transient responses of downstream effectors in time-series experiments. It would be better to use local clustering (Qian, Dolled Filhart et al. 2001) to find time-delayed or locally co-expressed genes, or other tools specialized in long time-series experiments, such as the Graphical Query Language (GQL) (Costa, Schönhuth et al. 2005).

A third limitation is that it is difficult to pick thresholds for a biological network. The hard threshold for the unweighted network arbitrarily cuts off some biologically meaningful edges. Modules with weak weights are also cut off in the weighted network, although such weak linkages may be biologically meaningful.


Figures  and  tables  

 

Figure  1   Pipeline.    


 

Figure  2   Normalized  unscaled  standard  error  (NUSE).    

One  of  the  tests  in  the  arrayQualityMetrics,  NUSE,  detected  sample  

LacZ3  as  an  outlier.    

 

Figure  3   Heat-­‐map  of  ASM  and  Heart  candidate  genes.    

ASM  candidate  genes  are  red  in  the  first  and  third  column.  A1:  up-­‐

down-­‐up-­‐ASM.  A2:  early-­‐ASM.  A3:  late-­‐ASM.  Heart  candidate  genes  

are  red  in  the  second  column.  H1:  early-­‐Heart.  H2:  late-­‐Heart.    


 

Figure  4   Output  of  the  Short  Time-­‐series  Expression  Miner.    

Significant clusters are colored in the top row.

[Figure 5 plot, two panels. Left, "Scale independence": scale-free topology model fit (signed R^2, axis 0.3 to 0.9) against soft threshold power (1 to 20). Right, "Mean connectivity": mean connectivity (axis 0 to 1500) against soft threshold power (1 to 20).]

 

Figure  5   Selecting  soft  power.    

The soft threshold power (beta) of 6 is chosen for calculating the adjacency matrix because it reaches a high scale-free topology model fit (R^2) while keeping a high mean connectivity.
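How a soft threshold power turns correlations into a weighted adjacency matrix can be sketched as below. The sketch assumes the unsigned form, adjacency = |cor|^beta, which is one of the options in WGCNA; the expression values and the function name `soft_threshold_adjacency` are hypothetical, and only the power of 6 is taken from the caption.

```python
import numpy as np

def soft_threshold_adjacency(expr, beta=6):
    """Weighted adjacency a_ij = |cor(gene_i, gene_j)| ** beta (unsigned form assumed)."""
    corr = np.corrcoef(expr)          # rows = genes, columns = experiments
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)        # ignore self-connections
    return adj

# Hypothetical expression values: 3 genes x 5 experiments.
expr = np.array([
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.1, 5.9, 8.0, 10.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
])
adj = soft_threshold_adjacency(expr, beta=6)
connectivity = adj.sum(axis=1)        # per-gene connectivity; its mean is the mean connectivity
```

Raising the correlation to a power keeps strong correlations near 1 and pushes weak ones toward 0 without a hard cutoff, so a larger beta lowers the mean connectivity.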

 


 

Figure  6   Ciona  intestinalis  weighted  co-­‐expression  network.    

The dendrogram results from average linkage hierarchical clustering. The color band below the dendrogram denotes the modules, which are defined as branches of the dendrogram. Of the 10,079 genes, 6162 were clustered into 13 modules, and the remaining genes are colored in grey.

 


[Figure 7 plot, two panels. Upper, "Dynamic-cutree Module Significance (COE-COEW modt), p = 3.1e-86": coesig (axis 0.0 to 0.8) per dynamic module. Lower: gene counts (axis 0 to 4000) per module. Modules on the x axis: black, blue, brown, green, greenyellow, grey, magenta, pink, purple, red, tan, turquoise, yellow.]

 

Figure  7   Module  significance.  

Module   significance   is   determined   as   the   average   absolute   gene  

significance  (defined  by  minus  log  of  a  p-­‐value)  measure  for  all  genes  

in  a  given  module.  
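The definition in this caption can be written out as a short sketch. The p-values and module labels below are made up, and `module_significance` is an illustrative helper, not a function of the WGCNA package.

```python
import numpy as np

def module_significance(p_values, modules):
    """Mean absolute gene significance (-log10 p) for each module."""
    gs = -np.log10(np.asarray(p_values, dtype=float))   # gene significance
    modules = np.asarray(modules)
    return {str(m): float(np.abs(gs[modules == m]).mean())
            for m in np.unique(modules)}

# Hypothetical p-values for four genes in two modules.
p_values = [1e-4, 1e-2, 0.1, 1.0]
modules = ["tan", "tan", "grey", "grey"]
ms = module_significance(p_values, modules)
```

Here the toy "tan" module scores about 3.0 and the toy "grey" module about 0.5, so a module made of genes with small p-values stands out.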


[Figure 8 plot: one scatter panel per module, x axis "Connectivity", y axis "Gene COE-COEW Significance". Panel titles: grey cor=-0.023, p=0.14; pink cor=-0.066, p=0.36; turquoise cor=-0.0093, p=0.75; magenta cor=0.11, p=0.19; red cor=-0.09, p=0.036; blue cor=0.28, p=2.3e-22; tan cor=0.5, p=1e-07; brown cor=0.61, p=5.9e-79; black cor=0.24, p=2.6e-06; greenyellow cor=-0.13, p=0.14; yellow cor=-0.044, p=0.27; green cor=-0.079, p=0.054; purple cor=-0.094, p=0.27.]

 

Figure  8   Intramodular  connectivity  and  module  significance.  

Intramodular  connectivity  measures  how  connected,  or  co-­‐expressed,  

a  given  node  is  with  respect  to  the  nodes  of  a  particular  module.  It  is  

the  connectivity  in  the  subnetwork  defined  by  the  module.    
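Intramodular connectivity as described in this caption can be sketched as a row sum of the adjacency matrix restricted to genes of the same module. The adjacency values and module labels are hypothetical, and `intramodular_connectivity` is an illustrative helper rather than the WGCNA function of the same purpose.

```python
import numpy as np

def intramodular_connectivity(adj, modules):
    """Sum of each gene's adjacencies to the other genes in its own module."""
    adj = np.asarray(adj, dtype=float)
    modules = np.asarray(modules)
    same = modules[:, None] == modules[None, :]   # True where two genes share a module
    np.fill_diagonal(same, False)                 # a gene is not its own neighbor
    return (adj * same).sum(axis=1)

# Hypothetical weighted adjacency for 4 genes in 2 modules.
adj = [
    [0.0, 0.8, 0.1, 0.2],
    [0.8, 0.0, 0.3, 0.1],
    [0.1, 0.3, 0.0, 0.5],
    [0.2, 0.1, 0.5, 0.0],
]
modules = ["brown", "brown", "tan", "tan"]
k_in = intramodular_connectivity(adj, modules)
```

Plotting such values against gene significance, module by module, gives the kind of scatter panels summarized in this figure.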

 

 

 


 

 

Figure  9   STRING  protein  network.      

The edge colors represent different types of evidence. Neighborhood: green; Gene Fusion: red; Cooccurrence: blue; Coexpression: black; Experimental: magenta; Databases: cyan; Textmining: greenyellow; Homology: light-blue.

 

Figure 10   Labeling in weighted network.

Each visual attribute of the network encodes a different data source. Node color: module color; node border color: significant clusters in STEM; node shape: genes significant in the moderated F test are diamonds, while non-significant genes are round; node label color: ASM candidate genes are blue, Heart candidate genes are red.

 

Figure 11   The first module inferred by AllegroMCODE for the unweighted co-expression network.

 

 


Figure 12   Enrichment of the first module of the unweighted co-expression network.

 

Figure 13   The first module inferred by AllegroMCODE for the weighted co-expression network.

 

Figure 14   Enrichment of the first module of the weighted network.


 

Figure 15   Ribosome group in the STRING network.

 

[Bar chart: Differential GO-term Distribution, Test Set versus Reference Set. x-axis: % Sequences (0 to 95); y-axis: GO Terms. Terms shown: ribosome; structural constituent of ribosome; translation; ribonucleoprotein complex; structural molecule activity; cytosolic ribosome; cytosolic part; small ribosomal subunit; translational elongation; cellular protein metabolic process; gene expression; cellular macromolecule biosynthetic process; macromolecule biosynthetic process; cytosolic small ribosomal subunit; endocrine pancreas development; translational termination; protein metabolic process; non-membrane-bounded organelle; intracellular non-membrane-bounded organelle; macromolecular complex; cellular protein complex disassembly; cellular macromolecular complex disassembly; protein complex disassembly; endocrine system development; macromolecular complex disassembly; cellular biosynthetic process; pancreas development; viral genome expression; viral transcription; viral infectious cycle; cellular component disassembly; viral reproductive process; biosynthetic process; cytosol; reproductive cellular process; cellular macromolecule metabolic process; viral reproduction; cytoplasmic part; macromolecule metabolic process; large ribosomal subunit; reproduction; ribosome biogenesis; macromolecular complex subunit organization; cellular macromolecular complex subunit organization; reproductive process; rRNA metabolic process; rRNA processing; cytosolic large ribosomal subunit; cytoplasm; cellular metabolic process; rRNA binding; ribonucleoprotein complex biogenesis; primary metabolic process; RNA binding; ribosomal small subunit biogenesis; ncRNA processing; developmental process; intracellular organelle; organelle; multicellular organismal development; ncRNA metabolic process; system development; organ development; erythrocyte homeostasis; intracellular; metabolic process; cellular component biogenesis.]

 

Figure 16   Enrichment of the ribosome group in the STRING network.


 

Figure  17   Ribosome  group  and  COE.    

 

 

Figure  18   Grey  color  genes.    


 

Figure  19   Tan  module    

 

 

Figure  20   Brown  module    


 

Figure  21   Turquoise  module  enrichment.    

 

Figure  22   Genes  in  turquoise  plus  STEM  condition.    


 

Figure  23   Genes  of  Turquoise  plus  STEM  condition  enrichment.    

 

Figure 24   Sub-group of candidate genes in unweighted network.


[Bar chart: Differential GO-term Distribution, Test Set versus Reference Set. x-axis: % Sequences (0.0 to 42.5); y-axis: GO Terms. Terms shown: cardiac muscle tissue development; heart process; heart contraction; positive regulation of heart contraction; striated muscle tissue development; myofibril assembly; actomyosin structure organization; muscle contraction; muscle tissue development; cardiac cell differentiation; muscle system process; actin filament-based movement; circulatory system process; blood circulation; regulation of heart contraction; heart development; muscle structure development; cellular component assembly involved ...; striated muscle cell development; sarcomere; contractile fiber part; muscle cell development; heart morphogenesis; myofibril; contractile fiber; striated muscle cell differentiation; system process; anatomical structure formation involved ...; striated muscle thin filament; sarcomere organization; cellular component morphogenesis; cardiac myofibril assembly; stress fiber; actin cytoskeleton organization; muscle cell differentiation; muscle organ development; positive regulation of multicellular organismal process; actin cytoskeleton; positive regulation of cell adhesion; cardiac cell development; cardiac muscle cell development; actin filament-based process.]

 

Figure 25   Enrichment of the sub-group of candidate genes in the unweighted network.

 

 

Figure  26   ASM  candidate  genes  in  weighted  network  enrichment.    

 


 

Figure 27   ASM and Heart candidate genes.

Part A illustrates the generation of ASM and Heart cells from TVC. Part B summarizes the temporal expression groups of ASM and Heart candidate genes, with gene counts and known markers. Arrows represent the trend of their temporal expression.

 

 

 

 

 

 

 


References  

1. BARABASI, A. and BONABEAU, E., 2003. Scale-free networks. Scientific American, 288(5), pp. 60-69.

2. BARABASI, A. and OLTVAI, Z., 2004. Network biology: Understanding the cell's functional organization. Nature Reviews Genetics, 5(2), pp. 101-113.

3. CHRISTIAEN, L., DAVIDSON, B., KAWASHIMA, T., POWELL, W., NOLLA, H., VRANIZAN, K. and LEVINE, M., 2008. The transcription/migration interface in heart precursors of Ciona intestinalis. Science, 320(5881), pp. 1349-1352.

4. CONESA, A., GÖTZ, S., GARCÍA-GÓMEZ, J., TEROL, J., TALÓN, M. and ROBLES, M., 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Oxford: Oxford University Press.

5. COSTA, I., SCHÖNHUTH, A. and SCHLIEP, A., 2005. The Graphical Query Language: a tool for analysis of gene expression time-courses. Oxford: Oxford University Press.

6. DAVIDSON, B., 2007. Ciona intestinalis as a model for cardiac development. London, UK: Academic Press.

7. EISEN, M.B., SPELLMAN, P.T., BROWN, P.O. and BOTSTEIN, D., 1998. Cluster analysis and display of genome-wide expression patterns. Washington, D.C.: National Academy of Sciences.

8. ERNST, J. and BAR-JOSEPH, Z., 2006. STEM: a tool for the analysis of short time series gene expression data. London: BioMed Central.

9. ERNST, J., NAU, G. and BAR-JOSEPH, Z., 2005. Clustering short time series gene expression data. Oxford: Oxford University Press.

10. FREEMAN, T., GOLDOVSKY, L., BROSCH, M., VAN DONGEN, S., MAZIÈRE, P., GROCOCK, R., FREILICH, S., THORNTON, J. and ENRIGHT, A., 2007. Construction, visualisation, and clustering of transcription networks from microarray expression data. San Francisco, CA: Public Library of Science.

11. GENTLEMAN, R., 2005. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag.

12. GREEN, Y. and VETTER, M., 2011. EBF proteins participate in transcriptional regulation of Xenopus muscle development. San Diego [etc.]: Academic Press.

13. HORVATH, S., 2011. Weighted Network Analysis: Applications in Genomics and Systems Biology. New York: Springer.

14. KAUFFMANN, A., GENTLEMAN, R. and HUBER, W., 2009. arrayQualityMetrics--a bioconductor package for quality assessment of microarray data. Oxford: Oxford University Press.

15. LANGFELDER, P. and HORVATH, S., 2008. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, pp. 559.

16. LANGFELDER, P., ZHANG, B. and HORVATH, S., 2008. Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Oxford: Oxford University Press.

17. QIAN, J., DOLLED-FILHART, M., LIN, J., YU, H. and GERSTEIN, M., 2001. Beyond synexpression relationships: local clustering of time-shifted and inverted gene expression profiles identifies new, biologically relevant interactions. London: Academic Press.

18. SHANNON, P., MARKIEL, A., OZIER, O., BALIGA, N., WANG, J., RAMAGE, D., AMIN, N., SCHWIKOWSKI, B. and IDEKER, T., 2003. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Cold Spring Harbor, N.Y.: Cold Spring Harbor Laboratory Press.

19. SHANNON, P., REISS, D., BONNEAU, R. and BALIGA, N., 2006. The Gaggle: An open-source software system for integrating bioinformatics software and data sources. BMC Bioinformatics, 7, pp. 176.

20. SMYTH, G., 2004. Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. [Berkeley, CA]: Berkeley Electronic Press.

21. STOLFI, A., GAINOUS, T.B., YOUNG, J.J., MORI, A., LEVINE, M. and CHRISTIAEN, L., 2010. Early Chordate Origins of the Vertebrate Second Heart Field. Science, 329(5991), pp. 565-568.

22. SZKLARCZYK, D., FRANCESCHINI, A., KUHN, M., SIMONOVIC, M., ROTH, A., MINGUEZ, P., DOERKS, T., STARK, M., MULLER, J., BORK, P., JENSEN, L. and VON MERING, C., 2011. The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. [London]: Information Retrieval Ltd.

23. TAMAYO, P., SLONIM, D., MESIROV, J., ZHU, Q., KITAREEWAN, S., DMITROVSKY, E., LANDER, E.S. and GOLUB, T.R., 1999. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Washington, D.C.: National Academy of Sciences.

24. TUSHER, V., TIBSHIRANI, R. and CHU, G., 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proceedings of the National Academy of Sciences of the United States of America, 98(9), pp. 5116-5121.

25. WETTENHALL, J. and SMYTH, G., 2004. limmaGUI: A graphical user interface for linear modeling of microarray data. Bioinformatics, 20(18), pp. 3705-3706.

26. ZHANG, B. and HORVATH, S., 2005. A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4, pp. 17.