Data Analytics for Personalized Medicine. Aryya Gangopadhyay, UMBC. Presented at the 3rd International Conference on Personalized Medicine, June 26-29, 2014

Data Analytics for Personalized Medicine by Aryya Gangopadhyay, PhD

DESCRIPTION

Presented at the 3rd International Conference on Personalized Medicine, June 26-29, 2014. Dr. Gangopadhyay is Chair of the Department of Information Systems at the University of Maryland, Baltimore County.

Page 1: Data Analytics for Personalized Medicine by Aryya Gangopadhyay, PhD

Data Analytics for Personalized Medicine

Aryya Gangopadhyay, UMBC

Presented at the 3rd International Conference on Personalized Medicine, June 26-29, 2014

Page 2

Scope

•  Big data promise (Pentland et al. 2013)
   –  US healthcare industry can save $200 billion per year
•  Need complete picture
   –  Reality mining (MIT Tech. Review 2008)
   –  Socio-demographics
   –  EMRs
   –  Biological data
•  Interactions in the network
   –  Topology-based analysis
   –  Centrality-based analysis
   –  Perturbations (diseases as network perturbations: del Sol et al. 2010)
•  Network partitioning
•  Visualization

Page 3

Big data in healthcare

•  “Within 10 years every healthcare consumer will be surrounded by a virtual cloud of billions of data points” [Hood et al. 2013]

Page 4

Interconnections

–  Biological processes are interconnected systems
–  Analyze interactions
–  Resilient against random perturbations
–  Vulnerable to targeted attacks

CIDeR: Large, multi-dimensional, multimodal, dynamic
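The contrast between random perturbations and targeted attacks can be illustrated on a toy network. The sketch below is only an illustration, not the analysis behind the slides: the hub-and-spoke graph, the function name `largest_component`, and the choice of largest-connected-component size as the robustness measure are all assumptions.

```python
from collections import deque

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in adj:
        if start in removed or start in seen:
            continue
        # Breadth-first search over the surviving nodes only.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in removed and u not in seen:
                    seen.add(u)
                    queue.append(u)
        best = max(best, size)
    return best

# Hypothetical toy network: one hub connected to ten leaves.
leaves = [f"leaf{i}" for i in range(10)]
toy = {"hub": set(leaves)}
toy.update({leaf: {"hub"} for leaf in leaves})

# Ten of the eleven nodes are leaves, so a random failure usually hits a leaf.
after_random = largest_component(toy, ["leaf3"])
# A targeted attack removes the most connected node instead.
after_targeted = largest_component(toy, ["hub"])
```

Losing a leaf leaves the rest of the toy network connected, while losing the hub isolates every remaining node.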

Page 5

Extensions to our previous work

–  Updated the network
   •  Nodes: 5168 to 9767
   •  Edges: 14410 to 27744
–  Previous analysis
   •  Network characteristics: CC, diameter, path lengths, etc.
   •  Node-based analysis
      –  Developed a new method for identifying effectors and receptors
   •  Perturbation analysis
–  Extensions
   •  How do we partition the network?
   •  What criteria to use and why?
   •  What are the effects of such partitioning?

Page 6

Network extracted from CIDeR: 2014

•  Nodes: 9767
•  Edges: 27744
•  Diameter: 15
•  # CC: 89
•  Avg. PL: 4.7
•  Avg. degree: 2.8
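Summary figures of this kind can be computed with breadth-first search. The sketch below runs on a hypothetical five-node undirected graph. The helper names, the restriction of diameter and average path length to reachable pairs, and the reading of "# CC" as the number of connected components are assumptions rather than details from the slides. The reported average degree of 2.8 equals edges divided by nodes (27744 / 9767 ≈ 2.84), which suggests degree was averaged over directed edges, so it differs from the undirected 2m/n used for the toy graph.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def summarize(adj):
    """Basic statistics of an undirected graph stored as {node: set(neighbors)}."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2  # each edge is stored twice
    seen, components = set(), 0
    diameter, total, pairs = 0, 0, 0
    for v in adj:
        dist = bfs_distances(adj, v)
        if v not in seen:  # first node visited in a new component
            components += 1
            seen.update(dist)
        for u, d in dist.items():
            if u != v:  # only reachable, distinct pairs contribute
                diameter = max(diameter, d)
                total += d
                pairs += 1
    return {
        "nodes": n,
        "edges": m,
        "components": components,
        "diameter": diameter,
        "avg_path_length": total / pairs if pairs else 0.0,
        "avg_degree": 2 * m / n,
    }

# Hypothetical toy graph: a path a-b-c plus a separate edge d-e.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": {"e"}, "e": {"d"}}
stats = summarize(toy)

# Edges per node for the figures on the slide.
reported_ratio = 27744 / 9767
```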

Page 7

Node centrality measures: correlations

x = Authority, y = Betweenness Centrality, correlation: 0.8
x = Clustering Coefficient, y = Betweenness Centrality, correlation: -0.02
x = Hub, y = Authority, correlation: 0.88
x = PageRank, y = Authority, correlation: 0.92
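The slides do not say which correlation coefficient was used. Assuming Pearson's r over per-node scores, a correlation between two centrality measures could be computed as in the sketch below; the function name and the sample score lists are made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical per-node scores for two centrality measures.
hub_scores = [0.1, 0.2, 0.3, 0.4]
authority_scores = [0.2, 0.4, 0.6, 0.8]   # rises with the hub score
eccentricity = [8.0, 6.0, 4.0, 2.0]       # falls as the hub score rises

r_same = pearson(hub_scores, authority_scores)
r_opposite = pearson(hub_scores, eccentricity)
```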

Page 8

Correlations of node centrality measures

[Figure: correlation plot with axes Clustering Coefficient, Hub, Authority, PageRank, Eigenvector Centrality, Betweenness Centrality, and Eccentricity]

Page 9

Overall network characteristics

•  PageRank, hub and authority scores are strongly correlated
•  Clustering coefficient is negatively correlated with other node centrality measures
•  Implications:
   1.  Nodes that are strong effectors are also strong receptors
   2.  Less central nodes are not connected to each other but mainly with an influential node
   3.  Influential nodes are mostly connected to each other
   4.  Fully connected sub-graphs are small and rare
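Hub and authority scores are usually obtained with the HITS power iteration. The sketch below is a minimal version under stated assumptions: the edge list is hypothetical, and reading hubs as "effectors" and authorities as "receptors" is an interpretation that the slides do not spell out.

```python
from math import sqrt

def hits(edges, iterations=50):
    """Return (hub, authority) scores for a directed edge list via power iteration."""
    nodes = {v for edge in edges for v in edge}
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing in.
        auth = {v: 0.0 for v in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        # Hub: sum of the authority scores of the nodes pointed to.
        hub = {v: 0.0 for v in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]
        # Normalize both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = sqrt(sum(s * s for s in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth

# Hypothetical directed toy graph: a and b act on c, and a also acts on d.
toy_edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(toy_edges)
top_hub = max(hub, key=hub.get)
top_authority = max(auth, key=auth.get)
```

In this toy graph the two roles fall on different nodes. The hub-authority correlation of 0.88 on Page 7 indicates that in the CIDeR network they tend to coincide.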

Page 10

Partitioning the graph

•  How can we capture the above characteristics?
•  Modularity:

$$Q = \frac{1}{2m} \sum_{l=1}^{k} \sum_{i \in C_l,\, j \in C_l} \left( A_{ij} - \frac{d_i d_j}{2m} \right)$$

•  The objective is to maximize Q
•  Intuition:
   –  Put influential nodes in separate clusters
   –  Create dense sub-communities (common neighbors)
•  Algorithms (finding the optimal solution is NP-hard; Brandes 2007):
   –  Spectral clustering based (Newman 2006)
   –  Greedy algorithm (Blondel et al. 2008)
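In the modularity formula, the usual reading is that A is the adjacency matrix, d_i is the degree of node i, m is the number of edges, and C_1, ..., C_k are the clusters. The slide does not define these symbols, so that reading is an assumption. The sketch below evaluates Q directly from the double sum for a hypothetical undirected toy graph. It does not implement the spectral or greedy optimizers named on the slide.

```python
def modularity(adj, communities):
    """Modularity Q of a partition of an undirected graph {node: set(neighbors)}."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # sum of degrees equals 2m
    q = 0.0
    for community in communities:
        # Double sum over ordered pairs (i, j) inside the same cluster.
        for i in community:
            for j in community:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return q / two_m

# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {v: set() for v in range(6)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])   # one cluster per triangle
q_single = modularity(adj, [{0, 1, 2, 3, 4, 5}])    # everything in one cluster
```

Splitting the toy graph along its two triangles gives Q = 5/14, while placing every node in a single cluster gives Q = 0, so the first partition is preferred when maximizing Q.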

Page 11

Clusters formed by maximizing modularity

Page 12

Dendrogram of top 8 Disease Clusters

Page 13

Cluster 100

Nodes: 1177, Edges: 2122

Page 14

Cluster 82

Nodes: 1200, Edges: 2554

Page 15

K-core

•  Objective: Restrict analysis to regions of increased centrality and connectedness
•  K-core: largest sub-graph where all nodes have a minimum degree of k (Batagelj 2002)
•  K = 5 (mode = 2 for the entire network)
•  Protein Interaction Networks (Wuchty et al. 2005, Hamelin et al. 2008)

[Figure taken from Hamelin et al. 2008]
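A k-core can be extracted by repeatedly peeling away nodes whose remaining degree is below k. The sketch below applies that standard peeling procedure to a hypothetical undirected toy graph. The function name and graph are illustrative, and the slides do not say which tool produced the 5-core.

```python
def k_core(adj, k):
    """Nodes of the k-core of an undirected graph {node: set(neighbors)}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    # Start with every node that already violates the degree requirement.
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled through an earlier entry
        alive.remove(v)
        # Removing v lowers the degree of each surviving neighbor.
        for u in adj[v]:
            if u in alive:
                degree[u] -= 1
                if degree[u] < k:
                    stack.append(u)
    return alive

# Hypothetical toy graph: a 4-clique {a, b, c, d} with a tail d-e-f.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c", "e"},
    "e": {"d", "f"},
    "f": {"e"},
}
core3 = k_core(toy, 3)
core4 = k_core(toy, 4)
```

With k = 3 the tail is peeled away and only the clique remains. With k = 4 no node qualifies, so the core is empty.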

Page 16

5-core graph: color code: Type

Page 17

5-core graph: color code: Modularity class

Page 18

Disease Clusters (top 5) dendrogram

Page 19

5-core graph: Cluster 5 (26%)

Page 20

5-core graph: Cluster 6 (22%)

Page 21

5-core graph: Cluster 0 (16%)

Page 22

5-core graph: Cluster 3 (13%)

Page 23

5-core graph: Cluster 4 (12.5%)

Page 24

Comparison of clusters

Page 25

Conclusion

•  Contributing areas
   •  Biology, bioinformatics, sociology, SNA, physics, applied mathematics, computer and information sciences
•  Summary
   •  Holistic analysis of health data
   •  Analysis based on node centrality
   •  Network partitioning
   •  Studying the effect of perturbation
•  Where do we go from here
   •  Create a taxonomic structure of elements and interactions
   •  Search tool
   •  Biological and clinical implications

Page 26