Protein scoring based on significance in biological networks

Andrej Bugrim, GeneGo, Inc.

Copyright GeneGo 2000-2007


Page 1

Page 2

Two problems of systems biology

• How to reconstruct condition-specific networks in a biologically robust way

• How to utilize reconstructed networks in day-to-day laboratory practice

We still need to answer questions centered on individual genes/proteins:

– Which genes are most important for a condition/disease?

– What are the best drug targets?

– What are the most robust biomarkers?

Page 3

Sources of the problems

• Biological networks are highly interconnected due to the presence of hubs. Hubs almost always provide “shortest path” connectivity

• Multiple paths can be generated to connect a pair of nodes, with no way to discriminate between alternative hypotheses

• The resulting networks are often large and biologically intractable, so it is hard to understand the roles of individual nodes
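To make the hub problem concrete, here is a minimal sketch on an invented toy network (the graph, node names and the `hub_share` helper are hypothetical, not from the presentation). It enumerates every shortest path between pairs of peripheral nodes with breadth-first search and reports the fraction of pairs whose shortest paths all pass through the hub.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy network (undirected adjacency lists): "HUB" touches every
# peripheral node, while the periphery only has a short chain a-b-c.
GRAPH = {
    "HUB": ["a", "b", "c", "d", "e", "f"],
    "a": ["HUB", "b"],
    "b": ["HUB", "a", "c"],
    "c": ["HUB", "b"],
    "d": ["HUB"],
    "e": ["HUB"],
    "f": ["HUB"],
}

def all_shortest_paths(graph, src, dst):
    """Return every shortest path from src to dst (BFS with predecessor lists)."""
    dist = {src: 0}
    preds = {src: []}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                preds[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                preds[v].append(u)  # another equally short way into v
    if dst not in dist:
        return []

    def build(node):
        # Walk the predecessor lists back from node to src.
        if node == src:
            return [[src]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return build(dst)

def hub_share(graph, hub):
    """Fraction of non-hub node pairs whose shortest paths all use the hub."""
    nodes = [n for n in graph if n != hub]
    pairs = list(combinations(nodes, 2))
    through = 0
    for s, t in pairs:
        paths = all_shortest_paths(graph, s, t)
        if paths and all(hub in p for p in paths):
            through += 1
    return through / len(pairs)

print(hub_share(GRAPH, "HUB"))  # 0.8
```

On this graph 12 of the 15 peripheral pairs are connected only through HUB; only the pairs joined by the short a, b, c chain escape it.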

Page 4

Some earlier solutions

• Use “canonical pathways” as the basis for reconstruction

– Limited to known pathways

• Penalize hubs when reconstructing networks

– Does not discriminate between individual hubs
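The hub-penalty idea, and its limitation, can be sketched as follows. The penalty here is a hypothetical choice for illustration (the presentation does not specify one): Dijkstra's algorithm where entering a node costs its degree. Because the penalty depends on degree alone, two hubs of equal degree are penalized identically, whatever their relevance to the dataset. The function and graph names are invented.

```python
import heapq

def degree_penalized_path(graph, src, dst):
    """Dijkstra's algorithm where stepping into node v costs deg(v).

    The degree penalty is a hypothetical choice for illustration.
    Returns (cost, path), or (inf, []) if dst is unreachable.
    """
    best = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            break
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v in graph[u]:
            new_cost = cost + len(graph[v])  # penalty: degree of the node entered
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    if dst not in best:
        return float("inf"), []
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]

# Hypothetical undirected toy network: s reaches t either through a
# degree-6 hub (2 hops) or through a chain of degree-2 nodes (3 hops).
TOY_GRAPH = {
    "s": ["HUB", "x"], "x": ["s", "y"], "y": ["x", "t"], "t": ["HUB", "y"],
    "HUB": ["s", "t", "h1", "h2", "h3", "h4"],
    "h1": ["HUB"], "h2": ["HUB"], "h3": ["HUB"], "h4": ["HUB"],
    "z": [],  # isolated node
}
print(degree_penalized_path(TOY_GRAPH, "s", "t"))  # (6, ['s', 'x', 'y', 't'])
```

An unpenalized search would return the 2-hop route s, HUB, t; the penalty pushes the path onto the longer low-degree chain, regardless of what the hub actually does in the condition being studied.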

Page 5

Our solution

Find nodes that are significant in providing connectivity in a condition-specific dataset

Page 6

Finding topologically significant nodes

[Figure: example network with nodes A, B and C, each marked as topologically significant or not topologically significant; legend: differentially expressed genes, non-differentially expressed genes]

• 4 out of 6 nodes regulated by B are differentially expressed: more than a random share = significant

• Only 1 out of 6 nodes regulated by C is differentially expressed: could be due to a random event = not significant

• In reality, the algorithm also considers nodes beyond first-degree neighbors
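The B versus C comparison can be quantified with a hypergeometric tail probability, the same distribution the node scoring algorithm on page 8 relies on. The sketch below assumes a hypothetical background of 1000 network nodes, 100 of them differentially expressed, because the slide does not give these numbers; `hypergeom_sf` is a helper written for this example.

```python
from math import comb

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, top + 1))
    return hits / comb(population, draws)

# Hypothetical background (not from the slide): 1000 nodes, 100 differentially expressed.
POPULATION, DIFF_EXPR = 1000, 100
NEIGHBORS = 6  # B and C each regulate 6 nodes

p_b = hypergeom_sf(4, POPULATION, DIFF_EXPR, NEIGHBORS)  # 4 of 6 differentially expressed
p_c = hypergeom_sf(1, POPULATION, DIFF_EXPR, NEIGHBORS)  # 1 of 6 differentially expressed
print(f"B: p = {p_b:.1e}; C: p = {p_c:.2f}")
```

With these assumed numbers, B's tail probability is on the order of 10^-3, whereas C's is close to 0.5, which matches the slide's verdicts.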

Page 7

Why is JAK1 significant in this dataset?

Regulation via JAK1

JAK1 provides an essential network conduit between PLAUR and many differentially expressed targets of STAT1

Topological significance helps to find important links in pathways that do not come up in high-throughput (HT) screens

Feedback loops

Page 8

Node scoring algorithm

1. Let K be a set of experimentally derived nodes of interest (e.g. nodes representing differentially expressed genes). K is a subset of the global network, which contains N nodes.

2. Calculate the shortest path network S by building directed paths from each node in K to the other nodes in K, wherever possible. S is a subset of the global network and may contain nodes in addition to those in K; conversely, not every node of K necessarily becomes part of S.

3. Let us consider a node i ∈ S and one of the nodes of the experimental set, j ∈ K.

4. Calculate the shortest path networks between j and every other node in the global network (N-1 pairs) and count how many of them contain i. This number is Nij < N-1.

5. Calculate the shortest path networks between j and all other nodes in the experimental set and count how many of them contain node i. This number is Kij < K-1.

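6. The probability that node i would be present Kij times or more in the shortest path networks of j by chance follows a hypergeometric distribution:

$p_{ij} = \sum_{k=K_{ij}}^{\min(N_{ij},\,K-1)} \frac{\binom{N_{ij}}{k}\binom{N-1-N_{ij}}{K-1-k}}{\binom{N-1}{K-1}}$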

7. Repeat the procedure for all nodes j in the set K, calculating K p-values for node i (pij), each showing the relevance of node i to an individual member of K. Because we want to identify nodes that are statistically significant to at least one member of the experimental set, we define the p-value associated with node i as the minimum of the pij values.
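Steps 1 to 7 can be sketched in Python as follows. This is a minimal reconstruction under stated assumptions, not GeneGo's implementation: edges are unweighted and directed; a shortest path network from j to a target "contains" i when i is an intermediate node (not an endpoint) on at least one shortest path; and pij is taken as the hypergeometric upper tail with a population of N-1 targets, Nij marked targets, K-1 drawn experimental targets and Kij observed. All function and node names are hypothetical.

```python
from collections import deque
from math import comb

def bfs_dist(graph, src):
    """Directed BFS distances from src; unreachable nodes are absent."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, top + 1))
    return hits / comb(population, draws)

def on_path(dist, j, i, t):
    """True if i is an intermediate node on at least one shortest path j -> t."""
    d_j, d_i = dist[j], dist[i]
    return (i != j and i != t and i in d_j and t in d_i
            and d_j[i] + d_i[t] == d_j[t])

def score_nodes(graph, experimental):
    """Return {i in S: min over j of p_ij} (assumption-laden sketch)."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    k_set = set(experimental)
    n, k = len(nodes), len(k_set)
    dist = {u: bfs_dist(graph, u) for u in nodes}  # all-pairs BFS; small graphs only

    # Step 2: S = endpoints and intermediates of shortest paths between nodes of K.
    s_net = set()
    for j in k_set:
        for t in k_set - {j}:
            if t in dist[j]:  # a directed path j -> t exists
                s_net |= {j, t}
                s_net |= {i for i in nodes if on_path(dist, j, i, t)}

    # Steps 3-7: one hypergeometric p-value per j, then the minimum.
    scores = {}
    for i in s_net:
        p_values = []
        for j in k_set - {i}:
            n_ij = sum(on_path(dist, j, i, t) for t in nodes - {j})
            k_ij = sum(on_path(dist, j, i, t) for t in k_set - {j})
            p_values.append(hypergeom_sf(k_ij, n - 1, n_ij, k - 1))
        scores[i] = min(p_values, default=1.0)
    return scores

# Hypothetical toy network: X relays e1 and e2 to e3 and e4; Y is unrelated.
TOY = {
    "e1": ["X"], "e2": ["X"], "X": ["e3", "e4"], "e3": [], "e4": [],
    "Y": ["n1", "n2", "n3", "e4"], "n1": [], "n2": [], "n3": [], "n4": ["Y"],
}
scores = score_nodes(TOY, ["e1", "e2", "e3", "e4"])
print(round(scores["X"], 3))  # 0.083
```

For j = e1, node X lies on 2 of the 9 possible shortest path networks and on both experimental ones (e3, e4), giving 7/84. Computing all-pairs distances up front keeps the sketch short; a genome-scale network would need a per-source traversal instead.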

Page 9

Algorithm validation: PSORIASIS

• Psoriasis is recognized as the most common T cell-mediated inflammatory disease in humans.

• Genetic linkage to as many as six distinct disease loci has been established but the molecular etiology and genetics remain unknown.

• To begin to identify psoriasis disease-related genes and construct in vivo pathways of the implicated processes, genome-wide expression screens of psoriasis patients need to be undertaken

• The disease-related gene map may provide new insights into the pathogenesis of psoriasis

Page 10

Page 11

Data

• 4 samples from 4 psoriasis patients were taken at 2 different times:

– At the time of a developed psoriatic lesion (P)

– At the time of its complete healing (N)

– The samples were taken from the exact same spot on the same patient, which eliminates a great deal of experimental bias and uncertainty

• Affymetrix Human U95A microarrays were then used to measure gene expression

• Only the genes differentially expressed between the lesion sample (P) and the normal sample (N) were then used for comprehensive analysis with the new algorithm and in MetaCore 4.0

Page 12

Algorithm validation

• As the “experimental set”, we use the 266 differentially expressed genes identified in the paper

• The shortest path network connecting these genes is built using the global network of protein interactions from MetaCore™. The statistical significance of each node in this network is calculated as described above

• To evaluate whether the nodes deemed significant by our method are indeed likely to be disease-related, we perform an automated search of PubMed abstracts for co-occurrence of the corresponding gene name and the word “psoriasis” for every gene in the shortest path network. Different statistical measures are plotted as a function of the node’s p-value

• Functional analysis of high-scoring genes is performed in MetaCore™
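The literature check and the "fraction of hits versus p-value" statistic can be sketched as below. The sketch assumes a simple case-insensitive whole-word match of the gene name plus the word “psoriasis” in each abstract, and fixed p-value cutoffs; the presentation specifies neither the matching rules nor the binning, and the toy data and function names are invented.

```python
import re

def co_occurs(abstract, gene, disease="psoriasis"):
    """True if the abstract mentions the gene symbol (as a whole word)
    and the disease term, case-insensitively (hypothetical matching rule)."""
    has_gene = re.search(rf"\b{re.escape(gene)}\b", abstract, re.IGNORECASE)
    return bool(has_gene) and disease.lower() in abstract.lower()

def hit_fraction_by_threshold(nodes, thresholds):
    """Map each p-value cutoff to the fraction of nodes at or below it that
    have a literature hit. `nodes` maps name -> (p_value, has_hit)."""
    result = {}
    for cut in thresholds:
        selected = [hit for p, hit in nodes.values() if p <= cut]
        result[cut] = sum(selected) / len(selected) if selected else float("nan")
    return result

# Hypothetical toy data: node -> (topological p-value, co-occurrence hit?)
TOY_NODES = {
    "G1": (1e-6, True), "G2": (1e-4, True), "G3": (1e-3, False),
    "G4": (0.02, True), "G5": (0.2, False), "G6": (0.6, False),
}
print(hit_fraction_by_threshold(TOY_NODES, [1e-3, 0.05, 1.0]))
# {0.001: 0.6666666666666666, 0.05: 0.75, 1.0: 0.5}
```

In this toy set the hit fraction falls as the cutoff is relaxed, which is the shape of trend the following slides report for the real data.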

Page 13

Fraction of genes related to “psoriasis” scales with significance

Page 14

High-scoring nodes have a higher fraction of psoriasis hits

Page 15

Enrichment with psoriasis hits among differentially expressed genes

Page 16

No correlation with node degree
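A check like the one behind this slide can be sketched as a rank correlation between node degree and node p-value. The slide does not say which correlation measure was used; Spearman's rank correlation is assumed here because degree distributions are heavily skewed, and the helper names and data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for idx in order[start:end + 1]:
            ranks[idx] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return cov / spread if spread else float("nan")

# Hypothetical node degrees and topological p-values for five nodes
degrees = [2, 50, 3, 8, 120]
p_values = [0.01, 0.002, 0.7, 0.05, 0.4]
print(spearman(degrees, p_values))  # 0.0
```

A coefficient near zero on real data would mean high-scoring nodes are not simply the best-connected hubs.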

Page 17

Page 18

Functional analysis: GeneGo processes

Page 19

| Map | Map folders | Cell process | p-value | Genes |
|---|---|---|---|---|
| IFN gamma signaling pathway | Cell signaling/Immune response; Function groups/Cyto/chemokines | cytokine and chemokine mediated signaling pathway, immune response | 1.88E-26 | 32, 63 |
| Prolactin receptor signaling | Function groups/Growth factors; Function groups/Hormones | intracellular receptor-mediated signaling pathway, response to hormone stimulus | 4.57E-24 | 30, 62 |
| Regulation of G1/S transition (part 2) | Cell signaling/Cell cycle control | cell cycle | 3.30E-22 | 22, 33 |
| Chemokines and adhesion | Cell signaling/Cell adhesion; Function groups/Cyto/chemokines | cytokine and chemokine mediated signaling pathway, cell adhesion | 6.46E-22 | 45, 174 |
| EGF signaling pathway | Cell signaling/Growth and differentiation/Epidermal cell differentiation; Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 4.89E-21 | 28, 64 |
| PDGF signaling via STATs and NF-kB | Cell signaling/Growth and differentiation/Growth and differentiation (common pathways); Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 5.18E-21 | 23, 40 |
| IGF-RI signaling | Cell signaling/Growth and differentiation/Growth and differentiation (common pathways); Function groups/Growth factors | intracellular receptor-mediated signaling pathway, response to extracellular stimulus | 1.63E-20 | 29, 72 |
| AKT signaling | Function groups/Kinases | protein kinase cascade | 5.96E-19 | 25, 57 |
| TGF, WNT and cytoskeletal remodeling | Cell signaling/Cell adhesion | cell adhesion | 6.15E-19 | 45, 204 |

Page 20

Functional analysis: IFN-gamma map

Page 21

VEGF – key pathway identified!

Simonetti O, Lucarini G, Goteri G, Zizzi A, Biagini G, Lo Muzio L, Offidani A. VEGF is likely a key factor in the link between inflammation and angiogenesis in psoriasis: results of an immunohistochemical study. Int J Immunopathol Pharmacol. 2006 October-December;19(4):751-760

Page 22

Glucocorticoid – another key pathway

Page 23

Page 24

Conclusions from algorithm validation

• High-scored nodes are significantly enriched in disease-related genes

• Important disease-related pathways are identified

• Important drug targets are highly scored

Page 25

Integration of genomic and proteomic sets

• LNCaP prostate cell lines

– Treated with Androgen

– Untreated (control)

• Data:

– Proteomic data: ~70 proteins present exclusively in treated cells

– Gene expression profiling of Androgen-treated cells

• Analysis:

– Topological analysis of the Androgen-specific protein network

– Correlation between topologically significant nodes and gene expression

– Functional analysis in MetaCore™

– Network analysis in MetaCore™

Page 26

Revealing regulation of the LNCaP cell response to Androgen

[Figure: network views built from differentially expressed genes, from Androgen-specific proteins, and from topologically significant nodes]

• Gene expression and proteomic data reveal target pathways

• Topologically significant nodes reveal regulation

Page 27

Correlation between expression and significance

Among topologically significant genes, the fraction of differentially expressed genes is high

[Figure: p-value related to differential expression plotted against p-value related to topological significance]

Page 28

Androgen receptor signaling

1 – Differentially expressed gene

2 – Androgen-specific protein

3 – Topologically significant node

Page 29

Regulation of lipid metabolism

Differentially expressed genes identified by microarray and confirmed by proteomic screen

Topologically significant nodes revealed by the new algorithm

Page 30

Fatty acid metabolism: target pathway

Page 31

Role of PBEF

Page 32

Possible regulation of PBEF by AR

PBEF occurs in both the expression and the proteomic datasets; it is possibly activated by the androgen receptor via HIF1 or HNF4

Page 33

Possible feedback from Insulin and IGF-1R back to AR

Page 34

Conclusions

• The presented method assigns priority to nodes in biological networks built from condition-specific datasets

• The presented method predominantly selects genes with high relevance to the condition of interest

• The presented method could be used for cross-validation of different data types, identification of novel drug targets and validation of existing targets

Page 35

Putting it all together: network activity inference

– Identifying causal relations between putative input and output signals

– Tracking effects of molecular perturbation through activation/inhibition cascades

[Figure legend: Experimental data: start cascade; Experimental data: terminate cascade; Inferred activity; Experimental data; Predicted input; Predicted target; Scoring intermediary nodes]
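Tracking a perturbation through activation/inhibition cascades can be illustrated with a toy sign-propagation sketch. It is not the method behind this slide, whose details are not given; it assumes each edge carries +1 (activation) or -1 (inhibition), follows only the shortest cascades from the input, and marks a node 0 when equally short cascades predict opposite effects. The function name, encoding and node names are hypothetical.

```python
from collections import deque

def propagate_signs(edges, source, source_sign=1):
    """Predict the direction of change (+1 up, -1 down, 0 ambiguous) for nodes
    downstream of `source` along the shortest activation/inhibition cascades.

    `edges` maps node -> list of (target, sign), sign +1 = activation,
    -1 = inhibition (a hypothetical encoding).
    """
    dist = {source: 0}
    sign = {source: source_sign}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, edge_sign in edges.get(u, ()):
            effect = sign[u] * edge_sign
            if v not in dist:
                dist[v] = dist[u] + 1
                sign[v] = effect
                queue.append(v)
            elif dist[v] == dist[u] + 1 and sign[v] != effect:
                sign[v] = 0  # equally short cascades disagree
    return sign

# Hypothetical cascade: IN activates A and inhibits B; both activate C.
EDGES = {
    "IN": [("A", 1), ("B", -1)],
    "A": [("T1", -1), ("C", 1)],
    "B": [("T2", -1), ("C", 1)],
}
print(propagate_signs(EDGES, "IN"))
# {'IN': 1, 'A': 1, 'B': -1, 'T1': -1, 'C': 0, 'T2': 1}
```

Because BFS finishes every node at distance d before any node at distance d + 1, each node's sign is final before it is expanded, so an ambiguous 0 propagates downstream consistently.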

Page 36

“Druggable” network modules

Page 37

Acknowledgements

GeneGo: Zoltan Dezso, Yuri Nikolsky, Tatiana Nikolskaya

University of Michigan: Adaikkalam Vellaichamy, Saravana M Dhanasekaran, Arun Sreekumar, Arul Chinnaiyan, Gilbert Omenn