Machine Reading for
Cancer Panomics
Hoifung Poon
Overview
[Figure: High-Throughput Data (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), KB, Cancer Systems Modeling, Disease Genes, Drug Targets; a later build adds Extract Pathways from PubMed and Grounded Semantic Parsing]
Panomics
[Figure: Genome (… ATTCGGATATTTAAGGC …), Transcriptome, Epigenome, ……; Genotype, Phenotype]
[Figure: High-Throughput Data (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), ?, Disease Genes, Drug Targets]
Precision Medicine
Vemurafenib on BRAF-V600 Melanoma
[Figure: panels labeled Before Treatment, 15 Weeks, 23 Weeks]
Cancer
Hundreds of mutations
Most are “passenger”, not driver
Can we identify likely drivers?
[Figure: sequences … ATTCGGATATTTAAGGC … and … ATTCGGGTATTTAAGCC …, labeled Normal cells and Tumor cells]
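As a minimal sketch of the first step behind "Can we identify likely drivers?", the snippet below lists the positions where the two example sequences from the slide differ. The function name `point_differences` is hypothetical, the sequences are assumed to be already aligned, and which of the two is the normal sample is not recoverable from the slide text, so they are named `seq_a` and `seq_b`. Finding differences says nothing yet about which ones are drivers.

```python
# Hypothetical helper (not from the talk): list positions where two
# pre-aligned, equal-length sequences differ.
def point_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length, pre-aligned sequences")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample))
        if r != s
    ]


# The two example fragments shown on the slide.
seq_a = "ATTCGGATATTTAAGGC"
seq_b = "ATTCGGGTATTTAAGCC"

diffs = point_differences(seq_a, seq_b)
for pos, a, b in diffs:
    print(f"position {pos} (0-based): {a} -> {b}")
```

A real tumor has hundreds of such differences, which is why the later slides turn to pathway knowledge to prioritize them.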
Traditional Biology
[Figure: One hypothesis, Targeted Experiments, Discovery]
Genomics
[Figure: High-Throughput Experiments (… ATTCGGATATTTAAGGC … / … ATTCGGGTATTTAAGCC …), Many hypotheses, ?, Discovery]
Bottleneck #1: Knowledge
Bottleneck #2: Reasoning
Example: Tumor Molecular Board
www.ucsf.edu/news/2014/11/120451/bridging-gap-precision-medicine
10-20 highly trained specialists
Tens of hours on each patient
Problem: Hard to scale
U.S. 2014: 1.6 million new cases, 585K deaths
Wanted: Decision support for clinical genomics
Pathway Knowledge
Genes work synergistically in pathways
Why Hard to Identify Drivers?
Complex diseases perturb multiple pathways
Hanahan & Weinberg [Cell 2011]
Why Does Cancer Come Back?
Subtypes with alternative pathway profiles
Compensatory pathways can be activated
[Figure: Ovarian Cancer, EphA2, EphB2; a second build adds an X mark]
Cancer Systems Modeling
Gene A: DNA → mRNA → Protein → Protein Active (Transcription, Translation, Activation)
[Figure labels: … ATTCGGATATTTAAGGC …, Mutation effect, Functional activity, Drug Target, ……]
[Figure: Gene A, Gene B, Gene C, each as DNA → mRNA → Protein → Protein Active, with Transcription Factor and Protein Kinase links; successive builds add "?" and then "!" next to Knowledge Model]
Approach: Graph HMM
[Figure: Gene A, Gene B, Gene C (DNA, mRNA, Protein, Protein Active) with Transcription Factor and Protein Kinase links]
Extract Pathways from PubMed
PubMed
24 million abstracts
Two new abstracts every minute
Adds over one million every year
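The two growth figures above are consistent with each other, which a quick back-of-the-envelope check shows. This is only arithmetic on the slide's numbers, assuming a constant rate of two abstracts per minute over a 365-day year; the variable names are illustrative.

```python
# Rate quoted on the slide: two new abstracts every minute.
abstracts_per_minute = 2

# Assumption: constant rate over a 365-day year.
minutes_per_year = 60 * 24 * 365

new_abstracts_per_year = abstracts_per_minute * minutes_per_year
print(f"{new_abstracts_per_year:,} new abstracts per year")
```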
[Figure: abstracts PMID: 123, PMID: 456, ……, with snippets "… VDR+ binds to SMAD3 to form …" and "… JUN expression is induced by SMAD3/4 …"]
Machine Reading
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
Entities: IL-10 (PROTEIN), gp41 (PROTEIN), p70(S6)-kinase (PROTEIN), human monocyte (CELL)
Events:
Involvement (REGULATION): Theme = up-regulation, Cause = activation
up-regulation (REGULATION): Theme = IL-10, Cause = gp41, Site = human monocyte
activation (REGULATION): Theme = p70(S6)-kinase
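The target of machine reading for the example sentence can be written down as a small recursive data type. The sketch below is one reading of the figure labels (Theme, Cause, Site, REGULATION, PROTEIN, CELL); the class names `Entity` and `Event` and the helper `depth` are hypothetical and are not taken from any system described in the talk.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Entity:
    text: str
    kind: str  # e.g. "PROTEIN" or "CELL"


@dataclass(frozen=True)
class Event:
    trigger: str                            # word in the sentence that signals the event
    kind: str                               # e.g. "REGULATION"
    theme: Union[Entity, Event]             # what is affected; may itself be an event
    cause: Optional[Union[Entity, Event]] = None
    site: Optional[Entity] = None


def depth(arg: Union[Entity, Event, None]) -> int:
    """Nesting depth: 0 for an entity or a missing argument, else 1 + deepest argument."""
    if arg is None or isinstance(arg, Entity):
        return 0
    return 1 + max(depth(arg.theme), depth(arg.cause))


il10 = Entity("IL-10", "PROTEIN")
gp41 = Entity("gp41", "PROTEIN")
kinase = Entity("p70(S6)-kinase", "PROTEIN")
monocyte = Entity("human monocyte", "CELL")

up_regulation = Event("up-regulation", "REGULATION", theme=il10, cause=gp41, site=monocyte)
activation = Event("activation", "REGULATION", theme=kinase)
involvement = Event("Involvement", "REGULATION", theme=up_regulation, cause=activation)

print(depth(involvement))
```

Because `theme` and `cause` may hold either an entity or another event, a flat (regulation, theme, cause) triple is not enough to represent this sentence.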
Semantic Parsing
Long Tail of Variations
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
negative regulation: 532 inhibited, 252 inhibition, 218 inhibit, 207 blocked, 175 inhibits, 157 decreased, 156 reduced, 112 suppressed, 108 decrease, 86 inhibitor, 81 Inhibition, 68 inhibitors, 67 abolished, 66 suppress, 65 block, 63 prevented, 48 suppression, 47 blocks, 44 inhibiting, 42 loss, 39 impaired, 38 reduction, 32 down-regulated, 29 abrogated, 27 prevents, 27 attenuated, 26 repression, 26 decreases, 26 down-regulation, 25 diminished, 25 downregulated, 25 suppresses, 22 interfere, 21 absence, 21 repress ……
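To make the long tail concrete, the sketch below computes cumulative coverage over the first ten trigger counts listed above. The shares are relative to these ten entries only, not to the full distribution, since the slide's list is truncated; the variable names are illustrative.

```python
from itertools import accumulate

# First ten "negative regulation" trigger counts as listed on the slide.
trigger_counts = {
    "inhibited": 532, "inhibition": 252, "inhibit": 218, "blocked": 207,
    "inhibits": 175, "decreased": 157, "reduced": 156, "suppressed": 112,
    "decrease": 108, "inhibitor": 86,
}

# Sort by frequency, most common first.
ranked = sorted(trigger_counts.items(), key=lambda item: item[1], reverse=True)
total = sum(count for _, count in ranked)

# Running share of all listed occurrences covered by the top-k triggers.
coverage = [running / total for running in accumulate(count for _, count in ranked)]

for (word, count), share in zip(ranked, coverage):
    print(f"{word:12s} {count:4d}  cumulative share {share:.2f}")
```

Even within this short list, the most frequent trigger covers only a minority of occurrences, so a system that knows a handful of keywords will miss most mentions.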
Bottleneck: Annotated Examples
GENIA (BioNLP Shared Task 2009-2013)
1,999 abstracts
MeSH: human, blood cell, transcription factor
Challenge for “supervised” machine learning
Can we breach this bottleneck?
Free Lunch #1: Distributional Similarity
Similar context → Probably similar meaning
Annotation as latent variables
Textual expression → Recursive clusters
Unsupervised semantic parsing
Poon & Domingos, “Unsupervised Semantic Parsing”.
EMNLP 2009. Best Paper Award.
Recursive Clustering
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
Cause cluster: TP53, Tumor suppressor P53, ……
Relation cluster: inhibits, down-regulates, suppresses, inhibition, …
Theme cluster: BCL2, BCL-2 proteins, B-cell CLL/Lymphoma 2, ……
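The intuition that similar context suggests similar meaning can be shown with a toy, single-step version of the clustering. This is not the unsupervised semantic parsing algorithm from the cited paper: the (cause, relation, theme) triples are assumed to be already parsed out of the four example sentences and simplified to head phrases, the entity clusters are assumed given rather than induced, and the name `cluster_relations` is hypothetical.

```python
from collections import defaultdict

# Assumption: triples hand-extracted from the four example sentences.
triples = [
    ("TP53", "inhibits", "BCL2"),
    ("Tumor suppressor P53", "down-regulates", "BCL-2 proteins"),
    ("P53", "suppressed", "BCL2"),
    ("TP53", "inhibition", "B-cell CLL/Lymphoma 2"),
]

# Assumption: entity clusters are given here; the real method induces them jointly.
entity_cluster = {
    "TP53": "TP53", "Tumor suppressor P53": "TP53", "P53": "TP53",
    "BCL2": "BCL2", "BCL-2 proteins": "BCL2", "B-cell CLL/Lymphoma 2": "BCL2",
}


def cluster_relations(triples, entity_cluster):
    """Group relation words that occur with the same (cause, theme) context."""
    by_context = defaultdict(set)
    for cause, relation, theme in triples:
        context = (entity_cluster[cause], entity_cluster[theme])
        by_context[context].add(relation)
    return dict(by_context)


clusters = cluster_relations(triples, entity_cluster)
for (cause, theme), relations in clusters.items():
    print(f"Cause={cause}, Theme={theme}: {sorted(relations)}")
```

In the recursive setting, the relation clusters found this way would in turn serve as context for merging further entity expressions.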
Free Lunch #2: Existing KBs
Many KBs available
Gene/Protein: GenBank, UniProt, …
Pathways: NCI, Reactome, KEGG, BioCarta, …
Annotation as latent variables
Textual expression → Table, column, join, …
Grounded semantic parsing
Relation Extraction
NCI-PID Pathway KB:
Regulation  Theme  Cause
Positive    A2M    FOXO1
Positive    ABCB1  TP53
Negative    BCL2   TP53
…           …      …
Text:
TP53 inhibits BCL2.
Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.
BCL2 transcription is suppressed by P53 expression.
The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …
……
Distant Supervision
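Distant supervision pairs KB rows with sentences that mention both arguments and treats the KB label as a noisy training label. The sketch below applies that idea to the KB rows and sentences shown on the slide. The alias lists, the helper names `mentions` and `distant_labels`, and the simple string matching are all assumptions for illustration; a real system would use proper entity linking and would have to cope with wrongly labeled sentences.

```python
import re

# KB rows from the slide: (regulation, theme, cause).
kb_rows = [
    ("Positive", "A2M", "FOXO1"),
    ("Positive", "ABCB1", "TP53"),
    ("Negative", "BCL2", "TP53"),
]

# Assumption: hand-written alias lists standing in for an entity linker.
aliases = {
    "BCL2": ["BCL2", "BCL-2", "B-cell CLL/Lymphoma 2"],
    "TP53": ["TP53", "P53"],
}

sentences = [
    "TP53 inhibits BCL2.",
    "Tumor suppressor P53 down-regulates the activity of BCL-2 proteins.",
    "BCL2 transcription is suppressed by P53 expression.",
    "The inhibition of B-cell CLL/Lymphoma 2 expression by TP53 …",
]


def mentions(sentence: str, gene: str) -> bool:
    """True if any alias of `gene` occurs as a whole token in `sentence`."""
    for alias in aliases.get(gene, [gene]):
        pattern = r"(?<![A-Za-z0-9])" + re.escape(alias) + r"(?![A-Za-z0-9])"
        if re.search(pattern, sentence):
            return True
    return False


def distant_labels(sentences, kb_rows):
    """Label each sentence with every KB row whose theme and cause it mentions."""
    labeled = []
    for sentence in sentences:
        for regulation, theme, cause in kb_rows:
            if mentions(sentence, theme) and mentions(sentence, cause):
                labeled.append((sentence, regulation, theme, cause))
    return labeled


labeled = distant_labels(sentences, kb_rows)
for sentence, regulation, theme, cause in labeled:
    print(f"{regulation:8s} Theme={theme} Cause={cause} | {sentence}")
```

No sentence here was annotated by hand: the labels come entirely from the KB, which is the sense in which existing KBs are a "free lunch".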
Nested Events
Involvement of p70(S6)-kinase activation in IL-10 up-regulation in human monocytes by gp41 envelope protein of human immunodeficiency virus type 1 ...
[Figure: nested REGULATION events (Involvement, up-regulation, activation) over IL-10, gp41, p70(S6)-kinase (PROTEIN) and human monocyte (CELL), with Theme, Cause, and Site arguments]
GUSPEE
Generalize distant supervision to extracting nested events
Prior: Favor semantic parse grounded in KB
Outperformed 19 out of 24 participants in GENIA Shared Task [Kim et al. 2009]
Parikh, Poon, Toutanova. “Grounded Semantic Parsing for
Complex Knowledge Extraction”, NAACL-15.
GUSPEE
Semantic parser for pathway extraction
Tree HMM
Expectation Maximization
Virtual Evidence
Key challenge: non-local evidence
Syntax-Semantics Mismatch
Best Supervised System
Preliminary Results
Prototype-Driven Learning
Outperformed 19 out of 24 supervised participants
Incomplete KB
Literome
http://literome.azurewebsites.net
Poon et al., “Literome: PubMed-Scale Genomic Knowledge
Base in the Cloud”, Bioinformatics-14.
PubMed-Scale Extraction
Preliminary pass:
1.5 million instances
13,000 genes, 838,000 unique regulations
Applications:
UCSC Genome Browser, MSR Interactions Track
Expression profile modeling
Validate de novo pathway prediction
Etc.
Poon, Toutanova, Quirk, “Distant Supervision for Cancer
Pathway Extraction from Text”. PSB-15.
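The preliminary-pass numbers above distinguish extracted instances from unique regulations: the same regulation can be reported in many abstracts. A minimal sketch of that aggregation step follows. The records are toy placeholders built from the example PMIDs and genes used elsewhere in the slides, not real extraction output, and `unique_regulations` is a hypothetical name.

```python
from collections import defaultdict

# Toy placeholder instances (illustrative only, not real Literome output).
instances = [
    {"pmid": "123", "regulation": "Negative", "theme": "BCL2", "cause": "TP53"},
    {"pmid": "456", "regulation": "Negative", "theme": "BCL2", "cause": "TP53"},
    {"pmid": "456", "regulation": "Positive", "theme": "JUN", "cause": "SMAD3"},
]


def unique_regulations(instances):
    """Map each (regulation, theme, cause) triple to the set of supporting PMIDs."""
    support = defaultdict(set)
    for inst in instances:
        key = (inst["regulation"], inst["theme"], inst["cause"])
        support[key].add(inst["pmid"])
    return dict(support)


unique = unique_regulations(instances)
print(f"{len(instances)} instances, {len(unique)} unique regulations")
for (regulation, theme, cause), pmids in unique.items():
    print(f"{regulation:8s} Theme={theme} Cause={cause} supported by PMIDs {sorted(pmids)}")
```

Keeping the supporting PMIDs with each triple lets a user trace a regulation back to the abstracts it came from.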
Machine Science
Evans & Rzhetsky, “Machine Science”.
Science, Vol. 329, 2010.
[Figure, built up over several slides: Big Data, Rich Knowledge (KB), Deep Model, Hypotheses, Experiments]
Roadmap
Extract richer knowledge:
Cell type, experimental condition, …
Hedging, negation, …
Formulate coherent models:
Supporting evidence, contradiction, …
Intellectual gaps, hypotheses, …
Integrate with data & experiments:
Cancer panomics → Driver genes / pathways
Single-drug response → Drug combo prioritization
Decision Support for Clinical Genomics
Berkeley AMP Lab, OHSU, Microsoft Research
[Figure: Raw Reads, Variant Call, RNA-Seq, Clinical Observation, Literature, NLP, Knowledge Graph, Decision Support, Clinicians]
We Have Digitized Life
Next: Digitize Medicine
Knock down genes A, B, C → Cure
Pennisi, “The CRISPR Craze”.
Science, Vol. 341, 2013.
Summary
Precision medicine is the future
Cancer systems modeling
Graphical model: Pathways + Panomics data
Extract pathways from PubMed
Machine reading by grounded semantic parsing
Literome: KB for genomic medicine
Collaborators
Chicago: Andrey Rzhetsky, Kevin White
OHSU: Brian Druker, Jeff Tyner
Berkeley AMP Lab: David Patterson
Wisconsin: Mark Craven, Anthony Gitter
UCSC: Max Haeussler, David Haussler
Microsoft Research: Chris Quirk, Kristina Toutanova, David Heckerman, Scott Yih, Lucy Vanderwende, Bill Bolosky, Ravi Pandya