Graph Analytics in Pharmacology over the Web of Life Sciences Linked Open Data
26th World Wide Web Conference (WWW)
Perth, 4th – 8th April 2017
Maulik R. Kamdar and Mark A. Musen
Stanford Center for Biomedical Informatics [email protected]
Semantic Web: Publishing Data as a Graph
Gleevec (Mol. Wt.: 589.25 g/mol, Half-Life: 18 hours) inhibits PDGFR, involved in signal transduction.
[Figure: the statement above drawn as a Resource Description Framework (RDF) graph. Nodes are identified by Uniform Resource Identifiers: Gleevec is http://bio2rdf.org/drugbank:DB00619 (DrugBank: DB00619) and the x-ref node is http://bio2rdf.org/kegg:D01441 (KEGG: D01441). Remaining node and edge labels: mol_weight (589.25), half-life (“18 hours”), x-ref, target, name, type, Inhibits, PDGFR, process, GO:0007165 (Signal Transduction).]
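The statement on this slide can be written down as subject-predicate-object triples. The sketch below is only an illustration: the two bio2rdf.org URIs and the GO identifier come from the slide, while the `EX` vocabulary namespace, the `PDGFR` URI and the flattened "inhibits" edge (the slide appears to use an intermediate target node) are assumptions made to keep the example short.

```python
# Minimal sketch: the Gleevec statement as (subject, predicate, object) triples.
# EX and PDGFR are hypothetical URIs; they do not appear on the slide.
EX = "http://example.org/vocab#"
GLEEVEC = "http://bio2rdf.org/drugbank:DB00619"
KEGG_XREF = "http://bio2rdf.org/kegg:D01441"
PDGFR = "http://example.org/protein/PDGFR"
SIGNAL_TRANSDUCTION = "GO:0007165"

triples = [
    (GLEEVEC, EX + "mol_weight", 589.25),
    (GLEEVEC, EX + "half-life", "18 hours"),
    (GLEEVEC, EX + "x-ref", KEGG_XREF),
    (GLEEVEC, EX + "inhibits", PDGFR),          # flattened for brevity
    (PDGFR, EX + "process", SIGNAL_TRANSDUCTION),
]


def objects(subject, predicate):
    """Return every object stated for a (subject, predicate) pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects(GLEEVEC, EX + "half-life"))
```

Because every node is a URI, a second source that uses the same URI for Gleevec can add triples to the same graph without any schema change.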
Semantic Web: Querying the Graph
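What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
SPARQL Query Language
[Figure: the same graph turned into a query pattern. The drug and x-ref nodes become unknowns (?), the half-life value becomes ?half-life, the target name becomes ?target, and mol_weight is constrained to < 1000. The x-ref, name, type, Inhibits and process labels and GO:0007165 (Signal Transduction) stay fixed.]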
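Answering the question amounts to matching a graph pattern with variables against the stored triples and then filtering on the molecular weight. The following sketch is a toy matcher, not a SPARQL engine: node names stand in for URIs, `DrugX` is a made-up drug added so that the filter has something to reject, and the `inhibits` edge is a simplification of the slide's graph.

```python
# Tiny in-memory graph; predicate names follow the slide labels.
TRIPLES = [
    ("Gleevec", "mol_weight", 589.25),
    ("Gleevec", "half-life", "18 hours"),
    ("Gleevec", "inhibits", "PDGFR"),
    ("PDGFR", "process", "GO:0007165"),
    ("DrugX", "mol_weight", 1500.0),        # hypothetical drug, too heavy
    ("DrugX", "half-life", "2 hours"),
    ("DrugX", "inhibits", "PDGFR"),
]


def is_var(term):
    """Variables are strings that start with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


QUERY = [
    ("?drug", "mol_weight", "?mw"),
    ("?drug", "half-life", "?hl"),
    ("?drug", "inhibits", "?target"),
    ("?target", "process", "GO:0007165"),
]

half_lives = {
    b["?drug"]: b["?hl"]
    for b in match(QUERY, TRIPLES)
    if b["?mw"] < 1000                      # plays the role of FILTER(?mw < 1000)
}
print(half_lives)
```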
Life Sciences Linked Open Data Cloud – query federation
What this talk is about …
• Challenges associated with retrieving information from LSLOD sources
• Pattern-based method to rewrite queries across LSLOD sources
• An application in mechanism-based pharmacovigilance - PhLeGrA
Query Federation: Rewriting and executing queries across different sources
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
[Figure: a QUERY FEDERATION engine takes the query pattern (Drug, molecular-weight < 1000, target, process = “GO:0007165”, half-life) and sends sub-patterns to different sources: one with Drug, molecular-weight < 1000, target and half-life; the other with Drug, molecular-weight < 1000, target and process = “GO:0007165”.]
Schwarte, et al. ISWC 2012
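The idea of splitting one question into per-source sub-queries and joining the answers can be sketched in a few lines. This is a deliberately simplified illustration and not the federation engine cited on the slide: `SOURCE_A`, `SOURCE_B`, `DrugX` and `ProteinY` are hypothetical, and each "source" is just a Python dictionary.

```python
# Two hypothetical sources, each holding only part of the information.
SOURCE_A = {  # drug attributes
    "Gleevec": {"molecular-weight": 493.61, "half-life": "18 hours"},
    "DrugX": {"molecular-weight": 1500.0, "half-life": "2 hours"},
}
SOURCE_B = {  # drug targets and the processes the targets take part in
    "targets": {"Gleevec": ["PDGFR"], "DrugX": ["ProteinY"]},
    "process": {"PDGFR": ["GO:0007165"], "ProteinY": []},
}


def subquery_attributes(max_weight):
    """Sub-query for source A: drugs under the weight limit, with half-lives."""
    return {
        drug: attrs["half-life"]
        for drug, attrs in SOURCE_A.items()
        if attrs["molecular-weight"] < max_weight
    }


def subquery_process(go_term):
    """Sub-query for source B: drugs with a target involved in the process."""
    return {
        drug
        for drug, targets in SOURCE_B["targets"].items()
        if any(go_term in SOURCE_B["process"].get(t, []) for t in targets)
    }


def federated_query(max_weight, go_term):
    """Run both sub-queries and join their results on the drug identifier."""
    half_lives = subquery_attributes(max_weight)
    drugs = subquery_process(go_term)
    return {drug: hl for drug, hl in half_lives.items() if drug in drugs}


result = federated_query(1000, "GO:0007165")
print(result)
```

The join only works here because both dictionaries use the same drug names; the heterogeneity discussed on the following slides is what breaks that assumption in practice.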
Heterogeneity in the LSLOD Cloud
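Label Mismatch: Different labels for classes, relations and attributes
[Figure: two sources, labelled (clinical features) and (biological features), describe the same drug with different attribute labels and values: Gleevec molecular-weight 493.61 and Gleevec mol_weight 589.25.]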
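Model Mismatch: Different graph patterns to capture granularity
[Figure: one source links the drug to its target directly (Gleevec drug-target PDGFR). Another links Gleevec through a target edge to an intermediate node that carries the name, type and source of the interaction (PDGFR, Inhibits, PubMed: 21152856).]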
• Inconsistent Meanings
• Inconsistent URI labels for classes, relations and attributes
• Inconsistent Attribute values for entities
• Inconsistent Graph patterns for SPARQL queries
• Incomplete Relations between entities
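Two of the problems listed above, inconsistent labels and inconsistent attribute values, can be made concrete with the Gleevec molecular-weight example. The sketch below assumes a hand-written alignment table (`CANONICAL`) and placeholder source names; the two labels, the two values and the shared label `hasMolWt` are the ones used in this talk.

```python
# Records as two sources might report them. Source names are placeholders.
RECORDS = [
    {"source": "source_1", "entity": "Gleevec",
     "label": "molecular-weight", "value": 493.61},
    {"source": "source_2", "entity": "Gleevec",
     "label": "mol_weight", "value": 589.25},
]

# Hypothetical alignment of source-specific labels to one shared label.
CANONICAL = {"molecular-weight": "hasMolWt", "mol_weight": "hasMolWt"}


def find_conflicts(records, tolerance=0.01):
    """Group values by (entity, shared label) and keep groups that disagree."""
    grouped = {}
    for record in records:
        label = CANONICAL.get(record["label"], record["label"])
        key = (record["entity"], label)
        grouped.setdefault(key, []).append((record["source"], record["value"]))
    return {
        key: values
        for key, values in grouped.items()
        if max(v for _, v in values) - min(v for _, v in values) > tolerance
    }


conflicts = find_conflicts(RECORDS)
print(conflicts)
```

Without the alignment step the two records would never be compared at all, which is why the label mismatch hides the value mismatch.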
Query Rewriting fails over the LSLOD Cloud
What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?
Query:
?s a <Drug>
?s <molecular-weight> ?mw
?s <target> ?protein
?s <half-life> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting

Sub-query 1:
?s a <Drug>
{?s <molecular-weight> ?mw}
{?s <half-life> ?hl}
?mw < 1000 g/mol

Sub-query 2:
?s a <Drug>
{?s <target> ?protein}
?protein <hasGO> <GO:0007165>
Using Graph Patterns for Query Rewriting
Mapping Rules:
?Drug hasTarget ?Protein
    ?Drug DrugBank:drug-target ?Protein
    ?Drug KEGG:target ?blank KEGG:link ?Protein

What are the half-lives of drugs that have Mol. Wt < 1000 g/mol and inhibit proteins involved in signal transduction?

Query:
?s a <Drug>
?s <hasMolWt> ?mw
?s <hasTarget> ?protein
?s <hasHalfLife> ?hl
?mw < 1000 g/mol
?protein <hasGO> <GO:0007165>

Query Rewriting

Sub-query 1:
?s a <Drug>
{?s <molecular-weight> ?mw}
?s <drug-target> ?protein
{?s <half-life> ?hl}
?mw < 1000 g/mol

Sub-query 2:
?s a <Drug>
?s <mol_wt> ?mw
{?s <target> ?protein_blank
?protein_blank <link> ?protein}
?protein <hasGO> <GO:0007165>
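The mapping rules can be read as a lookup from a source-independent predicate to the graph pattern each source uses. The sketch below shows one possible way to apply such rules; the data structure, the `rewrite` function and the `?blankN` naming are assumptions for illustration, while the predicate names are the ones shown on the slide.

```python
import itertools

# Mapping rules: one source-independent predicate -> a graph pattern per source.
# "?s" and "?o" stand for the pattern's subject and object; "?blank" marks an
# intermediate node that needs a fresh variable.
MAPPING_RULES = {
    "hasTarget": {
        "DrugBank": [("?s", "drug-target", "?o")],
        "KEGG": [("?s", "target", "?blank"), ("?blank", "link", "?o")],
    },
    "hasMolWt": {
        "DrugBank": [("?s", "molecular-weight", "?o")],
        "KEGG": [("?s", "mol_wt", "?o")],
    },
}


def rewrite(query, source):
    """Rewrite source-independent triple patterns into one source's patterns."""
    fresh = itertools.count(1)
    rewritten = []
    for s, p, o in query:
        templates = MAPPING_RULES.get(p, {}).get(source)
        if templates is None:               # no rule: keep the pattern unchanged
            rewritten.append((s, p, o))
            continue
        subst = {"?s": s, "?o": o}
        if any("?blank" in t for t in templates):
            subst["?blank"] = f"?blank{next(fresh)}"
        rewritten.extend(
            tuple(subst.get(term, term) for term in t) for t in templates
        )
    return rewritten


QUERY = [("?drug", "hasMolWt", "?mw"), ("?drug", "hasTarget", "?protein")]
kegg_query = rewrite(QUERY, "KEGG")
drugbank_query = rewrite(QUERY, "DrugBank")
print(kegg_query)
print(drugbank_query)
```

Keeping the rules as data means a new source can be supported by adding entries rather than changing the rewriting code.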
PhLeGrA – Linked Graph Analytics in Pharmacology
Phlegra is a spider genus of the Salticidae family, commonly termed jumping spiders.
Entities and Relations from 4 different sources are retrieved to create the k-partite Network
This k-partite network is generated in < 1 day
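A k-partite network is a graph whose nodes fall into k groups, with edges allowed only between different groups. The sketch below builds a very small one; the three partitions (drug, protein, process) are taken from the talk's running example and are an assumption about the real network, which may use other or more node types.

```python
from collections import defaultdict

# Node -> partition. These assignments are illustrative only.
PARTITION = {
    "Gleevec": "drug",
    "PDGFR": "protein",
    "GO:0007165": "process",
}
EDGES = [("Gleevec", "PDGFR"), ("PDGFR", "GO:0007165")]


def build_k_partite(partition, edges):
    """Build an undirected adjacency map, rejecting edges inside one partition."""
    graph = defaultdict(set)
    for u, v in edges:
        if partition[u] == partition[v]:
            raise ValueError(
                f"{u} and {v} are both in partition {partition[u]!r}"
            )
        graph[u].add(v)
        graph[v].add(u)
    return graph


network = build_k_partite(PARTITION, EDGES)
k = len(set(PARTITION.values()))
print(k, dict(network))
```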
Query Federation overcomes heterogeneous Distribution of Entities and Relations
E1: Drug
R1: Drug hasTarget Protein
• Similar and completely unique entities and relations exist across data sources
• Necessary to get the complete picture, but also to determine sources of noise
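One way to see which relation instances are shared between sources and which are unique to one source is to record provenance while merging the federated results. The sketch below assumes each source has already returned (drug, protein) pairs for the relation Drug hasTarget Protein; apart from Gleevec and PDGFR, the pairs are invented for illustration.

```python
# Hypothetical results per source for "Drug hasTarget Protein".
RESULTS = {
    "DrugBank": {("Gleevec", "PDGFR"), ("Gleevec", "ProteinA")},
    "KEGG": {("Gleevec", "PDGFR"), ("DrugX", "ProteinB")},
}


def provenance(results):
    """Map each relation instance to the set of sources that assert it."""
    seen = {}
    for source, pairs in results.items():
        for pair in pairs:
            seen.setdefault(pair, set()).add(source)
    return seen


by_pair = provenance(RESULTS)
shared = {pair for pair, sources in by_pair.items() if len(sources) > 1}
unique = {
    pair: next(iter(sources))
    for pair, sources in by_pair.items()
    if len(sources) == 1
}
print(shared)
print(unique)
```

Relations asserted by a single source are the ones that complete the picture, and also the ones most worth checking as potential noise.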
Several underlying mechanisms are possible …
http://onto-apps.stanford.edu/phlegra
The story so far …
Pattern-based federation methods can retrieve data from multiple sources in the Life Sciences Linked Open Data Cloud, and can enable development of advanced methods for mechanism-based pharmacovigilance.
…
Acknowledgments
Musen Lab, Stanford
Biomedical Informatics Training Program
Michel Dumontier
US NIH Grant HG004028
PhLeGrA – Linked Graph Analytics in Pharmacology
www.stanford.edu/~maulikrk/research.html
www.onto-apps.stanford.edu/phlegra