The use of Linked Data principles and Semantic Web standards in pharmaceutical research
Kerstin Forsberg (@kerfors on Twitter, SlideShare etc.)
AZ IT | R&D Information
Building competence in linked open data: dialogue, webinars and white paper
Webinar arranged by [organiser logo], 18 March 2014
(Apologies for the mix of English and Swedish)
About AstraZeneca
• Alongside our own R&D, we partner with others, combining skills and resources to broaden the potential for successful innovation.
• We believe that real progress can only be made by working together with others who have a part to play in improving healthcare.
• We work closely with others in the healthcare community, including physicians and those who pay for healthcare, to understand their challenges and how we can combine skills and resources to achieve a common goal: improved health.
Kerstin Forsberg | Vinnova webinar, 18 March 2013
Linked Data in Pharmaceutical Research
Two examples of how AstraZeneca works with European research collaborations and international standards organisations to make chemistry data and clinical study data easier to use with the help of next-generation web technology.
The Web turned 25 on 12 March
Web of (Linked) Data
Web of Documents
An Intro To The Semantic Web: Why You Need To Know About It Sooner Than Later, by Samantha Wong. Image source: Frederic Martin
RDF (the foundation of the Semantic Web) turned 15 on 22 February
Web of (Linked) Data
Web of Documents
Resource Description Framework (RDF)
Common Model (“Triples”): subject, predicate, object
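The triple model above can be sketched in a few lines of plain Python. This is only an illustration, not how an RDF store is implemented: the `ex:` names and the `match` helper are made up for the example, and `None` is used as a wildcard in the pattern.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    object: str


# Hypothetical example data; the ex: names are invented for illustration.
graph = [
    Triple("ex:Aspirin", "rdf:type", "ex:Compound"),
    Triple("ex:Aspirin", "ex:inhibits", "ex:COX1"),
    Triple("ex:COX1", "rdf:type", "ex:Target"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]


# "What does ex:Aspirin inhibit?" expressed as a triple pattern.
inhibited = [t.object for t in match(graph, subject="ex:Aspirin", predicate="ex:inhibits")]
print(inhibited)
```

Because every statement has the same three-part shape, data from different sources can be appended to the same list and queried with the same patterns.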
The Project
The Innovative Medicines Initiative
• EC-funded public-private partnership for pharmaceutical research
• Focus on key problems: Efficacy, Safety, Education & Training, Knowledge Management
The Open PHACTS Project
• Create a semantic integration hub (“Open Pharmacological Space”)…
• Delivering services to support ongoing drug discovery programs in pharma and the public domain
• Not just another project: leading academics in semantics, pharmacology and informatics, driven by solid industry business requirements
• 23 academic partners, 8 pharmaceutical companies, 3 biotechs
• Work split into clusters: Technical Build; Scientific Drive; Community & Sustainability
Pre-competitive Informatics: pharma companies are all accessing, processing, storing and re-processing external research data
[Figure labels: Literature, PubChem, Genbank, Patents, Databases, Downloads; Data Integration; Data Analysis; Firewalled Databases; repeat at each company]
Source: Lowering industry firewalls: pre-competitive informatics in drug discovery. Nature Reviews Drug Discovery (2009) 8, 701-708. doi:10.1038/nrd2944
ChEMBL
DrugBank
Gene Ontology
Wikipathways
UniProt
ChemSpider
UMLS
ConceptWiki
ChEBI
TrialTrove
GVKBio
GeneGo
TR Integrity
“Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
“What is the selectivity profile of known p38 inhibitors?”
“Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
Number | Sum | Nr of 1 | Question
15 | 12 | 9 | All oxidoreductase inhibitors active <100 nM in both human and mouse
18 | 14 | 8 | Given compound X, what is its predicted secondary pharmacology? What are the on- and off-target safety concerns for a compound? What is the evidence and how reliable is that evidence (journal impact factor, KOL) for findings associated with a compound?
24 | 13 | 8 | Given a target, find me all actives against that target. Find/predict polypharmacology of actives. Determine ADMET profile of actives.
32 | 13 | 8 | For a given interaction profile, give me compounds similar to it.
37 | 13 | 8 | The current Factor Xa lead series is characterised by substructure X. Retrieve all bioactivity data in serine protease assays for molecules that contain substructure X.
38 | 13 | 8 | Retrieve all experimental and clinical data for a given list of compounds defined by their chemical structure (with options to match stereochemistry or not).
41 | 13 | 8 | A project is considering Protein Kinase C Alpha (PRKCA) as a target. What are all the compounds known to modulate the target directly? What are the compounds that may modulate the target directly? That is, return all compounds active in assays where the resolution is at least at the level of the target family (i.e. PKC), both from structured assay databases and the literature.
44 | 13 | 8 | Give me all active compounds on a given target with the relevant assay data
46 | 13 | 8 | Give me the compound(s) which hit most specifically the multiple targets in a given pathway (disease)
59 | 14 | 8 | Identify all known protein-protein interaction inhibitors
Business Question Driven Approach
http://www.sciencedirect.com/science/article/pii/S1359644613001542
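To make the question-driven idea concrete, here is a minimal sketch of how a question such as "Give me all active compounds on a given target with the relevant assay data" could be phrased against integrated activity records. It is not the Open PHACTS implementation: the `Activity` record, the `actives_on_target` helper, the 1 µM (1000 nM) default threshold and all data values are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    compound: str      # compound identifier
    target: str        # target identifier
    assay_type: str    # e.g. "functional" or "binding"
    potency_nm: float  # potency in nanomolar


# Made-up records for illustration only; not real bioactivity data.
ACTIVITIES = [
    Activity("cmpd-1", "target-A", "functional", 40.0),
    Activity("cmpd-2", "target-A", "binding", 5000.0),
    Activity("cmpd-3", "target-B", "functional", 12.0),
    Activity("cmpd-4", "target-A", "functional", 900.0),
]


def actives_on_target(activities, target, max_potency_nm=1000.0):
    """Activities on `target` with potency below the threshold, most potent first."""
    hits = [a for a in activities if a.target == target and a.potency_nm < max_potency_nm]
    return sorted(hits, key=lambda a: a.potency_nm)


hits = actives_on_target(ACTIVITIES, "target-A")
for hit in hits:
    print(hit.compound, hit.assay_type, hit.potency_nm)
```

The point of the sketch is that the question only becomes this simple once compounds, targets and assays from different sources share identifiers and units.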
[Figure: Open PHACTS platform architecture; recoverable labels listed below]
Source layer: databases (Db) labelled RDF, Nanopub and VoID; content types: Public Content, Commercial, Public Ontologies, User Annotations
Core Platform: Data Cache (Virtuoso Triple Store), Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Indexing, Chemistry Registration, Normalisation & Q/C
Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services
Apps, Platform Explorer, Standards
Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”
“Provenance Everywhere”
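The role of an identity resolution step in such a platform can be sketched as a lookup from whatever identifier a user types to a single concept URI. This is a toy illustration under stated assumptions: the identifiers are taken from the slide, but the alias table, the `example.org` URIs and the mappings themselves are invented and do not reflect the real service.

```python
# Hypothetical alias table: the identifiers come from the slide, but the
# concept URIs and the mappings themselves are invented for illustration.
ALIASES = {
    "p12374": "http://example.org/concept/0001",
    "adenosine receptor 2a": "http://example.org/concept/0002",
    "cs4532": "http://example.org/concept/0003",
}


def normalise(identifier: str) -> str:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return " ".join(identifier.split()).lower()


def resolve(identifier: str):
    """Return the concept URI for an identifier, or None if it is unknown."""
    return ALIASES.get(normalise(identifier))


print(resolve("  Adenosine  Receptor 2A "))
print(resolve("not-a-known-id"))
```

Returning `None` for unknown input, rather than guessing, keeps the downstream query from silently running against the wrong concept.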
Clinical data standards, today’s documentation
Human-readable documentation in PDFs of 200+ pages and Excel files (and some in XML).
Clinical data standards in the Semantic Web
Enable end-to-end interoperable data standards for clinical research
Clinical data standards in the Semantic Web
Example: 14 RDF triples describing one variable (“AEACN”)
RDF triples describing one variable/data element and also linked to related standard parts
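As a rough idea of what such triples look like, the sketch below describes the AEACN variable with four statements and prints them in N-Triples syntax. The namespace `http://example.org/sdtm#` and the `inDomain` property are hypothetical stand-ins, not the vocabulary used by CDISC2RDF; the label follows SDTM's description of AEACN in the Adverse Events (AE) domain.

```python
EX = "http://example.org/sdtm#"  # hypothetical namespace, not the CDISC2RDF one
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Each row: (subject, predicate, object, object_is_literal)
triples = [
    (EX + "AEACN", RDF_TYPE, EX + "Variable", False),
    (EX + "AEACN", RDFS_LABEL, "Action Taken with Study Treatment", True),
    (EX + "AEACN", EX + "inDomain", EX + "AE", False),
    (EX + "AE", RDFS_LABEL, "Adverse Events", True),
]


def to_ntriples(rows):
    """Serialise rows as N-Triples: IRIs in angle brackets, literals in quotes."""
    lines = []
    for s, p, o, is_literal in rows:
        if is_literal:
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


output = to_ntriples(triples)
print(output)
```

The third triple is the kind of link the slide refers to: the variable is not only described but also connected to another part of the standard, here its domain.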
• CDISC2RDF started in October 2012 as a pre-competitive project with AZ, Roche, W3C et al. to showcase Semantic Web standards and Linked Data principles.
• FDA meeting, November 2012: Solutions for Study Data Exchange Standards Meeting, with a W3C Semantic Web presentation.
• June 2013: the Semantic Technology project, an FDA/PhUSE working group for Emerging Technologies, with 25+ representatives from FDA, CDISC, pharma companies, CROs and software vendors.
• October 2013 press release: Representing existing standards (SDTM, CDASH, SEND, ADaM) in RDF.
Clinical standards in the Semantic Web
Community building and knowledge sharing
CDISC Interchange Europe 2011 and 2012: presentations from Roche and AstraZeneca
AstraZeneca’s view on “Semantics”
Enabling the hyperconnected enterprise
“We need to build a linked data architecture enabling us to ask questions and solve business problems across a heterogeneous information landscape extending beyond the traditional boundaries of the enterprise.”
Semantics connects us all
Acknowledgements
AZ’s Linked Data of Practice members: Tom Plasterer (lead), Jim Morris, Courtland Yockey, Sorana Popa, Rob Hernandez, Mike Westaway, Rajan Desai, Simon Rakov, Dana Crowley, Ian Dix, Johan Törnqvist
Collaborators and Advisors:
• Charlie Mead – IO Informatics
• Dean Allemang – Working Ontologist
• Frederik Malfait – IMOS consulting / Roche
• Phil Ashworth – TopQuadrant
Thank you!