From chemoinformatics to repositioning. Péter Antal, Computational Biomedicine (Combine) workgroup, Department of Measurement and Information Systems, Budapest University of Technology and Economics

Bioinformatika, mint a mesterséges intelligencia és a


Page 1: Bioinformatika, mint a mesterséges intelligencia és a

From chemoinformatics to repositioning

Péter Antal

Computational Biomedicine (Combine) workgroup
Department of Measurement and Information Systems,
Budapest University of Technology and Economics

Page 2: Bioinformatika, mint a mesterséges intelligencia és a

Overview
• Chemoinformatics
• The “big data”/omic era of chemo- and bioinformatics
• Data and knowledge fusion in biomedicine
• The semantic unification of pharmacological spaces
• Multi-aspect virtual screening
• Drug repositioning

Page 3: Bioinformatika, mint a mesterséges intelligencia és a

Chemoinformatics
• Gasteiger, Johann, and Thomas Engel, eds. Chemoinformatics: a textbook. John Wiley & Sons, 2006.
• Bajorath, Jürgen. Chemoinformatics for Drug Discovery. John Wiley & Sons, 2013.
• Karthikeyan, Muthukumarasamy, and Renu Vyas. Practical Chemoinformatics. Springer, 2014.
• Brown, Nathan. In Silico Medicinal Chemistry: Computational Methods to Support Drug Design. No. 8. Royal Society of Chemistry, 2015.

Page 4: Bioinformatika, mint a mesterséges intelligencia és a

Practical chemoinformatics

1. Open-Source Tools, Techniques, and Data in Chemoinformatics
2. Chemoinformatics Approach for the Design and Screening of Focused Virtual Libraries
3. Machine Learning Methods in Chemoinformatics for Drug Discovery
4. Docking and Pharmacophore Modelling for Virtual Screening
5. Active Site-Directed Pose Prediction Programs for Efficient Filtering of Molecules
6. Representation, Fingerprinting, and Modelling of Chemical Reactions
7. Predictive Methods for Organic Spectral Data Simulation
8. Chemical Text Mining for Lead Discovery
9. Integration of Automated Workflow in Chemoinformatics for Drug Discovery
10. Cloud Computing Infrastructure Development for Chemoinformatics

Page 5: Bioinformatika, mint a mesterséges intelligencia és a

In Silico Medicinal Chemistry: Computational Methods to Support Drug Design


Page 6: Bioinformatika, mint a mesterséges intelligencia és a

In Silico Medicinal Chemistry


Page 7: Bioinformatika, mint a mesterséges intelligencia és a

In Silico Medicinal Chemistry


Page 8: Bioinformatika, mint a mesterséges intelligencia és a

Accomplishments of genomics research

E. D. Green et al. Nature 470, 204-213 (2011) doi:10.1038/nature09764

Page 9: Bioinformatika, mint a mesterséges intelligencia és a

Big “omic” data sets in biomedicine

Page 10: Bioinformatika, mint a mesterséges intelligencia és a

Multiple levels in biomedicine

Genome(s)

Phenome (disease, side effect)

Transcriptome

Proteome

Metabolome

Environment & lifestyle

Drugs

Page 11: Bioinformatika, mint a mesterséges intelligencia és a

Moore’s Law for Data Explosion (Carlson’s law)

[Figure: sequencing costs per million bases and publicly available genetic data. Source: Nature, Vol 464, April 2010.]

• x10 every 2-3 years

• Data volumes and complexity that IT has never faced before…

Page 12: Bioinformatika, mint a mesterséges intelligencia és a

Large-scale cohorts in UK

UK Biobank:
• more than 1 million adults
• aged 40-69
• 2006-2036 and beyond
• genes × lifestyle × environment → diseases
• open since 2012

Page 13: Bioinformatika, mint a mesterséges intelligencia és a

Number of genome-wide association studies

[Figure: total number of publications by calendar quarter, 2005-2012; vertical axis 0-1400, final value 1350.]

Page 14: Bioinformatika, mint a mesterséges intelligencia és a

NHGRI GWA Catalog
www.genome.gov/GWAStudies

Published Genome-Wide Associations through 12/2012
Published GWA at p ≤ 5×10⁻⁸ for 17 trait categories

Page 15: Bioinformatika, mint a mesterséges intelligencia és a

Disease network

A.-L. Barabási et al.: The human disease network, PNAS, 2007

Page 16: Bioinformatika, mint a mesterséges intelligencia és a

Repositories for gene expression
• Gene Expression Omnibus (NCBI)
• http://www.ncbi.nlm.nih.gov/geo/

Page 17: Bioinformatika, mint a mesterséges intelligencia és a

Gene expression profiles

• Justin Lamb: The Connectivity Map: a new tool for biomedical research, Nature Reviews Cancer, 7, pp. 54-60, 2007

Compounds × cell lines: each cell is a transcriptional profile

Page 18: Bioinformatika, mint a mesterséges intelligencia és a

STRING - Protein-Protein Interactions

• http://string-db.org/

Page 19: Bioinformatika, mint a mesterséges intelligencia és a

Unification of biology: Gene Ontology

• Ontologies:
  – Gene Ontology (GO): http://www.geneontology.org/
  – Enzyme Classification (EC)
  – Unified Medical Language System (UMLS)
  – OBO

Page 20: Bioinformatika, mint a mesterséges intelligencia és a

The Human Phenotype Ontology

http://human-phenotype-ontology.github.io/

Page 21: Bioinformatika, mint a mesterséges intelligencia és a

Number of biomedical publications

[Figure: number of annual papers, 1950-2010; vertical axis 0-1,200,000.]

Little Science, Big Science, by Derek J. de Solla Price, 1963

Page 22: Bioinformatika, mint a mesterséges intelligencia és a

The fusion bottleneck (~limits of personal cognition)

Page 23: Bioinformatika, mint a mesterséges intelligencia és a

The pharma gap

Page 24: Bioinformatika, mint a mesterséges intelligencia és a

Semantic publishing: papers vs DBs/KBs

• M. Gerstein: "E-publishing on the Web: Promises, pitfalls, and payoffs for bioinformatics," Bioinformatics, 1999.
• M. Gerstein: "Blurring the boundaries between scientific 'papers' and biological databases," Nature, 2001.
• P. Bourne: "Will a biological database be different from a biological journal?," PLoS Computational Biology, 2005.
• M. Gerstein et al.: "Structured digital abstract makes text mining easy," Nature, 2007.
• M. Seringhaus et al.: "Publishing perishing? Towards tomorrow's information architecture," BMC Bioinformatics, 2007.
• M. Seringhaus: "Manually structured digital abstracts: A scaffold for automatic text mining," FEBS Letters, 2008.
• D. Shotton: "Semantic publishing: the coming revolution in scientific journal publishing," Learned Publishing, 2009.

Page 25: Bioinformatika, mint a mesterséges intelligencia és a

Network of databases and knowledge bases in biomedicine

• More than 10,000 relevant biological databases and knowledge bases
• Petabytes of sequence and high-throughput gene/protein data
• ~10,000,000 concepts and relations explicitly in knowledge bases

Page 26: Bioinformatika, mint a mesterséges intelligencia és a

Combination of elements

[Figure: pairwise combinations of the elements gene, target protein, compound, disease, binding site, product, pathway and transcription factor binding site (TFBS), annotated with the ATC, GO, EC and HPO vocabularies.]

Page 27: Bioinformatika, mint a mesterséges intelligencia és a

“Compound” Google?


“The Science Behind an Answer”

artificial intelligence, natural language processing..(???)

abacavir didanosine lamivudine

Why Can’t My Computer Understand Me?
http://www.newyorker.com/tech/elements/why-cant-my-computer-understand-me

Page 28: Bioinformatika, mint a mesterséges intelligencia és a


E-science, data-intensive science, the fourth paradigm

Page 29: Bioinformatika, mint a mesterséges intelligencia és a

Approaches to fusion
• Encyclopedists:
  – Wikipedia, Wikidata
  – Linked Open Data (LOD)
  – Semantic publishing
• Automated cross-domain querying
  – Forms
  – Workflow systems
  – Natural language understanding, machine reading
• Automated reasoning
  – Watson
• Automated discovery systems (“Automation of science”)
  – Adam, Eve
• Large-scale similarity-based fusion (applied in repositioning)

Page 30: Bioinformatika, mint a mesterséges intelligencia és a

OPS: scientific pharma questions


Page 31: Bioinformatika, mint a mesterséges intelligencia és a

A problem with public data: parallel efforts on cleaning ... integration

Page 32: Bioinformatika, mint a mesterséges intelligencia és a

The Resource Description Framework (RDF) background

• The data model of the Semantic Web
• RDF statement (triple):
  – subject: resource identified by an IRI
  – predicate (property): resource identified by an IRI
  – object: resource or literal (constant value)
• Graph databases of RDF triples
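The subject-predicate-object model can be illustrated with a minimal sketch in plain Python. It is not an RDF library: the `IRI` and `Triple` types, the `objects` helper and every `http://example.org/` identifier are assumptions made up for this example.

```python
from typing import NamedTuple, Union


class IRI(str):
    """A resource identifier; a str subclass so it stays hashable and comparable."""


class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    obj: Union[IRI, str, float]  # either another resource or a literal value


# Hypothetical namespace and identifiers, for illustration only.
EX = "http://example.org/"
graph = {
    Triple(IRI(EX + "drug/olanzapine"), IRI(EX + "prop/hasTarget"), IRI(EX + "protein/DRD2")),
    Triple(IRI(EX + "drug/olanzapine"), IRI(EX + "prop/label"), "Olanzapine"),
    Triple(IRI(EX + "protein/DRD2"), IRI(EX + "prop/label"), "Dopamine D2 receptor"),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {t.obj for t in graph if t.subject == subject and t.predicate == predicate}


targets = objects(graph, IRI(EX + "drug/olanzapine"), IRI(EX + "prop/hasTarget"))
# An object that is itself an IRI can be followed further: the set of triples is a graph.
target_labels = {label for t in targets for label in objects(graph, t, IRI(EX + "prop/label"))}
print(target_labels)
```

Because the object of one statement can be the subject of another, following a chain of triples is a graph walk rather than a table join.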

Page 33: Bioinformatika, mint a mesterséges intelligencia és a

Relational databases vs. triplestores (graph databases)

Relational databases
• Relations are separated from data (cases)
• Tables & keys define the formal model (syntax) for the data (cases)
• Model-based (~predefined)
• Meaning (semantics) is informal (out of scope of the DB)
• Singular databases (~they are separated)

Triplestores
• Unified representation of relations and data
• Triples (“graph database”) store the dynamic model for the data, together with the factual data
• Model-free (~relations as data)
• Meaning is defined by the (explicit) relations (~ontology)
• Linked open data space (using universal identifiers & ontologies)

Cf. von Neumann’s principle: instructions are data
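The “relations as data” point can be sketched by flattening one relational row into triples. The table name, column names, values and the `row_to_triples` helper are all hypothetical; a real mapping would use IRIs and an ontology for the predicates.

```python
def row_to_triples(table, key_column, row):
    """Turn one relational row (a dict) into (subject, predicate, object) triples.

    The column names, which live in the schema of a relational database,
    become predicates, i.e. ordinary data inside the triples.
    """
    subject = f"{table}/{row[key_column]}"
    return [
        (subject, f"{table}#{column}", value)
        for column, value in row.items()
        if column != key_column and value is not None
    ]


# A hypothetical row of a hypothetical "drug" table.
row = {"id": "D1", "name": "olanzapine", "atc_level1": "N"}
triples = row_to_triples("drug", "id", row)

# A new kind of relation needs no schema change: it is just one more triple.
triples.append(("drug/D1", "drug#sideEffect", "weight gain"))

for triple in triples:
    print(triple)
```

In the relational version the last statement would require a new column or a new table; in the triple version the model grows together with the data.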

Page 34: Bioinformatika, mint a mesterséges intelligencia és a

SPARQL

• A query language specification for querying over RDF triples
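The core idea of such a query, matching a list of triple patterns with shared variables against the graph, can be sketched in a few lines of Python. This is a toy matcher under stated assumptions, not a SPARQL engine: the data, the predicate names and the `match`/`query` helpers are invented, and any string starting with `?` is treated as a variable.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on a clash."""
    new = dict(binding)
    for p, value in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if new.setdefault(p, value) != value:
                return None
        elif p != value:
            return None
    return new


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns (a basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# Hypothetical data: drugs, their targets, and the pathways of the targets.
graph = [
    ("drugA", "hasTarget", "T1"),
    ("drugB", "hasTarget", "T1"),
    ("drugB", "hasTarget", "T2"),
    ("T1", "inPathway", "P1"),
    ("T2", "inPathway", "P2"),
]

# Roughly: SELECT ?drug WHERE { ?drug hasTarget ?t . ?t inPathway P1 }
rows = query(graph, [("?drug", "hasTarget", "?t"), ("?t", "inPathway", "P1")])
drugs = {row["?drug"] for row in rows}
print(sorted(drugs))
```

The shared variable `?t` is what joins the two patterns, which is how a single query can cross from compounds to targets to pathways.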

Page 35: Bioinformatika, mint a mesterséges intelligencia és a

Linked data

• Bio2RDF
• ~11 billion triples
• 35 datasets (clinicaltrials, dbSNP, DrugBank, KEGG, PIR, GOA, OrphaNet, PubMed, SIDER, ...)
• local: chembl, pathwaycommons, reactome, wikipathways
• http://download.bio2rdf.org/release/3/release.html

Page 36: Bioinformatika, mint a mesterséges intelligencia és a

Chem2Bio2RDF I.


Page 37: Bioinformatika, mint a mesterséges intelligencia és a

Chem2Bio2RDF II.


Page 38: Bioinformatika, mint a mesterséges intelligencia és a

Open Pharmacological Space

• Discovery platform for cross-domain fusion.
• Public, curated, linked data.
  – The data sources you already use, integrated and linked together: compounds, targets, pathways, diseases and tissues.
• Everything in triples: subject-predicate-object

Precursor: Gene Ontology: tool for the unification of biology, Nature Genetics, 2000

Page 39: Bioinformatika, mint a mesterséges intelligencia és a

@gray_alasdair: Big Data Integration

Page 40: Bioinformatika, mint a mesterséges intelligencia és a

• Discovery platform to cross barriers.
• The data sources you already use, integrated and linked together: compounds, targets, pathways, diseases and tissues.
• ChEBI, ChEMBL, ChemSpider, ConceptWiki, DisGeNET, DrugBank, Gene Ontology, neXtProt, UniProt and WikiPathways.
• For questions in drug discovery, answers from publications in peer-reviewed scientific journals.

Page 41: Bioinformatika, mint a mesterséges intelligencia és a

Top questions in the pharma industry I. (Open PHACTS)


Page 42: Bioinformatika, mint a mesterséges intelligencia és a

Top questions II.


Page 43: Bioinformatika, mint a mesterséges intelligencia és a

Open PHACTS: databases


Page 44: Bioinformatika, mint a mesterséges intelligencia és a

Dataset            | Downloaded  | Version    | Licence        | Triples
Bio Assay Ontology | -           | -          | CC-By          | 10,360
CALOHA             | 8 Apr 2015  | 2014-01-22 | CC-By-ND       | 14,552
ChEBI              | 4 Mar 2015  | 125        | CC-By-SA       | 1,012,056
ChEMBL             | 18 Feb 2015 | 20.0       | CC-By-SA       | 445,732,880
ConceptWiki        | 12 Dec 2013 | -          | CC-By-SA       | 4,331,760
DisGeNET           | 31 Mar 2015 | 2.1.0      | ODbL           | 15,011,136
Disease Ontology   | -           | 2015-05-21 | CC-By          | 188,062
DrugBank           | 19 Feb 2015 | 4.1        | Non-commercial | 4,028,767
ENZYME             | -           | 2015_11    | CC-By-ND       | 61,467
FDA Adverse Events | 9 Jul 2012  | -          | CC0            | 13,557,070

Total: ~3 billion triples

Page 45: Bioinformatika, mint a mesterséges intelligencia és a

Dataset                   | Downloaded  | Version  | Licence  | Triples
Gene Ontology             | 4 Mar 2015  | -        | CC-By    | 1,366,494
Gene Ontology Annotations | 17 Feb 2015 | -        | CC-By    | 879,448,347
NCATS OPDDR               | Nov 2015    | Oct 2015 | -        | 2,643
neXtProt (NP)             | 1 Feb 2014  | 1.0      | CC-By-ND | 215,006,108
OPS Chemical Registry     | 4 Nov 2014  | -        | CC-By-SA | 241,986,722
  HMDB                    | -           | 3.6      | HMDB     | -
  MeSH                    | -           | 2015     | MeSH     | -
  PDB Ligands             | -           | 2        | PDB      | -
OPS Metadata              | -           | -        | CC-By-SA | 2,053
UniProt                   | -           | 2015_11  | CC-By-ND | 1,131,186,434
WikiPathways              | -           | 20151118 | CC-By    | 11,781,627

Total: ~3 billion triples
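The stated total can be checked against the per-dataset counts on the two dataset slides. The snippet below only re-adds the numbers listed there; the dictionary name is an assumption, and the HMDB, MeSH and PDB Ligands rows are left out because no triple count is given for them.

```python
# Triple counts copied from the two Open PHACTS dataset slides.
triples_per_dataset = {
    "Bio Assay Ontology": 10_360,
    "CALOHA": 14_552,
    "ChEBI": 1_012_056,
    "ChEMBL": 445_732_880,
    "ConceptWiki": 4_331_760,
    "DisGeNET": 15_011_136,
    "Disease Ontology": 188_062,
    "DrugBank": 4_028_767,
    "ENZYME": 61_467,
    "FDA Adverse Events": 13_557_070,
    "Gene Ontology": 1_366_494,
    "Gene Ontology Annotations": 879_448_347,
    "NCATS OPDDR": 2_643,
    "neXtProt": 215_006_108,
    "OPS Chemical Registry": 241_986_722,
    "OPS Metadata": 2_053,
    "UniProt": 1_131_186_434,
    "WikiPathways": 11_781_627,
}

total = sum(triples_per_dataset.values())
print(f"{total:,} triples, about {total / 1e9:.1f} billion")

# The three largest sources dominate the total.
for name, count in sorted(triples_per_dataset.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"{name}: {count / total:.0%}")
```

The sum comes out just under 3 billion, with UniProt, the Gene Ontology Annotations and ChEMBL contributing most of the triples.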

Page 46: Bioinformatika, mint a mesterséges intelligencia és a

OPS: open tools for free academic use


Page 47: Bioinformatika, mint a mesterséges intelligencia és a

Open PHACTS with non-shared, private data for commercial users


Page 48: Bioinformatika, mint a mesterséges intelligencia és a

Open PHACTS: advantages I.


Page 49: Bioinformatika, mint a mesterséges intelligencia és a

Open PHACTS: advantages II.


Page 50: Bioinformatika, mint a mesterséges intelligencia és a

Attrition in drug discovery

Page 51: Bioinformatika, mint a mesterséges intelligencia és a

De novo discovery vs. repositioning

De novo drug discovery and development
• 10-17 year process and around 1B USD
• ~10% probability of success from Phase 1 to market

Drug repositioning
• 3-12 year process and up to 80% cost reduction
• Significantly higher probability of success from Phase 1 to market, due to reduced safety and pharmacokinetic uncertainty

Page 52: Bioinformatika, mint a mesterséges intelligencia és a

Scientific motivations for repositioning/rescue

1, Multiple targets
2, Multifactorial diseases
3, Personalized aspects:
   3a, pharmaceutical/phenotypic: efficacy, side effects
   3b, genetic/epigenetic
4, Complex pathways (accumulating knowledge)
5, New measurements (accumulating omic data)
6, Drugome (2000-7000, 1941) + failed drugs (~2000, +100 new yearly)

[Figures: a disease-disease similarity network; a drug-drug network; a gene regulatory network; ENCODE: tissue-specific regulation. Sources: A.-L. Barabási et al., PNAS, 2007; M. Campillos et al., Science, 2008; Ingenuity Pathway Analysis.]

Page 53: Bioinformatika, mint a mesterséges intelligencia és a

Scientific motivations for repositioning II.

• Magic bullet vs. promiscuous/dirty drugs
• Monogenic vs. multifactorial disease
• Selective optimisation of side activities (SOSA)
• Network pharmacology
• Personalized (“precision”) drugs (for sub-populations)
  – Special external applicability conditions
  – “Pathway” drugs

Page 54: Bioinformatika, mint a mesterséges intelligencia és a

Repositioning publications

• Ashburn TT, Thor KB: Drug repositioning: identifying and developing new uses for existing drugs. Nat Rev Drug Discov 2004, 3(8):673-683.
• Campillos M, Kuhn M, Gavin AC, Jensen LJ, Bork P: Drug target identification using side-effect similarity. Science 2008, 321(5886):263-266.
• von Eichborn J, Murgueitio MS, Dunkel M, Koerner S, Bourne PE, Preissner R: PROMISCUOUS: a database for network-based drug-repositioning. Nucleic Acids Research 2010, 1-7.
• Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P: SIDER: A side effect resource to capture phenotypic effects of drugs. Molecular Systems Biology 2010, 6:343.
• ...

[Figure: number of repositioning publications per year, 2004-2013; vertical axis 0-80.]

Page 55: Bioinformatika, mint a mesterséges intelligencia és a

Repositioning: examples

Li and Jones: Drug repositioning for personalized medicine, Genome Medicine 2012, 4:27

Page 56: Bioinformatika, mint a mesterséges intelligencia és a

Information sources in repositioning and lead discovery.

Profile         | Repositioning | HTS-based | Dimension
Chemical        | X             | X         | 100-10000
Target protein  | X             | X         | n x 10000
Taxonomy        | X             |           | 3 (depth)
Side effect     | X             |           | 10000
Literature      | X             |           | 100000
Gene Expression | X             | X         | k x 1000
Off-label use   | X             |           | 10000

Page 57: Bioinformatika, mint a mesterséges intelligencia és a

Chemical fingerprints
• MACCS 2D, Molconn-Z, Dragon, 3D, ...
• Schrödinger Canvas using Tanimoto distance
• Structure → fingerprint

810 drugs

011001001011010101...
001010000001110100...
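As a worked example, the Tanimoto coefficient of two bit-string fingerprints is the number of bits set in both divided by the number of bits set in either; the Tanimoto distance is one minus that value. The sketch below uses only the visible 18-bit prefixes of the two truncated fingerprints shown on the slide, so the result says nothing about the full fingerprints. The function names and the convention for two all-zero fingerprints are assumptions.

```python
def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient of two equal-length bit strings such as '0110...'."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    # Assumed convention: two empty fingerprints count as identical.
    return both / either if either else 1.0


def tanimoto_distance(a: str, b: str) -> float:
    return 1.0 - tanimoto(a, b)


# Visible prefixes only; the slide truncates both fingerprints with "...".
fp_1 = "011001001011010101"
fp_2 = "001010000001110100"

similarity = tanimoto(fp_1, fp_2)
print(f"similarity {similarity:.3f}, distance {tanimoto_distance(fp_1, fp_2):.3f}")
```

Applied to every pair of the 810 drugs, the same function would fill an 810 × 810 chemical similarity matrix.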

Page 58: Bioinformatika, mint a mesterséges intelligencia és a

Target profiles I.

ChEMBL is a database of bioactive drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, molecular weight, Lipinski parameters) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data).

• Targets: 10,774
• Compound records: 1,715,667
• Distinct compounds: 1,463,270
• Activities: 13,520,737
• Publications: 59,610

https://www.ebi.ac.uk/chembl

Page 59: Bioinformatika, mint a mesterséges intelligencia és a

Target profiles II.

• Compounds: 68,280,589
• Tested compounds: 2,081,593
• Substances: 196,539,272
• Tested substances: 3,121,708
• BioAssays: 1,112,184
• RNAi BioAssays: 62
• BioActivities: 228,469,266
• Protein targets: 9,847
• Gene targets: 41,361

Pcsubstance contains more than 180 million records. Pccompound contains more than 63 million unique structures. PCBioAssay contains more than 1 million BioAssays. Each BioAssay contains a varying number of data points.

Page 60: Bioinformatika, mint a mesterséges intelligencia és a

Side-effect profiles
• DailyMed text mining
  – qualitative: SIDER database (http://sideeffects.embl.de)
  – quantitative: exact prevalences
• E.g. olanzapine

514 drugs

Page 61: Bioinformatika, mint a mesterséges intelligencia és a

Taxonomies

• Anatomical Therapeutic Chemical Classification System (ATC)
  – 5 levels:
    • main anatomical group
    • main therapeutic group
    • therapeutic/pharmacological subgroup
    • chemical/therapeutic/pharmacological subgroup
    • chemical substance
• Drugs.com
  – http://www.drugs.com/
• RxNorm, AETIONOMY
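Since each ATC level is a prefix of the 7-character code, a taxonomy profile can be derived by simple string slicing. The sketch below assumes well-formed 7-character codes and uses `A10BA02` (metformin in the WHO's own explanation of the system) as the example; `atc_levels` and `shared_depth` are hypothetical helper names, and counting shared levels is only one possible taxonomy-based similarity.

```python
def atc_levels(code: str) -> list:
    """Split a full ATC code into its five hierarchical prefixes."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError("a full (level 5) ATC code has 7 characters")
    # Levels 1-5 are the prefixes of length 1, 3, 4, 5 and 7.
    return [code[:1], code[:3], code[:4], code[:5], code[:7]]


def shared_depth(code_a: str, code_b: str) -> int:
    """Number of leading ATC levels that two drugs have in common (0-5)."""
    depth = 0
    for level_a, level_b in zip(atc_levels(code_a), atc_levels(code_b)):
        if level_a != level_b:
            break
        depth += 1
    return depth


print(atc_levels("A10BA02"))
# Two codes from the same level-3 subgroup but different level-4 subgroups.
print(shared_depth("A10BA02", "A10BB01"))
```

The level-4 prefix (the first five characters) is the granularity at which ATC classes can serve as columns of a drug-by-class matrix.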

Page 62: Bioinformatika, mint a mesterséges intelligencia és a

Combination of chemical and side effect information for better target prediction

[Figure: drug i and drug j compared across the chemical, target, pathway, “disease” and side effect levels.]

M. Campillos et al.: Drug target identification using side-effect similarity, Science, 2008

Page 63: Bioinformatika, mint a mesterséges intelligencia és a

Potential avenues of drug repositioning

Li and Jones: Drug repositioning for personalized medicine, Genome Medicine 2012, 4:27

Page 64: Bioinformatika, mint a mesterséges intelligencia és a

In silico/virtual screening using LOD

[Figure: compound representations (chemical, side-effects, target proteins, MMoA, pathways) taken from Linked Open Data (LOD), e.g. Open PHACTS, are turned into compound-compound similarities (e.g. Tanimoto). Labels: representation, surrogate. Cf. Davis, Shrobe, Szolovits, 1993.]

Page 65: Bioinformatika, mint a mesterséges intelligencia és a

Similarity-based virtual screening

1, The “One-One-One” phase
• Henrickson J, Johnson M, Maggiora G: Concepts and applications of molecular similarity. 1991, New York: John Wiley & Sons.
• Willett P, Barnard J, Downs G: Chemical similarity searching. Journal of Chemical Information and Computer Sciences 1998, 38(6):983-996.

2, The “data fusion” phase: “One-Many-One”
• Ginn C, Willett P, Bradshaw J: Combination of molecular similarity measures using data fusion. Perspectives in Drug Discovery and Design 2000, 20(1):1-16.

3, The “group fusion” phase: “Many-Many-One”
• Whittle M, Gillet V, Willett P, Loesel J: Analysis of data fusion methods in virtual screening: Similarity and group fusion. Journal of Chemical Information and Modeling 2006, 46(6):2206-2219.
• Keiser M, Roth B, Armbruster B, Ernsberger P, Irwin J, Shoichet B: Relating protein pharmacology by ligand chemistry. Nature Biotechnology 2007, 25(2):197-206.
• Chen B, Mueller C, Willett P: Combination rules for group fusion in similarity-based virtual screening. Molecular Informatics 2010, 29(6-7):533-541.
• Gardiner E, Holliday J, O'Dowd C, Willett P: Effectiveness of 2D fingerprints for scaffold hopping. Future Medicinal Chemistry 2011, 3(4):405-414.
• Svensson F, Karlén A, Sköld C: Virtual screening data fusion using both structure- and ligand-based methods. Journal of Chemical Information and Modeling 2011, 52(1):225-232.
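The difference between the second and third phase can be sketched as follows, under simplifying assumptions: each compound is described by feature sets in several views (here a chemical and a side-effect view), similarity is Tanimoto on sets, data fusion sums the per-view ranks for one query, and group fusion takes the maximum similarity over several queries in one view. The compound names, the features and the choice of rank-sum and MAX rules are illustrative, not the settings of the cited papers.

```python
def tanimoto(a: set, b: set) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def ranks(scores: dict) -> dict:
    """Rank 1 is the most similar compound."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def data_fusion(query: dict, library: dict, views: list) -> list:
    """One query, many similarity measures (views): fuse by summing ranks."""
    rank_sum = {name: 0 for name in library}
    for view in views:
        per_view = ranks({n: tanimoto(query[view], p[view]) for n, p in library.items()})
        for name in library:
            rank_sum[name] += per_view[name]
    return sorted(library, key=rank_sum.get)  # smallest rank sum first


def group_fusion(queries: list, library: dict, view: str) -> list:
    """Many queries, one similarity measure: fuse with the MAX rule."""
    best = {n: max(tanimoto(q[view], p[view]) for q in queries) for n, p in library.items()}
    return sorted(library, key=best.get, reverse=True)


# Hypothetical library with two views per compound.
library = {
    "c1": {"chem": {1, 2, 3, 4}, "se": {"a", "b"}},
    "c2": {"chem": {1, 2}, "se": {"a", "b", "c"}},
    "c3": {"chem": {7, 8}, "se": {"z"}},
}

fused = data_fusion({"chem": {1, 2, 3}, "se": {"a", "b", "c"}}, library, ["chem", "se"])
grouped = group_fusion([{"chem": {1, 2, 3, 4}}, {"chem": {7, 8, 9}}], library, "chem")
print(fused, grouped)
```

In the group-fusion call, `c3` outranks `c2` because it is close to the second query even though it is unrelated to the first, which is the behaviour the MAX rule is chosen for.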

Page 66: Bioinformatika, mint a mesterséges intelligencia és a

[Figure: four schematic panels with nodes labelled A, B, Q1-Q3, S1-S5 and S*, one per approach:]

1, Similarity-based approach

2, Data fusion

3, Group fusion

4, Query Driven Fusion Framework

Page 67: Bioinformatika, mint a mesterséges intelligencia és a

Similarity-based fusion in drug repositioning

[Figure: query-based optimal fusion of the chemical, side-effect, target protein, MMoA and pathway similarities (Tanimoto); the query is a set of corresponding drugs.]

QDF2

Page 68: Bioinformatika, mint a mesterséges intelligencia és a

On the use of query analysis

• The information content of
  – the query,
  – the information resources,
  – and the unknown observations(!)
• allows a one-class analysis of the query (data description),
• and this induction is used in prioritization.

JOINTLY OPTIMIZED:
1. weighting the members in the query (e.g. detection of outliers in the question): GETTING THE RIGHT/IMPROVED QUESTION
2. weighting the similarity measures (e.g. information resources): GETTING THE SCORING (SIMILARITY) FOR THE RIGHT/IMPROVED QUESTION
3. scoring/ranking the aggregate similarity of the unknown data points to the query.

QDF2
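The three steps can be mimicked with a deliberately naive heuristic. This is not the QDF2 method: nothing is jointly optimized, and the weighting rules (a view is weighted by how coherent the query looks in it, a query member by how similar it is to the other members) are assumptions chosen only to make the roles of the three weights visible. All names and data are hypothetical.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def query_driven_ranking(query, candidates, profiles, views):
    """Heuristic stand-in for query-driven fusion; profiles[name][view] is a feature set."""
    if len(query) < 2:
        raise ValueError("the query analysis needs at least two query members")

    def sim(view, x, y):
        return tanimoto(profiles[x][view], profiles[y][view])

    # Step 2 (done first here): weight each view by the mean pairwise similarity of the query.
    pairs = list(combinations(query, 2))
    coherence = {v: sum(sim(v, a, b) for a, b in pairs) / len(pairs) for v in views}
    norm = sum(coherence.values()) or 1.0
    view_w = {v: c / norm for v, c in coherence.items()}

    # Step 1: down-weight query members that resemble no other member (outliers).
    raw = {q: sum(view_w[v] * sim(v, q, o) for v in views for o in query if o != q) for q in query}
    norm = sum(raw.values()) or 1.0
    member_w = {q: r / norm for q, r in raw.items()}

    # Step 3: rank candidates by their weighted aggregate similarity to the query.
    score = {c: sum(view_w[v] * member_w[q] * sim(v, c, q) for v in views for q in query)
             for c in candidates}
    return view_w, member_w, sorted(candidates, key=score.get, reverse=True)


profiles = {
    "q1": {"chem": {1, 2, 3}, "se": {"a"}},
    "q2": {"chem": {1, 2, 4}, "se": {"x"}},
    "q3": {"chem": {9}, "se": {"y"}},  # an outlier in the query
    "cand1": {"chem": {1, 2}, "se": {"q"}},
    "cand2": {"chem": {8}, "se": {"a"}},
}
view_w, member_w, ranking = query_driven_ranking(
    ["q1", "q2", "q3"], ["cand1", "cand2"], profiles, ["chem", "se"])
print(view_w, member_w, ranking)
```

In this toy query only the chemical view carries signal and `q3` shares nothing with the other members, so the side-effect view and `q3` both receive zero weight and the ranking is driven by chemical similarity to `q1` and `q2`.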

Page 69: Bioinformatika, mint a mesterséges intelligencia és a

The repositome

The “repositome” of FDA-approved drugs (rows) for the ATC level 4 classes (columns).

Page 70: Bioinformatika, mint a mesterséges intelligencia és a

Thank you for your attention!