Transcript
Page 1: ChEMBL US tour December 2014

ChEMBL: Resources for Drug Discovery

John P. Overington @johnpoverington [email protected]

Page 2: ChEMBL US tour December 2014

EMBL-EBI

Page 3: ChEMBL US tour December 2014

ChEMBL Strategy

• Comprehensively catalogue historical drug discovery
  – Include successes and failures
• Large-scale abstraction and curation of primary literature
  – Direct depositions
• Drugs can be small molecules, peptides, recombinant proteins, siRNA, cells, viruses, etc.
• ‘Learn’ rules for drug discovery ‘success’
  – Target selection and prioritisation: druggability
  – Lead discovery, optimisation, clinical candidate selection
  – Develop approaches to new target classes, e.g. PPIs
• Make all data freely available to the entire community
  – Encourage re-use, integration and cross-linking

Page 4: ChEMBL US tour December 2014
Page 5: ChEMBL US tour December 2014

Drug Discovery

Pipeline stages: Target Discovery, Lead Discovery, Lead Optimisation, Preclinical Development, Phase 1, Phase 2, Phase 3, Launch (Phase 4)

Target Discovery
• Target identification
• Microarray profiling
• Target validation
• Assay development
• Biochemistry
• Clinical/Animal disease models

Lead Discovery
• High-throughput Screening (HTS)
• Fragment-based screening
• Focused libraries
• Screening collection

Lead Optimisation
• Medicinal Chemistry
• Structure-based drug design
• Selectivity screens
• ADMET screens
• Cellular/Animal disease models
• Pharmacokinetics

Preclinical Development
• Toxicology
• In vivo safety pharmacology
• Formulation
• Dose prediction

Phase 1: PK, tolerability
Phase 2: Efficacy
Phase 3: Safety & Efficacy
Launch (Phase 4): Indication discovery, repurposing & expansion

ChEMBL 19 content
• Med. Chem. SAR (Discovery): >1,638,000 compound records, >12,800,000 bioactivities, ~57,150 abstracted papers, ~10,579 targets
• Clinical Candidates (Development): ~12,000 clinical candidates
• Drugs (Use): ~1,600 drugs

Page 6: ChEMBL US tour December 2014

Drug Optimisation

[Figure: chemical structures of the drugs listed below, arranged from Prototype through 1st, 2nd, 3rd and 4th generation, with imidazole and triazole classes marked. After W. Sneader.]

• Azomycin (1956): Streptomyces natural product, trichomonacidal, ‘toxic’
• Metronidazole (1962)
• Clotrimazole (1970)
• Miconazole (1970)
• Econazole (1972)
• Tinidazole (1970)
• Bifonazole (1981)
• Sulconazole (1980)
• Ketoconazole (1978)
• Itraconazole (1984)
• Terconazole (1980)
• Voriconazole (2002)
• Fluconazole (1988)
• Fosfluconazole (2004)
• Posaconazole (2005)

Page 7: ChEMBL US tour December 2014

Overview of EMBL-EBI Chemistry Resources

UniChem: InChI-based resolver (full + relaxed ‘lenses’)

• 3rd Party Data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, ….
• ChEMBL: Bioactivity data from literature and depositions
• ChEBI: Structures and metadata for metabolites. Chemical Ontology
• Atlas: Ligand-induced transcript response
• PDBe: Ligand structures from structurally defined protein complexes
• SureChEMBL: Ligand structures from patent literature

Interfaces: RDF and REST API interfaces; REST API Interface

Counts shown on the diagram: 15K, 750, >15M, 1.5M, 40K, ~75M
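As a rough illustration of the REST interface named on this slide, the sketch below only builds request URLs for a compound record and for its bioactivities. The base URL, the resource names (`molecule`, `activity`) and the `molecule_chembl_id` filter are assumptions based on the public ChEMBL web services, not details taken from the slides, and `CHEMBL25` is just an example identifier.

```python
from urllib.parse import urlencode

# Assumed base URL of the ChEMBL web services; not stated in the slides.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id: str) -> str:
    """Return the URL for a single compound record in JSON format."""
    return f"{BASE_URL}/molecule/{chembl_id}.json"


def activity_url(chembl_id: str, limit: int = 20) -> str:
    """Return the URL listing bioactivities reported for one compound."""
    query = urlencode({"molecule_chembl_id": chembl_id, "limit": limit})
    return f"{BASE_URL}/activity.json?{query}"


if __name__ == "__main__":
    print(molecule_url("CHEMBL25"))
    print(activity_url("CHEMBL25"))
```

The functions make no network calls, so the resulting URLs can be inspected before being passed to any HTTP client.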

Page 8: ChEMBL US tour December 2014

ChEMBL  

Page 9: ChEMBL US tour December 2014

What Is the ChEMBL Data?

Page 10: ChEMBL US tour December 2014

What Is the ChEMBL Data?

SAR Data

Compound

>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLERECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGTNYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYTTDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVTTHGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGYCDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLFEKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDRWVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWRENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTANVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGGPFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE

Assay
• Inhibition of human Thrombin: Ki = 4.5 nM
• PTT (partial thromboplastin time): ED2 = 230 nM
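The SAR data on this slide links a compound to an assay, a target and a measured value. A minimal sketch of such a record follows; the `Activity` class and its field names are hypothetical and are not the ChEMBL schema, "compound X" is a placeholder name, and pairing Ki with the thrombin inhibition assay and ED2 with the PTT assay is an assumption read from the slide layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Activity:
    """One measured data point linking a compound, an assay and a target."""

    compound: str            # compound name or identifier (placeholder here)
    assay: str               # free-text assay description
    target: Optional[str]    # target name, or None when no single target applies
    endpoint: str            # e.g. "Ki" or "ED2"
    value: float             # numeric result
    units: str               # e.g. "nM"

    def summary(self) -> str:
        """Format the record as '<assay>: <endpoint> = <value> <units>'."""
        return f"{self.assay}: {self.endpoint} = {self.value:g} {self.units}"


records = [
    Activity("compound X", "Inhibition of human Thrombin", "Thrombin", "Ki", 4.5, "nM"),
    # Functional assay: target left unassigned in this sketch.
    Activity("compound X", "PTT (partial thromboplastin time)", None, "ED2", 230, "nM"),
]

if __name__ == "__main__":
    for record in records:
        print(record.summary())
```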

Page 11: ChEMBL US tour December 2014

ChEMBL Target Types

• Protein, e.g. PDE5
• Protein complex, e.g. Nicotinic acetylcholine receptor
• Protein family, e.g. Muscarinic receptors
• Nucleic Acid, e.g. DNA
• Sub-cellular fraction, e.g. Mitochondria
• Cell line, e.g. HEK293 cells
• Tissue, e.g. Trachea
• Organism, e.g. Drosophila

Page 12: ChEMBL US tour December 2014

ChEMBL

• The world’s largest primary public database of medicinal chemistry data
  – ~1.6 million compounds, ~10,000 targets, ~12 million bioactivities
• Truly Open Data: CC-BY-SA license
• ChEMBL data also loaded into BindingDB, PubChem BioAssay and BARD

https://www.ebi.ac.uk/chembl

A. Gaulton et al. (2012) Nucleic Acids Research Database Issue, 40, D1100-1107

Page 13: ChEMBL US tour December 2014

SureChEMBL https://www.surechembl.org

• New public chemistry patent resource
• ‘Acquired’ SureChem product from Digital Science
  – Automatically extracted chemical structures from full-text patents
  – ~15 million chemical structures
  – Updated daily
  – Plan to add molecular target, sequence, disease, animal model, cell-line indexing….

Page 14: ChEMBL US tour December 2014

https://www.ebi.ac.uk/chembl

Page 15: ChEMBL US tour December 2014

About ChEMBL

Page 16: ChEMBL US tour December 2014

Compound View - 1

Page 17: ChEMBL US tour December 2014

Compound View - 2

Page 18: ChEMBL US tour December 2014

Compound View - 3

Page 19: ChEMBL US tour December 2014

Compound View - 4

Page 20: ChEMBL US tour December 2014

Target Search

Page 21: ChEMBL US tour December 2014

Browse Targets

Page 22: ChEMBL US tour December 2014

Browse Targets - Organism

Page 23: ChEMBL US tour December 2014

Browse Drugs

Page 24: ChEMBL US tour December 2014

Drugs

Page 25: ChEMBL US tour December 2014

Targets of Launched Drugs

Overington et al., Nat. Rev. Drug Disc., 5, pp. 993-996 (2006)

Page 26: ChEMBL US tour December 2014

Drug Targets and Drugs

Santos et al., unpublished

Page 27: ChEMBL US tour December 2014

Different Types of Drugs

Santos et al., unpublished

[Figure: two charts, "Drugs Approved 2013" and "Assigned USANs 2013", broken down by the drug types below.]

• Synthetic small molecule
• Natural product-derived small molecule
• Monoclonal antibody
• Other protein
• Polymer
• Peptide
• Oligonucleotide
• Oligosaccharide
• Inorganic
• Other

Page 28: ChEMBL US tour December 2014

Affinity of Drugs for their ‘Targets’

Ki, Kd, IC50, EC50, & pA2 endpoints for drugs against their ‘efficacy targets’

[Histogram: frequency (0 to 400) against -log10 affinity, from 2 (10 mM) to 12 (1 pM).]

Overington et al., Nature Rev. Drug Discov. 5, pp. 993-996 (2006)
Gleeson et al., Nature Rev. Drug Discov. 10, pp. 197-208 (2011)
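The x-axis of this histogram is the negative base-10 logarithm of the affinity in molar units, so 100 nM corresponds to 7 and 1 pM to 12. A minimal sketch of that conversion follows; the function name and the unit table are illustrative choices, not part of ChEMBL.

```python
import math

# Multipliers converting common concentration units to molar.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def p_affinity(value: float, units: str) -> float:
    """Return -log10 of an affinity (Ki, Kd, IC50, EC50) given in `units`."""
    if value <= 0:
        raise ValueError("affinity must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[units])


if __name__ == "__main__":
    # The thrombin Ki quoted earlier in the deck, 4.5 nM.
    print(round(p_affinity(4.5, "nM"), 2))  # 8.35
```

pA2 values are already on a logarithmic scale, so they would be plotted directly rather than passed through this conversion.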

Page 29: ChEMBL US tour December 2014

Privileged Target Families

• Rhodopsin-like GPCR (PDBe: 3sn6)
• Ion channels (PDBe: 4kfm)
• Nuclear receptors (PDBe: 3e00)
• Protein kinases (PDBe: 4foc)

22% of drug targets, 33% of small mol drugs

12% of drug targets, 18% of small mol drugs

6% of drug targets, 17% of small mol drugs

13% of drug targets, 2.4% of small mol drugs

Over 53% of all targets and 70% of drugs modulate these four target classes
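The closing totals can be checked by summing the four pairs of percentages. The short sketch below does only that arithmetic; the pairs are listed in the order they appear on the slide, and no assignment of a pair to a particular family is assumed.

```python
# (share of drug targets %, share of small molecule drugs %) for the four
# families, in the order the pairs appear on the slide.
shares = [(22.0, 33.0), (12.0, 18.0), (6.0, 17.0), (13.0, 2.4)]

target_total = sum(targets for targets, _ in shares)
drug_total = sum(drugs for _, drugs in shares)

if __name__ == "__main__":
    print(f"targets: {target_total:.1f}%, small molecule drugs: {drug_total:.1f}%")
    # targets: 53.0%, small molecule drugs: 70.4%
```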

Page 30: ChEMBL US tour December 2014

Santos, unpublished

Privileged Target Families (panels: ChEMBL17, Drugs)

Page 31: ChEMBL US tour December 2014

NFκB Pathway

Page 32: ChEMBL US tour December 2014

FDA Approved Drugs

Page 33: ChEMBL US tour December 2014

Clinical Candidates

Page 34: ChEMBL US tour December 2014

Clinical Candidates

Page 35: ChEMBL US tour December 2014

Clinical Candidates

• Database of clinical development candidates
  – Contains ~12,000 2-D structures/sequences
    • Estimated size ~35-45,000 compounds
  – Work in progress
    • Deeper coverage of key gene families
    • e.g. Protein kinases, 399 distinct clinical candidates

Page 36: ChEMBL US tour December 2014

Pharma Industry Productivity

File Registration number vs. USAN date

[Chart: file registration number (0 to 800,000) against year (1960 to 2010); plotted series: Phase 2b date and ~Discovery date.]

Overington, unpublished

Page 37: ChEMBL US tour December 2014

Clinical Kinome

Overington, Al-Lazikani & Wennerberg, unpublished

Page 38: ChEMBL US tour December 2014

Kinase Inhibitors in Clinical Development

Overington, Bellis, Al-Lazikani & Wennerberg, unpublished

Page 39: ChEMBL US tour December 2014

Clinical Kinome

• 399 clinical-stage human small molecule protein kinase inhibitors
  – 29 Approved small molecule kinase inhibitors
  – 38 Phase 3
  – 143 Phase 2
  – 189 Phase 1
• Phase 1:2 ratio is atypical due to many kinase inhibitor trials being phase 1/2 oncology trials
• 2D structures for 311 of these
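The stage counts on this slide add up to the quoted total of 399, and the Phase 1:2 ratio and structure coverage follow directly from them. A small sketch of that arithmetic, using only the numbers on the slide (variable names are illustrative):

```python
# Clinical-stage kinase inhibitor counts as listed on the slide.
stage_counts = {"Approved": 29, "Phase 3": 38, "Phase 2": 143, "Phase 1": 189}

total = sum(stage_counts.values())
phase_1_to_2 = stage_counts["Phase 1"] / stage_counts["Phase 2"]

# Number of compounds with a 2D structure, also from the slide.
with_structure = 311
structure_coverage = with_structure / total

if __name__ == "__main__":
    print(total)                          # 399
    print(round(phase_1_to_2, 2))         # 1.32
    print(round(structure_coverage, 2))   # 0.78
```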

Page 40: ChEMBL US tour December 2014

Kinase Inhibitor Polypharmacology

US launched

Tofacitinib, Tozasertib (Ph. II)

Lapatinib, Gefitinib, Erlotinib

Staurosporine (no trials)

Sunitinib, Sorafenib, Imatinib, Dasatinib

Adapted from Ghoreschi et al., Nature Immunology 10, 356-360 (2009)

Page 41: ChEMBL US tour December 2014

GSK PKIS Data

Page 42: ChEMBL US tour December 2014

ChEMBL – Assay Reliability

Page 43: ChEMBL US tour December 2014

F.A. Krüger & J.P. Overington (2012) ‘Global analysis of small molecule binding to related protein targets’ PLoS Comp. Biol. 8, e1002333

Page 44: ChEMBL US tour December 2014

Differences Between Human And Rat Orthologs

Human vs Rat
[Scatter plot: pKd Human (-log(Kd) Human) against pKd Rat.]

Distribution of affinity differences
[Density plot: density against |human pKd - rat pKd|.]

Page 45: ChEMBL US tour December 2014

Differences Between Different Assays

Binding affinity in human and rat assays
[Scatter plot: pKd Assay1 against pKd Assay2.]

Distribution of inter-assay affinity differences
[Density plot: density against |human pKd - human pKd|.]

Page 46: ChEMBL US tour December 2014

Ortholog vs Inter-assay Differences

Density distributions of ortholog and inter-assay differences
[Density plot: density against pKii - pKij.]

Krüger, PLoS Comp. Biol. 8, e1002333, DOI:10.1371/journal.pcbi.1002333
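These three slides compare the spread of pKd differences between human and rat orthologs with the spread between repeated assays on the same target. The sketch below shows one way such absolute differences could be computed for paired measurements. It is not the method of the cited paper, and the numbers are made-up toy values chosen only to make the example run, not data from ChEMBL.

```python
from statistics import median


def abs_differences(pairs):
    """Return |a - b| for each (a, b) pair of pKd values."""
    return [abs(a - b) for a, b in pairs]


# Toy values for illustration only; NOT real measurements.
ortholog_pairs = [(7.0, 6.5), (8.0, 8.5), (6.0, 6.25), (9.0, 8.0)]     # (human, rat)
inter_assay_pairs = [(7.0, 7.5), (8.0, 7.75), (6.0, 6.5), (9.0, 8.5)]  # (assay 1, assay 2)

ortholog_diffs = abs_differences(ortholog_pairs)
inter_assay_diffs = abs_differences(inter_assay_pairs)

if __name__ == "__main__":
    print("median ortholog difference:", median(ortholog_diffs))
    print("median inter-assay difference:", median(inter_assay_diffs))
```

Comparing the two lists of differences, for example by their medians or full density estimates, indicates whether ortholog variation exceeds ordinary assay-to-assay noise.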

Page 47: ChEMBL US tour December 2014

ChEMBL – Domain Annotation

Page 48: ChEMBL US tour December 2014

Domain-level Annotation

• Site of binding is important in understanding and controlling function
  – Often several sites within the same target protein
• Recently annotated binding sites (where possible) for the entire ChEMBL target dictionary
  – Used Pfam domains: http://www.pfam.org

Page 49: ChEMBL US tour December 2014

Domain ‘poisoning’ of sequence queries

Krüger, BMC Bioinformatics, 13, S11, DOI:10.1186/1471-2105-13-S17-S11

Kinase SYK (Q64725), R. norvegicus

Phosphatase SH-PTP2 (P35235), R. norvegicus

Page 50: ChEMBL US tour December 2014

Domain-level Binding Sites

Depleted and Enriched Pfam Domains

Pfam domain        Value
Neur_chan_memb     -1.63
zf-C4              -0.94
ANF_receptor       -0.88
SH2                -0.83
Pkinase_C          -0.70
fn3                -0.53
SH3_1              -0.51
Lig_chan           -0.50
C2                 -0.50
C1_1               -0.50
Guanylate_cyc      -0.46
HATPase_c          -0.46
I-set              -0.44
adh_short          -0.39
PH                 -0.39
Ank                -0.39
…..
Metallophos         0.35
Phospholip_A2_1     0.38
Peptidase_M10       0.41
Asp                 0.45
SNF                 0.48
Hist_deacetyl       0.48
Carb_anhydrase      0.50
Peptidase_C1        0.51
Trypsin             0.51
Beta-lactamase      0.57
p450                1.00
Hormone_recep       1.19
Ion_trans           1.66
Neur_chan_LBD       2.02
Pkinase_Tyr         2.12
Pkinase             5.87
7tm_1               7.30

Krueger and Overington, unpublished

Page 51: ChEMBL US tour December 2014

Binding Between Multiple Domains

Identified only 12 multi-domain architectures (corresponding to 120 ChEMBL targets) with ligand binding mediated via more than one domain.

PDBe: 3goi

Krüger, BMC Bioinformatics, 13, S11, DOI:10.1186/1471-2105-13-S17-S11

Page 52: ChEMBL US tour December 2014

https://www.ebi.ac.uk/chembl/research/ppdms

Page 53: ChEMBL US tour December 2014

Better prediction of pathway perturbation

Overington, unpublished

Page 54: ChEMBL US tour December 2014

Domain-specific modulation: mTOR

[Figure: mTOR domain architecture (HEAT repeat, FAT, FRB, kinase, RD, FATC) with its interaction partners and the two complexes, mTORC1 and mTORC2.]

Compounds: Sirolimus (rapamycin), PI-103
Partner labels: Rictor, mSIN1, mLST8, Raptor, Tel2, FBXW7, DEPTOR, FKBP-12, FKBP-38, Rheb, S6K1, PRAS40
Annotations: FKBP-12 binding; mTORC binding; Immunosuppression, Cancer; Cancer

Overington, unpublished

Page 55: ChEMBL US tour December 2014

Acknowledgements

ChEMBL Database: Anne Hersey, Anna Gaulton, Mark Davies, Michal Nowotka, George Papadatos, Jon Chambers, Louisa Bellis, Rita Santos, Gerard Van Westen, Ruth Akhtar, Francis Atkinson, Patricia Bento, Ramesh Donadi, John Paul Overington

Institute of Cancer Research: Bissan Al-Lazikani, Paul Workman

FIMM, Helsinki: Krister Wennerberg

University of Dundee: Andrew Hopkins

http://chembl.blogspot.com