Protein Ligand Interactions Docking (Part 2) Andrea Schafferhans December 2013

Page 1: Protein Ligand Interactions Docking

Protein Ligand Interactions Docking

(Part 2) Andrea Schafferhans

December 2013

Page 2: Protein Ligand Interactions Docking

Similar proteins have similar interaction partners

(?)

2

Page 3: Protein Ligand Interactions Docking

Interactions with (small) molecules

•  Protein (structure) recap
•  Binding sites
•  Physical chemistry recap
•  Binding site analysis
•  Docking

[…] alone, with his image of key and lock, has exerted such a great influence on drug research! Emil Fischer would surely have been happy and proud had he been able to see the results of X-ray structure analyses of protein-ligand complexes, e.g. of retinol (= vitamin A) embedded in the retinol-binding protein, a transport protein for this molecule (Fig. 4.1).

Many binding sites discriminate extremely specifically between chemically closely related analogues. In protein biosynthesis not the slightest mishap may occur. Friedrich Cramer studied in detail the recognition mechanisms for the incorporation of the amino acids valine and leucine. These two amino acids differ in their side chains only by the exchange of a methyl group for an ethyl group. The smaller valine should fit readily into the "lock" for leucine, but perhaps bind somewhat less tightly. An unambiguous distinction, as is absolutely required for error-free protein synthesis, can only be achieved by recognition that is repeated several times. That is exactly the case. A "suspicious" check, repeated several times at the cost of energy, lowers the error rate to below 1 : 200 000. Because of this strict control with feedback, however, even the correct binding partners succeed only part of the time. More than 80 % are rejected as "doubtful". On balance this still gives an accuracy of about 1 : 40 000.

The retinol-binding protein is less selective. Evidently such high accuracy is not required here for proper function. Besides the "extended" retinol it also binds a "bent" isomer and chemically related substances. Still other proteins discriminate very little. Examples are the digestive enzymes (Section 23.3), enzymes for metabolic conversions, e.g. the cytochromes (Section 27.6), and the glycoprotein GP-170, which is responsible for the resistance of tumour cells (Section 30.7). A transport protein from bacteria, the oligopeptide-binding protein A (OppA), can bind arbitrary peptides of two to five amino acids in its binding pocket with roughly equal affinity, an extreme case of "chemical promiscuity".

Linus Pauling transferred the lock-and-key principle to the transition state of an enzymatic reaction. During binding of the substrate a flexible adaptation often takes place. The transition state of the reaction binds more tightly to the enzyme than the substrate or the product (Section 22.3). It is stabilised by the functional groups of the binding site. The lock-and-key principle has been questioned repeatedly because of the mobility of the ligand and the binding site. But even in a security lock the pins are movable and thus an essential part of the mechanism.

Daniel E. Koshland developed the theory of induced fit in the 1950s. It states that a ligand, by binding to the protein, induces a change in conformation. This change creates the precondition for a particular effect, e.g. the enzymatic cleavage of the substrate. This mechanism does not contradict the lock-and-key principle because, as described, a security lock also has moving parts. Small induced adaptations […]

4. Protein-ligand interactions as the basis of drug action (p. 51)

Fig. 4.1 Like a key in a lock, vitamin A (= retinol) lies in the binding pocket of its transport protein. The surface of the ligand is shown in green. Of the protein, only the immediate surroundings of the binding pocket are visible. For clarity, the parts of the protein lying in front of and behind the binding site have been hidden.

Image Source: Wirkstoffdesign: Entwurf und Wirkung von Arzneistoffen by Gerhard Klebe

3

Page 4: Protein Ligand Interactions Docking

Analysing protein ligand interactions

Properties to search:
•  Ionic interactions / salt bridges
•  Hydrogen bonds
•  Metal coordination
•  Cation-Pi interactions
•  Pi stacking
•  Hydrophobic interactions
•  Clashes

4

http://www.chemcomp.com/journal/ligintdia.htm

Page 5: Protein Ligand Interactions Docking

Characterising (empty) binding sites

Properties to characterise:
•  Geometry
•  Amino acid composition
•  Solvation
•  Hydrophobicity
•  Electrostatics
•  Interactions with functional groups

5

Page 6: Protein Ligand Interactions Docking

Hydrophobicity

Measured by logP (partitioning between water and octanol)
•  Map atom / residue based contributions
•  Calculate interaction energies of hydrophobic probes (e.g. GRID)

6

Page 7: Protein Ligand Interactions Docking

Electrostatics

•  Map electrostatic potential onto surface (e.g. using DelPhi, see http://structure.usc.edu/howto/delphi-surface-pymol.html)

•  Caution: the potential depends on the assumed protonation states!

7

Page 8: Protein Ligand Interactions Docking

Functional groups

•  SuperStar
   –  Analyse the spatial distribution of functional groups in CSD → density maps
   –  Break the protein into fragments found in CSD
   –  Map the observed distribution of interaction partners onto the protein

Verdonk ML, Cole JC, Taylor R: SuperStar: a knowledge-based approach for identifying interaction sites in proteins. Journal of molecular biology 1999, 289:1093-108.

8

Page 9: Protein Ligand Interactions Docking

Binding site comparison

•  Align structures in 3D
•  Analyse differences and similarities of
   –  Amino acid composition
   –  Local conformation
   –  Pocket size
   –  Presence of interaction partners
•  Straightforward in case of
   –  Sequence similarity or
   –  Structural similarity

9

Page 10: Protein Ligand Interactions Docking

RELIBASE

10

Page 11: Protein Ligand Interactions Docking

RELIBASE

•  Stores binding sites from PDB structures
•  Allows superposition of related binding sites
•  Computes differences between binding sites

Hendlich M, Bergner A, Günther J, Klebe G: Relibase: Design and Development of a Database for Comprehensive Analysis of Protein-Ligand Interactions. Journal of Molecular Biology 2003, 326:607-620.
http://relibase.ccdc.cam.ac.uk

11

Page 12: Protein Ligand Interactions Docking

Similar but not homologous binding sites

•  cAMP-dependent protein kinase (1cdk) with adenyl-imido-triphosphate

•  trypanothione reductase (1aog) with flavine-adenine-dinucleotide

12

Page 13: Protein Ligand Interactions Docking

Similar but not homologous binding sites

13

Graphics from www.ebi.ac.uk/pdbsum/

Page 14: Protein Ligand Interactions Docking

Similar but not homologous binding sites

14

Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406

Page 15: Protein Ligand Interactions Docking

Problems in binding site comparison

•  Automatically locate binding site
•  Capture important features in efficient representation
•  Search efficiently across all structures
   –  Find best superimposition
   –  Score the alignment

15

Page 16: Protein Ligand Interactions Docking

Binding site comparison methods

•  Representation by
   –  Coordinate set with physico-chemical or evolutionary properties
      •  Atoms
      •  Chemical groups
      •  Surface points
   –  3D shape descriptors
•  Superimposition by
   –  Geometric hashing
   –  Graph theory, clique search
•  Similarity measurement by
   –  RMSD
   –  Residue conservation
   –  Physico-chemical property similarity
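As a small illustration of the RMSD measure listed above, the following sketch computes the root-mean-square deviation between two binding-site point sets. It assumes the sets are already superimposed and paired one to one; the coordinates are made-up example values, not data from any structure.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) points.

    Assumes the points are already superimposed and paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical example: site_b is site_a shifted by 1 Å along z.
site_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
site_b = [(0.0, 0.0, 1.0), (1.5, 0.0, 1.0), (0.0, 1.5, 1.0)]
print(rmsd(site_a, site_b))
```

A rigid shift of every point by 1 Å gives an RMSD of 1 Å; in practice the optimal superposition has to be found first, which this sketch does not attempt.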

16

Page 17: Protein Ligand Interactions Docking

CavBase – Structure representation

•  Cavity detection with LIGSITE (stored in Relibase)
•  Cavity-flanking residues represented as pseudo-centers:
   –  Donor
   –  Acceptor
   –  Donor-Acceptor
   –  Aliphatic
   –  PI
   –  several per residue if necessary
•  Create graph:
   –  Nodes: pseudo-centers
   –  Edges: distances between the pseudo-centers

17

Graphics from Schmitt S, Kuhn D, Klebe G. Journal of molecular biology 2002, 323:387-406

Page 18: Protein Ligand Interactions Docking

CavBase – Alignment

Create associated graph:

Node: node from protein A and node from protein B with similar interaction properties

Edge: member nodes in protein A and B are connected
   –  member node distance < 12 Å
   –  distance difference < 2 Å

Find maximal common subgraph (Bron–Kerbosch) → similar arrangement of pseudo-centers in original graphs

18
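The associated-graph construction and clique search described on this slide can be sketched as follows. This is a minimal illustration, not the CavBase implementation: pseudo-centers are hypothetical `(type, (x, y, z))` tuples, "similar interaction properties" is simplified to identical type labels, and the 12 Å limit is read here as applying to the distance in both proteins. The clique search is the basic Bron–Kerbosch recursion without pivoting.

```python
import math
from itertools import combinations

def product_graph(site_a, site_b, max_dist=12.0, max_diff=2.0):
    """Build the associated graph of two pseudo-center sets.

    A node (i, j) pairs center i of site A with center j of site B when
    their types match. Two nodes are joined when the corresponding
    distances in A and B are both below max_dist and differ by less
    than max_diff (thresholds taken from the slide).
    """
    nodes = [(i, j)
             for i, (type_a, _) in enumerate(site_a)
             for j, (type_b, _) in enumerate(site_b)
             if type_a == type_b]
    adj = {node: set() for node in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # a center may be matched only once
        d_a = math.dist(site_a[i][1], site_a[k][1])
        d_b = math.dist(site_b[j][1], site_b[l][1])
        if d_a < max_dist and d_b < max_dist and abs(d_a - d_b) < max_diff:
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj

def bron_kerbosch(adj, r=frozenset(), p=None, x=None):
    """Yield all maximal cliques of the graph given as adjacency sets."""
    if p is None:
        p, x = set(adj), set()
    if not p and not x:
        yield r
        return
    for v in list(p):
        yield from bron_kerbosch(adj, r | {v}, p & adj[v], x & adj[v])
        p.remove(v)
        x.add(v)

# Hypothetical pseudo-centers: site_b contains site_a's arrangement,
# translated, plus one distant extra donor.
site_a = [("donor", (0, 0, 0)), ("acceptor", (3, 0, 0)), ("aliphatic", (0, 4, 0))]
site_b = [("acceptor", (13, 10, 10)), ("donor", (10, 10, 10)),
          ("aliphatic", (10, 14, 10)), ("donor", (30, 30, 30))]

best = max(bron_kerbosch(product_graph(site_a, site_b)), key=len)
print(sorted(best))
```

The largest clique pairs each center of site A with its counterpart in site B; the distant donor forms only a one-node clique and is ignored.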

Page 19: Protein Ligand Interactions Docking

CavBase – Scoring

•  Scoring based on overlap of similarly typed surface patches

Kuhn D, Weskamp N, Schmitt S, Hüllermeier E, Klebe G: From the Similarity Analysis of Protein Cavities to the Functional Classification of Protein Families Using Cavbase. Journal of Molecular Biology 2006, 359:1023-1044

19

Page 20: Protein Ligand Interactions Docking

SOIPPA – Structure representation

•  Delaunay tessellation of Cα atoms → 1 tetrahedron/Cα

•  Environmental boundary (red) and protein boundary (blue)

Xie L, Bourne PE: A robust and efficient algorithm for the shape description of protein structures and its application in predicting ligand binding sites. BMC Bioinformatics 2007, 8:S9.
Xie L, Bourne PE: A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009, 25:i305-312.

20

Page 21: Protein Ligand Interactions Docking

SOIPPA – Structure representation (2)

•  Each Cα characterized by
   –  Vector with distance and direction of boundaries → “geometric potential”
   –  Substitution matrix
•  Graph:
   –  Node: Cα
   –  Edge: connection of tetrahedra

21

Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.

Page 22: Protein Ligand Interactions Docking

SOIPPA – Alignment

Create associated graph:

Node: node(A) + node(B) with similar “geometric potential”; weight: amino acid frequency profile similarity

Edge: member nodes in protein A and B are connected; distance difference < 2 Å, surface normal difference < 30°

Find maximum-weight common subgraph (MWCS)

22

Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.

Page 23: Protein Ligand Interactions Docking

SOIPPA – Scoring

•  Sum over aligned residue pairs: residue similarity weighted by distance and normal vector angle

$$S = \sum_{i,j} M_{ij} \cdot p_{d_{ij}} \cdot p_{a_{ij}}$$

•  Statistical significance of score
   Background score distribution:
   –  compare unrelated structures with random sequences
   –  fit resulting score distribution to extreme value distribution → function giving probability of randomness dependent on score

23
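The two scoring steps on this slide can be illustrated with a short sketch: a score summed over aligned residue pairs, and a p-value from an extreme value (Gumbel) distribution fitted to background scores. The method-of-moments fit and all numbers are assumptions made for this example; the cited paper may estimate the parameters differently.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329

def alignment_score(pairs):
    """Sum of M_ij * p_d * p_a over aligned residue pairs."""
    return sum(m * p_d * p_a for m, p_d, p_a in pairs)

def fit_gumbel(scores):
    """Method-of-moments estimate of Gumbel location mu and scale beta."""
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta

def p_value(score, mu, beta):
    """Probability that a random alignment scores at least `score`."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p
    return -math.expm1(-math.exp(-z))

# Hypothetical aligned pairs: (residue similarity, distance weight, angle weight)
pairs = [(4.0, 0.9, 0.8), (2.0, 1.0, 0.5)]
# Hypothetical background scores from unrelated structures
background = [3.1, 4.0, 2.7, 5.2, 3.8, 4.4, 3.3, 6.1, 2.9, 4.9]

mu, beta = fit_gumbel(background)
print(alignment_score(pairs), p_value(5.0, mu, beta), p_value(12.0, mu, beta))
```

A score far above the background yields a much smaller p-value than one inside the background range, which is the "probability of randomness dependent on score" named on the slide.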

Xie L., Bourne PE. Bioinformatics 2009, 25:i305-312.

Page 24: Protein Ligand Interactions Docking

Example 1: Explaining side effects

Problem: side effects of ERα modulators (SERMs)

Finding “off target” effects:
•  Map sequences to structures (BLAST)
•  Limit to “druggable” proteins
•  Search with SOIPPA => SERCA (Sarcoplasmic Reticulum Ca2+ channel ATPase)

24

Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3(11)

Page 25: Protein Ligand Interactions Docking

Example 1: Validating results

•  Inverse search

25

Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3(11)

Distribution of Binding Site Similarity Scores from searching representative structures against SERCA for BHQ (A) and TG1 (B) Sites, Respectively

Page 26: Protein Ligand Interactions Docking

Example 2: Repositioning known drug

Problem: new tuberculosis drugs needed, but many parameters to optimise

Finding compound to reuse against InhA:
•  Search other structures binding Adenine (ATP, ADP, NAD, FAD, ...)
•  Compare binding sites with SOIPPA → SAM-dependent methyltransferases (e.g. COMT)

26

Kinnings SL, Liu N, Buchmeier N, Tonge PJ, Xie L, et al. (2009) Drug Discovery Using Chemical Systems Biology: Repositioning the Safe Medicine Comtan to Treat Multi-Drug and Extensively Drug Resistant Tuberculosis. PLoS Comput Biol 5(7)

Page 27: Protein Ligand Interactions Docking

Example 2: Structure match

catechol-O-methyltransferase (COMT), SAM, inhibitor
InhA, NAD, ligand

27

Page 28: Protein Ligand Interactions Docking

Docking

•  Scoring
•  Positioning
•  Example programs
•  Performance
•  Example application


28

Page 29: Protein Ligand Interactions Docking

Correlations?

[…] If one compares two ligands that differ only in the functional group forming the hydrogen bond to the protein, the affinity may increase, stay the same or even decrease.

An impressive example of the significance of hydrogen bonds is provided by the inhibitors 4.3 of the metalloprotease thermolysin, synthesised in Paul Bartlett's group. There, a phosphonamide –PO2NH– was replaced by a phosphinate –PO2CH2– or a phosphonate –PO2O–. The results of this exchange are summarised in Table 4.3. Although the X-ray structure shows that the NH group forms a hydrogen bond, it can nevertheless be replaced by a CH2 group without loss of binding affinity. This result becomes understandable if, in analogy to Fig. 4.6, we compare the number of hydrogen bonds before and after ligand binding for the phosphonamide and for the phosphinate. In both cases the number of hydrogen bonds remains unchanged. If, however, the NH group is replaced by an oxygen atom, the binding affinity drops by a factor of 1000. In water, the oxygen atom that has taken the place of the NH group can form a hydrogen bond to water. In the protein-ligand complex of the phosphonate –PO2O–, however, the electronegative oxygen atom sits directly opposite the oxygen atom of the carbonyl group of Ala 113. Two acceptor groups face each other. A hydrogen bond cannot be formed here. The hydrogen-bond balance remains uneven. In addition, the two groups repel each other, which results in the weaker binding. A similar case is shown in Table 4.4. Here the […]

4. Protein-ligand interactions as the basis of drug action (p. 61)

[Fig. 4.9: structural formulas of fidarestat 4.1 (left) and sorbinil 4.2 (right) next to Leu300 / Pro300. Values for the Leu → Pro exchange, fidarestat 4.1: ΔΔG: 7.8 kJ/mol, ΔΔH: 6.9 kJ/mol, −TΔΔS: 0.9 kJ/mol; sorbinil 4.2: ΔΔG: −0.8 kJ/mol, ΔΔH: 5.1 kJ/mol, −TΔΔS: −5.9 kJ/mol]

Fig. 4.9 Fidarestat 4.1 (left) forms a hydrogen bond with its carboxamide group to the NH function of Leu 300 (blue). After the exchange of leucine for proline (red) the hydrogen bond can no longer be formed. This leads to a ΔΔG loss of 7.8 kJ/mol, which is paid for essentially by an enthalpic price (ΔΔH: 6.9 kJ/mol). Sorbinil 4.2 (right) lacks the carboxamide group. The leucine → proline exchange leaves the free energy of binding ΔΔG practically unchanged. However, sorbinil binds to the wild type (leucine, blue) enthalpically more favourably and entropically less favourably than to the proline mutant (red). An incorporated water molecule mediates a hydrogen bond between sorbinil and Leu 300. This gives the wild type an enthalpy advantage of about 5 kJ/mol. At the same time, capturing a water molecule is entropically unfavourable for the wild type (–TΔΔS: ca. –6 kJ/mol) and compensates the enthalpic advantage.

Fig. 4.10 A plot of the binding constant Ki of 80 crystallographically studied protein-ligand complexes shows that Ki is not a direct function of the number of hydrogen bonds formed between protein and ligand.

Hydrophobic interactions are in many cases the dominant contribution to the free energy of binding. In Fig. 4.12 the lipophilic surface contributions buried upon complex formation are plotted against the experimentally determined binding constants for the same 80 protein-ligand complexes as in Fig. 4.10. Here too, the values scatter over a wide range.

4.10 Binding and mobility: compensation of enthalpy and entropy

According to Eq. 4.3, enthalpy and entropy are closely linked and together sum to the free energy of binding. Looking at the formation of protein-ligand complexes, ΔG between weakly binding millimolar complexes and strongly binding nanomolar examples falls within a window of about 35–55 kJ/mol. Lead optimisation (Chapter 8) usually covers a smaller range. Typically the binding constant is improved by 5–6 orders of magnitude, which corresponds to about 25–30 kJ/mol. The enthalpy ΔH, in contrast, usually varies over a considerably wider range when functional groups in a lead structure are exchanged. If the change in ΔG upon this exchange turns out much smaller, then, for numerical reasons alone, the change in enthalpy ΔH must be compensated by an opposing change in the entropy term –TΔS. Only in this way can large fluctuations in the latter two quantities leave ΔG within a small window. This raises an important question: is there a reason why enthalpy and entropy are antagonists and at least partially compensate each other during optimisation? And how can both quantities nevertheless be optimised without the two effects cancelling and ΔG remaining practically unchanged?
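The relation between binding constants and free energies used in this paragraph can be checked with a short calculation. The sketch assumes the standard relation ΔG = RT ln Ki (Ki as a dissociation-type constant in mol/l relative to a 1 M standard state) and a temperature of 298.15 K, neither of which is stated in the excerpt.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def delta_g(ki_molar, temperature=298.15):
    """Free energy of binding in kJ/mol from a binding constant Ki in mol/l."""
    return R * temperature * math.log(ki_molar)

# Free-energy change per order of magnitude in Ki at 298.15 K
per_decade = R * 298.15 * math.log(10)

print(round(per_decade, 2))       # kJ/mol per factor of 10 in Ki
print(round(5 * per_decade, 1))   # five orders of magnitude
print(round(delta_g(1e-9), 1))    # a nanomolar binder
```

Each factor of ten in Ki corresponds to roughly 5.7 kJ/mol at room temperature, so an improvement by five orders of magnitude is about 28.5 kJ/mol, in line with the 25–30 kJ/mol quoted in the text.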

Entropic optimisation aims at enlarging the hydrophobic surface that is buried upon binding. This very intuitive quantity expresses the fact that larger ligands displace an increasing number of water molecules upon binding. The synthesis of a rigid ligand with correctly frozen conformational degrees of freedom as a rule likewise improves the entropic binding contributions (Section 24.6). To increase the binding of a ligand to a protein enthalpically, above all additional polar interactions have to be introduced. But this generally comes at the price that the additional polar groups must first strip off their water shell. This desolvation contribution has to be paid. If an amidino group is attached in the para position to one of the two unsubstituted phenyl rings of the thrombin inhibitor 4.6, one obtains for 4.7 a marked improvement in affinity that goes along with a strong increase in the enthalpic contribution (Fig. 4.13). With the benzamidine group the inhibitor forms a salt bridge to an aspartate present in thrombin. It is thereby strongly fixed in space, which is entropically unfavourable. The inhibitor 4.6, which lacks the polar group, binds with similar geometry. It cannot, however, form the salt bridge. The structure points to an increased residual mobility of this inhibitor in the binding pocket, which is an advantage from the entropic point of view.

The two compounds 4.8 and 4.9 are also thrombin inhibitors. They differ only in the size of the cycloalkyl ring that was attached to the scaffold to fill a hydrophobic pocket of the protein. Both inhibitors have practically the same binding affinity for thrombin. However, their free energy of binding partitions very differently into enthalpy and entropy.

4. Protein-ligand interactions as the basis of drug action (p. 63)

Fig. 4.12 A plot, analogous to Fig. 4.10, of the binding constant Ki of 80 crystallographically studied protein-ligand complexes against the hydrophobic surface buried upon binding shows that Ki is not a simple function of this quantity either.

Ki dependence on number of H-bonds

Ki dependence on buried lipophilic surface area

Image Source: Wirkstoffdesign: Entwurf und Wirkung von Arzneistoffen by Gerhard Klebe

29

Page 30: Protein Ligand Interactions Docking

Good binders

•  Lipophilic contacts help (entropy)
•  H-bonds unpredictable (may be canceled by desolvation)
•  BUT: Don’t bury polar protein atoms
•  Rigid ligands can bind more strongly (entropy)

30

Page 31: Protein Ligand Interactions Docking

Calculating bound state

•  Prerequisites:
   –  distinguish likely and unlikely conformations
   –  distinguish good binders and bad binders

•  Assumption: bound state has one defined minimum

31

Page 32: Protein Ligand Interactions Docking

Intramolecular forces

Conformational analysis

Even when assembling a mechanical molecular model one notices that rotations can be carried out about individual bonds. This gives the molecule a different shape or, as the chemist says, converts it into a different conformation. In a real molecule the rotations about these bonds are not completely free. They are subject to a potential; during rotation the molecule "snaps" into energetically favourable positions at certain angles. The simplest case is n-butane (Fig. 16.1). The central torsion angle describes the relative position of the two bonds to the methyl groups. If n-butane is rotated out of the trans position at 180°, then at 120° and 240° a methyl group on the "front" carbon and a hydrogen atom on the "rear" carbon are eclipsed. They come close to each other, so this geometry is unfavourable for steric reasons. At 60° and 300° the substituents are staggered again, and an energetically more favourable situation is reached. Because of the spatial proximity of the methyl groups, which are now said to be gauche to each other, this geometry is nevertheless somewhat less favourable than the trans arrangement. The situation becomes even less favourable when the two methyl groups are eclipsed, one behind the other (0°, 360°). Rotations about the bonds to the terminal methyl groups affect the conformational energy only slightly.

[Figure 16.1: energy (kJ/mol) of n-butane plotted against the torsion angle (0° to 360°); gauche minima at 60° and 300° (3.8 kJ/mol), trans minimum at 180°, barriers of 14.6 kJ/mol (120°, 240°) and 25.5 kJ/mol (0°, 360°)]

Fig. 16.1 Butane, CH3CH2CH2CH3, consists of a linear chain of four carbon atoms. If, on rotation about the central C–C bond, the two terminal methyl groups are eclipsed, the torsion angle of the central bond is 0°. At 60° the bond to the "rear" methyl group bisects the angle between the "front" methyl group and a hydrogen. This situation is called the gauche arrangement. At 120° a methyl group and a hydrogen atom are eclipsed. At 180° the terminal methyl groups are opposite each other. Here the energetically most favourable position, the trans arrangement, is reached. From here on the rotation proceeds mirror-symmetrically, ending after 360° back at the starting position. The arrangement at 120° and 240° is 14.6 kJ/mol less favourable than the arrangement at 180°. The gauche arrangements at 60° and 300° are relative or local minima. They lie 3.8 kJ/mol above the global minimum at 180°. The geometry at 0° and 360° is the least favourable and lies 25.5 kJ/mol higher. If one wants to reach the three minima of the potential curve with a minimisation procedure that can only run "downhill", one can start, for example, at the points 110°, 130° and 350°.
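The downhill minimisation mentioned at the end of the caption can be tried out on a model curve. The sketch below assumes a three-term cosine series whose coefficients were fitted for this example to the four energies given in the caption (25.5, 3.8, 14.6 and 0 kJ/mol at 0°, 60°, 120° and 180°); it is not the potential used in the book. Because only four points are fitted, the gauche minima of the model lie a few degrees away from 60° and 300°.

```python
import math

# Cosine-series coefficients in kJ/mol, fitted here to the caption's values
# (an assumption of this sketch, not taken from the source).
A0, A1, A2, A3 = 10.3833, 4.9, 2.3667, 7.85

def energy(phi_deg):
    """Model torsional energy of n-butane in kJ/mol."""
    p = math.radians(phi_deg)
    return A0 + A1 * math.cos(p) + A2 * math.cos(2 * p) + A3 * math.cos(3 * p)

def gradient(phi_deg):
    """Derivative dE/dphi in kJ/mol per radian."""
    p = math.radians(phi_deg)
    return -(A1 * math.sin(p) + 2 * A2 * math.sin(2 * p) + 3 * A3 * math.sin(3 * p))

def minimise(phi_deg, step=0.005, iterations=2000):
    """Plain steepest descent: always walks downhill from the start angle."""
    p = math.radians(phi_deg)
    for _ in range(iterations):
        p -= step * gradient(math.degrees(p))
    return math.degrees(p)

for start in (110, 130, 350):
    end = minimise(start)
    print(start, round(end, 1), round(energy(end), 1))
```

Starting at 110° the descent ends in the gauche minimum near 60°, from 130° it reaches the trans minimum at 180°, and from 350° it ends in the second gauche minimum near 300°. No single start finds all three minima, which is why several starting points are needed.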

Image Source: Wirkstoffdesign: Entwurf und Wirkung von Arzneistoffen by Gerhard Klebe

32

Page 33: Protein Ligand Interactions Docking

Typical molecular mechanics force fields

Part III · Experimental and theoretical methods (p. 228)

[…] In 1946 it was proposed for the first time that the three terms van der Waals interaction, bond stretching and angle deformation should suffice to calculate the structure and energy of molecules. At that time, however, carrying out the corresponding calculations was still extremely difficult. Only with the availability of computers did molecular mechanics calculations gain importance. A typical force field in use today contains, in addition to the three terms originally proposed, at least one further contribution that accounts for the twisting of dihedral angles (Fig. 15.1). Many force fields furthermore use a term for the electrostatic interactions. For this, a partial charge has to be assigned to each atom. The sum of these charges gives the formal charge of the whole molecule. As a rule this is set to zero.

To describe the forces acting between charges, Coulomb's law is used. This law evaluates the product of the interacting charges in inverse proportion to their distance. Critical for a correct treatment of the […]

$$E = E_{\text{bond length}} + E_{\text{bond angle}} + E_{\text{torsion}} + E_{\text{non-covalent}}$$

$$E = \sum_{\text{bonds}} \tfrac{1}{2} K_b (b - b_0)^2 + \sum_{\text{bond angles}} \tfrac{1}{2} K_\theta (\theta - \theta_0)^2 + \sum_{\text{torsion angles}} \tfrac{1}{2} K_\varphi \left(1 + \cos(n\varphi - \delta)\right) + \sum_{\text{non-bonded atom pairs}} \left( \frac{A_{ij}}{r_{ij}^{12}} - \frac{C_{ij}}{r_{ij}^{6}} + \frac{q_i q_j}{D\, r_{ij}} \right)$$

Fig. 15.1 E is the total energy of a molecule or of a complex of several molecules. It is composed of several contributions. The first term describes the energy change on stretching or compressing a chemical bond. In the example shown it is a so-called harmonic potential with the force constant $K_b$ and the equilibrium bond length $b_0$ as parameters. The energy as a function of the bond angles $\theta$ is captured by the second term. Here too a harmonic potential with a force constant $K_\theta$ and an equilibrium value $\theta_0$ is used. The third contribution describes the change in energy when the torsion angles change, and the last term stands for non-covalent interactions. For this last contribution a sum of three terms is used. The first term $A_{ij}/r_{ij}^{12}$ is always positive and rises steeply with decreasing distance. It describes the repulsion between atoms that come too close to each other. The parameter $A_{ij}$ is proportional to the sum of the atomic radii of atoms i and j. The contribution $-C_{ij}/r_{ij}^{6}$ is always negative and tends to zero with increasing distance $r_{ij}$, though not quite as fast as the repulsion term. It describes attractive interactions that are also called dispersion interactions. Between polar molecules further attractive interactions exist that are likewise proportional to $1/r_{ij}^{6}$ (for the shape of the potential see Section 18.10, Fig. 18.5). The last term $q_i q_j/(D\, r_{ij})$ describes the electrostatic interactions, represented with a point-charge model. $D$ is the dielectric constant. The non-covalent contributions to the total energy, excluding the electrostatic term, are also called the van der Waals energy.
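The individual terms of the force-field expression in Fig. 15.1 can be written down directly. The following is a minimal sketch, not the parameterisation of any real force field: all parameter values are invented for illustration, and the Coulomb prefactor (1389.35 kJ·mol⁻¹·Å per squared elementary charge) is added here so that charges can be given in units of e and distances in Å.

```python
import math

COULOMB_KJ = 1389.35  # e^2 / (4*pi*eps0) in kJ mol^-1 Å (assumed unit system)

def harmonic(value, equilibrium, force_constant):
    """Bond-stretch or angle-bend term: 1/2 K (x - x0)^2."""
    return 0.5 * force_constant * (value - equilibrium) ** 2

def torsion(phi_deg, force_constant, n, delta_deg=0.0):
    """Torsion term: 1/2 K (1 + cos(n*phi - delta))."""
    return 0.5 * force_constant * (1.0 + math.cos(math.radians(n * phi_deg - delta_deg)))

def nonbonded(r, a_ij, c_ij, q_i, q_j, dielectric=1.0):
    """Repulsion A/r^12, dispersion -C/r^6 and point-charge electrostatics."""
    return a_ij / r ** 12 - c_ij / r ** 6 + COULOMB_KJ * q_i * q_j / (dielectric * r)

# Hypothetical Lennard-Jones parameters for a well depth of 1 kJ/mol at 4 Å:
# A = eps * r_min^12, C = 2 * eps * r_min^6
a_ij, c_ij = 4.0 ** 12, 2.0 * 4.0 ** 6

print(harmonic(1.6, 1.5, 2000.0))            # bond stretched by 0.1 Å
print(torsion(0.0, 10.0, 3))                 # eclipsed position of a threefold torsion
print(nonbonded(4.0, a_ij, c_ij, 0.0, 0.0))  # uncharged pair at the van der Waals minimum
print(nonbonded(3.0, a_ij, c_ij, 0.5, -0.5, dielectric=4.0))
```

Summing these functions over all bonds, angles, torsions and non-bonded atom pairs gives the total energy E of the figure.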

Image Source: Wirkstoffdesign: Entwurf und Wirkung von Arzneistoffen by Gerhard Klebe

33

Page 34: Protein Ligand Interactions Docking

Force field based scoring functions

•  Only ΔH!
•  Most used: CHARMM, AMBER

[Screenshot from a CHARMM tutorial; the equations are not recoverable]

The Energy function […] consists of two terms […]

We note in passing that CHARMM can compute energies and forces with several non-CHARMM forcefields, including converted versions of the AMBER and OPLS-AA forcefield, but also more generic force fields, such as the Merck force field. (…) Use of these force fields is beyond the scope of this tutorial.

34

Page 35: Protein Ligand Interactions Docking

Empirical scoring functions

•  (Often) in form similar to force fields, BUT:
•  parameters from regression analysis of binding data
•  desolvation term
•  Examples: ChemScore, GlideScore, AutoDock

AutoDock scoring function
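The AutoDock function itself is not reproduced here. As a minimal sketch of what "parameters from regression analysis of binding data" means, the following Python fits the weights of a linear scoring function by ordinary least squares. The descriptor columns, the data and the "true" weights are invented for the illustration and do not correspond to any published scoring function.

```python
def fit_weights(features, affinities):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(features, affinities)) for i in range(n)]
    aug = [xtx[i] + [xty[i]] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("descriptor columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def score(weights, descriptor):
    """Linear empirical score: weighted sum of the descriptors of one complex."""
    return sum(w * x for w, x in zip(weights, descriptor))


# Synthetic training set: [constant, H-bonds, lipophilic contact area, rotatable bonds].
complexes = [[1, 2, 30, 1], [1, 4, 10, 3], [1, 1, 50, 0], [1, 3, 20, 5], [1, 0, 40, 2]]
hypothetical_weights = [5.0, -4.0, -0.2, 1.5]
measured = [score(hypothetical_weights, c) for c in complexes]  # stands in for binding data
fitted = fit_weights(complexes, measured)
```

With real binding data the fit is never exact, and the quality of the function depends heavily on the training set used for the regression.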


Page 36: Protein Ligand Interactions Docking

Knowledge-based scoring functions

•  Based on statistical observations of intermolecular contacts (CSD, PDB)

•  Based on Boltzmann distribution


Sippl MJ (1990) Calculation of conformational ensembles from potentials of mean force. J Mol Biol 213: 859–883 http://en.wikipedia.org/wiki/Statistical_potential

Page 37: Protein Ligand Interactions Docking

Example: DrugScore

Atom pair interactions:

$$\Delta W_{i,j}(r) = -\ln\frac{g_{i,j}(r)}{g(r)}$$

$$g_{i,j}(r) = \frac{N_{i,j}(r)/4\pi r^2}{\sum_r N_{i,j}(r)/4\pi r^2}, \qquad g(r) = \frac{\sum_i \sum_j g_{i,j}(r)}{i \cdot j}$$

Solvent effects:

$$\Delta W_i(SAS, SAS_0) = -\ln\frac{g_i(SAS)}{g_i(SAS_0)}$$

$$g_i(SAS) = \frac{N_i(SAS)}{\sum_{SAS} N_i(SAS)}$$

Gohlke, H. et al. (2000) Knowledge-based scoring function to predict protein-ligand interactions. J. Mol. Biol., 295, 337-56.
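The distance-dependent pair potential can be sketched in Python as follows. The sketch assumes that contact counts have already been collected into distance bins for every atom-type pair. The `floor` argument that protects against empty bins is an addition for the example and is not part of the formulas on the slide.

```python
import math


def pair_distribution(counts, bin_centers):
    """g_ij(r): counts divided by 4*pi*r^2, normalised over all distance bins."""
    weighted = [n / (4.0 * math.pi * r * r) for n, r in zip(counts, bin_centers)]
    total = sum(weighted)
    return [w / total for w in weighted]


def reference_distribution(distributions):
    """g(r): arithmetic mean of the distributions of all atom-type pairs."""
    n_pairs = len(distributions)
    return [sum(column) / n_pairs for column in zip(*distributions)]


def pair_potential(g_ij, g_ref, floor=1e-6):
    """Delta W_ij(r) = -ln(g_ij(r) / g(r)), evaluated bin by bin."""
    return [-math.log(max(g, floor) / max(ref, floor)) for g, ref in zip(g_ij, g_ref)]


# Invented contact counts for two atom-type pairs in three distance bins (Å).
centers = [2.0, 3.0, 4.0]
g_uniform = pair_distribution([40, 90, 160], centers)
g_short = pair_distribution([80, 90, 80], centers)
g_mean = reference_distribution([g_uniform, g_short])
w_short = pair_potential(g_short, g_mean)  # negative where contacts are enriched
```

A negative value marks a distance at which the pair occurs more often than the average pair, which the score treats as favourable.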

Excerpt (Gohlke et al., 2000):

… to the flexible docking of 32 thrombin inhibitors, the "energy score" reveals better results compared to the "chemical score".

Results

Ligand-protein atom pair correlation functions and statistical preferences

Frequency of occurrences of pair interactions

… less significance in this area. Of all pair distributions, 173 (60% of 289) contain less than 500 hits, i.e. less than ten hits per bin on average. For 156 (i.e. 54% of all) of these, one of the atom types of either the ligand or the protein belongs to S.3, P.3, C.cat, metal, F, Cl, or Br. The rare occurrence of S.3, F, Cl, and Br in the X-ray structures used to compile the statistical preferences is also reflected in the test data set. However, we anticipate that as long as interactions involving S.3/F/Cl/Br-X contacts do not dominate the energetics of ligand binding, a reliable scoring can still be estimated based on the contributions resulting from the remaining more frequently populated atom pairs.

Atoms of type P.3 are usually connected to oxygen, carbon and nitrogen atoms, resulting in phosphate, phosphonate or phosphinate derivatives. Being not exposed to the molecular surface, the major contributions of these functional groups are determined by the preferences arising from the highly populated distributions of the neighboring oxygen and nitrogen atoms. As a first approximation, the same holds for C.cat coding for the carbon atom in amidinium and guanidinium groups. These atoms are at least in the molecular plane shielded by the surrounding nitrogen atoms.

Reference state of pair interactions

The reference state is calculated as arithmetic mean over all normalized pair correlation functions of atom types i, j. It is represented by filled squares in Figures 1 and 2. Defined in this way, it may be regarded as a mean interaction preference between "averaged atom-types", thus mainly representing non-specific contributions from dense packing effects. The nature of certain features expressed in the distributions or the subsequently derived statistical …

Figure 1. Pair distribution functions of polar/charged interactions as computed by equation (9). The first atom-type label refers to the ligand atoms, the second to protein atoms. The reference state (mean distribution function over all pairs) as calculated by equation (6) is depicted by solid squares.

Figure 2. Pair distribution functions of nonpolar and aromatic interactions as computed by equation (9). Atom-type characterization and reference state are displayed as given in Figure 1.

Complete potential:

$$\Delta W = \gamma \sum_{k_i} \sum_{l_j} \Delta W_{i,j}(r) + (1-\gamma)\left[\sum_{k_i} \Delta W_i(SAS, SAS_0) + \sum_{l_j} \Delta W_j(SAS, SAS_0)\right]$$

Page 38: Protein Ligand Interactions Docking

Searching poses and conformations

Many degrees of freedom:
•  Relative position (3)
•  Relative orientation (3)
•  Rotatable bonds ligand
•  Rotatable bonds protein
→ Huge search space!
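A rough count shows how quickly the search space grows. The sketch below treats every degree of freedom as an independent grid, which is a crude assumption; the step counts are hypothetical.

```python
def pose_count(translation_steps, rotation_steps, torsion_steps, n_rotatable_bonds):
    """Grid poses for 3 translations, 3 rotations and one torsion per rotatable bond."""
    rigid_body = translation_steps ** 3 * rotation_steps ** 3
    return rigid_body * torsion_steps ** n_rotatable_bonds


# Hypothetical setup: 10 Å box in 0.5 Å steps, rotations and torsions in 30° steps.
rigid_only = pose_count(20, 12, 12, n_rotatable_bonds=0)
flexible_ligand = pose_count(20, 12, 12, n_rotatable_bonds=8)
```

Each additional rotatable bond multiplies the count by the number of torsion steps, which is why exhaustive enumeration is not practical for flexible ligands and the algorithms on the following slides sample the space instead.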


Page 39: Protein Ligand Interactions Docking

Typical algorithms

•  Shape matching (e.g. DOCK)
•  Incremental construction (e.g. FlexX, SLIDE)
•  Genetic algorithms (e.g. GOLD, AutoDock)
•  Simulated annealing (e.g. AutoDock)
•  Monte Carlo (e.g. MCDock)

Page 40: Protein Ligand Interactions Docking

Dock

•  Basic Idea:
   –  represent active site by set of spheres
   –  perform sphere matching
•  sphgen:
   –  Calculate molecular surface
   –  Fill active site with spheres
   –  Mark spheres with properties

I. Kuntz, J.M. Blaney, S.J. Oatley, R. Langridge and T.E. Ferrin. J. Mol. Biol., 161 (1982), pp. 269–288.

Image Sources: DOCK User manual, http://bit.ly/sHRvw5

Page 41: Protein Ligand Interactions Docking

Dock (2)

•  Match ligand atoms L and protein spheres K
   –  two matches (l1,k1), (l2,k2) are distance-compatible if |d(l1,l2) − d(k1,k2)| ≤ ε
   –  search for matchings M = {(li,ki)} with max |d(li,lj) − d(ki,kj)| ≤ ε
   –  Matching graph: nodes L × K, edges between distance-compatible nodes
   –  Matchings are cliques in the matching graph (cliques = completely connected subgraphs)
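The matching graph and its cliques can be sketched in plain Python. This is an illustration of the idea, not DOCK's implementation: the clique search is a basic Bron-Kerbosch enumeration, and the rule that each atom and each sphere is used at most once is made explicit here as an assumption.

```python
import itertools
import math


def matching_graph(ligand, spheres, eps):
    """Nodes are (atom index, sphere index) pairs; edges join distance-compatible nodes."""
    nodes = list(itertools.product(range(len(ligand)), range(len(spheres))))
    adjacency = {node: set() for node in nodes}
    for (l1, k1), (l2, k2) in itertools.combinations(nodes, 2):
        if l1 == l2 or k1 == k2:
            continue  # assumption: one atom per sphere and one sphere per atom
        d_ligand = math.dist(ligand[l1], ligand[l2])
        d_spheres = math.dist(spheres[k1], spheres[k2])
        if abs(d_ligand - d_spheres) <= eps:
            adjacency[(l1, k1)].add((l2, k2))
            adjacency[(l2, k2)].add((l1, k1))
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (Bron-Kerbosch without pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(current)
            return
        for node in list(candidates):
            expand(current | {node}, candidates & adjacency[node], excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(frozenset(), set(adjacency), set())
    return cliques


# Hypothetical coordinates: three ligand atoms and four site spheres.
ligand_atoms = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
site_spheres = [(10, 0, 0), (13, 0, 0), (10, 4, 0), (30, 30, 30)]
graph = matching_graph(ligand_atoms, site_spheres, eps=0.1)
best_matching = max(maximal_cliques(graph), key=len)
```

The largest clique assigns every ligand atom to a sphere with all pairwise distances preserved within ε. A rigid-body superposition of the matched points would then give the ligand orientation.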


I. Kuntz, J.M. Blaney, S.J. Oatley, R. Langridge and T.E. Ferrin. J. Mol. Biol., 161 (1982), pp. 269–288.

Page 42: Protein Ligand Interactions Docking

Dock (3)

•  Ligand flexibility: Anchor and grow
•  Scoring:
   –  Depends on level
   –  Many scoring functions available: e.g. bump check, AMBER, GRID

Page 43: Protein Ligand Interactions Docking

AutoDock

•  Receptor represented as grid potential
•  Scoring: Empirical scoring function
•  Starts from random location and orientation
•  Explores translations, rotations and conformation change
   –  Simulated annealing:
      •  Acceptance of new state depends on energy
      •  Tolerated energy barrier decreases as the temperature is lowered
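The acceptance rule and the cooling step can be sketched with a one-dimensional toy energy. The Metropolis criterion is the standard choice for simulated annealing. The cooling schedule, step size and energy function below are hypothetical and are not AutoDock's settings.

```python
import math
import random


def accept(delta_e, temperature, rng):
    """Metropolis criterion: downhill always, uphill with probability exp(-dE/T)."""
    return delta_e <= 0 or rng.random() < math.exp(-delta_e / temperature)


def anneal(energy, start, step, t_start=10.0, cooling=0.95, cycles=200,
           moves_per_cycle=50, seed=0):
    """Minimise energy(x) for a single coordinate x; returns (best x, best energy)."""
    rng = random.Random(seed)
    state, e_state = start, energy(start)
    best, e_best = state, e_state
    temperature = t_start
    for _ in range(cycles):
        for _ in range(moves_per_cycle):
            candidate = state + rng.uniform(-step, step)
            e_candidate = energy(candidate)
            if accept(e_candidate - e_state, temperature, rng):
                state, e_state = candidate, e_candidate
                if e_state < e_best:
                    best, e_best = state, e_state
        temperature *= cooling  # cooling: ever smaller barriers can be crossed
    return best, e_best


def double_well(x):
    """Toy energy with a local minimum near x = 1 and the global minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


best_x, best_e = anneal(double_well, start=1.0, step=0.5)
```

In a docking run the single coordinate is replaced by the translation, orientation and torsion angles of the ligand, and the energy comes from the grid potential.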

Goodsell, D. S. and Olson, A. J. (1990), Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins, 8: 195-202.

Page 44: Protein Ligand Interactions Docking

AutoDock (2)

–  Genetic algorithm:
   •  Movements encoded by bit string (gene)
   •  Re-combinations, scoring
   •  best gene selected
–  Lamarckian genetic algorithm (LGA):
   •  Same as GA, but energy minimisation before scoring
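The difference between the two variants can be sketched as follows. This is a toy version under several assumptions: genes are real numbers rather than bit strings, the minimiser is a simple coordinate-wise descent, and the population size, mutation width and selection scheme are invented for the example.

```python
import random


def local_search(energy, genes, step=0.1, iterations=20):
    """Coordinate-wise descent, standing in for the energy minimisation step."""
    genes = list(genes)
    for _ in range(iterations):
        for i in range(len(genes)):
            for delta in (step, -step):
                trial = genes[:i] + [genes[i] + delta] + genes[i + 1:]
                if energy(trial) < energy(genes):
                    genes = trial
    return genes


def evolve(energy, n_genes, lamarckian, population=20, generations=30, seed=0):
    """Minimise energy(genes); with lamarckian=True minimised children keep their new genes."""
    rng = random.Random(seed)
    pool = [[rng.uniform(-5, 5) for _ in range(n_genes)] for _ in range(population)]
    for _ in range(generations):
        pool.sort(key=energy)
        parents = pool[: population // 2]  # selection: keep the better half
        children = []
        while len(parents) + len(children) < population:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes) if n_genes > 1 else 0
            child = mother[:cut] + father[cut:]  # one-point crossover
            child[rng.randrange(n_genes)] += rng.gauss(0, 0.5)  # mutation
            if lamarckian:
                child = local_search(energy, child)  # minimise before scoring
            children.append(child)
        pool = parents + children
    return min(pool, key=energy)


def toy_energy(genes):
    """Hypothetical energy with its minimum at all genes equal to 1."""
    return sum((g - 1.0) ** 2 for g in genes)


ga_result = evolve(toy_energy, n_genes=3, lamarckian=False)
lga_result = evolve(toy_energy, n_genes=3, lamarckian=True)
```

Writing the minimised coordinates back into the genes is what makes the scheme "Lamarckian": improvements found during an individual's local search are inherited by its offspring.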

Goodsell, D. S. and Olson, A. J. (1990), Automated Docking of Substrates to Proteins by Simulated Annealing. Proteins, 8: 195-202.

Page 45: Protein Ligand Interactions Docking

Rarey,M. et al. (1996) A Fast Flexible Docking Method using an Incremental Construction Algorithm. Journal of Molecular Biology, 261, 470-489.

FlexX

•  Ligand flexibility
   Conformational search of torsion angles (knowledge based)
•  Geometric constraints
   Restrictive interactions guide the placement (esp. H-bonds)
•  Empirical scoring function
   Scores by deviation from “ideal” geometry plus entropy term

Excerpt (Rarey et al., 1996, p. 473):

Figure 1. Interaction geometries. a, Interaction center and surface pertained to a carbonyl group. b, Three of the four different types of interaction surfaces: cones, capped cones and spherical rectangles.

Table 1. Interaction types of FLEXX
H-acceptor                           H-donor
Metal acceptor                       Metal
Aromatic-ring-atom, methyl, amide    Aromatic-ring-center
In each row, the interaction types in the left and the right column can be matched. The interaction types aromatic-ring-center/aromatic-ring-atom are used to generate the preferred t-shaped arrangement of neighboring aromatic ring systems.

Figure 2. Condition for the formation of interactions: a hydrogen bond between the carbonyl oxygen and the nitrogen. The interaction centers are the oxygen and the hydrogen atom forming the hydrogen bond. They have to fall mutually on the surrounding interaction surfaces.

… shown in Table 1; geometries currently used in FLEXX are summarized in Figure 3.

Estimating the free energy of binding

The ranking of the generated solutions is performed using a scoring function similar to that developed by Böhm (1994) which estimates the free binding energy $\Delta G$ of the protein–ligand complex.

$$\Delta G = \Delta G_0 + \Delta G_{rot} \times N_{rot} + \Delta G_{hb} \sum_{\text{neutral H-bonds}} f(\Delta R, \Delta\alpha) + \Delta G_{io} \sum_{\text{ionic int.}} f(\Delta R, \Delta\alpha) + \Delta G_{aro} \sum_{\text{aro int.}} f(\Delta R, \Delta\alpha) + \Delta G_{lipo} \sum_{\text{lipo. cont.}} f^*(\Delta R) \qquad (1)$$

Here, $f(\Delta R, \Delta\alpha)$ is a scaling function penalizing deviations from the ideal geometry (see below) and $N_{rot}$ is the number of free rotatable bonds that are immobilized in the complex. The terms $\Delta G_{hb}$, $\Delta G_{io}$, $\Delta G_{rot}$, and $\Delta G_0$ are adjustable parameters. These values and the function $f$ are taken as developed by Böhm. In addition, we take into account the interactions of aromatic groups with the new parameter $\Delta G_{aro} = -0.7$ kJ/mol. The last term ($\Delta G_{lipo}$) is a modification of Böhm’s lipophilic contact energy. In the original function, this energetic contribution is intended to be proportional to the lipophilic contact area estimated with a grid method. However, our preliminary experiments with this definition of contact energy have generated placements that deviate markedly from the crystal structure. We therefore decided to calculate this term as a sum over all pairwise atom-atom contacts. It is essential that the function $f^*(\Delta R)$ in (1) account for contacts with a more or less ideal distance and penalize forbiddingly close contacts. For this reason, we choose:

$$f^*(\Delta R) = \begin{cases} 0 & \Delta R > 0.6\ \text{Å} \\ 1 - \dfrac{\Delta R - 0.2}{0.4} & 0.2\ \text{Å} < \Delta R \le 0.6\ \text{Å} \\ 1 & -0.2\ \text{Å} < \Delta R \le 0.2\ \text{Å} \\ 1 - \dfrac{-\Delta R - 0.2}{0.4} & -0.6\ \text{Å} < \Delta R \le -0.2\ \text{Å} \\ \dfrac{\Delta R + 0.6}{0.2} & \Delta R \le -0.6\ \text{Å} \end{cases}$$

with $\Delta R = R - R_0$. Here, $R$ is the distance between the atom centers and $R_0$ is its ideal value assumed to be the sum of both van-der-Waals radii, each increased by 0.3 Å.

The overall docking algorithm of FLEXX

The docking algorithm in FLEXX is based on an incremental construction strategy, which consists of three phases:

(1) Base selection. The first phase of the docking algorithm is the selection of a connected part of the ligand, the base fragment.
(2) Base placement. In the second phase, the base fragment is placed into the active site independently of the rest of the ligand.
(3) Complex construction. In the last phase, called the construction phase, the ligand is constructed in an incremental way, starting with the different placements of the base fragment.

In the current version of FLEXX, the base selection is performed interactively. The docking algorithm is quite sensitive to the selection of the base fragment. If a part of a ligand is selected that has
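The piecewise contact function $f^*(\Delta R)$ from the excerpt translates directly into code. In the sketch below the function names and the value used for $\Delta G_{lipo}$ are placeholders, since the excerpt does not state that parameter.

```python
def f_star(delta_r):
    """Piecewise-linear contact function f*(dR); delta_r = R - R0 in Å."""
    if delta_r > 0.6:
        return 0.0                                # too far apart: no contribution
    if delta_r > 0.2:
        return 1.0 - (delta_r - 0.2) / 0.4        # contact fades out with distance
    if delta_r > -0.2:
        return 1.0                                # near-ideal contact distance
    if delta_r > -0.6:
        return 1.0 - (-delta_r - 0.2) / 0.4       # slightly too close
    return (delta_r + 0.6) / 0.2                  # forbiddingly close: negative value


def lipophilic_term(contacts, dg_lipo):
    """dG_lipo times the sum of f*(R - R0) over all (R, R0) contact pairs."""
    return dg_lipo * sum(f_star(r - r0) for r, r0 in contacts)


# Hypothetical contacts as (distance, ideal distance) in Å and a placeholder weight.
example_contacts = [(4.0, 4.0), (4.4, 4.0), (3.0, 4.0)]
example_energy = lipophilic_term(example_contacts, dg_lipo=-0.2)
```

Because $f^*$ becomes negative below $\Delta R = -0.6$ Å, a clash turns the favourable (negative) lipophilic weight into a positive penalty.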

no clearly predominant directional interactions with the receptor, the docking algorithm obviously has problems in predicting the correct binding mode.

As the size of the base fragment and the number of putative interaction groups increases, so does the probability of predicting the correct binding mode of the base fragment, but the number of conformations of such a base fragment unfortunately increases, as well. This causes longer run times, because substantial internal flexibility of the base fragment cannot be handled efficiently in our base placement algorithm (otherwise we would not need the incremental construction). Thus, the number of potential interaction groups should be maximized while the number of alternative conformations of the base fragment should be minimized.

Once the base fragment is selected, the remaining part of the ligand is automatically divided into fragments. We obtain the best results if the fragments are small. Thus, we cut the remaining part of the ligand at each rotatable, acyclic single bond.

The FLEXX base placement algorithm

The algorithm used for placing the base fragment is described in detail by Rarey et al. (1996). Here, we only give a short summary of the method.

The goal of the base placement algorithm is to find positions of the base fragment in the active site such that a sufficient number of favorable interactions between the fragment and the protein can occur simultaneously. This problem is related to problems in the area of computer vision and pattern recognition. One problem in this area is to detect an object in a photographic scene. Here, we have to identify a "position" of the object in the scene, such that most points of the object can be mapped onto points in the scene. Fischer et al. (1995) have already successfully applied a computer vision technique, called geometric hashing, to the geometric docking approach DOCK of Kuntz et al. (1982).

The algorithm from computer vision that we have adapted to the docking problem is called pose clustering (Linnainmaa et al., 1988; Olson, 1994). Here, we describe the adapted version of the algorithm.

Assume, first, that the base fragment is a rigid object. As mentioned above in the modeling section, the interaction surfaces on the receptor side are approximated by finite sets of interaction points. A transformation of the base fragment into the active site is uniquely defined by mapping three interaction centers of the fragment onto three interaction points of the receptor by simply superposing the three point pairs onto each other (assuming that the point sets are not collinear) as illustrated in Figure 4. We call two triangles δ-compatible, if the corresponding edge lengths …

Figure 3. Interaction geometries as used in FLEXX: H-bond acceptors (upper left), H-bond donors (upper right), H-atoms at aromatic rings, methyl, and amide groups (bottom left), aromatic ring center (bottom right). Metal interaction geometries are always spheres and are not shown here. Interaction distances are 1.9 Å (hydrogen bonding), 2.0 Å (metal), and 4.5 Å (hydrophobic interactions). Angular ranges given by bold numbers are measured relative to the indicated vector in the drawing plane; all other angles are measured perpendicular to the drawing plane. The aromatic ring is drawn perpendicular to the drawing plane (bottom right). The sketched geometries are representative for several functional groups, e.g. the interaction geometry for the carbonyl oxygen is also used for sp2 oxygens on sulfur and phosphorus.

Figure 4. The fragment placing algorithm: mapping three interaction centers (grey spheres) of the ligand onto three discrete interaction points in the active site (black dots) defines a unique transformation of the ligand into the active site.

[…] states of adenosine monophosphate. Transferred to the simple butane example (Fig. 16.1), this means that the starting points were chosen so well that all minima are reached.

16.5 Problems in the search for minima that correspond to the receptor-bound state

As already described, a systematic conformational analysis yields the local minima by subjecting all generated geometries to an optimization with a force field.

Problems can arise here, however. To explain this, consider another molecule, citric acid 16.2, in the binding pocket of citrate synthase. With its three carboxylate groups and the OH group it forms seven hydrogen bonds to three histidines and two arginines (Fig. 16.5). If, however, one looks at the free molecule that is not bound to the protein and minimizes its geometry in the isolated state, it will adopt a conformation in which the hydrogen bonds are satisfied intramolecularly (Section 15.5). One can of course start from other geometries, but the minimization will lead again and again to conformations with intramolecular hydrogen bonds. Such hydrogen bonds, however, occur extremely rarely in the protein-bound state. The conformations minimized in the isolated state therefore have no relevance for the situation in the protein.

[Figure: adenosine monophosphate (16.1) with torsion angles τ1–τ4 and four histograms of frequency [%] versus torsion angle (0–360°)]

Fig. 16.4 The open-chain bonds of adenosine monophosphate are assigned frequency distributions of the torsion angles as found in crystal structures of small organic molecules. These torsion angle histograms are compiled for fragments that are representative of the fragments of the test molecule. For the angles τ1–τ3 there are clear preferences for particular values; τ4 corresponds to a broad distribution over all possible angles. This knowledge is used in conformational analysis by restricting the search for τ1–τ3 to the preferred ranges of values.

Page 46: Protein Ligand Interactions Docking

FlexX (2)

Placement by incremental construction:
1.  Base fragment selection:
    –  many interaction groups
    –  few conformations
2.  Base placement
3.  Score
4.  Complex construction
    –  tree search
    –  directional interactions first


Excerpt (Rarey et al., 1996, p. 475):

… differ by at most δ and the corresponding corners have compatible interaction types.

Therefore, the first step of the base placement algorithm has to solve the following problem: For each triangle of interaction centers of the base fragment find all δ-compatible triangles of interaction points in the active site of the receptor.

We have developed a new data structure for this problem based on line segment hashing and an efficient query algorithm for constructing triangles out of line segments (Rarey et al., 1996). Compared to hashing triangles directly as in geometric hashing (Fischer et al., 1995), the storage requirement is reduced from cubic to quadratic in the number of points.

A transformation is derived from each match between a triangle of interaction centers of the base fragment onto a triangle of interaction points in the receptor. Now, two filters are applied to the placement: First the angular constraints of the interaction geometries of the fragment are checked (i.e. does the interaction center on the receptor side approximately coincide with the interaction surface on the ligand side). Second, the placed fragment is checked for overlap with the receptor.

Normally the list of placements obtained from the first step contains many similar transformations, for two reasons. First, changing a point in a triangle to a nearby point on the same interaction surface changes the transformation only slightly. Second, often a fragment can form more than three interactions simultaneously.

Therefore, the second step in the base placement algorithm clusters the placements according to an appropriate distance function. For this purpose, we use the rms deviation between two placements.

The method we apply for clustering is a complete-linkage hierarchical cluster algorithm (Duda & Hart, 1973). The main strategy is that the two clusters with minimal distance are merged into one cluster iteratively as long as the minimal distance between two clusters is less than a predefined threshold. We have developed a time-efficient algorithm for this step. The details of the cluster algorithm can be found in Rarey et al. (1996).

All placements inside the same cluster are combined to one solution by merging the lists of interactions and recomputing a superposition of all interaction centers of the ligand onto the interaction points of the receptor (see Figure 5) (Kabsch, 1976).

In a third step, a final overlap test is performed and for the non-overlapping placements energies are computed with Böhm’s function (Böhm, 1994).

In the case of small base fragments it may happen that no match between triangles can be found. Then we can apply a variation of the described algorithm which matches pairs of interaction centers onto pairs of interaction points in the active site (instead of triangles). The remaining degree of freedom (the rotation around the axis defined by the pair of interaction points) can be fixed by rotating the ligand such that the interaction centers of the two interaction groups on the receptor side lie on the interaction surfaces of the interaction groups on the ligand side.

The FLEXX complex construction algorithm

Once a set of favorable placements for the base fragment has been computed, we can start the incremental construction process that adds the remaining fragments to the alternative placements of the base fragment.

We formulate the incremental construction as a tree search problem. A node in the tree represents a placement of a connected part of the ligand. On the first level of the tree, the different placements of the base fragment are arranged. Each of the following levels contains the alternatives for adding the next fragment to the placement represented by the parent node (see Figure 6). The order in which the fragments are added is kept constant during the tree search. If there are alternatives, we first add fragments which can form hydrogen bonds or salt bridges because these interactions are more directional and thus geometrically more precisely defined. The geometric conditions for attaching a fragment depend on the partial placement. Therefore, the number of successors in the tree changes with the tree node (see Figure 6).

Figure 5. Merging initial placements. Two initial placements with different sets of interactions but similar transformations (left) will be transformed into a single placement (right) by merging the sets of interactions and recomputing the transformation.

Figure 6. Search tree during the complex construction algorithm. The first level contains different placements of the base fragment and levels i ≥ 2 placements of the partial ligand up to fragment i − 1. Black nodes represent placements which will be considered in the next iteration.
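The triangle compatibility test from the base placement step can be illustrated with a brute-force search. This is deliberately not the line segment hashing of the paper: it enumerates all triangles directly, which is only practical for small point sets. All names, coordinates and the type-matching rule are assumptions made for the example.

```python
import itertools
import math


def edge_lengths(points):
    """Edge lengths of a triangle in the fixed order (p0-p1, p1-p2, p0-p2)."""
    a, b, c = points
    return (math.dist(a, b), math.dist(b, c), math.dist(a, c))


def compatible_triangles(centers, center_types, site_points, site_types, matches, delta):
    """Yield (center triple, site triple) with matching corner types and edge lengths."""
    for ci in itertools.combinations(range(len(centers)), 3):
        c_edges = edge_lengths([centers[i] for i in ci])
        for si in itertools.permutations(range(len(site_points)), 3):
            if not all(matches(center_types[i], site_types[j]) for i, j in zip(ci, si)):
                continue  # corners must have compatible interaction types
            s_edges = edge_lengths([site_points[j] for j in si])
            if all(abs(x - y) <= delta for x, y in zip(c_edges, s_edges)):
                yield ci, si


def donor_acceptor(ligand_type, site_type):
    """Hypothetical matching rule: a donor pairs with an acceptor and vice versa."""
    return {ligand_type, site_type} == {"donor", "acceptor"}


# Hypothetical base fragment (interaction centers) and receptor interaction points.
fragment = [(0, 0, 0), (3, 0, 0), (0, 4, 0)]
fragment_types = ["donor", "acceptor", "donor"]
site = [(5, 5, 0), (8, 5, 0), (5, 9, 0), (20, 20, 20)]
site_point_types = ["acceptor", "donor", "acceptor", "acceptor"]
hits = list(compatible_triangles(fragment, fragment_types, site, site_point_types,
                                 donor_acceptor, delta=0.2))
```

Each hit defines one candidate transformation of the base fragment. The angular filter, the overlap test and the clustering described in the excerpt would then be applied to these candidates.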

Excerpt (Rarey et al., 1996, p. 479):

Figure 10. Chemical formulae of the ligands in the test suite (1 to 12). Base fragments used are surrounded by boxes.

… good agreement with the experimental value of −13.1 kJ/mol (Verlinde et al., 1991). Hydrogen bonds to Gly173 and Gly235 have been found. In the crystal structure five additional interactions to water molecules are present, but we did not include these water molecules in the binding site. In this situation, the algorithm searches for alternative hydrogen bonds. In fact, nearly all solutions show additional hydrogen bonds to Lys13 or Ser213. As a consequence, the calculated and the observed positions of the sulfate ion differ slightly.

1ldm

The complex 1ldm of lactate dehydrogenase and oxamate (II) has been solved (Abad-Zapatero et al., 1987) to a resolution of 2.1 Å. The ligand binds (with ΔG = −30.8 kJ/mol) with eight hydrogen bonds in a polar pocket composed of Gln100, Arg106, Arg169, His193 and Thr245. The docking algorithm generates 28 placements. The highest-ranking solution exhibits six hydrogen bonds, all of which are also observed experimentally. Convincing agreements in structure (rms = 0.62 Å) and binding affinity (−32.8 kJ/mol) are found.

2phh

The complex 2phh between p-hydroxybenzoate hydroxylase and p-hydroxybenzoate (III) has been solved by van der Laan et al. (1989). In the crystal structure the ligand is bound through its carboxylate group to Tyr222, Ser212 and Arg214 and its hydroxyl group to Tyr201. The docking algorithm generates 179 different solutions. The predicted binding energies of −32.1 kJ/mol and −30.0 kJ/mol, respectively, for the two highest-ranking solutions are very close to the experimental binding energy of −26.7 kJ/mol (Entsch et al., 1976). They are separated from the predicted energies of the next solutions by more than 3 kJ/mol. The according rms deviations of 0.58 Å and 0.29 Å, respectively, show that nearly the same binding geometry as in the crystal structure has been generated.

3ptb

Benzamidine (IV) is a ligand of trypsin. The structure of the complex (3ptb) has been resolved (Marquart et al., 1983). Presumably the ligand binds protonated to the enzyme. The docking algorithm produces 14 different placements. The energy of the highest-ranking solution (−28.3 kJ/mol) is about 14 kJ/mol lower than that of the proceeding solution, and is in very good agreement with the measured value of −27.2 kJ/mol (Mares-Guia & Shaw, 1965). Although one water molecule has been omitted from the active site, a convincing structural agreement is found: the rms deviation is 0.48 Å. The hydrogen bonds to Ser190, Gly219 and Asp189 have been reproduced.

Page 47: Protein Ligand Interactions Docking

How to evaluate docking?

•  Reproduction of co-crystallised conformation
•  Ranking similar to binding constants
•  Enrichment in database search

$$EF = \frac{\mathrm{nHit}_{sel} / \mathrm{nHit}_{tot}}{N_{sel} / N_{tot}}$$
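The enrichment factor is a one-line calculation. The sketch below follows the formula above, with argument names chosen for the example: hits are the known actives, and the selection is the top-ranked part of the screened database.

```python
def enrichment_factor(n_hits_selected, n_hits_total, n_selected, n_total):
    """EF = (nHit_sel / nHit_tot) / (N_sel / N_tot)."""
    if min(n_hits_total, n_selected, n_total) <= 0:
        raise ValueError("totals and selection size must be positive")
    return (n_hits_selected / n_hits_total) / (n_selected / n_total)


# Hypothetical screen: 1000 compounds with 50 actives, 20 of them in the top 100.
ef_top_10_percent = enrichment_factor(20, 50, 100, 1000)
```

An enrichment factor of 1 corresponds to random selection. In the hypothetical example the top 10% of the ranking contains 40% of the actives, about four times more than expected by chance.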

Page 48: Protein Ligand Interactions Docking

Redocking example

•  reference ligand
•  Docking: first rank
•  Docking: best result

Page 49: Protein Ligand Interactions Docking

Docking accuracy – Top scorers

On the basis of those results, we can order programs in the following way: GOLD ≈ eHiTS > Surflex > Glide > LigandFit > FlexX ≈ AutoDock. The best programs have an average RMSD(top score) around 2.7 Å, and it increases to nearly 4.5 Å for the weakest, FlexX. As expected, better results were observed for best pose conformations (Fig. 4). For those poses, the mean RMSD value was even below 2 Å for GOLD, eHiTS, and Surflex. For other programs, meaningful improvements were also observed in comparison with top score pose results. Nevertheless, the ability of correct posing by programs was measured rather by top score than best pose, which seems quite inadequate. The value of 3 Å for mean RMSD(top score) may result in a situation where important contacts between the ligand and the protein would be missed or at least their geometry significantly changed. Moreover, the percentage of pairs for which the top score conformation is below 2 Å shows that even for the best programs the success rate is below 60%, and in some cases even below 40%. This means that for almost 600 different complexes most docking programs failed. Because no correct pose could be picked up in the first place, the scoring function was also unable to predict true binding affinities of ligands, as their contacts with the protein would be recreated incorrectly.

Although the results for best pose are acceptable in the case of a few programs, this cannot change our negative opinion about the software, as the native pose is obviously not known before docking. It is virtually impossible to choose it manually, as millions of poses are generated during a typical virtual screening experiment. Moreover, top score conformations are rarely classified simultaneously as best pose; their position is rather randomly distributed among all generated poses ordered by the docking score. It is not clear why there is such a significant difference between those two types of conformations. It could be the result of imperfection of the docking algorithm or the scoring function itself, which cannot correctly select the conformation out of all generated conformations. It should also be emphasized that performing time-consuming in situ optimization of the top score conformation may not result in drastic improvement of pose RMSD, as was pointed out by Li et al. (ref. 53).
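The distinction between "top score" and "best pose" can be made concrete with a small sketch. It assumes that every pose lists the same atoms in the same order as the crystal reference, that all coordinates are in the receptor frame (so no superposition is applied), that a lower docking score is better, and that ligand symmetry is ignored. The function names and the 2 Å default are illustrative choices, not the benchmark authors' code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def top_score_and_best_pose(poses, reference):
    """Return (RMSD of the top-scored pose, RMSD of the closest pose).

    poses: list of (score, coords) tuples for one complex;
    lower score = better rank (assumption).
    """
    scored = [(score, rmsd(coords, reference)) for score, coords in poses]
    top_score_rmsd = min(scored, key=lambda item: item[0])[1]
    best_pose_rmsd = min(value for _, value in scored)
    return top_score_rmsd, best_pose_rmsd


def success_rate(rmsd_values, threshold=2.0):
    """Fraction of complexes whose RMSD lies below the threshold (in Å)."""
    return sum(value < threshold for value in rmsd_values) / len(rmsd_values)
```

The best pose can only be identified when the crystal structure is known, which is why the top-score RMSD is the figure that matters in a prospective screen.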

The Influence of Starting Conformations

Even more interesting results were observed when different starting conformations were used. It is a common belief that the more similar the input structure is to the native one, the better the poses predicted by docking programs (ref. 27). To verify this opinion, we repeated our experiment using the following conformations: one identical with that from X-ray (case called redocking) and two others generated using the popular software Corina and Omega2 (cases called Corina one and Omega one, respectively). Although those two were designed to recreate the three-dimensional structure of a ligand, not its bound conformation, sometimes output poses can be very similar to the conformation of a ligand in the protein complex. Figure 5 presents a histogram of RMSD values obtained for all conformers generated by all docking programs. The RMSD distribution for the generated poses is preserved for all cases regardless of the type of input structure used. Interestingly, most conformations fall into the RMSD category between 0 Å and 2 Å, which is a positive tendency because the mean RMSD(top score) values on the entire set fall rather between 3 Å and 5 Å. This seems to support the opinion that programs are more accurate in probing search space than was concluded previously. In the case of redocking, the majority of conformations have RMSD between 0 and 1 Å, but there were more conformations with much higher RMSD (of over 10 Å) than when Corina and Omega2 input conformations were used. Yet, in our opinion, this can be explained by the fact that an old version of eHiTS was used in that case, which failed to produce valid conformations. For eHiTS, when the older 6.2 version of the program was used, mean RMSD(top score) exceeded 10 Å. That situation would place eHiTS as the worst software in our work. However, for four other cases, i.e., Corina one, Corina ten, Omega one, and Omega ten, eHiTS 9.0 was used. New version introduces many new features one most important in our

Figure 4. Docking accuracy of programs on 1300 protein–ligand complexes from refined set PDBbind 2007 database for best pose conformations.

Figure 3. Docking accuracy of programs on 1300 protein–ligand complexes from refined set PDBbind 2007 database for top score conformations.

748 Plewczynski et al. • Vol. 32, No. 4 • Journal of Computational Chemistry

Journal of Computational Chemistry DOI 10.1002/jcc


Augustyniak,R. and Ginalski,K. (2010) Review Can We Trust Docking Results? …. J Comput Chem 32, 742–755

Page 50: Protein Ligand Interactions Docking

Docking accuracy – Best poses


Figure 4. Docking accuracy of programs on 1300 protein–ligand complexes from refined set PDBbind 2007 database for best pose conformations.


Page 51: Protein Ligand Interactions Docking

Correlation with experimental affinities

             Top score             Best pose
Program      Pearson   Spearman    Pearson   Spearman
eHiTS        0.38      0.47        0.29      0.39
FlexX        0.10      0.06        0.09      0.07
Glide        0.25      0.26        0.23      0.23
GOLD         0.17      0.18        0.06      0.12
LigandFit    0.11      0.04        0.08      0.07
Surflex      0.33      0.34        0.22      0.31
AutoDock     0.25      0.27        0.19      0.20
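For reference, the two coefficients in the table can be computed as follows. This is a plain-Python sketch with hypothetical function names: Pearson measures linear correlation of the raw values, while Spearman is the Pearson correlation of the ranks (tied values receive their average rank). Whether a good result appears as a positive or a negative coefficient depends on the sign conventions of the score and of the affinity measure, which the table does not state.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because Spearman only uses the ordering, it rewards a scoring function that ranks ligands correctly even when the relationship between score and affinity is not linear.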


Page 52: Protein Ligand Interactions Docking

Correlation coefficients


Page 53: Protein Ligand Interactions Docking

Enrichment in database screening

Docking and scoring

We first applied “free docking”, that is, we did not define any constraints in the docking procedure based on knowledge about ligand binding. All docking poses obtained from GOLD, GlideSP, GlideXP, Surflex and FlexX were ranked with the scoring functions implemented as objective functions in the docking algorithms and, furthermore, with nine other different scoring functions. For reasons of clarity, we will only show the enrichments obtained from the combination of the best docking program and scoring function. Details are listed in the Supporting Information.

Analysis of these results (see Figure 2 and Table S1) revealed that GlideSP on average outperforms the other docking programs. In a next step, we wanted to investigate whether (or to show that) considering knowledge about ligand binding in terms of docking constraints improves the enrichments. These docking constraints (shown in Figure 1) were defined after careful visual inspection of all available crystal structures of the four investigated targets.

For ACE (Figure 1a), GlideSP provided the best enrichment for the top 10 scorers, identifying nine actives. Considering the top-ranked 50 compounds, all docking programs provided comparable enrichments. Consideration of the Zn2+ constraint considerably improved the docking results for ACE. Indeed, from the set of 50 active compounds and 950 decoys, only 23 actives and 19 decoys could be accommodated in the ACE binding site. Thus, this stringent constraint identifies nearly 50% of the actives, whereas only 2% of the decoys are retrieved as false positives.
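The percentages quoted for the ACE constraint follow directly from the counts. As an illustration that is not reported in the paper, the same counts can also be turned into an enrichment factor by treating the 42 compounds that satisfy the constraint as the selected subset:

```latex
\begin{align*}
\text{actives retained} &= \frac{23}{50} = 46\% \\
\text{decoys retained}  &= \frac{19}{950} = 2\% \\
EF &= \frac{23 / (23 + 19)}{50 / (50 + 950)} = \frac{0.548}{0.050} \approx 11
\end{align*}
```

Under this reading, the constrained subset is about eleven times richer in actives than the full library.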

For docking into the COX-2 crystal structure (Figure 1b), GlideSP again outperforms the other docking programs. Here, consideration of a constraint did not (substantially) improve the virtual screening results. This is probably due to the fact that the COX-2 ligands bind in diverse orientations and that there are no dominant (polar) interactions used by a large fraction of active binders.

It must be stated that HIV-1 protease (Figure 1c) represents a challenge for structure-based virtual screening, since it has a large, flexible binding site. Analysis of superimposed HIV-1 protease crystal structures revealed that different ligands induce different binding-site conformations. Due to the large size of the binding site, known HIV-1 protease inhibitors generally have a large molecular weight and a partial peptidic or peptidomimetic character. Since this represents a handicap for both ligand- and structure-based virtual screening and since the goal of this study was not to show that virtual screening works in “easy” cases, but rather to provide a fair comparison between ligand- and structure-based virtual screening, we decided to include this pharmaceutically relevant target in our study.

For HIV-1 protease, the best enrichment for the ten top scorers is obtained by Surflex. However, visual inspection revealed that the docking poses were unrealistic. The top scorers re-

Figure 2. Comparison of docking tools in virtual screening. Enrichments at 1, 5 and 10% are shown. For each docking program, the scoring function providing the best enrichment is listed.

150 www.chemmedchem.org © 2010 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim ChemMedChem 2010, 5, 148–158

A. Evers et al.


Krüger,D.M. and Evers,A. (2010) Comparison of structure- and ligand-based virtual screening protocols …. ChemMedChem, 5, 148-58.


Page 54: Protein Ligand Interactions Docking

HIV – Application example

Problem: HIV mutates, making inhibitors inactive
Solution:

•  Dock library of molecules (ca. 13 million) against wild-type and mutant
•  Intersection of top scorers
•  Re-dock, re-score
•  Visually inspect 1500 structures
•  Test 9 → 3 hits (µM)
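The "intersection of top scorers" step can be sketched as a set operation over the per-structure rankings. The data layout (one dictionary of compound scores per receptor structure), the function names, and the convention that a lower score is better are assumptions made for illustration; they are not taken from the cited study.

```python
def top_n(scores, n):
    """Identifiers of the n best-scored compounds of one docking run.

    scores: dict mapping compound id -> docking score (lower = better).
    """
    return set(sorted(scores, key=scores.get)[:n])


def consensus_top_scorers(runs, n):
    """Compounds that are among the top n in every docking run.

    runs: dict mapping receptor name (e.g. wild type, mutant) -> scores dict.
    """
    top_sets = [top_n(scores, n) for scores in runs.values()]
    return set.intersection(*top_sets) if top_sets else set()
```

Only compounds that score well against both the wild-type and the mutant structure survive, which is the property wanted from an inhibitor that should remain active after mutation.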


Waszkowycz,B. et al. (2011) Outstanding challenges in protein-ligand docking and structure-based virtual screening. Wiley Interdisciplinary Reviews: Computational Molecular Science, 1, 229-259.

WIREs Computational Molecular Science Outstanding challenges in protein–ligand docking and structure-based virtual screening

[Figure: chemical structures of compound 3 (IC50 = 260 nM) and compound 4 (IC50 = 170 nM)]

FIGURE 11 | Chemical structures of compounds derived from virtual screening hits (ref. 163).

FIGURE 12 | X-ray structure of compound 4 (ref. 163) showing the hydrogen bonds between the ligand and the protein as yellow dashed lines. Note also the face-to-face stacking interaction between the nitro-bearing phenyl ring and the tryptophan beneath it.

particular, the mutation Tyr181Cys confers resistance to many of the HIV-1 reverse transcriptase (HIV-1 RT) inhibitors on the market. Therefore, the search for new HIV-1 RT inhibitors with a better resistance profile is still being pursued. Nichols et al. (ref. 164) reported the discovery of three new active series using a virtual screening approach. With the resistance issue in mind, they selected three crystal structures of HIV-RT:

• PDB code 1RT4 (wild type).
• PDB code 2BE2 (wild type with different orientation of Tyr181).
• PDB code 1JLA (Tyr181 mutation).

They applied two hierarchical VS approaches. In the parallel mode, they docked the ZINC commercial library (ref. 63; approximately two million structures) into the three protein structures using Glide SP and selected the top-scoring 5000 from each run for Glide XP refinement. In the serial approach, the XP refinement was applied to the 4684 structures common to the top-100,000 in the three runs. A total of 1500 structures were visually inspected and nine selected for purchase and screening. Remarkably, three of them showed activity (Figure 13).

[Figure: chemical structures of the three hits, with activity data in extraction order: EC50 (WT) 6.2 µM, EC50 (Y181C) 12.0 µM; EC50 (WT) NA, EC50 (Y181C) 7.5 µM; EC50 (WT) 4.8 µM, EC50 (Y181C) NA]

FIGURE 13 | Structure and anti HIV-1 activity data for the three hits found by virtual screening against HIV-1 reverse transcriptase (ref. 164).

The three hits provide suitable starting points for lead optimization and the authors state that progress has been made on two of them. This application highlights some of the benefits of docking against multiple crystal structures. The authors were successful in their search for activity against mutant proteins, but a similar strategy could also be applied to achieve selectivity.

β2-Adrenergic Receptor Ligands

Historically, and currently, GPCRs represent one of the most important classes of drug target. Thus, it has been a frustration for drug designers that until very recently, no X-ray structures of this important family had been solved because of the difficulties in crystallizing these membrane-bound proteins. The publication of the bovine rhodopsin X-ray structure in 2000 provided some basis for structure-based design/discovery of GPCR ligands by means of the construction of homology models; but it was not until 2007 that the first X-ray structure of a therapeutically relevant, ligand-mediated GPCR (the human β2-adrenergic receptor) was reported (ref. 165; Figure 14). This was followed more recently by structures of the β1-adrenergic receptor (turkey) and the adenosine A2A receptor (human).

With the structure of the β2-adrenergic receptor in hand, two groups have described their attempts to exploit the new information it provides for the purposes of structure-based virtual screening. Topiol and Sabio (refs. 166, 167) used the β2-adrenergic receptor:carazolol complex together with the Glide program to conduct a virtual screen of 400,000

Volume 1, March/April 2011 © 2011 John Wiley & Sons, Ltd. 251

Page 55: Protein Ligand Interactions Docking

Example 1: Explaining side effects

Problem: side effects of ERα modulators (SERMs)

Finding “off target” effects:
•  Map sequences to structures (BLAST)
•  Limit to “druggable” proteins
•  Search with SOIPPA => SERCA (Sarcoplasmic Reticulum Ca2+ channel ATPase)


Xie L, Wang J, Bourne PE (2007) In silico elucidation of the molecular mechanism defining the adverse effect of selective estrogen receptor modulators. PLoS Comput Biol 3(11)

Page 56: Protein Ligand Interactions Docking

Example 1: Validating results

•  Inverse search

•  Docking
   –  SERM, similar compounds, correlate
   –  Look for correlation


Correlation of Docking Scores of N-Moiety SERMs to ERα and SERCA (red line: linear regression of scores; green line: ideal correlation)
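The regression line in such a plot is an ordinary least-squares fit of the scores against one site versus the scores against the other. The sketch below uses a hypothetical function name and assumes the ideal line is the identity y = x, that is, identical binding sites giving identical scores for every compound.

```python
def linear_fit(x, y):
    """Least-squares line y = slope * x + intercept through paired scores.

    x: docking scores of the compounds against the first site,
    y: docking scores of the same compounds against the second site.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

A fitted slope near 1 with an intercept near 0 would indicate that the two sites score the compound series almost identically; larger departures point to differences between the sites.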

We believe our integrated approach is immediately valuable to the drug discovery and development process. The predicted panel of off-targets can be used to prioritize in vitro screening experiments, thus reducing costs and increasing our ability to identify adverse drug effects. Furthermore, the predicted off-target binding mechanisms provide insights for optimizing drug leads by taking into account not only targeted receptors but also off-targets so that unwanted side effects can be reduced in the early stage of drug development. In our SERM case study, the N-moiety binding sub-sites of ERα and SERCA are more similar than the C-moiety sub-sites. Thus, it is more likely that we will achieve highly specific SERMs by optimizing the C-moiety. In certain cases the primary and off-target binding sites may be highly similar. As a result, it is difficult, if not impossible, to reduce the competitive binding of the off-target by lead optimization. Other strategies have to be applied to minimize the off-target binding; for example, through an increase in bioavailability of drugs, or optimization of the administration regimen. Knowledge of off-targets will be invaluable for this purpose. For instance, SERCA’s competitive binding to SERMs can be reduced by delivering SERMs enveloped with hydrophobic agents. SERMs have to first pass through the lipid bilayers of the cytoplasm membrane where the predicted SERMs off-target binding site in SERCA is located. Conceivably, the envelope will reduce off-target binding and permit more of the respective drugs to reach the final ERα target in the nucleus.

Conclusion

SERMs are potent anti-cancer drugs. By combining, first, functional site similarity searching on a structural proteome-wide scale, second, small molecule screening, and, finally, protein–ligand docking, a potential mechanism for the adverse effect of SERMs has been established. Specifically, we provide evidence for off-target binding of SERMs, resulting in the inhibition of a SERCA transmembrane domain which leads to a disruption in calcium homeostasis. The computational prediction presented here is supported by experimental observations from in vitro and clinical studies. Our methodology provides opportunities to develop further refined SERMs with fewer side effects. On a larger scale there exists the opportunity to explore off-target binding for any existing pharmaceutical or compound of pharmaceutical interest for which a 3D structural model is available. At this time we are beginning to systematically analyze all commercially available pharmaceuticals in an effort to explain any observed side effects.

Methods

Structural models of the human proteome. Sequences of all PDB [23] structures are mapped to Ensembl [25] human protein sequences (43,738 proteins) using BLAST [24]. A total of 10,730 PDB structures map to 3,158 Ensembl human proteins with a sequence identity above 95%. These 10,730 structures are considered as structural models for the human proteome. They form 2,586 sequence clusters when using a sequence identity of 30%.
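A mapping of this kind can be sketched as a filter over BLAST hits. The example assumes the hits are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`: query, subject, percent identity, ..., e-value, bit score), with PDB chain sequences as queries and Ensembl proteins as subjects. The function name, the optional e-value cut-off, and the identifiers used in any input are hypothetical, and this is not the authors' pipeline.

```python
import csv
import io


def map_structures(blast_tsv, min_identity=95.0, max_evalue=None):
    """Group structures by the protein they match above an identity cut-off.

    blast_tsv: text in BLAST tabular format with 12 tab-separated columns.
    Returns a dict mapping subject (protein) id -> set of query (structure) ids.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        if identity <= min_identity:
            continue  # keep only hits with identity above the cut-off
        if max_evalue is not None and evalue >= max_evalue:
            continue  # optional e-value filter
        mapping.setdefault(subject, set()).add(query)
    return mapping
```

Counting the keys of the returned dictionary gives the number of proteins with at least one structural model, and the union of its values gives the structures that serve as models.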

Structural coverage of the druggable human proteome. The existing druggable human proteome is determined by mapping Ensembl [25] human protein sequences against all sequences of drug targets from Drugbank [57] using BLAST [24]. Homologous sequences from the human proteome, with e-values less than 0.001, constitute the druggable human proteome: a total of 13,865 human proteins corresponding to 2,002 unique drug targets. Among the 10,730 human protein structural models, 1,585 belong to the existing druggable human proteome and correspond to 929 unique drug

Figure 8. Correlation of Binding Affinity Scores by Docking Molecular Analogs of N-Moiety and C-Moiety of SERMs to ERα and SERCA Proteins

(A,B) N- and C-moieties to ERα and SERCA TG1 sites, respectively. (C,D) N- and C-moieties to ERα and SERCA BHQ site, respectively. The red line represents the linear regression of docking scores. The green line indicates the optimal score correlation between two identical binding sites. The docking score is from eHiTS [31]. Docking score correlations from Surflex [32] show the same trends although the absolute values are different (unpublished data).
doi:10.1371/journal.pcbi.0030217.g008

PLoS Computational Biology | www.ploscompbiol.org | November 2007 | Volume 3 | Issue 11 | e217 | 2330

Molecular Mechanism for Adverse Drug Effect

Page 57: Protein Ligand Interactions Docking

Summary

•  docking performance
   –  correct structure in about 70% of the test cases

•  prediction of binding affinity is very difficult
   1.  Ranking complex geometries → good, not perfect
   2.  Ranking different ligands → weak correlations
   3.  Free energy estimation → more or less unsolved

•  Challenges
   –  handling protein flexibility
   –  improving reliability of structure and affinity prediction

•  BUT: Better than random guessing
