Advanced Bioinformatics Lecture 7: Computer-aided lead identification. ZHU FENG, [email protected], http://idrb.cqu.edu.cn/, Innovative Drug Research Centre in CQU, Innovative Drug Research and Bioinformatics Laboratory

Page 1: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Advanced Bioinformatics
Lecture 7: Computer-aided lead identification

ZHU FENG
[email protected]

http://idrb.cqu.edu.cn/
Innovative Drug Research Centre in CQU

Innovative Drug Research and Bioinformatics Laboratory

Page 2: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Table of Contents

1. Schematic of DOCKing
2. Pharmacophore-based docking
3. INVDOCK Strategy
4. Ligand-based drug design
5. Classification of drugs by SVM

Page 3: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

What is docking?

Given two molecules, find their correct association.

Receptor + Ligand = Complex

Computationally predict the structures of protein-ligand complexes from the conformations and orientations of the two molecules. The orientation that maximizes the interaction is taken as the most accurate structure of the complex.
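
As a minimal sketch of this idea, assuming a toy 2-D system and an invented contact-counting score (neither comes from the lecture), docking can be viewed as a search over ligand orientations that keeps the highest-scoring pose. The coordinates, cutoffs and function names below are all hypothetical.

```python
import math

def rotate(points, angle):
    """Rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def contact_score(receptor, ligand, contact=1.5, clash=0.8):
    """Toy score (an assumption): +1 per atom pair within `contact`,
    -10 per pair closer than `clash`."""
    score = 0
    for rx, ry in receptor:
        for lx, ly in ligand:
            d = math.hypot(rx - lx, ry - ly)
            if d < clash:
                score -= 10
            elif d <= contact:
                score += 1
    return score

def best_orientation(receptor, ligand, steps=36):
    """Try `steps` evenly spaced rotations; return (best_score, best_angle)."""
    best = None
    for k in range(steps):
        angle = 2 * math.pi * k / steps
        score = contact_score(receptor, rotate(ligand, angle))
        if best is None or score > best[0]:
            best = (score, angle)
    return best

# Hypothetical receptor atoms lining a pocket, and a two-atom ligand.
receptor = [(2.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
ligand = [(0.0, 0.0), (1.0, 0.0)]
score, angle = best_orientation(receptor, ligand)
```

Real docking programs also search translations and ligand conformations and use physically motivated scoring functions; this sketch only rotates a rigid ligand.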

Page 4: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

General protein–ligand binding

Ligand
− Molecule that binds with a protein

Protein active site(s)
− Allosteric binding
− Competitive binding

Function of binding interaction
− Natural and artificial

Page 5: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Docking strategy

PDB file

Surface Representation

Patch Detection

Matching Patches

Scoring & Filtering

Candidate complexes

Page 6: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Schematic of docking methodology

(A) The target binding site is filled with site points.

(B) Distances between atoms in a molecule are matched to the distances between site points.

(C) A transformation matrix is calculated for an orientation.

(D) The molecule is docked into the binding site, and the fit of that conformer is scored.
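
Step (B) can be sketched as a brute-force search, assuming a small number of atoms and site points and an arbitrary distance tolerance. The function name, tolerance and coordinates are illustrative assumptions; this is not the matching algorithm of any particular docking program, and steps (C) and (D) are not covered.

```python
import itertools
import math

def match_atoms_to_site_points(atoms, site_points, tol=0.5):
    """Return every assignment (one site-point index per atom) whose
    site-point distances agree with the atom-atom distances within `tol`."""
    n = len(atoms)
    matches = []
    for combo in itertools.permutations(range(len(site_points)), n):
        compatible = all(
            abs(math.dist(atoms[i], atoms[j])
                - math.dist(site_points[combo[i]], site_points[combo[j]])) <= tol
            for i, j in itertools.combinations(range(n), 2)
        )
        if compatible:
            matches.append(combo)
    return matches

# Hypothetical ligand atoms forming a 3-4-5 triangle.
atoms = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
# Hypothetical site points: the first three form the same triangle elsewhere.
site_points = [(10.0, 10.0, 10.0), (10.0, 13.0, 10.0),
               (14.0, 10.0, 10.0), (20.0, 20.0, 20.0)]
matches = match_atoms_to_site_points(atoms, site_points)
```

Each surviving assignment would then feed step (C), where a rotation and translation superimposing the matched atoms onto their site points is computed.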

Page 7: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Design of HIV-1 protease inhibitor
Step 1: creation of spheres to fit a cavity

Page 8: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Design of HIV-1 protease inhibitor
Step 2: place a ligand to match the position of spheres

Page 9: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Design of HIV-1 protease inhibitor
Step 3: check chemical complementarity

Page 11: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Some techniques

A surface representation should efficiently represent the docking surface and identify the regions of interest:

− Connolly surface

− Lenhoff technique, etc.

Figure: dense MS surface (Connolly) and sparse surface (Shuo Lin et al.)

Page 12: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Connolly surface

Each atomic sphere is given the van der Waals radius of the atom.

Rolling a probe sphere over the van der Waals surface traces out the Connolly surface, which includes the solvent reentrant patches where the probe touches more than one atom.

Page 13: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Lenhoff technique

Computes a “complementary” surface for the receptor instead of the Connolly surface, i.e. computes possible positions for the atom centers of the ligand.

Figure labels: atom centers of the ligand; van der Waals surface
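
One way to picture "possible positions for the atom centers of the ligand" is to sample points at a distance of (receptor atom radius + ligand atom radius) from each receptor atom and discard any point that would bring the ligand atom too close to another receptor atom. The sketch below does that under stated assumptions: a single ligand atom radius of 1.7 Å, golden-spiral sampling, and hypothetical function names. It illustrates the idea only and is not the published Lenhoff procedure.

```python
import math

def sphere_points(n):
    """Roughly even points on a unit sphere (golden-spiral sampling)."""
    golden = math.pi * (3 - math.sqrt(5))
    pts = []
    for k in range(n):
        z = 1 - 2 * (k + 0.5) / n
        r = math.sqrt(1 - z * z)
        pts.append((r * math.cos(golden * k), r * math.sin(golden * k), z))
    return pts

def ligand_center_positions(atoms, ligand_radius=1.7, n=100):
    """atoms: list of ((x, y, z), vdw_radius) for the receptor.
    Returns sampled points where a ligand atom center could sit in
    van der Waals contact with one receptor atom without overlapping others."""
    unit = sphere_points(n)
    positions = []
    for i, (center, radius) in enumerate(atoms):
        reach = radius + ligand_radius
        for u in unit:
            p = tuple(center[k] + reach * u[k] for k in range(3))
            # Keep p only if it is not too close to any other receptor atom.
            if all(math.dist(p, c2) >= r2 + ligand_radius - 1e-9
                   for j, (c2, r2) in enumerate(atoms) if j != i):
                positions.append(p)
    return positions

# Two hypothetical overlapping receptor atoms.
receptor_atoms = [((0.0, 0.0, 0.0), 1.5), ((2.0, 0.0, 0.0), 1.5)]
candidates = ligand_center_positions(receptor_atoms)
```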

Page 14: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Pharmacophore-based docking
Basic idea

Appropriate spatial disposition of a small number of functional groups in a molecule is sufficient for achieving a desired biological effect.

The ensemble formation will be guided by these functional groups.

Page 15: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

3-D representation of a protein binding site

Distances between binding groups (in Angstroms) and the type of interaction are searchable.

Distances labelled in the figure (Å): 5.2, 4.2-4.7, 6.7, 4.8, 5.1-7.1
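
A distance-based pharmacophore search of this kind can be sketched as a query of feature pairs with allowed distance ranges. The two ranges below reuse values from the figure, but the feature labels attached to them ("donor", "acceptor", "aromatic") are invented for illustration, since the slide text does not say which groups each distance connects.

```python
import itertools
import math

# Hypothetical query: (feature_a, feature_b) -> allowed distance range in Angstroms.
QUERY = {
    ("donor", "acceptor"): (4.2, 4.7),
    ("acceptor", "aromatic"): (5.1, 7.1),
}

def matches_query(features, query=QUERY):
    """features: dict mapping feature type -> list of (x, y, z) positions.
    True if one position per feature type can be chosen so that every
    queried pair distance falls inside its allowed range."""
    types = sorted({t for pair in query for t in pair})
    if any(not features.get(t) for t in types):
        return False  # a required feature type is missing
    for choice in itertools.product(*(features[t] for t in types)):
        pos = dict(zip(types, choice))
        if all(lo <= math.dist(pos[a], pos[b]) <= hi
               for (a, b), (lo, hi) in query.items()):
            return True
    return False

# Hypothetical molecules described only by their feature positions.
hit = {"donor": [(0.0, 0.0, 0.0)], "acceptor": [(4.5, 0.0, 0.0)],
       "aromatic": [(4.5, 6.0, 0.0)]}
miss = {"donor": [(0.0, 0.0, 0.0)], "acceptor": [(3.0, 0.0, 0.0)],
        "aromatic": [(3.0, 6.0, 0.0)]}
```

Here `matches_query(hit)` is true and `matches_query(miss)` is false, because the donor-acceptor distance of 3.0 Å falls outside the 4.2-4.7 Å range.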

Page 16: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Pharmacophore Fingerprint

Appropriate spatial disposition of a small number of functional groups in a molecule is sufficient for achieving a desired biological effect.

The ensemble formation will be guided by these functional groups.

Page 17: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Schematic of PhDOCK methodology

Figure panels: DOCK, PhDOCK

Page 18: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Advantages and disadvantages of PhDOCK

Advantages:
(1) Faster, because ligands containing functional groups that would interfere with binding are eliminated rapidly.
(2) Faster than docking molecules individually.
(3) More information pertaining to the entire molecule is retained (no rigid portions).
(4) Chemical matching and critical clusters are encouraged.

Disadvantages:
(1) Complex queries are extremely slow.
(2) The majority of the information contained in the target structure is not considered during the search.

Page 19: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

INVDOCK Strategy

Existing methods
Given a protein, find putative binding ligands from a chemical database
Given Lock, find Key
Forward lead identification
Science 1992; 257:1078

INVDOCK methods
Given a ligand, find putative protein targets from a protein database
Given Key, find Lock
Backward prediction of mechanism of action (MOA)
Proteins 1999; 36:1
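
The contrast between the two directions can be sketched as two loops around the same scoring call. The `score` function, the score table and the names `protein_A` or `lig_1` are placeholders invented for this sketch; they are not INVDOCK's scoring scheme and not real docking results.

```python
def forward_screen(protein, ligands, score):
    """Given Lock, find Key: rank candidate ligands against one protein."""
    return sorted(ligands, key=lambda lig: score(protein, lig), reverse=True)

def inverse_screen(ligand, proteins, score, threshold):
    """Given Key, find Lock: keep proteins whose score for this ligand
    reaches `threshold`, best first."""
    scored = sorted(((score(p, ligand), p) for p in proteins), reverse=True)
    return [p for s, p in scored if s >= threshold]

# Invented toy scores (higher = better fit); a real run would dock each pair.
TOY_SCORES = {
    ("protein_A", "lig_1"): 7.0,
    ("protein_B", "lig_1"): 2.0,
    ("protein_C", "lig_1"): 5.5,
    ("protein_A", "lig_2"): 1.0,
}

def toy_score(protein, ligand):
    return TOY_SCORES.get((protein, ligand), 0.0)

targets = inverse_screen("lig_1", ["protein_A", "protein_B", "protein_C"],
                         toy_score, threshold=5.0)
ranked = forward_screen("protein_A", ["lig_2", "lig_1"], toy_score)
```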

Page 20: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

INVDOCK Test on Drug Target Prediction
Anticancer Drug Tamoxifen

PDB Id   Protein                      Experimental Findings
1a25     Protein Kinase C             Secondary Target
1a52     Estrogen Receptor            Drug Target
1bhs     17 beta HSD dehydrogenase    Inhibitor
1bld     bFGF Factor                  Inhibitor
1cpt     Cytochrome P450-TERP         Metabolism
1dmo     Calmodulin                   Secondary Target

Proteins. 1999; 36:1

Tamoxifen is a well-known anticancer drug for the treatment of breast cancer. It was approved by the FDA in 1998 as the first cancer preventive drug. 30 million people are expected to use it.

Page 21: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

INVDOCK Test on Drug Target Prediction
Drug Toxicity Targets (J. Mol. Graph. Mod. 2001, 20, 199)

Columns:
(a) Number of experimentally confirmed or implicated toxicity targets
(b) Number of toxicity targets predicted by INVDOCK
(c) Number of toxicity targets missed by INVDOCK
(d) Number of toxicity targets without structure or involving covalent bond
(e) Number of INVDOCK-predicted toxicity targets without experimental finding

Compound       (a)   (b)   (c)   (d)   (e)
Aspirin         15     9     2     4     2
Gentamicin      17     5     2    10     2
Ibuprofen        5     3     0     2     2
Indinavir        6     4     0     2     2
Neomycin        14     7     1     6     6
Penicillin G     7     6     0     1     8
Tamoxifen        2     2     0     0     4
Vitamin C        2     2     0     0     3
Total           68    38     5    25    29
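
The table's arithmetic can be checked with a few lines of code. The figures are copied from the table; the summary ratio at the end (targets predicted out of those that could be docked at all) is one possible reading of the numbers, not a metric reported on the slide.

```python
# Per compound: (confirmed, predicted, missed, no structure / covalent,
#                predicted without experimental finding)
ROWS = {
    "Aspirin":      (15, 9, 2, 4, 2),
    "Gentamicin":   (17, 5, 2, 10, 2),
    "Ibuprofen":    (5, 3, 0, 2, 2),
    "Indinavir":    (6, 4, 0, 2, 2),
    "Neomycin":     (14, 7, 1, 6, 6),
    "Penicillin G": (7, 6, 0, 1, 8),
    "Tamoxifen":    (2, 2, 0, 0, 4),
    "Vitamin C":    (2, 2, 0, 0, 3),
}

def column_totals(rows):
    """Sum each column across all compounds."""
    return tuple(sum(col) for col in zip(*rows.values()))

def rows_are_consistent(rows):
    """Confirmed targets should equal predicted + missed + not dockable."""
    return all(a == b + c + d for a, b, c, d, _ in rows.values())

totals = column_totals(ROWS)
confirmed, predicted, missed, not_dockable, unconfirmed = totals
# Share of dockable confirmed targets that were predicted.
fraction_found = predicted / (predicted + missed)
```

The column sums reproduce the "Total" row (68, 38, 5, 25, 29), and 38 of the 43 targets that had a usable structure were predicted, about 88%.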

Page 22: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Results of docking studies

The docked (blue) and crystal (yellow) structures of ligands in some PDB ligand-protein complexes. The PDB Id of each structure is shown.

Page 23: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Dataset and Testing Results

Protein-protein cases from the protein-protein docking benchmark:
Enzyme-inhibitor – 22 cases
Antibody-antigen – 16 cases

Protein-DNA docking: 2 unbound-bound cases

Protein-drug docking: tens of bound cases (estrogen receptor, HIV protease, COX)

Performance: several minutes for large protein molecules and seconds for small drug molecules on a standard PC.

Figure: Endonuclease I-PpoI (1EVX) with DNA (1A73). RMSD 0.87Å, rank 2. Labels: DNA, endonuclease, docking solution.

Figure: Estrogen receptor with estradiol (1A52). RMSD 0.9Å, rank 1, running time: 11 seconds. Labels: estrogen receptor, estradiol molecule from complex, docking solution.
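
The RMSD values quoted above measure how far a docked pose lies from the crystal pose. A minimal sketch of that calculation follows, assuming both poses list the same atoms in the same order and share the receptor's coordinate frame, so that no superposition is applied. The coordinates are made up.

```python
import math

def rmsd(pose_a, pose_b):
    """Root-mean-square deviation between two equally ordered lists of
    (x, y, z) atom coordinates, without any re-fitting."""
    if len(pose_a) != len(pose_b) or not pose_a:
        raise ValueError("poses must be non-empty and of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(pose_a, pose_b))
    return math.sqrt(squared / len(pose_a))

# Hypothetical two-atom ligand: the docked pose is shifted 1 Å along z.
crystal = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
docked = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(crystal, docked)
```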

Page 24: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Classification of Drugs by SVM

A drug is classified as either belonging (+) or not belonging (-) to a class.

Drug class: inhibitor of a protein, blood-brain barrier (BBB) penetrating, genotoxic, etc.

Protein class: enzyme EC 3.4 family, DNA-binding, etc.

By screening against all classes, the property of a drug or the function of a protein can be identified.

Diagram: Drug → Class-1 SVM (-), Class-2 SVM (+), ……, Class-n SVM (-) → Drug belongs to class-2
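
The screening scheme in the diagram can be sketched as one binary classifier per class, each answering + or -. The sketch assumes already-trained linear SVMs whose weights are invented here. The class names mirror the diagram, and nothing below comes from the lecture's actual models.

```python
def linear_svm(weights, bias):
    """Return the decision function f(x) = w . x + b of a linear SVM
    whose parameters are assumed to have been trained elsewhere."""
    def decide(x):
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return decide

def screen(features, classifiers):
    """Return the names of all classes whose SVM answers + (f(x) > 0)."""
    return [name for name, f in classifiers.items() if f(features) > 0]

# Hypothetical classifiers over a 2-component feature vector.
classifiers = {
    "class-1": linear_svm((1.0, 0.0), -2.0),
    "class-2": linear_svm((0.0, 1.0), -0.5),
    "class-n": linear_svm((-1.0, -1.0), 0.0),
}
drug = (1.0, 1.0)
assigned = screen(drug, classifiers)
```

With these made-up weights only the class-2 classifier returns a positive value, matching the outcome drawn in the diagram.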

Page 25: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Classification of drugs by SVM

What is SVM?

• Support vector machines (SVMs) are a machine learning method, rooted in artificial intelligence, learning by examples and statistical learning, that classifies objects into one of two classes.

Advantages of SVM:

• Class members may be diverse (no discrimination against dissimilar members).

• Structure-derived physico-chemical features serve as the basis for drug classification (the algorithm requires no structural similarity).

Page 26: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Artificial Intelligence (AI)


Page 27: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Machine learning method

Inductive learning (example-based learning)

Page 28: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Machine learning method
Feature vectors

A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

Page 29: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Machine learning method
Feature vectors in input space

A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

Figure: the feature vectors plotted in the input space (axes X, Y, Z), with labelled points B, A, E and F.
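
A short sketch of how these six objects land in the input space: objects with identical feature vectors occupy the same point, which is possibly why the plot carries only four labels. The grouping function is a hypothetical helper written for this illustration.

```python
from collections import defaultdict

# Feature vectors as listed on the slide.
VECTORS = {
    "A": (1, 1, 1),
    "B": (0, 1, 1),
    "C": (1, 1, 1),
    "D": (0, 1, 1),
    "E": (0, 0, 0),
    "F": (1, 0, 1),
}

def group_by_position(vectors):
    """Map each distinct point in the input space to the objects located there."""
    groups = defaultdict(list)
    for name, vec in vectors.items():
        groups[vec].append(name)
    return dict(groups)

points = group_by_position(VECTORS)
```

A and C share the point (1, 1, 1), and B and D share (0, 1, 1), so the six objects occupy four distinct points.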

Page 30: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Project to a higher dimensional space

Figure labels: border (before projection) and new border (after projection), each separating drug family members from nonmembers.

Page 31: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Figure labels: support vectors, new border, protein family members, nonmembers

Page 32: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Figure labels: protein family members, nonmembers, new border, support vectors

Page 33: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Best Linear Separator?


Page 34: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Find closest points in convex hulls

Figure labels: closest points c and d

Page 35: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Plane bisects closest points

Figure labels: closest points c and d
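
Once the closest points c and d of the two convex hulls are known, the bisecting plane follows directly: its normal is c - d and it passes through the midpoint of the segment. The sketch below assumes c and d are already given (finding them is a separate optimization problem) and uses made-up coordinates.

```python
def bisecting_plane(c, d):
    """Return (w, b) for the plane w . x = b that perpendicularly bisects
    the segment from d to c; w points from d towards c."""
    w = tuple(ci - di for ci, di in zip(c, d))
    midpoint = tuple((ci + di) / 2 for ci, di in zip(c, d))
    b = sum(wi * mi for wi, mi in zip(w, midpoint))
    return w, b

def side(x, w, b):
    """Positive on c's side of the plane, negative on d's side, zero on it."""
    return sum(wi * xi for wi, xi in zip(w, x)) - b

# Hypothetical closest points of the two hulls.
c, d = (2.0, 0.0), (0.0, 0.0)
w, b = bisecting_plane(c, d)
```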

Page 36: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Best Linear Separator
Supporting plane method

Maximize the distance between two parallel supporting planes

Distance = “Margin”
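
A hedged reconstruction of the margin formula, assuming the common normalization in which the two supporting planes are written as $w \cdot x = b + 1$ and $w \cdot x = b - 1$ (this normalization is an assumption, not taken from the slide). The distance between two parallel planes with the same normal $w$ is the difference of their offsets divided by the length of $w$:

```latex
\[
\text{margin}
  = \frac{(b + 1) - (b - 1)}{\lVert w \rVert}
  = \frac{2}{\lVert w \rVert}
\]
```

Under this assumption, maximizing the margin is equivalent to minimizing $\lVert w \rVert$ while every training point stays on the correct side of its supporting plane.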

Page 37: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Best Linear Separator
Supporting plane method

Page 38: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Border line is nonlinear

Page 39: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Non-linear transformation: use of kernel function
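
A minimal sketch of what the non-linear transformation buys, using a toy 1-D data set in which members sit between nonmembers, so no single threshold on x separates them. The map phi(x) = (x, x²), the matching kernel and the hand-picked linear border are illustrative assumptions, not the kernel used in the lecture.

```python
def phi(x):
    """Explicit projection of a 1-D input to a 2-D feature space."""
    return (x, x * x)

def kernel(x, y):
    """Equals phi(x) . phi(y), computed without forming phi explicitly."""
    return x * y + (x * y) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_rule(z, w=(0.0, -1.0), b=1.0):
    """Hand-picked linear border in the projected space: member if w . z + b > 0."""
    return dot(w, z) + b > 0

# Hypothetical 1-D data: members lie between the nonmembers.
members = [-0.5, 0.0, 0.5]
nonmembers = [-2.0, 2.0]

member_calls = [linear_rule(phi(x)) for x in members]
nonmember_calls = [linear_rule(phi(x)) for x in nonmembers]
```

After projection the straight line x² = 1 separates the two groups, and the kernel gives the same inner products as the explicit projection, which is what lets an SVM work in the higher-dimensional space without constructing it.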

Page 40: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Page 41: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Page 42: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Page 43: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Page 44: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM Method

Page 45: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM for classification of drugs

How to represent a drug?
• Each structure is represented by a specific feature vector assembled from structural and physico-chemical properties:
  Simple molecular properties (molecular weight, no. of rotatable bonds etc.; 18 in total)
  Molecular connectivity and shape (28 in total)
  Electro-topological state polarity (84 in total)
  Quantum chemical properties (electric charge, polarizability etc.; 13 in total)
  Geometrical properties (molecular size vector, van der Waals volume, molecular surface etc.; 16 in total)

J. Chem. Inf. Comput. Sci. 44, 1630 (2004)
J. Chem. Inf. Comput. Sci. 44, 1497 (2004)
Toxicol. Sci. 79, 170 (2004)
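
The descriptor groups above add up to a 159-component feature vector (18 + 28 + 84 + 13 + 16). The sketch below shows one way to assemble such a vector in a fixed order. The group keys and the function name are hypothetical, and real descriptor values would come from a descriptor calculation step that is not shown.

```python
# Group sizes as listed on the slide; the key names are illustrative.
DESCRIPTOR_GROUPS = [
    ("simple_molecular", 18),
    ("connectivity_and_shape", 28),
    ("electrotopological_state", 84),
    ("quantum_chemical", 13),
    ("geometrical", 16),
]

def assemble_feature_vector(values_by_group):
    """Concatenate the descriptor groups in a fixed order, checking that
    each group supplies the expected number of values."""
    vector = []
    for name, size in DESCRIPTOR_GROUPS:
        values = values_by_group[name]
        if len(values) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(values)}")
        vector.extend(float(v) for v in values)
    return vector

# Placeholder descriptors (all zeros) just to show the resulting length.
placeholder = {name: [0.0] * size for name, size in DESCRIPTOR_GROUPS}
feature_vector = assemble_feature_vector(placeholder)
```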

Page 46: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM-based drug design and property prediction software

Which class does your drug belong to?

Diagram labels:
Your drug structure (chemical structure)
Option one: input structure on local machine
Option two: input structure through internet
Send structure to classifier
Computer loaded with SVMProt
SVM classifier for every drug class
Identified classes
Drug designed or property predicted

Page 47: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

SVM drug prediction results

Protein inhibitor/activator/substrate prediction
• 86% of the 129 estrogen receptor activators and 84% of 101 non-activators correctly predicted.
• 81% of 116 P-glycoprotein substrates and 79% of 85 non-substrates correctly predicted.

Drug toxicity prediction
• 97% of 102 torsade de pointes-causing (TdP+) and 84% of 243 TdP- agents correctly predicted.
• 73% of 229 genotoxic and 93% of 631 non-genotoxic agents correctly predicted.

Pharmacokinetics prediction
• 95% of 276 BBB+ and 82% of 139 BBB- agents correctly predicted.
• 90% of 131 human intestinal absorption and 80% of 65 non-absorption agents correctly predicted.
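
Each result above reports two rates, one for the positive class and one for the negative class. As a worked example, they can be combined into a single overall accuracy by weighting each rate by its class size. The figures are the rounded percentages from the slide, so the outputs are approximate, and the function name is an assumption.

```python
def overall_accuracy(pos_rate, n_pos, neg_rate, n_neg):
    """Overall fraction correct, weighting each class rate by its class size."""
    return (pos_rate * n_pos + neg_rate * n_neg) / (n_pos + n_neg)

# Estrogen receptor activators vs non-activators.
er_accuracy = overall_accuracy(0.86, 129, 0.84, 101)
# Genotoxic vs non-genotoxic agents.
genotox_accuracy = overall_accuracy(0.73, 229, 0.93, 631)
```

This gives roughly 85% for the estrogen receptor set and roughly 88% for the genotoxicity set. The second figure is pulled upward by the larger non-genotoxic class even though only 73% of the genotoxic agents were recognized.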

Page 48: Advanced Bioinformatics Lecture 7: Computer-aided lead identification

Projects Q&A!

1. Biological pathway simulation
2. Computer-aided anti-cancer drug design
3. Disease-causing mutation on drug target

Any questions? Thank you!