
The Inverse Protein Folding Problem*

Arvind Gupta

Simon Fraser University

May 24, 2005


*Joint work with J. Manuch, C. Mead, L. Stacho, B. Bhattacharyya, X. Huang

Canada-China Industrial Workshop, 2005 Hong Kong Baptist University

Outline

• Background

• Forces in Protein Folding

• Hydrophobic-Polar Model

• Protein Databank

• Determining Attributes of the Ideal Lattice

• Future Steps

DNA

• Genetic code

• A “string” of nucleotides over A, C, G, T

• Code for all proteins

• Self-replicating

Proteins

• A “string” over 20 amino acids

• In solvent, will fold into a unique 3D spatial structure with minimal energy

Protein Structure

• Structure determines protein function.

• Proteins are normally in an aqueous environment.

• Proteins are globular.


Proteins in the body

• Proteins are involved in all processes in the body:

Insulin

Hemoglobin


Proteins and diseases

M. Thorpe, Protein Folding, HIV and Drug Design, Physics and Technology Forefronts (2003).

Forward Protein Folding Problem

• Identify the protein structure for a specific amino acid sequence.

MAGWTRLS..

• Central open problem in biology

• NP-hard under most models

Inverse Protein Folding Problem

• Given a structure (or a functionality), identify an amino acid sequence whose fold will be that structure (exhibit that functionality).

• Crucial problem in drug design.

• NP-hard under most models.

Forces acting on Proteins

• Hydrogen bonding

• Van der Waals interactions

• Ion pairing

• Disulfide bonds

• Intrinsic properties (conformational preference)

• Hydrophobicity: the dominant force in protein folding (Dill, 1990)

Hydro (water) + philic (loving) or phobic (fearing)

Hydrophobic Interactions

• Each amino acid can be classified as either hydrophobic or hydrophilic (polar).

• Hydrophobic [polar] amino acids are in a higher [lower] energy state in an aqueous environment.

Hydrophobic – Polar (HP) Model

• Introduced by Dill (1985) and Chan (1985)

• “0” for polar; “1” for hydrophobic

• Protein sequence embedded on a lattice

• Each amino acid in exactly one cell

• Interactions across adjacent cells

• Empty lattice cells contain water

• Given a protein, maximize hydrophobic interactions (native fold).

• I.e., given a 0-1 string, embed it onto a lattice, maximizing adjacent 1’s.
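As a minimal sketch of the objective stated on this slide (not the authors' code), the Python below scores a fold on the 2-D square lattice by counting pairs of 1's that sit in adjacent cells without being neighbours along the chain. The move alphabet `U/D/L/R` and the names `embed` and `hh_contacts` are assumptions introduced only for illustration.

```python
# Hypothetical helpers: a fold is given as a string of unit moves on the square lattice.
MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def embed(moves, start=(0, 0)):
    """Turn a move string into lattice coordinates; reject folds that revisit a cell."""
    coords = [start]
    for m in moves:
        dx, dy = MOVES[m]
        x, y = coords[-1]
        coords.append((x + dx, y + dy))
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    return coords


def hh_contacts(protein, coords):
    """Count pairs of 1's that are lattice neighbours but not chain neighbours."""
    if len(protein) != len(coords):
        raise ValueError("protein and fold must have the same length")
    index = {cell: i for i, cell in enumerate(coords)}
    count = 0
    for i, (x, y) in enumerate(coords):
        if protein[i] != "1":
            continue
        # Look only right and up so that every adjacent pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and protein[j] == "1" and abs(i - j) > 1:
                count += 1
    return count


# The string 1001 bent into a unit square brings its two 1's together.
print(hh_contacts("1001", embed("RUL")))  # 1
print(hh_contacts("1001", embed("RRR")))  # 0
```

Under this scoring, a native fold of a 0-1 string is any self-avoiding embedding that maximizes `hh_contacts`; the sketch only evaluates a given fold and does not search for the maximum.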

The 2-D Square Lattice

• Legend (symbols shown in the slide figure): hydrophobic “1”, polar “0”, peptide bond, hydrophobic interaction

• Example protein: shown in the slide figure

Inverse protein folding

• Problem: For a given shape find a protein (amino acid string) with a native fold approximating the shape.

• Example.

Constructible structures

Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S.

• Proof by induction:

– Base case: p(S) = 010010010010

– Inductive case: shown in the slide figures

– Folds are saturated: every hydrophobic “1” is involved in two hydrophobic interactions

– Saturated implies native
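The saturation condition can be checked mechanically. The sketch below counts, for each 1, its hydrophobic contacts in a given fold. On the square lattice a non-terminal residue has four neighbouring cells, two of which are taken by its chain neighbours, so it can have at most two contacts; this is why a saturated fold of a string whose terminals are 0 cannot be beaten. The coordinate list `PINWHEEL` is a fold constructed here for the base-case string and is an assumption, not necessarily the fold drawn on the slide.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def contacts_per_residue(protein, coords):
    """Map the index of each 1 to its number of hydrophobic contacts."""
    index = {cell: i for i, cell in enumerate(coords)}
    result = {}
    for i, (x, y) in enumerate(coords):
        if protein[i] != "1":
            continue
        n = 0
        for dx, dy in NEIGHBOURS:
            j = index.get((x + dx, y + dy))
            # A contact is an adjacent 1 that is not the chain predecessor or successor.
            if j is not None and abs(i - j) > 1 and protein[j] == "1":
                n += 1
        result[i] = n
    return result


def is_saturated(protein, coords):
    """True if every 1 takes part in exactly two hydrophobic interactions."""
    return all(n == 2 for n in contacts_per_residue(protein, coords).values())


BASE = "010010010010"  # base-case string from the slide
# Hypothetical fold: the four 1's form a 2x2 square, the 0's loop around it.
PINWHEEL = [(-1, 0), (0, 0), (0, -1), (1, -1), (1, 0), (2, 0),
            (2, 1), (1, 1), (1, 2), (0, 2), (0, 1), (-1, 1)]
STRAIGHT = [(i, 0) for i in range(len(BASE))]

print(is_saturated(BASE, PINWHEEL))  # True
print(is_saturated(BASE, STRAIGHT))  # False
```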

Stability of proteins

• A protein is stable if it has a unique “native fold” (fold with minimal energy).

• Most natural proteins are stable.

• The protein in our example is not stable: it has 82 native folds in total!

Conjecture: For any constructible structure S, the protein p(S) is stable.

• Tested for >20,000 constructible structures.

• Mathematically proved for two simple infinite classes of constructible structures, L0 and L1 (pictured on the slide).

Boundary squares

• Diagonal frame: the smallest diagonal rectangle containing all hydrophobic “1”s.

• Boundary square: a hydrophobic “1” lying on the border of the diagonal frame.

(Figure: an example fold with 5 boundary squares)

• Useful for finding the last tile of a constructible structure.

• A saturated fold has at least 4 of them.

Lemma. Let p = 0{0,1}*0 be a protein string containing none of 11, 000, and 10101 as a substring. For every saturated fold of p, each boundary square not adjacent to a terminal is the main square of a corner-closed core.

Proof for L0 structures

• Take a saturated fold for p(S), where S is in L0.

• It has at least 4 boundary squares, and at least 2 not adjacent to a terminal (the first or the last amino acid).

• By the Lemma, each is contained in a corner-closed core, i.e., it is the 1 highlighted in red on the slide within a substring 1001001 of the protein string.

• In $p(S) = 0(10010)^n(01001)^n0$, there are only two occurrences of the substring 1001001, and they overlap.

• Hence, the cores match each other and form a fully-closed core (closed on 3 sides): the last tile.

• Cut the last tile and apply induction.
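The string-level facts used in this proof are easy to check for small n. The sketch below builds the L0 string, reading the flattened exponents as $0(10010)^n(01001)^n0$ (for n = 1 this reproduces the base-case string 010010010010), then tests the Lemma's hypothesis and counts overlapping occurrences of 1001001. The function names are hypothetical.

```python
import re


def p_L0(n):
    """Protein string for the n-th L0 structure: 0 (10010)^n (01001)^n 0."""
    return "0" + "10010" * n + "01001" * n + "0"


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    last_start = len(text) - len(pattern)
    return sum(1 for i in range(last_start + 1) if text.startswith(pattern, i))


def satisfies_lemma_hypothesis(p):
    """p has the form 0{0,1}*0 and contains none of 11, 000, 10101."""
    has_form = re.fullmatch(r"0[01]*0", p) is not None
    return has_form and not any(s in p for s in ("11", "000", "10101"))


for n in range(1, 5):
    p = p_L0(n)
    print(n, satisfies_lemma_hypothesis(p), count_overlapping(p, "1001001"))
```

For each n tried, the hypothesis holds and the count is 2. Both occurrences straddle the junction between the two repeated blocks, which is why they overlap.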

L1 structures are more complex

• $p(S) = 0(10010)^n\,010\,(10010)^m(01001)^m\,01\,(01001)^{n-1}0$

• p(S) contains one occurrence of substring 10101 (Lemma cannot be directly applied) and three occurrences of 1001001 (two corner-closed cores do not imply a fully-closed core).
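The two counts quoted for L1 can be reproduced for a sample choice of parameters. The sketch below assumes the flattened exponents read as 0 (10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0; that reading, the function names, and the sample values n = 2, m = 1 are assumptions, and the check says nothing about other parameter values.

```python
def p_L1(n, m):
    """Assumed reading of the L1 string: 0 (10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0."""
    return ("0" + "10010" * n + "010" + "10010" * m
            + "01001" * m + "01" + "01001" * (n - 1) + "0")


def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    last_start = len(text) - len(pattern)
    return sum(1 for i in range(last_start + 1) if text.startswith(pattern, i))


sample = p_L1(2, 1)
print(len(sample))                           # 32
print(count_overlapping(sample, "10101"))    # 1
print(count_overlapping(sample, "1001001"))  # 3
```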

Choosing a Lattice

• 2D is easier

– Fewer options for combinatorial case analysis

– More visually intuitive

– Torsion angles describe protein mainchain

• 3D is more relevant

– More biologically relevant

– More representative of actual protein structures

– Directly applicable to known protein structures

Protein Data Bank (PDB)

• Worldwide repository for 3-D biological macromolecular structure data

• Contains 30,857 known protein structures (May 17, 2005)

• Structures derived using different techniques

– Nuclear Magnetic Resonance spectroscopy

– X-ray crystallography

• PDB ‘known structures’ are really models of the structure of a protein

Determining Ideal Lattice Attributes

1. Should all edges of the lattice be identical in length?

2. How should distances between non-adjacent lattice points behave?

3. What angles should the lattice have?

4. How regular should the lattice be?

Use PDB statistics to answer these questions

Assemble a Set of Proteins

Create a subset of good-quality protein structures from the PDB:

a) Protein structures generated using X-ray diffraction

b) High-resolution structures (<= 1.75 Å)

c) Model fits the experimental data well

Result: 3704 protein structures in subset

Q1: Uniform Edge Length?

Overall distribution of consecutive residue distance:

The distance between consecutive residues is consistently close to 3.8 Å.

Answer to Question 1: All edge lengths should be uniform with length 3.8 Å.
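A minimal sketch of the measurement behind Question 1: given one representative coordinate per residue (for example the Cα atom, in Å), compute the distance between each pair of consecutive residues. The function name and the sample coordinates are made up for illustration; no PDB parsing is shown.

```python
import math
import statistics


def consecutive_distances(coords):
    """Distances between residue i and residue i+1 along the chain."""
    return [math.dist(a, b) for a, b in zip(coords, coords[1:])]


# Hypothetical three-residue chain with ideal 3.8 Å spacing.
chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
distances = consecutive_distances(chain)
print(distances)
print(statistics.mean(distances))
```

Pooling these distances over every chain in the structure subset would give a distribution of the kind the slide summarizes.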

Q2: Non-adjacent Vertex Distances?

Overall distribution of non-consecutive residue distance:

• Minimum distance: 3.06 Å

• Only 10 distances < 3.5 Å

• 1813 distances < 3.8 Å (out of 426 billion pairs)

Answer to Question 2: Non-adjacent vertices should be at least 3.8 Å apart.
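The statistic for Question 2 ranges over pairs of residues that are not neighbours along the chain. The sketch below, with hypothetical function names and sample coordinates, finds the smallest such distance in one chain and counts pairs closer than a threshold. It also evaluates how rare the sub-3.8 Å pairs quoted on the slide are.

```python
import math
from itertools import combinations


def nonconsecutive_distances(coords):
    """Distances between all residue pairs (i, j) with j - i > 1."""
    return [math.dist(coords[i], coords[j])
            for i, j in combinations(range(len(coords)), 2) if j - i > 1]


def count_below(coords, threshold):
    """Number of non-consecutive pairs closer than threshold."""
    return sum(1 for d in nonconsecutive_distances(coords) if d < threshold)


# Hypothetical four-residue walk around a square with 3.8 Å sides.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(min(nonconsecutive_distances(square)))
print(count_below(square, 3.5))

# Fraction of pairs below 3.8 Å, using the counts quoted on the slide.
print(1813 / 426e9)
```

The quoted counts correspond to roughly four pairs per billion, which is consistent with treating 3.8 Å as an effective minimum separation.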

Q3: Lattice Angles?

(Figures: one amino acid; amino acid chain)

• Calculate Cα angles: the angle formed by three consecutive Cα atoms

• Group results by middle amino acid residue type

Overall distribution of Cα angles is bimodal:

• Sharp peak at 90°

• Shallow peak at 120°
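The angle in question is the one at the middle atom of three consecutive Cα positions. A small sketch of that computation follows; the function name and sample coordinates are illustrative assumptions.

```python
import math


def ca_angle(prev_ca, mid_ca, next_ca):
    """Angle in degrees at mid_ca between the directions to prev_ca and next_ca."""
    u = [p - m for p, m in zip(prev_ca, mid_ca)]
    v = [n - m for n, m in zip(next_ca, mid_ca)]
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


origin = (0.0, 0.0, 0.0)
print(ca_angle((3.8, 0.0, 0.0), origin, (0.0, 3.8, 0.0)))  # right angle
tilted = (3.8 * math.cos(math.radians(120)), 3.8 * math.sin(math.radians(120)), 0.0)
print(ca_angle((3.8, 0.0, 0.0), origin, tilted))           # close to 120
```

Grouping the resulting angles by the residue type of the middle atom gives the per-amino-acid breakdown mentioned in the second bullet.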

Some differences appear for Cα angles around certain amino acids.

Shown: proline, phenylalanine, aspartic acid

Q4: Lattice Regularity?

• Determine the average coordinate root mean square deviation (c-RMS) between the original PDB structure and the lattice-approximated structures (over the entire subset of 3704 PDB protein structures)

$$\text{c-RMS} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\lVert a_i - b_i\rVert^{2}}$$

$a_i$ = coordinates of lattice vertex corresponding to $b_i$

$b_i$ = coordinates of residue $i$ in protein X-ray structure
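As a sketch of the c-RMS definition (the square root of the mean squared distance between corresponding points), the function below takes the lattice vertices a_i and the residue coordinates b_i as equal-length lists. It assumes the two structures are already expressed in the same coordinate frame; whether the original study superimposed them first is not stated on the slide.

```python
import math


def c_rms(lattice_points, residue_points):
    """Root mean square deviation between corresponding coordinates."""
    if len(lattice_points) != len(residue_points) or not residue_points:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(a, b) ** 2
                for a, b in zip(lattice_points, residue_points))
    return math.sqrt(total / len(residue_points))


# Hypothetical example: every lattice vertex is exactly 1 Å from its residue.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(c_rms(a, b))  # 1.0
```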

• Periodic Lattices: Cubic and Face-Centered-Cubic (FCC)

• Randomized Lattices: Shift each vertex in periodic lattices by a random value from normal (0, 0.0025) distribution, preserve edges

• De Novo Random Lattices: Generate random nodes and edges, maintain average degree and edge length of periodic lattices
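The randomized-lattice construction can be sketched as follows for a small block of the cubic lattice: vertex positions are perturbed while the edge list is left untouched. Several details are assumptions, since the slide does not specify them: 0.0025 is treated as a variance (standard deviation 0.05), each coordinate is perturbed independently, and the lattice has unit spacing. All names are hypothetical.

```python
import math
import random
from itertools import product


def cubic_lattice(size):
    """Positions and edges of a size x size x size block of the unit cubic lattice."""
    vertices = list(product(range(size), repeat=3))
    present = set(vertices)
    edges = []
    for x, y, z in vertices:
        for dx, dy, dz in ((1, 0, 0), (0, 1, 0), (0, 0, 1)):
            w = (x + dx, y + dy, z + dz)
            if w in present:
                edges.append(((x, y, z), w))
    positions = {v: tuple(float(c) for c in v) for v in vertices}
    return positions, edges


def randomize(positions, sigma, rng):
    """Shift every coordinate by Gaussian noise; vertex labels (and so edges) are kept."""
    return {v: tuple(c + rng.gauss(0.0, sigma) for c in p)
            for v, p in positions.items()}


def mean_edge_length(positions, edges):
    return sum(math.dist(positions[a], positions[b]) for a, b in edges) / len(edges)


positions, edges = cubic_lattice(3)
shaken = randomize(positions, sigma=math.sqrt(0.0025), rng=random.Random(0))
print(len(positions), len(edges))
print(mean_edge_length(positions, edges), mean_edge_length(shaken, edges))
```

Because edges are stored by vertex label rather than by position, the perturbed lattice keeps the same connectivity and degree as the periodic one, which is what "preserve edges" is taken to mean here.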

• Average c-RMS values generally increase as the randomization of the lattices increases.

Average c-RMS by lattice model:

Lattice   Degree   Periodic   Randomized periodic   De novo random
FCC       12       1.82       1.967                 4.85
Cubic     6        3.11       3.21                  3.96

Answer to Question 4: Periodic lattices achieve a better approximation of protein structure than random lattices of the same degree.

Results: Ideal Lattice Attributes

• Uniform edge lengths of 3.8 Å

• Minimum distance between any two vertices of 3.8 Å

• Supporting mainly 90° and 120° angles

• Periodic in structure

Candidate lattices (space-filling)

• Cubic

• Hexagonal prism

• Truncated octahedron

• Cuboctahedron

• Truncated tetrahedron

Candidate lattices (vector-based)

Face-centered cubic (FCC)

Side+FCC (S+FCC)

Extended FCC (e-FCC)
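For the plain FCC lattice, the angles available at a vertex can be enumerated directly from its 12 nearest-neighbour vectors, which are the permutations of (±1, ±1, 0). The sketch below tallies the angle between every pair of these vectors. The S+FCC and e-FCC vector sets are not defined in the slides, so they are not attempted; the function names are hypothetical.

```python
import math
from itertools import combinations, permutations, product


def fcc_neighbour_vectors():
    """The 12 nearest-neighbour vectors of FCC: all permutations of (±1, ±1, 0)."""
    vectors = set()
    for sx, sy in product((1, -1), repeat=2):
        vectors.update(permutations((sx, sy, 0)))
    return sorted(vectors)


def angle_deg(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    cos = dot / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def angle_histogram(vectors):
    """Count unordered pairs of vectors by their angle, rounded to whole degrees."""
    hist = {}
    for u, v in combinations(vectors, 2):
        a = round(angle_deg(u, v))
        hist[a] = hist.get(a, 0) + 1
    return hist


print(len(fcc_neighbour_vectors()))                            # 12
print(sorted(angle_histogram(fcc_neighbour_vectors()).items()))
# [(60, 24), (90, 12), (120, 24), (180, 6)]
```

Scaling the vectors by 3.8/√2 gives edges of length 3.8 Å without changing any angle, so FCC offers both the 90° and the 120° angles, alongside 60° and 180°.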

RMS comparison of lattices

Lattice                 c-RMS    d-RMS    a-RMS
Truncated octahedron    5.3053   3.2479   13.0982
Hexagonal prism         3.8704   2.4312   10.0313
Truncated tetrahedron   3.6913   2.4133   19.9030
Simple cubic            3.1123   2.1081   21.1005
Cuboctahedron           2.5581   1.7427   8.3526
FCC                     1.8212   1.4369   8.3346
S+FCC                   2.1791   1.5819   6.2022
e-FCC                   1.5385   1.1048   2.5700

Angle comparison of lattices

Lattice                 Degree   Closeness to 90°   Closeness to 120°
Trunc. octahedron       4        20                 10
Hexagonal prism         5        18                 24
Trunc. tetrahedron      6        42                 36
Cubic                   6        18                 36
Cuboctahedron           8        30                 34.29
FCC                     12       30                 32.73
S+FCC                   18       28.82              36.47
e-FCC                   42       31.40              38.72


Future

1. Investigate candidate lattices to determine an ideal lattice for inverse protein folding

2. Mathematically prove that the ideal lattice can generate stable sequences for specified protein shapes within the HP model

3. Attempt to assign specific amino acids to lattice sites

4. Investigate protein sequences generated by the model for stability and folding properties.

5. Incorporate other protein folding forces

– Hydrogen bonding

– Van der Waals interactions

– Intrinsic properties (conformational preference)

– Ion pairing

– Disulfide bonds

Questions?