The Inverse Protein Folding Problem*
Arvind Gupta
Simon Fraser University
May 24, 2005
*Joint work with J. Manuch, C. Mead, L. Stacho, B. Bhattacharyya, X. Huang
Canada-China Industrial Workshop, 2005 Hong Kong Baptist University
Outline
• Background
• Forces in Protein Folding
• Hydrophobic-Polar Model
• Protein Data Bank
• Determining Attributes of the Ideal Lattice
• Future Steps
DNA
• Genetic code
• A “string” of nucleotides over A, C, G, T
• Codes for all proteins
• Self-replicating
Proteins
• A “string” over 20 amino acids
• In solvent, will fold into a unique 3D spatial structure with minimal energy
Protein Structure
• Structure determines protein function.
• Proteins are normally in an aqueous environment.
• Proteins are globular.
Proteins in the body
• Proteins are involved in all processes in the body:
Insulin
Hemoglobin
Proteins and diseases
M. Thorpe, Protein Folding, HIV and Drug Design, Physics and Technology Forefronts (2003).
Forward Protein Folding Problem
• Identify the protein structure for a specific amino acid sequence.
MAGWTRLS..
• Central open problem in biology
• NP-hard under most models
Inverse Protein Folding Problem
• Given a structure (or a functionality), identify an amino acid sequence whose fold will be that structure (exhibit that functionality).
• Crucial problem in drug design.
• NP-hard under most models.
Forces acting on Proteins
• Hydrogen bonding
• Van der Waals interactions
• Ion pairing
• Disulfide bonds
• Intrinsic properties
(conformational preference)
• Hydrophobicity: the dominant force in protein folding (Dill, 1990)
Hydro (water) + philic (loving) or phobic (fearing)
Hydrophobic Interactions
• Each amino acid can be classified as either hydrophobic or hydrophilic (polar).
• Hydrophobic [polar] amino acids are in a higher [lower] energy state in an aqueous environment.
Hydrophobic – Polar (HP) Model
• Introduced by Dill (1985) and Chan (1985)
• “0” for polar; “1” for hydrophobic
• Protein sequence embedded on a lattice
• Each amino acid in exactly one cell
• Interactions across adjacent cells
• Empty lattice cells contain water
• Given a protein, maximize hydrophobic interactions (native fold).
• I.e., given a 0-1 string, embed it onto a lattice so as to maximize the number of adjacent 1’s.
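The quantity being maximized can be made concrete with a minimal Python sketch. It assumes a fold on the 2-D square lattice is given as a list of (x, y) cells, one per residue, and that only 1-1 pairs that are lattice neighbours but not chain neighbours count as hydrophobic interactions. The function name `hp_contacts` and this data layout are illustrative choices, not part of the original model description.

```python
def hp_contacts(seq, coords):
    """Count hydrophobic (1-1) contacts of an HP string folded on the square lattice.

    seq    -- string over {"0", "1"}
    coords -- list of (x, y) integer tuples, coords[i] is the cell of residue i
    """
    if len(seq) != len(coords):
        raise ValueError("one lattice cell per residue is required")
    if len(set(coords)) != len(coords):
        raise ValueError("fold is not self-avoiding")
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues must occupy adjacent cells")

    index = {cell: i for i, cell in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "1":
            continue
        # Look only right and up so that every neighbouring pair is seen once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "1" and abs(i - j) > 1:
                total += 1
    return total
```

For instance, the string 1001 folded around a unit square has one contact between its two ends, while the same string laid out in a straight line has none.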
The 2-D Square Lattice
• Legend (figure): hydrophobic “1”, polar “0”, peptide bond, hydrophobic interaction.
• Example (figure): a protein string and its fold.
Inverse protein folding
• Problem: For a given shape find a protein (amino acid string) with a native fold approximating the shape.
• Example.
Constructible structures
Theorem: For any constructible structure S, there exists a protein p(S) with a native fold exactly filling the structure S.
• Proof by induction:
– Base case: p(S) = 010010010010
– Inductive case: (figure)
• Proof (key idea):
– Folds are saturated: every hydrophobic “1” is involved in two hydrophobic interactions.
– Saturated implies native.
Stability of proteins
• A protein is stable if it has a unique “native fold” (fold with minimal energy).
• Most natural proteins are stable.
• The protein in our example is not stable: altogether it has 82 native folds!
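Stability in this sense can be checked by brute force for short strings. The sketch below is an illustration only, not the method used in the talk: it enumerates every self-avoiding walk on the 2-D square lattice, counting each fold once up to rotation and reflection, and reports the maximum number of hydrophobic contacts together with the number of folds reaching it. A string is stable under this definition when that number is 1. The names `native_folds` and `count_contacts` are hypothetical, and the sketch does not attempt to reproduce the count of 82 quoted for the example protein, whose string is not given in the text.

```python
STEPS = ((1, 0), (0, 1), (-1, 0), (0, -1))

def count_contacts(seq, path):
    """Number of 1-1 lattice neighbours that are not chain neighbours."""
    index = {cell: i for i, cell in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if seq[i] != "1":
            continue
        for dx, dy in ((1, 0), (0, 1)):  # each neighbouring pair seen once
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "1" and abs(i - j) > 1:
                total += 1
    return total

def native_folds(seq):
    """Return (max_contacts, number_of_folds_attaining_it)."""
    n = len(seq)
    if n < 2:
        return (0, 1)
    best_score, best_count = -1, 0

    def extend(path, occupied, turned):
        nonlocal best_score, best_count
        if len(path) == n:
            score = count_contacts(seq, path)
            if score > best_score:
                best_score, best_count = score, 1
            elif score == best_score:
                best_count += 1
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            nxt = (x + dx, y + dy)
            if nxt in occupied:
                continue
            if not turned and dy < 0:
                continue  # mirror image of a fold whose first turn goes up
            path.append(nxt)
            occupied.add(nxt)
            extend(path, occupied, turned or dy != 0)
            occupied.remove(nxt)
            path.pop()

    start = [(0, 0), (1, 0)]  # fixing the first step removes rotations
    extend(start, set(start), False)
    return best_score, best_count
```

The running time grows exponentially with the length of the string, so this is only practical for short examples such as the 12-residue base case p(S) = 010010010010.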
Stability of proteins
Conjecture: For any constructible structure S, the protein p(S) is stable.
• Tested for >20,000 constructible structures.
• Mathematically proved for two simple infinite classes of constructible structures, L0 and L1.
(Figures: examples of L0 and L1 structures.)
Boundary squares
• Diagonal frame: the smallest diagonal rectangle containing all hydrophobic “1”s.
• Boundary square: a hydrophobic “1” lying on the border of the diagonal frame.
(Figure: example fold with 5 boundary squares.)
• Boundary squares are useful for finding the last tile of a constructible structure.
• A saturated fold has at least 4 of them.
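One way to read the two definitions above, offered here as an assumption rather than as the authors' formal definition, is that a "diagonal rectangle" has sides at 45° to the lattice axes, so it is an ordinary bounding box in the rotated coordinates u = x + y and v = x - y. Under that reading, boundary squares can be found as in this short Python sketch (the function name is hypothetical):

```python
def boundary_squares(hydrophobic_cells):
    """Return the hydrophobic cells lying on the border of the diagonal frame.

    hydrophobic_cells -- list of (x, y) cells occupied by "1" residues.
    Assumes the diagonal frame is the bounding box in u = x + y, v = x - y.
    """
    if not hydrophobic_cells:
        return []
    us = [x + y for x, y in hydrophobic_cells]
    vs = [x - y for x, y in hydrophobic_cells]
    u_lo, u_hi, v_lo, v_hi = min(us), max(us), min(vs), max(vs)
    return [
        (x, y)
        for x, y in hydrophobic_cells
        if x + y in (u_lo, u_hi) or x - y in (v_lo, v_hi)
    ]
```

For a diamond of four cells around a centre cell, the four outer cells are boundary squares and the centre is not.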
Lemma. Let p = 0{0,1}*0 be a protein string containing none of 11, 000, or 10101 as a substring. For every saturated fold of p, each boundary square not adjacent to a terminal is the main square of a corner-closed core.
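The hypothesis of the Lemma is a purely syntactic condition on the string, so it is easy to test. A minimal Python sketch follows; the function name is illustrative.

```python
import re

FORBIDDEN = ("11", "000", "10101")

def satisfies_lemma_hypothesis(p):
    """True if p has the form 0{0,1}*0 and contains no forbidden substring."""
    if re.fullmatch(r"0[01]*0", p) is None:
        return False
    return not any(bad in p for bad in FORBIDDEN)
```

The base-case string 010010010010 passes this check, whereas any string containing 10101 does not.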
Proof for L0 structures
• Take a saturated fold of p(S), where S is in L0.
• It has at least 4 boundary squares, and at least 2 not adjacent to a terminal (the first or the last amino acid).
• By the Lemma, each is contained in a corner-closed core, i.e., is a red 1 of a substring 1001001 of the protein string.
• In p(S) = 0(10010)^n(01001)^n0, there are only two occurrences of the substring 1001001, and they overlap.
• Hence the cores match each other and form a fully-closed core (closed on 3 sides): the last tile.
• Cut off the last tile and apply induction.
L1 structures are more complex
• p(S) = 0(10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0
• p(S) contains one occurrence of the substring 10101 (so the Lemma cannot be applied directly) and three occurrences of 1001001 (two corner-closed cores do not imply a fully-closed core).
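The substring counts quoted for the L0 and L1 strings can be checked mechanically. The Python sketch below builds both families from the formulas on the slides and counts overlapping occurrences; the helper names are illustrative. Note that for the L1 formula the single occurrence of 10101 only appears when n is at least 2, since for n = 1 the trailing block (01001)^(n-1) is empty.

```python
def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def p_L0(n):
    """p(S) = 0 (10010)^n (01001)^n 0"""
    return "0" + "10010" * n + "01001" * n + "0"

def p_L1(n, m):
    """p(S) = 0 (10010)^n 010 (10010)^m (01001)^m 01 (01001)^(n-1) 0"""
    return ("0" + "10010" * n + "010" + "10010" * m
            + "01001" * m + "01" + "01001" * (n - 1) + "0")

if __name__ == "__main__":
    for n in range(1, 4):
        print("L0", n, count_overlapping(p_L0(n), "1001001"))
    for n in range(2, 4):
        for m in range(1, 3):
            s = p_L1(n, m)
            print("L1", n, m,
                  count_overlapping(s, "10101"),
                  count_overlapping(s, "1001001"))
```

With n = 1, `p_L0` reproduces the base-case string 010010010010 from the constructibility proof.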
Choosing a Lattice
• 2D is easier
– Fewer options for combinatorial case analysis
– More visually intuitive
– Torsion angles describe protein mainchain
• 3D is more relevant
– More biologically relevant
– More representative of actual protein structures
– Directly applicable to known protein structures
Protein Data Bank (PDB)
• Worldwide repository for 3-D biological macromolecular structure data
• Contains 30857 known protein structures (May 17, 2005)
• Structures derived using different techniques
– Nuclear Magnetic Resonance spectroscopy
– X-ray crystallography
• PDB ‘known structures’ are really models of the structure of a protein
Determining Ideal Lattice Attributes
1. Should all edges of the lattice be identical in length?
2. How should distances between non-adjacent lattice points behave?
3. What angles should the lattice have?
4. How regular should the lattice be?
Use PDB statistics to answer these questions
Assemble a Set of Proteins
Create a subset of good-quality protein structures from the PDB:
a) Protein structures generated using X-ray diffraction
b) High-resolution structures (<= 1.75 Å)
c) Model fits the experimental data well
Result: 3704 protein structures in the subset
Q1: Uniform Edge Length?
Overall distribution of consecutive residue distance (figure):
Consecutive residue distances are consistently about 3.8 Å.
Answer to Question 1: All edge lengths should be uniform with length 3.8 Å.
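The statistic behind Question 1 is simple to compute for a single chain. The Python sketch below assumes each residue is represented by one 3-D point (for example its Cα position, which is an assumption here since the slide does not name the atom) given in Å; parsing PDB files is deliberately left out.

```python
import math

def consecutive_distances(residue_coords):
    """Distances between each pair of consecutive residues along the chain."""
    return [math.dist(a, b) for a, b in zip(residue_coords, residue_coords[1:])]
```

Pooling these values over all chains in the subset would give the distribution summarized above.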
Q2: Non-adjacent Vertex Distances?
Overall distribution of non-consecutive residue distance (figure):
• minimum distance: 3.06 Å
• only 10 distances < 3.5 Å
• 1813 distances < 3.8 Å (out of 426 billion pairs)
Answer to Question 2: Non-adjacent vertices should be at least 3.8 Å apart.
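The counts above come from comparing every pair of residues that are not neighbours along the chain. A minimal Python sketch of that check for one chain, assuming one 3-D point per residue in Å (the function name and the quadratic all-pairs loop are illustrative choices):

```python
import math
from itertools import combinations

def close_nonconsecutive_pairs(residue_coords, threshold=3.8):
    """Return (i, j, distance) for non-consecutive residues closer than threshold."""
    hits = []
    for (i, a), (j, b) in combinations(enumerate(residue_coords), 2):
        if j - i <= 1:
            continue  # skip chain neighbours
        d = math.dist(a, b)
        if d < threshold:
            hits.append((i, j, d))
    return hits
```

The all-pairs loop is quadratic in chain length, which is fine for single chains; a spatial grid would be the natural replacement at database scale.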
Q3: Lattice Angles?
(Figures: one amino acid; an amino acid chain.)
• Calculate Cα angles: the angle produced by three consecutive Cα atoms
• Group results by middle amino acid residue type
Overall distribution of Cα angles (figure):
Bimodal distribution:
• Sharp peak at 90°
• Shallow peak at 120°
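The angle at the middle of three consecutive atoms follows from the dot product of the two vectors pointing from the middle atom to its neighbours. A minimal Python sketch, with an illustrative function name and points given as 3-D tuples:

```python
import math

def angle_at_middle(p_prev, p_mid, p_next):
    """Angle in degrees at p_mid formed by three consecutive points."""
    u = [a - b for a, b in zip(p_prev, p_mid)]
    v = [a - b for a, b in zip(p_next, p_mid)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding
    return math.degrees(math.acos(cos_theta))
```

Grouping the resulting angles by the residue type of the middle atom gives the per-residue distributions mentioned above.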
Q3: Lattice Angles?
Some differences appear for Cα angles around certain amino acids.
Shown (figure): Proline, Phenylalanine, Aspartic acid
Q4: Lattice Regularity?
• Determine the average coordinate root mean square deviation (c-RMS) between the original PDB structure and its lattice-approximated structures (over the entire 3704-protein PDB subset)
$$ \text{c-RMS} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \lvert a_i - b_i \rvert^2} $$
a_i = coordinates of the lattice vertex corresponding to b_i
b_i = coordinates of a residue in the protein X-ray structure
n = number of residues
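The c-RMS definition translates directly into code. The Python sketch below assumes the lattice vertices and the residue coordinates are already paired up in the same order; finding that pairing (the lattice fit itself) is not shown.

```python
import math

def c_rms(lattice_points, residue_points):
    """Coordinate root mean square deviation between paired 3-D points."""
    if len(lattice_points) != len(residue_points) or not lattice_points:
        raise ValueError("need two non-empty point lists of equal length")
    total = sum(math.dist(a, b) ** 2
                for a, b in zip(lattice_points, residue_points))
    return math.sqrt(total / len(lattice_points))
```

For two residues where one lattice vertex is exact and the other is off by 2 Å, the result is the square root of (0 + 4) / 2.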
Q4: Lattice Regularity?
• Periodic lattices: Cubic and Face-Centered Cubic (FCC)
• Randomized lattices: shift each vertex of a periodic lattice by a random value from a normal (0, 0.0025) distribution, preserving edges
• De novo random lattices: generate random nodes and edges, maintaining the average degree and edge length of the periodic lattices
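The "randomized lattice" construction can be sketched as follows for the cubic case. Several details are assumptions made for illustration: the slide does not say whether 0.0025 is a variance or a standard deviation (the sketch treats it as a variance, so sigma = 0.05), whether the shift is applied per coordinate, or in what units. Edges are preserved in the sense that vertices keep their integer index, so lattice connectivity is unchanged.

```python
import random

def cubic_lattice(k, edge=3.8):
    """Vertices of a k x k x k simple cubic lattice, keyed by integer index."""
    return {
        (i, j, l): (i * edge, j * edge, l * edge)
        for i in range(k) for j in range(k) for l in range(k)
    }

def randomize(vertices, sigma=0.05, seed=0):
    """Shift every coordinate by an independent N(0, sigma^2) offset."""
    rng = random.Random(seed)
    return {
        key: tuple(c + rng.gauss(0.0, sigma) for c in xyz)
        for key, xyz in vertices.items()
    }
```

Because the index keys are untouched, two vertices that were neighbours in the periodic lattice remain connected after the shift.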
Q4: Lattice Regularity?
• Average c-RMS values generally increase as the randomization of the lattices increases.

Average c-RMS by lattice type:
Lattice model   Degree   Periodic   Randomized periodic   De novo random
FCC             12       1.82       1.967                 4.85
Cubic           6        3.11       3.21                  3.96

Answer to Question 4: Periodic lattices achieve a better approximation of protein structure than random lattices of the same degree.
Results: Ideal Lattice Attributes
• Uniform edge lengths of 3.8 Å
• Minimum distance between any two vertices of 3.8 Å
• Supporting mainly 90° and 120° angles
• Periodic in structure
Candidate lattices (space-filling)
(Figures:) cubic, hexagonal prism, truncated octahedron, cuboctahedron, truncated tetrahedron
Candidate lattices (vector-based)
Face-centered cubic (FCC)
Side+FCC (S+FCC)
Extended FCC (e-FCC)
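A vector-based lattice is specified by the set of step vectors available at every vertex. As an illustration, the Python sketch below uses the standard neighbour vectors of the simple cubic and FCC lattices (unscaled; for FCC, the twelve vectors with two coordinates equal to ±1 and one equal to 0) and lists the angles that can occur between two different steps. The S+FCC and e-FCC vector sets are not given in the text and are not reconstructed here.

```python
import math
from itertools import product

def cubic_vectors():
    """The 6 unit steps of the simple cubic lattice."""
    return [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def fcc_vectors():
    """The 12 nearest-neighbour steps of the FCC lattice (unscaled)."""
    vecs = set()
    for a, b in product((1, -1), repeat=2):
        vecs.update({(a, b, 0), (a, 0, b), (0, a, b)})
    return sorted(vecs)

def step_angles(vectors):
    """Sorted set of angles (degrees, rounded) between distinct step vectors."""
    angles = set()
    for i, u in enumerate(vectors):
        for v in vectors[i + 1:]:
            dot = sum(x * y for x, y in zip(u, v))
            cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
            cos_theta = max(-1.0, min(1.0, cos_theta))
            angles.add(round(math.degrees(math.acos(cos_theta))))
    return sorted(angles)
```

Under these definitions the cubic lattice offers only 90° and 180° between steps, while FCC offers 60°, 90°, 120° and 180°, which is consistent with FCC supporting both of the preferred angles.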
RMS comparison of lattices
Lattice                 c-RMS    d-RMS    a-RMS
Truncated octahedron    5.3053   3.2479   13.0982
Hexagonal prism         3.8704   2.4312   10.0313
Truncated tetrahedron   3.6913   2.4133   19.9030
Simple cubic            3.1123   2.1081   21.1005
Cuboctahedron           2.5581   1.7427    8.3526
FCC                     1.8212   1.4369    8.3346
S+FCC                   2.1791   1.5819    6.2022
e-FCC                   1.5385   1.1048    2.5700
Angle comparison of lattices
Lattice                 Degree   Closeness to 90°   Closeness to 120°
Truncated octahedron    4        20                 10
Hexagonal prism         5        18                 24
Truncated tetrahedron   6        42                 36
Cubic                   6        18                 36
Cuboctahedron           8        30                 34.29
FCC                     12       30                 32.73
S+FCC                   18       28.82              36.47
e-FCC                   42       31.40              38.72
Future
1. Investigate candidate lattices to determine an ideal lattice for inverse protein folding
2. Mathematically prove that the ideal lattice can generate stable sequences for specified protein shapes within the HP model
3. Attempt to assign specific amino acids to lattice sites
4. Investigate protein sequences generated by the model for stability and folding properties.
5. Incorporate other protein folding forces
– Hydrogen bonding
– Van der Waals interactions
– Intrinsic properties (conformational preference)
– Ion pairing
– Disulfide bonds
Questions?