
Dynamics

Eran Eyal May 2011

Dynamics of proteins – what can we learn from experiments?

• X-ray crystallography

• NMR

Tyrosine-protein kinase, colored by B-factor

B-factor

A measure of the uncertainty in the position of individual atoms

Anisotropic Displacement Parameters (ADP) from X-ray crystallography

Anisotropic Fluctuations ΔX ≠ ΔY ≠ ΔZ

Isotropic displacements ΔX = ΔY = ΔZ

<(ΔX)²> = <(ΔY)²> = <(ΔZ)²> = <(ΔR)²>/3

Mean-square fluctuations of residues about their mean positions

<(ΔR)²> = <(ΔX)²> + <(ΔY)²> + <(ΔZ)²>

[Figure: displacement axes ΔX, ΔY, ΔZ]

Anisotropic Displacement Parameters (ADP) - background

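                          X       Y        Z      Occ   B-factor
ATOM   16 CA THR A 2    0.708  -5.416  -12.414   1.00   6.25

                          U11    U22    U33    U12    U13   U23
ANISOU 16 CA THR A 2    16172  26490  15240  -4220  2910  3190

 0.1524   0.0319   0.0291
 0.0319   0.2649  -0.0422
 0.0291  -0.0422   0.1672

$U = \begin{pmatrix} \langle(\Delta Z)^2\rangle & \langle\Delta Z\,\Delta Y\rangle & \langle\Delta Z\,\Delta X\rangle \\ \langle\Delta Y\,\Delta Z\rangle & \langle(\Delta Y)^2\rangle & \langle\Delta Y\,\Delta X\rangle \\ \langle\Delta X\,\Delta Z\rangle & \langle\Delta X\,\Delta Y\rangle & \langle(\Delta X)^2\rangle \end{pmatrix}$

Covariance matrix of each residue

[Figure: displacement axes ΔX, ΔY, ΔZ]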
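The step from six ANISOU integers to a symmetric 3×3 matrix can be sketched as follows. This is a minimal illustration, not the author's code. It relies on the PDB convention that ANISOU values are stored as U × 10⁴ (Å²) in the order U11, U22, U33, U12, U13, U23. The six integers used here are hypothetical, chosen so that the result reproduces the numbers of the 3×3 matrix shown above (which is listed there in Z, Y, X order). The function names are also illustrative.

```python
import math


def adp_matrix(u11, u22, u33, u12, u13, u23, scale=1e-4):
    """Build the symmetric 3x3 ADP matrix (in Å^2) from six ANISOU integers."""
    u11, u22, u33, u12, u13, u23 = (
        scale * value for value in (u11, u22, u33, u12, u13, u23)
    )
    return [
        [u11, u12, u13],
        [u12, u22, u23],
        [u13, u23, u33],
    ]


def equivalent_b_factor(u):
    """Isotropic B-factor equivalent to U: B = (8*pi^2/3) * trace(U)."""
    trace = u[0][0] + u[1][1] + u[2][2]
    return 8.0 * math.pi ** 2 / 3.0 * trace


# Hypothetical ANISOU integers (assumption, see the text above).
U = adp_matrix(1672, 2649, 1524, -422, 291, 319)
B_EQ = equivalent_b_factor(U)
```

The trace of U is the total mean-square fluctuation <(ΔR)²>, so `equivalent_b_factor` is the isotropic quantity that corresponds to the anisotropic matrix.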

Dynamics from NMR

http://ignm.ccbb.pitt.edu/oPCA_Online.htm

Dynamics of proteins – computational approaches

• Dynamics of proteins is clearly related to their function.

• Understanding the relation between the two is a main challenge in the field of biophysics

• Molecular Dynamics provides a way to conduct non-equilibrium simulations but only for short time scales (10⁻⁷ s)

• Normal Mode Analysis provides a way to analyze equilibrium motion for longer time scales

Time and amplitude scales

Global Motions: ms - h (10⁻³ - 10⁴ s), more than 10 Å
• Type of motion: helix-coil transition, folding/unfolding, subunit association
• Functionality examples: hormone activation, protein functionality

Large Scale Motions: μs - ms (10⁻⁶ - 10⁻³ s), 5 - 10 Å
• Type of motion: domain motion, subunit motion
• Functionality examples: hinge bending motion, allosteric transitions

Medium Scale Motions: ns - μs (10⁻⁹ - 10⁻⁶ s), 1 - 5 Å
• Type of motion: loop motion, terminal-arm motion, rigid-body motion (helices)
• Functionality examples: active site conformation adaptation, binding specificity

Local Motions: fs - ps (10⁻¹⁵ - 10⁻¹² s), less than 1 Å
• Type of motion: atomic fluctuation, side chain motion
• Functionality examples: ligand docking flexibility, temporal diffusion pathways

Modified after: Becker & Watanabe (2001). Dynamic Methods. In Computational Biochemistry and Biophysics (Edited by Becker et al.)

Source: http://www.lofar.org/BlueGene/Suits.pdf

Molecular Dynamics (MD)

• MD: simulation of the motion of all particles in a molecular system by iteratively solving Newton’s equations of motion.

• Based on Newton’s classical mechanics: F = ma

• Evaluation of energies and forces of interaction of all particles in the system.

• MD provides links between structure and dynamics by enabling the exploration of the conformational energy landscape accessible to macromolecules.

Newton’s equation of motion is given by:

$F_i = m_i a_i$

where F_i is the force exerted on particle i, m_i is the mass of particle i and a_i is the acceleration of particle i. The force can also be expressed as the gradient of the potential energy:

$F_i = -\nabla_i V$

Newton’s equation of motion

Combining these two equations yields:

$-\frac{dV}{dr_i} = m_i \frac{d^2 r_i}{dt^2}$

where V is the potential energy of the system. Newton’s equation of motion can then relate the derivative of the potential energy to the changes in position as a function of time.

MD: historical perspective

• 1977: 1st protein MD (McCammon, Gelin, Karplus). 9.2 ps, vacuum.

• Today’s ‘regular’ MD: 10-100 ns, explicit solvent.

• IBM’s Blue-Gene aim is MD – longer / more accurate runs.

Quality of MD runs depends on:

• Time-scale and resolution of investigated motion.

• Availability of good force-field parameterization (partial charges) for required data, e.g. Residue Topology File (RTF) for cofactors (transferability vs. refined accuracy)

• Quality of initial data

• Details taken into account (solvent, long-range electrostatics, constraints on the system etc.)

1. Hydrogens – A. freeze the fastest modes of vibration by constraining the bonds to hydrogen atoms to a fixed length (SHAKE or RATTLE algorithms). B. treat the water solvent (~half of all simulated atoms) as a rigid body (if using explicit solvent). C. consider simulating without non-polar hydrogens by making larger ‘pseudo-atoms’ (‘united atoms’ force-field).

2. Integrators – Solving Newton’s equation requires a numerical procedure for integrating the 2nd order differential equation. A standard procedure is the finite-difference approach. The coordinates and velocities at time t+Δt are obtained (to a sufficient degree of accuracy) from the molecular coordinates and velocities at an earlier time t. The equation is solved for each time-step.

3. Long-range electrostatics – the slowly decaying Coulombic interactions (r⁻¹), dipolar interactions (r⁻³) and even London dispersion forces (r⁻⁶) must be solved over long distances. Cut-offs are problematic as we must conserve energy and not introduce discontinuities in the system. Solutions include: Switch (problematic), PME – Particle Mesh Ewald, P3M – Particle-Particle Particle-Mesh Ewald, Reaction Field (RF – treats nearby solvent explicitly and the rest implicitly), Generalized RF. See description in slides below.
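The finite-difference idea in item 2 can be illustrated with the velocity Verlet scheme, one common integrator of this kind. The slides do not say which integrator is meant, so treat this as a sketch under that assumption. It moves a single particle in a one-dimensional harmonic potential. The spring constant, mass and step count are arbitrary illustrative values, and the time step is set to 1/20 of the oscillation period.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * d2x/dt2 = F(x) in 1D with the velocity Verlet scheme."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # new position from old x, v, a
        a_new = force(x) / mass              # force evaluated at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity from the averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


K = 1.0  # illustrative spring constant (assumption)
M = 1.0  # illustrative mass (assumption)


def harmonic_force(x):
    """F = -dV/dx for V = K * x^2 / 2."""
    return -K * x


def total_energy(x, v):
    """Kinetic plus potential energy of the oscillator."""
    return 0.5 * M * v * v + 0.5 * K * x * x


PERIOD = 2.0 * math.pi * math.sqrt(M / K)
DT = PERIOD / 20.0  # time step of 1/20 of the fastest (here: only) motion
TRAJ = velocity_verlet(1.0, 0.0, harmonic_force, M, DT, 200)
ENERGIES = [total_energy(x, v) for x, v in TRAJ]
ENERGY_DRIFT = max(abs(e - ENERGIES[0]) for e in ENERGIES)
```

With this step size the total energy oscillates within a few percent of its initial value instead of drifting, which is the practical reason for tying Δt to the fastest motion in the system.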

MD Bottlenecks Require Designated Solutions

• Integrator: choose Δt to be ~1/20 of the fastest measured motion.

• Hydrogen vibration = fastest movement, ~10 fs; compute with a designated algorithm: use a 1 fs (10⁻¹⁵ s) time step for the system.

• 1,000,000 steps / ns. 30k+ atoms in a typical run. ~1,000 machine instructions each step.

• E non-bonded component complexity: check use of united atoms, implicit solvent, harmonic constraints for parts of the system.

• Complementary: if the question includes non-Newtonian forces, consider QM/MM or designated methods.

Molecular Mechanics (MM): more considerations

MD steps

• Build system (solvent, cofactors, unique forces)

• Run system (duration, temperature, only minimization or ‘real’ dynamics)

• Analyze system

Minimization

• The energy function surface (called energy landscape) of a macromolecule is composed of multiple local minima (stable states) and multiple saddle points (transition states).

• Minimization is a procedure to find a local minimum.
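A minimal sketch of what "finding a local minimum" means, using steepest descent on a one-dimensional double-well potential. Both the potential and the choice of steepest descent are illustrative assumptions. The slides do not name a minimization algorithm, and real MD packages use more elaborate ones.

```python
def steepest_descent(gradient, x0, step=0.01, tol=1e-8, max_iter=100000):
    """Follow the negative gradient from x0 until the gradient is ~zero."""
    x = x0
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x


def double_well(x):
    """Toy energy landscape with minima at x = -1 and x = +1 and a barrier at x = 0."""
    return (x * x - 1.0) ** 2


def double_well_gradient(x):
    """Analytical derivative of double_well."""
    return 4.0 * x * (x * x - 1.0)


# The minimum that is reached depends only on the starting structure.
LEFT_MIN = steepest_descent(double_well_gradient, -0.3)
RIGHT_MIN = steepest_descent(double_well_gradient, 0.3)
```

Starting on either side of the barrier leads to a different minimum, which is why minimization alone cannot explore the full energy landscape.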

Source (old link): http://www.tau.ac.il/~becker/course.html
Source: http://www.ch.embnet.org/MD_tutorial/

MD procedure: cool, heat, cool, produce

MD Analysis

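I) Mean energy

$\langle E \rangle = \frac{1}{N} \sum_{i=1}^{N} E_i$

II) RMS difference between two structures

$RMS^{\alpha\beta} = \left[ \frac{1}{N} \sum_{i=1}^{N} \left( r_i^{\alpha} - r_i^{\beta} \right)^2 \right]^{1/2}$

III) RMS fluctuations

$RMS_i^{fluct} = \left[ \frac{1}{N_f} \sum_{f} \left( r_i^{f} - r_i^{average} \right)^2 \right]^{1/2}$

Note the relation between the RMS fluctuations and the crystallographic B-factors:

$B_i = \frac{8\pi^2}{3} \left( RMS_i^{fluct} \right)^2$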
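The RMS difference, the RMS fluctuation and the B-factor relation listed above can be written as three small functions. This is a sketch under simplifying assumptions: coordinates are plain (x, y, z) tuples, the two structures are already superimposed, and the function names are illustrative.

```python
import math


def rms_difference(coords_a, coords_b):
    """RMS difference between two structures (equal-length lists of (x, y, z))."""
    n = len(coords_a)
    total = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(total / n)


def rms_fluctuation(frames):
    """RMS fluctuation of one atom about its average position over all frames."""
    n = len(frames)
    mean = [sum(frame[k] for frame in frames) / n for k in range(3)]
    total = sum((frame[k] - mean[k]) ** 2 for frame in frames for k in range(3))
    return math.sqrt(total / n)


def b_factor(rmsf):
    """Crystallographic B-factor implied by an RMS fluctuation: B = 8*pi^2/3 * RMSF^2."""
    return 8.0 * math.pi ** 2 / 3.0 * rmsf ** 2


# Two atoms, second structure shifted by 1 Å along z.
RMSD = rms_difference([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)])
# One atom observed at two positions, 1 Å on either side of its mean.
RMSF = rms_fluctuation([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
B_VALUE = b_factor(RMSF)
```

An RMS fluctuation of 1 Å corresponds to B ≈ 26 Å², which gives a feel for the scale of B-factors reported in PDB files.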

MD tricks

• How far to compute?

• How much solvent do we need?

• Till the solvent behaves as bulk water…

• Use periodic boundary conditions: the system is replicated to infinity by rigid translation of all atoms. Each particle interacts with all particles (not itself) in the box and in the images, but we store only the main box. Once a particle leaves the cell, it is replaced by a particle coming from the opposite image.

• For cutoff distance R (VdW 7-10 Å, Coulomb 10-15 Å), use box > 2R

• What will happen if we replace Zn++ by Ca++?

• We can use computational alchemy – in each step change the weights between the system’s interaction with each metal, i.e. in the beginning use 95% Zn++ and 5% Ca++ that are computed in parallel. The Zn doesn’t see the Ca and vice versa. Finish by reversing the weights till 100% Ca++.
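The bookkeeping behind periodic boundary conditions can be sketched with the minimum-image convention for a cubic box. This is an illustrative assumption, since the slides do not specify the box shape, and the function names are hypothetical. It also shows why the box must be larger than 2R: only then does each pair have at most one image within the cutoff.

```python
import math


def minimum_image(delta, box_length):
    """Map a coordinate difference onto its nearest periodic image."""
    return delta - box_length * round(delta / box_length)


def periodic_distance(p, q, box_length):
    """Distance between p and the nearest periodic image of q (cubic box)."""
    d = [minimum_image(a - b, box_length) for a, b in zip(p, q)]
    return math.sqrt(sum(c * c for c in d))


def wrap(coordinate, box_length):
    """Put a particle that left the box back in through the opposite face."""
    return coordinate % box_length


BOX = 30.0  # Å, illustrative; larger than twice a 10-15 Å cutoff
# Atoms near opposite faces are close neighbours through the boundary.
DIST = periodic_distance((1.0, 1.0, 1.0), (29.0, 1.0, 1.0), BOX)
WRAPPED = wrap(31.0, BOX)
```

Here the two atoms are 28 Å apart inside the box but only 2 Å apart through the periodic boundary, and the latter is the distance used for the interaction.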

Normal Mode Analysis

• A simple analytical tool to explore equilibrium dynamics of proteins

• Approximation of the potential at a local minimum by a harmonic function

• Introduced shortly after MD, in the early 1980s:

Go, Noguti and Nishikawa (1983) PNAS 80, 3693-3700
Brooks and Karplus (1983) PNAS 80, 6571-6575
Levitt, Sander and Stern (1985) JMB, 181, 423-427

Stern (1989) Prog Clin Biol Res 289, 87-94.

Elastic network models are a special type of normal mode analysis. The molecule is represented as a network of nodes connected by springs.

Elastic Network Models

Representation of protein structure as an elastic network produces, using a single parameter, an accurate and detailed description of the dynamics of the system

Tirion (1996), Phys Rev Lett, 77, 1905-1908

The most global dynamic features of the system are maintained even when the system is modeled at a more simplified (“coarse-grained”) level

Doruker, Atilgan and Bahar (2000), Proteins, 40, 512-524
Tama & Sanejouand (2001), Protein Eng, 14, 1-6
Atilgan et al (2001), Biophys J. 80, 505-515

d < rc

GNM (Gaussian Network Model) and ANM (Anisotropic Network Model) are residue-level models widely used for investigating the dynamics of biological systems

Bahar, Atilgan and Erman (1997), Fold Des, 2, 173-181
Hinsen (1998), Proteins, 33, 417-429
Doruker, Atilgan and Bahar (2000), Proteins, 40, 512-524

Gaussian Network Model (GNM)

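Potential:

$V = \frac{\gamma}{2} \sum_{i<j,\; d_{ij} < r_c} \left| R_{ij} - R_{ij}^{0} \right|^{2} = \frac{\gamma}{2}\, \Delta R^{T}\, \Gamma\, \Delta R$

Kirchhoff (N×N), example for a chain of three nodes:

$\Gamma = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{bmatrix}$

Covariance (N×N):

$C \approx \frac{k_B T}{\gamma}\, \Gamma^{-1} = \frac{k_B T}{\gamma} \sum_{i=1}^{N-1} \frac{1}{\lambda_i}\, u_i u_i^{T}$

λ_i are the eigenvalues of Γ
u_i are the eigenvectors of Γ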
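Building the Kirchhoff matrix from node coordinates and a cutoff can be sketched as below. This is a minimal illustration, not the author's implementation. The three collinear nodes spaced 3.8 Å apart and the 5 Å cutoff are assumptions chosen only so that the output reproduces the three-node example matrix on the slide. `math.dist` requires Python 3.8 or later.

```python
import math


def kirchhoff(coords, cutoff):
    """Kirchhoff (connectivity) matrix: -1 for node pairs closer than the
    cutoff, and the number of contacts of each node on the diagonal."""
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma


# Illustrative chain of three nodes (assumption): only neighbours are in contact.
CHAIN = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
GAMMA = kirchhoff(CHAIN, cutoff=5.0)
```

Every row of the matrix sums to zero, so Γ always has one zero eigenvalue. That is why the mode sum in the covariance runs over N−1 terms only.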

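Relation between probability and potential (Boltzmann):

$p(\Delta R) \propto \exp\left\{ -\frac{\gamma}{2 k_B T}\, \Delta R^{T}\, \Gamma\, \Delta R \right\} \propto \exp\left\{ -\frac{1}{2}\, \Delta R^{T} \left( \frac{k_B T}{\gamma}\, \Gamma^{-1} \right)^{-1} \Delta R \right\}$

General form of multivariate probability distribution:

$W(x, \mu, \Xi) = \frac{1}{(2\pi)^{N/2}\, |\Xi|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^{T}\, \Xi^{-1}\, (x - \mu) \right\}$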
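Placing the two expressions side by side identifies the covariance matrix of the fluctuations. The worked step below is a sketch of that comparison, assuming zero mean displacement (μ = 0) and x = ΔR; it is not copied from the slides.

```latex
\begin{align*}
-\tfrac{1}{2}\,(x-\mu)^{T}\,\Xi^{-1}\,(x-\mu)
  &\;\longleftrightarrow\;
  -\tfrac{\gamma}{2 k_B T}\,\Delta R^{T}\,\Gamma\,\Delta R \\
x = \Delta R,\quad \mu = 0
  &\;\Longrightarrow\;
  \Xi^{-1} = \frac{\gamma}{k_B T}\,\Gamma
  \;\Longrightarrow\;
  \Xi = \frac{k_B T}{\gamma}\,\Gamma^{-1}
\end{align*}
```

Because Γ has a zero eigenvalue, Γ⁻¹ here is to be read as the pseudo-inverse built from the non-zero modes.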

Anisotropic Network Model (ANM)

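Potential (nodes i, j interact when d < rc):

$V = \frac{\gamma}{2} \sum_{i<j,\; d_{ij} < r_c} \left( |R_{ij}| - |R_{ij}^{0}| \right)^{2}$

Hessian (3N×3N), off-diagonal 3×3 block for nodes i and j:

$H_{ij} = \frac{\gamma\, \Gamma_{ij}}{(R_{ij}^{0})^{2}} \begin{bmatrix} X_{ij}X_{ij} & X_{ij}Y_{ij} & X_{ij}Z_{ij} \\ Y_{ij}X_{ij} & Y_{ij}Y_{ij} & Y_{ij}Z_{ij} \\ Z_{ij}X_{ij} & Z_{ij}Y_{ij} & Z_{ij}Z_{ij} \end{bmatrix}$

Covariance (3N×3N):

$C \approx H^{-1} = \sum_{i=1}^{3N-6} \frac{1}{\lambda_i}\, u_i u_i^{T}$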
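The assembly of the 3N×3N Hessian from 3×3 blocks can be sketched as follows. The off-diagonal blocks follow the formula above with Γ_ij = −1 for nodes in contact. The diagonal blocks are not shown on the slide; the sketch assumes the usual convention H_ii = −Σ_j H_ij. The two-node test system and the function name are purely illustrative.

```python
import math


def anm_hessian(coords, cutoff, gamma=1.0):
    """Assemble the 3N x 3N ANM Hessian as a nested list.

    Off-diagonal block: -gamma * d d^T / |d|^2 for pairs within the cutoff.
    Diagonal block (assumed convention): minus the sum of the off-diagonal
    blocks in the same row.
    """
    n = len(coords)
    h = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            r2 = sum(c * c for c in d)
            if math.sqrt(r2) >= cutoff:
                continue
            for a in range(3):
                for b in range(3):
                    block = -gamma * d[a] * d[b] / r2
                    h[3 * i + a][3 * j + b] += block
                    h[3 * j + b][3 * i + a] += block
                    h[3 * i + a][3 * i + b] -= block
                    h[3 * j + a][3 * j + b] -= block
    return h


# Two nodes 1 Å apart along x (illustrative): only the x-x terms are non-zero.
HESSIAN = anm_hessian([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], cutoff=5.0)
```

For this pair the spring resists only motion along the bond direction, which is the anisotropy that distinguishes ANM from GNM.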

…so what useful information do we get?

• Decomposition of the motion into independent modes. In each mode all particles move with the same phase and frequency

• Fluctuations of specific residues

• Correlation between residues

• Each mode has a unique frequency. The slowest modes facilitate more extensive and cooperative conformational changes. In macromolecules these changes are often related to function

Konrad Hinsen

“Normal Mode Theory and Harmonic Approximations”

The major advantages of elastic network models over MD simulations:

1. Insight into longer time scales
2. Accurate solution
3. Fast computational time – large complexes, database scale analysis

The major disadvantages:

1. Approximate potential – harmonic approximation
2. Theoretically, should be valid only around local minima

The motions along the slowest normal modes correlate well with experimentally observed large scale functional motions.

Intrinsic ability of proteins to undergo functional motions

r=0.87

Ovotransferrin

Correlation between normal modes and large scale conformation change

HIV-1 reverse transcriptase

actin

maltodextrin-binding protein

glutamate-binding protein

LAO

LIR-1

Calmodulin (CaM)

P38 kinase

Yang and Bahar (2005) Structure 13, 893-904

phospholipase ricin

Relation between motion and catalytic function

Slow modes

Theory
Biological motivation

Evaluation of the models
Dynamics of molecules in crystals
Anisotropy of mechanical unfolding

Dynamics and Evolution
Phosphorylation sites

Available tools

Ovocleidin (Egg shell protein) 1gz2, P83515

[Figure: fluctuation profile along the sequence (axis: Residue), with S, T and Y residues marked]

Red – phosphorylation site
Green – not known to be phosphorylated

Fluctuations in the slow mode of phosphorylation sites vs non-phosphorylation sites

Anisotropic Network Model (ANM)

Covariance (3N×3N):

$C = H^{-1} = \sum_{i=1}^{3N-6} \frac{1}{\lambda_i}\, u_i u_i^{T}$

Covariance of each node: the 3×3 diagonal block $[H^{-1}]_{ii}$, i.e. the ADP matrix for each residue

Doruker et al. (2000) Proteins 40, 512-524
Atilgan et al. (2001) Biophys J. 80, 505-515

[Figure panels: Experimental, Theoretical]

Blue color indicates small fluctuations

Antifungal protein EAFP2 from Oliver tree

Eyal et al. (2007) Bioinformatics 23, i175-184.

How to compare sets of ADPs ?

• Correlation coefficient between the ADP values

• Anisotropy (A): variance at the short axis / variance at the long axis, 0 < A ≤ 1

• Volume (V): the ellipsoid volumes

• Kullback-Leibler distance (KLD). Overlap index. Considers shape and orientation of ellipsoids a, b (defined by their 3×3 ADP matrices). It can be expressed using the eigenvalues (d) and eigenvectors (v) of the individual ADP matrices:

$D_{ab} = -\frac{3}{2} + \frac{1}{2} \sum_{k=1}^{3} \ln\frac{d_{bk}}{d_{ak}} + \frac{1}{2} \sum_{k=1}^{3} \sum_{l=1}^{3} \frac{d_{ak}}{d_{bl}} \left( v_{ak}^{T} v_{bl} \right)^{2}$

0 ≤ KLD ≤ ∞
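The Kullback-Leibler distance between two ADP ellipsoids can be sketched without an eigen-decomposition. The code uses the equivalent matrix form D_ab = ½ [ln(det B / det A) − 3 + tr(B⁻¹A)] for two zero-mean 3D Gaussians with covariance matrices A and B, the standard form of this divergence. Whether the cited work computes it this way is not stated in the slides, and the helper names are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (e, f, g), (h, i, j) = m
    return a * (f * j - g * i) - b * (e * j - g * h) + c * (e * i - f * h)


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    (a, b, c), (e, f, g), (h, i, j) = m
    return [
        [(f * j - g * i) / d, (c * i - b * j) / d, (b * g - c * f) / d],
        [(g * h - e * j) / d, (a * j - c * h) / d, (c * e - a * g) / d],
        [(e * i - f * h) / d, (b * h - a * i) / d, (a * f - b * e) / d],
    ]


def kld(a, b):
    """Kullback-Leibler distance between zero-mean 3D Gaussians with
    covariance (ADP) matrices a and b."""
    b_inv = inv3(b)
    trace = sum(b_inv[r][c] * a[c][r] for r in range(3) for c in range(3))
    return 0.5 * (math.log(det3(b) / det3(a)) - 3.0 + trace)


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
DOUBLED = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
KLD_SAME = kld(IDENTITY, IDENTITY)
KLD_SCALED = kld(IDENTITY, DOUBLED)
```

Identical ellipsoids give zero, and the value grows as the shapes or orientations diverge. Note that the measure is not symmetric in a and b.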

Test set for examination of experimental and theoretical ADPs

A set of 93 high resolution (R<1.5 Å) non-redundant proteins

                          Pearson correlation coefficient r between
PDB structures       N    volumes  Anisotropy  (ΔR)²   All ADP  diagonal  off-diagonal  KLD
Mean (93 proteins)   186  0.547    0.260       0.571   0.771    0.486     0.343         0.429

Correspondence between theory and experiments

Correlation coefficient of directional fluctuations and overall fluctuations at EAFP2

How similar are the experimental data reported for the same protein in different PDB files?

A closer look at experimental data

The contribution of rigid-body motion to the experimental fluctuations is smaller in denser crystals

                          Pearson correlation coefficient r between
PDB structures       N    volumes  Anisotropy  (ΔR)²   All ADP  diagonal  off-diagonal  KLD

Same crystal form
1nwcA 1uwnX          120  0.930    0.062       0.951   0.925    0.857     0.717         0.231
1pq7A 1gdnA          51   0.801    0.176       0.830   0.902    0.638     0.474         0.183
1kmvA 1kmsA          183  0.900    0.710       0.924   0.937    0.856     0.876         0.135
1f9yA 1q0nA          158  0.864    0.400       0.865   0.895    0.771     0.622         0.308
1lugA 1oq5A          130  0.847    0.613       0.840   0.902    0.778     0.692         0.222
1lokA 1rtqA          291  0.832    0.535       0.795   0.846    0.719     0.467         0.200
1m40A 1nymA          263  0.806    0.367       0.806   0.933    0.638     0.534         0.129
1oc7A 1oc6A          364  0.933    0.007       0.939   0.970    0.914     0.664         0.058
1sx7A 1swzA          164  0.990    0.896       0.963   0.962    0.948     0.779         0.070
1i1wA 1i1xA          299  0.818    0.443       0.812   0.941    0.784     0.615         0.114
1a6mA 1bzpA          151  0.664    0.532       0.638   0.858    0.613     0.482         0.134
1pwmA 1z8aA          315  0.921    0.843       0.933   0.928    0.903     0.837         0.120
1me4A 1me3A          208  0.939    0.478       0.949   0.957    0.888     0.711         0.121
1rgzA 1q2qA          359  0.926    0.347       0.930   0.912    0.827     0.672         0.230
1m1rA 1m1qA          90   0.881    0.798       0.900   0.962    0.872     0.754         0.066
1g6xA 1k6uA          58   0.774    0.860       0.828   0.930    0.797     0.802         0.043
3lztA 4lztA          129  0.899    0.345       0.845   0.870    0.731     0.525         0.166
1g66A 1bs9A          207  0.896    0.351       0.897   0.918    0.799     0.687         0.157
1kt5A 1kt7A          175  0.855    0.084       0.872   0.872    0.752     0.606         0.230
Mean (19 pairs)      195  0.867    0.464       0.870   0.916    0.793     0.658         0.154

Diff. crystal form
1nwzA 1kouA          119  0.682    0.294       0.644   0.812    0.480     0.482         0.287
1pq7A 1ppzA          223  0.522    0.206       0.557   0.817    0.433     0.366         0.376
1f9yA 1rb0A          151  0.219    0.109       0.372   0.676    0.253     0.183         0.968
2a6zA 2a70B          222  0.854    0.506       0.853   0.920    0.797     0.675         0.163
1a6mA 1u7sA          151  0.448    0.326       0.436   0.799    0.381     0.386         0.540
1pwmA 1t41A          315  0.747    0.042       0.756   0.836    0.656     0.540         0.392
3lztA 1ieeA          129  0.587    0.128       0.560   0.800    0.506     0.281         0.317
1iqzA 1ir0A          81   0.295    0.308       0.356   0.716    0.339     0.426         0.256
Mean (8 pairs)       173  0.544    0.240       0.569   0.797    0.480     0.417         0.412

ANM
Mean (93 proteins)   186  0.547    0.260       0.571   0.771    0.486     0.343         0.429

The levels of agreement between the different crystal forms of the same protein are comparable to those between theoretical predictions and experimental data

Hen egg lysozyme

[Figure: KLD comparison for 4lzt (SG: P1), 3lzt (SG: P1), 1iee (SG: P43 21) and ANM (3lzt); axis label: KLD, annotated "Better agreement"]

Refining the model parameter

[Figure: histograms of anisotropy values for rc = 10 Å, rc = 12 Å, rc = 15 Å and rc = 18 Å (nodes are connected when d < rc), compared with the experimental histogram]

Anisotropy in mechanical resistance to unfolding

• Processes in living cells depend on the mechanical properties of bio-molecules. Many proteins need resistance to mechanical pressures to fulfill their function, for example within muscle fibers and in the cytoskeleton.

• Recent advances in single-molecule techniques such as atomic force microscopy (AFM) and optical tweezers allow examining the response of proteins to well-oriented tensions.

• Unfolding forces in different pulling directions have been measured for green fluorescent protein (GFP), ubiquitin and the lipoyl domain (E2lip3) of acetyl transferase subunit E2p.

• Significant differences have been observed in the responses of the very same molecule to pulling along different directions. Dietz et al. (2006) PNAS 103, 12724–12728

Eyal and Bahar (2008) Biophys J, in press

$\cos\left(\alpha_{ij}^{(k)}\right) \equiv \frac{R_{ij}^{(0)} \cdot \Delta R_{ij}^{(k)}}{\left|R_{ij}^{(0)}\right| \left|\Delta R_{ij}^{(k)}\right|}$

Normalized contribution of each mode:

Weighted contribution of each mode:

Effective spring constant for the system

Mechanical resistance of GFP

The contributions of different modes distribute differently in the different directions.

Unfolding along the barrel axis requires weaker forces

Complete mechanical resistance map of GFP

Mechanical resistance of E2lip3

          Calc (N/m)   Exp (pN)
1-80      0.4          25
1-41      1.25         177

Elastic Network Models for predicting mechanical resistance

• ENM emerges as a promising tool to estimate the anisotropic mechanical resistance of globular proteins

• The method is very efficient and can be used to scan many pulling directions and to guide the experimental setup.

How do ENMs predict path-dependent events?

• The theoretical scope of ANM is only “close enough” to local minima

• How can ANM predict large scale conformational changes and unfolding paths?

• The topology of the folded state contains much of the information required to determine alternative states and transition paths.

http://ignmtest.ccbb.pitt.edu/anm
