
A Simple Model for Protein Structure

施奇廷 (Department of Physics, Tunghai University)


The Models

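HP Model:

$$H = \sum_{i<j} E_{p_i p_j}\,\Delta(\mathbf{r}_i - \mathbf{r}_j)$$

where p_i is H or P, and Δ(r_i − r_j) = 1 if residues i and j are in contact (nearest neighbours on the lattice but not adjacent along the chain) and 0 otherwise.

E_HH = −2.3, E_HP = −1, and E_PP = 0 (Li et al., Science 273, 666)

For the "additive" case: E_HH = −2, E_HP = −1, and E_PP = 0, i.e. E_{p_i p_j} = −(p_i + p_j), where p_i = 1 (0) for H (P) residues.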
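The contact sum can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `hp_energy`, the representation of a conformation as a list of lattice coordinates, and the 2×2 test path are all assumptions made for the example. It uses the Li et al. energies quoted above.

```python
# Contact energies quoted on the slide (Li et al.).
E = {("H", "H"): -2.3, ("H", "P"): -1.0, ("P", "H"): -1.0, ("P", "P"): 0.0}


def hp_energy(sequence, path):
    """Sum E[p_i, p_j] over pairs that touch on the lattice but are not bonded.

    `path` is a hypothetical representation: one (x, y) site per residue.
    """
    energy = 0.0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):  # j = i + 1 is a chain bond, not a contact
            (xi, yi), (xj, yj) = path[i], path[j]
            if abs(xi - xj) + abs(yi - yj) == 1:  # nearest neighbours on the square lattice
                energy += E[sequence[i], sequence[j]]
    return energy


# Four residues wrapped around a unit square: only residues 0 and 3 are in contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))  # one H-H contact
print(hp_energy("HPPP", square))  # one H-P contact
```

Storing the path as coordinates keeps the contact test independent of lattice size, so the same function works for the 4×4 and 6×6 cases discussed later.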

HP Model (2nd type):

$$H(\vec{p}, \vec{s}) = \frac{1}{2}\left(|\vec{s} - \vec{p}|^2 - \vec{p}^{\,2} - \vec{s}^{\,2}\right)$$

Finding the Lowest-Energy State

For each amino-acid sequence, place the chain in every possible conformation, compute the energy of each, and take the conformation with the lowest energy as the ground state. The ground state must be non-degenerate; otherwise the conformation is unstable and would be eliminated by evolution. For example, on a 4×4 lattice, one sequence is:

HHPHHPHPPPPHHPHH
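The search described above can be sketched as a brute-force enumeration. The following is an illustrative sketch under stated assumptions, not the authors' program: compact conformations are taken to be directed Hamiltonian paths on the 4×4 lattice, conformations related by the eight symmetries of the square are counted once, and energies are stored in units of 0.1 so that ties can be compared exactly. All function names are hypothetical.

```python
from itertools import product

SIDE = 4  # the example above uses a 4x4 lattice
# Li et al. contact energies in units of 0.1 (integers make tie detection exact).
E = {("H", "H"): -23, ("H", "P"): -10, ("P", "H"): -10, ("P", "P"): 0}


def hamiltonian_paths(side):
    """Yield every directed self-avoiding path that visits all side*side sites."""
    def extend(path, visited):
        if len(path) == side * side:
            yield tuple(path)
            return
        x, y = path[-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < side and 0 <= ny < side and (nx, ny) not in visited:
                path.append((nx, ny))
                visited.add((nx, ny))
                yield from extend(path, visited)
                path.pop()
                visited.remove((nx, ny))

    for start in product(range(side), repeat=2):
        yield from extend([start], {start})


def canonical(path, side):
    """Smallest image of the path under the 8 rotations/reflections of the square."""
    m = side - 1
    maps = [lambda x, y: (x, y), lambda x, y: (m - x, y),
            lambda x, y: (x, m - y), lambda x, y: (m - x, m - y),
            lambda x, y: (y, x), lambda x, y: (m - y, x),
            lambda x, y: (y, m - x), lambda x, y: (m - y, m - x)]
    return min(tuple(f(x, y) for x, y in path) for f in maps)


def energy(sequence, path):
    """HP contact energy: non-bonded nearest-neighbour pairs only."""
    total = 0
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            if abs(path[i][0] - path[j][0]) + abs(path[i][1] - path[j][1]) == 1:
                total += E[sequence[i], sequence[j]]
    return total


all_paths = list(hamiltonian_paths(SIDE))
structures = sorted({canonical(p, SIDE) for p in all_paths})
sequence = "HHPHHPHPPPPHHPHH"
energies = [energy(sequence, s) for s in structures]
ground = min(energies)
degeneracy = energies.count(ground)
print(f"{len(structures)} structures, ground-state energy {ground / 10}, "
      f"degeneracy {degeneracy}")
```

A degeneracy of 1 means the sequence has a unique ground state in the sense used above; any larger value means the sequence would be discarded.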


HP Model (1st Type)

Second Model: A Mean-Field Approximation

The second model can be viewed as a "mean-field approximation" of the HP model. Lattice sites are divided into two classes, surface (S) and core (C); if a hydrophobic amino acid sits on a core site (out of contact with water), the energy is lowered by one unit. In this approximation a shape is represented by an N-dimensional vector $\vec{s}$, with 0 for S and 1 for C, and the amino-acid sequence likewise by a vector $\vec{p}$, with 0 for P and 1 for H.

HP Model (2nd Type)

$\vec{p}$ = (0110111101101000)

Two structure vectors (labelled $\vec{s}_1$ and $\vec{s}_2$ in the figure): (0110000001100000) and (0001110010000000)
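As a worked check of the second-type model, the sketch below evaluates the energy of the sequence vector (0110111101101000) against the two structure vectors (0110000001100000) and (0001110010000000) from this slide. It assumes the energy is minus the number of hydrophobic residues on core sites, as described for the mean-field approximation, and confirms that this agrees with the Hamming-distance form of H. The helper names are illustrative only.

```python
p = "0110111101101000"  # sequence vector: 1 = H, 0 = P
structures = ["0110000001100000", "0001110010000000"]  # 1 = core, 0 = surface


def dot(a, b):
    """Inner product of two 0/1 strings."""
    return sum(int(x) * int(y) for x, y in zip(a, b))


def energy(p, s):
    """Mean-field energy: -1 for every H residue sitting on a core site."""
    return -dot(p, s)


def hamming(a, b):
    """Number of positions where the strings differ (equals |s - p|^2 for 0/1 vectors)."""
    return sum(x != y for x, y in zip(a, b))


for s in structures:
    h = energy(p, s)
    # Hamming-distance form: H = (|s - p|^2 - p^2 - s^2) / 2
    assert 2 * h == hamming(p, s) - dot(p, p) - dot(s, s)
    print(s, "H =", h, "Hamming distance =", hamming(p, s))
# → first structure:  H = -4, Hamming distance = 5
# → second structure: H = -2, Hamming distance = 9
```

The structure with the lower energy is also the one closer to the sequence in Hamming distance, which is the geometric picture used later in the talk.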

Designability

There are 2^N sequences of length N. For each sequence, find its ground-state conformation (sequences with a degenerate ground state are excluded) and count how many times each conformation is selected as a ground state. That count is the designability of the conformation.

Designability of a given structure:

The number of peptide sequences that have that particular geometric structure as their non-degenerate ground state.
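The definition can be turned into a small counting experiment. The sketch below is a hedged illustration for the mean-field (second-type) model on a 4×4 lattice, not the authors' code. It assumes that a structure is identified by its core/surface bit string read along the chain, that the core is the central 2×2 block, and that the energy is minus the number of H residues on core sites. It takes a few seconds to enumerate all 2^16 sequences.

```python
from collections import Counter
from itertools import product

SIDE = 4
CORE = {(x, y) for x in (1, 2) for y in (1, 2)}  # assumed core: the central 2x2 block


def hamiltonian_paths(side):
    """Yield every directed self-avoiding path that visits all side*side sites."""
    def extend(path, visited):
        if len(path) == side * side:
            yield tuple(path)
            return
        x, y = path[-1]
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < side and 0 <= ny < side and (nx, ny) not in visited:
                path.append((nx, ny))
                visited.add((nx, ny))
                yield from extend(path, visited)
                path.pop()
                visited.remove((nx, ny))

    for start in product(range(side), repeat=2):
        yield from extend([start], {start})


def structure_bits(path):
    """Encode a path as an integer; bit = 1 where the chain passes through a core site."""
    bits = 0
    for site in path:
        bits = (bits << 1) | int(site in CORE)
    return bits


n = SIDE * SIDE
structures = sorted({structure_bits(p) for p in hamiltonian_paths(SIDE)})

designability = Counter()
for p in range(2 ** n):  # every binary sequence of length n
    overlaps = [bin(p & s).count("1") for s in structures]  # p.s, so H = -overlap
    best = max(overlaps)
    if overlaps.count(best) == 1:  # keep only non-degenerate ground states
        designability[structures[overlaps.index(best)]] += 1

for s, count in designability.most_common(5):
    print(format(s, f"0{n}b"), count)
```

Sequences whose best overlap is shared by two or more structures contribute to no structure, which is why the designabilities sum to less than 2^N.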

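Geometrical understanding of the HP model (2nd type)

$$H(\vec{p}, \vec{s}) = \frac{1}{2}\left(|\vec{s} - \vec{p}|^2 - \vec{p}^{\,2} - \vec{s}^{\,2}\right)$$

For a given sequence, $\vec{p}^{\,2}$ is fixed, and every compact structure on the same lattice has the same number of core sites and hence the same $\vec{s}^{\,2}$. Minimizing H is therefore equivalent to finding the structure vector closest to $\vec{p}$ in Hamming distance.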

LS Model: (C. Micheletti et al., PRL 80, 4987)

$$H = -\sum_i z_i(\Gamma)\, A\big(z(\sigma_i) - z_i(\Gamma)\big)$$

σ_i = L (0, large) or S (1, small);

z(σ_i) = 1 (2) for L (S) residues inside the chain and z(σ_i) = 2 (3) for L (S) residues at the ends of the chain;

z_i(Γ) is the number of contacts at site i in structure Γ;

A(x) = 1 for x ≥ 0 and −a otherwise (a > 0; a = ∞ in the Ref.).
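The definitions above can be exercised with a short sketch. It assumes the Hamiltonian has the form H = −Σ_i z_i · A(z(σ_i) − z_i), which is a reading of the garbled slide equation chosen so that a large residue with too many contacts is penalized; the function names and the contact list in the example are hypothetical and are not taken from a real conformation.

```python
import math


def A(x, a):
    """A(x) = 1 for x >= 0 and -a otherwise."""
    return 1 if x >= 0 else -a


def z_allowed(sigma, at_end):
    """z(sigma): 1 (2) for L (S) inside the chain, 2 (3) for L (S) at a chain end."""
    return {"L": 1, "S": 2}[sigma] + (1 if at_end else 0)


def ls_energy(sigma, contacts, a):
    """Assumed form: H = -sum_i z_i * A(z(sigma_i) - z_i), with contacts[i] = z_i."""
    last = len(sigma) - 1
    return -sum(z * A(z_allowed(s, i in (0, last)) - z, a)
                for i, (s, z) in enumerate(zip(sigma, contacts)))


# Hypothetical 5-residue example: the L at index 1 has 2 contacts but is allowed only 1.
sigma = "SLSLS"
contacts = [1, 2, 2, 1, 0]
print(ls_energy(sigma, contacts, a=5))         # finite penalty: -(1 - 10 + 2 + 1 + 0) = 6
print(ls_energy(sigma, contacts, a=math.inf))  # a = infinity forbids the conformation
```

With a = ∞ any conformation that over-crowds a large residue gets infinite energy, which matches the statement that L residues are then excluded from core sites.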

In the N×N square lattices:

[Equation lost in extraction: H written as a sum over the site counts n_L^s, n_L^c, n_L^o, n_S^s, n_S^c, n_S^o, with coefficients that depend on a.]

Notation: n_σ^z is the number of σ (L or S) residues on the z-type sites, with z = o (s, c) for corner (side, core) sites, and n_σ = Σ_z n_σ^z.

For a ≫ 1 but finite, we get:

[Equation lost in extraction: H expressed through E_0, n_L^o, n_L^s, N and a.]

For a = ∞, L is prohibited from occupying the core sites → n_L^c = 0.

The most encodable compact structures of the LS model on the 6×6 lattice. The shape of the structure with the highest score is identical to that found for the HP model.

Geometrical Properties of the 2D Square Lattices

n00 (n10, n11): number of peptide bonds connecting 00 (10, 11) residues. The 1-0 bonds partition the sequence into n10+1 segments of contiguous 1's or 0's.
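The bond counts are easy to compute from a structure string. The sketch below is illustrative only (the function names are not from the talk); it counts n00, n10 and n11, treating 01 and 10 bonds alike, and checks that the number of segments equals n10 + 1.

```python
def bond_counts(s):
    """Return (n00, n10, n11) for a binary string; n10 counts both 10 and 01 bonds."""
    pairs = [a + b for a, b in zip(s, s[1:])]
    n00 = pairs.count("00")
    n10 = pairs.count("10") + pairs.count("01")
    n11 = pairs.count("11")
    return n00, n10, n11


def segments(s):
    """Split a binary string into maximal runs of identical symbols."""
    runs = [s[0]]
    for ch in s[1:]:
        if ch == runs[-1][-1]:
            runs[-1] += ch
        else:
            runs.append(ch)
    return runs


s = "0110000001100000"  # a 4x4 structure vector shown earlier in the talk
n00, n10, n11 = bond_counts(s)
print(n00, n10, n11)   # → 9 4 2
print(segments(s))     # → ['0', '11', '000000', '11', '00000']
assert len(segments(s)) == n10 + 1
```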

Constraints for N>4:

1. An isolated single 1 may only occur at an end of a path

2. An isolated single 0 may only either occur at or be one 1-segment away from an end of a path

3. Each of the four corners on the lattice belongs to a 0-segment with at least 4 sites, except when the corner is an end of a path

Geometrical Properties of the 2D Square Lattices (cont.)

4. For a path (1…1): 2n00 + n10 = 8N−8, and 2 ≤ n10 ≤ 4N−12

5. (0010011…1): 2n00 + n10 = 8N−9, and 5 ≤ n10 ≤ 4N−11

6. (0010011…1100100): 2n00 + n10 = 8N−10; and 10 ≤ n10 ≤ 4N−10 for N > 6, and 8 ≤ n10 ≤ 4N−10 for N ≤ 6

7. (0010011…0) but not 6.: 2n00 + n10 = 8N−10, and 4 ≤ n10 ≤ 4N−12

8. (0…0) but not 6. and 7.: 2n00 + n10 = 8N−10, and 4 ≤ n10 ≤ 4N−12

9. (0…1) but not 5.: 2n00 + n10 = 8N−9, and 1 ≤ n10 ≤ 4N−13

Example:

Constraint 4: (1…1) type

Left: maximum n10 = 12; right: minimum n10 = 2. These are the bounds 2 ≤ n10 ≤ 4N−12 evaluated at N = 6.

Distribution of the Allowed Structures in the Hyperspace

From the combinatorial point of view, more of the possible binary sequences are ruled out as structures s when n10 is large than when n10 is small.

The minimal Hamming distance d_H(s1, s2) between two paths s1, s2 is approximately 4k (2k for triangular lattices) if n10 = 4k or 4k−2:

1. (…01111110…10000001…) → (01111000…10011001…)

2. (…01111110…10000001…) → (01100110…10011001…)
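The two moves listed above can be checked directly. This small sketch compares only the segments that are written out and assumes the elided parts ("…") are unchanged by the move; the helper name is illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


before = ("01111110", "10000001")  # the two displayed segments of the original path
move_1 = ("01111000", "10011001")
move_2 = ("01100110", "10011001")

d1 = sum(hamming(x, y) for x, y in zip(before, move_1))
d2 = sum(hamming(x, y) for x, y in zip(before, move_2))
print(d1, d2)  # → 4 4
```

Each move flips two bits in each displayed segment, giving a distance of 4 in both cases.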

On average, structures s with larger n10 have higher designability. The result will also hold for 2D lattices of other shapes.

Comparison with Protein Data Bank

Metric representation of a sequence p with length l = 2k:

$$x = \sum_{i=1}^{k} p_{k-(i-1)}\, 2^{-i}; \qquad y = \sum_{i=1}^{k} p_{k+i}\, 2^{-i}$$

For each set of sequences selected by the models, calculate the frequency distribution of the subsequences of length 2k and plot it in the unit square. Then calculate the correlation of the distribution functions:

$$O_{ij}^{(l)} = \sum_{m=1}^{2^l} F_i^{(l)}(m)\, F_j^{(l)}(m)$$

where F_i^{(l)}(m) is the normalized frequency of the m-th subsequence of length l in set i.
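A minimal sketch of the two quantities follows. It rests on assumptions that the slide does not spell out: x is built from the first half of the subsequence read backwards and y from the second half read forwards, subsequences are collected with a sliding window, and "normalized" means the frequencies sum to 1. The function names and the toy input are illustrative.

```python
from collections import Counter


def metric_point(sub):
    """Map a binary string of even length 2k to a point (x, y) in the unit square."""
    k = len(sub) // 2
    bits = [int(c) for c in sub]  # p_1 ... p_2k stored at indices 0 ... 2k-1
    x = sum(bits[k - i] * 2.0 ** -i for i in range(1, k + 1))      # uses p_{k-(i-1)}
    y = sum(bits[k + i - 1] * 2.0 ** -i for i in range(1, k + 1))  # uses p_{k+i}
    return x, y


def frequencies(sequences, l):
    """Normalized frequency of every length-l subsequence (sliding window, sums to 1)."""
    counts = Counter(seq[j:j + l] for seq in sequences
                     for j in range(len(seq) - l + 1))
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


def overlap(f_i, f_j):
    """O_ij = sum over subsequences m of F_i(m) * F_j(m)."""
    return sum(f * f_j.get(m, 0.0) for m, f in f_i.items())


print(metric_point("0110"))  # → (0.5, 0.5)
f = frequencies(["0101"], 2)  # windows: 01, 10, 01
print(overlap(f, f))
```

Plotting `metric_point(m)` for every observed subsequence, weighted by its frequency, gives the unit-square picture described above.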

Results and Discussion

Average designabilities of the paths vs. n10 for the (c) 4×7 and (d) 6×6 lattices, respectively.

The frequencies of all subsequences of length 12 observed in (a) all proteins in the PDB, (b) the alpha-helix parts of (a), (c) the sequences belonging to the highly designable structures, and (d) the sequences belonging to the low-designability structures of the HP model.

The frequencies of all subsequences of length 12 observed in (a) all proteins in the PDB and (b) the sequences belonging to the highly designable structures of the LS model; (c) normalized frequencies of (a); (d) normalized frequencies of (b).


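Summary

The HP model is the simplest model for studying protein structure; it considers only hydrophobic and hydrophilic interactions.

The study of designability can explain why many different proteins fold into similar shapes.

Highly designable structures have more "wrinkled" surfaces → this naturally produces α-helix secondary structure at the surface, in agreement with experiment.

The LS model is mathematically equivalent to the HP model, but its physical meaning is different.

By comparing with real protein sequences and structures, we can judge the relative merits of the different simplified models.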