Jia-Ming Chang 0508 Graph Algorithms and Their Applications to Bioinformatics
1/38
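Determine Protein Structure
X-ray
Wavelength of about 1 Å, close to the distance between atoms.
Studies the behavior of molecules in the crystalline state.
Determines crystal structures, including protein structures.
X-ray and structural biology: X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein.
Nuclear magnetic resonance (NMR)
NMR is a process that involves absorption by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when they are exposed to a strong magnetic field they absorb electromagnetic radiation, as a result of the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze the structure of molecules.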
AVANCE 800 AV IBMS, Sinica 2/38
NMR – Nuclear Spin (1/5)
3/38
NMR – Nuclear Spin (2/5)
4/38
NMR – Magnetic Field (3/5)
5/38
NMR – Resonance (4/5)
6/38
NMR – Chemical Shift (5/5)
7/38
Find out Chemical Shift for Each Atom
• Backbone: Ca, Cb, C’, N, NH
• HSQC, CBCANH, CBCA(CO)NH
Chemical Shift Assignment (1/2)
[Figure: atoms (C, N, H) of one amino acid]
8/38
Chemical Shift Assignment (2/2)
[Figure: peptide backbone -N-C-C-N-C-C-N-C-C-N-C-C- with side chains, annotated with chemical-shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60; CH3 30-35]
9
HSQC Spectra
HSQC peaks (1 chemical shift for one amino acid)
H      N       Intensity
8.109  118.60  65920032
[Figure: HSQC spectrum]
10
CBCA(CO)NH Spectra
CBCA(CO)NH peaks (2 chemical shifts for one amino acid)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  65920032
11
CBCANH Spectra
CBCANH peaks (4 chemical shifts for one amino acid)
Peak sign: Ca (+), Cb (-)
H      N       C      Intensity
8.116  118.25  16.37  79238811
8.109  118.60  36.52  -65920032
8.117  118.90  61.58  -51223894
8.119  117.25  57.42  109928374
12
A Dataset Example
[Figure: peaks from the HSQC, HNCACB, and CBCA(CO)NH spectra plotted on N and H axes]
13/38
A Perfect Spin System Group

CBCA(CO)NH peaks (Ca and Cb of residue i-1):
N        H      C       Intensity
113.293  7.897  56.294  1.64325e+008
113.293  7.897  27.853  1.08099e+008

CBCANH peaks (Ca and Cb of residues i-1 and i):
N        H     C       Intensity
113.293  7.92  62.544  8.52851e+007
113.293  7.92  56.294  4.71331e+007
113.293  7.92  68.483  -8.54121e+007
113.293  7.92  28.165  -3.49346e+007

Resulting spin system:
Ca(i-1)  Cb(i-1)  Ca(i)   Cb(i)
56.294   28.165   62.544  68.483
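The grouping on this slide can be sketched in code. This is a minimal illustration, not the presenter's implementation: it assumes a perfect group (exactly 2 CBCA(CO)NH peaks and 4 CBCANH peaks), uses the sign convention Ca (+), Cb (-), and decides which CBCANH carbon belongs to residue i-1 by picking the one closest to a CBCA(CO)NH carbon. The names `Peak` and `build_spin_system` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    n: float          # 15N chemical shift (ppm)
    h: float          # 1H chemical shift (ppm)
    c: float          # 13C chemical shift (ppm)
    intensity: float  # in CBCANH the sign separates Ca (+) from Cb (-)


def build_spin_system(cbcaconh, cbcanh):
    """Combine a perfect peak group into one spin system (assumed rule set)."""
    ca = [p.c for p in cbcanh if p.intensity > 0]
    cb = [p.c for p in cbcanh if p.intensity < 0]
    if len(cbcaconh) != 2 or len(ca) != 2 or len(cb) != 2:
        raise ValueError("not a perfect spin system group")
    prev = [p.c for p in cbcaconh]  # CBCA(CO)NH only sees residue i-1

    def gap(c):
        # Distance from a CBCANH carbon to the nearest CBCA(CO)NH carbon.
        return min(abs(c - q) for q in prev)

    ca_prev, ca_cur = sorted(ca, key=gap)  # smallest gap => residue i-1
    cb_prev, cb_cur = sorted(cb, key=gap)
    return {"Ca(i-1)": ca_prev, "Cb(i-1)": cb_prev, "Ca(i)": ca_cur, "Cb(i)": cb_cur}


# Peak values copied from the slide.
cbcaconh = [Peak(113.293, 7.897, 56.294, 1.64325e8),
            Peak(113.293, 7.897, 27.853, 1.08099e8)]
cbcanh = [Peak(113.293, 7.92, 62.544, 8.52851e7),
          Peak(113.293, 7.92, 56.294, 4.71331e7),
          Peak(113.293, 7.92, 68.483, -8.54121e7),
          Peak(113.293, 7.92, 28.165, -3.49346e7)]
spin_system = build_spin_system(cbcaconh, cbcanh)
```

With the slide's peaks, the sketch reproduces the spin system shown above (56.294, 28.165, 62.544, 68.483). Real data would also need tolerance-based grouping on N and H, which is omitted here.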
14
Coding
Translate the target protein sequence and spin systems into coding sequences based on the following table.
Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.
15/38
Backbone Assignment
Goal: Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.
General approach:
○ Generate spin systems. A spin system is an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
○ Link spin systems.
16/38
Ambiguities
All 4 point experiments are mixed together
All 2 point experiments are mixed together
Each spin system can be mapped to several amino acids in the protein sequence
False positives, false negatives
17/38
Ambiguous Spin System

N      H     C      Intensity
106.9  8.87  54.92  423879
106.9  8.87  40.35  524522

N       H     C      Intensity
106.91  8.85  59.7   235673
106.92  8.86  54.93  346234
106.91  8.86  61.5   432432
106.91  8.85  40.31  -335759
106.92  8.86  30.5   -483759

Two possible spin systems:
N      H     Ca(i-1)  Cb(i-1)  Ca(i)  Cb(i)
106.1  8.85  54.93    40.31    59.7   30.5
106.1  8.85  61.5     40.31    59.7   30.5
18
Multiple Candidates
One spin system may be assigned to many places in a protein sequence.
Spin system (SS):
N      H     Ca(i-1)  Cb(i-1)  Ca(i)  Cb(i)
119.7  8.84  58.4     32.7     56.3   40.8
Protein sequence: AKFERQHMDSSTSRNLTKDR
[Figure: the spin system (SS) marked at four possible places along the protein sequence]
19
False Positives and False Negatives
False positives: noise with high intensity, which produces fake spin systems.
False negatives: peaks with low intensity, which show up as missing peaks.
In real wet-lab data, nearly 50% is noise (false positives).
20/38
Spin System Group
[Figure: a perfect, a false-negative, and a false-positive spin system group, each shown as HSQC, HNCACB, and CBCA(CO)NH peaks on N and H axes]
21/38
Spin System Linking
Goal: Link spin systems into chains that are as long as possible.
Constraints:
Each spin system is uniquely assigned to a position of the target protein sequence.
Two spin systems are linked only if the chemical shift differences of their intra- and inter-residue atoms are less than the predefined thresholds.
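The linking constraint can be illustrated with a short sketch. It assumes that "intra" refers to the Ca(i), Cb(i) shifts of the first spin system and "inter" to the Ca(i-1), Cb(i-1) shifts of the second, and that a missing Cb is stored as 0 as in the deck's data. The `SpinSystem` type, the `can_link` function, and the 0.5 ppm tolerances are hypothetical choices for illustration, not values from the slides.

```python
from typing import NamedTuple


class SpinSystem(NamedTuple):
    ca_prev: float  # Ca of residue i-1 (inter-residue)
    cb_prev: float  # Cb of residue i-1 (inter-residue)
    ca: float       # Ca of residue i (intra-residue)
    cb: float       # Cb of residue i (intra-residue)


def can_link(a, b, ca_tol=0.5, cb_tol=0.5):
    """True if spin system a may directly precede spin system b.

    The tolerances are assumed thresholds in ppm, not values from the deck.
    """
    return abs(a.ca - b.ca_prev) < ca_tol and abs(a.cb - b.cb_prev) < cb_tol


# Shift values taken from the Spin System Positioning slide of this deck.
s1 = SpinSystem(55.266, 38.675, 44.555, 0.0)
s2 = SpinSystem(44.417, 0.0, 55.043, 30.04)

forward = can_link(s1, s2)   # Ca differs by 0.138 ppm, Cb by 0
backward = can_link(s2, s1)  # Cb differs by more than 8 ppm
```

Here `forward` is true and `backward` is false, so the pair can only be chained in the order s1, s2.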
22/38
Previous Approaches
Constrained bipartite matching problem*
Cannot deal with ambiguous links
[Figure: a legal matching and a matching that is illegal under the constraints]
*Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.
23/38
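Natural Language Processing: Noise or Ambiguity?
Speech recognition: homophone selection
Target sentence: 台北市一位小孩走失了 ("A child has gone missing in Taipei City")
Candidate words for the same sounds: 台北市 (Taipei City), 小孩 (child), 台北 (Taipei), 適宜 (suitable), 走失 (go missing), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (shift position)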
24/38
An Error-Tolerant Algorithm
25
Phrase, Sentence Combination
26
Spin System Positioning
Residue codes: D 50, G 10, R 40, I 50|51
Spin systems and their codes:
55.266  38.675  44.555  0       => 50 10
44.417  0       55.043  30.04   => 10 40
44.417  0       30.665  28.72   => 10 40
55.356  29.782  60.044  37.541  => 40 50
We assign spin system groups to a protein sequence according to their codes.
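Positioning by code can be sketched as a simple scan over the sequence. The snippet assumes that each spin system has already been translated to a pair of codes (residue i-1, residue i), and that "I 50|51" means isoleucine accepts either code. Only the four residue codes visible on this slide are included; the full coding table is not reproduced here. `RESIDUE_CODES` and `candidate_positions` are hypothetical names.

```python
# Only the codes shown on the slide; other residues are treated as unknown.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, prev_code, cur_code):
    """Return 0-based indices i where residue i-1 and residue i match the codes."""
    positions = []
    for i in range(1, len(sequence)):
        prev_ok = prev_code in RESIDUE_CODES.get(sequence[i - 1], set())
        cur_ok = cur_code in RESIDUE_CODES.get(sequence[i], set())
        if prev_ok and cur_ok:
            positions.append(i)
    return positions


# Code pairs from the slide, placed on the fragment D G R I.
on_g = candidate_positions("DGRI", 50, 10)
on_r = candidate_positions("DGRI", 10, 40)
on_i = candidate_positions("DGRI", 40, 50)

# A repeated fragment (made up for illustration) gives several candidates.
repeated = candidate_positions("DGRIDGRI", 50, 10)
```

On the fragment D G R I each code pair has a single candidate position, while the repeated fragment yields two, which is the ambiguity the later slides resolve.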
27/38
Link Spin System groups
[Figure: the spin system groups below linked into Segment 1, Segment 2, and Segment 3 along the residues D G R I]
55.266  38.675  44.555  0
44.417  0       55.043  30.04
44.417  0       30.665  28.72
55.356  29.782  60.044  37.541
28/38
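Iterative Concatenation
[Figure: spin systems (1, 2, ..., 56) are concatenated step by step along the sequence DGRI….FKJJREKL. Step 1 starts from the spin systems, Step 2 yields Segment 1, Segment 2, Segment 3, ..., Step n-1 yields Segment 78, Segment 79, ..., and Step n yields Segment 99.]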
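One plausible reading of the step-by-step figure is that every step extends each existing segment by one more linkable spin system. The sketch below follows that reading only; the slide does not spell out the procedure, so the functions `can_link`, `extend_segments`, and `iterative_concatenation`, the tuple layout (Ca(i-1), Cb(i-1), Ca(i), Cb(i)), and the 0.5 ppm tolerance are all assumptions.

```python
def can_link(a, b, tol=0.5):
    # a, b are (Ca(i-1), Cb(i-1), Ca(i), Cb(i)); tol is an assumed threshold.
    return abs(a[2] - b[0]) < tol and abs(a[3] - b[1]) < tol


def extend_segments(segments, spin_systems):
    """Append one more linkable, not yet used spin system to every segment."""
    extended = []
    for seg in segments:
        last = spin_systems[seg[-1]]
        for idx, candidate in enumerate(spin_systems):
            if idx not in seg and can_link(last, candidate):
                extended.append(seg + (idx,))
    return extended


def iterative_concatenation(spin_systems):
    """Collect all segments, starting from single spin systems (step 1)."""
    current = [(i,) for i in range(len(spin_systems))]
    all_segments = list(current)
    while current:  # stops because a segment never reuses a spin system
        current = extend_segments(current, spin_systems)
        all_segments.extend(current)
    return all_segments


# Three spin systems from the Spin System Positioning slide.
spin_systems = [
    (55.266, 38.675, 44.555, 0.0),
    (44.417, 0.0, 55.043, 30.04),
    (44.417, 0.0, 30.665, 28.72),
]
segments = iterative_concatenation(spin_systems)
```

Segments are tuples of spin system indices. With these three spin systems the sketch yields the three single-element segments plus (0, 1) and (0, 2), two alternatives that use the same first spin system and therefore cannot both be kept.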
29/38
Conflict Segments
Sequence: DGRI GEIKGRKTLATPAVRRLAMENNIKLS
[Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence]
Two kinds of conflict segments:
Overlap (e.g. segment 71 and segment 99)
Use of the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
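The two kinds of conflict can be written as a small predicate. This is a sketch under assumed data structures: a segment is described by its start position on the sequence and the ordered ids of its spin systems, one per residue. The `Segment` class, its fields, and the example segments are hypothetical and are not taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start: int           # 0-based sequence position of the first spin system
    spin_systems: tuple  # spin system ids, one per consecutive residue

    @property
    def end(self):
        # Exclusive end position on the sequence.
        return self.start + len(self.spin_systems)


def overlaps(a, b):
    """Conflict kind 1: the two segments cover a common sequence position."""
    return a.start < b.end and b.start < a.end


def shares_spin_system(a, b):
    """Conflict kind 2: the same spin system is used in both segments."""
    return bool(set(a.spin_systems) & set(b.spin_systems))


def in_conflict(a, b):
    return overlaps(a, b) or shares_spin_system(a, b)


# Made-up segments for illustration.
seg_a = Segment("A", 0, (1, 2, 3))
seg_b = Segment("B", 2, (4, 5))    # overlaps A at position 2
seg_c = Segment("C", 10, (1, 6))   # reuses spin system 1 from A
seg_d = Segment("D", 20, (7, 8))   # compatible with A
```

In this toy data, A conflicts with B by overlap and with C by a shared spin system, while A and D can be assigned together.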
30/38
Independent Set
Subset S of vertices such that no two vertices in S are connected
www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt 31/38
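The definition can be checked mechanically. The following minimal sketch tests whether a chosen vertex subset is independent; the function name and the edge-list representation are assumptions made for illustration.

```python
def is_independent_set(vertices, edges):
    """True if no edge has both of its endpoints in the chosen vertices."""
    chosen = set(vertices)
    return not any(u in chosen and v in chosen for u, v in edges)


# A path a - b - c: the two end vertices are independent, neighbors are not.
path_edges = [("a", "b"), ("b", "c")]
ends_ok = is_independent_set({"a", "c"}, path_edges)
neighbors_ok = is_independent_set({"a", "b"}, path_edges)
```

Here `ends_ok` is true and `neighbors_ok` is false.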
A Graph Model for Spin System Linking
G(V, E)
V: a set of nodes (segments).
E: the pairs (u, v), with u, v ∈ V, such that u and v are in conflict.
Goal: Assign as many non-conflicting segments as possible => find the maximum independent set of G.
33
An Example of G
Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE
Segment1: SP12->SP13->SP14
Segment2: SP9->SP13->SP20->SP4
Segment3: SP8->SP15->SP21
Segment4: SP7->SP1->SP15->SP3
[Figure: Seg1, Seg2, Seg3, and Seg4 placed along the sequence, and the conflict graph on Seg1 to Seg4 with edges labeled SP13, SP15, Overlap, and Overlap]
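The edges labeled with a spin system can be recomputed from the four segments listed on this slide. The sketch below derives only those shared-spin-system edges; the two Overlap edges depend on segment positions that are not recoverable from the text, so they are left out. The function name is hypothetical.

```python
from itertools import combinations

# Segments exactly as listed on the slide.
segments = {
    "Seg1": ["SP12", "SP13", "SP14"],
    "Seg2": ["SP9", "SP13", "SP20", "SP4"],
    "Seg3": ["SP8", "SP15", "SP21"],
    "Seg4": ["SP7", "SP1", "SP15", "SP3"],
}


def shared_spin_system_edges(segments):
    """Map each conflicting pair of segments to the spin systems they share."""
    edges = {}
    for (a, sa), (b, sb) in combinations(segments.items(), 2):
        common = set(sa) & set(sb)
        if common:
            edges[(a, b)] = sorted(common)
    return edges


edges = shared_spin_system_edges(segments)
```

The result contains an edge Seg1-Seg2 through SP13 and an edge Seg3-Seg4 through SP15, matching the SP13 and SP15 labels in the figure.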
34/38
Segment weight
The longer a segment is, the higher its weight.
The less frequent a segment is, the lower its weight.
35/38
Find Maximum Weight Independent Set of G (1/2)
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
[Figure: V, N(v), Head_N(v)]
36
Find Maximum Weight Independent Set of G (2/2)
Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).
[Figure: V]
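To make the selection step concrete, here is a simple greedy heuristic for a weighted independent set. It is not the algorithm from the cited paper and carries no approximation guarantee; it only illustrates choosing heavy, mutually non-conflicting segments. The weights use segment length, which is an assumption consistent with "the longer a segment is, the higher its weight", and the edge list contains only the two shared-spin-system conflicts from the example of G.

```python
def greedy_weighted_independent_set(weights, edges):
    """Repeatedly take the heaviest remaining vertex and drop its neighbors."""
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    remaining = set(weights)
    chosen = []
    while remaining:
        # Ties are broken by name so the result is deterministic.
        best = max(remaining, key=lambda v: (weights[v], v))
        chosen.append(best)
        remaining -= neighbors[best] | {best}
    return chosen


# Assumed weights: number of spin systems in each example segment.
weights = {"Seg1": 3, "Seg2": 4, "Seg3": 3, "Seg4": 4}
conflicts = [("Seg1", "Seg2"), ("Seg3", "Seg4")]
selected = greedy_weighted_independent_set(weights, conflicts)
```

With these inputs the heuristic keeps Seg2 and Seg4, the heavier segment from each conflicting pair.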
37
An Iterative Approach
We perform spin system generation and linking iteratively.
Three stages.
38/38