Sequence Alignment

Kun-Mao Chao (趙坤茂), Department of Computer Science and Information Engineering, National Taiwan University, Taiwan
WWW: http://www.csie.ntu.edu.tw/~kmchao

Page 1: Sequence Alignment

Sequence Alignment

Kun-Mao Chao (趙坤茂)
Department of Computer Science and Information Engineering
National Taiwan University, Taiwan

WWW: http://www.csie.ntu.edu.tw/~kmchao

Page 2: Sequence Alignment

Useful Websites

• MIT Biology Hypertextbook
  – http://www.mit.edu:8001/afs/athena/course/other/esgbio/www/7001main.html
• The International Society for Computational Biology
  – http://www.iscb.org/
• National Center for Biotechnology Information (NCBI, NIH)
  – http://www.ncbi.nlm.nih.gov/
• European Bioinformatics Institute (EBI)
  – http://www.ebi.ac.uk/
• DNA Data Bank of Japan (DDBJ)
  – http://www.ddbj.nig.ac.jp/

Page 3: Sequence Alignment

orz's sequence evolution

• orz (kid)
• OTZ (adult)
• Orz (big head)
• Crz (motorcycle driver)
• on_ (soldier)
• or2 (bottom up)
• oΩ (back high)
• STO (the other way around)
• Oroz (me)

The origin?

Their evolutionary relationships?

Their putative functional relationships?

Page 4: Sequence Alignment

What?

THETR UTHIS MOREI

MPORT ANTTH ANTHE

FACTS

The truth is more important than the facts.

Page 5: Sequence Alignment

Dot Matrix

Sequence A: CTTAACT
Sequence B: CGGATCAT

[Figure: dot matrix with Sequence B (C G G A T C A T) across the top and Sequence A (C T T A A C T) down the side.]
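A dot matrix like the one on this slide can be sketched in a few lines of Python. The function name `dot_matrix` and the `*` / `.` rendering are assumptions made for illustration; the slide itself only shows the figure.

```python
def dot_matrix(a, b):
    """Return one text row per symbol of a; '*' marks positions where a[i] == b[j]."""
    return ["".join("*" if x == y else "." for y in b) for x in a]


a, b = "CTTAACT", "CGGATCAT"
print("  " + b)                      # Sequence B across the top
for symbol, row in zip(a, dot_matrix(a, b)):
    print(symbol + " " + row)        # Sequence A down the side
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree without gaps.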

Page 6: Sequence Alignment

Pairwise Alignment

Sequence A: CTTAACT
Sequence B: CGGATCAT

An alignment of A and B:

C---TTAACT    (Sequence A)
CGGATCA--T    (Sequence B)

Page 7: Sequence Alignment

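Pairwise Alignment

Sequence A: CTTAACT
Sequence B: CGGATCAT

An alignment of A and B:

C---TTAACT
CGGATCA--T

[Figure labels on the alignment: insertion gap (the gap symbols in the top row), match, mismatch, deletion gap (the gap symbols in the bottom row).]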

Page 8: Sequence Alignment

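Alignment Graph

Sequence A: CTTAACT
Sequence B: CGGATCAT

[Figure: alignment graph with Sequence B (C G G A T C A T) across the top and Sequence A (C T T A A C T) down the side; a path through the grid corresponds to the alignment below.]

C---TTAACT
CGGATCA--T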

Page 9: Sequence Alignment

A simple scoring scheme

• Match: +8 (w(x, y) = 8, if x = y)
• Mismatch: -5 (w(x, y) = -5, if x ≠ y)
• Each gap symbol: -3 (w(-, x) = w(x, -) = -3)

   C   -   -   -   T   T   A   A   C   T
   C   G   G   A   T   C   A   -   -   T
  +8  -3  -3  -3  +8  -5  +8  -3  -3  +8  = +12

Alignment score: +12
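The column-by-column scoring on this slide can be checked with a short Python sketch. The names `w` and `alignment_score` are chosen here for illustration (`w` mirrors the slide's notation), and the default values are the slide's +8, -5, -3.

```python
GAP = "-"


def w(x, y, match=8, mismatch=-5, gap=-3):
    """Score one alignment column under the simple scheme of this slide."""
    if x == GAP or y == GAP:
        return gap
    return match if x == y else mismatch


def alignment_score(row_a, row_b):
    """Sum the column scores of two equal-length aligned rows."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    return sum(w(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("C---TTAACT", "CGGATCA--T"))  # 12
```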

Page 10: Sequence Alignment

An optimal alignment: the alignment of maximum score

• Let A = a1a2…am and B = b1b2…bn.
• Si,j: the score of an optimal alignment between a1a2…ai and b1b2…bj.
• With proper initializations, Si,j can be computed as follows.

$$
S_{i,j} = \max
\begin{cases}
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$
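The recurrence can be turned into a table-filling routine. The following is a minimal Python sketch, assuming the simple scoring scheme from the earlier slide (+8, -5, -3) and assuming that "proper initializations" means row 0 and column 0 hold multiples of the gap score. The function name `global_alignment_table` is hypothetical.

```python
def global_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Fill S where S[i][j] is the best score for aligning a[:i] with b[:j]."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):          # a[:i] against the empty prefix of b
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):          # the empty prefix of a against b[:j]
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j] + gap,       # a_i aligned with a gap symbol
                S[i][j - 1] + gap,       # b_j aligned with a gap symbol
                S[i - 1][j - 1] + pair,  # a_i aligned with b_j
            )
    return S


S = global_alignment_table("CTTAACT", "CGGATCAT")
print(S[3][5], S[-1][-1])  # 5 14
```

The table has (m+1)(n+1) cells and each is computed in constant time, so both time and space are O(mn).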

Page 11: Sequence Alignment

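Computing Si,j

[Figure: cell (i, j) of the score matrix with three incoming edges: a vertical edge weighted w(ai, -), a horizontal edge weighted w(-, bj), and a diagonal edge weighted w(ai, bj). The bottom-right cell holds Sm,n.]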

Page 12: Sequence Alignment

Initializations

           C    G    G    A    T    C    A    T
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3
T    -6
T    -9
A   -12
A   -15
C   -18
T   -21

Page 13: Sequence Alignment

S3,5 = ?

           C    G    G    A    T    C    A    T
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3    8    5    2   -1   -4   -7  -10  -13
T    -6    5    3    0   -3    7    4    1   -2
T    -9    2    0   -2   -5    ?
A   -12
A   -15
C   -18
T   -21

Page 14: Sequence Alignment

S3,5 = 5

           C    G    G    A    T    C    A    T
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3    8    5    2   -1   -4   -7  -10  -13
T    -6    5    3    0   -3    7    4    1   -2
T    -9    2    0   -2   -5    5    2   -1    9
A   -12   -1   -3   -5    6    3    0   10    7
A   -15   -4   -6   -8    3    1   -2    8    5
C   -18   -7   -9  -11    0   -2    9    6    3
T   -21  -10  -12  -14   -3    8    6    4   14

Optimal score: 14 (bottom-right cell)

Page 15: Sequence Alignment

   C   T   T   A   A   C   -   T
   C   G   G   A   T   C   A   T
  +8  -5  -5  +8  -5  +8  -3  +8  = 14

           C    G    G    A    T    C    A    T
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3    8    5    2   -1   -4   -7  -10  -13
T    -6    5    3    0   -3    7    4    1   -2
T    -9    2    0   -2   -5    5    2   -1    9
A   -12   -1   -3   -5    6    3    0   10    7
A   -15   -4   -6   -8    3    1   -2    8    5
C   -18   -7   -9  -11    0   -2    9    6    3
T   -21  -10  -12  -14   -3    8    6    4   14
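An optimal alignment is recovered by walking back from the bottom-right cell and, at each step, moving to a neighbor whose score explains the current cell. The Python sketch below assumes the simple scoring scheme (+8, -5, -3). The name `global_align` and the tie-breaking order (diagonal first, then vertical, then horizontal) are choices made here, so other equally good alignments may exist.

```python
def global_align(a, b, match=8, mismatch=-5, gap=-3):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] + gap
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j] + gap, S[i][j - 1] + gap,
                          S[i - 1][j - 1] + pair)

    # Traceback from (m, n) to (0, 0); columns are collected in reverse order.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            if S[i][j] == S[i - 1][j - 1] + pair:    # aligned pair
                row_a.append(a[i - 1]); row_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and S[i][j] == S[i - 1][j] + gap:   # a_i against a gap
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
        else:                                        # b_j against a gap
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


print(global_align("CTTAACT", "CGGATCAT"))  # (14, 'CTTAAC-T', 'CGGATCAT')
```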

Page 16: Sequence Alignment

Now try this example in class

Sequence A: CAATTGA
Sequence B: GAATCTGC

Their optimal alignment?

Page 17: Sequence Alignment

Initializations

           G    A    A    T    C    T    G    C
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3
A    -6
A    -9
T   -12
T   -15
G   -18
A   -21

Page 18: Sequence Alignment

S4,2 = ?

           G    A    A    T    C    T    G    C
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3   -5   -8  -11  -14   -4   -7  -10  -13
A    -6   -8    3    0   -3   -6   -9  -12  -15
A    -9  -11    0   11    8    5    2   -1   -4
T   -12  -14    ?
T   -15
G   -18
A   -21

Page 19: Sequence Alignment

S5,5 = ?

           G    A    A    T    C    T    G    C
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3   -5   -8  -11  -14   -4   -7  -10  -13
A    -6   -8    3    0   -3   -6   -9  -12  -15
A    -9  -11    0   11    8    5    2   -1   -4
T   -12  -14   -3    8   19   16   13   10    7
T   -15  -17   -6    5   16    ?
G   -18
A   -21

Page 20: Sequence Alignment

S5,5 = 14

           G    A    A    T    C    T    G    C
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3   -5   -8  -11  -14   -4   -7  -10  -13
A    -6   -8    3    0   -3   -6   -9  -12  -15
A    -9  -11    0   11    8    5    2   -1   -4
T   -12  -14   -3    8   19   16   13   10    7
T   -15  -17   -6    5   16   14   24   21   18
G   -18   -7   -9    2   13   11   21   32   29
A   -21  -10    1   -1   10    8   18   29   27

Optimal score: 27 (bottom-right cell)

Page 21: Sequence Alignment

           G    A    A    T    C    T    G    C
      0   -3   -6   -9  -12  -15  -18  -21  -24
C    -3   -5   -8  -11  -14   -4   -7  -10  -13
A    -6   -8    3    0   -3   -6   -9  -12  -15
A    -9  -11    0   11    8    5    2   -1   -4
T   -12  -14   -3    8   19   16   13   10    7
T   -15  -17   -6    5   16   14   24   21   18
G   -18   -7   -9    2   13   11   21   32   29
A   -21  -10    1   -1   10    8   18   29   27

   C   A   A   T   -   T   G   A
   G   A   A   T   C   T   G   C
  -5  +8  +8  +8  -3  +8  +8  -5  = 27

Page 22: Sequence Alignment

Global Alignment vs. Local Alignment

• global alignment:

• local alignment:

Page 23: Sequence Alignment

Maximum-sum interval

• Given a sequence of real numbers a1a2…an, find a consecutive subsequence with the maximum sum.

9 -3 1 7 -15 2 3 -4 2 -7 6 -2 8 4 -9

For each position, we can compute the maximum-sum interval starting at that position in O(n) time. Therefore, a naive algorithm runs in O(n²) time.

Page 24: Sequence Alignment

Maximum-sum interval (The recurrence relation)

• Define S(i) to be the maximum sum of the intervals ending at position i.

$$
S(i) = a_i + \max
\begin{cases}
S(i-1) \\
0
\end{cases}
$$

If S(i-1) < 0, concatenating ai with its previous interval gives a smaller sum than ai alone.
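The recurrence gives a single left-to-right pass. The Python sketch below also remembers where the current interval started, so the interval itself can be reported without a separate traceback. The function name `max_sum_interval` and the half-open index convention are assumptions made for this example.

```python
def max_sum_interval(values):
    """Return (best_sum, start, end) so that sum(values[start:end]) == best_sum."""
    if not values:
        raise ValueError("values must be non-empty")
    best_sum, best_start, best_end = values[0], 0, 1
    s_prev, start = 0, 0              # S(i-1), and where its interval begins
    for i, a_i in enumerate(values):
        if s_prev < 0:                # extending a negative interval only hurts
            start = i
            s_i = a_i
        else:
            s_i = s_prev + a_i
        if s_i > best_sum:
            best_sum, best_start, best_end = s_i, start, i + 1
        s_prev = s_i
    return best_sum, best_start, best_end


data = [9, -3, 1, 7, -15, 2, 3, -4, 2, -7, 6, -2, 8, 4, -9]
total, lo, hi = max_sum_interval(data)
print(total, data[lo:hi])  # 16 [6, -2, 8, 4]
```

This runs in O(n) time, compared with O(n²) for the naive approach.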

Page 25: Sequence Alignment

Maximum-sum interval (Tabular computation)

ai      9   -3    1    7  -15    2    3   -4    2   -7    6   -2    8    4   -9
S(i)    9    6    7   14   -1    2    5    1    3   -4    6    4   12   16    7

The maximum sum: 16

Page 26: Sequence Alignment

Maximum-sum interval (Traceback)

ai      9   -3    1    7  -15    2    3   -4    2   -7    6   -2    8    4   -9
S(i)    9    6    7   14   -1    2    5    1    3   -4    6    4   12   16    7

The maximum-sum interval: 6 -2 8 4

Page 27: Sequence Alignment

An optimal local alignment

• Si,j: the score of an optimal local alignment ending at ai and bj.
• With proper initializations, Si,j can be computed as follows.

$$
S_{i,j} = \max
\begin{cases}
0 \\
S_{i-1,j} + w(a_i, -) \\
S_{i,j-1} + w(-, b_j) \\
S_{i-1,j-1} + w(a_i, b_j)
\end{cases}
$$
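The local-alignment recurrence differs from the global one only in the extra 0 option and in where the answer is read: from the largest cell rather than the bottom-right one. A minimal Python sketch, assuming the simple scoring scheme (+8, -5, -3) and an all-zero first row and column; the name `local_alignment_table` is hypothetical.

```python
def local_alignment_table(a, b, match=8, mismatch=-5, gap=-3):
    """Return (S, best, cell): the table, its maximum, and where the maximum occurs."""
    m, n = len(a), len(b)
    S = [[0] * (n + 1) for _ in range(m + 1)]   # row 0 and column 0 stay 0
    best, cell = 0, (0, 0)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(
                0,                       # start a fresh local alignment here
                S[i - 1][j] + gap,
                S[i][j - 1] + gap,
                S[i - 1][j - 1] + pair,
            )
            if S[i][j] > best:
                best, cell = S[i][j], (i, j)
    return S, best, cell


_, best, cell = local_alignment_table("CTTAACT", "CGGATCAT")
print(best, cell)  # 18 (7, 8)
```

Tracing back from the best cell until a 0 is reached yields the aligned segments.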

Page 28: Sequence Alignment

local alignment

           C    G    G    A    T    C    A    T
      0    0    0    0    0    0    0    0    0
C     0    8    5    2    0    0    8    5    2
T     0    5    3    0    0    8    5    3   13
T     0    2    0    0    0    8    5    2   11
A     0    0    0    0    8    5    3    ?
A     0
C     0
T     0

Match: 8
Mismatch: -5
Gap symbol: -3

Page 29: Sequence Alignment

local alignment

           C    G    G    A    T    C    A    T
      0    0    0    0    0    0    0    0    0
C     0    8    5    2    0    0    8    5    2
T     0    5    3    0    0    8    5    3   13
T     0    2    0    0    0    8    5    2   11
A     0    0    0    0    8    5    3   13   10
A     0    0    0    0    8    5    2   11    8
C     0    8    5    2    5    3   13   10    7
T     0    5    3    0    2   13   10    8   18

Match: 8
Mismatch: -5
Gap symbol: -3

The best score: 18 (bottom-right cell)

Page 30: Sequence Alignment

           C    G    G    A    T    C    A    T
      0    0    0    0    0    0    0    0    0
C     0    8    5    2    0    0    8    5    2
T     0    5    3    0    0    8    5    3   13
T     0    2    0    0    0    8    5    2   11
A     0    0    0    0    8    5    3   13   10
A     0    0    0    0    8    5    2   11    8
C     0    8    5    2    5    3   13   10    7
T     0    5    3    0    2   13   10    8   18

The best score: 18

   A   -   C   -   T
   A   T   C   A   T
  +8  -3  +8  -3  +8  = 18

Page 31: Sequence Alignment

Now try this example in class

Sequence A: CAATTGA
Sequence B: GAATCTGC

Their optimal local alignment?

Page 32: Sequence Alignment

Did you get it right?

           G    A    A    T    C    T    G    C
      0    0    0    0    0    0    0    0    0
C     0    0    0    0    0    8    5    2    8
A     0    0    8    8    5    5    3    0    5
A     0    0    8   16   13   10    7    4    2
T     0    0    5   13   24   21   18   15   12
T     0    0    2   10   21   19   29   26   23
G     0    8    5    7   18   16   26   37   34
A     0    5   16   13   15   13   23   34   32

Page 33: Sequence Alignment

           G    A    A    T    C    T    G    C
      0    0    0    0    0    0    0    0    0
C     0    0    0    0    0    8    5    2    8
A     0    0    8    8    5    5    3    0    5
A     0    0    8   16   13   10    7    4    2
T     0    0    5   13   24   21   18   15   12
T     0    0    2   10   21   19   29   26   23
G     0    8    5    7   18   16   26   37   34
A     0    5   16   13   15   13   23   34   32

   A   A   T   -   T   G
   A   A   T   C   T   G
  +8  +8  +8  -3  +8  +8  = 37

Page 34: Sequence Alignment

Affine gap penalties

• Match: +8 (w(a, b) = 8, if a = b)
• Mismatch: -5 (w(a, b) = -5, if a ≠ b)
• Each gap symbol: -3 (w(-, b) = w(a, -) = -3)
• Each gap is charged an extra gap-open penalty: -4.

   C   -   -   -   T   T   A   A   C   T
   C   G   G   A   T   C   A   -   -   T
  +8  -3  -3  -3  +8  -5  +8  -3  -3  +8  = +12
      -4                      -4

Alignment score: 12 - 4 - 4 = 4

Page 35: Sequence Alignment

Affine gap penalties

• A gap of length k is penalized x + k·y, where x is the gap-open penalty and y is the gap-symbol penalty.

Three cases for alignment endings:

1. ...x
   ...x    an aligned pair
2. ...x
   ...-    a deletion
3. ...-
   ...x    an insertion

Page 36: Sequence Alignment

Affine gap penalties

• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.

• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.

• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

Page 37: Sequence Alignment

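Affine gap penalties

$$
D(i,j) = \max
\begin{cases}
D(i-1,j) - y \\
S(i-1,j) - x - y
\end{cases}
$$

$$
I(i,j) = \max
\begin{cases}
I(i,j-1) - y \\
S(i,j-1) - x - y
\end{cases}
$$

$$
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + w(a_i, b_j) \\
D(i,j) \\
I(i,j)
\end{cases}
$$

(A gap of length k is penalized x + k·y.)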
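The three recurrences can be filled in together, one cell at a time. The Python sketch below takes x and y as positive penalty magnitudes (x = 4 and y = 3 in the running example) and assumes boundary values in which the first row and column each consist of a single gap; the slides do not spell out the initializations, so that part is an assumption, as is the name `affine_global_score`.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=8, mismatch=-5, x=4, y=3):
    """Best global score when a gap of length k costs x + k*y."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # assumed boundary: one deletion gap of length i
        D[i][0] = S[i][0] = -x - i * y
    for j in range(1, n + 1):          # assumed boundary: one insertion gap of length j
        I[0][j] = S[0][j] = -x - j * y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y,          # extend the deletion gap
                          S[i - 1][j] - x - y)      # open a new deletion gap
            I[i][j] = max(I[i][j - 1] - y,          # extend the insertion gap
                          S[i][j - 1] - x - y)      # open a new insertion gap
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair, D[i][j], I[i][j])
    return S[m][n]


print(affine_global_score("ACGT", "AGT"))  # 17 = 3 matches (24) - 4 - 3
```

Setting x = 0 removes the gap-open charge, so the result then coincides with the simple scoring scheme (14 for CTTAACT and CGGATCAT).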

Page 38: Sequence Alignment

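Affine gap penalties

[Figure: each cell of the matrix holds three nodes S, I, and D. Edges that extend a gap are weighted -y, edges that open a gap from S are weighted -x-y, and the diagonal edge into S is weighted w(ai, bj).]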

Page 39: Sequence Alignment

Constant gap penalties

• Match: +8 (w(a, b) = 8, if a = b)
• Mismatch: -5 (w(a, b) = -5, if a ≠ b)
• Each gap symbol: 0 (w(-, b) = w(a, -) = 0)
• Each gap is charged a constant penalty: -4.

   C   -   -   -   T   T   A   A   C   T
   C   G   G   A   T   C   A   -   -   T
  +8   0   0   0  +8  -5  +8   0   0  +8  = +27
      -4                      -4

Alignment score: 27 - 4 - 4 = 19

Page 40: Sequence Alignment

Constant gap penalties

• Let D(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with a deletion.

• Let I(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj ending with an insertion.

• Let S(i, j) denote the maximum score of any alignment between a1a2…ai and b1b2…bj.

Page 41: Sequence Alignment

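Constant gap penalties

$$
D(i,j) = \max
\begin{cases}
D(i-1,j) \\
S(i-1,j) - x
\end{cases}
$$

$$
I(i,j) = \max
\begin{cases}
I(i,j-1) \\
S(i,j-1) - x
\end{cases}
$$

$$
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + w(a_i, b_j) \\
D(i,j) \\
I(i,j)
\end{cases}
$$

where x is a constant gap penalty for a gap.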
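With a constant gap penalty, extending a gap is free and only opening one costs x. A minimal Python sketch follows, with x as a positive magnitude (4 in the example slide). The boundary values, where the first row and column are each a single gap costing x, are an assumption, and `constant_gap_global_score` is a name invented here.

```python
NEG_INF = float("-inf")


def constant_gap_global_score(a, b, match=8, mismatch=-5, x=4):
    """Best global score when every gap costs x regardless of its length."""
    m, n = len(a), len(b)
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with a deletion
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # ends with an insertion
    S[0][0] = 0
    for i in range(1, m + 1):          # assumed boundary: a single deletion gap
        D[i][0] = S[i][0] = -x
    for j in range(1, n + 1):          # assumed boundary: a single insertion gap
        I[0][j] = S[0][j] = -x
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j],         # extending a gap costs nothing
                          S[i - 1][j] - x)     # opening a gap costs x
            I[i][j] = max(I[i][j - 1],
                          S[i][j - 1] - x)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair, D[i][j], I[i][j])
    return S[m][n]


print(constant_gap_global_score("ACGT", "AGT"))  # 20 = 3 matches (24) - 4
```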

Page 42: Sequence Alignment

Restricted affine gap penalties

• A gap of length k is penalized x + f(k)·y, where f(k) = k for k <= c and f(k) = c for k > c.

Five cases for alignment endings:

1. ...x
   ...x    an aligned pair
2. ...x
   ...-    a deletion
3. ...-
   ...x    an insertion
4. and 5. for long gaps

Page 43: Sequence Alignment

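Restricted affine gap penalties

$$
D(i,j) = \max
\begin{cases}
D(i-1,j) - y \\
S(i-1,j) - x - y
\end{cases}
$$

$$
D'(i,j) = \max
\begin{cases}
D'(i-1,j) \\
S(i-1,j) - x - cy
\end{cases}
$$

$$
I(i,j) = \max
\begin{cases}
I(i,j-1) - y \\
S(i,j-1) - x - y
\end{cases}
$$

$$
I'(i,j) = \max
\begin{cases}
I'(i,j-1) \\
S(i,j-1) - x - cy
\end{cases}
$$

$$
S(i,j) = \max
\begin{cases}
S(i-1,j-1) + w(a_i, b_j) \\
D(i,j);\ D'(i,j) \\
I(i,j);\ I'(i,j)
\end{cases}
$$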
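The five tables can be filled in the same loop. In the Python sketch below, `D2` and `I2` stand for D' and I', x and y are positive magnitudes, and c is the cap on the charged gap length. The boundary values and the function name `restricted_affine_global_score` are assumptions, since the slides leave the initializations implicit.

```python
NEG_INF = float("-inf")


def restricted_affine_global_score(a, b, match=8, mismatch=-5, x=4, y=3, c=2):
    """Best global score when a gap of length k costs x + min(k, c) * y."""
    m, n = len(a), len(b)

    def table():
        return [[NEG_INF] * (n + 1) for _ in range(m + 1)]

    S, D, D2, I, I2 = table(), table(), table(), table(), table()
    S[0][0] = 0
    for i in range(1, m + 1):          # assumed boundary: one deletion gap of length i
        D[i][0] = -x - i * y
        D2[i][0] = -x - c * y
        S[i][0] = max(D[i][0], D2[i][0])
    for j in range(1, n + 1):          # assumed boundary: one insertion gap of length j
        I[0][j] = -x - j * y
        I2[0][j] = -x - c * y
        S[0][j] = max(I[0][j], I2[0][j])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = max(D[i - 1][j] - y, S[i - 1][j] - x - y)
            D2[i][j] = max(D2[i - 1][j], S[i - 1][j] - x - c * y)  # long gap: flat cost
            I[i][j] = max(I[i][j - 1] - y, S[i][j - 1] - x - y)
            I2[i][j] = max(I2[i][j - 1], S[i][j - 1] - x - c * y)
            pair = match if a[i - 1] == b[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + pair,
                          D[i][j], D2[i][j], I[i][j], I2[i][j])
    return S[m][n]


# One insertion gap of length 4 with c = 2 costs 4 + 2*3 = 10.
print(restricted_affine_global_score("AAAATTTT", "AAAACCCCTTTT"))  # 54 = 64 - 10
```

Two limiting cases are useful sanity checks: with c at least as large as any possible gap, the scheme behaves like the plain affine penalty, and with c = 0 it behaves like the constant gap penalty.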

Page 44: Sequence Alignment

D(i, j) vs. D'(i, j)

• Case 1: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length <= c. Then D(i, j) >= D'(i, j).
• Case 2: the best alignment ending at (i, j) with a deletion at the end has a last deletion gap of length >= c. Then D(i, j) <= D'(i, j).

Page 45: Sequence Alignment

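[Figure: plot of Max{S(i,j)-x-ky, S(i,j)-x-cy} against the gap length k. The value decreases with k up to k = c and stays at S(i,j)-x-cy for larger k.]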