54
Post-processing long pairwise alignments 陳陳陳 93/4/28 Zheng Zhang et al., Bioinformatics Vol.15 no. 12 1999

Post-processing long pairwise alignments

  • Upload
    samara

  • View
    18

  • Download
    0

Embed Size (px)

DESCRIPTION

Post-processing long pairwise alignments. 陳啟煌 93/4/28. Zheng Zhang et al., Bioinformatics Vol.15 no. 12 1999. Outline. Motivation Theoretical basis of the proposed algorithms How to build up Useful Tree An application. Motivation. Avoid local alignment problems - PowerPoint PPT Presentation

Citation preview

Page 1: Post-processing long pairwise alignments

Post-processing long pairwise alignments

陳啟煌 93/4/28

Zheng Zhang et al., Bioinformatics Vol.15 no. 12 1999

Page 2: Post-processing long pairwise alignments

Outline Motivation Theoretical basis of the proposed

algorithms How to build up Useful Tree An application

Page 3: Post-processing long pairwise alignments

Motivation Avoid local alignment problems

Smith-Waterman lead to inclusion of an arbitrarily poor internal segment.

Others approaches may generate an alignment score less than some internal segment

Page 4: Post-processing long pairwise alignments

Smith-Waterman approach

810

11132350

103580000

1810132035071335258082580000

1158000201358003502800258000000000

C G G A T C A T

C

T

T

A

A

C

T

The best

score

Page 5: Post-processing long pairwise alignments

Inclusion of a poor segment Inclusion of an arbitrarily poor

region in an alignment Smith-Waterman approach potential

flaws.

Page 6: Post-processing long pairwise alignments

X-Alignments An X-Drop within an alignment,

where X>0 is fixed in advance. A region of consecutive columns

scoring less than <-X Alignments contain no X-Drop, we

call X-alignments

Page 7: Post-processing long pairwise alignments

BLAST

In Blast Step 3: Extend hits.

Terminate if the score of the extension fades away. (That is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.)

hit

Page 8: Post-processing long pairwise alignments

2X-Drop

Page 9: Post-processing long pairwise alignments

Non-normal alignment

The HSP has been extended to the right side in such a way that the entire alignment score less than the section from a to b

Page 10: Post-processing long pairwise alignments

The Proposed Approach Provide techniques for decomposing a

long alignment into sub-alignments that avoid the both problems. Show how to scan an alignment to collect

information from which a decomposition corresponding any X can be found almost instantaneously.

Provide a method for detecting variations in the rate of genome evolution

Page 11: Post-processing long pairwise alignments

Useful Tree

Page 12: Post-processing long pairwise alignments

X-full alignment An alignment are normal if each

of its prefixes or suffixes has non-negative score.

An alignment is not contained in any longer normal alignment is called full

X-alignment + maximal X-normal is called X-full

Page 13: Post-processing long pairwise alignments

X-full alignment 0-full alignment is maximal runs of

columns of A with non-negative scores. For every X, X-full alignments are

pairwise disjoint. If X<YX-full alignment contained in

Y-full alignment. -full alignments are just full

alignments

Page 14: Post-processing long pairwise alignments

Useful Tree Encode X-full alignments for all X≥ 0

in tree data structure. Leaves: 0-full alignments & maximal runs

of negative score columns alternately Terminal Leaves: add two special leaves

with score - Each internal node is a disjoint union of its

three children. Keep alignment’s score and the minimum sub-alignment’s score

Page 15: Post-processing long pairwise alignments

Time complexity Construct time: O(N) Search Time:

If k such alignments,need inspect at most 3k+1 nodes

(2k+1) leaves+((2k+1-1)/2) internal nodes =3k+1 nodes

Page 16: Post-processing long pairwise alignments

Decompose rules Alignment A A1,A2,…., A2n-1

# of sub-alignment is odd i :score of Ai Negative & Non-negative score

alternately 0=2n= -∞

Page 17: Post-processing long pairwise alignments

Theoretical basis

Page 18: Post-processing long pairwise alignments

Theoretical basis

Page 19: Post-processing long pairwise alignments

Theoretical basis Lemma1:X is consistent Lemma2:A normal drop is

consistent with X

Page 20: Post-processing long pairwise alignments

Lemma 3

Page 21: Post-processing long pairwise alignments

Useful tree definition Each node of T is a segment

consistent with X. Each leaf of T is of the form [i,i+1) Each internal node [a,d) has

exactly three children. [a,b),[b,c) and [c,d) and the signs of their scores alternate.

Page 22: Post-processing long pairwise alignments

Lemma 4

Page 23: Post-processing long pairwise alignments

Possible negative merge LEMMA 5. Assume that three consecutive roots in our

sequence, [a,b),[b,c),and [c,d), satisfy 0 ≤ (b,c)< min(- (a,b),- (c,d))

Then merging these trees into a single tree with root [a,d) creates a useful tree and the resulting sequence still satisfies P1 and P2.

If a,b,c and d satisfy this lemma,[a,d) is a possible negative merger.

Page 24: Post-processing long pairwise alignments

Possible positive merge LEMMA 6. Assume that five consecutive roots in our

sequence, [a,b),[b,c),[c,d),[d,e) and [e,f) satisfy 0 > (c,d) ≥ max( (a,b), (e,f)) neither [a,d) nor [c,f) is a possible negative

Then merging these trees into a single tree with roots[b,c),[c,d),[d,e)into a single root[b,e) creates a useful tree and the resulting sequence still satisfies P1 and P2.

If a,b,c,d,e and f satisfy this lemma,[a,d) is a possible positive merger.

Page 25: Post-processing long pairwise alignments

Lemma 7

Page 26: Post-processing long pairwise alignments

Theoretical basis Normal rise and normal drop Useful Tree contains every

segments Possible negative merger Possible positive merger Always exists possible negative

merger or possible positive merger

Page 27: Post-processing long pairwise alignments

Decompose rules Alignment A A1,A2,…., A2n-1

# of sub-alignment is odd i :score of Ai Negative(odd i) & Non-

negative(even i) score alternately 0=2n= -∞

Page 28: Post-processing long pairwise alignments

Useful Tree build up procedure

1.Push the first leaf on the stack2.While the stack size exceeds 1 or there is an

unvisited leaf do3. if the top three stack items indicate a negative merger then

4. pop three items,merge them and push the result onto the stack

5. else if the top five segments indicate a positive merge then6. pop an item{e,f} perform line 4. and push {e,f} back7. else

8. push the next two leaves onto the stack

Page 29: Post-processing long pairwise alignments

Construct Useful Tree ACAACAGAAACT | | || ||| ATA--AG-CACT Gop:0 Gep:1 Match/mismatch: 1/-1

Page 30: Post-processing long pairwise alignments

1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T

| | | | | | |S2 A T A - - A G C A - C T

S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13

22

5 6 10

-2-2-1

21

Page 31: Post-processing long pairwise alignments

Push 1 Push 2,3 Push 4,5,

Merge 2,3,4 as a Merge 1,a,5 as b

Push 6,7 Push 8,9

Merge 6,7,8 as c Push 10,11

Merge 9,10,11 as d Merge b,c,d as e

1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T

| | | | | | |S2 A T A - - A G C A - C T

S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13

22

5 6 10

-2-2-1

21

-2 -1 -131 1 2-1 -2 -1 -1 -131 1 2 2 3

-13 -13 -14 -14 -14 -14

Page 32: Post-processing long pairwise alignments

Source code of this paper http://globin.cse.psu.edu/dist/decom/

Page 33: Post-processing long pairwise alignments

Alignment file #:lav d { "simu elegans briggsae M = 10, I = -10, V = -10, O = 60, E = 2" } s { "s1" 1 12 "s2" 1 9 } h { ">SUPERLINK_RWXL 2782216-2889703" ">dna -c briggsae.dna " }

a { s 562 b 1 1 e 3 3 l 1 1 3 3 99 l 6 4 9 7 99 l 11 8 12 9 99}

1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T

| | | | | | |S2 A T A - - A G C A - C T

S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13

22

5 6 10

-2-2-1

21

Page 34: Post-processing long pairwise alignments

An Application Different regions of a mammalian genome

evolve at different rates. Provide a method for detecting variations in

the rate of genome evolution To compare the rates of evolution in

different genomic regions from humans and mice. Align each pair of homologous regions and

determined

Page 35: Post-processing long pairwise alignments

Pitfalls Tally statistics only at sequence

not in exons Regions adjacent to an exon

maybe be aligned Remove the exons before producing

the alignment The alignment program is unable to

differentiate the biologically meaning alignment

Page 36: Post-processing long pairwise alignments

Proposed approach First align the sequences using the

exons as guideposts Then re-score the alignment where

positions within exons are masked,so that they cannot be aligned to another nucleotide.

Page 37: Post-processing long pairwise alignments
Page 38: Post-processing long pairwise alignments

References Zheng Zhang et al., “Post-processing long

pairwise alignments”,Bioinformatics, Vol.15 no. 12 1999

http://globin.cse.psu.edu/dist/decom/ Kun-Mao Chao ,Algorithms for Biological

Sequence Analysis Lecture Notes, National Taiwan University, Spring 2004

Page 39: Post-processing long pairwise alignments

Q&A Thank you!

Page 40: Post-processing long pairwise alignments

Possible mistakes, but maybe not

P.1015 left col., last 2 row ∑ k=1 ∑ k=i

P.1015 Right col. [i,i) should be [i,j) P.1016 proof of lemma4 4 [i,i) should

be [i,j) P.1017 proof of lemma5 (b,c) (e,c)

should be (b,c)- (e,c) P.1017 lemma7 (ai-3,ai-2) (ai-4,ai-1)

Page 41: Post-processing long pairwise alignments
Page 42: Post-processing long pairwise alignments
Page 43: Post-processing long pairwise alignments

Proof 1

Lemma1:X is consistent

Page 44: Post-processing long pairwise alignments

Proof of lemma2

Page 45: Post-processing long pairwise alignments

Lemma 3

Page 46: Post-processing long pairwise alignments

Proof of lemma3

Page 47: Post-processing long pairwise alignments

Lemma 4

Page 48: Post-processing long pairwise alignments

Proof of lemma4

Page 49: Post-processing long pairwise alignments

Possible negative merge LEMMA 5. Assume that three consecutive roots in our

sequence, [a,b),[b,c),and [c,d), satisfy 0 ≤ (b,c)< min(- (a,b),- (c,d))

Then merging these trees into a single tree with root [a,d) creates a useful tree and the resulting sequence still satisfies P1 and P2.

If a,b,c and d satisfy this lemma,[a,d) is a possible negative merger.

Page 50: Post-processing long pairwise alignments

Proof of lemma5

Page 51: Post-processing long pairwise alignments

Possible positive merge LEMMA 6. Assume that five consecutive roots in our

sequence, [a,b),[b,c),[c,d),[d,e) and [e,f) satisfy 0 > (c,d) ≥ max( (a,b), (e,f)) neither [a,d) nor [c,f) is a possible negative

Then merging these trees into a single tree with roots[b,c),[c,d),[d,e)into a single root[b,e) creates a useful tree and the resulting sequence still satisfies P1 and P2.

If a,b,c,d,e and f satisfy this lemma,[a,d) is a possible positive merger.

Page 52: Post-processing long pairwise alignments

Proof of lemma6

Page 53: Post-processing long pairwise alignments

Lemma 7

Page 54: Post-processing long pairwise alignments

Proof of lemma7