Upload
samara
View
18
Download
0
Embed Size (px)
DESCRIPTION
Post-processing long pairwise alignments. 陳啟煌 93/4/28. Zheng Zhang et al., Bioinformatics Vol.15 no. 12 1999. Outline. Motivation Theoretical basis of the proposed algorithms How to build up Useful Tree An application. Motivation. Avoid local alignment problems - PowerPoint PPT Presentation
Citation preview
Post-processing long pairwise alignments
陳啟煌 93/4/28
Zheng Zhang et al., Bioinformatics Vol.15 no. 12 1999
Outline Motivation Theoretical basis of the proposed
algorithms How to build up Useful Tree An application
Motivation Avoid local alignment problems
Smith-Waterman lead to inclusion of an arbitrarily poor internal segment.
Others approaches may generate an alignment score less than some internal segment
Smith-Waterman approach
810
11132350
103580000
1810132035071335258082580000
1158000201358003502800258000000000
C G G A T C A T
C
T
T
A
A
C
T
The best
score
Inclusion of a poor segment Inclusion of an arbitrarily poor
region in an alignment Smith-Waterman approach potential
flaws.
X-Alignments An X-Drop within an alignment,
where X>0 is fixed in advance. A region of consecutive columns
scoring less than <-X Alignments contain no X-Drop, we
call X-alignments
BLAST
In Blast Step 3: Extend hits.
Terminate if the score of the extension fades away. (That is, when we reach a segment pair whose score falls a certain distance below the best score found for shorter extensions.)
hit
2X-Drop
Non-normal alignment
The HSP has been extended to the right side in such a way that the entire alignment score less than the section from a to b
The Proposed Approach Provide techniques for decomposing a
long alignment into sub-alignments that avoid the both problems. Show how to scan an alignment to collect
information from which a decomposition corresponding any X can be found almost instantaneously.
Provide a method for detecting variations in the rate of genome evolution
Useful Tree
X-full alignment An alignment are normal if each
of its prefixes or suffixes has non-negative score.
An alignment is not contained in any longer normal alignment is called full
X-alignment + maximal X-normal is called X-full
X-full alignment 0-full alignment is maximal runs of
columns of A with non-negative scores. For every X, X-full alignments are
pairwise disjoint. If X<YX-full alignment contained in
Y-full alignment. -full alignments are just full
alignments
Useful Tree Encode X-full alignments for all X≥ 0
in tree data structure. Leaves: 0-full alignments & maximal runs
of negative score columns alternately Terminal Leaves: add two special leaves
with score - Each internal node is a disjoint union of its
three children. Keep alignment’s score and the minimum sub-alignment’s score
Time complexity Construct time: O(N) Search Time:
If k such alignments,need inspect at most 3k+1 nodes
(2k+1) leaves+((2k+1-1)/2) internal nodes =3k+1 nodes
Decompose rules Alignment A A1,A2,…., A2n-1
# of sub-alignment is odd i :score of Ai Negative & Non-negative score
alternately 0=2n= -∞
Theoretical basis
Theoretical basis
Theoretical basis Lemma1:X is consistent Lemma2:A normal drop is
consistent with X
Lemma 3
Useful tree definition Each node of T is a segment
consistent with X. Each leaf of T is of the form [i,i+1) Each internal node [a,d) has
exactly three children. [a,b),[b,c) and [c,d) and the signs of their scores alternate.
Lemma 4
Possible negative merge LEMMA 5. Assume that three consecutive roots in our
sequence, [a,b),[b,c),and [c,d), satisfy 0 ≤ (b,c)< min(- (a,b),- (c,d))
Then merging these trees into a single tree with root [a,d) creates a useful tree and the resulting sequence still satisfies P1 and P2.
If a,b,c and d satisfy this lemma,[a,d) is a possible negative merger.
Possible positive merge LEMMA 6. Assume that five consecutive roots in our
sequence, [a,b),[b,c),[c,d),[d,e) and [e,f) satisfy 0 > (c,d) ≥ max( (a,b), (e,f)) neither [a,d) nor [c,f) is a possible negative
Then merging these trees into a single tree with roots[b,c),[c,d),[d,e)into a single root[b,e) creates a useful tree and the resulting sequence still satisfies P1 and P2.
If a,b,c,d,e and f satisfy this lemma,[a,d) is a possible positive merger.
Lemma 7
Theoretical basis Normal rise and normal drop Useful Tree contains every
segments Possible negative merger Possible positive merger Always exists possible negative
merger or possible positive merger
Decompose rules Alignment A A1,A2,…., A2n-1
# of sub-alignment is odd i :score of Ai Negative(odd i) & Non-
negative(even i) score alternately 0=2n= -∞
Useful Tree build up procedure
1.Push the first leaf on the stack2.While the stack size exceeds 1 or there is an
unvisited leaf do3. if the top three stack items indicate a negative merger then
4. pop three items,merge them and push the result onto the stack
5. else if the top five segments indicate a positive merge then6. pop an item{e,f} perform line 4. and push {e,f} back7. else
8. push the next two leaves onto the stack
Construct Useful Tree ACAACAGAAACT | | || ||| ATA--AG-CACT Gop:0 Gep:1 Match/mismatch: 1/-1
1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T
| | | | | | |S2 A T A - - A G C A - C T
S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13
22
5 6 10
-2-2-1
21
Push 1 Push 2,3 Push 4,5,
Merge 2,3,4 as a Merge 1,a,5 as b
Push 6,7 Push 8,9
Merge 6,7,8 as c Push 10,11
Merge 9,10,11 as d Merge b,c,d as e
1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T
| | | | | | |S2 A T A - - A G C A - C T
S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13
22
5 6 10
-2-2-1
21
-2 -1 -131 1 2-1 -2 -1 -1 -131 1 2 2 3
-13 -13 -14 -14 -14 -14
Source code of this paper http://globin.cse.psu.edu/dist/decom/
Alignment file #:lav d { "simu elegans briggsae M = 10, I = -10, V = -10, O = 60, E = 2" } s { "s1" 1 12 "s2" 1 9 } h { ">SUPERLINK_RWXL 2782216-2889703" ">dna -c briggsae.dna " }
a { s 562 b 1 1 e 3 3 l 1 1 3 3 99 l 6 4 9 7 99 l 11 8 12 9 99}
1 2 3 4 7 8 9 11S1 A C A A C A G A A A C T
| | | | | | |S2 A T A - - A G C A - C T
S 0 0 0 0 0 0 0 -1 0 0 1 -1 1 -1 1 -1 -13 -13 1 0 1 0 1 0 -13
22
5 6 10
-2-2-1
21
An Application Different regions of a mammalian genome
evolve at different rates. Provide a method for detecting variations in
the rate of genome evolution To compare the rates of evolution in
different genomic regions from humans and mice. Align each pair of homologous regions and
determined
Pitfalls Tally statistics only at sequence
not in exons Regions adjacent to an exon
maybe be aligned Remove the exons before producing
the alignment The alignment program is unable to
differentiate the biologically meaning alignment
Proposed approach First align the sequences using the
exons as guideposts Then re-score the alignment where
positions within exons are masked,so that they cannot be aligned to another nucleotide.
References Zheng Zhang et al., “Post-processing long
pairwise alignments”,Bioinformatics, Vol.15 no. 12 1999
http://globin.cse.psu.edu/dist/decom/ Kun-Mao Chao ,Algorithms for Biological
Sequence Analysis Lecture Notes, National Taiwan University, Spring 2004
Q&A Thank you!
Possible mistakes, but maybe not
P.1015 left col., last 2 row ∑ k=1 ∑ k=i
P.1015 Right col. [i,i) should be [i,j) P.1016 proof of lemma4 4 [i,i) should
be [i,j) P.1017 proof of lemma5 (b,c) (e,c)
should be (b,c)- (e,c) P.1017 lemma7 (ai-3,ai-2) (ai-4,ai-1)
Proof 1
Lemma1:X is consistent
Proof of lemma2
Lemma 3
Proof of lemma3
Lemma 4
Proof of lemma4
Possible negative merge LEMMA 5. Assume that three consecutive roots in our
sequence, [a,b),[b,c),and [c,d), satisfy 0 ≤ (b,c)< min(- (a,b),- (c,d))
Then merging these trees into a single tree with root [a,d) creates a useful tree and the resulting sequence still satisfies P1 and P2.
If a,b,c and d satisfy this lemma,[a,d) is a possible negative merger.
Proof of lemma5
Possible positive merge LEMMA 6. Assume that five consecutive roots in our
sequence, [a,b),[b,c),[c,d),[d,e) and [e,f) satisfy 0 > (c,d) ≥ max( (a,b), (e,f)) neither [a,d) nor [c,f) is a possible negative
Then merging these trees into a single tree with roots[b,c),[c,d),[d,e)into a single root[b,e) creates a useful tree and the resulting sequence still satisfies P1 and P2.
If a,b,c,d,e and f satisfy this lemma,[a,d) is a possible positive merger.
Proof of lemma6
Lemma 7
Proof of lemma7