2008 Fall Semester
Data Structures and Algorithms (I)
呂學一 (Hsueh-I Lu) http://www.csie.ntu.edu.tw/~hil/
Outline of this slide
Dynamic programming
– Fibonacci sequence
– Stamp problem
– Sequence alignment
– Matrix multiplication
Leonardo Fibonacci (1170-1250)
Old hens never die … They just lay eggs!
At the beginning of Day 1, there is a hen.
Each hen lays an egg every 24 hours.
Each egg takes 24 hours to become a hen.
F(n) = the number of hens at the end of day n.
Give an algorithm to compute F(n).
[Figures: the hens and eggs on Day 1 through Day 5]
E(n) = the number of eggs at the end of Day n.
E(n) = ? F(n) = ?
The recurrence relation
$$F(n) = F(n-1) + E(n-1) = F(n-1) + F(n-2).$$
$$F(1) = F(2) = 1.$$
The recursive algorithm
int F(n) {
    if (n <= 2)
        return 1;
    else
        return F(n - 1) + F(n - 2);
}
Very Inefficient!
F(8)
F(7) F(6)
F(6) F(5) F(5) F(4)
F(5) F(4) F(4) F(3) F(4) F(3) F(3) F(2)
$1.61800^{365} = 1.89303571 \times 10^{76}$
$1.61800^{30} = 1859325.9$
It takes $\Omega(1.618^n)$ time to compute $F(n)$ using the recursive algorithm.
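To make the blow-up concrete, here is a minimal Python sketch (not from the slides) that runs the naive recursion and counts how many calls it makes. The name `fib_calls` is a hypothetical helper introduced only for this illustration.

```python
def fib_calls(n):
    """Return (F(n), number of calls made by the naive recursion)."""
    if n <= 2:
        return 1, 1
    a, calls_a = fib_calls(n - 1)
    b, calls_b = fib_calls(n - 2)
    # One call for this invocation plus everything below it.
    return a + b, calls_a + calls_b + 1


for n in (10, 20, 25):
    value, calls = fib_calls(n)
    print(n, value, calls)
```

The call count works out to 2F(n) - 1, so it grows at the same exponential rate as F(n) itself.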
Dynamic-Programming Approach
int F(n) {
    let F[1] = 1;
    let F[2] = 1;
    for i = 3 to n do
        let F[i] = F[i - 1] + F[i - 2];
    return F[n];
}
The DP-algorithm takes only O(n) time and space!
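The table-filling pseudocode can be sketched in Python roughly as follows. The function name `fib_table` and the choice to leave index 0 unused (so that indices match the slide) are assumptions of this sketch.

```python
def fib_table(n):
    """Fill F[1..n] bottom-up and return F[n]; assumes n >= 1."""
    f = [0] * (max(n, 2) + 1)  # index 0 is unused so indices match the slide
    f[1] = f[2] = 1
    for i in range(3, n + 1):
        f[i] = f[i - 1] + f[i - 2]
    return f[n]


print([fib_table(i) for i in range(1, 8)])
```

Each entry is computed once from two stored entries, which is where the O(n) time and O(n) space come from.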
Illustration
n     1  2  3  4  5  6  7
F[n]  1  1  2  3  5  8  13
Dynamic Programming
A clever way to implement recursion:
– Using storage to avoid unnecessarily duplicated efforts.
– Let every step you have taken leave a trace.
Question
Is it possible to keep the running time linear while reducing the space to O(1)?
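One possible answer, sketched in Python: since F[i] depends only on the two previous entries, it is enough to keep those two values. The name `fib_two_vars` is hypothetical, and "O(1) space" here assumes each integer counts as one unit of storage.

```python
def fib_two_vars(n):
    """Return F(n) for n >= 1 while storing only two values."""
    prev, curr = 1, 1  # F(1) and F(2)
    for _ in range(3, n + 1):
        prev, curr = curr, prev + curr
    return curr


print([fib_two_vars(i) for i in range(1, 8)])
```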
Another example
Choosing stamps
The problem
If the postage is n, what is the minimum number of stamps to cover the postage?
A recursive algorithm
int stamp(n) {
    if (n == 0) return 0;
    if (n < 0) return ∞;
    if (n ∈ {2, 5, 8, 14}) return 1;
    let n2 = stamp(n - 2);
    let n5 = stamp(n - 5);
    let n8 = stamp(n - 8);
    let n14 = stamp(n - 14);
    return 1 + min(n2, n5, n8, n14);
}
The DP-version
int stamp(n) {
    let S[0] = 0;
    for i = -1 down to -13 do
        let S[i] = ∞;
    for i = 1 to n do
        let S[i] = 1 + min(S[i - 2], S[i - 5], S[i - 8], S[i - 14]);
    return S[n];
}
The DP-algorithm takes only O(n) time and space!
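A minimal Python sketch of the same table-filling idea follows. Instead of storing entries at negative indices, it skips any denomination larger than the remaining postage; that substitution, and the names `stamp` and `DENOMINATIONS`, are choices made for this sketch.

```python
import math

DENOMINATIONS = (2, 5, 8, 14)  # stamp values used in the pseudocode


def stamp(n, denominations=DENOMINATIONS):
    """Minimum number of stamps summing to n, or math.inf if impossible."""
    s = [math.inf] * (n + 1)
    s[0] = 0
    for i in range(1, n + 1):
        for d in denominations:
            # Only look back when i - d is a valid (non-negative) postage.
            if i >= d and s[i - d] + 1 < s[i]:
                s[i] = s[i - d] + 1
    return s[n]


print([stamp(i) for i in range(10)])
```

With a fixed set of denominations, each entry takes constant work, which matches the O(n) time and space stated above.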
Illustration
n     0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
S[n]  0  *  1  *  2  1  3  2  1  3  2  4  3  2  1  3  2  4  3  2  4  3  2  4  3
(* marks a postage that cannot be covered.)
Question
So far we have only asked how many stamps are needed. If we want to know how to cover the postage with the minimum number of stamps, that is, how many stamps of each denomination to use, what should we do? Does it require extra space?
Sequence Alignment
Aligning two strings
A = attgatcctag
B = acttagtccttcgc
A → a-ttga-tcc-tag-
B → actt-agtccttcgc
(Each "-" is a gap.)
Measuring an alignment
Scoring matrix
     a   t   g   c   -
a    2  -2  -2  -2  -1
t   -2   2  -2  -2  -1
g   -2  -2   2  -2  -1
c   -2  -2  -2   2  -1
-   -1  -1  -1  -1  -1
Other scoring matrices
BLAST matrix
     a   t   g   c   -
a    5  -4  -4  -4  -4
t   -4   5  -4  -4  -4
g   -4  -4   5  -4  -4
c   -4  -4  -4   5  -4
-   -4  -4  -4  -4  -4
Transition/Transversion matrix
     a   t   g   c   -
a    0   5   5   1  -1
t    5   0   1   5  -1
g    5   1   0   5  -1
c    1   5   5   0  -1
-   -1  -1  -1  -1  -1
Scoring matrix is an art
Log odds matrix
– score[i, j] = log(q(i, j) / (p(i) p(j))).
PAM matrix
– Point accepted mutations
BLOSUM matrix
– Blocks substitution matrix
– Steven Henikoff and Jorja G. Henikoff (1992).
Other specialized scoring matrices
– Domenico Bordo and Patrick Argos (1991).
– Jean-Michel Claverie (JCB 1993).
– Lee F. Kolakowski and Kenneth A. Rice (Nature 1994)
Scoring an alignment
A:      a  -  t  t  g  a  -  t  c  c  -  t  a  g  -
B:      c  c  t  t  -  a  g  t  c  c  t  t  c  g  c
score: -2 -1 +2 +2 -1 +2 -1 +2 +2 +2 -1 +2 -2 +2 -1
score = 7
     a   t   g   c   -
a    2  -2  -2  -2  -1
t   -2   2  -2  -2  -1
g   -2  -2   2  -2  -1
c   -2  -2  -2   2  -1
-   -1  -1  -1  -1  -1
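The column-by-column scoring above can be checked with a short Python sketch. The helper names `pair_score` and `alignment_score` are hypothetical; `pair_score` encodes the scoring matrix shown on this slide (match +2, mismatch -2, any column with a gap -1).

```python
def pair_score(x, y):
    """Score of one alignment column under the slide's scoring matrix."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def alignment_score(row_a, row_b):
    """Sum the column scores of two equal-length aligned rows."""
    assert len(row_a) == len(row_b)
    return sum(pair_score(x, y) for x, y in zip(row_a, row_b))


print(alignment_score("a-ttga-tcc-tag-", "cctt-agtccttcgc"))  # 7
```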
String alignment problem
Input:
– two strings A and B; and
– a scoring table 分 ("score").
Output:
– an alignment of A and B that has the maximum score with respect to 分.
Q: Any naïve methods?
A = attgatcctag
B = ccttagtccttcgc
分   a   t   g   c   -
a    2  -2  -2  -2  -1
t   -2   2  -2  -2  -1
g   -2  -2   2  -2  -1
c   -2  -2  -2   2  -1
-   -1  -1  -1  -1  -1
Q: Is there a recursive method?
A = attgatcctag
B = ccttagtccttcgc
(Same scoring table 分 as on the previous slide.)
Yes, but very inefficient!
int align(m, n) {
    if (m == 0 and n == 0) return 0;
    let x = y = z = -∞;
    if (m > 0 and n > 0)
        let x = align(m - 1, n - 1) + Score[A[m], B[n]];
    if (m > 0)
        let y = align(m - 1, n) + Score[A[m], -];
    if (n > 0)
        let z = align(m, n - 1) + Score[-, B[n]];
    return max(x, y, z);
}
Alignment graph
[Figure: the alignment graph, a grid whose rows are labeled a, t, t, g, a and whose columns are labeled c, c, t, t, a, g, t, c]
Observations
Each alignment corresponds to a maximal path on the alignment graph.
The score of an alignment is the score of its corresponding maximal path.
[Figure: a maximal path on the alignment graph, corresponding to the alignment below]
c c t t - a g t c
a - t t g a - - -
Maximal: nothing comes before the path and nothing comes after it.
Score of edges
[Figure: the edges at row A[i] and column B[j] of the alignment graph, labeled with their scores]
– 分[-, B[j]]
– 分[A[i], -]
– 分[A[i], B[j]]
The graph problem
Finding a maximal path with maximum score on the alignment graph (a directed acyclic graph)
Idea
For each i = 0, 1, …, |A| and each j = 0, 1, …, |B|, let 點[i, j] ("node") keep the maximum score of aligning A[1…i] and B[1…j].
[Figure: a grid with rows 0, 1, …, i, …, |A| and columns 0, 1, …, j, …, |B|; the node at row A[i] and column B[j] is marked]
An observation
點[i, j] = the maximum of
– 點[i-1, j-1] + 分[A[i], B[j]]
– 點[i-1, j] + 分[A[i], -]
– 點[i, j-1] + 分[-, B[j]]
[Figure: node 點[i, j] with incoming edges from 點[i-1, j-1], 點[i-1, j], and 點[i, j-1], labeled 分[A[i], B[j]], 分[A[i], -], and 分[-, B[j]]]
For example
         c   c   t   t   a   g   t   c
     0  -1  -2  -3  -4  -5  -6  -7  -8
a   -1  -2  -3  -4  -5  -2  -3  -4  -5
t   -2  -3  -4  -1  -2  -3  -4  -1  -2
t   -3  -4  -5  -2   1   0  -1  -2  -3
g   -4  -5  -6  -3   0  -1   2   1   0
a   -5  -6  -7  -4  -1   2   1   0  -1

分   a   t   g   c   -
a    2  -2  -2  -2  -1
t   -2   2  -2  -2  -1
g   -2  -2   2  -2  -1
c   -2  -2  -2   2  -1
-   -1  -1  -1  -1  -1
The DP-version.
int align(m, n) {
    let C[0, 0] = 0;
    for i = 1 to m
        let C[i, 0] = C[i - 1, 0] + Score[A[i], -];
    for j = 1 to n
        let C[0, j] = C[0, j - 1] + Score[-, B[j]];
    for i = 1 to m
        for j = 1 to n {
            let x = C[i - 1, j - 1] + Score[A[i], B[j]];
            let y = C[i - 1, j] + Score[A[i], -];
            let z = C[i, j - 1] + Score[-, B[j]];
            let C[i, j] = max(x, y, z);
        }
    return C[m, n];
}
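The pseudocode above can be sketched in Python as follows. The names `pair_score` and `align` are hypothetical; `pair_score` assumes the +2 / -2 / -1 scoring matrix used in the running example, and strings are 0-indexed, so A[i] in the pseudocode becomes `a[i - 1]`.

```python
def pair_score(x, y):
    """Match +2, mismatch -2, any column containing a gap -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def align(a, b, score=pair_score):
    """Return the full table C, where C[i][j] is the best score for a[:i], b[:j]."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] + score(a[i - 1], "-")
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] + score("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = max(
                c[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                c[i - 1][j] + score(a[i - 1], "-"),           # a[i-1] against a gap
                c[i][j - 1] + score("-", b[j - 1]),           # b[j-1] against a gap
            )
    return c


table = align("attga", "ccttagtc")
for row in table:
    print(row)
```

Run on the prefixes "attga" and "ccttagtc", the rows printed should match the example table on the earlier slide, with the final entry -1.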
Complexity
Space = O(|A|×|B|).
– Each node keeps a score and a pointer, and thus requires only O(1) space.
Time = O(|A|×|B|).
– The content of each node can be obtained from those of at most three nodes in O(1) time.
Question
So far we have only computed the best score. If we want to know an alignment that achieves this best score, what should we do? Does it require extra space?
For example
[Figure: the same score table and scoring matrix 分 as in the previous example, repeated for tracing the path back]
Look back along the path you came by.
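One way to look back along the path is sketched below in Python. It refills the score table and then walks backwards from the last node, re-deriving at each step which of the three edges produced the maximum, so no pointers are stored beyond the table itself. The names `pair_score` and `best_alignment` are hypothetical, and ties are broken in favor of the diagonal edge, which is an arbitrary choice of this sketch.

```python
def pair_score(x, y):
    """Match +2, mismatch -2, any column containing a gap -1."""
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -2


def best_alignment(a, b, score=pair_score):
    """Return (best score, aligned row for a, aligned row for b)."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] + score(a[i - 1], "-")
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] + score("-", b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            c[i][j] = max(
                c[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                c[i - 1][j] + score(a[i - 1], "-"),
                c[i][j - 1] + score("-", b[j - 1]),
            )
    # Trace back from (m, n) to (0, 0), collecting columns in reverse order.
    row_a, row_b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and c[i][j] == c[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            row_a.append(a[i - 1]); row_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and c[i][j] == c[i - 1][j] + score(a[i - 1], "-"):
            row_a.append(a[i - 1]); row_b.append("-"); i -= 1
        else:
            row_a.append("-"); row_b.append(b[j - 1]); j -= 1
    return c[m][n], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = best_alignment("attga", "ccttagtc")
print(best)
print(top)
print(bottom)
```

The traceback visits at most |A| + |B| nodes, so it adds only linear time on top of filling the table.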
Application 1
Longest common subsequence
Subsequence
For any indices 1 ≤ i1 < i2 < … < ik ≤ |A|, A[i1] A[i2] A[i3] … A[ik] is a subsequence of A.
For example, A = 0 1 1 0 1 0 1
– 0 1 1 1, 0 0 0, and 1 0 1 0 1 are subsequences of A.
– 0 1 0 1 1 0 is not a subsequence of A.
Longest Common Subsequence
Input: two strings A and B
Output: a longest string C that is a subsequence of both A and B.
Any naïve algorithm?
It’s an alignment problem…
…with respect to the following scoring matrix:
分   a   t   g   c   -
a    1   0   0   0   0
t    0   1   0   0   0
g    0   0   1   0   0
c    0   0   0   1   0
-    0   0   0   0   0
Why?
Each alignment with score k corresponds to a common subsequence of length k.
0 1 1 - 1 0 - - 0 1 1 -
- 1 0 1 1 0 1 0 - 1 1 0
Common subsequence from the matching columns: 1 1 0 1 1
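As a sketch of this reduction, the Python function below runs the alignment recurrence with the LCS scoring matrix (1 for a matching column, 0 for everything else) and returns the best score, which is the LCS length. The name `lcs_length` is hypothetical.

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, via the alignment recurrence."""
    m, n = len(a), len(b)
    # Row 0 and column 0 stay 0 because every gap column scores 0.
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            match = 1 if a[i - 1] == b[j - 1] else 0
            c[i][j] = max(c[i - 1][j - 1] + match, c[i - 1][j], c[i][j - 1])
    return c[m][n]


print(lcs_length("01110011", "1011010110"))  # 7
```

The alignment drawn on the slide has score 5, so it illustrates the correspondence but is not a maximum-score alignment; for those two strings a common subsequence of length 7 (0 1 1 0 0 1 1) exists.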
Application 2
Edit distance between two strings
Edit operations
Inserting a character at position i
Deleting a character at position i
Replacing a character at position i by a new character
Edit distance
The edit distance between two strings A and B is the minimum number of edit operations required to turn A into B.
The edit distance problem
Input: two strings A and B
Output: the edit distance of A and B.
Any naïve algorithm?
It’s an alignment problem…
…with respect to the following scoring matrix:
分   a   t   g   c   -
a    0  -1  -1  -1  -1
t   -1   0  -1  -1  -1
g   -1  -1   0  -1  -1
c   -1  -1  -1   0  -1
-   -1  -1  -1  -1  -1
Why?
Each alignment with score -k corresponds to a sequence of k edit operations that turns A into B.
A:      0  1  1  -  1  0  -  -  0  1  1  -
B:      -  1  0  1  1  0  1  0  -  1  1  0
score: -1    -1 -1       -1 -1 -1       -1
op:     d     r  i        i  i  d        i
(d = delete, r = replace, i = insert)
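A minimal Python sketch of this reduction: run the alignment recurrence with the edit-distance scoring matrix (0 for a match, -1 for every other column) and negate the best score. The name `edit_distance` is hypothetical.

```python
def edit_distance(a, b):
    """Edit distance of a and b, computed as minus the best alignment score."""
    m, n = len(a), len(b)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        c[i][0] = c[i - 1][0] - 1  # delete a[i-1]
    for j in range(1, n + 1):
        c[0][j] = c[0][j - 1] - 1  # insert b[j-1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            replace = 0 if a[i - 1] == b[j - 1] else -1
            c[i][j] = max(
                c[i - 1][j - 1] + replace,
                c[i - 1][j] - 1,
                c[i][j - 1] - 1,
            )
    return -c[m][n]


print(edit_distance("01110011", "1011010110"))  # 4
```

The alignment drawn on the slide uses 7 operations, so it is a valid edit sequence but not a shortest one for those two strings.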
Matrix multiplication
Multiplying two matrices
Let A be a $p \times q$ matrix. Let B be a $q \times r$ matrix. Then the product $A \times B$ of A and B is the $p \times r$ matrix M such that
$$M[i, j] = \sum_{1 \le k \le q} A[i, k] \cdot B[k, j]$$
holds for all indices i and j with $1 \le i \le p$ and $1 \le j \le r$.
It takes $O(pqr)$ time to obtain $A \times B$ from A and B. (Why?)
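A direct Python sketch of the definition hints at the answer to "Why?": there are p·r entries, and each one is a sum of q products. The function name `multiply` and the returned multiplication counter are additions made for this illustration; matrices are assumed to be non-empty lists of rows.

```python
def multiply(a, b):
    """Return (A x B, number of scalar multiplications) for list-of-rows matrices."""
    p, q, r = len(a), len(b), len(b[0])
    assert all(len(row) == q for row in a), "inner dimensions must agree"
    m = [[0] * r for _ in range(p)]
    multiplications = 0
    for i in range(p):
        for j in range(r):
            for k in range(q):
                m[i][j] += a[i][k] * b[k][j]
                multiplications += 1
    return m, multiplications


product, count = multiply([[1, 2, 3], [4, 5, 6]], [[1, 0], [0, 1], [1, 1]])
print(product, count)  # [[4, 5], [10, 11]] 12
```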
Illustration
[Figures: three slides illustrating a matrix product]
Multiplying 3 matrices
$$A \times B \times C = (A \times B) \times C = A \times (B \times C).$$
The time required to obtain $A \times B \times C$ could be affected by which two matrices are multiplied first.
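An example
Computing $(A \times B) \times C$, where A is $n \times 1$, B is $1 \times n$, and C is $n \times n$:
– $A \times B$ is an $n \times n$ matrix, obtained in $\Theta(n^2)$ time.
– $(A \times B) \times C$ multiplies two $n \times n$ matrices, which takes $\Theta(n^3)$ time.
The overall time is
$$\Theta(n^2) + \Theta(n^3) = \Theta(n^3).$$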
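An example
Computing $A \times (B \times C)$, where A is $n \times 1$, B is $1 \times n$, and C is $n \times n$:
– $B \times C$ is a $1 \times n$ matrix, obtained in $\Theta(n^2)$ time.
– $A \times (B \times C)$ multiplies an $n \times 1$ matrix by a $1 \times n$ matrix, which takes $\Theta(n^2)$ time.
The overall time is
$$\Theta(n^2) + \Theta(n^2) = \Theta(n^2).$$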
The problem
Input: A sequence $\ell_0, \ell_1, \ldots, \ell_n$ of positive integers, where
– $\ell_{i-1}$ is the number of rows of matrix $M_i$, and
– $\ell_i$ is the number of columns of matrix $M_i$.
Output: An order of performing those $n - 1$ matrix multiplications in the minimum number of operations to obtain the product
$$M_1 \times M_2 \times \cdots \times M_n.$$
Illustration
[Figure: an illustration labeled with the indices 0 through 6]
Any naïve algorithm?
C(i,j)
Let $C(i, j)$ be the minimum number of operations required to obtain the product
$$M_i \times M_{i+1} \times \cdots \times M_j.$$
Clearly, the minimum number of operations required to obtain the product of all n matrices is exactly $C(1, n)$.
Recurrence
$$C(i, j) = \begin{cases} 0 & \text{if } i \ge j \\ \min_{i \le k < j} \bigl( C(i, k) + C(k+1, j) + \ell_{i-1} \ell_k \ell_j \bigr) & \text{otherwise.} \end{cases}$$
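The recurrence can be evaluated bottom-up, shortest chains first. The Python sketch below is one way to do it; the name `matrix_chain_cost` and the list `dims` (holding the dimension sequence, so that matrix M_i is `dims[i - 1]` by `dims[i]`) are assumptions of this sketch.

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications to compute M_1 x ... x M_n.

    dims has n + 1 entries; M_i is a dims[i - 1] x dims[i] matrix. Assumes n >= 1.
    """
    n = len(dims) - 1
    # c[i][j] for 1 <= i <= j <= n; the diagonal c[i][i] stays 0.
    c = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # number of matrices in the chain
        for i in range(1, n - length + 2):
            j = i + length - 1
            c[i][j] = min(
                c[i][k] + c[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                for k in range(i, j)        # split point: (M_i..M_k)(M_k+1..M_j)
            )
    return c[1][n]


# The three-matrix example with n = 10: shapes 10x1, 1x10, 10x10.
print(matrix_chain_cost([10, 1, 10, 10]))  # 200
```

For the three-matrix example, the two orders cost 10·1·10 + 10·10·10 = 1100 and 1·10·10 + 10·1·10 = 200 multiplications, and the sketch returns the smaller one.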
Dynamic programming
C   1  2  3  4  5  6  …  n
1   0
2      0
3         0
4            0
5               0
6                  0
:                     0
n                        0
How do we find the order in which to multiply the matrices?
Complexity
Time complexity: $O(n^3)$.
Space complexity: $O(n^2)$.
– Can this be reduced to $o(n^2)$?
Have a wonderful weekend!!
See you next time