Fundamentals of Algorithm Analysis
Algorithm: Design & Analysis [Tutorial 1]
Qian Zhuzhong (钱柱中)
Research: Distributed Computing, Pervasive Computing, Service Oriented Computing
Office: 503A, MMW
Email: [email protected]
In Previous Classes…
- Introduction to Algorithm Analysis
- Asymptotic Behavior of Functions
- Recursion and Master Theorem
- Sorting
In Tutorial One
- About the Tutorial
- Algorithm Analysis
- Revisiting Asymptotic Behavior
- Revisiting Recursion
About the Tutorial
- The course
  - Algorithm design and analysis
  - Coverage
- The tutorial: reemphasize important issues by
  - Further explanation
  - Typical examples
  - Interaction
  - …
- Find efficient solutions
Algorithm Analysis
- Before learning algorithm analysis
  - Learn to solve problems by “computational thinking”*
  - Data structures
- After learning algorithm analysis
  - How efficient is my first solution? How to improve it?
  - Better solutions
  - The optimal solution
* http://cs.nju.edu.cn/yuhuang/huangyufiles/alg/computational_thinking.pdf
[Figure: solve problems → naïve solution → better solutions → … → optimal solution, along an axis of efficiency]
Asymptotic Behavior
- Discussion of asymptotic notations
- Some properties
- An example: Maximum Subsequence Sum
  - Improvement of Algorithm
  - Comparison of Asymptotic Behavior
The definition of O, Ω and Θ
- O: Given g: N→R+, O(g) is the set of f: N→R+ such that for some c∈R+ and some n0∈N, f(n) ≤ c·g(n) for all n ≥ n0.
  - A function f∈O(g) if lim n→∞ [f(n)/g(n)] = c < ∞
- Ω: Given g: N→R+, Ω(g) is the set of f: N→R+ such that for some c∈R+ and some n0∈N, f(n) ≥ c·g(n) for all n ≥ n0.
  - A function f∈Ω(g) if lim n→∞ [f(n)/g(n)] = c > 0
- Θ: Given g: N→R+, Θ(g) = O(g) ∩ Ω(g).
  - A function f∈Θ(g) if lim n→∞ [f(n)/g(n)] = c, 0 < c < ∞
“Little Oh” and ω
- o: Given g: N→R+, o(g) is the set of f: N→R+ such that for any c∈R+ and some n0∈N, f(n) < c·g(n) for all n ≥ n0.
  - A function f∈o(g) if lim n→∞ [f(n)/g(n)] = c = 0
- ω: Given g: N→R+, ω(g) is the set of f: N→R+ such that for any c∈R+ and some n0∈N, f(n) > c·g(n) for all n ≥ n0.
  - A function f∈ω(g) if lim n→∞ [f(n)/g(n)] = c = ∞
Analogy
- f∈O(g) ≈ a ≤ b
- f∈Ω(g) ≈ a ≥ b
- f∈Θ(g) ≈ a = b
- f∈o(g) ≈ a < b
- f∈ω(g) ≈ a > b
Properties of O (o), Ω (ω) and Θ
- Transitive property (O, Ω, Θ, ω, o):
  - If f∈O(g) and g∈O(h), then f∈O(h), …
- Reflexive property:
  - f(n)∈Θ(f(n)), f(n)∈O(f(n)), f(n)∈Ω(f(n))
- Symmetric properties:
  - f∈Θ(g) if and only if g∈Θ(f)
  - f∈O(g) if and only if g∈Ω(f)
  - f∈o(g) if and only if g∈ω(f)
- Order of sum function:
  - O(f+g) = O(max(f, g))
Maximum Subsequence Sum
The problem: Given a sequence S of integers, find the largest sum of a consecutive subsequence of S. (The result is 0 if all items are negative.)
An example: -2, 11, -4, 13, -5, -2; the result 20: (11, -4, 13)
A brute-force algorithm:
    MaxSum = 0;
    for (i = 0; i < N; i++)
        for (j = i; j < N; j++) {
            ThisSum = 0;
            for (k = i; k <= j; k++)
                ThisSum += A[k];
            if (ThisSum > MaxSum)
                MaxSum = ThisSum;
        }
    return MaxSum;
[Figure: the sequence, with start index i = 0, 1, 2, …, n-1, end index j = 0, 1, 2, …, n-1, and scanning index k]
The algorithm runs in O(n^3).
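More Precise Complexity
The total cost is:
$$
\begin{aligned}
\sum_{i=0}^{n-1}\sum_{j=i}^{n-1}\sum_{k=i}^{j} 1
  &= \sum_{i=0}^{n-1}\sum_{j=i}^{n-1} (j-i+1) \\
  &= \sum_{i=0}^{n-1} \bigl(1 + 2 + \cdots + (n-i)\bigr) \\
  &= \sum_{i=0}^{n-1} \frac{(n-i+1)(n-i)}{2} \\
  &= \sum_{i=1}^{n} \frac{(n-i+1)(n-i+2)}{2} \\
  &= \frac{1}{2}\sum_{i=1}^{n} i^2 - \left(n+\frac{3}{2}\right)\sum_{i=1}^{n} i + \frac{1}{2}\,(n^2+3n+2)\sum_{i=1}^{n} 1 \\
  &= \frac{n^3}{6} + \frac{n^2}{2} + \frac{n}{3}
\end{aligned}
$$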
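The count n^3/6 + n^2/2 + n/3 for the brute-force triple loop can be checked empirically. The sketch below is only an illustration: the function names `brute_force_steps` and `closed_form` are hypothetical and do not come from the slides. It counts how often the innermost statement runs and evaluates the closed form with integer arithmetic.

```c
/* Count how many times the innermost statement of the brute-force
   algorithm executes for an input of length n. The contents of the
   sequence do not influence this count, so no array is needed. */
long long brute_force_steps(int n)
{
    long long steps = 0;
    for (int i = 0; i < n; i++)
        for (int j = i; j < n; j++)
            for (int k = i; k <= j; k++)
                steps++;
    return steps;
}

/* Closed form n^3/6 + n^2/2 + n/3, written as (n^3 + 3n^2 + 2n) / 6
   so that the division is exact in integer arithmetic. */
long long closed_form(int n)
{
    long long m = n;
    return (m * m * m + 3 * m * m + 2 * m) / 6;
}
```

For n = 10 both functions should return 220; comparing them for a range of n is a quick way to catch an algebra slip in the derivation.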
Decreasing the number of loops
An improved algorithm:
    MaxSum = 0;
    for (i = 0; i < N; i++) {
        ThisSum = 0;
        for (j = i; j < N; j++) {
            ThisSum += A[j];
            if (ThisSum > MaxSum)
                MaxSum = ThisSum;
        }
    }
    return MaxSum;
[Figure: the sequence, with start index i = 0, 1, 2, …, n-1 and scanning index j]
The algorithm runs in O(n^2).
Power of Divide-and-Conquer
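[Figure: the sequence split into Part 1 and Part 2]
The subsequence with the largest sum may be:
- entirely in Part 1 (found by recursion),
- entirely in Part 2 (found by recursion), or
- across the boundary between Part 1 and Part 2.
The largest of the three is the result.
The algorithm runs in O(n log n).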
Divide-and-Conquer: the Procedure
    Center = (Left + Right) / 2;
    MaxLeftSum = MaxSubSum(A, Left, Center);
    MaxRightSum = MaxSubSum(A, Center + 1, Right);

    MaxLeftBorderSum = 0; LeftBorderSum = 0;
    for (i = Center; i >= Left; i--) {
        LeftBorderSum += A[i];
        if (LeftBorderSum > MaxLeftBorderSum)
            MaxLeftBorderSum = LeftBorderSum;
    }

    MaxRightBorderSum = 0; RightBorderSum = 0;
    for (i = Center + 1; i <= Right; i++) {
        RightBorderSum += A[i];
        if (RightBorderSum > MaxRightBorderSum)
            MaxRightBorderSum = RightBorderSum;
    }

    return Max3(MaxLeftSum, MaxRightSum, MaxLeftBorderSum + MaxRightBorderSum);
Note: this is the core part of the procedure; the base case and the wrapper are omitted.
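One possible way to fill in the omitted pieces is sketched below. The base case (a single item, with a negative item counted as the empty sum 0) and the wrapper `max_subsequence_sum` are assumptions made for this illustration, not the slides' own code; the identifiers are renamed to lower case but the core follows the procedure above.

```c
/* Largest of three integers. */
static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Maximum subsequence sum of a[left..right], bounds inclusive. */
static int max_sub_sum(const int a[], int left, int right)
{
    /* Assumed base case: one item; a negative item contributes 0. */
    if (left == right)
        return a[left] > 0 ? a[left] : 0;

    int center = left + (right - left) / 2;
    int max_left = max_sub_sum(a, left, center);
    int max_right = max_sub_sum(a, center + 1, right);

    /* Best sum that ends at center, scanning leftwards. */
    int max_left_border = 0, left_border = 0;
    for (int i = center; i >= left; i--) {
        left_border += a[i];
        if (left_border > max_left_border)
            max_left_border = left_border;
    }

    /* Best sum that starts at center + 1, scanning rightwards. */
    int max_right_border = 0, right_border = 0;
    for (int i = center + 1; i <= right; i++) {
        right_border += a[i];
        if (right_border > max_right_border)
            max_right_border = right_border;
    }

    return max3(max_left, max_right, max_left_border + max_right_border);
}

/* Assumed wrapper: an empty sequence has maximum sum 0. */
int max_subsequence_sum(const int a[], int n)
{
    return n > 0 ? max_sub_sum(a, 0, n - 1) : 0;
}
```

Called on the example sequence -2, 11, -4, 13, -5, -2 with n = 6, the wrapper should return 20, the sum of (11, -4, 13).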
A Linear Algorithm
    ThisSum = MaxSum = 0;
    for (j = 0; j < N; j++) {
        ThisSum += A[j];
        if (ThisSum > MaxSum)
            MaxSum = ThisSum;
        else if (ThisSum < 0)
            ThisSum = 0;
    }
    return MaxSum;
[Figure: the sequence, scanned once by the index j]
- A negative item, or a subsequence with a negative sum, cannot be a prefix of the subsequence we want.
- This is an example of an “online algorithm”.
The algorithm runs in O(n).
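Trace on the sequence -2, -1, 4, 6, -8, -5, 2, 3, -1, -2, 9 (initially ThisSum = MaxSum = 0):

A[j]    ThisSum     MaxSum
 -2     -2 → 0        0
 -1     -1 → 0        0
  4      4            4
  6     10           10
 -8      2           10
 -5     -3 → 0       10
  2      2           10
  3      5           10
 -1      4           10
 -2      2           10
  9     11           11

Result: 11, from the subsequence (2, 3, -1, -2, 9).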
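To recover which subsequence attains the maximum, the linear scan can carry two extra indices. This is a minimal sketch under that assumption; the struct `max_sub_result` and the function `max_sub_linear` are hypothetical names introduced here, not part of the slides.

```c
/* Best sum found and the inclusive index range that attains it. */
struct max_sub_result {
    int sum;    /* 0 if every item is negative */
    int start;  /* -1 when the best subsequence is empty */
    int end;    /* -1 when the best subsequence is empty */
};

struct max_sub_result max_sub_linear(const int a[], int n)
{
    struct max_sub_result best = {0, -1, -1};
    int this_sum = 0;
    int this_start = 0;  /* where the current candidate begins */

    for (int j = 0; j < n; j++) {
        this_sum += a[j];
        if (this_sum > best.sum) {
            best.sum = this_sum;
            best.start = this_start;
            best.end = j;
        } else if (this_sum < 0) {
            /* A negative prefix cannot help: restart after position j. */
            this_sum = 0;
            this_start = j + 1;
        }
    }
    return best;
}
```

On the sequence -2, -1, 4, 6, -8, -5, 2, 3, -1, -2, 9 this should report sum 11 with start 6 and end 10 (0-based), that is, the items 2, 3, -1, -2, 9.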
Recursion
- Problem solving
  - Divide and conquer
  - Recurrence equation
  - Solve the recurrence
    - Characteristic Equation
    - Master Theorem
- How do we obtain the results?
  - Rationale behind the detailed mathematical proof
External Path Length
The external path length of a 2-tree t is defined as follows:
- The external path length of a leaf, which is a 2-tree consisting of a single external node, is 0.
- If t is a nonleaf 2-tree, with left subtree L and right subtree R, then the external path length of t is the sum of:
  - the external path length of L;
  - the number of external nodes of L;
  - the external path length of R;
  - the number of external nodes of R.
In fact, the external path length of t is the sum of the lengths of all the paths from the root of t to the external nodes of t.
[Figure: two 2-trees and a common binary tree. Labels: internal nodes; external nodes (no child); nodes of any type.]
Both the left and right children of the external nodes are empty trees.
Calculating the External Path Length
EplReturn calcEpl(TwoTree t)
    EplReturn ansL, ansR;
    EplReturn ans = new EplReturn();
    1. if (t is a leaf)
    2.     ans.epl = 0; ans.extNum = 1;
    3. else
    4.     ansL = calcEpl(leftSubtree(t));
    5.     ansR = calcEpl(rightSubtree(t));
    6.     ans.epl = ansL.epl + ansR.epl + ansL.extNum + ansR.extNum;
    7.     ans.extNum = ansL.extNum + ansR.extNum;
    8. return ans;
- TwoTree is an ADT defined for 2-trees.
- EplReturn is an organizer class with two fields, epl and extNum.
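The procedure calcEpl can be sketched in C as follows. The node type `struct two_tree` is a hypothetical stand-in for the TwoTree ADT (a node is a leaf when both child pointers are NULL), and `struct epl_return` mirrors EplReturn; neither definition comes from the slides.

```c
#include <stddef.h>

/* Hypothetical 2-tree node: either two children or none. */
struct two_tree {
    struct two_tree *left;
    struct two_tree *right;
};

/* Mirrors EplReturn: the two values are computed together. */
struct epl_return {
    int epl;      /* external path length */
    int ext_num;  /* number of external nodes */
};

struct epl_return calc_epl(const struct two_tree *t)
{
    struct epl_return ans;
    if (t->left == NULL && t->right == NULL) {  /* t is a leaf */
        ans.epl = 0;
        ans.ext_num = 1;
    } else {
        struct epl_return l = calc_epl(t->left);
        struct epl_return r = calc_epl(t->right);
        /* Seen from t, every external node is one edge deeper. */
        ans.epl = l.epl + r.epl + l.ext_num + r.ext_num;
        ans.ext_num = l.ext_num + r.ext_num;
    }
    return ans;
}
```

For a root whose left child is an internal node with two leaves and whose right child is a leaf, the external nodes sit at depths 2, 2 and 1, so the call should report epl = 5 and ext_num = 3.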
Correctness of Procedure calcEpl
Let t be any 2-tree. Let epl and m be the values of the fields epl and extNum, respectively, as returned by calcEpl(t). Then:
1. epl is the external path length of t.
2. m is the number of external nodes in t.
3. epl ≥ m·lg(m) (note: for a 2-tree with n internal nodes, m = n+1).
Proof on Procedure calcEpl
Induction on t, with the “subtree” partial order:
- Base case: t is a leaf (line 2).
- Inductive hypothesis: the 3 statements hold for any proper subtree of t, say s.
- Inductive case: by the inductive hypothesis, eplL, eplR, mL and mR are the expected results for L and R (both are proper subtrees of t), so:
  - Statement 1 is guaranteed by line 6.
  - Statement 2 is guaranteed by line 7 (any external node is in either L or R).
  - Statement 3: by the inductive hypothesis, epl = eplL + eplR + m ≥ mL·lg(mL) + mR·lg(mR) + m. Note that f(x) + f(y) ≥ 2f((x+y)/2) if f is convex, and x·lg(x) is convex for x > 0, so
    epl ≥ 2((mL+mR)/2)·lg((mL+mR)/2) + m = m(lg(m) − 1) + m = m·lg(m).
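Characteristic Equation
For the recurrence relation
$$a_n = r_1 a_{n-1} + r_2 a_{n-2}$$
the characteristic equation is
$$x^2 - r_1 x - r_2 = 0.$$
If the characteristic equation of the recurrence relation has two distinct roots $s_1$ and $s_2$, then
$$a_n = u s_1^n + v s_2^n$$
is the explicit formula for the sequence, where $u$ and $v$ depend on the initial conditions:
$$a_1 = u s_1 + v s_2 \quad\text{and}\quad a_2 = u s_1^2 + v s_2^2.$$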
Number of Valid Strings
- String to be transmitted on the channel
  - Length n
  - Consisting of symbols ‘a’, ‘b’, ‘c’
  - If “aa” exists, the string cannot be transmitted
  - E.g. valid strings of length 2: ‘ab’, ‘ac’, ‘ba’, ‘bb’, ‘bc’, ‘ca’, ‘cc’, ‘cb’
- Number of valid strings?
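Divide and conquer
- A valid string of length n starts with ‘b’ or ‘c’ followed by a valid string of length n-1, or with ‘a’ followed by ‘b’ or ‘c’ and then a valid string of length n-2.
- f(n) = 2f(n-1) + 2f(n-2), n > 2
- f(1) = 3, f(2) = 8
[Figure: tree of leading symbols, with branches b, a, c at the first position (remaining length n-1) and b, a, c at the second position (remaining length n-2)]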
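The recurrence f(n) = 2f(n-1) + 2f(n-2) with f(1) = 3 and f(2) = 8 can be cross-checked against exhaustive enumeration for small n. The sketch below assumes n ≥ 1; the function names and the encoding of strings as base-3 numbers (digit 0 standing for ‘a’) are illustrative choices, not taken from the slides.

```c
/* Number of valid strings of length n (n >= 1) by the recurrence
   f(n) = 2 f(n-1) + 2 f(n-2), with f(1) = 3 and f(2) = 8. */
long long valid_by_recurrence(int n)
{
    long long prev = 3, cur = 8;  /* f(1) and f(2) */
    if (n == 1)
        return prev;
    for (int i = 3; i <= n; i++) {
        long long next = 2 * cur + 2 * prev;
        prev = cur;
        cur = next;
    }
    return cur;
}

/* Brute force: enumerate all 3^n strings over {a, b, c}, encoded as
   base-3 numbers with digit 0 standing for 'a', and count those that
   never contain two adjacent 'a' symbols. */
long long valid_by_enumeration(int n)
{
    long long total = 1, count = 0;
    for (int i = 0; i < n; i++)
        total *= 3;
    for (long long code = 0; code < total; code++) {
        long long rest = code;
        int prev_is_a = 0, ok = 1;
        for (int i = 0; i < n && ok; i++) {
            int is_a = (rest % 3 == 0);
            if (is_a && prev_is_a)
                ok = 0;           /* found "aa" */
            prev_is_a = is_a;
            rest /= 3;
        }
        count += ok;
    }
    return count;
}
```

Both functions should agree for every small n, for instance 22 for n = 3 and 60 for n = 4. The enumeration takes time exponential in n, so it is only useful as a check.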
Analysis of the D&C solution
Characteristic equation:
$$x^2 - 2x - 2 = 0$$
Solution:
$$f(n) = \frac{2+\sqrt{3}}{2\sqrt{3}}\,(1+\sqrt{3})^n + \frac{-2+\sqrt{3}}{2\sqrt{3}}\,(1-\sqrt{3})^n$$
Recursion Tree for T(n) = bT(n/c) + f(n)
[Figure: recursion tree of depth log_c n in which every nonleaf node has b children]

Level      Nodes          Cost per node   Row-sum
0          1              f(n)            f(n)
1          b              f(n/c)          b·f(n/c)
2          b^2            f(n/c^2)        b^2·f(n/c^2)
…          …              …               …
log_c n    b^(log_c n)    T(1)            b^(log_c n)·T(1)

Note: b^(log_c n) = n^(log_c b)
Total?
Divide-and-Conquer: the Solution
- The recursion tree has depth D = lg(n)/lg(c), so there are about that many row-sums.
- The solution of the divide-and-conquer equation is the total nonrecursive cost of all nodes in the tree, which is the sum of the row-sums.
- The 0th row-sum is f(n), the nonrecursive cost of the root.
- The Dth row-sum is n^E, assuming base cases cost 1, or Θ(n^E) in any event, where E = lg(b)/lg(c).
- The row-sums above the leaf level add up to:
$$\sum_{j=0}^{\log_c n - 1} b^j f\!\left(\frac{n}{c^j}\right)$$
Solution by Row-sums
[Little Master Theorem] Row-sums decide the solution of the equation for divide-and-conquer:
- Increasing geometric series: T(n)∈Θ(n^E)
- Constant: T(n)∈Θ(f(n) log n)
- Decreasing geometric series: T(n)∈Θ(f(n))
This can be generalized to get a result that does not use row-sums explicitly.
Master Theorem
Loosening the restrictions on f(n):
- Case 1: f(n)∈O(n^(E−ε)) for some ε > 0; then T(n)∈Θ(n^E).
- Case 2: f(n)∈Θ(n^E); as all node depths contribute about equally, T(n)∈Θ(f(n) log(n)).
- Case 3: f(n)∈Ω(n^(E+ε)) for some ε > 0, and f(n)∈O(n^(E+δ)) for some δ ≥ ε; then T(n)∈Θ(f(n)).
The positive ε is critical; it also leaves gaps between the cases.
Looking at the Gap
- T(n) = 2T(n/2) + n·lg(n)
  - b = 2, c = 2, E = 1, f(n) = n·lg(n)
- We have f(n)∈Ω(n^E), but no ε > 0 satisfies f(n)∈Ω(n^(E+ε)), since lg(n) grows slower than n^ε for any small positive ε.
- So case 3 doesn’t apply. However, case 2 doesn’t apply either.
- Why is ε important?
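For the recurrence T(n) = 2T(n/2) + n lg n, the row-sums can still be added up directly even though no case of the Master Theorem applies. The derivation below is a sketch that assumes n is a power of 2 and that base cases cost Θ(1).

```latex
\begin{aligned}
\text{row-sum at depth } j &= 2^j \cdot \frac{n}{2^j}\,\lg\frac{n}{2^j} = n\,(\lg n - j) \\
T(n) &= \sum_{j=0}^{\lg n - 1} n\,(\lg n - j) + \Theta(n) \\
     &= n \cdot \frac{\lg n\,(\lg n + 1)}{2} + \Theta(n) \\
     &\in \Theta(n \lg^2 n)
\end{aligned}
```

The row-sums shrink arithmetically rather than geometrically, so the root does not dominate and the result is larger than the Θ(f(n)) = Θ(n lg n) that case 3 would have given.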
Example: Matrix Multiplication
- Standard algorithm (by definition)
  - Run time = Θ(n^3)
- Divide-and-conquer algorithm
  - Idea: view an n×n matrix as a 2×2 matrix of (n/2)×(n/2) sub-matrices
Analysis of the D&C Algorithm
$$T(n) = 8\,T(n/2) + \Theta(n^2)$$
- 8: number of sub-matrix multiplications
- n/2: sub-matrix size
- Θ(n^2): adding sub-matrices
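Applying the Master Theorem to the recurrence T(n) = 8T(n/2) + Θ(n^2) takes only a few lines. The sketch below uses the notation T(n) = bT(n/c) + f(n) with E = lg(b)/lg(c).

```latex
\begin{aligned}
b &= 8,\quad c = 2,\quad E = \frac{\lg 8}{\lg 2} = 3 \\
f(n) &\in \Theta(n^2) \subseteq O(n^{E-\varepsilon}) \quad\text{with } \varepsilon = 1 \\
T(n) &\in \Theta(n^E) = \Theta(n^3) \quad\text{(case 1)}
\end{aligned}
```

With eight recursive products, this divide-and-conquer scheme therefore matches the Θ(n^3) of the standard algorithm rather than improving on it.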
Do you have any questions?