Advanced Compiler Techniques
LIU Xianhua
School of EECS, Peking University
Control Flow Analysis & Local Optimizations
Levels of Optimizations
• Local – inside a basic block
• Global (intraprocedural) – across basic blocks; whole-procedure analysis
• Interprocedural – across procedures; whole-program analysis
The Golden Rules of Optimization: Premature Optimization is Evil
• Donald Knuth: "premature optimization is the root of all evil"
• Optimization can introduce new, subtle bugs
• Optimization usually makes code harder to understand and maintain
• Get your code right first; then, if really needed, optimize it
• Document optimizations carefully
• Keep the non-optimized version handy, or even as a comment in your code
The Golden Rules of Optimization: The 80/20 Rule
• In general, 80% of a program's execution time is spent executing 20% of the code
• 90%/10% for performance-hungry programs
• Spend your time optimizing the important 10–20% of your program
• Optimize the common case, even at the cost of making the uncommon case slower
The Golden Rules of Optimization: Good Algorithms Rule
• The best and most important way of optimizing a program is using good algorithms
  – E.g. O(n log n) rather than O(n²)
• However, we still need lower-level optimization to get more out of our programs
• In addition, asymptotic complexity is not always an appropriate metric of efficiency
  – The hidden constant may be misleading
  – E.g. a linear-time algorithm that runs in 100n + 100 time is slower than a cubic-time algorithm that runs in n³ + 10 time if the problem size is small
General Optimization Techniques
• Strength reduction – use the fastest version of an operation
  – E.g. x >> 2 instead of x / 4, x << 1 instead of x * 2
    (beware: the shift is not equivalent to division for negative signed values)
• Common subexpression elimination – eliminate redundant calculations
  – E.g.
      double x = d * (lim / max) * sx;
      double y = d * (lim / max) * sy;
    becomes
      double depth = d * (lim / max);
      double x = depth * sx;
      double y = depth * sy;
General Optimization Techniques
• Code motion – invariant expressions should be executed only once
  – E.g.
      for (int i = 0; i < x.length; i++)
        x[i] *= Math.PI * Math.cos(y);
    becomes
      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i++)
        x[i] *= picosy;
General Optimization Techniques
• Loop unrolling – the overhead of the loop control code can be reduced by executing more than one iteration in the body of the loop. E.g.
      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i++)
        x[i] *= picosy;
    becomes
      double picosy = Math.PI * Math.cos(y);
      for (int i = 0; i < x.length; i += 2) {
        x[i] *= picosy;
        x[i+1] *= picosy;
      }
    (assuming x.length is even; otherwise a cleanup iteration is needed)
Compiler Optimizations
• Compilers try to generate good code – i.e. fast
• Code improvement is challenging – many problems are NP-hard
• Code improvement may slow down the compilation process
  – In some domains, such as just-in-time compilation, compilation speed is critical
Phases of Compilation
• The first three phases are language-dependent
• The last two are machine-dependent
• The middle two depend on neither the language nor the machine
Control Flow
• Control transfer = branch (taken or fall-through)
• Control flow
  – Branching behavior of an application
  – What sequences of instructions can be executed
• Execution → dynamic control flow
  – Direction of a particular instance of a branch
  – Predict, speculate, squash, etc.
• Compiler → static control flow
  – Not executing the program
  – Input not known, so: what could happen?
• Control flow analysis
  – Determining properties of the program's branch structure
  – Determining instruction execution properties
Basic Blocks
• A basic block is a maximal sequence of consecutive three-address instructions with the following properties:
  – The flow of control can only enter the basic block through the first instruction in the block (no jumps into the middle of the block)
  – Control will leave the block without halting or branching, except possibly at the last instruction in the block
• Basic blocks become the nodes of a flow graph, with edges indicating the order
Examples
1)  i = 1
2)  j = 1
3)  t1 = 10 * i
4)  t2 = t1 + j
5)  t3 = 8 * t2
6)  t4 = t3 - 88
7)  a[t4] = 0.0
8)  j = j + 1
9)  if j <= 10 goto (3)
10) i = i + 1
11) if i <= 10 goto (2)
12) i = 1
13) t5 = i - 1
14) t6 = 88 * t5
15) a[t6] = 1.0
16) i = i + 1
17) if i <= 10 goto (13)

This is the three-address code for:
for i from 1 to 10 do
  for j from 1 to 10 do
    a[i,j] = 0.0
for i from 1 to 10 do
  a[i,i] = 0.0
Identifying Basic Blocks
• Input: a sequence of instructions instr(i)
• Output: a list of basic blocks
• Method:
  – Identify leaders: the first instruction of each basic block
  – Iterate: add subsequent instructions to the basic block until we reach another leader
Identifying Leaders
• Rules for finding leaders in code:
  – The first instruction in the code is a leader
  – Any instruction that is the target of a (conditional or unconditional) jump is a leader
  – Any instruction that immediately follows a (conditional or unconditional) jump is a leader
Basic Block Partition Algorithm
leaders = {1}                                // start of program
for i = 1 to |n|                             // all instructions
  if instr(i) is a branch
    leaders = leaders ∪ targets of instr(i) ∪ {i+1}

worklist = leaders
while worklist not empty
  x = first instruction in worklist
  worklist = worklist – {x}
  block(x) = {x}
  for i = x + 1; i <= |n| && i not in leaders; i++
    block(x) = block(x) ∪ {i}
Basic Block Example
[Figure: the 17-instruction example partitioned into six basic blocks (labeled A–F). The leaders are instructions 1, 2, 3, 10, 12, and 13; each block runs from a leader up to, but not including, the next leader.]
Control-Flow Graphs
• Control-flow graph:
  – Node: an instruction or a sequence of instructions (a basic block)
    • Two instructions i, j are in the same basic block iff execution of i guarantees execution of j
  – Directed edge: potential flow of control
  – Distinguished start and end nodes Entry & Exit
    • First & last instruction in the program
Control-Flow Edges
• Basic blocks = nodes
• Edges: add a directed edge between P and S if:
  • There is a jump/branch from the last statement of P to the first statement of S, or
  • S immediately follows P in program order and P does not end with an unconditional branch (goto/return/call)
• Definition of predecessor and successor:
  • P is a predecessor of S
  • S is a successor of P
Control-Flow Edge Algorithm
Input: block(i), a sequence of basic blocks
Output: CFG where nodes are basic blocks

for i = 1 to the number of blocks
  x = last instruction of block(i)
  if instr(x) is a branch/jump
    for each target y of instr(x)
      create edge (i -> y)
  if instr(x) is not an unconditional branch
    create edge (i -> i+1)
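The edge-construction loop can be sketched the same way; the block encoding (lists of instruction dicts with `branch`, `targets`, and `unconditional` fields) is an illustrative assumption.

```python
# A sketch of the control-flow edge algorithm: look at the last instruction
# of each block, add edges to branch targets, and add a fall-through edge
# unless the block ends in an unconditional branch.

def build_cfg_edges(blocks):
    """Return the set of directed edges (i, j) between block indices."""
    edges = set()
    for i, block in enumerate(blocks):
        last = block[-1]
        if last.get("branch"):
            for target in last.get("targets", ()):
                edges.add((i, target))        # edge to each branch target
        if not last.get("unconditional") and i + 1 < len(blocks):
            edges.add((i, i + 1))             # fall-through edge
    return edges
```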
Dominator
• Defn: Dominator – given a CFG (V, E, Entry, Exit), a node x dominates a node y if every path from the Entry block to y contains x
• In the reverse direction, node x post-dominates node y if every path from y to the Exit has to pass through x
• Some properties of dominators:
  – Reflexivity, transitivity, anti-symmetry
  – If x dominates z and y dominates z, then either x dominates y or y dominates x
• Intuition: given some basic block, which blocks are guaranteed to have executed prior to executing it?
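The dominator sets used on the following slides can be computed with the classic iterative fixed-point algorithm. This is a standard formulation sketched in Python, not code from the slides; it assumes every non-entry node is reachable and has at least one predecessor.

```python
# Iterative dominator computation:
#   dom(entry) = {entry};  dom(n) = {n} ∪ ⋂ dom(p) over predecessors p.
# 'preds' maps each node to its predecessor list.

def compute_dominators(nodes, preds, entry):
    """Return a dict mapping each node to its set of dominators."""
    dom = {n: set(nodes) for n in nodes}   # start from the full node set
    dom[entry] = {entry}
    changed = True
    while changed:                          # iterate to a fixed point
        changed = False
        for n in nodes:
            if n == entry:
                continue
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom
```

On a five-node graph shaped like the slides' example (1→2, 2→3, 1→5, and 3 and 5 both feeding 4), this produces the sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}.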
Dominator Tree
• A block x immediately dominates a block y if x dominates y and there is no intervening block P such that x dominates P and P dominates y. In other words, x is the last dominator on all paths from the entry to y. Each block (other than the entry) has a unique immediate dominator.
• A dominator tree is a tree in which each node's children are the nodes it immediately dominates. Because the immediate dominator is unique, this forms a tree; the start node is its root.
[Figure: a five-node CFG (nodes 1–5) and its dominator tree, annotated with the dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}.]
Loops
• Loops come from: while, do-while, for, goto, …
• Many transformations depend on loops
• Back edge: an edge is a back edge if its head dominates its tail
• Loop definition: a set of nodes L in a CFG is a loop if
  1. There is a node called the loop entry: no other node in L has a predecessor outside L
  2. Every node in L has a nonempty path (within L) to the entry of L
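With dominator sets in hand, back edges and the loops they induce can be found by the standard natural-loop construction, sketched here in Python; the graph encoding is illustrative.

```python
# An edge t -> h is a back edge when h dominates t. The natural loop of
# t -> h is h plus every node that can reach t without passing through h,
# found by a backward search that stops at the header.

def find_back_edges(edges, dom):
    """Return the edges (t, h) whose head h dominates their tail t."""
    return [(t, h) for (t, h) in edges if h in dom[t]]

def natural_loop(t, h, preds):
    """Collect the natural loop of back edge t -> h by backward search."""
    loop = {h, t}
    stack = [t]
    while stack:
        n = stack.pop()
        for p in preds.get(n, ()):
            if p not in loop:       # the header h blocks further search
                loop.add(p)
                stack.append(p)
    return loop
```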
Example: Back Edges
[Figure: the same five-node CFG with dominator sets {1}, {1,2}, {1,2,3}, {1,4}, {1,5}; the edges whose heads dominate their tails are marked as back edges.]
[Figure: side-by-side comparison of a DAG (directed acyclic graph) and a CFG (control flow graph).]
Loop Examples
• {B3}
• {B6}
• {B2, B3, B4}
Identifying Loops
• Motivation: the majority of runtime is spent in loops, so focus optimization on loop bodies!
  • Removing redundant code and replacing expensive operations speeds up the program
• Finding loops is easy…
    for i = 1 to 1000
      for j = 1 to 1000
        for k = 1 to 1000
          do something
  …or harder (GOTOs):
    1 i = 1; j = 1; k = 1;
    2 A1: if i > 1000 goto L1;
    3 A2: if j > 1000 goto L2;
    4 A3: if k > 1000 goto L3;
    5     do something
    6     k = k + 1; goto A3;
    7 L3: j = j + 1; goto A2;
    8 L2: i = i + 1; goto A1;
    9 L1: halt
Interval Analysis (T1/T2 Transformations)
• T1 transformation: remove a self-loop, i.e. an edge from a node to itself
• T2 transformation: merge a node into its unique predecessor
[Figures: the five-node example CFG reduced step by step by T2 and T1 transformations until it collapses to a single node, showing that the graph is reducible.]
Structure Analysis

Static features and descriptions:
1. SS_No. – unique identifier of a typical substructure
2. Edge_No. – unique identifier of a control-flow edge within the typical substructure
3. I_last_of_head – opcode of the last instruction in the edge's head basic block
4. Br_direction – branch direction of the last instruction in the edge's head basic block
5. I_pre_last – opcode of the instruction preceding the last instruction in the edge's head basic block
Weighted CFG
• Profiling – run the application on one or more sample inputs and record some behavior:
  – Control flow profiling: edge profile, block profile
  – Path profiling
  – Cache profiling
  – Memory dependence profiling
• Annotate the control flow profile onto a CFG → weighted CFG
• Optimize more effectively with profile info!
  – Optimize for the common case
  – Make educated guesses
[Figure: a CFG from Entry through BB1–BB7 to Exit, with edge weights annotated from an edge profile.]
Local Optimization
• Optimization of basic blocks
• Dragon §8.5
Transformations on basic blocks
• Eliminating local common subexpressions
• Eliminating dead code
• Reordering statements that do not depend on one another
• Applying algebraic laws to reorder operands of three-address instructions
• All of the above require symbolic execution of the basic block to obtain def/use information
Simple Symbolic Interpretation: Next-Use Information
• If x is computed in statement i, and is an operand of statement j, j > i, its value must be preserved (in a register or in memory) until j
• If x is recomputed at some statement k, k > i, with no intervening use, the value computed at i has no further use and can be discarded (i.e. its register reused)
• Next-use information is annotated over the statements and the symbol table
• It is computed in one backward pass over the statements
Next-Use Information
• Definition: statement j uses the value of x computed at statement i if
  1. Statement i assigns a value to x;
  2. Statement j has x as an operand; and
  3. Control can flow from i to j along a path with no intervening assignments to x.
• In this case we also say that x is live at statement i.
Computing next-use
• Use the symbol table to annotate the status of variables
• Each operand in a statement carries additional information:
  – Operand liveness (boolean)
  – Operand next use (a later statement)
• On exit from the block, all temporaries are dead (no next use)
Algorithm
• INPUT: a basic block B
• OUTPUT: at each statement i: x = y op z in B, liveness and next-use information for x, y, z
• METHOD: for each statement in B, in backward order:
  1. Retrieve the liveness & next-use info of x, y, z from a table
  2. Set x to "not live" and "no next use"
  3. Set y and z to "live" and their next uses to i
• Note: steps 2 & 3 cannot be interchanged, e.g. for x = x + y
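A minimal Python sketch of this backward pass; the statement encoding (triples of target and two operands, `None` for an absent operand, 0-based statement indices) and the table layout are assumptions for illustration.

```python
# Backward next-use pass: walk the block from last statement to first,
# recording each statement's pre-update info, then killing the target
# (step 2) and marking the operands as used here (step 3).

def next_use(block, exit_live):
    """Annotate each statement (by index) with liveness/next-use info."""
    table = dict(exit_live)            # var -> (live?, next-use index or None)
    annotations = {}
    for i in range(len(block) - 1, -1, -1):
        dst, op1, op2 = block[i]
        annotations[i] = {v: table.get(v, (False, None))
                          for v in (dst, op1, op2) if v is not None}
        table[dst] = (False, None)     # step 2: the target is killed here
        for src in (op1, op2):         # step 3: operands are used here
            if src is not None:
                table[src] = (True, i)
    return annotations
```

Running it on the example block from the next slide (with x live on exit) reproduces the slide's annotations, modulo 0-based indexing.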
Example
1. x = 1
2. y = 1
3. x = x + y
4. z = y
5. x = y + z

Exit: x: live (next use 6); y: not live; z: not live

Backward pass, annotations at each statement:
5: x: not live; y: live, 5; z: live, 5
4: x: not live; y: live, 4; z: not live
3: x: live, 3; y: live, 3; z: not live
2: x: live, 3; y: not live; z: not live
1: x: not live; y: not live; z: not live
Computing Dependencies in a BB: the DAG
• Use a directed acyclic graph (DAG) to recognize common subexpressions and remove redundant quadruples
• Intermediate code optimization: basic block => DAG => improved block => assembly
• Leaves are labeled with identifiers and constants
• Internal nodes are labeled with operators and identifiers
DAG Representation of Basic Blocks
Constructing a DAG for a basic block:
1. There is a node in the DAG for each of the initial values of the variables appearing in the basic block.
2. There is a node N associated with each statement s within the block. The children of N are those nodes corresponding to statements that are the last definitions, prior to s, of the operands used by s.
3. Node N is labeled by the operator applied at s, and also attached to N is the list of variables for which it is the last definition within the block.
4. Certain nodes are designated output nodes. These are the nodes whose variables are live on exit from the block; that is, their values may be used later, in another block of the flow graph.
DAG Construction
• Forward pass over the basic block
• For x = y op z:
  – Find a node labeled y, or create one
  – Find a node labeled z, or create one
  – Create a new node for op, or find an existing one with descendants y, z (needs a hash scheme)
  – Add x to the list of labels for that node
  – Remove label x from the node on which it previously appeared
• For x = y:
  – Add x to the list of labels of the node which currently holds y
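These rules can be sketched as a small value-numbering-style DAG builder in Python; the class and field names are invented for illustration.

```python
# DAG construction for local CSE: op nodes are keyed by (op, child ids),
# so an identical subexpression over the same operand nodes reuses the
# existing node instead of creating a new one.

class DAG:
    def __init__(self):
        self.nodes = {}        # (op, operand ids) -> node id
        self.labels = {}       # node id -> list of attached variables
        self.current = {}      # variable -> node id holding its value
        self.count = 0

    def leaf(self, name):
        """Node for a variable's current value (initial value if unseen)."""
        if name not in self.current:
            self.nodes[("leaf", name)] = self.count
            self.current[name] = self.count
            self.count += 1
        return self.current[name]

    def assign(self, x, op, y, z=None):
        """Process x = y op z (or a plain copy x = y when op is None)."""
        if op is None:
            node = self.leaf(y)
        else:
            key = (op, self.leaf(y), self.leaf(z))
            if key not in self.nodes:      # reuse an existing node: CSE
                self.nodes[key] = self.count
                self.count += 1
            node = self.nodes[key]
        for lbls in self.labels.values():  # move label x onto this node
            if x in lbls:
                lbls.remove(x)
        self.labels.setdefault(node, []).append(x)
        self.current[x] = node
        return node
```

On the block a = b + c; b = a - d; c = b + c; d = a - d, the second a - d reuses the first node (so b and d share it), while the second b + c does not, since b was redefined in between.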
a = b + c
b = a - d
c = b + c
d = a - d

[Figure: the DAG for this block — leaves b0, c0, d0; a + node labeled a over b0 and c0; a - node over a and d0 carrying labels b and d; a + node labeled c over the - node and c0.]
Finding Local Common Subexpressions
• Suppose b is not live on exit:
    a = b + c
    b = a - d
    c = b + c
    d = a - d
  can be rewritten as
    a = b + c
    d = a - d
    c = d + c
• If b is live on exit, we need instead:
    a = b + c
    d = a - d
    b = d
    c = d + c
[Figure: the DAG — the - node carries both labels b and d, exposing a - d as a local common subexpression.]
LCS: Another Example
a = b + c
b = b - d
c = c + d
e = b + c

[Figure: the DAG — leaves b0, c0, d0; + labeled a over b0 and c0; - labeled b over b0 and d0; + labeled c over c0 and d0; + labeled e over the b and c nodes. The b + c in the last statement is not the same expression as in the first, since b and c have been redefined.]
Dead Code Elimination
• Delete any root that has no live variables attached
• Repeated application of this transformation will remove all nodes from the DAG that correspond to dead code
• E.g., for
    a = b + c
    b = b - d
    c = c + d
    e = b + c
  with a, b live and c, e not live on exit, the roots for e and then c are deleted in turn, leaving:
    a = b + c
    b = b - d
The Use of Algebraic Identities
• Eliminate computations
• Reduction in strength
• Constant folding: 2 * 3.14 = 6.28, evaluated at compile time
• Other algebraic transformations:
  – x * y => y * x
  – x > y => x - y > 0
  – a = b + c; e = c + d + b;  =>  a = b + c; e = a + d;
Representation of Array References
x = a[i]
a[j] = y
z = a[i]
Can we conclude z = x? No: the assignment a[j] = y may change a[i] (when i == j), so it kills the node for a[i].
The same holds through a pointer into an array:
b = a + 12
x = b[i]
b[j] = y
Here a is an array and b is a position in the array a, so x is killed by b[j] = y.
Pointer Assignments & Procedure Calls
• Problem with the following assignments:
    x = *p
    *q = y
  We do not know what p or q point to:
  – x = *p is a use of every variable
  – *q = y is a possible assignment to every variable
  – The =* operator must take all nodes that are currently associated with identifiers as arguments, which is relevant for dead-code elimination
  – The *= operator kills all other nodes so far constructed in the DAG
• Global pointer analyses can be used to limit the set of affected variables
• Procedure calls behave much like assignments through pointers:
  – Assume that a procedure uses and changes any data to which it has access
  – If variable x is in the scope of a procedure P, a call to P both uses the node with attached variable x and kills that node
Reassembling BBs From DAGs
[Figure: the basic block reassembled from the DAG of the earlier example — one version when b is not live on exit, another when b is live on exit.]
Reassembling BBs From DAGs: the Rules
• The order of instructions must respect the order of nodes in the DAG
• Assignments to an array must follow all previous assignments to, or evaluations from, the same array
• Evaluations of array elements must follow any previous assignments to the same array
• Any use of a variable must follow all previous procedure calls or indirect assignments through a pointer
• Any procedure call or indirect assignment through a pointer must follow all previous evaluations of any variable
Peephole Optimization
• Dragon §8.7
• Introduction to peephole optimization
• Common techniques
• Algebraic identities
• An example
Peephole Optimization
• Simple compilers do not perform machine-independent code improvement – they generate naive code
• It is possible to take that naive target code and optimize it
  – Sub-optimal sequences of instructions that match an optimization pattern are transformed into optimal sequences
  – This technique is known as peephole optimization
  – It usually works by sliding a window of several instructions (a peephole) over the code
Peephole Optimization
Goals:
  – Improve performance
  – Reduce memory footprint
  – Reduce code size
Method:
  1. Examine short sequences of target instructions
  2. Replace each sequence by a more efficient one
Typical transformations:
  • Redundant-instruction elimination
  • Algebraic simplifications
  • Flow-of-control optimizations
  • Use of machine idioms
Peephole Optimization: Common Techniques
[Figures: before/after instruction-sequence patterns for the common peephole techniques, shown over four slides.]
Algebraic Identities
• Worth recognizing single instructions with a constant operand:
  – Eliminate computations:
    • A * 1 = A
    • A * 0 = 0
    • A / 1 = A
  – Reduce strength:
    • A * 2 = A + A
    • A / 2 = A * 0.5
  – Constant folding:
    • 2 * 3.14 = 6.28
• More delicate with floating-point
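A toy sketch of such identity rewriting over three-address tuples; the encoding is invented for illustration, and a real implementation would need care with floating-point and signedness.

```python
# Peephole rewriting of single statements (dst, op, a, b): apply algebraic
# identities, one strength reduction, and constant folding. Constants are
# Python numbers; anything else is a variable name.

def simplify(stmt):
    dst, op, a, b = stmt
    if op == "*" and b == 1:
        return (dst, "copy", a, None)          # A * 1 = A
    if op == "*" and b == 0:
        return (dst, "copy", 0, None)          # A * 0 = 0
    if op == "/" and b == 1:
        return (dst, "copy", a, None)          # A / 1 = A
    if op == "*" and b == 2:
        return (dst, "+", a, a)                # strength: A * 2 = A + A
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        folded = {"+": a + b, "-": a - b, "*": a * b}.get(op)
        if folded is not None:
            return (dst, "copy", folded, None)  # constant folding
    return stmt                                 # no rule applies
```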
Is This Ever Helpful?
• Why would anyone write x * 1?
• Why bother to correct such obvious junk code?
• In fact, one might write
    #define MAX_TASKS 1
    ...
    a = b * MAX_TASKS;
• Also, seemingly redundant code can be produced by other optimizations – this is an important effect.
Replace Multiply by Shift
• A := A * 4;
  – Can be replaced by a 2-bit left shift (signed/unsigned)
  – But we must worry about overflow if the language does
• A := A / 4;
  – If unsigned, can be replaced with a right shift
  – But arithmetic right shift of signed values is a well-known problem
  – The language may allow it anyway (traditional C)
The Right Shift Problem
• Arithmetic right shift: shift right and use the sign bit to fill the most significant bits:
    -5   111111...1111111011
    SAR  111111...1111111101
  which is -3, not -2; in most languages -5/2 = -2
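Python can illustrate the mismatch, since its `>>` is an arithmetic shift on integers; the C-style truncating division is modeled by a small helper here.

```python
# Arithmetic right shift floors toward negative infinity, while C-style
# integer division truncates toward zero, so they disagree on negative
# odd operands.

def truncating_div(a, b):
    """Integer division that truncates toward zero, as in C."""
    q = abs(a) // abs(b)
    return -q if (a < 0) != (b < 0) else q

print(-5 >> 1)                 # arithmetic shift: floors, giving -3
print(truncating_div(-5, 2))   # C-style -5/2: truncates, giving -2
```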
Addition Chains for Multiplication
• If multiply is very slow (or the machine has no multiply instruction, like the original SPARC), decomposing a constant operand into a sum of powers of two can be effective:
    x * 125 = x * 128 - x * 4 + x
• Two shifts, one subtract and one add, which may be faster than one multiply
• Note the similarity with the efficient exponentiation method
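The decomposition can be checked directly; whether it actually wins depends on the target's multiply latency, so treat this as an illustration rather than a universal rule.

```python
# Replace x * 125 with shifts and adds, since 125 = 128 - 4 + 1.

def mul125(x):
    """x * 125 via two shifts, one subtract and one add."""
    return (x << 7) - (x << 2) + x

# sanity check against the plain multiply
assert all(mul125(x) == x * 125 for x in range(-100, 100))
```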
Flow-of-Control Optimizations
• Jump to a jump:
    goto L1
    . . .
  L1: goto L2
  becomes
    goto L2
    . . .
  L1: goto L2
• Conditional jump to a jump:
    if a < b goto L1
    . . .
  L1: goto L2
  becomes
    if a < b goto L2
    . . .
  L1: goto L2
• Jump to a conditional jump:
    goto L1
    . . .
  L1: if a < b goto L2
  L3:
  becomes
    if a < b goto L2
    goto L3
    . . .
  L3:
Peephole Opt: an Example
Source code:
  debug = 0
  . . .
  if (debug) { print debugging information }
Intermediate code:
  debug = 0
  . . .
  if debug = 1 goto L1
  goto L2
L1: print debugging information
L2:
Eliminate Jump after Jump
Before:
  debug = 0
  . . .
  if debug = 1 goto L1
  goto L2
L1: print debugging information
L2:
After:
  debug = 0
  . . .
  if debug != 1 goto L2
  print debugging information
L2:
Constant Propagation
Before:
  debug = 0
  . . .
  if debug != 1 goto L2
  print debugging information
L2:
After:
  debug = 0
  . . .
  if 0 != 1 goto L2
  print debugging information
L2:
Unreachable Code (Dead Code Elimination)
Before:
  debug = 0
  . . .
  if 0 != 1 goto L2
  print debugging information
L2:
After:
  debug = 0
  . . .
(0 != 1 always holds, so the branch is always taken; the print is unreachable and both can be removed)
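The three steps above (jump-after-jump elimination, constant propagation, unreachable-code removal) can be sketched as one toy pipeline; the instruction encoding is invented for illustration.

```python
# A tiny peephole pipeline over tuple-encoded instructions:
#   ("set", var, const), ("ifeq"/"ifne", var-or-const, const, label),
#   ("goto", label), ("label", name), ("print",)

def peephole_debug(code):
    code = list(code)
    # 1. Jump after jump: invert the test and jump straight to L2.
    for i in range(len(code) - 2):
        a, b, c = code[i], code[i + 1], code[i + 2]
        if a[0] == "ifeq" and b[0] == "goto" and c == ("label", a[3]):
            code[i:i + 3] = [("ifne", a[1], a[2], b[1])]
            break
    # 2. Constant propagation from 'set' statements into tests.
    consts = {x[1]: x[2] for x in code if x[0] == "set"}
    code = [("ifne", consts.get(x[1], x[1]), x[2], x[3])
            if x[0] == "ifne" else x for x in code]
    # 3. Unreachable code: an always-taken branch lets us drop it and
    #    everything up to its target label.
    out, skip_to = [], None
    for x in code:
        if skip_to:
            if x == ("label", skip_to):
                skip_to = None
            continue
        if x[0] == "ifne" and isinstance(x[1], int) and x[1] != x[2]:
            skip_to = x[3]
            continue
        out.append(x)
    return out
```

On the debug example, the whole conditional print collapses, leaving only the assignment to debug (which a later global pass could remove too).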
Peephole Optimization Summary
• Peephole optimization is very fast
  – Small overhead per instruction, since it uses a small, fixed-size window
• It is often easier to generate naive code and run peephole optimization than to generate good code directly!
Summary
• Introduction to optimization
• Control Flow Analysis
• Basic knowledge
  – Basic blocks
  – Control-flow graphs
• Local optimizations
  – Peephole optimizations
HW & Next Time
• Homework: Ex. 8.4.1, 8.5.1, 8.5.2
• Next time: Dataflow analysis – Dragon §9.2
If You Want to Get Started …
• Go to http://llvm.org
• Download and install LLVM on your favorite Linux box
  – Read the installation instructions to help you
  – You will need gcc 4.x
• Try to run it on a simple C program
Compiler Support: Analysis of Loops and Function Calls
• Automatically identify loop and function-call structures
• Analyze the correlations of long-distance branch instructions
• Insert guide instructions into the binary program
• Input: C/C++ source code
[Figures: loop-structure transformation with guide-instruction insertion — a loop over BB1–BB4 gains a preheader executing guide_save_history before the header and a tail executing guide_restore_history, between its pred-branches and succ-branches — and function-structure transformation with guide-instruction insertion — guide_save_history is inserted before `call proc` in BB1 and guide_restore_history after the call returns.]
Compiler Support for PSA Prediction
• First, the compiler analyzes, from the subroutine structure information and the static data-dependence graph, which data values are strongly correlated with each indirect branch instruction
• Then, the compiler inserts a guide instruction for each such instruction and schedules it
switch (k) {
  case 0...3:  ... // action 1
  case 4...7:  ... // action 2
  case 8...11: ... // action 3
}

(a) source code
(b) assembly code:
  r <- k           ; C
  r <- k/4         ; R
  r <- [GP + sl]   ; G
  jump r           ; J
  L1: ...          ;; action 1
  L2: ...          ;; action 2
(c) execution of a switch-case statement: the basic c_value is normalized, then combined with r_value to index the jump table (ctable) and select Case block 1/2/3
;; (obj->func)();
  r <- o_addr        ; O
  r <- [r + f_off]   ; F
  jump r             ; C

(a) source & assembler code
(b) execution of a function-pointer call: the object's function-pointer field (obj->func) supplies the jump target
;; Base d = new Derive();
;; d.f();
  r <- o_addr        ; O
  r <- [r + v_off]   ; V
  r <- [r + f_off]   ; F
  jump r             ; C

(a) class hierarchy: class Derive overrides f() of class Base
(b) source & assembler code
(c) execution of a virtual function call: the object's vtable pointer (V) selects an entry (V_fun1/V_fun2/V_fun3) in the dispatch table (dvtable), resolving to Derive.f
• Virtual function calls use the start address of the virtual function table as the correlated data value
• Switch-case statements use the normalized case value as the correlated data value
• Function-pointer calls use the pointer value or the pointer-array index as the correlated data value