Improved Algorithms for Link-Based Non-tree Clock Network for
Skew Variability Reduction
Anand Rajaram†‡ David Z. Pan† Jiang Hu*
† Dept. of ECE, UT-Austin
‡ Texas Instruments, Dallas
* Dept. of EE, TAMU
Outline
Introduction
Review of link-based non-tree clock network
Improved algorithms (over [Rajaram et al, DAC’04])
› Rule based algorithm (δ Rule)
› Graph theoretical approach (MST-based)
Experimental results
Conclusions
Clock Distribution Network
[Figure: two registers, 1 and 2, clocked through the clock network with delays d1 and d2; labels: launch signals, catch signals, Dmax, T]
Signal transfer coordinated by clock signal
All registers are supplied with clock signal by clock distribution network
Skew = d1 – d2
Zero skew: d1 = d2
Useful skew, d1 – d2 = δ12
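As a small worked example of the definitions above, the sketch below computes the skew between two registers and checks it against an intended value. The delay values and the function names `skew` and `meets_target` are made up for illustration.

```python
def skew(d1, d2):
    """Skew between two registers whose clock arrival delays are d1 and d2."""
    return d1 - d2


def meets_target(d1, d2, target=0, tol=0):
    """True when the skew equals the intended value within a tolerance.

    target = 0 corresponds to zero skew; a non-zero target corresponds
    to a useful skew delta_12.
    """
    return abs(skew(d1, d2) - target) <= tol


# Hypothetical delays in picoseconds.
d1, d2 = 120, 100
print(skew(d1, d2))                     # 20
print(meets_target(d1, d2))             # False: this pair is not zero skew
print(meets_target(d1, d2, target=20))  # True: useful skew of 20 ps
```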
Clocks : Important Considerations & Objectives
One of the biggest & most frequently switching nets
Very sensitive to unwanted skew introduced by PVT
› Manufacturing process variations (P)
› Power supply voltage noise (V)
› Temperature variations (T)
Less clock skew variation a “MUST” for nanometer VLSI designs
Minimizing clock routing wire-length can
› Reduce power consumption
Approaches for Reducing Skew Variability
Buffer & wire sizing [Pullela et al., DAC’93; Chung et al., ICCAD’94; Wang et al., ISPD’04]
Variation aware routing [Lin et al., ICCAD’94; Lu et al., ISPD’03]
Non-tree clock networks
› McCoy et al., ETC’94; Vandenberghe et al., ICCAD’97; Xue et al., ICCAD’95
› Link based non-tree clock networks [Rajaram et al., DAC’04]
Non-tree: 1-D Spine [Kurd et al., JSSC’01]
1-D spine
Applied in Intel Pentium processor design
Variations between spines still exist
[Figure: spines driving clock sinks or local sub-networks]
Non-tree: 2-D Mesh
Top level mesh [Su et al., ICCAD’01]
Less wire, less effective
Leaf level mesh [Restle et al., JSSC’01]
Very effective, huge wire
Applied in IBM microprocessors
[Figure: top level and leaf level meshes driving clock sinks or local sub-networks]
Linked Non-tree = Tree + Links [Rajaram et al, DAC’04]
Non-tree = tree + links
How to select link pairs is the key!
Link = link_capacitors + link_resistor
[Figure: a link between nodes u and w, modeled as a resistor Rl between u and w with a capacitor C/2 at each endpoint]
Skew Between Link Endpoints
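New skew with link (u, w):
$\hat{q}_{u,w} = \frac{R_{link}}{R_{loop}} \tilde{q}_{u,w}$
[Figure: link resistance Rlink between nodes u and w, closing a loop with resistance Rloop]
Value of $\hat{q}_{u,w}$ becomes smaller when link is closer to leaf nodes for a given Rlink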
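The scaling of the endpoint skew can be illustrated with a few lines of arithmetic. The sketch below assumes the relation reads new skew = (Rlink / Rloop) × skew before the link resistor is added, and takes Rloop as a given input rather than deriving it from the tree. All numbers are hypothetical.

```python
def skew_with_link(q_tilde, r_link, r_loop):
    """Skew between link endpoints u and w after the link is inserted.

    Assumed relation: q_hat = (R_link / R_loop) * q_tilde, where q_tilde
    is the endpoint skew before the link resistor takes effect.
    """
    if r_link < 0 or r_loop <= 0:
        raise ValueError("r_link must be >= 0 and r_loop must be > 0")
    return (r_link / r_loop) * q_tilde


# Same link resistance, two hypothetical loop resistances (ohms),
# same pre-link skew (ps).
small_loop = skew_with_link(q_tilde=10.0, r_link=50.0, r_loop=200.0)
large_loop = skew_with_link(q_tilde=10.0, r_link=50.0, r_loop=400.0)
print(small_loop, large_loop)  # 2.5 1.25
```

For a fixed Rlink, the larger loop gives the smaller residual skew, which is in line with the remark that links closer to the leaf nodes are more effective.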
Skew Between any Two Nodes (i, j) with Link (u, w)
Skew variation between any node pair (i, j)
Scenario 1: i ∈ Tg, j ∈ Th => always smaller
Scenario 2: i & j ∈ Tg (or Th) => could be worse
Scenario 3: i ∈ Tp, j ∉ Tp => could be much worse
Key idea: try to avoid Scenario 3 and 2 for link insertion
[Figure: clock tree showing link (u, w) and nodes P, g, h]
P: nearest common ancestor for u and w
Tx: Sub-tree rooted at x
Rule Based Algorithms [Rajaram et al, DAC’04]
α-rule: $\frac{R_{link}}{R_{loop}} < \alpha_{max}$
Lower the α, better the link
β-rule: $\frac{C}{2} \lvert R_{u,u} - R_{w,w} \rvert < \beta_{max}$
Lower the β, lesser the tuning required
γ-rule: The nearest common ancestor's depth from root is < γmax
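A candidate link can be screened against the three rules with a simple predicate. The sketch below is one possible reading, not the authors' implementation: it assumes the α-rule bounds Rlink / Rloop, the β-rule bounds (C/2)·|R_uu − R_ww|, and the γ-rule bounds the depth of the nearest common ancestor. The `LinkCandidate` fields and the threshold values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkCandidate:
    r_link: float   # link resistance
    r_loop: float   # loop resistance
    cap: float      # link capacitance C
    r_uu: float     # R_{u,u}, assumed resistance term seen at endpoint u
    r_ww: float     # R_{w,w}, assumed resistance term seen at endpoint w
    nca_depth: int  # depth of the nearest common ancestor from the root


def alpha(c):
    return c.r_link / c.r_loop


def beta(c):
    return 0.5 * c.cap * abs(c.r_uu - c.r_ww)


def passes_rules(c, alpha_max, beta_max, gamma_max):
    """True when the candidate satisfies the alpha, beta and gamma rules."""
    return (alpha(c) < alpha_max
            and beta(c) < beta_max
            and c.nca_depth < gamma_max)


cand = LinkCandidate(r_link=50.0, r_loop=200.0, cap=2.0,
                     r_uu=110.0, r_ww=100.0, nca_depth=1)
print(alpha(cand), beta(cand))                   # 0.25 10.0
print(passes_rules(cand, 0.5, 20.0, 3))          # True
print(passes_rules(cand, 0.2, 20.0, 3))          # False: alpha too large
```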
Guidelines for Node Pair Selection for Link Insertion
Select nodes which are hierarchically far apart
Select nodes physically close to each other
Select nodes with equal nominal delay
Select nodes closer to leaf nodes
For zero skew routing, only select leaf nodes
Rule Based Algorithms [Rajaram et al, DAC’04]
Merits
› Physical characteristics of the links considered. So bad links avoided.
› Independent of balanced nature of clock structure
› Efficient run time
Demerits
› No control over distribution of links.
› Possibility of links getting added in the same region
Solution
› δ-rule: No two links should have the same pair of ancestors at the depth = δ from the clock source
› Retains the merits of the previous rules and addresses the demerit
[Figure: sub-trees A, B, C, D, using δ = 2]
δ Rule – An Example
[Figure: sub-trees A, B, C, D]
Crowding of links. Subtrees A and D not linked!
Using δ = 2
δ is the node level from clock source
Graph Theoretical Approach
Select_Node_Pairs(Tv) {
  l = v.left_child
  r = v.right_child
  P = Select_node_pair_between(Tl, Tr, k)
  if Depth(v) ≥ depth_limit, return P
  P = P ∪ Select_Node_Pairs(Tl)
  P = P ∪ Select_Node_Pairs(Tr)
  return P
}
[Figure: node v with children l and r; sub-trees Tl1, Tl2 under l and Tr1, Tr2 under r]
The entire clock tree is recursively divided into two parts and links added between them
This ensures distribution of links throughout the clock tree
Edge weight = Min-distance between sinks of Tli and Trj
[Figure: bipartite graph between sub-trees Tl1, Tl2 and Tr1, Tr2]
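The recursive division can be sketched in a few lines. This is a simplified illustration, not the authors' code: it assumes a full binary tree (every internal node has two children), and it stands in for Select_node_pair_between with the single closest pair of sinks by Manhattan distance instead of a k-way selection. The `Node` class and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    pos: tuple = (0, 0)             # sink location, meaningful for leaves only
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sinks(t):
    """All leaf nodes of the sub-tree rooted at t."""
    if t.left is None and t.right is None:
        return [t]
    return sinks(t.left) + sinks(t.right)


def distance(a, b):
    return abs(a.pos[0] - b.pos[0]) + abs(a.pos[1] - b.pos[1])


def select_pair_between(tl, tr):
    """Stand-in selection: the closest sink pair across the two sub-trees."""
    return min(((u, w) for u in sinks(tl) for w in sinks(tr)),
               key=lambda p: distance(*p))


def select_node_pairs(v, depth=0, depth_limit=1):
    """Link the two halves of v, then recurse into each half."""
    if v.left is None or v.right is None:
        return []
    pairs = [select_pair_between(v.left, v.right)]
    if depth >= depth_limit:
        return pairs
    pairs += select_node_pairs(v.left, depth + 1, depth_limit)
    pairs += select_node_pairs(v.right, depth + 1, depth_limit)
    return pairs


a, b = Node("a", (0, 0)), Node("b", (1, 0))
c, d = Node("c", (2, 0)), Node("d", (5, 0))
root = Node("root", left=Node("L", left=a, right=b),
            right=Node("R", left=c, right=d))
names = [(u.name, w.name) for u, w in select_node_pairs(root)]
print(names)  # [('b', 'c'), ('a', 'b'), ('c', 'd')]
```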
Graph theoretical approach – Min-matching [Rajaram et al, DAC’04]
Bipartite min-matching algorithm to select the node pairs
Merits
› Distribute links evenly through all regions of the clock network
Demerits
› Due to the nature of the min-matching algorithm, only one link per sub-tree is allowed
› May result in some very lengthy links and increased wire lengths
› Lengthy links might be difficult to route
› Complexity of min-matching is $O(n^3)$. Not scalable!
[Figure: node v with children l and r; lengthy links between the two sub-trees]
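To make the one-link-per-sub-tree restriction concrete, the sketch below finds a minimum-weight perfect matching by exhaustive search. Exhaustive search is used only to keep the example self-contained; it is factorial in k and is not the polynomial-time matching algorithm the slide refers to. The weight matrix is hypothetical.

```python
from itertools import permutations


def min_matching(weights):
    """Minimum-weight perfect matching on a k x k weight matrix.

    weights[i][j] is the distance between left sub-tree i and right
    sub-tree j. Returns a list of (i, j) pairs, one per left sub-tree.
    """
    k = len(weights)
    best = min(permutations(range(k)),
               key=lambda p: sum(weights[i][p[i]] for i in range(k)))
    return [(i, best[i]) for i in range(k)]


# Hypothetical distances between two left and two right sub-trees.
weights = [[1, 2],
           [5, 8]]
matching = min_matching(weights)
print(matching)  # [(0, 1), (1, 0)]
```

In this example the shortest edge (weight 1) is not used, because every sub-tree must appear in exactly one link and the pairing with total weight 7 beats the pairing with total weight 9.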
New graph theoretical approach – Minimum Spanning Tree Based
MST algorithm allows more than one link per sub-tree
› More short links (cf. bipartite approach)
Retains the merits of the min-matching based approach
› Evenly distribute the links
Complexity is $O(n \log n)$
› Much faster than bipartite matching algorithm, $O(n^3)$
[Figure: node v with children l and r]
MST_node_pair_select(Tl, Tr, k) {
  Divide Tl into k sub-trees, Sl = {Tl1, Tl2, Tl3, …, Tlk}
  Divide Tr into k sub-trees, Sr = {Tr1, Tr2, Tr3, …, Trk}
  Find MST of the completely connected bipartite graph between Sl & Sr
}
[Figure: complete bipartite graph between Sl = {Tl1, Tl2} and Sr = {Tr1, Tr2}, drawn on the sub-trees of l and r under node v]
MST Based Algorithm
After MST pair selection, iteratively delete edges violating the four rules (α, β, γ, and δ)
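The two steps, MST selection followed by rule-based pruning, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it runs plain Kruskal with union-find over all edges of the complete bipartite graph, so it does not reproduce the complexity quoted in the slides, and the `violates` predicate is a hypothetical stand-in for the α, β, γ and δ checks. The weight matrix is made up.

```python
def bipartite_mst(weights):
    """Kruskal's algorithm on a complete bipartite graph.

    weights[i][j] is the edge weight between left node i and right node j.
    Returns the MST edges as (i, j) pairs in the order they were accepted.
    """
    n_left, n_right = len(weights), len(weights[0])
    parent = list(range(n_left + n_right))  # right node j has id n_left + j

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = sorted((weights[i][j], i, j)
                   for i in range(n_left) for j in range(n_right))
    mst = []
    for _, i, j in edges:
        ri, rj = find(i), find(n_left + j)
        if ri != rj:                        # edge joins two components
            parent[ri] = rj
            mst.append((i, j))
    return mst


def prune(edges, violates):
    """Drop every edge for which the rule check reports a violation."""
    return [e for e in edges if not violates(e)]


weights = [[1, 2],
           [5, 8]]
mst_edges = bipartite_mst(weights)
print(mst_edges)  # [(0, 0), (0, 1), (1, 0)]

# Hypothetical rule: reject links longer than 4 units.
links = prune(mst_edges, lambda e: weights[e[0]][e[1]] > 4)
print(links)      # [(0, 0), (0, 1)]
```

Unlike a perfect matching, the spanning tree lets left sub-tree 0 take part in two short links.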
Experimental Setup
Benchmarks: r1 – r5 from bounded skew tree work [Cong et al., ICCAD’95]
Interconnect width variation
› Smaller than thickness
› More sensitive to variations
Load capacitance variation
All variables assumed to be Gaussian
[Figure: Gaussian distribution from -3σ to +3σ covering 99.74%, with Min, Nom and Max marked]
Standard Deviation = $\sqrt{\frac{\sum_{T\_trials} \sum_{N\_sinks} (d_i - d_{ref})^2}{T N}}$
$d_i$: Delay of sink i
$d_{ref}$: Delay of reference sink
Skew Variability measure: Standard Deviation
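The measure can be computed directly from a table of sampled sink delays. The sketch below assumes the measure is the square root of the sum, over all trials and all sinks, of (d_i − d_ref)² divided by T·N, with the reference sink's own zero term included in the sum. In an actual experiment each trial would come from sampling the Gaussian variations; the delays here are fixed, made-up numbers so the arithmetic can be checked by hand.

```python
import math


def skew_variability(trials, ref=0):
    """Skew variability over Monte Carlo trials.

    trials: list of T lists, each holding the N sink delays of one trial.
    ref:    index of the reference sink.
    """
    t = len(trials)
    n = len(trials[0])
    total = sum((delays[i] - delays[ref]) ** 2
                for delays in trials for i in range(n))
    return math.sqrt(total / (t * n))


# Two hypothetical trials with three sinks each (delays in ps).
trials = [[10, 13, 10],
          [10, 10, 6]]
# Squared skews: (0 + 9 + 0) + (0 + 0 + 16) = 25, and T * N = 6.
value = skew_variability(trials)
print(round(value, 4))  # 2.0412
```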
Experimental Result on Skew Variability
[Figure: Skew Variability. y-axis: standard deviation w.r.t. clock trees (0 to 1); x-axis: test cases r1 to r5; series: Sparse Mesh, Dense Mesh, Link-M, Link-RD, Link-MST]
Benchmark r1 r2 r3 r4 r5
No. of sinks 267 598 862 1903 3100
HSPICE Validation
[Figure: Skew Variability w.r.t. Clock Tree. y-axis: standard deviation w.r.t. clock trees (0 to 0.18); x-axis: test cases r1 to r5; series: Link-MST-SPICE, Link-MST-Elmore]
Benchmark r1 r2 r3 r4 r5
No. of sinks 267 598 862 1903 3100
Experimental Result on Wire-length
[Figure: Wire-length Comparison. y-axis: wire-length w.r.t. clock trees (0 to 3); x-axis: test cases r1 to r5; series: Sparse Mesh, Dense Mesh, Link-M, Link-RD, Link-MST]
Wire-length comparison between link insertion methods
[Figure: Wire-length Comparison. y-axis: wire-length w.r.t. clock trees (0.9 to 1.2); x-axis: test cases r1 to r5; series: Link-M, Link-RD, Link-MST]
Conclusions
Two new efficient algorithms for link insertion have been proposed
› Significant skew variability reduction with very small wire-length increase
› Scale very well with size of clock network for both runtime and QOR
Proposed methodology is independent of the nature of variability effects
Friendly to incremental changes