Low-power Clock Trees for CPUs
Dong-Jin Lee, Myung-Chul Kim and Igor L. Markov
Dept. of EECS, University of Michigan
ICCAD 2010, Dong-Jin Lee, University of Michigan
Outline
■ Motivation and challenges
■ Modeling and objectives
− Local skew with variation
− Local-skew slack
− Modeling process variation
■ Proposed methodology and techniques
− Initial tree construction and buffer insertion
− Robustness improvements
− Wire snaking and delay buffer insertion
■ Empirical validation
■ Summary
Motivation
■ Clock networks
− Contribute a significant fraction of dynamic power
− A limiting factor in high-performance CPUs and SoCs
■ Challenges
− Interconnect is lagging in performance while transistors continue scaling
− Multi-objective optimization
– Traditional clock network synthesis constraints
– The increasing impact of process variation
– Power-performance-cost trade-offs
Tree vs Mesh
■ Objectives
− Minimize skew of a high-performance clock tree
− Minimize the impact of PVT variations
− Clock trees vs meshes, subject to skew < 7.5 ps
[Figure: robustness (vertical axis) vs. power efficiency (horizontal axis), positioning trees, meshes and ideal clock networks]
Our Contributions
■ The notion of local-skew slack for clock trees
■ A tabular technique to estimate the impact of variations
■ A path-based technique to enhance the robustness
■ A time-budgeting algorithm for clock-tree tuning with minimal power resources
■ Fine tuning of clock trees: accurate, fast, power efficient
■ Implementation: Contango2.0
■ Strong empirical results: low skew, robustness, low power
Local Skew
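■ Main objective (concept)
− Minimize local skew in the presence of variation
■ Definition: Skew
− Ψ : clock tree
− $\lambda(s_i)$ : the clock latency (insertion delay) at sink $s_i \in \Psi$
− $\mathrm{skew}(s_i, s_j) = |\lambda(s_i) - \lambda(s_j)|$
■ Definition: Global skew ($\omega^{\Psi}$)
− $\omega^{\Psi} = \max_{s_i, s_j \in \Psi} |\lambda(s_i) - \lambda(s_j)|$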
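Local Skew
■ Definition: The worst nominal local skew ($\omega^{\Psi}_{\Delta}$)
− Δ : local skew distance bound
− $\mathrm{dist}(s_i, s_j)$ : Manhattan distance between $s_i$ and $s_j \in \Psi$
− $\omega^{\Psi}_{\Delta} = \max_{\mathrm{dist}(s_i, s_j) < \Delta} |\lambda(s_i) - \lambda(s_j)|$
■ Definition: The worst local skew with variation ($\omega^{\Psi}_{\Delta,\nu,y}$)
− ν : variation model
− y : yield (0 < y ≤ 1)
− f(t) : the cumulative distribution function of $\omega^{\Psi}_{\Delta,\nu}$
− $\omega^{\Psi}_{\Delta,\nu,y} = \min \{\, t : f(t) \ge y \,\}$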
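The skew definitions can be illustrated with a small sketch. The sink names, coordinates and latencies below are made up, and treating the distance bound as strict (pairs closer than Δ) is an assumption.

```python
from itertools import combinations


def manhattan(p, q):
    """Manhattan distance between two (x, y) points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def global_skew(latency):
    """Largest latency difference over all sink pairs."""
    return max(latency.values()) - min(latency.values())


def local_skew(latency, position, delta):
    """Largest latency difference over sink pairs closer than delta."""
    worst = 0.0
    for a, b in combinations(latency, 2):
        if manhattan(position[a], position[b]) < delta:
            worst = max(worst, abs(latency[a] - latency[b]))
    return worst


# Hypothetical sinks: latencies in ps, positions in um.
latency = {"s1": 100.0, "s2": 103.0, "s3": 110.0}
position = {"s1": (0, 0), "s2": (50, 0), "s3": (900, 400)}

print(global_skew(latency))                # 10.0
print(local_skew(latency, position, 600))  # 3.0
```

Only s1 and s2 are within 600 um of each other, so the local skew (3 ps) is much smaller than the global skew (10 ps).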
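Modeling and Objectives - Example
■ Worst local skew with variation ($\omega^{\Psi}_{\Delta,\nu,y}$)
− Probability density function of $\omega^{\Psi}_{\Delta,\nu}$
− $\Omega_{\Delta}$ = 7.5 ps, y = 95%, $\omega^{\Psi}_{\Delta,\nu,y} < \Omega_{\Delta}$
− $\omega^{\Psi}_{\Delta,\nu,y}$ = 6.05 ps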
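[Figure: probability density function of $\omega^{\Psi}_{\Delta,\nu}$ (horizontal axis in ps, 0 to 10), with $\omega^{\Psi}_{\Delta,\nu,y}$ and $\Omega_{\Delta}$ marked]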
[Figure: PDF, CDF and inverse CDF of $\omega^{\Psi}_{\Delta,\nu}$; reading the inverse CDF at y = 0.95 gives $\omega^{\Psi}_{\Delta,\nu,y}$ = 6.05 ps]
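As a rough illustration of reading the inverse CDF at a given yield, the sketch below takes the empirical quantile of Monte-Carlo skew samples. The sample values and the function names are hypothetical; the slides do not say how the distribution is estimated in practice.

```python
import math


def skew_at_yield(samples, y):
    """Smallest sampled skew t whose empirical CDF value is at least y."""
    if not 0 < y <= 1:
        raise ValueError("yield must satisfy 0 < y <= 1")
    ordered = sorted(samples)
    k = math.ceil(y * len(ordered))  # how many samples must be <= t
    return ordered[k - 1]


def meets_limit(samples, y, limit):
    """True if the skew at yield y stays below the local skew limit."""
    return skew_at_yield(samples, y) < limit


# Hypothetical worst-local-skew values (ps), one per Monte-Carlo run.
samples = [3.1, 4.0, 4.4, 4.9, 5.2, 5.5, 5.8, 6.0, 6.3, 8.2]

print(skew_at_yield(samples, 0.5))
print(meets_limit(samples, 0.5, 7.5))
```

With only ten samples the estimate is coarse; a yield such as 95% needs many more runs before the tail quantile is meaningful.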
Optimization Objectives
■ Building variation-tolerant clock trees
− such that $\omega^{\Psi}_{\Delta,\nu,y} < \Omega_{\Delta}$ ($\Omega_{\Delta}$ : local skew limit)
− subject to slew constraints
■ Minimizing clock-tree power
[Figure: two probability density plots of local skew with variation (horizontal axis in ps, 0 to 10), with $\omega^{\Psi}_{\Delta,\nu,y}$ and $\Omega_{\Delta}$ marked]
Local-skew Slack σ(s) for sink s ∈ Ψ
■ Definition
− σ(s) is the minimum amount of additional delay for s, so that the tree satisfies $\omega^{\Psi}_{\Delta} < \Omega_{\Delta}$
■ Example ($\Omega_{\Delta}$ = 5 ps)
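One way to make the slack definition concrete is the fixed-point sketch below. It is not the slack computation algorithm from the talk: it assumes delay can only be added at sinks, uses a non-strict limit (difference ≤ limit), and takes a hypothetical `neighbors` map listing the sinks within the distance bound of each sink.

```python
def local_skew_slack(latency, neighbors, limit):
    """Minimum extra delay per sink so every neighbor pair differs by <= limit.

    latency:   dict sink -> nominal latency (ps)
    neighbors: dict sink -> list of sinks within the distance bound
    limit:     local skew limit (ps), must be non-negative
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    tuned = dict(latency)
    changed = True
    while changed:
        changed = False
        for s in tuned:
            # A sink may lag its slowest neighbor by at most `limit`.
            needed = max((tuned[n] - limit for n in neighbors[s]),
                         default=tuned[s])
            if needed > tuned[s]:
                tuned[s] = needed
                changed = True
    return {s: tuned[s] - latency[s] for s in latency}


# Hypothetical chain a - b - c: a and c are not neighbors of each other.
latency = {"a": 0.0, "b": 0.0, "c": 20.0}
neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}

print(local_skew_slack(latency, neighbors, 5.0))
# {'a': 10.0, 'b': 15.0, 'c': 0.0}
```

Delaying b to stay within 5 ps of c forces a to be delayed as well, which is why a single pass over the sinks is not enough and the loop repeats until nothing changes.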
Modeling Process Variation
■ The impact of variation on $\mathrm{skew}(s_i, s_j)$ depends on the tree path length, the number of buffers and the buffer types between $s_i$ and $s_j$
■ Notation
− T : technology node
− B : buffer and wire library
− ν : variation model
■ Variation-estimation table $\Xi_{T,B,\nu,y}[w,b,t]$
− worst-case increase in skew (with probability y) between two sinks connected by a tree path of length w with b buffers of buffer type t
[Figure: example tree path through nodes A, B, C, D; w : tree path length, b : number of buffers (2), t : buffer type]
Modeling Process Variation
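■ varEst($s_i$, $s_j$)
− the worst-case variational $\mathrm{skew}(s_i, s_j)$
− $\mathrm{varEst}(s_i, s_j) = \Xi_{T,B,\nu,y}[w(s_i,s_j),\, b(s_i,s_j),\, t(s_i,s_j)]$
■ Key constraint
− $\mathrm{skew}(s_i, s_j) + \mathrm{varEst}(s_i, s_j) < \Omega_{\Delta}$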
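A table lookup of this kind might be sketched as follows. The table contents, the binning by path length and the function names are all hypothetical, and the check reads the key constraint as "nominal skew plus the table estimate stays below the local skew limit", which is an interpretation rather than a formula preserved on the slide.

```python
import bisect

# Hypothetical table: (buffer type, number of buffers) -> list of
# (maximum path length in um, worst-case skew increase in ps).
XI = {
    ("small", 2): [(500, 1.8), (1000, 2.9), (2000, 4.6)],
    ("large", 2): [(500, 1.1), (1000, 1.9), (2000, 3.0)],
}


def var_est(path_length, num_buffers, buffer_type):
    """Look up the skew increase for the first bin covering path_length."""
    rows = XI[(buffer_type, num_buffers)]
    lengths = [w for w, _ in rows]
    i = bisect.bisect_left(lengths, path_length)  # first bin with w >= length
    if i == len(rows):
        raise ValueError("path length beyond table range")
    return rows[i][1]


def satisfies_key_constraint(nominal_skew, path_length, num_buffers,
                             buffer_type, limit):
    """Assumed form: nominal skew + variation estimate < limit."""
    estimate = var_est(path_length, num_buffers, buffer_type)
    return nominal_skew + estimate < limit


print(var_est(800, 2, "small"))                               # 2.9
print(satisfies_key_constraint(3.0, 800, 2, "small", 7.5))    # True
print(satisfies_key_constraint(5.0, 1500, 2, "small", 7.5))   # False
```

Rounding the path length up to the next bin keeps the estimate pessimistic, which matches the worst-case intent of the table.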
Initial Tree Construction
■ ZST-DME algorithm* based on Elmore delay
■ A simple and robust technique for obstacle avoidance**
■ Initial buffer insertion
− $t_0$ : the initial buffer type for initial buffer insertion
− Use the variation-estimation table with path lengths from the initial tree
− Once $t_0$ is determined, we adapt the fast variant of van Ginneken's algorithm*** for initial buffer insertion
− Minimize insertion delay, reliable slew rate
* : J.-H. Huang et al., "On Bounded-Skew Routing Tree Problem," DAC '95
** : D.-J. Lee et al., "Contango: Integrated Optimization of SoC Clock Networks," DATE '10
*** : W. Shi et al., "A Fast Algorithm for Optimal Buffer Insertion," Trans. on CAD 24(6), 2005
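For readers unfamiliar with zero-skew merging under the Elmore model, the sketch below shows only the balance step: where to tap a wire joining two subtrees so that both sides see the same delay. It is a textbook-style illustration, not the Contango2.0 code, and the unit wire resistance and capacitance, subtree delays and loads are made-up values (kΩ and fF, so products come out in ps).

```python
def elmore_wire_delay(r, c, length, load_cap):
    """Elmore delay of a uniform wire of given length driving load_cap."""
    return r * length * (c * length / 2.0 + load_cap)


def zero_skew_split(r, c, length, t1, cap1, t2, cap2):
    """Fraction x of the wire placed on subtree 1's side so delays match.

    Derived by equating t1 + delay(x * length, cap1) with
    t2 + delay((1 - x) * length, cap2).
    """
    num = (t2 - t1) + r * length * (cap2 + c * length / 2.0)
    den = r * length * (c * length + cap1 + cap2)
    return num / den


# Hypothetical values: r in kOhm/um, c in fF/um, length in um,
# subtree delays in ps, subtree capacitances in fF.
r, c, length = 1e-4, 0.2, 1000.0
t1, cap1 = 20.0, 50.0
t2, cap2 = 25.0, 80.0

x = zero_skew_split(r, c, length, t1, cap1, t2, cap2)
d1 = t1 + elmore_wire_delay(r, c, x * length, cap1)
d2 = t2 + elmore_wire_delay(r, c, (1.0 - x) * length, cap2)
print(x, d1, d2)
```

When x falls outside [0, 1] the two subtrees cannot be balanced on the direct wire alone and one side needs extra wire length, which is where wire snaking comes in.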
Robustness Improvement
■ Improve robustness after initial buffer insertion so that $\omega^{\Psi}_{\Delta,\nu,y} < \Omega_{\Delta}$ holds after skew optimization
■ The target buffer type for a tree path between sinks $s_i$ and $s_j$, $t(s_i, s_j)$, is defined as the smallest t such that [condition not preserved in the extracted text]
− choosing smaller buffers reduces capacitance
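The exact condition for the target buffer type is not preserved in the slide text. Purely as an illustration, the sketch below assumes the condition is "the variation estimate for the path stays below a given budget" and picks the smallest buffer type that meets it. The buffer names, estimates and budget are hypothetical.

```python
# Hypothetical buffer types, ordered from smallest to largest.
BUFFER_TYPES = ["x1", "x2", "x4", "x8"]

# Hypothetical variation estimates (ps) for one tree path, per buffer type.
EST_PS = {"x1": 6.2, "x2": 4.8, "x4": 3.4, "x8": 2.7}


def target_buffer_type(budget_ps):
    """Smallest buffer type whose estimate is below the budget, else None.

    The condition `estimate < budget` is an assumption made for this sketch.
    """
    for t in BUFFER_TYPES:
        if EST_PS[t] < budget_ps:
            return t
    return None


print(target_buffer_type(7.5))  # x1
print(target_buffer_type(5.0))  # x2
print(target_buffer_type(2.0))  # None
```

Scanning from the smallest type upward reflects the slide's point that smaller buffers reduce capacitance, so a larger buffer is chosen only when robustness requires it.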
Local Skew Optimization : Wire Snaking
■ Local-skew optimization techniques
− based on the optimal tuning amount from the slack computation algorithms with varEst($s_i$, $s_j$)
■ Improved wire snaking algorithm
− speed, accuracy and routing resources
[Figure: iterative wire snaking on an edge e. Round 1: $T^{1}_{\mathrm{target}}(e)$ = 11 ps, $T^{1}_{\mathrm{actual}}(e)$ = 7 ps. Round 2: $T^{2}_{\mathrm{target}}(e)$ = 4 ps, $T^{2}_{\mathrm{actual}}(e)$ = 3 ps. Round 3: $T^{3}_{\mathrm{target}}(e)$ = 1 ps, $T^{3}_{\mathrm{actual}}(e)$ = 1 ps. Cumulative actual delay: 7 ps, 10 ps, 11 ps against an overall target of 11 ps. In every round $T^{i}_{\mathrm{target}}(e) \ge T^{i}_{\mathrm{actual}}(e)$.]
Delay Model for Wire Snaking
■ α : to keep $T^{i}_{\mathrm{actual}}(e) \le T^{i}_{\mathrm{target}}(e)$ efficiently
■ The delay model for wire snaking aims for $T^{i}_{\mathrm{actual}}(e)$ to satisfy the above inequality with the highest α possible
■ Look-up tables for length estimation
− to enhance the quality of estimation by wire snaking
− a set of SPICE simulations for each technology environment, which includes the technology model, types of buffers and wires, and the variation specification
■ We achieved α values between 60% and 70% for the ISPD 2010 CNS contest benchmarks
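The effect of α on the number of tuning rounds can be sketched with a toy model. It assumes α is the ratio of actual to target delay achieved in each round and that this ratio is constant, which is a simplification of the slide's description; the tolerance and round cap are hypothetical parameters.

```python
def snake_until_met(target_ps, alpha, tolerance_ps=0.5, max_rounds=10):
    """Simulate rounds of conservative wire snaking on one edge.

    Each round adds alpha * (remaining target) of delay, so the actual
    delay never overshoots the target. Returns the per-round
    (target, actual) pairs and the delay still missing at the end.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    history = []
    remaining = target_ps
    for _ in range(max_rounds):
        if remaining <= tolerance_ps:
            break
        actual = alpha * remaining  # assumed constant share per round
        history.append((remaining, actual))
        remaining -= actual
    return history, remaining


history, remaining = snake_until_met(11.0, 0.65)
for target, actual in history:
    print(f"target {target:.2f} ps, actual {actual:.2f} ps")
print(f"left over: {remaining:.2f} ps")
```

Under this model the leftover delay shrinks by a factor of (1 − α) per round, so a higher α means fewer rounds and fewer SPICE evaluations, while α ≤ 1 avoids overshooting the target.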
Optimal Node Selection for Wire Snaking
■ Wire snaking at buffer outputs is more accurate than at other nodes
■ Limiting wire snaking to buffer outputs reduces the number of SPICE calls
■ Example
Delay Buffer Insertion
■ Highly unbalanced sink capacitances or layout obstacles may result in significant local skew
■ Delay buffer insertion
− Skew can be reduced by the delay of the inserted buffer
− Further precise wire snaking is possible because the inserted buffer isolates the target node
■ Example
ISPD’10 Clock Network Synthesis Contest
■ 45 nm 2 GHz CPU benchmarks from IBM and Intel
■ Evaluation
− Monte-Carlo SPICE simulations with PVT variations
− Skew and slew constraints (7.5 ps, 100 ps)
− Objective: total capacitance, a proxy for dynamic power
■ A rare opportunity to compare multiple strategies for clock-network synthesis
Empirical Validation
■ ISPD 2010 benchmarks
− 2.6 ps nominal local skew
− Smaller capacitance than CNSrouter and NTUclock by 4.22× and 4.13×, respectively
− Our clock trees achieve yield > 95%, while CNSrouter violates yield constraints on 3 benchmarks and NTUclock on 7
■ Local skew constraints are all satisfied
■ Smaller capacitance than NTU and CUHK by 2.09× and 4.24×, respectively
■ More robust with smaller capacitance
ICCAD 2010 Proceedings
Worst local skew with variation $\omega^{\Psi}_{\Delta,\nu,y}$ (ps) and capacitance (Cap.) per benchmark

              NTU              CUHK           Contango2
Bench     Skew    Cap.     Skew    Cap.     Skew    Cap.
cns01     7.16     445     7.23    1168     7.01     198
cns02     7.33     934     7.35    2100     7.34     376
cns03     4.88     184     3.95      94     4.18      56
cns04     4.09     196     7.25     125     4.46      72
cns05     3.81      89     7.27      74     4.41      38
cns06     7.49      16     6.79      87     6.05      48
cns07     6.24      23     5.97     128     4.58      73
cns08     5.47      23     5.37      97     5.15      52
Avg.      5.81    2.09     6.40    4.24     5.40     1.0
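The "Avg." row mixes two kinds of summary: the skew entries are arithmetic means over the eight benchmarks, while the capacitance entries match each tool's total capacitance divided by the Contango2 total. The sketch below recomputes both from the table values; the data layout and function name are illustrative only.

```python
# (skew in ps, capacitance) per tool, copied from the table above.
results = {
    "cns01": {"NTU": (7.16, 445), "CUHK": (7.23, 1168), "Contango2": (7.01, 198)},
    "cns02": {"NTU": (7.33, 934), "CUHK": (7.35, 2100), "Contango2": (7.34, 376)},
    "cns03": {"NTU": (4.88, 184), "CUHK": (3.95, 94), "Contango2": (4.18, 56)},
    "cns04": {"NTU": (4.09, 196), "CUHK": (7.25, 125), "Contango2": (4.46, 72)},
    "cns05": {"NTU": (3.81, 89), "CUHK": (7.27, 74), "Contango2": (4.41, 38)},
    "cns06": {"NTU": (7.49, 16), "CUHK": (6.79, 87), "Contango2": (6.05, 48)},
    "cns07": {"NTU": (6.24, 23), "CUHK": (5.97, 128), "Contango2": (4.58, 73)},
    "cns08": {"NTU": (5.47, 23), "CUHK": (5.37, 97), "Contango2": (5.15, 52)},
}


def summarize(results, baseline="Contango2"):
    """Mean skew and total-capacitance ratio (relative to baseline) per tool."""
    rows = list(results.values())
    base_cap = sum(row[baseline][1] for row in rows)
    summary = {}
    for tool in rows[0]:
        mean_skew = sum(row[tool][0] for row in rows) / len(rows)
        cap_ratio = sum(row[tool][1] for row in rows) / base_cap
        summary[tool] = (mean_skew, cap_ratio)
    return summary


summary = summarize(results)
for tool, (mean_skew, cap_ratio) in summary.items():
    print(f"{tool}: mean skew {mean_skew:.4f} ps, capacitance ratio {cap_ratio:.3f}")

# Every entry stays under the 7.5 ps contest limit.
worst = max(skew for row in results.values() for skew, _ in row.values())
print(worst < 7.5)
```

Because the capacitance figure is a ratio of totals, the two largest benchmarks (cns01 and cns02) dominate it, which explains why NTU shows 2.09× overall despite lower capacitance on cns06 to cns08.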
Skew Profiles for Contango2 & CNSrouter
■ Probability density functions (PDF) for skew on ISPD'10 benchmarks
Trade-off - Power vs Robustness to Variations
■ When local skew constraints are tight, large buffers ensure robustness but increase capacitance
− Much capacitance can be saved when local skew constraints are loose
■ Experiments on ispd10cns08
Summary
■ A tree solution for CPU clock routing
− Improves power consumption under tight skew constraints in the presence of variation
− Clock trees can be tuned to have nominal skew below 5 ps and low total skew in the presence of variation
− 4× capacitance improvement on average over mesh structures
■ Our clock trees have a higher yield than meshes
− meshes are not as easy to tune for nominal skew