1
Computing with Leakage Currents
Nikhil Jayakumar, Kanupriya Gulati, Rajesh Garg and Sunil P. Khatri
ECE Department
Texas A&M University
2
Outline
Sub-threshold circuits – the opportunity
Challenges:
Process/temperature/voltage variations
Energy minimization in sub-threshold circuits
Reclaiming the speed penalty
What's next?
3
Introduction
Power consumption has become a significant hurdle for recent ICs.
Higher power consumption leads to:
Shorter battery life
Higher on-chip temperatures, which reduce the operating life of the chip
There is a large and growing class of applications where power reduction, not speed, is paramount.
Such applications are ideal candidates for sub-threshold circuit design.
OK, so what is sub-threshold design?
4
As supply voltage scales down, the VT of the devices is scaled down as well.
A larger VT would reduce leakage but increase delay.
Leakage increases exponentially with decreasing VT.
Until a few process generations ago, leakage power was negligible compared to dynamic power. But leakage power is now becoming comparable with dynamic power. Ouch (three times).
Can we turn this dilemma into an opportunity?
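Sub-threshold Leakage
$$I_{ds} = I_0 \frac{W}{L}\, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left(1 - e^{-\frac{V_{ds}}{v_t}}\right) \quad \text{when } V_{gs} < V_T$$
where $v_t = kT/q$ is the thermal voltage, $n$ is the sub-threshold slope factor, and $V_{off}$ is an offset voltage.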
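To make the exponential dependence on $V_T$ concrete, here is a minimal Python sketch that evaluates the expression above. The prefactor `i0`, the slope factor `n = 1.5`, `v_off = 0` and the two threshold values are illustrative assumptions, not parameters of the processes simulated in this work.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant / electron charge, in V/K


def i_sub(vgs: float, vds: float, vt_th: float, temp_k: float = 300.0,
          i0: float = 1e-6, w_over_l: float = 1.0, n: float = 1.5,
          v_off: float = 0.0) -> float:
    """Sub-threshold drain current I_ds (A); meaningful only for vgs < vt_th.

    i0, n and v_off are illustrative placeholder values, not fitted to any process.
    """
    v_t = K_OVER_Q * temp_k  # thermal voltage kT/q
    return (i0 * w_over_l
            * math.exp((vgs - vt_th - v_off) / (n * v_t))
            * (1.0 - math.exp(-vds / v_t)))


# Off-state leakage (Vgs = 0, Vds = 1.0 V) for two threshold voltages 100 mV apart.
leak_high_vt = i_sub(0.0, 1.0, vt_th=0.35)
leak_low_vt = i_sub(0.0, 1.0, vt_th=0.25)
ratio = leak_low_vt / leak_high_vt
print(f"Lowering VT by 100 mV raises leakage by {ratio:.1f}x")
```

With n = 1.5 at 300 K, one decade of leakage corresponds to about n·v_t·ln 10 ≈ 89 mV of V_T, so a 100 mV reduction gives roughly a 13X increase.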
5
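The Opportunity
Traditional-circuit values are absolute; sub-threshold values are ratios relative to the traditional circuit (delay penalty, power reduction and P-D-P improvement).
| Process | Traditional Ckt Delay (ps) | Traditional Ckt Power (W) | Traditional Ckt P-D-P (J) | Sub-threshold Ckt (Vb = 0V) Delay | Sub-threshold Ckt (Vb = 0V) Power | Sub-threshold Ckt (Vb = 0V) P-D-P | Sub-threshold Ckt (Vb = VDD) Delay | Sub-threshold Ckt (Vb = VDD) Power | Sub-threshold Ckt (Vb = VDD) P-D-P |
|---|---|---|---|---|---|---|---|---|---|
| bsim70 | 14.157 | 4.08E-05 | 5.82E-07 | 17.01X | 308.82X | 18.50X | 9.93X | 141.10X | 14.43X |
| bsim100 | 17.118 | 6.39E-05 | 1.08E-06 | 24.60X | 497.54X | 20.08X | 12.00X | 100.96X | 8.20X |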
We compared a traditional circuit with a sub-threshold one (obtained by simply setting VDD < VT).
We performed simulations for 2 different processes on a 21-stage ring oscillator.
Impressive power reduction (100X – 500X).
The power-delay product (P-D-P) improves by as much as 20X.
P-D-P is an important metric for comparing circuit design styles.
The delay penalty of 10X – 25X can be reduced:
By applying forward body bias (dynamic)
By reducing VT values (static)
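Because P-D-P is power times delay, its improvement factor should be close to the power reduction divided by the delay penalty. The short Python check below recomputes this from the ratio columns of the table; the dictionary layout is an illustrative encoding, and the assignment of the two ratio groups to Vb = 0V and Vb = VDD is inferred from forward bias giving the smaller delay penalty.

```python
# (delay penalty, power reduction, reported P-D-P improvement), copied from the table
RATIOS = {
    ("bsim70", "Vb = 0V"): (17.01, 308.82, 18.50),
    ("bsim70", "Vb = VDD"): (9.93, 141.10, 14.43),
    ("bsim100", "Vb = 0V"): (24.60, 497.54, 20.08),
    ("bsim100", "Vb = VDD"): (12.00, 100.96, 8.20),
}


def pdp_improvement(delay_penalty: float, power_reduction: float) -> float:
    """PDP_trad / PDP_sub = (P_trad / P_sub) / (D_sub / D_trad)."""
    return power_reduction / delay_penalty


for (process, bias), (dp, pr, reported) in RATIOS.items():
    est = pdp_improvement(dp, pr)
    print(f"{process}, {bias}: {est:.2f}X estimated vs {reported:.2f}X reported")
```

The recomputed values land within about 3% of the reported P-D-P column, which shows why a 10X – 25X delay penalty is acceptable when power drops by 100X – 500X.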
6
The Opportunity
| bsim70 VT (V) | Delay | Power | P-D-P | bsim100 VT (V) | Delay | Power | P-D-P |
|---|---|---|---|---|---|---|---|
| 0.18 | 16.15X | 167.52X | 10.41X | 0.27 | 23.32X | 479.85X | 20.60X |
| 0.17 | 14.88X | 151.99X | 10.09X | 0.25 | 22.43X | 464.33X | 20.16X |
| 0.16 | 13.78X | 137.73X | 9.95X | 0.23 | 21.02X | 444.23X | 20.05X |
| 0.15 | 13.15X | 124.59X | 8.86X | 0.21 | 18.69X | 400.89X | 20.27X |
| 0.14 | 12.43X | 112.73X | 9.40X | 0.19 | 18.42X | 366.28X | 18.98X |
| 0.13 | 12.32X | 101.85X | 8.02X | 0.17 | 17.51X | 323.26X | 17.98X |
We also performed experiments with lower VT values.
VT can be modified with no extra cost
Delays improved, while the PDP improvement remained high.
7
Sub-threshold Logic
Advantages
Circuits get faster at higher temperature, so there is no need for expensive cooling techniques.
Device transconductance is an exponential function of Vgs, which results in a high ratio of on to off current. Hence noise margins are near-ideal.
Note that the device is never “on”. It is just “off” or “exponentially more off”, so to speak.
Disadvantages
Ids has an exponential dependence on temperature.
Ids is highly dependent on process variations (such as VT variations).
Ids is small, which explains the delay penalty.
$$I_{ds} = I_0 \frac{W}{L}\, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left(1 - e^{-\frac{V_{ds}}{v_t}}\right)$$
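Both the “faster when hot” advantage and the temperature sensitivity listed as a disadvantage follow from this expression once $V_T$ and $v_t$ depend on temperature. The Python sketch below assumes $V_T$ falls linearly by 1 mV/K and holds $I_0$ constant (ignoring mobility degradation); these coefficients and the function name `i_on_sub` are illustrative assumptions, not values extracted from the simulated processes.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant / electron charge (V/K)


def i_on_sub(vdd: float, temp_k: float, vt0: float = 0.30, t0: float = 300.0,
             k_vt: float = 1e-3, n: float = 1.5, i0: float = 1e-6) -> float:
    """Sub-threshold 'on' current (A) with Vgs = Vds = vdd < VT.

    Assumptions: VT drops linearly by k_vt volts per kelvin above t0, and i0 is
    temperature-independent, so mobility degradation is ignored.
    """
    v_t = K_OVER_Q * temp_k              # thermal voltage rises with temperature
    vt_th = vt0 - k_vt * (temp_k - t0)   # threshold voltage falls with temperature
    return i0 * math.exp((vdd - vt_th) / (n * v_t)) * (1.0 - math.exp(-vdd / v_t))


cold = i_on_sub(0.20, 273.15)  # 0 degC
hot = i_on_sub(0.20, 373.15)   # 100 degC
print(f"0 -> 100 degC: on-current grows {hot / cold:.0f}x")
```

With these assumed coefficients the current rises by roughly 20X between 0°C and 100°C, so the circuit speeds up when hot but its delay swings widely across the temperature range.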
8
Solving the Problem of Delay Sensitivity to Process, Voltage and Temperature Variations
9
Our Solution
We propose a technique that uses self-adjusting body bias to phase-lock the circuit delay to a beat clock.
Circuits are implemented as a network of PLAs.
Several PLAs in a cluster share a common Nbulk node.
A representative PLA in each cluster is chosen to phase-lock the delay of the cluster's PLAs to the beat clock.
If the delay is too high, a forward body bias is applied to speed up the PLA.
If the delay is low, the body bias is brought back down to zero to slow down the PLA.
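The control rule above can be sketched as a behavioral, bang-bang loop. This Python model is a hypothetical abstraction, not the authors' circuit: the exponential delay model, the body-effect coefficient, the 5 mV step per beat-clock cycle, the 0.4 V bias ceiling and the function names `pla_delay` and `lock_body_bias` are all assumptions. The charge pump on the following slide is reduced here to a fixed step per cycle.

```python
import math

N_VT = 1.5 * 0.0259  # n * v_t in volts (assumed n = 1.5, ~300 K)
BODY_COEFF = 0.1     # assumed: VT falls 0.1 V per volt of forward body bias
VB_STEP = 0.005      # assumed body-bias change per beat-clock cycle (V)
VB_MAX = 0.4         # assumed ceiling on forward bias (V)


def pla_delay(vdd: float, vt0: float, vbody: float, d0: float = 1e-9) -> float:
    """Hypothetical PLA delay (s), scaling as exp((VT - VDD) / (n * v_t))."""
    vt = vt0 - BODY_COEFF * vbody  # forward body bias lowers the threshold voltage
    return d0 * math.exp((vt - vdd) / N_VT)


def lock_body_bias(vdd: float, vt0: float, t_beat: float,
                   vbody: float = 0.0, cycles: int = 200) -> float:
    """Each beat-clock cycle, compare the representative PLA's delay with the beat period."""
    for _ in range(cycles):
        if pla_delay(vdd, vt0, vbody) > t_beat:
            vbody = min(vbody + VB_STEP, VB_MAX)  # too slow: add forward body bias
        else:
            vbody = max(vbody - VB_STEP, 0.0)     # fast enough: back off toward zero bias
    return vbody


T_BEAT = 10e-9
vb_nom = lock_body_bias(0.20, 0.30, T_BEAT)                 # nominal VDD
vb_high = lock_body_bias(0.22, 0.30, T_BEAT, vbody=vb_nom)  # VDD rises
vb_low = lock_body_bias(0.18, 0.30, T_BEAT, vbody=vb_high)  # VDD droops
print(f"body bias: {vb_nom:.3f} V, {vb_high:.3f} V, {vb_low:.3f} V")
```

With these numbers the bias settles near 0.11 V at VDD = 0.2 V, returns to zero when VDD rises to 0.22 V (the PLA then meets the beat period without any bias), and climbs to about 0.31 V when VDD falls to 0.18 V, dithering by one step around each lock point.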
10
PLA Structure
We use precharged NOR-NOR PLAs as the structure of choice.
Wordlines run horizontally.
Inputs (and their complements) and the outputs run vertically.
Several PLAs in a cluster share a common Nbulk node.
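At the logic level, a precharged NOR-NOR PLA can be modeled as two NOR planes in series. This Python sketch is a hypothetical functional model with no timing or electrical behavior; the line numbering, the cube encoding and the names `evaluate_nor_plane` and `eval_pla` are assumptions for illustration. Whether the physical PLA adds an output inverter to recover the true sum-of-products is not stated here, so the function returns the raw NOR outputs.

```python
from itertools import product
from typing import Dict, List, Sequence


def evaluate_nor_plane(lines: Sequence[bool], connections: List[List[int]]) -> List[bool]:
    """One precharged NOR plane: each row node is precharged high, then discharged
    during evaluation if any line driving one of its pull-down transistors is high."""
    nodes = []
    for row in connections:
        node = True                     # precharge phase
        if any(lines[k] for k in row):  # evaluate phase
            node = False
        nodes.append(node)
    return nodes


def eval_pla(inputs: Sequence[bool], cubes: List[Dict[int, bool]],
             outputs: List[List[int]]) -> List[bool]:
    """Evaluate a NOR-NOR PLA. cubes[r] maps input index -> required value for wordline r;
    outputs[o] lists the wordlines connected to output line o."""
    # Vertical input lines: line 2*i carries x_i, line 2*i + 1 carries its complement.
    lines: List[bool] = []
    for x in inputs:
        lines += [bool(x), not x]
    # A NOR of the complemented literals equals the AND of the literals,
    # so each required literal connects the wordline to the opposite line.
    and_plane = [[2 * i + (1 if v else 0) for i, v in cube.items()] for cube in cubes]
    wordlines = evaluate_nor_plane(lines, and_plane)
    # Each output line is the NOR of its connected wordlines (complement of the SOP).
    return evaluate_nor_plane(wordlines, outputs)


# Two cubes, a*b' and a'*b, both feeding output 0: the NOR output is XNOR(a, b).
CUBES = [{0: True, 1: False}, {0: False, 1: True}]
for a, b in product([False, True], repeat=2):
    print(a, b, eval_pla([a, b], CUBES, [[0, 1]])[0])
```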
11
The Charge Pump
12
Effectiveness of the Approach
We simulated a single PLA from 0°C to 100°C, and also applied VT variations (10%) and VDD variations (10%).
The light region shows the variation in delay over all the corners.
The red region shows the delays with the self-adjusting body-bias circuit.
13
An Example Showing Phase Locking
This figure shows how the body bias (and hence the delay of the PLA) changes with changes in VDD.
The adjustment is very quick (within a few clock cycles).
[Figure annotations: VDD change from 0.2 V to 0.22 V; VDD change from 0.22 V to 0.18 V]
14
What about Energy Minimization
Minimum power does not mean minimum energy…
We are interested in minimum-energy operation for the envisioned application scenario.
15
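What about Energy?
Minimizing VDD reduces power, but minimum VDD does not mean minimum energy!
As VDD falls, delay grows exponentially, so the leakage energy accumulated over each (longer) operation eventually outweighs the savings in dynamic energy.
There exists an optimum VDD for minimum energy.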
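A first-order model shows why an interior optimum appears: dynamic energy scales as C·VDD², while leakage energy is I_leak·VDD·t_op, and t_op grows exponentially as VDD drops. This Python sketch uses that model with made-up parameters (`C_SW`, `LEAK_RATIO`, `DELAY_FACTOR`, `VTH`, `I0` are all assumptions, not the authors' simulation setup). The sweep is limited to an assumed window of 0.1 V to 0.6 V, since this idealized model ignores functional failure at very low VDD.

```python
import math

N_VT = 1.5 * 0.0259  # n * v_t (V), assumed
V_THERMAL = 0.0259   # kT/q at ~300 K (V)
VTH = 0.30           # threshold voltage (V), assumed
I0 = 1e-6            # current prefactor (A), assumed
C_SW = 10e-15        # switched capacitance per operation (F), assumed
LEAK_RATIO = 50.0    # total leaking width / switching width, assumed
DELAY_FACTOR = 20.0  # operation time in units of C * VDD / I_on, assumed


def i_sub(vgs: float, vds: float) -> float:
    """Sub-threshold current (A) of a unit-width device."""
    return I0 * math.exp((vgs - VTH) / N_VT) * (1.0 - math.exp(-vds / V_THERMAL))


def energy_and_power(vdd: float) -> tuple[float, float]:
    """Energy per operation (J) and average power (W) at supply vdd."""
    i_on = i_sub(vdd, vdd)                   # gate driven to VDD
    i_leak = LEAK_RATIO * i_sub(0.0, vdd)    # off devices leaking during the operation
    t_op = DELAY_FACTOR * C_SW * vdd / i_on  # operation time grows as VDD drops
    energy = C_SW * vdd ** 2 + i_leak * vdd * t_op
    return energy, energy / t_op


# Sweep the assumed window of 0.10 V to 0.60 V in 1 mV steps.
GRID = [0.100 + 0.001 * k for k in range(501)]
v_opt = min(GRID, key=lambda v: energy_and_power(v)[0])
print(f"minimum-energy VDD ~ {v_opt:.3f} V")
```

With these parameters the energy minimum lands near 0.31 V, while average power keeps falling all the way down the sweep, which is exactly the distinction between minimum power and minimum energy.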
16
Finding the Optimum VDD
While one level of PLAs is evaluating, the others are precharged.
The precharged PLAs consume leakage power.
Hence the optimum VDD depends on the logical depth.
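[Equation: total energy as a function of the logical depth D, combining the dynamic energy of the precharging and evaluating PLAs with the static power of the precharged and evaluated PLAs]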
17
The Optimum VDD
The optimum VDD increases with logical depth.
The optimum VDD can vary with temperature (since the circuits get faster at higher temperature).
The optimum VDD can be estimated given the logical depth and the delay of each PLA.
[Figure panels: 25°C and 100°C]
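The trend that the optimum VDD rises with logical depth can be reproduced with a toy model. This Python sketch is not the authors' energy expression: it assumes that every PLA leaks during every evaluation phase, uses made-up leakage and delay ratios (`LEAK_PER_PLA`, `PLA_DELAY_FACTOR`), and holds temperature fixed, so only the trend is meaningful.

```python
import math

N_VT = 1.5 * 0.0259      # n * v_t (V), assumed
LEAK_PER_PLA = 10.0      # one PLA's leaking width / switching width, assumed
PLA_DELAY_FACTOR = 20.0  # one PLA evaluation in units of C * VDD / I_on, assumed


def energy_per_result(vdd: float, depth: int) -> float:
    """Energy (in units of C) for one result to ripple through `depth` PLA levels.

    Assumed model: every level switches C * VDD^2 once, and during each of the
    `depth` evaluation phases all `depth` PLAs leak (the evaluating one plus
    the precharged ones waiting for their turn).
    """
    leak_to_on = LEAK_PER_PLA * math.exp(-vdd / N_VT)  # I_leak / I_on for one PLA
    e_dyn = depth * vdd ** 2
    e_leak = depth * depth * leak_to_on * PLA_DELAY_FACTOR * vdd ** 2
    return e_dyn + e_leak


def optimum_vdd(depth: int, lo: float = 0.10, hi: float = 0.60, step: float = 0.001) -> float:
    """Grid search for the minimum-energy VDD within an assumed operating window."""
    n_steps = int(round((hi - lo) / step))
    grid = [lo + step * k for k in range(n_steps + 1)]
    return min(grid, key=lambda v: energy_per_result(v, depth))


for d in (1, 4, 16):
    print(d, round(optimum_vdd(d), 3))
```

For depths 1, 4 and 16 the optimum comes out near 0.23 V, 0.30 V and 0.36 V: more precharged PLAs waiting means more leakage per result, which pushes the optimum toward a faster, higher-VDD operating point.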
18
Reclaiming Part of the Speed Penalty
19
Micropipelining
For high-speed operation, a network of PLAs can be implemented as an asynchronous micropipeline.
P1 triggers a precharge event; P2 triggers an evaluate event.
Latency increases, but throughput improves dramatically.
[Figure: handshaking logic]
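Why latency rises while throughput improves can be seen from a first-order timing model. In this hypothetical Python sketch, the per-level evaluate, precharge and handshake times, the `PlaTiming` class and the two cycle-time formulas are all assumptions: the non-pipelined network is taken to evaluate its levels in sequence and then precharge, while each micropipeline stage only needs to evaluate, hand off and precharge before accepting new data.

```python
from dataclasses import dataclass


@dataclass
class PlaTiming:
    """Per-level timing in ns; all values here are illustrative assumptions."""
    t_eval: float  # time for one PLA level to evaluate
    t_pchg: float  # time for one PLA level to precharge
    t_hs: float    # handshake overhead per level-to-level transfer


def non_pipelined(depth: int, t: PlaTiming) -> tuple[float, float]:
    """(latency, cycle time): levels evaluate in sequence, then all precharge together."""
    latency = depth * t.t_eval
    cycle = latency + t.t_pchg
    return latency, cycle


def micropipelined(depth: int, t: PlaTiming) -> tuple[float, float]:
    """(latency, cycle time): every transfer pays a handshake, but a new input can
    enter once the first stage has evaluated, handed off and precharged."""
    latency = depth * (t.t_eval + t.t_hs)
    cycle = t.t_eval + t.t_pchg + 2 * t.t_hs  # one handshake in, one out (assumed)
    return latency, cycle


timing = PlaTiming(t_eval=100.0, t_pchg=50.0, t_hs=20.0)
lat0, cyc0 = non_pipelined(10, timing)
lat1, cyc1 = micropipelined(10, timing)
print(f"latency {lat0:.0f} -> {lat1:.0f} ns, throughput gain {cyc0 / cyc1:.1f}x")
```

For a depth of 10 with these numbers, latency grows from 1000 ns to 1200 ns while the cycle time drops from 1050 ns to 190 ns, so throughput improves by about 5.5X.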
20
Micropipelining Results
| Ckt | Delay (ns), non-μ-pipelined | Delay (ns), μ-pipelined | Delay ratio (μ / non-μ) | Area (μm²), non-μ-pipelined | Area (μm²), μ-pipelined | Area ratio (μ / non-μ) |
|---|---|---|---|---|---|---|
| C432 | 2665 | 475 | 0.18 | 7392 | 10080 | 1.36 |
| C499 | 2665 | 475 | 0.18 | 9408 | 12096 | 1.29 |
| alu4 | 3340 | 475 | 0.14 | 9408 | 12768 | 1.36 |
| count | 1315 | 475 | 0.36 | 3360 | 4032 | 1.20 |
| rot | 3565 | 475 | 0.13 | 12768 | 21504 | 1.68 |
| apex6 | 2890 | 475 | 0.16 | 16128 | 24192 | 1.50 |
| C1908 | 4465 | 475 | 0.11 | 16128 | 24864 | 1.54 |
| c2670 | 4015 | 475 | 0.12 | 22848 | 31584 | 1.38 |
| c1355 | 3790 | 475 | 0.13 | 14112 | 20832 | 1.48 |
| c3540 | 8290 | 475 | 0.06 | 45024 | 75936 | 1.69 |
| c880 | 2665 | 475 | 0.18 | 10752 | 14112 | 1.31 |
| pair | 5140 | 475 | 0.09 | 43680 | 67200 | 1.54 |
| Avg | | | 0.1533 | | | 1.4444 |
We get an average speedup of 7X over a non-micropipelined design.
With micropipelining, sub-threshold circuits are only 1.5X–3.5X slower than their traditional (non-micropipelined) counterparts.
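The Avg row and the quoted speedup can be recomputed from the per-circuit numbers. This short Python check is only arithmetic on the table; the `RESULTS` layout is an illustrative encoding.

```python
# (non-μ-pipelined delay ns, μ-pipelined delay ns, non-μ-pipelined area, μ-pipelined area)
RESULTS = {
    "C432": (2665, 475, 7392, 10080),
    "C499": (2665, 475, 9408, 12096),
    "alu4": (3340, 475, 9408, 12768),
    "count": (1315, 475, 3360, 4032),
    "rot": (3565, 475, 12768, 21504),
    "apex6": (2890, 475, 16128, 24192),
    "C1908": (4465, 475, 16128, 24864),
    "c2670": (4015, 475, 22848, 31584),
    "c1355": (3790, 475, 14112, 20832),
    "c3540": (8290, 475, 45024, 75936),
    "c880": (2665, 475, 10752, 14112),
    "pair": (5140, 475, 43680, 67200),
}

n = len(RESULTS)
avg_delay_ratio = sum(d_mu / d for d, d_mu, _, _ in RESULTS.values()) / n
avg_area_ratio = sum(a_mu / a for _, _, a, a_mu in RESULTS.values()) / n
avg_speedup = sum(d / d_mu for d, d_mu, _, _ in RESULTS.values()) / n
print(f"delay ratio {avg_delay_ratio:.4f}, area ratio {avg_area_ratio:.4f}, "
      f"mean speedup {avg_speedup:.2f}X")
```

The recomputed averages match the table's Avg row to about three decimal places. The mean per-circuit speedup comes out near 7.9X, and the reciprocal of the average delay ratio is about 6.5X; the 7X figure quoted above sits between these two ways of averaging.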
21
Layout of the PLA
Each PLA has 16 inputs, 14 outputs and 24 rows (cubes).
22
Ambient Light Powered ICs
The approach lends itself to being powered by energy scavenged from ambient light.
Early studies show that this is feasible.
New cadmium sulfide/cadmium telluride (CdS/CdTe) solar panels achieve 0.09 W/cm² (silicon panels produce 0.015 W/cm²).
The estimated power consumption of a sub-threshold processor of this size is about 10 mW.
So a 1 cm² CdS/CdTe panel (about 90 mW) could power our processor with a 9X safety margin.
Challenges include how to store energy (battery? Supercapacitors? MIM capacitors?).
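The 9X figure follows from simple arithmetic if one assumes a 1 cm² panel under the quoted illumination, with no storage or conversion losses. The Python helpers below (`safety_margin` and `area_for_margin` are hypothetical names) only restate that arithmetic.

```python
CDTE_W_PER_CM2 = 0.09      # CdS/CdTe panel output quoted above
SILICON_W_PER_CM2 = 0.015  # silicon panel output quoted above
PROCESSOR_W = 10e-3        # estimated processor power (10 mW)


def safety_margin(w_per_cm2: float, area_cm2: float, load_w: float) -> float:
    """Ratio of harvested power to load power (assumes no storage or conversion losses)."""
    return w_per_cm2 * area_cm2 / load_w


def area_for_margin(w_per_cm2: float, load_w: float, margin: float) -> float:
    """Panel area (cm^2) needed to supply load_w with the given margin."""
    return margin * load_w / w_per_cm2


cdte = safety_margin(CDTE_W_PER_CM2, 1.0, PROCESSOR_W)
silicon = safety_margin(SILICON_W_PER_CM2, 1.0, PROCESSOR_W)
si_area = area_for_margin(SILICON_W_PER_CM2, PROCESSOR_W, 9.0)
print(f"CdS/CdTe: {cdte:.1f}x, silicon: {silicon:.1f}x, silicon area for 9x: {si_area:.1f} cm^2")
```

Under the same assumptions a 1 cm² silicon panel would give only about a 1.5X margin, and would need about 6 cm² to match the 9X margin.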
23
What Next?
Explore extensions to structured ASIC approaches.
Fabricate a sub-threshold design (in 2006):
Mixed-signal, with a small processor and transceiver on a single die.
Set up a small hardware lab for debug/diagnosis.
Validate the experiments we discussed.
We hope to use this test chip to validate other ideas as well.
Develop a design methodology for sub-threshold electronics, tuned for widespread use.
24
Summary
Sub-threshold circuit design is promising due to its extremely low power.
The delay phase-locking approach helps sub-threshold logic design overcome the hurdle of sensitivity to PVT variations. This can help achieve a significant yield improvement.
The study on optimum VDD for minimum energy helps fix an optimum VDD for a given logical depth.
Micropipelining helps bridge the delay gap.
Sub-threshold design approaches are appealing for a widening class of low-power or low-energy applications.
Goal: help bring sub-threshold logic design into the mainstream of VLSI technology.
25
Thank you!!