Low Power Techniques
Contents
1. Introduction
2. Why low power?
3. Future Opportunities for Low-Power
4. How to reduce power
1. Introduction
1) Drivers for IC progress
• Silicon is the winner, and among the many options, CMOS is the winner.
• It will stay that way for at least the next 25 years.
[Figure: FPGA, P, and full custom compared along the axes Power, Delay, Flexibility^-1 (Programmability), Reliability^-1, Design TAT, Cost, Size]
There’s no show stopper (in technology)!
ex. quantum mechanics / thermodynamics (min. switching energy, power dissipation), electromagnetics (speed of light), material, etc.
The one exception is the multi-billion-dollar investment cost.
Moore’s law will keep being honored. Why?
1. No insurmountable obstacle exists.
2. People believe it and behave accordingly.
• A huge opportunity exists, but only if we do well at exploiting
1) cross-breeding, co-utilization and co-development among interacting technologies
2) technology sharing over the network
2) Big Picture: If power reduction is THE goal, you need to visit all areas to achieve it.
[Table: rows are the design levels algorithm, architecture, logic, circuit, device, process, material, S/W; columns are Speed, Power, Design time, Fab. cost, Programmability]
Analogy: vertical engineer vs. horizontal engineer. If you want to sell a graphics chip, you need to do whatever helps achieve that, from design and application to marketing.
[Figure: vertical engineer vs. horizontal engineer; labels: graphics, wireless, giga-bit switch; marketing, application, legal affairs (IP), manufacturing, verification, design, testing, simulation; MPEG, RAMBUS, process tuning, P-core]
2. Why low power?
1) Battery technology advances slowly: a 5-8x improvement in 200 years.
200 years ago: lead-acid battery, 25 Wh/kg
now: lithium polymer battery, 200 Wh/kg
By comparison, semiconductor technology has improved 10^6 times in 30 years (CPU speed) and 4 times every 3 years (memory density). Still a wild, wild frontier stretching before us!
2) Heat dissipation: You don’t want a big cooling tower for each IC!
3) Energy saving: minimize the amount of energy consumed and the recirculation period; otherwise our earth will be EXHAUSTED.
4) Convenience: too many wires around make a mess.
3. Future Opportunities for Low-Power
1) PDA (Personal Digital Assistant): telephone, pager, pen-based input, schedule keeper, audio/video entertainment, fax, video camera, data security with fingerprint and/or voice recognition, speech recognition, application S/W, teleconferencing…
2) Tablet (descendant of the current notebook)
[Figure: Appl. Server, Base Station (RF), and PDA; function sharing for “low-power”ing the PDA]
3) Virtual Reality (VR) headset for games
: allows you to move around, but only if there is no wire.
: delegates complex processing to a fixed server, while performing only video decompression itself.
4) Military: no chance for wires, and no heavy batteries, as you are too busy.
– Information warfare:
1) soldier locates an enemy tank using a laser rangefinder with GPS
2) sends a request (for an airstrike) to the control officers
3) a nearby aircraft gets the command
5) Pico-cell based home network for Games
Get all available services and allow all possible communication among home devices, but with no messy wires.
[Figure: home network; labels: Cable, xDSL, Satellite, FTTH; home automation; home cellular; A/V digital network; phone & TV I/F; temp control, security; PDA, cellular video-phone; HDTV, VCR, game, camera, printer]
6) Medical uses
pacemaker (implanted)
health monitor
hearing aids
7) GPS (for travellers/explorers, drivers (car, ship, boat), soldiers…)
8) RF ID (for identifying people, animals, cars…)
passive type: resonant LC circuits
active type (no battery, draws power from the RF field)
9) Smart cards:
national ID card, cash withdrawal
encryption, COS (card OS)
4. How to reduce Power
• By all means possible: algorithm, S/W, architecture, data representation, logic & circuits, place & route, clock, process, library, material
1) Algorithm:
adjusting the # of taps (N) in FIR filters by measuring noise power.
[Figure: filter transfer function for N=10 and for N=6 (low power)]
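One way to picture the tap-count adjustment is the sketch below. It is only an illustration: the noise threshold, the default tap counts of 6 and 10 (borrowed from the figure labels), and the plain truncation of the coefficient list are assumptions, not the method behind the slide.

```python
def choose_taps(noise_power, threshold, n_low=6, n_high=10):
    """Pick the FIR length: use the long filter only when measured noise is high."""
    return n_high if noise_power > threshold else n_low


def fir_filter(samples, coeffs, n_taps):
    """Direct-form FIR that uses only the first n_taps coefficients.

    Returns the output samples and the number of multiplications performed,
    used here as a rough proxy for switching activity.
    """
    taps = coeffs[:n_taps]
    outputs, multiplies = [], 0
    for i in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(taps):
            if i - k >= 0:  # skip taps that reach before the first sample
                acc += c * samples[i - k]
                multiplies += 1
        outputs.append(acc)
    return outputs, multiplies
```

Calling `fir_filter` with the value returned by `choose_taps` shows how the multiplication count, and with it the switching activity, drops when the shorter filter is good enough.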
2) Software: similar to reducing code size and improving execution speed
– instruction selection and ordering (the compiler’s job) to minimize bus switching
– minimize memory space & accesses (reduce cache misses)
– codesign for low power
– slow down clock
– halt clock
– lower VDD
– shut down
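3) Architectures
• Parallel architecture
– Switching power
[Figure: a datapath at (VDD, f) replaced by n parallel datapaths at (VDD/n, f/n), with MUXes]
$$P_1 = C V_{DD}^2 f$$
$$P_n = nC \left(\frac{V_{DD}}{n}\right)^2 \frac{f}{n} = \frac{P_1}{n^2}$$
For the same speed:
$$f \propto \frac{1}{t} = \frac{i}{C V_{DD}} \propto \frac{(V_{DD}-V_T)^2}{C V_{DD}} \sim V_{DD}$$
Sacrifice area for low power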
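The parallel-architecture scaling can be checked numerically with a small sketch. The function names and the example values are assumptions for illustration, and the model keeps the slide's idealization that frequency scales linearly with VDD.

```python
def switching_power(c, vdd, f):
    """Dynamic switching power P = C * VDD^2 * f."""
    return c * vdd ** 2 * f


def parallel_power(c, vdd, f, n):
    """Total power of n copies of the datapath, each run at VDD/n and f/n."""
    return n * switching_power(c, vdd / n, f / n)
```

For example, `parallel_power(1e-12, 3.3, 100e6, 2) / switching_power(1e-12, 3.3, 100e6)` evaluates to about 0.25, i.e. the 1/n² factor for n = 2, paid for with roughly n times the area.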
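• Pipelining
i) If VDD is scaled to 1/n, speed also drops to 1/n.
ii) With n pipeline stages, the logic complexity of each stage becomes 1/n, so speed (throughput) becomes n times higher.
iii) Overall speed is therefore maintained. (The overhead term in the formula is the pipelining overhead, e.g. mismatch among stage delays.)
[Figure: a datapath at (VDD, f) replaced by a datapath with pipeline latches at (VDD/n, f)]
$$P_1 = C V_{DD}^2 f$$
$$P_n = C \left(\frac{V_{DD}}{n}\right)^2 f\,(1+\text{overhead}) = \frac{P_1}{n^2}\,(1+\text{overhead})$$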
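A short sketch shows how much pipelining overhead the 1/n² saving can absorb. The function names are hypothetical, and the overhead is modelled as a single multiplicative factor on power, which is an assumption.

```python
def pipelined_power_ratio(n, overhead=0.0):
    """P_n / P_1 for an n-stage pipeline run at VDD/n and the original f."""
    return (1.0 + overhead) / n ** 2


def break_even_overhead(n):
    """Overhead at which the pipelined design no longer saves any power."""
    return n ** 2 - 1.0
```

With n = 2 and a 10% overhead the ratio is 0.275, so the design still uses well under a third of the original power; the saving vanishes only when the overhead reaches n² − 1.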
• Minimizing switching power consumption on the bus: $\alpha C V^2 f$
• Decreasing effective capacitance = $\alpha$ (activity) × physical capacitance
– activity-driven bus placement
– priority for placing buses (route, layer)
[Figure: display data bus and SRAM data/address buses (mostly READ operations, mostly sequential access), marked "small" and "large", with the distance from core to pads]
• V (voltage swing) reduction
- low-swing bus
ex. GTL (Xerox), CTT (Mosaid), JTL (Jedec), LVTTL, LVCTT…
- charge-recycling bus
[Figure: two I/F blocks converting between hi-V and low-V; labels: large C, small C, high V; swing annotated as about 0.1 VDD]
• Bus-invert encoding:
- send the inverted word when the majority of bits would switch, and de-invert it at the receiver.
[Figure: source data → polarity decision logic and EX-OR → DATA bus plus polarity signal → EX-OR → received data]
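Bus-invert encoding can be sketched in a few lines. This is a minimal illustration under stated assumptions: the bus starts at all zeros, the decision looks only at the data lines (toggles of the polarity line itself are ignored), and all function names are made up for the example.

```python
def hamming(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width):
    """Return (bus_value, polarity) pairs; polarity=1 means the word was inverted."""
    mask = (1 << width) - 1
    prev_bus = 0  # assumed initial bus state
    encoded = []
    for w in words:
        if 2 * hamming(prev_bus, w) > width:  # majority of lines would switch
            bus, polarity = ~w & mask, 1
        else:
            bus, polarity = w, 0
        encoded.append((bus, polarity))
        prev_bus = bus
    return encoded


def bus_invert_decode(encoded, width):
    """Undo the inversion at the receiver using the polarity signal."""
    mask = (1 << width) - 1
    return [(~bus & mask) if polarity else bus for bus, polarity in encoded]


def data_line_toggles(bus_values, start=0):
    """Total number of data-line transitions for a sequence of bus values."""
    total, prev = 0, start
    for v in bus_values:
        total += hamming(prev, v)
        prev = v
    return total
```

For the 8-bit sequence `[0x00, 0xFF, 0x0F]` the raw bus toggles 12 data lines, while the encoded bus toggles 4, at the cost of one extra polarity wire.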
• F (frequency) lowering:
- route a slow f/N master clock
- multiply it by N using a PLL before distribution
[Figure: f/N master clock feeding three PLLs]
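4) Data representation
• Gray code vs. binary 2’s (or 1’s) complement
# of toggles ratio when counting through all $2^n$ codes (binary $B_n$ vs. Gray $G_n$):
$$\frac{B_n}{G_n} = \frac{2(2^n - 1)}{2^n} \approx 2$$
• Signed magnitude vs. 2’s complement
signed magnitude: only the sign bit changes at a zero crossing
2’s complement: full switching at a zero crossing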
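The toggle ratio between binary and Gray counting is easy to confirm by brute force. The sketch below counts bit flips over one full counting cycle, including the wrap-around back to zero; the function names are illustrative only.

```python
def to_gray(i):
    """Standard reflected binary Gray code of i."""
    return i ^ (i >> 1)


def cycle_toggles(codes):
    """Total bit flips when stepping through codes once and wrapping around."""
    successors = codes[1:] + codes[:1]
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, successors))


def toggle_counts(n):
    """Return (binary toggles, Gray toggles) for an n-bit counter cycle."""
    binary = list(range(2 ** n))
    gray = [to_gray(i) for i in binary]
    return cycle_toggles(binary), cycle_toggles(gray)
```

For n = 4 this gives 30 toggles for binary and 16 for Gray, matching 2(2^n − 1) and 2^n.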
5) Logic
• Signal gating: masking unwanted switching activity so that it does not propagate forward and cause unnecessary power dissipation.
• The additional power spent generating the control signal should be small, so the control signal needs to switch more slowly than the signals it gates.
• Logic encoding: binary vs. Gray code for counters
• State encoding
E(M1) = expected # of switchings per transition = 2(0.3+0.4)+1(0.1+0.1) = 1.6
E(M2) = 1(0.3+0.4+0.1)+2(0.1) = 1.0
- assign don’t cares to either 1 or 0 for low switching
[Figure: the same three-state machine under two encodings, M1 (state codes 11, 00, 01) vs. M2 (state codes 01, 00, 11), with transition probabilities 0.3, 0.4, 0.1, 0.1, 0.1 on the edges]
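The expected-switching figures for the two encodings can be reproduced with a short sketch. The state names A, B, C and the direction of each edge are assumptions chosen to be consistent with the probabilities and state codes on the slide; the original state diagram is not recoverable here.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length code strings."""
    return sum(x != y for x, y in zip(a, b))


def expected_switching(encoding, transitions):
    """Expected number of state-bit flips per transition."""
    return sum(p * hamming(encoding[s], encoding[t])
               for (s, t), p in transitions.items())


# Hypothetical transition structure (edge directions are assumed).
TRANSITIONS = {("A", "B"): 0.3, ("B", "A"): 0.4, ("B", "C"): 0.1, ("C", "A"): 0.1}
M1 = {"A": "11", "B": "00", "C": "01"}
M2 = {"A": "01", "B": "00", "C": "11"}
```

Under these assumptions `expected_switching(M1, TRANSITIONS)` comes to 1.6 and `expected_switching(M2, TRANSITIONS)` to 1.0: giving the most frequently exchanged states codes at Hamming distance 1 is what lowers the switching.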
• Precomputation logic:
– saves power by masking uninfluential input signals into the combinational logic, using a precomputation logic g(x).
– I.e., for the output f(x), there may be conditions under which f(x) is independent of some set of input signals latched in R2; R2 can then be disabled according to g(x).
ex.) Binary comparator: f(A,B) = 1 if A>B
g(x) = A_n ⊕ B_n (when the MSBs differ, they alone decide the output)
• Systematic method to derive a pre-computation function g(x), given f(x), R1 and R2
• Let f(p1,…,pm, x1,…,xn) be a Boolean function, where p1,…,pm are the pre-computed inputs corresponding to R1 and x1,…,xn are the gated inputs corresponding to R2.
• Let $f_{x_i}$ ($f_{\bar{x}_i}$) be the Boolean function obtained by setting xi=1 (xi=0) in f.
• Define $U_{x_i} f = f_{x_i} \cdot f_{\bar{x}_i}$ (the universal quantification of f w.r.t. xi).
• Then $U_{x_i} f = 1$ implies f=1 regardless of the value of xi, because $U_{x_i} f = 1$ means $f_{x_i} = f_{\bar{x}_i} = 1$ in the Shannon decomposition of f w.r.t. xi:
$$f = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$$
• Let $g_1 = U_{x_1} U_{x_2} \cdots U_{x_n} f$.
Then g1=1 implies that f=1 regardless of the values of x1,…,xn. I.e., g1=1 is one of the conditions under which f is independent of the input values x1,…,xn.
• Similarly, let $g_0 = U_{x_1} U_{x_2} \cdots U_{x_n} \bar{f}$. Then g0=1 implies that f=0 regardless of x1,…,xn.
• Then g = g1 + g0 is the pre-computation function. I.e., if g=1, we can disable the loading of x1,…,xn into R2, because the output f is independent of the gated inputs.
• g computed this way may not be the only pre-computation function, but its truth table contains the largest number of 1’s among all pre-computation functions.
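For small functions the derivation of g can be carried out by brute force over the truth table, since quantifying universally over all gated inputs is the same as AND-ing f over every assignment of x1,…,xn. The sketch below assumes f takes the m pre-computed inputs first and the n gated inputs after them; the function names and the 2-bit comparator are illustrative choices.

```python
from itertools import product


def precomputation_function(f, m, n):
    """Return (g1, g0, g) as dicts keyed by the tuple of m pre-computed inputs."""
    g1, g0, g = {}, {}, {}
    for p in product((0, 1), repeat=m):
        outs = [f(*p, *x) for x in product((0, 1), repeat=n)]
        g1[p] = int(all(outs))      # f = 1 for every value of the gated inputs
        g0[p] = int(not any(outs))  # f = 0 for every value of the gated inputs
        g[p] = g1[p] | g0[p]        # f is independent of the gated inputs
    return g1, g0, g


def comparator(a1, b1, a0, b0):
    """2-bit comparator: 1 if A > B, with the MSBs (a1, b1) as pre-computed inputs."""
    return int(2 * a1 + a0 > 2 * b1 + b0)
```

Running `precomputation_function(comparator, 2, 2)` yields g = 1 exactly when a1 and b1 differ, i.e. the XOR of the MSBs, which agrees with the comparator example.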
• Examples
1) Precomputation architecture based on Shannon’s decomposition:
$$f(x_1,\dots,x_n) = x_i \cdot f_{x_i} + \bar{x}_i \cdot f_{\bar{x}_i}$$
2) Latch-based pre-computation architecture
6) Low Power Circuits
• Use static rather than dynamic circuits to avoid unnecessary precharge
• Low static power
– self reverse bias for reducing subthreshold current
[Figure: self-reverse-biased word line drivers; labels: S, VDD, Pc (Wc), X, I1, I2, Pdi; S=0 (active), S=1 (standby); inset plot of ln ID vs. VGS with the active and standby points]
• Compromise between dynamic and leakage power dissipation
• Multi-VT (threshold):
speed-critical part: low VT
power-critical part: high VT
- by back-gate bias: routing is difficult
- by additional implant
• Adiabatic computing:
Power dissipation is due to the voltage drop across R, so reduce it by a gradual rise & fall of the inputs (multi-step clock waveform).
[Figure: RC circuit (R, C) driven by a multi-step waveform]
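A standard textbook result makes the benefit of a multi-step waveform concrete: charging C to V in a single step dissipates CV²/2 in R, while N equal steps dissipate CV²/(2N) in total. The sketch below assumes every step settles fully before the next one begins (step duration much longer than RC); the function names are illustrative.

```python
def conventional_energy(c, v):
    """Energy dissipated in R when C is charged to V in a single step."""
    return 0.5 * c * v ** 2


def stepwise_energy(c, v, steps):
    """Energy dissipated in R when C is charged to V in equal voltage steps."""
    step_v = v / steps
    # Each step is an ordinary RC charge across a step_v voltage drop.
    return steps * 0.5 * c * step_v ** 2
```

With four steps the dissipation falls to one quarter of the single-step value, independent of R, which is why a gradual rise and fall of the inputs saves energy.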
• Delay vs. power supply voltage (Td vs. VDD)
$$T_d \propto V_{DD}^{-1}$$
• Power delay product(Energy) vs. delay for various circuits
7) Power reduction in clock network
• Why bother with the clock network?
– In a synchronous circuit, the clock is generally the highest-frequency signal.
– And the clock typically drives a large load, as it has to reach many sequential elements.
– In the Alpha chip, power consumption in the clock network is 40% of the total.
• Clock gating:
– The most popular method for reducing the power of clock signals
– Effective when some functional module (ALU, memory, FPU, etc.) is not required for an extended period.
– A gated clock suffers additional gate delay due to the gating function.
• Reduced clock swing:
– Conventional vs. half-swing clocking
– Charge sharing circuit for half-swing clock
When CLK is low:
$$V_H = \frac{C_A + C_1}{C_A + C_B + C_1 + C_4}\, V_{dd}$$
When CLK is high:
$$V_H = \frac{C_A + C_2}{C_A + C_B + C_2 + C_3}\, V_{dd}$$
$V_H \approx 0.5\, V_{dd}$ if $C_A = C_B \gg C_1, C_2, C_3, C_4$
– Simple charge sharing circuit
• Tri-state keeper circuit:
– A floating node with its potential somewhere between GND and VDD is noise-sensitive and can cause DC power dissipation in the fanin gate
– Floating bus suppressor circuit
• Blocking gate
– Fanin gates connected to a floating node (floating because its driver is powered down) can experience large short-circuit current.
– Use a blocking NAND gate.
• Reduction of switching activity:
– Guarded evaluation: adding latches or blocking gates before C/L if its outputs are not used.
– Careful bus multiplexing for positively correlated data streams
– Aggressive bus multiplexing for negatively correlated data streams
8) Process:
• VDD reduction → reduce VT
• Reduce standby current → VT not too small (these two conflict)
• Reduce leakage current → junction profile, steep subthreshold slope
• Reduce switching power → reduce parasitic C (same goal as for high speed): retrograded channel, trench, sidewall spacer for S/D implant
9) Library:
• Small size
• Various sizes for tr. sizing, for delay balancing → to reduce glitches
• Long interconnect on a low-C layer → to reduce buffer size
10) Material
• low-permittivity inter-layer dielectric
• low-resistivity material for interconnect: copper
[Figure: large C vs. small C]