CMOS Technology
for Computer Architects
Lecture 4: Static Complementary CMOS
Iakovos Mavroidis
Giorgos Passas
Manolis Katevenis
FORTH-ICS (University of Crete)
2 CMOS Technology for Computer Architects
Spring 2012 – Lecture 4
Recap: Propagation delay
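Based on previous calculations:
$$t_{HL} = \frac{C_L\,\Delta V}{I_{AV}} = \frac{2\,L\,C_L\,V_{DD}}{k'_n\,W_n\,(V_{DD} - V_{Tn})^2}$$
When VDD >> VTn:
$$t_{HL} \approx \frac{2\,L\,C_L}{k'_n\,W_n\,V_{DD}}, \qquad t_{LH} \approx \frac{2\,L\,C_L}{k'_p\,W_p\,V_{DD}}$$
Linear dependence on L/W, CL and 1/VDD
VT impacts time
$$t_{LH} = t_{HL} \;\Rightarrow\; \frac{W_p}{W_n} \approx \frac{k'_n}{k'_p}, \qquad W_p \approx 3\,W_n$$
Same result for good noise margins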
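The scaling claims in this recap can be checked numerically. The sketch below evaluates the recap formula for tHL; the device values (k'n, VTn, dimensions, load) are illustrative assumptions and do not come from the lecture.

```python
def t_hl(L, W_n, C_L, V_DD, V_Tn, k_n):
    """High-to-low delay from the recap: 2 L C_L V_DD / (k'_n W_n (V_DD - V_Tn)^2)."""
    return 2.0 * L * C_L * V_DD / (k_n * W_n * (V_DD - V_Tn) ** 2)


def matched_pmos_width(W_n, k_n, k_p):
    """PMOS width giving t_LH = t_HL in the V_DD >> V_T limit: W_p = W_n * k'_n / k'_p."""
    return W_n * k_n / k_p


# Assumed, illustrative values (not taken from the slides).
params = dict(L=0.25e-6, W_n=0.5e-6, C_L=10e-15, V_DD=2.5, V_Tn=0.4, k_n=115e-6)

base = t_hl(**params)
wider = t_hl(**{**params, "W_n": 2 * params["W_n"]})    # doubling W halves the delay
heavier = t_hl(**{**params, "C_L": 2 * params["C_L"]})  # doubling C_L doubles the delay

print(f"t_HL = {base * 1e12:.1f} ps, 2x W: {wider * 1e12:.1f} ps, 2x C_L: {heavier * 1e12:.1f} ps")
print("W_p / W_n for k'_n / k'_p = 3:", matched_pmos_width(1.0, 3.0, 1.0))
```

The dependence on VDD is only approximately 1/VDD here, because the sketch keeps the (VDD - VTn) term rather than taking the VDD >> VTn limit.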
Recap: RC Model for Inverter (Self Load)
[Figure: inverter RC model with input Vin, supply VDD, pull-down resistance Rn, and load CL at the output Vout]
Vout(t) = (1 – e –t/RC) V
T50% = 0.69 Rn CL
T90% = 2.2 Rn CL
Rn = R0 / W
CL = Cint(min) W + Cext
T50% = 0.69 (Rmin Cint(min) + Rmin Cext / W), taking R0 = Rmin, the resistance of the minimum-size device
When W >> 1 the delay is almost constant, T50% ≈ 0.69 Rmin Cint(min), but Cgate becomes large
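The effect of upsizing can be seen by sweeping W in the RC model. This is a minimal sketch; the values of Rmin, Cint(min) and Cext are assumptions chosen only to make the trend visible.

```python
def t50(W, r_min, c_int_min, c_ext):
    """50% delay of an inverter scaled by W: 0.69 * Rn * CL."""
    r_n = r_min / W                  # Rn = Rmin / W
    c_l = c_int_min * W + c_ext      # CL = Cint(min) * W + Cext
    return 0.69 * r_n * c_l


# Assumed, illustrative values (not taken from the slides).
R_MIN = 10e3        # ohms
C_INT_MIN = 1e-15   # farads
C_EXT = 20e-15      # farads

floor = 0.69 * R_MIN * C_INT_MIN     # limit for W >> 1 (self-load only)
for W in (1, 2, 4, 8, 16, 64):
    delay = t50(W, R_MIN, C_INT_MIN, C_EXT)
    print(f"W = {W:3d}: T50% = {delay * 1e12:7.2f} ps (floor {floor * 1e12:.2f} ps)")
```

The delay falls quickly for small W and then flattens toward the self-load floor, while the input (gate) capacitance presented to the previous stage keeps growing with W.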
Recap: Cycle Time vs FO4
FO4 for IBM90nm ≈ 40ps (2002)
FO4 for 22nm ≈ 7ps (2011)
2.5GHz = 400ps Clock Cycle
Clock Cycle in 2002 ≈ 10 FO4
Clock Cycle in 2011 ≈ 57 FO4 (5+ times more)!
Fewer pipeline stages? More complex stages?
Fewer layout engineers? Fewer timing violations? Rely on slower CAD tools?
Decrease Voltage? Increase threshold voltage?
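The FO4 counts on this slide follow from dividing the clock period by the FO4 delay. A small sketch of that arithmetic, using only the numbers quoted above (the function name is hypothetical):

```python
def cycle_in_fo4(clock_hz, fo4_seconds):
    """Number of FO4 delays that fit in one clock period."""
    return (1.0 / clock_hz) / fo4_seconds


CLOCK_HZ = 2.5e9                                   # 2.5 GHz, i.e. a 400 ps cycle
period_ps = 1e12 / CLOCK_HZ

fo4_2002 = cycle_in_fo4(CLOCK_HZ, 40e-12)          # IBM 90nm, FO4 of about 40 ps
fo4_2011 = cycle_in_fo4(CLOCK_HZ, 7e-12)           # 22nm, FO4 of about 7 ps

print(f"period = {period_ps:.0f} ps")
print(f"2002: {fo4_2002:.1f} FO4 per cycle")
print(f"2011: {fo4_2011:.1f} FO4 per cycle ({fo4_2011 / fo4_2002:.1f}x more)")
```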
Recap: Two Inverters Layout
Lecture Contents
Static CMOS Logic
PUN & PDN
Transistor Sizing
Logical Effort
Delay of a single gate
Combinational vs. Sequential Logic
Combinational: Output = f(In)
Sequential: Output = f(In, Previous In)
[Figure: a combinational logic circuit (In to Out) next to a sequential circuit, which is a combinational logic circuit with a State element feeding back into it]
Static Logic and Complementary CMOS
At every point in time (except during the switching transients), each gate output is connected to either VDD or VSS via a low-resistance path
This is in contrast to the dynamic circuit style
The outputs of the gates assume at all times the value of the Boolean function implemented by the circuit
Complementary CMOS gates
High noise margins
No static power consumption
Comparable rise and fall times
Static Complementary CMOS
PUN and PDN are dual logic networks
PUN and PDN functions are complementary
Required transistors = 2N
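[Figure: inputs In1, In2, …, InN drive both a PUN connected to VDD and a PDN connected to GND; the two networks meet at the output F(In1,In2,…InN)]
PUN (PMOS only), pull-up: makes a connection from VDD to F when F(In1,In2,…InN) = 1
PDN (NMOS only), pull-down: makes a connection from F to GND when F(In1,In2,…InN) = 0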
Threshold Drops
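[Figure: four cases of charging or discharging CL through a single transistor, with source S, drain D and gate-source voltage VGS marked]
PDN with NMOS: output discharges fully, VDD → 0
PDN with PMOS: output discharges only down to |VTp|, VDD → |VTp|
PUN with PMOS: output charges fully, 0 → VDD
PUN with NMOS: output charges only up to VDD - VTn, 0 → VDD - VTn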
NMOS Transistors in Series/Parallel Connection
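A transistor can be thought of as a switch controlled by its gate signal
NMOS switch closes when switch control input is high
Series connection (AND): Y = X if A AND B (A • B)
Parallel connection (OR): Y = X if A OR B (A + B)
NMOS transistors pass a “strong” 0 but a “weak” 1
[Figure: two NMOS switches between X and Y, connected in series and in parallel, with gate inputs A and B]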
PMOS Transistors in Series/Parallel Connection
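PMOS switch closes when switch control input is low
Series connection (NOR): Y = X if A’ AND B’ = (A + B)’
Parallel connection (NAND): Y = X if A’ OR B’ = (A • B)’
PMOS transistors pass a “strong” 1 but a “weak” 0
[Figure: two PMOS switches between X and Y, connected in series and in parallel, with gate inputs A and B]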
Dual PUN and PDN
PUN is the dual of PDN
(can be shown using De Morgan’s theorems)
(AB)’ = A’ + B’: serial PDN = parallel PUN
(A+B)’ = A’ B’: parallel PDN = serial PUN
Static CMOS gates are always inverting
AND = NAND + INV
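The duality can be checked by brute force on small gates. The sketch below models a switch network as nested tuples, with "s" for series and "p" for parallel; this representation and the function names are assumptions made for illustration. It confirms that exactly one of the PDN and its dual PUN conducts for every input combination, and that the resulting gate is inverting.

```python
from itertools import product


def dual(net):
    """Swap series and parallel connections; leaves (input names) are unchanged."""
    if isinstance(net, str):
        return net
    kind, *branches = net
    return ("p" if kind == "s" else "s", *[dual(b) for b in branches])


def conducts(net, inputs, on_level):
    """True if the network conducts. on_level is 1 for NMOS switches, 0 for PMOS."""
    if isinstance(net, str):
        return inputs[net] == on_level
    kind, *branches = net
    results = [conducts(b, inputs, on_level) for b in branches]
    return all(results) if kind == "s" else any(results)


def gate_output(pdn, inputs):
    """Output of the static CMOS gate built from a PDN and its dual PUN."""
    up = conducts(dual(pdn), inputs, on_level=0)    # PMOS pull-up network
    down = conducts(pdn, inputs, on_level=1)        # NMOS pull-down network
    assert up != down, "exactly one network must conduct"
    return int(up)


nand_pdn = ("s", "A", "B")   # series NMOS gives a NAND
nor_pdn = ("p", "A", "B")    # parallel NMOS gives a NOR

for a, b in product((0, 1), repeat=2):
    ins = {"A": a, "B": b}
    print(a, b, "NAND:", gate_output(nand_pdn, ins), "NOR:", gate_output(nor_pdn, ins))
```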
CMOS NAND
CMOS NOR
Building CMOS Gate
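Construct the PDN: write F in the form F = (G(In1, In2, In3, …))’
Example: F = A + BC
A + BC = ((A+BC)’)’ = (A’(BC)’)’ = (A’(B’ + C’))’
i.e. G = A’(B’+C’)
[Figure: PDN implementing G, with the A’ transistor in series with the parallel pair B’, C’]
Construct dual PUN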
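The De Morgan rewrite in this example can be verified with a truth table. A minimal sketch, where F and G are the functions from the example and everything else is illustrative scaffolding:

```python
from itertools import product


def F(a, b, c):
    """Target function: F = A + BC."""
    return a or (b and c)


def G(a, b, c):
    """Pull-down condition from the derivation: G = A'(B' + C')."""
    return (not a) and ((not b) or (not c))


rows = [(a, b, c, bool(F(a, b, c)), bool(G(a, b, c)))
        for a, b, c in product((False, True), repeat=3)]

# F must equal the complement of G on every row.
all_match = all(f == (not g) for *_, f, g in rows)

for a, b, c, f, g in rows:
    print(int(a), int(b), int(c), "F =", int(f), "G =", int(g))
print("F == G' on all rows:", all_match)
```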
Complex CMOS Gate
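OUT = !(D + A • (B + C))
[Figure: PDN with D in parallel with the series connection of A and (B parallel to C); the PUN is its dual, with D in series with the parallel connection of A and (B in series with C)]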
Stick Diagrams
Contains no dimensions
Represents relative positions of transistors
[Figure: stick diagrams of an Inverter (In, Out) and a NAND2 (A, B, Out), each drawn between the VDD and GND rails]
Consistent Euler Path
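X = !(C • (A + B))
[Figure: transistor schematic of X and its logic graph, with the PUN drawn between VDD and X, the PDN between X and GND, and internal nodes i and j; the input ordering shown is A B C]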
OAI22 Logic Graph
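X = !((A+B)•(C+D))
[Figure: transistor schematic of the OAI22 gate and its logic graph, with the PUN drawn between VDD and X and the PDN between X and GND; the input ordering shown is A B C D]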
OAI22 Layout
[Figure: OAI22 layout with inputs A, B, C, D, output X, and the VDD and GND rails]
Some functions have no consistent Euler path, for example x = !(a + bc + de); reordering the terms as x = !(bc + a + de) gives one.
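The claim about x = !(a + bc + de) can be checked by brute force. The sketch below assumes that the PUN places its series groups in the order the terms are written, and it models each network as a graph whose edges are transistors labelled by their input; the node names (n1, n2, m1, m2) and function names are hypothetical. A consistent Euler path is an input ordering that is a continuous walk in both graphs.

```python
from itertools import permutations


def is_trail(order, edges):
    """True if the edges, taken in this order, form one continuous walk."""
    for start in edges[order[0]]:          # try both orientations of the first edge
        node, ok = start, True
        for label in order:
            u, v = edges[label]
            if node == u:
                node = v
            elif node == v:
                node = u
            else:
                ok = False
                break
        if ok:
            return True
    return False


def consistent_euler_paths(pun, pdn):
    """All input orderings that are Euler trails in both the PUN and the PDN."""
    return [order for order in permutations(pun)
            if is_trail(order, pun) and is_trail(order, pdn)]


# PDN: three parallel branches between X and GND (a, b-c in series, d-e in series).
pdn = {"a": ("X", "GND"), "b": ("X", "m1"), "c": ("m1", "GND"),
       "d": ("X", "m2"), "e": ("m2", "GND")}

# PUN for !(a + bc + de): series order a, (b || c), (d || e) from VDD to X.
pun_a_first = {"a": ("VDD", "n1"), "b": ("n1", "n2"), "c": ("n1", "n2"),
               "d": ("n2", "X"), "e": ("n2", "X")}

# PUN for !(bc + a + de): series order (b || c), a, (d || e) from VDD to X.
pun_bc_first = {"b": ("VDD", "n1"), "c": ("VDD", "n1"), "a": ("n1", "n2"),
                "d": ("n2", "X"), "e": ("n2", "X")}

print("a + bc + de:", consistent_euler_paths(pun_a_first, pdn))
print("bc + a + de:", consistent_euler_paths(pun_bc_first, pdn)[:1])
```

With five transistors there are only 120 orderings, so exhaustive search is enough here; larger gates would call for a proper Euler-trail algorithm.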
Delay Dependence on Input Pattern
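[Figure: schematic of the 2-input gate under test (inputs A and B, load CL) and its simulated output voltage (V) versus time (psec) for the three input patterns below]
NMOS = 0.5 µm / 0.25 µm, PMOS = 0.75 µm / 0.25 µm, CL = 10 fF

Input Data Pattern     Delay (psec)
A = B = 1→0            35
A = 1, B = 1→0         76
A = 1→0, B = 1         57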
Transistor Sizing
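Gate            NMOS widths        PMOS widths        Per-device resistance
Inverter        1                  2                  Rn = Rp
2-input NAND    2, 2 (series)      2, 2 (parallel)    Rn/2 per NMOS, Rp per PMOS
2-input NOR     1, 1 (parallel)    4, 4 (series)      Rn per NMOS, Rp/2 per PMOS
[Figure: transistor schematics of the three gates, each with output F]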
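The sizing rule behind these numbers is that devices in series are widened so that the worst-case path has the same resistance as the reference inverter. A sketch that generalizes the 2-input cases to n inputs, assuming resistance is inversely proportional to width and a reference inverter with NMOS width 1 and PMOS width 2 (function names are hypothetical):

```python
def nand_widths(n, wn=1, wp=2):
    """n-input NAND: n NMOS in series (each n * wn), n PMOS in parallel (each wp)."""
    return {"nmos": [n * wn] * n, "pmos": [wp] * n}


def nor_widths(n, wn=1, wp=2):
    """n-input NOR: n NMOS in parallel (each wn), n PMOS in series (each n * wp)."""
    return {"nmos": [wn] * n, "pmos": [n * wp] * n}


def series_resistance(widths, r_unit):
    """Resistance of devices in series, each contributing r_unit / width."""
    return sum(r_unit / w for w in widths)


# Normalization (assumed): a width-1 NMOS has resistance 1 and a width-1 PMOS
# has resistance 2, so the reference inverter has Rn = Rp = 1.
R_UNIT_N, R_UNIT_P = 1.0, 2.0

print("NAND2:", nand_widths(2), "pull-down R =",
      series_resistance(nand_widths(2)["nmos"], R_UNIT_N))
print("NOR2: ", nor_widths(2), "pull-up R   =",
      series_resistance(nor_widths(2)["pmos"], R_UNIT_P))
```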
Transistor Sizing a Complex CMOS Gate
OUT = !(D + A • (B + C))
[Figure: PDN and PUN of the gate annotated with transistor widths (1, 2, 2, 2, 2, 2, 4, 4, 6, 6, 12, 12)]
tp as a Function of Fan-In and Fan-Out
Complementary CMOS gates are not so fast:
Fan-in: quadratic due to increasing resistance and capacitance
Fan-out: each additional fan-out gate adds two gate capacitances to CL
tp = a1 FI + a2 FI² + a3 FO
Gates with FI ≥ 4 must be avoided
Fast Complex Gates: Design Technique 1
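Transistor ordering: connect the critical-path (latest-arriving) input to the transistor closest to the output
[Figure, left: critical-path input In1 (0→1) drives M1, next to ground, while In2 = In3 = 1; C1, C2 and CL all start charged]
delay determined by time to discharge CL, C1 and C2
[Figure, right: critical-path input In1 (0→1) drives M3, next to the output, while In2 = In3 = 1; C1 and C2 are already discharged and only CL is charged]
delay determined by time to discharge CL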
Fast Complex Gates: Design Technique 2
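Transistor sizing
as long as fan-out capacitance dominates
Progressive sizing
[Figure: N series NMOS transistors, from M1 (input In1, next to ground) up to MN (input InN, next to the output load CL), with internal node capacitances C1, C2, C3]
Distributed RC line (Elmore delay):
tpHL = 0.69 (R1C1 + (R1+R2)C2 + (R1+R2+R3)C3 + … )
M1 > M2 > M3 > … > MN
Can reduce delay by more than 20%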
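The Elmore expression explains why M1 should be the widest device: R1 appears in every term of the sum, while the resistance of the transistor next to the output appears only in the last one. A minimal sketch with normalized, assumed R and C values; it ignores the extra self-capacitance that wider devices add.

```python
def elmore_tphl(resistances, capacitances):
    """0.69 * sum_i (R1 + ... + Ri) * Ci for a chain of series transistors.

    resistances[i] belongs to M(i+1), counted from ground upward;
    capacitances[i] is the node above that transistor (the last one is CL).
    """
    total, r_sum = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_sum += r
        total += r_sum * c
    return 0.69 * total


# Assumed, normalized values: three series devices, C1 = C2 = 1, CL = 2.
caps = [1.0, 1.0, 2.0]
uniform = elmore_tphl([1.0, 1.0, 1.0], caps)
wide_m1 = elmore_tphl([0.5, 1.0, 1.0], caps)   # halve R1 (device next to ground)
wide_m3 = elmore_tphl([1.0, 1.0, 0.5], caps)   # halve R3 (device next to output)

print(f"uniform: {uniform:.2f}, wider M1: {wide_m1:.2f}, wider M3: {wide_m3:.2f}")
```

Spending the same width increase on M1 lowers the delay more than spending it on M3, which is the motivation for M1 > M2 > M3 > … > MN.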
Fast Complex Gates: Design Technique 2
R1 < R2 < R3
[Figure: progressively sized series transistors, widest at M1]
However, this costs more area and more intrinsic capacitance
Good technique if output capacitance is high
Fast Complex Gates: Design Technique 3
Reducing the voltage swing (not CMOS)
Full swing: tpHL = 0.69 (CL VDD/2) / Iav
Reduced swing: tpHL = 0.69 (CL Vswing/2) / Iav
linear reduction in delay
also reduces power consumption
But the following gate is much slower!
Or requires use of “sense amplifiers” on the receiving end to restore the signal level (memory design)
Fast Complex Gates: Design Technique 4
Isolating fan-in from fan-out using buffer insertion
[Figure: a gate driving CL directly, and the same gate driving CL through a buffer]
Is it better to drive a big capacitive load directly with the NAND gate, or after some buffering? See Logical Effort (next)
Fast Complex Gates: Design Technique 5
Alternative logic structures
F = ABCDEFGH
Which one is the best? See Logical Effort (next)
Delay Metric: “Intrinsic delay” of Inverter
[Figure: inverter with a 2W PMOS and a 1W NMOS, Vin = VDD, output discharging through Rn]
Cout = Cint = 3 W Cdunit (Cext = 0)
τ = k Rinv Cdinv
τ = k (Runit / W) (3 W Cdunit)
τ = 3 k Runit Cdunit
Propagation delay of the inverter, ignoring external load
Delay of NAND Gate
τ = 3k Runit Cdunit
Cdnand = 6 W Cdunit = 2 Cdinv
Cgnand = 4 W Cgunit = (4/3) Cginv
Cgunit ≈ Cdunit
h = CL / Cgnand
tpnand = k R (Cdnand + CL)
= k (Runit / W) (6 W Cdunit + CL)
= k (Runit Cgnand / W) (6 W Cdunit / Cgnand + CL / Cgnand)
= k (Runit 4 Cgunit) (6 W Cdunit / (4 W Cgunit) + h)
= 3 k (Runit Cgunit) (2 Cdunit / Cgunit + (4/3) h)
= 3 k (Runit Cdunit) (2 + (4/3) h Cgunit / Cdunit)
= τ (2 + (4/3) h)
[Figure: 2-input NAND with all four transistors sized 2W, driving CL]
Delay of NOR Gate
τ = 3k Runit Cdunit
Cdnor = 6 W Cdunit = 2 Cdinv
Cgnor = 5 W Cgunit = (5/3) Cginv
Cgunit ≈ Cdunit
h = CL / Cgnor
tpnor = k R (Cdnor + CL)
= k (Runit / W) (6 W Cdunit + CL)
= k (Runit Cgnor / W) (6 W Cdunit / Cgnor + CL / Cgnor)
= k (Runit 5 Cgunit) (6 W Cdunit / (5 W Cgunit) + h)
= 3 k (Runit Cgunit) (2 Cdunit / Cgunit + (5/3) h)
= 3 k (Runit Cdunit) (2 + (5/3) h Cgunit / Cdunit)
= τ (2 + (5/3) h)
[Figure: 2-input NOR with output (A + B)’, 1W NMOS and 4W PMOS transistors, driving CL]
Logical Effort
tpgate = τ (p + g . h)
Measure everything in units of τ (divide by τ):
tpgate = p + g . h
p – intrinsic delay - gate parameter ≠ f(W) (independent of the width W)
g – logical effort - gate parameter ≠ f(W) (independent of the width W)
h – effective fanout
Normalize everything to an inverter:
ginv = 1, pinv = 1
This formula also appears as tpgate = p + LE . FO
Delay in a Logic Gate
Gate delay:
d = p + f (p: intrinsic delay, f: effort delay)
Effort delay:
f = g h (g: logical effort, h: electrical effort (effective fanout) = Cout/Cin)
Logical effort is a function of topology, independent of sizing
Electrical effort (effective fanout) is a function of load/gate size
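The delay model d = p + g h is simple enough to evaluate directly. The sketch below uses the p and g values this lecture gives for the inverter, 2-input NAND and 2-input NOR; the table layout and function names are illustrative assumptions.

```python
from fractions import Fraction

# (p, g) per gate, in units of tau, as given in this lecture.
GATES = {
    "inverter": (1, Fraction(1)),
    "nand2": (2, Fraction(4, 3)),
    "nor2": (2, Fraction(5, 3)),
}


def electrical_effort(c_out, c_in):
    """h = Cout / Cin, the effective fanout."""
    return Fraction(c_out, c_in)


def delay(gate, h):
    """Normalized gate delay d = p + g * h."""
    p, g = GATES[gate]
    return p + g * h


for name in GATES:
    print(name, [float(delay(name, h)) for h in range(1, 5)])
```

For example, an inverter driving four identical inverters has h = 4 and a delay of 5 τ, which is the FO4 delay in this model.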
Logical Effort (g)
Inverter has the smallest logical effort and intrinsic delay of all static CMOS gates
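Logical effort of a gate is the ratio of its input capacitance to the inverter input capacitance when both are sized to deliver the same current
Logical effort increases with the gate complexity
[Figure: inverter (PMOS 2, NMOS 1), 2-input NAND (PMOS 2, 2; NMOS 2, 2) and 2-input NOR (PMOS 4, 4; NMOS 1, 1), each sized to deliver the same current]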
Calculating Logical Effort
Inverter: g = 1    2-input NAND: g = 4/3    2-input NOR: g = 5/3
[Figure: the three gates with the input capacitance Cinput marked]
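These values follow from the transistor widths seen by one input, divided by the inverter's total input width. A sketch assuming gate capacitance is proportional to width; the n-input formulas extend the sizing rule beyond the 2-input cases shown in the lecture.

```python
from fractions import Fraction


def logical_effort(input_widths, inverter_widths=(1, 2)):
    """Ratio of the widths driven by one gate input to the inverter's input widths."""
    return Fraction(sum(input_widths), sum(inverter_widths))


g_inv = logical_effort((1, 2))     # NMOS 1 + PMOS 2
g_nand2 = logical_effort((2, 2))   # NMOS 2 + PMOS 2 per input
g_nor2 = logical_effort((1, 4))    # NMOS 1 + PMOS 4 per input


def nand_effort(n):
    """n-input NAND: NMOS n + PMOS 2 per input, so g = (n + 2) / 3."""
    return Fraction(n + 2, 3)


def nor_effort(n):
    """n-input NOR: NMOS 1 + PMOS 2n per input, so g = (2n + 1) / 3."""
    return Fraction(2 * n + 1, 3)


print("g_inv =", g_inv, " g_nand2 =", g_nand2, " g_nor2 =", g_nor2)
print("g_nand4 =", nand_effort(4), " g_nor4 =", nor_effort(4))
```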
Logical Effort of Gates
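[Figure: normalized delay (τ) versus electrical effort (h) for the inverter, 2-input NAND and 2-input NOR; the intercept of each line is the intrinsic delay and the part that grows with h is the effort delay]
Inverter: ginv = 1, pinv = 1, dinv = h + 1
NAND: gnand = 4/3, pnand = 2, dnand = (4/3)h + 2
NOR: gnor = 5/3, pnor = 2, dnor = (5/3)h + 2
NOR is “slower” than NAND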