8/3/2019 F11 Lec 12 Misc Topics
Miscellaneous Topics
EE M216A .:. Fall 2011
Lecture 12
Alireza Tarighat
ee216a@gmail.com
RAM Memory
D. Markovic / Slide 3
Types of Memory
There are many types of memory
Usually distinguished by type of memory and access method
Access methods
Random access memory (RAM): any memory location can be accessed at the same speed
Most common type of memory
Content-addressable memory (CAM): memory is accessed by a search on its contents
E.g. find the location where the upper byte is 250
Memory Types
Static (SRAM): read/write memory
Dynamic (DRAM): read/write/refresh memory
Read-only (ROM); read-mostly (PROM = Programmable ROM, EEPROM = Electrically Erasable PROM)
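To make the two access methods concrete, here is a minimal Python sketch. It is not from the lecture: the 16-bit memory contents and the helper names `ram_read` and `cam_search` are illustrative assumptions. RAM access indexes by address, while CAM access searches the contents, here for words whose upper byte is 250.

```python
# Illustrative 16-bit memory contents (assumed values, not from the lecture).
memory = [0x1234, 0xFA01, 0x00FF, 0xFA7C]

def ram_read(mem, addr):
    """Random access: return the word stored at the given address."""
    return mem[addr]

def cam_search(mem, key, mask):
    """Content-addressed access: return every address whose masked contents equal key."""
    return [addr for addr, word in enumerate(mem) if (word & mask) == key]

# Find the locations where the upper byte is 250 (0xFA).
matches = cam_search(memory, key=250 << 8, mask=0xFF00)
print(ram_read(memory, 2))  # 255
print(matches)              # [1, 3]
```

A hardware CAM compares all words in parallel; the list comprehension only models the result of that search, not its speed.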
EE216A - Fall 2011 Misc. Topics | 3
Semiconductor Memory Classification
Read-Write Memory
  Random Access: SRAM, DRAM
  Non-Random Access: FIFO, LIFO, Shift Register, CAM
Non-Volatile Read-Write Memory: EPROM, E2PROM, FLASH
Read-Only Memory: Mask-Programmed, Programmable (PROM)
Memory Architecture: Decoders
[Figure: N-word x M-bit memory. Left: one select signal per word (S0 ... SN-1) drives storage-cell words 0 ... N-1, with an M-bit input-output. Right: a decoder driven by address bits A0 ... AK-1 generates the same select signals.]
Intuitive architecture for N x M memory
Too many select signals: N words == N select signals
Decoder reduces the number of select signals: K = log2 N
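The decoder relation K = log2 N can be sketched in a few lines of Python. This is an illustrative model only; the function names `address_bits` and `decode` are assumptions, not part of the lecture.

```python
def address_bits(n_words):
    """Number of address bits K needed to select one of n_words words (K = ceil(log2 N))."""
    return (n_words - 1).bit_length()

def decode(addr, n_words):
    """Return the N one-hot select signals S0..S(N-1) for the given address."""
    if not 0 <= addr < n_words:
        raise ValueError("address out of range")
    return [1 if i == addr else 0 for i in range(n_words)]

K = address_bits(1024)      # 10 address lines instead of 1024 select lines
selects = decode(5, 8)      # only S5 is asserted
print(K, selects)           # 10 [0, 0, 0, 0, 0, 1, 0, 0]
```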
Array-Structured Memory Architecture
Problem: ASPECT RATIO, or HEIGHT >> WIDTH
[Figure: array-structured memory; callouts: "Amplify swing to rail-to-rail amplitude" and "Selects appropriate word"]
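One common way to fix the aspect ratio is to place several words in each physical row and split the address into a row part and a column part. The sketch below assumes that scheme (upper address bits select the row, lower bits select the word within the row); the function names and the 1024 x 8 example are illustrative assumptions, not taken from the slides.

```python
def split_address(addr, col_bits):
    """Split a word address into (row, column) for an array-structured memory."""
    row = addr >> col_bits                  # upper bits feed the row decoder
    col = addr & ((1 << col_bits) - 1)      # lower bits feed the column decoder
    return row, col

def array_shape(n_words, word_bits, col_bits):
    """Physical (rows, columns-of-cells) when 2**col_bits words share one row."""
    return n_words >> col_bits, word_bits << col_bits

print(array_shape(1024, 8, 0))   # (1024, 8): tall and narrow
print(array_shape(1024, 8, 3))   # (128, 64): much closer to square
print(split_address(43, 3))      # (5, 3)
```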
Hierarchical Memory Architecture
Advantages:
1. Shorter wires within blocks
2. Block address activates only 1 block => power savings
Read-Write Memories (RAM)
STATIC (SRAM)
  Data stored as long as supply is applied
  Large (6 transistors/cell)
  Fast
  Differential
DYNAMIC (DRAM)
  Periodic refresh required
  Small (1-3 transistors/cell)
  Slower
  Single Ended
6-transistor CMOS SRAM Cell
[Figure: 6-transistor SRAM cell schematic with word line WL, bit line BL and its complement, supply VDD, and transistors M1-M6]
3-Transistor DRAM Cell
No constraints on device ratios
Reads are non-destructive
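Value stored at node X when writing a 1 = V_WWL - V_Tn
[Figure: 3-transistor DRAM cell schematic with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1-M3, and storage capacitance CS at node X. Waveforms for WWL, RWL, X, BL1, and BL2, annotated with the levels VDD and VDD - VT and the read swing ΔV.]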
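As a worked instance of the stored-1 level, assume the write bit line BL1 and the write word line WWL are both driven to VDD. The numbers VDD = 1.2 V and V_Tn = 0.4 V are illustrative assumptions, not values from the lecture. The NMOS write transistor M1 stops conducting once node X rises to one threshold below its gate:

```latex
V_X = \min\left(V_{BL1},\; V_{WWL} - V_{Tn}\right)
\quad\Rightarrow\quad
V_X = 1.2\,\mathrm{V} - 0.4\,\mathrm{V} = 0.8\,\mathrm{V}
\qquad \text{for } V_{BL1} = V_{WWL} = V_{DD} = 1.2\,\mathrm{V}.
```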
RAM: Single-Port Access
Typical RAM IO list
CLK (common read/write clock)
DIN (input)
DOUT (output)
ADDR (read/write address)
EN (enable/disable)
WR (write/read)
Usually, memory compilers are available that generate a RAM of any size
for a given process. Once generated, the RAM can be instantiated as an HDL module in the system.
In any clock cycle, only one address is readable in a RAM block
Single read address (ADDR); single output data (DOUT)
In any clock cycle, only a write or a read operation can be performed, not both
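The IO list above can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the interface of any particular memory compiler: WR = 1 is taken to mean write, reads are synchronous, and DOUT holds its value during write and disabled cycles. The class name `SinglePortRAM` is hypothetical.

```python
class SinglePortRAM:
    """Behavioral model: one address, and either a read or a write per clock."""

    def __init__(self, n_words, width):
        self.mem = [0] * n_words
        self.mask = (1 << width) - 1
        self.dout = 0  # registered output, updated only on read cycles

    def clock(self, en, wr, addr, din=0):
        """Model one rising edge of CLK with the given EN, WR, ADDR, DIN."""
        if not en:
            return self.dout                     # disabled: hold state and output
        if wr:
            self.mem[addr] = din & self.mask     # write cycle (assumed WR = 1)
        else:
            self.dout = self.mem[addr]           # read cycle
        return self.dout

ram = SinglePortRAM(n_words=32, width=32)
ram.clock(en=1, wr=1, addr=3, din=0xCAFE)   # cycle 1: write
value = ram.clock(en=1, wr=0, addr=3)       # cycle 2: read back
print(hex(value))                           # 0xcafe
```

Writing and then reading the same word takes two cycles, which is the single-port restriction described above.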
SRAM: Using standard DFF
A RAM block can be built from standard DFFs, a decoder, and a MUX
Typically larger than compact optimized SRAM implementations
No need for memory compiler
Implemented using standard cells
Inefficient for large sizes
DFF-Based RAM (All Std Cells): 32 words x 32 bits
Different RAM Variations
Single Port
  One address (either READ or WRITE at a time)
  Smallest, most efficient with array-structured memory cells
Dual Ports (RD/WR)
  RD_ADDR & WR_ADDR
  Read and write different locations in any cycle
Dual-Read Ports
  WR_ADDR, RD_ADDR1, RD_ADDR2
  Read two locations at a time
  Not possible with array-structured memory cells
  Easily implementable with register-file structures
There is no standard RAM IO definition
  Even single-port RAMs can have different IO variations
Dual-Read-Port Register-File
Common register-bank
Duplicate muxes
More routing
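A behavioral sketch of this structure is shown below: one shared register bank with a single write port, read through two independent muxes. The class name `DualReadRegisterFile` and its method signatures are illustrative assumptions.

```python
class DualReadRegisterFile:
    """One shared register bank, one write port, two independent read muxes."""

    def __init__(self, n_words):
        self.regs = [0] * n_words

    def read(self, rd_addr1, rd_addr2):
        # Each read port is its own mux over the same register bank.
        return self.regs[rd_addr1], self.regs[rd_addr2]

    def clock(self, wr_en, wr_addr, din):
        # The write decoder enables exactly one register on the clock edge.
        if wr_en:
            self.regs[wr_addr] = din

rf = DualReadRegisterFile(8)
rf.clock(1, 2, 11)
rf.clock(1, 5, 22)
a, b = rf.read(2, 5)   # two locations read in the same cycle
print(a, b)            # 11 22
```

The storage is not duplicated; only the read mux (and the routing to it) is, which is the cost the slide points out.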
Dual Read/Write RAM
Read and write two locations (different or the same) in one cycle
A commonly required feature
A dual-port RAM is almost twice as big as a single-port RAM
When implemented with memory cells (inevitable for large sizes)
Architectural solutions to avoid dual-port RAMs
Memory partitioning and address management
If an N-word dual-port RAM is required, implement the N addresses as two separate N/2-word single-port RAMs.
As part of memory address management, make sure that simultaneous read and write addresses never fall in the same N/2-word block.
This is possible in most applications, since read/write addresses aren't totally arbitrary and random!
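The partitioning idea can be sketched as below. The address mapping (bank = address divided by N/2, offset = remainder) and the class name `PartitionedRAM` are assumptions chosen for illustration; the model raises an error on a bank conflict, since it is the job of address management to make sure that case never occurs.

```python
class PartitionedRAM:
    """N-word memory built from two N/2-word single-port banks."""

    def __init__(self, n_words):
        self.half = n_words // 2
        self.banks = [[0] * self.half, [0] * self.half]

    def cycle(self, wr_addr, din, rd_addr):
        """One clock cycle with a simultaneous write and read."""
        wr_bank, wr_off = divmod(wr_addr, self.half)
        rd_bank, rd_off = divmod(rd_addr, self.half)
        if wr_bank == rd_bank:
            # Each bank is single-port: it cannot serve both accesses.
            raise ValueError("bank conflict: address management must prevent this")
        dout = self.banks[rd_bank][rd_off]
        self.banks[wr_bank][wr_off] = din
        return dout

ram = PartitionedRAM(16)
first = ram.cycle(wr_addr=9, din=42, rd_addr=0)   # write upper half, read lower half
second = ram.cycle(wr_addr=1, din=7, rd_addr=9)   # roles swapped
print(first, second)                              # 0 42
```

A ping-pong buffer is a typical case where this works: one half is filled while the other half is drained, and the halves swap roles afterwards.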
Dual Read/Write RAM
Architectural solutions to avoid dual-port RAMs
Memory partitioning and address management
If read/write addresses aren't totally arbitrary!
Implement a dual-port RAM by running a single-port RAM at twice the speed!
Assume that in every clock cycle (period Tclk), ADDR_WR is to be written and ADDR_RD is to be read.
Run a single-port RAM at twice the clock frequency: in the first Tclk/2, write ADDR_WR; in the second Tclk/2, read ADDR_RD.
It looks and feels like a dual-port RAM!
Whenever possible, avoid dual-port RAMs
There is almost a factor of 2 saving in area!
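The double-rate scheme can be modeled as a wrapper that spends two fast-clock accesses of a single-port RAM per system clock period. The class names `FastSinglePortRAM` and `PseudoDualPortRAM` are hypothetical, and the write-then-read ordering follows the description above.

```python
class FastSinglePortRAM:
    """Single-port RAM clocked at 2x: one read or one write per fast cycle."""

    def __init__(self, n_words):
        self.mem = [0] * n_words

    def access(self, wr, addr, din=0):
        if wr:
            self.mem[addr] = din
            return None
        return self.mem[addr]

class PseudoDualPortRAM:
    """Looks dual-port at Tclk by using two fast cycles per Tclk period."""

    def __init__(self, n_words):
        self.ram = FastSinglePortRAM(n_words)
        self.fast_cycles = 0

    def cycle(self, addr_wr, din, addr_rd):
        self.ram.access(wr=True, addr=addr_wr, din=din)   # first Tclk/2: write
        dout = self.ram.access(wr=False, addr=addr_rd)    # second Tclk/2: read
        self.fast_cycles += 2
        return dout

dp = PseudoDualPortRAM(8)
r0 = dp.cycle(addr_wr=3, din=99, addr_rd=0)
r1 = dp.cycle(addr_wr=4, din=5, addr_rd=3)
r2 = dp.cycle(addr_wr=6, din=77, addr_rd=6)   # same address: read sees the new data
print(r0, r1, r2, dp.fast_cycles)             # 0 99 77 6
```

Because the write happens in the first half period, a read of the same address in the same cycle returns the newly written value in this model.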
Hardware Reuse
If the fastest clock achievable in a process technology is much
faster than the desired throughput requires:
This can be exploited to aggressively reduce logic area
Large physical modules such as multipliers can be reused
multiple times
Several logical multipliers are implemented using the same physical
multiplier
Example: FIR Filter
Hardware Reuse: FIR Filter
$dout(n) = \sum_{k=0}^{L-1} h_k \, din(n-k)$
[Figure: 5-tap FIR filter (delayed inputs din(n) ... din(n-4), coefficients h0 ... h4) built around a shared datapath with FFs and a state-machine sequencer. Timing diagram: clk, din (din[0], din[1], ...), and dout (dout[0], dout[1], ...).]
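The reuse idea can be sketched in Python as a single multiply-accumulate that a sequencer steps through once per tap. This is an illustrative model, assuming zero initial state and one MAC per fast clock cycle; the function names and the coefficient values are made up for the example.

```python
def fir_direct(h, samples):
    """Reference: dout(n) = sum_k h[k] * din(n-k), with din(n) = 0 for n < 0."""
    return [sum(h[k] * samples[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(samples))]

def fir_reused_multiplier(h, samples):
    """Time-multiplexed version: one multiply-accumulate per fast clock cycle."""
    delay_line = [0] * len(h)          # din(n), din(n-1), ..., din(n-L+1)
    out, mult_uses = [], 0
    for x in samples:                  # one input sample per slow (sample) period
        delay_line = [x] + delay_line[:-1]
        acc = 0                        # accumulator FF, cleared by the sequencer
        for k in range(len(h)):        # sequencer steps k = 0..L-1 on the fast clock
            acc += h[k] * delay_line[k]    # the single physical multiplier
            mult_uses += 1
        out.append(acc)
    return out, mult_uses

h = [1, 2, 3, 2, 1]                    # assumed 5-tap coefficients h0..h4
x = [1, 0, 0, 0, 0, 4]
y, uses = fir_reused_multiplier(h, x)
print(y, uses)                         # [1, 2, 3, 2, 1, 4] 30
```

With L taps, the fast clock has to run at least L times the sample rate so that all L products are accumulated before the next input arrives.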
Clock Domain Crossing
CDC: Clock Domains
[Figure: Single Clock Domain vs. Multiple Clock Domain]
CDC: Metastability
CDC: Guaranteed Setup/Hold Violation
When 2 or more designs run on disparate clocks:
The clocks will continually skew, guaranteeing setup/hold violations
Signals from one design to another are Clock Domain Crossings (CDCs)
Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions
[Figure: a Sensor System on Clock A transmits (Tx) a clock domain crossing signal to a flip-flop (D, CLK, Q) in a Guidance System on Clock B; the signal changes inside the flip-flop's setup/hold window.]
CDC: Guaranteed Setup/Hold Violation
Simulation Does NOT Reflect Silicon Behavior
Propagation from D to Q has an ambiguity of 1 clock cycle!
Setup violation: simulation captures a 1 while silicon produces either a 1 or a 0
Hold violation: simulation captures a 0 while silicon produces either a 1 or a 0
[Figure: D, CLK, and Q waveforms for each case, comparing Q in simulation with Q in silicon]
CDC: Data Uncertainty
CDC: Divergent Paths
FSM1_EN and FSM2_EN have different profiles although they are both derived from the same input signal.
CDC: Metastability
A synchronization FF is used when going from the CLKA domain to the CLKB domain.
Double sampling lowers the probability of metastability
DB2 can then be used in downstream logic in the CLKB clock domain
Although metastability can be handled with double FFs, other problems with CDC still persist!
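The behavior of a double-FF synchronizer, including the one-cycle ambiguity that remains, can be illustrated with a small model. This is a deliberately simplified sketch: it assumes the first flop always settles to a valid 0 or 1 within one CLKB period, and it models a setup/hold violation as a coin flip between the old and new input value. The function name and the `violates` argument are assumptions for illustration.

```python
import random

def two_flop_synchronizer(d_samples, violates, rng):
    """Clock a 2-FF synchronizer once per entry and return DB2 after each CLKB edge.

    d_samples[i] is the input value at CLKB edge i. violates[i] is True when the
    input changed inside the setup/hold window of that edge, in which case the
    first flop may resolve to either the old or the new value.
    """
    db1 = db2 = prev_d = 0
    out = []
    for d, bad in zip(d_samples, violates):
        new_db1 = rng.choice([prev_d, d]) if bad else d
        db2 = db1          # second flop samples the first flop's previous output
        db1 = new_db1
        prev_d = d
        out.append(db2)
    return out

d = [0, 1, 1, 1, 1]
violates = [False, True, False, False, False]   # the 0 -> 1 change hits the window
results = {tuple(two_flop_synchronizer(d, violates, random.Random(seed)))
           for seed in range(20)}
print(sorted(results))
```

Across seeds, DB2 rises either at edge 2 or at edge 3. The output is always a clean logic value, but the arrival time is uncertain by one cycle, which is why downstream logic still has to tolerate that uncertainty.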
CDC: Timing Closure Across Two Clock Domains
Enforce and guarantee a timing condition between the two clock domains.
Example: Same C1/C2 Frequency
If tskew and the setup/hold of A with respect to C1 can be constrained, the opposite edge of C2 can be used to safely sample A and transfer it to the C2 domain.
[Figure: waveforms of c1, A, and c2, with c2 offset from c1 by tskew]
CDC: Timing Closure Across Two Clock Domains
Opposite-edge sampling is only viable if tskew and setup/hold together are less than Tclk/2.
Very effective and robust.
Once implemented, it works for any clock period larger than the original design spec (it does not depend on the exact clock period).
If timing is not met, increasing the clock period will eventually make the system work!
No double sampling required!
[Figure: waveforms of c1, A, and c2, with c2 offset from c1 by tskew]
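The viability condition can be written as a small check. The sketch below assumes equal C1/C2 frequency, a 50% duty cycle, a skew whose magnitude is bounded by `t_skew`, and negligible clock-to-Q delay; the function names and the timing numbers (in ns) are illustrative assumptions, not values from the lecture.

```python
def opposite_edge_ok(t_clk, t_skew, t_setup, t_hold):
    """True if the falling edge of C2 can safely sample A, launched on the rising edge of C1."""
    half = t_clk / 2
    # Worst-case early C2 edge must still leave setup time after A changes,
    # and worst-case late C2 edge must leave hold time before A changes again.
    return t_skew + t_setup < half and t_skew + t_hold < half

def min_clock_period(t_skew, t_setup, t_hold):
    """Period bound implied by the check above; any larger period also passes."""
    return 2 * (t_skew + max(t_setup, t_hold))

print(opposite_edge_ok(10.0, 2.0, 0.5, 0.3))   # True
print(opposite_edge_ok(4.0, 2.0, 0.5, 0.3))    # False: skew eats the half period
print(opposite_edge_ok(6.0, 2.0, 0.5, 0.3))    # True: a longer period fixes it
print(min_clock_period(2.0, 0.5, 0.3))         # 5.0
```

The third call shows the point made above: if timing is not met, lengthening the clock period eventually makes the transfer safe.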
CDC: Asynchronous Clocks; EN Transfer
If clock synchronization is not possible, design a system/architecture that is robust to 1-2 clock cycles of uncertainty in data transfer
Example:
Assume signal EN is passed from CLK1 to CLK2. EN is used in the CLK2 domain to start a counter. There are two counters (one in the CLK1 domain and the other in the CLK2 domain), expected to be fully synchronized in the ideal case.
Use double sampling to guard against metastability
Design your system such that a mismatch of a few clock cycles between the two domains wouldn't cause a malfunction in the overall operation.
[Figure: waveforms of CLK1, EN, and CLK2, with tskew marked]
CDC: Asynchronous Clocks; Data Transfer
For data transfer scenarios, the CDC scheme should guarantee the following:
Correct sampling of the first data sample
No data value is dropped or repeated
Using a faster clock frequency in the destination domain generally helps with data transfer.
A factor of 3 or 4 is generally sufficient
Example:
Use both EN and DATA to transfer DATA from CLK1 to CLK2
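One way such an EN-plus-DATA transfer can work is sketched below. This is an assumed protocol for illustration, not necessarily the exact state machine in the lecture: the source holds DATA stable with EN high for one CLK1 period, then idles for one period; the destination double-samples EN (EN1, EN2) and loads DATA when it sees the rising edge of the synchronized EN. Time advances one CLK2 edge per step, and a changing EN may be caught as either its old or new value.

```python
import random

def transfer(words, ratio, rng):
    """Move words from CLK1 to a CLK2 domain that runs `ratio` times faster."""
    # EN/DATA values seen at each CLK2 edge.
    stimulus = []
    for w in words:
        stimulus += [(1, w)] * ratio       # EN high, DATA valid and stable
        stimulus += [(0, None)] * ratio    # EN low, DATA not meaningful
    en1 = en2 = prev_en = 0
    received = []
    for en, data in stimulus:
        if en1 and not en2:                # rising edge of the synchronized EN
            received.append(data)          # DATA2 register loads DATA
        # First flop: if EN changed since the last edge, it may catch old or new.
        sampled = rng.choice([prev_en, en]) if en != prev_en else en
        en1, en2, prev_en = sampled, en1, en
    return received

sent = [10, 11, 12]
ok_at_4x = all(transfer(sent, 4, random.Random(s)) == sent for s in range(50))
ok_at_3x = all(transfer(sent, 3, random.Random(s)) == sent for s in range(50))
print(ok_at_4x, ok_at_3x)   # True True
```

In this model the capture lands one or two CLK2 edges after EN rises, so DATA must stay stable for at least three CLK2 edges. That is consistent with the factor of 3 or 4 quoted above; at a ratio of 2 the capture can fall after DATA has already changed.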
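[Figure: EN and DATA are each registered by a DFF on Clk1. In the Clk2 domain, EN is double-sampled by DFF1 and DFF2 (outputs EN1, EN2); a state machine uses them to enable a Clk2 register (D/Q) that captures DATA as DATA2.]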
[Figure: timing diagram showing clk1, DATA (values D1, D2), clk2, EN, EN1, EN2, and DATA2 (values D1, D2)]