Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Hasan HassanMinesh Patel Jeremie S. Kim A. Giray Yaglikci Nandita Vijaykumar
Nika Mansouri Ghiasi Saugata Ghose Onur Mutlu
CROWA Low-Cost Substrate for Improving
DRAM Performance, Energy Efficiency, and Reliability
Presented at ISCA 2019
2
SummaryChallenges of DRAM scaling:
• High access latency → bottleneck for improving system performance/energy
• Refresh overhead → reduces performance and consume high energy
• Exposure to vulnerabilities (e.g., RowHammer)
Copy-Row DRAM (CROW)• Introduces copy rows into a subarray
• The benefits of a copy row:
• Efficiently duplicating data from regular row to a copy row
• Quick access to a duplicated row
• Remapping a regular row to a copy row
CROW is a flexible substrate with many use cases:• CROW-cache & CROW-ref
• Mitigating RowHammer
• We hope CROW enables many other use cases going forward
(20% speedup and consumes 22% less DRAM energy)
regular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
Source code available in July:github.com/CMU-SAFARI/CROW
3
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
4
DRAM Organization
CPU
Memory Controller
DRAM Row
Sense Amplifier
Memory Bus
DRAM Cell
DRAM Subarray
5
Accessing DRAM
DRAM Row
Sense Amplifier
DRAM Cell
DRAM Subarray
Activate
Precharge
Read
6
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
7
Challenges of DRAM Scaling
DRAM
access latency
refresh overhead
exposure to vulnerabilities
1
2
3
8
Our Goal
We want a substrate that enables the duplication and remapping of data within a subarray
9
Memory Controller
The Components of CROW
CROW-table
DRAM Subarray
DRAMregular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
SA SA SA SA SASA
row
dec
od
er
10
DRAMregular
rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
Memory Controller
CROW Operation 1: Row Copy
DRAM Subarray
ACT-c (copy)S
A
SA SA
SA SA
SA
11
Row Copy: Steps
SenseAmplifier
source row:
destination row:
1 Activation of the source row
2 Charge sharing
3 Beginning of restoration
4 Activation of the destination row
5Restoration of both rows to source data
12
Row Copy: Steps
SenseAmplifier
source row:
destination row:
1 Activation of the source row
2 Charge sharing
3 Beginning of restoration
4 Activation of the destination row
5Restoration of both rows to source data
Enables quickly copying a regular row into a copy row
13
Memory Controller
CROW Operation 2: Two-Row Activation
DRAM Subarray
DRAM
ACT-t (two row)
regular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
SA
SA
SA
SA
SA
SA
14
Two-Row Activation: Steps
SenseAmplifier
1 Activation of two rows
2 Charge sharing
3 Restoration
fast
both charged or discharged
15
Two-Row Activation: Steps
SenseAmplifier
1 Activation of two rows
2 Charge sharing
3 Restoration
fast
Enables fast access to data that is duplicated across a regular row and a copy row
both charged or discharged
16
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
17
CROW-cacheProblem: High access latency
Key idea: Use copy rows to enable low-latency access to most-recently-activated regular rows in a subarray
CROW-cache combines:• row copy → copy a newly activated regular row into a copy row• two-row activation → activate the regular row and copy row
together on the next access
Reduces activation latency by 38%
18
Memory Controller
CROW-cache Operationload row X
DRAM
ACT-c
ACT-t
load row X
RequestQueue
1 CROW-table miss
2 Allocate a copy row
3 Issue ACT-c (copy)
1 CROW-table hit
2 Issue ACT-t (two row)
DRAM Subarray
regular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
SA
SA SA
SA SA
SA
SA
SA
SA
SA
SA
SA
CROW-table
copy row 0 row X
[bank conflict]
19
Memory Controller
CROW-cache Operation
DRAM
ACT-t
1 CROW-table miss
2 Allocate a copy row
3 Issue ACT-c
1 CROW-table hit
2 Issue ACT-t
DRAM Subarray
regular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
SA
SA SA
SA SA
SA
SA
SA
SA
SA
SA
SA
CROW-table
copy row 0 row XSecond activation of row X is faster
load row X
load row X
RequestQueue
[bank conflict]
20
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
21
CROW-refProblem: Refresh has high overheads. Weak rows lead to high refresh rate
• weak row: at least one of the row’s cells cannot retain data correctly when refresh rate is decreased
Key idea: Safely reduce refresh rate by remapping a weak regular row to a strong copy row
CROW-ref uses:• row copy → copy a weak regular row to a strong copy row
CROW-ref eliminates more than half of the refresh requests
22
CROW-ref Operation
SA SA SA SA SASA SA
SA SA
SA SA
SA
weak
strongRetention Time Profiler
strongstrong
strongstrong
1Perform retention time profiling
3 On ACT, check the CROW-table
4If remapped, activate a copy row
2Remap weak rows to strong copy rows
23
CROW-ref Operation
SA SA SA SA SASA SA
SA SA
SA SA
SA
weak
strongRetention Time Profiler
strongstrong
strongstrong
1Perform retention time profiling
3 On ACT, check the CROW-table
4If remapped, activate a copy row
How many weak rows exist in a DRAM chip?
2Remap weak rows to strong copy rows
24
Identifying Weak RowsWeak cells are rare [Liu+, ISCA’13]
weak cell: retention < 256ms
~1000/238 (32 GiB) failing cells
DRAM Retention Time Profiler• REAPER [Patel+, ISCA’17]
PARBOR [Khan+, DSN’16]AVATAR [Qureshi+, DSN’15]
• At system boot or during runtime
3.30E-043.30E-11
0%
20%
40%
60%
80%
100%
1 2 4 8
Pro
ba
bil
ity
Weak rows in a subarray
25
Identifying Weak RowsWeak cells are rare [Liu+, ISCA’13]
weak cell: retention < 256ms
~1000/238 (32 GiB) failing cells
DRAM Retention Time Profiler• REAPER [Patel+, ISCA’17]
PARBOR [Khan+, DSN’16]AVATAR [Qureshi+, DSN’15]
• At system boot or during runtime
3.30E-043.30E-11
0%
20%
40%
60%
80%
100%
1 2 4 8
Pro
ba
bil
ity
Weak rows in a subarrayA few copy rows are sufficient to halve the refresh rate
26
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
27
activate
Mitigating RowHammer
SA SA SA SA SASA
victimaggressor
victim
precharge
Key idea: remap victim rows to copy rows
28
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
29
Methodology• Simulator
• DRAM Simulator (Ramulator [Kim+, CAL’15])https://github.com/CMU-SAFARI/ramulator
• Workloads• 44 single-core workloads
• SPEC CPU2006, TPC, STREAM, MediaBench• 160 multi-programmed four-core workloads
• By randomly choosing from single-core workloads• Execute at least 200 million representative instructions per core
• System Parameters• 1/4 core system with 8 MiB LLC• LPDDR4 main memory• 8 copy rows per 512-row subarray
Source code available in July:github.com/CMU-SAFARI/CROW
30
0.86
0.88
0.90
0.92
0.94
0.96
0.98
1.00
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
1.00
1.02
1.04
1.06
1.08
1.10
single-core four-core
Spee
du
pCROW-cache Results
7.1%7.5%8.2% 6.9%
* with 8 copy rows and a 64Gb DRAM chip (sensitivity in paper)
31
0.86
0.88
0.90
0.92
0.94
0.96
0.98
1.00
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
1.00
1.02
1.04
1.06
1.08
1.10
single-core four-core
Spee
du
pCROW-cache Results
7.1%7.5%8.2% 6.9%
* with 8 copy rows a 64Gb DRAM chip (sensitivity in paper)
CROW-cache improves single-/four-core performance and energy
32
0.70
0.75
0.80
0.85
0.90
0.95
1.00
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
1.041.051.061.071.081.09
1.11.111.121.13
single-core four-core
Spee
du
pCROW-ref Results
17.2%
7.8%
7.1%
11.9%
* with 8 copy rows and a 64Gb DRAM chip (sensitivity in paper)
33
0.70
0.75
0.80
0.85
0.90
0.95
1.00
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
1.041.051.061.071.081.09
1.11.111.121.13
single-core four-core
Spee
du
pCROW-ref Results
17.2%
7.8%
7.1%
11.9%
* with 8 copy rows a 64Gb DRAM chip (sensitivity in paper)
CROW-ref significantly reduces the performance and energy overhead of DRAM refresh
34
Combining CROW-cache and CROW-ref
0.70
0.80
0.90
1.00
1.10
1.20
1.30
single-core four-core
Spee
du
p
CROW-(cache+ref)
Ideal CROW-cache + no refresh
0.70
0.72
0.74
0.76
0.78
0.80
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
CROW-(cache+ref)
Ideal CROW-cache + no refresh
17% 20% 23% 22%
35
Combining CROW-cache and CROW-ref
0.70
0.80
0.90
1.00
1.10
1.20
1.30
single-core four-core
Spee
du
p
CROW-(cache+ref)
Ideal CROW-cache + no refresh
0.70
0.72
0.74
0.76
0.78
0.80
single-core four-core
No
rmal
ized
D
RA
M E
ner
gy
CROW-(cache+ref)
Ideal CROW-cache + no refresh
17% 20% 23% 22%
CROW-(cache+ref) provides more performance and DRAM energy benefits than each mechanism alone
36
Hardware Overhead
For 8 copy rows and 16 GiB DRAM:•0.5% DRAM chip area•1.6% DRAM capacity•11.3 KiB memory controller storage
CROW is a low-cost substrate
37
Other Results in the Paper• Performance and energy sensitivity to:
• Number of copy-rows per subarray
• DRAM chip density
• Last-level cache capacity
• CROW-cache with prefetching
• CROW-cache compared to other in-DRAM caching mechanisms:• TL-DRAM [Lee+, HPCA’13]
• SALP [Kim+, ISCA’12]
38
Outline1. DRAM Operation Basics
2. The CROW Substrate
3. Evaluation
4. Conclusion
CROW-cache: Reducing DRAM Latency
CROW-ref: Reducing DRAM Refresh
Mitigating RowHammer
39
ConclusionChallenges of DRAM scaling:
• High access latency → bottleneck for improving system performance/energy
• Refresh overhead → reduces performance and consume high energy
• Exposure to vulnerabilities (e.g., RowHammer)
Copy-Row DRAM (CROW)• Introduces copy rows into a subarray
• The benefits of a copy row:
• Efficiently duplicating data from regular row to a copy row
• Quick access to a duplicated row
• Remapping a regular row to a copy row
CROW is a flexible substrate with many use cases:• CROW-cache & CROW-ref
• Mitigating RowHammer
• We hope CROW enables many other use cases going forward
(20% speedup and consumes 22% less DRAM energy)
regular rows
SA SA SA SA SASA
CR
OW
d
eco
der copy rows
reg
ula
r ro
w
de
cod
er
Source code available in July:github.com/CMU-SAFARI/CROW
Hasan HassanMinesh Patel Jeremie S. Kim A. Giray Yaglikci Nandita Vijaykumar
Nika Mansouri Ghiasi Saugata Ghose Onur Mutlu
CROWA Low-Cost Substrate for Improving
DRAM Performance, Energy Efficiency, and Reliability
Backup Slides
42
Latency Reduction with MRA
43
activate
Mitigating RowHammer
SA SA SA SA SASA
victimaggressor
victim
precharge
Key idea: remap victim rows to copy rows
44
0.900.951.001.051.101.151.20
lesl
ie3
d
tpch
2
zeu
s
lbm
mcf
stre
am-c
p
lib
q
h2
64
-dec
AV
ER
AG
E(1
-co
re)
HH
HH
Spee
du
pCROW-1 CROW-8
CROW-64 CROW-128
Ideal CROW-cache (100% Hit Rate)
CROW-cache Performance
6.6%
0.7%7.1%
single-core four-core
7.5%
45
0.90
0.95
1.00
1.05
1.10
1.15
1.20
Spee
du
p8 Gbit 16 Gbit 32 Gbit 64 Gbit
CROW-ref Performance
7.1%
single-core
11.9%
four-coresingle-core four-core
46
0.70
0.75
0.80
0.85
0.90
0.95
1.00
No
rmal
ized
D
RA
M E
ner
gy8 Gbit 16 Gbit 32 Gbit 64 Gbit
CROW-ref Energy Savings
17.2%
single-core four-core
7.8%
47
Speedup - CROW-cache
Single-core
48
Speedup - CROW-cache
Four-core
49
Energy – CROW-cache
50
Comparison to TL-DRAM and SALP
CROW-1
CROW-8
TL-DRAM-1
TL-DRAM-8
SALP-512
SALP-256
SALP-128
SALP-512-O
SALP-256-O
SALP-128-O
0.60.81.01.21.41.61.82.02.2
4% 6% 8% 10% 12% 14% 16%
No
rmal
ized
D
RA
M E
ner
gy
Speedup
CROW-1CROW-8
TL-DRAM-1
TL-DRAM-8
SALP-256
SALP-128
SALP-256-O
SALP-128-O
0%
5%
10%
15%
20%
25%
30%
4% 6% 8% 10% 12% 14% 16%
Ch
ip A
rea
Ove
rhea
d
Speedup
51
Slide on RLTL
52
Speedup – CROW-ref
53
Energy – CROW-ref
54
CROW-cache + ref
55
CROW-table Organization
56
tRCD vs tRAS
57
MRA Area Overhead
58
Data 0
Data 1
Cell
time
cha
rge
Sense-Amplifier
Sensing Restore
Cell
SenseAmplifier
Precharge
R/WACT PRE
Ready to AccessCharge Level
tRCD
tRAS
Ready to AccessReady to
Precharge
DRAM Charge over Time