Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
SEESAW:Set Enhanced Superpage
Aware cachingMayank Parasar∑, Abhishek
BhattacharjeeΩ, Tushar Krishna∑
∑School of Electrical and Computer EngineeringGeorgia Institute of Technology
ΩDepartment of Computer Science Rutgers University
Set
Associativityhttp://synergy.ece.gatech.edu/
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
2
6/26/18
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
3
6/26/18
L1 Cache Characteristics
Fast lookup
High hit-rate
EnergyEfficiency
Virtually Indexed Physically Tagged [VIPT] Cache
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
4
6/26/18
TLB
PPN Page Offset
tag Data blockv
VPN Page Offset
Setindex
block offset
Cache HIT/MISS
VA
PA
Way-1Way-2Way-3Way-4
Way-1Way-2Way-3Way-4
set-1
set-NWay-1Way-2Way-3Way-4
Way-1Way-2Way-3Way-4
=
Virtually Indexed Physically Tagged [VIPT] Cache
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
5
6/26/18
TLB
PPN Page Offset
tag Data blockv
VPN Page Offset
Setindex
block offset
Cache HIT/MISS
VA
PA
Way-1Way-2Way-3Way-4
Way-1Way-2Way-3Way-4
set-1
set-NWay-1Way-2Way-3Way-4
Way-1Way-2Way-3Way-4
=
VIPT Caches necessitate:
(set-index + block-offset) <= Page-offset
Impact of Associativity on Access Latency and Energy of cache
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
6
6/26/18
Cache Access Latency Cache Access Energy
Effect of associativity on MPKI of cache
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
7
6/26/18
High Associativity hurts latency and energy without commensurately improving hit rate
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
8
6/26/18
Revisiting L1 Cache Characteristics for VIPT Cache
Fast lookup
High hit-rate
EnergyEfficiency
Virtual
memory!
Virtual
memory!
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
9
6/26/18
Opportunity: SuperpageIs it possible to relax constrains of
Traditional VIPT cache? Yes
How ?
4-KB2-MB
1-GB
More page-offset bitsfor superpage!
HW and OS Support for Superpagesin modern processors
Baseline Page
Super Page
Offset-bits:12
Offset-bits:21
Offset-bits:30
Prevalence of superpages in modern OSes under memory fragmentation
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
10
6/26/18
Ran on 32-core; Sandybridge; 32 GB RAMMemhog causes memory fragmentation; higher
%age indicates higher fragmentation
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
11
6/26/18
SEESAW: Concept
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
12
6/26/18
Less-setsMore-associativity
More-setsLess-associativity
super-page
Base-page
tag Data blockv
Way-1 Way-1
Way-1
Way-1
Way-1
Way-1Way-2 Way-2
Way-2
Way-2
Way-2
Way-2
Way-3 Way-3
Way-3Way-3
Way-3 Way-3
Set:1
Set:2
Set:3
tag Data blockv
Way-1 Way-1
Way-1
Way-1
Way-1
Way-1Way-1 Way-1
Way-1
Way-1
Way-1
Way-1
Way-1 Way-1
Way-1Way-1
Way-1 Way-1
Set:1
Set:3Set:2
Set:4Set:5Set:6
Set:7Set:8Set:9
Faster
Energy-Efficient
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
13
6/26/18
SEESAW: Micro-architecture
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
14
6/26/18
VPNSet
indexblock offset
Cache
VA
TLB
PPN Basepage OffsetPA
set-N
set-1
tag Data blockvWay-3Way-4
Way-3Way-4
Way-3Way-4
Way-3Way-4
Basepage Offset
tag Data blockvWay-1Way-2
Way-1
Way-2
Way-1Way-2
Way-1Way-2
set-1
set-N
Partition bit
Translation Filter Table
(TFT)
Partitiondecoder
Predicts whether page is superpage
Partition-0 Partition-1
Superpage offsetDecodes
partition index from partition bit
SEESAW: Micro-architecture
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
15
6/26/18
VPNSet
indexblock offset
VA
TLB
PPN Basepage OffsetPA
set-N
set-1
tag Data blockvWay-3Way-4
Way-3Way-4
Way-3Way-4
Way-3Way-4
Basepage Offset
tag Data blockvWay-1Way-2
Way-1
Way-2
Way-1Way-2
Way-1Way-2
set-1
set-N
Partition bit
Translation Filter Table
(TFT)
Partitiondecoder
Partition-0 Partition-1Cache
Superpage offset
SEESAW: Superpage access
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
16
6/26/18
VPNSet
indexblock offset
VA
TLB
PPN Basepage OffsetPA
set-N
set-1
tag Data blockvWay-3Way-4
Way-3Way-4
Way-3Way-4
Way-3Way-4
Basepage Offset
tag Data blockvWay-1Way-2
Way-1
Way-2
Way-1Way-2
Way-1Way-2
set-1
set-N
Partition bit
Translation Filter Table
(TFT)
Partitiondecoder
Partition-0 Partition-1Cache
Super Page
Superpage offset
= HIT/MISS
SEESAW: Basepage access
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
17
6/26/18
VPNSet
indexblock offset
VA
TLB
PPN Basepage OffsetPA
set-N
set-1
tag Data blockvWay-3Way-4
Way-3Way-4
Way-3Way-4
Way-3Way-4
Basepage Offset
tag Data blockvWay-1Way-2
Way-1
Way-2
Way-1Way-2
Way-1Way-2
set-1
set-N
Partition index
Translation Filter Table
(TFT)
Partitiondecoder
Partition-0 Partition-1Cache
Not a Super Page
= HIT/MISS
SEESAW: TFT and Partition Decoder
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
18
6/26/18
Superpage?Tag: VA[63:21]
Partitiondecoder
Translation Filter Table
(TFT)
Translation Filter TableØ TFT Lookup
Ø Direct mappedØ False negative due to size
Ø TFT UpdateØ VA mispredictionØ 2MB L1-TLB fillØ 2MB L1-TLB Invalidation
Partition DecoderØ For 32kB CacheØ For 64kB Cache
SEESAW: Cache line insertion policy
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
19
6/26/18
VPNSet
indexblock offset
VA
TLB
PPN Baseline Page OffsetPA
set-N
set-1
tag Data blockvWay-3Way-4
Way-3Way-4
Way-3Way-4
Way-3Way-4
Baseline Page Offset
tag Data blockvWay-1Way-2
Way-1
Way-2
Way-1Way-2
Way-1Way-2
set-1
set-N
Partition bit
Translation Filter Table
(TFT)
Partitiondecoder
Partition-0 Partition-1Cache
Which partition should cache-
line be inserted?
SEESAW: Cache line insertion policy¡4way-8way¡Superpage miss: victim within the partition¡Basepage miss: victim within the set
¡4way¡Uses LRU within the associated partition¡Avoid installing the same line twice¡Saves energy
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
20
6/26/18
SEESAW: System Level Optimization¡Cache coherence ¡Cache coherence lookups use physical address ¡Snoopy provide higher energy benefits over Directory based
coherence
¡Page table modifications¡Superpage splintered into multiple basepages¡Multiple basepages promoted to superpages
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
21
6/26/18
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
22
6/26/18
SEESAW: Simulated system
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
23
6/26/18
SEESAW: Workloads
¡Spec¡Parsec¡Cloudsuite¡Tunkrank
¡Biobench¡Mummer¡Tiger
¡MongoDB
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
24
6/26/18
¡Server Workload¡graph500¡Nutch Hadoop
¡Social-event web service¡Olia
¡Key value store¡Redis
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
25
6/26/18
SEESAW: Performance improvement
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
26
6/26/18
SEESAW observes 3-10% better runtime over baseline
SEESAW: Performance improvement
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
27
6/26/18
Out-of-orderCPU
in-orderCPU
~10% performance improvement for 64kB cache in OoO CPUs
SEESAW: Energy savings
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
28
6/26/18
10-20% more energy savings over CPUs using baseline VIPT caches!
Approx. one-third of energy savings from coherence
SEESAW: TFT analysis and Way-Prediction
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
29
6/26/18
TFT Analysis SEESAW + Way-prediction
16-entry TFT drives miss-rate under 10%SEESAW+WP shows symbiotic behavior
Outline¡Motivation
¡SEESAW: Concept
¡SEESAW: Micro-architecture
¡Evaluation Methodology
¡Results
¡Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
30
6/26/18
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
31
6/26/18
Revisiting L1 Cache Characteristic
Fast lookup
High hit-rate
EnergyEfficiency
SEESAW: Conclusion
Mayank Parasar, School of Electrical and Computer Engineering, Georgia Tech
32
6/26/18
SetAssociativity
¡ L1 caches are optimized for latency¡ VIPT imposes indirect restriction on number of
sets in a L1 cache, increasing associativity¡ There is non-linear relation between associativity
and access latency/energy of the L1 cache
¡ Superpages are often used in modern OSes¡ SEESAW provides low-associative access to
superpages, providing both latency and energy benefits
¡ Up to 10 % performance improvement and 20 % energy reduction in modern workloads
¡ SEESAW has extremely low-overhead and is readily implementable