Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
2011 NVRAMOS
Operating System Supports
for SCM as Main Memory Systems
(Focusing on iBuddy)
2011. 4. 19
Jongmoo Choi
http://embedded.dankook.ac.kr/~choijm
NVRAMOS 11
Contents
Overview
Motivation
Observations
Proposal: iBuddy (inverse Buddy)
Performance Evaluation
Conclusion
NVRAMOS 11
Overview
Get sidetracked
삼천포삼천포삼천포삼천포진주진주진주진주
NVRAMOS 11
Motivation
SCM Introduction
� Both DRAM and Storage Characteristics
� Byte-addressable, Non-volatile
� PRAM, MRAM, FRAM, RRAM, …
� Technical Hurdles for using main memory
� Performance
� Endurance
NVRAM (or SCM)
(Source: M. Qureshi et al., “Scalable High Performance Main Memory System
Using Phase-Change Memory Technology”, ISCA,09)
NVRAMOS 11
Motivation
Previous research
� M. Qureshi, et al., "Scalable high performance main memory system using phase-change memory technology", ISCA’09. � Hybrid main memory, Caching, Delayed writes, Line-level writes
� P. Zhou et al., "A durable and energy efficient main memory using phase change memory technology“, ISCA’09.� Removing redundant bit-writes, Row shifting and segment swap
� B. Lee et al., "Architecturing phase change memory as a scalable dram alternative", ISCA’09. � Partial writes: track dirty data in CPU cache
� A. Wang et al., "Conquest: Better performance through a disk/persistent-ram hybrid file system“, USENIX’02.
� J. Condit et al., "Better i/o through byte-addressable, persistent memory", SOSP’09.
� A. Caulfield et al., "Moneta: A high-performance storage array architecture for next-generation, non-volatile memories", MICRO’10.
(Source: M. Qureshi’s ISCA’09 paper) (Source: P. Zhou’s ISCA’09 paper)
NVRAMOS 11
Motivation
Previous research
� Mainly based on hardware-level approach
Any feasible OS-level approach?
� Focusing on endurance issue
� Fair page frame allocation for wear-leveling
� Instincts
� Positive relation between allocation and write
� Burst writes can be mitigated by CPU cache
� Can obtain long term wear-leveling without keeping allocation
counts per each page frame
NVRAMOS 11
Observations
Page frame allocation and write distribution
� Test environments: Intel 8 cores, 32GB DRAM, 450GB*10 Disks
� OS: Linux 2.6.32
� Workload: Unixbench
10000
2000
4000
8000
6000
0x20000000 0x40000000 0x60000000 0x80000000
+ : 쓰기쓰기쓰기쓰기 연산연산연산연산 횟수횟수횟수횟수
X : 페이지페이지페이지페이지 할당할당할당할당 횟수횟수횟수횟수
Physical Memory Address
Count
NVRAMOS 11
Observations
Memory manager in Linux
� Lazy buddy system
� Re-allocate the recently freed page frames with higher probability
� Lazy layer deteriorates unfairness
� Group management makes it difficult to employ an allocation scheme
based on allocation-counts of each page frame
� Is it possible to manage each page frame individually for fair allocation?
24
23
22
21
20
0 0 0 0 0 0 0 0
0 0 0 0
0 0
0 0 0 1 0 0 0 0
1 0 0 0
0 0
1 0
0
1 2 3 4 5 6 7 8 9 1
0
1
1
1
2
1
3
1
4
1
5
0
1
2
3
4
0
8, 9
0, 1, 2, 3, 4, 5, 6, 7
11
Bitmap Free listBuddy
Group
Lazy list
10 13
Lazy Layer
BuddyLayer
NVRAMOS 11
Observations
Request types: Single vs. Multiple
� Same test environments
� Mainly single page frame requests
KernelCompile
Lmbench Stream Unixbench Dbench Tbench
Multiple page 106316 9442 25 3195113 351 366
Single page 85977206 89195105 1504996 213534514 38052300 112785688
0%
20%
40%
60%
80%
100%
Re
qu
est
ra
tio
NVRAMOS 11
Observations
Service layer
� Same test environments
� Large portion of requests are handled in the buddy layer
� Depend on workload characteristics (burstiness)
KernelCompile
Lmbench Stream Unixbench Dbench Tbench
the buddy layer 78192436 76376010 1504080 77406822 22464403 26593767
the lazy layer 7784770 12819095 916 136127692 15587897 86191921
0%
20%
40%
60%
80%
100%
Se
rvic
e r
atio
NVRAMOS 11
Observations
Response time
� Same test environments
� Significant Buddy layer overhead (for splitting and coalescing)
� Large response time variations
� Get sidetracked
NVRAMOS 11
Proposal: iBuddy
New buddy system
� Fair allocation (based on allocation counts)
� Overcome the unfairness problem of Lazy layer
� Individual page frame management
� Reducing the splitting and coalescing overheads
� In addition, efficient handling multiple page frames requests
� iBuddy: Inverse (or Individual) Buddy
NVRAMOS 11
Proposal: iBuddy
Structure
� Individual page frame management
� Dual meaning bitmap
� Splitting or coalescing occurs only for multiple page frames request
(laziest buddy)
NVRAMOS 11
Proposal: iBuddy
Allocation
� after handing two single page frame requests
NVRAMOS 11
Proposal: iBuddy
Free
� after handing a single page frame (10) free request
NVRAMOS 11
Proposal: iBuddy
Summary of iBuddy characteristics
Lazy Buddy System Lazy iBuddy system
When Coalescing happened Page is freed into buddy layer Multiple page allocation request
When Splitting happenedPage is allocated from buddy
layerMultiple page free request
Time
complexity
Single page O(logn) O(1)
Multiple pages O(logn) O(n)
Lock granularity on buddy layer Coarse-granularity Fine-granularity
The number of Pages Management Policy
on the lazy layerBulky Bypass
Performance improvement ratio (baseline : Lazy Buddy system)
- 32%
Standard deviation 1400 cycles 400 cycles
NVRAMOS 11
Performance Evaluation
Allocation/Free response time
� Test environments: Intel 8 cores, 32GB DRAM, 450GB*10 Disks
� OS: Linux 2.6.32
� Workload: Kernel compile, Lmbench, Stream, Unixbench,
Dbench, Tbench
NVRAMOS 11
Performance Evaluation
Performance Improvement Analysis
NVRAMOS 11
Performance Evaluation
Variation of Response time
NVRAMOS 11
Performance Evaluation
But, …
� Total execution time of benchmark
1 2 4 8 16 32
Number of Threads
Lazy Buddy 1 1 1 1 1 1
Lazy iBuddy 1.054610486 1.053342336 1.021406728 0.91680208 0.943805598 0.93977559
0
0.2
0.4
0.6
0.8
1
1.2
No
rma
lize
d E
xe
cu
tio
n T
ime
( b
ase
lin
e :
La
zy B
ud
dy S
yste
m)
Lmbench (Normalized Results)
Lazy Buddy
Lazy iBuddy
NVRAMOS 11
Performance Evaluation
Possible causes about performance degradation for
small thread cases
Order 5
Order 4
Order 3
Order 2
Order 1
Order 0
0 1 2...
15
Order 5
Order 4
Order 3
Order 2
Order 1
Order 0
8
0
4 12
10 14 2 6
11 9 3 13 1 15 5 7
Original Buddy System
ibuddy Buddy System
0 1 2 ... 15
8 0 4 12 10 …5 7
Allocation Sequence
Allocation Sequence
Memory Access Sequence
Address Mapping
Transaction Queue Scheduler
FR-FCFS
Data Bus
Com
mand B
us
Buffer1 Buffer2 Buffer3 Buffer4
Page 0
Page 4
Page 8
Page 12
Page 1
Page 5
Page 9
Page 13
Page 2
Page 6
Page 10
Page 14
Page 3
Page 7
Page 11
Page 15
Row
Access
Column
Access
Bank1 Bank2 Bank3 Bank4
Memory
Controller
Row Buffer
DRAM page
NVRAMOS 11
Conclusion
New buddy system: iBuddy
� Inverse thinking
� Managing page frames individually
� Splitting and coalescing occurs on multiple page frames request
� But, the original lazy buddy has its own strong points
� CPU cache, multibank
� Can keep large consecutive page frames
� Issues
� Multicore/Multibank
• Multibank parallelism
• Multicore issues (lock issues in the buddy system)
• NUMA issues
� Fair-allocation for SCM
• RB-tree
• Performance degradation issues