SSD RAID as Cache (SRC) with a Log-structured Approach for Performance and Reliability
Yongseok Oh ([email protected]), PhD Candidate
University of Seoul
IBM Watson Research Center
Hybrid Storage Systems
• Combine emerging SSDs and conventional HDDs
  § Provide SSD-like performance at HDD-like price
• Our focus: issues in managing the SSD cache
  § The SSD cache solution is key to performance
[Figure: hybrid storage system; I/O requests are served by an SSD cache holding hot data (the SSD cache solution is our focus), backed by HDDs as primary storage holding cold data]
How do we guarantee SSD cache reliability?
SSD RAID as Cache (SRC) [INFLOW’13/ACM OSR’14]
Ongoing Project and Its Status
[Timeline: start (April 13th) → INFLOW'13 (Nov 3rd, best paper) → DiskSim simulator (Dec 13th) → ACM OSR'14 (Jan 14th) → official release (Jan 15th) → DM-SRC real prototype → now (Oct 27th) → USENIX ATC'15?]
• Share experiences and results
• Paper:
• Source:
Unreliable Write-back Policy with Single SSD
[Figure: the SSD cache S/W layer (Bcache, DM-Cache, Flash Cache) sits between the file system (e.g., Ext4) and the SATA / network storage drivers (NAS/SAN); with an unreliable write-back policy, a single cache SSD can lose data (OS, DB, JPG, AVI) due to high error rates, wear-out, FTL bugs, or device destruction]
• Write-back policy: minimum dirty ratio
  § FlashCache: 20% (default)
  § Bcache: 10% (default)
• Write-through policy
  § Writes are synchronously sent to both SSDs and HDDs
• Degraded performance is inevitable
Alternatives for Reliability of SSD Caches
[Table: three options, write-back, write-back with a dirty threshold, and write-through, compared on performance and reliability; write-back performs best but is least reliable, while write-through is reliable but slow]
Our goal: improve write-back reliability
• SSD RAID as Cache: advantages
  § High performance
  § High reliability
  § On-the-fly SSD replacement
  § Flexible capacity management
Our Approach: SSD RAID as Cache
[Figure: (a) a typical SSD cache uses a single SSD in front of primary storage; (b) our idea uses an SSD RAID of multiple SSDs as the cache, enabling failure recovery via parity]
• We investigate the problems of FlashCache with RAID5
  § As a fast prototype
• We propose SSD RAID as Cache (SRC)
  § To the best of our knowledge, this is the first study to use an SSD RAID as a cache
  § We implement SRC in Linux
• We evaluate SRC against other solutions
Our Contributions
• SSD cache solution for Linux
  § Developed by Mohan Srinivasan
  § Released in 2011
  § A popular solution
• Features
  § Write-back, write-through
  § FIFO, LRU policies
• Unsupported features
  § Multiple SSDs (designed for a single SSD)
  § Erasure coding
FlashCache
[Figure: FlashCache in the SSD cache S/W layer between the file system (e.g., Ext4) and the network storage driver (NAS/SAN), caching I/O requests on a single SSD in front of primary storage]
Source: http://cdn.oreillystatic.com/en/assets/1/event/45/Flashcache%20Presentation.pdf
Fast Prototype: FlashCache with SSD-based RAID
[Figure: FlashCache layered on Linux RAID5 (MD) over four 1TB SSDs, 3TB of data plus 1TB of parity, yielding a reliable 3TB SSD RAID cache in front of primary storage, below the file system (e.g., Ext4) and network storage driver (NAS/SAN)]
Fast Prototype: FlashCache with SSD-based RAID
[Figure: the same FlashCache-on-Linux-RAID5 (MD) stack, annotated; the design is not optimized for random writes and suffers the small-write problem (read-modify-write)]
• Hash-based mapping scheme
FlashCache: Not Optimized for Random Write
[Figure: FlashCache layout; the SSD cache holds metadata blocks plus cached data blocks 0-7, and an HDD block is placed by cache_blk = hdd_blk % no_of_cache_blks, e.g., hdd_blk 10 maps to cache block 2 = 10 % 8]
A single update requires two write I/O requests: one for the data block and one for its metadata block.
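The mapping above can be sketched in a few lines of Python (a simplified, direct-mapped illustration; real FlashCache maps blocks into sets and searches within them):

```python
# Simplified sketch of FlashCache's hash-based (modulo) mapping.
# Real FlashCache maps a block to a *set* and searches within it;
# this direct-mapped version only illustrates the modulo placement.

NO_OF_CACHE_BLKS = 8

def cache_blk(hdd_blk: int) -> int:
    """Map an HDD block number to a cache block slot."""
    return hdd_blk % NO_OF_CACHE_BLKS

def update(hdd_blk: int):
    """A single cached-block update issues two SSD writes:
    one for the data block and one for its metadata block."""
    slot = cache_blk(hdd_blk)
    return [("write", "data", slot), ("write", "metadata", slot)]

print(cache_blk(10))    # 2 = 10 % 8, as in the figure
print(len(update(10)))  # 2 write I/Os per update
```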
Small Write Problem of RAID-5
[Figure: a RAID-5 layer over SSDs holding cached blocks D0, D1, D2 and parity P; when FlashCache writes D0', the parity generator reads the old D0 and P, XORs them with D0', and writes back both D0' and the new parity before anything reaches primary storage]
A small write request incurs 2 reads + 2 writes.
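The read-modify-write can be expressed with XOR (a sketch that models blocks as small integers rather than 4KB buffers):

```python
# RAID-5 small-write (read-modify-write) sketch.
# new_parity = old_parity XOR old_data XOR new_data,
# so updating one block costs 2 reads + 2 writes.

def small_write(old_d0: int, old_p: int, new_d0: int):
    ios = []
    ios.append(("read", "D0"))        # read old data
    ios.append(("read", "P"))         # read old parity
    new_p = old_p ^ old_d0 ^ new_d0   # recompute parity incrementally
    ios.append(("write", "D0"))       # write new data
    ios.append(("write", "P"))        # write new parity
    return new_p, ios

# Parity stays consistent: for a stripe D0, D1, D2 with P = D0^D1^D2
d0, d1, d2 = 0b1010, 0b0110, 0b0011
p = d0 ^ d1 ^ d2
new_d0 = 0b1111
new_p, ios = small_write(d0, p, new_d0)
assert new_p == new_d0 ^ d1 ^ d2   # parity now covers the new stripe
assert len(ios) == 4               # 2 reads + 2 writes
```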
Putting It All Together
[Figure: amplified I/O requests of FlashCache with RAID5; FlashCache doubles each file-system write (data + metadata), and RAID5 quadruples each small write with parity reads and writes. A single write thus incurs 4 reads and 4 writes (8 I/Os), degrading both performance and SSD lifetime]
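Combining the two layers, the 8-I/O count can be checked mechanically (an illustrative model of the layering, not the actual code path):

```python
# One file-system write is split by FlashCache into a data write
# and a metadata write; RAID5 then turns each small write into
# 2 reads + 2 writes (read-modify-write of data and parity).

def flashcache_writes(fs_writes: int) -> int:
    return fs_writes * 2            # data + metadata

def raid5_ios(small_writes: int):
    reads = small_writes * 2        # old data + old parity per write
    writes = small_writes * 2       # new data + new parity per write
    return reads, writes

reads, writes = raid5_ios(flashcache_writes(1))
print(reads, writes)  # 4 reads and 4 writes, 8 I/Os in total
```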
• Motivation
• Fast Prototype: FlashCache with RAID5
• SSD RAID-based Cache (SRC)
• Conclusion
Outline
• Use multiple SSDs
• LFS layout
• Erasure coding
  § E.g., RAID-4, -5, -6
  § RAID-4 (default)
• Selective GC
• Separated striping
SSD RAID as Cache (SRC)
[Figure: SRC in the SSD cache S/W layer, handling I/O requests between the file system (e.g., Ext4) and the SATA / network storage drivers (NAS/SAN); the key features above are the focus in the paper]
Implementation
• Software solution for Linux
• Based on DM-WriteBoost (forked in Dec 2013)
• 26,516 lines removed, 34,855 lines added
• SRC will be released this year
SSD Layout of SRC
[Figure: the SRC layer between the file system and primary storage; four SSD caches (0-3) are organized into stripes 0..N-1 filled by log-structured writes. (a) A write stripe holds dirty data blocks (W) plus metadata (M) and parity (P); (b) a read stripe holds clean data (R) without parity, since the same contents are already on the HDDs]
• Goal: minimize parity and metadata update overhead
Log-structured Approach
[Figure: incoming writes 0', 4', 8', 11', 17' target blocks scattered across existing stripes (e.g., Stripe 0) but are not updated in place; instead, the modified data are written sequentially into free stripes together with their metadata (M) and parity (P)]
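The log-structured write path can be sketched as follows (the stripe structure and names such as `flush_stripe` are our illustration, not the DM-SRC source):

```python
from functools import reduce

STRIPE_DATA_SLOTS = 3  # data blocks per stripe (illustrative width)

def flush_stripe(dirty, log_head):
    """Aggregate modified blocks, append a metadata block and a
    parity block, and write the whole stripe sequentially at the
    current log head. No read-modify-write is needed, because the
    full stripe is formed in memory before any SSD is written."""
    assert len(dirty) == STRIPE_DATA_SLOTS
    metadata = {"lba_map": [b["lba"] for b in dirty]}
    parity = reduce(lambda a, b: a ^ b, (b["data"] for b in dirty))
    stripe = {"at": log_head, "data": dirty,
              "meta": metadata, "parity": parity}
    return stripe, log_head + 1  # the log head advances sequentially

stripe, head = flush_stripe(
    [{"lba": 0, "data": 0b1010},
     {"lba": 4, "data": 0b0110},
     {"lba": 8, "data": 0b0011}],
    log_head=0)
assert stripe["parity"] == 0b1010 ^ 0b0110 ^ 0b0011
assert head == 1
```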
• Ubuntu 13.10 server (Kernel 3.11.7)
• Intel Xeon CPU (E5-2640)
• Samsung 840 Pro SSD x 8 (Cache device)
• 1TB iSCSI storage (Backing storage)
• Configurations
  § FlashCache with Linux RAID0 (no parity)
  § FlashCache with Linux RAID5 (parity)
  § DM-SRC (our Linux implementation)
• FIO benchmark tool
  § 4KB random writes (direct, aio options)
  § 8 threads, 128 I/O depth, 32GB of requests issued in total
Evaluation Setup
Improved Performance of SRC
[Graph: FIO benchmark (4KB writes), throughput (MB/s) vs. number of SSDs (4-8); DM-SRC scales with the SSD count, about 2X faster than FlashCache with RAID0 (no data protection) and about 25X faster than FlashCache with RAID5, which suffers from the small-write problem]
• DM-SRC shows the best performance
  § It aggregates modified data with metadata and parity
(DM-SRC: a version of SRC for Linux)
Enhanced Lifetime of SRC
[Graph: erase count vs. number of SSDs (4-8), measured using DiskSim; DM-SRC roughly halves the erase count relative to FlashCache with RAID0 and RAID5, increasing SSD lifetime]
• Traces are extracted using the blktrace tool in Linux
  § Erase counts are measured by replaying each trace in DiskSim
Trace Replay with Real Workload Traces
<Characteristics of I/O workloads>
Workload | Avg Req Size (KB): Read / Write | Request Amount (GB): Read / Write | Read Ratio
Fin (Financial) [UMass] | 5.73 / 7.2 | 6.76 / 28.16 | 0.19
HM (Hardware Monitoring) [MSR Cambridge] | 8.128 / 9.306 | 10.98 / 22.85 | 0.32
Prn (Printer) [MSR Cambridge] | 23.011 / 11.272 | 13.22 / 53.56 | 0.20
Prxy (Proxy) [MSR Cambridge] | 8.412 / 7.031 | 3.077 / 81.36 | 0.03
*UMass traces available at http://traces.cs.umass.edu
*MSR Cambridge traces available at http://iotta.snia.org/tracetypes/3
• Trace replayer: Linux aio, direct I/O, 4 threads, 32 queue depth
  § https://bitbucket.org/yongseokoh/trace-replay
• Write-intensive traces
  § Small I/O requests, random workloads
Evaluation Using Real Traces
• Using four SSDs (Samsung 840 Pro 128GB)
[Graph: bandwidth (MB/s) for Fin, HM, Prn, Prxy; DM-SRC improves performance over FlashCache with RAID5 by 3.2X to 8.7X across the traces]
[Graph: erase count for Fin, HM, Prn, Prxy; DM-SRC reduces erase counts by 1.1X to 4.1X across the traces]
Additional Advantage: Quick Upgrading
• SSD upgrading
  § # of SSDs increases from 4 to 5
  § Performance and capacity increase
• FlashCache with RAID5: data re-distribution
  § Took about 20 minutes
  § Nearly all data must be re-distributed due to the round-robin layout
• DM-SRC: no data re-distribution
  § Took only a few seconds, since only the on-SSD super block and in-memory metadata are modified
  § Write requests are NATURALLY re-distributed across SSDs by the LFS approach, instead of compulsory data migration
Traditional RAID5 Upgrading: Heavy Data Migration
[Figure: adding a new disk to a round-robin RAID5; the 4-disk stripe layout with rotated parity must be rewritten into a 5-disk layout, moving nearly every block and regenerating every parity]
• Goal: sequential I/O optimization
  § Nearly all data need to be migrated
  § All parities need to be re-generated
• Hurts the performance and lifetime of SSDs
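The migration cost follows directly from round-robin placement (a sketch using a simple block % n mapping; real MD layouts also rotate parity, which only adds work):

```python
# With round-robin striping, block b lives on disk b % n.
# Growing n changes the home disk of nearly every block,
# so almost all data must move and all parity is rebuilt.

def moved_fraction(num_blocks: int, old_n: int, new_n: int) -> float:
    moved = sum(1 for b in range(num_blocks)
                if b % old_n != b % new_n)
    return moved / num_blocks

print(moved_fraction(10_000, 4, 5))  # 0.8: 80% of blocks relocate
```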
Our SRC Approach: No Data Re-Distribution
[Figure: log-structured stripes across 4 SSDs plus a newly added SSD; invalidated blocks (X) are reclaimed by GC, and the current log pointer simply begins writing new stripes across all 5 SSDs as write requests arrive, without moving any existing data]
• Goal: no data redistribution
  § No parity regeneration
  § Done simply by adding the SSD to the system
• Natural expansion
  § Writes are distributed across all SSDs
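SRC's expansion can be sketched in the same terms (illustrative; `SrcCache` and its stripe handling are our names, not from the DM-SRC source):

```python
# In a log-structured cache, old stripes keep their 4-wide layout;
# only the recorded stripe width changes (super block + in-memory
# metadata), and the log pointer starts forming 5-wide stripes.
# No existing block moves and no parity is regenerated.

class SrcCache:
    def __init__(self, num_ssds: int):
        self.num_ssds = num_ssds   # stripe width for NEW stripes
        self.stripes = []          # (width, blocks) already on flash

    def append_stripe(self, blocks):
        self.stripes.append((self.num_ssds, blocks))

    def add_ssd(self) -> int:
        """Expansion = update the width; zero data migration."""
        self.num_ssds += 1
        return 0                   # blocks moved

cache = SrcCache(4)
cache.append_stripe(["d0", "d1", "d2", "P"])
moved = cache.add_ssd()
cache.append_stripe(["d3", "d4", "d5", "d6", "P"])
assert moved == 0                  # nothing migrated
assert cache.stripes[0][0] == 4    # old stripe untouched
assert cache.stripes[1][0] == 5    # new stripes span 5 SSDs
```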
Impact of Upgrading on Performance
[Figure: upgrade timeline; the foreground job replays the trace using 4 SSDs (Phase 1), upgrading starts and the upgrading process runs as a background job, then the trace is replayed using 5 SSDs (Phase 2)]
Impact of Upgrading on Performance
[Graphs: bandwidth (MB/s) for Fin, HM, Prn, Prxy in Phase 1 vs. Phase 2 (upgrading from 4 to 5 SSDs); for DM-SRC the upgrade is effective and Phase-2 bandwidth rises, while for FlashCache with RAID5 the upgrade is NOT effective and performance is interfered with during redistribution]
Analysis of SSD Upgrading Performance
[Graph: bandwidth (MB/s) over execution time for the Financial trace; DM-SRC finishes in 90 sec, while FlashCache with RAID5 takes 1135 sec]
DM-SRC: Quick Upgrading
[Graph: DM-SRC bandwidth (MB/s) over execution time for the Financial trace; Phase 1 (50s) runs with 4 SSDs and Phase 2 (40s) with 5 SSDs, with performance improved by 20% due to the added SSD]
Comparison of SSD Upgrading Schemes
[Graph: bandwidth (MB/s) over execution time for the Financial trace; DM-SRC ends at 90 sec, while FlashCache with RAID5 ends at 1135 sec]
FlashCache with RAID5: Interfered Performance
[Graph: FlashCache with RAID5 bandwidth (MB/s) over execution time for the Financial trace; Phase 1 (515s) runs with 4 SSDs and Phase 2 (614s) with 5 SSDs, with performance degraded by 18% due to RAID5 data redistribution]
• DM-SRC vs. FlashCache with RAID5
  § Performance: up to 8.7X faster
  § Lifetime: up to 4.1X better
• SSD upgrading (4 to 5 SSDs)
  § DM-SRC: a few seconds
  § FlashCache with RAID5: about 20 minutes
Summary of Experiment Results
• OP-FCL (Optimal Partitioning Flash Cache Layer)
  § Trade-off: caching space vs. OPS (over-provisioned space)
  § Dynamically splits caching space among read, write, and OPS regions
  § Future work
    u Develop better destaging and replacement algorithms
• SRC (SSD RAID as Cache)
  § High performance and reliability
  § Future work
    u Derive an MTTDL model
    u Apply the OP-FCL algorithm
    u Release DM-SRC as open source
Concluding Remarks