Adaptive Memory System over Ethernet

Jun Suzuki, Teruyuki Baba, Yoichi Hidaka†, Junichi Higuchi, Nobuharu Kami, Satoshi Uchida††, Masahiko Takahashi, Tomoyoshi Sugawara, and Takashi Yoshikawa

System Platforms Research Laboratories, NEC Corporation
†IP Network Division, NEC Corporation
††2nd Computer Software Division, NEC Corporation
© NEC Corporation 2010, HotStorage ’10
Memory Scalability in Cloud Computing

Computer memory is limited by individually loaded resources:
• Cannot scale with service requirements
• Service performance limited by memory
• Slow block I/O devices

Need to scale memory beyond the individually loaded amount.
Resources Should Simultaneously Be High-Performance and Networked

Networked
• Resource sharing among multiple computers
• Ease of management

High-Performance
• Large throughput, low latency
• Avoid firmware processing and memory copies when transferring data
Related Work

(A) Intel Turbo Memory: high performance, but not shared by network
• PCIe flash device used as a disk cache
• Device driver sits between the OS block I/O layer and the disk driver
J. Matthews et al., “Intel Turbo Memory: Nonvolatile Disk Caches in the Storage Hierarchy of Mainstream Computer Systems”, ACM Trans. on Storage, vol. 4, no. 2, article 4, 2008.

(B) Remote Page Swap: shared by network, but lower performance
• Uses the memory of a neighboring machine as a swap target
• Standard interconnect, e.g., Ethernet
E. P. Markatos and G. Dramitinos, “Implementation of a Reliable Remote Memory Pager”, USENIX 1996 Annual Technical Conference, 1996.

[Figure: (A) Turbo Memory and SATA drivers handling block I/O from the OS; (B) remote memory reached through an Ethernet switch. Neither combines high performance with resource sharing by network.]
Our Method: Ethernet-Attached SSD as a High-Speed Swap Device

• Standard Ethernet and PCIe SSDs housed in I/O expansion boxes
• “ExpEther” interconnect: PCIe over Ethernet [1]
• PCIe DMA without firmware processing or memory copies
• Hot-plug and remove
→ High performance AND resource sharing

[1] J. Suzuki et al., “ExpressEther – Ethernet-Based Virtualization Technology for Reconfigurable Hardware Platform”, 14th IEEE Symposium on High-Performance Interconnects, pages 45-51, 2006.

[Figure: computers and PCIe SSDs in I/O expansion boxes, each attached to standard Ethernet through an ExpEther (EE) bridge.]
PCIe DMA over Ethernet

Extending the PCIe tree over Ethernet
• PCIe packets are encapsulated into Ethernet frames
• The Ethernet region behaves as a single PCIe switch
• No driver, no firmware processing, and no memory copy on the data path

High-speed Ethernet transport [1]
• Delay-based congestion control
• < 8.5% of TCP-based delay

[Figure: CPUs and host bridges in two computers connected through ExpEther bridges and standard Ethernet to SSD A and SSD B; PCIe DMA moves data directly between the SSDs and host memory.]

[1] H. Shimonishi et al., “A Congestion Control Algorithm for Data Center Area Communications”, 2008 International CQR Workshop, 2008.
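The encapsulation step can be sketched as below. This is a minimal illustration only, not the actual ExpEther frame layout: the EtherType value (the IEEE local-experimental 0x88B5) and the dummy TLP bytes are placeholders.

```python
import struct

# Illustrative sketch (NOT the real ExpEther format): wrap a PCIe TLP
# in a standard Ethernet frame so the Ethernet fabric can carry PCIe
# traffic like a distributed PCIe switch.

ETHERTYPE_PCIE_TUNNEL = 0x88B5  # local-experimental EtherType, placeholder


def encapsulate(dst_mac: bytes, src_mac: bytes, tlp: bytes) -> bytes:
    """Prepend a standard 14-byte Ethernet header to a raw PCIe TLP."""
    assert len(dst_mac) == 6 and len(src_mac) == 6
    header = dst_mac + src_mac + struct.pack("!H", ETHERTYPE_PCIE_TUNNEL)
    return header + tlp


def decapsulate(frame: bytes) -> bytes:
    """Strip the Ethernet header and return the embedded TLP."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    assert ethertype == ETHERTYPE_PCIE_TUNNEL
    return frame[14:]


tlp = b"\x40\x00\x00\x01" + b"\x00" * 12  # dummy memory-write TLP bytes
frame = encapsulate(b"\x02" * 6, b"\x04" * 6, tlp)
assert decapsulate(frame) == tlp
```

Because the bridge does this in hardware, no driver, firmware processing, or memory copy is needed on the host data path.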
Hot-Plug and Remove

SSDs assigned to computers with VLAN grouping
• Adaptive assignment using a system manager
• PCIe-standard hot-plug and remove

[Figure: Computers A, B, and C and SSDs A, B, and C attached to Ethernet through ExpEther bridges; the system manager partitions computer-SSD pairs into VLAN 1 and VLAN 2.]
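The assignment logic can be modeled as a toy sketch. The `SystemManager` class here is hypothetical: the real assignment is carried out by VLAN configuration in the Ethernet switches and ExpEther bridges, not by host software like this.

```python
# Toy model (illustrative only): assigning an SSD to a computer means
# placing both of their ExpEther bridge ports in the same VLAN, so the
# computer's PCIe tree sees the SSD via standard PCIe hot-plug.


class SystemManager:
    def __init__(self):
        self.vlan_of = {}  # node name -> VLAN id

    def assign(self, computer: str, ssd: str, vlan: int):
        # Moving an SSD out of its old VLAN looks like a PCIe
        # hot-remove to the previous owner; joining the new VLAN
        # looks like a hot-plug insertion to the new owner.
        self.vlan_of[computer] = vlan
        self.vlan_of[ssd] = vlan

    def visible_ssds(self, computer: str, ssds):
        # A computer sees exactly the SSDs grouped in its VLAN.
        vlan = self.vlan_of.get(computer)
        return [s for s in ssds if self.vlan_of.get(s) == vlan]


mgr = SystemManager()
mgr.assign("ComputerA", "SSD-A", vlan=1)
mgr.assign("ComputerB", "SSD-B", vlan=2)
print(mgr.visible_ssds("ComputerA", ["SSD-A", "SSD-B"]))  # -> ['SSD-A']
```

Reassigning an SSD is then just a VLAN change, which the OS observes as a standard hot-remove followed by a hot-plug.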
Evaluations

• Block I/O performance of the Ethernet-attached SSD
• System evaluation: in-memory DB
Evaluation Setups

(a) ExpEther (proposal): computer connected to the SSD through ExpEther bridges
(b) iSCSI (conventional): initiator with a NIC; target machine with a NIC and the SSD
(c) iSCSI with TOE (conventional): as (b), but with TCP offload engine (TOE) NICs
Block I/O Performance (IOPS) of Ethernet-Attached SSD

Random read IOPS is close to the host I/O slot; random write IOPS is about twice that of iSCSI with TOE.

[Figure: (a) random read and (b) random write IOPS versus access size (512 B to 4 MB) for direct insertion, ExpEther (EE), iSCSI with TOE, and iSCSI.]

Relative IOPS (host I/O slot = 100):
                      Host I/O Slot  ExpEther  iSCSI w/ TOE  iSCSI
Ran. Read                       100        92            50     14
Ran. Write                      100        74            42     14
Ran. Read w/ Switch             100        91            46     14
Ran. Write w/ Switch            100        68            39     14
Write IOPS Overhead and Its Solution

• To fetch write data, the SSD issues memory read requests to the computer and receives completions carrying the data.
• The number of the SSD’s outstanding read requests is limited to 4 by its implementation.
• Increasing the number of outstanding requests would bring write performance close to the host I/O slot.

[Figure: SSD sending up to 4 outstanding memory read requests to the computer and receiving completions with data.]
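Why a small outstanding-request limit caps write IOPS can be seen with a back-of-envelope latency-hiding model. The round-trip figures below are illustrative assumptions, not measurements from the paper.

```python
# Back-of-envelope model (illustrative numbers): each outstanding DMA
# read request occupies one slot for a full network round trip, so at
# most `outstanding` requests can complete per round-trip interval.
# With few outstanding requests, the write path is bounded by RTT
# rather than by link bandwidth.


def max_iops(outstanding: int, round_trip_us: float) -> float:
    """Upper bound on request completions per second."""
    return outstanding / (round_trip_us * 1e-6)


# Adding an Ethernet hop lengthens the RTT, so with only 4 outstanding
# requests the ceiling sits well below the host-slot case.
print(max_iops(4, 50.0))   # 4 requests at a 50 us RTT: roughly 80,000/s
print(max_iops(32, 50.0))  # more outstanding requests raise the ceiling
```

This matches the slide's point: the fix is not a faster link but deeper request pipelining in the SSD.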
System Evaluation: In-Memory Database

• RDB: PostgreSQL 8.1
• Benchmark: pgbench (TPC-B-like)
• CPU: Intel Core 2 Quad
• OS: CentOS 5.3 (Linux 2.6.18)
• Ethernet: 10GbE
• SSD: 16-GB partition of Fusion-io 160 GB (write-improve mode)
• #Clients: 100; transactions per client: 1000

[Figure: computer with the RDB file placed on a ramdisk in main memory, swapping to the Ethernet-attached SSD through ExpEther bridges.]
Scaling Up beyond Main Memory

• Performance is maintained when the DB files grow beyond system memory
• >79% of all-in-memory performance in the 4 GB memory + ExpEther case

[Figure: transactions per second versus total size of RDB files (0-12 GB) for 4G Mem + EE (SSD), 2G Mem + EE (SSD), 8G Mem (HDD), 4G Mem (HDD), and 2G Mem (HDD); the EE configurations maintain performance as the DB files grow.]
Comparison with a Conventional Protocol

• The proposal outperforms iSCSI by 139% in the best case.
• [Note] iSCSI with TOE could not be evaluated due to a software bug; calculation indicates the proposal outperforms it by 21%.

[Figure: transactions per second versus total size of DB files (0-12 GB) for Mem 4G + EE, Mem 2G + EE, Mem 4G + iSCSI, and Mem 2G + iSCSI, showing the performance gain of the EE configurations.]
Saving CPU Resources for Transaction Processing

High-speed swap saves CPU time for the user process; with slower swapping, more CPU time is consumed by page swap.

[Figure: CPU utilization breakdown (user, system, nice, wait, hardware irq, software irq) for Mem 4G + EE, Mem 2G + EE, and Mem 8G at total RDB file sizes of 6 GB and 10 GB; the configurations that swap show CPU time consumed by page swap.]
Conclusion

Adaptive memory expansion with an Ethernet-attached SSD as a high-speed swap device

Standard components
• Standard Ethernet and PCIe SSD
• No software driver needed for the Ethernet expansion

High performance and resource sharing
• PCIe DMA over Ethernet
• Block I/O performance superior to the conventional protocol
• PCIe hot-plug and remove

Proven system merits
• Maintains database performance beyond system memory
Future Work

Simultaneous sharing of an SSD among multiple computers
• PCIe I/O virtualization is emerging
• Efficient resource utilization
• High-speed data sharing

Solving the performance bottleneck of storage and database systems
• Network storage for system availability
• Removing the performance bottleneck caused by network storage