
From Rack scale computers to Warehouse scale computers



A survey report on rack scale computers and warehouse scale computers


  • 1. From Rack scale computers to Warehouse scale computers. AIST Information Technology Research Institute, 野 成. 2014/7/31 (Revised 8/6)

  • 2. Outline
    - Disaggregation
    - Rack scale computer
      - Case #1: Open Compute Project
      - Case #2: Intel Rack Scale Architecture
    - Warehouse scale computer
      - Case #1: HP The Machine
      - Case #2: UCB ASPIRE FireBox

  • 3. Rack scale computer: HP Moonshot
    - "Moonshot for extreme efficiency": converged infrastructure for extreme scale
    - Shared power, shared storage, shared fabric, shared management, shared chassis, shared cooling
    - A rich set of application-specific cartridges co-designed for extreme efficiency
    - The new metric: Gflops/Watt
    - At extreme scale there is no way to escape specialization and heterogeneity

  • 4. Open Compute Project
    - Launched in April 2011
    - Industry standard: 1.9 PUE vs. Open Compute Project: 1.07 PUE (PUE: Power Usage Effectiveness, total facility power divided by IT equipment power)
    - Commercial implementations: Quanta Rackgo X series, GIGABYTE DataCenter Solution series

  • 5. Open Compute Rack v2: Open Rack
    - Well-defined mechanical API between the server and the rack
    - Accepts equipment of any size from 1U to 10U
    - Wide (21-inch) equipment bay for maximum space efficiency
    - Shared 12 V DC power system
    - Available now from Delta Electronics (more suppliers coming soon)
    - http://www.slideshare.net/finalbsd/1-ocp-workshop

  • 6. (figure only)

  • 7. Intel Rack Scale Architecture: reference architecture
    - Accelerating rack scale innovation by delivering a suite of interoperable technologies
    - Efficiency through granularity at the physical and logical level
    - Intel technologies optimized for flexibility, performance, and cost
    - Open rack scale reference architecture to simplify adoption
    - Driving alignment on common standards with a broad range of users (end users, Scorpio, and OCP) and OEM implementations
    - Building blocks: CPU/memory modules (Atom and Xeon silicon), silicon photonics and switch fabric, storage (PCIe SSD and caching), Open Network Platform, orchestration
    - Goals: a flexible and cost-effective network platform, increased utilization through storage aggregation, extreme compute and network bandwidth, and platform flexibility to increase useful life and capacity

  • 8. Silicon Photonics for Disaggregation
    - Mezzanine options: Intel Ethernet controller with Intel Silicon Photonics, or optical PCIe via Intel Silicon Photonics
    - Intel Xeon processor based trays and Intel Atom micro-server trays connected by mezzanine fiber
    - 100 Gb in the rack enables flexible topologies and distributed switching

  • 9. Optical rack: choice of logical architecture
    - Interoperable and programmable systems based on standard platforms
    - Choice of platform subsystems and logical-architecture composability
    - One option: network and storage move into the TOR switch
    - Another option: the TOR switch is distributed into the servers
    - The architecture offers flexible solutions and multiple value propositions
    - (diagram labels: CPU/Mem/DDR server trays; Xeon PCIe and Atom Ethernet fabric; compute, HDDs, PCIe SSDs; SiPh 100G links; remote storage and I/O appliance; uplinks to spine switches; switch ASIC, CPU, NIC, SSD inside servers)

  • 10. Example usages
    - Public cloud, private cloud, big data, IMDB (future), CSP software stacks
    - Range of end-user usage models driving innovation
    - OEMs delivering a range of implementations
    - Industry delivering common building blocks with flexible configurations
    - Range of emerging solution stacks with composability

  • 11. Warehouse scale computer. Excerpt from THE DATACENTER AS A COMPUTER (the source shown on the slide):

    Figure 1.1 depicts some of the more popular building blocks for WSCs. A set of low-end servers, typically in a 1U or blade enclosure format, are mounted within a rack and interconnected using a local Ethernet switch. These rack-level switches, which can use 1- or 10-Gbps links, have a number of uplink connections to one or more cluster-level (or datacenter-level) Ethernet switches. This second-level switching domain can potentially span more than ten thousand individual servers.

    Storage. Disk drives are connected directly to each individual server and managed by a global distributed file system (such as Google's GFS [31]) or they can be part of Network Attached Storage (NAS) devices that are directly connected to the cluster-level switching fabric. A NAS tends to be a simpler solution to deploy initially because it pushes the responsibility for data management and integrity to a NAS appliance vendor. In contrast, using the collection of disks directly attached to server nodes requires a fault-tolerant file system at the cluster level. This is difficult to implement but can lower hardware costs (the disks leverage the existing server enclosure) and networking fabric utilization.

    FIGURE 1.1: Typical elements in warehouse-scale systems: 1U server (left), 7' rack with Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/router (right).

    Storage Hierarchy. Figure 1.2 shows a programmer's view of the storage hierarchy of a typical WSC. A server consists of a number of processor sockets, each with a multicore CPU and its internal cache hierarchy, local shared and coherent DRAM, and a number of directly attached disk drives. The DRAM and disk resources within the rack are accessible through the first-level rack switches (assuming some sort of remote procedure call API to access them), and all resources in all racks are accessible via the cluster-level switch.

    Quantifying Latency, Bandwidth, and Capacity. Figure 1.3 attempts to quantify the latency, bandwidth, and capacity characteristics of a WSC. For illustration we assume a system with 2,000 servers, each with 8 GB of DRAM and four 1-TB disk drives. Each group of 40 servers is connected through a 1-Gbps link to a rack-level switch that has an additional eight 1-Gbps ports used for connecting the rack to the cluster-level switch (an oversubscription factor of 5).

    FIGURE 1.2: Storage hierarchy of a WSC.

    (A back-of-the-envelope aggregation of the figures in this excerpt follows below.)
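The numbers quoted in the excerpt (2,000 servers, 8 GB of DRAM and four 1-TB disks per server, 40 servers per rack, 1-Gbps server links, eight 1-Gbps uplinks per rack) are enough to sketch the hierarchy. The script below simply aggregates those quoted figures per server, per rack, and per cluster; it is an illustrative calculation, not taken from the book or the slides.

```python
# Back-of-the-envelope aggregation of the WSC example quoted above.
# All constants come from the excerpt; the script itself is illustrative.

SERVERS            = 2000
SERVERS_PER_RACK   = 40
DRAM_PER_SERVER_GB = 8
DISKS_PER_SERVER   = 4
DISK_SIZE_TB       = 1
SERVER_LINK_GBPS   = 1
RACK_UPLINKS       = 8        # eight 1-Gbps uplink ports per rack switch
UPLINK_GBPS        = 1

racks = SERVERS // SERVERS_PER_RACK   # 50 racks

levels = {
    "server":  dict(dram_gb=DRAM_PER_SERVER_GB,
                    disk_tb=DISKS_PER_SERVER * DISK_SIZE_TB),                          # 8 GB / 4 TB
    "rack":    dict(dram_gb=SERVERS_PER_RACK * DRAM_PER_SERVER_GB,
                    disk_tb=SERVERS_PER_RACK * DISKS_PER_SERVER * DISK_SIZE_TB),       # 320 GB / 160 TB
    "cluster": dict(dram_gb=SERVERS * DRAM_PER_SERVER_GB,
                    disk_tb=SERVERS * DISKS_PER_SERVER * DISK_SIZE_TB),                # 16,000 GB / 8,000 TB
}

# Rack-switch oversubscription: 40 x 1 Gbps of server links share
# 8 x 1 Gbps of uplinks, i.e. the factor of 5 the excerpt mentions.
oversubscription = (SERVERS_PER_RACK * SERVER_LINK_GBPS) / (RACK_UPLINKS * UPLINK_GBPS)

for name, cap in levels.items():
    print(f"{name:>7}: DRAM {cap['dram_gb']:>6} GB, disk {cap['disk_tb']:>5} TB")
print(f"racks: {racks}, rack-uplink oversubscription: {oversubscription:.0f}x")
```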
  • 12. HP The Machine (SoC-based node, figure)

  • 13. HP Nanostores / Memristor: universal memory
    - A drastic reduction of memory-stack complexity and cost
    - But it requires a complete software-stack redesign to leverage the full potential of the new architecture

  • 14. (figure only)

  • 15. Architecture evolution/revolution
    - Computing Ensemble: bigger than a server, smaller than a datacenter, with built-in system software
    - Disaggregated pools of uncommitted compute, memory, and storage elements
    - Optical interconnects enable dynamic, on-demand composition
    - Ensemble OS software uses virtualization for composition and management
    - Management and programming virtual appliances add value for IT and application developers
    - (diagram labels: photonic interconnect; compute, memory, NV memory, and storage elements; on-demand composition; Ensemble OS; Ensemble Management; Ensemble Programming)

  • 16. Example Usage (1)

  • 17. Example Usage (2)

  • 18. Performance estimation: HPC Challenge RandomAccess benchmark

  • 19. Performance estimation: Graph 500 benchmark (comparison with BG/Q)

  • 20. Roadmap

  • 21. UC Berkeley FireBox overview
    - 1 Terabit/sec optical fibers and high-radix switches
    - Up to 1,000 SoCs plus high-bandwidth memory (100,000 cores in total)
    - Up to 1,000 non-volatile memory (NVM) modules (100 PB in total)
    - Inter-Box network: many short paths through high-radix switches
    - (cf. HP The Machine)

  • 22. UC Berkeley FireBox: photonic switches
    - Monolithically integrated silicon photonics with Wave-Division Multiplexing (WDM)
      - A fiber carries 32 wavelengths, each 32 Gb/s, in each direction
      - Off-chip laser optical supply, on-chip modulators and detectors
    - Multiple radix-1000 photonic switch chips arranged as the middle stage of a Clos network (first and last Clos stages are inside the sockets)
    - 2K endpoints can be configured as either SoC or NVM modules
    - Within a Box, all paths are two fiber hops: electrical-to-photonic conversion at the socket, one fiber hop from socket to switch, photonic-to-electrical conversion at the switch, electrical packet routing in the switch, electrical-to-photonic conversion at the switch, one fiber hop from switch to socket, and photonic-to-electrical conversion at the destination socket
    - (A quick arithmetic check of these figures follows below.)
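The 1 Terabit/sec figure on the FireBox overview slide lines up with the WDM parameters on the photonic switch slide. The check below just multiplies the quoted numbers (32 wavelengths at 32 Gb/s per fiber, up to 1,000 SoCs with 100,000 cores, up to 1,000 NVM modules totalling 100 PB); it is an illustrative calculation, not part of the FireBox design documents.

```python
# Quick arithmetic check of the FireBox figures quoted above; illustrative only.

WAVELENGTHS_PER_FIBER = 32        # "a fiber carries 32 wavelengths"
GBPS_PER_WAVELENGTH   = 32        # "each 32 Gb/s, in each direction"
SOC_COUNT             = 1000      # "up to 1000 SoCs"
CORES_TOTAL           = 100_000   # "100,000 core total"
NVM_MODULES           = 1000      # "up to 1000 NVM modules"
NVM_TOTAL_PB          = 100       # "100 PB total"

fiber_gbps        = WAVELENGTHS_PER_FIBER * GBPS_PER_WAVELENGTH  # 1,024 Gb/s, i.e. ~1 Tb/s per direction
cores_per_soc     = CORES_TOTAL / SOC_COUNT                      # ~100 cores per SoC
tb_per_nvm_module = NVM_TOTAL_PB * 1000 / NVM_MODULES            # ~100 TB per NVM module

print(f"per-fiber bandwidth: {fiber_gbps} Gb/s per direction (~1 Tb/s)")
print(f"cores per SoC: {cores_per_soc:.0f}")
print(f"capacity per NVM module: {tb_per_nvm_module:.0f} TB")
```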
  • 23. Fat nodes, B/F (bytes per flop), and disaggregation; state-of-the-art link rates: 25 Gbps (100 GbE), 28 Gbps (HMC)

  • 24. References
    - Intel rack scale architecture overview, Interop 2013. http://presentations.interop.com/events/las-vegas/2013/free-sessions---keynote-presentations/download/463
    - New technologies that disrupt our complete ecosystem and their limits in the race to Zettascale, HPC 2014. http://www.hpcc.unical.it/hpc2014/pdfs/demichel.pdf
    - HP, Tech Power Club, ASCII.jp. http://ascii.jp/elem/000/000/915/915508/