Power and Thermal Management in Rack Scale Architecture
Aug 29, 2014
INTEL CONFIDENTIAL
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.
All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copyright © 2014, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Legal Disclaimer
Agenda
• Rack Scale Management Architecture Overview
• Power Events Management in Rack Scale Architecture
• Advanced Thermal Management in Rack Scale Architecture
• Summary
• Q&A
Software Defined Infrastructure
PROVISIONING MANAGEMENT
Orchestration provisions and optimally allocates resources based on the unique requirements of an application
POOLED RESOURCES
Network, storage, and compute elements are abstracted into resource pools
SERVICE ASSURANCE
Policy-based automation provides dynamic provisioning and service assurance as applications are deployed and maintained
[Figure: Applications A to D run on orchestration software that delivers services from a resource pool of storage, network, and compute; infrastructure attributes: power, performance, security, thermals, utilization, location]
RSA optimizes systems to run your SDI solution
Intel® Rack Scale Architecture – Optimized for SDI
[Figure: Today: discrete storage, server, and network components with self-integration]
• Rack-level pooled resources with discoverability and serviceability for uninterrupted maintenance
• Enables orchestration software to compose servers and increase rack density and utilization
• Enables service assurance software for optimization and automation
• Density: up to 1.5X¹
• Provisioned power: up to 5X²
• Cables: up to 3X²
• Network downlink: up to 25X¹
[Figure: RSA: a composable set of pooled and disaggregated resources, including NVM]
Flexibility, Capital Efficiency, Lower TCO
RSA Management Architecture: System Level Strawman
[Figure: RSA Racks 1 to n, each containing a PSME, switch, CPP, RSA module BMCs, and an RMM, together forming a Pooled System alongside RSA Storage Nodes 1 to n]
RSA Manageability Firmware API
POD Manager Foundational API
POD Manager: Discovery, Allocation, Composition, Management
✓ Discovery ✓ Boot ✓ Configuration ✓ Power ✓ Fault ✓ Telemetry
The “actor” that creates a manageable composite system
Unified management APIs to support flexible and scalable usage models
Power Events in Rack Scale Architecture
Shared Power Supply in RSA Requires More Sophisticated Power Events Management
✓ Overcurrent
✓ Power supply module overheat
✓ Power supply module failure
✓ Temporary loss or interruption of main power
✓ Main power out of range
✓ …
Power Events Management: CLST (Closed Loop System Throttling) and SMART (Smart Ride Through)
Rack Scale SMART/CLST Concept
⑴ The Power Shelf asserts PS_ALERT when a power event occurs;
⑵ PS_ALERT triggers CPU/DIMM throttling and sets the fans to low-power mode;
⑶ PS_ALERT also triggers the RMM to poll the Power Shelf for the event information;
⑷ The RMM informs the server nodes of the next actions;
⑸ The RMM also controls the fans.
Communication between the RMM and the server nodes and Power Shelf is based on the RSA manageability API.
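The five-step SMART/CLST sequence can be modeled as a small event handler. This is a minimal Python sketch under stated assumptions, not Intel's implementation: the class names, the modeling of PS_ALERT as a latched flag on the shelf, and the per-event follow-up actions in the hypothetical NEXT_ACTION table are all illustrative, and real RMM, Power Shelf, and node traffic would go over the RSA manageability API rather than direct method calls.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Optional


class PowerEvent(Enum):
    """Power events named in the deck (the deck's list is open-ended)."""
    OVER_CURRENT = auto()
    PSU_OVERHEAT = auto()
    PSU_FAILURE = auto()
    MAIN_POWER_INTERRUPT = auto()
    MAIN_POWER_OUT_OF_RANGE = auto()


# Hypothetical RMM policy: follow-up action sent to the nodes per event type.
NEXT_ACTION: Dict[PowerEvent, str] = {
    PowerEvent.OVER_CURRENT: "stay_throttled",
    PowerEvent.PSU_OVERHEAT: "stay_throttled",
    PowerEvent.PSU_FAILURE: "cap_power",
    PowerEvent.MAIN_POWER_INTERRUPT: "ride_through",
    PowerEvent.MAIN_POWER_OUT_OF_RANGE: "ride_through",
}


@dataclass
class ServerNode:
    name: str
    throttled: bool = False        # CPU/DIMM throttling state
    action: Optional[str] = None   # last action received from the RMM


@dataclass
class PowerShelf:
    event: Optional[PowerEvent] = None

    def assert_ps_alert(self, event: PowerEvent) -> None:
        # Step 1: latch the event; PS_ALERT stays asserted while an event is latched.
        self.event = event

    @property
    def ps_alert(self) -> bool:
        return self.event is not None

    def poll_event(self) -> Optional[PowerEvent]:
        # What the RMM reads back when it polls the shelf (step 3).
        return self.event


@dataclass
class RackController:
    shelf: PowerShelf
    nodes: List[ServerNode]
    fan_mode: str = "normal"
    log: List[str] = field(default_factory=list)

    def handle_ps_alert(self) -> None:
        if not self.shelf.ps_alert:
            return  # no alert asserted, nothing to do
        # Step 2: fast path, throttle CPU/DIMM and put fans in low-power mode.
        for node in self.nodes:
            node.throttled = True
        self.fan_mode = "low_power"
        self.log.append("throttled,fans_low_power")
        # Step 3: the RMM polls the shelf to identify the event.
        event = self.shelf.poll_event()
        assert event is not None
        self.log.append(f"polled:{event.name}")
        # Step 4: the RMM informs every node of the next action.
        action = NEXT_ACTION[event]
        for node in self.nodes:
            node.action = action
        self.log.append(f"action:{action}")
        # Step 5: the RMM takes over fan control.
        self.fan_mode = "rmm_controlled"


if __name__ == "__main__":
    rack = RackController(PowerShelf(), [ServerNode("node1"), ServerNode("node2")])
    rack.shelf.assert_ps_alert(PowerEvent.MAIN_POWER_INTERRUPT)
    rack.handle_ps_alert()
    print(rack.log)
```

Keeping step 2 separate from steps 3 to 5 mirrors the slide: throttling and low-power fans happen immediately on PS_ALERT, while the RMM's slower poll decides what the nodes and fans do next.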
PTAS (Power Thermal Aware Solution)
PTAS connects the server platform and datacenter for optimized thermal control
Server platform additional telemetry:
• Volumetric airflow
• Outlet air temperature
• CUPS (Compute Usage Per Second)
Datacenter management:
• Temperature, airflow
• Power management policy
• Workload orchestration
Rack Scale Level PTAS Concept
Basic theory:
$$Q = f(\mathrm{RPM}) \tag{1}$$
where $Q$ is the volumetric airflow of the cooling zone and $\mathrm{RPM}$ is the fan speed in revolutions per minute.
$$T_{\mathrm{outlet}} = T_{\mathrm{inlet}} + \frac{1.76\, P\, k_{\mathrm{alt}}}{Q} \tag{2}$$
where $T_{\mathrm{outlet}}$ and $T_{\mathrm{inlet}}$ are the outlet and inlet temperatures of the cooling zone, $P$ is the total power dissipation of the cooling zone, $k_{\mathrm{alt}}$ is the altitude correction factor, and $Q$ is the volumetric airflow of the cooling zone.
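A worked instance of the outlet-temperature relation $T_{\mathrm{outlet}} = T_{\mathrm{inlet}} + 1.76\, P\, k_{\mathrm{alt}} / Q$, using illustrative numbers that are not from the source. The constant 1.76 appears consistent with $P$ in watts, $Q$ in CFM, and temperatures in °C: the common sea-level rule $\Delta T\,(^{\circ}\mathrm{F}) \approx 3.16\, P / Q$, divided by 1.8 to convert to °C, gives about 1.76. Those units, and $k_{\mathrm{alt}} = 1$ (sea level), are assumptions here.

```latex
\frac{3.16}{1.8} \approx 1.76, \qquad
T_{\mathrm{outlet}} = 25\,^{\circ}\mathrm{C}
  + \frac{1.76 \times 500\,\mathrm{W} \times 1.0}{100\,\mathrm{CFM}}
  = 25\,^{\circ}\mathrm{C} + 8.8\,^{\circ}\mathrm{C}
  = 33.8\,^{\circ}\mathrm{C}
```

Halving the airflow to 50 CFM at the same power doubles the rise to 17.6 °C, which is why the relation needs both accurate airflow and real-time power for the whole zone.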
Challenges at rack scale:
• Function (1) is highly correlated with the cooling zone configuration, e.g., the number and types of trays.
• Equation (2) needs real-time power data from all components within the cooling zone.
The solution needs to support dynamic configuration of the cooling zone.
Rack Scale PTAS Implementation using RSA Manageability API
BMC sensors: Sled Type, Sled Power, Sled Thermal, Node Type, Node Presence, Node Power, Node Thermal, Node CUPS, ...
PSME sensors: Tray Type, Tray Power, Tray Thermal, Sled Presence, Sled Type, ...
MCU and RMM sensors: Cooling Zone Type (size, structure, …), Tray Presence, Fan Presence, Fan Type, Fan PWM/TACH, etc.
(1) The RMM gathers the cooling zone configuration information (for example, the presence and type of trays, sleds, and fans) from the BMC, PSME, and Rack Backplane through the RSA manageability API.
(2) The RMM chooses the appropriate algorithm and coefficient set based on the cooling zone configuration information.
(3) The RMM polls the fan RPM from the Rack Backplane through the RSA manageability API.
(4) The RMM polls the node, sled, and tray power from the BMC, PSME, and Rack Backplane through the RSA manageability API.
(5) The RMM calculates the cooling-zone-level volumetric airflow and outlet temperature, and exposes them as sensors through the RSA manageability API.
Communication between the RMM and the server nodes and trays is based on the RSA manageability API.
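Steps (2) to (5) can be sketched as a per-zone calculation. This is a minimal Python sketch, assuming a linear form for $f$ in Eq. (1) (airflow proportional to fan speed, per the fan affinity law) and units of W, CFM, and °C for Eq. (2). The coefficient table, its configuration keys, and all function names are hypothetical; plain arguments stand in for the values the RMM would read through the RSA manageability API in steps (1), (3), and (4).

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical coefficient sets for Eq. (1), keyed by cooling-zone configuration
# (tray type, tray count). Values are CFM per fan per RPM, illustrative only;
# real sets would come from characterizing each supported configuration.
CFM_PER_RPM: Dict[Tuple[str, int], float] = {
    ("tray_type_a", 4): 0.025,
    ("tray_type_b", 8): 0.012,
}


@dataclass
class CoolingZoneConfig:
    tray_type: str
    tray_count: int


def select_coefficient(cfg: CoolingZoneConfig) -> float:
    # Step (2): choose the coefficient set that matches the discovered configuration.
    try:
        return CFM_PER_RPM[(cfg.tray_type, cfg.tray_count)]
    except KeyError:
        raise ValueError(f"no coefficient set for {cfg}") from None


def zone_airflow_cfm(fan_rpms: List[float], cfm_per_rpm: float) -> float:
    # Eq. (1) with an assumed linear f: each fan contributes cfm_per_rpm * RPM.
    return sum(cfm_per_rpm * rpm for rpm in fan_rpms)


def zone_outlet_temp_c(t_inlet_c: float, power_w: float, q_cfm: float,
                       k_alt: float = 1.0) -> float:
    # Eq. (2): Toutlet = Tinlet + 1.76 * P * kalt / Q (assumes W, CFM, deg C).
    if q_cfm <= 0:
        raise ValueError("airflow must be positive")
    return t_inlet_c + 1.76 * power_w * k_alt / q_cfm


def compute_zone_sensors(cfg: CoolingZoneConfig, fan_rpms: List[float],
                         powers_w: List[float], t_inlet_c: float,
                         k_alt: float = 1.0) -> Dict[str, float]:
    """fan_rpms stands in for step (3), powers_w for step (4)."""
    cfm_per_rpm = select_coefficient(cfg)              # step (2)
    q = zone_airflow_cfm(fan_rpms, cfm_per_rpm)        # step (5): zone airflow
    p = sum(powers_w)                                  # total zone power dissipation
    t_out = zone_outlet_temp_c(t_inlet_c, p, q, k_alt)  # step (5): zone outlet temp
    return {"airflow_cfm": q, "outlet_temp_c": t_out}


if __name__ == "__main__":
    cfg = CoolingZoneConfig("tray_type_a", 4)
    print(compute_zone_sensors(cfg, [1000.0] * 4, [120.0, 130.0, 110.0, 140.0], 25.0))
```

Keying the coefficients by configuration is one way to address the first challenge listed above: when trays are added or swapped, step (1) yields a new key, and step (2) picks a different set instead of reusing a stale one.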
Rack Scale Software Architecture: Strawman with APIs
RSA Solution Stack: Framework to optimize Pooled System Architectures
[Figure: The solution stack spans a Pooled System (processor, memory, disk, network, power, cooling, interconnect; pooled storage; Rack Management (RMM)), the POD Manager (discovery, allocation, composition, management), and service management (service templates, reservation, ID Analytics, machine learning, inference engine, optimization). Interfaces: RSA Manageability Firmware API, IPMI/XML RMCP, POD Manager Foundational API, Service API. Access levels: component, resource, composite server, composite workload and service.]
RSA POD Manager: identifies available resources for composition based on a service request
RSA APIs: to manage the pooled system and its ingredients, and the actor for composition
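The POD Manager's role of identifying available resources for composition based on a service request can be illustrated with a small allocator. This is a hypothetical Python sketch, not the RSA POD Manager API: the resource kinds, capacity units, one-resource-per-kind simplification, and smallest-fit policy are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PooledResource:
    rid: str
    kind: str        # illustrative kinds: "compute", "memory", "storage"
    capacity: int    # hypothetical unit per kind (cores, GiB, GB)
    allocated: bool = False


@dataclass
class CompositeServer:
    members: List[PooledResource] = field(default_factory=list)


def compose(pool: List[PooledResource], request: Dict[str, int]) -> Optional[CompositeServer]:
    """Hypothetical discovery, allocation, and composition for one service request.

    `request` maps a resource kind to the minimum capacity needed.
    Returns None, and allocates nothing, if any kind cannot be satisfied.
    """
    chosen: List[PooledResource] = []
    for kind, need in request.items():
        # Discovery: free resources of this kind that are large enough.
        candidates = [r for r in pool
                      if r.kind == kind and not r.allocated and r.capacity >= need]
        if not candidates:
            return None
        # Allocation: smallest fit, keeping larger resources for later requests.
        chosen.append(min(candidates, key=lambda r: r.capacity))
    # Composition: mark members allocated only once every kind is satisfied.
    for r in chosen:
        r.allocated = True
    return CompositeServer(chosen)


if __name__ == "__main__":
    pool = [PooledResource("c1", "compute", 16), PooledResource("c2", "compute", 32),
            PooledResource("m1", "memory", 64)]
    server = compose(pool, {"compute": 20, "memory": 32})
    print([r.rid for r in server.members] if server else None)
```

Deferring the `allocated` updates until every kind is satisfied keeps a failed request from leaving partially reserved resources in the pool.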
Backup
RSA: Physical Manifestation
[Figure: Racks 1 to n, each built from compute drawers, storage, a switch, power and cooling, and a Rack CP Mgr, grouped into a POD under a POD Manager; scaling axes: network fabric scaling, compute scaling, storage scaling (non-uniform OK)]
POD: A collection of racks managed by a single POD Manager
Rack: A standardized frame for deploying RSA drawers and infrastructure (switch, cables, power, etc.)
Drawer: A sliding shelf that holds multiple modules; a manageable domain (a thermal or power zone); many drawers fit into a rack
Module: A field-replaceable unit that can be pooled or shared; includes single or multiple elements of compute, network, storage, memory, management, power, or cooling
Component: Fundamental ingredients (CPU, DIMMs, cables, fans, boards, etc.)
Each level relates to the level below it as one : n.
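The POD, rack, drawer, module, and component hierarchy maps naturally onto nested containers. The following is a minimal Python sketch, assuming each level simply holds a list of the level below (one : n); the class and method names are hypothetical, and treating each drawer as one addressable thermal/power zone follows the drawer definition above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    kind: str   # fundamental ingredient, e.g. "CPU", "DIMM", "fan"


@dataclass
class Module:
    name: str   # field-replaceable unit
    components: List[Component] = field(default_factory=list)


@dataclass
class Drawer:
    name: str   # manageable domain: a thermal or power zone
    modules: List[Module] = field(default_factory=list)


@dataclass
class Rack:
    name: str
    drawers: List[Drawer] = field(default_factory=list)


@dataclass
class Pod:
    """A collection of racks managed by a single POD Manager."""
    racks: List[Rack] = field(default_factory=list)

    def managed_zones(self) -> List[str]:
        # One entry per drawer, i.e. per thermal/power zone.
        return [f"{rack.name}/{drawer.name}"
                for rack in self.racks for drawer in rack.drawers]

    def count(self, kind: str) -> int:
        # Walk the one : n hierarchy down to components of the given kind.
        return sum(1 for rack in self.racks for drawer in rack.drawers
                   for module in drawer.modules for comp in module.components
                   if comp.kind == kind)


if __name__ == "__main__":
    d1 = Drawer("drawer1", [Module("m1", [Component("CPU"), Component("DIMM")])])
    pod = Pod([Rack("rack1", [d1])])
    print(pod.managed_zones(), pod.count("CPU"))
```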