Power and Thermal Management in Rack Scale Architecture, Aug 29, 2014



INTEL CONFIDENTIAL

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

UNLESS OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined." Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

All products, computer systems, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copyright © 2014, Intel Corporation. All rights reserved.

*Other names and brands may be claimed as the property of others.


Agenda

• Rack Scale Management Architecture Overview

• Power Events Management in Rack Scale Architecture

• Advanced Thermal Management in Rack Scale Architecture

• Summary

• Q&A


Software Defined Infrastructure

PROVISIONING MANAGEMENT

Orchestration provisions and optimally allocates resources based on the unique requirements of an application

POOLED RESOURCES

Network, Storage and Compute elements are abstracted into resource pools

SERVICE ASSURANCE

Policy based automation provides dynamic provisioning and service assurance as applications are deployed and maintained

[Figure: SDI stack. Labels: Services Delivery (Application A, Application B, Application C, Application D); Orchestration Software; Resource Pool (Storage, Network, Compute); Infrastructure Attributes (Power, Performance, Security, Thermals, Utilization, Location).]

RSA optimizes systems to run your SDI solution


Intel® Rack Scale Architecture – Optimized for SDI

• Rack level pooled resources with discoverability & serviceability for uninterrupted maintenance

• Enables orchestration software to compose servers and increase rack density & utilization

• Enables service assurance software for optimization & automation

Density: up to 1.5X¹

Provisioned Power: up to 5X²

Cables: up to 3X²

NW Downlink: up to 25X¹

[Figure: Today vs. RSA. Labels: Discrete Components, Self-Integration; Storage; Server; Network; Composable set of pooled and disaggregated resources; NVM.]

Flexibility, Capital Efficiency, Lower TCO


RSA Management Architecture: System Level Strawman

[Figure: RSA Rack 1, RSA Rack 2, ..., RSA Rack n. Labels in each rack: RMM, PSME, Switch CPP, RSA Module BMC (repeated, "…"), RSA Storage Node 1 to RSA Storage Node n. Other labels: Pooled System; RSA Manageability Firmware API; POD Manager Foundational API.]

POD Manager: Discovery, Allocation, Composition, Management

✓ Discovery
✓ Boot
✓ Configuration
✓ Power
✓ Fault
✓ Telemetry

The “actor” that creates a manageable composite system

Unified Management APIs to support flexible and scalable usage models


Power Events in Rack Scale Architecture

✓ Over current

✓ Power supply module overheat

✓ Power supply module failure

✓ Main power temporary loss / interrupt

✓ Main power out of range

✓ …

Shared Power Supply in RSA Requires More Sophisticated Power Events Management


Power Events Management: CLST and SMART


Rack Scale SMART/CLST Concept

⑴ The Power Shelf asserts PS_ALERT when a power event occurs;

⑵ PS_ALERT triggers CPU/DIMM throttling and sets the fans to low-power mode;

⑶ PS_ALERT also triggers the RMM to poll the Power Shelf for the event information;

⑷ The RMM informs the Server Nodes of the next actions;

⑸ The RMM also controls the fans.

Communication between the RMM and the Server Nodes and Power Shelf is based on the RSA manageability API.
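The five steps of the SMART/CLST flow can be sketched as a small simulation. This is only an illustration under stated assumptions, not the RSA manageability API: the class names, the event strings, and the `POLICY` table that maps an event to node and fan actions are all hypothetical, because the slides do not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ServerNode:
    name: str
    throttled: bool = False
    actions: List[str] = field(default_factory=list)


@dataclass
class Fans:
    mode: str = "normal"


@dataclass
class PowerShelf:
    event: str = ""        # pending event name (hypothetical strings)
    ps_alert: bool = False

    def raise_event(self, event: str) -> None:
        # Step 1: the shelf asserts PS_ALERT when a power event occurs.
        self.event = event
        self.ps_alert = True

    def read_event(self) -> str:
        # Step 3: the RMM polls the shelf; reading clears the alert.
        event, self.event, self.ps_alert = self.event, "", False
        return event


# Hypothetical policy: event -> (next action for nodes, fan mode).
POLICY: Dict[str, Tuple[str, str]] = {
    "psu_failure": ("cap_power", "normal"),
    "main_power_loss": ("stay_throttled", "low_power"),
}


def handle_ps_alert(shelf: PowerShelf, nodes: List[ServerNode], fans: Fans) -> str:
    """Run steps 2 to 5 once; return the handled event ("" if none)."""
    if not shelf.ps_alert:
        return ""
    # Step 2: immediate reaction, before the cause is known.
    for node in nodes:
        node.throttled = True
    fans.mode = "low_power"
    # Step 3: the RMM fetches the event details from the Power Shelf.
    event = shelf.read_event()
    node_action, fan_mode = POLICY.get(event, ("stay_throttled", "low_power"))
    # Step 4: the RMM tells each node what to do next; any action other
    # than "stay_throttled" releases the emergency throttle.
    for node in nodes:
        node.actions.append(node_action)
        node.throttled = node_action == "stay_throttled"
    # Step 5: the RMM takes over fan control.
    fans.mode = fan_mode
    return event


shelf = PowerShelf()
nodes = [ServerNode("node1"), ServerNode("node2")]
fans = Fans()
shelf.raise_event("psu_failure")
handled = handle_ps_alert(shelf, nodes, fans)
```

The point of the split is that step 2 needs no knowledge of the cause, so it can react immediately, while steps 3 to 5 refine the response once the RMM knows which event occurred.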


PTAS (Power Thermal Aware Solution)

PTAS connects the server platform and datacenter for optimized thermal control

Server Platform Additional Telemetry
• Volumetric Airflow
• Outlet Air Temperature
• CUPS (Compute Usage Per Second)

Datacenter Management
• Temperature, Airflow
• Power management policy
• Workload orchestration


Rack Scale Level PTAS Concept

Basic Theory:

$Q = f(\mathrm{RPM})$ [1]

Where:
• $Q$ is the volumetric airflow of the cooling zone;
• $\mathrm{RPM}$ is the fan speed (revolutions per minute).

$T_{outlet} = T_{inlet} + 1.76 \cdot P \cdot k_{alt} / Q$ [2]

Where:
• $T_{outlet}$ is the outlet temperature of the cooling zone;
• $T_{inlet}$ is the inlet temperature of the cooling zone;
• $P$ is the total power dissipation of the cooling zone;
• $k_{alt}$ is the altitude correction factor;
• $Q$ is the volumetric airflow of the cooling zone.

Challenges at Rack Scale:
• Function [1] depends strongly on the cooling zone configuration, e.g. the number and types of trays.
• Equation [2] needs real-time power data from all components within the cooling zone.

Need to support dynamic configuration of the cooling zone
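Equations [1] and [2] can be evaluated directly once f is known. The sketch below assumes a linear f(RPM), following the fan affinity rule that airflow scales with fan speed; the coefficient `cfm_per_rpm` is a hypothetical per-configuration constant. It also assumes the usual units for the 1.76 constant: P in watts, Q in cubic feet per minute, and temperatures in °C. The slide states neither assumption.

```python
def volumetric_airflow(rpm: float, cfm_per_rpm: float) -> float:
    """Equation [1], Q = f(RPM), with f assumed linear (hypothetical)."""
    return cfm_per_rpm * rpm


def outlet_temperature(t_inlet: float, power_w: float, q: float,
                       k_alt: float = 1.0) -> float:
    """Equation [2]: T_outlet = T_inlet + 1.76 * P * k_alt / Q."""
    if q <= 0:
        raise ValueError("airflow Q must be positive")
    return t_inlet + 1.76 * power_w * k_alt / q


# Illustrative numbers only: 8000 RPM at 0.05 CFM per RPM gives about 400 CFM.
q = volumetric_airflow(rpm=8000, cfm_per_rpm=0.05)
# A 5000 W zone with 25 degree inlet air then warms by about 22 degrees.
t_out = outlet_temperature(t_inlet=25.0, power_w=5000.0, q=q)
```

Because Q sits in the denominator, halving the airflow doubles the temperature rise, which is why the configuration-dependent function f matters so much at rack scale.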


Rack Scale PTAS Implementation using RSA Manageability API

BMC sensors: Sled Type, Sled Power, Sled Thermal, Node Type, Node Presence, Node Power, Node Thermal, Node CUPS, ...

PSME sensors: Tray Type, Tray Power, Tray Thermal, Sled Presence, Sled Type, ...

MCU and RMM sensors: Cooling Zone Type (size, structure, …), Tray Presence, Fan Presence, Fan Type, Fan PWM/TACH, etc.

(1) The RMM gathers the cooling zone configuration information from the BMC, PSME and Rack Backplane through the RSA manageability API (for example: presence and type of trays, presence and type of sleds, presence and type of fans).

(2) The RMM chooses the appropriate algorithm and coefficient sets based on the cooling zone configuration information.

(3) The RMM polls the Fan RPM from the Rack Backplane through the RSA manageability API.

(4) The RMM polls the Node/Sled/Tray Power from the BMC, PSME and Rack Backplane through the RSA manageability API.

(5) The RMM calculates the Cooling Zone Level Volumetric Airflow and Cooling Zone Level Outlet Temperature, and exposes these sensors through the RSA manageability API.

Communication between the RMM and the Server Nodes and Trays is based on the RSA manageability API.
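The five implementation steps can be sketched as one function that turns configuration and polled readings into the two cooling-zone sensors. Everything named here is an assumption for illustration: the `COEFFICIENT_SETS` table and its values, the choice of tray types as the configuration key, and the linear airflow model. A real RMM would obtain its inputs through the RSA manageability API, which is not modeled.

```python
from typing import Dict, List, Tuple

# Step 2: hypothetical coefficient sets (CFM per fan RPM), keyed by the
# sorted tray types present in the cooling zone.
COEFFICIENT_SETS: Dict[Tuple[str, ...], float] = {
    ("compute", "compute"): 0.010,
    ("compute", "storage"): 0.008,
}


def cooling_zone_sensors(tray_types: List[str], fan_rpms: List[float],
                         power_w: Dict[str, float], t_inlet: float,
                         k_alt: float = 1.0) -> Dict[str, float]:
    # Steps 1 and 2: the discovered configuration selects the coefficients.
    key = tuple(sorted(tray_types))
    if key not in COEFFICIENT_SETS:
        raise KeyError(f"no coefficient set for configuration {key}")
    coeff = COEFFICIENT_SETS[key]
    # Step 3: fan speeds give the zone airflow (assumed linear in RPM).
    airflow = coeff * sum(fan_rpms)
    if airflow <= 0:
        raise ValueError("cooling zone airflow must be positive")
    # Step 4: node, sled and tray power readings are summed for the zone.
    total_power = sum(power_w.values())
    # Step 5: derive the zone-level sensors that the RMM would expose.
    outlet = t_inlet + 1.76 * total_power * k_alt / airflow
    return {"volumetric_airflow": airflow, "outlet_temperature": outlet}


sensors = cooling_zone_sensors(
    tray_types=["storage", "compute"],
    fan_rpms=[6000, 6000, 6000, 7000],
    power_w={"node1": 400.0, "node2": 400.0, "tray1": 200.0},
    t_inlet=22.0,
)
```

Keying the coefficients on the discovered configuration is what lets the same calculation follow a cooling zone whose trays are added, removed, or swapped.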


Rack Scale Software Architecture: Strawman with APIs

RSA Solution Stack: Framework to optimize Pooled System Architectures

RSA POD Manager: Identifies available resources for composition based on a service request

RSA APIs: To Manage Pooled System, Ingredients, Actor for Composition

[Figure: software stack. Labels: ID Analytics; Reservation; Machine Learning; Service Management; Service Templates; Inference Engine; Optimization; Service API; POD Manager (Discovery, Allocation, Composition, Management); POD Manager Foundational API; RSA Manageability Firmware API; IPMI/XML RMCP; Pooled System; Rack Management (RMM); Pooled Storage; Processor, Disk, Memory, Power, Network, Cooling, Interconnect; Component Access; Resource Access; Composite Server Access; Composite Workload and Service Access.]
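The POD Manager's role of identifying available resources for a service request can be sketched as a first-fit allocator over a resource pool. This is a toy model under stated assumptions: the `Resource` type, the `discover` and `compose` functions, and the shape of the request are hypothetical and are not part of any RSA API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resource:
    rid: str
    kind: str              # e.g. "processor", "memory", "disk"
    allocated: bool = False


def discover(pool: List[Resource], kind: str) -> List[Resource]:
    """Discovery: list the free resources of one kind."""
    return [r for r in pool if r.kind == kind and not r.allocated]


def compose(pool: List[Resource],
            request: Dict[str, int]) -> Optional[List[Resource]]:
    """Allocation and composition: pick resources for a request.

    Returns None, allocating nothing, if the pool cannot satisfy it.
    """
    chosen: List[Resource] = []
    for kind, count in request.items():
        free = discover(pool, kind)
        if len(free) < count:
            return None
        chosen.extend(free[:count])
    for resource in chosen:
        resource.allocated = True
    return chosen


pool = [
    Resource("cpu0", "processor"), Resource("cpu1", "processor"),
    Resource("mem0", "memory"), Resource("mem1", "memory"),
    Resource("disk0", "disk"),
]
server = compose(pool, {"processor": 1, "memory": 2})
rejected = compose(pool, {"disk": 2})
```

Resources are marked allocated only after every part of the request has been matched, so a request that cannot be met leaves the pool unchanged.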


Backup


RSA: Physical Manifestation

[Figure: POD of Rack 1, Rack 2, ..., Rack n. Labels: Build; Power & Cool; Storage; Compute Drawer (three times); Switch; Non-uniform OK; vertical text reading "Network Fabric Scaling Compute Scaling Storage"; POD Manager; Power & Cooling; Rack CP Mgr.]

POD
• Collection of racks to be managed by a single POD Manager

Rack
• Standardized frame for deploying RSA drawers and infrastructure (switch, cables, power, etc.)

Drawer
• Sliding shelf that holds multiple modules
• Manageable Domain: a thermal or power zone
• Many drawers can fit into a rack

Module
• Field replaceable unit
• Can be pooled or shared
• Includes single or multiple elements of compute, network, storage, memory, management, power, cooling

Component
• Fundamental ingredients (CPU, DIMMs, cables, fans, boards, etc.)

One : n
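The POD, rack, drawer, module, component hierarchy maps naturally onto nested one-to-many containers. The sketch below is one possible model, not a definition from the slides: the class names mirror the slide's terms, while the `power_w` field and the per-drawer power sum are hypothetical additions that illustrate the drawer as a power or thermal zone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Component:
    name: str
    power_w: float = 0.0   # hypothetical telemetry field


@dataclass
class Module:
    name: str
    components: List[Component] = field(default_factory=list)


@dataclass
class Drawer:
    name: str
    modules: List[Module] = field(default_factory=list)

    def power_w(self) -> float:
        # The drawer is the manageable power/thermal zone, so it is a
        # natural level at which to aggregate component power.
        return sum(c.power_w for m in self.modules for c in m.components)


@dataclass
class Rack:
    name: str
    drawers: List[Drawer] = field(default_factory=list)


@dataclass
class Pod:
    name: str
    racks: List[Rack] = field(default_factory=list)


compute = Module("compute0", [Component("cpu", 120.0), Component("dimm", 8.0)])
cooling = Module("fans0", [Component("fan", 15.0)])
pod = Pod("pod0", [Rack("rack1", [Drawer("drawer1", [compute, cooling])])])
drawer_power = pod.racks[0].drawers[0].power_w()
```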
