6
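Continuous Innovation, Making Computing Simple
SAP HANA appliance
HPC
Big Data
Hyper-converged infrastructure
Azure Stack solution
Edge computing video analytics solution
Acceleration
Acceleration components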
ACC
FPGA
NIC
NVMe SSD
FPGA
Intelligent NIC
FusionServer
XFusionServer
EFusionServer
GFusionServer
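Rack-optimized
Standard 2S-8S x86, optimized for medium and large enterprises
Traditional / Modular
High-density servers: optimized for large-scale application deployment
Blade systems: converged infrastructure for maximum efficiency
GPU servers: for HPC, video, AI/DL, and other scenarios that need a GPU computing environment
Unique innovation
General-purpose / Application-specific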
FDM
DEMT
Innovation
Chips
NC: Hi1503 | NIC: Hi1822 | Storage: Hi1812 | BMC: Hi1710
7
Chip design capability across the compute, storage, transport, and management domains
[Figure: server block diagram with four CPUs, a node controller (NC interconnect chip), BMC, NIC controller, and SSD controller]
Hi1503 (NC interconnect chip): high-speed interconnect chip for Intel E7 v3/v4 processors; 32S scale-up
Hi1812 (storage I/O controller chip): PCIe/NVMe SSD storage controller chip; read/write I/O acceleration
Hi1822 (network I/O controller chip): programmable network controller chip; high-speed, flexible data center (DC) interconnect
Hi1710 (server management chip): BMC management chip; built-in fault diagnosis expert library and patented processing mechanism
8
DEMT: Patented High-Efficiency Technology
[Chart data labels as extracted: 0.5%, 1.4%, 3.8%, 5.9%, 9.0%, 16%, 12.3%, 9.7%, 5.5%, 1.9%; x-axis: CPU load, 10% to 100%]
• Optimized Digital Thermal Sensor (DTS) algorithm for dynamically tuning target temperature, with higher tuning precision
• Power capping: allocates power supply and heat dissipation resources according to actual equipment power
• Proportional-integral-derivative (PID) algorithm-based fan speed tuning according to load and component/ambient temperature
• Fans enhanced with Deep Sleep Technology (DST)
• Low Workload Low Watt (LW2) technology
• Automatic switching between active and standby power supplies improves overall energy efficiency
• High-voltage DC (HVDC) power supply
Power saving of DEMT with different CPU loads
Average: 13.4%
Dynamic Energy Management Technology (DEMT) slashes overall power consumption by an average of 13.4% without compromising service performance.
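The slide names PID-based fan speed tuning but does not describe the control loop itself. The following is a minimal textbook sketch of a discrete PID controller with simple anti-windup, not Huawei's DEMT implementation. The class name `PIDFanController`, the gains, the 70 ℃ target, and the 20% duty floor are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class PIDFanController:
    """Textbook discrete PID loop; every default below is an assumed value."""
    kp: float = 4.0            # proportional gain (assumption)
    ki: float = 0.5            # integral gain (assumption)
    kd: float = 1.0            # derivative gain (assumption)
    target_c: float = 70.0     # hypothetical target temperature in Celsius
    min_duty: float = 20.0     # hypothetical minimum fan duty cycle, percent
    max_duty: float = 100.0    # maximum fan duty cycle, percent
    _integral: float = field(default=0.0, init=False)
    _prev_error: float = field(default=0.0, init=False)

    def update(self, temp_c: float, dt: float = 1.0) -> float:
        """Return the fan duty cycle (percent) for the latest temperature sample."""
        error = temp_c - self.target_c
        candidate_integral = self._integral + error * dt
        derivative = (error - self._prev_error) / dt
        self._prev_error = error

        raw = (self.min_duty
               + self.kp * error
               + self.ki * candidate_integral
               + self.kd * derivative)
        duty = max(self.min_duty, min(self.max_duty, raw))

        # Anti-windup: accumulate the integral term only while the output
        # is not clamped, so a long hot or cold spell cannot saturate it.
        if duty == raw:
            self._integral = candidate_integral
        return duty
```

In this sketch the duty cycle stays at the floor while the component is at or below the target temperature and rises with the temperature error, which is the general behaviour the bullet describes.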
9
FDM: Patented High-Reliability Technology
Unique Fault Diagnosis & Management (FDM) technology enables comprehensive out-of-band fault information collection and analysis, with diagnosis precision of up to 93%.
Comprehensive hardware component diagnosis: CPU, memory, PCIe, voltage, RAID, fan modules, storage devices, power supply, temperature
Key technical features:
• Out-of-band fault diagnosis system
• Fault diagnosis expert library
• 93% CATERR-related fault locating accuracy
[Figure: fault handling flow. Labels: fault generation; system runs normally; system crashes; in-band collection; out-of-band collection; information receiving; parsing (raw data to parsed data); diagnosis expert library; pre-warning expert library; fault data summary library; output]
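The stages named on this slide (collect raw data, parse it, match it against an expert library, output a diagnosis) can be illustrated with a small rule-matching sketch. It is not the FDM implementation. The `Rule` type, the `key=value;...` record format, the sensor names, and every threshold are hypothetical and exist only to show the shape of such a pipeline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    component: str
    predicate: Callable[[Dict[str, float]], bool]
    diagnosis: str


# Hypothetical rules standing in for an "expert library"; thresholds are invented.
EXPERT_LIBRARY: List[Rule] = [
    Rule("Fan modules", lambda d: d.get("fan_rpm", 1.0) == 0, "Fan stalled"),
    Rule("Temperature", lambda d: d.get("cpu_temp_c", 0.0) > 95, "CPU over-temperature"),
    Rule("Voltage", lambda d: abs(d.get("v12", 12.0) - 12.0) > 0.6, "12 V rail out of range"),
]


def parse(raw: str) -> Dict[str, float]:
    """Parse an assumed 'key=value;key=value' record into a dict of floats."""
    parsed: Dict[str, float] = {}
    for item in raw.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key.strip()] = float(value)
    return parsed


def diagnose(raw: str) -> List[str]:
    """Return one message per rule in the library that matches the parsed data."""
    data = parse(raw)
    return [f"{rule.component}: {rule.diagnosis}"
            for rule in EXPERT_LIBRARY
            if rule.predicate(data)]
```

For example, `diagnose("fan_rpm=0;cpu_temp_c=101;v12=12.1")` reports the fan and temperature faults but not the voltage rail, because 12.1 V is within the assumed tolerance.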
10
eSight Server: Data Center Full-Lifecycle Management Software
Automates and streamlines management across the entire server lifecycle, maximizing O&M efficiency
Lifecycle stages: Planning, Delivery, O&M, Out-of-Service
Out-of-band OS deployment shortens service rollout time by 50%
• Template-based configuration management; batch OS deployment out-of-band
Precise out-of-band fault locating, with up to 93% diagnosis effectiveness
• Comprehensive out-of-band fault information collection and analysis, enabling precise fault locating
Automated firmware update subscription streamlines the upgrade process
• Automated subscription, download, and detection of firmware and driver version updates
Stateless computing shortens configuration recovery time to less than 3 minutes
• Automated configuration management; no manual intervention required
11
Huawei HPC Server Product Family
Dedicated Nodes - Big Memory, I/O Expansion, Multiple Accelerators
High Density Rack Mount Servers
High Performance Blades
5885
8100
1288/2288
KunLun 32S System
5288
E9000
X6000
2488
G5500
Heterogeneous Server
12
Huawei's Sustained, Strategic Technology Investment in High-Performance Computing
Stack layers: Infrastructure, Middleware, Applications
Applications: Weather & Ocean, Manufacturing, EDA, Life science, AI, Oil & Gas
Stack components:
• Resource management and job scheduling
• Huawei MPI
• Huawei compiler & libraries
• HPC application characterization, monitoring, and tuning
• Unified portal for HPC workflows
• Processor and new fabric (CCIX, Gen-Z)
• Next-generation NAS storage system
• Interconnection (IB, RoCE, dedicated low-latency technology)
• HPC cloud with cloud bursting or hybrid cloud system
Investment areas:
1. Dedicated processor and fabric for HPC systems
2. NAS system with burst buffer
3. ASIC for ultra-low-latency networking technology; RoCE for specific HPC markets
4. Unique advantages: cloud bursting or hybrid cloud infrastructure
5. Huawei tool-chains
6. Huawei MPI, optimized for CPU and networking devices
14
An Open, Win-Win HPC Industry Ecosystem
Stack layers: Infrastructure, Middleware, Applications
Applications: Weather & Ocean, Manufacturing, EDA, Life science, AI, Oil & Gas
Stack components:
• Resource management and job scheduling
• Huawei MPI
• Remote visualization
• HPC application characterization, monitoring, and tuning
• Unified portal for HPC workflows
• Processor and new fabric (CCIX, Gen-Z)
• Next-generation NAS storage system
• Interconnection (IB, RoCE, dedicated low-latency technology)
• HPC cloud with cloud bursting or hybrid cloud system
Partnering approach:
1. Partnering with professional ISVs to deliver professional HPC solutions
2. Partnering with commercial ISVs and the community, with deep involvement in application development from the very beginning
15
Huawei HPC Solution: Transforming HPC
• On-premises/private cloud HPC solution
• HPC public cloud
• Fundamental technical innovation
Huawei, Transforming HPC
16
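Huawei HPC Value Proposition: Application-Oriented, Accelerating the Convergence of Traditional HPC and New HPC
Application-oriented: ultimate performance through application-specific optimization
• Flexible modular architecture
• Diverse, innovative form factors
• Hardware acceleration deeply optimized for applications
Ultimate efficiency: higher performance in less space and with lower energy consumption
• End-to-end engineering design capability
• Efficient, reliable liquid cooling technology
• Integrated delivery and installation
Adapting to change: a future-oriented converged HPC architecture
• Rapid adoption of emerging technologies
• Multi-purpose HPC systems
• HPC combined with cloud
Labels: SDS, Big Data, Graph; Cloud, AI, Big Data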
17
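L1: Huawei End-to-End HPC Solution Capability
All-In-Cabinet (small HPC): FusionModule500, FusionModule800
• Cabinet-level deployment; on-site installation takes only 4 hours
• 1 to 6 IT cabinets, supporting 10 to 100 TFlops HPC systems
All-In-Room (medium and large HPC): FusionModule2000, single-row or double-row
• Single-row or double-row sealed cold/hot aisle deployment, for areas under 500 m²
• 2 to 48 IT cabinets, supporting 100 TFlops to 1 PFlops HPC systems
All-In-Container (container HPC): FusionModule1000A
• Factory prefabricated and pre-tested, delivered on site; shortens the deployment cycle by 80%
• 8 IT cabinets, supporting 10 to 100 TFlops HPC systems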
18
L1+L2: Huawei HPC Full Liquid Cooling Solution
• CPU, memory, and VRD are cooled directly by water of up to 45 ℃
• Chiller is optional; cooling PUE < 1.1
• Board-level liquid cooling plus cabinet-level air-to-liquid heat exchange
• No need for row air conditioners or water chillers
Internal serrated micro-channel CPU heat sink
• Optimized heat dissipation teeth spacing and flow resistance
• Optimized serrated design: energy efficiency up by 10%
Memory cooling board with inter-DIMM water flow
• Multi-channel water flow
• Shorter heat transfer path cuts thermal resistance: 65%
• Fence-style fixture design; area of contact with air cooling: 80%
Hybrid rack design
• Impact on ambient: 0%
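To put the "cooling PUE < 1.1" figure in concrete terms, here is a small arithmetic sketch. It assumes that "cooling PUE" means a partial PUE that counts only IT power and cooling power, that is, (IT power + cooling power) / IT power. The slide does not define the term, so this definition, the function names, and the 100 kW example are assumptions.

```python
def cooling_pue(it_power_kw: float, cooling_power_kw: float) -> float:
    """Partial PUE under the assumed definition: (IT + cooling) / IT."""
    if it_power_kw <= 0:
        raise ValueError("IT power must be positive")
    return (it_power_kw + cooling_power_kw) / it_power_kw


def max_cooling_power_kw(it_power_kw: float, target_pue: float = 1.1) -> float:
    """Largest cooling power (kW) that keeps the assumed partial PUE at the target."""
    return it_power_kw * (target_pue - 1.0)


# Hypothetical cabinet drawing 100 kW of IT load with 8 kW spent on cooling.
example_pue = cooling_pue(100.0, 8.0)       # about 1.08
example_budget = max_cooling_power_kw(100.0)  # about 10 kW
```

Under this reading, a partial PUE below 1.1 means the cooling system draws less than 10% of the IT load.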
20
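The Fastest Scale-Out File Storage
OceanStor 9000
• Linear scaling from 3 to 288 nodes
• System bandwidth of up to 400 GB/s
• Single file system of up to 100 PB
OceanStor DFS file system
Lustre software reference architecture
OceanStor V3
• Converged NAS and SAN architecture
• RAID 2.0 technology ensures data reliability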
21
Products and Solutions for New HPC
Enabling HPC Cloud
Open Telekom Cloud
HPC Class Hardware
• InfiniBand fabric
• Hardware accelerators
• Bare metal compute node
Advancing Cloud Software Stack
• HPC class storage
• Container support
• Same stack for private and public cloud
Leverage AI
Business Driven Innovation
• Team with customers to identify new business problems
• Creative use of new technology such as AI
• Partner with industry leaders
Unified HPC and Big Data Platform
Big Data Acceleration
Emerging Hardware Technology
• Large-memory compute node
• New storage class memory
• FPGA and custom accelerator
Advancing Big Data Software
• Massive streaming data
• Millisecond latency
• Artificial intelligence
22
No Compromise! Hybrid-Cloud-Based HPC Cloud Solution
Optimal cloud acceleration and data pre-processing
• GPU acceleration: NVIDIA P100 GPU acceleration
• FPGA data pre-processing
High performance network
• 100G IB service network, 2 μs low latency
Bare metal service
• Bare metal + SDI
• Shared storage
VMs with high specifications
• 128 vCPUs + 4 TB RAM
Hybrid cloud for HPC and big data
• 10+ top European research institutes
• 30%, 67%
Cloud: design emulation cloud, scientific computing cloud, energy exploration cloud
23
HPC Singularity!
Optimal cloud acceleration and data pre-processing
• GPU acceleration: NVIDIA P100 GPU acceleration
• FPGA data pre-processing
High performance network
• 100G IB service network, 2 μs low latency
Bare metal service
• Bare metal + SDI
• Shared storage
VMs with high specifications
• 128 vCPUs + 4 TB RAM
Cloud: design emulation cloud, scientific computing cloud, energy exploration cloud
Singularity!
24
Unified HPC + AI Converged Solution
Management Portal / GUI
AI/DL Application | HPC Application
DL Framework, Library, Tools | MPI, Math Library, Compiler
Workload Management Software | Cluster Manager
Huawei ATLAS Resource Management Software Platform
Container | VM | Bare Metal
HPC Compute Resources | Accelerator Resource Pools | HPC Storage Resources
[Figure: three GPU topologies, each with CPU1, CPU2, and two switches (SW) with four GPUs per switch. Topo1 shows one IO block, Topo2 shows two, and Topo3 shows four IO blocks plus two additional SW blocks]
Topo1: Single RC (AI training)*
Topo2: Balanced (HPC, cloud)*
Topo3: High BW (HPC)*
* Based on G5500 with G560 in single-node configuration; supports 8 x NVIDIA Tesla P100/P40 | 1 or 2 x E5-2600 v5 | 24 DDR4 DIMMs | 8 x 3.5-inch SATA HDDs + 6 x NVMe SSDs + 2 x 2.5-inch SSD/SATA/SAS
Topo1 and Topo2 support one-click topology change in BIOS.
Computing: latest-generation Intel Xeon
Latest-generation GPU
RDMA NIC
GPU
25
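Extensive Global HPC Deployment Cases
Stanford University
University of Toronto
Compute Canada
University of Nebraska
University of Tennessee
Digital Domain
GlobalFoundries Singapore
Singapore Science and Technology Research Institute
National University of Singapore
Philippine Meteorological Bureau, Phase 1
Macao Meteorological Bureau
Victoria University
University of Queensland
Kendi (肯迪) University
University of Tasmania
CASSAC Observatory, Chile
Mackenzie University, Brazil
São Paulo State University, Brazil
UNESP, Brazil
Venezuelan national oil company
Mexican water authority
Mexican Ministry of Agriculture
Turkish Academic Network and Information Center (ULAKBIM)
Yildiz Technical University (YTU), Turkey
Istanbul Technical University (ITU), Turkey
Harran University, Turkey
Yeditepe University, Turkey
Turkish national oil company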
China
Europe
Asia Pacific
North America
Latin America
Central Asia
Saudi MOI
Africa
Middle East
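Zimbabwe Ministry of Higher Education, Science and Technology Development
Bibliotheca Alexandrina, Egypt
Daimler Group, Germany
Volkswagen Group, Germany
BMW, Germany
Max Planck Society, Germany
European Organization for Nuclear Research (CERN), Switzerland
Poznan Supercomputing and Networking Center, Poland
Italian nuclear research institute
CRS4 interdisciplinary research center, Italy
Illumination Entertainment, France
British Antarctic Survey, UK
Newcastle University, UK
Imperial College London, UK
Loughborough University, UK
University of Lübeck, Germany
University of Munich, Germany
University of Warsaw, Poland
Saint Petersburg University, Russia
Technical University of Denmark (DTU)
EPFL (Swiss Federal Institute of Technology Lausanne), Switzerland
Uppsala University, Sweden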
Copyright © 2017 Huawei Technologies Co., Ltd. All Rights Reserved. The information in this document may contain predictive statements including, without limitation, statements regarding the future financial and operating results, future product portfolio, new technology, etc. There are a number of factors that could cause actual results and developments to differ materially from those expressed or implied in the predictive statements. Therefore, such information is provided for reference purpose only and constitutes neither an offer nor an acceptance. Huawei may change the information at any time without notice.
Thank You.