
Mellanox IBM


Presentation from the HPC event at IBM Denmark - September 2013, Copenhagen


10/4/2013


Mellanox Confidential – Phil Hofstad

Sept 2013

Mellanox Technology Update

© 2013 Mellanox Technologies 2- Mellanox Confidential -

Mellanox Overview

▪ Leading provider of high-throughput, low-latency server and storage interconnect
  • FDR 56Gb/s InfiniBand and 10/40/56GbE
  • Reduces application wait-time for data
  • Dramatically increases ROI on data center infrastructure
▪ Company headquarters:
  • Yokneam, Israel; Sunnyvale, California
  • ~1,200 employees* worldwide
▪ Solid financial position
  • Record revenue in FY12: $500.8M, up 93% year-over-year
  • Q2’13 revenue of $98.2M
  • Q3’13 guidance ~$104M to $109M
  • Cash + investments @ 6/30/13 = $411.3M

Ticker: MLNX

* As of June 2013


Providing End-to-End Interconnect Solutions

Comprehensive End-to-End InfiniBand and Ethernet Solutions Portfolio
• ICs
• Adapter Cards
• Switches/Gateways
• Long-Haul Systems
• Cables/Modules

Comprehensive End-to-End Software Accelerators and Management
• MXM: Mellanox Messaging Acceleration
• FCA: Fabric Collectives Acceleration
• Management
  • UFM: Unified Fabric Management
• Storage and Data
  • VSA: Storage Accelerator (iSCSI)
  • UDA: Unstructured Data Accelerator


Virtual Protocol Interconnect (VPI) Technology

[Diagram: VPI Adapter and VPI Switch]

VPI Switch (Switch OS Layer, Unified Fabric Manager)
• 64 ports 10GbE
• 36 ports 40/56GbE
• 48 10GbE + 12 40/56GbE
• 36 ports IB up to 56Gb/s
• 8 VPI subnets

VPI Adapter (Acceleration Engines)
• Form factors: LOM, Adapter Card, Mezzanine Card
• Diagram label: 3.0

Link speeds
• Ethernet: 10/40/56 Gb/s
• InfiniBand: 10/20/40/56 Gb/s

Applications: Networking, Storage, Clustering, Management

From data center to campus and metro connectivity


MetroDX™ and MetroX™

▪ MetroX™ and MetroDX™ extend InfiniBand and Ethernet RDMA reach
▪ Fastest interconnect over 40Gb/s InfiniBand or Ethernet links
▪ Supporting multiple distances
▪ Simple management to control distant sites
▪ Low-cost, low-power, long-haul solution

40Gb/s over Campus and Metro

Data Center Expansion Example – Disaster Recovery


Key Elements in a Data Center Interconnect

[Diagram elements]
• Applications
• Servers and Storage
• Adapter and IC (on both the server and the storage side)
• Cables, Silicon Photonics, Parallel Optical Modules
• Switch and IC


IPtronics and Kotura Complete Mellanox’s 100Gb/s+ Technology

Recent Acquisitions of Kotura and IPtronics Enable Mellanox to Deliver Complete High-Speed Optical Interconnect Solutions for 100Gb/s and Beyond


Mellanox InfiniBand Paves the Road to Exascale


Leading Interconnect, Leading Performance

[Chart: latency and bandwidth roadmap, 2001 to 2017, with the Same Software Interface throughout. Latency labels: 5usec, 2.5usec, 1.3usec, 0.7usec, 0.6usec, <0.5usec. Bandwidth labels: 10Gb/s, 20Gb/s, 40Gb/s, 56Gb/s, 100Gb/s, 200Gb/s.]


Architectural Foundation for Exascale Computing

Connect-IB


Connect-IB: The Exascale Foundation

▪ World’s first 100Gb/s interconnect adapter
  • PCIe 3.0 x16, dual FDR 56Gb/s InfiniBand ports to provide >100Gb/s
▪ Highest InfiniBand message rate: 137 million messages per second
  • 4X higher than other InfiniBand solutions
▪ <0.7 micro-second application latency
▪ Supports GPUDirect RDMA for direct GPU-to-GPU communication
▪ Unmatchable Storage Performance
  • 8,000,000 IOPs (1QP), 18,500,000 IOPs (32 QPs)
▪ New Innovative Transport – Dynamically Connected Transport Service
▪ Supports Scalable HPC with MPI, SHMEM and PGAS/UPC offloads

Enter the World of Boundless Performance
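The ">100Gb/s" figure for a dual-port FDR adapter in a PCIe 3.0 x16 slot can be sanity-checked with simple arithmetic. The sketch below is not from the slides. It assumes the commonly cited line parameters (FDR at 14.0625 Gb/s per lane over 4 lanes with 64b/66b encoding, PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding) and ignores all protocol overhead above the line encoding.

```python
# Back-of-the-envelope link budget for a dual-port FDR adapter.
# All constants are assumptions taken from the usual published line rates,
# not values stated on the slide.

FDR_LANE_GBPS = 14.0625      # assumed FDR signaling rate per lane
FDR_LANES = 4                # assumed 4x port width
FDR_ENCODING = 64 / 66       # assumed 64b/66b line encoding

PCIE3_LANE_GTPS = 8.0        # assumed PCIe 3.0 transfer rate per lane
PCIE3_LANES = 16             # x16 slot, as stated on the slide
PCIE3_ENCODING = 128 / 130   # assumed 128b/130b line encoding


def fdr_port_data_rate_gbps():
    """Data rate of one FDR 4x port after line encoding."""
    return FDR_LANE_GBPS * FDR_LANES * FDR_ENCODING


def pcie3_x16_data_rate_gbps():
    """Data rate of a PCIe 3.0 x16 link after line encoding."""
    return PCIE3_LANE_GTPS * PCIE3_LANES * PCIE3_ENCODING


if __name__ == "__main__":
    dual_port = 2 * fdr_port_data_rate_gbps()
    host_link = pcie3_x16_data_rate_gbps()
    print(f"dual FDR ports: {dual_port:.1f} Gb/s")
    print(f"PCIe 3.0 x16:   {host_link:.1f} Gb/s")
    print("host link can feed both ports:", host_link > dual_port)
```

Under these assumptions the two ports together carry a little over 100Gb/s of data, and the x16 host link has headroom above that, which is consistent with the claim on the slide.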


Dynamically Connected Transport Advantages


FDR InfiniBand Delivers Highest Application Performance

[Chart: Message Rate (Million), QDR InfiniBand vs. FDR InfiniBand; y-axis from 0 to 150]


3D Torus, Fat-Tree, Dragonfly+

Scalable and Cost Effective Topologies


Topologies / Capabilities Overview

Fat-Tree
• Can be fully rearrangeably non-blocking (1:1) or blocking (x:1)
• Typically enables best performance, lowest latency
• Alleviates bandwidth bottleneck closer to the root
• Most common topology in many supercomputer installations

3D Torus
• Blocking network, cost-effective for systems at scale
• Great performance solutions for applications with locality
• Support for dedicated sub-networks
• Simple expansion for future growth
• Fault tolerant with multiple link failures
• Support for adaptive routing, congestion control, QoS
• Not limited to storage connection only at cube edges

Dragonfly+
• Concept of connecting “groups” or “virtual routers” together in a full-graph (all-to-all) way
• Supports higher dimensional networks
• Flexible definition of intra-group interconnection
• Dragonfly+ offers higher scalability and is more cost effective
• No intermediate proprietary storage conversion
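The "non-blocking (1:1) or blocking (x:1)" distinction for fat-trees and the "full-graph" wiring between Dragonfly groups both reduce to small counting exercises. The sketch below is an illustration only: the function names are hypothetical, it assumes a two-level fat-tree built from identical switches with exactly one link between each leaf and each spine, and the 36-port radix is chosen simply because 36-port switches appear elsewhere in this deck.

```python
from fractions import Fraction


def two_level_fat_tree(radix, downlinks):
    """Size a two-level fat-tree of identical switches.

    Assumption: every leaf has one uplink to every spine, so the number of
    spines equals the uplinks per leaf and a spine can reach `radix` leaves.
    """
    if not 0 < downlinks < radix:
        raise ValueError("downlinks must be between 1 and radix - 1")
    uplinks = radix - downlinks
    return {
        "leaves": radix,
        "spines": uplinks,
        "hosts": radix * downlinks,
        # Blocking ratio x:1; a value of 1 means non-blocking (1:1).
        "blocking": Fraction(downlinks, uplinks),
    }


def full_graph_links(groups):
    """Links needed to connect every pair of groups directly (all-to-all)."""
    return groups * (groups - 1) // 2


if __name__ == "__main__":
    print(two_level_fat_tree(36, 18))   # 1:1, non-blocking
    print(two_level_fat_tree(36, 24))   # 2:1, blocking but more hosts
    print(full_graph_links(9))          # pairwise links among 9 groups
```

With 18 of 36 ports facing hosts the tree is 1:1 and tops out at 648 hosts; shifting ports toward hosts raises the host count at the cost of a higher blocking ratio.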

Progression of Advanced Capabilities…

Routing Chains
• Multiple systems can be routed with unique associated topologies
• Provides for more complex routing
• Support for Port-groups policies
• A new policy file lets the user define “sub topologies”

IB Routers
• Scalability beyond LID space
• Administrative subnet isolation
• Fault isolation
• IB routing between subnets

UpDn Routing
• Based on minimum hops to each node
• Nodes ranked based on distance from root nodes
• Restricted routes by strict ranking rules
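The three UpDn Routing points (minimum hops, ranking by distance from the roots, and routes restricted by rank) can be sketched in a few lines. This is a simplified illustration, not the subnet manager's actual implementation: the function names and the tiny example fabric are hypothetical, and ties between equally ranked nodes are broken by node name purely as a simplifying assumption.

```python
from collections import deque


def rank_nodes(adjacency, roots):
    """Breadth-first ranking: rank = hop distance to the nearest root."""
    rank = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for peer in adjacency[node]:
            if peer not in rank:
                rank[peer] = rank[node] + 1
                queue.append(peer)
    return rank


def is_up(src, dst, rank):
    """A hop is 'up' if it moves toward the roots (name breaks rank ties)."""
    return (rank[dst], dst) < (rank[src], src)


def is_legal_path(path, rank):
    """Legal paths never take an 'up' hop after a 'down' hop."""
    gone_down = False
    for src, dst in zip(path, path[1:]):
        if is_up(src, dst, rank):
            if gone_down:
                return False
        else:
            gone_down = True
    return True


def shortest_legal_path(adjacency, rank, src, dst):
    """Minimum-hop path that obeys the up-then-down restriction, or None."""
    start = (src, False)                 # state: (node, already went down?)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, gone_down = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for peer in adjacency[node]:
            up = is_up(node, peer, rank)
            if up and gone_down:
                continue                 # would go up after going down
            nxt = (peer, gone_down or not up)
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


# Hypothetical fabric: root r, switches a and b below it, c below both.
FABRIC = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r", "c"], "c": ["a", "b"]}
RANK = rank_nodes(FABRIC, ["r"])

if __name__ == "__main__":
    print(RANK)
    print(is_legal_path(["a", "c", "b"], RANK))   # down then up: rejected
    print(shortest_legal_path(FABRIC, RANK, "a", "b"))
```

In the example, the two-hop route from a to b through c is rejected because it descends and then climbs, so traffic goes through the root instead. Forbidding up-after-down turns is what keeps the set of routes free of cyclic dependencies.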


UFM

Unified Fabric Management


UFM in the Fabric

▪ Software or Appliance form factor
▪ 2 or more instances for High Availability
▪ Switch and HCA management
▪ Full Mgmt or Monitoring Only modes

[Diagram label: Synchronization, Heartbeat]


Multi-Site Management

[Diagram: Site 1, Site 2, ..., Site n]


Integration with 3rd Party Systems

▪ Extensible architecture
  • Based on Web-services
▪ Open API for users or 3rd-party extensions
  • Allows simple reporting, provisioning, and monitoring
  • Task automation
  • Software Development Kit
▪ Extensible object model
  • User-defined fields
  • User-defined menus

[Diagram labels: Web Based API (Read, Write, Manage); Monitoring System; Configuration Mgmt; Orchestrator; Job Scheduler…; Alerts via SNMP Traps]


UFM Main Features

• Automatic Discovery
• Central Device Mgmt
• Fabric Dashboard
• Congestion Analysis
• Health & Perf Monitoring
• Advanced Alerting
• Fabric Health Reports
• Service Oriented Provisioning


MXM, FCA

Scalable Communication


Mellanox ScalableHPC Accelerate Parallel Applications

[Diagram: programming models layered over MXM and FCA, which sit on the InfiniBand Verbs API. MPI: processes P1, P2, P3, each with its own memory. SHMEM: processes P1, P2, P3 over logical shared memory. PGAS: processes P1, P2, P3 with local memory and logical shared memory.]

MXM
• Reliable Messaging Optimized for Mellanox HCA
• Hybrid Transport Mechanism
• Efficient Memory Registration
• Receive Side Tag Matching

FCA
• Topology Aware Collective Optimization
• Hardware Multicast
• Separate Virtual Fabric for Collectives
• CORE-Direct Hardware Offload


MXM v2.0 - Highlights

▪ Transport library integrated with OpenMPI, OpenSHMEM, BUPC, MVAPICH2
▪ More solutions will be added in the future
▪ Utilizing Mellanox offload engines
▪ Supported APIs (both sync/async): AM, p2p, atomics, synchronization
▪ Supported transports: RC, UD, DC, RoCE, SHMEM
▪ Supported built-in mechanisms: tag matching, progress thread, memory registration cache, fast path send for small messages, zero copy, flow control
▪ Supported data transfer protocols: Eager Send/Recv, Eager RDMA, Rendezvous
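The difference between the eager and rendezvous data transfer protocols listed above is mainly about when the payload crosses the wire. The sketch below shows the general idea as it is commonly described for MPI-style libraries. It is not MXM's implementation: the threshold value, the message names, and the function name are all hypothetical, and the split between Eager Send/Recv and Eager RDMA is left out.

```python
# Hypothetical size threshold; NOT an MXM default or tunable.
EAGER_LIMIT_BYTES = 8 * 1024


def plan_transfer(payload_bytes, eager_limit=EAGER_LIMIT_BYTES):
    """Return the ordered wire messages for one send as (kind, bytes) pairs.

    Eager: the payload travels immediately with the message, and the
    receiver buffers it if no matching receive has been posted yet.
    Rendezvous: only a small request goes first; the payload moves once the
    receiver has answered, so it can land directly in the user buffer
    (zero copy).
    """
    if payload_bytes < 0:
        raise ValueError("payload size must be non-negative")
    if payload_bytes <= eager_limit:
        return [("EAGER_DATA", payload_bytes)]
    return [
        ("REQUEST_TO_SEND", 0),       # sender announces the message
        ("CLEAR_TO_SEND", 0),         # receiver has a matching buffer ready
        ("BULK_DATA", payload_bytes), # payload transferred in one pass
        ("FINISHED", 0),              # completion notification
    ]


if __name__ == "__main__":
    print(plan_transfer(64))
    print(plan_transfer(1 << 20))
```

Small messages favor the eager path because the extra handshake would dominate their latency; large messages favor rendezvous because avoiding an intermediate copy matters more than one extra round trip.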


Mellanox FCA Collective Scalability

[Charts, each comparing Without FCA and With FCA over 0 to 2500 processes (PPN=8):
• Barrier Collective: latency (us)
• Reduce Collective: latency (us)
• 8-Byte Broadcast: bandwidth (KB*processes)]


GPU Direct


GPUDirect RDMA

[Diagram: transmit and receive data paths between CPU, chipset, GPU, GPU memory, system memory and the InfiniBand adapter, shown once for GPUDirect RDMA and once for GPUDirect 1.0]


Preliminary Performance of MVAPICH2 with GPUDirect RDMA

GPU-GPU Internode MPI Latency
[Chart: Small Message Latency, latency (us, lower is better) vs. message size (1 byte to 4K bytes), MVAPICH2-1.9 vs. MVAPICH2-1.9-GDR; labeled values 19.78 and 6.12]
69% Lower Latency

GPU-GPU Internode MPI Bandwidth
[Chart: Small Message Bandwidth, bandwidth (MB/s, higher is better) vs. message size (1 byte to 4K bytes), MVAPICH2-1.9 vs. MVAPICH2-1.9-GDR; annotated 3x]
3X Increase in Throughput

Source: Prof. DK Panda
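The "69% Lower Latency" headline can be reproduced from the two values labeled on the latency chart. The short check below assumes that 19.78 and 6.12 are the MVAPICH2-1.9 and MVAPICH2-1.9-GDR latencies in microseconds at the same message size; the slide text does not state that pairing explicitly, and the function names are illustrative.

```python
def percent_reduction(before, after):
    """Relative reduction of `after` versus `before`, in percent."""
    return (before - after) / before * 100.0


def speedup(before, after):
    """How many times smaller `after` is than `before`."""
    return before / after


if __name__ == "__main__":
    baseline_us, gdr_us = 19.78, 6.12   # assumed pairing of chart labels
    print(f"{percent_reduction(baseline_us, gdr_us):.1f}% lower latency")
    print(f"{speedup(baseline_us, gdr_us):.2f}x faster")
```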


Execution Time of HSG (Heisenberg Spin Glass) Application with 2 GPU Nodes

Source: Prof. DK Panda

Preliminary Performance of MVAPICH2 with GPU-Direct-RDMA


FDR InfiniBand Switch Portfolio

Modular Switches
• 648 port
• 324 port
• 216 port
• 108 port

Edge Switches
• SX6036 – 36 ports managed
• SX6025 – 36 ports externally managed
• SX6018 – 18 port managed
• SX6015 – 18 port externally managed
• SX6012 – 12 port managed
• SX6005 – 12 port externally managed

Long Distance Bridge − VPI
• Bridging
• Routing

Management

[Four items on the slide carry a NEW badge.]


Top-of-Rack (TOR) Ethernet Switch Portfolio

• SX1036: 36x 40GbE, non-blocking, TOR / Aggregation
• SX1024: 48x 10GbE + 12x 40GbE, 1.92Tbps, non-blocking, 10GbE → 40GbE Aggregation
• SX1016: 64x 10GbE, 1.28Tbps, non-blocking
• SX1012: 12x 40GbE, non-blocking, 1U half-width
• Non-blocking 10GbE → 40GbE Aggregation

▪ Highest Capacity in 1U
  • 12x 40GbE to 36x 40GbE
  • 64x 10GbE
▪ Latency
  • 220ns L2 latency
  • 330ns L3 latency
▪ Unique Value Proposition
  • 56GbE gives 40% better bandwidth
  • VPI to connect to InfiniBand
▪ Power (SX1036)
  • Under 1W per 10GbE interface
  • 2.3W per 40GbE interface
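The Tbps figures quoted for the SX1024 and SX1016, and the "40% better bandwidth" claim for 56GbE, follow from the port counts. The sketch below assumes the quoted capacity counts both directions of every full-duplex port; that interpretation is not stated on the slide but matches its numbers. Function names are illustrative.

```python
def switching_capacity_tbps(ports):
    """Aggregate capacity for a list of (port_count, speed_gbps) pairs.

    Assumption: capacity counts traffic in both directions (full duplex).
    """
    one_way_gbps = sum(count * speed for count, speed in ports)
    return 2 * one_way_gbps / 1000.0


def percent_gain(new_gbps, old_gbps):
    """Bandwidth gain of one link speed over another, in percent."""
    return (new_gbps / old_gbps - 1.0) * 100.0


if __name__ == "__main__":
    print("SX1024:", switching_capacity_tbps([(48, 10), (12, 40)]), "Tbps")
    print("SX1016:", switching_capacity_tbps([(64, 10)]), "Tbps")
    print("56GbE over 40GbE:", round(percent_gain(56, 40)), "% more bandwidth")
```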


The Generation of Open Ethernet (Open Switch Platform)

[Diagram labels]
• Linux
• Open Ethernet: Switching L2, Routing L3, OpenFlow, Management
• 10/20/40/56GbE
• Roadmap to 100GbE
• End-2-End Solution (NICs, Switches, Cables, Software)
• OpenFlow Controller
• Centralized SDN Controller
• Open Source Applications
• Applications: Route Flow, Routing L3, TE, Security Manager, Power Manager
• QUAGGA


ConnectX-3 Pro

▪ Dual port VPI Adapter
▪ Low power
  • Typ 2-port 10GigE – 2.9W
▪ LOM/mezz features
  • SMBus / NCSI - Baseboard Mgmt Controller I/F, WoL
  • Integrated sensors
▪ Performance Improvements
  • Larger context caches
  • Higher message rate
▪ Virtualization
  • NVGRE/VXLAN Stateless Offloads
  • SRIOV
▪ Small Footprint, minimal peripherals
  • FCBGA 17x17 (small footprint)
  • 1.0mm standard ball pitch


Thank You

Phil Hofstad

[email protected]

Tel: +44 7597 566281