Gilad Ben-Yossef Principal Software Architect NetDev 1.2 October 2016 Accelerating Linux Virtual Server with OpenNPU

Slides: opennpu.org/wp-content/uploads/2016/06/ALVS_netdev_slides.pdf




© 2015 Mellanox Technologies 2 - Mellanox Confidential -

How it all started

Tsujigiri (辻斬り or 辻斬 tsuji-giri, literally "crossroads killing") is a Japanese term for a practice when a samurai, after receiving a new katana or developing a new fighting style or weapon, tests its effectiveness by attacking a human opponent, usually a random defenseless passer-by, in many cases during nighttime.

https://en.wikipedia.org/wiki/Tsujigiri


Linux Virtual Server

What is LVS?

LVS (Linux Virtual Server) implements transport-layer load balancing inside the Linux kernel, so-called Layer-4 switching.

LVS running on a host acts as a load balancer in front of a cluster of real servers. It directs requests for TCP/UDP-based services to the real servers and makes the services of the real servers appear as a virtual service on a single IP address.

LVS has been in active use for 14 years.

“Wikimedia uses LVS for balancing traffic over multiple servers, see also load balancing architecture”

“Yesterday at DockerCon Europe, Andrey Sibiryov, a senior engineer at Uber Technologies, demonstrated how to improve load-balancing performance using an open-source technology that’s been part of the Linux kernel for more than a decade - IPVS.”
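The idea of a virtual service can be sketched in a few lines of Python. This is only an illustration of the concept, not IPVS code: the `VirtualService` class, its field names and the addresses are all made up for the example, and round-robin is used simply because it is the easiest scheduler to show.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator


@dataclass
class VirtualService:
    """One virtual IP:port fronting several real servers (all names hypothetical)."""
    vip: str
    port: int
    real_servers: list[str]
    _rr: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not self.real_servers:
            raise ValueError("a virtual service needs at least one real server")
        # Round-robin is used here only as the simplest possible scheduler.
        self._rr = cycle(self.real_servers)

    def pick_real_server(self) -> str:
        """Return the real server that should handle the next new connection."""
        return next(self._rr)


# Clients only ever see 192.0.2.10:80; the real servers stay hidden behind it.
service = VirtualService("192.0.2.10", 80, ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
picks = [service.pick_real_server() for _ in range(4)]
print(picks)
```

The fourth pick wraps around to the first server, which is all round-robin means here.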


NPS-400: a Network Processor

NPS-400 is a Network Processor

• Think “GPU, but for networking”

• An NPU lets you program your network by writing a program that processes packets at data center line rates

NPUs used to be part of the secret sauce of carrier equipment

• e.g. NP-5, the NPS predecessor, is part of the Cisco ASR-9K service router (pictured on the slide)

• These programmable devices were “buried” inside proprietary silos

We are bringing them into the open and into the data center

• White box systems from MLNX and ODM

• OpenNPU – Open Source (GPL v2/BSD) SDK


Accelerated Linux Virtual Server

ALVS is LVS with the data path running on a network processor. Same program, 400 Gbps performance.

[Diagram labels: WAN, Router, ToR, NPS, Load Balancer, VIP (x4), servers A, B, C with IP-A, IP-B, IP-C; LVS Linux kernel data path; KeepAliveD management & control; ALVS NPS data path; Control & Configuration; State & Counters]

• Up to 400 Gbps of request traffic passes through the NPS-based load balancer

• Response traffic passes directly from the server, so it is not limited by NPS bandwidth

• The decision on assigning a flow to a server is taken at flow establishment
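The "decide once, at flow establishment" behaviour can be sketched as a connection table keyed by the flow's 5-tuple. This is a minimal Python illustration under stated assumptions: `FlowKey`, `FlowTable` and the round-robin pick are hypothetical stand-ins, not the ALVS data path, which runs on the NPS.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Identifies a flow: protocol plus client and virtual-service endpoints."""
    proto: str
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class FlowTable:
    """Hypothetical connection table: schedule once, then reuse the decision."""

    def __init__(self, servers: list[str]) -> None:
        self.servers = servers
        self.flows: dict[FlowKey, str] = {}
        self.scheduled = 0  # number of scheduling decisions made so far

    def forward(self, key: FlowKey) -> str:
        server = self.flows.get(key)
        if server is None:
            # First packet of the flow: pick a server (round-robin stand-in)
            # and remember the choice for every later packet of this flow.
            server = self.servers[self.scheduled % len(self.servers)]
            self.scheduled += 1
            self.flows[key] = server
        return server


table = FlowTable(["IP-A", "IP-B", "IP-C"])
flow1 = FlowKey("tcp", "198.51.100.7", 40000, "192.0.2.10", 80)
flow2 = FlowKey("tcp", "198.51.100.8", 40001, "192.0.2.10", 80)
results = [table.forward(flow1), table.forward(flow2), table.forward(flow1)]
print(results, table.scheduled)
```

Only request packets go through `forward`; in the setup described on the slide the responses travel from the server straight back to the client, so they never touch this table.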


NPS-400 Main Features

400 Gbps line rate

• 600 Mpps wire speed with up to 960 Gbps oversubscription

Hardware Traffic Manager

• 1M queues, 5-level H-QoS

960 Gbps of network I/O

• Including 10GE, 40GE, 100GE, 400G

256 CTOP cores – 4,096 CPUs (SMT threads)

• Specialized instruction set for network processing

• Runs SMP Linux (we’re upstream)

Hardware acceleration engines

• Crypto (180 Gbps of IPsec), buffer allocations

• Network order engines, DPI, TCAM

Commodity DDR (96 GB)

• Unlimited tables, states, counters at wire-speed performance

Programmable in C on Linux

• Not an ASIC controlled by Linux, it is a processor that runs Linux
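A quick back-of-the-envelope check relates the 400 Gbps and 600 Mpps figures. The sketch below assumes, and this is an assumption rather than something the slides state, that the packet rate refers to minimum-size 64-byte Ethernet frames, each of which also occupies 8 bytes of preamble and a 12-byte inter-frame gap on the wire.

```python
# Assumptions (not from the slides): minimum Ethernet frame of 64 bytes,
# plus 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire.
MIN_FRAME_BYTES = 64
PREAMBLE_BYTES = 8
INTERFRAME_GAP_BYTES = 12


def max_packets_per_second(line_rate_bps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Upper bound on packets per second for a given line rate and frame size."""
    bits_on_wire = (frame_bytes + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES) * 8
    return line_rate_bps / bits_on_wire


mpps = max_packets_per_second(400e9) / 1e6
threads_per_core = 4096 // 256  # 4,096 SMT threads spread over 256 cores
print(f"{mpps:.1f} Mpps at 400 Gbps, {threads_per_core} threads per core")
```

Under these assumptions the bound comes out just under 600 Mpps, which is consistent with the rounded figure on the slide, and the thread count works out to 16 hardware threads per core.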

[Block diagram labels: NPC (x16), MC, TCAM, Stat, MAC, IFU, ICU, PCIe, DDR, PMU, TM, NDMA, BMU]


LVS to ALVS Software Migration

[Diagram labels, before (x86): management tools, configuration, etc.; Linux kernel; IPVS data plane processing]

[Diagram labels, after (x86 + NPS): management tools, config., etc.; Linux kernel; IPVS data plane processing; supplied by Mellanox: LVS Reflector Daemon, EZcp; Linux kernel; NPU]


Detailed Software Architecture

[Diagram, Linux kernel data path: netfilter hooks PREROUTING, LOCAL_IN, FORWARD, LOCAL_OUT and POSTROUTING between Network, Route and Local Process, with the IPVS hook functions ip_vs_in, ip_vs_out (return LVS-NAT), ip_vs_forward_icmp and ip_vs_post_routing (LVS-NAT only)]

[Diagram, NPS data path: EZdp, SFT, FrameLib; IPVS, IPv4 Route, LAG, Classify, Punt; FDB, IPVS Config, IPVS State]

KeepAliveD Daemons

Linux User Space ALVS Daemon

• Listens to IPVS, FDB and ARP NETLINK messages

• Updates NPS control tables via the EZcp interface over PCIe

Synchronize IPVS state over Ethernet via IPVS HA SYNC messages

NETLINK IPVS and FDB control & config messages
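The daemon's job, as described above, is to mirror kernel configuration changes into the NPS control tables. A minimal sketch of that mirroring loop follows. Everything in it is an assumption made for illustration: the event dictionaries are invented and are not the NETLINK wire format, and the table updates are plain Python dictionary writes, not EZcp calls.

```python
from dataclasses import dataclass, field


@dataclass
class ControlTables:
    """In-memory stand-in for the NPS control tables (hypothetical layout)."""
    services: dict[tuple[str, int], list[str]] = field(default_factory=dict)
    fdb: dict[str, str] = field(default_factory=dict)  # MAC address -> port name
    arp: dict[str, str] = field(default_factory=dict)  # IP address -> MAC address


def apply_event(tables: ControlTables, event: dict) -> None:
    """Mirror one (invented) kernel notification into the tables."""
    kind = event["kind"]
    if kind == "service_add":
        tables.services.setdefault((event["vip"], event["port"]), [])
    elif kind == "server_add":
        tables.services[(event["vip"], event["port"])].append(event["server"])
    elif kind == "server_del":
        tables.services[(event["vip"], event["port"])].remove(event["server"])
    elif kind == "fdb_add":
        tables.fdb[event["mac"]] = event["port_name"]
    elif kind == "arp_add":
        tables.arp[event["ip"]] = event["mac"]
    else:
        raise ValueError(f"unknown event kind: {kind}")


tables = ControlTables()
for ev in [
    {"kind": "service_add", "vip": "192.0.2.10", "port": 80},
    {"kind": "server_add", "vip": "192.0.2.10", "port": 80, "server": "10.0.0.1"},
    {"kind": "server_add", "vip": "192.0.2.10", "port": 80, "server": "10.0.0.2"},
    {"kind": "server_del", "vip": "192.0.2.10", "port": 80, "server": "10.0.0.1"},
    {"kind": "arp_add", "ip": "10.0.0.2", "mac": "02:00:00:00:00:02"},
]:
    apply_event(tables, ev)
print(tables.services, tables.arp)
```

Because the management plane keeps talking to the kernel as usual and the daemon only listens, existing tools do not need to know that an NPS is present.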


Minimal Viable Product

Minimal

• Single forwarding mode out of 3

• Three scheduling algorithms out of 10

• TCP/IPv4 (will add SCTP, UDP and IPv6 later)

Viable

• LVS look and feel

- Same API, same CLI, same log mechanism

- Integrates with unmodified management plane

• Robust

- Resilient. Cover the corner cases

Testing already revealed one bug in LVS itself…

- Supports passive/active failover

• Product

- x400 Performance

- Scales with your ToR switch


ALVS Test Setup

[Diagram labels: IXIA 100 Gb/s port (client side), Load Balancer with VIP, IXIA 100 Gb/s port (server side)]

• 2 services (VIPs)

• Each service with 5 servers

• IXIA simulates a large number of clients (large range of IP/port)

Test limited by testing equipment scale


Performance

Criteria                 Lab test (25% capacity)   Simulation (100% capacity)
Concurrent connections   30 M                      128 M (200 M)
Connection setup rate    1 M/s                     3 M/s
Requests bandwidth       75 Gbps                   400 Gbps


Connecting NPUs to Linux networking stack

It’s a useful thing to do

• If you need an L4 load balancer and love LVS, running it at 400 Gbps / 200 M connections on an open source platform is useful

We need to put the low-level NPU driver into the kernel

• Since the NPU is a programmable entity, the remoteproc subsystem is possibly the right way

We need to figure out how to hook the NPU into the network stack

• Switchdev? XDP? Something else?


Vision Architecture

“The CUDA of NPUs”

[Diagram labels: NPS; NPS driver; Linux; User / Kernel boundary; switchdev; Linux net stack, Layers 2 - 7; custom data plane; commercial third party data plane; Open NPU Data Plane API; Open NPU Control API; Mellanox provided middleware (stateful connection tracking, DPI application recognition, crypto); Open Network Services Interface API; NOS, Layers 2 - 7; VNFs, Layers 2 - 7; Remote; OPNFV g-API; OpenNPU API (shown three times)]


Connecting NPUs to Linux networking stack – cont.

The ALVS data path program ended up very different from the IPVS code

• The architecture of an NPU is very different from that of a CPU + NIC

- HW engines for packet scheduling and order restoration, and a memory architecture that does not rely on caches

- The program ended up very similar in design to Google Maglev, with HW engines taking the place of some of the code blocks

• This has implications for ideas such as using eBPF/XDP to bring the NPU into the kernel

- Yes, you can run the eBPF bytecode, but the program is written under different assumptions

We ran into networking stack scaling issues when trying to synchronize state with the NPU

• IPVS slowed down to a crawl way before we reached 30 M flows

• What does it mean when an NPU slave device can hold more state than the OS on the host?


Thank You

http://www.opennpu.org

ALVS: https://github.com/Mellanox/ALVS
