Accelerating Linux Virtual Server with OpenNPU
Gilad Ben-Yossef
Principal Software Architect
NetDev 1.2, October 2016
How it all started
Tsujigiri (辻斬り or 辻斬, tsuji-giri, literally "crossroads killing") is a Japanese term for a practice in which a samurai, after receiving a new katana or developing a new fighting style or weapon, tests its effectiveness by attacking a human opponent, usually a random defenseless passer-by, in many cases during nighttime.
https://en.wikipedia.org/wiki/Tsujigiri
Linux Virtual Server

"Wikimedia uses LVS for balancing traffic over multiple servers, see also load balancing architecture"

"Yesterday at DockerCon Europe, Andrey Sibiryov, a senior engineer at Uber Technologies, demonstrated how to improve load-balancing performance using an open-source technology that's been part of the Linux kernel for more than a decade — IPVS."

What is LVS?
LVS (Linux Virtual Server) implements transport-layer load balancing inside the Linux kernel, so-called Layer-4 switching.
LVS running on a host acts as a load balancer at the front of a cluster of real servers; it can direct requests for TCP/UDP-based services to the real servers, and makes the services of the real servers appear as a single virtual service on one IP address (see the toy sketch below).
LVS has been in active use for 14 years.
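To make the VIP / real-server relationship concrete, here is a toy C sketch, illustrative only and not the IPVS data structures, showing a virtual service and the simplest of the IPVS schedulers, round-robin.

#include <stdint.h>
#include <stddef.h>

struct real_server {
    uint32_t ip;       /* address of a real server behind the VIP */
    uint16_t port;
    int      weight;   /* relative share of new connections */
};

struct virtual_service {
    uint32_t vip;                 /* the single advertised IP */
    uint16_t vport;
    struct real_server *dests;    /* the cluster of real servers */
    size_t   ndests;
};

/* Round-robin scheduling: each new connection goes to the next
 * real server in turn (weights ignored in this toy version). */
struct real_server *rr_schedule(struct virtual_service *svc)
{
    static size_t next;
    return &svc->dests[next++ % svc->ndests];
}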
NPS-400: a Network Processor
NPS-400 is a Network Processor
• Think "GPU, but for networking"
• An NPU lets you program your network by writing a program that processes packets at data center line rates (see the conceptual sketch below)
NPUs used to be part of the secret sauce of carrier equipment
• e.g. the NP-5, the NPS's predecessor, is part of the Cisco ASR-9K service router
• These programmable devices were "buried" inside proprietary silos
We are bringing them into the open and into the data center
• White-box systems from Mellanox and ODMs
• OpenNPU – an open-source (GPL v2/BSD) SDK
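To make "processes packets at data center line rates" concrete, here is a conceptual C sketch of an NPU data-path program: a per-packet loop that every hardware thread runs in parallel. All of the hw_* names are hypothetical stand-ins, not the OpenNPU SDK.

struct frame;                      /* opaque packet handle */
enum action { ACTION_FORWARD, ACTION_DROP };

/* Stand-ins for vendor SDK calls (hypothetical names). */
extern struct frame *hw_recv_frame(void);
extern void hw_send_frame(struct frame *f, int port);
extern void hw_drop_frame(struct frame *f);
extern enum action parse_and_classify(struct frame *f);
extern int lookup_out_port(struct frame *f);

void datapath_main(void)
{
    for (;;) {                     /* one frame per iteration, per thread */
        struct frame *f = hw_recv_frame();
        if (parse_and_classify(f) == ACTION_FORWARD)
            hw_send_frame(f, lookup_out_port(f));
        else
            hw_drop_frame(f);
    }
}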
Accelerated Linux Virtual Server

ALVS is LVS with the data path running on a network processor. Same program, 400 Gbps performance.

[Diagram: KeepAliveD provides management & control; it pushes control & configuration to, and reads state & counters from, both the LVS Linux kernel data path and the ALVS NPS data path. Traffic flow: WAN router → NPS-based load balancer → ToR → real servers A, B, C (IP-A, IP-B, IP-C), all serving the VIPs.]

• Up to 400 Gbps of request traffic passes through the NPS-based load balancer
• Response traffic passes directly from the servers, so it is not limited by NPS bandwidth
• The flow-to-server assignment decision is taken at flow establishment (see the sketch below)
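A hedged sketch of that per-flow decision, in C: the scheduler runs once, on the first packet of a connection, and the result is cached in a flow table keyed by the 5-tuple, so every later packet of the connection is a plain lookup. The types and helper names are illustrative, not the actual ALVS code.

#include <stdint.h>
#include <stdbool.h>

struct flow_key {                  /* the TCP/IPv4 5-tuple */
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

/* Stand-ins for the flow table and scheduler (hypothetical). */
extern bool flow_lookup(const struct flow_key *k, uint32_t *server_ip);
extern void flow_insert(const struct flow_key *k, uint32_t server_ip);
extern uint32_t schedule_server(const struct flow_key *k);

uint32_t pick_server(const struct flow_key *k, bool is_syn)
{
    uint32_t server;

    if (flow_lookup(k, &server))   /* established flow: cached decision */
        return server;
    server = schedule_server(k);   /* new flow: run the scheduler once */
    if (is_syn)
        flow_insert(k, server);    /* pin the connection to this server */
    return server;
}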
NPS-400 Main Features

400 Gbps line rate
• 600 Mpps wire speed with up to 960 Gbps oversubscription
Hardware Traffic Manager
• 1M queues, 5-level H-QoS
960 Gbps of network I/O
• Including 10GE, 40GE, 100GE, 400G
256 CTOP cores – 4,096 CPUs (SMT threads)
• Specialized instruction set for network processing
• Runs SMP Linux (we're upstream)
Hardware acceleration engines
• Crypto (180 Gbps of IPsec), buffer allocations
• Network order engines, DPI, TCAM
Commodity DDR (96 GB)
• Unlimited tables, states, counters at wire-speed performance
C on Linux programmable
• Not an ASIC controlled by Linux; it is a processor that runs Linux
[Block diagram: clusters of NPC cores; memory controllers (MC), TCAM and statistics engines; MAC, IFU, ICU and PCIe interfaces; DDR channels; PMU, TM (traffic manager), NDMA and BMU engines.]
LVS to ALVS Software Migration

[Diagram: before vs. after. LVS on x86: the Linux kernel runs the IPVS data-plane processing, with management tools and configuration in user space. ALVS on x86 + NPS: the x86 host keeps the unmodified Linux kernel and management tools and adds the Mellanox-supplied LVS Reflector Daemon and EZcp library; the NPS (NPU) runs its own Linux kernel and takes over the IPVS data-plane processing.]
Detailed Software Architecture

[Diagram: the Linux kernel data path as netfilter hooks. PREROUTING → route → LOCAL_IN → local process → LOCAL_OUT → route → POSTROUTING → network, with a FORWARD path between the routing steps. IPVS attaches ip_vs_in at LOCAL_IN, ip_vs_out at FORWARD (return traffic for LVS-NAT), ip_vs_forward_icmp at FORWARD, and ip_vs_post_routing at POSTROUTING (LVS-NAT only). The kernel data path holds the IPVS config, IPVS state and FDB; the NPS data path, built on EZdp, SFT and FrameLib, carries the IPVS, IPv4 route, LAG, classify and punt blocks. KeepAliveD daemons run on top.]

Linux User Space ALVS Daemon
• Listens to the IPVS and FDB control & config messages and to ARP NETLINK messages
• Updates NPS control tables via the EZcp interface over PCIe
• Synchronizes IPVS state over Ethernet via IPVS HA SYNC messages
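As a sketch of how such a reflector daemon can watch the kernel, here is a minimal rtnetlink listener in C that subscribes to neighbor (ARP/FDB) notifications and hands each one to an update hook. The socket code is the standard rtnetlink API; ezcp_update_neigh() is a hypothetical stand-in for the EZcp interface, and the IPVS control messages (which travel over their own netlink channel) are omitted.

#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

extern void ezcp_update_neigh(const struct nlmsghdr *nh); /* hypothetical EZcp hook */

int main(void)
{
    struct sockaddr_nl sa = {
        .nl_family = AF_NETLINK,
        .nl_groups = RTMGRP_NEIGH,   /* ARP and bridge FDB notifications */
    };
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    char buf[8192];

    if (fd < 0 || bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0)
        return 1;

    for (;;) {
        int len = recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nh;
        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len);
             nh = NLMSG_NEXT(nh, len))
            if (nh->nlmsg_type == RTM_NEWNEIGH || nh->nlmsg_type == RTM_DELNEIGH)
                ezcp_update_neigh(nh); /* mirror change into NPS control tables */
    }
}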
Minimal Viable Product
Minimal
• Single forwarding mode out of 3
• Three scheduling algorithms out of 10
• TCP/IPv4 (will add SCTP, UDP and IPv6 later)
Viable
• LVS look and feel
- Same API, same CLI, same log mechanism
- Integrates with unmodified management plane
• Robust
- Resilient; covers the corner cases (testing already revealed one bug in LVS itself…)
- Supports active/passive failover
• Product
- 400x performance
- Scales with your ToR switch
ALVS Test Setup

[Diagram: IXIA traffic generator with one 100 Gb/s port on the client side and one 100 Gb/s port on the server side, attached to the NPS load balancer holding the VIPs.]

• 2 services (VIPs), each with 5 servers
• IXIA simulates many clients (a large range of IPs/ports)
• Test limited by the scale of the testing equipment
Performance
Criteria               | Lab test (25% capacity) | Simulation (100% capacity)
-----------------------|-------------------------|---------------------------
Concurrent connections | 30 M                    | 128 M (200 M)
Connection setup rate  | 1 M/s                   | 3 M/s
Requests bandwidth     | 75 Gbps                 | 400 Gbps
Connecting NPUs to Linux networking stack
It’s a useful thing to do
• If you need an L4 load balancer and love LVS, running it at 400 Gbps / 200 M connections on an open source platform is useful
We need to put the low-level NPU driver into the kernel
• Since the NPU is a programmable entity, the remoteproc subsystem is possibly the right way
We need to figure out how to hook NPU into network stack
• Switchdev? XDP? Something else?
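For context on the XDP option: an XDP program is eBPF written in C and attached at the driver level; the smallest possible one is shown below. The open question from the slide is whether programs written against this model map well onto an NPU's hardware engines.

/* Minimal XDP program: let every packet continue up the stack.
 * Build with: clang -O2 -target bpf -c xdp_pass.c */
#include <linux/bpf.h>

#define SEC(name) __attribute__((section(name), used))

SEC("xdp")
int xdp_pass(struct xdp_md *ctx)
{
    (void)ctx;          /* packet data untouched in this no-op program */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";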
Vision Architecture

[Diagram: a layered stack, "the CUDA of NPUs". At the bottom, the NPS hardware; in the kernel, the NPS driver hooked into switchdev. Above it, alternative data planes sit side by side over the Open NPU Data Plane API: the Linux net stack (layers 2-7), a custom data plane, a commercial third-party data plane / NOS (layers 2-7), and remote VNFs (layers 2-7) reached via the OPNFV g-API over the Open NPU Control API. Mellanox-provided middleware (stateful connection tracking, DPI application recognition, crypto) is exposed through an Open Network Services Interface API.]
Connecting NPUs to Linux networking stack – cont.
The ALVS data path program ended up very different from the IPVS code
• The architecture of an NPU is very different from a CPU + NIC
- HW engines for packet scheduling and order restoration, and a memory architecture that does not rely on caches
- The program ended up very similar in design to Google Maglev, with HW engines taking the place of some of the code blocks (see the sketch after this slide)
• This has implications for ideas such as using eBPF/XDP to bring the NPU into the kernel
- Yes, you can run the eBPF bytecode, but the program is written under different assumptions
We ran into networking stack scaling issues when trying to synchronize state with the NPU
• IPVS slowed down to a crawl way before we reached 30 M flows
• What does it mean when an NPU slave device can hold more state than the OS on the host?
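Since the slide compares the design to Google Maglev, here is an illustrative C sketch of the Maglev-style lookup table from the published paper (not the ALVS source): each backend walks its own permutation of the table slots, so a flow hash maps to a backend with minimal churn when backends come and go.

#include <stdint.h>
#include <stddef.h>

#define M 65537                    /* table size, a prime, as in the paper */

/* Fill table[] so each of n backends owns ~M/n slots.
 * offset[i] in [0, M) and skip[i] in [1, M) come from hashing
 * backend i's name, per the Maglev paper. */
void maglev_populate(int table[M], const uint32_t offset[],
                     const uint32_t skip[], size_t n)
{
    uint32_t next[n];              /* how far backend i has walked its permutation */
    size_t filled = 0;

    for (size_t i = 0; i < n; i++) next[i] = 0;
    for (size_t i = 0; i < M; i++) table[i] = -1;

    while (filled < M) {
        for (size_t i = 0; i < n && filled < M; i++) {
            /* candidate slot = (offset + j*skip) mod M for growing j */
            uint32_t c = (offset[i] + next[i] * skip[i]) % M;
            while (table[c] >= 0) {            /* slot taken: try the next one */
                next[i]++;
                c = (offset[i] + next[i] * skip[i]) % M;
            }
            table[c] = (int)i;
            next[i]++;
            filled++;
        }
    }
}

/* Lookup: hash the flow's 5-tuple and index the table. */
int maglev_pick(const int table[M], uint64_t flow_hash)
{
    return table[flow_hash % M];
}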
Thank You
http://www.opennpu.org
ALVS: https://github.com/Mellanox/ALVS