Can VMs networking benefit from DPDK?
Virtio/Vhost-user status & updates
Maxime Coquelin – Victor Kaplansky, 2017-01-27
Agenda
● Overview
● Challenges
● New & upcoming features
Overview
DPDK – project overview
DPDK is a set of userspace libraries aimed at fast packet processing.
● Data Plane Development Kit
● Goals:
  ● Benefit from software flexibility
  ● Achieve performance close to dedicated HW solutions
DPDK – project overview
● License: BSD
● CPU architectures: x86, Power8, TILE-Gx & ARM
● NICs: Intel, Mellanox, Broadcom, Cisco, ...
● Other HW: Crypto, SCSI for the SPDK project
● Operating systems: Linux, BSD
DPDK – project history
Timeline (2012–2017):
● 1st release by Intel as a zip file
● 6WIND initiates the dpdk.org community
● Packaged in Fedora
● Power8 & TILE-Gx support
● ARM support / Crypto
● Moving to the Linux Foundation
Releases: v1.2, v1.3, v1.4, v1.5, v1.6, v1.7, v1.8, v2.0, v2.1, v2.2, v16.04, v16.07, v16.11, v17.02, v17.05, v17.08, v17.11
v16.11: ~750K LoC / ~6000 commits / ~350 contributors
DPDK – comparison
[Figure: kernel vs. DPDK data path]
● Traditional path: Application → Socket (user space) → Net stack → Driver (kernel) → NIC
● DPDK path: Application → DPDK (user space) → VFIO (kernel) → NIC
DPDK – performance
DPDK uses (a minimal EAL initialization sketch follows):
● CPU isolation/partitioning & polling
  → Dedicated CPU cores poll the device
● VFIO/UIO
  → Direct access to device registers from user space
● NUMA awareness
  → Resources allocated local to the Poll-Mode Driver's (PMD) CPU
● Hugepages
  → Fewer TLB misses, no swap
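To show how these knobs surface in code, here is a minimal, hedged sketch of EAL initialization that pins the PMD threads to dedicated cores and reserves hugepage memory per NUMA socket; the core IDs and memory amounts are assumptions chosen for the example, not recommendations.

#include <rte_eal.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    (void)argc;

    /* Illustrative EAL arguments (values are assumptions):
     *   -l 2,4        run lcores on isolated CPU cores 2 and 4
     *   --socket-mem  reserve hugepage memory on each NUMA socket (MB)
     */
    char *eal_args[] = {
        argv[0],
        "-l", "2,4",
        "--socket-mem", "1024,1024",
        NULL
    };
    int eal_argc = 5; /* entries before the NULL terminator */

    if (rte_eal_init(eal_argc, eal_args) < 0)
        exit(EXIT_FAILURE);

    /* ... create mempools on the local socket, start PMD polling loops ... */
    return 0;
}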
DPDK – performance
DPDK avoids:
● Interrupt handling
  → The kernel's NAPI polling mode is not enough
● Context switching
● Kernel/user data copies
● Syscall overhead
  → Alone, this exceeds the time budget for a 64B packet at 14.88 Mpps
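For reference, the per-packet time budget quoted above works out as follows for 64-byte frames at 10 GbE line rate, assuming standard Ethernet framing overhead (7B preamble, 1B start-of-frame delimiter, 12B inter-frame gap):

\begin{align*}
\text{wire size}   &= (64 + 7 + 1 + 12)\,\mathrm{B} \times 8 = 672\ \text{bits}\\
\text{packet rate} &= \frac{10\ \text{Gbit/s}}{672\ \text{bits}} \approx 14.88\ \text{Mpps}\\
\text{time budget} &= \frac{1}{14.88\ \text{Mpps}} \approx 67.2\ \text{ns per packet}
\end{align*}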
DPDK – components
[Figure: DPDK component overview – credits: Tim O'Driscoll, Intel]
Virtio/Vhost
● Device emulation, direct assignment, VirtIO
● Vhost: in-kernel virtio device emulation
  ● Allows the device emulation code to call directly into kernel subsystems
  ● Vhost worker thread in the host kernel
  ● Bypasses system calls from user to kernel space on the host
Vhost driver model
[Figure: guest notifications reach the vhost instance in the host kernel through kvm.ko via ioeventfd, and interrupts are injected back into the guest via irqfd; QEMU user space stays out of the data path]
In-kernel device emulation
● The in-kernel part is restricted to virtqueue emulation
● QEMU handles the control plane: feature negotiation, migration, etc.
● File descriptor polling is done by vhost in the kernel
● Buffers are moved between the tap device and the virtqueues by a kernel worker thread
Vhost as user space interface
● The vhost architecture is not tied to KVM
● Backend: a vhost instance in user space
● An eventfd is set up to signal the backend when new buffers are placed by the guest (kickfd)
● An irqfd is set up to signal the guest about new buffers placed by the backend (callfd)
● The beauty: the backend only knows about the guest memory mapping, the kick eventfd and the call eventfd (sketched below)
● Vhost-user implemented in DPDK in v16.07
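To make the kickfd/callfd roles concrete, here is a minimal, hypothetical sketch of a user space backend loop built around the two eventfds; the descriptor processing itself is elided and process_virtqueue() is an illustrative placeholder, not part of any real API.

#include <sys/eventfd.h>
#include <poll.h>

/* Hypothetical: drain available descriptors from the shared virtqueue
 * in guest memory and fill in used entries. */
extern void process_virtqueue(void);

/* Backend side of the vhost kick/call mechanism: wait on kickfd for
 * guest notifications, then signal callfd to interrupt the guest. */
static void backend_loop(int kickfd, int callfd)
{
    struct pollfd pfd = { .fd = kickfd, .events = POLLIN };

    for (;;) {
        if (poll(&pfd, 1, -1) <= 0)
            continue;

        eventfd_t kicks;
        eventfd_read(kickfd, &kicks);   /* consume the guest's kick */

        process_virtqueue();            /* move buffers in guest memory */

        eventfd_write(callfd, 1);       /* KVM turns this into a guest interrupt via irqfd */
    }
}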
Challenges
Performance
DPDK is about performance, which is a trade-off between:
● Bandwidth → achieving line rate even for small packets
● Latency → as low as possible (of course)
● CPU utilization → $$$
→ Prefer bandwidth & latency at the expense of CPU utilization
→ Take HW architectures into account as much as possible
Reliability
0% packet loss
• Some use cases of Virtio cannot afford packet loss, such as NFV
• Hard to achieve maximum performance without loss, as Virtio is CPU intensive
  → Scheduling "glitches" may cause packet drops
Migration
• Requires restoring internal state, including the backend's
• The interface exposed by QEMU must stay unchanged for cross-version migration
• The interface exposed to the guest depends on the capabilities of a third-party application
• Support from the management tool is required
Security
• Isolation of untrusted guests
• Direct access to the device from untrusted guests
• Current implementations require a mediator for guest-to-guest communication
• Zero-copy is problematic from a security point of view
New & upcoming features
Rx mergeable buffers
Pro:
• Allows receiving packets larger than the descriptors' buffer size
Con:
• Introduces an extra cache miss in the dequeue path
[Figure: descriptor ring — a packet fitting in a single descriptor buffer carries num_buffers = 1, while a larger packet spread across three descriptors carries num_buffers = 3]
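The num_buffers count shown in the figure lives in the virtio-net header that precedes each received packet. A simplified sketch of the layout, with plain C integer types standing in for the little-endian fields of the virtio spec:

#include <stdint.h>

/* Simplified virtio-net header prepended to received packets. */
struct virtio_net_hdr {
    uint8_t  flags;
    uint8_t  gso_type;
    uint16_t hdr_len;
    uint16_t gso_size;
    uint16_t csum_start;
    uint16_t csum_offset;
};

/* With VIRTIO_NET_F_MRG_RXBUF negotiated, the header also carries the
 * number of descriptor buffers the packet was scattered across. */
struct virtio_net_hdr_mrg_rxbuf {
    struct virtio_net_hdr hdr;
    uint16_t num_buffers;   /* 1 if the packet fits in a single buffer */
};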
Indirect descriptors (DPDK v16.11)
[Figure: direct descriptor chaining vs. an indirect descriptor table — with indirect descriptors, a single ring entry (Desc 2) points to a separate table of descriptors (iDesc 2-0 … iDesc 2-3) instead of chaining several ring entries]
Indirect descriptors (DPDK v16.11)
Pros:
• Increases ring capacity
• Improves performance for large numbers of large requests
• Improves 0% packet-loss performance even for small requests
  → If the system is not fine-tuned
  → If Virtio headers are in a dedicated descriptor
Cons:
• One more level of indirection
  → Impacts raw performance (~ -3%)
(descriptor layout sketched below)
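A rough sketch of how an indirect descriptor is expressed in the split virtqueue layout; the structure and flag values follow the virtio spec, while the set_indirect() helper is purely illustrative.

#include <stdint.h>

/* Split virtqueue descriptor (virtio spec layout). */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length */
    uint16_t flags;  /* VRING_DESC_F_* */
    uint16_t next;   /* next descriptor index when F_NEXT is set */
};

#define VRING_DESC_F_NEXT     1
#define VRING_DESC_F_WRITE    2
#define VRING_DESC_F_INDIRECT 4

/* Illustrative only: make one ring entry point at a table of `n`
 * descriptors instead of chaining `n` ring entries. */
static void set_indirect(struct vring_desc *ring_entry,
                         uint64_t table_gpa, uint32_t n)
{
    ring_entry->addr  = table_gpa;                     /* address of the indirect table */
    ring_entry->len   = n * sizeof(struct vring_desc); /* table size in bytes */
    ring_entry->flags = VRING_DESC_F_INDIRECT;         /* no F_NEXT chaining needed */
    ring_entry->next  = 0;
}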
Vhost dequeue 0-copy (DPDK v16.11)
[Figure: default dequeuing copies the descriptor's buffer into the mbuf's buffer (memcpy); zero-copy dequeuing instead makes the mbuf point directly at the descriptor's buffer]
Vhost dequeue 0-copy (DPDK v16.11)
Pros:
• Big performance improvement for standard & large packet sizes
  → More than +50% for VM-to-VM with iperf benchmarks
• Reduces the memory footprint
Cons:
• Performance degradation for small packets
  → But disabled by default
• Only for VM-to-VM using the Vhost lib API (no PMD support)
• Does not work for VM-to-NIC
  → The mbuf lacks a release notification mechanism / no headroom
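In the DPDK vhost library this is an opt-in per vhost-user socket. A hedged sketch, assuming the RTE_VHOST_USER_DEQUEUE_ZERO_COPY registration flag introduced around v16.11; the header name and flag spelling vary across DPDK releases, so check the release notes before relying on them.

#include <rte_vhost.h>   /* vhost library header (name differs in older releases) */

/* Hedged sketch: register a vhost-user socket with dequeue zero-copy
 * enabled via the assumed RTE_VHOST_USER_DEQUEUE_ZERO_COPY flag. */
static int setup_vhost_socket(const char *path)
{
    return rte_vhost_driver_register(path, RTE_VHOST_USER_DEQUEUE_ZERO_COPY);
}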
MTU feature (DPDK v17.05?)
A way for the host to share its maximum supported MTU
• Can be used to set MTU values across the infrastructure
• Can improve performance for small packets
  → If the MTU fits in the Rx buffer size, Rx mergeable buffers can be disabled
  → Saves one cache miss when parsing the virtio-net header
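On the backend side this would surface as a simple query once the feature is negotiated. A hedged sketch, assuming an API along the lines of rte_vhost_get_mtu(); the exact name and availability depend on the DPDK release that ships the feature.

#include <stdint.h>
#include <rte_vhost.h>

/* Hedged sketch: query the MTU negotiated with the guest for vhost
 * device `vid`; rte_vhost_get_mtu() is the assumed API name. */
static int configure_rx_for_guest(int vid)
{
    uint16_t mtu;

    if (rte_vhost_get_mtu(vid, &mtu) != 0)
        return -1;      /* feature not negotiated: keep mergeable Rx buffers */

    /* If the whole MTU fits in one Rx buffer, mergeable Rx buffers
     * (and their extra cache miss) can be avoided. */
    return 0;
}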
Vhost-pci (DPDK v17.05?)
[Figure: traditional VM-to-VM communication goes VM1 (Virtio) → Vhost → vSwitch → Vhost → VM2 (Virtio); with Vhost-pci, VM1 (Vhost-pci) talks directly to VM2 (Virtio) without traversing the host's vSwitch]
Vhost-pci (DPDK v17.05?)
[Figure: VM1 runs a Vhost-pci driver on a Vhost-pci device, VM2 runs a Virtio-pci driver on a Virtio-pci device; the two sides are connected as Vhost-pci client and server and speak the Vhost-pci protocol]
Vhost-pci (DPDK v17.05?)
Pros:
• Performance improvement
  → The 2 VMs share the same virtqueues
  → Packets don't go through the host's vSwitch
• No change needed in the guest's Virtio drivers
Vhost-pci (DPDK v17.05?)
Cons:
• Security
  → The Vhost-pci VM maps the entire memory space of the Virtio-pci VM
  → Could be solved with IOTLB support
• Live migration
  → Not supported in the current version
  → Hard to implement, as the VMs are connected to each other through a socket
IOTLB in kernel
[Figure: the guest's IOMMU driver programs the virtual IOMMU in QEMU; vhost in the kernel keeps a device IOTLB and, on an IOTLB miss for an iova, asks QEMU (via syscall/eventfd) for the iova→hva translation; QEMU sends IOTLB updates and invalidations back, and guest TLB invalidations are propagated to vhost]
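The kernel side exchanges these translations as IOTLB messages over the vhost file descriptor. A simplified rendering of the message layout from the Linux vhost UAPI, with field and constant names as I recall them (see linux/vhost.h for the authoritative definition):

#include <stdint.h>

/* Simplified device-IOTLB message exchanged between QEMU and vhost. */
struct vhost_iotlb_msg {
    uint64_t iova;   /* I/O virtual address used by the guest driver */
    uint64_t size;   /* length of the mapping */
    uint64_t uaddr;  /* user-space (QEMU) virtual address backing it */
#define VHOST_ACCESS_RO 0x1
#define VHOST_ACCESS_WO 0x2
#define VHOST_ACCESS_RW 0x3
    uint8_t  perm;
#define VHOST_IOTLB_MISS        1
#define VHOST_IOTLB_UPDATE      2
#define VHOST_IOTLB_INVALIDATE  3
#define VHOST_IOTLB_ACCESS_FAIL 4
    uint8_t  type;
};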
IOTLB for vhost-user
[Figure: same flow as the in-kernel case, but the user space backend keeps an IOTLB cache, and the miss/update/invalidate messages travel over the vhost-user Unix socket between QEMU's virtual IOMMU and the backend]
Conclusions
• DPDK support for VMs is in active development
• 10 Mpps in a VM is real with DPDK
• New features keep boosting the performance of VM networking
• Accelerating the transition to NFV / SDN
Q / A
THANK YOU