
KVM as The NFV Hypervisor · 2015-08-27 · KVM as The NFV Hypervisor Jun Nakajima Mesut Ergin, Yunhong Jiang, ... • KVM Enhancements for NFV at OPNFV • Deterministic Execution


KVM as The NFV Hypervisor

Jun Nakajima

Contributors: Mesut Ergin, Yunhong Jiang, Krishna Murthy, James Tsai, Wei Wang, Huawei Xie, Yang Zhang


Legal Disclaimer

•  INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL® PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. INTEL PRODUCTS ARE NOT INTENDED FOR USE IN MEDICAL, LIFE SAVING, OR LIFE SUSTAINING APPLICATIONS.

•  Intel may make changes to specifications and product descriptions at any time, without notice.

•  All products, dates, and figures specified are preliminary based on current expectations, and are subject to change without notice.

•  Intel processors, chipsets, and desktop boards may contain design defects or errors known as errata, which may cause the product to deviate from published specifications. Current characterized errata are available on request.

•  Intel and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

•  *Other names and brands may be claimed as the property of others.

•  Copyright © 2015 Intel Corporation.


Agenda

•  KVM Enhancements for NFV at OPNFV
•  Deterministic Execution and Minimal Latency
•  Inter-VM communication: vhost-user-shmem


ETSI’s Vision (European Telecommunications Standards Institute)

[Figure labels: Software, IT, Virtualization, Standard High Volume Servers]


Architecture Framework

KVM, Containers, …

NFV Infrastructure

Virtual Network Function


KVM is Crucial to OPNFV

Upstream Projects: …


Project: NFV Hypervisors-KVM

1.  Minimal interrupt latency variation for data plane VNFs (Virtual Network Functions)
2.  Inter-VM Communication
3.  Fast Live Migration

Developers from:

https://wiki.opnfv.org/nfv-kvm


Deterministic Execution and Minimal Latency


Causes of Latency Variation

[Figure: a guest (VNF) with its Intr_Handler at user level, above the Linux kernel with KVM modules, on hardware cores and NICs]

•  Asynchronous Events: Interrupts, VM Exits, Cache/TLB Misses
•  Software: Spin Locks, Loops, Scheduling, Exit to user-level
•  Hardware/Firmware: SMI, Power Management, NIC


Solutions

PREEMPT-RT Configuration

[Figure: a guest with its Intr_Handler at user level, above the Linux kernel with KVM modules, on hardware cores and NICs]

•  Exclusive/Static Allocation: Soft “Partitioning”, CPU Binding, Huge Pages
•  Software: PREEMPT-RT Linux, Code inspection, testing/measurements
•  Hardware Technologies: Cache Allocation Technology, Advanced VT features
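CPU binding, listed among the solutions on this slide, can be illustrated with a minimal Python sketch. It assumes a Linux host; the helper name `pin_to_cpu` is hypothetical, and in a real deployment the binding is normally applied to vCPU threads by the management stack rather than by a script like this.

```python
import os


def pin_to_cpu(cpu, pid=0):
    """Bind process `pid` (0 means the caller) to one CPU and return the new mask.

    Returns None on platforms without the Linux-only affinity calls.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(pid)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    # After this call the scheduler will only run the process on `cpu`.
    os.sched_setaffinity(pid, {cpu})
    return os.sched_getaffinity(pid)
```

Binding alone does not keep other tasks off the chosen core; that part of the "soft partitioning" has to be arranged separately on the host.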


Cache Allocation Technology

[Figure: four cores, each running an app, share the Last Level Cache in front of DRAM]

CAT is supported on the following six SKUs of the Intel Xeon processor E5 v3 family: E5-2658 v3, E5-2658A v3, E5-2648L v3, E5-2628L v3, E5-2618L v3, and E5-2608L v3, and on the Intel(R) Xeon(R) processor D family.

•  Last Level Cache partitioning mechanism enabling the separation of an application

•  VMs can be isolated to increase determinism

•  Having limited cache is still better than “unlimited cache and noisy neighbors”
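A small sketch of how last-level-cache partitions might be described. CAT expresses each partition as a capacity bitmask whose set bits must be contiguous; the helper names below are hypothetical, and `total_ways=20` is only an assumed example value, since the real mask length is reported by the processor.

```python
def way_mask(first_way, num_ways, total_ways=20):
    """Return a contiguous capacity bitmask covering `num_ways` cache ways."""
    if first_way < 0 or num_ways < 1 or first_way + num_ways > total_ways:
        raise ValueError("requested ways do not fit in the cache")
    return ((1 << num_ways) - 1) << first_way


def is_contiguous(mask):
    """True if the set bits of `mask` form a single unbroken run."""
    if mask <= 0:
        return False
    run = mask // (mask & -mask)      # strip trailing zero bits
    return (run & (run + 1)) == 0     # all ones means one contiguous run


def overlaps(mask_a, mask_b):
    """True if two partitions share at least one cache way."""
    return (mask_a & mask_b) != 0
```

For example, a latency-sensitive VM could be given `way_mask(0, 4)` and everything else `way_mask(4, 16)`; the two masks do not overlap, which is the isolation the bullets above describe.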


Latency Data 1: Cyclictest

Cyclictest in guest: latency (in µs) • Min: 7 • Avg: 9 • Max: 16

RT Linux Guest on KVM
Host: Linux with RT patches

Histogram: 99.69% (Total #: 79,810,183)

Latency (µs)    Occurrences
 7                        3
 8               11,757,562
 9               67,812,652
10                  159,222
11                   69,100
12                   11,004
13                      379
14                      207
15                       49
16                        5
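The summary figures on this slide can be cross-checked against the histogram counts. The sketch below re-derives the total, the mean, and the share of samples at or below 9 µs. Reading the 99.69% annotation as that share is an interpretation, not something the slide states, and the function names are illustrative.

```python
# Histogram from the slide: latency in microseconds -> number of samples.
HISTOGRAM = {
    7: 3, 8: 11_757_562, 9: 67_812_652, 10: 159_222, 11: 69_100,
    12: 11_004, 13: 379, 14: 207, 15: 49, 16: 5,
}


def total_samples(hist):
    """Total number of cyclictest samples."""
    return sum(hist.values())


def mean_latency(hist):
    """Sample-weighted mean latency in microseconds."""
    return sum(lat * count for lat, count in hist.items()) / total_samples(hist)


def share_at_or_below(hist, limit_us):
    """Fraction of samples with latency <= limit_us."""
    hits = sum(count for lat, count in hist.items() if lat <= limit_us)
    return hits / total_samples(hist)
```

With these counts the total matches the 79,810,183 on the slide, the mean rounds to the reported average of 9, and just under 99.7% of samples fall at or below 9 µs.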


Latency Data 2: Latency from Periodic External Interrupts

[Figure: a PCIe device raises a periodic (1 ms) interrupt (MSI); labels: Hardware, Linux Kernel, KVM Modules, Device Driver, Intr_Handler]

Latency from periodic external interrupt:
•  Time delta from interrupt occurrence to invocation of the interrupt handler in the guest (in µs): Min 3.98, Avg 4.42, Max 9.10

Expecting even better results with:
•  Posted Interrupts
•  CAT (Cache Allocation Technology)
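The metric on this slide is the delay between an interrupt being raised and the guest handler starting. A minimal sketch of reducing paired timestamps to min/avg/max follows; the function name and the way timestamps are captured are assumptions, since the slide does not describe the measurement tooling.

```python
from statistics import mean


def latency_stats(raised_us, handled_us):
    """Return (min, avg, max) handler-entry delay in microseconds.

    raised_us[i]  -- time the i-th interrupt was raised by the device
    handled_us[i] -- time the guest interrupt handler was invoked for it
    """
    if not raised_us or len(raised_us) != len(handled_us):
        raise ValueError("need two non-empty timestamp lists of equal length")
    deltas = [h - r for r, h in zip(raised_us, handled_us)]
    if min(deltas) < 0:
        raise ValueError("handler timestamp precedes its interrupt")
    return min(deltas), mean(deltas), max(deltas)
```

Calling it with made-up timestamps such as `latency_stats([0.0, 1000.0], [4.0, 1005.0])` shows the shape of the result; those numbers are illustrative and are not measurements from the talk.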


Inter-VM Communication


Communication Models

[Figure: VM1, VM2, and VM3, each with a process, network stack, and kernel, above the Linux kernel with KVM modules and hardware cores and NICs. Paths shown: shared memory (1GB pages), vhost-user to a vSwitch with TX/RX queues, and TAP to an in-kernel vSwitch.]

Specific API
•  Shared memory, etc.
•  Directly used by particular processes

Networking API
•  Use vSwitch in hypervisor
•  Generic


Fast Paths: Inter-VM Communication

[Figure: the same VM1/VM2/VM3 layout as the Communication Models slide, with Specific API and Networking API paths; additional labels: In-VM Switch, Shared Memory, sync]

•  Access Destination VM memory
•  Use In-VM Switch
•  Need to improve security when using shared memory


Implementing Inter-VM Communication: vhost-user-shmem


Goals

•  Add fast-paths in VMs as optimized inter-VM communication
•  Maintain consistent flow table entries in VMs
•  Enable protected access to the destination VM or shared memory
    •  Open the window when needed
    •  Close it immediately when done

[Figure: VM1, VM2, and VM3 each have virtio-net (TX/RX) and an In-VM Switch with a simple flow table (inc. MPLS). If an In-VM Switch is present, traffic goes to the fast path (inter-VM data transfer through API libraries used by processes). The VMs connect over vhost-user to a vSwitch (e.g. OVS with DPDK-netdev, Snabb Switch) and the NIC. Other labels: Flow Table Entries, Shared Memory.]
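The rule in the figure ("if the in-VM switch is present, go to the fast path") can be sketched as a lookup with a fallback. The function name, the dictionary-based flow table, and the string keys are hypothetical; the slide does not specify the flow-table format.

```python
def choose_path(dst, flow_table):
    """Pick the path for a packet addressed to `dst`.

    flow_table maps destination -> peer VM, or is None when the guest has no
    in-VM switch. Anything without a cached entry falls back to the vSwitch.
    """
    if flow_table is not None and dst in flow_table:
        return ("fast-path", flow_table[dst])
    return ("vswitch", None)
```

Treating the in-VM table as a cache of the full vSwitch's decisions is why the second goal, keeping flow table entries consistent across VMs, matters: a stale entry would send traffic to the wrong peer.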


Clean Design Objectives

Extend vhost-user as transport mechanism over shared memory/virtqueues:
•  Deliver packets to another guest’s virtio device/virtqueue directly
•  Provide memory mapping (GPAs), protected access, destination addressing

Build innovative high-performance networking applications, e.g.:
1.  In-VM switch as a fast cached-datapath for the full-blown virtual switch
2.  Lightweight and fast Service Function Chaining
3.  Next big NFV app you are developing

[Figure: APPs (In-VM switch, SFC or …) layered on TRANSPORT (vhost-user-shmem)]


Shared Memory Using vhost-user Server

[Figure: QEMU 1 ... QEMU n (clients, for VM1 ... VM n) send mem info over the vhost-user protocol to the vhost-user server (backend) in the vSwitch. For each of VM 1 Memory ... VM n Memory the server holds a list of (fd0, gpa0, len0), (fd1, gpa1, len1), ... plus vring address info.]

vhost-user server (backend) has sufficient info and capability to host shared memory:
•  Gather mem info to access virtqueues from vhost-user clients (QEMUs)
•  It can allocate its own memory for sharing purposes
    •  E.g. large pages shared by guests (like ivshmem)
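With the per-VM (fd, gpa, len) lists from the figure, the backend can resolve a guest-physical address to a location in one of the shared files. The sketch below shows only that lookup; `MemRegion` and `translate` are hypothetical names, and the fields are limited to the three the slide shows, whereas the actual vhost-user messages carry more per-region information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemRegion:
    fd: int       # file descriptor backing the region, received over the socket
    gpa: int      # guest-physical start address of the region
    length: int   # size of the region in bytes


def translate(regions, gpa, size=1):
    """Map the guest range [gpa, gpa + size) to (fd, offset into that region)."""
    for region in regions:
        if region.gpa <= gpa and gpa + size <= region.gpa + region.length:
            return region.fd, gpa - region.gpa
    raise LookupError(f"GPA range {gpa:#x}+{size} is not covered by any region")
```

A range that straddles two regions is rejected here on purpose; a backend would typically mmap each fd once and add the returned offset to that mapping's base address.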


Extending it for Inter-VM Communication

[Figure: the same layout as the previous slide, with per-VM lists of (fd, gpa, len) and vring address info held by the vhost-user server. In addition, the server (backend) sends the mem info of other VMs (VM 2, ...) back to the QEMUs.]

•  vhost-user server (backend) becomes a client
    •  Send mem info to QEMUs
    •  QEMU extends memory regions
•  Allows vhost-user clients to access each other's virtqueues
•  Provides vhost-user clients with shared memory


Simple Example: VM1 and VM2

[Figure: two QEMU instances (VM1 and VM2), each with vhost-user-shmem support and a vushmem PCI device model exposing a PCI BAR and an MSI interrupt. Each guest contains vushmem PCI detection, fastpath code, and a shared vushmem control structure. Both connect as vushmem clients over vhost-user sockets (vhost-user-shmem protocol) to the vhost-user-shmem (vushmem) server in the vSwitch. Steps are numbered 1 to 3 in the figure.]

Labeled steps:
1.  vhost-user from VM2
2.  vhost-user for VM2 (multicast)


Adding Protected Access

[Figure: VM1, VM2, and VM3 on separate cores above the Linux kernel with KVM modules and the NICs. TX() and RX() calls in the VMs are paired with VMFUNC. Guest memory is mapped by the default EPT (Extended Page Table).]

•  Extends memory to access fast-path channel or destination VM
•  VMFUNC instruction in VM w/o VM exit
    •  #0 (EAX): Switches EPT (Extended Page Table) Pointers
    •  Alternate EPT has additional translation
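The effect of switching EPT pointers can be modeled in a few lines: each view is a separate mapping from pages to permissions, and VMFUNC leaf 0 only changes which view is active. This is a toy model for reasoning about visibility, not hypervisor code; the class, page names, and permission constants are all hypothetical.

```python
R, W, X = 1, 2, 4   # permission bits used by this model


class Vcpu:
    """Toy vCPU with a list of EPT views; view 0 is the default EPT."""

    def __init__(self, views):
        self.views = views      # list of {page name: permission bits}
        self.active = 0

    def vmfunc0(self, index):
        """Model of VMFUNC leaf 0: select another EPT view without a VM exit."""
        if not 0 <= index < len(self.views):
            raise IndexError("no such EPT view")   # modeling choice for a bad index
        self.active = index

    def can_access(self, page, need):
        """True if the active view grants every permission bit in `need`."""
        perms = self.views[self.active].get(page, 0)
        return (perms & need) == need
```

With a default view that maps only the guest's own RAM and an alternate view that additionally maps a peer's virtqueue, the peer memory is reachable only between a `vmfunc0(1)` and the matching `vmfunc0(0)`.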


Adding EPT Alternate View

[Figure: QEMU instances with vhost-user-shmem support and a vushmem PCI device model (PCI BAR, MSI interrupt). The guest (VM1) holds vushmem PCI detection, fastpath code, and the vushmem control structure. Memory of VM2, VM3, ... is marked No Access under the default EPT and OK under the alternate EPT, which is entered with VMFUNC #0.]

Fast Path Code (Protected Code):
•  Upon VMFUNC #0, EPT view is changed
•  Access other shared memory and virtqueues of other VMs in protected fashion

KVM ioctl options for QEMU to extend Guest Memory:
1.  W/O protection, or
2.  W/ protection
    •  Extend only in alternate EPT view


Implementing Protected Access

    start_xmit(*skb, *dev) {
        ...
        send(packets);
    }

    send(*packet) {
        ...
        VMFUNC #0, EPTP;    /* switch to the alternate EPT view */
        Tx(packets);
        VMFUNC #0, 0;       /* switch back to the default EPT view */
    }

    /* ---- Page Boundary ---- */

    Tx(*packet) {
        move_data();
        notify();
    }

[Figure: the code above is drawn once under the Default EPT and once under the Alternate EPT, with a Trampoline Page and an EPT Permission marked on each page. Permission values, in the order shown: Full (X, W, R); Write-Protected (X, -, R); No Access (-, -, -); Full (X, W, R); Write-Protected (X, -, R); No Execute (-, W, R).]

Protected Code
•  Registered by a trusted entity
•  Entry/Exit to/from Trusted Code

Additional pages
•  Specified at registration time
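The send() routine on this slide opens the alternate view, transmits, and switches back. A sketch of that open/close discipline follows, written so the view is restored even if the transmit step fails. It is a Python model of the control flow only; `ViewState`, `window`, and the `tx` callback are hypothetical names.

```python
from contextlib import contextmanager


class ViewState:
    """Tracks which EPT view is active (0 = default) and records each switch."""

    def __init__(self):
        self.active = 0
        self.trace = []

    def switch(self, index):
        self.active = index
        self.trace.append(index)


@contextmanager
def window(state, alt_index):
    """Open the alternate view for the duration of the block, then close it."""
    state.switch(alt_index)      # corresponds to "VMFUNC #0, EPTP"
    try:
        yield
    finally:
        state.switch(0)          # corresponds to "VMFUNC #0, 0"


def send(state, packets, alt_index, tx):
    """Run the protected transmit step `tx` inside the window."""
    with window(state, alt_index):
        return tx(packets)
```

Keeping the window as narrow as the transmit call mirrors the stated goal of opening the window when needed and closing it immediately when done.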


Performance Estimate from PoC

Measure cost of VMFUNC and Trampoline Code:
•  Transfer 64B packets from virtio-net to another VM (fast path)

65Mpps with 32-packet batching*:
•  Same batching size as DPDK

*Intel internal estimation
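To put the 65 Mpps estimate in perspective, the per-packet and per-batch time budget can be worked out directly. The sketch counts only the 64 bytes of each packet, with no framing overhead, and the function name is illustrative.

```python
def packet_budget(pps, packet_bytes, batch):
    """Derive time budgets and data rate from a packets-per-second figure."""
    ns_per_packet = 1e9 / pps
    return {
        "ns_per_packet": ns_per_packet,
        "ns_per_batch": ns_per_packet * batch,
        "gbit_per_s": pps * packet_bytes * 8 / 1e9,
    }


# Figures from the slide: 65 Mpps, 64-byte packets, batches of 32.
budget = packet_budget(65e6, 64, 32)
```

Under these assumptions the budget is roughly 15.4 ns per packet, about 492 ns per 32-packet batch, and about 33.3 Gbit/s of packet data, which shows how batching spreads the cost of each pair of VMFUNC switches over many packets.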


Summary

1.  Minimal interrupt latency variation for data plane VNFs (Virtual Network Functions)
    •  On Track
2.  Inter-VM Communication
    •  Preliminary performance data from PoC with trampoline code
    •  Implementation proposal (vhost-user-shmem) based on vhost-user
3.  Fast Live Migration
    •  Next presentation

Join OPNFV Projects!


Q & A