
Page 1:

Robert Bradford, Evangelos Kotsovinos, Anja Feldmann, Harald Schiöberg

Presented by Kit Cischke

Page 2:

Differences in Contributions

The first paper discussed transferring the run-time memory of a VM in a LAN. Cool.

This paper expands on that to transfer the VM's image, its persistent state and ongoing network connections over a WAN as well.

By combining pre-copying, write throttling and a block driver, we can achieve this.

Page 3:

Introduction

In this project, the authors want to extend live VM migration to include:

Persistent state (the file systems used by the VM)

Open network connections

Why? Many apps running on a VM need that storage, and NAS systems may not be available in the new location.

Moving across a WAN will almost certainly involve an IP change, and we don't want to (overly) disrupt TCP connections.

Contribution: a system that enables live migration of VMs that use local storage and open network connections, without severely disrupting their live services.

Page 4:

Highlights

Some highlights of this work:

Built upon the Xen live migration facility as part of XenoServer.

Enables: live migration, consistency, minimal service disruption, transparency.

Utilizes: pre-copying, write throttling and IP tunneling.

Page 5:

System Design - Environment

Both the source and destination run Xen, with the VM running XenLinux.

Uses blktap to export block devices into the migrated VM.

Block devices are file-backed, meaning the contents of the block device are stored in an ordinary file on the host's file system.
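
As a concrete illustration, a file-backed blktap disk might appear in a Xen (xm) guest configuration like this; xm config files are Python, and the VM name and image path here are hypothetical:

```python
# Hypothetical Xen (xm) guest configuration fragment.
# 'tap:aio' routes the guest's block I/O through the user-space blktap
# driver, backed by an ordinary image file on the host's file system.
name = "migrated-vm"
memory = 512
disk = ['tap:aio:/var/xen/images/vm-root.img,xvda1,w']
```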

Page 6:

System Design - Architecture

The initialization stage starts things off by prepping the migration.

The bulk transfer stage pre-copies the disk image of the VM to the destination while the VM continues to run.

Xen transfer is then initiated, which performs incremental memory migration, again without stopping the VM.

While the transfers are occurring, all disk writes are intercepted as deltas that will be forwarded to the destination. Each delta includes the data written, the location written to and the size of the data. The deltas are recorded into a queue to be transferred later (see the sketch below).

If write activity is too high and too many deltas are being generated, write throttling is engaged to slow down the VM.

In parallel with the Xen transfer, the deltas are applied to the destination VM.

At some point, the source VM is paused, the destination VM is started and a temporary network redirect is created to handle the potential IP change.
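
A minimal sketch of the delta record and queue described above (illustrative only, not the authors' code):

```python
from collections import namedtuple
from queue import Queue

# Each intercepted write is captured as (offset, length, data) and queued
# for later transfer to the destination.
Delta = namedtuple("Delta", ["offset", "length", "data"])
delta_queue = Queue()

def record_write(offset: int, data: bytes) -> None:
    """Record a write intercepted while the VM's disk is being migrated."""
    delta_queue.put(Delta(offset, len(data), data))
```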

Page 7:

Implementation - Initialization

Authentication, authorization and access control are handled by XenoServer.

The migration client forks, creating a listener process that signals the block driver to enter record mode. In record mode, the driver copies writes to the listener process, which transfers them to the destination.

The other half of the migration client begins the bulk transfer.

At the destination, the daemon also forks: one process receives the bulk transfer, the other receives the deltas.
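
The fork described above might be sketched like this (the two helpers are hypothetical stand-ins for the roles just described):

```python
import os

def run_delta_listener():
    """Hypothetical: put the block driver in record mode, forward deltas."""
    ...

def run_bulk_transfer():
    """Hypothetical: stream the VM's disk image to the destination."""
    ...

pid = os.fork()
if pid == 0:
    run_delta_listener()   # child: receives and forwards intercepted writes
    os._exit(0)
else:
    run_bulk_transfer()    # parent: pre-copies the disk image
```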

Page 8:

Implementation – Bulk Xfer

The VM's disk image is transferred from the source to the destination.

The XenoServer platform supports copy-on-write along with immutable template disk images, so we can just transfer the changes, rather than the whole image.
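
To make that concrete, a hedged illustration (paths and host name hypothetical, and the real system uses its own transfer mechanism rather than scp): only the per-VM copy-on-write overlay needs to travel.

```python
import subprocess

# The immutable template image is assumed to already exist at the
# destination; only this VM's copy-on-write overlay has to be sent.
TEMPLATE = "/srv/xen/templates/debian.img"   # present on both hosts
OVERLAY = "/srv/xen/vms/vm1-cow.img"         # this VM's changes only

subprocess.run(["scp", OVERLAY, "dest-host:/srv/xen/vms/vm1-cow.img"],
               check=True)
```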

Page 9:

Implementation – Xen Migration

The system relies on the built-in migration mechanism of Xen.

Xen logs dirty memory pages and copies them to the destination without stopping the source VM.

Eventually, the source is paused and the remaining memory pages are copied.

Then the migrated VM is started at the destination.
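
In outline, the pre-copy approach the slide describes behaves roughly like this sketch (simplified; the helpers and threshold are hypothetical, not Xen's actual API):

```python
STOP_COPY_THRESHOLD = 50  # pages; hypothetical cut-off

def send_pages(pages, dest):
    """Hypothetical: transfer the given memory pages to the destination."""
    ...

def precopy_migrate(vm, dest):
    send_pages(vm.all_pages(), dest)          # full pass, VM still running
    while len(vm.dirty_pages()) > STOP_COPY_THRESHOLD:
        send_pages(vm.dirty_pages(), dest)    # converge on the dirty set
    vm.pause()                                # brief stop-and-copy phase
    send_pages(vm.dirty_pages(), dest)        # last remaining dirty pages
    dest.resume()
```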

Page 10:

Implementation - Intercepting

The blkfront device driver communicates with the dedicated storage VM via a ring buffer.

The blktap framework intercepts requests, but does so in user space.

Once a disk request makes it to the backend, it is both committed to the disk and sent to the migration client.

The client then packages the write up as a delta.
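
The commit-and-forward step could be sketched as follows (hypothetical names; the real path is Xen's blktap backend, written in C):

```python
import os

def handle_write_request(image_fd: int, offset: int, data: bytes,
                         send_delta) -> None:
    """Commit an intercepted write locally and forward a copy as a delta."""
    os.pwrite(image_fd, data, offset)       # commit to the local disk image
    send_delta(offset, len(data), data)     # copy goes to the migration client
```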

Page 11:

Implementation - Application

After the bulk transfer, and in parallel with the Xen transfer, the deltas are transferred and applied to the migrated VM's image by the migration daemon in the storage VM at the destination.

If the Xen migration finishes while deltas are still queued, I/O requests are put on hold until the application of the current crop of deltas is finished.

The authors found that delta application was normally finished before the Xen migration, adding zero time to the overall migration.
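
Applying a queued delta is essentially a positioned write into the destination image. A minimal sketch, reusing the Delta record from the earlier sketch:

```python
import os

def apply_deltas(image_path: str, deltas) -> None:
    """Replay recorded write deltas against the destination disk image."""
    fd = os.open(image_path, os.O_WRONLY)
    try:
        for d in deltas:
            os.pwrite(fd, d.data, d.offset)  # d.length bytes land at d.offset
    finally:
        os.close(fd)
```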

Page 12:

Implementation – Write Throttling

If the VM attempts to complete more writes than a given threshold value, future write attempts are delayed by the block driver.

This process repeats, with the delay and threshold doubling each time.

Experimentally, a threshold of 16384 writes with a delay of 2048 μs proved suitable.

Enforcement is separated from policy for extensibility.
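
One plausible reading of that policy as code (a sketch, not the authors' implementation):

```python
import time

class WriteThrottle:
    """Delay writes beyond the threshold; double both the threshold and
    the delay each time the (doubled) threshold is exceeded again."""

    def __init__(self, threshold: int = 16384, delay_us: int = 2048):
        self.threshold = threshold
        self.delay_us = delay_us
        self.writes = 0

    def on_write(self) -> None:
        self.writes += 1
        if self.writes > self.threshold:
            time.sleep(self.delay_us / 1_000_000)  # slow the writer down
            if self.writes >= 2 * self.threshold:  # exceeded again: escalate
                self.threshold *= 2
                self.delay_us *= 2
```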

Page 13:

Implementation – WAN Redirection

If the IP of the VM changes, IP tunneling and Dynamic DNS are used to prevent dropped network connections.

Just before the source VM is paused, an IP tunnel is created from the source to the destination using iproute2.

Once the destination VM is capable of responding to requests at its new IP, Dynamic DNS directs new requests to the new IP.

Packets that arrive during the final stage of migration are simply dropped.

Once no connections exist that use the old IP, the tunnel is torn down.

Practically, this works because: the source server only needs to cooperate for a short time; most network connections are short-lived; and if nothing else, it's no worse than not redirecting at all.
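
The tunnel setup on the source might look like the following (addresses are hypothetical, and the exact iproute2 invocation is an assumption; the slides only say iproute2 is used):

```python
import subprocess

OLD_VM_IP = "10.0.0.5"      # hypothetical: IP existing connections still use
SRC_HOST = "198.51.100.7"   # hypothetical: source server
DEST_HOST = "192.0.2.10"    # hypothetical: destination server

# Build an IPIP tunnel and route the VM's old address through it, so
# packets still arriving at the source are forwarded to the destination.
for cmd in (
    ["ip", "tunnel", "add", "mig0", "mode", "ipip",
     "local", SRC_HOST, "remote", DEST_HOST],
    ["ip", "link", "set", "mig0", "up"],
    ["ip", "route", "add", f"{OLD_VM_IP}/32", "dev", "mig0"],
):
    subprocess.run(cmd, check=True)
```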

Page 14:

Evaluation - Metrics

Want to evaluate the disruption of the system as perceived by users. Spoiler: the results look good.

Also want to show the system handles diabolical workloads, defined in this paper as heavy disk accessors rather than heavy memory accessors.

Downtime: time between pausing the VM on the source and resuming it on the destination.

Disruption time: time during which clients observe a reduction in service responsiveness.

Additional disruption: the difference between disruption time and downtime.

Migration time: time from the migration request to the VM running at the destination.

Number of deltas and delta rate: how many file system changes occur and how often.

Page 15:

Eval – Workload Overview

A web server serving static content, a dynamic web application and video streaming.

Chosen as realistic usage scenarios and because they neatly trifurcate the spectrum of disk I/O patterns:

The dynamic app generates lots of bursty writes.

The static workload generates a medium amount of constant writes.

Streaming video causes few writes, but is very sensitive to disruption.

Page 16:

Eval – Experimental Setup

Three hosts: dual Xeon 3.2 GHz, 4 GB DDR RAM, mirrored RAID array of U320 SCSI disks.

The migrated VM was provided with 512 MB RAM and a single CPU.

All hosts were connected by a 100 Mbps switched Ethernet network.

The migrated VM was running Debian on a 1 GB ext3 disk.

Host C is the client.

To emulate WAN transfers, traffic shaping was used to limit the bandwidth to 5 Mbps with 100 ms of latency, representative of hosts in London and on the U.S. east coast.
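
The WAN emulation could be reproduced with Linux traffic control along these lines (an assumption; the slides say only that traffic shaping was used, with 5 Mbps and 100 ms as the targets):

```python
import subprocess

# Add 100 ms of delay with netem, then cap the rate at 5 Mbit/s with a
# token bucket filter chained under it.
subprocess.run(["tc", "qdisc", "add", "dev", "eth0", "root",
                "handle", "1:", "netem", "delay", "100ms"], check=True)
subprocess.run(["tc", "qdisc", "add", "dev", "eth0", "parent", "1:",
                "tbf", "rate", "5mbit", "burst", "32kbit",
                "latency", "400ms"], check=True)
```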

Page 17:

Results – LAN Migration

Measured disruption is 1.04 seconds and "practically unnoticeable by a human user."

Few deltas, mostly caused by the web server's log files.

Even shooting for "five nines" of uptime (which allows about 315 seconds of downtime per year), you still get 289 migrations a year.

Page 18:

Results – LAN Migration

phpBB with a script posting randomly.

Disruption is 3.09 seconds, due to more deltas.

HTTP throughput is almost unaffected, and total migration time is shorter.

A five-nines downtime budget still allows 98 migrations a year.

Page 19:

Results – LAN Migration

Streamed a large video file, viewed by a human on host C.

Disruption is 3.04 seconds, masked by the video player's buffer.

No packets are lost, but there is a lot of retransmission.

Page 20:

Comparison to Freeze-and-Copy

Clearly, freeze-and-copy causes much worse disruption than live migration.

Page 21:

Results – WAN Migration

Longer migration time leads to more deltas.

The tunneling let the connections persist.

68 total seconds of disruption, which is a lot, but much less than freeze-and-copy.

Page 22:

Results – Diabolical Workload

Ran the Bonnie benchmark as a diabolical process, generating lots of disk writes.

Throttling was needed twice. Initially, the bulk transfer was severely impeded; throttling fixed that.

The overall migration takes 811 s; without throttling, the transfer would have taken 3 days.

Page 23:

Results – I/O Overhead

The overhead of intercepting deltas is pretty low and only noticeable during the migration.

Page 24:

Conclusions

Demonstrated a VM migration scheme that includes persistent state, maintains open network connections and therefore works over a WAN without major disruption.

It can handle high I/O workloads too.

Works much better than freeze-and-copy.

Future work includes batching of deltas, data compression and "better support for sparse files."