VMware vSphere Replication: Technical Walk-Through with Engineering

Aleksey Pershin, VMware; Ken Werneburg, VMware

BCO4977 #BCO4977

VMworld 2013: VMware vSphere Replication: Technical Walk-Through with Engineering


DESCRIPTION

VMworld 2013. Jeff Hunter, VMware; Vahid Fereydouny, VMware. Learn more about VMworld and register at http://www.vmworld.com/index.jspa?src=socmed-vmworld-slideshare


Page 1: VMworld 2013: VMware vSphere Replication: Technical Walk-Through with Engineering

VMware vSphere Replication:

Technical Walk-Through with Engineering

Aleksey Pershin, VMware

Ken Werneburg, VMware

BCO4977

#BCO4977

Page 2

Agenda

Introduction to vSphere Replication

What’s New in 2013

vSphere Replication and SRM

Configuring VR replication

VR internals

Failover and test

Automated reprotect and failback

Summary

Page 3

Agenda: Introduction to vSphere Replication

Page 4

vSphere Replication: Protection Built into the Platform!

Standalone Protection

VM-by-VM Protection and Recovery

Replication Engine

Integrated with the vSphere Platform

Bundled with most vSphere Editions

vSphere Replication enables simple and reliable protection for all virtual machines

Page 5

Introduction to vSphere Replication: Protection for SRM

Replication built into vSphere

Replicates individual VMs

Replicates between heterogeneous datastores

Asynchronous replication with RPO >= 15 min

Alternative to, or augmentation of, array-based replication (ABR)

Recovery and test are done through SRM recovery plans

vSphere Replication can be used by SRM as the replication engine

Page 6

vSphere Replication Architecture

[Diagram: a protected site and a recovery site, each with vCenter and a VR Appliance. Components shown include ESX hosts running VMs, the VR Agent, the VR vSCSI Filter, the NFC Service, the vSphere Web UI, and further VR Servers.]

Page 7

Agenda: What’s New in 2013

Page 8

Top New Features in vSphere Replication

Multiple Points in Time

Multiple vSphere Replication Appliances per vCenter

Support for Storage vMotion

New User Interface Location

Support for vSAN and other VM Storage Policies

Dramatic Speed Improvement

Page 9

Open Topologies with up to 10 vSphere Replication Appliances

Replicate to or between remote sites with or without a vCenter Server present!

[Diagram: a Main Office Datacenter, a Secondary DC, and a Remote Office. Components shown include vSphere hosts running the VR Agent, VR Appliances, a VR Server, vCenter Server instances, storage, and replicated disks VMDK1, VMDK2, and VMDK3.]

Page 10

Up to 24 Points in Time Retained to Allow Reversion of VM State

Retention policy is specified during configuration of replication

Page 11

Protected Site Storage vMotion Now Supported

Storage vMotion can now be used for protected virtual machines. Only protected-site VMDKs can be migrated: the recovery-site ‘shadow’ objects are fixed.

Manually migrate VMs or even use Storage DRS to ease management

[Diagram: replication from the protected site to the recovery site.]

Page 12

VM Storage Policy and vSAN Interoperability

Administrator chooses a VM Storage Policy: only valid datastores are selectable

Page 13

VR Now Found Under the Corresponding vCenter

vSphere Replication now easier to find and more intuitive to manage

Page 14

Each vCenter Now Has “Monitor” and “Manage” for VR

vSphere Replication now easier to find and more intuitive to manage

Page 15

Dramatic Performance Improvement

[Diagram: 5.1 behaviour vs. 5.5 behaviour for replication from the VR Agent on a vSphere host to the VR Server.]

Increased parallelism and more efficient throughput mean faster replication, pushing more data. Replicate more, with no performance cost!

New TCP Stack Optimized for Latency

Buffered IO for NFC Writes

Coalesced Contiguous Writes
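The slide lists coalesced contiguous writes as one source of the speed-up but does not say where in the pipeline the coalescing happens. The Python sketch below only illustrates the general technique: block writes with adjacent indices are merged into one larger extent, so fewer, larger I/Os are issued. The function name and the (block index, data) input format are hypothetical.

```python
def coalesce_contiguous_writes(
    writes: list[tuple[int, bytes]], block_size: int
) -> list[tuple[int, bytes]]:
    """Merge writes to adjacent blocks into larger extents.

    `writes` holds (block_index, data) pairs, each `data` exactly one block
    long, with at most one write per block (as in a changed-block delta).
    Returns (first_block_index, data) extents in block order.
    """
    extents: list[tuple[int, bytes]] = []
    for index, data in sorted(writes, key=lambda w: w[0]):
        if extents:
            start, buf = extents[-1]
            # The current extent covers blocks start .. start + len(buf) // block_size - 1.
            if start + len(buf) // block_size == index:
                extents[-1] = (start, buf + data)
                continue
        extents.append((index, data))
    return extents


# Blocks 0-1 and 3-5 are contiguous, so five writes become two extents.
print(coalesce_contiguous_writes([(3, b"DD"), (0, b"AA"), (1, b"BB"), (5, b"FF"), (4, b"EE")], 2))
# [(0, b'AABB'), (3, b'DDEEFF')]
```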

Page 16

Agenda: vSphere Replication and SRM

Page 17

vSphere Replication and SRM

SVR is now independent of SRM

SVR can replicate within a single vCenter

SRM can be installed after SVR

• SRM will discover and use SVR and its replication

Upgrade to SRM

• Gain automation, test recovery, failback, customization, reporting...

SVR and SRM can coexist

See a more detailed session on using VR and SRM: INF-BCO5129 “Protection for All – vSphere Replication + SRM Technical Update”

Page 18

Architecture: vSphere Replication with Site Recovery Manager

[Diagram: a “Protected” site and a “Recovery” site, each with a vSphere Client running the SRM Plug-In, an SRM Server, a vCenter Server, their databases, ESX hosts, VMFS storage, and a VR Appliance. Components shown also include the VR Agent (VRA) on ESX hosts, a VR Server, and the replication link between the sites.]

Page 19

Agenda: Configuring VR replication

Page 20

Configuring VR Replication

VR replication is configured per VM in vCenter

Selectable RPO from 15 min up to 24 hours

Selectable destination datastore (per virtual disk)

Select MPIT policy
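The per-VM settings above, together with the 24-point-in-time limit from the "What’s New" section, can be sketched as a validated settings object. This is a hypothetical Python model, not a VMware API: the class and field names are invented, and the MPIT policy is assumed to be expressed as "N instances per day for M days", which the slide itself does not spell out.

```python
from dataclasses import dataclass, field

# Limits quoted on the slides: RPO from 15 minutes up to 24 hours,
# and up to 24 retained points in time (MPIT).
MIN_RPO_MINUTES = 15
MAX_RPO_MINUTES = 24 * 60
MAX_POINTS_IN_TIME = 24


@dataclass
class ReplicationSettings:
    """Hypothetical per-VM replication settings; not a VMware API."""

    vm_name: str
    rpo_minutes: int
    # Destination datastore chosen per virtual disk: disk label -> datastore name.
    disk_targets: dict[str, str] = field(default_factory=dict)
    # Assumed MPIT policy shape: keep N instances per day for M days.
    instances_per_day: int = 0
    retention_days: int = 0

    def validate(self) -> None:
        """Raise ValueError if the settings fall outside the limits above."""
        if not MIN_RPO_MINUTES <= self.rpo_minutes <= MAX_RPO_MINUTES:
            raise ValueError(
                f"RPO must be {MIN_RPO_MINUTES}-{MAX_RPO_MINUTES} minutes, got {self.rpo_minutes}"
            )
        if not self.disk_targets:
            raise ValueError("every virtual disk needs a destination datastore")
        points = self.instances_per_day * self.retention_days
        if points > MAX_POINTS_IN_TIME:
            raise ValueError(f"MPIT policy keeps {points} points in time; the limit is {MAX_POINTS_IN_TIME}")


settings = ReplicationSettings(
    "web01", rpo_minutes=240, disk_targets={"Hard disk 1": "recovery-ds-01"},
    instances_per_day=3, retention_days=1,
)
settings.validate()  # passes: 4-hour RPO, 3 points in time
```

Validating up front mirrors the wizard behaviour described on the slide, where only values inside the supported range can be selected.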

Page 21

Configuring VR Replication: Multiple VMs

All VMs will have the same settings (RPO, quiescence, etc.)

Page 22

Datastore Mappings Ease Mass Protection of Systems
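Configuring many VMs at once with identical settings, while a datastore mapping picks each disk's destination, can be sketched as follows. This is an illustration under a stated assumption: the mapping is modelled as a plain dictionary from protected-site datastore to recovery-site datastore, and the function and datastore names are hypothetical, not part of vSphere Replication.

```python
# Hypothetical datastore mapping: protected-site datastore -> recovery-site datastore.
DatastoreMapping = dict[str, str]


def plan_bulk_protection(
    vm_disks: dict[str, dict[str, str]],
    mapping: DatastoreMapping,
    rpo_minutes: int,
) -> dict[str, dict[str, object]]:
    """Resolve each disk's destination from the mapping; every VM gets the same RPO.

    `vm_disks` maps VM name -> {disk label: source datastore}.
    """
    plan: dict[str, dict[str, object]] = {}
    for vm, disks in vm_disks.items():
        targets: dict[str, str] = {}
        for disk, source_ds in disks.items():
            # An unmapped datastore would need a manual choice, so flag it.
            if source_ds not in mapping:
                raise KeyError(f"no mapping for datastore {source_ds!r} (VM {vm}, disk {disk})")
            targets[disk] = mapping[source_ds]
        plan[vm] = {"rpo_minutes": rpo_minutes, "disk_targets": targets}
    return plan


plan = plan_bulk_protection(
    {"web01": {"Hard disk 1": "prod-ds-a"},
     "db01": {"Hard disk 1": "prod-ds-a", "Hard disk 2": "prod-ds-b"}},
    {"prod-ds-a": "dr-ds-a", "prod-ds-b": "dr-ds-b"},
    rpo_minutes=60,
)
```

The point of the mapping is that the administrator decides once per datastore rather than once per disk, which is what makes mass protection practical.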

Page 23

Seeding the Initial Copy to Save Time and Bandwidth

The user can provide the seed for the initial copy

The seed can be delivered through any out-of-band channel

The more recent the seed, the better: fewer blocks differ and need to be sent

The user directs the wizard to the seed files when configuring replication

If using seeds when configuring en masse, the seed files must be placed in a specific way at the target

Refer to the VR user manual for more details

Page 24

Agenda: VR internals

Page 25

First, It Does an Initial Full Sync of Source and Target

Compares disk IDs to avoid mismatches

Calculates checksums of all blocks at source and target

Exchanges and compares checksums to determine delta

Replicates all changed blocks necessary to align VMDKs

[Diagram: the source disk holds blocks A B C D E and the seed disk already matches on A and C; blocks B, D, and E are sent over tcp/31031, leaving A B C D E at the target.]
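The full-sync steps above can be sketched in Python as a block-checksum comparison. This is a local, single-process illustration under assumptions: the slides do not name the checksum algorithm (SHA-256 is used here for convenience), the disk ID check is modelled as a simple string comparison, and the tiny 4-byte block size only keeps the example readable. In the real product the checksums are computed on both sides and exchanged over the network.

```python
import hashlib

BLOCK_SIZE = 4  # bytes; deliberately tiny so the example stays readable


def block_checksums(disk: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Checksum every block of a disk image (SHA-256 chosen for illustration)."""
    return [
        hashlib.sha256(disk[i:i + block_size]).hexdigest()
        for i in range(0, len(disk), block_size)
    ]


def full_sync(source_id: str, source: bytes, seed_id: str, seed: bytes,
              block_size: int = BLOCK_SIZE) -> tuple[bytes, list[int]]:
    """Align a seed disk with the source; return (new target image, blocks sent)."""
    # Refuse to sync against the wrong disk.
    if source_id != seed_id:
        raise ValueError(f"disk ID mismatch: {source_id!r} != {seed_id!r}")
    src_sums = block_checksums(source, block_size)
    dst_sums = block_checksums(seed, block_size)
    # A block is sent if the seed lacks it or its checksum differs.
    to_send = [i for i, s in enumerate(src_sums) if i >= len(dst_sums) or dst_sums[i] != s]
    # Start from the seed, trimmed or zero-padded to the source size.
    target = bytearray(seed[:len(source)].ljust(len(source), b"\0"))
    for i in to_send:
        start = i * block_size
        target[start:start + block_size] = source[start:start + block_size]
    return bytes(target), to_send


# Mirrors the slide: the seed already matches on blocks A and C.
image, sent = full_sync("disk-1", b"AAAABBBBCCCCDDDDEEEE", "disk-1", b"AAAAxxxxCCCCyyyy")
print(sent)  # [1, 3, 4]  (blocks B, D, E)
```

Only checksums, not data, are compared for matching blocks, which is why a recent seed saves both time and bandwidth.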

Page 26

After Full Sync, We Switch to Sending the Delta

Light-Weight Deltas

• Crash consistent if quiescing is turned off

• Allows cross-disk consistency within a VM

• Ongoing I/O not penalized with replication active

• Lightweight snapshots are not the same as VM snapshots (redo logs)

Page 27

Normally Sends Only Changed Blocks

Switches to delta after first sync

VR Agent tracks all changing blocks via vSCSI filter

Changed blocks replicated as per RPO

Disks are always consistent

[Diagram: changed blocks A, C, and D are sent from the source disk to the target disk over tcp/44046; afterwards both disks hold A'' B C' D' E.]

Page 28

Lightweight Snapshots and the LWD Protocol

Writes tracked by vSCSI filter driver

Each replica corresponds to a lightweight snapshot

Bitmap of changed blocks is maintained between replications

During a sync changed blocks are read and sent to the target

LWD protocol – Light Weight Deltas

• Port 31031 – Initial replication traffic

• Port 44046 – Ongoing replication traffic

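The bitmap-plus-lightweight-snapshot mechanism described above can be modelled as a toy Python class. This is a hypothetical illustration, not the vSCSI filter: block contents live in a Python list, and the network side (ports 31031 and 44046) is not modelled at all.

```python
class ChangedBlockTracker:
    """Toy model of write tracking between syncs (not the real vSCSI filter).

    Every intercepted write sets a bit in a changed-block bitmap. Taking a
    lightweight snapshot captures the changed blocks and resets the bitmap,
    so writes that arrive while a sync is running count toward the next one.
    """

    def __init__(self, blocks: list[bytes]) -> None:
        self.blocks = list(blocks)
        self.changed = [False] * len(blocks)  # the changed-block bitmap

    def write(self, index: int, data: bytes) -> None:
        """Apply a guest write and mark the block as changed."""
        self.blocks[index] = data
        self.changed[index] = True

    def take_lightweight_snapshot(self) -> dict[int, bytes]:
        """Return {block index: contents} for changed blocks, then clear the bitmap."""
        delta = {i: self.blocks[i] for i, bit in enumerate(self.changed) if bit}
        self.changed = [False] * len(self.blocks)
        return delta


# Mirrors the slide: blocks A, C, and D change between syncs.
tracker = ChangedBlockTracker([b"A", b"B", b"C", b"D", b"E"])
tracker.write(0, b"A'")
tracker.write(2, b"C'")
tracker.write(3, b"D'")
delta = tracker.take_lightweight_snapshot()
print(sorted(delta))  # [0, 2, 3]
```

Because repeated writes to the same block only set the same bit, a block written many times within one RPO window is still sent once.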

Page 29

Replication Consistency

Maintains point-in-time consistency

• VM has a known RPO

Guarantees cross-disk consistency

• All disks within a VM treated as an entity

Every replica is a crash-consistent image of the VM

• A VMDK will never be corrupt

Guest quiescing adds file system consistency

• Improves OS recoverability with VSS

App-level quiescing adds application-level consistency

• Flush application writers with VSS

Page 30

Protecting against Network Failures

VR vSCSI filter discards a snapshot only after a sync is completed

VR Server writes each replica into a separate redo log

A redo log is snapshotted only after a sync is completed

Old replicas are collapsed only after a sync is completed

There is always at least one valid replica that corresponds to a valid lightweight snapshot

[Diagram: blocks changed, LWD shipped, redo log collected, write committed to replica VMDKs.]
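The recovery-side rule above (a sync only becomes a replica once it completes) can be sketched as a toy Python class. The class and method names are hypothetical, and whole replicas are stored as full dictionaries purely for readability; the real VR Server uses redo logs on disk rather than in-memory copies.

```python
from typing import Optional


class ReplicaStore:
    """Toy model of the recovery side: each sync lands in its own redo log and
    becomes a new replica only when the sync completes."""

    def __init__(self, initial: dict[int, bytes]) -> None:
        self.replicas: list[dict[int, bytes]] = [dict(initial)]  # committed images, oldest first
        self.redo_log: Optional[dict[int, bytes]] = None         # the sync in progress, if any

    def begin_sync(self) -> None:
        self.redo_log = {}

    def receive_block(self, index: int, data: bytes) -> None:
        if self.redo_log is None:
            raise RuntimeError("no sync in progress")
        self.redo_log[index] = data

    def complete_sync(self) -> None:
        """Commit: the new replica is the latest one with the redo log applied."""
        if self.redo_log is None:
            raise RuntimeError("no sync in progress")
        new_replica = dict(self.replicas[-1])
        new_replica.update(self.redo_log)
        self.replicas.append(new_replica)
        self.redo_log = None

    def abort_sync(self) -> None:
        """Network failure: drop the partial redo log; committed replicas are untouched."""
        self.redo_log = None

    def latest(self) -> dict[int, bytes]:
        return self.replicas[-1]


store = ReplicaStore({0: b"A", 1: b"B"})
store.begin_sync()
store.receive_block(0, b"A'")
store.abort_sync()     # connection lost mid-sync
store.begin_sync()
store.receive_block(0, b"A'")
store.complete_sync()  # only now does a new replica exist
```

After the aborted sync, `latest()` still returns the original image, which is the "always at least one valid replica" guarantee in miniature.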

Page 31

The Replication Scheduler

The scheduler runs in the VR agent on each ESX host

Minimizes RPO violations across all VMs on the host

Tries to minimize the overall bandwidth usage within RPO constraints

Statistical analysis to predict sync durations

Can do “early syncs” in anticipation of large syncs
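The scheduler's goals can be illustrated with a small Python sketch. The slides only say that it uses statistical analysis to predict sync durations; this sketch substitutes an exponentially weighted moving average as that predictor and uses earliest-deadline-first ordering across VMs. The class, the smoothing factor, and the safety factor are all hypothetical choices, not the VR Agent's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class VmSyncState:
    """Hypothetical per-VM scheduler state; times are in minutes."""

    name: str
    rpo_minutes: float
    last_replica_time: float             # point in time of the newest replica
    predicted_sync_minutes: float = 0.0  # moving-average estimate of sync duration
    alpha: float = 0.5                   # hypothetical smoothing factor

    def record_sync(self, started_at: float, duration: float) -> None:
        # The replica reflects the disk as of the sync's start (its snapshot).
        self.last_replica_time = started_at
        # Exponentially weighted moving average as a simple duration predictor.
        self.predicted_sync_minutes = (
            self.alpha * duration + (1 - self.alpha) * self.predicted_sync_minutes
        )

    def latest_start(self, safety_factor: float = 1.5) -> float:
        # Start as late as possible (fewer syncs, less bandwidth) while the next
        # sync is still predicted to finish before the newest replica exceeds the RPO.
        return self.last_replica_time + self.rpo_minutes - safety_factor * self.predicted_sync_minutes


def next_vm_to_sync(vms: list[VmSyncState]) -> VmSyncState:
    # Earliest-deadline-first: the VM closest to missing its RPO goes first.
    return min(vms, key=lambda vm: vm.latest_start())


small = VmSyncState("web01", rpo_minutes=60, last_replica_time=0.0)
small.record_sync(started_at=0.0, duration=10.0)
large = VmSyncState("db01", rpo_minutes=60, last_replica_time=0.0)
large.record_sync(started_at=0.0, duration=40.0)
print(small.latest_start(), large.latest_start())  # 52.5 30.0
print(next_vm_to_sync([small, large]).name)        # db01
```

The VM with the larger predicted sync is started earlier, which is the "early sync" behaviour the slide describes; syncing each VM as late as safely possible is one way to keep total bandwidth down, since blocks overwritten in the meantime are sent only once.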

Page 32

Retain Historical Replications as Snapshots

After recovery, use the snapshot manager to revert to earlier points

Retention of multiple points in time allows reversion to earlier known good states

Page 33

Multiple Points in Time Saved Intelligently

[Diagram, two stages. Ongoing Protection: replication running; the current replica and previous replicas are retained. During Recovery: replication halted; the VM recovers to the most recent replica and the others are snapshots.]

Page 34

Replication Differs from Retention: Example

MPIT retention policy: keep 3 replicas per 24-hour retention period, i.e. 1 retained every 8 hours

A 4-hour RPO means about 6 replications during the day

Of the 6 replica snapshots created, only 3 are kept during the 24-hour period

Retains the most recent up-to-date snapshot within each 8-hour period

Replications: 12AM 4AM 8AM 12PM 4PM 8PM

Retained: 4AM 12PM 8PM

Retains only a subset of the replicas in accordance with policy
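The arithmetic behind this example, taking the slot length as the retention period divided by the number of retained replicas and the replication count as the day divided by the RPO:

```latex
\[
\text{slot length} = \frac{24\ \text{h}}{3} = 8\ \text{h},
\qquad
\text{replications per day} \approx \frac{24\ \text{h}}{4\ \text{h}} = 6,
\qquad
\text{replicas discarded per day} \approx 6 - 3 = 3
\]
```

The replication count is approximate because the scheduler may sync earlier than the RPO requires.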

Page 35

Replication Slots Differ from Replication Instances

The most recent complete instance is *always* preserved, even though it might be the second instance in the slot.

This ensures you can always fail over to the most recent copy.

Page 36

Replication Slots Differ from Replication Instances

The oldest instance in any given retention slot is preserved, as is the most recent replication.
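The two retention rules stated on this slide and the previous one (keep the oldest instance in each slot, and always keep the most recent complete instance) can be sketched in Python. The slot boundaries are an assumption: the slides do not say how slots are aligned, so they are modelled here as half-open intervals counted from a chosen window start. The function name is hypothetical.

```python
def retained_instances(
    instance_times: list[float], window_start: float, slot_hours: float, num_slots: int
) -> list[float]:
    """Keep the oldest instance in each retention slot plus the newest instance overall.

    Slots are assumed to be the half-open intervals
    [window_start + k * slot_hours, window_start + (k + 1) * slot_hours).
    """
    if not instance_times:
        return []
    # The most recent complete instance is always preserved.
    keep = {max(instance_times)}
    for k in range(num_slots):
        lo = window_start + k * slot_hours
        hi = lo + slot_hours
        in_slot = [t for t in instance_times if lo <= t < hi]
        if in_slot:
            keep.add(min(in_slot))  # the oldest instance in this slot
    return sorted(keep)


# Three 8-hour slots; instance times in hours since the window start.
print(retained_instances([1, 3, 9, 10, 17, 22], window_start=0, slot_hours=8, num_slots=3))
# [1, 9, 17, 22]
```

Here the instance at hour 22 is the second one in the last slot, yet it is kept because it is the newest replica, which is exactly the case the previous slide calls out.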

Page 37

MPIT Presented as VM Snapshots after Failover

Use the snapshot manager to revert to earlier points, an interface all administrators have been comfortable with for many years.

Page 38

SRM and VR Interop Resolution

Point-in-time recovery is available in SRM when using vSphere Replication

Use the SRM Advanced Settings dialog to instruct SRM to preserve the MPIT images: vrReplication.preserveMpitImagesAsSnapshots

On by default; change it at both sites if desired

Page 39

Agenda: Failover and test

Page 40

Failover and Test

During a failover, a replica is surfaced as a VM in vCenter

• Replication is automatically stopped

• All MPIT replicas are either collapsed, to avoid a performance penalty at runtime, or preserved as VM snapshots

During a test (SRM only), a snapshot of a replica is surfaced as a VM

• Replication continues to run while the test is in progress

• The test VM can write to the disks without affecting the replicas

• After the test, the test snapshot is discarded
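The test-recovery behaviour above can be sketched as a toy copy-on-write disk in Python. It is a hypothetical model, not SRM or VR code: the "snapshot" is a plain dictionary copy taken when the test starts, and the test VM's writes go to a separate overlay that is thrown away afterwards.

```python
class RecoveryTestDisk:
    """Toy copy-on-write view used for a test recovery.

    Reads come from a frozen copy of the replica taken when the test starts,
    unless the test VM has overwritten the block; writes go only to a
    throwaway overlay, so replication can keep updating the real replica.
    """

    def __init__(self, replica: dict[int, bytes]) -> None:
        self._base = dict(replica)  # the test snapshot of the replica
        self._overlay: dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        return self._overlay.get(index, self._base.get(index, b""))

    def write(self, index: int, data: bytes) -> None:
        self._overlay[index] = data

    def discard(self) -> None:
        """End of test: throw away everything the test VM wrote."""
        self._overlay.clear()


replica = {0: b"A", 1: b"B"}
disk = RecoveryTestDisk(replica)
disk.write(0, b"test data")  # the test VM writes; the replica is untouched
replica[1] = b"B'"           # replication keeps running during the test
disk.discard()               # test over
```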

Page 41

Agenda: Automated reprotect and failback

Page 42

SRM Allows for Automated Reprotect and Failback

SRM provides additional automation workflows

• Reprotect

• Test recovery after reprotect

• Failback

A successful planned migration is required for reprotect

• Failover shuts down protected VMs and disables power-on

• All VM files are left at the protected site

Reprotect automatically configures VMs for replication in the opposite direction

• All replication settings preserved

• Original VMs used as seeds

• Detects manually configured replications

[Diagram: disks VMDK1 and VMDK2 and their replicas (VMDK1) and (VMDK2).]
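The reprotect step can be sketched as reversing a replication description while keeping every other setting. This is a hypothetical Python model, not an SRM API: the dataclass, field names, and disk file names are invented for illustration, and the planned-migration precondition is reduced to a boolean.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VmReplication:
    """Hypothetical description of one VM's replication; not an SRM API."""

    vm_name: str
    source_site: str
    target_site: str
    rpo_minutes: int
    seed_disks: tuple[str, ...] = ()


def reprotect(current: VmReplication, original_vm_disks: tuple[str, ...],
              planned_migration_succeeded: bool) -> VmReplication:
    """Reverse the replication direction, keeping every other setting.

    The original VM's disks, left at the old protected site, become seeds so
    only changed blocks have to travel back.
    """
    if not planned_migration_succeeded:
        raise RuntimeError("reprotect requires a successful planned migration")
    return replace(
        current,
        source_site=current.target_site,
        target_site=current.source_site,
        seed_disks=original_vm_disks,
    )


forward = VmReplication("web01", "site-a", "site-b", rpo_minutes=60)
back = reprotect(forward, ("web01.vmdk",), planned_migration_succeeded=True)
```

Using the original disks as seeds means the initial sync in the reverse direction follows the checksum comparison from the full-sync slide, so only blocks that changed since failover are sent.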

Page 43

Agenda: Summary

Page 44

Summary

vSphere Replication provides robust and cost-effective replication

More features and improvements coming in 2013

• Multiple Point In Time

• Multiple replication appliances per vCenter

• SDRS and Storage vMotion support

• New and improved UI

• Support for vSAN and storage classes

• Dramatic performance improvements

vSphere Replication for SMBs

• Offered with Essentials Plus licenses and above

• Can be upgraded to SRM to provide automation, test, failback

Page 45

More Good Stuff!

http://blogs.vmware.com/vSphere/Uptime

Page 46

Other VMware Activities Related to This Session

HOL:

HOL-SDC-1305, Business Continuity and Disaster Recovery In Action

Group Discussions:

BCO1003-GD, Disaster Recovery and Replication with Ken Werneburg

Page 47

THANK YOU
