Transcript

©2015, Amazon Web Services, Inc. or its affiliates. All rights reserved

Amazon Elastic Block Store

Deep Dive

Dougal Ballantyne, HPC Solutions Architect

What we’ll cover

Amazon EBS overview

• Volumes

• Snapshots

Performance

Encryption

Q&A

EBS overview

For most builders AWS is get in and go!

Source: http://www.trucksplanet.com/catalog/model.php?id=1020

A “normal” hard drive

EBS =

What is EBS?

• Network block storage

• Designed for five nines of availability

• Attaches to Amazon EC2 within the same Availability Zone

• Provides point-in-time snapshots to Amazon S3

More about EBS

• It’s a service!

• It’s independent of EC2

• It has regional and AZ availability goals

– All EBS volumes are designed for 99.999% availability

• Over 1.5 million volumes are created per day

A few definitions…

• IOPS: Input/output operations per second (#)

• Throughput: Read/write rate to storage (MB/s)

• Latency: Delay between request and completion (ms)

• Capacity: Volume of data that can be stored (GB)

• Block size: Size of each I/O (KB)

EBS volume types

• General Purpose (SSD)

• Provisioned IOPS (SSD)

• Magnetic

When performance matters, use SSD-backed volumes

EBS SSD volumes

• Applies to both General Purpose and Provisioned IOPS

• IOPS measured up to 256 KB

• Single-digit ms latency

• Designed for 99.999% availability

EBS General Purpose volumes (SSD)

New default volume type for EBS

Every volume can burst up to 3,000 IOPS

• Larger volumes can burst for longer periods

3 IOPS per GB baseline performance, maximum of 10,000 IOPS

99% performance consistency

Up to 160 MB/s throughput

General Purpose (SSD) – Burst & baseline

(1) Always accumulating 3 I/O credits per GB per second

(2) Max I/O credit per bucket is 5.4M

(3) You can spend credits at up to 3,000 IOPS

I/O size: 16 KB

Understanding General Purpose (SSD) bursting

Baseline performance = 3 IOPS per GB

Minutes to empty a full I/O credit bucket for various volume sizes

The larger the volume, the longer it takes to empty the I/O credit bucket

1 TB or larger volume will never exhaust its I/O credit bucket

General Purpose (SSD) volumes example

Microsoft Windows 30 GB boot volume:

• Gets initial I/O credit of 5.4M

• Could burst for up to 30 mins @ 3000 IOPS

• Always accumulating 90 I/O credits per second
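The burst arithmetic on this slide can be checked with a short sketch. It assumes the bucket behaves as a simple token bucket that refills at the baseline rate while draining at the burst rate; the function names are illustrative and are not part of any AWS API.

```python
def gp2_baseline_iops(size_gb):
    """Baseline from the slides: 3 IOPS per GB, with a 10,000 IOPS maximum."""
    return min(3 * size_gb, 10_000)


def burst_minutes(size_gb, bucket_credits=5_400_000, burst_iops=3_000):
    """Minutes a full credit bucket lasts while bursting continuously.

    Assumption: credits drain at burst_iops and refill at the baseline rate,
    so the net drain is (burst_iops - baseline) credits per second.
    """
    baseline = gp2_baseline_iops(size_gb)
    if baseline >= burst_iops:
        # Refill keeps pace with spending, so the bucket never empties.
        return float("inf")
    return bucket_credits / (burst_iops - baseline) / 60


print(round(burst_minutes(30), 1))   # 30 GB boot volume
print(burst_minutes(1000))           # 1 TB-class volume
```

Under these assumptions a 30 GB volume drains its bucket in roughly half an hour, which lines up with the "up to 30 mins" figure above, and a volume whose baseline reaches 3,000 IOPS never drains it.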

Improved instance boot time

m3.medium

Volume type   Boot time   Access time   OS
GP2           3:31        4:33          Windows Server 2012
Magnetic      4:30        7:16          Windows Server 2012
GP2           0:36        0:45          CentOS 6
Magnetic      0:57        1:16          CentOS 6

40% Reduction in boot times by using General Purpose SSD

Database volume

1 TB PIOPS volume with 4K IOPS = $526.40 per month per volume

GP2 1 TB volume with 3000 IOPS = $102.40

GP2 2 x 500 GB volume at 3K, Burst to 6K = $102.40

80% cost savings, 50% more peak I/O with General Purpose SSD

Guidelines for sizing General Purpose (SSD) volumes

Generic boot, developer, test/dev, and web apps:

Provision GB required for your applications

Database apps:

1. Calculate the IOPS required in steady state

2. Perform this calculation: (steady state IOPS) ÷ 3 = GB to provision

Note: I/O bursts will support:

• Database load or table scan operations

• Spike in I/O workload
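The database sizing rule above can be sketched as a small helper. The function name and the `data_gb` parameter are illustrative assumptions; the 3 IOPS per GB ratio and the 10,000 IOPS ceiling come from the slides.

```python
import math


def gp2_size_gb(steady_state_iops, data_gb=1):
    """GB to provision so the baseline (3 IOPS per GB) covers steady state.

    data_gb is the capacity the application needs regardless of IOPS
    (a hypothetical parameter added for illustration).
    """
    if steady_state_iops > 10_000:
        raise ValueError("exceeds the 10,000 IOPS maximum for a single volume")
    return max(math.ceil(steady_state_iops / 3), data_gb)


print(gp2_size_gb(3000))        # IOPS-driven size
print(gp2_size_gb(900, 500))    # capacity-driven size
```

When the capacity requirement is larger than the IOPS-derived size, the capacity requirement wins, which matches the guidance for generic boot and web workloads.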

EBS PIOPS (SSD) volumes

• Best for I/O intensive databases that require highest consistency

• Throughput up to 320 MB/s

• Provision up to 20,000 IOPS per volume (supports IOPS:GB ratio of 30)

• Designed for 99.9% performance consistency

EBS Magnetic volumes

• Best for cold workloads (rarely accessed data that needs always-on access)

• IOPS: ~100 IOPS steady-state, with best-effort bursts

• Throughput: variable by workload, best effort to 10s of MB/s

• Latency: Varies, reads typically ~20-40 ms, writes typically ~2-10 ms

EBS volume types - summary

                                 General Purpose (SSD)    Provisioned IOPS (SSD)     Magnetic
Recommended use cases            Boot volumes             I/O-intensive workloads    Cold storage
                                 Small to med DBs         Large DBs
                                 Dev and test
Storage media                    SSD-backed               SSD-backed                 Magnetic-backed
Volume size                      1 GB - 16 TB             4 GB - 16 TB               1 GB - 1 TB
Max IOPS per volume              10,000 IOPS              20,000 IOPS                ~100 IOPS
Burst                            < 1 TB to 3,000 IOPS     baseline                   baseline
Read and write peak throughput   160 MB/s                 320 MB/s                   ~50-90 MB/s
Max IOPS per node (16 KB)        48,000                   48,000                     48,000
Peak throughput per node         800 MB/s                 800 MB/s                   800 MB/s
Latency (random read)            1-2 ms                   1-2 ms                     20-40 ms
API name                         gp2                      io1                        standard
Price*                           $.10/GB-month            $.125/GB-month             $.05/GB-month
                                                          $.065/provisioned IOPS     $.05/1M I/O

Why is General Purpose SSD the default?

High baseline level of performance

Burst to higher level of IOPS

Single, capacity-based pricing dimension

• Makes forecasting very easy

• Eliminates sizing complexity

Attractive price per gigabyte and price per IOPS

Always use General Purpose (SSD) for boot volumes

Migrating to General Purpose (SSD) volumes

Change volume type during launch

Use EBS snapshots

You may be able to resize the file system

Use General Purpose (SSD) sizing guide

Benefits of using EBS snapshots

More durable than an EBS volume

• Stored in Amazon S3

Differential (space-efficient)

• First snapshot is a clone

• Pay only for what you use

Availability Zone-independent

• Clone into any AZ

Can be copied efficiently across regions

Tagging snapshots

Use tags to add metadata to snapshots:

• Type (daily, weekly)

• Version

• Instance ID

• Volume ID

• Application stack
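As a minimal sketch of the tagging idea, the helper below builds a tag list in the Key/Value shape that the EC2 tagging API accepts. The tag key names, the helper name, and the resource IDs are all hypothetical placeholders chosen for illustration.

```python
def snapshot_tags(snapshot_type, version, instance_id, volume_id, stack):
    """Return snapshot metadata as a list of {"Key": ..., "Value": ...} dicts."""
    fields = {
        "Type": snapshot_type,      # for example daily or weekly
        "Version": version,
        "InstanceId": instance_id,
        "VolumeId": volume_id,
        "Stack": stack,
    }
    return [{"Key": key, "Value": str(value)} for key, value in fields.items()]


tags = snapshot_tags("daily", 3, "i-0123456789abcdef0",
                     "vol-0123456789abcdef0", "web")
# With boto3 installed and credentials configured, the list could be applied
# with something like:
#   boto3.client("ec2").create_tags(Resources=["<snapshot-id>"], Tags=tags)
```

Keeping the tag construction in one place makes it easier to apply the same metadata scheme consistently across daily and weekly snapshot jobs.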

Performance

Queuing theory – Little’s Law

Little’s Law is the foundation for performance tuning theory

• Mathematically proven by John Little in 1961

L = A * W

L = Queue length = average number of requests waiting

A = Arrival rate = the rate of requests arriving

W = Wait time = average wait time

EBS performance is related to this law
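To make the relationship concrete, here is a small sketch that applies L = A * W to storage I/O, treating IOPS as the arrival rate and average latency as the wait time. The numbers in the example are illustrative and are not measurements from this talk.

```python
def queue_length(arrival_rate_per_s, wait_s):
    """Little's Law: L = A * W."""
    return arrival_rate_per_s * wait_s


def arrival_rate(queue_len, wait_s):
    """Little's Law rearranged: A = L / W."""
    return queue_len / wait_s


# Illustrative: 4,000 IOPS at an average latency of 2 ms.
print(queue_length(4000, 0.002))   # outstanding I/Os needed to sustain that rate
# Illustrative: a queue depth of 1 at 0.5 ms per I/O.
print(arrival_rate(1, 0.0005))     # IOPS achievable with one outstanding I/O
```

Read this way, the law explains why a queue depth of 1 limits IOPS even on a fast volume, and why raising queue depth beyond what the volume can serve only adds wait time.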

Performance optimization is measured by:

IOPS: Read/write I/O rate (IOPS)

Latency: Time between I/O submission and completion (ms)

Throughput: Read/write transfer rate (MB/s); throughput = IOPS × I/O size

Four key components of performance optimization

1. EC2 instance

2. I/O

3. Network link

4. EBS

Tools available for performance tuning:

1. EC2 instance: Network bandwidth (Mbps)

2. EBS-optimized instance: EC2 instance option (On/Off)

3. Workload: Block size, read/write ratio, serialization

4. Queue depth: The number of outstanding I/Os

5. RAID: Stripe volumes to maximize performance

6. Pre-warming: Eliminate first-touch penalty

1. EC2 instance

Compute-optimized – C3,C4

Memory-optimized – R3

General-Purpose – M3

Select the EC2 instance that has the right network, RAM, and CPU resources for your applications

2. EBS-optimized instance

Most instance families support the EBS-optimized flag

EBS-optimized instances now support up to 4 Gb/s

• Drive 32,000 16 KB IOPS or 500 MB/s

EC2 *.8xlarge instances support 10 Gb/s network

Max IOPS per node supported is ~48,000 IOPS @ 16K I/O

Use EBS-optimized instances for consistent EBS performance

3. Workload

I/O size:

• 4 KB to 64 MB

I/O pattern:

• Sequential and random

I/O type:

• Read and write

I/O concurrency:

• Number of concurrent I/O

EBS SSD-backed volumes measure I/O size up to 256 KB

EBS SSD-backed volumes deliver same performance for read and write

EBS IOPS and throughput limits

PIOPS volume limits: 20,000 IOPS and 320 MB/s throughput

You can achieve 20,000 IOPS when driving smaller I/O operations

You can achieve up to 320 MB/s when driving larger I/O operations

EBS IOPS and throughput limits

PIOPS volume limits in this example: 8,000 IOPS and 320 MB/s throughput

8,000 × 8 KB = 64 MB/s

8,000 × 16 KB = 128 MB/s

8,000 × 32 KB = 256 MB/s

8,000 × 64 KB = 512 MB/s (exceeds the 320 MB/s throughput limit)

1,250 × 256 KB = 320 MB/s

16,000 × 8 KB = 128 MB/s (exceeds the 8,000 IOPS limit)

Block (I/O) size determines whether your application is IOPS bound or throughput bound
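The following sketch shows one way to work out which limit applies for a given I/O size. It uses the 8,000 IOPS and 320 MB/s figures from this slide as defaults and decimal units (1 MB = 1,000 KB), matching the slide arithmetic; the function name is illustrative.

```python
def volume_limit(io_kb, max_iops=8_000, max_mb_per_s=320):
    """Return (limiting factor, achievable IOPS, achievable MB/s) for an I/O size."""
    # IOPS at which the throughput limit would be reached for this I/O size.
    iops_at_throughput_cap = max_mb_per_s * 1000 / io_kb
    if iops_at_throughput_cap < max_iops:
        return "throughput", iops_at_throughput_cap, float(max_mb_per_s)
    return "IOPS", float(max_iops), max_iops * io_kb / 1000


for size_kb in (8, 16, 32, 64, 256):
    print(size_kb, volume_limit(size_kb))
```

With these defaults, small I/O sizes hit the IOPS limit well before the throughput limit, while 64 KB and larger I/O is capped by throughput.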

4. Queue depth

[Diagram: an I/O operation in flight between EC2 and EBS. After it’s gone, it’s gone]

Queue depth is the pending I/O for a volume

Monitoring EBS volumes

Important Amazon CloudWatch metrics:

• IOPS and bandwidth

• Latency

• Queue depth

I/O latency

• Elapsed time between I/O submission and its completion

• Performance requirements may be driven by IOPS or latency or both

• There is an interdependency among IOPS, queue depth, and latency

Latency: What does it look like?

Latency: Introducing the boxplot

Latency: Baseline

Latency: General Purpose SSD volume

Latency: Latest generation instance

EC2: Instance comparison

m2.4xlarge

CPU: Intel Xeon

vCPU: 8

Memory: 68.4 GiB

Price: $0.98/hour

r3.2xlarge

CPU: Intel Xeon E5-2670 v2

vCPU: 8

Memory: 61 GiB

Enhanced Networking

Price: $0.70/hour

* All pricing from us-east-1

Random read latency

[Chart: Random read latency across various queue depths. X axis: queue depth, 1 to 32. Y axis: latency TP90 (ms), 0 to 40. Data labels: 0.075 ms and 35.1 ms]

Read latency linearly increases with increase in queue depth

Random read latency

[Chart: 16 KB random read IOPS and latency across various queue depths. X axis: queue depth, 1 to 32. Series: Latency (TP90, ms) and Avg Read IOPS. Latency data labels: 0.075, 2.09, 35.1. IOPS data labels: 1,865, 4,152, 3,851]

Queue depth of 1 has the lowest latency, but also has the lowest IOPS

Queue depth between 4 and 8 has the optimal IOPS and latency performance

Higher queue depths have negative impact on IOPS and latency

Random write latency

[Chart: 16 KB random write IOPS and latency across various queue depths. X axis: queue depth, 1 to 32. Series: Latency (TP90, ms) and Avg IOPS. Latency data labels: 0.08, 7.71. IOPS data labels: 845, 4,152]

Write latency queue depth and IOPS interaction is similar to that of read latency

Optimal queue depth to achieve lower latency and highest IOPS is typically between 4-8; ~1 queue depth per 500 IOPS
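The rule of thumb above can be written as a one-line helper. This is only a starting point for tuning under the stated ratio of about one outstanding I/O per 500 IOPS; the function name is illustrative.

```python
import math


def starting_queue_depth(target_iops, iops_per_queue_slot=500):
    """Rule of thumb: about 1 outstanding I/O per 500 IOPS, and at least 1."""
    return max(1, math.ceil(target_iops / iops_per_queue_slot))


print(starting_queue_depth(2000))
print(starting_queue_depth(4000))
```

For targets between 2,000 and 4,000 IOPS this lands in the 4 to 8 range quoted above; actual optimal values should still be confirmed by measuring latency and IOPS together.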

EBS-optimized instances provide consistent latency experience

Use SSD volumes with latest-generation EC2 instances

5. RAID

Increases performance, or capacity, or both

Over 320 MB/sec or 20K IOPS, striping needed

Don’t mix volume types

Typically RAID 0 or LVM stripe

Avoid RAID for redundancy

Maximum performance per instance

How should you think about taking snapshots on a striped volume?

• Quiesce file systems and take snapshot

• Unmount file system and take snapshot

• Use OS-specific tools

12×400 GB PIOPS, pre-warmed, RAID 0 LVM, stripe size 128 KB, attached to CR1 instance

Use stripe size of 128 KB or 256 KB

6. Pre-warming

• Eliminates first-access penalty

• Typically 5%, extreme worst case of 50% performance reduction in IOPS and latency when volumes are used without pre-warming:

– Performance is as provisioned when all the chunks are accessed

• Recommendations before benchmarking:

– For new volumes:

• Linux: DD write

• Windows: NTFS full format

– Takes roughly an hour to pre-warm 1 TB PIOPS/General Purpose (SSD) volumes

• Always check latest documentation http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-prewarm.html

Use large block size to speed up your pre-warming

Example: sudo dd if=/dev/xvdf of=/dev/xvdf conv=notrunc bs=1M

Final tips

• Try to use Ext4 or XFS

• Alignment can matter; check that your tools align to 4 KB

Cheat sheet sample: Storage workloads on AWS

Workload/software     Typical block size                    Random/Seq?  Max EBS @ 500 Mb/s  Max EBS @ 1 Gb/s  Max EBS @ 10 Gb/s
                                                                         instances           instances         instances
Oracle DB             Configurable: 2-16 KB, default 8 KB   random       ~7,800 IOPS         ~15,600 IOPS      ~48,000 IOPS
Microsoft SQL Server  8 KB w/ 64 KB extents                 random       ~7,800 IOPS         ~15,600 IOPS      ~48,000 IOPS
MySQL                 16 KB                                 random       ~4,000 IOPS         ~7,800 IOPS       ~48,000 IOPS
PostgreSQL            8 KB                                  random       ~7,800 IOPS         ~15,600 IOPS      ~48,000 IOPS
MongoDB               4 KB                                  serialized   ~15,600 IOPS        ~31,000 IOPS      ~48,000 IOPS
Apache Cassandra      4 KB                                  random       ~15,600 IOPS        ~31,000 IOPS      ~48,000 IOPS
GlusterFS             128 KB                                sequential   ~500 IOPS           ~1,000 IOPS       ~6,000 IOPS
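The cheat sheet figures appear to follow from dividing instance-to-EBS bandwidth by block size. The sketch below reproduces them under two assumptions that are inferred rather than stated in the deck: the instance bandwidth columns are in bits per second (500 Mb/s, 1 Gb/s, 10 Gb/s), and the per-node ceiling is the 48,000 IOPS at 16 KB quoted earlier.

```python
NODE_MAX_IOPS = 48_000                      # per-node maximum at 16 KB, from the deck
NODE_MAX_KB_PER_S = NODE_MAX_IOPS * 16      # implied throughput ceiling in KB/s


def max_ebs_iops(link_mbps, block_kb):
    """Approximate max IOPS for a block size on a given EBS link (assumed model)."""
    link_kb_per_s = link_mbps * 1000 / 8    # megabits/s to kilobytes/s (decimal)
    usable_kb_per_s = min(link_kb_per_s, NODE_MAX_KB_PER_S)
    return min(NODE_MAX_IOPS, usable_kb_per_s / block_kb)


for name, block_kb in (("PostgreSQL", 8), ("MySQL", 16), ("GlusterFS", 128)):
    print(name, [round(max_ebs_iops(mbps, block_kb)) for mbps in (500, 1000, 10_000)])
```

The results land close to the rounded values in the table, which suggests the table is a bandwidth-bound estimate rather than a benchmark of each database.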

Four key components: balanced (Oh, YEAH!!)

• EC2

• A “boatload” of I/O

• EBS-optimized instance

• Right-sized EBS

Tools available for tuning

EC2 Instance: Network transfer rate (Mbps)

EBS Optimized: EC2 instance option (On/Off)

Workload: Block size, read/write ratio, serialization

Queue Depth: The number of outstanding I/Os (#)

RAID: Stripe volumes to maximize performance

Pre-warm: Eliminates first touch penalty

Encryption

Why encrypt data volumes?

Security:

Protects against someone who might gain unauthorized physical access to the volume

Can help with internal or external compliance efforts:

• Chief Information Security Officer wants encryption to protect sensitive corporate data

• 3rd-party auditors want to see evidence that sensitive customer data is encrypted

Ease of use and operating cost reduction:

Unlike open-source or 3rd-party solutions, such as Trend Micro SecureCloud, SafeNet ProtectV, etc., EBS encryption offers:

• “Checkbox” encryption at no extra cost

• Automated, secure key management

AWS KMS

A service that simplifies encrypting data and managing keys

Allows customers to create, use, and manage encryption keys from within their own applications and supported AWS services (Amazon S3, EBS, Amazon Redshift)

Key management functions include:

• Create, enable, disable, rotate, and define usage policy on master keys

• Generate a data key that can be exported from the service after it’s encrypted by a master key

• Audit use of master keys in AWS CloudTrail

Available in 9 commercial regions

Use Create Volume in the EC2 console

How AWS services integrate with AWS KMS

2-tiered key hierarchy using envelope encryption

Unique data key encrypts customer data

AWS KMS master keys encrypt data keys

Benefits of envelope encryption:

• Limits risk of a compromised data key

• Better performance for encrypting large data

• Easier to manage a small number of master keys than millions of data keys

[Diagram: master key(s) held in KMS encrypt Data keys 1 to 4; the data keys protect an S3 object, an EBS volume, an Amazon Redshift cluster, and a custom application]

Summary

• Use encryption if you need it

• Take snapshots

• Select the right instance for your workload

• Select the right volume for your workload

Q&A

NEW YORK