MongoDB and Amazon Web Services: Storage Options for MongoDB Deployments

When using MongoDB and AWS, you want to design your infrastructure to avoid storage bottlenecks and make the best use of your available storage resources. AWS offers a myriad of storage options, including ephemeral disks, EBS, Provisioned IOPS, and ephemeral SSDs, each offering different performance and persistence characteristics. In this session, we’ll evaluate each of these options in the context of your MongoDB deployment, assessing the benefits and drawbacks of each.

MongoDB and AWS: Storage Configurations

Sandeep Parikh

Senior Solutions Architect, MongoDB Inc.

#mongodb

Quick Recap

• Deployment and Availability
– MongoDB Basics
– Deployment Configurations
– Instance Types
– Best Practices

• Slides and Recording:
– http://www.mongodb.com/presentations/mongodb-and-amazon-web-services-deploying-high-availability

Agenda

• Storage Options

• Simple Recommendations

• Backup and Restore

• Advanced Configurations

• Drawbacks/Tradeoffs

• Next Steps

Storage Options

AWS Storage Options

• Instance-based (ephemeral)

• Elastic Block Store (persistent)

• Simple Storage Service (S3)

• Glacier

MongoDB Storage Elements

• Data

• Journal

• Logs

• Snapshots

• Archived Backups

MongoDB Elements & AWS Storage

Instance
• Data
• Log
• Journal

EBS
• Data
• Log
• Journal
• Snapshots

S3
• Snapshots
• Archived Backups

Glacier
• Archived Backups

Data Lifecycle

Instance Storage

• Ephemeral
– If your instance is stopped or terminated, ephemeral storage is lost (!)

• Configurations
– Single or multiple volumes per instance

• Management
– LVM for RAID or snapshots

EBS

• Persistent
– Allocated and attached to individual instances like network-attached storage
– Storage lifecycle independent of instances

• Configuration
– Single or multiple volumes per instance

• Management
– LVM or MD for RAID
– EBS Snapshots (Console or API)

Standard EBS

Standard volumes are designed for applications with moderate I/O requirements. They are also well-suited for use as boot volumes or applications where I/O can be bursty.

• Performance is somewhat variable

• Average of 100 IOPS

• Possible to aggregate via RAID but underlying bursty nature still exists

Provisioned IOPS EBS

Provisioned IOPS volumes offer storage with consistent and low-latency performance, and are designed for applications with I/O-intensive workloads such as databases.

• Consistent volume I/O performance

• Available with 100-4000 IOPS per volume

• Launch with EBS-Optimized
– Adds additional network bandwidth for EBS volumes

Measuring IOPS

• Volumes are optimized for 4 KB per operation

• MongoDB document sizes and workload patterns will affect throughput

• Use mongoperf to test disk configuration
– Threads
– Data file size
– Document size
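
The three mongoperf knobs listed above can be captured in a small option document. The sketch below is illustrative only: it assumes the option names `nThreads`, `fileSizeMB`, `recSizeKB`, `r`, and `w` as documented for mongoperf in MongoDB releases of this era, and it uses `recSizeKB` as a rough stand-in for document size. The helper name `mongoperf_config` and the sample values are hypothetical.

```python
import json


def mongoperf_config(threads, file_size_mb, record_size_kb=4, read=True, write=True):
    """Build a mongoperf option document as a JSON string.

    Option names are assumed from the mongoperf documentation; check them
    against the version of the tool you actually run.
    """
    if threads < 1 or file_size_mb < 1 or record_size_kb < 1:
        raise ValueError("threads, file_size_mb and record_size_kb must be positive")
    options = {
        "nThreads": threads,          # concurrent I/O threads
        "fileSizeMB": file_size_mb,   # size of the test data file
        "recSizeKB": record_size_kb,  # size of each operation (approximates document size)
        "r": read,                    # include reads in the test
        "w": write,                   # include writes in the test
    }
    return json.dumps(options, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical starting point: 16 threads against a 1000 MB file.
    print(mongoperf_config(threads=16, file_size_mb=1000))
```

The printed document is meant to be fed to mongoperf on standard input; rerunning with different thread counts and record sizes shows how the same volume behaves under different workload shapes.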

Simple Recommendations

Multiple EBS Volumes

• Provisioned IOPS EBS

• EBS-optimized

• Separate volumes for
– Data
– Journal
– Log

• Decrease disk contention during high load

Disk Configurations

• Mirror or stripe multiple disks (or both)
– LVM
– MDADM

• Different implications for each RAID level
– Durability
– Performance
– Cost
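
To make the durability and cost implications concrete, here is a rough sketch comparing RAID0 (striping), RAID1 (mirroring), and RAID10 (striped mirrors). It assumes equally sized volumes and two-way mirrors for RAID10; the function name `raid_summary` and the volume sizes are illustrative and not taken from the slides.

```python
def raid_summary(level, volumes, size_gb):
    """Summarize usable capacity, paid capacity, and guaranteed fault tolerance."""
    if level == "RAID0":
        if volumes < 2:
            raise ValueError("RAID0 needs at least 2 volumes")
        usable = volumes * size_gb       # all capacity is usable
        tolerated = 0                    # any single volume failure loses the array
    elif level == "RAID1":
        if volumes < 2:
            raise ValueError("RAID1 needs at least 2 volumes")
        usable = size_gb                 # every volume holds a full copy
        tolerated = volumes - 1
    elif level == "RAID10":
        if volumes < 4 or volumes % 2:
            raise ValueError("RAID10 needs an even number of volumes, at least 4")
        usable = (volumes // 2) * size_gb  # half the capacity goes to mirrors
        tolerated = 1                    # guaranteed; more only if failures hit different pairs
    else:
        raise ValueError("unsupported RAID level: %s" % level)
    return {
        "usable_gb": usable,
        "paid_gb": volumes * size_gb,
        "failures_tolerated": tolerated,
    }


if __name__ == "__main__":
    for level in ("RAID0", "RAID1", "RAID10"):
        print(level, raid_summary(level, volumes=4, size_gb=200))
```

With four 200 GB volumes, RAID0 and RAID10 cost the same to provision, but RAID10 gives up half the usable space in exchange for surviving a volume failure.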

Aggregating IOPS

• A single volume is capable of up to 4000 IOPS

• Stripe volumes to aggregate IOPS (RAID0, RAID10)

• Note: network bandwidth is the limiting factor
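
The note about network bandwidth can be checked with simple arithmetic. The sketch below assumes 4 KB (4096-byte) operations, as in the "Measuring IOPS" slide, and a hypothetical 500 Mbps of bandwidth to EBS; substitute the figure for your own instance type. The function name `effective_iops` is illustrative.

```python
def effective_iops(volumes, iops_per_volume, op_size_bytes, network_mbps):
    """Return the IOPS a striped set can deliver once the network cap is applied."""
    striped = volumes * iops_per_volume                  # what striping offers on paper
    network_bytes_per_sec = network_mbps * 1_000_000 // 8
    network_cap = network_bytes_per_sec // op_size_bytes  # operations the link can carry
    return min(striped, network_cap)


if __name__ == "__main__":
    # Two 4000 IOPS volumes fit within the assumed 500 Mbps link.
    print(effective_iops(2, 4000, 4096, 500))   # 8000
    # Four volumes offer 16000 IOPS on paper, but the link carries fewer.
    print(effective_iops(4, 4000, 4096, 500))   # 15258
```

Under these assumptions, adding a fourth volume buys very little: the link saturates before the stripe does, which is why instance choice matters as much as volume count.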

MongoDB on AWS Marketplace

MongoDB Configurations

• Follows MongoDB best practices
– Amazon Linux, MongoDB installed via yum
– EBS PIOPS volumes per mount (data, log, journal)
– Configured: ulimits, read ahead, keep alive

Config       Data             Log              Journal
             Size     IOPS    Size     IOPS    Size     IOPS
1000 IOPS    200 GB   1000    10 GB    100     25 GB    250
2000 IOPS    200 GB   2000    15 GB    150     25 GB    250
4000 IOPS    400 GB   4000    20 GB    200     25 GB    250

Backup and Restore

Data Safety

• What’s your backup plan?

• Have you tested restoring?

• Is your data highly available?

• How do you recover from disaster?

Protecting Your Data

• Replica Sets
– Proper deployments provide HA and DR

• Manual backup/restore
– Scriptable, tunable

• MMS Backup
– Continuous, secure backup

Manual Backup Procedures

EBS
• EBS Snapshots
• LVM Snapshots

Ephemeral
• LVM Snapshots

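Note:

• EBS snapshots can be taken “hot”, but for MongoDB it is safer to call fsyncLock() first so that pending writes are flushed and new writes are blocked while the snapshot is taken

• LVM snapshots require enough free space on the instance to store the snapshot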
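
The lock, snapshot, unlock sequence is easy to get wrong if the snapshot step fails and the database stays locked. The sketch below shows one way to guarantee the unlock. The three callables are hypothetical placeholders: in a real script `lock` and `unlock` would wrap `db.fsyncLock()` and `db.fsyncUnlock()` (or their driver equivalents), and `take_snapshot` would call the EBS snapshot API or an LVM command.

```python
from contextlib import contextmanager


@contextmanager
def writes_locked(lock, unlock):
    """Hold the database write lock for the duration of the with-block."""
    lock()
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step raised.
        unlock()


def snapshot_with_lock(lock, unlock, take_snapshot):
    """Run take_snapshot while writes are locked and return its result."""
    with writes_locked(lock, unlock):
        return take_snapshot()


if __name__ == "__main__":
    events = []
    snapshot_id = snapshot_with_lock(
        lock=lambda: events.append("lock"),
        unlock=lambda: events.append("unlock"),
        take_snapshot=lambda: events.append("snapshot") or "snap-example",
    )
    print(snapshot_id, events)
```

Keeping the lock window as short as possible matters because writes are blocked for its whole duration.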

Restore

• Boot a new instance or use an existing one

• Create a new volume from the EBS snapshot and attach it

or

• Copy over the LVM snapshot, then create and mount the LV

Archiving Backups

LVM

• Copy snapshots to S3 bucket

• Create lifecycle rules to move data from bucket to Glacier

EBS

• Mount volume from snapshot

• Copy volume data to S3 bucket

• Create lifecycle rules to move data from bucket to Glacier
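
A lifecycle rule that moves backups from a bucket to Glacier can be expressed as a small configuration document. The sketch below builds one in the shape accepted by the S3 lifecycle configuration API (for example boto3's `put_bucket_lifecycle_configuration`). The prefix, rule ID, and 30-day threshold are assumptions for illustration, not values from the slides.

```python
def glacier_lifecycle(prefix, days_before_archive, rule_id="archive-mongodb-backups"):
    """Build an S3 lifecycle configuration that transitions objects to Glacier."""
    if days_before_archive < 0:
        raise ValueError("days_before_archive must not be negative")
    return {
        "Rules": [
            {
                "ID": rule_id,                  # hypothetical rule name
                "Filter": {"Prefix": prefix},   # only objects under this key prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical policy: archive anything under mongodb-backups/ after 30 days.
    print(glacier_lifecycle("mongodb-backups/", 30))
```

The same document can be applied once per bucket, so the copy step in either procedure above only needs to write objects under the chosen prefix.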

MongoDB Management Service

MMS Backup

• Fully managed, agent-based, continuous backup

• Custom snapshot scheduling and retention

• Point-in-time recovery and consistent snapshots across sharded clusters

• Performance impact similar to a Secondary

• Encrypted data transfer

• Restores require two-factor authentication

MMS Backup In-Depth

Advanced Configurations

Standard Ephemeral Storage

• Remember, it’s ephemeral

• Technically feasible

• Lack of persistence is a big negative

• Any benefits can’t outweigh the negatives

Ephemeral SSDs

• Performance ceiling might outweigh typical negatives

• Cost implications: SSD-backed instances are more expensive

• Does your workload truly need flash?
– Profile early and often to make this determination

• How many drives do you need?
– Drives instance choice

SSD and MongoDB Configurations

[Diagram: arrangements of SSD drives and mongod processes, including a RAID configuration]

SSD Deployment Strategies

• SSD deployments
– Replica Sets and
– MMS Backup

• High performance

• Highly available

• Continuous backup

[Diagram: replica set with one mongod Primary and two mongod Secondaries, plus an MMS Backup Agent]

SSD Deployment Considerations

• One Secondary could use EBS

• Will need to have an instance with
– High network bandwidth and
– Multiple EBS volumes aggregated to approach IOPS parity

• Key is avoiding significant replication lag because of IO performance dropoff
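
One way to watch for the replication lag mentioned above is to compare each Secondary's last applied operation time with the Primary's. The sketch below works on a dictionary shaped like the output of the `replSetGetStatus` command (`rs.status()` in the shell), using the `members`, `name`, `stateStr`, and `optimeDate` fields. The host names and timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def secondary_lag(status):
    """Return {member name: lag behind the primary} for every SECONDARY member."""
    members = status["members"]
    primaries = [m for m in members if m["stateStr"] == "PRIMARY"]
    if not primaries:
        raise ValueError("no PRIMARY in replica set status")
    primary_optime = primaries[0]["optimeDate"]
    return {
        m["name"]: primary_optime - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


if __name__ == "__main__":
    now = datetime(2014, 1, 1, 12, 0, 0)
    # Hypothetical status: one SSD-backed and one EBS-backed secondary.
    status = {
        "members": [
            {"name": "ssd-a:27017", "stateStr": "PRIMARY", "optimeDate": now},
            {"name": "ssd-b:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(seconds=2)},
            {"name": "ebs-c:27017", "stateStr": "SECONDARY",
             "optimeDate": now - timedelta(seconds=45)},
        ]
    }
    for name, lag in secondary_lag(status).items():
        print(name, lag.total_seconds(), "seconds behind")
```

If the EBS-backed member's lag keeps growing under load, that is a sign its aggregated IOPS are not close enough to the SSD members.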

Drawbacks & Tradeoffs

Considerations

• Performance

• Consistency

• Safety

• Flexibility

• Scalability

Best Practices

• Prototype > Test > Scale

• IO on AWS is easy to scale

• AWS makes it easy to iterate deployment
– Start small
– Profile your workload
– Remove all other bottlenecks
– Add instance and IO capacity

Recommended Starting Points

• EBS-Optimized and PIOPS EBS

• M1.large is an effective starting point for profiling an early production deployment

• Use volumes with 250 or 500 IOPS for data to start
– Adding more IOPS is easy
– Snapshot and recreate with more capacity

Questions?

Resources

• MMS Monitoring and Backup
– http://mms.mongodb.com

• MongoDB on AWS best practices:
– http://bit.ly/deploy-mongodb-ec2

• MongoDB on AWS Marketplace:
– http://bit.ly/aws-marketplace-mongodb

• MongoDB docs
– http://docs.mongodb.org

MongoDB World

New York City, June 23-25

#MongoDBWorld

See what’s next in MongoDB including
• MongoDB 2.6
• Sharding
• Replication
• Aggregation

http://world.mongodb.com

Save 25% with discount code 25SandeepParikh