Achieving Scale with MongoDB David Lutz Senior Solutions Architect

Webinar: Scaling MongoDB


Page 1: Webinar: Scaling MongoDB

Achieving Scale with MongoDB

David Lutz, Senior Solutions Architect

Page 2: Webinar: Scaling MongoDB

Agenda

• Optimize for Scale
  – Design the Schema
  – Build Indexes
  – Use Monitoring Tools
  – Use WiredTiger Storage
• Vertical Scaling
• Horizontal Scaling

Page 3: Webinar: Scaling MongoDB

Don’t Substitute Scaling for Optimization

• Make sure you are solving the right problem

• Remedy schema and index problems first

• We’ll discuss both …

Page 4: Webinar: Scaling MongoDB

Optimization Tips: Schema Design

Page 5: Webinar: Scaling MongoDB

Documents are Rich Data Structures

{
  customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    location : [45.123, 47.232],
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  profession : ["banking", "trader"],
  policies : [
    { policy_number : 13, description : "short term", deductible : 500 },
    { policy_number : 14, description : "dental", visits : […] }
  ]
}

Slide callouts:
• Fields hold typed values (string, number, geo-location)
• Fields can contain arrays
• Fields can contain an array of sub-documents

Page 6: Webinar: Scaling MongoDB

The Document Data Model

• Matches Application Objects
  – Eases development
• Flexible
  – Evolves with application
• High performance
  – Designed for access pattern

{
  customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  policies : [
    { policy_number : 13, description : "short term", deductible : 500 },
    { policy_number : 14, description : "dental", visits : […] }
  ]
}

Page 7: Webinar: Scaling MongoDB

The Importance of Schema Design

• Very different from RDBMS schema design

• With MongoDB Schema:
  – denormalize the data
  – create a schema with prior knowledge of your actual query patterns, then …
  – write simple queries

Page 8: Webinar: Scaling MongoDB

Real World Example

Product catalog for retailer selling in 20 countries

{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  <… and so on for other locales …>
}

Page 9: Webinar: Scaling MongoDB

Real World Example

• What's good about this schema?
  – Each document contains all the data about the product across all possible locales.
  – It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc.).

Page 10: Webinar: Scaling MongoDB

Real World Example

But that’s not how the data was accessed!

db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );
… and so forth for other locales

The data model did not fit the access pattern.

Page 11: Webinar: Scaling MongoDB

Inefficient use of resources

Data in BLUE are being used. Data in RED take up memory but are not in demand.

{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  <… and so on for other locales …>
}

{
  _id: 42,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  <… and so on for other locales …>
}

Page 12: Webinar: Scaling MongoDB

Consequences of Schema Redesign

• Queries induced minimal memory overhead
• 20x as many products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced

{
  _id: "375-en_GB",
  name: …,
  description: …,
  <… the rest of the document …>
}
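The redesign can be sketched as a small transformation, assuming the original documents look like the catalog example shown earlier. The helper name `splitByLocale` and the sample field values are hypothetical; only the `"375-en_GB"` key format comes from the slide.

```javascript
// Hypothetical helper: turn one all-locales catalog document into
// one small document per locale, keyed as "<productId>-<locale>".
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Example input shaped like the original schema (values are made up).
const original = {
  _id: 375,
  en_US: { name: "Color", description: "US English text" },
  en_GB: { name: "Colour", description: "British English text" },
  fr_FR: { name: "Couleur", description: "Texte en français" },
};

const perLocale = splitByLocale(original);
console.log(perLocale.map((doc) => doc._id));
```

With documents in this shape, a lookup such as db.catalog.find( { _id: "375-en_GB" } ) touches only the locale that was asked for, which is why far more products fit in RAM at once.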

Page 13: Webinar: Scaling MongoDB

Schema Design Patterns

• Pattern: pre-computing interesting quantities, ideally with each write operation
• Pattern: putting unrelated items in different collections to take advantage of indexing
• Anti-pattern: appending to arrays ad infinitum
• Anti-pattern: importing relational schemas (3NF) directly into MongoDB
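The pre-computing pattern can be illustrated with a running total that is updated on every write. This is a minimal sketch, not the presenter's code: the field names `ratingCount` and `ratingSum` are hypothetical, and `applyInc` is an in-memory stand-in for what the `$inc` update operator does on the server.

```javascript
// Build the update document for one new rating so the totals are
// maintained on every write instead of being recomputed on every read.
// ratingCount and ratingSum are hypothetical field names.
function buildRatingUpdate(rating) {
  return { $inc: { ratingCount: 1, ratingSum: rating } };
}

// In-memory stand-in for the server-side effect of $inc (illustration only).
function applyInc(doc, update) {
  const result = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc)) {
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

let product = { _id: "375-en_GB", ratingCount: 0, ratingSum: 0 };
for (const rating of [5, 4, 3]) {
  product = applyInc(product, buildRatingUpdate(rating));
}

// Reading the average is now a division, not a scan over all ratings.
const average = product.ratingSum / product.ratingCount;
console.log(product.ratingCount, average);
```

Against a real collection, the same update document would typically be passed to an update call such as db.collection.updateOne(filter, update), so each write keeps the summary current.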

Page 14: Webinar: Scaling MongoDB

Schema Design Resources

• The docs! Data Model Design, Patterns & Examples
  https://docs.mongodb.org/manual/core/data-model-design/
  https://docs.mongodb.org/manual/applications/data-models/
  https://docs.mongodb.org/manual/MongoDB-data-models-guide.pdf
• The blogs! http://blog.mongodb.org "6 Rules of Thumb for Schema Design"
  – Part 1: http://goo.gl/TFJ3dr
  – Part 2: http://goo.gl/qTdGhP
  – Part 3: http://goo.gl/JFO1pI
• Webinars, training, consulting, etc…

Page 15: Webinar: Scaling MongoDB

Optimization Tips: Indexing

Page 16: Webinar: Scaling MongoDB

Indexes

• Single biggest tunable performance factor
• Tree-structured references to your documents
• Indexing and schema design go hand in hand

Page 17: Webinar: Scaling MongoDB

Indexing Mistakes and Their Fixes

• Failing to build necessary indexes
  – Run .explain(), examine slow query log and system.profile collection, download mtools
• Building unnecessary indexes
  – Talk to your application developers about usage
• Running ad-hoc queries in production
  – Use a staging environment, use secondaries

Page 18: Webinar: Scaling MongoDB

Indexing Strategies

• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with compound indexes
  – db.collection.createIndex({A:1, B:1, C:1})
  – allows queries using leftmost prefix
• Order index fields to support scans & sorts
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments
  db.adminCommand( { setParameter: 1, notablescan: 1 } )
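The leftmost-prefix rule for the {A:1, B:1, C:1} index can be sketched in plain JavaScript. This is a simplified model under stated assumptions: it only looks at which fields a query filters on, ignores sorts and range predicates, and the helper name `usablePrefix` is hypothetical.

```javascript
// Walk the index fields from the left and stop at the first field the
// query does not filter on. The result is the prefix the index can use.
function usablePrefix(keyPattern, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const field of Object.keys(keyPattern)) {
    if (!wanted.has(field)) break;
    prefix.push(field);
  }
  return prefix;
}

const index = { A: 1, B: 1, C: 1 };

console.log(usablePrefix(index, ["A"]));       // prefix A
console.log(usablePrefix(index, ["A", "B"]));  // prefix A, B
console.log(usablePrefix(index, ["A", "C"]));  // only A; C is not contiguous
console.log(usablePrefix(index, ["B", "C"]));  // empty; no leftmost field
```

Because queries on A alone and on A plus B are already served by the compound index, separate {A:1} and {A:1, B:1} indexes would be duplicates in the sense of the bullet above.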

Page 19: Webinar: Scaling MongoDB

Indexing Example – Before an Index

db.tweets.explain("executionStats").find( { "user.lang" : "ja" } )
{
  "winningPlan" : {
    "inputStage" : {
      "stage" : "COLLSCAN",
      "filter" : { "user.lang" : { "$eq" : "ja" } }
    }
  },
  "executionStats" : {
    "nReturned" : 3560,
    "executionTimeMillis" : 56,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 51428
  }
}

Page 20: Webinar: Scaling MongoDB

Indexing Example – After an Index

db.tweets.explain("executionStats").find( { "user.lang" : "ja" } )
{
  "winningPlan" : {
    "inputStage" : {
      "stage" : "IXSCAN",
      "keyPattern" : { "user.lang" : 1 }
    }
  },
  "executionStats" : {
    "nReturned" : 3560,
    "executionTimeMillis" : 8,
    "totalKeysExamined" : 3560,
    "totalDocsExamined" : 3560
  }
}
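One way to read the two explain outputs is to compare documents examined with documents returned. The sketch below uses the numbers from the two slides; the helper name `docsExaminedPerReturned` is hypothetical and the ratio is a common rule of thumb rather than an official metric.

```javascript
// Ratio of documents examined to documents returned.
// A value near 1 means the query examines little more than it returns.
function docsExaminedPerReturned(executionStats) {
  const { nReturned, totalDocsExamined } = executionStats;
  if (nReturned === 0) return Infinity; // nothing returned: ratio undefined
  return totalDocsExamined / nReturned;
}

// executionStats copied from the "before" and "after" slides.
const before = {
  nReturned: 3560,
  executionTimeMillis: 56,
  totalKeysExamined: 0,
  totalDocsExamined: 51428,
};
const after = {
  nReturned: 3560,
  executionTimeMillis: 8,
  totalKeysExamined: 3560,
  totalDocsExamined: 3560,
};

console.log(docsExaminedPerReturned(before).toFixed(1)); // "14.4"
console.log(docsExaminedPerReturned(after));             // 1
console.log(before.executionTimeMillis / after.executionTimeMillis); // 7
```

Before the index, the server reads roughly 14 documents for every one it returns; after the index, it reads exactly the documents it returns, and the reported execution time drops from 56 ms to 8 ms.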

Page 21: Webinar: Scaling MongoDB

Optimization Tips: Monitoring

Page 22: Webinar: Scaling MongoDB

The Best Way to Run MongoDB

Ops Manager allows you to leverage and automate the best practices we’ve learned from thousands of deployments in a comprehensive application that helps you run MongoDB safely and reliably.

Benefits include:

10x-20x more efficient operations

Complete performance visibility

Assisted performance optimization

Page 23: Webinar: Scaling MongoDB

Ops Manager Provides:

for Developers
• Visual Query Profiler

for Administrators
• Index Suggestions
• Automated Index Builds
• Monitoring and Alerting

for Operations
• APM Integration
• Database Automation
• Backup with Point-In-Time Recovery

Page 24: Webinar: Scaling MongoDB

Query Visualization and Optimization

Fast and simple query optimization with the Visual Query Profiler

Page 25: Webinar: Scaling MongoDB

Example Deployment – 12 Servers, Without Ops Manager

• Install, Configure: 150+ steps
• … Error handling, throttling, alerts
• Scale out, move servers, resize oplog, etc.: 10-180+ steps
• Upgrades, downgrades: 100+ steps

Page 26: Webinar: Scaling MongoDB

With Ops Manager

Page 27: Webinar: Scaling MongoDB

Also Available in the Cloud

Cloud Manager allows you to leverage and automate the best practices we’ve learned from thousands of deployments in a comprehensive application that helps you run MongoDB safely and reliably …

in the cloud!

http://cloud.mongodb.com

Page 28: Webinar: Scaling MongoDB

Manual Monitoring Tools

• mongod log file
  – Review log files, or
  – Use mtools to visualize them: http://github.com/rueckstiess/mtools
• Profiler (collection)
  – Must be enabled
• Query engine
  – .explain() is your friend
  – Verbosity modes: queryPlanner, executionStats, allPlansExecution

Page 29: Webinar: Scaling MongoDB

WiredTiger Storage Engine

Page 30: Webinar: Scaling MongoDB

7x - 10x Performance & 50% - 80% Less Storage

• 100% backwards compatible
• Non-disruptive upgrade
• Same data model, query language, ops
• Write performance gains driven by document-level concurrency control
• Storage savings driven by native compression

[Chart: Performance, MongoDB 2.6 vs. MongoDB 3.0/3.2]

Page 31: Webinar: Scaling MongoDB

Vertical Scaling

Page 32: Webinar: Scaling MongoDB

We are Here to Pump you Up

Factors:
  – RAM
  – Disk
  – CPU
  – Network

[Diagram: two replica sets, each with a primary and two secondaries]

Page 33: Webinar: Scaling MongoDB

Before you add hardware …

• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
  – schema and index problems can look like hardware problems
• Tune the Operating System
  – ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
  – ext4 or XFS vs SAN, RAID10, readahead, noatime
• See MongoDB “Production Checklist”
• Heed logfile startup warnings

Page 34: Webinar: Scaling MongoDB

Working Set Exceeds Physical Memory

Page 35: Webinar: Scaling MongoDB

Initial Architecture

4-Way Cluster backed by spinning disk

[Diagram labels: Application / mongos, mongod]

Page 36: Webinar: Scaling MongoDB

Vertical Scaling

Scaling random IOPS with SSDs

[Diagram labels: Application / mongos, mongod, SSD]

Page 37: Webinar: Scaling MongoDB

Horizontal Scaling

Page 38: Webinar: Scaling MongoDB

Horizontal Scaling

Rapidly growing business means more shards

[Diagram labels: Application w/ driver & mongos, mongod, … 16 more shards …]

Page 39: Webinar: Scaling MongoDB

What is a Shard Key?

• Shard key must be indexed
• Shard key is used to partition your collection
• Shard key must exist in every document
• Shard key is immutable
• Shard key values are immutable
• Shard key is used to route requests to shards

See How to Choose a Shard Key: The Card Game
https://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/

Page 40: Webinar: Scaling MongoDB

Shard Key Characteristics

• A good shard key has:
  – sufficient cardinality
  – distributed writes
  – targeted reads ("query isolation")
• Shard key should be in every query, if possible
  – Scatter-gather otherwise
• Choosing a good shard key is important!
  – affects performance and scalability
  – changing it later can be expensive

Page 41: Webinar: Scaling MongoDB

Range-based Sharding

Shard key      Shard key range (one shard per range)
Bagpipes       A - C
Iceberg        D - O
Snow Cone      P - Z
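Range-based routing, and the difference between a targeted query and a scatter-gather query, can be sketched as follows. This is a simplified model, not how mongos is implemented: it routes on the first letter only, the field name `name` stands in for the shard key, and the shard names are hypothetical.

```javascript
// Ranges from the slide, matched on the first letter of the key value.
const chunks = [
  { shard: "shard0", min: "A", max: "C" },
  { shard: "shard1", min: "D", max: "O" },
  { shard: "shard2", min: "P", max: "Z" },
];

// Return the shards a query must visit. `name` is the assumed shard key.
function targetShards(query) {
  if (typeof query.name !== "string" || query.name.length === 0) {
    // No shard key in the query: scatter-gather across every shard.
    return chunks.map((chunk) => chunk.shard);
  }
  const first = query.name[0].toUpperCase();
  return chunks
    .filter((chunk) => first >= chunk.min && first <= chunk.max)
    .map((chunk) => chunk.shard);
}

console.log(targetShards({ name: "Bagpipes" }));  // one shard
console.log(targetShards({ name: "Iceberg" }));   // one shard
console.log(targetShards({ name: "Snow Cone" })); // one shard
console.log(targetShards({ color: "blue" }));     // all shards
```

Queries that include the shard key are sent to a single shard, while the last query has no shard key and must be sent to all three.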

Page 42: Webinar: Scaling MongoDB

Balancing

[Diagram: new shard keys Dates and Dragons arrive; shard key ranges are A - C, D - O, P - Z]

Page 43: Webinar: Scaling MongoDB

Balancing

[Diagram: shard key ranges after balancing are A - De, Df - O, P - Z]

A background process balances data across shards

Page 44: Webinar: Scaling MongoDB

Other Forms of Sharding

There are more advanced types of sharding that are discussed in our sharding webinars.

• Tag-aware, aka zone partitioning, is a special case of range-based sharding that allows for data locality

• Hash-based, aka hash partitioning, uses a hashed value derived from the shard key(s) for assignment
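The idea behind hash-based sharding can be sketched with a toy hash. Two assumptions are worth stating loudly: FNV-1a is used here only as a stand-in (MongoDB's hashed indexes use a different hash function), and the modulo step is a simplification, since a real cluster assigns ranges of hashed values to chunks. The names `fnv1a` and `shardFor` are hypothetical.

```javascript
// FNV-1a, a simple 32-bit string hash used purely for illustration.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Map a shard key value to a shard number via its hash (simplified).
function shardFor(key, shardCount) {
  return fnv1a(String(key)) % shardCount;
}

// Monotonically increasing keys, which would all land on the last
// range under range-based sharding, get scattered by the hash.
const counts = [0, 0, 0];
for (let i = 0; i < 3000; i++) {
  counts[shardFor(`order-${i}`, counts.length)] += 1;
}
console.log(counts);
```

The trade-off is that neighbouring key values no longer live on the same shard, so range queries on the shard key become scatter-gather.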

Page 45: Webinar: Scaling MongoDB

Examples of Scale

Page 46: Webinar: Scaling MongoDB

Cluster, Performance & Data Scale

Cluster Scale
• Entertain Co.: 1400 servers
• Asian Internet Co.: 1000+ servers
• Fed Agency: 250+ servers

Performance Scale
• 250M Ticks / Sec
• 300K+ Ops / Sec
• 500K+ Ops / Sec

Data Scale
• Petabytes
• 10s of billions of objects
• 13B documents