Webinar: Scaling MongoDB

Achieving Scale with MongoDB

David Lutz, Senior Solutions Architect

Agenda

• Optimize for Scale
  – Design the Schema
  – Build Indexes
  – Use Monitoring Tools
  – Use WiredTiger Storage
• Vertical Scaling
• Horizontal Scaling

Don’t Substitute Scaling for Optimization

• Make sure you are solving the right problem

• Remedy schema and index problems first

• We’ll discuss both …

Optimization Tips: Schema Design

Documents are Rich Data Structures

{
  customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    location : [45.123, 47.232],
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  profession : ["banking", "trader"],
  policies : [
    { policy_number : 13, description : "short term", deductible : 500 },
    { policy_number : 14, description : "dental", visits : […] }
  ]
}

Callouts:
• Fields
• Typed field values: String, Number, Geo-Location
• Fields can contain arrays
• Fields can contain an array of sub-documents

The Document Data Model

• Matches Application Objects
  – Eases development
• Flexible
  – Evolves with application
• High performance
  – Designed for access pattern

{
  customer_id : 123,
  first_name : "John",
  last_name : "Smith",
  address : {
    street : "123 Main Street",
    city : "Houston",
    state : "TX",
    zip_code : "77027"
  },
  policies : [
    { policy_number : 13, description : "short term", deductible : 500 },
    { policy_number : 14, description : "dental", visits : […] }
  ]
}

The Importance of Schema Design

• Very different from RDBMS schema design

• With MongoDB Schema:
  – denormalize the data
  – create a schema with prior knowledge of your actual query patterns, then …
  – write simple queries

Real World Example

Product catalog for retailer selling in 20 countries

{
  _id: 375,
  en_US: { name: …, description: …, <etc…> },
  en_GB: { name: …, description: …, <etc…> },
  fr_FR: { name: …, description: …, <etc…> },
  fr_CA: { name: …, description: …, <etc…> },
  de_DE: …,
  <… and so on for other locales …>
}

• What's good about this schema?
  – Each document contains all the data about the product across all possible locales.
  – It is the most efficient way to retrieve all translations of a product in a single query (English, French, German, etc).

Real World Example

But that’s not how the data was accessed!

db.catalog.find( { _id: 375 }, { en_US: true } );
db.catalog.find( { _id: 375 }, { fr_FR: true } );
db.catalog.find( { _id: 375 }, { de_DE: true } );

… and so forth for other locales

The data model did not fit the access pattern.

Real World Example

Inefficient use of resources

Data in BLUE are being used. Data in RED take up memory but are not in demand.



Consequences of Schema Redesign

• Queries induced minimal memory overhead
• 20x as many products fit in RAM at once
• Disk IO utilization reduced
• Application latency reduced

{
  _id: "375-en_GB",
  name: …,
  description: …,
  <… the rest of the document …>
}
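The redesign amounts to storing one document per product and locale instead of one document per product. As a rough sketch only: the helper name splitByLocale and the sample product values below are made up for illustration, and the slides do not show how the actual migration was done.

```javascript
// Hypothetical helper: split one multi-locale product document into
// one document per locale, keyed as "<productId>-<locale>".
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Sample data invented for this sketch (not from the webinar).
const product = {
  _id: 375,
  en_US: { name: "Sneaker", description: "Running shoe" },
  en_GB: { name: "Trainer", description: "Running shoe" },
  fr_FR: { name: "Basket", description: "Chaussure de course" },
};

const perLocaleDocs = splitByLocale(product);
console.log(perLocaleDocs.map((doc) => doc._id));
```

With documents shaped like this, a lookup for one locale only touches that locale's data, for example a find on { _id: "375-en_GB" }.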

Schema Design Patterns

• Pattern: pre-computing interesting quantities, ideally with each write operation

• Pattern: putting unrelated items in different collections to take advantage of indexing

• Anti-pattern: appending to arrays ad infinitum

• Anti-pattern: importing relational schemas (3NF) directly into MongoDB
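One possible way to combine the pre-computing pattern with a bounded array is sketched below. The field names (visit_count, total_cost, recent_visits) and the helper buildVisitUpdate are assumptions for illustration; $inc, $push, $each and $slice are standard MongoDB update operators, but this is not necessarily how the presenter would model it.

```javascript
// Hypothetical update builder: keeps running totals up to date on every
// write ($inc) and caps the embedded array at maxRecent entries ($slice),
// so the array cannot grow ad infinitum.
function buildVisitUpdate(visit, maxRecent = 10) {
  return {
    $inc: { visit_count: 1, total_cost: visit.cost },
    $push: { recent_visits: { $each: [visit], $slice: -maxRecent } },
  };
}

// Sample visit invented for this sketch.
const visitUpdate = buildVisitUpdate({ date: "2016-01-15", cost: 120 });
console.log(JSON.stringify(visitUpdate, null, 2));
```

The resulting object would be passed as the update argument of an update call, so readers get the totals without aggregating at query time.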

Schema Design Resources

• The docs! Data Model Design, Patterns & Examples
  – https://docs.mongodb.org/manual/core/data-model-design/
  – https://docs.mongodb.org/manual/applications/data-models/
  – https://docs.mongodb.org/manual/MongoDB-data-models-guide.pdf

• The blogs! http://blog.mongodb.org "6 Rules of Thumb for Schema Design"
  – Part 1: http://goo.gl/TFJ3dr
  – Part 2: http://goo.gl/qTdGhP
  – Part 3: http://goo.gl/JFO1pI

• Webinars, training, consulting, etc…

Optimization Tips: Indexing

Indexes

• Single biggest tunable performance factor
• Tree-structured references to your documents
• Indexing and schema design go hand in hand


Indexing Mistakes and Their Fixes

• Failing to build necessary indexes
  – Run .explain(), examine slow query log and system.profile collection, download mtools

• Building unnecessary indexes
  – Talk to your application developers about usage

• Running ad-hoc queries in production
  – Use a staging environment, use secondaries

Indexing Strategies

• Create indexes that support your queries!
• Create highly selective indexes
• Eliminate duplicate indexes with compound indexes
  – db.collection.ensureIndex({A:1, B:1, C:1})
  – allows queries using leftmost prefix
• Order index columns to support scans & sorts
• Create indexes that support covered queries
• Prevent collection scans in pre-production environments

db.runCommand( { setParameter: 1, notablescan: 1 } )
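The leftmost-prefix rule can be illustrated with a small sketch. This is a simplified model, not the real query planner, which also weighs sort order, range versus equality predicates and more; the function name usablePrefix is hypothetical.

```javascript
// Simplified model of the leftmost-prefix rule: walk the index keys in
// order and stop at the first key the query does not filter on.
function usablePrefix(indexKeys, queryFields) {
  const wanted = new Set(queryFields);
  const prefix = [];
  for (const key of indexKeys) {
    if (!wanted.has(key)) break; // gap in the prefix: later keys cannot be used
    prefix.push(key);
  }
  return prefix;
}

// Key order of the compound index {A:1, B:1, C:1} from the slide.
const compoundIndex = ["A", "B", "C"];

console.log(usablePrefix(compoundIndex, ["A"]));      // [ 'A' ]
console.log(usablePrefix(compoundIndex, ["B", "A"])); // [ 'A', 'B' ]
console.log(usablePrefix(compoundIndex, ["A", "C"])); // [ 'A' ]
console.log(usablePrefix(compoundIndex, ["B", "C"])); // []
```

Because queries on A alone and on A plus B are already served by the compound index, separate indexes on {A:1} and {A:1, B:1} would be duplicates.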

Indexing Example – Before an Index

db.tweets.explain("executionStats").find( { "user.lang" : "ja" } )

{
  "winningPlan" : {
    "inputStage" : { "stage" : "COLLSCAN" }
  },
  "executionStats" : {
    "nReturned" : 3560,
    "executionTimeMillis" : 56,
    "totalKeysExamined" : 0,
    "totalDocsExamined" : 51428
  }
}

Indexing Example – After an Index

db.tweets.explain("executionStats").find( { "user.lang" : "ja" } )

{
  "winningPlan" : {
    "inputStage" : { "stage" : "IXSCAN", "keyPattern" : { "user.lang" : 1 } }
  },
  "executionStats" : {
    "nReturned" : 3560,
    "executionTimeMillis" : 8,
    "totalKeysExamined" : 3560,
    "totalDocsExamined" : 3560
  }
}

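A quick way to read the two explain outputs is to compare documents examined with documents returned. The sketch below uses the executionStats numbers from the slides; the helper scanEfficiency is hypothetical, and treating totalKeysExamined > 0 as "an index was used" is a rough heuristic rather than an official rule.

```javascript
// Hypothetical helper that summarizes an executionStats sub-document.
function scanEfficiency(stats) {
  return {
    // Close to 1 means almost every examined document was returned.
    docsExaminedPerReturned:
      stats.nReturned > 0 ? stats.totalDocsExamined / stats.nReturned : null,
    // Heuristic only: index keys were touched during execution.
    usedIndex: stats.totalKeysExamined > 0,
  };
}

// Numbers copied from the before/after slides.
const statsBeforeIndex = {
  nReturned: 3560, executionTimeMillis: 56,
  totalKeysExamined: 0, totalDocsExamined: 51428,
};
const statsAfterIndex = {
  nReturned: 3560, executionTimeMillis: 8,
  totalKeysExamined: 3560, totalDocsExamined: 3560,
};

console.log(scanEfficiency(statsBeforeIndex));
console.log(scanEfficiency(statsAfterIndex));
```

Before the index, roughly 14 documents are examined for each one returned; after it, the ratio is exactly 1, and execution time drops from 56 ms to 8 ms.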
Optimization Tips: Monitoring

The Best Way to Run MongoDB

Ops Manager allows you to leverage and automate the best practices we’ve learned from thousands of deployments in a comprehensive application that helps you run MongoDB safely and reliably.

Benefits include:

10x-20x more efficient operations

Complete performance visibility

Assisted performance optimization

Ops Manager Provides:

for Developers
• Visual Query Profiler

for Administrators
• Index Suggestions
• Automated Index Builds
• Monitoring and Alerting

for Operations
• APM Integration
• Database Automation
• Backup with Point-In-Time Recovery

Fast and simple query optimization with the Visual Query Profiler

Query Visualization and Optimization

Example Deployment – 12 Servers

Without Ops Manager
• Install, configure: 150+ steps
• … Error handling, throttling, alerts
• Scale out, move servers, resize oplog, etc.: 10-180+ steps
• Upgrades, downgrades: 100+ steps

With Ops Manager

Also Available in the Cloud

Cloud Manager allows you to leverage and automate the best practices we’ve learned from thousands of deployments in a comprehensive application that helps you run MongoDB safely and reliably … in the cloud!

http://cloud.mongodb.com

Manual Monitoring Tools

mongod: log file, profiler (collection), query engine

Review log files, or use mtools to visualize them: http://github.com/rueckstiess/mtools

.explain() is your friend

• queryPlanner

• executionStats

• allPlansExecution

ENABLE

WiredTiger Storage Engine

7x - 10x Performance & 50% - 80% Less Storage

• 100% backwards compatible

• Non-disruptive upgrade

• Same data model, query language, ops

• WRITE performance gains driven by document-level concurrency control

• Storage savings driven by native compression

[Chart: Performance, MongoDB 3.0/3.2 vs. MongoDB 2.6]

Vertical Scaling

Factors:
– RAM
– Disk
– CPU
– Network

We are Here to Pump you Up

[Diagram: two replica sets, each with one Primary and two Secondaries]

Before you add hardware...

• Make sure you are solving the right scaling problem
• Remedy schema and index problems first
  – schema and index problems can look like hardware problems
• Tune the Operating System
  – ulimits, swap, NUMA, NOOP scheduler with hypervisors
• Tune the IO subsystem
  – ext4 or XFS vs SAN, RAID10, readahead, noatime
• See MongoDB “Production Checklist”
• Heed logfile startup warnings

Working Set Exceeds Physical Memory

Initial Architecture

4-Way Cluster backed by spinning disk

[Diagram: Application / mongos, mongod]

Vertical Scaling

Scaling random IOPS with SSDs

[Diagram: Application / mongos, mongod, SSD]

Horizontal Scaling

Horizontal Scaling

Rapidly growing business means more shards

[Diagram: Application w/ driver & mongos, mongod shards, … 16 more shards …]

What is a Shard Key?

• Shard key must be indexed
• Shard key is used to partition your collection
• Shard key must exist in every document
• Shard key is immutable
• Shard key values are immutable
• Shard key is used to route requests to shards

See How to Choose a Shard Key: The Card Game
https://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/

Shard Key Characteristics

• A good shard key has:
  – sufficient cardinality
  – distributed writes
  – targeted reads ("query isolation")

• Shard key should be in every query, if possible
  – Scatter-gather otherwise

• Choosing a good shard key is important!
  – affects performance and scalability
  – changing it later can be expensive
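The difference between a targeted read and a scatter-gather can be sketched as a simple check on the query. This is a toy model of what mongos decides, assuming that a query containing at least the leading shard key field can be routed to specific shards; the function name routingFor and the sample shard key are hypothetical.

```javascript
// Toy routing model: a query that filters on the leading shard key field
// can be sent to specific shards; anything else goes to every shard.
function routingFor(shardKeyFields, query) {
  const leadingField = shardKeyFields[0];
  const hasLeadingField = Object.prototype.hasOwnProperty.call(query, leadingField);
  return hasLeadingField ? "targeted" : "scatter-gather";
}

// Hypothetical shard key for the customer documents shown earlier.
const customerShardKey = ["customer_id"];

console.log(routingFor(customerShardKey, { customer_id: 123 }));    // targeted
console.log(routingFor(customerShardKey, { last_name: "Smith" }));  // scatter-gather
```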

Range-based Sharding

Shard key range:   A - C      | D - O     | P - Z
Shard key:         Bagpipes   | Iceberg   | Snow Cone

Balancing

New shard keys "Dates" and "Dragons" both fall in the D - O range:

Before:   A - C    | D - O    | P - Z
After:    A - De   | Df - O   | P - Z

Background process balances data across shards
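The range lookup and the effect of balancing can be modelled in a few lines. This is a toy sketch: each range is represented only by its inclusive lower bound and compared case-insensitively as a string, whereas real chunks are [min, max) ranges over shard key values. The shard names are made up.

```javascript
// Toy range router: ranges are sorted by ascending lower bound, and a key
// belongs to the last range whose lower bound is <= the key.
function shardFor(ranges, key) {
  const k = key.toLowerCase();
  let owner = null;
  for (const range of ranges) {
    if (range.min.toLowerCase() <= k) owner = range.shard;
  }
  return owner;
}

const rangesBeforeBalancing = [
  { min: "A", shard: "shard1" }, // A - C
  { min: "D", shard: "shard2" }, // D - O
  { min: "P", shard: "shard3" }, // P - Z
];

const rangesAfterBalancing = [
  { min: "A", shard: "shard1" },  // A - De
  { min: "Df", shard: "shard2" }, // Df - O
  { min: "P", shard: "shard3" },  // P - Z
];

for (const key of ["Bagpipes", "Dates", "Dragons", "Iceberg", "Snow Cone"]) {
  console.log(key, shardFor(rangesBeforeBalancing, key), "->", shardFor(rangesAfterBalancing, key));
}
```

In this model only "Dates" changes owner: it moves from the middle range to the first one when the boundary shifts from D to Df, while "Dragons" stays put.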

Other Forms of Sharding

There are more advanced types of sharding that are discussed in our sharding webinars.

• Tag-aware, aka zone partitioning, is a special case of range-based sharding that allows for data locality

• Hash-based, aka hash partitioning, uses a hashed value derived from the shard key(s) for assignment

Examples of Scale

Cluster, Performance & Data Scale

Cluster Scale
• Entertain Co.: 1400 servers
• Asian Internet Co.: 1000+ servers
• Fed Agency: 250+ servers

Performance Scale
• 250M Ticks / Sec
• 300K+ Ops / Sec
• 500K+ Ops / Sec

Data Scale
• Petabytes
• 10s of billions of objects
• 13B documents
