SoftwarePeople Md Khairul Anam Introduction to Availability & Scalability in MongoDB

Availability

Replica Set – Creation

Replica Set – Initialize

Replica Set – Failure

Replica Set – Failover

Replica Set – Recovery

Replica Set – Recovered

Replica Set Roles

• Heartbeats
• Priority Comparisons
• Optime
• Connections
• Network Partitions

Factors and Conditions that Affect Elections
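As a rough illustration of how these factors can interact, the toy model below picks a new primary only when a majority of the set is reachable, then prefers higher priority and a more recent optime. It is a deliberate simplification and not MongoDB's actual election protocol; the `Member` class, its fields, and `elect_primary` are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    priority: float  # 0 means the member can never become primary
    optime: int      # position of the last operation the member applied
    reachable: bool  # whether heartbeats from this member still arrive


def elect_primary(members):
    """Return the name of the elected member, or None if no election can succeed."""
    visible = [m for m in members if m.reachable]
    # A partition that hides a majority of the set blocks any election.
    if len(visible) * 2 <= len(members):
        return None
    candidates = [m for m in visible if m.priority > 0]
    if not candidates:
        return None
    # Simplification: prefer higher priority, then the most recent optime.
    best = max(candidates, key=lambda m: (m.priority, m.optime))
    return best.name


members = [
    Member("a", priority=1, optime=100, reachable=False),  # failed primary
    Member("b", priority=1, optime=98, reachable=True),
    Member("c", priority=1, optime=100, reachable=True),
]
winner = elect_primary(members)
```

Here two of three members are still visible, so an election is possible, and the member with the more recent optime wins the tie on priority.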

Strong Consistency

Delayed Consistency

Maintenance and Upgrade

• Rolling upgrade/maintenance
  – Start with Secondary
  – Primary last
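The ordering rule above can be sketched in a few lines. This is only an illustration of the sequencing; the member names and the `rolling_upgrade_order` helper are hypothetical, and no real upgrade commands are issued.

```python
def rolling_upgrade_order(members):
    """Order member names so that every secondary comes before the primary.

    members is a list of (name, role) pairs, where role is
    "PRIMARY" or "SECONDARY".
    """
    secondaries = [name for name, role in members if role == "SECONDARY"]
    primaries = [name for name, role in members if role == "PRIMARY"]
    return secondaries + primaries


members = [("node1", "PRIMARY"), ("node2", "SECONDARY"), ("node3", "SECONDARY")]
order = rolling_upgrade_order(members)
```

In practice the primary is usually stepped down (for example with `rs.stepDown()` in the mongo shell) before it is taken offline, so that an already upgraded secondary takes over.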

Replica Set – 1 Data Center

Single datacenter

Single switch & power

Points of failure:
– Power
– Network
– Data center

Automatic recovery of single node crash

Replica Set – 2 Data Centers

Multi data center

DR node for safety

Can’t do a durable multi data center write safely, since only 1 node is in the distant DC

Replica Set – 3 Data Centers

Three data centers

Can survive full data center loss

Can use w = { dc : 2 } to guarantee that a write reaches 2 data centers
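One way to picture the `{ dc : 2 }` rule is as a tagged replica set configuration in which each member carries a `dc` tag and a custom mode under `settings.getLastErrorModes` requires acknowledgment from two distinct `dc` values. The sketch below models that configuration as a plain Python dictionary and checks the rule locally; the host names, the set name `rs0`, the mode name `multiDC`, and the `satisfies_mode` helper are assumptions made for illustration, and nothing here talks to a real server.

```python
# Replica set configuration document with one member per data center.
config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "ny1.example.net:27017", "tags": {"dc": "ny"}},
        {"_id": 1, "host": "sf1.example.net:27017", "tags": {"dc": "sf"}},
        {"_id": 2, "host": "ldn1.example.net:27017", "tags": {"dc": "ldn"}},
    ],
    # Custom write concern mode: acknowledgment from 2 distinct "dc" values.
    "settings": {"getLastErrorModes": {"multiDC": {"dc": 2}}},
}


def satisfies_mode(config, mode, acked_hosts):
    """Check whether the acknowledging hosts cover enough distinct tag values."""
    required = config["settings"]["getLastErrorModes"][mode]
    for tag, count in required.items():
        values = {
            m["tags"][tag]
            for m in config["members"]
            if m["host"] in acked_hosts and tag in m.get("tags", {})
        }
        if len(values) < count:
            return False
    return True


two_dcs = satisfies_mode(
    config, "multiDC", {"ny1.example.net:27017", "sf1.example.net:27017"}
)
one_dc = satisfies_mode(config, "multiDC", {"ny1.example.net:27017"})
```

A client would then request the mode by name in its write concern (`w: "multiDC"`), so the write is acknowledged only once members in two data centers have it.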

Questions?

Scalability

User Growth
– 1995: 0.4% of the world’s population
– Today: 30% of the world is online (~2.2B)

Data Set Growth
– Facebook’s data set is around 100 petabytes
– 4 billion photos taken in the last year (4x a decade ago)

Examining Growth

Read/Write Throughput Exceeds I/O

Working Set: Indexes and Data

Working Set Exceeds Physical Memory

Vertical Scalability (Scale Up)

Horizontal Scalability (Scale Out)

Custom Hardware
– Oracle

Custom Software
– Facebook + MySQL
– Google

MongoDB Auto-Sharding

A data store that is
– Free
– Publicly available
– Open Source (https://github.com/mongodb/mongo)
– Horizontally scalable
– Application independent

Data Store Scalability Solutions

Sharded Cluster Architecture

• Shard is a node of the cluster
• Shard can be a single mongod or a replica set

What is a Shard?

Config Server
– Stores cluster chunk ranges and locations
– Can have only 1 or 3 (production must have 3)
– Not a replica set

Meta Data Storage

Mongos
– Acts as a router / balancer
– No local data (persists to config database)
– Can have 1 or many

Routing and Managing Data

• User defines shard key

• Shard key defines range of data

• Key space is like points on a line

• Range is a segment of that line

Partitioning
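The "points on a line" picture can be sketched with a sorted list of split points: each range is a segment between two neighbouring points, and a key belongs to the segment it falls into. The split points, range names, and `range_for` helper below are made up for illustration; ranges are treated as inclusive of their lower bound and exclusive of their upper bound.

```python
import bisect

# Split points on the key line; they cut it into four ranges:
# (-inf, 100), [100, 200), [200, 300), [300, +inf)
boundaries = [100, 200, 300]
range_names = ["range-0", "range-1", "range-2", "range-3"]


def range_for(key):
    """Return the name of the range (segment of the line) that contains key."""
    return range_names[bisect.bisect_right(boundaries, key)]


low = range_for(50)
edge = range_for(100)   # a key equal to a split point belongs to the upper range
high = range_for(1000)
```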

• Shard key is used to partition your collection

• Shard key must exist in every document

• Shard key must be indexed

• Shard key is used to route requests to shards

What is a Shard Key
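Because the shard key must exist in every document, a quick pre-flight check over sample data can catch documents that could not be placed. The following is a minimal sketch over in-memory dictionaries; `documents_missing_key` and the `user_id` field are hypothetical names, not part of any MongoDB API.

```python
def documents_missing_key(documents, shard_key):
    """Return the documents that lack the shard key field."""
    return [doc for doc in documents if shard_key not in doc]


docs = [{"user_id": 1, "name": "a"}, {"name": "b"}]
missing = documents_missing_key(docs, "user_id")
```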

Shards and Shard Keys

Shard

Shard key range

• Initially 1 chunk

• Default max chunk size: 64 MB

• MongoDB automatically splits & migrates chunks when max reached

Data Distribution
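The split-then-migrate behaviour can be imitated with a toy model: a chunk that grows past the maximum size is cut in two at its median key, and chunks are then moved from the fullest shard to the emptiest one. This is only a sketch of the idea; the real splitting and balancing logic uses its own thresholds, and `split_if_needed` and `balance` are hypothetical helpers.

```python
MAX_CHUNK_BYTES = 64 * 1024 * 1024  # default maximum chunk size from the slide


def split_if_needed(chunk, max_bytes=MAX_CHUNK_BYTES):
    """Split a chunk at its median key when it exceeds max_bytes.

    chunk is a list of (shard_key, size_in_bytes) pairs sorted by key.
    Returns [chunk] if it fits, otherwise the two halves.
    """
    total = sum(size for _, size in chunk)
    if total <= max_bytes or len(chunk) < 2:
        return [chunk]
    middle = len(chunk) // 2
    return [chunk[:middle], chunk[middle:]]


def balance(shards):
    """Move chunks from the fullest shard to the emptiest until counts differ by at most 1."""
    while True:
        most = max(shards, key=lambda s: len(shards[s]))
        least = min(shards, key=lambda s: len(shards[s]))
        if len(shards[most]) - len(shards[least]) <= 1:
            return shards
        shards[least].append(shards[most].pop())


# One chunk holding 100 documents of 1 MB each (100 MB in total).
chunk = [(key, 1024 * 1024) for key in range(100)]
pieces = split_if_needed(chunk)
shards = balance({"shard0": list(pieces), "shard1": []})
```

After the split, each shard ends up holding one 50 MB chunk.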

• Targeted Queries

• Scatter Gather Queries

• Scatter Gather Queries with Sort

Cluster Request Routing
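The three routing patterns can be sketched with an in-memory stand-in for a router: a query that includes the shard key goes to a single shard, any other query is sent to every shard, and a sorted scatter gather merges the per-shard sorted results. The shard names, sample documents, `user_id` shard key, and both helper functions are assumptions made for this sketch and do not reflect how mongos is implemented.

```python
import bisect
import heapq

boundaries = [100, 200]                       # split points on the shard key
chunk_owner = ["shard0", "shard1", "shard2"]  # shard owning each resulting range
data = {
    "shard0": [{"user_id": 5, "age": 40}, {"user_id": 42, "age": 25}],
    "shard1": [{"user_id": 150, "age": 31}],
    "shard2": [{"user_id": 250, "age": 19}, {"user_id": 900, "age": 52}],
}


def target_shards(query, shard_key="user_id"):
    """Targeted query if the shard key is present, otherwise scatter gather."""
    if shard_key in query:
        index = bisect.bisect_right(boundaries, query[shard_key])
        return [chunk_owner[index]]
    return list(data)


def find(query, sort_key=None):
    """Run the query on each targeted shard and combine the results."""
    partial = []
    for shard in target_shards(query):
        docs = [d for d in data[shard]
                if all(d.get(k) == v for k, v in query.items())]
        if sort_key is not None:
            docs.sort(key=lambda d: d[sort_key])  # each shard sorts locally
        partial.append(docs)
    if sort_key is not None:
        # The router merges the already sorted per-shard results.
        return list(heapq.merge(*partial, key=lambda d: d[sort_key]))
    return [d for docs in partial for d in docs]


targeted = find({"user_id": 150})
by_age = find({}, sort_key="age")
```

Targeted queries touch one shard and therefore scale best; scatter gather queries cost one round trip per shard, plus a merge step when a sort is requested.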

Questions?

Thank You