
MongoDB schema design basics


What is document design in MongoDB? This talk covers the history of normalization, how data design changes from a relational to a document model, and basic patterns for handling one-to-many and many-to-many relationships, trees, and stacks.

Page 1: MongoDB schema design basics

open-source, high-performance, document-oriented database

Page 2: MongoDB schema design basics

Schema Design Basics

Alvin Richards
[email protected]

Page 3: MongoDB schema design basics

This talk

Part One

‣  Intro
‣  Terms / Definitions
‣  Getting a flavor
‣  Creating a Schema
‣  Indexes
‣  Evolving the Schema

Part Two

‣  Data modeling
‣  DBRef
‣  Single Table Inheritance
‣  Many – Many
‣  Trees
‣  Lists / Queues / Stacks

Page 4: MongoDB schema design basics

So why model data?

Page 5: MongoDB schema design basics

A brief history of normalization

•  1970 E. F. Codd introduces 1st Normal Form (1NF)

•  1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)

•  1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)

•  2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)

Goals:

•  Avoid anomalies when inserting, updating or deleting

•  Minimize redesign when extending the schema

•  Make the model informative to users

•  Avoid bias towards a particular style of query

* Source: Wikipedia

Page 6: MongoDB schema design basics

Relational made normalized data look like this

Page 7: MongoDB schema design basics

Document databases make normalized data look like this

Page 8: MongoDB schema design basics

Some terms before we proceed

RDBMS            Document DBs

Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking across documents
Partition        Shard
Partition Key    Shard Key

Page 9: MongoDB schema design basics

DB Considerations

How can we manipulate this data?

•  Dynamic Queries
•  Secondary Indexes
•  Atomic Updates
•  Map Reduce

Access patterns?

•  Read / Write Ratio
•  Types of updates
•  Types of queries
•  Data life-cycle

Considerations

•  No Joins
•  Document writes are atomic

Page 10: MongoDB schema design basics

Design Session

Design documents that simply map to your application

post = {author: "kyle",
        date: new Date(),
        text: "my blog post...",
        tags: ["mongodb", "intro"]}

>db.posts.save(post)

Page 11: MongoDB schema design basics

Find the document

>db.posts.find()

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "kyle",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "My first blog",
  tags : [ "mongodb", "intro" ] }

Notes:
•  ID must be unique, but can be anything you'd like
•  MongoDB will generate a default ID if one is not supplied

Page 12: MongoDB schema design basics

Add an index, find via index

Secondary index for "author"

// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})

>db.posts.find({author: 'kyle'})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "kyle", ... }

Page 13: MongoDB schema design basics

Verifying indexes exist

>db.system.indexes.find()

// Index on ID
{ name : "_id_", ns : "test.posts", key : { "_id" : 1 } }

// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"), ns : "test.posts", key : { "author" : 1 }, name : "author_1" }

Page 16: MongoDB schema design basics

Query operators

Conditional operators:
$ne, $in, $nin, $mod, $all, $size, $exists, $type, ..
$lt, $lte, $gt, $gte

// find posts with any tags
>db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with k
>db.posts.find({author: /^k/i })

Counting:
// posts written by mike
>db.posts.find({author: "mike"}).count()
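The prefix query can be checked outside the shell, because the shell's regex literals are ordinary JavaScript regexes. The sketch below uses a made-up author list (an assumption, not data from the talk). It shows that an anchored pattern such as /^k/i selects only names starting with k, whereas a starred pattern such as /^k*/i matches every string, since k* is satisfied by zero characters.

```javascript
// Plain JavaScript, no MongoDB server needed. The author list is hypothetical.
const authors = ["kyle", "Kevin", "mike", "fred"];

// Anchored, case-insensitive prefix match: the name must start with "k".
const startsWithK = authors.filter((a) => /^k/i.test(a));

// "k*" means "zero or more k", so this pattern accepts every string.
const zeroOrMoreK = authors.filter((a) => /^k*/i.test(a));

console.log(startsWithK);        // [ 'kyle', 'Kevin' ]
console.log(zeroOrMoreK.length); // 4
```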

Page 17: MongoDB schema design basics

Extending the Schema

new_comment = {author: "fred",
               date: new Date(),
               text: "super duper"}

new_info = {'$push': {comments: new_comment},
            '$inc': {comments_count: 1}}

>db.posts.update({_id: "..."}, new_info)
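To make the effect of the two update operators concrete, here is a minimal in-memory sketch in plain JavaScript. The helper `applyUpdate` is hypothetical and handles only top-level `$push` and `$inc`; it imitates what the server does to a single document and is not MongoDB's implementation.

```javascript
// Hypothetical helper: imitates top-level $push and $inc on one document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, fine for plain JSON data
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = []; // $push creates a missing array
    if (!Array.isArray(result[field])) {
      throw new Error("$push target is not an array: " + field);
    }
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // $inc treats a missing field as 0
  }
  return result;
}

const post = { author: "kyle", text: "my blog post...", tags: ["mongodb", "intro"] };
const newComment = { author: "fred", text: "super duper" };

const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});

console.log(updated.comments_count, updated.comments.length);
```

Neither `comments` nor `comments_count` existed before the update, which is why the schema can evolve without a migration step.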

Page 18: MongoDB schema design basics

Extending the Schema

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "kyle",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "My first blog",
  tags : [ "mongodb", "intro" ],
  comments_count: 1,
  comments : [
    { author : "fred",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "super duper" }
  ]}

Page 21: MongoDB schema design basics

Extending the Schema

// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})

>db.posts.find({"comments.author": "kyle"})

// find last 5 posts:
>db.posts.find().sort({date:-1}).limit(5)

// most commented post:
>db.posts.find().sort({comments_count:-1}).limit(1)

When sorting, check if you need an index

Page 22: MongoDB schema design basics

Map Reduce

Aggregation and batch manipulation

Collection in, Collection out

Parallel in sharded environments

Page 23: MongoDB schema design basics

Map reduce

mapFunc = function () {
    this.tags.forEach(function (z) { emit(z, {count: 1}); });
}

reduceFunc = function (k, v) {
    var total = 0;
    for (var i = 0; i < v.length; i++) { total += v[i].count; }
    return {count: total};
}

res = db.posts.mapReduce(mapFunc, reduceFunc)

>db[res.result].find()
{ _id : "intro", value : { count : 1 } }
{ _id : "mongodb", value : { count : 1 } }
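The tag count can be traced without a server. The following is a rough in-memory sketch in plain JavaScript: `runMapReduce` is a hypothetical stand-in for the server-side machinery, `emit` is passed in as a callback rather than being a global, and the two sample posts are invented for illustration.

```javascript
// Hypothetical sample data; an array stands in for the posts collection.
const posts = [
  { author: "kyle", tags: ["mongodb", "intro"] },
  { author: "mike", tags: ["mongodb"] },
];

// Map step: emit one {count: 1} per tag of a document.
function mapFunc(doc, emit) {
  doc.tags.forEach((tag) => emit(tag, { count: 1 }));
}

// Reduce step: sum the counts emitted for one key.
function reduceFunc(key, values) {
  let total = 0;
  for (const v of values) total += v.count;
  return { count: total };
}

// Hypothetical driver: group emitted values by key, then reduce each group.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  return [...groups].map(([key, values]) => ({
    _id: key,
    // Like MongoDB, skip reduce when a key has a single value.
    value: values.length === 1 ? values[0] : reduce(key, values),
  }));
}

const tagCounts = runMapReduce(posts, mapFunc, reduceFunc);
console.log(tagCounts);
```

The reduce function returns the same shape that the map step emits, which matters because a real server may call reduce repeatedly on partial results.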

Page 24: MongoDB schema design basics

Review So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Page 25: MongoDB schema design basics

Wordnik

9B records, 100M queries / week, 1.2TB

{
  entry : {
    header: { id: 0, headword: "m", sourceDictionary: "GCide",
              textProns : [ {text: "(em)", seq:0} ],
              syllables: [ {id: 0, text: "m"} ],
              sourceDictionary: "1913 Webster", headWord: "m", id: 1,
              definitions: [ {text: "M, the thirteenth letter..."},
                             {text: "As a numeral, M stands for 1000"}] }
  }
}

Page 26: MongoDB schema design basics

Review So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more

Observations:
- Using Rich Documents works well
- Simplify relations by embedding them
- Iterative development is easy with MongoDB

Page 27: MongoDB schema design basics
Page 28: MongoDB schema design basics

Single Table Inheritance

>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}

// find shapes where radius > 0
>db.shapes.find({radius: {$gt: 0}})

// create index
>db.shapes.ensureIndex({radius: 1})

Page 31: MongoDB schema design basics

One to Many

- Embedded Array / Array Keys
  - $slice operator to return a subset of the array
  - hard to find the latest comments across all documents

- Embedded tree
  - Single document
  - Natural
  - Hard to query

- Normalized (2 collections)
  - most flexible
  - more queries

Page 32: MongoDB schema design basics

Many - Many

Example:

- Product can be in many categories
- Category can have many products

Products
- product_id

Category
- category_id

Prod_Categories
- id
- product_id
- category_id

Page 36: MongoDB schema design basics

Many - Many

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a") ]}

// All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})

// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

Page 39: MongoDB schema design basics

Alternative

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Sumatra Dark Roast",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af") ]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Indonesia"}

// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product (two steps: fetch the product, then its categories)
>product = db.products.findOne({_id : some_id})
>db.categories.find({_id : {$in : product.category_ids}})
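The one-sided design, where only products hold category ids, can be sketched in memory to show why one direction needs a single lookup and the other needs two. This is plain JavaScript with arrays standing in for collections; the short string ids, the second product, and the extra categories are assumptions made for readability.

```javascript
// Hypothetical data: only products store the relationship.
const products = [
  { _id: "p1", name: "Sumatra Dark Roast", category_ids: ["c1", "c2"] },
  { _id: "p2", name: "Java Light Roast", category_ids: ["c1"] },
];
const categories = [
  { _id: "c1", name: "Indonesia" },
  { _id: "c2", name: "Dark Roast" },
  { _id: "c3", name: "Decaf" },
];

// One step, like find({category_ids: id}): a scalar matched against an
// array field tests membership.
function productsInCategory(categoryId) {
  return products.filter((p) => p.category_ids.includes(categoryId));
}

// Two steps, like findOne on products followed by find with $in on categories.
function categoriesOfProduct(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsInCategory("c1").map((p) => p.name));
console.log(categoriesOfProduct("p1").map((c) => c.name));
```

The trade-off is one extra round trip for the second query in exchange for keeping the relationship in a single place, so an update never has to touch both collections.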

Page 40: MongoDB schema design basics

Trees

Full Tree in Document

{ comments: [
    { author: "rpb", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ]}
]}

Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB limit
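The "hard to search" drawback is easiest to see in code: finding a comment by author means walking the nested `replies` arrays in the application rather than issuing one indexed query. Below is a small recursive sketch in plain JavaScript; the helper `findByAuthor` and the comment texts are hypothetical.

```javascript
// Hypothetical post with a full comment tree embedded in one document.
const post = {
  comments: [
    {
      author: "rpb",
      text: "first",
      replies: [{ author: "Fred", text: "reply", replies: [] }],
    },
  ],
};

// Depth-first walk; records the index path to each matching comment.
function findByAuthor(comments, author, path = []) {
  const hits = [];
  comments.forEach((comment, i) => {
    const here = [...path, i];
    if (comment.author === author) hits.push({ path: here, text: comment.text });
    hits.push(...findByAuthor(comment.replies, author, here));
  });
  return hits;
}

console.log(findByAuthor(post.comments, "Fred"));
```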

Page 41: MongoDB schema design basics

Trees

Parent Links
- Each node is stored as a document
- Contains the id of the parent

Child Links
- Each node contains the ids of its children
- Can support graphs (multiple parents / children)

Page 44: MongoDB schema design basics

Array of Ancestors
- Store Ancestors of a node

{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }

// find all descendants of b:
>db.tree2.find({ancestors: 'b'})

// find all ancestors of f:
>ancestors = db.tree2.findOne({_id: 'f'}).ancestors
>db.tree2.find({_id: { $in : ancestors}})
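Both tree queries reduce to array membership tests, which the following in-memory sketch spells out. It is plain JavaScript over the same seven nodes, with an array in place of the tree2 collection; the helper names `descendantsOf` and `ancestorsOf` are made up for this illustration.

```javascript
// The seven nodes from the slide; an array stands in for the collection.
const tree = [
  { _id: "a" },
  { _id: "b", ancestors: ["a"], parent: "a" },
  { _id: "c", ancestors: ["a", "b"], parent: "b" },
  { _id: "d", ancestors: ["a", "b"], parent: "b" },
  { _id: "e", ancestors: ["a"], parent: "a" },
  { _id: "f", ancestors: ["a", "e"], parent: "e" },
  { _id: "g", ancestors: ["a", "b", "d"], parent: "d" },
];

// Like find({ancestors: id}): every node whose ancestors array contains id.
function descendantsOf(id) {
  return tree.filter((n) => (n.ancestors || []).includes(id)).map((n) => n._id);
}

// Like findOne(...).ancestors followed by find({_id: {$in: ancestors}}).
function ancestorsOf(id) {
  const node = tree.find((n) => n._id === id);
  const ids = (node && node.ancestors) || [];
  return tree.filter((n) => ids.includes(n._id)).map((n) => n._id);
}

console.log(descendantsOf("b")); // [ 'c', 'd', 'g' ]
console.log(ancestorsOf("f"));   // [ 'a', 'e' ]
```

The cost of this layout is on the write side: moving a subtree means rewriting the ancestors array of every node beneath it.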

Page 45: MongoDB schema design basics

findAndModify Queue example

// Example: find the highest priority job and mark it as in progress

job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true, started: new Date()}},
    new: true})
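The select-sort-update sequence that findAndModify performs as one atomic step can be imitated in memory. The sketch below is plain JavaScript with an invented job list; `claimNextJob` is a hypothetical helper, and it is safe here only because nothing else runs between the lookup and the update, which is the guarantee the server command provides.

```javascript
// Hypothetical job queue; an array stands in for the jobs collection.
const jobs = [
  { _id: 1, priority: 1, inprogress: false },
  { _id: 2, priority: 5, inprogress: false },
  { _id: 3, priority: 9, inprogress: true },
];

// Pick the highest-priority idle job, mark it, and return the updated job
// (the "new: true" behaviour). Returns null when nothing is waiting.
function claimNextJob(queue, now = new Date()) {
  const waiting = queue
    .filter((job) => job.inprogress === false)   // query
    .sort((a, b) => b.priority - a.priority);    // sort: {priority: -1}
  if (waiting.length === 0) return null;
  const job = waiting[0];
  job.inprogress = true;                          // update: $set
  job.started = now;
  return job;
}

const firstJob = claimNextJob(jobs);
const secondJob = claimNextJob(jobs);
const thirdJob = claimNextJob(jobs);
console.log(firstJob._id, secondJob._id, thirdJob);
```

Two workers calling a non-atomic version of this against a shared store could both claim the same job, which is the race the single server-side command avoids.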

Page 46: MongoDB schema design basics

Cool Stuff
- Aggregation
- Capped collections
- GridFS
- Geo

Page 47: MongoDB schema design basics

Learn More

•  Kyle's presentation + video:
   http://www.slideshare.net/kbanker/mongodb-schema-design
   http://www.blip.tv/file/3704083

•  Dwight's presentation:
   http://www.slideshare.net/mongosf/schema-design-with-mongodb-dwight-merriman

•  Documentation
   Trees: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB
   Queues: http://www.mongodb.org/display/DOCS/findandmodify+Command
   Aggregation: http://www.mongodb.org/display/DOCS/Aggregation
   Capped Col.: http://www.mongodb.org/display/DOCS/Capped+Collections
   Geo: http://www.mongodb.org/display/DOCS/Geospatial+Indexing
   GridFS: http://www.mongodb.org/display/DOCS/GridFS+Specification

Page 48: MongoDB schema design basics

Thank You :-)

Page 49: MongoDB schema design basics

Download MongoDB

http://www.mongodb.org

and let us know what you think @mongodb

Page 50: MongoDB schema design basics

DBRef

DBRef {$ref: collection, $id: id_value}

- Think URL
- YDSMV: your driver support may vary

Sample Schema:
nr = {note_refs: [{"$ref" : "notes", "$id" : 5}, ... ]}

Dereferencing:
nr.note_refs.forEach(function(r) {
    printjson(db[r.$ref].findOne({_id: r.$id}));
});
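A DBRef is only a (collection name, id) pair, so dereferencing is a lookup in the named collection. The sketch below shows that in plain JavaScript with a hypothetical in-memory `db` object and invented notes; a real driver may or may not resolve references automatically.

```javascript
// Hypothetical in-memory "database": collection name -> array of documents.
const db = {
  notes: [
    { _id: 5, text: "buy milk" },
    { _id: 7, text: "call Fred" },
  ],
};

// A document holding references instead of embedded notes.
const nr = {
  note_refs: [
    { $ref: "notes", $id: 5 },
    { $ref: "notes", $id: 7 },
    { $ref: "notes", $id: 99 }, // dangling reference: nothing enforces integrity
  ],
};

// Resolve one reference; returns null when the target does not exist.
function deref(ref) {
  const collection = db[ref.$ref] || [];
  return collection.find((doc) => doc._id === ref.$id) || null;
}

const resolved = nr.note_refs.map(deref);
console.log(resolved);
```

Each reference costs a separate lookup, and a dangling one resolves to nothing, so the application has to handle the null case.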

Page 51: MongoDB schema design basics

BSON

MongoDB stores data in BSON internally

Lightweight, Traversable, Efficient encoding

Typed: boolean, integer, float, date, string, binary, array...