MongoDB Tokyo Design

Alvin Richards, [email protected]
Director, 10gen Inc.
@jonnyeight

Basic Application & Schema Design

Topics

Schema design is easy!
• Data as Objects in code

Common patterns
• Single table inheritance
• One-to-Many & Many-to-Many
• Trees
• Queues

Use MongoDB with your language

10gen Supported Drivers
• Ruby, Python, Perl, PHP, JavaScript
• Java, C/C++, C#, Scala
• Erlang, Haskell

Object Data Mappers
• Morphia - Java
• Mongoid, MongoMapper - Ruby

Community Drivers
• F#, Smalltalk, Clojure, Go, Groovy

So today’s example will use...

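Design your objects in your code - Java using Driver

import com.mongodb.*;
import java.util.*;

// Get a connection to the "blogs" database and a collection in it
// (the collection name "posts" is assumed, to match the shell examples)
DBCollection coll = new Mongo().getDB("blogs").getCollection("posts");

// Create the Object
Map<String, Object> obj = new HashMap<String, Object>();
obj.put("author", "Hergé");
obj.put("text", "Destination Moon");
obj.put("date", new Date());

// Insert the object into MongoDB
coll.insert(new BasicDBObject(obj));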

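Design your objects in your code - Java using Object Data Mapper

// Use Morphia annotations
@Entity
class Blog {
    @Id ObjectId id;       // unique _id, so one author can write many posts
    String author;
    @Indexed Date date;
    String text;

    Blog() {}              // Morphia needs a no-arg constructor
    Blog(String author, Date date, String text) {
        this.author = author; this.date = date; this.text = text;
    }
}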

Design your objects in your code - Java using Object Data Mapper

// Create the data store for the "blogs" database
Datastore ds = new Morphia().createDatastore(new Mongo(), "blogs");

// Create the Object
Blog entry = new Blog("Hergé", new Date(), "Destination Moon");

// Insert object into MongoDB
ds.save(entry);

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

[Diagrams: Schema Design - Relational Database; MongoDB; MongoDB embedding; MongoDB linking]

Design Session

Design documents that simply map to your application

> post = {author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: ["comic", "adventure"]}

> db.posts.save(post)

> db.posts.find()

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), author: "Hergé", date: ISODate("2011-09-18T09:56:06.298Z"), text: "Destination Moon", tags: [ "comic", "adventure" ] }

Notes:
• _id must be unique, but can be anything you’d like
• MongoDB will generate a default _id (an ObjectId) if one is not supplied

Find the document

Secondary index for “author”

// 1 means ascending, -1 means descending

> db.posts.ensureIndex({author: 1})

> db.posts.find({author: 'Hergé'})
{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"), date: ISODate("2011-09-18T09:56:06.298Z"), author: "Hergé", ... }

Add an index, find via index

Examine the query plan

> db.posts.find({author: "Hergé"}).explain()
{
    "cursor" : "BtreeCursor author_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 5,
    "indexBounds" : {
        "author" : [ [ "Hergé", "Hergé" ] ]
    }
}

Query operators

Conditional operators: $ne, $in, $nin, $mod, $all, $size, $exists, $type, $lt, $lte, $gt, $gte, ...

// find posts that have a tags field
> db.posts.find({tags: {$exists: true}})



Regular expressions:

// posts where author starts with h (case-insensitive)
> db.posts.find({author: /^h/i })

Counting:

// number of posts written by Hergé
> db.posts.find({author: "Hergé"}).count()
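The operators above share one rule that is easy to miss: a condition on an array field such as tags matches when any element satisfies it. The following is a minimal in-memory sketch, not MongoDB's implementation; `matchesField` and `find` are hypothetical helpers that support only equality, regular expressions, $exists, $gt and $in on top-level fields, and `find(...).length` plays the role of count().

```javascript
// Hypothetical helper (not MongoDB's implementation): tests one field value
// against a tiny subset of the query language.
function matchesField(value, cond) {
  // A condition on an array field matches if any element satisfies it.
  const candidates = Array.isArray(value) ? value : [value];

  if (cond instanceof RegExp) {
    return candidates.some((v) => typeof v === "string" && cond.test(v));
  }
  if (cond !== null && typeof cond === "object") {
    // Every operator in {op: arg, ...} must hold.
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists":
          return (value !== undefined) === arg;
        case "$gt":
          return candidates.some((v) => v !== undefined && v > arg);
        case "$in":
          return candidates.some((v) => arg.includes(v));
        default:
          throw new Error(`operator ${op} is not supported in this sketch`);
      }
    });
  }
  // Plain value: equal to the field, or to any element of an array field.
  return candidates.some((v) => v === cond);
}

// Returns the documents for which every field condition holds.
function find(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, cond]) => matchesField(doc[field], cond))
  );
}

const posts = [
  { _id: 1, author: "Hergé", tags: ["comic", "adventure"] },
  { _id: 2, author: "Kyle" },
];

console.log(find(posts, { tags: { $exists: true } }).map((p) => p._id)); // [ 1 ]
console.log(find(posts, { author: /^h/i }).map((p) => p._id));           // [ 1 ]
console.log(find(posts, { tags: "comic" }).length);                       // 1
```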

Extending the Schema

> new_comment = {author: "Kyle", date: new Date(), text: "great book"}

> db.posts.update({text: "Destination Moon"}, {"$push": {comments: new_comment}, "$inc": {comments_count: 1}})

> db.posts.find({_id: ObjectId("4c4ba5c0672c685e5e8aabf3")})

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"), author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), text : "Destination Moon", tags : [ "comic", "adventure" ], comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ], comments_count: 1 }
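The update above relies on two modifiers: $push appends to an array (creating it if the field is missing) and $inc adds to a number (treating a missing field as 0). A minimal in-memory sketch of the resulting document shape follows; `applyUpdate` is a hypothetical helper and, unlike the server, it gives no atomicity guarantee.

```javascript
// Hypothetical sketch of how $push and $inc change a document.
// Returns a new object and leaves the input untouched.
function applyUpdate(doc, update) {
  const result = { ...doc };
  for (const [field, value] of Object.entries(update.$push || {})) {
    // $push appends, creating the array when the field is missing.
    result[field] = [...(result[field] || []), value];
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    // $inc adds, treating a missing field as 0.
    result[field] = (result[field] || 0) + amount;
  }
  return result;
}

const post = { _id: 1, author: "Hergé", text: "Destination Moon" };
const newComment = { author: "Kyle", text: "great book" };
const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(updated.comments.length, updated.comments_count); // 1 1
```

Keeping comments_count alongside the array is what lets the "most commented post" query sort on a plain number.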

Extending the Schema

// create index on nested documents:
> db.posts.ensureIndex({"comments.author": 1})

> db.posts.find({"comments.author":"Kyle"})
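Dot notation such as "comments.author" reaches into every element of the comments array, which is why a single index on it can serve the query above (MongoDB builds it as a multikey index, with a key per element value). A hypothetical in-memory helper, `valuesAtPath`, that collects the values a dotted path sees:

```javascript
// Hypothetical helper: returns every value reachable by a dotted path,
// stepping into each element whenever the path meets an array.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const node of current) {
      if (node === null || typeof node !== "object") continue;
      const child = node[key];
      if (child === undefined) continue;
      if (Array.isArray(child)) next.push(...child); // fan out over array elements
      else next.push(child);
    }
    current = next;
  }
  return current;
}

const post = {
  author: "Hergé",
  comments: [
    { author: "Kyle", text: "great book" },
    { author: "James", text: "agreed" },
  ],
};
console.log(valuesAtPath(post, "comments.author")); // [ 'Kyle', 'James' ]
// {"comments.author": "Kyle"} matches when any of these values equals "Kyle".
console.log(valuesAtPath(post, "comments.author").includes("Kyle")); // true
```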




// find the 5 most recent posts:
> db.posts.find().sort({date: -1}).limit(5)

// most commented post:
> db.posts.find().sort({comments_count: -1}).limit(1)

When sorting, check whether you need an index: without a usable index, MongoDB has to sort the matching documents in memory


Common Patterns

Inheritance

Single Table Inheritance - RDBMS

shapes table

id  type    area  radius  length  width
1   circle  3.14  1
2   square  4             2
3   rect    10            5       2

Single Table Inheritance - MongoDB

> db.shapes.find()
{ _id: "1", type: "circle", area: 3.14, radius: 1 }
{ _id: "2", type: "square", area: 4, length: 2 }
{ _id: "3", type: "rect", area: 10, length: 5, width: 2 }

missing values not stored!


// find shapes where radius > 0
> db.shapes.find({radius: {$gt: 0}})

// create a sparse index
> db.shapes.ensureIndex({radius: 1}, {sparse: true})

sparse index: only documents that have the field are indexed!
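The difference between a regular and a sparse index can be sketched in memory. A regular index stores documents that lack the field under a null key, so {radius: 1} holds an entry for every shape, whereas the sparse version holds only the circle. `buildIndex` and `compareKeys` are hypothetical helpers, and the array of entries is a simplification of a real B-tree.

```javascript
// Orders index keys with nulls first, then by value.
function compareKeys(a, b) {
  if (a === b) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a < b ? -1 : 1;
}

// Hypothetical sketch: builds the entries of a single-field index.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) continue; // sparse: skip documents without the field
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  // Keep entries ordered by key, as a B-tree index would.
  return entries.sort((x, y) => compareKeys(x.key, y.key));
}

const shapes = [
  { _id: "1", type: "circle", area: 3.14, radius: 1 },
  { _id: "2", type: "square", area: 4, length: 2 },
  { _id: "3", type: "rect", area: 10, length: 5, width: 2 },
];
console.log(buildIndex(shapes, "radius").length);                   // 3
console.log(buildIndex(shapes, "radius", { sparse: true }).length); // 1
```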

One to Many

One to Many relationships can specify
• degree of association between objects
• containment
• life-cycle

One to Many - Embedded Array
- $slice operator to return a subset of comments
- some queries are harder, e.g. finding the latest comments across all blogs

blogs: { author : "Hergé", date : ISODate("2011-09-18T09:56:06.298Z"), comments : [ { author : "Kyle", date : ISODate("2011-09-19T09:56:06.298Z"), text : "great book" } ] }
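The $slice operator limits how many embedded comments come back, e.g. a projection such as db.blogs.find({}, {comments: {$slice: -2}}) for the last two. A minimal in-memory sketch of its positive/negative semantics, using hypothetical helpers `sliceArray` and `projectComments` and made-up comments:

```javascript
// Hypothetical helper mimicking {$slice: n} on one array:
// a positive n keeps the first n elements, a negative n keeps the last |n|.
function sliceArray(arr, n) {
  return n >= 0 ? arr.slice(0, n) : arr.slice(n);
}

// Returns a copy of the blog with only part of its embedded comments.
function projectComments(blog, n) {
  return { ...blog, comments: sliceArray(blog.comments, n) };
}

const blog = {
  author: "Hergé",
  comments: [
    { author: "Kyle", text: "great book" },
    { author: "James", text: "agreed" },
    { author: "Jim", text: "my favourite" },
  ],
};
console.log(projectComments(blog, -2).comments.map((c) => c.author)); // [ 'James', 'Jim' ]
```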

One to Many - Normalized (2 collections)
- most flexible
- more queries

blogs: { _id: 1000, author: "Hergé", text: "Destination Moon", date: ISODate("2011-09-18T09:56:06.298Z"), comments: [ {comment: 1} ] }

comments: { _id: 1, blog: 1000, author: "Kyle", date: ISODate("2011-09-19T09:56:06.298Z") }

> blog = db.blogs.findOne({text: "Destination Moon"});
> db.comments.find({blog: blog._id});
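With the normalized layout the application resolves the relationship itself, using two queries. A minimal in-memory sketch, with plain arrays as hypothetical stand-ins for the blogs and comments collections (the second blog id 2000 and the extra comments are invented for illustration):

```javascript
// Hypothetical stand-ins for the two collections.
const blogs = [{ _id: 1000, author: "Hergé", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Kyle", text: "great book" },
  { _id: 2, blog: 1000, author: "James", text: "agreed" },
  { _id: 3, blog: 2000, author: "Kyle", text: "another post" },
];

// Like findOne: first match or null.
function findOne(collection, predicate) {
  return collection.find(predicate) || null;
}

// Query 1: fetch the blog.
const blog = findOne(blogs, (b) => b.text === "Destination Moon");
// Query 2: fetch the comments that point back at it.
const blogComments = comments.filter((c) => c.blog === blog._id);
console.log(blogComments.map((c) => c._id)); // [ 1, 2 ]
```

The second round trip is the "more queries" cost listed above; in exchange, comments can be queried on their own, e.g. the latest comments across all blogs.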

One to Many - patterns

- Embedded Array / Array Keys
- Normalized

Many - Many

Example:
- Product can be in many categories
- Category can have many products


products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
{ _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }
{ _id: 30, name: "movie", product_ids: [ 10 ] }


// All categories for a given product
> db.categories.find({product_ids: 10})

Many - Many: Alternative

products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

categories:
{ _id: 20, name: "adventure" }

// All products for a given category
> db.products.find({category_ids: 20})


// All categories for a given product
> product = db.products.findOne({_id: some_id})
> db.categories.find({_id: {$in: product.category_ids}})
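In the alternative layout only products store the relationship, so the two directions need different queries: an array match for "products in a category" and a two-step $in lookup for "categories of a product". A minimal in-memory sketch with hypothetical data (product 11 and the category names are illustrative):

```javascript
// Hypothetical data: only products hold the links (category_ids).
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Product 11", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "movie" },
];

// Like db.products.find({category_ids: 20}): an array field matches any element.
const inAdventure = products.filter((p) => p.category_ids.includes(20));

// Like findOne on the product, then db.categories.find({_id: {$in: ...}}).
const product = products.find((p) => p._id === 10);
const productCategories = categories.filter((c) => product.category_ids.includes(c._id));

console.log(inAdventure.map((p) => p._id));        // [ 10, 11 ]
console.log(productCategories.map((c) => c.name)); // [ 'adventure', 'movie' ]
```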


Trees

Hierarchical information

Trees: Full Tree in Document

{ comments: [ { author: "Kyle", text: "...", replies: [ { author: "James", text: "...", replies: [] } ] } ] }

Pros: Single Document, Performance, Intuitive

Cons: Hard to search, Partial Results, 16MB document size limit
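Why a full tree in one document is hard to search: a dotted path such as comments.replies.author reaches only one fixed depth, so finding every comment by one author means walking the nested replies in application code. A minimal sketch with a hypothetical `findByAuthor` helper and made-up comment texts:

```javascript
// Hypothetical helper: depth-first walk over nested replies,
// collecting the text of every comment written by `author`.
function findByAuthor(comments, author, found = []) {
  for (const c of comments) {
    if (c.author === author) found.push(c.text);
    findByAuthor(c.replies || [], author, found);
  }
  return found;
}

const post = {
  comments: [
    { author: "Kyle", text: "first", replies: [
      { author: "James", text: "reply", replies: [
        { author: "Kyle", text: "reply to James", replies: [] },
      ] },
    ] },
  ],
};
console.log(findByAuthor(post.comments, "Kyle")); // [ 'first', 'reply to James' ]
```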

Array of Ancestors
- Store all ancestors of a node

{ _id: "a" }
{ _id: "b", thread: [ "a" ], replyTo: "a" }
{ _id: "c", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "d", thread: [ "a", "b" ], replyTo: "b" }
{ _id: "e", thread: [ "a" ], replyTo: "a" }
{ _id: "f", thread: [ "a", "e" ], replyTo: "e" }

// find all messages that have "b" as an ancestor (the thread below "b")
> db.msg_tree.find({thread: "b"})

(Diagram: a is the root; b and e reply to a; c and d reply to b; f replies to e)

// find all messages that are direct replies to "b"

> db.msg_tree.find({replyTo: "b"})


// find all ancestors of f:
> threads = db.msg_tree.findOne({_id: "f"}).thread
> db.msg_tree.find({_id: {$in: threads}})
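The array-of-ancestors queries can be sketched in memory. The hypothetical `reply` helper shows how thread is maintained: a new reply copies its parent's thread and appends the parent's _id. An extra message g (a reply to c, not part of the original data) is added so that the subtree query and the direct-replies query give different answers.

```javascript
// Hypothetical helper: adds a reply whose thread is the parent's
// ancestors plus the parent itself.
function reply(tree, id, parentId) {
  const parent = tree.find((m) => m._id === parentId);
  tree.push({ _id: id, thread: [...(parent.thread || []), parentId], replyTo: parentId });
}

const msgTree = [{ _id: "a" }];
reply(msgTree, "b", "a");
reply(msgTree, "c", "b");
reply(msgTree, "d", "b");
reply(msgTree, "e", "a");
reply(msgTree, "f", "e");
reply(msgTree, "g", "c"); // hypothetical extra message, one level below c

// Like find({thread: "b"}): every message below "b", at any depth.
const underB = msgTree.filter((m) => (m.thread || []).includes("b")).map((m) => m._id);
// Like find({replyTo: "b"}): only the direct replies to "b".
const repliesToB = msgTree.filter((m) => m.replyTo === "b").map((m) => m._id);
// Like the two-step lookup: read f's thread, then fetch those ids ($in).
const threadOfF = msgTree.find((m) => m._id === "f").thread;
const ancestorsOfF = msgTree.filter((m) => threadOfF.includes(m._id)).map((m) => m._id);

console.log(underB);       // [ 'c', 'd', 'g' ]
console.log(repliesToB);   // [ 'c', 'd' ]
console.log(ancestorsOfF); // [ 'a', 'e' ]
```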


Trees as Paths

Store hierarchy as a path expression
- Separate each node by a delimiter, e.g. "/"
- Use regular expression queries to find parts of a tree

{ comments: [ { author: "Kyle", text: "initial post", path: "" }, { author: "Jim", text: "jim’s comment", path: "jim" }, { author: "Kyle", text: "Kyle’s reply to Jim", path: "jim/kyle" } ] }

// Find the conversations Jim was part of
> db.posts.find({"comments.path": /^jim/i})
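The prefix query can be sketched in memory over the comments themselves. Note that /^jim/i also matches a path such as "jimmy"; the hypothetical `subtree` helper below anchors the prefix on the "/" delimiter to avoid that, and the "jimmy" comment is invented to show the difference. It also drops the i flag, since MongoDB's documentation notes that case-insensitive regular expressions generally cannot use an index efficiently, while a case-sensitive prefix expression (^) can.

```javascript
// Hypothetical comments using the path pattern, plus an extra "jimmy" entry.
const comments = [
  { author: "Kyle", text: "initial post", path: "" },
  { author: "Jim", text: "jim's comment", path: "jim" },
  { author: "Kyle", text: "Kyle's reply to Jim", path: "jim/kyle" },
  { author: "Jimmy", text: "unrelated thread", path: "jimmy" },
];

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Match the node itself and everything below it: the prefix must be
// followed by the "/" delimiter or by the end of the path.
function subtree(prefix) {
  const re = new RegExp(`^${escapeRegExp(prefix)}(/|$)`);
  return comments.filter((c) => re.test(c.path)).map((c) => c.author);
}

console.log(subtree("jim"));                                      // [ 'Jim', 'Kyle' ]
console.log(comments.filter((c) => /^jim/i.test(c.path)).length); // 3, includes "jimmy"
```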

Queue
• Need to maintain order and state
• Ensure that updates are atomic

db.jobs.save({ inprogress: false, priority: 1, ... });

// find highest priority job and mark as in-progress
job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true, started: new Date()}},
    new: true
})


Queue

{ inprogress: true, priority: 1, started: ISODate("2011-09-18T09:56:06.298Z"), ... }

(inprogress updated, started added)
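The claim step can be sketched in memory to show what findAndModify selects and returns. `claimJob` is a hypothetical helper: it filters, sorts by priority descending and sets the two fields. It is single-threaded, whereas the point of findAndModify is that the server performs the find and the update atomically on one document, so two workers cannot claim the same job.

```javascript
// Hypothetical sketch of the findAndModify call on an in-memory job list.
function claimJob(jobs, now = new Date()) {
  const candidates = jobs
    .filter((j) => !j.inprogress)                // query: {inprogress: false}
    .sort((a, b) => b.priority - a.priority);    // sort: {priority: -1}
  if (candidates.length === 0) return null;      // nothing to claim
  const job = candidates[0];
  job.inprogress = true;                         // $set: {inprogress: true, ...
  job.started = now;                             //        started: new Date()}
  return job;                                    // new: true returns the updated job
}

const jobs = [
  { _id: 1, inprogress: false, priority: 1 },
  { _id: 2, inprogress: false, priority: 5 },
];
console.log(claimJob(jobs)._id); // 2
console.log(claimJob(jobs)._id); // 1
console.log(claimJob(jobs));     // null
```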

Summary

Schema design is different in MongoDB

Basic data design principles stay the same

Focus on how the application manipulates data

Rapidly evolve schema to meet your requirements

Enjoy your new freedom, use it wisely :-)

@mongodb

conferences, appearances, and meetups: http://www.10gen.com/events

http://bit.ly/mongo> Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

[email protected]
