07.2-Mongo DB 1


  • 7/24/2019 07.2-Mongo DB 1

    1/57

    Principles and Techniques of DBMS

MongoDB 1

    Haopeng Chen

REliable, INtelligent and Scalable Systems Group (REINS)

    Shanghai Jiao Tong University

    Shanghai, China

    http://reins.se.sjtu.edu.cn/~chenhp

    e-mail: [email protected]


Overview

MongoDB specification

Basic concepts of MongoDB

Basic operations with the shell

Accessing MongoDB in Java


MongoDB specification

MongoDB is a document-oriented database, not a relational one, which makes scaling out easier.

It can balance data and load across a cluster, redistributing documents automatically.

It supports MapReduce and other aggregation tools, and supports generic secondary indexes.

MongoDB can handle some administration by itself: if a master server goes down, MongoDB can automatically fail over to a backup slave and promote the slave to master.


Basic concepts of MongoDB

A document is the basic unit of data for MongoDB.

A collection can be thought of as the schema-free equivalent of a table; the documents in the same collection can have different shapes or types.

Every document has a special key, _id, which is unique within the document's collection.

MongoDB groups collections into databases; each database has its own permissions and is stored in separate files on disk.


Document

    Simple document

A document is roughly equivalent to a row in a relational database; it contains one or more key/value pairs:

    {"greeting" : "Hello, world!"}

Most documents will be more complex than this simple one and will often contain multiple key/value pairs:

{"greeting" : "Hello, world!", "foo" : 3}

Key/value pairs in documents are ordered: the earlier document is distinct from the following document:

    {"foo" : 3, "greeting" : "Hello, world!"}

Values in documents are not just blobs. They can be one of several different data types.


Document

    Simple document

Keys must not contain the character \0 (the null character). The . and $ characters have some special properties and should be used only in certain circumstances.

Keys starting with _ should be considered reserved, although this is not strictly enforced.

    MongoDB is type-sensitive and case-sensitive.

    {"foo" : 3}

    {"foo" : "3"}

    {"Foo" : 3}

    Documents in MongoDB cannot contain duplicate keys.

    {"greeting" : "Hello, world!", "greeting" : "Hello, MongoDB!"} // illegal


Document

    Embedded document

Embedded documents are entire MongoDB documents that are used as the value for a key in another document:

{
    "name" : "John Doe",
    "address" : {
        "street" : "123 Park Street",
        "city" : "Anytown",
        "state" : "NY"
    }
}


Collection

    A collection is a group of documents

If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table.

    Collections are schema-free.

This means that the documents within a single collection can have any number of different shapes.

    {"greeting" : "Hello, world!"}

    {"foo" : 5}


Collection

Why should we use more than one collection?

Keeping different kinds of documents in the same collection can be a nightmare for developers and admins.

It is much faster to get a list of collections than to extract a list of the types in a collection.

Grouping documents of the same kind together in the same collection allows for data locality.

We begin to impose some structure on our documents when we create indexes.


Collection

    Naming

    The empty string ("") is not a valid collection name.

    You should not create any collections that start with system.

    User-created collections should not contain the reserved character $ inthe name.

    Subcollections

One convention for organizing collections is to use namespaced subcollections separated by the . character.

For example, an application containing a blog might have a collection named blog.posts and a separate collection named blog.authors.

This is for organizational purposes only: there is no relationship between the blog collection (it doesn't even have to exist) and its children.


Database

In addition to grouping documents by collection, MongoDB groups collections into databases. A database has its own permissions, and each database is stored in separate files on disk.

A good rule of thumb is to store all data for a single application in the same database.

There are also several reserved database names, which you can access directly but which have special semantics. These are as follows:

    admin: This is the root database, in terms of authentication.

local: This database will never be replicated and can be used to store any collections that should be local to a single server.

config: When Mongo is being used in a sharded setup (see Chapter 10), the config database is used internally to store information about the shards.


Getting and Starting MongoDB

To start the server, run the mongod executable:

$ ./mongod
./mongod --help for help and startup options
Sun Mar 28 12:31:20 Mongo DB : starting : pid = 44978 port = 27017
    dbpath = /data/db/ master = 0 slave = 0 64-bit
Sun Mar 28 12:31:20 db version v1.5.0-pre-, pdfile version 4.5
Sun Mar 28 12:31:20 git version: ...
Sun Mar 28 12:31:20 sys info: ...
Sun Mar 28 12:31:20 waiting for connections on port 27017
Sun Mar 28 12:31:20 web admin interface listening on port 28017

When run with no arguments, mongod will use the default data directory, /data/db/ (or C:\data\db\ on Windows), and port 27017.

If the data directory does not already exist or is not writable, the server will fail to start.


A MongoDB Client

MongoDB comes with a JavaScript shell:

    $ ./mongo

    MongoDB shell version: 1.6.0

    url: test

    connecting to: test

type "help" for help
>

The shell contains some add-ons that are not valid JavaScript syntax but were implemented because of their familiarity to users of SQL shells.


Create

Inserting

db.foo.insert({"bar" : "baz"})

    Batch Insert

    A batch insert is a single TCP request

They are useful only if you are inserting multiple documents into a single collection: you can't use a batch insert to insert into multiple collections.

MongoDB doesn't accept messages longer than 16MB, so there is a limit to how much can be inserted in a single batch insert.


Read

    find returns all of the documents in a collection.

If we just want to see one document from a collection, we can use findOne:

    > db.blog.findOne()

    {

    "_id" : ObjectId("4b23c3ca7525f35f94b60a2d"),

    "title" : "My Blog Post",

    "content" : "Here's my blog post.",

    "date" : "Sat Dec 12 2009 11:23:21 GMT-0500 (EST)"

}

find and findOne can also be passed criteria in the form of a query document.

    This will restrict the documents matched by the query.


Update

    If we would like to modify our post, we can use update.

update takes two parameters: a query document, which locates documents to update, and a modifier document. Updates are atomic.

db.users.update({"name" : "joe"}, newjoe); // newjoe holds joe's updated document


Updating

    Using Modifiers

Update modifiers are special keys that can be used to specify complex update operations, such as altering, adding, or removing keys, and even manipulating arrays and embedded documents.

If the document for a URL looks like this:

{
    "_id" : ObjectId("4b253b067525f35f94b60a31"),
    "url" : "www.example.com",
    "pageviews" : 52
}

> db.analytics.update({"url" : "www.example.com"}, {"$inc" : {"pageviews" : 1}})


Delete

Removing documents

db.users.remove()

    db.mailing.list.remove({"opt-out" : true})

    Drop method

db.bar.drop()


Data Types

Documents in MongoDB can be thought of as JSON-like, in that they are conceptually similar to objects in JavaScript.

    null: {"x" : null}

    boolean: {"x" : true}

    32-bit integer

    64-bit integer

    64-bit floating point number: {"x" : 3.14}

string: {"x" : "foobar"}

symbol

    object id: {"x" : ObjectId()}

    date: {"x" : new Date()}

    regular expression: {"x" : /foobar/i}

    code: {"x" : function() { /* ... */ }}

    binary data

    maximum value

    minimum value

    undefined: {"x" : undefined}

    array: {"x" : ["a", "b", "c"]}

embedded document: {"x" : {"foo" : "bar"}}


Querying

You can perform ad hoc queries on the database using the find or findOne functions and a query document.

You can query for ranges, set inclusion, inequalities, and more by using $ conditionals.

Query criteria: "$lt", "$lte", "$gt", "$gte", and "$ne" are all comparison operators, corresponding to <, <=, >, >=, and not equal, respectively.

    They can be combined to look for a range of values.

    e.g.

    > db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})


Querying

You can use a $where clause to harness the full expressive power of JavaScript.

Queries return a database cursor, which lazily returns batches of documents as you need them.

There are a lot of meta-operations you can perform on a cursor, including:

    skipping a certain number of results,

    limiting the number of results returned,

    and sorting results.


Querying

    Cursor Internals

There are two sides to a cursor: the client-facing cursor and the database cursor that the client-side one represents.

    On the client side

The implementations of cursors generally allow you to control a great deal about the eventual output of a query.

> for (i = 0; i < 100; i++) { db.collection.insert({x : i}); }
> var cursor = db.collection.find();
> while (cursor.hasNext()) {
      obj = cursor.next();
      // do stuff
  }


Querying

    Cursor Internals

There are two sides to a cursor: the client-facing cursor and the database cursor that the client-side one represents.

    On the client side

    > db.c.find().limit(3)

> db.c.find().skip(3)
> db.c.find().sort({username : 1, age : -1})

    > db.stock.find({"desc" : "mp3"}).limit(50).sort({"price" : -1})

    > db.stock.find({"desc" : "mp3"}).limit(50).skip(50).sort({"price" : -1})

    > // do not use: slow for large skips

    > var page1 = db.foo.find(criteria).limit(100)

    > var page2 = db.foo.find(criteria).skip(100).limit(100)

    > var page3 = db.foo.find(criteria).skip(200).limit(100)


Querying

On the client side: Getting Consistent Results


Querying

    Cursor Internals

There are two sides to a cursor: the client-facing cursor and the database cursor that the client-side one represents.

    On the server side

A cursor takes up memory and resources on the server.

Once a cursor runs out of results or the client sends a message telling it to die, the database can free the resources it was using.


Querying

There are a couple of conditions that can cause the death (and subsequent cleanup) of a cursor. First, when a cursor finishes iterating through the matching results, it will clean itself up.

Another way is that, when a cursor goes out of scope on the client side, the drivers send the database a special message to let it know that it can kill that cursor.

Finally, a cursor that sits unused for too long will time out on the server. This death by timeout is usually the desired behavior: very few applications expect their users to sit around for minutes at a time waiting for results.


Indexing

    Normal index

When only a single key is used in the query, that key can be indexed to improve the query's speed.

    > db.people.ensureIndex({"username" : 1})

As a rule of thumb, you should create an index that contains all of the keys in your query.

> db.people.find({"date" : date1}).sort({"date" : 1, "username" : 1})
> db.people.ensureIndex({"date" : 1, "username" : 1})

If you have more than one key, you need to start thinking about index direction.

> db.people.ensureIndex({"username" : 1, "age" : -1})


Indexing

There are several questions to keep in mind when deciding what indexes to create:

What are the queries you are doing? Some of these keys will need to be indexed.

    What is the correct direction for each key?

How is this going to scale? Is there a different ordering of keys that would keep more of the frequently used portions of the index in memory?


Indexing

Geospatial indexing: finding the nearest N things to a current location.

MongoDB provides a special type of index for coordinate plane queries, called a geospatial index.

MongoDB's geospatial indexes assume that whatever you're indexing is a flat plane.

This means that results aren't perfect for spherical shapes, like the earth, especially near the poles.


Indexing

A geospatial index can be created using the ensureIndex function, but by passing "2d" as a value instead of 1 or -1:

> db.map.ensureIndex({"gps" : "2d"})

    { "gps" : [ 0, 100 ] }

{ "gps" : { "x" : -30, "y" : 30 } }

{ "gps" : { "latitude" : -180, "longitude" : 180 } }

> db.star.trek.ensureIndex({"light-years" : "2d"}, {"min" : -1000, "max" : 1000})

    > db.map.find({"gps" : {"$near" : [40, -73]}})

    > db.map.find({"gps" : {"$near" : [40, -73]}}).limit(10)


Aggregation

MongoDB provides a number of aggregation tools that go beyond basic query functionality. These range from simply counting the number of documents in a collection to using MapReduce to do complex data analysis.

Count: returns the number of documents in the collection.

    Distinct

finds all of the distinct values for a given key. You must specify a collection and key:

    > db.runCommand({"distinct" : "people", "key" : "age"})

Then you will get all of the distinct ages, like this:

    {"values" : [20, 35, 60], "ok" : 1}


MongoDB and MapReduce

There are many optional keys that can be passed to the MapReduce command:

"finalize" : function, a final step to send reduce's output to.

"keeptemp" : boolean, whether the temporary result collection should be saved when the connection is closed.

"query" : document, a query to filter documents by before sending them to the map function.

"sort" : document, a sort to use on documents before sending them to the map (useful in conjunction with a limit).

"scope" : document, variables that can be used in any of the JavaScript code.

"verbose" : boolean, whether to use more verbose output in the server logs.

e.g. // this command will come back with today's webpages

    > db.runCommand({"mapreduce" : "webpages", "map" : map, "reduce" : reduce,

    "scope" : {now : new Date()}})


Access MongoDB with Java

    Make a Connection

    MongoClient mongoClient = new MongoClient();

    // or

    MongoClient mongoClient = new MongoClient( "localhost" );

    // or

    MongoClient mongoClient = new MongoClient( "localhost" , 27017 );

    // or, to connect to a replica set, supply a seed list of members

    MongoClient mongoClient = new MongoClient(Arrays.asList(

    new ServerAddress("localhost", 27017),

    new ServerAddress("localhost", 27018),

    new ServerAddress("localhost", 27019)));

    DB db = mongoClient.getDB( "mydb" );


Access MongoDB with Java

    Authentication (Optional)

    MongoClient mongoClient = new MongoClient();

    DB db = mongoClient.getDB("test");

    boolean auth = db.authenticate(myUserName, myPassword);

Getting a List of Collections

Set<String> colls = db.getCollectionNames();

    for (String s : colls) {

System.out.println(s);
}


Access MongoDB with Java

    Getting a Collection

DBCollection coll = db.getCollection("testCollection");

    Setting Write Concern

    mongoClient.setWriteConcern(WriteConcern.JOURNALED);

    Inserting a document

    BasicDBObject doc = new BasicDBObject("name", "MongoDB").

    append("type", "database").

    append("count", 1).

    append("info", new BasicDBObject("x", 203).append("y", 102));

    coll.insert(doc);


Access MongoDB with Java

    Getting A Single Document with A Query

    BasicDBObject query = new BasicDBObject("i", 71);

DBCursor cursor = coll.find(query);

    try {

    while(cursor.hasNext()) {

    System.out.println(cursor.next());

    }

    } finally {

    cursor.close();

    }


Access MongoDB with Java

    Getting A Set of Documents With a Query

    query = new BasicDBObject("i", new BasicDBObject("$gt", 50));

    // e.g. find all where i > 50

    cursor = coll.find(query);

    try {

    while(cursor.hasNext()) {

System.out.println(cursor.next());
}

    } finally {

    cursor.close();

    }


Access MongoDB with Java

    Creating An Index

coll.createIndex(new BasicDBObject("i", 1)); // create index on "i", ascending

    Getting a List of Indexes on a Collection

List<DBObject> list = coll.getIndexInfo();

    for (DBObject o : list) {

    System.out.println(o);

    }


Access MongoDB with Java

    Getting A List of Databases

    MongoClient mongoClient = new MongoClient();

for (String s : mongoClient.getDatabaseNames()) {

    System.out.println(s);

    }

    Dropping A Database

    MongoClient mongoClient = new MongoClient();

    mongoClient.dropDatabase("myDatabase");


Advanced topics

    Capped Collections

A capped collection is a special collection that is created in advance and is fixed in size.

When it is full, it will automatically age out the oldest documents as new documents are inserted.

Documents in capped collections cannot be removed or deleted, and updates that would make a document larger are not allowed.


When the queue is full, the oldest element will be replaced by the newest.


Advanced topics

    Capped collections advantages:

    Firstly, inserts into a capped collection are extremely fast

    Secondly, queries retrieving documents in insertion order are very fast.

Finally, capped collections have the useful property of automatically aging out old data as new data is inserted.

    Capped collections usage:

In general, capped collections are good for any case where the auto age-out property is helpful, as opposed to undesirable, and the limitations on available operations are not prohibitive.


Advanced topics

    GridFS

GridFS is a mechanism for storing large binary files in MongoDB. There are several reasons why you might consider using GridFS for file storage:

    Using GridFS can simplify your stack.

GridFS will leverage any existing replication or autosharding that you've set up for MongoDB, so getting failover and scale-out for file storage is easy.

GridFS can alleviate some of the issues that certain filesystems can exhibit when being used to store user uploads.

You can get great disk locality with GridFS, because MongoDB allocates data files in 2GB chunks.


Administration

MongoDB is run as a normal command-line program using the mongod executable. MongoDB features a built-in admin interface and monitoring functionality that is easy to integrate with third-party monitoring packages.

MongoDB supports basic, database-level authentication, including read-only users and a separate level of authentication for admin access.


Monitoring

By default, starting mongod also starts up a (very) basic HTTP server that listens on a port 1,000 higher than the native driver port.

To see the admin interface, start the database and go to http://localhost:28017 in a web browser.

This interface gives access to assertion, locking, indexing, and replication information about the MongoDB server.

It also gives some more general information, like the log preamble and access to a list of available database commands.


ServerStatus

    ServerStatus

> db.runCommand({"serverStatus" : 1})

serverStatus provides a detailed look at what is going on inside a MongoDB server.


Master-Slave Replication

Master-slave replication is the most general replication mode supported by MongoDB. This mode is very flexible and can be used for backup, failover, and read scaling.

All slaves replicate from a master node.

There is currently no mechanism for replicating from a slave (daisy chaining), because slaves do not keep their own oplog.


Replica Sets

A replica set is basically a master-slave cluster with automatic failover.

The biggest difference between a master-slave cluster and a replica set is that a replica set does not have a single master: one is elected by the cluster and may change to another node if the current master goes down.

At any point in time, one node in the cluster is primary, and the rest are secondary.

A secondary node will also write each operation to its own local oplog, however, so that it is capable of becoming the primary.


Failover and Primary Election

This election process will be initiated by any node that cannot reach the primary. The new primary must be elected by a majority of the nodes in the set. The new primary will be the node with the highest priority, using freshness of data to break ties between nodes with the same priority.

The primary node uses a heartbeat to track how many nodes in the cluster are visible to it. If this falls below a majority, the primary will automatically fall back to secondary status.

Whenever the primary changes, the data on the new primary is assumed to be the most up-to-date data in the system.


Sharding

Sharding is MongoDB's approach to scaling out. Sharding allows you to add more machines to handle increasing load and data size without affecting your application.

Manual sharding can work well, but it becomes difficult to maintain when adding or removing nodes from the cluster, or in the face of changing data distributions or load patterns.

MongoDB supports autosharding, which eliminates some of the administrative headaches of manual sharding.


Autosharding in MongoDB

The basic concept behind MongoDB's sharding is to break up collections into small chunks.

We don't need to know which shard has which data, so we run a router in front of the application; the router knows where the data is located, so applications can connect to it and issue requests normally.

To the application, the router looks like a normal mongod; knowing what data is on which shard, it is able to forward the requests to the appropriate shard(s).


Nonsharded client vs. sharded client


In a nonsharded MongoDB setup, you would have a client connecting to a mongod process. In a sharded setup, the client connects to a mongos process, which abstracts the sharding away from the application.

(Figure: nonsharded client connection vs. sharded client connection)


When to Shard

In general, you should start with a nonsharded setup and convert it to a sharded one if and when you need to.

In situations like these, you should probably shard:

    Youve run out of disk space on your current machine.

    You want to write data faster than a single mongod can handle.

You want to keep a larger proportion of data in memory to improve performance.


Shard Keys

When you set up sharding, you choose a key from a collection and use that key's values to split up the data. This key is called a shard key.

For example, if we chose "name" as our shard key, one shard could hold documents where "name" starts with A-F, the next shard could hold names from G-P, and the final shard would hold names from Q-Z.

As you added (or removed) shards, MongoDB would rebalance this data so that each shard received a balanced amount of traffic and a sensible amount of data.


Chunks

Suppose we add a new shard. Once this shard is up and running, MongoDB will break up the collection into two pieces, called chunks.

A chunk contains all of the documents for a range of values for the shard key. For example, if we use "timestamp" as the shard key, one chunk would have documents with a timestamp value between negative infinity and, say, June 26, 2003, and the other chunk would have timestamps from June 27, 2003, onward.


Reference

MongoDB: The Definitive Guide, by Kristina Chodorow and Michael Dirolf, O'Reilly Media, September 2010, ISBN: 978-1-449-38156-1.

Getting Started with Java Driver, http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-java-driver/#getting-started-with-java-driver


    Thank You!