Aggregation
New framework in MongoDB

Alvin Richards
Technical Director, EMEA
[email protected]
@jonnyeight

MongoDB Berlin Aggregation

DESCRIPTION

Aggregation with MongoDB and introducing the new aggregation framework... think Unix pipes for JSON data!

Page 2: MongoDB Berlin Aggregation

What problem are we solving?

• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
• Shouldn’t need to write JavaScript
  • Avoid the overhead of the JavaScript engine
• We’re seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays

Page 3: MongoDB Berlin Aggregation

How will we solve the problem?

• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: new operations added easily
  • C++ implementation

Page 4: MongoDB Berlin Aggregation

Aggregation - Pipelines

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Members of a collection are passed through a pipeline to produce a result
  • e.g. ps -ef | grep -i mongod
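
The Unix-pipe analogy can be sketched in plain JavaScript. This is only an illustration of the idea of chaining operations, not MongoDB code: runPipeline, grepMongod, pidOnly and the sample process list are all made up for the example.

```javascript
// Illustrative only: each "stage" is a plain function from an array of
// documents to a new array. None of these names are MongoDB APIs.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next, like `a | b | c`.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Hypothetical process list standing in for the output of `ps -ef`.
const processes = [
  { pid: 101, cmd: "mongod --dbpath /data/db" },
  { pid: 102, cmd: "bash" },
  { pid: 103, cmd: "MONGOD --fork" },
];

// Stand-in for `grep -i mongod`: case-insensitive match on the command.
const grepMongod = (docs) => docs.filter((d) => /mongod/i.test(d.cmd));
// A second stage that keeps only the pid of each surviving document.
const pidOnly = (docs) => docs.map((d) => d.pid);

const pids = runPipeline(processes, [grepMongod, pidOnly]);
console.log(pids); // [ 101, 103 ]
```

Each stage sees only the output of the stage before it, which is the property the aggregation pipeline shares with a shell pipe.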

Page 5: MongoDB Berlin Aggregation

Example - twitter

{
   "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
   "user" : {
       "friends_count" : 73,
       "location" : "Brazil",
       "screen_name" : "Bia_cunha1",
       "name" : "Beatriz Helena Cunha",
       "followers_count" : 102
   }
}

• Find the # of followers and # of friends by location

Page 6: MongoDB Berlin Aggregation

Example - twitter

db.tweets.aggregate(
    {$match:
        {"user.friends_count":   { $gt: 0 },
         "user.followers_count": { $gt: 0 }
        }
    },
    {$project:
        { location:  "$user.location",
          friends:   "$user.friends_count",
          followers: "$user.followers_count"
        }
    },
    {$group:
        {_id:       "$location",
         friends:   {$sum: "$friends"},
         followers: {$sum: "$followers"}
        }
    }
);

Page 9: MongoDB Berlin Aggregation

Example - twitter

db.tweets.aggregate(
    // Predicate
    {$match:
        {"user.friends_count":   { $gt: 0 },
         "user.followers_count": { $gt: 0 }
        }
    },
    // Parts of the document you want to project
    {$project:
        { location:  "$user.location",
          friends:   "$user.friends_count",
          followers: "$user.followers_count"
        }
    },
    // Function to apply to the result set
    {$group:
        {_id:       "$location",
         friends:   {$sum: "$friends"},
         followers: {$sum: "$followers"}
        }
    }
);

Page 10: MongoDB Berlin Aggregation

Example - twitter

{
   "result" : [
     {
       "_id" : "Far Far Away",
       "friends" : 344,
       "followers" : 789
     },
     ...
   ],
   "ok" : 1
}
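
To make the data flow of the three stages concrete, here is a rough plain-JavaScript imitation that runs on an in-memory array. It is a sketch of what $match, $project and $group compute in this example, not how the server implements them, and the sample documents are invented.

```javascript
// Invented sample documents in the shape of the tweet shown earlier.
const tweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "Berlin", friends_count: 0, followers_count: 40 } },
  { user: { location: "Berlin", friends_count: 7, followers_count: 9 } },
];

// $match: keep users with at least one friend and one follower.
const matched = tweets.filter(
  (t) => t.user.friends_count > 0 && t.user.followers_count > 0
);

// $project: pull the nested fields up to the top level.
const projected = matched.map((t) => ({
  location: t.user.location,
  friends: t.user.friends_count,
  followers: t.user.followers_count,
}));

// $group: one bucket per location, summing friends and followers.
const buckets = new Map();
for (const p of projected) {
  const b = buckets.get(p.location) || { _id: p.location, friends: 0, followers: 0 };
  b.friends += p.friends;
  b.followers += p.followers;
  buckets.set(p.location, b);
}
const result = [...buckets.values()];
console.log(result);
```

With these sample documents the Berlin user with zero friends is filtered out, so the buckets come out as Brazil (83 friends, 107 followers) and Berlin (7 friends, 9 followers).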

Page 11: MongoDB Berlin Aggregation

Pipeline Operations

• $match
  • Uses a query predicate (like .find({…})) as a filter
• $project
  • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
  • This can include computed values
• $unwind
  • Hands out array elements one at a time
• $group
  • Aggregates items into buckets defined by a key

Page 12: MongoDB Berlin Aggregation

Pipeline Operations (continued)

• $sort
  • Sort documents
• $limit
  • Only allow the specified number of documents to pass
• $skip
  • Skip over the specified number of documents

Page 13: MongoDB Berlin Aggregation

Computed Expressions

• Available in $project operations
• Prefix expression language
  • $add:["$field1", "$field2"]
  • $ifNull:["$field1", "$field2"]
  • Nesting: $add:["$field1", {$ifNull:["$field2", "$field3"]}]
  • Other functions…
    • $divide, $mod, $multiply
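
The prefix notation may be easier to follow with a toy evaluator. The function below is an illustration written for this transcript, not MongoDB's implementation: it understands only top-level field references, literals, and three operators, and the name evaluate is made up.

```javascript
// Toy evaluator, for illustration only: it handles top-level field
// references ("$name"), literals, and three operators.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value; // missing field acts like null
  }
  if (expr === null || typeof expr !== "object") return expr; // literal
  // An operator expression is an object with one key: { $op: [args...] }.
  const [op] = Object.keys(expr);
  const args = expr[op].map((arg) => evaluate(arg, doc));
  switch (op) {
    case "$add": return args.reduce((a, b) => a + b, 0);
    case "$multiply": return args.reduce((a, b) => a * b, 1);
    case "$ifNull": return args[0] !== null ? args[0] : args[1];
    default: throw new Error("unsupported operator: " + op);
  }
}

const doc = { field1: 10, field3: 5 }; // field2 is deliberately absent
const nested = { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] };
console.log(evaluate(nested, doc)); // 15
```

Because the operator comes first and its arguments are themselves expressions, nesting needs no precedence rules; the inner $ifNull is simply evaluated before the outer $add.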

Page 14: MongoDB Berlin Aggregation

Computed Expressions

• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $dayOfMonth, $hour...
• Date arithmetic
• $ifNull
• Ternary conditional ($cond)
  • Return one of two values based on a predicate

Page 15: MongoDB Berlin Aggregation

Projections

• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents

Page 16: MongoDB Berlin Aggregation

Unwinding

• $unwind can “stream” arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning
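
A small in-memory imitation shows what "doled out one at a time in the context of their surrounding documents" means. The unwind function and the posts document with its tags field are invented for this sketch; this is not the server's code.

```javascript
// Illustrative stand-in for $unwind on an in-memory array of documents.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    // Emit one copy of the surrounding document per array element.
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

// Hypothetical document; the "tags" field is not part of the slides.
const posts = [{ title: "Aggregation", tags: ["mongodb", "berlin", "pipes"] }];

const unwound = unwind(posts, "tags");
// Once elements are separate documents they can be filtered individually.
const kept = unwound.filter((d) => d.tags !== "berlin");
console.log(kept);
```

The single post becomes three documents, each carrying the title plus one tag, and the filter then drops just the element that is not wanted.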

Page 17: MongoDB Berlin Aggregation

Grouping

• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
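
As a sketch of how these accumulators might be combined, the stage below groups the tweet documents from the earlier example by location. The output field names (avgFollowers, maxFriends, screenNames) are made up; the input fields come from the sample tweet.

```javascript
// A $group stage built from operators named on the slide. The output names
// (avgFollowers, maxFriends, screenNames) are made up for this example.
const groupByLocation = {
  $group: {
    _id: "$user.location",                           // grouping key
    avgFollowers: { $avg: "$user.followers_count" }, // average per bucket
    maxFriends: { $max: "$user.friends_count" },     // largest value per bucket
    screenNames: { $addToSet: "$user.screen_name" }, // distinct values as a set
  },
};

// In the mongo shell this would be passed as one stage of a pipeline:
//   db.tweets.aggregate(groupByLocation)
console.log(JSON.stringify(groupByLocation, null, 2));
```

Every field other than _id must be produced by an accumulator, since a bucket can contain many input documents.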

Page 18: MongoDB Berlin Aggregation

Sorting

• $sort can sort documents
  • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
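
How a specification such as { key1: 1, key2: -1 } orders documents can be imitated in plain JavaScript. sortBySpec is a hypothetical helper written for this illustration; it compares simple values only and is not how the server sorts.

```javascript
// Illustrative in-memory sort driven by a MongoDB-style sort specification:
// 1 means ascending, -1 means descending, keys are applied in order.
function sortBySpec(docs, spec) {
  const keys = Object.entries(spec); // [[field, direction], ...]
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  });
}

const docs = [
  { key1: 2, key2: "a" },
  { key1: 1, key2: "a" },
  { key1: 1, key2: "b" },
];
const sorted = sortBySpec(docs, { key1: 1, key2: -1 });
console.log(sorted);
```

The two documents with key1 equal to 1 come first, and within that tie key2 is descending, so "b" precedes "a".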

Page 19: MongoDB Berlin Aggregation

Demo

Demo files are at https://gist.github.com/2036709

Page 20: MongoDB Berlin Aggregation

Usage Tips

• Use $match in a pipeline as early as possible
  • The query optimizer can then be used to choose an index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
  • The query optimizer can sometimes be used to choose an index to scan instead of sorting the result
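
A pipeline laid out according to these tips might look like the sketch below. The field names come from the sample tweet; the filter value, the projected names and the limit are arbitrary, and whether an index is actually used depends on which indexes exist.

```javascript
// Stage order follows the tips above: filter first, sort next, and only
// then reshape. Values such as "Brazil" and 10 are arbitrary examples.
const pipeline = [
  { $match: { "user.location": "Brazil" } },  // may use an index if one exists
  { $sort: { "user.followers_count": -1 } },  // early sort, before reshaping
  { $project: { name: "$user.screen_name", followers: "$user.followers_count" } },
  { $limit: 10 },
];

// mongo shell usage: db.tweets.aggregate(pipeline)
console.log(pipeline.map((stage) => Object.keys(stage)[0]));
```

Placing $project ahead of $match or $sort would hide the original field names from the later stages, which is one reason to keep the filtering and sorting stages at the front.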

Page 21: MongoDB Berlin Aggregation

Driver Support

• Initial version is a command
  • For any language, build a JSON database object, and execute the command
  • { aggregate : <collection>, pipeline : [ … ] }
• Beware of result size limit of 16MB
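
Building the command document by hand could look roughly like this. The helper name buildAggregateCommand and the collection name "tweets" are illustrative; the stages reuse fields from the sample tweet.

```javascript
// Builds the command document described above. The collection name
// "tweets" and the helper name buildAggregateCommand are illustrative.
function buildAggregateCommand(collection, stages) {
  return { aggregate: collection, pipeline: stages };
}

const command = buildAggregateCommand("tweets", [
  { $match: { "user.followers_count": { $gt: 0 } } },
  { $group: { _id: "$user.location", followers: { $sum: "$user.followers_count" } } },
]);

// In the mongo shell the document can be sent with db.runCommand(command);
// drivers expose an equivalent "run command" call. Newer server versions
// also expect a cursor option on this command.
console.log(JSON.stringify(command));
```

Since the command is just a document, any driver that can send an arbitrary command can run an aggregation without a dedicated helper.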

Page 22: MongoDB Berlin Aggregation

When is this being released?

• Now!
  • 2.1.0 - unstable
  • 2.2.0 - stable (soon)

Page 23: MongoDB Berlin Aggregation

Sharding support

• Initial release supports sharding
• Mongos analyzes pipeline
  • forwards operations up to $group or $sort to shards
  • combines shard server results and returns them

Page 24: MongoDB Berlin Aggregation

Pipeline Operations – Future

• $out
  • Saves the document stream to a collection
  • Similar to M/R $out, but with sharded output
  • Functions like a tee, so that intermediate results can be saved

Page 25: MongoDB Berlin Aggregation

Documentation, Bug Reports

• http://www.mongodb.org/display/DOCS/Aggregation+Framework
• https://jira.mongodb.org/browse/SERVER/component/10840

Page 26: MongoDB Berlin Aggregation

@mongodb

conferences, appearances, and meetups
http://www.10gen.com/events

http://bit.ly/mongoE Facebook | Twitter | LinkedIn http://linkd.in/joinmongo

download at mongodb.org

[email protected]