MongoDB + Spark @blimpyacht

MongoDB & Spark


Page 1: MongoDB & Spark

MongoDB + Spark @blimpyacht

Page 2: MongoDB & Spark

Level Setting

Page 3: MongoDB & Spark
Page 4: MongoDB & Spark
Page 5: MongoDB & Spark
Page 6: MongoDB & Spark

TROUGH OF DISILLUSIONMENT

Page 7: MongoDB & Spark

HDFS

Distributed Data

Page 8: MongoDB & Spark

HDFS

YARN

Distributed Resources

Page 9: MongoDB & Spark

HDFS

YARN

MapReduce

Distributed Processing

Page 10: MongoDB & Spark

HDFS

YARN

Hive

Pig

Domain Specific Languages

MapReduce

Page 11: MongoDB & Spark

Interactive Shell

Easy (-er)

Caching

Page 12: MongoDB & Spark

HDFS

Distributed Data

Page 13: MongoDB & Spark

HDFS

YARN

Distributed Resources

Page 14: MongoDB & Spark

HDFS

YARN

Spark

Hadoop

Distributed Processing

Page 15: MongoDB & Spark

HDFS

YARN

Spark

Hadoop

Mesos

Page 16: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Page 17: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Page 18: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark Shell

Page 19: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark Shell

Spark Streaming

Page 20: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark SQL

Spark Shell

Spark Streaming

Page 21: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark SQL

Spark Shell

Spark Streaming

Page 22: MongoDB & Spark

HDFS

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark SQL

Spark Shell

Spark Streaming

Page 23: MongoDB & Spark

Stand Alone

YARN

Spark

Hadoop

Mesos

Hive

Pig

Spark SQL

Spark Shell

Spark Streaming

Page 24: MongoDB & Spark

Spark Streaming

Hive

Spark Shell

Mesos

Hadoop

Pig

Spark SQL

Spark

Stand Alone

YARN

Page 25: MongoDB & Spark

Stand Alone

YARN

Spark

Mesos

Spark SQL

Spark Shell

Spark Streaming

Page 26: MongoDB & Spark

Stand Alone

YARN

Spark

Mesos

Spark SQL

Spark Shell

Spark Streaming

Page 27: MongoDB & Spark

executor

Worker Node

executor

Worker Node

Driver

Resilient Distributed Datasets

Page 28: MongoDB & Spark

Parallelization

Parallelize = x
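
Parallelizing a local collection is the simplest way to get an RDD. A minimal sketch in the deck's Java API, assuming the driver's JavaSparkContext is called context (the sample values reuse the names from the demo slide later in the deck):

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;

// Distribute a local collection across the executors as an RDD.
// `context` is the driver's JavaSparkContext; the values are placeholders.
JavaRDD<String> names = context.parallelize(
    Arrays.asList("Eratosthenes", "Democritus", "Hypatia", "Euripides"));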

Page 29: MongoDB & Spark

Transformations

Parallelize = x

t(x) = x’

t(x’) = x’’

Page 30: MongoDB & Spark

Transformations

filter( func )
union( otherRDD )
intersection( otherRDD )
distinct( [numPartitions] )
map( func )
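
Each transformation is lazy: it returns a new RDD and records a step of lineage without touching the data. An illustrative continuation of the parallelize sketch above (not code from the deck):

import org.apache.spark.api.java.function.Function;

// Each call returns a new RDD; nothing executes yet.
JavaRDD<String> distinctNames = names.distinct();

JavaRDD<String> shortNames = distinctNames.filter(
    new Function<String, Boolean>() {
        public Boolean call(String name) {
            return name.length() <= 9;   // illustrative predicate
        }
    });

JavaRDD<Integer> lengths = shortNames.map(
    new Function<String, Integer>() {
        public Integer call(String name) {
            return name.length();
        }
    });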

Page 31: MongoDB & Spark

Action

f(x’’) = y

Parallelize = x

t(x) = x’

t(x’) = x’’

Page 32: MongoDB & Spark

Actions

collect()
count()
first()
take( n )
reduce( func )
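
Actions are what finally run the chain and return a result to the driver. Continuing the same illustrative RDDs:

import java.util.List;
import org.apache.spark.api.java.function.Function2;

// Actions force evaluation of the lineage built by the transformations above.
long howMany = lengths.count();
List<Integer> firstTwo = lengths.take(2);
Integer totalChars = lengths.reduce(
    new Function2<Integer, Integer, Integer>() {
        public Integer call(Integer a, Integer b) {
            return a + b;
        }
    });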

Page 33: MongoDB & Spark

Lineage

f(x’’) = y

Parallelize = x

t(x) = x’

t(x’) = x’’

Page 34: MongoDB & Spark

Lineage

Parallelize → Transform → Transform → Action

Page 35: MongoDB & Spark

Lineage

Multiple lineages, each: Parallelize → Transform → Transform → Action
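
Lineage is also what makes caching and fault tolerance cheap: an RDD marked for caching is materialized by the first action, and a lost partition is rebuilt by replaying the recorded transformations. A small assumed sketch, reusing the illustrative lengths RDD from above:

// Mark the RDD for reuse; the first action materializes it, later ones hit the cache.
// If a cached partition is lost, Spark replays the lineage to rebuild it.
lengths.cache();
long total = lengths.count();            // runs the whole chain and caches the result
List<Integer> sample = lengths.take(2);  // served from the cache where possible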


Page 38: MongoDB & Spark

https://github.com/mongodb/mongo-hadoop

Page 39: MongoDB & Spark

{"_id" : ObjectId("4f16fc97d1e2d32371003e27"),"body" : "the scrimmage is still up in the air.

"subFolder" : "notes_inbox","mailbox" : "bass-e","filename" : "450.","headers" : {

"X-cc" : "","From" : "[email protected]","Subject" : "Re: Plays and other information","X-Folder" : "\\Eric_Bass_Dec2000\\Notes Folders\\

Notes inbox","Content-Transfer-Encoding" : "7bit","X-bcc" : "","To" : "[email protected]","X-Origin" : "Bass-E","X-FileName" : "ebass.nsf","X-From" : "Michael Simmons","Date" : "Tue, 14 Nov 2000 08:22:00 -0800 (PST)","X-To" : "Eric Bass","Message-ID" :

"<6884142.1075854677416.JavaMail.evans@thyme>","Content-Type" : "text/plain; charset=us-ascii","Mime-Version" : "1.0"

}}

Page 40: MongoDB & Spark

{"_id" : ObjectId("4f16fc97d1e2d32371003e27"),"body" : "the scrimmage is still up in the air.

"subFolder" : "notes_inbox","lfpwoojjf0wig=-i1qf=q0qif0=i38 \-00\ 1-8" : "bass-e","filename" : "450.","headers" : {

"X-cc" : "",

"From" : "[email protected]",

"Subject" : "Re: Plays and other information","X-Folder" : "\\Eric_Bass_Dec2000\\Notes Folders\\

Notes inbox","Content-Transfer-Encoding" : "7bit","X-bcc" : "",

"To" : "[email protected]","X-Origin" : "Bass-E","X-FileName" : "ebass.nsf","X-From" : "Michael Simmons","Date" : "Tue, 14 Nov 2000 08:22:00 -0800 (PST)","X-To" : "Eric Bass","Message-ID" :

"<6884142.1075854677416.JavaMail.evans@thyme>","Content-Type" : "text/plain; charset=us-ascii","Mime-Version" : "1.0"

}}

Page 41: MongoDB & Spark

{ _id : "[email protected]|[email protected]", value : 2}{ _id : "[email protected]|[email protected]", value : 2}{ _id : "[email protected]|[email protected]", value : 2 }

Page 42: MongoDB & Spark

Eratosthenes

Democritus

Hypatia

Shemp

Euripides

Page 43: MongoDB & Spark

Spark Configuration

Configuration conf = new Configuration();
conf.set(
    "mongo.job.input.format",
    "com.mongodb.hadoop.MongoInputFormat");
conf.set(
    "mongo.input.uri",
    "mongodb://localhost:27017/db.collection");

Page 44: MongoDB & Spark

Spark Context

JavaPairRDD<Object, BSONObject> documents = context.newAPIHadoopRDD(
    conf,
    MongoInputFormat.class,
    Object.class,
    BSONObject.class
);
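
The context variable here is the driver's JavaSparkContext; the deck doesn't show its construction, so the following is an assumed minimal setup (the app name is a placeholder, and local mode matches the spark-submit example later):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Assumed setup for the `context` used above; the app name is a placeholder.
SparkConf sparkConf = new SparkConf()
    .setAppName("MongoDBSparkExample")
    .setMaster("local[*]");
JavaSparkContext context = new JavaSparkContext(sparkConf);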


Page 49: MongoDB & Spark

mongos

mongos

Data Services

Page 50: MongoDB & Spark

Deployment Artifacts

Hadoop Connector Jar
Java Driver Jar
Fat Jar

Page 51: MongoDB & Spark

Spark Submit

/usr/local/spark-1.5.1/bin/spark-submit \
    --class com.mongodb.spark.examples.DataframeExample \
    --master local Examples-1.0-SNAPSHOT.jar

Page 52: MongoDB & Spark

Stand Alone

YARN

Spark

Mesos

Spark SQL

Spark Shell

Spark Streaming

Page 53: MongoDB & Spark

JavaRDD<Message> messages = documents.map(
    new Function<Tuple2<Object, BSONObject>, Message>() {
        public Message call(Tuple2<Object, BSONObject> tuple) {
            BSONObject header = (BSONObject) tuple._2.get("headers");

            Message m = new Message();
            m.setTo( (String) header.get("To") );
            m.setX_From( (String) header.get("From") );
            m.setMessage_ID( (String) header.get("Message-ID") );
            m.setBody( (String) tuple._2.get("body") );

            return m;
        }
    });
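
As with the earlier RDD examples, nothing runs until an action is applied to messages. A small assumed follow-up (the predicate and the getTo() getter are illustrative, assuming Message exposes getters matching its setters):

// Transformations are lazy; count() is the action that executes the pipeline.
long withRecipients = messages
    .filter(new Function<Message, Boolean>() {
        public Boolean call(Message m) {
            return m.getTo() != null;   // illustrative predicate
        }
    })
    .count();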

Page 54: MongoDB & Spark

MongoDB & Spark

Code Demo

Page 55: MongoDB & Spark

THE FUTURE

AND BEYOND THE INFINITE

Page 56: MongoDB & Spark

Stand Alone

YARN

Spark

Mesos

Spark SQL

Spark Shell

Spark Streaming

Page 57: MongoDB & Spark
Page 58: MongoDB & Spark
Page 59: MongoDB & Spark
Page 60: MongoDB & Spark

MongoDB + Spark

Page 61: MongoDB & Spark

THANKS!

{
    name: 'Bryan Reinero',
    role: 'Developer Advocate',
    twitter: '@blimpyacht',
    email: '[email protected]'
}