MongoDB + Spark @blimpyacht


Page 1: MongoDB + Spark

MongoDB + Spark @blimpyacht

Page 2: MongoDB + Spark

Level Setting

Pages 3-5: MongoDB + Spark (image-only slides)
Page 6: MongoDB + Spark

TROUGH OF DISILLUSIONMENT

Page 7: MongoDB + Spark

Interactive Shell
Easy(-er)
Caching

Page 8: MongoDB + Spark

HDFS: Distributed Data

Page 9: MongoDB + Spark

The Hadoop stack: HDFS for storage, YARN for resource management, MapReduce for processing, and Hive and Pig as domain-specific languages on top.

Page 10: MongoDB + Spark

Distributed resources: Spark Standalone, YARN, or Mesos, running over HDFS.

Page 11: MongoDB + Spark

Distributed processing: Spark and Hadoop run on Spark Standalone, YARN, or Mesos, over HDFS.

Page 12: MongoDB + Spark

Domain-specific languages: Hive and Pig sit on Hadoop, alongside Spark, all on Spark Standalone, YARN, or Mesos, over HDFS.

Page 13: MongoDB + Spark

Spark's own interfaces: Spark SQL, the Spark shell, and Spark Streaming sit on Spark, just as Hive and Pig sit on Hadoop; everything runs on Spark Standalone, YARN, or Mesos, over HDFS.

Page 16: MongoDB + Spark

The same stack with HDFS removed: Spark SQL, the Spark shell, and Spark Streaming on Spark; Hive and Pig on Hadoop; Spark Standalone, YARN, or Mesos underneath.

Page 17: MongoDB + Spark

Hive and Pig run over MapReduce, while Spark SQL, the Spark shell, and Spark Streaming run over Spark; both stacks sit on Standalone, YARN, or Mesos.

Page 18: MongoDB + Spark

Dropping the Hadoop side: Spark SQL, the Spark shell, and Spark Streaming on Spark, running on Standalone, YARN, or Mesos.

Page 20: MongoDB + Spark

Architecture: the Driver Application (using the Java Driver and the Hadoop Connector) talks to the Master, which schedules executors on the Worker Nodes.

Page 21: MongoDB + Spark

Parallelization

Parallelize = x

Page 22: MongoDB + Spark

Transformations

Parallelize = x
t(x) = x'
t(x') = x''

Page 23: MongoDB + Spark

Transformations
filter( func )
union( otherRDD )
intersection( otherRDD )
distinct( n )
map( function )
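
As a rough sketch of what this looks like in the Java RDD API (the toy integer data and names like x and sc are illustrative, not from the deck), transformations are lazy and only describe new RDDs:

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TransformationSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("transformations").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Parallelize = x
        JavaRDD<Integer> x = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // t(x) = x'  -- map is lazy, nothing executes yet
        JavaRDD<Integer> xPrime = x.map(i -> i * 10);

        // t(x') = x'' -- filter is lazy too; only the lineage is recorded
        JavaRDD<Integer> xPrimePrime = xPrime.filter(i -> i > 20);

        sc.stop();
    }
}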

Page 24: MongoDB + Spark

Action

Parallelize = x
t(x) = x'
t(x') = x''
f(x'') = y

Page 25: MongoDB + Spark

Actions
collect()
count()
first()
take( n )
reduce( function )
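
Continuing the same toy example (again a sketch, not code from the deck), an action is what finally forces the lineage to execute and returns a value to the driver:

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ActionSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("actions").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> x = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
        JavaRDD<Integer> xPrimePrime = x.map(i -> i * 10).filter(i -> i > 20);

        // f(x'') = y -- actions trigger the whole lineage and return results
        List<Integer> values = xPrimePrime.collect();     // [30, 40, 50]
        long howMany = xPrimePrime.count();               // 3
        int sum = xPrimePrime.reduce((a, b) -> a + b);    // 120

        System.out.println(values + " count=" + howMany + " sum=" + sum);
        sc.stop();
    }
}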

Page 26: MongoDB + Spark

Lineage

Parallelize = x
t(x) = x'
t(x') = x''
f(x'') = y

Page 27: MongoDB + Spark

Parallelize → Transform → Transform → Action

Lineage

Page 28: MongoDB + Spark

Parallelize → Transform → Transform → Action (×5)

Lineage

Page 30: MongoDB + Spark

Parallelize → Transform → Transform → Action (×5)

Lineage
http://www.blimpyacht.com/2016/02/03/a-visual-guide-to-the-spark-hadoop-ecosystem/

Page 31: MongoDB + Spark

https://github.com/mongodb/mongo-hadoop

Page 32: MongoDB + Spark

Spark Configuration

Configuration conf = new Configuration();
conf.set(
    "mongo.job.input.format",
    "com.mongodb.hadoop.MongoInputFormat");
conf.set(
    "mongo.input.uri",
    "mongodb://localhost:27017/db.collection");

Page 33: MongoDB + Spark

Spark Context

JavaPairRDD<Object, BSONObject> documents = context.newAPIHadoopRDD(
    conf,
    MongoInputFormat.class,
    Object.class,
    BSONObject.class
);
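
Writing results back to MongoDB mirrors this; a sketch, assuming mongo-hadoop's MongoOutputFormat and an illustrative output collection name:

// Assumes: import com.mongodb.hadoop.MongoOutputFormat;
Configuration outputConfig = new Configuration();
outputConfig.set(
    "mongo.output.uri",
    "mongodb://localhost:27017/db.output_collection");

// MongoOutputFormat ignores the path argument, but the API requires one.
documents.saveAsNewAPIHadoopFile(
    "file:///this-path-is-unused",
    Object.class,
    BSONObject.class,
    MongoOutputFormat.class,
    outputConfig);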

Page 34: MongoDB + Spark

Spark Submit

/usr/local/spark-1.5.1/bin/spark-submit \
    --class com.mongodb.spark.examples.DataframeExample \
    --master local \
    Examples-1.0-SNAPSHOT.jar

Page 35: MongoDB + Spark

Spark SQL, the Spark shell, and Spark Streaming on Spark, running on Standalone, YARN, or Mesos.

Page 36: MongoDB + Spark

// Convert each (id, BSONObject) pair read from MongoDB into a Message POJO.
JavaRDD<Message> messages = documents.map(
    new Function<Tuple2<Object, BSONObject>, Message>() {
        public Message call(Tuple2<Object, BSONObject> tuple) {
            BSONObject header = (BSONObject) tuple._2.get("headers");

            Message m = new Message();
            m.setTo((String) header.get("To"));
            m.setX_From((String) header.get("From"));
            m.setMessage_ID((String) header.get("Message-ID"));
            m.setBody((String) tuple._2.get("body"));

            return m;
        }
    });
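
Message here is just a bean from the example project; a minimal sketch covering only the setters used above (the real class in the mongo-hadoop examples may carry more fields):

// RDD elements must be serializable so Spark can move them between nodes.
public class Message implements java.io.Serializable {
    private String to;
    private String x_From;
    private String message_ID;
    private String body;

    public void setTo(String to) { this.to = to; }
    public void setX_From(String x_From) { this.x_From = x_From; }
    public void setMessage_ID(String message_ID) { this.message_ID = message_ID; }
    public void setBody(String body) { this.body = body; }

    public String getTo() { return to; }
    public String getX_From() { return x_From; }
    public String getMessage_ID() { return message_ID; }
    public String getBody() { return body; }
}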

Pages 37-38: MongoDB + Spark (image-only slides)
Page 39: MongoDB + Spark

THE FUTURE
AND BEYOND THE INFINITE

Page 40: MongoDB + Spark

Spark Connector

Page 41: MongoDB + Spark

Aggregation Filters: $match | $project | $group
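
These are ordinary aggregation pipeline stages that the connector aims to push down to MongoDB instead of filtering inside Spark. As a sketch of the same stages written against the plain MongoDB Java driver (database, collection, and field names are illustrative):

import java.util.Arrays;

import com.mongodb.MongoClient;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Projections;
import org.bson.Document;

public class AggregationSketch {
    public static void main(String[] args) {
        MongoClient client = new MongoClient("localhost", 27017);
        MongoCollection<Document> collection =
                client.getDatabase("db").getCollection("collection");

        // $match -> $project -> $group: count messages per sender
        for (Document doc : collection.aggregate(Arrays.asList(
                Aggregates.match(Filters.exists("headers.From")),
                Aggregates.project(Projections.include("headers.From")),
                Aggregates.group("$headers.From", Accumulators.sum("count", 1))))) {
            System.out.println(doc.toJson());
        }

        client.close();
    }
}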

Page 42: MongoDB + Spark

Data Locality: mongos

Page 43: MongoDB + Spark

THANKS! @blimpyacht