ntt-communications
Spark - Spark Summit East 2015
201531718
Spark Summit EAST 2015 2
01. Spark
Spark
Spark
Apache Spark
Spark
Spark: UC Berkeley AMPLab. OSS. Databricks
Ion Stoica
Hadoop
Spark
Spark
Date     Ver.   Notes
2009     -      UC Berkeley AMPLab
2010     -      OSS, Apache
2012/10  0.6.0  Java API
2013/02  0.7.0  Python API
2013/09  0.8.0  UI, MLlib
2014/02  0.9.0  Scala 2.10, GraphX
2014/05  1.0.0  Spark SQL, MLlib
2014/11  1.1.0
2014/12  1.2.0  Spark Streaming HA
2015/03  1.3.0  DataFrames API
2015/04  1.3.1
Spark
Hadoop MapReduce, Spark
[Figure labels: Spark, HDFS, MapReduce, Spark SQL, MLlib, Hive, Sqoop, YARN, Mesos]
Spark, Hadoop
[Figure labels: YARN or Mesos, HDFS]
Spark
[Figure: Hadoop MapReduce (M, R R R, HDFS) and Spark (S S S, HDFS)]
Spark
Hadoop
ASF (Apache Software Foundation) PJ: HDFS, MapReduce, SQL
Scala, Python
2015/03/18 - 2015/03/19 (2 days)
3/18: Keynote, 3 tracks, 27 sessions - Developers, Applications, Data Science
3/19: Workshop
The Sheraton, New York
Spark Summit East; Spark Summit 2015 (2015/7); Spark Summit 2013, 2014
Sponsors: Platinum, Gold, Silver
2014 Spark
http://www.slideshare.net/databricks/new-directions-for-apache-spark-in-2015
2014 Spark
Matei
Contributors per Month to Spark
http://www.slideshare.net/databricks/new-directions-for-apache-spark-in-2015
Spark Summit Keynote
New Directions for Spark in 2015 - Matei Zaharia, CTO, Databricks
2015
1. Data Science
RDD; 2015/3: Spark 1.3; Machine Learning Pipelines; R interface; 2015/6: Spark 1.4 (SparkR)
2. Platform Interfaces
Plug in data sources and algorithms
Data Sources: MySQL, Hive, HBase, SQL
Goal: unified engine across data sources
New Directions for Spark in 2015 - Matei Zaharia, CTO, Databricks
Spark
Harnessing the Power of Spark with Databricks Cloud
Ion Stoica (CEO at Databricks): Databricks Cloud
Databricks Notebook: Scala, Python, SQL; AWS; Spark + Cluster Manager
Notebook
Harnessing the Power of Spark with Databricks Cloud
Databricks Cloud
Developers Track
Developers Track: Spark
SQL, Hadoop, DB
Java, Python, R
Developers Track
Beyond SQL: Spark SQL Abstractions For The Common Spark Job - Michael Armbrust (Databricks)
Hadoop
API
import: JSON, Hive, MySQL, HDFS, S3
export: dBase, Cassandra, HBase, Elasticsearch, Amazon Redshift
Developers Track
Spark User Concurrency and Context/RDD Sharing at Production Scale - Farzad Aref (Zoomdata)
Zoomdata; ex. S3, HDFS, RDB, Spark
Spark, Zoomdata
HDFS, Spark
Developers Track
Power Hive with Spark (Hive on Spark) - Chao Sun (Cloudera), Marcelo Vanzin (Cloudera)
Hive: SQL, Hadoop map/reduce
Hive, Spark
Hive: HIVE-7292
Hive 1.1: Hive on Spark (HoS)
[Figure labels: HDFS, Spark, Mesos, Hive, YARN, HoS]
Data Science Track
2014 /
MLlib, GraphX, Spark Streaming
Spark
SparkR: R only; Deep Learning: GPU, Spark
YouTube
Spark ML Pipelines
Tokenizer / HashingTF, TF-IDF
lr
ML Pipelines
Pipelines
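The slide names Tokenizer, hashingTF and lr as pipeline stages. As a rough illustration of the idea (each stage transforms the output of the previous one), here is a minimal pure-Python sketch. It does not use Spark's API: `tokenize`, `hashing_tf`, `run_pipeline` and the bucket count are hypothetical names and values chosen for this example, and the lr stage is left out.

```python
import zlib
from functools import reduce

NUM_FEATURES = 16  # arbitrary bucket count chosen for this illustration


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashing_tf(tokens, num_features=NUM_FEATURES):
    """Map tokens to a fixed-length term-frequency vector via hashing."""
    vector = [0] * num_features
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        index = zlib.crc32(token.encode("utf-8")) % num_features
        vector[index] += 1
    return vector


def run_pipeline(stages, value):
    """Feed the value through each stage in order."""
    return reduce(lambda acc, stage: stage(acc), stages, value)


features = run_pipeline([tokenize, hashing_tf], "Spark makes pipelines simple spark")
print(features)
```

The point of the sketch is only the chaining: adding a stage means appending one more callable to the list, without changing the stages around it.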
Spark ML Pipelines
Practical Machine Learning Pipelines with MLlib - Joseph Bradley (Databricks)
ML Pipelines
Spark 1.2: Cross Validation
Future Plan / Roadmap: Spark 1.3
Spark MLlib
K-means, Logistic regression
Scikit-learn / R
Scala, Python, Java Spark
Spark Summit 2014
https://databricks-training.s3.amazonaws.com/movie-recommendation-with-mllib.html
Spark MLlib
Un-collaborative filtering: Giving the right recommendations when your users aren't helping you - Leah McGuire (PhD, Salesforce)
MLlib
Spark Streaming
Scala, Java; Spark 1.3: Python
Socket, Flume, Kafka, Twitter, Fluentd
Discretized Stream = RDD
n RDDs; 500ms ~ 30s; 10ms: Flume / Storm
CPU /
DMM 2 Spark: https://prezi.com/iz1d_sefm1q9/dmmcom-dmm2-spark/
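The "Discretized Stream = RDD" note can be read as: the incoming stream is cut into a sequence of small batches by a fixed batch interval, and each batch is then processed as an ordinary dataset. The following is a minimal pure-Python sketch of that idea, not Spark Streaming's API; `discretize`, the event tuples and the 500 ms interval are assumptions made for illustration.

```python
from collections import defaultdict


def discretize(events, batch_interval_ms):
    """Group (timestamp_ms, value) events into fixed-interval batches.

    Returns a list of (batch_start_ms, [values]) sorted by batch start.
    """
    batches = defaultdict(list)
    for timestamp_ms, value in events:
        # Round the timestamp down to the start of its interval
        batch_start = (timestamp_ms // batch_interval_ms) * batch_interval_ms
        batches[batch_start].append(value)
    return sorted(batches.items())


# Hypothetical events: (arrival time in ms, payload)
events = [(0, "a"), (120, "b"), (499, "c"), (500, "d"), (1250, "e")]
batches = discretize(events, batch_interval_ms=500)

for start, values in batches:
    # Each batch is handled as one small, finite dataset
    print(start, len(values), values)
```

One simplification to keep in mind: this sketch skips intervals that received no events, whereas a real micro-batch engine would typically still emit an (empty) batch for each interval.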
Spark Streaming
Streaming machine learning in Spark - Jeremy Freeman (HHMI Janelia Research Center)
Neuroscientist using computation to understand the brain
MLlib, Spark Streaming
K-means Streaming, Streaming Linear Regression, Time Series analysis
Spark
GraphX
SNS, Network
GraphX Advent Calendar 2014: http://www.adventar.org/calendars/491
GraphX
Workshop
Data Science Workshop
Databricks Cloud
Kaggle
Hands On: RecSys 2015
Spark GUI: Databricks Cloud; Spark - GUI
Advanced Developer Workshop
Workshop
Workshop: Databricks Cloud
GUI, VM; SQL, Python
Developers Workshop
Java, SQL, Scala, Python, R 1
Spark Developers; Wireless LAN 2
LAN
Meetup
Meetup: Data Driven (2015/03/17)
NYC; IT; CEO, CTO; Bloomberg; YouTube
NYC Data Science (2015/03/18)
Spark DataFrames and ML Pipelines for Large-Scale Data Science; Databricks
PyData NYC (2015/03/20)
Python + Data Science; 5 (5/22)
http://pydatatokyo.connpass.com/
Data Driven NYC #35
#35
SwiftKey: SwiftKey, CTO
InfluxDB: Paul Dix @InfluxDB, CEO; Go; DB
Spark: Ion Stoica @Databricks, CEO
SwiftKey
1. Data Driven: http://datadrivennyc.com/
2. Data Driven YouTube: https://www.youtube.com/channel/UCQID78IY6EOojr5RUdD47MQ
PyData NYC
Project Jupyter for Data Science
Matplotlib and the IPython notebook shapeshifting for your data
A couple of tips for winning data science competitions
Jupyter: Julia + Python + R
notebook notebook
notebook Notebook
1. PyData: http://datadrivennyc.com/
2. PyData YouTube: https://www.youtube.com/channel/UCQID78IY6EOojr5RUdD47MQ
Spark
Spark Summit, Hadoop
Workshop: Hadoop, Hadoop
MLlib / Spark Streaming / GraphX / SparkR
MTG; Notebook: R, Python, (Julia)