H2O PySparkling Water Michal Malohlava @mmalohlava and @h2oai presents 2016/10/08 PyData

Page 1: H2O PySparkling Water

H2O PySparkling Water

Michal Malohlava @mmalohlava and @h2oai

presents

2016/10/08 PyData

Page 2: H2O PySparkling Water

H2O.ai Machine Intelligence

H2O + PySpark = PySparkling

Page 3: H2O PySparkling Water

H2O: Open-Source In-Memory Data Science Platform

• Highly optimized Java code (in-house)

• Distributed in-memory K-V store and map/reduce computation framework

• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)

• Read/write access to distributed data frames (R/Pandas-style)

• ML algos: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

• REST API with clients: interactive UI, R, Python

H2O Python client

pySparkling
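
The client workflow listed on this slide (parse data, get a distributed frame, train a model, predict) can be sketched roughly as follows. This is a minimal sketch, assuming the `h2o` Python package and a Java runtime are installed; the helper name `train_gbm`, the CSV path, and the response column are hypothetical and not taken from the talk.

```python
def train_gbm(csv_path, response, ntrees=50):
    """Parse a CSV into an H2OFrame and train a GBM on it (hypothetical helper)."""
    # Imported lazily so the sketch can be loaded without h2o installed.
    import h2o
    from h2o.estimators.gbm import H2OGradientBoostingEstimator

    h2o.init()                         # start or connect to a local H2O cluster
    frame = h2o.import_file(csv_path)  # distributed parse into an H2OFrame
    train, valid = frame.split_frame(ratios=[0.8], seed=42)

    # Every column except the response is used as a predictor (an assumption).
    predictors = [c for c in frame.columns if c != response]
    model = H2OGradientBoostingEstimator(ntrees=ntrees)
    model.train(x=predictors, y=response,
                training_frame=train, validation_frame=valid)
    return model, model.predict(valid)
```

Calling `train_gbm("data.csv", "label")` would return the trained model and its predictions on the held-out split; both arguments here are placeholders.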

Page 4: H2O PySparkling Water

PySparkling Provides

Transparent integration of H2O machine learning platform with Spark ecosystem (PySpark)

Transparent use of H2O data structures (H2OFrame) and algorithms with Spark Python API

Excels in existing Spark workflows requiring advanced Machine Learning algorithms

Functionality missing in H2O can be replaced by Spark and vice versa

Page 5: H2O PySparkling Water

Benefits

• Additional algorithms

• NLP features

• Powerful data munging

• ML Pipelines

• Advanced algorithms

• speed v. accuracy

• advanced parameters

• Fully distributed and parallelized

• Graphical environment

• Fully fledged Python/R interfaces

Page 6: H2O PySparkling Water

PySparkling Use-Cases

Page 7: H2O PySparkling Water

Model Building

Data Source

Page 8: H2O PySparkling Water

Model Building

Data Source

Data munging

Page 9: H2O PySparkling Water

Model Building

Data Source

Data munging

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Page 10: H2O PySparkling Water

Model Building

Data Source

Data munging

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Prediction processing

Page 11: H2O PySparkling Water

Model Building

Data Source

Data munging

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Prediction processing

Steam

Model management
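
The model-building flow on these slides (data source, data munging, modelling, prediction processing) might look like the sketch below. It is a sketch under assumptions, not the talk's demo code: `build_model` is a hypothetical helper, the munging step is a placeholder, and the conversion methods use the names shown later in the deck (`as_h2o_frame`, `as_spark_frame`). The H2OContext, the Spark DataFrame, and an H2O estimator are passed in by the caller.

```python
def build_model(hc, spark_df, response, estimator):
    """Spark for munging, H2O for modelling, Spark again for the predictions."""
    # 1. Data munging with the Spark DataFrame API (placeholder clean-up).
    clean_df = spark_df.dropna().dropDuplicates()

    # 2. Hand the cleaned data over to H2O.
    h2o_frame = hc.as_h2o_frame(clean_df)

    # 3. Modelling with any H2O estimator, e.g. H2OGradientBoostingEstimator().
    predictors = [c for c in h2o_frame.columns if c != response]
    estimator.train(x=predictors, y=response, training_frame=h2o_frame)

    # 4. Prediction processing back on the Spark side.
    predictions = estimator.predict(h2o_frame)
    return hc.as_spark_frame(predictions)
```

Passing the context and estimator as arguments keeps the helper independent of how the cluster was started and of which algorithm is chosen.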

Page 12: H2O PySparkling Water

Data Munging

Data Source

Page 13: H2O PySparkling Water

Data Munging

Data Source

Page 14: H2O PySparkling Water

Data Munging

Data Source

Page 15: H2O PySparkling Water

Data Munging

Data Source

Data load/munging/exploration (H2O Flow UI)

Page 16: H2O PySparkling Water

Data Munging

Data Source

Data load/munging/exploration (H2O Flow UI)

Modelling

Page 17: H2O PySparkling Water

Stream processing

Page 18: H2O PySparkling Water

Stream processing

Off-line model training: Data Source

Page 19: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging

Page 20: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging, Modelling

Page 21: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging, Modelling

Stream processing: Data Stream, Spark Streaming/Storm/Flink

Page 22: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging, Modelling

Export model

Stream processing: Data Stream, Spark Streaming/Storm/Flink

Page 23: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging, Modelling

Export model

Deploy the model

Stream processing: Data Stream, Spark Streaming/Storm/Flink, Model prediction

Page 24: H2O PySparkling Water

Stream processing

Off-line model training: Data Source, Data munging, Modelling

Export model

Deploy the model

Stream processing: Data Stream, Spark Streaming/Storm/Flink, Model prediction

Steam

Model management
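
The pattern in this diagram (train off-line, export the model, deploy it into the stream, predict per record) can be illustrated without any cluster. The sketch below is pure Python and deliberately does not use H2O or Spark Streaming: the "model" is a toy nearest-mean classifier and the export format is JSON, both chosen only for illustration. With H2O the exported artifact would typically be a POJO or MOJO instead.

```python
import json


def train_offline(rows):
    """Toy model: mean feature value per label (stands in for H2O training)."""
    sums, counts = {}, {}
    for x, label in rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def export_model(means):
    """Serialize the model so the streaming job needs no training code."""
    return json.dumps(means)


def score_stream(exported, stream):
    """Deployed side: load the model once, then score records one by one."""
    means = json.loads(exported)
    for x in stream:
        # Predict the label whose mean is closest to the incoming value.
        yield min(means, key=lambda label: abs(x - means[label]))


# Off-line lane: train and export.
model = train_offline([(1.0, "neg"), (2.0, "neg"), (8.0, "pos"), (9.0, "pos")])
exported = export_model(model)

# Stream lane: score an incoming sequence of records.
predictions = list(score_stream(exported, iter([0.5, 7.0, 4.0])))
```

The point of the split is that the streaming side only depends on the exported artifact, so training and scoring can run on different infrastructure.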

Page 25: H2O PySparkling Water

What is inside?

Page 26: H2O PySparkling Water

Cluster

Driver node: PySpark main program

Worker node (x3)

Page 27: H2O PySparkling Water

Cluster

Driver node: PySpark main program, SparkContext

Worker node (x3)

sc = SparkContext.getOrCreate()

Page 28: H2O PySparkling Water

Cluster

Driver node: PySpark main program, SparkContext

Worker node (x3), each running a Spark executor

sc = SparkContext.getOrCreate()

Page 29: H2O PySparkling Water

Cluster

Driver node: PySpark main program, SparkContext

Worker node (x3), each running a Spark executor

sc = SparkContext.getOrCreate()

Page 30: H2O PySparkling Water

Cluster

Driver node: PySpark main program, SparkContext, H2OContext

Worker node (x3), each running a Spark executor

sc = SparkContext.getOrCreate()

h2o_context = H2OContext.getOrCreate()

Page 31: H2O PySparkling Water

Cluster

Driver node: PySpark main program, SparkContext, H2OContext

Worker node (x3), each running a Spark executor

sc = SparkContext.getOrCreate()

h2o_context = H2OContext.getOrCreate()
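
The two calls on these slides, with the imports they need, can be wrapped as below. This is a minimal sketch, assuming `pyspark` and the PySparkling package are installed and that a Spark master is reachable; the helper name `start_pysparkling` is hypothetical. The slides call `H2OContext.getOrCreate()` without arguments, while some PySparkling releases expect the SparkContext or SparkSession to be passed in, so the exact call depends on the installed version.

```python
def start_pysparkling():
    """Create (or reuse) the SparkContext and the H2OContext on the driver."""
    # Lazy imports: pyspark and pysparkling are only needed when this runs.
    from pyspark import SparkContext
    from pysparkling import H2OContext

    sc = SparkContext.getOrCreate()
    # As written on the slides; older releases may need getOrCreate(sc).
    h2o_context = H2OContext.getOrCreate()
    return sc, h2o_context
```

Both `getOrCreate` calls are idempotent, so the helper can be called repeatedly from a notebook without starting a second cluster.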

Page 32: H2O PySparkling Water

Data Ingest

Data Source (x2)

Spark Cluster: Spark Executor (x3), H2O Store (x3)

DataFrame

H2OFrame

Page 33: H2O PySparkling Water

Data Exchange

Data Source (x2)

Spark Cluster: Spark Executor (x3), H2O Store (x3)

DataFrame

H2OFrame

PyAPI

h2o_context.as_spark_frame: H2OFrame serves data for DataFrame operations

Page 34: H2O PySparkling Water

Data Exchange

Data Source (x2)

Spark Cluster: Spark Executor (x3), H2O Store (x3)

DataFrame

H2OFrame

PyAPI

h2o_context.as_h2o_frame: Materializes DataFrame as H2OFrame (H2O as a clever cache)
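
The two conversion calls named on the data-exchange slides can be wrapped so that calling code does not depend on one spelling. This is a sketch under assumptions: `to_h2o` and `to_spark` are hypothetical helpers, the snake_case names are the ones shown in the deck, and the camelCase fallbacks (`asH2OFrame`, `asSparkFrame`) are assumed to be the spelling in newer PySparkling releases.

```python
def to_h2o(h2o_context, spark_df):
    """Materialize a Spark DataFrame as an H2OFrame."""
    # Prefer the name from the slides, fall back to the camelCase spelling.
    convert = (getattr(h2o_context, "as_h2o_frame", None)
               or getattr(h2o_context, "asH2OFrame"))
    return convert(spark_df)


def to_spark(h2o_context, h2o_frame):
    """Expose an H2OFrame to Spark as a DataFrame."""
    convert = (getattr(h2o_context, "as_spark_frame", None)
               or getattr(h2o_context, "asSparkFrame"))
    return convert(h2o_frame)
```

A round trip is then `to_spark(hc, to_h2o(hc, df))`, where `hc` is the H2OContext and `df` a Spark DataFrame.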

Page 35: H2O PySparkling Water

Sentiment Analysis with PySparkling

DEMO

Page 36: H2O PySparkling Water

Start PySparkling

Opens Jupyter Notebook

Download from h2o.ai/download

Page 37: H2O PySparkling Water

Future

Page 38: H2O PySparkling Water

The Plan

Separation of H2O cluster from Spark infrastructure

✓ Preserving existing API

h2oContext = H2OContext.getOrCreate(ip="…", port=…)

Better integration into PySpark pipelines

✓ Support of H2O Ensembles (right now only as R-package)

Integration with Steam platform to support model management

DeepWater integration

H2O DeepWater with Python: early sneak, Fabrizio Milo, Sunday 3pm

Page 39: H2O PySparkling Water

More info

Check out GitHub & contribute: https://github.com/h2oai/sparkling-water

Check out H2O.ai Training Books: http://h2o.ai/resources

Check out H2O.ai Blog: http://h2o.ai/blog/

Check out H2O.ai YouTube Channel: https://www.youtube.com/user/0xdata

Page 40: H2O PySparkling Water

Learn more at h2o.ai

Follow us at @h2oai

Come see us at Open Tour in Dallas (Dallas, TX, Oct 26th)! See open.h2o.ai

PySparkling is an open-source ML application platform combining the power of PySpark and H2O

Thank you!