0xdata H2O Podcast

H2O – The Open Source Math Engine!

Better Predictions!

4/23/13

H2O – Open Source in-memory Machine Learning for Big Data

SriSatish Ambati, July 2013

Universe is sparse. Life is messy. Data is sparse & messy.

- Lao Tzu

Hadoop = opportunity
Not enough Data Scientists
Analysts won’t code Java

Volume: HDFS

HIVE/SQL

Data Scientist

Munging slice n dice Features

Classification Regression Clustering Optimal Model

Engineer

Velocity: Events Online Scoring

Exploration

Modeling

Offline Scoring

Business Analyst

Ensemble models Low latency

Applications

Predictions

Rule Engine

Before H2O

H2O the Prediction Engine

Ad hoc Exploration

Math Modeling

Real-time Scoring

Big Data

Messy NAs

Clustering

Classification

Ensembles 100’s nanos

models

Regression

Group By Grep

H2O the Prediction Engine

Big Data Exploration Modeling Scoring Real-time

No New API!

Approximate results each step!

Big Data beats Better Algorithms!

Big Data and Better Algorithms! Scale & Parallelism!

H2O the Prediction Engine

Intellectual Legacy

Math needs to be free

Open Source

Support and Innovation

https://github.com/0xdata/h2o

Use cases

Conversion, Retention & Churn
•  Lead Conversion
•  Engagement
•  Product Placement
•  Recommendations

Pricing Engine
Fraud Detection

Customers, Users

Insurance Credit Card Others…

Big Data and Better Algorithms

- Antonio Mollins, Data Scientist

Pete Fishman, Data Science @Yammer


0xdata.com

A Collection of Distributed Vectors

// A Distributed Vector
// much more than 2 billion elements
class Vec {
  long length();                 // more than an int's worth
  // fast random access
  double at(long idx);           // Get the idx'th elem
  boolean isNA(long idx);
  void set(long idx, double d);  // writable
  void append(double d);         // variable sized
}
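The Vec outline above can be tried out on a single JVM. The following is a minimal sketch under stated assumptions, not H2O's implementation: the class name `SketchVec`, the chunk size, and the use of `NaN` to mark an NA are all hypothetical choices made for illustration. It keeps the `long` indexing of the slide by spreading elements over a list of fixed-size arrays.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-JVM stand-in for the slide's Vec; not H2O's implementation.
class SketchVec {
    private static final int CHUNK_SIZE = 1 << 16;   // elements per backing array (assumed)
    private final List<double[]> chunks = new ArrayList<>();
    private long length = 0;

    long length() { return length; }

    // Random access: pick the backing array, then the slot inside it.
    double at(long idx) {
        checkIndex(idx);
        return chunks.get((int) (idx / CHUNK_SIZE))[(int) (idx % CHUNK_SIZE)];
    }

    // Assumption: a missing value (NA) is stored as NaN.
    boolean isNA(long idx) { return Double.isNaN(at(idx)); }

    void set(long idx, double d) {
        checkIndex(idx);
        chunks.get((int) (idx / CHUNK_SIZE))[(int) (idx % CHUNK_SIZE)] = d;
    }

    // Variable sized: allocate a new backing array whenever the last one is full.
    void append(double d) {
        int offset = (int) (length % CHUNK_SIZE);
        if (offset == 0) chunks.add(new double[CHUNK_SIZE]);
        chunks.get(chunks.size() - 1)[offset] = d;
        length++;
    }

    private void checkIndex(long idx) {
        if (idx < 0 || idx >= length) throw new IndexOutOfBoundsException("index " + idx);
    }
}
```

Splitting the storage into arrays is what lets the index be a `long`: a single Java array cannot hold more than about 2 billion elements.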


[Diagram: four JVM heaps, JVM 1 to JVM 4]

Frames

A Frame: Vec[] (columns: age, sex, zip, ID, car)

•  Vecs aligned in heaps
•  Optimized for concurrent access
•  Random access any row, any JVM
•  But faster if local... more on that later
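The "Frame is a Vec[]" idea can be sketched with plain arrays standing in for Vecs. This is an illustration only: `SketchFrame` and its methods are hypothetical names, and nothing here is distributed. It shows the column-major layout and how a row is read as the i'th element of every column.

```java
import java.util.Arrays;

// Hypothetical sketch: a Frame as named, equal-length columns.
// Plain double[] arrays stand in for distributed Vecs.
class SketchFrame {
    private final String[] names;
    private final double[][] cols;   // cols[c][r]: column-major, like Vec[]

    SketchFrame(String[] names, double[][] cols) {
        if (names.length != cols.length)
            throw new IllegalArgumentException("one name per column");
        for (double[] c : cols)
            if (c.length != cols[0].length)
                throw new IllegalArgumentException("columns must align");
        this.names = names.clone();
        this.cols = cols;
    }

    int numCols() { return cols.length; }

    long numRows() { return cols.length == 0 ? 0 : cols[0].length; }

    // Row i is the i'th element of every column.
    double[] row(int i) {
        double[] r = new double[cols.length];
        for (int c = 0; c < cols.length; c++) r[c] = cols[c][i];
        return r;
    }

    double[] column(String name) {
        int c = Arrays.asList(names).indexOf(name);
        if (c < 0) throw new IllegalArgumentException("no column " + name);
        return cols[c];
    }
}
```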


[Diagram: four JVM heaps, JVM 1 to JVM 4]

Distributed Data Taxonomy

A Chunk, Unit of Parallel Access

[Diagram: five Vecs]

•  Typically 1e3 to 1e6 elements
•  Stored compressed
•  In byte arrays
•  Get/put is a few clock cycles including compression
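To make "stored compressed, in byte arrays" concrete, here is one possible encoding, shown as a sketch under assumptions rather than as H2O's actual chunk format. The class `ByteChunk`, the bias-plus-one-byte scheme, and the reserved NA code are all hypothetical. Small integer values are stored as a one-byte offset from a per-chunk minimum, so a get is a mask, a compare, and an add.

```java
// Hypothetical sketch of a compressed chunk; not H2O's actual encoding.
// Integer-valued doubles are stored one byte each as an offset from a bias.
class ByteChunk {
    private static final int NA_CODE = 255;   // reserved byte value for NA (assumed)
    private final long bias;                  // smallest value in the chunk
    private final byte[] mem;

    private ByteChunk(long bias, byte[] mem) { this.bias = bias; this.mem = mem; }

    // Compress values whose range fits in one byte; NaN marks an NA.
    static ByteChunk compress(double[] vals) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double v : vals) {
            if (Double.isNaN(v)) continue;
            if (Double.isInfinite(v) || v != Math.rint(v))
                throw new IllegalArgumentException("finite integers only");
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (min > max) { min = 0; max = 0; }  // empty or all-NA chunk
        if (max - min >= NA_CODE)
            throw new IllegalArgumentException("range too wide for one byte");
        byte[] mem = new byte[vals.length];
        for (int i = 0; i < vals.length; i++)
            mem[i] = (byte) (Double.isNaN(vals[i]) ? NA_CODE : (int) (vals[i] - min));
        return new ByteChunk((long) min, mem);
    }

    boolean isNA(int i) { return (mem[i] & 0xFF) == NA_CODE; }

    // Get: decode on the fly.
    double at(int i) {
        int code = mem[i] & 0xFF;
        return code == NA_CODE ? Double.NaN : bias + code;
    }

    // Put: encode on the fly, rejecting values this encoding cannot hold.
    void set(int i, double d) {
        if (Double.isNaN(d)) { mem[i] = (byte) NA_CODE; return; }
        double code = d - bias;
        if (code != Math.rint(code) || code < 0 || code >= NA_CODE)
            throw new IllegalArgumentException("value does not fit this chunk's encoding");
        mem[i] = (byte) (int) code;
    }
}
```

In this sketch each element takes one byte instead of the eight bytes of a Java double.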


[Diagram: four JVM heaps, JVM 1 to JVM 4]

Distributed Parallel Execution

[Diagram: five Vecs]

•  All CPUs grab Chunks in parallel
•  Fork/Join (F/J) load balances
•  Code moves to Data
•  Map/Reduce & F/J handle all sync
•  H2O handles all communication and data management
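The "CPUs grab Chunks in parallel, F/J load balances" pattern can be sketched on one JVM with the standard `java.util.concurrent` Fork/Join classes. This is an illustration under assumptions: the class `ChunkSum` is hypothetical, chunks are plain `double[]` arrays, and there is no cross-node communication. Each leaf task handles one chunk (the map step) and partial sums are combined on the way back up (the reduce step).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical sketch: sum a column stored as chunks, one leaf task per chunk.
class ChunkSum extends RecursiveTask<Double> {
    private final double[][] chunks;
    private final int lo, hi;   // half-open range of chunk indices

    ChunkSum(double[][] chunks, int lo, int hi) {
        this.chunks = chunks;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Double compute() {
        if (hi - lo == 0) return 0.0;
        if (hi - lo == 1) {
            // Map step: one task works on one chunk, skipping NAs (NaN).
            double s = 0;
            for (double v : chunks[lo]) if (!Double.isNaN(v)) s += v;
            return s;
        }
        // Split the chunk range; idle workers steal the forked half.
        int mid = (lo + hi) >>> 1;
        ChunkSum left = new ChunkSum(chunks, lo, mid);
        ChunkSum right = new ChunkSum(chunks, mid, hi);
        left.fork();
        double r = right.compute();
        return left.join() + r;   // Reduce step: combine partial sums.
    }

    static double sum(double[][] chunks) {
        return ForkJoinPool.commonPool().invoke(new ChunkSum(chunks, 0, chunks.length));
    }
}
```

The caller writes only the per-chunk work and the combine step; forking, joining, and work stealing are left to the pool.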


Distributed Data Taxonomy

Frame – a collection of Vecs
Vec – a collection of Chunks
Chunk – a collection of 1e3 to 1e6 elems
elem – a Java double
Row i – i'th elements of all the Vecs in a Frame
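One consequence of the taxonomy above is that a global row index has to be mapped to a Chunk and an offset inside it. The sketch below shows one way to do that with cumulative chunk start offsets and a binary search. It is an assumption made for illustration: `ChunkLayout` and its methods are hypothetical names, not H2O's API.

```java
import java.util.Arrays;

// Hypothetical sketch: locate a global row inside a Vec made of Chunks.
class ChunkLayout {
    // starts[c] is the global index of chunk c's first element;
    // the final entry holds the total number of rows.
    private final long[] starts;

    ChunkLayout(int[] chunkLengths) {
        starts = new long[chunkLengths.length + 1];
        for (int c = 0; c < chunkLengths.length; c++) {
            if (chunkLengths[c] <= 0)
                throw new IllegalArgumentException("chunks must be non-empty");
            starts[c + 1] = starts[c] + chunkLengths[c];
        }
    }

    long numRows() { return starts[starts.length - 1]; }

    // Index of the chunk that holds the given global row.
    int chunkIndex(long row) {
        if (row < 0 || row >= numRows()) throw new IndexOutOfBoundsException("row " + row);
        int pos = Arrays.binarySearch(starts, row);
        // Exact hit: the row is the first element of chunk pos.
        // Miss: binarySearch returns -(insertionPoint) - 1; the chunk is insertionPoint - 1.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Position of the row within its chunk.
    int offsetInChunk(long row) {
        return (int) (row - starts[chunkIndex(row)]);
    }
}
```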


Distributed Coding Taxonomy

•  No Distribution Coding:
   •  Whole Algorithms, Whole Vector-Math
   •  REST + JSON: e.g. load data, GLM, get results

•  Simple Data-Parallel Coding:
   •  Per-Row (or neighbor row) Math
   •  Map/Reduce-style: e.g. any dense linear algebra

•  Complex Data-Parallel Coding:
   •  K/V Store, Graph Algos, e.g. PageRank
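The middle tier, simple data-parallel coding, comes down to writing a per-element map and a pairwise reduce. A minimal sketch of that shape follows, using Java's standard parallel streams in place of a cluster; this is a stand-in chosen for illustration, and `MeanTask` is a hypothetical name. It computes a column mean that ignores NAs (assumed here to be stored as NaN).

```java
import java.util.stream.DoubleStream;

// Hypothetical map/reduce-style task: mean of a column, skipping NAs.
class MeanTask {
    double sum;
    long count;

    // Map: fold one element into this partial result.
    void map(double v) {
        if (!Double.isNaN(v)) { sum += v; count++; }
    }

    // Reduce: merge another partial result into this one.
    void reduce(MeanTask other) {
        sum += other.sum;
        count += other.count;
    }

    double mean() { return count == 0 ? Double.NaN : sum / count; }

    // The parallel stream creates one MeanTask per split, maps elements
    // into it, then reduces the partial tasks pairwise.
    static double meanOf(double[] column) {
        return DoubleStream.of(column).parallel()
                .collect(MeanTask::new, MeanTask::map, MeanTask::reduce)
                .mean();
    }
}
```

The user code never mentions threads or partitions; it only says how to absorb one value and how to merge two partial results.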


Read the docs!

This talk!

Join our GIT!

H2O – The Open Source Math Engine!

Better Predictions!
