BigML, Inc 2
Spring 2016 Release
Speaker: Poul Petersen (CIO)
Moderator: Atakan Cetinsoy (VP Predictive Applications)
Resources: https://bigml.com/releases
Contact: [email protected]
Twitter: @bigmlcom, @whizzml
Questions: Enter questions into the chat box. We'll answer some via chat and others at the end of the session.
Promise of ML
[Figure: over time, moving from "Have" (Data) to "Want" (Automated Insights)]
Want:
•Reduce churn
•Increase conversion
•Improve diagnosis
•Reduce fraud
•Etc.
ML Hurdles
[Figure axis: time]
•Which algorithms?
•How to scale it?
•How to handle real data?
•How to tune it?
•How to automate it?
Current Resources
Data Exploration: SOURCE, DATASET, CORRELATION, STATISTICAL TEST
Supervised Learning: MODEL, ENSEMBLE, LOGISTIC REGRESSION, EVALUATION
Unsupervised Learning: CLUSTER, ANOMALY DETECTOR, ASSOCIATION DISCOVERY
Scoring: PREDICTION, BATCH PREDICTION
Automation: SCRIPT, LIBRARY, EXECUTION
BigML Vision
Paving the Path to Automatic Machine Learning
[Figure: automation increasing over time, from 2011 to Spring 2016, built on the REST API]
A. Programmable Infrastructure: Sauron • Automatic deployment and auto-scaling
B. Wintermute • Distributed Machine Learning Framework
C. Data Generation and Filtering: Flatline • DSL for transformation and new field generation
D. Workflow Automation: WhizzML • DSL for programmable workflows
E. Automatic Model Selection: SMACdown • Automatic parameter optimization
Workflow Map
SOURCE
DATASET: Flatline, Flatline Editor
STATS: Statistical Tests, Correlations
MODEL: Decision Trees, Bagging, Decision Forest, Logistic Regression
CLUSTER: K-Means, G-Means
ANOMALY: Isolation Forest
ASSOCIATION: Magnum Opus
PREDICTION: Batch Prediction, Batch Anomaly, Batch Centroid, Evaluation
Regular Workflows
[Diagram nodes: DATASET; FILTER → SOLD HOMES → NEW FEATURES → MODEL; FILTER → FORSALE HOMES → NEW FEATURES; BATCH PREDICTION → DEALS DATASET]
Model Selection
[Diagram: SOURCE → DATASET, split into TRAINING and TEST; a MODEL, an ENSEMBLE, and a LOGISTIC REGRESSION each get an EVALUATION; CHOOSE the best]
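The selection loop in this diagram can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two trainers (`train_majority`, `train_threshold`) are hypothetical stand-ins for the MODEL, ENSEMBLE, and LOGISTIC REGRESSION resources, the data is synthetic, and nothing here calls the BigML API.

```python
import random

def split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (training, test) lists."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, rows):
    """Fraction of (x, label) rows that predict(x) labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def train_majority(training):
    """Hypothetical baseline: always predict the most common training label."""
    labels = [y for _, y in training]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def train_threshold(training):
    """Hypothetical model: threshold halfway between the two class means."""
    pos = [x for x, y in training if y == 1]
    neg = [x for x, y in training if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def choose(trainers, rows):
    """Train every candidate on TRAINING, evaluate on TEST, return the best."""
    training, test = split(rows)
    scores = {name: accuracy(train(training), test)
              for name, train in trainers.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic data: the label is 1 when x is above 50.
rows = [(x, 1 if x > 50 else 0) for x in range(100)]
best, scores = choose(
    {"majority": train_majority, "threshold": train_threshold}, rows)
print(best, scores)
```

The same `choose` function works for any number of candidates, which is the point of automating this step instead of repeating it by hand.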
Model Tuning
[Diagram: SOURCE → DATASET, split into TRAINING and TEST; ENSEMBLE N=10, ENSEMBLE N=20, and ENSEMBLE N=1000 each get an EVALUATION; CHOOSE the best]
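Tuning the ensemble size follows the same pattern: build one ensemble per candidate N, evaluate each on the test split, and keep the best. The sketch below assumes a toy bagged ensemble of one-threshold "stumps" on synthetic data. The helpers `train_stump`, `train_ensemble`, and `tune` are hypothetical names and do not correspond to BigML resources.

```python
import random

def train_stump(sample):
    """One weak model: threshold at the mean x of its bootstrap sample."""
    threshold = sum(x for x, _ in sample) / len(sample)
    return lambda x: 1 if x > threshold else 0

def train_ensemble(training, n, rng):
    """Bagging: n stumps, each trained on a bootstrap resample; majority vote."""
    stumps = [train_stump([rng.choice(training) for _ in training])
              for _ in range(n)]
    return lambda x: 1 if 2 * sum(s(x) for s in stumps) > n else 0

def accuracy(predict, rows):
    """Fraction of (x, label) rows that predict(x) labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

def tune(training, test, sizes, seed=0):
    """Evaluate one ensemble per candidate size; return (best_size, scores)."""
    scores = {}
    for n in sizes:
        model = train_ensemble(training, n, random.Random(seed))
        scores[n] = accuracy(model, test)
    best = max(scores, key=scores.get)  # ties go to the first listed size
    return best, scores

# Synthetic data: even x values for training, odd x values for testing.
rows = [(x, 1 if x >= 50 else 0) for x in range(100)]
training, test = rows[::2], rows[1::2]
best_n, scores = tune(training, test, sizes=[10, 20, 1000])
print(best_n, scores)
```

On a problem this easy all three sizes score similarly, so the tie-breaking rule decides. With real data the evaluation metric would be what separates them.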
SMACdown
•How many models?
•How many nodes?
•Missing splits or not?
•Number of random candidates?
•Balance the objective?
SMACdown can tell you!
Best-First Features
{F1} {F2} {F3} {F4} ... {Fn}
CHOOSE BEST S = {Fa}
S+{F1} S+{F2} S+{F3} S+{F4} ... S+{Fn-1}
CHOOSE BEST S = {Fa, Fb}
S+{F1} S+{F2} S+{F3} S+{F4} ... S+{Fn-1}
CHOOSE BEST S = {Fa, Fb, Fc}
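The rounds above amount to greedy forward selection: in each round, try adding every remaining feature to S and keep the one that scores best. A minimal sketch follows, assuming a hypothetical `score` function that rates a feature subset. In practice that score would come from evaluating a model trained on those fields. The feature names and their weights are made up for illustration.

```python
def best_first(features, score, k):
    """Greedy forward selection: grow S by one feature per round, k rounds."""
    selected = []
    for _ in range(k):
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        # Evaluate S + {f} for every remaining feature and keep the best one.
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
    return selected

# Hypothetical per-feature usefulness; a real scorer would train and
# evaluate a model on each candidate subset instead.
USEFUL = {"sqft": 0.5, "bedrooms": 0.3, "zipcode": 0.15, "color": 0.0}

def toy_score(subset):
    return sum(USEFUL[f] for f in subset)

print(best_first(list(USEFUL), toy_score, k=3))
# → ['sqft', 'bedrooms', 'zipcode']
```

Each round costs one evaluation per remaining feature, so k rounds over n features need roughly k·n evaluations rather than one per possible subset.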
Stacked Generalization
[Diagram: SOURCE → DATASET → MODEL, ENSEMBLE, LOGISTIC REGRESSION; three BATCH PREDICTION steps, each yielding an EXTENDED DATASET; a final LOGISTIC REGRESSION trained on the extended data]
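The idea in this diagram is that each base model's predictions become extra columns of the dataset, and a final model learns from those columns. The sketch below is a simplification under stated assumptions: the base models are fixed hypothetical rules rather than trained models, and a lookup table stands in for the final logistic regression on the slide.

```python
from collections import Counter, defaultdict

def extend(rows, base_models):
    """Append every base model's prediction to each row (the extended dataset)."""
    return [(x, tuple(m(x) for m in base_models), y) for x, y in rows]

def train_meta(extended):
    """Meta-learner: for each pattern of base predictions, keep the most
    common true label seen with that pattern."""
    votes = defaultdict(Counter)
    for _, preds, y in extended:
        votes[preds][y] += 1
    table = {preds: counts.most_common(1)[0][0]
             for preds, counts in votes.items()}
    default = Counter(y for _, _, y in extended).most_common(1)[0][0]
    return lambda preds: table.get(preds, default)

def stack(base_models, meta):
    """Final predictor: run the base models, then let the meta-learner decide."""
    return lambda x: meta(tuple(m(x) for m in base_models))

# Hypothetical base models, each only partly right about the target.
base_models = [lambda x: int(x > 30), lambda x: int(x > 70)]

# Synthetic target: the label is 1 only in the middle band 31..70.
rows = [(x, int(30 < x <= 70)) for x in range(100)]

stacked = stack(base_models, train_meta(extend(rows, base_models)))
print(all(stacked(x) == y for x, y in rows))
```

Neither base rule matches the target alone, but their combination does. A careful implementation would train the meta-learner on predictions for held-out rows so that it does not simply learn the base models' training errors.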
Better Algorithms
•Stacked Generalization
•Boosting
  •AdaBoost
  •LogitBoost
  •Martingale Boosting
  •Gradient Boosting
Why Workflows
•Machine Learning is iterative by nature.
•ML tools still require many repetitive (and manual) tasks.
•Instead of helping users focus on the output, many tools force analysts, developers, and scientists to focus on infrastructure, parallelism, etc.
•Not everybody can implement complex workflows or meta-algorithms, but many people can reuse them.
WhizzML Features
•A Domain-Specific Language (DSL) for automating Machine Learning workflows.
•Complete programming language.
•Machine Learning “operations” are first-class citizens.
•Scale is provided for free.
•API First! - Everything is composable.
Via API
export BIGML_USERNAME=myuser
export BIGML_API_KEY=6ef37b3d791061d345ef51281dae821ac7943ed7
export BIGML_AUTH="username=$BIGML_USERNAME;api_key=$BIGML_API_KEY"
export SCRIPT="https://bigml.io/script?$BIGML_AUTH"
export LIBRARY="https://bigml.io/library?$BIGML_AUTH"
export EXECUTION="https://bigml.io/execution?$BIGML_AUTH"
# Create a library that defines the function "addition"
http $LIBRARY \
  source_code="(define (addition a b) (+ a b))" | jq ".resource"
"library/573a97f5b95b3941f6000004"
# Create a script that imports the library and declares the input x
http $SCRIPT \
  imports:='["library/573a97f5b95b3941f6000004"]' \
  source_code="(addition x 2)" \
  inputs:='[{"name": "x", "type": "number"}]' | jq ".resource"
"script/573a9862b95b3941ff000015"
# Execute the script with x = 5
http $EXECUTION \
  script=script/573a9862b95b3941ff000015 \
  inputs:='[["x", 5]]' | jq ".resource"
"execution/573a987ab95b3941f000000d"
# Fetch the execution and read its result
http "https://bigml.io/execution/573a987ab95b3941f000000d?$BIGML_AUTH" \
  | jq ".execution.result"
7
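The same three POST requests can be assembled from Python's standard library. This is a sketch, not the official bindings: the endpoint URLs and the JSON fields (`source_code`, `imports`, `inputs`, `script`) are taken from the commands above, while the helper names (`auth_query`, `build_request`, `send`, `script_payload`, `execution_payload`) and the placeholder credentials are assumptions. The requests are only built here. Sending them needs a valid username and API key.

```python
import json
import os
import urllib.request

BIGML_URL = "https://bigml.io"

def auth_query(username, api_key):
    """Query string in the same form as $BIGML_AUTH."""
    return f"username={username};api_key={api_key}"

def build_request(resource, payload, auth):
    """Build (but do not send) a JSON POST to bigml.io/<resource>."""
    return urllib.request.Request(
        f"{BIGML_URL}/{resource}?{auth}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send a request and decode the JSON response (needs real credentials)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def script_payload(library_id):
    """Script that imports the library and declares the numeric input x."""
    return {
        "imports": [library_id],
        "source_code": "(addition x 2)",
        "inputs": [{"name": "x", "type": "number"}],
    }

def execution_payload(script_id, x):
    """Execution of the script with a concrete value for x."""
    return {"script": script_id, "inputs": [["x", x]]}

# Placeholder credentials; real ones come from the environment.
auth = auth_query(os.environ.get("BIGML_USERNAME", "myuser"),
                  os.environ.get("BIGML_API_KEY", "my_api_key"))

library_request = build_request(
    "library", {"source_code": "(define (addition a b) (+ a b))"}, auth)

# With valid credentials the calls would chain like the shell session:
# library_id = send(library_request)["resource"]
# script_id = send(build_request("script", script_payload(library_id), auth))["resource"]
# execution_id = send(build_request("execution", execution_payload(script_id, 5), auth))["resource"]
```

Keeping request construction separate from sending makes the payloads easy to inspect before any credentials are involved.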
Via Bindings
https://gist.github.com/whizzmler/8a849c282a770ac79a1441df5c5ccf62
WhizzML in GitHub (NEW)
https://github.com/whizzml/examples
Reify
•"Reifies" a resource into a WhizzML script.
•Rapid prototyping meets automation.
•Coming soon…
Secret Link Scripts
https://bigml.com/shared/script/oazVtg8t2V2JHFf6PLmenUJbNU
https://bigml.com/dashboard/script/573d53a628eb3e026f000012
A Gallery of Scripts
https://bigml.com/gallery/scripts
API Documentation (NEW)
•https://bigml.com/developers/libraries
•https://bigml.com/developers/scripts
•https://bigml.com/developers/executions
Documentation (NEW)
•Getting Started with WhizzML, The BigML Team, Version 1.0
•WhizzML Reference Manual, The BigML Team, Version draft
•WhizzML Tutorials, The BigML Team, Version draft
MACHINE LEARNING MADE BEAUTIFULLY SIMPLE
Copyright © 2016, BigML, Inc.
https://bigml.com/whizzml#documentation
Conclusion
•Automation is critical to fulfilling the promise of ML.
•WhizzML can create workflows that:
  •Automate repetitive tasks.
  •Automate model tuning and feature selection.
  •Combine ML models into more powerful algorithms.
  •Create shareable and re-usable executions.