Zürcher Fachhochschule
Automated Machine Learning in Practice: State of the Art and Recent Results
6th Swiss Conference on Data Science, 14.6.2019
Lukas Tuggener¹,⁴
Mohammadreza Amirian¹,²
Katharina Rombach¹
Stefan Lörwald³
Anastasia Varlet³
Christian Westermann³
Thilo Stadelmann¹
¹ZHAW, ²Ulm University, ³PricewaterhouseCoopers AG, ⁴USI
Contents
• Automated Machine Learning
  • What is it?
  • Why is it?
• Current state of the art
• Benchmark results
• Conclusions and closing remarks
• Q & A
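Automated Machine Learning - What is it?
[Diagram: a data generating process (CRM system, stock markets, surveys, sensors such as a camera or thermometer) produces training data and live data. Training data and labels go into a training algorithm, which outputs a trained model. The trained model turns live data into predictions.]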
Automated Machine Learning - What is it?
• Model selection
• Model hyperparameters, e.g. number of layers, splitting criterion …
• Training algorithm selection
• Training hyperparameters, e.g. learning rate, batch size …
• Regularization
• Data handling, e.g. transformations, outlier handling …
• …
Do this automatically.
[Diagram: training data and labels go into a training algorithm, which outputs a trained model.]
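The choices listed on this slide can be thought of as one joint search space. The following is a minimal sketch of such a space and a uniform sampler over it. All names, value ranges and the grouping are illustrative assumptions, not taken from the talk or from any particular AutoML library.

```python
import random

# Hypothetical search space mirroring the slide's bullet list. Every name and
# value range is an illustrative assumption.
SEARCH_SPACE = {
    "model": {
        "neural_net": {"num_layers": [1, 2, 4, 8]},
        "decision_tree": {"splitting_criterion": ["gini", "entropy"]},
    },
    "training": {
        "learning_rate": [1e-3, 1e-2, 1e-1],
        "batch_size": [32, 64, 128],
    },
    "regularization": {"weight_decay": [0.0, 1e-4, 1e-2]},
    "data_handling": {
        "transformation": ["none", "standardize", "log"],
        "outlier_handling": ["keep", "clip", "drop"],
    },
}


def sample_config(space, rng):
    """Draw one full configuration uniformly at random."""
    model_name = rng.choice(sorted(space["model"]))
    config = {"model": model_name}
    # Model hyperparameters are conditional on the chosen model family.
    for name, choices in space["model"][model_name].items():
        config[name] = rng.choice(choices)
    # For brevity, the remaining groups are sampled for every model family,
    # even where a real system would make them conditional as well.
    for group in ("training", "regularization", "data_handling"):
        for name, choices in space[group].items():
            config[name] = rng.choice(choices)
    return config


if __name__ == "__main__":
    print(sample_config(SEARCH_SPACE, random.Random(0)))
```

The conditional structure (model hyperparameters exist only for the selected model family) is what makes this kind of space awkward for off-the-shelf optimizers.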
Automated Machine Learning - What is it?
More formally:
Combined Algorithm Selection and Hyperparameter optimization (CASH)¹:
¹C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, “Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms”.
Formula annotations: i-th cross-validation train / valid set; validation loss; model space; hyperparameter space
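The formula itself is missing from this text. As a reconstruction from the cited Auto-WEKA paper (not from the slide), the CASH problem is commonly written as follows. Here $\mathcal{A}$ is the model space, $\Lambda^{(j)}$ the hyperparameter space of algorithm $A^{(j)}$, $\mathcal{L}$ the validation loss, and $D^{(i)}_{\text{train}}$, $D^{(i)}_{\text{valid}}$ the i-th cross-validation train / valid set:

```latex
A^{*}_{\lambda^{*}} \in
\operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}}
\frac{1}{k} \sum_{i=1}^{k}
\mathcal{L}\!\left(A^{(j)}_{\lambda},\, D^{(i)}_{\text{train}},\, D^{(i)}_{\text{valid}}\right)
```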
Notably absent:
• Data preprocessing
• Training algorithm configuration
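To make the CASH objective concrete, here is a minimal sketch that solves it by exhaustive enumeration on a toy problem. The two model families, their hyperparameter grids, the 0/1 loss and the dataset are all assumptions chosen for illustration. Real systems replace the exhaustive loop with one of the optimizers discussed later.

```python
def knn_fit(train, k):
    """k-nearest-neighbour classifier on 1-D inputs; returns a predict function."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > len(nearest) else 0
    return predict


def threshold_fit(train, threshold):
    """Fixed decision threshold; deliberately ignores the training data."""
    def predict(x):
        return 1 if x > threshold else 0
    return predict


# Toy model space: algorithm name -> (fit function, hyperparameter values).
MODEL_SPACE = {
    "knn": (knn_fit, [1, 3, 5]),
    "threshold": (threshold_fit, [0.25, 0.5, 0.75]),
}

# Toy dataset of (x, label) pairs with label = 1 exactly when x > 0.5.
DATA = [(i / 12, int(i / 12 > 0.5)) for i in range(12)]


def cv_folds(data, k):
    """Yield (train, valid) pairs for k interleaved cross-validation folds."""
    for i in range(k):
        valid = data[i::k]
        train = [p for j, p in enumerate(data) if j % k != i]
        yield train, valid


def validation_loss(predict, valid):
    """0/1 loss averaged over one validation fold."""
    return sum(predict(x) != y for x, y in valid) / len(valid)


def solve_cash(data, k=3):
    """Return (mean validation loss, algorithm, hyperparameter) with minimal loss."""
    best = None
    for name, (fit, lambdas) in MODEL_SPACE.items():
        for lam in lambdas:
            losses = [validation_loss(fit(train, lam), valid)
                      for train, valid in cv_folds(data, k)]
            loss = sum(losses) / len(losses)
            if best is None or loss < best[0]:
                best = (loss, name, lam)
    return best


if __name__ == "__main__":
    print(solve_cash(DATA))
```

Note that, like the formal definition, this sketch covers only the model and its hyperparameters. It does not cover data preprocessing or training algorithm configuration.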
Automated Machine Learning - Why is it?
Make data analytics talent more efficient on hard tasks and obsolete on simple ones.
Does anyone need convincing?
Current state of the art - concepts
Optimization
CASH is an optimization problem
• Bayesian Optimization
• Evolutionary Strategies
• Tree Search
• Handcrafted Heuristics
• Random Search
Meta-learning
What can we learn about datasets and learning algorithms that is “generally true”?
• Dataset Clustering
• Dataset Landmarks
• Learning Curve estimation
• Training Meta-Models (BO etc…)
• Selecting Model candidates
• Shipping pretrained Models (think: MAML)
Resource allocation
How do we spend the resources at our disposal?
• Early stopping
• Model compression
• Restarts of promising candidates
• Explore vs. Exploit
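The explore vs. exploit trade-off in resource allocation can be illustrated with an epsilon-greedy rule. This is a minimal sketch and not a method named in the talk. The function name, the candidate interface (a zero-argument function returning a score) and the parameters are all assumptions.

```python
import random


def epsilon_greedy(candidates, budget, epsilon, rng):
    """Spend `budget` evaluations across candidates, balancing explore and exploit.

    `candidates` maps a name to a zero-argument function returning a score
    (higher is better). Returns the evaluation count and the mean observed
    score per candidate.
    """
    names = list(candidates)
    counts = {n: 0 for n in names}
    totals = {n: 0.0 for n in names}
    for step in range(budget):
        if step < len(names):
            choice = names[step]        # evaluate every candidate once first
        elif rng.random() < epsilon:
            choice = rng.choice(names)  # explore: pick any candidate
        else:
            # exploit: pick the candidate with the best mean score so far
            choice = max(names, key=lambda n: totals[n] / counts[n])
        totals[choice] += candidates[choice]()
        counts[choice] += 1
    means = {n: totals[n] / counts[n] for n in names if counts[n]}
    return counts, means


if __name__ == "__main__":
    toy = {"a": lambda: 0.6, "b": lambda: 0.9, "c": lambda: 0.7}
    print(epsilon_greedy(toy, budget=20, epsilon=0.2, rng=random.Random(0)))
```

With `epsilon=0` the rule exploits only and never revisits a candidate that looked poor in its first evaluation. Larger values spend more of the budget on re-checking alternatives.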
Current state of the art - implementations
• Data Science Machine (DSM): random search (our configuration), “fully trained”
• Auto-sklearn¹: Bayesian optimization, ensemble building, meta-trained BO model
• TPOT²: genetic programming
• Portfolio Hyperband: portfolio of promising model candidates³ (in our case sourced from openml.org), Hyperband⁴
¹M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning”
²R. S. Olson, R. J. Urbanowicz, P. C. Andrews, N. A. Lavender, J. H. Moore, et al., “Automating biomedical data science through tree-based pipeline optimization”
³M. Feurer, K. Eggensperger, S. Falkner, M. Lindauer, and F. Hutter, “Practical automated machine learning for the automl challenge 2018”
⁴L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar, “Hyperband: A novel bandit-based approach to hyperparameter optimization”
Portfolio Hyperband
Config: full model + training configuration
• Portfolio configs (P1, P2, P3, P4, P5, P6, …): sampled from successful and diverse runs on meta-datasets
• Random configs (Pa, Pb, Pc, Pd, Pe, Pf, …)
Procedure:
• Choose N configs
• Train each config for M CPU-seconds
• Choose best 1/2 of population
• Repeat training and halving (the diagram shows four training rounds and three halvings)
• Restart with different M/N ratio
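The procedure on this slide can be sketched roughly as follows. This is not the authors' implementation. It assumes that half of each starting population comes from the portfolio and the rest from random sampling, that training time accumulates across rounds, and that a hypothetical `score(config, seconds)` function returns the validation score after that much total training. The slide does not specify any of these details.

```python
import random


def successive_halving(configs, score, seconds_per_round):
    """Train every config for a time slice, keep the best half, repeat."""
    population = list(configs)
    trained_seconds = 0
    while True:
        trained_seconds += seconds_per_round
        scores = [score(c, trained_seconds) for c in population]
        if len(population) == 1:
            return population[0], scores[0]
        order = sorted(range(len(population)), key=scores.__getitem__, reverse=True)
        keep = max(1, len(population) // 2)
        population = [population[i] for i in order[:keep]]


def portfolio_hyperband(portfolio, sample_random, score, brackets, rng):
    """Run one successive-halving pass per (N configs, M seconds) bracket."""
    best_config, best_score = None, float("-inf")
    for n_configs, seconds in brackets:
        # Assumption: up to half of the population is taken from the portfolio.
        n_portfolio = min(len(portfolio), n_configs // 2)
        configs = rng.sample(portfolio, n_portfolio)
        configs += [sample_random(rng) for _ in range(n_configs - n_portfolio)]
        config, value = successive_halving(configs, score, seconds)
        if value > best_score:
            best_config, best_score = config, value
    return best_config, best_score


def toy_score(quality, seconds):
    """Toy learning curve: a config is a quality value approached over time."""
    return quality * (1 - 1 / (1 + seconds))


if __name__ == "__main__":
    rng = random.Random(0)
    portfolio = [0.9, 0.8, 0.95]              # hypothetical strong configs
    brackets = [(8, 1), (4, 2), (2, 4)]       # restarts with different M/N ratios
    print(portfolio_hyperband(portfolio, lambda r: r.random() * 0.5,
                              toy_score, brackets, rng))
```

With eight starting configs, a single pass performs four training rounds and three halvings, which matches the diagram. Scores from different brackets are compared directly here even though they reflect different amounts of training. That is a simplification.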
Benchmark results
Conclusions and closing remarks
• Design space of AutoML systems is vast.
• No clearly superior paradigm – but different characteristics
• Random search is boring but a crucial part of any AutoML system
[Figure: speed vs. accuracy trade-off, labelled with: good priors, aggressive early stopping, local random search, dataset characterization, response surface modelling, more random search]
Conclusions and closing remarks
Any constraints possible?
Spend a lot of time on:
• defining your meta-dataset
• pre-training meta-models
• pre-training models
Producing general improvements is extremely difficult
Any questions?
About me:
• Doctoral Student ZHAW / USI
• [email protected]
• 058 934 47 33
• https://tuggeluk.github.io/
Happy to answer questions & requests.
Thanks for your attention!
APPENDIX