MACHINE LEARNING IN PRODUCTION
Integrating with the Software Stack
Angela Bassa • September 2016
TODAY’S AGENDA
Brief Introduction
Learnings in Production
Examples
Questions and Wrap Up
ML in `prd` by @angebassa; probably doesn’t make sense without audio ai.withthebest.com 2016
HI! I’M ANGELA.
@angebassa
I run Data Science at EnerNOC, where we’re
changing the way the world uses energy.
QUICK INTRO TO
• Best-in-class user experience: < 2 seconds response time for 3k peak users per hour
• Reliability: > 99.99% availability for critical apps
• Global architecture: supporting globally deployable applications localized to 15 countries
• Scalability: 150,000 new data streams by 2016, 8TB new monthly data by 2017
• Near real-time: instantaneous access to insights derived from real-time streaming customer data
• DevOps: rapid development of new application capabilities that can be deployed efficiently
Metadata
Monthly Bills
Demand Response
Weather Metrics
Energy Consumption
Utility Partners
© KZawadzki
© Young Entrepreneur Journal
It’s not personal (and it’s not academic either);
it’s a business.
RESEARCH & DEVELOPMENT
Knowledge is used to design the blueprint for constructing a program that meets specifications.
© Wikimedia
Knowledge is used to decide the form a program should take:
• Plant the seed (algorithm)
• Feed/water (data)
• Reap the plants (programs)
© Wikimedia
© Kono Designs
© Unknown
MISTAKES WERE MADE
© Unknown
Josh Wills Keynote @ DataEngConf SF16
THE LOVE
• Acceptance Criteria by all parties
• Meta Cost Functions
• The Right Tool for the Job
• Golden Data Sets
• Testing Harnesses
© Charles M. Schulz
LET’S LOOK AT 2 USE CASES:
Flagship: Novel product feature
Under the hood: Anomaly detection and handling
WEATHER NORMALIZATION
• What if we could compare buildings in a portfolio across time and space?
• We have developed a novel methodology that delivers even more granularity than the ‘daily’ industry standard.
PROBLEM DEFINITION
Build a cloud-based application to allow for apples-to-apples
comparison of energy consumption (across locations, time
frames, building types, etc.) in daily batched transformations at
an hourly resolution.
ACCEPTANCE CRITERIA
• Weather normalization is a typical (though burdensome) calculation that ASHRAE Engineers perform—and they know when it “looks right” (e.g. similar days have similar values, hot days display a discount and vice versa, etc.)
• Practical implementation required since the normalized data series are calculated from large native interval datasets.
• Actionable with limited training, meaning that the solution has to be robust to small increments of data since we can’t retrain the generative algorithm every day.
META COST FUNCTION
• Computation costs
• Model explainability
• Product requirements
• Data availability
• Architectural complexity
Cost = 𝑓( Model, Business )
• Bias/Variance trade-off
• Computational complexity
• Cross validation
• Design optimization
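One way to read Cost = 𝑓(Model, Business) is as a single score that combines a model-side error term with business-side penalties. The sketch below assumes a plain weighted sum, which the talk does not specify. The field names, weights, and candidate numbers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate model with hypothetical, made-up measurements."""
    name: str
    cv_error: float        # model side: cross-validated error (lower is better)
    compute_hours: float   # business side: daily computation cost
    explainability: float  # business side: 0 (opaque) to 1 (fully explainable)


def meta_cost(c, w_error=1.0, w_compute=0.1, w_opaque=0.5):
    """Weighted sum of model error and business penalties (assumed form)."""
    return (w_error * c.cv_error
            + w_compute * c.compute_hours
            + w_opaque * (1.0 - c.explainability))


candidates = [
    Candidate("linear", cv_error=0.30, compute_hours=0.5, explainability=0.9),
    Candidate("boosted_trees", cv_error=0.20, compute_hours=4.0, explainability=0.4),
]

# Pick the candidate with the lowest combined cost, not the lowest error.
best = min(candidates, key=meta_cost)
```

With these made-up weights the less accurate but cheaper and more explainable candidate wins, which is the kind of trade-off a meta cost function is meant to make explicit.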
GOLDEN DATA SET
Input    Output
𝑎0       𝑎0’
𝑏0       𝑏0’
𝑐0       𝑐0’
…        …
𝑓(𝑥) = 𝐶𝑥 + 𝜀
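A golden data set can be kept as fixed input/output pairs and replayed against the model on every change. The sketch below assumes the linear form 𝑓(𝑥) = 𝐶𝑥 + 𝜀 shown on the slide. The coefficient, the pairs, and the tolerance are illustrative values, not data from the talk.

```python
import math

C = 2.0           # hypothetical model coefficient
TOLERANCE = 0.05  # hypothetical bound on the acceptable error

# Golden pairs: inputs with outputs that were agreed to "look right".
GOLDEN = [(1.0, 2.0), (2.5, 5.0), (4.0, 8.0)]


def model(x):
    """Stand-in model: f(x) = C * x (the noise term is omitted here)."""
    return C * x


def check_golden(f, golden, tol):
    """Return (input, expected, actual) triples that miss by more than tol."""
    failures = []
    for x, expected in golden:
        actual = f(x)
        if not math.isclose(actual, expected, abs_tol=tol):
            failures.append((x, expected, actual))
    return failures


failures = check_golden(model, GOLDEN, TOLERANCE)
```

An empty `failures` list means the model still reproduces the golden outputs within tolerance.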
TESTING HARNESSES
Input    Output
𝑎0       𝑎0’
𝑏0       𝑏0’
𝑐0       𝑐0’
…        …
𝑓(𝑥) = 𝐶𝑖 𝑥 + 𝜀
𝑓(𝑥) = 𝐶𝑖+𝜀 𝑥 + 𝜀
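A testing harness can run the current model and a changed model over the same golden inputs and report how far they drift apart. This is a minimal sketch under assumptions: both models are linear, the noise term is left out, and the coefficients and tolerance are hypothetical.

```python
def make_model(coefficient):
    """Build f(x) = coefficient * x (noise term left out of the sketch)."""
    return lambda x: coefficient * x


def max_deviation(current, candidate, inputs):
    """Largest absolute difference between two models over the same inputs."""
    return max(abs(current(x) - candidate(x)) for x in inputs)


def harness(current, candidate, inputs, tol):
    """Run both models on the golden inputs; pass only if they agree within tol."""
    deviation = max_deviation(current, candidate, inputs)
    return {"deviation": deviation, "passed": deviation <= tol}


golden_inputs = [1.0, 2.5, 4.0]
current = make_model(2.0)     # stands in for the model in production
candidate = make_model(2.01)  # stands in for a slightly changed model

report = harness(current, candidate, golden_inputs, tol=0.05)
```

The returned dictionary gives a pass/fail answer that a build pipeline could act on, plus the measured deviation for anyone who needs to investigate.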
There’s more than one way to skin a cat. But cats are adorable, so why would you want to skin one?!
© Pets4Homes
© The Lion’s Choice
THE RIGHT TOOL FOR THE JOB
DATA QUALITY
THE THING ABOUT RELIABILITY
© World of HDR
© Left Handed Guitarists
VEE: VERIFICATION, EDITING, AND ESTIMATION
Before After
PROBLEM DEFINITION
Need reliable data to underpin all other analytics on
the platform. VEE has to run on all incoming time series
with minimal latency and near-real-time performance.
ACCEPTANCE CRITERIA
• Has to be supportable by the Production Operations team whenever there’s an issue; it could not scale if Data Science were needed at 2am
• Conservative data handling is required from a regulatory perspective: overwriting good data with bad data is unacceptable
• Rigid service-level contracts around each step (configuration, flagging, estimation, etc.) so all teams know what to work on, what is available when, and what to expect.
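The flagging and estimation steps can be sketched so that raw readings are never overwritten. This is an illustration under assumptions, not the pipeline described in the talk: the fixed ceiling used for flagging, the flag names, and the linear interpolation used for estimation are all hypothetical choices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    raw: Optional[float]              # value as received; never modified
    flag: str = "ok"                  # "ok", "missing", or "spike"
    estimate: Optional[float] = None  # filled only for flagged readings


def verify(values, max_value):
    """Flag missing readings and readings above a hypothetical ceiling."""
    out = []
    for v in values:
        if v is None:
            out.append(Reading(raw=v, flag="missing"))
        elif v > max_value:
            out.append(Reading(raw=v, flag="spike"))
        else:
            out.append(Reading(raw=v))
    return out


def estimate(readings):
    """Fill flagged readings by linear interpolation between valid neighbours."""
    good = [i for i, r in enumerate(readings) if r.flag == "ok"]
    result = []
    for i, r in enumerate(readings):
        if r.flag == "ok":
            result.append(r)
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None or right is None:
            result.append(r)  # no valid bracket: leave it unestimated
            continue
        weight = (i - left) / (right - left)
        lo, hi = readings[left].raw, readings[right].raw
        result.append(Reading(raw=r.raw, flag=r.flag,
                              estimate=lo + weight * (hi - lo)))
    return result


cleaned = estimate(verify([10.0, None, 14.0, 500.0, 18.0], max_value=100.0))
```

Keeping `raw`, `flag`, and `estimate` as separate fields means a bad estimate can never replace a good reading, and each step leaves an output that another team can inspect.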
THE RIGHT TOOL FOR THE JOB
© The Lion’s Choice
© The Lion’s Choice
ONE MORE THING…
© Kono Designs
QUESTIONS?
© Netflix
THANKS!
@AngeBassa • September, 2016