Wrangleconf Big Data Malaysia 2016

Overview

● Brief Skymind Intro
● Deep Learning outside research
● Core trends for ROI in deep learning
● Anomaly Detection with deep learning
● Simbox fraud detection for telco
● Network Intrusion
● Fintech securities churn prediction
● Real-time corporate campus security: detecting dangerous objects

Distributed Deep RL on Spark

We built Deeplearning4j

SKYMIND INTELLIGENCE LAYER (SKIL) REFERENCE ARCHITECTURE

Deep Learning outside research

● Too much hype
● Most companies rarely do machine learning, let alone deep learning
● Beginners try to jump to deep learning after Andrew Ng's Coursera class without first principles

This is not deep learning.

This is deep learning.


● Mostly Python and R on Kaggle
● Many learning from Udacity
● Most deep learning is research stage/enthusiast
● Salaried engineers doing DL mostly publishing papers
● Large fight for talent (see Google fellowship)


● Deep Learning hasn't penetrated the Fortune 2000
● Fortune 2000 wants ROI, not cat pictures
● Many organizations just NOW starting to take software seriously, let alone data science
● Use cases for deep learning still not widely understood
● Large fight for talent (see Google fellowship)

Core trends for ROI in DL

● Mostly funded by adtech companies
● Companies doing DL have lots of media data (audio, image, video)
● Many companies using DL for ad targeting
● Best use cases target understanding large-scale hidden patterns in data (often cross-domain)

● Time series has largely been ignored


● First attempts at deep learning follow papers (no other examples)

● Many companies end up sticking to simpler techniques after trying DL

● Expectations for DL tend to match hype not reality

● Some rare cases exist outside this trend (mainly in Asia)


For more trends see: https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-3-0

Anomaly Detection

● “Find the needle in the haystack”
● “Find the bad guy”
● “The machine's about to break!”
● “Find the next market rally”
● “Take action on said anomaly”

Anomaly Detection with deep learning

● Both unsupervised and supervised techniques
● LSTMs (time series neural net)
● Autoencoders (unsupervised)
● Expectations for DL tend to match hype not reality
● Some rare cases exist outside this trend (mainly in Asia)

LSTM

AutoEncoder
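
A minimal sketch of how an autoencoder can flag anomalies: train it to reconstruct normal data, then score each point by its reconstruction error. Everything here is an assumption for illustration: the toy 2-D data, the one-unit tied-weight linear autoencoder, the learning rate, and the helper names `train_autoencoder` and `reconstruction_error`. A real model would be deeper and nonlinear.

```python
def train_autoencoder(data, epochs=500, lr=0.05):
    """Fit a one-unit, tied-weight linear autoencoder by batch gradient descent.

    encode: h = w . x        decode: x_hat = h * w
    """
    w = [0.5, 0.5]  # fixed starting point so the run is repeatable
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0]
        for x in data:
            h = w[0] * x[0] + w[1] * x[1]              # encode
            err = [x[0] - h * w[0], x[1] - h * w[1]]   # x - x_hat
            err_dot_w = err[0] * w[0] + err[1] * w[1]
            # Gradient of ||x - x_hat||^2 with respect to w
            grad[0] += -2.0 * (h * err[0] + err_dot_w * x[0])
            grad[1] += -2.0 * (h * err[1] + err_dot_w * x[1])
        w = [w[0] - lr * grad[0] / n, w[1] - lr * grad[1] / n]
    return w


def reconstruction_error(w, x):
    """Squared distance between x and its reconstruction."""
    h = w[0] * x[0] + w[1] * x[1]
    return (x[0] - h * w[0]) ** 2 + (x[1] - h * w[1]) ** 2


# Toy "normal" behaviour: points on the line y = 2x.
normal = [(k / 10.0, 2.0 * k / 10.0) for k in range(-10, 11)]
weights = train_autoencoder(normal)

anomaly = (1.0, -2.0)  # breaks the pattern seen in training
normal_scores = [reconstruction_error(weights, x) for x in normal]
anomaly_score = reconstruction_error(weights, anomaly)

# Simplistic rule (an assumption): flag anything that reconstructs
# worse than the worst training point.
threshold = max(normal_scores)
is_anomaly = anomaly_score > threshold
```

Because the model only ever saw points on one line, it reconstructs those well and anything off the line poorly. No labels are needed, which is why this approach counts as unsupervised.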

Simbox fraud for telco

● Costs telcos over 3 billion yearly
● Route calls for free over a carrier network
● Need to mine raw call detail records (CDRs) to find the fraud
● Find and cluster fraudulent CDRs with autoencoders (unsupervised)
● Beats current rule-based and supervised approaches

Network Intrusion

● Raw web log traffic
● Detect attacks at points of origin
● Typically supervised learning
● Goal: Classify raw time series to find attacks
● Optional: Detect *kind* of attack
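
One way to frame "classify raw time series" as supervised learning is to cut the log stream into fixed-length windows and label each window. The sketch below assumes a hypothetical requests-per-second series and hypothetical per-second attack flags. `make_windows`, the window length, and the stride are illustrative choices, not the pipeline from the talk.

```python
def make_windows(series, flags, length, stride):
    """Cut a series into overlapping windows. A window is labelled 1 if any
    time step inside it is flagged as an attack, else 0."""
    windows, labels = [], []
    for start in range(0, len(series) - length + 1, stride):
        end = start + length
        windows.append(series[start:end])
        labels.append(1 if any(flags[start:end]) else 0)
    return windows, labels


# Hypothetical requests-per-second counts with a burst in the middle.
requests = [12, 11, 13, 12, 95, 102, 98, 12, 11, 13]
attack = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]

windows, labels = make_windows(requests, attack, length=4, stride=2)
```

Each (window, label) pair can then be fed to a sequence classifier such as a recurrent net. Detecting the kind of attack would mean replacing the 0/1 label with one class per attack type.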

Fintech securities churn prediction

● Predict when user is going to leave the service
● Using recurrent nets, find likelihood of leaving
● Using lift curves, identify budget for sending discounts to percentage of users “worth” saving
● Optional: use autoencoders with k-means to identify groups of users wanting to leave
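
The lift-curve step can be sketched as follows: rank users by predicted churn probability, then measure how many actual churners fall in the top fraction you can afford to contact. The scores and outcomes below are made-up illustrative data, and `cumulative_gain` and `lift` are hypothetical helpers, not part of any library.

```python
def cumulative_gain(scores, churned, fraction):
    """Share of all churners captured by contacting the top `fraction`
    of users ranked by predicted churn score (highest first)."""
    ranked = sorted(zip(scores, churned), key=lambda p: p[0], reverse=True)
    k = int(round(fraction * len(ranked)))
    captured = sum(label for _, label in ranked[:k])
    return captured / sum(churned)


def lift(scores, churned, fraction):
    """Gain relative to contacting the same number of users at random."""
    return cumulative_gain(scores, churned, fraction) / fraction


# Made-up model scores and ground truth (1 = user left the service).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
churned = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gain_top30 = cumulative_gain(scores, churned, 0.3)  # 2 of the 4 churners
lift_top30 = lift(scores, churned, 0.3)
```

Sweeping `fraction` from 0 to 1 traces the full curve. The discount budget then corresponds to the largest fraction at which the lift still justifies the cost of the offer.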

Corporate campus security

● At 30 FPS or more find dangerous objects in a crowd

● Identify a target object and send immediate report

● Uses variants of convolutional nets

● Imagine hooking this up to a real camera
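
The 30 FPS target implies a fixed per-frame time budget, which the arithmetic below makes explicit. The stage names and timings are hypothetical placeholders, not measurements from the system in the talk, and the check assumes the stages run one after another for each frame.

```python
def frame_budget_ms(fps):
    """Time available to process one frame at the given frame rate."""
    return 1000.0 / fps


def fits_budget(stage_ms, fps):
    """True if the summed per-frame stage latencies fit within the budget."""
    return sum(stage_ms.values()) <= frame_budget_ms(fps)


# Hypothetical per-frame latencies in milliseconds.
stages = {"decode": 4.0, "convnet_inference": 22.0, "report": 3.0}

budget = frame_budget_ms(30)        # about 33.3 ms per frame
ok_at_30 = fits_budget(stages, 30)  # 29 ms total fits
ok_at_60 = fits_budget(stages, 60)  # a 16.7 ms budget does not
```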

Conclusion

● Deep Learning still young
● Many use cases not being tried
● Research is moving faster every year
● Talent still hard to find
● Will become more common with time
