B2B Travel technology: Machine Learning Introduction, October 20th 2016

Introduction Machine Learning by MyLittleAdventure


Page 1: Introduction Machine Learning by MyLittleAdventure

B2B Travel technology

Machine Learning Introduction

October 20th 2016

Page 2: Introduction Machine Learning by MyLittleAdventure

Introduction

Johnny RAHAJARISON

MyLittleAdventure @mylitadventure


Page 3: Introduction Machine Learning by MyLittleAdventure

Agenda

What’s Machine Learning?
Usage examples
Complexity
Algorithm families
Let’s go!
Troubleshoot
Tech insights
Next steps
Conclusion

Page 4: Introduction Machine Learning by MyLittleAdventure

Machine learning

Introduction


Page 5: Introduction Machine Learning by MyLittleAdventure

What’s Machine Learning ?

Software that does something without being explicitly programmed to, just by learning from examples

The same software can be used for various tasks

It learns from experience with respect to some task and performance measure, and improves with that experience


Page 6: Introduction Machine Learning by MyLittleAdventure

Usage examples (1/2)


Some typical usage examples

Page 7: Introduction Machine Learning by MyLittleAdventure

Use cases : MyLittleAdventure (2/2)


Language detection

Clustering

Anomaly detection

Recommendation

Choice of parameters

MyLittleAdventure usage


Page 8: Introduction Machine Learning by MyLittleAdventure

Complexity

    """Tests for convolution related functionality in tensorflow.ops.nn."""
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import numpy as np
    from six.moves import xrange  # pylint: disable=redefined-builtin
    import tensorflow as tf


    class Conv2DTransposeTest(tf.test.TestCase):

      def testConv2DTransposeSingleStride(self):
        with self.test_session():
          strides = [1, 1, 1, 1]

          # Input, output: [batch, height, width, depth]
          x_shape = [2, 6, 4, 3]
          y_shape = [2, 6, 4, 2]

          # Filter: [kernel_height, kernel_width, output_depth, input_depth]
          f_shape = [3, 3, 2, 3]

          x = tf.constant(1.0, shape=x_shape, name="x", dtype=tf.float32)
          f = tf.constant(1.0, shape=f_shape, name="filter", dtype=tf.float32)
          output = tf.nn.conv2d_transpose(x, f, y_shape, strides=strides,
                                          padding="SAME")
          value = output.eval()

          # We count the number of cells being added at the locations in the output.
          # At the center, #cells=kernel_height * kernel_width
          # At the corners, #cells=ceil(kernel_height/2) * ceil(kernel_width/2)
          # At the borders, #cells=ceil(kernel_height/2)*kernel_width or
          #                        kernel_height * ceil(kernel_width/2)

          for n in xrange(x_shape[0]):
            for k in xrange(f_shape[2]):
              for w in xrange(y_shape[2]):
                for h in xrange(y_shape[1]):
                  target = 4 * 3.0
                  h_in = h > 0 and h < y_shape[1] - 1
                  w_in = w > 0 and w < y_shape[2] - 1
                  if h_in and w_in:
                    target += 5 * 3.0

    """GradientDescent for TensorFlow."""
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    from tensorflow.python.framework import ops
    from tensorflow.python.ops import math_ops
    from tensorflow.python.training import optimizer
    from tensorflow.python.training import training_ops


    class GradientDescentOptimizer(optimizer.Optimizer):
      """Optimizer that implements the gradient descent algorithm.

      @@__init__
      """

      def __init__(self, learning_rate, use_locking=False, name="GradientDescent"):
        """Construct a new gradient descent optimizer.

        Args:
          learning_rate: A Tensor or a floating point value. The learning
            rate to use.
          use_locking: If True use locks for update operations.
          name: Optional name prefix for the operations created when applying
            gradients. Defaults to "GradientDescent".
        """
        super(GradientDescentOptimizer, self).__init__(use_locking, name)
        self._learning_rate = learning_rate

      def _apply_dense(self, grad, var):
        return training_ops.apply_gradient_descent(
            var,
            math_ops.cast(self._learning_rate_tensor, var.dtype.base_dtype),
            grad,
            use_locking=self._use_locking).op

      def _apply_sparse(self, grad, var):
        delta = ops.IndexedSlices(
            grad.values *

    """Tests for tensorflow.ops.linalg_grad."""
    from __future__ import absolute_import
    from __future__ import division
    from __future__ import print_function

    import numpy as np
    import tensorflow as tf


    class ShapeTest(tf.test.TestCase):

      def testBatchGradientUnknownSize(self):
        with self.test_session():
          batch_size = tf.constant(3)
          matrix_size = tf.constant(4)
          batch_identity = tf.tile(
              tf.expand_dims(
                  tf.diag(tf.ones([matrix_size])), 0), [batch_size, 1, 1])
          determinants = tf.matrix_determinant(batch_identity)
          reduced = tf.reduce_sum(determinants)
          sum_grad = tf.gradients(reduced, batch_identity)[0]
          self.assertAllClose(batch_identity.eval(), sum_grad.eval())


    class MatrixUnaryFunctorGradientTest(tf.test.TestCase):
      pass  # Filled in below


    def _GetMatrixUnaryFunctorGradientTest(functor_, dtype_, shape_, **kwargs_):

      def Test(self):
        with self.test_session():
          np.random.seed(1)
          m = np.random.uniform(low=-1.0,
                                high=1.0,
                                size=np.prod(shape_)).reshape(shape_).astype(dtype_)
          a = tf.constant(m)
          b = functor_(a, **kwargs_)

          # Optimal stepsize for central difference is O(epsilon^{1/3}).
          epsilon = np.finfo(dtype_).eps
          delta = 0.1 * epsilon**(1.0 / 3.0)
          # tolerance obtained by looking at actual differences using
          # np.linalg.norm(theoretical-numerical, np.inf) on -mavx build

Complex algorithm before

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

And Machine learning now…
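To give an idea of what the one-line optimizer call hides, here is a minimal pure-Python sketch of gradient descent. It is not TensorFlow's implementation; the function name, the toy loss f(w) = (w - 3)^2, the learning rate and the step count are all assumptions chosen for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly move `w` against the gradient of the loss."""
    w = start
    for _ in range(steps):
        # Step size is proportional to the slope at the current point.
        w -= learning_rate * gradient(w)
    return w


# Hypothetical loss f(w) = (w - 3)^2, whose gradient is 2 * (w - 3)
# and whose minimum sits at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)
print(w_min)
```

A library optimizer applies the same update rule to every trainable variable of the model, which is why a single call is enough on the user side.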

Page 9: Introduction Machine Learning by MyLittleAdventure

Machine learning

Algorithm families


Page 10: Introduction Machine Learning by MyLittleAdventure

Supervised algorithms

Classification

Regression

Page 11: Introduction Machine Learning by MyLittleAdventure

Unsupervised algorithms

Clustering

Anomaly detection

Page 12: Introduction Machine Learning by MyLittleAdventure

Machine learning

Let’s go!


Page 13: Introduction Machine Learning by MyLittleAdventure

Recipe

Collect training data: files, database, cache, data flow

Select a model and its (hyper)parameters

Train the algorithm

Use or store your trained estimator

Make predictions

Measure: accuracy, precision

Page 14: Introduction Machine Learning by MyLittleAdventure

Collect training data

Get good-quality data

Get some samples

Don’t collect data for months before trying anything. Go fast and try things.

Weight (g) | Width (cm) | Height (cm) | Label
192        | 8.4        | 7.3         | Granny smith apple
86         | 6.2        | 4.7         | Mandarin
178        | 7.1        | 7.8         | Braeburn apple
162        | 7.4        | 7.2         | Cripps pink apple
118        | 6.1        | 8.1         | Unidentified lemons
144        | 6.8        | 7.4         | Turkey orange
362        | 9.6        | 9.2         | Spanish jumbo orange
…          | …          | …           | …

What about the data?

Fruit identification example

Page 15: Introduction Machine Learning by MyLittleAdventure

Prepare your data


Convert your features and labels to numbers

Put them on the same scale (normalization)?

Weight (g) | Width (cm) | Height (cm) | Label
192        | 8.4        | 7.3         | 1
86         | 6.2        | 4.7         | 2
178        | 7.1        | 7.8         | 3
162        | 7.4        | 7.2         | 5
118        | 6.1        | 8.1         | 10
144        | 6.8        | 7.4         | 8
362        | 9.6        | 9.2         | 9
…          | …          | …           | …

We need to keep some data aside for testing

Training set: learning phase (60% - 80%)

Test set: analytics phase (20% - 40%)

Page 16: Introduction Machine Learning by MyLittleAdventure


Prepare your data (code)
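The code shown on this slide did not survive extraction. As a stand-in, here is a minimal pure-Python sketch of the three preparation steps (encode labels, normalize, split). The helper names, the min-max scaling choice, the 30% test ratio and the label ids (assigned in order of appearance, unlike the ids in the table above) are assumptions, not the author's code.

```python
import random

# Rows from the fruit table: (weight g, width cm, height cm, label).
ROWS = [
    (192, 8.4, 7.3, "Granny smith apple"),
    (86, 6.2, 4.7, "Mandarin"),
    (178, 7.1, 7.8, "Braeburn apple"),
    (162, 7.4, 7.2, "Cripps pink apple"),
    (118, 6.1, 8.1, "Unidentified lemons"),
    (144, 6.8, 7.4, "Turkey orange"),
    (362, 9.6, 9.2, "Spanish jumbo orange"),
]


def encode_labels(labels):
    """Map each distinct label to an integer id, in order of first appearance."""
    ids = {}
    for label in labels:
        ids.setdefault(label, len(ids))
    return [ids[label] for label in labels], ids


def min_max_scale(features):
    """Rescale every column to the [0, 1] range."""
    columns = list(zip(*features))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in features
    ]


def train_test_split(features, labels, test_ratio=0.3, seed=42):
    """Shuffle the row indices, then cut them into a training and a test part."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(indices) * test_ratio))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(idx):
        return [features[i] for i in idx], [labels[i] for i in idx]

    return pick(train_idx), pick(test_idx)


y, label_ids = encode_labels([row[3] for row in ROWS])
X = min_max_scale([row[:3] for row in ROWS])
(X_train, y_train), (X_test, y_test) = train_test_split(X, y)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

In a real project the scaling bounds would usually be computed on the training rows only, so that nothing about the test set leaks into training.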

Page 17: Introduction Machine Learning by MyLittleAdventure

Train algorithm


Choose a classifier

Fit the decision tree

Weight (g) | Width (cm) | Height (cm) | Label
192        | 8.4        | 7.3         | 1
86         | 6.2        | 4.7         | 2
178        | 7.1        | 7.8         | 3
162        | 7.4        | 7.2         | 5
118        | 6.1        | 8.1         | 10
144        | 6.8        | 7.4         | 8
362        | 9.6        | 9.2         | 9
…          | …          | …           | …

We need to choose an estimator
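Choosing a classifier and fitting a decision tree could look like the sketch below. It assumes scikit-learn, which the slides do not name explicitly; the query fruit at the end is a made-up measurement for illustration.

```python
from sklearn.tree import DecisionTreeClassifier

# Features (weight g, width cm, height cm) and numeric labels from the table.
X_train = [
    [192, 8.4, 7.3],
    [86, 6.2, 4.7],
    [178, 7.1, 7.8],
    [162, 7.4, 7.2],
    [118, 6.1, 8.1],
    [144, 6.8, 7.4],
    [362, 9.6, 9.2],
]
y_train = [1, 2, 3, 5, 10, 8, 9]

# random_state fixes the tie-breaking between equally good splits.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

# Hypothetical new fruit to classify.
print(classifier.predict([[190, 8.3, 7.4]]))
```

An unconstrained tree memorizes distinct training rows perfectly, so its score on the training data says little; the measurement has to happen on held-out data.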

Page 18: Introduction Machine Learning by MyLittleAdventure

Make predictions


What do our predictions look like?

Test set

Weight (g) | Width (cm) | Height (cm) | Label
192        | 8.4        | 7.3         | 1
86         | 6.2        | 4.7         | 2
178        | 7.1        | 7.8         | 3

Predictions

Weight (g) | Width (cm) | Height (cm) | Label
192        | 8.4        | 7.3         | 1
86         | 6.2        | 4.7         | 2
178        | 7.1        | 7.8         | 1

Page 19: Introduction Machine Learning by MyLittleAdventure

Measure (1/2)

Evaluate on a dataset that your model has never seen during training

Accuracy

Correct predictions / total predictions

Gives a simple confidence score of our performance level
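As a small worked example, the accuracy formula can be applied to the three test rows and predictions from the previous slide. The function name is an assumption; only the label values come from the slides.

```python
def accuracy(actual, predicted):
    """Correct predictions divided by total predictions."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


actual = [1, 2, 3]      # labels of the test set
predicted = [1, 2, 1]   # labels returned by the model
score = accuracy(actual, predicted)
print(f"accuracy = {score:.2f}")
```

Two of the three predictions match, so the score is 2/3.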

Page 20: Introduction Machine Learning by MyLittleAdventure

Measure (2/2)

Try to visualize and analyze your data, and know what you want


Confusion Matrix

                | Actual true    | Actual false
Predicted true  | True positive  | False positive
Predicted false | False negative | True negative

Skewed classes

Precision = True positives / #predicted positives

Recall = True positives / #actual positives

F1 score (trade-off) = 2 * (precision * recall) / (precision + recall)
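The sketch below counts the four cells of the confusion matrix and derives precision, recall and the F1 score (the harmonic mean of the two, hence the factor 2). The helper names and the skewed ten-sample dataset are hypothetical, chosen so that accuracy looks good while precision and recall do not.

```python
def confusion_counts(actual, predicted):
    """Return (true positives, false positives, false negatives, true negatives)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p and a:
            tp += 1
        elif p and not a:
            fp += 1
        elif a:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Compute the three scores, returning 0.0 when a denominator is empty."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Skewed classes: only 2 positives among 10 samples (made-up data).
actual = [True, True] + [False] * 8
predicted = [True, False, True] + [False] * 7

tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall, f1)
```

Here 8 of the 10 predictions are correct, yet only half of the predicted positives are real and only half of the real positives are found.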

Page 21: Introduction Machine Learning by MyLittleAdventure


Measure and prediction (code)

Page 22: Introduction Machine Learning by MyLittleAdventure

Machine learning

Troubleshoot


Page 23: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (1/4)


Under/Overfitting situation

Page 24: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (2/4)

What are the different options?

Underfitting

Add / create more features
Use a more sophisticated model
Do not count on more samples (extra data rarely fixes underfitting)
Decrease regularization

Overfitting

Use fewer features
Use a simpler model
Use more samples
Increase regularization

Page 25: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (3/4)


Underfitting Overfitting

Using the learning curves…
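The learning-curve plots themselves are not available here, but the reading rule can be sketched in code: a low training score points to underfitting, while a high training score with a much lower validation score points to overfitting. The function name and both thresholds are arbitrary assumptions to adapt to your own problem.

```python
def diagnose(train_score, validation_score, target=0.9, max_gap=0.1):
    """Classify a fit from two accuracy-like scores in [0, 1]."""
    if train_score < target:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_score - validation_score > max_gap:
        # Good on training data, much worse on unseen data.
        return "overfitting"
    return "good fit"


print(diagnose(0.60, 0.58))
print(diagnose(0.99, 0.70))
print(diagnose(0.95, 0.92))
```

Learning curves give the same information across several training-set sizes instead of a single point.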

Page 26: Introduction Machine Learning by MyLittleAdventure

Troubleshoot : Model choice (4/4)


Page 27: Introduction Machine Learning by MyLittleAdventure

Machine learning

Tech insights


Page 28: Introduction Machine Learning by MyLittleAdventure

Platforms : easy, peasy

You don’t even have to code to build something (*wink wink* business developers)

Built-in models

Data munging

Model management by UI

PaaS


Very high-level solutions

Page 29: Introduction Machine Learning by MyLittleAdventure

Languages

What language for what purpose?

Matlab, Octave: for understanding and prototyping implementations

Go, Python: most valuable languages, comfortable for prototyping yet powerful for industrialisation

Java, C++: for bigger companies and projects, and fine-tuned software

Page 30: Introduction Machine Learning by MyLittleAdventure

Libraries

Built-in models

Data munging

Fine-tuning

Full integration to your product


You will have great power using a library

Golearn

Page 31: Introduction Machine Learning by MyLittleAdventure

Machine learning

Next steps…


Page 32: Introduction Machine Learning by MyLittleAdventure

Next steps

Split your data in 3 : Training / Cross validation / Test set

Know the top algorithms

Look into advanced techniques and optimizers (online learning, stacking)

Deep and reinforcement learning

Partial and semi-supervised learning

Transfer learning

How do we store and analyse big data? How do we scale?

Try it ! Find your best tools and have some fun
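The first next step, splitting the data in three, can be sketched as follows. The function name, the 60/20/20 ratios and the fixed seed are assumptions; the validation part is used to compare models and tune hyperparameters, and the test part is kept for the final measurement.

```python
import random


def three_way_split(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle once, then cut into training / cross validation / test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_validation = round(len(shuffled) * validation)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],
    )


train_set, validation_set, test_set = three_way_split(range(100))
print(len(train_set), len(validation_set), len(test_set))
```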

Page 33: Introduction Machine Learning by MyLittleAdventure

Conclusion

Try it and let’s get in touch!

Machine learning is not just a buzzword

Difficulties are not always where we think!

Machine learning is more about experiments and tests than just algorithms

There is no single perfect solution

There are plenty of easy-to-use solutions for beginners

Page 34: Introduction Machine Learning by MyLittleAdventure

Machine learning

One more thing!


Page 35: Introduction Machine Learning by MyLittleAdventure

Tensorflow


Page 36: Introduction Machine Learning by MyLittleAdventure

Tensorflow learn


Page 37: Introduction Machine Learning by MyLittleAdventure

Thank you

Machine Learning Introduction

October 20th 2016

Questions ?