B2B Travel technology: Machine Learning Introduction, October 20th 2016

Introduction Machine Learning by MyLittleAdventure

Page 1: Introduction Machine Learning by MyLittleAdventure

B2B Travel technology

Machine Learning Introduction

October 20th 2016

Page 2: Introduction Machine Learning by MyLittleAdventure


Johnny RAHAJARISON

MyLittleAdventure @mylitadventure


Page 3: Introduction Machine Learning by MyLittleAdventure


What’s Machine Learning ?

Usage examples

Complexity

Algorithm families

Let’s go!

Troubleshoot

Tech insights

Next steps

Conclusion


Page 4: Introduction Machine Learning by MyLittleAdventure

Machine learning



Page 5: Introduction Machine Learning by MyLittleAdventure

What’s Machine Learning ?

Software that does something without being explicitly programmed to, just by learning from examples

The same software can be used for various tasks

It learns from experience with respect to some task and performance measure, and improves with experience


Page 6: Introduction Machine Learning by MyLittleAdventure

Usage examples (1/2)


Some typical usage examples

Page 7: Introduction Machine Learning by MyLittleAdventure

Use cases : MyLittleAdventure (2/2)


Language detection


Anomaly detection


Choice of parameters

MyLittleAdventure usage


Page 8: Introduction Machine Learning by MyLittleAdventure



"""Tests for convolution related functionality in tensorflow.ops.nn."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
from six.moves import xrange  # pylint: disable=redefined-builtin
import tensorflow as tf


class Conv2DTransposeTest(tf.test.TestCase):

  def testConv2DTransposeSingleStride(self):
    with self.test_session():
      strides = [1, 1, 1, 1]

      # Input, output: [batch, height, width, depth]
      x_shape = [2, 6, 4, 3]
      y_shape = [2, 6, 4, 2]

      # Filter: [kernel_height, kernel_width, output_depth, input_depth]
      f_shape = [3, 3, 2, 3]

      x = tf.constant(1.0, shape=x_shape, name="x", dtype=tf.float32)
      f = tf.constant(1.0, shape=f_shape, name="filter", dtype=tf.float32)
      output = tf.nn.conv2d_transpose(x, f, y_shape, strides=strides,
                                      padding="SAME")
      value = output.eval()

      # We count the number of cells being added at the locations in the output.
      # At the center, #cells=kernel_height * kernel_width
      # At the corners, #cells=ceil(kernel_height/2) * ceil(kernel_width/2)
      # At the borders, #cells=ceil(kernel_height/2)*kernel_width or
      #                        kernel_height * ceil(kernel_width/2)

      for n in xrange(x_shape[0]):
        for k in xrange(f_shape[2]):
          for w in xrange(y_shape[2]):
            for h in xrange(y_shape[1]):
              target = 4 * 3.0
              h_in = h > 0 and h < y_shape[1] - 1
              w_in = w > 0 and w < y_shape[2] - 1
              if h_in and w_in:
                target += 5 * 3.0

"""GradientDescent for TensorFlow."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

from tensorflow.python.framework import ops
from tensorflow.python.ops import math_ops
from tensorflow.python.training import optimizer
from tensorflow.python.training import training_ops


class GradientDescentOptimizer(optimizer.Optimizer):
  """Optimizer that implements the gradient descent algorithm.

  @@__init__
  """

  def __init__(self, learning_rate, use_locking=False, name="GradientDescent"):
    """Construct a new gradient descent optimizer.

    Args:
      learning_rate: A Tensor or a floating point value. The learning
        rate to use.
      use_locking: If True use locks for update operations.
      name: Optional name prefix for the operations created when applying
        gradients. Defaults to "GradientDescent".
    """
    super(GradientDescentOptimizer, self).__init__(use_locking, name)
    self._learning_rate = learning_rate

  def _apply_dense(self, grad, var):
    return training_ops.apply_gradient_descent(
        var,
        math_ops.cast(self._learning_rate_tensor, var.dtype.base_dtype),
        grad,
        use_locking=self._use_locking).op

  def _apply_sparse(self, grad, var):
    delta = ops.IndexedSlices(
        grad.values *

"""Tests for tensorflow.ops.linalg_grad."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import numpy as np
import tensorflow as tf


class ShapeTest(tf.test.TestCase):

  def testBatchGradientUnknownSize(self):
    with self.test_session():
      batch_size = tf.constant(3)
      matrix_size = tf.constant(4)
      batch_identity = tf.tile(
          tf.expand_dims(
              tf.diag(tf.ones([matrix_size])), 0), [batch_size, 1, 1])
      determinants = tf.matrix_determinant(batch_identity)
      reduced = tf.reduce_sum(determinants)
      sum_grad = tf.gradients(reduced, batch_identity)[0]
      self.assertAllClose(batch_identity.eval(), sum_grad.eval())


class MatrixUnaryFunctorGradientTest(tf.test.TestCase):
  pass  # Filled in below


def _GetMatrixUnaryFunctorGradientTest(functor_, dtype_, shape_, **kwargs_):

  def Test(self):
    with self.test_session():
      np.random.seed(1)
      m = np.random.uniform(low=-1.0,
                            high=1.0,
                            size=np.prod(shape_)).reshape(shape_).astype(dtype_)
      a = tf.constant(m)
      b = functor_(a, **kwargs_)

      # Optimal stepsize for central difference is O(epsilon^{1/3}).
      epsilon = np.finfo(dtype_).eps
      delta = 0.1 * epsilon**(1.0 / 3.0)
      # tolerance obtained by looking at actual differences using
      # np.linalg.norm(theoretical-numerical, np.inf) on -mavx build

Complex algorithm before

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

And Machine learning now…

Page 9: Introduction Machine Learning by MyLittleAdventure

Machine learning

Algorithm families


Page 10: Introduction Machine Learning by MyLittleAdventure

Supervised algorithms




Page 11: Introduction Machine Learning by MyLittleAdventure

Unsupervised algorithms



Clustering

Anomaly detection

Page 12: Introduction Machine Learning by MyLittleAdventure

Machine learning

Let’s go!


Page 13: Introduction Machine Learning by MyLittleAdventure



Collect Training data

Files, database, cache, data flow

Selection of model, and (hyper) parameters

Train algorithm

Use or store your trained estimator

Make predictions

Measure accuracy and precision


Page 14: Introduction Machine Learning by MyLittleAdventure

Collect training data

Get good-quality data

Get some samples

Don’t collect data for months before trying anything. Go fast and try things.


Weight (g)   Width (cm)   Height (cm)   Label
192          8.4          7.3           Granny smith apple
86           6.2          4.7           Mandarin
178          7.1          7.8           Braeburn apple
162          7.4          7.2           Cripps pink apple
118          6.1          8.1           Unidentified lemons
144          6.8          7.4           Turkey orange
362          9.6          9.2           Spanish jumbo orange
…            …            …             …

What about the data?

Fruit identification example

Page 15: Introduction Machine Learning by MyLittleAdventure

Prepare your data


Convert your features and labels to numbers

Put them on the same scale (normalization)?

Weight (g)   Width (cm)   Height (cm)   Label
192          8.4          7.3           1
86           6.2          4.7           2
178          7.1          7.8           3
162          7.4          7.2           5
118          6.1          8.1           10
144          6.8          7.4           8
362          9.6          9.2           9
…            …            …             …

We need to have some tests

Training set: learning phase (60% - 80%)

Test set: analysis phase (20% - 40%)

Page 16: Introduction Machine Learning by MyLittleAdventure


Prepare your data (code)
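The code shown on this slide is not preserved in the transcript. As a stand-in, here is a minimal sketch in plain Python of the three preparation steps listed above: turning text labels into numbers, putting the features on the same scale, and splitting the data into a training set and a test set. The helper names (`encode_labels`, `min_max_scale`, `train_test_split`), the 70% ratio and the seed are illustrative assumptions, and the label ids it produces are arbitrary, so they differ from the ids in the slide's table.

```python
import random

# Fruit samples copied from the slide: (weight g, width cm, height cm, label).
SAMPLES = [
    (192, 8.4, 7.3, "Granny smith apple"),
    (86, 6.2, 4.7, "Mandarin"),
    (178, 7.1, 7.8, "Braeburn apple"),
    (162, 7.4, 7.2, "Cripps pink apple"),
    (118, 6.1, 8.1, "Unidentified lemons"),
    (144, 6.8, 7.4, "Turkey orange"),
    (362, 9.6, 9.2, "Spanish jumbo orange"),
]


def encode_labels(labels):
    """Map each distinct text label to an integer id (ids are arbitrary)."""
    ids = {}
    for label in labels:
        ids.setdefault(label, len(ids) + 1)
    return [ids[label] for label in labels], ids


def min_max_scale(rows):
    """Rescale every feature column to the [0, 1] range."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def train_test_split(features, labels, train_ratio=0.7, seed=42):
    """Shuffle the samples, then keep train_ratio of them for training."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_ratio * len(indices)))
    train, test = indices[:cut], indices[cut:]
    return ([features[i] for i in train], [labels[i] for i in train],
            [features[i] for i in test], [labels[i] for i in test])


features = min_max_scale([sample[:3] for sample in SAMPLES])
labels, label_ids = encode_labels([sample[3] for sample in SAMPLES])
X_train, y_train, X_test, y_test = train_test_split(features, labels)
print(len(X_train), "training samples,", len(X_test), "test samples")
```

For brevity the sketch scales the whole table before splitting; in practice the scaling bounds are usually computed on the training set only, so that nothing from the test set leaks into training.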

Page 17: Introduction Machine Learning by MyLittleAdventure

Train algorithm


Choose a classifier

Fit the decision tree

Weight (g)   Width (cm)   Height (cm)   Label
192          8.4          7.3           1
86           6.2          4.7           2
178          7.1          7.8           3
162          7.4          7.2           5
118          6.1          8.1           10
144          6.8          7.4           8
362          9.6          9.2           9
…            …            …             …

We need to choose an estimator
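To make "fit the decision tree" concrete, here is a minimal sketch of a decision tree classifier written in plain Python and trained on the numeric table above. It is an illustration under assumptions, not the code from the talk: the function names (`gini`, `best_split`, `fit`, `predict`) and the use of Gini impurity as the split criterion are choices made for this example. Libraries such as scikit-learn ship a ready-made `DecisionTreeClassifier` that would normally be used instead.

```python
from collections import Counter

# Numeric training table from the slide: (weight, width, height) -> label.
X_train = [
    [192, 8.4, 7.3], [86, 6.2, 4.7], [178, 7.1, 7.8], [162, 7.4, 7.2],
    [118, 6.1, 8.1], [144, 6.8, 7.4], [362, 9.6, 9.2],
]
y_train = [1, 2, 3, 5, 10, 8, 9]


def gini(labels):
    """Gini impurity: 0 when all labels are identical."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, threshold) that lowers impurity most, or None."""
    best, best_score = None, gini(labels)
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
            right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left)
                     + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best


def fit(rows, labels):
    """Grow the tree recursively; a leaf stores the majority label."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, threshold = split
    left = [(r, l) for r, l in zip(rows, labels) if r[feature] <= threshold]
    right = [(r, l) for r, l in zip(rows, labels) if r[feature] > threshold]
    return (feature, threshold,
            fit([r for r, _ in left], [l for _, l in left]),
            fit([r for r, _ in right], [l for _, l in right]))


def predict(tree, row):
    """Walk down the tree until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if row[feature] <= threshold else right
    return tree


tree = fit(X_train, y_train)
# Hypothetical unseen fruit, close to the first training row.
print(predict(tree, [190, 8.3, 7.4]))
```

Because every row in this tiny table has its own label, the tree simply memorizes the training data; with several samples per fruit it would have to find thresholds that generalize.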

Page 18: Introduction Machine Learning by MyLittleAdventure

Make predictions


What do our predictions look like?

Weight (g)   Width (cm)   Height (cm)   Label
192          8.4          7.3           1
86           6.2          4.7           2
178          7.1          7.8           3

Test set

Weight (g)   Width (cm)   Height (cm)   Label
192          8.4          7.3           1
86           6.2          4.7           2
178          7.1          7.8           1



Page 19: Introduction Machine Learning by MyLittleAdventure

Measure (1/2)

Evaluate on data that your model has never seen during training



Accuracy = correct predictions / total predictions

Gives a simple confidence score of our performance level
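As a small worked example of this ratio, the sketch below computes it for the three rows shown on the "Make predictions" slide. It assumes that one of the two small tables holds the true labels and the other the predicted ones; the function name `accuracy` is illustrative.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Label columns of the two small tables (assumed: true labels, predictions).
y_true = [1, 2, 3]
y_pred = [1, 2, 1]
print(accuracy(y_true, y_pred))  # 2 correct out of 3
```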

Page 20: Introduction Machine Learning by MyLittleAdventure

Measure (2/2)

Try to visualize and analyze your data, and know what you want


                  Actual true      Actual false
Predicted true    True positive    False positive
Predicted false   False negative   True negative

Confusion Matrix

Skewed classes

Precision = true positives / number of predicted positives

Recall = true positives / number of actual positives

F1 score (trade-off) = 2 * (precision * recall) / (precision + recall)
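The sketch below shows why these measures matter for skewed classes. It computes precision, recall and the F1 score (taken here as the usual harmonic mean, 2 * precision * recall / (precision + recall)) on a made-up data set with 2 positives out of 10 samples. The data and the function name `precision_recall_f1` are hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical skewed data: only 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
some_model = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(precision_recall_f1(y_true, always_negative))  # (0.0, 0.0, 0.0)
print(precision_recall_f1(y_true, some_model))       # (0.5, 0.5, 0.5)
```

Both predictors get 8 of the 10 samples right, so accuracy alone cannot tell them apart, while recall and F1 show that the always-negative predictor never finds a positive.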

Page 21: Introduction Machine Learning by MyLittleAdventure


Measure and prediction (code)

Page 22: Introduction Machine Learning by MyLittleAdventure

Machine learning



Page 23: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (1/4)


Under/Overfitting situation

Page 24: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (2/4)


Underfitting:

Add / create more features

Use a more sophisticated model

Use fewer samples

Decrease regularization



Overfitting:

Use fewer features

Use a simpler model

Use more samples

Increase regularization

What are the different options ?

Page 25: Introduction Machine Learning by MyLittleAdventure

Troubleshoot (3/4)


Underfitting Overfitting

Using the learning curves…
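The learning-curve plots from this slide are not preserved in the transcript. The usual reading is that a low score on both the training data and the held-out data points to underfitting, while a high training score with a much lower held-out score points to overfitting. The sketch below turns that reading into a small helper; the function name `diagnose` and both thresholds are hypothetical values chosen only for illustration.

```python
def diagnose(train_score, validation_score, target=0.9, max_gap=0.1):
    """Rough reading of one point of a learning curve.

    target and max_gap are hypothetical thresholds, not universal constants.
    """
    if train_score < target:
        # The model cannot even fit the data it was trained on.
        return "underfitting"
    if train_score - validation_score > max_gap:
        # Good on training data, clearly worse on unseen data.
        return "overfitting"
    return "ok"


print(diagnose(0.62, 0.60))  # underfitting
print(diagnose(0.99, 0.70))  # overfitting
print(diagnose(0.95, 0.92))  # ok
```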

Page 26: Introduction Machine Learning by MyLittleAdventure

Troubleshoot : Model choice (4/4)


Page 27: Introduction Machine Learning by MyLittleAdventure

Machine learning

Tech insights


Page 28: Introduction Machine Learning by MyLittleAdventure

Platforms : easy, peasy

You don’t even have to code to build something (*wink wink* business developers)

Built-in models

Data munging

Model management by UI



Very high-level solutions

Page 29: Introduction Machine Learning by MyLittleAdventure


For understanding & prototyping implementation

Most Valuable Languages

Comfortable for prototyping, yet powerful for industrialisation

For bigger companies & projects, and fine-tuned



Matlab Octave Go Python Java C++

What language for what purpose ?

Page 30: Introduction Machine Learning by MyLittleAdventure


Built-in models

Data munging


Full integration to your product


You will have great power using a library


Page 31: Introduction Machine Learning by MyLittleAdventure

Machine learning

Next steps…


Page 32: Introduction Machine Learning by MyLittleAdventure

Next steps

Split your data in 3 : Training / Cross validation / Test set

Know the top algorithms

Search advanced techniques and optimizers (online learning, stacking)

Deep and reinforcement learning

Partial and semi-supervised learning

Transfer learning

How to store and analyse big data ? How do we scale ?

Try it ! Find your best tools and have some fun
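For the first of these next steps, a three-way split can be sketched as follows. The 60% / 20% / 20% proportions, the seed and the function name `three_way_split` are assumptions made for this example; the slide does not give specific ratios.

```python
import random


def three_way_split(samples, train=0.6, validation=0.2, seed=0):
    """Shuffle the samples and cut them into train / validation / test."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(train * len(shuffled)))
    n_val = int(round(validation * len(shuffled)))
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# 100 dummy samples stand in for real data.
train_set, validation_set, test_set = three_way_split(range(100))
print(len(train_set), len(validation_set), len(test_set))  # 60 20 20
```

The cross-validation set is used to compare models and tune hyperparameters, so that the test set stays untouched until the final evaluation.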

Page 33: Introduction Machine Learning by MyLittleAdventure


Try it and let’s get in touch!

Machine learning is not just a buzz word

Difficulties are not always what we think!

Machine learning is more about experiments and tests than just algorithms

There is no single perfect solution

There are plenty of easy-to-use solutions for beginners


Page 34: Introduction Machine Learning by MyLittleAdventure

Machine learning

One more thing!


Page 35: Introduction Machine Learning by MyLittleAdventure



Page 36: Introduction Machine Learning by MyLittleAdventure

Tensorflow learn


Page 37: Introduction Machine Learning by MyLittleAdventure

Thank you

Machine Learning Introduction

October 20th 2016

Questions ?