Machine Learning Introduction by MyLittleAdventure


B2B Travel technology

Machine Learning Introduction
October 20th 2016

Introduction

Johnny RAHAJARISON @brainstorm_me
johnny.rahajarison@mylittleadventure.com

MyLittleAdventure @mylitadventure

2

Agenda

What’s Machine Learning?
Usage examples
Complexity
Algorithm families
Let’s go!
Troubleshoot
Tech insights
Next steps
Conclusion

3

Machine learning

Introduction

4

What’s Machine Learning ?

Software that does something without being explicitly programmed to, just by learning from examples

The same software can be used for various tasks

It learns from experience with respect to some task and some performance measure, and improves through experience

5

Usage examples (1/2)

6

Some typical usage examples

Use cases: MyLittleAdventure (2/2)

7

Language detection

Clustering

Anomaly detection

Recommendation

Choice of parameters

MyLittleAdventure usage


Complexity

8

"""Tests for convolution related functionality in tensorflow.ops.nn.""" from __future__ import absolute_import from __future__ import division from __future__ import print_function

import numpy as np from six.moves import xrange # pylint: disable=redefined-builtin import tensorflow as tf

class Conv2DTransposeTest(tf.test.TestCase):

def testConv2DTransposeSingleStride(self): with self.test_session(): strides = [1, 1, 1, 1]

# Input, output: [batch, height, width, depth] x_shape = [2, 6, 4, 3] y_shape = [2, 6, 4, 2]

# Filter: [kernel_height, kernel_width, output_depth, input_depth] f_shape = [3, 3, 2, 3]

x = tf.constant(1.0, shape=x_shape, name="x", dtype=tf.float32) f = tf.constant(1.0, shape=f_shape, name="filter", dtype=tf.float32) output = tf.nn.conv2d_transpose(x, f, y_shape, strides=strides, padding="SAME") value = output.eval()

# We count the number of cells being added at the locations in the output. # At the center, #cells=kernel_height * kernel_width # At the corners, #cells=ceil(kernel_height/2) * ceil(kernel_width/2) # At the borders, #cells=ceil(kernel_height/2)*kernel_width or # kernel_height * ceil(kernel_width/2)

for n in xrange(x_shape[0]): for k in xrange(f_shape[2]): for w in xrange(y_shape[2]): for h in xrange(y_shape[1]): target = 4 * 3.0 h_in = h > 0 and h < y_shape[1] - 1 w_in = w > 0 and w < y_shape[2] - 1 if h_in and w_in: target += 5 * 3.0

"""GradientDescent for TensorFlow.""" from __future__ import absolute_import from __future__ import division from __future__ import print_function

from tensorflow.python.framework import ops from tensorflow.python.ops import math_ops from tensorflow.python.training import optimizer from tensorflow.python.training import training_ops

class GradientDescentOptimizer(optimizer.Optimizer): """Optimizer that implements the gradient descent algorithm. @@__init__ """

def __init__(self, learning_rate, use_locking=False, name="GradientDescent"): """Construct a new gradient descent optimizer. Args: learning_rate: A Tensor or a floating point value. The learning rate to use. use_locking: If True use locks for update operations. name: Optional name prefix for the operations created when applying gradients. Defaults to "GradientDescent". """ super(GradientDescentOptimizer, self).__init__(use_locking, name) self._learning_rate = learning_rate

def _apply_dense(self, grad, var): return training_ops.apply_gradient_descent( var, math_ops.cast(self._learning_rate_tensor, var.dtype.base_dtype), grad, use_locking=self._use_locking).op

def _apply_sparse(self, grad, var): delta = ops.IndexedSlices( grad.values *

"""Tests for tensorflow.ops.linalg_grad.""" from __future__ import absolute_import from __future__ import division from __future__ import print_function

import numpy as np import tensorflow as tf

class ShapeTest(tf.test.TestCase):

def testBatchGradientUnknownSize(self): with self.test_session(): batch_size = tf.constant(3) matrix_size = tf.constant(4) batch_identity = tf.tile( tf.expand_dims( tf.diag(tf.ones([matrix_size])), 0), [batch_size, 1, 1]) determinants = tf.matrix_determinant(batch_identity) reduced = tf.reduce_sum(determinants) sum_grad = tf.gradients(reduced, batch_identity)[0] self.assertAllClose(batch_identity.eval(), sum_grad.eval())

class MatrixUnaryFunctorGradientTest(tf.test.TestCase): pass # Filled in below

def _GetMatrixUnaryFunctorGradientTest(functor_, dtype_, shape_, **kwargs_):

def Test(self): with self.test_session(): np.random.seed(1) m = np.random.uniform(low=-1.0, high=1.0, size=np.prod(shape_)).reshape(shape_).astype(dtype_) a = tf.constant(m) b = functor_(a, **kwargs_)

# Optimal stepsize for central difference is O(epsilon^{1/3}). epsilon = np.finfo(dtype_).eps delta = 0.1 * epsilon**(1.0 / 3.0) # tolerance obtained by looking at actual differences using # np.linalg.norm(theoretical-numerical, np.inf) on -mavx build

Complex algorithms before…

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

And Machine learning now…
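For context, the line above is the training step from TensorFlow's beginner MNIST tutorial. A minimal sketch of the surrounding setup, using the tutorial's variable names and the 2016-era API (an illustration, not code from these slides):

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load MNIST (downloaded on first use).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Softmax regression: y = softmax(xW + b).
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)

# Cross-entropy between predictions and true labels.
y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(
    -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))

# The single line from the slide: one gradient descent step per run.
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # tf.global_variables_initializer() in later releases
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})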

Machine learning

Algorithm families

9

Supervised algorithms

10

Supervised algorithms

Classification
Regression

Unsupervised algorithms

11

Unsupervised algorithms

Clustering
Anomaly detection

Machine learning

Let’s go!

12

Recipe

13

Collect training data: files, database, cache, data flow

Select a model and its (hyper)parameters

Train the algorithm

Use or store your trained estimator

Make predictions

Measure: accuracy / precision

Collect training data

Get qualitative data

Get some samples

Don’t get data for months and then try. Go fast and try things.

14

Weight (g) Width (cm) Height (cm) Label

192 8.4 7.3 Granny smith apple

86 6.2 4.7 Mandarin

178 7.1 7.8 Braeburn apple

162 7.4 7.2 Cripps pink apple

118 6.1 8.1 Unidentified lemons

144 6.8 7.4 Turkey orange

362 9.6 9.2 Spanish jumbo orange

… … … …

What about the data?

Fruit identification example

Prepare your data

15

Numerize your features and labels

Put them on the same scale (normalization)?

Weight (g) Width (cm) Height (cm) Label

192 8.4 7.3 1

86 6.2 4.7 2

178 7.1 7.8 3

162 7.4 7.2 5

118 6.1 8.1 10

144 6.8 7.4 8

362 9.6 9.2 9

… … … …

We need to have some tests

Training set: learning phase (60%-80%)

Test set: analytics phase (20%-40%)

16

Prepare your data (code)
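The code from this slide was not captured in the transcript. As a stand-in, a minimal scikit-learn sketch of the same steps on the fruit data; the feature columns and the 80/20 split come from the slides, while the helper names are scikit-learn's:

import numpy as np
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older releases

# Features: weight (g), width (cm), height (cm).
X = np.array([[192, 8.4, 7.3],
              [86, 6.2, 4.7],
              [178, 7.1, 7.8],
              [162, 7.4, 7.2]])
labels = ["Granny smith apple", "Mandarin", "Braeburn apple", "Cripps pink apple"]

# Numerize the labels.
y = LabelEncoder().fit_transform(labels)

# Put the features on the same scale (normalization).
X = StandardScaler().fit_transform(X)

# 80% training set, 20% test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)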

Train algorithm

17

Choose a classifier

Fit the decision tree

Weight (g) Width (cm) Height (cm) Label

192 8.4 7.3 1

86 6.2 4.7 2

178 7.1 7.8 3

162 7.4 7.2 5

118 6.1 8.1 10

144 6.8 7.4 8

362 9.6 9.2 9

… … … …

We need to choose an estimator
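A minimal sketch of those two steps, assuming scikit-learn's decision tree (the slides show a decision tree but don't name a library):

from sklearn.tree import DecisionTreeClassifier

# Choose a classifier...
clf = DecisionTreeClassifier()

# ...and fit the decision tree to the training set.
clf.fit(X_train, y_train)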

Make predictions

18

What do our predictions look like?

Test set:

Weight (g)  Width (cm)  Height (cm)  Label
192         8.4         7.3          1
86          6.2         4.7          2
178         7.1         7.8          3

Predictions:

Weight (g)  Width (cm)  Height (cm)  Label
192         8.4         7.3          1
86          6.2         4.7          2
178         7.1         7.8          1
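Continuing the hypothetical scikit-learn sketch, the predictions table above is one call on the held-out test set:

# Predict a label for each unseen test sample.
predictions = clf.predict(X_test)
print(predictions)  # e.g. [1 2 1]: the third fruit was mislabeled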


Measure (1/2)

Evaluate on data that has never been seen by your model

19

Accuracy

Correct predictions / total predictions

Gives a simple, single-number view of our performance level

Measure (2/2)

Try to visualize and analyze your data, and know what you want

20

                 Actual true      Actual false
Predicted true   True positive    False positive
Predicted false  False negative   True negative

Confusion Matrix

Skewed classes

Precision = True positives / #predicted positives

Recall = True positives / #actual positives

F1 score (trade-off) = 2 * (precision * recall) / (precision + recall)
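As a worked sketch, the three measures computed from raw confusion-matrix counts (the function is illustrative, not from the slides):

def scores(tp, fp, fn):
    precision = tp / float(tp + fp)  # true positives / predicted positives
    recall = tp / float(tp + fn)     # true positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

print(scores(tp=8, fp=2, fn=4))  # approx (0.80, 0.67, 0.73): good precision, weaker recall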

21

Measure and prediction (code)
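That slide's code was also not captured; a plausible scikit-learn equivalent of the measurement step:

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Correct predictions / total predictions.
print(accuracy_score(y_test, predictions))

# Rows are actual classes, columns are predicted classes (scikit-learn's convention).
print(confusion_matrix(y_test, predictions))

# Precision, recall and F1 per class; useful with skewed classes.
print(classification_report(y_test, predictions))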

Machine learning

Troubleshoot

22

Troubleshoot (1/4)

23

Under/Overfitting situation

Troubleshoot (2/4)

Underfitting

Add / create more features

Use a more sophisticated model

Use fewer samples (more data won’t fix underfitting, so iterate faster)

Decrease regularization

24

Overfitting

Use fewer features

Use a simpler model

Use more samples

Increase regularization

What are the different options?

Troubleshoot (3/4)

25

(Learning curve plots: underfitting vs. overfitting)

Using the learning curves…
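A sketch of producing such curves with scikit-learn's learning_curve helper, reusing the decision tree from earlier. A persistent gap between the two curves suggests overfitting; two low, converged curves suggest underfitting:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve

# Score the model on growing fractions of the training data, with 5-fold CV.
sizes, train_scores, valid_scores = learning_curve(
    clf, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5)

plt.plot(sizes, train_scores.mean(axis=1), label="training score")
plt.plot(sizes, valid_scores.mean(axis=1), label="cross-validation score")
plt.xlabel("training set size")
plt.ylabel("score")
plt.legend()
plt.show()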

Troubleshoot: Model choice (4/4)

26

Machine learning

Tech insights

27

Platforms: easy peasy

You don’t even have to code to build something (*wink wink* business developers)

Built-in models

Data munging

Model management by UI

PaaS

28

Very high-level solutions

Languages

What language for what purpose?

Matlab, Octave: for understanding & prototyping an implementation

Go, Python: the “most valuable languages”, comfortable for prototyping yet powerful for industrialisation

Java, C++: for bigger companies & projects, and fine-tuned software

29

Libraries

Built-in models

Data munging

Fine-tuning

Full integration to your product

30

You will have great power using a library

GoLearn

Machine learning

Next steps…

31

Next steps

Split your data in 3: training / cross-validation / test set (see the split sketch after this list)

Know the top algorithms

Look into advanced techniques and optimizers (online learning, stacking)

Deep and reinforcement learning

Partial and semi-supervised learning

Transfer learning

How to store and analyse big data? How do we scale?

32

Try it! Find your best tools and have some fun
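A minimal sketch of the three-way split from the first item above, assuming scikit-learn and a 60/20/20 ratio (the exact ratio is a judgment call):

from sklearn.model_selection import train_test_split

# Carve off a 20% test set, then split the remainder into training
# and cross-validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_cv, y_train, y_cv = train_test_split(
    X_rest, y_rest, test_size=0.25)  # 0.25 * 0.8 = 20% of the full set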

Conclusion

Try it and let’s get in touch!

Machine learning is not just a buzzword

Difficulties are not always what we think!

Machine learning is more about experiments and tests than just algorithms

There is no perfect, unique solution

There are plenty of easy-to-use solutions for beginners

33

Machine learning

One more thing!

34

TensorFlow

35

TensorFlow Learn
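A sketch of the fruit classifier with the 2016-era tf.contrib.learn API, a scikit-learn-style layer on top of TensorFlow (these calls were moved and renamed in later releases, so treat the exact names as era-specific):

import tensorflow as tf

# Three numeric features: weight, width, height.
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=3)]

# A small fully connected network; fit/predict mirror scikit-learn.
classifier = tf.contrib.learn.DNNClassifier(feature_columns=feature_columns,
                                            hidden_units=[10, 20, 10],
                                            n_classes=10)
classifier.fit(x=X_train, y=y_train, steps=200)
predictions = classifier.predict(X_test)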

36

Thank you!

Machine Learning Introduction

October 20th 2016

Questions?
