Terry Taewoong Um ([email protected]) University of Waterloo Department of Electrical & Computer Engineering
Introduction to Machine Learning and Deep Learning
T-robotics.blogspot.com
Facebook.com/TRobotics
CAUTION
I cannot explain everything
You cannot get every detail
Try to get the big picture
Get some useful keywords
Connect with your research
Contents
What is Machine Learning?
What is Deep Learning?
Part 1 : What is Machine Learning?
What is Machine Learning?
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." T. Mitchell (1997)
Example: A program for soccer tactics
T : Win the game
P : Goals
E : (x) Players' movements, (y) Evaluation
What is Machine Learning?
Toward learning robot table tennis, J. Peters et al. (2012) https://youtu.be/SH3bADiB7uQ
Tasks
classification : discrete target values
  x : pixels (28*28), y : 0, 1, 2, 3, ..., 9
regression : real target values
clustering : no target values
Performance
classification : 0-1 loss function
regression : L2 loss function
clustering
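The two loss functions named on this slide can be sketched in a few lines. This is a minimal illustration, not the author's code: the function names and the sample labels are made up, and both losses are averaged over the data, which is one common convention.

```python
def zero_one_loss(y_true, y_pred):
    """0-1 loss averaged over the data: the fraction of wrong predictions."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def l2_loss(y_true, y_pred):
    """L2 loss averaged over the data: the mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Classification: discrete targets (hypothetical digit labels).
digit_labels = [3, 1, 4, 1, 5]
digit_preds = [3, 1, 4, 7, 5]
clf_loss = zero_one_loss(digit_labels, digit_preds)

# Regression: real-valued targets (hypothetical weights in kg).
true_weights = [80.0, 65.0, 72.0]
pred_weights = [78.0, 65.0, 75.0]
reg_loss = l2_loss(true_weights, pred_weights)
```

Here one of the five digit predictions is wrong, so the 0-1 loss counts it once regardless of how far off it is, while the L2 loss grows with the size of each regression error.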
eXperience
classification : labeled data, (pixels) -> (number)
regression : labeled data, (x) -> (y)
clustering : unlabeled data, (x1, x2)
A Toy Example
[Figure: Height (cm) as input X vs. Weight (kg) as output Y; the unknown output is marked "?"]
A Toy Example
[Figure: Height (cm) vs. Weight (kg) with the line Y = aX + b; axis marks at 180 cm and 80 kg]
Model : Y = aX + b
Parameter : (a, b)
[Goal] Find (a, b) which best fits the given data
A Toy Example
[Analytic Solution] Least squares problem
(from AX = b, X = A#b, where A# is A's pseudoinverse)
Not always available
[Numerical Solution]
1. Set a cost function
2. Apply an optimization method (e.g. Gradient Descent (GD) method)
[Figure: L(a,b), from http://www.yaldex.com/game-development/1592730043_ch18lev1sec4.html]
Local minima problem
[Figure: from http://mnemstudio.org/neural-networks-multilayer-perceptron-design.htm]
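Both routes on this slide can be tried on the toy model Y = aX + b. The sketch below is an illustration under stated assumptions: the height/weight numbers are invented, the cost L(a, b) is taken to be the mean squared error, and the analytic route uses the normal equations, which coincide with the pseudoinverse solution when the design matrix has full column rank. It is written in plain Python to stay self-contained.

```python
# Hypothetical toy data: heights (cm) and weights (kg), not from the slides.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [52.0, 61.0, 69.0, 80.0, 88.0]


def fit_analytic(xs, ys):
    """Least squares for Y = aX + b via the normal equations.

    With design matrix A = [x, 1] of full column rank, this equals A# y,
    where A# = (A^T A)^-1 A^T is the pseudoinverse of A.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


def cost(a, b, xs, ys):
    """Assumed cost function L(a, b): mean squared error of the line."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_gradient_descent(xs, ys, lr=0.1, steps=2000):
    """Minimize L(a, b) by gradient descent on standardized inputs."""
    mean_x = sum(xs) / len(xs)
    std_x = (sum((x - mean_x) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - mean_x) / std_x for x in xs]  # rescale so one step size works
    a, b = 0.0, 0.0
    for _ in range(steps):
        residuals = [a * z + b - y for z, y in zip(zs, ys)]
        grad_a = 2 * sum(r * z for r, z in zip(residuals, zs)) / len(zs)
        grad_b = 2 * sum(residuals) / len(zs)
        a -= lr * grad_a
        b -= lr * grad_b
    # Back to original units: a*z + b = (a/std)*x + (b - a*mean/std)
    return a / std_x, b - a * mean_x / std_x


a_exact, b_exact = fit_analytic(heights, weights)
a_gd, b_gd = fit_gradient_descent(heights, weights)
```

For this convex cost the two answers agree. The local minima problem mentioned on the slide arises only for non-convex cost surfaces, where gradient descent can settle in different places depending on the starting point.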
[Figure: Age (year) vs. Running Record (min); axis marks at 32 years and 140 min]
What would be the correct model?
Select a model -> Set a cost function -> Optimization
[Figure: X vs. Y data, labeled "overfitting"; the unknown output is marked "?"]
What would be the correct model?
1. Regularization
2. Nonparametric model
L2 Regularization
(e.g. w = (a, b) where Y = aX + b)
Avoid a complicated model!
Another interpretation : Maximum a Posteriori (MAP)
http://goo.gl/6GE2ix
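A small sketch of L2 regularization for the slide's example w = (a, b). The slide's formula did not survive extraction, so the cost used here is an assumption: the squared error plus lam * (a^2 + b^2), the usual ridge form. The data and the name `fit_ridge` are hypothetical. Penalizing the intercept b follows the slide's w = (a, b); in practice the intercept is often left unpenalized.

```python
def fit_ridge(xs, ys, lam):
    """Minimize sum((a*x + b - y)^2) + lam * (a^2 + b^2) for w = (a, b)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Regularized normal equations: (A^T A + lam * I) w = A^T y
    m11, m12, m22 = sxx + lam, sx, n + lam
    det = m11 * m22 - m12 * m12
    a = (m22 * sxy - m12 * sy) / det
    b = (m11 * sy - m12 * sxy) / det
    return a, b


# Hypothetical data lying roughly on Y = 2X.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-3.9, -2.1, 0.2, 1.8, 4.1]

a_plain, b_plain = fit_ridge(xs, ys, lam=0.0)    # ordinary least squares
a_ridge, b_ridge = fit_ridge(xs, ys, lam=10.0)   # weights pulled toward zero

norm_plain = a_plain ** 2 + b_plain ** 2
norm_ridge = a_ridge ** 2 + b_ridge ** 2
```

Increasing `lam` shrinks the weights, which is the "avoid a complicated model" effect. Under the MAP reading, this same cost corresponds to Gaussian noise on Y combined with a zero-mean Gaussian prior on w.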
What would be the correct model?
1. Regularization
2. Nonparametric model
[Figure: error vs. training time, showing training error and test error, with the point marked "we should stop here"]
training set : for training (parameter optimization)
validation set : for early stopping (avoid overfitting)
test set : for evaluation (measure the performance)
Keep watching the validation error
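The three-way split and early stopping can be sketched as follows. Everything specific here is an assumption for illustration: the synthetic data (a noisy parabola), the degree-9 polynomial model, the learning rate, and the `patience` rule that stops once the validation error has not improved for a fixed number of steps.

```python
import random


def features(x, degree):
    """Polynomial features [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]


def mse(w, xs, ys, degree):
    """Mean squared error of the polynomial with weights w."""
    preds = [sum(wi * fi for wi, fi in zip(w, features(x, degree))) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


def train_with_early_stopping(train, val, degree=9, lr=0.05,
                              max_steps=5000, patience=50):
    """Gradient descent on the training set, monitored on the validation set."""
    (xt, yt), (xv, yv) = train, val
    w = [0.0] * (degree + 1)
    best_w, best_val, best_step = list(w), mse(w, xv, yv, degree), 0
    for step in range(1, max_steps + 1):
        grad = [0.0] * len(w)
        for x, y in zip(xt, yt):
            f = features(x, degree)
            err = sum(wi * fi for wi, fi in zip(w, f)) - y
            for j, fj in enumerate(f):
                grad[j] += 2 * err * fj / len(xt)
        w = [wi - lr * g for wi, g in zip(w, grad)]
        val_err = mse(w, xv, yv, degree)  # keep watching the validation error
        if val_err < best_val:
            best_w, best_val, best_step = list(w), val_err, step
        elif step - best_step >= patience:
            break  # validation error stopped improving
    return best_w, best_val, best_step


def make_split(n):
    """Hypothetical data: a noisy parabola on [-1, 1]."""
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [x ** 2 + random.gauss(0.0, 0.1) for x in xs]
    return xs, ys


random.seed(0)
train_set, val_set, test_set = make_split(12), make_split(12), make_split(12)
w, val_error, best_step = train_with_early_stopping(train_set, val_set)
test_error = mse(w, test_set[0], test_set[1], 9)  # final evaluation only
```

The test set is touched once, after training, so the reported error is not biased by the stopping decision.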
Nonparametric Model
It does not assume any parametric model (e.g. Y = aX + b, Y = aX^2 + bX + c, etc.)
It often requires many more samples
Kernel methods are frequently applied for modeling the data
Gaussian Process Regression (GPR), a kind of kernel method, is a widely used nonparametric regression method
Support Vector Machine (SVM), also a kind of kernel method, is a widely used nonparametric classification method
[Figure: a kernel function relating the input space to the feature space]
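One way to see what a kernel function does: it returns the inner product in a feature space without building the feature vectors. The sketch below uses the degree-2 polynomial kernel as an example; the slide does not say which kernel its figure shows, so this choice and the names `feature_map` and `poly_kernel` are assumptions.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def feature_map(x):
    """Explicit feature space for a 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2, computed in input space."""
    return dot(x, z) ** 2


x, z = (1.0, 2.0), (3.0, -1.0)
in_feature_space = dot(feature_map(x), feature_map(z))  # map first, then dot
in_input_space = poly_kernel(x, z)                      # kernel only
```

Both numbers are the same, which is why kernel methods can work in a high-dimensional feature space while only ever evaluating the kernel on the original inputs.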
Support Vector Machine (SVM)
Myo, Thalmic Labs (2013) https://youtu.be/oWu9TFJjHaM
[Linear classifiers] [Maximum margin]
Support Vector Machine Tutorial, J. Weston, http://goo.gl/19ywcj
[Dual formulation] with a kernel function
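A rough sketch of a maximum-margin linear classifier. Note what it is and is not: it trains the primal soft-margin SVM objective (L2 penalty plus hinge loss) by subgradient descent, not the dual formulation with a kernel that the slide mentions. The data, hyperparameters, and function names are all hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on lam/2 * |w|^2 + mean hinge loss (labels +1/-1)."""
    w, b = [0.0, 0.0], 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w, grad_b = [lam * w[0], lam * w[1]], 0.0
        for (x1, x2), y in zip(points, labels):
            margin = y * (w[0] * x1 + w[1] * x2 + b)
            if margin < 1.0:  # inside the margin or misclassified
                grad_w[0] -= y * x1 / n
                grad_w[1] -= y * x2 / n
                grad_b -= y / n
        w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
        b -= lr * grad_b
    return w, b


def predict(w, b, point):
    """Classify by the side of the separating line w . x + b = 0."""
    return 1 if w[0] * point[0] + w[1] * point[1] + b >= 0 else -1


# Hypothetical, linearly separable 2-D data.
points = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
          (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, p) for p in points]
```

Only points with margin below 1 contribute to the update, so the final boundary is determined by the points closest to it, which play the role of support vectors.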
Gaussian Process Regression (GPR)
https://youtu.be/YqhLnCm0KXY
https://youtu.be/kvPmArtVoFE
Gaussian distribution
Multivariate regression
[Equation with its terms labeled likelihood, posterior, prior]
prediction : conditioning the joint distribution of the observed & predicted values
https://goo.gl/EO54WN
http://goo.gl/XvOOmf
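The prediction step, conditioning the joint Gaussian of observed and predicted values, can be sketched for one-dimensional inputs. Assumptions made here, none of which come from the slides: a zero-mean prior, a squared-exponential kernel with unit length scale, a small noise variance, and sine-valued training targets. The linear solver is a plain Gaussian elimination to keep the example self-contained.

```python
import math


def rbf(x, z, length=1.0):
    """Squared-exponential kernel k(x, z) = exp(-(x - z)^2 / (2 * length^2))."""
    return math.exp(-((x - z) ** 2) / (2.0 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ sol = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (aug[r][n] - tail) / aug[r][r]
    return sol


def gp_predict(x_train, y_train, x_new, noise=1e-2):
    """Posterior mean and variance at x_new, given a zero-mean GP prior."""
    K = [[rbf(xi, xj) + (noise if i == j else 0.0)
          for j, xj in enumerate(x_train)] for i, xi in enumerate(x_train)]
    k_star = [rbf(xi, x_new) for xi in x_train]
    alpha = solve(K, y_train)   # (K + noise * I)^-1 y
    v = solve(K, k_star)        # (K + noise * I)^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var


x_train = [-2.0, -1.0, 0.0, 1.0, 2.0]
y_train = [math.sin(x) for x in x_train]  # hypothetical observations

mean_near, var_near = gp_predict(x_train, y_train, 0.5)   # between data points
mean_far, var_far = gp_predict(x_train, y_train, 10.0)    # far from any data
```

Near the training inputs the predictive variance is small. Far from them the prediction falls back to the prior: the mean goes to zero and the variance approaches the prior variance of 1.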
Dimension reduction
[Figure: mappings between the original space and the feature space, low dim. to high dim. and high dim. to low dim.]
Principal Component Analysis : Find the best orthogonal axes (= principal components) which maximize the variance of the data
Y = P X
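A minimal PCA sketch for 2-D data, reducing each point to one coordinate with Y = P X. The data are invented, and the closed-form angle for the leading eigenvector applies only to a symmetric 2x2 covariance matrix; higher-dimensional data would need a general eigendecomposition.

```python
import math


def pca_2d(points):
    """Center 2-D data and return it with the two principal axes (rows of P)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    cxx = sum(x * x for x, _ in centered) / n
    cyy = sum(y * y for _, y in centered) / n
    cxy = sum(x * y for x, y in centered) / n
    # For a symmetric 2x2 matrix, the leading eigenvector lies at this angle.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))  # orthogonal to the first axis
    return centered, [first, second]


# Hypothetical data lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
centered, P = pca_2d(points)

# Y = P X using only the first row of P: 2-D points reduced to 1-D.
reduced = [P[0][0] * x + P[0][1] * y for x, y in centered]
residual = [P[1][0] * x + P[1][1] * y for x, y in centered]

var_first = sum(v * v for v in reduced) / len(reduced)
var_second = sum(v * v for v in residual) / len(residual)
```

Almost all of the variance ends up along the first principal component, so dropping the second coordinate loses little information for this data.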
Dimension reduction
http://jbhuang0604.blogspot.kr/2013/04/miss-korea-2013-contestants-face.html
SUMMARY - Part 1
Machine Learning
 - Tasks : Classification, Regression, Clustering, etc.
 - Performance : 0-1 loss, L2 loss, etc.
 - Experience : labeled data, unlabeled data
Machine Learning Process
 (1) Select a parametric / nonparametric model
 (2) Set a performance measure, including a regularization term
 (3) Train on the data (optimizing parameters) until the validation error increases
 (4) Evaluate the final performance using the test set
Nonparametric model : Support Vector Machine, Gaussian Process Regression
Dimension reduction : used for pre-processing data