Upload
liang-zeng-han
View
224
Download
1
Embed Size (px)
DESCRIPTION
semi slide for Christopher M. Bishop 'Pattern Recognition and Machine Learning' chapter.1
Citation preview
.......
PRML Chapter 1 : Introduction
Liang Zeng Han
2013/3/19(Tue)
2013/3/19(Tue)
Introduction
About PRML
Pattern Recognition and Machine Learning, Christopher. M. Bishop,2006, Springer-Verlag New York
Support Page :http://research.microsoft.com/en-us/um/people/cmbishop/PRML/
Wikihttp://ibisforest.org/index.php?PRML
: https://github.com/herumi/prml
2013/3/19(Tue)
Introduction
What is expected to Machine Learning
(ex. face detection) (ex. collaborative ltering)
2013/3/19(Tue)
Example
Handwritten Digit Recognition
Introduction
Preprocessing
(feature extraction) : (,PCA) :
2013/3/19(Tue)
Introduction
Classication of Learning
...1 Supervised learning
classication : regression : ()
...2 Unsupervised learning
clustering density estimation visualiation
...3 Reinforcement learning
2013/3/19(Tue)
Polynomial Curve Fitting
Polynomial Curve Fitting
2013/3/19(Tue)
Polynomial Curve Fitting
Sum-of-Squares Error Function
0th Order Polynomial
1st Order Polynomial
3rd Order Polynomial
9th Order Polynomial
Over-fitting
Root-Mean-Square (RMS) Error:
Polynomial Curve Fitting
sin(2x) Taylor :
sin(2x) =1Xn=1
(1)n+1 (2)2n1
(2n 1)!x2n1
= 2x (2)3
3!x3 +
(2)5
5!x5 (2)
7
7!x7
' 6:28x 41:34x3 + 81:61x5 76:71x7
M !1
w2n1 ! (1)n+1 (2)2n1
(2n 1)! ; w2n ! 0
?
2013/3/19(Tue)
Polynomial Coefficients
Data Set Size:
9th Order Polynomial
Data Set Size:
9th Order Polynomial
Regularization
Penalize large coefficient values
Regularization:
Regularization:
Regularization: vs.
Polynomial Coefficients
Probability Theory
Probability Theory
2013/3/19(Tue)
Probability Theory
Apples and Oranges
Probability Theory
Marginal Probability
Conditional ProbabilityJoint Probability
Probability Theory
Sum Rule
Product Rule
The Rules of Probability
Sum Rule
Product Rule
Sum Rule
Product Rule
Bayes Theorem
posterior v likelihood prior
Probability Densities
Transformed Densities
Expectations
Conditional Expectation(discrete)
Approximate Expectation(discrete and continuous)
Variances and Covariances
Probability Theory
Rethink : Bayes Theorem
prior probability : p(w)D likelihood function : p(Djw)w D posterior probability : p(wj D)D
p(wj D) = p(Djw)p(w)p(D)
p(D) =Z
p(Djw)p(w)dw
2013/3/19(Tue)
The Gaussian Distribution
Gaussian Mean and Variance
The Multivariate Gaussian
Gaussian Parameter Estimation
Likelihood function
Maximum (Log) Likelihood
Properties of and
Curve Fitting Re-visited
Maximum Likelihood
Determine by minimizing sum-of-squares error, .
Predictive Distribution
MAP: A Step towards Bayes
Determine by minimizing regularized sum-of-squares error, .
Bayesian Curve Fitting
Bayesian Predictive Distribution
Model Selection
Model Selection
2013/3/19(Tue)
Model Selection
Cross-Validation
Curse of Dimensionality
Curse of Dimensionality
2013/3/19(Tue)
Curse of Dimensionality
Curse of Dimensionality
Polynomial curve fitting, M = 3
Gaussian Densities in higher dimensions
Decision Theory
Decision Theory
2013/3/19(Tue)
Decision Theory
Inference step
Determine either or .
Decision stepFor given x, determine optimal t.
Minimum Misclassification Rate
Minimum Expected Loss
Example: classify medical images as cancer or normal
DecisionT
r
u
t
h
Minimum Expected Loss
Regions are chosen to minimize
Reject Option
Why Separate Inference and Decision?
Minimizing risk (loss matrix may change over time) Reject option Unbalanced class priors Combining models
Decision Theory for Regression
Inference step
Determine .
Decision stepFor given x, make optimal prediction, y(x), for t.
Loss function:
The Squared Loss Function
Generative vs Discriminative
Generative approach:
Model
Use Bayes theorem
Discriminative approach:
Model directly
Information Theory
Information Theory
2013/3/19(Tue)
Entropy
Important quantity in coding theory statistical physicsmachine learning
Entropy
Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?
All states equally likely
Entropy
Entropy
In how many ways can N identical objects be allocated Mbins?
Entropy maximized when
Entropy
Differential Entropy
Put bins of width along the real line
Differential entropy maximized (for fixed ) when
in which case
Conditional Entropy
The Kullback-Leibler Divergence
Mutual Information
prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_2prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_3-9prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_10-17prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_18-26prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_27-38prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_39prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_40-41prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_42-50prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory
prml-slides-1_51-59