PRML Chap.1

Embed Size (px)

DESCRIPTION

semi slide for Christopher M. Bishop 'Pattern Recognition and Machine Learning' chapter.1

Citation preview

  • .......

    PRML Chapter 1 : Introduction

    Liang Zeng Han

    2013/3/19(Tue)

    2013/3/19(Tue)

  • Introduction

    About PRML

    Pattern Recognition and Machine Learning, Christopher. M. Bishop,2006, Springer-Verlag New York

    Support Page :http://research.microsoft.com/en-us/um/people/cmbishop/PRML/

    Wikihttp://ibisforest.org/index.php?PRML

    : https://github.com/herumi/prml

    2013/3/19(Tue)

  • Introduction

    What is expected to Machine Learning

    (ex. face detection) (ex. collaborative ltering)

    2013/3/19(Tue)

  • Example

    Handwritten Digit Recognition

  • Introduction

    Preprocessing

    (feature extraction) : (,PCA) :

    2013/3/19(Tue)

  • Introduction

    Classication of Learning

    ...1 Supervised learning

    classication : regression : ()

    ...2 Unsupervised learning

    clustering density estimation visualiation

    ...3 Reinforcement learning

    2013/3/19(Tue)

  • Polynomial Curve Fitting

    Polynomial Curve Fitting

    2013/3/19(Tue)

  • Polynomial Curve Fitting

  • Sum-of-Squares Error Function

  • 0th Order Polynomial

  • 1st Order Polynomial

  • 3rd Order Polynomial

  • 9th Order Polynomial

  • Over-fitting

    Root-Mean-Square (RMS) Error:

  • Polynomial Curve Fitting

    sin(2x) Taylor :

    sin(2x) =1Xn=1

    (1)n+1 (2)2n1

    (2n 1)!x2n1

    = 2x (2)3

    3!x3 +

    (2)5

    5!x5 (2)

    7

    7!x7

    ' 6:28x 41:34x3 + 81:61x5 76:71x7

    M !1

    w2n1 ! (1)n+1 (2)2n1

    (2n 1)! ; w2n ! 0

    ?

    2013/3/19(Tue)

  • Polynomial Coefficients

  • Data Set Size:

    9th Order Polynomial

  • Data Set Size:

    9th Order Polynomial

  • Regularization

    Penalize large coefficient values

  • Regularization:

  • Regularization:

  • Regularization: vs.

  • Polynomial Coefficients

  • Probability Theory

    Probability Theory

    2013/3/19(Tue)

  • Probability Theory

    Apples and Oranges

  • Probability Theory

    Marginal Probability

    Conditional ProbabilityJoint Probability

  • Probability Theory

    Sum Rule

    Product Rule

  • The Rules of Probability

    Sum Rule

    Product Rule

    Sum Rule

    Product Rule

  • Bayes Theorem

    posterior v likelihood prior

  • Probability Densities

  • Transformed Densities

  • Expectations

    Conditional Expectation(discrete)

    Approximate Expectation(discrete and continuous)

  • Variances and Covariances

  • Probability Theory

    Rethink : Bayes Theorem

    prior probability : p(w)D likelihood function : p(Djw)w D posterior probability : p(wj D)D

    p(wj D) = p(Djw)p(w)p(D)

    p(D) =Z

    p(Djw)p(w)dw

    2013/3/19(Tue)

  • The Gaussian Distribution

  • Gaussian Mean and Variance

  • The Multivariate Gaussian

  • Gaussian Parameter Estimation

    Likelihood function

  • Maximum (Log) Likelihood

  • Properties of and

  • Curve Fitting Re-visited

  • Maximum Likelihood

    Determine by minimizing sum-of-squares error, .

  • Predictive Distribution

  • MAP: A Step towards Bayes

    Determine by minimizing regularized sum-of-squares error, .

  • Bayesian Curve Fitting

  • Bayesian Predictive Distribution

  • Model Selection

    Model Selection

    2013/3/19(Tue)

  • Model Selection

    Cross-Validation

  • Curse of Dimensionality

    Curse of Dimensionality

    2013/3/19(Tue)

  • Curse of Dimensionality

  • Curse of Dimensionality

    Polynomial curve fitting, M = 3

    Gaussian Densities in higher dimensions

  • Decision Theory

    Decision Theory

    2013/3/19(Tue)

  • Decision Theory

    Inference step

    Determine either or .

    Decision stepFor given x, determine optimal t.

  • Minimum Misclassification Rate

  • Minimum Expected Loss

    Example: classify medical images as cancer or normal

    DecisionT

    r

    u

    t

    h

  • Minimum Expected Loss

    Regions are chosen to minimize

  • Reject Option

  • Why Separate Inference and Decision?

    Minimizing risk (loss matrix may change over time) Reject option Unbalanced class priors Combining models

  • Decision Theory for Regression

    Inference step

    Determine .

    Decision stepFor given x, make optimal prediction, y(x), for t.

    Loss function:

  • The Squared Loss Function

  • Generative vs Discriminative

    Generative approach:

    Model

    Use Bayes theorem

    Discriminative approach:

    Model directly

  • Information Theory

    Information Theory

    2013/3/19(Tue)

  • Entropy

    Important quantity in coding theory statistical physicsmachine learning

  • Entropy

    Coding theory: x discrete with 8 possible states; how many bits to transmit the state of x?

    All states equally likely

  • Entropy

  • Entropy

    In how many ways can N identical objects be allocated Mbins?

    Entropy maximized when

  • Entropy

  • Differential Entropy

    Put bins of width along the real line

    Differential entropy maximized (for fixed ) when

    in which case

  • Conditional Entropy

  • The Kullback-Leibler Divergence

  • Mutual Information

    prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_2prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_3-9prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_10-17prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_18-26prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_27-38prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_39prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_40-41prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_42-50prml_01IntroductionPolynomial Curve FittingProbability TheoryModel SelectionCurse of DimensionalityDecision TheoryInformation Theory

    prml-slides-1_51-59