Ch11.1-BasicSampling


    Basic Sampling Methods

    Sargur Srihari

    [email protected]


    Topics

1. Motivation
2. Ancestral Sampling
3. Basic Sampling Algorithms
4. Rejection Sampling
5. Importance Sampling
6. Sampling-Importance-Resampling


    1. Motivation

• When exact inference is intractable, we need some form of approximation
  • This is true of most probabilistic models of practical significance
• Inference methods based on numerical sampling are known as Monte Carlo techniques
• Most situations require evaluating expectations of unobserved variables, e.g., to make predictions, rather than the posterior distribution itself


    Task

• Find the expectation E[f] of some function f(z) with respect to a distribution p(z)
  • Components of z can be discrete, continuous, or a combination
  • The function can be z, z², etc.
• We wish to evaluate

  $\mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$

• Assume it is too complex to be evaluated analytically, e.g., a mixture of Gaussians
  • Note: EM with a GMM is for clustering; our current interest is inference
• In the discrete case, the integral is replaced by a summation
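As a concrete illustration (not from the slides), here is a minimal sketch of the Monte Carlo estimate E[f] ≈ (1/L) Σ_l f(z^(l)) for a distribution we can sample directly; the mixture weights, component parameters, and f(z) = z² are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: approximate E[f] = \int f(z) p(z) dz by a Monte Carlo
# average over L independent samples z^(l) ~ p(z). Here p(z) is a
# two-component mixture of Gaussians (illustrative) and f(z) = z^2.
rng = np.random.default_rng(0)

def sample_mixture(L):
    # Choose a component for each sample, then draw from that Gaussian.
    comp = rng.random(L) < 0.3                     # mixing weights 0.3 / 0.7
    return np.where(comp, rng.normal(-2.0, 0.5, L), rng.normal(1.0, 1.0, L))

z = sample_mixture(100_000)
f_hat = np.mean(z**2)          # E[f] ~= (1/L) sum_l f(z^(l))
print(f_hat)                   # exact: 0.3*(4 + 0.25) + 0.7*(1 + 1) = 2.675
```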


    2. Ancestral Sampling

• If the joint distribution is represented by a directed graph with no observed variables, a straightforward method exists
• The distribution is specified by

  $p(\mathbf{z}) = \prod_{i=1}^{M} p(z_i \mid \mathrm{pa}_i)$

  where z_i are the variables associated with node i and pa_i are the variables associated with the parents of node i
• To obtain samples from the joint, we make one pass through the set of variables in the order z_1, ..., z_M, sampling from the conditional distribution p(z_i | pa_i)
• After one pass through the graph we obtain one sample
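A minimal sketch of ancestral sampling on a hypothetical three-node chain A → B → C with binary variables; the network and its conditional probability tables are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical network A -> B -> C with binary variables (illustrative CPTs).
P_A = 0.6                          # p(A=1)
P_B_given_A = {0: 0.2, 1: 0.8}     # p(B=1 | A)
P_C_given_B = {0: 0.1, 1: 0.7}     # p(C=1 | B)

def ancestral_sample():
    # Visit nodes in topological order, sampling each given its parents.
    a = int(rng.random() < P_A)
    b = int(rng.random() < P_B_given_A[a])
    c = int(rng.random() < P_C_given_B[b])
    return a, b, c

print([ancestral_sample() for _ in range(5)])
```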


    Logic Sampling

• Directed graph where some nodes are instantiated with observed values
• Use ancestral sampling, except:
  • When a sample is obtained for an observed variable, if it agrees with the observed value, the sample value is retained and we proceed to the next variable
  • If they don't agree, the whole sample is discarded
• This samples correctly from the posterior distribution
• However, the probability of accepting a sample decreases as the number of variables and the number of states the variables can take increase
• This is a special case of importance sampling; rarely used in practice (a minimal sketch follows below)
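Continuing the hypothetical A → B → C chain above, a sketch of logic sampling with evidence C = 1: samples whose C disagrees with the evidence are discarded, and the survivors are draws from p(A, B | C = 1).

```python
import numpy as np

rng = np.random.default_rng(2)

# Same illustrative CPTs as the ancestral-sampling sketch above.
P_A = 0.6
P_B_given_A = {0: 0.2, 1: 0.8}
P_C_given_B = {0: 0.1, 1: 0.7}

accepted = []
for _ in range(10_000):
    a = int(rng.random() < P_A)
    b = int(rng.random() < P_B_given_A[a])
    c = int(rng.random() < P_C_given_B[b])
    if c == 1:                      # keep only samples agreeing with evidence
        accepted.append((a, b))

print(len(accepted) / 10_000)       # acceptance rate shrinks with more evidence
```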


    Undirected Graphs

• No one-pass sampling strategy exists, even for the case of no observed variables
• Computationally expensive methods such as Gibbs sampling must be used


    3. Basic Sampling Algorithms

• Strategies for generating samples from a given standard distribution, e.g., Gaussian
• Assume that we have a pseudo-random generator for the uniform distribution over (0,1)
• For standard distributions we can transform uniformly distributed samples into the desired distributions


    Transforming Uniform to Standard Distribution

• If z is uniformly distributed over (0,1), i.e., p(z) = 1 on the interval
• If we transform the values of z using f(·) such that y = f(z), the distribution of y is governed by

  $p(y) = p(z) \left| \frac{dz}{dy} \right|$   (1)

• The goal is to choose f(z) such that the values of y have the desired distribution p(y)
• Integrating (1), and noting that p(z) = 1 while the integral of dz/dy with respect to y is z:

  $z = h(y) \equiv \int_{-\infty}^{y} p(\hat{y})\, d\hat{y}$

  which is the indefinite integral of p(y)
• Thus y = h⁻¹(z): we have to transform the uniformly distributed random numbers using the function that is the inverse of the indefinite integral of the desired distribution

[Figure: the transformation f(z) maps z ∈ (0,1) to y]


Geometry of Transformation: Generating Non-uniform Random Variables

• h(y) is the indefinite integral of the desired p(y)
• z ~ U(0,1) is transformed using y = h⁻¹(z)
• This results in y being distributed as p(y)


    Transformation for Exponential

• Exponential distribution:

  $p(y) = \lambda \exp(-\lambda y), \quad 0 \le y < \infty$

• Here $h(y) = 1 - \exp(-\lambda y)$, so inverting gives

  $y = -\lambda^{-1} \ln(1 - z)$


    Transformation for Cauchy

• Cauchy distribution:

  $p(y) = \frac{1}{\pi} \frac{1}{1 + y^2}$

• The inverse of the integral can be expressed as a tan function:

  $y = \tan\left(\pi\left(z - \tfrac{1}{2}\right)\right)$
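A minimal sketch of both inverse transforms above; the exponential rate λ = 2.0 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.random(100_000)            # z ~ U(0, 1)

# Exponential: y = -ln(1 - z) / lam inverts h(y) = 1 - exp(-lam * y).
lam = 2.0                          # illustrative rate, not from the slides
y_exp = -np.log(1.0 - z) / lam

# Cauchy: y = tan(pi * (z - 1/2)) inverts the Cauchy CDF.
y_cauchy = np.tan(np.pi * (z - 0.5))

print(y_exp.mean())                # should be close to 1/lam = 0.5
```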


    Generalization: Multivariate and Gaussian

• Box-Muller method for the Gaussian
• Example of a bivariate Gaussian
• Generate pairs of uniformly distributed random numbers z_1, z_2 ∈ (−1, 1)
  • Can be done from U(0,1) using z → 2z − 1
• Discard each pair unless z_1² + z_2² ≤ 1


    Generating a Gaussian

• For each pair z_1, z_2, evaluate the quantities

  $y_1 = z_1 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad y_2 = z_2 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad \text{where } r^2 = z_1^2 + z_2^2$

• Then y_1 and y_2 are independent Gaussians with zero mean and unit variance
• If y ~ N(0,1), then σy + μ has distribution N(μ, σ²)
• In the multivariate case, if the components of z are independent and N(0,1), then y = μ + Lz will have distribution N(μ, Σ), where Σ = LLᵀ is the Cholesky decomposition
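A minimal sketch of the polar Box-Muller method just described, followed by the Cholesky construction for the multivariate case; μ and Σ are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)

def box_muller(n):
    # Polar Box-Muller: accept pairs inside the unit circle, then rescale.
    ys = []
    while len(ys) < n:
        z1, z2 = 2.0 * rng.random(2) - 1.0        # z1, z2 ~ U(-1, 1)
        r2 = z1 * z1 + z2 * z2
        if 0.0 < r2 <= 1.0:                       # keep pairs with r^2 <= 1
            scale = np.sqrt(-2.0 * np.log(r2) / r2)
            ys.extend([z1 * scale, z2 * scale])   # two independent N(0,1)
    return np.array(ys[:n])

y = box_muller(100_000)
print(y.mean(), y.var())                          # ~= 0 and ~= 1

# Multivariate case: y = mu + L z with Sigma = L L^T (illustrative values).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)
print(mu + L @ rng.standard_normal(2))            # one draw from N(mu, Sigma)
```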


    4. Rejection Sampling

• The transformation method depends on the ability to calculate, and then invert, the indefinite integral
  • Feasible only for some standard distributions
• A more general strategy is needed
• Rejection sampling and importance sampling are mainly limited to univariate (or low-dimensional) distributions
  • Although not directly applicable to complex high-dimensional problems, they are important components in more general strategies
• Rejection sampling allows sampling from complex distributions


Rejection Sampling Method

• Wish to sample from a distribution p(z)
• Suppose we are able to easily evaluate p(z) for any given value of z
• Samples are drawn from a simple distribution, called the proposal distribution q(z)
• Introduce a constant k whose value is such that kq(z) ≥ p(z) for all z
  • kq(z) is called the comparison function


Rejection Sampling Intuition

• Samples are drawn from the simple distribution q(z)
• They are rejected if they fall in the grey area between the un-normalized distribution p̃(z) and the scaled distribution kq(z)
• The resulting samples are distributed according to p(z), which is the normalized version of p̃(z)


How to Determine if a Sample is in the Shaded Region

• Each step involves generating two random numbers:
  • z_0 from q(z), and u_0 from the uniform distribution over [0, kq(z_0)]
• This pair has uniform distribution under the curve of the function kq(z)
• If u_0 > p̃(z_0), the pair is rejected; otherwise it is retained
• The remaining pairs have a uniform distribution under the curve of p̃(z), and hence the corresponding z values are distributed according to p(z), as desired (see the sketch below)
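A minimal sketch of this procedure, assuming an un-normalized standard Gaussian target p̃(z) = exp(−z²/2) and a Cauchy proposal; both choices are ours, not from the slides. k = 4 safely exceeds the exact bound max_z p̃(z)/q(z) = 2π e^{−1/2} ≈ 3.81.

```python
import numpy as np

rng = np.random.default_rng(5)

def p_tilde(z):
    return np.exp(-0.5 * z * z)                   # un-normalized N(0, 1)

def q(z):
    return 1.0 / (np.pi * (1.0 + z * z))          # standard Cauchy density

k = 4.0                                           # ensures k q(z) >= p~(z)
accepted = []
while len(accepted) < 10_000:
    z0 = np.tan(np.pi * (rng.random() - 0.5))     # z0 ~ Cauchy (inverse CDF)
    u0 = rng.uniform(0.0, k * q(z0))              # u0 ~ U[0, k q(z0)]
    if u0 <= p_tilde(z0):                         # retain if under p~(z)
        accepted.append(z0)

samples = np.array(accepted)
print(samples.mean(), samples.var())              # ~= 0 and ~= 1
```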


    Rejection Sampling from Gamma

• Task of sampling from the Gamma distribution
• Since the Gamma is roughly bell-shaped, the proposal distribution is a Cauchy
• The Cauchy has to be slightly generalized to ensure it is nowhere smaller than the Gamma


Adaptive Rejection Sampling

• Used when it is difficult to find a suitable analytic envelope distribution q(z)
• Constructing the envelope is straightforward when p(z) is log concave, i.e., when ln p(z) has derivatives that are non-increasing functions of z
• The function ln p(z) and its gradient are evaluated at an initial set of grid points
• The intersections of the resulting tangent lines are used to construct the envelope: a sequence of linear functions


    Dimensionality and Rejection Sampling

• Gaussian example
• The acceptance rate is the ratio of the volumes under p(z) and kq(z)
• It diminishes exponentially with dimensionality (see the sketch below)

[Figure: true distribution p(z), Gaussian proposal q(z), and scaled version kq(z)]
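A small numeric sketch of this decay, assuming isotropic Gaussians p = N(0, σ_p²I) and q = N(0, σ_q²I) with σ_q/σ_p = 1.01, the one-percent example discussed in PRML: the tightest valid constant is k = (σ_q/σ_p)^D, so the acceptance rate is 1/k.

```python
# Acceptance rate 1/k = (sigma_p / sigma_q)^D decays exponentially in D.
sigma_ratio = 1.01                  # sigma_q exceeds sigma_p by one percent
for D in (1, 10, 100, 1000):
    k = sigma_ratio ** D
    print(D, 1.0 / k)               # D = 1000 gives roughly 1/20,000
```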


5. Importance Sampling

• Evaluating the expectation of f(z) with respect to a distribution p(z) from which it is difficult to draw samples directly
• Samples {z^(l)} are drawn from a simpler proposal distribution q(z)
• Terms in the summation are weighted by the ratios p(z^(l)) / q(z^(l)):

  $\mathbb{E}[f] \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})$
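A minimal sketch of this estimator; p = N(0,1), q = N(0, 2²), and f(z) = z² are illustrative choices, for which the true value of E[z²] under p is 1.

```python
import numpy as np

rng = np.random.default_rng(6)

def p(z):
    return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)       # N(0, 1)

def q_pdf(z):
    return np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

L = 100_000
z = rng.normal(0.0, 2.0, L)         # z^(l) ~ q
w = p(z) / q_pdf(z)                 # importance ratios p(z^(l)) / q(z^(l))
print(np.mean(w * z**2))            # ~= 1
```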


    6. Sampling Importance Re-sampling (SIR)

• Rejection sampling depends on a suitable value of k
  • For many pairs of distributions p(z) and q(z), it is impractical to determine a value of k
  • If k is sufficiently large to guarantee a bound, it leads to impractically small acceptance rates
• The SIR method makes use of a sampling distribution q(z) but avoids having to determine k

[Figure, rejection sampling: true distribution p(z), Gaussian proposal q(z), and scaled version kq(z)]


    SIR Method

Two stages:

• Stage 1: L samples z^(1), ..., z^(L) are drawn from q(z)
• Stage 2: Weights w_1, ..., w_L are constructed, as in importance sampling:

  $w_l = \frac{\tilde{p}(z^{(l)}) / q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)}) / q(z^{(m)})}$

• Finally, a second set of L samples is drawn from the discrete distribution {z^(1), ..., z^(L)} with probabilities given by {w_1, ..., w_L} (a sketch follows below)
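A minimal sketch of the two-stage SIR procedure, reusing the illustrative un-normalized Gaussian target and wider Gaussian proposal from the earlier examples; note that no constant k is needed.

```python
import numpy as np

rng = np.random.default_rng(7)

def p_tilde(z):
    return np.exp(-0.5 * z * z)                   # un-normalized N(0, 1)

def q_pdf(z):
    return np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

L = 100_000
z = rng.normal(0.0, 2.0, L)                       # stage 1: z^(l) ~ q
w = p_tilde(z) / q_pdf(z)                         # stage 2: importance ratios
w /= w.sum()                                      # normalized weights w_l
resampled = rng.choice(z, size=L, replace=True, p=w)   # resample with weights
print(resampled.mean(), resampled.var())          # ~= 0 and ~= 1
```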


    Next Topic

• Markov Chain Monte Carlo (MCMC)
• Does not have the limitation of rejection sampling and importance sampling in high-dimensional spaces