Ch11.1-BasicSampling


    Basic Sampling Methods

    Sargur Srihari

    [email protected]


    Topics

1. Motivation
2. Ancestral Sampling
3. Basic Sampling Algorithms
4. Rejection Sampling
5. Importance Sampling
6. Sampling-Importance-Resampling


    1. Motivation

• When exact inference is intractable, we need some form of approximation
  • This is true of most probabilistic models of practical significance
• Inference methods based on numerical sampling are known as Monte Carlo techniques
• Most situations require evaluating expectations of unobserved variables, e.g., to make predictions, rather than the posterior distribution itself


    Task

• Find the expectation E[f] of some function f(z) with respect to a distribution p(z)
  • Components of z can be discrete, continuous, or a combination
  • The function can be z, z², etc.
• We wish to evaluate

  $\mathbb{E}[f] = \int f(\mathbf{z})\, p(\mathbf{z})\, d\mathbf{z}$

• Assume it is too complex to be evaluated analytically, e.g., a mixture of Gaussians
  • Note: EM with a GMM is for clustering; our current interest is inference
• In the discrete case, the integral is replaced by a summation
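As a concrete illustration (not from the slides), here is a minimal sketch of the Monte Carlo estimate E[f] ≈ (1/L) Σ_l f(z^(l)) for a distribution we can sample directly; the mixture weights, component parameters, and f(z) = z² are all illustrative assumptions.

```python
import numpy as np

# Minimal sketch: approximate E[f] = \int f(z) p(z) dz by a Monte Carlo
# average over L independent samples z^(l) ~ p(z). Here p(z) is a
# two-component mixture of Gaussians (illustrative) and f(z) = z^2.
rng = np.random.default_rng(0)

def sample_mixture(L):
    # Choose a component for each sample, then draw from that Gaussian.
    comp = rng.random(L) < 0.3                     # mixing weights 0.3 / 0.7
    return np.where(comp, rng.normal(-2.0, 0.5, L), rng.normal(1.0, 1.0, L))

z = sample_mixture(100_000)
f_hat = np.mean(z**2)          # E[f] ~= (1/L) sum_l f(z^(l))
print(f_hat)                   # exact: 0.3*(4 + 0.25) + 0.7*(1 + 1) = 2.675
```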


    2. Ancestral Sampling

• If the joint distribution is represented by a directed graph with no observed variables, a straightforward method exists
• The distribution is specified by

  $p(\mathbf{z}) = \prod_{i=1}^{M} p(z_i \mid \mathrm{pa}_i)$

  where z_i are the variables associated with node i and pa_i are the variables associated with the parents of node i
• To obtain samples from the joint, we make one pass through the set of variables in the order z_1, ..., z_M, sampling from the conditional distribution p(z_i | pa_i)
• After one pass through the graph we obtain one sample
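A minimal sketch of ancestral sampling on a hypothetical three-node chain A → B → C with binary variables; the network and its conditional probability tables are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical network A -> B -> C with binary variables (illustrative CPTs).
P_A = 0.6                          # p(A=1)
P_B_given_A = {0: 0.2, 1: 0.8}     # p(B=1 | A)
P_C_given_B = {0: 0.1, 1: 0.7}     # p(C=1 | B)

def ancestral_sample():
    # Visit nodes in topological order, sampling each given its parents.
    a = int(rng.random() < P_A)
    b = int(rng.random() < P_B_given_A[a])
    c = int(rng.random() < P_C_given_B[b])
    return a, b, c

print([ancestral_sample() for _ in range(5)])
```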


    Logic Sampling

• Directed graph where some nodes are instantiated with observed values
• Use ancestral sampling, except:
  • When a sample is obtained for an observed variable, if it agrees with the observed value, the sample value is retained and we proceed to the next variable
  • If they don't agree, the whole sample is discarded
• This samples correctly from the posterior distribution
• However, the probability of accepting a sample decreases as the number of variables and the number of states the variables can take increase
• This is a special case of importance sampling; rarely used in practice (a minimal sketch follows below)
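Continuing the hypothetical A → B → C chain above, a sketch of logic sampling with evidence C = 1: samples whose C disagrees with the evidence are discarded, and the survivors are draws from p(A, B | C = 1).

```python
import numpy as np

rng = np.random.default_rng(2)

# Same illustrative CPTs as the ancestral-sampling sketch above.
P_A = 0.6
P_B_given_A = {0: 0.2, 1: 0.8}
P_C_given_B = {0: 0.1, 1: 0.7}

accepted = []
for _ in range(10_000):
    a = int(rng.random() < P_A)
    b = int(rng.random() < P_B_given_A[a])
    c = int(rng.random() < P_C_given_B[b])
    if c == 1:                      # keep only samples agreeing with evidence
        accepted.append((a, b))

print(len(accepted) / 10_000)       # acceptance rate shrinks with more evidence
```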


    Undirected Graphs

• No one-pass sampling strategy exists, even for the case of no observed variables
• Computationally expensive methods such as Gibbs sampling must be used


    3. Basic Sampling Algorithms

• Strategies for generating samples from a given standard distribution, e.g., Gaussian
• Assume that we have a pseudo-random generator for the uniform distribution over (0,1)
• For standard distributions we can transform uniformly distributed samples into the desired distributions


    Transforming Uniform to Standard Distribution

• If z is uniformly distributed over (0,1), i.e., p(z) = 1 on the interval
• If we transform the values of z using f(·) such that y = f(z), the distribution of y is governed by

  $p(y) = p(z) \left| \frac{dz}{dy} \right|$   (1)

• The goal is to choose f(z) such that the values of y have the desired distribution p(y)
• Integrating (1), and noting that p(z) = 1 while the integral of dz/dy with respect to y is z:

  $z = h(y) \equiv \int_{-\infty}^{y} p(\hat{y})\, d\hat{y}$

  which is the indefinite integral of p(y)
• Thus y = h⁻¹(z): we have to transform the uniformly distributed random numbers using the function that is the inverse of the indefinite integral of the desired distribution

[Figure: the transformation f(z) maps z ∈ (0,1) to y]


Geometry of Transformation: Generating Non-uniform Random Variables

• h(y) is the indefinite integral of the desired p(y)
• z ~ U(0,1) is transformed using y = h⁻¹(z)
• This results in y being distributed as p(y)


    Transformation for Exponential

• Exponential distribution:

  $p(y) = \lambda \exp(-\lambda y), \quad 0 \le y < \infty$

• Here $h(y) = 1 - \exp(-\lambda y)$, so inverting gives

  $y = -\lambda^{-1} \ln(1 - z)$


    Transformation for Cauchy

• Cauchy distribution:

  $p(y) = \frac{1}{\pi} \frac{1}{1 + y^2}$

• The inverse of the integral can be expressed as a tan function:

  $y = \tan\left(\pi\left(z - \tfrac{1}{2}\right)\right)$
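A minimal sketch of both inverse transforms above; the exponential rate λ = 2.0 is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.random(100_000)            # z ~ U(0, 1)

# Exponential: y = -ln(1 - z) / lam inverts h(y) = 1 - exp(-lam * y).
lam = 2.0                          # illustrative rate, not from the slides
y_exp = -np.log(1.0 - z) / lam

# Cauchy: y = tan(pi * (z - 1/2)) inverts the Cauchy CDF.
y_cauchy = np.tan(np.pi * (z - 0.5))

print(y_exp.mean())                # should be close to 1/lam = 0.5
```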


    Generalization: Multivariate and Gaussian

• Box-Muller method for the Gaussian
• Example of a bivariate Gaussian
• Generate pairs of uniformly distributed random numbers z_1, z_2 ∈ (−1, 1)
  • Can be done from U(0,1) using z → 2z − 1
• Discard each pair unless z_1² + z_2² ≤ 1


    Generating a Gaussian

• For each pair z_1, z_2, evaluate the quantities

  $y_1 = z_1 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad y_2 = z_2 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad \text{where } r^2 = z_1^2 + z_2^2$

• Then y_1 and y_2 are independent Gaussians with zero mean and unit variance
• If y ~ N(0,1), then σy + μ has distribution N(μ, σ²)
• In the multivariate case, if the components of z are independent and N(0,1), then y = μ + Lz will have distribution N(μ, Σ), where Σ = LLᵀ is the Cholesky decomposition
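A minimal sketch of the polar Box-Muller method just described, followed by the Cholesky construction for the multivariate case; μ and Σ are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(4)

def box_muller(n):
    # Polar Box-Muller: accept pairs inside the unit circle, then rescale.
    ys = []
    while len(ys) < n:
        z1, z2 = 2.0 * rng.random(2) - 1.0        # z1, z2 ~ U(-1, 1)
        r2 = z1 * z1 + z2 * z2
        if 0.0 < r2 <= 1.0:                       # keep pairs with r^2 <= 1
            scale = np.sqrt(-2.0 * np.log(r2) / r2)
            ys.extend([z1 * scale, z2 * scale])   # two independent N(0,1)
    return np.array(ys[:n])

y = box_muller(100_000)
print(y.mean(), y.var())                          # ~= 0 and ~= 1

# Multivariate case: y = mu + L z with Sigma = L L^T (illustrative values).
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)
print(mu + L @ rng.standard_normal(2))            # one draw from N(mu, Sigma)
```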


    4. Rejection Sampling

• The transformation method depends on the ability to calculate, and then invert, the indefinite integral
  • Feasible only for some standard distributions
• A more general strategy is needed
• Rejection sampling and importance sampling are mainly limited to univariate (or low-dimensional) distributions
  • Although not directly applicable to complex high-dimensional problems, they are important components in more general strategies
• Rejection sampling allows sampling from complex distributions


Rejection Sampling Method

• Wish to sample from a distribution p(z)
• Suppose we are able to easily evaluate p(z) for any given value of z
• Samples are drawn from a simple distribution, called the proposal distribution q(z)
• Introduce a constant k whose value is such that kq(z) ≥ p(z) for all z
  • kq(z) is called the comparison function


Rejection Sampling Intuition

• Samples are drawn from the simple distribution q(z)
• They are rejected if they fall in the grey area between the un-normalized distribution p̃(z) and the scaled distribution kq(z)
• The resulting samples are distributed according to p(z), which is the normalized version of p̃(z)


How to Determine if a Sample is in the Shaded Region

• Each step involves generating two random numbers:
  • z_0 from q(z), and u_0 from the uniform distribution over [0, kq(z_0)]
• This pair has uniform distribution under the curve of the function kq(z)
• If u_0 > p̃(z_0), the pair is rejected; otherwise it is retained
• The remaining pairs have a uniform distribution under the curve of p̃(z), and hence the corresponding z values are distributed according to p(z), as desired (see the sketch below)
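A minimal sketch of this procedure, assuming an un-normalized standard Gaussian target p̃(z) = exp(−z²/2) and a Cauchy proposal; both choices are ours, not from the slides. k = 4 safely exceeds the exact bound max_z p̃(z)/q(z) = 2π e^{−1/2} ≈ 3.81.

```python
import numpy as np

rng = np.random.default_rng(5)

def p_tilde(z):
    return np.exp(-0.5 * z * z)                   # un-normalized N(0, 1)

def q(z):
    return 1.0 / (np.pi * (1.0 + z * z))          # standard Cauchy density

k = 4.0                                           # ensures k q(z) >= p~(z)
accepted = []
while len(accepted) < 10_000:
    z0 = np.tan(np.pi * (rng.random() - 0.5))     # z0 ~ Cauchy (inverse CDF)
    u0 = rng.uniform(0.0, k * q(z0))              # u0 ~ U[0, k q(z0)]
    if u0 <= p_tilde(z0):                         # retain if under p~(z)
        accepted.append(z0)

samples = np.array(accepted)
print(samples.mean(), samples.var())              # ~= 0 and ~= 1
```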


    Rejection Sampling from Gamma

• Task of sampling from the Gamma distribution
• Since the Gamma is roughly bell-shaped, the proposal distribution is a Cauchy
• The Cauchy has to be slightly generalized to ensure it is nowhere smaller than the Gamma


Adaptive Rejection Sampling

• Used when it is difficult to find a suitable analytic envelope distribution q(z)
• Constructing the envelope is straightforward when p(z) is log concave, i.e., when ln p(z) has derivatives that are non-increasing functions of z
• The function ln p(z) and its gradient are evaluated at an initial set of grid points
• The intersections of the resulting tangent lines are used to construct the envelope: a sequence of linear functions


    Dimensionality and Rejection Sampling

• Gaussian example
• The acceptance rate is the ratio of the volumes under p(z) and kq(z)
• It diminishes exponentially with dimensionality (see the sketch below)

[Figure: true distribution p(z), Gaussian proposal q(z), and scaled version kq(z)]
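A small numeric sketch of this decay, assuming isotropic Gaussians p = N(0, σ_p²I) and q = N(0, σ_q²I) with σ_q/σ_p = 1.01, the one-percent example discussed in PRML: the tightest valid constant is k = (σ_q/σ_p)^D, so the acceptance rate is 1/k.

```python
# Acceptance rate 1/k = (sigma_p / sigma_q)^D decays exponentially in D.
sigma_ratio = 1.01                  # sigma_q exceeds sigma_p by one percent
for D in (1, 10, 100, 1000):
    k = sigma_ratio ** D
    print(D, 1.0 / k)               # D = 1000 gives roughly 1/20,000
```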


5. Importance Sampling

• Evaluating the expectation of f(z) with respect to a distribution p(z) from which it is difficult to draw samples directly
• Samples {z^(l)} are drawn from a simpler proposal distribution q(z)
• Terms in the summation are weighted by the ratios p(z^(l)) / q(z^(l)):

  $\mathbb{E}[f] \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})$
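A minimal sketch of this estimator; p = N(0,1), q = N(0, 2²), and f(z) = z² are illustrative choices, for which the true value of E[z²] under p is 1.

```python
import numpy as np

rng = np.random.default_rng(6)

def p(z):
    return np.exp(-0.5 * z * z) / np.sqrt(2.0 * np.pi)       # N(0, 1)

def q_pdf(z):
    return np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

L = 100_000
z = rng.normal(0.0, 2.0, L)         # z^(l) ~ q
w = p(z) / q_pdf(z)                 # importance ratios p(z^(l)) / q(z^(l))
print(np.mean(w * z**2))            # ~= 1
```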


    6. Sampling Importance Re-sampling (SIR)

• Rejection sampling depends on a suitable value of k
  • For many pairs of distributions p(z) and q(z), it is impractical to determine a value of k
  • If k is sufficiently large to guarantee a bound, it leads to impractically small acceptance rates
• The SIR method makes use of a sampling distribution q(z) but avoids having to determine k

[Figure, rejection sampling: true distribution p(z), Gaussian proposal q(z), and scaled version kq(z)]


    SIR Method

Two stages:

• Stage 1: L samples z^(1), ..., z^(L) are drawn from q(z)
• Stage 2: Weights w_1, ..., w_L are constructed, as in importance sampling:

  $w_l = \frac{\tilde{p}(z^{(l)}) / q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)}) / q(z^{(m)})}$

• Finally, a second set of L samples is drawn from the discrete distribution {z^(1), ..., z^(L)} with probabilities given by {w_1, ..., w_L} (a sketch follows below)
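A minimal sketch of the two-stage SIR procedure, reusing the illustrative un-normalized Gaussian target and wider Gaussian proposal from the earlier examples; note that no constant k is needed.

```python
import numpy as np

rng = np.random.default_rng(7)

def p_tilde(z):
    return np.exp(-0.5 * z * z)                   # un-normalized N(0, 1)

def q_pdf(z):
    return np.exp(-0.5 * (z / 2.0) ** 2) / (2.0 * np.sqrt(2.0 * np.pi))

L = 100_000
z = rng.normal(0.0, 2.0, L)                       # stage 1: z^(l) ~ q
w = p_tilde(z) / q_pdf(z)                         # stage 2: importance ratios
w /= w.sum()                                      # normalized weights w_l
resampled = rng.choice(z, size=L, replace=True, p=w)   # resample with weights
print(resampled.mean(), resampled.var())          # ~= 0 and ~= 1
```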


    Next Topic

• Markov Chain Monte Carlo (MCMC)
• Does not have the limitation of rejection sampling and importance sampling in high-dimensional spaces