Basic Sampling Methods
Sargur Srihari
Topics
1. Motivation
2. Ancestral Sampling
3. Basic Sampling Algorithms
4. Rejection Sampling
5. Importance Sampling
6. Sampling-Importance-Resampling
1. Motivation
When exact inference is intractable, we need some form of approximation. This is true of most probabilistic models of practical significance. Inference methods based on numerical sampling are known as Monte Carlo techniques.
Most situations require evaluating expectations of functions of unobserved variables, e.g., in order to make predictions, rather than the posterior distribution itself.
Task
Find the expectation E[f] of some function f(z) with respect to a distribution p(z). The components of z can be discrete, continuous, or a combination. The function can be z, z^2, etc. We wish to evaluate

E[f] = \int f(z)\, p(z)\, dz

Assume the expectation is too complex to be evaluated analytically, e.g., when p(z) is a mixture of Gaussians. (Note: EM with a GMM is for clustering; our current interest is inference.) In the discrete case the integral is replaced by a summation.
2. Ancestral Sampling
If the joint distribution is represented by a directed graph with no observed variables, a straightforward method exists. The distribution is specified by

p(z) = \prod_{i=1}^{M} p(z_i \mid \mathrm{pa}_i)

where z_i is the set of variables associated with node i and pa_i is the set of variables associated with the parents of node i. To obtain samples from the joint distribution, we make one pass through the set of variables in the order z_1, ..., z_M, sampling from the conditional distribution p(z_i | pa_i). After one pass through the graph we obtain one sample, as in the sketch below.
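A minimal sketch of ancestral sampling, assuming a hypothetical three-node binary chain A → B → C with made-up conditional probability tables:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain A -> B -> C with binary variables.
# Each table maps a parent value to P(variable = 1 | parent).
p_A = 0.6                      # P(A = 1)
p_B = {0: 0.2, 1: 0.7}         # P(B = 1 | A)
p_C = {0: 0.1, 1: 0.9}         # P(C = 1 | B)

def ancestral_sample():
    """One pass through the nodes in topological order z1,..,zM,
    sampling each node given its already-sampled parents."""
    a = rng.random() < p_A
    b = rng.random() < p_B[int(a)]
    c = rng.random() < p_C[int(b)]
    return int(a), int(b), int(c)

samples = [ancestral_sample() for _ in range(5)]
print(samples)
```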
Logic Sampling
Consider a directed graph where some nodes are instantiated with observed values. We use ancestral sampling, except that when a sample is obtained for an observed variable, we compare it with the observed value: if they agree, the sample value is retained and we proceed to the next variable; if they do not agree, the whole sample is discarded. This procedure samples correctly from the posterior distribution. However, the probability of accepting a sample decreases as the number of variables increases and as the number of states the variables can take increases. Logic sampling is a special case of importance sampling and is rarely used in practice. A sketch follows.
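Continuing the hypothetical chain network sketched above, a minimal logic-sampling sketch; for brevity the agreement check happens after a full pass, which is equivalent to (though slower than) discarding as soon as a mismatch occurs:

```python
def logic_sample(evidence):
    """Ancestral sampling with observed nodes: keep only those passes
    whose sampled values agree with the evidence."""
    while True:
        a, b, c = ancestral_sample()          # from the sketch above
        sample = {'A': a, 'B': b, 'C': c}
        if all(sample[k] == v for k, v in evidence.items()):
            return sample

# Approximate posterior samples given the hypothetical observation C = 1.
posterior = [logic_sample({'C': 1}) for _ in range(5)]
print(posterior)
```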
Undirected Graphs
For undirected graphs there is no one-pass sampling strategy, even in the case of no observed variables. Computationally expensive methods such as Gibbs sampling must be used.
3. Basic Sampling Algorithms
These are strategies for generating samples from a given standard distribution, e.g., a Gaussian. Assume that we have a pseudo-random generator for the uniform distribution over (0,1). For standard distributions we can transform uniformly distributed samples into the desired distributions.
Transforming Uniform to Standard Distribution
If z is uniformly distributed over (0,1), i.e., p(z) = 1 on that interval, and we transform the values of z using a function f so that y = f(z), then the distribution of y is governed by

p(y) = p(z) \left| \frac{dz}{dy} \right|    (1)

The goal is to choose f(z) such that the values of y have the desired distribution p(y). Integrating (1), and using the facts that p(z) = 1 and that the integral of dz/dy with respect to y is z, gives

z = h(y) \equiv \int_{-\infty}^{y} p(\hat{y})\, d\hat{y}

which is the indefinite integral of p(y). Thus y = h^{-1}(z): we must transform the uniformly distributed random numbers using the function that is the inverse of the indefinite integral of the desired distribution.
Geometry of the Transformation
To generate non-uniform random variables: h(y) is the indefinite integral of the desired p(y); z ~ U(0,1) is transformed using y = h^{-1}(z), which results in y being distributed as p(y).
Transformation for Exponential
The exponential distribution is

p(y) = \lambda \exp(-\lambda y), \quad 0 \le y < \infty

Here h(y) = 1 - \exp(-\lambda y), so inverting gives the transformation y = -\lambda^{-1} \ln(1 - z). A sketch appears below.
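A minimal sketch of the inverse-CDF transformation for the exponential case, using NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_exponential(lam, n):
    """z ~ U(0,1) transformed by y = -ln(1 - z)/lam yields samples
    from the exponential distribution p(y) = lam * exp(-lam * y)."""
    z = rng.random(n)
    return -np.log(1.0 - z) / lam

y = sample_exponential(lam=2.0, n=100_000)
print(y.mean())  # should be close to 1/lam = 0.5
```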
Transformation for Cauchy
The Cauchy distribution is

p(y) = \frac{1}{\pi} \frac{1}{1 + y^2}

The inverse of the indefinite integral can be expressed as a tan function: h(y) = \frac{1}{\pi} \tan^{-1}(y) + \frac{1}{2}, giving the transformation y = \tan(\pi (z - \tfrac{1}{2})).
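The corresponding sketch for the Cauchy case:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_cauchy(n):
    """Inverse-CDF transform for the standard Cauchy: y = tan(pi*(z - 1/2))."""
    z = rng.random(n)
    return np.tan(np.pi * (z - 0.5))

print(np.median(sample_cauchy(100_000)))  # median ~ 0 (the mean is undefined)
```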
Generalization: Multivariate and Gaussian
The Box-Muller method for the Gaussian is an example, generating pairs from a bivariate Gaussian. Generate pairs of uniformly distributed random numbers z_1, z_2 \in (-1,1), which can be done from U(0,1) using z \rightarrow 2z - 1. Discard each pair unless z_1^2 + z_2^2 \le 1.
Generating a Gaussian
For each retained pair z_1, z_2 evaluate the quantities

y_1 = z_1 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad y_2 = z_2 \left( \frac{-2 \ln r^2}{r^2} \right)^{1/2}, \quad \text{where } r^2 = z_1^2 + z_2^2

Then y_1 and y_2 are independent Gaussians with zero mean and unit variance. If y ~ N(0,1), then \sigma y + \mu has distribution N(\mu, \sigma^2). In the multivariate case, if the components of z are independent and N(0,1), then y = \mu + Lz has distribution N(\mu, \Sigma), where \Sigma = LL^T, with L given by the Cholesky decomposition of \Sigma. A sketch follows.
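A minimal sketch of both steps: Box-Muller for N(0,1) pairs, then a Cholesky transform to a hypothetical multivariate Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

def box_muller(n):
    """Polar Box-Muller: keep (z1, z2) inside the unit circle, then
    transform to independent N(0,1) pairs."""
    out = []
    while len(out) < n:
        z1, z2 = 2.0 * rng.random(2) - 1.0   # uniform on (-1, 1)
        r2 = z1 * z1 + z2 * z2
        if 0.0 < r2 <= 1.0:
            factor = np.sqrt(-2.0 * np.log(r2) / r2)
            out.extend([z1 * factor, z2 * factor])
    return np.array(out[:n])

# Multivariate case: y = mu + L z with Sigma = L L^T (Cholesky).
mu = np.array([1.0, -1.0])                    # hypothetical mean
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])    # hypothetical covariance
L = np.linalg.cholesky(Sigma)
z = box_muller(2 * 10_000).reshape(2, -1)
y = mu[:, None] + L @ z
print(np.cov(y))  # should be close to Sigma
```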
4. Rejection Sampling
The transformation method depends on the ability to calculate and then invert the indefinite integral, and so is feasible only for some standard distributions; a more general strategy is needed. Rejection sampling allows sampling from relatively complex distributions. Rejection sampling and importance sampling are mainly limited to univariate distributions, and although not applicable to complex problems directly, they are important components of more general strategies.
Rejection Sampling Method
We wish to sample from a distribution p(z). Suppose we are able to easily evaluate p(z) for any given value of z, up to a normalization constant: p(z) = \tilde{p}(z)/Z_p, where \tilde{p}(z) can readily be evaluated. Samples are drawn from a simple distribution, called the proposal distribution q(z). We introduce a constant k whose value is such that kq(z) \ge \tilde{p}(z) for all z; kq(z) is called the comparison function.
Rejection Sampling Intuition
Samples are drawn from the simple distribution q(z) and rejected if they fall in the grey area between the unnormalized distribution \tilde{p}(z) and the scaled distribution kq(z). The resulting samples are distributed according to p(z), which is the normalized version of \tilde{p}(z).
How to Determine if a Sample is in the Shaded Region
Each step involves generating two random numbers: z_0 from q(z), and u_0 from the uniform distribution over [0, kq(z_0)]. This pair has a uniform distribution under the curve of the function kq(z). If u_0 > \tilde{p}(z_0), the pair is rejected; otherwise it is retained. The remaining pairs have a uniform distribution under the curve of \tilde{p}(z), and hence the corresponding z values are distributed according to p(z), as desired. A sketch follows.
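A minimal generic sketch; the target (an unnormalized standard normal) and the Cauchy proposal are hypothetical choices for illustration, with k set just above the true maximum of the ratio:

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(p_tilde, q_sample, q_pdf, k, n):
    """Draw z0 ~ q and u0 ~ U(0, k*q(z0)); keep z0 iff u0 <= p_tilde(z0).
    Assumes k*q(z) >= p_tilde(z) for all z."""
    samples = []
    while len(samples) < n:
        z0 = q_sample()
        u0 = rng.random() * k * q_pdf(z0)
        if u0 <= p_tilde(z0):
            samples.append(z0)
    return np.array(samples)

# Hypothetical target: unnormalized N(0,1); proposal: standard Cauchy.
p_tilde = lambda z: np.exp(-0.5 * z * z)
q_pdf = lambda z: 1.0 / (np.pi * (1.0 + z * z))
q_sample = lambda: np.tan(np.pi * (rng.random() - 0.5))
k = 4.0   # max of p_tilde/q_pdf is 2*pi*exp(-1/2) ~ 3.81, so 4.0 suffices

x = rejection_sample(p_tilde, q_sample, q_pdf, k, 10_000)
print(x.mean(), x.std())   # roughly 0 and 1
```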
Rejection Sampling from Gamma
Consider the task of sampling from the Gamma distribution

\mathrm{Gam}(z \mid a, b) = \frac{b^a z^{a-1} \exp(-bz)}{\Gamma(a)}

Since the Gamma is roughly bell-shaped (for a > 1), a natural proposal distribution is the Cauchy. The Cauchy has to be slightly generalized, with location and scale parameters, to ensure it is nowhere smaller than the Gamma. A sketch follows.
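A sketch of the Gamma example under Bishop's choice of generalized Cauchy proposal (location c = a − 1, squared scale b² = 2a − 1); finding k numerically on a grid is a pragmatic shortcut here, not part of the original method:

```python
import numpy as np

rng = np.random.default_rng(0)
a = 3.0                            # hypothetical Gamma shape, rate fixed at 1
c, b2 = a - 1.0, 2.0 * a - 1.0     # generalized-Cauchy location and scale^2

p_tilde = lambda z: z ** (a - 1.0) * np.exp(-z)   # unnormalized Gam(z|a,1), z > 0
q_pdf = lambda z: 1.0 / (np.pi * np.sqrt(b2) * (1.0 + (z - c) ** 2 / b2))

# Bound k >= max p_tilde/q numerically (with a small safety margin).
grid = np.linspace(1e-6, 30.0, 10_000)
k = 1.05 * np.max(p_tilde(grid) / q_pdf(grid))

samples = []
while len(samples) < 10_000:
    z0 = c + np.sqrt(b2) * np.tan(np.pi * (rng.random() - 0.5))  # Cauchy draw
    if z0 > 0 and rng.random() * k * q_pdf(z0) <= p_tilde(z0):
        samples.append(z0)
print(np.mean(samples))   # should be close to a = 3, the mean of Gam(a, 1)
```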
Adaptive Rejection Sampling
Used when it is difficult to find a suitable analytic form for the envelope distribution. Constructing the envelope on the fly is straightforward when p(z) is log concave, i.e., when ln p(z) has derivatives that are non-increasing functions of z. The function ln p(z) and its gradient are evaluated at an initial set of grid points, and the intersections of the resulting tangent lines are used to construct the envelope: a sequence of linear functions in log space.
Dimensionality and Rejection Sampling
Gaussian example: the acceptance rate is the ratio of the volumes under p(z) and kq(z), which diminishes exponentially with dimensionality. For instance, if the proposal q(z) is a Gaussian whose standard deviation exceeds that of p(z) by just 1%, then in D = 1000 dimensions the acceptance ratio (\sigma_p/\sigma_q)^D is roughly 1/20,000.
(Figure: true distribution p(z), Gaussian proposal distribution q(z), and its scaled version kq(z).)
5. Importance Sampling
We wish to evaluate the expectation of f(z) with respect to a distribution p(z) from which it is difficult to draw samples directly. Samples {z^{(l)}} are drawn from a simpler proposal distribution q(z), and the terms in the summation are weighted by the ratios p(z^{(l)})/q(z^{(l)}):

E[f] = \int f(z)\, p(z)\, dz = \int f(z) \frac{p(z)}{q(z)} q(z)\, dz \simeq \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})} f(z^{(l)})

A sketch follows.
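A minimal sketch with a hypothetical target p = N(0,1) and proposal q = N(0,4), both evaluable in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target p = N(0,1); proposal q = N(0, 2^2) is easy to sample.
log_p = lambda z: -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)
q_sigma = 2.0
log_q = lambda z: (-0.5 * (z / q_sigma) ** 2
                   - np.log(q_sigma) - 0.5 * np.log(2.0 * np.pi))

L = 100_000
z = q_sigma * rng.standard_normal(L)     # samples z^(l) ~ q(z)
w = np.exp(log_p(z) - log_q(z))          # importance ratios p(z^(l))/q(z^(l))

f = z ** 2                               # f(z) = z^2, so E[f] = 1 under p
print(np.mean(w * f))                    # importance-sampling estimate ~ 1
```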
6. Sampling Importance Re-sampling (SIR)
Rejection sampling depends on a suitable value of k. For many pairs of distributions p(z) and q(z) it is impractical to determine a value of k: if k is large enough to guarantee the bound, the acceptance rate becomes impractically small. The SIR method makes use of a sampling distribution q(z) but avoids having to determine k.
(Figure, recalling rejection sampling: true distribution p(z), Gaussian proposal q(z), and scaled version kq(z).)
SIR Method
There are two stages. Stage 1: L samples z^{(1)},...,z^{(L)} are drawn from q(z). Stage 2: weights w_1,...,w_L are constructed as in importance sampling:

w_l = \frac{\tilde{p}(z^{(l)})/q(z^{(l)})}{\sum_m \tilde{p}(z^{(m)})/q(z^{(m)})}

Finally, a second set of L samples is drawn from the discrete distribution {z^{(1)},...,z^{(L)}} with probabilities given by the weights {w_1,...,w_L}. A sketch follows.
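A minimal sketch with a hypothetical unnormalized target and Gaussian proposal; note that k never appears:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unnormalized target p_tilde ∝ exp(-z^2/2); proposal q = N(0, 2^2).
p_tilde = lambda z: np.exp(-0.5 * z * z)
q_sigma = 2.0
q_pdf = lambda z: np.exp(-0.5 * (z / q_sigma) ** 2) / (q_sigma * np.sqrt(2.0 * np.pi))

L = 50_000
z = q_sigma * rng.standard_normal(L)      # stage 1: z^(1..L) ~ q(z)
w = p_tilde(z) / q_pdf(z)                 # stage 2: importance weights
w /= w.sum()                              # normalize (no k required)

resampled = rng.choice(z, size=L, replace=True, p=w)   # final resampling step
print(resampled.mean(), resampled.std())  # roughly 0 and 1
```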
Next Topic
Markov Chain Monte Carlo (MCMC) does not have the limitations of rejection sampling and importance sampling in high-dimensional spaces.