Introduction to Bayesian Methods

Theory, Computation, Inference and Prediction

Corey Chivers
PhD Candidate
Department of Biology
McGill University

Script to run examples in these slides can be found here:

bit.ly/Wnmb2W

These slides are here:

bit.ly/P9Xa9G

Corey Chivers, 2012

The Likelihood Principle

● All information contained in data x, with respect to inference about the value of θ, is contained in the likelihood function:

L(θ | x) ∝ P(X = x | θ)

L.J. Savage and R.A. Fisher

The Likelihood Function

L(θ | x) ∝ P(X = x | θ)

Where θ is (are) our parameter(s) of interest, e.g.:

Attack rate

Fitness

Mean body mass

Mortality

etc...

L(θ | x) = f(θ | x)

The Ecologist's Quarter

Lands tails (caribou up) 60% of the time

The Ecologist's Quarter

● 1) What is the probability that I will flip tails, given that I am flipping an ecologist's quarter (P(tails) = 0.6)?

P(x | θ = 0.6)

● 2) What is the likelihood that I am flipping an ecologist's quarter, given the flip(s) that I have observed?

L(θ = 0.6 | x)

Lands tails (caribou up) 60% of the time

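L(θ | x) = ∏_{t=1}^{T} θ × ∏_{h=1}^{H} (1 − θ)

where T and H are the numbers of tails and heads in x.

L(θ = 0.6 | x = H T T H T) = ∏_{t=1}^{3} 0.6 × ∏_{h=1}^{2} 0.4

= 0.6³ × 0.4² = 0.03456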
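The product above can be checked numerically. The following is a minimal Python sketch (the slides' own script is in R); the function name `coin_likelihood` and the string encoding of the flips are assumptions made for illustration only.

```python
def coin_likelihood(theta, flips):
    """Likelihood of theta = P(tails) given a sequence of independent flips.

    `flips` is a string of 'H' and 'T' characters (an illustrative encoding).
    """
    likelihood = 1.0
    for flip in flips:
        # Each tail contributes theta, each head contributes (1 - theta).
        likelihood *= theta if flip == "T" else (1.0 - theta)
    return likelihood


flips = "HTTHT"
print(coin_likelihood(0.6, flips))  # approximately 0.03456
print(coin_likelihood(0.5, flips))  # a fair coin, for comparison
```

Comparing the two values shows how the likelihood ranks candidate values of θ for the same data, without either value being a probability of θ.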

But what does this mean? 0.03456 ≠ P(θ|x) !!!!

How do we ask Statistical Questions?

A Frequentist asks: What is the probability of having observed data at least as extreme as my data if the null hypothesis is true?

P(data | H0) ? ← note: P=1 does not mean P(H

A Bayesian asks: What is the probability of hypotheses given that I have observed my data?

P(H | data) ? ← note: here H denotes the space of all possible hypotheses

P(data | H0) P(H | data)

But we both want to make inferences about our hypotheses, not the data.

Bayes Theorem

P(θ | x) = P(x | θ) P(θ) / P(x)

● The posterior probability of θ, given our observation (x) is proportional to the likelihood times the prior probability of θ.

The Ecologist's Quarter Redux

Lands tails (caribou up) 60% of the time

L(θ = 0.6 | x = H T T H T) = 0.6³ × 0.4² = 0.03456

P(x | θ): likelihood of the data given the hypothesis

But we want to know P(θ | x)

● How can we make inferences about our ecologist's quarter using Bayes?

P(θ | x) = P(x | θ) P(θ) / P(x)

P(θ | x) = P(x | θ) P(θ) / P(x)

Posterior: P(θ | x)

Likelihood: P(x | θ)

Prior: P(θ)

P(x) = ∫ P(x | θ) P(θ) dθ

A closed-form solution is not always possible!!
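For a single parameter, the integral for P(x) can be approximated on a grid. The sketch below does this for the ecologist's quarter data (H T T H T) in Python. It assumes a uniform prior on θ over [0, 1], which the slides do not specify; the function names are illustrative.

```python
def likelihood(theta, flips):
    """P(x | theta) for independent flips, with theta = P(tails)."""
    value = 1.0
    for flip in flips:
        value *= theta if flip == "T" else (1.0 - theta)
    return value


def prior(theta):
    """Assumed prior: uniform density on [0, 1]."""
    return 1.0 if 0.0 <= theta <= 1.0 else 0.0


def evidence(flips, n_grid=10_000):
    """Approximate P(x) = integral of P(x | theta) P(theta) dtheta (midpoint rule)."""
    width = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        theta = (i + 0.5) * width  # midpoint of the i-th grid cell
        total += likelihood(theta, flips) * prior(theta) * width
    return total


def posterior_density(theta, flips):
    """P(theta | x) from Bayes' theorem, using the grid estimate of P(x)."""
    return likelihood(theta, flips) * prior(theta) / evidence(flips)


flips = "HTTHT"
print(evidence(flips))                # close to 1/60 under the uniform prior
print(posterior_density(0.6, flips))  # posterior density at theta = 0.6
```

This works here because θ is one-dimensional. With several parameters the grid grows exponentially, which is the motivation for the sampling methods that follow.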

Randomization to Solve Difficult Problems

Feynman, Ulam & Von Neumann

∫ f(θ) dθ

Monte Carlo

[Figure: shape with labelled points (1, 0), (0, 1) and (0.5, 0), partly shaded blue]

Throw darts at random

P(blue) = ?

P(blue) = 1/2

P(blue) ≈ 7/15 ≈ 1/2

Your turn...

Let's use Monte Carlo to estimate π

- Generate random x and y values using the number sheet

- Plot those points on your graph

- How many of the points fall within the circle?

Estimate π using the formula:

π ≈ 4 × (# in circle) / total

Now using a more powerful computer!
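The paper exercise can be repeated with many more points. Below is a minimal Python sketch; it assumes the points are drawn uniformly from the unit square and counted inside a quarter circle of radius 1, which gives the same ratio (π/4) as a full circle inscribed in a square. The function name `estimate_pi` and the `seed` argument are illustrative.

```python
import random


def estimate_pi(n_points, seed=None):
    """Monte Carlo estimate of pi from uniform points in the unit square."""
    rng = random.Random(seed)
    in_circle = 0
    for _ in range(n_points):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            in_circle += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * in_circle / n_points


for n in (15, 1_000, 100_000):
    print(n, estimate_pi(n, seed=1))
```

The estimate is noisy for small numbers of points and tightens as the number of points grows.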

Posterior Integration via Markov Chain Monte Carlo

A Markov Chain is a mathematical construct where given the present, the past and the future are independent.

“Where I decide to go next depends not on where I have been, or where I may go in the future – but only on where I am right now.”

-Andrey Markov (maybe)

Metropolis-Hastings Algorithm

The Markovian Explorer!

1. Pick a starting location at random.

2. Choose a new location in your vicinity.

3. Go to the new location with probability:

p = min(1, π(x_proposal) / π(x_current))

where π(x) is the density being explored (here, the posterior up to a constant).

4. Otherwise stay where you are.

5. Repeat.
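The five steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the script that accompanies the slides: the target is the ecologist's quarter posterior for 3 tails and 2 heads under an assumed uniform prior, the proposal is a symmetric uniform step (so the acceptance rule reduces to the ratio of target densities), and all names and the step size are hypothetical choices.

```python
import random


def unnormalised_posterior(theta, n_tails=3, n_heads=2):
    """Likelihood times an assumed uniform prior on (0, 1); P(x) is not needed."""
    if not 0.0 < theta < 1.0:
        return 0.0
    return theta ** n_tails * (1.0 - theta) ** n_heads


def metropolis(n_steps, step_size=0.2, seed=None):
    """Random-walk Metropolis sampler for theta."""
    rng = random.Random(seed)
    current = rng.uniform(0.1, 0.9)  # 1. pick a starting location at random
    samples = []
    for _ in range(n_steps):
        # 2. choose a new location in the vicinity (symmetric proposal)
        proposal = current + rng.uniform(-step_size, step_size)
        # 3. move with probability min(1, target(proposal) / target(current))
        ratio = unnormalised_posterior(proposal) / unnormalised_posterior(current)
        if rng.random() < min(1.0, ratio):
            current = proposal
        # 4. otherwise stay put; either way, record the current location
        samples.append(current)
        # 5. repeat
    return samples


samples = metropolis(50_000, seed=42)
kept = samples[5_000:]  # discard an initial burn-in period
print(sum(kept) / len(kept))  # posterior mean of theta, near 4/7 for this target
```

Only ratios of the target density are used, so the normalising constant P(x) cancels and never has to be computed.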

MCMC in Action!

● We've solved our integration problem!

P(θ | x) = P(x | θ) P(θ) / P(x)

P(θ | x) ∝ P(x | θ) P(θ)

Ex: Bayesian Regression

● Regression coefficients are traditionally estimated via maximum likelihood.

● To obtain full posterior distributions, we can view the regression problem from a Bayesian perspective.

##@ 2.1 @##

Example: Salmon Regression

Model:

Y = a + bX + ϵ

ϵ ~ Normal(0, σ)

Priors:

a ~ Normal(0, 100)

b ~ Normal(0, 100)

σ ~ gamma(1, 1/100)

P(a, b, σ | X, Y) ∝ P(X, Y | a, b, σ) P(a) P(b) P(σ)

Likelihood of the data (x, y), given the parameters (a, b, σ):

P(X, Y | a, b, σ) = ∏_{i=1}^{n} N(y_i, μ = a + b x_i, sd = σ)
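The likelihood and priors combine into an unnormalised posterior that an MCMC sampler can evaluate. The Python sketch below works on the log scale, since a product of many small densities underflows. It assumes that Normal(0, 100) is parameterised by standard deviation and gamma(1, 1/100) by shape and rate, as in R's `dnorm` and `dgamma`; the slides do not state this. The function names and the tiny data set are hypothetical.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd) distribution at x."""
    return -0.5 * math.log(2.0 * math.pi) - math.log(sd) - 0.5 * ((x - mean) / sd) ** 2


def log_gamma_pdf(x, shape, rate):
    """Log density of a gamma(shape, rate) distribution at x > 0."""
    return shape * math.log(rate) - math.lgamma(shape) + (shape - 1.0) * math.log(x) - rate * x


def log_posterior(a, b, sigma, xs, ys):
    """Unnormalised log posterior: log likelihood plus log priors."""
    if sigma <= 0.0:
        return -math.inf  # sigma must be positive
    # Sum of log N(y_i | a + b * x_i, sigma) over all observations.
    log_lik = sum(log_normal_pdf(y, a + b * x, sigma) for x, y in zip(xs, ys))
    log_prior = (
        log_normal_pdf(a, 0.0, 100.0)
        + log_normal_pdf(b, 0.0, 100.0)
        + log_gamma_pdf(sigma, 1.0, 1.0 / 100.0)
    )
    return log_lik + log_prior


# Hypothetical data, for illustration only (not the salmon data).
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]
print(log_posterior(0.0, 2.0, 1.0, xs, ys))  # near the trend in the toy data
print(log_posterior(0.0, 0.0, 1.0, xs, ys))  # much lower: slope of zero fits poorly
```

A sampler then needs only differences of these log values, which correspond to the density ratios in the acceptance step.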

##@ 2.5 @##

> ## Print the Bayesian Credible Intervals
> BCI(mcmc_salmon)

          0.025     0.975  post_mean
a     -13.16485  14.84092  0.9762583
b      0.127730  0.455046  0.2911597
Sigma  1.736082  3.186122  2.3303188

Inference:

Does body length have an effect on egg mass?

EM = a + b·BL + ϵ

The Prior revisited

● What if we do have prior information?

● You have done a literature search and find that a previous study on the same salmon population found a slope of 0.6 mg/cm (SE = 0.1), and an intercept of -3.1 mg (SE = 1.2). How does this prior information change your analysis?

Model:

EM = a + b·BL + ϵ

ϵ ~ Normal(0, σ)

Informative Priors:

a ~ Normal(−3.1, 1.2)

b ~ Normal(0.6, 0.1)

σ ~ gamma(1, 1/100)

If you can formulate the likelihood function, you can estimate the posterior, and we have a coherent way to incorporate prior information.

Most experiments do not happen in a vacuum.

Making predictions using point estimates can be a dangerous endeavor. Using the posterior predictive distribution allows us to take full account of uncertainty.

How sure are we about our predictions?

Aleatory: stochasticity, randomness

Epistemic: incomplete knowledge

##@ 3.1 @##

● Suppose you have a 90cm long individual salmon, what do you predict to be the egg mass produced by this individual?

● What is the posterior probability that the egg mass produced will be greater than 35mg?

P(EM>35mg | θ)
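One way to estimate such a probability is by simulation from the posterior draws. The Python sketch below assumes the MCMC output is available as a list of (a, b, σ) tuples. For each draw it simulates one egg mass at the given body length, so both parameter uncertainty (epistemic) and residual noise (aleatory) enter the answer. The function name and the placeholder draws are hypothetical and are not results from the salmon analysis.

```python
import random


def prob_egg_mass_exceeds(posterior_draws, body_length, threshold, seed=None):
    """Estimate P(EM > threshold) at a given body length by simulation.

    `posterior_draws` is a sequence of (a, b, sigma) tuples from the sampler.
    """
    rng = random.Random(seed)
    exceed = 0
    for a, b, sigma in posterior_draws:
        # One simulated egg mass per posterior draw: mean a + b * BL, sd sigma.
        simulated = rng.gauss(a + b * body_length, sigma)
        if simulated > threshold:
            exceed += 1
    return exceed / len(posterior_draws)


# Placeholder draws for illustration only; real draws would come from the MCMC run.
placeholder_draws = [(1.0, 0.30, 2.0), (0.5, 0.35, 2.5), (2.0, 0.25, 2.2)] * 1000
print(prob_egg_mass_exceeds(placeholder_draws, body_length=90.0, threshold=35.0, seed=1))
```

Plugging in only the posterior means would give a single predicted value and hide the spread that this simulation makes visible.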

Clark (2005)

Extensions:

● By quantifying our uncertainty through integration of the posterior distribution, we can make better informed decisions.

● Bayesian analysis provides the basis for decision theory.

● Bayesian analysis allows us to construct hierarchical models of arbitrary complexity.

Summary

● The output of a Bayesian analysis is not a single estimate of θ, but rather the entire posterior distribution, which represents our degree of belief about the value of θ.

● To get a posterior distribution, we need to specify our prior belief about θ.

● Complex Bayesian models can be estimated using MCMC.

● The posterior can be used to make both inference about θ, and quantitative predictions with proper accounting of uncertainty.

Questions for Corey

● You can email me! Corey.chivers@mail.mcgill.ca

● I blog about statistics:

bayesianbiologist.com

● I tweet about statistics:

@cjbayesian

Resources

● Bayesian Updating using Gibbs Sampling: http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/

● Just Another Gibbs Sampler: http://www-ice.iarc.fr/~martyn/software/jags/

● Chi-squared example, done Bayesian: http://madere.biol.mcgill.ca/cchivers/biol373/chi-squared_done_bayesian.pdf
