Statistical Learning: Bayesian and ML
COMP155
Sections 20.1-20.2
May 2, 2007
Definitions
• a posteriori: derived from observed facts
• a priori: based on hypothesis or theory rather than experiment
Bayesian Learning
• Make predictions using all hypotheses, weighted by their probabilities
• Bayes’ rule: P(a | b) = α P(b | a) P(a)
• For each hypothesis hi, observed data d:
• P(hi | d) = α P(d | hi) P(hi)
• P(d | hi) is the likelihood of d under hypothesis hi
• P(hi) is the hypothesis prior
• α is a normalization constant: α = 1 / ∑i P(d | hi) P(hi) (a code sketch of this update follows)
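A minimal Python sketch of this update, assuming the hypotheses are given as parallel lists of priors and likelihoods (the function name and layout are ours, not from the slides):

    # Bayes' rule with normalization: P(h_i | d) = alpha * P(d | h_i) * P(h_i)
    def posterior(priors, likelihoods):
        unnormalized = [l * p for l, p in zip(likelihoods, priors)]
        alpha = 1.0 / sum(unnormalized)   # alpha = 1 / sum_i P(d | h_i) P(h_i)
        return [alpha * u for u in unnormalized]

The returned list always sums to 1, so it is a proper distribution over the hypotheses.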
Bayesian Learning
• We want to predict some quantity X:
  P(X | d) = ∑i P(X | d, hi) P(hi | d) = ∑i P(X | hi) P(hi | d)
• The second equality holds because each hypothesis fixes the prediction: X is independent of d given hi
• The predictions are weighted averages over the predictions of the individual hypotheses
Example
• Suppose we know that there are 5 kinds of bags of candy:

            cherry    lime    % of all bags
  Type 1     100%       0%         10%
  Type 2      75%      25%         20%
  Type 3      50%      50%         40%
  Type 4      25%      75%         20%
  Type 5       0%     100%         10%
Example: priors
• Given a new bag of candy, predict the type of the bag
• Five hypotheses:
• h1: bag is type 1, P(h1) = .1
• h2: bag is type 2, P(h2) = .2
• h3: bag is type 3, P(h3) = .4
• h4: bag is type 4, P(h4) = .2
• h5: bag is type 5, P(h5) = .1
• With no evidence, we use the hypothesis priors
Example: one lime candy
• Suppose we unwrap one candy and determine that it is lime. Here ∑i P(onelime | hi) P(hi) = 0.5, so α = 1 / 0.5 = 2.
• P(h1 | onelime) = α P(onelime | h1) P(h1) = 2 * (0 * 0.1) = 0
• P(h2 | onelime) = α P(onelime | h2) P(h2) = 2 * (0.25 * 0.2) = 0.1
• P(h3 | onelime) = α P(onelime | h3) P(h3) = 2 * (0.5 * 0.4) = 0.4
• P(h4 | onelime) = α P(onelime | h4) P(h4) = 2 * (0.75 * 0.2) = 0.3
• P(h5 | onelime) = α P(onelime | h5) P(h5) = 2 * (1.0 * 0.1) = 0.2
Example: two lime candies
• Suppose we unwrap another candy and it is also lime. Now ∑i P(twolime | hi) P(hi) = 0.325, so α = 1 / 0.325 ≈ 3.08.
• P(h1 | twolime) = α P(twolime | h1) P(h1) = 3.08 * (0 * 0.1) = 0
• P(h2 | twolime) = α P(twolime | h2) P(h2) = 3.08 * (0.0625 * 0.2) ≈ 0.04
• P(h3 | twolime) = α P(twolime | h3) P(h3) = 3.08 * (0.25 * 0.4) ≈ 0.31
• P(h4 | twolime) = α P(twolime | h4) P(h4) = 3.08 * (0.5625 * 0.2) ≈ 0.35
• P(h5 | twolime) = α P(twolime | h5) P(h5) = 3.08 * (1.0 * 0.1) ≈ 0.31
Example: n lime candies
• Suppose we unwrap n candies and they are all lime. With normalization constant α_n (see the sketch after this list):
• P(h1 | nlime) = α_n (0^n * 0.1)
• P(h2 | nlime) = α_n (0.25^n * 0.2)
• P(h3 | nlime) = α_n (0.5^n * 0.4)
• P(h4 | nlime) = α_n (0.75^n * 0.2)
• P(h5 | nlime) = α_n (1^n * 0.1)
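The whole candy calculation fits in a few lines of Python. This sketch (our own, not from the slides) reproduces the one-lime and two-lime posteriors above:

    # Posteriors after observing n lime candies in a row.
    lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # P(lime | h_i) for the five bag types
    priors    = [0.1, 0.2, 0.4, 0.2, 0.1]     # hypothesis priors P(h_i)

    def posteriors_after_n_limes(n):
        unnormalized = [p * (l ** n) for l, p in zip(lime_prob, priors)]
        alpha = 1.0 / sum(unnormalized)        # alpha_n depends on n
        return [alpha * u for u in unnormalized]

    print(posteriors_after_n_limes(1))  # [0.0, 0.1, 0.4, 0.3, 0.2]
    print(posteriors_after_n_limes(2))  # [0.0, 0.038..., 0.307..., 0.346..., 0.307...]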
Prediction: what candy is next?
• P(nextlime | nlime) = ∑i P(nextlime | hi) P(hi | nlime)
  = P(nextlime | h1) P(h1 | nlime) + P(nextlime | h2) P(h2 | nlime) + P(nextlime | h3) P(h3 | nlime) + P(nextlime | h4) P(h4 | nlime) + P(nextlime | h5) P(h5 | nlime)
  = 0 * α_n (0^n * 0.1) + 0.25 * α_n (0.25^n * 0.2) + 0.5 * α_n (0.5^n * 0.4) + 0.75 * α_n (0.75^n * 0.2) + 1 * α_n (1^n * 0.1)
• This probability approaches 1 as n grows; at n = 10, for example, it is ≈ 0.97 (computed in the sketch below)
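Continuing the sketch above, the prediction is just the posterior-weighted average of each hypothesis's lime probability:

    # P(next candy is lime | n limes observed), reusing posteriors_after_n_limes.
    def p_next_lime(n):
        post = posteriors_after_n_limes(n)
        return sum(l * q for l, q in zip(lime_prob, post))

    print(p_next_lime(10))  # ~0.97; the value approaches 1.0 as n grows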
Analysis: Bayesian Prediction
• The true hypothesis eventually dominates the posterior
• The posterior probability of any false hypothesis eventually vanishes
• The probability that a false hypothesis keeps generating uncharacteristic data becomes vanishingly small
• Bayesian prediction is optimal (given the hypothesis prior)
• Bayesian prediction is expensive
• The hypothesis space may be very large (or infinite)
MAP Approximation
• To avoid the expense of Bayesian learning, one approach is to simply choose the most probable hypothesis and assume it is correct
• MAP = maximum a posteriori
• hmap = the hi with the highest value of P(hi | d)
• In the candy example, after 3 limes have been selected, a MAP learner will always predict that the next candy is lime with 100% probability
• Less accurate, but much cheaper (see the sketch below)
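A self-contained sketch of a MAP learner for the candy example (our own illustration, not code from the slides); only the argmax matters, so the normalization constant can be skipped:

    lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # P(lime | h_i)
    priors    = [0.1, 0.2, 0.4, 0.2, 0.1]     # P(h_i)

    def map_next_lime(n):
        # Unnormalized posteriors are enough to pick h_map.
        post = [p * (l ** n) for l, p in zip(lime_prob, priors)]
        h_map = max(range(len(post)), key=lambda i: post[i])
        return lime_prob[h_map]               # predict with h_map alone

    print(map_next_lime(3))  # 1.0: after 3 limes, h5 (the all-lime bag) is the MAP hypothesis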
Avoiding Complexity
• As we’ve seen earlier, allowing overly complex hypotheses can lead to overfitting
• Bayesian and MAP learning use the hypothesis prior to penalize complex hypotheses
• Complex hypotheses typically have lower priors, since there are many more complex hypotheses than simple ones
• We get the simplest hypothesis consistent with the data (as per Ockham’s razor)
ML Approximation
• For large data sets, the priors become irrelevant; in this case we may use maximum likelihood (ML) learning
• Choose hml, the hi that maximizes P(d | hi)
• That is, choose the hypothesis that has the highest probability of generating the observed data
• Identical to MAP for uniform priors
• ML is the standard (non-Bayesian) statistical learning method (sketch below)
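The corresponding ML learner drops the prior entirely; again a sketch under the candy-example assumptions, not code from the slides:

    lime_prob = [0.0, 0.25, 0.5, 0.75, 1.0]   # P(lime | h_i)

    def ml_next_lime(n):
        likelihoods = [l ** n for l in lime_prob]   # P(n limes | h_i)
        h_ml = max(range(len(likelihoods)), key=lambda i: likelihoods[i])
        return lime_prob[h_ml]

    print(ml_next_lime(1))  # 1.0: h5 maximizes the likelihood of an all-lime run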
Exercise
• Suppose we were pulling candy from a 50/50 bag (type 3) or a 25/75 bag (type 4)
• With full Bayesian learning, what would the posterior probability and prediction plots look like after 100 candies?
• What would the prediction plots look like for MAP and ML learning after 1000 candies?
[Plots omitted: Bayesian posterior and prediction curves for the 50/50 bag and the 25/75 bag; MAP and ML prediction curves for each bag]