
Page 1: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Lecturer: 虞台文

Page 2: Bayesian Decision Theory (Classification) 主講人:虞台文

Contents

Introduction
Generalized Bayesian Decision Rule
Discriminant Functions
The Normal Distribution
Discriminant Functions for the Normal Populations
Minimax Criterion
Neyman-Pearson Criterion

Page 3: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Introduction

Page 4: Bayesian Decision Theory (Classification) 主講人:虞台文

What is Bayesian Decision Theory?

Mathematical foundation for decision making.

It uses a probabilistic approach to make decisions (e.g., classification) so as to minimize the risk (cost).

Page 5: Bayesian Decision Theory (Classification) 主講人:虞台文

Preliminaries and Notations

$\omega \in \{\omega_1, \omega_2, \ldots, \omega_c\}$: a state of nature

$P(\omega_i)$: prior probability

$\mathbf{x}$: feature vector

$p(\mathbf{x} \mid \omega_i)$: class-conditional density

$P(\omega_i \mid \mathbf{x})$: posterior probability

Page 6: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Rule

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad p(\mathbf{x}) = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

Page 7: Bayesian Decision Theory (Classification) 主講人:虞台文

Decision

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

$$\omega(\mathbf{x}) = \arg\max_i P(\omega_i \mid \mathbf{x})$$

The evidence $p(\mathbf{x})$ is unimportant in making the decision.

Page 8: Bayesian Decision Theory (Classification) 主講人:虞台文

Decision

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}, \qquad \omega(\mathbf{x}) = \arg\max_i P(\omega_i \mid \mathbf{x})$$

Decide $\omega_i$ if $P(\omega_i \mid \mathbf{x}) > P(\omega_j \mid \mathbf{x})$ for all $j \neq i$.

Decide $\omega_i$ if $p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$ for all $j \neq i$.

Special cases:
1. $P(\omega_1) = P(\omega_2) = \cdots = P(\omega_c)$
2. $p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2) = \cdots = p(\mathbf{x} \mid \omega_c)$
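
As a quick illustration of this rule, here is a minimal Python sketch; the two 1-D Gaussian class-conditional densities and the priors are illustrative assumptions, not part of the slides.

```python
import numpy as np
from scipy.stats import norm

# Illustrative 1-D example: two classes with Gaussian class-conditional densities.
priors = np.array([2/3, 1/3])                      # P(w1), P(w2)
densities = [norm(loc=-1.0, scale=1.0),            # p(x|w1)
             norm(loc=+1.0, scale=0.7)]            # p(x|w2)

def posterior(x):
    """Bayes rule: P(wi|x) = p(x|wi) P(wi) / p(x)."""
    joint = np.array([d.pdf(x) * P for d, P in zip(densities, priors)])
    return joint / joint.sum()                      # p(x) normalizes the joint

def decide(x):
    """Minimum-error-rate rule: pick the class with the largest posterior."""
    return np.argmax(posterior(x)) + 1              # 1-based class label

print(posterior(0.2), decide(0.2))
```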

Page 9: Bayesian Decision Theory (Classification) 主講人:虞台文

Two Categories

Decide $\omega_i$ if $P(\omega_i \mid \mathbf{x}) > P(\omega_j \mid \mathbf{x})$ for all $j \neq i$; equivalently, decide $\omega_i$ if $p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$ for all $j \neq i$.

For two categories:

Decide $\omega_1$ if $P(\omega_1 \mid \mathbf{x}) > P(\omega_2 \mid \mathbf{x})$; otherwise decide $\omega_2$.
Decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > p(\mathbf{x} \mid \omega_2)\, P(\omega_2)$; otherwise decide $\omega_2$.

Special cases:
1. $P(\omega_1) = P(\omega_2)$: decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1) > p(\mathbf{x} \mid \omega_2)$; otherwise decide $\omega_2$.
2. $p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2)$: decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.

Page 10: Bayesian Decision Theory (Classification) 主講人:虞台文

Example

[Figure: decision regions $\mathcal{R}_1$ and $\mathcal{R}_2$ for equal priors $P(\omega_1) = P(\omega_2)$.]

Special cases:
1. $P(\omega_1) = P(\omega_2)$: decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1) > p(\mathbf{x} \mid \omega_2)$; otherwise decide $\omega_2$.
2. $p(\mathbf{x} \mid \omega_1) = p(\mathbf{x} \mid \omega_2)$: decide $\omega_1$ if $P(\omega_1) > P(\omega_2)$; otherwise decide $\omega_2$.

Page 11: Bayesian Decision Theory (Classification) 主講人:虞台文

Example

[Figure: decision regions $\mathcal{R}_1$ and $\mathcal{R}_2$ for priors $P(\omega_1) = 2/3$, $P(\omega_2) = 1/3$.]

Decide $\omega_1$ if $p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > p(\mathbf{x} \mid \omega_2)\, P(\omega_2)$; otherwise decide $\omega_2$.

Page 12: Bayesian Decision Theory (Classification) 主講人:虞台文

Classification Error

$$P(\text{error}) = \int p(\text{error}, \mathbf{x})\, d\mathbf{x} = \int P(\text{error} \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

Consider two categories:

$$P(\text{error} \mid \mathbf{x}) = \begin{cases} P(\omega_1 \mid \mathbf{x}) & \text{if we decide } \omega_2 \\ P(\omega_2 \mid \mathbf{x}) & \text{if we decide } \omega_1 \end{cases}$$

Decide $\omega_1$ if $P(\omega_1 \mid \mathbf{x}) > P(\omega_2 \mid \mathbf{x})$; otherwise decide $\omega_2$, so that

$$P(\text{error} \mid \mathbf{x}) = \min\left[P(\omega_1 \mid \mathbf{x}),\, P(\omega_2 \mid \mathbf{x})\right]$$

Page 13: Bayesian Decision Theory (Classification) 主講人:虞台文

Classification Error

$$P(\text{error}) = \int P(\text{error} \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}, \qquad P(\text{error} \mid \mathbf{x}) = \min\left[P(\omega_1 \mid \mathbf{x}),\, P(\omega_2 \mid \mathbf{x})\right]$$

Page 14: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Generalized Bayesian Decision Rule

Page 15: Bayesian Decision Theory (Classification) 主講人:虞台文

The Generalization

$\{\omega_1, \omega_2, \ldots, \omega_c\}$: a set of $c$ states of nature

$\{\alpha_1, \alpha_2, \ldots, \alpha_a\}$: a set of $a$ possible actions

$\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$: the loss incurred for taking action $\alpha_i$ when the true state of nature is $\omega_j$ (the loss for a correct action can be zero).

We want to minimize the expected loss (the risk) in making decisions.

Page 16: Bayesian Decision Theory (Classification) 主講人:虞台文

Conditional Risk

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\, P(\omega_j \mid \mathbf{x})$$

Given $\mathbf{x}$, this is the expected loss (risk) associated with taking action $\alpha_i$.

Page 17: Bayesian Decision Theory (Classification) 主講人:虞台文

0/1 Loss Function

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \text{ (a correct decision associated with } \omega_j\text{)} \\ 1 & \text{otherwise} \end{cases}$$

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\, P(\omega_j \mid \mathbf{x}) = \sum_{j \neq i} P(\omega_j \mid \mathbf{x}) = 1 - P(\omega_i \mid \mathbf{x}) = P(\text{error} \mid \mathbf{x})$$

Page 18: Bayesian Decision Theory (Classification) 主講人:虞台文

Decision

$$R(\alpha_i \mid \mathbf{x}) = \sum_{j=1}^{c} \lambda_{ij}\, P(\omega_j \mid \mathbf{x})$$

Bayesian decision rule: $\alpha(\mathbf{x}) = \arg\min_i R(\alpha_i \mid \mathbf{x})$
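
A minimal sketch of this general rule; the 2x2 loss matrix and the posteriors below are illustrative assumptions.

```python
import numpy as np

# Illustrative loss matrix: loss[i, j] = lambda(alpha_i | omega_j).
loss = np.array([[0.0, 2.0],     # action a1: no loss if w1 is true, costly if w2
                 [1.0, 0.0]])    # action a2: mild loss if w1, correct for w2

def conditional_risk(posteriors):
    """R(a_i | x) = sum_j lambda_ij P(w_j | x), computed for every action i."""
    return loss @ posteriors

def bayes_action(posteriors):
    """Bayesian decision rule: take the action with minimum conditional risk."""
    return np.argmin(conditional_risk(posteriors))

post = np.array([0.3, 0.7])      # example posteriors P(w1|x), P(w2|x)
print(conditional_risk(post), bayes_action(post))
```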

Page 19: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

$$R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

where $\alpha(\cdot)$ is the decision function. The Bayesian decision rule $\alpha(\mathbf{x}) = \arg\min_i R(\alpha_i \mid \mathbf{x})$ is the optimal one to minimize the overall risk; its resulting overall risk is called the Bayesian risk.

Page 20: Bayesian Decision Theory (Classification) 主講人:虞台文

Two-Category Classification

$\{\omega_1, \omega_2\}$: states of nature; $\{\alpha_1, \alpha_2\}$: actions

Loss function $\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$:

| Action \ State of nature | $\omega_1$ | $\omega_2$ |
|---|---|---|
| $\alpha_1$ | $\lambda_{11}$ | $\lambda_{12}$ |
| $\alpha_2$ | $\lambda_{21}$ | $\lambda_{22}$ |

$$R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})$$
$$R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})$$

Page 21: Bayesian Decision Theory (Classification) 主講人:虞台文

Two-Category Classification

$$R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})$$
$$R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})$$

Perform $\alpha_1$ if $R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})$; otherwise perform $\alpha_2$. That is, perform $\alpha_1$ if

$$\lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x}) > \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})$$

i.e.,

$$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x})$$

Page 22: Bayesian Decision Theory (Classification) 主講人:虞台文

Two-Category Classification

Perform $\alpha_1$ if $R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})$; otherwise perform $\alpha_2$, i.e., perform $\alpha_1$ if

$$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x})$$

where $(\lambda_{21} - \lambda_{11})$ and $(\lambda_{12} - \lambda_{22})$ are normally positive: the posterior probabilities are scaled before comparison.

Page 23: Bayesian Decision Theory (Classification) 主講人:虞台文

Two-Category Classification

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})} \quad (\text{the evidence } p(\mathbf{x}) \text{ is irrelevant to the comparison})$$

Perform $\alpha_1$ if $R(\alpha_2 \mid \mathbf{x}) > R(\alpha_1 \mid \mathbf{x})$; otherwise perform $\alpha_2$:

$$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid \mathbf{x}) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid \mathbf{x})$$
$$(\lambda_{21} - \lambda_{11})\, p(\mathbf{x} \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(\mathbf{x} \mid \omega_2)\, P(\omega_2)$$
$$\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} > \frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}$$

Page 24: Bayesian Decision Theory (Classification) 主講人:虞台文

Two-Category Classification

Perform $\alpha_1$ if

$$\underbrace{\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}}$$

This slide will be recalled later.
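
A sketch of this likelihood-ratio rule; the losses, priors, and Gaussian class-conditional densities are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Illustrative ingredients (all values are assumptions for the sketch).
p1, p2 = norm(-1.0, 1.0), norm(1.0, 1.0)   # p(x|w1), p(x|w2)
P1, P2 = 0.6, 0.4                          # priors
l11, l12, l21, l22 = 0.0, 3.0, 1.0, 0.0    # losses lambda_ij

threshold = (l12 - l22) / (l21 - l11) * (P2 / P1)

def decide(x):
    """Perform a1 when the likelihood ratio exceeds the loss/prior threshold."""
    likelihood_ratio = p1.pdf(x) / p2.pdf(x)
    return "a1" if likelihood_ratio > threshold else "a2"

print(threshold, decide(-0.5), decide(2.0))
```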

Page 25: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Discriminant Functions

Page 26: Bayesian Decision Theory (Classification) 主講人:虞台文

The Multicategory Classification

[Diagram: the feature vector $\mathbf{x}$ is fed to discriminant functions $g_1(\mathbf{x}), g_2(\mathbf{x}), \ldots, g_c(\mathbf{x})$, and their maximum determines the action $\alpha(\mathbf{x})$ (e.g., a classification).]

Assign $\mathbf{x}$ to $\omega_i$ if $g_i(\mathbf{x}) > g_j(\mathbf{x})$ for all $j \neq i$.

The $g_i(\mathbf{x})$'s are called the discriminant functions.

How to define discriminant functions?

Page 27: Bayesian Decision Theory (Classification) 主講人:虞台文

Simple Discriminant Functions

Minimum-risk case: $g_i(\mathbf{x}) = -R(\alpha_i \mid \mathbf{x})$

Minimum error-rate case: $g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x})$, or equivalently $g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i)$ or $g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$

If $f(\cdot)$ is a monotonically increasing function, then the $f(g_i(\cdot))$'s are also discriminant functions.

Page 28: Bayesian Decision Theory (Classification) 主講人:虞台文

Decision Regions

$$\mathcal{R}_i = \{\, \mathbf{x} \mid g_i(\mathbf{x}) > g_j(\mathbf{x}) \ \forall j \neq i \,\}$$

Two-category example

Decision regions are separated by decision boundaries.

Page 29: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

The Normal Distribution

Page 30: Bayesian Decision Theory (Classification) 主講人:虞台文

Basics of Probability

Discrete random variable $X$ (assume integer-valued):
Probability mass function (pmf): $p(x) = P(X = x)$
Cumulative distribution function (cdf): $F(x) = P(X \le x) = \sum_{t \le x} p(t)$

Continuous random variable $X$:
Probability density function (pdf): $p(x)$ or $f(x)$ (not a probability)
Cumulative distribution function (cdf): $F(x) = P(X \le x) = \int_{-\infty}^{x} p(t)\, dt$

Page 31: Bayesian Decision Theory (Classification) 主講人:虞台文

Expectations

Let $g$ be a function of random variable $X$.

$$E[g(X)] = \begin{cases} \sum_x g(x)\, p(x) & X \text{ discrete} \\ \int g(x)\, p(x)\, dx & X \text{ continuous} \end{cases}$$

The $k$th moment: $E[X^k]$

The $k$th central moment: $E[(X - \bar{X})^k]$

The 1st moment: $\bar{X} = E[X]$

Page 32: Bayesian Decision Theory (Classification) 主講人:虞台文

Important Expectations

Mean:

$$\mu_X = E[X] = \begin{cases} \sum_x x\, p(x) & X \text{ discrete} \\ \int x\, p(x)\, dx & X \text{ continuous} \end{cases}$$

Variance:

$$\sigma_X^2 = \mathrm{Var}[X] = E[(X - \mu_X)^2] = \begin{cases} \sum_x (x - \mu_X)^2\, p(x) & X \text{ discrete} \\ \int (x - \mu_X)^2\, p(x)\, dx & X \text{ continuous} \end{cases}$$

Fact: $\mathrm{Var}[X] = E[X^2] - (E[X])^2$

Page 33: Bayesian Decision Theory (Classification) 主講人:虞台文

Entropy

$$H[X] = \begin{cases} -\sum_x p(x) \ln p(x) & X \text{ discrete} \\ -\int p(x) \ln p(x)\, dx & X \text{ continuous} \end{cases}$$

The entropy measures the fundamental uncertainty in the value of points selected randomly from a distribution.

Page 34: Bayesian Decision Theory (Classification) 主講人:虞台文

Univariate Gaussian Distribution

$X \sim N(\mu, \sigma^2)$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

$E[X] = \mu$, $\mathrm{Var}[X] = \sigma^2$

[Figure: the density $p(x)$, centered at $\mu$, with markers at $\mu \pm \sigma$, $\mu \pm 2\sigma$, and $\mu \pm 3\sigma$.]

Properties:
1. Maximizes the entropy (among densities with a given mean and variance).
2. Central limit theorem.
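
A small sketch that evaluates this density directly and numerically checks that it integrates to one; the parameter values are illustrative.

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density: exp(-(x-mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

# Sanity check: the density should integrate to ~1 and have mean ~mu.
xs = np.linspace(-8, 8, 4001)
px = gaussian_pdf(xs, mu=1.5, sigma=0.8)
print(np.trapz(px, xs), np.trapz(xs * px, xs))   # ~1.0 and ~1.5
```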

Page 35: Bayesian Decision Theory (Classification) 主講人:虞台文

Random Vectors

$\mathbf{X} = (X_1, X_2, \ldots, X_d)^T$: a $d$-dimensional random vector, $\mathbf{X} : \mathbb{R}^d$

Vector mean: $\boldsymbol{\mu} = E[\mathbf{X}] = (\mu_1, \mu_2, \ldots, \mu_d)^T$

Covariance matrix:

$$\boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{pmatrix}$$

Page 36: Bayesian Decision Theory (Classification) 主講人:虞台文

Multivariate Gaussian Distribution

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$: a $d$-dimensional random vector

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$

$E[\mathbf{X}] = \boldsymbol{\mu}$, $E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \boldsymbol{\Sigma}$

Compare with the univariate case:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
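
A small sketch evaluating this density; the mean vector and covariance matrix below are illustrative.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density for a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)          # (x-mu)^T Sigma^{-1} (x-mu)
    norm_const = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm_const

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([0.5, 0.5]), mu, Sigma))
```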

Page 37: Bayesian Decision Theory (Classification) 主講人:虞台文

Properties of N(μ,Σ)

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$: a $d$-dimensional random vector.

Let $\mathbf{Y} = \mathbf{A}^T \mathbf{X}$, where $\mathbf{A}$ is a $d \times k$ matrix. Then $\mathbf{Y} \sim N(\mathbf{A}^T \boldsymbol{\mu}, \mathbf{A}^T \boldsymbol{\Sigma} \mathbf{A})$.

Page 38: Bayesian Decision Theory (Classification) 主講人:虞台文

Properties of N(μ,Σ)

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$: a $d$-dimensional random vector.

Let $\mathbf{Y} = \mathbf{A}^T \mathbf{X}$, where $\mathbf{A}$ is a $d \times k$ matrix. Then $\mathbf{Y} \sim N(\mathbf{A}^T \boldsymbol{\mu}, \mathbf{A}^T \boldsymbol{\Sigma} \mathbf{A})$.

Page 39: Bayesian Decision Theory (Classification) 主講人:虞台文

On Parameters of N(μ,Σ)

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\mathbf{X} = (X_1, X_2, \ldots, X_d)^T$

$$\boldsymbol{\mu} = E[\mathbf{X}] = (\mu_1, \mu_2, \ldots, \mu_d)^T, \qquad \mu_i = E[X_i]$$

$$\boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = [\sigma_{ij}]_{d \times d}$$

$$\sigma_{ij} = E[(X_i - \mu_i)(X_j - \mu_j)] = \mathrm{Cov}(X_i, X_j), \qquad \sigma_{ii} = E[(X_i - \mu_i)^2] = \mathrm{Var}(X_i) = \sigma_i^2$$

$X_i$ and $X_j$ independent $\Rightarrow \sigma_{ij} = 0$.

Page 40: Bayesian Decision Theory (Classification) 主講人:虞台文

More On Covariance Matrix

$$\boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = [\sigma_{ij}]_{d \times d}, \qquad \sigma_{ij} = \mathrm{Cov}(X_i, X_j), \qquad \sigma_{ii} = \mathrm{Var}(X_i) = \sigma_i^2$$

$\boldsymbol{\Sigma}$ is symmetric and positive semidefinite, so it has the eigendecomposition

$$\boldsymbol{\Sigma} = \boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^T$$

where $\boldsymbol{\Phi}$ is an orthonormal matrix whose columns are the eigenvectors of $\boldsymbol{\Sigma}$ and $\boldsymbol{\Lambda}$ is the diagonal matrix of eigenvalues. Hence

$$\boldsymbol{\Sigma} = (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})^T$$

Page 41: Bayesian Decision Theory (Classification) 主講人:虞台文

Whitening Transform

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$; for $\mathbf{Y} = \mathbf{A}^T \mathbf{X}$, $\mathbf{Y} \sim N(\mathbf{A}^T \boldsymbol{\mu}, \mathbf{A}^T \boldsymbol{\Sigma} \mathbf{A})$, and $\boldsymbol{\Sigma} = (\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{1/2})^T$.

Let $\mathbf{A}_w = \boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2}$. Then

$$\mathbf{A}_w^T \boldsymbol{\Sigma} \mathbf{A}_w = (\boldsymbol{\Lambda}^{-1/2} \boldsymbol{\Phi}^T)(\boldsymbol{\Phi} \boldsymbol{\Lambda} \boldsymbol{\Phi}^T)(\boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2}) = \mathbf{I}$$

so that

$$\mathbf{A}_w^T \mathbf{X} \sim N(\mathbf{A}_w^T \boldsymbol{\mu}, \mathbf{I})$$

Page 42: Bayesian Decision Theory (Classification) 主講人:虞台文

Whitening Transform

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, $\mathbf{A}_w = \boldsymbol{\Phi} \boldsymbol{\Lambda}^{-1/2}$, $\mathbf{A}_w^T \mathbf{X} \sim N(\mathbf{A}_w^T \boldsymbol{\mu}, \mathbf{I})$

[Figure: the effect of a general linear transform, a projection, and the whitening transform on the Gaussian density.]
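
A sketch of the whitening transform via the eigendecomposition of an illustrative covariance matrix; the sample covariance of the transformed data should be close to the identity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sample from an illustrative correlated Gaussian.
mu = np.array([1.0, -2.0])
Sigma = np.array([[3.0, 1.2],
                  [1.2, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=5000)

# Whitening transform A_w = Phi Lambda^{-1/2} from the eigendecomposition of Sigma.
eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)

Y = X @ A_w                      # each row is A_w^T x for one sample
print(np.cov(Y.T))               # ~ identity matrix
```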

Page 43: Bayesian Decision Theory (Classification) 主講人:虞台文

Mahalanobis Distance

$\mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$:

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$

The density is constant on surfaces of constant squared Mahalanobis distance

$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})$$

and the contour depends on the value of $r^2$.
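
A small sketch computing the squared Mahalanobis distance; the covariance below is illustrative, and the example shows that equal Euclidean distances can give different Mahalanobis distances.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance r^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],
                  [0.0, 1.0]])

print(mahalanobis_sq(np.array([2.0, 0.0]), mu, Sigma))   # 1.0
print(mahalanobis_sq(np.array([0.0, 2.0]), mu, Sigma))   # 4.0
```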

Page 44: Bayesian Decision Theory (Classification) 主講人:虞台文

Mahalanobis Distance

$$r^2 = (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}), \qquad \mathbf{X} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

[Figure: contours of constant density are surfaces of constant Mahalanobis distance $r$ centered at $\boldsymbol{\mu}$.]

Page 45: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Discriminant Functions for the Normal Populations

Page 46: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimum-Error-Rate Classification

$$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}), \qquad g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i), \qquad g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i)$$

For Gaussian classes $\mathbf{X}_i \sim N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid \omega_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

Page 47: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimum-Error-Rate Classification

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

Three cases:

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$. Classes are centered at different means; the feature components are pairwise independent and have the same variance.

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$. Classes are centered at different means but share the same covariance matrix.

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$. Arbitrary covariance matrices.
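
A sketch of this log-discriminant that applies to all three cases; the class parameters below are illustrative assumptions.

```python
import numpy as np

def gaussian_discriminant(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - d/2 ln(2 pi) - 1/2 ln|Sigma| + ln P(w_i)."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.solve(Sigma, diff)
    return (-0.5 * maha
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Illustrative two-class problem; classify x by the larger discriminant.
params = [
    (np.array([0.0, 0.0]), np.eye(2),               0.5),   # class 1
    (np.array([2.0, 2.0]), np.array([[2.0, 0.3],
                                     [0.3, 1.0]]),  0.5),   # class 2
]
x = np.array([1.0, 1.5])
scores = [gaussian_discriminant(x, m, S, P) for m, S, P in params]
print(scores, np.argmax(scores) + 1)
```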

Page 48: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

With $\boldsymbol{\Sigma}_i^{-1} = \frac{1}{\sigma^2} \mathbf{I}$, the terms that do not depend on $i$ are irrelevant:

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \boldsymbol{\mu}_i\|^2}{2\sigma^2} + \ln P(\omega_i) = -\frac{1}{2\sigma^2} \left(\mathbf{x}^T \mathbf{x} - 2 \boldsymbol{\mu}_i^T \mathbf{x} + \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i\right) + \ln P(\omega_i)$$

Dropping $\mathbf{x}^T \mathbf{x}$, which is the same for every class:

$$g_i(\mathbf{x}) = \frac{1}{\sigma^2} \boldsymbol{\mu}_i^T \mathbf{x} - \frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)$$

Page 49: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

$$g_i(\mathbf{x}) = \frac{1}{\sigma^2} \boldsymbol{\mu}_i^T \mathbf{x} - \frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)$$

This is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{1}{\sigma^2} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)$$

Page 50: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \frac{1}{\sigma^2} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2\sigma^2} \boldsymbol{\mu}_i^T \boldsymbol{\mu}_i + \ln P(\omega_i)$$

Boundary between $\omega_i$ and $\omega_j$: $g_i(\mathbf{x}) = g_j(\mathbf{x})$, i.e., $(\mathbf{w}_i - \mathbf{w}_j)^T \mathbf{x} + (w_{i0} - w_{j0}) = 0$. Substituting and multiplying by $\sigma^2$:

$$(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \mathbf{x} - \frac{1}{2} (\boldsymbol{\mu}_i^T \boldsymbol{\mu}_i - \boldsymbol{\mu}_j^T \boldsymbol{\mu}_j) + \sigma^2 \ln \frac{P(\omega_i)}{P(\omega_j)} = 0$$

$$(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \left[\mathbf{x} - \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) + \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)\right] = 0$$

Page 51: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

Boundary between $\omega_i$ and $\omega_j$: $g_i(\mathbf{x}) = g_j(\mathbf{x})$ defines the hyperplane

$$\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0, \qquad \mathbf{w} = \boldsymbol{\mu}_i - \boldsymbol{\mu}_j, \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\sigma^2}{\|\boldsymbol{\mu}_i - \boldsymbol{\mu}_j\|^2} \ln \frac{P(\omega_i)}{P(\omega_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$

The decision boundary is a hyperplane perpendicular to the line between the means, passing through $\mathbf{x}_0$; if $P(\omega_i) = P(\omega_j)$, $\mathbf{x}_0$ is the midpoint of the means.

Page 52: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) - \frac{\sigma^2}{\|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2} \ln \frac{P(\omega_1)}{P(\omega_2)}\, (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$$

When $P(\omega_1) = P(\omega_2)$, the rule becomes a minimum distance classifier (template matching).
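
A sketch of the Case 1 linear discriminants; with equal priors it reduces to assigning x to the nearest mean. All parameter values below are illustrative.

```python
import numpy as np

# Illustrative class means, shared isotropic covariance sigma^2 I, and priors.
means = np.array([[0.0, 0.0],
                  [3.0, 1.0]])
sigma2 = 1.0
priors = np.array([0.5, 0.5])

def g(x):
    """Linear discriminants for Sigma_i = sigma^2 I:
       g_i(x) = mu_i^T x / sigma^2 - mu_i^T mu_i / (2 sigma^2) + ln P(w_i)."""
    return (means @ x) / sigma2 - np.sum(means**2, axis=1) / (2 * sigma2) + np.log(priors)

def classify(x):
    return np.argmax(g(x)) + 1

# With equal priors this assigns x to the nearest mean.
print(classify(np.array([0.4, 0.2])), classify(np.array([2.5, 0.9])))
```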

Page 53: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

$$\mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) - \frac{\sigma^2}{\|\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2\|^2} \ln \frac{P(\omega_1)}{P(\omega_2)}\, (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$$

[Figure: decision boundaries for this case as the priors $P(\omega_1)$ and $P(\omega_2)$ vary.]

Page 54: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

[Figure: decision boundaries for this case as the priors $P(\omega_1)$ and $P(\omega_2)$ vary.]

Page 55: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 1: $\boldsymbol{\Sigma}_i = \sigma^2 \mathbf{I}$

[Figure: decision boundaries for this case as the priors $P(\omega_1)$ and $P(\omega_2)$ vary.]

Demo

Page 56: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

The terms that do not depend on $i$ are irrelevant, leaving

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i)$$

The quadratic form is the squared Mahalanobis distance; the prior term is irrelevant if $P(\omega_i) = P(\omega_j)$ for all $i, j$.

Page 57: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(\omega_i) = -\frac{1}{2} \left(\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - 2 \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \mathbf{x} + \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i\right) + \ln P(\omega_i)$$

The term $\mathbf{x}^T \boldsymbol{\Sigma}^{-1} \mathbf{x}$ is the same for every class (irrelevant), giving the linear discriminant

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i)$$

Page 58: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu}_i + \ln P(\omega_i)$$

Boundary between $\omega_i$ and $\omega_j$: $g_i(\mathbf{x}) = g_j(\mathbf{x})$ defines the hyperplane $\mathbf{w}^T (\mathbf{x} - \mathbf{x}_0) = 0$ with

$$\mathbf{w} = \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j), \qquad \mathbf{x}_0 = \frac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) - \frac{\ln[P(\omega_i)/P(\omega_j)]}{(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)}\, (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)$$
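
A sketch of the Case 2 linear discriminants; the shared covariance matrix, means, and priors below are illustrative.

```python
import numpy as np

# Illustrative parameters for two classes sharing the same covariance matrix.
means = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
priors = [0.5, 0.5]
Sigma_inv = np.linalg.inv(Sigma)

# Linear discriminant parameters: w_i = Sigma^{-1} mu_i,
# w_i0 = -1/2 mu_i^T Sigma^{-1} mu_i + ln P(w_i).
w = [Sigma_inv @ m for m in means]
w0 = [-0.5 * m @ Sigma_inv @ m + np.log(P) for m, P in zip(means, priors)]

def classify(x):
    scores = [wi @ x + wi0 for wi, wi0 in zip(w, w0)]
    return np.argmax(scores) + 1

print(classify(np.array([0.3, 0.1])), classify(np.array([1.8, 0.9])))
```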

Page 59: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$

Page 60: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 2: $\boldsymbol{\Sigma}_i = \boldsymbol{\Sigma}$ (Demo)

Page 61: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{d}{2} \ln 2\pi - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

Only the constant $\frac{d}{2} \ln 2\pi$ is irrelevant:

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

The discriminant is quadratic:

$$g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$
$$\mathbf{W}_i = -\frac{1}{2} \boldsymbol{\Sigma}_i^{-1}, \qquad \mathbf{w}_i = \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i, \qquad w_{i0} = -\frac{1}{2} \boldsymbol{\mu}_i^T \boldsymbol{\Sigma}_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2} \ln |\boldsymbol{\Sigma}_i| + \ln P(\omega_i)$$

The quadratic term and the $-\frac{1}{2} \ln |\boldsymbol{\Sigma}_i|$ term are absent in Cases 1 and 2. Decision surfaces are hyperquadrics, e.g., hyperplanes, hyperspheres, hyperellipsoids, and hyperhyperboloids.
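
A sketch of the Case 3 quadratic discriminant parameters; the class mean, covariance, and prior below are illustrative.

```python
import numpy as np

def quadratic_discriminant_params(mu, Sigma, prior):
    """Parameters of g_i(x) = x^T W_i x + w_i^T x + w_i0 for an arbitrary Sigma_i."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv
    w = Sigma_inv @ mu
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))
    return W, w, w0

# Illustrative class with its own covariance; evaluate g_i at a point.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.5, 0.4],
                  [0.4, 0.7]])
W, w, w0 = quadratic_discriminant_params(mu, Sigma, prior=0.3)
x = np.array([0.5, 0.0])
print(x @ W @ x + w @ x + w0)
```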

Page 62: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$

Non-simply connected decision regions can arise in one dimension for Gaussians having unequal variance.

Page 63: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$

Page 64: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$

Page 65: Bayesian Decision Theory (Classification) 主講人:虞台文

Case 3: $\boldsymbol{\Sigma}_i \neq \boldsymbol{\Sigma}_j$

Demo

Page 66: Bayesian Decision Theory (Classification) 主講人:虞台文

Multi-Category Classification

Page 67: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Minimax Criterion

Page 68: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Rule: Two-Category Classification

Decide $\omega_1$ if

$$\underbrace{\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}}$$

The minimax criterion deals with the case where the prior probabilities are unknown.

Page 69: Bayesian Decision Theory (Classification) 主講人:虞台文

Basic Concept on Minimax

Consider the worst-case prior probabilities (those giving the maximum loss), and then pick the decision rule that minimizes the overall risk under them.

Minimize the maximum possible overall risk.

Page 70: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

$$R = \int R(\alpha(\mathbf{x}) \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} = \int_{\mathcal{R}_1} R(\alpha_1 \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x} + \int_{\mathcal{R}_2} R(\alpha_2 \mid \mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}$$

With $R(\alpha_1 \mid \mathbf{x}) = \lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})$ and $R(\alpha_2 \mid \mathbf{x}) = \lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})$:

$$R = \int_{\mathcal{R}_1} \left[\lambda_{11} P(\omega_1 \mid \mathbf{x}) + \lambda_{12} P(\omega_2 \mid \mathbf{x})\right] p(\mathbf{x})\, d\mathbf{x} + \int_{\mathcal{R}_2} \left[\lambda_{21} P(\omega_1 \mid \mathbf{x}) + \lambda_{22} P(\omega_2 \mid \mathbf{x})\right] p(\mathbf{x})\, d\mathbf{x}$$

Page 71: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

Using $P(\omega_i \mid \mathbf{x})\, p(\mathbf{x}) = p(\mathbf{x} \mid \omega_i)\, P(\omega_i)$:

$$R = \int_{\mathcal{R}_1} \left[\lambda_{11} P(\omega_1)\, p(\mathbf{x} \mid \omega_1) + \lambda_{12} P(\omega_2)\, p(\mathbf{x} \mid \omega_2)\right] d\mathbf{x} + \int_{\mathcal{R}_2} \left[\lambda_{21} P(\omega_1)\, p(\mathbf{x} \mid \omega_1) + \lambda_{22} P(\omega_2)\, p(\mathbf{x} \mid \omega_2)\right] d\mathbf{x}$$

Page 72: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

With $P(\omega_2) = 1 - P(\omega_1)$:

$$R = \int_{\mathcal{R}_1} \left\{\lambda_{11} P(\omega_1)\, p(\mathbf{x} \mid \omega_1) + \lambda_{12} [1 - P(\omega_1)]\, p(\mathbf{x} \mid \omega_2)\right\} d\mathbf{x} + \int_{\mathcal{R}_2} \left\{\lambda_{21} P(\omega_1)\, p(\mathbf{x} \mid \omega_1) + \lambda_{22} [1 - P(\omega_1)]\, p(\mathbf{x} \mid \omega_2)\right\} d\mathbf{x}$$

Page 73: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

Since $\int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_i)\, d\mathbf{x} + \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_i)\, d\mathbf{x} = 1$, the risk can be written as a function of $P(\omega_1)$:

$$R[P(\omega_1)] = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[(\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

Page 74: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

$$R[P(\omega_1)] = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[(\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

This is the overall risk for a particular $P(\omega_1)$. For a fixed decision boundary (fixed $\mathcal{R}_1$ and $\mathcal{R}_2$) the two bracketed quantities are constants, so the risk is linear in $P(\omega_1)$: $R = a\, P(\omega_1) + b$.

Page 75: Bayesian Decision Theory (Classification) 主講人:虞台文

Overall Risk

$$R[P(\omega_1)] = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[(\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

For the minimax solution the bracketed coefficient of $P(\omega_1)$ is set to zero, making the risk independent of the value of $P(\omega_i)$; the remaining constant term is $R_{mm}$, the minimax risk.

Page 76: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimax Risk

$$R_{mm} = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = \lambda_{11} + (\lambda_{21} - \lambda_{11}) \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x}$$

Page 77: Bayesian Decision Theory (Classification) 主講人:虞台文

Error Probability

$$R[P(\omega_1)] = \lambda_{22} + (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[(\lambda_{11} - \lambda_{22}) + (\lambda_{21} - \lambda_{11}) \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - (\lambda_{12} - \lambda_{22}) \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

Using the 0/1 loss function ($\lambda_{11} = \lambda_{22} = 0$, $\lambda_{12} = \lambda_{21} = 1$):

$$P(\text{error})[P(\omega_1)] = \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[\int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

Page 78: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimax Error-Probability

$$P(\text{error})[P(\omega_1)] = \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[\int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

Setting the bracket to zero gives the minimax error probability

$$P_{mm}(\text{error}) = \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x}$$

i.e., $P(\omega_1 \mid \omega_2) = P(\omega_2 \mid \omega_1)$.

Page 79: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimax Error-Probability

[Figure: the two class-conditional densities over regions $\mathcal{R}_1$ and $\mathcal{R}_2$ ($\omega_1$ and $\omega_2$).]

$$P_{mm}(\text{error}) = \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x}, \qquad P(\omega_1 \mid \omega_2) = P(\omega_2 \mid \omega_1)$$

Page 80: Bayesian Decision Theory (Classification) 主講人:虞台文

Minimax Error-Probability

$$P(\text{error})[P(\omega_1)] = \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} + P(\omega_1) \left[\int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} - \int_{\mathcal{R}_1} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x}\right]$$

Page 81: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Theory (Classification)

Neyman-Pearson Criterion

Page 82: Bayesian Decision Theory (Classification) 主講人:虞台文

Bayesian Decision Rule: Two-Category Classification

Decide $\omega_1$ if

$$\underbrace{\frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)}}_{\text{likelihood ratio}} > \underbrace{\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)}}_{\text{threshold}}$$

The Neyman-Pearson criterion deals with the case where both the loss functions and the prior probabilities are unknown.

Page 83: Bayesian Decision Theory (Classification) 主講人:虞台文

Signal Detection Theory

Signal detection theory evolved from the development of communications and radar equipment in the first half of the last century.

It migrated to psychology, initially as part of sensation and perception, in the 1950s and 1960s, as an attempt to understand some features of human behavior when detecting very faint stimuli that were not explained by traditional theories of thresholds.

Page 84: Bayesian Decision Theory (Classification) 主講人:虞台文

The situation of interest

A person is faced with a stimulus (signal) that is very faint or confusing.

The person must make a decision: is the signal there or not?

What makes this situation confusing and difficult is the presence of other "mess" that is similar to the signal. Let us call this mess noise.

Page 85: Bayesian Decision Theory (Classification) 主講人:虞台文

Example

Noise is present both in the environment and in the sensory system of the observer.

The observer reacts to the momentary total activation of the sensory system, which fluctuates from moment to moment, as well as responding to environmental stimuli, which may include a signal.

Page 86: Bayesian Decision Theory (Classification) 主講人:虞台文

Example

A radiologist is examining a CT scan, looking for evidence of a tumor. This is a hard job, because there is always some uncertainty.

There are four possible outcomes:
– hit (tumor present and doctor says "yes")
– miss (tumor present and doctor says "no")
– false alarm (tumor absent and doctor says "yes")
– correct rejection (tumor absent and doctor says "no")

Two types of Error

Page 87: Bayesian Decision Theory (Classification) 主講人:虞台文

The Four Cases

| Decision \ Signal (tumor) | Absent ($\omega_1$) | Present ($\omega_2$) |
|---|---|---|
| No ($\alpha_1$) | Correct rejection, $P(\alpha_1 \mid \omega_1)$ | Miss, $P(\alpha_1 \mid \omega_2)$ |
| Yes ($\alpha_2$) | False alarm, $P(\alpha_2 \mid \omega_1)$ | Hit, $P(\alpha_2 \mid \omega_2)$ |

Signal detection theory was developed to help us understand how a continuous and ambiguous signal can lead to a binary yes/no decision.

Page 88: Bayesian Decision Theory (Classification) 主講人:虞台文

Decision Making

[Figure: the "Noise" distribution ($\omega_1$) and the "Noise + Signal" distribution ($\omega_2$) along the internal response axis; a criterion splits the axis into "No" ($\alpha_1$) and "Yes" ($\alpha_2$), with the hit rate $P(\alpha_2 \mid \omega_2)$ and the false-alarm rate $P(\alpha_2 \mid \omega_1)$ given by the areas beyond the criterion.]

Discriminability: $d' = \dfrac{|\mu_2 - \mu_1|}{\sigma}$

The criterion is set based on expectancy (decision bias).

Page 89: Bayesian Decision Theory (Classification) 主講人:虞台文

ROC Curve (Receiver Operating Characteristic)

[Figure: the ROC curve plots the hit rate $P_H = P(\alpha_2 \mid \omega_2)$ against the false-alarm rate $P_{FA} = P(\alpha_2 \mid \omega_1)$ as the decision criterion is varied.]
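
A sketch that traces the ROC curve for an illustrative equal-variance Gaussian signal-detection model; the value of d' and the criterion grid are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Illustrative signal-detection setup: noise ~ N(0,1), noise + signal ~ N(d',1).
d_prime = 1.5
noise, signal = norm(0.0, 1.0), norm(d_prime, 1.0)

# Sweep the criterion; "yes" is said whenever the response exceeds it.
criteria = np.linspace(-4, 6, 201)
P_FA = 1 - noise.cdf(criteria)      # false-alarm rate P(yes | noise only)
P_H = 1 - signal.cdf(criteria)      # hit rate        P(yes | signal present)

# Each (P_FA, P_H) pair is one point on the ROC curve.
for i in (40, 100, 160):
    print(f"criterion={criteria[i]:5.2f}  P_FA={P_FA[i]:.3f}  P_H={P_H[i]:.3f}")
```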

Page 90: Bayesian Decision Theory (Classification) 主講人:虞台文

Neyman-Pearson Criterion

NP: maximize $P_H = P(\alpha_2 \mid \omega_2)$ subject to $P_{FA} = P(\alpha_2 \mid \omega_1) \le a$.

[Figure: the Neyman-Pearson operating point on the ROC curve, the point with the largest hit rate whose false-alarm rate does not exceed $a$.]

Page 91: Bayesian Decision Theory (Classification) 主講人:虞台文

Likelihood Ratio Test

$$\phi(\mathbf{x}) = \begin{cases} 1 & \text{if } p(\mathbf{x} \mid \omega_1)/p(\mathbf{x} \mid \omega_2) < T \\ 0 & \text{if } p(\mathbf{x} \mid \omega_1)/p(\mathbf{x} \mid \omega_2) > T \end{cases}$$

where $T$ is a threshold that meets the $P_{FA}$ constraint ($\le a$).

$$\mathcal{R}_1 = \{\mathbf{x} \mid p(\mathbf{x} \mid \omega_1) > T\, p(\mathbf{x} \mid \omega_2)\}, \qquad \mathcal{R}_2 = \{\mathbf{x} \mid p(\mathbf{x} \mid \omega_1) < T\, p(\mathbf{x} \mid \omega_2)\}$$

How to determine $T$?
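
A numerical sketch of choosing T, assuming illustrative 1-D Gaussian class-conditional densities; for these densities the likelihood ratio is monotone in x, so the P_FA constraint fixes a criterion c and hence T.

```python
import numpy as np
from scipy.stats import norm

# Illustrative densities: p(x|w1) = noise, p(x|w2) = signal.
p1, p2 = norm(0.0, 1.0), norm(1.5, 1.0)
a = 0.05                                   # allowed false-alarm rate

# Here the likelihood ratio p(x|w1)/p(x|w2) decreases with x, so
# "ratio < T" is equivalent to "x > c"; pick c so that P_FA = P(x > c | w1) = a.
c = p1.ppf(1 - a)
T = p1.pdf(c) / p2.pdf(c)                  # the corresponding ratio threshold

P_FA = 1 - p1.cdf(c)                       # = a by construction
P_H = 1 - p2.cdf(c)                        # hit rate achieved under the constraint
print(T, P_FA, P_H)
```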

Page 92: Bayesian Decision Theory (Classification) 主講人:虞台文

Likelihood Ratio Test

$$\phi(\mathbf{x}) = \begin{cases} 1 & \text{if } p(\mathbf{x} \mid \omega_1)/p(\mathbf{x} \mid \omega_2) < T \\ 0 & \text{if } p(\mathbf{x} \mid \omega_1)/p(\mathbf{x} \mid \omega_2) > T \end{cases}$$
$$\mathcal{R}_1 = \{\mathbf{x} \mid p(\mathbf{x} \mid \omega_1) > T\, p(\mathbf{x} \mid \omega_2)\}, \qquad \mathcal{R}_2 = \{\mathbf{x} \mid p(\mathbf{x} \mid \omega_1) < T\, p(\mathbf{x} \mid \omega_2)\}$$

$$P_{FA} = \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} = \int \phi(\mathbf{x})\, p(\mathbf{x} \mid \omega_1)\, d\mathbf{x} = E[\phi(\mathbf{X}) \mid \omega_1]$$
$$P_{H} = \int_{\mathcal{R}_2} p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = \int \phi(\mathbf{x})\, p(\mathbf{x} \mid \omega_2)\, d\mathbf{x} = E[\phi(\mathbf{X}) \mid \omega_2]$$

Page 93: Bayesian Decision Theory (Classification) 主講人:虞台文

Neyman-Pearson Lemma

Consider the above rule with $T$ chosen to give $P_{FA}(\phi) = a$. There is no decision rule $\phi'$ such that $P_{FA}(\phi') \le a$ and $P_H(\phi') > P_H(\phi)$.

$$P_{FA} = E[\phi(\mathbf{X}) \mid \omega_1], \qquad P_H = E[\phi(\mathbf{X}) \mid \omega_2]$$

Pf) Let $\phi'$ be a decision rule with $P_{FA}(\phi') = E[\phi'(\mathbf{X}) \mid \omega_1] \le a$. Consider

$$\int [\phi(\mathbf{x}) - \phi'(\mathbf{x})]\, [T\, p(\mathbf{x} \mid \omega_2) - p(\mathbf{x} \mid \omega_1)]\, d\mathbf{x} \ge 0$$

Where $\phi(\mathbf{x}) = 1$: the factor $T\, p(\mathbf{x} \mid \omega_2) - p(\mathbf{x} \mid \omega_1)$ is $> 0$ and $\phi(\mathbf{x}) - \phi'(\mathbf{x}) \ge 0$, so the integrand is $\ge 0$.

Page 94: Bayesian Decision Theory (Classification) 主講人:虞台文

Neyman-Pearson Lemma

Consider the above rule with $T$ chosen to give $P_{FA}(\phi) = a$. There is no decision rule $\phi'$ such that $P_{FA}(\phi') \le a$ and $P_H(\phi') > P_H(\phi)$.

Pf, continued) Where $\phi(\mathbf{x}) = 0$: the factor $T\, p(\mathbf{x} \mid \omega_2) - p(\mathbf{x} \mid \omega_1)$ is $\le 0$ and $\phi(\mathbf{x}) - \phi'(\mathbf{x}) \le 0$, so the integrand is again $\ge 0$, and therefore

$$\int [\phi(\mathbf{x}) - \phi'(\mathbf{x})]\, [T\, p(\mathbf{x} \mid \omega_2) - p(\mathbf{x} \mid \omega_1)]\, d\mathbf{x} \ge 0$$

Page 95: Bayesian Decision Theory (Classification) 主講人:虞台文


Neyman-Pearson Lemma

Consider the above rule with $T$ chosen to give $P_{FA}(\phi) = a$. There is no decision rule $\phi'$ such that $P_{FA}(\phi') \le a$ and $P_H(\phi') > P_H(\phi)$.

Pf, continued) Expanding the nonnegative integral:

$$0 \le \int [\phi(\mathbf{x}) - \phi'(\mathbf{x})]\, [T\, p(\mathbf{x} \mid \omega_2) - p(\mathbf{x} \mid \omega_1)]\, d\mathbf{x} = T\, [P_H(\phi) - P_H(\phi')] - [P_{FA}(\phi) - P_{FA}(\phi')]$$

Since $P_{FA}(\phi) = a \ge P_{FA}(\phi')$, the second bracket is $\ge 0$, so $T\, [P_H(\phi) - P_H(\phi')] \ge 0$ and hence $P_H(\phi) \ge P_H(\phi')$.