Bayesian Networks
Unit 8: Probabilistic Inference over Time

Wang, Yuan-Kai (王元凱), [email protected]
http://www.ykwang.tw
Department of Electrical Engineering, Fu Jen University (輔仁大學電機工程系)
2006~2011

Reference this document as: Wang, Yuan-Kai, "Probabilistic Inference over Time," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
Goal of This Unit
• Know the uncertainty concept in temporal models
• Learn four inference types in temporal models
  – Filtering, Prediction, Smoothing, Most Likely Explanation
• See some temporal models
  – HMM, Kalman/Particle filtering
  – Dynamic Bayesian networks
Related Units
• Background
  – Probabilistic graphical model
  – Exact inference in BN
  – Approximate inference in BN
• Next units
  – HMM
  – Kalman filter
  – Particle filter
  – DBN
Self-Study Reference
• Chapter 15, Sections 15.1-15.2, Artificial Intelligence: A Modern Approach, 2nd ed., by S. Russell & P. Norvig, Prentice Hall, 2003.
Structure of Related Lecture Notes
[Diagram: a problem and its data feed a PGM representation, which supports inference (answering queries) and learning]
• Representation: Unit 5: BN; Unit 9: Hybrid BN; Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
• Inference: Unit 6: Exact inference; Unit 7: Approximate inference; Unit 8: Temporal inference
• Learning (structure learning, parameter learning): Units 16~: MLE, EM
[Example BN in the diagram: B and E are parents of A, A is the parent of J and M, with tables P(B), P(E), P(A|B,E), P(J|A), P(M|A)]
Contents
1. Time and Uncertainty
2. Inference in Temporal Models
3. Various Models
4. References
1. Time and Uncertainty
• What is probabilistic reasoning over time?
  – There is a lot of time-series data
    • Ex: stock data, weather data, radar signals, ...
  – We want to
    • Predict its next data
    • Recover correct values of its current data
    • Recover correct values of its previous data
Example 1 – Stock Data
[Figure: stock data plotted over time]
Example 2 - Visual Tracking
• What is visual tracking?
  – Continuously detect objects in video
  – Time-series data
• What kinds of objects?
  – Face
  – Facial features (eyes, eyebrows, ...)
  – Human body
  – Hand
  – ...
Why Visual Tracking (1/2)
• A simple idea to detect objects in all frames of a video
  – Detect the object in every frame with the same detection method
• Disadvantage
  – Detection in a single frame may be slow
  – Detection in all frames becomes very slow
• So, if you have a very quick detection method, is the simple method OK?
Why Visual Tracking (2/2)
• A better approach to detect objects in all frames of a video
  – Detect objects in the first frame
  – Find objects in succeeding frames with a quick method → tracking
• Goal of visual tracking
  – Fast and accurate detection of objects
Front-View Face Tracking
[Video comparison: single-frame detector vs. temporal detector]
Side-View Face Tracking
[Video comparison: tracking without temporal continuity]
Two Kinds of Approaches
• Neighborhood-based
  – Search the neighborhood of the object's location in the previous frame
• Prediction-based
  – Search the neighborhood of the predicted location in the current frame
Basic Algorithm
• Basic idea of both approaches
  1. Read the first frame
  2. Detect the moving object O
     – Obtain its region of interest (ROI), usually a rectangle or ellipse
  3. Read the next frame
  4. For every candidate ROI Oc
     a) Compare the similarity between O and Oc
     b) If the similarity is high, tracking succeeds; break
Neighborhood-search Tracking
• Basic idea (a code sketch follows below)
  1. Read the first frame
  2. Detect the face O
     – Obtain its region of interest (ROI), usually a rectangle or ellipse
  3. Read the next frame
  4. For every candidate ROI Oc in the neighborhood of O's previous location
     a) Compare the similarity between O and Oc
     b) If the similarity is high, break
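A minimal sketch of the neighborhood-search idea, assuming grayscale frames stored as NumPy arrays and a sum-of-squared-differences (SSD) similarity; the frame representation, ROI size, and search radius are illustrative assumptions, not part of the original notes.

import numpy as np

def ssd(a, b):
    # Sum of squared differences; lower means more similar
    d = a.astype(np.float64) - b.astype(np.float64)
    return np.sum(d * d)

def track_neighborhood(prev_frame, next_frame, top_left, size, radius=10):
    # Search around the previous ROI location for the best-matching ROI.
    # top_left: (row, col) of the ROI in prev_frame; size: (height, width)
    r0, c0 = top_left
    h, w = size
    template = prev_frame[r0:r0 + h, c0:c0 + w]      # object O from the previous frame
    best, best_pos = np.inf, top_left
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = r0 + dr, c0 + dc
            if r < 0 or c < 0 or r + h > next_frame.shape[0] or c + w > next_frame.shape[1]:
                continue
            candidate = next_frame[r:r + h, c:c + w]  # candidate Oc
            score = ssd(template, candidate)
            if score < best:                          # keep the most similar candidate
                best, best_pos = score, (r, c)
    return best_pos, best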
Basic Ideas
[Diagram: face detection on the first frame yields object O; in the next frame, face tracking compares candidates Oc inside a search region around O]
Prediction-based Tracking
• Three steps (a sketch of the loop follows below)
  – Predict the next position of the moving object with a probabilistic model (its parameters)
  – Detect the new position around the predicted position
    • This yields a prediction error
  – Update
    • The correct position
    • The probabilistic model, using the prediction error
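A minimal sketch of the predict-detect-update loop, assuming a simple constant-velocity motion model in place of a full probabilistic model; detect_around stands in for any local detector (for example the neighborhood search above) and is an assumption for illustration.

import numpy as np

def track_with_prediction(frames, detect_around, x0, v0=(0.0, 0.0), alpha=0.5):
    # Predict-detect-update tracking loop (constant-velocity model).
    # frames: iterable of frames
    # detect_around: function(frame, predicted_pos) -> detected position z_t
    # x0, v0: initial position and velocity; alpha: correction gain
    x, v = np.asarray(x0, float), np.asarray(v0, float)
    track = [x.copy()]
    for frame in frames:
        x_pred = x + v                                        # 1) predict next position
        z = np.asarray(detect_around(frame, x_pred), float)   # 2) detect near the prediction
        error = z - x_pred                                    #    prediction error
        x = z                                                 # 3) update the position ...
        v = v + alpha * error                                 #    ... and the motion parameters
        track.append(x.copy())
    return track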
Predict Next Position
[Diagram: in the previous frames, the real position xt and the detected position zt are related by the sensor model P(zt | xt); the probabilistic (transition) model P(xt+1 | xt) predicts the position x⁻t+1 in the current frame]
Detect New Position by LSE
[Diagram: candidate positions in a search region around the predicted position; their squared errors are SE = 1032, 2560, 1968, 104, 2223, ...; the candidate with the least squared error (LSE = 104) becomes the detected position zt+1]
Update
[Diagram: the prediction error x⁻t+1 - zt+1 is used to obtain the corrected position xt+1 and the corrected probabilistic models P'(zt | xt) and P'(xt+1 | xt)]
Accurate Tracking = Smoothing
Example 3 - Robot Localization
• Localization of an AIBO robot in RoboCup
• The robot has to
  – See a landmark
    • Object detection & object recognition
  – Analyze the landmark
    • Calculate the distance & angle between the robot and the landmark
  – Estimate its location
RoboCup Field
[Diagram: the field with landmarks; the robot measures the distance r and the angle to a landmark]
Tracking of Robot
Temporal Patterns
• Deterministic patterns:
  – Traffic light
  – FSM (Finite State Machine)
  – ...
• Non-deterministic patterns:
  – Weather
  – Speech
  – Tracking
  – ...
How to Do It?
• What do we want?
  – Prediction: predict the next data
  – Filtering: recover correct values of the current data
  – Smoothing: recover correct values of the previous data
• How do we achieve it?
  – Statistically model the data
Statistical Modeling
[Plot: a set of time-related data (x, y) fitted by the model y = 1.3x + 96]
• The fitted model can then be used to predict, filter, and smooth (see the sketch below)
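A minimal sketch of fitting a linear model to time-related data and using it to predict the next value, assuming NumPy; the sample numbers are made up for illustration and are not the data behind the y = 1.3x + 96 example.

import numpy as np

# Hypothetical time-related data (x = time index, y = measurement)
x = np.array([0, 1, 2, 3, 4, 5], dtype=float)
y = np.array([96.5, 97.1, 98.9, 99.7, 101.4, 102.3])

# Fit a line y = a*x + b by least squares (the "statistical model")
a, b = np.polyfit(x, y, deg=1)

# Predict: the expected value at the next time step
y_next = a * 6 + b

# Filter/smooth: replace noisy measurements by the model's values
y_smoothed = a * x + b

print(f"model: y = {a:.2f}x + {b:.2f}, prediction at t=6: {y_next:.2f}")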
State
• There is a set of time-related data
• We call each datum
  – A state of the system, or
  – A state of the object
• Example states s at successive times:
  – t = 0: (50, 100), (50, 180), (50, 160)
  – t = 1: (49, 98), (50, 178), (49, 158)
  – t = 2: (50, 96), (50, 176), (50, 156)
  – t = 3: (48, 94), (47, 173), (48, 154)
Observable vs. Unobservable States
• Observable state
  – Measurable values
    • Sensor values, feature values
  – Ex: localization/visual tracking
    • Measured position, measured speed
  – Ex: facial expression recognition
    • Eyebrow up, eyebrow down, ...
• Unobservable state
  – Real state of the system/object
  – Ex: localization/visual tracking
    • Real position, real speed
  – Ex: facial expression recognition
    • Smile, cry, anger, ...
Observable vs. Unobservable States (Math)
• Let
  – Xt = the set of unobservable state variables at time t
  – Et = the set of observable (evidence) variables at time t
• Usually we observe
  – E0, E1, ..., Et: time-related data
• But we want to derive
  – X0, X1, ..., Xt
• Notation: Xa:b = Xa, Xa+1, ..., Xb
Markov Chain
• A Markov chain is an assumption
  – A state depends on previous states: Xt depends on X0:t-1
  – Xt+1 does not influence Xt
• Markov process
  – If we assume that a set of data obeys the Markov assumption,
  – We say the data form a Markov process
Markov Process
• First-order Markov process
  – P(Xt | X0:t-1) = P(Xt | Xt-1)
• Second-order Markov process
  – P(Xt | X0:t-1) = P(Xt | Xt-2, Xt-1)
• Higher-order Markov processes ...
  – Complicated, seldom used
Transition Model & Sensor Model
• Transition model
  – P(Xt | Xt-1) for a first-order process
  – P(Xt | Xt-2, Xt-1) for a second-order process
• Sensor model
  – We usually assume the evidence variables (sensor values) at time t, Et, depend only on the current state Xt
  – P(Et | X0:t, E0:t-1) = P(Et | Xt)
  – It is also called the observation model
Diagram of Transition & Sensor Models for a 1st-Order Markov Process
• P(Xt | Xt-1): transition of unobservable states (Xt-1 → Xt)
• P(Et | Xt): causal relationship between observable & unobservable states (Xt → Et)
[Diagram: the chain Xt-1 → Xt → Xt+1 → Xt+2 with observations Et, Et+1, Et+2; a special Bayesian network]
An "Umbrella World" Example (1/2)• A security guard is always at a secret
underground room, without going out• He wants to know if it is raining today• But he can not observe the outside world• He can only see each morning the
director coming in with, or without, an umbrella
• Rain is the unobservable state• Umbrella is the observable state
(sensors values)
An "Umbrella World" Example (2/2)
• For each day t, the set Et contains a single evidence Ut (whether the umbrella appears)
• The set Xt contains a single state variable Rt(whether it is raining)
Stationary Process
• The transition model P(Xt | Xt-1) and the sensor model P(Et | Xt) are fixed for all time t
• Stationary process assumption
  – Reduces the complexity of the inference algorithm
Inference for the Markov Process (1/2)
• A Bayesian net with two kinds of random variables
  – X: X0, X1, ..., Xt
  – E: E1, ..., Et
• We know that P(X0, X1, ..., Xt, E1, ..., Et), the FJD, can answer any query
  – But it can be reduced:
    P(X0, X1, ..., Xt, E1, ..., Et) = P(X0) ∏i=1..t P(Xi | Xi-1) P(Ei | Xi)
[Diagram: chain X0 → X1 → X2 → ... → Xt with observations E1, E2, ..., Et]
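A minimal sketch of evaluating this factored joint for one concrete assignment, assuming a discrete model stored as a prior vector, a transition matrix T[i][j] = P(Xt+1=Sj | Xt=Si), and a sensor matrix O[i][k] = P(Et=vk | Xt=Si); the particular numbers are placeholders, not values from the notes.

import numpy as np

prior = np.array([0.5, 0.5])                    # P(X0)
T = np.array([[0.7, 0.3],                       # transition CPT  P(X_{t+1} | X_t)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],                       # sensor CPT      P(E_t | X_t)
              [0.2, 0.8]])

def joint_probability(states, observations):
    # P(X0=states[0], ..., Xt=states[t], E1=obs[0], ..., Et=obs[t-1])
    p = prior[states[0]]
    for t in range(1, len(states)):
        p *= T[states[t - 1], states[t]]        # P(X_t | X_{t-1})
        p *= O[states[t], observations[t - 1]]  # P(E_t | X_t)
    return p

# Example: X0..X2 = (0, 0, 1) with evidence E1, E2 = (0, 1)
print(joint_probability([0, 0, 1], [0, 1]))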
Inference for the Markov Process (2/2)
• We need three PDFs
  – P(X0), P(Xi | Xi-1), P(Ei | Xi)
• For discrete R.V.s, we need
  – 1 prior probability table P(X0)
  – 2 CPTs
    • CPT for the transition model: P(Xi | Xi-1)
    • CPT for the sensor model: P(Ei | Xi)
• For continuous R.V.s, we need
  – A Gaussian PDF, a Gaussian mixture, ...
• Here we consider only discrete R.V.s
Sequence Diagram (1/4)
• What is the P(X0) probability table?
• Suppose the unobservable variable X is a discrete R.V.
  – X = S1, S2, S3, ..., Si, ...
• P(X0) is the probability of X = Si at t = 0

  X     P(X0)
  S0    0.2
  S1    0.1
  ...   ...
  Si    0.3
Sequence Diagram (2/4)
• What is the P(Xt | Xt-1) conditional probability table (the transition probabilities)?

  Transition probability P(Xt+1 | Xt)   (rows: Xt, columns: Xt+1)
        S1     S2     ...   Si
  S1    0.1    0.2    ...   0.05
  S2    0.2    0.15   ...   0.18
  ...   ...    ...    ...   ...
  Si    0.31   0.03   ...   0.22
Sequence Diagram (3/4)
• What is the P(Et | Xt) conditional probability table (the observation probabilities)?

  Observation probability P(Et | Xt)   (rows: Xt, columns: Et)
        v1     v2     ...   vj
  S1    0.1    0.2    ...   0.05
  S2    0.2    0.15   ...   0.18
  ...   ...    ...    ...   ...
  Si    0.31   0.03   ...   0.22
Sequence Diagram (4/4)
[Trellis diagram: at each time step t = 1..7 the state variable can take values S1, S2, S3; one example path through the trellis:]

  t  =  1    2    3    4    5    6    7
  Xt =  S3   S3   S1   S1   S3   S2   S3
  Et =  v2   v4   v1   v1   v2   v3   v4
Short Summary
• If we have the three PDFs/tables
  – P(X0), P(Xi | Xi-1), P(Ei | Xi)
• We can answer any query
  – P(X1, X3 | E2, E4), P(X1, E5 | X2, X4), ...
• Do we need to ask many kinds of queries?
• Or do we have some frequently asked queries?
2. Inference in Temporal Models
• Four common query tasks in temporal inference/reasoning
  – Filtering: P(Xt | e1:t) = P(Xt | E1:t = e1:t)
    • Estimate the correct current state
  – Prediction: P(Xt+k | e1:t) for k > 0
    • Predict possible next states
  – Smoothing: P(Xk | e1:t) for 1 ≤ k < t
    • Better estimate of past states
  – Most likely explanation: arg maxX1:t P(X1:t | e1:t)
Subsections
• 2.1 Graphical models of the 4 inferences
• 2.2 Mathematical formulas of the 4 inferences
2.1 Graphical Models of the 4 Inferences
• Use the sequence diagram to illustrate
  – Filtering
  – Prediction
  – Smoothing
  – Most likely explanation
Graphical Models - Filtering
• P(Xt | e1:t)
[Diagram: chain X0 → X1 → ... → Xt with evidence E1, ..., Et; the query node is Xt]
[Figure: a filtering example for the 2D position of a robot/WLAN card]
Graphical Models - Prediction
• P(Xt+k | e1:t) for k > 0
[Diagram: chain X0 → X1 → ... → Xt → Xt+1 with evidence E1, ..., Et; the query node is Xt+1, i.e. the case k = 1]
Graphical Models – Smoothing (1/3)
• P(Xk | e1:t) for 1 ≤ k < t
[Diagram: chain X0 → X1 → ... → Xk → ... → Xt with evidence E1, ..., Ek, ..., Et; the query node is Xk]
Graphical Models – Smoothing (2/3)
Graphical Models – Smoothing (3/3): Smoothing vs. Filtering
Graphical Models - Most Likely Explanation (1/2)
• arg maxX1:t P(X1:t | e1:t)
[Diagram: chain X0 → X1 → ... → Xt with evidence E1, ..., Et; the query is over the whole state sequence X1:t]
Graphical Models - Most Likely Explanation (2/2)
[Trellis diagram: states S1, S2, S3 at each time step t = 1..7; the most likely explanation is the single best path through the trellis given the evidence]

  t  =  1    2    3    4    5    6    7
  Et =  v2   v4   v1   v1   v2   v3   v4
2.2 Mathematical Formulas of the 4 Inferences
• Derive the mathematical formulas of
  – Prediction
  – Filtering
  – Smoothing
  – Most likely explanation
Prediction (1/3)
• P(Xt+1 | e1:t): one-step prediction as an example
• Directly from the FJD (∝ means "up to a normalizing constant"):
  P(Xt+1 | e1:t) ∝ ΣX0,...,Xt P(X0) P(Xt+1 | Xt) ∏i=1..t P(Xi | Xi-1) P(ei | Xi)
• But a more efficient formula can be derived
[Diagram: chain X0 → X1 → ... → Xt → Xt+1 with evidence E1, ..., Et]
Prediction (2/3)
• A new formula for P(Xt+1 | e1:t)
  – Xt+1 has no direct relationship to e1, e2, ..., et
  – But both are related through xt
  – If X is a Boolean R.V., P(Xt+1 | e1:t) = <P(Xt+1=true | e1:t), P(Xt+1=false | e1:t)>
• P(Xt+1 | e1:t)
  = Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t)      (condition on Xt)
  = Σxt P(Xt+1 | xt) P(xt | e1:t)            (by the transition model)
  – The first factor is the CPT of the transition model; the second is the filtering result (a sketch follows below)
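A minimal sketch of this one-step prediction as a matrix-vector product, assuming the transition CPT is stored as a matrix T[i][j] = P(Xt+1=Sj | Xt=Si) and the filtered distribution P(Xt | e1:t) as a vector; the numbers are placeholders borrowed from the umbrella example.

import numpy as np

T = np.array([[0.7, 0.3],          # T[i, j] = P(X_{t+1}=Sj | X_t=Si)
              [0.3, 0.7]])

def predict_one_step(filtered):
    # P(X_{t+1} | e_{1:t}) = sum_xt P(X_{t+1} | xt) P(xt | e_{1:t})
    return T.T @ filtered          # entry j: sum_i T[i, j] * filtered[i]

f = np.array([0.818, 0.182])       # a filtered distribution P(Xt | e1:t)
print(predict_one_step(f))         # the predicted distribution P(Xt+1 | e1:t)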
Prediction (3/3)
[Trellis diagram: states S1, S2, S3, ..., SN at time steps t = 1, 2, 3, ..., t, t+1, with evidence Et = v2, v4, v1, ..., v3; each state at time t carries its filtered probability P(Si | e1:t), and the arrows into time t+1 carry the transition probabilities]

  P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t)
  For Xt+1 = S2:
  P(Xt+1=S2 | e1:t) = Σi P(S2 | Si) P(Si | e1:t)
                    = P(S2|S1)P(S1|e1:t) + P(S2|S2)P(S2|e1:t) + P(S2|S3)P(S3|e1:t) + ...
Filtering (1/3)
• P(Xt+1 | e1:t+1)   (or P(Xt | e1:t))
• Directly from the FJD:
  P(Xt+1 | e1:t+1) ∝ ΣX0,...,Xt P(X0) ∏i=1..t+1 P(Xi | Xi-1) P(ei | Xi)
• But a more efficient formula can be derived
[Diagram: chain X0 → X1 → X2 → ... → Xt+1 with evidence E1, E2, ..., Et+1]
Filtering (2/3)
• P(Xt+1 | e1:t+1) = P(Xt+1 | e1:t, et+1)
  = α P(et+1 | Xt+1, e1:t) P(Xt+1 | e1:t)             (by Bayes' rule; α is a normalizing constant)
  = α P(et+1 | Xt+1) P(Xt+1 | e1:t)                   (by the sensor model)
  = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt, e1:t) P(xt | e1:t)
  = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)    (by the transition model)
  – The first factor is the CPT of the sensor model; the sum is the one-step prediction
• We derive a recursive algorithm
  – P(Xt+1 | e1:t+1) can be calculated from P(Xt | e1:t)
  – There is a function f such that P(Xt+1 | e1:t+1) = f(et+1, P(Xt | e1:t))
Filtering (3/3)
[Trellis diagram: states S1, S2, S3, ..., SN at time steps t = 1, 2, ..., t, t+1; evidence Et = v2, v4, ..., v3 and et+1 = v4; the filtered probabilities P(Si | e1:t) at time t are propagated along the transition arrows and then weighted by the sensor probability P(et+1 | Xt+1)]

  P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
  For Xt+1 = S2 and et+1 = v4:
  P(Xt+1=S2 | e1:t+1) = α P(v4 | S2) P(Xt+1=S2 | e1:t)
Forward Variable
• P(Xt+1 | e1:t) = Σxt P(Xt+1 | xt) P(xt | e1:t)
• P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
• They are a kind of recursive function
• Interesting points
  – Both the prediction and the filtering of Xt+1 need P(Xt | e1:t)
  – We define P(Xt | e1:t) as the forward variable f1:t
  – i.e., f1:t = P(Xt | e1:t), f1:t(Si) = P(Xt=Si | e1:t)
Forward Procedure
• P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) P(Xt+1 | e1:t)
                  = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
• The filtering process is rewritten as f1:t+1 = Forward(f1:t, et+1)
  – A forward procedure (algorithm), sketched in code below:
    Forward(f1:t, et+1) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
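A minimal sketch of one forward (filtering) step, assuming the transition CPT T[i][j] = P(Xt+1=Sj | Xt=Si) and the sensor CPT O[i][k] = P(E=vk | X=Si) as NumPy arrays; the final normalization plays the role of α.

import numpy as np

def forward_step(f, e, T, O):
    # One filtering step: f_{1:t+1} = Forward(f_{1:t}, e_{t+1})
    # f: forward vector, f[i] = P(X_t = S_i | e_{1:t})
    # e: index of the new observation e_{t+1}
    predicted = T.T @ f                       # sum_xt P(X_{t+1} | xt) P(xt | e_{1:t})
    unnormalized = O[:, e] * predicted        # multiply by P(e_{t+1} | X_{t+1})
    return unnormalized / unnormalized.sum()  # normalize (alpha)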
Filtering Example (1/4)
• For the umbrella example
• Transition model: P(Rt | Rt-1)
• Sensor model: P(Ut | Rt)
  – P(ut | rt) = 0.9, P(¬ut | rt) = 0.1, P(ut | ¬rt) = 0.2, P(¬ut | ¬rt) = 0.8
  – P(Ut | rt) = <P(ut | rt), P(¬ut | rt)> = <0.9, 0.1>
  – P(Ut | ¬rt) = <P(ut | ¬rt), P(¬ut | ¬rt)> = <0.2, 0.8>
  – What is P(ut | Rt)?
Filtering Example (2/4)
• Assume the guard believes that P(R0) = <0.5, 0.5> = <P(r0), P(¬r0)>
  – The rain probability before the observation sequence begins
• Now we have the observation sequence: Umbrella1 = true, Umbrella2 = true
• We will use the filtering process to find the rain probabilities P(R1 | u1) and P(R2 | u1, u2)
[Diagram: Rain1 → Rain2 with observations Umbrella1 = true, Umbrella2 = true]
Filtering Example (3/4)
• Transition model: P(rt | rt-1) = 0.7, P(rt | ¬rt-1) = 0.3
• Sensor model: P(ut | rt) = 0.9, P(ut | ¬rt) = 0.2
• Predict day 1:
  P(R1) = Σr0 P(R1 | r0) P(r0) = <0.7, 0.3> × 0.5 + <0.3, 0.7> × 0.5 = <0.5, 0.5>
• Update with u1:
  P(R1 | u1) = α P(u1 | R1) P(R1) = α <0.9, 0.2> <0.5, 0.5> = α <0.45, 0.1> ≈ <0.818, 0.182>
[Diagram: Rain0 → Rain1 → Rain2 with Umbrella1 = true, Umbrella2 = true]
Filtering Example (4/4)
• Predict day 2:
  P(R2 | u1) = Σr1 P(R2 | r1) P(r1 | u1) = <0.7, 0.3> × 0.818 + <0.3, 0.7> × 0.182 = <0.627, 0.373>
• Update with u2:
  P(R2 | u1, u2) = α P(u2 | R2) P(R2 | u1) = α <0.9, 0.2> <0.627, 0.373> = α <0.565, 0.075> ≈ <0.883, 0.117>
• A small numeric check follows below
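A short check of these two filtering steps, reusing the forward_step sketch introduced above (forward_step is an assumed helper from that earlier sketch, not a function from the notes).

import numpy as np

T = np.array([[0.7, 0.3],   # P(R_{t+1} | R_t)
              [0.3, 0.7]])
O = np.array([[0.9, 0.1],   # P(U_t | R_t): columns = umbrella, no umbrella
              [0.2, 0.8]])
UMBRELLA = 0                # column index for "umbrella observed"

def forward_step(f, e, T, O):
    unnormalized = O[:, e] * (T.T @ f)
    return unnormalized / unnormalized.sum()

f = np.array([0.5, 0.5])             # P(R0)
f = forward_step(f, UMBRELLA, T, O)  # P(R1 | u1)      -> about [0.818, 0.182]
f = forward_step(f, UMBRELLA, T, O)  # P(R2 | u1, u2)  -> about [0.883, 0.117]
print(f)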
Smoothing (1/2)
• P(Xk | e1:t) for 1 ≤ k < t
  – Divide e1:t into e1:k and ek+1:t
  – P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
  – = α P(Xk | e1:k) P(ek+1:t | Xk, e1:k)   (by Bayes' rule, conditioning on e1:k)
  – = α P(Xk | e1:k) P(ek+1:t | Xk)          (by conditional independence)
Smoothing (2/2)
[Trellis diagram: states S1, S2, S3, ..., SN at time steps t = 1, ..., k-1, k, k+1, ..., t with evidence Et = v2, ..., v4, v3, v1, ..., v4; the query P(Xk=S2 | e1:t) combines evidence from the past (1..k, the forward pass) and from the future (k+1..t, the backward pass)]
Backward Variable
• P(ek+1:t | Xk)
  = Σxk+1 P(ek+1:t | Xk, xk+1) P(xk+1 | Xk)
  = Σxk+1 P(ek+1:t | xk+1) P(xk+1 | Xk)
  = Σxk+1 P(ek+1, ek+2:t | xk+1) P(xk+1 | Xk)
  = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
• This is also a recursive formula
• We define a backward variable bk+1:t
  – bk+1:t = P(ek+1:t | Xk)
Backward Procedure (1/2)
• P(ek+1:t | Xk) = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
• The formula is rewritten as bk+1:t = Backward(bk+2:t, ek+1), sketched in code below
Backward Procedure (2/2)
[Trellis diagram: states S1, S2, ..., SN at time steps t = 1, ..., k, k+1, ..., t with evidence Et = v2, ..., v4, v1, ..., v4; the backward message flows from t back toward k]

  P(ek+1:t | Xk) = Σxk+1 P(ek+1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | Xk)
  For Xk = S2 and ek+1 = v1:
  P(ek+1:t | xk=S2) = Σxk+1 P(v1 | xk+1) P(ek+2:t | xk+1) P(xk+1 | S2)
                    = P(v1|S1)P(ek+2:t|S1)P(S1|S2) + P(v1|S2)P(ek+2:t|S2)P(S2|S2) + ... + P(v1|SN)P(ek+2:t|SN)P(SN|S2)
The Smoothing Formula
• P(Xk | e1:t) = P(Xk | e1:k, ek+1:t)
  = α P(Xk | e1:k) P(ek+1:t | Xk, e1:k)
  = α P(Xk | e1:k) P(ek+1:t | Xk)
  = α f1:k × bk+1:t   (pointwise product)
• Time complexity
  – Both the forward and backward recursions take a constant amount of time per step
  – The complexity of smoothing P(Xk | e1:t) with evidence e1:t is O(t)
Smoothing Example (1/3)
• For the umbrella example: P(R1 | u1, u2)
  – Compute the smoothed estimate of the probability of rain at t = 1,
  – Given the umbrella observations on days 1 & 2
• Transition model: P(rt | rt-1) = 0.7, P(rt | ¬rt-1) = 0.3
• Sensor model: P(ut | rt) = 0.9, P(ut | ¬rt) = 0.2
[Diagram: Rain0 → Rain1 → Rain2 with Umbrella1 = true, Umbrella2 = true]
Smoothing Example (2/3)
• P(R1 | u1, u2) = α P(R1 | u1) P(u2 | R1)
  – P(R1 | u1) = <0.818, 0.182>   (from the filtering example)
  – P(u2 | R1) = Σr2 P(u2 | r2) P(e3:2 | r2) P(r2 | R1), where the empty-future term P(e3:2 | r2) = 1
               = (0.9 × 1 × <0.7, 0.3>) + (0.2 × 1 × <0.3, 0.7>) = <0.69, 0.41>
• P(R1 | u1, u2) = α <0.818, 0.182> <0.69, 0.41> ≈ <0.883, 0.117>
• Note: P(R1 | u1) = <0.818, 0.182>
• With one more observation u2, the probability of r1 increases: this is smoothing (a small check follows below)
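A short check of this smoothed estimate, reusing the assumed forward_step and backward_step sketches from above.

import numpy as np

T = np.array([[0.7, 0.3], [0.3, 0.7]])   # P(R_{t+1} | R_t)
O = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(U_t | R_t)
UMBRELLA = 0

def forward_step(f, e):
    u = O[:, e] * (T.T @ f)
    return u / u.sum()

def backward_step(b, e):
    return T @ (O[:, e] * b)

f1 = forward_step(np.array([0.5, 0.5]), UMBRELLA)   # P(R1 | u1)  ~ [0.818, 0.182]
b = backward_step(np.ones(2), UMBRELLA)             # P(u2 | R1)  ~ [0.69, 0.41]
smoothed = f1 * b
print(smoothed / smoothed.sum())                    # P(R1 | u1, u2) ~ [0.883, 0.117]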
Smoothing Example (3/3)
Most Likely Explanation (1/2)
• Smoothing P(Xk | e1:t) considers only one past state at time step k
• The most likely explanation, arg maxX1:t P(X1:t | e1:t)
  – Considers all past states, and
  – Chooses the best state sequence
[Diagram: chain X0 → X1 → ... → Xt with evidence E1, ..., Et]
Most Likely Explanation (2/2)
• We will discuss 3 algorithms
  – Algorithm 1:
    • Very simple, directly uses smoothing
    • Time complexity O(t²)
  – Algorithm 2 (forward-backward algorithm):
    • Improved use of smoothing
    • Time complexity O(t)
    • But the result may not be the best state sequence
  – Algorithm 3 (Viterbi algorithm):
    • Time complexity O(t)
Algorithm 1
• The simplest idea for this problem
  – Call smoothing t times, smoothing one state each time
  – For (i = 0; i < t; i++) compute P(Xi | e1:t)
• Drawback
  – Time complexity of O(t²): too slow
• Improvement
  – Apply dynamic programming to reduce the complexity to O(t)
Algorithm 2 (1/2)
• Forward-backward algorithm
  – First, record the results of forward filtering over the whole sequence, from 1 to t
  – Then, run the backward recursion from t down to 1, and
    • Compute the smoothed estimate at each time step k from bk+1:t and the stored f1:k
Algorithm 2 (2/2)
• Using the procedures from the previous slides (a code sketch follows below):
  – fv[i] = f1:i = P(Xi | e1:i), the stored forward messages
  – Forward procedure: f1:t+1 = Forward(f1:t, et+1)
  – Backward procedure: bk+1:t = Backward(bk+2:t, ek+1)
  – Smoothing: P(Xk | e1:t) = α f1:k × bk+1:t
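A minimal sketch of the whole forward-backward algorithm, built on the same assumed T, O, prior representation as the earlier sketches; it returns the smoothed distribution P(Xk | e1:t) for every k.

import numpy as np

def forward_backward(evidence, prior, T, O):
    # Smoothed estimates P(X_k | e_{1:t}) for k = 1..t
    # evidence: list of observation indices e_1..e_t; prior: P(X_0)
    fv, f = [], prior
    for e in evidence:                       # forward pass: store fv[k] = P(X_k | e_{1:k})
        u = O[:, e] * (T.T @ f)
        f = u / u.sum()
        fv.append(f)
    smoothed = [None] * len(evidence)
    b = np.ones(len(prior))                  # backward pass: b = P(e_{k+1:t} | X_k)
    for k in range(len(evidence) - 1, -1, -1):
        s = fv[k] * b
        smoothed[k] = s / s.sum()
        b = T @ (O[:, evidence[k]] * b)
    return smoothed

# Umbrella example: two days of "umbrella seen"
print(forward_backward([0, 0], np.array([0.5, 0.5]),
                       np.array([[0.7, 0.3], [0.3, 0.7]]),
                       np.array([[0.9, 0.1], [0.2, 0.8]])))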
However (1/2)
• For the umbrella example, suppose there is an observation sequence e1:t = [true, true, false, true, true] for the umbrella's appearance
• What is the weather sequence most likely to explain this?
  – Does the absence of the umbrella on day 3 mean that
    • Day 3 wasn't raining, or
    • The director forgot to bring it?
  – If day 3 wasn't raining, day 4 may not be raining either, but the director brought the umbrella just in case
However (2/2)
• The forward-backward algorithm uses smoothing at each single time step
• But to find the most likely sequence, we must consider joint probabilities over all time steps
• To consider the joint probability of a sequence, we need to consider paths
Path
• A path is a possible state sequence
  – For the 5-day umbrella sequence there are 2⁵ paths
  – Each path (sequence) has a probability
  – Only one path has the maximum probability
Probability of a Path
• arg maxX1:t P(X1:t | e1:t)
• The probability of a path is
  P(X1:t | e1:t) ∝ ∏i=1..t P(Xi | Xi-1) P(ei | Xi)
Recursive View
• An important idea for finding arg maxX1:t P(X1:t | e1:t)
  – A path that maximizes P(X1:t-1 | e1:t-1) is reused (as a prefix) when finding the path that maximizes P(X1:t | e1:t)
  – This optimal substructure is what enables dynamic programming
The Viterbi Example
Algorithm 3
• Viterbi algorithm (a code sketch follows below)
  maxx1..xt P(x1, ..., xt, Xt+1 | e1:t+1)
    = α P(et+1 | Xt+1) maxxt [ P(Xt+1 | xt) maxx1..xt-1 P(x1, ..., xt-1, xt | e1:t) ]
• It is similar to the filtering algorithm
  P(Xt+1 | e1:t+1) = α P(et+1 | Xt+1) Σxt P(Xt+1 | xt) P(xt | e1:t)
  – The summation over xt is simply replaced by a maximization
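A minimal Viterbi sketch under the same assumed T, O, prior representation as the earlier sketches; it returns the most likely state sequence for a given evidence sequence. Initializing the time-1 message by summing out X0 mirrors the P(R1) step in the filtering example and is an assumption of this sketch.

import numpy as np

def viterbi(evidence, prior, T, O):
    # Most likely state sequence arg max_{x_{1:t}} P(x_{1:t} | e_{1:t})
    # evidence: observation indices e_1..e_t; prior: P(X_0)
    t, n = len(evidence), len(prior)
    m = O[:, evidence[0]] * (T.T @ prior)   # message for X_1 (X_0 summed out)
    back = np.zeros((t, n), dtype=int)      # back-pointers
    for k in range(1, t):
        scores = m[:, None] * T             # scores[i, j] = m[i] * P(X_k=Sj | X_{k-1}=Si)
        back[k] = scores.argmax(axis=0)     # best predecessor for each state j
        m = O[:, evidence[k]] * scores.max(axis=0)   # max instead of sum
    path = [int(m.argmax())]                # best final state
    for k in range(t - 1, 0, -1):           # follow back-pointers
        path.append(int(back[k][path[-1]]))
    return list(reversed(path))

# Umbrella example [true, true, false, true, true]; 0 = umbrella, 1 = no umbrella
print(viterbi([0, 0, 1, 0, 0], np.array([0.5, 0.5]),
              np.array([[0.7, 0.3], [0.3, 0.7]]),
              np.array([[0.9, 0.1], [0.2, 0.8]])))
# -> [0, 0, 1, 0, 0], i.e. rain on every day except day 3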
3. Various Models
• Hidden Markov Models
• Kalman Filter
• Particle Filter
• Dynamic Bayesian Networks
Hidden Markov Model (1/2)
[Diagram: hidden states X1 → X2 → X3 (e.g. real location) emitting observations Y1, Y2, Y3 (e.g. detected location)]
• As a Bayesian network, the joint factors as
  P(x1, x2, ..., xn) = ∏i=1..n P(xi | pa(xi))
Hidden Markov Model (2/2)
[Diagram: the same chain X1 → X2 → X3 with observations Y1, Y2, Y3]
• Parameters (with parameter tying across time steps):
  – Transition matrix A
  – Observation matrix B
  – Initial state distribution
Kalman Filtering
[Diagram: the same chain X1 → X2 → X3 with observations Y1, Y2, Y3]
• The same graphical structure as the HMM
• But
  – In an HMM, Xi and Yi are discrete (CPTs)
  – In a Kalman filter, Xi and Yi are continuous
Particle Filtering
• TBU
Dynamic Bayesian Network (DBN)
• TBU
4. References
• Chapter 15, Sections 15.1-15.2, Artificial Intelligence: A Modern Approach, 2nd ed., by S. Russell & P. Norvig, Prentice Hall, 2003.