
Page 1:

Foundations of Statistical NLP

Chapter 9. Markov Models

한 기 덕

Page 2:

Contents

Introduction
Markov Models
Hidden Markov Models
– Why use HMMs
– General form of an HMM
– The Three Fundamental Questions for HMMs
Fundamental Questions for HMMs
Implementation, Properties, and Variants

Page 3:

Introduction

Markov Model
– Markov processes/chains/models were first developed by Andrei A. Markov.
– First linguistic use: modeling the letter sequences in Russian literature (1913).
– Current use: a general statistical tool.

VMM (Visible Markov Model)
– Words in sentences depend on their syntax; the state sequence is directly observable.

HMM (Hidden Markov Model)
– Operates at a higher level of abstraction by postulating additional "hidden" structure.

Page 4:

Markov Models

Markov assumption
– Future elements of the sequence are independent of past elements, given the present element.

Limited Horizon
– X_1, …, X_T: a sequence of random variables
– s_1, …, s_N: the state space

Time invariant (stationary)
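Written out (the slide's formulas did not survive extraction; this is the standard formulation, reconstructed in LaTeX), the two properties are:

    P(X_{t+1} = s_k \mid X_1, \ldots, X_t) = P(X_{t+1} = s_k \mid X_t)  \quad \text{(limited horizon)}
    P(X_{t+1} = s_k \mid X_t) = P(X_2 = s_k \mid X_1)  \quad \text{(time invariant)}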

Page 5:

Markov Models (Cont’)

Notation
– the stochastic transition matrix A
– the probabilities of different initial states, Π
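In LaTeX (reconstructed; the slide's formulas were lost, but these are the standard definitions):

    a_{ij} = P(X_{t+1} = s_j \mid X_t = s_i), \qquad a_{ij} \ge 0, \quad \sum_{j=1}^{N} a_{ij} = 1
    \pi_i = P(X_1 = s_i), \qquad \sum_{i=1}^{N} \pi_i = 1
    P(X_1, \ldots, X_T) = \pi_{X_1} \prod_{t=1}^{T-1} a_{X_t X_{t+1}}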

Application: linear sequences of events
– modeling valid phone sequences in speech recognition
– modeling sequences of speech acts in dialog systems

Page 6:

Markov Chain

– circles: states and their names
– arrows connecting states: possible transitions
– arc labels: the probability of each transition

Page 7:

Visible Markov Model

We know what states the machine is passing through.

mth order Markov model
– For n ≥ 3, an n-gram model violates the Limited Horizon condition.
– Still, any n-gram model can be reformulated as a visible Markov model by simply encoding the (n-1)-gram history in the state, as sketched below.
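A minimal Python sketch of that encoding (the corpus and variable names are invented for illustration): a trigram model becomes a first-order chain whose states are the two preceding words.

    from collections import defaultdict

    # A trigram model recast as a first-order (visible) Markov chain:
    # each state is the pair of the two preceding words, so the
    # Limited Horizon condition holds over the pair-states.
    corpus = "the cat sat on the mat the cat ran".split()

    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
        counts[(w1, w2)][w3] += 1   # state (w1, w2) --w3--> state (w2, w3)

    # Maximum likelihood transition probabilities over pair-states.
    trans = {state: {w: c / sum(nxt.values()) for w, c in nxt.items()}
             for state, nxt in counts.items()}

    print(trans[("the", "cat")])    # {'sat': 0.5, 'ran': 0.5}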

Page 8:

Hidden Markov Model

We don’t know the state sequence that the model passes through, only some probabilistic function of it.

Example 1: the crazy soft drink machine
– two states: cola preferring (CP), iced tea preferring (IP)
– VMM: the machine would always put out a cola in state CP
– HMM: each state has emission probabilities
– the output probability is conditioned on the "from" state
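For reference, the transition and emission probabilities used in the textbook's version of this example (the slide's tables were lost in extraction):

              to CP   to IP
    from CP    0.7     0.3
    from IP    0.5     0.5

              cola   ice_t   lem
    in CP      0.6    0.1    0.3
    in IP      0.1    0.7    0.2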

Page 9:

Crazy soft drink machine

Problem
– What is the probability of seeing the output sequence {lem, ice_t} if the machine always starts off in the cola-preferring state?

Page 10:

Crazy soft drink machine (Cont’)
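The slide's worked solution was an image; with the probabilities above, the calculation runs as follows, summing over the hidden state at the second step:

    P(\text{lem}, \text{ice\_t}) = P(\text{lem} \mid CP)\,[\,a_{CP,CP}\, P(\text{ice\_t} \mid CP) + a_{CP,IP}\, P(\text{ice\_t} \mid IP)\,]
                                 = 0.3 \times (0.7 \times 0.1 + 0.3 \times 0.7) = 0.3 \times 0.28 = 0.084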

Page 11:

Why use HMMs?

Underlying events probabilistically generate surface events
– e.g., hidden parts of speech generating the observable words in a text

Linear interpolation of n-gram models (see the formula below)

Hidden state
– the choice of whether to use the unigram, bigram, or trigram probabilities

Two keys
– The conversion works by adding epsilon (non-emitting) transitions.
– The interpolation parameters appear on separate arcs, but they are tied; we don't adjust them separately.
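The interpolated estimate referred to above, reconstructed in LaTeX from the chapter's formulation:

    P_{li}(w_n \mid w_{n-2}, w_{n-1}) = \lambda_1 P_1(w_n) + \lambda_2 P_2(w_n \mid w_{n-1}) + \lambda_3 P_3(w_n \mid w_{n-2}, w_{n-1}), \qquad \sum_i \lambda_i = 1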

Page 12:

Page 13:

Notation

S = {s_1, …, s_N}: set of states
K = {k_1, …, k_M}: output alphabet
Π = {π_i}, i ∈ S: initial state probabilities
A = {a_ij}, i, j ∈ S: state transition probabilities
B = {b_ijk}, i, j ∈ S, k ∈ K: symbol emission probabilities
X = (X_1, …, X_{T+1}): state sequence
O = (o_1, …, o_T): output sequence
μ = (A, B, Π): the complete model

Page 14:

General form of an HMM

Arc-emission HMM
– the symbol emitted at time t depends on both the state at time t and the state at time t+1

State-emission HMM (e.g., the crazy drink machine)
– the symbol emitted at time t depends just on the state at time t
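In symbols (reconstructed in LaTeX):

    \text{arc-emission: } b_{ijk} = P(o_t = k \mid X_t = s_i, X_{t+1} = s_j)
    \text{state-emission: } b_{ik} = P(o_t = k \mid X_t = s_i)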

[Figure 9.4: a program for a Markov process]
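The figure's program was not preserved; a minimal Python rendering of such a generator, assuming dictionary-based parameters (the function and parameter names are mine, not the slide's):

    import random

    def run_markov_process(pi, A, B, states, symbols):
        """Generate observations from an arc-emission HMM, one symbol per step."""
        # t := 1; choose the initial state X_1 = i with probability pi_i
        x = random.choices(states, weights=[pi[s] for s in states])[0]
        while True:
            # move from state i to state j with probability a_ij
            nxt = random.choices(states, weights=[A[x][s] for s in states])[0]
            # emit symbol o_t = k with probability b_ijk (arc-emission)
            yield random.choices(symbols, weights=[B[x, nxt][k] for k in symbols])[0]
            x = nxt
    # e.g. itertools.islice(run_markov_process(...), 10) draws ten symbols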

Page 15:

The Three Fundamental Questions for HMMs
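The list itself was lost in extraction; as posed in the chapter, the three questions are:

1. Given a model \mu = (A, B, \Pi), how do we efficiently compute P(O \mid \mu) for an observation sequence O?
2. Given O and \mu, how do we choose the state sequence (X_1, \ldots, X_{T+1}) that best explains the observations?
3. Given O and a space of possible models, how do we find the model \mu that best explains the observed data, i.e. maximizes P(O \mid \mu)?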

Page 16:

Finding the probability of an observation
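The slide's equations were images; the decomposition they showed is, in LaTeX:

    P(O \mid \mu) = \sum_{X} P(O \mid X, \mu)\, P(X \mid \mu)
                  = \sum_{X_1 \cdots X_{T+1}} \pi_{X_1} \prod_{t=1}^{T} a_{X_t X_{t+1}}\, b_{X_t X_{t+1} o_t}

Evaluated directly, this requires on the order of (2T + 1) \cdot N^{T+1} multiplications, which motivates the forward procedure on the next slide.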

Page 17:

The forward procedure

A cheap algorithm: it requires only 2N²T multiplications.
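The forward variables and their recurrence (reconstructed in LaTeX, arc-emission form):

    \alpha_i(t) = P(o_1 \cdots o_{t-1}, X_t = i \mid \mu)
    \text{initialization: } \alpha_i(1) = \pi_i
    \text{induction: } \alpha_j(t+1) = \sum_{i=1}^{N} \alpha_i(t)\, a_{ij}\, b_{ij o_t}
    \text{total: } P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(T+1)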

Page 18:

The backward procedure

A backward variable gives the total probability of seeing the rest of the observation sequence from a given state and time.

Using a combination of forward and backward probabilities is vital for solving the third problem, parameter reestimation.

Backward variables

Combining forward & backward
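The formulas under these two headings were lost in extraction; reconstructed in LaTeX:

    \beta_i(t) = P(o_t \cdots o_T \mid X_t = i, \mu)
    \text{initialization: } \beta_i(T+1) = 1
    \text{induction: } \beta_i(t) = \sum_{j=1}^{N} a_{ij}\, b_{ij o_t}\, \beta_j(t+1)
    \text{combination: } P(O, X_t = i \mid \mu) = \alpha_i(t)\,\beta_i(t), \qquad P(O \mid \mu) = \sum_{i=1}^{N} \alpha_i(t)\,\beta_i(t) \ \text{for any } t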

Page 19:

Finding the best state sequence

There is more than one way to choose a state sequence that explains the observations.

One choice: find the X_t that maximizes P(X_t | O, μ) individually for each t (formalized below).

This may yield a quite unlikely state sequence as a whole.

The Viterbi algorithm is more efficient.
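The individually most likely states, reconstructed in LaTeX:

    \gamma_i(t) = P(X_t = i \mid O, \mu) = \frac{\alpha_i(t)\,\beta_i(t)}{\sum_{j=1}^{N} \alpha_j(t)\,\beta_j(t)}, \qquad \hat{X}_t = \arg\max_{1 \le i \le N} \gamma_i(t)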

Page 20:

Viterbi algorithm

Find the most likely complete path, i.e. argmax_X P(X | O, μ).

Since P(X | O, μ) = P(X, O | μ) / P(O | μ), it is sufficient to maximize P(X, O | μ) for a fixed O.

Definition
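The definitions the slide showed, reconstructed in LaTeX:

    \delta_j(t) = \max_{X_1 \cdots X_{t-1}} P(X_1 \cdots X_{t-1},\, o_1 \cdots o_{t-1},\, X_t = j \mid \mu)
    \text{initialization: } \delta_j(1) = \pi_j
    \text{induction: } \delta_j(t+1) = \max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{ij o_t}, \qquad \psi_j(t+1) = \arg\max_{1 \le i \le N} \delta_i(t)\, a_{ij}\, b_{ij o_t}
    \text{readout: } \hat{X}_{T+1} = \arg\max_{1 \le j \le N} \delta_j(T+1), \qquad \hat{X}_t = \psi_{\hat{X}_{t+1}}(t+1)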

Page 21:

Variable calculations for O = (lem, ice_t, cola)
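The slide's table of values was lost in extraction. A small Python script that recomputes the forward and Viterbi variables for this sequence, assuming the textbook probabilities given earlier and the state-emission convention (so indices differ slightly from the arc-emission formulas above):

    states = ["CP", "IP"]
    pi = {"CP": 1.0, "IP": 0.0}                    # always starts in CP
    A = {"CP": {"CP": 0.7, "IP": 0.3},             # transition probabilities
         "IP": {"CP": 0.5, "IP": 0.5}}
    B = {"CP": {"cola": 0.6, "ice_t": 0.1, "lem": 0.3},   # emission probabilities
         "IP": {"cola": 0.1, "ice_t": 0.7, "lem": 0.2}}
    O = ["lem", "ice_t", "cola"]

    # Forward: alpha[t][i] = P(o_1 .. o_t, X_t = i)
    alpha = [{i: pi[i] * B[i][O[0]] for i in states}]
    for o in O[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] for i in states) * B[j][o]
                      for j in states})
    print("P(O) =", sum(alpha[-1].values()))

    # Viterbi: delta[t][i] = probability of the best path ending in state i
    delta = [{i: pi[i] * B[i][O[0]] for i in states}]
    for o in O[1:]:
        prev = delta[-1]
        delta.append({j: max(prev[i] * A[i][j] for i in states) * B[j][o]
                      for j in states})
    print("delta:", delta)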

Page 22:

Parameter estimation

Given a certain observation sequence, find the values of the model parameters μ = (A, B, Π), using Maximum Likelihood Estimation.

There is no analytic solution, but we can locally maximize P(O | μ) by an iterative hill-climbing algorithm (Baum-Welch); this is usually effective for HMMs.
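The quantities the reestimation works from (reconstructed in LaTeX): the probability of traversing arc i → j at time t given O, and its per-state marginal:

    p_t(i, j) = P(X_t = i, X_{t+1} = j \mid O, \mu) = \frac{\alpha_i(t)\, a_{ij}\, b_{ij o_t}\, \beta_j(t+1)}{\sum_{m=1}^{N} \alpha_m(t)\, \beta_m(t)}
    \gamma_i(t) = \sum_{j=1}^{N} p_t(i, j)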

Page 23:

Parameter estimation (Cont’)
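This slide's formulas were images; following the chapter, the reestimates are expected counts divided by expected totals:

    \hat{\pi}_i = \gamma_i(1)
    \hat{a}_{ij} = \frac{\sum_{t=1}^{T} p_t(i, j)}{\sum_{t=1}^{T} \gamma_i(t)}
    \hat{b}_{ijk} = \frac{\sum_{t:\, o_t = k} p_t(i, j)}{\sum_{t=1}^{T} p_t(i, j)}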

Page 24:

Parameter estimation (Cont’)
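This slide's content was also lost; the key property of the reestimation step, standard for Baum-Welch, is that it never decreases the likelihood,

    P(O \mid \hat{\mu}) \ge P(O \mid \mu),

so iterating converges to a (local) maximum.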

Page 25:

Implementation, Properties, Variants

Implementation
– Obvious issue: we keep multiplying very small numbers, which underflows.
– Fix: use logarithms and add log probabilities instead, as sketched below.
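A minimal Python sketch of why the log trick is needed (the numbers are toy values for illustration):

    import math

    # Multiplying many small probabilities underflows double precision;
    # summing their logarithms stays finite.
    probs = [1e-5] * 400

    product = 1.0
    for p in probs:
        product *= p              # underflows to exactly 0.0

    log_total = sum(math.log(p) for p in probs)
    print(product)                # 0.0
    print(log_total)              # about -4605.17 (= 400 * ln(1e-5))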

Variants
– It is not impossible to estimate the large number of parameters.

Multiple input observations

Initialization of parameter values
– Try to start near the global maximum.