SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models

SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models

Yasuhiro Fujiwara (NTT Cyber Space Labs) Yasushi Sakurai (NTT Communication Science Labs) Masashi Yamamuro (NTT Cyber Space Labs)

Speaker: Yasushi Sakurai

1

Motivation

• HMM(Hidden Markov Model)– Mental task classification

• Understand human brain functions with EEG signals

– Biological analysis• Predict organisms functions with DNA sequences

– Many other applications• Speech recognition, image processing, etc

• Goal– Fast and exact identification of the highest-likelihood

model for large datasets

2

Mini-introduction to HMM

• Observation sequence is a probabilistic function of states

• Consists of the three sets of parameters:– Initial state probability :

• State at time

– State transition probability:• Transition from state to

– Symbol probability:• Output symbol in state

3

mii 1iu 1t

mjiaa ij ,1

iu ju

mivbvb i 1

v iu

nxxxX ,,, 21


• HMM types– Ergodic HMM

• Every state can be reached from every other state

– Left-right HMM• Transitions to lower number states are prohibited

• Always begin with the first state

• Transition are limited to a small number of states

4

Ergodic HMM Left-right HMM


5

• Viterbi path in the trellis structure– Trellis structure: states lie on the vertical axis, the

sequence is aligned along the horizontal axis

– Viterbi path: state sequence which gives the likelihood

Viterbi path

Trellis structure

1u

2u

mu

1x2x nx

・・

・・


• Viterbi algorithm– Dynamic programming approach

– Maximize the probabilities from the previous states

6

1

2max

max

1

11

1

txb

ntxbapp

pP

ii

tijitjmj

it

inmi

: the maximum probability of state at time itp iu t

Problem Definition

• Given– HMM dataset

– Sequence of arbitrary length

• Find– Highest-likelihood model, estimated with respect to X,

from the dataset

7

nxxxX ,,, 21

Why not ‘Naive’

• Naïve solution1. Compute the likelihood for every model using the Viterbi

algorithm

2. Then choose the highest-likelihood model

But..– High search cost: time for every model

• Prohibitive for large HMM datasets

8

2nmO

m: # of statesn: sequence length of X

Our Solution, SPIRAL

• Requirements:– High-speed search

• Identify the model efficiently

– Exactness• Accuracy is not sacrificed

– No restriction on model type• Achieve high search performance for any type of models

9

Likelihood Approximation

10

Reminder: Naive


• Create compact models (reduce the model size)– For given m states and granularity g,

– Create m/g states by merging ‘similar’ states

11

gm

gm

n


• Use the vector Fi of state ui for clustering

　

• Merge all the states ui in a cluster C and create a new state uC

• Choose the highest probability among the probabilities of ui

12

siimiiimiii vbvbaaaaF ,,;,,,,,; 111 s: number of symbols

vbvbaaaa

aa

iCu

CjiCuCu

jCikCuu

CC

ijCuCu

CjiCu

C

ijiki

jii

maxmaxmax

maxmax

,,

,

Obtain the upper bounding likelihood


• Compute approximate likelihood from the compact model

• Upper bounding likelihood– For approximate likelihood , holds

– Exploit this property to guarantee exactness in search processing

13

P

1

2max

max

1

11

1

txb

ntxbapp

pP

ii

tijitjmj

it

inmi

: maximum probability of states

itp

P PP '


Advantages

• The best model can not be pruned– The approximation gives the upper bounding

likelihood of the original model

• Support any model type– Any probabilistic constraint is not applied to the

approximation

14

Multi-granularities

• The likelihood approximation has the trade-off between accuracy and computation time– As the model size increases, accuracy improves

– But the likelihood computation cost increases

• Q: How to choose granularity ?

15

g

Multi-granularities

• The likelihood approximation has the trade-off between accuracy and comparison speed– As the model size increases, accuracy improves

– But the likelihood computation cost increases

• Q: How to choose granularity ?

• A: Use multiple granularities– distinct granularities that form a

geometric progression gi =2i (i=0,1,2,…,h)

– Geometrically increase the model size

16

g

mhh 2log1

Multi-granularities

• Compute the approximate likelihood from the coarsest model as the first step– Coarsest model has states

• Prune the model if , otherwise

17

P

P 12 hm

: threshold

Multi-granularities

• Compute the approximate likelihood from the second coarsest model– Second coarsest model has states

• Prune the model if

18

12 hm

P

P

Multi-granularities

• Threshold– Exploit the fact that we have found a good model of

high likelihood • : exact likelihood of the best-so-far candidate during

search processing

– is updated and increases when promising model is found

– Use for model pruning

19

Multi-granularities

• Compute the approximate likelihood from the second coarsest model– Second coarsest model has states

• Prune the model if , otherwise– : exact likelihood of the best-so-far candidate

20

12 hm

P

P

Multi-granularities

• Compute the likelihood from more accurate model

• Prune the model if

21

P

P

Multi-granularities

• Repeat until the finest granularity (the original model)

• Update the answer candidate and best-so-far likelihood if

22

P

Multi-granularities

• Optimize the trade-off between accuracy and computation time– Low-likelihood models are pruned by coarse-grained

models

– Fine-grained approximation is applied only to high-likelihood models

• Efficiently find the best model for a large dataset– The exact likelihood computations are limited to the

minimum number of necessary

23

Transition Pruning

• Trellis structure has too many transitions

• Q: How to exclude unlikely paths

24

Transition Pruning

• Trellis structure has too many transitions

• Q: How to exclude unlikely paths

• A: Use the two properties – Likelihood is monotone non-increasing (likelihood computation)

– Threshold is monotone non-decreasing (search processing)

25

Transition Pruning

• In likelihood computation, compute the estimate

– eit : conservative estimate of the likelihood pit of state ui at time t

• If , prune all paths that pass through ui at t

– : exact likelihood of the best-so-far candidate

26

ite

ntp

ntxbape

in

n

tjj

tnit

it

111

maxmax

vbvbaa imi

ijmji

1

max,1

max max,maxwhere

ite

Transition Pruning

• Terminate the likelihood computation

if all the paths are excluded

• Efficient especially for long sequences

• Applicable to approximate likelihood computation

27

Accuracy and Complexity

28

• SPIRAL needs the same order of memory space, while can be up to times faster2m

AccuracyComplexity

Memory Space Computation time

Viterbi

Guarantee exactness

SPIRAL At least

At most

msmO 2

2nmO

2nmO

nO

Experimental Evaluation

• Setup– Intel Core 2 1.66GHz, 2GB memory

• Datasets– EEG, Chromosome, Traffic

• Evaluation– Mainly computation time– Ergodic HMM

– Compared the Viterbi algorithm and Beam search• Beam search: popular technique, but does not guarantee

exactness

29


• Evaluation– Wall clock time versus number of states

– Wall clock time versus number of models

– Effect of likelihood approximation

– Effect of transition pruning

– SPIRAL vs Beam search

30


• Wall clock time versus number of states– EEG: up to 200 times faster

31


• Wall clock time versus number of states– Chromosome: up to 150 times faster

32


• Wall clock time versus number of states– Traffic: up to 500 times faster

33







34


• Wall clock time versus number of models– EEG: up to 200 times faster

35







36


• Effect of likelihood approximation– Most of models are pruned by coarser approximations

37







38


• Effect of transition pruning– SPIRAL find the highest-likelihood model more

efficiently by transition pruning

39







40


• SPIRAL vs Beam search– SPIRAL is significantly faster while it guarantees

exactness

41

Wall clock timeSPIRAL is up to 27 times faster

Likelihood error ratioNote: SPIRAL gives no error

Conclusion

42

• Design goals:– High-speed search

• SPIRAL is significantly (up to 500 times) faster

– Exactness• We prove that it guarantees exactness

– No restriction on model type• It can handle any HMM model type

• SPIRAL achieves all the goals

Documents

SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models