Upload
allen-perez
View
44
Download
1
Embed Size (px)
DESCRIPTION
SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models. Yasuhiro Fujiwara (NTT Cyber Space Labs) Yasushi Sakurai (NTT Communication Science Labs) - PowerPoint PPT Presentation
Citation preview
SPIRAL: Efficient and Exact Model Identification for Hidden Markov Models
Yasuhiro Fujiwara (NTT Cyber Space Labs) Yasushi Sakurai (NTT Communication Science Labs) Masashi Yamamuro (NTT Cyber Space Labs)
Speaker: Yasushi Sakurai
1
Motivation
• HMM(Hidden Markov Model)– Mental task classification
• Understand human brain functions with EEG signals
– Biological analysis• Predict organisms functions with DNA sequences
– Many other applications• Speech recognition, image processing, etc
• Goal– Fast and exact identification of the highest-likelihood
model for large datasets
2
Mini-introduction to HMM
• Observation sequence is a probabilistic function of states
• Consists of the three sets of parameters:– Initial state probability :
• State at time
– State transition probability:• Transition from state to
– Symbol probability:• Output symbol in state
3
mii 1iu 1t
mjiaa ij ,1
iu ju
mivbvb i 1
v iu
nxxxX ,,, 21
Mini-introduction to HMM
• HMM types– Ergodic HMM
• Every state can be reached from every other state
– Left-right HMM• Transitions to lower number states are prohibited
• Always begin with the first state
• Transition are limited to a small number of states
4
Ergodic HMM Left-right HMM
Mini-introduction to HMM
5
• Viterbi path in the trellis structure– Trellis structure: states lie on the vertical axis, the
sequence is aligned along the horizontal axis
– Viterbi path: state sequence which gives the likelihood
Viterbi path
Trellis structure
1u
2u
mu
1x2x nx
・・
・・
Mini-introduction to HMM
• Viterbi algorithm– Dynamic programming approach
– Maximize the probabilities from the previous states
6
1
2max
max
1
11
1
txb
ntxbapp
pP
ii
tijitjmj
it
inmi
: the maximum probability of state at time itp iu t
Problem Definition
• Given– HMM dataset
– Sequence of arbitrary length
• Find– Highest-likelihood model, estimated with respect to X,
from the dataset
7
nxxxX ,,, 21
Why not ‘Naive’
• Naïve solution1. Compute the likelihood for every model using the Viterbi
algorithm
2. Then choose the highest-likelihood model
But..– High search cost: time for every model
• Prohibitive for large HMM datasets
8
2nmO
m: # of statesn: sequence length of X
Our Solution, SPIRAL
• Requirements:– High-speed search
• Identify the model efficiently
– Exactness• Accuracy is not sacrificed
– No restriction on model type• Achieve high search performance for any type of models
9
Likelihood Approximation
10
Reminder: Naive
Likelihood Approximation
• Create compact models (reduce the model size)– For given m states and granularity g,
– Create m/g states by merging ‘similar’ states
11
gm
gm
n
Likelihood Approximation
• Use the vector Fi of state ui for clustering
• Merge all the states ui in a cluster C and create a new state uC
• Choose the highest probability among the probabilities of ui
12
siimiiimiii vbvbaaaaF ,,;,,,,,; 111 s: number of symbols
vbvbaaaa
aa
iCu
CjiCuCu
jCikCuu
CC
ijCuCu
CjiCu
C
ijiki
jii
maxmaxmax
maxmax
,,
,
Obtain the upper bounding likelihood
Likelihood Approximation
• Compute approximate likelihood from the compact model
• Upper bounding likelihood– For approximate likelihood , holds
– Exploit this property to guarantee exactness in search processing
13
P
1
2max
max
1
11
1
txb
ntxbapp
pP
ii
tijitjmj
it
inmi
: maximum probability of states
itp
P PP '
Likelihood Approximation
Advantages
• The best model can not be pruned– The approximation gives the upper bounding
likelihood of the original model
• Support any model type– Any probabilistic constraint is not applied to the
approximation
14
Multi-granularities
• The likelihood approximation has the trade-off between accuracy and computation time– As the model size increases, accuracy improves
– But the likelihood computation cost increases
• Q: How to choose granularity ?
15
g
Multi-granularities
• The likelihood approximation has the trade-off between accuracy and comparison speed– As the model size increases, accuracy improves
– But the likelihood computation cost increases
• Q: How to choose granularity ?
• A: Use multiple granularities– distinct granularities that form a
geometric progression gi =2i (i=0,1,2,…,h)
– Geometrically increase the model size
16
g
mhh 2log1
Multi-granularities
• Compute the approximate likelihood from the coarsest model as the first step– Coarsest model has states
• Prune the model if , otherwise
17
P
P 12 hm
: threshold
Multi-granularities
• Compute the approximate likelihood from the second coarsest model– Second coarsest model has states
• Prune the model if
18
12 hm
P
P
Multi-granularities
• Threshold– Exploit the fact that we have found a good model of
high likelihood • : exact likelihood of the best-so-far candidate during
search processing
– is updated and increases when promising model is found
– Use for model pruning
19
Multi-granularities
• Compute the approximate likelihood from the second coarsest model– Second coarsest model has states
• Prune the model if , otherwise– : exact likelihood of the best-so-far candidate
20
12 hm
P
P
Multi-granularities
• Compute the likelihood from more accurate model
• Prune the model if
21
P
P
Multi-granularities
• Repeat until the finest granularity (the original model)
• Update the answer candidate and best-so-far likelihood if
22
P
Multi-granularities
• Optimize the trade-off between accuracy and computation time– Low-likelihood models are pruned by coarse-grained
models
– Fine-grained approximation is applied only to high-likelihood models
• Efficiently find the best model for a large dataset– The exact likelihood computations are limited to the
minimum number of necessary
23
Transition Pruning
• Trellis structure has too many transitions
• Q: How to exclude unlikely paths
24
Transition Pruning
• Trellis structure has too many transitions
• Q: How to exclude unlikely paths
• A: Use the two properties – Likelihood is monotone non-increasing (likelihood computation)
– Threshold is monotone non-decreasing (search processing)
25
Transition Pruning
• In likelihood computation, compute the estimate
– eit : conservative estimate of the likelihood pit of state ui at time t
• If , prune all paths that pass through ui at t
– : exact likelihood of the best-so-far candidate
26
ite
ntp
ntxbape
in
n
tjj
tnit
it
111
maxmax
vbvbaa imi
ijmji
1
max,1
max max,maxwhere
ite
Transition Pruning
• Terminate the likelihood computation
if all the paths are excluded
• Efficient especially for long sequences
• Applicable to approximate likelihood computation
27
Accuracy and Complexity
28
• SPIRAL needs the same order of memory space, while can be up to times faster2m
AccuracyComplexity
Memory Space Computation time
Viterbi
Guarantee exactness
SPIRAL At least
At most
msmO 2
2nmO
2nmO
nO
Experimental Evaluation
• Setup– Intel Core 2 1.66GHz, 2GB memory
• Datasets– EEG, Chromosome, Traffic
• Evaluation– Mainly computation time– Ergodic HMM
– Compared the Viterbi algorithm and Beam search• Beam search: popular technique, but does not guarantee
exactness
29
Experimental Evaluation
• Evaluation– Wall clock time versus number of states
– Wall clock time versus number of models
– Effect of likelihood approximation
– Effect of transition pruning
– SPIRAL vs Beam search
30
Experimental Evaluation
• Wall clock time versus number of states– EEG: up to 200 times faster
31
Experimental Evaluation
• Wall clock time versus number of states– Chromosome: up to 150 times faster
32
Experimental Evaluation
• Wall clock time versus number of states– Traffic: up to 500 times faster
33
Experimental Evaluation
• Evaluation– Wall clock time versus number of states
– Wall clock time versus number of models
– Effect of likelihood approximation
– Effect of transition pruning
– SPIRAL vs Beam search
34
Experimental Evaluation
• Wall clock time versus number of models– EEG: up to 200 times faster
35
Experimental Evaluation
• Evaluation– Wall clock time versus number of states
– Wall clock time versus number of models
– Effect of likelihood approximation
– Effect of transition pruning
– SPIRAL vs Beam search
36
Experimental Evaluation
• Effect of likelihood approximation– Most of models are pruned by coarser approximations
37
Experimental Evaluation
• Evaluation– Wall clock time versus number of states
– Wall clock time versus number of models
– Effect of likelihood approximation
– Effect of transition pruning
– SPIRAL vs Beam search
38
Experimental Evaluation
• Effect of transition pruning– SPIRAL find the highest-likelihood model more
efficiently by transition pruning
39
Experimental Evaluation
• Evaluation– Wall clock time versus number of states
– Wall clock time versus number of models
– Effect of likelihood approximation
– Effect of transition pruning
– SPIRAL vs Beam search
40
Experimental Evaluation
• SPIRAL vs Beam search– SPIRAL is significantly faster while it guarantees
exactness
41
Wall clock timeSPIRAL is up to 27 times faster
Likelihood error ratioNote: SPIRAL gives no error
Conclusion
42
• Design goals:– High-speed search
• SPIRAL is significantly (up to 500 times) faster
– Exactness• We prove that it guarantees exactness
– No restriction on model type• It can handle any HMM model type
• SPIRAL achieves all the goals