Emotional Classification from Text Data
by Hybrid Naive Bayes HMM

February 2002

Abstract

A huge number of documents is scattered over the Web, and automatic text classification helps to organize them efficiently. This thesis presents a system that determines the emotional state of the speaker in each sentence of on-line chatting text by statistical learning. Because chat sentences are full of misspellings, jargon, and slang, each word is first transformed into a more abstract symbol, and classification is performed over these symbols. The first system uses the naive Bayes algorithm on the current sentence only; the second combines naive Bayes with an HMM into a hybrid naive Bayes HMM that also exploits the immediately preceding sentences. A preprocessor that extracts emoticons, onomatopoeic words, and question marks further improves performance. Trained on 40,000 sentences and tested on 1013 sentences, the final hybrid system achieves 93.7% accuracy.

Keywords: emotional classification, HMM, naive Bayes
Student number: 2000-21205
Contents

1 Introduction
2 Related Work
3 Naive Bayes
4 Hidden Markov Models
  4.1 Forward algorithm, Backward algorithm
  4.2 Viterbi algorithm
  4.3 Baum-Welch algorithm
5 The TexMo System
  5.1 TexMo1: naive Bayes
  5.2 TexMo2: Hybrid naive Bayes HMM
6 Experiments
  6.1 Training data
  6.2 Results
7 Conclusion
References
Abstract
1 Introduction

A huge number of documents is scattered over the Web, and automatic classification techniques help us organize these documents efficiently. At the same time, much everyday communication now takes place through e-mail, instant messengers, and chatting systems, and there are numerous efforts to make such systems more satisfying to their users. One such effort is to convey the emotional state of each speaker, for example through an animated character in the chatting interface. This thesis addresses the problem of automatically determining the emotional state of a speaker from the text of an on-line chatting dialog.

Text alone carries no tone of voice or facial expression, so chat users have long supplemented their sentences with emoticons.
Emoticon: a compound of Emotion and Icon, that is, a sign built from ordinary text characters that expresses the writer's emotional state. [Table 1] shows typical examples.

[Table 1] Examples of emoticons
^^   :-(   :-)   -.-;   ^_^   :-P   ^o^   -o-   ^^;   T_T   ^^a   :'(
Emoticons of this kind go back to the smileys used among Unix users in the 1970s, and they remain the most common way to inject emotion into plain text. They are, however, only a partial remedy: users must insert them explicitly, and many sentences carry emotion without any emoticon at all.
With the development of graphical user interfaces (GUI), chatting and messenger clients increasingly present each participant as an animated character, as in the MSN Messenger example of [Figure 1]. If the emotional state of a speaker could be recognized automatically from what is typed, such a character could display the appropriate expression without any extra effort from the user.

[Figure 1] A character-based chatting interface (MSN Messenger)
The statistical learning methods used in this thesis are naive Bayes and the Hidden Markov Model (HMM). Naive Bayes is a simple but highly practical classification method [21]; it determines the emotion of a sentence from that sentence alone. By combining an HMM with naive Bayes we obtain a hybrid naive Bayes HMM classifier that also takes the immediately preceding sentences into account.

The rest of this thesis is organized as follows. Chapter 2 reviews related work. Chapters 3 and 4 describe naive Bayes and the HMM. Chapter 5 presents the two emotional classification systems. Chapter 6 reports the experimental results, and Chapter 7 concludes.
2 Related Work

Text categorization is the task of assigning text documents to predefined categories. Because manual categorization of large document collections is expensive, a wide range of statistical learning methods has been applied to it: regression models by Yang [5], inductive logic programming by Cohen [8], decision trees by Lewis [10], neural-network (perceptron) learning by Goh and colleagues [20], support vector machines by Joachims [12], and naive Bayes classifiers [9, 16, 21].

Hidden Markov Models (HMM) are probabilistic models for sequences. In speech recognition, the classic application, an utterance is modeled as a sequence of hidden states that emit observed spectra, governed by state transition probabilities and observation probabilities. Rabiner [1] gives a comprehensive tutorial, including parameter estimation with the Baum-Welch algorithm and decoding of the most likely state sequence with the Viterbi algorithm.
HMMs are also widely used for part-of-speech tagging, where the hidden states are parts of speech and the observations are words, and the tag sequence is recovered by decoding. HMM parameters can be learned in a supervised fashion, by counting over data whose state sequences are known, or in an unsupervised fashion with the Baum-Welch algorithm; decoding is done with the Viterbi algorithm. Kupiec [17] built a robust HMM part-of-speech tagger along these lines.

An application of HMMs to text categorization is due to Frasconi et al. [4], who classified the pages of multi-page OCR-scanned documents. In their model the HMM captures the sequential dependence between consecutive pages, while the observation probability of each page is estimated with naive Bayes, giving a hybrid naive Bayes HMM. The present work adopts the same hybrid structure, with the sentences of a dialog in place of the pages of a document.

Learning from both labeled and unlabeled data has been studied by Nigam et al. [3], who combine naive Bayes with EM, and by Joachims [11] with transductive support vector machines. Finally, HMMs have been applied to information extraction [6, 7], where the model structure itself is learned from data.
3 Naive Bayes

Naive Bayes is one of the most practical statistical learning methods [21]. In our setting, each sentence is described by the attribute values ⟨a_1, a_2, …, a_n⟩ (the symbols obtained from its words), and the classifier must assign it one of the emotion tags in a finite set S. Labeling a sentence thus means choosing a target value q_j ∈ S.

The Bayesian approach is to choose the most probable target value given the attribute values, the maximum a posteriori (MAP) value q_MAP:

    q_{MAP} = \arg\max_{q_j \in S} P(q_j \mid a_1, a_2, \ldots, a_n)

Rewriting this with Bayes' theorem gives equation (2.1):

    q_{MAP} = \arg\max_{q_j \in S} \frac{P(a_1, a_2, \ldots, a_n \mid q_j) \, P(q_j)}{P(a_1, a_2, \ldots, a_n)}
            = \arg\max_{q_j \in S} P(a_1, a_2, \ldots, a_n \mid q_j) \, P(q_j)    (2.1)
P(q_j) can be estimated simply by counting how often q_j occurs in the training data. Estimating P(a_1, a_2, …, a_n | q_j) directly, however, is infeasible: the number of possible attribute-value combinations is enormous, so no realistic training set covers them all.

The naive Bayes method therefore assumes that the attribute values are mutually independent given the target value: the probability of observing the conjunction a_1, a_2, …, a_n in a sentence with tag q_j is the product of the probabilities of the individual attributes,

    P(a_1, a_2, \ldots, a_n \mid q_j) = \prod_i P(a_i \mid q_j)    (2.2)

The corresponding Bayesian network is shown in [Figure 2]: the attributes a_1, …, a_n depend only on the class q_j, not on one another.

[Figure 2] Bayesian network of naive Bayes

Substituting (2.2) into (2.1) yields the naive Bayes classifier.
Naive Bayes classifier:

    q_{NB} = \arg\max_{q_j \in S} P(q_j) \prod_i P(a_i \mid q_j)    (2.3)

Here q_NB denotes the target value output by the naive Bayes classifier. The number of distinct terms P(a_i | q_j) that must be estimated is far smaller than the number of terms P(a_1, a_2, …, a_n | q_j), so they can be estimated reliably from training data.

Learning in naive Bayes thus amounts to estimating P(q_j) and P(a_i | q_j) from their frequencies in the training data; classification with (2.3) requires no explicit search through a hypothesis space. When the independence assumption holds, q_NB equals the MAP classification q_MAP of (2.1). Although the assumption rarely holds exactly, naive Bayes performs remarkably well in practice, for example on the TREC-8 text collections [24].
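The estimation and the argmax of (2.3) fit in a few lines of code. The following is a minimal sketch in Python, with Laplace smoothing for unseen symbols (a detail the thesis does not specify); the class and variable names are ours, not TexMo's:

    from collections import Counter, defaultdict
    import math

    class NaiveBayes:
        # Multinomial naive Bayes over word symbols, eq. (2.3) in log space.
        def fit(self, sentences, labels):
            self.prior = Counter(labels)               # counts for P(q_j)
            self.n = len(labels)
            self.word_count = defaultdict(Counter)     # counts for P(a_i | q_j)
            self.total = Counter()
            self.vocab = set()
            for words, q in zip(sentences, labels):
                for w in words:
                    self.word_count[q][w] += 1
                    self.total[q] += 1
                    self.vocab.add(w)

        def log_prob(self, words, q):
            # log P(q_j) + sum_i log P(a_i | q_j): the argument of (2.3)
            lp = math.log(self.prior[q] / self.n)
            v = len(self.vocab)
            for w in words:
                lp += math.log((self.word_count[q][w] + 1) / (self.total[q] + v))
            return lp

        def classify(self, words):
            # q_NB = argmax over the tag set S
            return max(self.prior, key=lambda q: self.log_prob(words, q))

For example, after nb = NaiveBayes() and nb.fit([['^^', 'hi'], ['T_T', 'bye']], ['joy', 'sad']), the call nb.classify(['^^']) returns 'joy'.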
4 Hidden Markov Models

An HMM models a process whose states are hidden: we observe only a sequence of symbols, each emitted by some unobservable state. The model is a doubly stochastic process, one stochastic process governing the transitions between hidden states and another governing the emission of observation symbols from each state. HMMs have been applied with great success to speech recognition and part-of-speech tagging. An HMM can also be viewed as a particular Bayesian network [13, 14, 18], as drawn in [Figure 3]: the state at each time step determines both the emitted observation and the next state.

[Figure 3] Bayesian network of an HMM (states S_{t-1}, S_t, S_{t+1} emitting observations v_{t-1}, v_t, v_{t+1})

Following Rabiner [1], an HMM is characterized by the following elements.
The number of states N: the states are hidden, but in many applications they carry physical meaning. The individual states are denoted S = {S_1, S_2, …, S_N}, and the state at time t is denoted q_t.

The number of distinct observation symbols M: each state emits symbols from the alphabet V = {v_1, v_2, …, v_M}.

The state transition probability distribution A = {a_ij}, the probability of moving from state i to state j:

    a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N

with a_{ij} \ge 0 for all i, j and \sum_{j=1}^{N} a_{ij} = 1 for all i. When every state can reach every other state in one step, all a_{ij} > 0; models with a_{ij} = 0 for some pairs correspond to other HMM topologies.

The observation symbol probability distribution B = {b_j(k)}, the probability of emitting symbol k in state j:

    b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N, \; 1 \le k \le M

The initial state distribution π = {π_i}, the distribution over states at t = 1:

    \pi_i = P[q_1 = S_i], \quad 1 \le i \le N

Given N and M, the parameter set of an HMM is therefore the triple

    \lambda = (A, B, \pi)
Three basic problems must be solved for an HMM to be useful in practice [23]:

1) Evaluation problem: given an observation sequence O = O_1 O_2 … O_T and a model λ = (A, B, π), compute the probability P(O|λ) of the sequence under the model.

2) Decoding problem: given O = O_1 O_2 … O_T and λ = (A, B, π), find the state sequence Q = q_1 q_2 … q_T that best explains the observations.

3) Estimation problem: adjust the parameters λ = (A, B, π) so as to maximize P(O|λ).

These are solved by the forward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, respectively [1].
4.1 Forward algorithm, Backward algorithm

The forward algorithm computes the probability P(O|λ) of an observation sequence O = O_1 O_2 … O_T under a model λ = (A, B, π). Define the forward variable

    \alpha_t(i) = P(O_1 O_2 \ldots O_t, \; q_t = S_i \mid \lambda),

the probability of having observed the partial sequence O_1 O_2 … O_t and being in state S_i at time t. It is computed inductively as in [Table 2].

[Table 2] Forward algorithm for computing the likelihood P(O|λ)

1) Initialization:
    \alpha_1(i) = \pi_i \, b_i(O_1), \quad 1 \le i \le N

2) Induction:
    \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(O_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N

3) Termination:
    P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
The backward algorithm, needed later by the Baum-Welch algorithm, proceeds analogously to the forward algorithm but in reverse time order. Define the backward variable

    \beta_t(i) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda),

the probability of the remaining observations given state S_i at time t. It is computed as in [Table 3].

[Table 3] Backward algorithm

1) Initialization:
    \beta_T(i) = 1, \quad 1 \le i \le N

2) Induction:
    \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N
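Both recursions translate directly into code. Below is a sketch with NumPy, our own illustration rather than the thesis' implementation; indices are 0-based, and for long sequences the usual scaling of α and β would be added to avoid numerical underflow:

    import numpy as np

    def forward(A, B, pi, O):
        # alpha[t, i] = P(O_0..O_t, q_t = S_i | lambda); P(O|lambda) = alpha[-1].sum()
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                          # initialization
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
        return alpha

    def backward(A, B, O):
        # beta[t, i] = P(O_{t+1}..O_{T-1} | q_t = S_i, lambda)
        T, N = len(O), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                      # initialization
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # induction
        return beta

Here A is the N x N transition matrix, B the N x M observation matrix, pi the initial distribution, and O a list of symbol indices.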
4.2 Viterbi algorithm

The Viterbi algorithm finds, for an observation sequence O = O_1 O_2 … O_T and a model λ = (A, B, π), the single best state sequence Q = q_1 q_2 … q_T; it solves the decoding problem. Define

    \delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \ldots q_t = i, \; O_1 O_2 \ldots O_t \mid \lambda],

the highest probability of any single path that accounts for the first t observations and ends in state S_i. The array ψ_t(j) records, for each state j at time t, the state at time t-1 that maximized this quantity, so the best path can be recovered by backtracking. The procedure is given in [Table 4].

[Table 4] Viterbi algorithm

1) Initialization:
    \delta_1(i) = \pi_i \, b_i(O_1), \quad 1 \le i \le N
    \psi_1(i) = 0

2) Recursion:
    \delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t), \quad 2 \le t \le T, \; 1 \le j \le N
    \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}], \quad 2 \le t \le T, \; 1 \le j \le N

3) Termination:
    P^* = \max_{1 \le i \le N} [\delta_T(i)]
    q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]

4) Backtracking:
    q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1
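In code, reusing the NumPy conventions of the previous sketch (again our own illustration):

    def viterbi(A, B, pi, O):
        # Most probable state sequence for O under lambda = (A, B, pi).
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))            # delta[t, j]: best score ending in S_j
        psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of S_j
        delta[0] = pi * B[:, O[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A         # delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, O[t]]
        q = [int(delta[-1].argmax())]                  # termination
        for t in range(T - 1, 0, -1):                  # backtracking
            q.append(int(psi[t, q[-1]]))
        return list(reversed(q)), float(delta[-1].max())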
4.3 Baum-Welch algorithm

The Baum-Welch algorithm re-estimates the HMM parameters λ = (A, B, π) so as to increase P(O|λ). Define ξ_t(i,j), the probability of being in state S_i at time t and in state S_j at time t+1 given the model and the observations, and γ_t(i), the probability of being in state S_i at time t:

    \xi_t(i,j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}
               = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}
                      {\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}

    \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)

When several observation sequences O^(k) are available, P(O^(k)|λ) is written P_k and the expected counts below are accumulated over all sequences. The re-estimation formulas (4.1) are:

(4.1)
    \bar{\pi}_i = \text{expected frequency in state } S_i \text{ at time } t=1 = \gamma_1(i)

    \bar{a}_{ij} = \frac{\text{expected number of transitions from } S_i \text{ to } S_j}
                        {\text{expected number of transitions from } S_i}
                 = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

    \bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing } v_k}
                        {\text{expected number of times in state } j}
                 = \frac{\sum_{t=1, \, O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
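One re-estimation pass of (4.1) can be built on the forward and backward sketches above; the following is an illustration for a single observation sequence, not the thesis' code:

    def baum_welch_step(A, B, pi, O):
        # One pass of the re-estimation formulas (4.1).
        alpha, beta = forward(A, B, pi, O), backward(A, B, O)
        gamma = alpha * beta                               # gamma_t(i), unnormalized
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, O[1:]].T * beta[1:])[:, None, :])      # xi_t(i, j), unnormalized
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        obs = np.array(O)
        new_B = np.stack([gamma[obs == k].sum(axis=0)      # sum over t with O_t = v_k
                          for k in range(B.shape[1])], axis=1)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_A, new_B, new_pi

Iterating this step until P(O|λ) stops improving gives the usual EM training of an HMM.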
5 The TexMo System

This chapter describes TexMo, our system for emotional classification of chatting text. TexMo comes in two versions: TexMo1, based on naive Bayes, and TexMo2, based on the hybrid naive Bayes HMM. The two versions share the same preprocessing; they differ in the statistical model used for classification.
5.1 TexMo1 : naive Bayes

The structure of TexMo1 is shown in [Figure 4]: a naive Bayes learner estimates the prior probabilities P(v_j) and the conditional probabilities P(a_i|v_j) from tagged training data, and the classifier applies (2.3) to each new sentence.

[Figure 4] Structure of TexMo1 (learning P(v_j) and P(a_i|v_j) from tagged data)

TexMo1 treats each sentence independently and assigns it one of the emotion classes listed in [Table 5].
[Table 5] The 25 emotion classes of TexMo1 (codes 1 to 25) with representative cues for each class, among them the emoticons ^_^ ^^ ^.^ ^o^ :-D ^^; ^^a T.T T_T :-o :-p --; -.-; -_-; --a, interjections such as ~ and !, and sentence-final question marks
Classes 3 through 7 of [Table 5] are variations of positive emotion, class 18 covers plain declarative sentences with no distinct emotion, and classes 9 through 23 include sadness, surprise, questions, and other expressive types. The training data consist of chat dialogs in which every sentence has been hand-tagged with one of these class codes; [Table 6] shows an excerpt.
[Table 6] An excerpt of a tagged chatting dialog: each utterance is followed by its emotion code (e.g. @1, @3, @12, @18, @19), and utterances with no distinct emotion are tagged with a bare @
The bare tag @ thus serves as the default for sentences whose emotional state cannot be determined from the text.
Classification in TexMo1 uses the naive Bayes rule (2.3): the prior probability P(v_j) of each emotion class and the conditional probabilities P(a_i|v_j) of each symbol given the class are estimated from the tagged training data, and each sentence is assigned the class that maximizes (2.3). The naive Bayes classifier distinguishes classes 1 through 24 of [Table 5]; class 25 is handled separately.

Chat sentences are full of misspellings, jargon, and slang, which makes them hard to analyze with ordinary natural language processing techniques. Instead, TexMo transforms the words of each sentence into more abstract symbols and gathers statistics over these symbols. The preprocessing consists of the following steps.
(1) Onomatopoeic words: laughter, crying, and similar onomatopoeic expressions are detected and mapped to corresponding symbols.

(2) Emoticons: patterns such as ^^, -.-;, :-) are recognized, together with their many minor variations, and replaced by emoticon symbols.

(3) Special ASCII characters: marks such as *, #, @, ^ are mapped to ASCII-symbol tokens.

(4) Postprocessing: the symbol sequence produced by the previous steps is cleaned up before it is passed to the TexMo1 classifier.

(5) Question and exclamation marks: sentence-final ? and ! are extracted as separate symbols, since they are strong cues for classes such as 18 and 11.

The symbol sequences produced by this preprocessing are what the naive Bayes classifier of TexMo1, and later the HMM of TexMo2, actually observe; a sketch of steps (2), (3), and (5) follows below.
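The following Python fragment illustrates the flavor of these steps; the regular expressions and symbol names are invented for the example and are not TexMo's actual pattern tables:

    import re

    # Hypothetical pattern table: emoticon pattern -> abstract symbol.
    EMOTICON_SYMBOLS = [
        (re.compile(r"\^[_.oO]?\^;?a?"), "<SMILE>"),    # ^^, ^_^, ^o^, ^^;, ^^a ...
        (re.compile(r"[T;][._]?[T;]"), "<CRY>"),        # T_T, T.T ...
        (re.compile(r"[-=][._]?[-=];?a?"), "<SWEAT>"),  # -.-;, -_-;, --a ...
        (re.compile(r":'?-?[()PpDdo]"), "<FACE>"),      # :-), :-(, :-P, :'( ...
    ]

    def to_symbols(sentence):
        # Replace emoticons and special marks with abstract symbols (steps 2, 3, 5).
        tokens = []
        for tok in sentence.split():
            for pat, sym in EMOTICON_SYMBOLS:
                if pat.fullmatch(tok):
                    tok = sym
                    break
            else:
                if tok.endswith("?"):                   # step (5): question marks
                    tokens.append("<QMARK>")
                    tok = tok.rstrip("?")
                elif tok.endswith("!"):
                    tokens.append("<EMARK>")
                    tok = tok.rstrip("!")
            if tok:
                tokens.append(tok)
        return tokens

For instance, to_symbols("hello ^^ really?") yields ['hello', '<SMILE>', '<QMARK>', 'really'], which is the symbol sequence the classifiers operate on.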
Naive Bayes, however, determines the emotion of a sentence from that sentence alone. In a dialog the emotional state of a speaker often persists across several utterances, and a short sentence that is ambiguous in isolation can frequently be resolved by the sentences preceding it. This context is invisible to naive Bayes; to exploit it we turn to the HMM.
5.2 TexMo2 : Hybrid naive Bayes HMM

A chatting dialog is naturally modeled with an HMM: the hidden states are the emotional states of the speaker, and the observations are the sentences the speaker types. The emotional state evolves from sentence to sentence according to the state transition probabilities, and each sentence is emitted according to the observation probability of the current state. TexMo2 combines this HMM view with the naive Bayes estimates of TexMo1, as described next.
5.2.1 Model structure

The hidden states of the HMM are emotion classes. TexMo2 uses the seven classes of [Table 7].

[Table 7] The emotion states of TexMo2 (codes 0 to 6, with 0 the neutral state)

These seven classes are obtained by merging the fine-grained classes of [Table 5] used by TexMo1: with twenty-five states the transition probabilities could not be estimated reliably from the available training data, so related TexMo1 classes are collapsed into a single TexMo2 state.
In this HMM the state transition probabilities capture how one emotional state follows another in a dialog. For the observation probabilities, note that a state must emit an entire sentence, and the number of possible sentences is unbounded, so the observation probability of a sentence cannot be tabulated directly. Instead, the observation symbol probability b_j(t) of (4.1) is computed with the naive Bayes independence assumption:

    b_j(t) = P(v_t \mid q_j) = \prod_{i=1}^{|v_t|} P(w_{it} \mid q_j)

where v_t is the t-th sentence and w_{it} is the i-th word (symbol) of v_t. The probabilities P(w_{it}|q_j) are exactly the conditional probabilities estimated by the naive Bayes learner of TexMo1.
The Bayesian network of the resulting model is shown in [Figure 5].

[Figure 5] Bayesian network of the hybrid naive Bayes HMM: the states form a Markov chain q_{t-1} → q_t → q_{t+1}, and each state q_t independently emits the words w_{1t}, w_{2t}, …, w_{|v_t|t} of sentence v_t
The parameters of the hybrid model are therefore the state transition probabilities a_ij and the observation symbol probabilities b_j(l). Frasconi et al. [4] re-estimate b_j(l) with an EM procedure; TexMo2 instead takes the naive Bayes estimates of TexMo1 as they are, and estimates a_ij by counting transitions between tags in the tagged training dialogs.
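A sketch of how the naive Bayes estimates supply the HMM observation term, reusing the NaiveBayes class from Chapter 3 (again an illustration, not the original code):

    import math
    import numpy as np

    def observation_logprobs(nb, states, sentences):
        # logB[t, j] = log b_j(t) = sum_i log P(w_it | q_j), from naive Bayes.
        logB = np.zeros((len(sentences), len(states)))
        for t, words in enumerate(sentences):
            for j, q in enumerate(states):
                # nb.log_prob includes the prior term log P(q_j); remove it,
                # since the HMM supplies pi and a_ij instead.
                logB[t, j] = nb.log_prob(words, q) - math.log(nb.prior[q] / nb.n)
        return logB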
5.2.2 System structure

[Figure 6] shows the overall structure of the hybrid naive Bayes HMM system TexMo2.

First, the naive Bayes learner is trained on the tagged chat data, yielding the probabilities P(w_k|q_j) and P(q_j); these supply the observation symbol probabilities b_j(k) of the HMM. The remaining parameters of λ = (A, B, π), the transition probabilities A, are estimated from the tag sequences of the same training dialogs.

Classification of test data then becomes a decoding problem. Rather than decoding each sentence as a single observation, TexMo2 decodes N consecutive sentences together as one observation sequence: the Viterbi algorithm finds the most probable sequence of N emotional states, with the naive Bayes observation probability attached to each of the N sentences. The emotion assigned to a sentence is its state in this best sequence, so each decision reflects not only the sentence itself but also the sentences around it; a sketch of this decoding step follows below.
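A log-space sketch of this window decoding, combining the observation_logprobs helper above with the Viterbi recursion of Section 4.2 (our own illustration; A and the state list are assumed to come from the tagged training data):

    def decode_window(nb, states, A, pi, sentences):
        # Emotion codes for N consecutive sentences via log-space Viterbi.
        logB = observation_logprobs(nb, states, sentences)   # N x |states|
        with np.errstate(divide="ignore"):                   # zero transitions -> -inf
            logA, logpi = np.log(A), np.log(pi)
        T, S = logB.shape
        psi = np.zeros((T, S), dtype=int)
        delta = logpi + logB[0]
        for t in range(1, T):
            scores = delta[:, None] + logA
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + logB[t]
        q = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            q.append(int(psi[t, q[-1]]))
        return [states[j] for j in reversed(q)]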
[Figure 6] Structure of the hybrid naive Bayes HMM system TexMo2: chatting data are preprocessed and tagged; the naive Bayes learner estimates P(w_k|q_j) and P(q_j), which define b_j(k); the hybrid learner estimates a_ij; test data are decoded with the Viterbi method into emotion codes
6 Experiments

6.1 Training data

The training data are on-line chatting dialogs collected from actual chat sessions and hand-tagged, sentence by sentence, with the emotion codes described in Chapter 5; the full training corpus contains 40,000 tagged sentences, tagged once with the 25 classes of TexMo1 and once with the 7 classes of TexMo2.

[Table 8] shows how tagged sentences are distributed over the emotion classes. The distribution is highly skewed: in the TexMo1 tagging the plain-statement class 18 alone accounts for well over half of the sentences, and in the TexMo2 tagging the neutral state 0 is similarly dominant. Sentences carrying a distinct emotion are comparatively rare, which makes the task harder: a classifier must recognize the minority emotional classes without being swamped by the neutral majority.
[Table 8] Class distribution of the tagged training sentences for TexMo1 and TexMo2

TexMo1 (naive Bayes)          TexMo2 (hybrid naive Bayes HMM)
 1.  310                       0. 3072
 2.  234                       1. 1162
 3.  110                       2.  513
 4.  236                       3.  969
 5.  191                       4.  453
 6.   21                       5.  288
 7.    5                       6.  134
 8.  253
 9.   64
10.   35
11.   89
12.  129
13.   72
14.   28
15.   80
16.   31
17.   33
18. 5164
19.  442
20.  226
21.   33
22.   29
23.  156
24.    6
25.    0
Hand-tagging emotions is inherently subjective, so the training data unavoidably contain inconsistently tagged sentences. Such tagging errors act as noise, and the naive Bayes and HMM parameters must be learned robustly in spite of it.
6.2 Results

The test data were collected and tagged in the same way as the training data but kept separate from them. The test set contains 1013 sentences, of which 420 carry a distinct (non-neutral) emotion. TexMo1 and TexMo2 were both evaluated on this set.

[Table 9] shows the state transition probabilities of the hybrid naive Bayes HMM learned by TexMo2 from the tagged training dialogs.
[Table 9] State transition probabilities of the hybrid naive Bayes HMM: entry (i, j) is the probability of moving from state i at time t-1 (rows) to state j at time t (columns), states 0-6 as in [Table 7]

        0     1     2     3     4     5     6
0    0.76  0.02  0.0   0.15  0.04  0.01  0.01
1    0.47  0.29  0.0   0.13  0.07  0.01  0.02
2    0.48  0.03  0.23  0.17  0.03  0.02  0.03
3    0.35  0.03  0.0   0.14  0.25  0.22  0.01
4    0.45  0.25  0.0   0.15  0.12  0.01  0.01
5    0.45  0.04  0.22  0.17  0.06  0.04  0.02
6    0.33  0.15  0.1   0.13  0.03  0.0   0.25
Every state has a large transition probability back to the neutral state 0, and several states also show a marked tendency to persist: the diagonal entries for states 0, 1, 2, and 6 are among the largest in their rows. Consecutive sentences are thus far from independent, which is precisely the dependence the HMM is meant to exploit.

[Table 10] reports the accuracy of the four system variants: TexMo1 and TexMo2, each with and without the preprocessor of Section 5.1. The hybrid naive Bayes HMM of TexMo2 outperforms the pure naive Bayes of TexMo1 in every configuration, and the preprocessor improves both systems.
[Table 10] Accuracy of TexMo1 and TexMo2 (%), over all test sentences and over the emotional test sentences only

                                   all sentences   emotional sentences
TexMo1 (naive Bayes)
  without preprocessing                 84.6              72.3
  with preprocessing                    91.6              74.9
TexMo2 (hybrid naive Bayes HMM)
  without preprocessing                 86.9              76.0
  with preprocessing                    93.7              78.1
[Figure 7] and [Figure 8] plot the accuracy of TexMo1 and TexMo2 as the amount of training data grows. Both systems improve steadily with more training sentences; the four curves CP, CA, SP, and SA correspond to the four accuracy measures of [Table 10]. Recall that the HMM observation symbol probabilities of TexMo2 are supplied by the naive Bayes estimates, so both systems are trained from the same tagged data.
[Figure 7] Accuracy of the naive Bayes system TexMo1 as a function of training-set size (x-axis: number of training sentences, unit 1000; y-axis: accuracy in %; curves CP, CA, SP, SA)
[Figure 8] Accuracy of the hybrid naive Bayes HMM system TexMo2 as a function of training-set size (same axes and curves as [Figure 7])
[Figure 9] compares the two systems directly on the CP measure. TexMo2 lies above TexMo1 over the whole range of training-set sizes.

[Figure 9] CP accuracy of TexMo1 and TexMo2 as a function of training-set size (x-axis: number of training sentences, unit 1000; y-axis: accuracy in %)
Finally, the HMM of TexMo2 decodes N consecutive sentences at a time, so the window size N is a parameter of the system. [Figure 10] plots accuracy as a function of N. Accuracy rises as more context becomes available and is best around N = 4; beyond that, adding still earlier sentences brings no further gain.

[Figure 10] Accuracy of TexMo2 as a function of the number N of consecutive sentences decoded together (N = 1, …, 10)
7 Conclusion

This thesis presented statistical systems that determine the emotional state of the speaker in each sentence of on-line chatting text. Because chat sentences abound in misspellings, jargon, and slang, the words of each sentence are first transformed into more abstract symbols. TexMo1 applies a naive Bayes classifier to the symbols of the current sentence; adding a preprocessor that extracts emoticons, onomatopoeic words, and question marks improves its accuracy considerably. TexMo2 embeds the naive Bayes estimates in an HMM as observation probabilities, so that the emotional state of a sentence is decoded jointly with the states of the preceding sentences; this hybrid naive Bayes HMM outperforms TexMo1, reaching 93.7% accuracy with the preprocessor attached.

Several directions remain for future work, such as enlarging the tagged training corpus and refining the emotion classes and preprocessing patterns.
References
[1] L. R. Rabiner. A tutorial on hidden Markov models and selected
applications in speech recognition. Proceedings of the IEEE,
77(2):257-286, 1989.
[2] A. Stolcke and S. Omohundro. Hidden Markov Model induction by
Bayesian model merging. In S. J. Hanson, J. D. Cowan, and C. L.
Giles, editors, Advances in Neural Information Processing Systems,
volume 5, pages 11-18, Morgan Kaufmann, San Mateo, CA, 1993.
[3] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification
from labeled and unlabeled documents using EM. Machine Learning,
39(2/3):103-134, 2000.
[4] P. Frasconi, G. Soda, and A. Vullo. Text categorization for multi-page
documents: A hybrid naive Bayes HMM approach. In JCDL, pages 11-20,
2001.
[5] Y. Yang and C. Chute. An example-based mapping method for text
classification and retrieval. ACM Transactions on Information Systems,
12(3):252-277, 1994.
[6] D. Freitag and A. McCallum. Information extraction with HMM
structures learned by stochastic optimization. In Proc. 17th National
Conference on Artificial Intelligence (AAAI-2000), Austin, TX, 2000.
[7] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the
construction of internet portals with machine learning. Information
Retrieval Journal, 3:127-163, 2000.
[8] W. W. Cohen. Text categorization and relational learning. In
Proceedings of the Twelfth International Conference on Machine
Learning, Lake Tahoe, California, 1995.
[9] D. Lewis and W. Gale. A sequential algorithm for training text
classifiers. In SIGIR-94, 1994.
[10] D. Lewis and M. Ringuette. Comparison of two learning algorithms for
text categorization. In Proc. 3rd Annual Symposium on Document
Analysis and Information Retrieval, 1994.
[11] T. Joachims. Transductive inference for text classification using
support vector machines. In Int. Conf. on Machine Learning, 1999.
[12] T. Joachims. Text categorization with support vector machines:
Learning with many relevant features. In Proceedings of the European
Conference on Machine Learning. Springer, 1998.
[13] Y. Bengio and P. Frasconi. An input output HMM architecture. In G.
Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural
Information Processing Systems 7, pages 427-434. MIT Press, 1995.
[14] H. Lucke. Bayesian belief networks as a tool for stochastic parsing.
Speech Communication, 16:89-118, 1995.
[15] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. Morgan Kaufmann, 1988.
[16] D. Koller and M. Sahami. Hierarchically classifying documents using
very few words. In Proc. Fourteenth Int. Conf. on Machine Learning,
1997.
[17] J. Kupiec. Robust part-of-speech tagging using a hidden Markov
model. Computer Speech and Language, 6, 1992.
[18] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence
networks for hidden Markov probability models. Neural Computation,
9(2):227-269, 1997.
[19] F. Jensen. An Introduction to Bayesian Networks. Springer Verlag,
New York, 1996.
[20] H. Ng, W. Goh, and K. Low. Feature selection, perceptron learning,
and a usability case study for text categorization. In Proc. of the 20th
Int. ACM SIGIR Conference on Research and Development in
Information Retrieval, pages 67-73, 1997.
[21] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[22] T. Kalt. A new probabilistic model of text classification and retrieval.
CIIR TR98-18, University of Massachusetts, 1996.
[23] T. Starner. Visual Recognition of American Sign Language Using
Hidden Markov Models. Master's thesis, MIT, Media Laboratory, 1995.
[24] A text classification study using naive Bayes and Boosting (in Korean), 2001.
Abstract
Recently, a huge number of documents has become scattered over the Web,
so automatic classification techniques help us organize large text
collections efficiently. In addition, with the development of graphical
user interfaces, there are numerous efforts to satisfy users by injecting
such features into chatting and messenger systems. In particular, several
researchers have tried to extract emotional states from text documents.

In this paper, we present an emotional classification system which
analyzes on-line text chatting dialogs and determines the emotional state
of the speaker in each sentence, based on statistical learning methods.
Even a single sentence of a chatting dialog may contain many misspelled
words, jargon, and slang, so it is extremely hard to analyze such dialogs
with natural language processing techniques. Instead, we overcome this
problem by transforming each word appearing in the given documents into a
corresponding, more abstract symbol, from which statistical data are
constructed. With these symbol-based representations, we determine the
emotional state of each sentence. Our previous approach uses the naive
Bayes algorithm, which finds emotional states using only the current
sentence. The second system, proposed in this paper, is a hybridization
of naive Bayes and the HMM method, which determines emotional states from
not only the current sentence but also the immediately preceding ones.

In the experiments, the hybrid method showed remarkable performance
compared with the pure naive Bayes method. In addition, we were able to
improve the performance further by attaching a preprocessor which
automatically extracts emoticons, onomatopoeic words, question marks, and
the like, which are widely believed to be valuable cues to the emotional
state of a sentence. Trained with 40,000 sentences, we obtained 93.7%
accuracy with the final hybrid system and attached preprocessor on 1013
test sentences.