Emotional Classification from Text Data
by Hybrid Naive Bayes HMM

February 2002

Abstract

A huge number of documents is scattered over the Web, and automatic text classification helps to organize them efficiently. This thesis presents a system that determines the emotional state of the speaker in each sentence of on-line chatting text by statistical learning. Because chat sentences are full of misspellings, jargon, and slang, each word is first transformed into a more abstract symbol, and classification is performed over these symbols. The first system uses the naive Bayes algorithm on the current sentence only; the second combines naive Bayes with an HMM into a hybrid naive Bayes HMM that also exploits the immediately preceding sentences. A preprocessor that extracts emoticons, onomatopoeic words, and question marks further improves performance. Trained on 40,000 sentences and tested on 1013 sentences, the final hybrid system achieves 93.7% accuracy.

Keywords: emotional classification, HMM, naive Bayes
Student number: 2000-21205
Contents

1 Introduction
2 Related Work
3 Naive Bayes
4 Hidden Markov Models
  4.1 Forward algorithm, Backward algorithm
  4.2 Viterbi algorithm
  4.3 Baum-Welch algorithm
5 The TexMo System
  5.1 TexMo1: naive Bayes
  5.2 TexMo2: Hybrid naive Bayes HMM
6 Experiments
  6.1 Training data
  6.2 Results
7 Conclusion
References
Abstract
1 Introduction

A huge number of documents is scattered over the Web, and automatic classification techniques help us organize these documents efficiently. At the same time, much everyday communication now takes place through e-mail, instant messengers, and chatting systems, and there are numerous efforts to make such systems more satisfying to their users. One such effort is to convey the emotional state of each speaker, for example through an animated character in the chatting interface. This thesis addresses the problem of automatically determining the emotional state of a speaker from the text of an on-line chatting dialog.

Text alone carries no tone of voice or facial expression, so chat users have long supplemented their sentences with emoticons.
Emoticon: a compound of Emotion and Icon, that is, a sign built from ordinary text characters that expresses the writer's emotional state. [Table 1] shows typical examples.

[Table 1] Examples of emoticons
^^   :-(   :-)   -.-;   ^_^   :-P   ^o^   -o-   ^^;   T_T   ^^a   :'(
Emoticons of this kind go back to the smileys used among Unix users in the 1970s, and they remain the most common way to inject emotion into plain text. They are, however, only a partial remedy: users must insert them explicitly, and many sentences carry emotion without any emoticon at all.
With the development of graphical user interfaces (GUI), chatting and messenger clients increasingly present each participant as an animated character, as in the MSN Messenger example of [Figure 1]. If the emotional state of a speaker could be recognized automatically from what is typed, such a character could display the appropriate expression without any extra effort from the user.

[Figure 1] A character-based chatting interface (MSN Messenger)
The statistical learning methods used in this thesis are naive Bayes and the Hidden Markov Model (HMM). Naive Bayes is a simple but highly practical classification method [21]; it determines the emotion of a sentence from that sentence alone. By combining an HMM with naive Bayes we obtain a hybrid naive Bayes HMM classifier that also takes the immediately preceding sentences into account.

The rest of this thesis is organized as follows. Chapter 2 reviews related work. Chapters 3 and 4 describe naive Bayes and the HMM. Chapter 5 presents the two emotional classification systems. Chapter 6 reports the experimental results, and Chapter 7 concludes.
2 Related Work

Text categorization is the task of assigning text documents to predefined categories. Because manual categorization of large document collections is expensive, a wide range of statistical learning methods has been applied to it: regression models by Yang [5], inductive logic programming by Cohen [8], decision trees by Lewis [10], neural-network (perceptron) learning by Goh and colleagues [20], support vector machines by Joachims [12], and naive Bayes classifiers [9, 16, 21].

Hidden Markov Models (HMM) are probabilistic models for sequences. In speech recognition, the classic application, an utterance is modeled as a sequence of hidden states that emit observed spectra, governed by state transition probabilities and observation probabilities. Rabiner [1] gives a comprehensive tutorial, including parameter estimation with the Baum-Welch algorithm and decoding of the most likely state sequence with the Viterbi algorithm.
HMMs are also widely used for part-of-speech tagging, where the hidden states are parts of speech and the observations are words, and the tag sequence is recovered by decoding. HMM parameters can be learned in a supervised fashion, by counting over data whose state sequences are known, or in an unsupervised fashion with the Baum-Welch algorithm; decoding is done with the Viterbi algorithm. Kupiec [17] built a robust HMM part-of-speech tagger along these lines.

An application of HMMs to text categorization is due to Frasconi et al. [4], who classified the pages of multi-page OCR-scanned documents. In their model the HMM captures the sequential dependence between consecutive pages, while the observation probability of each page is estimated with naive Bayes, giving a hybrid naive Bayes HMM. The present work adopts the same hybrid structure, with the sentences of a dialog in place of the pages of a document.

Learning from both labeled and unlabeled data has been studied by Nigam et al. [3], who combine naive Bayes with EM, and by Joachims [11] with transductive support vector machines. Finally, HMMs have been applied to information extraction [6, 7], where the model structure itself is learned from data.
3 Naive Bayes

Naive Bayes is one of the most practical statistical learning methods [21]. In our setting, each sentence is described by the attribute values ⟨a_1, a_2, …, a_n⟩ (the symbols obtained from its words), and the classifier must assign it one of the emotion tags in a finite set S. Labeling a sentence thus means choosing a target value q_j ∈ S.

The Bayesian approach is to choose the most probable target value given the attribute values, the maximum a posteriori (MAP) value q_MAP:

    q_{MAP} = \arg\max_{q_j \in S} P(q_j \mid a_1, a_2, \ldots, a_n)

Rewriting this with Bayes' theorem gives equation (2.1):

    q_{MAP} = \arg\max_{q_j \in S} \frac{P(a_1, a_2, \ldots, a_n \mid q_j) \, P(q_j)}{P(a_1, a_2, \ldots, a_n)}
            = \arg\max_{q_j \in S} P(a_1, a_2, \ldots, a_n \mid q_j) \, P(q_j)    (2.1)
P(q_j) can be estimated simply by counting how often q_j occurs in the training data. Estimating P(a_1, a_2, …, a_n | q_j) directly, however, is infeasible: the number of possible attribute-value combinations is enormous, so no realistic training set covers them all.

The naive Bayes method therefore assumes that the attribute values are mutually independent given the target value: the probability of observing the conjunction a_1, a_2, …, a_n in a sentence with tag q_j is the product of the probabilities of the individual attributes,

    P(a_1, a_2, \ldots, a_n \mid q_j) = \prod_i P(a_i \mid q_j)    (2.2)

The corresponding Bayesian network is shown in [Figure 2]: the attributes a_1, …, a_n depend only on the class q_j, not on one another.

[Figure 2] Bayesian network of naive Bayes

Substituting (2.2) into (2.1) yields the naive Bayes classifier.
Naive Bayes classifier:

    q_{NB} = \arg\max_{q_j \in S} P(q_j) \prod_i P(a_i \mid q_j)    (2.3)

Here q_NB denotes the target value output by the naive Bayes classifier. The number of distinct terms P(a_i | q_j) that must be estimated is far smaller than the number of terms P(a_1, a_2, …, a_n | q_j), so they can be estimated reliably from training data.

Learning in naive Bayes thus amounts to estimating P(q_j) and P(a_i | q_j) from their frequencies in the training data; classification with (2.3) requires no explicit search through a hypothesis space. When the independence assumption holds, q_NB equals the MAP classification q_MAP of (2.1). Although the assumption rarely holds exactly, naive Bayes performs remarkably well in practice, for example on the TREC-8 text collections [24].
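The estimation and the argmax of (2.3) fit in a few lines of code. The following is a minimal sketch in Python, with Laplace smoothing for unseen symbols (a detail the thesis does not specify); the class and variable names are ours, not TexMo's:

    from collections import Counter, defaultdict
    import math

    class NaiveBayes:
        # Multinomial naive Bayes over word symbols, eq. (2.3) in log space.
        def fit(self, sentences, labels):
            self.prior = Counter(labels)               # counts for P(q_j)
            self.n = len(labels)
            self.word_count = defaultdict(Counter)     # counts for P(a_i | q_j)
            self.total = Counter()
            self.vocab = set()
            for words, q in zip(sentences, labels):
                for w in words:
                    self.word_count[q][w] += 1
                    self.total[q] += 1
                    self.vocab.add(w)

        def log_prob(self, words, q):
            # log P(q_j) + sum_i log P(a_i | q_j): the argument of (2.3)
            lp = math.log(self.prior[q] / self.n)
            v = len(self.vocab)
            for w in words:
                lp += math.log((self.word_count[q][w] + 1) / (self.total[q] + v))
            return lp

        def classify(self, words):
            # q_NB = argmax over the tag set S
            return max(self.prior, key=lambda q: self.log_prob(words, q))

For example, after nb = NaiveBayes() and nb.fit([['^^', 'hi'], ['T_T', 'bye']], ['joy', 'sad']), the call nb.classify(['^^']) returns 'joy'.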
4 Hidden Markov Models

An HMM models a process whose states are hidden: we observe only a sequence of symbols, each emitted by some unobservable state. The model is a doubly stochastic process, one stochastic process governing the transitions between hidden states and another governing the emission of observation symbols from each state. HMMs have been applied with great success to speech recognition and part-of-speech tagging. An HMM can also be viewed as a particular Bayesian network [13, 14, 18], as drawn in [Figure 3]: the state at each time step determines both the emitted observation and the next state.

[Figure 3] Bayesian network of an HMM (states S_{t-1}, S_t, S_{t+1} emitting observations v_{t-1}, v_t, v_{t+1})

Following Rabiner [1], an HMM is characterized by the following elements.
The number of states N: the states are hidden, but in many applications they carry physical meaning. The individual states are denoted S = {S_1, S_2, …, S_N}, and the state at time t is denoted q_t.

The number of distinct observation symbols M: each state emits symbols from the alphabet V = {v_1, v_2, …, v_M}.

The state transition probability distribution A = {a_ij}, the probability of moving from state i to state j:

    a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N

with a_{ij} \ge 0 for all i, j and \sum_{j=1}^{N} a_{ij} = 1 for all i. When every state can reach every other state in one step, all a_{ij} > 0; models with a_{ij} = 0 for some pairs correspond to other HMM topologies.

The observation symbol probability distribution B = {b_j(k)}, the probability of emitting symbol k in state j:

    b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N, \; 1 \le k \le M

The initial state distribution π = {π_i}, the distribution over states at t = 1:

    \pi_i = P[q_1 = S_i], \quad 1 \le i \le N

Given N and M, the parameter set of an HMM is therefore the triple

    \lambda = (A, B, \pi)
Three basic problems must be solved for an HMM to be useful in practice [23]:

1) Evaluation problem: given an observation sequence O = O_1 O_2 … O_T and a model λ = (A, B, π), compute the probability P(O|λ) of the sequence under the model.

2) Decoding problem: given O = O_1 O_2 … O_T and λ = (A, B, π), find the state sequence Q = q_1 q_2 … q_T that best explains the observations.

3) Estimation problem: adjust the parameters λ = (A, B, π) so as to maximize P(O|λ).

These are solved by the forward algorithm, the Viterbi algorithm, and the Baum-Welch algorithm, respectively [1].
4.1 Forward algorithm, Backward algorithm

The forward algorithm computes the probability P(O|λ) of an observation sequence O = O_1 O_2 … O_T under a model λ = (A, B, π). Define the forward variable

    \alpha_t(i) = P(O_1 O_2 \ldots O_t, \; q_t = S_i \mid \lambda),

the probability of having observed the partial sequence O_1 O_2 … O_t and being in state S_i at time t. It is computed inductively as in [Table 2].

[Table 2] Forward algorithm for computing the likelihood P(O|λ)

1) Initialization:
    \alpha_1(i) = \pi_i \, b_i(O_1), \quad 1 \le i \le N

2) Induction:
    \alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \Big] b_j(O_{t+1}), \quad 1 \le t \le T-1, \; 1 \le j \le N

3) Termination:
    P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)
The backward algorithm, needed later by the Baum-Welch algorithm, proceeds analogously to the forward algorithm but in reverse time order. Define the backward variable

    \beta_t(i) = P(O_{t+1} O_{t+2} \ldots O_T \mid q_t = S_i, \lambda),

the probability of the remaining observations given state S_i at time t. It is computed as in [Table 3].

[Table 3] Backward algorithm

1) Initialization:
    \beta_T(i) = 1, \quad 1 \le i \le N

2) Induction:
    \beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j), \quad t = T-1, T-2, \ldots, 1, \; 1 \le i \le N
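Both recursions translate directly into code. Below is a sketch with NumPy, our own illustration rather than the thesis' implementation; indices are 0-based, and for long sequences the usual scaling of α and β would be added to avoid numerical underflow:

    import numpy as np

    def forward(A, B, pi, O):
        # alpha[t, i] = P(O_0..O_t, q_t = S_i | lambda); P(O|lambda) = alpha[-1].sum()
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                          # initialization
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
        return alpha

    def backward(A, B, O):
        # beta[t, i] = P(O_{t+1}..O_{T-1} | q_t = S_i, lambda)
        T, N = len(O), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                      # initialization
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # induction
        return beta

Here A is the N x N transition matrix, B the N x M observation matrix, pi the initial distribution, and O a list of symbol indices.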
4.2 Viterbi algorithm

The Viterbi algorithm finds, for an observation sequence O = O_1 O_2 … O_T and a model λ = (A, B, π), the single best state sequence Q = q_1 q_2 … q_T; it solves the decoding problem. Define

    \delta_t(i) = \max_{q_1, q_2, \ldots, q_{t-1}} P[q_1 q_2 \ldots q_t = i, \; O_1 O_2 \ldots O_t \mid \lambda],

the highest probability of any single path that accounts for the first t observations and ends in state S_i. The array ψ_t(j) records, for each state j at time t, the state at time t-1 that maximized this quantity, so the best path can be recovered by backtracking. The procedure is given in [Table 4].

[Table 4] Viterbi algorithm

1) Initialization:
    \delta_1(i) = \pi_i \, b_i(O_1), \quad 1 \le i \le N
    \psi_1(i) = 0

2) Recursion:
    \delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}] \, b_j(O_t), \quad 2 \le t \le T, \; 1 \le j \le N
    \psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i) \, a_{ij}], \quad 2 \le t \le T, \; 1 \le j \le N

3) Termination:
    P^* = \max_{1 \le i \le N} [\delta_T(i)]
    q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)]

4) Backtracking:
    q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \ldots, 1
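In code, reusing the NumPy conventions of the previous sketch (again our own illustration):

    def viterbi(A, B, pi, O):
        # Most probable state sequence for O under lambda = (A, B, pi).
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))            # delta[t, j]: best score ending in S_j
        psi = np.zeros((T, N), dtype=int)   # psi[t, j]: best predecessor of S_j
        delta[0] = pi * B[:, O[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A         # delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, O[t]]
        q = [int(delta[-1].argmax())]                  # termination
        for t in range(T - 1, 0, -1):                  # backtracking
            q.append(int(psi[t, q[-1]]))
        return list(reversed(q)), float(delta[-1].max())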
4.3 Baum-Welch algorithm

The Baum-Welch algorithm re-estimates the HMM parameters λ = (A, B, π) so as to increase P(O|λ). Define ξ_t(i,j), the probability of being in state S_i at time t and in state S_j at time t+1 given the model and the observations, and γ_t(i), the probability of being in state S_i at time t:

    \xi_t(i,j) = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}
               = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}
                      {\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}

    \gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)

When several observation sequences O^(k) are available, P(O^(k)|λ) is written P_k and the expected counts below are accumulated over all sequences. The re-estimation formulas (4.1) are:

(4.1)
    \bar{\pi}_i = \text{expected frequency in state } S_i \text{ at time } t=1 = \gamma_1(i)

    \bar{a}_{ij} = \frac{\text{expected number of transitions from } S_i \text{ to } S_j}
                        {\text{expected number of transitions from } S_i}
                 = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}

    \bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing } v_k}
                        {\text{expected number of times in state } j}
                 = \frac{\sum_{t=1, \, O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
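One re-estimation pass of (4.1) can be built on the forward and backward sketches above; the following is an illustration for a single observation sequence, not the thesis' code:

    def baum_welch_step(A, B, pi, O):
        # One pass of the re-estimation formulas (4.1).
        alpha, beta = forward(A, B, pi, O), backward(A, B, O)
        gamma = alpha * beta                               # gamma_t(i), unnormalized
        gamma /= gamma.sum(axis=1, keepdims=True)
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, O[1:]].T * beta[1:])[:, None, :])      # xi_t(i, j), unnormalized
        xi /= xi.sum(axis=(1, 2), keepdims=True)
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        obs = np.array(O)
        new_B = np.stack([gamma[obs == k].sum(axis=0)      # sum over t with O_t = v_k
                          for k in range(B.shape[1])], axis=1)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_A, new_B, new_pi

Iterating this step until P(O|λ) stops improving gives the usual EM training of an HMM.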
5 The TexMo System

This chapter describes TexMo, our system for emotional classification of chatting text. TexMo comes in two versions: TexMo1, based on naive Bayes, and TexMo2, based on the hybrid naive Bayes HMM. The two versions share the same preprocessing; they differ in the statistical model used for classification.
5.1 TexMo1 : naive Bayes

The structure of TexMo1 is shown in [Figure 4]: a naive Bayes learner estimates the prior probabilities P(v_j) and the conditional probabilities P(a_i|v_j) from tagged training data, and the classifier applies (2.3) to each new sentence.

[Figure 4] Structure of TexMo1 (learning P(v_j) and P(a_i|v_j) from tagged data)

TexMo1 treats each sentence independently and assigns it one of the emotion classes listed in [Table 5].
[Table 5] The 25 emotion classes of TexMo1 (codes 1 to 25) with representative cues for each class, among them the emoticons ^_^ ^^ ^.^ ^o^ :-D ^^; ^^a T.T T_T :-o :-p --; -.-; -_-; --a, interjections such as ~ and !, and sentence-final question marks
Classes 3 through 7 of [Table 5] are variations of positive emotion, class 18 covers plain declarative sentences with no distinct emotion, and classes 9 through 23 include sadness, surprise, questions, and other expressive types. The training data consist of chat dialogs in which every sentence has been hand-tagged with one of these class codes; [Table 6] shows an excerpt.
[Table 6] An excerpt of a tagged chatting dialog: each utterance is followed by its emotion code (e.g. @1, @3, @12, @18, @19), and utterances with no distinct emotion are tagged with a bare @
The bare tag @ thus serves as the default for sentences whose emotional state cannot be determined from the text.
Classification in TexMo1 uses the naive Bayes rule (2.3): the prior probability P(v_j) of each emotion class and the conditional probabilities P(a_i|v_j) of each symbol given the class are estimated from the tagged training data, and each sentence is assigned the class that maximizes (2.3). The naive Bayes classifier distinguishes classes 1 through 24 of [Table 5]; class 25 is handled separately.

Chat sentences are full of misspellings, jargon, and slang, which makes them hard to analyze with ordinary natural language processing techniques. Instead, TexMo transforms the words of each sentence into more abstract symbols and gathers statistics over these symbols. The preprocessing consists of the following steps.
(1) Onomatopoeic words: laughter, crying, and similar onomatopoeic expressions are detected and mapped to corresponding symbols.

(2) Emoticons: patterns such as ^^, -.-;, :-) are recognized, together with their many minor variations, and replaced by emoticon symbols.

(3) Special ASCII characters: marks such as *, #, @, ^ are mapped to ASCII-symbol tokens.

(4) Postprocessing: the symbol sequence produced by the previous steps is cleaned up before it is passed to the TexMo1 classifier.

(5) Question and exclamation marks: sentence-final ? and ! are extracted as separate symbols, since they are strong cues for classes such as 18 and 11.

The symbol sequences produced by this preprocessing are what the naive Bayes classifier of TexMo1, and later the HMM of TexMo2, actually observe; a sketch of steps (2), (3), and (5) follows below.
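The following Python fragment illustrates the flavor of these steps; the regular expressions and symbol names are invented for the example and are not TexMo's actual pattern tables:

    import re

    # Hypothetical pattern table: emoticon pattern -> abstract symbol.
    EMOTICON_SYMBOLS = [
        (re.compile(r"\^[_.oO]?\^;?a?"), "<SMILE>"),    # ^^, ^_^, ^o^, ^^;, ^^a ...
        (re.compile(r"[T;][._]?[T;]"), "<CRY>"),        # T_T, T.T ...
        (re.compile(r"[-=][._]?[-=];?a?"), "<SWEAT>"),  # -.-;, -_-;, --a ...
        (re.compile(r":'?-?[()PpDdo]"), "<FACE>"),      # :-), :-(, :-P, :'( ...
    ]

    def to_symbols(sentence):
        # Replace emoticons and special marks with abstract symbols (steps 2, 3, 5).
        tokens = []
        for tok in sentence.split():
            for pat, sym in EMOTICON_SYMBOLS:
                if pat.fullmatch(tok):
                    tok = sym
                    break
            else:
                if tok.endswith("?"):                   # step (5): question marks
                    tokens.append("<QMARK>")
                    tok = tok.rstrip("?")
                elif tok.endswith("!"):
                    tokens.append("<EMARK>")
                    tok = tok.rstrip("!")
            if tok:
                tokens.append(tok)
        return tokens

For instance, to_symbols("hello ^^ really?") yields ['hello', '<SMILE>', '<QMARK>', 'really'], which is the symbol sequence the classifiers operate on.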
Naive Bayes, however, determines the emotion of a sentence from that sentence alone. In a dialog the emotional state of a speaker often persists across several utterances, and a short sentence that is ambiguous in isolation can frequently be resolved by the sentences preceding it. This context is invisible to naive Bayes; to exploit it we turn to the HMM.
5.2 TexMo2 : Hybrid naive Bayes HMM

A chatting dialog is naturally modeled with an HMM: the hidden states are the emotional states of the speaker, and the observations are the sentences the speaker types. The emotional state evolves from sentence to sentence according to the state transition probabilities, and each sentence is emitted according to the observation probability of the current state. TexMo2 combines this HMM view with the naive Bayes estimates of TexMo1, as described next.
5.2.1 Model structure

The hidden states of the HMM are emotion classes. TexMo2 uses the seven classes of [Table 7].

[Table 7] The emotion states of TexMo2 (codes 0 to 6, with 0 the neutral state)

These seven classes are obtained by merging the fine-grained classes of [Table 5] used by TexMo1: with twenty-five states the transition probabilities could not be estimated reliably from the available training data, so related TexMo1 classes are collapsed into a single TexMo2 state.
In this HMM the state transition probabilities capture how one emotional state follows another in a dialog. For the observation probabilities, note that a state must emit an entire sentence, and the number of possible sentences is unbounded, so the observation probability of a sentence cannot be tabulated directly. Instead, the observation symbol probability b_j(t) of (4.1) is computed with the naive Bayes independence assumption:

    b_j(t) = P(v_t \mid q_j) = \prod_{i=1}^{|v_t|} P(w_{it} \mid q_j)

where v_t is the t-th sentence and w_{it} is the i-th word (symbol) of v_t. The probabilities P(w_{it}|q_j) are exactly the conditional probabilities estimated by the naive Bayes learner of TexMo1.
The Bayesian network of the resulting model is shown in [Figure 5].

[Figure 5] Bayesian network of the hybrid naive Bayes HMM: the states form a Markov chain q_{t-1} → q_t → q_{t+1}, and each state q_t independently emits the words w_{1t}, w_{2t}, …, w_{|v_t|t} of sentence v_t
The parameters of the hybrid model are therefore the state transition probabilities a_ij and the observation symbol probabilities b_j(l). Frasconi et al. [4] re-estimate b_j(l) with an EM procedure; TexMo2 instead takes the naive Bayes estimates of TexMo1 as they are, and estimates a_ij by counting transitions between tags in the tagged training dialogs.
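A sketch of how the naive Bayes estimates supply the HMM observation term, reusing the NaiveBayes class from Chapter 3 (again an illustration, not the original code):

    import math
    import numpy as np

    def observation_logprobs(nb, states, sentences):
        # logB[t, j] = log b_j(t) = sum_i log P(w_it | q_j), from naive Bayes.
        logB = np.zeros((len(sentences), len(states)))
        for t, words in enumerate(sentences):
            for j, q in enumerate(states):
                # nb.log_prob includes the prior term log P(q_j); remove it,
                # since the HMM supplies pi and a_ij instead.
                logB[t, j] = nb.log_prob(words, q) - math.log(nb.prior[q] / nb.n)
        return logB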
5.2.2 System structure

[Figure 6] shows the overall structure of the hybrid naive Bayes HMM system TexMo2.

First, the naive Bayes learner is trained on the tagged chat data, yielding the probabilities P(w_k|q_j) and P(q_j); these supply the observation symbol probabilities b_j(k) of the HMM. The remaining parameters of λ = (A, B, π), the transition probabilities A, are estimated from the tag sequences of the same training dialogs.

Classification of test data then becomes a decoding problem. Rather than decoding each sentence as a single observation, TexMo2 decodes N consecutive sentences together as one observation sequence: the Viterbi algorithm finds the most probable sequence of N emotional states, with the naive Bayes observation probability attached to each of the N sentences. The emotion assigned to a sentence is its state in this best sequence, so each decision reflects not only the sentence itself but also the sentences around it; a sketch of this decoding step follows below.
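A log-space sketch of this window decoding, combining the observation_logprobs helper above with the Viterbi recursion of Section 4.2 (our own illustration; A and the state list are assumed to come from the tagged training data):

    def decode_window(nb, states, A, pi, sentences):
        # Emotion codes for N consecutive sentences via log-space Viterbi.
        logB = observation_logprobs(nb, states, sentences)   # N x |states|
        with np.errstate(divide="ignore"):                   # zero transitions -> -inf
            logA, logpi = np.log(A), np.log(pi)
        T, S = logB.shape
        psi = np.zeros((T, S), dtype=int)
        delta = logpi + logB[0]
        for t in range(1, T):
            scores = delta[:, None] + logA
            psi[t] = scores.argmax(axis=0)
            delta = scores.max(axis=0) + logB[t]
        q = [int(delta.argmax())]
        for t in range(T - 1, 0, -1):
            q.append(int(psi[t, q[-1]]))
        return [states[j] for j in reversed(q)]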
[Figure 6] Structure of the hybrid naive Bayes HMM system TexMo2: chatting data are preprocessed and tagged; the naive Bayes learner estimates P(w_k|q_j) and P(q_j), which define b_j(k); the hybrid learner estimates a_ij; test data are decoded with the Viterbi method into emotion codes
6 Experiments

6.1 Training data

The training data are on-line chatting dialogs collected from actual chat sessions and hand-tagged, sentence by sentence, with the emotion codes described in Chapter 5; the full training corpus contains 40,000 tagged sentences, tagged once with the 25 classes of TexMo1 and once with the 7 classes of TexMo2.

[Table 8] shows how tagged sentences are distributed over the emotion classes. The distribution is highly skewed: in the TexMo1 tagging the plain-statement class 18 alone accounts for well over half of the sentences, and in the TexMo2 tagging the neutral state 0 is similarly dominant. Sentences carrying a distinct emotion are comparatively rare, which makes the task harder: a classifier must recognize the minority emotional classes without being swamped by the neutral majority.
[Table 8] Class distribution of the tagged training sentences for TexMo1 and TexMo2

TexMo1 (naive Bayes)          TexMo2 (hybrid naive Bayes HMM)
 1.  310                       0. 3072
 2.  234                       1. 1162
 3.  110                       2.  513
 4.  236                       3.  969
 5.  191                       4.  453
 6.   21                       5.  288
 7.    5                       6.  134
 8.  253
 9.   64
10.   35
11.   89
12.  129
13.   72
14.   28
15.   80
16.   31
17.   33
18. 5164
19.  442
20.  226
21.   33
22.   29
23.  156
24.    6
25.    0
Hand-tagging emotions is inherently subjective, so the training data unavoidably contain inconsistently tagged sentences. Such tagging errors act as noise, and the naive Bayes and HMM parameters must be learned robustly in spite of it.
6.2 Results

The test data were collected and tagged in the same way as the training data but kept separate from them. The test set contains 1013 sentences, of which 420 carry a distinct (non-neutral) emotion. TexMo1 and TexMo2 were both evaluated on this set.

[Table 9] shows the state transition probabilities of the hybrid naive Bayes HMM learned by TexMo2 from the tagged training dialogs.
[Table 9] State transition probabilities of the hybrid naive Bayes HMM: entry (i, j) is the probability of moving from state i at time t-1 (rows) to state j at time t (columns), states 0-6 as in [Table 7]

        0     1     2     3     4     5     6
0    0.76  0.02  0.0   0.15  0.04  0.01  0.01
1    0.47  0.29  0.0   0.13  0.07  0.01  0.02
2    0.48  0.03  0.23  0.17  0.03  0.02  0.03
3    0.35  0.03  0.0   0.14  0.25  0.22  0.01
4    0.45  0.25  0.0   0.15  0.12  0.01  0.01
5    0.45  0.04  0.22  0.17  0.06  0.04  0.02
6    0.33  0.15  0.1   0.13  0.03  0.0   0.25
Every state has a large transition probability back to the neutral state 0, and several states also show a marked tendency to persist: the diagonal entries for states 0, 1, 2, and 6 are among the largest in their rows. Consecutive sentences are thus far from independent, which is precisely the dependence the HMM is meant to exploit.

[Table 10] reports the accuracy of the four system variants: TexMo1 and TexMo2, each with and without the preprocessor of Section 5.1. The hybrid naive Bayes HMM of TexMo2 outperforms the pure naive Bayes of TexMo1 in every configuration, and the preprocessor improves both systems.
[Table 10] Accuracy of TexMo1 and TexMo2 (%), over all test sentences and over the emotional test sentences only

                                   all sentences   emotional sentences
TexMo1 (naive Bayes)
  without preprocessing                 84.6              72.3
  with preprocessing                    91.6              74.9
TexMo2 (hybrid naive Bayes HMM)
  without preprocessing                 86.9              76.0
  with preprocessing                    93.7              78.1
[Figure 7] and [Figure 8] plot the accuracy of TexMo1 and TexMo2 as the amount of training data grows. Both systems improve steadily with more training sentences; the four curves CP, CA, SP, and SA correspond to the four accuracy measures of [Table 10]. Recall that the HMM observation symbol probabilities of TexMo2 are supplied by the naive Bayes estimates, so both systems are trained from the same tagged data.
[Figure 7] Accuracy of the naive Bayes system TexMo1 as a function of training-set size (x-axis: number of training sentences, unit 1000; y-axis: accuracy in %; curves CP, CA, SP, SA)
[Figure 8] Accuracy of the hybrid naive Bayes HMM system TexMo2 as a function of training-set size (same axes and curves as [Figure 7])
[Figure 9] compares the two systems directly on the CP measure. TexMo2 lies above TexMo1 over the whole range of training-set sizes.

[Figure 9] CP accuracy of TexMo1 and TexMo2 as a function of training-set size (x-axis: number of training sentences, unit 1000; y-axis: accuracy in %)
Finally, the HMM of TexMo2 decodes N consecutive sentences at a time, so the window size N is a parameter of the system. [Figure 10] plots accuracy as a function of N. Accuracy rises as more context becomes available and is best around N = 4; beyond that, adding still earlier sentences brings no further gain.

[Figure 10] Accuracy of TexMo2 as a function of the number N of consecutive sentences decoded together (N = 1, …, 10)
7 Conclusion

This thesis presented statistical systems that determine the emotional state of the speaker in each sentence of on-line chatting text. Because chat sentences abound in misspellings, jargon, and slang, the words of each sentence are first transformed into more abstract symbols. TexMo1 applies a naive Bayes classifier to the symbols of the current sentence; adding a preprocessor that extracts emoticons, onomatopoeic words, and question marks improves its accuracy considerably. TexMo2 embeds the naive Bayes estimates in an HMM as observation probabilities, so that the emotional state of a sentence is decoded jointly with the states of the preceding sentences; this hybrid naive Bayes HMM outperforms TexMo1, reaching 93.7% accuracy with the preprocessor attached.

Several directions remain for future work, such as enlarging the tagged training corpus and refining the emotion classes and preprocessing patterns.
References
[1] L. R. Rabiner. A tutorial on hidden Markov models and selected
applications in speech recognition. Proceedings of the IEEE,
77(2):257-286, 1989.
[2] A. Stolcke and S. Omohundro. Hidden Markov Model induction by
Bayesian model merging. In S. J. Hanson, J. D. Cowan, and C. L.
Giles, editors, Advances in Neural Information Processing Systems,
volume 5, pages 11-18, Morgan Kaufmann, San Mateo, CA, 1993.
[3] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification
from labeled and unlabeled documents using EM. Machine Learning,
39(2/3):103-134, 2000.
[4] P. Frasconi, G. Soda, and A. Vullo. Text categorization for multi-page
documents: A hybrid naive Bayes HMM approach. In JCDL, pages 11-20,
2001.
[5] Y. Yang and C. Chute. An example-based mapping method for text
classification and retrieval. ACM Transactions on Information Systems,
12(3):252-277, 1994.
[6] D. Freitag and A. McCallum. Information extraction with HMM
structures learned by stochastic optimization. In Proc. 17th National
Conference on Artificial Intelligence (AAAI-2000), Austin, TX, 2000.
[7] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the
construction of internet portals with machine learning. Information
Retrieval Journal, 3:127-163, 2000.
[8] W. W. Cohen. Text categorization and relational learning. In
Proceedings of the Twelfth International Conference on Machine
Learning, Lake Tahoe, California, 1995.
[9] D. Lewis and W. Gale. A sequential algorithm for training text
classifiers. In SIGIR-94, 1994.
[10] D. Lewis and M. Ringuette. Comparison of two learning algorithms for
text categorization. In Proc. 3rd Annual Symposium on Document
Analysis and Information Retrieval, 1994.
[11] T. Joachims. Transductive inference for text classification using
support vector machines. In Int. Conf. on Machine Learning, 1999.
[12] T. Joachims. Text categorization with support vector machines:
Learning with many relevant features. In Proceedings of the European
Conference on Machine Learning. Springer, 1998.
[13] Y. Bengio and P. Frasconi. An input output HMM architecture. In G.
Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural
Information Processing Systems 7, pages 427-434. MIT Press, 1995.
[14] H. Lucke. Bayesian belief networks as a tool for stochastic parsing.
Speech Communication, 16:89-118, 1995.
[15] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of
Plausible Inference. Morgan Kaufmann, 1988.
[16] D. Koller and M. Sahami. Hierarchically classifying documents using
very few words. In Proc. Fourteenth Int. Conf. on Machine Learning,
1997.
[17] J. Kupiec. Robust part-of-speech tagging using a hidden Markov
model. Computer Speech and Language, 6, 1992.
[18] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence
networks for hidden Markov probability models. Neural Computation,
9(2):227-269, 1997.
[19] F. Jensen. An Introduction to Bayesian Networks. Springer Verlag,
New York, 1996.
[20] H. Ng, W. Goh, and K. Low. Feature selection, perceptron learning,
and a usability case study for text categorization. In Proc. of the 20th
Int. ACM SIGIR Conference on Research and Development in
Information Retrieval, pages 67-73, 1997.
[21] T. Mitchell. Machine Learning. McGraw-Hill, 1997.
[22] T. Kalt. A new probabilistic model of text classification and retrieval.
CIIR TR98-18, University of Massachusetts, 1996.
[23] T. Starner. Visual Recognition of American Sign Language Using
Hidden Markov Models. Master's thesis, MIT, Media Laboratory, 1995.
[24] A text classification study using naive Bayes and Boosting (in Korean), 2001.
Abstract
Recently, a huge number of documents has become scattered over the Web,
so automatic classification techniques help us organize large text
collections efficiently. In addition, with the development of graphical
user interfaces, there are numerous efforts to satisfy users by injecting
such features into chatting and messenger systems. In particular, several
researchers have tried to extract emotional states from text documents.

In this paper, we present an emotional classification system which
analyzes on-line text chatting dialogs and determines the emotional state
of the speaker in each sentence, based on statistical learning methods.
Even a single sentence of a chatting dialog may contain many misspelled
words, jargon, and slang, so it is extremely hard to analyze such dialogs
with natural language processing techniques. Instead, we overcome this
problem by transforming each word appearing in the given documents into a
corresponding, more abstract symbol, from which statistical data are
constructed. With these symbol-based representations, we determine the
emotional state of each sentence. Our previous approach uses the naive
Bayes algorithm, which finds emotional states using only the current
sentence. The second system, proposed in this paper, is a hybridization
of naive Bayes and the HMM method, which determines emotional states from
not only the current sentence but also the immediately preceding ones.

In the experiments, the hybrid method showed remarkable performance
compared with the pure naive Bayes method. In addition, we were able to
improve the performance further by attaching a preprocessor which
automatically extracts emoticons, onomatopoeic words, question marks, and
the like, which are widely believed to be valuable cues to the emotional
state of a sentence. Trained with 40,000 sentences, we obtained 93.7%
accuracy with the final hybrid system and attached preprocessor on 1013
test sentences.