Master of Engineering Thesis

Emotional Classification from Text Data by Hybrid Naive Bayes HMM
(Hybrid Naive Bayes HMM 기법을 사용한 텍스트로부터의 감정 분류)

February 2002

Seoul National University, Graduate School
Department of Computer Engineering
Hyun-Gu Moon (문현구)


Abstract

Emotional Classification from Text Data by Hybrid Naive Bayes HMM

February 2002

This thesis presents a statistical system that classifies the emotional state expressed in on-line chatting text. Chatting sentences contain many misspelled words, jargon, and slang, so full natural language analysis is impractical; instead, each word is mapped to a more abstract symbol, and the emotional state of each sentence is determined from statistics over these symbols. A first classifier based on naive Bayes uses only the current sentence, while the proposed hybrid naive Bayes HMM classifier also exploits the immediately preceding sentences; a preprocessor additionally extracts emoticons, onomatopoeic words, and question marks. Trained on about 40,000 sentences and tested on 1,013 sentences, the hybrid system achieved 93.7% accuracy.

Keywords: HMM, naive Bayes

Student number: 2000-21205

Contents

Chapter 1  Introduction
Chapter 2  Related Work
Chapter 3  Naive Bayes
Chapter 4  Hidden Markov Models
    4.1  Forward algorithm, Backward algorithm
    4.2  Viterbi algorithm
    4.3  Baum-Welch algorithm
Chapter 5  System Design
    5.1  TexMo1: naive Bayes
    5.2  TexMo2: Hybrid naive Bayes HMM
Chapter 6  Experiments
    6.1  Experimental data
    6.2  Results
Chapter 7  Conclusion
References
Abstract

Chapter 1  Introduction

People increasingly communicate through text media such as e-mail, instant messengers, and on-line chatting. Plain characters, however, convey the writer's emotional state poorly, so users of these media commonly resort to emoticons.

An emoticon is a sign built from ordinary characters that depicts an emotion (emotion + icon); [Table 1] shows typical examples.

[Table 1] Examples of emoticons
    ^^    :(    :)    -.-;    ^_^    :-P    ^o^    -o-    ^^;    T_T    ^^a    :'(

Smileys of this kind have been in use since the 1970s on Unix systems, and they remain the most common way of signaling emotion in text conversation.

With the development of graphical user interfaces (GUI), messenger programs such as MSN Messenger ([Figure 1]) also try to convey emotion through graphical characters; recognizing the emotional state automatically from the text itself is a natural next step.

[Figure 1] MSN Messenger (screenshot)

This thesis classifies the emotional state expressed in chatting sentences using two statistical models: the naive Bayes classifier and the Hidden Markov Model (HMM). Naive Bayes is a simple but effective classifier widely used for text [21]. By combining an HMM with naive Bayes we obtain a hybrid naive Bayes HMM classifier that also takes the preceding sentences into account.

The rest of the thesis is organized as follows. Chapter 2 reviews related work. Chapters 3 and 4 describe the naive Bayes classifier and the HMM. Chapter 5 presents the proposed systems, Chapter 6 reports the experimental results, and Chapter 7 concludes.

Chapter 2  Related Work

Text categorization assigns documents to predefined categories. A wide range of machine learning methods has been applied to this task: regression models (Yang [5]), inductive logic programming (Cohen [8]), decision trees (Lewis [10]), neural networks (Ng, Goh, and Low [20]), support vector machines (Joachims [12]), and probabilistic naive Bayes approaches [9, 16, 21].

Hidden Markov Models (HMM) are probabilistic models over sequences. In speech recognition, an HMM models the observed spectrum of speech with hidden states connected by transition probabilities. Rabiner [1] gives a tutorial covering the Baum-Welch algorithm for parameter estimation and the Viterbi algorithm for finding the best state sequence.

HMMs are also standard in part-of-speech tagging, where the hidden states are tags and the observations are words. The model can be trained either by supervised learning from a tagged corpus or by unsupervised learning; the most likely tag sequence is found with the Viterbi algorithm, and the parameters can be re-estimated with the Baum-Welch algorithm. Kupiec [17] describes robust HMM-based part-of-speech tagging.

Frasconi et al. [4] applied an HMM to the categorization of multi-page documents, building a hybrid model in which the observation probabilities of the HMM are computed by naive Bayes classifiers; the method was evaluated on OCR-scanned documents. For classification with both labeled and unlabeled data, Nigam et al. [3] use EM and Joachims [11] uses transductive support vector machines. HMMs have also been used for information extraction [6, 7].

Chapter 3  Naive Bayes

Naive Bayes is one of the most widely used statistical classifiers [21]. Let $S$ be the set of target classes. Given an instance described by attribute values $a_1, a_2, \dots, a_n$, the classifier outputs a class label; in our setting, tagging (labeling) a sentence with an emotional state is exactly such a problem.

Under the Bayesian framework, the most probable class is the maximum a posteriori (MAP) class:

$$q_{MAP} = \arg\max_{q_j \in S} P(q_j \mid a_1, a_2, \dots, a_n)$$

Applying Bayes' rule gives equation (2.1):

$$q_{MAP} = \arg\max_{q_j \in S} \frac{P(a_1, a_2, \dots, a_n \mid q_j)\, P(q_j)}{P(a_1, a_2, \dots, a_n)}
         = \arg\max_{q_j \in S} P(a_1, a_2, \dots, a_n \mid q_j)\, P(q_j) \qquad (2.1)$$

$P(q_j)$ can be estimated from the frequency of class $q_j$ in the training data, but estimating $P(a_1, a_2, \dots, a_n \mid q_j)$ directly is infeasible, because the number of possible attribute combinations is enormous.

Naive Bayes therefore assumes that the attribute values are mutually independent given the class: the probability of observing $a_1, a_2, \dots, a_n$ in class $q_j$ is the product of the individual probabilities,

$$P(a_1, a_2, \dots, a_n \mid q_j) = \prod_i P(a_i \mid q_j) \qquad (2.2)$$

[Figure 2] shows this assumption as a Bayesian network: the class node $q_j$ is the single parent of the attribute nodes $a_1, a_2, \dots, a_n$.

[Figure 2] Bayesian network of the naive Bayes model

Substituting (2.2) into (2.1) yields the naive Bayes classifier:

$$q_{NB} = \arg\max_{q_j \in S} P(q_j) \prod_i P(a_i \mid q_j) \qquad (2.3)$$

$q_{NB}$ is the class output by naive Bayes. Because $P(a_i \mid q_j)$ involves only a single attribute, it can be estimated reliably from the training data, unlike the joint $P(a_1, a_2, \dots, a_n \mid q_j)$. Training therefore amounts to estimating $P(q_j)$ and $P(a_i \mid q_j)$ from their frequencies, and classification evaluates (2.3) for each class $q_j$. When the independence assumption holds, $q_{NB}$ equals the MAP class of (2.1).

Despite its simplicity, naive Bayes performs well in text classification and has shown competitive results, for example in TREC-8 experiments [24].
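As an illustration, the following is a minimal sketch of such a classifier, assuming each training example is already a list of symbols (words, emoticon codes, and so on) with one emotion label; the class and variable names are illustrative and not part of the thesis.

    import math
    from collections import Counter, defaultdict

    # Minimal naive Bayes classifier implementing Eq. (2.3) with Laplace smoothing.
    class NaiveBayes:
        def __init__(self, alpha=1.0):
            self.alpha = alpha      # smoothing constant
            self.prior = {}         # P(q_j)
            self.cond = {}          # P(a_i | q_j)
            self.vocab = set()

        def train(self, examples, labels):
            class_counts = Counter(labels)
            word_counts = defaultdict(Counter)
            for words, label in zip(examples, labels):
                word_counts[label].update(words)
                self.vocab.update(words)
            total = sum(class_counts.values())
            for q in class_counts:
                self.prior[q] = class_counts[q] / total
                denom = sum(word_counts[q].values()) + self.alpha * len(self.vocab)
                self.cond[q] = {w: (word_counts[q][w] + self.alpha) / denom
                                for w in self.vocab}

        def classify(self, words):
            # argmax_j [ log P(q_j) + sum_i log P(a_i | q_j) ]
            def score(q):
                return math.log(self.prior[q]) + sum(
                    math.log(self.cond[q][w]) for w in words if w in self.vocab)
            return max(self.prior, key=score)

Training estimates $P(q_j)$ and $P(a_i \mid q_j)$ by counting, exactly as described above; classification maximizes the log of (2.3).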

Chapter 4  Hidden Markov Models

An HMM is a doubly stochastic process in which the state sequence is hidden: the states themselves are not observed, and only the symbols emitted from the states are visible. HMMs have been applied successfully to problems such as speech recognition and part-of-speech tagging.

[Figure 3] shows the HMM as a Bayesian network: the state $S_t$ depends on $S_{t-1}$, and the observation $v_t$ depends on $S_t$. The relationship between HMMs and Bayesian networks is discussed in [13, 14, 18].

[Figure 3] Bayesian network view of the HMM

Following Rabiner [1], an HMM is characterized by the following elements.

1. The number of states $N$: the states are hidden. The set of states is $S = \{S_1, S_2, \dots, S_N\}$, and the state at time $t$ is denoted $q_t$.

2. The number of observation symbols $M$: the symbols that can be emitted from a state. The set of symbols is $V = \{v_1, v_2, \dots, v_M\}$.

3. The state transition probability distribution $A = \{a_{ij}\}$: the probability of moving from state $i$ to state $j$,
$$a_{ij} = P[q_{t+1} = S_j \mid q_t = S_i], \quad 1 \le i, j \le N,$$
with $a_{ij} \ge 0$ for all $i, j$ and $\sum_{j=1}^{N} a_{ij} = 1$ for all $i$. When a transition from $i$ to $j$ is impossible, $a_{ij} = 0$; the pattern of allowed transitions determines the topology of the HMM.

4. The observation symbol probability distribution $B = \{b_j(k)\}$: the probability of emitting symbol $k$ in state $j$,
$$b_j(k) = P[v_k \text{ at } t \mid q_t = S_j], \quad 1 \le j \le N,\; 1 \le k \le M.$$

5. The initial state distribution $\pi = \{\pi_i\}$: the probability of being in state $S_i$ at $t = 1$,
$$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N.$$

Given $N$ and $M$, an HMM is completely specified by $A$, $B$, and $\pi$, and is written compactly as $\lambda = (A, B, \pi)$.

Three basic problems must be solved to use HMMs in practice [23]:

1) Evaluation problem: given an observation sequence $O = O_1 O_2 \cdots O_T$ and a model $\lambda = (A, B, \pi)$, compute the probability $P(O \mid \lambda)$ of the sequence under the model.

2) Decoding problem: given $O = O_1 O_2 \cdots O_T$ and $\lambda = (A, B, \pi)$, find the state sequence $Q = q_1 q_2 \cdots q_T$ that best explains the observations.

3) Estimation problem: adjust the model parameters $\lambda = (A, B, \pi)$ to maximize $P(O \mid \lambda)$.

The evaluation problem is solved by the forward algorithm, the decoding problem by the Viterbi algorithm, and the estimation problem by the Baum-Welch algorithm [1].
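For concreteness, a toy model $\lambda = (A, B, \pi)$ with $N = 2$ states and $M = 3$ symbols can be written as plain arrays; the numbers below are illustrative only and are not taken from the thesis.

    import numpy as np

    A = np.array([[0.7, 0.3],          # a_ij: state transition probabilities
                  [0.4, 0.6]])
    B = np.array([[0.5, 0.4, 0.1],     # b_j(k): observation symbol probabilities
                  [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])          # pi_i: initial state distribution

    # every row of A and B, and pi itself, must sum to one
    assert np.allclose(A.sum(axis=1), 1)
    assert np.allclose(B.sum(axis=1), 1)
    assert np.isclose(pi.sum(), 1)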

4.1  Forward algorithm, Backward algorithm

The forward algorithm computes, for an observation sequence $O = O_1 O_2 \cdots O_T$ and a model $\lambda = (A, B, \pi)$, the probability $P(O \mid \lambda)$ of the sequence.

Define the forward variable
$$\alpha_t(i) = P(O_1 O_2 \cdots O_t,\; q_t = S_i \mid \lambda),$$
the probability of having observed $O_1 O_2 \cdots O_t$ and being in state $S_i$ at time $t$. $\alpha_t(i)$ is computed inductively:

1) Initialization:
$$\alpha_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N.$$
2) Induction:
$$\alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_j(O_{t+1}), \quad 1 \le t \le T-1,\; 1 \le j \le N.$$
3) Termination:
$$P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i).$$

The backward algorithm, used together with the forward algorithm in Baum-Welch re-estimation, is defined analogously. The backward variable is
$$\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i, \lambda),$$
and it is computed as follows:

1) Initialization:
$$\beta_T(i) = 1, \quad 1 \le i \le N.$$
2) Induction:
$$\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j), \quad t = T-1, T-2, \dots, 1,\; 1 \le i \le N.$$
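The two procedures translate directly into code, assuming the model is given as the numpy arrays A, B, pi of the toy example above and the observations are symbol indices (a sketch, not the thesis' implementation):

    import numpy as np

    def forward(A, B, pi, O):
        """Forward algorithm: returns the alpha table and P(O | lambda)."""
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                      # 1) initialization
        for t in range(1, T):                           # 2) induction
            alpha[t] = (alpha[t - 1] @ A) * B[:, O[t]]
        return alpha, alpha[T - 1].sum()                # 3) termination

    def backward(A, B, O):
        """Backward algorithm: returns the beta table."""
        T, N = len(O), A.shape[0]
        beta = np.zeros((T, N))
        beta[T - 1] = 1.0                               # 1) initialization
        for t in range(T - 2, -1, -1):                  # 2) induction
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
        return beta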

4.2  Viterbi algorithm

The Viterbi algorithm finds, for an observation sequence $O = O_1 O_2 \cdots O_T$ and a model $\lambda = (A, B, \pi)$, the single best state sequence $Q = q_1 q_2 \cdots q_T$; it solves the decoding problem.

Define
$$\delta_t(i) = \max_{q_1, q_2, \dots, q_{t-1}} P[q_1 q_2 \cdots q_t = i,\; O_1 O_2 \cdots O_t \mid \lambda],$$
the highest probability of any single path that accounts for the first $t$ observations and ends in state $i$. The array $\psi_t(j)$ records, for state $j$ at time $t$, the state at time $t-1$ that achieved this maximum, so that the best path can be recovered by backtracking.

1) Initialization:
$$\delta_1(i) = \pi_i\, b_i(O_1), \quad 1 \le i \le N, \qquad \psi_1(i) = 0.$$
2) Recursion:
$$\delta_t(j) = \max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}]\; b_j(O_t), \quad 2 \le t \le T,\; 1 \le j \le N,$$
$$\psi_t(j) = \arg\max_{1 \le i \le N} [\delta_{t-1}(i)\, a_{ij}], \quad 2 \le t \le T,\; 1 \le j \le N.$$
3) Termination:
$$P^* = \max_{1 \le i \le N} [\delta_T(i)], \qquad q_T^* = \arg\max_{1 \le i \le N} [\delta_T(i)].$$
4) Backtracking:
$$q_t^* = \psi_{t+1}(q_{t+1}^*), \quad t = T-1, T-2, \dots, 1.$$
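The same algorithm in code, continuing the array-based sketch above (illustrative only, with observations given as symbol indices):

    import numpy as np

    def viterbi(A, B, pi, O):
        """Viterbi algorithm: most probable state sequence and its probability."""
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                       # 1) initialization
        for t in range(1, T):                            # 2) recursion
            scores = delta[t - 1][:, None] * A           # scores[i, j] = delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, O[t]]
        path = np.zeros(T, dtype=int)
        path[T - 1] = delta[T - 1].argmax()              # 3) termination
        for t in range(T - 2, -1, -1):                   # 4) backtracking
            path[t] = psi[t + 1, path[t + 1]]
        return path, delta[T - 1].max()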

4.3  Baum-Welch algorithm

The Baum-Welch algorithm re-estimates the HMM parameters $\lambda = (A, B, \pi)$ so as to increase $P(O \mid \lambda)$. Define $\xi_t(i, j)$ and $\gamma_t(i)$ in terms of the forward and backward variables:

$$\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}
             = \frac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}
                    {\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)},
\qquad
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j).$$

$\xi_t(i, j)$ is the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$, and $\gamma_t(i)$ is the probability of being in state $S_i$ at time $t$, given the observation sequence $O$ and the model $\lambda$. When several observation sequences $O^{(k)}$ are used for training, the same statistics are accumulated over all sequences, where $P_k$ denotes $P(O^{(k)} \mid \lambda)$ for the $k$-th sequence. The re-estimation formulas are (4.1):

$$\bar{\pi}_i = \text{expected frequency in state } S_i \text{ at time } t = 1 = \gamma_1(i)$$

$$\bar{a}_{ij} = \frac{\text{expected number of transitions from } S_i \text{ to } S_j}
                     {\text{expected number of transitions from } S_i}
             = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$$

$$\bar{b}_j(k) = \frac{\text{expected number of times in state } j \text{ observing symbol } v_k}
                     {\text{expected number of times in state } j}
             = \frac{\sum_{t=1,\; O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
\qquad (4.1)$$

Starting from an initial model, re-estimation with (4.1) is repeated until $P(O \mid \lambda)$ converges.
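One re-estimation step can be written against the forward and backward functions sketched in Section 4.1; this is a single-sequence sketch under those assumptions, not the thesis' code.

    import numpy as np

    def baum_welch_step(A, B, pi, O):
        """One Baum-Welch re-estimation step for a single observation sequence O."""
        T, N = len(O), len(pi)
        alpha, _ = forward(A, B, pi, O)
        beta = backward(A, B, O)
        # xi_t(i, j) and gamma_t(i), as in (4.1)
        xi = np.zeros((T - 1, N, N))
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * B[:, O[t + 1]] * beta[t + 1]
            xi[t] /= xi[t].sum()
        gamma = xi.sum(axis=2)                               # gamma_t(i) for t = 1..T-1
        gamma_T = alpha[T - 1] * beta[T - 1]
        gamma_T /= gamma_T.sum()
        gamma_full = np.vstack([gamma, gamma_T])             # gamma_t(i) for t = 1..T
        # re-estimation formulas (4.1)
        new_pi = gamma_full[0]
        new_A = xi.sum(axis=0) / gamma.sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma_full[np.array(O) == k].sum(axis=0) / gamma_full.sum(axis=0)
        return new_A, new_B, new_pi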

Chapter 5  System Design

This chapter describes TexMo, the proposed emotion classification system for chatting text. TexMo1 classifies each sentence with a naive Bayes classifier; TexMo2 uses the hybrid naive Bayes HMM. The two systems are described in turn.

5.1  TexMo1: naive Bayes

The overall structure of TexMo1 is shown in [Figure 4]: the learner estimates the class priors $P(v_j)$ and the conditional probabilities $P(a_i \mid v_j)$ from tagged training data, and the classifier applies them to new sentences.

[Figure 4] Structure of TexMo (the learned parameters are $P(v_j)$ and $P(a_i \mid v_j)$)

Naive Bayes classification requires a fixed set of target emotion classes. The classes used by TexMo1, together with example words and emoticons for each class, are listed in [Table 5].

[Table 5] The 25 emotion classes of TexMo1 with example words and emoticons for each class (for example ^_^, ^^, ^.^, ^o^, :-D, ^^;, ^^a, T.T, T_T, :-o, :-p, --;, -.-;, -_-;, -.-a, --a)

Each sentence of the training corpus is labeled with one of these classes. [Table 6] shows a fragment of the tagged chatting data: every utterance is followed by its emotion code (for example @1, @3, @12, @18, @19).

[Table 6] Example of tagged chatting data (Korean utterances, each annotated with an emotion code from [Table 5])

From the tagged sentences, the prior probability $P(v_j)$ of each emotion class $v_j$ and the conditional probability $P(a_i \mid v_j)$ of each symbol $a_i$ given the class are estimated by counting, and a new sentence is classified with the naive Bayes rule (2.3). Naive Bayes classification covers classes 1 through 24 of [Table 5].

Chatting sentences are full of misspellings, jargon, and slang, so the raw text is first preprocessed to give the classifier more reliable symbols. The preprocessing consists of the following steps.

(1) Normalization: the sentence is split into words and normalized.

(2) Emoticons: emoticons such as ^^, -.-;, and :-) are detected and converted into symbols for the emotions they express.

(3) ASCII symbols: special characters such as *, #, @, and ^ are treated as ASCII symbols of their own.

(4) Postprocessing: the output of TexMo1 is adjusted in a postprocessing step.

(5) Question and exclamation marks: sentences containing question or exclamation marks are handled specially, since these marks are strong cues for particular classes (classes 18 and 11 in [Table 5]).

A sketch of the symbol-abstraction idea behind these steps is given after this list. Naive Bayes determines the emotional state of a sentence from that sentence alone. In a chatting dialogue, however, the emotion often depends on the preceding sentences; a short utterance such as "?" or "!" carries little evidence by itself. The next section therefore combines naive Bayes with an HMM so that the preceding sentences can also be taken into account.
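The sketch below illustrates the symbol abstraction: emoticons and punctuation marks are mapped to abstract symbols before the remaining words are kept as plain word symbols. The emoticon table and symbol names are hypothetical examples, not the thesis' actual tables.

    import re

    # Hypothetical emoticon-to-symbol table (illustrative only).
    EMOTICON_SYMBOLS = {
        "^^": "SMILE", "^_^": "SMILE", "^.^": "SMILE",
        "^o^": "LAUGH", ":-D": "LAUGH",
        "^^;": "SWEAT", "-.-;": "SWEAT",
        "T_T": "CRY", "T.T": "CRY", ":-o": "SURPRISE",
    }

    def preprocess(sentence):
        """Turn a raw chatting sentence into a list of abstract symbols."""
        symbols = []
        # extract emoticons first, longest match first
        for emo in sorted(EMOTICON_SYMBOLS, key=len, reverse=True):
            if emo in sentence:
                symbols.append(EMOTICON_SYMBOLS[emo])
                sentence = sentence.replace(emo, " ")
        # question / exclamation marks become symbols of their own
        if "?" in sentence:
            symbols.append("QMARK")
        if "!" in sentence:
            symbols.append("EMARK")
        # remaining alphanumeric (and Hangul) tokens are kept as word symbols
        symbols.extend(re.findall(r"\w+", sentence))
        return symbols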

5.2  TexMo2: Hybrid naive Bayes HMM

A chatting dialogue is a sequence of sentences, and the speaker's emotional state changes over that sequence; an HMM is a natural way to model this sequential behavior. The standard HMM cannot be applied directly, however, because its observations are symbols from a fixed alphabet rather than whole sentences. TexMo2 therefore combines the HMM with naive Bayes: the HMM models the transitions between emotional states, and naive Bayes supplies the probability of each sentence given a state.

5.2.1  The hybrid model

For the HMM, the emotion classes are grouped into seven states, numbered 0 to 6, listed in [Table 7].

[Table 7] The seven emotional states of TexMo2 (each state groups several of the TexMo1 classes of [Table 5], so TexMo1 output can be mapped onto the TexMo2 states)

In this HMM, the hidden states are the emotional states of [Table 7] and the observations are sentences. Since each observation is a whole sentence rather than a symbol from a fixed alphabet, the observation probability of state $j$ at time $t$, $b_j(t)$ in the sense of (4.1), is computed by naive Bayes:

$$b_j(t) = P(v_t \mid q_j) = \prod_{i=1}^{|v_t|} P(w_{it} \mid q_j)$$

where $v_t$ is the sentence observed at time $t$ and $w_{it}$ is the $i$-th word (symbol) of $v_t$. The word probabilities $P(w_{it} \mid q_j)$ are exactly the naive Bayes estimates used by TexMo1.

[Figure 5] shows the resulting model as a Bayesian network: the emotional state $q_t$ depends on $q_{t-1}$, and the words $w_{1t}, w_{2t}, \dots, w_{|v_t| t}$ of sentence $v_t$ depend on $q_t$.

[Figure 5] Bayesian network of the hybrid naive Bayes HMM

Training the hybrid model means estimating the state transition probabilities $a_{ij}$ and the observation probabilities $b_j(\cdot)$. The transition probabilities are estimated from the tagged dialogues. Frasconi et al. [4] estimate the observation model with EM; TexMo2 instead takes the observation probabilities directly from the naive Bayes parameters already learned for TexMo1.
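A sketch of how such a decoder could look, reusing the NaiveBayes class from Chapter 3 (its classes playing the role of the seven states) together with a transition matrix A and initial distribution pi estimated from the tagged dialogues; all names are illustrative.

    import math

    def log_b(nb, state, sentence):
        """log b_j(t) = sum_i log P(w_it | q_j), computed from naive Bayes."""
        return sum(math.log(nb.cond[state][w]) for w in sentence if w in nb.vocab)

    def decode_dialogue(nb, states, A, pi, dialogue):
        """Viterbi decoding of the emotional-state sequence of a dialogue,
        where each element of `dialogue` is a preprocessed sentence (symbol list)."""
        T = len(dialogue)
        delta = [dict() for _ in range(T)]
        psi = [dict() for _ in range(T)]
        for j in states:
            delta[0][j] = math.log(max(pi[j], 1e-12)) + log_b(nb, j, dialogue[0])
        for t in range(1, T):
            for j in states:
                best_i = max(states,
                             key=lambda i: delta[t - 1][i] + math.log(max(A[i][j], 1e-12)))
                psi[t][j] = best_i
                delta[t][j] = (delta[t - 1][best_i]
                               + math.log(max(A[best_i][j], 1e-12))
                               + log_b(nb, j, dialogue[t]))
        # backtrack the most likely emotional state for every sentence
        path = [max(states, key=lambda j: delta[T - 1][j])]
        for t in range(T - 1, 0, -1):
            path.append(psi[t][path[-1]])
        return list(reversed(path))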

5.2.2  System structure

[Figure 6] shows the overall structure of TexMo2. The naive Bayes learner estimates $P(q_j)$ and the word probabilities $P(w_k \mid q_j)$ from the tagged data; these give the observation probabilities $b_j(k)$ of the hybrid naive Bayes HMM. The hybrid learner estimates the remaining part of $\lambda = (A, B, \pi)$, the transition probabilities $a_{ij}$, from the same tagged dialogues.

Classification also differs from the single-observation case. Instead of classifying one sentence in isolation, TexMo2 applies the Viterbi algorithm to a sequence of sentences (a multiple-observation HMM), and the decoded state of each sentence is its emotion code. Let $N$ denote the number of consecutive sentences given to the decoder: when $N$ is 1, only the current sentence is used, as in naive Bayes; as $N$ grows, more of the preceding context can influence the decision.

[Figure 6] Structure of the hybrid naive Bayes HMM system TexMo2 (chatting data → preprocessing → tagging → tagged data; naive Bayes learner → $P(q_j)$, $P(w_k \mid q_j)$; hybrid naive Bayes HMM learner → $a_{ij}$; test data → $b_j(k)$ from naive Bayes → Viterbi method → result (emotion code))

Chapter 6  Experiments

6.1  Experimental data

The experiments use sentences collected from on-line chatting, each tagged with an emotion code; about 40,000 sentences were used for training. [Table 8] shows how the tagged sentences are distributed over the 25 TexMo1 classes and the 7 TexMo2 states. The distribution is heavily skewed: in the TexMo1 tagging, class 18 is by far the most frequent.

[Table 8] Distribution of tagged sentences over the TexMo1 classes and the TexMo2 states

    TexMo1 (naive Bayes)          TexMo2 (hybrid naive Bayes HMM)
    class    count                state    count
      1       310                  0       3072
      2       234                  1       1162
      3       110                  2        513
      4       236                  3        969
      5       191                  4        453
      6        21                  5        288
      7         5                  6        134
      8       253
      9        64
     10        35
     11        89
     12       129
     13        72
     14        28
     15        80
     16        31
     17        33
     18      5164
     19       442
     20       226
     21        33
     22        29
     23       156
     24         6
     25         0

The chatting data are noisy, which affects both the naive Bayes estimates and the HMM.

6.2  Results

The systems were evaluated on a test set of 1,013 sentences held out from the training data, and TexMo1 and TexMo2 were compared on it.

[Table 9] shows the state transition probabilities learned by the hybrid naive Bayes HMM of TexMo2: the entry in row $i$, column $j$ is the probability of moving from state $i$ at time $t-1$ to state $j$ at time $t$.

[Table 9] State transition probabilities of the hybrid naive Bayes HMM (rows: state at t-1, columns: state at t)

            0      1      2      3      4      5      6
      0   0.76   0.02   0.0    0.15   0.04   0.01   0.01
      1   0.47   0.29   0.0    0.13   0.07   0.01   0.02
      2   0.48   0.03   0.23   0.17   0.03   0.02   0.03
      3   0.35   0.03   0.0    0.14   0.25   0.22   0.01
      4   0.45   0.25   0.0    0.15   0.12   0.01   0.01
      5   0.45   0.04   0.22   0.17   0.06   0.04   0.02
      6   0.33   0.15   0.1    0.13   0.03   0.0    0.25

[Table 10] reports four accuracy figures for each system: with and without the preprocessor, under the two accuracy measures plotted in [Figure 7] and [Figure 8]. In every setting the hybrid naive Bayes HMM system TexMo2 is more accurate than the pure naive Bayes system TexMo1, and the best configuration, TexMo2 with the preprocessor, reaches 93.7%.

[Table 10] Accuracy (%) of TexMo1 and TexMo2

    TexMo1 (naive Bayes)                                 84.6   72.3
    TexMo1 (naive Bayes), with preprocessor              91.6   74.9
    TexMo2 (hybrid naive Bayes HMM)                      86.9   76.0
    TexMo2 (hybrid naive Bayes HMM), with preprocessor   93.7   78.1

[Figure 7] and [Figure 8] plot the accuracy of TexMo1 and TexMo2 as the amount of training data grows.

Recall that TexMo2 computes the HMM observation probabilities with the same naive Bayes statistics that TexMo1 uses, so the learning behavior of the two systems can be compared directly.

[Figure 7] Accuracy of the naive Bayes system TexMo1 as the amount of training data grows (x-axis: number of training sentences in units of 1,000, from 1 to 40; y-axis: accuracy in %; four curves labeled CP, CA, SP, and SA)

[Figure 8] Accuracy of the hybrid naive Bayes HMM system TexMo2 as the amount of training data grows (x-axis: number of training sentences in units of 1,000, from 1 to 40; y-axis: accuracy in %; four curves labeled CP, CA, SP, and SA)

[Figure 9] compares the two systems directly on the CP curve; TexMo2 again shows higher accuracy.

[Figure 9] CP accuracy of TexMo1 and TexMo2 as the amount of training data grows (x-axis: number of training sentences in units of 1,000, from 1 to 40; y-axis: accuracy in %)

The behavior of TexMo2 also depends on $N$, the number of consecutive sentences decoded together by the HMM (Section 5.2.2). [Figure 10] shows the accuracy of TexMo2 as $N$ varies from 1 to 10.

[Figure 10] Accuracy of TexMo2 for different values of N (x-axis: N from 1 to 10; y-axis: accuracy in %)

Chapter 7  Conclusion

This thesis addressed the problem of classifying the emotional state expressed in chatting sentences. The first system, TexMo1, classifies each sentence independently with a naive Bayes classifier over abstract word symbols. Because the emotion of a sentence also depends on the preceding dialogue, TexMo1 was extended with an HMM into the hybrid naive Bayes HMM system TexMo2, in which naive Bayes supplies the observation probabilities of the HMM. In the experiments TexMo2 consistently outperformed TexMo1, and with the additional preprocessing it reached 93.7% accuracy.

References

    [1] L. R. Rabiner. A tutorial on hidden Markov models and selected

    applications in speech recognition. Proceedings of the IEEE,

    77(2):257-286, 1989.

    [2] A. Stolcke and S. Omohundro. Hidden Markov Model induction by

    bayesian model merging. In S. J. Hanson, J. D. Cowan, and C. L.

    Giles, editors, Advances in Neural Information Processing Systems,

    volume 5, pages 11-18, Morgan Kaufmann, San Mateo, CA, 1993.

    [3] K. Nigam, A. McCallum, S. Thrun, and T. Mitchell. Text classification

    from labeled and unlabeled documents using EM. Machine Learning,

    39(2/3):103-134, 2000.

    [4] P. Frasconi, G. Soda, A. Vullo. Text categorization for multi-page

    documents: A hybrid naive Bayes HMM approach. JCDL, pages 11-20,

    2001

    [5] Y. Yang and C. Chute. An example-based mapping method for text

    classification and retrieval. ACM Transactions on Information Systems,

    12(3):252-277, 1994

    [6] D. Freitag and A. McCallum. Information extraction with HMM

    structures learned by stochastic optimization. In Proc. 12th AAAI

Conference, Austin, TX, 2000.

    [7] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. Automating the

    construction of internet portals with machine learning. Information


    Retrieval Journal, 3:127-163, 2000.

    [8] W. W. Cohen. Text categorization and relational learning. In

    Proceedings of the Twelfth International Conference on Machine

    Learning, Lake Tahoe, California, 1995.

    [9] D. Lewis and W. Gale. A sequential algorithm for training text

classifiers. In SIGIR-94, 1994.

    [10] D. Lewis and M. Ringuette. Comparison of two learning algorithms for

    text categorization. In Proc. 3rd Annual Symposium on Document

    Analysis and Information Retrieval, 1994.

    [11] T. Joachims. Transductive inference for text classification using

    support vector machines. In Int. conf. on Machine Learning, 1999

    [12] T. Joachims. Text categorization with support vector machines:

    Learning with many relevant features. In Proceedings of the European

    Conference on Machine Learning. Springer, 1998.

    [13] Y. Bengio and P. Frasconi. An input output HMM architecture. In G.

    Tesauro, D. Touretzky, and T. Leen, editors, Advances in Neural

    Information Processing Systems 7, pages 427-434. MIT Press, 1995.

    [14] H. Lucke. Bayesian belief networks as a tool for stochastic parsing.

    Speech Communication, 16:89-118, 1995.

[15] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of

    Plausible Inference. Morgan Kaufmann, 1988.

    [16] D. Koller and M. Sahami. Hierarchically classifying documents using

    very few words. In Proc. Fourteenth Int. Conf. on Machine Learning,

    1997.

    [17] J. Kupiec. Robust Part-of-speech tagging using a hidden Markov


    model. In Computer Speech and Language 6, 1992.

    [18] P. Smyth, D. Heckerman, and M. I. Jordan. Probabilistic independence

    networks for hidden Markov probability models. Neural Computation,

    9(2):227-269, 1997.

    [19] F. Jensen. An Introduction to Bayesian Networks. Springer Verlag,

    New York, 1996.

    [20] H. Ng, W. Goh, and K. Low. Feature selection, perceptron learning,

    and a usability case study for text categorization. In Proc. of the 20th

    Int. ACM SIGIR Conference on Research and Development in

    Information Retrieval, pages 67-73, 1997.

    [21] T. Mitchell. Machine Learning. McGraw-Hill, 1997.

    [22] T. Kalt. A new probabilistic model of text classification and retrieval.

    CIIR TR98-18, University of Massachusetts, 1996.

    [23] T. Starner. Visual Recognition of American Sign Language Using

    Hidden Markov Models. Master's thesis, MIT, Media Laboratory, 1995.

[24] Naive Bayes and Boosting (in Korean). 2001.

Abstract

Recently, a huge number of documents have become scattered across the Web, so automatic text classification techniques help us organize large collections of documents efficiently. In addition, with the development of graphical user interfaces, there have been numerous efforts to satisfy users by adding such features to chatting and messenger systems. In particular, several researchers have tried to extract emotional states from text documents.

In this paper, we present an emotional classification system that analyzes on-line chatting text and determines the emotional state of the speaker in each sentence, based on statistical learning methods. Even a single sentence in a chatting dialog may contain many misspelled words, jargon, and slang, so it is extremely hard to analyze such dialogs with natural language processing techniques. Instead, we overcome this problem by transforming each word appearing in the given documents into a corresponding, more abstract symbol, from which statistical data are constructed. With these symbol-based representations, we determine the emotional state of each sentence. Our previous approach uses the naive Bayes algorithm, which finds emotional states using only the current sentence. The second system, proposed in this paper, is a hybrid of naive Bayes and HMM that determines emotional states from not only the current sentence but also the immediately preceding ones.

In the experiments, the hybrid method showed remarkable performance compared with the pure naive Bayes method. In addition, we were able to improve the performance further by attaching a preprocessor that automatically extracts emoticons, onomatopoeic words, question marks, and the like, which are widely believed to be valuable cues for identifying the emotional state of a sentence. Trained with 40,000 sentences, the final hybrid system with the attached preprocessor obtained 93.7% accuracy on 1,013 test sentences.
