
Estimating the Difficulty of Japanese Text Using a Textbook Corpus (教科書コーパスを用いた日本語テキストの難易度推定)
must.c.u-tokyo.ac.jp/nlpann/pdf/nlp2008/D5-05.pdf


  • 1 1,2 1

    1 , 2

    1.

    1920 1940 1)Flesch Reading Ease Kincaid Grade Level 2) 3 3)

    2.

    2.1 Reading-age 1 312 13

    2.2 13 4) 1 1111167 728,002 16311345,261

    3.

    3.1 Collins-Thompson 5) N Gi(i = 1, 2, ..., N)N Mi Mi Gi unigram T Mi

    \[ L(M_i \mid T) = \sum_{w \in T} C(w) \log P(w \mid M_i) \tag{1} \]

    w T C(w)


  • T w P (w|Mi)Mi w N Mi Gi

    3.2 unigram unigram 3) 2( 1 )

    unigram unigram

    ( 2 ) 1

    4.

    1 T 1 13 Gi(i = 1, 2, ..., 13) 1

    4.1 213 unigramMi x P (x|Mi)

    \[ P(x \mid M_i) = \frac{C(x, D_i)}{\sum_{z \in D_i} C(z, D_i)} \tag{2} \]

    Di Gi zDi C(z, Di) Di z
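    Equation (2) is a relative-frequency (maximum-likelihood) estimate, which can be sketched in a few lines of Python. This is only an illustration: the function name `unigram_model` and the toy token list are hypothetical, and the unit x (character or word) is left to the caller because it is not recoverable from the text here.

```python
from collections import Counter

def unigram_model(tokens):
    """Relative-frequency unigram model: P(x|M_i) = C(x, D_i) / sum_z C(z, D_i)."""
    counts = Counter(tokens)          # C(x, D_i) for every unit x in D_i
    total = sum(counts.values())      # sum over z in D_i of C(z, D_i)
    return {x: c / total for x, c in counts.items()}

# Toy corpus standing in for one grade-level corpus D_i (hypothetical data).
d1 = ["a", "b", "a", "c"]
m1 = unigram_model(d1)
print(m1["a"])
```

    Units that never occur in D_i are simply absent from the returned dictionary, which corresponds to a probability of 0.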

    4.2 T

    Gi Mi

    \[ L(M_i \mid T) = \sum_{z \in T} C(z, T) \log P(z \mid M_i) \tag{3} \]

    13Mi Gi
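    As a rough sketch of Equation (3), the log-likelihood of a text under each grade model can be computed and the best-scoring grade returned. Choosing the grade by maximum likelihood is an assumption here (it follows the Collins-Thompson style of estimation cited in Section 3.1); the names `log_likelihood` and `estimate_grade` and the two toy models are hypothetical.

```python
import math
from collections import Counter

def log_likelihood(text_tokens, model):
    """L(M_i|T) = sum over z in T of C(z, T) * log P(z|M_i)."""
    total = 0.0
    for z, count in Counter(text_tokens).items():
        p = model.get(z, 0.0)
        if p == 0.0:
            return float("-inf")      # a unit with probability 0 gives log 0
        total += count * math.log(p)
    return total

def estimate_grade(text_tokens, models):
    """Return the grade i whose model M_i gives the largest L(M_i|T)."""
    return max(models, key=lambda i: log_likelihood(text_tokens, models[i]))

# Two toy grade models (hypothetical probabilities).
models = {1: {"a": 0.8, "b": 0.2}, 2: {"a": 0.3, "b": 0.7}}
grade = estimate_grade(["b", "b", "a"], models)
print(grade)
```

    The early return of negative infinity shows why zero probabilities are a problem for this score: a single unseen unit rules a model out entirely.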

    4.3 Mi T Mi 0 (3) 2 1 13

    2 2 2 1 xP (x|Mi)

    0

    2 1 0

    \[ P(x \mid M_i) = \frac{P(x \mid M_{i-1}) + P(x \mid M_{i+1})}{2} \tag{4} \]

    P (x|Mi) 0 P (x|Mi1) P (x|Mi+1) 0 Mi (4) 0
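    A minimal sketch of the interpolation in Equation (4) follows. It assumes the rule is applied only when P(x|M_i) is 0 and both neighbouring values are nonzero, which is one reading of the condition above; the function name `interpolate_zeros` and the sample values are hypothetical.

```python
def interpolate_zeros(probs):
    """probs[k] holds P(x|M_{k+1}) for a single unit x across consecutive grades.

    Assumed rule: a zero entry whose two neighbours are both nonzero is
    replaced by the mean of those neighbours; all other entries are kept.
    """
    out = list(probs)
    for i in range(1, len(probs) - 1):
        # Read from the original list so that filled values do not cascade.
        if probs[i] == 0 and probs[i - 1] > 0 and probs[i + 1] > 0:
            out[i] = (probs[i - 1] + probs[i + 1]) / 2
    return out

filled = interpolate_zeros([0.02, 0.0, 0.04, 0.0, 0.0])
print(filled)
```

    In this sketch the second entry is filled from its neighbours, while the trailing zeros stay at 0 because they lack two nonzero neighbours. No renormalization is performed.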

    4.4 P (x|Mi) L(Mi|T )2 1 1 P (x|Mi)Gaussian-kernel P (x|Mi)

    \[ P(x \mid M_i) = \frac{\sum_{j \in A_i} K(i, j)\, P(x \mid M_j)}{\sum_{j \in A_i} K(i, j)} \tag{5} \]

    \[ A_i = \{\, k : |k - i| \le h,\ 1 \le k \le N \,\} \]

    hN K(i, j)

    \[ K(i, j) = \exp\left( -\frac{(i - j)^2}{2\sigma^2} \right) \tag{6} \]
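    The Gaussian-kernel smoothing of Equations (5) and (6) can be sketched as a weighted average of P(x|M_j) over the grades j within distance h of grade i. The defaults h = 2 and 0.9 are taken from the values quoted in Section 5.1; reading 0.9 as the kernel width sigma is an assumption, as are the function names and the 0-based indexing.

```python
import math

def gaussian_kernel(i, j, sigma):
    """K(i, j) = exp(-(i - j)^2 / (2 * sigma^2))."""
    return math.exp(-((i - j) ** 2) / (2 * sigma ** 2))

def smooth(probs, h=2, sigma=0.9):
    """probs[k] holds P(x|M_{k+1}); indices 0..N-1 stand in for grades 1..N."""
    n = len(probs)
    out = []
    for i in range(n):
        # A_i: grades within distance h of i, clipped to the valid range.
        window = range(max(0, i - h), min(n - 1, i + h) + 1)
        weights = [gaussian_kernel(i, j, sigma) for j in window]
        weighted_sum = sum(w * probs[j] for w, j in zip(weights, window))
        out.append(weighted_sum / sum(weights))
    return out

smoothed = smooth([0.0, 0.0, 1.0, 0.0, 0.0])
print(smoothed)
```

    Dividing by the sum of the weights keeps a constant sequence unchanged and spreads an isolated peak symmetrically onto neighbouring grades.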

    2 2 L(Mi|T ) 2 3


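  • [Figure 1. Labels recoverable from the diagram: grade levels G1, G2, ..., GN; language models M1, M2, ..., MN; input text T; counts C(z, T); probabilities P(x|Mi); likelihoods L(Mi|T); output GT.]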

    1 RMSE RMSE 0.895 1.638

    Gaussian-kernel 0.915 1.482 0.900 1.626

    2 0.886 1.7943 0.898 1.701

    -0.793 - -0.783 -

    13 1 P (x|Mi)2 L(Mi|T )

    5.

    5.1 Leave-one-out RMSE(Root Mean Square Error) 1 100 12 h 2 10RMSE (h = 2, = 0.9) 1 5 2)
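    For illustration, a leave-one-out evaluation scored by RMSE might be organized as below. The `fit` and `predict` callables are hypothetical stand-ins (a predictor that always answers the mean training grade), not the estimator described in this paper, and the three samples are toy data.

```python
import math

def rmse(predicted, actual):
    """Root mean square error between predicted and true grade levels."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def leave_one_out(samples, fit, predict):
    """samples is a list of (text, grade); each sample is held out once."""
    predictions = []
    for k, (text, _) in enumerate(samples):
        training = samples[:k] + samples[k + 1:]   # everything except sample k
        predictions.append(predict(fit(training), text))
    return predictions

# Hypothetical stand-in model: the mean grade of the training samples.
samples = [("t1", 1), ("t2", 2), ("t3", 3)]
fit = lambda training: sum(g for _, g in training) / len(training)
predict = lambda model, text: model

preds = leave_one_out(samples, fit, predict)
error = rmse(preds, [g for _, g in samples])
print(preds, error)
```

    Each prediction is made by a model that never saw the held-out sample, so the resulting RMSE is not inflated by testing on training data.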

    2 3 RMSE 3 RMSE

    , +2 , 3 0.920 1.413, GK , 2 0.919 1.412

    +GK , 2 , 3 0.919 1.436

    2 0.8 5 0.95 1 1RMSE 3 3 3 2 1+GK 1 1 +2 2 22+3 2 23

    5.2 4 12 4


  • 3

    () Web 4 0 0 0 0 4 0 0 0 0 4 0 0 0 3 1

    16 3

    6.

    2) 6) 14 7)

    7.

    unigram 0.92

    (A) ( 16200009

    1) : , Vol. 1, pp. 66–85 (1991).

    2) , , : , 1988-HI-018, pp. 1–8 (1988).

    3) , : , 13 , pp. 534–537 (2007).

    4) , , , : , 14,D3-2 (2008).

    5) Collins-Thompson, K. and Callan, J.: Predicting Reading Difficulty with Statistical Language Models, Journal of the American Society for Information Science and Technology, Vol. 56, No. 13, pp. 1448–1462 (2005).

    6) : , , 34, pp. 1–22 (1998).

    7) , : , NLC2007-32 (2007-10), pp. 19–24 (2007).
