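Pivot Translation: This and That. Nara Institute of Science and Technology (NAIST), Augmented Human Communication Laboratory, Akiva Miura (三浦 明波). 15/03/15 ©2015 Akiva Miura, AHC-Lab, IS, NAIST. 11th Kansai MT Study Group retreat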

Kansai MT Pivot Arekore



  • 3 (B.Sc) NAIST (M1)

  • Overview

    0. 1. 2. - 3. - 4. 5. 6. 7. Appendix


  • 1.

  • Statistical Machine Translation (SMT): [Brown et al., 1993]


  • 2.

  • Phrase-Based Machine Translation (PBMT): [Koehn et al., 2003]

    natuerlich hat john spass am spiel

    of course john has fun with the game

  • Hierarchical Phrase-Based Machine Translation (Hiero): [Chiang, 2007]

    Rule: [X0] of [X1] -> [X1] … [X0]

    Examples: friends of Taro; the parents of Taro and Hanako

  • Tree-to-String (T2S)

    S( X1:NP VP( X2:VBD X3:NP ) ) -> X1 X3 X2

    (SVO -> SOV)

  • PBMT

    Hiero

    T2S, F2S

  • 3.

  • Cascade [De Gispert et al., 2006]

    train.fr-en.fr, train.fr-en.en -> SMT (fr -> en)

    train.en-zh.en, train.en-zh.zh -> SMT (en -> zh)

    input.fr -> SMT (fr -> en) -> translated.en -> SMT (en -> zh) -> translated.zh
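The cascade pipeline can be sketched in a few lines of Python. This is only an illustration under assumptions: the two "SMT systems" are hypothetical word-for-word dictionaries (`toy_translator`), standing in for systems trained on train.fr-en and train.en-zh; none of these names come from the slides.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def cascade(src_to_pvt: Translator, pvt_to_trg: Translator) -> Translator:
    """Chain two translators: source -> pivot -> target."""
    def translate(sentence: str) -> str:
        pivot_sentence = src_to_pvt(sentence)   # e.g. input.fr -> translated.en
        return pvt_to_trg(pivot_sentence)       # e.g. translated.en -> translated.zh
    return translate


def toy_translator(lexicon: Dict[str, str]) -> Translator:
    """Hypothetical stand-in for a trained SMT system: word-for-word lookup."""
    return lambda s: " ".join(lexicon.get(w, w) for w in s.split())


fr_en = toy_translator({"chat": "cat", "noir": "black"})
en_zh = toy_translator({"cat": "猫", "black": "黑"})
fr_zh = cascade(fr_en, en_zh)
print(fr_zh("chat noir"))
```

Any error made by the first system is passed on to the second, which is one reason the cascade can lose accuracy relative to a directly trained system.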

  • Triangulation [Cohn et al., 2007]

    train.fr-en.fr, train.fr-en.en -> Phrase Table (fr -> en)

    train.en-zh.en, train.en-zh.zh -> Phrase Table (en -> zh)

    Phrase Table (fr -> en) + Phrase Table (en -> zh) -> SMT (fr -> zh)

    input.fr -> SMT (fr -> zh) -> translated.zh

  • PBMT

    Hiero

    T2S, F2S

    (via …) PBMT

    (via …?) Hiero? T2S/F2S?

  • 4.


  • l: Triangulation

    Hiero (12NL

    Triangulation

    l : 10


  • 5.


  • Triangulation, Hiero

    PBMT, Hiero

    Direct, Cascade

  • 1: Marginalization [Utiyama et al., 2007]

    $$\phi(trg \mid src) = \sum_{pvt \in T_1 \cap T_2} \phi(trg \mid pvt)\,\phi(pvt \mid src)$$

    $$p_{lex}(trg \mid src) = \sum_{pvt \in T_1 \cap T_2} p_{lex}(trg \mid pvt)\,p_{lex}(pvt \mid src)$$

  • Example (Marginalization)

    Source-pivot rules (pivot side: leave [X1]): 0.6, 0.7

    Pivot-target rules (pivot side: leave [X1]): 0.5, 0.3

    Triangulated source-target rules ([X1] on both sides):

    0.6 × 0.5 = 0.3
    0.6 × 0.3 = 0.18
    0.7 × 0.5 = 0.35
    0.7 × 0.3 = 0.21
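The marginalization step can be sketched as follows. This is a minimal illustration, assuming rule tables stored as plain dictionaries; the rule names `src_a`, `src_b`, `trg_c`, `trg_d` are hypothetical placeholders for the source and target sides of the example, and only the probabilities are taken from it.

```python
from collections import defaultdict


def triangulate_marginalize(src_pvt, pvt_trg):
    """Sum phi(pvt|src) * phi(trg|pvt) over pivots shared by both tables.

    src_pvt: {(src, pvt): phi(pvt|src)}
    pvt_trg: {(pvt, trg): phi(trg|pvt)}
    returns: {(src, trg): phi(trg|src)}
    """
    by_pvt = defaultdict(list)
    for (pvt, trg), p_trg in pvt_trg.items():
        by_pvt[pvt].append((trg, p_trg))

    src_trg = defaultdict(float)
    for (src, pvt), p_pvt in src_pvt.items():
        for trg, p_trg in by_pvt.get(pvt, []):  # pivots missing from pvt_trg contribute nothing
            src_trg[(src, trg)] += p_pvt * p_trg
    return dict(src_trg)


# Probabilities from the example; rule names are placeholders.
src_pvt = {("src_a", "leave [X1]"): 0.6, ("src_b", "leave [X1]"): 0.7}
pvt_trg = {("leave [X1]", "trg_c"): 0.5, ("leave [X1]", "trg_d"): 0.3}
table = triangulate_marginalize(src_pvt, pvt_trg)
for pair, prob in sorted(table.items()):
    print(pair, round(prob, 2))
```

The same function applies to the lexical weights, since the second formula has the same form.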

  • Fr -> Es (via En)

    Method                            BLEU (PBMT)   BLEU (Hiero)
    Direct                            40.15         40.19
    Cascade                           36.20         36.30
    Triangulation (Marginalization)   39.13         38.75

    Direct > Triangulation > Cascade

  • Fr -> Zh (via En)

    Method                            BLEU (PBMT)   BLEU (Hiero)
    Direct                            14.31         16.33
    Cascade                           14.05         16.23
    Triangulation (Marginalization)   14.3          16.66

    Direct > Triangulation > Cascade

  • Triangulation


  • 2: CountMin [Zhu et al., 2014]

    $$c(src, trg) = \sum_{pvt} \min\bigl(c(src, pvt),\, c(pvt, trg)\bigr)$$

    $$\phi(trg \mid src) = \frac{c(src, trg)}{\sum_{trg'} c(src, trg')}$$

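  • Example (CountMin)

    Source-pivot rules (pivot side: leave [X1]), count and probability: 60, 0.6; 70, 0.7

    Pivot-target rules (pivot side: leave [X1]), count and probability: 100, 0.5; 75, 0.3

    Triangulated source-target rules ([X1] on both sides), count and probability:

    min(60, 100) = 60, 0.5
    min(60, 75) = 60, 0.5
    min(70, 100) = 70, 0.5
    min(70, 75) = 70, 0.5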
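A minimal sketch of CountMin, assuming co-occurrence counts are available as dictionaries. The rule names `src_a`, `src_b`, `trg_c`, `trg_d` are hypothetical placeholders; the counts are the ones from the example.

```python
from collections import defaultdict


def triangulate_countmin(src_pvt_counts, pvt_trg_counts):
    """Estimate c(src, trg) = sum over pvt of min(c(src, pvt), c(pvt, trg)),
    then renormalise per source side to get phi(trg|src)."""
    by_pvt = defaultdict(list)
    for (pvt, trg), c_pt in pvt_trg_counts.items():
        by_pvt[pvt].append((trg, c_pt))

    counts = defaultdict(float)
    for (src, pvt), c_sp in src_pvt_counts.items():
        for trg, c_pt in by_pvt.get(pvt, []):
            counts[(src, trg)] += min(c_sp, c_pt)

    totals = defaultdict(float)  # sum over trg' of c(src, trg')
    for (src, _trg), c in counts.items():
        totals[src] += c
    probs = {pair: c / totals[pair[0]] for pair, c in counts.items()}
    return dict(counts), probs


src_pvt_counts = {("src_a", "leave [X1]"): 60, ("src_b", "leave [X1]"): 70}
pvt_trg_counts = {("leave [X1]", "trg_c"): 100, ("leave [X1]", "trg_d"): 75}
counts, probs = triangulate_countmin(src_pvt_counts, pvt_trg_counts)
print(counts)
print(probs)
```

Because every minimum here is decided by the smaller source-pivot count, both targets end up with probability 0.5 for each source rule, which matches the example and shows how CountMin can flatten the distribution.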

  • 3: Bidirectional

    $$c(src, pvt, trg) = \min\bigl(c(src, pvt)\,\phi(trg \mid pvt),\; c(pvt, trg)\,\phi(src \mid pvt)\bigr) = \frac{c(src, pvt)\,c(pvt, trg)}{\max\bigl(c_1(pvt),\, c_2(pvt)\bigr)}$$

    $$c(src, trg) = \sum_{pvt} c(src, pvt, trg)$$

  • Example (Bidirectional)

    Source-pivot rules (pivot side: leave [X1]), count and probability: 60, 0.6; 70, 0.7

    Pivot-target rules (pivot side: leave [X1]), count and probability: 100, 0.5; 75, 0.3

    Triangulated source-target counts ([X1] on both sides):

    min(60 × 0.5, 100 × 0.6) = 30
    min(60 × 0.3, 75 × 0.6) = 18
    min(70 × 0.5, 100 × 0.7) = 35
    min(70 × 0.3, 75 × 0.7) = 21
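The bidirectional count estimate can be sketched in its min form. This sketch assumes each table entry carries both a count and a conditional probability given the pivot, and it reads the example's 0.6 and 0.7 as the φ(src | pvt) values the formula calls for. The rule names are hypothetical placeholders.

```python
from collections import defaultdict


def triangulate_bidirectional(src_pvt, pvt_trg):
    """c(src, trg) = sum over pvt of
    min(c(src, pvt) * phi(trg|pvt), c(pvt, trg) * phi(src|pvt)).

    src_pvt: {(src, pvt): (c(src, pvt), phi(src|pvt))}
    pvt_trg: {(pvt, trg): (c(pvt, trg), phi(trg|pvt))}
    """
    by_pvt = defaultdict(list)
    for (pvt, trg), (c_pt, phi_t) in pvt_trg.items():
        by_pvt[pvt].append((trg, c_pt, phi_t))

    counts = defaultdict(float)
    for (src, pvt), (c_sp, phi_s) in src_pvt.items():
        for trg, c_pt, phi_t in by_pvt.get(pvt, []):
            counts[(src, trg)] += min(c_sp * phi_t, c_pt * phi_s)
    return dict(counts)


src_pvt = {("src_a", "leave [X1]"): (60, 0.6), ("src_b", "leave [X1]"): (70, 0.7)}
pvt_trg = {("leave [X1]", "trg_c"): (100, 0.5), ("leave [X1]", "trg_d"): (75, 0.3)}
bidir_counts = triangulate_bidirectional(src_pvt, pvt_trg)
for pair, c in sorted(bidir_counts.items()):
    print(pair, round(c, 2))
```

Unlike CountMin, the resulting counts (30, 18, 35, 21) keep the relative preference between the two targets.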

  • Fr -> Es (via En)

    Method                              BLEU (PBMT)   BLEU (Hiero)
    Direct                              40.15         40.19
    Cascade                             36.20         36.30
    Marginalization                     39.13         38.75
    CountMin                            38.25         37.89
    CountMin + Lex Marginalization      38.77         37.92
    Bidirection                         38.52         38.28
    Bidirection + Lex Marginalization   39.16         38.82

    CountMin < Bidirection, Bidirection ≈ Marginalization

  • Fr -> Zh (via En)

    Method                              BLEU (PBMT)   BLEU (Hiero)
    Direct                              14.31         16.33
    Cascade                             14.05         16.23
    Marginalization                     14.3          16.66
    CountMin                            13.69         15.89
    CountMin + Lex Marginalization      14.43         16.40
    Bidirection                         14.26         14.61
    Bidirection + Lex Marginalization   14.45         16.63

    Fr -> Es (via En)

  • Merging

    1: Interpolation [Zhu et al., 2014], $\alpha = 0.9$

    $$\phi(trg \mid src) = \alpha\,\phi_1(trg \mid src) + (1 - \alpha)\,\phi_2(trg \mid src)$$

    $$p_{lex}(trg \mid src) = \alpha\,p_{lex,1}(trg \mid src) + (1 - \alpha)\,p_{lex,2}(trg \mid src)$$

    2: SumCount [Zhu et al., 2014]

    $$c(src, trg) = c_1(src, trg) + c_2(src, trg)$$
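Both merging strategies are easy to sketch on dictionary-based tables. The function names and the toy entries below are hypothetical; the sketch assumes a rule missing from one table counts as 0 there, and that table 1 is the one receiving the weight 0.9.

```python
def merge_interpolation(table1, table2, alpha=0.9):
    """Linear interpolation: alpha * phi_1 + (1 - alpha) * phi_2.
    A rule absent from one table is treated as probability 0 in that table."""
    keys = set(table1) | set(table2)
    return {k: alpha * table1.get(k, 0.0) + (1.0 - alpha) * table2.get(k, 0.0)
            for k in keys}


def merge_sumcount(counts1, counts2):
    """SumCount: c(src, trg) = c_1(src, trg) + c_2(src, trg)."""
    keys = set(counts1) | set(counts2)
    return {k: counts1.get(k, 0) + counts2.get(k, 0) for k in keys}


# Toy tables with hypothetical rules.
phi_1 = {("src_a", "trg_c"): 0.5}
phi_2 = {("src_a", "trg_c"): 1.0, ("src_a", "trg_d"): 0.2}
merged = merge_interpolation(phi_1, phi_2)
summed = merge_sumcount({("src_a", "trg_c"): 3}, {("src_a", "trg_c"): 60, ("src_a", "trg_d"): 18})
print(merged)
print(summed)
```

With interpolation, rules found only in the second table survive with a small weight; with SumCount, the merged counts still need to be renormalised into probabilities afterwards.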

  • Fr -> Es

    BLEU score, baselines             PBMT    Hiero
    10k Direct                        40.15   40.19
    Marginalization                   39.13   38.75

    BLEU score, merged tables                              PBMT                          Hiero
    Method                                                 Direct   w/ Triangulation     Direct   w/ Triangulation
    Direct 1k + Marginalization 100k (interpolation)       26.94    39.13                26.57    38.82
    Direct 1k + Bidirection 100k (integration)             26.94    39.11                26.57    38.72
    Direct 10k + Marginalization 100k (interpolation)      36.23    39.25                37.67    38.89
    Direct 10k + Bidirection 100k (interpolation)          36.23    39.15                37.67    38.82

  • Fr -> Zh

    BLEU score, baselines             PBMT    Hiero
    10k Direct                        14.31   16.33
    Marginalization                   14.43   16.63

    BLEU score, merged tables                              PBMT                          Hiero
    Method                                                 Direct   w/ Triangulation     Direct   w/ Triangulation
    Direct 1k + Marginalization 100k (interpolation)       4.30     14.48                4.18     16.40
    Direct 1k + Bidirection 100k (integration)             4.30     14.45                4.18     16.43
    Direct 10k + Marginalization 100k (interpolation)      13.28    14.47                16.78    16.67
    Direct 10k + Bidirection 100k (interpolation)          13.28    14.44                16.78    16.59

  • 6.

  • Marginalization

    T2S


  • 7. Appendix


  • train.fr-en.fr, train.fr-en.en -> SMT (fr -> en)

    train.en-zh.en, train.en-zh.zh -> SMT (en -> zh)

    input.fr -> SMT (fr -> en) -> SMT (en -> zh) -> translated.zh

    1, 2, ..., n

    Legend: prepared corpus, trained task, translated text

    O(n)

  • Synthetic [De Gispert et al., 2006]

    train.en-zh.en, train.en-zh.zh -> SMT (en -> zh)

    train.fr-en.en -> SMT (en -> zh) -> translated.zh (used as train.fr-zh.zh)

    train.fr-en.fr (used as train.fr-zh.fr), translated.zh -> SMT (fr -> zh)

    input.fr -> SMT (fr -> zh) -> translated.zh
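The corpus-building step of the synthetic method can be sketched as below. The function name and the toy en -> zh translator are hypothetical; in practice the translator would be the SMT system trained on train.en-zh.

```python
from typing import Callable, List, Tuple


def build_synthetic_corpus(src_side: List[str],
                           pvt_side: List[str],
                           pvt_to_trg: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Translate the pivot side of a source-pivot corpus into the target language
    and pair each result with the original source sentence."""
    if len(src_side) != len(pvt_side):
        raise ValueError("parallel corpus sides must have the same length")
    return [(src, pvt_to_trg(pvt)) for src, pvt in zip(src_side, pvt_side)]


# Hypothetical stand-in for the trained en -> zh system.
toy_en_zh = {"cat": "猫", "black": "黑"}
en_to_zh = lambda s: " ".join(toy_en_zh.get(w, w) for w in s.split())

train_fr = ["chat noir", "chat"]        # plays the role of train.fr-en.fr
train_en = ["black cat", "cat"]         # plays the role of train.fr-en.en
synthetic = build_synthetic_corpus(train_fr, train_en, en_to_zh)
print(synthetic)
```

The resulting pairs play the role of train.fr-zh.fr and train.fr-zh.zh, on which a direct fr -> zh system is then trained.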


  • Source rule: selon leurs [X0]

    Pivot rules: according to their [X0]; after their [X0]

    Target rules: [X0] [X0]

    0.2 0.6

    0.4 1 0.6

    0.2 × 0.4 = 0.08
    0.2 × 0.6 + 0.4 × 1 = 0.52

  • CountMin (FULL)

    2: CountMin [Zhu et al., 2014]

    $$c(src, trg) = \sum_{pvt} \min\bigl(c(src, pvt),\, c(pvt, trg)\bigr)$$

    $$\phi(trg \mid src) = \frac{c(src, trg)}{\sum_{trg'} c(src, trg')}$$

    $$w(trg \mid src) = \frac{c(src, trg)}{\sum_{trg'} c(src, trg')}$$

    $$a = \{(t, s) \mid \exists p : (s, p) \in a_1 \land (p, t) \in a_2\}$$

    $$p_{lex}(trg \mid src, a) = \prod_{i=1}^{n} \frac{1}{|\{j \mid (i, j) \in a\}|} \sum_{(i, j) \in a} w(trg_i \mid src_j)$$
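The alignment induction and the lexical weight can be sketched together. This is an illustration under assumptions: alignments are sets of index pairs, `w` is a dictionary of word translation probabilities, and unaligned target words are simply skipped (a simplification chosen for this sketch; the slide does not say how they are handled).

```python
def induce_alignment(a1, a2):
    """a1: set of (s, p) links, a2: set of (p, t) links.
    Returns links (t, s) for every s and t that share a pivot position p."""
    return {(t, s) for (s, p) in a1 for (p2, t) in a2 if p == p2}


def lexical_weight(trg_words, src_words, alignment, w):
    """Product over target positions i of the average w(trg_i | src_j)
    over the source positions j linked to i in `alignment` (pairs (i, j))."""
    weight = 1.0
    for i, trg_word in enumerate(trg_words):
        links = [j for (ti, j) in alignment if ti == i]
        if not links:
            continue  # assumption of this sketch: unaligned target words are skipped
        weight *= sum(w.get((trg_word, src_words[j]), 0.0) for j in links) / len(links)
    return weight


# Toy example with hypothetical words and probabilities.
a1 = {(0, 0), (1, 1)}   # source position -> pivot position
a2 = {(0, 1), (1, 0)}   # pivot position -> target position
a = induce_alignment(a1, a2)
w = {("x", "b"): 0.5, ("y", "a"): 0.4}
p_lex = lexical_weight(["x", "y"], ["a", "b"], a, w)
print(sorted(a), round(p_lex, 2))
```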