17
1 NAIST at WAT 2014 Forest-to-String SMT for Asian Language Translation: NAIST at WAT 2014 Graham Neubig Nara Institute of Science and Technology (NAIST) 2014-10-4

Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

1

NAIST at WAT 2014

Forest-to-String SMT for Asian Language Translation:NAIST at WAT 2014

Graham NeubigNara Institute of Science and Technology (NAIST)

2014-10-4

Page 2: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

2

NAIST at WAT 2014

Features of ASPEC

● Translation between languages with different grammatical structures

流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。

an image was reconstituted in order to measure flowing plasma correctly .

● We all know: Phrase-based MT is not enough

for the accurate measurement of plasma flow image was reconstructed .

Page 3: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

3

NAIST at WAT 2014

Solution?: 2-step Translation Process

● Pre-ordering [Weblio, SAS_MT, NII, TMU, NICT]

● RBMT+Statistical Post Editing [TOSHIBA, EIWA]

我々 は 科学論文 を 翻訳する

我々 翻訳する 科学論文

we translatescientific papers

我々 は 科学論文 を 翻訳する

we translatescience thesis

we translatescientific papers

Page 4: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

4

NAIST at WAT 2014

This is a lot of work... :(How do I make good

Japanese-Englishpreordering rules?!

How do I make goodJapanese-Chinese

preorderering rules?!

What about error propagation?

What if better preorderingaccuracy doesn't equal better

translation accuracy?

Page 5: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

5

NAIST at WAT 2014

Evidence

Page 6: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

6

NAIST at WAT 2014

Our Solution: Tree-to-String Translation [Liu+ 06]

友達 と ご飯 を 食べ た

SUF5

VP0-5

PP0-1

VP2-5

PP2-3

N2

P3

V4

N0

P1

VP4-5

a meal

a meal

x1 x0

x1 x0

my friend

my friend

x1 with x0

x1 x0

with

ate

ate

Page 7: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

7

NAIST at WAT 2014

Requirements for aTree-to-String Model

This is a test . It uses data .

これはテストです。 データを使用します。ParallelCorpus

Source SentenceParser

Alignments

Rule ExtractionRule ScoringOptimization

Tree-to-StringModel

Page 8: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

8

NAIST at WAT 2014

Reducing our work load.How do I make good

Japanese-Englishpreordering rules?!

How do I make goodJapanese-Chinese

preorderering rules?!

What about error propagation?

What if better preorderingaccuracy doesn't equal better

translation accuracy?

XX

X

Page 9: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

9

NAIST at WAT 2014

Forest-to-string Translation[Mi+ 08]

I saw a girl with a telescope

PRP0,1

VBD1,2

DT2,3

NN3,4

IN4,5

DT5,6

NN6,7

NP5,7

NP2,4

PP4,7

VP1,7

S0,7

NP2,7

NP0,1

Page 10: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

10

NAIST at WAT 2014

Travatar Toolkit

● Forest-to-string translation toolkit

● Supports training, decoding

● Includes preprocessing scripts for parsing, etc.

● Many other features (optimization, Hiero, etc...)

Available open source!http://phontron.com/travatar

Page 11: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

11

NAIST at WAT 2014

NAIST WAT System

Page 12: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

12

NAIST at WAT 2014

WAT Results

en-ja ja-en zh-ja ja-zh0

10

20

30

40

50

BLEU

en-ja ja-en zh-ja ja-zh0

20

40

60

HUMAN

OtherNAIST

First place in all tasks!

+2.2

+2.7

+3.6+1.8

+13.0

+15.0

+28.3

+3.8

Page 13: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

13

NAIST at WAT 2014

System Elements

Travatar!Same as [Neubig & Duh, ACL2014]

Recurrent NeuralNet Language Model

Pre/post Processing(UNK splitting, transliteration)

Dictionaries

Page 14: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

14

NAIST at WAT 2014

Recurrent Neural Network LM

● Vector representation → robustness

● Recurrent architecture → longer context

I can eat an apple </s>

Page 15: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

15

NAIST at WAT 2014

Pre/post processingUNK segmentation (ja-en)

球内部

球 内部

試験 管立て

試験 管 立て

Kanji Normalization (ja-zh, zh-ja)

イチョウ黄叶

イチョウ黄葉

臭気鉴定师

臭気鑑定師

Transliteration (ja-en)

Japan インテック

Japan Intekku

Dictionary addition (ja-en)

膿瘍

apostema

典型

archetype

Page 16: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

16

NAIST at WAT 2014

Conclusion

Page 17: Forest-to-String SMT for Asian Language Translation: NAIST ...Travatar! Same as [Neubig & Duh, ACL2014] Recurrent Neural Net Language Model Pre/post Processing (UNK splitting, transliteration)

17

NAIST at WAT 2014

Future Work

LOSE at next year's WAT.(Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)

Starting soon! Training scripts to be available:http://phontron.com/project/wat2014