Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
1
NAIST at WAT 2014
Forest-to-String SMT for Asian Language Translation:NAIST at WAT 2014
Graham NeubigNara Institute of Science and Technology (NAIST)
2014-10-4
2
NAIST at WAT 2014
Features of ASPEC
● Translation between languages with different grammatical structures
流動 プラズマ を 正確 に 測定 する ため に 画像 を 再 構成 した 。
an image was reconstituted in order to measure flowing plasma correctly .
● We all know: Phrase-based MT is not enough
for the accurate measurement of plasma flow image was reconstructed .
3
NAIST at WAT 2014
Solution?: 2-step Translation Process
● Pre-ordering [Weblio, SAS_MT, NII, TMU, NICT]
● RBMT+Statistical Post Editing [TOSHIBA, EIWA]
我々 は 科学論文 を 翻訳する
我々 翻訳する 科学論文
we translatescientific papers
我々 は 科学論文 を 翻訳する
we translatescience thesis
we translatescientific papers
4
NAIST at WAT 2014
This is a lot of work... :(How do I make good
Japanese-Englishpreordering rules?!
How do I make goodJapanese-Chinese
preorderering rules?!
What about error propagation?
What if better preorderingaccuracy doesn't equal better
translation accuracy?
5
NAIST at WAT 2014
Evidence
6
NAIST at WAT 2014
Our Solution: Tree-to-String Translation [Liu+ 06]
友達 と ご飯 を 食べ た
SUF5
VP0-5
PP0-1
VP2-5
PP2-3
N2
P3
V4
N0
P1
VP4-5
a meal
a meal
x1 x0
x1 x0
my friend
my friend
x1 with x0
x1 x0
with
ate
ate
7
NAIST at WAT 2014
Requirements for aTree-to-String Model
This is a test . It uses data .
これはテストです。 データを使用します。ParallelCorpus
Source SentenceParser
Alignments
Rule ExtractionRule ScoringOptimization
Tree-to-StringModel
8
NAIST at WAT 2014
Reducing our work load.How do I make good
Japanese-Englishpreordering rules?!
How do I make goodJapanese-Chinese
preorderering rules?!
What about error propagation?
What if better preorderingaccuracy doesn't equal better
translation accuracy?
XX
X
9
NAIST at WAT 2014
Forest-to-string Translation[Mi+ 08]
I saw a girl with a telescope
PRP0,1
VBD1,2
DT2,3
NN3,4
IN4,5
DT5,6
NN6,7
NP5,7
NP2,4
PP4,7
VP1,7
S0,7
NP2,7
NP0,1
10
NAIST at WAT 2014
Travatar Toolkit
● Forest-to-string translation toolkit
● Supports training, decoding
● Includes preprocessing scripts for parsing, etc.
● Many other features (optimization, Hiero, etc...)
Available open source!http://phontron.com/travatar
11
NAIST at WAT 2014
NAIST WAT System
12
NAIST at WAT 2014
WAT Results
en-ja ja-en zh-ja ja-zh0
10
20
30
40
50
BLEU
en-ja ja-en zh-ja ja-zh0
20
40
60
HUMAN
OtherNAIST
First place in all tasks!
+2.2
+2.7
+3.6+1.8
+13.0
+15.0
+28.3
+3.8
13
NAIST at WAT 2014
System Elements
Travatar!Same as [Neubig & Duh, ACL2014]
Recurrent NeuralNet Language Model
Pre/post Processing(UNK splitting, transliteration)
Dictionaries
14
NAIST at WAT 2014
Recurrent Neural Network LM
● Vector representation → robustness
● Recurrent architecture → longer context
I can eat an apple </s>
15
NAIST at WAT 2014
Pre/post processingUNK segmentation (ja-en)
球内部
球 内部
試験 管立て
試験 管 立て
Kanji Normalization (ja-zh, zh-ja)
イチョウ黄叶
イチョウ黄葉
臭気鉴定师
臭気鑑定師
Transliteration (ja-en)
Japan インテック
Japan Intekku
Dictionary addition (ja-en)
膿瘍
apostema
典型
archetype
16
NAIST at WAT 2014
Conclusion
17
NAIST at WAT 2014
Future Work
LOSE at next year's WAT.(Make Travatar so easy to use that others can use it to make really good MT systems for Asian languages.)
Starting soon! Training scripts to be available:http://phontron.com/project/wat2014