Upload
adele-farmer
View
217
Download
1
Embed Size (px)
Citation preview
Kyoto-U: Syntactical EBMT System for NTCIR-7 Patent
Translation Task
Kyoto University
Toshiaki Nakazawa Sadao Kurohashi
Overview of Kyoto-U SystemTranslation Examples
J: 図書館で新聞を読むE: I read a newspaper in the library
J: 政治の本が売れ残っているE: A book in politics was left on the shelf
・・・・・
・・・・・ ・・・・・
本 が売れ残って いる
政治 の a book
in politicswas left
on the shelf
図書館 で新聞 を
読む
I
read
a newspaper
in the library
library in
newspaper ACC
read
politics in
book NOM
left unsold
Overview of Kyoto-U SystemTranslation Examples
Input:図書館で政治の本を読む。
Output:I read a book in politicsin the library
本 が売れ残って いる
政治 の a book
in politicswas left
on the shelf
図書館 で新聞 を
読む
I
read
a newspaper
in the library
・・・・・ ・・・・・
図書館 で
本 を読む
政治 の
read
book ACC
politics in
library in
a book
in politics
in the library
I
read
Overview of Kyoto-U SystemTranslation Examples
Alignment
交差
点 で 、
突然
あの車 が
飛び出して 来た のです
the car
came
at me
from the side
at the intersection
1. Transformation into dependency structure
J: JUMAN/KNPE: Charniak’s nlparser → Dependency tree
Alignment
交差
点 で 、
突然
あの車 が
飛び出して 来た のです
the car
came
at me
from the side
at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
Finding Correspondences• Bilingual dictionaries (500K entries)
• Substring co-occurrence (Cromieres 2006)
• Numeral normalization
二百十六万 → 2,160,000 ← 2.16 million
• Transliteration (Katakana words, NEs) ローズワイン → rosuwain ⇔ rose wine
(similarity:0.78)新宿 → shinjuku ⇔ shinjuku (similarity:1.0)
)()(
),(
ecountjcount
ejcount
Alignment
交差
点 で 、
突然
あの車 が
飛び出して 来た のです
the car
came
at me
from the side
at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
Alignment
交差
点 で 、
突然
あの車 が
飛び出して 来た のです
the car
came
at me
from the side
at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
4. Handling of remaining phrasesExtension to leaf-nodes
Alignment
交差
点 で 、
突然
あの車 が
飛び出して 来た のです
the car
came
at me
from the side
at the intersection
1. Transformation into dependency structure
2. Detection of word(s) correspondences
3. Disambiguation of correspondences
4. Handling of remaining phrases
5. Registration to translation example database
Alignment Ambiguities
you
will have to file
insurance
an claim
insurance
with the office
in Japan
日本 で
保険
会社 に 対して
保険
請求 の
申し立て が
可能です よ
[in Japan]
[insurance]
[insurance]
[of claim]
[to the company]
[file]
[be able to]
• For each pair of candidates ai and aj
calculate the J-side distance dJ and
the E-side distance dE
• Give a consistency score to the pair based on dJ and dE
• Calculate consistency scores for all the pairs in a possible set of alignment candidates
2/)1(
),(),,(maxarg 1 1
nn
aadaadcsn
i
n
ij jiEjiJ
alignment
Consistency Score• The frequency of distance pair in gold-standard
alignment data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]
Frequency (log)
Dist of J-Side Dist of E-Side
Distance based on Dependency Type
you
will have to file
insurance
an claim
insurance
with the office
in Japan
日本 で
保険
会社 に 対して
保険
請求 の
申し立て が
可能です よ
デ格
文節内
連用
文節内
ノ格
ガ格
NP
NP
NN
PP
NN
PP
3
1
1
3
2
3
3
3
3
3
1
1
[in Japan]
[insurance]
[insurance]
[of claim]
[to the company]
[file]
[be able to]
you
will have to file
insurance
an claim
insurance
with the office
in Japan
日本 で
保険
会社 に 対して
保険
請求 の
申し立て が
可能です よ
デ格
文節内
連用
文節内
ノ格
ガ格
NP
NP
NN
PP
NN
PP
3
1
1
3
2
3
3
3
3
3
1
1
[in Japan]
[insurance]
[insurance]
[of claim]
[to the company]
[file]
[be able to]
Distance based on Dependency Type
you
will have to file
insurance
an claim
insurance
with the office
in Japan
日本 で
保険
会社 に 対して
保険
請求 の
申し立て が
可能です よ
3
1
1
3
2
3
3
3
3
1
1
デ格
文節内
連用
文節内
ノ格
ガ格
NP
NP
NN
PP
NN
PP
3
[in Japan]
[insurance]
[insurance]
[of claim]
[to the company]
[file]
[be able to]
Distance based on Dependency Type
Input:図書館で政治の本を読む。
Output:I read a book in politicsin the library
本 が売れ残って いる
政治 の a book
in politicswas left
on the shelf
図書館 で新聞 を
読む
I
read
a newspaper
in the library
・・・・・ ・・・・・
図書館 で
本 を読む
政治 の
read
book ACC
politics in
library in
a book
in politics
in the library
I
read
TranslationTranslation Examples
Selection of Translation Examples
• Score for an example
1. Size of an example
2. Similarity of neighboring nodes
3. Translation probability
• Beam search from the root of the input
[Sato 91]
Input:
図書館 で
本 を読む
政治 の
read
book ACC
politics in
library in
2sizew
読む
a newspaper
I
read
a newspaper
in the library
I
study
in the library
I
read
a newspaper
in the library
7.0 simw3
2 transw
0.7
Translation example:
新聞 を図書館 で
Input:図書館で政治の本を読む。
本 が売れ残って いる
政治 の a book
in politicswas left
on the shelf
図書館 で新聞 を
読む
I
read
a newspaper
in the library
・・・・・ ・・・・・
図書館 で
本 を読む
政治 の
read
book ACC
politics in
library in
a book
in politics
in the library
I
read
Combination of TMsTranslation Examples
┌ 記録 ┌ 領域 で の ├ 変形 ┌ 形状 と , │ ┌ 記録 ├ 特性 の┌ 関係 を調べた 。
┌ the relationship ││ ┌ deformation ││┌ shape and │││ │ ┌ recording │││ └ in the region ││├ recording │└ between characteristics was examined
InputDependency Tree
Input :記録領域での変形形状と,記録特性の関係を調べた。
OutputDependency Tree┌ 状況 を
調べた 。┌ the situationwas examined
┌ 相互 ┌ 作用 と │┌ 記録 ├ 特性 の┌ 関係 を調べた 。
┌ the relationship ││┌ interaction and ││├ recording │└ between characteristics was investigated
┌ 大変 ┌ 形 ┌ 領域 で の ├ 断面┌ 形状 を模擬 した
┌ cross-sectional ┌ shape ││ ┌ large ││┌ deformation │└ in the region was └ simulated
┌ 記録領域 の
┌ recording of the areas
┌ 変形パターン を
┌ deformation the pattern
Translation Examples
Output :The relationship between deformation shape in the recording region and recording characteristics was examined .
BLEU Adequacy Fluency Average
27.20 NTT 3.81 tsbmt 4.02 Japio 3.88 tsbmt
27.14 moses 3.71 Japio 3.94 tsbmt 3.86 Japio
27.14 MIT 3.15 MIT 3.66 MIT 3.40 MIT
25.48 NAIST-NTT 2.96 NTT 3.65 NTT 3.30 NTT
24.79 NICT-ATR 2.85 Kyoto-U 3.55 moses 3.18 moses
24.49 KLE 2.81 moses 3.44 tori 3.10 Kyoto-U
23.10 tsbmt 2.66 NAIST-NTT 3.43 NAIST-NTT 3.04 NAIST-NTT
22.29 tori 2.59 KLE 3.35 Kyoto-U 3.01 tori
21.57 Kyoto-U 2.58 tori 3.28 HIT2 2.94 KLE
19.93 mibel 2.47 NICT-ATR 3.28 KLE 2.86 HIT2
19.48 HIT2 2.44 HIT2 3.09 mibel 2.78 NICT-ATR
19.46 Japio 2.38 mibel 3.08 NICT-ATR 2.74 mibel
15.90 TH 1.87 TH 2.42 FDU-MCandWI 2.13 TH
9.55 FDU-MCandWI 1.75 FDU-MCandWI 2.39 TH 2.08 FDU-MCandWI
1.41 NTNU 1.08 NTNU 1.04 NTNU 1.06 NTNU
Intrinsic J-E Evaluation Result
BLEU Adequacy Fluency Average
30.58 moses 3.53 tsbmt 3.69 moses 3.60 tsbmt
29.15 NICT-ATR 2.90 moses 3.67 tsbmt 3.30 moses
28.07 NTT 2.74 NTT 3.54 NTT 3.14 NTT
22.65 Kyoto-U 2.59 NICT-ATR 3.20 NICT-ATR 2.89 NICT-ATR
17.46 tsbmt 2.42 Kyoto-U 2.54 Kyoto-U 2.48 Kyoto-U
Intrinsic E-JEvaluation Result
• Not caring whether a child node is a pre-child or post-child– Resulting target structure goes wrong
• After resolving this defect, BLEU score in EJ translation rose to 24.02 from 22.65
Critical Defect in EJ Translation
BLEU Adequacy Fluency Average
30.58 moses 3.53 tsbmt 3.69 moses 3.60 tsbmt
29.15 NICT-ATR 2.90 moses 3.67 tsbmt 3.30 moses
28.07 NTT 2.74 NTT 3.54 NTT 3.14 NTT
22.65 Kyoto-U 2.59 NICT-ATR 3.20 NICT-ATR 2.89 NICT-ATR
17.46 tsbmt 2.42 Kyoto-U 2.54 Kyoto-U 2.48 Kyoto-U
24.02 ? ? ?
• Kyoto-U Fully Syntactic EBMT system:1. Alignment: Consistency
2. Alignment: Extension
3. Translation: Discontinuous example
4. Translation: Easy combination
• By using syntactic information, we could achieve reasonably high quality translation
• For patent translation, we may need some pre-processings to handle special expressions which cause parsing errors
Conclusion