Structural Phrase Alignment Based on Consistency Criteria
Toshiaki Nakazawa, Kun Yu, Sadao Kurohashi (Graduate School of Informatics, Kyoto University) {nakazawa, kunyu}@nlp.kuee.kyoto-u.ac.jp [email protected]
[Figure: flow of the EBMT system, from Input through Translation Examples and Language Models to Output]
Input: 交差点に入る時私の信号は青でした。
  Parsed as: 交差 (cross) / 点 に (point) / 入る (enter) / 時 (when) / 私 の (my) / 信号 は (signal) / 青 (blue) / でした 。 (was)
Translation Examples:
  交差 点 で 、 突然 飛び出して 来た のです 。 (cross, point, suddenly, rush out) ⇔ came at me from the side at the intersection
  家 に 入る 時 脱ぐ (house, enter, when, put off) ⇔ to remove when entering a house
  私 の サイン ⇔ my signature
  信号 は 青 でした 。 (signal, blue, was) ⇔ The traffic light was green
Language Models
Output: My traffic light was green when entering the intersection.
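[Figure: pairs of correspondences marked "Near!" and "Far!" according to their distance on each side]
$$\text{alignment} = \operatorname*{argmax}_{\text{alignment}} \sum_{i} \sum_{j} cs\bigl(d_J(a_i, a_j),\, d_E(a_i, a_j)\bigr)$$
[Figure: frequency (log) and consistency score plotted against the J-side distance and the E-side distance]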
Flow of Our EBMT System
Core Steps of Alignment
• Searching Correspondence Candidates
– Fine-grained alignment is effective for translation
– Search for as many candidates as possible using a variety of linguistic information
  • Bilingual dictionaries
  • Transliteration (Katakana words, NEs)
    ローズワイン → rosuwain ⇔ rose wine (similarity: 0.78)
    新宿 → shinjuku ⇔ shinjuku (similarity: 1.0)
  • Numeral normalization: 二百十六万 → 2,160,000 ← 2.16 million
  • Japanese flexible matching (Odani et al. 2007)
  • Substring co-occurrence measure (Cromieres 2006)
• Selecting Correspondence Candidates
– More candidates lead to more ambiguities and improper alignments
– A robust alignment method is needed that aligns parallel sentences consistently by selecting an adequate set of candidates
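The numeral normalization step listed above can be illustrated with a small sketch. This is not the authors' implementation: the function names `kanji_to_int` and `english_to_int`, the supported characters, and the supported scale words are all assumptions chosen for the poster's example (二百十六万 and "2.16 million").

```python
from decimal import Decimal

# Assumed character tables; only plain kanji numerals are handled.
DIGITS = {"一": 1, "二": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
LARGE_UNITS = {"万": 10**4, "億": 10**8, "兆": 10**12}
SCALES = {"thousand": 10**3, "million": 10**6, "billion": 10**9}


def kanji_to_int(text):
    """Convert a kanji numeral such as 二百十六万 to an integer."""
    total = 0    # sum of completed large-unit groups (万, 億, 兆)
    section = 0  # value accumulated below the current large unit
    digit = 0    # digit waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare unit such as 十 means 1 x 10.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch in LARGE_UNITS:
            total += ((section + digit) or 1) * LARGE_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


def english_to_int(text):
    """Convert strings such as '2.16 million' or '2,160,000' to an integer."""
    parts = text.replace(",", "").split()
    value = Decimal(parts[0])  # Decimal avoids float rounding on 2.16
    if len(parts) > 1:
        value *= SCALES[parts[1].lower()]
    return int(value)


print(kanji_to_int("二百十六万"))      # → 2160000
print(english_to_int("2.16 million"))  # → 2160000
```

Once both sides are mapped to the same integer, the two expressions can be proposed as a correspondence candidate by simple equality.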
Experimental Result
• 500 test sentences from Mainichi newspaper parallel corpus
• Bilingual dictionary: KENKYUSYA J-E/J-E 500K entries
• Evaluation criteria: Precision / Recall / F-measure
• Character-based for Japanese, word-based for English

                            Pre     Rec     F
Baseline                    77.47   64.32   70.29
+Consistency Score          80.30   66.90   72.99
Proposed (+CS, +DpndType)   80.77   69.14   74.51
Filtering (80%)             82.48   71.31   76.49
Moses (SMT Toolkit)*        60.19   33.15   42.75
Manual (upper bound)        95.58   89.80   92.60
* Using 300K newspaper domain bi-sentences for training

Quality of Other Language Pairs (AER)
                  English-French   English-Romanian   English-Korean
HLT-NAACL 2003    5.71             28.86              -
ACL 2005          -                26.55              -
(Gildea, 2003)    -                -                  32
GIZA++            15.89            27.19              35
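The F-measure values reported for the Japanese-English alignment experiment are consistent with the usual harmonic mean of precision and recall. The sketch below shows that relation; the function name `f_measure` is illustrative, and small differences in the last digit can occur because the poster's precision and recall are already rounded.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both in the same unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Baseline row: precision 77.47, recall 64.32
print(round(f_measure(77.47, 64.32), 2))  # → 70.29
```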
Conclusion
Selecting Correspondence Candidates Using Consistency Score and Dependency Type
[Figure: correspondence candidates between the dependency trees of "you will have to file an insurance claim with the insurance office in Japan" and 日本 で (in Japan) / 保険 (insurance) / 会社 に 対して (to company) / 保険 (insurance) / 請求 の (claim) / 申し立て が (instance) / 可能ですよ (you can). Callouts: "Ambiguities!" and "Improper alignments!"]
Distribution of the distance of alignment pairs in hand-annotated data (Mainichi newspaper 40K sentence pairs) [Uchimoto04]
Consistency Score Function
“Near-Near” pair → Positive Score
“Far-Far” pair → 0
“Near-Far” pair → Negative Score
Baseline:
$$cs(d_J, d_E) = \frac{1}{d_J} + \frac{1}{d_E}$$
Example: 1/1 + 1/2 = 1.5
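A minimal sketch of how the baseline score could be used to choose among ambiguous candidates, under several assumptions: the score of an alignment is the sum of cs over every pair of correspondences, one candidate is picked per Japanese word, and distances are taken from made-up linear positions (`J_POS`, `E_POS`) instead of real dependency-tree distances. All names here are hypothetical and the exhaustive search is only practical for toy inputs.

```python
from itertools import combinations, product


def baseline_cs(d_j, d_e):
    """Baseline consistency score: 1/d_J + 1/d_E."""
    return 1.0 / d_j + 1.0 / d_e


def alignment_score(alignment, dist_j, dist_e, cs=baseline_cs):
    """Sum cs over every pair of correspondences (j_node, e_node)."""
    return sum(
        cs(dist_j(a[0], b[0]), dist_e(a[1], b[1]))
        for a, b in combinations(alignment, 2)
    )


def is_one_to_one(alignment):
    """Reject alignments that reuse a node (this would give distance 0)."""
    js = [j for j, _ in alignment]
    es = [e for _, e in alignment]
    return len(set(js)) == len(js) and len(set(es)) == len(es)


def best_alignment(candidates_per_word, dist_j, dist_e, cs=baseline_cs):
    """Try every combination of one candidate per word; keep the best score."""
    valid = (al for al in product(*candidates_per_word) if is_one_to_one(al))
    return max(valid, key=lambda al: alignment_score(al, dist_j, dist_e, cs))


# Toy data (assumed): positions stand in for tree distances.
J_POS = {"hoken": 1, "kaisha": 2}
E_POS = {"insurance_a": 2, "claim": 3, "insurance_b": 5, "office": 6}
dist_j = lambda a, b: abs(J_POS[a] - J_POS[b])
dist_e = lambda a, b: abs(E_POS[a] - E_POS[b])

candidates = [
    [("kaisha", "office")],                                 # unambiguous
    [("hoken", "insurance_a"), ("hoken", "insurance_b")],   # ambiguous
]

print(baseline_cs(1, 2))                           # → 1.5
print(best_alignment(candidates, dist_j, dist_e))
# → (('kaisha', 'office'), ('hoken', 'insurance_b'))
```

In this toy case the occurrence of "insurance" next to "office" wins because it is near the already aligned pair on both sides (score 2.0 against 1.25).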
Japanese dependency type        Distance
predicate: level C              6
predicate: level B+/B           5
predicate: level B-/A           4
predicate: level A- / Others    3
case "no" / rentai              2
Inside clause                   1

English dependency type                                   Distance
S / SBAR / SQ …                                           5
VP / WHADVP                                               4
WHADJP / ADVP / ADJP / NP / PP / INTJ / QP / PRT / PRN    3
Others                                                    1
Dependency Type Distance
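One plausible reading of the dependency type distance is that the distance between two nodes is the sum of the per-edge values along the tree path connecting them. The sketch below assumes exactly that; the tree shape, the romanized node names, and the function names are illustrative assumptions, and only the edge values (inside clause 1, case "no" 2, others 3) come from the poster.

```python
# Hypothetical dependency tree: child -> (head, dependency-type distance).
TREE = {
    "nihon_de": ("kanou_desu_yo", 3),            # case "de"   (Others: 3)
    "hoken_1": ("kaisha_ni_taishite", 1),        # inside clause (1)
    "kaisha_ni_taishite": ("kanou_desu_yo", 3),  # renyou, assumed 3
    "hoken_2": ("seikyuu_no", 1),                # inside clause (1)
    "seikyuu_no": ("moushitate_ga", 2),          # case "no"   (2)
    "moushitate_ga": ("kanou_desu_yo", 3),       # case "ga"   (Others: 3)
}


def costs_to_ancestors(node, tree):
    """Map each ancestor (and the node itself) to its accumulated distance."""
    costs = {node: 0}
    total = 0
    while node in tree:
        head, weight = tree[node]
        total += weight
        costs[head] = total
        node = head
    return costs


def tree_distance(a, b, tree):
    """Sum of edge distances on the path between a and b."""
    up_a = costs_to_ancestors(a, tree)
    up_b = costs_to_ancestors(b, tree)
    # With positive weights, the lowest common ancestor gives the minimum.
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


print(tree_distance("hoken_2", "seikyuu_no", TREE))   # → 1
print(tree_distance("nihon_de", "seikyuu_no", TREE))  # → 8
```

Compared with counting edges, weighting by dependency type makes words inside the same clause count as near and words separated by a predicate boundary count as far.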
How to reflect the inconsistency?
• Proposed a new phrase alignment method using consistency criteria.
• Achieved sufficient alignment accuracy compared with other language pairs.
• We need to acquire the parameters automatically by machine learning.
• We plan to extend the framework so that it revises the parse result.
(There is a translation demo in the exhibition corner by NICT that uses our system!)
[Figure: dependency trees of "you will have to file an insurance claim with the insurance office in Japan" and 日本 で (in Japan) / 保険 (insurance) / 会社 に 対して (to company) / 保険 (insurance) / 請求 の (claim) / 申し立て が (instance) / 可能です よ (you can), with a dependency type distance on each edge. Japanese edge types: デ格 [case “de”], ガ格 [case “ga”], ノ格 [case “no”], 連用 [renyou], 文節内 [inside clause]; English node labels: NP, NN, PP. Callouts: "Near!" and "Far!"]
Pair 1: (Ds, Dt) = (1, 1) → Positive Score
Pair 2: (Ds, Dt) = (1, 7) → Negative Score