Slides presented at the study meeting of the machine translation group at NAIST.
14/06/05 Copyright (C) 2014 by Yusuke Oda, AHC-Lab, IS, NAIST 1
Tree-based Translation Models
Yusuke Oda (@odashi_t)
2014/6/5 NAIST MT-Study Group
Agenda
● (6.2) Synchronous Context Free Grammar (SCFG)
– (6.2.2) Learning SCFG
– (6.2.3) Introducing Syntax Labels
– (6.2.4) Features
– (6.2.5) Decoding
– (6.2.6) Rescoring
● (6.3) Synchronous Tree Substitution Grammar (STSG)
– (6.3.1) Characteristics of STSG
– (6.3.2) Learning STSG
– (6.3.3) Features
– (6.3.4) Decoding
– (6.3.5) Binarization
● (6.4) Synchronous Parsing
– (6.4.1) Inversion Transduction Grammar (ITG)
– (6.4.2) Span Pruning
– (6.4.3) Beam Search
– (6.4.4) Two Parsing
[Slide annotations: Hiero, Travatar]
Synchronous Context Free Grammar (SCFG)
Learning SCFG
● Synchronous rules are extracted from each sentence pair of the parallel corpus together with its word alignment, 〈f, e, a〉.
● f: Source sentence
● e: Target sentence
● a: Set of word alignments
Closed Phrase Pair under Word Alignment
● A phrase pair is closed under its word alignment when every alignment link that touches a word inside the pair also ends inside the pair.
[Figure: word alignment matrix between "he will dissolve the diet in the near future" and "彼 は 近い うち に 国会 を 解散 する"]
(国会 を → the diet)
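The closure condition can be checked mechanically. The following is a minimal sketch, assuming 0-based word indices, inclusive spans, and an alignment stored as a set of (source index, target index) links. The function name `is_closed` and the alignment links are illustrative assumptions, not taken from the slide.

```python
def is_closed(alignment, src_span, tgt_span):
    """Return True if the phrase pair is closed under the alignment.

    alignment: set of (source index, target index) links.
    src_span, tgt_span: inclusive (start, end) word-index ranges.
    """
    (s1, s2), (t1, t2) = src_span, tgt_span
    has_link = False
    for s, t in alignment:
        src_inside = s1 <= s <= s2
        tgt_inside = t1 <= t <= t2
        if src_inside != tgt_inside:
            return False  # a link crosses the phrase-pair boundary
        has_link = has_link or src_inside
    return has_link  # require at least one link inside the pair


# Source: 彼(0) は(1) 近い(2) うち(3) に(4) 国会(5) を(6) 解散(7) する(8)
# Target: he(0) will(1) dissolve(2) the(3) diet(4) in(5) the(6) near(7) future(8)
# Hypothetical alignment links (an assumption for illustration).
ALIGNMENT = {(0, 0), (2, 7), (3, 8), (4, 5), (5, 3), (5, 4), (7, 2), (8, 1)}

closed = is_closed(ALIGNMENT, (5, 6), (3, 4))      # (国会 を, the diet)
not_closed = is_closed(ALIGNMENT, (5, 6), (4, 4))  # (国会 を, diet)
```

Under these assumed links, (国会 を, the diet) is closed, while (国会 を, diet) is not, because 国会 is also linked to "the" outside the target span.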
Extracting Abstract Rules
● We can make more abstract synchronous rules by replacing part of a phrase pair with a non-terminal symbol when the phrase pair contains a smaller phrase pair.
[Figure: alignment matrix for "dissolve the diet in the near future" / "近い うち に 国会 を 解散 する", with the smaller phrase pairs replaced by non-terminals]
(国会 を, the diet)
(近い うち, near future)
(近い うち ... 解散 する, dissolve the ... near future)
(X1 に X2 解散 する, dissolve X2 in the X1)
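The substitution step can be sketched in a few lines of Python. This is only an illustration under simple assumptions: phrases are token lists, each smaller phrase pair occurs once in the larger pair, and the helper names `replace_once` and `abstract_rule` are hypothetical.

```python
def replace_once(tokens, sub, symbol):
    """Replace the first occurrence of the token sequence `sub` with `symbol`."""
    n = len(sub)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == sub:
            return tokens[:i] + [symbol] + tokens[i + n:]
    raise ValueError(f"{sub} not found in {tokens}")


def abstract_rule(src, tgt, sub_pairs):
    """Replace each smaller phrase pair with a shared indexed non-terminal."""
    for k, (sub_src, sub_tgt) in enumerate(sub_pairs, start=1):
        src = replace_once(src, sub_src, f"X{k}")
        tgt = replace_once(tgt, sub_tgt, f"X{k}")
    return src, tgt


src = "近い うち に 国会 を 解散 する".split()
tgt = "dissolve the diet in the near future".split()
sub_pairs = [
    ("近い うち".split(), "near future".split()),
    ("国会 を".split(), "the diet".split()),
]
rule_src, rule_tgt = abstract_rule(src, tgt, sub_pairs)
# rule_src -> ['X1', 'に', 'X2', '解散', 'する']
# rule_tgt -> ['dissolve', 'X2', 'in', 'the', 'X1']
```

The result corresponds to the rule (X1 に X2 解散 する, dissolve X2 in the X1) listed above.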
Hiero Grammar
● Hierarchical phrase grammar (Hiero Grammar):
– Set of all synchronous rules in the parallel corpus
● Algorithm:
1.
where is the set of all possible phrase pairs in the parallel corpus.
2. If a rule and a phrase pair satisfy then
3.
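For reference, the first two steps are commonly written as below in descriptions of hierarchical phrase-based extraction. The symbols $\mathcal{P}$ (set of phrase pairs), $\mathcal{G}$ (grammar), $\gamma$, $\alpha$ and the index $k$ are assumptions introduced here, and the third step is not reconstructed.

```latex
\begin{align*}
\text{1.}\quad & \langle \bar{f}, \bar{e} \rangle \in \mathcal{P}
  \;\Rightarrow\; X \to \langle \bar{f}, \bar{e} \rangle \in \mathcal{G} \\
\text{2.}\quad & X \to \langle \gamma, \alpha \rangle \in \mathcal{G},\;
  \langle \bar{f}, \bar{e} \rangle \in \mathcal{P},\;
  \gamma = \gamma_1 \bar{f} \gamma_2,\;
  \alpha = \alpha_1 \bar{e} \alpha_2 \\
  & \;\Rightarrow\; X \to \langle \gamma_1 X_k \gamma_2,\; \alpha_1 X_k \alpha_2 \rangle \in \mathcal{G}
\end{align*}
```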
Constraints of Hiero Rules
● To limit the size and ambiguity of the Hiero grammar, we can impose constraints on rule extraction.
● Minimal phrase pair
– (国会 を, the diet) ... BAD
– (国会, the diet) ... GOOD
● Phrase length
– (奈良 先端 科学 技術 大学院 大学 情報 科学 研究 科 自然 言語 処理 学 研究 室, ...) ... BAD (too many words)
● Number of symbols
– X → 〈あらゆる X1 を 全て X2 の 方 へ ねじ曲げ た の だ, ...〉 ... BAD (too many symbols)
● Rank of rule
– X → 〈X1 が X2 で X3 に X4 した, ...〉 ... BAD (too many non-terminals)
[Figure: alignment matrix for "国会 を" / "the diet"]
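The length, symbol-count, and rank constraints amount to simple filters on the source side of a candidate rule. The sketch below assumes non-terminals are written X1, X2, and so on. The default thresholds are commonly used Hiero settings, but here they are assumptions, not values given on the slide.

```python
import re

NONTERMINAL = re.compile(r"X\d+")


def keep_phrase(src_words, max_phrase_len=10):
    """Phrase-length constraint on an initial phrase pair (threshold assumed)."""
    return len(src_words) <= max_phrase_len


def keep_rule(src_symbols, max_symbols=5, max_rank=2):
    """Symbol-count and rank constraints on a rule's source side (thresholds assumed)."""
    rank = sum(1 for s in src_symbols if NONTERMINAL.fullmatch(s))
    return len(src_symbols) <= max_symbols and rank <= max_rank


long_phrase = "奈良 先端 科学 技術 大学院 大学 情報 科学 研究 科 自然 言語 処理 学 研究 室".split()
too_many_symbols = "あらゆる X1 を 全て X2 の 方 へ ねじ曲げ た の だ".split()
too_high_rank = "X1 が X2 で X3 に X4 した".split()
acceptable = "X1 に X2 解散 する".split()
```

With these thresholds, the three BAD examples above are rejected and the rule source side "X1 に X2 解散 する" is kept.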
Glue Rules
● To build long sentences from small rules, we introduce glue rules:
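Glue rules in Hiero-style grammars are usually written as the following pair, which concatenates X-rooted fragments left to right under the start symbol S. This is the standard formulation and is assumed here to match the slide.

```latex
\begin{align*}
S &\to \langle S_1\, X_2,\; S_1\, X_2 \rangle \\
S &\to \langle X_1,\; X_1 \rangle
\end{align*}
```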
Introducing Syntax Labels
● So far, we have covered the basic ideas of Hiero rules.
– The only non-terminal symbols are X and S.
● This model is very simple, but very ambiguous.
● Next, we introduce syntax information into Hiero rules.
= Syntax-augmented machine translation (SAMT)
[Figure: syntax tree of "this is a pen" (S, NP, VP, PRP, VBZ, DT, NN); Hiero + Syntax → SAMT]
Combinatory Categorial Grammar (CCG)
● SAMT uses categories (≒ partial structures of syntax labels) based on the idea of combinatory categorial grammar (CCG).
● Categories:
A/B: Syntax label A missing its right-side child B
A\B: Syntax label A missing its left-side child B
A+B: Concatenation of two syntax labels A and B
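A small sketch of how such categories could be assigned to an arbitrary span, assuming the parse tree is given as a mapping from half-open word spans to labels. The function `samt_labels` is hypothetical and returns every candidate category. Which candidate is preferred when several apply is a design choice not covered here.

```python
def samt_labels(span, constituents):
    """Return candidate SAMT-style categories for a half-open span (i, j)."""
    i, j = span
    if span in constituents:
        return [constituents[span]]  # the span is an exact constituent
    labels = []
    # A+B: two adjacent constituents that together cover the span
    for k in range(i + 1, j):
        if (i, k) in constituents and (k, j) in constituents:
            labels.append(f"{constituents[(i, k)]}+{constituents[(k, j)]}")
    for (a, b), parent in constituents.items():
        # A/B: constituent A starting at i that lacks B on its right side
        if a == i and b > j and (j, b) in constituents:
            labels.append(f"{parent}/{constituents[(j, b)]}")
        # A\B: constituent A ending at j that lacks B on its left side
        if b == j and a < i and (a, i) in constituents:
            labels.append(f"{parent}\\{constituents[(a, i)]}")
    return labels or ["X"]  # fall back to the generic Hiero symbol


# dissolve(0) the(1) diet(2) in(3) the(4) near(5) future(6)
TREE = {
    (0, 7): "VP", (0, 1): "VB", (1, 3): "NP", (1, 2): "DT", (2, 3): "NNP",
    (3, 7): "PP", (3, 4): "IN", (4, 7): "NP", (4, 5): "DT", (5, 6): "JJ",
    (6, 7): "NN",
}

near_future = samt_labels((5, 7), TREE)        # includes NP\DT
in_the = samt_labels((3, 5), TREE)             # IN+DT
dissolve_the_diet = samt_labels((0, 3), TREE)  # includes VP/PP
```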
Extracting SAMT Rules
[Figure: alignment matrix for "dissolve the diet in the near future" / "近い うち に 国会 を 解散 する" with the English parse tree (node labels VP, VB, NP, PP, DT, NNP, IN, NP, DT, JJ, NN) and the derived categories NP\DT, IN+DT, VP/PP, VP\VB]
VP → 〈NP\DT1 に NP2 解散 する, dissolve NP2 in the NP\DT1〉
VP → 〈近い うち IN+DT1 国会 を VB2, VB2 the diet IN+DT1 near future〉
etc...
Probabilistic Formalization of Hiero Model
● We treat translation with a Hiero grammar as maximization of the posterior probability (as in the phrase-based model):
● We assume the probability is modeled as a log-linear model:
: Set of derivations (≒ set of used synchronous rules)
: Weights
: Feature functions
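In a log-linear model, the score of a derivation is the weighted sum of its feature values, and the normalization term can be dropped when only the best candidate is needed. The sketch below assumes that form. The feature names, weights, and candidate values are invented for illustration.

```python
def score(weights, features):
    """Unnormalized log-linear score: sum of weight * feature value."""
    return sum(weights[name] * value for name, value in features.items())


def best_candidate(weights, candidates):
    """Pick the candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: score(weights, c["features"]))


# Hypothetical weights and feature values.
weights = {"log_p_forward": 1.0, "log_p_lm": 0.5, "word_count": -0.1}
candidates = [
    {"output": "dissolve the diet in the near future",
     "features": {"log_p_forward": -2.0, "log_p_lm": -4.0, "word_count": 7}},
    {"output": "the diet dissolve in near future",
     "features": {"log_p_forward": -1.5, "log_p_lm": -6.0, "word_count": 6}},
]
best = best_candidate(weights, candidates)
```

With these made-up numbers, the first candidate scores about -4.7 and the second about -5.1, so the first is selected.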
Features of Hiero Model (1)
● Generative model: likelihood of the translation probabilities
– Forward model
– Backward model
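Rule translation probabilities of this kind are typically estimated by relative frequency over the extracted rules. The sketch below assumes that estimator and assumes that "forward" means target side given source side. Both are assumptions, and the rule counts are invented.

```python
from collections import Counter


def rule_probabilities(rule_counts):
    """Relative-frequency estimates for (source side, target side) rule counts."""
    src_total, tgt_total = Counter(), Counter()
    for (src, tgt), count in rule_counts.items():
        src_total[src] += count
        tgt_total[tgt] += count
    # forward: P(target side | source side); backward: P(source side | target side)
    forward = {r: c / src_total[r[0]] for r, c in rule_counts.items()}
    backward = {r: c / tgt_total[r[1]] for r, c in rule_counts.items()}
    return forward, backward


# Hypothetical counts of extracted rules.
counts = {
    ("X1 に X2 解散 する", "dissolve X2 in the X1"): 3,
    ("X1 に X2 解散 する", "dissolve X2 within the X1"): 1,
    ("X1 で X2 解散 する", "dissolve X2 in the X1"): 1,
}
forward, backward = rule_probabilities(counts)
```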
Features of Hiero Model (2)
● Generative model: likelihood of the translation probabilities
– Syntax model (f)
– Syntax model (e)
Features of Hiero Model (3)
● Lexical translation model: goodness of the phrase alignment
– Forward model
– Backward model
Features of Hiero Model (4)
● Language model: measures the fluency of the hypothesis
● Out-of-vocabulary (OOV) penalty: adjusts the LM
● Length penalty: adjusts the number of words in the hypothesis
● Glue penalty: adjusts the number of glue rules in the derivation
Decoding of Hiero Model
● Given an input sentence and a set of SCFG rules, we find the optimal output sequence:
: Set of possible derivations given a grammar
: Sequence of terminal symbols in a given derivation
Decoding Process
1. Calculate the intersection between the input sentence and the grammar.
• = Generating a syntax forest using the CYK algorithm
2. Transform the syntax forest into the corresponding translation forest.
3. Output the sequence of terminal symbols in the translation forest that maximizes the model score.
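The three steps can be illustrated with a deliberately small decoder. It matches rule source sides against spans of the input, which plays the role of CYK-style parsing. It then keeps the best-scoring target string per span instead of building an explicit forest. This is a simplified sketch, not the method of any particular toolkit: it has no glue rules, no language model, and no pruning, and the rules and weights are invented.

```python
from functools import lru_cache


def decode(words, rules):
    """Return (score, target tokens) of the best derivation, or None."""
    words = tuple(words)

    def match(pattern, i, j):
        """Yield {non-terminal: (start, end)} for each way pattern covers words[i:j]."""
        if not pattern:
            if i == j:
                yield {}
            return
        head, rest = pattern[0], pattern[1:]
        if head.startswith("X") and head[1:].isdigit():
            for k in range(i + 1, j + 1):  # non-empty sub-span for the non-terminal
                for binding in match(rest, k, j):
                    yield {head: (i, k), **binding}
        elif i < j and words[i] == head:
            yield from match(rest, i + 1, j)

    @lru_cache(maxsize=None)
    def best(i, j):
        result = None
        for src, tgt, weight in rules:
            for binding in match(src, i, j):
                if (i, j) in binding.values():
                    continue  # avoid unary loops over the same span
                subs = {x: best(*span) for x, span in binding.items()}
                if any(s is None for s in subs.values()):
                    continue
                total = weight + sum(s[0] for s in subs.values())
                out = []
                for tok in tgt:
                    out.extend(subs[tok][1] if tok in subs else [tok])
                if result is None or total > result[0]:
                    result = (total, tuple(out))
        return result

    return best(0, len(words))


# Hypothetical rules: (source side, target side, log-score).
RULES = [
    (("近い", "うち"), ("near", "future"), -0.5),
    (("国会", "を"), ("the", "diet"), -0.3),
    (("X1", "に", "X2", "解散", "する"), ("dissolve", "X2", "in", "the", "X1"), -1.0),
]
result = decode("近い うち に 国会 を 解散 する".split(), RULES)
```

For this input the only complete derivation uses all three rules and yields "dissolve the diet in the near future".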
[Figure: source-side syntax tree for "犬 が 本 の 上 に 座った" (node labels S, NP, VP, PP, P, V) and the corresponding target-side tree, which yields "the dog sat on the book"]
Synchronous Tree Substitution Grammar (STSG)
Synchronous Tree Substitution Grammar
● STSG is an extension of Tree Substitution Grammar (TSG) for bilingual analysis.
● STSG is a subset of Synchronous Tree Adjoining Grammar (STAG): SCFG (Hiero) ⊂ STSG ⊂ STAG.
● Definition:
– Set of non-terminal symbols
– Start symbol
– Set of terminal symbols
– Set of rules
– Weight semiring
Synchronous Rules of STSG
● Definition:
where : Elementary tree (source language)
: Elementary tree (target language)
: Association between and
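● Every rule is also associated with a weight:
[Figure: example rule pairing the source tree S(x1:NP VP(x2:NP V(開けた))) with the target tree S(x1:NP VP(VBD(opened) x2:NP)); x1 and x2 mark the frontier nodes]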
Expressive Power of STSG
● SCFG cannot express differences in syntactic structure between the two languages, but STSG can.
● Example:
– This synchronous rule cannot be generated from smaller SCFG rules because the two trees have no internal structure that corresponds.
– The STSG framework can handle such correspondences between tree structures directly.
[Figure: a source-side NP tree (node labels NP, PP, N, P, x1:CD, PC) over "犬 が ... 匹", paired with a target-side NP tree containing x1:CD and NNS (dogs)]
Translation Models under STSG Framework
● In the STSG framework, we can use the sequence of frontier nodes (leaves of a synchronous rule) instead of the full tree.
● Four translation models are available, depending on whether a tree or a frontier sequence is used as the data structure on the source and target sides.
                    Target: frontier                         Target: tree
Source: frontier    String-to-string translation (= SCFG)    String-to-tree translation
Source: tree        Tree-to-string translation               Tree-to-tree translation
[Figure: the tree S(x1:NP VP(x2:NP V(開けた))) and its sequence of frontier nodes]
Retrieving STSG Synchronous Rules
● Heuristic method (similar to SCFG rule extraction)
: Syntax tree generated from source sentence
: Syntax tree generated from target sentence
[Figure: aligned syntax trees for "近い うち に 国会 を 解散 する" and "dissolve the diet in the near future"; example extracted rule pairing a source VP tree (x1:PP, x2:NP, VP over 解散 する) with a target VP tree (VB dissolve, x2:NP, x1:PP)]
GHKM Algorithm
● Galley-Hopkins-Knight-Marcu (GHKM) Algorithm
– Generates STSG synchronous rules (string-to-tree rules) by composing minimal rules using the inside-outside algorithm.
1. Detect minimal rules from target syntax trees.
2. Generate larger synchronous rules by composing minimal rules.
[Figure: a syntax tree and the minimal rules detected from it]
GHKM: Alignment Span (1)
● Alignment span:
– Set of indices of the source-sentence words aligned to a partial tree
● Complement alignment span:
– Set of indices of the source-sentence words aligned to anything other than that partial tree
● Closure:
– Minimum range that covers the alignment span
GHKM: Alignment Span (2)
[Figure: target parse tree of "he will dissolve the diet in the near future" aligned to "彼 は 近い うち に 国会 を 解散 する", annotated with alignment spans]
GHKM: Admissible Node
● Admissible node:
– Node in target syntax tree that satisfies:
[Figure: target parse tree of "he will dissolve the diet in the near future" aligned to "彼 は 近い うち に 国会 を 解散 する", with the admissible nodes marked]
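The alignment span, complement span, closure, and admissibility test can be sketched as follows. The sketch assumes the usual GHKM condition: a node is admissible when its alignment span is non-empty and the closure of that span contains no index from its complement span. Each node is represented only by the set of target word positions it covers, and the alignment links are hypothetical.

```python
def alignment_span(node_words, alignment):
    """Source indices aligned to target words covered by the node."""
    return {s for s, t in alignment if t in node_words}


def complement_span(node_words, alignment):
    """Source indices aligned to target words outside the node."""
    return {s for s, t in alignment if t not in node_words}


def closure(span):
    """Smallest inclusive range (lo, hi) covering the span."""
    return min(span), max(span)


def is_admissible(node_words, alignment):
    span = alignment_span(node_words, alignment)
    if not span:
        return False  # simplification: unaligned nodes are not treated as admissible
    lo, hi = closure(span)
    return not any(lo <= s <= hi for s in complement_span(node_words, alignment))


# Source: 彼(0) は(1) 近い(2) うち(3) に(4) 国会(5) を(6) 解散(7) する(8)
# Target: he(0) will(1) dissolve(2) the(3) diet(4) in(5) the(6) near(7) future(8)
# Hypothetical (source, target) links, an assumption for illustration.
ALIGNMENT = {(0, 0), (2, 7), (3, 8), (4, 5), (5, 3), (5, 4), (7, 2), (8, 1)}

NODES = {
    "NP(the diet)": {3, 4},
    "DT(the)": {3},
    "NP(the near future)": {6, 7, 8},
    "PP(in the near future)": {5, 6, 7, 8},
    "VP(dissolve ... future)": {2, 3, 4, 5, 6, 7, 8},
}
admissible = {name: is_admissible(words, ALIGNMENT) for name, words in NODES.items()}
```

Under these assumed links, DT(the) is not admissible because 国会 is also aligned to "diet" outside the node, while the NP, PP, and VP nodes are admissible.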
GHKM: Minimal Rule
● Split the syntax tree at the admissible nodes
[Figure: the aligned tree for "he will dissolve the diet in the near future" / "彼 は 近い うち に 国会 を 解散 する" split at admissible nodes; example minimal rules shown: a VP rule with variables x1:PP, x2:NP, x3:VB, and a rule pairing "the near future" (DT JJ NN) with "近い うち"]
Extension for Tree-to-tree Model (1)
● We need to extract pairs of nodes from the two syntax trees that are admissible to each other.
● First, find the admissible nodes in the given input.
● If a node pair satisfies the condition below, the two nodes are bidirectionally admissible:
● Span:
– Minimum range over the sentence that covers all terminal symbols in the node
Extension for Tree-to-tree Model (2)
[Figure: source and target syntax trees for "近い うち に 国会 を 解散 する" / "dissolve the diet in the near future" with their word alignment]
Features of STSG Model (1)
● Generative model: likelihoods of translation probability
Features of STSG Model (2)
● Lexical translation model: goodness of phrase alignment
Features of STSG Model (3)
● Height penalty: adjusting depth of derivation
● Internal node penalty: adjusting total size of derivation
● Some of the features introduced for the Hiero model are also available
Decoding of STSG Model
● STSG decoding uses basically the same method as Hiero decoding:
Depends on translation model
Difference of Formalization of Each Model
● String-to-string model
– Same as the Hiero (SCFG) model.
● String-to-tree model
– Never uses any information from the syntax of the source sentence.
● Tree-to-string model
● Tree-to-tree model
– Explicitly use syntax information of the source sentence.
– The translation process can be divided into syntax analysis and decoding.
[Figure: Source sentence → Syntax analyzer → Syntax tree of source sentence → Decoder → Translation hypothesis(es); the models are grouped into "Non-syntax-based translation" and "Syntax (tree)-based translation"]
Formalization of Syntax-based Translation
● A syntax-based translation model uses the syntax tree of the source sentence.
● We can ignore the probability of the source syntax tree because the tree is already determined during syntax analysis.
Questions & Discussions