Tree-based Translation Models (『機械翻訳』§6.2-6.3)

14/06/05 Copyright (C) 2014 by Yusuke Oda, AHC-Lab, IS, NAIST

Tree-based Translation Models

Yusuke Oda (@odashi_t)

2014/6/5 NAIST MT-Study Group


Agenda

● (6.2) Synchronous Context Free Grammar (SCFG)

– (6.2.2) Learning SCFG

– (6.2.3) Introducing Syntax Labels

– (6.2.4) Features

– (6.2.5) Decoding

– (6.2.6) Rescoring

● (6.3) Synchronous Tree Substitution Grammar (STSG)

– (6.3.1) Characteristics of STSG

– (6.3.2) Learning STSG

– (6.3.3) Features

– (6.3.4) Decoding

– (6.3.5) Binarization

● (6.4) Synchronous Parsing

– (6.4.1) Inversion Transduction Grammar (ITG)

– (6.4.2) Span Pruning

– (6.4.3) Beam Search

– (6.4.4) Two-Step Parsing

Hiero

Travatar


Synchronous Context Free Grammar (SCFG)


Learning SCFG

● Synchronous rules are extracted from each sentence pair in the parallel corpus together with its word alignment: $\langle f, e, a \rangle$.

● $f$: Source sentence

● $e$: Target sentence

● $a$: Set of word alignments


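Closed Phrase Pair under Word Alignment

● A phrase pair is extracted only if it is closed under the word alignment.

● A phrase pair $\langle f_i^{i'}, e_j^{j'} \rangle$ is closed under the alignment $a$ if it satisfies both conditions below:

– At least one link lies inside the pair: $\exists (k, l) \in a : i \le k \le i' \wedge j \le l \le j'$

– No link crosses the boundary of the pair: $\forall (k, l) \in a : i \le k \le i' \Leftrightarrow j \le l \le j'$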

[Figure: word alignment matrix between "he will dissolve the diet in the near future" and "彼 は 近い うち に 国会 を 解散 する"]

Example of a closed phrase pair: (国会を → the diet)
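As a rough illustration of what "closed under the word alignment" means in practice, the sketch below enumerates all closed phrase pairs of a short sentence pair by brute force. The function name `extract_closed_phrase_pairs`, the 0-based inclusive spans, the `max_len` limit, and the toy alignment are all assumptions made for this example, not taken from the slides.

```python
def extract_closed_phrase_pairs(src_len, tgt_len, alignment, max_len=4):
    """Enumerate phrase pairs that are closed under the word alignment.

    alignment: set of (source_index, target_index) links, 0-based.
    Returns a list of ((i1, i2), (j1, j2)) with inclusive spans.
    """
    pairs = []
    for i1 in range(src_len):
        for i2 in range(i1, min(i1 + max_len, src_len)):
            for j1 in range(tgt_len):
                for j2 in range(j1, min(j1 + max_len, tgt_len)):
                    # At least one link must lie inside the box.
                    inside = any(i1 <= k <= i2 and j1 <= l <= j2
                                 for (k, l) in alignment)
                    # No link may be inside on one side and outside on the other.
                    crossing = any((i1 <= k <= i2) != (j1 <= l <= j2)
                                   for (k, l) in alignment)
                    if inside and not crossing:
                        pairs.append(((i1, i2), (j1, j2)))
    return pairs


src = ["国会", "を", "解散", "する"]
tgt = ["dissolve", "the", "diet"]
# Hypothetical alignment: 国会-diet, 解散-dissolve, する-dissolve.
alignment = {(0, 2), (2, 0), (3, 0)}
pairs = extract_closed_phrase_pairs(len(src), len(tgt), alignment)
for (i1, i2), (j1, j2) in pairs:
    print(" ".join(src[i1:i2 + 1]), "|", " ".join(tgt[j1:j2 + 1]))
```

With this toy alignment, (国会 を, the diet) is extracted, while (解散, dissolve) is rejected because する is also linked to "dissolve" and would be left outside the pair.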


Extracting Abstract Rules

● When a phrase pair contains a smaller phrase pair, we can obtain a more abstract synchronous rule by replacing the smaller pair with a non-terminal symbol on both sides.

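[Figure: word alignment matrix between "dissolve the diet in the near future" and "近い うち に 国会 を 解散 する", with the smaller phrase pairs replaced by non-terminals]

Phrase pairs:

(国会を, the diet)

(近いうち, near future)

(近いうち ... 解散する, dissolve the ... near future)

Abstract rule obtained by replacing the two smaller phrase pairs:

(X1 に X2 解散する, dissolve X2 in the X1)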


Hiero Grammar

● Hierarchical phrase grammar (Hiero grammar):

– The set of all synchronous rules extracted from the parallel corpus

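● Algorithm:

1. $G \leftarrow \{ X \to \langle \bar{f}, \bar{e} \rangle \mid \langle \bar{f}, \bar{e} \rangle \in P \}$

where $P$ is the set of all possible phrase pairs in the parallel corpus.

2. If a rule $X \to \langle \gamma, \alpha \rangle \in G$ and a phrase pair $\langle \bar{f}, \bar{e} \rangle \in P$ satisfy $\gamma = \gamma_1 \bar{f} \gamma_2$ and $\alpha = \alpha_1 \bar{e} \alpha_2$, then $G \leftarrow G \cup \{ X \to \langle \gamma_1 X_k \gamma_2, \alpha_1 X_k \alpha_2 \rangle \}$

3. Repeat step 2 until no new rule can be added.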
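The substitution step of the algorithm can be sketched as follows. This is a simplified illustration, not the actual Hiero extractor: it matches phrases by their tokens rather than by aligned positions, it bounds the number of non-terminals instead of iterating to a fixed point, and the names `find_sub`, `substitute`, and `hiero_rules` are hypothetical. The phrase pairs are the ones shown on the "Extracting Abstract Rules" slide, tokenized by hand.

```python
def find_sub(seq, sub):
    """Return the first index where sub occurs contiguously in seq, or -1."""
    for i in range(len(seq) - len(sub) + 1):
        if seq[i:i + len(sub)] == sub:
            return i
    return -1


def substitute(rule, phrase, index):
    """Replace the phrase pair inside the rule by the non-terminal X<index>."""
    (gamma, alpha), (f, e) = rule, phrase
    i, j = find_sub(gamma, f), find_sub(alpha, e)
    if i < 0 or j < 0 or (gamma, alpha) == (f, e):
        return None  # phrase not contained, or identical to the rule itself
    nt = f"X{index}"
    return (gamma[:i] + (nt,) + gamma[i + len(f):],
            alpha[:j] + (nt,) + alpha[j + len(e):])


def hiero_rules(phrase_pairs, max_nonterminals=2):
    grammar = set(phrase_pairs)   # step 1: every phrase pair is a rule
    newest = set(phrase_pairs)
    for k in range(1, max_nonterminals + 1):   # steps 2-3, bounded by rule rank
        created = set()
        for rule in newest:
            for phrase in phrase_pairs:
                new_rule = substitute(rule, phrase, k)
                if new_rule is not None:
                    created.add(new_rule)
        grammar |= created
        newest = created
    return grammar


phrase_pairs = [
    (("国会", "を"), ("the", "diet")),
    (("近い", "うち"), ("near", "future")),
    (("近い", "うち", "に", "国会", "を", "解散", "する"),
     ("dissolve", "the", "diet", "in", "the", "near", "future")),
]
grammar = hiero_rules(phrase_pairs)
for src, tgt in sorted(grammar):
    print(" ".join(src), "|||", " ".join(tgt))
```

Among the generated rules is (X1 に X2 解散 する, dissolve X2 in the X1), the abstract rule from the example.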


Constraints of Hiero Rules

● To limit the size and ambiguity of a Hiero grammar, we impose constraints on rule extraction.

● Minimal phrase pair

– (国会を, the diet) ... BAD

– (国会, the diet) ... GOOD

● Phrase length

– (奈良先端科学技術大学院大学情報科学研究科自然言語処理学研究室, ...) BAD (too many words)

● Number of symbols

– X → 〈あらゆる X1 を全て X2 の方へねじ曲げたのだ, ...〉 BAD (too many symbols)

● Rank of rule

– X → 〈X1 が X2 で X3 に X4 した, ...〉 BAD (too many non-terminals)

[Figure: word alignment between "the diet" and "国会 を", illustrating the minimal phrase pair (国会, the diet)]
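The length, symbol-count, and rank constraints can be checked with a small filter such as the sketch below. The thresholds (`max_span=10`, `max_symbols=5`, `max_rank=2`) are illustrative assumptions, as are the function names and the convention that non-terminals are written `X1`, `X2`, and so on.

```python
def is_nonterminal(symbol):
    """Non-terminals are assumed to be written as X followed by an index."""
    return symbol.startswith("X") and symbol[1:].isdigit()


def satisfies_constraints(src_side, span_len,
                          max_span=10, max_symbols=5, max_rank=2):
    """Check the phrase-length, symbol-count and rank constraints of a rule.

    src_side: source-side symbols of the rule.
    span_len: number of source words in the phrase the rule was extracted from.
    """
    rank = sum(1 for s in src_side if is_nonterminal(s))
    return (span_len <= max_span
            and len(src_side) <= max_symbols
            and rank <= max_rank)


ok_rule = ["X1", "に", "X2", "解散", "する"]
bad_rule = ["X1", "が", "X2", "で", "X3", "に", "X4", "した"]
print(satisfies_constraints(ok_rule, span_len=7))    # accepted
print(satisfies_constraints(bad_rule, span_len=8))   # rejected: rank 4, 8 symbols
```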


Glue Rules

● To build a whole sentence from small rules, we introduce the glue rules below:

$S \to \langle S_1 X_2, S_1 X_2 \rangle$

$S \to \langle X_1, X_1 \rangle$


Introducing Syntax Labels

● So far, we have covered the basic ideas of Hiero rules.

– The only non-terminal symbols are $S$ and $X$.

● This model is very simple, but highly ambiguous.

● Next, we introduce syntactic information into Hiero rules.

= Syntax-augmented machine translation (SAMT)

[Figure: syntax tree of "this is a pen" (S → NP VP, with POS tags PRP VBZ DT NN). Hiero + Syntax → SAMT]


Combinatory Categorial Grammar (CCG)

● SAMT uses categories (≒ partial structures of syntax labels) based on the idea of combinatory categorial grammar (CCG).

● Categories:

$C_1 / C_2$: Syntax label $C_1$ lacking its right-side child $C_2$

$C_1 \backslash C_2$: Syntax label $C_1$ lacking its left-side child $C_2$

$C_1 + C_2$: Concatenation of two syntax labels $C_1$ and $C_2$


Extracting SAMT Rules

[Figure: word alignment between "dissolve the diet in the near future" and "近い うち に 国会 を 解散 する", with the target syntax tree (VP → VB NP PP) and the categories assigned to partial spans]

Example categories: NP\DT, IN+DT, VP/PP, VP\VB

VP → 〈NP\DT1 に NP2 解散する, dissolve NP2 in the NP\DT1〉

VP → 〈近いうち IN+DT1 国会を VB2, VB2 the diet IN+DT1 near future〉

etc...


Probabilistic Formalization of Hiero Model

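● We regard translation with a Hiero grammar as maximization of the posterior probability (as in the phrase-based model):

$\langle \hat{e}, \hat{d} \rangle = \arg\max_{e, d} \Pr(e, d \mid f)$

● We assume that the probability follows a log-linear model:

$\Pr(e, d \mid f) = \frac{\exp(w^\top h(e, d, f))}{\sum_{e', d'} \exp(w^\top h(e', d', f))}$

$d$: Derivation (≒ set of synchronous rules used)

$w$: Weights

$h$: Feature functions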
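A minimal sketch of how a log-linear model ranks candidate derivations: each candidate is reduced to a feature vector, scored by a weighted sum, and normalized with a softmax. The feature names (`lm`, `tm`), the weights, and the feature values are made up for illustration.

```python
import math


def score(weights, features):
    """Weighted sum of feature values (the exponent of the log-linear model)."""
    return sum(weights[name] * value for name, value in features.items())


def posterior(weights, candidates):
    """Normalize the scores of all candidates into probabilities."""
    scores = [score(weights, h) for h in candidates]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Hypothetical weights and feature vectors of two candidate derivations.
weights = {"lm": 1.0, "tm": 0.5}
candidates = [
    {"lm": -2.0, "tm": -1.0},   # score -2.5
    {"lm": -1.0, "tm": -4.0},   # score -3.0
]
probs = posterior(weights, candidates)
best = max(range(len(candidates)), key=lambda n: probs[n])
print(best, probs)
```

Because the normalizer is shared by all candidates, the argmax can be taken over the unnormalized scores, which is what decoders do in practice.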


Features of Hiero Model (1)

● Generative model: likelihoods of translation probability

Forward model:

Backward model:

where

Forward

Backward
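One common way to obtain such forward and backward rule probabilities is relative-frequency estimation over the extracted rules. The sketch below assumes that estimator, and also assumes that "forward" means target side given source side and "backward" the reverse; the rule instances are invented for the example.

```python
from collections import Counter


def relative_frequencies(rule_instances):
    """Estimate p(target | source) and p(source | target) by relative frequency.

    rule_instances: list of (source_side, target_side) pairs, one per extraction.
    """
    joint = Counter(rule_instances)
    src_total = Counter(src for src, _ in rule_instances)
    tgt_total = Counter(tgt for _, tgt in rule_instances)
    forward = {(s, t): c / src_total[s] for (s, t), c in joint.items()}
    backward = {(s, t): c / tgt_total[t] for (s, t), c in joint.items()}
    return forward, backward


# Hypothetical extraction counts: the same source side seen with two translations.
instances = ([("X1 を 解散 する", "dissolve X1")] * 3
             + [("X1 を 解散 する", "dismiss X1")])
forward, backward = relative_frequencies(instances)
print(forward[("X1 を 解散 する", "dissolve X1")])   # 0.75
print(backward[("X1 を 解散 する", "dismiss X1")])   # 1.0
```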




Syntax model (f):

Syntax model (e):

where

Syntax (f) Syntax (e)



● Lexical translation model: goodness of phrase alignment

Forward model:

Backward model:

where

Forward

Backward



● Language model: measures the fluency of the hypothesis

● Out-of-vocabulary (OOV) penalty: adjusts the LM score for unknown words

● Length penalty: adjusts the number of words in the hypothesis

● Glue penalty: adjusts the number of glue rules in the derivation


Decoding of Hiero Model

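● Given an input sentence $f$ and a set of SCFG rules $G$, we find the optimal output sequence $\hat{e}$:

$\hat{e} = e\left(\arg\max_{d \in D(f, G)} w^\top h(d)\right)$

$D(f, G)$: Set of possible derivations of $f$ given the grammar $G$

$e(d)$: Sequence of target-side terminal symbols in the derivation $d$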


Decoding Process

1. Compute the intersection of the input sentence $f$ and the grammar $G$.

• = Generate a syntax forest with the CYK algorithm

2. Transform the syntax forest into the corresponding translation forest.

3. Output the sequence of terminal symbols in the translation forest that maximizes the model score.
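The three steps can be sketched with a tiny chart decoder. This is a simplified stand-in rather than a real Hiero decoder: it uses a memoized top-down search over spans in place of a bottom-up CYK chart, it has no language model, and it builds the target string while parsing instead of materializing the forests. The rule format `(lhs, source, target, log_prob)`, the two-character non-terminal names (`X1`, `S1`), the toy grammar, its scores, and the tokenization are all assumptions for illustration.

```python
from functools import lru_cache


def is_nt(symbol):
    """Non-terminals: a category letter (S or X) plus a one-digit index."""
    return len(symbol) == 2 and symbol[0] in "SX" and symbol[1].isdigit()


def decode(words, rules, goal="S"):
    words = tuple(words)

    @lru_cache(maxsize=None)
    def best(cat, i, j):
        """Best (score, target words) for category cat over words[i:j], or None."""
        result = None
        for lhs, src, tgt, logp in rules:
            if lhs != cat:
                continue
            for score, binding in match(src, i, j):
                out = []
                for sym in tgt:   # read off the target side of the rule
                    out.extend(binding.get(sym, (sym,)))
                cand = (score + logp, tuple(out))
                if result is None or cand > result:
                    result = cand
        return result

    def match(src, i, j):
        """Yield (score, {non-terminal: target words}) for src covering words[i:j]."""
        if not src:
            if i == j:
                yield 0.0, {}
            return
        head, rest = src[0], src[1:]
        if is_nt(head):
            # Every remaining symbol needs at least one word.
            for k in range(i + 1, j - len(rest) + 1):
                sub = best(head[0], i, k)
                if sub is not None:
                    for score, binding in match(rest, k, j):
                        yield score + sub[0], {**binding, head: sub[1]}
        elif i < j and words[i] == head:
            yield from match(rest, i + 1, j)

    return best(goal, 0, len(words))


# Toy grammar with hypothetical log-probabilities, including the two glue rules.
rules = [
    ("X", ("犬", "が"), ("the", "dog"), -0.1),
    ("X", ("本",), ("the", "book"), -0.2),
    ("X", ("X1", "の", "上", "に"), ("on", "X1"), -0.3),
    ("X", ("X1", "座った"), ("sat", "X1"), -0.4),
    ("S", ("S1", "X2"), ("S1", "X2"), -0.5),
    ("S", ("X1",), ("X1",), 0.0),
]
result = decode(["犬", "が", "本", "の", "上", "に", "座った"], rules)
print(" ".join(result[1]))
```

With this grammar the only complete derivation yields "the dog sat on the book". The sketch assumes no rule rewrites a category directly to itself (such as X → 〈X1, X1〉), which would make the recursion loop.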

[Figure: syntax forest of the source sentence "犬が本の上に座った" (left) and the corresponding translation forest (right), which yields "the dog sat on the book"]


Synchronous Tree Substitution Grammar (STSG)


Synchronous Tree Substitution Grammar

● STSG is an extension of Tree Substitution Grammar (TSG) for bilingual analysis.

● STSG is a subset of Synchronous Tree Adjoining Grammar (STAG):

SCFG (Hiero) ⊂ STSG ⊂ STAG

● Definition: an STSG consists of

– Set of non-terminal symbols

– Start symbol

– Set of terminal symbols

– Set of rules

– Weight semiring


Synchronous Rules of STSG

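● Definition: a synchronous rule is a triple $\langle \gamma, \alpha, \sim \rangle$

where $\gamma$: Elementary tree (source language)

$\alpha$: Elementary tree (target language)

$\sim$: Association between the non-terminal frontier nodes of $\gamma$ and $\alpha$

● Every rule is also associated with a weight.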

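[Figure: example synchronous rule. Source elementary tree: (S x1:NP (VP x2:NP (V 開けた))). Target elementary tree: (S x1:NP (VP (VBD opened) x2:NP)). The leaves x1:NP, x2:NP and the words are the frontier nodes.]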


Expressive Power of STSG

● SCFG cannot express structural differences between the source and target syntax trees, but STSG can.

● Example:

– This synchronous rule cannot be built from smaller SCFG rules, because the two trees share no corresponding substructure.

– The STSG framework handles such correspondences between tree structures directly.

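[Figure: example synchronous rule. Source tree: NP → NP PP, with NP → N P covering "犬 が" and PP → x1:CD PC covering "x1 匹". Target tree: NP with the children x1:CD and NNS ("dogs").]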


Translation Models under STSG Framework

● In the STSG framework, we can use the sequence of frontier nodes (the leaves of a synchronous rule) instead of the full tree.

● Choosing either the tree or the frontier sequence as the data structure for each of the source and target languages gives four translation models.

                   | Target: frontier                       | Target: tree
Source: frontier   | String-to-string translation (= SCFG)  | String-to-tree translation
Source: tree       | Tree-to-string translation             | Tree-to-tree translation

[Figure: a tree, (S x1:NP (VP x2:NP (V 開けた))), and its sequence of frontier nodes, "x1:NP x2:NP 開けた"]
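Reading the frontier sequence off an elementary tree is a simple left-to-right traversal of its leaves, as in the sketch below. The nested-tuple encoding of the tree and the function name `frontier` are assumptions made for this example.

```python
def frontier(tree):
    """Return the leaves of a tree from left to right.

    A tree is either a string (a frontier node: a word or a variable such as
    "x1:NP") or a tuple (label, child, child, ...).
    """
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [leaf for child in children for leaf in frontier(child)]


# The source-side elementary tree of the example rule.
source_tree = ("S", "x1:NP", ("VP", "x2:NP", ("V", "開けた")))
print(frontier(source_tree))
```

Replacing the source tree by this sequence turns a tree-to-tree rule into a string-to-tree rule; doing the same on the target side gives the tree-to-string and string-to-string variants.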


Retrieving STSG Synchronous Rules

● Heuristic method (similar to SCFG rule extraction), which uses:

– the syntax tree generated from the source sentence

– the syntax tree generated from the target sentence

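[Figure: word alignment between "dissolve the diet in the near future" and "近い うち に 国会 を 解散 する", with the syntax trees of both sentences. Example of an extracted rule: a source tree VP with children x1:PP, x2:NP and a VP covering "解散 する", paired with a target tree VP with children VB ("dissolve"), x2:NP and x1:PP.]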


GHKM Algorithm

● Galley-Hopkins-Knight-Marcu (GHKM) algorithm

– Generates STSG synchronous rules (string-to-tree rules) by composing minimal rules, using the inside-outside algorithm.

[Figure: a syntax tree and the minimal rules obtained from it]

1. Detect minimal rules in the target syntax trees.

2. Generate larger synchronous rules by composing minimal rules.


GHKM: Alignment Span (1)

● Alignment span of a partial tree:

– Set of indices of the source words aligned to the partial tree

● Complement alignment span of a partial tree:

– Set of indices of the source words aligned to target words outside the partial tree

● Closure of an alignment span:

– Minimum contiguous range that covers the alignment span
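The three definitions translate directly into set operations, as sketched below. Here a partial tree is represented only by the set of target word positions it covers; that representation, the function names, and the toy alignment are assumptions for illustration.

```python
def alignment_span(covered, alignment):
    """Source indices aligned to target positions covered by the partial tree."""
    return {s for (s, t) in alignment if t in covered}


def complement_span(covered, alignment):
    """Source indices aligned to target positions outside the partial tree."""
    return {s for (s, t) in alignment if t not in covered}


def closure(span):
    """Smallest contiguous range of indices that covers the span."""
    return set(range(min(span), max(span) + 1)) if span else set()


# Hypothetical links (source_index, target_index) for a 5-word sentence pair.
alignment = {(0, 3), (1, 4), (2, 1), (4, 0)}
covered = {3, 4}                      # target positions under some node
print(alignment_span(covered, alignment))    # source indices 0 and 1
print(complement_span(covered, alignment))   # source indices 2 and 4
print(closure({2, 5}))                       # indices 2 through 5
```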


GHKM: Alignment Span (2)

[Figure: alignment spans of the nodes in the target syntax tree of "he will dissolve the diet in the near future", aligned with "彼 は 近い うち に 国会 を 解散 する"]

GHKM: Admissible Node

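● Admissible node:

– A node $n$ in the target syntax tree whose closed alignment span does not overlap its complement alignment span:

$\mathrm{closure}(\mathrm{span}(n)) \cap \mathrm{complement}(n) = \emptyset$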

[Figure: admissible nodes in the target syntax tree of "he will dissolve the diet in the near future", aligned with "彼 は 近い うち に 国会 を 解散 する"]
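A sketch of detecting admissible nodes, assuming the usual GHKM test that the closure of a node's alignment span must not intersect its complement alignment span, and additionally assuming that nodes with an empty alignment span are skipped. The nested-tuple tree encoding, the function name, and the word alignment are made up for this example; the tree is the VP subtree over "dissolve the diet in the near future".

```python
def admissible_nodes(tree, alignment):
    """Return (label, start, end) for each admissible node of the target tree.

    tree: nested tuples (label, child, ...) with words as string leaves.
    alignment: set of (source_index, target_index) links.
    A node covers the target positions start .. end - 1.
    """
    nodes = []

    def walk(node, start):
        if isinstance(node, str):          # a word occupies one target position
            return start + 1
        label, *children = node
        end = start
        for child in children:
            end = walk(child, end)
        nodes.append((label, start, end))
        return end

    walk(tree, 0)
    result = []
    for label, start, end in nodes:
        span = {s for (s, t) in alignment if start <= t < end}
        comp = {s for (s, t) in alignment if not (start <= t < end)}
        if span and not (set(range(min(span), max(span) + 1)) & comp):
            result.append((label, start, end))
    return result


tree = ("VP", ("VB", "dissolve"),
        ("NP", ("DT", "the"), ("NNP", "diet")),
        ("PP", ("IN", "in"),
         ("NP", ("DT", "the"), ("JJ", "near"), ("NN", "future"))))
# Source: 近い(0) うち(1) に(2) 国会(3) を(4) 解散(5) する(6). Hypothetical links.
alignment = {(0, 5), (1, 6), (2, 3), (3, 2), (5, 0), (6, 0)}
nodes = admissible_nodes(tree, alignment)
print(nodes)
```

With this alignment every node except the two unaligned DT nodes is admissible. Adding a link from に to "dissolve" would make both VB and IN inadmissible, because に would then fall inside the closure of one node while being aligned to the other.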


GHKM: Minimal Rule

● Split the syntax tree at the admissible nodes; each resulting fragment gives one minimal rule.

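[Figure: the target syntax tree of "he will dissolve the diet in the near future", aligned with "彼 は 近い うち に 国会 を 解散 する" and split at its admissible nodes. Examples of minimal rules: the source sequence "x1:PP x2:NP x3:VB" paired with a target VP whose children are x3, x2, x1; the source sequence "近いうち" paired with the target fragment DT JJ NN ("the near future").]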


Extension for Tree-to-tree Model (1)

● We need to extract pairs of nodes, one from each syntax tree, that are admissible with respect to each other.

● First, find the admissible nodes in the given syntax trees.

● A node pair that satisfies the condition below is bidirectionally admissible:

● Span of a node:

– Minimum range over the sentence that covers all terminal symbols under the node


Extension for Tree-to-tree Model (2)

[Figure: bidirectionally admissible node pairs between the syntax trees of "dissolve the diet in the near future" and "近い うち に 国会 を 解散 する"]


Features of STSG Model (1)




● Lexical translation model: goodness of phrase alignment



● Height penalty: adjusts the depth of the derivation

● Internal node penalty: adjusts the total size of the derivation

● Some of the features introduced for the Hiero model can also be used


Decoding of STSG Model

● STSG decoding uses basically the same method as Hiero decoding:

(the details depend on the translation model)


Difference of Formalization of Each Model

● String-to-string model

– The same model as the Hiero (SCFG) model.

● String-to-tree model

– Uses no information from the syntax of the source sentence.

● Tree-to-string model

● Tree-to-tree model

– Explicitly use syntactic information of the source sentence.

– The translation process can be divided into syntax analysis and decoding.

[Figure: Non-syntax-based translation: Source sentence → Decoder → Translation hypothesis(es). Syntax (tree)-based translation: Source sentence → Syntax analyzer → Syntax tree of source sentence → Decoder → Translation hypothesis(es).]


Formalization of Syntax-based Translation

● A syntax-based translation model uses the syntax tree of the source sentence.

● We can ignore because is already decided while syntax analysis.


Questions & Discussions
