Discriminative Learning of Extraction Sets for Machine Translation

John DeNero and Dan Klein, UC Berkeley



Identifying Phrasal Translations

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment

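[Figure: one segmentation and phrase alignment of the example; phrase labels shown include "Past" and "Go over"]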

Underlying assumption: There is a correct phrasal segmentation


Unique Segmentations?

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Problem 1: Overlapping phrases can be useful (and complementary)

Problem 2: Phrases and their sub-phrases can both be useful

Hypothesis: This is why models of phrase alignment don’t work well


Identifying Phrasal Translations

This talk: Modeling sets of overlapping, multi-scale phrase pairs

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Input: sentence pairs

Output: extracted phrases


… But the Standard Pipeline has Overlap!

MOTIVATION

[Figure: word alignment grid for the example, "In the past two years" against 过去 两 年 中 (glossed: past two year in)]

Sentence Pair → Word Alignment → Extracted Phrases
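The overlap comes from the standard extraction heuristic, which keeps every phrase pair consistent with the word alignment. A minimal Python sketch of that heuristic, assuming links are (Chinese index, English index) pairs read off the gloss above (illustrative, not the slide's actual alignment) and a hypothetical length limit `max_len` of 7; `extract_phrases` is a hypothetical name.

```python
def extract_phrases(src, tgt, links, max_len=7):
    """Return every phrase pair consistent with the word alignment.

    A bispan src[i1..i2] <-> tgt[j1..j2] (inclusive) is consistent when it
    contains at least one link and no link has exactly one endpoint inside it.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            for j1 in range(len(tgt)):
                for j2 in range(j1, min(j1 + max_len, len(tgt))):
                    inside = 0
                    consistent = True
                    for i, j in links:
                        in_src = i1 <= i <= i2
                        in_tgt = j1 <= j <= j2
                        if in_src and in_tgt:
                            inside += 1
                        elif in_src or in_tgt:
                            consistent = False  # this link crosses the bispan boundary
                            break
                    if consistent and inside > 0:
                        pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


# Example sentence; links are (Chinese index, English index), read off the gloss.
src = ["过去", "两", "年", "中"]
tgt = ["In", "the", "past", "two", "years"]
links = {(0, 2), (1, 3), (2, 4), (3, 0)}  # past, two, year(s), in
pairs = extract_phrases(src, tgt, links)

print(("过去", "past") in pairs, ("过去", "the past") in pairs)        # True True
print(("过去 两", "past two") in pairs, ("两 年", "two years") in pairs)  # True True
```

Nested pairs (过去 with "past" and with "the past") and overlapping pairs ("past two", "two years") all come out of a single alignment, which a one-to-one phrase segmentation cannot represent.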


Related Work

MOTIVATION

Sentence Pair → Word Alignment → Extracted Phrases

Translation models: Sinuhe system (Kääriäinen, 2009)

Fixed alignments; learned phrase pair weights

Combining aligners: Yonggang Deng & Bowen Zhou (2009)

Fixed directional alignments; learned symmetrization

Extraction models: Moore and Quirk (2007)

Fixed alignments; learned phrase pair weights


Our Task: Predict Extraction Sets

MOTIVATION

Sentence Pair → Extracted Phrases

Conditional model of extraction sets given sentence pairs

[Figure: two panels showing "In the past two years" / 过去两年中 as a grid (Chinese fenceposts 0 to 4, English fenceposts 0 to 5)]

Extracted Phrases + “Word Alignments”


Alignments Imply Extraction Sets

MODEL

[Figure: alignment grid for "In the past two years" / 过去 两 年 中 (glossed: past two year in), Chinese fenceposts 0 to 4, English fenceposts 0 to 5]

Word-level alignment links → Word-to-span alignments → Extraction set of bispans


Nulls and Possibles

[Figure: two panels aligning 报道 with the English phrases "according to", "news report", and "it is reported", illustrating null links (Nulls) and possible links (Possibles)]


Incorporating Possible Alignments

MODEL

[Figure: alignment grid for "In the past two years" / 过去 两 年 中 (glossed: past two year in), Chinese fenceposts 0 to 4, English fenceposts 0 to 5]

Sure and possible word links → Word-to-span alignments → Extraction set of bispans


Linear Model for Extraction Sets

MODEL

[Figure: extraction set grid for "In the past two years" / 过去 两 年 中, Chinese fenceposts 0 to 4, English fenceposts 0 to 5]

Features on sure links

Features on all bispans
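The linear model scores an extraction set by summing weighted features over its sure links and over all of its bispans. A minimal sketch, assuming sparse feature dictionaries and bispans written as half-open fencepost spans ((g, h), (k, m)) like the 0 to 4 and 0 to 5 axes in the figure; `model_score`, `toy_link_features`, and `toy_bispan_features` are hypothetical names, and the toy features and weights are made up for illustration.

```python
from collections import Counter


def model_score(weights, sure_links, bispans, link_features, bispan_features):
    """Score an extraction set as w . phi, where phi sums sparse features
    over the sure links and over every bispan in the set."""
    phi = Counter()
    for link in sure_links:
        phi.update(link_features(link))      # features on sure links
    for bispan in bispans:
        phi.update(bispan_features(bispan))  # features on all bispans
    score = sum(weights.get(name, 0.0) * value for name, value in phi.items())
    return score, phi


# Hypothetical toy features: an indicator per sure link and per bispan,
# plus the source-side width of each bispan.
def toy_link_features(link):
    return {"sure_link": 1.0}


def toy_bispan_features(bispan):
    (g, h), _ = bispan
    return {"bispan": 1.0, f"src_width={h - g}": 1.0}


weights = {"sure_link": 0.5, "bispan": -0.1, "src_width=1": 0.2}
sure = [(0, 2), (1, 3), (2, 4)]
bispans = [((0, 1), (2, 3)), ((1, 2), (3, 4)), ((0, 2), (2, 4)), ((0, 3), (1, 5))]
score, phi = model_score(weights, sure, bispans, toy_link_features, toy_bispan_features)
print(score)
```

Here the score is 3 × 0.5 − 4 × 0.1 + 2 × 0.2 = 1.5. Keeping ɸ as a sparse dictionary lets the same vector feed a weight update during training.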


Features on Bispans and Sure Links

FEATURES

[Figure: example bispans for "over the Earth", with 地球 glossed as "Earth" and a word glossed as "go over"]

Some features on sure links

HMM posteriors

Presence in dictionary

Numbers & punctuation

Features on bispans

HMM phrase table features: e.g., phrase relative frequencies

Lexical indicator features for phrases with common words

Monolingual phrase features: e.g., “the _____”

Shape features: e.g., Chinese character counts
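A few of the listed features can be computed from the words alone. A minimal sketch, assuming a hypothetical set of common English words; `is_cjk`, `link_features`, `bispan_features`, and all feature names are hypothetical. HMM posteriors, dictionary presence, and HMM phrase-table features need external models and are omitted, and the reading of "lexical indicator features for phrases with common words" as a whole-pair indicator is an assumption.

```python
import unicodedata


def is_cjk(ch):
    """True for characters in the CJK Unified Ideographs block (a simplification)."""
    return "\u4e00" <= ch <= "\u9fff"


def link_features(src_word, tgt_word):
    """Hypothetical sure-link features for numbers and punctuation."""
    feats = {}
    if src_word.isdigit() and tgt_word.isdigit():
        feats["both_numeric"] = 1.0
        if src_word == tgt_word:
            feats["same_number"] = 1.0
    if all(unicodedata.category(ch).startswith("P") for ch in src_word + tgt_word):
        feats["both_punct"] = 1.0
    return feats


def bispan_features(src_words, tgt_words, common_words):
    """Hypothetical shape, monolingual, and lexical-indicator bispan features."""
    feats = {}
    # Shape features: Chinese character count and English phrase length.
    n_zh = sum(is_cjk(ch) for word in src_words for ch in word)
    feats[f"zh_chars={n_zh}"] = 1.0
    feats[f"en_len={len(tgt_words)}"] = 1.0
    # Monolingual phrase features, e.g. "the ___" or "___ of".
    if tgt_words[0] in common_words:
        feats[f"en_prefix={tgt_words[0]} ___"] = 1.0
    if tgt_words[-1] in common_words:
        feats[f"en_suffix=___ {tgt_words[-1]}"] = 1.0
    # Lexical indicator on the whole pair when the English side has a common word.
    if any(word in common_words for word in tgt_words):
        feats["pair=" + " ".join(src_words) + " ||| " + " ".join(tgt_words)] = 1.0
    return feats


common = {"the", "of", "a", "in"}
print(bispan_features(["过去", "两", "年"], ["the", "past", "two", "years"], common))
print(link_features("2009", "2009"))
```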


Getting Gold Extraction Sets

TRAINING

Hand aligned: Sure and possible word links → Word-to-span alignments → Extraction set of bispans

Deterministic (links to word-to-span alignments): Find the min and max alignment index for each word

Deterministic (word-to-span alignments to bispans): A bispan is included iff every word within the bispan aligns within the bispan
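The two deterministic steps can be sketched as follows, assuming links are (Chinese index, English index) pairs that pool sure and possible links, spans are half-open fenceposts, unaligned (null) words impose no constraint, and a bispan must contain at least one aligned word (the last two are assumptions, not stated on the slide). `word_to_spans`, `within`, `extraction_set`, and `max_len` are hypothetical names, and the possible link in the example is made up.

```python
def word_to_spans(links, n_src, n_tgt):
    """Project each word onto the min/max span of its links.

    Spans are half-open fenceposts (start, end); unaligned (null) words get None.
    """
    src_span = [None] * n_src
    tgt_span = [None] * n_tgt
    for i, j in links:
        s = src_span[i]
        src_span[i] = (j, j + 1) if s is None else (min(s[0], j), max(s[1], j + 1))
        t = tgt_span[j]
        tgt_span[j] = (i, i + 1) if t is None else (min(t[0], i), max(t[1], i + 1))
    return src_span, tgt_span


def within(span, start, end):
    """A null word (None) places no constraint on the bispan."""
    return span is None or (start <= span[0] and span[1] <= end)


def extraction_set(links, n_src, n_tgt, max_len=7):
    """All bispans ((g, h), (k, m)) in which every word aligns inside the bispan."""
    src_span, tgt_span = word_to_spans(links, n_src, n_tgt)
    bispans = set()
    for g in range(n_src):
        for h in range(g + 1, min(g + max_len, n_src) + 1):
            if all(s is None for s in src_span[g:h]):
                continue  # assumption: require at least one aligned word
            for k in range(n_tgt):
                for m in range(k + 1, min(k + max_len, n_tgt) + 1):
                    if all(within(src_span[i], k, m) for i in range(g, h)) and all(
                        within(tgt_span[j], g, h) for j in range(k, m)
                    ):
                        bispans.add(((g, h), (k, m)))
    return bispans


# Example: sure links read off the gloss, plus one hypothetical possible link.
sure = {(0, 2), (1, 3), (2, 4), (3, 0)}  # (Chinese index, English index)
possible = {(0, 1)}                       # hypothetical: 过去 ~ "the"
gold = extraction_set(sure | possible, n_src=4, n_tgt=5)
print(((0, 1), (1, 3)) in gold)  # 过去 <-> "the past": True
print(((0, 1), (2, 3)) in gold)  # 过去 <-> "past": False
```

With the hypothetical possible link, 过去 projects onto "the past", so 过去 ⇔ "the past" is a gold bispan while 过去 ⇔ "past" is not; with sure links alone, both are included.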


Discriminative Training with MIRA

TRAINING

Loss function: based on the F-score of the guessed bispans against the gold bispans (precision & recall)

Training Criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin

Gold (annotated) vs. Guess (arg max w ∙ ɸ)
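The training criterion corresponds to a 1-best MIRA step: with Δɸ = ɸ(gold) − ɸ(guess), set w ← w + τΔɸ, where τ = min(C, max(0, (loss − w·Δɸ) / ‖Δɸ‖²)). A minimal sketch, assuming loss = 1 − F_β over bispan sets; `f_beta_loss`, `mira_update`, `beta`, and the step cap `C` are hypothetical, and the toy bispans and one-feature vectors exist only for illustration.

```python
def f_beta_loss(gold, guess, beta=1.0):
    """1 - F_beta of the guessed bispan set against the gold bispan set."""
    correct = len(gold & guess)
    if correct == 0:
        return 1.0
    precision = correct / len(guess)
    recall = correct / len(gold)
    b2 = beta ** 2
    return 1.0 - (1 + b2) * precision * recall / (b2 * precision + recall)


def mira_update(w, phi_gold, phi_guess, loss, C=1.0):
    """One 1-best MIRA step on sparse weights (dicts of feature -> value).

    Finds the smallest change to w such that w . phi_gold exceeds
    w . phi_guess by at least `loss`, with the step size capped at C.
    """
    delta = {f: phi_gold.get(f, 0.0) - phi_guess.get(f, 0.0)
             for f in set(phi_gold) | set(phi_guess)}
    margin = sum(w.get(f, 0.0) * v for f, v in delta.items())
    norm_sq = sum(v * v for v in delta.values())
    if norm_sq == 0.0:
        return dict(w)  # gold and guess have identical features
    tau = min(C, max(0.0, (loss - margin) / norm_sq))
    new_w = dict(w)
    for f, v in delta.items():
        new_w[f] = new_w.get(f, 0.0) + tau * v
    return new_w


gold = {((0, 1), (2, 3)), ((1, 2), (3, 4)), ((0, 3), (2, 5)), ((3, 4), (0, 1))}
guess = {((0, 1), (2, 3)), ((1, 2), (3, 4)), ((0, 2), (1, 4))}
loss = f_beta_loss(gold, guess)  # P = 2/3, R = 1/2, so loss = 1 - 4/7 = 3/7
w = mira_update({}, {"a": 1.0}, {"b": 1.0}, loss)
print(loss, w)
```

After the update, the gold outscores the guess by exactly the loss, which is the "minimal change" the criterion asks for when the cap C is not reached.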


Inference: An ITG Parser

INFERENCE

ITG captures some bispans
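A single ITG derivation builds only some of the bispans that are consistent with an alignment. The sketch below illustrates this for one-to-one alignments only (permutations), which is a simplification: the talk's parser uses block terminals and handles nulls, neither of which is modeled here. It uses a greedy shift-reduce pass that merges adjacent blocks in straight or inverted order; `itg_bispans` is a hypothetical name, and the example permutation is the sample sentence with the unaligned "the" dropped.

```python
def itg_bispans(perm):
    """Greedy shift-reduce check that a one-to-one alignment is ITG-derivable.

    perm[i] is the target position aligned to source position i. Returns
    (derivable, bispans), where bispans are the (src_lo, src_hi, tgt_lo, tgt_hi)
    inclusive blocks built by this particular derivation.
    """
    stack = []
    bispans = []
    for i, j in enumerate(perm):
        stack.append((i, i, j, j))  # shift a terminal link
        bispans.append((i, i, j, j))
        while len(stack) >= 2:
            s1, _, t1, u1 = stack[-2]
            _, e2, t2, u2 = stack[-1]
            if u1 + 1 == t2 or u2 + 1 == t1:  # straight or inverted merge
                stack[-2:] = [(s1, e2, min(t1, t2), max(u1, u2))]
                bispans.append(stack[-1])
            else:
                break
    return len(stack) == 1, bispans


# 过去 两 年 中 against "In past two years" ("the" dropped).
ok, spans = itg_bispans([1, 2, 3, 0])
print(ok)                            # True
print((1, 2, 2, 3) in spans)         # False: 两 年 <-> "two years" is not built
print(itg_bispans([1, 3, 0, 2])[0])  # False: this pattern has no ITG derivation
```

The pair 两 年 ⇔ "two years" is consistent with the alignment, yet this derivation never forms it, which is one way to see why an ITG tree alone covers only part of an extraction set.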


Coarse-to-Fine Approximation

INFERENCE

Coarse Pass: Features that are local to terminal productions

Fine Pass: Agenda search using coarse pass as a heuristic

We use an agenda-based parser. It’s fast!
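Agenda search with a heuristic is, at its core, best-first (A*) search. The sketch below shows that idea on a plain weighted graph rather than an ITG chart (a deliberate simplification), using costs (negated scores) so that lower is better. `agenda_search` and the toy graph are hypothetical; the heuristic stands in for the coarse pass and must never overestimate the remaining cost for the first goal popped to be optimal. Whether the coarse-pass heuristic in the talk's parser satisfies this is not stated here.

```python
import heapq


def agenda_search(start, goal, edges, heuristic):
    """Best-first (A*) search over a graph with additive edge costs.

    edges: dict node -> list of (next_node, cost)
    heuristic: dict node -> optimistic estimate of the cost to reach goal
    """
    agenda = [(heuristic[start], 0.0, start)]
    best = {start: 0.0}
    while agenda:
        _, cost, node = heapq.heappop(agenda)
        if node == goal:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale agenda entry; a cheaper path was found later
        for nxt, c in edges.get(node, []):
            new_cost = cost + c
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(agenda, (new_cost + heuristic[nxt], new_cost, nxt))
    return None


edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 1), ("G", 5)], "B": [("G", 1)]}
heuristic = {"S": 2, "A": 2, "B": 1, "G": 0}
print(agenda_search("S", "G", edges, heuristic))  # 3.0 via S -> A -> B -> G
```

A tighter optimistic heuristic pops fewer agenda items before the goal, which is the sense in which a good coarse pass makes the fine pass fast.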


Experimental Setup

RESULTS

Chinese-to-English newswire

Parallel corpus: 11.3 million words; sentence length ≤ 40

MT systems: Tuned and tested on NIST ’04 and ’05

Supervised data: 150 training & 191 test sentences (NIST ’02)

Unsupervised Model: Jointly trained HMM (Berkeley Aligner)


Baselines and Limited Systems

RESULTS

HMM: State-of-the-art unsupervised baseline; joint training & competitive posterior decoding; source of many features for supervised models

ITG: Supervised ITG aligner with block terminals; state-of-the-art supervised baseline; re-implementation of Haghighi et al., 2009

Coarse: Supervised block ITG + possible alignments; coarse pass of the full extraction set model


Word Alignment Performance

RESULTS

System    Precision   Recall   1 - AER
HMM       84.0        76.9     80.4
ITG       83.4        83.8     83.6
Coarse    82.2        84.2     83.1
Full      84.7        84.0     84.4
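These word-alignment metrics presumably follow the standard sure/possible definitions of Och and Ney (2003): precision = |A ∩ P| / |A|, recall = |A ∩ S| / |S|, and AER = 1 − (|A ∩ S| + |A ∩ P|) / (|A| + |S|), for a guessed alignment A, sure links S, and possible links P ⊇ S. A minimal sketch with toy link sets; `alignment_scores` is a hypothetical name, and both A and S are assumed non-empty.

```python
def alignment_scores(guess, sure, possible):
    """Precision, recall, and AER for link sets; possible links include sure ones."""
    possible = possible | sure  # enforce S ⊆ P
    precision = len(guess & possible) / len(guess)
    recall = len(guess & sure) / len(sure)
    aer = 1.0 - (len(guess & sure) + len(guess & possible)) / (len(guess) + len(sure))
    return precision, recall, aer


guess = {(0, 0), (2, 2), (3, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
p, r, aer = alignment_scores(guess, sure, possible)
print(p, r, 1 - aer)  # the table reports 1 - AER
```

Because possible links count toward precision but not recall, a guess that hits only possible links is not penalized on precision, which is why 1 − AER tends to fall between precision and recall.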


Extracted Bispan Performance

RESULTS

System    Precision   Recall   F1     F5
HMM       69.5        59.5     64.1   59.9
ITG       75.8        62.3     68.4   62.8
Coarse    70.0        72.9     71.4   72.8
Full      69.0        74.2     71.6   74.0
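The F5 column is consistent with the standard F_β measure at β = 5, which weights recall far more heavily than precision. For example, with P = 69.0 and R = 74.2:

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R},
\qquad
F_5 = \frac{26 \times 69.0 \times 74.2}{25 \times 69.0 + 74.2}
    = \frac{133114.8}{1799.2} \approx 74.0
```

The same pair gives F_1 ≈ 71.5, so the reported 71.6 matches up to rounding of P and R.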


Translation Performance (BLEU)

RESULTS

System    Moses   Joshua
HMM       33.2    34.5
ITG       33.6    34.7
Coarse    34.2    35.7
Full      34.4    35.9

Supervised conditions also included HMM alignments


Conclusions

Extraction set model directly learns what phrases to extract

The system performs well as an aligner or a rule extractor

Are segmentations always bad?

Idea: get overlap and multi-scale into the learning!


Thank you!

nlp.cs.berkeley.edu