
Discriminative Learning of Extraction Sets for Machine Translation

John DeNero and Dan Klein, UC Berkeley

Identifying Phrasal Translations

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Phrase alignment models: Choose a segmentation and a one-to-one phrase alignment

[Figure: 过去 segmented as one phrase ("past") vs. word by word ("go over").]

Underlying assumption: There is a correct phrasal segmentation

Unique Segmentations?

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Problem 1: Overlapping phrases can be useful (and complementary)

Problem 2: Phrases and their sub-phrases can both be useful

Hypothesis: This is why models of phrase alignment don’t work well

Identifying Phrasal Translations

This talk: Modeling sets of overlapping, multi-scale phrase pairs

In the past two years , a number of US citizens …

过去 两 年 中 , 一 批 美国 公民 …

past two year in , one lots US citizen

Input: sentence pairs

Output: extracted phrases

… But the Standard Pipeline has Overlap!

M O T I V A T I O N

[Figure: the standard pipeline, Sentence Pair → Word Alignment → Extracted Phrases, illustrated on "In the past two years" / 过去两年中.]

Related Work


Translation models: Sinuhe system (Kääriäinen, 2009)

Combining Aligners: Yonggang Deng & Bowen Zhou (2009)

Fixed alignments; learned phrase pair weights

Fixed directional alignments; learned symmetrization

Extraction models: Moore and Quirk, 2007

Fixed alignments; learned phrase pair weights

Our Task: Predict Extraction Sets


Sentence Pair → Extracted Phrases

Conditional model of extraction sets given sentence pairs

[Figure: two alignment grids for "In the past two years" / 过去两年中, showing extracted phrases plus "word alignments".]

Alignments Imply Extraction Sets

M O D E L

[Figure: alignment grid for "In the past two years" / 过去两年中.]

Word-level alignment links → word-to-span alignments → extraction set of bispans

Nulls and Possibles

[Figure: 报道 ("report") in "according to news report / it is reported", annotated twice: once marking nulls (unaligned words), once marking possibles (ambiguous links).]

Incorporating Possible Alignments

[Figure: alignment grid for "In the past two years" / 过去两年中, now with sure and possible links.]

Sure and possible word links → word-to-span alignments → extraction set of bispans

Linear Model for Extraction Sets

[Figure: alignment grid for "In the past two years" / 过去两年中.]

Features on sure links; features on all bispans (in symbols below)
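In symbols, a sketch of the objective implied by the slide, in our notation (π is an extraction set, sure(π) its sure links, σ(π) its bispans, w the weight vector, φ the feature maps):

```latex
\mathrm{score}(\pi) \;=\; \sum_{a \,\in\, \mathrm{sure}(\pi)} w^{\top} \phi_{\mathrm{link}}(a)
\;+\; \sum_{b \,\in\, \sigma(\pi)} w^{\top} \phi_{\mathrm{bispan}}(b)
```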

Features on Bispans and Sure Links

F E A T U R E S

[Example: a bispan "over the Earth" whose Chinese side ends in 地球 ("Earth"), glossed word by word "go over / Earth".]

Some features on sure links

HMM posteriors

Presence in dictionary

Numbers & punctuation

Features on bispans (sketched in code after this list)

HMM phrase table features: e.g., phrase relative frequencies

Lexical indicator features for phrases with common words

Monolingual phrase features: e.g., “the _____”

Shape features: e.g., Chinese character counts
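As an illustration of how such features might be computed, here is a hypothetical sketch; the helper names (hmm_table, COMMON_ENGLISH) and the feature keys are ours for illustration, not the paper's:

```python
# Illustrative only: all names below are invented for this sketch.
COMMON_ENGLISH = {'the', 'of', 'in', 'a', 'to'}   # assumed closed word list

def bispan_features(e_words, f_words, bispan, hmm_table):
    """Map one bispan to a sparse feature dict, mirroring the classes above."""
    i1, i2, j1, j2 = bispan
    e, f = e_words[i1:i2], f_words[j1:j2]
    feats = {}
    # HMM phrase table features, e.g. phrase relative frequency
    feats['hmm_rel_freq'] = hmm_table.get((tuple(e), tuple(f)), 0.0)
    # Lexical indicator features for phrases built from common words
    if all(w in COMMON_ENGLISH for w in e):
        feats['lex=' + ' '.join(e)] = 1.0
    # Monolingual phrase features, e.g. "the ____"
    if len(e) > 1 and e[0] == 'the':
        feats['shape=the_*'] = 1.0
    # Shape features, e.g. Chinese character counts
    feats['zh_chars=%d' % sum(len(w) for w in f)] = 1.0
    return feats
```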

Getting Gold Extraction Sets

T R A I N I N G

Hand-aligned: sure and possible word links

Deterministic: find the min and max alignment index for each word (word-to-span alignments)

Deterministic: a bispan is included iff every word within the bispan aligns within the bispan (extraction set of bispans); both steps are sketched below
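A minimal Python sketch of this deterministic map (not the authors' code): words are 0-indexed, spans are half-open, the max_len cap is a hypothetical default, and we assume the standard requirement that a bispan contain at least one link.

```python
def word_to_span(links):
    """Step 1: for each word, the min and max index it aligns to."""
    e_span, f_span = {}, {}
    for i, j in links:                 # i: English index, j: foreign index
        lo, hi = e_span.get(i, (j, j))
        e_span[i] = (min(lo, j), max(hi, j))
        lo, hi = f_span.get(j, (i, i))
        f_span[j] = (min(lo, i), max(hi, i))
    return e_span, f_span

def extraction_set(links, e_len, f_len, max_len=7):
    """Step 2: a bispan is included iff every word within it aligns
    within it (and, as in standard extraction, it contains a link)."""
    e_span, f_span = word_to_span(links)
    bispans = []
    for i1 in range(e_len):
        for i2 in range(i1 + 1, min(i1 + max_len, e_len) + 1):
            for j1 in range(f_len):
                for j2 in range(j1 + 1, min(j1 + max_len, f_len) + 1):
                    if not any(i1 <= i < i2 and j1 <= j < j2 for i, j in links):
                        continue
                    ok_e = all(j1 <= e_span[i][0] and e_span[i][1] < j2
                               for i in range(i1, i2) if i in e_span)
                    ok_f = all(i1 <= f_span[j][0] and f_span[j][1] < i2
                               for j in range(j1, j2) if j in f_span)
                    if ok_e and ok_f:
                        bispans.append((i1, i2, j1, j2))
    return bispans

# e.g. "In the past two years" / 过去 两 年 中, with sure links
# In-中, past-过去, two-两, years-年 (note the overlapping bispans):
print(extraction_set({(0, 3), (2, 0), (3, 1), (4, 2)}, 5, 4))
```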

Discriminative Training with MIRA


Loss function: F-score of bispan errors (precision & recall)

Training Criterion: Minimal change to w such that the gold is preferred to the guess by a loss-scaled margin

Gold (annotated) vs. Guess (argmax w·φ); the update is sketched below
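A minimal sketch of the bispan F-score loss and the 1-best MIRA update, assuming dense NumPy feature vectors φ; the clip constant C and the choice of β are illustrative, not the paper's settings.

```python
import numpy as np

def f_loss(gold_bispans, guess_bispans, beta=1.0):
    """Loss = 1 - F_beta over bispan precision and recall."""
    tp = len(gold_bispans & guess_bispans)
    if tp == 0:
        return 1.0
    p = tp / len(guess_bispans)
    r = tp / len(gold_bispans)
    return 1.0 - (1 + beta**2) * p * r / (beta**2 * p + r)

def mira_update(w, phi_gold, phi_guess, loss, C=0.01):
    """Smallest change to w such that the gold outscores the guess
    by a loss-scaled margin (step size clipped at C)."""
    delta = phi_gold - phi_guess
    violation = loss - w.dot(delta)
    if violation <= 0 or not delta.any():
        return w                       # margin already satisfied
    tau = min(C, violation / delta.dot(delta))
    return w + tau * delta
```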

Inference: An ITG Parser

I N F E R E N C E

ITG captures some bispans

Coarse-to-Fine Approximation


Coarse Pass: Features that are local to terminal productions

Fine Pass: Agenda search using coarse pass as a heuristic

We use an agenda-based parser. It’s fast!
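A generic best-first sketch of this idea (our illustration, not the authors' parser): fine-pass items are popped in order of their exact score so far plus an optimistic outside estimate from the coarse pass, A*-style. The `expand` step and `coarse_estimate` heuristic are assumed to be supplied by the caller.

```python
import heapq, itertools

def agenda_search(start_items, expand, is_goal, coarse_estimate):
    """Best-first agenda: pop the item maximizing score + heuristic.
    If coarse_estimate never underestimates the remaining score,
    the first goal popped is optimal (as in A*)."""
    counter = itertools.count()        # tie-breaker so items never compare
    agenda = [(-(s + coarse_estimate(x)), next(counter), s, x)
              for x, s in start_items]
    heapq.heapify(agenda)
    done = {}
    while agenda:
        _, _, score, item = heapq.heappop(agenda)
        if item in done:
            continue                   # already finished with a better score
        done[item] = score
        if is_goal(item):
            return item, score
        for nxt, step in expand(item, done):
            if nxt not in done:
                s = score + step
                heapq.heappush(agenda,
                               (-(s + coarse_estimate(nxt)), next(counter), s, nxt))
    return None, float('-inf')
```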

Experimental Setup

R E S U L T S

Chinese-to-English newswire

Parallel corpus: 11.3 million words; sentences of length ≤ 40

MT systems: Tuned and tested on NIST ‘04 and ‘05

Supervised data: 150 training & 191 test sentences (NIST ‘02)

Unsupervised Model: Jointly trained HMM (Berkeley Aligner)

Baselines and Limited Systems

HMM: state-of-the-art unsupervised baseline; joint training & competitive posterior decoding; source of many features for the supervised models

ITG: supervised ITG aligner with block terminals; state-of-the-art supervised baseline; re-implementation of Haghighi et al., 2009

Coarse: supervised block ITG + possible alignments; the coarse pass of the full extraction set model

Word Alignment Performance

[Bar chart: word alignment Precision, Recall, and 1−AER for HMM, ITG, Coarse, and Full.]

Extracted Bispan Performance

[Bar chart: extracted bispan Precision, Recall, F1, and F5 for HMM, ITG, Coarse, and Full.]

Translation Performance (BLEU)

Moses: HMM 33.2, ITG 33.6, Coarse 34.2, Full 34.4

Joshua: HMM 34.5, ITG 34.7, Coarse 35.7, Full 35.9

Supervised conditions also included HMM alignments

Conclusions

Extraction set model directly learns what phrases to extract

The system performs well as an aligner or a rule extractor

Are segmentations always bad?

Idea: get overlap and multi-scale into the learning!

Thank you!

nlp.cs.berkeley.edu
