Predicting machine translation quality

#5 Predicting Machine Translation Quality

Page 1: #5 Predicting Machine Translation Quality

Predicting machine translation quality

Page 2: #5 Predicting Machine Translation Quality

I am @bittlingmayer. My company is @SignalNLabs.

interests: translation quality, translation crowdsourcing, transliteration, browser translation integrations, topic classification, automatic source-side correction

previously @Google, @Adobe, @Cerner

Ciao!

Page 3: #5 Predicting Machine Translation Quality

Today’s topics

◉ Why translation quality?
◉ What is the problem?
◉ Our data model
◉ Our learning infra

Page 4: #5 Predicting Machine Translation Quality

Quality estimation?

sentence-level quality

good machine translation vs bad

1

Page 5: #5 Predicting Machine Translation Quality

Quality evaluation?

corpus-level quality given reference translations

machine translation vs human translation

2

Page 6: #5 Predicting Machine Translation Quality

Why quality?

Why is predicting quality useful?

Page 7: #5 Predicting Machine Translation Quality

Machine translation should not be a gamble.

Page 8: #5 Predicting Machine Translation Quality

Optimisation Function

$4.50 for 1M chars by machine

$10,000 for 1M chars at 5¢/word by human
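
The two prices on this slide imply a simple cost model. The following is a minimal sketch, not the speaker's actual optimisation function. It assumes about 5 characters per word (which is what makes 1M characters at 5¢/word come to $10,000) and a hypothetical routing policy in which everything is machine translated and a quality predictor sends some fraction of the text on to human translators. The names `blended_cost` and `human_fraction` are illustrative.

```python
MACHINE_USD_PER_MILLION_CHARS = 4.50  # from the slide
HUMAN_USD_PER_WORD = 0.05             # from the slide (5 cents per word)
CHARS_PER_WORD = 5                    # assumption implied by the $10,000 figure


def machine_cost(chars: float) -> float:
    """Cost in USD of machine translating `chars` characters."""
    return chars / 1_000_000 * MACHINE_USD_PER_MILLION_CHARS


def human_cost(chars: float) -> float:
    """Cost in USD of human translating `chars` characters."""
    return chars / CHARS_PER_WORD * HUMAN_USD_PER_WORD


def blended_cost(chars: float, human_fraction: float) -> float:
    """Hypothetical policy: machine translate everything, then send the
    fraction predicted to be bad to human translators."""
    return machine_cost(chars) + human_cost(chars * human_fraction)
```

Under these assumptions, sending 10% of 1M characters to humans costs about $1,004.50 instead of $10,000, which is why the accuracy of the predictor matters so much.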

Page 9: #5 Predicting Machine Translation Quality

Perfect Prediction == Perfect Translation

translator

predictor

reward [scores, rankings]

state

action [translations]

Reinforcement Learning

Page 10: #5 Predicting Machine Translation Quality

What’s the problem?

Is it really harder than self-driving cars?

Page 11: #5 Predicting Machine Translation Quality

Language is hard.

Page 12: #5 Predicting Machine Translation Quality

Context.

Page 13: #5 Predicting Machine Translation Quality

Data are dirty.

Page 14: #5 Predicting Machine Translation Quality

Bridging.

Page 15: #5 Predicting Machine Translation Quality
Page 16: #5 Predicting Machine Translation Quality

Payoff

What is solvable?

Effort

bad input

50% of errors

context/customisation

like a human

like Search, FB, Maps...

source-side ambiguity

ideally interactive

bad output

Page 17: #5 Predicting Machine Translation Quality

What is quality?

Can we quantify the quality of a translation?

Page 18: #5 Predicting Machine Translation Quality

Accuracy

What is sentence-level quality?

Fluency

Low Quality

Good Enough

Misleading

Human Quality

Page 19: #5 Predicting Machine Translation Quality

Recall vs Precision vs Accuracy

actual bad

predicted bad

Page 20: #5 Predicting Machine Translation Quality

Trivial 90% Accuracy Example

actual bad

predicted bad: 100%
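
The slide title suggests why accuracy alone is misleading. A minimal sketch, assuming a benchmark in which 90% of translations are actually bad (that rate is inferred from the title, not stated on the slide) and a trivial predictor that flags every translation as bad:

```python
def confusion_metrics(actual_bad, predicted_bad):
    """Accuracy, precision and recall for the 'bad' class."""
    pairs = list(zip(actual_bad, predicted_bad))
    tp = sum(1 for a, p in pairs if a and p)          # bad, flagged bad
    fp = sum(1 for a, p in pairs if not a and p)      # good, flagged bad
    fn = sum(1 for a, p in pairs if a and not p)      # bad, missed
    tn = sum(1 for a, p in pairs if not a and not p)  # good, passed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Assumed benchmark: 90 of 100 translations are actually bad.
actual = [True] * 90 + [False] * 10
# Trivial predictor: flags all 100 translations as bad.
predicted = [True] * 100
metrics = confusion_metrics(actual, predicted)
```

Here `metrics` shows 90% accuracy and 100% recall, yet the predictor carries no information: it never lets a good translation through.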

Page 21: #5 Predicting Machine Translation Quality

How does quality vary?

to English to top languages to other

from English

from top languages

from other

Page 22: #5 Predicting Machine Translation Quality

How does quality vary?

Wikipedia

news

dialogues, film subtitles, Coursera, Medium

“everyday” reviews, customer service

your children’s WhatsApp messages

my WhatsApp messages

Page 23: #5 Predicting Machine Translation Quality

Other concepts of quality?

Page 24: #5 Predicting Machine Translation Quality

How do we solve it?

With data and features

Page 25: #5 Predicting Machine Translation Quality

What is our data model?

lang_pair | source | target | score
en-zh | Hello | 您好 | 1.0
en-zh | The car is driving. | The car is driving. | 0.0
en-ru | The car is driving. | Автомобиль вождения. | 0.3
... | ... | ... | ...

Page 26: #5 Predicting Machine Translation Quality

What is our data model?

lang_pair | source | target | src_length_bytes | ... | trg_spam_prob | score
en-zh | Hello | 您好 | 5 | ... | 0.5 | 1.0
en-zh | The car is driving. | The car is driving. | 19 | ... | 0.2 | 0.0
en-ru | The car is driving. | Автомобиль вождения. | 19 | ... | 0.1 | 0.3
... | ... | ... | ... | ... | ... | ...
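
One way to picture a row of this data model in code. This is a sketch under assumptions: the `Row` class, the `lang_pair` field name and the `is_untranslated` signal are illustrative additions, and only `src_length_bytes` appears on the slide. Byte lengths are taken over the UTF-8 encoding, which reproduces the slide's values of 5 and 19.

```python
from dataclasses import dataclass


@dataclass
class Row:
    lang_pair: str   # e.g. "en-zh"; field name is an assumption
    source: str
    target: str
    score: float     # human score between 0.0 and 1.0


def basic_features(row: Row) -> dict:
    """Derive a few engineered signals from a raw row."""
    return {
        "src_length_bytes": len(row.source.encode("utf-8")),
        "trg_length_bytes": len(row.target.encode("utf-8")),
        # Hypothetical signal: the output is just a copy of the input.
        "is_untranslated": row.source.strip() == row.target.strip(),
    }


rows = [
    Row("en-zh", "Hello", "您好", 1.0),
    Row("en-zh", "The car is driving.", "The car is driving.", 0.0),
    Row("en-ru", "The car is driving.", "Автомобиль вождения.", 0.3),
]
feature_table = [basic_features(r) for r in rows]
```

The second row is the interesting one: the output is an untranslated copy of the source, which a simple equality check catches and which matches its human score of 0.0.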

Page 27: #5 Predicting Machine Translation Quality

10-1000 features
signals engineered by us

1000-10M rows
sentences* hand-scored by linguists

language-agnostic
Language is just another feature.

Page 28: #5 Predicting Machine Translation Quality

Human scores

Evaluate many translations by hand

Page 29: #5 Predicting Machine Translation Quality

Human Evaluation Score Types

Labels

good/bad

multilabels

word-level labels

Ranking

rank multiple systems

Post-Edit

to comprehensible

to human quality

Page 30: #5 Predicting Machine Translation Quality

Human Evaluation Score Types

Labels

good/bad

0.0-1.0

multilabels

word-level labels

Ranking

rank multiple systems

Post-Edit

to comprehensible

to human quality

requires smaller dataset and budget

Page 31: #5 Predicting Machine Translation Quality

$0.001 / row @ 5x redundancy
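
With 5x redundancy, each row receives five independent human judgments that have to be combined into one training score. The slides do not say how this is done; the sketch below assumes the simplest option, an arithmetic mean of 0.0-1.0 labels. The function name and the row identifiers are hypothetical.

```python
from statistics import mean


def aggregate_scores(judgments: dict) -> dict:
    """Collapse redundant judgments per row into one score.

    `judgments` maps a row id to the list of scores it received.
    Rows without any judgment are skipped.
    """
    return {
        row_id: mean(scores)
        for row_id, scores in judgments.items()
        if scores
    }


# Five good/bad votes per row, encoded as 1.0 and 0.0.
example = {
    "row-1": [1.0, 1.0, 0.0, 1.0, 1.0],
    "row-2": [0.0, 0.0, 0.0, 1.0, 0.0],
}
aggregated = aggregate_scores(example)
```

Averaging binary votes is one way a cheap good/bad labelling task can still yield a graded 0.0-1.0 score.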

Page 32: #5 Predicting Machine Translation Quality

QuEst baseline features

quest.dcs.shef.ac.uk/quest_files/features_blackbox_baseline_17

Page 33: #5 Predicting Machine Translation Quality

◉ number of tokens in the source sentence
◉ number of tokens in the target sentence
◉ average source token length
◉ LM probability of source sentence
◉ LM probability of target sentence
◉ number of occurrences of the target word within the target hypothesis (averaged for all words in the hypothesis - type/token ratio)
◉ average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.2)
◉ average number of translations per source word in the sentence (as given by IBM 1 table thresholded such that prob(t|s) > 0.01) weighted by the inverse frequency of each word in the source corpus
◉ percentage of unigrams in quartile 1 of frequency (lower frequency words) in a corpus of the source language (SMT training corpus)
◉ percentage of unigrams in quartile 4 of frequency (higher frequency words) in a corpus of the source language
◉ percentage of bigrams in quartile 1 of frequency of source words in a corpus of the source language
◉ percentage of bigrams in quartile 4 of frequency of source words in a corpus of the source language
◉ percentage of trigrams in quartile 1 of frequency of source words in a corpus of the source language
◉ percentage of trigrams in quartile 4 of frequency of source words in a corpus of the source language
◉ percentage of unigrams in the source sentence seen in a corpus (SMT training corpus)
◉ number of punctuation marks in the source sentence
◉ number of punctuation marks in the target sentence

Page 34: #5 Predicting Machine Translation Quality

number of tokens

length

LM probability

number of occurrences of the target word within the target hypothesis

average number of translations per source word in the sentence

percentage of unigrams in quartile 1 of frequency (lower frequency words)

… percentage of unigrams in quartile n of frequency (higher frequency words)

percentage of trigrams in quartile 1 of frequency of source words

… percentage of trigrams in quartile n of frequency of source words

number of punctuation marks
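
The surface-level features in this list need no external resources and can be sketched in a few lines. This is not the QuEst implementation: it assumes whitespace tokenisation, counts punctuation by Unicode category, and uses illustrative feature names. The LM, IBM 1 and corpus-frequency features are left out because they require trained models and a reference corpus.

```python
import unicodedata


def count_punctuation(text: str) -> int:
    """Count characters whose Unicode category is punctuation (P*)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("P"))


def surface_features(source: str, target: str) -> dict:
    """Resource-free features computed on a source/target pair."""
    src_tokens = source.split()  # assumption: whitespace tokenisation
    trg_tokens = target.split()
    return {
        "src_num_tokens": len(src_tokens),
        "trg_num_tokens": len(trg_tokens),
        "src_avg_token_length": (
            sum(len(t) for t in src_tokens) / len(src_tokens)
            if src_tokens else 0.0
        ),
        # Average occurrences of each target word within the hypothesis.
        "trg_tokens_per_type": (
            len(trg_tokens) / len(set(trg_tokens)) if trg_tokens else 0.0
        ),
        "src_num_punctuation": count_punctuation(source),
        "trg_num_punctuation": count_punctuation(target),
    }


features = surface_features("The car is driving.", "Автомобиль вождения.")
```

Whitespace tokenisation would not work for a language such as Chinese, so a language-agnostic system would need a different tokeniser or character-level counts.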

Page 35: #5 Predicting Machine Translation Quality

bad input signals

Page 36: #5 Predicting Machine Translation Quality

vot tak narod ho4et napisat'

Возможно, вы имели в виду: вот так народ хочет написать

Page 37: #5 Predicting Machine Translation Quality

human vot tak narod ho4et napisat' vot tak narod ho4et napisat'

search вот так народ хочет написать That's how people want to write

translation Вот так народ хочет написать. So people want to write.
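
The example above is Russian typed in Latin letters, with the digit 4 standing in for ч. Two cheap bad-input signals can flag it before translation. The sketch assumes the expected script for the source language is known in advance (here Cyrillic for Russian); both function names are hypothetical, and the slides do not say which signals the speaker actually uses.

```python
import re
import unicodedata


def script_ratio(text: str, script: str) -> float:
    """Fraction of letters whose Unicode name starts with `script`,
    for example "CYRILLIC" or "LATIN"."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    in_script = sum(
        1 for ch in letters if unicodedata.name(ch, "").startswith(script)
    )
    return in_script / len(letters)


# A digit with a letter on each side, as in "ho4et".
DIGIT_INSIDE_WORD = re.compile(r"[^\W\d_]\d[^\W\d_]")


def has_digit_inside_word(text: str) -> bool:
    return bool(DIGIT_INSIDE_WORD.search(text))


translit = "vot tak narod ho4et napisat'"
native = "вот так народ хочет написать"
```

For input declared as Russian, a Cyrillic ratio near zero suggests the text should be converted to Cyrillic, as the search engine does in the example, before it is sent to the translator.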

Page 38: #5 Predicting Machine Translation Quality

bad output signals

Page 39: #5 Predicting Machine Translation Quality

ambiguity signals

Page 40: #5 Predicting Machine Translation Quality
Page 41: #5 Predicting Machine Translation Quality

translation signals

Page 42: #5 Predicting Machine Translation Quality

source | Google | Microsoft | Wiktionary | ...
Merry Christmas | Krismasi! | Krismasi Njema! | heri ya Krismasi, Krismasi njema | ...
eat apples | kula mapera | kula apples | ∅ | ...
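
One possible translation signal is how far independent sources agree with each other. The sketch below assumes a simple token-level Jaccard overlap, which is an illustrative choice and not necessarily the measure used in the talk. A missing entry (∅ in the table) is represented as `None` and ignored.

```python
import re
from itertools import combinations


def token_set(text: str) -> set:
    """Lowercased word tokens, punctuation dropped."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Token overlap between two candidate translations."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def mean_pairwise_agreement(candidates) -> float:
    """Average Jaccard overlap over all pairs of available candidates."""
    present = [c for c in candidates if c]
    pairs = list(combinations(present, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


christmas = mean_pairwise_agreement(
    ["Krismasi!", "Krismasi Njema!", "Krismasi njema"]
)
apples = mean_pairwise_agreement(["kula mapera", "kula apples", None])
```

On the two rows of the table, the sources agree more on "Merry Christmas" than on "eat apples", where one system leaves "apples" untranslated.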

Page 43: #5 Predicting Machine Translation Quality

lexical signals

sygnały leksykalne

Page 44: #5 Predicting Machine Translation Quality

char signals

sygnały znaków

Page 45: #5 Predicting Machine Translation Quality

syntactic signals

Page 46: #5 Predicting Machine Translation Quality

parse tree to sequence conversion

Page 47: #5 Predicting Machine Translation Quality

sequence to sequence learning

Page 48: #5 Predicting Machine Translation Quality

cross-lingual signals

Page 49: #5 Predicting Machine Translation Quality

outside signals

Page 50: #5 Predicting Machine Translation Quality

context/customisation signals

Page 51: #5 Predicting Machine Translation Quality

Other signals?

Page 52: #5 Predicting Machine Translation Quality

50-99+% accuracy
Depends on the benchmark! ;-)

1000-10M rows

10-1000 features

Page 53: #5 Predicting Machine Translation Quality

Data augmentation?

Page 54: #5 Predicting Machine Translation Quality

Can we use parallel corpora?

target

Onartutako gertaerak Aholkuak eta iradokizunak Etorkizuneko egitasmoei buruz galdetzea onespena eskatzea Laguntza eskatzea Jende galdetzea itxaron Norbait iritzia eskatzea Etorkizunari Garrantzia emanez informazio saihestea Bad pertsona … … ... Aditu batek ingelesez izatea Being Lucky zaharra izatea pobrea izatea ari irekietan aberatsa izatea Ziur izatea / zenbait ari kezkaturik Aspergarria! Your Mind aldatzeak Pertsonak txaloak Up Hipokresia kexu

source

받아 들여지는 사실 조언 및 제안 향후 계획에 대해 물어 승인 요청 도움을 요청 사람을 요구하는 대기 누군가의 의견을 물어 미래에 대한 태도 제공 정보 방지 나쁜 사람들 … … ... 영어 전문가 인 존재 럭키 오래 되 가난 안심되는 부자가되는 확인 인 / 특정 걱정되는 지루한! 당신의 마음을 변경 사람을 응원합니다 위선에 대해 불평

score

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … … ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
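
The table on this slide treats every aligned pair from a parallel corpus as a row with score 1.0. A minimal sketch of that augmentation step follows; the function name and the tuple layout are assumptions. Whether every corpus pair really deserves a 1.0 is the open question the slide raises.

```python
def rows_from_parallel(lang_pair: str, sources: list, targets: list) -> list:
    """Turn aligned sentence pairs into (lang_pair, source, target, score)
    rows, assuming each reference translation is a perfect 1.0."""
    if len(sources) != len(targets):
        raise ValueError("source and target sides must be aligned")
    return [
        (lang_pair, src, trg, 1.0)
        for src, trg in zip(sources, targets)
    ]


augmented = rows_from_parallel(
    "en-zh",
    ["Hello"],
    ["您好"],
)
```

Rows produced this way contain only positive examples, so they would need to be mixed with hand-scored rows that include bad translations.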

Page 55: #5 Predicting Machine Translation Quality

What is our learning infra?

H2O.ai deeplearning

Page 56: #5 Predicting Machine Translation Quality

Do we need deep learning?

Page 57: #5 Predicting Machine Translation Quality

Why doesn’t deep learning work for translation?

Page 58: #5 Predicting Machine Translation Quality

Want to learn more?

The real experts

◉ Dr. Lucia Specia
◉ quest.dcs.shef.ac.uk
◉ statmt.org/wmt15/quality-estimation-task.html

ACL 2016 will be held in Berlin in August.

Reading

Page 59: #5 Predicting Machine Translation Quality

Any questions?

You can find me at

◉ @bittlingmayer
◉ [email protected]

Thanks!