Page 1:

Universal Morphological Analysis using Structured Nearest Neighbor Prediction

Young-Bum Kim, João V. Graça, and Benjamin Snyder

University of Wisconsin-Madison

28 July, 2011

Page 2:

Unsupervised NLP

Unsupervised learning in NLP has become popular

27 papers at this year's ACL and EMNLP

Relies on an inductive bias, encoded in the model structure or the learning algorithm.

Example: an HMM for POS induction encodes transitional regularity.

[Diagram: an HMM with hidden tags (?) over the words "I like to read"]

Page 3:

Inductive Biases

Formulated with weak empirical grounding (or left implicit)

Single, simple bias for all languages

Result: low performance, complicated models, fragility, language dependence.

Our approach: learn a complex, universal bias from labeled languages


i.e., empirically learn what the space of plausible human languages looks like, in order to guide unsupervised learning

Page 4:

Key Idea

1) Collect labeled corpora (non-parallel) for several training languages

[Diagram: several training languages and one test language]

Page 5:

Key Idea

2) Map each $(x, y)$ pair into a "universal feature space"

- i.e., to allow cross-lingual generalization

[Diagram: training languages mapped to $f(x_1, y_1)$, $f(x_2, y_2)$, $f(x_3, y_3)$ in the universal feature space, alongside the test language]

Page 6:

Key Idea

3) Train a scoring function over the universal feature space

- i.e., treat each annotated language as a single data point in a structured prediction problem

[Diagram: training points $f(x_1, y_1)$, $f(x_2, y_2)$, $f(x_3, y_3)$ feeding the scoring function $\text{score}(\cdot)$, alongside the test language]

Page 7:

Key Idea

4) Predict the test labels which yield the highest score:

$y^* = \arg\max_y \text{score}(f(x, y))$

[Diagram: candidate analyses of the test language scored against the training points $f(x_1, y_1)$, $f(x_2, y_2)$, $f(x_3, y_3)$]

Page 8:

Test Case: Nominal Morphology

Languages differ in morphological complexity

- Only 4 noun tags for English in the Penn Treebank

- 154 noun tags in the Hungarian corpus (suffixes encode case, number, and gender)

Our analysis breaks each noun into:

stem, phonological deletion rule, and suffix

- utiskom [ stem = utisak, del = (..ak# → ..k#), suffix = om ]
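To make the triple concrete, here is a minimal sketch (our illustration, not the authors' code) of applying such an analysis to regenerate the surface form; the rule-string encoding of the slide's notation is hypothetical:

```python
def apply_analysis(stem: str, deletion: str, suffix: str) -> str:
    """Apply a phonological deletion rule to the stem, then attach the suffix.

    A rule like "..ak# -> ..k#" means a stem-final "ak" becomes "k"
    before the suffix is attached.
    """
    if deletion:
        old, new = (part.strip(".#") for part in deletion.split(" -> "))
        if stem.endswith(old):
            stem = stem[: len(stem) - len(old)] + new
    return stem + suffix

# "utiskom": stem utisak + rule (..ak# -> ..k#) + suffix om
assert apply_analysis("utisak", "..ak# -> ..k#", "om") == "utiskom"
```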

Question: Can we use morphologically annotated languages to train a universal morphological analyzer?


Page 9:

Our Method

Universal feature space (8 features)

- Size of stem, suffix, and deletion rule lexicons

- Entropy of stem, suffix, and deletion rule distributions

- Percentage of suffix-free words, and percentage of words with phonological deletions (a sketch of this feature map follows below)
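A minimal sketch of the 8-dimensional feature map, under our reading of the slide (an illustration, not the authors' implementation; the triple format matches the earlier example):

```python
import math
from collections import Counter

def entropy(counts: Counter) -> float:
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def universal_features(analyses):
    """analyses: one (stem, deletion_rule, suffix) triple per word type;
    deletion_rule / suffix may be "" (no deletion / suffix-free)."""
    stems = Counter(a[0] for a in analyses)
    dels = Counter(a[1] for a in analyses if a[1])
    sufs = Counter(a[2] for a in analyses if a[2])
    n = len(analyses)
    return [
        len(stems), len(sufs), len(dels),              # lexicon sizes
        entropy(stems), entropy(sufs), entropy(dels),  # distribution entropies
        sum(1 for a in analyses if not a[2]) / n,      # % suffix-free words
        sum(1 for a in analyses if a[1]) / n,          # % words with deletions
    ]
```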

Learning algorithm

- Broad characteristics of morphology are often similar across select language pairs

- Motivates a nearest neighbor approach

- In the structured scenario, learning becomes a search problem over the label space


Page 10:

Structured Nearest Neighbor

Main Idea: predict the analysis for the test language which brings us closest in feature space to some training language.

1) Initialize the analysis of the test language: $y^0$

2) For each training language $\ell$:

- iteratively and greedily update the test language analysis, $y^{(1,\ell)}, y^{(2,\ell)}, y^{(3,\ell)}, \ldots$, to bring it closer in feature space to $(x_\ell, y_\ell)$

3) After $T$ iterations, choose the training language closest in feature space: $\ell^* = \arg\min_\ell \lVert f(x, y^{(T,\ell)}) - f(x_\ell, y_\ell) \rVert$

4) Predict the associated analysis: $y^* = y^{(T,\ell^*)}$
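A minimal sketch of this outer loop (our illustration; initialize() and greedy_step() are hypothetical helpers standing in for the search stages described on the following slides, and universal_features() is the map sketched earlier):

```python
def structured_nearest_neighbor(test_words, train_feature_vecs, T=50):
    """Return the test-language analysis from the run that ends closest
    (in universal feature space) to some training language."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    best_d, best_analysis = float("inf"), None
    for target in train_feature_vecs:      # one search run per training language
        analysis = initialize(test_words)  # Stage 0 (hypothetical helper)
        for _ in range(T):                 # greedy hill-climbing toward target
            analysis = greedy_step(analysis, target)
        d = dist(universal_features(analysis), target)
        if d < best_d:                     # keep the run that lands closest
            best_d, best_analysis = d, analysis
    return best_analysis
```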

Page 11:

Structured Nearest Neighbor


Training languages: $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$

Initialize test language labels: $y^0$

Page 12:

Structured Nearest Neighbor


Iterative Search:

Page 13:

Structured Nearest Neighbor


Iterative Search:

Page 14:

Structured Nearest Neighbor


Iterative Search:

Page 15:

Structured Nearest Neighbor


Predict:

Page 16:

Morphology Search Algorithm


Stage 0: Initialization

Stage 1: Reanalyze Each Word

Stage 2: Find New Stems

Stage 3: Find New Suffixes

[Flowchart: each stage proposes candidate analyses and selects among them by comparing against the training language]

Based on Goldsmith (2005):

- He minimizes description length
- We minimize distance to the training language

Page 17:

Iterative Search Algorithm


Stage 0: Using "character successor frequency," initialize the sets T, F, and D.

[Diagram: stem set T, suffix set F, deletion rule set D]
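The slide names the heuristic without details; below is a minimal sketch of successor-frequency segmentation in the spirit of Harris (1955), with a hypothetical peak threshold:

```python
from collections import defaultdict

def successor_frequencies(words):
    """Map each word prefix to the set of characters observed after it."""
    followers = defaultdict(set)
    for w in words:
        for i in range(1, len(w)):
            followers[w[:i]].add(w[i])
    return followers

def initial_split(word, followers, threshold=3):
    """Propose (stem, suffix): cut at the rightmost prefix after which many
    distinct characters occur, suggesting a morpheme boundary."""
    for i in range(len(word) - 1, 0, -1):
        if len(followers[word[:i]]) >= threshold:
            return word[:i], word[i:]
    return word, ""  # no boundary found: treat the word as suffix-free
```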

Page 18:

Iterative Search Algorithm


Stage 1: Reanalyze Each Word

- Greedily reanalyze each word, keeping T and F fixed (sketched below)

[Diagram: stem set T, suffix set F, deletion rule set D]
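A minimal sketch of one such greedy choice, under our reading of the slide (deletion rules omitted for brevity; universal_features() is the earlier sketch, and `target` is the current training language's feature vector):

```python
import math

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def reanalyze_word(word, T, F, analyses, target):
    """Pick the analysis of `word` drawn from the fixed lexicons T and F that
    moves the corpus-level feature vector closest to the target language."""
    candidates = [(word, "", "")]  # option: leave the word unsegmented
    candidates += [(word[:i], "", word[i:])
                   for i in range(1, len(word))
                   if word[:i] in T and word[i:] in F]

    def score(cand):
        trial = {**analyses, word: cand}  # tentative reanalysis of this word
        return dist(universal_features(list(trial.values())), target)

    return min(candidates, key=score)
```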

Page 19:

Iterative Search Algorithm


Stage 2: Find New Stems

- Greedily analyze unsegmented words, keeping F fixed

[Diagram: stem set T, suffix set F, deletion rule set D]

Page 20:

Iterative Search Algorithm


Stage 3: Find New Suffixes

- Greedily analyze unsegmented words, keeping T fixed

[Diagram: stem set T, suffix set F, deletion rule set D]

Page 21:

Experimental Setup

Corpus: Orwell's Nineteen Eighty-Four (MULTEXT-East V3)

- Languages: Bulgarian, Czech, English, Estonian, Hungarian, Romanian, Slovene, Serbian

- 94,725 tokens (English). Slight confound: the data is parallel, though our method neither assumes nor exploits this fact.

- All words are tagged with a morpho-syntactic analysis.

Baseline: Linguistica model (Goldsmith 2005)

- Same search procedure; greedily minimizes description length

Upper bound: supervised model

- structured perceptron framework (Collins 2002)


Page 22:

Aggregate Results


Accuracy: fraction of word types with correct analysis

[Bar chart, accuracy averaged over 8 languages (y-axis 40–100): Linguistica 64.6]

Page 23:

Aggregate Results


Accuracy: fraction of word types with correct analysis

[Bar chart, accuracy averaged over 8 languages (y-axis 40–100): Linguistica 64.6; Supervised 92.8]

Page 24:

Aggregate Results


Accuracy: fraction of word types with correct analysis

Our Model: Train with 7, test on 1

- Average absolute increase of 11.8 points
- Reduces the error gap to the supervised upper bound by 42%

[Bar chart, accuracy averaged over 8 languages (y-axis 40–100): Linguistica 64.6; Our model 76.4; Supervised 92.8]

Page 25:

Aggregate Results


Accuracy: fraction of word types with correct analysis

Our Model: Train with 7, test on 1

- Average absolute increase of 11.8 points
- Reduces the error gap to the supervised upper bound by 42%

Oracle: each language guided using its own gold-standard feature values

Accuracy still below supervised:

(1) search errors, (2) coarseness of the feature space

[Bar chart, accuracy averaged over 8 languages (y-axis 40–100): Linguistica 64.6; Our model 76.4; Oracle 81.1; Supervised 92.8]

Page 26:

Results By Language


Best accuracy: English

Lowest accuracy: Estonian

[Bar chart: Linguistica accuracy by language (BG, CS, EN, ET, HU, RO, SL, SR), y-axis 40–100]

Page 27:

Results By Language

[Bar chart: accuracy by language (BG, CS, EN, ET, HU, RO, SL, SR) for Our Model (train with 7, test on 1) vs. Linguistica, y-axis 40–100]


Biggest improvements for Serbian (15 points) and Slovene (22 points).

For all languages other than English, our model improves over the baseline.


Page 28:

Visualization of Feature Space


Feature space reduced to 2D using MDS
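The slides name the technique but not the tooling; a minimal sketch assuming scikit-learn's MDS, with toy feature values in place of the real ones:

```python
import numpy as np
from sklearn.manifold import MDS

# One 8-dimensional universal feature vector per language (toy values here).
feature_vectors = np.random.default_rng(0).random((8, 8))
coords = MDS(n_components=2, random_state=0).fit_transform(feature_vectors)
print(coords.shape)  # (8, 2): one 2-D point per language
```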

[Scatter plot: the eight languages plotted under three analyses: Linguistica, Gold Standard, Our Method]

Page 29:

Visualization of Feature Space


Serbian and Slovene:

- Closely related Slavic languages
- Nearest neighbors under our model's analysis
- Essentially they "swap places"


Page 30:

Visualization of Feature Space


Estonian and Hungarian:

- Highly inflected Uralic languages
- They "swap places"


Page 31:

Visualization of Feature Space


English:

- Failed to find a good neighbor
- Pulled towards Bulgarian (the second least inflected language in the dataset)


Page 32:

Accuracy as Training Languages Added


Averaged over all language combinations of various sizes

- Accuracy climbs as training languages are added

- Worse than the baseline when only one training language is available

- Better than the baseline when two or more training languages are available

Page 33:

Why does accuracy improve with more languages?

Resulting distance vs. accuracy for all 56 train-test pairs

- More training languages ⇒ a closer neighbor is found
- A closer neighbor ⇒ higher accuracy


Page 34:

Summary


Main Idea: Recast unsupervised learning as cross-lingual structured prediction

Test case: morphological analysis of 8 languages.

Formulated a universal feature space for morphology

Developed a novel structured nearest neighbor approach

Our method yields substantial accuracy gains

Page 35:

Future Work


Shortcoming

- Uniform weighting of dimensions in the universal feature space

- Some features may be more important than others

Future work: learn a distance metric on the universal feature space
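As an illustration of this direction (our sketch, not the authors' method), a learned diagonal metric would simply re-weight the feature dimensions:

```python
import numpy as np

def weighted_distance(u, v, w):
    """Diagonal Mahalanobis-style distance; the weights w >= 0 would be
    learned so that more informative feature dimensions count more."""
    u, v, w = map(np.asarray, (u, v, w))
    return float(np.sqrt(np.sum(w * (u - v) ** 2)))

# With w = all-ones this reduces to the uniform Euclidean distance used above.
```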

Page 36:

Thank You
