Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers Georgios Spithourakis, Steffen Petersen, Sebastian Riedel Machine Reading Group


Page 1:

Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Georgios Spithourakis, Steffen Petersen, Sebastian Riedel

Machine Reading Group

Page 2:

Numeracy

[Figure: diagram contrasting words with numerals and numbers. Tokens shown: the, cat, mat, sat, dog, fox, brown, jumped, sleeping, one, two, three, four, π, i, 4, 2018, 0.001. Numbers shown with the set labels ℕ, ℚ, ℂ: 0, 1, 2, −17, −1, 2/3, 5/8, 1.73, 3.14, 3.14…, 0.9̄, 1+2i, 2000]

Page 3:

Literate Language Models

$P_{LM}(\mathit{text})$: how plausible the text is (semantically, grammatically, etc.)

Example sentences: ‘I eat an apple’, ‘An apple eats me’, ‘I eats an apple’, ‘A apple eats I’

Page 4:

Numerate Language Models

$P_{LM}(\mathit{text})$ for: ‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’, ‘John is 999 m tall’

[Figure: $P_{\mathit{height}}$ over heights from 1.4 to 2.2]

Page 5:

Numeracy Matters

‘Unemployment of the US is 5 %’ (alternatives shown: 0.5, 50, 500)

‘Our model is 10 times better than the baseline’ (alternatives shown: 0, 100, 1000)

‘Patient’s temperature is 36.6 degrees’ (alternatives shown: 0.02, 3.2, 41.9, 98.6)

Page 6:

Q1: Are existing LMs numerate?

Q2: How to improve the numeracy of LMs?

Page 9:

A Neural Language Model

Input: $w_{t-1}$, embedded as $e_t$

RNN: $h_t$ computed from $h_{t-1}$ and $e_t$

Output: $p(w_t \mid h_t) = \mathrm{softmax}_V(w_t)$

Vocabulary $V$: the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM

Out-of-vocabulary words (mapped to UNK): petrichor, unothrorgaphy, Spithourakis

Out-of-vocabulary numerals (mapped to UNKNUM): 2.1, 0.731, 9,846,321, 2018.3
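The closed-vocabulary output layer on this slide can be illustrated with a minimal sketch. The vocabulary list follows the slide; the helper names (`is_numeral`, `to_vocab`, `softmax`) and the float-parsing rule for spotting numerals are assumptions made for illustration, not the authors' code.

```python
import math

# Closed vocabulary as shown on the slide.
VOCAB = ["the", "cat", "mat", "sat", "UNK", "2", "1", "1.7", "2018", "UNKNUM"]

def is_numeral(token):
    """Assumption: a token is a numeral if it parses as a float once commas are removed."""
    try:
        float(token.replace(",", ""))
        return True
    except ValueError:
        return False

def to_vocab(token):
    """Map out-of-vocabulary tokens to UNK (words) or UNKNUM (numerals)."""
    if token in VOCAB:
        return token
    return "UNKNUM" if is_numeral(token) else "UNK"

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With this mapping, every unseen numeral (2.1, 0.731, 9,846,321, ...) shares the single UNKNUM output unit.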

Page 11:

Evaluation: Adjusted Perplexity

Example: ‘John is 2.1 m tall’, where 2.1, 0.731, 9,846,321 and 2018.3 are all out of vocabulary and map to UNKNUM

Perplexity: $p(2.1) = p(\mathrm{UNKNUM})$, BUT $p(\mathrm{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + \cdots$

Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016]: $p(2.1) = \dfrac{p(\mathrm{UNKNUM})}{|\{w \in \mathrm{UNKNUM}\}|}$, with the set of UNKNUM numerals taken from test data
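As a rough sketch of the adjustment, the function below divides the probability of every out-of-vocabulary token by the number of distinct OOV types before computing perplexity. The function name and argument layout are hypothetical; only the idea of dividing by the OOV set size, counted on test data, comes from the slide.

```python
import math

def adjusted_perplexity(token_probs, oov_flags, num_oov_types):
    """Perplexity with each OOV token's probability divided by the OOV set size.

    token_probs: model probability of each test token (the UNK-class
        probability for out-of-vocabulary tokens).
    oov_flags: True where the test token is out of vocabulary.
    num_oov_types: number of distinct OOV types, counted on the test data.
    """
    log_sum = 0.0
    for p, is_oov in zip(token_probs, oov_flags):
        if is_oov:
            p = p / num_oov_types  # penalise the shared UNK-class probability
        log_sum += math.log(p)
    return math.exp(-log_sum / len(token_probs))
```

When no token is out of vocabulary, the result is ordinary perplexity.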

Page 12:

Datasets

Clinical Dataset: 16,015 clinical patient reports. Source: London Chest Hospital

Scientific Dataset: 20,962 paragraphs from scientific papers. Source: ARXIV

Token composition: 96% words, 4% numerals

Page 15:

Results: Adjusted Perplexity

Adjusted perplexity of the softmax model (lower is better)

              all tokens    words        numerals
Clinical            8.91     5.99       58,443.72
Scientific         80.62    51.83    3,505,856.25

[Figure: softmax assumptions (PMF over numerals, including UNKNUM) versus reality (?) (PDF); labels: small, large]

Page 16:

Q1: Are existing LMs numerate?

Q2: How to improve the numeracy of LMs?

Page 17:

Strategy: Softmax & Hierarchical Softmax

Softmax: a single softmax over the vocabulary V (the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM), computed from $h_t$

Hierarchical softmax: a softmax over types (word or numeral) computed from $h_t$, then a softmax over words (the, cat, mat, sat, UNK) or $p(\mathit{numeral} \mid h_t)$

Page 18:

Strategy: Softmax & Hierarchical Softmax

In the hierarchical softmax, $p(\mathit{numeral} \mid h_t)$ is a softmax over numerals (2, 1, 1.7, 2018, UNKNUM)

Page 19:

Strategy: Softmax & Hierarchical Softmax

Options for $p(\mathit{numeral} \mid h_t)$:

  • h-softmax
  • digit-by-digit
  • from PDF
  • etc.
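The two-level factorisation above can be sketched as follows. The score dictionaries are hypothetical stand-ins for quantities a model would compute from $h_t$, and the function names are illustrative only.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a dict of label -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def h_softmax_prob(token, type_scores, word_scores, numeral_scores):
    """p(token) = p(type) * p(token | type), with type in {"word", "numeral"}."""
    p_type = softmax(type_scores)
    if token in word_scores:
        return p_type["word"] * softmax(word_scores)[token]
    return p_type["numeral"] * softmax(numeral_scores)[token]

# Toy scores (all zero, so every softmax is uniform).
type_scores = {"word": 0.0, "numeral": 0.0}
word_scores = {w: 0.0 for w in ["the", "cat", "mat", "sat", "UNK"]}
numeral_scores = {n: 0.0 for n in ["2", "1", "1.7", "2018", "UNKNUM"]}
p_the = h_softmax_prob("the", type_scores, word_scores, numeral_scores)
```

Because the numeral branch is a separate distribution, it can be swapped for another numeral model without touching the word softmax.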

Page 21:

Strategy: Digit-by-Digit Composition

d-RNN, conditioned on $h_t$. Input: SOS 2 . 1. Output: 2 . 1 EOS

$p(2.1) = p(2)\, p(. \mid 2)\, p(1 \mid 2.)\, p(\mathrm{EOS} \mid 2.1)$

[Figure: the numerals 1.94, 1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02 shown alongside UNKNUM]
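The chain-rule product on this slide can be sketched with a pluggable next-character model. The uniform model below is a toy stand-in for the d-RNN, and both function names are hypothetical.

```python
def numeral_probability(numeral, next_char_prob):
    """Chain-rule probability of a numeral spelled character by character, ending in EOS."""
    prob = 1.0
    prefix = ""
    for ch in list(numeral) + ["EOS"]:
        prob *= next_char_prob(prefix, ch)
        if ch != "EOS":
            prefix += ch
    return prob

def uniform_next_char(prefix, ch):
    """Toy stand-in for the d-RNN: uniform over ten digits, '.', and EOS."""
    return 1.0 / 12.0

p = numeral_probability("2.1", uniform_next_char)  # four factors: '2', '.', '1', EOS
```

Any string of digits receives some probability under this scheme, so no unknown-numeral class is needed.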

Page 23:

Strategy: from continuous PDF

[Figure: PDF over numbers; x axis from 1 to 3.5, y axis from 0 to 2.5]

$p(\mathbf{numeral} = 2.1) = p_{PMF}(2.05 < \mathbf{number} < 2.15 \mid \mathrm{precision} = 1) \times p(\mathrm{precision} = 1)$

MoG: mixture of Gaussians with frozen $\mu$s and $\sigma$s; a softmax over components is computed from $h_t$

$p(\mathrm{precision}) = p_{RNN}(\underbrace{dddd}_{\mathrm{precision}}\ \mathrm{EOS})$
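A minimal sketch of the discretisation step, assuming Gaussian components: the mass of the interval around the numeral is a weighted difference of Gaussian CDFs, multiplied by the probability of the precision. The function names, and passing `p_precision` in as a plain number rather than computing it with an RNN, are simplifications for illustration.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mog_numeral_prob(value, precision, weights, mus, sigmas, p_precision):
    """p(numeral) = p(value - half < number < value + half | precision) * p(precision).

    precision is the number of decimal digits, so half = 0.5 * 10**(-precision);
    for 2.1 with precision 1 the interval is (2.05, 2.15).
    """
    half = 0.5 * 10.0 ** (-precision)
    lo, hi = value - half, value + half
    mass = sum(w * (normal_cdf(hi, m, s) - normal_cdf(lo, m, s))
               for w, m, s in zip(weights, mus, sigmas))
    return mass * p_precision

# One very narrow component centred on 2.1 puts almost all of its mass in (2.05, 2.15).
p = mog_numeral_prob(2.1, 1, weights=[1.0], mus=[2.1], sigmas=[1e-6], p_precision=0.5)
```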

Page 25:

Overview of Strategies

h-softmax: softmax over numerals (2, 1.7, 2018, UNKNUM)

d-RNN: input <SOS> 2 . 1, output 2 . 1 <EOS>

MoG: PDF

combination: softmax over strategies, computed from $h_t$
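One way to read the combination is as a mixture: the softmax over strategies gives weights, and each strategy's numeral probability is averaged under those weights. This reading is an assumption, and the names below are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a dict of strategy -> score."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def combination_prob(strategy_scores, strategy_probs):
    """Assumed mixture: sum over strategies of p(strategy) * p(numeral | strategy)."""
    weights = softmax(strategy_scores)
    return sum(weights[s] * strategy_probs[s] for s in weights)

# Toy example: equal strategy scores, so each strategy gets weight 1/3.
scores = {"h-softmax": 0.0, "d-RNN": 0.0, "MoG": 0.0}
probs = {"h-softmax": 0.1, "d-RNN": 0.2, "MoG": 0.3}
p = combination_prob(scores, probs)
```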

Page 26:

Results: Language Modelling (1)

Clinical, adjusted perplexity (lower is better)

                softmax    h-softmax     d-RNN       MoG    combination
All tokens         8.91         6.05      5.88      5.88           5.82
Words              5.99         4.96      4.95      4.99           4.96
Numerals      58,443.72       495.95    263.22    226.46         197.59

Page 27:

Results: Language Modelling (2)

Scientific, adjusted perplexity (lower is better)

                   softmax    h-softmax     d-RNN       MoG    combination
All tokens           80.62        54.8       53.7      54.37          53.03
Words                51.83        49.81      48.89     48.97          48.25
Numerals      3,505,856.25       550.98     519.8     683.16         520.95

Page 28:

Results: Number Prediction

The numeral ‘2.1’ is mapped to the number 2.1

$\mathrm{MAPE} = \left|\dfrac{\mathit{prediction} - \mathit{target}}{\mathit{target}}\right| \times 100\%$

Clinical, MAPE (lower is better)

mean           2353.11
median             426
softmax            622
h-softmax          747
d-RNN              514
MoG                348
combination        552
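A small sketch of the metric, assuming the per-example errors are averaged: each error is the absolute difference relative to the target, in percent. The function names are illustrative, and targets equal to zero would need separate handling.

```python
def absolute_percentage_errors(predictions, targets):
    """Per-example |prediction - target| / |target| * 100 (targets must be non-zero)."""
    return [abs(p - t) / abs(t) * 100.0 for p, t in zip(predictions, targets)]

def mape(predictions, targets):
    """Mean absolute percentage error over all examples."""
    errors = absolute_percentage_errors(predictions, targets)
    return sum(errors) / len(errors)

# Numerals are first converted to numbers, e.g. the string '2.1' to the float 2.1.
target = float("2.1")
score = mape([110.0, 90.0], [100.0, 100.0])
```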

Page 29:

Results: Number Prediction

Scientific, MAPE (lower is better)

softmax           1947
h-softmax         1652
d-RNN             1287
MoG                590
combination       2333

Off-chart bars: 8039 and 1e23 (mean and median baselines)

Page 30:

Softmax versus Hierarchical Softmax

[Figure: cosine similarities between the numerals 1, 2, 3, 4, …, 100, 101, …, 2012, 2013, shown for softmax and for h-softmax]

Page 32:

Analysis: d-RNN and Benford’s Law

[Figure: cosine similarities between the d-RNN tokens 0 to 9, ‘.’ and EOS]

[Figure: distributions of the 1st digit and the 4th digit over the digits 0 to 9 (axis from 0 to 30), d-RNN versus Benford, for the Clinical and Scientific datasets]
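For reference, Benford's law for the first significant digit states that digit d (1 to 9) occurs with probability log10(1 + 1/d). The sketch below computes that reference distribution; the function name is illustrative, and this is the standard law rather than anything specific to the d-RNN.

```python
import math

def benford_first_digit(d):
    """Benford's law: probability that the first significant digit is d (1..9)."""
    return math.log10(1.0 + 1.0 / d)

# Reference distribution in percent, comparable to a 0 to 30 axis.
benford_percent = {d: 100.0 * benford_first_digit(d) for d in range(1, 10)}
```

The digit 1 comes out at about 30.1%, and the probabilities decrease for larger digits.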

Page 33:

Analysis: Model Predictions

[Figure: predictions of MoG, d-RNN and h-softmax for ‘… ejective fraction : ____ % ...’]

Page 34:

Analysis: Strategy Selection

Combination: softmax over strategies (MoG, d-RNN, h-softmax), computed from $h_t$

Kinds of numerals, with examples:

Small integers, percentiles, years: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’

2-digit integers, some ids: ‘measured 32 x 31 mm’, ‘NGC 6334 stars’

Reals, some ids: ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’

Page 36:

Conclusion (1)

Are existing LMs numerate?

Softmax over the vocabulary (the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM), computed from $h_t$

‘John’s height is ___ ’

[Figure: candidate numerals 999, 0, 50, 25, 2018, 3.14, 12, 3, UNKNUM]

Page 38:

Conclusion (2)

How to improve the numeracy of LMs?

combination: softmax over strategies (MoG, d-RNN, h-softmax), computed from $h_t$

‘John’s height is ___ ’

[Figure: candidate numerals 2.1, 1.73, 1.8, 2]

Page 39:

Thank you!