
Numeracy for Language Models: Evaluating and Improving their Ability to Predict Numbers

Georgios Spithourakis, Steffen Petersen, Sebastian Riedel

Machine Reading Group

[Title illustration: words (the, cat, sat, mat, dog, fox, brown, jumped, sleeping, one, two, three, four) alongside numbers and numerals (0, 1, 2, 4, −1, −17, 0.001, 1.73, 3.14, 3.14…, 2/3, 5/8, 0.999…, i, 1+2i, π, 2000, 2018) and the number sets ℕ, ℤ, ℚ, ℝ, ℂ]

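Literate Language Models

A literate language model assigns higher probability P_LM(text) to more plausible text (semantically, grammatically, etc.): ‘I eat an apple’ should score higher than ‘An apple eats me’, ‘I eats an apple’ or ‘A apple eats I’.

Numerate Language Models

A numerate language model should likewise assign P_LM(text) according to how plausible the numbers in the text are. Across ‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’ and ‘John is 999 m tall’, P_LM(text) should follow the distribution of heights, P_height (figure: a density over heights from 1.4 to 2.2 m).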

Numeracy Matters

Changing one number can change what a sentence means:
‘Unemployment of the US is 5 %’ (cf. 0.5, 50, 500)
‘Our model is 10 times better than the baseline’ (cf. 0, 0.02, 3.2, 100, 1000)
‘Patient’s temperature is 36.6 degrees’ (cf. 41.9, 98.6)

Q1: Are existing LMs numerate?

Q2: How to improve the numeracy of LMs?



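A Neural Language Model

An RNN reads the previous token w_{t-1} through its embedding e_t and updates its hidden state from h_{t-1} to h_t. The next token is predicted with a softmax over a fixed vocabulary V:

$$p(w_t \mid h_t) = \mathrm{softmax}_V(w_t)$$

V = {the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM}. Out-of-vocabulary words (petrichor, unothrorgaphy, Spithourakis) are mapped to UNK, and out-of-vocabulary numerals (2.1, 0.731, 9,846,321, 2018.3) are mapped to UNKNUM.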
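A minimal sketch of the closed-vocabulary setup described above, in plain Python. Everything here is an illustrative assumption rather than the authors' code: the toy `VOCAB`, the `NUMERAL_RE` pattern used to tell numerals from words, the helper `map_to_vocab`, and the uniform logits standing in for the projection of h_t.

```python
import math
import re

# Toy closed vocabulary from the slide (hypothetical; real models have many more types).
VOCAB = ["the", "cat", "mat", "sat", "UNK", "2", "1", "1.7", "2018", "UNKNUM"]

# Assumed numeral pattern: optional sign, digits (optionally with thousands commas),
# optional decimal part. Real tokenisers may define numerals differently.
NUMERAL_RE = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?$")


def map_to_vocab(token: str) -> str:
    """Keep in-vocabulary tokens; map OOV numerals to UNKNUM and OOV words to UNK."""
    if token in VOCAB:
        return token
    return "UNKNUM" if NUMERAL_RE.match(token) else "UNK"


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


tokens = "petrichor 2.1 0.731 9,846,321 2018.3 the 2018".split()
print([map_to_vocab(t) for t in tokens])
# ['UNK', 'UNKNUM', 'UNKNUM', 'UNKNUM', 'UNKNUM', 'the', '2018']

# p(w_t | h_t) over V; uniform logits stand in for a projection of h_t.
p = dict(zip(VOCAB, softmax([0.0] * len(VOCAB))))
print(p[map_to_vocab("2.1")])  # the model can only score 2.1 through p(UNKNUM)
```

Every distinct out-of-vocabulary numeral collapses onto the same UNKNUM entry, which is the behaviour the evaluation below has to correct for.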

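Evaluation: Adjusted Perplexity

Under ordinary perplexity, the numeral in ‘John is 2.1 m tall’ is scored as p(2.1) = p(UNKNUM). But p(UNKNUM) is really the combined mass of every out-of-vocabulary numeral, p(2.1) + p(0.731) + p(9,846,321) + ⋯, so ordinary perplexity overstates how well a closed-vocabulary model predicts 2.1.

Adjusted perplexity [Ueberla, 1994], a.k.a. unknown-penalised perplexity [Ahn et al., 2016], instead divides the UNKNUM mass evenly among the distinct out-of-vocabulary numerals found in the test data:

$$p(2.1) = \frac{p(\mathrm{UNKNUM})}{\left|\{w \in \mathrm{UNKNUM} \text{ from test data}\}\right|}$$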
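A sketch of how the adjustment changes the score, assuming the model's natural-log probabilities for the vocabulary-mapped test tokens are already available. `adjusted_perplexity` and its `oov_type_counts` argument (the number of distinct out-of-vocabulary types behind each unknown token in the test data) are hypothetical names chosen for illustration.

```python
import math


def adjusted_perplexity(log_probs, mapped_tokens, oov_type_counts):
    """Perplexity in which each unknown token's probability is split uniformly
    over the distinct OOV types it stands for in the test data.

    log_probs: natural-log probabilities assigned to each mapped test token.
    mapped_tokens: test tokens after vocabulary mapping (e.g. "UNKNUM").
    oov_type_counts: e.g. {"UNKNUM": 3}; tokens not listed are not adjusted.
    """
    if len(log_probs) != len(mapped_tokens) or not mapped_tokens:
        raise ValueError("need one log-probability per token")
    total = 0.0
    for lp, tok in zip(log_probs, mapped_tokens):
        # Dividing p(UNKNUM) by |{w in UNKNUM}| subtracts the log of that count.
        total += lp - math.log(oov_type_counts.get(tok, 1))
    return math.exp(-total / len(mapped_tokens))


# 'John is 2.1 m tall', where the test data contains three distinct OOV numerals.
oov_numerals = {"2.1", "0.731", "9,846,321"}
lp = [math.log(0.3)]  # hypothetical p(UNKNUM) = 0.3 at the position of 2.1
plain = adjusted_perplexity(lp, ["UNKNUM"], {})
adjusted = adjusted_perplexity(lp, ["UNKNUM"], {"UNKNUM": len(oov_numerals)})
print(round(plain, 2), round(adjusted, 2))  # 3.33 10.0
```

The more distinct unknown numerals the test set contains, the larger the penalty, so a model can no longer look good simply by predicting UNKNUM.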

Datasets

Clinical Dataset: 16,015 clinical patient reports (source: London Chest Hospital)

Scientific Dataset: 20,962 paragraphs from scientific papers (source: arXiv)

Token mix (pie chart): 96% words, 4% numerals

Results: Adjusted Perplexity

Softmax baseline, adjusted perplexity (lower is better):

Dataset       All tokens   Words   Numerals
Clinical      8.91         5.99    58,443.72
Scientific    80.62        51.83   3,505,856.25

[Figure: ‘softmax assumptions’ (PMF, including UNKNUM) versus ‘reality (?)’ (PDF); axes marked small and large.]
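Perplexity is the exponential of the mean negative log-likelihood, so the all-token value is a token-share-weighted geometric mean of the word and numeral values. The sketch below checks this against the softmax row, under the assumption (not stated per dataset on the slides) that the 96% / 4% word/numeral split applies to both corpora; `combined_perplexity` is a hypothetical helper.

```python
import math


def combined_perplexity(groups):
    """All-token perplexity from (token share, perplexity) pairs.

    Perplexity is exp(mean negative log-likelihood), so the all-token value is
    the share-weighted geometric mean of the per-group perplexities.
    """
    if abs(sum(share for share, _ in groups) - 1.0) > 1e-9:
        raise ValueError("token shares must sum to 1")
    return math.exp(sum(share * math.log(ppl) for share, ppl in groups))


# Assumption: the approximate 96% words / 4% numerals split holds for both datasets.
clinical = combined_perplexity([(0.96, 5.99), (0.04, 58_443.72)])
scientific = combined_perplexity([(0.96, 51.83), (0.04, 3_505_856.25)])
print(round(clinical, 2), round(scientific, 2))
```

With these inputs the sketch gives roughly 8.65 for Clinical and 80.87 for Scientific, close to the reported 8.91 and 80.62; the remaining gap is consistent with the split being only approximate. It also shows how numerals, at about 4% of tokens, inflate the all-token perplexity well above the words-only value.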



Strategy: Softmax & Hierarchical Softmax

softmax: one softmax over the whole vocabulary V = {the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM}, conditioned on h_t.

Hierarchical softmax (h-softmax): a first softmax over token types {word, numeral}, then a second softmax over either the words (the, cat, mat, sat, UNK) or the numerals (2, 1, 1.7, 2018, UNKNUM):

$$p(w_t \mid h_t) = p(\mathrm{type}(w_t) \mid h_t)\; p(w_t \mid \mathrm{type}(w_t), h_t)$$

The numeral branch, p(numeral | h_t), can use any numeral model:
• h-softmax
• digit-by-digit
• from PDF
• etc.
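A minimal two-level softmax in plain Python, assuming the type scores and per-type token scores have already been computed from h_t. The scores and the helper names `softmax` and `h_softmax` are illustrative assumptions. Here the numeral branch is a flat softmax (the h-softmax option in the list above); another numeral model would only replace the second-level factor p(w | numeral, h_t).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a dict of scores."""
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def h_softmax(type_scores, token_scores_by_type):
    """p(w | h) = p(type | h) * p(w | type, h).

    type_scores: scores for "word" and "numeral".
    token_scores_by_type: per-type dict of token scores for the second level.
    """
    p_type = softmax(type_scores)
    probs = {}
    for t, scores in token_scores_by_type.items():
        for tok, p in softmax(scores).items():
            probs[tok] = p_type[t] * p
    return probs


# Hypothetical scores standing in for linear projections of h_t.
probs = h_softmax(
    {"word": 1.0, "numeral": 0.0},
    {
        "word": {"the": 2.0, "cat": 1.0, "mat": 0.5, "sat": 0.5, "UNK": 0.0},
        "numeral": {"2": 1.0, "1": 1.0, "1.7": 0.5, "2018": 0.0, "UNKNUM": 0.0},
    },
)
p_numeral = sum(probs[t] for t in ["2", "1", "1.7", "2018", "UNKNUM"])
print(round(p_numeral, 4))  # p(numeral | h_t) from the first level
```

Separating the type decision from the token choice lets the numeral branch be swapped out without touching how words are predicted.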

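Strategy: Digit-by-Digit Composition

A digit-level RNN (d-RNN), conditioned on the language model's hidden state h_t, generates a numeral one character at a time: input SOS 2 . 1, output 2 . 1 EOS. The probability of the numeral is the product of the character probabilities:

$$p(2.1) = p(2)\; p(. \mid 2)\; p(1 \mid 2.)\; p(\mathrm{EOS} \mid 2.1)$$

Because any numeral can be spelled out this way (e.g. 1.94, 1.95, ..., 2.01, 2.02), the d-RNN needs no UNKNUM token.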
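The chain rule above can be sketched as follows. The real d-RNN computes each next-character distribution from its own hidden state, conditioned on h_t; here a hypothetical rule-based `toy_next_char_probs` stands in for it (uniform over allowed symbols, at most one '.', no EOS right after '.' or on an empty prefix), so only the factorisation, not the probabilities, mirrors the model.

```python
import math

# Output alphabet of the d-RNN as drawn on the slide: digits, '.', and EOS.
SYMBOLS = list("0123456789.") + ["EOS"]


def toy_next_char_probs(prefix: str) -> dict:
    """Hypothetical stand-in for the d-RNN softmax p(next symbol | prefix, h_t).

    Uniform over the symbols allowed after `prefix`: no '.' at the start or
    twice, and no EOS on an empty prefix or straight after '.'.
    """
    allowed = []
    for s in SYMBOLS:
        if s == "." and (not prefix or "." in prefix):
            continue
        if s == "EOS" and (not prefix or prefix.endswith(".")):
            continue
        allowed.append(s)
    return {s: 1.0 / len(allowed) for s in allowed}


def numeral_log_prob(numeral: str, next_char_probs=toy_next_char_probs) -> float:
    """log p(numeral) = sum_i log p(c_i | c_<i) + log p(EOS | numeral)."""
    log_p, prefix = 0.0, ""
    for c in list(numeral) + ["EOS"]:
        p = next_char_probs(prefix).get(c, 0.0)
        if p == 0.0:
            return float("-inf")  # symbol not allowed at this position
        log_p += math.log(p)
        prefix += "" if c == "EOS" else c
    return log_p


print(math.exp(numeral_log_prob("2.1")))  # equals 1 / 13200 up to rounding
```

For ‘2.1’ the toy model gives (1/10)·(1/12)·(1/10)·(1/11) = 1/13200, and every well-formed digit string receives non-zero probability. Thousands separators, such as the commas in 9,846,321, would need their own symbol in this toy alphabet.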

Strategy: from continuous PDF

[Figure: a probability density over numbers from 1 to 3.5 (density axis 0 to 2.5); the probability of the numeral 2.1 is the mass between 2.05 and 2.15.]

$$p(\text{numeral} = 2.1) = P_{\mathrm{PDF}}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$$

The PDF is a mixture of Gaussians (MoG): a softmax over components, conditioned on h_t, gives the mixture weights, while the component means (μs) and standard deviations (σs) are kept frozen. The precision (the number of decimal digits) is predicted by an RNN that emits one placeholder symbol d per decimal digit, followed by EOS:

$$p(\text{precision}) = p_{\mathrm{RNN}}(\underbrace{d \cdots d}_{\text{precision}}\ \mathrm{EOS})$$
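A sketch of the PDF strategy with a mixture of Gaussians, using only the standard library (`math.erf` for the Gaussian CDF). The mixture weights, the frozen means and standard deviations, and the geometric `p_precision` are hypothetical values; on the slide the weights come from a softmax over components given h_t and the precision from an RNN, which the geometric distribution merely stands in for.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def mog_interval_prob(lo, hi, weights, mus, sigmas):
    """P(lo < X < hi) when X follows a mixture of Gaussians."""
    return sum(
        w * (normal_cdf(hi, mu, s) - normal_cdf(lo, mu, s))
        for w, mu, s in zip(weights, mus, sigmas)
    )


def numeral_prob(numeral, weights, mus, sigmas, p_precision):
    """p(numeral) = P_PDF(number rounds to numeral | precision) * p(precision).

    precision = digits after the decimal point, so "2.1" covers (2.05, 2.15).
    """
    precision = len(numeral.split(".")[1]) if "." in numeral else 0
    half_width = 0.5 * 10.0 ** (-precision)
    value = float(numeral)
    interval = mog_interval_prob(value - half_width, value + half_width, weights, mus, sigmas)
    return interval * p_precision(precision)


def geometric(r):
    """Hypothetical stand-in for p(precision = r); sums to 1 over r >= 0."""
    return 0.5 ** (r + 1)


# Hypothetical frozen components; the weights would come from a softmax given h_t.
weights, mus, sigmas = [0.7, 0.3], [1.75, 2.1], [0.1, 0.3]
print(numeral_prob("2.1", weights, mus, sigmas, geometric))
```

Freezing the μs and σs leaves the language model to predict only the mixture weights, so its output layer stays the size of the number of components.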

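Overview of Strategies

• softmax: numerals from a closed vocabulary (2, 1.7, 2018, ..., UNKNUM)
• h-softmax: a softmax over token types, then over the in-vocabulary numerals
• d-RNN: numerals composed digit by digit (<SOS> 2 . 1 → 2 . 1 <EOS>)
• MoG: numerals from a continuous PDF
• combination: a softmax over strategies, conditioned on h_t, mixes h-softmax, d-RNN and MoG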
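A sketch of the combination step as a mixture: the probability of a numeral is the strategy-selector weight of each strategy times that strategy's probability, summed. The selector output and per-strategy probabilities below are hypothetical numbers, and `combine_strategies` is an illustrative helper.

```python
def combine_strategies(strategy_weights, strategy_probs):
    """p(numeral | h_t) = sum over strategies s of p(s | h_t) * p_s(numeral | h_t).

    strategy_weights: softmax over strategies given h_t (should sum to 1).
    strategy_probs: probability each strategy assigns to the same numeral.
    """
    if abs(sum(strategy_weights.values()) - 1.0) > 1e-9:
        raise ValueError("strategy weights must sum to 1")
    return sum(w * strategy_probs[s] for s, w in strategy_weights.items())


# Hypothetical selector output and per-strategy probabilities for the numeral "2.1".
weights = {"h-softmax": 0.2, "d-RNN": 0.3, "MoG": 0.5}
probs_for_2_1 = {"h-softmax": 0.01, "d-RNN": 0.02, "MoG": 0.04}
print(combine_strategies(weights, probs_for_2_1))  # about 0.028
```

Because each strategy's distribution is normalised and the selector weights sum to 1, the combination is still a proper probability distribution over numerals.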

Results: Language Modelling (1)

Clinical, adjusted perplexity (lower is better):

Model          All Tokens   Words   Numerals
softmax        8.91         5.99    58,443.72
h-softmax      6.05         4.96    495.95
d-RNN          5.88         4.95    263.22
MoG            5.88         4.99    226.46
combination    5.82         4.96    197.59

Results: Language Modelling (2)

Scientific, adjusted perplexity (lower is better):

Model          All Tokens   Words   Numerals
softmax        80.62        51.83   3,505,856.25
h-softmax      54.8         49.81   550.98
d-RNN          53.7         48.89   519.86
MoG            54.37        48.97   683.16
combination    53.03        48.25   520.95

Results: Number Prediction

Predictions are scored as numbers (the value 2.1), not as numerals (the string ‘2.1’), using the mean absolute percentage error over N test numbers (lower is better):

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|\text{prediction}_i - \text{target}_i|}{|\text{target}_i|} \times 100\%$$

[Figure: MAPE on Clinical and Scientific for the mean and median baselines and for softmax, h-softmax, d-RNN, MoG and combination.]
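A minimal MAPE implementation, with numerals first converted to numbers (the numeral ‘2.1’ becomes the number 2.1). The mean and median baselines are assumed here to predict the mean and median of hypothetical training values for every test item; `mape` is an illustrative helper, and zero targets are rejected because the percentage error is undefined for them.

```python
import statistics


def mape(predictions, targets):
    """Mean absolute percentage error (in %); lower is better."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("need equally sized, non-empty inputs")
    if any(t == 0 for t in targets):
        raise ValueError("MAPE is undefined for zero targets")
    errors = [abs(p - t) / abs(t) for p, t in zip(predictions, targets)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical data: numerals are parsed into numbers before scoring.
train = [float(s) for s in ["1", "2", "3", "100"]]
test = [float(s) for s in ["2", "4"]]

mean_pred = statistics.mean(train)      # 26.5
median_pred = statistics.median(train)  # 2.5
print(mape([mean_pred] * len(test), test))    # 893.75
print(mape([median_pred] * len(test), test))  # 31.25
```

The single outlier (100) drags the mean baseline far from typical values, one reason a median baseline can score much better under MAPE.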

Softmax versus Hierarchical Softmax

[Figure: cosine similarities between the embeddings of the numerals 1, 2, 3, 4, …, 100, 101, …, 2012, 2013, for softmax and for h-softmax.]
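The plotted quantity can be sketched as a pairwise cosine-similarity matrix over embedding vectors. The three-dimensional vectors below are hypothetical; in the talk, the rows would be the learned embeddings of numerals such as 1, 2, …, 2013 under each model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cosine_matrix(embeddings):
    """Return (names, matrix) with matrix[i][j] = cosine(names[i], names[j])."""
    names = list(embeddings)
    return names, [[cosine(embeddings[a], embeddings[b]) for b in names] for a in names]


# Hypothetical embeddings: 1 and 2 point in nearly the same direction, 2013 is orthogonal.
emb = {"1": [1.0, 0.0, 0.0], "2": [0.9, 0.1, 0.0], "2013": [0.0, 0.0, 1.0]}
names, sims = cosine_matrix(emb)
for name, row in zip(names, sims):
    print(name, [round(x, 3) for x in row])
```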

Analysis: d-RNN and Benford’s Law

[Figure: cosine similarities between the d-RNN embeddings of the digits 0-9, ‘.’ and EOS; distributions of the 1st and 4th digit of numerals, d-RNN versus Benford’s law, on the Scientific and Clinical datasets (digits 0-9 on the x axis, y axis 0 to 30).]
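For reference, Benford's law gives P(first digit = d) = log10(1 + 1/d) for d = 1..9, and for the n-th digit (n ≥ 2) P(d) = Σ log10(1 + 1/(10k + d)) over k from 10^(n−2) to 10^(n−1) − 1. The sketch below computes both reference curves the d-RNN histograms are compared against; the function names are illustrative.

```python
import math


def benford_first_digit(d: int) -> float:
    """P(first significant digit = d); zero outside 1..9."""
    return math.log10(1 + 1 / d) if 1 <= d <= 9 else 0.0


def benford_nth_digit(d: int, n: int) -> float:
    """P(n-th significant digit = d) for n >= 2 and d in 0..9."""
    if n < 2 or not 0 <= d <= 9:
        raise ValueError("need n >= 2 and 0 <= d <= 9")
    return sum(math.log10(1 + 1 / (10 * k + d)) for k in range(10 ** (n - 2), 10 ** (n - 1)))


first = [benford_first_digit(d) for d in range(10)]
fourth = [benford_nth_digit(d, 4) for d in range(10)]
print([round(100 * p, 1) for p in first])   # percentages; digit 1 is about 30.1
print([round(100 * p, 1) for p in fourth])  # close to uniform, about 10 each
```

The first digit is strongly skewed towards 1 (about 30%), while by the 4th digit the law is almost uniform (about 10% per digit), so the 1st-digit and 4th-digit panels test quite different shapes.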

Analysis: Model Predictions

[Figure: predictions of MoG, d-RNN and h-softmax for the blank in ‘… ejective fraction : ____ % ...’]

Analysis: Strategy Selection

The strategy selector of the combination model, conditioned on h_t, tends to choose:
• h-softmax: small integers, percentiles, years
• d-RNN: 2-digit integers, some ids
• MoG: reals, some ids

Example contexts: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’, ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’, ‘measured 32 x 31 mm’, ‘NGC 6334 stars’.

Conclusion (1)

Are existing LMs numerate?

[Figure: a softmax over a closed vocabulary (the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM) conditioned on h_t; asked to fill ‘John’s height is ___ ’, it can only choose among in-vocabulary numerals such as 999, 0, 50, 25, 2018, 3.14, 12 and 3, or UNKNUM.]

Conclusion (2)

How to improve the numeracy of LMs?

[Figure: h-softmax, d-RNN and MoG combined through a selector conditioned on h_t (combination); for ‘John’s height is ___ ’ it proposes values such as 2.1, 1.73, 1.8 and 2.]

Thank you!
