Numeracy for Language Models:
Evaluating and Improving their Ability to Predict Numbers
Georgios Spithourakis, Steffen Petersen, Sebastian Riedel
Machine Reading Group
[Figure: words (the, cat, sat, mat, dog, fox, brown, jumped, sleeping, one, two, three, four) versus numerals and numbers; number sets ℕ, ℤ, ℚ, ℝ, ℂ with examples such as 0, 1, 2, -17, 2000, 2/3, 5/8, 0.001, 1.73, 3.14…, π, i, 1+2i]
Literate Language Models
$P_{LM}(\text{text})$ reflects how plausible a text is (semantically, grammatically, etc.):
‘I eat an apple’, ‘An apple eats me’, ‘I eats an apple’, ‘A apple eats I’
Numerate Language Models
$P_{LM}(\text{text})$ for texts that differ only in the numeral:
‘John is 0 m tall’, ‘John is 1.7 m tall’, ‘John is 2 m tall’, ‘John is 999 m tall’
[Figure: $P(\text{height})$ over heights from 1.4 to 2.2]
Numeracy Matters
‘Unemployment of the US is 5 %’ (other candidate values: 0.5, 50, 500)
‘Our model is 10 times better than the baseline’ (other candidate values: 0, 100, 1000)
‘Patient’s temperature is 36.6 degrees’ (other candidate values: 0.0, 23.2, 41.9, 98.6)
Q1: Are existing LMs numerate?
Q2: How to improve the numeracy of LMs?
A Neural Language Model
Input: $w_{t-1}$, embedded as $e_t$
RNN: $h_{t-1} \to h_t$
Output: $p(w_t \mid h_t) = \mathrm{softmax}_V(w_t)$
Vocabulary V, words: the, cat, mat, sat, …, UNK
Vocabulary V, numerals: 2, 1, 1.7, 2018, …, UNKNUM
Out-of-vocabulary words (UNK): petrichor, unothrorgaphy, Spithourakis, …
Out-of-vocabulary numerals (UNKNUM): 2.1, 0.731, 9,846,321, 2018.3, …
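The output layer on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are made up and stand in for the output layer applied to $h_t$, and the names `VOCAB`, `NUMERAL`, `to_vocab` and `softmax` are hypothetical, not taken from the talk. Out-of-vocabulary words are mapped to UNK and out-of-vocabulary numerals to UNKNUM before the softmax probability is looked up.

```python
import math
import re

# Toy vocabulary from the slide: words and numerals plus the two unknown classes.
VOCAB = ["the", "cat", "mat", "sat", "UNK", "2", "1", "1.7", "2018", "UNKNUM"]

# Assumption: a numeral is an optionally signed digit string with optional
# thousands separators and an optional decimal part.
NUMERAL = re.compile(r"[+-]?\d[\d,]*(\.\d+)?")

def to_vocab(token):
    """Map out-of-vocabulary tokens to UNK (words) or UNKNUM (numerals)."""
    if token in VOCAB:
        return token
    return "UNKNUM" if NUMERAL.fullmatch(token) else "UNK"

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores, one per vocabulary entry.
scores = [2.0, 1.0, 0.5, 0.5, 0.1, 0.3, 0.3, 0.2, 0.2, 1.5]
probs = dict(zip(VOCAB, softmax(scores)))

for token in ["cat", "petrichor", "2.1", "9,846,321"]:
    mapped = to_vocab(token)
    print(f"{token} -> {mapped}: p = {probs[mapped]:.3f}")
```

Note that 2.1 and 9,846,321 receive the same probability here, because both are scored as UNKNUM.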
Evaluation: Adjusted Perplexity
‘John is 2.1 m tall’, where 2.1, 0.731, 9,846,321, 2018.3, … all map to UNKNUM
Perplexity: $p(2.1) = p(\mathrm{UNKNUM})$, BUT $p(\mathrm{UNKNUM}) = p(2.1) + p(0.731) + p(9{,}846{,}321) + \cdots$
Adjusted Perplexity [Ueberla, 1994], a.k.a. Unknown-Penalised Perplexity [Ahn et al., 2016]: $p(2.1) = \dfrac{p(\mathrm{UNKNUM})}{|\{w \in \mathrm{UNKNUM}\}|}$, with the UNKNUM types counted from test data
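The adjustment can be made concrete with a small sketch. It assumes a toy model whose token probabilities do not depend on context, a single out-of-vocabulary class (UNKNUM), and made-up probabilities; the function names are hypothetical. The adjusted variant divides the UNKNUM probability by the number of distinct out-of-vocabulary types found in the test data.

```python
import math

def perplexity(log_probs):
    """Perplexity from a list of natural-log token probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def token_log_probs(tokens, model_prob, vocab, adjust):
    """Log-probability of each test token.

    Out-of-vocabulary tokens are scored as UNKNUM. With adjust=True the
    UNKNUM mass is split uniformly over the distinct OOV types in `tokens`.
    """
    oov_types = {tok for tok in tokens if tok not in vocab}
    out = []
    for tok in tokens:
        if tok in vocab:
            p = model_prob[tok]
        else:
            p = model_prob["UNKNUM"]
            if adjust:
                p /= len(oov_types)
        out.append(math.log(p))
    return out

# Hypothetical context-free probabilities, for illustration only.
vocab = {"John", "is", "m", "tall"}
model_prob = {"John": 0.1, "is": 0.2, "m": 0.1, "tall": 0.1, "UNKNUM": 0.05}
test_tokens = "John is 2.1 m tall John is 0.731 m tall".split()

plain = perplexity(token_log_probs(test_tokens, model_prob, vocab, adjust=False))
adjusted = perplexity(token_log_probs(test_tokens, model_prob, vocab, adjust=True))
print(f"perplexity: {plain:.2f}, adjusted perplexity: {adjusted:.2f}")
```

The adjusted value is never lower than the plain one, since each unknown token is charged for sharing the UNKNUM mass with the other unknown types.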
Datasets
Clinical Dataset: 16,015 clinical patient reports. Source: London Chest Hospital
Scientific Dataset: 20,962 paragraphs from scientific papers. Source: arXiv
[Chart: 96% words, 4% numerals]
Results: Adjusted Perplexity
(Lower is better)
Clinical: all tokens 8.91, words 5.99, numerals 58,443.72
Scientific: all tokens 80.62, words 51.83, numerals 3,505,856.25
[Figure: softmax assumptions (PMF, with UNKNUM) versus reality (?) (PDF); regions annotated large and small]
Q2: How to improve the numeracy of LMs?
Strategy: Softmax & Hierarchical Softmax
softmax: one softmax over the whole vocabulary V given $h_t$ (the, cat, mat, sat, UNK, 2, 1, 1.7, 2018, UNKNUM)
h-softmax: a softmax over types (word or numeral) given $h_t$, followed by a softmax over words (the, cat, mat, sat, UNK) or a softmax over numerals (2, 1, 1.7, 2018, UNKNUM)
Options for $p(\text{numeral} \mid h_t)$:
• h-softmax
• digit-by-digit
• from PDF
• etc.
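A two-level softmax of this kind can be sketched as follows. The scores are made up and stand in for layers applied to $h_t$; the names `WORDS`, `NUMERALS` and `h_softmax` are hypothetical. The sketch assumes the factorisation $p(w \mid h_t) = p(\text{type} \mid h_t)\, p(w \mid \text{type}, h_t)$, which is one reading of the diagram.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

WORDS = ["the", "cat", "mat", "sat", "UNK"]
NUMERALS = ["2", "1", "1.7", "2018", "UNKNUM"]

def h_softmax(type_scores, word_scores, numeral_scores):
    """Return p(token | h_t) = p(type | h_t) * p(token | type, h_t) for all tokens."""
    p_word, p_numeral = softmax(type_scores)  # first level: word vs numeral
    probs = {}
    for tok, p in zip(WORDS, softmax(word_scores)):
        probs[tok] = p_word * p
    for tok, p in zip(NUMERALS, softmax(numeral_scores)):
        probs[tok] = p_numeral * p
    return probs

# Hypothetical scores for the type, word and numeral softmaxes.
type_scores = [2.0, -1.0]
probs = h_softmax(type_scores,
                  [1.0, 0.5, 0.2, 0.2, 0.0],
                  [0.3, 0.3, 0.1, 0.1, 1.0])
print({tok: round(p, 4) for tok, p in probs.items()})
```

Because the numeral softmax is a separate component, it can be swapped for another way of computing $p(\text{numeral} \mid h_t)$ without touching the word side.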
Strategy: Digit-by-Digit Composition
d-RNN, conditioned on $h_t$: input SOS 2 . 1, output 2 . 1 EOS
$p(2.1) = p(2)\,p(.\mid 2)\,p(1 \mid 2.)\,p(\mathrm{EOS} \mid 2.1)$
[Figure: UNKNUM alongside the numerals 1.94, 1.95, 1.96, 1.97, 1.98, 1.99, 2.00, 2.01, 2.02]
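The chain-rule factorisation can be illustrated with a stand-in for the d-RNN. In this sketch `toy_next_symbol_probs` is a hypothetical placeholder that returns a uniform distribution over the symbols allowed after a prefix; a trained d-RNN would replace it. Only the way the per-symbol probabilities are multiplied follows the slide.

```python
import math

DIGITS = list("0123456789")

def toy_next_symbol_probs(prefix):
    """Stand-in for the d-RNN: distribution over the next symbol given the prefix.

    Assumption for illustration: uniform over the symbols allowed next.
    """
    allowed = list(DIGITS)
    if prefix:                         # EOS only after at least one symbol
        allowed.append("EOS")
    if prefix and "." not in prefix:   # at most one decimal point
        allowed.append(".")
    return {s: 1.0 / len(allowed) for s in allowed}

def numeral_log_prob(numeral, next_symbol_probs=toy_next_symbol_probs):
    """log p(numeral) as the sum of log p(symbol | previous symbols), ending in EOS."""
    log_p = 0.0
    prefix = ""
    for symbol in list(numeral) + ["EOS"]:
        log_p += math.log(next_symbol_probs(prefix)[symbol])
        if symbol != "EOS":
            prefix += symbol
    return log_p

# p(2.1) = p(2) p(. | 2) p(1 | 2.) p(EOS | 2.1)
print(math.exp(numeral_log_prob("2.1")))
```

Any numeral made of these symbols gets a non-zero probability, so no separate unknown class is needed for numerals under this strategy.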
Strategy: from continuous PDF
[Figure: PDF over numbers from 1 to 3.5, density axis from 0 to 2.5]
$p(\text{numeral} = 2.1) = p_{\mathrm{PMF}}(2.05 < \text{number} < 2.15 \mid \text{precision} = 1) \times p(\text{precision} = 1)$
MoG: softmax over components given $h_t$
$p(\text{precision}) = p_{RNN}(\underbrace{dddd}_{\text{precision}}\ \mathrm{EOS})$
Frozen $\mu$s and $\sigma$s
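The interval probability in this formula can be computed from the mixture's cumulative distribution function. The sketch below assumes made-up mixture weights, means, standard deviations and precision probabilities, and all function names are hypothetical. It also assumes that the interval half-width generalises from the slide's example (2.05 to 2.15 at precision 1) to $0.5 \times 10^{-\text{precision}}$.

```python
import math

def normal_cdf(x, mu, sigma):
    """CDF of a Gaussian with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mog_interval_prob(low, high, weights, mus, sigmas):
    """P(low < number < high) under a mixture of Gaussians."""
    return sum(w * (normal_cdf(high, m, s) - normal_cdf(low, m, s))
               for w, m, s in zip(weights, mus, sigmas))

def numeral_prob(numeral, weights, mus, sigmas, precision_prob):
    """p(numeral) = P(value - eps < number < value + eps | precision) * p(precision)."""
    precision = len(numeral.split(".")[1]) if "." in numeral else 0
    value = float(numeral)
    eps = 0.5 * 10.0 ** (-precision)  # assumed rounding half-width
    interval = mog_interval_prob(value - eps, value + eps, weights, mus, sigmas)
    return interval * precision_prob[precision]

# Hypothetical parameters: mixture weights would come from a softmax over
# components given h_t; the means and deviations are fixed in advance.
WEIGHTS = [0.7, 0.3]
MUS = [1.7, 2.0]
SIGMAS = [0.1, 0.3]
PRECISION_PROB = {0: 0.2, 1: 0.5, 2: 0.3}

print(numeral_prob("2.1", WEIGHTS, MUS, SIGMAS, PRECISION_PROB))
```

Numerals close to a component mean get most of the mass, and a numeral written with more decimals covers a narrower interval.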
Overview of Strategies
h-softmax: softmax over numerals (2, 1.7, 2018, UNKNUM)
d-RNN: input <SOS> 2 . 1, output 2 . 1 <EOS>
MoG: PDF
combination: softmax over strategies given $h_t$ (h-softmax, d-RNN, MoG)
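One way to read the combination diagram is as a mixture: a softmax over strategies yields weights, and the numeral's probability is the weighted sum of the per-strategy probabilities. The sketch below assumes that reading, with made-up scores and probabilities; `combine` is a hypothetical name.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(strategy_scores, strategy_probs):
    """Mixture: sum over strategies of p(strategy | h_t) * p_strategy(numeral | h_t)."""
    weights = softmax(strategy_scores)
    return sum(w * p for w, p in zip(weights, strategy_probs))

STRATEGIES = ["h-softmax", "d-RNN", "MoG"]
# Hypothetical probabilities of the numeral '2.1' under each strategy.
probs_for_numeral = [0.001, 0.02, 0.05]
# Hypothetical strategy scores, standing in for a layer applied to h_t.
strategy_scores = [0.0, 1.0, 2.0]

print(combine(strategy_scores, probs_for_numeral))
```

The result always lies between the smallest and largest per-strategy probability, so the mixture can lean on whichever strategy suits the context.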
Results: Language Modelling (1)
Clinical, Adjusted Perplexity (lower is better)
Models: softmax, h-softmax, d-RNN, MoG, combination
All Tokens: 8.91, 6.05, 5.88, 5.88, 5.82
Words: 5.99, 4.96, 4.95, 4.99, 4.96
Numerals: 58,443.72, 495.95, 263.22, 226.46, 197.59
Results: Language Modelling (2)
Scientific, Adjusted Perplexity (lower is better)
Models: softmax, h-softmax, d-RNN, MoG, combination
All Tokens: 80.62, 54.8, 53.7, 54.37, 53.03
Words: 51.83, 49.81, 48.89, 48.97, 48.25
Numerals: 3,505,856.25, 550.98, 519.8, 683.16, 520.95
Results: Number Prediction
numeral ‘2.1’ → number 2.1
$\mathrm{MAPE} = \left|\dfrac{\text{prediction} - \text{target}}{\text{target}}\right| \times 100\%$ (lower is better)
[Figure: bar charts of MAPE for mean, median, softmax, h-softmax, d-RNN, MoG, combination. Clinical bar labels: 426, 622, 747, 514, 348, 552, plus an off-scale 2353.11. Scientific bar labels: 1947, 1652, 1287, 590, 2333, plus off-scale 8039 and 1e23.]
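The metric can be computed as below, after converting each numeral to a number. This is a minimal sketch with made-up predictions and targets; it averages the absolute percentage error over all test numerals, and the function names are hypothetical.

```python
def absolute_percentage_error(prediction, target):
    """|prediction - target| / |target| * 100. Undefined for target == 0."""
    return abs(prediction - target) / abs(target) * 100.0

def mape(predictions, targets):
    """Mean absolute percentage error over paired predictions and targets."""
    errors = [absolute_percentage_error(p, t) for p, t in zip(predictions, targets)]
    return sum(errors) / len(errors)

# Numerals are strings; number prediction compares their numeric values.
target_numerals = ["2.1", "36.6", "5"]
targets = [float(n) for n in target_numerals]
predictions = [2.0, 37.0, 50.0]  # hypothetical model outputs

print(f"MAPE = {mape(predictions, targets):.2f}%")
```

A single prediction that is off by an order of magnitude, like 50 for 5 here, dominates the average, which helps explain why the metric can reach very large values.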
Softmax versus Hierarchical Softmax
[Figure: cosine similarities for the numerals 1, 2, 3, 4, …, 100, 101, …, 2012, 2013, shown for softmax and for h-softmax]
Analysis: d-RNN and Benford’s Law
[Figure: cosine similarities for the d-RNN symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ., EOS]
[Figure: distribution over the digits 0 to 9 (axis 0 to 30), d-RNN versus Benford, for the 1st digit and the 4th digit, on the Clinical and Scientific datasets]
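For reference, the Benford side of this comparison has a closed form. Under Benford's law the first digit $d$ has probability $\log_{10}(1 + 1/d)$, and the digit at position $n \ge 2$ is obtained by summing over all possible preceding digits. The sketch below computes both; `benford_digit_prob` is a hypothetical name, and the d-RNN side of the comparison is not reproduced here.

```python
import math

def benford_digit_prob(digit, position):
    """Probability of `digit` at the given 1-based position under Benford's law."""
    if position == 1:
        # A leading digit is never 0.
        return math.log10(1.0 + 1.0 / digit) if digit != 0 else 0.0
    # Sum over every possible prefix k made of the (position - 1) leading digits.
    start, stop = 10 ** (position - 2), 10 ** (position - 1)
    return sum(math.log10(1.0 + 1.0 / (10 * k + digit)) for k in range(start, stop))

for position in (1, 4):
    row = [100.0 * benford_digit_prob(d, position) for d in range(10)]
    print(f"digit position {position}:", " ".join(f"{p:5.1f}" for p in row))
```

The first-digit distribution is strongly skewed towards 1, whereas by the fourth digit it is nearly uniform.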
Analysis: Model Predictions
[Figure: predictions of MoG, d-RNN and h-softmax for ‘… ejective fraction : ____ % ...’]
Analysis: Strategy Selection
Softmax over strategies given $h_t$: MoG, d-RNN, h-softmax
Numeral types per strategy:
• Small integers, percentiles, years
• 2-digit integers, some ids
• reals, some ids
Examples: ‘4 out of 17 segments’, ‘Enhancement > 25 %’, ‘Li et al. 2003’, ‘Ejective fraction: 27.00 %’, ‘Ejective fraction: 35.00 %’, ‘HIP 12961 and GL 676’, ‘measured 32 x 31 mm’, ‘NGC 6334 stars’
Conclusion (1)
Are existing LMs numerate?
‘John’s height is ___ ’
[Figure: softmax over the vocabulary given $h_t$ (the, cat, mat, UNK, 2, 3.14, 2018, UNKNUM), with candidate fillers 999, 0, 50, 25, 2018, 3.14, 12, 3, UNKNUM]
Conclusion (2)
How to improve the numeracy of LMs?
combination: softmax over strategies given $h_t$ (MoG, d-RNN, h-softmax)
‘John’s height is ___ ’
[Figure: candidate fillers 2.1, 1.73, 1.8, 2]
Thank you!