Stochastic Neighbor Compression
Matt J. Kusner, Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal
NN Classifier
1-nearest neighbor rule [Cover & Hart, 1967]
[figure: training inputs from class 1, class 2, class 3, and a test input]

During test time, for each test input the complexity is
$O(dn)$, with $n$ training inputs and $d$ features.

e.g. $n = 6 \times 10^4$, $d = 784$: $O(dn) \approx 47$ million.
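To make the test-time cost concrete, here is a minimal sketch of the brute-force 1-NN rule (NumPy; the function and variable names are illustrative, not from the talk): one pass over all $n$ training inputs, each comparison costing $O(d)$, hence $O(dn)$ per test input.

```python
import numpy as np

def nn_predict(X_train, y_train, x_test):
    """Brute-force 1-NN [Cover & Hart, 1967]: scan all n training
    inputs; each squared distance costs O(d), so O(dn) per test input."""
    dists = np.sum((X_train - x_test) ** 2, axis=1)  # n squared Euclidean distances
    return y_train[np.argmin(dists)]                 # label of the nearest input
```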
NN Classifier
1-nearest neighbor rule
complexity: $O(dn)$ ($n$ training inputs, $d$ features)

How can we reduce this complexity?
1. reduce n: Training Consistent Sampling [Hart, 1968] [Angiulli, 2005];
   Prototype Generation [Mollineda et al., 2002] [Bandyopadhyay & Maulik, 2002];
   Prototype Positioning [Bermejo & Cabestany, 1999] [Toussaint, 2002]
2. reduce d: Dimensionality Reduction [Hinton & Roweis, 2002] [Tenenbaum et al., 2000]
   [Weinberger et al., 2004] [van der Maaten & Hinton, 2008] [Weinberger & Saul, 2009]
3. use data structures: Tree Structures [Omohundro, 1989] [Beygelzimer et al., 2006];
   Hashing [Gionis et al., 1999] [Andoni & Indyk, 2006]

This work: Dataset Compression (a way to reduce n).
Main Idea
Learn new synthetic inputs!

training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
'compressed' data: $(z_1, y_1), \ldots, (z_m, y_m)$, with $m \ll n$

Use the 'compressed' set to make future predictions (see the sketch below).
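Prediction then runs the same 1-NN rule over the $m$ prototypes instead of the $n$ training inputs, cutting the per-test-input cost from $O(dn)$ to $O(dm)$. A tiny sketch, reusing the hypothetical nn_predict from above:

```python
# Z: m learned prototypes (m << n), y_z: their labels
# the same 1-NN rule, but O(dm) instead of O(dn) per test input
y_hat = nn_predict(Z, y_z, x_test)
```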
Approach
Steps to a new training set:
1. subsample the training inputs, keeping their labels (see the sketch below)
2. learn prototypes: move the inputs to minimize the 1-NN training error
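As promised above, a minimal sketch of step 1 under the simplest assumption, a uniform subsample that keeps each selected input's label (names are illustrative; a stratified per-class split would work just as well):

```python
import numpy as np

def subsample_init(X, y, m, seed=0):
    """Step 1: initialize the m prototypes as a random subsample of
    the training inputs, keeping their labels."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx].copy(), y[idx].copy()  # (Z0, y_z): starting prototypes
```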
Learning Prototypes
Move the inputs to minimize the 1-NN training error:

$$\min_{z_1, \ldots, z_m} \; \sum_{i=1}^{n} \big[\, \mathrm{nn}_{\{z_1, \ldots, z_m\}}(x_i) \neq y_i \,\big]$$

where $\mathrm{nn}_{\{z_1, \ldots, z_m\}}(x_i)$ is the label of the nearest neighbor to $x_i$ in the set $\{z_1, \ldots, z_m\}$, and $y_i$ is the true label.

This objective is neither continuous nor differentiable.

Stochastic Neighborhood [Hinton & Roweis, 2002]
Let $p_i$ be the probability that $x_i$ is predicted correctly:

$$p_i = \frac{1}{Z} \sum_{j : y_j = y_i} \exp\!\big(-\gamma^2 \, \|x_i - z_j\|_2^2\big),$$

where $Z$ normalizes over all prototypes $z_1, \ldots, z_m$. Then minimize the negative log-likelihood:

$$\min_{z_1, \ldots, z_m} \; \sum_{i=1}^{n} -\log(p_i)$$

use conjugate gradient descent!
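A compact sketch of this optimization, assuming NumPy/SciPy and the names from the earlier sketches (X, y, Z0, y_z are hypothetical). The talk calls for conjugate gradient on the analytic gradient; here SciPy's finite-difference CG stands in for it, which is slower but minimizes the same objective:

```python
import numpy as np
from scipy.optimize import minimize

def snc_objective(z_flat, X, y, y_z, m, d, gamma):
    """Negative log-likelihood sum_i -log(p_i), where p_i is the
    stochastic-neighborhood probability that x_i is labeled correctly."""
    Z = z_flat.reshape(m, d)
    # squared distances ||x_i - z_j||^2 for all i, j  -> shape (n, m)
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    Q = np.exp(-gamma ** 2 * D)
    Q /= Q.sum(axis=1, keepdims=True)        # normalize over all prototypes (the Z above)
    same = (y[:, None] == y_z[None, :])      # prototypes sharing x_i's label
    p = (Q * same).sum(axis=1)
    return -np.log(p + 1e-12).sum()          # small epsilon guards log(0)

def compress(X, y, Z0, y_z, gamma=1.0):
    """Move prototypes Z0 to minimize the objective with conjugate gradient."""
    m, d = Z0.shape
    res = minimize(snc_objective, Z0.ravel(), method='CG',
                   args=(X, y, y_z, m, d, gamma))
    return res.x.reshape(m, d)
```

Composed with the earlier sketch, one possible usage is `Z = compress(X, y, *subsample_init(X, y, m=50))`.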
Results [a motivating visualization]
Yale-Faces: 38 people, ~64 images/person, lighting changes
[figures: subsampled real faces vs. learned faces]

Results [datasets]
[table: per dataset, the number of training inputs, number of classes, and original & reduced features [Weinberger & Saul, 2009]]

Results [error]
[plots: test error vs. ratio of training inputs in the compressed set; curves: 1-NN on the full training set, initialization, training set consistent sampling, our method]
Results [test-time speed-up]
Complementary approaches: ball-trees and locality-sensitive hashing (LSH).
[plots: speed-up vs. compression rate; 1-NN, ball-trees, and LSH, each on the full dataset vs. after SNC]
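To illustrate how the approaches compose, a brief sketch that builds a ball-tree over the compressed set with scikit-learn (Z and y_z from the earlier sketches; X_test is hypothetical):

```python
from sklearn.neighbors import BallTree

# build the tree over the m prototypes rather than the n training
# inputs -- SNC's compression and the tree's pruning speed-ups compose
tree = BallTree(Z)
dist, idx = tree.query(X_test, k=1)   # nearest prototype for each test input
y_pred = y_z[idx.ravel()]             # its label is the prediction
```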
Summary
Stochastic Neighbor Compression
▫ learns a compressed training set for the NN classifier (original data → subsample → after optimization)
▫ builds on the Stochastic Neighborhood of [Hinton & Roweis, 2002]
▫ compression by 96% without error increase in 5/7 cases
▫ test-time speed-ups on top of NN data structures
Thank you! Questions?
Results [robustness to label noise]
[plots: test error vs. noise added to training labels; curves: our method, training set consistent sampling, and using larger k]