Stochastic Neighbor Compression
Matt J. Kusner, Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal
NN Classifier
1-nearest neighbor rule [Cover & Hart, 1967]
[figure: training inputs from class 1, class 2, class 3, and a test input]

During test time, for each test input the complexity is
$O(dn)$, with $n$ training inputs and $d$ features.

e.g. $n = 6 \times 10^4$, $d = 784$: $O(dn) \approx 47$ million.
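To make the test-time cost concrete, here is a minimal sketch of the brute-force 1-NN rule (NumPy; the function and variable names are illustrative, not from the talk): one pass over all $n$ training inputs, each comparison costing $O(d)$, hence $O(dn)$ per test input.

```python
import numpy as np

def nn_predict(X_train, y_train, x_test):
    """Brute-force 1-NN [Cover & Hart, 1967]: scan all n training
    inputs; each squared distance costs O(d), so O(dn) per test input."""
    dists = np.sum((X_train - x_test) ** 2, axis=1)  # n squared Euclidean distances
    return y_train[np.argmin(dists)]                 # label of the nearest input
```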
NN Classifier
1-nearest neighbor rule
complexity: $O(dn)$ ($n$ training inputs, $d$ features)

How can we reduce this complexity?
1. reduce n: Training Consistent Sampling [Hart, 1968] [Angiulli, 2005];
   Prototype Generation [Mollineda et al., 2002] [Bandyopadhyay & Maulik, 2002];
   Prototype Positioning [Bermejo & Cabestany, 1999] [Toussaint, 2002]
2. reduce d: Dimensionality Reduction [Hinton & Roweis, 2002] [Tenenbaum et al., 2000]
   [Weinberger et al., 2004] [van der Maaten & Hinton, 2008] [Weinberger & Saul, 2009]
3. use data structures: Tree Structures [Omohundro, 1989] [Beygelzimer et al., 2006];
   Hashing [Gionis et al., 1999] [Andoni & Indyk, 2006]

This work: Dataset Compression (a way to reduce n).
Main Idea
Learn new synthetic inputs!

training data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
'compressed' data: $(z_1, y_1), \ldots, (z_m, y_m)$, with $m \ll n$

Use the 'compressed' set to make future predictions (see the sketch below).
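Prediction then runs the same 1-NN rule over the $m$ prototypes instead of the $n$ training inputs, cutting the per-test-input cost from $O(dn)$ to $O(dm)$. A tiny sketch, reusing the hypothetical nn_predict from above:

```python
# Z: m learned prototypes (m << n), y_z: their labels
# the same 1-NN rule, but O(dm) instead of O(dn) per test input
y_hat = nn_predict(Z, y_z, x_test)
```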
Approach
Steps to a new training set:
1. subsample the training inputs, keeping their labels (see the sketch below)
2. learn prototypes: move the inputs to minimize the 1-NN training error
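As promised above, a minimal sketch of step 1 under the simplest assumption, a uniform subsample that keeps each selected input's label (names are illustrative; a stratified per-class split would work just as well):

```python
import numpy as np

def subsample_init(X, y, m, seed=0):
    """Step 1: initialize the m prototypes as a random subsample of
    the training inputs, keeping their labels."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx].copy(), y[idx].copy()  # (Z0, y_z): starting prototypes
```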
Learning Prototypes
Move the inputs to minimize the 1-NN training error:

$$\min_{z_1, \ldots, z_m} \; \sum_{i=1}^{n} \big[\, \mathrm{nn}_{\{z_1, \ldots, z_m\}}(x_i) \neq y_i \,\big]$$

where $\mathrm{nn}_{\{z_1, \ldots, z_m\}}(x_i)$ is the label of the nearest neighbor to $x_i$ in the set $\{z_1, \ldots, z_m\}$, and $y_i$ is the true label.

This objective is neither continuous nor differentiable.

Stochastic Neighborhood [Hinton & Roweis, 2002]
Let $p_i$ be the probability that $x_i$ is predicted correctly:

$$p_i = \frac{1}{Z} \sum_{j : y_j = y_i} \exp\!\big(-\gamma^2 \, \|x_i - z_j\|_2^2\big),$$

where $Z$ normalizes over all prototypes $z_1, \ldots, z_m$. Then minimize the negative log-likelihood:

$$\min_{z_1, \ldots, z_m} \; \sum_{i=1}^{n} -\log(p_i)$$

use conjugate gradient descent!
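A compact sketch of this optimization, assuming NumPy/SciPy and the names from the earlier sketches (X, y, Z0, y_z are hypothetical). The talk calls for conjugate gradient on the analytic gradient; here SciPy's finite-difference CG stands in for it, which is slower but minimizes the same objective:

```python
import numpy as np
from scipy.optimize import minimize

def snc_objective(z_flat, X, y, y_z, m, d, gamma):
    """Negative log-likelihood sum_i -log(p_i), where p_i is the
    stochastic-neighborhood probability that x_i is labeled correctly."""
    Z = z_flat.reshape(m, d)
    # squared distances ||x_i - z_j||^2 for all i, j  -> shape (n, m)
    D = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    Q = np.exp(-gamma ** 2 * D)
    Q /= Q.sum(axis=1, keepdims=True)        # normalize over all prototypes (the Z above)
    same = (y[:, None] == y_z[None, :])      # prototypes sharing x_i's label
    p = (Q * same).sum(axis=1)
    return -np.log(p + 1e-12).sum()          # small epsilon guards log(0)

def compress(X, y, Z0, y_z, gamma=1.0):
    """Move prototypes Z0 to minimize the objective with conjugate gradient."""
    m, d = Z0.shape
    res = minimize(snc_objective, Z0.ravel(), method='CG',
                   args=(X, y, y_z, m, d, gamma))
    return res.x.reshape(m, d)
```

Composed with the earlier sketch, one possible usage is `Z = compress(X, y, *subsample_init(X, y, m=50))`.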
Results [a motivating visualization]
Yale-Faces: 38 people, ~64 images/person, lighting changes
[figures: subsampled real faces vs. learned faces]

Results [datasets]
[table: per dataset, the number of training inputs, number of classes, and original & reduced features [Weinberger & Saul, 2009]]

Results [error]
[plots: test error vs. ratio of training inputs in the compressed set; curves: 1-NN on the full training set, initialization, training set consistent sampling, our method]
Results [test-time speed-up]
Complementary approaches: ball-trees and locality-sensitive hashing (LSH).
[plots: speed-up vs. compression rate; 1-NN, ball-trees, and LSH, each on the full dataset vs. after SNC]
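To illustrate how the approaches compose, a brief sketch that builds a ball-tree over the compressed set with scikit-learn (Z and y_z from the earlier sketches; X_test is hypothetical):

```python
from sklearn.neighbors import BallTree

# build the tree over the m prototypes rather than the n training
# inputs -- SNC's compression and the tree's pruning speed-ups compose
tree = BallTree(Z)
dist, idx = tree.query(X_test, k=1)   # nearest prototype for each test input
y_pred = y_z[idx.ravel()]             # its label is the prediction
```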
Summary
Stochastic Neighbor Compression
▫ learns a compressed training set for the NN classifier (original data → subsample → after optimization)
▫ builds on the Stochastic Neighborhood of [Hinton & Roweis, 2002]
▫ compression by 96% without error increase in 5/7 cases
▫ test-time speed-ups on top of NN data structures
Thank you! Questions?
Results [robustness to label noise]
[plots: test error vs. noise added to training labels; curves: our method, training set consistent sampling, and using larger k]