Machine Learning 21431 Instance Based Learning


Page 1:

Machine Learning 21431

Instance Based Learning

Page 2:

Outline

K-Nearest Neighbor Locally weighted learning Local linear models Radial basis functions

Page 3:

Literature & Software

T. Mitchell, “Machine Learning”, chapter 8, “Instance-Based Learning”
“Locally Weighted Learning”, Christopher Atkeson, Andrew Moore, Stefan Schaal, ftp://ftp.cc.gatech.edu/pub/people/cga/air.html
R. Duda et al., “Pattern Classification”, chapter 4, “Non-Parametric Techniques”
Netlab toolbox: k-nearest neighbor classification, radial basis function networks

Page 4:

When to Consider Nearest Neighbors

Instances map to points in R^N
Less than 20 attributes per instance
Lots of training data

Advantages:
Training is very fast
Can learn complex target functions
Does not lose information

Disadvantages:
Slow at query time
Easily fooled by irrelevant attributes

Page 5:

Instance Based Learning

Key idea: just store all training examples <x_i, f(x_i)>

Nearest neighbor: given a query instance x_q, first locate the nearest training example x_n, then estimate f*(x_q) = f(x_n)

K-nearest neighbor: given x_q, take a vote among its k nearest neighbors (if the target function is discrete-valued), or take the mean of the f values of the k nearest neighbors (if real-valued):

f*(x_q) = Σ_{i=1}^{k} f(x_i) / k
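The two rules above can be sketched in a few lines (a minimal illustration, not from the slides; the function name and the choice of Euclidean distance are my own):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_q, k=3, classify=True):
    """Predict f(x_q) from the k nearest training examples (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x_q, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest neighbors
    if classify:
        # discrete-valued target: majority vote among the k neighbors
        return Counter(y_train[nearest]).most_common(1)[0][0]
    # real-valued target: mean of the neighbors' f values
    return y_train[nearest].mean()
```

With k = 1 and classify=True this reduces to the plain nearest-neighbor rule f*(x_q) = f(x_n).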

Page 6:

Voronoi Diagram

[Figure: Voronoi diagram, showing a query point qf and its nearest neighbor qi]

Page 7:

3-Nearest Neighbors

[Figure: query point qf and its 3 nearest neighbors: 2 x, 1 o]

Page 8:

7-Nearest Neighbors

[Figure: query point qf and its 7 nearest neighbors: 3 x, 4 o]

Page 9:

Behavior in the Limit

Consider p(x), the probability that instance x is classified as positive (1) versus negative (0).

Nearest neighbor: as the number of training instances approaches infinity, its error approaches that of the Gibbs algorithm. Gibbs algorithm: with probability p(x) predict 1, else 0.

K-nearest neighbors: as the number of training instances approaches infinity (and k grows large), its error approaches that of the Bayes optimal classifier. Bayes optimal: if p(x) > 0.5 predict 1, else 0.

Page 10:

Nearest Neighbor (continuous)

3-nearest neighbor

Page 11:

Nearest Neighbor (continuous)

5-nearest neighbor

Page 12:

Nearest Neighbor (continuous)

1-nearest neighbor

Page 13:

Locally Weighted Regression

Forms an explicit approximation f*(x) for the region surrounding the query point x_q: fit a linear function (or a quadratic function) to the k nearest neighbors. This produces a piecewise approximation of f.

Squared error over the k nearest neighbors:
E(x_q) = Σ_{x_i ∈ k nearest neighbors} (f*(x_i) - f(x_i))^2

Distance-weighted error over all neighbors:
E(x_q) = Σ_i (f*(x_i) - f(x_i))^2 K(d(x_i, x_q))

Page 14:

Locally Weighted Regression

Regression means approximating a real-valued target function.

Residual is the error f*(x) - f(x) in approximating the target function.

Kernel function is the function of distance used to determine the weight of each training example; i.e., the kernel function is the function K such that w_i = K(d(x_i, x_q)).

Page 15:

Distance Weighted k-NN

Give more weight to neighbors closer to the query point:

f*(x_q) = Σ_{i=1}^{k} w_i f(x_i) / Σ_{i=1}^{k} w_i

where w_i = K(d(x_q, x_i)) and d(x_q, x_i) is the distance between x_q and x_i.

Instead of only the k nearest neighbors, use all training examples (Shepard's method).
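Shepard's method can be sketched as follows, using an inverse-distance kernel w_i = 1/d^p (the function name and the handling of exact matches are my own assumptions):

```python
import numpy as np

def shepard_predict(X_train, y_train, x_q, p=2.0):
    """Distance-weighted average over ALL training examples (Shepard's method).
    Weights use the inverse-distance kernel w_i = 1 / d(x_q, x_i)**p."""
    d = np.linalg.norm(X_train - x_q, axis=1)
    exact = d == 0.0
    if exact.any():                      # query coincides with a stored example
        return y_train[exact].mean()
    w = 1.0 / d ** p
    return np.sum(w * y_train) / np.sum(w)
```

Because every training example gets a nonzero weight, no choice of k is needed; far-away points simply contribute very little.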

Page 16:

Distance Weighted Average

Weighting the data:

f*(x_q) = Σ_i f(x_i) K(d(x_i, x_q)) / Σ_i K(d(x_i, x_q))

The relevance of a data point (x_i, f(x_i)) is measured by the distance d(x_i, x_q) between the query x_q and the input vector x_i.

Weighting the error criterion:

E(x_q) = Σ_i (f*(x_q) - f(x_i))^2 K(d(x_i, x_q))

The best estimate f*(x_q) minimizes the cost E(x_q); therefore ∂E(x_q)/∂f*(x_q) = 0, whose solution is exactly the weighted average above.

Page 17:

Kernel Functions

Page 18:

Distance Weighted NN

K(d(x_q, x_i)) = 1 / d(x_q, x_i)^2

Page 19:

Distance Weighted NN

K(d(x_q, x_i)) = 1 / (d_0 + d(x_q, x_i))^2

Page 20:

Distance Weighted NN

K(d(x_q, x_i)) = exp(-(d(x_q, x_i) / σ_0)^2)

Page 21:

Example: Mexican Hat

f(x_1, x_2) = sin(x_1) sin(x_2) / (x_1 x_2)

[Figure: the approximation]

Page 22:

Example: Mexican Hat

[Figure: the residual]

Page 23:

Locally Weighted Linear Regression

Local linear model: f*(x) = w_0 + Σ_n w_n x_n

Error criterion: E(x_q) = Σ_i (w_0 + Σ_n w_n x_in - f(x_i))^2 K(d(x_i, x_q))

Gradient descent: Δw_n = η Σ_i (f(x_i) - f*(x_i)) x_in K(d(x_i, x_q))

Least-squares solution: w = ((KX)^T KX)^-1 (KX)^T f(X)

with KX the N x M matrix whose rows are the vectors K(d(x_i, x_q)) x_i, and f(X) the vector whose i-th element is f(x_i).
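A sketch of the least-squares route, using the square-root-weighted-data formulation and a Gaussian kernel (both are assumptions here, as is the function name `lwlr_predict`):

```python
import numpy as np

def lwlr_predict(X_train, y_train, x_q, h=1.0):
    """Locally weighted linear regression at the query point x_q.
    Each example is weighted by K^(1/2), then ordinary least squares is
    solved on the weighted data (Gaussian kernel with bandwidth h assumed)."""
    Xb = np.hstack([np.ones((len(X_train), 1)), X_train])  # prepend bias column
    d = np.linalg.norm(X_train - x_q, axis=1)
    w = np.sqrt(np.exp(-(d / h) ** 2))                     # w_i = K(d_i)^(1/2)
    coef, *_ = np.linalg.lstsq(w[:, None] * Xb, w * y_train, rcond=None)
    return np.concatenate(([1.0], x_q)) @ coef             # evaluate local model at x_q
```

Note that a new least-squares problem is solved for every query, which is the "lazy" cost discussed later in the deck.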

Page 24:

Curse of Dimensionality

Imagine instances described by 20 attributes, but only a few are relevant to the target function.

Curse of dimensionality: nearest neighbor is easily misled when the instance space is high-dimensional.

One approach: stretch the j-th axis by weight z_j, where z_1, …, z_n are chosen to minimize prediction error. Use cross-validation to automatically choose the weights z_1, …, z_n. Note that setting z_j to zero eliminates that dimension altogether (feature subset selection).

Page 25:

Linear Global Models

The model is linear in the parameters β_k, which can be estimated by a least-squares algorithm:

f^(x_i) = Σ_{k=1}^{D} β_k x_ik, or F(X) = Xβ

where x_i = (x_1, …, x_D)_i, i = 1…N, with D the input dimension and N the number of data points.

Estimate the β_k by minimizing the error criterion

E = Σ_{i=1}^{N} (f^(x_i) - y_i)^2

Setting the gradient to zero yields the normal equations (X^T X) β = X^T F(X), hence

β = (X^T X)^-1 X^T F(X)

or, componentwise, β_k = Σ_{m=1}^{D} Σ_{n=1}^{N} (Σ_{l=1}^{D} x^T_kl x_lm)^-1 x^T_mn f(x_n)
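The normal equations can be solved directly; a minimal sketch (the function name is mine):

```python
import numpy as np

def fit_global_linear(X, y):
    """Global linear model: solve the normal equations (X^T X) beta = X^T y."""
    return np.linalg.solve(X.T @ X, X.T @ y)
```

In practice a QR- or SVD-based solver such as `np.linalg.lstsq` is numerically safer when X^T X is near-singular.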

Page 26:

Linear Regression Example

Page 27:

Linear Local Models

Estimate the parameters β_k such that they locally (near the query point x_q) match the training data, either by weighting the data:

w_i = K(d(x_i, x_q))^{1/2}

and transforming z_i = w_i x_i, v_i = w_i y_i

or by weighting the error criterion:

E = Σ_{i=1}^{N} (x_i^T β - y_i)^2 K(d(x_i, x_q))

The problem is still linear in β, with least-squares solution

β = ((WX)^T WX)^-1 (WX)^T F(X)

Page 28:

Linear Local Model Example

Query point x_q = 0.35
Kernel K(x, x_q)
Local linear model: f^(x) = b_1 x + b_0
f^(x_q) = 0.266

Page 29:

Linear Local Model Example

Page 30:

Design Issues in Local Regression

Local model order (constant, linear, quadratic)

Distance function d with feature scaling: d(x, q) = (Σ_{j=1}^{d} m_j (x_j - q_j)^2)^{1/2}
Irrelevant dimensions get m_j = 0

Kernel function K

Smoothing parameter: bandwidth h in K(d(x, q)/h)
h = |m|: global bandwidth
h = distance to the k-th nearest neighbor point
h = h(q): depending on the query point
h = h_i: depending on the stored data points

See paper by Atkeson [1996] ”Locally Weighted Learning”

Page 31:

Radial Basis Function Network

Global approximation to the target function as a linear combination of local approximations.

Used, e.g., for image classification.

Similar to a back-propagation neural network, but the activation function is Gaussian rather than sigmoid.

Closely related to distance-weighted regression, but "eager" instead of "lazy".

Page 32:

Radial Basis Function Network

input layer x_i → kernel functions K_n → output f(x)

K_n(d(x_n, x)) = exp(-d(x_n, x)^2 / (2σ_n^2))

w_n: linear parameters

f(x) = w_0 + Σ_{n=1}^{k} w_n K_n(d(x_n, x))
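The network output f(x) = w_0 + Σ_n w_n K_n(d(x_n, x)) can be sketched as (the function name and array layout are my assumptions):

```python
import numpy as np

def rbf_forward(x, centers, sigmas, w0, w):
    """RBF network output: f(x) = w0 + sum_n w_n * exp(-d(x_n, x)^2 / (2 sigma_n^2)).
    centers is a (k, D) array of kernel centers x_n, sigmas their widths."""
    d2 = np.sum((centers - x) ** 2, axis=1)          # squared distance to each center
    return w0 + w @ np.exp(-d2 / (2 * sigmas ** 2))  # weighted sum of Gaussian kernels
```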

Page 33:

Training Radial Basis Function Networks

How to choose the center x_n for each kernel function K_n?
Scatter them uniformly across the instance space, or use the distribution of the training instances (clustering).

How to train the weights?
Choose the mean x_n and variance σ_n^2 for each K_n by non-linear optimization or EM.
Hold the K_n fixed and use local linear regression to compute the optimal weights w_n.
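With the kernels held fixed, fitting the output weights reduces to ordinary linear least squares on the kernel activations. A sketch under the simplifying assumption of one shared width σ (the per-kernel σ_n of the slides, collapsed to a scalar; function name mine):

```python
import numpy as np

def train_rbf_weights(X, y, centers, sigma):
    """Hold the kernels K_n fixed and fit w0, w_n by linear least squares."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2 * sigma ** 2))             # N x k kernel activations
    Phi = np.hstack([np.ones((len(X), 1)), Phi])     # bias column for w0
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w                                         # w[0] = w0, w[1:] = w_n
```

This is the "eager" step: one global fit up front, after which queries only require the cheap forward pass.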

Page 34:

Radial Basis Network Example

K_1(d(x_1, x)) = exp(-d(x_1, x)^2 / (2σ^2))

Local linear models: w_1 x + w_0 and w_3 x + w_2

f^(x) = K_1 (w_1 x + w_0) + K_2 (w_3 x + w_2)

Page 35:

Lazy and Eager Learning

Lazy: wait for the query before generalizing (k-nearest neighbors, weighted linear regression).

Eager: generalize before seeing the query (radial basis function networks, decision trees, back-propagation, LOLIMOT).

An eager learner must create a global approximation; a lazy learner can create local approximations. Even if they use the same hypothesis space (e.g. H = linear functions), the lazy learner can represent more complex functions.

Page 36:

Laboration 3

Distance-weighted average
Cross-validation for the optimal kernel width
Leave-one-out cross-validation:

f*(x_q) = Σ_{i ≠ q} f(x_i) K(d(x_i, x_q)) / Σ_{i ≠ q} K(d(x_i, x_q))

Cross-validation for feature subset selection
Neural network
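The leave-one-out procedure for choosing the kernel width might be sketched as follows (Gaussian kernel and function names are my assumptions):

```python
import numpy as np

def loo_error(X, y, h):
    """Leave-one-out squared error of the distance-weighted average for width h:
    each point is predicted from all OTHER points, per the formula above."""
    err = 0.0
    for q in range(len(X)):
        d = np.linalg.norm(X - X[q], axis=1)
        K = np.exp(-(d / h) ** 2)
        K[q] = 0.0                                   # exclude the held-out point
        err += (np.sum(K * y) / np.sum(K) - y[q]) ** 2
    return err / len(X)

def best_bandwidth(X, y, candidates):
    """Pick the kernel width with the smallest leave-one-out error."""
    return min(candidates, key=lambda h: loo_error(X, y, h))
```

Very large widths flatten the predictor toward the global mean, while very small widths reduce it to (exclusive) nearest-neighbor averaging; the cross-validated width sits in between.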