58
軟軟軟軟軟軟軟 1 Cross distances Exclusive memberships Clustering An iterative approach Lecture 8 K-means for clustering

Lecture 8 K-means for clustering

  • Upload
    pahana

  • View
    61

  • Download
    0

Embed Size (px)

DESCRIPTION

Lecture 8 K-means for clustering. Cross distances Exclusive memberships Clustering An iterative approach. Flow chart: move n disks from tower a to c. function HANOI(n,a,b,c). n==1. Y. %Move n-1disks from a to b HANOI(n-1,a,c,b);. fprintf('%d -> %d\n',a,c);. return. - PowerPoint PPT Presentation

Citation preview

Page 1: Lecture 8 K-means for clustering

軟體實作與計算實驗 1

Cross distancesExclusive memberships Clustering

An iterative approach

Lecture 8 K-means for clustering

Page 2: Lecture 8 K-means for clustering

軟體實作與計算實驗 2

Flow chart: move n disks from tower a to c

function HANOI(n,a,b,c)

n==1

fprintf('%d -> %d\n',a,c);%Move n-1disks from a to b HANOI(n-1,a,c,b);%Move 1 disk from a to cHANOI(1,a,b,c);

%Move n-1 disks from b to cHANOI(n-1,b,a,c);

return

return

Y

Page 3: Lecture 8 K-means for clustering

軟體實作與計算實驗 3

Two clusters

Center 1

Center 2

Page 4: Lecture 8 K-means for clustering

軟體實作與計算實驗 4

Cross distances

How to find distances between centers and given points?

Page 5: Lecture 8 K-means for clustering

軟體實作與計算實驗 5

Calculation of Cross distances

Given N points X: Nx2M centers Y: Mx2 D: NxMD(i,j) denotes the distance between X(i,:)

and Y(j,:)Given X and Y, find D

Page 6: Lecture 8 K-means for clustering

軟體實作與計算實驗 6

It needs to calculate cross distances between N points and M centers to determine memberships of N points

Page 7: Lecture 8 K-means for clustering

軟體實作與計算實驗 7

Nested loops for cross distances

for n=1:N

for m=1:M

D(n,m)=sum((X(n,:) –Y(m,:)).^2)

Page 8: Lecture 8 K-means for clustering

軟體實作與計算實驗 8

Matlab codes for nested codes

for i=1:N for j=1:M dd=X(i,:)-Y(j,:); D(i,j)=sqrt(sum(dd.^2)); endend

Page 9: Lecture 8 K-means for clustering

軟體實作與計算實驗 9

Straightforward implementationNested looping

A loop within a loopMN calculations of the distance between a

point and a centerTime consuming for large M,N and d

Page 10: Lecture 8 K-means for clustering

軟體實作與計算實驗 10

Vector codes

How to calculate cross distances without using for-looping or while-looping ?

Vector codes are loop-freeVector codes for cross distances can

significantly improve efficiency against nested looping in computation

Page 11: Lecture 8 K-means for clustering

軟體實作與計算實驗 11

ijijij

Tjj

Tji

Tii

Tj

Tijiij

CBA

D

2

2

))((

yyyxxx

yxyx

D : cross distances between N points and M centersMatrix D is decomposed to matrices A, B and CA : elements in a row are identical B : multiplication of matrix X and transpose of matrix YC : elements in a column are identical

Page 12: Lecture 8 K-means for clustering

軟體實作與計算實驗 12

Performance comparison

demo_distance2.m

Page 13: Lecture 8 K-means for clustering

軟體實作與計算實驗 13

ijijij

Tjj

Tji

Tii

Tj

Tijiij

CBA

D

2

2

))((

yyyxxx

yxyx

M=size(Y,1);N=size(X,1);A=sum(X.^2,2)*ones(1,M);C=ones(N,1)*sum(Y.^2,2)';B=X*Y';D=sqrt(A-2*B+C);

Page 14: Lecture 8 K-means for clustering

軟體實作與計算實驗 14

Partition to two regions

Each point has its exclusive membership to non-overlapping regions partitioned by two centers

y1y2

R1R2

Page 15: Lecture 8 K-means for clustering

軟體實作與計算實驗 15

Step A: calculate cross distancesStep B: determine membershipsStep C: determine K means

Page 16: Lecture 8 K-means for clustering

軟體實作與計算實驗 16

Exclusive membership

y1 and y2 denote two centers

R1 and R2 denote two regions partitioned by y1 and y2

A point belongs to R1 if it is closer to y1

A point belongs to R2 if it is closer to y2

Page 17: Lecture 8 K-means for clustering

軟體實作與計算實驗 17

Three clusters

y1

y2

y3

R1

R2

R3

Page 18: Lecture 8 K-means for clustering

軟體實作與計算實驗 18

K clusters

Locating K centersSignificant geometric features of points

in Rd

Page 19: Lecture 8 K-means for clustering

軟體實作與計算實驗 19

Exclusive membership

y1 , y2 ,…, yK denote K distinct centers

R1 , R2 ,…, RK denote K regions partitioned by y1 , y2 ,…, yK

A point belongs to Ri if it is closest to yi

among K centers

Page 20: Lecture 8 K-means for clustering

軟體實作與計算實驗 20

Let D denote cross distances between N given points and M centers

Find points nearest to the jth center

Page 21: Lecture 8 K-means for clustering

軟體實作與計算實驗 21

Point x(i,: ) belongs the jth cluster if

),(),D( min kiDjik

Page 22: Lecture 8 K-means for clustering

軟體實作與計算實驗 22

Exclusive memberships (step B)

Given D, exclusive memberships v of N points can be determined by

[xx v]=min(D');

Page 23: Lecture 8 K-means for clustering

軟體實作與計算實驗 23

Updating K-means (step C)

Determine who belong the jth cluster

Determine their mean

ind=find(v == j);

Y(j,:) = mean(X(ind,:))

Page 24: Lecture 8 K-means for clustering

軟體實作與計算實驗 24

Clustering

Where are K centers ?

Page 25: Lecture 8 K-means for clustering

軟體實作與計算實驗 25

K-means

A popular heuristic approach for clustering analysis

An iterative approachStep A : cross distance DStep B : exclusive memberships vStep C : updating centers Y

Page 26: Lecture 8 K-means for clustering

軟體實作與計算實驗 26

Data and MATLAB codes

data_9.zipdemo_kmeans.m

load data_9.mat plot(X(:,1),X(:,2),'.'); [cidx, Y] = kmeans(X,10); hold on; plot(Y(:,1),Y(:,2),'ro');

Page 27: Lecture 8 K-means for clustering

軟體實作與計算實驗 27

Data Clustering

[cidx, Y] = kmeans(X,10);

• Partition given data to K clusters• The K-means algorithm aims to find means (centers) of K clusters

Page 28: Lecture 8 K-means for clustering

軟體實作與計算實驗 28

Memberships

Black points belong to the cluster centered at black circle

Page 29: Lecture 8 K-means for clustering

軟體實作與計算實驗 29

A math tool for data clustering

ClusteringTest.rar

Page 30: Lecture 8 K-means for clustering

軟體實作與計算實驗 30

Steps for data clustering:

NewPenDataEnter MKMEANS

Page 31: Lecture 8 K-means for clustering

軟體實作與計算實驗 31

Page 32: Lecture 8 K-means for clustering

軟體實作與計算實驗 32

Page 33: Lecture 8 K-means for clustering

軟體實作與計算實驗 33

Main steps of K-means clustering

Iterative execution of steps A, B and C until K means convergeA : cross distances DB : exclusive memberships vC : update K means Y

Page 34: Lecture 8 K-means for clustering

軟體實作與計算實驗 34

Step A: Calculation of Cross distances

X: Nx2Y: Mx2D: NxMD(i,j) denotes the distance between X(i,:)

and Y(j,:)Given X and Y, find D

Page 35: Lecture 8 K-means for clustering

軟體實作與計算實驗 35

[cidx, Y] = kmeans(X,10);

M=size(Y,1);N=size(X,1);A=sum(X.^2,2)*ones(1,M);C=ones(N,1)*sum(Y.^2,2)';B=X*Y';D=sqrt(A-2*B+C);

CrossDistances

Page 36: Lecture 8 K-means for clustering

軟體實作與計算實驗 36

Membership

X(:,i) belongs a cluster

Page 37: Lecture 8 K-means for clustering

軟體實作與計算實驗 37

Exclusive memberships (step B)

Given D, exclusive memberships v of N points can be determined by

[xx v]=min(D');

Page 38: Lecture 8 K-means for clustering

軟體實作與計算實驗 38

Updating K-means (step C)

ind=find(v == j);

Y(j,:) = mean(X(ind,:))

Page 39: Lecture 8 K-means for clustering

軟體實作與計算實驗 39

Initialize K centers randomly, Y change = 1; ep = 10.^-6;

change < epslon

T

Calculate change

Step A: find cross distances DStep B: exclusive memberships vStep C: updating Y

Page 40: Lecture 8 K-means for clustering

軟體實作與計算實驗 40

Initialization

Calculate the mean of N points mean_x=mean(X)Y=rand(M,2)*0.1-0.05+mean_x

Page 41: Lecture 8 K-means for clustering

軟體實作與計算實驗 41

Step A: Cross distances

D = cross_distances(X,Y)

Page 42: Lecture 8 K-means for clustering

軟體實作與計算實驗 42

Step B: Exclusive memberships v

[xx v]=min(D');

Page 43: Lecture 8 K-means for clustering

軟體實作與計算實驗 43

Step C: Update centers

for i=1:K

ind=find(v == i);Y_new(i,:) = mean(X( ind,:));

Page 44: Lecture 8 K-means for clustering

軟體實作與計算實驗 44

Partition N points to K clusters

D = cross_distances(X,Y)[xx v]=min(D');

% for i=1:K

ind=find(v == i);Y_new(i,:) = mean(X( ind,:));

end

Page 45: Lecture 8 K-means for clustering

軟體實作與計算實驗 45

Halting

change = mean(mean(abs(Y-Y_new)))Halting condition

chang < ep

Page 46: Lecture 8 K-means for clustering

軟體實作與計算實驗 46

my_kmeans

my_kmeans.m

load data_9.mat>> plot(X(:,1),X(:,2),'.');>> Y=my_kmeans(X,10);>> hold on>> plot(Y(:,1),Y(:,2),'or');

Page 47: Lecture 8 K-means for clustering

軟體實作與計算實驗 47

ClusteringTest2.figClusteringTest2.m

A version that is able to show stepwise execution of partitioning and updating of the kmeans algorithm

Page 48: Lecture 8 K-means for clustering

軟體實作與計算實驗 48

Page 49: Lecture 8 K-means for clustering

軟體實作與計算實驗 49

Initialization

X =

39 78 39 77 43 40 40 47 38 50 72 56 83 59 39 50

Y =

49.0603 57.2121 49.1061 57.2084

Page 50: Lecture 8 K-means for clustering

軟體實作與計算實驗 50

Cross distanceX =

39 78 39 77 43 40 40 47 38 50 72 56 83 59 39 50

Y =

49.0603 57.2121 49.1061 57.2084

D = cross_distances(X,Y);

D =

23.0943 23.1176 22.1984 22.2226 18.2478 18.2596 13.6519 13.6797 13.2039 13.2404 22.9717 22.9257 33.9868 33.9412 12.3783 12.4135

[xx v]=min(D');

v =

1 1 1 1 1 2 2 1

Page 51: Lecture 8 K-means for clustering

軟體實作與計算實驗 51

Partition & Update

ind =

1 1 1 1 1 2 2 1

for i=1:M ind=find(v == i); Y_new(i,:) =mean(X(ind,:)); end

Page 52: Lecture 8 K-means for clustering

軟體實作與計算實驗 52

Initialize K centers randomly, Y change = 1; ep = 10.^-6;

change < epslon

T

Calculate change

function Y=my_kmeans(X,M)

D = cross_distances(X,Y);[xx v]=min(D'); for i=1:M ind=find(v == i); Y_new(i,:) =mean(X(ind,:)); end

change = mean(mean(abs(Y-Y_new)));

Y=Y_new;

Step A: find cross distances DStep B: exclusive memberships vStep C: updating Y

Page 53: Lecture 8 K-means for clustering

軟體實作與計算實驗 53

k-means clustering - Wikipedia

Page 54: Lecture 8 K-means for clustering

軟體實作與計算實驗 54

Description

Page 55: Lecture 8 K-means for clustering

軟體實作與計算實驗 55

History

Page 56: Lecture 8 K-means for clustering

軟體實作與計算實驗 56

Standard Algorithm

Page 57: Lecture 8 K-means for clustering

軟體實作與計算實驗 57

Page 58: Lecture 8 K-means for clustering

軟體實作與計算實驗 58

Advanced topics based on K-means

ClassificationFunction approximationDensity estimation