
Spectral Clustering


Page 1: Spectral Clustering

Spectral Clustering

Advisor: 王聖智 (S. J. Wang)
Student: 羅介暐 (Jie-Wei Luo)

Page 2: Spectral Clustering

Outline

• Motivation

• Graph overview

• Spectral Clustering

• Another point of view

• Conclusion

Page 3: Spectral Clustering

Motivation

[Scatter plot of a 2D data set (axes from −2 to 2) exhibiting complex cluster shapes]

⇒ K-means performs very poorly on this data because the dataset exhibits complex cluster shapes.

Page 4: Spectral Clustering

K-means

Page 5: Spectral Clustering

Spectral Clustering

[Scatter plot of a 2D data set, clustered with k-means vs. spectral clustering]

U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, Germany, 2007.

Page 6: Spectral Clustering

Graph overview

Graph Partitioning

Graph notation

Graph Cut

Distance and Similarity

Page 7: Spectral Clustering

Graph partitioning

First: graph representation of the data

Then: graph partitioning

In this talk: mainly how to find a good partitioning of a given graph using the spectral properties of that graph.

Page 8: Spectral Clustering

Graph notation

Note: (1) sij ≥ 0; (2) sij = sji

We always assume that the similarities sij are symmetric and non-negative; the graph is then undirected and can be weighted.

Page 9: Spectral Clustering

Graph notation

Degree of a vertex $v_i \in V$:

$$d_i = \sum_{j=1}^{n} w_{ij}$$

Volume of a subset $A \subseteq V$:

$$\mathrm{vol}(A) = \sum_{i \in A} d_i$$

Cut between subsets $A$ and $B$:

$$\mathrm{cut}(A, B) = \sum_{i \in A,\; j \in B} w_{ij}$$

Page 10: Spectral Clustering

Graph Cuts

Mincut: minimize cut(A1, A2)

However, mincut often simply separates one individual vertex from the rest of the graph.

Balanced cuts normalize the cut by the size of the clusters (see the definitions below).

Problem: finding an optimal balanced (normalized) cut is NP-hard.
Approximation: spectral graph partitioning.
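The balanced-cut objectives meant here are the standard ones from the cited von Luxburg tutorial, where $|A|$ denotes the number of vertices in $A$:

$$\mathrm{RatioCut}(A, B) = \frac{\mathrm{cut}(A,B)}{|A|} + \frac{\mathrm{cut}(A,B)}{|B|}, \qquad \mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(A)} + \frac{\mathrm{cut}(A,B)}{\mathrm{vol}(B)}$$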

Page 11: Spectral Clustering
Page 12: Spectral Clustering

Page 13: Spectral Clustering

Page 14: Spectral Clustering

Page 15: Spectral Clustering
Page 16: Spectral Clustering

Spectral Clustering

• Unnormalized graph Laplacian

• Normalized graph Laplacian

• Other explanation

• Example

Page 17: Spectral Clustering

Spectral clustering: main algorithm

Input: similarity matrix S, number k of clusters to construct
• Build a similarity graph
• Compute the first k eigenvectors v1, . . . , vk of the problem matrix:
  L for unnormalized spectral clustering, Lrw for normalized spectral clustering
• Build the matrix V ∈ Rn×k with the eigenvectors as columns
• Interpret the rows of V as new data points zi ∈ Rk
• Cluster the points zi with the k-means algorithm in Rk
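A minimal runnable sketch of the unnormalized variant of this algorithm, assuming NumPy and scikit-learn are available; the function name, the kNN graph construction, and the parameter defaults are illustrative choices, not from the slides:

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def spectral_clustering(X, k, n_neighbors=10):
    # Build a symmetric kNN similarity graph; W is the adjacency matrix.
    W = kneighbors_graph(X, n_neighbors, mode="connectivity").toarray()
    W = np.maximum(W, W.T)                  # enforce s_ij = s_ji
    D = np.diag(W.sum(axis=1))              # degree matrix
    L = D - W                               # unnormalized graph Laplacian
    # First k eigenvectors of L (eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(L)
    V = eigvecs[:, :k]                      # rows of V are the new points z_i in R^k
    # Cluster the embedded points with k-means.
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```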

Page 18: Spectral Clustering

Example-1

Similarity graph: vertices {1, 2, 3, 4, 5}; edges form a triangle on {1, 2, 3} and a single edge between 4 and 5.

W: adjacency matrix

0 1 1 0 0
1 0 1 0 0
1 1 0 0 0
0 0 0 0 1
0 0 0 1 0

D: degree matrix

2 0 0 0 0
0 2 0 0 0
0 0 2 0 0
0 0 0 1 0
0 0 0 0 1

L: Laplacian matrix (L = D − W)

 2 -1 -1  0  0
-1  2 -1  0  0
-1 -1  2  0  0
 0  0  0  1 -1
 0  0  0 -1  1

Page 19: Spectral Clustering

Example-1

Similarity graph: same as before (triangle on {1, 2, 3}, edge {4, 5}).

L: Laplacian matrix

 2 -1 -1  0  0
-1  2 -1  0  0
-1 -1  2  0  0
 0  0  0  1 -1
 0  0  0 -1  1

First two eigenvectors, both with eigenvalue 0:

v1 v2
 1  0
 1  0
 1  0
 0  1
 0  1

Double zero eigenvalue ⇔ two connected components.

Page 20: Spectral Clustering

Example-1

Similarity graph: same as before. The first k eigenvectors define the new clustering space: row i of V = [v1 v2] becomes the point yi ∈ Rk.

     v1 v2
y1:   1  0
y2:   1  0
y3:   1  0
y4:   0  1
y5:   0  1

Use k-means clustering in the new space: y1, y2, y3 coincide at (1, 0) and y4, y5 at (0, 1), so the two components are recovered trivially.
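A minimal numeric check of this example, assuming NumPy is available; variable names are just for illustration:

```python
import numpy as np

# Adjacency matrix W of Example-1: triangle {1,2,3} plus edge {4,5}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))   # degree matrix
L = D - W                    # Laplacian

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 6))  # [0. 0. 2. 3. 3.] -- eigenvalue 0 has multiplicity 2
V = eigvecs[:, :2]           # rows are the embedded points y_i
# Rows 1-3 are identical and rows 4-5 are identical (up to the choice of
# basis within the 0-eigenspace), so k-means separates the components trivially.
print(np.round(V, 3))
```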

Page 21: Spectral Clustering

Unnormalized graph Laplacian

Defined as L = D − W

proof
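The standard property proved for L = D − W (see the cited von Luxburg tutorial) is the quadratic-form identity, which shows that L is symmetric and positive semi-definite:

$$f^{\top} L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij}\,(f_i - f_j)^2 \qquad \text{for all } f \in \mathbb{R}^n$$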

Page 22: Spectral Clustering

Unnormalized graph Laplacian

proof

Relation between spectrum and clusters: the multiplicity k of the eigenvalue 0 equals the number k of connected components A1, ..., Ak of the graph. The corresponding eigenspace is spanned by the indicator vectors 1_{A1}, ..., 1_{Ak} of those components (so all of these eigenvectors are piecewise constant).

Page 23: Spectral Clustering

Unnormalized graph Laplacian

Interpreting sij = 1 / d(xi, xj)^2, the graph Laplacian looks like a discrete version of the standard Laplace operator.

Page 24: Spectral Clustering

Normalized graph Laplacian

Define
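These are the standard definitions from the cited von Luxburg tutorial:

$$L_{\mathrm{sym}} := D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}, \qquad L_{\mathrm{rw}} := D^{-1} L = I - D^{-1} W$$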

Relation between Lsym and Lrw:

              Lsym         Lrw
Eigenvalue    λ            λ
Eigenvector   D^{1/2} u    u

That is, λ is an eigenvalue of Lrw with eigenvector u if and only if λ is an eigenvalue of Lsym with eigenvector D^{1/2} u.

Page 25: Spectral Clustering

Normalized graph Laplacian

Spectral properties similar to L:
• Positive semi-definite; smallest eigenvalue is 0.
• Attention: for Lrw the eigenspace of 0 is spanned by the indicators 1_{Ai} (piecewise constant), but for Lsym it is spanned by D^{1/2} 1_{Ai} (not piecewise constant).

Page 26: Spectral Clustering

Random walk explanation

General observation:
• A random walk on the graph has transition matrix P = D^{-1} W.
• Note that Lrw = I − P.

Specific observation about Ncut:
• Define P(A|B) as the probability of jumping from B to A, assuming the random walk starts in the stationary distribution.
• Then: Ncut(A, B) = P(A|B) + P(B|A).
• Interpretation: spectral clustering tries to construct groups such that a random walk stays as long as possible within the same group.
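A quick numeric check of the general observation, assuming NumPy and reusing the Example-1 graph from the earlier slides:

```python
import numpy as np

W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))

P = np.linalg.inv(D) @ W                 # transition matrix P = D^-1 W
L_rw = np.linalg.inv(D) @ (D - W)        # normalized Laplacian L_rw = D^-1 L
print(np.allclose(L_rw, np.eye(5) - P))  # True: L_rw = I - P
```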

Page 27: Spectral Clustering

Possible Explanations

Page 28: Spectral Clustering

Example-2

[Left: scatter plot of the original 2D data. Right: the same points plotted in the space spanned by the two leading eigenvectors.]

In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.

Page 29: Spectral Clustering

Example-3

In the embedded space given by the three leading eigenvectors, the clusters are trivial to separate.

Page 30: Spectral Clustering

Another point of view

Page 31: Spectral Clustering
Page 32: Spectral Clustering
Page 33: Spectral Clustering

Connections

Page 34: Spectral Clustering

PCA

Linear combination of the original data xi to new variables zi

Page 35: Spectral Clustering

Dimensionality-reduction comparison between PCA and Laplacian Eigenmaps

PCA reduces dimension by taking linear combinations of the original features; it minimizes the reconstruction error, but that is not necessarily helpful for separating clusters. Spectral clustering reduces dimension nonlinearly in a way that does help clustering; however, it has no explicit dimension-reducing map to apply to new data, whereas PCA does.
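A minimal sketch of this contrast, assuming scikit-learn; the data here is random and purely illustrative. PCA exposes an explicit linear map (transform) usable on new points, while scikit-learn's spectral (Laplacian eigenmap) embedding only embeds the points it was fitted on:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
X_train = rng.random((100, 5))
X_new = rng.random((10, 5))

pca = PCA(n_components=2).fit(X_train)
Z_new = pca.transform(X_new)          # explicit linear map: works on new data

emb = SpectralEmbedding(n_components=2).fit(X_train)
Z_train = emb.embedding_              # embedding of the training points only
# SpectralEmbedding has no transform() method: there is no explicit
# dimension-reducing function to apply to unseen points.
```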

Page 36: Spectral Clustering

Conclusion

Why is spectral clustering useful?
• Does not make strong assumptions on cluster shape
• Is simple to implement (solving an eigenproblem)
• The relaxed spectral clustering objective has no local optima
• Has several different derivations
• Successful in many applications

What are potential problems?
• Can be sensitive to the choice of parameters (e.g., k in the kNN graph)
• Computationally expensive on large, non-sparse graphs