Spectral Clustering
Advisor: S. J. Wang (王聖智)    Student: Jie-Wei Luo (羅介暐)
Outline
• Motivation
• Graph overview
• Spectral Clustering
• Another point of view
• Conclusion
Motivation
[Figure: scatter plot of a 2D data set; both axes range from -2 to 2]
⇒ K-means performs very poorly in this space because the dataset exhibits complex cluster shapes.
[Figure: the same 2D data set clustered by K-means (left) and by spectral clustering (right)]
U. von Luxburg. A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, Germany, 2007.
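As a concrete illustration of this contrast, a minimal scikit-learn sketch on a two-ring dataset (the dataset generator and all parameter values are assumptions, not from the slides):

```python
# K-means vs. spectral clustering on two concentric rings.
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

# A cluster shape K-means cannot separate with straight boundaries.
X, _ = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# K-means splits each ring across its centroid boundary.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Spectral clustering on a nearest-neighbor similarity graph follows the rings.
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)
```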
Graph overview
• Graph partitioning
• Graph notation
• Graph cut
• Distance and similarity
Graph partitioning
First: graph representation of the data
Then: graph partitioning
In this talk: mainly how to find a good partitioning of a given graph using the spectral properties of that graph
Graph notation
Note: 1. $s_{ij} \geq 0$   2. $s_{ij} = s_{ji}$
Always assume that the similarities $s_{ij}$ are symmetric and non-negative; the graph is then undirected and can be weighted.
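For concreteness, one standard way to build such a graph is a k-nearest-neighbor graph with Gaussian weights. A minimal sketch (the function name, k, and σ are illustrative assumptions, not from the slides):

```python
# Sketch: symmetric kNN similarity graph with Gaussian weights.
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_similarity_graph(X, k=10, sigma=1.0):
    # Sparse matrix of distances to each point's k nearest neighbors.
    dist = kneighbors_graph(X, n_neighbors=k, mode="distance").toarray()
    # Gaussian similarity on existing edges, 0 elsewhere.
    W = np.where(dist > 0, np.exp(-dist**2 / (2 * sigma**2)), 0.0)
    # Symmetrize so that w_ij = w_ji >= 0, as assumed above.
    return np.maximum(W, W.T)
```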
Graph notation
Degree of vertex $v_i \in V$:  $d_i = \sum_{j=1}^{n} w_{ij}$
Volume of a vertex set $A$:  $\mathrm{vol}(A) = \sum_{i \in A} d_i$
Cut between vertex sets $A$ and $B$:  $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$
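These quantities are direct sums over the weight matrix; a small numpy sketch (function and variable names are mine):

```python
import numpy as np

def degrees(W):
    # d_i = sum_j w_ij
    return W.sum(axis=1)

def vol(W, A):
    # vol(A) = sum of the degrees of the vertices in A
    return degrees(W)[A].sum()

def cut(W, A, B):
    # cut(A, B) = total weight of edges from A to B
    return W[np.ix_(A, B)].sum()
```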
Graph Cuts
Mincut: minimize $\mathrm{cut}(A_1, A_2)$
However, mincut often simply separates one individual vertex from the rest of the graph.
Balanced cut, e.g. the normalized cut $\mathrm{Ncut}(A, B) = \mathrm{cut}(A, B)\left(\frac{1}{\mathrm{vol}(A)} + \frac{1}{\mathrm{vol}(B)}\right)$
Problem: finding an optimal (normalized) graph cut is NP-hard. Approximation: spectral graph partitioning.
Spectral Clustering
• Unnormalized graph Laplacian
• Normalized graph Laplacian
• Other explanation
• Example
Spectral clustering - main algorithms
Input: similarity matrix S, number k of clusters to construct
• Build the similarity graph
• Compute the first k eigenvectors v1, . . . , vk of the problem matrix: L for unnormalized spectral clustering, Lrw for normalized spectral clustering
• Build the matrix V ∈ R^{n×k} with the eigenvectors as columns
• Interpret the rows of V as new data points Zi ∈ R^k
• Cluster the points Zi with the k-means algorithm in R^k
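A direct transcription of these steps in numpy/scipy, as a sketch (dense matrices only; for the Lrw variant the eigenvectors come from the generalized problem Lv = λDv; the function name is mine):

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(W, k, normalized=True):
    """Cluster the n vertices of a weighted graph W (n x n) into k groups."""
    D = np.diag(W.sum(axis=1))
    L = D - W                    # unnormalized graph Laplacian
    if normalized:
        # Eigenvectors of L_rw = D^{-1} L via the generalized problem L v = lambda D v.
        _, vecs = eigh(L, D)
    else:
        _, vecs = eigh(L)        # eigenvalues returned in ascending order
    V = vecs[:, :k]              # first k eigenvectors as columns
    # Rows of V are the new points z_i in R^k; cluster them with k-means.
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V)
```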
Example-1
[Figure: similarity graph on vertices 1-5; vertices 1, 2, 3 form a triangle and vertices 4, 5 are joined by a single edge]

W: adjacency matrix
0 1 1 0 0
1 0 1 0 0
1 1 0 0 0
0 0 0 0 1
0 0 0 1 0

D: degree matrix
2 0 0 0 0
0 2 0 0 0
0 0 2 0 0
0 0 0 1 0
0 0 0 0 1

L: Laplacian matrix (L = D - W)
 2 -1 -1  0  0
-1  2 -1  0  0
-1 -1  2  0  0
 0  0  0  1 -1
 0  0  0 -1  1
Example-1
[Figure: the same similarity graph on vertices 1-5]

L: Laplacian matrix
 2 -1 -1  0  0
-1  2 -1  0  0
-1 -1  2  0  0
 0  0  0  1 -1
 0  0  0 -1  1

Double zero eigenvalue ⇔ two connected components

First two eigenvectors (columns v1, v2):
1 0
1 0
1 0
0 1
0 1
Example-1
First k eigenvectors → new clustering space: the rows y1, . . . , y5 of the eigenvector matrix [v1 v2] are the new data points.

    v1 v2
y1:  1  0
y2:  1  0
y3:  1  0
y4:  0  1
y5:  0  1

Use k-means clustering in the new space.
[Figure: the points yi plotted in the (v1, v2) plane; the two clusters collapse to two distinct points]
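The claimed eigenstructure is easy to verify numerically; a minimal sketch reproducing the example's matrices:

```python
import numpy as np

# Adjacency matrix of the example graph: triangle {1,2,3} plus edge {4,5}.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 0, 0],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W

vals, vecs = np.linalg.eigh(L)       # ascending eigenvalues
print(np.round(vals, 6))             # [0, 0, 2, 3, 3]: double zero eigenvalue
print(np.round(vecs[:, :2], 3))      # constant on {1,2,3} and on {4,5}
```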
Unnormalized graph Laplacian
Defined as L = D - W. Key property: for every vector $f \in \mathbb{R}^n$,
$f^{\top} L f = \frac{1}{2} \sum_{i,j=1}^{n} w_{ij} (f_i - f_j)^2 \geq 0$,
so L is symmetric and positive semi-definite.
Unnormalized graph Laplacian
Relation between spectrum and clusters: the multiplicity k of the eigenvalue 0 equals the number k of connected components A1, . . . , Ak of the graph, and the corresponding eigenspace is spanned by the characteristic functions $\mathbb{1}_{A_1}, \ldots, \mathbb{1}_{A_k}$ of those components (so all eigenvectors in it are piecewise constant).
Unnormalized graph Laplacian
Interpreting $s_{ij} = 1 / d(X_i, X_j)^2$, L looks like a discrete version of the standard Laplace operator.
Normalized graph Laplacian
Define
$L_{\mathrm{sym}} = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}$
$L_{\mathrm{rw}} = D^{-1} L = I - D^{-1} W$

Relation between Lsym and Lrw: (λ, u) is an eigenpair of Lrw if and only if (λ, $D^{1/2}u$) is an eigenpair of Lsym; the eigenvalues agree and the eigenvectors differ by the factor $D^{1/2}$.
Normalized graph Laplacian
Spectral properties similar to L:
• Positive semi-definite, smallest eigenvalue is 0
• Attention: for Lrw the eigenspace of 0 is spanned by the $\mathbb{1}_{A_i}$ (piecewise constant), but for Lsym it is spanned by the $D^{1/2}\mathbb{1}_{A_i}$ (not piecewise constant).
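The eigenpair correspondence between Lsym and Lrw can be checked numerically; a small sketch (the function name and test graph are mine):

```python
import numpy as np

def check_lsym_lrw(W):
    d = W.sum(axis=1)
    L = np.diag(d) - W
    L_sym = np.diag(d**-0.5) @ L @ np.diag(d**-0.5)
    L_rw = np.diag(1.0 / d) @ L

    # L_rw is not symmetric in general, so use the general eig solver.
    vals, U = np.linalg.eig(L_rw)
    for lam, u in zip(vals, U.T):
        v = np.sqrt(d) * u           # D^{1/2} u
        # (lam, D^{1/2} u) must be an eigenpair of L_sym.
        assert np.allclose(L_sym @ v, lam * v, atol=1e-8)

# Triangle graph: all checks pass.
check_lsym_lrw(np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float))
```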
Random walk explanations
General observation:
• A random walk on the graph has transition matrix P = D^{-1} W.
• Note that Lrw = I - P.
Specific observation about Ncut:
• Define P(A|B) as the probability to jump from B to A, assuming that the random walk starts in the stationary distribution.
• Then: Ncut(A, B) = P(A|B) + P(B|A)
• Interpretation: spectral clustering tries to construct groups such that a random walk stays as long as possible within the same group.
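This identity is easy to verify on a small graph; a sketch, using the stationary distribution $\pi_i = d_i / \mathrm{vol}(V)$ (the function name and test sets are mine):

```python
import numpy as np

def ncut_and_walk_probs(W, A, B):
    d = W.sum(axis=1)
    pi = d / d.sum()                  # stationary distribution of the walk
    P = W / d[:, None]                # transition matrix P = D^{-1} W

    cut_AB = W[np.ix_(A, B)].sum()
    ncut = cut_AB / d[A].sum() + cut_AB / d[B].sum()

    # P(A|B): jump from B to A when the walk starts in the stationary distribution.
    p_A_given_B = (pi[B, None] * P[np.ix_(B, A)]).sum() / pi[B].sum()
    p_B_given_A = (pi[A, None] * P[np.ix_(A, B)]).sum() / pi[A].sum()
    return ncut, p_A_given_B + p_B_given_A

# Triangle {0,1,2} linked to the pair {3,4}: both values equal 1/7 + 1/3.
W = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
print(ncut_and_walk_probs(W, A=[0, 1, 2], B=[3, 4]))
```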
Possible Explanations
Example-2
[Figure: original 2D data set (axes from -2 to 2, left) and its image in the space spanned by the two leading eigenvectors (right)]
In the embedded space given by the two leading eigenvectors, the clusters are trivial to separate.
Example-3
In the embedded space given by the three leading eigenvectors, the clusters are trivial to separate.
Another point of view
Connections
PCA: linear combinations of the original data Xi give the new variables Zi.
Dimension-reduction comparison between PCA and the Laplacian eigenmap
PCA reduces dimension through linear combinations; it minimizes the reconstruction error, but that is not necessarily helpful for separating clusters. Spectral clustering reduces dimension nonlinearly in a way that does help clustering; however, it has no explicit "rank reduce function" to apply to new data points, while PCA does.
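A sketch contrasting the two embeddings on ring-shaped data (scikit-learn; the dataset and all parameter values are assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA
from sklearn.manifold import SpectralEmbedding

X, y = make_circles(n_samples=500, factor=0.5, noise=0.05, random_state=0)

# PCA: a linear projection; the rings stay entangled, but new points
# can be mapped later with pca.transform(X_new).
pca = PCA(n_components=2).fit(X)
Z_pca = pca.transform(X)

# Laplacian eigenmap: a nonlinear embedding in which the rings separate,
# but there is no transform() for out-of-sample points.
Z_spec = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                           n_neighbors=10).fit_transform(X)
```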
Conclusion
Why is spectral clustering useful?
• Does not make strong assumptions on cluster shape
• Is simple to implement (solving an eigenproblem)
• The spectral clustering objective does not have local optima
• Has several different derivations
• Successful in many applications
What are potential problems?
• Can be sensitive to the choice of parameters (e.g. k in the kNN graph)
• Computationally expensive on large non-sparse graphs