Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
Fast global k-means clustering using cluster membership and inequality
Pattern Recognition (PR, 2010)
Presenter : Lin, Shu-Han
Authors : Jim Z.C. Lai, Tsung-Jen Huang
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
Motivation
FGKM and MGKM have the same computational complexity.
MGKM claims to be more effective than FGKM (see 2008.PR.8.書漢.1027, Modified global k-means algorithm for minimum sum-of-squares clustering problems).
Objectives
Develop a set of inequalities to speed up FGKM and MGKM; the resulting method is called MFGKM.
Use the Karhunen-Loève Transform (KLT), which is closely related to Principal Component Analysis (PCA), with th = 0.9999.
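The slide does not say what th controls. One common reading, assumed here and not confirmed by the source, is that th is the fraction of total variance the leading KLT/PCA components must retain. Under that assumption, choosing the number of components from a list of eigenvalues might look like the sketch below; the function name and the example spectrum are hypothetical.

```python
def components_for_threshold(eigenvalues, th=0.9999):
    """Smallest number of leading components whose cumulative share of the
    total variance reaches th. Treating th as a variance fraction is an
    assumption, not something the slide states."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    running = 0.0
    for count, value in enumerate(ordered, start=1):
        running += value
        if running / total >= th:
            return count
    return len(ordered)


# Hypothetical eigenvalue spectrum, for illustration only.
print(components_for_threshold([5.0, 3.0, 1.5, 0.5], th=0.9))
```

With a threshold as strict as 0.9999, almost every component with non-negligible variance is kept, so the transform mainly reorders attributes by variance rather than discarding them.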
Methodology – MFGKM
Red = proposed
(or s Yj’ , called MCS)
Methodology – cluster center selection algorithm
(Speed up)
Methodology – Candidate set construction algorithm
Methodology – Candidate set construction algorithm (Cont.)
Point    A1*   A2    A3    …     Ana
x10      8.2   4.2   …     …     …
x9       8.0   2.1
x8 (Q)   7.2   2.2   3.2   …     …
x7       5.4
x6       5.1
m        1     2
r10 = d(x10, c) = 2
|8.2 - 7.2| = 1, and 1 + |2.2 - 4.2| = 3 > r10, so delete x10:
x10 cannot be the nearest neighbor of x8
1.
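The elimination in this example can be sketched as a partial-distance test: accumulate attribute differences one at a time and stop as soon as the running sum exceeds the candidate's radius. The sketch follows the slide's arithmetic (a sum of absolute differences); whether the paper uses this or a squared Euclidean form is not recoverable from the slide, and the function name is hypothetical.

```python
def rejected_by_partial_distance(query, candidate, radius):
    """Accumulate |query[m] - candidate[m]| attribute by attribute and stop
    as soon as the running sum exceeds radius.

    Returns (rejected, attributes_used).
    """
    partial = 0.0
    for used, (q, x) in enumerate(zip(query, candidate), start=1):
        partial += abs(q - x)
        if partial > radius:
            return True, used
    return False, len(query)


x8 = [7.2, 2.2]    # query Q, first two attributes from the table
x10 = [8.2, 4.2]   # candidate, first two attributes from the table
r10 = 2.0          # d(x10, c), as given on the slide

print(rejected_by_partial_distance(x8, x10, r10))
```

Ordering the attributes so that the high-variance ones come first makes the running sum cross the radius early, which is presumably why the transformed attributes are used.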
Methodology – Candidate set construction algorithm (Cont.)
Point    A1*   A2    A3    …     Ana
x12      11    4     3     1
x11      9.7   4.3   2     0
x10      8.3   4.2
x9       8.0   5.1
x8 (Q)   7.2   2.2   3.2   …     …
x7       5.4
x6       5.1
x5       4.7   …     …     …     …
x4       …     …     …     …     …
x3       …     …     …     …     …
x2       …     …     …     …     …
x1       …     …     …     …     …
m
rmax = 2
2.
Methodology – Candidate set construction algorithm (Cont.)
3.
Methodology – Candidate set construction algorithm (Cont.)
Point    A1*   A2    A3    …     Ana
x9       8.2   2.2   4     3     2
x8 (Q)   7.2   2.2   4     3     2
x7       6.2   2.2   4     3     2
m
Diff (distortion)
Diff = (r9 - d(x8, x9)) + (r7 - d(x8, x7)) = (2 - 1) + (2 - 1) = 2
4.
Return Diff = 2 and the center of x9 and x7
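The Diff computation and the returned center can be reproduced with a short sketch. The values come from the table, with both radii equal to 2 as in the slide's arithmetic. The distance function, the rule of keeping only neighbours with a positive gain, and all names are assumptions made for illustration.

```python
def distance(a, b):
    """Sum of absolute attribute differences (the metric is an assumption)."""
    return sum(abs(p - q) for p, q in zip(a, b))


def evaluate_candidate(query, neighbours, radii):
    """Return (diff, center) over the neighbours that lie closer to the
    query than to their current cluster center (gain r - d > 0)."""
    gains = [(r - distance(query, x), x) for x, r in zip(neighbours, radii)]
    kept = [(g, x) for g, x in gains if g > 0]
    if not kept:
        return 0.0, None
    diff = sum(g for g, _ in kept)
    members = [x for _, x in kept]
    center = [sum(col) / len(members) for col in zip(*members)]
    return diff, center


x9 = [8.2, 2.2, 4, 3, 2]
x8 = [7.2, 2.2, 4, 3, 2]   # query Q
x7 = [6.2, 2.2, 4, 3, 2]

diff, center = evaluate_candidate(x8, [x9, x7], radii=[2, 2])
print(round(diff, 6), [round(v, 6) for v in center])
```

Because x9 and x7 differ from x8 only in A1, the same distances result under any of the usual metrics, so the example does not depend on the assumed distance function.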
Methodology – MCS
Experiments – Computing time
Experiments – Distortion
Least distortion
Faster, but distortion
Conclusions
GKM
FGKM: faster, but local
MGKM: better performance than FGKM, but higher computational complexity
MFGKM: faster and better than MGKM
MFGKM+MCS: the fastest method, with performance comparable to MGKM
Comments
Advantage: improves both performance and speed
Drawback: …
Application: …
Methodology – k-Means
sensitive to the choice of a starting point
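A small illustration of this sensitivity, using a plain Lloyd-style k-means written for this note (not code from the paper) on four hypothetical points at the corners of a 4 x 1 rectangle. One starting pair converges to the left/right split, the other gets stuck in the much worse top/bottom split.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm from the given starting centers.

    Returns (centers, distortion), where distortion is the sum of squared
    distances from each point to its nearest center.
    """
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    distortion = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, distortion


points = [(0, 0), (0, 1), (4, 0), (4, 1)]   # hypothetical data

_, good = kmeans(points, [(0, 0), (4, 0)])  # one start on each short side
_, bad = kmeans(points, [(0, 0), (0, 1)])   # both starts on the same side
print(good, bad)
```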
Methodology – The GKM algorithm
Objective function
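As a reminder of how global k-means is usually described: it grows the solution one center at a time, tries every data point as the starting position of the new center, runs k-means from each start, and keeps the result with the lowest distortion. The sketch below follows that description under the author's own naming; it is not taken from the slide, whose algorithm listing did not survive extraction.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm; returns (centers, distortion)."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[nearest].append(p)
        updated = [
            [sum(col) / len(members) for col in zip(*members)] if members else old
            for members, old in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    distortion = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, distortion


def global_kmeans(points, k):
    """Incremental search: solve for 1, 2, ..., k centers in turn."""
    centers = [[sum(col) / len(points) for col in zip(*points)]]  # k = 1: centroid
    for _ in range(2, k + 1):
        best = None
        for p in points:  # every data point is tried as the new center
            trial, distortion = kmeans(points, centers + [list(p)])
            if best is None or distortion < best[1]:
                best = (trial, distortion)
        centers = best[0]
    distortion = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, distortion


points = [(0, 0), (0, 1), (4, 0), (4, 1)]   # hypothetical data
print(global_kmeans(points, 2))
```

Each added center costs one full k-means run per data point, which is the expense the faster variants try to avoid.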
Methodology – Objective function
Old version
Reformulated version
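The formulas on this slide did not survive extraction. For orientation, the minimum sum-of-squares clustering objective is commonly written in the two forms below: first with explicit 0/1 assignment weights, then with the assignment folded into a minimum over centers. The notation (m data points a^i, k centers x^j, weights w_ij) is an assumption, as is the mapping of these two forms to the slide's "old" and "reformulated" labels.

```latex
% Form with explicit assignment weights
\min_{x,\,w} \; \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij} \, \lVert x^{j} - a^{i} \rVert^{2}
\quad \text{s.t.} \quad \sum_{j=1}^{k} w_{ij} = 1, \;\; w_{ij} \in \{0, 1\}

% Form with the assignment folded into a minimum
\min_{x} \; f_k(x) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \dots, k} \lVert x^{j} - a^{i} \rVert^{2}
```

The two agree at the optimum because, for fixed centers, the best weights assign each point to its nearest center.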
Methodology – fast GKM algorithm
Old version
Proposed version (auxiliary cluster function)
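The selection step usually attributed to fast GKM can be sketched as follows. Instead of running k-means from every data point, each point x_n is scored by a guaranteed reduction in distortion, b_n = sum over j of max(d_j - ||x_n - x_j||^2, 0), where d_j is the squared distance from x_j to its nearest existing center, and only the best-scoring point is used as the new starting center. The formula is stated here from the general global k-means literature, not from this slide, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def best_new_center(points, centers):
    """Pick the data point with the largest guaranteed distortion reduction.

    Returns (point, bound) where bound is b_n for the chosen point.
    """
    # d_j: squared distance from each point to its nearest existing center.
    nearest = [min(sq_dist(p, c) for c in centers) for p in points]

    def bound(candidate):
        return sum(max(d - sq_dist(candidate, p), 0.0)
                   for p, d in zip(points, nearest))

    scores = [bound(p) for p in points]
    best = max(range(len(points)), key=scores.__getitem__)
    return points[best], scores[best]


points = [(0, 0), (0, 1), (4, 0), (4, 1)]   # hypothetical data
centers = [[2.0, 0.5]]                      # the single-cluster centroid
print(best_new_center(points, centers))
```

Scoring all candidates costs one pass over pairs of points, after which a single k-means run replaces the per-point runs of the original algorithm.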
Methodology – modified GKM algorithm
Proposed version