Intelligent Database Systems Lab
N.Y.U.S.T.
I. M.
Validity index for clusters of different sizes and densities
Presenter: Jun-Yi Wu
Authors: Krista Rizman Zalik, Borut Zalik
2011 PRL
National Yunlin University of Science and Technology
Outline
Motivation
Objective
Methodology
Experiments
Conclusion
Comments
Motivation
Most previous validity indices depend heavily on the number of data objects in clusters, on cluster centroids, and on average values.
Most popular validity measures tend to ignore low-density clusters and are not effective at validating partitions whose clusters differ in size and density.
Objective
Two cluster validity indices are proposed for efficient validation of partitions containing clusters that widely differ in sizes and densities.
To design a cluster validity index that is suitable for the validation of partitions having different sizes and densities.
A good partition is assessed by three properties: overlap, compactness, and separation distance.
Methodology
Review several popular validity indices:
Dunn index; D index
XiE index
Davies-Bouldin’s index; DB index
C index
G index
G+ index
Partition coefficient; PC index
Classification entropy; CE index
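As an illustration of how two of the crisp indices listed above are usually computed, here is a minimal sketch of the Dunn index and the Davies-Bouldin index using their common textbook definitions. The function names and the list-of-point-tuples representation of a cluster are assumptions made for this example; they do not come from the slides.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of equal-length point tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def dunn_index(clusters):
    """Smallest between-cluster point distance divided by the largest
    cluster diameter. Larger values suggest a better partition.
    Assumes at least one cluster contains two distinct points."""
    min_separation = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(math.dist(p, q) for c in clusters for p in c for q in c)
    return min_separation / max_diameter

def davies_bouldin_index(clusters):
    """Average, over clusters, of the worst ratio
    (scatter_i + scatter_j) / distance(centroid_i, centroid_j).
    Smaller values suggest a better partition."""
    cents = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, v) for p in c) / len(c)
        for c, v in zip(clusters, cents)
    ]
    k = len(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ]
    return sum(worst) / k

# Two tight, well-separated clusters (toy data for illustration only).
toy = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
d_value = dunn_index(toy)
db_value = davies_bouldin_index(toy)
```

Both functions rely on centroids or extreme distances, which is the kind of dependence the Motivation slide describes.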
Methodology
Review several popular validity indices.
D Index
DB Index
G+ Index
C Index
G Index
PC
CE
XiE
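The two fuzzy indices in this list, PC and CE, have short standard definitions, sketched below. The membership matrix layout (one row per data object, one column per cluster, each row summing to 1) and the use of the natural logarithm are assumptions for this example, not details taken from the slides.

```python
import math

def partition_coefficient(u):
    """PC: mean of the squared memberships per object.
    Equals 1.0 for a crisp partition and 1/k for a fully fuzzy one."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n

def classification_entropy(u):
    """CE: mean entropy of the membership rows (natural log assumed).
    Equals 0.0 for a crisp partition; zero memberships are skipped."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n

# Toy membership matrices (illustrative only).
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]
pc_crisp = partition_coefficient(crisp)
pc_fuzzy = partition_coefficient(fuzzy)
ce_fuzzy = classification_entropy(fuzzy)
```

A higher PC and a lower CE both point to a less fuzzy partition.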
Methodology
New clustering validity indices:
SV-index
Validation of index SV
Fuzzification of the SV index
The proposed index OS exploiting overlap and separation measures
Overlap measure
Separation measure and validity index SV
Validation of index OS
Methodology
SV-index
A validity measure for partitions that consist of clusters that differ widely in density or size.
Methodology
Validation of index SV
Methodology
Fuzzification of the SV index
A fuzzy version of the index SV is obtained by integrating the membership values in the variation measure.
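The slide's formula is not preserved here, so the sketch below only illustrates the general idea of putting membership values into a variation measure. It is not the SV index itself. The choice of mean distance to the cluster center as the variation measure, and all function names, are assumptions for this example.

```python
import math

def crisp_variation(points, center):
    """Assumed crisp variation: mean distance of the points to the center."""
    return sum(math.dist(p, center) for p in points) / len(points)

def fuzzy_variation(points, memberships, center):
    """Assumed fuzzy counterpart: each distance is weighted by the point's
    membership in this cluster, then normalized by the total membership."""
    total = sum(memberships)
    weighted = sum(u * math.dist(p, center) for p, u in zip(points, memberships))
    return weighted / total

# Toy data: the third point lies far away and has zero membership.
pts = [(0, 0), (2, 0), (10, 0)]
center = (1, 0)
v_crisp = crisp_variation(pts, center)
v_fuzzy = fuzzy_variation(pts, [1.0, 1.0, 0.0], center)
```

With all memberships set to 1, the fuzzy measure reduces to the crisp one, which is the behavior one would expect of a fuzzification.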
Methodology
The proposed index OS exploiting overlap and separation measures
Experimental results suggested that inter-cluster separation plays a more important role in cluster validation.
Existing indices are limited in their ability to measure compactness and separation in partitions that have overlapping clusters and clusters of different sizes, which leads to incorrect validation results.
Given these results, a cluster validity index based on overlap and separation measures is proposed.
Methodology
Overlap measure
Methodology
Separation measure and validity index SV
Methodology
Validation of index OS
Experiments
To demonstrate the effectiveness of the proposed SV and OS indices for determining the optimal number of clusters on the following data sets:
Artificial data set A1
Artificial data set A2
Artificial data set A3
Iris data set
Wine data set
Glass data set
Experiments-Artificial data set A1
Experiments-Artificial data set A2
Experiments-Artificial data set A3
Experiments-Iris data set
Experiments-Wine data set
Conclusion
The experimental results showed that the new indices outperform the other indices considered, especially when clusters differ widely in size or density.
A good partition is expected to have a low degree of overlap, a large separation distance, and high compactness.
The maximum value of the ratio of the SV index and the minimum value of the OS index indicate the optimal partition.
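The selection rule in the last point can be sketched as a small helper that scans index values computed for several candidate cluster counts. The function name, the dictionary layout, and the numbers in the usage lines are hypothetical placeholders for illustration; they are not results from the paper.

```python
def best_partition(scores, prefer):
    """Return the cluster count k whose index value is best.

    scores: dict mapping a candidate cluster count k to an index value.
    prefer: "max" for indices where larger is better (SV on the slide),
            "min" for indices where smaller is better (OS on the slide).
    """
    if prefer not in ("max", "min"):
        raise ValueError("prefer must be 'max' or 'min'")
    chooser = max if prefer == "max" else min
    return chooser(scores, key=scores.get)

# Placeholder values only, NOT taken from the paper's experiments.
sv_scores = {2: 0.8, 3: 1.9, 4: 1.1}
os_scores = {2: 0.6, 3: 0.2, 4: 0.5}
k_by_sv = best_partition(sv_scores, "max")
k_by_os = best_partition(os_scores, "min")
```

In practice each score would come from clustering the data with k clusters and evaluating the index on the resulting partition.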
Comments
Advantage
Drawback ….
Application: clustering, validity index