Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

Validity index for clusters of different sizes and densities

Presenter: Jun-Yi Wu

Authors: Krista Rizman Zalik, Borut Zalik

2011 PRL

國立雲林科技大學 National Yunlin University of Science and Technology



Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments

Motivation

Most previous validity indices depend heavily on the number of data objects in each cluster, on cluster centroids, and on average values.

Most popular validity measures tend to ignore low-density clusters and are not effective at validating partitions whose clusters differ in size and density.

Objective

Two cluster validity indices are proposed for efficient validation of partitions containing clusters that widely differ in sizes and densities.

To design a cluster validity index that is suitable for the validation of partitions having different sizes and densities.

A good partition is judged by:

Overlap (low)

Compactness (high)

Separation distance (large)

Methodology

Review of several popular validity indices:

Dunn index; D index

XiE index

Davies-Bouldin’s index; DB index

C index

G index

G+ index

Partition coefficient; PC index

Classification entropy; CE index
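The slides only name these indices. As a rough illustration, the two crisp indices at the top of the list can be sketched with their textbook definitions: Dunn as the smallest between-cluster distance over the largest cluster diameter, and Davies-Bouldin as the average worst-case scatter-to-centroid-distance ratio. The function names and the toy data are assumptions made for this sketch, not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.dist(a, b)

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def dunn_index(clusters):
    """Smallest between-cluster distance over largest diameter (higher is better)."""
    min_between = min(
        dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_between / max_diameter

def davies_bouldin_index(clusters):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) (lower is better)."""
    centers = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, centers)]
    k = len(clusters)
    worst = [
        max((scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k

# Toy partition (assumed data): a small dense cluster and a larger, sparser one.
dense = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
sparse = [(10.0, 0.0), (10.0, 4.0), (14.0, 0.0), (14.0, 4.0)]
print(dunn_index([dense, sparse]))
print(davies_bouldin_index([dense, sparse]))
```

Both functions use centroids, averages, or extreme pairwise distances, which is the kind of dependence the Motivation slide points to.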

Methodology

Review several popular validity indices.

D Index

DB Index

G+ Index

C Index

G Index

PC

CE

XiE
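The formulas on this slide did not survive extraction. For the two fuzzy indices in the list, a minimal sketch using the standard definitions of the partition coefficient (mean squared membership) and the classification entropy (mean membership entropy) might look as follows. The membership-matrix layout `u[i][j]` is an assumption of this sketch.

```python
import math

def partition_coefficient(u):
    """PC = (1/N) * sum of squared memberships; 1.0 for a crisp partition."""
    return sum(m * m for row in u for m in row) / len(u)

def classification_entropy(u):
    """CE = -(1/N) * sum of m * ln(m); 0.0 for a crisp partition."""
    return -sum(m * math.log(m) for row in u for m in row if m > 0.0) / len(u)

# u[i][j] is the membership of object i in cluster j; each row sums to 1.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]

print(partition_coefficient(crisp), partition_coefficient(fuzzy))
print(classification_entropy(fuzzy))
```

Neither index looks at the data geometry, only at the membership values.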

Methodology

New clustering validity indices:

SV-index

Validation of index SV

Fuzzification of the SV index

The proposed index OS exploiting overlap and separation measures

Overlap measure

Separation measure and validity index OS

Validation of index OS

Methodology

SV-index

A validity measure for partitions consisting of clusters that widely differ in density or size.
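The slide's formula for the SV index was an image and is not recoverable here. The sketch below is therefore not the authors' definition. It only illustrates the general shape of a separation-to-variation ratio, under two assumptions made for this example: separation is the sum of nearest-centroid distances, and variation is the sum of per-cluster mean distances to the centroid.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def separation(centers):
    """Assumed form: sum over clusters of the distance to the nearest other centroid."""
    return sum(
        min(math.dist(c, o) for j, o in enumerate(centers) if j != i)
        for i, c in enumerate(centers)
    )

def variation(clusters, centers):
    """Assumed form: sum over clusters of the mean member-to-centroid distance."""
    return sum(
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centers)
    )

def sv_like_ratio(clusters):
    """Separation over variation; larger values suggest a better partition."""
    centers = [centroid(c) for c in clusters]
    return separation(centers) / variation(clusters, centers)

# Same four points (assumed data), partitioned sensibly and badly.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(sv_like_ratio(good), sv_like_ratio(bad))
```

On this toy data the sensible partition scores higher, matching the slides' statement that the maximum of the SV ratio marks the optimal partition.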

Methodology

Validation of index SV

Methodology

Fuzzification of the SV index

A fuzzy version of the index SV is obtained by integrating the membership values in the variation measure.
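How the memberships enter the variation measure is not shown on the slide. One plausible reading, stated here as an assumption rather than the paper's formula, is to replace each cluster's mean distance to its centroid with a membership-weighted mean:

```python
import math

def fuzzy_variation(points, centers, u):
    """Assumed form: per cluster, membership-weighted mean distance to the
    centroid, summed over clusters. u[i][j] is the membership of point i
    in cluster j."""
    total = 0.0
    for j, c in enumerate(centers):
        weight = sum(row[j] for row in u)
        weighted = sum(row[j] * math.dist(p, c) for p, row in zip(points, u))
        total += weighted / weight
    return total

# Assumed toy data: four points, two centroids.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = [(0.0, 1.0), (10.0, 1.0)]
crisp_u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
soft_u = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

print(fuzzy_variation(points, centers, crisp_u))
print(fuzzy_variation(points, centers, soft_u))
```

With crisp (0/1) memberships this reduces to the ordinary mean distance per cluster, so the fuzzy form is a strict generalization under this assumption.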

Methodology

The proposed index OS exploiting overlap and separation measures

Experimental results suggested that inter-cluster separation plays a more important role in cluster validation.

Existing indices are limited in their ability to compute compactness and separation in partitions with overlapping clusters and clusters of different sizes, which leads to incorrect validation results.

Considering these results, a cluster validity index based on overlap and separation measures is suggested.
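The overlap and separation formulas themselves are not reproduced in the extracted slides. The sketch below is an illustrative stand-in, not the authors' OS index. It assumes that an object's overlap is the ratio of its mean distance to its own cluster to its mean distance to the closest other cluster, and that separation is the smallest centroid-to-centroid distance.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def mean_dist(p, others):
    """Mean Euclidean distance from p to a non-empty list of points."""
    return sum(math.dist(p, q) for q in others) / len(others)

def overlap(clusters):
    """Assumed form: sum over objects of a/b, where a is the mean distance to
    the object's own cluster and b the mean distance to the closest other one."""
    total = 0.0
    for i, c in enumerate(clusters):
        for idx, p in enumerate(c):
            own = c[:idx] + c[idx + 1:]
            a = mean_dist(p, own) if own else 0.0
            b = min(mean_dist(p, o) for j, o in enumerate(clusters) if j != i)
            total += a / b
    return total

def min_center_separation(clusters):
    """Assumed form: smallest distance between any two cluster centroids."""
    centers = [centroid(c) for c in clusters]
    return min(math.dist(a, b)
               for i, a in enumerate(centers) for b in centers[i + 1:])

def os_like_index(clusters):
    """Overlap over separation; smaller values suggest a better partition."""
    return overlap(clusters) / min_center_separation(clusters)

# Same four points (assumed data), partitioned sensibly and badly.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(os_like_index(good), os_like_index(bad))
```

The per-object overlap uses no centroid, so a small sparse cluster is not swamped by a large dense one in that term. The sketch does not guard against the degenerate case where an object coincides with every member of another cluster.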

Methodology

Overlap measure

Methodology

Separation measure and validity index OS

Methodology

Validation of index OS

Experiments

Goal: demonstrate the effectiveness of the proposed SV and OS indices for determining the optimal number of clusters.

Artificial data set A1

Artificial data set A2

Artificial data set A3

Iris data set

Wine data set

Glass data set

Experiments-Artificial data set A1

Experiments-Artificial data set A2

Experiments-Artificial data set A3

Experiments-Iris data set

Experiments-Wine data set

Conclusion

The experimental results showed that the new indices outperform the other considered indices, especially when clusters differ widely in size or density.

A good partition is expected to have a low degree of overlap, a large separation distance, and high compactness.

The maximum value of the SV ratio and the minimum value of the OS index each indicate the optimal partition.
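The selection rule can be written down directly: evaluate each index for every candidate number of clusters, then take the maximum for SV and the minimum for OS. The helper name `best_k` and the score values below are hypothetical, made up for illustration, and are not results from the paper.

```python
def best_k(scores, higher_is_better):
    """Return the cluster count whose index value is optimal.

    scores maps a candidate number of clusters to an index value.
    """
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)

# Hypothetical index values keyed by number of clusters (illustration only).
sv_scores = {2: 1.8, 3: 2.6, 4: 2.1}
os_scores = {2: 0.9, 3: 0.4, 4: 0.7}

print(best_k(sv_scores, higher_is_better=True))   # 3
print(best_k(os_scores, higher_is_better=False))  # 3
```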

Comments

Advantage

Drawback ….

Application: clustering, validity index