Intelligent Database Systems Lab, N.Y.U.S.T. I.M.

A cluster validity measure with a hybrid parameter search method for the support vector clustering algorithm

Presenter: Lin, Shu-Han

Authors: Jeen-Shing Wang, Jen-Chieh Chiang

PR (2008)



Outline

Introduction of SVC

Motivation

Objective

Methodology

Experiments

Conclusion

Comments

SVC

SVC is derived from SVMs

SVMs are a supervised learning technique, with:

Fast convergence

Good generalization performance

Robustness to noise

SVC is an unsupervised approach

1. Data points are mapped to a high-dimensional feature space using a Gaussian kernel.

2. Find the smallest sphere that encloses the data in feature space.

3. Map the sphere back to data space, where it forms a set of contours.

4. The contours are treated as the cluster boundaries.

SVC - Sphere Analysis

Goal: find the minimal enclosing sphere with a soft margin

The problem is solved by introducing the Lagrangian function
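
The equations on this slide did not survive extraction. For orientation, the standard soft-margin SVC formulation is usually written as below. The symbols are assumptions following common SVC notation rather than the slide itself: \Phi is the nonlinear map to feature space, R the sphere radius, a its center, \xi_j the slack variables, and \beta_j, \mu_j the Lagrange multipliers.

```latex
\min_{R,\,a,\,\xi}\; R^2 + C \sum_j \xi_j
\quad \text{s.t.} \quad
\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j,\qquad \xi_j \ge 0

L = R^2 - \sum_j \left(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\right)\beta_j
    - \sum_j \xi_j \mu_j + C \sum_j \xi_j,
\qquad \beta_j \ge 0,\; \mu_j \ge 0
```

Setting the derivatives of L with respect to R, a and \xi_j to zero gives \sum_j \beta_j = 1, a = \sum_j \beta_j \Phi(x_j) and \beta_j = C - \mu_j.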

SVC - Sphere Analysis

Karush-Kuhn-Tucker complementarity:
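
The conditions themselves were an image on the slide. In the usual SVC notation (an assumption here: \xi_j are the slack variables, \beta_j and \mu_j the Lagrange multipliers of the sphere and slack constraints, \Phi the feature map, R and a the radius and center of the sphere), they are typically stated as:

```latex
\xi_j \mu_j = 0, \qquad
\left(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\right)\beta_j = 0
```

Read together with \beta_j = C - \mu_j, a point with \xi_j > 0 has \mu_j = 0 and hence \beta_j = C, so it lies outside the sphere. A point with 0 < \beta_j < C has \xi_j = 0 and lies on the sphere surface.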

SVC - Sphere Analysis

The minimal enclosing sphere with soft margin is found by solving the Wolfe dual optimization problem

C: regulates how many outliers are allowed

Bounded SVs (BSVs) lie outside the sphere and are treated as outliers
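
The dual itself is missing from the extracted slide. A common way to write it, assuming the usual SVC notation (\beta_j are the Lagrange multipliers and K is the kernel function), is:

```latex
\max_{\beta}\; W = \sum_j \beta_j K(x_j, x_j) - \sum_{i,j} \beta_i \beta_j K(x_i, x_j)
\quad \text{s.t.} \quad
0 \le \beta_j \le C,\qquad \sum_j \beta_j = 1
```

Points with \beta_j = C are the bounded SVs, points with 0 < \beta_j < C are the SVs on the sphere surface, and points with \beta_j = 0 lie inside the sphere.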

SVC - Sphere Analysis

The distance (similarity) between x and the sphere center a is computed with a Mercer kernel

Kernel: Gaussian function

q: controls the # of clusters and the smoothness/tightness of the cluster boundaries
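
The slide's formulas were images, so the following is only a minimal sketch of how the kernel and the distance to the sphere center are commonly computed in SVC. It assumes the Gaussian kernel K(x, y) = exp(-q * ||x - y||^2) and multipliers `beta` that have already been obtained from the dual problem. The function names and the toy values are illustrative, not taken from the paper.

```python
import math

def gaussian_kernel(x, y, q):
    """K(x, y) = exp(-q * ||x - y||^2) for points given as sequences of floats."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-q * sq_dist)

def squared_distance_to_center(x, points, beta, q):
    """R^2(x) = K(x,x) - 2*sum_j beta_j K(x_j,x) + sum_ij beta_i beta_j K(x_i,x_j)."""
    cross = sum(b * gaussian_kernel(p, x, q) for p, b in zip(points, beta))
    center_norm = sum(
        bi * bj * gaussian_kernel(pi, pj, q)
        for pi, bi in zip(points, beta)
        for pj, bj in zip(points, beta)
    )
    return gaussian_kernel(x, x, q) - 2.0 * cross + center_norm

# Toy usage: two points with equal (assumed) multipliers that sum to 1.
points = [(0.0, 0.0), (1.0, 0.0)]
beta = [0.5, 0.5]
d_near = squared_distance_to_center((0.5, 0.0), points, beta, q=1.0)
d_far = squared_distance_to_center((5.0, 0.0), points, beta, q=1.0)
```

A point between the two training points ends up closer to the center than a point far from both, which is what makes the contour R(x) = R usable as a cluster boundary.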

Motivation

Drawbacks of cluster validity measures

Compactness: struggles with clusters of different densities or sizes, and decreases monotonically as the # of clusters increases

Separation: struggles with irregular cluster structures

Motivation

The authors' previous study can handle clusters with:

Different sizes

Different densities

Arbitrary shapes

But…


Objectives – A cluster validity method and a parameter search algorithm for SVC

Automatically determine the two parameters:

q: increasing q leads to an increasing # of clusters

C: regulates the existence of outliers and overlapping clusters

Identify the optimal cluster structure

Methodology – Idea

q is related to the densities of the clusters

Each cluster structure corresponds to an interval of q

Identifying the optimal structure is equivalent to finding the largest interval

Example: N = 64, max # of clusters = √N = 8
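
The "largest interval" idea can be sketched as follows. This is an illustration only: it assumes q has been sampled on a grid and uses the cluster count as a stand-in for the cluster structure. The function name and the sweep values are hypothetical, not from the paper.

```python
def widest_stable_interval(samples):
    """samples: list of (q, n_clusters) pairs sorted by increasing q.
    Returns (n_clusters, q_start, q_end) for the longest run of consecutive
    samples that share the same cluster count, or None for an empty list."""
    best = None
    start = 0
    for i in range(1, len(samples) + 1):
        # A run ends at the end of the list or when the cluster count changes.
        if i == len(samples) or samples[i][1] != samples[start][1]:
            q_start, q_end = samples[start][0], samples[i - 1][0]
            if best is None or (q_end - q_start) > (best[2] - best[1]):
                best = (samples[start][1], q_start, q_end)
            start = i
    return best

# Hypothetical sweep: cluster count observed at each sampled q.
sweep = [(0.5, 1), (1.0, 2), (1.5, 3), (2.0, 3), (2.5, 3), (3.0, 4), (3.5, 6)]
result = widest_stable_interval(sweep)
```

In this made-up sweep the 3-cluster structure persists over the widest range of q, so it would be the one selected.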

Methodology – Problem

How to locate the overall search range of q?

How to detect outliers/noise?

How to identify the largest interval?

Methodology – Locate range of q

Lower bound

Upper bound: employ K-means (with √N clusters) to get clusters, and compute the variance vi of each cluster

Sort the clusters in ascending order of cluster size

n = 3: use the variances of the 3 biggest clusters

Methodology – Outlier Detection

Set q = qmax, the tightest setting of q

Outliers show up as singleton clusters

This yields Copt; remove these outliers
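
A small sketch of the singleton rule, assuming cluster labels have already been produced by an SVC run at q = qmax. The helper name and the label values are hypothetical, and the selection of Copt is not covered here.

```python
from collections import Counter

def split_singletons(labels):
    """Return (outlier_indices, kept_indices), treating every point whose
    cluster label occurs exactly once as an outlier."""
    counts = Counter(labels)
    outliers = [i for i, lab in enumerate(labels) if counts[lab] == 1]
    kept = [i for i, lab in enumerate(labels) if counts[lab] > 1]
    return outliers, kept

# Hypothetical labels: clusters 1 and 2 each contain a single point.
labels = [0, 0, 1, 0, 2, 3, 3]
outliers, kept = split_singletons(labels)
```

The points at the returned outlier indices would be removed before searching for the largest interval of q.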

Methodology – the largest interval

Fibonacci search: locate the interval where the cluster structure is the same

Bisection search

n: # of iterations
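
The slide does not spell out how the two searches are combined, so the following only illustrates the bisection half: narrowing down the value of q at which the cluster structure changes. It assumes a single change point inside the bracket. `count_clusters` is a hypothetical callback standing in for a full SVC run, and the toy function below replaces it for demonstration.

```python
def bisect_structure_boundary(count_clusters, q_lo, q_hi, tol=1e-3):
    """Narrow [q_lo, q_hi] around the q at which the cluster count stops
    matching the count at q_lo. Assumes the count at q_hi differs from the
    count at q_lo and that there is a single change point in between."""
    base = count_clusters(q_lo)
    if count_clusters(q_hi) == base:
        raise ValueError("cluster count is the same at both ends")
    while q_hi - q_lo > tol:
        mid = 0.5 * (q_lo + q_hi)
        if count_clusters(mid) == base:
            q_lo = mid  # structure unchanged: boundary lies to the right
        else:
            q_hi = mid  # structure changed: boundary lies to the left
    return q_lo, q_hi

# Hypothetical stand-in for running SVC: 2 clusters below q = 1.7, 3 above.
def toy_count(q):
    return 2 if q < 1.7 else 3

lo, hi = bisect_structure_boundary(toy_count, 1.0, 3.0)
```

For a bracket of width w and tolerance tol, bisection needs about log2(w / tol) iterations, each of which costs one clustering run.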

Methodology – Overview

1. Locate the range of q

2. Detect outliers

3. Identify the largest interval


Experiments - Benchmark and Artificial Examples


Experiments - Outlier


Conclusions

A new validity measure, inspired by observations of q

Determines the optimal cluster structure with its corresponding range of q and C


Comments

Advantage: inspired by observation of the parameter

Drawback: …

Application:

SVC

DBSCAN: MinPts / Eps