Intelligent Database Systems Lab
National Yunlin University of Science and Technology (國立雲林科技大學)
Extensions of vector quantization for incremental clustering
Edwin Lughofer
Pattern Recognition, Vol. 41, 2008, pp. 995–1011
Presenter: Wei-Shen Tai
2011/1/19
N.Y.U.S.T.
I. M.
Outline
- Introduction
- Vector quantization
- Extensions of vector quantization
- Evaluation
- Conclusion and outlook
- Comments
Motivation
Incremental clustering processes
- Online measurements are often recorded continuously, producing data streams in many applications.
- Clustering in an online manner guarantees that query results are up to date and can be returned with a small time delay.
Objective
An incremental and evolving vector quantization
- Processes data streams in an online clustering scheme.
- Does not require the number of clusters to be defined in advance.
- Improves the quality of cluster partitions with several strategies.
Vector quantization
1. Choose initial values for the C cluster centers.
2. Fetch the next data sample from the data set.
3. Calculate the distance of the selected data point to all cluster centers.
4. Select the cluster center that is closest to the data point.
5. Update the p components of the winning cluster center by moving it towards the selected point.
6. If the data set contains data points that have not yet been processed through steps 2–5, go to step 2.
7. If any cluster center moved significantly in the last iteration (say, by more than a given threshold), reset the pointer to the beginning of the data buffer and go to step 2; otherwise stop.
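
The seven steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the fixed learning gain `eta`, the movement threshold `eps`, the `max_epochs` guard and the function name are assumptions introduced here, and the update in step 5 is the usual winner-takes-all rule c_win = c_win + eta * (x - c_win).

```python
import math

def vector_quantization(data, initial_centers, eta=0.1, eps=1e-6, max_epochs=1000):
    """Batch VQ sketch; eta, eps and max_epochs are illustrative assumptions."""
    centers = [list(c) for c in initial_centers]        # step 1: initial centers
    for _ in range(max_epochs):
        before = [c[:] for c in centers]                # remember centers for step 7
        for x in data:                                  # steps 2 and 6: sweep the data set
            # steps 3-4: distance to every center, pick the closest (the winner)
            win = min(range(len(centers)),
                      key=lambda i: math.dist(x, centers[i]))
            # step 5: move the winning center towards the sample
            centers[win] = [c + eta * (xj - c) for c, xj in zip(centers[win], x)]
        # step 7: stop once no center moved by more than eps during this sweep
        shift = max(math.dist(b, c) for b, c in zip(before, centers))
        if shift <= eps:
            break
    return centers

data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = vector_quantization(data, initial_centers=[(0, 0), (10, 10)])
```

With two well-separated groups of points and one initial center in each, every center stays inside its own group because each update is a convex combination of the current center and a sample from that group.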
Vector quantization in incremental mode
Stability/plasticity dilemma in ART-2
- A vigilance parameter ρ controls the tradeoff between adapting already learned clusters (stability) and generating new clusters (plasticity).
Differences between VQ and VQ-INC
- The starting number of clusters is zero.
- If the distance between the incoming input x and the closest cluster center c_win is larger than ρ and x is not faulty, a new cluster is created. Otherwise, c_win is moved toward x.
- The ranges of all p variables are updated if x is not faulty.
- The learning gain η decreases monotonically with the number of data points belonging to each cluster.
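
A single-pass version of these rules might look like the sketch below. It is only an illustration under stated assumptions: the class and attribute names are hypothetical, the `faulty` flag stands in for whatever fault detection is used upstream, and the schedule eta = 0.5 / k_win is one simple monotonically decreasing choice, not necessarily the one used in the paper.

```python
import math

class IncrementalVQ:
    """VQ-INC sketch: clusters are created on demand, controlled by rho."""

    def __init__(self, rho):
        self.rho = rho          # vigilance parameter
        self.centers = []       # starts with zero clusters
        self.counts = []        # k_i: number of samples assigned to cluster i
        self.lo = None          # per-variable minimum seen so far
        self.hi = None          # per-variable maximum seen so far

    def update(self, x, faulty=False):
        if faulty:              # faulty samples change nothing (assumed handling)
            return None
        x = list(x)
        # update the ranges of all p variables
        if self.lo is None:
            self.lo, self.hi = x[:], x[:]
        else:
            self.lo = [min(a, b) for a, b in zip(self.lo, x)]
            self.hi = [max(a, b) for a, b in zip(self.hi, x)]
        if self.centers:
            win = min(range(len(self.centers)),
                      key=lambda i: math.dist(x, self.centers[i]))
            if math.dist(x, self.centers[win]) <= self.rho:
                # close enough: adapt the winner with a decreasing gain
                self.counts[win] += 1
                eta = 0.5 / self.counts[win]    # assumed schedule
                self.centers[win] = [c + eta * (xj - c)
                                     for c, xj in zip(self.centers[win], x)]
                return win
        # no cluster yet, or the winner is farther away than rho: new cluster
        self.centers.append(x)
        self.counts.append(1)
        return len(self.centers) - 1

model = IncrementalVQ(rho=1.0)
labels = [model.update(x) for x in [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1)]]
```

Each sample is seen exactly once, so the number of clusters depends on ρ and on the order of the stream rather than on a predefined C.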
An alternative distance strategy
- Both 'over-clustering' and incorrect partitioning of the input space occur in VQ-INC.
- Instead of the classic Euclidean distance to the cluster center, VQ-INC-EXT takes the range of influence of each cluster into account and measures the distance to the cluster surface along the direction toward the cluster center.
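
As a geometric illustration of a distance to the cluster surface, the sketch below assumes that a cluster's range of influence is an axis-parallel ellipsoid with semi-axes `ranges` around its center; that shape, the function name and the signed return value are assumptions made here, not details taken from the slides. The ray from the center to x crosses the surface at the fraction t = 1 / sqrt(sum_j ((x_j - c_j) / r_j)^2) of its length, so the distance from x to the surface along that ray is (1 - t) * ||x - c||.

```python
import math

def surface_distance(x, center, ranges):
    """Signed distance from x to the ellipsoid surface along the ray to the center.

    Positive outside the ellipsoid, negative inside, zero on the surface.
    Assumes an axis-parallel ellipsoid with semi-axes `ranges`.
    """
    scaled = math.sqrt(sum(((a - c) / r) ** 2
                           for a, c, r in zip(x, center, ranges)))
    if scaled == 0.0:
        raise ValueError("direction is undefined when x equals the center")
    t = 1.0 / scaled                    # fraction of the ray that lies inside
    return (1.0 - t) * math.dist(x, center)

outside = surface_distance((4, 0), center=(0, 0), ranges=(2, 1))
inside = surface_distance((1, 0), center=(0, 0), ranges=(2, 1))
```

Compared with the plain Euclidean distance to the center, this measure lets a wide cluster claim a point that is far from its center but close to its boundary.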
Satellite deletion
Cluster satellites
- Undesirable tiny clusters that lie very close to significantly bigger ones.
Identify outliers and satellites
- If k_i/N < 1%, cluster i is regarded as an outlier cluster.
- If k_i/N < low_mass and c_i lies inside the range of influence of any other cluster, calculate the distance of c_i to the surface of all other clusters and select the closest center c_win.
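
The two checks above could be combined as in the following sketch. Several details are assumptions introduced for illustration: ranges of influence are modeled as axis-parallel ellipsoids, the default `low_mass=0.05` is arbitrary, outlier clusters are not also tested as satellites, and all function and parameter names are hypothetical.

```python
import math

def _scaled(x, center, ranges):
    # Ellipsoidal norm of x relative to a cluster; a value below 1 means "inside".
    return math.sqrt(sum(((a - c) / r) ** 2 for a, c, r in zip(x, center, ranges)))

def find_outliers_and_satellites(centers, counts, ranges,
                                 low_mass=0.05, outlier_mass=0.01):
    """Return (outlier indices, {satellite index: index of closest cluster})."""
    n = sum(counts)
    outliers, satellites = [], {}
    for i, (ci, ki) in enumerate(zip(centers, counts)):
        mass = ki / n
        if mass < outlier_mass:             # k_i / N < 1%: outlier cluster
            outliers.append(i)
            continue
        if mass >= low_mass:                # big enough: neither outlier nor satellite
            continue
        inside_any, win, best = False, None, math.inf
        for j, (cj, rj) in enumerate(zip(centers, ranges)):
            if j == i:
                continue
            s = _scaled(ci, cj, rj)
            inside_any = inside_any or s < 1.0
            # signed distance of c_i to the surface of cluster j (negative inside)
            dist = -math.inf if s == 0.0 else (1.0 - 1.0 / s) * math.dist(ci, cj)
            if dist < best:
                win, best = j, dist
        if inside_any:                      # small and inside another cluster
            satellites[i] = win
    return outliers, satellites

centers = [(0, 0), (0.5, 0), (10, 10), (50, 50)]
counts = [600, 30, 365, 5]
ranges = [(2, 2), (0.2, 0.2), (2, 2), (1, 1)]
outliers, satellites = find_outliers_and_satellites(centers, counts, ranges)
```

In this toy setup the cluster with 5 of 1000 samples is flagged as an outlier, while the small cluster sitting inside the range of the largest one is reported as its satellite.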
A split-and-merge strategy
Parameter ρ
- Cannot be known in advance, and a bad setting may cause an incorrect cluster structure.
Non-optimal clustering
- Prevented by merging clusters that have grown together or by splitting big clusters that contain more than one distinct data cloud.
- The quality of the cluster partition is calculated in three phases: before the split, after the split (p results), and after the merge. The best cluster partition then replaces the existing one.
Evaluation
Conclusion and outlook
A new extended vector quantization (VQ-INC-EXT)
- Can be applied to data streams in fast online applications or to huge databases.
- Provides an incremental learning scheme and incorporates a new distance measure, satellite deletion, and an online split-and-merge strategy.
Outlook
- The split-and-merge strategy may suffer from low computation speed.
- Reacting to drifts or shifts in the data: drifts change the distribution of the underlying data smoothly over time; shifts trigger abrupt, sudden changes in the data characteristics.
Comments
Advantage
- The proposed method extends VQ to an incremental learning VQ and, at the same time, adds several strategies that improve the quality of the cluster partition.
- Data streams can be processed effectively by this online learning VQ.
Drawback
- In Algorithm 3, the vector of the winning cluster is updated by Eq. (1) according to the Manhattan distance between the winning cluster and the input whenever the new distance strategy is applied.
Application
- Online learning from data streams.