Special Topics in Learning
Reinaldo Bianchi
Centro Universitário da FEI, 2012
Slide 2
Lecture 4, Part B
Slide 3
The K-means algorithm
Slide 4
K-Means
- A well-known algorithm for clustering patterns.
- Used when the number of clusters can be defined in advance:
  - Choose the desired number of clusters.
  - Choose the cluster centers and members so as to minimize the error.
  - This cannot be done by search: there are too many parameters.
Slide 5
K-Means
- Algorithm:
  1. Fix the cluster centers.
  2. Assign each point to the nearest cluster.
  3. Recompute each cluster center as the mean of the points it represents.
  4. Repeat steps 2 and 3 until the centers stop moving.
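The loop described on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used in the lecture: the function name `kmeans`, the choice of sampling the initial centers from the data, the `max_iter` cap, and the handling of empty clusters are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups."""
    rng = random.Random(seed)
    # Assumption: the initial centers are k points sampled from the data.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of the points it owns.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Stop when the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(data, k=2)
print(centers, labels)
```

On this toy data the two nearby pairs end up in separate clusters; which cluster gets index 0 depends on the random initialisation.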
Slide 6
K-Means
- Can be used with any attribute for which a distance can be computed.
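One way to read this slide is that the assignment step only needs a distance function, so the function can be passed in as a parameter. The sketch below is illustrative: `nearest_center` and `manhattan` are hypothetical names, and the Manhattan distance is just one possible choice. Note that the update step additionally needs a meaningful "mean" for the attribute, which this sketch does not address.

```python
def nearest_center(point, centers, distance):
    """Index of the center closest to point under the given distance function."""
    return min(range(len(centers)), key=lambda j: distance(point, centers[j]))


def manhattan(a, b):
    """Sum of absolute coordinate differences (an example distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


centers = [(0, 0), (5, 5)]
print(nearest_center((1, 1), centers, manhattan))
print(nearest_center((4, 4), centers, manhattan))
```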
Slide 7
Clustering
- Partitioning clustering approach:
  - A typical approach to cluster analysis that works by partitioning the data set.
  - Iteratively construct a partition of the data set to produce several non-empty clusters (usually, the number of clusters is given in advance).
  - In principle, the partition is obtained by minimising the sum of squared distances within each cluster.
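The criterion mentioned above, the sum of squared distances within each cluster, can be written as a short function. This is a sketch under the assumption that distances are Euclidean and that each cluster is summarised by a center; the name `within_cluster_sse` is made up for the example.

```python
import math


def within_cluster_sse(points, labels, centers):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    return sum(
        math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels)
    )


points = [(0, 0), (2, 0), (10, 0)]
labels = [0, 0, 1]
centers = [(1, 0), (10, 0)]
print(within_cluster_sse(points, labels, centers))
```

A lower value means the points sit closer to the centers of the clusters they were assigned to.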
Slide 8
Clustering
- Given K, find a partition into K clusters that optimises the chosen partitioning criterion:
  - Global optimum: exhaustively enumerate all partitions.
  - Heuristic method: the K-means algorithm (MacQueen67), in which each cluster is represented by its center and the algorithm converges to stable cluster centers.
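To get a feeling for why exhaustive enumeration is rarely practical, one can count the partitions. The number of ways to split n labelled objects into K non-empty clusters is the Stirling number of the second kind; the sketch below computes it with the standard recurrence (the function name `stirling2` is an assumption of this example).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Number of ways to split n labelled objects into k non-empty clusters."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    # The last object either forms its own cluster or joins one of k existing ones.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


print(stirling2(4, 2))
print(stirling2(10, 3))
print(stirling2(100, 5))
```

For 4 objects and K=2 there are only 7 partitions, but the count grows very quickly with n, which is what motivates a heuristic such as K-means.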
Slide 9
Algorithm
Given the cluster number K, the K-means algorithm is carried out in three steps:
- Initialisation: set seed points.
1) Assign each object to the cluster with the nearest seed point.
2) Compute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., mean point, of the cluster).
3) Go back to Step 1); stop when there are no new assignments.
Slide 10
Example
- Suppose we have 4 types of medicine, each with two attributes: pH and weight index.
- Our goal is to group these objects into K=2 groups of medicines.
Slide 11
Example
Medicine   Weight   pH-Index
A          1        1
B          2        1
C          4        3
D          5        4
Slide 12
Step 1: Use initial seed points for partitioning
- Assign each object to the cluster with the nearest seed point (Euclidean distance).
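The assignment in Step 1 can be reproduced with a few lines of Python, using the (weight, pH-index) values of the four medicines. The seed points are not stated in the text of the slides, so this sketch assumes, purely for illustration, that medicines A and B are used as the two seeds.

```python
import math

# (weight, pH-index) of each medicine, taken from the example table.
medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
# Assumption: the seeds are not given in the text; A and B are used here.
seeds = [medicines["A"], medicines["B"]]

assignment = {}
for name, point in medicines.items():
    # Euclidean distance from this medicine to each seed point.
    distances = [math.dist(point, s) for s in seeds]
    assignment[name] = distances.index(min(distances))

print(assignment)
```

Under this assumption, A stays alone in cluster 0 while B, C and D fall into cluster 1.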
Slide 13
Step 2: Compute new centroids of the current partition
- Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
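The centroid update is a coordinate-wise mean. As a small sketch, assuming for illustration that one cluster currently holds medicines B, C and D with (weight, pH-index) values (2, 1), (4, 3) and (5, 4), the new centroid could be computed as follows (the helper name `centroid` is hypothetical):

```python
def centroid(points):
    """Mean point of a non-empty list of equal-length numeric tuples."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


# Assumption: this cluster's members are B, C and D.
print(centroid([(2, 1), (4, 3), (5, 4)]))
```

The result is the point (11/3, 8/3), roughly (3.67, 2.67).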
Slide 14
Step 2: Renew membership based on new centroids
- Compute the distance of all objects to the new centroids.
- Assign memberships to the objects accordingly.
Slide 15
Step 3: Repeat the first two steps until convergence
- Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
Slide 16
Repeat the first two steps until convergence
- Compute the distance of all objects to the new centroids.
- Stop, because there are no new assignments.
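The whole medicine example can be run to convergence with a short script. This is a sketch rather than the lecture's own code: the seed points are not stated in the text, so medicines A and B are assumed as initial seeds, and the stopping rule is the one from the slides (no object changes cluster).

```python
import math

# (weight, pH-index) of each medicine, taken from the example table.
points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
# Assumption: the seeds are not given in the text; A and B are used here.
centers = [points["A"], points["B"]]
labels = None

while True:
    # Assign every object to the nearest current centroid.
    new_labels = {
        name: min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
        for name, p in points.items()
    }
    # Stop when no object changes cluster.
    if new_labels == labels:
        break
    labels = new_labels
    # Recompute each centroid as the mean of its members.
    for j in range(len(centers)):
        members = [points[n] for n, lab in labels.items() if lab == j]
        if members:
            centers[j] = tuple(sum(c) / len(members) for c in zip(*members))

print(labels)
print(centers)
```

With these assumed seeds the script ends with A and B in one group and C and D in the other, with centroids (1.5, 1.0) and (4.5, 3.5).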
Slide 17
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
Slide 18
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
2. Randomly guess K cluster centre locations.
Slide 19
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
2. Randomly guess K cluster centre locations.
3. Each data point finds out which centre it is closest to. (Thus each centre "owns" a set of data points.)
Slide 20
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
2. Randomly guess K cluster centre locations.
3. Each data point finds out which centre it is closest to. (Thus each centre "owns" a set of data points.)
4. Each centre finds the centroid of the points it owns...
Slide 21
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
2. Randomly guess K cluster centre locations.
3. Each data point finds out which centre it is closest to. (Thus each centre "owns" a set of data points.)
4. Each centre finds the centroid of the points it owns...
5. ...and jumps there.
Slide 22
K-means Demo
1. The user sets the desired number of clusters (e.g. K=5).
2. Randomly guess K cluster centre locations.
3. Each data point finds out which centre it is closest to. (Thus each centre "owns" a set of data points.)
4. Each centre finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated!
Slide 23
K-means example in Matlab
Slide 24
K-means example on the iPad
Slide 25
Relevant Issues
- Efficient in computation: O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t