Special Topics in Learning (Tópicos Especiais em Aprendizagem)
Reinaldo Bianchi, Centro Universitário da FEI, 2012

Lecture 4, Part B

The K-means algorithm

K-Means
- A well-known algorithm for clustering patterns.
- Used when the number of clusters can be defined in advance:
  - Choose the desired number of clusters.
  - Choose the centers and members of the clusters so as to minimize the error.
  - This cannot be done by search: there are too many parameters.

K-Means
- Algorithm:
  1. Fix the cluster centers.
  2. Assign each point to the nearest cluster.
  3. Recompute each cluster center as the mean of the points it represents.
  4. Repeat steps 2 and 3 until the centers stop moving.
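The loop above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, the `max_iter` safety cap, and the rule that an empty cluster keeps its previous center are all assumptions not stated in the slides.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain K-means on a list of coordinate tuples.

    `centers` holds the initial cluster centers; its length is K.
    Returns the final centers and the cluster index of each point.
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):  # max_iter is a safety cap (assumption)
        # Assign every point to its nearest center (Euclidean distance).
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Recompute each center as the mean of the points assigned to it.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centers.append(old)  # assumption: empty cluster keeps its center
        # Stop once the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Usage on four made-up points with two hand-picked initial centers.
final_centers, final_labels = kmeans(
    [(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]
)
```

The stopping test compares centers for exact equality, which is enough for a sketch; a practical version would more likely stop when the movement falls below a small tolerance.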

K-Means
- Can be used with any attribute for which a distance can be computed.

Clustering
- Partitioning clustering approach:
  - A typical clustering analysis approach that partitions the data set iteratively.
  - Construct a partition of the data set into several non-empty clusters (usually the number of clusters is given in advance).
  - In principle, the partition is obtained by minimising the sum of squared distances within each cluster.
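Written out, the sum-of-squared-distances criterion takes the following form. The notation is an assumption introduced here, not taken from the slides: $C_k$ is the set of objects in cluster $k$, $m_k$ is its mean, and $x$ ranges over the objects.

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - m_k \rVert^{2},
\qquad
m_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

A smaller $J$ means the objects lie closer to the mean of their own cluster.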

Clustering
- Given K, find a partition into K clusters that optimises the chosen partitioning criterion:
  - Global optimum: exhaustively enumerate all partitions.
  - Heuristic method: the K-means algorithm (MacQueen67), in which each cluster is represented by the center of the cluster and the algorithm converges to stable cluster centers.
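To get a feel for why exhaustive enumeration is rarely practical, one can count the partitions it would have to examine. The sketch below uses the standard recurrence for Stirling numbers of the second kind; the function name `count_partitions` is made up for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_partitions(n, k):
    """Number of ways to split n distinct objects into k non-empty clusters
    (Stirling number of the second kind)."""
    if n == k:
        return 1  # one cluster per object (also covers n == k == 0)
    if k == 0 or k > n:
        return 0
    # The n-th object either forms a cluster of its own
    # or joins one of the k clusters built from the other n - 1 objects.
    return count_partitions(n - 1, k - 1) + k * count_partitions(n - 1, k)

small = count_partitions(4, 2)    # 4 objects, K=2
larger = count_partitions(10, 3)  # 10 objects, K=3
```

Four objects and K=2 give only 7 candidate partitions, while ten objects and K=3 already give 9330, and the count keeps growing rapidly with n.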

Algorithm
Given the cluster number K, the K-means algorithm is carried out in three steps:
- Initialisation: set the seed points.
  1. Assign each object to the cluster with the nearest seed point.
  2. Compute the seed points as the centroids of the clusters of the current partition (the centroid is the centre, i.e., the mean point, of the cluster).
  3. Go back to Step 1; stop when there are no more new assignments.

Example
- Suppose we have 4 types of medicine, each with two attributes: pH and weight index.
- Our goal is to group these objects into K=2 groups of medicine.

Example

Medicine  Weight  pH-Index
A         1       1
B         2       1
C         4       3
D         5       4

Step 1: Use initial seed points for partitioning
Assign each object to the cluster with the nearest seed point, measured by Euclidean distance.
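As a rough illustration of this assignment step on the medicine data, the sketch below computes Euclidean distances and picks the nearest seed for each object. The text does not preserve which seed points the slide used, so choosing medicines A and B as seeds is an assumption, and the helper name `assign` is made up.

```python
import math

# Medicine data from the example table: name -> (weight, pH-index).
medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

# Assumption: medicines A and B serve as the two initial seed points.
seeds = [medicines["A"], medicines["B"]]

def assign(points, centers):
    """Map each object name to the index of its nearest center."""
    return {
        name: min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for name, p in points.items()
    }

membership = assign(medicines, seeds)
print(membership)
```

With these assumed seeds, A stays alone in the first cluster and B, C and D fall into the second, because C and D are both closer to B than to A.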

Step 2: Compute new centroids of the current partition
Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.
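The centroid update is just a per-attribute mean over the members of each cluster. The sketch below assumes a membership in which A is alone in cluster 0 and B, C, D share cluster 1 (what one obtains if A and B are taken as seeds, itself an assumption); the helper name `centroids` is made up.

```python
# Medicine data from the example table: name -> (weight, pH-index).
medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}

# Assumed membership after the first assignment step.
membership = {"A": 0, "B": 1, "C": 1, "D": 1}

def centroids(points, membership, k):
    """Return the mean point of each of the k clusters.

    Assumes every cluster has at least one member.
    """
    result = []
    for c in range(k):
        members = [points[name] for name, lab in membership.items() if lab == c]
        # zip(*members) groups the values attribute by attribute.
        result.append(tuple(sum(x) / len(members) for x in zip(*members)))
    return result

new_centers = centroids(medicines, membership, 2)
print(new_centers)
```

Under this membership the first centroid stays at (1, 1), and the second moves to the mean of B, C and D, that is (11/3, 8/3).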

Step 2: Renew membership based on new centroids
Compute the distance of all objects to the new centroids and assign the membership to objects.

Step 3: Repeat the first two steps until convergence
Knowing the members of each cluster, we now compute the new centroid of each group based on these new memberships.

Repeat the first two steps until convergence
Compute the distance of all objects to the new centroids. Stop, since no object receives a new assignment.
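Putting the steps together, the whole example can be traced in a short script that alternates assignment and centroid updates until the membership stops changing. As before, starting from medicines A and B as seeds is an assumption, since the text does not record the seeds used on the slide.

```python
import math

# Medicine data from the example table: name -> (weight, pH-index).
medicines = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centers = [medicines["A"], medicines["B"]]  # assumed seed points

membership = {}
iterations = 0
while True:
    # Assign each object to the nearest current center.
    new_membership = {
        name: min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for name, p in medicines.items()
    }
    if new_membership == membership:
        break  # no new assignment: converged
    membership = new_membership
    iterations += 1
    # Recompute each center as the mean of its members
    # (no cluster becomes empty in this small example).
    new_centers = []
    for k in range(len(centers)):
        group = [medicines[n] for n, lab in membership.items() if lab == k]
        new_centers.append(tuple(sum(x) / len(group) for x in zip(*group)))
    centers = new_centers
    print(iterations, membership, centers)
```

With these seeds the run settles after two updates: A and B form one group with centroid (1.5, 1.0), and C and D form the other with centroid (4.5, 3.5).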

K-means Demo
1. The user sets the number of clusters they'd like (e.g. K=5).
2. Randomly guess K cluster centre locations.
3. Each data point finds out which centre it's closest to (thus each centre owns a set of data points).
4. Each centre finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat until terminated!

K-means example in Matlab

K-means example on the iPad

Relevant Issues
- Efficient in computation: O(tKn), where n is the number of objects, K is the number of clusters, and t is the number of iterations. Normally, K, t
