K-MEANS CLUSTERING ALGORITHM
Outline:
• What is the K-means Clustering Algorithm?
• How Does the K-means Clustering Algorithm Work?
• Example.
What is the K-means Clustering Algorithm?
It is a clustering (concept formation) algorithm that groups
objects into K clusters based on their attributes/features.
How Does the K-means Clustering Algorithm Work?
Given a set of unlabeled data, the aim is to group the data into K clusters.
The algorithm iterates through the following steps:
FIRST ITERATION
1. Choose the number of clusters K.
2. Select an initial centroid for each of the K clusters. For example, if we have 3 clusters,
then we need to choose three centroids, typically at random (there are other ways to select
centroids).
3. Calculate the distance between each object and each centroid.
4. Assign each object to the group of its nearest centroid. For example, given centroids A and C
and an object B, if the distance between A and B is 0.5 and the distance between B and C is 1.5,
then B joins A's group because that distance is smaller.
5. Check whether any object moved from one group to another:
• If yes, go for another iteration (calculate the new centroids, then go to step 3).
• Otherwise, the algorithm stops and returns the formed clusters.
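The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the names `assign`, `update`, `kmeans` and the `max_iter` safety cap are hypothetical, the caller supplies the initial centroids (step 2), and a cluster that ends up empty simply keeps its old centroid, which is an assumption the slides do not cover.

```python
import math

def assign(points, centroids):
    """Steps 3-4: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Recompute each centroid as the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(axis) / len(members) for axis in zip(*members))
        )
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Iterate until no point changes group (step 5) or max_iter is hit."""
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # no object moved: stop
        labels = new_labels
        centroids = update(points, labels, centroids)
    return labels, centroids
```

One possible way to pick random initial centroids for step 2 is `random.sample(points, K)` from the standard library; other initialisation schemes exist.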
Remark for SECOND, THIRD, FOURTH, … ITERATIONS
Starting from the second iteration, the algorithm updates the centroids. By
now we have K groups, each containing some objects, so each new centroid is
calculated as the sum of the objects in the group divided by the number of
objects in the group (that is, the mean of the group, taken per attribute).
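Written as a formula, the centroid update might look as follows. The symbols are assumptions introduced here for illustration: S_k is the set of objects currently assigned to cluster k, |S_k| is its size, and x is an object's attribute vector.

```latex
c_k = \frac{1}{|S_k|} \sum_{x \in S_k} x, \qquad k = 1, \dots, K
```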
EXAMPLE:
STEP 1: Choose the number of clusters K
Each medicine represents a point in the graph. In this example, our goal is to group the objects into two clusters (K=2) based
on two attributes: weight index and pH.
Object        X: weight index    Y: pH
Medicine A    1                  1
Medicine B    2                  1
Medicine C    4                  3
Medicine D    5                  4
STEP 2: Choose centroids
Since we want to have two clusters, we need to
select two centroids. In this example, we chose points A
and B to be the centroids:
c1 = (1,1) cluster 1
c2 = (2,1) cluster 2
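STEP 3: Calculate the distance between each object and each centroid (Euclidean distance)
Euclidean Distance Formula:
D = sqrt((x1 - x2)^2 + (y1 - y2)^2)
D1 =
              A       B       C       D
c1 = (1,1)    0.00    1.00    3.61    5.00
c2 = (2,1)    1.00    0.00    2.83    4.24
STEP 4: Choose the minimum distance
A 1 marks the cluster whose centroid is nearest to the object.
G1 =
              A    B    C    D
cluster 1     1    0    0    0
cluster 2     0    1    1    1
In this 1st iteration: cluster 1 has only A; cluster 2 has B, C, and D.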
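The first-iteration distance and group matrices can be reproduced with a short script. This is a sketch under the example's own data (points A to D, centroids c1 = (1,1) and c2 = (2,1)); the variable names `D1` and `G1` mirror the slides, while `exact` and `nearest` are hypothetical helpers. It relies on `math.dist`, available in Python 3.8 and later.

```python
import math

points = {"A": (1, 1), "B": (2, 1), "C": (4, 3), "D": (5, 4)}
centroids = [(1, 1), (2, 1)]  # c1, c2 from STEP 2

# Unrounded Euclidean distances: one row per centroid, one column per object.
exact = [[math.dist(c, p) for p in points.values()] for c in centroids]

# D1 rounded to two decimals for display.
D1 = [[round(d, 2) for d in row] for row in exact]

# For each object (column), the row index of the nearest centroid.
nearest = [
    min(range(len(centroids)), key=lambda k: exact[k][j])
    for j in range(len(points))
]

# G1: 1 where the row's centroid is the nearest one, 0 otherwise.
G1 = [[1 if nearest[j] == k else 0 for j in range(len(points))]
      for k in range(len(centroids))]

print(D1)  # [[0.0, 1.0, 3.61, 5.0], [1.0, 0.0, 2.83, 4.24]]
print(G1)  # [[1, 0, 0, 0], [0, 1, 1, 1]]
```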
2nd iteration
Now we need to compute the new centroid of each cluster. Since
cluster 1 has only A, its centroid remains c1 = (1,1).
However, cluster 2 now has three members (B, C, and D), so we
calculate its centroid by taking the average of the three
members.
c2 = ((2+4+5)/3, (1+3+4)/3) = (3.67, 2.67)
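As a quick check of the averaging step, the centroid of B, C, and D can be computed per attribute. The helper name `centroid` is hypothetical and assumes a non-empty group.

```python
def centroid(members):
    """Mean of a non-empty group of points, taken attribute by attribute."""
    n = len(members)
    return tuple(sum(axis) / n for axis in zip(*members))

cluster2 = [(2, 1), (4, 3), (5, 4)]  # B, C, D
c2 = centroid(cluster2)
print(tuple(round(v, 2) for v in c2))  # (3.67, 2.67)
```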
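Calculate the distances for the 2nd iteration
D2 =
                     A       B       C       D
c1 = (1,1)           0.00    1.00    3.61    5.00
c2 = (3.67,2.67)     3.14    2.36    0.47    1.89
Choose the minimum distance
G2 =
              A    B    C    D
cluster 1     1    1    0    0
cluster 2     0    0    1    1
In this 2nd iteration: cluster 1 has A, B and cluster 2 has C, D.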
3rd iteration
We repeat the same steps. First, we compute the new centroid for
each cluster by taking the average.
c1 = ((1+2)/2, (1+1)/2) = (1.5, 1)
c2 = ((4+5)/2, (3+4)/2) = (4.5, 3.5)
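Calculate the distances for the 3rd iteration
D3 =
                   A       B       C       D
c1 = (1.5,1)       0.50    0.50    3.20    4.61
c2 = (4.5,3.5)     4.30    3.54    0.71    0.71
Choose the minimum distance
G3 =
              A    B    C    D
cluster 1     1    1    0    0
cluster 2     0    0    1    1
In this 3rd iteration: cluster 1 has A, B and cluster 2 has C, D.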
• We found that G3 = G2, which means that no object moved to
another group, so we stop the algorithm. We have now divided our
data into two clusters:
• Cluster 1: A, B
• Cluster 2: C, D
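The whole worked example can be replayed end to end to confirm that the grouping stops changing at the third iteration. This is an illustrative sketch only: the `history` list, the cap of 10 iterations, and the 1-based cluster numbering in `clusters` are assumptions made for this script, and it does not handle empty clusters (none occur with this data).

```python
import math

names = ["A", "B", "C", "D"]
points = [(1, 1), (2, 1), (4, 3), (5, 4)]
centroids = [(1, 1), (2, 1)]  # initial c1, c2 (points A and B)

history = []  # group index of each object, one entry per iteration
for _ in range(10):  # illustrative cap on the number of iterations
    labels = [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]
    history.append(labels)
    if len(history) > 1 and history[-1] == history[-2]:
        break  # same grouping as the previous iteration: stop
    groups = [
        [p for p, lab in zip(points, labels) if lab == k]
        for k in range(len(centroids))
    ]
    # New centroid = per-attribute mean of each group's members.
    centroids = [tuple(sum(axis) / len(g) for axis in zip(*g)) for g in groups]

clusters = {
    k + 1: [n for n, lab in zip(names, history[-1]) if lab == k]
    for k in range(len(centroids))
}
print(len(history), clusters)  # 3 {1: ['A', 'B'], 2: ['C', 'D']}
```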
