Fuzzy Clustering
Presenter: Aydin Ayanzadeh
Email: ayanzadeh17@itu.edu.tr
StudentID: 504161503
Outline
● Clustering
● Goals of Clustering
● Clustering Application
● K-means
● C-means
● Fuzzy Clustering Application
● Iris dataset segmentation
● KFCM
Clustering
● Unsupervised learning
Goals of Clustering
● Finding representatives for homogeneous groups (data reduction).
● Finding “natural clusters” and describing their unknown properties (“natural” data types).
● Finding useful and suitable groupings (“useful” data classes).
● Finding unusual data objects (outlier detection).
Possible Applications
● Marketing
● Biology
● City-planning
● Earthquake studies
K-means
Running K-means produces two results:
1. The centroids of the K clusters, which can be used to label new data.
2. Labels for the training data (each data point is assigned to a single cluster).
K-means Algorithm
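The algorithm figure for this slide is not in the text, so the following is only a minimal sketch of the standard K-means loop (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name `kmeans` and its parameters are illustrative assumptions, not taken from the slides.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Random initial centroids, drawn from the data points themselves.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # nothing moved, so the assignment is stable
        centroids = new_centroids
    return centroids, labels
```

The two return values correspond to the two results of K-means: the centroids and a single hard label per training point.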
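Unlucky Centroids
● Poorly chosen random initial centroids can make K-means converge to a bad local optimum.
Solutions to this problem:
● Distribute the initial centroids over the data space.
● Try several different sets of random centroids and keep the best result.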
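One possible way to "distribute" the initial centroids is a farthest-first heuristic: start from one random data point, then repeatedly add the point that lies farthest from all centroids chosen so far. This is an assumed illustration (the slides do not say which spreading strategy was meant), and `spread_init` is a hypothetical helper name.

```python
import math
import random


def spread_init(points, k, seed=0):
    """Pick k initial centroids that are spread out over the data (sketch)."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: a random data point
    while len(centroids) < k:
        # Choose the point whose nearest already-chosen centroid is farthest away.
        farthest = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(farthest)
    return centroids
```

The heuristic is sensitive to outliers, since an isolated point is always "far away", so it is usually combined with the second remedy of trying several random starts.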
Fuzzy c-means clustering (FCM)
Fuzzy c-means (FCM) is a clustering method that
allows one data point to belong to two or more
clusters. It is frequently used in pattern
recognition and is based on minimizing an
objective function.
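The formula itself did not survive in the slide text. In its standard form, as given by Bezdek, the FCM objective can be written as below. The symbols are notation assumed here: N data points x_i, C cluster centres c_j, membership degree u_ij of point i in cluster j, and fuzziness coefficient m.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad 1 < m < \infty,
\qquad \text{subject to } \sum_{j=1}^{C} u_{ij} = 1 \text{ for every } i
```

As m approaches 1 the memberships become nearly hard (0 or 1), which approaches K-means behaviour; larger m gives fuzzier clusters.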
Fuzzy c-Means Algorithm
● Iterations
● Degree of membership
● Fuzziness coefficient
● Termination condition
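The four ingredients above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard FCM updates, not the presenter's implementation: the names `fuzzy_c_means`, `m` (fuzziness coefficient), `eps` (termination threshold) and `max_iter` are assumptions.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard fuzzy c-means on a list of equal-length tuples (sketch)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Degree of membership: random start, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(max_iter):  # iterations
        # Centre update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)
            ))
        # Membership update from the relative distances to all centres.
        new_u = []
        for p in points:
            dist = [math.dist(p, ctr) for ctr in centers]
            if min(dist) == 0.0:
                # The point sits exactly on a centre: full membership there.
                zero = dist.index(0.0)
                row = [1.0 if j == zero else 0.0 for j in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dist]
                s = sum(inv)
                row = [v / s for v in inv]
            new_u.append(row)
        # Termination condition: largest membership change below eps.
        change = max(
            abs(a - b) for ra, rb in zip(new_u, u) for a, b in zip(ra, rb)
        )
        u = new_u
        if change < eps:
            break
    return centers, u
```

A hard labelling, if one is needed, can be obtained by taking the cluster with the highest membership in each row of `u`.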
Pros and Cons of Fuzzy C-Means
● Advantages
○ Unsupervised
○ Always converges (though possibly only to a local minimum)
● Disadvantages
○ Long computational time
○ Sensitivity to the initial guess (speed, local minima)
○ Sensitivity to noise: one expects low (or even no) membership degree for outliers (noisy points), but FCM does not guarantee this
Fuzzy C-Means Application
Fuzzy C-Means Clustering for Iris Data
● 4-dimensional data: sepal length, sepal width, petal length, and petal width
● Three species: setosa, versicolor and virginica
● Number of clusters
● Number of iterations
Image Segmentation (KFCM)
● FCM is not robust on noisy images
● It lacks local (spatial) information about image pixels
● Spatial penalty
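The slides do not show the KFCM formulas. A common kernelized formulation, assumed here, replaces the squared Euclidean distance in FCM with the distance induced by a Gaussian kernel K(x, v) = exp(-||x - v||² / σ²), which gives 2(1 - K(x, v)). The helper name `kernel_distance_sq` and the exact kernel-width convention are assumptions for illustration.

```python
import math


def kernel_distance_sq(x, v, sigma):
    """Squared distance induced by a Gaussian kernel (assumed KFCM form)."""
    sq_euclid = sum((a - b) ** 2 for a, b in zip(x, v))
    k = math.exp(-sq_euclid / sigma ** 2)  # Gaussian kernel value in (0, 1]
    return 2.0 * (1.0 - k)                 # bounded between 0 and 2


# A nearby pixel value and a far outlier, measured against centre (0.0,):
near = kernel_distance_sq((0.5,), (0.0,), sigma=1.0)
far = kernel_distance_sq((50.0,), (0.0,), sigma=1.0)
```

Because this distance never exceeds 2, a noisy pixel far from every centre cannot dominate the objective the way it can with the unbounded Euclidean distance. The spatial penalty is a separate term that takes the memberships of neighbouring pixels into account; it is not sketched here.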
Pre-processing Step
SKFCM Algorithm

Editor's Notes

  • #6
    • Marketing: finding groups of customers with similar behavior, given a large database of customer data containing their properties and past buying records.
    • Biology: classification of plants and animals given their features.
    • Libraries: book ordering.
    • Insurance: identifying groups of motor insurance policy holders with a high average claim cost; identifying fraud.
    • City-planning: identifying groups of houses according to their house type, value and geographical location.
    • Earthquake studies: clustering observed earthquake epicenters to identify dangerous zones.
    • WWW: document classification; clustering weblog data to discover groups of similar access patterns.