GRID-BASED METHOD & MODEL-BASED CLUSTERING METHOD
 INTRODUCTION
 STING
 WAVECLUSTER
 CLIQUE (Clustering In QUEst)
 FAST PROCESSING TIME
 The grid-based clustering approach uses a multi-resolution grid data structure.
 The object space is quantized into a finite number of cells that form a grid structure.
 The major advantage of this method is fast
processing time.
 Processing time depends only on the number of cells in each dimension of the quantized space, not on the number of data objects.
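To make the quantization step concrete, here is a minimal sketch in Python with numpy; the 2-D data, the 10-cells-per-dimension setting, and all variable names are illustrative choices, not part of any specific algorithm.

```python
import numpy as np

# A minimal sketch of grid quantization: map each object (point) to a cell
# and count how many objects fall into each cell. The 10-cells-per-dimension
# value is an arbitrary illustration.
rng = np.random.default_rng(0)
points = rng.normal(size=(1000, 2))          # 1000 objects in a 2-D space

cells_per_dim = 10
counts, edges = np.histogramdd(points, bins=cells_per_dim)

# All further processing works on the 10x10 grid of counts, so its cost
# depends on the number of cells, not on the number of objects.
print(counts.shape)        # (10, 10)
print(int(counts.sum()))   # 1000
```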
 STING: STatistical INformation Grid.
 The spatial area is divided into rectangular cells.
 There are several levels of cells, at different levels of resolution.
 Each high-level cell is partitioned into several lower-level cells.
 Statistical attributes (mean, maximum, minimum) are stored in each cell.
 Computation is query-independent.
 Parallel processing is supported.
 Data is processed in a single pass.
 Quality depends on the granularity of the lowest level of the grid.
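The following is a rough illustration of the STING idea in Python with numpy: statistical attributes are computed once per low-level cell in a single pass over the data, then aggregated bottom-up into higher-level cells. The 8x8 and 4x4 grid sizes, the choice of summarized value, and all names are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.uniform(0, 1, size=(5000, 2))

# Low level: an 8x8 grid; each cell stores count, sum, min, max of a value.
# Here the "value" is just the x-coordinate, purely for illustration.
lo = 8
idx = np.minimum((points * lo).astype(int), lo - 1)
count = np.zeros((lo, lo))
total = np.zeros((lo, lo))
vmin = np.full((lo, lo), np.inf)
vmax = np.full((lo, lo), -np.inf)
for (i, j), v in zip(idx, points[:, 0]):
    count[i, j] += 1
    total[i, j] += v
    vmin[i, j] = min(vmin[i, j], v)
    vmax[i, j] = max(vmax[i, j], v)

# High level: each 4x4 cell aggregates a 2x2 block of low-level cells,
# so the hierarchy is built bottom-up without touching the data again.
hi_count = count.reshape(4, 2, 4, 2).sum(axis=(1, 3))
hi_total = total.reshape(4, 2, 4, 2).sum(axis=(1, 3))
hi_mean = hi_total / np.maximum(hi_count, 1)
hi_min = vmin.reshape(4, 2, 4, 2).min(axis=(1, 3))
hi_max = vmax.reshape(4, 2, 4, 2).max(axis=(1, 3))
print(hi_count.shape, hi_mean.shape)  # (4, 4) (4, 4)
```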
WAVECLUSTER
 A multi-resolution clustering approach that applies a wavelet transform to the feature space.
 A wavelet transform is a signal-processing technique that decomposes a signal into different frequency sub-bands.
 Both grid-based and density-based
 Input parameters:
 the number of cells for each dimension
 the wavelet, and the number of applications of the wavelet transform.
 Complexity: O(N), where N is the number of data objects.
 Detects arbitrarily shaped clusters at different scales.
 Not sensitive to noise, not sensitive to input order.
 Only applicable to low-dimensional data.
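A hedged sketch of the WaveCluster pipeline, assuming numpy, PyWavelets (pywt), and scipy are available: quantize the points into a grid, apply one 2-D wavelet transform, and label connected significant cells in the approximation sub-band as clusters. The grid size, the db2 wavelet, and the mean-plus-one-standard-deviation threshold are illustrative choices, not prescribed by the method.

```python
import numpy as np
import pywt
from scipy import ndimage

rng = np.random.default_rng(2)
# Two blobs plus uniform noise in the unit square.
pts = np.vstack([
    rng.normal([0.25, 0.25], 0.05, size=(500, 2)),
    rng.normal([0.75, 0.75], 0.05, size=(500, 2)),
    rng.uniform(0, 1, size=(100, 2)),
])

# Step 1: quantize the feature space into a 64x64 grid of counts.
grid, _ = np.histogramdd(np.clip(pts, 0, 0.999), bins=64,
                         range=[(0, 1), (0, 1)])

# Step 2: one application of a 2-D wavelet transform; the approximation
# sub-band is a smoothed, half-resolution view of the grid.
cA, (cH, cV, cD) = pywt.dwt2(grid, 'db2')

# Step 3: keep "significant" cells in the approximation sub-band and
# label connected components as clusters. The threshold is an assumption.
dense = cA > cA.mean() + cA.std()
labels, n_clusters = ndimage.label(dense)
print(n_clusters)  # the two blobs should emerge; the noise is suppressed
```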
CLIQUE (Clustering In QUEst) can be considered both density-based and grid-based:
1. It partitions each dimension into the same number of equal-length intervals.
2. It partitions an m-dimensional data space into non-overlapping rectangular units.
3. A unit is dense if the fraction of total data points contained in the unit exceeds an input model parameter (the density threshold).
4. A cluster is a maximal set of connected dense units within a subspace (see the sketch after this list).
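A minimal 2-D sketch of steps 1-4, assuming numpy and scipy; the values xi = 10 and tau = 0.02 stand in for the two input parameters and are arbitrary. The real algorithm also searches subspaces of the m dimensions, which this sketch omits.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(3)
pts = np.vstack([rng.normal([2, 2], 0.3, size=(300, 2)),
                 rng.normal([7, 7], 0.3, size=(300, 2))])

xi, tau = 10, 0.02     # xi intervals per dimension, density threshold tau

# Steps 1-2: equal-length intervals per dimension -> rectangular units.
units, _ = np.histogramdd(pts, bins=xi)

# Step 3: a unit is dense if its fraction of all points exceeds tau.
dense = units / len(pts) > tau

# Step 4: clusters = maximal sets of connected dense units.
labels, n_clusters = ndimage.label(dense)
print(n_clusters)  # expected: 2, one cluster per blob
```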
MODEL-BASED CLUSTERING
 Attempts to optimize the fit between the data and some mathematical model.
 ASSUMPTION: the data are generated by a mixture of underlying probability distributions.
 TECHNIQUES:
 Expectation-maximization
 Conceptual clustering
 Neural network approach
EXPECTATION-MAXIMIZATION (EM)
 An iterative refinement algorithm used to find parameter estimates.
 An extension of k-means:
 Assigns an object to a cluster according to a weight representing its probability of membership.
 Starts from an initial estimate of the parameters.
 Iteratively reassigns the membership scores and re-estimates the parameters (see the sketch below).
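A minimal sketch of EM for a two-component 1-D Gaussian mixture in numpy, illustrating the weighted (soft) assignment and the iterative re-estimation; the synthetic data, initial values, and the fixed 50 iterations are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 300)])

# Initial estimates of the parameters (means, variances, mixing weights).
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])
w = np.array([0.5, 0.5])

for _ in range(50):
    # E-step: each object's weight of membership in each cluster
    # (the soft analogue of k-means' hard assignment).
    dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
             / np.sqrt(2 * np.pi * var)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the weighted objects.
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    w = nk / len(x)

print(np.round(mu, 2))  # means converge near the true values 0 and 5
```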
CONCEPTUAL CLUSTERING
 A form of clustering in machine learning.
 Produces a classification scheme for a set of unlabeled objects.
 Finds a characteristic description for each concept.
 COBWEB
 A popular and simple method of incremental
conceptual learning.
 Creates a hierarchical clustering in the form of a
classification tree.
[COBWEB classification tree from the slide figure:]
Animal: P(C0)=1.0, P(scales|C0)=0.25
├── Fish: P(C1)=0.25, P(scales|C1)=1.0
├── Amphibian: P(C2)=0.25, P(moist|C2)=1.0
└── Mammal/Bird: P(C3)=0.5, P(hair|C3)=0.5
    ├── Mammal: P(C4)=0.5, P(hair|C4)=1.0
    └── Bird: P(C5)=0.5, P(feathers|C5)=1.0
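COBWEB decides where an object fits in the tree by maximizing a score called category utility. The following is a simplified sketch of that score, assuming nominal attributes stored as Python dicts; the full incremental algorithm (with its insert, merge, and split operators) is not shown.

```python
from collections import Counter

def category_utility(clusters):
    """Category utility of a partition; each cluster is a list of
    objects, each object a dict of attribute -> value."""
    objects = [o for c in clusters for o in c]
    n = len(objects)
    attrs = {a for o in objects for a in o}

    def sq_sum(objs):
        # Sum over attributes and values of P(attribute = value)^2.
        total = 0.0
        for a in attrs:
            counts = Counter(o[a] for o in objs)
            total += sum((c / len(objs)) ** 2 for c in counts.values())
        return total

    base = sq_sum(objects)
    gain = sum(len(c) / n * (sq_sum(c) - base) for c in clusters)
    return gain / len(clusters)

# Fish vs. amphibian, matching the tree above.
fish = [{"scales": "yes", "moist": "no"}] * 2
amph = [{"scales": "no", "moist": "yes"}] * 2
print(category_utility([fish, amph]))  # > 0: the split is informative
```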
NEURAL NETWORK APPROACH
 Represents each cluster as an exemplar, acting as a "prototype" of the cluster.
 New objects are distributed to the cluster whose
exemplar is the most similar according to some
distance measure.
SELF-ORGANIZING MAP (SOM)
 Competitive learning
 Involves a hierarchical architecture of several
units
 The organization of the units forms a feature map.
 Applied to Web document clustering.
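A minimal sketch of SOM-style competitive learning in numpy: for each input, the best-matching unit wins and drags its map neighbours toward the input, which organizes the units into a feature map. The 5x5 map, learning rate, neighbourhood width, and decay schedule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.uniform(0, 1, size=(2000, 2))      # inputs in the unit square

# A small 5x5 map of units, each with a weight vector in input space.
map_shape = (5, 5)
weights = rng.uniform(0, 1, size=map_shape + (2,))
coords = np.stack(np.meshgrid(range(5), range(5), indexing="ij"), axis=-1)

lr, sigma = 0.5, 2.0
for t, x in enumerate(data):
    # Competitive step: the best-matching unit (BMU) wins.
    d = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(d.argmin(), map_shape)

    # Cooperative step: the winner and its map neighbours move toward x;
    # this neighbourhood update is what organizes the feature map.
    decay = 1 - t / len(data)
    dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=-1)
    h = np.exp(-dist2 / (2 * (sigma * decay + 1e-3) ** 2))
    weights += (lr * decay) * h[..., None] * (x - weights)
```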
FEATURE TRANSFORMATION METHODS
 PCA, SVD: summarize the data by creating linear combinations of attributes.
 But they do not remove any attributes; the transformed attributes can be complex to interpret.
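A short sketch of what "linear combinations of attributes" means in practice, using numpy's SVD; the data and the choice of two components are arbitrary. Each derived attribute mixes all five original ones, which is why the transformed attributes are harder to interpret.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 5))                 # 200 objects, 5 attributes

# PCA via SVD: centre, decompose, project onto the top-2 directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T          # each new attribute combines all 5 originals
print(Z.shape)             # (200, 2)
```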
FEATURE SELECTION METHODS
 Selects the most relevant subset of attributes with respect to the class labels.
 Entropy analysis.
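A minimal sketch of entropy-based attribute relevance, assuming nominal attributes and class labels as Python lists; information_gain is a hypothetical helper written for this illustration, not a named library function.

```python
import numpy as np
from collections import Counter

def entropy(labels):
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(attribute, labels):
    # Entropy of the class labels minus the expected entropy after
    # splitting on the attribute: higher gain = more relevant attribute.
    h = entropy(labels)
    n = len(labels)
    for v in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == v]
        h -= len(subset) / n * entropy(subset)
    return h

color = ["r", "r", "b", "b"]       # perfectly predicts the class
noise = ["x", "y", "x", "y"]       # unrelated to the class
label = ["pos", "pos", "neg", "neg"]
print(information_gain(color, label))  # 1.0
print(information_gain(noise, label))  # 0.0
```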
