DATA MINING
MODULE 1
Topics Covered
Data Mining Task Primitive
Integration of Data Mining systems
Major issues in Data Mining
Data Mining Task Primitive
A data mining task can be specified in the form of a data
mining query, which is input to the data mining system.
A data mining query is defined in terms of data mining
task primitives.
These primitives allow the user to interactively
communicate with the data mining system during the
mining process to discover interesting patterns.
List of Data Mining Task Primitives
Set of task relevant data to be mined.
Kind of knowledge to be mined.
Background knowledge to be used in discovery process.
Interestingness measures and thresholds for pattern
evaluation.
Representation for visualizing the discovered patterns.
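As a rough sketch, the five primitives above can be viewed as the fields of one query object. The class name MiningQuery, its field names, and the example values are hypothetical and chosen only for illustration; they are not taken from any particular data mining system.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container holding the five task primitives."""
    relevant_data: dict                      # task-relevant data to be mined
    knowledge_kind: str                      # kind of knowledge to be mined
    background_knowledge: dict = field(default_factory=dict)
    thresholds: dict = field(default_factory=dict)   # interestingness thresholds
    presentation: str = "rules"              # how discovered patterns are displayed


# Illustrative values only.
query = MiningQuery(
    relevant_data={"country": "Canada", "attributes": ["age", "item"]},
    knowledge_kind="association",
    background_knowledge={"concept_hierarchy": "age"},
    thresholds={"min_support": 0.4, "min_confidence": 0.6},
    presentation="table",
)
```

Each field corresponds to one primitive in the list, so a user can change one part of the task (for example, the thresholds) without restating the rest.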
Set of task relevant data to be mined:
This specifies the portions of the database or the set of data
in which the user is interested.
This portion includes the following:
Database Attributes
Data Warehouse dimensions of interest
For example, suppose that you are a manager of All
Electronics in charge of sales in the United States and
Canada, and you would like to study the buying trends of
customers in Canada. Rather than mining the entire
database, you can specify that only the data relevant to
this task be retrieved. The attributes involved are
referred to as relevant attributes.
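A minimal sketch of selecting task-relevant data, assuming the sales records are available as a list of dictionaries. The records, the field names, and the helper select_task_relevant are made up for illustration.

```python
# Hypothetical sales records; field names and values are illustrative only.
sales = [
    {"customer": "C1", "country": "Canada", "age": 34, "item": "laptop", "price": 1200},
    {"customer": "C2", "country": "USA", "age": 51, "item": "printer", "price": 150},
    {"customer": "C3", "country": "Canada", "age": 27, "item": "camera", "price": 480},
]

# Attributes assumed to be relevant for studying buying trends.
RELEVANT_ATTRIBUTES = ("customer", "age", "item")


def select_task_relevant(records, country, attributes):
    """Keep only rows for the given country and only the relevant attributes."""
    return [
        {name: row[name] for name in attributes}
        for row in records
        if row["country"] == country
    ]


canada = select_task_relevant(sales, "Canada", RELEVANT_ATTRIBUTES)
```

Only the Canadian rows and the three chosen attributes are passed on to the mining step; the rest of the database is left untouched.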
Kind of knowledge to be mined
This specifies the data mining functions to be performed,
such as
Characterization & Discrimination
Association
Classification
Clustering
Prediction
Outlier analysis
For instance, if studying the buying habits of customers in
Canada, you may choose to mine associations between
customer profiles and the items that these customers like
to buy.
Background knowledge to be used in discovery process
Users can specify background knowledge, or knowledge
about the domain to be mined. This knowledge is useful
for guiding the knowledge discovery process, and for
evaluating the patterns found.
There are several kinds of background knowledge.
User beliefs about relationships in the data are one kind.
Concept hierarchies are a popular form of background
knowledge, which allow data to be mined at multiple
levels of abstraction.
An example of a concept hierarchy for the attribute (or
dimension) age is shown in the following figure. The root
node represents the most general abstraction level, denoted
as all.
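A small sketch of how a concept hierarchy for age could be used to view data at different abstraction levels. The group names and age ranges below are assumptions made for illustration, since the figure itself is not reproduced here; only the root level "all" comes from the text.

```python
# Assumed age groups; the ranges are illustrative, not taken from the figure.
AGE_HIERARCHY = [
    (range(20, 40), "youth"),
    (range(40, 60), "middle_aged"),
    (range(60, 90), "senior"),
]


def generalize_age(age, level):
    """Map a raw age to the requested abstraction level.

    level 0 -> "all" (the root), level 1 -> age group, level 2 -> raw value.
    """
    if level == 0:
        return "all"
    if level == 1:
        for bounds, label in AGE_HIERARCHY:
            if age in bounds:
                return label
        return "unknown"  # age falls outside the assumed ranges
    return age
```

Mining at level 1 groups customers into a few categories, while level 2 keeps the raw ages; moving toward the root gives more general patterns.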
Interestingness measures and thresholds for pattern evaluation
Interestingness measures are used to separate interesting
patterns from uninteresting ones. They may be used to guide
the mining
process, or after discovery, to evaluate the discovered
patterns. Different kinds of knowledge may have different
interestingness measures.
For example, interestingness measures for association rules
include support and confidence.
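Support and confidence can be computed as in the sketch below, using the standard definitions: the support of a rule A => B is the fraction of transactions containing both A and B, and its confidence is that support divided by the support of A. The transactions, the function names, and the threshold values are illustrative assumptions.

```python
# Hypothetical transactions; each is the set of items bought together.
transactions = [
    {"computer", "software"},
    {"computer", "printer"},
    {"computer", "software", "printer"},
    {"printer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A); 0.0 if A never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / base


def is_interesting(antecedent, consequent, transactions, min_support, min_confidence):
    """A rule passes only if it meets both user-specified thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(both, transactions) >= min_support
            and confidence(antecedent, consequent, transactions) >= min_confidence)
```

For the rule computer => software in this data, the support is 2/4 and the confidence is 2/3, so the rule passes thresholds of 0.4 and 0.6 but would be rejected if the minimum confidence were raised to 0.7.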
Representation for visualizing the discovered patterns
This refers to the form in which discovered patterns
are to be displayed.
Users can choose from different forms for
knowledge presentation, such as rules, tables,
reports, charts, graphs, decision trees, and cubes.
