Data Mining Task Primitives
A data mining task can be specified in the form of a data
mining query, which is input to the data mining system.
A data mining query is defined in terms of data mining
task primitives.
These primitives allow the user to interactively
communicate with the data mining system during the
mining process to discover interesting patterns.
List of Data Mining Task Primitives
Set of task relevant data to be mined.
Kind of knowledge to be mined.
Background knowledge to be used in discovery process.
Interestingness measures and thresholds for pattern
evaluation.
Representation for visualizing the discovered patterns.
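As a rough illustration, the five primitives can be gathered into one query object. The sketch below is hypothetical: the class name MiningQuery, its field names, and the threshold values are assumptions made for illustration, not part of any real data mining system.

```python
from dataclasses import dataclass, field


@dataclass
class MiningQuery:
    """Hypothetical container holding the five task primitives."""
    relevant_data: dict                                        # task-relevant data
    knowledge_kind: str                                        # kind of knowledge to mine
    background_knowledge: dict = field(default_factory=dict)   # e.g. concept hierarchies
    interestingness: dict = field(default_factory=dict)        # measures and thresholds
    presentation: str = "rules"                                # how patterns are displayed


# Example query: mine associations for Canadian customers (all values are made up).
query = MiningQuery(
    relevant_data={"attributes": ["age", "income", "item"],
                   "where": {"country": "Canada"}},
    knowledge_kind="association",
    background_knowledge={"concept_hierarchies": ["age"]},
    interestingness={"min_support": 0.05, "min_confidence": 0.7},
    presentation="table",
)
```

Each field corresponds to one primitive in the list above, so a system could inspect the object to decide what data to load, which function to run, and how to report results.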
Set of task-relevant data to be mined:
This specifies the portions of the database or the set of data
in which the user is interested.
This portion includes the following:
Database Attributes
Data Warehouse dimensions of interest
For example, suppose that you are a manager of All
Electronics in charge of sales in the United States and
Canada, and you would like to study the buying trends of
customers in Canada. Rather than mining the entire
database, you can specify only the attributes relevant to
that study. These are referred to as relevant attributes.
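A minimal sketch of selecting task-relevant data, assuming the records are held as a list of dictionaries. The sample records, the field names, and the helper select_relevant are all hypothetical and chosen only to mirror the Canada example.

```python
# Hypothetical sales records; field names and values are made up for illustration.
sales = [
    {"customer": "c1", "country": "Canada", "age": 34, "item": "laptop", "store_id": 7},
    {"customer": "c2", "country": "USA", "age": 51, "item": "camera", "store_id": 2},
    {"customer": "c3", "country": "Canada", "age": 27, "item": "printer", "store_id": 7},
]


def select_relevant(records, condition, attributes):
    """Keep rows matching every key/value in `condition`, projected onto `attributes`."""
    return [
        {name: row[name] for name in attributes}
        for row in records
        if all(row[key] == value for key, value in condition.items())
    ]


# Only Canadian customers, and only the attributes of interest for the study.
relevant = select_relevant(sales, {"country": "Canada"}, ["customer", "age", "item"])
```

Here the condition narrows the rows and the attribute list narrows the columns, so later mining steps never touch the rest of the database.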
Kind of knowledge to be mined
This specifies the data mining functions to be performed,
such as
Characterization & Discrimination
Association
Classification
Clustering
Prediction
Outlier analysis
For instance, if studying the buying habits of customers in
Canada, you may choose to mine associations between
customer profiles and the items that these customers like
to buy.
Background knowledge to be used in discovery process
Users can specify background knowledge, or knowledge
about the domain to be mined. This knowledge is useful
for guiding the knowledge discovery process, and for
evaluating the patterns found.
There are several kinds of background knowledge; user
beliefs about relationships in the data are one example.
Concept hierarchies are a popular form of background
knowledge, which allow data to be mined at multiple
levels of abstraction.
An example of a concept hierarchy for the attribute (or
dimension) age is shown in the following figure.
The root node represents the most general abstraction
level, denoted as all.
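One way to picture such a hierarchy in code is a function that maps a raw age to a chosen abstraction level. The sketch below is an assumption throughout: the group names, the age cut-offs, and the function generalize_age are illustrative and are not taken from the figure.

```python
# Assumed hierarchy for age: all -> {youth, middle_aged, senior} -> raw ages.
# The cut-offs are illustrative only; they are not taken from the source figure.
AGE_GROUPS = [(0, 29, "youth"), (30, 59, "middle_aged"), (60, 150, "senior")]


def generalize_age(age, level):
    """Return `age` at an abstraction level: 0 = all (root), 1 = group, 2 = raw value."""
    if level == 0:
        return "all"
    if level == 1:
        for low, high, label in AGE_GROUPS:
            if low <= age <= high:
                return label
        raise ValueError(f"age out of range: {age}")
    return age


ages = [25, 45, 70]
by_group = [generalize_age(a, 1) for a in ages]
```

Mining at level 1 would treat all customers in the same group alike, while level 0 collapses every value into the root, all.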
Interestingness measures and thresholds for pattern evaluation
Interestingness measures are used to separate
interesting patterns from uninteresting ones.
They may be used to guide the mining
process, or after discovery, to evaluate the discovered
patterns. Different kinds of knowledge may have different
interestingness measures.
For example, interestingness measures for association rules
include support and confidence.
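To make the two measures concrete, the sketch below computes them on a tiny transaction set, using the standard definitions: support is the fraction of transactions containing an itemset, and confidence of A => B is support(A and B together) divided by support(A). The transactions and the threshold values are made up for illustration.

```python
# Hypothetical transactions; each one is the set of items bought together.
transactions = [
    {"laptop", "mouse"},
    {"laptop", "mouse", "printer"},
    {"laptop"},
    {"printer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Rule: laptop => mouse
rule_support = support({"laptop", "mouse"}, transactions)
rule_confidence = confidence({"laptop"}, {"mouse"}, transactions)

# Assumed thresholds; a rule counts as interesting only if it meets both.
is_interesting = rule_support >= 0.4 and rule_confidence >= 0.6
```

With these four transactions the rule appears in two of four baskets, and two of the three laptop baskets also contain a mouse, so it passes both assumed thresholds.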
Representation for visualizing the discovered patterns
This refers to the form in which discovered patterns
are to be displayed.
Users can choose from different forms for
knowledge presentation, such as rules, tables,
reports, charts, graphs, decision trees, and cubes.