Instance-Based Learning
Overview
• Instance-Based Learning
• Comparison of Eager and Instance-Based Learning
• Instance Distances for Instance-Based Learning
• Nearest Neighbor (NN) Algorithm
• Advantages and Disadvantages of the NN algorithm
• Approaches to overcome the Disadvantages of the NN algorithm
• Locally weighted regression
• Radial basis functions
• Case based Reasoning
Different Learning Methods
• Eager Learning
– Learning = acquiring an explicit structure of a classifier
on the whole training set;
– Classification = an instance gets a classification using
the explicit structure of the classifier.
• Instance-Based Learning (Lazy Learning)
– Learning = storing all training instances
– Classification = an instance gets a classification equal to
the classification of the nearest instances to the instance.
Instance-Based Learning
– All learning methods presented so far construct a general explicit description of the target function when examples are provided.
– In instance-based learning:
  – Examples are simply stored.
  – Generalizing is postponed until a new instance must be classified, which is why it is sometimes referred to as lazy learning.
  – To assign a target function value to a new instance, its relationship to the previously stored examples is examined.
– IBL includes nearest-neighbor, locally weighted regression, and case-based reasoning methods.
 Advantages:
 Instead of estimating for the whole instance space, local
approximations to the target function are possible
 Especially if target function is complex but still decomposable
 Disadvantages:
 Classification costs are high (number of computations to index
each training example at query time)
 Efficient techniques for indexing examples are important to
reduce computational effort
 Typically all attributes are considered when attempting to
retrieve similar training examples from memory
 If the concept depends only on a few attributes, the truly most
similar instances may be far away
Nearest-Neighbor Algorithm (NN)
The features of the task of the NN algorithm:
• The instance language comes with a set A of n attributes a1, a2, …, an.
• The domain of each attribute ai can be discrete or continuous.
• An instance x is represented as < a1(x), a2(x), …, an(x) >, where ai(x) is the value of the attribute ai for the instance x.
• The classes to be learned can be:
 – Discrete: in this case we learn a discrete function f(x), and the co-domain C of the function consists of the classes c to be learned.
 – Continuous: in this case we learn a continuous function f(x), and the co-domain C of the function is a set of real values rather than discrete classes.
Distance Functions
The distance functions are composed from difference metrics da w.r.t. attributes a, defined for each pair of instances xi and xj.
• If the attribute a is numerical, then:
  da(xi, xj) = |a(xi) − a(xj)| / rangea
• If the attribute a is discrete, then:
  da(xi, xj) = 0 if a(xi) = a(xj), and 1 otherwise.
Distance Functions
The main distance function for determining nearest neighbors is the Euclidean distance:
  d(xi, xj) = √( Σ a ∈ A da(xi, xj)² )
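As a minimal illustration, the two difference metrics and the Euclidean distance can be written in Python roughly as follows (the function names and the dictionary representation of instances are illustrative assumptions, not part of the slides):

from math import sqrt

# Per-attribute difference metric d_a and the Euclidean instance distance,
# following the definitions above. Instances are dictionaries mapping
# attribute names to values; `ranges` gives range_a for each numerical attribute.

def attribute_distance(a, x_i, x_j, ranges):
    if a in ranges:                                # numerical attribute
        return abs(x_i[a] - x_j[a]) / ranges[a]
    return 0.0 if x_i[a] == x_j[a] else 1.0        # discrete attribute

def instance_distance(x_i, x_j, ranges):
    return sqrt(sum(attribute_distance(a, x_i, x_j, ranges) ** 2 for a in x_i))

# Example: one numerical and one discrete attribute
ranges = {"height": 50.0}
x1 = {"height": 170.0, "color": "red"}
x2 = {"height": 180.0, "color": "blue"}
print(instance_distance(x1, x2, ranges))           # sqrt(0.2**2 + 1.0**2) ≈ 1.02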
k-Nearest-Neighbor Algorithm
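A minimal Python sketch of the k-NN classification step for a discrete-valued target, assuming the instance_distance function sketched above (the data layout as a list of (instance, class) pairs is an illustrative assumption):

from collections import Counter

def knn_classify(x_q, training_data, k, ranges):
    # training_data is a list of (instance, class) pairs
    neighbors = sorted(training_data,
                       key=lambda xc: instance_distance(x_q, xc[0], ranges))[:k]
    # majority vote among the k nearest stored instances
    votes = Counter(c for _, c in neighbors)
    return votes.most_common(1)[0][0]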
Classification & Decision Boundaries
[Figure: positive (+) and negative (−) training instances around a query point q1. 1-NN classifies q1 as positive; 5-NN classifies q1 as negative.]
Distance Weighted Nearest-Neighbor Algorithm
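A minimal sketch of one common distance-weighted variant for a discrete-valued target, assuming weights wi = 1/d(xq, xi)² (a standard choice) and the instance_distance function from above:

from collections import defaultdict

def distance_weighted_knn(x_q, training_data, k, ranges):
    neighbors = sorted(training_data,
                       key=lambda xc: instance_distance(x_q, xc[0], ranges))[:k]
    votes = defaultdict(float)
    for x, c in neighbors:
        d = instance_distance(x_q, x, ranges)
        if d == 0.0:
            return c                      # exact match: use its class directly
        votes[c] += 1.0 / d ** 2          # each vote weighted by inverse squared distance
    return max(votes, key=votes.get)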
Advantages of the NN Algorithm
• The NN algorithm can estimate complex target classes
locally and differently for each new instance to be
classified;
• The NN algorithm provides good generalization accuracy
on many domains
• The NN algorithm learns very quickly;
• The NN algorithm is robust to noisy training data;
• The NN algorithm is intuitive and easy to understand
which facilitates implementation and modification.
Disadvantages of the NN Algorithm
• The NN algorithm has large storage requirements because it
has to store all the data
• The NN algorithm is slow at classification time because all the training instances have to be visited
• The accuracy of the NN algorithm degrades with increase of
noise in the training data
• The accuracy of the NN algorithm degrades with increase of
irrelevant attributes
Remarks
 Nearest-neighbour learning is a highly effective inductive inference method for many practical problems, provided a sufficiently large set of training examples is available.
 The inductive bias of k-nearest neighbours is the assumption that the classification of xq will be similar to the classification of other instances that are nearby in Euclidean distance.
 Because the distance is computed over all attributes, many irrelevant attributes can dominate it; this difficulty is referred to as the curse of dimensionality.
 Solutions to this problem (see the sketch below):
 Stretch the axes of the more relevant attributes and shorten the axes of the less relevant ones, i.e. weight attributes differently.
 Eliminate the least relevant attributes from the instance space.
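A minimal sketch of the attribute-weighting idea for numerical attributes (the weight values are illustrative assumptions; setting a weight to 0 eliminates that attribute from the distance):

from math import sqrt

def weighted_distance(x_i, x_j, weights):
    # weights[a] stretches (>1) or shrinks (<1) the axis of numerical attribute a;
    # a weight of 0 removes the attribute from the distance entirely.
    return sqrt(sum(w * (x_i[a] - x_j[a]) ** 2 for a, w in weights.items()))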
A note on terminology:
Regression means approximating a real-valued target function.
Residual is the error f̂(x) − f(x) in approximating the target function.
Kernel function is the function of distance that is used to determine the weight of each training example; in other words, the kernel function is the function K such that wi = K(d(xi, xq)).
Locally Weighted Linear Regression
• It is a generalization of the NN approach.
• Why local?
  • Because the function is approximated based only on data near the query point.
• Why weighted?
  • Because the contribution of each training example is weighted by its distance from the query point.
• Why linear?
  • The target function is approximated by a linear function f̂(x) = w0 + w1a1(x) + ... + wnan(x); methods like gradient descent can be used to calculate the coefficients w0, w1, ..., wn that minimize the error in fitting such linear functions.
• Why regression?
  • Approximating a real-valued target function.
• ANNs require a global approximation to the target function, but here just a local approximation is needed;
• therefore the error function has to be redefined.
Possibilities to redefine the error criterion E
1. Minimize the squared error over just the k nearest neighbours:
   E1(xq) ≡ (1/2) Σ x ∈ k nearest nbrs of xq (f(x) − f̂(x))²
2. Minimize the squared error over the entire set D, while weighting the error of each training example by some decreasing function K of its distance from xq:
   E2(xq) ≡ (1/2) Σ x ∈ D (f(x) − f̂(x))² · K(d(xq, x))
3. Combine 1 and 2:
   E3(xq) ≡ (1/2) Σ x ∈ k nearest nbrs of xq (f(x) − f̂(x))² · K(d(xq, x))
Choice of the error criterion
 E2 allows every training example to have an impact on the classification of xq; however, its computational effort grows with the number of training examples.
 E3 is a good approximation to E2 with constant effort.
 Rederiving the gradient descent rule for E3 gives (see the sketch below):
 ∆wj = η Σ x ∈ k nearest nbrs of xq K(d(xq, x)) (f(x) − f̂(x)) aj(x)
Remarks on locally weighted linear regression:
 In most cases, constant, linear or quadratic functions are used as local approximations to the target function,
 because the cost of fitting more complex functions is prohibitively high, and
 simple approximations are good enough over a sufficiently small subregion of the instance space.
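A rough Python sketch of locally weighted linear regression at a single query point, using the k nearest neighbours, a Gaussian kernel, and the gradient rule above (the NumPy data layout and all parameter values are illustrative assumptions):

import numpy as np

def lwr_predict(x_q, X, y, k=10, sigma=1.0, eta=0.01, epochs=200):
    # X: (m, n) array of training inputs, y: (m,) real-valued targets.
    # Gaussian kernel K(d) = exp(-d^2 / (2 sigma^2)); all parameters illustrative.
    Xa = np.hstack([np.ones((X.shape[0], 1)), X])    # add a0(x) = 1 so w0 is the bias
    xq = np.concatenate([[1.0], x_q])
    d = np.linalg.norm(X - x_q, axis=1)              # distances to the query
    idx = np.argsort(d)[:k]                          # k nearest neighbours of x_q
    K = np.exp(-d[idx] ** 2 / (2 * sigma ** 2))
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):                          # gradient rule from above:
        err = y[idx] - Xa[idx] @ w                   # f(x) - f_hat(x)
        w += eta * (K * err) @ Xa[idx]               # dw_j = eta * sum K(d) err a_j(x)
    return float(xq @ w)                             # f_hat(x_q)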
RADIAL BASIS FUNCTIONS
 It is common to choose each kernel function Ku(d(xu, x)) to be a Gaussian function centred at xu with some variance σu²:
 Ku(d(xu, x)) = e^(−d²(xu, x) / (2σu²))
 The function f̂(x) can then be viewed as describing a two-layer network:
 1. Layer 1 consists of units that compute the values of the various Ku(d(xu, x)).
 2. Layer 2 computes a linear combination of these kernel outputs.
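A minimal sketch of this two-layer view, assuming the usual form f̂(x) = w0 + Σu wu Ku(d(xu, x)) with given kernel centres xu, weights wu, and a shared width σ (how these parameters are trained is not shown here):

import numpy as np

def rbf_predict(x, centers, weights, w0, sigma=1.0):
    # Layer 1: one Gaussian kernel unit per centre x_u
    d2 = np.sum((centers - x) ** 2, axis=1)          # d^2(x_u, x) for every centre
    K = np.exp(-d2 / (2 * sigma ** 2))               # K_u(d(x_u, x))
    # Layer 2: linear combination of the kernel outputs
    return w0 + float(weights @ K)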
CASE BASED REASONING
 Three important properties shared by NN and locally weighted regression:
1. They are lazy learners.
2. A new query is classified by analyzing similar stored instances.
3. Instances are represented as real-valued points in an n-dimensional space.
• CBR is based on the first two principles,
• but instances are represented using symbolic descriptions rather than real-valued points.
• Example: the CADET system
 i. CADET uses CBR to assist the design of simple mechanical devices such as water faucets.
 ii. Its library contains approximately 75 designs and design fragments in memory.
 iii. An instance is stored by describing its structure and its qualitative function.
 iv. A new design problem is presented by specifying the desired function and requesting the corresponding structure.
A STORED CASE AND A NEW PROBLEM
+ indicates that the variable at the arrowhead increases as the variable at its tail increases.
− indicates that the variable at the arrowhead decreases as the variable at its tail increases.
Generic Properties of CBR (distinguishing it from the NN method)
 Instances are represented by rich symbolic descriptions.
 Multiple cases may be combined to form a solution to a new problem.
 There may be tight coupling between case retrieval, knowledge-based reasoning, and problem solving.
 Summary:
 CBR is an instance-based learning method in which instances are rich relational descriptions, and in which the retrieval and combination of cases to answer the current query may rely on knowledge-based reasoning and search-intensive problem-solving methods.