In the name of Allah, the Most Gracious, the Most Merciful
Machine Learning
Lecture 04: Version Space Algorithm
Dr. Rao Muhammad Adeel Nawab
Slides Credits: Dr. Allah Bux Sargana
Edited By:
Version Space
Recap – FIND-S Algorithm
Strengths – Returns a model (h), which can be used to make predictions on unseen data
Weaknesses
Only works on error-free data; however, real-world data is noisy
Works on the assumption that the Target Function f is present in the Hypothesis Space (H)
Recap – FIND-S Algorithm
However, we may or may not find the Target Function f in the Hypothesis Space (H), and this may or may not be known
Recap – FIND-S Algorithm
Weaknesses
Only returns one hypothesis that best fits the Training Data
However, there can be multiple hypotheses that fit the Training Data equally well
Main Problem in FIND-S Algorithm
Problem
Only returns one hypothesis that best fits the Training Data
However, there can be multiple hypotheses that fit the Training Data equally well
Proposed Solution – Version Space
Version Space (VSH,D) contains the set of all hypotheses consistent with the Training Examples
Version Space (VSH,D)
Version Space vs Version Space Algorithm
A Version Space Algorithm computes the set of all hypotheses consistent with the Training Examples
Version Space Algorithm
Version Space Algorithms
In this lecture, we will discuss one Version Space Algorithm: the Candidate Elimination Algorithm
In this lecture, we will use the same Gender
Identification Problem
Gender Identification Problem
Attribute – Value Pair
Representation of Hypothesis (h)
Conjunction of Constraints on Input Attributes
Representation of Training Example (d)
Representation of Training Example (d) and Hypothesis (h)
Sample Data – Vector Representation
Vector Representation of Examples
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
x5 = < Short, Light, Short, Yes, Yes, Half > -
x6 = < Tall, Light, Short, No, Yes, Full > -
Training Data – Vector Representation
Vector Representation of Training Examples
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
Testing Data – Vector Representation
Vector Representation of Test Examples
x5 = < Short, Light, Short, Yes, Yes, Half > -
x6 = < Tall, Light, Short, No, Yes, Full > -
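For concreteness, the examples above can be written down directly as Python data structures. This is an illustrative sketch (the tuple layout and variable names are my own, not from the slides); the attribute order is Height, Weight, Hair Length, Head Covered, Wearing Chain, Shirt Sleeves, and '+' stands for Female, '-' for Male.

# Attribute order: Height, Weight, Hair Length, Head Covered, Wearing Chain, Shirt Sleeves
training_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "+"),  # x1
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "+"),  # x2
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "-"),  # x3
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "+"),  # x4
]
test_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "-"),  # x5
    (("Tall",  "Light", "Short", "No",  "Yes", "Full"), "-"),  # x6
]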
Candidate Elimination Algorithm
Limitation of List-Then-Eliminate Algorithm
Proposed Solution
Represent the Hypothesis Space (H) as a General-to-Specific Ordering of Hypotheses
Representation of (H) in Candidate Elimination Algo.
In Candidate Elimination Algo., (H) is represented as
Specific Boundary = Most Specific Hypothesis
Remaining Hypotheses Lie Here
General Boundary = Most General Hypothesis
General-to-Specific Ordering of Hypotheses
Representation of (H) in Candidate Elimination Algo.
Specific Boundary (Most Specific Hypothesis): { < ∅, ∅, ∅, ∅, ∅, ∅ > }
Remaining Hypotheses (examples): { < ?, Short, ?, Yes, ?, ? > }, { < Short, ?, Light, ?, ?, ? > }, { < ?, ?, ?, ?, No, No > }
General Boundary (Most General Hypothesis): { < ?, ?, ?, ?, ?, ? > }
General-to-Specific Ordering of Hypotheses
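The general-to-specific ordering can be made concrete with a small helper that checks whether one hypothesis is more general than (or equal to) another. This is a minimal Python sketch under the attribute-vector representation used above; the function names and the use of "0" for the empty constraint ∅ are my own assumptions, not from the slides.

def covers(h, x):
    # A hypothesis covers an instance if every constraint is '?' or matches the attribute value.
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    # h1 >=g h2: every instance covered by h2 is also covered by h1.
    # Each constraint of h1 must be '?' or identical to the corresponding constraint of h2;
    # the empty constraint "0" (∅) is satisfied by nothing, so anything is at least as general as it.
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

# The all-'?' hypothesis is more general than any other:
print(more_general_or_equal(("?",) * 6, ("Short", "Light", "?", "Yes", "?", "?")))  # True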
Machine Learning Cycle
Training Phase – Build the Model using Training Data
Testing Phase – Evaluate the Performance of the Model using Testing Data
Application Phase – Deploy the Model in the real world, to make predictions on real-time unseen data
Feedback Phase – Take feedback from the users and domain experts to improve the Model
Training Phase – Candidate Elimination Algo.
G ← the set of Maximally General Hypotheses in H
S ← the set of Maximally Specific Hypotheses in H
For each Training Example d = < x, c(x) > do:
  If d is a positive example
    Remove from G any hypothesis that is inconsistent with d
    For each hypothesis s in S that is not consistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that
        h is consistent with d, and
        some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
Training Phase – Candidate Elimination Algo.
  If d is a negative example
    Remove from S any hypothesis that is inconsistent with d
    For each hypothesis g in G that is not consistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that
        h is consistent with d, and
        some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
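The training phase above can be turned into a short Python sketch. It follows the pseudocode for the conjunctive attribute-vector hypotheses used in this lecture and reuses covers and more_general_or_equal from the earlier sketch; the helper names, the string "0" for the empty constraint ∅, and the simplified handling of minimal generalizations/specializations are my own assumptions, not code given in the slides.

N_ATTRS = 6
S0 = [("0",) * N_ATTRS]   # most specific boundary: "0" stands for the empty constraint ∅
G0 = [("?",) * N_ATTRS]   # most general boundary

def min_generalize(s, x):
    # Smallest change to s so that it covers the positive example x.
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, attr_values):
    # All single-attribute specializations of g that exclude the negative example x.
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            for v in attr_values[i]:
                if v != x[i]:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(data, attr_values):
    S, G = list(S0), list(G0)
    for x, label in data:
        if label == "+":                                   # positive example
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:
                    h = min_generalize(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            # drop any member of S that is more general than another member of S
            S = [s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)]
        else:                                              # negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    for h in min_specializations(g, x, attr_values):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # drop any member of G that is less general than another member of G
            G = [g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
    return S, G

Running this sketch on the training data defined earlier reproduces the boundaries derived step by step on the following slides (the order of the hypotheses inside G may differ):

attribute_values = [sorted({x[i] for x, _ in training_data}) for i in range(N_ATTRS)]
S, G = candidate_elimination(training_data, attribute_values)
print("S:", S)   # [('Short', 'Light', '?', 'Yes', '?', '?')]
print("G:", G)   # [('Short', '?', '?', '?', '?', '?'), ('?', 'Light', '?', '?', '?', '?')]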
Training Phase
Representing the Version Space (VSH,D)
The General Boundary G of the Version Space VSH,D is the set of maximally general members (or hypotheses)
The Specific Boundary S of the Version Space VSH,D is the set of maximally specific members (or hypotheses)
Every member (or hypothesis) of the Version Space VSH,D lies between these two boundaries
Training Phase
Initial Boundaries
S: { <∅, ∅, ∅, ∅, ∅, ∅> }
G: { <?, ?, ?, ?, ?, ?> }
For x1 = < Short, Light, Short, Yes, Yes, Half > +
Remark: < ∅, ∅, ∅, ∅, ∅, ∅ > is removed from the S boundary because it is not consistent with x1
Updated Boundaries
S: { <Short, Light, Short, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Example x1
Training Phase
Current Boundaries
S: { <Short, Light, Short, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
For x2 = < Short, Light, Long, Yes, Yes, Half > +
Remark: h = <Short, Light, Short, Yes, Yes, Half> is removed from the S boundary because it is not consistent with Example x2
Updated Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Examples x1 and x2
Training Phase
Current Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
For x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
Remark: h = <?, ?, ?, ?, ?, ?> is removed from the G boundary because it is not consistent with Training Example x3
Updated Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: Minimal specializations (h) are added to G, which are consistent with Training Examples x1, x2 and x3
Training Phase
Current Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
For x4 = < Short, Light, Long, Yes, No, Full > +
Remark: h = <Short, Light, ?, Yes, Yes, Half> is removed from the S boundary because it is not consistent with Example x4
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Examples x1, x2, x3, x4
Training Phase
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: h = <?, ?, ?, ?, ?, Half> is removed from the G boundary because it is not consistent with Example x4
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Remark: No new hypotheses are added to G for a positive example; the two remaining hypotheses in G are consistent with Training Examples x1, x2, x3, x4
Training Phase
Final Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Remark: S and G are now consistent with all Training Examples x1, x2, x3, x4
Adding Intermediate Members (Hypotheses)
S: { <Short, Light, ?, Yes, ?, ?> }
Intermediate: { <Short, Light, ?, ?, ?, ?>, <Short, ?, ?, Yes, ?, ?>, <?, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Training Phase
After observing all the Training Examples, the Candidate Elimination Algorithm will output the Version Space (comprising 6 hypotheses)
VS(H,D)
<Short, Light, ?, Yes, ?, ?>
<Short, ?, ?, Yes, ?, ?>
<Short, Light, ?, ?, ?, ?>
<?, Light, ?, Yes, ?, ?>
<Short, ?, ?, ?, ?, ?>
<?, Light, ?, ?, ?, ?>
Note that all 6 hypotheses in the Version Space (VSH,D) are approximations of the Target Function f
Training Phase
Training Phase
Training Data
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
Training Phase
6 Models in the Version Space (VSH,D)
<Short, Light, ?, Yes, ?, ?>
<Short, ?, ?, Yes, ?, ?>
<Short, Light, ?, ?, ?, ?>
<?, Light, ?, Yes, ?, ?>
<Short, ?, ?, ?, ?, ?>
<?, Light, ?, ?, ?, ?>
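Each of these hypothesis vectors can be used directly as a model: an instance that satisfies every non-'?' constraint is classified as Female (1), otherwise Male (0). A minimal Python sketch (the function name predict is my own, not from the slides):

def predict(hypothesis, instance):
    # 1 (Female) if the instance satisfies every constraint of the hypothesis, otherwise 0 (Male).
    return 1 if all(c == "?" or c == v for c, v in zip(hypothesis, instance)) else 0

h = ("Short", "Light", "?", "Yes", "?", "?")   # the S-boundary hypothesis
print(predict(h, ("Short", "Light", "Short", "Yes", "Yes", "Half")))  # 1 -> Female
print(predict(h, ("Tall", "Heavy", "Long", "Yes", "Yes", "Full")))    # 0 -> Male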
Training Phase
Models - in the Form of Rules
If (Height = Short AND Weight = Light AND Hair Length = ?
AND Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
If (Height = Short AND Weight = ? AND Hair Length = ? AND
Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = Short AND Weight = ? AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
In the next phase, i.e. the Testing Phase, we will evaluate the performance of the Model(s)
Testing Phase
Question
How well has the Model(s) learned?
Answer
Evaluate the performance of the Model(s) on unseen data (or Testing Data)
Evaluation Measures
Evaluation will be carried out using
Error Measure
Error
Definition
Error is defined as the proportion of incorrectly classified Test Instances
Error = Number of Incorrectly Classified Test Instances / Total Number of Test Instances
Accuracy = 1 - Error
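These two definitions translate directly into Python; a small sketch (the function names are mine, not from the slides):

def error(predictions, true_labels):
    # Proportion of incorrectly classified test instances.
    incorrect = sum(1 for p, t in zip(predictions, true_labels) if p != t)
    return incorrect / len(true_labels)

def accuracy(predictions, true_labels):
    return 1 - error(predictions, true_labels)

print(error(["+", "-", "+", "+"], ["+", "-", "-", "+"]))     # 0.25
print(accuracy(["+", "-", "+", "+"], ["+", "-", "-", "+"]))  # 0.75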
Testing Phase
Two Approaches for Testing Phase
Single Model
Ensemble Model
Testing Phase
A single model is trained on the Training Data
and used to make predictions on the Test Data
For Candidate Elimination Algo., we can
Randomly select a Model (or h) from the VSH,D
and use it to make predictions on the Test Data
Single Model
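As a sketch of the single-model approach, we can pick one hypothesis from the version space at random and score it on the test data, reusing predict, error, and test_data from the earlier sketches (all names are illustrative assumptions, not from the slides):

import random

version_space = [
    ("Short", "Light", "?", "Yes", "?", "?"),
    ("Short", "?", "?", "Yes", "?", "?"),
    ("Short", "Light", "?", "?", "?", "?"),
    ("?", "Light", "?", "Yes", "?", "?"),
    ("Short", "?", "?", "?", "?", "?"),
    ("?", "Light", "?", "?", "?", "?"),
]

h = random.choice(version_space)                       # randomly selected single model
preds = [predict(h, x) for x, _ in test_data]          # predictions on the test data
truth = [1 if label == "+" else 0 for _, label in test_data]
print("Single-model error:", error(preds, truth))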
Testing Phase
Strengths
It is computationally fast and requires less
time to make predictions compared to
Ensemble Model
Weaknesses
Error is likely to be high compared to Ensemble
Model
Testing Phase
An Ensemble Model works by training different
Models on the same Training Data and using each
Model to individually make predictions on the Test Data
For Candidate Elimination Algo., we can
Select all 6 Models (or h) from the VSH,D (Ensemble
Model) and use them to make predictions on the
Test Data
Ensemble Model
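For the Candidate Elimination version space, the ensemble simply runs every hypothesis on each test instance; a short sketch reusing predict, version_space, and test_data from the sketches above (how the individual predictions are combined is the subject of the Voting Approach below):

# One prediction per model, for every test instance
for x, _ in test_data:
    model_predictions = [predict(h, x) for h in version_space]
    print(x, "->", model_predictions)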
Testing Phase
Strengths
Error is likely to be low compared to Single
Model
Weaknesses
It is computationally expensive and requires
more time to make predictions compared to
Single Model
Testing Phase
Problem
How to combine the predictions of different Models to make a Final Prediction?
Possible Solution – Use the Voting Approach
Voting Approach
Voting is one of the simplest approaches to
combine predictions from different Models
Model Weight
In the Voting Approach, we may assign
The same weight to all Models
Different weights to different Models
Voting Approach
Voting Approach
Steps – How the Voting Approach Works
Given a Test Instance (x):
Make individual predictions using the different Models
Combine the predictions of the individual Models
The Final Prediction for x will be the class which has the majority vote (a small code sketch of these steps follows below)
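Below is a minimal sketch of equal-weight majority voting over the individual predictions (Counter is from the Python standard library; the function name is mine):

from collections import Counter

def majority_vote(predictions):
    # Each model contributes one vote; the class with the most votes wins.
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["Yes", "No", "No", "Yes", "Yes"]))  # Yes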
Example 01 – Voting Approach
Assumption
All Models have the same weight i.e. 1
Consider a Binary Classification Problem
Class 01 = Yes
Class 02 = No
Five different Models are trained on same Training Data
Model Weight
Model 1 1
Model 2 1
Model 3 1
Model 4 1
Model 5 1
Example 01 – Voting Approach
What will be the Final Prediction for Test Instance (x) using the Ensemble Model?
Example 01 – Voting Approach
Using Voting Approach to Make Predictions on (x)
Make individual predictions using different Models
Model Prediction
Model 1 Yes
Model 2 No
Model 3 No
Model 4 Yes
Model 5 Yes
Example 01 – Voting Approach
Combine predictions of individual Models
No. of Votes for Yes Class = 1+1+1 = 3
No. of Votes for No Class = 1+1 = 2
Final Prediction of (x) will be the class which has the
majority vote
Majority Vote = Yes
Final Prediction of (x)
Yes
Example 01 – Voting Approach
Note – For a Binary Classification Problem with equal weights, use an odd number of Models (e.g. 3, 5, 7, 11) to avoid ties
Example 02 – Voting Approach
Assumption
All Models have different weights
Consider a Binary Classification Problem
Class 01 = Yes
Class 02 = No
Five different Models are trained on same Training Data
Model Weight
Model 1 0.2
Model 2 1.0
Model 3 0.8
Model 4 0.6
Model 5 0.4
Example 02 – Voting Approach
What will be the Final Prediction for Test Instance (x) using the Ensemble Model?
Example 02 – Voting Approach
Using Voting Approach to Make Predictions on (x)
Make individual predictions using different Models
Model Prediction
Model 1 Yes
Model 2 No
Model 3 No
Model 4 Yes
Model 5 Yes
Example 02 – Voting Approach
Combine predictions of individual Models
No. of Weighted Votes for Yes Class = 0.2+0.6+0.4 = 1.2
No. of Weighted Votes for No Class = 1.0 + 0.8 = 1.8
Final Prediction of (x) will be the class with the highest total weighted vote
Weighted Majority Vote = No
Final Prediction of (x)
No
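The weighted variant adds each model's weight to the total of its predicted class; a sketch that reproduces Example 02 (names are illustrative, not from the slides):

def weighted_vote(predictions, weights):
    # Sum each model's weight into its predicted class; the class with the largest total wins.
    totals = {}
    for p, w in zip(predictions, weights):
        totals[p] = totals.get(p, 0.0) + w
    return max(totals, key=totals.get)

preds = ["Yes", "No", "No", "Yes", "Yes"]
weights = [0.2, 1.0, 0.8, 0.6, 0.4]
print(weighted_vote(preds, weights))  # No  (Yes: 1.2 vs No: 1.8)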
Effects of Weight on Models in Making Final Predictions
When the Models had same Weights
Final Prediction was Yes
When the Models had different Weights
Final Prediction was No
Conclusion
Model Weight plays an important role
in making the Final Prediction
Effects of Weight on Models in Making Final Predictions
Voting vs Voting Classifier
Voting
Voting is one of the simplest methods to combine
predictions from multiple ML Algorithms
Voting Classifier
Voting Classifier is not an actual Classifier (ML Algos.) but
a wrapper for a set of different ML Algos., which are
trained and tested on the same Data
A Voting Classifier Combines Predictions of Individual ML
Algos. to make Final Predictions on unseen Data
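In practice, libraries provide such a wrapper out of the box; for example, scikit-learn's VotingClassifier combines several estimators trained on the same data. The snippet below is an illustrative sketch assuming scikit-learn is installed; the toy features and labels are made up purely to show the wiring, not taken from the slides.

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical numeric features and binary labels, only to illustrate the API.
X = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
y = [1, 1, 0, 0, 1, 0]

clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("dt", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier(n_neighbors=3))],
    voting="hard",      # majority vote over the predicted class labels
    weights=[1, 1, 1],  # equal weights; change these for weighted voting
)
clf.fit(X, y)
print(clf.predict([[1, 0]]))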