In the name of Allah, the Most Gracious, the Most Merciful
Machine Learning
Lecture 04: Version Space Algorithm
Dr. Rao Muhammad Adeel Nawab
Slides Credits: Dr. Allah Bux Sargana
Edited By:
Version Space
Recap – FIND-S Algorithm
Strengths – Returns a model (h), which can be used to make predictions on unseen data
Weaknesses
Only works on error-free data; however, real-world data is noisy
Works on the assumption that the Target Function f is present in the Hypothesis Space (H)
Recap – FIND-S Algorithm
However, we may or may not find the Target Function f in the Hypothesis Space (H), and this may or may not be known
Recap – FIND-S Algorithm
Weaknesses
Only returns one hypothesis that best fits the Training Data
However, there can be multiple hypotheses that fit the Training Data equally well
Main Problem in FIND-S Algorithm
Problem
Only returns one hypothesis that best fits the Training Data
However, there can be multiple hypotheses that fit the Training Data equally well
Proposed Solution – Version Space
Version Space (VSH,D) contains the set of all hypotheses consistent with the Training Examples
Version Space (VSH,D)
Version Space vs Version Space Algorithm
A Version Space Algorithm computes the set of all hypotheses consistent with the Training Examples
Version Space Algorithm
Version Space Algorithms
In this lecture, we will discuss one Version Space Algorithm: the Candidate Elimination Algorithm
In this lecture, we will use the same Gender
Identification Problem
Gender Identification Problem
Attribute – Value Pair
Representation of Hypothesis (h)
Conjunction of Constraints on Input Attributes
Representation of Training Example (d)
Representation of Training Example (d) and Hypothesis (h)
Sample Data – Vector Representation
Vector Representation of Examples
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
x5 = < Short, Light, Short, Yes, Yes, Half > -
x6 = < Tall, Light, Short, No, Yes, Full > -
Training Data – Vector Representation
Vector Representation of Training Examples
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
Testing Data – Vector Representation
Vector Representation of Test Examples
x5 = < Short, Light, Short, Yes, Yes, Half > -
x6 = < Tall, Light, Short, No, Yes, Full > -
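For concreteness, the examples above can be written down directly as Python data structures. This is an illustrative sketch (the tuple layout and variable names are my own, not from the slides); the attribute order is Height, Weight, Hair Length, Head Covered, Wearing Chain, Shirt Sleeves, and '+' stands for Female, '-' for Male.

# Attribute order: Height, Weight, Hair Length, Head Covered, Wearing Chain, Shirt Sleeves
training_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "+"),  # x1
    (("Short", "Light", "Long",  "Yes", "Yes", "Half"), "+"),  # x2
    (("Tall",  "Heavy", "Long",  "Yes", "Yes", "Full"), "-"),  # x3
    (("Short", "Light", "Long",  "Yes", "No",  "Full"), "+"),  # x4
]
test_data = [
    (("Short", "Light", "Short", "Yes", "Yes", "Half"), "-"),  # x5
    (("Tall",  "Light", "Short", "No",  "Yes", "Full"), "-"),  # x6
]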
Candidate Elimination Algorithm
Limitation of List-Then-Eliminate Algorithm
Proposed Solution
Represent the Hypothesis Space (H) as a General-to-Specific Ordering of Hypotheses
Representation of (H) in Candidate Elimination Algo.
In Candidate Elimination Algo., (H) is represented as
Specific Boundary = Most Specific Hypothesis
Remaining Hypotheses Lie Here
General Boundary = Most General Hypothesis
General-to-Specific Ordering of Hypotheses
Representation of (H) in Candidate Elimination Algo.
Specific Boundary (Most Specific Hypothesis): { < ∅, ∅, ∅, ∅, ∅, ∅ > }
Remaining Hypotheses (examples): { < ?, Short, ?, Yes, ?, ? > }, { < Short, ?, Light, ?, ?, ? > }, { < ?, ?, ?, ?, No, No > }
General Boundary (Most General Hypothesis): { < ?, ?, ?, ?, ?, ? > }
General-to-Specific Ordering of Hypotheses
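The general-to-specific ordering can be made concrete with a small helper that checks whether one hypothesis is more general than (or equal to) another. This is a minimal Python sketch under the attribute-vector representation used above; the function names and the use of "0" for the empty constraint ∅ are my own assumptions, not from the slides.

def covers(h, x):
    # A hypothesis covers an instance if every constraint is '?' or matches the attribute value.
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    # h1 >=g h2: every instance covered by h2 is also covered by h1.
    # Each constraint of h1 must be '?' or identical to the corresponding constraint of h2;
    # the empty constraint "0" (∅) is satisfied by nothing, so anything is at least as general as it.
    return all(a == "?" or a == b or b == "0" for a, b in zip(h1, h2))

# The all-'?' hypothesis is more general than any other:
print(more_general_or_equal(("?",) * 6, ("Short", "Light", "?", "Yes", "?", "?")))  # True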
Machine Learning Cycle
Training Phase – Build the Model using Training Data
Testing Phase – Evaluate the Performance of the Model using Testing Data
Application Phase – Deploy the Model in the real world, to make predictions on real-time unseen data
Feedback Phase – Take feedback from the users and domain experts to improve the Model
Training Phase – Candidate Elimination Algo.
G ← the set of Maximally General Hypotheses in H
S ← the set of Maximally Specific Hypotheses in H
For each Training Example d = < x, c(x) > do:
  If d is a positive example
    Remove from G any hypothesis that is inconsistent with d
    For each hypothesis s in S that is not consistent with d
      Remove s from S
      Add to S all minimal generalizations h of s such that
        h is consistent with d, and
        some member of G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S
Training Phase – Candidate Elimination Algo.
  If d is a negative example
    Remove from S any hypothesis that is inconsistent with d
    For each hypothesis g in G that is not consistent with d
      Remove g from G
      Add to G all minimal specializations h of g such that
        h is consistent with d, and
        some member of S is more specific than h
      Remove from G any hypothesis that is less general than another hypothesis in G
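The training phase above can be turned into a short Python sketch. It follows the pseudocode for the conjunctive attribute-vector hypotheses used in this lecture and reuses covers and more_general_or_equal from the earlier sketch; the helper names, the string "0" for the empty constraint ∅, and the simplified handling of minimal generalizations/specializations are my own assumptions, not code given in the slides.

N_ATTRS = 6
S0 = [("0",) * N_ATTRS]   # most specific boundary: "0" stands for the empty constraint ∅
G0 = [("?",) * N_ATTRS]   # most general boundary

def min_generalize(s, x):
    # Smallest change to s so that it covers the positive example x.
    return tuple(v if c == "0" else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, attr_values):
    # All single-attribute specializations of g that exclude the negative example x.
    specs = []
    for i, c in enumerate(g):
        if c == "?":
            for v in attr_values[i]:
                if v != x[i]:
                    specs.append(g[:i] + (v,) + g[i + 1:])
    return specs

def candidate_elimination(data, attr_values):
    S, G = list(S0), list(G0)
    for x, label in data:
        if label == "+":                                   # positive example
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                if covers(s, x):
                    new_S.append(s)
                else:
                    h = min_generalize(s, x)
                    if any(more_general_or_equal(g, h) for g in G):
                        new_S.append(h)
            # drop any member of S that is more general than another member of S
            S = [s for s in new_S
                 if not any(s != t and more_general_or_equal(s, t) for t in new_S)]
        else:                                              # negative example
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                else:
                    for h in min_specializations(g, x, attr_values):
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            # drop any member of G that is less general than another member of G
            G = [g for g in new_G
                 if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
    return S, G

Running this sketch on the training data defined earlier reproduces the boundaries derived step by step on the following slides (the order of the hypotheses inside G may differ):

attribute_values = [sorted({x[i] for x, _ in training_data}) for i in range(N_ATTRS)]
S, G = candidate_elimination(training_data, attribute_values)
print("S:", S)   # [('Short', 'Light', '?', 'Yes', '?', '?')]
print("G:", G)   # [('Short', '?', '?', '?', '?', '?'), ('?', 'Light', '?', '?', '?', '?')]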
Training Phase
Representing the Version Space (VSH,D)
The General Boundary G of the Version Space VSH,D is the set of maximally general members (or hypotheses)
The Specific Boundary S of the Version Space VSH,D is the set of maximally specific members (or hypotheses)
Every member (or hypothesis) of the Version Space VSH,D lies between these two boundaries
Training Phase
Initial Boundaries
S: { <∅, ∅, ∅, ∅, ∅, ∅> }
G: { <?, ?, ?, ?, ?, ?> }
For x1 = < Short, Light, Short, Yes, Yes, Half > +
Remark: < ∅, ∅, ∅, ∅, ∅, ∅ > is removed from the S boundary because it is not consistent with x1
Updated Boundaries
S: { <Short, Light, Short, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Example x1
Training Phase
Current Boundaries
S: { <Short, Light, Short, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
For x2 = < Short, Light, Long, Yes, Yes, Half > +
Remark: h = <Short, Light, Short, Yes, Yes, Half> is removed from the S boundary because it is not consistent with Example x2
Updated Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Examples x1 and x2
Training Phase
Current Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <?, ?, ?, ?, ?, ?> }
For x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
Remark: h = <?, ?, ?, ?, ?, ?> is removed from the G boundary because it is not consistent with Training Example x3
Updated Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: Minimal specializations (h) are added to G, which are consistent with Training Examples x1, x2 and x3
Training Phase
Current Boundaries
S: { <Short, Light, ?, Yes, Yes, Half> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
For x4 = < Short, Light, Long, Yes, No, Full > +
Remark: h = <Short, Light, ?, Yes, Yes, Half> is removed from the S boundary because it is not consistent with Example x4
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: A minimal generalization (h) is added to S, which is consistent with Training Examples x1, x2, x3, x4
Training Phase
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Half> }
Remark: h = <?, ?, ?, ?, ?, Half> is removed from the G boundary because it is not consistent with Example x4
Updated Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Remark: No new hypotheses are added to G for a positive example; the two remaining hypotheses in G are consistent with Training Examples x1, x2, x3, x4
Training Phase
Final Boundaries
S: { <Short, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Remark: S and G are now consistent with all Training Examples x1, x2, x3, x4
Adding Intermediate Members (Hypotheses)
S: { <Short, Light, ?, Yes, ?, ?> }
Intermediate: { <Short, Light, ?, ?, ?, ?>, <Short, ?, ?, Yes, ?, ?>, <?, Light, ?, Yes, ?, ?> }
G: { <Short, ?, ?, ?, ?, ?>, <?, Light, ?, ?, ?, ?> }
Training Phase
After observing all the Training Examples, the Candidate Elimination Algorithm will output the Version Space (comprising 6 hypotheses)
VS(H,D)
<Short, Light, ?, Yes, ?, ?>
<Short, ?, ?, Yes, ?, ?>
<Short, Light, ?, ?, ?, ?>
<?, Light, ?, Yes, ?, ?>
<Short, ?, ?, ?, ?, ?>
<?, Light, ?, ?, ?, ?>
Note that all 6 hypotheses in the Version Space (VSH,D) are approximations of the Target Function f
Training Phase
Training Phase
Training Data
x1 = < Short, Light, Short, Yes, Yes, Half > +
x2 = < Short, Light, Long, Yes, Yes, Half > +
x3 = < Tall, Heavy, Long, Yes, Yes, Full > -
x4 = < Short, Light, Long, Yes, No, Full > +
Training Phase
6 Models in the Version Space (VSH,D)
<Short, Light, ?, Yes, ?, ?>
<Short, ?, ?, Yes, ?, ?>
<Short, Light, ?, ?, ?, ?>
<?, Light, ?, Yes, ?, ?>
<Short, ?, ?, ?, ?, ?>
<?, Light, ?, ?, ?, ?>
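Each of these hypothesis vectors can be used directly as a model: an instance that satisfies every non-'?' constraint is classified as Female (1), otherwise Male (0). A minimal Python sketch (the function name predict is my own, not from the slides):

def predict(hypothesis, instance):
    # 1 (Female) if the instance satisfies every constraint of the hypothesis, otherwise 0 (Male).
    return 1 if all(c == "?" or c == v for c, v in zip(hypothesis, instance)) else 0

h = ("Short", "Light", "?", "Yes", "?", "?")   # the S-boundary hypothesis
print(predict(h, ("Short", "Light", "Short", "Yes", "Yes", "Half")))  # 1 -> Female
print(predict(h, ("Tall", "Heavy", "Long", "Yes", "Yes", "Full")))    # 0 -> Male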
Training Phase
Models - in the Form of Rules
If (Height = Short AND Weight = Light AND Hair Length = ?
AND Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
If (Height = Short AND Weight = ? AND Hair Length = ? AND
Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = Short AND Weight = ? AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = Yes AND Wearing Chain = ? AND Shirt
Sleeves = ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
If (Height = ? AND Weight = Light AND Hair Length = ? AND
Head Covered = ? AND Wearing Chain = ? AND Shirt Sleeves
= ?)
THEN Gender = 1 (Female)
OTHERWISE Gender = 0 (Male)
Training Phase
In the next phase, i.e. the Testing Phase, we will evaluate the performance of the Model(s)
Testing Phase
Question
How well has the Model(s) learned?
Answer
Evaluate the performance of the Model(s) on unseen data (or Testing Data)
Evaluation Measures
Evaluation will be carried out using
Error Measure
Error
Definition
Error is defined as the proportion of incorrectly classified Test Instances
Error = Number of Incorrectly Classified Test Instances / Total Number of Test Instances
Accuracy = 1 - Error
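These two definitions translate directly into Python; a small sketch (the function names are mine, not from the slides):

def error(predictions, true_labels):
    # Proportion of incorrectly classified test instances.
    incorrect = sum(1 for p, t in zip(predictions, true_labels) if p != t)
    return incorrect / len(true_labels)

def accuracy(predictions, true_labels):
    return 1 - error(predictions, true_labels)

print(error(["+", "-", "+", "+"], ["+", "-", "-", "+"]))     # 0.25
print(accuracy(["+", "-", "+", "+"], ["+", "-", "-", "+"]))  # 0.75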
Testing Phase
Two Approaches for Testing Phase
Single Model
Ensemble Model
Testing Phase
A single model is trained on the Training Data
and used to make predictions on the Test Data
For Candidate Elimination Algo., we can
Randomly select a Model (or h) from the VSH,D
and use it to make predictions on the Test Data
Single Model
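As a sketch of the single-model approach, we can pick one hypothesis from the version space at random and score it on the test data, reusing predict, error, and test_data from the earlier sketches (all names are illustrative assumptions, not from the slides):

import random

version_space = [
    ("Short", "Light", "?", "Yes", "?", "?"),
    ("Short", "?", "?", "Yes", "?", "?"),
    ("Short", "Light", "?", "?", "?", "?"),
    ("?", "Light", "?", "Yes", "?", "?"),
    ("Short", "?", "?", "?", "?", "?"),
    ("?", "Light", "?", "?", "?", "?"),
]

h = random.choice(version_space)                       # randomly selected single model
preds = [predict(h, x) for x, _ in test_data]          # predictions on the test data
truth = [1 if label == "+" else 0 for _, label in test_data]
print("Single-model error:", error(preds, truth))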
Testing Phase
Strengths
It is computationally fast and requires less
time to make predictions compared to
Ensemble Model
Weaknesses
Error is likely to be high compared to Ensemble
Model
Testing Phase
An Ensemble Model works by training different
Models on the same Training Data and using each
Model to individually make predictions on the Test Data
For Candidate Elimination Algo., we can
Select all 6 Models (or h) from the VSH,D (Ensemble
Model) and use them to make predictions on the
Test Data
Ensemble Model
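For the Candidate Elimination version space, the ensemble simply runs every hypothesis on each test instance; a short sketch reusing predict, version_space, and test_data from the sketches above (how the individual predictions are combined is the subject of the Voting Approach below):

# One prediction per model, for every test instance
for x, _ in test_data:
    model_predictions = [predict(h, x) for h in version_space]
    print(x, "->", model_predictions)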
Testing Phase
Strengths
Error is likely to be low compared to Single
Model
Weaknesses
It is computationally expensive and requires
more time to make predictions compared to
Single Model
Testing Phase
Problem
How to combine the predictions of different Models to make a Final Prediction?
Possible Solution – Use the Voting Approach
Voting Approach
Voting is one of the simplest approaches to
combine predictions from different Models
Model Weight
In the Voting Approach, we may assign
The same weight to all Models
Different weights to different Models
Voting Approach
Voting Approach
Steps – How the Voting Approach Works
Given a Test Instance (x):
Make individual predictions using the different Models
Combine the predictions of the individual Models
The Final Prediction for x will be the class which has the majority vote (a small code sketch of these steps follows below)
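Below is a minimal sketch of equal-weight majority voting over the individual predictions (Counter is from the Python standard library; the function name is mine):

from collections import Counter

def majority_vote(predictions):
    # Each model contributes one vote; the class with the most votes wins.
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["Yes", "No", "No", "Yes", "Yes"]))  # Yes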
Example 01 – Voting Approach
Assumption
All Models have the same weight i.e. 1
Consider a Binary Classification Problem
Class 01 = Yes
Class 02 = No
Five different Models are trained on same Training Data
Model Weight
Model 1 1
Model 2 1
Model 3 1
Model 4 1
Model 5 1
Example 01 – Voting Approach
What will be the Final Prediction for Test Instance (x) using the Ensemble Model?
Example 01 – Voting Approach
Using Voting Approach to Make Predictions on (x)
Make individual predictions using different Models
Model Prediction
Model 1 Yes
Model 2 No
Model 3 No
Model 4 Yes
Model 5 Yes
Example 01 – Voting Approach
Combine predictions of individual Models
No. of Votes for Yes Class = 1+1+1 = 3
No. of Votes for No Class = 1+1 = 2
Final Prediction of (x) will be the class which has the
majority vote
Majority Vote = Yes
Final Prediction of (x)
Yes
Example 01 – Voting Approach
Note – For a Binary Classification Problem with equal weights, use an odd number of Models (e.g. 3, 5, 7, 11) to avoid ties
Example 02 – Voting Approach
Assumption
All Models have different weights
Consider a Binary Classification Problem
Class 01 = Yes
Class 02 = No
Five different Models are trained on same Training Data
Model Weight
Model 1 0.2
Model 2 1.0
Model 3 0.8
Model 4 0.6
Model 5 0.4
Example 02 – Voting Approach
What will be the Final Prediction for Test Instance (x) using the Ensemble Model?
Example 02 – Voting Approach
Using Voting Approach to Make Predictions on (x)
Make individual predictions using different Models
Model Prediction
Model 1 Yes
Model 2 No
Model 3 No
Model 4 Yes
Model 5 Yes
Example 02 – Voting Approach
Combine predictions of individual Models
No. of Weighted Votes for Yes Class = 0.2+0.6+0.4 = 1.2
No. of Weighted Votes for No Class = 1.0 + 0.8 = 1.8
Final Prediction of (x) will be the class with the highest total weighted vote
Weighted Majority Vote = No
Final Prediction of (x)
No
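The weighted variant adds each model's weight to the total of its predicted class; a sketch that reproduces Example 02 (names are illustrative, not from the slides):

def weighted_vote(predictions, weights):
    # Sum each model's weight into its predicted class; the class with the largest total wins.
    totals = {}
    for p, w in zip(predictions, weights):
        totals[p] = totals.get(p, 0.0) + w
    return max(totals, key=totals.get)

preds = ["Yes", "No", "No", "Yes", "Yes"]
weights = [0.2, 1.0, 0.8, 0.6, 0.4]
print(weighted_vote(preds, weights))  # No  (Yes: 1.2 vs No: 1.8)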
Effects of Weight on Models in Making Final Predictions
When the Models had same Weights
Final Prediction was Yes
When the Models had different Weights
Final Prediction was No
Conclusion
Model Weight plays an important role
in making the Final Prediction
Effects of Weight on Models in Making Final Predictions
Voting vs Voting Classifier
Voting
Voting is one of the simplest methods to combine
predictions from multiple ML Algorithms
Voting Classifier
Voting Classifier is not an actual Classifier (ML Algos.) but
a wrapper for a set of different ML Algos., which are
trained and tested on the same Data
A Voting Classifier Combines Predictions of Individual ML
Algos. to make Final Predictions on unseen Data
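In practice, libraries provide such a wrapper out of the box; for example, scikit-learn's VotingClassifier combines several estimators trained on the same data. The snippet below is an illustrative sketch assuming scikit-learn is installed; the toy features and labels are made up purely to show the wiring, not taken from the slides.

from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical numeric features and binary labels, only to illustrate the API.
X = [[0, 1], [1, 1], [1, 0], [0, 0], [1, 1], [0, 1]]
y = [1, 1, 0, 0, 1, 0]

clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("dt", DecisionTreeClassifier()),
                ("knn", KNeighborsClassifier(n_neighbors=3))],
    voting="hard",      # majority vote over the predicted class labels
    weights=[1, 1, 1],  # equal weights; change these for weighted voting
)
clf.fit(X, y)
print(clf.predict([[1, 0]]))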