Department of Information Technology – Soft Computing (ITC4256)
Supervised Learning Networks
Dr. C.V. Suresh Babu
Professor
Department of IT
Hindustan Institute of Science & Technology
Discussion Topics
• Supervised Learning Networks
- Perceptron networks
- Perceptron network- training algorithm
• Quiz at the end of session.
Introduction
Perceptron Networks
• Developed by Frank Rosenblatt using the McCulloch–Pitts neuron model, the perceptron is the basic operational unit of artificial neural networks.
Perceptron Networks (Cont…)
• Perceptron thus has the following three basic elements:
- Links
- Adder
- Activation function
• A perceptron network can be trained for a single output unit as well as for multiple output units.
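The three basic elements map directly onto code. A minimal illustrative sketch in Python (the function name and the bipolar threshold with a parameter theta are assumptions for illustration, not part of the slides):

def perceptron_output(x, w, b, theta=0.0):
    # Links: each input x[i] reaches the unit through a weighted link w[i].
    # Adder: the net input is the bias plus the weighted sum of the inputs.
    net = b + sum(xi * wi for xi, wi in zip(x, w))
    # Activation function: bipolar threshold with an undecided band of width 2*theta.
    if net > theta:
        return 1
    if net < -theta:
        return -1
    return 0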
Perceptron Networks – Training Algorithm
Training Algorithm for Single Output Unit:
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3–8 while the stopping condition is not true.
Step 3 − Continue steps 4–6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Perceptron Networks – Training Algorithm (Cont…)
Step 5 − Now obtain the net input with the following relation −
yin = b + ∑i=1..n xi wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output.
f(yin) = 1 if yin > θ
         0 if −θ ≤ yin ≤ θ
         −1 if yin < −θ
Perceptron Networks – Training Algorithm (Cont…)
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + α t xi
b(new) = b(old) + α t
Case 2 − if y = t then,
wi(new)=wi(old)
b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen when there is no
change in weight.
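The single-output algorithm above can be sketched in plain Python as follows (the toy bipolar AND data, the variable names and the value θ = 0.2 are illustrative assumptions, not part of the slides):

def train_perceptron(samples, alpha=1.0, theta=0.2, max_epochs=100):
    n = len(samples[0][0])
    w = [0.0] * n                              # Step 1: weights set to 0
    b = 0.0                                    #         bias set to 0
    for _ in range(max_epochs):                # Step 2: repeat until stopping condition
        changed = False
        for s, t in samples:                   # Step 3: every training pair (s, t)
            x = list(s)                        # Step 4: activate inputs, xi = si
            yin = b + sum(xi * wi for xi, wi in zip(x, w))           # Step 5: net input
            y = 1 if yin > theta else (-1 if yin < -theta else 0)    # Step 6: activation
            if y != t:                         # Step 7, Case 1: adjust weights and bias
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                        # Step 8: stop when no weight changed
            break
    return w, b

data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]   # bipolar AND
w, b = train_perceptron(data)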
Perceptron Networks – Training Algorithm (Cont…)
Training Algorithm for Multiple Output Units:
Architecture of perceptron for multiple output classes.
Perceptron Networks – Training Algorithm (Cont…)
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3–8 while the stopping condition is not true.
Step 3 − Continue steps 4–6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Perceptron Networks – Training Algorithm (Cont…)
Step 5 − Now obtain the net input with the following relation −
yinj = bj + ∑i=1..n xi wij
Here bj is the bias on output unit j and n is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output for
each output unit j=1 to m,
f(yinj) = 1 if yinj > θ
          0 if −θ ≤ yinj ≤ θ
          −1 if yinj < −θ
Perceptron Networks – Training Algorithm (Cont…)
Step 7 − Adjust the weight and bias for i = 1 to n and j = 1 to m as follows −
Case 1 − if yj ≠ tj then,
wij(new) = wij(old) + α tjxi
bj(new) = bj(old) + α tj
Case 2 − if yj = tj then,
wij(new) = wij(old)
bj(new) = bj(old)
Here yj is the actual output and tj is the desired/target output of output unit j.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight.
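A hedged sketch of the corresponding multi-output step, assuming the weights are stored as a matrix w[i][j] (input i to output unit j) and the biases as a list b[j]; names and θ = 0.2 are illustrative:

def multi_output_step(w, b, x, t, alpha=1.0, theta=0.2):
    # w[i][j]: weight from input i to output unit j; b[j]: bias of output unit j.
    n, m = len(x), len(b)
    for j in range(m):
        yinj = b[j] + sum(x[i] * w[i][j] for i in range(n))          # Step 5
        yj = 1 if yinj > theta else (-1 if yinj < -theta else 0)     # Step 6
        if yj != t[j]:                                               # Step 7, Case 1
            for i in range(n):
                w[i][j] += alpha * t[j] * x[i]
            b[j] += alpha * t[j]
        # Case 2 (yj == t[j]): weights and bias of unit j stay unchanged.
    return w, b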
Quiz - Questions
1. What are the 3 basic elements of perceptron?
2. ---------- is the basic operational unit of artificial neural networks.
3. Perceptron network can be trained only for single output unit.
a) true b) false
4. What are the parameters to be initialized to start the training?
5. What will happen when there is no change in weight?
Quiz - Answers
1. What are the 3 basic elements of perceptron?
i. links ii. adder iii. activation function
2. ---------- is the basic operational unit of artificial neural networks.
Perceptron
3. Perceptron network can be trained only for single output unit.
b) false
4. Weights, bias and learning rate α
5. The stopping condition is met and training stops.
Action Plan
• Supervised Learning Networks (Cont…)
- Introduction to adaptive linear neuron
- Adaptive linear neuron architecture
- Adaptive linear neuron training algorithm
- Introduction to multiple adaptive linear neuron
- Multiple adaptive linear neuron architecture
- Multiple adaptive linear neuron training algorithm
• Quiz at the end of session.
Adaptive Linear Neuron
• Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit.
• It was developed by Widrow and Hoff in 1960.
Adaptive Linear Neuron - Architecture
• The basic structure of Adaline is similar to that of the perceptron, with an extra feedback loop through which the actual output is compared with the desired/target output.
• After this comparison, the weights and bias are updated on the basis of the training algorithm.
Adaptive Linear Neuron – Training Algorithm
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3–8 while the stopping condition is not true.
Step 3 − Continue steps 4–6 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows:
xi = si (i = 1 to n)
Adaptive Linear Neuron – Training Algorithm
Step 5 − Obtain the net input with the following relation −
yin = b + ∑i=1..n xi wi
Here ‘b’ is bias and ‘n’ is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output.
f(yin) = 1 if yin ≥ 0
         −1 if yin < 0
Adaptive Linear Neuron – Training Algorithm
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new) = wi(old) + α(t – yin) xi
b(new) = b(old) + α(t – yin)
Case 2 − if y = t then,
wi(new) = wi(old)
b(new) = b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
(t – yin) is the computed error.
Step 8 − Test for the stopping condition, which is met when there is no change in weight or when the highest weight change that occurred during training is smaller than the specified tolerance.
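A minimal sketch of the Adaline loop above in plain Python. The learning rate is taken smaller than the slides' α = 1 so that the (t − yin) updates settle; that choice, the tolerance value and the names are illustrative assumptions:

def train_adaline(samples, alpha=0.1, tol=1e-3, max_epochs=100):
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0                                   # Step 1
    for _ in range(max_epochs):                             # Step 2
        largest_change = 0.0
        for s, t in samples:                                # Step 3: bipolar pair s:t
            x = list(s)                                     # Step 4: xi = si
            yin = b + sum(xi * wi for xi, wi in zip(x, w))  # Step 5: net input
            y = 1 if yin >= 0 else -1                       # Step 6: bipolar output
            if y != t:                                      # Step 7, Case 1
                err = t - yin                               # (t - yin) is the computed error
                for i in range(n):
                    w[i] += alpha * err * x[i]
                b += alpha * err
                largest_change = max(largest_change, abs(alpha * err))
        if largest_change < tol:                            # Step 8: stop on tiny weight change
            break
    return w, b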
Multiple Adaptive Linear Neuron
• Madaline, which stands for Multiple Adaptive Linear Neuron, is a network that consists of many Adalines in parallel.
• It will have a single output unit.
Multiple Adaptive Linear Neuron - Architecture
• The architecture of Madaline consists of “n” neurons of the input layer, “m” neurons of the Adaline
layer, and 1 neuron of the Madaline layer.
• The Adaline layer can be considered as the hidden layer as it is between the input layer and the output
layer, i.e. the Madaline layer.
Multiple Adaptive Linear Neuron – Training Algorithm
Step 1 − Initialize the following to start the training −
• Weights
• Bias
• Learning rate α
For easy calculation and simplicity, the weights and bias may be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3–8 while the stopping condition is not true.
Step 3 − Continue steps 4–6 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows:
xi = si (i = 1 to n)
Multiple Adaptive Linear Neuron – Training Algorithm
Step 5 − Obtain the net input at each hidden unit, i.e. each unit of the Adaline layer, with the following relation −
Qinj = bj + ∑i=1..n xi wij   (j = 1 to m)
Here bj is the bias on Adaline unit j and n is the total number of input neurons.
Step 6 − Apply the following activation function to obtain the final output at the Adaline and the Madaline layer −
f(x) = 1 if x ≥ 0
       −1 if x < 0
Output at the hidden Adaline unit: Qj = f(Qinj)
Final output of the network: y = f(yin), where yin = b0 + ∑j=1..m Qj vj
Multiple Adaptive Linear Neuron – Training Algorithm
Step 7 − Calculate the error and adjust the weights as follows −
Case 1 − if y ≠ t and t = 1 then,
wij(new) = wij(old) + α(1 – Qinj) xi
bj(new) = bj(old) + α(1 – Qinj)
In this case, the weights are updated on the unit Qj whose net input is closest to 0, because t = 1.
Case 2 − if y ≠ t and t = -1 then,
wik(new) = wik(old) + α(-1 – Qink) xi
bk(new) = bk(old) + α(-1 – Qink)
In this case, the weights are updated on every unit Qk whose net input is positive, because t = -1.
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Case 3 – if y = t then, there would be no change in weights.
Step 8 − Test for the stopping condition.
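A hedged sketch of one Madaline (MRI-style) training step matching the cases above, assuming the Madaline-layer weights v and bias b0 are kept fixed and only the Adaline weights learn; names and α are illustrative:

def madaline_step(w, b, x, t, v, b0, alpha=0.5):
    # w[i][j], b[j]: adjustable weights and bias of Adaline unit j.
    # v[j], b0: fixed weights and bias of the single Madaline output unit.
    n, m = len(x), len(b)
    q_in = [b[j] + sum(x[i] * w[i][j] for i in range(n)) for j in range(m)]   # Step 5
    q = [1 if qi >= 0 else -1 for qi in q_in]                                 # Step 6
    yin = b0 + sum(q[j] * v[j] for j in range(m))
    y = 1 if yin >= 0 else -1
    if y == t:                                        # Case 3: no change in weights
        return w, b
    if t == 1:                                        # Case 1: raise the unit closest to 0
        j = min(range(m), key=lambda k: abs(q_in[k]))
        for i in range(n):
            w[i][j] += alpha * (1 - q_in[j]) * x[i]
        b[j] += alpha * (1 - q_in[j])
    else:                                             # Case 2: lower every unit with positive net input
        for j in range(m):
            if q_in[j] > 0:
                for i in range(n):
                    w[i][j] += alpha * (-1 - q_in[j]) * x[i]
                b[j] += alpha * (-1 - q_in[j])
    return w, b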
Quiz - Questions
1. Adaline which stands for Adaptive Linear Neuron, is a network having a --------
linear unit.
a) triple b) double c) single d) multiple
2. Madaline which stands for Multiple Adaptive Linear Neuron, is a network
which consists of many ---------- in parallel.
a) adalines b) neurons c) ports d) none
3. Madaline has a --------- output unit.
4. The weights and the --------- are adjustable.
5. For easy calculation and simplicity, weights and bias must be set equal to 1 and
the learning rate must be set equal to 0.
a) true b) false
Quiz - Answers
1. Adaline which stands for Adaptive Linear Neuron, is a network having a --------
linear unit.
c) single
2. Madaline which stands for Multiple Adaptive Linear Neuron, is a network
which consists of many ---------- in parallel.
a) adalines
3. Madaline has a --------- output unit.
single
4. The weights and the --------- are adjustable.
bias
5. For easy calculation and simplicity, weights and bias must be set equal to 1 and
the learning rate must be set equal to 0.
b) false
Action Plan
• Supervised Learning Networks (Cont…)
- Introduction to back propagation neural networks
- Back propagation network architecture
- Back propagation network training algorithm
- Generalized delta learning rule
• Quiz at the end of session
Back Propagation Neural Networks
• A Back Propagation Network (BPN) is a multilayer neural network consisting of the input layer, at least one hidden layer and the output layer.
• As its name suggests, back propagation of error takes place in this network.
- The error, which is calculated at the output layer by comparing the target output and the actual output, is propagated back towards the input layer.
Back Propagation Neural Networks - Architecture
• As shown in the diagram, the architecture of BPN has three interconnected
layers having weights on them.
• The hidden layer as well as the output layer also has a bias, which acts as a weight coming from a unit whose output is always 1.
Back Propagation Neural Networks – Training Algorithm
For training, BPN uses the binary sigmoid activation function. The training of BPN has the following three phases:
- Phase 1: Feed Forward Phase
- Phase 2: Back Propagation of error
- Phase 3: Updating of weights
All these phases are combined in the algorithm as follows.
Step 1 − Initialize the following to start the training −
- Weights
- Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue steps 3–11 while the stopping condition is not true.
Step 3 − Continue steps 4–10 for every training pair.
Back Propagation Neural Networks – Training Algorithm
Phase 1
Step 4 − Each input unit receives input signal xi and sends it to the hidden units, for all i = 1 to n.
Step 5 − Calculate the net input at the hidden unit using the following relation-
Qinj = b0j + ∑i=1..n xi vij   (j = 1 to p)
Here b0j is the bias on hidden unit j, and vij is the weight on unit j of the hidden layer coming from unit i of the input layer.
Now calculate the net output by applying the following activation function
Qj = f(Qinj)
Send these output signals of the hidden layer units to the output layer units.
Back Propagation Neural Networks – Training Algorithm
Step 6 – Calculate the net input at the output layer unit using the following relation –
yink = b0k + ∑j=1..p Qj wjk   (k = 1 to m)
Here b0k is the bias on output unit k, and wjk is the weight on unit k of the output layer coming from unit j of the hidden layer.
Calculate the net output by applying the following activation function
yk = f(yink)
Back Propagation Neural Networks – Training Algorithm
Phase 2
Step 7 − Compute the error correcting term, in correspondence with the target
pattern received at each output unit, as follows −
δk = (tk − yk) f′(yink)
On this basis, compute the weight and bias corrections as follows −
Δwjk = α δk Qj
Δb0k = α δk
Then, send δk back to the hidden layer.
Back Propagation Neural Networks – Training Algorithm
Step 8 − Now each hidden unit sums its delta inputs from the output units −
δinj = ∑k=1..m δk wjk
The error term can then be calculated as follows −
δj = δinj f′(Qinj)
On this basis, compute the weight and bias corrections as follows −
Δvij = α δj xi
Δb0j = α δj
Back Propagation Neural Networks – Training Algorithm
Phase 3
Step 9 − Each output unit (yk, k = 1 to m) updates its weight and bias as follows −
wjk(new) = wjk(old) + Δwjk
b0k(new) = b0k(old) + Δb0k
Step 10 − Each hidden unit (Qj, j = 1 to p) updates its weight and bias as follows −
vij(new) = vij(old) + Δvij
b0j(new) = b0j(old) + Δb0j
Step 11 − Check for the stopping condition, which may be either the number of
epochs reached or the target output matches the actual output.
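A compact NumPy sketch of the three phases for one hidden layer with the binary sigmoid. The array shapes, the XOR example, the learning rate and the number of hidden units are illustrative assumptions; convergence depends on the seed and number of epochs:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # binary sigmoid; f'(z) = f(z) * (1 - f(z))

def train_bpn(X, T, p=4, alpha=0.25, epochs=5000, seed=0):
    # X: (samples, n) inputs; T: (samples, m) targets in [0, 1]; p hidden units.
    rng = np.random.default_rng(seed)
    n, m = X.shape[1], T.shape[1]
    V = rng.uniform(-0.5, 0.5, (n, p)); b_v = rng.uniform(-0.5, 0.5, p)    # input -> hidden
    W = rng.uniform(-0.5, 0.5, (p, m)); b_w = rng.uniform(-0.5, 0.5, m)    # hidden -> output
    for _ in range(epochs):                       # Step 2
        for x, t in zip(X, T):                    # Step 3
            q = sigmoid(b_v + x @ V)              # Phase 1: Steps 4-5, hidden outputs Qj
            y = sigmoid(b_w + q @ W)              # Step 6, network outputs yk
            delta_k = (t - y) * y * (1 - y)       # Phase 2: Step 7, output error terms
            delta_j = (delta_k @ W.T) * q * (1 - q)   # Step 8, hidden error terms
            W += alpha * np.outer(q, delta_k); b_w += alpha * delta_k     # Phase 3: Step 9
            V += alpha * np.outer(x, delta_j); b_v += alpha * delta_j     # Step 10
    return V, b_v, W, b_w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # XOR inputs
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets
V, b_v, W, b_w = train_bpn(X, T)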
Generalized Delta Learning Rule
• The delta rule works only for the output layer.
• The generalized delta rule, also called the back-propagation rule, extends it by producing error (delta) values for the hidden layer.
Mathematical Formulation
For the activation function yk = f(yink), the net input on the output layer and on the hidden layer can be given by
yink = ∑j zj wjk
and zinj = ∑i xi vij
where zj = f(zinj) is the output of hidden unit j.
Now the error which has to be minimized is
E = 1/2 ∑k [tk − yk]2
Generalized Delta Learning Rule (Cont…)
By using the chain rule, we have
∂E/∂wjk = ∂/∂wjk ( 1/2 ∑k [tk − yk]2 )
= ∂/∂wjk ( 1/2 [tk − f(yink)]2 )
= −[tk − yk] ∂/∂wjk f(yink)
= −[tk − yk] f′(yink) ∂/∂wjk (yink)
= −[tk − yk] f′(yink) zj
Now let us say δk = [tk − yk] f′(yink), so that ∂E/∂wjk = −δk zj.
The gradient with respect to the weights on connections to the hidden unit zj is given by −
∂E/∂vij = −∑k δk ∂/∂vij (yink)
Generalized Delta Learning Rule (Cont…)
Putting in the value of yink, we get ∂E/∂vij = −δj xi with the hidden-layer error term
δj = ∑k δk wjk f′(zinj)
Weight updating can be done as follows −
For the output unit −
Δwjk = −α ∂E/∂wjk = α δk zj
For the hidden unit −
Δvij = −α ∂E/∂vij = α δj xi
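The signs above can be checked numerically. The sketch below (illustrative only; biases omitted for brevity, sigmoid assumed for f) compares the delta-rule gradients with finite-difference gradients of E = 1/2 ∑k [tk − yk]2 on a tiny random network:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(x, t, V, W):
    z = sigmoid(x @ V)                 # hidden outputs z_j
    y = sigmoid(z @ W)                 # network outputs y_k
    return 0.5 * np.sum((t - y) ** 2)  # E = 1/2 * sum_k (t_k - y_k)^2

rng = np.random.default_rng(1)
x, t = rng.normal(size=3), np.array([1.0, 0.0])
V, W = rng.normal(size=(3, 4)), rng.normal(size=(4, 2))

# Analytic gradients from the generalized delta rule.
z = sigmoid(x @ V); y = sigmoid(z @ W)
delta_k = (t - y) * y * (1 - y)              # delta_k = (t_k - y_k) f'(y_ink)
delta_j = (delta_k @ W.T) * z * (1 - z)      # delta_j = sum_k delta_k w_jk f'(z_inj)
dE_dW = -np.outer(z, delta_k)                # dE/dw_jk = -delta_k z_j
dE_dV = -np.outer(x, delta_j)                # dE/dv_ij = -delta_j x_i

# Finite-difference check on one weight of each layer.
eps = 1e-6
Wp = W.copy(); Wp[0, 0] += eps
Vp = V.copy(); Vp[0, 0] += eps
print(np.isclose((loss(x, t, V, Wp) - loss(x, t, V, W)) / eps, dE_dW[0, 0], atol=1e-4))
print(np.isclose((loss(x, t, Vp, W) - loss(x, t, V, W)) / eps, dE_dV[0, 0], atol=1e-4))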
Quiz - Questions
1. A Back Propagation Network (BPN) is a multilayer neural network consisting of the input layer, at least one ----------- layer and the output layer.
2. The hidden layer as well as the output layer also has bias, whose weight is
always 2, on them.
a) true b) false
3. What parameters have to be initialized to start the training?
4. For easy calculation and simplicity, take -------- values.
a) 2 b) 3 c) random d) none
5. Delta rule works only for the -------- layer.
Quiz - Answers
1. A Back Propagation Network (BPN) is a multilayer neural network consisting of the input layer, at least one ----------- layer and the output layer.
hidden
2. The hidden layer as well as the output layer also has bias, whose weight is
always 2, on them.
b) false
3. What parameters have to be initialized to start the training?
i. Weights ii. Learning rate α
4. For easy calculation and simplicity, take -------- values.
c) random
5. Delta rule works only for the --------- layer.
output
Action Plan
• Supervised Learning Network (Cont…)
- Introduction to LVQ
- LVQ architecture
- LVQ training algorithm
- LVQ variants
- Introduction to TDNN
- Implementation of TDNN
- Building & using TDNN
• Quiz at the end of session
Learning Vector Quantization (LVQ)
• Learning Vector Quantization (LVQ) is different from Vector Quantization (VQ) and Kohonen Self-Organizing Maps (KSOM).
• Basically, it is a competitive network which uses supervised learning.
• We may define it as a process of classifying the patterns where each output
unit represents a class.
LVQ - Architecture
• It can be seen that there are “n” number of input units and “m” number of
output units.
• The layers are fully interconnected, with weights on the connections.
LVQ – Training Algorithm
Parameters Used:
Following are the parameters used in LVQ training process as well as in the
flowchart.
x = training vector (x1,...,xi,...,xn)
T = class for training vector x
wj = weight vector for jth output unit
Cj = class associated with the jth output unit
Step 1 − Initialize reference vectors, which can be done as follows −
Step 1a − From the given set of training vectors, take the first “m” training vectors (m being the number of clusters/output units) and use them as weight vectors. The remaining vectors can be used for training.
Step 1b − Assign the initial weight and classification randomly.
Step 1c − Apply K-means clustering method.
LVQ – Training Algorithm (Cont…)
Step 2 − Initialize the learning rate α.
Step 3 − Continue with steps 4-9, if the condition for stopping this algorithm is not met.
Step 4 − Follow steps 5-6 for every training input vector x.
Step 5 − Calculate the square of the Euclidean distance for j = 1 to m −
D(j) = ∑i=1..n (xi − wij)2
Step 6 − Obtain the winning unit J where D(J) is minimum.
LVQ – Training Algorithm (Cont…)
Step 7 − Calculate the new weight of the winning unit by the following relation −
if T = CJ then wJ(new) = wJ(old) + α[x − wJ(old)]
if T ≠ CJ then wJ(new) = wJ(old) − α[x − wJ(old)]
Step 8 − Reduce the learning rate α.
Step 9 − Test for the stopping condition. It may be as follows −
• Maximum number of epochs reached.
• Learning rate reduced to a negligible value.
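A minimal NumPy sketch of one LVQ (LVQ1) training epoch, assuming the reference vectors W and their classes have already been initialized as in Step 1; names are illustrative:

import numpy as np

def lvq1_epoch(W, classes, X, T, alpha):
    # W: (m, n) reference/weight vectors; classes[j] = Cj; X: training vectors; T: their classes.
    for x, t in zip(X, T):
        d = np.sum((x - W) ** 2, axis=1)     # Step 5: squared Euclidean distance D(j)
        J = int(np.argmin(d))                # Step 6: winning unit J
        if t == classes[J]:                  # Step 7: move wJ toward x if classes match...
            W[J] += alpha * (x - W[J])
        else:                                # ...otherwise move wJ away from x
            W[J] -= alpha * (x - W[J])
    return W

After each epoch, α is reduced (Step 8) and the stopping condition of Step 9 is checked.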
LVQ - Flowchart
LVQ - Variants
• Three other variants namely LVQ2, LVQ2.1 and LVQ3 have been
developed by Kohonen.
LVQ2
• LVQ2 updates two reference vectors only when the input falls inside a “window” around the decision border. The window is based on the following parameters −
x − the current input vector
yc − the reference vector closest to x
yr − the other reference vector, which is next closest to x
dc − the distance from x to yc
dr − the distance from x to yr
LVQ - Variants
• The input vector x falls in the window if
dc / dr > 1 − θ and dr / dc < 1 + θ
• Here, θ is a small window-width parameter, typically chosen on the basis of the number of training samples.
• Updating can be done with the following formula −
yc(t+1) = yc(t) − α(t)[x(t) − yc(t)] (yc belongs to a different class than x, so it is moved away)
yr(t+1) = yr(t) + α(t)[x(t) − yr(t)] (yr belongs to the same class as x, so it is moved closer)
• Here α is the learning rate.
LVQ - Variants
LVQ2.1
• In LVQ2.1, we will take the two closest vectors namely yc1 and yc2 and
the condition for window is as follows −
Min [ dc1 / dc2, dc2 / dc1] > (1−θ)
Max [ dc1 / dc2, dc2 / dc1] < (1+θ)
• Updating can be done with the following formula −
yc1(t+1) = yc1(t) − α(t) [x(t) − yc1(t)] (yc1 belongs to a different class than x)
yc2(t+1) = yc2(t) + α(t) [x(t) − yc2(t)] (yc2 belongs to the same class as x)
• Here, α is the learning rate.
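A hedged sketch of the LVQ2.1 check and update, following the window condition and the sign convention stated above; the parameter values and names are illustrative assumptions:

import numpy as np

def lvq21_update(x, y1, c1, y2, c2, t, alpha=0.1, theta=0.3):
    # y1, y2: the two reference vectors closest to x (NumPy arrays); c1, c2: their classes; t: class of x.
    d1, d2 = np.linalg.norm(x - y1), np.linalg.norm(x - y2)
    in_window = (min(d1 / d2, d2 / d1) > 1 - theta) and (max(d1 / d2, d2 / d1) < 1 + theta)
    # Update only when exactly one of the two vectors carries the class of x
    # and x falls inside the window.
    if in_window and (c1 == t) != (c2 == t):
        for y, c in ((y1, c1), (y2, c2)):
            if c == t:
                y += alpha * (x - y)         # same class as x: move toward x
            else:
                y -= alpha * (x - y)         # different class: move away from x
    return y1, y2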
LVQ - Variants
LVQ3
• In LVQ3, we will take the two closest vectors namely yc1 and yc2 and the
condition for window is as follows −
Min [ dc1 / dc2, dc2 / dc1] > (1 − θ) (1 + θ)
• Here θ ≈ 0.2
• Updating can be done with the following formula −
yc1(t+1) = yc1(t) + β(t) [x(t) − yc1(t)]
yc2(t+1) = yc2(t) + β(t) [x(t) − yc2(t)] (applied when both yc1 and yc2 belong to the same class as x)
• Here β is a multiple of the learning rate α, with β = m α(t) for 0.1 < m < 0.5.
Time Delay Neural Network (TDNN)
• Time delay networks, introduced by Alex Waibel, are a group of neural networks that have a special
topology.
• They are used for position independent recognition of features within a larger pattern.
TDNNs (Cont…)
• Feature: A component of the pattern to be learned.
• Feature Unit: The unit connected with the feature to be learned.
• Delay: In order to recognize patterns in a place- or time-invariant way, older activation and connection values of the feature units have to be stored.
TDNNs (Cont…)
• Receptive Field: The feature units and their delays are fully connected to
the original units of the subsequent layer.
• Total Delay Length: The length of the layer.
• Coupled Links: Each link in a receptive field is reduplicated for every
subsequent step of time up to the total delay length.
TDNN Implementation in SNNS
• The original time delay algorithm was slightly modified for implementation
in Stuttgart Neural Network Simulator (SNNS), since it requires either
variable network sizes or fixed length input patterns.
• The coupled links are implemented as one physical (i.e. normal) link and a
set of logical links associated with it.
Building and Using a Time Delay Network
• In SNNS, TDNNs should be generated only with the tool BIGNET (Time
Delay).
• After the creation of the net, the unit activation function Act_TD_Logistic,
the update function TimeDelay_Order, and the learning function
TimeDelayBackprop have to be assigned in the usual way.
• If the application requires variable pattern length, a tool to segment these
patterns into fitting pieces has to be applied.
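Independently of SNNS, the receptive-field idea behind a TDNN can be illustrated as a sliding window whose weights are shared across time (the “coupled links”). A minimal NumPy sketch, purely illustrative and not the SNNS implementation:

import numpy as np

def time_delay_layer(x, W, b):
    # x: (T, f) input sequence, f features per time step.
    # W: (d, f, u) weights over a receptive field of d delays producing u units;
    #    the same W is applied at every time position (the coupled links share values).
    d, f, u = W.shape
    steps = x.shape[0] - d + 1
    out = np.zeros((steps, u))
    for t in range(steps):
        window = x[t:t + d]                                    # current receptive field
        out[t] = np.tanh(np.tensordot(window, W, axes=([0, 1], [0, 1])) + b)
    return out

x = np.random.randn(10, 3)          # 10 time steps, 3 features
W = 0.1 * np.random.randn(4, 3, 5)  # receptive field of 4 delays, 5 hidden units
b = np.zeros(5)
h = time_delay_layer(x, W, b)       # shape (7, 5): one output vector per window position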
Quiz - Questions
1. Each output unit represents a ----------.
a) class b) object c) data d) none
2. In step 1b of LVQ training algorithm, assign the initial weight and
------------- randomly.
3. Three other variants namely LVQ2, LVQ2.1 and LVQ3 have been
developed by ------------.
4. The coupled links are implemented as one physical link and a
set of logical links associated with it.
a) true b) false
5. In SNNS, TDNNs should be generated only with the tool -----------.
Quiz - Answers
1. Each output unit represents a ----------.
a) class
2. In step 1b of LVQ training algorithm, assign the initial weight and
------------- randomly.
classification
3. Three other variants namely LVQ2, LVQ2.1 and LVQ3 have been
developed by ------------ .
Kohonen
4. The coupled links are implemented as one physical link and a
set of logical links associated with it.
a) true
5. In SNNS, TDNNs should be generated only with the tool ----------- .
BIGNET
