UNIT – II
Multi-layer Perceptron – Going Forwards – Going Backwards: Back-Propagation of Error – Multi-layer Perceptron in Practice – Examples of using the MLP – Overview – Deriving Back-Propagation
Multi-Layer Perceptron Learning
A multi-layer perceptron (MLP) is an artificial neural network widely used for solving classification and regression tasks.
• An MLP consists of multiple layers that transform input data from one dimension to another.
• It has an input layer, one or more hidden layers, and an output layer.
• Its purpose is to model complex relationships between inputs and outputs for machine learning tasks.
• Input Layer: Each neuron (or node) in this layer corresponds to an input feature. For example, if you have three input features, the input layer has three neurons.
• Hidden Layers: An MLP can have any number of hidden layers, with each
layer containing any number of nodes to process the information.
• Output Layer: The output layer generates the final prediction or result.
Working of Multi-Layer Perceptron
The key mechanisms are forward propagation, the loss function, backpropagation, and optimization.
Step 1: Forward Propagation
The data flows from the input layer to the output layer, passing through any
hidden layers.
Each neuron in the hidden layers processes the input as follows:
Weighted Sum: The neuron computes the weighted sum of the inputs:
$$z = \sum_i w_i x_i + b$$
$x_i$ is the input feature.
$w_i$ is the corresponding weight.
$b$ is the bias term.
Activation Function:
The weighted sum z is passed through an activation function to introduce non-
linearity.
• Sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$
• ReLU (Rectified Linear Unit): $f(z) = \max(0, z)$
• Tanh (Hyperbolic Tangent): $\tanh(z) = \frac{2}{1 + e^{-2z}} - 1$
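The weighted sum and the three activation functions can be sketched in plain Python as follows. The input, weight, and bias values are illustrative assumptions, not values from these notes.

```python
import math

def weighted_sum(inputs, weights, bias):
    """Compute z = sum_i w_i * x_i + b for a single neuron."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

def tanh(z):
    # Same form as the formula in the notes; agrees with math.tanh(z).
    return 2.0 / (1.0 + math.exp(-2.0 * z)) - 1.0

# Illustrative values only (assumed for this sketch).
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.1]
b = 0.2
z = weighted_sum(x, w, b)   # 0.5 - 0.5 + 0.3 + 0.2, i.e. about 0.5
activations = {"sigmoid": sigmoid(z), "relu": relu(z), "tanh": tanh(z)}
```

Sigmoid squashes z into (0, 1), tanh into (-1, 1), and ReLU passes positive values through unchanged while clamping negatives to zero.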
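Step 2: Loss Function
• Once the network has produced an output, the loss is calculated using a loss function.
• In supervised learning, this compares the predicted output to the actual label.
• For a binary classification problem, the commonly used binary cross-entropy loss function is:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
where $y_i$ is the actual label, $\hat{y}_i$ is the predicted output, and $N$ is the number of samples.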
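A minimal Python sketch of the binary cross-entropy loss, averaged over the samples, might look like this. The clipping constant `eps` is an assumption added here to avoid taking log(0); it is not part of the notes.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average binary cross-entropy between labels (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

# Illustrative labels and predictions (assumed values).
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure = binary_cross_entropy([1, 0], [0.6, 0.4])
```

Predictions that are closer to the true labels give a smaller loss, so `confident` comes out lower than `unsure`.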
Step 3: Backpropagation
This is used to minimize the loss function by adjusting the network’s weights
and biases.
1. Gradient Calculation: The gradients of the loss function with respect to each
weight and bias are calculated.
2. Error Propagation: The error is propagated back through the network, layer
by layer.
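3. Gradient Descent: The network updates the weights and biases by moving in the opposite direction of the gradient to reduce the loss:
$$w \leftarrow w - \alpha \frac{\partial L}{\partial w}, \qquad b \leftarrow b - \alpha \frac{\partial L}{\partial b}$$
where $\alpha$ is the learning rate.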
Step 4: Optimization
MLPs rely on optimization algorithms to iteratively refine the weights and
biases during training.
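Stochastic Gradient Descent (SGD):
Updates the weights based on a single sample or a small batch of data:
$$w \leftarrow w - \alpha \frac{\partial L_i}{\partial w}$$
where $L_i$ is the loss computed on the sampled example or mini-batch and $\alpha$ is the learning rate.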
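To make the per-sample update concrete, here is a small sketch of SGD fitting a single weight. The model (y = w·x), the half-squared-error loss, the learning rate, and the data are all assumptions chosen for illustration.

```python
import random

def sgd_fit(samples, lr=0.1, epochs=20, seed=0):
    """Fit y = w * x by updating w after every single sample."""
    rng = random.Random(seed)
    data = list(samples)
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)              # visit the samples in random order
        for x, y in data:
            grad = (w * x - y) * x     # d/dw of 0.5 * (w*x - y)**2
            w -= lr * grad             # step against the gradient
    return w

# Illustrative data generated from y = 2x, so the fitted weight should approach 2.
w = sgd_fit([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)])
```

Each update uses only one sample, which is what distinguishes SGD from batch gradient descent, where the gradient is averaged over the whole dataset before each step.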
Significance of Forward Propagation
• Prediction: It is the mechanism by which neural networks make predictions.
By processing input data through multiple layers, the network can learn
complex patterns and relationships.
• Training: Forward propagation is used during the training phase to calculate
the network’s output, which is then compared to the actual target to
compute the loss. This loss is used to update the weights and biases through
backward propagation.
• Efficiency: Efficient forward propagation ensures that the network can handle large datasets and complex tasks without excessive computational costs.
Significance of Backpropagation
• Learning: It enables neural networks to learn from data by iteratively
adjusting weights and biases to minimize the error.
• Efficiency: Backpropagation leverages the chain rule to efficiently compute
gradients, making it feasible to train deep networks with many layers.
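• Generalization: By minimizing the loss on the training data, backpropagation fits the network to the underlying patterns; when training is monitored on held-out data, this helps the network generalize to unseen data and improves its predictive performance.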
OR Neural Network
• A single-layer perceptron can solve linearly separable functions such as the OR function.
• It is possible to draw a single straight line (hyperplane) to separate and group the output patterns.
• The data is linearly separable using a 1-dimensional hyperplane, that is, a line in the 2-dimensional input space.
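As a small illustration, a single threshold neuron can compute OR. The weights and bias below are hand-picked assumptions; any line that separates (0, 0) from the other three points would work equally well.

```python
def perceptron(x1, x2, w1=1.0, w2=1.0, bias=-0.5):
    """Threshold unit: outputs 1 when w1*x1 + w2*x2 + bias > 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + bias > 0 else 0

# Evaluate the neuron on all four input patterns.
or_table = {(a, b): perceptron(a, b) for a in (0, 1) for b in (0, 1)}
```

The decision boundary here is the line x1 + x2 = 0.5, which leaves (0, 0) on one side and the three remaining patterns on the other.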
XOR Neural Network
• A single-layer perceptron model cannot solve the XOR function, since no single straight line separates its output patterns.
• It is possible to draw two straight lines to separate and group the output patterns.
• A multi-layer perceptron containing an extra layer of hidden neurons is capable of solving such problems.
• The data is now linearly separable
using a 2-dimensional hyperplane.
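One way to see how a hidden layer solves XOR is to let each hidden neuron draw one of the two straight lines. The sketch below uses threshold units with hand-picked weights, which are assumptions for illustration and not weights learned by training.

```python
def step(z):
    """Threshold activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    h_or = step(x1 + x2 - 0.5)        # first line: fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)       # second line: fires only if both inputs are 1
    return step(h_or - h_and - 0.5)   # output: "OR but not AND"

xor_table = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

The output neuron only has to separate the hidden-layer patterns, which is a linearly separable problem even though XOR on the raw inputs is not.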
Multi-Layer Perceptron Algorithm:
1. An input vector is put into the input nodes.
2. The inputs are fed forward through the network:
• The inputs and the first-layer weights are used to decide whether the hidden nodes fire or not.
• The activation function g(·) is the sigmoid function.
• The outputs of these neurons and the second-layer weights (labelled as w) are used to decide if the output neurons fire or not.
3. The error is computed as the sum-of-squares difference between the network outputs and the targets.
4. This error is fed backwards through the network in order to:
• First update the second-layer weights, and then afterwards the first-layer weights.
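Steps 1 to 3 of this algorithm (the forward pass and the error) can be sketched as below. The function names, the weight layout, and the factor of 1/2 in the error are assumptions made for this sketch; the 1/2 is a common convention that simplifies the derivative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """weights[j][i] connects input i to neuron j of the layer."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, hidden_w, hidden_b, output_w, output_b):
    hidden = layer_forward(x, hidden_w, hidden_b)        # first-layer weights
    outputs = layer_forward(hidden, output_w, output_b)  # second-layer weights
    return hidden, outputs

def sum_of_squares_error(outputs, targets):
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))

# Illustrative 2-2-1 network with all weights zero: every sigmoid outputs 0.5.
hidden, outputs = mlp_forward([1.0, 0.0],
                              [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0],
                              [[0.0, 0.0]], [0.0])
error = sum_of_squares_error(outputs, [1.0])   # 0.5 * (0.5 - 1.0)**2 = 0.125
```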
BACK PROPAGATION ALGORITHM
• This algorithm is used for training feedforward neural networks.
• It computes the gradient of the loss function with respect to the network
weights.
Step 1: Inputs X, arrive through the preconnected path.
Step 2: The input is modeled using the weights W, which are usually initialized randomly.
Step 3: Calculate the output of each neuron from the input layer to the hidden
layer to the output layer.
Step 4: Calculate the error in the outputs:
Backpropagation Error = Desired Output – Actual Output
Step 5: From the output layer, go back to the hidden layer to adjust the weights
to reduce the error.
Step 6: Repeat the process until the desired output is achieved.
Parameters :
• x = input training vector, $x = (x_1, x_2, \dots, x_n)$.
• t = target vector, $t = (t_1, t_2, \dots, t_m)$.
• $\delta_k$ = error at output unit $y_k$.
• $\delta_j$ = error at hidden unit $z_j$.
• $\alpha$ = learning rate.
• $W_{0j}$ = bias of hidden unit j.
Training Algorithm :
Step 1: Initialize the weights to small random values.
Step 2: While the stopping condition is false, do steps 3 to 8.
Step 3: For each training pair, do steps 4 to 7. Steps 4 and 5 form the feed-forward phase.
Step 4: Each input unit receives the input signal $x_i$ and transmits it to all the units in the hidden layer.
Step 5: Each hidden unit $z_j$ ($j = 1$ to $a$) sums its weighted input signals to calculate its net input:
$$z_{\mathrm{in},j} = W_{0j} + \sum_{i=1}^{n} x_i W_{ij}$$
It applies the activation function, $z_j = f(z_{\mathrm{in},j})$, and sends this signal to all units in the layer above, i.e. the output units.
Each output unit $y_k$ ($k = 1$ to $m$) sums its weighted input signals:
$$y_{\mathrm{in},k} = w_{0k} + \sum_{j=1}^{a} z_j w_{jk}$$
and applies its activation function to calculate the output signal:
$$y_k = f(y_{\mathrm{in},k})$$
Step 6: Each output unit $y_k$ ($k = 1$ to $m$) receives a target pattern corresponding to the input pattern, and its error term is calculated as:
$$\delta_k = (t_k - y_k)\, f'(y_{\mathrm{in},k})$$
Each hidden unit $z_j$ ($j = 1$ to $a$) sums the error terms from all units in the layer above:
$$\delta_{\mathrm{in},j} = \sum_{k=1}^{m} \delta_k w_{jk}$$
and its own error term is $\delta_j = \delta_{\mathrm{in},j}\, f'(z_{\mathrm{in},j})$.
Updating the weights and biases:
Step 7: Each output unit $y_k$ ($k = 1$ to $m$) updates its bias and weights ($j = 1$ to $a$). The weight correction term is given by:
$$\Delta w_{jk} = \alpha\, \delta_k z_j$$
Step 8: Test the stopping condition. The stopping condition can be the minimization of the error or reaching a set number of epochs.
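The training steps for a single training pair can be sketched as follows, assuming sigmoid activations so that f'(net) = f(net)(1 − f(net)). The notes give only the correction for the output weights; the bias and hidden-layer corrections below follow the same pattern and are assumptions of this sketch, as are the network size and all numeric values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W, W0, w, w0):
    """W[i][j]: input i -> hidden j; w[j][k]: hidden j -> output k."""
    z = [sigmoid(W0[j] + sum(x[i] * W[i][j] for i in range(len(x))))
         for j in range(len(W0))]
    y = [sigmoid(w0[k] + sum(z[j] * w[j][k] for j in range(len(z))))
         for k in range(len(w0))]
    return z, y

def train_step(x, t, W, W0, w, w0, alpha):
    """One feed-forward and back-propagation pass; updates the weights in place."""
    z, y = forward(x, W, W0, w, w0)
    # Error terms at the output units and at the hidden units.
    delta_k = [(t[k] - y[k]) * y[k] * (1 - y[k]) for k in range(len(y))]
    delta_j = [sum(delta_k[k] * w[j][k] for k in range(len(y))) * z[j] * (1 - z[j])
               for j in range(len(z))]
    # Output layer: delta_w_jk = alpha * delta_k * z_j, bias uses input 1.
    for k in range(len(y)):
        for j in range(len(z)):
            w[j][k] += alpha * delta_k[k] * z[j]
        w0[k] += alpha * delta_k[k]
    # Hidden layer (assumed analogous): delta_W_ij = alpha * delta_j * x_i.
    for j in range(len(z)):
        for i in range(len(x)):
            W[i][j] += alpha * delta_j[j] * x[i]
        W0[j] += alpha * delta_j[j]
    return y

def squared_error(y, t):
    return 0.5 * sum((tk - yk) ** 2 for tk, yk in zip(t, y))

# Illustrative 2-2-1 network; every number here is made up for the demo.
x, t = [0.35, 0.7], [0.5]
W = [[0.2, 0.3], [0.2, 0.3]]
W0 = [0.0, 0.0]
w = [[0.3], [0.9]]
w0 = [0.0]
before = squared_error(forward(x, W, W0, w, w0)[1], t)
train_step(x, t, W, W0, w, w0, alpha=1.0)
after = squared_error(forward(x, W, W0, w, w0)[1], t)
```

Note that both sets of error terms are computed before any weight is changed, so the hidden-layer errors use the old output weights. In this example the error on the training pair is smaller after the update than before it.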
Multi-layer perceptron in practice:
• This section covers the choices that can be made about the network in order to use it for solving real problems.
Amount of Training Data:
1. For the MLP with one hidden layer there are (L + 1) × M + (M + 1) × N weights, where L, M, and N are the number of nodes in the input, hidden, and output layers, respectively.
2. The extra +1s come from the bias nodes, which also have adjustable weights.
3. Setting the values of these weights is the job of the back-propagation algorithm, which is driven by the errors coming from the training data.
4. In general, the more training data there is, the better for learning, although the time that the algorithm takes to learn increases.
5. The number of training examples needed is likely to be very large, so neural network training is a fairly computationally expensive operation.
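The weight count is easy to check with a one-line helper. The layer sizes used in the example (4 inputs, 5 hidden nodes, 3 outputs) are assumed values for illustration.

```python
def count_weights(n_inputs, n_hidden, n_outputs):
    """Weights in a one-hidden-layer MLP, counting one bias weight per non-input node."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

n_weights = count_weights(4, 5, 3)   # (4 + 1) * 5 + (5 + 1) * 3 = 43
```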
Number of Hidden Layers:
Two choices must be made:
o The number of hidden nodes
o The number of hidden layers
• It is possible to show mathematically that one hidden layer with lots of hidden nodes is sufficient to approximate any continuous function on a bounded input region to any desired accuracy.
• This is known as the Universal Approximation Theorem.
When to stop Learning:
• The training of the MLP requires that the algorithm runs over the entire
dataset many times, with the weights changing as the network makes errors
in each iteration.
• There are two simple options for deciding when to stop:
o A predefined number of iterations
o A predefined minimum error is reached
• Using both of these options together can help, as can terminating the learning once the error stops decreasing.
• We train the network for some predetermined amount of time, and then use the validation set to estimate how well the network is generalizing.
• We then carry on training for a few more iterations, and repeat the whole process.
• At some stage the error on the validation set will start increasing again, because the network has stopped learning about the function that generated the data and has started to learn about the noise that is in the data itself.
• At this stage we stop the training.
• This technique is called early stopping.
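The stopping rule can be sketched as a function over a sequence of validation errors, one per check. The `patience` parameter (how many non-improving checks to tolerate before stopping) and the error values are assumptions for this sketch; the notes only say to stop once the validation error starts increasing.

```python
def early_stopping_epoch(validation_errors, patience=2):
    """Return the index of the best check, stopping once the validation
    error has failed to improve for `patience` consecutive checks."""
    best_error = float("inf")
    best_epoch = -1
    checks_without_improvement = 0
    for epoch, error in enumerate(validation_errors):
        if error < best_error:
            best_error, best_epoch = error, epoch
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                break   # validation error keeps rising: stop training
    return best_epoch

# Illustrative validation errors: they fall, then start rising again.
errors = [0.90, 0.60, 0.45, 0.40, 0.42, 0.47, 0.55]
best = early_stopping_epoch(errors)
```

In practice the weights saved at the best check would be restored, since the later iterations have started fitting noise.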
Examples of using the MLP:
We apply MLP to find solutions to four different types of problems:
• Regression
• Classification
• Time-series prediction
• Data compression
1. Regression:
• Regression is a statistical technique that is used for predicting continuous
outcomes.
• For regression tasks, we do not apply any activation function to the output layer of the MLP; it simply computes the weighted sum and sends it as the output.
• However, if you want the output to lie within a given range, for example between -1 and +1, you can use an activation such as the Tanh (Hyperbolic Tangent) function.
• The loss functions that can be used in a regression MLP include Mean Squared Error (MSE) and Mean Absolute Error (MAE).
• Examples: rainfall prediction, stock price prediction.
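Both regression losses are short to write down. The target and prediction values below are illustrative assumptions.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def mean_absolute_error(targets, predictions):
    """Average of the absolute differences; less sensitive to outliers."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

# Illustrative values: two perfect predictions and one that is off by 2.
targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 5.0]
mse = mean_squared_error(targets, predictions)    # (0 + 0 + 4) / 3
mae = mean_absolute_error(targets, predictions)   # (0 + 0 + 2) / 3
```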
Classification:
• If the output variable is categorical, then we have to use classification for
prediction.
• Example: Iris Flower classification
• The aim is to classify iris flowers among three species (Setosa, Versicolor, or Virginica).
• The neural network for this example has one input layer, two hidden layers and one output layer.
• In the hidden layers we use sigmoid as the activation function for all neurons.
• In the output layer, we use softmax as the activation function for the three output neurons.
• As a result, all outputs are between 0 and 1, and their sum is 1.
• The neural network has three outputs since the target variable contains three classes (Setosa, Versicolor, and Virginica).
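The softmax output layer can be sketched as below. The three raw scores are assumed values standing in for the weighted sums of the three output neurons, and subtracting the maximum score is a standard numerical-stability trick added here rather than something stated in the notes.

```python
import math

def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]  # shift to avoid overflow
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative scores for (Setosa, Versicolor, Virginica).
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))   # index of the most probable species
```

The predicted species is simply the output neuron with the highest probability.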
Time series Prediction:
• Here, we have a set of data that shows how something varies over time, and we want to predict how the data will vary in the future.
• The problem is that even if there is some regularity in the time-series, it can appear over many different scales.
• For example, there is often seasonal variation in temperatures.
• Example: predicting future ozone levels to see whether an overall drop in the mean ozone level can be detected.
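A common way to feed a time-series to an MLP, assumed here since the notes do not spell it out, is to slide a fixed-size window over the series: each window of past values becomes an input vector and the value that follows it becomes the target. The series values and window size are illustrative.

```python
def make_windows(series, window):
    """Build (input, target) pairs: `window` consecutive values predict the next one."""
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets

# Illustrative series with a window of three past values.
X, y = make_windows([10, 11, 13, 12, 14, 15], window=3)
```

The window size controls the time scale the network can see, which is one reason regularities at different scales are hard to capture with a single choice.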
Data Compression / Data denoising:
• We train the network to reproduce its inputs at the output layer; this is called auto-associative learning.
• Suppose we use a hidden layer that has fewer neurons than the input layer.
• This bottleneck hidden layer has to represent all of the information in the input.
• It therefore performs some compression of the data, representing it using fewer dimensions than were used in the input.
• The result is a different representation of the input data that extracts the important components of the data and ignores the noise.
Example of Backpropagation in Machine Learning
• Assume the neurons use the sigmoid activation function for the forward and
backward pass.
• The target output is 0.5, and the learning rate is 1.
2. Sigmoid Function
The sigmoid function returns a value between 0 and 1, introducing non-linearity into the model.
4. Error Calculation
Our target output is 0.5, but we obtained 0.67.
To calculate the error we can use the formula below:
Error = Target − Output = 0.5 − 0.67 = −0.17
We use this error value for backpropagation.
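The arithmetic of this step, and the first quantity that backpropagation would derive from it, can be checked in a few lines. The error term `delta` assumes the sigmoid output neuron stated in the example, whose derivative is output × (1 − output); that follow-on step is a sketch and is not worked out in these notes.

```python
target, output = 0.5, 0.67

error = target - output                    # 0.5 - 0.67 = -0.17
sigmoid_slope = output * (1.0 - output)    # derivative of the sigmoid at this output
delta = error * sigmoid_slope              # error term for the output neuron, about -0.0376
```

The negative sign says the output is too high, so the weight corrections (learning rate × delta × incoming signal) push the output down towards the target.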