Feedforward neural networks (multilayer perceptrons)
YONG Sopheaktra
M1, Yoshikawa-Ma Laboratory
Kyoto University
2015/07/26
Content
• Artificial Neural Network
• Perceptron Algorithm
• Multi-layer perceptron (MLP)
• Overfitting & Regularization
Artificial Neural Network
• An Artificial Neural Network (ANN) is a system based on the biological neural network (the brain).
▫ The brain has approximately 100 billion neurons, which communicate through electro-chemical signals
▫ Each neuron receives thousands of connections (signals)
▫ If the resulting sum of signals surpasses a certain threshold, a response is sent
• The ANN attempts to recreate a computational mirror of the biological neural network …
What is a Perceptron?
• A perceptron models a neuron
• It receives n inputs (a feature vector)
• It computes a weighted sum of those inputs and then outputs the result of an activation function
• It is used for linear (binary) classification
Perceptron
• The perceptron consists of weights, the summation processor, and an
activation function
• A perceptron takes a weighted sum of inputs and outputs:
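y = f( Σᵢ wᵢ·xᵢ + b )

where the xᵢ are the inputs, the wᵢ the weights, b the bias, and f the activation function.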
Weight & Bias
• Bias can also be treated as another input (a constant input of 1 with its own weight)
▫ The bias allows the decision boundary to shift away from the origin
• The weights determine the slope of the decision boundary
Transfer or Activation Functions
• The transfer function translates the input signals into output signals
• It uses a threshold to produce an output
• Some examples (sketched in the code below) are:
▫ Unit step (threshold)
▫ Sigmoid (logistic regression)
▫ Piecewise linear
▫ Gaussian
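A minimal Java sketch of these four functions; the threshold theta, the clamping bounds, and the Gaussian center/width are illustrative parameters, not values taken from the slides:

// Common activation functions for a single neuron.
public final class Activations {
    // Unit step: outputs 1 if the net input reaches the threshold theta.
    static double unitStep(double net, double theta) {
        return net >= theta ? 1.0 : 0.0;
    }

    // Sigmoid (logistic): squashes any real input into (0, 1).
    static double sigmoid(double net) {
        return 1.0 / (1.0 + Math.exp(-net));
    }

    // Piecewise linear: proportional to the input, clamped to [0, 1].
    static double piecewiseLinear(double net) {
        return Math.max(0.0, Math.min(1.0, net));
    }

    // Gaussian: bell-shaped, peaks when the input equals the center c.
    static double gaussian(double net, double c, double sigma) {
        double d = net - c;
        return Math.exp(-(d * d) / (2.0 * sigma * sigma));
    }
}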
Unit Step (Threshold)
• The output is set to one of two values depending on whether the total input is greater or less than some threshold value
Piecewise Linear
• The output is proportional to the total weighted input, saturating at its minimum and maximum values outside that range
Sigmoid Function
• It is used when the output is expected to be a positive number
▫ It generates outputs between 0 and 1: σ(net) = 1 / (1 + e^(−net))
Gaussian
• Gaussian functions are continuous, bell-shaped curves
• They are used in radial basis function ANNs (RBF kernel – Chapter 14)
▫ The output is a real value
The Learning Rate
• The weights and bias are updated to reduce the error
• The learning rate helps us control how much we change the weights and bias at each update
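In the standard perceptron learning rule (shown in full on the next slide), the learning rate η scales every update:

wᵢ ← wᵢ + η · (actual − predicted) · xᵢ
b ← b + η · (actual − predicted)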
How does the algorithm work?
• Initialize the weights (to zero or small random values)
• Pick a learning rate (between 0 and 1)
• For each training example:
▫ Compute the activation output
▫ Adjust:
 Error = difference between predicted and actual output
 Update the bias and weights
• Repeat until the error is very small or zero
• If the data are linearly separable, the algorithm will find a solution (a minimal sketch follows)
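A minimal Java sketch of this loop, assuming a unit-step activation; the method name and signature are illustrative and not taken from the linked Perceptron.java:

// A minimal perceptron trainer with a unit-step activation.
// X: training inputs, y: target labels (0 or 1), eta: learning rate in (0, 1].
// Returns the learned weights, with the bias stored in the last slot.
static double[] trainPerceptron(double[][] X, int[] y, double eta, int maxEpochs) {
    int n = X[0].length;
    double[] w = new double[n + 1]; // w[0..n-1]: weights, w[n]: bias (all zero)
    for (int epoch = 0; epoch < maxEpochs; epoch++) {
        int mistakes = 0;
        for (int i = 0; i < X.length; i++) {
            double net = w[n]; // start from the bias
            for (int j = 0; j < n; j++) net += w[j] * X[i][j];
            int predicted = net >= 0 ? 1 : 0; // unit-step activation
            int error = y[i] - predicted;     // actual minus predicted
            if (error != 0) {
                mistakes++;
                for (int j = 0; j < n; j++) w[j] += eta * error * X[i][j];
                w[n] += eta * error; // bias updated like a weight on a constant input of 1
            }
        }
        if (mistakes == 0) break; // every example classified correctly: solution found
    }
    return w;
}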
Demo: Perceptron.zip/Perceptron.java (https://github.com/nsadawi/perceptron)
What if the data is non-linearly separable?
• A single-layer perceptron (SLP) is a linear classifier, so if the data are not linearly separable the learning process will never find a solution
• For example: the XOR problem (truth table below)
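The XOR truth table shows why no single line can separate the two classes:

x₁  x₂ | XOR
 0   0 |  0
 0   1 |  1
 1   0 |  1
 1   1 |  0

The points (0,0) and (1,1) fall in one class and (0,1) and (1,0) in the other, so any separating boundary must be non-linear.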
Demo: Perceptron.zip/Perc.java
XOR Classification (Xor_classification.zip)
Multi-layer perceptron (MLP)
• A series of logistic regression models stacked on top of each other, with the final layer being either another logistic regression model or a linear regression model, depending on whether we are solving a classification or a regression problem
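A minimal Java sketch of a forward pass through one hidden layer, assuming sigmoid units in both layers (a classification setting); all parameter names are illustrative:

// Forward pass through an MLP with one hidden layer of sigmoid units.
// W1[h][j]: input-to-hidden weights, b1[h]: hidden biases.
// w2[h]: hidden-to-output weights, b2: output bias.
static double forward(double[] x, double[][] W1, double[] b1,
                      double[] w2, double b2) {
    int hidden = W1.length;
    double[] h = new double[hidden];
    // Hidden layer: each unit is a small logistic regression over the inputs.
    for (int i = 0; i < hidden; i++) {
        double net = b1[i];
        for (int j = 0; j < x.length; j++) net += W1[i][j] * x[j];
        h[i] = 1.0 / (1.0 + Math.exp(-net)); // sigmoid
    }
    // Output layer: a logistic regression over the hidden activations.
    double net = b2;
    for (int i = 0; i < hidden; i++) net += w2[i] * h[i];
    return 1.0 / (1.0 + Math.exp(-net));
}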
A closer look
The Back-Propagation Algorithm
• Use the output error to adjust the weights of the inputs at the output layer
• Calculate the error at the previous layer and use it to adjust that layer's weights
• Repeat this process, back-propagating the errors through any number of layers
• The mathematical details of minimizing the neural network's cost function can be found in Section 16.5.4, The backpropagation algorithm
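For the one-hidden-layer sigmoid network sketched earlier, with squared error E = ½(ŷ − y)², the standard updates work out to (a worked summary under those assumptions, not taken from the slides):

δ_out = (ŷ − y) · ŷ · (1 − ŷ)            (error at the output unit)
δ_h = h_h · (1 − h_h) · w2_h · δ_out      (error at hidden unit h)
w2_h ← w2_h − η · δ_out · h_h             (output-layer weight update)
W1_hj ← W1_hj − η · δ_h · x_j             (hidden-layer weight update)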
Convolutional neural networks
http://yann.lecun.com/exdb/lenet/index.html
• Designed to recognize visual patterns directly from pixel images with minimal preprocessing
• Multiple hidden units are used to learn non-linear combinations of the original inputs (feature extraction)
▫ Each individual pixel in an image is not very informative
▫ But the combination of pixels will tell
Multiple-Classifier
Demo: Machine-learning-ex3.zip
Overfitting Problem
Cross-validation error
How to address it?
• Simplify the parameters/features
▫ Remove some unnecessary features
• Regularization (see the sketch below)
▫ Adjust (penalize) the weights
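A common form of regularization adds a penalty on the weight magnitudes to the cost function; a standard L2 (weight-decay) version, with λ an assumed penalty strength:

J_reg(w) = J(w) + λ · Σᵢ wᵢ²

Larger λ pushes the weights toward zero, which smooths the decision boundary and reduces overfitting.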
Regularization
• The MLP can overfit, especially if the number of nodes is large
• A simple way to prevent this is early stopping (sketched after this list)
▫ Stop the training procedure when the error on the validation set first starts to increase
• Other techniques are:
▫ Consistent Gaussian prior
▫ Weight pruning: make the parameter values smaller
▫ Soft weight sharing: groups of parameters are encouraged to take similar values
▫ Semi-supervised embedding: used with deep-learning NNs
▫ Bayesian inference
 Determines the number of hidden units – faster than cross-validation
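A minimal Java sketch of early stopping; trainOneEpoch and validationError are hypothetical hooks supplied by the caller, not part of any library named in the slides:

// Early stopping: train until the validation error starts to rise.
static int earlyStopping(Runnable trainOneEpoch,
                         java.util.function.DoubleSupplier validationError,
                         int maxEpochs) {
    double best = Double.POSITIVE_INFINITY;
    int bestEpoch = 0;
    for (int epoch = 1; epoch <= maxEpochs; epoch++) {
        trainOneEpoch.run();                      // one pass over the training set
        double err = validationError.getAsDouble();
        if (err < best) {                         // validation error still improving
            best = err;
            bestEpoch = epoch;
        } else {
            break;                                // error first starts to increase: stop
        }
    }
    return bestEpoch; // the epoch whose weights should be kept
}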
Thank You

References
• https://www.coursera.org/learn/machine-learning
• https://www.youtube.com/playlist?list=PLea0WJq13cnCS4LLMeUuZmTxqsqlhwUoe
• http://yann.lecun.com/exdb/lenet/index.html

Editor's Notes
• #5: Dendrite = input; Axon = output