Lecture 2: Artificial Neural Network
Artificial Neural Networks are computing systems inspired by biological neural networks: they mimic the way biological nervous systems, such as the brain, process information.
A biological neuron has dendrites to receive signals, a cell body to process them, and an axon to send signals out to other neurons. Analogously, an artificial neuron has a number of input channels, a processing stage, and one output that can fan out to multiple other artificial neurons.
Rosenblatt proposed the earliest learning algorithm for artificial neural networks, called the Perceptron.
A linear combination of the inputs, passed through a threshold activation function, determines the output of the unit.
Example: training a perceptron to learn the AND function.

x1 x2 | y
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1

[Plot: the four AND points in the (x1, x2) plane with the current decision boundary]

Learning rate: µ = 0.3
Initial weights: w0 = -0.5, w1 = 1, w2 = 1
Threshold activation: θ(x) = 1 if x > 0, otherwise 0
With a constant bias input x0 = 1, the unit computes:
net = w1x1 + w2x2 + w0x0 = x1 + x2 - 0.5
Present (x1, x2) = (0, 0), target ya = 0:
net = w1x1 + w2x2 + w0x0 = x1 + x2 - 0.5 = 0 + 0 - 0.5 = -0.5
yd = θ(net) = θ(-0.5) = 0
yd = ya: correct, no weight update.
Present (x1, x2) = (0, 1), target ya = 0:
net = x1 + x2 - 0.5 = 0 + 1 - 0.5 = +0.5
yd = θ(net) = θ(+0.5) = 1
yd != ya: incorrect, so update the weights.
Update each weight with wi = wi - µ * xi (the unit output 1 but the target is 0):
w0 = -0.5 - 0.3*1 = -0.8
w1 = 1 - 0.3*0 = 1
w2 = 1 - 0.3*1 = 0.7
New weights: w0 = -0.8, w1 = 1, w2 = 0.7
New decision boundary: x1 + 0.7x2 - 0.8 = 0
Present (x1, x2) = (1, 0), target ya = 0:
net = w1x1 + w2x2 + w0x0 = x1 + 0.7x2 - 0.8 = 1 + 0 - 0.8 = +0.2
yd = θ(net) = θ(+0.2) = 1
yd != ya: incorrect, so update again.
Update each weight with wi = wi - µ * xi:
w0 = -0.8 - 0.3*1 = -1.1
w1 = 1 - 0.3*1 = 0.7
w2 = 0.7 - 0.3*0 = 0.7
Final weights: w0 = -1.1, w1 = 0.7, w2 = 0.7
Final decision boundary: 0.7x1 + 0.7x2 - 1.1 = 0, which now classifies all four AND patterns correctly.
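The walkthrough above fits in a short program. Here is a minimal sketch in Python (our own code, not from the slides); it uses the signed form wi = wi + µ*(ya - yd)*xi of the update, which reduces to the rule used above whenever the unit outputs 1 on a target of 0:

# Minimal perceptron learning AND, mirroring the walkthrough above.
# The bias is weight w[0] with constant input x0 = 1; theta is the step function.
patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = [-0.5, 1.0, 1.0]          # w0 (bias), w1, w2
mu = 0.3

def theta(net):
    return 1 if net > 0 else 0

for epoch in range(10):        # a few passes over the patterns suffice here
    errors = 0
    for (x1, x2), ya in patterns:
        yd = theta(w[0] + w[1] * x1 + w[2] * x2)
        if yd != ya:           # misclassified: move the decision boundary
            errors += 1
            for i, xi in enumerate((1, x1, x2)):
                w[i] += mu * (ya - yd) * xi   # signed perceptron rule
    if errors == 0:
        break

print(w)  # reaches weights near [-1.1, 0.7, 0.7], as in the walkthrough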
Researchers discovered that the Perceptron cannot approximate many nonlinear decision functions, for example the XOR problem: no single line separates the XOR classes in the (x1, x2) plane.
Researchers found a solution to this problem by stacking multiple layers of linear classifiers, called the multilayer perceptron, to approximate nonlinear decision functions (a feedforward architecture).
Werbos effectively solved the exclusive-or problem, and more generally accelerated the training of multilayer networks, using the backpropagation algorithm.
Forward propagation simply multiplies the inputs by the weights and adds the bias before applying the activation function (the sigmoid here) at each node.
Backpropagation is a method used in artificial neural networks to calculate the error contribution of each neuron after a batch of data is processed.
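For a single node, forward propagation is literally one weighted sum plus a bias, passed through the sigmoid. A tiny sketch (the numbers here are made up for illustration):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# one node, two inputs: net = w1*x1 + w2*x2 + b, out = sigmoid(net)
w1, w2, b = 0.4, 0.6, 0.1
x1, x2 = 0.5, 0.9
net = w1 * x1 + w2 * x2 + b    # 0.84
out = sigmoid(net)             # ~0.6985
print(net, out)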
Three concepts behind Backpropagation (From Calculus)
1) Derivative
2) Partial Derivative
3) Chain Rule
1) Derivative: if f(x) = x², then f′(x) = 2x.
2) Partial Derivative: if f(x, y) = 2x² + 5xy, then ∂f/∂y = 5x and ∂f/∂x = 4x + 5y.
3) Chain Rule: if h(x) = f(g(x)), then h′(x) = (f(g(x)))′ = f′(g(x)) ∗ g′(x).
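These three rules can be checked numerically with central finite differences. A small sketch (the example functions mirror the reconstructions above; h is the step size):

h = 1e-6

# 1) derivative: f(x) = x^2, f'(x) = 2x
f = lambda x: x ** 2
x = 3.0
print((f(x + h) - f(x - h)) / (2 * h), 2 * x)                # ~6.0 vs 6.0

# 2) partial derivatives: g(x, y) = 2x^2 + 5xy
g = lambda x, y: 2 * x ** 2 + 5 * x * y
x, y = 1.0, 2.0
print((g(x + h, y) - g(x - h, y)) / (2 * h), 4 * x + 5 * y)  # df/dx: ~14.0
print((g(x, y + h) - g(x, y - h)) / (2 * h), 5 * x)          # df/dy: ~5.0

# 3) chain rule: (f(g2(x)))' = f'(g2(x)) * g2'(x), with g2(x) = 3x + 1
g2 = lambda x: 3 * x + 1
print((f(g2(x + h)) - f(g2(x - h))) / (2 * h), 2 * g2(x) * 3)  # ~24.0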
The activation function, linear or nonlinear, determines the output of each unit.
[Figure: a feedforward network with inputs x1, x2 feeding an input layer, a hidden layer, and an output layer; each unit computes a net input ("Net") and an output ("out"). Layers are indexed k (input), j (hidden), i (output), with weights wkj from input to hidden and wji from hidden to output.]
Hidden Layer:
net_j = Σ_k w_kj ∗ out_k
out_j = 1 / (1 + e^(−net_j))
Output Layer:
net_i = Σ_j w_ji ∗ out_j
out_i = 1 / (1 + e^(−net_i))
Total Error:
E_total = Σ_i ½(t_i − out_i)²
Output Layer weight update:
w_ji ← w_ji + η ∗ out_j ∗ out_i(1 − out_i) ∗ (t_i − out_i)
Hidden Layer weight update:
w_kj ← w_kj + η ∗ out_k ∗ out_j(1 − out_j) ∗ δ_j
δ_j = Σ_i (t_i − out_i) ∗ out_i(1 − out_i) ∗ w_ji
Example
[Figure: a 2-2-2 network with inputs i1, i2 plus bias i0, hidden units h1, h2 plus bias h0, and outputs o1, o2. Weights w1–w4 connect inputs to hidden units, w5–w8 connect hidden units to outputs, and w9–w12 are the bias weights.]
Assume:
η = 0.5
t1 = 0.01, t2 = 0.99
Inputs: i1 = 0.05, i2 = 0.10 (bias inputs i0 = 1 and h0 = 1)
Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30
w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55
Bias weights: w9 = w10 = 0.35, w11 = w12 = 0.60
Hidden Layer (h1):
net_h1 = w1∗i1 + w2∗i2 + w9∗i0 = 0.15∗0.05 + 0.20∗0.10 + 0.35∗1 = 0.3775
out_h1 = 1 / (1 + e^(−0.3775)) = 0.5933
Hidden Layer (h2):
net_h2 = w3∗i1 + w4∗i2 + w10∗i0 = 0.25∗0.05 + 0.30∗0.10 + 0.35∗1 = 0.3925
out_h2 = 1 / (1 + e^(−0.3925)) = 0.5969
Output Layer (o1):
net_o1 = w5∗out_h1 + w6∗out_h2 + w11∗h0 = 0.40∗0.5933 + 0.45∗0.5969 + 0.60∗1 = 1.1059
out_o1 = 1 / (1 + e^(−1.1059)) = 0.7514
Output Layer (o2):
net_o2 = w7∗out_h1 + w8∗out_h2 + w12∗h0 = 0.50∗0.5933 + 0.55∗0.5969 + 0.60∗1 = 1.2249
out_o2 = 1 / (1 + e^(−1.2249)) = 0.7729
Total Error:
E_total = Σ ½(t − out)²
E1 = ½(t1 − out_o1)² = ½(0.01 − 0.7514)² = 0.2748
E2 = ½(t2 − out_o2)² = ½(0.99 − 0.7729)² = 0.0236
E_total = E1 + E2 = 0.2748 + 0.0236 = 0.2984
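These numbers are easy to reproduce. The following sketch (our own code) recomputes the forward pass and the total error; the printed values should match 0.7514, 0.7729, and 0.2984 up to rounding:

import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b_h, b_o = 0.35, 0.60          # w9 = w10 and w11 = w12
t1, t2 = 0.01, 0.99

out_h1 = sigmoid(w1 * i1 + w2 * i2 + b_h)           # 0.5933
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b_h)           # 0.5969
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b_o)   # 0.7514
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b_o)   # 0.7729

E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(out_o1, out_o2, E_total)                      # ~0.7514 0.7729 0.2984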
Our goal with backpropagation is to update each of the weights in the network so that the actual outputs move closer to the target outputs, minimizing the error for each output neuron and for the network as a whole.
Output Layer
Start with w5. The update is w5_new = w5 − η ∗ ∂E_total/∂w5.
By applying the chain rule:
∂E_total/∂w5 = ∂E_total/∂out_o1 ∗ ∂out_o1/∂net_o1 ∗ ∂net_o1/∂w5
First factor: since net_o1 = w5∗out_h1 + w6∗out_h2 + w11∗h0,
∂net_o1/∂w5 = out_h1 = 0.5933
Second factor: out_o1 = 1 / (1 + e^(−net_o1)), so
∂out_o1/∂net_o1 = out_o1(1 − out_o1) = out_o1 − out_o1² = 0.7514 − 0.5646 = 0.1868
Third factor: E_total = ½(t1 − out_o1)² + ½(t2 − out_o2)², so
∂E_total/∂out_o1 = 2 ∗ ½(t1 − out_o1) ∗ (−1) = −(t1 − out_o1) = −(0.01 − 0.7514) = 0.7414
Putting it together:
∂E_total/∂w5 = −(t1 − out_o1) ∗ out_o1(1 − out_o1) ∗ out_h1 = 0.7414 ∗ 0.1868 ∗ 0.5933 = 0.0822
Update w5:
w5_new = w5 − η ∗ ∂E_total/∂w5 = 0.40 − 0.5 ∗ 0.0822 = 0.3589
Update w6:
∂E_total/∂w6 = −(t1 − out_o1) ∗ out_o1(1 − out_o1) ∗ out_h2 = 0.7414 ∗ 0.1868 ∗ 0.5969 = 0.0827
w6_new = w6 − η ∗ ∂E_total/∂w6 = 0.45 − 0.5 ∗ 0.0827 = 0.4087
Update w7:
∂E_total/∂w7 = −(t2 − out_o2) ∗ out_o2(1 − out_o2) ∗ out_h1 = −0.2171 ∗ 0.1755 ∗ 0.5933 = −0.0226
w7_new = w7 − η ∗ ∂E_total/∂w7 = 0.50 − 0.5 ∗ (−0.0226) = 0.5113
Update w8:
∂E_total/∂w8 = −(t2 − out_o2) ∗ out_o2(1 − out_o2) ∗ out_h2 = −0.2171 ∗ 0.1755 ∗ 0.5969 = −0.0227
w8_new = w8 − η ∗ ∂E_total/∂w8 = 0.55 − 0.5 ∗ (−0.0227) = 0.5614
Hidden Layer
Now update the input-to-hidden weights, starting with w1: w1_new = w1 − η ∗ ∂E_total/∂w1.
By applying the chain rule:
∂E_total/∂w1 = ∂E_total/∂out_h1 ∗ ∂out_h1/∂net_h1 ∗ ∂net_h1/∂w1
Hidden Layer
The error from both output units flows back into h1 through w5 and w7:
∂E_total/∂w1 = i1 ∗ out_h1(1 − out_h1) ∗ [out_o1(1 − out_o1) ∗ (out_o1 − t1) ∗ w5 + out_o2(1 − out_o2) ∗ (out_o2 − t2) ∗ w7]
= 0.05 ∗ 0.5933(1 − 0.5933) ∗ [(0.7514(1 − 0.7514) ∗ (0.7514 − 0.01) ∗ 0.40) + (0.7729(1 − 0.7729) ∗ (0.7729 − 0.99) ∗ 0.50)]
= 0.00044
Update w1:
w1_new = w1 − η ∗ ∂E_total/∂w1 = 0.15 − 0.5 ∗ 0.00044 = 0.1498
Proceeding the same way for the remaining hidden-layer weights:
Update w2: w2_new = 0.1996
Update w3: w3_new = 0.2498
Update w4: w4_new = 0.2995
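All eight updates can be verified with a short script that follows the delta formulas above (our own code); the printed values should match 0.3589, 0.4087, 0.5113, 0.5614, 0.1498, 0.1996, 0.2498, and 0.2995 up to rounding:

import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

i1, i2, eta = 0.05, 0.10, 0.5
t1, t2 = 0.01, 0.99
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b_h, b_o = 0.35, 0.60

# forward pass
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b_h)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b_h)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b_o)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b_o)

# output deltas: dE/dnet = -(t - out) * out * (1 - out)
d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

# hidden deltas: backpropagate through w5..w8
d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

# w <- w - eta * (delta * input feeding that weight)
print(w5 - eta * d_o1 * out_h1, w6 - eta * d_o1 * out_h2)   # 0.3589 0.4087
print(w7 - eta * d_o2 * out_h1, w8 - eta * d_o2 * out_h2)   # 0.5113 0.5614
print(w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2)           # 0.1498 0.1996
print(w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2)           # 0.2498 0.2995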
MNIST is a large database of handwritten digits.
MNIST contains 60,000 training images and 10,000 testing images.
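As an illustration of training on MNIST, a small multilayer perceptron can be built with a framework such as Keras. This is a sketch only; the layer sizes, optimizer, and epoch count are our own choices, not from the lecture:

import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of 28x28 pixels.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0

# A small sigmoid MLP: 784 -> 128 -> 10.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="sigmoid", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)
print(model.evaluate(x_test, y_test))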
Each student selects one neural network architecture from
http://www.asimovinstitute.org/neural-network-zoo/
Download the paper that describes the network (via its Original Paper PDF link).
Write a two-page summary of your selected neural network.
Use MS Word.
Send an e-mail to mloey@live.com with the subject "Advanced Topics in CS2 – Task2".
Put your Arabic name in the Word document and in the email body.
Finally, press Send.
Deadline: next lecture.
http://playground.tensorflow.org/
facebook.com/mloey
mohamedloey@gmail.com
twitter.com/mloey
linkedin.com/in/mloey
mloey@fci.bu.edu.eg
mloey.github.io
THANKS FOR
YOUR TIME
