Convolutional Neural Network and Its Applications
144133E – M.G.K.C.PIYARTHNA
What is CNN?
In machine learning, a convolutional neural network (CNN) is a class of deep,
feed-forward artificial neural networks that has been applied successfully
to analyzing visual imagery.
In the fields of Computer Vision and Natural Language Processing, many
influential innovations in machine learning build on the concept of the
convolutional neural network.
Motivation
• Convolutional Neural Networks (CNNs) are biologically-inspired variants of MLPs. From Hubel and Wiesel's early work on the cat's visual cortex, we know the visual cortex contains a complex arrangement of cells. These cells are sensitive to small sub-regions of the visual field, called a receptive field. The sub-regions are tiled to cover the entire visual field. These cells act as local filters over the input space and are well-suited to exploit the strong spatially local correlation present in natural images.
• The animal visual cortex being the most powerful visual processing system in existence, it seems natural to emulate its behavior.
CNN Architecture
ConvNet Architectures
• LeNet (1990s)
• AlexNet (2012)
• ZF Net (2013)
• GoogLeNet (2014)
• VGGNet (2014)
• ResNets (2015)
• DenseNet (August 2016)
Four main operations in the ConvNet
• Convolution
• Non Linearity
• Pooling or Sub Sampling
• Classification (Fully Connected Layer)
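As a sketch of how these four operations fit together, here is a minimal example in PyTorch (the slides do not name a framework; the layer sizes are illustrative, assuming a 32×32 grayscale input and 4 classes):

import torch
import torch.nn as nn

# A minimal ConvNet showing the four operations in order:
# convolution -> non-linearity (ReLU) -> pooling -> fully connected classifier.
class TinyConvNet(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1)  # Convolution
        self.relu = nn.ReLU()                                                            # Non Linearity
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)                                # Pooling / Sub Sampling
        self.fc = nn.Linear(8 * 16 * 16, num_classes)                                    # Fully Connected

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))   # feature extraction
        x = x.flatten(start_dim=1)               # flatten feature maps for the classifier
        return self.fc(x)                        # class scores (logits)

logits = TinyConvNet()(torch.randn(1, 1, 32, 32))   # -> shape (1, 4)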
• An image is a matrix of pixel values.
• Channel is a conventional term used to refer to a certain component of an image; a standard color image, for instance, has three channels (red, green and blue).
• A grayscale image, on the other hand, has just one channel.
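A small numpy sketch of these shapes (sizes are illustrative):

import numpy as np

# A grayscale image is a single matrix of pixel values: one channel.
gray = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)    # height x width

# A color image stacks one such matrix per channel (e.g. red, green, blue).
rgb = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)  # height x width x channels

print(gray.shape)   # (28, 28)    -> 1 channel
print(rgb.shape)    # (28, 28, 3) -> 3 channels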
The Convolution Step
• The primary purpose of
Convolution in case of a
ConvNet is to extract features
from the input image.
• In CNN terminology, the small matrix that slides over the image (for example, a 3×3 matrix) is called a 'filter', 'kernel' or 'feature detector'.
• The matrix formed by sliding the filter over the image and computing the dot product is called the 'Convolved Feature', 'Activation Map' or 'Feature Map'.
• It is important to note that filters act as feature detectors on the original input image.
• In practice, a CNN learns the values of these filters on its own during the training process. The more filters we have, the more image features get extracted and the better our network becomes at recognizing patterns in unseen images.
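As a sketch of the sliding dot-product itself (pure numpy; the 5×5 image and 3×3 filter below are illustrative toy values, not taken from the slides):

import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and take the
    element-wise product-and-sum at each position -> the feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    feature_map = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            feature_map[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return feature_map

image  = np.array([[1, 1, 1, 0, 0],
                   [0, 1, 1, 1, 0],
                   [0, 0, 1, 1, 1],
                   [0, 0, 1, 1, 0],
                   [0, 1, 1, 0, 0]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])   # a 3x3 'filter' / 'feature detector'

print(convolve2d(image, kernel))   # 3x3 feature map: [[4 3 4] [2 4 3] [2 3 4]]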
• The size of the Feature Map (Convolved Feature) is controlled by three parameters:
• Depth: Depth corresponds to the number of filters we use for the
convolution operation.
• Stride: Stride is the number of pixels by which we slide our filter
matrix over the input matrix.
• Zero-padding: Sometimes, it is convenient to pad the input
matrix with zeros around the border, so that we can apply the filter to
bordering elements of our input image matrix.
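The standard ConvNet arithmetic that ties these three parameters to the feature-map size (not written out on the slide) is output size = (W − F + 2P) / S + 1 for input width W, filter size F, zero-padding P and stride S. A small sketch:

def feature_map_size(w, f, p=0, s=1):
    """Output width/height for input size w, filter size f,
    zero-padding p and stride s (standard ConvNet arithmetic)."""
    return (w - f + 2 * p) // s + 1

print(feature_map_size(5, 3))             # 3  -> 5x5 input, 3x3 filter, no padding, stride 1
print(feature_map_size(5, 3, p=1))        # 5  -> zero-padding of 1 keeps the size
print(feature_map_size(32, 3, p=1, s=2))  # 16 -> stride 2 halves the spatial size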
Introducing Non Linearity (ReLU)
• ReLU is an element-wise operation (applied per pixel) that replaces all negative pixel values in the feature map with zero.
• Convolution is a linear operation – element-wise matrix multiplication and addition – so we account for non-linearity by introducing a non-linear function like ReLU.
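A minimal numpy illustration of the element-wise ReLU (the values are made up):

import numpy as np

feature_map = np.array([[ 2.0, -1.5],
                        [-0.3,  4.0]])

# ReLU applied element-wise: every negative value becomes zero,
# positive values pass through unchanged.
rectified = np.maximum(0, feature_map)
print(rectified)   # [[2. 0.]
                   #  [0. 4.]]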
The Pooling Step
• Spatial Pooling (also called subsampling or downsampling) reduces the dimensionality of each feature map but retains the most important information. Spatial Pooling can be of different types: Max, Average, Sum etc.
• In case of Max Pooling, we define a spatial neighborhood (for example, a 2×2 window) and take the largest element from the rectified feature map within that window.
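A small numpy sketch of 2×2 max pooling with stride 2 (the input values are made up):

import numpy as np

rectified = np.array([[1, 1, 2, 4],
                      [5, 6, 7, 8],
                      [3, 2, 1, 0],
                      [1, 2, 3, 4]])

# Max pooling with a 2x2 window and stride 2: keep only the largest value
# inside each window, shrinking the map from 4x4 to 2x2.
pooled = rectified.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 8]
                #  [3 4]]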
Fully Connected Layer
• The term “Fully Connected”
implies that every neuron in the
previous layer is connected to
every neuron on the next layer.
• The output from the convolutional and pooling layers represents high-level features of the input image.
• The purpose of the Fully
Connected layer is to use these
features for classifying the input
image into various classes based
on the training dataset.
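A rough numpy sketch of what the fully connected classifier does with the flattened features (the weights here are random placeholders, and a softmax is assumed to turn class scores into probabilities; none of this is specified on the slide):

import numpy as np

rng = np.random.default_rng(0)

features = rng.standard_normal(2048)            # flattened high-level features
W = rng.standard_normal((4, 2048)) * 0.01       # one row of weights per class
b = np.zeros(4)

scores = W @ features + b                       # fully connected: every feature feeds every class
probs = np.exp(scores) / np.exp(scores).sum()   # softmax turns scores into class probabilities
print(probs, probs.sum())                       # four probabilities summing to 1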
Putting it all together – Training using
Backpropagation
• If the input image is a boat, the target probability is 1 for the Boat class and 0 for the other three classes.
• Input Image = Boat
• Target Vector = [0, 0, 1, 0]
Putting it all together – Training using
Backpropagation
• Step 1: We initialize all filters and parameters (typically with random values).
• Step 2: The network takes a training image as input, goes through the forward propagation step (convolution, ReLU and pooling operations along with forward propagation in the Fully Connected layer) and finds the output probabilities for each class.
• Let's say the output probabilities for the boat image above are [0.2, 0.4, 0.1, 0.3].
• Step 3: Calculate the total error at the output layer (summation over all 4 classes):
• Total Error = ∑ ½ (target probability − output probability)² (a quick numerical check follows below)
• Step 4: The weights are adjusted in proportion to their contribution to the total error.
• When the same image is input again, the output probabilities might now be [0.1, 0.1, 0.7, 0.1], which is closer to the target vector [0, 0, 1, 0].
• This means that the network has learnt to classify this particular image correctly by adjusting its weights / filters such that the output error is reduced.
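The numerical check of the error formula with the example probabilities above (plain Python):

target = [0, 0, 1, 0]              # boat
output = [0.2, 0.4, 0.1, 0.3]      # first forward pass

total_error = sum(0.5 * (t - o) ** 2 for t, o in zip(target, output))
print(total_error)   # ~0.55

# After training, an output closer to the target gives a smaller error:
better = [0.1, 0.1, 0.7, 0.1]
print(sum(0.5 * (t - b) ** 2 for t, b in zip(target, better)))   # ~0.06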
CNN Applications
• Computer vision:
face recognition, scene labeling, image classification, action recognition, human pose estimation and document analysis
• Natural language processing:
speech recognition and text classification
Face recognition
• Identifying all the faces in the
picture
• Focusing on each face despite
bad lighting or different pose
• Identifying unique features
• Comparing the identified features to an existing database and determining the person's name
Scene labeling
• Real-time scene parsing in natural conditions.
• Training on the SiftFlow dataset (33 classes).
• Displays one label per component in the final prediction.
• Can also use the Barcelona Dataset (170 classes) and the Stanford Background Dataset (8 classes).
Speech Recognition
• Noise robustness
• Distant speech recognition
• Low-footprint models
• Channel-mismatched training-test conditions
Do you know?
• Facebook uses neural nets for
their automatic tagging
algorithms
• Google for their photo search
• Amazon for their product
recommendations
• Pinterest for their home feed
personalization
• Instagram for their search
infrastructure
Q & A?