Convolutional Neural Network Models - Deep Learning
The document provides an overview of Convolutional Neural Network (CNN) models developed for image classification, highlighting notable models such as AlexNet, ZFNet, VGGNet, GoogLeNet, and ResNet. It outlines the architecture, training processes, and performance metrics of these models, including their respective top-5 error rates in the ImageNet Large Scale Visual Recognition Challenge. Additionally, it covers specific training techniques and innovations employed by each model to improve accuracy and efficiency.
Introduction to Convolutional Neural Network (CNN) models and the ILSVRC, including major architectures such as AlexNet, ZFNet, VGGNet, GoogLeNet, and ResNet.
Description of CNN architecture comprising convolutional layers, pooling layers, and fully connected layers for image feature extraction.
Mathematical formulas for output image sizes from convolutional layers with various kernel sizes, strides, and paddings.
Detailed explanation of max pooling layers to reduce image dimensions and preserve structural details.
Description of ImageNet Large Scale Visual Recognition Challenge (ILSVRC), its classification task, and dataset sizes for training and validation.
AlexNet's achievement of 15.3% Top-5 error rate at ILSVRC, its architecture details, GPU usage, and innovations like dropout.
In-depth explanation of each layer in AlexNet with specific configurations such as filter sizes, memory usage, and the roles of normalization and dropout.
Information on implementing AlexNet using TFLearn framework.
Details about ZFNet, its victory in ILSVRC 2013, its configuration changes such as smaller filter sizes, and the development of the deconvnet visualization technique.
VGGNet's runner-up performance in ILSVRC 2014, architectural strategies using smaller filters, and training techniques.
Layer-wise breakdown of VGGNet architectures (VGG16 and VGG19) with their configurations for image processing.
Information regarding the implementation of VGGNet using TFLearn framework.
GoogLeNet winning ILSVRC 2014 with a 6.7% error rate, architecture insights including inception modules and parameter efficiency.
Overview of the inception module in the GoogLeNet architecture, showcasing its complexity and efficiency.
Details on various layers in GoogLeNet, their configurations, and parameters used throughout the architecture.
ResNet's victory in ILSVRC 2015, introducing residual learning to improve training of deep networks.
Layer compositions and configurations of the ResNet architecture, highlighting variations in layer depths.
Details on implementing ResNet using TFLearn framework.
Final slide summarizing the presentation's contents with thank you notes and contact information.
CNN Models
Convolutional Neural Network (CNN) is a multi-layer neural network.
A Convolutional Neural Network is comprised of one or more convolutional layers (often with pooling layers), followed by one or more fully connected layers.
CNN Models
Convolutional layer acts as a feature extractor that extracts features of the inputs such as edges, corners, and endpoints.
CNN Models
Pooling layer reduces the resolution of the image, which makes the extracted features less sensitive to translation (shift and distortion) effects.
CNN Models
Fully connected layer has full connections to all activations in the previous layer.
The fully connected layer acts as the classifier.
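A minimal sketch of this conv → pool → fully connected pattern, written with the TFLearn framework that later slides reference (the layer counts and sizes here are illustrative only, not taken from any particular model):

import tflearn
from tflearn.layers.core import input_data, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

# Convolutional layers extract local features (edges, corners, endpoints),
# pooling layers reduce resolution, and the final fully connected layer
# acts as the classifier.
net = input_data(shape=[None, 32, 32, 3])              # small toy input
net = conv_2d(net, 32, 3, activation='relu')           # feature extractor
net = max_pool_2d(net, 2)                              # downsample
net = conv_2d(net, 64, 3, activation='relu')
net = max_pool_2d(net, 2)
net = fully_connected(net, 10, activation='softmax')   # classifier
net = regression(net, optimizer='sgd', loss='categorical_crossentropy')
model = tflearn.DNN(net)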
CNN Models
ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
is an image classification challenge: create a model that
can correctly classify an input image into one of 1,000 separate
object categories.
Models are trained on 1.2 million training images, with
another 50,000 images for validation and 150,000
images for testing.
CNN Models
AlexNet achieved a 15.3% Top-5 error rate in the ILSVRC 2012 competition, compared to 26.2% achieved by the second-best entry.
AlexNet was trained using batch stochastic gradient descent, with specific values for momentum and weight decay.
AlexNet implements dropout layers in order to combat the problem of overfitting to the training data.
CNN Models
Layer 0: Input image
Size: 227 x 227 x 3
Memory: 227 x 227 x 3
CNN Models
Layer 0: 227 x 227 x 3
Layer 1: Convolution with 96 filters, size 11×11, stride 4, padding 0
Outcome size = 55 x 55 x 96
(227 - 11)/4 + 1 = 55 is the size of the outcome
Memory: 55 x 55 x 96 x 3 (because of ReLU & LRN (Local Response Normalization))
Weights (parameters): 11 x 11 x 3 x 96
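The same output-size arithmetic can be written as a small helper function (a sketch; the function name is just illustrative):

def conv_out_size(in_size, kernel, stride, pad):
    # Standard convolution / pooling output-size formula:
    # out = (in + 2*pad - kernel) / stride + 1
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_out_size(227, 11, 4, 0))  # 55, matches Layer 1
print(conv_out_size(55, 3, 2, 0))    # 27, matches the Layer 2 max-pooling below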
CNN Models
Layer 1: 55 x 55 x 96
Layer 2: Max-Pooling with 3×3 filter, stride 2
Outcome size = 27 x 27 x 96
(55 - 3)/2 + 1 = 27 is the size of the outcome
Memory: 27 x 27 x 96
CNN Models
Layer 2: 27 x 27 x 96
Layer 3: Convolution with 256 filters, size 5×5, stride 1, padding 2
Outcome size = 27 x 27 x 256
The original size is preserved because of the padding
Memory: 27 x 27 x 256 x 3 (because of ReLU and LRN)
Weights: 5 x 5 x 96 x 256
CNN Models
Layer 3: 27 x 27 x 256
Layer 4: Max-Pooling with 3×3 filter, stride 2
Outcome size = 13 x 13 x 256
(27 - 3)/2 + 1 = 13 is the size of the outcome
Memory: 13 x 13 x 256
CNN Models
Layer 4: 13 x 13 x 256
Layer 5: Convolution with 384 filters, size 3×3, stride 1, padding 1
Outcome size = 13 x 13 x 384
The original size is preserved because of the padding: (13 + 2 - 3)/1 + 1 = 13
Memory: 13 x 13 x 384 x 2 (because of ReLU)
Weights: 3 x 3 x 256 x 384
CNN Models
Layer 5: 13 x 13 x 384
Layer 6: Convolution with 384 filters, size 3×3, stride 1, padding 1
Outcome size = 13 x 13 x 384
The original size is preserved because of the padding
Memory: 13 x 13 x 384 x 2 (because of ReLU)
Weights: 3 x 3 x 384 x 384
CNN Models
Layer 6: 13 x 13 x 384
Layer 7: Convolution with 256 filters, size 3×3, stride 1, padding 1
Outcome size = 13 x 13 x 256
The original size is preserved because of the padding
Memory: 13 x 13 x 256 x 2 (because of ReLU)
Weights: 3 x 3 x 384 x 256
CNN Models
Layer 7: 13 x 13 x 256
Layer 8: Max-Pooling with 3×3 filter, stride 2
Outcome size = 6 x 6 x 256
(13 - 3)/2 + 1 = 6 is the size of the outcome
Memory: 6 x 6 x 256
CNN Models
Layer 8: 6 x 6 x 256 = 9,216 values are fed to the FC layers
Layer 9: Fully Connected with 4096 neurons
Memory: 4096 x 3 (because of ReLU and Dropout)
Weights: 4096 x (6 x 6 x 256)
CNN Models
Layer 9: Fully Connected with 4096 neurons
Layer 10: Fully Connected with 4096 neurons
Memory: 4096 x 3 (because of ReLU and Dropout)
Weights: 4096 x 4096
CNN Models
Layer 10: Fully Connected with 4096 neurons
Layer 11: Fully Connected with 1000 neurons
Memory: 1000
Weights: 4096 x 1000
CNN Models
Total (label and softmax not included)
Memory: 2.24 million
Weights: 62.37 million
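The 62.37 million figure can be checked by summing the weight shapes listed layer by layer above (biases are not counted, matching the slides):

weights = [
    11 * 11 * 3 * 96,      # Layer 1 convolution
    5 * 5 * 96 * 256,      # Layer 3 convolution
    3 * 3 * 256 * 384,     # Layer 5 convolution
    3 * 3 * 384 * 384,     # Layer 6 convolution
    3 * 3 * 384 * 256,     # Layer 7 convolution
    (6 * 6 * 256) * 4096,  # Layer 9 fully connected
    4096 * 4096,           # Layer 10 fully connected
    4096 * 1000,           # Layer 11 fully connected
]
print(sum(weights))  # 62,367,776, i.e. about 62.37 million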
CNN Models
First use of ReLU.
AlexNet used Norm layers (Local Response Normalization).
AlexNet made heavy use of data augmentation.
AlexNet used dropout 0.5.
AlexNet used a batch size of 128.
AlexNet used SGD with momentum 0.9.
AlexNet used a learning rate of 1e-2, reduced by a factor of 10 when accuracy plateaued.
CNN Models
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
CNN Models
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons
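A sketch of this AlexNet stack in TFLearn, close to the example that ships with the framework. Note that TFLearn's default 'same' padding makes some intermediate sizes differ slightly from the arithmetic above, and the training hyperparameters shown are placeholders:

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

net = input_data(shape=[None, 227, 227, 3])
net = conv_2d(net, 96, 11, strides=4, activation='relu')   # CONV1
net = max_pool_2d(net, 3, strides=2)                       # MAX POOL1
net = local_response_normalization(net)                    # NORM1
net = conv_2d(net, 256, 5, activation='relu')              # CONV2
net = max_pool_2d(net, 3, strides=2)                       # MAX POOL2
net = local_response_normalization(net)                    # NORM2
net = conv_2d(net, 384, 3, activation='relu')              # CONV3
net = conv_2d(net, 384, 3, activation='relu')              # CONV4
net = conv_2d(net, 256, 3, activation='relu')              # CONV5
net = max_pool_2d(net, 3, strides=2)                       # MAX POOL3
net = fully_connected(net, 4096, activation='relu')        # FC6
net = dropout(net, 0.5)                                    # dropout 0.5, as AlexNet used
net = fully_connected(net, 4096, activation='relu')        # FC7
net = dropout(net, 0.5)
net = fully_connected(net, 1000, activation='softmax')     # FC8, 1000 classes
net = regression(net, optimizer='momentum', learning_rate=0.01,
                 loss='categorical_crossentropy')
model = tflearn.DNN(net)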
CNN Models
ZFNet, the winner of the ILSVRC 2013 competition with a 14.8% Top-5 error rate.
ZFNet was built by Matthew Zeiler and Rob Fergus.
ZFNet has the same global architecture as AlexNet, that is to say 5 convolutional layers, two fully connected layers and an output softmax layer. The differences include, for example, better-sized convolutional kernels.
CNN Models
ZFNet used filters of size 7x7 and a decreased stride value in the first layer, instead of the 11x11 filters that AlexNet used.
ZFNet was trained on a GTX 580 GPU for twelve days.
Zeiler and Fergus developed a visualization technique named the Deconvolutional Network ("deconvnet"), so called because it maps features back to pixels.
CNN Models
AlexNet but:
• CONV1: change from (11x11 stride 4) to (7x7 stride 2)
• CONV3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
CNN Models
Keep it deep. Keep it simple.
VGGNet was the runner-up of the ILSVRC 2014 competition with a 7.3% Top-5 error rate.
VGGNet's use of only 3x3 sized filters is quite different from AlexNet's 11x11 filters in the first layer and ZFNet's 7x7 filters.
Two 3x3 conv layers have an effective receptive field of 5x5.
Three 3x3 conv layers have an effective receptive field of 7x7 (verified numerically below).
VGGNet was trained on 4 Nvidia Titan Black GPUs for two to three weeks.
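The receptive-field claim can be checked with a one-line recurrence: each extra 3x3, stride-1 conv layer grows the effective receptive field by 2 pixels, so n stacked layers see a (2n + 1) x (2n + 1) region of the input (the helper name below is illustrative):

def stacked_3x3_receptive_field(n_layers):
    # Each additional 3x3 stride-1 conv adds 2 to the receptive field size
    rf = 1
    for _ in range(n_layers):
        rf += 2
    return rf

print(stacked_3x3_receptive_field(2))  # 5 -> two 3x3 convs cover a 5x5 region
print(stacked_3x3_receptive_field(3))  # 7 -> three 3x3 convs cover a 7x7 region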
CNN Models
It is interesting to notice that the number of filters doubles after each max-pool layer. This reinforces the idea of shrinking spatial dimensions but growing depth (a sketch of the pattern follows below).
VGGNet used scale jittering as one data augmentation technique during training.
VGGNet used ReLU layers after each conv layer and trained with batch gradient descent.
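A sketch of how that "filters double after each max-pool" pattern looks in TFLearn, following VGG16's 64-128-256-512 progression (the snippet is illustrative rather than a complete, tuned VGG16):

import tflearn
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

net = input_data(shape=[None, 224, 224, 3])
# Filters double after each max-pool (the last block stays at 512)
for n_filters, n_convs in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
    for _ in range(n_convs):
        net = conv_2d(net, n_filters, 3, activation='relu')  # ReLU after each conv
    net = max_pool_2d(net, 2, strides=2)                     # spatial size halves
net = fully_connected(net, 4096, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 4096, activation='relu')
net = dropout(net, 0.5)
net = fully_connected(net, 1000, activation='softmax')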
CNN Models
GoogLeNet is the winner of the ILSVRC 2014 competition with a 6.7% Top-5 error rate.
GoogLeNet was trained on "a few high-end GPUs within a week".
GoogLeNet uses 12x fewer parameters than AlexNet.
GoogLeNet uses an average pool instead of fully connected layers, to go from a 7x7x1024 volume to a 1x1x1024 volume. This saves a huge number of parameters.
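A quick back-of-the-envelope check of that saving: a fully connected layer mapping the flattened 7x7x1024 volume to 1,024 units would need roughly 51 million weights, while global average pooling needs none:

fc_weights = 7 * 7 * 1024 * 1024   # FC from the flattened 7x7x1024 volume to 1024 units
print(fc_weights)                  # 51,380,224 weights
avg_pool_weights = 0               # average pooling has no learnable weights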
CNN Models
GoogLeNet used 9 Inception modules in the whole architecture.
Its 1x1 convolutions (bottleneck convolutions) make it possible to control/reduce the depth dimension, which greatly reduces the number of parameters by removing the redundancy of correlated filters (a sketch of such a module follows below).
GoogLeNet is a 22-layer deep network.
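A sketch of one Inception module in the TFLearn style (branch widths are illustrative; the real GoogLeNet uses different widths at each of its 9 modules):

import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.merge_ops import merge

def inception_module(incoming, n1x1, n3x3_reduce, n3x3, n5x5_reduce, n5x5, pool_proj):
    # Four parallel branches; the 1x1 "bottleneck" convolutions reduce the
    # depth dimension before the expensive 3x3 and 5x5 convolutions, and the
    # branch outputs are concatenated along the depth axis.
    branch1 = conv_2d(incoming, n1x1, 1, activation='relu')
    branch2 = conv_2d(incoming, n3x3_reduce, 1, activation='relu')
    branch2 = conv_2d(branch2, n3x3, 3, activation='relu')
    branch3 = conv_2d(incoming, n5x5_reduce, 1, activation='relu')
    branch3 = conv_2d(branch3, n5x5, 5, activation='relu')
    branch4 = max_pool_2d(incoming, 3, strides=1)
    branch4 = conv_2d(branch4, pool_proj, 1, activation='relu')
    return merge([branch1, branch2, branch3, branch4], mode='concat', axis=3)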
CNN Models
As noted above, GoogLeNet uses an average pool instead of an FC layer to go from a 7x7x1024 volume to a 1x1x1024 volume, which saves a huge number of parameters.
GoogLeNet uses inexpensive 1x1 convolutions (Conv1) to compute reductions before the expensive 3x3 (Conv3) and 5x5 (Conv5) convolutions.
Each Conv1 is followed by a ReLU, which also helps reduce overfitting.
CNN Models
ResNet, the winner of the ILSVRC 2015 competition with a 3.6% Top-5 error rate.
ResNet is mainly inspired by the philosophy of VGGNet.
ResNet proposed a residual learning approach to ease the difficulty of training deeper networks, building on design ideas such as Batch Normalization (BN) and small convolutional kernels.
ResNet is a new 152-layer network architecture.
ResNet was trained on an 8-GPU machine for two to three weeks.
CNN Models
Residual network
Keys:
No max pooling
No hidden FC layers
No dropout
Basic design (VGG-style)
All 3x3 conv (almost)
Batch normalization
A minimal residual block sketch follows below.
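A minimal sketch of a single residual block in the TFLearn style used earlier (the filter count and helper name are illustrative; the block assumes the input already has n_filters channels so the identity shortcut matches shapes):

import tflearn
from tflearn.layers.conv import conv_2d
from tflearn.layers.normalization import batch_normalization

def residual_block(incoming, n_filters):
    # Two 3x3 conv + batch-norm layers (VGG-style, no pooling, no dropout).
    # The block learns a residual F(x); the output is relu(F(x) + x).
    shortcut = incoming
    net = conv_2d(incoming, n_filters, 3, activation='linear', bias=False)
    net = batch_normalization(net)
    net = tflearn.activations.relu(net)
    net = conv_2d(net, n_filters, 3, activation='linear', bias=False)
    net = batch_normalization(net)
    return tflearn.activations.relu(net + shortcut)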