LeNet-5 Architecture
The Pioneer of Convolutional Neural Networks
Introduction
• Developed by Yann LeCun et al., 1998, for
handwritten digit recognition (MNIST dataset)
• One of the first CNN architectures
• Input: 32×32 grayscale image
• Output: 10 classes (digits 0–9)
Applications
• Handwriting recognition in postal services and
banking.
• Object and face recognition in images and
videos.
• Autonomous driving systems for recognizing
and interpreting road signs.
Architecture Overview
• Flow: Input (32×32) → C1: Convolution → S2: Subsampling → C3: Convolution → S4: Subsampling → C5: Fully Connected Conv → F6: Fully Connected → Output Layer
Layer-by-Layer Details
• C1: 6 feature maps, 5×5 filters, output 28×28,
activation: Tanh/Sigmoid
• S2: Average pooling (stride 2), output 14×14
• C3: 16 feature maps, 5×5 filters, output 10×10
• S4: Average pooling, output 5×5
• C5: 120 feature maps, 5×5 filters (fully connected)
• F6: Fully connected layer with 84 neurons
• Output: 10 neurons (Softmax)
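To make the stack above concrete, here is a minimal Keras sketch (an illustrative sketch, not LeCun's original implementation: plain average pooling stands in for the paper's trainable subsampling, C3 is fully rather than partially connected, and a softmax output replaces the RBF units):

```python
# Minimal LeNet-5 sketch in Keras (illustrative, not the 1998 original).
from tensorflow import keras
from tensorflow.keras import layers

lenet5 = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),                  # 32x32 grayscale input
    layers.Conv2D(6, 5, activation="tanh"),           # C1: 6 maps, 5x5 -> 28x28
    layers.AveragePooling2D(pool_size=2, strides=2),  # S2: subsample -> 14x14
    layers.Conv2D(16, 5, activation="tanh"),          # C3: 16 maps, 5x5 -> 10x10
    layers.AveragePooling2D(pool_size=2, strides=2),  # S4: subsample -> 5x5
    layers.Conv2D(120, 5, activation="tanh"),         # C5: 120 maps, 5x5 -> 1x1
    layers.Flatten(),
    layers.Dense(84, activation="tanh"),              # F6: 84 neurons
    layers.Dense(10, activation="softmax"),           # Output: digits 0-9
])
lenet5.summary()
```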
Layer C1 (Convolutional Layer)
• Feature Maps: 6 feature maps.
• Connections: Each unit is connected to a 5x5 neighborhood in the input, producing 28x28 feature maps so that no 5x5 window falls outside the input boundary (32 − 5 + 1 = 28).
• Parameters: 156 trainable parameters and 122,304 connections.
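These counts follow directly from weight sharing; as a quick check:

```latex
% C1 bookkeeping: each of the 6 maps shares 5*5 weights + 1 bias,
% applied at every one of its 28*28 output positions.
\[
  \underbrace{6 \times (5 \cdot 5 + 1)}_{\text{trainable parameters}} = 156,
  \qquad
  156 \times 28 \times 28 = 122{,}304 \ \text{connections}
\]
```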
Layer S2 (Subsampling Layer)
• Feature Maps: 6 feature maps.
• Size: 14x14 (each unit connected to a 2x2 neighborhood in C1).
• Operation: Each unit adds four inputs, multiplies by a trainable coefficient, adds a
bias, and applies a sigmoid function.
• Parameters: 12 trainable parameters and 5,880 connections.
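In symbols, for each 2×2 window of feature map i (one trainable coefficient w_i and one bias b_i per map, hence the 6 × 2 = 12 parameters above):

```latex
% S2 subsampling unit: sum the 2x2 window, scale, shift, squash.
\[
  s \;=\; \operatorname{sigmoid}\!\Big( w_i \sum_{(p,q) \,\in\, 2 \times 2} x_{p,q} \;+\; b_i \Big)
\]
```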
Layer C3 (Convolutional Layer)
• Feature Maps: 16 feature maps.
• Connections: Each unit is connected to several 5x5 neighborhoods at identical locations in a subset of S2's feature maps.
• Parameters and Connections: C3 is only partially connected to S2; breaking the symmetry forces different feature maps to learn different, complementary features. The layer has 1,516 trainable parameters and 151,600 connections.
Layer S4 (Subsampling Layer)
• Feature Maps: 16 feature maps.
• Size: 5x5 (each unit connected to a 2x2 neighborhood in C3).
• Parameters: 32 trainable parameters and 2,000 connections.
Layer C5 (Convolutional Layer)
• Feature Maps: 120 feature maps.
• Size: 1x1 (each unit connected to a 5x5 neighborhood on all 16 of S4's feature maps, so the layer is effectively fully connected given the 5x5 input size).
• Parameters: 48,120 trainable parameters (and, being fully connected, the same number of connections).
Layer F6 (Fully Connected Layer)
• Units: 84 units.
• Connections: Each unit is fully connected to C5, resulting in 10,164 trainable parameters.
• Activation: a scaled hyperbolic tangent, f(a) = A tanh(Sa), where A = 1.7159 and S = 2/3.
Output Layer
In the output layer of LeNet, each class is represented by a Euclidean Radial Basis Function (RBF) unit: each unit computes the Euclidean distance between its input vector and a fixed parameter vector, and the class whose unit output is smallest is chosen.
Key Features & Advantages
• Weight sharing reduces parameters
• Local receptive fields capture spatial patterns
• Pooling layers give the model a degree of translation invariance
• Foundation for modern CNN architectures
AlexNet Architecture
• Overview:
• Developed by Alex Krizhevsky et al. in 2012.
• Won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 with a top-5 error rate of 15.3%, far ahead of the runner-up's 26.2%.
• It became famous for its ability to classify images accurately.
• Total 8 layers:
– 5 Convolutional Layers (feature extraction)
– 3 Fully Connected Layers (classification)
• Input: RGB image 227×227×3
• Output: Softmax over 1000 classes
• Key Features:
• ReLU Activation → Faster convergence than sigmoid/tanh
• Max Pooling (with overlapping pooling) → Reduces
spatial size, increases invariance
• Local Response Normalization (LRN) → Improves
generalization
• Dropout in Fully Connected Layers → Prevents overfitting
• GPU Parallelization → Two GPUs for training due to
VRAM limits
• SGD with Momentum & Data Augmentation
AlexNet Architecture Overview
• Architecture in a Nutshell
• Layers: 8 layers in total—5 convolutional
layers for feature extraction, followed by 3
fully connected layers for classification.
• Input & Output: Processes 227×227×3 RGB crops (taken from 256×256 source images) and outputs a distribution over 1000 classes via a Softmax layer.
AlexNet Architecture
Core Components & Innovations
• ReLU Activation: Applied after every convolutional and
fully connected layer to accelerate convergence and
mitigate vanishing gradients.
• Max Pooling (including overlapping pooling): Used after
certain convolutional layers to reduce spatial dimensions
and improve invariance and generalization.
• Local Response Normalization (LRN): Boosts generalization
by normalizing neuron activities across adjacent channels.
• Dropout in FC Layers: Dropout applied to first two fully
connected layers helps prevent overfitting.
Layer-by-Layer Breakdown
• Conv1 → Pool → LRN: 96 filters of size 11×11, stride 4
• Conv2 → Pool → LRN: 256 filters of size 5×5
• Conv3 → Conv4 → Conv5: 384, 384, 256 filters of size 3×3
• Pool: after Conv5
• FC1 → FC2 → FC3: two 4096-unit layers (with Dropout), followed by a 1000-unit Softmax output
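A compact Keras sketch of this breakdown follows (a sketch under common assumptions: the widely used 227×227 input variant, padding chosen to reproduce the standard feature-map sizes, and LRN omitted since Keras ships no built-in LRN layer):

```python
# AlexNet-style sketch in Keras (LRN omitted, as in most modern ports).
from tensorflow import keras
from tensorflow.keras import layers

alexnet = keras.Sequential([
    layers.Input(shape=(227, 227, 3)),
    layers.Conv2D(96, 11, strides=4, activation="relu"),       # Conv1 -> 55x55x96
    layers.MaxPooling2D(pool_size=3, strides=2),               # overlapping pool -> 27x27
    layers.Conv2D(256, 5, padding="same", activation="relu"),  # Conv2
    layers.MaxPooling2D(pool_size=3, strides=2),               # -> 13x13
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # Conv3
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # Conv4
    layers.Conv2D(256, 3, padding="same", activation="relu"),  # Conv5
    layers.MaxPooling2D(pool_size=3, strides=2),               # -> 6x6
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                                       # dropout in FC1
    layers.Dense(4096, activation="relu"),
    layers.Dropout(0.5),                                       # dropout in FC2
    layers.Dense(1000, activation="softmax"),                  # 1000 classes
])
```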
Training Highlights
• GPU Acceleration: Training was distributed across
two GPUs due to limited VRAM (~3 GB each).
• Optimization: Employed SGD with momentum,
weight decay, and data augmentation techniques like
cropping, flipping, and color jittering to improve
generalization.
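That augmentation pipeline can be approximated with Keras preprocessing layers (a sketch for recent Keras versions; a simple brightness/contrast jitter stands in for AlexNet's PCA-based color jittering):

```python
# Augmentation sketch: random crops, horizontal flips, simple color jitter.
from tensorflow import keras
from tensorflow.keras import layers

augment = keras.Sequential([
    layers.RandomCrop(227, 227),      # random 227x227 crops from 256x256 inputs
    layers.RandomFlip("horizontal"),  # horizontal flipping
    layers.RandomBrightness(0.2),     # crude stand-in for PCA color jittering
    layers.RandomContrast(0.2),
])
```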
CIFAR-10
• CIFAR stands for the Canadian Institute For Advanced Research.
• CIFAR-10 and CIFAR-100 ship with TensorFlow and are widely used in machine learning; both are labeled subsets of the 80 Million Tiny Images collection.
• They were originally collected by Alex Krizhevsky, Geoffrey Hinton, and Vinod Nair. CIFAR-10 contains 60,000 images in total, with the following composition:
• 10,000 test images, 1,000 images per class, randomly selected from each class.
• 50,000 training images, 5,000 images per class (everything left after the test images are removed). Individual training batches, however, may contain more images from one class than another.
• The classes in the dataset are entirely mutually exclusive.
• CIFAR-10 consists of 60,000 low-resolution 32×32 color images.
• It is mostly used to train and evaluate Convolutional Neural Network (CNN) models.
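Since the dataset ships with TensorFlow, loading it is a one-liner:

```python
# Load CIFAR-10 from the copy bundled with TensorFlow/Keras.
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)  # (50000, 32, 32, 3): 50,000 training images
print(x_test.shape)   # (10000, 32, 32, 3): 10,000 test images
```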
ZFNet Architecture
• Developed by Matthew Zeiler and Rob Fergus
in 2013.
• Winner of ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2013.
• Improvement over AlexNet through
hyperparameter tuning and visualization.
Key Features:
• Smaller receptive field in first convolution
layer: 7x7 filters with stride 2 (vs. AlexNet's
11x11, stride 4).
• Better preservation of spatial information.
• Deconvolutional visualization to understand
feature maps.
• Enhanced depth and fine-tuning for better
accuracy.
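The resolution gain from the smaller first-layer filter is easy to verify with the standard valid-convolution formula, out = ⌊(in − k) / s⌋ + 1 (a quick check assuming a 227×227 input and no padding for both networks):

```python
# Spatial output size of a valid convolution: floor((in - k) / s) + 1.
def conv_out(size, kernel, stride):
    return (size - kernel) // stride + 1

print(conv_out(227, 11, 4))  # AlexNet conv1: 55x55 feature maps
print(conv_out(227, 7, 2))   # ZFNet conv1: 111x111 -> much finer spatial detail
```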
VGG-Net Architecture
• The Visual Geometry Group (VGG) models, particularly
VGG-16 and VGG-19, have significantly influenced the
field of computer vision since their inception.
• Introduced by the Visual Geometry Group at the University of Oxford, these deep convolutional neural networks (CNNs) stood out in the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) for their uniform architecture.
• VGG-19, the deeper variant of the VGG models, has
garnered considerable attention due to its simplicity and
effectiveness.
VGG-19 Architecture
• VGG-19 is a deep convolutional neural
network with 19 weight layers, comprising 16
convolutional layers and 3 fully connected
layers.
• The architecture follows a straightforward and
repetitive pattern, making it easier to
understand and implement.
Detailed Layer-by-Layer Architecture of VGG-Net 19
1. Convolutional Layers: 3x3 filters with a stride of 1 and
padding of 1 to preserve spatial resolution.
2. Activation Function: ReLU (Rectified Linear Unit) applied
after each convolutional layer to introduce non-linearity.
3. Pooling Layers: Max pooling with a 2x2 filter and a stride
of 2 to reduce the spatial dimensions.
4. Fully Connected Layers: Three fully connected layers at
the end of the network for classification.
5. Softmax Layer: Final layer for outputting class
probabilities.
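Because the pattern is so regular, the whole convolutional trunk can be generated from a short block helper (a sketch of the pattern; tf.keras.applications.VGG19 provides the canonical implementation):

```python
# VGG-19 as repeated 3x3-conv blocks: 2+2+4+4+4 = 16 conv layers + 3 FC layers.
from tensorflow import keras
from tensorflow.keras import layers

def vgg_block(num_convs, filters):
    """num_convs 3x3 same-padding ReLU convs followed by 2x2 max pooling."""
    return [layers.Conv2D(filters, 3, padding="same", activation="relu")
            for _ in range(num_convs)] + [layers.MaxPooling2D(2, strides=2)]

cfg = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
vgg19 = keras.Sequential(
    [layers.Input(shape=(224, 224, 3))]
    + [layer for n, f in cfg for layer in vgg_block(n, f)]
    + [layers.Flatten(),
       layers.Dense(4096, activation="relu"),
       layers.Dense(4096, activation="relu"),
       layers.Dense(1000, activation="softmax")]
)
```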
Information about VGGNet-19
• Model Simplicity and Effectiveness: The VGG-19
architecture's simplicity, characterized by its uniform use of
3x3 convolution filters and repetitive block structure,
makes it a highly effective and easy-to-implement model
for various computer vision tasks.
• Computational Requirements: One of the key trade-offs of
the VGG-19 model is its computational demand.
• Due to its depth and the use of small filters, it requires
significant memory and computational power, making it
more suited for environments with robust hardware
capabilities.
• Robust Feature Extraction: The depth of the VGG-19 model allows it to capture intricate features in images, making it an excellent feature extractor. This capability is particularly useful in transfer learning, where pre-trained VGG-19 models are fine-tuned for specific tasks, leveraging the rich feature representations learned from large datasets (see the sketch after this list).
• Data Augmentation: To enhance the performance and generalization
capability of VGG-19, data augmentation techniques such as random
cropping, horizontal flipping, and color jittering are often employed
during training. These techniques help the model to better handle
variations and improve its robustness.
• Influence on Network Design: The principles established by the VGG-19
architecture, such as the use of small convolution filters and deep
networks, have influenced the design of subsequent state-of-the-art
models. Researchers have built upon these concepts to develop more
advanced architectures that continue to push the boundaries of what is
possible in computer vision.
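A minimal transfer-learning sketch along the lines described above (assumptions: ImageNet weights from tf.keras.applications, a hypothetical 10-class target task, and a small dense head chosen for illustration):

```python
# Transfer learning: frozen VGG-19 trunk as a feature extractor + new head.
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG19

base = VGG19(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False                    # freeze the pre-trained features

num_classes = 10                          # hypothetical target task
model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```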
Convolutional Neural Network - GoogleNet
Deep Learning Architecture Overview
Introduction to GoogleNet
• Developed by Szegedy et al. at Google in 2014.
• Winner of ILSVRC 2014 with top-5 error rate of 6.67%.
• Introduced the Inception module for efficient computation.
• Deeper network with fewer parameters compared to AlexNet and VGG.
Key Features of GoogleNet
• Inception Modules for multi-scale feature extraction.
• 22 layers deep (27 with pooling layers).
• Uses 1x1 convolutions for dimensionality reduction.
• Global Average Pooling instead of fully connected layers.
• Auxiliary classifiers for training stabilization.
Inception Module
• Combines multiple convolution filters (1x1, 3x3, 5x5) in parallel.
• Includes pooling layer in parallel paths.
• 1x1 convolutions reduce depth before costly convolutions.
• Outputs concatenated to form final feature map.
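A functional-API sketch of such a module (filter counts are parameters; the branch layout follows the dimension-reduced Inception design described above):

```python
# Inception module sketch: parallel 1x1 / 3x3 / 5x5 convs and pooling,
# with 1x1 "reduce" convs cutting depth before the costly branches.
from tensorflow.keras import layers

def inception_module(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b3)
    b5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    bp = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(bp)
    return layers.Concatenate()([b1, b3, b5, bp])  # concatenate along channels
```

For reference, the first Inception module of GoogLeNet uses (64, 96, 128, 16, 32, 32), giving 64 + 128 + 32 + 32 = 256 output channels.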
GoogleNet Architecture
• Input: 224x224 RGB image.
• Initial convolution and pooling layers.
• Stack of Inception modules with occasional pooling.
• Auxiliary classifiers at intermediate layers.
• Global Average Pooling and softmax output.
Advantages of GoogleNet
• High accuracy with fewer parameters (~5 million).
• Computationally efficient due to 1x1 convolutions.
• Good generalization capability.
• Scalable design with modular Inception blocks.
Applications of GoogleNet
• Image classification.
• Object detection.
• Medical image analysis.
• Feature extraction for transfer learning.
Convolutional Neural Network - ResNet
Deep Residual Learning with Diagrams
Introduction to ResNet
• Developed by Microsoft Research in 2015.
• Winner of ILSVRC 2015 with 3.57% top-5 error
rate.
• Introduced residual learning framework.
• Allows training of extremely deep networks
(over 100 layers).
Key Features
• Residual blocks with identity shortcut
connections.
• Mitigates vanishing gradient problem.
• Enables deeper networks without
performance degradation.
• Common variants: ResNet-18, ResNet-34,
ResNet-50, ResNet-101, ResNet-152.
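A basic residual block in sketch form (the two-conv variant used in ResNet-18/34; it assumes the shortcut and the branch have the same channel count, otherwise a 1×1 projection shortcut is needed):

```python
# Basic residual block: output = ReLU(F(x) + x), with an identity shortcut.
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                              # identity shortcut connection
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([y, shortcut])           # F(x) + x: gradients flow through x
    return layers.Activation("relu")(y)
```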
ResNet-50 Architecture Diagram
Applications of ResNet
• Image classification
• Object detection (e.g., Faster R-CNN, Mask R-
CNN)
• Face recognition (e.g., ArcFace, FaceNet)
• Medical image analysis
• Transfer learning in various AI domains
Different Types of CNN Architectures
• LeNet (1998): First successful application of CNNs; 5 layers alternating between convolutional and pooling; used tanh/sigmoid activation functions. Use case: recognizing handwritten and machine-printed characters.
• AlexNet (2012): Deeper and wider than LeNet; used the ReLU activation function; implemented dropout layers; used GPUs for training. Use case: large-scale image recognition tasks.
• ZFNet (2013): Similar architecture to AlexNet, but with different filter sizes and numbers of filters; visualization techniques for understanding the network. Use case: ImageNet classification.
• VGGNet (2014): Deeper networks built from small (3×3) filters used uniformly in every convolutional layer; multiple configurations (VGG16, VGG19). Use case: large-scale image recognition.
• ResNet (2015): Introduced "skip connections" or "shortcuts" to enable training of deeper networks; multiple configurations (ResNet-50, ResNet-101, ResNet-152). Use case: large-scale image recognition; won 1st place in ILSVRC 2015.
• GoogLeNet (2014): Introduced the Inception module, which allows for more efficient computation and deeper networks; multiple versions (Inception v1, v2, v3, v4). Use case: large-scale image recognition; won 1st place in ILSVRC 2014.
• MobileNets (2017): Designed for mobile and embedded vision applications; uses depthwise separable convolutions to reduce model size and complexity. Use case: mobile and embedded vision applications, real-time object detection.