The document discusses the process of image classification using deep learning, specifically with the CIFAR-10 dataset, and outlines various techniques such as data preprocessing, CNN architecture, data augmentation, and transfer learning. It highlights the use of models like AlexNet, VGG, and GoogLeNet, while addressing challenges like overfitting and the vanishing gradient problem. The results show an increase in accuracy achieved through transfer learning, reaching up to 91.58% with fine-tuned classifiers.
CNNs Learn Hierarchical Features
As we go deeper into the network, neurons receive information from larger parts of the image and from many other neurons.
Neurons in the later layers can therefore learn more complicated features, such as eyes or legs.
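As a rough illustration (a minimal sketch; Keras is assumed, the slides do not name a framework), each additional stacked convolution lets a neuron see a larger patch of the original image:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),                  # CIFAR-10-sized input
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # each neuron sees a 3x3 patch
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # effective receptive field grows to 5x5
    tf.keras.layers.Conv2D(32, 3, activation="relu"),   # and to 7x7 at this depth
])
model.summary()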
Max Pooling:
A pooling layer is mostly used immediately after a convolutional layer to reduce the spatial size (only width and height, not depth).
This reduces the number of parameters and helps avoid overfitting.
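A minimal sketch (Keras assumed): a 2x2 max-pooling layer halves width and height while leaving the depth unchanged:

import tensorflow as tf

x = tf.random.normal((1, 32, 32, 64))            # batch, height, width, depth
pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
print(pooled.shape)                              # (1, 16, 16, 64): width/height halved, depth unchanged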
ReLU (Rectified Linear Unit) as Activation Function:
The ReLU function is zero when z is less than zero, and f(z) is equal to z when z is greater than or equal to zero.
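In other words, f(z) = max(0, z). A one-line sketch of this activation (NumPy assumed; the slides do not name a library):

import numpy as np

def relu(z):
    # zero for negative inputs, identity otherwise
    return np.maximum(0, z)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))    # [0.  0.  0.  1.5]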
What is the CNN result?
▫ This solution returned 78% accuracy, which is not bad, but there was overfitting.
What is Data Augmentation?
▫ Rotating the image, shifting it left/right/up/down by some amount, flipping it horizontally or vertically, zooming, etc.
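A minimal sketch of such augmentation (Keras ImageDataGenerator assumed; the exact transformations and ranges used in the project are not given, so the values below are only illustrative):

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative values only; the original settings are not stated in the slides.
datagen = ImageDataGenerator(
    rotation_range=15,       # random rotations
    width_shift_range=0.1,   # shift left/right
    height_shift_range=0.1,  # shift up/down
    horizontal_flip=True,    # random horizontal flips
    zoom_range=0.1,          # random zoom
)
# datagen.flow(x_train, y_train, batch_size=64) then yields augmented training batches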
What is the CNN result?
▫ To solve the overfitting problem we applied DATA AUGMENTATION, and it solved the problem! Accuracy is 79.57%.
Transfer Learning
Reusing models already trained on very large amounts of data for difficult tasks with thousands of classes, which research groups share through competitions like the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).
Pre-trained Models on:
ImageNet
It contains more than 14 million images belonging to more than 20,000 classes.
ILSVRC
The ImageNet Large Scale Visual Recognition Challenge, in which research groups evaluate their computer vision algorithms. Its training data is a subset of ImageNet with 1.2 million images belonging to 1,000 classes.
Why Use Pre-trained Models?
▫ Deep networks have a large number of unknown parameters.
▫ Finding all of these unknown parameters requires a lot of data.
▫ Training the network takes a large amount of time.
▫ We need powerful GPUs.
VGG16 / VGG19
This network is characterized by its simplicity, using only 3×3 convolutional layers stacked on top of each other in increasing depth.
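A sketch of loading this pre-trained network for reuse (the Keras applications module is assumed here; the slides do not name the library):

from tensorflow.keras.applications import VGG16

# ImageNet-pretrained VGG16 without its final classifier, so the stacked
# 3x3 convolutional blocks can be reused as a feature extractor.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.summary()  # blocks of 3x3 convolutions with depth increasing from 64 up to 512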
GoogLeNet / InceptionV3
The goal of the inception module is to act as a "multi-level feature extractor" by computing 1×1, 3×3, and 5×5 convolutions within the same module of the network.
The outputs of these filters are then stacked along the channel dimension before being fed into the next layer of the network.
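A minimal sketch of such a module (Keras functional API assumed; real Inception modules also add 1x1 bottleneck convolutions and a pooling branch, omitted here):

import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(32, 32, 192))
# Parallel convolutions at several scales over the same input
b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(inputs)   # 1x1
b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(inputs)  # 3x3
b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(inputs)   # 5x5
# Stack the filter outputs along the channel dimension before the next layer
outputs = layers.Concatenate(axis=-1)([b1, b2, b3])
print(outputs.shape)  # (None, 32, 32, 224)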
What is the Vanishing Gradient Problem?
This problem makes it really hard to learn and tune the parameters of the earlier layers in the network, and it becomes worse as the number of layers in the architecture increases.
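A rough numeric illustration (not from the slides): with saturating activations such as the sigmoid, each layer can scale the backpropagated gradient by at most about 0.25, so the signal reaching the earliest layers shrinks geometrically with depth:

# Upper bound on the gradient reaching the first layer when every layer
# contributes a sigmoid derivative of at most 0.25 (weights ignored).
for depth in (5, 10, 20):
    print(depth, 0.25 ** depth)   # roughly 1e-3, 1e-6, 1e-12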
OUR PROCESS
STEP 1
Extract features from the pre-trained models (VGG16, VGG19, InceptionV3, and ResNet).
STEP 2
Fine-tune the parameters by building a LinearSVC classifier with many possible C values.
STEP 3
Compare the results.
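A condensed sketch of these steps (library calls and the C grid are illustrative assumptions; x_train and y_train stand for the CIFAR-10 data, which is not shown here):

from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV

# STEP 1: use a pre-trained network (VGG16 here) as a fixed feature extractor.
extractor = VGG16(weights="imagenet", include_top=False, pooling="avg")
features = extractor.predict(preprocess_input(x_train))   # x_train: CIFAR-10 images (assumed loaded)

# STEP 2: tune the classifier by searching over many possible C values.
grid = GridSearchCV(LinearSVC(), param_grid={"C": [0.01, 0.1, 1, 10, 100]}, cv=3)
grid.fit(features, y_train)                               # y_train: labels (assumed loaded)

# STEP 3: compare the best score against the other pre-trained models.
print(grid.best_params_, grid.best_score_)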