COMPUTER VISION
- Ram S Iyer
Course: Seminar (5th Semester)
WHAT IS COMPUTER VISION?
• Computer Vision (CV) is a field that includes methods for acquiring, processing, analyzing
and understanding images.
• Intended to duplicate the abilities of human vision by electronically perceiving and
understanding an image.
• Theory for building artificial systems that obtain information from images.
• Image data can take many forms, such as a video sequence, depth images, views from
multiple cameras, medical scanners, satellite sensors etc.
• Overall, it enhances human-machine interaction.
[Diagram: Artificial Intelligence and its subfields, such as Computer Vision (CV), Natural Language Processing (NLP), etc.]
ARTIFICIAL INTELLIGENCE – “THE NEW ELECTRICITY”
• Artificial Intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.
• These processes include learning, reasoning, problem-solving, perception, language understanding and decision making.
HISTORY OF AI
• 1943 – McCulloch and Pitts – M-P Neuron
• 1951 – Minsky – Neural Net Computer
• 1956 – Artificial Intelligence was coined at Dartmouth conference along with Lisp invention
• 1957 – Frank Rosenblatt – First use of Perceptron
• 1965 – Invention of the Multi-Layer Perceptron (MLP)
• 1986 – David E. Rumelhart – Backpropagation Algorithm
• 1989 – Yann LeCun – Convolutional Neural Network (CNN)
• 1991 – Recurrent Neural Networks (RNN)
• 2007 – GPUs and CUDA framework
• 2014 – Ian Goodfellow – Generative Adversarial Network (GAN)
• 2017 – Pascal GPUs (with unified memory) were released
AI WINTER
MACHINE LEARNING
• Machine Learning is a subset of AI that fits mathematical models to data in an algorithmic manner, enabling computers to identify patterns and make predictions or decisions based on examples and experience.
Broad Classification of Machine Learning Methods
Supervised Learning Unsupervised Learning Reinforcement Learning
[Diagram: labelled training samples (x1, y1), (x2, y2), ..., (xn, yn) of (features, targets) are fed to a machine learning model, which then predicts y_unknown for a new, unseen input x_unknown.]
An attempt is made by the system to model the equation y = mx + c.
• m and c are parameters to be learned.
• ∑(y − yi)² is the cost function.
• y is the actual value, yi is the predicted value of the target, and x is the input feature.
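A minimal sketch in plain Python of how m and c could be learned by gradient descent on the cost ∑(y − yi)²; the toy data and learning rate below are assumptions chosen only for illustration:

    # Hypothetical toy data: x = input feature, y = actual target value.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.1, 5.0, 7.2, 8.9, 11.1]   # roughly follows y = 2x + 1

    m, c = 0.0, 0.0   # parameters to be learned
    lr = 0.01         # learning rate (illustrative choice)

    for _ in range(5000):
        # Predicted value yi = m*x + c; gradients of the cost sum((y - yi)^2).
        grad_m = sum(-2 * x * (y - (m * x + c)) for x, y in zip(xs, ys))
        grad_c = sum(-2 * (y - (m * x + c)) for x, y in zip(xs, ys))
        m -= lr * grad_m
        c -= lr * grad_c

    print(m, c)  # converges to approximately m ≈ 2, c ≈ 1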
NEURAL NETWORK AND DEEP LEARNING
• Interconnected networks, inspired by the architecture of the biological neural networks that make up the human brain.
• Automatically learn hierarchical representations from data, avoiding manual feature engineering.
• Excel at handling complex, high-dimensional data such as images.
Artificial Neuron – a mathematical construct mimicking real neurons.
Each neuron takes in a set of inputs, and each input carries a certain amount of importance (a weight). The inputs, combined with their weights, produce the neuron's output.
Here, x1, x2, x3, x4 are features or input parameters; w1, w2, w3, w4 and b are parameters to be learned; F( ) is the activation function; and z is the predicted output. The parameters are learned through the backpropagation mechanism.
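A minimal sketch of such a neuron in plain Python; the sigmoid used for F( ) here is one common choice, assumed purely for illustration:

    import math

    def neuron(x, w, b):
        # Weighted sum: each input x_i is scaled by its importance w_i, plus bias b.
        s = sum(xi * wi for xi, wi in zip(x, w)) + b
        # Activation function F( ) -- a sigmoid squashes the sum into (0, 1).
        return 1.0 / (1.0 + math.exp(-s))

    # Example: four input features x1..x4 with weights w1..w4 and bias b.
    z = neuron([0.5, -1.2, 3.0, 0.7], [0.4, 0.1, -0.6, 2.0], b=0.3)
    print(z)  # predicted output z, here about 0.495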
Multi Layer Perceptron
Now, a neural network works well with small to moderately large dimensional data,
but it becomes severely slow and computationally intensive with high-dimensional data such as images.
Hence, we use a Convolutional Neural Network (CNN) to extract meaningful feature representations from a raw image.
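The scale problem can be seen with simple arithmetic; the layer sizes below (a 1000-unit fully connected layer versus 32 filters of size 3×3) are assumptions made only for illustration:

    # Fully connected: every one of the 1000*1000*3 input pixels connects to
    # each of 1000 hidden units.
    dense_params = (1000 * 1000 * 3) * 1000   # = 3,000,000,000 weights

    # Convolutional: 32 filters of size 3x3 spanning 3 channels, plus one bias
    # per filter, regardless of the image size.
    conv_params = 32 * (3 * 3 * 3) + 32       # = 896 parameters

    print(dense_params, conv_params)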
COMPUTER VISION
Computer Vision vs Human Vision – each pixel is treated as an input feature.
Computer vision is a difficult process because,
• It is a many-to-one mapping.
• It is computationally intensive.
• We do not understand the recognition problem.
CONVOLUTIONAL NEURAL NETWORK
• When image data is used, the input is very large because of the image size (e.g., 1000×1000×3), so we use convolutions to extract important features and reduce the input dimensions and the number of parameters.
• A CNN takes an image as input and converts it into a smaller representation by encoding each pixel together with its neighboring context.
• This allows us to process images faster with less compute.
• Different filters or kernels are used in convolution to extract different features from the input image.
• CNNs are composed of mainly three types of layers (see the sketch below):
• Convolutional Layers
• Pooling Layers
• Fully Connected Layers
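A minimal sketch of a CNN built from these three layer types, assuming PyTorch is available; the channel counts, kernel sizes, and the 32×32 RGB input implied by the final linear layer are illustrative assumptions, not an architecture from the slides:

    import torch.nn as nn

    # Convolutional, pooling, and fully connected layers stacked in sequence.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
        nn.ReLU(),
        nn.MaxPool2d(2),                             # pooling layer (halves H and W)
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                   # fully connected layer (assumes 32x32 inputs)
    )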
• Convolutional Layer
• Elementwise multiplication and addition.
• Let the input image dimension be (n × n) and the convolutional filter dimension be (f × f).
• Then the output image dimension = (n × n) * (f × f) = ((n − f + 1) × (n − f + 1)).
• Padding: every time a convolution is applied, the image shrinks, so padding can be used to preserve the original input dimension. The output dimension becomes ((n + 2p − f + 1) × (n + 2p − f + 1)), where p = padding amount.
• Stride: the number of columns by which the filter jumps in one step of the convolution. The output dimension becomes ((⌊(n + 2p − f)/s⌋ + 1) × (⌊(n + 2p − f)/s⌋ + 1)), where s = stride.
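These formulas can be checked with a small helper function (a sketch; the floor division handles strides that do not divide the span evenly):

    def conv_output_size(n, f, p=0, s=1):
        # Output spatial size of an (n x n) input convolved with an (f x f)
        # filter, padding p and stride s: floor((n + 2p - f) / s) + 1
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(6, 3))                # 4   -> n - f + 1
    print(conv_output_size(6, 3, p=1))           # 6   -> padding preserves the size
    print(conv_output_size(1000, 5, p=0, s=2))   # 498 -> stride shrinks the output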
• Pooling Layer
• Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but this filter has no weights; instead, it applies an aggregation function to each receptive field.
• Pooling layers help reduce complexity, improve efficiency and limit the risk of overfitting.
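A minimal sketch of 2×2 max pooling with NumPy (assumed available); the "filter" carries no weights and simply takes the maximum over each receptive field:

    import numpy as np

    def max_pool2x2(img):
        # Aggregate each non-overlapping 2x2 receptive field by its maximum.
        h, w = img.shape
        return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 0],
                  [7, 2, 9, 8],
                  [0, 1, 3, 4]])
    print(max_pool2x2(x))
    # [[6 4]
    #  [7 9]]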
DIFFERENT COMPUTER VISION TASKS
Landmark
• ResNet
• InceptionNet
• EfficientNet
• MobileNet
• R-CNN
• FR-CNN
• YOLO
• SSD
• U-Net
• Mask R-CNN
APPLICATIONS
THANK YOU
GAN-generated face – “Doesn’t exist in reality”
ALL IMAGES: COURTESY OF OPEN-SOURCE INTERNET
