COMPUTER VISION
- Ram S Iyer
Course: Seminar (5th Semester)
WHAT IS COMPUTER VISION?
• Computer Vision (CV) is a field that includes methods for acquiring, processing, analyzing
and understanding images.
• Intended to duplicate the abilities of human vision by electronically perceiving and
understanding an image.
• Theory for building artificial systems that obtain information from images.
• Image data can take many forms, such as a video sequence, depth images, views from
multiple cameras, medical scanners, satellite sensors etc.
• Overall, it enhances human-machine interaction.
[Diagram: Artificial Intelligence and its subfields, such as Computer Vision (CV), Natural Language Processing (NLP), etc.]
ARTIFICIAL INTELLIGENCE – “THE NEW ELECTRICITY”
• Artificial Intelligence refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.
• These processes include learning, reasoning, problem-solving, perception, language understanding and decision making.
HISTORY OF AI
• 1943 – McCulloch and Pitts – M-P Neuron
• 1951 – Minsky – Neural Net Computer
• 1956 – Artificial Intelligence was coined at Dartmouth conference along with Lisp invention
• 1957 – Frank Rosenblatt – First use of Perceptron
• 1965 – Invention of the Multi-Layer Perceptron (MLP)
• 1986 – David E. Rumelhart – Backpropagation Algorithm
• 1989 – Yann LeCun – Convolutional Neural Network (CNN)
• 1991 – Recurrent Neural Networks (RNN)
• 2007 – GPUs and CUDA framework
• 2014 – Ian Goodfellow – Generative Adversarial Network (GAN)
• 2017 – Pascal GPUs (with unified memory) were released
AI WINTER
MACHINE LEARNING
• Machine Learning is a subset of AI that fits mathematical models to data in an algorithmic manner, enabling computers to identify patterns and make predictions or decisions based on examples and experience.
Broad Classification of Machine Learning Methods
Supervised Learning Unsupervised Learning Reinforcement Learning
[Diagram: labelled training samples (x1, y1), (x2, y2), ..., (xn, yn) of (features, targets) are fed to a machine learning model, which then predicts y_unknown for a new, unseen input x_unknown.]
An attempt is made by the system to model the equation y = mx + c.
• m and c are parameters to be learned.
• ∑(y − yi)² is the cost function.
• y is the actual value, yi is the predicted value of the target, and x is the input feature.
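A minimal sketch in plain Python of how m and c could be learned by gradient descent on the cost ∑(y − yi)²; the toy data and learning rate below are assumptions chosen only for illustration:

    # Hypothetical toy data: x = input feature, y = actual target value.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.1, 5.0, 7.2, 8.9, 11.1]   # roughly follows y = 2x + 1

    m, c = 0.0, 0.0   # parameters to be learned
    lr = 0.01         # learning rate (illustrative choice)

    for _ in range(5000):
        # Predicted value yi = m*x + c; gradients of the cost sum((y - yi)^2).
        grad_m = sum(-2 * x * (y - (m * x + c)) for x, y in zip(xs, ys))
        grad_c = sum(-2 * (y - (m * x + c)) for x, y in zip(xs, ys))
        m -= lr * grad_m
        c -= lr * grad_c

    print(m, c)  # converges to approximately m ≈ 2, c ≈ 1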
NEURAL NETWORK AND DEEP LEARNING
• Interconnected networks, inspired by the architecture of the biological neural networks that make up the human brain.
• Automatically learn hierarchical representations from data, avoiding manual feature engineering.
• Excel at handling complex, high-dimensional data such as images.
Artificial Neuron – a mathematical construct mimicking real neurons.
Each neuron takes in a set of inputs, and each input carries a certain amount of importance (a weight). The inputs, combined with their weights, produce the neuron's output.
Here, x1, x2, x3, x4 are features or input parameters; w1, w2, w3, w4 and b are parameters to be learned; F( ) is the activation function; and z is the predicted output. The parameters are learned through the backpropagation mechanism.
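A minimal sketch of such a neuron in plain Python; the sigmoid used for F( ) here is one common choice, assumed purely for illustration:

    import math

    def neuron(x, w, b):
        # Weighted sum: each input x_i is scaled by its importance w_i, plus bias b.
        s = sum(xi * wi for xi, wi in zip(x, w)) + b
        # Activation function F( ) -- a sigmoid squashes the sum into (0, 1).
        return 1.0 / (1.0 + math.exp(-s))

    # Example: four input features x1..x4 with weights w1..w4 and bias b.
    z = neuron([0.5, -1.2, 3.0, 0.7], [0.4, 0.1, -0.6, 2.0], b=0.3)
    print(z)  # predicted output z, here about 0.495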
Multi Layer Perceptron
Now, a neural network works well with small to moderately large dimensional data,
but it becomes severely slow and computationally intensive with high-dimensional data such as images.
Hence, we use a Convolutional Neural Network (CNN) to extract meaningful feature representations from a raw image.
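The scale problem can be seen with simple arithmetic; the layer sizes below (a 1000-unit fully connected layer versus 32 filters of size 3×3) are assumptions made only for illustration:

    # Fully connected: every one of the 1000*1000*3 input pixels connects to
    # each of 1000 hidden units.
    dense_params = (1000 * 1000 * 3) * 1000   # = 3,000,000,000 weights

    # Convolutional: 32 filters of size 3x3 spanning 3 channels, plus one bias
    # per filter, regardless of the image size.
    conv_params = 32 * (3 * 3 * 3) + 32       # = 896 parameters

    print(dense_params, conv_params)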
COMPUTER VISION
Computer Vision vs Human Vision – each pixel is treated as an input feature.
Computer vision is a difficult process because,
• It is a many-to-one mapping.
• It is computationally intensive.
• We do not understand the recognition problem.
CONVOLUTIONAL NEURAL NETWORK
• When image data is used, the input is very large because of the image size (e.g., 1000×1000×3), so we use convolutions to extract important features and reduce the input dimensions and the number of parameters.
• A CNN takes an image as input and converts it into a smaller representation by encoding each pixel together with its neighboring context.
• This allows us to process images faster with less compute.
• Different filters or kernels are used in convolution to extract different features from the input image.
• CNNs are composed of mainly three types of layers (see the sketch below):
• Convolutional Layers
• Pooling Layers
• Fully Connected Layers
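A minimal sketch of a CNN built from these three layer types, assuming PyTorch is available; the channel counts, kernel sizes, and the 32×32 RGB input implied by the final linear layer are illustrative assumptions, not an architecture from the slides:

    import torch.nn as nn

    # Convolutional, pooling, and fully connected layers stacked in sequence.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
        nn.ReLU(),
        nn.MaxPool2d(2),                             # pooling layer (halves H and W)
        nn.Conv2d(16, 32, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),                   # fully connected layer (assumes 32x32 inputs)
    )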
• Convolutional Layer
• Elementwise multiplication and addition.
• Let the input image dimension be (n × n) and the convolutional filter dimension be (f × f).
• Then the output image dimension = (n × n) * (f × f) = ((n − f + 1) × (n − f + 1)).
• Padding: every time a convolution is applied, the image shrinks, so padding can be used to preserve the original input dimension. The output dimension becomes ((n + 2p − f + 1) × (n + 2p − f + 1)), where p = padding amount.
• Stride: the number of columns by which the filter jumps in one step of the convolution. The output dimension becomes ((⌊(n + 2p − f)/s⌋ + 1) × (⌊(n + 2p − f)/s⌋ + 1)), where s = stride.
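These formulas can be checked with a small helper function (a sketch; the floor division handles strides that do not divide the span evenly):

    def conv_output_size(n, f, p=0, s=1):
        # Output spatial size of an (n x n) input convolved with an (f x f)
        # filter, padding p and stride s: floor((n + 2p - f) / s) + 1
        return (n + 2 * p - f) // s + 1

    print(conv_output_size(6, 3))                # 4   -> n - f + 1
    print(conv_output_size(6, 3, p=1))           # 6   -> padding preserves the size
    print(conv_output_size(1000, 5, p=0, s=2))   # 498 -> stride shrinks the output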
• Pooling Layer
• Similar to the convolutional layer, the pooling operation sweeps a filter across the entire input, but this filter has no weights; instead, it applies an aggregation function to each receptive field.
• Pooling layers help reduce complexity, improve efficiency and limit the risk of overfitting.
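A minimal sketch of 2×2 max pooling with NumPy (assumed available); the "filter" carries no weights and simply takes the maximum over each receptive field:

    import numpy as np

    def max_pool2x2(img):
        # Aggregate each non-overlapping 2x2 receptive field by its maximum.
        h, w = img.shape
        return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

    x = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 0],
                  [7, 2, 9, 8],
                  [0, 1, 3, 4]])
    print(max_pool2x2(x))
    # [[6 4]
    #  [7 9]]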
DIFFERENT COMPUTER VISION TASKS
Landmark
• ResNet
• InceptionNet
• EfficientNet
• MobileNet
• R-CNN
• FR-CNN
• YOLO
• SSD
• U-Net
• Mask R-CNN
APPLICATIONS
THANK YOU
GAN-generated face – “Doesn’t exist in reality”
ALL IMAGES: COURTESY OF OPEN-SOURCE INTERNET
