Intro to Deep Learning for Computer Vision

Applications of Deep Learning in Computer Vision
Christoph Körner

Outline
1) Introduction to Neural Networks
2) Deep Learning
3) Applications in Computer Vision
4) Conclusion

Why Deep Learning?
●
Wins most major computer vision challenges
(classification, segmentation, etc.)
●
Can be applied in various domains (speech
recognition, game prediction, computer vision,
etc.)
●
Matches or beats human accuracy on some benchmarks
●
Big communities and resources
●
Hardware for Deep Learning

Perceptron (1958)
●
Weighted sum of inputs
●
Threshold operator
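
The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the original 1958 formulation; the function name `perceptron` and the AND weights are hypothetical values chosen by hand for the example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus the bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Threshold operator: fire only when the weighted sum exceeds zero.
    return 1 if weighted_sum > 0 else 0


# Hand-picked (hypothetical) weights that make the unit compute logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [perceptron([a, b], and_weights, and_bias)
               for a in (0, 1) for b in (0, 1)]
```

Here `and_outputs` covers the inputs (0,0), (0,1), (1,0), (1,1) in that order; only the last one crosses the threshold.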

Artificial Neural Network (1960)
●
Universal function approximator
●
Can solve the XOR problem
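
A single threshold unit cannot separate XOR, but one hidden layer is enough. The sketch below uses hand-set weights (an assumption for illustration, not learned values): one hidden unit acts as OR, one as AND, and the output unit fires for "OR but not AND".

```python
def step(value):
    """Threshold operator: 1 for positive input, else 0."""
    return 1 if value > 0 else 0


def xor_network(a, b):
    """Two-layer network with hand-picked weights that computes XOR."""
    hidden_or = step(a + b - 0.5)    # fires if at least one input is 1
    hidden_and = step(a + b - 1.5)   # fires only if both inputs are 1
    # Output: OR minus AND, thresholded.
    return step(hidden_or - hidden_and - 0.5)


xor_outputs = [xor_network(a, b) for a in (0, 1) for b in (0, 1)]
```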

Backpropagation (1982)
●
Propagate the error through the network
●
Allows gradient-based optimization (SGD, etc.)
●
Enables training of multi-layer networks
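
As a rough sketch of what "propagating the error" means, consider a toy network with one hidden tanh unit, y = w2 * tanh(w1 * x), and a squared-error loss. The network shape, the names (`forward`, `gradients`, `train`) and the learning rate are assumptions made for this example. The backward pass applies the chain rule from the output towards the input, and the weights are then updated by plain gradient descent on a single sample.

```python
import math


def forward(x, w1, w2):
    """Forward pass: hidden activation h and output y."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y


def loss(x, target, w1, w2):
    """Squared-error loss for one sample."""
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def gradients(x, target, w1, w2):
    """Backward pass: propagate the error from the output back to both weights."""
    h, y = forward(x, w1, w2)
    d_y = y - target             # dL/dy
    d_w2 = d_y * h               # dL/dw2
    d_h = d_y * w2               # error propagated to the hidden unit
    d_z = d_h * (1.0 - h * h)    # through tanh: tanh'(z) = 1 - tanh(z)^2
    d_w1 = d_z * x               # dL/dw1
    return d_w1, d_w2


def train(x, target, w1, w2, learning_rate=0.1, steps=200):
    """Gradient descent on a single training sample."""
    for _ in range(steps):
        d_w1, d_w2 = gradients(x, target, w1, w2)
        w1 -= learning_rate * d_w1
        w2 -= learning_rate * d_w2
    return w1, w2


trained_w1, trained_w2 = train(1.0, 0.5, 0.3, 0.2)
```

Comparing `gradients` against a finite-difference estimate of `loss` is a simple way to check that the chain rule was applied correctly.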

Convolution and Pooling (1989)
●
Fewer parameters than fully connected hidden layers
●
More efficient training
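
The parameter savings can be made concrete with a little arithmetic. The layer sizes below (a 32x32x3 input, 16 output channels, a 5x5 kernel) are hypothetical numbers picked for illustration and are not taken from the slides. A small 2x2 max-pooling helper is included to show what pooling does to a feature map.

```python
def dense_params(n_inputs, n_outputs):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_outputs + n_outputs


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a convolutional layer (shared across all positions)."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a 2D list with even height and width."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]) - 1, 2)]
        for r in range(0, len(image) - 1, 2)
    ]


# Hypothetical example: map a 32x32x3 image to a 32x32x16 feature map.
dense_count = dense_params(32 * 32 * 3, 32 * 32 * 16)
conv_count = conv_params(5, 5, 3, 16)

pooled = max_pool_2x2([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12],
                       [13, 14, 15, 16]])
```

Under these assumptions the fully connected layer needs about 50 million parameters, while the convolutional layer needs 1,216.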

Handwritten ZIP Codes (1989)
●
30 training passes
●
Achieved 92% accuracy

What happened until 2011?
●
Better Initialization
●
Better Non-linearities: ReLU
●
1000 times more training data
●
More computing power
●
Speedup in training time by a factor of 1 million through
parallelization on GPUs
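
One common explanation for why ReLU counts as a better non-linearity is that its gradient does not shrink for large positive inputs, whereas the sigmoid saturates. The sketch below only illustrates that difference; the function names are chosen for this example.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25."""
    s = sigmoid(x)
    return s * (1.0 - s)


# For a large input the sigmoid gradient almost vanishes; the ReLU gradient stays 1.
large_input = 10.0
relu_slope = relu_grad(large_input)
sigmoid_slope = sigmoid_grad(large_input)
```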

Deep Learning
●
Convolutional, pooling, and fully connected layers
●
ReLU activations
●
Deep nested models with many parameters
●
New layer types and structures
●
New techniques to reduce overfitting
●
Loads of training data and compute power
●
10,000,000 images
●
Weeks of training on multi-GPU machines

AlexNet (2012)
●
62,378,344 parameters (250 MB)
●
24 layers
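
The model size follows from the parameter count if each parameter is stored as a 32-bit float. That storage format is an assumption here; the slide does not state it.

```python
num_parameters = 62_378_344
bytes_per_parameter = 4  # assumption: one float32 per parameter

size_bytes = num_parameters * bytes_per_parameter
size_mb = size_bytes / 1_000_000        # decimal megabytes
size_mib = size_bytes / (1024 * 1024)   # binary mebibytes
```

Under this assumption the weights take about 249.5 MB, which matches the rounded figure on the slide.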

VGGNet (2014)
●
23 layers

GoogLeNet (2014)
●
143 layers

Inception Module
●
Heavy use of 1x1 convolutions (applied along the
depth dimension)
●
Very efficient
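
A 1x1 convolution mixes channels at each pixel independently, so it can be written as a small matrix product along the depth dimension. The sketch below assumes a feature map laid out as [height][width][channels]. The channel counts in the cost comparison (256 in, 32 reduced, 64 out) are hypothetical and only show why reducing depth before a larger kernel saves weights.

```python
def conv_1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: nested list [height][width][in_channels]
    weights:     nested list [out_channels][in_channels]
    """
    return [
        [
            [sum(w * v for w, v in zip(filter_weights, pixel))
             for filter_weights in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


# One row, two pixels, three input channels reduced to two output channels.
reduced = conv_1x1([[[1, 2, 3], [4, 5, 6]]],
                   [[1, 0, 0], [0.5, 0.5, 0]])

# Weight counts (biases ignored) for a 5x5 convolution from 256 to 64 channels,
# with and without a 1x1 reduction to 32 channels in front of it.
direct_weights = 5 * 5 * 256 * 64
reduced_weights = 1 * 1 * 256 * 32 + 5 * 5 * 32 * 64
```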

ResNet (2015)
●
Residual learning
●
152 layers
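
Residual learning means a block learns a correction F(x) that is added to its input, so the output is F(x) + x. The sketch below is a simplification that uses two small linear layers in place of the convolutions of a real residual block; the names and shapes are assumptions made for illustration.

```python
def relu(vector):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in vector]


def linear(vector, weights):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, weights_1, weights_2):
    """Compute relu(F(x) + x), where F is two linear layers with a ReLU in between."""
    residual = linear(relu(linear(x, weights_1)), weights_2)
    # Skip connection: add the unchanged input to the learned residual.
    return relu([r + xi for r, xi in zip(residual, x)])


# With all-zero weights F(x) = 0, so the block passes a non-negative input through.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
identity_output = residual_block([1.0, 2.0], zero_weights, zero_weights)
```

The zero-weight case shows why very deep stacks of such blocks remain trainable: a block that has learned nothing still behaves like the identity.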

Applications in Computer Vision

Classification
●
One class per image
●
Softmax layer at the end
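
The softmax layer turns the raw scores of the final layer into a probability distribution over classes, and the predicted class is the one with the highest probability. A minimal sketch with made-up scores:

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```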

Localization
●
Bounding box regression
●
Sigmoid layer with 4 outputs at the end
●
Via Classification
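
One way to read the sigmoid layer with 4 outputs: each output lies in (0, 1) and can be scaled by the image size to give box coordinates. The coordinate order (x_min, y_min, x_max, y_max) and the function name `decode_box` are assumptions for this sketch; the slide does not specify an encoding.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw_outputs, image_width, image_height):
    """Map 4 raw network outputs to pixel coordinates.

    Assumed order: x_min, y_min, x_max, y_max, each normalized to (0, 1).
    """
    x_min, y_min, x_max, y_max = (sigmoid(v) for v in raw_outputs)
    return (x_min * image_width, y_min * image_height,
            x_max * image_width, y_max * image_height)


# Raw outputs of zero map to the image centre, because sigmoid(0) = 0.5.
centre_box = decode_box([0.0, 0.0, 0.0, 0.0], 200, 100)
```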

Detection
●
Multiple objects, multiple classes
●
Solved using multiple networks

More Applications
●
Compression: auto-encoders, self-organizing maps
●
Image captioning: solved with recurrent architectures
●
Image stylization
●
Clustering
●
Many more...

Conclusion
●
Powerful: learns from data instead of relying on hand-crafted
feature extraction
●
Better than humans on some benchmarks
●
Deeper is often better, but raises the risk of overfitting
●
More data is usually better, provided data quality and ground
truth are reliable
