Applications of Deep Learning in Computer Vision
Christoph Körner
Outline
1) Introduction to Neural Networks
2) Deep Learning
3) Applications in Computer Vision
4) Conclusion
Why Deep Learning?
● Wins nearly every recent computer vision challenge (classification, segmentation, etc.)
● Can be applied in various domains (speech recognition, game prediction, computer vision, etc.)
● Beats human accuracy on some benchmarks
● Big communities and resources
● Hardware for Deep Learning
Perceptron (1958)
● Weighted sum of inputs
● Threshold operator
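The two bullets above can be sketched in a few lines of Python. This is a minimal illustration, not the original 1958 formulation: the function name `perceptron` and the hand-picked AND weights are assumptions made for the example.

```python
def perceptron(inputs, weights, bias):
    """Return 1 if the weighted sum of the inputs plus bias is positive, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if weighted_sum > 0 else 0  # threshold operator


# Hand-picked (assumed) weights that make the unit behave like logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_BIAS = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], AND_WEIGHTS, AND_BIAS))
```

Only the input (1, 1) pushes the weighted sum above the threshold, so the unit fires for that input alone.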
Artificial Neural Network (1960)
● Universal function approximator
● Can solve the XOR problem
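A single threshold unit cannot represent XOR, but two layers of them can. The sketch below wires up one possible solution by hand (OR and NAND in the hidden layer, AND at the output); the weights are chosen for illustration and are not learned.

```python
def step(z):
    """Threshold operator: 1 for positive input, else 0."""
    return 1 if z > 0 else 0


def unit(inputs, weights, bias):
    """One threshold unit: weighted sum followed by the step function."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_net(a, b):
    """Two-layer network with hand-set (assumed) weights that computes XOR."""
    h_or = unit([a, b], [1.0, 1.0], -0.5)       # fires if a OR b
    h_nand = unit([a, b], [-1.0, -1.0], 1.5)    # fires unless a AND b
    return unit([h_or, h_nand], [1.0, 1.0], -1.5)  # fires if both hidden units fire


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))
```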
Backpropagation (1982)
● Propagate the error through the network
● Allows Optimization (SGD, etc.)
● Enables training of multi-layer networks
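To make the idea concrete, here is a minimal sketch of backpropagation for a tiny network with one sigmoid hidden unit and a squared-error loss, followed by a single SGD step. The network shape, the starting parameters and the learning rate are assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Forward pass: hidden activation h and output y."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    y = w2 * h + b2
    return h, y


def loss(params, x, target):
    """Squared-error loss for a single training example."""
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backward(params, x, target):
    """Propagate the error from the output back to every parameter (chain rule)."""
    w1, b1, w2, b2 = params
    h, y = forward(params, x)
    dy = y - target            # dL/dy
    dw2 = dy * h               # dL/dw2
    db2 = dy                   # dL/db2
    dh = dy * w2               # error propagated to the hidden unit
    dz = dh * h * (1.0 - h)    # through the sigmoid derivative
    dw1 = dz * x               # dL/dw1
    db1 = dz                   # dL/db1
    return [dw1, db1, dw2, db2]


def sgd_step(params, grads, lr):
    """One gradient-descent update: move each parameter against its gradient."""
    return [p - lr * g for p, g in zip(params, grads)]


params = [0.5, -0.3, 0.8, 0.1]   # assumed starting values: w1, b1, w2, b2
x, target = 1.5, 1.0
grads = backward(params, x, target)
new_params = sgd_step(params, grads, lr=0.1)
print(loss(params, x, target), loss(new_params, x, target))
```

The second printed loss is smaller than the first, which is what one update in the direction of the negative gradient should achieve for a small enough learning rate.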
Convolution and Pooling (1989)
● Fewer parameters than fully-connected hidden layers
● More efficient training
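The parameter saving can be checked with simple arithmetic. The sketch below compares a fully-connected layer with a convolutional layer and adds a small 2x2 max-pooling function; the layer sizes (28x28 input, 100 hidden units, eight 5x5 filters) are assumptions picked for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights plus biases of a convolutional layer (shared across all positions)."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def max_pool_2x2(image):
    """2x2 max pooling with stride 2 on a list-of-lists image with even side lengths."""
    return [
        [max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
         for c in range(0, len(image[0]), 2)]
        for r in range(0, len(image), 2)
    ]


print(dense_params(28 * 28, 100))  # 78500
print(conv_params(5, 5, 1, 8))     # 208
print(max_pool_2x2([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]))
```

Because the convolution kernel is reused at every image position, its parameter count does not depend on the image size, while the fully-connected layer grows with every additional pixel.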
Handwritten ZIP Codes (1989)
● 30 training passes
● Achieved 92% accuracy
What happened until 2011?
● Better Initialization
● Better Non-linearities: ReLU
● 1000 times more training data
● More computing power
  ● Factor 1 million speedup in training time through parallelization on GPUs
Deep Learning
● Conv-, Pool- and Fully-Connected Layers
● ReLU activations
● Deep nested models with many parameters
● New layer types and structures
● New techniques to reduce overfitting
● Loads of training data and compute power
  ● 10,000,000 images
  ● Weeks of training on multi-GPU machines
AlexNet (2012)
● 62,378,344 parameters (250MB)
● 24 layers
VGGNet (2014)
● 102,908,520 parameters (412MB)
● 23 layers
GoogLeNet (2014)
● 6,998,552 parameters (28MB)
● 143 layers
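The model sizes quoted for AlexNet, VGGNet and GoogLeNet follow from the parameter counts if each parameter is stored as a 4-byte float. That storage format is an assumption, not stated on the slides, but it reproduces the quoted megabyte figures after rounding:

```python
BYTES_PER_FLOAT32 = 4  # assumed storage format: one 32-bit float per parameter


def model_size_mb(n_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate size of the stored weights in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


MODELS = [
    ("AlexNet", 62_378_344),
    ("VGGNet", 102_908_520),
    ("GoogLeNet", 6_998_552),
]

for name, n_params in MODELS:
    print(f"{name}: {model_size_mb(n_params):.1f} MB")
```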
Inception Module
● Heavy use of 1x1 convolutions (applied along the depth dimension)
● Very efficient
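A 1x1 convolution mixes the channels of each pixel independently, which amounts to a small matrix multiplication along the depth dimension. The pure-Python sketch below uses a hypothetical 2x2 feature map with 3 channels and reduces it to 2 channels; the function name `conv1x1` and the data layout are assumptions for the example.

```python
def conv1x1(feature_map, weights):
    """Apply a 1x1 convolution.

    feature_map: nested list with shape [height][width][in_channels]
    weights:     nested list with shape [out_channels][in_channels]
    Returns a nested list with shape [height][width][out_channels].
    """
    return [
        [
            [sum(w * x for w, x in zip(filter_weights, pixel))
             for filter_weights in weights]
            for pixel in row
        ]
        for row in feature_map
    ]


feature_map = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]
weights = [
    [1.0, 0.0, 0.0],   # output channel 0: copy input channel 0
    [0.5, 0.5, 0.0],   # output channel 1: average of input channels 0 and 1
]

reduced = conv1x1(feature_map, weights)
print(reduced)
```

Reducing the depth this way before a larger 3x3 or 5x5 convolution cuts the number of multiplications in that larger layer, which is one reason the module is cheap to evaluate.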
ResNet (2015)
● Residual learning
● 152 layers
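In residual learning a block outputs F(x) + x, so its layers only have to learn the difference from the identity. The sketch below is a simplified fully-connected version for illustration; real ResNet blocks use convolutions and batch normalization, and all names here are assumptions.

```python
def relu(vector):
    return [max(0.0, v) for v in vector]


def linear(vector, weights, bias):
    """Fully-connected layer: one weight row and one bias per output value."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, weights1, bias1, weights2, bias2):
    """Compute relu(F(x) + x) with F = linear -> relu -> linear."""
    f = linear(relu(linear(x, weights1, bias1)), weights2, bias2)
    return relu([fi + xi for fi, xi in zip(f, x)])  # skip connection adds the input back


# With all-zero weights F(x) is zero, so the block passes its input through unchanged.
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]
print(residual_block([1.0, 2.0], zero_weights, zero_bias, zero_weights, zero_bias))
```

The zero-weight case shows why very deep stacks of such blocks remain trainable in principle: a block that has learned nothing still behaves like the identity for non-negative inputs instead of destroying the signal.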
Applications in Computer Vision
Classification
● One class per image
● Softmax layer at the end
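The softmax layer turns the raw scores of the last layer into a probability distribution over the classes, and the predicted class is the one with the highest probability. A minimal sketch, with made-up scores for three classes:

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three classes
probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(probs, predicted_class)
```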
Localization
● Bounding box Regression
● Sigmoid layer with 4 outputs at the end
● Via Classification
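One way to read the sigmoid layer with 4 outputs is that each output is squashed into the range 0 to 1 and interpreted as a normalized box coordinate. The slides do not specify the encoding, so the corner-based layout (x0, y0, x1, y1) below is an assumption, as is the function name `decode_box`.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function with outputs between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_box(raw_outputs, image_width, image_height):
    """Map 4 raw network outputs to a pixel box (x_min, y_min, x_max, y_max).

    Assumed encoding: the outputs are two normalized corner points (x0, y0, x1, y1).
    """
    x0, y0, x1, y1 = (sigmoid(z) for z in raw_outputs)
    return (
        min(x0, x1) * image_width,
        min(y0, y1) * image_height,
        max(x0, x1) * image_width,
        max(y0, y1) * image_height,
    )


print(decode_box([-1.0, -2.0, 1.0, 2.0], image_width=640, image_height=480))
```

Because the sigmoid bounds every output, the decoded box always lies inside the image, whatever values the network produces.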
Detection
● Multiple Objects, multiple classes
● Solved using multiple networks
Segmentation
More Applications
● Compression
  ● Auto-encoders, Self-organizing maps
● Image Captioning
  ● Solved with Recurrent Architecture
● Image Stylization
● Clustering
● Many more...
Conclusion
● Powerful, learn from data instead of hand-crafted feature extraction
● Better than humans
● Deeper is always better
  ● Overfitting
● More data is always better
  ● Data quality
  ● Ground truth
Thank you!
Christoph Körner
