The document outlines the fundamentals of object detection in computer vision, including classification, localization, and evaluation metrics like Intersection over Union (IoU) and Mean Average Precision (mAP). It details the evolution of object detection techniques from 2001 to the present and discusses various datasets used in the field, highlighting the increase in complexity and object instances over time. The document also emphasizes the importance of metrics for evaluating detection algorithms and mentions emerging metrics for further refining evaluation processes.
Classification + Localization
● Classification:
○ Input: Image
○ Output: Class label
○ Loss: Cross entropy (softmax log loss)
○ Evaluation metric: Accuracy
● Localization:
○ Input: Image
○ Output: Box in the image (x, y, w, h)
○ Loss: L2 Loss (Euclidean distance)
○ Evaluation metric: Intersection over Union
● Classification + Localization:
○ Input: Image
○ Output: Class label + box in the image
○ Loss: Sum of both losses
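A minimal sketch of this combined loss, assuming a PyTorch-style model with two heads; the function name and the box_weight knob are illustrative, not from the slides:

```python
import torch.nn.functional as F

def combined_loss(class_logits, box_pred, class_target, box_target,
                  box_weight=1.0):
    # Classification: cross entropy (softmax log loss) over C classes.
    cls_loss = F.cross_entropy(class_logits, class_target)
    # Localization: L2-style regression loss on (x, y, w, h).
    loc_loss = F.mse_loss(box_pred, box_target)
    # Total loss: sum of both terms (box_weight is an assumed
    # balancing hyperparameter, not specified in the slides).
    return cls_loss + box_weight * loc_loss
```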
Classification + Localization: ImageNet Challenge
● Dataset
○ 1000 Classes.
○ Each image has 1 class with at least one bounding box.
○ ~800 Training images per class.
● Evaluation
○ Algorithm produces 5 (class + bounding box) guesses.
○ An example is correct if at least one guess has the correct class AND a bounding box with at least 50% intersection over union.
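A sketch of this correctness check for one image; the names are illustrative, and it assumes an iou() helper like the one sketched under the IoU slide below:

```python
def example_is_correct(guesses, gt_class, gt_boxes, iou_threshold=0.5):
    """guesses: the algorithm's 5 (class_label, box) predictions.
    Correct if at least one guess has the right class AND overlaps
    some ground-truth box with IoU >= 0.5."""
    for cls, box in guesses:
        if cls == gt_class and any(iou(box, gt) >= iou_threshold
                                   for gt in gt_boxes):
            return True
    return False
```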
Intersection over Union (IoU)
● Important measurement for object localization.
● Used in both training and evaluation.
IoU(A, B) = Intersection(A, B) / Union(A, B)
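A minimal IoU implementation for axis-aligned (x, y, w, h) boxes, as a sketch:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h), (x, y) = top-left corner."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Intersection rectangle; width/height clamp to 0 when disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```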
Classification + Localization: Model
Classification Head:
● C scores for C classes
Localization Head:
● Class agnostic: (x, y, w, h)
● Class specific: (x, y, w, h) × C
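A minimal sketch of both head variants on top of a shared backbone feature vector, assuming PyTorch; the class name and layer sizes are illustrative:

```python
import torch.nn as nn

class ClassifyAndLocalize(nn.Module):
    def __init__(self, feat_dim, num_classes, class_specific=False):
        super().__init__()
        # Classification head: C scores for C classes.
        self.class_head = nn.Linear(feat_dim, num_classes)
        # Localization head: one (x, y, w, h) box, or one box per class.
        box_dim = 4 * num_classes if class_specific else 4
        self.box_head = nn.Linear(feat_dim, box_dim)

    def forward(self, features):
        return self.class_head(features), self.box_head(features)
```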
Object Detection 2001-2014
● Rapid Object Detection using a Boosted Cascade of Simple Features (2001)
○ Viola & Jones
● Histograms of Oriented Gradients for Human Detection (2005)
○ Dalal & Triggs
● Object Detection with Discriminatively Trained Part Based Models (2010)
○ Felzenszwalb, Girshick, Ramanan
● Fast Feature Pyramids for Object Detection (2014)
○ Dollár
Object Detection: Datasets
2007: PASCAL VOC
● 20 Classes
● 11K Training images
● 27K Training objects
Was the de facto standard; currently used as a quick benchmark to evaluate new detection algorithms.

2013: ImageNet ILSVRC
● 200 Classes
● 476K Training images
● 534K Training objects
Essentially a scaled-up version of PASCAL VOC, with similar object statistics.

2015: MS COCO
● 80 Classes
● 200K Training images
● 1.5M Training objects
More categories and more object instances in every image: only 10% of COCO images contain a single object category, versus 60% in PASCAL VOC. More small objects than large ones.
Object Detection
● Input: Image
● Output: For each object class c and each image i, the algorithm returns predicted detections: bounding-box locations with confidence scores.
Object Detection: Evaluation
● True positive: correct class prediction AND IoU > 50%.
● False positive: wrong class or IoU < 50%.
● False negative: missed (not detected) object.
● Only one detection can be matched to each object, so duplicate detections count as false positives.
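A sketch of this matching rule for one image and one class, reusing the iou() helper sketched above. Greedy matching in decreasing confidence order is a common convention here, not necessarily the exact challenge protocol:

```python
def match_detections(dets, gt_boxes, iou_threshold=0.5):
    """dets: list of (score, box); gt_boxes: ground-truth boxes.
    Returns one True (TP) / False (FP) flag per detection; each
    ground-truth box can absorb at most one detection."""
    flags, matched = [], set()
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_threshold:
            flags.append(True)
            matched.add(best_j)
        else:
            flags.append(False)  # wrong location or duplicate detection
    return flags
```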
Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision (AP) per class, which in turn is based on Precision and Recall.
Mean Average Precision (mAP)
● The winner of each object class is the team with the highest Average Precision for that class.
● The winner of the challenge is the team with the highest mean Average Precision (mAP) across all classes.
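A sketch of the per-class AP computation, using the TP flags produced by the matching step above; this uses all-point interpolation, and the exact interpolation variant differs between challenges:

```python
import numpy as np

def average_precision(scores, tp_flags, num_gt):
    """scores: detection confidences; tp_flags: True/False per detection;
    num_gt: number of ground-truth objects of this class."""
    order = np.argsort(-np.asarray(scores))       # sort by confidence
    tp = np.asarray(tp_flags, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # Make precision non-increasing from right to left, then
    # integrate precision over recall (area under the PR curve).
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    prev_recall, ap = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```

mAP is then simply the mean of the per-class AP values.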
Object Detection: Evaluation
● Today, new metrics are emerging:
○ Averaging precision over all IoU thresholds 0.5:0.05:0.95 (the MS COCO main metric).
○ Averaging precision separately for different object sizes (small, medium, large).
○ Average Recall as a metric to measure object-proposal quality.
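A minimal sketch of the threshold-averaged metric, assuming a hypothetical map_at_iou(t) callable that evaluates mAP at a single IoU threshold t:

```python
import numpy as np

def averaged_map(map_at_iou):
    """COCO-style main metric: mean of mAP over IoU thresholds
    0.5:0.05:0.95 (ten thresholds)."""
    thresholds = np.linspace(0.5, 0.95, 10)  # 0.5, 0.55, ..., 0.95
    return float(np.mean([map_at_iou(t) for t in thresholds]))
```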