Introduction to Object Detection
Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf
Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf
Classification + Localization
● Classification:
○ Input: Image
○ Output: Class label
○ Loss: Cross-entropy (softmax log loss)
○ Evaluation metric: Accuracy
● Localization:
○ Input: Image
○ Output: Box in the image (x, y, w, h)
○ Loss: L2 Loss (Euclidean distance)
○ Evaluation metric: Intersection over Union
● Classification + Localization:
○ Input: Image
○ Output: Class label + box in the image
○ Loss: Sum of both losses (a combined-loss sketch follows below)
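The summed loss maps directly to code. A minimal sketch, assuming a PyTorch-style setup with C = 1000 classes (the framework choice, tensor shapes, and names are illustrative, not from the slides):

```python
import torch
import torch.nn as nn

# Hypothetical network outputs for a batch of 8 images
class_logits = torch.randn(8, 1000)          # C scores for C = 1000 classes
pred_boxes   = torch.randn(8, 4)             # predicted (x, y, w, h)
gt_labels    = torch.randint(0, 1000, (8,))  # ground-truth class labels
gt_boxes     = torch.randn(8, 4)             # ground-truth boxes

cls_loss = nn.CrossEntropyLoss()(class_logits, gt_labels)  # classification: cross entropy
loc_loss = nn.MSELoss()(pred_boxes, gt_boxes)              # localization: L2 loss
total_loss = cls_loss + loc_loss                           # sum of both losses
```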
Classification + Localization: ImageNet Challenge
● Dataset
○ 1000 Classes.
○ Each image has 1 class with at least one bounding box.
○ ~800 Training images per class.
● Evaluation
○ The algorithm produces 5 (class + bounding box) guesses.
○ An example is correct if at least one guess has the correct class AND its bounding box has at least 50% intersection over union with the ground truth (a correctness-check sketch follows below).
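A minimal sketch of this correctness check, assuming the 5 guesses are (class, box) pairs and an `iou(box_a, box_b)` helper like the one sketched under the IoU slide below (the helper name and argument layout are assumptions):

```python
def imagenet_example_correct(guesses, gt_label, gt_box, iou):
    """The example counts as correct if at least one of the 5 (class, box)
    guesses has the correct class AND at least 50% IoU with the ground-truth box."""
    return any(cls == gt_label and iou(box, gt_box) >= 0.5
               for cls, box in guesses)
```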
Intersection Over Union (IoU)
● Important measurement for object localization.
● Used in both training and evaluation.
IoU(A, B) = Intersection(A, B) / Union(A, B)
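A self-contained sketch of this formula for axis-aligned boxes in the (x, y, w, h) convention used above, with (x, y) taken as the top-left corner (that corner convention is an assumption):

```python
def iou(box_a, box_b):
    """IoU(A, B) = Intersection(A, B) / Union(A, B) for (x, y, w, h) boxes."""
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]

    # Intersection rectangle; zero if the boxes do not overlap
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# Example: two 10x10 boxes overlapping in a 5x5 region -> 25 / 175 ≈ 0.14
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))
```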
Classification + Localization: Model
Classification Head:
● C scores for C classes
Localization Head:
● Class agnostic: (x, y, w, h)
● Class specific: (x, y, w, h) × C
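A minimal PyTorch-style sketch of the two heads on top of a shared feature vector (the backbone, feature size, and layer shapes are assumptions; only the head outputs follow the slide):

```python
import torch
import torch.nn as nn

class ClassifyAndLocalize(nn.Module):
    def __init__(self, feat_dim=2048, num_classes=1000, class_specific=False):
        super().__init__()
        self.cls_head = nn.Linear(feat_dim, num_classes)      # C scores for C classes
        box_dim = 4 * num_classes if class_specific else 4    # class specific vs. agnostic
        self.loc_head = nn.Linear(feat_dim, box_dim)          # (x, y, w, h) [per class]

    def forward(self, features):                              # features: (N, feat_dim)
        return self.cls_head(features), self.loc_head(features)

scores, boxes = ClassifyAndLocalize()(torch.randn(2, 2048))   # shapes (2, 1000), (2, 4)
```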
Computer Vision Tasks
Source: CS231n Object detection http://cs231n.stanford.edu/slides/2016/winter1516_lecture8.pdf
Object Detection 2001-2007
● Rapid Object Detection using a Boosted Cascade of
Simple Features (2001)
○ Viola & Jones
● Histograms of Oriented Gradients for Human
Detection (2005)
○ Dalal & Triggs
● Object Detection with Discriminatively Trained Part
Based Models (2010)
○ Felzenszwalb, Girshick, Ramanan
● Fast Feature Pyramids for Object Detection (2014)
○ Dollár
Object Detection 2007-2012
Source: Ross Girshick’s CVPR 2017 Tutorial http://deeplearning.csail.mit.edu/instance_ross.pptx
Object Detection Today
Source: Ross Girshick’s CVPR 2017 Tutorial http://deeplearning.csail.mit.edu/instance_ross.pptx
Object Detection: Datasets
2007
Pascal VOC
● 20 Classes
● 11K Training images
● 27K Training objects
Was the de-facto standard; currently used as a quick benchmark to evaluate new detection algorithms.
2013
ImageNet ILSVRC
● 200 Classes
● 476K Training images
● 534K Training objects
Essentially a scaled-up version of PASCAL VOC, with similar object statistics.
2015
MS COCO
● 80 Classes
● 200K Training images
● 1.5M Training objects
More categories and more object instances in every image. Only 10% of images contain a single object category, versus 60% in Pascal. More small objects than large objects.
Pascal Examples
COCO Examples
Object Detection
● Input: Image
● Output: For each object class c and each image i, the algorithm returns predicted detections: box locations with confidence scores (see the output-format sketch below).
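One way to materialize this output contract as a data structure; the nesting and field layout are illustrative only, not a standard format:

```python
# For each image i and each class c: a list of (box, score) detections,
# boxes as (x, y, w, h) and scores as confidences in [0, 1].
detections = {
    "image_001": {
        "person": [((34, 50, 120, 200), 0.91), ((300, 40, 80, 150), 0.47)],
        "dog":    [((10, 220, 90, 60), 0.83)],
    },
}
```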
Object Detection: Evaluation
● True positive: correct class prediction AND IoU > 50%.
● False positive: wrong class or IoU < 50%.
● False negative: missed (not detected) object.
● Only one detection can be matched to each object (a matching sketch follows below).
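A sketch of how these rules are commonly applied per class within one image: sort detections by confidence, greedily match each to an unused ground-truth box, and count the rest as false positives. The greedy order and tie-breaking follow common practice and are not specified on the slide:

```python
def match_detections(dets, gt_boxes, iou, iou_thr=0.5):
    """dets: list of (box, score) for one class in one image.
    Returns (score, is_true_positive) per detection and the number of
    missed ground-truth objects (false negatives)."""
    dets = sorted(dets, key=lambda d: d[1], reverse=True)   # highest confidence first
    used = [False] * len(gt_boxes)
    results = []
    for box, score in dets:
        # Find the best still-unmatched ground-truth box by IoU
        best, best_iou = -1, 0.0
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if not used[j] and overlap > best_iou:
                best, best_iou = j, overlap
        if best >= 0 and best_iou >= iou_thr:
            used[best] = True
            results.append((score, True))     # true positive
        else:
            results.append((score, False))    # false positive: wrong, duplicate, or low IoU
    false_negatives = used.count(False)       # objects no detection was matched to
    return results, false_negatives
```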
Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision (AP) per class, which in turn is computed from precision and recall.
Precision and Recall for a Threshold
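For a fixed confidence threshold, precision and recall are computed from the TP/FP/FN counts defined above (standard definitions, restated here because the slide's figure is not reproduced):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}
```

Sweeping the confidence threshold traces out the precision-recall curve shown on the next slide.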
Precision-Recall Curve
Source: Drawing by Prof. William H. Press, the University of Texas at Austin
Average Precision (AP)
● [In the vision community] AP is the estimated area under the PR curve.
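Written out, with the finite-sum approximation used in practice (the exact interpolation scheme differs between benchmarks, e.g. the older 11-point VOC variant; the generic form is):

```latex
\mathrm{AP} = \int_0^1 p(r)\,dr \;\approx\; \sum_k p(r_k)\,(r_k - r_{k-1})
```

where p(r) is precision as a function of recall along the PR curve.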
Mean Average Precision (mAP)
● The winner of each object class is the team with the highest Average Precision (AP) for that class.
● The winner of the challenge is the team with the highest mean Average Precision (mAP) across all classes.
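In symbols, for the set of classes C:

```latex
\mathrm{mAP} = \frac{1}{|C|} \sum_{c \in C} \mathrm{AP}_c
```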
Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision (AP) per class, which in turn is computed from precision and recall.
Object Detection: Evaluation
● Today new metrics are emerging:
○ Averaging precision over all IoU thresholds in 0.5:0.05:0.95 (sketched below).
○ Averaging precision for different object sizes (small, medium, large).
○ Averaging recall as a metric to measure object proposal quality.
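A sketch of the first bullet: average AP over the IoU thresholds 0.5, 0.55, ..., 0.95 for one class. The `average_precision(dets, gts, iou_thr)` helper is hypothetical, standing in for the matching and AP steps above:

```python
import numpy as np

def ap_averaged_over_ious(dets, gts, average_precision):
    """COCO-style primary metric for one class: mean AP over IoU 0.5:0.05:0.95."""
    thresholds = np.arange(0.5, 1.0, 0.05)   # 0.50, 0.55, ..., 0.95
    return float(np.mean([average_precision(dets, gts, t) for t in thresholds]))
```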
Looking for brilliant researchers
cv@brodmann17.com
