The document outlines the fundamentals of object detection in computer vision, including classification, localization, and evaluation metrics like Intersection over Union (IoU) and Mean Average Precision (mAP). It details the evolution of object detection techniques from 2001 to the present and discusses various datasets used in the field, highlighting the increase in complexity and object instances over time. The document also emphasizes the importance of metrics for evaluating detection algorithms and mentions emerging metrics for further refining evaluation processes.
Classification + Localization
● Classification:
○ Input: Image
○ Output: Class label
○ Loss: Cross entropy (softmax log loss)
○ Evaluation metric: Accuracy
● Localization:
○ Input: Image
○ Output: Box in the image (x, y, w, h)
○ Loss: L2 Loss (Euclidean distance)
○ Evaluation metric: Intersection over Union
● Classification + Localization:
○ Input: Image
○ Output: Class label + box in the image
○ Loss: Sum of both losses
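A minimal sketch of this combined loss, assuming a PyTorch-style model with two heads; the function name and the box_weight knob are illustrative, not from the slides:

```python
import torch.nn.functional as F

def combined_loss(class_logits, box_pred, class_target, box_target,
                  box_weight=1.0):
    # Classification: cross entropy (softmax log loss) over C classes.
    cls_loss = F.cross_entropy(class_logits, class_target)
    # Localization: L2-style regression loss on (x, y, w, h).
    loc_loss = F.mse_loss(box_pred, box_target)
    # Total loss: sum of both terms (box_weight is an assumed
    # balancing hyperparameter, not specified in the slides).
    return cls_loss + box_weight * loc_loss
```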
Classification + Localization: ImageNet Challenge
● Dataset
○ 1000 Classes.
○ Each image has 1 class with at least one bounding box.
○ ~800 Training images per class.
● Evaluation
○ Algorithm produces 5 (class + bounding box) guesses.
○ An example is correct if at least one guess has the correct class AND a bounding box with at least 50% intersection over union.
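A sketch of this correctness check for one image; the names are illustrative, and it assumes an iou() helper like the one sketched under the IoU slide below:

```python
def example_is_correct(guesses, gt_class, gt_boxes, iou_threshold=0.5):
    """guesses: the algorithm's 5 (class_label, box) predictions.
    Correct if at least one guess has the right class AND overlaps
    some ground-truth box with IoU >= 0.5."""
    for cls, box in guesses:
        if cls == gt_class and any(iou(box, gt) >= iou_threshold
                                   for gt in gt_boxes):
            return True
    return False
```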
Intersection over Union (IoU)
● Important measurement for object localization.
● Used in both training and evaluation.
IoU(A, B) = Intersection(A, B) / Union(A, B)
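A minimal IoU implementation for axis-aligned (x, y, w, h) boxes, as a sketch:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x, y, w, h), (x, y) = top-left corner."""
    ax1, ay1 = box_a[0], box_a[1]
    ax2, ay2 = box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1 = box_b[0], box_b[1]
    bx2, by2 = box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Intersection rectangle; width/height clamp to 0 when disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```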
Classification + Localization: Model
Classification Head:
● C scores for C classes
Localization Head:
● Class agnostic: (x, y, w, h)
● Class specific: (x, y, w, h) × C
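A minimal sketch of both head variants on top of a shared backbone feature vector, assuming PyTorch; the class name and layer sizes are illustrative:

```python
import torch.nn as nn

class ClassifyAndLocalize(nn.Module):
    def __init__(self, feat_dim, num_classes, class_specific=False):
        super().__init__()
        # Classification head: C scores for C classes.
        self.class_head = nn.Linear(feat_dim, num_classes)
        # Localization head: one (x, y, w, h) box, or one box per class.
        box_dim = 4 * num_classes if class_specific else 4
        self.box_head = nn.Linear(feat_dim, box_dim)

    def forward(self, features):
        return self.class_head(features), self.box_head(features)
```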
Object Detection 2001-2014
● Rapid Object Detection using a Boosted Cascade of Simple Features (2001)
○ Viola & Jones
● Histograms of Oriented Gradients for Human Detection (2005)
○ Dalal & Triggs
● Object Detection with Discriminatively Trained Part Based Models (2010)
○ Felzenszwalb, Girshick, Ramanan
● Fast Feature Pyramids for Object Detection (2014)
○ Dollár
Object Detection: Datasets
2007: PASCAL VOC
● 20 Classes
● 11K Training images
● 27K Training objects
Was the de facto standard; currently used as a quick benchmark to evaluate new detection algorithms.

2013: ImageNet ILSVRC
● 200 Classes
● 476K Training images
● 534K Training objects
Essentially a scaled-up version of PASCAL VOC, with similar object statistics.

2015: MS COCO
● 80 Classes
● 200K Training images
● 1.5M Training objects
More categories and more object instances in every image: only 10% of COCO images contain a single object category, versus 60% in PASCAL VOC. More small objects than large ones.
Object Detection
● Input: Image
● Output: For each object class c and each image i, the algorithm returns predicted detections: bounding-box locations with confidence scores.
Object Detection: Evaluation
● True positive: correct class prediction AND IoU > 50%.
● False positive: wrong class or IoU < 50%.
● False negative: missed (not detected) object.
● Only one detection can be matched to each object, so duplicate detections count as false positives.
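A sketch of this matching rule for one image and one class, reusing the iou() helper sketched above. Greedy matching in decreasing confidence order is a common convention here, not necessarily the exact challenge protocol:

```python
def match_detections(dets, gt_boxes, iou_threshold=0.5):
    """dets: list of (score, box); gt_boxes: ground-truth boxes.
    Returns one True (TP) / False (FP) flag per detection; each
    ground-truth box can absorb at most one detection."""
    flags, matched = [], set()
    for score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if j not in matched and overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_iou >= iou_threshold:
            flags.append(True)
            matched.add(best_j)
        else:
            flags.append(False)  # wrong location or duplicate detection
    return flags
```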
Object Detection: Evaluation
● Mean Average Precision (mAP) across all classes, based on Average Precision (AP) per class, which in turn is based on Precision and Recall.
Mean Average Precision (mAP)
● The winner of each object class is the team with the highest Average Precision for that class.
● The winner of the challenge is the team with the highest mean Average Precision (mAP) across all classes.
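A sketch of the per-class AP computation, using the TP flags produced by the matching step above; this uses all-point interpolation, and the exact interpolation variant differs between challenges:

```python
import numpy as np

def average_precision(scores, tp_flags, num_gt):
    """scores: detection confidences; tp_flags: True/False per detection;
    num_gt: number of ground-truth objects of this class."""
    order = np.argsort(-np.asarray(scores))       # sort by confidence
    tp = np.asarray(tp_flags, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    recall = cum_tp / num_gt
    precision = cum_tp / np.arange(1, len(tp) + 1)
    # Make precision non-increasing from right to left, then
    # integrate precision over recall (area under the PR curve).
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    prev_recall, ap = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap
```

mAP is then simply the mean of the per-class AP values.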
Object Detection: Evaluation
● Today, new metrics are emerging:
○ Averaging precision over all IoU thresholds 0.5:0.05:0.95 (the MS COCO main metric).
○ Averaging precision separately for different object sizes (small, medium, large).
○ Average Recall as a metric to measure object-proposal quality.
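A minimal sketch of the threshold-averaged metric, assuming a hypothetical map_at_iou(t) callable that evaluates mAP at a single IoU threshold t:

```python
import numpy as np

def averaged_map(map_at_iou):
    """COCO-style main metric: mean of mAP over IoU thresholds
    0.5:0.05:0.95 (ten thresholds)."""
    thresholds = np.linspace(0.5, 0.95, 10)  # 0.5, 0.55, ..., 0.95
    return float(np.mean([map_at_iou(t) for t in thresholds]))
```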