Evaluating Object Detection
Evaluating Bounding Box Models: Mean Average Precision
Standard accuracy does not apply to object detection — predictions are boxes, not single class labels. The field has converged on mean Average Precision (mAP) as the primary metric.
Here is how it is constructed:
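- For each predicted box, calculate its IoU (intersection over union) with the ground-truth boxes of the same class in that image and keep the best-overlapping one.
- Apply an IoU threshold (e.g., 0.5) to classify each prediction as a true positive or false positive. Each ground-truth box can be matched by at most one prediction, so duplicate detections of an already-matched object count as false positives, and ground-truth boxes that are never matched count as false negatives.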
- Compute a precision-recall curve for each class by varying the confidence threshold.
- Calculate the area under each precision-recall curve — this is the Average Precision (AP) for that class.
- Average AP across all classes to get mAP.
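The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the `(x1, y1, x2, y2)` box format, the input layout, and the toy data are all assumptions made for this example. It uses a PASCAL VOC-style greedy matching and all-point interpolation of the precision-recall curve; the COCO tooling differs in details such as 101-point interpolation and how it picks among candidate ground-truth boxes.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truths, iou_threshold=0.5):
    """AP for a single class.

    detections:    list of (image_id, confidence, box)   (assumed layout)
    ground_truths: dict image_id -> list of boxes        (assumed layout)
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}

    # Walk detections from most to least confident and label each TP or FP.
    tp_flags = []
    for image_id, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(image_id, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        is_tp = (best_j >= 0 and best_iou >= iou_threshold
                 and not matched[image_id][best_j])
        if is_tp:
            matched[image_id][best_j] = True  # a ground truth is matched only once
        tp_flags.append(is_tp)

    # Precision and recall after each detection trace the PR curve.
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / n_gt)

    # Interpolate: precision at recall r becomes the max precision at any recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Area under the stepwise curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class, iou_threshold=0.5):
    """per_class: dict class_name -> (detections, ground_truths)."""
    aps = [average_precision(d, g, iou_threshold) for d, g in per_class.values()]
    return sum(aps) / len(aps)


# Toy data (invented for illustration): two cats, one dog.
cat_gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
cat_det = [
    ("img1", 0.9, (0, 0, 10, 10)),    # exact match: TP
    ("img1", 0.8, (0, 0, 10, 9)),     # duplicate of the same cat: FP
    ("img2", 0.7, (20, 20, 30, 30)),  # no overlap, and the img2 cat is missed: FP
]
dog_gt = {"img1": [(0, 0, 4, 4)]}
dog_det = [("img1", 0.6, (0, 0, 4, 4))]

cat_ap = average_precision(cat_det, cat_gt)
overall = mean_average_precision({"cat": (cat_det, cat_gt), "dog": (dog_det, dog_gt)})
print(cat_ap, overall)  # 0.5 0.75
```

In the toy data, only one of the two cats is ever found, so recall for that class tops out at 0.5 and its AP is 0.5; the dog class is detected perfectly, giving an mAP of 0.75.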
The COCO evaluation protocol reports mAP averaged over IoU thresholds from 0.5 to 0.95 in steps of 0.05. This is more demanding than a single 0.5 IoU threshold and rewards precise localization, not just rough detection.
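To see why averaging over thresholds rewards tight boxes, consider the small sketch below. The helper names and the toy scenario (a single object whose only detection overlaps it with IoU 0.72) are assumptions chosen for illustration; in practice `ap_at_threshold` would be whatever routine computes mAP at a given IoU threshold.

```python
def coco_iou_thresholds():
    # 0.50, 0.55, ..., 0.95: ten thresholds. Rounding avoids float drift.
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def mean_over_thresholds(ap_at_threshold):
    """Average a metric over the ten thresholds.

    ap_at_threshold: callable mapping an IoU threshold to an AP or mAP value.
    """
    thresholds = coco_iou_thresholds()
    return sum(ap_at_threshold(t) for t in thresholds) / len(thresholds)


def toy_ap(threshold, detection_iou=0.72):
    # Hypothetical one-object, one-detection case: AP is 1.0 while the
    # detection still counts as a true positive, and 0.0 once it does not.
    return 1.0 if detection_iou >= threshold else 0.0


print(toy_ap(0.5))                   # 1.0
print(mean_over_thresholds(toy_ap))  # 0.5
```

The same detection scores a perfect 1.0 at the 0.5 threshold but only 0.5 on the averaged metric, because it passes the five thresholds from 0.50 to 0.70 and fails the five from 0.75 to 0.95.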
Additional metrics to report alongside mAP: performance broken down by object size (small, medium, large), and false positives per image (FPPI).
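Both breakdowns are simple to compute once detections have been labeled. The sketch below is illustrative only: the function names are hypothetical, the size cutoffs follow the COCO convention of 32² and 96² pixels, and box area stands in for the annotation area for simplicity. FPPI is always tied to a chosen confidence threshold, which is assumed to have been applied before counting.

```python
def size_bucket(box):
    """Assign an (x1, y1, x2, y2) box to a size bucket by pixel area."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def false_positives_per_image(num_false_positives, num_images):
    """FPPI at one operating point (a fixed confidence threshold)."""
    return num_false_positives / num_images


print(size_bucket((0, 0, 20, 20)))         # small
print(size_bucket((0, 0, 50, 50)))         # medium
print(size_bucket((0, 0, 100, 100)))       # large
print(false_positives_per_image(30, 120))  # 0.25
```

Computing AP separately on the ground-truth boxes in each bucket shows whether a model's errors are concentrated on small objects, which a single aggregate mAP can hide.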
Every detection gets an IoU score with the nearest ground-truth box. A threshold decides whether that score is good enough to count as a correct detection (TP) or not (FP).
A detection model achieves mAP@0.5 of 0.78 but mAP@0.5:0.95 of 0.41. What does this discrepancy tell you?