Anchor Boxes and Non-Maximum Suppression

The Anchor Box Problem

Here is the challenge in bounding box detection: objects come in wildly different sizes, shapes, and positions. A naive approach might slide a fixed-size window across every position in the image. But what size window? A window sized for a distant pedestrian will miss a nearby truck, and vice versa.

The solution is anchor boxes: a predefined set of boxes with different sizes and aspect ratios, tiled at every position of a grid covering the image. The model does not predict bounding boxes from scratch. Instead, it predicts offsets (adjustments to x, y, width, and height) relative to each anchor box. This is a more tractable prediction problem, because each prediction is a small correction to a reasonable starting box rather than a set of unconstrained coordinates.
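To make the idea concrete, here is a minimal sketch of tiling anchors over an image. The function name `generate_anchors`, the 640×480 image size, the stride of 32 pixels, and the particular scales and aspect ratios are all illustrative assumptions, not values taken from any specific detector.

```python
from itertools import product
from math import sqrt


def generate_anchors(image_w, image_h, stride, scales, aspect_ratios):
    """Return anchors as (cx, cy, w, h) tuples, one set per grid position.

    All parameters are hypothetical choices for illustration only.
    """
    anchors = []
    # Anchor centers sit in the middle of each stride x stride cell.
    for cy in range(stride // 2, image_h, stride):
        for cx in range(stride // 2, image_w, stride):
            for scale, ratio in product(scales, aspect_ratios):
                # ratio is width / height; the area stays at scale ** 2.
                w = scale * sqrt(ratio)
                h = scale / sqrt(ratio)
                anchors.append((float(cx), float(cy), w, h))
    return anchors


anchors = generate_anchors(
    640, 480, stride=32, scales=(64, 128, 256), aspect_ratios=(0.5, 1.0, 2.0)
)
print(len(anchors))  # 20 x 15 grid positions x 9 shapes = 2700
```

Even this small configuration produces 2,700 anchors, which shows how quickly the number of candidate boxes grows.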

A dog, used to demonstrate anchor box detection

Anchor boxes: multiple candidate boxes of varied sizes converge onto the detected subject. The model predicts offsets from these anchors rather than absolute coordinates.
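The sketch below shows how predicted offsets might be turned back into a box. It assumes one common parameterization, in which the center shifts are scaled by the anchor size and the width and height are adjusted through an exponential. The source does not say which encoding the model uses, so treat `decode_box` and its formula as a hypothetical example.

```python
from math import exp


def decode_box(anchor, offsets):
    """Apply predicted offsets to an anchor and return corner coordinates.

    anchor:  (cx, cy, w, h) of the anchor box
    offsets: (dx, dy, dw, dh) predicted by the model (assumed encoding)
    """
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = offsets
    # Center shifts are expressed as fractions of the anchor size.
    cx = ax + dx * aw
    cy = ay + dy * ah
    # Size changes are expressed in log space, so exp() keeps them positive.
    w = aw * exp(dw)
    h = ah * exp(dh)
    # Convert to (x1, y1, x2, y2) corners.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


# Zero offsets reproduce the anchor itself.
print(decode_box((100.0, 100.0, 50.0, 80.0), (0.0, 0.0, 0.0, 0.0)))
# (75.0, 60.0, 125.0, 140.0)
```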

A typical detection network might use thousands of anchor boxes per image. That raises a new problem: how do you deal with all those predictions?

Just 1% of the Bounding Boxes in RetinaNet [Source]

Non-Maximum Suppression: Keeping the Best Boxes

When a model generates thousands of anchor box predictions, two things happen: many boxes will have very low confidence (nothing detected), and many remaining boxes will overlap heavily (multiple anchors detecting the same object from slightly different positions).

Non-Maximum Suppression (NMS) handles both:

  1. Remove all boxes with confidence below a threshold (e.g., 0.80).
  2. Among the remaining boxes, take the one with the highest confidence and calculate Intersection over Union (IoU) between it and every other box.
  3. Any box whose IoU with it is above a threshold (e.g., 0.5) is treated as detecting the same object and is discarded. Keep the highest-confidence box, then repeat from step 2 with the boxes that are left.
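The steps above can be sketched as a greedy procedure that processes boxes in descending order of confidence. This is a minimal pure-Python illustration, assuming boxes are given as (x1, y1, x2, y2) corners; the function names, the default thresholds, and the example boxes are assumptions made for this sketch. Production detectors typically use an optimized library routine and often run NMS separately for each class.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    intersection = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - intersection
    return intersection / union if union > 0 else 0.0


def nms(boxes, scores, conf_threshold=0.8, iou_threshold=0.5):
    """Return indices of the boxes that survive NMS, best first."""
    # Step 1: drop low-confidence boxes, then sort by confidence.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    while candidates:
        # Steps 2-3: keep the best remaining box and discard every box
        # that overlaps it by more than the IoU threshold.
        best = candidates.pop(0)
        keep.append(best)
        candidates = [
            i for i in candidates if iou(boxes[best], boxes[i]) <= iou_threshold
        ]
    return keep


# Hypothetical predictions: two boxes on one object, one on another,
# and one low-confidence box.
boxes = [(100, 100, 200, 300), (110, 110, 210, 310), (400, 120, 480, 280), (105, 105, 205, 305)]
scores = [0.95, 0.90, 0.85, 0.40]
print(nms(boxes, scores))  # [0, 2]
```

In this example, box 3 is removed by the confidence threshold, box 1 is suppressed because its IoU with box 0 is about 0.75, and box 2 survives because it overlaps nothing.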

Intersection over Union is the area of overlap between two boxes divided by the area of their union: IoU = area(A ∩ B) / area(A ∪ B). An IoU of 1.0 means perfect overlap, an IoU near 0 means the boxes barely share any space, and an IoU of exactly 0 means they do not overlap at all.
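As a worked example of this definition, the sketch below computes each quantity separately. It assumes boxes in (x1, y1, x2, y2) corner format. The function name `iou_breakdown` and the coordinates are hypothetical, chosen so that each box covers 25,200 px² and the two share a 3,600 px² intersection.

```python
def iou_breakdown(box_a, box_b):
    """Return the areas, intersection, union, and IoU of two boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # The overlap rectangle has zero width or height if the boxes are disjoint.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # The intersection is counted in both areas, so subtract it once.
    union = area_a + area_b - intersection
    return {
        "area_a": area_a,
        "area_b": area_b,
        "intersection": intersection,
        "union": union,
        "iou": intersection / union if union > 0 else 0.0,
    }


result = iou_breakdown((0, 0, 180, 140), (120, 80, 300, 220))
print(result["intersection"], result["union"], round(result["iou"], 3))
# 3600 46800 0.077
```

An IoU of about 0.077 is a weak overlap, well below the thresholds typically used to decide that two boxes cover the same object.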

Intersection over Union (IoU) [Source]

IoU Calculator (interactive; example state): Area A = 25,200 px², Area B = 25,200 px², Intersection = 3,600 px², Union = 46,800 px², IoU = Intersection / Union = 0.077 (weak overlap).

Drag two bounding boxes on the canvas. The panel shows the intersection area, union area, and IoU score in real time. Experiment with overlapping and non-overlapping configurations.

Checkpoint

A detection model produces three overlapping boxes for what appears to be a single pedestrian. Their confidence scores are 0.92, 0.87, and 0.79. Their IoU values are all above 0.6. After applying NMS with a confidence threshold of 0.80 and an IoU threshold of 0.5, what is the result?