What We Can Do With Computer Vision

Before we get into the mechanics of how computer vision works, it helps to understand where it is currently being used. Computer vision tasks fall into a few broad families:

Image Recognition and Classification

The question is simple: "What is in this image?" You feed in a photo of a golden retriever and the model outputs "dog." You feed in a photo of a skin lesion and the model outputs a diagnosis probability.
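To make the "label plus probability" idea concrete, here is a minimal sketch of the last step of a classifier: turning raw scores into probabilities and picking the top class. The labels and scores are invented for illustration, and a real model would produce the scores from the image itself.

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical class names and raw scores, made up for this example.
labels = ["cat", "dog", "bird"]
logits = [1.0, 3.0, 0.5]

label, confidence = classify(logits, labels)
print(label, round(confidence, 2))
```

The highest raw score wins, and the softmax step is what lets us report the result as a probability rather than an arbitrary number.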

◆

Real World: Google Derm Assist

Users take a close-up photo of a skin concern on their phone. The model classifies what condition it might be, helping people decide whether to seek medical attention.

◆

Real World: Domino's Dom Pizza Checker

Every pizza that comes off the line gets photographed and checked by a computer vision model for correct toppings and even distribution. A wrong prediction here can mean a customer gets a bad pizza, which shows how classification quality directly affects customer experience.

Domino's Dom Pizza Checker in action

Object Detection

Object detection goes one step beyond classification. Here we ask: "Where is the car, are there multiple cars, and how confident are we?" Object detection outputs bounding boxes (rectangles drawn around each identified object) along with class labels and confidence scores.
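As a rough sketch of what a detector's output might look like in code, the example below stores each detection as a label, a confidence score, and a box, then keeps only the confident ones. The `Detection` class, the box layout `(x_min, y_min, x_max, y_max)`, and the 0.5 threshold are assumptions chosen for illustration, not the format of any particular library.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float
    box: tuple  # assumed layout: (x_min, y_min, x_max, y_max) in pixels

def keep_confident(detections, threshold=0.5):
    """Drop detections whose confidence is below the threshold."""
    return [d for d in detections if d.confidence >= threshold]

# Hypothetical detector output for one street photo.
detections = [
    Detection("car", 0.92, (34, 120, 210, 260)),
    Detection("car", 0.81, (250, 130, 400, 255)),
    Detection("car", 0.30, (10, 10, 40, 30)),
    Detection("person", 0.67, (420, 90, 470, 240)),
]

confident = keep_confident(detections)
cars = [d for d in confident if d.label == "car"]
print(f"{len(cars)} cars found")
for car in cars:
    print(car.box, car.confidence)
```

One image yields a list of results rather than a single label, which is how detection answers "where" and "how many" at the same time.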

◆

Real World: DTN Smart Trap (Agriculture)

A sensor-equipped insect trap uses computer vision to identify which species of insects are present. Farmers receive alerts about harmful pest populations before they cause crop damage.

DTN Smart Trap

◆

Real World: Teeth Numbering in Dentistry

In dentistry, object detection models can locate each tooth in a dental image and label it with its tooth number.

Teeth Numbering in Dentistry [Source]

◆

Real World: LEGO Brick Identifier

Love LEGO? Build your own custom object detection model to identify individual LEGO brick types from a live camera feed, distinguishing hundreds of part shapes and sizes in real time!

Real-time LEGO brick detection with object detection

Image Segmentation

Where object detection draws a box around an object, segmentation goes pixel-by-pixel. Semantic segmentation assigns a class label to every single pixel in the image. Instance segmentation goes further, distinguishing between different instances of the same class — so Person A and Person B get different masks, not just the same "person" label.
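The difference between the two kinds of masks is easier to see on a tiny grid. In this sketch, a 3x6 "image" contains two people. The class ids and instance ids are hypothetical, and real masks would have the same height and width as the input image.

```python
# Semantic mask: every pixel gets a class id.
# Assumed ids: 0 = background, 1 = person.
semantic_mask = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

# Instance mask: every pixel gets an object id.
# Assumed ids: 0 = background, 1 = Person A, 2 = Person B.
instance_mask = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]

def pixel_counts(mask):
    """Count how many pixels carry each id."""
    counts = {}
    for row in mask:
        for value in row:
            counts[value] = counts.get(value, 0) + 1
    return counts

semantic_counts = pixel_counts(semantic_mask)
instance_counts = pixel_counts(instance_mask)
print(semantic_counts)  # both people share the "person" id
print(instance_counts)  # each person has a separate id
```

In the semantic mask both people collapse into one "person" region, while the instance mask keeps Person A and Person B apart.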

◆

Real World: Medical Imaging

Radiologists use segmentation models to identify and outline tumors, organs, or regions of interest in CT scans and MRIs. A segmentation model can highlight exactly which pixels correspond to a lung nodule, making the radiologist's job faster and more consistent.

Examples of Image Segmentation in radiology [Source]

Keypoint Detection

Instead of classifying regions or segmenting pixels, keypoint detection identifies specific structural points on an object — the corners of a mouth, the joints of a human skeleton, the tip of a finger. The output is a set of (x, y) coordinates with associated confidence values.
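A keypoint model's output can be pictured as a short list of named points. The sketch below assumes a hypothetical `Keypoint` record and made-up coordinates, and shows one common follow-up step: ignoring points the model is unsure about.

```python
from typing import NamedTuple

class Keypoint(NamedTuple):
    name: str
    x: float           # pixel column
    y: float           # pixel row
    confidence: float  # 0.0 to 1.0

def visible(keypoints, threshold=0.5):
    """Keep only keypoints at or above the confidence threshold."""
    return {k.name: (k.x, k.y) for k in keypoints if k.confidence >= threshold}

# Hypothetical output for one arm; the wrist is hidden, so confidence is low.
pose = [
    Keypoint("left_shoulder", 212.0, 140.5, 0.97),
    Keypoint("left_elbow", 230.0, 210.0, 0.91),
    Keypoint("left_wrist", 260.5, 265.0, 0.22),
]

reliable = visible(pose)
print(reliable)
```

The 0.5 threshold is an arbitrary choice for the example. In practice it would be tuned to the application.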

◆

Real World: Fitness Applications

Pose estimation models track the positions of joints during a yoga session or a weightlifting set.

Pose Estimation for fitness
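One way a fitness app might use tracked joints is to compute the angle at a joint, for example the elbow during a curl. This is a sketch under that assumption, with invented coordinates. It is not a description of how any specific product works.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c."""
    angle = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    angle = abs(angle)
    # Fold reflex angles back into the 0 to 180 degree range.
    return 360 - angle if angle > 180 else angle

# Hypothetical (x, y) pixel positions from a pose estimation model.
shoulder, elbow, wrist = (100, 100), (100, 200), (200, 200)

elbow_angle = joint_angle(shoulder, elbow, wrist)
straight_arm = joint_angle((0, 0), (1, 0), (2, 0))
print(round(elbow_angle), round(straight_arm))
```

Comparing such angles across video frames is one plausible way to count repetitions or flag poor form.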

Object Tracking and Scene Reconstruction

Object tracking follows the same object across frames in a video. Scene reconstruction goes further, building a 3D model of an environment from 2D images — the foundation of VR, AR, and architectural visualization.
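To illustrate what "following the same object across frames" can mean in practice, here is a minimal sketch that links boxes between two frames by how much they overlap (intersection over union, or IoU). Greedy IoU matching is only one simple approach, chosen here for brevity. The function names, the 0.3 threshold, and the box coordinates are all assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, boxes, next_id, min_iou=0.3):
    """Greedily match each existing track to its best-overlapping new box.

    Unmatched boxes start new tracks; unmatched tracks are dropped.
    Returns the updated tracks and the next unused track id.
    """
    updated, unused = {}, list(boxes)
    for track_id, last_box in tracks.items():
        if not unused:
            break
        best = max(unused, key=lambda box: iou(last_box, box))
        if iou(last_box, best) >= min_iou:
            updated[track_id] = best
            unused.remove(best)
    for box in unused:
        updated[next_id] = box
        next_id += 1
    return updated, next_id

# Hypothetical detections: two objects in frame 1, three boxes in frame 2.
frame1_tracks = {1: (10, 10, 50, 50), 2: (100, 100, 150, 150)}
frame2_boxes = [(104, 102, 154, 152), (12, 11, 52, 51), (300, 300, 340, 340)]

frame2_tracks, next_id = associate(frame1_tracks, frame2_boxes, next_id=3)
print(frame2_tracks)
```

Objects 1 and 2 keep their ids because their boxes barely moved, while the box with no match in the previous frame is given a fresh id.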

CV Task Explorer: Image Classification

Input: A single image

Output: A class label + confidence score

Real-world applications:

Medical imaging: Google Derm Assist classifies skin conditions from photos
Quality control: Domino's checks every pizza for correct toppings at scale
Content moderation: Platforms classify uploaded images against policy rules

Checkpoint

A hospital wants to build a model that, given an MRI scan, draws a precise outline around each tumor it finds — distinguishing between two separate tumors of the same type. Which task type is this?
