What We Can Do With Computer Vision
Before we get into the mechanics of how computer vision works, it helps to understand where it is currently being used. Computer vision tasks fall into a few broad families:
Image Recognition and Classification
The question is simple: "What is in this image?" You feed in a photo of a golden retriever and the model outputs "dog." You feed in a photo of a skin lesion and the model outputs a diagnosis probability.
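To make "the model outputs dog" concrete, here is how that step usually works. Most classifiers produce one raw score (a logit) per class. A softmax turns those scores into probabilities, and the highest probability becomes the reported confidence score. This is a minimal plain-Python sketch that assumes a three-class model. The labels and logit values are invented for illustration and do not come from any real model.

```python
import math

def softmax(logits):
    """Convert raw per-class scores (logits) into probabilities that sum to 1."""
    # Subtract the largest logit first so math.exp never overflows.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most likely label together with its probability (the confidence)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], probs[best]

# Hypothetical logits a classifier might emit for one photo.
labels = ["cat", "dog", "horse"]
logits = [1.2, 3.5, 0.3]
label, confidence = classify(logits, labels)
print(f"{label} ({confidence:.2f})")  # prints: dog (0.88)
```

The same pattern covers the skin-lesion case: swap the labels for condition names and the top probability becomes the "diagnosis probability."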
Real World: Google Derm Assist
Users take a close-up photo of a skin concern on their phone. The model classifies what condition it might be, helping people decide whether to seek medical attention.

Real World: Domino's Dom Pizza Checker
Every pizza that comes off the line is photographed and checked by a computer vision model for the correct toppings and an even spread. If the model makes a wrong call here, a customer can end up with a bad pizza. This is a clear example of how classification quality directly affects customer experience.
Object Detection
Object detection goes one step beyond classification. Instead of asking only "what is in this image?", it asks "where is the car, are there multiple cars, and how confident are we about each one?" The output is a set of bounding boxes (rectangles drawn around each identified object), each with a class label and a confidence score.
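Detectors typically propose many overlapping candidate boxes for the same object. A common post-processing step drops low-confidence boxes and then applies non-maximum suppression (NMS), keeping only the highest-scoring box among heavily overlapping ones, so each object is reported once. The sketch below assumes boxes in (x_min, y_min, x_max, y_max) pixel coordinates and measures overlap with intersection-over-union (IoU). The `Detection` class, both 0.5 thresholds, and the sample boxes are hypothetical and not taken from any particular detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # class label, e.g. "car"
    score: float  # confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels

def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_detections(dets, score_thresh=0.5, iou_thresh=0.5):
    """Drop low-confidence boxes, then apply greedy per-class non-maximum suppression."""
    kept = []
    for d in sorted(dets, key=lambda d: d.score, reverse=True):
        if d.score < score_thresh:
            continue
        # Keep d only if no higher-scoring box of the same class overlaps it heavily.
        if all(k.label != d.label or iou(k.box, d.box) < iou_thresh for k in kept):
            kept.append(d)
    return kept

raw = [
    Detection("car", 0.92, (10, 20, 110, 80)),
    Detection("car", 0.85, (15, 22, 112, 82)),   # near-duplicate of the first box
    Detection("car", 0.78, (200, 30, 290, 95)),  # a second, separate car
    Detection("car", 0.20, (400, 40, 450, 90)),  # too uncertain to report
]
kept = filter_detections(raw)
for d in kept:
    print(d.label, d.score, d.box)
```

Here the duplicate box is suppressed (its IoU with the first box is about 0.87) and the low-confidence box is discarded, leaving two cars: exactly the "are there multiple cars, and how confident are we?" answer described above.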
Real World: DTN Smart Trap (Agriculture)
A sensor-equipped insect trap uses computer vision to identify which species of insects are present. Farmers receive alerts about harmful pest populations before they cause crop damage.
Real World: Teeth Numbering in Dentistry
In dentistry, object detection models locate each tooth on a dental X-ray and assign it its standard tooth number, which speeds up charting.

Real World: LEGO Brick Identifier
Love LEGO? You can build your own custom object detection model to identify individual LEGO brick types from a live camera feed, distinguishing hundreds of part shapes and sizes in real time!
Image Segmentation
Where object detection draws a box around an object, segmentation goes pixel-by-pixel. Semantic segmentation assigns a class label to every single pixel in the image. Instance segmentation goes further, distinguishing between different instances of the same class — so Person A and Person B get different masks, not just the same "person" label.
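The difference between the two kinds of segmentation is easiest to see on a tiny mask. In the sketch below, a hypothetical 3×6 semantic mask labels every "person" pixel with the same class ID, so two people are indistinguishable. Splitting that mask into connected blobs gives each person their own instance ID. This is a deliberate simplification, not how instance segmentation models work: real models predict one mask per object directly, which lets them separate people who touch or overlap, and this flood-fill shortcut cannot do that.

```python
from collections import deque

# Hypothetical semantic mask: 0 = background, 1 = "person".
# Both people share label 1, so the mask alone cannot tell them apart.
semantic = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]

def split_instances(mask, target_class):
    """Give each 4-connected blob of target_class its own instance ID (1, 2, ...)."""
    h, w = len(mask), len(mask[0])
    instances = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] == target_class and instances[r][c] == 0:
                next_id += 1
                instances[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one blob
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] == target_class
                                and instances[ny][nx] == 0):
                            instances[ny][nx] = next_id
                            queue.append((ny, nx))
    return instances, next_id

instances, count = split_instances(semantic, target_class=1)
print(count)  # 2 separate people
for row in instances:
    print(row)  # Person A's pixels are 1, Person B's pixels are 2
```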
Real World: Medical Imaging
Radiologists use segmentation models to identify and outline tumors, organs, or regions of interest in CT scans and MRIs. A segmentation model can highlight exactly which pixels correspond to a lung nodule, making the radiologist's job faster and more consistent.

Keypoint Detection
Instead of classifying regions or segmenting pixels, keypoint detection identifies specific structural points on an object — the corners of a mouth, the joints of a human skeleton, the tip of a finger. The output is a set of (x, y) coordinates with associated confidence values.
Real World: Fitness Applications
Pose estimation models track the positions of joints during a yoga session or a weightlifting set, so the app can check the user's form or count repetitions.
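One way a fitness app could turn raw keypoints into feedback is to compute joint angles from them. The sketch below assumes a pose model has returned (x, y, confidence) triples in pixel coordinates. It measures the angle at the elbow formed by the shoulder, elbow, and wrist, and it skips the measurement when any keypoint is too uncertain. The keypoint names, coordinates, and the 0.5 confidence cutoff are hypothetical.

```python
import math

# Hypothetical pose-model output: name -> (x, y, confidence), in pixels.
keypoints = {
    "shoulder": (100.0, 100.0, 0.97),
    "elbow": (100.0, 200.0, 0.95),
    "wrist": (200.0, 200.0, 0.91),
}

def joint_angle(a, b, c, min_conf=0.5):
    """Angle ABC in degrees at joint b, or None if it cannot be measured reliably."""
    if min(a[2], b[2], c[2]) < min_conf:
        return None  # at least one keypoint is too uncertain
    v1 = (a[0] - b[0], a[1] - b[1])  # vector from the joint to the first neighbour
    v2 = (c[0] - b[0], c[1] - b[1])  # vector from the joint to the second neighbour
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        return None  # two keypoints coincide, so the angle is undefined
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))

angle = joint_angle(keypoints["shoulder"], keypoints["elbow"], keypoints["wrist"])
print(f"elbow angle: {angle:.1f} degrees")  # prints: elbow angle: 90.0 degrees
```

Tracking this angle frame by frame is one plausible way to count repetitions, for example by counting each time the elbow swings from extended to flexed and back. The specific thresholds would be an application design choice.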
Object Tracking and Scene Reconstruction
Object tracking follows the same object across frames in a video. Scene reconstruction goes further, building a 3D model of an environment from 2D images, which underpins VR, AR, and architectural visualization.
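A simple way to follow objects across frames is to link each new detection to the existing track whose previous box overlaps it most (IoU matching). This is a minimal sketch that assumes a detector already supplies per-frame boxes in (x_min, y_min, x_max, y_max) form. The `IoUTracker` class and its 0.3 threshold are hypothetical. Production trackers usually add motion prediction (for example, a Kalman filter) and keep tracks alive through brief occlusions. This version simply drops any track that goes unmatched in the current frame.

```python
def iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

class IoUTracker:
    def __init__(self, iou_thresh=0.3):
        self.iou_thresh = iou_thresh
        self.tracks = {}  # track_id -> box seen in the previous frame
        self.next_id = 1  # IDs are never reused, even after a track disappears

    def update(self, detections):
        """Match this frame's boxes to existing tracks; return {track_id: box}."""
        # Score every (track, detection) pair, highest overlap first.
        pairs = sorted(
            ((iou(box, det), tid, i)
             for tid, box in self.tracks.items()
             for i, det in enumerate(detections)),
            reverse=True,
        )
        new_tracks, used_tracks, used_dets = {}, set(), set()
        for score, tid, i in pairs:
            if score < self.iou_thresh:
                break  # remaining pairs overlap too little to be the same object
            if tid in used_tracks or i in used_dets:
                continue
            new_tracks[tid] = detections[i]
            used_tracks.add(tid)
            used_dets.add(i)
        # Any detection left unmatched is treated as a newly appeared object.
        for i, det in enumerate(detections):
            if i not in used_dets:
                new_tracks[self.next_id] = det
                self.next_id += 1
        self.tracks = new_tracks
        return self.tracks

tracker = IoUTracker()
frame1 = [(10, 10, 50, 50), (100, 100, 140, 140)]
# The same two objects, each moved slightly and listed in a different order.
frame2 = [(105, 102, 145, 142), (14, 12, 54, 52)]
print(tracker.update(frame1))
print(tracker.update(frame2))  # each object keeps its original track ID
```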
At a glance: Image Recognition and Classification
Input: a single image
Output: a class label + confidence score
Real-world applications:
- Medical imaging: Google Derm Assist classifies skin conditions from photos
- Quality control: Domino's checks every pizza for correct toppings at scale
- Content moderation: platforms classify uploaded images against policy rules
A hospital wants to build a model that, given an MRI scan, draws a precise outline around each tumor it finds — distinguishing between two separate tumors of the same type. Which task type is this?