Traditional Models for Computer Vision
Pairing Features with Models
Once features are extracted, they are fed into a standard machine learning model. Different models have different strengths depending on the feature types involved:
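- Support Vector Machines — Work particularly well with HOG features and with SIFT descriptors (which are usually aggregated first, for example into a bag-of-visual-words histogram, so that each image becomes one fixed-length vector). Effective in high-dimensional feature spaces and with limited training data.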
- Random Forests — Handle varied feature types gracefully. You can combine color histograms with texture features without worrying about scaling. Built-in feature importance tells you which image properties are most useful.
- k-Nearest Neighbors — Effective with normalized feature vectors in spaces where distance reflects visual similarity, such as color distributions. Commonly used for image retrieval (finding the most similar images in a collection).
- Gradient Boosting (e.g., XGBoost) — Robust on combined, high-dimensional feature sets. Like Random Forests, tree-based boosting is largely insensitive to feature scale, so it works well when features come in different scales and types.
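To make the k-Nearest Neighbors retrieval idea concrete, here is a minimal NumPy sketch. It assumes images are HxWx3 `uint8` arrays and uses concatenated per-channel color histograms as the feature vector. The helper names `color_histogram`, `retrieve_nearest`, and `make_image`, the 8-bin setting, and the synthetic "mostly red/green/blue" images are illustrative assumptions, not part of any library.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Hypothetical helper: per-channel histograms of an HxWx3 uint8 image,
    concatenated and L1-normalized so images of any size are comparable."""
    feats = []
    for c in range(image.shape[2]):
        hist, _ = np.histogram(image[..., c], bins=bins, range=(0, 256))
        feats.append(hist)
    vec = np.concatenate(feats).astype(np.float64)
    return vec / vec.sum()

def retrieve_nearest(query: np.ndarray, database: np.ndarray, k: int = 1) -> np.ndarray:
    """Hypothetical helper: indices of the k database rows closest to
    `query` in Euclidean distance (brute-force k-NN search)."""
    dists = np.linalg.norm(database - query, axis=1)
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)

def make_image(dominant_channel: int) -> np.ndarray:
    """Synthetic 32x32 image: dark noise plus one bright color channel."""
    img = rng.integers(0, 60, size=(32, 32, 3), dtype=np.uint8)
    img[..., dominant_channel] = rng.integers(180, 250, size=(32, 32), dtype=np.uint8)
    return img

# Database of three images: mostly red, mostly green, mostly blue.
database = np.stack([color_histogram(make_image(c)) for c in range(3)])

# Query with a new, unseen mostly-red image.
query = color_histogram(make_image(0))
best = retrieve_nearest(query, database, k=1)
print(best)  # [0]: the red image is the closest match
```

The search here is brute force over every stored vector, which keeps the distance computation visible. For larger collections, a library implementation such as scikit-learn's `NearestNeighbors` would be the more practical choice.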
Why Traditional Approaches Have a Ceiling
Traditional CV has fundamental limitations that become more apparent as the problems get harder.
- Feature engineering is bottlenecked by human expertise. Capturing all relevant properties of an image in hand-designed features is genuinely difficult, and the features that work for one domain often fail in another.
- Many traditional features discard spatial relationships between pixels. A color histogram tells you how much red is in an image, but not where it is. A GLCM (gray-level co-occurrence matrix) captures local texture but cannot capture hierarchical structure across scales.
- Limited generalizability. A feature set tuned for detecting pedestrians will likely perform poorly at detecting medical anomalies. Each new domain often requires re-engineering the features from scratch.
- Hard to capture abstract patterns. Some of the visual properties that matter most for a task are difficult to articulate as explicit, hand-designed features.
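The point about color histograms losing spatial information is easy to check directly. In this NumPy sketch (the `color_histogram` helper and the two-color test image are illustrative assumptions), mirroring or randomly shuffling the pixels produces a visibly different image with exactly the same histogram, so a model that sees only the histogram cannot tell the images apart.

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Hypothetical helper: concatenated, L1-normalized per-channel histograms."""
    feats = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
             for c in range(image.shape[2])]
    vec = np.concatenate(feats).astype(np.float64)
    return vec / vec.sum()

# 16x16 image: red on the left half, blue on the right half.
original = np.zeros((16, 16, 3), dtype=np.uint8)
original[:, :8, 0] = 255
original[:, 8:, 2] = 255

# Same pixels, different layout.
mirrored = original[:, ::-1, :]  # blue on the left, red on the right
rng = np.random.default_rng(0)
pixels = original.reshape(-1, 3)  # one row per pixel
scrambled = pixels[rng.permutation(len(pixels))].reshape(original.shape)

print(np.array_equal(original, mirrored))                                  # False
print(np.allclose(color_histogram(original), color_histogram(mirrored)))   # True
print(np.allclose(color_histogram(original), color_histogram(scrambled)))  # True
```

Features computed per spatial cell, as HOG does, retain some coarse layout. A single global histogram like this one retains none.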
Checkpoint
A team uses HOG features and an SVM to build a pedestrian detector for outdoor security cameras. They want to adapt the same pipeline for detecting surgical instruments in operating room footage. What is the most likely outcome?
Enter: Deep Learning
What if, instead of engineering features by hand, we let the model learn the features directly from data? That is the idea behind convolutional neural networks — and it is what we turn to next.