Why Deep Learning?
Traditional collaborative filtering methods hit a ceiling. They cannot automatically extract useful features from raw unstructured data (images, text, audio), they typically model user-item interactions with a linear operation such as a dot product, and they struggle to scale to production datasets with hundreds of millions of users and billions of items.
Deep learning addresses each of these shortcomings, and also helps with the sparsity of interaction data:
- Automatic feature extraction: CNNs can extract visual features from product images and transformers can encode item descriptions, with no manual feature engineering required
- Handling sparse data: Embedding layers learn dense representations even from sparse interaction matrices
- Scalability: GPU-accelerated mini-batch training scales to massive datasets
- Non-linear and complex interaction modeling: Multi-layer networks can capture higher-order user-item interactions that dot-product similarity misses
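
To make the contrast between dot-product scoring and a multi-layer network concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a production model: the table sizes, the single hidden layer, and the names `user_emb`, `item_emb`, `dot_product_score`, and `mlp_score` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes, chosen only for illustration.
n_users, n_items, dim, hidden = 5, 7, 4, 8

# Embedding tables: one dense vector per user ID and per item ID.
# In a real model these are learned from the sparse interaction data.
user_emb = rng.normal(scale=0.1, size=(n_users, dim))
item_emb = rng.normal(scale=0.1, size=(n_items, dim))

# Weights of a one-hidden-layer MLP (randomly initialised, untrained).
W1 = rng.normal(scale=0.1, size=(2 * dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.1, size=(hidden, 1))
b2 = np.zeros(1)


def dot_product_score(users, items):
    """Matrix-factorisation-style score: inner product of the two embeddings."""
    return np.sum(user_emb[users] * item_emb[items], axis=1)


def mlp_score(users, items):
    """Non-linear score: concatenate the embeddings, then apply a ReLU layer."""
    x = np.concatenate([user_emb[users], item_emb[items]], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU non-linearity
    return (h @ W2 + b2).ravel()


users = np.array([0, 1, 2])
items = np.array([3, 4, 5])
print(dot_product_score(users, items).shape)  # (3,)
print(mlp_score(users, items).shape)          # (3,)
```

Both functions look up the same dense embeddings. The difference is only in how the two vectors are combined: the dot product is a fixed operation, while the hidden layer has its own learnable weights and a non-linearity, which is what lets it represent interactions the dot product cannot.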

