Why Deep Learning?

Traditional collaborative filtering methods hit a ceiling. They cannot automatically extract useful features from raw unstructured data (images, text, audio), they model user-item interactions linearly, and they struggle to scale to production datasets of hundreds of millions of users and billions of items.

Deep learning addresses each of these shortcomings:

  • Automatic feature extraction: CNNs can extract visual features from product images and transformers can encode item descriptions, with no manual feature engineering required
  • Handling sparse data: Embedding layers learn dense representations even from sparse interaction matrices
  • Scalability: GPU-accelerated mini-batch training scales to massive datasets
  • Non-linear and complex interaction modeling: Multi-layer networks can capture higher-order user-item interactions that dot-product similarity misses
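
The embedding and interaction points above can be made concrete with a small NumPy sketch. It assumes hypothetical table sizes and untrained random weights (training is omitted; frameworks such as PyTorch provide learnable embedding layers like torch.nn.Embedding). It shows that an embedding lookup is equivalent to multiplying a sparse one-hot vector by a dense table, and contrasts a matrix-factorization dot-product score with a small MLP score in the spirit of neural collaborative filtering. The names dot_scores and mlp_scores are illustrative, not from any library.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, dim = 1000, 5000, 16   # hypothetical sizes

# Embedding tables: row k is the dense vector for sparse ID k (untrained here).
user_emb = rng.normal(scale=0.1, size=(n_users, dim))
item_emb = rng.normal(scale=0.1, size=(n_items, dim))

# A lookup equals one-hot @ table, but indexing never builds the sparse one-hot vector.
one_hot = np.zeros(n_users)
one_hot[3] = 1.0
assert np.allclose(one_hot @ user_emb, user_emb[3])

def dot_scores(users, items):
    """Matrix-factorization style score: a plain dot product of the two embeddings."""
    return np.sum(user_emb[users] * item_emb[items], axis=1)

# Hypothetical untrained 2-layer MLP over the concatenated embeddings.
W1 = rng.normal(scale=0.1, size=(2 * dim, 32))
b1 = np.zeros(32)
w2 = rng.normal(scale=0.1, size=32)

def mlp_scores(users, items):
    """MLP score: the ReLU hidden layer lets it model non-linear interactions."""
    x = np.concatenate([user_emb[users], item_emb[items]], axis=1)
    h = np.maximum(x @ W1 + b1, 0.0)
    return h @ w2

users = np.array([3, 42, 999])
items = np.array([10, 10, 4321])
print(dot_scores(users, items).shape, mlp_scores(users, items).shape)  # (3,) (3,)
```

Both scorers work on mini-batches of integer IDs, which is what makes GPU-accelerated batch training straightforward; the MLP simply replaces the fixed dot product with a learned function.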

Motivation for CNNs

CNNs can extract features from unstructured data, such as product images, and those features can then be used to make better recommendations.
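
A minimal sketch of the idea, assuming a single convolutional layer with two hand-set 3x3 edge filters (a real system would use many learned filters, typically from a pretrained network). Convolution, ReLU, and global average pooling turn an image into a small feature vector, and cosine similarity between those vectors can rank visually similar items. The filter values, image_features, and the toy images are hypothetical.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Single-channel 2-D cross-correlation with 'valid' padding, as CNN layers compute."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * kernel)
    return out

# Hypothetical fixed filters; a trained CNN learns its filters from data.
FILTERS = [
    np.array([[-1, 0, 1]] * 3, dtype=float),                        # dark-to-bright vertical edge
    np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], dtype=float),    # dark-to-bright horizontal edge
]

def image_features(img):
    """conv -> ReLU -> global average pool, giving one number per filter."""
    return np.array([np.maximum(conv2d_valid(img, k), 0.0).mean() for k in FILTERS])

def cosine(a, b):
    """Cosine similarity; the small epsilon guards against all-zero feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Toy 8x8 "product images": two with vertical edges, one with a horizontal edge.
img_v = np.zeros((8, 8)); img_v[:, 4:] = 1.0
img_v2 = np.zeros((8, 8)); img_v2[:, 3:] = 1.0
img_h = img_v.T

f_v, f_v2, f_h = image_features(img_v), image_features(img_v2), image_features(img_h)
print(cosine(f_v, f_v2), cosine(f_v, f_h))
```

The two vertical-edge images end up with nearly identical feature vectors while the horizontal-edge image does not, which is the kind of signal a recommender can use to suggest visually similar items.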

Motivation for Sequence Models

When the order and timing of interactions matter (e.g., a user's behavior changes after they get a significant other), sequence models can capture these temporal dynamics.
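
To illustrate, here is a minimal NumPy sketch that encodes a user's chronologically ordered item history with a GRU, one common sequence model (the text does not name a specific architecture). The sizes and untrained random weights are hypothetical, and gru_encode and next_item_scores are illustrative names. The point is that the GRU's final state depends on the order of interactions, while an order-blind baseline such as averaging item embeddings does not.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100, 8          # hypothetical sizes; hidden size equals dim for scoring
item_emb = rng.normal(scale=0.5, size=(n_items, dim))

def _gate_params():
    """Untrained (input, recurrent, bias) parameters for one GRU gate."""
    return (rng.normal(scale=0.5, size=(dim, dim)),
            rng.normal(scale=0.5, size=(dim, dim)),
            np.zeros(dim))

Wz, Uz, bz = _gate_params()    # update gate
Wr, Ur, br = _gate_params()    # reset gate
Wh, Uh, bh = _gate_params()    # candidate state

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(item_ids):
    """Run a GRU over item IDs in chronological order; return the final hidden state."""
    h = np.zeros(dim)
    for i in item_ids:
        x = item_emb[i]
        z = sigmoid(x @ Wz + h @ Uz + bz)
        r = sigmoid(x @ Wr + h @ Ur + br)
        h_tilde = np.tanh(x @ Wh + (r * h) @ Uh + bh)
        h = (1.0 - z) * h + z * h_tilde
    return h

def next_item_scores(item_ids):
    """Score every item against the user's current state (higher = more likely next)."""
    return item_emb @ gru_encode(item_ids)

history = [5, 17, 42, 8]
state_fwd = gru_encode(history)
state_rev = gru_encode(history[::-1])
# Order-blind baseline: the mean embedding is identical for both orderings.
mean_fwd = item_emb[history].mean(axis=0)
mean_rev = item_emb[history[::-1]].mean(axis=0)
```

Because the state is updated step by step, recent interactions can shift it away from older ones, which is how a trained sequence model can reflect a change in a user's behavior over time.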