Review of Traditional Approaches: Nearest Neighbors

Before neural networks dominated the field, the most widely deployed collaborative filtering approaches were nearest neighbor (NN) methods. They are conceptually transparent, interpretable, and still competitive on smaller datasets.

The core idea: find users (or items) most similar to the target and leverage their behavior to generate a recommendation. There are two flavors, user-user and item-item, reflecting whether we compute similarity across users or across items.


User-User Collaborative Filtering

Main assumption: people like things that others with similar tastes like.

Step 1: Build the similarity matrix. For each pair of users, compute a similarity score based on their shared ratings. The most common metrics are cosine similarity (treats each user's rating vector as a vector in item-space) and Pearson correlation (normalized to account for individual rating scale differences).
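The two metrics can be sketched in a few lines. This is a minimal illustration, not a standard API: the function name and the dict-of-ratings data layout are assumptions.

```python
import math

def user_similarity(ratings_a, ratings_b, metric="cosine"):
    """Similarity between two users, computed only over co-rated items.

    ratings_a, ratings_b: dicts mapping item id -> rating.
    (Illustrative sketch; names and data layout are assumptions.)
    """
    shared = ratings_a.keys() & ratings_b.keys()
    if not shared:
        return 0.0  # no overlap contributes nothing
    a = [ratings_a[i] for i in shared]
    b = [ratings_b[i] for i in shared]
    if metric == "pearson":
        # Subtract each user's mean to normalize for
        # individual rating-scale differences.
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [x - mean_b for x in b]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Note how Pearson reduces to cosine similarity applied to mean-centered ratings: a generous rater and a harsh critic with the same relative preferences come out perfectly correlated.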

Step 2: Generate recommendations. To recommend items to a target user, find their k nearest neighbors (most similar users), then recommend the items those neighbors rated most highly that the target user hasn't yet seen.

Several practical refinements are needed:

  • Handling missing overlap: Not every user pair has rated the same items. We only compute similarity over items both users have rated. Items with no overlap contribute nothing to the similarity score.
  • Bias correction: Some users rate everything 5 stars; others are harsh critics. Subtracting each user's mean rating before computing similarity normalizes for these individual rating biases.
  • Weighted prediction: Rather than a simple average of neighbor ratings, we weight each neighbor's rating contribution by their similarity score: items rated highly by very similar users get more weight.
  • Co-rating frequency: User pairs who have rated many items in common provide more reliable similarity estimates. Systems often discount similarities computed on very few shared items.
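The bias-correction and weighted-prediction refinements combine into a single prediction rule: the target user's mean plus a similarity-weighted average of each neighbor's deviation from their own mean. A minimal sketch, with the tuple layout assumed for illustration:

```python
def predict_rating(target_mean, neighbors, k=3):
    """Predict a rating as the target user's mean plus a
    similarity-weighted average of neighbor deviations.

    neighbors: list of (similarity, neighbor_rating, neighbor_mean)
    tuples for users who rated the item. (Illustrative layout,
    not a standard API.)
    """
    # Keep only the k most similar neighbors.
    top = sorted(neighbors, key=lambda t: t[0], reverse=True)[:k]
    num = sum(sim * (rating - mean) for sim, rating, mean in top)
    den = sum(abs(sim) for sim, _, _ in top)
    # With no usable neighbors, fall back to the user's own mean.
    return target_mean if den == 0 else target_mean + num / den
```

A highly similar neighbor who rated the item one star above their personal average pulls the prediction up by nearly a full star; a weakly similar neighbor barely moves it.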

Item-Item Collaborative Filtering

Main assumption: people will like things similar to other things they've previously liked.

The procedure mirrors user-user CF, but now the similarity matrix is computed over items rather than users. For a given user, we find items similar to those they've rated highly, and recommend those similar items.
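A minimal item-item scoring sketch, assuming the item-pair similarities have already been precomputed offline (the dict-keyed-by-pair layout is an illustrative assumption):

```python
def recommend_items(user_ratings, item_sims, n=3):
    """Score items the user hasn't rated by the similarity-weighted
    ratings of the items they have rated (item-item CF sketch).

    item_sims: dict mapping (rated_item, candidate_item) -> similarity.
    (Illustrative data layout, not a standard API.)
    """
    scores, weights = {}, {}
    for rated_item, rating in user_ratings.items():
        for (a, b), sim in item_sims.items():
            # Only score items the user hasn't seen yet.
            if a == rated_item and b not in user_ratings:
                scores[b] = scores.get(b, 0.0) + sim * rating
                weights[b] = weights.get(b, 0.0) + abs(sim)
    # Normalize by total similarity weight and rank descending.
    ranked = sorted(
        ((scores[i] / weights[i], i) for i in scores if weights[i] > 0),
        reverse=True,
    )
    return [item for _, item in ranked[:n]]
```

In production the inner loop would be an indexed lookup into a precomputed neighbor list per item rather than a scan over all pairs.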

Why prefer item-item over user-user?

  • In most systems, users far outnumber items, and items are more stable. A movie's "similarity neighborhood" rarely changes after release; a user's preferences can shift dramatically.
  • When there are many users, computing all pairwise user similarities is expensive. Item similarities can be precomputed offline and reused.
  • Items tend to have denser rating distributions than users, making item-item similarities more stable and reliable.

Limitations of Nearest Neighbor Methods

NN methods have a fundamental ceiling: they can only recommend items similar to what a user already knows and likes, and they cannot capture latent structure (e.g., genre preferences that aren't directly visible in the ratings). They also struggle badly with sparse data and are computationally prohibitive at scale without significant engineering. These limitations motivate the shift to model-based approaches.

A further challenge with NN methods: content (features) is not included in the model.

Hybrid Methods: Adding Content

Pure collaborative filtering has no knowledge of item content. It can't say why two movies are similar; it only knows they were rated similarly. A user who watches only Adam Sandler films will receive CF recommendations based on other users who also watch a lot of Adam Sandler, but those users might also love action films that this user has no interest in.

Hybrid methods address this by incorporating item content features into the similarity computation or as additional model inputs, allowing the system to be more precise about the dimension of similarity that matters to a user.
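One simple hybrid is a linear blend of the CF similarity with a content-overlap score. The Jaccard overlap over feature tags and the `alpha` weight below are illustrative assumptions, not a prescribed design:

```python
def hybrid_similarity(cf_sim, features_a, features_b, alpha=0.7):
    """Blend collaborative similarity with content overlap.

    features_a, features_b: sets of content tags (genre, cast, ...).
    alpha: weight on the CF signal; 1 - alpha goes to content.
    (Illustrative sketch; the blend and weights are assumptions.)
    """
    # Jaccard overlap of the two items' feature tags.
    union = len(features_a | features_b)
    content_sim = len(features_a & features_b) / union if union else 0.0
    return alpha * cf_sim + (1 - alpha) * content_sim
```

With content in the mix, two items co-rated by the same users but tagged with disjoint genres score lower than co-rated items that also share tags, which is exactly the distinction the Adam Sandler example calls for.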

Checkpoint: Multiple Choice

You're building a recommender for a library catalog with 50,000 books and 500,000 registered users. Patrons rarely rate more than 10 books. Which nearest-neighbor approach would you prefer and why?

Checkpoint: Reflective Question

Cosine similarity treats the direction of rating vectors as important but ignores magnitude. Pearson correlation normalizes each user's ratings by subtracting their mean. In what real-world scenario would Pearson correlation clearly outperform cosine similarity for user-user CF?