Probabilistic Matrix Factorization (PMF)
Matrix factorization is the bridge between classical collaborative filtering and deep learning. The core idea: the user-item rating matrix R is assumed to be low-rank; that is, the preferences of millions of users can be explained by a relatively small number of latent factors.
We decompose R into two embedding matrices:
- U: a matrix of user embeddings, where each row is a dense vector of size d representing a user's latent preferences
- V: a matrix of item embeddings, where each row is a dense vector of size d representing an item's latent attributes
The predicted rating of user u for item i is then simply the dot product of their embedding vectors: R̂ui = Uu · Vi.
We can't decompose R directly with a classical method like SVD, because most of its entries are missing. Instead, we treat this as a supervised learning problem: we define a loss (usually mean squared error) over the observed ratings only, and use stochastic gradient descent to learn the embeddings that minimize it.
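The procedure above can be sketched in a few lines of NumPy. This is an illustrative toy, not a reference implementation: the hyperparameters (d = 4, learning rate, regularization strength, epoch count) are assumed values, and the "observed" ratings are synthesized from a planted low-rank structure so the script is self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 50, 40, 4

# Synthetic observed ratings from a planted low-rank structure,
# keeping ~30% of entries to mimic a sparse rating matrix.
true_U = rng.normal(0, 1, (n_users, d))
true_V = rng.normal(0, 1, (n_items, d))
obs = [(u, i, float(true_U[u] @ true_V[i]))
       for u in range(n_users) for i in range(n_items)
       if rng.random() < 0.3]

U = rng.normal(0, 0.1, (n_users, d))   # user embeddings, one row per user
V = rng.normal(0, 0.1, (n_items, d))   # item embeddings, one row per item
lr, reg = 0.02, 0.01                   # assumed hyperparameters

def mse(U, V, ratings):
    # MSE over observed ratings only; prediction is the dot product U_u . V_i
    return np.mean([(r - U[u] @ V[i]) ** 2 for u, i, r in ratings])

initial_mse = mse(U, V, obs)
for epoch in range(30):
    for idx in rng.permutation(len(obs)):    # visit ratings in random order
        u, i, r = obs[idx]
        err = r - U[u] @ V[i]                # signed prediction error
        U[u] += lr * (err * V[i] - reg * U[u])   # SGD step on U_u
        V[i] += lr * (err * U[u] - reg * V[i])   # SGD step on V_i
final_mse = mse(U, V, obs)
```

The L2 regularization term (reg) penalizes large embedding values, which is the standard way to curb overfitting when each user or item has few observed ratings.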

What do latent dimensions represent?
The d dimensions of the embedding don't come pre-labeled, but they often emerge with interpretable structure. In a movie recommender, one dimension might correlate strongly with "action vs. drama," another with "mainstream vs. arthouse." We never specify these; the model learns them by minimizing prediction error. Inspecting learned embeddings is a useful debugging and interpretability technique.
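One simple way to inspect a learned dimension is to sort items along it and look at the extremes. A minimal sketch, assuming you already have a trained item embedding matrix V and item titles (random stand-ins are used here so the snippet runs on its own):

```python
import numpy as np

rng = np.random.default_rng(1)
titles = [f"movie_{i}" for i in range(20)]   # placeholder item names
V = rng.normal(0, 1, (20, 4))                # stand-in for learned embeddings

dim = 0                                      # latent dimension to inspect
order = np.argsort(V[:, dim])                # items sorted low-to-high on dim
print("lowest on dim 0: ", [titles[i] for i in order[:3]])
print("highest on dim 0:", [titles[i] for i in order[-3:]])
```

On a real recommender, the two ends of such a list often reveal an interpretable axis (e.g. action blockbusters at one end, slow dramas at the other), even though no such label was ever provided to the model.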
In PMF, the embedding dimension d is a hyperparameter. What is the effect of choosing d too small versus d too large?