Similarity Metrics and the Limits of Cosine

When you work in embedding space (retrieval, clustering, RAG), you need a similarity or distance metric. The standard options:

Metric | Description | Range | Notes
Cosine similarity | Angle between vectors | [−1, 1] | Magnitude-invariant; higher = more similar
Euclidean distance | Straight-line distance | [0, ∞) | Sensitive to magnitude; lower = more similar
Dot product | Like cosine but not normalized | (−∞, ∞) | Sensitive to magnitude; higher = more similar
Manhattan distance (L1) | Sum of absolute differences along each axis | [0, ∞) | More robust to outlier dimensions than Euclidean
Jaccard similarity | Intersection over union of sets | [0, 1] | For set-like or sparse data; higher = more similar
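
To make the definitions concrete, here is a minimal sketch of the five metrics in plain Python. The function names and the toy vectors u and v are illustrative assumptions, not taken from any particular library, and the code assumes equal-length dense vectors (sets for Jaccard).

```python
import math

def dot_product(a, b):
    # Sum of element-wise products; grows with vector length (magnitude).
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    # Dot product divided by both vector norms, so magnitude cancels out.
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line (L2) distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    # L1 distance: sum of absolute differences along each axis.
    return sum(abs(x - y) for x, y in zip(a, b))

def jaccard_similarity(s, t):
    # Intersection over union of two sets.
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(s & t) / len(s | t)

# Toy vectors (hypothetical): v is u scaled by 2, so they share a direction.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]

print("cosine   :", cosine_similarity(u, v))
print("euclidean:", euclidean_distance(u, v))
print("dot      :", dot_product(u, v))
print("manhattan:", manhattan_distance(u, v))
print("jaccard  :", jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))
```

Because v is just a scaled copy of u, cosine treats the two as essentially identical, while Euclidean, Manhattan, and dot product all register the difference in magnitude.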

Cosine similarity is the default everywhere — and it's the default for good reason. It ignores vector magnitude (which is often meaningless in learned embedding spaces) and works reasonably for most tasks. But "everyone uses it" is not a technical justification.

The Limits of Cosine Similarity

  • No concept of proximity. Cosine depends only on the angle between vectors, not on their lengths or how far apart they sit. Two vectors at very different distances from the origin can have high cosine similarity if they point in nearly the same direction.
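  • Assumes linear relationships. Cosine reduces the comparison to a single angle measured from the origin. If meaning varies nonlinearly across the space, so that related items lie along a curved region rather than a shared direction, cosine misses it.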
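  • Struggles with sparse vectors. When most dimensions are zero and only a few carry signal, the score depends on the handful of dimensions two vectors happen to share, which can produce artifacts in the computed similarity.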
  • What's a "good" score? 0.85 is great in some embedding spaces, mediocre in others. There's no universal threshold — evaluate relative to your specific model and task.
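
The first and third points are easy to check on toy vectors. The sketch below uses made-up data (the names near, far, sparse_a, and sparse_b are hypothetical): one pair that points the same way but sits far apart, and one sparse pair that overlaps in a single dimension.

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Same direction, very different magnitude (hypothetical values).
near = [1.0, 1.0]
far = [100.0, 100.0]
same_direction_cos = cosine_similarity(near, far)
same_direction_dist = euclidean_distance(near, far)

# Sparse vectors whose only non-zero entry is one shared dimension.
sparse_a = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sparse_b = [0.0, 0.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
sparse_cos = cosine_similarity(sparse_a, sparse_b)

print("cosine(near, far)      :", same_direction_cos)
print("euclidean(near, far)   :", same_direction_dist)
print("cosine(sparse_a, sparse_b):", sparse_cos)
```

In both cases cosine reports a near-perfect match: in the first pair because it ignores the large gap in magnitude, in the second because one shared coordinate is all there is to compare.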

Real World: Choosing a Similarity Metric for RAG

When you build a RAG system, the choice of similarity metric is one of several decisions to evaluate empirically. Some embedding models produce vectors where magnitude carries information — for those, dot product may outperform cosine. For sparse vectors (like TF-IDF), Jaccard or BM25-style metrics often win. Don't reach for cosine because it's the default. Reach for it because you tested it and it worked.
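
One way to "test it" is to score each candidate metric on a small labeled set of query/document pairs. The following is a rough sketch under stated assumptions: the vectors, the relevance labels, the SCORERS table, and the hit_rate_at_k helper are all hypothetical, and in practice the embeddings would come from your own model and evaluation set.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def neg_euclidean(a, b):
    # Negated so that, like the other scorers, higher means more similar.
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

SCORERS = {"cosine": cosine, "dot": dot, "neg_euclidean": neg_euclidean}

def hit_rate_at_k(queries, docs, relevant, scorer, k=1):
    """Fraction of queries whose labeled document appears in the top k."""
    hits = 0
    for query, relevant_idx in zip(queries, relevant):
        ranked = sorted(
            range(len(docs)),
            key=lambda i: scorer(query, docs[i]),
            reverse=True,
        )
        if relevant_idx in ranked[:k]:
            hits += 1
    return hits / len(queries)

# Toy data (made up): the third document has a much larger norm.
docs = [[1.0, 0.0], [0.0, 1.0], [3.0, 3.0]]
queries = [[0.9, 0.1], [0.2, 0.8]]
relevant = [0, 1]  # index of the relevant document for each query

results = {
    name: hit_rate_at_k(queries, docs, relevant, scorer, k=1)
    for name, scorer in SCORERS.items()
}
for name, rate in results.items():
    print(f"{name:>14}: hit rate@1 = {rate:.2f}")
```

On this toy set the long third document wins every dot-product ranking, while cosine and Euclidean retrieve the labeled documents. That outcome is a property of the made-up data, not a general result; with a model whose magnitudes carry information, the ordering could flip, which is why the comparison is worth running on your own data.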

Checkpoint

You are using cosine similarity to find the most relevant documents for a query in a RAG system. A document with a cosine similarity of 0.91 is returned, but when you read it, it's clearly not relevant to the query. Which property of cosine similarity best explains this failure?