The Most Famous Equation in NLP

The wild thing about Word2Vec embeddings is that you can do arithmetic with them:

king − man + woman ≈ queen

This suggests the model has learned to encode aspects of meaning as roughly consistent directions in the vector space. Subtracting man from king removes the male end of the gender direction, and adding woman puts the female end in its place. The word nearest the result (leaving out the three input words) is queen.

The same trick works for other relations: Paris − France + Italy ≈ Rome. Geography, gender, verb tense, and plurality all end up as approximate directions in the embedding space.

Figure: Vector arithmetic in a 2D embedding space (king − man + woman = queen)

man [0.72, 0.22]
woman [0.38, 0.32]
king [0.72, 0.78]
queen [0.38, 0.88]
result [0.38, 0.88]

Every word is a point in embedding space, shown here as a vector from the origin. King, man, woman, and queen each live at a fixed location the model learned from billions of words. The red arrows show the two geometric steps of the analogy.
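To make the arithmetic concrete, here is a minimal sketch that replays the analogy on the four toy 2D coordinates listed above. The names `VECTORS`, `cosine`, and `analogy` are made up for illustration and are not part of any library. Real Word2Vec vectors have hundreds of dimensions and the match is only approximate.

```python
import math

# Toy 2D coordinates copied from the figure (not real Word2Vec output).
VECTORS = {
    "man": (0.72, 0.22),
    "woman": (0.38, 0.32),
    "king": (0.72, 0.78),
    "queen": (0.38, 0.88),
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(a, b, c, vocab, exclude_inputs=False):
    """Return (nearest word, target vector) for vocab[a] - vocab[b] + vocab[c].

    With real embeddings the input words are usually skipped, because one
    of them is often the nearest neighbour of the target.
    """
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    skip = {a, b, c} if exclude_inputs else set()
    candidates = [w for w in vocab if w not in skip]
    best = max(candidates, key=lambda w: cosine(vocab[w], target))
    return best, target

word, target = analogy("king", "man", "woman", VECTORS)
print(word, [round(t, 2) for t in target])  # queen [0.38, 0.88]
```

In this toy setup the result coincides with queen by construction. With trained embeddings you would expect queen to be the closest remaining word, not an exact hit.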

Doc2Vec and GloVe

Two close relatives of Word2Vec you should know:

  • Doc2Vec extends Word2Vec by adding a paragraph vector to the input, so the model captures topic-level context, not just local word context. Useful for document-level tasks like classification or similarity.
  • GloVe (2014) takes a count-based, matrix factorization approach. Rather than training a neural net to predict context words, it fits word vectors to the logarithm of global co-occurrence counts using a weighted log-bilinear regression. GloVe embeddings are often used as inputs to downstream neural networks.
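As a rough illustration of the GloVe bullet, the sketch below evaluates a GloVe-style weighted least-squares loss on a tiny co-occurrence table. The weighting function uses the commonly cited defaults (x_max = 100, alpha = 0.75). The counts, vectors, and the names `weight` and `glove_loss` are hypothetical, chosen only to keep the example small, and no training step is shown.

```python
import math

def weight(x, x_max=100.0, alpha=0.75):
    """Down-weight rare pairs and cap the weight of frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(cooc, W, W_ctx, b, b_ctx):
    """Sum of weighted squared errors between the model score
    (w_i . w_ctx_j + b_i + b_ctx_j) and log(X_ij), over nonzero counts."""
    total = 0.0
    for (i, j), x in cooc.items():
        score = sum(p * q for p, q in zip(W[i], W_ctx[j])) + b[i] + b_ctx[j]
        total += weight(x) * (score - math.log(x)) ** 2
    return total

# Hypothetical counts for a two-word vocabulary (illustration only).
cooc = {(0, 1): 100.0, (1, 0): 100.0}
W = [[1.0, 0.0], [0.0, 1.0]]      # word vectors
W_ctx = [[0.0, 1.0], [1.0, 0.0]]  # context vectors
zero_bias = [0.0, 0.0]

loss = glove_loss(cooc, W, W_ctx, zero_bias, zero_bias)

# Biases chosen so every score equals log(X_ij), which drives the loss to ~0.
fitted_bias = [math.log(100.0) - 1.0] * 2
fitted_loss = glove_loss(cooc, W, W_ctx, fitted_bias, zero_bias)
print(round(loss, 3), round(fitted_loss, 3))
```

Training would adjust the vectors and biases by gradient descent to minimise this quantity. The point here is only that the objective compares dot products against log counts, with no context-prediction network involved.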