The Roadmap

The roadmap of this unit mirrors how the field itself unfolded. We start with the most naive representations (bag-of-words), graduate to static word embeddings (Word2Vec) that capture some meaning, suffer through the era of recurrent neural networks, and then arrive at attention and transformers, which finally crack the "bank of the river" problem by letting context shape each word's representation on the fly.

Pre-2013
Bag of Words & TF-IDF
Text represented as sparse count vectors over a vocabulary. No semantics or order, but shockingly effective for classification and search. Still in production today.
2013
Word2Vec
Dense, low-dimensional word embeddings trained on co-occurrence. Semantic arithmetic becomes possible: king − man + woman ≈ queen. But every word still has one static vector regardless of context.
2014–2018
RNNs, LSTMs, GRUs
Recurrent architectures carry information forward through sequences, allowing the model to use context. Long-range dependencies remain difficult, and training is slow because computation is sequential.
2017
Attention Is All You Need
The transformer paper, written largely by researchers at Google Brain and Google Research, proposes replacing recurrence with self-attention. Every word can directly attend to every other word in the sequence. The field pivots.
2018–2019
BERT, GPT-1, GPT-2
Large-scale pretrained transformers. Pretrain once on unlabeled text, then fine-tune for each downstream task. NLP becomes transfer learning.
2020–present
LLMs and Multimodality
Models with billions to trillions of parameters. Text generation, code, reasoning. The same architecture is expanding beyond language into images, audio, and video.
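
To make two of the timeline's claims concrete, here is a minimal sketch in plain Python. It is a toy under stated assumptions, not how any production system is built: the function names (bag_of_words, self_attention) and the 2-d STATIC embeddings are invented for illustration, and the attention step leaves out the learned query/key/value projections, scaling, and multiple heads that a real transformer uses. The first part shows that count vectors discard word order; the second shows how mixing in context gives "bank" a different vector next to "river" than next to "money", even though its static vector is the same in both cases.

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Count vector over a fixed vocabulary; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def self_attention(vectors):
    """Toy self-attention with no learned weights (a simplification).

    Each output vector is a softmax-weighted average of all input vectors,
    where the weights come from dot products with the current vector.
    """
    outputs = []
    for query in vectors:
        scores = [sum(a * b for a, b in zip(query, key)) for key in vectors]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        mixed = [
            sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(query))
        ]
        outputs.append(mixed)
    return outputs


# Hypothetical static embeddings: axis 0 ~ "nature", axis 1 ~ "finance".
STATIC = {
    "river": [2.0, 0.0],
    "money": [0.0, 2.0],
    "bank": [1.0, 1.0],  # one vector, no matter which sense is meant
}

# Bag of words: two sentences with opposite meanings get the same vector.
vocab = ["bites", "dog", "man"]
bow_a = bag_of_words("dog bites man", vocab)
bow_b = bag_of_words("man bites dog", vocab)
print(bow_a == bow_b)

# Attention: the vector for "bank" (position 1) now depends on its neighbor.
river_ctx = self_attention([STATIC[w] for w in ["river", "bank"]])[1]
money_ctx = self_attention([STATIC[w] for w in ["money", "bank"]])[1]
print(river_ctx, money_ctx)
```

In this toy setup the contextual vector for "bank" leans toward the "nature" axis beside "river" and toward the "finance" axis beside "money", which is the behavior a single static embedding cannot provide.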

Real-World Hook

Every model we talk about in this unit is currently powering something you use. Bag-of-words is still inside production spam filters. Word2Vec is in production recommendation systems. Transformers are in your phone's autocomplete.
