The Roadmap
This unit follows the same path the field itself took. We start with the most naive representations (bag-of-words), graduate to static word embeddings (Word2Vec) that capture some meaning, suffer through the era of recurrent neural networks, and then arrive at attention and transformers, which finally crack the "bank of the river" problem by letting context shape the representation on the fly.
- Pre-2013
Bag of Words & TF-IDF
Text represented as sparse count vectors over a vocabulary. No semantics or order, but shockingly effective for classification and search. Still in production today.
- 2013
Word2Vec
Dense, low-dimensional word embeddings learned by predicting words from their neighbors, so words that appear in similar contexts end up with similar vectors. Semantic arithmetic becomes possible: king − man + woman ≈ queen. But every word still has one static vector regardless of context.
- 2014–2018
RNNs, LSTMs, GRUs
Recurrent architectures carry information forward through sequences, allowing the model to use context. Long-range dependencies remain difficult, and training is slow because computation is sequential.
- 2017
Attention Is All You Need
The transformer paper from Google (Google Brain and Google Research) proposes replacing recurrence with self-attention. Every word can directly attend to every other word. The field pivots.
- 2018–2019
BERT, GPT-1, GPT-2
Large-scale pretrained transformers. Pretrain once, fine-tune for each task. NLP becomes transfer learning.
- 2020–present
LLMs and Multimodality
Models with billions to trillions of parameters. Text generation, code, reasoning. The same architecture is now expanding beyond language into images, audio, and video.
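To make the first stop on the timeline concrete, here is a minimal sketch of a bag-of-words representation in plain Python. The toy sentences, the whitespace tokenizer, and the helper name `bag_of_words` are assumptions chosen for illustration; production systems typically use a library vectorizer with smarter tokenization and sparse storage.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Return a count vector for `text` over a fixed, ordered vocabulary."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer
    return [counts[word] for word in vocabulary]


# Toy corpus (hypothetical sentences, not from any dataset).
docs = [
    "the dog bit the man",
    "the man bit the dog",
    "she sat on the bank of the river",
]

# Vocabulary: every distinct token, sorted so the column order is stable.
vocabulary = sorted({token for doc in docs for token in doc.lower().split()})

# One count vector per document; most entries are zero (sparse).
vectors = [bag_of_words(doc, vocabulary) for doc in docs]

# Word order is discarded, so the first two sentences collapse
# to the same vector even though they mean different things.
same_vector = vectors[0] == vectors[1]

for doc, vec in zip(docs, vectors):
    print(vec, "<-", doc)
print("first two sentences identical:", same_vector)
```

The first two sentences end up indistinguishable, which is the "no order" limitation noted in the timeline. The third sentence shows the "no semantics" limitation: "bank" is just one column in the vector, with nothing to say whether it means a riverbank or a financial institution.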
Real-World Hook
Every model we talk about in this unit is currently powering something you use. Bag-of-words is still inside production spam filters. Word2Vec is in production recommendation systems. Transformers are in your phone's autocomplete.