The Roadmap

The roadmap of this unit mirrors how the field itself unfolded. We start with the most naive representation (bag-of-words), graduate to static word embeddings (Word2Vec) that capture some meaning, suffer through the era of recurrent neural networks, and then arrive at attention and transformers. These finally crack the "bank of the river" problem: the word "bank" needs a different representation in "river bank" than in "bank account", and transformers let context shape that representation on the fly.
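To make the "bank of the river" problem concrete, here is a minimal sketch in plain Python. The 2-d vectors in `STATIC` are hand-picked for illustration (they are not trained embeddings), and `contextual` is a hypothetical helper that applies bare dot-product attention with no learned query, key, or value projections, which a real transformer would have. It only shows the core idea: a static lookup returns the same vector for "bank" everywhere, while an attention-weighted mix depends on the neighbors.

```python
import math

# Hypothetical 2-d static embeddings, hand-picked for illustration only.
# Axis 0 loosely means "nature", axis 1 loosely means "finance".
STATIC = {
    "river": [1.0, 0.0],
    "money": [0.0, 1.0],
    "bank": [0.5, 0.5],
}


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def contextual(word, sentence):
    """Attention-weighted average of the sentence's static vectors.

    `word` acts as the query; every word in `sentence` acts as both
    key and value. Scores are plain dot products (an assumption made
    to keep the sketch small; real models learn projections).
    """
    query = STATIC[word]
    keys = [STATIC[w] for w in sentence]
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    return [
        sum(w * key[i] for w, key in zip(weights, keys))
        for i in range(len(query))
    ]


# Static lookup: one vector for "bank", whatever the sentence.
static_bank = STATIC["bank"]

# Contextual: the same word ends up with different vectors.
river_bank = contextual("bank", ["river", "bank"])
money_bank = contextual("bank", ["money", "bank"])

print(static_bank, river_bank, money_bank)
```

In this toy setup the "river" sentence pulls the vector for "bank" toward the nature axis and the "money" sentence pulls it toward the finance axis, which is the behavior static embeddings cannot provide.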

  1. Pre-2013

    Bag of Words & TF-IDF

    Text represented as sparse count vectors over a vocabulary. No semantics or order, but shockingly effective for classification and search. Still in production today.

  2. 2013

    Word2Vec

    Dense, low-dimensional word embeddings learned by predicting words from their neighbors in a local context window. Semantic arithmetic becomes possible: king − man + woman ≈ queen. But every word still has one static vector regardless of context.

  3. 2014–2018

    RNNs, LSTMs, GRUs

    Recurrent architectures carry information forward through sequences, allowing the model to use context. Long-range dependencies remain difficult, and training is slow because computation is sequential.

  4. 2017

    Attention Is All You Need

    The transformer paper, from researchers at Google (Google Brain and Google Research), proposes replacing recurrence with self-attention. Every word can directly attend to every other word. The field pivots.

  5. 2018–2019

    BERT, GPT-1, GPT-2

    Large-scale pretrained transformers. Pretrain once, fine-tune for each task. NLP becomes transfer learning.

  6. 2020–present

    LLMs and Multimodality

    Models with billions to trillions of parameters. Text generation, code, reasoning. The same architecture is expanding beyond language into images, audio, and video.
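The first timeline entry is simple enough to sketch in a few lines of plain Python. The helper names `bag_of_words` and `tf_idf` below are illustrative, not a library API, and the weighting used (raw counts times log(N / df)) is one common textbook variant; production libraries typically add smoothing and normalization. The sketch shows both the representation and its main blind spot: word order is thrown away.

```python
import math
from collections import Counter


def bag_of_words(text, vocab):
    """Return a count vector over `vocab`; word order is discarded."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def tf_idf(docs):
    """Return (vocab, one TF-IDF vector per document).

    TF is the raw count and IDF is log(N / df), an assumed textbook
    variant without smoothing or normalization.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(docs)
    # Document frequency: how many documents contain each word.
    df = {w: sum(1 for tokens in tokenized if w in tokens) for w in vocab}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[w] * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


docs = ["dog bites man", "man bites dog", "the river bank flooded"]
vocab, vectors = tf_idf(docs)

# Same words in a different order produce identical count vectors.
same = bag_of_words(docs[0], vocab) == bag_of_words(docs[1], vocab)
print(same)
```

Words that appear in only one document (such as "bank" here) receive a higher weight than words shared across documents (such as "bites"), which is a large part of why this representation works well for search and classification despite ignoring order.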

Real-World Hook

Every model we talk about in this unit is currently powering something you use. Bag-of-words is still inside production spam filters. Word2Vec is in production recommendation systems. Transformers are in your phone's autocomplete.