Chapter 5

Recurrent Neural Networks

Feed-forward networks are stateless: they treat every observation independently. Language requires memory, because the meaning of a word depends on the words that came before it. This chapter introduces RNNs as the first architecture that carries information forward through time, maps out the four sequence architecture types (sequence-to-sequence, sequence-to-vector, vector-to-sequence, and encoder-decoder), explains the vanishing gradient problem that limits plain RNNs, and shows how LSTMs and GRUs mitigate it through gated memory.
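As a preview of what "carrying information forward through time" means, here is a minimal sketch of a plain RNN forward pass in pure Python. It assumes the common update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). The function names (`rnn_step`, `run_rnn`) and the toy weights are hypothetical and chosen only for illustration; they are not taken from any library.

```python
import math

def rnn_step(x_t, h_prev, w_xh, w_hh, b_h):
    """One step of a plain RNN: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    h_t = []
    for i in range(len(b_h)):
        pre = b_h[i]
        pre += sum(w_xh[i][j] * x_t[j] for j in range(len(x_t)))
        pre += sum(w_hh[i][k] * h_prev[k] for k in range(len(h_prev)))
        h_t.append(math.tanh(pre))
    return h_t

def run_rnn(inputs, w_xh, w_hh, b_h):
    """Apply rnn_step across a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b_h)
    states = []
    for x_t in inputs:
        # The hidden state from the previous step is fed back in here.
        h = rnn_step(x_t, h, w_xh, w_hh, b_h)
        states.append(h)
    return states

# Hypothetical toy weights: 1 input feature, 2 hidden units.
W_XH = [[0.5], [-0.3]]
W_HH = [[0.1, 0.4], [-0.2, 0.3]]
B_H = [0.0, 0.0]

# Two sequences that end with the same input but differ in their first step.
seq_a = [[1.0], [0.0]]
seq_b = [[0.0], [0.0]]

states_a = run_rnn(seq_a, W_XH, W_HH, B_H)
states_b = run_rnn(seq_b, W_XH, W_HH, B_H)

print(states_a[-1])
print(states_b[-1])
```

Both sequences present the same final input, yet their final hidden states differ because the hidden state still reflects the earlier step. A stateless feed-forward network applied to the last input alone could not tell the two sequences apart.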