Recurrent Neural Network (RNN)

A recurrent neural network has a simple twist: the output of a layer at one time step is combined with the input at the next time step and fed back into the same layer. You can draw this two ways:

  • Unrolled — the same block copied left-to-right across time steps, with arrows showing how the hidden state passes forward. This is the more common way to draw an RNN.
  • Rolled — a single block with a loop arrow on it, simple and abstract.
An RNN in rolled form (right) and unrolled form (left).

The math at each time step:

$$y_t = \alpha(W_x \cdot x_t + W_y \cdot y_{t-1} + b)$$

Here $\alpha$ is the activation function, typically a hyperbolic tangent; $x_t$ is the current input (the embedding of the word at position $t$); $y_{t-1}$ is the previous output; $W_x$ and $W_y$ are weight matrices; and $b$ is the bias vector. The recurrent state is sometimes written $h_t$ instead, when it stays internal rather than being read off as a per-step output.
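To make the recurrence concrete, here is a minimal sketch of the update rule in plain Python. The weight values, the 2-dimensional sizes, and the all-zero initial output `y` are assumptions chosen only for illustration; the helper names `matvec` and `rnn_step` are hypothetical and do not come from any library.

```python
import math

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rnn_step(x_t, y_prev, W_x, W_y, b):
    """One time step: y_t = tanh(W_x x_t + W_y y_{t-1} + b)."""
    wx = matvec(W_x, x_t)      # contribution of the current input
    wy = matvec(W_y, y_prev)   # contribution of the previous output
    return [math.tanh(a + c + d) for a, c, d in zip(wx, wy, b)]

# Made-up parameters: 2-dimensional input, 2-dimensional output.
W_x = [[0.5, -0.3], [0.1, 0.8]]
W_y = [[0.2, 0.0], [0.0, 0.2]]
b = [0.0, 0.1]

inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # x_1, x_2, x_3
y = [0.0, 0.0]  # assumed initial output y_0 (all zeros)
outputs = []
for x_t in inputs:
    y = rnn_step(x_t, y, W_x, W_y, b)  # the same weights are reused at every step
    outputs.append(y)
```

The loop is the unrolled view of the network: one call to `rnn_step` per time step, with `y` carrying information forward. Because tanh is the activation, every entry of every output lies strictly between -1 and 1.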

RNN Types

The same RNN components handle a surprising range of tasks depending on how you wire up inputs and outputs:

  • Sequence-to-sequence: input at every step, output at every step. Used for stock price forecasting, where each new day produces a new prediction.
  • Sequence-to-vector: many inputs, one output. Used for spam classification — read the whole email, output a single yes/no.
  • Vector-to-sequence: one input, many outputs. Used for image captioning — encode an image once, then generate a caption word by word.
  • Encoder-decoder: one full sequence in (the encoder), one full sequence out (the decoder). Used for machine translation — read the whole French sentence before generating the English.
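The four wirings listed above can be sketched with a single shared step function. This is a deliberately simplified illustration, not a trained model: the input and state are scalars, the weights `W_X`, `W_Y`, and `B` are made-up constants, the vector-to-sequence variant feeds zeros after the first step (one possible convention), and the encoder and decoder share weights, whereas real systems normally give each its own.

```python
import math

# Hypothetical scalar weights, chosen only for illustration (not trained).
W_X, W_Y, B = 0.7, 0.4, 0.0

def step(x_t, y_prev):
    """One RNN step with scalar input and state."""
    return math.tanh(W_X * x_t + W_Y * y_prev + B)

def seq_to_seq(xs):
    """Input at every step, output at every step (same length)."""
    y, ys = 0.0, []
    for x in xs:
        y = step(x, y)
        ys.append(y)
    return ys

def seq_to_vector(xs):
    """Read the whole sequence, keep only the final output."""
    return seq_to_seq(xs)[-1]

def vector_to_seq(x, n_steps):
    """One input at the first step, zeros afterwards; output at every step."""
    y, ys = 0.0, []
    for t in range(n_steps):
        y = step(x if t == 0 else 0.0, y)
        ys.append(y)
    return ys

def encoder_decoder(xs, n_out):
    """Sequence-to-vector encoder followed by a vector-to-sequence decoder."""
    context = seq_to_vector(xs)           # encoder summarizes the full input
    return vector_to_seq(context, n_out)  # decoder generates n_out outputs

xs = [0.1, 0.5, -0.2]
print(len(seq_to_seq(xs)))          # 3: one output per input
print(len(encoder_decoder(xs, 5)))  # 5: output length is independent of input length
```

The only difference between the variants is which inputs are fed in and which outputs are kept; the recurrent step itself never changes.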

Sequence-to-Sequence

The RNN produces one output at every input time step. Input and output are aligned and have the same length: each output is produced at the same time step as its corresponding input.

Task: Stock price forecasting, named entity tagging

"Tag each word: Apple [ORG] is [O] hiring [O] in [O] Berlin [LOC]"


Checkpoint

You are building a machine translation system that reads a full English sentence and produces a full French sentence. The output length is different from the input length. Which RNN architecture type is most appropriate?