Hidden Markov Models
A Hidden Markov Model is a way to model sequences when the thing you actually care about is unobserved.
Picture two weather states: sunny and rainy. From either state, there is some probability of moving to each state in the next hour: sunny → sunny might be 70%, sunny → rainy might be 30%, and so on. This is a Markov chain. The "Markov" part means the next state depends only on the current state, with no memory of earlier history.
Now imagine you're locked in a windowless restaurant. You can't see the weather. But you can see what customers order: ice cream or hot soup. If sunny weather makes people order ice cream 80% of the time, and rainy weather makes them order soup 85% of the time, then by watching orders over time you can infer the weather. The weather is the hidden state; the food order is the observation.
That's an HMM. The hidden state evolves as a Markov chain. The observations depend on the hidden state via emission probabilities.
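In symbols, the two ingredients combine into a single joint probability. The notation here is an assumption, not something the text above defines: $z_t$ is the hidden weather at hour $t$, $x_t$ is the order observed at hour $t$, and $\pi$ is an initial distribution over the weather, which the restaurant example leaves unspecified.

```latex
P(x_{1:T}, z_{1:T}) \;=\; \pi(z_1)\, P(x_1 \mid z_1) \prod_{t=2}^{T} P(z_t \mid z_{t-1})\, P(x_t \mid z_t)
```

Each factor $P(z_t \mid z_{t-1})$ is a transition probability and each factor $P(x_t \mid z_t)$ is an emission probability.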
Transition probabilities
How the weather changes hour to hour
| From ↓ / To → | ☀️ sunny | 🌧️ rainy |
|---|---|---|
| ☀️ sunny | 0.70 | 0.30 |
| 🌧️ rainy | 0.40 | 0.60 |
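To make the table concrete, here is a minimal sketch that samples an hour-by-hour weather sequence from it. The names `TRANSITIONS` and `simulate_weather` are illustrative, and the starting state and random seed are arbitrary choices.

```python
import random

# Transition probabilities copied from the table above.
TRANSITIONS = {
    "sunny": {"sunny": 0.70, "rainy": 0.30},
    "rainy": {"sunny": 0.40, "rainy": 0.60},
}


def simulate_weather(start, hours, rng):
    """Sample a weather sequence of length `hours`, starting from `start`."""
    state = start
    path = [state]
    for _ in range(hours - 1):
        # The Markov property: only the current state is consulted.
        options = TRANSITIONS[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path


rng = random.Random(0)  # fixed seed (arbitrary) so reruns give the same sequence
path = simulate_weather("sunny", 10, rng)
print(path)
```

Over a long run, the fraction of sunny hours should settle near 4/7 (about 57%), which is the stationary distribution implied by this particular table.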
Emission probabilities
What customers order given the weather
| Weather | 🍦 ice cream | 🍲 hot soup |
|---|---|---|
| ☀️ sunny | 0.80 | 0.20 |
| 🌧️ rainy | 0.15 | 0.85 |
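With both tables in hand, "watching orders to infer the weather" can be sketched as a running belief update (the filtering half of the forward algorithm). This is a sketch under one assumption the tables do not cover: the weather at the first hour is taken to be 50/50. The names `START`, `TRANS`, `EMIT`, and `filter_weather` are illustrative.

```python
STATES = ("sunny", "rainy")

# ASSUMPTION: uniform initial weather; the tables above do not specify one.
START = {"sunny": 0.5, "rainy": 0.5}

# Copied from the transition and emission tables above.
TRANS = {
    "sunny": {"sunny": 0.70, "rainy": 0.30},
    "rainy": {"sunny": 0.40, "rainy": 0.60},
}
EMIT = {
    "sunny": {"ice cream": 0.80, "hot soup": 0.20},
    "rainy": {"ice cream": 0.15, "hot soup": 0.85},
}


def filter_weather(orders):
    """Return P(weather at hour t | orders up to hour t) for every hour t."""
    beliefs = []
    belief = None
    for order in orders:
        if belief is None:
            predicted = START
        else:
            # Predict: push the previous belief through the transition table.
            predicted = {
                s: sum(belief[p] * TRANS[p][s] for p in STATES) for s in STATES
            }
        # Update: weight each state by how well it explains the order.
        unnormalized = {s: predicted[s] * EMIT[s][order] for s in STATES}
        total = sum(unnormalized.values())
        belief = {s: unnormalized[s] / total for s in STATES}
        beliefs.append(belief)
    return beliefs


beliefs = filter_weather(["ice cream", "hot soup", "hot soup"])
for hour, belief in enumerate(beliefs, start=1):
    print(hour, {s: round(p, 3) for s, p in belief.items()})
```

After a single ice cream order the belief in sunny weather is 0.4 / 0.475, roughly 0.84; each later order shifts it further toward whichever state explains the order better.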
Given a sequence of observed orders, the Viterbi algorithm uses these two tables to infer the most likely sequence of hidden weather states.
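A minimal sketch of Viterbi decoding for the restaurant example follows. It assumes a uniform initial weather distribution (the tables do not give one) and a non-empty list of orders. The names `START`, `TRANS`, `EMIT`, and `viterbi` are illustrative.

```python
STATES = ("sunny", "rainy")

# ASSUMPTION: uniform initial weather; not specified by the tables.
START = {"sunny": 0.5, "rainy": 0.5}

# Copied from the transition and emission tables.
TRANS = {
    "sunny": {"sunny": 0.70, "rainy": 0.30},
    "rainy": {"sunny": 0.40, "rainy": 0.60},
}
EMIT = {
    "sunny": {"ice cream": 0.80, "hot soup": 0.20},
    "rainy": {"ice cream": 0.15, "hot soup": 0.85},
}


def viterbi(orders):
    """Return the most likely weather sequence for a non-empty list of orders."""
    # best[s] = probability of the best path so far that ends in state s.
    best = {s: START[s] * EMIT[s][orders[0]] for s in STATES}
    backpointers = []
    for order in orders[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in STATES:
            # Pick the predecessor that makes arriving in s most probable.
            prev = max(STATES, key=lambda p: prev_best[p] * TRANS[p][s])
            pointers[s] = prev
            best[s] = prev_best[prev] * TRANS[prev][s] * EMIT[s][order]
        backpointers.append(pointers)
    # Trace the winning path backwards from the best final state.
    path = [max(STATES, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


decoded = viterbi(["ice cream", "ice cream", "hot soup", "hot soup"])
print(decoded)  # ['sunny', 'sunny', 'rainy', 'rainy']
```

For long sequences the raw products underflow toward zero, so practical implementations usually add log-probabilities instead of multiplying probabilities.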
In NLP, the hidden states are often parts of speech (noun, verb, adjective), and the observations are the words themselves. HMMs were used in speech recognition and part-of-speech tagging for many years.
A great way to feel the limits of these models is to train an HMM to generate text and read what it spits out. Output trained on Jane Austen's Pride and Prejudice is grammatical-looking but utterly nonsensical — full of dangling references and characters appearing where they shouldn't. It's actually quite funny if you've read the book. It's also a perfect demonstration of why HMMs are not good at text generation.
Example: HMM Text Generation Trained on Pride and Prejudice
Input word: Love
conveyed him and his five cousins at a suitable hour to Meryton and the girls may go or you may send them by themselves you know " Elizabeth was distressed She felt that Jane's feelings she is not half so handsome as Jane nor half so good humoured as Lydia But you are always giving her the preference " " They have none of them do we " Darcy had walked away to another part of the business On finding Mrs Bennet Elizabeth and one of its narrowest parts They crossed it by a simple bridge in character with the general air of the scene it was a spot less adorned than any they had yet visited and the valley here contracted into a glen allowed room only for the stream and a narrow walk amidst the rough coppice wood which bordered it Elizabeth longed to observe that Mr Bingley had been a most delightful friend so easily guided that his worth was invaluable but she checked herself She remembered that he had but just courage enough to make her former assurance of her sister's ready acquiescence " I hope " said she " Your cousin's conduct does not suit my feelings Why was he to be the judge " " You are then resolved to have him " " I have two small favours to request First that you will always exceed your income " " I hope not so Imprudence or thoughtlessness in money matters would be unpardonable in me " " Exceed their income My dear Mr Bennet But I knew not I was afraid of doing too much Wretched wretched mistake " Darcy made no answer and seemed desirous of changing the subject At that moment Sir William Lucas appeared close to them meaning to pass
Why HMMs Eventually Fall Short
HMMs inherit the fundamental constraint of the Markov assumption: the next state depends only on the current state. Everything the model remembers about the past has to be squeezed into that single hidden state, so long-range dependencies are effectively lost; a dependency that spans 20 words is practically invisible to an HMM. This hints at the challenge that RNNs will address next, and that transformers go much further toward solving.
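One way to see how quickly the past fades is to reuse the sunny/rainy transition table from earlier and compute the n-step transition probabilities. This is an illustrative sketch for that one chain, not a general proof; `n_step` and `matmul` are hypothetical helper names.

```python
# Transition table from the weather example: rows are "from", columns are "to",
# in the order (sunny, rainy).
TRANS = [
    [0.70, 0.30],
    [0.40, 0.60],
]


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def n_step(trans, n):
    """Return the matrix of n-step transition probabilities (trans to the power n)."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity: zero steps changes nothing
    for _ in range(n):
        result = matmul(result, trans)
    return result


for n in (1, 5, 20):
    p = n_step(TRANS, n)
    # P(sunny after n hours | sunny now) versus P(sunny after n hours | rainy now)
    print(n, p[0][0], p[1][0])
```

After one step the two starting states give clearly different forecasts (0.70 versus 0.40). After 20 steps both rows are numerically indistinguishable from 4/7, so the state from 20 steps back tells the model essentially nothing.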