From Predicting to Creating

Everything we've done up to this point has lived in what we call discriminative machine learning. We taught models to decide: is this a cat or a dog? Is this review positive or negative? Will this user like this movie? This chapter is about teaching machines to generate something new(ish).

Discriminative vs. Generative

Most of the models we've trained in this course work like this: we have a corpus of training data consisting of observations and labels. We hand that to a model that learns to map observations to labels. Show it an image, and it predicts "cat." Show it a sentence, and it predicts "positive sentiment." Mathematically, we're estimating P(y | x) — the probability of a label given the data. We're learning boundaries between classes, or relationships between variables.

In generative machine learning, the training data can be just a collection of observations, with no labels required. A typical model takes random noise as input and produces a sample that looks as though it were drawn from the distribution of the training data. Mathematically, we're estimating P(x), the distribution of the data itself, or P(x | y) when generation is conditioned on a label or prompt y. Either way, we're learning the underlying data distribution rather than a boundary through it.

A discriminative model that's seen a million cat photos can tell you whether a new photo contains a cat. A generative model that's seen a million cat photos can draw you a new cat.
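To make the contrast concrete, here is a minimal sketch on toy data, not a recipe for real images. It assumes two 2-D Gaussian blobs standing in for "cat" and "dog", fits a small logistic regression as the discriminative model (estimating P(y | x)), and fits one Gaussian per class as the generative model (estimating P(x | y)). The names `predict_proba` and `sample` are illustrative choices, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled data (an assumption for illustration): two 2-D blobs,
# class 0 standing in for "cat" and class 1 for "dog".
n = 500
x0 = rng.normal(loc=[-2.0, 0.0], scale=1.0, size=(n, 2))
x1 = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(n, 2))
X = np.vstack([x0, x1])
y = np.concatenate([np.zeros(n), np.ones(n)])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Discriminative: logistic regression trained by gradient descent.
# It only learns P(y | x), i.e. where the boundary between classes lies.
w = np.zeros(2)
b = 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)


def predict_proba(x):
    """Estimated P(y = 1 | x) for each row of x."""
    return sigmoid(x @ w + b)


# Generative: fit a Gaussian to each class, an estimate of P(x | y).
# Because it models the data itself, we can draw brand-new points from it.
means = {c: X[y == c].mean(axis=0) for c in (0, 1)}
covs = {c: np.cov(X[y == c], rowvar=False) for c in (0, 1)}


def sample(c, size):
    """Draw new observations from the fitted P(x | y = c)."""
    return rng.multivariate_normal(means[c], covs[c], size=size)


accuracy = np.mean((predict_proba(X) > 0.5) == y)
new_points = sample(1, 5)
print("training accuracy of the discriminative model:", accuracy)
print("five new class-1 points from the generative model:")
print(new_points)
```

The discriminative model can only answer questions about a point you hand it. The generative one can produce points you never gave it, which is the same gap as the one between recognizing a cat and drawing one.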

Discriminative vs. Generative Learning

P(y | x): Probability of label given data

Input: Data x. An observation: an image, a sentence, a user interaction.
Output: Label y. A prediction: a class, a score, a decision.
What it learns: Decision boundaries between classes. Relationships between variables.
The cat example: Show it 1M cat photos → it can tell you if a new photo contains a cat.

The field is older than you think

In the spring of 2022, OpenAI released DALL-E 2 — not as an API or web interface, but on Instagram! You'd DM them a prompt and they'd send back an image. A lot of people walked away thinking generative image AI was new, but it wasn't.

2014–2017: GAN-generated human faces went from blurry, distorted patches of skin to photorealistic portraits indistinguishable from real photos.
2016: Image-to-image translation — sketches to filled-in images, daytime to nighttime scenes.
2016: Face-to-emoji, photo editing, rain removal, age progression, super-resolution, inpainting, 3D object generation, video frame generation.
2017: Text-to-image translation — generating bird and flower images from text descriptions.

The 2022 explosion was the result of a long, steady accumulation of research. Knowing this history matters because it tells you which problems are genuinely hard, which "new" techniques are old ideas with more compute, and where the field might go next.

2014: GANs Introduced. Goodfellow et al. publish the original GAN paper. Generated face images are blurry and clearly artificial, but the concept of adversarial training is born.
2015–2016: Conditional GANs & Image Translation. Conditioning inputs unlock day-to-night transfer, sketch-to-photo, colorization, and the first text-to-image systems for birds and flowers.
2017: ProgressiveGAN. Generated human faces become photorealistic. The gap between real and generated closes dramatically. The research community starts paying serious attention.
2019–2020: VAE-GAN Hybrids & StyleGAN. StyleGAN introduces style-based control over generated faces. VAE-GAN hybrids combine smooth latent spaces with sharp outputs.
2020–2021: Diffusion Models Emerge. Ho et al. formalize Denoising Diffusion Probabilistic Models. Early results quietly outperform GANs on image quality benchmarks.
2022: DALL-E 2, Stable Diffusion, Midjourney. Latent diffusion models reach the public. Text-to-image generation becomes widely accessible. The mainstream moment for generative AI.
2023–present: Multimodal & Video Generation. Diffusion applied to video (Sora), audio (AudioLDM), and 3D. Models generate consistent long-form video and coherent multi-turn creative content.

Traditional Approaches (Briefly)

Before neural networks took over, generative modeling lived in the world of Hidden Markov Models and Naive Bayes. We touched on HMMs back in the NLP unit. Remember the weather example? You can train an HMM on a text corpus to generate next words. We did this with Pride and Prejudice.

The result is... technically English. The model strings together plausible word transitions, but there's no global coherence, no characters, no plot, no actual meaning. Traditional approaches are extremely limited at generation. That's why neural networks took over so completely.
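To see why the output has no global coherence, it helps to look at how little the model actually tracks. What follows is a minimal sketch, and a deliberate simplification: a plain first-order Markov chain over words (each next word depends only on the current one), not a full HMM with hidden states and not necessarily the setup used for the sample in these notes. The function names `build_transitions` and `generate` are hypothetical, and the tiny corpus is just the opening sentence of the novel with punctuation stripped.

```python
import random
from collections import defaultdict


def build_transitions(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    transitions = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        transitions[current].append(nxt)
    return transitions


def generate(transitions, start, length, seed=0):
    """Walk the chain: repeatedly sample a successor of the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = transitions.get(out[-1])
        if not followers:
            # Dead end: this word never has a successor in the corpus.
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Toy corpus (an assumption for illustration): one sentence, lowercased.
corpus = (
    "it is a truth universally acknowledged that a single man in possession "
    "of a good fortune must be in want of a wife"
)
transitions = build_transitions(corpus)
print(generate(transitions, start="a", length=12))
```

Every adjacent pair of words in the output appears somewhere in the corpus, which is why the text reads as locally plausible. Nothing in the model remembers anything further back than one word, so sentences drift from topic to topic.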

Example: HMM Text Generation Trained on Pride and Prejudice

Input word: Love

conveyed him and his five cousins at a suitable hour to Meryton and the girls may go or you may send them by themselves you know" Elizabeth was distressed She felt that Jane's feelings she is not half so handsome as Jane nor half so good humoured as Lydia But you are always giving her the preference" "They have none of them do we" Darcy had walked away to another part of the business On finding Mrs Bennet Elizabeth and one of its narrowest parts They crossed it by a simple bridge in character with the general air of the scene it was a spot less adorned than any they had yet visited and the valley here contracted into a glen allowed room only for the stream and a narrow walk amidst the rough coppice wood which bordered it Elizabeth longed to observe that Mr Bingley had been a most delightful friend so easily guided that his worth was invaluable but she checked herself She remembered that he had but just courage enough to make her former assurance of her sister's ready acquiescence "I hope" said she "Your cousin's conduct does not suit my feelings Why was he to be the judge" "You are then resolved to have him" "I have two small favours to request First that you will always exceed your income" "I hope not so Imprudence or thoughtlessness in money matters would be unpardonable in me" "Exceed their income My dear Mr Bennet But I knew not I was afraid of doing too much Wretched wretched mistake" Darcy made no answer and seemed desirous of changing the subject At that moment Sir William Lucas appeared close to them meaning to pass

Checkpoint

A discriminative model and a generative model are both trained on the same dataset of labeled handwritten digits (0–9). Which probability distribution does each one estimate?
