Unit 5
Generative AI
Master the three dominant generative frameworks — GANs, VAEs, and diffusion models — from first principles through practical application, including architecture selection, evaluation, and the failure modes that surface in deployed systems.
Chapter 1
Introduction to Generative AI
Everything we've built in this course so far has been discriminative machine learning: models that decide. This chapter pivots to generative machine learning: models that create. We define the probability distributions each framework estimates, trace the history of generative AI from the first GAN experiments in 2014 to the public explosion of 2022, and contrast neural generative models with the more limited traditional approaches that preceded them.
Chapter 2
Generative Adversarial Networks
GANs pit two neural networks against each other in a zero-sum game: a generator that creates fake samples and a discriminator that tries to tell real from fake. When training succeeds, the generator learns to produce samples that the discriminator can no longer distinguish from real data. This chapter covers the two-network setup, the min-max training objective from Goodfellow et al. (2014), the common training pathologies (vanishing gradients, mode collapse, instability), and the conditional GAN extensions that unlock text-to-image, image-to-image, and super-resolution applications.
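For reference, the min-max objective mentioned above can be written as follows. This uses the notation of Goodfellow et al. (2014): D is the discriminator, G the generator, p_data the data distribution, and p_z the noise prior that the generator samples from.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones; the generator minimizes the same quantity, which is what makes the game zero-sum.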
Chapter 3
Variational Autoencoders
An autoencoder compresses data to a latent vector and reconstructs it, but nothing forces the latent space to be well organized, so decoding a randomly sampled latent point usually produces garbage. VAEs fix this by encoding inputs as distributions rather than points, and by adding a KL divergence regularization term that shapes the latent space to be continuous and complete. This chapter builds from the vanilla autoencoder's failure mode to the full VAE architecture, the two-term loss function (reconstruction error plus KL divergence), and the real-world applications where VAEs outshine GANs: anomaly detection, controllable generation, and drug discovery.
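A minimal sketch of the two-term loss described above, in NumPy. It assumes a diagonal Gaussian encoder output (`mu`, `log_var`), a standard normal prior, and squared error as the reconstruction term; the function name `vae_loss` and the `beta` weight are illustrative choices, not something the chapter prescribes.

```python
import numpy as np

def vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Return (total, reconstruction, kl), each averaged over the batch.

    Assumptions (illustrative): squared-error reconstruction, diagonal
    Gaussian posterior N(mu, exp(log_var)), standard normal prior N(0, I).
    """
    # Term 1: how well the decoder output x_hat matches the input x.
    recon = np.sum((x - x_hat) ** 2, axis=1)
    # Term 2: closed-form KL divergence between N(mu, sigma^2) and N(0, I).
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    total = recon + beta * kl
    return float(np.mean(total)), float(np.mean(recon)), float(np.mean(kl))
```

The KL term is zero only when the encoder outputs exactly the prior (`mu = 0`, `log_var = 0`), so it pulls every encoded distribution toward the same region of latent space, which is what makes random sampling meaningful.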
Chapter 4
Transformers in Generative Systems
Transformers weren't designed exclusively for generation, but they've become its backbone, from decoder-only LLMs (GPT, Claude, Gemini) to the text encoders inside image diffusion models. This chapter revisits the three properties that make transformers so effective (attention, parallelism, transfer learning), reviews causal masking and autoregressive decoding, and previews how the transformer text encoder becomes a key component of the diffusion model architecture in the next chapter.
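Causal masking can be illustrated with a small single-head attention sketch in NumPy. The function name and the use of raw `q`, `k`, `v` matrices (no learned projections, no batching) are simplifying assumptions made for illustration only.

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (seq_len, d). Position i may attend only to
    positions 0..i, which is what permits autoregressive decoding.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True above the diagonal marks "future" positions to be hidden.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row; masked entries become 0.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

Because the first position can see only itself, its output is exactly `v[0]`; each later position mixes only the values at or before it.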
Chapter 5
Diffusion Models
This chapter covers the forward and reverse diffusion processes, then builds up the latent diffusion architecture (Rombach et al., 2022), showing how autoencoders, U-Nets, transformer text encoders, and cross-attention integrate into the design behind Stable Diffusion, part of the same diffusion-based family as image generators such as DALL-E and Midjourney.
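The forward process can be sketched in a few lines of NumPy. This assumes the closed-form noising step and linear noise schedule commonly associated with DDPM-style models; the schedule values, step count, and function names are illustrative assumptions, not taken from this unit. In a latent diffusion model, `x0` would be the autoencoder's latent rather than raw pixels.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal-retention factors for an assumed linear schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Jump directly from clean x0 to noisy x_t.

    x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps
    Returns x_t and the noise eps, which a denoiser is trained to predict.
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps
```

Early steps leave `x0` almost intact, while the final steps are close to pure Gaussian noise; the reverse process learns to undo this one step at a time.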
Chapter 6
Choosing, Evaluating, and Deploying
Picking the right generative architecture requires matching the model's strengths to the problem's requirements: GANs for visual quality, VAEs for structured latent spaces and anomaly detection, diffusion for state-of-the-art text-to-image. This chapter also discusses evaluation metrics: Inception Score, Fréchet Inception Distance, reconstruction error, and semantic similarity, explaining what each metric measures, where it falls short, and when to use it.
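As a concrete reference for one of these metrics, Fréchet Inception Distance is usually defined as the Fréchet distance between two Gaussians fitted to Inception-network features of real and generated images. The symbols below are assumed notation: \mu_r, \Sigma_r are the feature mean and covariance for real images, and \mu_g, \Sigma_g the same for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower is better, and identical feature statistics give a score of zero. Because the score depends only on the first two moments of the features, two sample sets can score similarly while differing in ways those statistics do not capture.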
Chapter 7
Challenges and Mitigation Strategies
Generative models are powerful and fragile in equal measure. This chapter covers the five major failure modes you'll encounter in production: training data bias, GAN mode collapse, VAE overfitting, data and model drift, and adversarial attacks — with specific, actionable mitigation strategies for each.