Data Drift and Model Drift

Challenge 4: Data Drift and Model Drift

While data and model drift are general problems that apply to all ML systems, generative systems are particularly vulnerable to these issues because the distribution of inputs (prompts, conditioning data) and the distribution of expected outputs can both drift over time. A model trained on 2022 image aesthetics may feel obviously dated in 2026. A text-to-image model fine-tuned on product photos from one year's catalog may produce off-brand results when the visual language of the brand evolves.

Figure: Data Drift Over Time. Student arrival times relative to the class start time (0).

The drift analogy

Just as the student arrival distribution shifts over a semester, a deployed model's input distribution shifts as user behavior, language, or world events evolve. The model was trained on Week 1 data; by Week 12, it is operating out of distribution.

Student arrival times drift over a semester — the same phenomenon that causes deployed ML models to degrade as their input distribution shifts away from training data.

Mitigation: Continuous monitoring

Set up both data drift detection (has the input distribution moved away from the training data?) and model drift detection (has output quality degraded?). We covered both in the Recommendation Systems chapter, and the principles transfer directly. Decide on a retraining cadence in advance. Don't wait until a customer complaint reveals that your model has been quietly degrading for six months.
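One common way to detect data drift on a single numeric feature is a two-sample Kolmogorov-Smirnov (KS) statistic, which measures the largest gap between the empirical distributions of a reference sample and a recent sample. The sketch below is a minimal pure-Python illustration, not a production monitor. The function names, the arrival-time values, and the 0.3 alert threshold are all assumptions made up for this example; a real threshold should be tuned on your own data.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # Empirical CDFs only change at observed values, so checking those points suffices.
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)  # fraction of reference values <= x
        cdf_cur = bisect_right(cur, x) / len(cur)  # fraction of current values <= x
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.3):
    """Flag drift when the KS statistic exceeds a threshold (0.3 is a hypothetical value)."""
    return ks_statistic(reference, current) > threshold


# Hypothetical arrival times in minutes relative to class start (0), echoing the analogy.
week1_arrivals = [-10, -8, -5, -5, -3, -2, 0, 1]
week12_arrivals = [-2, 0, 1, 3, 5, 6, 8, 10]

if has_drifted(week1_arrivals, week12_arrivals):
    print("Input distribution has drifted; consider retraining.")
```

A fixed threshold keeps the example short. If SciPy is available, `scipy.stats.ks_2samp` computes the same statistic along with a p-value, which is usually a better basis for an alert than a hand-picked cutoff.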

For generative models specifically: track your evaluation metrics (such as FID and semantic similarity) on a consistent held-out evaluation set over time. Because a lower FID is better, a rising FID curve is an early warning sign even before user complaints arrive.
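The "rising FID curve" check can be automated with a simple comparison between early and recent evaluation runs. The sketch below assumes FID is already computed elsewhere and logged once per evaluation run. The function name `fid_regressed`, the window sizes, the 10% tolerance, and the scores are hypothetical choices for illustration only.

```python
from statistics import mean


def fid_regressed(fid_history, baseline_runs=3, recent_runs=3, tolerance=0.10):
    """Return True if the mean FID of the latest runs exceeds the baseline mean
    by more than `tolerance` (relative). Lower FID is better, so a rise is a regression."""
    if len(fid_history) < baseline_runs + recent_runs:
        return False  # not enough history to compare two separate windows
    baseline = mean(fid_history[:baseline_runs])   # earliest runs, e.g. right after deployment
    recent = mean(fid_history[-recent_runs:])      # most recent runs
    return recent > baseline * (1 + tolerance)


# Hypothetical FID scores from periodic evaluations on the same held-out set.
stable_scores = [12.0, 12.2, 11.9, 12.1, 12.0, 12.3, 11.8]
rising_scores = [12.0, 12.2, 11.9, 12.1, 13.5, 14.0, 14.6]

if fid_regressed(rising_scores):
    print("FID is trending upward; investigate before users notice.")
```

Averaging over a window rather than comparing single runs makes the alert less sensitive to run-to-run noise in the metric.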
