LeNet to AlexNet: From Proof of Concept to Revolution

LeNet: The First Proof of Concept (1998)

The story of CNN architectures begins, perhaps surprisingly, well before the deep learning boom. Yann LeCun and colleagues developed LeNet, a convolutional neural network designed to read handwritten digits, in a line of work that started around 1989 and culminated in LeNet-5 in 1998. Banks wanted to read handwritten checks automatically, the US Postal Service wanted to read zip codes on envelopes, and LeNet delivered.

LeNet Architecture [Source]

The architecture itself is straightforward by modern standards: an input layer, a couple of convolutional stages that each produce feature maps and then subsample them, and a pair of fully connected layers. But it worked. It worked in a real operational context, at scale, long before the term "deep learning" was in common use.
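To make the stage-by-stage structure concrete, here is a minimal sketch that traces how the feature map size shrinks through a LeNet-style network. The specific numbers (a 32x32 grayscale input, 5x5 kernels, 6 and then 16 feature maps, 2x2 subsampling) are assumptions taken from the commonly cited LeNet-5 configuration, not from this text, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a square convolution or subsampling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_lenet(input_size=32):
    """Return (stage name, channels, spatial size) for each stage."""
    stages = [("input", 1, input_size)]
    size = input_size
    # (name, output channels, kernel, stride): assumed LeNet-5-style values
    layout = [
        ("conv1", 6, 5, 1),
        ("subsample1", 6, 2, 2),
        ("conv2", 16, 5, 1),
        ("subsample2", 16, 2, 2),
    ]
    for name, channels, kernel, stride in layout:
        size = conv_out(size, kernel, stride)
        stages.append((name, channels, size))
    return stages


LENET_STAGES = trace_lenet()
# Number of values handed to the fully connected layers after the last stage
flattened = LENET_STAGES[-1][1] * LENET_STAGES[-1][2] ** 2

for name, channels, size in LENET_STAGES:
    print(f"{name:<11} {channels:>2} x {size} x {size}")
print("features fed to the fully connected layers:", flattened)
```

Under these assumptions the spatial size goes 32, 28, 14, 10, 5, so the fully connected layers see 16 x 5 x 5 = 400 values. Each convolution trades spatial resolution for more feature maps, which is the pattern later architectures scale up.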

The irony is that despite this success, neural networks were not considered serious technology at the time. The machine learning community largely viewed them as curiosities, theoretically interesting but impractical.

LeNet (1998): the simple architecture worked surprisingly well for the task of reading handwritten digits. [Source: Yann LeCun]

ImageNet and the Conditions for a Revolution

Fei-Fei Li, then a young professor at Princeton who later moved to Stanford, had a vision: if the field wanted to push past its limitations, it needed more data. Not a few thousand images. Millions. She and her collaborators constructed ImageNet, a massive, human-labeled dataset that grew to more than 14 million images collected from the internet, organized into thousands of categories.

To accelerate community progress, they launched the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2010, an annual competition in which teams trained on a 1,000-category subset of ImageNet, were scored on a held-out test set, and had their results posted to a public leaderboard. Competition, as always, fueled innovation.

TED Talk by Fei-Fei Li on ImageNet from 2015

AlexNet and the Moment Everything Changed (2012)

In 2012, something unexpected happened. Alex Krizhevsky, working with Ilya Sutskever and Geoffrey Hinton at the University of Toronto, entered a deep convolutional neural network now called AlexNet and won the ILSVRC challenge by a margin that shocked the community. Its top-5 error rate was about 15.3%, compared with roughly 26.2% for the best non-neural entry: a gap of more than ten percentage points in a competition where gains were usually far smaller.
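The error rate usually quoted for the ILSVRC classification task is the top-5 error: a prediction counts as correct if the true label appears anywhere among the model's five highest-ranked guesses. The sketch below shows that calculation on made-up toy data; the function name and the labels are hypothetical and are not taken from the challenge's own evaluation code.

```python
def top5_error(ranked_predictions, true_labels):
    """Fraction of examples whose true label is not among the first five guesses."""
    if not true_labels or len(ranked_predictions) != len(true_labels):
        raise ValueError("need one non-empty ranked list per true label")
    misses = sum(
        1
        for guesses, label in zip(ranked_predictions, true_labels)
        if label not in guesses[:5]
    )
    return misses / len(true_labels)


# Toy data: each row is a model's guesses, ordered from most to least confident.
predictions = [
    ["cat", "dog", "fox", "wolf", "lynx", "bear"],
    ["car", "bus", "van", "truck", "tram", "bike"],
    ["oak", "elm", "fir", "ash", "yew", "pine"],
    ["cup", "mug", "bowl", "jar", "pot", "pan"],
]
labels = ["lynx", "bike", "oak", "pan"]

error = top5_error(predictions, labels)
print(f"top-5 error: {error:.2f}")
```

Here "lynx" and "oak" fall within the first five guesses while "bike" and "pan" are ranked sixth, so two of the four examples count as errors.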

AlexNet was essentially a deeper, wider version of LeNet: more convolutional layers, more filters, trained on GPUs for the first time at this scale. Its success demonstrated something important: neural networks work very well when you have enough data and enough compute to train on it. The main bottleneck had not been the architecture. It had been the data.
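A rough way to see "deeper and wider" is to compare the convolutional layers of the two networks side by side. The layer sizes below are assumptions based on commonly cited descriptions of LeNet-5 and AlexNet rather than anything stated in this text. The sketch also treats every convolution as fully connected across channels, which ignores LeNet-5's partial connection scheme and AlexNet's split across two GPUs, so the parameter counts are approximations.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases for one square convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


# (kernel size, input channels, output channels): assumed values
LENET_CONVS = [(5, 1, 6), (5, 6, 16)]
ALEXNET_CONVS = [
    (11, 3, 96),
    (5, 96, 256),
    (3, 256, 384),
    (3, 384, 384),
    (3, 384, 256),
]


def summarize(convs):
    """Depth, maximum width, and parameter count of a stack of conv layers."""
    return {
        "conv_layers": len(convs),
        "widest_layer": max(out for _, _, out in convs),
        "conv_params": sum(conv_params(*layer) for layer in convs),
    }


lenet = summarize(LENET_CONVS)
alexnet = summarize(ALEXNET_CONVS)
print("LeNet-style:  ", lenet)
print("AlexNet-style:", alexnet)
print("parameter ratio:", round(alexnet["conv_params"] / lenet["conv_params"]))
```

Under these assumptions the convolutional part grows from 2 layers and a few thousand parameters to 5 layers and a few million, which helps explain why training on GPUs mattered.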

AlexNet Architecture [Source]

The ImageNet Moment

The 2012 ILSVRC is often cited as the starting gun for the modern deep learning era. Research funding flooded in. Academics who had been dismissed for working on neural networks suddenly found themselves in high demand at technology companies. The field's self-perception changed overnight.