Inception, DenseNet, and the Efficiency Era

Inception: Why Choose One Filter Size? (2014)

VGGNet committed to 3 × 3 filters everywhere. Inception (GoogLeNet) took a different philosophical stance: you do not have to choose a single filter size per layer. What if you applied multiple filter sizes in parallel and let the network decide which representations are useful?

An Inception module applies 1 × 1, 3 × 3, and 5 × 5 convolutions simultaneously, along with max pooling, and concatenates all the results along the channel dimension. The 1 × 1 convolutions serve as bottlenecks, reducing channel dimensionality before the more expensive 3 × 3 and 5 × 5 operations — a trick that makes the module surprisingly parameter-efficient.

The result: at the 2014 ILSVRC challenge, Inception achieved state-of-the-art results with 12× fewer parameters than AlexNet and better accuracy. This made it not just an academic success but a practical one — smaller models deploy faster and cheaper.

InceptionV3 architecture
Inception v3 Architecture [Source]

Hover over any component to learn what it does

Input1×1 Conv1×1 Conv1×1 ConvMax Pool3×3 Conv5×5 Conv1×1 ConvFilter ConcatenationOutput

The Inception module applies multiple filter sizes in parallel, letting the network learn at different scales simultaneously. Hover each component to explore its role.

DenseNet: Every Layer Connected to Every Other Layer

DenseNet takes the skip connection idea from ResNet and radicalizes it: every layer is connected to every other layer. In a network with L layers, there are L × (L+1) / 2 direct connections. Feature maps from layer 1 flow directly to layer 5. Feature maps from layer 3 flow directly to layer 7. All of them.

The benefits: alleviates vanishing gradient more aggressively than ResNet, and allows feature reuse throughout the entire network. The cost: significantly higher memory usage and a more complex hyperparameter landscape. DenseNet is powerful but demanding to train well.

DenseNet architecture
DenseNet Architecture [Source]

SqueezeNet: Doing More With Less

SqueezeNet is a masterclass in efficiency. By introducing a "squeeze" layer (1 × 1 convolutions to reduce channels) followed by an "expand" layer (a mix of 1 × 1 and 3 × 3 convolutions), SqueezeNet achieves competitive accuracy with dramatically fewer parameters than AlexNet — making it particularly useful for deployment on resource-constrained devices.

SqueezeNet architecture
SqueezeNet Architecture [Source]

Real World: Mobile and Edge Deployment

SqueezeNet and MobileNet-family models are designed with mobile devices and IoT hardware in mind. When you are running inference on a smartphone or a factory sensor, model size and latency matter as much as accuracy. The architectural innovations of this era — bottleneck convolutions, depthwise separable convolutions — all reflect the pressure to do more with less compute.

You Don't Need to Implement These From Scratch

All of the architectures described in this chapter — AlexNet, VGGNet, ResNet variants, Inception, DenseNet, SqueezeNet — are available in PyTorch's Model Zoo and TensorFlow's Model Garden, often with pre-trained weights included. Transfer learning (the next chapter) makes this directly practical.

💭Reflection

You are deploying a computer vision model to a microcontroller in an industrial sensor that has 512 KB of RAM and no GPU. Which architectural principles from this chapter would most directly guide your model selection?