Deep Learning
The word "deep" in deep learning refers specifically to the number of layers through which data is transformed. More layers allow the network to learn more complex, abstract representations of the input. A shallow network might learn to detect edges in an image. A deeper network might learn to detect edges, then shapes, then object parts, then objects — each layer building on the abstractions of the one before it.
Here is a way to think about the AI landscape as a set of nested subsets:
- AI is the broad umbrella: any technique that makes machines appear intelligent, including knowledge bases, rule systems, and search algorithms.
- Machine Learning is a subset of AI: algorithms that improve through experience. Logistic regression, decision trees, support vector machines.
- Representation Learning is a subset of machine learning: algorithms that automatically discover the features they need. Shallow autoencoders.
- Deep Learning is a subset of representation learning: algorithms that discover hierarchical features across multiple layers of transformation. Multi-layer perceptrons, convolutional neural networks, Transformers.
The defining characteristic of deep learning — and the reason it has revolutionized fields from computer vision to natural language processing — is that it automatically discovers and learns the features it needs from raw data. You do not need to hand-engineer features. The network figures it out.
When Should You Use Deep Learning?
Use deep learning when…
- You have a large amount of training data (tens of thousands to millions of examples)
- Your input has high dimensionality or is unstructured (images, text, audio, video)
- You suspect complex, non-linear relationships between your inputs and outputs
- Explainability is not your primary concern
Consider traditional machine learning when…
- You have limited training data
- Your features are well-understood and can be hand-engineered
- Interpretability is important (healthcare, finance, legal)
- Computational resources are constrained
The Costs of Going Deep
Neural networks are not free. A 28×28 image flattens to 784 input values, so a modest fully connected network whose first hidden layer has 1,000 units already carries 784 × 1,000 = 784,000 weights in that layer alone. The computational cost of training state-of-the-art models has grown even faster: by one widely cited estimate, it doubled roughly every three to four months after 2012, a 300,000× increase in about six years. Training is also more challenging than with traditional methods: the error surface is non-convex, so you are not guaranteed to find the global minimum. And deep networks are notorious for overfitting, especially when training data is limited.
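As a sanity check on those numbers, here is a short sketch that tallies the weights of a fully connected network (the 1,000-unit hidden layer is an assumption chosen to match the figure above):

```python
def count_weights(layer_sizes):
    # One weight per input-output pair between consecutive layers
    # (biases omitted to keep the tally to weights alone).
    return sum(n_in * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# 28x28 image flattened to 784 inputs, a 1,000-unit hidden layer, 10 outputs:
print(784 * 1000)                       # 784000 -- the first layer alone
print(count_weights([784, 1000, 10]))   # 794000 -- the whole network
```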
Understanding these costs is not a reason to avoid deep learning. It is a reason to use it deliberately.
Real-World Application
One of the most common mistakes new practitioners make is reaching for a deep neural network for every problem. If you are predicting customer lifetime value from 10 structured features, a gradient boosted tree will likely outperform a neural network, train 1000× faster, and be far easier to explain to a business stakeholder. Reserve deep learning for the problems it is actually designed for.
A hospital wants to predict 30-day readmission risk from a structured patient record with 35 clinical variables. They have 8,000 labeled examples. Which approach is most appropriate?
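By the checklist above, this is a case for traditional machine learning: 8,000 examples is limited data, the 35 clinical variables are structured and well understood, and interpretability matters in healthcare. A minimal sketch with scikit-learn, using randomly generated stand-ins for the hospital's data; the model choice and hyperparameters are illustrative, not a recommendation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical stand-ins for the hospital's data: 8,000 records,
# 35 clinical variables, binary 30-day readmission labels.
rng = np.random.default_rng(42)
X = rng.normal(size=(8000, 35))
y = rng.integers(0, 2, size=8000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# A gradient boosted tree: strong on small, structured datasets and
# inspectable via feature importances -- a fit for the criteria above.
model = GradientBoostingClassifier(n_estimators=200, max_depth=3)
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]   # predicted readmission risk
print("AUC:", roc_auc_score(y_test, probs))
print("Most important variables:",
      np.argsort(model.feature_importances_)[::-1][:5])
```

The feature importances are part of the point here: a clinician can see which variables drive the prediction, which a deep network does not offer out of the box.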