Transfer Learning
The procedure for transfer learning is straightforward:
- Start with a model such as ResNet50 or VGG19 whose weights have already been trained on a large dataset.
- Remove the final fully connected classification layer. Everything else — all of those convolutional layers with their learned feature detectors — stays.
- Freeze the existing weights by turning off gradient computation. You do not want to destroy what has already been learned.
- Add your own custom head — a new fully connected layer whose output size matches your number of target classes.
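- Train normally. Because the backbone weights are frozen, only the new head's weights are updated. Training is fast because gradients are computed only for the head, which holds a small fraction of the network's parameters.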
- Optionally, unfreeze some or all of the backbone and fine-tune at a very low learning rate. This allows the deep features to adapt slightly to your specific domain.
Transfer Learning at a Glance
A generic model (A) trained on a large dataset (D_A) learns features that generalize. Those features are reused for a new task by replacing only the classification head with a trainable custom head (B) suited to task-specific outputs (Y_B).
Real World: Medical Imaging (Fibrosis Detection)
A team building a model to detect pulmonary fibrosis from X-ray images does not need to collect millions of them. They take a ResNet pre-trained on ImageNet, add a binary classification head, and train on a few thousand labeled X-rays. The ImageNet features (edges, textures, shapes) are already useful for reading medical images, even though the original training data contained no X-rays at all.
Fine-Tuning Trade-offs: How Much to Unfreeze?
Not all transfer learning scenarios call for the same degree of fine-tuning. Here is a practical framework:
- Freeze everything except the head: use this when your target dataset is small and similar to the pre-training dataset. The pre-trained features are already good; you just need to learn new output categories.
- Unfreeze the last few convolutional layers: use this when your target domain is somewhat different (e.g., medical images vs. everyday photos). The early layers (edge detectors) are still universally useful; the later layers encode more task-specific patterns and may need to adapt.
- Unfreeze everything (full fine-tuning): use this when you have a large task-specific dataset and your domain is very different from the pre-training domain. You are essentially using the pre-trained weights as a smarter initialization than random.
In all fine-tuning scenarios, use a lower learning rate than you would for training from scratch, typically 10× to 100× lower. A high learning rate can overwrite the useful features that were expensive to learn during pre-training.
You have 500 labeled satellite images of crop disease — a very small dataset — and want to classify them. The closest available pre-trained model was trained on ImageNet (everyday objects). What is the best transfer learning strategy?