Transfer Learning

The procedure for transfer learning is straightforward:

  1. Start with a pre-trained model (ResNet50, VGG19, etc.) with its weights already trained on a large dataset.
  2. Remove the final fully connected classification layer. Everything else — all of those convolutional layers with their learned feature detectors — stays.
  3. Freeze the existing weights by turning off gradient computation. You do not want to destroy what has already been learned.
  4. Add your own custom head — a new fully connected layer whose output size matches your number of target classes.
  5. Train normally. Because the backbone weights are frozen, only the new head's weights are updated, so training is much faster than training the full network.
  6. Optionally, unfreeze some or all of the backbone and fine-tune at a very low learning rate. This allows the deep features to adapt slightly to your specific domain.
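
The steps above can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not a reference implementation: the tiny `backbone` below is a hypothetical stand-in for a real pre-trained network (such as a ResNet50 with its classification layer already removed), and `num_classes`, the learning rate, and the random batch are placeholders.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Steps 1-2 (assumed): a stand-in for a pre-trained backbone whose original
# classification layer has been removed. It outputs 8-dimensional features.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> (batch, 8)
)
feature_dim = 8

# Step 3: freeze the backbone by turning off gradient computation.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()  # also fixes any normalization/dropout layers a real backbone has

# Step 4: add a custom head sized to the target classes (placeholder value).
num_classes = 5
head = nn.Linear(feature_dim, num_classes)
model = nn.Sequential(backbone, head)

# Step 5: train normally; the optimizer only receives the head's parameters.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 32, 32)             # dummy batch
labels = torch.randint(0, num_classes, (4,))   # dummy labels

backbone_before = [p.clone() for p in backbone.parameters()]
head_before = [p.clone() for p in head.parameters()]

logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# After one step the backbone is untouched and the head has moved.
backbone_unchanged = all(
    torch.equal(old, new) for old, new in zip(backbone_before, backbone.parameters())
)
head_changed = any(
    not torch.equal(old, new) for old, new in zip(head_before, head.parameters())
)
print(backbone_unchanged, head_changed)
```

In a real project the stand-in would be replaced by a model loaded with pre-trained weights. Step 6 would then set `requires_grad = True` on selected backbone layers and continue training with a much smaller learning rate.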

[Diagram: Transfer Learning. Pre-training (generic): generic dataset D_A → generic model A → generic task T_A. Adaptation (task-specific): smaller, task-specific data X_B → pre-trained backbone A′ → trainable custom head B → task-specific outputs Y_B. The weights of A are transferred to A′.]

Transfer Learning at a Glance

A generic model (A) trained on a large dataset (D_A) learns features that generalize. Those features are reused for a new task by replacing only the classification head with a trainable custom head (B) suited to task-specific outputs (Y_B).

Real World: Medical Imaging (Fibrosis Detection)

A team building a model to detect pulmonary fibrosis from X-ray images does not need to collect millions of X-ray images. They take a ResNet pre-trained on ImageNet, add a binary classification head, and train on a few thousand labeled X-rays. The ImageNet features — edges, textures, shapes — are already useful for reading medical images, even though the original training data contained no X-rays at all.

Fine-Tuning Trade-offs: How Much to Unfreeze?

Not all transfer learning scenarios call for the same degree of fine-tuning. Here is a practical framework:

  • Freeze everything except the head — Use this when your target dataset is small and similar to the pre-training dataset. The ImageNet features are already good; you just need to learn new output categories.
  • Unfreeze the last few convolutional layers — Use this when your target domain is somewhat different (e.g., medical images vs. everyday photos). The early layers (edge detectors) are still universally useful; the later layers may need to shift.
  • Unfreeze everything (full fine-tuning) — Use this when you have a large task-specific dataset and your domain is very different from the pre-training domain. You are essentially using the pre-trained weights as a smarter initialization than random.
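
The three options above differ only in which parameters have gradients enabled. A minimal PyTorch sketch follows; the function name `set_trainable`, the strategy strings, and the three-block toy backbone are all hypothetical, chosen for illustration rather than taken from any library.

```python
from torch import nn


def set_trainable(backbone: nn.Sequential, head: nn.Module,
                  strategy: str, last_n: int = 1) -> None:
    """Enable gradients according to one of three (hypothetical) strategy names.

    "head_only"   : freeze every backbone block
    "last_blocks" : unfreeze only the last `last_n` backbone blocks
    "full"        : unfreeze the whole backbone
    The head is always trainable.
    """
    blocks = list(backbone.children())
    if strategy == "head_only":
        n_unfrozen = 0
    elif strategy == "last_blocks":
        n_unfrozen = min(last_n, len(blocks))
    elif strategy == "full":
        n_unfrozen = len(blocks)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    first_unfrozen = len(blocks) - n_unfrozen
    for index, block in enumerate(blocks):
        for param in block.parameters():
            param.requires_grad = index >= first_unfrozen
    for param in head.parameters():
        param.requires_grad = True


def count_trainable(module: nn.Module) -> int:
    """Number of scalar parameters that will receive gradient updates."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Toy stand-in for a pre-trained backbone: early, middle, and late blocks.
backbone = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3),
    nn.Conv2d(4, 8, kernel_size=3),
    nn.Conv2d(8, 16, kernel_size=3),
)
head = nn.Linear(16, 2)

for strategy in ("head_only", "last_blocks", "full"):
    set_trainable(backbone, head, strategy)
    print(strategy, count_trainable(backbone))
```

Treating the backbone as an ordered list of blocks keeps the early, general-purpose layers frozen by default and unfreezes from the output side inward, which matches the ordering of the options above.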

In all fine-tuning scenarios, use a lower learning rate than you would for training from scratch, typically 10× to 100× lower. A high learning rate produces large weight updates that can overwrite the useful features the pre-trained model spent enormous resources learning.
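
One way to apply a reduced learning rate in PyTorch is through optimizer parameter groups. The sketch below is illustrative only: `scratch_lr` is an assumed from-scratch learning rate, the toy `backbone` stands in for a real pre-trained network, and leaving the freshly initialized head at the higher rate is a common practice assumed here, not something the text above prescribes.

```python
import torch
from torch import nn

scratch_lr = 1e-2                 # assumed rate for training from scratch
fine_tune_lr = scratch_lr / 100   # 100x lower, the cautious end of the 10x-100x range

# Toy stand-in for an unfrozen pre-trained backbone plus a new head.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
head = nn.Linear(8, 2)

# Each parameter group carries its own learning rate.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": fine_tune_lr},  # small, careful updates
        {"params": head.parameters(), "lr": scratch_lr},        # new layer, no features to protect
    ],
    lr=scratch_lr,
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

If the head has already been trained with the backbone frozen, a single low rate for all parameters is a simpler alternative.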

Checkpoint

You have 500 labeled satellite images of crop disease — a very small dataset — and want to classify them. The closest available pre-trained model was trained on ImageNet (everyday objects). What is the best transfer learning strategy?