Progressive Resizing

When fast.ai and its team of students won the DAWNBench competition in 2018, one of the most important innovations was something very simple: start training using small images, and end training using large images. Spending most of the epochs training with small images helps training complete much faster, and completing training using large images makes the final accuracy much higher. We call this approach progressive resizing.

jargon: progressive resizing: Gradually using larger and larger images as you train.

As we have seen, the kinds of features that are learned by convolutional neural networks are not in any way specific to the size of the image—early layers find things like edges and gradients, and later layers may find things like noses and sunsets. So, when we change image size in the middle of training, it doesn’t mean that we have to find totally different parameters for our model.

But clearly there are some differences between small images and big ones, so we shouldn’t expect our model to continue working exactly as well, with no changes at all. Does this remind you of something? When we developed this idea, it reminded us of transfer learning! We are trying to get our model to learn to do something a little bit different from what it has learned to do before. Therefore, we should be able to use the fine_tune method after we resize our images.

There is an additional benefit to progressive resizing: it is another form of data augmentation. Therefore, you should expect to see better generalization from models trained with progressive resizing.

To implement progressive resizing, it is most convenient if you first create a get_dls function that takes a batch size and an image size, as we did in the previous section, and returns your DataLoaders:
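A minimal sketch of such a function, assuming the Imagenette-style DataBlock setup from the previous section (with path assumed to point at the dataset folder), might look like this:

def get_dls(bs, size):
    # Presize each image on the CPU, then apply augmentation (including the
    # final resize to `size`) as batch transforms on the GPU.
    # `path` is assumed to be the dataset path defined in an earlier section.
    dblock = DataBlock(blocks=(ImageBlock, CategoryBlock),
                       get_items=get_image_files,
                       get_y=parent_label,
                       item_tfms=Resize(460),
                       batch_tfms=[*aug_transforms(size=size, min_scale=0.75),
                                   Normalize.from_stats(*imagenet_stats)])
    return dblock.dataloaders(path, bs=bs)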

Now you can create your DataLoaders with a small size and use fit_one_cycle in the usual way, training for somewhat fewer epochs than you otherwise would:

In [ ]:

dls = get_dls(128, 128)
learn = Learner(dls, xresnet50(n_out=dls.c), loss_func=CrossEntropyLossFlat(),
                metrics=accuracy)
learn.fit_one_cycle(4, 3e-3)
epoch  train_loss  valid_loss  accuracy  time
0      1.902943    2.447006    0.401419  00:30
1      1.315203    1.572992    0.525765  00:30
2      1.001199    0.767886    0.759149  00:30
3      0.765864    0.665562    0.797984  00:30

Then you can replace the DataLoaders inside the Learner, and fine-tune:

In [ ]:

learn.dls = get_dls(64, 224)
learn.fine_tune(5, 1e-3)
epoch  train_loss  valid_loss  accuracy  time
0      0.985213    1.654063    0.565721  01:06

epoch  train_loss  valid_loss  accuracy  time
0      0.706869    0.689622    0.784541  01:07
1      0.739217    0.928541    0.712472  01:07
2      0.629462    0.788906    0.764003  01:07
3      0.491912    0.502622    0.836445  01:06
4      0.414880    0.431332    0.863331  01:06

As you can see, we’re getting much better performance, and the initial training on small images was much faster on each epoch.

You can repeat the process of increasing size and training more epochs as many times as you like, for as big an image as you wish—but of course, you will not get any benefit by using an image size larger than the size of your images on disk.
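For example, a hypothetical continuation of the schedule above might look like the following sketch; the batch sizes, image sizes, and epoch counts are arbitrary illustrations, not tuned values:

# Keep fine-tuning at progressively larger sizes (values are illustrative only).
for bs, size, epochs in [(32, 320, 3), (16, 460, 2)]:
    learn.dls = get_dls(bs, size)
    learn.fine_tune(epochs, 1e-3)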

Note that for transfer learning, progressive resizing may actually hurt performance. This is most likely to happen if your pretraining task and dataset were quite similar to your transfer learning task and dataset, and the pretrained model was trained on similar-sized images, so the weights don't need to change much. In that case, training on smaller images may damage the pretrained weights.

On the other hand, if the transfer learning task is going to use images that are of different sizes, shapes, or styles than those used in the pretraining task, progressive resizing will probably help. As always, the answer to “Will it help?” is “Try it!”

Another thing we could try is applying data augmentation to the validation set. Up until now, we have only applied it on the training set; the validation set always gets the same images. But maybe we could try to make predictions for a few augmented versions of the validation set and average them. We’ll consider this approach next.