Presizing

We need our images to have the same dimensions, so that they can collate into tensors to be passed to the GPU. We also want to minimize the number of distinct augmentation computations we perform. The performance requirement suggests that we should, where possible, compose our augmentation transforms into fewer transforms (to reduce the number of computations and the number of lossy operations) and transform the images into uniform sizes (for more efficient processing on the GPU).

The challenge is that, if performed after resizing down to the augmented size, various common data augmentation transforms might introduce spurious empty zones, degrade data, or both. For instance, rotating an image by 45 degrees fills corner regions of the new bounds with emptiness, which will not teach the model anything. Many rotation and zooming operations will require interpolating to create pixels. These interpolated pixels are derived from the original image data but are still of lower quality.

To work around these challenges, presizing adopts two strategies that are shown in <>:

  1. Resize images to relatively “large” dimensions—that is, dimensions significantly larger than the target training dimensions.
  2. Compose all of the common augmentation operations (including a resize to the final target size) into one, and perform the combined operation on the GPU only once at the end of processing, rather than performing the operations individually and interpolating multiple times.

The first step, the resize, creates images large enough that they have spare margin to allow further augmentation transforms on their inner regions without creating empty zones. This transformation works by resizing to a square, using a large crop size. On the training set, the crop area is chosen randomly, and the size of the crop is selected to cover the entire width or height of the image, whichever is smaller.

In the second step, the GPU is used for all data augmentation, and all of the potentially destructive operations are done together, with a single interpolation at the end.

Presizing on the training set

This picture shows the two steps:

  1. Crop full width or height: This is in item_tfms, so it’s applied to each individual image before it is copied to the GPU. It’s used to ensure all images are the same size. On the training set, the crop area is chosen randomly. On the validation set, the center square of the image is always chosen.
  2. Random crop and augment: This is in batch_tfms, so it’s applied to a batch all at once on the GPU, which means it’s fast. On the validation set, only the resize to the final size needed for the model is done here. On the training set, the random crop and any other augmentations are done first.

To implement this process in fastai, you use Resize as an item transform with a large size, and RandomResizedCrop as a batch transform with a smaller size. RandomResizedCrop will be added for you if you include the min_scale parameter in your aug_transforms function, as was done in the DataBlock call in the previous section. Alternatively, you can use pad or squish instead of crop (the default) for the initial Resize.
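
For reference, the DataBlock call this describes looks roughly like the following sketch of the pets example from the previous section (the get_y regex and path are taken from that example; adjust them for your own data):

pets = DataBlock(blocks=(ImageBlock, CategoryBlock),
                 get_items=get_image_files,
                 splitter=RandomSplitter(seed=42),
                 get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                 # presizing step 1: resize every image to a large square, per item, on the CPU
                 item_tfms=Resize(460),
                 # presizing step 2: random crop to the final size plus augmentations, per batch, on the GPU
                 batch_tfms=aug_transforms(size=224, min_scale=0.75))
dls = pets.dataloaders(path/"images")

Passing min_scale to aug_transforms is what adds the random resized crop to the batch transforms; calling dls.train.show_batch(max_n=4, nrows=1, unique=True) afterwards is a quick way to see the random crops and augmentations applied to the same image.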

<> shows the difference between an image that has been zoomed, interpolated, rotated, and then interpolated again (the approach used by all other deep learning libraries), shown here on the right, and an image that has been zoomed and rotated as one operation and then interpolated just once (the fastai approach), shown here on the left.

In [ ]:

#hide_input
#id interpolations
#caption A comparison of fastai's data augmentation strategy (left) and the traditional approach (right).
dblock1 = DataBlock(blocks=(ImageBlock(), CategoryBlock()),
                    get_y=parent_label,
                    item_tfms=Resize(460))
# Place an image at 'images/grizzly.jpg' (relative to this notebook) before running this cell
dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8)
dls1.train.get_idxs = lambda: Inf.ones
x,y = dls1.valid.one_batch()
_,axs = subplots(1, 2)

# Traditional approach: apply each transform separately, interpolating at every step
x1 = TensorImage(x.clone())
x1 = x1.affine_coord(sz=224)
x1 = x1.rotate(draw=30, p=1.)
x1 = x1.zoom(draw=1.2, p=1.)
x1 = x1.warp(draw_x=-0.2, draw_y=0.2, p=1.)

# fastai approach: compose the same transforms into one pipeline, interpolating only once
tfms = setup_aug_tfms([Rotate(draw=30, p=1, size=224), Zoom(draw=1.2, p=1., size=224),
                       Warp(draw_x=-0.2, draw_y=0.2, p=1., size=224)])
x = Pipeline(tfms)(x)
#x.affine_coord(coord_tfm=coord_tfm, sz=size, mode=mode, pad_mode=pad_mode)
TensorImage(x[0]).show(ctx=axs[0])
TensorImage(x1[0]).show(ctx=axs[1]);

[Output figure: a comparison of fastai's data augmentation strategy (left) and the traditional approach (right)]

You can see that the image on the right is less well defined and has reflection padding artifacts in the bottom-left corner; also, the grass at the top left has disappeared entirely. We find that in practice using presizing significantly improves the accuracy of models, and often results in speedups too.

The fastai library also provides simple ways to check your data looks right before training a model, which is an extremely important step. We’ll look at those next.

Checking and Debugging a DataBlock

We can never just assume that our code is working perfectly. Writing a DataBlock is just like writing a blueprint. You will get an error message if you have a syntax error somewhere in your code, but you have no guarantee that your template is going to work on your data source as you intend. So, before training a model you should always check your data. You can do this using the show_batch method:

In [ ]:

dls.show_batch(nrows=1, ncols=3)

[Output figure: a row of sample images from the batch, each shown with its pet-breed label]

Take a look at each image, and check that each one seems to have the correct label for that breed of pet. Often, data scientists work with data with which they are not as familiar as domain experts may be: for instance, I actually don’t know what a lot of these pet breeds are. Since I am not an expert on pet breeds, I would use Google images at this point to search for a few of these breeds, and make sure the images look similar to what I see in this output.

If you made a mistake while building your DataBlock, it is very likely you won’t see it before this step. To debug this, we encourage you to use the summary method. It will attempt to create a batch from the source you give it, printing a lot of detail about each step along the way. Also, if it fails, you will see exactly the point at which the error happens, and the library will try to give you some help. For instance, one common mistake is to forget to use a Resize transform, so you end up with pictures of different sizes and are not able to batch them. Here is what the summary would look like in that case (note that the exact text may have changed since the time of writing, but it will give you an idea):

In [ ]:

#hide_output
pets1 = DataBlock(blocks = (ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
pets1.summary(path/"images")

Setting-up type transforms pipelines
Collecting items from /home/jhoward/.fastai/data/oxford-iiit-pet/images
Found 7390 items
2 datasets of sizes 5912,1478
Setting up Pipeline: PILBase.create
Setting up Pipeline: partial -> Categorize
Building one sample
Pipeline: PILBase.create
starting from
/home/jhoward/.fastai/data/oxford-iiit-pet/images/american_pit_bull_terrier_31.jpg
applying PILBase.create gives
PILImage mode=RGB size=500x414
Pipeline: partial -> Categorize
starting from
/home/jhoward/.fastai/data/oxford-iiit-pet/images/american_pit_bull_terrier_31.jpg
applying partial gives
american_pit_bull_terrier
applying Categorize gives
TensorCategory(13)
Final sample: (PILImage mode=RGB size=500x414, TensorCategory(13))
Setting up after_item: Pipeline: ToTensor
Setting up before_batch: Pipeline:
Setting up after_batch: Pipeline: IntToFloatTensor
Building one batch
Applying item_tfms to the first sample:
Pipeline: ToTensor
starting from
(PILImage mode=RGB size=500x414, TensorCategory(13))
applying ToTensor gives
(TensorImage of size 3x414x500, TensorCategory(13))
Adding the next 3 samples
No before_batch transform to apply
Collating items in a batch
Error! It's not possible to collate your items in a batch
Could not collate the 0-th members of your tuples because got the following shapes
torch.Size([3, 414, 500]),torch.Size([3, 375, 500]),torch.Size([3, 500, 281]),torch.Size([3, 203, 300])

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-11-8c0a3d421ca2> in <module>
      4     splitter=RandomSplitter(seed=42),
      5     get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'))
----> 6 pets1.summary(path/"images")

~/git/fastai/fastai/data/block.py in summary(self, source, bs, show_batch, **kwargs)
    182         why = _find_fail_collate(s)
    183         print("Make sure all parts of your samples are tensors of the same size" if why is None else why)
--> 184         raise e
    185
    186     if len([f for f in dls.train.after_batch.fs if f.name != 'noop'])!=0:

~/git/fastai/fastai/data/block.py in summary(self, source, bs, show_batch, **kwargs)
    176     print("\nCollating items in a batch")
    177     try:
--> 178         b = dls.train.create_batch(s)
    179         b = retain_types(b, s[0] if is_listy(s) else s)
    180     except Exception as e:

~/git/fastai/fastai/data/load.py in create_batch(self, b)
    125     def retain(self, res, b): return retain_types(res, b[0] if is_listy(b) else b)
    126     def create_item(self, s): return next(self.it) if s is None else self.dataset[s]
--> 127     def create_batch(self, b): return (fa_collate,fa_convert)[self.prebatched](b)
    128     def do_batch(self, b): return self.retain(self.create_batch(self.before_batch(b)), b)
    129     def to(self, device): self.device = device

~/git/fastai/fastai/data/load.py in fa_collate(t)
     44     b = t[0]
     45     return (default_collate(t) if isinstance(b, _collate_types)
---> 46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))
     48

~/git/fastai/fastai/data/load.py in <listcomp>(.0)
     44     b = t[0]
     45     return (default_collate(t) if isinstance(b, _collate_types)
---> 46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))
     48

~/git/fastai/fastai/data/load.py in fa_collate(t)
     43 def fa_collate(t):
     44     b = t[0]
---> 45     return (default_collate(t) if isinstance(b, _collate_types)
     46             else type(t[0])([fa_collate(s) for s in zip(*t)]) if isinstance(b, Sequence)
     47             else default_collate(t))

~/anaconda3/lib/python3.7/site-packages/torch/utils/data/_utils/collate.py in default_collate(batch)
     53         storage = elem.storage()._new_shared(numel)
     54         out = elem.new(storage)
---> 55         return torch.stack(batch, 0, out=out)
     56     elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \
     57             and elem_type.__name__ != 'string_':

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 414 and 375 in dimension 2 at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/TH/generic/THTensor.cpp:612

You can see exactly how we gathered the data and split it, how we went from a filename to a sample (the tuple (image, category)), then what item transforms were applied and how it failed to collate those samples in a batch (because of the different shapes).
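
In this example the fix is the one mentioned earlier: add a Resize item transform so every sample has the same shape before collation. A minimal sketch of the corrected block, reusing the 460-pixel presizing value from before:

pets1 = DataBlock(blocks=(ImageBlock, CategoryBlock),
                  get_items=get_image_files,
                  splitter=RandomSplitter(seed=42),
                  get_y=using_attr(RegexLabeller(r'(.+)_\d+.jpg$'), 'name'),
                  item_tfms=Resize(460))   # every image becomes 460x460, so the batch can collate
pets1.summary(path/"images")               # should now build a batch without error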

Once you think your data looks right, we generally recommend the next step should be using it to train a simple model. We often see people put off the training of an actual model for far too long. As a result, they don’t actually find out what their baseline results look like. Perhaps your problem doesn’t need lots of fancy domain-specific engineering. Or perhaps the data doesn’t seem to train the model at all. These are things that you want to know as soon as possible. For this initial test, we’ll use the same simple model that we used in <>:

In [ ]:

learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(2)

epoch  train_loss  valid_loss  error_rate  time
0      1.551305    0.322132    0.106225    00:19

epoch  train_loss  valid_loss  error_rate  time
0      0.529473    0.312148    0.095399    00:23
1      0.330207    0.245883    0.080514    00:24

As we’ve briefly discussed before, the table shown when we fit a model shows us the results after each epoch of training. Remember, an epoch is one complete pass through all of the images in the data. The columns shown are the average loss over the items of the training set, the loss on the validation set, and any metrics that we requested—in this case, the error rate.

Remember that loss is whatever function we’ve decided to use to optimize the parameters of our model. But we haven’t actually told fastai what loss function we want to use. So what is it doing? fastai will generally try to select an appropriate loss function based on what kind of data and model you are using. In this case we have image data and a categorical outcome, so fastai will default to using cross-entropy loss.
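
If you are curious which loss function was picked, you can inspect the learner after creating it; for a single-label classification problem like this one it will typically be fastai's flattened wrapper around PyTorch's cross-entropy loss:

learn.loss_func
# typically prints something like: FlattenedLoss of CrossEntropyLoss()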