Questionnaire

  1. What is the equation for a step of SGD, in math or code (as you prefer)? (A code sketch follows this list.)
  2. What do we pass to cnn_learner to use a non-default optimizer?
  3. What are optimizer callbacks?
  4. What does zero_grad do in an optimizer?
  5. What does step do in an optimizer? How is it implemented in the general optimizer? (A simplified sketch follows this list.)
  6. Rewrite sgd_cb to use the += operator, instead of add_.
  7. What is “momentum”? Write out the equation. (A sketch follows this list.)
  8. What’s a physical analogy for momentum? How does it apply in our model training settings?
  9. What does a bigger value for momentum do to the gradients?
  10. What are the default values of momentum for 1cycle training?
  11. What is RMSProp? Write out the equation. (A sketch follows this list.)
  12. What do the squared values of the gradients indicate?
  13. How does Adam differ from momentum and RMSProp?
  14. Write out the equation for Adam. (A sketch follows this list.)
  15. Calculate the values of unbias_avg and w.avg for a few batches of dummy values. (A worked example follows this list.)
  16. What’s the impact of having a high eps in Adam?
  17. Read through the optimizer notebook in fastai’s repo, and execute it.
  18. In what situations do dynamic learning rate methods like Adam change the behavior of weight decay? (A short comparison follows this list.)
  19. What are the four steps of a training loop?
  20. Why is using callbacks better than writing a new training loop for each tweak you want to add?
  21. What aspects of the design of fastai’s callback system make it as flexible as copying and pasting bits of code?
  22. How can you get the list of events available to you when writing a callback?
  23. Write the ModelResetter callback (without peeking).
  24. How can you access the necessary attributes of the training loop inside a callback? When can you use or not use the shortcuts that go with them?
  25. How can a callback influence the control flow of the training loop?
  26. Write the TerminateOnNaN callback (without peeking, if possible).
  27. How do you make sure your callback runs after or before another callback?
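
The sketches that follow relate to several of the questions above. They are simplified, self-contained illustrations rather than fastai's actual code, and every function and variable name in them is an assumption made for the example. First, one possible form of a single SGD step in plain PyTorch (question 1):

    import torch

    def sgd_step(param, lr):
        # new_weight = weight - lr * weight.grad
        with torch.no_grad():
            param -= lr * param.grad
        param.grad.zero_()  # reset the gradient so it doesn't accumulate

    p = torch.randn(3, requires_grad=True)
    (p ** 2).sum().backward()
    sgd_step(p, lr=0.1)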
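
Question 5 asks how step is implemented in the general optimizer. The idea can be sketched as follows: the optimizer loops over its parameters and passes each one through a list of "stepper" callbacks, so new optimizers are built by composing callbacks instead of rewriting the loop. This is a minimal sketch under that assumption, not fastai's Optimizer; SimpleOpt and the exact sgd_cb signature here are illustrative.

    class SimpleOpt:
        "A minimal optimizer driven by stepper callbacks (illustrative only)."
        def __init__(self, params, cbs, **defaults):
            self.params, self.cbs, self.hypers = list(params), cbs, defaults

        def step(self):
            # step: pass every parameter that has a gradient through each callback in turn.
            for p in self.params:
                if p.grad is not None:
                    for cb in self.cbs: cb(p, **self.hypers)

        def zero_grad(self):
            # zero_grad: clear the gradients so they don't accumulate across batches.
            for p in self.params:
                if p.grad is not None: p.grad.zero_()

    def sgd_cb(p, lr, **kwargs):
        # A stepper callback performing the plain SGD update in place.
        p.data.add_(p.grad.data, alpha=-lr)

With this structure, momentum or RMSProp become extra callbacks that keep some state per parameter and adjust the step, rather than entirely new optimizer classes.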
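
For question 7, one common formulation of momentum keeps an exponentially weighted moving average of the gradients and steps with that average instead of the raw gradient (beta is the momentum, e.g. 0.9; all names are illustrative):

    def momentum_step(weight, grad, avg, lr, beta=0.9):
        # Moving average of the gradients; a bigger beta means past gradients count for more.
        avg = beta * avg + (1 - beta) * grad
        # The update uses the smoothed average rather than the raw gradient.
        new_weight = weight - lr * avg
        return new_weight, avg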
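
For question 11, RMSProp keeps a moving average of the squared gradients and uses it to give each parameter its own effective learning rate (exact defaults and the placement of eps vary between implementations):

    import math

    def rmsprop_step(weight, grad, sqr_avg, lr, alpha=0.99, eps=1e-8):
        # Moving average of the squared gradients: large when gradients are
        # consistently big, small when they are consistently small.
        sqr_avg = alpha * sqr_avg + (1 - alpha) * grad ** 2
        # Divide the step by its square root, so large-gradient parameters move less.
        new_weight = weight - lr * grad / (math.sqrt(sqr_avg) + eps)
        return new_weight, sqr_avg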
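
For question 14, Adam combines the two previous ideas: a momentum-style average of the gradients (debiased, because it starts at zero) and an RMSProp-style average of the squared gradients. The defaults and the exact placement of eps below are illustrative and differ between implementations; i is the zero-based step count.

    import math

    def adam_step(weight, grad, avg, sqr_avg, i, lr, beta1=0.9, beta2=0.99, eps=1e-5):
        # Momentum part: moving average of the gradients.
        avg = beta1 * avg + (1 - beta1) * grad
        # RMSProp part: moving average of the squared gradients.
        sqr_avg = beta2 * sqr_avg + (1 - beta2) * grad ** 2
        # Debias the gradient average (it starts at zero, so early values are too small).
        unbias_avg = avg / (1 - beta1 ** (i + 1))
        # Step with the debiased average, scaled per parameter as in RMSProp.
        new_weight = weight - lr * unbias_avg / (math.sqrt(sqr_avg) + eps)
        return new_weight, avg, sqr_avg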
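
For question 15, a short worked example with dummy gradient values (chosen arbitrarily, all equal to 1.0) shows what the debiasing does: the raw moving average avg starts far too small, while unbias_avg recovers the true scale of the gradients immediately.

    beta, avg = 0.9, 0.0
    for i, grad in enumerate([1.0, 1.0, 1.0], start=1):
        avg = beta * avg + (1 - beta) * grad
        unbias_avg = avg / (1 - beta ** i)
        print(f"batch {i}: avg={avg:.3f}  unbias_avg={unbias_avg:.3f}")
    # batch 1: avg=0.100  unbias_avg=1.000
    # batch 2: avg=0.190  unbias_avg=1.000
    # batch 3: avg=0.271  unbias_avg=1.000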
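
For question 18, the difference between L2 regularization and decoupled weight decay can be written out directly. With plain SGD the two produce the same update; with Adam or RMSProp the gradient is rescaled by the moving average of its squares, so a penalty folded into the gradient gets rescaled too and the two schemes stop being equivalent. This sketch ignores the momentum average and debiasing to keep the contrast visible; adam_scale is a hypothetical stand-in for that rescaling.

    import math

    def adam_scale(grad, sqr_avg, eps=1e-5):
        # Hypothetical stand-in for Adam's per-parameter rescaling of the step.
        return grad / (math.sqrt(sqr_avg) + eps)

    def l2_reg_step(weight, grad, sqr_avg, lr, wd):
        # L2 regularization: the penalty is added to the gradient, so it gets rescaled too.
        return weight - lr * adam_scale(grad + wd * weight, sqr_avg)

    def weight_decay_step(weight, grad, sqr_avg, lr, wd):
        # Decoupled weight decay: the penalty is applied outside the rescaling.
        return weight - lr * adam_scale(grad, sqr_avg) - lr * wd * weight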

Further Research

  1. Look up the “Rectified Adam” paper, implement it using the general optimizer framework, and try it out. Search for other recent optimizers that work well in practice, and pick one to implement.
  2. Look at the mixed-precision callback alongside the documentation. Try to understand what each event and line of code does.
  3. Implement your own version of the learning rate finder from scratch. Compare it with fastai’s version.
  4. Look at the source code of the callbacks that ship with fastai. See if you can find one that’s similar to what you’re looking to do, to get some inspiration.