Conclusion

In this chapter we explored the foundations of deep learning, beginning with matrix multiplication and moving on to implementing the forward and backward passes of a neural net from scratch. We then refactored our code to show how PyTorch works under the hood.

Here are a few things to remember; a short code sketch illustrating each point follows the list:

  • A neural net is basically a bunch of matrix multiplications with nonlinearities in between.
  • Python is slow, so to write fast code we have to vectorize it and take advantage of techniques such as elementwise arithmetic and broadcasting.
  • Two tensors are broadcastable if their shapes are compatible when compared from the last dimension backward: each pair of dimensions must be equal, or one of them must be 1 (a missing leading dimension counts as 1). To make tensors broadcastable, we may need to add dimensions of size 1 with unsqueeze or a None index.
  • Properly initializing a neural net is crucial to get training started. Kaiming initialization should be used when we have ReLU nonlinearities.
  • The backward pass is the chain rule applied multiple times, computing the gradients from the output of our model and going back, one layer at a time.
  • When subclassing nn.Module (if not using fastai’s Module), we have to call the superclass __init__ method in our own __init__, and we have to define a forward function that takes an input and returns the desired result.
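
To make the first point concrete, here is a minimal sketch of a two-layer net written as two matrix multiplications with a ReLU in between (the shapes are arbitrary examples, not values from the chapter):

```python
import torch

# A tiny two-layer net: two matrix multiplications with a ReLU in between.
# Shapes are arbitrary examples: batch of 64, 784 inputs, 50 hidden, 10 outputs.
x  = torch.randn(64, 784)
w1 = torch.randn(784, 50); b1 = torch.zeros(50)
w2 = torch.randn(50, 10);  b2 = torch.zeros(10)

h   = x @ w1 + b1      # first linear layer (matrix multiplication)
act = h.clamp_min(0.)  # ReLU nonlinearity
out = act @ w2 + b2    # second linear layer
print(out.shape)       # torch.Size([64, 10])
```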
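
For the second point, a quick comparison of an explicit Python loop against a single vectorized elementwise expression (the size is arbitrary; both compute the same result, but the vectorized form runs in fast compiled code):

```python
import torch

t = torch.randn(10_000)

# Slow: an explicit Python loop over every element
out_loop = torch.empty_like(t)
for i in range(len(t)):
    out_loop[i] = t[i] * 2 + 1

# Fast: one vectorized elementwise expression
out_vec = t * 2 + 1

assert torch.allclose(out_loop, out_vec)
```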
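
The broadcasting rule in the third point can be checked directly. In this sketch (with made-up shapes), the first addition broadcasts because the trailing dimensions match, and unsqueeze or a None index adds the unit dimension needed for the second:

```python
import torch

m = torch.randn(4, 3)          # a matrix
v = torch.tensor([1., 2., 3.])

# Trailing dimensions match (3 == 3), so this broadcasts:
# v is added to every row of m.
row_shifted = m + v

# To add a per-row offset instead, we need shape (4, 1); the trailing
# dimensions then match because one of them is 1.
r = torch.tensor([10., 20., 30., 40.])
col_shifted = m + r.unsqueeze(1)  # r.unsqueeze(1).shape == (4, 1)
same        = m + r[:, None]      # a None index adds the same unit dimension
assert torch.equal(col_shifted, same)
```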
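
For the fourth point, one common form of Kaiming initialization for a layer followed by ReLU scales Gaussian weights by the square root of 2/n_in, keeping activation variance roughly constant across layers. A minimal sketch with arbitrary sizes:

```python
import torch

# Kaiming (He) initialization: scale random weights by sqrt(2 / n_in)
# so ReLU activations keep a roughly constant scale layer after layer.
n_in, n_out = 784, 50
w = torch.randn(n_in, n_out) * (2 / n_in) ** 0.5

x = torch.randn(1000, n_in)
a = (x @ w).clamp_min(0.)  # activations stay on a reasonable scale
print(a.mean().item(), a.std().item())
```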
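
The fifth point is easiest to see by writing a backward pass by hand and checking it against PyTorch's autograd. This sketch uses a single linear layer with a ReLU and a sum loss (the shapes are made up):

```python
import torch

# Forward pass: y = relu(x @ w), loss = y.sum()
x = torch.randn(5, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)

h = x @ w
y = h.clamp_min(0.)
loss = y.sum()
loss.backward()  # autograd's version of the gradients

# Manual backward pass: the chain rule, from the output back,
# one operation at a time.
dy = torch.ones_like(y)       # d loss / d y
dh = dy * (h > 0).float()     # back through the ReLU
dw = x.detach().t() @ dh      # back through the matrix multiplication
dx = dh @ w.detach().t()

assert torch.allclose(dw, w.grad) and torch.allclose(dx, x.grad)
```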
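
Finally, for the last point, a minimal nn.Module subclass (the class name and sizes are invented for illustration):

```python
import torch
from torch import nn

class SimpleNet(nn.Module):
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()  # required: initialize nn.Module first
        self.lin1 = nn.Linear(n_in, n_hidden)
        self.lin2 = nn.Linear(n_hidden, n_out)

    def forward(self, x):   # required: takes an input, returns the result
        return self.lin2(torch.relu(self.lin1(x)))

model = SimpleNet(784, 50, 10)
out = model(torch.randn(64, 784))  # calling the module runs forward
print(out.shape)                   # torch.Size([64, 10])
```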