Questionnaire

  1. What is a “hook” in PyTorch?
  2. Which layer does CAM use the outputs of?
  3. Why does CAM require a hook?
  4. Look at the source code of the ActivationStats class and see how it uses hooks.
  5. Write a hook that stores the activations of a given layer in a model (without peeking, if possible).
  6. Why do we call eval before getting the activations? Why do we use no_grad?
  7. Use torch.einsum to compute the “dog” or “cat” score of each of the locations in the last activation of the body of the model.
  8. How do you check which order the categories are in (i.e., the correspondence of index->category)?
  9. Why are we using decode when displaying the input image?
  10. What is a “context manager”? What special methods need to be defined to create one?
  11. Why can’t we use plain CAM for the inner layers of a network?
  12. Why do we need to register a hook on the backward pass in order to do Grad-CAM?
  13. Why can’t we call output.backward() when output is a rank-2 tensor of output activations per image per class?

Further Research

  1. Try removing keepdim and see what happens. Look up this parameter in the PyTorch docs. Why do we need it in this notebook?
  2. Create a notebook like this one, but for NLP, and use it to find which words in a movie review are most significant in assessing the sentiment of a particular movie review.

In [ ]: