Questionnaire

  1. What is a “feature”?
  2. Write out the convolutional kernel matrix for a top edge detector.
  3. Write out the mathematical operation applied by a 3×3 kernel to a single pixel in an image.
  4. What is the value of a convolutional kernel applied to a 3×3 matrix of zeros?
  5. What is “padding”?
  6. What is “stride”?
  7. Create a nested list comprehension to complete any task that you choose. (One possible answer is sketched after this list.)
  8. What are the shapes of the input and weight parameters to PyTorch’s 2D convolution? (See the F.conv2d sketch after this list.)
  9. What is a “channel”?
  10. What is the relationship between a convolution and a matrix multiplication? (See the unfold sketch after this list.)
  11. What is a “convolutional neural network”?
  12. What is the benefit of refactoring parts of your neural network definition?
  13. What is Flatten? Where does it need to be included in the MNIST CNN? Why?
  14. What does “NCHW” mean?
  15. Why does the third layer of the MNIST CNN have 7*7*(1168-16) multiplications?
  16. What is a “receptive field”?
  17. What is the size of the receptive field of an activation after two stride 2 convolutions? Why?
  18. Run conv-example.xlsx yourself and experiment with trace precedents.
  19. Have a look at Jeremy or Sylvain’s list of recent Twitter “likes”, and see if you find any interesting resources or ideas there.
  20. How is a color image represented as a tensor?
  21. How does a convolution work with a color input?
  22. What method can we use to see that data in DataLoaders?
  23. Why do we double the number of filters after each stride-2 conv?
  24. Why do we use a larger kernel in the first conv with MNIST (with simple_cnn)?
  25. What information does ActivationStats save for each layer?
  26. How can we access a learner’s callback after training?
  27. What are the three statistics plotted by plot_layer_stats? What does the x-axis represent?
  28. Why are activations near zero problematic?
  29. What are the upsides and downsides of training with a larger batch size?
  30. Why should we avoid using a high learning rate at the start of training?
  31. What is 1cycle training?
  32. What are the benefits of training with a high learning rate?
  33. Why do we want to use a low learning rate at the end of training?
  34. What is “cyclical momentum”?
  35. What callback tracks hyperparameter values during training (along with other information)?
  36. What does one column of pixels in the color_dim plot represent?
  37. What does “bad training” look like in color_dim? Why?
  38. What trainable parameters does a batch normalization layer contain? (See the BatchNorm2d sketch after this list.)
  39. What statistics are used to normalize in batch normalization during training? How about during validation?
  40. Why do models with batch normalization layers generalize better?
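
The sketches below are not from the chapter; they are minimal, hedged examples for a few of the questions above. First, for questions 3, 5, 6, and 8: applying a 3×3 top-edge kernel with F.conv2d, which shows the expected input and weight shapes along with padding and stride. The image is random data, used only to illustrate the shapes.

    import torch
    import torch.nn.functional as F

    # One common top-edge kernel (the sign convention varies by source).
    top_edge = torch.tensor([[-1., -1., -1.],
                             [ 0.,  0.,  0.],
                             [ 1.,  1.,  1.]])

    img = torch.rand(1, 1, 28, 28)      # input shape: (N, C_in, H, W)
    weight = top_edge.view(1, 1, 3, 3)  # weight shape: (C_out, C_in, kH, kW)

    # padding=1 preserves the borders; stride=2 halves the grid size.
    out = F.conv2d(img, weight, padding=1, stride=2)
    print(out.shape)                    # torch.Size([1, 1, 14, 14])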
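
For question 7, one possible answer; any task will do, and this one builds a small multiplication table:

    # 3×3 multiplication table via a nested list comprehension
    table = [[i * j for j in range(1, 4)] for i in range(1, 4)]
    # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]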
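
For question 10, a sketch of the convolution-as-matrix-multiplication view: unfold (im2col) turns each 3×3 patch into a column, so the whole convolution becomes one matrix multiply. The tensors are random and sized only to keep the example small.

    import torch
    import torch.nn.functional as F

    img = torch.rand(1, 1, 5, 5)
    weight = torch.rand(1, 1, 3, 3)

    conv_out = F.conv2d(img, weight)             # shape (1, 1, 3, 3)
    cols = F.unfold(img, kernel_size=3)          # each column is one 3×3 patch
    matmul_out = (weight.view(1, -1) @ cols).view(1, 1, 3, 3)
    print(torch.allclose(conv_out, matmul_out))  # True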
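
For questions 38 and 39, a sketch showing that a BatchNorm2d layer’s trainable parameters are a per-channel scale (gamma) and shift (beta), while the running statistics used at validation time are stored as buffers, not parameters:

    import torch.nn as nn

    bn = nn.BatchNorm2d(16)
    print([n for n, _ in bn.named_parameters()])
    # ['weight', 'bias'] -- the learnable scale and shift
    print([n for n, _ in bn.named_buffers()])
    # ['running_mean', 'running_var', 'num_batches_tracked']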

Further Research

  1. What features other than edge detectors have been used in computer vision (especially before deep learning became popular)?
  2. There are other normalization layers available in PyTorch. Try them out and see what works best. Learn about why other normalization layers have been developed, and how they differ from batch normalization.
  3. Try moving the activation function after the batch normalization layer in conv. Does it make a difference? See what you can find out about what order is recommended, and why. (A minimal sketch of both orderings follows this list.)
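
For item 3, a minimal sketch of the two orderings to compare; this is a stand-in for the chapter’s conv function, not its exact definition:

    import torch.nn as nn

    def conv_bn_act(ni, nf, ks=3):
        # one ordering: conv -> batchnorm -> activation
        return nn.Sequential(
            nn.Conv2d(ni, nf, kernel_size=ks, stride=2, padding=ks // 2),
            nn.BatchNorm2d(nf),
            nn.ReLU())

    def conv_act_bn(ni, nf, ks=3):
        # the other: conv -> activation -> batchnorm
        return nn.Sequential(
            nn.Conv2d(ni, nf, kernel_size=ks, stride=2, padding=ks // 2),
            nn.ReLU(),
            nn.BatchNorm2d(nf))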
