Module and Parameter

To create a model, we’ll need Module. To create Module, we’ll need Parameter, so let’s start there. Recall that in <> we said that the Parameter class “doesn’t actually add any functionality (other than automatically calling requires_grad_ for us). It’s only used as a ‘marker’ to show what to include in parameters.” Here’s a definition that does exactly that:

In [ ]:

    class Parameter(Tensor):
        def __new__(self, x): return Tensor._make_subclass(Parameter, x, True)
        def __init__(self, *args, **kwargs): self.requires_grad_()

The implementation here is a bit awkward: we have to define the special __new__ Python method and use the internal PyTorch method _make_subclass because, at the time of writing, PyTorch doesn’t otherwise work correctly with this kind of subclassing, nor does it provide an officially supported API to do so. This may have been fixed by the time you read this, so look on the book’s website to see if there are updated details.

Our Parameter now behaves just like a tensor, as we wanted:

In [ ]:

    Parameter(tensor(3.))

Out[ ]:

    tensor(3., requires_grad=True)

Now that we have this, we can define Module:

In [ ]:

    class Module:
        def __init__(self):
            self.hook,self.params,self.children,self._training = None,[],[],False

        def register_parameters(self, *ps): self.params += ps
        def register_modules (self, *ms): self.children += ms

        @property
        def training(self): return self._training
        @training.setter
        def training(self,v):
            self._training = v
            for m in self.children: m.training=v

        def parameters(self):
            return self.params + sum([m.parameters() for m in self.children], [])

        def __setattr__(self,k,v):
            super().__setattr__(k,v)
            if isinstance(v,Parameter): self.register_parameters(v)
            if isinstance(v,Module):    self.register_modules(v)

        def __call__(self, *args, **kwargs):
            res = self.forward(*args, **kwargs)
            if self.hook is not None: self.hook(res, args)
            return res

        def cuda(self):
            for p in self.parameters(): p.data = p.data.cuda()

The key functionality is in the definition of parameters:

    self.params + sum([m.parameters() for m in self.children], [])

This means that we can ask any Module for its parameters, and it will return them, including those of all its child modules (recursively). But how does it know what its parameters are? It’s thanks to implementing Python’s special __setattr__ method, which is called for us any time an attribute is set on an instance of the class. Our implementation includes this line:

    if isinstance(v,Parameter): self.register_parameters(v)

As you see, this is where we use our new Parameter class as a “marker”—anything of this class is added to our params.
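For instance, here’s a quick check (Demo is just an illustrative name, not part of the book’s code): a Parameter attribute is registered, while a plain tensor attribute is not:

    class Demo(Module):
        def __init__(self):
            super().__init__()
            self.a = Parameter(tensor(1.))  # a Parameter, so __setattr__ registers it
            self.b = tensor(2.)             # a plain tensor, so it isn't registered

    len(Demo().parameters())   # 1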

Python’s __call__ allows us to define what happens when our object is treated as a function; we just call forward (which doesn’t exist here, so it’ll need to be added by subclasses). After forward returns, we call the hook, if one is defined, passing it the result and the arguments. Now you can see that PyTorch hooks aren’t doing anything fancy at all; they’re just calling any hooks that have been registered.

Other than these pieces of functionality, our Module also provides a cuda method and a training attribute, which we’ll use shortly.
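Here’s a small sketch of these pieces in action (Identity is just an illustrative subclass, not part of the book’s code):

    class Identity(Module):
        def forward(self, x): return x

    parent,child = Identity(),Identity()
    parent.register_modules(child)
    parent.hook = lambda outp,inp: print('output shape:', outp.shape)
    parent(torch.ones(2,3))    # calls forward, then the hook
    parent.training = True
    child.training             # True: the setter propagated to the child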

Now we can create our first Module, which is ConvLayer:

In [ ]:

    class ConvLayer(Module):
        def __init__(self, ni, nf, stride=1, bias=True, act=True):
            super().__init__()
            self.w = Parameter(torch.zeros(nf,ni,3,3))
            self.b = Parameter(torch.zeros(nf)) if bias else None
            self.act,self.stride = act,stride
            init = nn.init.kaiming_normal_ if act else nn.init.xavier_normal_
            init(self.w)

        def forward(self, x):
            x = F.conv2d(x, self.w, self.b, stride=self.stride, padding=1)
            if self.act: x = F.relu(x)
            return x

We’re not implementing F.conv2d from scratch, since you should have already done that (using unfold) in the questionnaire in <>. Instead, we’re just creating a small class that wraps it up along with bias and weight initialization. Let’s check that it works correctly with Module.parameters:

In [ ]:

    l = ConvLayer(3, 4)
    len(l.parameters())

Out[ ]:

    2

And that we can call it (which will result in forward being called):

In [ ]:

    xbt = tfm_x(xb)
    r = l(xbt)
    r.shape

Out[ ]:

    torch.Size([128, 4, 64, 64])
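By the way, if you’d like a reminder of roughly what F.conv2d is doing under the hood, here’s a sketch of the unfold-based approach from that questionnaire exercise (conv2d_unfold is a hypothetical helper used only for this illustration; it assumes a plain, non-dilated, non-grouped convolution):

    def conv2d_unfold(x, w, b=None, stride=1, padding=1):
        "A slow stand-in for F.conv2d, built from unfold plus a matrix multiply"
        bs,ci,h,wd = x.shape
        nf,_,kh,kw = w.shape
        # extract every kh-by-kw patch as a column: (bs, ci*kh*kw, n_patches)
        patches = F.unfold(x, (kh,kw), stride=stride, padding=padding)
        # multiply by the flattened filters: (nf, ci*kh*kw) @ (bs, ci*kh*kw, n_patches)
        res = w.view(nf,-1) @ patches
        if b is not None: res = res + b[:,None]
        oh = (h  + 2*padding - kh)//stride + 1
        ow = (wd + 2*padding - kw)//stride + 1
        return res.view(bs, nf, oh, ow)

    # this should (approximately) match what our layer computes before the ReLU
    torch.allclose(conv2d_unfold(xbt, l.w, l.b),
                   F.conv2d(xbt, l.w, l.b, padding=1), atol=1e-5)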

In the same way, we can implement Linear:

In [ ]:

    class Linear(Module):
        def __init__(self, ni, nf):
            super().__init__()
            self.w = Parameter(torch.zeros(nf,ni))
            self.b = Parameter(torch.zeros(nf))
            nn.init.xavier_normal_(self.w)

        def forward(self, x): return x@self.w.t() + self.b

and test if it works:

In [ ]:

    l = Linear(4,2)
    r = l(torch.ones(3,4))
    r.shape

Out[ ]:

    torch.Size([3, 2])

Let’s also create a testing module to check that if we include multiple parameters as attributes, they are all correctly registered:

In [ ]:

    class T(Module):
        def __init__(self):
            super().__init__()
            self.c,self.l = ConvLayer(3,4),Linear(4,2)

Since we have a conv layer and a linear layer, each of which has weights and biases, we’d expect four parameters in total:

In [ ]:

    t = T()
    len(t.parameters())

Out[ ]:

    4

We should also find that calling cuda on this class puts all these parameters on the GPU:

In [ ]:

    t.cuda()
    t.l.w.device

Out[ ]:

    device(type='cuda', index=5)

We can now use those pieces to create a CNN.

Simple CNN

As we’ve seen, a Sequential class makes many architectures easier to implement, so let’s make one:

In [ ]:

    class Sequential(Module):
        def __init__(self, *layers):
            super().__init__()
            self.layers = layers
            self.register_modules(*layers)

        def forward(self, x):
            for l in self.layers: x = l(x)
            return x

The forward method here just calls each layer in turn. Note that we have to use the register_modules method we defined in Module, since otherwise the contents of layers won’t appear in parameters.
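To see why, here’s a quick illustration (BrokenSequential is just a hypothetical name for this broken variant): if we stored the layers tuple without calling register_modules, __setattr__ wouldn’t register anything, because a tuple is neither a Parameter nor a Module:

    class BrokenSequential(Module):
        def __init__(self, *layers):
            super().__init__()
            self.layers = layers   # stored, but not registered
        def forward(self, x):
            for l in self.layers: x = l(x)
            return x

    len(BrokenSequential(Linear(4,2)).parameters())   # 0, instead of the expected 2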

important: All The Code is Here: Remember that we’re not using any PyTorch functionality for modules here; we’re defining everything ourselves. So if you’re not sure what register_modules does, or why it’s needed, have another look at our code for Module to see what we wrote!

We can create a simplified AdaptivePool that only handles pooling to a 1×1 output, and flattens it as well, by just using mean:

In [ ]:

    class AdaptivePool(Module):
        def forward(self, x): return x.mean((2,3))
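As a quick illustration (not from the book’s notebook), a batch of 4×4 feature maps with 128 channels is reduced to one vector per image:

    AdaptivePool()(torch.randn(2,128,4,4)).shape   # torch.Size([2, 128])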

That’s enough for us to create a CNN!

In [ ]:

    def simple_cnn():
        return Sequential(
            ConvLayer(3 ,16 ,stride=2), #32
            ConvLayer(16,32 ,stride=2), #16
            ConvLayer(32,64 ,stride=2), # 8
            ConvLayer(64,128,stride=2), # 4
            AdaptivePool(),
            Linear(128, 10)
        )

Let’s see if our parameters are all being registered correctly:

In [ ]:

    m = simple_cnn()
    len(m.parameters())

Out[ ]:

    10

Now we can try adding a hook. Note that we’ve only left room for one hook in Module; you could make it a list, or use something like Pipeline to run a few as a single function (there’s a rough sketch of that idea after this example):

In [ ]:

    def print_stats(outp, inp): print(outp.mean().item(), outp.std().item())
    for i in range(4): m.layers[i].hook = print_stats
    r = m(xbt)
    r.shape

    0.5239089727401733 0.8776043057441711
    0.43470510840415955 0.8347987532615662
    0.4357188045978546 0.7621666193008423
    0.46562111377716064 0.7416611313819885

Out[ ]:

    torch.Size([128, 10])
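And here’s the sketch promised above of how several hooks could share that single slot, simply by composing them into one function (compose_hooks is just an illustrative name, not something we’ll use later):

    def compose_hooks(*hooks):
        "Combine several hook functions into a single hook"
        def _inner(outp, inp):
            for h in hooks: h(outp, inp)
        return _inner

    m.layers[0].hook = compose_hooks(print_stats, lambda outp,inp: print(outp.shape))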

We have data and model. Now we need a loss function.