Loss

We’ve already seen how to define “negative log likelihood”:

In [ ]:

  1. def nll(input, target): return -input[range(target.shape[0]), target].mean()
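
To see what the indexing input[range(target.shape[0]), target] is doing, here is a minimal sketch with made-up numbers (the tensors below are purely illustrative): for each row of input it picks out the column given by the corresponding target, i.e. the log probability the model assigned to the correct class, and the loss is the negative mean of those values.

    import torch

    # hypothetical log probabilities for 2 samples over 3 classes
    log_probs = torch.tensor([[-0.1, -2.3, -3.0],
                              [-1.2, -0.4, -2.1]])
    targets = torch.tensor([0, 2])  # hypothetical correct classes

    picked = log_probs[range(targets.shape[0]), targets]  # tensor([-0.1000, -2.1000])
    loss = -picked.mean()                                  # tensor(1.1000)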

Well actually, there’s no log here: we’re using the same definition as PyTorch’s nll_loss, which expects to be passed log probabilities rather than probabilities. That means we need to combine the log with the softmax:

In [ ]:

  1. def log_softmax(x): return (x.exp()/(x.exp().sum(-1,keepdim=True))).log()
  2. sm = log_softmax(r); sm[0][0]

Out[ ]:

  1. tensor(-1.2790, grad_fn=<SelectBackward>)

Combining these gives us our cross-entropy loss:

In [ ]:

  1. loss = nll(sm, yb)
  2. loss

Out[ ]:

  1. tensor(2.5666, grad_fn=<NegBackward>)
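
Since this is the same definition PyTorch uses, we can sanity-check our nll against F.nll_loss, which likewise expects to be given log probabilities. A minimal sketch with made-up data, assuming the nll defined above is in scope:

    import torch
    import torch.nn.functional as F

    # hypothetical log probabilities and labels, just for the comparison
    lp = torch.randn(4, 10).log_softmax(-1)
    tgt = torch.randint(0, 10, (4,))

    assert torch.allclose(nll(lp, tgt), F.nll_loss(lp, tgt))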

Note that the formula:

\log \left ( \frac{a}{b} \right ) = \log(a) - \log(b)

gives a simplification when we compute the log softmax, which was previously defined as (x.exp()/(x.exp().sum(-1,keepdim=True))).log():

In [ ]:

  1. def log_softmax(x): return x - x.exp().sum(-1,keepdim=True).log()
  2. sm = log_softmax(r); sm[0][0]

Out[ ]:

  1. tensor(-1.2790, grad_fn=<SelectBackward>)
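
Spelling that simplification out: applying the rule above with $a = e^{x_{i}}$ and $b = \sum_{j} e^{x_{j}}$ gives

\log \left ( \frac{e^{x_{i}}}{\sum_{j} e^{x_{j}}} \right ) = \log \left ( e^{x_{i}} \right ) - \log \left ( \sum_{j} e^{x_{j}} \right ) = x_{i} - \log \left ( \sum_{j} e^{x_{j}} \right )

which is exactly what the new version computes.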

Then, there is a more stable way to compute the log of the sum of exponentials, called the LogSumExp trick. The idea is to use the following formula:

\log \left ( \sum_{j=1}^{n} e^{x_{j}} \right ) = \log \left ( e^{a} \sum_{j=1}^{n} e^{x_{j}-a} \right ) = a + \log \left ( \sum_{j=1}^{n} e^{x_{j}-a} \right )

where $a$ is the maximum of the $x_{j}$. Subtracting it before exponentiating keeps the exponentials from overflowing, since the largest term becomes $e^{0} = 1$.

Here’s the same thing in code:

In [ ]:

  1. x = torch.rand(5)
  2. a = x.max()
  3. x.exp().sum().log() == a + (x-a).exp().sum().log()

Out[ ]:

  1. tensor(True)
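
That stability matters once the values get large: the naive version overflows inside exp, while the shifted version does not. A quick sketch (the values are made up, chosen only to overflow float32):

    import torch

    x = torch.tensor([100., 100.])
    x.exp().sum().log()              # tensor(inf) -- exp(100) overflows float32
    a = x.max()
    a + (x - a).exp().sum().log()    # tensor(100.6931)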

We’ll put that into a function:

In [ ]:

  1. def logsumexp(x):
  2.     m = x.max(-1)[0]
  3.     return m + (x-m[:,None]).exp().sum(-1).log()
  4. logsumexp(r)[0]

Out[ ]:

  1. tensor(3.9784, grad_fn=<SelectBackward>)

So we can use it for our log_softmax function. In fact, PyTorch already provides this as the logsumexp method, so we can call that directly:

In [ ]:

  1. def log_softmax(x): return x - x.logsumexp(-1,keepdim=True)

Which gives the same result as before:

In [ ]:

  1. sm = log_softmax(r); sm[0][0]

Out[ ]:

  1. tensor(-1.2790, grad_fn=<SelectBackward>)

We can use these to create cross_entropy:

In [ ]:

  1. def cross_entropy(preds, yb): return nll(log_softmax(preds), yb)
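
This is just what PyTorch’s F.cross_entropy computes directly from the raw outputs. A minimal check with made-up data, assuming the definitions above are in scope:

    import torch
    import torch.nn.functional as F

    preds = torch.randn(4, 10)            # hypothetical model outputs: 4 samples, 10 classes
    targets = torch.randint(0, 10, (4,))  # hypothetical integer labels

    assert torch.allclose(cross_entropy(preds, targets), F.cross_entropy(preds, targets))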

Let’s now combine all these pieces to create a Learner.