required libraries
from fastai.vision.all import *
torch.random.manual_seed(42);
Seongbin Park
August 6, 2022
Cross entropy is a loss function that can be used for multi-class classification with neural networks. Chapter 5 of the fast.ai textbook outlines its use for binary classification, so in this post we will take a look at classification with 3 classes.
The softmax function ensures two things:

- activations are all between 0 and 1
- activations sum to 1
For multi-class classification, we need one activation per class in the final layer. Each activation indicates the relative confidence that its class is the true label. Therefore, we can get the predicted probabilities that each class is the true label by applying the softmax function to the final layer's activations.
Given \(C\) total classes, let \(x_k\) denote the activation for class \(k\). Then the softmax activation for an arbitrary class \(c\) is equal to
\[\frac{e^{x_c}}{\sum^C_{k=1}e^{x_k}}.\]
In Python code, this would be:
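A minimal sketch, assuming a 2D tensor of activations (one row per item, one column per class):

```python
def softmax(x):
    # exponentiate every activation, then normalise by the per-row sum
    # so that each row of probabilities adds up to 1
    return torch.exp(x) / torch.exp(x).sum(dim=-1, keepdim=True)
```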
Note that the code version returns a tensor/array of softmax activations.
For demonstration purposes, let's first create a set of activations using torch.randn, assuming we have 6 objects to classify into 3 classes.
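Something along these lines (the scale factor of 2 is an assumption, used only to spread the values out a bit):

```python
# 6 items, 3 classes
acts = torch.randn((6, 3)) * 2
acts
```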
tensor([[ 3.8538, 2.9746, -0.9948],
[ 0.8792, -1.5163, 2.1566],
[ 1.6016, 3.3612, 0.7117],
[-1.3732, 1.2209, 2.6695],
[-0.4632, 0.0835, -0.5032],
[ 1.7197, -0.6195, -0.7914]])
Let’s also set our target labels:
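The labels below use the tensor helper exported by fastai, and they are the labels consistent with the probabilities selected later on:

```python
# one integer class label (0, 1, or 2) per item
targ = tensor([0, 1, 0, 2, 2, 0])
```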
To take the softmax of our initial (random) activations, we need to pass acts into torch.softmax:
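Something like:

```python
# softmax across the class dimension (dim=1)
sm_acts = torch.softmax(acts, dim=1)
sm_acts
```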
tensor([[0.7028, 0.2917, 0.0055],
[0.2137, 0.0195, 0.7668],
[0.1385, 0.8046, 0.0569],
[0.0140, 0.1876, 0.7984],
[0.2711, 0.4684, 0.2605],
[0.8492, 0.0819, 0.0689]])
Perfect! We can check that each row adds up to 1 as expected.
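For example, a quick sanity check along the class dimension:

```python
# each row of softmax activations should sum to (approximately) 1
sm_acts.sum(dim=1)
```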
To calculate our loss, for each item of targ, we need to select the appropriate column of sm_acts using tensor indexing.
F.nll_loss does the same thing, but flips the sign of each number in the tensor. PyTorch defaults to taking the mean of the losses; to prevent this, we can pass reduction='none' as a parameter.
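A sketch of that step, keeping the result name used below (the idx helper is just for illustration):

```python
idx = range(6)
# for each row, pick the softmax probability of its target class
result = sm_acts[idx, targ]
# F.nll_loss(sm_acts, targ, reduction='none') returns these same values
# with their signs flipped
result
```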
tensor([0.7028, 0.0195, 0.1385, 0.7984, 0.2605, 0.8492])
We take the (natural) logarithm of result for two reasons:

- it prevents under/overflow when performing mathematical operations
- differences between small numbers are amplified
In our case, result reflects the predicted probability of the correct label, so when the prediction is "good" (closer to 1), we want our loss function to return a small value (and vice versa). We can achieve this by taking the negative of the log:
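Something like the following (the loss name is just for illustration):

```python
# negative log of the predicted probability of the correct class;
# these values match the tensors shown in the cells below
loss = -torch.log(result)
loss
```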
And there we go! We just found the cross entropy loss for our example.
We can simplify the code above by using log_softmax followed by nll_loss:
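Roughly:

```python
# log_softmax then nll_loss, keeping per-item losses (no averaging)
F.nll_loss(torch.log_softmax(acts, dim=1), targ, reduction='none')
```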
tensor([0.3527, 3.9384, 1.9770, 0.2251, 1.3451, 0.1635])
In practice, this is exactly what nn.CrossEntropyLoss does:
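For example:

```python
# instantiate the loss class with reduction='none' for per-item losses
loss_func = nn.CrossEntropyLoss(reduction='none')
loss_func(acts, targ)
```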
tensor([0.3527, 3.9384, 1.9770, 0.2251, 1.3451, 0.1635])
The output loss tensors for all three approaches are equivalent as expected!