Categorical crossentropy is a loss function used for single-label classification, where each data point belongs to exactly one class.
The block before the Target block must use the activation function Softmax.
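Softmax is required because it turns the raw outputs of the last layer into a probability distribution: every value is positive and the values sum to 1. The following is a minimal NumPy sketch of that idea, not the platform's own implementation; the function name `softmax` and the example scores are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    logits = np.asarray(logits, dtype=float)
    # Subtracting the maximum does not change the result but avoids overflow in exp.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

# Hypothetical raw scores for three classes.
probs = softmax([2.0, 1.0, 0.1])
print(probs, probs.sum())
```

The class with the largest raw score also gets the largest probability, so the ranking of the classes is preserved.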
When to use categorical crossentropy
Use categorical crossentropy in classification problems where only one result can be correct.
Example: In the MNIST problem, each image shows exactly one of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, or 9. A model trained with categorical crossentropy outputs, for each image, the probability that it shows, for example, a 4 or a 9.
Categorical crossentropy math
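$$\text{Loss} = -\sum_{i=1}^{N} y_i \cdot \log \hat{y}_i$$
where N is the number of classes, y_i is the target value for class i, and ŷ_i is the predicted value for class i.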
Categorical crossentropy will compare the distribution of the predictions (the activations in the output layer, one for each class) with the true distribution, where the probability of the true class is set to 1 and 0 for the other classes. To put it in a different way, the true class is represented as a one-hot encoded vector, and the closer the model’s outputs are to that vector, the lower the loss.
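To make this concrete, here is a small NumPy sketch that assumes the standard definition of the loss, the negative sum over classes of the target value times the log of the predicted value. The function name `categorical_crossentropy`, the `eps` clipping constant, and the example vectors are illustrative assumptions, not part of any specific library.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-12):
    """Loss for one example: -sum(y_true * log(y_pred))."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 so that log() stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    return float(-np.sum(y_true * np.log(y_pred)))

# One-hot target: the true class is the third one.
y_true = [0, 0, 1]

# A prediction close to the target and one far from it.
close = [0.05, 0.05, 0.90]
far = [0.60, 0.30, 0.10]

loss_close = categorical_crossentropy(y_true, close)  # about 0.105
loss_far = categorical_crossentropy(y_true, far)      # about 2.303
print(loss_close, loss_far)
```

Because the target is one-hot, only the predicted probability of the true class contributes to the sum, so the loss reduces to the negative log of that single probability.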
You can read more on how to use categorical crossentropy in our cheat sheets for single-label image classification.
We also have a topic about categorical crossentropy in our Glossary.