Categorical crossentropy is a loss function that is used for single label categorization. This is when only one category is applicable for each data point. In other words, an example can belong to one class only.
The block before the Target block must use the activation function Softmax.
Use categorical crossentropy in classification problems where only one result can be correct.
Example: In the MNIST problem where you have images of the numbers 0,1, 2, 3, 4, 5, 6, 7, 8, and 9. Categorical crossentropy gives the probability that an image of a number is, for example, a 4 or a 9.
where ŷ is the predicted value.
Categorical crossentropy will compare the distribution of the predictions (the activations in the output layer, one for each class) with the true distribution, where the probability of the true class is set to 1 and 0 for the other classes. To put it in a different way, the true class is represented as a one-hot encoded vector, and the closer the model’s outputs are to that vector, the lower the loss.
We also have a topic about categorical crossentropy in our Glossary.
Stay in the know by signing up for occasional emails with tips, tricks, deep learning insights, product updates, event news and webinar invitations.
We promise not to spam you or share your email with any third party. You can change your preferences at any time. See our privacy policies.
Please check your email inbox account to confirm, set, or update your communication preferences.