# Categorical crossentropy

Categorical crossentropy is a loss function that is used for single label categorization. This is when only one category is applicable for each data point. In other words, an example can belong to one class only.

Note
The block before the Target block must use the activation function ​Softmax.

## When to use categorical crossentropy​​

Use categorical crossentropy in classification problems where only one result can be correct.

​​Example:​ In the ​MNIST​​ problem where you have images of the numbers 0,1, 2, 3, 4, 5, 6, 7, 8, and 9. Categorical crossentropy gives the probability that an image of a number is, for example, a 4 or a 9.

## Categorical crossentropy math

where ŷ is the predicted value.

Categorical crossentropy will compare the distribution of the predictions (the activations in the output layer, one for each class) with the true distribution, where the probability of the true class is set to 1 and 0 for the other classes. To put it in a different way, the true class is represented as a one-hot encoded vector, and the closer the model’s outputs are to that vector, the lower the loss.