The softmax function emphasizes the largest input values and suppresses values that are significantly below the maximum. This effect weakens when the inputs are all small in magnitude: their exponentials are then close to one another, so the output is nearly uniform. Softmax normalizes its outputs so that they sum to 1, which allows them to be treated directly as probabilities over the output classes.

It is often used in the final layer of a classifier model, with categorical cross-entropy as the loss function.
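
As a rough illustration of this pairing, the sketch below applies softmax to the raw scores of a final layer and then computes the categorical cross-entropy against a one-hot target. The function names, the `logits` values, and the class indices are assumptions made up for this example, not part of any particular library.

```python
import math

def softmax(logits):
    # Shift by the maximum for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_crossentropy(probs, true_index):
    # With a one-hot target, the loss reduces to the negative log
    # of the probability assigned to the true class.
    return -math.log(probs[true_index])

logits = [2.0, 1.0, 0.1]  # hypothetical raw scores from the final layer
probs = softmax(logits)

loss_if_class_0 = categorical_crossentropy(probs, 0)  # true class has the top score
loss_if_class_2 = categorical_crossentropy(probs, 2)  # true class has the lowest score
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the true class receives a high probability and grows as that probability shrinks, which is what makes the combination a natural training objective for classifiers.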


\[\sigma(x_j) = \frac{e^{x_j}}{\sum_{k=0}^{K} e^{x_k}}\]
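
The formula can be sketched in a few lines of plain Python. This is a minimal example rather than a reference implementation: the function name `softmax` and the sample inputs are chosen here for illustration, and the subtraction of the maximum is a common numerical-stability trick that does not change the result.

```python
import math

def softmax(xs):
    """Return the softmax of a list of floats as a list of probabilities."""
    # Subtracting the maximum leaves the result mathematically unchanged
    # but keeps exp() from overflowing on large inputs.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# One input clearly above the rest: it receives most of the probability mass.
peaked = softmax([1.0, 2.0, 5.0])
print(peaked, sum(peaked))

# Inputs that are all small and close together: the output is nearly uniform.
flat = softmax([0.01, 0.02, 0.05])
print(flat, sum(flat))
```

Comparing the two printed results shows both properties described above: the outputs always sum to 1 (up to floating-point rounding), and the largest value dominates only when it is well separated from the others.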
Figure 1. Softmax diagram