Loss functions

The loss function is a critical part of model training: it quantifies how well a model is performing a task by calculating a single number, the loss, from the model output and the desired target.
If the model predictions are totally wrong, the loss will be a high number. If they’re pretty good, it will be close to zero.
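For example, with the mean squared error (one of the loss functions described below), a bad set of predictions yields a large loss and a good one yields a loss near zero. A minimal NumPy sketch, not the platform's internal implementation:

    import numpy as np

    def mean_squared_error(y_pred, y_true):
        """Average of the squared differences between predictions and targets."""
        return np.mean((y_pred - y_true) ** 2)

    y_true = np.array([1.0, 0.0, 2.0])   # desired targets
    good   = np.array([0.9, 0.1, 2.1])   # predictions close to the targets
    bad    = np.array([5.0, -3.0, 7.0])  # predictions far from the targets

    print(mean_squared_error(good, y_true))  # 0.01  -> close to zero
    print(mean_squared_error(bad, y_true))   # ~16.7 -> a high number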

You select the loss function you want to use in the parameters of the Target block.

During training, the optimizer tunes the model to minimize the loss on the training examples. After at least one epoch has run, the loss and metrics plot in the Evaluation view shows the average loss over all the training examples, as well as over the validation examples.
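Schematically, each of those plotted values is an average of per-example losses, as in this sketch (the names here are illustrative, not the platform's API):

    import numpy as np

    def average_loss(predict, inputs, targets, loss_fn):
        """Mean of the per-example loss over a whole dataset."""
        return np.mean([loss_fn(predict(x), y) for x, y in zip(inputs, targets)])

    # Toy stand-ins: a "model" that doubles its input and a squared-error loss.
    predict = lambda x: 2.0 * x
    loss_fn = lambda pred, target: (pred - target) ** 2

    train_x, train_y = np.array([1.0, 2.0]), np.array([2.0, 4.5])
    val_x, val_y     = np.array([3.0]),      np.array([6.5])

    # After each epoch, both averages are added to the loss plot.
    print(average_loss(predict, train_x, train_y, loss_fn))  # training loss
    print(average_loss(predict, val_x, val_y, loss_fn))      # validation loss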

Choosing a loss function

If training a model is like rolling a ball down a hill, the loss function is the profile of that hill: it determines how steep the slope is at every point, and where the lowest point lies.
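In code, "rolling the ball down the hill" is gradient descent: at each step, the optimizer moves the model parameters against the slope of the loss. A one-dimensional sketch, using the made-up loss L(w) = (w - 3)²:

    # Gradient descent on the "hill" L(w) = (w - 3) ** 2.
    # The derivative dL/dw = 2 * (w - 3) is the slope of the hill at w;
    # stepping against it rolls the ball toward the lowest point, w = 3.
    w = 0.0             # starting position of the ball
    learning_rate = 0.1

    for step in range(50):
        gradient = 2 * (w - 3)         # slope of the loss at the current w
        w -= learning_rate * gradient  # roll a little way downhill

    print(w)  # very close to 3.0, the bottom of the hill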

Figure 1. Plots of various loss functions as a function of the relation between predicted and target values. The loss is closest to 0 when the predicted value equals the target.

All loss functions are minimal when the model prediction equals its target.
However, different loss functions change how the model behaves when strict equality cannot be achieved, e.g., when the data is noisy, the input doesn't carry enough information, or the model isn't perfect.
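For instance, when the targets contain a noisy outlier that no prediction can match, the mean squared error penalizes the large deviation much more heavily than the mean absolute error, so the two losses pull the model toward different behaviors. A NumPy sketch:

    import numpy as np

    y_true = np.array([1.0, 1.0, 1.0, 10.0])  # the last target is a noisy outlier
    y_pred = np.array([1.0, 1.0, 1.0, 1.0])   # the model predicts the "clean" value

    mse = np.mean((y_pred - y_true) ** 2)     # 81 / 4 = 20.25
    mae = np.mean(np.abs(y_pred - y_true))    # 9 / 4  = 2.25

    # Squaring amplifies the single large error: a model trained with MSE is
    # pulled much harder toward fitting the outlier than one trained with MAE.
    print(mse, mae)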

In practice, the choice of loss function is mostly directed by the task that the model needs to solve. The following loss functions are available on the platform (see the code sketch after the list):

Classification

  • Single label: Categorical crossentropy
  • Multi-label: Binary crossentropy, Squared hinge

Regression

  • Continuous values: Mean squared error, Mean absolute error, Mean squared logarithmic error
  • Discrete values: Poisson
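
The platform computes the selected loss for you, but as a rough illustration, the same loss functions exist under matching names in Keras. A sketch assuming TensorFlow is installed; this is not the platform's own implementation:

    import tensorflow as tf

    # The losses listed above, by their Keras names.
    losses = {
        "Categorical crossentropy": tf.keras.losses.CategoricalCrossentropy(),
        "Binary crossentropy": tf.keras.losses.BinaryCrossentropy(),
        "Squared hinge": tf.keras.losses.SquaredHinge(),
        "Mean squared error": tf.keras.losses.MeanSquaredError(),
        "Mean absolute error": tf.keras.losses.MeanAbsoluteError(),
        "Mean squared logarithmic error": tf.keras.losses.MeanSquaredLogarithmicError(),
        "Poisson": tf.keras.losses.Poisson(),
    }

    y_true = [[0.0, 1.0], [1.0, 0.0]]  # targets
    y_pred = [[0.1, 0.9], [0.8, 0.2]]  # model predictions

    for name, loss in losses.items():
        print(f"{name}: {loss(y_true, y_pred).numpy():.4f}")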

Compatibility with activation functions

Some loss functions can only be calculated for a limited range of model outputs.
You can ensure that the model output is always in the correct range by using an appropriate activation function on the last block of the model.

The platform will warn you if the activation function of the last block is incompatible with the loss function selected in the Target block.

Example

The categorical crossentropy loss function needs to calculate the logarithm of the model prediction, which is only possible if the model output is strictly positive.

  • The TanH activation outputs values between -1 and 1, which makes it incompatible with the categorical crossentropy.

  • The sigmoid activation outputs values between 0 and 1, which makes it a perfect match for the categorical crossentropy!
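
The incompatibility is easy to reproduce in a NumPy sketch: the logarithm of a negative TanH output is undefined (NaN), while sigmoid outputs stay strictly positive, where the logarithm is always defined.

    import numpy as np

    z = np.array([-2.0, 0.5, 1.5])       # raw outputs of the last block

    tanh_out    = np.tanh(z)             # in (-1, 1): can be negative
    sigmoid_out = 1 / (1 + np.exp(-z))   # in (0, 1): strictly positive

    print(np.log(tanh_out))     # nan for the negative entry: crossentropy breaks
    print(np.log(sigmoid_out))  # finite everywhere: crossentropy is well defined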