Regression loss metrics
MAE — mean absolute error
Mean absolute error (MAE) is a loss function used for regression. The loss is the mean of the absolute differences between true and predicted values, so deviations in either direction from the true value are treated the same way.
Example: If the true value is 50 and your model predicts 55, the absolute error for that data point is 5, and if the model had predicted 45, the error would also be 5. The MAE for the whole dataset would be the mean value of all such errors in the dataset.
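Written as a formula:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
where y_i is the true value, ŷ_i is the predicted value, and n is the number of data points.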
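As a minimal sketch, MAE can be computed in plain Python as below. It assumes the true and predicted values are given as equal-length lists of numbers; the function name `mae` is our own and not taken from any library.

```python
def mae(y_true, y_pred):
    """Mean absolute error: the mean of |y - y_hat| over all data points."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# The worked example: true value 50, predictions 55 and 45.
print(mae([50], [55]))          # 5.0
print(mae([50], [45]))          # 5.0
print(mae([50, 50], [55, 45]))  # 5.0
```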
Why use MAE
MAE is less sensitive to outliers than MSE, and given several examples with the same input feature values, the optimal prediction is their median target value. Compare this with mean squared error (MSE), where the optimal prediction is the mean.
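The median/mean claim can be checked numerically with a brute-force search over constant predictions. The target values below are made up for illustration (100.0 plays the role of an outlier), and the grid search is only a sketch, not how a model would actually be trained.

```python
import statistics

# Hypothetical targets for several examples that share the same input features.
targets = [10.0, 11.0, 12.0, 13.0, 100.0]


def mae_of_constant(c, ys):
    # MAE if every example is predicted as the constant c.
    return sum(abs(y - c) for y in ys) / len(ys)


def mse_of_constant(c, ys):
    # MSE if every example is predicted as the constant c.
    return sum((y - c) ** 2 for y in ys) / len(ys)


# Candidate constant predictions from 0.0 to 110.0 in steps of 0.01.
candidates = [i / 100 for i in range(0, 11001)]
best_mae = min(candidates, key=lambda c: mae_of_constant(c, targets))
best_mse = min(candidates, key=lambda c: mse_of_constant(c, targets))

print(best_mae, statistics.median(targets))  # 12.0 12.0
print(best_mse, statistics.mean(targets))    # 29.2 29.2
```

The single outlier drags the MSE-optimal prediction far away from the bulk of the data, while the MAE-optimal prediction stays at the median.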
When to use MAE
Use MAE when you are doing regression and don’t want outliers to play a big role. It can also be useful if you know that your distribution is multimodal, and it’s desirable to have predictions at one of the modes, rather than at the mean of them.
When doing image reconstruction, MAE encourages less blurry images than MSE does. This is used, for example, in the paper Image-to-Image Translation with Conditional Adversarial Networks by Isola et al.
MAPE — mean absolute percentage error
The MAPE measures the mean absolute percentage error, expressed in percent. The percentage error is a relative error: the absolute difference between the true and predicted values, divided by the true value.
Example: If the true value is 50 and your model predicts 55, the absolute relative error for that data point is 10% (deviation from true value / true value = 5/50). If it had predicted 45, the error would also have been 5/50 = 10%, since MAPE uses the absolute error (no negative values).
The MAPE over the whole dataset is then the mean of all such errors:
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$$
where y_i is the true value, ŷ_i is the predicted value, and n is the number of data points.
One weakness of MAPE is that it is undefined when the true value equals zero, because you need to divide by the true value. Thus, if you have a regression problem where many of the true values are zero, you should probably look for another metric.
MAPE also tends to be biased in favor of small predictions: assuming positive true values and non-negative predictions, a prediction smaller than the true value can never have an error over 100%, whereas a prediction that is too large can have an arbitrarily large error.
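A minimal sketch of MAPE in plain Python, assuming equal-length lists of numbers. The function name `mape` is our own, and raising an error on a zero true value is one possible design choice; other implementations may handle zeros differently. The last two calls illustrate the asymmetry between too-small and too-large predictions.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Raises ValueError if any true value is zero, since the relative
    error |y - y_hat| / |y| is undefined there.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(y == 0 for y in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    relative_errors = [abs((y - y_hat) / y) for y, y_hat in zip(y_true, y_pred)]
    return 100 * sum(relative_errors) / len(relative_errors)


# Worked example from the text: true value 50, predictions 55 and 45.
print(mape([50], [55]))  # 10.0
print(mape([50], [45]))  # 10.0

# Asymmetry: a non-negative prediction that undershoots is capped at 100%,
# while one that overshoots has no upper bound.
print(mape([50], [0]))    # 100.0
print(mape([50], [150]))  # 200.0
```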
MSE — mean squared error
MSE is the most commonly used loss function for regression. The loss is the mean over the data of the squared differences between true and predicted values, or written as a formula:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where y_i is the true value, ŷ_i is the predicted value, and n is the number of data points.
Like MAE, MSE treats deviations in either direction the same way. One difference between MSE and MAE is that MSE punishes large errors more heavily, because it squares all the errors. Thus, if large errors are particularly undesirable in a specific task, it may be worth using MSE instead of MAE.
Example: If the true value is 50 and your model predicts 55, the squared error for that data point is (55-50)^2 = 25, and if the model had predicted 45, the squared error would also be (45-50)^2 = 25. The MSE for the whole dataset would be the mean value of all such errors in the dataset.
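The sketch below computes MSE in plain Python and contrasts it with MAE on two hypothetical sets of predictions that have the same MAE: one spreads the error over four points, the other concentrates it in a single point. The function names `mae` and `mse` are our own.

```python
def _check(y_true, y_pred):
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean absolute error: the mean of |y - y_hat|."""
    _check(y_true, y_pred)
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: the mean of (y - y_hat)^2."""
    _check(y_true, y_pred)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Worked example from the text.
print(mse([50], [55]))  # 25.0

# Hypothetical data: same total absolute error, distributed differently.
y_true = [50, 50, 50, 50]
spread = [51, 49, 51, 49]        # four errors of size 1
concentrated = [50, 50, 50, 54]  # one error of size 4

print(mae(y_true, spread), mse(y_true, spread))              # 1.0 1.0
print(mae(y_true, concentrated), mse(y_true, concentrated))  # 1.0 4.0
```

MAE cannot tell the two cases apart, while MSE is four times larger when the error is concentrated in a single large deviation.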
Minimizing MSE is equivalent to maximizing the likelihood of the data under the assumption that the target, conditioned on the input, comes from a normal distribution with constant variance.
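A short derivation of this equivalence, assuming the data points are independent and each target is drawn as y_i ~ N(ŷ_i, σ²), where ŷ_i is the model's prediction for input x_i and σ² is a fixed variance that does not depend on the model:

$$
\begin{aligned}
-\log p(y_1,\dots,y_n \mid x_1,\dots,x_n) &= \sum_{i=1}^{n}\left[\frac{(y_i-\hat{y}_i)^2}{2\sigma^2} + \frac{1}{2}\log\left(2\pi\sigma^2\right)\right] \\
&= \frac{n}{2\sigma^2}\,\mathrm{MSE} + \frac{n}{2}\log\left(2\pi\sigma^2\right)
\end{aligned}
$$

with MSE = (1/n) Σ (y_i − ŷ_i)². Since n and σ² are constants, the negative log-likelihood is a positively scaled MSE plus a constant, so the predictions that minimize MSE are exactly those that maximize the likelihood.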
Why use MSE
MSE is sensitive to outliers, and given several examples with the same input feature values, the optimal prediction is their mean target value. Compare this with mean absolute error (MAE), where the optimal prediction is the median. Thus, MSE is a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it’s important to penalize large errors heavily.
When to use MSE
Use MSE when you are doing regression, believe that your target, conditioned on the input, is normally distributed, and want large errors to be penalized significantly (quadratically) more than small ones.
Example: You want to predict future house prices. The price is a continuous value, so this is a regression problem. MSE can be used as the loss function here, as it is a reasonable assumption that house prices, conditioned on the input features, are normally distributed.