MSE / Mean squared error
The mean squared error (MSE) is the average of the squared differences between the true and predicted values. Like MAE, the MSE treats deviations in either direction the same way. One difference between MSE and MAE is that MSE tends to punish large errors more, because it squares each error before averaging. Thus, if large errors are particularly undesirable in a specific task, it may be worth using MSE instead of MAE.
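To make the difference concrete, here is a minimal Python sketch comparing two hypothetical models on made-up values. Both have the same MAE, but one spreads its error over many moderate mistakes while the other makes a single large mistake. The helper names `mae` and `mse` are illustrative, not from any library.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true values.
y_true = [50, 50, 50, 50]
# Model A: four moderate errors of size 5.
pred_a = [55, 45, 55, 45]
# Model B: three perfect predictions and one large error of size 20.
pred_b = [50, 50, 50, 70]

print(mae(y_true, pred_a), mae(y_true, pred_b))  # 5.0 5.0
print(mse(y_true, pred_a), mse(y_true, pred_b))  # 25.0 100.0
```

MAE rates the two models as equally good, whereas MSE rates model B four times worse because its single error of 20 is squared to 400.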
Example: If the true value is 50 and your model predicts 55, the squared error for that data point is (55-50)^2 = 25, and if the model had predicted 45, the squared error would also be (45-50)^2 = 25. The MSE for the whole dataset is the mean of all such squared errors in the dataset.
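MSE is the most commonly used loss function for regression. The loss is the mean, over the observed data, of the squared differences between true and predicted values. Written as a formula:
MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
where N is the number of data points, y_i is the true value, and ŷ_i is the predicted value.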
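A direct implementation of MSE as the mean of squared differences between true and predicted values could look like the sketch below. The function name `mse` and its input checks are assumptions made for illustration. In practice, libraries already provide this metric, for example scikit-learn's `sklearn.metrics.mean_squared_error`.

```python
def mse(y_true, y_pred):
    """Mean squared error: (1/N) * sum over i of (y_i - y_hat_i)^2."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("MSE is undefined for an empty dataset")
    n = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / n


# The single-point example with true value 50: over- and under-predicting
# by 5 give the same squared error.
print(mse([50], [55]))  # 25.0
print(mse([50], [45]))  # 25.0

# Averaging over a small dataset with errors of +5, -5 and 0: (25 + 25 + 0) / 3.
print(mse([50, 50, 50], [55, 45, 50]))
```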
Minimizing MSE is equivalent to maximizing the likelihood of the data under the assumption that the target, conditioned on the input, follows a normal distribution with a fixed variance.
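The equivalence can be seen from a short derivation, assuming the N targets are independent and each y_i is drawn from a normal distribution with mean ŷ_i and a fixed variance σ² (σ is a symbol introduced here for the derivation):

```latex
p(y_i \mid x_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_i - \hat{y}_i)^2}{2\sigma^2}\right)

-\log \prod_{i=1}^{N} p(y_i \mid x_i)
  = \frac{N}{2}\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2
  = \frac{N}{2}\log(2\pi\sigma^2) + \frac{N}{2\sigma^2}\,\mathrm{MSE}
```

Neither the first term nor the positive factor N/(2σ²) depends on the predictions, so the predictions that minimize the negative log-likelihood (that is, maximize the likelihood) are exactly those that minimize MSE.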
Why use MSE
MSE is sensitive to outliers. Given several examples with the same input feature values, the optimal prediction under MSE is their mean target value. Compare this with mean absolute error (MAE), where the optimal prediction is the median. Thus, MSE is a good choice if you believe that your target data, conditioned on the input, is normally distributed around a mean value, and when it is important to penalize outliers heavily.
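The mean-versus-median behaviour can be checked numerically. The sketch below assumes a hypothetical set of targets that all share the same input, including one outlier, and brute-forces the best constant prediction on a fine grid under each loss. A grid search is used only to keep the example transparent; it is not how models are trained in practice.

```python
import statistics

# Hypothetical targets for examples with identical input features; 15.0 is an outlier.
targets = [1.0, 2.0, 3.0, 4.0, 15.0]


def mse_const(c, ys):
    """MSE of predicting the constant c for every target."""
    return sum((y - c) ** 2 for y in ys) / len(ys)


def mae_const(c, ys):
    """MAE of predicting the constant c for every target."""
    return sum(abs(y - c) for y in ys) / len(ys)


# Candidate constant predictions from 0.00 to 16.00 in steps of 0.01.
candidates = [i / 100 for i in range(0, 1601)]
best_mse = min(candidates, key=lambda c: mse_const(c, targets))
best_mae = min(candidates, key=lambda c: mae_const(c, targets))

print(best_mse, statistics.mean(targets))    # 5.0 5.0
print(best_mae, statistics.median(targets))  # 3.0 3.0
```

The outlier pulls the MSE-optimal prediction up to the mean (5.0), while the MAE-optimal prediction stays at the median (3.0).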
When to use MSE
Use MSE for regression when you believe that your target, conditioned on the input, is normally distributed, and you want large errors to be penalized significantly (quadratically) more than small ones.
Example of use
You want to predict future house prices. The price is a continuous value, so this is a regression problem. MSE can be used as the loss function here, as it is a reasonable assumption that house prices, conditioned on the input features, are normally distributed.
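As a rough sketch of MSE used as a training loss, the example below fits a one-feature linear model (price from living area) by plain gradient descent on MSE. The data are made up and noise-free, generated from price = 3 * area + 50 so the result can be checked; the feature choice, learning rate and iteration count are all illustrative assumptions, not a recommended setup.

```python
# Hypothetical, made-up data: living area in m^2 and price in thousands,
# generated exactly from price = 3 * area + 50.
areas = [50.0, 70.0, 90.0, 110.0, 130.0]
prices = [3.0 * a + 50.0 for a in areas]


def mse(y_true, y_pred):
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


# Standardize the feature so plain gradient descent converges quickly.
n = len(areas)
mean_a = sum(areas) / n
std_a = (sum((a - mean_a) ** 2 for a in areas) / n) ** 0.5
xs = [(a - mean_a) / std_a for a in areas]

w, b = 0.0, 0.0  # model: prediction = w * x + b
lr = 0.1
for _ in range(2000):
    preds = [w * x + b for x in xs]
    # Gradients of (1/n) * sum (y - (w*x + b))^2 with respect to w and b.
    grad_w = -2.0 / n * sum((y - p) * x for x, y, p in zip(xs, prices, preds))
    grad_b = -2.0 / n * sum(y - p for y, p in zip(prices, preds))
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = mse(prices, [w * x + b for x in xs])
# Map the parameters back to the original units: price = slope * area + intercept.
slope = w / std_a
intercept = b - w * mean_a / std_a
print(final_loss, slope, intercept)
```

With real house prices the data would be noisy, and the model would then converge to the parameters that minimize the mean squared error rather than recovering an exact line.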