# Squared hinge

The squared hinge loss is a loss function used for “maximum margin” binary classification problems. Mathematically it is defined as:

$L(y, \hat{y}) = \sum_{i=1}^{N}\Big(\max(0, 1 - y_i \cdot {\hat{y}}_i)\Big)^2$

where ŷ is the predicted value and y is the true label, either 1 or -1. Thus, the squared hinge loss is:

* 0, when the true and predicted labels are the same *and* ŷ ≥ 1 (an indication that the classifier is sure it's the correct label);
* quadratically increasing with the error, when the true and predicted labels are not the same, or when ŷ < 1 even though the true and predicted labels are the same (an indication that the classifier is not sure it's the correct label).

Note that ŷ should be the actual numerical output of the classifier, not the predicted label.
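The definition above translates directly into code. A minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def squared_hinge(y_true, y_pred):
    """Squared hinge loss: sum over samples of max(0, 1 - y_i * yhat_i)^2.

    y_true: labels in {-1, +1}
    y_pred: raw numerical classifier outputs, not predicted labels
    """
    return np.sum(np.maximum(0.0, 1.0 - y_true * y_pred) ** 2)

y_true = np.array([1.0, -1.0, 1.0])
y_pred = np.array([0.8, -1.2, 1.5])
# per-sample terms: max(0, 1 - 0.8)^2 = 0.04; the other two margins
# exceed 1, so they contribute 0
print(squared_hinge(y_true, y_pred))  # ≈ 0.04
```

Note how the third sample (label 1, prediction 1.5) contributes no loss at all: once the margin is at least 1, the classifier is "sure enough" and is not pushed further.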

The hinge loss encourages the classifier, during training, to find the classification boundary that is as far as possible from each class of data points. In other words, it finds the decision boundary with the maximum margin between the data points of the different classes.

## When to use squared hinge

Use the squared hinge loss on problems involving yes/no (binary) decisions, when you're not interested in how certain the classifier is about the classification (i.e., when you don't care about the classification probabilities). Use it in combination with the tanh() activation function in the last layer.

Example: You want to classify email into ‘spam’ and ‘not spam’ and you’re only interested in the classification accuracy.
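For a binary problem like the spam example, minimizing the squared hinge loss by gradient descent might look roughly as follows. This is a hedged sketch, not a production classifier: it uses a plain linear score on synthetic 2-D data instead of a network with a tanh output layer, and all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy separable data: class +1 clustered around (2, 2), class -1 around (-2, -2)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)),
               rng.normal(-2.0, 0.5, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

w = np.zeros(2)
b = 0.0
lr = 0.01
for _ in range(200):
    scores = X @ w + b                      # raw numerical outputs
    slack = np.maximum(0.0, 1.0 - y * scores)
    # gradient of sum(max(0, 1 - y*f)^2) w.r.t. the score f is -2*y*slack
    grad_f = -2.0 * slack * y
    w -= lr * (grad_f @ X)
    b -= lr * grad_f.sum()

preds = np.sign(X @ w + b)
accuracy = (preds == y).mean()
print(accuracy)
```

Because the loss is zero once every sample's margin reaches 1, the updates stop moving the boundary as soon as a sufficiently wide margin is found.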