Intersection over union

The intersection over union measures model performance for multi-label problem types. Its value is 1 when the predicted labels match the actual labels exactly, and 0 when none of the predicted labels is correct.

When is intersection over union available

The intersection over union metric is available when the problem type of the model is multi-label. This happens for instance in:

  • Multi-label classification. The Build your own music critic tutorial shows a case where a single song might belong to one or more categories at once, such as Epic and Happy.

  • Image segmentation. The Skin cancer detection tutorial shows a case where a skin pathology might be visible in some (or none) of the pixels of an image.

In these cases, a model doesn’t simply make correct or incorrect predictions. Since many labels are possible at once, a model’s predictions are often partly correct, depending on how many class labels (or image pixels) are correctly or incorrectly identified.

What’s the intersection over union

The intersection over union gives a good idea of the model's overall performance. It increases when the model correctly identifies labels (these form the intersection), and decreases when the model fails to identify labels or identifies incorrect labels (both cases enlarge the union without adding to the intersection).

Use this metric when it’s sufficient that a model gives correct predictions for most possible labels to be considered good.
If a prediction is only useful to you when the model predicts all of the labels correctly, check the exact match ratio instead.

The formula for intersection over union can be given in terms of positive and negative binary predictions:

\[\begin{array}{rcl} \text{Intersection over union} & = & \dfrac{\text{TP}}{\text{FN + FP + TP}} \\ \end{array}\]

TP = True positive (Actual positive is predicted positive)
FP = False positive (Actual negative is predicted positive)
FN = False negative (Actual positive is predicted negative)
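
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the function name `intersection_over_union` is hypothetical, and returning 1.0 when there are no positives at all is an assumed convention (the formula itself is undefined in that case).

```python
def intersection_over_union(tp: int, fp: int, fn: int) -> float:
    """Return TP / (TP + FP + FN) for binary prediction counts."""
    union = tp + fp + fn
    if union == 0:
        # Assumption: nothing to find and nothing predicted counts as perfect.
        return 1.0
    return tp / union


# A perfect prediction has no false positives or false negatives.
print(intersection_over_union(tp=5, fp=0, fn=0))  # 1.0
# No true positives at all gives the lowest possible value.
print(intersection_over_union(tp=0, fp=3, fn=2))  # 0.0
```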


Multi-label classification

Here is a multi-label classification example with 5 classes, inspired by the Build your own music critic tutorial. A song can be classified as any combination of Happy, Sad, Fast, Slow, and Melodic. Given an example song, the model gives the following predictions:


Label      Actual label   Predicted label
Happy      Yes            Yes
Sad        No             No
Fast       Yes            Yes
Slow       No             Yes
Melodic    Yes            No

  • The intersection has 2 true positives (Happy and Fast).

  • There is 1 false negative (Melodic label not identified).

  • There is 1 false positive (Slow label incorrectly predicted).

  • The union is equal to intersection + false negatives + false positives = 4.

The intersection over union of this example is

\[\begin{array}{rcl} \text{Intersection over union} & = & \dfrac{\text{2}}{\text{4}} = 0.5 \\ \end{array}\]

indicating that the model is partly correct but could be improved.
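
The same count can be reproduced with Python sets, where the intersection and union operators map directly onto the metric's name. This is a sketch under assumptions: the helper `multilabel_iou` is hypothetical, and the two label sets are read off the true positives, false negative, and false positive listed in this example.

```python
def multilabel_iou(actual: set, predicted: set) -> float:
    """Intersection over union of two label sets."""
    union = actual | predicted
    if not union:
        # Assumption: two empty label sets are treated as a perfect match.
        return 1.0
    return len(actual & predicted) / len(union)


actual = {"Happy", "Fast", "Melodic"}   # labels the song really has
predicted = {"Happy", "Fast", "Slow"}   # labels the model predicted

print(sorted(actual & predicted))  # ['Fast', 'Happy'] (true positives)
print(sorted(actual - predicted))  # ['Melodic'] (false negative)
print(sorted(predicted - actual))  # ['Slow'] (false positive)
print(multilabel_iou(actual, predicted))  # 0.5
```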

Image segmentation

In this example from the Skin cancer detection tutorial, the model must predict the pixels showing a pathology. Given an image example, the model makes the following prediction:

[Figures: Input image, Actual mask, Predicted mask]

  • The intersection has 416 true positives (white pixels correctly predicted).

  • There are 39 false negatives (actual white pixels not predicted by the model, on the right side of the spot).

  • There are 73 false positives (incorrectly identified pixels around the bottom scratch).

  • The union is equal to intersection + false negatives + false positives = 528.

The intersection over union of this example is

\[\begin{array}{rcl} \text{Intersection over union} & = & \dfrac{\text{416}}{\text{528}} = 0.788 \\ \end{array}\]

indicating that the model is mostly correct, even though some pixels, especially around the edges of the main spot, don’t exactly match the target.
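
For masks, the counting happens pixel by pixel. The sketch below uses a small made-up 3×4 mask pair (not the tutorial image) stored as nested lists of 0 and 1, and a hypothetical helper `mask_iou`. It then checks the arithmetic of the example above using the pixel counts quoted in the text.

```python
def mask_iou(actual, predicted):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    tp = fp = fn = 0
    for actual_row, predicted_row in zip(actual, predicted):
        for a, p in zip(actual_row, predicted_row):
            if a and p:
                tp += 1   # pixel is positive in both masks
            elif p:
                fp += 1   # predicted positive, actually negative
            elif a:
                fn += 1   # actually positive, missed by the model
    union = tp + fp + fn
    # Assumption: two empty masks are treated as a perfect match.
    return tp / union if union else 1.0


# Illustrative toy masks, not data from the tutorial.
toy_actual = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
toy_predicted = [
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
toy_score = mask_iou(toy_actual, toy_predicted)
print(toy_score)  # 3 true positives / 5 pixels in the union = 0.6

# Pixel counts quoted in the example above.
example_score = 416 / (416 + 39 + 73)
print(round(example_score, 3))  # 0.788
```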

Suggestions on how to improve

Large discrepancy

If there is a large discrepancy between training and validation accuracy (called overfitting), try to introduce dropout and/or batch normalization blocks to improve generalization. Overfitting means that the model performs well when it’s shown a training example (resulting in a low training loss), but badly when it’s shown a new example it hasn’t seen before (resulting in a high validation loss).

A large discrepancy can also indicate that the validation data is too different from the training data. In that case, create a new split between the training and validation subsets.

Low accuracy

If the training accuracy is low, the model is not learning well enough. Try to build a new model or collect more training data.
