Confusion matrix

The confusion matrix appears when an experiment solves a classification problem.
If a model is good, the diagonal cells have the highest counts.


Every row represents an actual category that an example belongs to, as defined by the target feature of that example.
Every column represents a category predicted by the model, as defined by the category with the highest probability.
All the examples that have the same particular combination of actual and predicted categories fall within the same cell, increasing the count of that cell.

If a model is good, most of the examples should fall in the diagonal cells, which correspond to identical actual and predicted categories, that is, correct predictions. All other cells represent errors in the prediction.

Figure 1. Correct predictions show up in the diagonal. All predictions outside the diagonal are errors.
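
The counting rule described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the platform's implementation: the helper names (`confusion_matrix`, `predicted_category`), the category names, and the probabilities are all made up for the example.

```python
def predicted_category(probabilities):
    """Return the category with the highest probability (dict: category -> probability)."""
    return max(probabilities, key=probabilities.get)


def confusion_matrix(actual, predicted, categories):
    """Count examples per cell; rows are actual categories, columns are predicted ones."""
    index = {category: i for i, category in enumerate(categories)}
    matrix = [[0] * len(categories) for _ in categories]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical data: six examples, three categories.
categories = ["cat", "dog", "bird"]
actual = ["cat", "cat", "dog", "dog", "bird", "bird"]
model_outputs = [
    {"cat": 0.8, "dog": 0.1, "bird": 0.1},
    {"cat": 0.3, "dog": 0.6, "bird": 0.1},
    {"cat": 0.2, "dog": 0.7, "bird": 0.1},
    {"cat": 0.1, "dog": 0.8, "bird": 0.1},
    {"cat": 0.1, "dog": 0.2, "bird": 0.7},
    {"cat": 0.5, "dog": 0.1, "bird": 0.4},
]

predicted = [predicted_category(p) for p in model_outputs]
matrix = confusion_matrix(actual, predicted, categories)

# Correct predictions are the diagonal cells (same actual and predicted category).
correct = sum(matrix[i][i] for i in range(len(categories)))

for category, row in zip(categories, matrix):
    print(category, row)
print("correct predictions:", correct, "of", len(actual))
```

In this toy data, four of the six examples land on the diagonal; the other two are the off-diagonal errors.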

Interact with the confusion matrix

You can click on any cell of the confusion matrix. This filters the predictions table to show only the examples that fall in that cell.

Figure 2. Click on a confusion matrix cell to filter the predictions table.

Cells:

  • Count displays exactly how many examples fall into a given cell of the confusion matrix.

  • Percentage normalizes the counts so that each row adds up to 100%. That is, each cell value shows the percentage of examples from the actual category that fall in the predicted category.
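
As a rough sketch of the percentage view, each count can be divided by the total of its row. The function name `row_percentages` and the counts are assumptions for illustration only; how the platform treats an empty row is not stated here, so the sketch simply returns zeros for it.

```python
def row_percentages(matrix):
    """Normalize each row of counts so that it adds up to 100%."""
    result = []
    for row in matrix:
        total = sum(row)
        # Assumption: a row with no examples is shown as all zeros.
        result.append([100.0 * count / total if total else 0.0 for count in row])
    return result


# Hypothetical counts: rows are actual categories, columns are predicted ones.
counts = [
    [45, 5, 0],
    [10, 30, 10],
    [0, 0, 0],
]
percentages = row_percentages(counts)

for row in percentages:
    print([round(value, 1) for value in row])
```

Because the normalization is per row, a category with few examples can show a high percentage in a cell that holds only a handful of examples, so it helps to check both views.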

The threshold slider

In the confusion matrix, you can use the Threshold slider for binary classification problems, or for multi-label classification problems with a feature set of binary features.

The Threshold slider allows you to vary the threshold between 0 and 1. When you change the threshold in the confusion matrix, the ROC curve is updated as well, and the predictions table reflects the results for the current threshold value.

Figure 3. Threshold slider on top of the confusion matrix and predictions table on the right

How to decide on a threshold

The threshold value allows you to control how the errors made by the model are distributed between false positives and false negatives. Some applications are more sensitive to one type of error than to the other, and in some applications both types of error are equally critical.

  • A lower threshold reduces the rate of false negatives, but increases the rate of false positives, that is, negative examples wrongly predicted as positive.

  • A higher threshold reduces the rate of false positives, but increases the risk of missing positive examples (false negatives).

You can use the Threshold slider to try different values and find a point on the confusion matrix that is preferable for your own application.

Note that you should avoid threshold values of exactly 0 or 1. A model operating at one of these points classifies every example in the same way, regardless of the example, and so is not a useful model.
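
The trade-off can be seen by recounting the four cells of a binary confusion matrix at several thresholds. The sketch below assumes that an example is predicted positive when its probability is greater than or equal to the threshold; the exact comparison rule used by the slider is not stated here. The function name `binary_counts` and the data are hypothetical.

```python
def binary_counts(actual, probabilities, threshold):
    """Count true/false positives and negatives for one threshold value."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for is_positive, probability in zip(actual, probabilities):
        # Assumption: predicted positive when probability >= threshold.
        predicted_positive = probability >= threshold
        if is_positive and predicted_positive:
            counts["tp"] += 1
        elif is_positive:
            counts["fn"] += 1
        elif predicted_positive:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical data: 1 marks an actual positive example, 0 an actual negative one.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.7, 0.4, 0.2, 0.6, 0.3, 0.1, 0.05]

results = {t: binary_counts(actual, probabilities, t) for t in (0.0, 0.25, 0.5, 0.75, 1.0)}
for threshold, counts in results.items():
    print(threshold, counts)
```

With this data, lowering the threshold from 0.5 to 0.25 removes one false negative but adds one false positive, and raising it to 0.75 does the opposite. At 0.0 every example is predicted positive and at 1.0 every example is predicted negative, which is why those end points are not useful.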


For example, use a low threshold value when you try to detect defects in mass-produced parts, because keeping a defective part (false negative) is much more harmful than removing a working part (false positive).

Figure 4. Use a low threshold value for a use case where false negatives are much more harmful.

Show predictions from other epochs

By default, the confusion matrix shows the predictions for the validation subset at the best epoch. To inspect predictions from other epochs, select the subset and checkpoint (epoch) you wish to inspect and click Predict.

How to improve classification results

Many examples spread evenly over the off-diagonal cells indicate that the model has difficulty distinguishing between categories.
You may try to use a larger model with more parameters.

If many misclassified examples fall into the same cell, the same row, or the same column, it indicates a systematic bias of the model towards the category involved. This may be caused by a problem in the dataset, in the training/validation subset split, or in the model design.
Inspect the misclassified examples to find clues about the problem.
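
One way to spot such a concentration of errors outside the user interface is to rank the off-diagonal cells and to total the errors per predicted category. This is a minimal sketch with hypothetical counts and helper names (`largest_confusions`, `errors_per_predicted_category`), not a feature of the platform.

```python
def largest_confusions(matrix, categories, top=2):
    """Return the largest off-diagonal cells as (count, actual, predicted) tuples."""
    cells = [
        (matrix[i][j], categories[i], categories[j])
        for i in range(len(categories))
        for j in range(len(categories))
        if i != j and matrix[i][j] > 0
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return cells[:top]


def errors_per_predicted_category(matrix):
    """Sum the off-diagonal counts of each column (errors grouped by predicted category)."""
    size = len(matrix)
    return [sum(matrix[i][j] for i in range(size) if i != j) for j in range(size)]


# Hypothetical counts: rows are actual categories, columns are predicted ones.
categories = ["cat", "dog", "bird"]
matrix = [
    [50, 2, 1],
    [20, 30, 3],
    [4, 1, 40],
]

worst_cells = largest_confusions(matrix, categories)
column_errors = errors_per_predicted_category(matrix)

print(worst_cells)
print(dict(zip(categories, column_errors)))
```

Here most errors sit in the "cat" column, and a single cell (actual "dog", predicted "cat") accounts for most of them, so those are the examples to inspect first.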

Multi-dimensional target

For a multi-dimensional target, each value in the confusion matrix corresponds to an element of the output tensor. This means that the total number of values in the confusion matrix may be higher than the number of examples in the dataset.
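
To illustrate why the totals can exceed the number of examples, the sketch below assumes a target that is a small 2 × 2 grid of binary labels per example, so each example contributes four values to the matrix. The data and the `flatten` helper are hypothetical.

```python
def flatten(nested):
    """Yield every scalar element of an arbitrarily nested list."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


# Hypothetical data: three examples, each with a 2 x 2 target of binary labels.
actual_targets = [
    [[0, 1], [1, 1]],
    [[0, 0], [0, 1]],
    [[1, 1], [0, 0]],
]
predicted_targets = [
    [[0, 1], [0, 1]],
    [[0, 0], [1, 1]],
    [[1, 1], [0, 0]],
]

# Rows are actual labels, columns are predicted labels.
matrix = [[0, 0], [0, 0]]
for a, p in zip(flatten(actual_targets), flatten(predicted_targets)):
    matrix[a][p] += 1

total_values = sum(sum(row) for row in matrix)
print(matrix)
print("examples:", len(actual_targets), "values in the matrix:", total_values)
```

Three examples produce twelve values in the matrix, one per element of each target.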
