The confusion matrix appears when an experiment solves a classification problem.
If a model is good, the diagonal cells have the highest counts.
Every row represents the actual category of an example, as defined by that example's target feature.
Every column represents the category predicted by the model, that is, the category with the highest predicted probability.
All examples that share the same combination of actual and predicted categories fall in the same cell, and each one increases that cell's count.
If a model is good, most examples should fall in the diagonal cells, where the actual and predicted categories are identical, that is, where the predictions are correct. All other cells represent prediction errors.
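To make the row and column convention concrete, here is a minimal Python sketch that builds a confusion matrix from actual category indices and per-category probabilities. It assumes categories are encoded as integers 0 to n-1 and that a tie between probabilities goes to the lowest index; the function name `confusion_matrix` and the sample data are illustrative only, not part of the platform.

```python
from typing import List, Sequence


def confusion_matrix(
    actual: Sequence[int],
    probabilities: Sequence[Sequence[float]],
    n_classes: int,
) -> List[List[int]]:
    """Rows are actual categories, columns are predicted categories."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_idx, probs in zip(actual, probabilities):
        # Predicted category = index of the highest probability (first one wins on ties).
        pred_idx = max(range(n_classes), key=lambda k: probs[k])
        # Each example increments exactly one cell.
        matrix[true_idx][pred_idx] += 1
    return matrix


actual = [0, 0, 1, 2, 2]
probabilities = [
    [0.8, 0.1, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
matrix = confusion_matrix(actual, probabilities, 3)
print(matrix)  # [[1, 1, 0], [0, 1, 0], [0, 0, 2]]
```

Four of the five sample examples land on the diagonal; the second example (actual 0, predicted 1) is the only error.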
Interact with the confusion matrix
You can click any cell of the confusion matrix to filter the predictions table so that it shows only the examples that fall in that cell.
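Conceptually, clicking a cell applies a filter on the predictions table. A minimal sketch, assuming each prediction is a record with hypothetical `example_id`, `actual`, and `predicted` fields (the real table may have a different layout):

```python
from typing import List, NamedTuple


class Prediction(NamedTuple):
    example_id: str  # hypothetical identifier for a row in the predictions table
    actual: str      # category from the target feature
    predicted: str   # category with the highest probability


def examples_in_cell(rows: List[Prediction], actual: str, predicted: str) -> List[Prediction]:
    """Keep only the rows that fall in the cell (actual, predicted)."""
    return [r for r in rows if r.actual == actual and r.predicted == predicted]


rows = [
    Prediction("img_1", "cat", "cat"),
    Prediction("img_2", "cat", "dog"),
    Prediction("img_3", "dog", "dog"),
    Prediction("img_4", "cat", "dog"),
]
# The (actual = cat, predicted = dog) cell contains img_2 and img_4.
print([r.example_id for r in examples_in_cell(rows, "cat", "dog")])
```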
Cells: the confusion matrix can display either Count or Percentage.
Count displays exactly how many examples fall into a given cell of the confusion matrix.
Percentage normalizes the counts so that each row adds up to 100%. That is, each cell shows the percentage of examples from the actual category that fall in the predicted category.
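The Percentage view can be reproduced by dividing each count by its row total. A minimal sketch, assuming the matrix is given as a list of rows of counts with rows as actual categories; `row_percentages` is a hypothetical helper:

```python
from typing import List


def row_percentages(matrix: List[List[int]]) -> List[List[float]]:
    """Normalize each row (actual category) so its cells add up to 100%."""
    result = []
    for row in matrix:
        total = sum(row)
        # A category with no examples gets a row of zeros instead of a division by zero.
        result.append([100.0 * count / total if total else 0.0 for count in row])
    return result


print(row_percentages([[8, 2], [1, 9]]))  # [[80.0, 20.0], [10.0, 90.0]]
```

Because each row is divided by its own total, a rare category and a common one become directly comparable in this view.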
The threshold slider
In the confusion matrix, you can use the threshold slider for binary classification problems, or for multi-label classification problems whose target is a feature set of binary features.
The Threshold slider lets you vary the threshold between 0 and 1. When you change the threshold in the confusion matrix, it is also updated in the ROC curve, and the predictions table reflects the results for the current threshold value.
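For a binary problem, the threshold decides which probability counts as a positive prediction. The sketch below assumes an example is predicted positive when its score is greater than or equal to the threshold (the platform may treat the boundary differently) and orders rows and columns as negative (0), then positive (1). `binary_confusion` and the sample data are hypothetical.

```python
from typing import List, Sequence


def binary_confusion(actual: Sequence[int], scores: Sequence[float], threshold: float) -> List[List[int]]:
    """Return [[TN, FP], [FN, TP]]: rows are actual (0, 1), columns are predicted (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for y, score in zip(actual, scores):
        # Assumption: a score equal to the threshold counts as a positive prediction.
        predicted = 1 if score >= threshold else 0
        matrix[y][predicted] += 1
    return matrix


actual = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.6, 0.4, 0.8, 0.9, 0.3]
print(binary_confusion(actual, scores, 0.5))   # [[2, 1], [1, 2]]
print(binary_confusion(actual, scores, 0.35))  # [[2, 1], [0, 3]]
```

In this toy data, lowering the threshold from 0.5 to 0.35 turns the false negative (score 0.4) into a true positive without adding a false positive.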
How to decide on a threshold
The threshold value lets you control how the model's errors are distributed between false positives and false negatives. Some applications are more sensitive to one type of error than to the other, while in others both types of error are equally critical.
You can use the Threshold slider to try different values and find a point on the confusion matrix that is preferable for your own application.
Avoid threshold values of 0 or 1. A model operating at one of these extremes classifies every example in the same way, regardless of the example, and so is not a useful model.
For example, use a low threshold value when detecting defects in mass-produced parts, because keeping a defective part (false negative) is much more harmful than discarding a working part (false positive).
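One way to make this trade-off explicit is to assign a cost to each error type and pick the candidate threshold with the lowest total cost. This is a generic technique, not a platform feature: the costs, the candidate thresholds, and the `choose_threshold` helper are hypothetical, and 1 means "defective part" in the sample data.

```python
from typing import Sequence, Tuple


def choose_threshold(
    actual: Sequence[int],
    scores: Sequence[float],
    cost_fp: float,
    cost_fn: float,
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Return (threshold, total cost) with the lowest weighted error cost."""
    if not candidates:
        raise ValueError("need at least one candidate threshold")
    best = None
    for t in candidates:
        # Assumption: a score >= threshold is a positive prediction.
        fp = sum(1 for y, s in zip(actual, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(actual, scores) if y == 1 and s < t)
        cost = cost_fp * fp + cost_fn * fn
        # Strict "<" keeps the lowest threshold among equal-cost ties.
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


actual = [0, 0, 1, 1, 1, 0]  # 1 = defective part
scores = [0.1, 0.6, 0.4, 0.8, 0.9, 0.3]
candidates = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(choose_threshold(actual, scores, cost_fp=1, cost_fn=10, candidates=candidates))  # (0.4, 1)
print(choose_threshold(actual, scores, cost_fp=10, cost_fn=1, candidates=candidates))  # (0.7, 1)
```

With missed defects ten times as costly as discarded working parts, the cheapest threshold is the lower one (0.4); reversing the costs moves it up to 0.7.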
Show predictions from other epochs
By default, the confusion matrix shows the predictions for the validation subset in the best epoch. To inspect predictions from other epochs, select the subset and checkpoint (epoch) you wish to inspect and click Predict.
How to improve classification results
If many examples are spread evenly over the off-diagonal cells, the model has difficulty distinguishing between categories.
You can try using a larger model with more parameters.
If many examples fall into the same cell, the same row, or the same column, it indicates a systematic bias of the model towards the class involved. This may be caused by a problem in the dataset, in the training/validation subset split, or in the model design.
Inspect the misclassified examples to find clues about the problem.
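The error patterns described in this section can also be checked numerically by summing the off-diagonal (error) counts per row and per column. A minimal sketch with a hypothetical `error_summary` helper, assuming a square matrix (rows are actual categories) with at least two categories:

```python
from typing import List, Tuple


def error_summary(matrix: List[List[int]]) -> Tuple[List[int], List[int], Tuple[int, int]]:
    """Return (errors per row, errors per column, off-diagonal cell with the most errors)."""
    n = len(matrix)
    # Diagonal cells are correct predictions, so they are skipped in both sums.
    row_errors = [sum(matrix[i][j] for j in range(n) if j != i) for i in range(n)]
    col_errors = [sum(matrix[i][j] for i in range(n) if i != j) for j in range(n)]
    # The single off-diagonal cell with the highest count.
    worst_cell = max(
        ((i, j) for i in range(n) for j in range(n) if i != j),
        key=lambda cell: matrix[cell[0]][cell[1]],
    )
    return row_errors, col_errors, worst_cell


print(error_summary([[5, 0, 4], [1, 6, 5], [0, 1, 8]]))  # ([4, 6, 1], [1, 1, 9], (1, 2))
```

In the sample matrix, column 2 collects 9 of the 11 errors, which would point to a bias towards predicting category 2; the examples in cell (1, 2) are a natural place to start inspecting.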
For a multi-dimensional target, each element of the output tensor contributes one count to the confusion matrix. This means that the total count in the confusion matrix may be higher than the number of examples in the dataset.
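The counting rule for multi-dimensional targets can be sketched as follows, assuming each example's target and prediction have already been flattened into equal-length lists of category indices (`elementwise_confusion` and the sample data are hypothetical):

```python
from typing import List, Sequence


def elementwise_confusion(
    actual: Sequence[Sequence[int]],
    predicted: Sequence[Sequence[int]],
    n_classes: int,
) -> List[List[int]]:
    """Count every element of every example's flattened target, not every example."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for y_example, p_example in zip(actual, predicted):
        for y, p in zip(y_example, p_example):
            matrix[y][p] += 1
    return matrix


actual = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 0, 1]]
predicted = [[0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 1]]
matrix = elementwise_confusion(actual, predicted, 2)
print(matrix)                 # [[6, 1], [1, 4]]
print(sum(map(sum, matrix)))  # 12
```

Three examples with four elements each produce 12 counts, four times the number of examples.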