Documentation / Evaluation view / Model evaluation

The confusion matrix appears after a classification model has started training and completed at least one epoch.

Figure 1. Confusion matrix

The confusion matrix helps to illustrate what kinds of errors a classification model is making. It is structured as a table of true classes versus predicted classes. If a model is good, the entries on the diagonal will have high values compared to the rest of the table.

In the case of a classification problem with a multi-dimensional target, the confusion matrix is sampled for performance reasons to a maximum of 500,000 values. For a multi-dimensional target, each value in the confusion matrix corresponds to an element of the target vector, or to a pixel in the target image. This means that the total number of values in the confusion matrix can be much larger than the number of samples in the dataset.
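To make the table structure concrete, here is a minimal sketch of how a confusion matrix is built, using hypothetical class indices for a small validation set (rows are true classes, columns are predicted classes):

```python
import numpy as np

# Hypothetical true and predicted class indices for a small validation set
y_true = np.array([1, 1, 2, 2, 2, 0])
y_pred = np.array([1, 2, 2, 2, 1, 0])

n_classes = 3
cm = np.zeros((n_classes, n_classes), dtype=int)
# Count each (true, predicted) pair; rows: true class, cols: predicted class
np.add.at(cm, (y_true, y_pred), 1)
print(cm)
```

The counts on the diagonal are the correct predictions; off-diagonal entries show which classes are being confused with which.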

If the data are very unbalanced, e.g., there are many more examples of some classes than of others, the rare classes may be hard to learn. In that case, try to collect more examples of the rare classes, or use resampling techniques to create synthetic data that can be added to the dataset.
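As a simple illustration of resampling, the sketch below balances a hypothetical two-class dataset by naively repeating rare-class samples until the class counts match (more sophisticated techniques, such as SMOTE, generate new synthetic samples instead of duplicating existing ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical unbalanced dataset: 100 samples of class 0, 5 of class 1
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (5, 2))])
y = np.array([0] * 100 + [1] * 5)

# Naive oversampling: repeat rare-class samples until the classes balance
rare = np.where(y == 1)[0]
extra = rng.choice(rare, size=95, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))
```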

For a regression model the error distribution curve shows the distribution of the errors that the model makes when predicting a numeric value for each validation example.

For a classification model the error distribution curve shows the distribution between matches and mismatches.

Figure 2. Error distribution curve

In a regression model you want the errors to be as close to zero as possible. If the error curve peaks at a non-zero value, there may be a problem with the model. We also expect the curve to be more or less symmetric around zero. If it isn’t, there is probably some sort of systematic problem with the model.

**Example**: If you have accidentally used ReLU as the activation function into the Target node, the model cannot predict values below zero, and the error distribution may become asymmetric. Similarly, if you have used a sigmoid as the final activation function, effectively imposing a ceiling on the highest value that can be predicted, the error distribution can be skewed.
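The ReLU case can be demonstrated numerically. In this sketch, a hypothetical near-perfect model is clamped at zero by a ReLU on the output; since negative targets can no longer be reproduced, the errors become systematically positive instead of symmetric around zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Targets centered on zero; a ReLU on the output clamps predictions at 0
y_true = rng.normal(0.0, 1.0, 10_000)
raw_pred = y_true + rng.normal(0.0, 0.1, 10_000)  # near-perfect model
y_pred_relu = np.maximum(raw_pred, 0.0)           # ReLU applied to output

errors = y_pred_relu - y_true
# Negative targets cannot be reproduced, so the mean error is pushed
# well above zero: the distribution is skewed, not symmetric
print(errors.mean())
```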

The error distribution curve is based on 5000 sampled values in the case of a regression problem with a multi-dimensional target.

Try changing the final activation function to Linear.

The prediction scatterplot shows actual vs predicted values.

You see a prediction scatterplot when you are training a regression model, i.e., where the model is trying to learn a numeric value, as opposed to classification.

Figure 3. Scatterplot

The prediction scatterplot has the actual, true value on the x-axis and the predicted value on the y-axis. A perfect model would yield a straight diagonal line here. In practice, you want to see a swarm of points that is not round or diffuse but has a clear diagonal trend.

As with the error distribution curve, you can spot potential problems with your model in the Prediction scatterplot.

**Example:** If you have accidentally used ReLU as the activation function into the Target node, the model cannot predict a value below zero and you will see in the scatterplot that negative values (if there are any) cannot be modelled well.

**Example:** If you have used a sigmoid as the final activation function, effectively imposing a roof on the highest value that can be predicted, you might see a horizontal line indicating that there is an effective maximum on the predictions and that higher values cannot be modelled.
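The sigmoid ceiling is easy to reproduce. In this sketch, the sigmoid stands in for a model whose final activation is a sigmoid: no matter how large the target is, the prediction never exceeds 1, which in an actual-vs-predicted scatterplot appears as a horizontal line at the ceiling:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Targets range up to 5, but a sigmoid output can never exceed 1
y_true = rng.uniform(0.0, 5.0, 1_000)
y_pred = sigmoid(y_true)  # stand-in for a model with a sigmoid output

# Every prediction is capped below 1, however large the true value is
print(y_pred.max())
```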

In the case of a regression problem with a multi-dimensional target, each dot in the scatterplot represents an element in the target vector or a pixel value in the target image. The scatterplot is sampled to show a maximum of 500 data points.

Try changing the final activation function to Linear.