Classification models - Evaluate and improve
Evaluate classification models
Evaluate the performance of a classification model in the Evaluation view. So what should you look for? The list below describes the different evaluation metrics and tools.
Loss
What’s good? You want the loss to be as low as possible, close to 0. You also want the training error to be only slightly lower than the test error.
The loss curve can give you insight into what to do to improve your experiment; see Improve classification models below.
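Which loss is reported depends on how your experiment is set up. Assuming a single-label classification experiment with categorical cross-entropy loss (a common choice, but an assumption here), the sketch below shows why confident, correct predictions push the loss toward 0. The function name `categorical_crossentropy` is illustrative; the platform computes the loss for you.

```python
import math


def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy (assumed loss, for illustration).

    y_true: list of true class indices.
    y_prob: list of predicted probability vectors, one per example.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip the probability of the true class to avoid log(0)
        p = min(max(probs[label], eps), 1.0)
        total += -math.log(p)
    return total / len(y_true)


# Confident, correct predictions: loss close to 0
low = categorical_crossentropy([0, 1], [[0.99, 0.01], [0.02, 0.98]])
# Uncertain predictions: noticeably higher loss
high = categorical_crossentropy([0, 1], [[0.6, 0.4], [0.5, 0.5]])
print(low, high)
```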
Accuracy
What’s good? For a perfect model, the accuracy is 1.
Accuracy measures how often the model gets the predictions right: the number of correct predictions divided by the total number of predictions.
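As a minimal sketch of the definition (the platform computes this metric for you), accuracy is simply the fraction of predictions that match the actual labels. The labels below are made up for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual label."""
    correct = sum(1 for actual, predicted in zip(y_true, y_pred) if actual == predicted)
    return correct / len(y_true)


y_true = ["cat", "dog", "dog", "cat"]
y_pred = ["cat", "dog", "cat", "cat"]
print(accuracy(y_true, y_pred))  # 0.75 (3 of 4 correct)
```

Note that on an unbalanced dataset accuracy can look high even when the model never predicts a minority class, which is why the metrics below are also worth checking.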
Precision
What’s good? If the model’s predictions are perfect, the precision is 1.
Precision is used in binary classification problems and indicates what proportion of positive predictions were actually correct: true positives divided by all examples predicted as positive.
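The sketch below spells out that definition for binary labels, assuming the positive class is encoded as `1`. Returning 0.0 when there are no positive predictions is a convention chosen for this example, not necessarily what the platform does.

```python
def precision(y_true, y_pred, positive=1):
    """True positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    # Assumed convention: no positive predictions -> precision 0.0
    return tp / (tp + fp) if tp + fp > 0 else 0.0


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]
# 3 positive predictions, 2 of them correct
print(precision(y_true, y_pred))
```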
Recall
What’s good? When a model’s predictions are perfect, the recall is 1.
Recall is used in binary classification problems and indicates what proportion of actual positive examples were predicted correctly: true positives divided by all examples that are actually positive.
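Recall uses the same counts as precision but divides by the actual positives instead of the predicted ones. A minimal sketch, again assuming the positive class is `1`:

```python
def recall(y_true, y_pred, positive=1):
    """True positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Assumed convention: no actual positives -> recall 0.0
    return tp / (tp + fn) if tp + fn > 0 else 0.0


y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(recall(y_true, y_pred))  # 0.5 (2 of 4 actual positives found)
```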
F1 score
What’s good? The best value for the F1 score is 1 and the worst is 0.
The F1 score is the harmonic mean of precision and recall, so it is only high when both precision and recall are high.
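Written out, the harmonic mean of precision P and recall R is 2PR / (P + R). The sketch below computes it and shows that a model with high precision but very low recall still gets a low F1 score, far below the plain average of the two.

```python
import statistics


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(f1_score(1.0, 1.0))  # 1.0
print(f1_score(0.9, 0.1))  # about 0.18, versus an arithmetic mean of 0.5
# Same value as the standard library's harmonic mean
print(statistics.harmonic_mean([0.9, 0.1]))
```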
AUC (Area under curve).
What’s good? If AUC is 1, your model’s predictions are perfect. An AUC of 0.5 corresponds to random guessing. If AUC is 0, you just need to invert your model’s output to obtain a perfect model :)
Area under the curve (AUC), here the area under the ROC curve, measures how well your classifier can separate positive and negative examples.
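One way to see what AUC measures: it equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties count as half). The pairwise sketch below uses that equivalence; it is O(n²) and meant for illustration only, not how the platform computes it.

```python
def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))                    # 0.75
# Inverting the scores flips AUC to 1 - AUC
print(roc_auc(y_true, [1 - s for s in scores]))   # 0.25
```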
Confusion matrix
What’s good? You want as many examples as possible on the diagonal. For a good model, most of the examples fall in the diagonal cells.
The diagonal cells correspond to identical actual and predicted categories, that is, correct predictions. All other cells represent errors in the prediction.
Rows represent the actual category.
Columns represent the category predicted by the model.
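A minimal sketch of how such a matrix is built, following the same layout (rows = actual category, columns = predicted category). The labels are made-up examples; the diagonal holds the correct predictions.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual categories, columns are predicted categories."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


labels = ["cat", "dog", "bird"]
y_true = ["cat", "cat", "dog", "dog", "bird", "bird"]
y_pred = ["cat", "dog", "dog", "dog", "bird", "cat"]
cm = confusion_matrix(y_true, y_pred, labels)
for label, row in zip(labels, cm):
    print(label, row)

# Sum of the diagonal = number of correct predictions
correct = sum(cm[i][i] for i in range(len(labels)))
print(correct)  # 4
```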
Improve classification models
How can you improve the performance of a classification model? The values of the metrics above can give you an idea of how to proceed.
First of all: watch out for unbalanced datasets.
In a classification problem, a dataset is said to be unbalanced when some classes have significantly more examples than others.
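A quick way to check for this is to count the examples per class. The sketch below reports each class's share and the ratio between the largest and smallest class; there is no universal threshold for "unbalanced", so treat the ratio as a hint rather than a rule. The labels are hypothetical.

```python
from collections import Counter


def class_distribution(labels):
    """Share of the dataset that belongs to each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


labels = ["ok"] * 95 + ["defect"] * 5
print(class_distribution(labels))  # {'ok': 0.95, 'defect': 0.05}
print(imbalance_ratio(labels))     # 19.0
```

In this example, a model that always predicts `ok` would reach an accuracy of 0.95 while never finding a single defect.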
Low accuracy and high loss in the training set.
The model is not learning well enough.
Solution: Double check that your dataset is correct, run your experiment for a longer time, or use a larger model with more parameters.
Large gap between training and validation values of accuracy or loss.
When there is a large gap between your training and validation values of accuracy or loss, there are two potential causes:
First, the distributions of the training and validation sets may be very different from each other. In this case, there might be something wrong with your data and/or your split. A possible solution that doesn’t require modifying the model is to create a new stratified split.
Second, you might be overfitting on your training data, which means that your model does not generalize well. A typical sign is a validation loss curve that first goes down and then goes up again, together with a large gap between the training and validation values of accuracy and loss. Regularization is one way to prevent overfitting.
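A stratified split keeps each class's proportion roughly the same in the training and validation subsets. The platform creates splits for you; the sketch below only illustrates the idea, and `stratified_split` with its parameters is a hypothetical helper.

```python
import random
from collections import defaultdict


def stratified_split(examples, labels, val_fraction=0.2, seed=0):
    """Hypothetical helper: split per class so proportions are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for example, label in zip(examples, labels):
        by_class[label].append(example)

    train, val = [], []
    for label, items in by_class.items():
        rng.shuffle(items)
        # Take the same fraction of every class for validation
        n_val = round(len(items) * val_fraction)
        val.extend((x, label) for x in items[:n_val])
        train.extend((x, label) for x in items[n_val:])

    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


labels = ["a"] * 80 + ["b"] * 20
train, val = stratified_split(list(range(100)), labels, val_fraction=0.2)
# Both subsets keep the 80/20 class ratio: 16 "a" and 4 "b" in validation
print(sum(1 for _, y in val if y == "a"), sum(1 for _, y in val if y == "b"))
```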
Low F1 score.
A low F1 score might indicate that your dataset is unbalanced.
Try to balance your dataset in the Dataset view.
Collect more data and create a new dataset on the platform.
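One common balancing technique is random oversampling: duplicating randomly chosen examples of the smaller classes until every class is as large as the biggest one. This is shown only as an illustration of the general idea and is not necessarily what the Dataset view does; `oversample` is a hypothetical helper. Apply it to the training subset only, so duplicated examples don’t leak into validation.

```python
import random
from collections import defaultdict


def oversample(examples, labels, seed=0):
    """Hypothetical helper: random oversampling up to the largest class size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(examples, labels):
        by_class[y].append(x)

    target = max(len(items) for items in by_class.values())
    balanced = []
    for y, items in by_class.items():
        # Keep every original example, then add random duplicates
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced.extend((x, y) for x in items + extra)

    rng.shuffle(balanced)
    return balanced


balanced = oversample(list(range(10)), ["a"] * 8 + ["b"] * 2)
print(sum(1 for _, y in balanced if y == "a"), sum(1 for _, y in balanced if y == "b"))  # 8 8
```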
Confused confusion matrix.
Many validation/test examples spread evenly over the off-diagonal cells indicate that the model has difficulty distinguishing between categories.
Solution: Double check that your dataset and dataset splits are correct, and that you are not overfitting on the training set.
Confusion matrix: Many examples fall into the same cell, the same row, or the same column.
This indicates a systematic bias of the model towards the class involved. This may be caused by a problem in the dataset, in the training/validation subset split, or in the model design.
Solution: Inspect the misclassified examples, in the Predictions inspection tab, to find clues about the problem. For example, are the photos misclassified as dogs really very similar to images of real dogs?
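When you export predictions for closer inspection, grouping the misclassified examples by confusion matrix cell makes a systematic bias easy to spot. The sketch below assumes you have example IDs, actual labels, and predicted labels as plain lists; `error_summary` and the data are hypothetical.

```python
from collections import Counter, defaultdict


def error_summary(ids, y_true, y_pred):
    """Group misclassified example IDs by (actual, predicted) cell,
    and count how often each class is wrongly predicted (per column)."""
    cells = defaultdict(list)
    by_column = Counter()
    for example_id, actual, predicted in zip(ids, y_true, y_pred):
        if actual != predicted:
            cells[(actual, predicted)].append(example_id)
            by_column[predicted] += 1
    return cells, by_column


ids = ["img1", "img2", "img3", "img4", "img5", "img6"]
y_true = ["cat", "fox", "fox", "dog", "wolf", "cat"]
y_pred = ["dog", "dog", "dog", "dog", "dog", "cat"]
cells, by_column = error_summary(ids, y_true, y_pred)
print(by_column.most_common(1))  # [('dog', 4)]: most errors land in the "dog" column
print(cells[("fox", "dog")])     # ['img2', 'img3']: the examples to look at first
```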