Classification models - Evaluate and improve
Evaluate classification models
Evaluate the performance of a classification model in the Evaluation view. So what should you look for? The list below describes the different evaluation metrics and tools.
Loss.
What’s good? You want the loss to be as low as possible, close to 0. You want the training error to be slightly lower than the test error.
The loss curve can give you insight into what you should do to improve your experiment.
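To illustrate why a loss close to 0 is good, here is a minimal sketch of categorical cross-entropy, a loss commonly used for classification. Which loss your experiment actually uses is not stated here, so the function `cross_entropy` and its parameters are assumptions made for illustration only.

```python
import math

def cross_entropy(true_index, predicted_probs, eps=1e-12):
    """Loss for one example: -log of the probability given to the true class."""
    p = max(predicted_probs[true_index], eps)  # clamp to avoid log(0)
    return -math.log(p)

# The true class is index 0 in all three cases.
confident_right = cross_entropy(0, [0.98, 0.01, 0.01])
unsure = cross_entropy(0, [0.40, 0.30, 0.30])
confident_wrong = cross_entropy(0, [0.01, 0.98, 0.01])

print(confident_right, unsure, confident_wrong)
```

The loss approaches 0 only when the model assigns nearly all probability to the correct class, and it grows quickly when the model is confidently wrong.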
Accuracy.
What’s good? For a perfect model, the accuracy is 1.
Accuracy measures how often the model gets its predictions right, that is, the fraction of all predictions that are correct.
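As a minimal sketch, accuracy can be computed as the number of correct predictions divided by the total number of predictions. The function name and the example labels below are made up for illustration.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual category."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

actual = ["cat", "dog", "dog", "cat", "bird"]
predicted = ["cat", "dog", "cat", "cat", "bird"]
acc = accuracy(actual, predicted)  # 4 correct out of 5
print(acc)
```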
Precision.
What’s good? If the model’s predictions are perfect, the precision is 1.
Precision is used in binary classification problems and indicates what proportion of positive predictions were actually correct.
Recall.
What’s good? When a model’s predictions are perfect, the recall is 1.
Recall is used in binary classification problems and indicates what proportion of actual positive examples were predicted correctly.
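Precision and recall can both be derived from counts of true positives (TP), false positives (FP), and false negatives (FN). The sketch below assumes labels encoded as 1 (positive) and 0 (negative), and returns 0.0 when a metric is undefined; both choices are conventions picked for this example.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for a binary classification problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of positive predictions
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of actual positives that were found
    return precision, recall

actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(actual, predicted)  # TP=2, FP=1, FN=2
print(precision, recall)
```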
F1 score.
What’s good? The best value for the F1 score is 1 and the worst is 0.
The F1 score is the harmonic mean of precision and recall.
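The harmonic mean can be written as 2 * precision * recall / (precision + recall). The short sketch below (function name chosen for illustration) shows that, unlike a plain average, the F1 score stays low when either precision or recall is low.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

balanced = f1_score(0.5, 0.5)
lopsided = f1_score(1.0, 0.1)        # perfect precision, poor recall
plain_average = (1.0 + 0.1) / 2      # arithmetic mean, for comparison

print(balanced, lopsided, plain_average)
```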
AUC (Area under curve).
What’s good? If the AUC is 1, your model’s predictions are perfect. An AUC of 0.5 means the model separates the classes no better than random guessing. If the AUC is 0, you just need to invert your model’s output to obtain a perfect model :)
Area under curve (AUC) is used to see how well your classifier can separate positive and negative examples.
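One way to read the AUC of a ROC curve is as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it by comparing every positive/negative pair, which is simple but slow for large datasets; the function name and the data are illustrative assumptions.

```python
def auc(labels, scores):
    """ROC AUC via pairwise comparison; labels are 1 (positive) or 0 (negative)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
original = auc(labels, scores)
inverted = auc(labels, [-s for s in scores])  # flipping the ranking gives 1 - AUC

print(original, inverted)
```

Because inverting the scores turns an AUC of x into 1 - x, a model with an AUC of 0 becomes a perfect model once its output is inverted.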
Confusion matrix.
What’s good? If a model is good, most of the examples should fall in the diagonal cells.
The diagonal cells correspond to identical actual and predicted categories, that is, correct predictions. All other cells represent errors in the prediction.
Columns represent the actual categories.
Rows represent the categories predicted by the model.
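The layout described above (rows for predicted categories, columns for actual categories) can be sketched as a nested list of counts. The function and the example data are assumptions made for illustration.

```python
def confusion_matrix(actual, predicted, classes):
    """Count examples per cell; matrix[row][column] = matrix[predicted][actual]."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1  # row = predicted, column = actual
    return matrix

classes = ["cat", "dog"]
actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]

matrix = confusion_matrix(actual, predicted, classes)
on_diagonal = sum(matrix[i][i] for i in range(len(classes)))  # correct predictions

for name, row in zip(classes, matrix):
    print(f"predicted {name}: {row}")
```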
Improve classification models
How can you improve performance in a classification task? The values of the metrics above can give you an idea of how to proceed.
First of all: use a balanced dataset.
In a classification problem, a dataset is said to be imbalanced when some classes have many more examples than others.
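A quick way to check for imbalance is to count the examples per class and compare the largest class with the smallest. This is a minimal sketch; the function name, the labels, and the idea of summarizing imbalance as a single ratio are assumptions for illustration.

```python
from collections import Counter

def class_balance(labels):
    """Return per-class counts and the ratio of largest to smallest class."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    ratio = max(counts.values()) / min(counts.values())
    return counts, ratio

labels = ["spam"] * 10 + ["ham"] * 90
counts, ratio = class_balance(labels)
print(counts, ratio)  # a ratio of 1.0 would mean perfectly balanced classes
```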
Low accuracy and high loss in the training set.
The model is not learning well enough.
Solution: Try changing your model, collecting more data, or running your experiment for a longer time. See also: gentle fine-tuning for text classification.
Large gap between training and validation values of accuracy or loss.
A large gap can indicate that the validation data are too different from the training data, or that you are overfitting your training data, that is, your model is not generalizing well.
Solution: A possible solution without modifying the model is to create a new split between training and validation subsets.
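One way to create a new split, shown here only as a sketch, is a stratified random split: shuffle the examples of each class separately so that the training and validation subsets keep similar class proportions. This is not necessarily how the platform splits data; the function, its parameters, and the 20% validation fraction are assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, validation_fraction=0.2, seed=0):
    """Return (train_indices, validation_indices) with similar class proportions."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    indices_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        indices_by_class[label].append(i)

    train_idx, val_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * validation_fraction)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx

labels = ["a"] * 80 + ["b"] * 20
train_idx, val_idx = stratified_split(labels, validation_fraction=0.2, seed=1)
val_counts = Counter(labels[i] for i in val_idx)
print(len(train_idx), len(val_idx), val_counts)
```

Changing `seed` produces a different split, which makes it easy to check whether the gap between training and validation metrics depends on one particular split.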
Low F1 score.
A low F1 score might indicate that your dataset is imbalanced.
Solution: Try to balance your dataset in the Dataset view.
Alternatively, collect more data and create a new dataset on the platform.
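Outside the platform, one common way to balance a dataset is random oversampling: duplicating randomly chosen examples of the smaller classes until every class is as large as the biggest one. The sketch below illustrates that technique only; it does not describe what the Dataset view does, and all names in it are hypothetical.

```python
import random
from collections import Counter

def oversample(examples, labels, seed=0):
    """Duplicate random minority-class examples until all classes are equally large."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_examples, out_labels = list(examples), list(labels)
    for cls, n in counts.items():
        pool = [e for e, l in zip(examples, labels) if l == cls]
        for _ in range(target - n):
            out_examples.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_examples, out_labels

examples = ["n1", "n2", "n3", "n4", "n5", "n6", "p1", "p2"]
labels = ["neg"] * 6 + ["pos"] * 2
balanced_examples, balanced_labels = oversample(examples, labels)
balanced_counts = Counter(balanced_labels)
print(balanced_counts)
```

Oversampling is usually applied to the training subset only, so that duplicated examples do not end up in both the training and validation data.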
Confusion matrix: Many examples fall evenly over the off-diagonal cells.
This indicates that the model has difficulty distinguishing between categories.
Solution: Use a larger model with more parameters.
Confusion matrix: Many examples fall into the same cell, the same row, or the same column.
This indicates a systematic bias of the model towards the class involved. This may be caused by a problem in the dataset, in the training/validation subset split, or in the model design.
Solution: Inspect the misclassified examples in the Predictions example tab to find clues about the problem. For example, are the images misclassified as dogs really very similar to images of real dogs?
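If you have exported predictions, the same inspection can be sketched in a few lines: collect every example that was predicted as the suspicious class but actually belongs to another one. The function, the identifiers such as `img_01`, and the class names are hypothetical.

```python
def misclassified_as(examples, actual, predicted, target_class):
    """Return (example, actual_class) pairs wrongly predicted as target_class."""
    return [
        (example, a)
        for example, a, p in zip(examples, actual, predicted)
        if p == target_class and a != target_class
    ]

examples = ["img_01", "img_02", "img_03", "img_04", "img_05"]
actual = ["cat", "dog", "wolf", "cat", "wolf"]
predicted = ["dog", "dog", "dog", "cat", "wolf"]

suspects = misclassified_as(examples, actual, predicted, "dog")
for example, true_class in suspects:
    print(f"{example}: predicted dog, actually {true_class}")
```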