ROC Curve

The ROC Curve appears when the model’s target is a binary feature.

The curve illustrates the performance of a particular model by showing the recall achievable as a function of the acceptable false positive rate. You can also use it to decide on the best threshold value for your model.

How to read the ROC curve

Figure 1. The ROC curve shows how many positive examples are correctly identified, as a function of how many negative examples are wrongly classified as positive. The threshold determines the point on the ROC curve where the model operates.

The horizontal axis represents the false positive rate, i.e., the fall-out, or probability of false alarm.
Lower values are better, meaning that the model rarely predicts the positive class for an example that actually belongs to the negative class.

The vertical axis represents the true positive rate, i.e., the recall.
Higher values are better, meaning that the model seldom fails to identify an example of the positive class.
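At a given threshold, both axes can be computed from the four confusion-matrix counts: FPR = FP / (FP + TN) and TPR = TP / (TP + FN). The sketch below (plain Python with hypothetical counts, not part of the product) illustrates the computation.

    # Illustrative sketch: the two ROC axes from confusion-matrix counts.
    # All counts below are hypothetical.

    def false_positive_rate(fp: int, tn: int) -> float:
        """Fall-out: the fraction of actual negatives wrongly flagged as positive."""
        return fp / (fp + tn)

    def true_positive_rate(tp: int, fn: int) -> float:
        """Recall: the fraction of actual positives correctly identified."""
        return tp / (tp + fn)

    print(false_positive_rate(fp=5, tn=85))  # ~0.056 -> horizontal coordinate
    print(true_positive_rate(tp=8, fn=2))    # 0.8    -> vertical coordinate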

The ROC curve shows the trade-offs between false positive rate and recall at which a particular model can operate.
The dot shows how the model operates with the currently selected Threshold value.

Interpretation of the curve

A good model should have a ROC curve that passes as close to the top left corner as possible. This means that the model can be operated with a low probability of false alarm, while still detecting most examples of the positive class.
A model that guesses randomly whether an example belongs to the positive class has a ROC curve close to the line of identity.

The area under the curve (AUC) metric gives a single value that lets you compare different models easily: values close to 1 mean that the ROC curve is almost ideal, and values close to 0.5 mean that the ROC curve is almost the line of identity, i.e., the model guesses randomly.
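As an illustration, the sketch below computes AUC with scikit-learn (assumed to be available; the labels and scores are hypothetical).

    # Illustrative sketch: computing AUC from ground-truth labels and
    # model scores with scikit-learn. The data below is hypothetical.
    from sklearn.metrics import roc_auc_score

    y_true  = [0, 0, 1, 1, 0, 1, 0, 1]                    # actual binary labels
    y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.7]  # model outputs in [0, 1]

    auc = roc_auc_score(y_true, y_score)
    print(f"AUC = {auc:.2f}")  # close to 1 is ideal, close to 0.5 is random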

Interact with the ROC curve

Threshold and operating point

A perfect model would output 1 when an example belongs to the positive class, and 0 otherwise.
In practice, however, the model output is a real number between 0 and 1, which lets you choose the threshold value, e.g., 0.5, 0.9, or 0.99, above which a prediction is considered positive.

The value of the threshold affects the kind of error that the model makes, and moves the operating point along the ROC curve (the sketch after this list illustrates both effects):

  • A lower threshold reduces the rate of false negatives, but increases the risk of false alarms.

  • A higher threshold reduces the rate of false positives, but increases the risk of ignoring positive examples.
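The following sketch (assuming scikit-learn is available; the labels and scores are hypothetical) shows how the threshold turns real-valued outputs into class predictions, and how each distinct threshold corresponds to one operating point on the ROC curve.

    # Illustrative sketch: applying a threshold to real-valued model outputs,
    # then enumerating the operating points with scikit-learn's roc_curve.
    # The labels and scores below are hypothetical.
    import numpy as np
    from sklearn.metrics import roc_curve

    y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.05, 0.7])

    # A prediction is considered positive when the score exceeds the threshold.
    for threshold in (0.3, 0.5, 0.8):
        y_pred = (y_score > threshold).astype(int)
        print(f"threshold={threshold}: predictions={y_pred}")

    # Each distinct threshold yields one point on the ROC curve.
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    for f, t, th in zip(fpr, tpr, thresholds):
        print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")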

The threshold slider

The Threshold slider allows you to vary the threshold between 0 and 1.

On the ROC curve, a dot shows the false positive rate and true positive rate of the model at this threshold.

The confusion matrix and predictions table are updated as well to reflect the current threshold value.

How to decide on a threshold

The threshold value lets you control how the errors made by the model are distributed between false positives and false negatives. Some applications are more sensitive to one type of error than to the other, while in others both types of error are equally critical.

Use the Threshold slider to try different values and find the point on the ROC curve that is preferable for your own application.

Note that you should avoid threshold values that correspond to the endpoints of the ROC curve. A model operating at one of these points classifies every example in the same way, regardless of its input, and is therefore not useful.
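As one possible recipe (a sketch, assuming scikit-learn and an evaluation set with ground-truth labels and model scores; none of this is part of the product), you can pick the threshold with the lowest false positive rate among those that reach a required recall:

    # Illustrative sketch: choose the operating point with the lowest false
    # positive rate among those that reach a required recall. The helper
    # threshold_for_min_recall and the data below are hypothetical.
    import numpy as np
    from sklearn.metrics import roc_curve

    def threshold_for_min_recall(y_true, y_score, min_tpr=0.95):
        fpr, tpr, thresholds = roc_curve(y_true, y_score)
        ok = tpr >= min_tpr        # operating points that reach the recall target
        best = np.argmin(fpr[ok])  # lowest false alarm rate among them
        return thresholds[ok][best], fpr[ok][best], tpr[ok][best]

    y_true  = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 0])
    y_score = np.array([0.2, 0.9, 0.6, 0.3, 0.8, 0.1, 0.55, 0.7, 0.4, 0.35])
    th, f, t = threshold_for_min_recall(y_true, y_score, min_tpr=0.95)
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")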

Examples:

Figure 2. The same model can be operated at different points of the ROC curve by changing the threshold value, according to whether false positives or false negatives are the more critical type of error for an application.
  • The Skin cancer detection tutorial shows how to highlight skin lesions on pictures.
    In medical applications, it is often preferable to identify every single sick patient so that they can receive treatment, even when this might cause false alarms (that a physician can manually dismiss later).
    In this case, we’d like the model to operate at a point of the ROC curve with as high a true positive rate as possible, even if that means tolerating a larger false positive rate.

  • In a preemptive equipment monitoring system, a positive failure prediction may cause some equipment to be shipped for repair.
    In that case, it might be more cost effective to operate the model at a point of the ROC curve that has a low false positive rate, even though a larger fraction of failures might go undetected until the equipment actually breaks.