Use AI to detect fraud
How to use class weight when working with an unbalanced dataset
Detecting fraud can be challenging in a dynamic global business environment with an overwhelming amount of traffic and data to monitor.
Fraud detection is an ideal use case for machine learning, with plenty of past success in industries like banking and insurance. This tutorial shows you how to do it.
- Target audience: Beginners. No prior coding knowledge required.
- 15 minutes
You will learn to
- Work with unbalanced classes.
- Work with tabular data.
- Run several experiments to test ideas.
In the past, fraud detection has been done by rule-based algorithms, which are typically complicated to implement and often easy to circumvent. These techniques can miss a large amount of fraudulent activity, and can also lead to an excessive number of false positives, where clients’ cards get declined due to misidentified suspicious behavior. Traditional models are also very inflexible, which is a problem in an application where fraudulent users are constantly finding new ways to slip under the radar.
Credit card fraud data
The data you’re going to use in this tutorial is available on Kaggle. Navigate to https://www.kaggle.com/mlg-ulb/creditcardfraud and download the data. You need to be logged in.
This dataset contains credit card transactions made by European cardholders in September 2013.
It contains only numerical input variables which are the result of a PCA (Principal Component Analysis) transformation. Due to confidentiality issues, the dataset does not include the original features or more background information about the data.
How to handle an unbalanced dataset
Fraud is very uncommon: most customers do not try to trick the system, even though many might dream about it. That means that the dataset is very unbalanced, that is, it contains very different numbers of examples for each of its classes.
Training on such datasets generally leads to biased models, since each class affects the loss proportionally to its frequency.
This means that a biased model is less likely to label things as fraudulent, leading to many false negatives.
There are ways to tackle this problem. A common way to improve results on infrequent classes is to create a new dataset and balance it by oversampling or undersampling examples. In this tutorial, however, we will use class weighting.
Upload and save the data
This will import the dataset into your project, and you will be taken to the Datasets view where you can process the dataset if you want.
That is not necessary this time; just click Save version in the upper-right corner.
Create an experiment
Click the new Use in new experiment button that appeared where the Save version button was.
In the Datasets tab in the Experiment wizard, make sure you’ve got the correct dataset and version selected, plus the right training and validation subsets.
In the Input(s)/target tab select:
For Input: all features (V1-V28 + Amount) except Class and Time.
For Target: Class.
In the Snippet tab select the Tabular snippet.
Add class weights and run the model
Now a model optimized for tabular data populates the Modeling canvas. It’s almost ready to run. You just have to select the Target block and make sure that the box for Use class weights is checked. This enables class weighting in the calculation of the loss function.
What is class weighting
Class weighting is used to improve single-label classification results on unbalanced datasets. It assigns class weights which are inversely proportional to the class frequency in the training data.
This corresponds to giving all the classes equal importance on gradient updates, on average, regardless of how many samples we have from each class in the training data. This in turn prevents models from predicting the more frequent class(es) exceedingly often simply based on their increased prior probability.
On an unbalanced dataset containing 900 examples of class A and 100 examples of class B, a classification model that always predicts A would achieve a relatively low loss and a 90% accuracy. By scaling up the error from misclassified B examples, class weighting pushes models to learn a better representation of each class.
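To make the idea concrete, here is a minimal sketch of inverse-frequency class weights for the 900/100 example. The exact formula the platform uses is not stated in this tutorial, so the normalization below (weights scaled so that a perfectly balanced dataset gives every class a weight of 1.0) and the function name are assumptions chosen for illustration.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class, inversely proportional to its frequency.

    Assumed normalization: n_samples / (n_classes * class_count), so a
    perfectly balanced dataset would give every class a weight of 1.0.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# The example from the text: 900 examples of class A, 100 of class B.
labels = ["A"] * 900 + ["B"] * 100
weights = inverse_frequency_weights(labels)
print(weights)  # class B gets weight 5.0, class A roughly 0.56

# With these weights, both classes carry the same total weight in the loss.
total_per_class = {cls: weights[cls] * labels.count(cls) for cls in weights}
print(total_per_class)
```

Under this weighting, one misclassified B example counts about as much as nine misclassified A examples, which is what removes the incentive to always predict A.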
Click the Run button to start the training.
The training will take some time so this is a perfect opportunity to fire up some other tweaked experiments.
Run several experiments
One of the best things about the Peltarion Platform is that it’s so easy to create new experiments.
This enables you to quickly try out new ideas and find out what setup works best for your problem.
So while the first experiment is running, click Duplicate or the new experiment icon.
Note: If you’re on the Free plan you can run 1 experiment at a time. All other plans can run concurrent experiments.
Ideas on what you can test out in a new experiment
Change the model - add more Dense blocks
As a rule of thumb, simple input data needs a smaller, less complicated architecture. Start simple and then increase the number of Dense blocks between the first and the last one in subsequent experiments. This will allow you to systematically try different options and see how this impacts performance.
Change the model - add more Dropout blocks
Since overfitting may be an issue in this problem, we recommend adding a Dropout block after each Dense block you add. A good Dropout rate for Dropout blocks is 0.5, but you can experiment with different values.
Note that a Dropout rate of 0 (the default) is the same as not having the Dropout block, and a Dropout rate of 1 (or close to 1) basically destroys the model, since it throws away the results of the preceding computations. Dropout rates higher than 0.5 are generally not used, so try values between 0.1 and 0.5.
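If you are curious what a Dropout block does to the values passing through it, the sketch below shows one common formulation, often called inverted dropout. Whether the platform implements Dropout exactly this way is an assumption; the function name and the sample values are made up for illustration.

```python
import random


def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors.

    Rescaling by 1 / (1 - rate) keeps the expected value unchanged
    (the "inverted dropout" convention, assumed here).
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
activations = [1.0] * 10

print(dropout(activations, 0.0, rng))  # rate 0: nothing is dropped
print(dropout(activations, 0.5, rng))  # about half the values become 0.0
print(dropout(activations, 0.9, rng))  # almost everything is thrown away
```

The last line illustrates why rates close to 1 are a bad idea: hardly any information from the earlier blocks survives.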
Change learning rate schedule
There is no go-to schedule for all models. Changing the learning rate during training has, in general, been shown to make training less sensitive to the learning rate value you pick. So using a learning rate schedule can give better training performance and make the model converge faster. Try exponential decay. The exponential schedule reduces the learning rate by the same percentage every epoch. This means that the learning rate will decrease rapidly in the first few epochs, and spend more epochs with a lower value, but never reach exactly zero.
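As a rough illustration of exponential decay, the sketch below multiplies the learning rate by a constant factor each epoch. The starting value of 0.001 and the factor of 0.9 are arbitrary assumptions, and the way the platform parameterizes its schedule may differ.

```python
def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs, shrinking by a constant factor."""
    return initial_lr * decay_rate ** epoch


# Hypothetical settings: start at 0.001 and keep 90% of the rate each epoch.
schedule = [exponential_decay(0.001, 0.9, epoch) for epoch in range(5)]
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch}: learning rate {lr:.6f}")
```

The absolute drop is largest in the first epochs and keeps getting smaller, which is why the rate never reaches exactly zero.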
Once you have settled on the overall model structure but want to achieve an even better model it can be appropriate to test another optimizer. A good recommendation here is to use regular stochastic gradient descent with momentum (Nesterov or standard). Such an optimizer may achieve superior results, though getting there can sometimes require a lot of tuning.
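For readers who want to see what momentum means in practice, here is a minimal sketch of the update rule on a toy one-dimensional problem. The toy loss, the learning rate, and the momentum value are assumptions picked for illustration and say nothing about good settings for the fraud model.

```python
def sgd_momentum(grad, w, lr=0.1, momentum=0.9, steps=300, nesterov=False):
    """Minimize a function of one variable given its gradient `grad`."""
    velocity = 0.0
    for _ in range(steps):
        # Nesterov momentum evaluates the gradient at the look-ahead point.
        lookahead = w + momentum * velocity if nesterov else w
        velocity = momentum * velocity - lr * grad(lookahead)
        w = w + velocity
    return w


def toy_gradient(w):
    """Gradient of the toy loss (w - 3)^2, which has its minimum at w = 3."""
    return 2.0 * (w - 3.0)


w_standard = sgd_momentum(toy_gradient, w=0.0)
w_nesterov = sgd_momentum(toy_gradient, w=0.0, nesterov=True)
print(w_standard, w_nesterov)  # both end up very close to 3
```

The velocity term accumulates past gradients, so the optimizer keeps moving in a consistent direction. The price is two values to tune (learning rate and momentum) instead of one.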
Change early stopping
For example, allow a patience of 15. It takes some time to get a good result so we don’t want the experiment to stop too early.
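The effect of patience can be sketched as follows. This assumes that early stopping monitors the validation loss and counts epochs without a new best value; the loss values are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the index of the epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1  # never triggered: training ran to the end


# Hypothetical validation losses with a temporary plateau after epoch 1.
val_losses = [1.0, 0.9, 0.95, 0.96, 0.8, 0.85, 0.86, 0.87]

print(early_stopping_epoch(val_losses, patience=2))  # 3: stops before the drop to 0.8
print(early_stopping_epoch(val_losses, patience=3))  # 7: survives the plateau
```

With too little patience the experiment stops during the plateau and never sees the later improvement.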
Evaluate on macro-precision, -recall, and -F1
The Evaluation view shows in several ways how the training of the model has progressed.
When working with unbalanced datasets you should look at the macro scores for precision, recall and F1-score.
The macro-average gives every class the same importance, and therefore better reflects how well the model performs, considering that you aim at having a model that performs well on ALL classes, including the minority classes.
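To see how macro scores are put together, here is a small sketch that computes them from a table of counts. The counts are made up, and the convention that macro-F1 is the plain average of the per-class F1 scores is an assumption about how the platform reports it.

```python
def macro_scores(confusion):
    """Return (macro precision, macro recall, macro F1).

    confusion[i][j] is the number of examples whose actual class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions, recalls, f1s = [], [], []
    for c in range(n):
        tp = confusion[c][c]
        predicted_c = sum(confusion[row][c] for row in range(n))
        actual_c = sum(confusion[c])
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    # Every class counts equally, no matter how many examples it has.
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical counts: 990 legitimate transactions, 10 fraudulent ones.
confusion = [[980, 10],
             [4, 6]]
macro_precision, macro_recall, macro_f1 = macro_scores(confusion)
print(f"precision {macro_precision:.3f}, recall {macro_recall:.3f}, F1 {macro_f1:.3f}")
```

In this made-up example 98.6% of all predictions are correct, yet the macro scores are clearly lower because the fraud class, with only 10 examples, weighs as much as the majority class.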
What’s a good score? It depends.
So, what is a good score? Well, of course it depends. Ideally, you should associate a cost with each type of mistake and optimize for that.
If a false positive costs you 5 SEK and a false negative costs you 500 SEK, you want a model with very few false negatives.
Note that this is different from looking at the false negative percentage. Since the classes are unbalanced, an unbiased model will produce a lot more false positives than false negatives, because there are a lot more real negatives than real positives.
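A quick way to apply this reasoning is to put a price on each model’s mistakes. The sketch below uses the 5 SEK and 500 SEK costs from the example; the two models and their error counts are hypothetical.

```python
def total_cost(false_positives, false_negatives, fp_cost=5, fn_cost=500):
    """Total cost in SEK of a model's mistakes on an evaluation set."""
    return false_positives * fp_cost + false_negatives * fn_cost


# Hypothetical models: A makes fewer mistakes overall, B misses less fraud.
cost_a = total_cost(false_positives=10, false_negatives=4)
cost_b = total_cost(false_positives=100, false_negatives=1)

print(cost_a)  # 2050
print(cost_b)  # 1000
```

Model B makes 101 mistakes against 14 for model A, but it is the cheaper one because it lets far less fraud through.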
Evaluate with the confusion matrix
You can see this in the confusion matrix (located in the Predictions inspection tab). You want a high number in the bottom right corner (true positives) and a low one in the bottom left corner (false negatives). It is also bad if the number in the top right corner (false positives) is big.
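If you want to reproduce such a matrix outside the platform, a minimal sketch could look like this. The layout (rows for actual classes, columns for predicted classes, with the fraud class last) is inferred from the corner description above and is an assumption, as are the example labels.

```python
def confusion_matrix(actual, predicted):
    """Count outcomes for a binary problem (0 = legitimate, 1 = fraud).

    Assumed layout: rows are actual classes, columns are predicted classes.
    """
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix


# Hypothetical labels for six transactions.
actual = [0, 0, 0, 0, 1, 1]
predicted = [0, 0, 1, 0, 1, 0]

matrix = confusion_matrix(actual, predicted)
print(matrix[0])  # [3, 1]: true negatives, false positives
print(matrix[1])  # [1, 1]: false negatives, true positives
```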
Don’t evaluate on loss or accuracy
Don’t use loss to compare experiments, and be careful with accuracy. It is not unusual to observe a high evaluation accuracy when testing a classification model trained on very unbalanced data. In such cases, the accuracy only reflects the underlying class distribution. You want to avoid that!
For example, if only 5% of all houses are affected by water damage, we can construct a model that guesses that no house ever gets water damage and still obtain an accuracy of 95%. While 95% is a pleasantly high proportion, this model will not do what it is intended to, i.e. distinguish well between houses that get water damage and those that don’t.
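The water damage example can be checked with a few lines of code. The sketch assumes 1,000 houses, which is an arbitrary number chosen to make the 5% easy to count.

```python
# 5% of 1,000 hypothetical houses have water damage (1), the rest do not (0).
actual = [1] * 50 + [0] * 950
# A "model" that always guesses that there is no water damage.
predicted = [0] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Recall per class: the share of each actual class that the model finds.
damaged_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
undamaged_found = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
recall_damage = damaged_found / actual.count(1)
recall_no_damage = undamaged_found / actual.count(0)
macro_recall = (recall_damage + recall_no_damage) / 2

print(accuracy)       # 0.95
print(recall_damage)  # 0.0
print(macro_recall)   # 0.5
```

The accuracy looks great, but the model does not find a single damaged house, and the macro recall of 0.5 makes that visible.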
Next step on the platform
To learn how to use the Peltarion platform to build a regression model and combine a variety of input data types, try our Predicting real estate prices tutorial.
Our colleague Gabriela Zarzar Gandler explains really well how to deal with unbalanced classes in a dataset in this article published by Towards Data Science.