Buy or not / Predict from tabular data

Predict if a customer will buy or not based on earlier customers buying patterns

Money! Understanding what makes a customer willing to open their wallet and buy a product has always been key to businesses.

This tutorial will show you how you can build simple AI models using the spreadsheets that so many of us work with. You will use tabular data to solve a classification problem, and get advice on how you’d also solve a regression problem.

Target audience: Beginners
Estimated time: Setup - 5 minutes | Training - 10 minutes


The problem - Unleash the power of the spreadsheet

Most of the data that businesses collect is tabular, i.e., data that can be stored in a spreadsheet: numerical, categorical, binary, or any combination of those. You name it.

How do you use this data to make really good predictions? Well, there are many ways to make predictions using tabular data, and the Peltarion Platform is a great way to quickly and intuitively leverage your data to make valuable predictions.

You will learn to

  • Import and use tabular data on the Peltarion Platform

  • Solve a classification problem - Will they buy, yes or no?
    (You’ll get some hints on how to solve a regression problem as well)

  • Analyze the performance of your model

Getting started - create a project

Let’s begin! First, click New project and name it, so you know what kind of project it is.

The data

To train a model, you need examples of input data together with the predictions that you expect the model to make.

This tutorial uses data from a phone marketing campaign. Many features, such as the client’s age, employment, and education, their response to earlier phone campaigns, the day of the week of the call, and so on, are recorded in table form.
The dataset also contains whether or not the client subscribed to a term deposit after the phone call. It’s this outcome that the AI model will learn to predict from the known factors.
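To get a feel for what this kind of table looks like, here is a tiny, made-up sample in the spirit of the Bank marketing dataset (the column names and values are illustrative, not the real dataset): client attributes as inputs, and `purchased` as the outcome the model will learn to predict.

```python
import io
import pandas as pd

# Made-up rows mimicking the structure of the Bank marketing dataset.
csv_data = io.StringIO("""age,job,education,day_of_week,purchased
41,services,high.school,mon,0
35,admin.,university.degree,thu,1
57,technician,basic.9y,wed,0
29,student,university.degree,fri,1
""")

df = pd.read_csv(csv_data)
print(df.dtypes)        # a mix of numerical and categorical columns
print(df["purchased"])  # the binary outcome the model will learn to predict
```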

Import the dataset

Go to the Datasets view to import and preprocess datasets.

Import the Bank marketing dataset from our Data library

In the Datasets view, click on Data library and choose the Bank marketing dataset. This dataset is used to solve a binary classification problem for a propensity to buy use case.

After you have reviewed the information about the dataset, click on Accept and import to accept the terms of the dataset’s license and import it into your project.

Bank marketing dataset in the data library
Figure 1. Bank marketing in the data library.

Import your own tabular data

If you want to train a model to make predictions tailored to your usage, you can upload your own tabular data. To do this, you need to upload a comma-separated value (CSV) file.

Most spreadsheet software, such as Microsoft Excel or Google Sheets, has a built-in function to export your spreadsheet as a CSV file.
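If your data lives in code rather than in a spreadsheet, you can produce the same kind of CSV file programmatically. A minimal pandas sketch (the data and the `customers.csv` filename are made up):

```python
import pandas as pd

# Hypothetical customer data; in a spreadsheet you would instead use
# the built-in export (e.g., File > Download > CSV in Google Sheets).
df = pd.DataFrame({
    "age": [41, 35, 57],
    "job": ["services", "admin.", "technician"],
    "purchased": [0, 1, 0],
})

# Write a comma-separated file ready to upload to the Datasets view.
df.to_csv("customers.csv", index=False)
```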

Preprocess the data

Create a feature set

Feature sets allow you to group several features together, so that the model can receive many features from a single Input block.

  • Click on New feature set.

  • Select all of the features in the list, except purchased, and Name the feature set Input Features.
    These features will be used as input information for the model to make predictions.
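Conceptually, this selection is just "every column except the target". A pandas sketch of the same idea (toy data, not the real dataset):

```python
import pandas as pd

df = pd.DataFrame({
    "age": [41, 35],
    "job": ["services", "admin."],
    "purchased": [0, 1],
})

# The feature set groups every column except the target,
# mirroring the "Input Features" selection in the Datasets view.
input_features = df.drop(columns=["purchased"])
target = df["purchased"]
print(list(input_features.columns))  # ['age', 'job']
```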

Feature encoding

Feature encoding determines the form in which the feature values are given to the model so that they can be processed. In most cases, the platform finds the appropriate encoding automatically.

The target feature is called purchased, and represents whether or not the client subscribed to a term deposit.

  • Since the target feature has only 2 possible values, make sure the Encoding is set to Binary.

  • Since the outcome is considered positive when the client did subscribe to the term deposit, which is indicated by the value 1, make sure the Positive class is set to 1.
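Binary encoding simply maps the two possible values to 0 and 1, with the Positive class choosing which value counts as positive. In this dataset the target is already stored as 0/1, but if your own data stores the outcome as text, the encoding amounts to something like this sketch:

```python
import pandas as pd

# Hypothetical raw labels; binary encoding maps them to 0/1,
# with the chosen positive class becoming 1.
raw = pd.Series(["no", "yes", "yes", "no"])
positive_class = "yes"

encoded = (raw == positive_class).astype(int)
print(encoded.tolist())  # [0, 1, 1, 0]
```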

You can review the encoding of the other features, although the default values will work for this tutorial. Once you are happy with the preprocessing, click Save version, then Use in new experiment.

Build your network in the modeling view

Tabular data can be incredibly diverse, so there is no predefined snippet in the Experiment wizard that will work every time.

Instead, click on Create blank experiment in the Experiment wizard to start designing your own, customized model.

In the Modeling view, navigate to the Build section. There you will find the fundamental blocks required to build your model.

Simple model for tabular data
Figure 2. Simple model for binary classification on tabular data, using the Bank marketing dataset.
  1. In the list of Blocks, click on the Input block to add it to the modeling canvas.

  2. Select the Input block that appeared on the modeling canvas, and the block settings will appear.

  3. Set the Feature to the feature set Input Features that you created in the datasets view.

  4. Add a Dense block, and set its Nodes setting to 100.

    • Dense blocks are fully connected layers of neurons; these are the body of your model. Determining how many layers and how many nodes per layer is both art and science and typically requires some trial and error.

    • As a rule of thumb, simpler input data calls for a smaller, less complicated architecture. Start simple, then increase the number of nodes and layers in subsequent experiments. This will allow you to systematically try different options and see how each change impacts performance.

  5. Add another Dense block, and set its Nodes setting to 10.

  6. Add one last Dense block which will calculate the model output.

    • Since the target feature is a single value, set the block’s Nodes to 1.
      The platform will tell you if the last block’s size doesn’t match the target feature size.

    • Since the target feature uses binary encoding, set the block’s Activation to Sigmoid.

  7. Add a Target block.

Once you are happy with your model, click Run.
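To make the architecture above concrete, here is a small numpy stand-in for the forward pass of the model: two hidden Dense layers (100 and 10 nodes) and a 1-node sigmoid output. The weights are random and untrained, and the input width of 20 features is an assumption for illustration; the platform handles all of this for you.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, n_out, activation):
    # A fully connected layer with randomly initialized weights
    # (an untrained sketch of what a Dense block computes).
    w = rng.normal(scale=0.1, size=(x.shape[-1], n_out))
    b = np.zeros(n_out)
    return activation(x @ w + b)

def relu(z):
    return np.maximum(z, 0)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = rng.normal(size=(4, 20))   # 4 examples, 20 input features (assumed width)
h = dense(x, 100, relu)        # Dense block, 100 nodes
h = dense(h, 10, relu)         # Dense block, 10 nodes
p = dense(h, 1, sigmoid)       # output block: 1 node, sigmoid activation
print(p.shape)                 # (4, 1): one probability per example
```

The sigmoid output squeezes each prediction into the range 0 to 1, which is why it pairs with the binary-encoded target.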

Evaluation view

In the Evaluation view, you will find several ways of analyzing how your model is performing. The specific metrics that you are shown depend on your problem type and loss function.

Loss and metrics curves

The Loss and metrics curves show the performance of your model on the training and validation datasets for different epochs. In general, you are aiming to minimize loss and error metrics and maximize accuracy. To identify which metrics are most important for your specific application, read more about loss and metrics in the knowledge center.
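For a binary classification problem like this one, the loss being minimized is typically binary cross-entropy, and accuracy is a common metric. A tiny numpy illustration of both quantities, on made-up predictions:

```python
import numpy as np

# Made-up ground truth and predicted probabilities for 4 examples.
y_true = np.array([0, 1, 1, 0])
y_prob = np.array([0.1, 0.8, 0.6, 0.3])

# Binary cross-entropy: lower is better.
loss = -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

# Accuracy at the default 0.5 threshold: higher is better.
accuracy = np.mean((y_prob >= 0.5).astype(int) == y_true)

print(round(loss, 3), accuracy)
```

During training, these values are computed each epoch on both the training and validation subsets, which is exactly what the curves plot.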

Predictions inspection

The Predictions inspections section lets you analyze the performance of a particular epoch on a particular subset.

The Bank marketing use case is a binary use case, so you’ll get the opportunity to set a threshold. The threshold value allows you to control how the errors made by the model are distributed between false positives and false negatives.
Slide the Threshold slider to a good value, e.g., 0.2.

The features of this section are also dependent on your problem type. Read this article on Prediction inspection to learn more.
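To see why moving the threshold matters, here is a small numpy sketch on made-up predictions: lowering the threshold from 0.5 to 0.2 catches more true buyers (fewer false negatives) at the cost of more false alarms (more false positives).

```python
import numpy as np

# Made-up ground truth and predicted probabilities for 6 examples.
y_true = np.array([0, 0, 1, 1, 1, 0])
y_prob = np.array([0.15, 0.40, 0.25, 0.70, 0.90, 0.05])

for threshold in (0.5, 0.2):
    y_pred = (y_prob >= threshold).astype(int)
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # false negatives
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```

Which trade-off is "good" depends on the use case: if contacting a non-buyer is cheap but missing a buyer is costly, a lower threshold such as 0.2 can be the better choice.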

ROC curve
Figure 3. ROC curve

Improve your model

A vital step in successful data science is not just building a working prototype but also going back and experimenting with new iterations of your model to improve the performance.

To provide some guidance for what types of settings and parameters to change to try to improve your model, have a look at the Improving your tabular data model tutorial.

Further reading

Congratulations, you have completed the tabular data tutorial!

With good input data, models like these can be used to make important predictions and solve a wide array of interesting problems. Read more here: