Movie review feelings

Solve a text classification problem with BERT

In this tutorial, you will solve a text classification problem using Multilingual BERT (Bidirectional Encoder Representations from Transformers). The input is an IMDB dataset consisting of movie reviews, tagged with either positive or negative sentiment – i.e., how a user or customer feels about the movie.

Text classification aims to assign text, e.g., tweets, messages, or reviews, to one or multiple categories. A category can be, for example, the author’s mood: is a review positive or negative?

- Target audience: Beginners
- Tutorial type: Get started tutorial
- Problem type: Text classification

You will learn to
- Build and deploy a model based on BERT.
- Work with single-label text classification.

Create a project

First, navigate to the Projects view.
Click New project to create a project and name it so you know what kind of project it is.

A project combines all of the steps in solving a problem, from the pre-processing of datasets to model building, evaluation, and deployment. Using projects makes it easy to collaborate with others.

Add the IMDB data

After creating the project, you will be taken to the Datasets view, where you can import data.

There are several ways to import your data to the platform. This time we will use the Data library that is packed with free-to-use datasets.

Click the Import free datasets button.

Look for the IMDB - tutorial data dataset in the list. Click on it to get more information. You can also read this article for more info on the IMDB dataset.

Figure 1. The IMDB dataset includes a large number of movie reviews.

If you agree with the license, click Accept and import.

This will import the dataset into your project, and you can now edit it.

Design a text classification model with the wizard

Click Use in new experiment to open the Experiment wizard.

The Experiment wizard opens to help you set up your training experiment. We’ll now go over each tab.

  • Dataset tab
    Make sure that the IMDB dataset with subset is selected.

  • Inputs / target tab
    The platform should select the correct features by default.

    • Input column: make sure that only review is selected.

    • Target column: make sure that sentiment is selected.

  • Problem type tab

    • Make sure the Problem type is Single-label text classification, since we want to classify text into two possible categories (positive and negative). The platform recommends the appropriate Problem type based on the input and target features.

  • Click Create to build the model.
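
To make the wizard settings concrete, here is a minimal Python sketch of what the selected columns mean: review is the input text, and sentiment is the single target label with two possible values. The two sample rows are made up for illustration and are not taken from the IMDB dataset. The CSV layout is an assumption about how such data might be stored, not the platform's import format.

```python
import csv
import io

# Made-up sample rows (not from the IMDB dataset). The column names match
# the features selected in the wizard: review (input) and sentiment (target).
SAMPLE_CSV = """review,sentiment
"A warm, funny film with a great cast.",positive
"Two hours I will never get back.",negative
"""

# Single-label classification: every row carries exactly one of these labels.
ALLOWED_LABELS = {"positive", "negative"}


def load_examples(csv_text):
    """Return (review, sentiment) pairs, checking each row has one valid label."""
    examples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = row["sentiment"].strip()
        if label not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        examples.append((row["review"], label))
    return examples


examples = load_examples(SAMPLE_CSV)
for review, label in examples:
    print(f"{label:8} <- {review}")
```

A row with any other label raises an error, which mirrors the idea that a single-label problem has a fixed, closed set of categories.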

Run experiment

The experiment is set up and ready to be trained, so just click Run.

The training will take some time since BERT is a very large and complex model.
Expect about half an hour of training time per epoch with a Sequence length of 256.
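
As a rough planning aid, the estimate above can be turned into a small Python sketch. It assumes a constant 30 minutes per epoch, taken from the figure quoted here. The epoch counts in the loop are hypothetical, since the tutorial does not state how many epochs the experiment runs, and actual times will vary.

```python
def estimate_training_minutes(epochs, minutes_per_epoch=30):
    """Rough total training time, assuming a constant time per epoch."""
    if epochs < 0:
        raise ValueError("epochs must be non-negative")
    return epochs * minutes_per_epoch


# Hypothetical epoch counts, for illustration only.
for epochs in (1, 2, 3):
    minutes = estimate_training_minutes(epochs)
    print(f"{epochs} epoch(s): about {minutes} minutes")
```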

While you wait for at least the first epoch to finish, you can read more about the loss and metrics, or grab some fika (a Swedish coffee and cake break).

Deploy your trained experiment

In the Evaluation view, click Create deployment.

  1. Select the Experiment that you want to deploy for use in production.
    In this tutorial we only trained one model so there is only one experiment in the list, but if you train more models with different tweaks, they will become available for deployment.

  2. Select the Checkpoint marked with (best), since this is when the model had the best performance.
    The platform creates a checkpoint after every epoch of training. This is useful since performance can sometimes get worse when a model is trained for too many epochs.

  3. Click Create to create the new deployment from the selected experiment and checkpoint.

  4. Click on Enable to deploy the experiment.
    As soon as your deployment is enabled, you can start requesting predictions.
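
The idea behind picking the checkpoint marked (best) in step 2 can be sketched in a few lines of Python. This sketch assumes that "best" means the lowest validation loss, which is one common criterion; the platform may use a different one. The loss values are hypothetical and are not results from this experiment.

```python
# Hypothetical validation loss after each epoch (not real results).
val_loss_per_epoch = {1: 0.34, 2: 0.29, 3: 0.33}


def best_epoch(losses):
    """Return the epoch whose checkpoint has the lowest validation loss."""
    return min(losses, key=losses.get)


print("Best checkpoint: epoch", best_epoch(val_loss_per_epoch))
```

In these made-up numbers the loss rises again after epoch 2, which is the situation the checkpoints guard against: the last epoch is not always the best one.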

Test with our web app

Now that your model is deployed, let’s test it. Click Open web app to open the Deployment web app.

Add review

Now write your own review, copy the example below, or simply copy a recent review from, e.g., IMDB:

I don’t want to complain about the movie, it was really just ok. I would consider it an epilogue to Endgame as opposed to a middle film in what I’m assuming is a trilogy (to match the other MCU character films). Anyhow, I was just meh about this one. I will say that the mid-credit scene was one of the best among the MCU movies.

Get your result

Click Get your result.
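
Predictions can also be requested from code once the deployment is enabled. The Python sketch below only builds such a request and does not send it. The URL, the token, the bearer-token header, and the payload shape with a rows list keyed by the review feature are all assumptions for illustration; use the values and format shown for your own deployment.

```python
import json
import urllib.request

# Placeholders: replace with the values for your own deployment.
DEPLOYMENT_URL = "https://example.com/deployment/predict"  # hypothetical URL
DEPLOYMENT_TOKEN = "YOUR-TOKEN"  # hypothetical token


def build_prediction_request(review, url=DEPLOYMENT_URL, token=DEPLOYMENT_TOKEN):
    """Build (but do not send) a JSON POST request carrying one review.

    The payload shape {"rows": [{"review": ...}]} and the bearer-token
    header are assumptions for illustration, not a documented API.
    """
    body = json.dumps({"rows": [{"review": review}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_prediction_request("I was just meh about this one.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Sending it would be: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the payload before pointing the script at a real deployment.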

Tutorial recap

In this tutorial, you’ve created a text classification model that you first evaluated and then deployed. You have used all the tools you need to go from data to production — easily and quickly.

Next steps

Learn how to improve a sentiment analysis model

In the tutorial Improve sentiment analysis, you run several experiments to solve a text classification problem using Multilingual BERT. The input is an IMDB dataset consisting of movie reviews, tagged with either positive or negative sentiment – i.e., how a user or customer feels about the movie.

Get started with multi-language classification

In the tutorial Classify text in any language, you will learn how to use the Multilingual BERT block to create a model that can work with multiple languages simultaneously.

This makes it possible to automatically identify relationships and context in text data in 100 languages.

Figure 2. Next tutorial → Classify text in any language