Improve sentiment analysis
Find the best experiment for a text classification problem with BERT
In this tutorial, you run several experiments to solve a text classification problem using Multilingual BERT (Bidirectional Encoder Representations from Transformers). The input is an IMDB dataset consisting of movie reviews, tagged with either positive or negative sentiment – i.e., how a user or customer feels about the movie.
Text classification aims to assign text, e.g., tweets, messages, or reviews, to one or multiple categories. The categories can, for example, describe the author’s mood: is a review positive or negative?
- Learn AI & platform tutorial
- Target audience: Beginners
You will learn to
- Build and deploy a model based on BERT.
- Work with single-label text classification.
Create a project
First, navigate to the Projects view.
Click New project to create a project and name it so you know what kind of project it is.
What is BERT?
BERT pushed the state of the art in Natural Language Processing (NLP), the set of techniques that aim to automatically process, analyze, and manipulate large amounts of language data such as speech and text.
BERT does this by combining two powerful technologies:
- It is based on a deep Transformer network, a type of network that can efficiently process long texts by using attention.
- It is bidirectional, meaning that it takes the whole text passage into account to understand the meaning of each word.
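The two ideas above can be illustrated with a very small sketch of self-attention. This is not BERT's implementation: the function names are hypothetical, there are no learned weights, and real Transformer layers add query/key/value projections, multiple heads, and many stacked layers. It only shows that each output vector is a weighted average over every token in the passage, both before and after the current one.

```python
import math


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Simplified scaled dot-product self-attention (illustrative only).

    Each token vector is compared with ALL token vectors in the passage,
    so the output for one token depends on tokens to its left and right.
    """
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Weighted average of all token vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        )
    return outputs


# Three toy "token" vectors; changing the last one also changes the
# output computed for the first one.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(toy_tokens)
```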
If you want to dig deeper, it is a good idea to read about word embeddings and BERT’s attention mechanism, which are important concepts in NLP.
Add the IMDB data
After creating the project, you will be taken to the Datasets view, where you can import data.
There are several ways to import your data to the platform. This time we will use the Data library that is packed with free-to-use datasets.
Click the Import free datasets button.
Look for the IMDB - tutorial data dataset in the list. Click on it to get more information. You can also read this article for more info on the IMDB dataset.
If you agree with the license, click Accept and import.
This will import the dataset into your project, and you can now edit it.
The feature encoding determines the way in which example data is turned into numbers that a model can do calculations with. Verify that the default feature encoding is correct:
If a feature uses the wrong settings, click the Wrench icon to change it.
Subsets of the dataset
Keep these default values in this project, but you can use whatever subset split you want later.
Save the dataset
You’ve now created a dataset ready to be used in the platform.
Click Save version, then click Use in new experiment. The Experiment wizard will pop up.
Design a text classification model with the wizard
The Experiment wizard opens to help you set up your training experiment. We’ll now go over each tab.
The platform selects the correct subsets by default.
Training uses 80% of the available examples.
Validation uses the 10% validation examples to check how the model generalizes to unseen examples.
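The platform handles the subset split for you, but the idea can be sketched in a few lines. The sketch below assumes a random 80% / 10% / 10% split, where the last 10% is a held-out test subset; that third subset, the function name, and the seed are assumptions for illustration and are not taken from the platform.

```python
import random


def split_indices(n_examples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle example indices and split them into three subsets.

    Whatever remains after the training and validation subsets is
    returned as a third subset (assumed here to be a test subset).
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(n_examples * train_frac)
    n_val = int(n_examples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    rest = indices[n_train + n_val:]
    return train, val, rest


train_idx, val_idx, test_idx = split_indices(1000)
```

Keeping the validation examples out of training is what lets the validation metrics say something about unseen reviews.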
Inputs / target tab
The platform should select the correct features by default.
Input column: make sure that only review is selected.
Target column: make sure that sentiment is selected.
Problem type tab
Make sure the Problem type is Single-label text classification, since we want to classify text into two possible categories (positive and negative). The platform recommends the appropriate Problem type based on the input and target features.
Click Create to build the model.
The model is now built in the Modeling view, and you can change some settings.
The model contains four blocks:
The Input block represents data coming into the model.
The Multilingual BERT block implements the BERT network in its base size.
The Dense block represents a fully connected layer of artificial nodes.
It outputs a tensor of shape 1, since the target feature sentiment takes one of two values (positive or negative).
The Target block represents the output that we are trying to learn with our model.
Change sequence length in BERT block
Click on the Multilingual BERT block in the modeling canvas.
Set the Sequence length to 256, since this is a good compromise between speed and the amount of text preserved. Smaller sequences compute faster, but they might cut some words from your text.
Sequence length is the number of tokens (roughly the number of words) kept during text tokenization. If an example has fewer tokens than this parameter indicates, the text is padded. If it has more tokens, the text is truncated, i.e., cut from the end to fit the sequence length.
In the dataset, a Sequence length of 512 will truncate 8% of the review samples, a length of 256 will truncate 30%, and 128 will truncate 74%.
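Padding and truncation can be sketched as follows. This is a simplified illustration and not the platform's tokenizer: it assumes whitespace splitting instead of BERT's subword tokenization, ignores special start and separator tokens, and uses a hypothetical helper name.

```python
def fit_to_length(tokens, sequence_length, pad_token="[PAD]"):
    """Pad or truncate a token list to exactly `sequence_length` items."""
    if len(tokens) >= sequence_length:
        # Too long: cut from the end.
        return tokens[:sequence_length]
    # Too short: fill the remaining positions with padding tokens.
    return tokens + [pad_token] * (sequence_length - len(tokens))


review = "A fun brain candy movie with good action".split()
short_version = fit_to_length(review, 4)    # truncated to 4 tokens
long_version = fit_to_length(review, 12)    # padded up to 12 tokens
```

A shorter sequence length means less computation per review, at the cost of dropping the end of longer reviews.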
Check experiment settings & run the experiment
Click the Settings tab:
Set the Batch Size to 16.
Batch size is the number of samples processed at the same time. A batch size larger than 16 would run out of memory. For text processing models, we recommend keeping the product Sequence length x Batch size below 3000 to avoid running out of memory.
Check that Epochs is 2. BERT models are already pretrained, so light fine-tuning generally gives the best results.
Keep the loss function in the Target block.
Loss is a number that indicates how well the model performs. If the model predictions are totally wrong, the loss will be a high number. If they’re pretty good, it will be close to zero.
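As an illustration of how a loss behaves, here is a minimal sketch of binary cross-entropy, a loss commonly used for two-class problems. The tutorial does not name the loss function the platform selects, so treating it as binary cross-entropy is an assumption, and the function name and example numbers are hypothetical.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Average binary cross-entropy over a list of examples.

    y_true holds 0/1 labels, y_pred holds predicted probabilities of
    class 1. Probabilities are clipped to avoid log(0).
    """
    total = 0.0
    for target, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1 - eps)
        total += -(target * math.log(prob) + (1 - target) * math.log(1 - prob))
    return total / len(y_true)


labels = [1, 0, 1]
good_loss = binary_cross_entropy(labels, [0.95, 0.05, 0.90])  # confident and right
bad_loss = binary_cross_entropy(labels, [0.05, 0.95, 0.10])   # confident and wrong
```

Confident correct predictions give a loss near zero, while confident wrong predictions give a much larger loss.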
To evaluate the performance of the model, you can look at the Binary accuracy by clicking on its name under the plot.
Binary accuracy gives the percentage of predictions that are correct. It should be about 85-90% by the end of training.
The precision gives the proportion of positive predictions, i.e., examples classified as positive, that were actually correct.
The recall gives the proportion of positive examples that are identified by the model.
The confusion matrix shows how often examples are correctly or incorrectly classified as another category. Correct predictions fall on the diagonal.
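The metrics described above can all be derived from the four counts in a two-class confusion matrix. The sketch below uses a hypothetical helper and made-up labels, not output from the platform.

```python
def binary_metrics(y_true, y_pred, positive="positive"):
    """Compute the confusion matrix, accuracy, precision, and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        # Rows are actual classes, columns are predicted classes;
        # correct predictions (tp, tn) sit on the diagonal.
        "confusion_matrix": [[tp, fn], [fp, tn]],
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


actual = ["positive", "positive", "positive", "negative",
          "negative", "negative", "negative", "positive"]
predicted = ["positive", "positive", "negative", "negative",
             "negative", "positive", "negative", "positive"]
metrics = binary_metrics(actual, predicted)
```

In this made-up example, 6 of the 8 predictions are correct, and one review is misclassified in each direction.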
Since the problem is a binary classification problem, the ROC curve (Receiver Operating Characteristic curve) will also be shown.
The ROC curve is a nice way to see how good the model generally is. The closer the ROC curve passes to the top left corner, the better the model is performing.
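A ROC curve is built by sweeping a decision threshold over the model's scores and recording the false positive rate and true positive rate at each step. The sketch below is a minimal illustration with hypothetical function names and made-up scores; it assumes the data contains at least one positive and one negative example.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs.

    y_true holds 0/1 labels and scores holds the model's scores for
    class 1. One point is produced per distinct threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under the ROC curve using the trapezoid rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


curve = roc_points([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
auc = area_under_curve(curve)
```

A curve that hugs the top left corner has an area close to 1, while a model that guesses at random gives an area of about 0.5.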
Test the text classifier in a terminal
To see what an actual request from the application and the response from the model may look like, you can run the example CURL command that is provided in the Code examples section of the Deployment view. Replace the VALUE parameter with review text and run the command in a terminal.
curl -X POST \
  -F "review=A fun brain candy movie...good action...fun dialog. A genuinely good day" \
  -u "<Token>" \
  <URL>
The output will look something like this:
The predicted result is positive since it’s above the arbitrary threshold value.
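Turning the returned score into a label can be sketched as below. The threshold of 0.5, the helper name, and the example scores are assumptions for illustration; the actual response format and threshold are whatever the deployment returns and your application chooses.

```python
def label_from_score(score, threshold=0.5):
    """Map a model score to a sentiment label.

    The threshold is an assumed value; raising it makes the classifier
    more conservative about calling a review positive.
    """
    return "positive" if score > threshold else "negative"


# Hypothetical scores, not real model output.
likely_positive = label_from_score(0.93)
likely_negative = label_from_score(0.12)
```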