BERT - Text classification / cheat sheet

Target audience: Data scientists and developers

Use this cheat sheet

Use this cheat sheet if you want to use BERT, and your input data consists of English text with a classification tag.

What is BERT?

BERT is a state-of-the-art deep learning language model. It significantly outperforms earlier language models on a wide range of natural language processing tasks. Read more about BERT here.

Example use cases

Sentiment analysis

(😃 or 😱 or 😡 or 😍 or …​)

Try yourself: HappyDB - happy moments is a corpus of more than 100,000 happy moments.

Ticket classification

Save time and enable quicker responses by classifying incoming customer service requests as a first step, e.g., by topic or urgency.

Positive vs. negative

(👍 vs. 👎)

Try yourself: This Quora dataset consists of answers labeled as sincere or insincere.

Data preparation

Dataset requirements

  • CSV file saved in UTF-8.

  • Features (text, label) must be separated with commas.

  • The first row must include the feature names, i.e., a heading for each feature column.

  • The text feature will be used in the Input block.

    • Put the text feature inside double quotes ("…").

    • If there is a double quote (") inside the text feature, escape it by replacing it with two double quotes (""), as in the sketch below.

  • Include at least one categorical feature to be used as the target.

  • Each line should be less than 20,000 characters.

Figure 1. Example CSV.
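For reference, a minimal Python sketch that writes a CSV meeting the requirements above (UTF-8, comma-separated, header row, quoted text, doubled inner quotes). The column names text and label are only examples; use your own headings.

    import csv

    rows = [
        {"text": 'He said "great service" and smiled.', "label": "positive"},
        {"text": "Waited 40 minutes with no response.", "label": "negative"},
    ]

    # QUOTE_ALL wraps every field in double quotes; embedded quotes are doubled automatically.
    with open("dataset.csv", "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"], quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)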

Requirements for BERT

  • Text examples should be mostly in English; the pretrained model will not understand other languages well.

  • Text examples should be at most 512 tokens long. Longer texts will be truncated to fit the sequence length (you can estimate token counts with the sketch below).
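To get a rough idea of how long your texts are in BERT tokens, you can count them with the Hugging Face transformers tokenizer for the same uncased English vocabulary. This is a sketch outside the platform; the platform does its own tokenization, so treat the counts as estimates.

    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

    text = "An example review that might be long enough to worry about truncation."
    n_tokens = len(tokenizer.encode(text))  # includes the [CLS] and [SEP] tokens
    if n_tokens > 512:
        print(f"{n_tokens} tokens: this example will be truncated to 512.")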

Datasets view - Update text feature

In the Datasets view, make sure the target feature Encoding is set to Categorical.

Change the text feature settings to:

  • Encoding: Text (Beta).

  • Sequence length: 3-512.
    The BERT Encoder block accepts any integer input size from 3 to 512.
    For the best performance, use the smallest size that does not severely truncate your texts (this is difficult to estimate; see the sketch after this list).

  • Language model: English BERT uncased.
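One way to pick a sequence length, assuming the tokenizer sketch above and a dataset.csv with a text column (both assumptions), is to tokenize a sample of your texts and look at a high percentile of the token counts:

    import pandas as pd
    from transformers import BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    df = pd.read_csv("dataset.csv")
    sample = df["text"].sample(n=min(1000, len(df)), random_state=0)

    lengths = [len(tokenizer.encode(t)) for t in sample]
    suggested = min(512, max(3, int(pd.Series(lengths).quantile(0.95))))
    print(f"95% of the sampled examples fit in {suggested} tokens.")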

Modeling view

Add the BERT English uncased snippet.

Select the pretrained weights English Wikipedia and BookCorpus.
We provide a pre-trained BERT model so that you only need to supply your data; building and training a BERT model from scratch is very expensive, takes a long time, and requires huge amounts of data.

Set all weights trainable by checking the Weights trainable (all blocks) box.
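Outside the platform, this step corresponds roughly to loading a pretrained uncased English BERT with a classification head on top and leaving every layer trainable. A sketch with Hugging Face transformers and TensorFlow; num_labels=3 is a placeholder for your number of classes.

    from transformers import TFBertForSequenceClassification

    # Pretrained on English Wikipedia and BookCorpus, with a randomly
    # initialized classification head on top.
    model = TFBertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=3
    )
    model.trainable = True  # fine-tune all weights, as with "Weights trainable (all blocks)"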

Input & target block

Set the Input feature to the text feature in your dataset, e.g., text, tweet, doc, or review.

Set the Target feature to the categorical feature, e.g., sentiment or label.

Figure 2. BERT snippet feature settings
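In code, the Input and Target blocks correspond roughly to splitting the dataset into the text column and an integer-encoded categorical column. A sketch assuming the column names text and label from the earlier example:

    import pandas as pd

    df = pd.read_csv("dataset.csv")
    texts = df["text"].tolist()                    # Input feature
    labels, classes = pd.factorize(df["label"])    # Target feature encoded as 0..n_classes-1
    print(dict(enumerate(classes)))                # mapping from class index to class name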

Change experiment settings

Click the Settings tab and change batch size, learning rate, and the number of epochs.

Batch size - as big as possible to train fast

The batch size should be as big as possible, simply because the model will then train faster. If the batch size is too big, the experiment will fail because it runs out of memory.

Guideline:

\[\text{Sequence length} \times \text{Batch size} < 3000\text{–}4000\]

Table 1. Recommended max batch size based on sequence length (credit Google)

  Sequence length   Recommended max batch size
  64                64
  128               32
  256               16
  320               14
  384               12
  512               6
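If you want to pick the batch size programmatically, a minimal sketch that looks up Table 1 for your chosen sequence length:

    # Table 1 as a lookup: sequence length -> recommended max batch size.
    RECOMMENDED_MAX_BATCH = {64: 64, 128: 32, 256: 16, 320: 14, 384: 12, 512: 6}

    def batch_size_for(sequence_length):
        # Use the entry for the smallest tabulated length that still fits your sequences.
        eligible = [s for s in sorted(RECOMMENDED_MAX_BATCH) if s >= sequence_length]
        return RECOMMENDED_MAX_BATCH[eligible[0]] if eligible else RECOMMENDED_MAX_BATCH[512]

    print(batch_size_for(200))  # -> 16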

Epochs - 1 or 2

Set the number of epochs to 1 or 2.
Give the model a taste of your data rather than brainwashing it; with too many epochs it will simply memorize your training set. BERT already knows a lot from pretraining, just not exactly what you need, so a short fine-tuning run is enough.

Learning rate - 0.000001 (5 zeros)

Set the learning rate to 0.000001 (5 zeros). Take tiny steps so you don't trample the existing pre-trained weights.
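Putting the settings together outside the platform, a sketch that compiles and fine-tunes the model from the sketches above with the recommended learning rate (1e-6), 2 epochs, and a batch size from Table 1. It assumes the tokenizer, model, texts, and labels defined earlier; recent transformers versions pick an appropriate classification loss when no loss argument is given.

    import numpy as np
    import tensorflow as tf

    # Tokenize to a fixed sequence length (here 128, matching batch size 32 in Table 1).
    encodings = tokenizer(texts, truncation=True, padding="max_length",
                          max_length=128, return_tensors="np")

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-6))
    model.fit(dict(encodings), np.asarray(labels), epochs=2, batch_size=32)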

Run the experiment

You’re all set. Hit the Run button.

Evaluation view

If you have a large sequence length and many examples, training will take quite a long time, since BERT is a very large and complex model. Currently, expect more than four hours for 2 epochs when you use sequence length 512.
Go and grab some fika (a Swedish coffee and cake break) while you wait.

Deployment view

In the Deployment view, click New deployment.

Select your experiment and epoch for deployment.

Click the Enable switch to deploy the experiment.

Text Classifier web app

To make it easier to test your model, we have provided a Text Classifier web app.

Open the Text classifier: bit.ly/Text_classifier, and copy-paste the following information from the Deployment view:

  • Deployment URL

  • Token

  • Input parameter feature name

Then enter a new text and press the Run button ▶️ to predict!

Example:

Text Classifier web app example.
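If you prefer to call the deployment from code instead of the web app, a purely hypothetical sketch with the requests library is below. The header name, JSON shape, and response format here are assumptions for illustration only; use the exact request format shown in your Deployment view.

    import requests

    DEPLOYMENT_URL = "https://example.com/deployment/xyz"  # copy from the Deployment view
    TOKEN = "your-deployment-token"                        # copy from the Deployment view
    INPUT_FEATURE = "text"                                 # the input parameter feature name

    response = requests.post(
        DEPLOYMENT_URL,
        headers={"Authorization": f"Bearer {TOKEN}"},      # assumed auth scheme
        json={"rows": [{INPUT_FEATURE: "I absolutely loved it!"}]},  # assumed payload shape
    )
    print(response.json())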