Estimated time: BERT model training + 60 min

Pop or rock? Create your own app and let BERT decide

We have some serious music nerds at Peltarion, and discussions of whether a song is pop or rock can go on forever. Daniel Skantze, the Head of Engineering at Peltarion, got sick and tired of the endless discussions and decided to build a web app to settle, once and for all, whether a track is pop or rock.

This tutorial is based on how he did it, and gives you the steps to build your own model.

Test the app for yourself

Open a browser and enter the address pop-or-rock.demo.peltarion.com.

Search the internet for your favorite pop or rock lyrics. Copy/paste the lyrics in the web app and hit the submit button. Is our model correct?

Wannabe by Spice Girls is pop according to our web app. Well, that is a correct prediction.

You will learn a data science end-to-end workflow

This tutorial will show you the three main steps of a data science workflow:

  1. Preprocess a dataset to make it fit your model in a Jupyter Notebook.
  2. Build, train and deploy a BERT model in the Peltarion Platform.
  3. Create a web app that will be able to determine if a song lyric is pop or rock.

Clone and download the pop-or-rock repo

Before you do anything else, clone and download the pop-or-rock GitHub repo.

This repo includes all the files you need for step 1, preprocess the lyrics data, and step 3, build a web app that uses your deployed model.

Preprocess the data

Download the data

Download the lyrics dataset from Kaggle. This dataset contains more than 380,000 lyrics from many different artists and genres, all arranged by year.

Extract the data

Extract the lyrics.csv into the /demo-pop-or-rock/data_preparation/ directory.

Install pipenv 

In this tutorial you will preprocess the data in a Jupyter Notebook. There are other ways to run a notebook, e.g., Colab, but it’s always best to show only one way to solve a problem in a tutorial, so we’ll stick with Jupyter.

In the terminal, navigate to the /demo-pop-or-rock directory.

Run
pipenv install jupyter

Run
pipenv shell

Now you have Jupyter installed for your project and you have access to it in your shell.

If the jupyter command is still not available inside the pipenv shell, run
pip install jupyter

Preprocess the dataset csv to fit the Peltarion Platform

We have created a Jupyter Notebook for this tutorial with all the preprocessing steps.

In a terminal, navigate to the /demo-pop-or-rock/data_preparation/ directory and run this command:
jupyter notebook pop_rock_classifier.ipynb

This will open the notebook in a browser. Follow all the steps in the notebook, but basically, this is what you’re going to do:

  1. Remove all rows with missing entries.
    Some fields are missing for some rows. Training a model requires all fields to be populated, so we simply remove the rows with missing entries.
    Removing rows works well with this large dataset, but for smaller datasets where you don't want to remove data, you could instead replace the missing values, e.g., with average value or the most frequent value. It all depends on your dataset.
  2. Keep only pop and rock genres.
    We do not need lyrics from any other genre and can remove all other lyric genres.
  3. Check for optimal sequence length.
    The BERT model requires a predefined sequence length of between 3 and 512 tokens to work. However, song lyrics can be of any length, so we need to decide what this sequence length should be.
    A larger sequence length lets the model capture more information, typically giving better performance, but it also increases the training time quite a lot. Hence, we want to set a sequence length to a value that is large enough to capture most lyrics, but not any larger.
  4. Balance classes.
    During model training, the model must be presented with examples from all classes. Additionally, if one class is much more common than another, the model will tend to predict the common class more often, just because it was seen more often during training. This is, of course, a bad thing, so we want to make sure the data is balanced between all classes before training on it.
  5. Create output csv.
    Last but not least, you’ll save the dataset as balanced_downsampled_pop_rock_lyrics.csv.
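
The notebook in the repo is the reference for these steps. Purely as an illustration, steps 1, 2, 4 and 5 could be sketched in plain Python as below. The column names lyrics and genre come from this tutorial; the label spellings "Pop" and "Rock", the helper names and the fixed random seed are assumptions made for the sketch and may not match the notebook.

```python
import csv
import random
from collections import defaultdict

KEEP_GENRES = {"Pop", "Rock"}  # assumed spelling of the genre labels


def clean_rows(rows):
    """Steps 1 and 2: drop rows with empty fields, keep only pop and rock."""
    cleaned = []
    for row in rows:
        if any(value is None or not str(value).strip() for value in row.values()):
            continue  # at least one field is missing in this row
        if row["genre"] not in KEEP_GENRES:
            continue  # some other genre
        cleaned.append(row)
    return cleaned


def balance_by_downsampling(rows, seed=0):
    """Step 4: randomly shrink every class to the size of the smallest one."""
    by_genre = defaultdict(list)
    for row in rows:
        by_genre[row["genre"]].append(row)
    smallest = min(len(group) for group in by_genre.values())
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    balanced = []
    for group in by_genre.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)  # avoid all rows of one genre coming first
    return balanced


def write_csv(rows, path):
    """Step 5: save the rows with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

To try it, read lyrics.csv with csv.DictReader, pass the rows through clean_rows and balance_by_downsampling, and hand the result to write_csv. Downsampling throws data away, which is acceptable here only because the dataset is large.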

Build a model on the platform

Open the Peltarion Platform, https://platform.peltarion.com/.

Sign up
If you don't have an account yet, here's where you sign up. It's all free, and you get instant access.

Import the lyrics dataset to the platform

  1. Create a project on the platform and navigate to the Datasets view (click the Datasets icon at the top of the page).
  2. Click New dataset and upload balanced_downsampled_pop_rock_lyrics.csv.
  3. Select the lyrics feature, click the spanner icon and make sure the Encoding is Text.
    Set Sequence length to 400; this is the sequence length we found was a good one in the notebook.
    Set Language model to English BERT uncased.
  4. Select the genre feature and make sure the Encoding is Categorical.
  5. Save this version of the dataset and navigate to the Modeling view (click the Modeling icon at the top of the page).
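
A value such as 400 can be motivated by checking how many lyrics fit within a candidate length. The sketch below is one hypothetical way to do that, not necessarily what the notebook does. It counts whitespace-separated words as a rough stand-in for BERT tokens; BERT's tokenizer splits rare words into several pieces, so real token counts are usually higher. The 3 to 512 range is the one stated in this tutorial, while the function names and the coverage target are assumptions.

```python
import math

MIN_LEN, MAX_LEN = 3, 512  # sequence length range stated in this tutorial


def rough_token_count(text):
    """Word count used as a crude proxy for the number of BERT tokens."""
    return len(text.split())


def length_covering(texts, coverage=0.9):
    """Smallest length that fits at least `coverage` of the texts, clamped."""
    counts = sorted(rough_token_count(t) for t in texts)
    if not counts:
        raise ValueError("no texts given")
    index = max(math.ceil(coverage * len(counts)) - 1, 0)
    return max(MIN_LEN, min(MAX_LEN, counts[index]))


def share_covered(texts, length):
    """Fraction of texts whose rough token count is at most `length`."""
    fits = sum(1 for t in texts if rough_token_count(t) <= length)
    return fits / len(texts)
```

Calling share_covered(lyrics, 400) on the cleaned lyrics shows what fraction of songs would fit without being cut off, which is the trade-off against training time described in the notebook steps.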

Use the BERT snippet

  1. In the Modeling view create a new experiment. Select the BERT English uncased snippet. Tick the box Weights trainable (all blocks).
    The BERT snippet allows you to use this massive network with pretrained weights. You don’t have to build and train it yourself, which is a BIG win!
  2. In the Input block, set Feature to lyrics.
  3. In the Target block, set Feature to genre.
  4. Open the Settings tab and change:
  • Batch size to 6. With a sequence length as long as 400, the batch size needs to be this small.
  • Epochs to 2. Don’t run it too long.
  • Learning rate to 0.000001 (five zeros after the decimal point). Take small steps to avoid catastrophic forgetting.
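
To get a feel for what these settings mean, the number of weight updates can be worked out from the dataset size. This is a back-of-the-envelope sketch: the example count of 60,000 used in the note below is a made-up figure, not the size of your dataset, and the calculation assumes the last, partly filled batch of each epoch is still used.

```python
import math

LEARNING_RATE = 1e-6  # the same value as 0.000001


def training_steps(num_examples, batch_size=6, epochs=2):
    """Return (updates per epoch, total updates) for the given settings."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs
```

With a hypothetical 60,000 examples, training_steps(60000) gives 10,000 updates per epoch and 20,000 in total, which helps explain why a small batch size makes training slow.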

Run experiment

Click Run to fire up BERT!

Navigate to the Evaluation view and watch the model train. Because BERT is a very large and complex model, nothing much will show in the beginning since the training will take a long time. At Peltarion, we usually grab a fika while we wait, or create more experiments to run in parallel. Easy experimenting is one of the key features of the Peltarion Platform.

You can currently expect training to take more than four hours for two epochs when you use sequence length 400.

Enable the deployment

Finally, when the model has been trained, navigate to the Deployment view, and create a new deployment. Click Enable to set the deployment live. 

Keep this window open. You'll soon need to copy information from the Deployment view.

Create a web app – use the model

You have now deployed your trained “pop or rock” model, and you can create a web app that uses the model. The idea is that the app will display a simple webpage with a text area and a submit button. A user pastes in some lyrics, clicks the submit button and gets a response page stating if the song is pop or rock. Simple but effective! There will be no more confusion about whether a lyric is pop or rock :)
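
The app in the repo is started with npm, so the Python below is not its code. It is only a sketch of the exchange such an app has with a deployed model. The request and response shapes (a rows list, a Bearer token header, and per-class scores under the genre key) are assumptions here; check the Deployment view for the exact format your deployment expects. The function names are hypothetical.

```python
import json
import urllib.request


def build_request(url, token, lyrics):
    """Prepare a POST request carrying one lyric to classify."""
    payload = json.dumps({"rows": [{"lyrics": lyrics}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def top_genre(response_body):
    """Pick the class with the highest score from the assumed response."""
    scores = json.loads(response_body)["rows"][0]["genre"]
    return max(scores, key=scores.get)


def classify(url, token, lyrics):
    """Send the lyrics to the deployment and return the winning genre."""
    with urllib.request.urlopen(build_request(url, token, lyrics)) as response:
        return top_genre(response.read())
```

Splitting the work into building the request and reading the response keeps both parts easy to check without a live deployment.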

Pop or rock repository

Everything you need to create the web app is available in the demo-pop-or-rock repo that you cloned at the beginning of this tutorial.

Set up the configuration first

Create app-config.json
Create a configuration file based on our example config file: sample-config.json. Name this file app-config.json (it must be this name).

Create a config-folder
Create a config-folder and save the app-config.json file there. We have entered the folder config/ in .gitignore, so it's safe to put config files there.

Copy and paste deployment URL and token

Go to the Deployment view of your project and find the deployment's URL and Token. Copy and paste the deployment URL and token to app-config.json.
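
A quick sanity check before starting the app is to confirm that no setting in the file was left blank. The sketch below deliberately assumes nothing about the key names in sample-config.json; it only assumes the file holds a JSON object and lives at config/app-config.json as described above. It will not catch placeholder text that was never replaced. The function names are hypothetical.

```python
import json


def empty_fields(config, prefix=""):
    """Return dotted paths of string values that are empty or whitespace."""
    missing = []
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Look inside nested sections as well
            missing.extend(empty_fields(value, prefix=f"{path}."))
        elif isinstance(value, str) and not value.strip():
            missing.append(path)
    return missing


def check_config_file(path="config/app-config.json"):
    """Load the config file and list any settings that are still blank."""
    with open(path, encoding="utf-8") as handle:
        return empty_fields(json.load(handle))
```

An empty list from check_config_file() means every string setting has some value.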

Start the app with npm

In a terminal navigate to the pop-or-rock repository directory. 

Run:

npm install (if needed)

npm start (this uses app-config.json, the config file you created previously)

Test the classifier in a browser

Open a browser and enter the address http://127.0.0.1:3000.

Search the internet for your favorite pop or rock lyrics. Copy/paste in the web app and hit the submit button. Is your model correct?

Recap

In this tutorial you’ve learned how to preprocess a dataset, how to build, train and deploy a model on the Peltarion Platform, and last but not least, how to create a web app that tells whether a song is pop or rock. A complete end-to-end data science project! You’re a data scientist now!

Next steps and linked ideas

This is, of course, only a demo of what you can do with BERT on the platform and how you can build a demo app. If you want to dig more into BERT, you should do our BERT movie review sentiment analysis tutorial or Build a besserwisser bot for Slack with BERT.

Ideas on what you can do with BERT are endless: joke classifier, novel categorization, writing style classifier, political sentiment and more. The platform makes it easy to realize the ideas, but remember, it’s key to have good data.