Music lyrics analyzer
Build your own text classifier app
We have some serious music nerds at Peltarion, and discussions about whether a song is pop or rock can go on forever. Peltarion's Head of Engineering, Daniel Skantze, got sick and tired of the endless discussions and decided to build a web app that settles, once and for all, whether a track is pop or rock.
This tutorial is based on how he did it, and gives you the steps to build your own model.
Target audience: Developers
Test the web app first
Open a browser and enter the address pop-or-rock.demo.peltarion.com.
Search the internet for your favorite pop or rock lyrics. Copy and paste the lyrics into the web app and hit the submit button. Is our model correct?
You will learn an end-to-end data science workflow
This tutorial will show you the three main steps of a data science workflow:
Preprocess a dataset in a Jupyter Notebook so that it fits your model.
Build, train and deploy a BERT model in the Peltarion Platform.
Create a web app that determines whether song lyrics are pop or rock.
Clone and download the pop-or-rock repo
Before you do anything else, clone and download the pop-or-rock GitHub repo.
The repo includes all the files you need for step 1 (preprocessing the lyrics data) and step 3 (building a web app that uses your deployed model).
Preprocess the data
Download the data
Download the lyrics dataset from Kaggle. This dataset contains more than 380,000 lyrics from many different artists and genres, all arranged by year.
Extract the data
Extract the lyrics.csv into the
In this tutorial you will preprocess the data in a Jupyter Notebook. There are other ways to run a notebook, e.g., Google Colab, but a tutorial is clearest when it shows only one way to solve a problem, so we'll stick with Jupyter.
In the terminal, navigate to the
pipenv install jupyter
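Now Jupyter is installed in your project's virtual environment, and you can use it from a pipenv shell (start one with pipenv shell).
If you don't use pipenv, install Jupyter with pip instead:
pip install jupyter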
Preprocess the dataset CSV to fit the Peltarion Platform
We have created a Jupyter Notebook for this tutorial with all the preprocessing steps.
In a terminal, navigate to the /demo-pop-or-rock/data_preparation/ directory and run this command:
jupyter notebook pop_rock_classifier.ipynb
This opens the notebook in a browser. Follow all the steps in the notebook; in short, this is what you're going to do:
Remove all rows with missing entries.
Some rows have missing fields. Training a model requires all fields to be populated, so we simply remove the rows with missing entries. Removing rows works well for this large dataset, but for smaller datasets where you don't want to lose data, you could instead fill in the missing values, e.g., with the mean or the most frequent value. It all depends on your dataset.
Keep only pop and rock genres.
We don't need lyrics from any other genre, so we remove them.
Check for optimal sequence length.
The BERT model requires a predefined sequence length of between 3 and 512 tokens. Song lyrics, however, can be any length, so we need to decide what this sequence length should be. A longer sequence length lets the model capture more information, typically giving better performance, but it also increases training time considerably. Hence, we want a sequence length that is large enough to capture most lyrics, but no larger.
Balance the dataset.
During training, the model must see examples from all classes. Additionally, if one class is much more common than another, the model will tend to predict the common class more often, simply because it saw that class more often during training. That is, of course, undesirable, so we want to make sure the data is balanced between all classes before training on it.
On the platform, you can use class weights to adjust models for imbalanced data, but since we have this nice notebook up and running, we’ll do the balancing here.
Create the output CSV.
Last but not least, save the dataset as balanced_downsampled_pop_rock_lyrics.csv.
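The notebook in the repo is the reference for these steps. As a rough illustration only (not the notebook's actual code), here is a standard-library Python sketch of the same pipeline. It assumes lyrics.csv has `genre` and `lyrics` columns and that the labels are spelled `Pop` and `Rock`; the 90th-percentile cutoff and the whitespace word count are illustrative choices, and word counts underestimate BERT token counts.

```python
import csv
import random
from collections import defaultdict

# Assumptions: lyrics.csv has "genre" and "lyrics" columns, and the genre
# labels we want are spelled exactly "Pop" and "Rock".
GENRES = {"Pop", "Rock"}


def drop_missing(rows):
    """Keep only rows where every field holds a non-empty string."""
    return [r for r in rows if all(isinstance(v, str) and v.strip() for v in r.values())]


def keep_pop_and_rock(rows):
    """Keep only the two genres the classifier should learn."""
    return [r for r in rows if r["genre"] in GENRES]


def length_percentile(rows, pct=0.9):
    """Return a word count that roughly `pct` of the lyrics fit within.

    Whitespace splitting is only a proxy: BERT's WordPiece tokenizer
    usually produces more tokens than there are words.
    """
    lengths = sorted(len(r["lyrics"].split()) for r in rows)
    idx = min(len(lengths) - 1, int(pct * len(lengths)))
    return lengths[idx]


def downsample(rows, seed=0):
    """Randomly downsample every genre to the size of the smallest genre."""
    by_genre = defaultdict(list)
    for r in rows:
        by_genre[r["genre"]].append(r)
    n = min(len(group) for group in by_genre.values())
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    balanced = []
    for group in by_genre.values():
        balanced.extend(rng.sample(group, n))
    rng.shuffle(balanced)
    return balanced


def write_csv(rows, path):
    """Write only the lyrics and genre columns to a new CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["lyrics", "genre"], extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)


def preprocess(in_path="lyrics.csv", out_path="balanced_downsampled_pop_rock_lyrics.csv"):
    """Run all steps and return the number of rows written."""
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rows = keep_pop_and_rock(drop_missing(rows))
    print("~90th percentile lyric length (words):", length_percentile(rows))
    balanced = downsample(rows)
    write_csv(balanced, out_path)
    return len(balanced)
```

Calling `preprocess()` from a notebook cell in the folder that holds lyrics.csv prints a rough length estimate and writes the balanced file. Downsampling throws data away; with a dataset this large that is usually acceptable, which is why the tutorial balances here rather than using class weights.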
Build a model on the platform
Open the Peltarion Platform, https://platform.peltarion.com/.
If you don’t have an account yet, here’s where you sign up. It’s all free, and you get instant access.
Import the lyrics dataset to the platform
Create a project on the platform and navigate to the Datasets view (click the Datasets icon at the top of the page).
Click New dataset and upload the balanced_downsampled_pop_rock_lyrics.csv.
Select the lyrics feature, click the spanner and make sure the Encoding is Text.
Set Sequence length to 400, the value the length analysis in the notebook pointed to.
Set Language model to English BERT uncased.
Select the genre feature and make sure the Encoding is Categorical.
Save this version of the dataset and click Use in new experiment.
Use the BERT snippet
In the Experiment wizard make sure that the correct dataset is selected. Set Input feature to lyrics and Target feature to genre.
Select the BERT English uncased snippet in the Choose snippet tab.
In the Initialize weights tab, tick the box Weights trainable (all blocks).
The BERT snippet allows you to use this massive network with pretrained weights. You don’t have to build and train it yourself, which is a BIG win!
Click Create and the BERT model will appear on the Modeling canvas. All settings will be prepopulated by the wizard, but for your information:
Batch size is 6. Because of the long sequence length of 400, the batch size needs to be this small to fit in memory.
Epochs is 2. Fine-tuning a pretrained model needs only a few epochs, so don't run it too long.
Learning rate is 0.000001 (1e-6). Small steps help avoid catastrophic forgetting of the pretrained weights.
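A quick count shows why a small batch size makes training slow. With N training examples, batch size B and E epochs, the model performs roughly this many weight updates (N is whatever your training split contains; it is not a number given in this tutorial):

```latex
\text{updates} = E \cdot \left\lceil \frac{N}{B} \right\rceil = 2 \cdot \left\lceil \frac{N}{6} \right\rceil
```

For a hypothetical 60,000 training examples, that is 2 × 10,000 = 20,000 updates, each one processing sequences of 400 tokens through the full BERT network.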
Click Run to fire up BERT!
Navigate to the Evaluation view and watch the model train. BERT is a very large and complex model, so training takes a long time and not much will show at first. At Peltarion, we usually grab a fika while we wait, or create more experiments to run in parallel. Easy experimenting is one of the key features of the Peltarion Platform.
Currently, expect training to take more than four hours for two epochs with a sequence length of 400.
Enable the deployment
Finally, when the model has been trained, navigate to the Deployment view, and create a new deployment. Click Enable to set the deployment live.
Keep this window open. You’ll soon need to copy information from the Deployment view.
Create a web app – use the model
You have now deployed your trained "pop or rock" model, and you can create a web app that uses it. The app displays a simple webpage with a text area and a submit button. A user pastes some lyrics, clicks the submit button and gets a response page stating whether the song is pop or rock. Simple but effective, and no more confusion about whether a lyric is pop or rock :)
Pop or rock repository
Everything you need to create the web app is available at the repo demo-pop-or-rock that you cloned in the beginning of this tutorial.
Set up the configuration first
Create a configuration file based on our example config file, sample-config.json, and name it app-config.json (the app expects exactly this name).
Create a config-folder
Create a folder named config and save the app-config.json file there. The config/ folder is listed in .gitignore, so it's safe to put config files there.
Copy and paste deployment URL and token
Go to the Deployment view of your project and find the deployment’s URL and Token. Copy and paste the deployment URL and token to app-config.json.
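The web app in the repo is started with npm, and its own code is the reference for how it calls the model. Purely to illustrate the kind of HTTP call involved, here is a minimal Python sketch that builds a prediction request from the deployment URL and token. The JSON body shape `{"rows": [{"lyrics": ...}]}` and the `Bearer` authorization header are assumptions about the deployment API; check the Deployment view for the exact format your deployment expects.

```python
import json
import urllib.request


def build_prediction_request(url, token, lyrics):
    """Build (but do not send) a POST request that asks the deployment to classify `lyrics`.

    Assumption: the deployment accepts a JSON body of the form
    {"rows": [{"<input feature>": value}]} and a bearer token.
    """
    body = json.dumps({"rows": [{"lyrics": lyrics}]}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def classify(url, token, lyrics, timeout=30):
    """Send the request and return the decoded JSON response."""
    req = build_prediction_request(url, token, lyrics)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

`classify(url, token, lyrics)` sends the request; `build_prediction_request` is kept separate so you can inspect the request without network access. Keep the token out of version control, just as the app does by keeping app-config.json in the ignored config/ folder.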
Start the app with npm
In a terminal, navigate to the pop-or-rock repository directory.
Run (if needed):
(This uses app-config.json, the config file you created previously.)
Test the classifier in a browser
Open a browser and enter the address http://127.0.0.1:3000.
Search the internet for your favorite pop or rock lyrics. Copy and paste them into the web app and hit the submit button. Is your model correct?
Tutorial recap and next steps
In this tutorial you've learned how to preprocess a dataset; build, train and deploy a model on the Peltarion Platform; and, last but not least, create a web app that decides whether a song is pop or rock. A complete end-to-end data science project! You're a data scientist now!
Next steps and linked ideas
This is, of course, only a demo of what you can do with BERT on the platform and how you can build a demo app. If you want to dig deeper into BERT, try our BERT movie review sentiment analysis tutorial or Build a besserwisser bot for Slack with BERT.
Ideas on what you can do with BERT are endless: joke classifier, novel categorization, writing style classifier, political sentiment and more. The platform makes it easy to realize the ideas, but remember, it’s key to have good data.