Build your own music critic
Create a state-of-the-art model to find the mood of a song
Can we build a model that classifies the mood of a song based on a visual representation of the soundtrack?
We’re going to find out!
You will create a solution to solve a complex real-world problem using a state-of-the-art AI model.
This tutorial will show you how to build a model that will solve a multi-label classification problem. This means that the model tries to decide for each class (song mood) whether the song belongs to that class or not.
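To make the multi-label idea concrete, here is a minimal sketch of the per-class decision. It assumes the model outputs one raw score per mood and that each score is passed through a sigmoid and compared with a 0.5 threshold; the mood list, the scores, and the threshold are illustrative assumptions, not values taken from the platform.

```python
import math

# "happy" and "hopeful" appear in this tutorial; "dark" is a made-up third mood.
MOODS = ["happy", "hopeful", "dark"]

def sigmoid(x: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_moods(scores, threshold=0.5):
    """Decide every mood independently: keep it if its probability reaches the threshold."""
    probabilities = [sigmoid(score) for score in scores]
    return [mood for mood, p in zip(MOODS, probabilities) if p >= threshold]

# Made-up raw scores for one song, one score per mood.
example_scores = [2.0, 0.3, -1.5]
predicted = predict_moods(example_scores)
print(predicted)  # ['happy', 'hopeful']
```

Because each mood gets its own yes/no decision, a song can end up with several moods, one mood, or none at all. That is the difference from single-label classification, where exactly one class wins.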
- Target audience: Intermediate users
- Setup - 15 minutes | Training - 20 minutes
You will learn
- Solve a multi-label classification problem.
- Understand that data can be transformed: sound can be represented as images.
- Run multiple experiments.
See how Andreas did this tutorial on the Microsoft AI Show.
Create project - Figure out the mood of a song
First, click New project to create a project. Name it something like Song mood, so you know what kind of project it is. Naming is important.
Please note that by working with this dataset, you accept the author’s license in the Dataset licenses section of the Knowledge center.
This is a big dataset, so downloading and uploading it will take some time.
Waiting is normal when handling big datasets.
Upload the dataset from computer or URL
Once you have downloaded the dataset, you can upload it in two different ways. Either:
Navigate to the Datasets view. Locate the dataset on your computer and drag-and-drop it to the Upload file area.
Click URL import, copy the link to the dataset, and paste the link.
It may take some time to import the spectrogram file since it’s so large, but when all files are imported, name the dataset and click Done.
The tagger dataset
The Tagger dataset you have uploaded consists of song file segments converted to "log scaled mel spectrograms", each tagged with the song’s moods.
Definition of log scaled mel spectrogram:
The magnitude of the short-time Fourier transform (STFT) transformed to mel scale. This is then transformed to log scale.
For great detail on how to convert music file segments to "log scaled mel spectrograms" yourself, check out our GitHub repo with the Jupyter Notebook ready to go.
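As a rough illustration of the definition above, the three steps (STFT magnitude, mel scaling, log scaling) can be sketched with NumPy alone. This is not the code from the notebook: the frame size, hop length, number of mel bands, the mel formula, and the synthetic test tone are all assumptions chosen to keep the example short, and they will not reproduce the 150×716 spectrograms in the dataset.

```python
import numpy as np

def hz_to_mel(hz):
    """Convert frequency in Hz to mels (one common formula, assumed here)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=1024, hop=512, n_mels=64):
    # 1) Short-time Fourier transform: windowed frames -> magnitude spectra.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop: i * hop + n_fft] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))       # (n_frames, n_fft // 2 + 1)
    # 2) Project the magnitudes onto the mel scale.
    mel = magnitude @ mel_filterbank(n_mels, n_fft, sample_rate).T   # (n_frames, n_mels)
    # 3) Log scaling; the small constant avoids log(0).
    return np.log(mel + 1e-6).T                            # (n_mels, n_frames)

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate                   # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)                       # synthetic 440 Hz test tone
spec = log_mel_spectrogram(tone, sample_rate)
print(spec.shape)  # (64, 42): mel bands x time frames
```

The result is a 2D array that can be treated like a single-channel image, which is why an image classification model can be trained on it.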
One song in the dataset, I’m Your Ride (Instrumental Version), is tagged with "Happy." Listen here:
Understanding the data - Is it good enough?
One of the biggest challenges with machine learning is the quality of the input data. Usually, it’s not good enough. In this tutorial, the ground truth (the labels for the data) comes from a manual assignment of moods, based partially on subjective opinions. This will make it difficult for the model to identify consistent patterns in the training data.
Did you agree that the song I’m Your Ride (Instrumental Version) really is "Happy"? What about "Hopeful"? These tags were created by hand, by different people. Now imagine you are a data scientist working for this music company, and your goal is to improve the consistency of search results when searching by mood.
This will import the dataset into your project, and you can now edit it.
Preprocess the dataset on platform
All samples in the dataset are by default split into 10% validation, 10% test, and 80% training subsets.
Usually, the default settings are used to train a deployed model for real-world use.
However, this dataset is large, and it’s always a good idea to test your model first. You will therefore start by using only 8% of the dataset for training and 2% for validation.
Create new smaller subsets
To do this you need to make sure you see the advanced settings.
In the Subsets section, click New subset.
Name the subset Split 8/2.
Click the 2. Split subset tab and select Random.
Set the Size of the first subset to 8 and name it Training 8%.
Set the Size of the second subset to 2 and name it Validation 2%.
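Conceptually, a random split like this shuffles the samples and cuts off two disjoint slices. The sketch below shows that idea in plain Python; it is an assumption about how such a split can be done, not the platform's implementation, and the dataset size of 1,000 is made up.

```python
import random

def random_subsets(n_samples, fractions, seed=0):
    """Shuffle sample indices and cut off one disjoint slice per requested fraction."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    subsets, start = {}, 0
    for name, fraction in fractions.items():
        size = round(n_samples * fraction)
        subsets[name] = indices[start:start + size]
        start += size
    return subsets

# 1,000 is a made-up dataset size; the names mirror the subsets in this tutorial.
split = random_subsets(1000, {"Training 8%": 0.08, "Validation 2%": 0.02})
print({name: len(indices) for name, indices in split.items()})
```

The remaining 90% of the samples are simply left unused in this split, which is what keeps the first experiments fast.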
Create a small feature set for 5 moods
Click on New feature set. As a first quick stab at this problem, let’s choose only these five moods to see if the model can see any difference between them. Select these five moods and name the feature set 5 moods.
Click Create to create the new feature set.
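One way to picture a feature set of five moods is as a target vector with five 0/1 entries per song. The sketch below assumes that representation; the mood names are placeholders, since the actual five moods are picked in the platform.

```python
# Placeholder names; the tutorial's five moods are selected in the platform UI.
FIVE_MOODS = ["mood_a", "mood_b", "mood_c", "mood_d", "mood_e"]

def multi_hot(song_tags, moods=FIVE_MOODS):
    """Return one 0/1 entry per selected mood; tags outside the feature set are ignored."""
    tags = set(song_tags)
    return [1 if mood in tags else 0 for mood in moods]

# A song tagged with two of the five moods plus one mood that is not in the feature set.
target = multi_hot(["mood_b", "mood_e", "some_other_mood"])
print(target)  # [0, 1, 0, 0, 1]
```

Restricting the feature set this way shortens the target vector, so the model only has to learn five yes/no decisions in the first experiments.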
Save the dataset
You’ve now created a dataset ready to be used in the platform. Click Use in new experiment.
Build a model in the Experiment wizard
In the Experiment wizard, name the experiment, then make the following selections:
Select the Split 8/2 subset. This is the small subset you just created.
Select spectrogram (150×716×1) as input.
Select 5 moods as target.
Problem type tab:
Make sure that Multi-label image classification is selected.
Start training from the Modeling view
Since you used the magic Experiment wizard, everything is set up and ready to be trained. So just click Run to start to train the model.
Run concurrent experiment
While the first experiment runs you can build and run a concurrent experiment to find out if you can improve the experiment.
Note: If you’re on the Free plan you can run 1 experiment at a time. All other plans can run concurrent experiments.
Duplicate the experiment and then use the left side section to add blocks and change settings in the new model. As long as you keep the same loss function (in this case, binary crossentropy) you can compare the models' results and see which one is the best in the Evaluation view.
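For reference, binary crossentropy scores each mood as its own yes/no prediction and averages the result. The sketch below computes it for a single song in plain Python; the target and prediction values are made up, and the clipping constant is an assumption commonly used to keep the logarithm finite.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all moods of one song."""
    total = 0.0
    for target, prob in zip(y_true, y_pred):
        prob = min(max(prob, eps), 1.0 - eps)   # clip to keep log() finite
        total += -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))
    return total / len(y_true)

y_true = [1, 0, 0, 1, 0]                # made-up targets for five moods
confident = [0.9, 0.1, 0.2, 0.8, 0.1]   # made-up predictions close to the targets
unsure = [0.5, 0.5, 0.5, 0.5, 0.5]      # predictions that carry no information

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

Predicting 0.5 everywhere gives a loss of ln 2 (about 0.693), which is a handy baseline: a model whose validation loss stays near that value has learned very little. Comparing losses between experiments only makes sense when both use this same function.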
If there is a large discrepancy between training and validation losses, try to introduce Dropout and/or Batch normalization blocks to improve generalization. If the training loss is very high, the model is not learning well enough. After you have adjusted your model you can compare the two different results in the Evaluation view, to see which one performed the best.
Navigate to the Evaluation view. As the training of the models advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs.
Often an experiment is stopped early. Early stopping monitors the evolution of the given metric after every epoch. If the metric has not improved for Patience epochs in a row, the experiment is stopped early. Running an experiment for too long invites overfitting, and there is no point in continuing to train if the model doesn’t improve.
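The patience rule can be sketched in a few lines. This version assumes the monitored metric is the validation loss (lower is better) and that any decrease counts as an improvement; the platform's exact criteria may differ, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based, for per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best value: remember it and reset the patience counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # The list ran out before patience did.
    return len(val_losses) - 1, best_epoch

losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.57]   # made-up validation losses
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch, best_epoch)  # 5 2
```

With a patience of 3, training stops at epoch 5 and never sees the better value at epoch 6. A larger patience would have found it, at the cost of more training time, which is the trade-off this setting controls.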
What does the graph show you?
The platform marks the best epoch for you, where the validation loss is at its lowest.
When analyzing an experiment, we don’t want overfitting.
Overfitting shows up as the training and the validation lines growing further apart as time goes on.
Overfitting is when the model memorizes what is correct from the training dataset instead of learning what is correct. That means that the model can’t figure out how to tag a new song from the validation set the model hasn’t seen before.
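The same reading of the loss curves can be expressed as a simple heuristic: look at the gap between validation and training loss, and find the first epoch where training loss keeps falling while validation loss rises. The sketch below assumes two made-up loss curves; a single noisy epoch can trigger it, so treat it as a rough indicator rather than a definitive test.

```python
def generalization_gap(train_losses, val_losses):
    """Validation loss minus training loss, one value per epoch."""
    return [val - train for train, val in zip(train_losses, val_losses)]

def first_overfitting_epoch(train_losses, val_losses):
    """First 0-based epoch where training loss falls but validation loss rises, else None."""
    for epoch in range(1, len(train_losses)):
        train_improves = train_losses[epoch] < train_losses[epoch - 1]
        val_worsens = val_losses[epoch] > val_losses[epoch - 1]
        if train_improves and val_worsens:
            return epoch
    return None

train = [0.70, 0.60, 0.52, 0.45, 0.39]   # made-up training losses
val = [0.72, 0.64, 0.60, 0.62, 0.66]     # made-up validation losses

gaps = generalization_gap(train, val)
overfit_epoch = first_overfitting_epoch(train, val)
print(overfit_epoch)  # 3
```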
Read our article on loss and metrics to understand how to read the loss curve.
Last iteration with the whole dataset
In this tutorial, we’ve kept things small just to test out our ideas. We’ve only used 5 out of 46 moods, and only the 8% training and 2% validation subsets. Now that you’re satisfied with your model, you should train it on the complete dataset with all moods to see what happens. This will take much longer though, since we’re using all data.
To do this, navigate to the Datasets view and click on Go to draft to create a new editable version of the dataset.
In this version, create a new feature set with all 46 moods.
Click on Save version to make it version 2 of the dataset.
Then navigate to the Modeling view and click on Duplicate to create a fresh copy of the model.
In the project Settings, under Dataset, select #2 as the Version, and Default split as split.
Finally, you will need to update the model to work with all the new moods.
Click on the Target block and set the target feature to the one that includes all 46 moods. Now change the number of Nodes in the last Dense block to 46 to match the number of moods. Then Run the experiment again to train using the full dataset.
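The reason the node count has to change is that the last Dense block produces one output per node, and each output is the prediction for one mood. The NumPy sketch below illustrates this with a stand-alone dense layer; the sigmoid activation, the input size of 128, and the random weights are assumptions for illustration and do not describe the model built by the wizard.

```python
import numpy as np

def dense_sigmoid(features, weights, bias):
    """Dense layer with sigmoid activation: one output probability per node."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

rng = np.random.default_rng(0)
n_features = 128   # hypothetical size of the layer feeding the last Dense block
n_moods = 46       # all moods in the full feature set

weights = rng.normal(scale=0.01, size=(n_features, n_moods))
bias = np.zeros(n_moods)
batch = rng.normal(size=(4, n_features))   # four made-up feature vectors

probs = dense_sigmoid(batch, weights, bias)
print(probs.shape)  # (4, 46): one row per song, one column per mood
```

If the layer still had 5 nodes, its output would have 5 columns and could not be compared against a 46-entry target, so the experiment would not be able to train.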
Deploy your model
If you want to learn how to deploy your model and put it to use, you’ll find all info in the Deployment view section.
Tutorial recap - You have solved a real-world problem
Congratulations! You have built a multi-label classification model that labels songs according to mood. It took time, but you’ve just built something fundamentally cool. Achievement unlocked!
Think of other dense data types you can represent as a mel spectrogram. Or other datasets that you can now classify into multiple classes.
Your model can be used for many use cases such as:
Automating tagging of new songs. This will remove subjective opinions on what mood a new song has, making the tagging more consistent.
Improving the quality and consistency of existing metadata. Since the manual assignments are not perfect, the model can be used to identify tags that are most likely erroneous for existing songs and suggest alternative tags.
Tagging songs at a finer granularity level to allow more detailed queries when searching for songs.
Finding related songs by ranking songs according to mood similarity.
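As an example of the last use case, songs can be ranked by how similar their predicted mood profiles are. The sketch below uses cosine similarity, which is one possible choice and an assumption here; the song names and mood probabilities are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two mood vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_mood(query, catalog):
    """Sort catalog titles from most to least similar mood profile."""
    return sorted(catalog, key=lambda title: cosine_similarity(query, catalog[title]), reverse=True)

# Made-up mood probabilities for three hypothetical songs (three moods each).
catalog = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.1, 0.2, 0.9],
    "song_c": [0.7, 0.3, 0.2],
}
query = [0.8, 0.9, 0.0]   # mood profile of the song to match

ranking = rank_by_mood(query, catalog)
print(ranking)  # ['song_a', 'song_c', 'song_b']
```

Using the predicted probabilities rather than the thresholded tags keeps more information, so two songs that are both "somewhat hopeful" can still rank close to each other.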
Making soundtracks easy to tag & find, through AI