Build your own music critic
Create a state-of-the-art model to find the mood of a song
Can we build a model that classifies the mood of a song based on a visual representation of the soundtrack?
We’re going to find out!
You will create a solution to solve a complex real-world problem using a state-of-the-art AI model.
This tutorial will show you how to build a model that will solve a multi-label classification problem. This means that the model tries to decide for each class (song mood) whether the song belongs to that class or not.
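To make the "decide for each class" idea concrete, here is a minimal sketch in plain Python. It assumes the model produces one raw score per mood and that each score is turned into an independent probability with a sigmoid. The 0.5 threshold, the example scores, and the mood name "Sad" are illustrative assumptions, not values from the platform or the dataset.

```python
import math

def sigmoid(x):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def predict_moods(scores, threshold=0.5):
    """Return every mood whose own probability reaches the threshold.

    Each mood is decided independently, so a song can get zero,
    one, or several moods.
    """
    return [mood for mood, score in scores.items() if sigmoid(score) >= threshold]

# Hypothetical raw model outputs for one song (illustrative values only).
example_scores = {"Happy": 2.0, "Hopeful": 0.3, "Sad": -3.0}
predicted = predict_moods(example_scores)
print(predicted)
```

Unlike single-label classification, the probabilities do not have to sum to 1, which is why a song can be both "Happy" and "Hopeful" at the same time.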
- Target audience: Intermediate users
- Setup - 15 minutes | Training - 20 minutes
You will learn
- Solve a multi-label classification problem.
- Understand that data can be transformed: sound can be represented as images.
- Run multiple experiments.
See how Andreas did this tutorial on the Microsoft AI Show.
Create project - Figure out the mood of a song
First, click New project to create a project. Name it something like Song mood, so you know what kind of project it is. Naming is important.
Please note that by working with this dataset, you accept the author’s license in the Dataset licenses section of the Knowledge center.
This is a big dataset, so it will take some time to download and upload the dataset.
Waiting is normal when handling big datasets.
Upload the dataset from computer or URL
Once you have downloaded the dataset, you can upload it in two different ways. Either:
Navigate to the Datasets view. Locate the dataset on your computer and drag-and-drop it to the Datasets view.
Click the URL import box, copy the link to the dataset, and paste the link.
It may take some time to import the spectrogram file since it’s so large, but when all files are imported, name the dataset and click Done.
The tagger dataset
The Tagger dataset you have uploaded consists of song file segments converted to "log scaled mel spectrograms", each tagged with the song’s moods.
Definition of log scaled mel spectrogram:
The magnitude of the short-time Fourier transform (STFT) transformed to mel scale. This is then transformed to log scale.
For great detail on how to convert music file segments to "log scaled mel spectrograms" yourself, check out our GitHub repo with the Jupyter Notebook ready to go.
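The definition above can be sketched in plain Python as three steps: STFT magnitude, mel-scale filtering, then log scaling. This is not the conversion used to build the dataset. The Hann window, the frame size, the hop length, the number of mel bands, the natural logarithm, and the common 2595 * log10(1 + f / 700) mel formula are all assumptions chosen to keep the example small.

```python
import cmath
import math

def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def stft_magnitude(signal, frame_size, hop):
    """Magnitude of a Hann-windowed DFT per frame (bins 0 .. frame_size / 2)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames

def mel_filterbank(n_mels, frame_size, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / frame_size for k in range(frame_size // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= center:
                row.append((f - lo) / (center - lo))
            elif center < f < hi:
                row.append((hi - f) / (hi - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def log_mel_spectrogram(signal, sample_rate, frame_size=64, hop=32, n_mels=8):
    magnitudes = stft_magnitude(signal, frame_size, hop)
    bank = mel_filterbank(n_mels, frame_size, sample_rate)
    eps = 1e-10  # avoids log(0) for silent bands
    return [[math.log(sum(w * mag for w, mag in zip(row, frame)) + eps)
             for row in bank]
            for frame in magnitudes]

# A short 440 Hz test tone sampled at 8 kHz (synthetic, not from the dataset).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(400)]
spec = log_mel_spectrogram(tone, sample_rate)
print(len(spec), "frames x", len(spec[0]), "mel bands")
```

Each row of the result is one time frame and each column one mel band, which is what makes it possible to treat the spectrogram as a single-channel image.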
One song in the dataset, I’m Your Ride (Instrumental Version), is tagged with "Happy." Listen here:
Understanding the data - Is it good enough?
One of the biggest challenges with machine learning is the quality of the input data. Usually, it’s not good enough. In this tutorial, the ground truth (the labels for the data) comes from a manual assignment of moods, based partially on subjective opinions. This will make it difficult for the model to identify consistent patterns in the training data.
Did you agree that the song I’m Your Ride (Instrumental Version) really is "Happy"? What about "Hopeful"? These tags were created by hand, by different people. Now imagine you are a data scientist working for this music company, and your goal is to improve the consistency of search results when searching by mood.
Preprocess the dataset on platform
All samples in the dataset are by default split into 10% validation, 10% test, and 80% training subsets.
Usually, the default settings are used to train a deployed model for real-world use.
However, this dataset is very large, and it’s always a good idea to test your model on a small sample first. You will therefore start by using only 8% of the dataset for training and 2% for validation.
Create new smaller subsets
In the Subsets section, click New subset.
Name the subset Split 8/2.
Click the 2. Split subset tab and select Random.
Set the Size of the first subset to 8 and name it Training 8%.
Set the Size of the second subset to 2 and name it Validation 2%.
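The steps above amount to a random split where only a tenth of the samples end up in use. A minimal sketch of the same idea outside the platform, assuming samples are addressed by index and that percentages are rounded down (the function name and the seed are hypothetical):

```python
import random

def split_subsets(n_samples, train_pct=8, val_pct=2, seed=0):
    """Randomly pick disjoint training and validation indices.

    The remaining samples are simply left unused for this quick test.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_train = n_samples * train_pct // 100
    n_val = n_samples * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    return train, val

train_idx, val_idx = split_subsets(1000)
print(len(train_idx), "training samples,", len(val_idx), "validation samples")
```

Because both subsets are cut from the same shuffled list, no sample can appear in both training and validation.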
Create a small feature set for 5 moods
Click on New feature set. As a first quick stab at this problem, let’s choose only these five moods to see if the model can see any difference. Select these five moods and name the feature set 5 moods.
Click Create to create the new feature set.
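Conceptually, a feature set of five moods gives each song a target vector with five entries, one per mood, set to 1 when the song carries that tag. The sketch below assumes this multi-hot layout. "Happy" and "Hopeful" appear in this tutorial, while "Mood C", "Mood D", and "Mood E" are placeholders, not real column names from the dataset.

```python
def multi_hot(song_tags, feature_set):
    """Encode a song's tags as a 0/1 vector in feature-set order."""
    return [1 if mood in song_tags else 0 for mood in feature_set]

# Placeholder names stand in for the moods you actually selected.
feature_set = ["Happy", "Hopeful", "Mood C", "Mood D", "Mood E"]
target = multi_hot({"Happy", "Hopeful"}, feature_set)
print(target)
```

The length of this vector is what the model's last layer has to match, which is why the number of output nodes changes when the number of moods changes.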
Save the dataset
You’ve now created a dataset ready to be used in the platform. Click Save version and then click Use in new experiment.
Select DenseNet in the Experiment wizard
In the Experiment wizard, name the experiment, then:
In the Dataset tab
Select the Split 8/2 subset. This is the small subset you just created.
In the Input/Targets tab
Select spectrogram (150×716×1) as input.
Select 5 moods as target.
In the Snippet tab
Select Multi-label image classification.
Select DenseNet 169.
Start training from the modeling view
Since you used the magic Experiment wizard, everything is set up and ready to be trained. So just click Run to start to train the model.
Run concurrent experiment
While the first experiment runs, you can build and run a second experiment concurrently to find out if you can improve on the first.
Note: If you’re on the Free plan you can run 1 experiment at a time. All other plans can run concurrent experiments.
Duplicate the experiment and then use the left side section to add blocks and change settings in the new model. As long as you keep the same loss function (in this case, binary crossentropy) you can compare the models' results and see which one is the best in the Evaluation view.
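As a rough illustration of why the shared loss function makes results comparable, binary crossentropy can be computed by hand for a single song. The sketch assumes the loss is averaged over the moods and that predictions are clipped slightly away from 0 and 1. The platform's exact reduction is not stated here, and the prediction values are made up.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over the classes of one sample."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 0, 1, 0]                 # the song has moods 1 and 4
model_a = [0.9, 0.1, 0.2, 0.8, 0.1]      # hypothetical confident, mostly right
model_b = [0.4, 0.6, 0.5, 0.5, 0.3]      # hypothetical hesitant, partly wrong
loss_a = binary_crossentropy(y_true, model_a)
loss_b = binary_crossentropy(y_true, model_b)
print(round(loss_a, 3), round(loss_b, 3))
```

Since both numbers come from the same formula, the lower one indicates the better fit. Losses from different loss functions would not be comparable in this way.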
If there is a large discrepancy between training and validation losses, try to introduce Dropout and/or Batch normalization blocks to improve generalization. If the training loss is very high, the model is not learning well enough. After you have adjusted your model, you can compare the two different results in the Evaluation view to see which one performed the best.
Navigate to the Evaluation view. As the training of the models advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs.
Experiments are often stopped early. Early stopping monitors the given metric after every epoch. If the metric has not improved for Patience epochs in a row, the experiment is stopped. Training for too long risks overfitting, and there is no point in continuing to train if the model doesn’t improve.
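The patience rule can be sketched in a few lines. This assumes the monitored metric is the validation loss and that any decrease, however small, counts as an improvement. The platform may apply additional criteria, and the loss values below are invented for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: ran all epochs

# Hypothetical validation losses, one value per epoch.
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.57]
stopped_at = early_stopping_epoch(losses, patience=3)
print("stopped after epoch", stopped_at)
```

With a patience of 3, training stops after epoch 6 and never reaches the slightly better epoch 7. A larger patience would have kept going, at the cost of more training time.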
What does the graph show you?
The platform marks the best epoch for you, where the validation loss is at its lowest.
When analyzing an experiment, we don’t want overfitting.
That is when the training and the validation lines grow further apart as time goes on.
Overfitting is when the model memorizes what is correct from the training dataset instead of learning what is correct. That means that the model can’t figure out how to tag a new song from the validation set the model hasn’t seen before.
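If you export or write down the loss values per epoch, the same reading of the graph can be done numerically. The sketch below uses made-up loss curves, picks the epoch with the lowest validation loss, and computes the gap between the validation and training lines. A gap that keeps widening is the overfitting pattern described above.

```python
def best_epoch(val_losses):
    """1-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1

def generalization_gaps(train_losses, val_losses):
    """Validation loss minus training loss, per epoch."""
    return [v - t for t, v in zip(train_losses, val_losses)]

# Hypothetical curves: training loss keeps falling, validation loss turns upward.
train_losses = [0.70, 0.55, 0.45, 0.36, 0.28, 0.21]
val_losses = [0.72, 0.60, 0.54, 0.55, 0.59, 0.66]

best = best_epoch(val_losses)
gaps = generalization_gaps(train_losses, val_losses)
print("best epoch:", best)
print("gaps:", [round(g, 2) for g in gaps])
```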
Read our article on loss and metrics to understand how to read the loss curve.
Last iteration with the whole dataset
In this tutorial, we’ve kept things small just to test out our ideas. We’ve used only 5 out of 46 moods, and only the 8% training and 2% validation subsets. Once you’re satisfied with your model, you should train it on the complete dataset with all moods to see what happens. This will take much longer, though, since we’re using all the data.
To do this, navigate to the Datasets view and work on the 2 Draft of the dataset.
In this version, create a new feature set with all 46 moods.
Save the 2 Draft to make it version 2 of the dataset.
Then duplicate your favorite model in the Modeling view, and select the 80% training subset and the 10% validation subset. Click on the Target block and set the target feature to the one that includes all 46 moods. Now change the number of nodes in the last Dense block to 46. Save this dataset version and use it in a new model. Then run the experiment and watch it train in the Evaluation view.
Deploy your model
If you want to learn how to deploy your model and put it to use, you’ll find all info in the Deployment view section.
Tutorial recap - You have solved a real-world problem
Congratulations! You have built a multi-label classification model that labels songs according to mood. It took time, but you’ve just built something fundamentally cool. Achievement unlocked!
Think of other dense data types you can represent as a mel spectrogram. Or other datasets that you can now classify into multiple classes.
Your model can be used for many use cases such as:
Automating tagging of new songs. This will remove subjective opinions on what mood a new song has, making the tagging more consistent.
Improving the quality and consistency of existing metadata. Since the manual assignments are not perfect, the model can be used to identify most likely erroneous tags for existing songs and suggest alternative tags.
Tagging songs at a finer granularity level to allow more detailed queries when searching for songs.
Finding related songs by ranking songs according to mood similarity.
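As one illustration of the last use case, songs could be ranked by how similar their predicted mood probabilities are. The sketch below uses cosine similarity. The similarity measure, the song names, and the probability vectors are all assumptions made for the example, not output from the model built in this tutorial.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two mood-probability vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_similarity(query, catalog):
    """Song names sorted from most to least similar to the query vector."""
    return sorted(catalog,
                  key=lambda name: cosine_similarity(query, catalog[name]),
                  reverse=True)

# Hypothetical predicted probabilities for three moods per song.
catalog = {
    "song_a": [0.9, 0.8, 0.1],
    "song_b": [0.1, 0.2, 0.9],
    "song_c": [0.7, 0.9, 0.2],
}
query = [0.95, 0.7, 0.05]
ranking = rank_by_similarity(query, catalog)
print(ranking)
```

Working with the full probability vector rather than the final yes/no tags keeps information about how strongly each mood applies, which tends to give a finer ordering.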
Making soundtracks easy to tag & find, through AI