Build your own music critic
Solve a multi-label image classification problem
Can we build a model that classifies the mood of a song based on a visual representation of the soundtrack? We’re going to find out!
This tutorial will show you how to build a model to solve a multi-label classification problem. This means that the model tries to decide for each label (song mood) whether the song belongs to that label or not.
- Target audience: Beginners
- Estimated Time: 30 minutes
- Tutorial type: Learn AI & platform
- Problem type: Image classification
You will learn
- How to solve a multi-label classification problem.
- How data can be transformed: sound can be represented as images.
Create project
First, click New project to create a project. Name it something like Song mood, so you know what kind of project it is. Naming is important.

Add dataset to the platform
Download dataset
First, download the dataset to your computer from where we store it. Please note that by working with this dataset, you accept the author’s license.
This is a big dataset, so downloading and uploading it will take some time.
Waiting is normal when handling big datasets.
Upload dataset
Once you have downloaded the dataset, navigate to the Datasets view. Click Upload file and select the downloaded dataset.
It may take some time to import the spectrogram file since it’s so large, but when all files are imported, name the dataset and click Done.
You can also import a dataset to the platform without downloading it. To do this, copy the link to the dataset and use the URL import tab when creating a new dataset.
The Tagger dataset
The Tagger dataset you have uploaded consists of song file segments converted to images, each tagged with the songs' moods.
Example:
One song in the dataset, I’m Your Ride (Instrumental Version), is tagged with "Happy."
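To make the idea of "song segments converted to images" more concrete, here is a minimal sketch of how a sound clip can become a 2D array that is treated as a single-channel image. It is an illustration only, not the procedure used to build the Tagger dataset: it uses a plain short-time Fourier transform with NumPy rather than a mel spectrogram, a synthetic sine tone instead of a real song, and the function name `spectrogram` and the frame settings are assumptions made for this example.

```python
import numpy as np

def spectrogram(signal, frame_size=256, hop=128):
    """Return a (frequency bins x time frames) magnitude spectrogram in dB."""
    window = np.hanning(frame_size)
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Cut the signal into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_size] * window
        for i in range(n_frames)
    ])
    # One FFT per frame gives the strength of each frequency in that frame.
    magnitudes = np.abs(np.fft.rfft(frames, axis=1))
    # Log scale, then transpose so rows are frequencies and columns are time.
    return 20 * np.log10(magnitudes + 1e-10).T

# Synthetic stand-in for a song segment: one second of a 1000 Hz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

image = spectrogram(tone)
print(image.shape)  # (frequency bins, time frames)
```

Each column of `image` describes a short slice of time and each row a frequency band, so a steady tone shows up as a bright horizontal line. An image model can then look for visual patterns in such arrays.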

Preprocess the dataset on platform
Create a small feature set for 5 moods
For multi-label image classification problems, you need to group your target features into a feature set. All features in the target feature set must have binary encoding. This target feature set will be used in the Target block.
It’s always a good idea to test your model on a small problem first. As a first quick stab at this problem, let’s choose only these five moods to see if the model can tell them apart.
- Click on New feature set.
- Select these five moods and name the feature set 5 moods.
  - Angry
  - Countryside
  - Dark
  - Epic
  - Happy
- Click Create to create the new feature set.
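The binary encoding of a target feature set like 5 moods can be pictured as one 0/1 value per mood for every song. The sketch below shows that idea in plain Python. The helper name `encode_moods` and the example tag lists are made up for illustration; on the platform the dataset already contains these binary columns.

```python
MOODS = ["Angry", "Countryside", "Dark", "Epic", "Happy"]

def encode_moods(tags, moods=MOODS):
    """Return one 0/1 value per mood: 1 if the song carries that tag."""
    tag_set = set(tags)
    return [1 if mood in tag_set else 0 for mood in moods]

# Hypothetical tag lists for two songs.
print(encode_moods(["Happy"]))         # [0, 0, 0, 0, 1]
print(encode_moods(["Epic", "Dark"]))  # [0, 0, 1, 1, 0]
```

Unlike single-label classification, a row may contain several ones, or none at all.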
Create new smaller subsets
All samples in the dataset are by default split into 10% validation, 10% test, and 80% training subsets.
The default split is usually the right choice when training a model that will be deployed for real-world use.
However, this dataset is so large that you will start by using only 8% of the dataset for training and 2% for validation.
- Make sure you see the advanced settings.
- In the Subsets section, click New subset.
- Name the subset Split 8/2.
- Click the 2. Split subset tab and select Random.
- Set the Size of the first subset to 8 and name it Training 8%.
  Set the Size of the second subset to 2 and name it Validation 2%.
- Click Create.
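Conceptually, a random split like Split 8/2 shuffles the samples and hands a fixed share to each subset, leaving the rest unused. The following is a minimal sketch of that idea, not the platform's implementation; the function name `random_split`, the seed, and the sample count of 1000 are assumptions chosen for the example.

```python
import random

def random_split(n_samples, fractions, seed=0):
    """Shuffle sample indices and cut off one subset per fraction.

    Samples beyond the sum of the fractions are left unused.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    subsets, start = {}, 0
    for name, fraction in fractions.items():
        size = round(n_samples * fraction)
        subsets[name] = indices[start:start + size]
        start += size
    return subsets

# Hypothetical dataset with 1000 samples.
subsets = random_split(1000, {"Training 8%": 0.08, "Validation 2%": 0.02})
for name, members in subsets.items():
    print(name, len(members))
```

Because the subsets are cut from one shuffled list, no sample ends up in both training and validation.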

Save dataset
You’ve now created a dataset ready to be used in the platform. Click Use in new experiment.

Build a model in the Experiment wizard
In the Experiment wizard, name the experiment, then in the:
- Dataset tab
  Select the Split 8/2 subset. This is the small subset you just created.
- Inputs / target tab
  Select spectrogram (150×716×1) as input.
  Select 5 moods as target.
- Problem type tab
  Make sure that Multi-label image classification is selected.
Click Create.
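A common way to set up multi-label classification, and the one assumed in this sketch, is to give the model one output per label, pass each output through a sigmoid to get an independent probability, and train with binary cross-entropy. The tutorial does not describe what the wizard builds internally, so treat the code below as an illustration of the general technique. The raw outputs and the 0.5 threshold are made-up example values.

```python
import math

MOODS = ["Angry", "Countryside", "Dark", "Epic", "Happy"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_moods(logits, threshold=0.5):
    """Keep every mood whose independent probability reaches the threshold."""
    probabilities = [sigmoid(z) for z in logits]
    return [mood for mood, p in zip(MOODS, probabilities) if p >= threshold]

def binary_cross_entropy(logits, targets):
    """Average the yes/no loss over all labels (targets are 0 or 1)."""
    losses = []
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1 - p)))
    return sum(losses) / len(losses)

# Made-up raw model outputs for one song, one value per mood.
logits = [-2.0, -1.0, 1.5, 3.0, -0.5]
print(predict_moods(logits))  # ['Dark', 'Epic']
print(binary_cross_entropy(logits, [0, 0, 1, 1, 0]))
```

Because every label gets its own yes/no decision, the probabilities do not need to sum to one, and a song can receive several moods or none.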
Run experiment
Since you used the magic Experiment wizard, everything is set up and ready for training. Just click Run to start training the model.

Analyze experiment
Navigate to the Evaluation view. As the training of the model advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs.
Experiments are often stopped early. Early stopping monitors the validation loss after every epoch. If the loss hasn’t improved for Patience epochs in a row, the experiment is stopped. Training for too long risks overfitting, and there is no point in continuing to train if the model doesn’t improve.
What does the graph show you?
The platform marks the best epoch for you, which is the epoch where the validation loss is at its lowest.
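The early stopping rule and the best epoch can be sketched in a few lines. This is a simplified illustration of the rule as described above, not the platform's code. The function name `early_stopping_epoch`, the zero-based epoch numbering, and the loss values are assumptions for the example.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both zero-based.

    Training stops once the validation loss has failed to improve
    for `patience` epochs in a row.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ended on the last epoch.
    return len(val_losses) - 1, best_epoch

# Made-up validation losses, one per epoch.
val_losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.51, 0.47]
print(early_stopping_epoch(val_losses, patience=3))  # (5, 2)
```

With a patience of 3, training stops at epoch 5 and epoch 2 is the best epoch. The later improvement at epoch 6 is never reached, which shows the trade-off: a larger patience costs training time but is less likely to stop too soon.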
Overfitting
When analyzing an experiment, watch out for overfitting.
Overfitting is when the model memorizes the training dataset instead of learning patterns that generalize.
In the loss graph, it shows up as the training and validation lines growing further apart as training goes on: the training loss keeps falling while the validation loss stalls or rises.
An overfitted model struggles to tag a song from the validation set that it hasn’t seen before.
Read our article on loss and metrics to understand how to read the loss curve.

Deploy your model
If you want to learn how to deploy your model and put it to use, you’ll find all info in the Deployment view section.
Tutorial recap
Congratulations! You have built a multi-label classification model that labels songs according to mood. It took time, but you’ve just built something fundamentally cool. Achievement unlocked!
Think of other dense data types you can represent as a mel spectrogram, or other datasets whose samples you can now tag with multiple labels.
Possible applications
Your model can be used for many use cases such as:
- Automating tagging of new songs. This will remove subjective opinions on what mood a new song has, making the tagging more consistent.
- Improving the quality and consistency of existing metadata. Since the manual assignments are not perfect, the model can be used to identify most likely erroneous tags for existing songs and suggest alternative tags.
- Tagging songs at a finer granularity level to allow more detailed queries when searching for songs.
- Finding related songs by ranking songs according to mood similarity.
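As one illustration of the last use case, related songs could be ranked by comparing the mood probabilities the model predicts for each song. The sketch below uses cosine similarity, which is one reasonable choice among several and not something the tutorial prescribes. The song titles and probability vectors are invented for the example.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two mood vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_by_mood(query, catalog):
    """Return catalog titles, most similar mood profile first."""
    return sorted(
        catalog,
        key=lambda title: cosine_similarity(query, catalog[title]),
        reverse=True,
    )

# Invented predictions in the order Angry, Countryside, Dark, Epic, Happy.
catalog = {
    "Song A": [0.1, 0.0, 0.1, 0.2, 0.9],
    "Song B": [0.8, 0.0, 0.7, 0.3, 0.0],
    "Song C": [0.0, 0.2, 0.0, 0.1, 0.7],
}
query = [0.0, 0.1, 0.0, 0.1, 0.95]  # a mostly happy song
print(rank_by_mood(query, catalog))
```

The same similarity score could also help flag suspicious metadata: a song whose manual tags disagree strongly with its predicted mood profile is a candidate for review.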

Next steps
Binary image classification
We suggest that you continue with our tutorial Detecting defects in mass produced parts. You’ll learn how to build a binary image classification model to detect faulty parts in a production line.
Intermediate audio
Related content
Customer story: Making soundtracks easy to tag & find, through AI