Build your own music critic

Solve a multi-label image classification problem

Can we build a model that classifies the mood of a song based on a visual representation of the soundtrack? We’re going to find out!

This tutorial will show you how to build a model to solve a multi-label classification problem. This means that the model tries to decide for each label (song mood) whether the song belongs to that label or not.

Target audience: Beginners
Estimated time: 30 minutes
Tutorial type: Learn AI & platform
Problem type: Image classification

You will learn
Solve a multi-label classification problem.
Understand that data can be transformed: sound can be represented as images.

Create project

First, click New project to create a project. Name it something like Song mood, so you know what kind of project it is. Naming is important.


Add dataset to the platform

Download dataset

First, download the dataset to your computer from where we store it. Please note that by working with this dataset, you accept the author’s license.

This is a big dataset, so downloading and uploading it will take some time.

Waiting is normal when handling big datasets.

Upload dataset

Once you have downloaded the dataset, navigate to the Datasets view. Click Upload file and select the downloaded dataset.

It may take some time to import the spectrogram file since it’s so large, but when all files are imported, name the dataset and click Done.

You can also import a dataset to the platform without downloading it. To do this, copy the link to the dataset and use the URL import tab when creating a new dataset.

The tagger dataset

The Tagger dataset you have uploaded consists of song segments converted to images, each tagged with the moods of the song it comes from.

One song in the dataset, I’m Your Ride (Instrumental Version), is tagged with "Happy." Listen here:

Figure 1. Log-scaled mel spectrogram of the first 30 seconds of "I’m Your Ride (Instrumental Version)." X-axis: time; y-axis: frequency (high Hz at the bottom and low Hz on top).
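
To make the idea of "sound as an image" concrete, here is a minimal sketch of how a log-scaled mel spectrogram can be computed with NumPy. This is not the code that produced the Tagger dataset. The function names, the parameters (n_fft, hop, n_mels), and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(f):
    # Common formula for converting Hz to the mel scale.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    # Triangular filters whose centers are evenly spaced on the mel scale.
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = hz_points[i:i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def log_mel_spectrogram(signal, sample_rate, n_fft=1024, hop=512, n_mels=64):
    # Cut the signal into overlapping, windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    # Power spectrum per frame, then pool FFT bins into mel bands.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sample_rate).T
    # Log scaling (decibels); the small constant avoids log(0).
    log_mel = 10.0 * np.log10(mel + 1e-10)
    # Rows are mel bands (row 0 = lowest frequency), columns are time frames.
    return log_mel.T

# Hypothetical input: one second of a 440 Hz sine tone.
sample_rate = 22050
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
image = log_mel_spectrogram(tone, sample_rate)
print(image.shape)
```

The result is a 2D array that can be saved or viewed as a grayscale image, which is the kind of input an image classification model expects.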

Preprocess the dataset on platform

Create a small feature set for 5 moods

For multi-label image classification problems, you need to group your target features into a feature set. All features in the target feature set must have binary encoding. This target feature set will be used in the Target block.
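
Binary encoding means each mood gets its own 0/1 column. A small sketch of what that looks like for one song, assuming its tags arrive as a list of mood names (the helper name encode_moods is made up for this example, and the platform may store the columns differently):

```python
# The five moods used in this tutorial's feature set.
MOODS = ["Angry", "Countryside", "Dark", "Epic", "Happy"]

def encode_moods(tags, moods=MOODS):
    """Return one 0/1 value per mood: 1 if the song has that tag, else 0."""
    tag_set = set(tags)
    return [1 if mood in tag_set else 0 for mood in moods]

# A song can carry several moods at once, or none of the five.
print(encode_moods(["Happy", "Epic"]))
print(encode_moods(["Sad"]))
```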

It’s always a good idea to test your approach on a smaller problem first. As a first quick stab, let’s choose only these five moods and see if the model can tell them apart.

  1. Click on New feature set.

  2. Select these five moods and name the feature set 5 moods.

    • Angry

    • Countryside

    • Dark

    • Epic

    • Happy

  3. Click Create to create the new feature set.

Create new smaller subsets

All samples in the dataset are by default split into 80% training, 10% validation, and 10% test subsets. These default settings are usually what you want when training a model for real-world use.
However, this dataset is so large that you will start by using only 8% of it for training and 2% for validation.

  1. Make sure you see the advanced settings.

  2. In the Subsets section, click New subset.

  3. Name the subset Split 8/2.

  4. Click the 2. Split subset tab and select Random.

  5. Set the Size of the first subset to 8 and name it Training 8%.
    Set the Size of the second subset to 2 and name it Validation 2%.

  6. Click Create.

Figure 2. Subset split with 8% training and 2% validation.
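
If you are curious what a random split like this does, here is a rough sketch in plain Python. It is an illustration only and not how the platform implements subsets. The function name split_subsets, the seed, and the sample count of 10,000 are assumptions.

```python
import random

def split_subsets(n_samples, fractions, seed=0):
    """Randomly assign sample indices to named subsets.

    fractions maps a subset name to its share of the dataset,
    e.g. 0.08 for 8%. Samples not covered by any fraction are left out.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    subsets = {}
    start = 0
    for name, fraction in fractions.items():
        size = round(n_samples * fraction)
        subsets[name] = indices[start:start + size]
        start += size
    return subsets

# Hypothetical dataset with 10,000 samples.
subsets = split_subsets(10_000, {"Training 8%": 0.08, "Validation 2%": 0.02})
for name, members in subsets.items():
    print(name, len(members))
```

Because the indices are shuffled once and then sliced, no sample ends up in both subsets, and the remaining 90% of the data is simply not used in this first experiment.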

Save dataset

You’ve now created a dataset ready to be used in the platform. Click Use in new experiment.


Build a model in the Experiment wizard

In the Experiment wizard, name the experiment, then in the:

  • Dataset tab
    Select the Split 8/2 subset. This is the small subset you just created.

  • Inputs / target tab
    Select spectrogram (150×716×1) as input.
    Select 5 moods as target.

  • Problem type tab
    Make sure that Multi-label image classification is selected.
    Click Create.
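
What makes this a multi-label problem is that each mood gets its own independent yes/no decision. A common way to do this, sketched below, is to pass one raw score per mood through a sigmoid and compare it with a threshold. The raw scores, the 0.5 threshold, and the function names are assumptions for illustration; the wizard's model may be configured differently.

```python
import math

MOODS = ["Angry", "Countryside", "Dark", "Epic", "Happy"]

def sigmoid(x):
    # Squash a raw score into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-x))

def predict_moods(raw_scores, threshold=0.5):
    """Return every mood whose probability reaches the threshold."""
    probabilities = [sigmoid(score) for score in raw_scores]
    return [mood for mood, p in zip(MOODS, probabilities) if p >= threshold]

# Hypothetical raw model outputs for one song, one score per mood.
print(predict_moods([-2.0, -1.0, 0.3, 1.5, 2.2]))
```

Unlike single-label classification, the probabilities don't have to sum to 1, so a song can be tagged with several moods or with none.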

Run experiment

Since you used the magic Experiment wizard, everything is set up and ready to be trained. So just click Run to start to train the model.


Analyze experiment

Navigate to the Evaluation view. As the training of the model advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs.

Experiments are often stopped early. Early stopping monitors the validation loss after every epoch. If the loss hasn’t improved for Patience epochs in a row, the experiment stops. This helps prevent the overfitting that comes from training for too long, and there is no point in continuing to train if the model doesn’t improve.
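
The patience rule can be sketched in a few lines of Python. This is a simplified illustration, assuming "improved" means strictly lower than the best validation loss so far; the platform's exact rule may differ. The function name and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once the loss has failed to improve for
    `patience` epochs in a row. Epochs are counted from 0.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses.
losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.51, 0.47]
print(early_stopping_epoch(losses, patience=3))
print(early_stopping_epoch(losses, patience=5))
```

With a patience of 3, training stops at epoch 5 and epoch 2 is kept as the best. With a patience of 5, training continues long enough to find the even better epoch 6, which shows the trade-off in choosing the patience value.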

What does the graph show you?

The platform marks the best epoch for you where the validation loss is at its lowest.


When analyzing an experiment, watch out for overfitting. A typical sign is that the training and validation curves grow further apart as training goes on.
Overfitting is when the model memorizes the correct answers in the training data instead of learning general patterns. As a result, the model can’t figure out how to tag a new song from the validation set, which it hasn’t seen before.
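
One simple way to put numbers on this is to look at the gap between validation loss and training loss for each epoch. The sketch below uses made-up loss values; in practice you would read them off the evaluation graphs.

```python
def generalization_gaps(train_losses, val_losses):
    """Validation loss minus training loss, per epoch."""
    return [round(v - t, 4) for t, v in zip(train_losses, val_losses)]

# Hypothetical loss curves: training keeps improving, validation turns upward.
train = [0.70, 0.55, 0.45, 0.38, 0.30, 0.24]
val = [0.72, 0.58, 0.50, 0.49, 0.52, 0.57]

gaps = generalization_gaps(train, val)
best_epoch = min(range(len(val)), key=lambda epoch: val[epoch])

print("Gap per epoch:", gaps)
print("Best epoch (lowest validation loss):", best_epoch)
```

In this example, the gap keeps widening after the best epoch while the validation loss gets worse. That combination is the pattern to look for in the graph.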

Read our article on loss and metrics to understand how to read the loss curve.

Figure 3. Loss curve with checkpoints.

Deploy your model

If you want to learn how to deploy your model and put it to use, you’ll find all info in the Deployment view section.

Tutorial recap

Congratulations! You have built a multi-label classification model that labels songs according to mood. It took time, but you’ve just built something fundamentally cool. Achievement unlocked!

Think of other dense data types you can represent as an image, the way you did here with the mel spectrogram, or of other datasets that you can now tag with multiple labels.

Possible applications

Your model can be used in many ways, for example:

  • Automating tagging of new songs. This will remove subjective opinions on what mood a new song has, making the tagging more consistent.

  • Improving the quality and consistency of existing metadata. Since manual assignments are not perfect, the model can be used to identify tags on existing songs that are most likely wrong and suggest alternatives.

  • Tagging songs at a finer granularity level to allow more detailed queries when searching for songs.

  • Finding related songs by ranking songs according to mood similarity.

Figure 4. Epic jazz played on our favorite instrument, the saxophone!
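
As an example of the last application, here is a sketch that ranks songs by how similar their predicted mood probabilities are to those of a query song. Cosine similarity is one reasonable choice of measure, not the only one. The song names and probability values are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two mood vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_by_mood(query, catalog):
    """Return song names sorted from most to least similar to the query."""
    return sorted(catalog,
                  key=lambda name: cosine_similarity(query, catalog[name]),
                  reverse=True)

# Hypothetical predicted probabilities, in the order:
# Angry, Countryside, Dark, Epic, Happy.
catalog = {
    "song_a": [0.1, 0.0, 0.1, 0.2, 0.9],
    "song_b": [0.8, 0.0, 0.7, 0.3, 0.0],
    "song_c": [0.0, 0.6, 0.0, 0.1, 0.8],
}
query = [0.0, 0.1, 0.0, 0.3, 0.9]

print(rank_by_mood(query, catalog))
```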

Next steps

Binary image classification

We suggest that you continue with our tutorial Detecting defects in mass produced parts. You’ll learn how to build a binary image classification model to detect faulty parts in a production line.

Intermediate audio

Making soundtracks easy to tag & find, through AI

Customer story:

Figure 5. Oscar Höglund, Co-Founder and CEO of Epidemic Sound