Build your own music critic

Predict mood from raw audio data

This tutorial shows you how to build a model that solves a multi-label classification problem. This means that, for each class, the model decides whether the example belongs to that class or not.
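To make this concrete, here is a minimal plain-Python sketch of multi-label targets and predictions. Each mood is an independent yes/no decision, so a single song can carry several tags at once. (The five mood names are the ones used later in this tutorial; the score values are made up.)

```python
# Multi-label classification: a song can carry several mood tags at once,
# so each tag is an independent yes/no decision.
MOODS = ["Angry", "Countryside", "Dark", "Epic", "Happy"]

def tags_to_vector(tags, moods=MOODS):
    """Encode a set of mood tags as a binary target vector."""
    return [1 if m in tags else 0 for m in moods]

def vector_to_tags(scores, threshold=0.5, moods=MOODS):
    """Decode model scores back to tags: each score is thresholded independently."""
    return [m for m, s in zip(moods, scores) if s >= threshold]

# A song tagged both "Happy" and "Epic" is a single training example:
print(tags_to_vector({"Happy", "Epic"}))          # [0, 0, 0, 1, 1]
print(vector_to_tags([0.1, 0.2, 0.3, 0.9, 0.7]))  # ['Epic', 'Happy']
```

Note that, unlike single-label classification, more than one entry of the target vector can be 1 at the same time.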

You will create a solution on the platform to solve a complex problem using state-of-the-art machine learning models.

Target audience: Intermediate users

Preread: This tutorial is based on an adaptation of FCN-6, a content-based automatic music tagging algorithm using fully convolutional neural networks (CNN), from the paper AUTOMATIC TAGGING USING DEEP CONVOLUTIONAL NEURAL NETWORKS. If you want, you can dig into the paper before starting this tutorial.

The problem

tut5b 1

Can you figure out the beat, feeling, and mood of a song by "looking" at its signature? We’re going to find out! In this case, we convert music file segments to log-scaled mel spectrograms. For great detail on how to do this yourself, check out our GitHub repo with the Jupyter Notebook ready to go.

All the spectrograms are tagged with the songs' moods. For example, one song in the dataset, I’m Your Ride (Instrumental Version), is tagged with "Happy." Listen here to see if you agree:

tut5b 2
Figure 1. The illustration shows a log scaled mel spectrogram of the first 30 seconds of "I’m Your Ride (Instrumental Version)." X-axis: time, y-axis: frequency (high Hz at the bottom and low Hz on top)

Note: Definition of log scaled mel spectrogram:
The magnitude of the short-time Fourier transform (STFT) transformed to mel scale. This is then transformed to log scale.
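As a rough illustration of that definition, here is a minimal numpy sketch of the computation: windowed STFT magnitude, a triangular mel filterbank, then a log transform. The parameters (FFT size, hop length, 96 mel bins) are illustrative assumptions; the GitHub notebook mentioned above shows the actual preprocessing, which in practice would typically use a library such as librosa.

```python
import numpy as np

def log_mel_spectrogram(y, sr, n_fft=2048, hop=512, n_mels=96):
    """Crude log-scaled mel spectrogram: STFT magnitude -> mel scale -> log scale.
    A minimal sketch only; parameters are illustrative assumptions."""
    # 1. Short-time Fourier transform: windowed frames -> FFT magnitude.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))           # (n_frames, n_fft//2 + 1)

    # 2. Mel filterbank: triangular filters spaced evenly on the mel scale.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * mel_pts / sr).astype(int)
    fb = np.zeros((n_mels, mag.shape[1]))
    for i in range(n_mels):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        if mid > lo:
            fb[i, lo:mid] = (np.arange(lo, mid) - lo) / (mid - lo)
        if hi > mid:
            fb[i, mid:hi] = (hi - np.arange(mid, hi)) / (hi - mid)

    # 3. Apply the filterbank to the power spectrum and take the log.
    mel = mag ** 2 @ fb.T                                # (n_frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel, 1e-10)).T     # (n_mels, n_frames)

# 3 seconds of a 440 Hz tone sampled at 22050 Hz:
sr = 22050
t = np.arange(3 * sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440 * t), sr)
print(spec.shape)  # (n_mels, n_frames)
```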

Possible applications

If our model can predict and then tag a song, it could be used for a number of use cases such as:

  • Automating tagging of new songs. This will remove subjective opinions on what mood a new song has, making the tagging more consistent.

  • Improving the quality and consistency of existing metadata. Since the manual assignments are not perfect, the model can be used to identify most likely erroneous tags for existing songs and suggest alternative tags.

  • Tagging songs at a finer granularity level to allow more detailed queries when searching for songs.

  • Finding related songs by ranking songs according to mood similarity.

Understanding the data

One of the biggest challenges with machine learning is the quality of the input data. Quite often it’s not good enough. In this tutorial, the ground truth, that is the labels for the data, comes from a manual assignment of moods, based partially on subjective opinions. This will make it difficult for the model to identify consistent patterns in the training data.

For example: Did you agree that the song I’m Your Ride (Instrumental Version) really is "Happy"? What about "Hopeful"? These tags were created by hand, by different people. Now imagine you are a data scientist working for this music company and your goal is to improve consistency of search results when searching by mood.

Challenge accepted!

Create project

First, create a project and name it so you know what kind of project it is. Naming is important.

A project combines all of the steps in solving a problem, from pre-processing of datasets to model building, evaluation and deployment. Using projects makes it easy to collaborate with others.

Add the dataset

Please note that by working with this dataset you accept the author’s license in the Dataset licenses section of the Knowledge center.

When you have downloaded the dataset, navigate to the Datasets view and click New dataset. Drag and drop the dataset into the grey dotted line box.

You can also import the dataset without downloading it. To do this, copy the link to the dataset, paste the link in the Import data from URL box and click the arrow.

It may take some time to import the spectrogram file since it’s so large, but when all files are imported, name the dataset and click Done.

Create your subsets

All samples in the dataset are by default split into 80% training and 20% validation subsets. Use the defaults to train a real working tagger. However, since this dataset is so large and, in real life, you would want to test your model first, we will only use 8% of the dataset for training and 2% for validation.
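The two subset filters can be pictured as a single seeded shuffle sliced from opposite ends, which guarantees that training and validation samples never overlap. The sketch below illustrates the idea only; the platform’s exact filtering mechanism may differ.

```python
import numpy as np

def split_subsets(n_samples, train_frac=0.08, val_frac=0.02, seed=42):
    """Sketch of seeded subset filters: shuffle once, then take the training
    fraction from the start and the validation fraction from the end, so the
    two subsets can never overlap. (Illustrative; not the platform's code.)"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return idx[:n_train], idx[-n_val:]

train_idx, val_idx = split_subsets(10_000)
print(len(train_idx), len(val_idx))   # 800 200
print(set(train_idx) & set(val_idx))  # set()
```

Keeping the default seed means the split is reproducible: running the filter again yields the same subsets.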

Create the training subset

  1. In the Subsets section, click New subset.

  2. Name the subset Training (8%).

  3. Click Add % random filter.

  4. Drag the slider until the Size is 8%.

  5. Keep the default Seed.

  6. Click Create.

  7. Set Normalize on subset to Training (8%).

tut5b 3
Figure 2. Training subset

Create the validation subset

  1. In the Subsets section, click New subset.

  2. Name the subset Validation (2%).

  3. Click Add % random filter.

  4. Click End under Fixed from.

  5. Drag the slider until the Size is 2%.

  6. Keep the default Seed.

  7. Click Create.

tut5b 4
Figure 3. Validation subset

Change encoding

Change the Encodings to Numeric for these moods:

  • Angry (1)

  • Countryside (1)

  • Dark (1)

  • Epic (1)

  • Happy (1)

You do this by clicking on the spanner next to the feature name.

Create a feature set for moods

Click New feature set, name the feature set Moods, and, for a real working tagger, select all the moods. However, as a first quick stab at this problem, let’s choose only these five moods:

  • Angry (1)

  • Countryside (1)

  • Dark (1)

  • Epic (1)

  • Happy (1)

Click Create to create the new feature set.

Save the dataset

You’ve now created a dataset ready to be used in the platform. Click Save version and then click Use in new experiment.

Design experiment

Now it’s time to build a model. We’ll build an adaptation of FCN-6, a content-based automatic music tagging algorithm using fully convolutional neural networks, from the paper AUTOMATIC TAGGING USING DEEP CONVOLUTIONAL NEURAL NETWORKS.

Click Create blank experiment in the Experiment wizard.

Select the 8%-training and 2%-validation subsets

Navigate to the Settings tab in the Inspector. In the Dataset settings section, select the tagger dataset and then set the 8%-training and 2%-validation subsets.

Tip! Use the zooming tools if the model doesn’t fit the Modeling canvas. You’ll find more navigation tips in the topic Modeling canvas controls.

tut5b 5

Add blocks to the model

This is the model you are going to build:

tut5b 6
Figure 4. Final model

Click the Build tab in the Inspector and then the Blocks section to expand it.

Click the Input block in the Inspector. This will add an Input block to the Modeling canvas, and the Information center pop-up will appear with error messages. But don’t worry: these error messages are descriptive and easy to resolve.

Navigate to the Blocks tab in the Inspector and set the Feature to spectrogram. First error message fixed.

Now we will stack five fully convolutional layers after each other.

tut5b 7
  1. Add a 2D Convolution block.

    • Filters: 128

    • Activation: Linear

      This block is used to detect spatial features in an image.

      In this particular case we want to set the activation function outside the 2D Convolution block, i.e., in a separate Activation block. Setting Activation function to Linear is equivalent to no activation function at all.

  2. Add a Batch normalization block.
    This normalizes all input features to a similar range of values which will speed up learning.

  3. Add an Activation block.
    Set Activation to ReLU (rectified linear unit).
    An activation function determines the output of a node, for example, yes or no.
    A ReLU activation function outputs 0 if the input is less than 0, and passes the input through unchanged if the input is greater than 0.

  4. Add a 2D Max pooling block.

    • Horizontal stride: 4

    • Vertical stride: 2
      This layer reduces the size of the data. You can say that 2D max pooling is similar to scaling down the size of an image.

  5. Add a Dropout block.

    • Rate: 0.5
      This block prevents overfitting by setting random weights to 0.
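The ReLU and 2D max pooling steps above can be sketched in numpy to show how the layer shrinks the data. The 96 x 1366 input shape (mel bins x time frames) and the assumption that the pool size equals the stride are illustrative only; check the block settings on the platform.

```python
import numpy as np

def relu(x):
    """ReLU: 0 for negative inputs, identity for positive inputs."""
    return np.maximum(x, 0.0)

def max_pool_2d(x, v_stride, h_stride):
    """2D max pooling sketch with pool size equal to the stride (an assumption;
    check the pooling-size setting on the platform). Truncates ragged edges."""
    rows, cols = x.shape[0] // v_stride, x.shape[1] // h_stride
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = x[r * v_stride:(r + 1) * v_stride,
                          c * h_stride:(c + 1) * h_stride].max()
    return out

# An illustrative 96 x 1366 spectrogram after this layer's pooling step,
# with horizontal stride 4 (time axis) and vertical stride 2 (frequency axis):
spec = np.random.randn(96, 1366)
pooled = max_pool_2d(relu(spec), v_stride=2, h_stride=4)
print(pooled.shape)  # (48, 341)
```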

Add 2nd network layer

tut5b 8

Copy and paste the 1st convolutional network layer, but set Filters to 256 in the 2D Convolution block.

Connect the second network to the first.

Add 3rd network layer

tut5b 10

Copy and paste the 1st convolutional network layer again, but set:

  • Filters to 512 in the 2D Convolution block.

In the 2D Max pooling block:

  • Horizontal stride: 2.

  • Vertical stride: 1.

Connect the third network to the second.

Add 4th network layer


Copy and paste the 1st convolutional network layer a third time, but set:

  • Filters to 1024 in the 2D Convolution block

In the 2D Max pooling block:

  • Horizontal stride: 4

  • Vertical stride: 4

Connect the fourth network to the third.

Add 5th network layer

tut5b 12

Copy and paste the 1st convolutional network layer a fourth time, but set:

  • Filters to 2048 in the 2D Convolution block

In the 2D Max pooling block:

  • Horizontal stride: 2

  • Vertical stride: 2

Connect the fifth network to the fourth.
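With all five layers in place, you can trace how the pooling strides progressively shrink the spectrogram. A small sketch, assuming an illustrative 96 x 1366 input and pooling that truncates ragged edges:

```python
# Cumulative downsampling through the five pooling blocks.
# (time stride, frequency stride) per layer, as configured above:
strides = [(4, 2), (4, 2), (2, 1), (4, 4), (2, 2)]

def shape_after_pooling(time, freq, strides):
    """Trace the spectrogram shape through successive max-pooling blocks
    (integer division matches pooling that truncates edges)."""
    for t, f in strides:
        time, freq = time // t, freq // f
    return time, freq

# Assuming an illustrative 1366-frame x 96-mel-bin input:
print(shape_after_pooling(1366, 96, strides))  # (5, 3)
```

The fully convolutional stack thus reduces the whole spectrogram to a small grid of high-level features before the dense layers make the final tag decisions.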

Last part of the model

tut5b 13

Add the following blocks:

  1. Flatten:
    This block flattens the output of the last convolutional layer to a vector. You do this to give the following Dense block the right input size.

  2. Dense:
    This is a densely connected neural network layer.
Set Activation to ReLU, that is, the positive part of the function’s argument. The activation function defines the output of the node given its input.

  3. Dense:

    • Nodes: 5. One node for each mood (Angry, Countryside, Dark, Epic, and Happy)

    • Activation: Sigmoid. The sigmoid activation function squashes the values to a range between 0 and 1.

  4. Target block:

    • Feature: Moods

    • Loss: Binary crossentropy

Binary crossentropy will output a score for each tag for each song. You can use these scores to decide which tags a song should have, for example "Happy," "Dark," or "Angry."
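A minimal numpy sketch of these last two pieces, the sigmoid scores and the binary crossentropy loss, for a single song. The logit values are made-up model outputs for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Squash logits to (0, 1): one independent probability per mood tag."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over the tags of one song."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# One song, five moods: Angry, Countryside, Dark, Epic, Happy.
# Ground truth: tagged "Epic" and "Happy"; logits are illustrative outputs.
y_true = np.array([0, 0, 0, 1, 1])
scores = sigmoid(np.array([-3.0, -2.0, -1.0, 2.5, 1.5]))
print(np.round(scores, 2))

# Flipping "Happy" off gives a worse prediction and therefore a higher loss:
worse = sigmoid(np.array([-3.0, -2.0, -1.0, 2.5, -1.5]))
print(binary_crossentropy(y_true, worse) > binary_crossentropy(y_true, scores))  # True
```

Because each sigmoid output is independent, a song can score high on several moods at once, which is exactly what multi-label tagging needs.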

Run experiment

The experiment is done and ready to be trained. In the Inspector, click Settings tab and change the batch size to 16.

Click Run to start to train the model.

Navigate to the Evaluation view. As the training of the model advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs. It’s a large experiment so it will take some time.

Analyze experiment

When analyzing an experiment we are looking for, among other things, "overfitting." Overfitting happens when the model more or less memorizes the training data and then can’t figure out how to tag a new song. In the evaluation graphs, we do not want the training and validation lines to grow further apart as training goes on.

Improve experiment

Now it’s time to find out if you can improve the model. Try to duplicate the experiment and then add blocks and change settings in the model. As long as you keep the same loss function (in this case binary crossentropy) you can compare the models' result and see which one is the best in the Evaluation view.

If there is a large discrepancy between training and validation losses, try introducing Dropout and/or Batch normalization blocks to improve generalization. If the training loss is very high, the model is not learning well enough.

Last iteration

In this tutorial we’ve kept things small just to test out our ideas: we’ve only used 5 of the 46 moods and the 8%-training and 2%-validation subsets. Now that you’re satisfied with your model, you should train it on the complete dataset with all moods to see what happens. This will take much longer, though, since we’re using all the data.

To do this, navigate to the Datasets view and create a new version of the dataset. In this version, create a new feature set with all moods. Make sure you set Normalize on subset to the subset Training (80%).

Then duplicate your favorite model in the Modeling view and select the 80%-training and 20%-validation subsets. Set the target feature set to the one that includes all moods and change the number of nodes in the last Dense block to 46. Then run the experiment and watch it train in the Evaluation view.

Tutorial recap

It took time, but you’ve just built something fundamentally cool. Think of other dense data types you can represent as a mel spectrogram. Or other datasets that you can now classify into multiple classes. Achievement unlocked!