Target audience: Data scientists and developers
Preread: This tutorial is based on an adaptation of FCN-6, a content-based automatic music tagging algorithm using fully convolutional neural networks, from the paper Automatic Tagging Using Deep Convolutional Neural Networks. If you like, you can dig into the paper before starting this tutorial.
Can you figure out the beat, feeling, and mood of a song by "looking" at its signature? We're going to find out! In this tutorial we convert music file segments to log-scaled mel spectrograms. For more detail on how to do this yourself, check out our GitHub repo with the Jupyter Notebook ready to go: https://github.com/Peltarion/community-code.
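The conversion from audio to a log-scaled mel spectrogram can be sketched with plain NumPy. This is a minimal illustration, not the notebook's exact code (real pipelines typically use librosa), and the frame size, hop length, and number of mel bands below are illustrative assumptions:

```python
import numpy as np

def log_mel_spectrogram(y, sr=22050, n_fft=1024, hop=512, n_mels=96):
    """Compute a log-scaled mel spectrogram with plain NumPy.

    A minimal sketch: frame the signal, take a windowed FFT per frame,
    project the power spectrum onto a triangular mel filter bank, and
    convert to a log (dB) scale.
    """
    # Short-time Fourier transform: frame, window, FFT each frame.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (frames, n_fft//2+1)

    # Triangular mel filter bank: filters equally spaced on the mel scale.
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):
            fb[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):
            fb[m - 1, k] = (hi - k) / max(hi - ctr, 1)

    mel_power = power @ fb.T  # (frames, n_mels)
    return 10.0 * np.log10(np.maximum(mel_power, 1e-10))  # dB scale, floored
```

One second of a 440 Hz sine at 22050 Hz yields a 42 x 96 spectrogram with these settings; the model "looks" at exactly this kind of image.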
All the spectrograms are tagged with the songs' moods. For example, one song in the dataset, I'm Your Ride (Instrumental Version), is tagged with "Happy". Listen here to see if you agree:
If our model can predict and then tag a song, it could be used for a number of use cases such as:
One of the biggest challenges with machine learning is the quality of the input data; quite often it's not good enough. In this tutorial, the ground truth (that is, the labels for the data) comes from a manual assignment of moods, based partly on subjective opinion. This makes it difficult for the model to identify consistent patterns in the training data.
For example: did you agree that the song I'm Your Ride (Instrumental Version) really is "Happy"? What about "Hopeful"? These tags were created by hand, by different people. Now imagine you are a data scientist working for this music company, and your goal is to improve the consistency of search results when users search by mood.
First, create a project and name it so you know what kind of project it is. Naming is important.
A project combines all of the steps in solving a problem, from pre-processing of datasets to model building, evaluation, and deployment. Using projects makes it easy to collaborate with others.
Please note that, by working with this dataset you accept the author's license in the Dataset licenses section of the Knowledge center.
When you have downloaded the dataset, navigate to the Datasets view and click New dataset. Add the downloaded zip file to the Upload files tab.
You can also import the dataset without downloading it. To do this, copy the link to the dataset and paste that link in the Import files tab.
It may take some time to import the spectrogram file since it's so large. When all files are imported, click Next, name the dataset, and click Done.
By default, all samples in the dataset are split into 80% training and 20% validation subsets. Use the defaults to train a real working tagger. However, since this dataset is so large and, in real life, you would want to test your model first, we will use only 8% of the dataset for training and 2% for validation.
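The platform handles the subsets for you, but the idea behind such a reduced split can be sketched in a few lines (the fractions mirror the tutorial; the function name and seed are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the split is reproducible

def subsample_split(n_samples, train_frac=0.08, val_frac=0.02):
    """Return disjoint index arrays for a small train/validation subsample.

    Shuffling first ensures both subsets are random draws from the data.
    """
    idx = rng.permutation(n_samples)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val]
```

With 1000 samples this gives 80 training and 20 validation indices with no overlap, one tenth of the full 80/20 split.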
Click on New feature set, name the feature set Moods, and select all the moods for a real working tagger. However, as a first quick stab at this problem, let's choose only these five moods:
Click Create to create the new feature set.
You’ve now created a dataset ready to be used in the platform. Click Save version and navigate to the Modeling view.
Now it’s time to build a model. We’ll build an adaptation of FCN-6, a content-based automatic music tagging algorithm using fully convolutional neural networks, from the paper Automatic Tagging Using Deep Convolutional Neural Networks.
In the Modeling view create a new experiment.
Navigate to the Settings tab in the Inspector. Since this is a test run of our model, select the 8% training and 2% validation subsets in the Dataset section of your new experiment.
Tip! Use the zooming tools if the model doesn't fit the Modeling canvas. You'll find more navigation tips in the topic Modeling canvas controls.
This is the model you are going to build:
Click the Build tab in the Inspector and then the Blocks section to expand it.
Click the Input block in the Inspector. This will add an Input block to the Modeling canvas, and the Information center pop-up will appear with error messages. Don't worry: these error messages are descriptive and easy to resolve.
Navigate to the Blocks tab in the Inspector and set the Feature to spectrogram. First error message fixed.
Now we will stack five convolutional layers on top of each other.
Copy and paste the first convolutional layer, but set:
Connect the second layer to the first.
Copy and paste the first convolutional layer again, but set:
Connect the third layer to the second.
Copy and paste the first convolutional layer a third time, but set:
Connect the fourth layer to the third.
Copy and paste the first convolutional layer a fourth time, but set:
Connect the fifth layer to the fourth.
Add the following blocks:
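To get a feel for why five convolution + pooling stages are enough, here is a sketch of how each stage shrinks the feature map. The 96 x 1366 input size and the pool sizes are assumptions taken from the FCN paper, not necessarily the settings you entered above:

```python
def fcn_feature_map_sizes(h=96, w=1366,
                          pools=((2, 4), (2, 4), (2, 4), (3, 5), (4, 4))):
    """Track the (height, width) of the feature map after each stage.

    A convolution with 'same' padding keeps the spatial size, so only
    the max-pooling layers shrink it (integer division per axis).
    """
    sizes = [(h, w)]
    for ph, pw in pools:
        h, w = h // ph, w // pw
        sizes.append((h, w))
    return sizes
```

With these pool sizes the 96 x 1366 spectrogram collapses to a 1 x 1 map after the fifth stage, so the final Dense block sees a summary of the entire song segment.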
The experiment is now ready to be trained. In the Inspector, click the Settings tab and change the batch size to 16.
Click Run to start training the model.
Navigate to the Evaluation view. As the training of the model advances, epoch by epoch, the training and validation performance metrics are visualized in the evaluation graphs. It's a large experiment so it will take some time.
When analyzing an experiment we are looking for, among other things, overfitting. Overfitting means the model more or less memorizes the training data and then can't figure out how to tag a new, unseen song. In the graphs, we do not want the training and validation lines to grow further apart as training goes on.
Now it's time to find out if you can improve the model. Try duplicating the experiment and then adding blocks and changing settings in the model. As long as you keep the same loss function (in this case binary crossentropy), you can compare the models' results in the Evaluation view and see which one is best.
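Binary crossentropy is what makes multi-label tagging work: each mood is treated as an independent yes/no prediction. A minimal NumPy version (an illustration, not the platform's implementation):

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all tags (multi-label loss).

    y_pred holds sigmoid outputs in (0, 1); eps guards against log(0).
    """
    p = np.clip(y_pred, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))
```

A useful baseline when reading the loss graphs: a model that always outputs 0.5 for every tag scores ln 2, roughly 0.693, so a validation loss well below that means the model has learned something.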
If there is a large discrepancy between training and validation losses, try introducing Dropout and/or Batch normalization blocks to improve generalization. If the training loss itself is very high, the model is not learning well enough.
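What a Dropout block does can be sketched in a few lines (the "inverted dropout" variant; the 0.5 rate below is a common default, not a prescribed setting):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate=0.5, training=True):
    """Inverted dropout: zero a random fraction of activations during
    training and rescale the survivors so the expected value is unchanged.
    At inference time (training=False) the input passes through untouched.
    """
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate  # keep each unit with prob 1 - rate
    return x * mask / (1.0 - rate)
```

Because every unit can vanish during training, the network can't rely on any single activation, which is exactly what discourages memorizing the training songs.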
In this tutorial we’ve kept things small just to test out our ideas: we’ve used only 5 of the 46 moods, plus the 8% training and 2% validation subsets. Now that you’re satisfied with your model, you should train it on the complete dataset with all moods to see what happens. This will take much longer, though, since we’re using all the data.
To do this, navigate to the Datasets view and create a new version of the dataset. In this version, create a new feature set with all the moods. Make sure you set Normalize on subset to the Training (80%) subset.
Then duplicate your favorite model in the Modeling view and select the 80% training and 20% validation subsets. Set the target feature set to the one that includes all moods, and change the number of nodes in the last Dense block to 46. Then run the experiment and watch it train in the Evaluation view.
It took time, but you've just built something fundamentally cool. Think of other dense data types you could represent as a mel spectrogram, or other datasets that you can now classify into multiple classes. Achievement unlocked!