In this short tutorial, we will participate in the Freesound Audio Tagging 2019 Kaggle competition. The same task also appeared as Task 2 in the DCASE2019 Challenge. To use the dataset tied to the competition, we encourage you to sign up on Kaggle, read through the competition rules, and accept them.
Freesound Audio Tagging 2019 is an update of the previous year's audio tagging competition held by Freesound (MTG — Universitat Pompeu Fabra) and Google’s Machine Perception. The 2019 edition is a multi-label audio tagging task with twice as many audio categories. The annotations in the FSDKaggle2019 dataset were gathered through the Freesound Annotator.
The competition is based on papers by Eduardo Fonseca et al.:
A preprocessed version of the dataset, ready to use on the Peltarion Platform, is available here.
Click dataset.zip to download the dataset.
The audio files have been converted into spectrograms and saved as NumPy files together with an index.csv file containing the corresponding classes per file.
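To give a concrete picture of that layout, here is a minimal sketch; the file names and the `labels` column shown are illustrative, not the dataset's exact contents:

```python
import csv
import io

import numpy as np

# A made-up two-row version of index.csv: each row pairs a NumPy file
# name with its (possibly multiple, comma-separated) labels.
csv_text = 'fname,labels\n0a1b2c3d.npy,"Bark,Dog"\n4e5f6a7b.npy,Applause\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Each .npy file holds one spectrogram as a 2-D array
# (frequency bins x time frames); here we fabricate one in memory
# instead of reading it from disk with np.load.
spectrogram = np.random.rand(128, 256).astype(np.float32)

labels = rows[0]["labels"].split(",")
print(rows[0]["fname"], labels, spectrogram.shape)
```

In the real dataset you would call `np.load` on each file listed in the `fname` column.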
On the platform landing page , click New Project and name your project, e.g., Project v1.
Go to the Datasets view and click New dataset. Add the downloaded zip file in the Upload files tab.
Click Next, name the dataset and click Done.
Since you will be solving a multi-label classification problem, you will need to create a feature to group the 80 sound categories.
Click New feature, then select all features except fname. Instead of adding the features one by one, click the checkbox next to the search field and then deselect fname.
Name the feature set Labels and then click Create.
Click Save version.
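Conceptually, grouping the categories this way amounts to multi-hot encoding each clip's labels: a binary vector with one slot per category, where several slots can be 1 at once. A sketch with a hypothetical four-category subset (the real dataset has 80):

```python
import numpy as np

# Hypothetical subset of the categories; the real list comes from index.csv.
categories = ["Applause", "Bark", "Dog", "Raindrop"]
cat_to_idx = {c: i for i, c in enumerate(categories)}

def multi_hot(labels):
    """Encode a list of label strings as a binary vector, one slot per category."""
    vec = np.zeros(len(categories), dtype=np.float32)
    for label in labels:
        vec[cat_to_idx[label]] = 1.0
    return vec

print(multi_hot(["Bark", "Dog"]))  # -> [0. 1. 1. 0.]
```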
Go to the Modeling view and then click New experiment. Name the model and click Create.
In this tutorial, we will treat the audio files as images, called spectrograms, and perform image classification.
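For intuition, here is one common way to turn raw audio into a spectrogram, sketched with SciPy on a synthetic tone. The preprocessed dataset above was generated for you, and its exact parameters (window size, mel scaling, and so on) may differ from this sketch:

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic one-second 440 Hz tone sampled at 16 kHz.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.sin(2 * np.pi * 440 * t)

# Short-time Fourier analysis: 512-sample windows, 50% overlap.
freqs, times, Sxx = spectrogram(audio, fs=sr, nperseg=512, noverlap=256)

# Log-scale the power so quiet details stay visible, as is typical for
# spectrogram "images" fed to a CNN.
log_spec = 10 * np.log10(Sxx + 1e-10)
print(log_spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can then be treated exactly like a grayscale image.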
Since we’re performing classification on sound data viewed as pictures, we can use well-performing convolutional neural networks such as ResNet, DenseNet, or Inception v4. For this tutorial you will use ResNetv2 large 50.
Go to the Evaluation view.
The metrics to pay attention to here are Precision and Recall. There is a great Medium article if you want to learn more about them.
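As a refresher, both metrics can be computed directly from the true and predicted tags. A toy multi-label example (four clips, three tags):

```python
import numpy as np

# Rows are clips, columns are tags; 1 means the tag applies.
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 1]])

tp = np.sum((y_pred == 1) & (y_true == 1))  # correctly predicted tags
fp = np.sum((y_pred == 1) & (y_true == 0))  # predicted but not true
fn = np.sum((y_pred == 0) & (y_true == 1))  # true but missed

precision = tp / (tp + fp)  # of the tags we predicted, how many were right
recall = tp / (tp + fn)     # of the true tags, how many did we find

print(precision, recall)
```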
Your model is now trained and ready to be downloaded.
If you trained multiple models, download the one with the best precision.
The model is now ready to ship!
If you don’t already have a Kaggle account, you can create one here. Just follow the instructions.
Once your account is ready, join the Freesound Audio Tagging 2019 competition. The deadline for the competition has passed, but you can still make a late submission by clicking the button in the top-right corner.
Even though the competition doesn’t allow transfer learning, it can be useful to test its power on the platform. Transfer learning takes advantage of image properties learned from huge datasets like ImageNet or CIFAR-100 and then fine-tunes the neural network for a specific image recognition problem. In our case, we would fine-tune it with images of audio files.
The Peltarion platform supports VGG16 with weights trained on ImageNet.
With transfer learning, you can often get much better results on a smaller dataset.