Kaggle competition with zero code
Working with exported models
Getting started with Kaggle competitions can be very complicated without previous experience and in-depth knowledge of at least one of the common deep learning frameworks like TensorFlow or PyTorch. In this tutorial, we’ll explore the opportunity to participate in a Kaggle competition.
Target group: Intermediate users
Freesound Audio Tagging 2019
In this short tutorial, we will participate in the Freesound Audio Tagging 2019 Kaggle competition. The same case was also Task 2 in the DCASE2019 Challenge. To use the dataset tied to the competition, we encourage you to sign up on Kaggle, read through the competition rules and accept them.
Freesound Audio Tagging 2019 is an update from the previous year’s audio tagging competition held by Freesound (MTG — Universitat Pompeu Fabra) and Google’s Machine Perception. The 2019 version is a multi-label audio tagging task, with twice the number of audio categories. The annotations in the FSDKaggle2019 dataset were gathered through the Freesound Annotator.
The competition is based on Eduardo Fonseca et al. papers on:
The dataset used in the competition contains audio files in WAV format. Raw audio is not the easiest kind of data to work with in deep learning. However, there are many very good models that can process images, so we'll represent each audio clip as an image (a spectrogram) and use one of those models instead.
Keep reading and we'll show you how to import the precomputed spectrograms straight into your project, using our Data library.
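To make the "audio as an image" idea concrete, here is a minimal NumPy sketch that turns a synthetic one-second tone into a log-power spectrogram. It is a simplified illustration, not the preprocessing behind the Data library dataset: it uses a plain short-time Fourier transform without mel filtering, and the function name `log_spectrogram`, the frame length, the hop size, and the test tone are all assumptions chosen for the example.

```python
import numpy as np


def log_spectrogram(waveform, frame_length=1024, hop_length=512):
    """Return a log-power spectrogram of shape (frequency_bins, frames)."""
    # Slice the signal into overlapping frames.
    n_frames = 1 + (len(waveform) - frame_length) // hop_length
    frames = np.stack([
        waveform[i * hop_length: i * hop_length + frame_length]
        for i in range(n_frames)
    ])
    # Window each frame to reduce spectral leakage, then take the FFT.
    window = np.hanning(frame_length)
    spectrum = np.fft.rfft(frames * window, axis=1)
    power = np.abs(spectrum) ** 2
    # Convert to decibels; the small constant avoids log(0).
    return (10.0 * np.log10(power + 1e-10)).T


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
samplerate = 16000
t = np.arange(samplerate) / samplerate
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)

image = log_spectrogram(tone)
print(image.shape)  # (frequency_bins, frames)
```

The result is a 2D array with frequency on one axis and time on the other, which is why an image classification network can be applied to it.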
Create a new project
On the platform landing page, click New Project and name your project, e.g., Project v1.
Add dataset to the platform
You should be taken to the Datasets view, where you can import data.
Click the Data library button and look for the Freesound Audio Tagging - tutorial data dataset in the list. Click on it to get more information about the dataset.
If you agree with the license, click Accept and import.
Create a feature set for labels
Since you will be solving a multi-label classification problem, you will need to create a feature set that groups the 80 sound categories.
Click New feature and then select all features except fname. Instead of adding the features one-by-one, click the checkbox next to the search field and then deselect fname.
Name the feature set Labels and then click Create.
Click Save version and then Use in new experiment.
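Grouping the category columns into one feature set means each clip is described by a vector with one 0/1 entry per category, and several entries can be 1 at the same time. The sketch below illustrates that encoding. The four category names are a hypothetical subset used only for the example (the dataset has 80), and `multi_hot` is an illustrative helper, not a platform function.

```python
import numpy as np

# Hypothetical subset of category names; the real dataset has 80.
categories = ["Bark", "Meow", "Purr", "Applause"]


def multi_hot(clip_labels, categories):
    """Encode a clip's labels as a 0/1 vector with one entry per category."""
    index = {name: i for i, name in enumerate(categories)}
    vector = np.zeros(len(categories), dtype=np.float32)
    for label in clip_labels:
        vector[index[label]] = 1.0
    return vector


# A clip tagged with two categories gets two active entries.
print(multi_hot(["Meow", "Purr"], categories))
```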
Create a deep learning experiment
Name the model in the Experiment wizard and make sure your dataset is selected.
Click Create blank experiment.
In this tutorial, we will treat the audio files like pictures called spectrograms and perform image classification.
Since we’re performing classification on sound data viewed as pictures, we can use well-performing convolutional neural networks such as ResNet, DenseNet, or Inception v4. For this tutorial you will use ResNetv2 large 50.
In the Build tab expand the Blocks section.
Add an Input and set Feature to fname.
Add a Batch normalization block. Batch normalization is done to speed up training.
Add a Reshape block and set Target shape to (256,256,1). This adds the channel axis required by ResNetv2 large 50.
Expand Snippets in the Inspector and select ResNetv2 large 50. A dialog will appear where you can choose weight initialization. Select Random.
Click the Input block above the added snippet and then press backspace to delete it.
Connect the Reshape block to the first block in the ResNetv2 large 50 group. To connect the two blocks, you must first expand ResNetv2 large 50 by clicking + and then expand the ResNet layer 1 group.
Change Activation in the Dense block after the Feature extraction group to ReLU. ReLU is a reliable default activation for hidden layers.
Add another Dense block before the Target block:
Select the Target block and set Feature to Labels and Loss to Binary crossentropy.
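Two of the steps above can be illustrated outside the platform with NumPy: what the Reshape block does to a spectrogram, and why binary crossentropy suits a multi-label target. This is a sketch under assumptions, not the platform's implementation. It assumes each model output is an independent probability between 0 and 1 (for example from a sigmoid), and the four-label example values are made up.

```python
import numpy as np

# Reshape: a (256, 256) spectrogram gains a trailing channel axis.
spectrogram = np.zeros((256, 256), dtype=np.float32)
reshaped = spectrogram.reshape(256, 256, 1)


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy, treating every label independently."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob)
               + (1.0 - y_true) * np.log(1.0 - y_prob))
    return float(losses.mean())


# Hypothetical clip with 4 labels, two of them active at once.
y_true = np.array([0.0, 1.0, 1.0, 0.0])
good = np.array([0.1, 0.9, 0.8, 0.2])  # close to the targets
bad = np.array([0.9, 0.1, 0.2, 0.8])   # far from the targets

print(reshaped.shape)
print(binary_crossentropy(y_true, good), binary_crossentropy(y_true, bad))
```

Because each label contributes its own term to the loss, a clip can have several active categories, which a softmax with categorical crossentropy would not allow.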
Run experiment to train the model
Go to the Settings tab on the right and set Batch size to 16 and Epochs to 30. Use default settings for the other parameters, e.g., Adam optimizer. A batch size of 16 will fit in the GPU memory, 30 epochs should be enough for the model to converge, and Adam is a good standard optimizer.
Click Run to start the training.
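To get a feel for what these settings mean in terms of work, the arithmetic below counts the weight updates for a run. The training-set size is a hypothetical placeholder; check the Datasets view for the real number of training examples.

```python
import math

batch_size = 16
epochs = 30
n_training_clips = 4000  # hypothetical; replace with your training subset size

# One update per batch; the last batch of an epoch may be smaller.
steps_per_epoch = math.ceil(n_training_clips / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)
```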
Go to the Evaluation view.
The metrics you should be interested in are Precision and Recall. There is a great Medium article if you want to learn more about them.
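As a quick reference, the sketch below computes precision and recall for multi-label predictions with NumPy. It assumes a decision threshold of 0.5 and pools all labels together (micro-averaging); the platform may compute its metrics differently, and the example values are made up.

```python
import numpy as np


def precision_recall(y_true, y_prob, threshold=0.5):
    """Micro-averaged precision and recall over all clips and labels."""
    y_pred = (y_prob >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correct tags
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # tags that should be absent
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # tags that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical scores for 2 clips and 4 labels.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0]])
y_prob = np.array([[0.9, 0.6, 0.3, 0.1],
                   [0.2, 0.8, 0.1, 0.7]])

print(precision_recall(y_true, y_prob))
```

Precision is the fraction of predicted tags that are correct, and recall is the fraction of true tags that the model found.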
Your model is now trained and ready to be downloaded.
Click the Experiment options menu next to your experiment.
In the dropdown menu, click Download and the Download model popup will open. Click OK.
The download of the model as an H5 file starts with a slight delay.
If you trained multiple models, download the one with the best precision.
The model is now ready to ship!
Getting started with Kaggle account
Click New Kernel. A popup window will appear and you will be prompted to select kernel type.
Select Script and a new Kaggle kernel will open.
Add your downloaded model H5 file as a dataset.
Remove the default code in the kernel.
The path in the sidebar may not reflect the actual path to the file. To find the correct path add the following code:
!find ../input -name '*.h5'
Click the Run button and the correct path will appear in console. You may have to scroll down to see it.
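The `!` prefix is notebook shell syntax, so it may not be accepted in a plain Python script. If that is the case in your kernel, the same search can be done in pure Python. The helper name `find_h5_files` is illustrative; `../input` is the directory used in the command above.

```python
from pathlib import Path


def find_h5_files(root="../input"):
    """Return the paths of all .h5 files below root, sorted."""
    return sorted(str(path) for path in Path(root).rglob("*.h5"))


# Print every model file found under the input directory.
for path in find_h5_files():
    print(path)
```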
Copy-paste the following code into the kernel:
import numpy as np
import pandas as pd
import librosa as lr
import tensorflow as tf
from tqdm import tqdm

# Load the exported model without compiling it; it is only used for prediction.
model = tf.keras.models.load_model('../input/freesound-audio-tagging-2019-model/resnet50.h5', compile=False)  ##Change
df = pd.read_csv('../input/freesound-audio-tagging-2019/sample_submission.csv', index_col='fname')  ##Change


def preprocess(wavfile):
    # Load roughly 8 seconds of audio.
    samples = 512*256 - 1
    samplerate = 16000
    # lr.load returns (waveform, sample rate); keep only the waveform.
    waveform, _ = lr.load(wavfile, sr=samplerate, duration=samples/samplerate)
    # Loop too short audio clips.
    if len(waveform) < samples:
        waveform = np.pad(waveform, (0, samples - len(waveform)), mode='wrap')
    # Convert audio to log-mel spectrogram.
    spectrogram = lr.feature.melspectrogram(y=waveform, sr=samplerate, n_mels=256)
    spectrogram = lr.power_to_db(spectrogram)
    spectrogram = spectrogram.astype(np.float32)
    return spectrogram


for fname, scores in tqdm(df.iterrows(), total=len(df), desc='Predicting'):
    spectrogram = preprocess('../input/freesound-audio-tagging-2019/test/' + fname)
    # Add a batch axis, predict, and flatten the result to one row of scores.
    scores = model.predict_on_batch(spectrogram[None, ...])
    df.loc[fname] = np.ravel(scores)
df.to_csv('submission.csv')
Replace the path in the model variable with the path that you copied in the previous step.
To run the script and make sure that everything works, highlight all the code and click the Run button. You don’t have to wait for the script to finish all the predictions; you can click the Stop button at any time.
Click Commit. The Kaggle kernel will check for errors and make predictions with your model. It then sends submission.csv to the competition.
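Two details of the preprocessing in the script can be checked with plain NumPy: why the clip length is 512*256 - 1 samples, and what `mode='wrap'` does to a short clip. The sketch assumes librosa's default hop length of 512 samples with centered frames, since the script does not override them; the three-sample clip is made up.

```python
import numpy as np

samples = 512 * 256 - 1
hop_length = 512  # assumed librosa default; not set explicitly in the script

# With centered frames, the number of spectrogram frames is
# 1 + samples // hop_length, which matches the model's input width.
n_frames = 1 + samples // hop_length
print(n_frames)

# mode='wrap' repeats the clip from its start until the target length is reached.
short_clip = np.array([1, 2, 3])
looped = np.pad(short_clip, (0, 8 - len(short_clip)), mode="wrap")
print(looped)
```

Together with `n_mels=256`, this gives a 256 by 256 spectrogram, the size the Reshape block expects.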
Further work — Transfer learning
Even though the competition doesn’t allow for transfer learning, it can be useful to test the power of transfer learning on the platform.
Transfer learning takes advantage of image properties learned from huge datasets like ImageNet or CIFAR-100 and then fine-tunes the neural network for a specific image recognition problem. In our case, we would fine-tune it with spectrogram images of the audio files.
The Peltarion platform supports snippets with weights trained on ImageNet.
With transfer learning, you can often get better results from a smaller dataset.