Kaggle competition with zero code

Working with exported models

Getting started with Kaggle competitions can be complicated if you lack experience with, and in-depth knowledge of, at least one of the common deep learning frameworks such as TensorFlow or PyTorch. In this tutorial, we’ll show how to participate in a Kaggle competition with a model built on the platform.

Target group: Intermediate users

Freesound Audio Tagging 2019

In this short tutorial, we will participate in the Freesound Audio Tagging 2019 Kaggle competition. The same case was also Task 2 in the DCASE2019 Challenge. To use the dataset tied to the competition, we encourage you to sign up on Kaggle, read through the competition rules and accept them.

Freesound Audio Tagging 2019 is an update from the previous year’s audio tagging competition held by Freesound (MTG — Universitat Pompeu Fabra) and Google’s Machine Perception. The 2019 version is a multi-label audio tagging task, with twice the number of audio categories. The annotations in the FSDKaggle2019 dataset were gathered through the Freesound Annotator.

The competition is based on papers by Eduardo Fonseca et al.

Preprocessed data

The dataset used in the competition contains audio files in the good old WAV format. That’s not the easiest data format to work with in deep learning. However, there are a lot of very good models that can process images, so we’ll turn the audio into images and use those models instead.

To transform sound into images, we computed a spectrogram for each sound file. If you’d like to see the Python code we used to do that, check out our Kaggle notebook here.

Otherwise, keep reading and we’ll show you how to import the precomputed spectrograms straight into your project, using our Data library.

Figure 1. Audio as a picture
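
To make the idea of "audio as a picture" concrete, here is a minimal sketch that turns a waveform into a spectrogram with plain NumPy. It is not the code from the notebook mentioned above: the function name power_spectrogram_db, the synthetic 440 Hz tone, and the frame and hop sizes are assumptions chosen for illustration, and the sketch leaves out the mel filter bank.

```python
import numpy as np

def power_spectrogram_db(waveform, frame_length=512, hop_length=256):
    """Return a (frequency bins, time frames) power spectrogram in decibels."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(waveform) - frame_length) // hop_length
    # Cut the waveform into overlapping, windowed frames.
    frames = np.stack([
        waveform[i * hop_length:i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # Power per frequency bin for each frame (rows: time, columns: frequency).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Convert to decibels (the small constant avoids log(0)) and
    # transpose so that frequency runs along the vertical axis.
    return (10.0 * np.log10(power + 1e-10)).T

# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
samplerate = 16000
t = np.arange(samplerate) / samplerate
tone = np.sin(2 * np.pi * 440.0 * t)

image = power_spectrogram_db(tone)
# The brightest row of the "picture" sits at the frequency bin closest to 440 Hz.
peak_bin = int(image.mean(axis=1).argmax())
peak_hz = peak_bin * samplerate / 512
print(image.shape, peak_hz)
```

Each column of the resulting array describes one short slice of time and each row one frequency band, which is why an image classifier can be applied to it.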

Create a new project

On the platform landing page, click New Project and name your project, e.g., Project v1.

Add dataset to the platform

You should be taken to the Datasets view, where you can import data.

Click the Data library button and look for the Freesound Audio Tagging - tutorial data dataset in the list. Click on it to get more information about the dataset.

If you agree with the license, click Accept and import.

Create a feature set for labels

Since you will be solving a multi-label classification problem, you will need to create a feature set that groups the 80 sound categories.

Click New feature and then select all features except fname. Instead of adding the features one-by-one, click the checkbox next to the search field and then deselect fname.

Name the feature set Labels and then click Create.

Figure 2. New feature set

Click Save version and then Use in new experiment.
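
For reference, a feature set like Labels corresponds to what is often called a multi-hot target: one 0/1 column per category, where several columns can be 1 for the same clip. The sketch below shows that encoding in plain Python. The function name multi_hot, the three category names, and the two example rows are made up for illustration; the tutorial dataset has 80 categories and already comes in this form.

```python
def multi_hot(labels, categories):
    """Encode a list of label names as a 0/1 vector with one slot per category."""
    index = {name: i for i, name in enumerate(categories)}
    vector = [0] * len(categories)
    for name in labels:
        vector[index[name]] = 1
    return vector

# Hypothetical categories and clips, for illustration only.
categories = ["Bark", "Meow", "Purr"]
rows = {"a.wav": ["Bark"], "b.wav": ["Meow", "Purr"]}

targets = {fname: multi_hot(labels, categories) for fname, labels in rows.items()}
for fname, vector in targets.items():
    print(fname, vector)
```

Because a clip can carry several labels at once, the rows do not have to sum to one, which is what separates multi-label from ordinary single-label classification.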

Create a deep learning experiment

Name the model in the Experiment wizard and make sure your dataset is selected.

Click Create blank experiment.

In this tutorial, we will treat the audio files as pictures, called spectrograms, and perform image classification on them.

Since we’re performing classification on sound data viewed as pictures, we can use well-performing convolutional neural networks such as ResNet, DenseNet, or Inception v4. For this tutorial you will use ResNetv2 large 50.

  1. In the Build tab expand the Blocks section.

  2. Add an Input and set Feature to fname.

  3. Add a Batch normalization block. Batch normalization is done to speed up training.

  4. Add a Reshape block and set Target shape to (256,256,1). This adds the channel axis required by ResNetv2 large 50.

  5. Expand Snippets in the Inspector and select ResNetv2 large 50. A dialog will appear where you can choose weight initialization. Select Random.

  6. Click the Input block above the added snippet and then press backspace to delete it.

  7. Connect the Reshape block to the first block in the ResNetv2 large 50 group. To connect the two blocks, you must first expand ResNetv2 large 50 by clicking + and then expand the ResNet layer 1 group.

  8. Change Activation in the Dense block after the Feature extraction group to ReLU. ReLU is a widely used default activation for hidden layers.

  9. Add another Dense block before the Target block:

    • Nodes: 80

    • Activation: Sigmoid

  10. Select the Target block and set Feature to Labels and Loss to Binary crossentropy.
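
The last two steps pair a sigmoid output with binary crossentropy because each of the 80 categories is treated as an independent yes/no decision. The sketch below works through that computation in plain Python with made-up scores and targets for three categories instead of 80. It illustrates the standard formulas only; how the platform implements them (for example, how it clips probabilities) is not covered in this tutorial and is an assumption here.

```python
import math

def sigmoid(x):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_crossentropy(targets, probabilities, eps=1e-7):
    """Mean binary crossentropy over all categories of one example."""
    total = 0.0
    for y, p in zip(targets, probabilities):
        # Clip to avoid log(0); eps is an illustrative choice.
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(targets)

# Hypothetical raw outputs of the last Dense block and the true labels.
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 1]

probabilities = [sigmoid(z) for z in logits]
loss = binary_crossentropy(targets, probabilities)
print(probabilities, loss)
```

Unlike softmax, the sigmoid probabilities are not forced to sum to one, so the model can assign a high score to several categories for the same clip.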

Run experiment to train the model

Go to the Settings tab on the right and set Batch size to 16 and Epochs to 30. Use default settings for the other parameters, e.g., Adam optimizer. A batch size of 16 will fit in the GPU memory, 30 epochs should be enough for the model to converge, and Adam is a good standard optimizer.

Click Run to start the training.

Analyze experiment

Go to the Evaluation view.

The metrics you should be interested in are Precision and Recall. There is a great Medium article if you want to learn more about them.
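
As a quick illustration of the two metrics, the sketch below computes precision and recall for multi-label predictions in plain Python. The 0.5 threshold, the micro-averaging over all label slots, and the example data are assumptions made for this sketch; the Evaluation view may aggregate the numbers differently.

```python
def precision_recall(y_true, y_score, threshold=0.5):
    """Micro-averaged precision and recall over all clips and categories."""
    tp = fp = fn = 0
    for true_row, score_row in zip(y_true, y_score):
        for truth, score in zip(true_row, score_row):
            predicted = score >= threshold
            if predicted and truth:
                tp += 1  # label predicted and actually present
            elif predicted and not truth:
                fp += 1  # label predicted but not present
            elif truth:
                fn += 1  # label present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical targets and model scores for two clips and three categories.
y_true = [[1, 0, 1], [0, 1, 0]]
y_score = [[0.9, 0.6, 0.2], [0.1, 0.8, 0.3]]

precision, recall = precision_recall(y_true, y_score)
print(precision, recall)
```

Precision answers "how many of the predicted labels were right?", while recall answers "how many of the true labels were found?".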

Download model

Your model is now trained and ready to be downloaded.

  1. Click the Experiment options menu next to your experiment.

  2. In the dropdown menu, click Download and the Download model popup will open. Click OK.

  3. After a short delay, the model starts downloading as an H5 file.

If you trained multiple models, download the one with the best precision.

The model is now ready to ship!

Getting started with a Kaggle account

If you don’t already have a Kaggle account, you can create one here. Just follow the instructions.

Once your account is ready, join the Freesound Audio Tagging 2019 competition. The deadline for the competition has passed, but you can still make a late submission by clicking the button in the top right corner.

Submitting predictions

  1. Click New Kernel. A popup window will appear and you will be prompted to select kernel type.

  2. Select Script and a new Kaggle kernel will open.

  3. Add your downloaded model H5 file as a dataset.

  4. Remove the default code in the kernel.

  5. The path in the sidebar may not reflect the actual path to the file. To find the correct path, add the following code:

!find ../input -name '*.h5'
  6. Click the Run button and the correct path will appear in the console. You may have to scroll down to see it.

  7. Copy and paste the following code into the kernel:

import numpy as np
import pandas as pd
import librosa as lr
import tensorflow as tf
from tqdm import tqdm

# Change this path to the one printed by the find command.
model = tf.keras.models.load_model('../input/freesound-audio-tagging-2019-model/resnet50.h5', compile=False)
# Change this path if the competition data is mounted elsewhere.
df = pd.read_csv('../input/freesound-audio-tagging-2019/sample_submission.csv', index_col='fname')

def preprocess(wavfile):
    # Load roughly 8 seconds of audio.
    samples = 512*256 - 1
    samplerate = 16000
    waveform = lr.load(wavfile, sr=samplerate, duration=samples/samplerate)[0]
    # Loop audio clips that are too short.
    if len(waveform) < samples:
        waveform = np.pad(waveform, (0, samples - len(waveform)), mode='wrap')

    # Convert audio to a log-mel spectrogram.
    spectrogram = lr.feature.melspectrogram(y=waveform, sr=samplerate, n_mels=256)
    spectrogram = lr.power_to_db(spectrogram)
    spectrogram = spectrogram.astype(np.float32)

    return spectrogram

for fname, scores in tqdm(df.iterrows(), total=len(df), desc='Predicting'):
    spectrogram = preprocess('../input/freesound-audio-tagging-2019/test/' + fname)
    scores = model.predict_on_batch(spectrogram[None, ...])[0]
    df.loc[fname] = scores

# Write the predictions to submission.csv, the file that is sent to the competition.
# (This line is added on the assumption that the script is meant to end by saving its results.)
df.to_csv('submission.csv')

  8. Replace the path in the model variable with the path that you copied in a previous step.

  9. To run the script and make sure that everything works, highlight all the code and click the Run button. You don’t have to wait for the script to finish all the predictions; you can click the Stop button at any time.

  10. Click Commit. The Kaggle kernel will check for errors and make predictions with your model. It then sends submission.csv to the competition.
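
Two details of the script are easy to miss: np.pad with mode='wrap' repeats a short clip from its beginning until it reaches the target length, and spectrogram[None, ...] adds a leading batch axis so that a single example can be passed to the model. The sketch below shows both with toy NumPy arrays; the clip values and the target length of 8 are illustrative and much smaller than the real audio.

```python
import numpy as np

# A toy "clip" that is shorter than the length we need.
clip = np.array([1.0, 2.0, 3.0])
target_length = 8

# mode='wrap' fills the padding by looping the clip from its start.
looped = np.pad(clip, (0, target_length - len(clip)), mode="wrap")
print(looped.tolist())

# A single spectrogram is 2-D; indexing with None adds a batch axis in front.
spectrogram = np.zeros((256, 256), dtype=np.float32)
batch = spectrogram[None, ...]
print(spectrogram.shape, batch.shape)
```

Looping keeps every input the same size without padding short clips with silence, and the extra axis turns one spectrogram into a batch of size one.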

Further work — Transfer learning

Even though the competition doesn’t allow for transfer learning, it can be useful to test the power of transfer learning on the platform.

Transfer learning takes advantage of image properties learned from huge datasets like ImageNet or CIFAR-100 and then fine-tunes the neural network for a specific image recognition problem. In our case, we would fine-tune it with images of audio files.

The Peltarion platform supports snippets with weights trained on ImageNet.

With transfer learning, you can often get better results from a smaller dataset, so it is worth trying.