Participating in a Kaggle competition with zero code

Working with exported models

Getting started with Kaggle competitions can be very complicated without previous experience and in-depth knowledge of at least one of the common deep learning frameworks like TensorFlow or PyTorch. In this tutorial, we'll show how you can participate in a Kaggle competition without writing any model code.

Freesound Audio Tagging 2019

In this short tutorial, we will participate in the Freesound Audio Tagging 2019 Kaggle competition. The same task was also run as Task 2 of the DCASE2019 Challenge. To use the dataset tied to the competition, we encourage you to sign up on Kaggle, read through the competition rules, and accept them.

Freesound Audio Tagging 2019 is an update from the previous year's audio tagging competition held by Freesound (MTG — Universitat Pompeu Fabra) and Google’s Machine Perception. The 2019 version is a multi-label audio tagging task, with twice the number of audio categories. The annotations in the FSDKaggle2019 dataset were gathered through the Freesound Annotator.

The competition is based on papers by Eduardo Fonseca et al. on audio tagging and the Freesound datasets.

Download the dataset

A preprocessed version of the dataset, ready to use on the Peltarion Platform, is available here.

Dataset download page in Kaggle

Click dataset.zip to download the dataset.

The audio files have been converted into spectrograms and saved as NumPy files together with an index.csv file containing the corresponding classes per file.
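
If you want to peek at what the prepared dataset contains before uploading it, you can do so with NumPy and pandas. This is a minimal sketch; the exact file names and column layout are assumptions based on the description above:

    import numpy as np
    import pandas as pd

    index = pd.read_csv('index.csv')           # one row per clip: fname plus one binary column per class
    spectrogram = np.load(index['fname'][0])   # a log-mel spectrogram stored as a NumPy array
    print(spectrogram.shape, len(index.columns) - 1)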

Create a new project

On the platform landing page, click New Project and name your project, e.g., Project v1.

Add the dataset

Go to the Datasets view and click New dataset. Add the downloaded zip-file to the Upload files tab.

Click Next, name the dataset and click Done.

Create a feature set for labels

Since you will be solving a multi-label classification problem, you will need to create a feature to group the 80 sound categories.

Click New feature and then select all features except fname. Instead of adding the features one-by-one, click the checkbox next to the search field and then deselect fname.

Name the feature set Labels and then click Create.

New feature set

Click Save version.
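
For reference, the Labels feature set is simply a grouping of the 80 class columns into one multi-hot target vector per clip. A rough sketch of the same idea in pandas, assuming index.csv has one binary column per class as described above:

    import pandas as pd

    index = pd.read_csv('index.csv')
    label_columns = [c for c in index.columns if c != 'fname']   # the 80 class columns
    targets = index[label_columns].to_numpy()                    # shape: (number of clips, 80)
    print(targets[0])                                            # e.g. [0, 1, 0, ..., 0]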

Create a deep learning experiment

Go to the Modeling view and then click New experiment. Name the model and click Create.

In this tutorial, we will treat the audio files as images, called spectrograms, and perform image classification.

Audio as a picture

Since we’re performing classification on sound data viewed as images, we can use well-performing convolutional neural networks such as ResNet, DenseNet, or Inception v4. For this tutorial you will use ResNetv2 large 50; a rough Keras equivalent of the resulting block graph is sketched after the steps below.

  1. In the Build tab expand the Blocks section.
  2. Add an Input and set Feature to fname.
  3. Add a Batch normalization block. Batch normalization is done to speed up training.
  4. Add a Reshape block and set Target shape to (256, 256, 1). This adds the channel axis required by ResNetv2 large 50.
  5. Expand Snippets section in the Inspector and select ResNetv2 large 50.
  6. Click the Input block above the added snippet and then press backspace to delete it.
  7. Connect the Batch normalization block to the ResNetv2 large 50 group.
  8. Change Activation in the Dense block after the Feature extraction group to ReLU. ReLU is a good default activation for hidden layers.
  9. Add another Dense block before the Target block:

    Nodes: 80

    Activation: Sigmoid
  10. Select the Target block and set Feature to Labels and Loss to Binary crossentropy.
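
For the curious, here is a rough Keras equivalent of the block graph built above. It is only a sketch, not the platform's exported architecture; in particular, the node count of the first Dense block is an assumption:

    import tensorflow as tf

    inputs = tf.keras.Input(shape=(256, 256))                        # one spectrogram per fname
    x = tf.keras.layers.BatchNormalization()(inputs)                 # speeds up training
    x = tf.keras.layers.Reshape((256, 256, 1))(x)                    # add the channel axis ResNet expects
    resnet = tf.keras.applications.ResNet50V2(include_top=False, weights=None,
                                              input_shape=(256, 256, 1), pooling='avg')
    x = resnet(x)                                                    # feature extraction
    x = tf.keras.layers.Dense(512, activation='relu')(x)             # node count is an assumption
    outputs = tf.keras.layers.Dense(80, activation='sigmoid')(x)     # one independent score per class
    model = tf.keras.Model(inputs, outputs)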

Run experiment to train the model

  1. Go to the Settings tab on the right and set Batch size to 16 and Epochs to 30. Use default settings for the other parameters, e.g., the Adam optimizer. A batch size of 16 will fit in the GPU memory, 30 epochs should be enough for the model to converge, and Adam is a good standard optimizer. The equivalent Keras training call is sketched after this list.
  2. Click Run to start the training.
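
Continuing the Keras sketch above, the equivalent training call would look roughly like this. The platform handles all of this for you; train_specs and train_labels are hypothetical arrays built from the dataset:

    # Binary cross-entropy fits the multi-label target; precision and recall are tracked for the evaluation step.
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss='binary_crossentropy',
                  metrics=[tf.keras.metrics.Precision(), tf.keras.metrics.Recall()])
    model.fit(train_specs, train_labels, batch_size=16, epochs=30)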

Analyse experiment

Go to the Evaluation view.

Evaluation view

The metrics you should be interested in are Precision and Recall. There is a great Medium article if you want to learn more about them.
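
In short, precision is the fraction of predicted labels that are actually correct, TP / (TP + FP), while recall is the fraction of true labels that the model managed to find, TP / (TP + FN). A small sketch with scikit-learn, assuming a 0.5 threshold on the sigmoid scores:

    import numpy as np
    from sklearn.metrics import precision_score, recall_score

    y_true = np.array([[1, 0, 1],                      # true multi-hot labels for two clips
                       [0, 1, 0]])
    scores = np.array([[0.9, 0.2, 0.4],                # sigmoid scores from the model
                       [0.1, 0.8, 0.7]])
    y_pred = (scores > 0.5).astype(int)

    print(precision_score(y_true, y_pred, average='micro'))   # 0.67: 2 of 3 predicted labels are correct
    print(recall_score(y_true, y_pred, average='micro'))      # 0.67: 2 of 3 true labels were found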

Download model

Your model is now trained and ready to be downloaded.

  1. Click the Experiment options menu next to your experiment.
  2. In the dropdown menu, click Download and the Download model popup will open. Click OK.
  3. The download of the model as an H5 file starts with a slight delay.

If you trained multiple models, download the one with the best precision.

The model is now ready to ship!

Getting started with a Kaggle account

If you don’t already have a Kaggle account, you can create one here. Just follow the instructions.

Once your account is ready, join the Freesound Audio Tagging 2019 competition. The deadline for the competition has passed, but you can still send a late submission by clicking the button in the top right corner.

Submitting predictions

  1. Click New Kernel. A popup window will appear and you will be prompted to select a kernel type.
  2. Select Script and a new Kaggle kernel will open.
  3. Add your downloaded model H5 file as a dataset.
  4. Remove the default code in the kernel.
  5. The path in the sidebar may not reflect the actual path to the file. To find the correct path add the following code:
    !find ../input -name '*.h5'
  6. Click the Run button and the correct path will appear in the console. You may have to scroll down to see it.
  7. Copy-paste the following code into the kernel:

    import numpy as np
    import pandas as pd
    import librosa as lr
    import tensorflow as tf
    from tqdm import tqdm

    # Load the exported model and the sample submission file.
    model = tf.keras.models.load_model('../input/freesound-audio-tagging-2019-model/resnet50.h5', compile=False) ##Change
    df = pd.read_csv('../input/freesound-audio-tagging-2019/sample_submission.csv', index_col='fname') ##Change


    def preprocess(wavfile):
        # Load roughly 8 seconds of audio.
        samples = 512 * 256 - 1
        samplerate = 16000
        waveform = lr.load(wavfile, sr=samplerate, duration=samples / samplerate)[0]

        # Loop too-short audio clips.
        if len(waveform) < samples:
            waveform = np.pad(waveform, (0, samples - len(waveform)), mode='wrap')

        # Convert audio to a log-mel spectrogram.
        spectrogram = lr.feature.melspectrogram(y=waveform, sr=samplerate, n_mels=256)
        spectrogram = lr.power_to_db(spectrogram)
        spectrogram = spectrogram.astype(np.float32)

        return spectrogram


    # Predict scores for every clip in the test set and write the submission file.
    for fname, scores in tqdm(df.iterrows(), total=len(df), desc='Predicting'):
        spectrogram = preprocess('../input/freesound-audio-tagging-2019/test/' + fname)
        scores = model.predict_on_batch(spectrogram[None, ...])[0]
        df.loc[fname] = scores

    df.to_csv('submission.csv')
  8. Replace the path in the model variable with the path that you copied in the previous step.
  9. To run the script and make sure that everything works, highlight all the code and click the Run button. You don't have to wait for the script to finish all the predictions; you may click the Stop button at any time.
  10. Click Commit. The Kaggle kernel will check for errors and make predictions with your model. It then sends submission.csv to the competition.

Further work — Transfer learning

Even though the competition doesn’t allow transfer learning, it can be useful to test its power on the platform. Transfer learning takes advantage of image properties learned from huge datasets like ImageNet or CIFAR-100 and then fine-tunes the neural network for a specific image recognition problem. In our case, we would fine-tune it on spectrogram images of the audio files.

The Peltarion platform supports VGG16 with weights trained on ImageNet. 

With transfer learning, you can often get much better results from a smaller dataset.
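
Outside the platform, the transfer-learning idea looks roughly like this in Keras. This is only a sketch of the concept, not how the platform implements it; the single-channel spectrograms would also need to be resized and repeated to three channels to match the ImageNet input:

    import tensorflow as tf

    # Reuse VGG16 ImageNet weights as a frozen feature extractor and train a new head.
    base = tf.keras.applications.VGG16(weights='imagenet', include_top=False,
                                       input_shape=(224, 224, 3), pooling='avg')
    base.trainable = False                                          # freeze the pre-trained weights

    inputs = tf.keras.Input(shape=(224, 224, 3))
    x = base(inputs, training=False)
    outputs = tf.keras.layers.Dense(80, activation='sigmoid')(x)    # new task-specific head
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer='adam', loss='binary_crossentropy')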
