Multi-label image classification / cheat sheet

Target audience: Data scientists and developers

Problem formulation

Use this cheat sheet if your input data consists of a set of images (.jpg or .png), where each image can be said to contain or not contain each of several attributes. This is called multi-label classification.

Example use cases

Please note that data sets, models and other content, including open source software, (collectively referred to as "Content") provided and/or suggested by Peltarion for use in the Platform, may be subject to separate third-party terms of use or license terms. You are solely responsible for complying with the applicable terms. Peltarion makes no representations or warranties about Content and specifically disclaims all responsibility for any liability, loss, or risk, which is incurred as a consequence, directly or indirectly, of the use or application of any of the Content.

[Image: smiling face]
Label: Smiling, Side part haircut

Images of faces annotated with information such as whether the subject is wearing eyeglasses, is smiling, or has a certain hairstyle. Here’s an example.

[Image: Lubeck river and church]
Label: Water, Church

Images of landscapes, annotated with information such as whether they contain water, people, mountains, etc.

[Image: 100 Days Of Sunshine]
Label: Happy, Dreamy

Predicting the moods of a song. (What? Yes, you can, by looking at spectrograms.) Check out this tutorial.

Data preparation

Data input requirements

Prepare a zip file containing all your images and a corresponding index.csv.
The Peltarion Platform supports .jpg and .png images.
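
As a rough illustration of the packaging step, the sketch below zips a folder of images together with its index.csv using only the Python standard library. The function name `build_zip`, the folder layout, and the choice to keep index.csv at the top level of the archive are assumptions made for this example, not requirements confirmed by the platform documentation.

```python
import zipfile
from pathlib import Path

# Image types named in this cheat sheet.
IMAGE_SUFFIXES = {".jpg", ".png"}


def build_zip(data_dir, zip_path):
    """Zip index.csv and all .jpg/.png files found under data_dir.

    Paths inside the archive are kept relative to data_dir (an assumption
    of this sketch). Returns the list of archive member names.
    """
    data_dir = Path(data_dir)
    if not (data_dir / "index.csv").is_file():
        raise FileNotFoundError("index.csv is missing from " + str(data_dir))

    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(data_dir.rglob("*")):
            if not path.is_file():
                continue
            is_index = path.name == "index.csv"
            is_image = path.suffix.lower() in IMAGE_SUFFIXES
            if is_index or is_image:
                arcname = path.relative_to(data_dir).as_posix()
                zf.write(path, arcname)
                names.append(arcname)
    return names
```

Calling `build_zip("my_dataset", "my_dataset.zip")` (hypothetical paths) would skip any file that is neither an image nor index.csv.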

[Image: example index.csv for multi-label data]

Structure of index.csv

You need to structure the data so that each of the classes/labels is represented with an integer in its column in the index.csv file. Each row contains a 1 if the label is present in the image and a 0 if it is not.
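
The 0/1 layout described above can be sketched with the standard csv module. The file names, the `write_index` helper, and the exact form of the values in the image column are hypothetical; only the column names (image, Label_1 to Label_4) are taken from this cheat sheet.

```python
import csv

# Column names as used in this cheat sheet.
LABELS = ["Label_1", "Label_2", "Label_3", "Label_4"]

# Hypothetical annotations: image file name -> labels present in that image.
EXAMPLE_ANNOTATIONS = {
    "img_001.jpg": {"Label_1", "Label_3"},
    "img_002.png": {"Label_2"},
}


def write_index(annotations, labels, out_path):
    """Write one row per image with a 1 (present) or 0 (absent) per label."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["image"] + list(labels))
        for filename, present in sorted(annotations.items()):
            unknown = set(present) - set(labels)
            if unknown:
                raise ValueError("unknown labels for %s: %s" % (filename, sorted(unknown)))
            row = [1 if label in present else 0 for label in labels]
            writer.writerow([filename] + row)

# Example call (writes a file, so it is left commented out):
# write_index(EXAMPLE_ANNOTATIONS, LABELS, "index.csv")
```

An image with no attributes at all simply gets a row of zeros, which is a valid multi-label target.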

image  Label_1  Label_2  Label_3  Label_4

Use same sized images

All of your images need to be the same size. If they are different sizes, you will need to resize them before creating the zip file.

Most of the best-performing deep learning models for images were constructed based on images sized 256x256 or 224x224. If your images are bigger than that, we recommend resizing your images to around this size to maximize the likelihood of getting good performance on your data.
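
One possible way to bring a folder of images to a common size before zipping is sketched below. It assumes the third-party Pillow library is installed; the `resize_all` helper and the 224x224 default are illustrative choices based on the sizes mentioned above. Note that a plain resize stretches images whose aspect ratio differs from the target, so cropping or padding first may be preferable for your data.

```python
from pathlib import Path

# Assumption: one of the two sizes mentioned in this cheat sheet.
TARGET_SIZE = (224, 224)


def resize_all(src_dir, dst_dir, size=TARGET_SIZE):
    """Resize every .jpg/.png in src_dir to `size` and save it in dst_dir.

    Returns the names of the files that were written.
    """
    # Imported here so the module loads even where Pillow is not installed.
    from PIL import Image  # third-party: pip install Pillow

    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in {".jpg", ".png"}:
            continue
        with Image.open(path) as img:
            # Convert to RGB so every output has the same number of channels,
            # then stretch to the target size (aspect ratio is not preserved).
            resized = img.convert("RGB").resize(size)
            resized.save(dst / path.name)
        written.append(path.name)
    return written
```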

Create a feature set in the Datasets view

Once you’ve uploaded the dataset to the platform, create a feature set with all the label columns. Use this feature set as the target in your deep learning model.


Snippets are pre-built neural network architectures available on the platform.

Image size                          Recommended snippet
Between 10x10 and 96x96 pixels      ResNetV2 Small
Between 96x96 and 320x320 pixels    ResNetV2 Large
Above 320x320 pixels                We recommend that you resize them to max 320x320.
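
The size-to-snippet recommendation can be written down as a small lookup, for example when scripting checks over several datasets. This is only a sketch: the function name is hypothetical, it judges an image by its longest side, and it assigns the shared 96x96 boundary to the smaller snippet, neither of which is specified in this cheat sheet.

```python
def recommend_snippet(width, height):
    """Map an image size in pixels to the recommendation in the table above."""
    if min(width, height) < 10:
        # The table starts at 10x10; smaller images are not covered.
        raise ValueError("images smaller than 10x10 are not covered")
    longest = max(width, height)
    if longest <= 96:  # assumption: exactly 96x96 counts as "small"
        return "ResNetV2 Small"
    if longest <= 320:
        return "ResNetV2 Large"
    return "Resize to at most 320x320 first"
```

For instance, `recommend_snippet(224, 224)` returns `"ResNetV2 Large"`.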

Try the smallest depth model first

Try the smallest depth model first, since it will be faster to train and may already be complex enough to model your data well.

If the results are not good enough, you can move toward increasingly deep models in later experiments.

The second number next to the snippet name represents the depth of the model; the deeper the model, the more complex it is.

Changes in the Modeling view

On the last Dense block in the snippet, change the number of nodes to the number of labels in your data, and set the Activation to Sigmoid. We choose sigmoid because it lets the model output a number between 0 and 1 for each label independently. This number can be read as the estimated probability that the corresponding attribute is present in the image.

Make sure that the loss function in the Target block is Binary crossentropy.
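
To make the pairing of sigmoid and binary crossentropy concrete, here is a small standard-library sketch. It is not the platform's implementation: the raw scores are made up, the 0.5 decision threshold is a common convention rather than something this cheat sheet prescribes, and the loss is averaged over labels as is commonly done.

```python
import math


def sigmoid(z):
    """Squash a raw score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary crossentropy over the labels of one image."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical raw outputs of the last Dense block for three labels.
raw_scores = [2.0, -1.0, 0.5]
probs = [sigmoid(z) for z in raw_scores]

# Each label is scored on its own, so the values need not sum to 1.
predicted = [1 if p >= 0.5 else 0 for p in probs]  # assumed 0.5 threshold

loss = binary_crossentropy([1, 0, 1], probs)
```

Because every label has its own sigmoid, several labels can be predicted as present at once, which a softmax output would not allow.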

[Image: last blocks of the model for multi-label image classification]