Tutorial - Classifying images of clothes

How to solve a classification problem

This tutorial will show you how to build an image classifier, taking you through the typical building blocks of a convolutional neural network for a dataset of 28 x 28 pixel images with 10 classes: taking in the input dataset, establishing the convolution layers, pooling/downsampling, flattening, and classifying.

Target audience: Data scientists and developers

Preread: We suggest that you start your deep dive into the Peltarion Platform with the Deploy an operational AI model tutorial. The link below is suggested preread material for those unfamiliar with CNNs and how they work; it does not endeavor to be an exhaustive learning reference. A Beginner's Guide To Understanding Convolutional Neural Networks

The problem

McKinsey Global Institute identifies the retail industry as the sector likely to create the most annual value from AI techniques (a potential $600bn). They stated in 2017: “AI technologies could eliminate many levels of manual activities in areas such as promotions, assortments, and supply chain. AI will enable retailers to increase both the number of customers and the average amount they spend by creating personal and convenient shopping experiences.”

Deep learning image classification has many applications in the retail industry and will drive much of this predicted value: for example, reducing errors in supply chain management (accurate inventory/catalog management by automatically identifying items from photos), or serving as a component in visual search over user-generated content (a customer uses a photo taken on their mobile device to locate or search for similar items in a shopping catalog).

Using a convolutional neural network

Image classification in deep learning is often implemented with a technique called a convolutional neural network (CNN). In this tutorial, we will build a CNN and train it on thousands of images of fashion items (e.g., clothing, accessories, shoes), producing a model that can predict, for a given image, what type of fashion item it shows.

Although the CNN model in this case will identify fashion items, it can be trained on any classes you require. For example, in healthcare, this technique could take a brain scan as input and predict whether it contains a tumor, or it could be used to classify a personal photo library, much as Apple and Google do with their photo applications.

The data

You will use the Fashion MNIST dataset, an image classification dataset that consists of small, 28 x 28 pixel images of clothing and accessories such as shirts, bags, shoes, and other fashion items. Each image is annotated with a label indicating the correct garment. The images come from Zalando and are split into a training set of 60,000 examples and a test set of 10,000 examples.
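
If you want to poke at the data outside the platform, Fashion MNIST also ships with common deep learning libraries. A minimal sketch, assuming TensorFlow/Keras is installed (this is optional and not part of the platform workflow):

```python
# Optional: load Fashion MNIST locally with Keras to inspect it.
from tensorflow import keras

# 60,000 training and 10,000 test images, each 28 x 28 grayscale,
# with integer labels 0-9 (t-shirt, trouser, pullover, ...).
(train_images, train_labels), (test_images, test_labels) = (
    keras.datasets.fashion_mnist.load_data()
)

print(train_images.shape)  # (60000, 28, 28)
print(test_images.shape)   # (10000, 28, 28)
```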

Goals with the experiment

In this tutorial, it’s validation accuracy that matters. Accuracy is how often you predict the right answer; the formula is:

accuracy = number of correct predictions / total number of predictions

For example, if the number of test samples is 1000 and the model classifies 952 of those correctly, then the model's accuracy is 95.2%.
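
In code, that calculation is just a ratio of correct predictions to total predictions; a minimal Python sketch:

```python
# Accuracy = correct predictions / total predictions.
def accuracy(predictions, labels):
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# 952 correct out of 1000 test samples gives 0.952, i.e., 95.2%.
```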

We’ll set our goal to 0.9, i.e., a correct prediction 9 times out of 10. Note that the “world record” is 0.967. For more benchmarks, check Zalando’s collected benchmark classifiers here: Fashion MNIST benchmark.

Create project

First, create a project and name it so you know what kind of project it is. Naming is important.

A project combines all of the steps in solving a problem, from pre-processing of datasets to model building, evaluation and deployment.

Add and manage dataset

  1. Navigate to the Datasets view and click New dataset.
  2. Click on the Import files tab.
  3. Copy and paste the link below and click Import.

    The zip includes the whole Fashion MNIST dataset.

    Link to zip: https://storage.googleapis.com/bucket-8732/fashion.zip
  4. When done, click Next, name the dataset FashionMNIST, and click Done.
The Dataset view

Subsets of the Fashion MNIST dataset

Keep the default subsets

As you can see in the Inspector on the right, the dataset is by default split into two subsets: 80% for training and 20% for validation. That means we'll use 80% of the set to train our model and the remaining 20% as 'not seen before' examples to check how well the training is progressing. The percentage of correctly predicted labels on the validation set is the accuracy number we're after. Use the default subsets for this tutorial.
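
The platform handles this split for you; conceptually it is just a random 80/20 partition of the examples. A minimal Python sketch of the idea:

```python
# A random 80/20 train/validation split (conceptual sketch).
import random

def split_dataset(examples, train_fraction=0.8, seed=42):
    examples = list(examples)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(examples)   # fixed seed keeps the split reproducible
    cut = int(len(examples) * train_fraction)
    return examples[:cut], examples[cut:]   # (training set, validation set)
```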

Save the dataset

You’ve now created a dataset ready to be used in the platform. Click Save version. Saving this version will lock and version it and allow you to build a model with it. Navigate to the Modeling view.

Note!
Do not forget to save this version of the dataset. You won’t be able to use it in the Modeling view otherwise.

Design a convolutional neural network (CNN) experiment for the Fashion MNIST dataset

The model

Time to create an experiment in the Modeling view. An experiment contains all the information needed to reproduce it:

  • The dataset
  • The AI model
  • The settings or parameters used to run the experiment

The result from this experiment is a trained AI model that can be evaluated and deployed.

  1. Click New experiment. Name the experiment and click Create.

    In the Inspector, click on the Settings tab and make sure that the dataset you just created is selected in the Dataset wrapper field found in the Dataset settings section.
  2. In the Inspector, click on the Blocks tab.
  3. Click on Snippets to expand the section and click on CNN. This will add a complete convolutional neural network (CNN) to the experiment.

    A CNN is often used when you want to solve an image classification problem. The network looks for low-level features such as edges and curves and builds up to more abstract concepts through a series of convolutional layers.

    The CNN snippet consists of the following types of blocks (a rough code equivalent is sketched after these steps):

    2D Convolution. This block is used to detect spatial features in an image.

    2D Max pooling. This layer reduces the size of the data. You can say that 2D max pooling is similar to scaling down the size of an image.

    Batch normalization. This normalizes all input features to a similar range of values which will speed up learning.

    2D Global average pooling. This block is used as an alternative to the Flatten block as it reduces the tensor of the last convolution layer from HxWx128 to a tensor of size 1x1x128.

    Concatenate. This block concatenates a list of inputs.

    Dense. This is a densely connected neural network layer.

    After this step, an information popup dialog should have appeared showing you what needs to be adjusted before you can run the model. Let's do just that.
  4. Select the Input block in the Modeling canvas. In the Inspector set the Feature to image.
  5. Next, select the Target block and set the Feature to category.

    Keep the default value for Loss, Categorical crossentropy. This loss function computes a score that the model uses to decide which garment is depicted in the image. If there were only 2 classes in our data (t-shirts and shoes), you could choose binary crossentropy. We have 10 classes, thus categorical crossentropy.
  6. Select the last Dense block in the model. This is a densely connected neural network layer.

    Keep the default value for Nodes, 10. This is the number of labels.

    Keep the default value for Activation, Softmax. This activation function is often used together with categorical crossentropy. The softmax function highlights the largest values and suppresses low values, which in effect allows only 1 of the 10 nodes of the dense layer to put its hand up. There are times when you don't want to squash everything in favor of one class (like saying this t-shirt is both red and has a logo), but in this experiment we're trying to say “this is a t-shirt and not a coat or a bag or trousers or...”.
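
The platform assembles this network for you, but if you are curious what a comparable model looks like in code, here is a rough Keras sketch. The layer sizes are illustrative assumptions rather than the snippet's exact values, and the snippet's Concatenate block (which joins parallel paths) is omitted for simplicity:

```python
# Rough sketch of a comparable CNN in Keras (layer sizes are assumptions).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),          # Input block: 28 x 28 grayscale image
    layers.Conv2D(64, 3, activation="relu"),  # 2D Convolution: detect spatial features
    layers.BatchNormalization(),              # Batch normalization: speed up learning
    layers.MaxPooling2D(2),                   # 2D Max pooling: scale the data down
    layers.Conv2D(128, 3, activation="relu"),
    layers.BatchNormalization(),
    layers.GlobalAveragePooling2D(),          # reduce HxWx128 to a 128-long vector
    layers.Dense(10, activation="softmax"),   # Dense + Softmax: one score per class
])

# Categorical crossentropy matches the Target block's loss setting.
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```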

Run experiment to train it

The experiment is now ready to be trained. Navigate to the Settings tab in the Inspector. In the Run settings section, change Batch size to 256 and keep the default values for the rest. For your info (see the sketch after this list for how batch size and epochs interact):

  • Batch size is how many rows (examples) are computed at the same time.
  • One Epoch is when the complete dataset has run through the model one time. That means that if you set Epochs to 100, the complete dataset runs through the model 100 times.
  • Data access seed is the seed for the random number generator used when accessing the data, which makes runs reproducible.
  • The Optimizer is how the system optimizes the loss with respect to the weights of the network.
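
For example, with the default 80/20 split there are 48,000 training images, so a batch size of 256 means each epoch runs 188 batches; a quick sketch of that arithmetic:

```python
# How many batches (steps) one epoch takes with these settings.
training_examples = 48_000   # 80% of the 60,000-image training set
batch_size = 256

steps_per_epoch = -(-training_examples // batch_size)  # ceiling division
print(steps_per_epoch)  # 188
```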

Done! Click Run in the top right corner.

Analyze experiment

Navigate to the Evaluation view and select your experiment in the left side Experiment section. Watch it train epoch by epoch.

Accuracy graph

Select the Accuracy graph and you’ll notice that after epoch 5 the training and validation curves diverge: the experiment starts to overfit. That means the model is memorizing the training images rather than learning, in more general terms, what the shapes and shadows may depict.

At the best epoch, number 5, we have an accuracy of 0.91. Does that beat the goal you set at the beginning of this tutorial? Yes, it does!
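
In frameworks where you control the training loop yourself, this "stop at the best epoch" idea is often automated with early stopping, which rolls the model back to the epoch where validation performance peaked. A minimal Keras sketch (illustrative only; not a step in this tutorial):

```python
# Stop training when validation accuracy stops improving.
from tensorflow import keras

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy",      # watch the validation accuracy curve
    patience=3,                  # tolerate 3 epochs without improvement
    restore_best_weights=True,   # roll back to the best epoch
)
# model.fit(..., validation_data=..., callbacks=[early_stop])
```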

Loss graph

If you click Categorical crossentropy you’ll see the Loss graph. Loss indicates the magnitude of error your model makes on its predictions. It’s a method of evaluating how well your algorithm models your dataset. If your predictions are totally off, the loss function will output a higher number; if they’re pretty good, it’ll output a lower one.
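
To make that concrete, categorical crossentropy is the negative log of the probability the model assigned to the true class, so a confident correct prediction scores a much lower loss than a confident wrong one. A small NumPy sketch:

```python
# Categorical crossentropy = -log(probability assigned to the true class).
import numpy as np

true_class = 0                        # say, "t-shirt"
good = np.array([0.9, 0.05, 0.05])   # confident and correct
bad = np.array([0.2, 0.7, 0.1])      # confident and wrong

print(-np.log(good[true_class]))  # ~0.105 (low loss)
print(-np.log(bad[true_class]))   # ~1.609 (high loss)
```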

Confusion matrix

The Confusion matrix shows how well the model classifies each class. In a perfect classification, you'll have 100% on the diagonal going from top left to bottom right.
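
If you export the true and predicted labels, you can reproduce the same matrix yourself; a minimal sketch with scikit-learn (the labels below are made-up examples):

```python
# Build a confusion matrix from true vs. predicted labels.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 2, 2, 2]   # true classes (e.g., 0 = t-shirt, 1 = trouser, ...)
y_pred = [0, 1, 1, 2, 2, 0]   # model predictions

print(confusion_matrix(y_true, y_pred))
# [[1 1 0]
#  [0 1 0]
#  [1 0 2]]  <- a perfect classifier puts everything on the diagonal
```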

Improve experiment

OK, let’s see if you can improve the model and get a higher accuracy. You've seen that after epoch 5 the training and validation graphs diverge; the model starts to overfit quite a lot. The first thing to try is some regularization, like increasing the drop rate of the Dropout blocks. By default it’s set to 0.1, meaning 10% of the input units are dropped, but you can experiment with higher values (see the sketch below). If the training loss and validation loss become more similar, the experiment is not so obviously overfitting.
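
In code terms, that change amounts to raising the rate on a dropout layer; a minimal Keras sketch (0.3 is just an example value to try):

```python
# Raising the dropout rate to regularize the network.
from tensorflow.keras import layers

block = [
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),   # tutorial default is 0.1; higher rates drop more units
    layers.Dense(10, activation="softmax"),
]
```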

Other ways to improve the model are to add blocks and change settings in the model. As long as you keep the same loss function, you can compare the results of the experiments in the Evaluation view and see which one is best.

Examples of what you can change:

  • Add more dense blocks
  • Increase or decrease the number of nodes in the dense layers
  • Change the width and height of the convolutional filters
  • Add another CNN snippet
  • Change batch size
  • etc. etc.

Tutorial recap

This tutorial has shown you how quickly and easily you can create and test experiments on the Peltarion Platform. You've built a basic CNN model to solve a classification problem and acquired some basic deep learning knowledge.

Next steps / Read more

In this tutorial, only one label can be correct, but what if an object could have many labels, e.g., a shirt labeled both “red” and “short sleeve”? How do you solve such a problem? Check out our tutorial Predicting mood from raw audio data and learn how to solve a multi-label classification problem.
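
As a preview, the standard recipe for multi-label classification replaces the softmax output with one sigmoid per label and uses binary crossentropy, so each label is scored independently. A minimal Keras sketch (the label count is a made-up example):

```python
# A multi-label output head: each label gets an independent yes/no score.
from tensorflow.keras import layers

num_labels = 5  # e.g., "red", "short sleeve", ... (illustrative)
head = layers.Dense(num_labels, activation="sigmoid")  # not softmax: labels don't compete

# model.compile(loss="binary_crossentropy", ...)  # one binary decision per label
```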

Try the platform