Self-sorting wardrobe
How to solve a classification problem
This tutorial shows you how to build an image classifier. It takes you through the typical building blocks of a convolutional neural network (CNN) for a dataset of 28 x 28 pixel images in 10 classes: the input dataset, convolution layers, pooling/downsampling, flattening, and classification.
Target audience: Beginners
Pre-read: We suggest that you start your deep dive into the Peltarion Platform with the Deploy an operational AI model tutorial. For those unfamiliar with CNNs and how they work, the article linked below is suggested pre-reading. It is not meant to be an exhaustive learning reference.
A Beginner’s Guide To Understanding Convolutional Neural Networks.
McKinsey Global Institute identifies the retail industry as the sector likely to create the most annual value from AI techniques (a potential $600B). In 2017 they stated: “AI technologies could eliminate many levels of manual activities in areas such as promotions, assortments, and supply chain. AI will enable retailers to increase both the number of customers and the average amount they spend by creating personal and convenient shopping experiences.”
Deep learning image classification has many applications in the retail industry and will drive much of this predicted value. Examples include reducing errors in supply chain management (accurate inventory and catalog management by automatically identifying items from photos) and visual search on user-generated content (a customer takes a photo with their mobile device and uses it to find the same or similar items in a shopping catalog).
Using a convolutional neural network
Image classification in deep learning is often implemented with a convolutional neural network (CNN). In this tutorial, you will build a CNN and train it on thousands of images of fashion items (e.g., clothing, accessories, shoes). The result is a model that predicts what type of fashion item a given image shows.
Although the CNN model in this case identifies fashion items, the same approach can be trained on whatever classes you need. For example, in healthcare it could take a brain scan as input and predict whether it contains a tumor, or it could be used to classify a personal photo library, much as Apple and Google do in their photo applications.
You will use the Fashion MNIST dataset, an image classification dataset of small 28 x 28 pixel grayscale images of clothing and accessories such as shirts, bags, shoes, and other fashion items. Each image is annotated with a label indicating the correct garment. The images come from Zalando, and the dataset consists of a training set of 60,000 examples and a test set of 10,000 examples.
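To make the 10 classes concrete, here is a small sketch of the label mapping. The label order below is the one published with the original Fashion MNIST dataset; how the platform's category feature stores these labels (as integers or as names) is an assumption we do not rely on. `CLASS_NAMES` and `decode_label` are hypothetical helpers, not part of the platform.

```python
# Fashion MNIST integer labels 0-9 and the garment each one denotes,
# in the order published with the original dataset.
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

IMAGE_SHAPE = (28, 28)  # height x width, a single grayscale channel
NUM_CLASSES = len(CLASS_NAMES)


def decode_label(label: int) -> str:
    """Map an integer class label to its garment name (hypothetical helper)."""
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label must be in [0, {NUM_CLASSES}), got {label}")
    return CLASS_NAMES[label]


# Outside the platform, Keras can download the same data (requires TensorFlow):
# import tensorflow as tf
# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
# x_train.shape == (60000, 28, 28) and x_test.shape == (10000, 28, 28)

print(decode_label(9))  # Ankle boot
```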
Goals with the experiment
In this tutorial, validation accuracy is what matters. Accuracy is the fraction of predictions the model gets right:
$$\text{Accuracy} = \frac{\text{number of correct predictions}}{\text{total number of predictions}}$$
For example, if the number of test samples is 1,000 and the model classifies 952 of those correctly, then the model’s accuracy is 95.2%.
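The formula translates directly into code. This is a minimal sketch with a hypothetical `accuracy` function (not a platform API), reproducing the 952-out-of-1,000 example from the text.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# The worked example from the text: 952 of 1,000 samples classified correctly.
labels = [0] * 1000
predictions = [0] * 952 + [1] * 48
print(accuracy(predictions, labels))  # 0.952
```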
We’ll set our goal to 0.9, i.e., correct predictions 9 times out of 10. Note that the “world record” is 0.967. For more benchmarks, see Zalando’s collection of benchmark classifiers: Fashion MNIST benchmark.
First, create a project and name it, so you know what kind of project it is. Naming is important.
A project combines all of the steps in solving a problem, from the preprocessing of datasets to model building, evaluation and deployment.
Add and manage the dataset
After creating the project, you will be taken to the Datasets view, where you can import data.
Click the Data library button and look for the Fashion MNIST - tutorial data dataset in the list. Click on it to get more information.
If you agree with the license, click Accept and import. This imports the dataset into your project and takes you to the dataset’s details, where you can edit features and subsets.
Subsets of the Fashion MNIST dataset
Keep the default subsets
As you can see in the Inspector on the left, the dataset is split by default into two subsets: 80% for training and 20% for validation. This means we’ll use 80% of the data to train the model and keep the remaining 20% as “not seen before” examples to check how well training is progressing. The percentage of correctly predicted labels on the validation subset is the accuracy number we’re after. Use the default subsets for this tutorial.
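Conceptually, an 80/20 split shuffles the examples and cuts the list in two. The sketch below assumes 60,000 examples purely for illustration (the size of the original training set) and a seeded NumPy shuffle; the platform's own splitting procedure may differ. `split_indices` is a hypothetical helper.

```python
import numpy as np


def split_indices(n_examples, train_fraction=0.8, seed=0):
    """Shuffle example indices, then split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_examples)
    n_train = int(round(n_examples * train_fraction))
    return shuffled[:n_train], shuffled[n_train:]


train_idx, val_idx = split_indices(60_000)
print(len(train_idx), len(val_idx))  # 48000 12000
```

Because the validation indices never appear in the training set, validation accuracy measures performance on images the model has not been trained on.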
Save the dataset
You’ve now created a dataset ready to be used in the platform. Click Save version. Saving locks and versions the dataset so that you can build a model with it. Then click Use in new experiment.
Design a CNN experiment for the Fashion MNIST dataset
It’s time to create an experiment in the Modeling view. An experiment contains all the information needed to reproduce it:
The AI model
The settings or parameters used to run the experiment.
The result from this experiment is a trained AI model that can be evaluated and deployed.
Make sure that the FashionMNIST dataset is selected in the Experiment wizard.
Click on the Choose snippet tab.
Select CNN and click Create. This will add a complete convolutional neural network (CNN) to the experiment. A CNN is often used when you want to solve an image classification problem. This network looks for low-level features such as edges and curves and then builds up to more abstract concepts through a series of convolutional layers. The CNN snippet consists of the following types of blocks:
2D Convolution. This block is used to detect spatial features in an image.
2D Max pooling. This layer reduces the size of the data. You can say that 2D max pooling is similar to scaling down the size of an image.
Batch normalization. This block normalizes the values coming from the previous block to a similar range, which speeds up learning.
2D Global average pooling. This block is used as an alternative to the Flatten block: it averages each feature map of the last convolution layer, reducing the tensor from HxWx128 to a tensor of size 1x1x128.
Dense. This is a densely (fully) connected neural network layer.
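To see how the blocks above change the shape of the data, here is a minimal NumPy sketch of a forward pass. It is not the platform's snippet: the filter sizes (3 x 3), filter counts (32, then 128), ReLU activations, and random weights are all assumptions chosen for illustration; only the final 128 channels and the 10 output nodes come from the text. Batch normalization is shown without its learned scale and shift.

```python
import numpy as np


def relu(x):
    # Assumed activation for the convolution blocks (not stated in the text).
    return np.maximum(x, 0.0)


def conv2d(x, kernels):
    """2D convolution (cross-correlation, as in DL frameworks), stride 1, no padding.
    x: (H, W, C_in); kernels: (kh, kw, C_in, C_out) -> (H-kh+1, W-kw+1, C_out)."""
    kh, kw, _, c_out = kernels.shape
    h, w, _ = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1, c_out))
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = x[i:i + kh, j:j + kw, :]
            out[i, j] = np.tensordot(patch, kernels, axes=([0, 1, 2], [0, 1, 2]))
    return out


def max_pool2d(x, size=2):
    """Keep the largest value in each size x size window: (H, W, C) -> (H//size, W//size, C)."""
    h, w, c = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size, :]
    return x.reshape(h2, size, w2, size, c).max(axis=(1, 3))


def batch_norm(x, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over batch and spatial axes.
    x: (N, H, W, C). The learned scale and shift of a real layer are omitted."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def global_avg_pool(x):
    """Average each feature map: (N, H, W, C) -> (N, C)."""
    return x.mean(axis=(1, 2))


def dense(x, weights, bias):
    """Fully connected layer: every input is connected to every output node."""
    return x @ weights + bias


rng = np.random.default_rng(0)
images = rng.random((2, 28, 28, 1))                # hypothetical batch of 2 grayscale images
k1 = rng.normal(scale=0.1, size=(3, 3, 1, 32))     # assumed: 32 filters of size 3 x 3
k2 = rng.normal(scale=0.1, size=(3, 3, 32, 128))   # assumed: 128 filters of size 3 x 3
w_out = rng.normal(scale=0.1, size=(128, 10))
b_out = np.zeros(10)

x = np.stack([relu(conv2d(img, k1)) for img in images])  # (2, 26, 26, 32)
x = batch_norm(x)                                        # (2, 26, 26, 32)
x = np.stack([max_pool2d(fm) for fm in x])               # (2, 13, 13, 32)
x = np.stack([relu(conv2d(fm, k2)) for fm in x])         # (2, 11, 11, 128)
x = global_avg_pool(x)                                   # (2, 128)
logits = dense(x, w_out, b_out)                          # (2, 10): one score per class
print(logits.shape)  # (2, 10)
```

Global average pooling collapses each 11 x 11 feature map to a single number, so the Dense block receives 128 values per image regardless of the spatial size.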
The experiment is now complete and ready to be trained. The platform has pre-populated all settings, for example:
The input feature is set to image in the Input block.
The target feature is set to category in the Target block. The Loss is set to Categorical crossentropy. This loss function computes a score that measures how far the model’s predicted probabilities are from the garment actually depicted in the image; training adjusts the model to make this score smaller. If there were only 2 classes in our data (for example, t-shirts and shoes), you could choose binary crossentropy. We have 10 classes, so we use categorical crossentropy.
In the last Dense block in the model, the Nodes value is set to 10, the number of classes (labels).
The Activation is set to Softmax. This activation function is often used together with categorical crossentropy. Softmax turns the 10 outputs into probabilities that sum to 1, amplifying the largest values and suppressing the small ones. In effect, it lets only 1 of the 10 nodes of the Dense layer put its hand up. There are times when you don’t want to squash everything in favor of one class (for example, to say that a t-shirt is both red and has a logo), but in this experiment we’re trying to say “this is a t-shirt and not a coat or a bag or trousers or…”.
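Softmax and categorical crossentropy work as a pair. The NumPy sketch below shows the standard definitions; the function names and the example scores are hypothetical, and the platform's internal implementation may differ in numerical details such as the clipping constant.

```python
import numpy as np


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def one_hot(labels, num_classes=10):
    """Encode integer class labels as one-hot rows, e.g. 2 -> [0, 0, 1, 0, ..., 0]."""
    return np.eye(num_classes)[np.asarray(labels)]


def categorical_crossentropy(y_true, y_prob, eps=1e-12):
    """Mean over examples of -sum_k y_true[k] * log(y_prob[k]).
    With one-hot targets this is -log(probability given to the correct class)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(-(y_true * np.log(y_prob)).sum(axis=-1).mean())


# Hypothetical scores from the last Dense block (10 nodes) for one image.
logits = np.array([[4.0, 1.0, 0.5, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]])
probs = softmax(logits)
print(probs.argmax())                                 # 0: class 0 gets the largest share
print(categorical_crossentropy(one_hot([0]), probs))  # lower loss: confident and correct
print(categorical_crossentropy(one_hot([6]), probs))  # higher loss: true class got less probability
```

For two classes, binary crossentropy with a single sigmoid output plays the same role.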
Run experiment to train model
Your model is now ready to be trained. Navigate to the Settings tab in the Inspector. In the Run settings section, change Batch size to 256 and keep the default values for the rest. For reference:
Batch size is the number of rows (examples) processed together in one training step.
One epoch is one complete pass of the dataset through the model. If you set Epochs to 10, the complete dataset runs through the model 10 times.
Data access seed is a random seed used when the platform reads the data. Keeping it fixed makes runs easier to reproduce.
The Optimizer is the algorithm that updates the network’s weights to minimize the loss.
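Batch size and epochs determine how many weight updates training performs. This sketch assumes 48,000 training examples (80% of 60,000, for illustration only) and assumes the seed controls the order in which examples are shuffled into batches; `steps_per_epoch` and `iterate_minibatches` are hypothetical helpers, not platform settings.

```python
import math

import numpy as np


def steps_per_epoch(n_examples, batch_size):
    """Number of batches needed to pass every example through the model once."""
    return math.ceil(n_examples / batch_size)


def iterate_minibatches(n_examples, batch_size, seed):
    """Yield index batches covering the dataset exactly once (one epoch), in a seeded random order."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_examples)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]


n_train = 48_000  # assumed: 80% of 60,000 examples
print(steps_per_epoch(n_train, 256))       # 188 (the last batch holds 128 examples)
print(10 * steps_per_epoch(n_train, 256))  # 1880 weight updates over 10 epochs
```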
Done! Click Run in the top right corner.
Navigate to the Evaluation view and select your experiment in the Experiment section on the left. Watch it train epoch by epoch.
Select the Accuracy graph. You’ll notice that after epoch 5 the training and validation graphs diverge: the experiment starts to overfit. This means that the model is memorizing the training images rather than learning, in more general terms, what the shapes and shadows depict.
At the best epoch, number 5, the validation accuracy is 0.91. Does that beat the goal you set at the beginning of this tutorial? Yes, it does!
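Reading the best epoch off the graph amounts to taking the epoch with the highest validation accuracy, while a growing gap between training and validation accuracy is the overfitting signal. The helpers below are a hypothetical sketch; you would feed them the per-epoch values shown in the Accuracy graph.

```python
def best_epoch(val_accuracy):
    """Return the 1-based epoch with the highest validation accuracy (earliest wins on ties)."""
    if not val_accuracy:
        raise ValueError("need at least one epoch")
    best_index = max(range(len(val_accuracy)), key=lambda i: val_accuracy[i])
    return best_index + 1


def overfitting_gap(train_accuracy, val_accuracy):
    """Per-epoch difference between training and validation accuracy.
    A gap that keeps growing after some epoch suggests the model has started to overfit."""
    return [t - v for t, v in zip(train_accuracy, val_accuracy)]
```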
If you click Categorical crossentropy you’ll see the Loss graph. Loss indicates the magnitude of error your model made on its prediction. It’s a method of evaluating how well your algorithm models your dataset. If your predictions are totally off, your loss function will output a higher number. If they’re pretty good, it’ll output a lower one.
The Confusion matrix shows how well the model classifies each class and which classes it confuses with each other. In a perfect classification, you’ll have 100% on the diagonal running from top left to bottom right.
OK, let’s see if you can improve the model and get higher accuracy. After epoch 5 the training and validation graphs diverge, which means the model starts to overfit. The first thing to try is regularization, by adding Dropout blocks after the Dense blocks. Try a dropout rate in the range 0.1-0.5.
Other ways to improve the model are to add blocks and change settings in the model. As long as you keep the same loss function, you can compare the results of the experiments in the Evaluation view and see which one is best.
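A confusion matrix counts, for every true class, how often each class was predicted. The sketch below assumes the percentages are normalized per true class (per row), which is what makes a perfect classifier show 100% on the diagonal; the platform's exact normalization is an assumption. The helpers and the example labels are hypothetical.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes=10):
    """Entry [i, j] counts examples of true class i that the model predicted as class j."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def per_class_recall(matrix):
    """Diagonal divided by row totals: the fraction of each true class classified correctly.
    Classes with no examples get 0 instead of a division by zero."""
    totals = matrix.sum(axis=1)
    return np.divide(np.diag(matrix), totals, out=np.zeros(len(totals)), where=totals > 0)


y_true = [0, 0, 1, 1, 2]
y_pred = [0, 6, 1, 1, 2]  # hypothetical: one T-shirt/top mistaken for a Shirt (class 6)
m = confusion_matrix(y_true, y_pred)
print(per_class_recall(m)[:3])  # class 0: 0.5, classes 1 and 2: 1.0
```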
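Dropout randomly silences a fraction of a layer's outputs during training, so the network cannot rely on any single node. The sketch below uses the common “inverted dropout” formulation (the one Keras uses), where surviving values are scaled up during training and nothing changes at prediction time; whether the platform's Dropout block matches it in every detail is an assumption.

```python
import numpy as np


def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training, zero each value with probability `rate`
    and scale the survivors by 1 / (1 - rate) so the expected value is unchanged.
    At inference time the input passes through untouched."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate
    return np.where(keep_mask, x / (1.0 - rate), 0.0)


rng = np.random.default_rng(0)
activations = np.ones((4, 512))  # hypothetical outputs of a Dense block
dropped = dropout(activations, rate=0.3, rng=rng)
print((dropped == 0).mean())     # roughly 0.3 of the values are zeroed
```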
Examples of what you can change:
Add more dense blocks
Increase or decrease number of nodes in the dense layers
Change the width and height of the convolutional filters
Add another CNN snippet
Change batch size
This tutorial has shown you how quickly and easily you can create and test experiments on the Peltarion Platform. You’ve built a basic CNN model to solve a classification problem and picked up some basic deep learning knowledge.
Next steps / Read more
In this tutorial, only one label can be correct, but what if an object could have several labels, e.g., a shirt labeled both “red” and “short sleeve”? How do you solve such a problem? Check out our tutorial Predicting mood from raw audio data to learn how to solve a multi-label classification problem.