Car damage assessment

Transfer learning with a pretrained snippet

Save training time and create well-performing models with small datasets. Sounds good? This tutorial will show you how.

Target audience: Beginners


You will learn to

In this tutorial, you will use a pretrained snippet in a classification model designed to detect different types of car damage. The number of images in the dataset is small relative to the number of classes, and the images vary significantly, which makes it challenging to create a well-performing model from this dataset alone.


The data

The dataset that you will use in this experiment contains approximately 1,500 unique RGB images of 224 x 224 pixels, split into a training and a validation subset.

Fix an unbalanced dataset

The underrepresented classes in the training subset have been upsampled during preprocessing to reduce bias. This means that the index file (index.csv) has duplicate entries that link to the same image file. The total number of entries in the index file is approximately 3,800.
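To make the idea concrete, here is a minimal pandas sketch of how such index-level upsampling could be done (the class column name is an assumption; the actual preprocessing pipeline may differ):

    import pandas as pd

    # Load the index file; the "class" column name is assumed for illustration.
    index = pd.read_csv("index.csv")

    # Only the training rows are balanced; "T" marks the training subset.
    train = index[index["subset"] == "T"]
    max_count = train["class"].value_counts().max()

    # Repeat index entries of underrepresented classes until each class
    # matches the largest one. The image files themselves are not copied;
    # only the rows that point to them are duplicated.
    balanced = pd.concat(
        group.sample(max_count, replace=True, random_state=0)
        for _, group in train.groupby("class")
    )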

Classes

Each image belongs to one of the following classes:

  • Broken headlamp

  • Broken tail lamp

  • Glass shatter

  • Door scratch

  • Door dent

  • Bumper dent

  • Bumper scratch

  • Unknown

Below are sample images from the various classes in the dataset. Note that the unknown class contains images of cars in either pristine or wrecked condition.

Each collected image represents one car with one specific type of damage. This means that the dataset can be used to solve a single-label classification problem.

Figure 1. Example images from each class

Dataset generation deep dive

If you want to learn how the raw data was processed to create the dataset used in this tutorial, you can dive deeper into the Jupyter notebook here.


Create a project

First, create a project and give it a descriptive name, so that you can recognize it later.

A project combines all of the steps in solving a problem, from preprocessing of datasets to model building, evaluation and deployment. Using projects makes it easy to collaborate with others.


Upload the car damage dataset to the platform

  1. Navigate to the Datasets view and click New dataset.

  2. Copy the link below:
    https://storage.googleapis.com/bucket-8732/car_damage/preprocessed.zip

  3. Paste the link into the Import data from URL box. The zip file contains the whole dataset.

  4. When done, name the dataset Car damage classifier and click Done.
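If you also want to inspect the data locally, the same archive can be fetched with a few lines of Python (a sketch; any HTTP client or a plain browser download works just as well):

    import urllib.request
    import zipfile

    url = "https://storage.googleapis.com/bucket-8732/car_damage/preprocessed.zip"

    # Download the archive and unpack it into a local folder.
    urllib.request.urlretrieve(url, "preprocessed.zip")
    with zipfile.ZipFile("preprocessed.zip") as archive:
        archive.extractall("car_damage")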

Create subsets of the car damage dataset

The subset column, containing a T or a V, indicates whether a row should be used for training or validation. The split between training and validation data is approximately 80/20. This column was created during the preprocessing of the raw data.

Even though it is possible to use the default subsets created by the platform when you upload the data, it is better to create a conditional split based on the subset column. This dataset has no separate labeled test subset, so if you want to analyze the performance of the deployed model outside the platform, you can instead compare the model predictions with the ground truth in the predefined validation subset.

  1. Delete the default subsets Training and Validation by clicking the Subsets options menu (…) and then Delete.

  2. Click New subset and name the training subset Training. Then click Add conditional filter and set Feature to subset, Operator to is equal to, and Value to T.

  3. Repeat the procedure for a new Validation subset and set Feature to subset, Operator to is equal to, and Value to V.
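If you want to sanity-check the split before saving, the subset column in the index file can be inspected directly (a small pandas sketch):

    import pandas as pd

    # The subset column encodes the split: "T" for training, "V" for validation.
    index = pd.read_csv("index.csv")
    print(index["subset"].value_counts(normalize=True))  # roughly T: 0.8, V: 0.2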

Figure 2. Training subset
Figure 3. Validation subset

Normalize on training subset

In Normalize on subset, select the Training subset that you just created.
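Normalizing on the training subset means that the normalization statistics, such as the per-channel mean and standard deviation, are computed from training images only and then applied to every subset, so no information about the validation data leaks into preprocessing. A minimal NumPy sketch of the idea (the arrays stand in for loaded image batches):

    import numpy as np

    # Placeholder batches with shape (N, height, width, channels).
    train_images = np.random.rand(100, 224, 224, 3)
    val_images = np.random.rand(25, 224, 224, 3)

    # Statistics come from the training subset only...
    mean = train_images.mean(axis=(0, 1, 2))
    std = train_images.std(axis=(0, 1, 2))

    # ...and are applied to both subsets with the same values.
    train_images = (train_images - mean) / std
    val_images = (val_images - mean) / std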

Save the dataset

You’ve now created a dataset ready to be used in the platform. Click Save version and click Use in new experiment.

Figure 4. Saved processed car damage dataset

Use a pretrained snippet

  • Name the experiment in the Experiment wizard.

  • Make sure that the Car damage classifier dataset is selected in the Define dataset tab.

  • Click on the Choose snippet tab and select the EfficientNet B0 snippet.

  • Click on the Initialize weights tab. Select ImageNet for pretrained data.

  • Click Create.

The EfficientNet B0 blocks will be added to the Modeling canvas. You can expand and collapse the EfficientNet B0 group at any time by clicking + or -.


Run experiment

Before you run the experiment, click the Settings tab and change the Learning rate to 0.0005.

Now, everything is set up, so just click Run.
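Outside the platform, the equivalent Keras setup would look roughly like the sketch below. It illustrates the same idea, a pretrained EfficientNet B0 backbone with a new classification head for the eight classes, but it is not the platform's internal implementation, and the head and optimizer choices are assumptions:

    import tensorflow as tf

    # EfficientNet B0 backbone initialized with ImageNet weights;
    # the original classification head is dropped (include_top=False).
    base = tf.keras.applications.EfficientNetB0(
        weights="imagenet", include_top=False, input_shape=(224, 224, 3)
    )

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(8, activation="softmax"),  # 8 damage classes
    ])

    # The lowered learning rate (0.0005) keeps fine-tuning from
    # destroying the pretrained features.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-4),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )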


Analyze the experiment

Go to the Evaluation view. Since the model solves a classification problem, a confusion matrix is displayed. The diagonal from top left to bottom right shows correct predictions; everything outside this diagonal is an error.


Note that the metrics are based on the validation subset, which consists of only 20% of the original dataset.

Click the dropdown next to Cells and select Percentage. The normalized values that are now displayed correspond to the recall for each class.

The recall values clearly indicate that the model has learned the features in the images.
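You can reproduce the relationship between the percentage view and recall with scikit-learn: normalizing each row of the confusion matrix by the number of true samples in that class puts the per-class recall on the diagonal. A sketch with made-up labels:

    import numpy as np
    from sklearn.metrics import confusion_matrix

    # Toy ground truth and predictions for three of the classes.
    y_true = ["door_dent", "door_dent", "door_scratch", "bumper_dent"]
    y_pred = ["door_dent", "door_scratch", "door_scratch", "bumper_dent"]

    # normalize="true" divides each row by its class count,
    # so the diagonal holds the per-class recall.
    cm = confusion_matrix(y_true, y_pred, normalize="true")
    print(np.diag(cm))  # recall for each class, in sorted label order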

Figure 5. Model evaluation: normalized confusion matrix

Analyze the model in a Jupyter notebook

If you are familiar with Python, you can analyze the model predictions on the validation subset using this Jupyter Notebook.

  1. Start the Jupyter Notebook:
    $ jupyter notebook car_damage_analysis.ipynb

  2. Install Sidekick and any other required Python packages.

  3. Update the path to your dataset (zip file), as well as the URL and token for your deployment.

  4. Run the notebook.


Deploy the trained model

In the Deployment view, click New deployment and select the experiment and checkpoint.

Click the Enable switch to deploy the experiment. You can now call your deployed experiment using the URL and Token, for example when you build your own app with Bubble.
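From Python, a call to the deployment could look like the sketch below. The exact Sidekick interface and the image feature name are assumptions here, so check your deployment's API specification for the real parameter names:

    import sidekick
    from PIL import Image

    # URL and token are shown in the Deployment view; these are placeholders.
    deployment = sidekick.Deployment(
        url="<deployment-url>",
        token="<deployment-token>",
    )

    # Send one image and read back the predicted class probabilities.
    image = Image.open("damaged_car.jpg").resize((224, 224))
    print(deployment.predict(image=image))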


Tutorial recap

Using transfer learning with pretrained weights, you have saved hours of training time and created a better-performing model than would have been possible if you had trained the model from scratch.