Tutorial - Skin lesion segmentation

Solve an image segmentation problem

This tutorial will show you how to build a model that solves an image segmentation problem. In other words, your experiment will partition an image into sections, in this case "lesion" and "not lesion".

Target group: Data scientists and developers

Preread: We suggest that you start your deep dive into the Peltarion Platform with the Deploy an operational AI model tutorial. The link below is suggested pre-reading for those unfamiliar with CNNs and how they work; it does not endeavor to be an exhaustive learning reference. A Beginner's Guide To Understanding Convolutional Neural Networks

The problem - Predict lesion segmentation boundaries

Although skin lesions are visible to the naked eye, early-stage melanomas may be difficult to distinguish from benign skin lesions with similar appearances. Dermatoscopes, simple hand-held devices that eliminate surface glare and magnify structures invisible to the naked eye, significantly improve the distinction of melanomas from other skin lesions.

The International Skin Imaging Collaboration (ISIC) is a partnership whose goal is to help reduce melanoma mortality. ISIC has created an open-source public archive of skin images that can be used for the development of automated diagnostic systems.

The overarching goal of the ISIC Melanoma Project is to support efforts to reduce melanoma-related deaths and unnecessary biopsies by improving the accuracy and efficiency of melanoma early detection. To this end, the ISIC is developing proposed digital imaging standards and creating a public archive of clinical and dermoscopic images of skin lesions.

Since 2016, the ISIC Project has conducted an annual challenge for developers of artificial intelligence (AI) algorithms in the diagnosis of melanoma. The goal of this recurring challenge is to help participants develop image analysis tools to enable the automated diagnosis of melanoma from dermoscopic images.

In this tutorial, you will perform the first task of the ISIC challenge, which is to predict the lesion segmentation boundaries within dermoscopic images.

The data

The original training dataset for the ISIC 2018 challenge consists of 2,594 skin lesion images, each with a corresponding segmentation mask image that indicates the lesion boundaries. White pixels represent lesion areas and black pixels represent non-lesion areas. A separate validation dataset is also available. However, for this tutorial, we will use the training dataset for both training and validation.

Lesion images and segmentation masks

All images have an approximate aspect ratio of 1:1.5 and the sizes range from 1,022 x 767 to 4,288 x 2,848 pixels.

The images that you will upload to the platform have been processed to have a uniform aspect ratio and a size (64 x 64) that is suitable for training a model based on the U-net or the Tiramisu architecture.

If you want to see how the data was prepared, you can take a look at this Jupyter Notebook.
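
If you want a feel for what that preparation involves, here is a minimal sketch that center-crops each image to a square and resizes it to 64 x 64 using Pillow. It is illustrative only: the directory names and the cropping strategy are assumptions, not the exact steps of the linked notebook.

    # Illustrative preprocessing sketch (not the exact code of the linked notebook):
    # center-crop each image to a square, then resize it to 64 x 64 pixels.
    from pathlib import Path
    from PIL import Image

    def preprocess(src, dst, size=64):
        img = Image.open(src)
        side = min(img.size)                      # side of the centered square crop
        left = (img.width - side) // 2
        top = (img.height - side) // 2
        img = img.crop((left, top, left + side, top + side))
        img.resize((size, size), Image.LANCZOS).save(dst)

    out_dir = Path("processed")
    out_dir.mkdir(exist_ok=True)
    for path in Path("lesion_images").glob("*.jpg"):  # hypothetical input directory
        preprocess(path, out_dir / path.with_suffix(".png").name)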

Goals of the experiment

The goal of the experiment is to build, train, and deploy a model that will accurately generate segmentation masks for the images in a test subset. This subset will not be used for training the model.

Create project

First, create a project and give it a descriptive name so that you can easily tell what kind of project it is later.

A project combines all of the steps in solving a problem, from pre-processing of datasets to model building, evaluation, and deployment. Using projects makes it easy to collaborate with others.

Add the dataset

Please note that by working with this dataset, you accept the author's license in the Dataset licenses section of the Knowledge center.

You'll find the link to download the preprocessed training and validation dataset at the bottom of this page.

When you have downloaded the dataset, navigate to the Datasets view and click New dataset. Add the downloaded zip file to the Upload files tab.

You can also import the dataset without downloading it. To do this, copy the link to the dataset and paste that link in the Import files tab.

Click Next, name the dataset and click Done.

You don't need to change anything in the Datasets view so just go ahead and click Save version.

Create a model for image segmentation

The Tiramisu and U-net architectures are suitable for image segmentation. Both will work well with the dataset that you have imported. For this tutorial you will use U-net, which will take less time to train.
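
If you are curious what a U-net looks like outside the platform, the sketch below is a minimal conceptual version in Keras. It is not the platform's snippet, just an illustration of the idea: an encoder that downsamples, a decoder that upsamples, and skip connections that reuse encoder features at matching resolutions.

    # Minimal conceptual U-net sketch in Keras (not the platform's snippet).
    import tensorflow as tf
    from tensorflow.keras import layers

    def conv_block(x, filters):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

    inputs = tf.keras.Input(shape=(64, 64, 3))
    c1 = conv_block(inputs, 16)                               # 64 x 64
    c2 = conv_block(layers.MaxPooling2D()(c1), 32)            # 32 x 32
    b = conv_block(layers.MaxPooling2D()(c2), 64)             # 16 x 16 bottleneck
    u2 = layers.UpSampling2D()(b)                             # back to 32 x 32
    c3 = conv_block(layers.Concatenate()([u2, c2]), 32)       # skip connection
    u1 = layers.UpSampling2D()(c3)                            # back to 64 x 64
    c4 = conv_block(layers.Concatenate()([u1, c1]), 16)       # skip connection
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)   # per-pixel probability
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")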

Click the U-net snippet in the Inspector to the right. The snippet will appear on the canvas and some errors will be displayed in the Information pop-up.

Information pop-up

Click the error message related to the input in the Information pop-up. The Input block will be selected.

Set Feature to image in the Inspector to the right and the error message will disappear. If you zoom in on the model using the zooming tools, you will also notice that the output dimensions are now displayed under each block.

Click the error message related to the target in the Information pop-up. The Target block will be selected.

Set Feature to mask in the Inspector and the last error message will disappear.

Finally, change the Loss function in the Target block to Binary crossentropy.

Training the model

Click the Settings tab in the Inspector and change Epochs in the Run Settings to 100, then click Run.

You can watch the results unfold in the Evaluation view, but the training process will take about an hour to complete. In the meantime, you may want to have a look at other articles in the Knowledge center.

Analyze experiment

When you created the model, you selected the binary cross entropy loss function. This is a loss function used for problems involving yes/no (binary) decisions. Here, it applies to the individual pixels of the input images, each of which should be classified as either lesion (white) or non-lesion (black).
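
As a minimal sketch, this is what binary cross entropy looks like for a single pixel; the function and the example values are illustrative.

    # Binary cross entropy for one pixel: low when the predicted probability
    # agrees with the target, large when it is confidently wrong.
    import numpy as np

    def binary_crossentropy(target, predicted, eps=1e-7):
        predicted = np.clip(predicted, eps, 1 - eps)  # avoid log(0)
        return -(target * np.log(predicted) + (1 - target) * np.log(1 - predicted))

    print(binary_crossentropy(1.0, 0.9))  # lesion pixel, right: ~0.105
    print(binary_crossentropy(1.0, 0.1))  # lesion pixel, wrong: ~2.303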

There are 2,594 input images with the dimension 64 x 64, and 20% of those are included in the validation set. This means that an approximation of the total number of values (pixels) in the confusion matrix can be calculated as follows:

64 x 64 x 2,594 x 0.2 ≈ 2,125,005
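
You can verify this estimate with a couple of lines of Python:

    # Rough pixel count of the validation subset shown in the confusion matrix.
    images, pixels_per_image, validation_fraction = 2594, 64 * 64, 0.2
    print(round(images * pixels_per_image * validation_fraction))  # 2125005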

Model evaluation - Confusion matrix displays actual number of predictions

For a clearer view of the model performance, click Count under the confusion matrix and select Percentage.

The top-left to bottom-right diagonal gives you the percentage of correct predictions on the individual pixels in the validation subset, i.e., the recall for the two classes (lesion and not-lesion).

The somewhat lower recall for lesion pixels can be attributed to class imbalance: there are relatively fewer lesion pixels than non-lesion pixels in the sample images. This imbalance is visualized by the horizontal bars on the right of the confusion matrix.
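
If you want to quantify the imbalance yourself, a sketch like the one below counts the fraction of lesion (white) pixels across the downloaded mask images. The masks directory name is an assumption; point it at wherever your mask files live.

    # Estimate class imbalance: fraction of lesion (white) pixels in the masks.
    from pathlib import Path
    import numpy as np
    from PIL import Image

    lesion = total = 0
    for path in Path("masks").glob("*.png"):    # hypothetical mask directory
        mask = np.array(Image.open(path).convert("L"))
        lesion += np.count_nonzero(mask > 127)  # same threshold as the model output
        total += mask.size
    print(f"lesion pixels: {lesion / total:.1%} of all pixels")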

Model evaluation - Confusion matrix displays predictions as percentages

Testing the model

Deploy the trained model

  1. In the Deployment view, click New deployment. The Create deployment popup will appear.
  2. Select your last Experiment and the Checkpoint for best epoch.
  3. Click the Enable switch to deploy the experiment.

Single predictions with curl

You can use the curl command to test the model on a handful of input images.

  1. Download and unzip the test dataset.
  2. Resize some of the test images in an image editor so that they have the same dimensions as the images in the training data (64 x 64 pixels).
  3. Open a terminal and change to the directory that contains the resized image files.
  4. In the Deployment view, click Copy to clipboard next to the Input example.
  5. Update the curl example so that the image parameter references one of the test files.
  6. Run the curl command in the terminal. The output will be a Base64 encoded string.
  7. To visualize the image mask, copy all characters between the double quotes in the output and paste them into an online Base64 decoder tool, e.g., onlinejpgtools, or use the Python sketch after this list.
  8. Save the image mask and compare it with the input image. Do you agree that the image mask correctly marks the location of the lesion?
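
The decoding step can also be done with a few lines of Python instead of an online tool. This sketch only decodes the Base64 string; you still paste the string in yourself, and the output file name and PNG format are assumptions.

    # Decode the Base64 string from the curl output to an image file.
    import base64

    mask_b64 = "..."  # paste the characters between the double quotes here
    with open("predicted_mask.png", "wb") as f:
        f.write(base64.b64decode(mask_b64))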

Analyzing the model output in Python

If you are familiar with Python, you can analyze the model predictions on the test dataset using this Jupyter Notebook.

Note!
The following instructions require that you are familiar with Python and Jupyter Notebook.
  1. Download the test dataset from the ISIC 2018 challenge.
  2. Unzip the file that you have downloaded.
  3. Start the Jupyter Notebook:
    $ jupyter notebook skin_lesion_image_segmentation_analysis.ipynb
  4. Install Sidekick and any other required Python packages.
  5. Update the path to your dataset, and the URL and token for your deployment.
  6. Run the notebook.
Notebook output – Left: input image, middle: predicted mask, right: white sections of the mask are made transparent and superimposed onto the image.

The pixels in the output will have an intensity in the range between 0 and 255. Pixels with an intensity greater than 127 are considered white.
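
Putting these pieces together, a minimal standalone sketch using Sidekick might look like the following. The URL, token, and file name are placeholders, and it assumes that the deployment uses the feature names image and mask from this tutorial and that Sidekick returns the mask as a PIL image.

    # Minimal Sidekick sketch: request a prediction and threshold the mask.
    import numpy as np
    import sidekick
    from PIL import Image

    client = sidekick.Deployment(url="<deployment-url>", token="<deployment-token>")

    img = Image.open("test_lesion_64x64.png")  # a test image, already 64 x 64
    prediction = client.predict(image=img)     # dict keyed by output feature name

    # Same rule as above: intensity greater than 127 counts as lesion (white).
    mask = np.array(prediction["mask"].convert("L"))
    binary_mask = mask > 127
    print(f"lesion covers {binary_mask.mean():.1%} of the image")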

Tutorial recap

You have trained a model based on a snippet to create segmentation masks that outline the contours of skin lesions. The model generates these masks by making a binary prediction for each pixel in the input image.
