Skin cancer detection
Solve an image segmentation problem
This tutorial will show you how to build a model that will solve an image segmentation problem. This means that your experiment is about partitioning an image into sections, in this case, "lesion" or "not lesion".
Target group: Intermediate users
Preread: We suggest that you start your deep dive into the Peltarion Platform with the Deploy an operational AI model tutorial. The link below is a proposed pre-read material for those unfamiliar with CNNs and how they work. It does not endeavor to be an exhaustive learning reference. A Beginner’s Guide To Understanding Convolutional Neural Networks
The problem - Predict lesion segmentation boundaries
Although skin lesions are visible to the naked eye, early-stage melanomas may be difficult to distinguish from benign skin lesions with similar appearances. Dermatoscopes, simple hand-held devices that eliminate surface glare and magnify structures invisible to the naked eye, significantly improve the distinction of melanomas from other skin lesions.
The International Skin Imaging Collaboration (ISIC) is a partnership with the goal to help reduce melanoma mortality. ISIC has created an open source public archive of skin images, which can be used for the development of automated diagnostic systems.
The overarching goal of the ISIC Melanoma Project is to support efforts to reduce melanoma-related deaths and unnecessary biopsies by improving the accuracy and efficiency of melanoma early detection. To this end, the ISIC is developing proposed digital imaging standards and creating a public archive of clinical and dermoscopic images of skin lesions.
Since 2016, the ISIC Project has conducted an annual challenge for developers of artificial intelligence (AI) algorithms in the diagnosis of melanoma. The goal of this recurring challenge is to help participants develop image analysis tools to enable the automated diagnosis of melanoma from dermoscopic images.
In this tutorial, you will perform the first task of the ISIC challenge, which is to predict the lesion segmentation boundaries within dermoscopic images.
The original training dataset for the ISIC 2018 challenge consists of 2,594 skin lesion images, each with a corresponding segmentation mask image that indicates the lesion boundaries. In the masks, white pixels represent lesion areas and black pixels represent non-lesion areas. A separate validation dataset is also available. However, for this tutorial, we will use the training dataset for both training and validation.
All images have an approximate aspect ratio of 1:1.5 and the sizes range from 1,022 x 767 to 4,288 x 2,848 pixels.
The images that you will upload to the platform have been processed to have a uniform aspect ratio and a size (64 x 64) that is suitable for training a model based on the U-net or the Tiramisu architecture.
If you want to see how the data was prepared, you can take a look at this Jupyter Notebook.
You’ll find the link to download the preprocessed training and validation dataset at the bottom of the terms page for this dataset (because we really want you to read the terms, sorry for the inconvenience).
Goals of the experiment
The goal of the experiment is to build, train and deploy a model that will accurately generate segmentation masks for the images in a test subset. This subset will not be used for training the model.
First, create a project and name it so you know what kind of project it is. Naming is important.
A project combines all of the steps in solving a problem, from pre-processing of datasets to model building, evaluation, and deployment. Using projects makes it easy to collaborate with others.
Add the dataset
Please note that by working with this dataset, you accept the author’s license in the Dataset licenses section of the Knowledge center.
If you haven’t downloaded the dataset already, you’ll find the link to download the preprocessed training and validation dataset at the bottom of the terms page for this dataset (it’s a bit cumbersome, we know, but we really want you to read the terms).
When you have downloaded the dataset, navigate to the Datasets view and then click New dataset. Drag and drop the downloaded zip file onto the Drag and drop data area.
You can also import the dataset without downloading it. To do this, copy the link to the dataset, and paste the link into the Import data from URL box and then click the arrow.
When the upload is done, name the dataset and click Done.
You don’t need to change anything in the Datasets view so just go ahead and click Save version and then Use in new experiment.
Create a model for image segmentation
In the Experiment wizard make sure the new dataset is selected in the Define dataset tab.
Navigate to the Choose snippet tab. Check that Input feature is image and Target feature is mask. Then choose snippet.
Both the Tiramisu and U-net architectures are suitable for Problem type Image segmentation and will work well with the dataset. For this tutorial you will use U-net, which will take less time to train.
Select the U-net snippet and click Create. The snippet will appear on the Modeling canvas with all settings pre-populated by the platform.
Train the model
Click the Settings tab in the Inspector and change Epochs to 100, then click Run.
You can watch the results unfold in the Evaluation view, but the training process will take one hour to complete.
When you created the model, you selected the binary crossentropy loss function. This is a loss function used on problems involving yes/no (binary) decisions. This applies to the individual pixels in the input images that should be classified as either lesion (white pixel) or non-lesion (black).
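To make the per-pixel yes/no decision concrete, here is a minimal sketch of binary crossentropy averaged over a few pixels. The function name, the clipping constant and the four sample pixels are illustrative assumptions, not the platform's implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary crossentropy over a flat list of pixels."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Four pixels of a hypothetical mask: 1 = lesion (white), 0 = non-lesion (black)
target = [1, 1, 0, 0]
confident = [0.9, 0.8, 0.1, 0.2]  # predictions close to the target
unsure = [0.6, 0.5, 0.5, 0.4]     # predictions close to 0.5

loss_confident = binary_crossentropy(target, confident)
loss_unsure = binary_crossentropy(target, unsure)
print(loss_confident, loss_unsure)
```

The loss is lower when the predicted probability for each pixel is close to its true class, which is what training tries to achieve.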
There are 2,594 input images with the dimensions 64 x 64, and 20% of those are included in the validation set. This means that the approximate total number of values (pixels) in the confusion matrix can be calculated as follows:
64 x 64 x 2594 x 0.2 ≈ 2,125,005
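The same arithmetic can be checked in a few lines of Python. The constant names are chosen for illustration only, and the exact count on the platform may differ slightly depending on how the 20% split is rounded to whole images.

```python
IMAGE_WIDTH = 64
IMAGE_HEIGHT = 64
NUM_IMAGES = 2594
VALIDATION_FRACTION = 0.2

pixels_per_image = IMAGE_WIDTH * IMAGE_HEIGHT
validation_pixels = pixels_per_image * NUM_IMAGES * VALIDATION_FRACTION

print(pixels_per_image)          # pixels in one 64 x 64 image
print(round(validation_pixels))  # approximate pixels in the validation subset
```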
For a clearer view of the model performance, click Count under the confusion matrix and select Percentage.
The top-left to bottom-right diagonal gives you the percentage of correct predictions on the individual pixels in the validation subset, i.e., the recall for the two classes (lesion and not-lesion).
The somewhat lower recall for lesion pixels can be attributed to the class imbalance. There are relatively few lesion pixels compared to non-lesion pixels in the sample images. This is visualized by the horizontal bars on the right of the confusion matrix.
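As a rough illustration of how the diagonal relates to recall, the sketch below computes per-class recall from a confusion matrix of pixel counts. The counts are made up for the example (they are not results from this experiment), and it is assumed that rows hold the actual class and columns the predicted class.

```python
def per_class_recall(matrix):
    """matrix[i][j] = number of pixels of actual class i predicted as class j."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

# Hypothetical pixel counts; class order is [non-lesion, lesion]
counts = [
    [1_500_000, 100_000],  # actual non-lesion
    [75_000, 450_000],     # actual lesion
]

recall_non_lesion, recall_lesion = per_class_recall(counts)

# Share of all pixels that are actually lesion, to show the class imbalance
lesion_share = sum(counts[1]) / sum(sum(row) for row in counts)

print(f"non-lesion recall: {recall_non_lesion:.1%}")
print(f"lesion recall:     {recall_lesion:.1%}")
print(f"lesion share:      {lesion_share:.1%}")
```

Each recall value divides the diagonal cell by its row total, so a class with few pixels can have a noticeably lower recall even when the overall share of correct pixels is high.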
Test the model
Deploy the trained model
In the Deployment view, click New deployment. The Create deployment popup will appear.
Select your last Experiment and the Checkpoint for best epoch.
Click the Enable switch to deploy the experiment.
Single predictions with curl
You can use the curl command to test the model on a handful of input images.
Download and unzip the test dataset.
Resize some of the test images in an editor to have the same dimensions as the training data images (64x64 pixels).
Open a terminal and change to the directory that contains the resized images files.
In the Deployment view, click Copy to clipboard next to the Input example.
Update the curl example so that the image parameter references one of the test files.
Run the curl command in the terminal. The output will be a Base64 encoded string.
To visualize the image mask, copy all characters between the double quotes in the output and paste into an online Base64 decoder tool, e.g., onlinejpgtools.
Save the image mask and compare it with the input image. Do you agree that the image mask correctly marks the location of the lesion?
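If you prefer not to paste the output into an online tool, the decoding step can be sketched locally with Python's standard base64 module. This assumes the string between the double quotes is plain Base64 of the image file; the function name, the output file name and the stand-in payload are hypothetical.

```python
import base64
import os
import tempfile

def save_base64_image(encoded, out_path):
    """Decode a Base64 string and write the raw bytes to out_path."""
    data = base64.b64decode(encoded)
    with open(out_path, "wb") as handle:
        handle.write(data)
    return len(data)

# Stand-in payload so the sketch runs without a deployment;
# replace it with the string copied from the curl output.
sample = base64.b64encode(b"not a real image").decode("ascii")

# Hypothetical file name; use the extension that matches the returned image format.
out_path = os.path.join(tempfile.gettempdir(), "predicted_mask.bin")
written = save_base64_image(sample, out_path)
print(written, "bytes written to", out_path)
```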
Analyze the model’s output in Python
If you are familiar with Python, you can analyze the model predictions on the test dataset using this Jupyter Notebook.
Download the test dataset from the ISIC 2018 challenge.
Unzip the file that you have downloaded.
Start the Jupyter Notebook:
$ jupyter notebook skin_lesion_image_segmentation_analysis.ipynb
Install Sidekick and any other required Python packages.
Update the path to your dataset, and the URL and token for your deployment.
Run the notebook.
The pixels in the output will have an intensity in the range between 0 and 255. Pixels with an intensity greater than 127 are considered white.
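The thresholding rule can be sketched as follows. The tiny 2 x 3 array stands in for a predicted mask and the function name is an assumption; the notebook may implement this step differently.

```python
THRESHOLD = 127

def binarize(mask, threshold=THRESHOLD):
    """Map intensities above the threshold to white (255), all others to black (0)."""
    return [[255 if value > threshold else 0 for value in row] for row in mask]

# Hypothetical predicted intensities in the range 0 to 255
predicted = [
    [0, 100, 127],
    [128, 200, 255],
]

binary = binarize(predicted)
print(binary)
```

Note that a pixel with intensity exactly 127 is treated as black, since only values greater than 127 count as white.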
You have trained a model based on a snippet to create segmentation masks that outline the contours of skin lesions. The model generates these masks by making a binary prediction on each pixel in the input images.