Seeing the revolutionary advances in AI in recent years, what could be more rewarding than putting these new possibilities to use to save human lives? In this blog post, we will look at how to create a deep learning model for automatic brain tumor segmentation.
To summarize the outcomes – it is possible to get good results with a limited dataset and a simple algorithm. The algorithm used is general and could easily be applied to different problem settings or use cases.
Detecting cancer tumors – a time-consuming task
Worldwide, there are 14 million new cancer cases reported per year and close to 9 million related deaths. In fact, two out of five people will be diagnosed with cancer at some point in their life. In Sweden, approximately 50% of cancer patients are currently treated using radiotherapy, a non-invasive method where high-energy radiation is aimed at the tumor, effectively killing the tumor cells.
Prior to radiotherapy treatment, a doctor needs to create a segmentation mask, an image that is used to mark the exact location of the cancerous tumor. Information derived from the mask is used to create a treatment plan, which serves as a basis for the radiotherapy. It is important that the segmentation mask be exact; if the mask is too small, there is a greater likelihood of the tumor recurring, and if the mask is too large, treatment could potentially lead to the loss of cognitive functions.
Segmentation is a time-consuming and difficult task: on average it takes up to one hour for a doctor to examine the scans of a single patient and then identify and segment the tumors. Doing a back-of-the-envelope calculation under the assumption that the average tumor takes an hour to segment and that 50% of all cancer cases worldwide are treated using radiotherapy, we reach the conclusion that around 3,000 doctors would be needed, working full time, only performing segmentation (0.5 * 14e6 cases * 1 hour / (8 hours per working day * 261 working days per year)). Yet another problem with manual segmentation is the considerable variability between doctors, which in turn leads to varying levels of accuracy in the results.
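The back-of-the-envelope estimate above can be reproduced in a few lines of Python; all figures are taken directly from the text:

```python
# Rough estimate of full-time doctors needed worldwide just for segmentation.
new_cases_per_year = 14e6        # new cancer cases per year (from the text)
radiotherapy_share = 0.5         # assumption: 50% treated with radiotherapy
hours_per_segmentation = 1       # assumption: one hour per tumor

hours_per_day = 8
working_days_per_year = 261

segmentation_hours = new_cases_per_year * radiotherapy_share * hours_per_segmentation
doctor_hours_per_year = hours_per_day * working_days_per_year

doctors_needed = segmentation_hours / doctor_hours_per_year
print(round(doctors_needed))  # → 3352
```

The result, roughly 3,350 full-time doctors, matches the "around 3,000" figure in the text.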
Dataset and preprocessing
For this project, the dataset from the “Multimodal Brain Tumor Segmentation Challenge 2015” (BRATS) is used. This dataset contains 3D magnetic resonance imaging (MRI) scans from 276 patients with brain tumors. For each patient there are four different types of 3D scans, or modalities: Flair, T1, T1c, and T2. The modalities are treated as separate channels, just as one would normally treat the colors of an RGB image. For each patient, there is also a ground-truth segmentation mask that marks four different tumor tissue types.
For this project, a few 2D slices in the transverse plane around the segmented region are used, rather than the full 3D scans. Instead of treating the slices as a 3D volume, they are stacked together as channels, effectively giving 4*N channels, where 4 comes from the number of modalities and N from the number of slices used. By scaling down to 2.5D, a great deal of GPU memory is saved, which allows training with a larger batch size. An alternative approach would be to use 3D convolutions, as done by Kamnitsas et al., which may capture more structure in the z-dimension.
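The 2.5D stacking can be sketched in NumPy. The function below is a minimal illustration, not the project's actual data loader; the volume shape used in the example is the standard BRATS shape, and the slice-window logic is an assumption:

```python
import numpy as np

def stack_25d(volumes, center, n_slices):
    """Stack N transverse slices from each modality as channels (2.5D).

    volumes:  list of 4 arrays of shape (depth, height, width),
              one per modality (Flair, T1, T1c, T2).
    center:   index of the transverse slice of interest.
    n_slices: odd number of slices taken around the center.
    Returns an array of shape (height, width, 4 * n_slices).
    """
    half = n_slices // 2
    channels = []
    for vol in volumes:
        for z in range(center - half, center + half + 1):
            channels.append(vol[z])
    return np.stack(channels, axis=-1)

# Example: four modalities of shape (155, 240, 240), three slices each
volumes = [np.zeros((155, 240, 240)) for _ in range(4)]
x = stack_25d(volumes, center=77, n_slices=3)
print(x.shape)  # → (240, 240, 12)
```

Each 2D training example thus carries 4*N channels instead of a full 3D volume, which is where the memory saving comes from.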
For preprocessing, bias field correction and histogram equalization are done for each volume. Bias field signal is a low-frequency artifact that tends to be present in MRI images due to imperfections in the coils used in the MRI scans, or interference between scan slices, among other things. The N4 Bias Field Correction algorithm is used for handling this. Another characteristic of MRI images is that there is no fixed scale for intensity; scanning the same patient in different machines will give different absolute values, even though the semantic information is the same. To mitigate this, a histogram equalization is performed.
During training, data augmentation is performed; rotation between -10 and 10 degrees, zoom between 90 and 110%, shearing up to 5 degrees, and horizontal flips. In the U-Net paper, elastic deformation is also used, which probably could benefit this case as well.
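A subset of these augmentations can be sketched with SciPy. The snippet below implements the rotation and horizontal-flip steps with the parameter ranges from the text; zoom and shear would follow the same pattern with `ndimage.affine_transform`. The image/mask shapes and the flip probability are assumptions for illustration:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(42)

def augment(image, mask):
    """image: (H, W, C) stacked modalities; mask: (H, W) integer labels.
    Rotation in [-10, 10] degrees plus a random horizontal flip."""
    angle = rng.uniform(-10, 10)
    # order=1: bilinear for the image; order=0: nearest so labels stay discrete
    image = ndimage.rotate(image, angle, axes=(0, 1), reshape=False, order=1)
    mask = ndimage.rotate(mask, angle, axes=(0, 1), reshape=False, order=0)
    if rng.random() < 0.5:               # horizontal flip
        image = image[:, ::-1]
        mask = mask[:, ::-1]
    return image, mask

img = np.random.rand(64, 64, 12)
msk = (np.random.rand(64, 64) > 0.97).astype(np.int64)
aug_img, aug_msk = augment(img, msk)
print(aug_img.shape, aug_msk.shape)  # → (64, 64, 12) (64, 64)
```

Note that the same geometric transform must always be applied to the image and its mask, and that nearest-neighbor interpolation is used for the mask so no invalid in-between labels are created.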
Taking a deep learning approach, the U-Net architecture developed by Ronneberger et al. is used. The U-Net architecture was originally used to segment neuronal structures in electron microscopy images and has since become a common baseline for segmentation. The approach works well even with small amounts of data. The architecture is fully convolutional, so images of arbitrary size can be processed as long as they fit on the GPU. It consists of an encoder and a decoder with skip connections in between (see image below). The encoder has several blocks, each with a set of 3x3 convolutions followed by regularization (dropout), an activation function (ReLU) and a pooling layer. For each of these blocks, the spatial size is reduced while the number of feature maps is increased, effectively capturing higher and higher levels of semantic information.
For the decoder, the pooling layer is changed to an upsampling layer; thus, for every decoder block, spatial information is recovered while the number of feature maps is decreased. Every decoder block also receives information from a skip connection from the encoder, allowing the decoder to combine higher-level features with lower-level ones. Training the network with these skip connections removed leads to poor results. The Adam optimizer is used, with a learning rate of 1e-3, which is reduced by a factor of 0.9 when the Dice score on the validation dataset plateaus, down to a minimum of 1e-6.
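The learning-rate schedule above corresponds to what Keras calls a ReduceLROnPlateau callback; its logic can be sketched in a few lines of plain Python. The patience value below is an assumption, since the text does not state one:

```python
class ReduceOnPlateau:
    """Reduce the learning rate by `factor` when the monitored score
    (validation Dice, higher is better) has not improved for `patience`
    epochs, never going below `min_lr`. Sketch of the schedule in the text."""
    def __init__(self, lr=1e-3, factor=0.9, patience=5, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("-inf")
        self.wait = 0

    def step(self, val_dice):
        if val_dice > self.best:         # improvement: reset the counter
            self.best = val_dice
            self.wait = 0
        else:                            # plateau: count epochs without progress
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

sched = ReduceOnPlateau(patience=2)
for dice in [0.50, 0.60, 0.60, 0.60, 0.65]:
    lr = sched.step(dice)
print(round(lr, 6))  # → 0.0009  (one reduction triggered by the plateau)
```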
A problem with the dataset is that it’s highly unbalanced, and only about 2.4% of the pixels are tumor pixels. Trying to train this with a standard categorical cross-entropy loss doesn’t give any useful results, so a pixel-weighted cross-entropy is used instead, where uncommon classes get a higher weight.
This can be seen as a way to tell the model that the tumor pixels are considered more important, and that more emphasis is placed on them when updating the weights.
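A minimal NumPy version of such a weighted cross-entropy, with class weights set by inverse frequency, might look as follows. This is an illustrative sketch, not the project's exact loss implementation:

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Pixel-weighted categorical cross-entropy.

    probs:         (n_pixels, n_classes) predicted class probabilities
    labels:        (n_pixels,) integer class labels
    class_weights: (n_classes,) higher weight for rare classes
    """
    eps = 1e-8                                        # numerical stability
    pixel_weights = class_weights[labels]             # one weight per pixel
    log_likelihood = np.log(probs[np.arange(labels.size), labels] + eps)
    return -np.mean(pixel_weights * log_likelihood)

# Example: weight classes by inverse frequency
labels = np.array([0, 0, 0, 0, 1])                    # one rare "tumor" pixel
counts = np.bincount(labels)                          # [4, 1]
class_weights = labels.size / (counts.size * counts)  # [0.625, 2.5]
probs = np.full((5, 2), 0.5)                          # maximally uncertain model
loss = weighted_cross_entropy(probs, labels, class_weights)
```

With inverse-frequency weights, a misclassified rare tumor pixel contributes several times more to the gradient than a misclassified background pixel.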
A Tiramisu architecture, in which the blocks of the U-Net are replaced by DenseNet blocks, was also built and tested for this project, both with and without the weighted loss. Beyond that, no hyperparameter tuning is done.
Both the U-Net and Tiramisu models are supported on the platform. These models have been pre-built and are available in the “snippets” area of the model view on the platform. See our recent blog post on common deep learning models and snippets that have been pre-built on the platform, increasing ease of use.
Looking at the results, they are qualitatively quite good. What becomes evident is a slight overprediction, most likely explained by the way the loss is weighted. The different colors of the segmentation maps correspond to the different tumor structures: edema (white), non-enhancing solid core (yellow), necrotic/cystic core (red) and enhancing core (orange).
Despite this simple approach and the relatively small dataset, it is evidently possible to reach results comparable to those produced by doctors.
The following results are obtained from the public leaderboard, testing on examples that were not included in the training. The numbers shown are average Dice scores for the different tumor types for doctors and U-Net respectively. It’s very interesting to see that while it was almost impossible to get any good results using U-Net without the weighted loss, it didn’t pose any problems for the Tiramisu architecture. This could be due to the decreased distances the gradients have to travel and/or due to the increased convexity of the loss surface gained by the skip connections, elegantly explained by Li et al.
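The Dice score reported on the leaderboard measures the overlap between the predicted mask and the ground truth. A minimal NumPy version for a single binary mask:

```python
import numpy as np

def dice_score(pred, truth):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0      # both masks empty: perfect agreement by convention
    return 2.0 * np.logical_and(pred, truth).sum() / denom

pred  = np.array([[0, 1, 1], [0, 1, 0]])
truth = np.array([[0, 1, 0], [0, 1, 1]])
print(dice_score(pred, truth))  # → 0.6666666666666666
```

A score of 1.0 means a perfect match; for multi-class segmentation, the score is computed per tumor type and averaged, as in the leaderboard numbers above.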
A lot of exciting work has been done in the field of image segmentation recently, and there are plenty of ways in which the results could be further improved. For example, Google recently released its new image segmentation model DeepLab-v3+, reaching a new state of the art on the PASCAL VOC 2012 dataset. Kamnitsas et al. used an ensemble of various CNN architectures to win the BRATS 2017 competition.
In the medical field, there is a lot of unlabeled data that cannot be utilized using the method described above. A lot of work has been done recently using GANs for semi-supervised segmentation, making use of both labeled and unlabeled images. This would be very interesting to apply here as well.
The reasons to look at algorithmic solutions for segmentation become obvious after reading this blog post.
Firstly, there would be great value in being able to enable doctors to work more efficiently when it comes to cancer treatment of all sorts.
Secondly, due to the significant lack of skilled radiologists, making use of AI in this field could enable more doctors and nurses to perform tasks they would not otherwise have been able to perform, increasing human capabilities and allowing more people to be put to work. However, for AI to reach its potential within this area, it urgently needs to mature beyond its academic roots. We believe that our operational AI platform can transform this largely experimental field, moving AI projects from proofs of concept to real-world applications. This would mean speeding up the time-consuming segmentation process performed by doctors all around the world.
This project illustrates the potential in using AI to empower humans in their everyday tasks, and the tremendous amount of value this can add to the cancer treatment process worldwide. Whilst only at the very beginning of this development, introducing operational artificial intelligence to cases such as this, AI-technology could be a serious driving force in the fight against cancer.
Want to read more about AI within the medical field? Read the blog post "AI and Healthcare."