Using the DeepLab Convolutional Neural Network and Transfer Learning for Semantic Segmentation

What is DeepLab?

DeepLab is one of the most promising recent techniques for semantic image segmentation with Deep Learning. Semantic segmentation means understanding an image at the pixel level: every pixel is assigned a label, such that pixels with the same label share certain characteristics.

Applications of semantic segmentation include:

  • Face detection
  • Pedestrian detection
  • Lane detection
  • Tumor localization in medical images
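In code, a semantic segmentation output is simply a per-pixel label map. A minimal NumPy sketch (the classes and values below are illustrative, not from any real model):

```python
import numpy as np

# A tiny 4x4 "image": each entry of the label map assigns a class to one pixel.
# Hypothetical classes: 0 = background, 1 = person, 2 = road.
label_map = np.array([
    [0, 0, 2, 2],
    [0, 1, 2, 2],
    [1, 1, 2, 2],
    [1, 1, 0, 0],
])

# Pixels that share a label form one semantic region.
person_pixels = np.argwhere(label_map == 1)
print(len(person_pixels))  # number of pixels labelled "person"
```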

Most successful semantic segmentation systems of the previous decade relied on hand-crafted features combined with flat classifiers, such as Random Forests or Support Vector Machines. Over the past few years, the breakthroughs of Deep Learning in image classification were quickly transferred to the semantic segmentation task by models such as Fully Convolutional Networks (FCN), SegNet, RefineNet, and PSPNet.

However, DeepLab overcame two challenges in applying Deep Learning to semantic image segmentation:

  1. Reduced feature resolution
  2. Existence of objects at multiple scales

The main contribution of DeepLab is atrous convolution, popularly known as dilated convolution. Atrous convolution enlarges a filter's field of view without adding parameters or downsampling, so the network can compute denser feature maps and preserve detailed spatial information.
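The idea can be sketched in a few lines of NumPy (the 1-D function and toy signal are my own illustration, not DeepLab code): a filter with k taps at rate r spans k + (k-1)(r-1) input samples, so the receptive field grows while the number of weights stays the same.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D convolution where the kernel taps are spaced `rate` apart."""
    k = len(kernel)
    span = k + (k - 1) * (rate - 1)                  # effective receptive field
    out = []
    for start in range(len(signal) - span + 1):
        taps = signal[start : start + span : rate]   # every `rate`-th sample
        out.append(float(np.dot(taps, kernel)))
    return np.array(out)

x = np.arange(8, dtype=float)     # toy input signal
k = np.array([1.0, 1.0, 1.0])     # the same 3 weights in both cases

y1 = atrous_conv1d(x, k, rate=1)  # ordinary convolution, spans 3 samples
y2 = atrous_conv1d(x, k, rate=2)  # atrous, spans 5 samples with 3 weights
```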

To handle the problem of segmenting objects at multiple scales, DeepLab employs atrous convolutions in cascade or in parallel with multiple atrous rates to capture multi-scale context, a scheme known as Atrous Spatial Pyramid Pooling (ASPP).
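ASPP can be pictured as the same feature map flowing through parallel atrous branches, one per rate, whose outputs are then stacked. A toy 1-D NumPy sketch (the rates, kernel, and padding scheme here are illustrative, not DeepLab's actual configuration):

```python
import numpy as np

def atrous_same(signal, kernel, rate):
    """1-D atrous convolution, zero-padded so output length == input length."""
    k = len(kernel)
    pad = (k - 1) * rate // 2
    padded = np.pad(signal, pad)
    return np.array([
        float(np.dot(padded[i : i + (k - 1) * rate + 1 : rate], kernel))
        for i in range(len(signal))
    ])

x = np.arange(6, dtype=float)
kern = np.array([1.0, 1.0, 1.0])

# ASPP in miniature: run the same input through parallel branches with
# different atrous rates, then stack the branch outputs as "channels".
rates = (1, 2, 3)
aspp = np.stack([atrous_same(x, kern, r) for r in rates])
```

In the real model the branches are 2-D convolutions over deep feature maps; this sketch only shows the parallel-rates structure.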

These properties make DeepLab an ideal choice for the semantic segmentation step in our proposed approach.

Garment Segmentation Approach

The given task is to extract the garment from the mannequin in an image. To address this problem, we use semantic segmentation to identify the garment region in the image. The overall process is depicted in the following illustration.

Garment segmentation using trained DeepLab model.

For the semantic segmentation task, we identified the DeepLab CNN model as appropriate. For training and testing, we created our own private garment segmentation dataset. It consists of images of various garments and their corresponding segmented ground-truth images, as shown above. The ground-truth images were manually annotated by professionals using only two labels: 1 for garment pixels and 0 for background pixels. The following figure shows the training process at an abstract level.
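The annotation format can be illustrated with a toy NumPy array (the values are made up): the ground truth is a single-channel image whose pixels are 1 on the garment and 0 everywhere else.

```python
import numpy as np

# Toy 4x4 ground-truth annotation: 1 = garment pixel, 0 = background.
gt_mask = np.array([
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
], dtype=np.uint8)

num_classes = len(np.unique(gt_mask))  # two labels: garment and background
garment_area = int(gt_mask.sum())      # number of garment pixels
```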

Training DeepLab in TensorFlow on the garment dataset.

To speed up the overall training process, we take advantage of transfer learning. Transfer learning is a machine learning technique where a pre-trained model is used as the starting point for training a new model. We used a DeepLab checkpoint trained on the PASCAL VOC 2012 dataset.

The following code snippet demonstrates transfer learning with the DeepLab training script. Here, TF_CHECKPOINT specifies the location of the pre-trained checkpoint.

python "${WORK_DIR}"/train.py \
    --tf_initial_checkpoint="${TF_CHECKPOINT}" \
    --train_logdir="${TRAIN_LOGDIR}" \
    --dataset_dir="${DATASET_DIR}"


For a complete guide to the TensorFlow implementation of DeepLab and how to use it on your own dataset, see the following post.

How to use DeepLab in TensorFlow for object segmentation using Deep Learning

Once the DeepLab model has been trained on our garment dataset, it can segment the garment in a given image. Finally, this segmented output can be used as a mask to extract the garment region from the mannequin.
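Extracting the garment with the predicted mask is then a pixel-wise multiplication. A minimal NumPy sketch (the image and mask are toy values):

```python
import numpy as np

# Toy 2x2 RGB image and a predicted binary mask (1 = garment).
image = np.full((2, 2, 3), 200, dtype=np.uint8)
mask = np.array([[1, 0],
                 [0, 1]], dtype=np.uint8)

# Broadcasting the mask over the colour channels zeroes out the background,
# leaving only the garment pixels.
garment_only = image * mask[:, :, None]
```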

In the coming months, I will be sharing more of my experiences with Images & Deep Learning. Stay tuned and don’t forget to spare some claps if you like this article. It will encourage me immensely.