TensorFlow lets you use deep learning techniques to perform image segmentation, a crucial part of computer vision. There are many ways to perform image segmentation, including Convolutional Neural Networks (CNN), Fully Convolutional Networks (FCN), and frameworks like DeepLab and SegNet.
In this article, we explain the basics of image segmentation and provide two quick tutorials for building and training segmentation models in TensorFlow.
Image Segmentation in Deep Learning: Concepts and Techniques
Image segmentation involves dividing a visual input into segments to simplify image analysis. Segments represent objects or parts of objects, and comprise sets of pixels, or “super-pixels”. Image segmentation sorts pixels into larger components. There are three levels of image analysis:
Classification – categorizing the image into a class such as “people” or “animals”
Object detection – detecting objects within an image and drawing a rectangle around them
Segmentation – identifying parts of the image and understanding what object they belong to
There are three types of segmentation:
Semantic Segmentation, which classifies the pixels of an image into meaningful classes
Instance Segmentation, which identifies each object instance in the image separately; overlapping segments are allowed
Panoptic Segmentation, which also identifies the class and instance of each object in the image, but overlapping segments are NOT allowed: every pixel receives exactly one label
The following deep learning techniques are commonly used to power image segmentation tasks:
Convolutional Neural Networks (CNNs) – segments of an image can be fed as input to a CNN, which labels the pixels. The CNN cannot process the whole image at once; instead, it scans the image, looking at a small “filter” of several pixels at a time.
Fully Convolutional Networks (FCNs) – FCNs use only convolutional layers, so they can process varying input sizes. The final output layer has a large receptive field and corresponds to the height and width of the image, while the number of channels corresponds to the number of classes. FCNs classify every pixel to determine image context and the location of objects (a minimal sketch appears after this list).
DeepLab – an image segmentation framework that helps control signal decimation (reducing the number of samples and the amount of data the network must process) and aggregates features from images at different scales. DeepLab uses a ResNet architecture pre-trained on ImageNet for feature extraction, and a technique called Atrous Spatial Pyramid Pooling (ASPP) to process multi-scale information.
SegNet neural network – an architecture based on deep encoders and decoders, also known as semantic pixel-wise segmentation. It encodes the input image into low-resolution feature maps and recovers them, leveraging orientation invariance in the decoder, which generates the segmented image.
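To make the FCN idea concrete, here is a minimal Keras sketch of a fully convolutional model: every layer is convolutional, so the net accepts varying input sizes, and the output has one channel per class. The layer sizes and the NUM_CLASSES value are illustrative assumptions, not the architecture of any of the frameworks above.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 21  # assumption: set this to your dataset's class count

# Input height/width are left as None: an FCN accepts varying image sizes.
inputs = tf.keras.Input(shape=(None, None, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D(2)(x)   # downsample by 2
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D(2)(x)   # downsample by 4 in total
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
# A 1x1 convolution maps the features to one channel per class.
x = layers.Conv2D(NUM_CLASSES, 1)(x)
# Upsample back to the input resolution so every pixel gets class logits
# (assumes input dimensions divisible by 4).
outputs = layers.UpSampling2D(4, interpolation="bilinear")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```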
Quick Tutorial #1: FCN for Semantic Segmentation with Pre-Trained VGG16 Model
This tutorial implements a fully convolutional neural network (FCN). The input to the net is an RGB image, and the net produces a pixel-wise annotation as a matrix of the same height and width, where the value of each pixel corresponds to its class.
Now let's begin with our first example. You can download the code from the link below and follow the instructions there.
GitHub Source Code Link: TensorFlow
Begin by downloading a pre-trained VGG16 model from the Google Drive link here or the direct link here, and place it in the /Model_Zoo subfolder of the primary code folder.
The steps are summarized below:
1. Training
In: TRAIN.py
- Set the folder of the training images in Train_Image_Dir
- Set the folder of the ground truth labels in Train_Label_DIR
- Download a pre-trained VGG16 model and set its location in model_path
- Set the number of classes/labels in NUM_CLASSES
- Run the training script (an illustrative configuration follows this list)
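As a rough illustration, the variables at the top of TRAIN.py might be set like this; the paths and class count below are placeholder assumptions, not the repository's actual defaults.

```python
# Illustrative TRAIN.py configuration. Variable names follow the tutorial
# above; the paths and class count are placeholder assumptions.
Train_Image_Dir = "Data_Zoo/Train_Images/"   # folder of training images
Train_Label_DIR = "Data_Zoo/Train_Labels/"   # folder of ground-truth label maps
model_path      = "Model_Zoo/vgg16.npy"      # downloaded pre-trained VGG16 weights
NUM_CLASSES     = 4                          # number of classes/labels
```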
2. Predicting pixel-wise annotation using the trained VGG network
In: Inference.py
- Set Image_Dir to the folder where the input images for prediction are located
- Set the number of classes in NUM_CLASSES
- Set Pred_Dir to the folder where the output annotated images should be saved
- Run the script (an illustrative configuration follows this list)
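Similarly, a hypothetical Inference.py configuration could look as follows; the paths are again placeholder assumptions.

```python
# Illustrative Inference.py configuration; paths are placeholder assumptions.
Image_Dir   = "Data_Zoo/Test_Images/"   # input images to annotate
Pred_Dir    = "Output_Prediction/"      # where annotated images are saved
NUM_CLASSES = 4                         # must match the trained model
```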
3. Evaluating network performance using Intersection over Union (IoU)
In: Evaluate_Net_IOU.py
- Set Image_Dir to the folder where the input images for prediction are located
- Set the folder of the ground truth labels in Label_DIR. The label maps should be saved as PNG images with the same name as the corresponding image and a .png ending
- Set the number of classes in NUM_CLASSES
- Run the script (a generic IoU sketch follows this list)
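For reference, per-class IoU can be computed from a predicted label map and a ground-truth label map as in the generic numpy sketch below; this illustrates the metric itself, not the exact code in Evaluate_Net_IOU.py.

```python
import numpy as np

def per_class_iou(pred, label, num_classes):
    """Per-class intersection-over-union between two 2-D label maps.

    pred, label: integer arrays of the same shape, one class id per pixel.
    Returns an array of length num_classes (NaN where a class is absent
    from both maps).
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, label == c).sum()
        union = np.logical_or(pred == c, label == c).sum()
        if union > 0:
            ious[c] = intersection / union
    return ious

# Example: two tiny 2x2 label maps with classes 0 and 1.
print(per_class_iou(np.array([[0, 1], [1, 1]]),
                    np.array([[0, 1], [0, 1]]), num_classes=2))
```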
Quick Tutorial #2: Modifying the DeepLab Code to Train on Your Own Dataset
DeepLab is a semantic image segmentation technique based on deep learning, which uses an ImageNet pre-trained ResNet as its primary feature extractor network. Its modified ResNet blocks use atrous (dilated) convolutions rather than regular convolutions (see the sketch below).
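To illustrate the difference, the Keras sketch below compares a regular 3x3 convolution with an atrous one; the layer sizes here are arbitrary assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

# A regular 3x3 convolution versus an atrous (dilated) 3x3 convolution.
# With dilation_rate=2 the kernel samples every other pixel, so the
# receptive field grows to 5x5 with no extra parameters and no downsampling.
regular = layers.Conv2D(256, 3, padding="same")
atrous = layers.Conv2D(256, 3, padding="same", dilation_rate=2)

x = tf.random.normal([1, 65, 65, 3])       # dummy feature map
print(regular(x).shape, atrous(x).shape)   # both: (1, 65, 65, 256)
```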
Prerequisites: Before you begin, install one of the DeepLab implementations in TensorFlow.
GitHub Source Code Link: DeepLab2
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a unified and state-of-the-art TensorFlow codebase for dense pixel labeling tasks, including, but not limited to, semantic segmentation, instance segmentation, panoptic segmentation, depth estimation, and even video panoptic segmentation.
Deep labeling refers to solving computer vision problems by assigning a predicted value for each pixel in an image with a deep neural network.
For further reading and research, you can visit the following GitHub link: DeepLab2
For this article we will use the demo Google Colab notebook built by the DeepLab2 library to perform dense pixel labeling tasks. The models used in this colab perform panoptic segmentation, where the predicted value encodes both the semantic class and the instance label for every pixel (including both ‘thing’ and ‘stuff’ pixels).
Google Colab Link : https://colab.research.google.com/github/google-research/deeplab2/blob/main/DeepLab_Demo.ipynb
The following is a summary of tutorial steps:
Step 1: Imports and helper methods
Step 2: Define functions
The notebook defines three helper functions:

1. A colormap builder that creates the label colormap used in the Cityscapes segmentation benchmark. Returns: a 2-D numpy array with each row being a mapped RGB color (in uint8 range).
2. A color perturbation helper that perturbs a color with some noise. If `used_colors` is not None, it returns a color that has not appeared in it before. Args: color – a numpy array with three elements [R, G, B]; noise – an integer specifying the amount of perturbing noise (in uint8 range); used_colors – a set used to keep track of used colors; max_trials – an integer, the maximum number of trials to generate a random color; random_state – an optional np.random.RandomState which, if passed, is used to generate random numbers. Returns: a perturbed color that has not appeared in used_colors. A condensed sketch of this helper follows the list.
3. A helper method to colorize the output panoptic map. Args: panoptic_prediction – a 2-D numpy array, the panoptic prediction from the DeepLab model; dataset_info – a DatasetInfo object for the dataset associated with the model; perturb_noise – an integer, the amount of noise (in uint8 range) added to each instance of the same semantic class. Returns: colored_panoptic_map – a 3-D numpy array with last dimension 3, the colored panoptic prediction map; used_colors – a dictionary mapping semantic_ids to the set of colors used in `colored_panoptic_map`.
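For reference, here is a condensed sketch of the color-perturbation helper; it follows the docstring above but may differ from the notebook's exact code.

```python
import numpy as np

def perturb_color(color, noise, used_colors, max_trials=50,
                  random_state=np.random):
    """Perturb an [R, G, B] color with noise, avoiding colors in used_colors."""
    for _ in range(max_trials):
        # Sample integer noise in [-noise, noise] per channel; clip to uint8.
        random_color = color + random_state.randint(
            low=-noise, high=noise + 1, size=3)
        random_color = np.clip(random_color, 0, 255)
        if tuple(random_color) not in used_colors:
            used_colors.add(tuple(random_color))
            return random_color
    # Fall back to the original color if no unused color was found.
    used_colors.add(tuple(color))
    return color
```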
Step 3: Select a pretrained model
Step 4: Load the pretrained model
Step 5: Run on sample images
a) Upload an image from your local machine: click Choose File and select the image.
b) Run the next block of code; it will display the segmented image with labels. A hedged sketch of loading and running a model (steps 4–5) follows.
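The sketch below loads an exported DeepLab2 SavedModel and runs it on a single image. The model path, the sample file name, and the 'panoptic_pred' output key follow the demo notebook's conventions but may differ for other exports; treat this as an assumption-laden illustration, not the definitive API.

```python
import numpy as np
import tensorflow as tf
from PIL import Image

MODEL_DIR = "deeplab_model"  # placeholder: path to the downloaded model
model = tf.saved_model.load(MODEL_DIR)

# Read an RGB image and pass it to the model as a uint8 tensor.
image = np.array(Image.open("sample.jpg").convert("RGB"))  # placeholder image
output = model(tf.cast(image, tf.uint8))

# 'panoptic_pred' (assumed key): one panoptic label per pixel, encoding
# both semantic class and instance id.
panoptic_map = output["panoptic_pred"][0].numpy()
print(panoptic_map.shape)
```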