
PixelLib: Image and Video Segmentation [Maybe just a Quick One]

by Akis Loumpourdis, July 5th, 2021



Unprocessed image by DocuSign on Unsplash


The “Maybe just a quick one” series title is inspired by my most common reply to “Fancy a drink?”, which may or may not turn into a long night. Likewise, these posts are intended to be short, but I sometimes get carried away, so apologies in advance.


The PixelLib library

PixelLib is a library created by Ayoola Olafenwa that provides easy out-of-the-box solutions to perform object segmentation with just a few lines of code. It supports both videos and images.

There are two major types of segmentation that the library can help with:

  • Semantic: Objects belonging to the same class are segmented with the same colormap.

  • Instance: Individual instances of the same object class are segmented with different colormaps.


There is a variety of tasks that PixelLib can accomplish. In this article, we will go through some of them.


Prerequisites

Before starting, a few packages need to be installed: TensorFlow (version 2.0 or higher), OpenCV, and of course, PixelLib itself:


pip install opencv-python
pip install tensorflow
pip install pixellib --upgrade
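
A quick sanity check that everything installed correctly is to import the packages and print their versions (a minimal sketch; the version attributes used here are standard for both packages):

# Confirm the packages import and TensorFlow is version 2.x
import cv2
import tensorflow as tf
import pixellib

print("OpenCV:", cv2.__version__)
print("TensorFlow:", tf.__version__)
assert int(tf.__version__.split(".")[0]) >= 2, "TensorFlow 2.x is required"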

Models

For the use cases illustrated in this article, three models will be used:


Mask R-CNN

The h5 model file can be downloaded here.


Deeplabv3+ model trained on the PascalVOC dataset

PixelLib supports two Deeplabv3+ models: a Keras model and a TensorFlow model. The Keras model is extracted from the TensorFlow model’s checkpoint, and the original TensorFlow model performs better than the extracted Keras one, so the TensorFlow model will be used in this article. It can be downloaded here.


Xception model trained on the ade20k dataset

The h5 model file can be downloaded here.


Models and assets can be found on the repo’s releases page.



Everything is set up now. Let’s write some code, shall we?


The folder structure I will be using is this:


├── app.py
├── input_images
│   ├── clem-onojeghuo-L_hK813fu9k-unsplash.jpg
│   ├── docusign-yiW2yzZNnFo-unsplash.jpg
│   └── jen-theodore-C6LzqZakyp4-unsplash.jpg
└── models
    ├── deeplabv3_xception65_ade20k.h5
    ├── mask_rcnn_coco.h5
    └── xception_pascalvoc.pb

app.py: The Python script where all the code will go.

input_images: Images that will be used for demonstration purposes.

models: The saved models.


Import the necessary packages:

app.py

import cv2  # OpenCV, used here for camera capture

import pixellib
from pixellib.instance import instance_segmentation  # instance segmentation
from pixellib.semantic import semantic_segmentation  # semantic segmentation
from pixellib.tune_bg import alter_bg  # background alteration (image tuning)


Image segmentation

Let’s see how instance vs. semantic segmentation looks when applied to this image:


Photo by DocuSign on Unsplash


Instance segmentation using the Mask R-CNN model

segment_image = instance_segmentation()
segment_image.load_model("models/mask_rcnn_coco.h5")
segment_image.segmentImage("input_images/docusign-yiW2yzZNnFo-unsplash.jpg", output_image_name="instance_seg.jpg",
                           text_size=8, box_thickness=5, text_thickness=5,
                           show_bboxes=True)


output_image_name: Name under which the image will be saved.

show_bboxes: Show bounding boxes.

text_size, box_thickness, text_thickness: Size and thickness of the boxes and text.


Instance segmentation


Semantic segmentation using the Xception model trained on the ade20k dataset

segment_image = semantic_segmentation()
segment_image.load_ade20k_model("models/deeplabv3_xception65_ade20k.h5")
segment_image.segmentAsAde20k("input_images/docusign-yiW2yzZNnFo-unsplash.jpg", output_image_name="semantic_seg.jpg")


Semantic segmentation

Notice how the semantically segmented image uses the same colormap for the same object types.
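
If you would rather see the segmentation map blended with the original photo, segmentAsAde20k also takes an overlay flag (per the PixelLib documentation; treat the exact parameter as an assumption):

# Overlay the ade20k segmentation map on top of the source image
segment_image.segmentAsAde20k("input_images/docusign-yiW2yzZNnFo-unsplash.jpg",
                              overlay=True,
                              output_image_name="semantic_seg_overlay.jpg")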

Image tuning

The underlying object segmentation capabilities of PixelLib can be applied to accomplish image tuning tasks as well. For example, we can change the background of an image and replace it with another one. I will use this image to serve as the foreground:


Foreground

Photo by Jen Theodore on Unsplash


…and this as a background:


Photo by Clem Onojeghuo on Unsplash


The model used for this task is the Deeplabv3+ model trained on the PascalVOC dataset.


change_bg = alter_bg(model_type="pb")
change_bg.load_pascalvoc_model("models/xception_pascalvoc.pb")

#Change background
change_bg.change_bg_img(f_image_path="input_images/jen-theodore-C6LzqZakyp4-unsplash.jpg",
                        b_image_path="input_images/clem-onojeghuo-L_hK813fu9k-unsplash.jpg", output_image_name="new_img.jpg")


f_image_path and b_image_path: The foreground and background image paths, respectively.

output_image_name: Name under which the image will be saved.


Let’s see the outcome:


Altered background

That is not too bad, is it? The little fella was teleported to a lovely sandy beach!
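
Replacing the background is not the only trick alter_bg knows: the same loaded model can also blur, gray out, or flood-fill the background with a solid color. The method names below come from the PixelLib documentation, so treat the exact signatures as assumptions:

# Reuse the alter_bg instance loaded above for other background effects
change_bg.blur_bg("input_images/jen-theodore-C6LzqZakyp4-unsplash.jpg",
                  extreme=True,  # strongest of the low/moderate/extreme levels
                  output_image_name="blur_bg.jpg")

change_bg.gray_bg("input_images/jen-theodore-C6LzqZakyp4-unsplash.jpg",
                  output_image_name="gray_bg.jpg")

change_bg.color_bg("input_images/jen-theodore-C6LzqZakyp4-unsplash.jpg",
                   colors=(0, 128, 0),  # RGB tuple for a solid green background
                   output_image_name="color_bg.jpg")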


Real-time image segmentation from camera capture

I will apply the same instance segmentation logic, using the Mask R-CNN model, but this time the input source will be a camera capturing in real time. Only a small amount of code needs to change:


capture = cv2.VideoCapture(0)
segment_video = instance_segmentation()
segment_video.load_model("models/mask_rcnn_coco.h5")
segment_video.process_camera(capture, frames_per_second=15, output_video_name="output.mp4", show_frames=True,
                             show_bboxes=True,
                             frame_name="frame",
                             extract_segmented_objects=False,
                             save_extracted_objects=False)


You might have noticed that I have included the extract_segmented_objects and save_extracted_objects arguments here, which are False by default. If these are set to True, the extracted objects will be saved as images.
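
The same two flags are also available on segmentImage for still images. Here is a minimal sketch of extracting the detected objects from the earlier DocuSign photo; the structure of the returned segmask dictionary (in particular the "extracted_objects" key) is my reading of the PixelLib source, so treat it as an assumption:

segment_image = instance_segmentation()
segment_image.load_model("models/mask_rcnn_coco.h5")

# segmentImage returns the raw results plus the annotated output image
segmask, output = segment_image.segmentImage(
    "input_images/docusign-yiW2yzZNnFo-unsplash.jpg",
    show_bboxes=True,
    extract_segmented_objects=True,
    save_extracted_objects=True,  # writes each object out as its own image file
    output_image_name="instance_seg_extracted.jpg")

print(len(segmask["extracted_objects"]), "objects extracted")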


And here are some screenshots taken from the processed video:


I am detected successfully. Thank you, PixelLib for not detecting my dark circles


Turning my head, still detected successfully

One cup

Two cups

Holding a book

PixelLib manages to identify and segment some common objects with high confidence.
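
Pre-recorded videos can be processed in much the same way: the instance segmentation class also exposes a process_video method (signature per the PixelLib documentation, and input_video.mp4 below is a placeholder path, so treat both as assumptions):

segment_video = instance_segmentation()
segment_video.load_model("models/mask_rcnn_coco.h5")

# Same idea as process_camera, but frames are read from a file on disk
segment_video.process_video("input_video.mp4",
                            frames_per_second=15,
                            show_bboxes=True,
                            output_video_name="segmented_video.mp4")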

Final thoughts and next steps

PixelLib is a library that promises to simplify object segmentation on images and videos with just a few lines of code, and it delivers on that promise. It is a good entry point to computer vision due to its simplicity and ease of use. If you are feeling adventurous, you can spend some time examining the source code. I would also recommend reading the paper, Simplifying Object Segmentation with PixelLib Library, available on paperswithcode; it is an easy read.

In this article, I covered some of the use cases and examples. There are, however, more examples in the documentation, such as using your own dataset to train the models. Feel free to have a look if you are up for it.


Useful links