Unprocessed image by DocuSign on Unsplash
The “Maybe just a quick one” series title is inspired by my most common reply to “Fancy a drink?”, which may or may not turn into a long night. Likewise, these posts are intended to be short, but I sometimes get carried away, so apologies in advance.
PixelLib is a library created by Ayoola Olafenwa that provides easy out-of-the-box solutions to perform object segmentation with just a few lines of code. It supports both videos and images.
There are two major types of segmentation that the library can help with:
Semantic: Objects of the same class are segmented with the same colormap, so all pixels belonging to that class share the same values.
Instance: Each instance of the same object class is segmented with a different colormap.
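The distinction is easiest to see on the label maps themselves. Here is a toy NumPy sketch (my own illustration, not PixelLib output) of an image containing two objects of the same class:

```python
import numpy as np

# Toy 4x4 "image" containing two separate objects of the same class.
# Semantic segmentation: every pixel of that class gets the same label (1),
# so both objects share one colormap entry.
semantic_map = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

# Instance segmentation: each object gets its own id (1 and 2),
# so the two objects are coloured differently.
instance_map = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 2],
    [0, 0, 2, 2],
])

print(np.unique(semantic_map[semantic_map > 0]))   # [1]   -> one class
print(np.unique(instance_map[instance_map > 0]))   # [1 2] -> two instances
```

One class, but two distinct instances of it: that is the whole difference in a nutshell.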
There is a variety of tasks that PixelLib can accomplish. In this article, we will go through some of them.
Before starting, some necessary packages need to be installed: TensorFlow (version 2 or above), OpenCV, and of course, PixelLib itself:
pip install opencv-python
pip install tensorflow
pip install pixellib --upgrade
For the use cases illustrated in this article, three models will be used:
Mask R-CNN
The h5 model file can be downloaded here.
Deeplabv3+ model trained on pascalvoc dataset
PixelLib supports two deeplabv3+ models, Keras and TensorFlow. The Keras model is extracted from the TensorFlow model’s checkpoint and performs worse than the original. For that reason, the TensorFlow model will be used in this article. It can be downloaded here.
Xception model trained on ade20k dataset
The h5 model file can be downloaded here.
Models and assets can be found on the repo’s releases page.
Everything is set up now. Let’s write some code, shall we?
The folder structure I will be using is this:
├── app.py
├── input_images
│ ├── clem-onojeghuo-L_hK813fu9k-unsplash.jpg
│ ├── docusign-yiW2yzZNnFo-unsplash.jpg
│ └── jen-theodore-C6LzqZakyp4-unsplash.jpg
└── models
├── deeplabv3_xception65_ade20k.h5
├── mask_rcnn_coco.h5
└── xception_pascalvoc.pb
app.py: The Python script where all the coding will happen.
input_images: Images that will be used for demonstration purposes.
models: The saved models.
Import the necessary packages:
import pixellib
from pixellib.instance import instance_segmentation
from pixellib.semantic import semantic_segmentation
import cv2
from pixellib.tune_bg import alter_bg
Let’s see how instance and semantic segmentation look when applied to this image:
Instance segmentation using the Mask R-CNN model
segment_image = instance_segmentation()
segment_image.load_model("models/mask_rcnn_coco.h5")
segment_image.segmentImage("input_images/docusign-yiW2yzZNnFo-unsplash.jpg",
                           output_image_name="instance_seg.jpg",
                           show_bboxes=True,
                           text_size=8, box_thickness=5, text_thickness=5)
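Besides writing the annotated image to disk, segmentImage also returns its results programmatically, as far as I can tell a dictionary with entries such as "class_ids" and "scores" alongside the output array. A small helper of my own to turn such a dictionary into something readable (the results dict below is a hypothetical stand-in, not real model output):

```python
def summarize_detections(results, class_names):
    """Pair each detected class id with its confidence score.

    `results` is assumed to be a dict with "class_ids" and "scores"
    entries, as in the first value segmentImage returns.
    """
    return [
        (class_names[class_id], round(float(score), 2))
        for class_id, score in zip(results["class_ids"], results["scores"])
    ]

# Hypothetical results for an image with two detected people:
fake_results = {"class_ids": [1, 1], "scores": [0.99, 0.97]}
coco_names = ["BG", "person"]
print(summarize_detections(fake_results, coco_names))
# [('person', 0.99), ('person', 0.97)]
```

This is handy when you want to filter detections by confidence rather than just look at the rendered output.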
output_image_name: Name under which the segmented image will be saved.
show_bboxes: Whether to show bounding boxes around the detected objects.
text_size, box_thickness, text_thickness: Size of the label text and thickness of the boxes and text.
Semantic segmentation using the Xception model trained on ade20k dataset
segment_image = semantic_segmentation()
segment_image.load_ade20k_model("models/deeplabv3_xception65_ade20k.h5")
segment_image.segmentAsAde20k("input_images/docusign-yiW2yzZNnFo-unsplash.jpg", output_image_name="semantic_seg.jpg")
Notice how the semantically segmented image uses the same colormap for the same object types.
The underlying object segmentation capabilities of PixelLib can be applied to accomplish image tuning tasks as well. For example, we can change the background of an image and replace it with another one. I will use this image to serve as the foreground:
Photo by Jen Theodore on Unsplash
…and this as a background:
Photo by Clem Onojeghuo on Unsplash
The model used for this task is the Deeplabv3+ model trained on pascalvoc dataset.
change_bg = alter_bg(model_type="pb")
change_bg.load_pascalvoc_model("models/xception_pascalvoc.pb")
#Change background
change_bg.change_bg_img(f_image_path="input_images/jen-theodore-C6LzqZakyp4-unsplash.jpg",
b_image_path="input_images/clem-onojeghuo-L_hK813fu9k-unsplash.jpg", output_image_name="new_img.jpg")
f_image_path and b_image_path: The foreground and background image paths, respectively.
output_image_name: Name under which the new image will be saved.
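Conceptually, background replacement boils down to compositing: keep the foreground pixels wherever the segmentation mask marks the subject, and take the background pixels everywhere else. A rough NumPy sketch of the idea (my own illustration, not PixelLib’s actual implementation):

```python
import numpy as np

def composite(foreground, background, mask):
    """Keep foreground pixels where mask is True, background elsewhere.

    foreground, background: HxWx3 uint8 arrays of the same shape.
    mask: HxW boolean array, True where the segmented subject is.
    """
    # mask[..., None] broadcasts the HxW mask across the 3 colour channels.
    return np.where(mask[..., None], foreground, background)

# Tiny 2x2 demo: the top row is "subject", the bottom row gets replaced.
fg = np.full((2, 2, 3), 255, dtype=np.uint8)    # white subject
bg = np.zeros((2, 2, 3), dtype=np.uint8)        # black "beach"
mask = np.array([[True, True], [False, False]])
out = composite(fg, bg, mask)
```

The real model of course produces the mask itself; the compositing step is the easy part.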
Let’s see the outcome:
That is not too bad, is it? The little fella was teleported to a lovely sandy beach!
I will apply the same instance segmentation logic, using the Mask R-CNN model, but this time, the input source will be a camera capturing in real time. Only a small amount of code needs to change:
capture = cv2.VideoCapture(0)
segment_video = instance_segmentation()
segment_video.load_model("models/mask_rcnn_coco.h5")
segment_video.process_camera(capture, frames_per_second=15,
                             output_video_name="output.mp4",
                             show_frames=True, show_bboxes=True,
                             frame_name="frame",
                             extract_segmented_objects=False,
                             save_extracted_objects=False)
You might have noticed that I have included the extract_segmented_objects and save_extracted_objects arguments here, both of which are False by default. If they are set to True, the extracted objects will be saved as images.
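Extraction itself amounts to masking out each instance and cropping its bounding box. A minimal NumPy sketch of that step (my own illustration, assuming Mask R-CNN-style (y1, x1, y2, x2) boxes, not PixelLib’s internals):

```python
import numpy as np

def extract_object(image, mask, box):
    """Crop one segmented object: zero out non-object pixels, then crop.

    image: HxWx3 array; mask: HxW boolean array for this instance;
    box: (y1, x1, y2, x2) bounding box, as in Mask R-CNN "rois".
    """
    y1, x1, y2, x2 = box
    cut = np.where(mask[..., None], image, 0)   # black out the background
    return cut[y1:y2, x1:x2]                    # crop to the bounding box

# 4x4 demo image with a 2x2 "object" in the top-left corner.
img = np.arange(48, dtype=np.uint8).reshape(4, 4, 3)
mask = np.zeros((4, 4), dtype=bool)
mask[0:2, 0:2] = True
crop = extract_object(img, mask, (0, 0, 2, 2))
print(crop.shape)  # (2, 2, 3)
```

Each crop like this is what ends up saved to disk when save_extracted_objects is enabled.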
And here are some screenshots taken from the processed video:
PixelLib manages to identify and segment some common objects with high confidence.
PixelLib is a library that promises to simplify object segmentation on images and videos with just a few lines of code, and it does not fail to deliver. It is a good entry point to computer vision thanks to its simplicity and ease of use. If you are feeling adventurous, you can spend some time examining the source code. I would also recommend reading the paper, Simplifying Object Segmentation with PixelLib Library, available on paperswithcode. It is effortless to read.
In this article, I covered some of the use cases and examples. There are, however, more examples in the documentation, such as using your own dataset to train the models. Feel free to have a look if you are up for it.