🎁 Releasing “Supervisely Person” dataset for teaching machines to segment humans

Hello, Machine Learning community!

We are proud to announce the Supervisely Person Dataset. It’s publicly available and free for academic purposes.

For AI to be free we need not just Open Source, but also a strong Open Data movement.

— Andrew Ng

We absolutely agree with him. And let us extend this idea. There is a lot of research on Deep Neural Networks for the semantic segmentation task. But in most cases data is much harder and more expensive to collect than it is to develop and apply the algorithms that run on it.

That is why we also need specially designed platforms that cover the entire ML workflow, from developing training datasets to training and deploying neural networks.

A few examples from the “Supervisely Person Dataset”

We believe that our work will help developers, researchers and businesses and be perceived not only as yet another public dataset but also as a set of innovative approaches and instruments for creating large training datasets faster.

Next, we are going to cover all aspects of how we built this dataset from scratch. Before we continue, let me show you some interesting facts:

The dataset consists of 5711 images with 6884 high-quality annotated person instances.
All steps below were done inside Supervisely without any coding.
More importantly, these steps were performed by our in-house annotators with no machine learning (ML) expertise at all. Data scientists just controlled and managed the process.
The annotation team consisted of two members and the whole process took only 4 days.

Supervisely is a Machine Learning platform with data science smarts built in. It allows data scientists to focus on real innovation and leave routine work to others (yes, training well-known NN architectures is routine work too).

The problem to solve

Person segmentation is a critical task in analysing humans in images for many real-world applications: action recognition, self-driving cars, video surveillance, mobile applications and much more.

We at DeepSystems did internal research in this field and realized that there is a lack of data for this task. You may ask: what about public datasets like COCO, Pascal, Mapillary and others? To answer this question, I’d better show you a few examples:

A few examples of human annotation from the COCO dataset

The quality of human segmentation in most public datasets did not satisfy our requirements, so we had to create our own dataset with high-quality annotations. I will show you how we did it below.

Step 0: upload and prepare public datasets as a starting point to train the initial NN

Upload public datasets to the system: PascalVoc, Mapillary. Our “Import” module supports most public datasets and converts them to a unified JSON-based format called the Supervisely format :)

Then we execute a DTL (“Data Transformation Language”) query that does several things: merge datasets -> skip images without person objects -> crop each person from images -> filter them by width and height -> split to train/test sets.

It may seem that there is a lot of publicly available data, but as we mentioned earlier, it has some hidden problems: low-quality annotations, low resolution and so on.

Thus, we construct our first training dataset.
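The sequence of operations in that query can be sketched in plain Python. This is not DTL syntax, only an illustration of the same steps. The record layout (`name`, `objects`, `class`, `bbox`), the size thresholds and the split ratio are assumptions made up for the example.

```python
import random

def build_person_crops(datasets, min_w=50, min_h=100, test_fraction=0.2, seed=0):
    """Merge datasets, keep person objects, filter crops by size, split train/test."""
    # 1. Merge all datasets into one list of image records.
    merged = [image for dataset in datasets for image in dataset]

    # 2. Skip images that contain no person objects.
    with_person = [
        image for image in merged
        if any(obj["class"] == "person" for obj in image["objects"])
    ]

    # 3. Create one crop record per person bounding box (x0, y0, x1, y1).
    crops = []
    for image in with_person:
        for obj in image["objects"]:
            if obj["class"] != "person":
                continue
            x0, y0, x1, y1 = obj["bbox"]
            crops.append({
                "source": image["name"],
                "bbox": (x0, y0, x1, y1),
                "width": x1 - x0,
                "height": y1 - y0,
            })

    # 4. Filter crops that are too small to be useful for training.
    crops = [c for c in crops if c["width"] >= min_w and c["height"] >= min_h]

    # 5. Shuffle reproducibly and split into train and test sets.
    rng = random.Random(seed)
    rng.shuffle(crops)
    n_test = int(len(crops) * test_fraction)
    return crops[n_test:], crops[:n_test]
```

Fixing the random seed keeps the train/test split reproducible between runs, which matters when several experiments are compared against the same test set.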

Step 1: train NN

We will train a slightly customized UNet-like architecture.

Unet_v2 architecture

loss = BinaryCrossEntropy + (1 - dice).
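For readers who want to see the loss spelled out, here is a minimal sketch on a flattened mask, using only the standard library. The soft (probability-based) Dice formulation and the smoothing constant `eps` are assumptions; the exact variant used in Unet_v2 is not specified here.

```python
import math

def bce_dice_loss(probs, targets, eps=1e-7):
    """Binary cross-entropy plus (1 - soft Dice) over a flattened mask.

    probs   -- predicted foreground probabilities in [0, 1]
    targets -- ground-truth labels, 0 or 1
    """
    # Clip probabilities so that log() never receives 0.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]

    # Mean binary cross-entropy over all pixels.
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1 - p)
        for p, t in zip(clipped, targets)
    ) / len(clipped)

    # Soft Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
    intersection = sum(p * t for p, t in zip(clipped, targets))
    dice = (2 * intersection + eps) / (sum(clipped) + sum(targets) + eps)

    return bce + (1 - dice)
```

The cross-entropy term scores each pixel independently, while the Dice term scores the overlap of the whole mask, so the sum penalizes both per-pixel errors and poor overall overlap.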

This network is fast to train, fairly accurate, and easy to implement and customize, which lets us experiment a lot. Supervisely can be distributed across multiple nodes in a cluster.

Thus we can train several NNs simultaneously. All NNs on our platform also support multi-GPU training. Each training experiment with an input resolution of 256×256 took no more than 15 minutes.

Step 2: prepare data to annotate

We didn’t have a collection of unlabeled images, so we decided to download one from the Web. We implemented a service (github) that downloads data from a great photo stock, Pexels (thank you guys for the really cool work).

So we downloaded around 15k images with tags related to our task, uploaded them to Supervisely and resized them via a DTL query, because their resolution was extremely high.

Step 3: apply NN to unlabeled images

The architecture we used does not support instance segmentation. We deliberately didn’t use Mask-RCNN, because its segmentation quality near object edges is low.

That’s why we decided on a two-step scheme: apply Faster-RCNN (based on NasNet) to detect all persons in an image, and then, for each person bounding box, apply the segmentation network to segment the dominant object. This approach allows us both to simulate instance segmentation and to segment object edges accurately.
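The two-step scheme can be sketched as follows. The detector and the segmentation network are passed in as plain callables (`detect_persons` and `segment_crop` are hypothetical names, not Supervisely or Faster-RCNN APIs), images are nested lists, and the segmenter is assumed to return a mask of the same size as the crop it receives.

```python
def segment_instances(image, detect_persons, segment_crop):
    """Return one full-size binary mask per detected person.

    detect_persons(image) -> list of boxes (x0, y0, x1, y1)
    segment_crop(crop)    -> binary mask with the same size as the crop
    """
    height, width = len(image), len(image[0])
    masks = []
    for x0, y0, x1, y1 in detect_persons(image):
        # Clamp the box to the image and skip empty boxes.
        x0, y0 = max(0, x0), max(0, y0)
        x1, y1 = min(width, x1), min(height, y1)
        if x1 <= x0 or y1 <= y0:
            continue

        # Step 1 result: cut the detected person out of the image.
        crop = [row[x0:x1] for row in image[y0:y1]]

        # Step 2: segment the dominant object inside the crop.
        crop_mask = segment_crop(crop)

        # Paste the crop mask back into an empty full-size mask,
        # so every detection becomes a separate instance mask.
        full = [[0] * width for _ in range(height)]
        for dy, mask_row in enumerate(crop_mask):
            full[y0 + dy][x0:x1] = list(mask_row)
        masks.append(full)
    return masks
```

Because each bounding box gets its own mask, overlapping people stay separate instances even though the segmentation network itself only predicts foreground versus background.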

3-min video of applying model and manual correction of segmentation

We experimented with different resolutions: the higher the resolution we pass to the NN, the better the result it produces. We didn’t care about the total inference time, because Supervisely supports inference that is distributed across multiple machines. For the task of automatic pre-annotation it is more than enough.

Step 4: manual validation and correction

All inference results appear in the dashboard in real time. Our operators preview all results and label each image with one of a few tags: bad prediction, prediction to correct, good prediction. This process is fast because they only need a few keyboard shortcuts for “next image” and “assign tag to image”.

How we tag images: left is a bad prediction, middle is a prediction that needs light manual correction, right is a good prediction.

Images tagged as “bad prediction” are skipped. Further work continues with the images we need to correct.
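In code terms, the triage after tagging amounts to something like the sketch below. The `(image_name, tag)` pair layout is an assumption for illustration; the tag strings are the three tags described above.

```python
def triage(tagged_images):
    """Split tagged images into work queues and compute the suitable share.

    tagged_images -- list of (image_name, tag) pairs
    """
    # Images that need light manual correction go to the annotators.
    to_correct = [name for name, tag in tagged_images if tag == "prediction to correct"]
    # Images with good predictions are accepted as they are.
    accepted = [name for name, tag in tagged_images if tag == "good prediction"]
    # Everything else ("bad prediction") is skipped.
    suitable = len(to_correct) + len(accepted)
    share = suitable / len(tagged_images) if tagged_images else 0.0
    return to_correct, accepted, share
```

The returned share is a handy number to track across iterations: it shows how much of each new batch the current NN handles well enough to be worth keeping.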

How to correct Neural Network predictions

Manual correction requires significantly less time than annotation from scratch.

Step 5: add results to training dataset and go to Step 1

That’s all.
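Put together, Steps 1 to 5 form a loop. Here is a rough sketch of its control flow, where `train`, `predict` and `review` are hypothetical callables standing in for NN training, inference, and the manual validation and correction done by annotators.

```python
def annotation_loop(train_set, unlabeled_batches, train, predict, review):
    """Grow a training set by repeated pre-annotation and human review.

    train(train_set)      -> model
    predict(model, image) -> predicted annotation
    review(predictions)   -> list of (image, annotation) pairs kept after
                             manual validation and correction
    """
    train_set = list(train_set)
    model = train(train_set)                      # Step 1: train the NN
    for batch in unlabeled_batches:
        # Step 3: apply the current NN to unlabeled images.
        predictions = [(image, predict(model, image)) for image in batch]
        # Step 4: humans drop bad predictions and correct the rest.
        train_set.extend(review(predictions))     # Step 5: add the results
        model = train(train_set)                  # back to Step 1
    return model, train_set
```

Each pass retrains on a larger and cleaner dataset, so the predictions offered to annotators in the next pass need less correction.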

Some hints:

When we applied the NN that was trained only on public data, the share of “suitable” images (marked as “good prediction” or “prediction to correct”) was about 20%.
After three fast iterations this number increased to 70%.
We did 6 iterations in total and the final NN became pretty accurate :-)
Before training, we added a small band along object edges to smooth jagged edges and performed several augmentations: flip, random crop, rotation by a random angle and color transformations.
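One detail worth showing for the augmentations: geometric transforms have to be applied identically to the image and to its mask, otherwise the annotation no longer lines up with the pixels. A minimal sketch for flip and random crop on nested lists follows (rotation and color transformations are left out, and the 0.5 flip probability is an assumption).

```python
import random

def augment_pair(image, mask, crop_h, crop_w, rng):
    """Apply the same random horizontal flip and random crop to image and mask."""
    # Horizontal flip with probability 0.5, applied to both arrays.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]

    # Pick one crop window and use it for both the image and the mask.
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    image = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    mask = [row[left:left + crop_w] for row in mask[top:top + crop_h]]
    return image, mask
```

Passing the random generator in explicitly (for example `random.Random(0)`) makes an augmentation run reproducible.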

As you can see, this approach is applicable to many computer vision tasks, even if you need to annotate several object classes in images.

Bonus

This dataset helps us improve our AI-powered annotation tool by customizing it to segment humans. In our latest release we added the ability to train the NN for this tool inside the system. Here is a comparison of the class-agnostic tool and its customized version. It is available, and you can try it on your own data.

How to access the dataset

Sign up for Supervisely, go to the “Import” tab -> “Datasets library”. Click the “Supervisely Person” dataset and enter a name for the new project. Then click the “three dots” button -> “Download as json” -> “Start” button. That’s all. The download (~7 GB) may take about 15 minutes.

How to download

Conclusion

It was very interesting to watch people without any ML background go through all these steps. We as Deep Learning specialists saved a lot of time, and our annotation team became more productive in terms of annotation speed and quality.

We hope that the Supervisely platform will help every deep learning team build AI products faster and more easily.

Let me list the most valuable Supervisely features we used in this work:

“Import” module to upload all public datasets
“Data Transformation Language” to manipulate, merge and augment datasets
“Neural networks” module to use Faster-RCNN and UnetV2
“Statistics” module to automatically get useful insights from data we have
“Annotation tools” are like Photoshop for training data
“Collaboration” feature to combine workers into annotation teams, assign them tasks and control the entire process

Feel free to ask any questions! Thank you!

If you found this article interesting, then let’s help others too. More people will see it if you give it some 👏.