Hello, Machine Learning community!
We are proud to announce Supervisely Person Dataset. It’s publicly available and free for academic purposes.
For AI to be free we need not just Open Source, but also a strong Open Data movement.
We absolutely agree with this idea, and let us extend it. There is a lot of research on Deep Neural Networks for the semantic segmentation task, but in most cases data is much harder and more expensive to collect than the algorithms that run on it are to develop and apply.
That is why we also need specially designed platforms that cover the entire ML workflow, from developing training datasets to training and deploying neural networks.
We believe that our work will help developers, researchers, and businesses, and that it will be perceived not just as yet another public dataset but as a set of innovative approaches and instruments for creating large training datasets faster.
Next, we are going to cover all aspects of how we built this dataset from scratch. Before we continue, let me show you some interesting facts:
Supervisely is a Machine Learning platform with data science smarts built in. It allows data scientists to focus on real innovations and leave routine work to others (yes, training well-known NN architectures is routine work too).
Person segmentation is a critical task in analyzing humans in images for many real-world applications: action recognition, self-driving cars, video surveillance, mobile applications, and much more.
We at DeepSystems conducted internal research in this field and realized that there is a lack of data for this task. You may ask: what about public datasets like COCO, Pascal, Mapillary, and others? To answer this question, I'll show you a few examples:
The quality of human segmentation in most public datasets did not satisfy our requirements, so we had to create our own dataset with high-quality annotations. I will show you how we did it below.
First, we uploaded public datasets to the system: PascalVOC and Mapillary. Our "Import" module supports most public datasets and converts them to a unified json-based format called the Supervisely format :)
Then we executed a DTL ("Data Transformation Language") query to perform a few things: merge datasets -> skip images without person objects -> crop each person from images -> filter the crops by width and height -> split into train/test sets.
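For readers who prefer code to arrows, here is a rough Python sketch of what that pipeline does. This is illustrative only: the real DTL query is a json config, and the field names and thresholds below are made up for the example.

```python
import random

def build_training_set(datasets, min_side=100, train_ratio=0.9):
    """Sketch of the DTL pipeline (not real DTL syntax):
    merge -> skip images without persons -> crop persons ->
    filter by size -> split into train/test."""
    samples = []
    for dataset in datasets:                          # merge datasets
        for image in dataset:
            persons = [o for o in image["objects"] if o["class"] == "person"]
            if not persons:                           # skip images without persons
                continue
            for obj in persons:                       # crop each person
                x1, y1, x2, y2 = obj["bbox"]
                w, h = x2 - x1, y2 - y1
                if w < min_side or h < min_side:      # filter by width and height
                    continue
                samples.append({"image_id": image["id"], "bbox": obj["bbox"]})
    random.shuffle(samples)                           # split into train/test
    split = int(len(samples) * train_ratio)
    return samples[:split], samples[split:]
```

The point of expressing this as a declarative query in Supervisely is that the same five steps run without writing or debugging any of this glue code by hand.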
It seems like there is a lot of publicly available data, but as we mentioned earlier, it has some hidden problems: low annotation quality, low resolution, and so on.
Thus we constructed our first training dataset.
We will train a slightly customized UNet-like architecture.
loss = BinaryCrossEntropy + (1 − Dice)
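For clarity, here is a minimal NumPy sketch of that loss (our actual implementation lives inside the training framework and may differ in details such as per-batch averaging):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    # Soft Dice between a predicted probability map and a binary mask.
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def bce_dice_loss(pred, target, eps=1e-7):
    # loss = BinaryCrossEntropy + (1 - Dice)
    pred = np.clip(pred, eps, 1.0 - eps)   # avoid log(0)
    bce = -(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean()
    return bce + (1.0 - dice_coefficient(pred, target, eps))
```

The BCE term drives per-pixel accuracy while the Dice term directly optimizes mask overlap, which helps when the person occupies only a small fraction of the image.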
This network is fast to train, pretty accurate, and easy to implement and customize, which lets us experiment a lot. Supervisely can be distributed across multiple nodes in a cluster, so we can train several NNs simultaneously, and all NNs in our platform support multi-GPU training. Each training experiment at an input resolution of 256×256 took no more than 15 minutes.
We didn’t have a collection of unlabeled images, so we decided to download them from the Web. We implemented a service (github) that downloads data from the great photo stock Pexels (thank you guys for really cool work). So we downloaded around 15k images with tags related to our task, uploaded them to Supervisely, and performed a resize operation via a DTL query because the originals were very high resolution.
The architecture we used does not support instance segmentation. We deliberately didn’t use Mask-RCNN, because the quality of its segmentation near object edges is low.
That’s why we decided on a two-step scheme: apply Faster-RCNN (based on NASNet) to detect all persons in an image, and then apply the segmentation network to each person bounding box to segment the dominant object. This approach allows us both to simulate instance segmentation and to segment object edges accurately.
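A sketch of how the two steps compose, with `detect_fn` and `segment_fn` as stand-ins for the detection and segmentation networks (the function names and data layout are illustrative, not our production code):

```python
import numpy as np

def segment_persons(image, detect_fn, segment_fn):
    """Two-step scheme: a detector proposes person boxes, then a
    segmentation net segments the dominant object in each crop.
    detect_fn stands in for Faster-RCNN, segment_fn for the
    UNet-like model."""
    instance_masks = []
    for (x1, y1, x2, y2) in detect_fn(image):
        crop = image[y1:y2, x1:x2]
        crop_mask = segment_fn(crop)           # binary mask for the crop
        full = np.zeros(image.shape[:2], dtype=bool)
        full[y1:y2, x1:x2] = crop_mask         # paste back into image coords
        instance_masks.append(full)
    return instance_masks                      # one mask per person instance
```

Because each box gets its own mask, overlapping people come out as separate instances even though the segmentation network itself is class-agnostic and single-object.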
We experimented with different resolutions: the higher the resolution we feed to the NN, the better the result it produces. We didn’t care about total inference time, because Supervisely supports inference distributed across multiple machines. For the task of automatic pre-annotation this is more than enough.
All inference results appear in the dashboard in real time. Our operators preview every result and label each image with one of a few tags: bad prediction, prediction to correct, or good prediction. This process is fast because it needs only a couple of keyboard shortcuts: “next image” and “assign tag to image”.
Images tagged as “bad prediction” are skipped. Further work continues with the images we need to correct.
Manual correction requires significantly less time than annotation from scratch.
As you can see, such an approach is applicable to many computer vision tasks, even if you need to annotate several object classes in your images.
This dataset helps us improve our AI-powered annotation tool: we customized it to segment humans. In our latest release we added the ability to train a NN for this tool inside the system. Here is a comparison of the class-agnostic tool and its customized version. It is available now, and you can try it on your own data.
Sign up for Supervisely and go to the “Import” tab -> “Datasets library”. Click the “Supervisely Person” dataset and enter a name for the new project. Then click the “three dots” button -> “Download as json” -> “Start”. That’s all. The download may take about 15 minutes (~7 GB).
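If you want to poke at the downloaded annotations, here is a simplified Python sketch of pulling person polygons out of a Supervisely-format json file. Treat it as a minimal illustration rather than an exhaustive parser; check the format documentation for the full schema.

```python
import json

def person_polygons(ann_json):
    """Extract exterior polygons of 'person' objects from one
    Supervisely-format annotation (json string). Simplified: ignores
    interior (hole) contours and non-polygon geometry."""
    ann = json.loads(ann_json)
    polygons = []
    for obj in ann.get("objects", []):
        if obj.get("classTitle") != "person":
            continue
        points = obj.get("points", {})
        polygons.append(points.get("exterior", []))
    return polygons
```

Each exterior polygon is a list of [x, y] vertices in image coordinates, so it drops straight into common mask-rasterization utilities.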
It was very interesting to watch how people without any ML background went through all these steps. We as Deep Learning specialists saved a lot of time, and our annotation team became more productive in terms of both annotation speed and quality.
We hope that the Supervisely platform will help every deep learning team build AI products faster and easier.
Let me list the most valuable Supervisely features we used in this work:
Feel free to ask any questions! Thank you!
If you found this article interesting, help others find it too: more people will see it if you give it some 👏.