Hello, Machine Learning community!
We are proud to announce the Supervisely Person Dataset. It's publicly available and free for academic purposes.
For AI to be free we need not just Open Source, but also a strong Open Data movement.
-- Andrew Ng
We absolutely agree with him. And let us extend this idea. There is a lot of research on deep neural networks for semantic segmentation. But in most cases, data is much harder and more expensive to collect than it is to develop and apply the algorithms that run on it.
That is why we also need specially designed platforms that cover the entire ML workflow, from developing training datasets to training and deploying neural networks.
A few examples from the "Supervisely Person Dataset"
We believe that our work will help developers, researchers and businesses and be perceived not only as yet another public dataset but also as a set of innovative approaches and instruments for creating large training datasets faster.
Next, we are going to cover all aspects of how we built this dataset from scratch. Before we continue, let me show you some interesting facts:
- The dataset consists of 5711 images with 6884 high-quality annotated person instances.
- All steps below were done inside Supervisely without any coding.
- More importantly, these steps were performed by our in-house annotators with no machine learning (ML) expertise at all. Data scientists just controlled and managed the process.
- The annotation team consisted of two members, and the whole process took only 4 days.
Supervisely is a machine learning platform that includes data science smarts. It allows data scientists to focus on real innovations and leave routine work to others (yes, training well-known NN architectures is routine work too).
The problem to solve
Person segmentation is a critical task in analyzing humans in images for many real-world applications: action recognition, self-driving cars, video surveillance, mobile applications and much more.
We at DeepSystems did internal research in this field and realized that there is a lack of data for this task. You may ask: what about public datasets like COCO, Pascal, Mapillary and others? To answer this question, I'd better show you a few examples:
A few examples of human annotations from the COCO dataset
The quality of human segmentation in most public datasets did not satisfy our requirements, so we had to create our own dataset with high-quality annotations. I will show you how we did it below.
Step 0: upload and prepare public datasets as a starting point to train the initial NN
Upload public datasets to the system: PascalVoc, Mapillary. Our "Import" module supports most public datasets and converts them to a unified JSON-based format called the Supervisely format :)
Then we execute a DTL ("Data Transformation Language") query to perform a few things: merge datasets -> skip images without person objects -> crop each person from images -> filter the crops by width and height -> split into train/test sets.
It may seem that there is a lot of publicly available data, but as we mentioned earlier, it has some hidden problems: low-quality annotations, low resolution and so on.
Thus, we construct our first training dataset.
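The DTL query itself is not shown here, so the following is only a plain-Python sketch of the same sequence of steps, not DTL syntax. The dictionary layout, the `PersonCrop` type, the size thresholds and the split ratio are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class PersonCrop:
    image_id: str
    box: tuple  # (left, top, right, bottom) in pixels


def build_training_crops(datasets, min_width=32, min_height=64,
                         test_fraction=0.1, seed=0):
    """Merge datasets, keep person crops that are large enough, split train/test.

    Each dataset is assumed to be a list of dicts of the form
    {"id": str, "objects": [{"class": str, "box": (l, t, r, b)}, ...]}.
    """
    # Merge all datasets into one list of images.
    merged = [image for dataset in datasets for image in dataset]

    crops = []
    for image in merged:
        # Images without person objects contribute no crops, so they are skipped.
        person_boxes = [obj["box"] for obj in image["objects"]
                        if obj["class"] == "person"]
        for left, top, right, bottom in person_boxes:
            # Filter crops by width and height (thresholds are hypothetical).
            if right - left >= min_width and bottom - top >= min_height:
                crops.append(PersonCrop(image["id"], (left, top, right, bottom)))

    # Shuffle deterministically, then split into train and test sets.
    rng = random.Random(seed)
    rng.shuffle(crops)
    n_test = int(len(crops) * test_fraction)
    return crops[n_test:], crops[:n_test]
```

The function only records which boxes to crop; the pixel-level cropping is left out to keep the sketch short.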
Step 1: train NN
We train a slightly customized UNet-like architecture.
Unet_v2 architecture
loss = BinaryCrossEntropy + (1 - dice).
This network is fast to train, pretty accurate, and easy to implement and customize, which allows us to experiment a lot. Supervisely can be distributed across multiple nodes in a cluster.
Thus we can train a few NNs simultaneously. All NNs on our platform also support multi-GPU training. Each training experiment with an input resolution of 256*256 took no more than 15 minutes.
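For readers who want to see the loss from Step 1 spelled out, here is a minimal NumPy sketch of binary cross-entropy plus (1 - dice). The soft Dice definition and the `eps` smoothing term are assumptions; the exact formulation used inside the platform is not stated in this post.

```python
import numpy as np


def bce_dice_loss(probs, target, eps=1e-7):
    """Binary cross-entropy + (1 - soft Dice) for one predicted mask.

    probs:  predicted foreground probabilities in [0, 1]
    target: ground-truth mask with values 0 or 1, same shape as probs
    """
    # Clip probabilities so the logarithms stay finite.
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)

    # Mean binary cross-entropy over all pixels.
    bce = -np.mean(target * np.log(probs)
                   + (1.0 - target) * np.log(1.0 - probs))

    # Soft Dice coefficient: 2 * |A ∩ B| / (|A| + |B|).
    intersection = np.sum(probs * target)
    dice = (2.0 * intersection + eps) / (np.sum(probs) + np.sum(target) + eps)

    return bce + (1.0 - dice)
```

The cross-entropy term scores every pixel independently, while the Dice term rewards overlap with the whole object, which helps when the person covers only a small part of the crop.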
Step 2: prepare data to annotate
We didn't have a collection of unlabeled images, so we decided to download one from the Web. We implemented a service (github) that downloads data from a great stock photo site, Pexels (thank you guys for the really cool work).
So, we downloaded around 15k images with tags related to our task, uploaded them to Supervisely and resized them via a DTL query because their resolution was very high.
Step 3: apply NN to unlabeled images
The architecture we used does not support instance segmentation. We deliberately didn't use Mask-RCNN, because its segmentation quality near object edges is low.
That's why we decided on a two-step scheme: apply Faster-RCNN (based on NasNet) to detect all persons in an image, and then, for each person bounding box, apply the segmentation network to segment the dominant object. This approach allows us both to simulate instance segmentation and to segment object edges accurately.
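The two-step scheme can be sketched roughly as follows. This is an illustration only: `detect_persons` and `segment_crop` are hypothetical callables standing in for the detector and the segmentation network, and their signatures are assumptions rather than a Supervisely API.

```python
import numpy as np


def segment_instances(image, detect_persons, segment_crop):
    """Simulate instance segmentation with a detector plus a binary segmenter.

    image:          H x W x C array
    detect_persons: hypothetical callable, image -> list of
                    (left, top, right, bottom) pixel boxes
    segment_crop:   hypothetical callable, crop -> boolean mask of the crop's
                    height and width
    Returns one full-size boolean mask per detected person.
    """
    height, width = image.shape[:2]
    instance_masks = []
    for left, top, right, bottom in detect_persons(image):
        # Clamp the box to the image bounds and drop empty boxes.
        left, top = max(0, left), max(0, top)
        right, bottom = min(width, right), min(height, bottom)
        if right <= left or bottom <= top:
            continue

        # Segment the dominant object inside the box.
        crop_mask = segment_crop(image[top:bottom, left:right])

        # Paste the crop mask back into a full-size mask for this instance.
        full_mask = np.zeros((height, width), dtype=bool)
        full_mask[top:bottom, left:right] = crop_mask
        instance_masks.append(full_mask)
    return instance_masks
```

Because every box gets its own mask, overlapping people stay separate instances even though the segmentation network itself only distinguishes person from background.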
3-min video of applying model and manual correction of segmentation
We experimented with different resolutions: the higher the resolution we pass to the NN, the better the result it produces. We didn't care about the total inference time, because Supervisely supports inference distributed across multiple machines. For the task of automatic pre-annotation it is more than enough.
Step 4: manual validation and correction
All inference results appear in the dashboard in real time. Our operators preview all results and label images with one of a few tags: bad prediction, prediction to correct, good prediction. This process is fast because they only need a few keyboard shortcuts for "next image" and "assign tag to image".
How we tag images: left is a bad prediction, middle is a prediction that needs light manual correction, right is a good prediction.
Images tagged as "bad prediction" are skipped. Further work continues with the images we need to correct.
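The routing that follows from these three tags is simple enough to write down. The sketch below assumes the review results are available as `(image_id, tag)` pairs, which is a made-up representation for illustration and not the platform's export format.

```python
BAD = "bad prediction"
TO_CORRECT = "prediction to correct"
GOOD = "good prediction"


def route_by_tag(tagged_images):
    """Split reviewed images into a correction queue and an accepted list.

    tagged_images: iterable of (image_id, tag) pairs (hypothetical layout).
    Images tagged as bad predictions are skipped.
    """
    to_correct, accepted = [], []
    for image_id, tag in tagged_images:
        if tag == TO_CORRECT:
            to_correct.append(image_id)
        elif tag == GOOD:
            accepted.append(image_id)
        elif tag != BAD:
            raise ValueError(f"unknown tag: {tag!r}")
    return to_correct, accepted


def kept_fraction(tagged_images):
    """Share of reviewed images that were not skipped."""
    tags = [tag for _, tag in tagged_images]
    if not tags:
        return 0.0
    return sum(tag in (GOOD, TO_CORRECT) for tag in tags) / len(tags)
```

Tracking the kept fraction from one iteration to the next is a cheap way to see whether the network is actually improving.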
How to correct Neural Network predictions
Manual correction requires significantly less time than annotation from scratch.
Step 5: add results to training dataset and go to Step 1
That's all.
Some hints:
- When we applied the NN that was trained only on public data, the share of "suitable" images (tagged as "good prediction" or "prediction to correct") was about 20%.
- After three fast iterations this number increased to 70%.
- We did 6 iterations in total, and the final NN became pretty accurate :-)
- Before training, we added a small band along object edges to smooth jagged edges and performed several augmentations: flip, random crop, rotation by a random angle and color transformations.
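The last hint does not say how the edge band was built or how it was used during training, so the sketch below is only one plausible reading: it marks a thin band of pixels around each object boundary, which could then be ignored or down-weighted in the loss. The band width, the 4-neighbourhood and the helper names are assumptions. A horizontal flip is included as the simplest of the listed augmentations.

```python
import numpy as np


def _dilate(mask):
    """Grow a boolean mask by one pixel (4-neighbourhood)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])


def _erode(mask):
    """Shrink a boolean mask by one pixel (4-neighbourhood)."""
    # Edge padding keeps objects that touch the image border from eroding there.
    p = np.pad(mask, 1, mode="edge")
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])


def edge_band(mask, width=1):
    """Boolean band of roughly 2 * width pixels straddling the object boundary."""
    outer = inner = np.asarray(mask, dtype=bool)
    for _ in range(width):
        outer = _dilate(outer)
        inner = _erode(inner)
    return outer & ~inner


def random_flip(image, mask, rng):
    """Flip image and mask left-right together with probability 0.5."""
    if rng.random() < 0.5:
        return image[:, ::-1], mask[:, ::-1]
    return image, mask
```

Whatever augmentation is applied, the image and its mask have to be transformed together, which is why `random_flip` takes both.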
As you can see, this approach is applicable to many computer vision tasks, even if you need to annotate several object classes in your images.
Bonus
This dataset helps us improve our AI-powered annotation tool by customizing it to segment humans. In our latest release we added the ability to train the NN for this tool inside the system. Here is a comparison of the class-agnostic tool and its customized version. It is available, and you can try it on your data.
How to access the dataset
Sign up for Supervisely, go to the "Import" tab -> "Datasets library". Click the "Supervisely Person" dataset and enter a name for the new project. Then click the "three dots" button -> "Download as json" -> "Start" button. That's all. The download may take about 15 minutes (~7 GB).
How to download
Conclusion
It was very interesting to see how people without any ML background went through all these steps. We, as deep learning specialists, saved a lot of time, and our annotation team became more productive in terms of annotation speed and quality.
We hope that the Supervisely platform will help every deep learning team make AI products faster and more easily.
Let me list the most valuable Supervisely features we used in this work:
- The "Import" module to upload all public datasets
- "Data Transformation Language" to manipulate, merge and augment datasets
- The "Neural networks" module to use Faster-RCNN and UnetV2
- The "Statistics" module to automatically get useful insights from the data we have
- "Annotation tools" that are like Photoshop for training data
- The "Collaboration" feature to combine workers into annotation teams, assign them tasks and control the entire process
Feel free to ask any questions! Thank you!
If you found this article interesting, then let's help others too. More people will see it if you give it some claps.