Spoiler: sure, you can! 💪
Motivation
Person detection is everywhere. If you work in computer vision, chances are you have faced this task before. Pick almost any industry and person detection will come up. Some examples:
- Self-driving. Identifying pedestrians in a road scene
- Retail. Analysing visitors' behaviour within a supermarket
- Fashion. Identifying specific brands and the people who wear them
- Security. Restricting access for certain people to certain places
- Mobile apps. Finding a person and applying a cool filter
But how hard is the task today?
Well, 5 years ago, the dominant solutions were built with OpenCV, using cascade classifiers on top of Haar-like features. These detectors took time and effort to build, and the detection quality was not very good by today’s standards.
In the Deep Learning Era, the combination of feature engineering and a simple classifier has been left behind (at least when it comes to computer vision), and neural networks dominate the field.
Actually, there are a number of person detection implementations out there on GitHub. For example, take a look at this and this repo.
Nevertheless, even today, companies keep contacting us (DeepSystems.ai) and asking us to help them with this particular task. Hopefully, after reading this blog post, you will find it feasible to run a person detector that works for your task.
Let’s choose the tools
We will use the Supervise.ly platform to address the person detection task. The reasons, besides promo, are:
- It will take us 5 minutes to get an initial solution
- No need to write code or jump back and forth between various developer tools
- We get out of the box: a bunch of pre-trained models, visualization and deployment
How to approach the task
Actually, we have two ways to address the task: (1) use a pre-trained model or (2) train our own person detector.
With Supervise.ly you can go both ways but, for simplicity, we will focus on the first: we will use a NASNet-based Faster R-CNN model that is pre-trained on the COCO dataset.
So, our high-level plan is the following:
I. Set up a person detector model
II. Detect persons on your images to check the quality
III. Deploy the model for production applications
Let’s start…
Step by step guide
First of all, we need to go to the Supervise.ly website and sign in. Then, just follow the step-by-step guide below.
I. Set up a person detector model
Setting up a detector model is easy. To do that, we need to connect a GPU machine to Supervise.ly and then pick one of the pre-trained models.
1. Connect your GPU machine to the Supervise.ly platform
(1) click the “Connect your first node” button (2) run the selected command in your terminal (3) check that your computer is connected to the platform
Go to the Cluster page and click the “Connect your first node” button (1). You will then see the following window (2). Just copy the command and execute it in your terminal. A Python agent will be installed on your machine, and the machine will appear in the list of available nodes (3).
2. Pick a pre-trained model
(1) go to the Model Zoo page (2) pick your detector (3) check the “My Models” list
Go to the “Neural networks” -> “Model Zoo” page (1). There you will see a bunch of pre-trained models for semantic segmentation and object detection tasks. If you hover the cursor over “FasterRCNN NasNet”, you will see the “Add model” button (2). After you click it, the model will be available in the “My models” list (3).
Now the Faster R-CNN detector is ready. The next step is to check how it works on your images.
II. Detect persons on your images to check the quality
Before you deploy the model as an API, it is a good idea to visualize the neural network’s predictions to understand whether it fits your requirements. It is super easy with Supervise.ly: drag & drop your images and run the inference process with a few clicks.
1. Import your images
(1) go to the “Import” -> “Upload” page and drag & drop your images (2) define the project name and start the import (3) check the “Projects” page
Go to the import page and drag & drop the folder with your test images (1). Name the project where your images will be kept. In our case, the project name is “test_images”. Then click the start import button (2). After the import process is finished, go to the “Projects” page to check that the project with your images has been created (3).
2. Run inference process
(1) click the “Test” button (2) choose a project (3) define inference parameters
Now let’s test our person detector on your images. Go to the “My models” page and click the “Test” button (1). Then pick the project with the images to detect persons on (in our case, the “test_images” project) and click the “Next” button (2). Next, specify the name of the project where the detection results will be stored. In our case, it’s the “inf_test_images” project (3).
The only thing left is to select an inference mode. Select “inference_full_image” in the list. The last step is to replace this line:
"save_classes": "__all__",
with this line:
"save_classes": ["person"],
Then click the “Start inference” button.
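If you ever need to make the same change outside the web editor, it can be scripted. The snippet below is only a sketch: the helper name `keep_only_classes` is made up for illustration, and the one-key configuration is a stand-in, because the real inference configuration shown by Supervise.ly contains other keys that are not reproduced here.

```python
import json


def keep_only_classes(config_text, class_names):
    """Return the config with "save_classes" restricted to the given classes.

    All other keys in the configuration are left untouched.
    """
    config = json.loads(config_text)
    config["save_classes"] = list(class_names)
    return json.dumps(config, indent=2)


# Stand-in config (assumption): only the key mentioned in the text is shown.
default_config = '{"save_classes": "__all__"}'

person_only_config = keep_only_classes(default_config, ["person"])
print(person_only_config)
```

The resulting text can be pasted back into the configuration editor in place of the default one.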
3. Check the results
(1) click on the resulting project (2) look at the predictions
After the inference process is finished, you will see “inf_test_images” on the projects page (1). To visually check the results (2), click on the “inf_test_images” project and then on a dataset within this project.
Now that we know our model meets the requirements, let’s go to the final step: model deployment.
III. Deploy the model for production applications
In most cases, once we are satisfied with the detection quality, we need to use the model from a custom environment via an API. The instructions below describe how to deploy the model for production applications.
1. Deploy person detection model
(1) click the “Deploy” button (2) specify and submit deployment parameters (3) make sure the task is completed
Go to the “My models” page and click the “Deploy” button (1). Then, after specifying the target GPU machine and device, click the “Submit” button (2). As a result, a new task will be created and, as soon as it is completed (3), you can send API requests to your person detection model.
2. Send API requests
(1) click the “Deploy API Info” item (2) get deployment information
Before calling the API, we need to get the token and URL. To do that, on the “Cluster -> Tasks” page, click “Deploy API info” in the context menu (1). On the next page, you will see all the information needed to use our detection model via the API (2).
More specifically, here (2) we can see:
- API token. RsiYTrSBsyE5BIXRYYCFBLJf13JqVQ4NeEUUxX2oE1SdkwgdpmErjZ0tHEKljadILv8cQrosxMVmirJVOOf025mR8XB88feSRDbbeAYpKL2MwJ1MAZtJ2PfideN4UmNP
- URL. https://app.supervise.ly/public/api/v1/models/435/deploy/upload
3. Usage example.
It’s evident, but let me say it anyway: your API token and URL will differ from the ones above.
For example, suppose that you have the image “dl_heroes.jpg”
Yann LeCun, Geoffrey Hinton, Yoshua Bengio, Andrew Ng
So, if you run the following command in a terminal:
curl -XPOST -H "X-API-KEY: YourToken" -F "image=@dl_heroes.jpg;type=image/jpeg" YourUrl
then Supervise.ly will give you back the detection results in JSON format:
[{"tags":[],"description":"","objects":[{"bitmap":{"origin":[],"np":[]},"type":"rectangle","classTitle":"person","description":"","tags":[],"points":{"exterior":[[343,87],[493,375]],"interior":[]},"score":0.999502420425415},{"bitmap":{"origin":[],"np":[]},"type":"rectangle","classTitle":"person","description":"","tags":[],"points":{"exterior":[[0,94],[149,375]],"interior":[]},"score":0.9994213581085205},{"bitmap":{"origin":[],"np":[]},"type":"rectangle","classTitle":"person","description":"","tags":[],"points":{"exterior":[[247,96],[367,377]],"interior":[]},"score":0.9987866282463074},{"bitmap":{"origin":[],"np":[]},"type":"rectangle","classTitle":"person","description":"","tags":[],"points":{"exterior":[[138,96],[256,378]],"interior":[]},"score":0.99868243932724},{"bitmap":{"origin":[],"np":[]},"type":"rectangle","classTitle":"person","description":"","tags":[],"points":{"exterior":[[100,133],[129,177]],"interior":[]},"score":0.9136056900024414}],"size":{"width":506,"height":380}}]
The JSON above lists all the detected objects, including their coordinates and confidence levels. After visualization, we get:
Visualization of predictions returned by API
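The curl call above can also be made from a script. Here is a minimal Python sketch that uses only the standard library. The helper names `build_multipart` and `detect_persons` are made up for illustration and are not part of any Supervise.ly client; the form field name, the header and the content type are simply copied from the curl command.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body (what `curl -F` does)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return body, f"multipart/form-data; boundary={boundary}"


def detect_persons(image_path, url, token):
    """POST an image to the deployed model and return the parsed JSON reply."""
    with open(image_path, "rb") as image_file:
        image_bytes = image_file.read()
    body, content_type = build_multipart(
        "image", os.path.basename(image_path), image_bytes
    )
    request = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"X-API-KEY": token, "Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (replace the placeholders with your own values):
# results = detect_persons("dl_heroes.jpg", "YourUrl", "YourToken")
```

Nothing is sent until `detect_persons` is called, so the sketch is safe to paste into a larger script.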
To make life even easier, we provide a Python notebook that implements the API calls and visualises the detection results. We encourage you to play with it!
Jupyter notebook (not much code here)
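If you only need the boxes and not the picture, the JSON response can be unpacked in a few lines. The sketch below uses the response from the example above, trimmed to the fields it reads. It assumes that the two “exterior” points are the top-left and bottom-right corners given as [x, y]; the values are consistent with the 506x380 image size, but that reading is our interpretation. The helper name `extract_boxes` is also just illustrative.

```python
import json

# The response from the example above, trimmed to the fields used below.
RESPONSE_TEXT = """
[{"objects": [
  {"classTitle": "person", "points": {"exterior": [[343, 87], [493, 375]]}, "score": 0.999502420425415},
  {"classTitle": "person", "points": {"exterior": [[0, 94], [149, 375]]}, "score": 0.9994213581085205},
  {"classTitle": "person", "points": {"exterior": [[247, 96], [367, 377]]}, "score": 0.9987866282463074},
  {"classTitle": "person", "points": {"exterior": [[138, 96], [256, 378]]}, "score": 0.99868243932724},
  {"classTitle": "person", "points": {"exterior": [[100, 133], [129, 177]]}, "score": 0.9136056900024414}
 ],
 "size": {"width": 506, "height": 380}}]
"""


def extract_boxes(response_text, class_title="person", min_score=0.0):
    """Return boxes of the given class whose score is at least min_score."""
    boxes = []
    for image_result in json.loads(response_text):
        for obj in image_result["objects"]:
            if obj["classTitle"] != class_title or obj["score"] < min_score:
                continue
            # Assumption: exterior = [[left, top], [right, bottom]] in pixels.
            (left, top), (right, bottom) = obj["points"]["exterior"]
            boxes.append({
                "left": left, "top": top, "right": right, "bottom": bottom,
                "score": obj["score"],
            })
    return boxes


all_people = extract_boxes(RESPONSE_TEXT)
confident_people = extract_boxes(RESPONSE_TEXT, min_score=0.95)
print(len(all_people), len(confident_people))
```

With this response, a threshold of 0.95 keeps four of the five boxes and drops the small one with a score of about 0.91. The right threshold depends on your application, so treat 0.95 as a placeholder.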
Back to the main question
Recall the question in the title: “Can you solve a person detection task in 10 minutes?”
- The quick answer is “Yes”, just follow the instructions above.
- A more thoughtful answer is “It depends”. The devil is, as always, in the details.
Again, the number of possible apps where person detection is needed is huge. Below are the three most common factors that cause headaches:
- Hardware constraints. For some apps it’s OK to use a desktop computer with a high-end GPU onboard. Other apps have to work on a mobile phone or inside a robot. In that case, we need to use a small and fast neural network at the expense of model accuracy.
- Real-time requirements. For example, in the self-driving industry the software has to work in real time. In that case, even the latest GPU is still not powerful enough to run state-of-the-art implementations of Faster R-CNN. So, again, we have to sacrifice accuracy and pick a simpler model.
- Specific conditions. If we build a security app then, very likely, we have to spot unwanted persons at night. There are no guarantees that out-of-the-box detectors will solve this task. A lot of other variations are possible: different weather, camera angles, or the fact that only a small part of a person is visible. The good news is that we can train models to work well in specific conditions, but it might take some extra work.
In the future, we are going to publish a series of blog posts that address the more complicated scenarios, including training custom object detectors.
If you found this article interesting, give it some 👏, so that more people can see it!