The CDC officially recommends wearing face masks (even though not everyone complies). Meanwhile, governments in European countries such as Spain and Ukraine, as well as certain regions of Italy, require everyone, young and old, to wear masks at all times: when shopping, walking a dog, or simply going outside. Breaking these rules can result in a hefty fine.

Here at Fulcrum, a development agency, we came up with a curious idea (partly because of the quarantine, which left us with more free time than usual). We wanted to check whether it was technically possible to recognize if people actually wear masks on the streets. To do that, we decided to use online webcams located all over the world.

Let me make this clear from the start: this is not a commercial project, but a curious experiment. Our goal was to check how viable the approach is. At no point did we pursue mass surveillance.

So, in just a few weeks, we created a neural network that can process images and video footage and recognize people wearing masks. Pretty accurately, I must say.

Here’s what we got in the end.

Technologies inside

To build our neural network, we used several open-source solutions: TensorFlow 2 Nightly, OpenCV 2, Keras, and YOLOv3. The project is also available on GitHub.

YOLOv3 is ‘the brains’ behind the system. It runs on TensorFlow Nightly with built-in Keras, and we used these technologies specifically for training models.

We used OpenCV to process images and draw ‘squares’ (bounding boxes) on the photos and videos.

Our project consists of two separate programs, written in different programming languages.

Program 1.

It’s used for creating labels and composing datasets and annotations. The software is written in Node.js and uses:

• opencv4nodejs

• elementtree

• keras-js

Program 2.

This software is used for training models. It is written in Python and uses:

• A modified version of the latest YOLOv3

• Python 3.6+

• opencv-python

• tensorflow 2.0.0-beta1 / nightly

Training Models

As mentioned before, YOLOv3 is the ‘brain’ of our entire system. It’s an open-source project that we found on GitHub. The program requires the following parameters: anchors, labels, models, sizes, batch size, jitter, and datasets.
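To make the parameter list concrete, here is a minimal Python sketch that bundles those parameters into a JSON-style config, similar in spirit to the configs/mask.json file passed to the commands later. The key names (model, pretrained, train, net_size, train_annot_folder and so on), the folder paths, and the default values are our assumptions for illustration, not the project's actual schema. The anchors are the standard YOLOv3 COCO defaults, and 416 is YOLOv3's usual input size; setting the min, max and network sizes to one value is purely a simplification.

```python
import json

# Default YOLOv3 (COCO) anchors: nine width, height pairs in pixels.
COCO_ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45,
                59, 119, 116, 90, 156, 198, 373, 326]


def make_config(labels, anchors, net_size=416, batch_size=8, jitter=0.3,
                pretrained="yolov3.weights", image_dir="dataset/imgs",
                annot_dir="Annotations"):
    """Bundle the YOLOv3 training parameters into a JSON-serialisable dict.

    Key names, folders and defaults are illustrative assumptions, not the
    project's actual schema.
    """
    if len(anchors) % 2:
        raise ValueError("anchors must be a flat list of width, height pairs")
    return {
        "model": {"anchors": anchors, "labels": labels, "net_size": net_size},
        "pretrained": {"darknet_format": pretrained},  # starting weights
        "train": {
            "min_size": net_size,      # smallest input size during training
            "max_size": net_size,      # largest input size during training
            "batch_size": batch_size,  # images per training step
            "jitter": jitter,          # False disables random cropping
            "train_image_folder": image_dir,
            "train_annot_folder": annot_dir,
        },
    }


if __name__ == "__main__":
    print(json.dumps(make_config(["mask"], COCO_ANCHORS), indent=2))
```

Run as a script, it prints the config; saving it with json.dump gives a file you could point a -c flag at.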

• Anchors are predefined bounding-box shapes (width and height pairs) that the network shifts, widens, or narrows to fit the objects it detects.

• Labels are the classes of objects we are looking for in the image. In our case, there is just one: a mask.

• Models are used for training. At first, it’s crucial to start from the pre-trained yolov3.weights file. This model itself shouldn’t be trained further; it’s used only for its structure and annotations.

• Sizes define the minimum, maximum, and network input sizes of the images.

• Batch size is the number of images processed together in a single training step.

• Jitter controls how much the images are randomly cropped during training (we typically use false or 0.3).

• Datasets are the actual images and their annotations.
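The source doesn't say how the anchors were chosen. A common approach, described in the YOLOv2 and YOLOv3 papers, is to run k-means over the widths and heights of the labeled boxes, using 1 - IoU as the distance so that large and small boxes are treated fairly. The plain-Python sketch below illustrates that idea; it is not the project's code, and YOLOv3 normally uses k=9 (three anchors per output scale).

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given only as (width, height), aligned at one corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(sizes, k, iterations=100, seed=0):
    """Cluster (width, height) pairs into k anchors with 1 - IoU as distance."""
    rng = random.Random(seed)
    anchors = rng.sample(sizes, k)  # start from k randomly chosen boxes
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in sizes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:  # keep an anchor that attracted no boxes
                new_anchors.append(anchors[i])
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:  # assignments stopped changing
            break
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])  # smallest area first
```

The resulting pairs, flattened, would go into the anchors parameter.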

How to Generate Datasets

At this stage, we need to collect images and process them. Initially, we scraped photos from Google using a simple tool called Picture Google Grabber.

Once you have your collection of images, you need to create labels and annotations. For this, we used Labelbox, which let us mark the precise location of the masks. Labelbox is pretty useful, since it generates a file with the needed data (file names, mask locations, time spent). Later on, we feed this data into one of our programs.

Still, it has its downsides: its JSON structure is heavily customized, and it doesn’t include image dimensions. Therefore, we had to use opencv4nodejs to process the images and obtain their dimensions.
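Because the export lacks image dimensions, something has to read them from the image files. The project did this with opencv4nodejs; in Python with OpenCV, cv2.imread(path).shape gives (height, width, channels). As a dependency-free alternative, the sketch below (our own, not part of the project) reads width and height straight from PNG and JPEG headers using only the standard library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def image_dimensions(data):
    """Return (width, height) parsed from the header bytes of a PNG or JPEG file."""
    if data[:8] == PNG_SIGNATURE:
        # PNG: signature, 4-byte chunk length, b"IHDR", then width and height.
        if data[12:16] != b"IHDR":
            raise ValueError("PNG is missing its IHDR chunk")
        return struct.unpack(">II", data[16:24])
    if data[:2] == b"\xff\xd8":  # JPEG start-of-image marker
        return _jpeg_dimensions(data)
    raise ValueError("unsupported image format")


def _jpeg_dimensions(data):
    i = 2  # skip the SOI marker
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("corrupt JPEG: expected a marker")
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            i += 2
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        # SOF0-SOF15 hold the frame size; C4 (DHT), C8 (JPG), CC (DAC) do not.
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length  # jump over this segment
    raise ValueError("no frame header found in JPEG data")


def image_dimensions_from_file(path):
    """Read an image file and parse its dimensions from the header."""
    with open(path, "rb") as f:
        return image_dimensions(f.read())
```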

We also used elementtree to compose the XML tree structure and set the needed parameters. Afterward, we simply wrapped this in a loop so that it would process many images in one go.

All the results were saved into the Annotations folder. In the end, we had complete datasets with all the needed annotations and a clean structure. All these technologies are built into our first app (written in Node.js).
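The conversion step can be sketched with Python's standard-library xml.etree.ElementTree, which the Node elementtree package is modeled on. The source doesn't name the annotation format, so we assume the Pascal VOC XML layout that many YOLOv3 training scripts read. The input record shape (file, boxes with left/top/width/height) is a hypothetical simplification of a Labelbox export, not its real schema, and the image sizes are passed in because in the project they came from opencv4nodejs.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def to_voc_xml(record, width, height, depth=3):
    """Build a Pascal VOC-style annotation tree for one image.

    `record` is a hypothetical, simplified label record, e.g.
    {"file": "1.jpg", "boxes": [{"label": "mask", "left": 10, "top": 20,
                                 "width": 30, "height": 40}]}
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = record["file"]
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for box in record["boxes"]:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = box["label"]
        bndbox = ET.SubElement(obj, "bndbox")
        # Convert left/top/width/height into corner coordinates.
        corners = (
            ("xmin", box["left"]),
            ("ymin", box["top"]),
            ("xmax", box["left"] + box["width"]),
            ("ymax", box["top"] + box["height"]),
        )
        for tag, value in corners:
            ET.SubElement(bndbox, tag).text = str(value)
    return ET.ElementTree(root)


def write_annotations(records, sizes, out_dir="Annotations"):
    """Loop over all records, writing one XML file per image.

    `sizes` maps each file name to its (width, height).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        width, height = sizes[record["file"]]
        path = out / (Path(record["file"]).stem + ".xml")
        to_voc_xml(record, width, height).write(
            str(path), encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```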

Commands

Next, we run our second app, written in Python, with all the prepared annotations and datasets. It responds to the following commands:

‘Read’
python src/pred.py -c configs/mask.json -i imgs/1.jpg
This command runs mask recognition on a single image.

‘Test’
python src/eval.py -c configs/mask.json
This one evaluates the quality of the model, reporting three major metrics: F-score, precision, and recall.
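The three metrics are standard, but the source doesn't describe how eval.py counts hits. The sketch below makes a common assumption: a predicted box is a true positive when it overlaps a still-unmatched ground-truth mask with an IoU of at least 0.5, with predictions matched in order of confidence. Boxes are (xmin, ymin, xmax, ymax) tuples; this is an illustration, not the project's evaluation code.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def count_matches(predictions, ground_truth, threshold=0.5):
    """Return (TP, FP, FN) for one image.

    `predictions` is a list of (score, box); each ground-truth box can be
    matched at most once, highest-scoring predictions first.
    """
    unmatched = list(ground_truth)
    tp = fp = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best = max(unmatched, key=lambda g: iou(box, g), default=None)
        if best is not None and iou(box, best) >= threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)  # leftover ground truth = missed masks


def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), recall = TP/(TP+FN), F-score = their harmonic mean."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, with 8 true positives, 2 false positives and 4 missed masks, precision is 0.8, recall is about 0.67, and the F-score is about 0.73.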

‘Train’
python src/train_eager.py -c
This is the command that actually trains our neural network!
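The source doesn't describe what train_eager.py does internally. The skeleton below only shows how batch size and jitter, as defined in the parameter list, typically shape a training loop. The train_step callback is a hypothetical stand-in for the real TensorFlow update, and treating jitter as the fraction of each dimension that may be cropped away is our simplifying assumption, not necessarily how the project's YOLOv3 fork applies it.

```python
import random


def jitter_crop(width, height, jitter, rng):
    """Pick a random crop window as (left, top, right, bottom) in pixels.

    A random amount, below `jitter` times each dimension, is trimmed and split
    randomly between the two sides; jitter=False or 0 keeps the whole image.
    """
    if not jitter:
        return 0, 0, width, height
    dw = int(width * jitter * rng.random())   # total pixels trimmed horizontally
    dh = int(height * jitter * rng.random())  # total pixels trimmed vertically
    left = rng.randint(0, dw)
    top = rng.randint(0, dh)
    return left, top, width - (dw - left), height - (dh - top)


def batches(samples, batch_size, rng):
    """Shuffle the samples and yield lists of at most `batch_size` items."""
    order = list(samples)
    rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


def train(samples, epochs, batch_size, jitter, train_step, seed=0):
    """Skeleton training loop.

    `samples` are dicts with "width" and "height"; `train_step(batch, crops)` is
    a hypothetical callback that would run the actual model update.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        for batch in batches(samples, batch_size, rng):  # reshuffled every epoch
            crops = [jitter_crop(s["width"], s["height"], jitter, rng) for s in batch]
            train_step(batch, crops)
```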

‘Video’
python video.py -c configs/mask_500.json -i videoplayback.mp4
We use this command for video recognition.
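The source doesn't show how video.py walks through a file. Below is a minimal sketch, assuming the opencv-python package, that reads a video frame by frame with cv2.VideoCapture and runs a detector on one frame per second; detect is a hypothetical stand-in for the trained model. Sampling keeps the cost manageable: at one frame per second, a 10-minute webcam clip comes down to about 600 analysed frames.

```python
def sampling_step(fps, every_seconds=1.0):
    """Number of frames between analysed frames (at least 1)."""
    return max(1, round(fps * every_seconds))


def process_video(path, detect, every_seconds=1.0):
    """Run `detect` on one frame every `every_seconds` of a video file.

    `detect` is a hypothetical callable (frame -> detections) standing in for
    the trained model. Requires the opencv-python package.
    """
    import cv2  # imported here so sampling_step() works without OpenCV

    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise OSError(f"cannot open video: {path}")
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if the file reports no FPS
    step = sampling_step(fps, every_seconds)
    results = []
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or read error
                break
            if index % step == 0:
                # Store the timestamp in seconds alongside the detections.
                results.append((index / fps, detect(frame)))
            index += 1
    finally:
        cap.release()
    return results
```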

How can this work with online web cams?

Webcam footage is usually stored as short videos that typically last 5-10 minutes. Such videos can easily be processed with a neural network like ours. Although it would be hard to deploy the network on the streets, this solution could be helpful in factories that require people to wear masks at work.

For more details, check out our post on how we built the neural network and our dedicated whitepaper, where we describe the main development process. We’d be happy to hear your feedback on this experiment, so let us know your thoughts!

How to Build a Neural Network That Recognizes People Wearing Masks