Face recognition with Go

Written by KagamiH | Published 2018/08/12
Tech Story Tags: golang | face-recognition | facial-recognition | face-recognition-with-go | facial-recognition-go


This article describes the process of creating and using a face recognition library for the Go language.

Neural networks are highly popular today, and people use them for a variety of tasks. One particularly useful application is face recognition.

Recently I’ve realized that my hobby project, a forum software with a Go backend, would benefit from a face recognition feature. It would be really neat to have a way to recognize people in uploaded photos (pop singers in this case) so that newcomers don’t need to ask who’s in the photo. This sounded like a good idea, so I decided to give it a try.

One thing to note is that I try to keep the system requirements of that software pretty low, so that more people can install it on a cheap server. That’s why the implementation can’t use CUDA or require a GPU. While you can easily rent such a server today, it costs more, thus reducing the potential audience. It would be much better if it worked on a plain CPU, preferably without exotic dependencies.

Choosing the language

If you ask a data scientist or anyone with practical neural network experience, almost all of them will recommend Python for solving machine learning tasks. It’s definitely a smart choice because of the community, the number of libraries available, the simplicity of the language, and so on. Needless to say, you’ll easily find extremely popular face recognition libraries in Python with great examples and documentation.

However I decided to choose Go for several reasons:

  • My forum is written in Go, and I really like the convenience of a single-binary backend, so it would be nice to integrate the face recognition routines directly with the rest of the backend instead of implementing some IPC and requiring Python dependencies.
  • Go is generally faster and, more importantly, consumes less memory than Python. Of course, the critical parts of any performant Python library are written in C/C++, but you still have the overhead of the Python VM. You can always rent a machine with more memory if we’re speaking about hosting, but I prefer faster languages unless it significantly hurts development time. I wouldn’t use C or C++ as my main language for writing web applications, but Go is fine, almost as simple as Python.
  • I haven’t found any face recognition libraries for Go, so writing one would be both fun and helpful for the community.

Choosing the framework

As said earlier, neural networks, and thus the frameworks implementing them, are massively widespread. In computer vision alone you have Caffe, Torch, TensorFlow, and others.

But one particularly cool library, dlib, almost immediately attracted my attention. First, it’s written in C++, so you can easily create Go bindings with cgo. Second, it claims 99.38% accuracy on the Labeled Faces in the Wild benchmark, which sounds quite impressive. Third, popular face recognition libraries such as face_recognition and openface use dlib underneath, so it looks like a really good choice.

Installing dependencies

The framework is chosen, but how do we get it onto development and production machines? C++ dependencies can be tricky to install: you can’t use convenient “go get” or “pip install” commands. Either your OS repository provides the library, or you should expect a tedious compilation process. The issue is even nastier if you’re a library author asking your users to compile software themselves. E.g. here you can see how many real people experience problems with dlib compilation.

Fortunately, there is a better option: if the user’s target system is known, we can build a binary package of dlib, which greatly simplifies the installation. Since we’re speaking about server software, Ubuntu is almost the standard here, so you really want to support it first.

Ubuntu has dlib in its standard repos, but the version is too old: face recognition support was only added in dlib 19.3. So we need to build our own package. I’ve created a PPA (custom repository) for Ubuntu 16.04 and 18.04, the two latest LTS versions. Installation is as simple as:

sudo add-apt-repository ppa:kagamih/dlib
sudo apt-get update
sudo apt-get install libdlib-dev

It will install the latest dlib (19.15 at the moment) and Intel’s Math Kernel Library, which seems to be the fastest implementation of the standard BLAS and LAPACK interfaces, at least on Intel processors.

Good news for Debian sid and Ubuntu 18.10 (not yet released): a pretty fresh dlib is available in the standard repos. All you need is:

sudo apt-get install libdlib-dev libopenblas-dev

This will use the OpenBLAS implementation instead of MKL, which is pretty fast too. Alternatively, you could enable non-free packages and install libmkl-dev instead.

We will also need libjpeg to load JPEG images; install the libjpeg-turbo8-dev package on Ubuntu and libjpeg62-turbo-dev on Debian (don’t ask me why the names are so different).

Right now I don’t have instructions for other systems, so let me know if you have problems getting dlib. It makes perfect sense to provide short and precise recipes at least for the most popular ones.

I’m also considering providing a Docker image for dlib (a few already exist); many projects with complex dependencies tend to use that method of distribution. But in my opinion, a native package always provides a better user experience: you don’t need to type long commands in the console or deal with a sandboxed environment; everything works as usual.

Writing the library

Modern face recognition libraries work by returning a set of numbers (a vector embedding, or descriptor) for each face in the photo. To find the name of the person in a given image, you compare its descriptor to the known ones, normally by the Euclidean distance between vectors: the two faces with the minimal distance should belong to the same person. That concept is already described in other articles, so I won’t go into details here.
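The comparison step is simple enough to sketch in a few lines of Go. The Descriptor type below is a stand-in for the library’s embedding type (dlib’s face recognition model produces 128-dimensional vectors), and the 0.6 threshold mentioned in the comment is the value dlib’s documentation commonly suggests, not something this sketch enforces:

```go
package main

import (
	"fmt"
	"math"
)

// Descriptor is a face embedding: dlib's ResNet model outputs
// 128 float32 values per face.
type Descriptor [128]float32

// euclideanDistance returns the L2 distance between two descriptors.
// The smaller the distance, the more likely both faces belong to the
// same person; dlib suggests ~0.6 as a same-person threshold.
func euclideanDistance(a, b Descriptor) float64 {
	var sum float64
	for i := range a {
		d := float64(a[i] - b[i])
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	// Toy descriptors differing only in the first two components.
	var a, b Descriptor
	a[0], b[0] = 1.0, 4.0
	b[1] = 4.0
	fmt.Println(euclideanDistance(a, b)) // prints 5
}
```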

The basic code for creating a face descriptor from the passed image is trivial; it pretty much follows the official example. Check out facerec.cc. The corresponding header facerec.h defines 5 functions and several structures for interaction between Go and dlib.

Here I discovered one unfortunate thing about dlib: while it supports all popular image formats, it can only load them from a file. This is a very inconvenient restriction, because you often keep image data only in memory, and writing it to a temporary file is a mess. So I had to write my own image loader using libjpeg. Since most photos are stored in that format, it should be enough for now; other formats might be added later.

A tiny glue layer connecting C++ and Go lives in face.go. It provides the Face structure, which holds the coordinates of a face in the image along with its descriptor, and the Recognizer interface for all actions such as initialization and the actual recognition.

What do we do once we have a descriptor? In the simplest case, you compare the Euclidean distance between the unknown descriptor and all known descriptors, as said earlier. It’s not perfect; given the current state of the art, you will sometimes get wrong answers. To improve the results a bit, we can use many images for each person and check whether at least several of them are pretty close to the provided face.

That’s exactly what classify.cc does. First it computes the distances, then sorts them, then counts hits of the same person among the top 10 minimal distances.
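The scheme above can be sketched in Go (the actual implementation lives in C++ in classify.cc; the sample type and the precomputed distances here are purely illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// sample pairs the distance from the unknown descriptor to a known
// face with the category (person) that face belongs to.
type sample struct {
	dist float64
	cat  int32
}

// classify sorts known samples by ascending distance and returns the
// category with the most hits among the 10 nearest ones, or -1 if
// there are no samples at all.
func classify(samples []sample) int32 {
	if len(samples) == 0 {
		return -1
	}
	sort.Slice(samples, func(i, j int) bool {
		return samples[i].dist < samples[j].dist
	})
	top := samples
	if len(top) > 10 {
		top = top[:10]
	}
	hits := make(map[int32]int)
	best, bestHits := top[0].cat, 0
	for _, s := range top {
		hits[s.cat]++
		if hits[s.cat] > bestHits {
			best, bestHits = s.cat, hits[s.cat]
		}
	}
	return best
}

func main() {
	known := []sample{
		{0.45, 1}, {0.47, 2}, {0.48, 1}, {0.52, 1}, {0.60, 3},
	}
	fmt.Println(classify(known)) // prints 1 (three hits for category 1)
}
```

Voting over the nearest neighbors instead of trusting the single closest match makes one mislabeled or low-quality known image much less likely to flip the answer.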

Better algorithms exist for this task; e.g., support vector machines are often used, and dlib even provides a convenient API for training such models. I’ve seen a few mentions that SVMs might be slow on huge datasets though, so I need to test it on a large collection first, which I haven’t done yet.

Usage

The resulting library is available at github.com/Kagami/go-face. Include it in your Go project as usual:

import "github.com/Kagami/go-face"

See the GoDoc documentation for an overview of all structures and methods. There aren’t many of them; a typical workflow is:

  1. Init recognizer
  2. Recognize all known images, collect descriptors
  3. Pass known descriptors with corresponding categories to the recognizer
  4. Get descriptor of unknown image
  5. Classify its category

Here is a working example that illustrates all the steps described above:

package main

import (
	"fmt"
	"log"
	"path/filepath"

	"github.com/Kagami/go-face"
)

// Path to directory with models and test images. Here it's
// assumed it points to the
// https://github.com/Kagami/go-face-testdata clone.
const dataDir = "testdata"

// This example shows the basic usage of the package: create a
// recognizer, recognize faces, classify them using a few known
// ones.
func main() {
	// Init the recognizer.
	rec, err := face.NewRecognizer(dataDir)
	if err != nil {
		log.Fatalf("Can't init face recognizer: %v", err)
	}
	// Free the resources when you're finished.
	defer rec.Close()

	// Test image with 10 faces.
	testImagePristin := filepath.Join(dataDir, "pristin.jpg")
	// Recognize faces on that image.
	faces, err := rec.RecognizeFile(testImagePristin)
	if err != nil {
		log.Fatalf("Can't recognize: %v", err)
	}
	if len(faces) != 10 {
		log.Fatalf("Wrong number of faces")
	}

	// Fill known samples. In the real world you would use a lot of
	// images for each person to get better classification results
	// but in our example we just get them from one big image.
	var samples []face.Descriptor
	var cats []int32
	for i, f := range faces {
		samples = append(samples, f.Descriptor)
		// Each face is unique on that image so goes to its own
		// category.
		cats = append(cats, int32(i))
	}
	// Name the categories, i.e. people on the image.
	labels := []string{
		"Sungyeon", "Yehana", "Roa", "Eunwoo", "Xiyeon",
		"Kyulkyung", "Nayoung", "Rena", "Kyla", "Yuha",
	}
	// Pass samples to the recognizer.
	rec.SetSamples(samples, cats)

	// Now let's try to classify some not yet known image.
	testImageNayoung := filepath.Join(dataDir, "nayoung.jpg")
	nayoungFace, err := rec.RecognizeSingleFile(testImageNayoung)
	if err != nil {
		log.Fatalf("Can't recognize: %v", err)
	}
	if nayoungFace == nil {
		log.Fatalf("Not a single face on the image")
	}
	catID := rec.Classify(nayoungFace.Descriptor)
	if catID < 0 {
		log.Fatalf("Can't classify")
	}
	// Finally print the classified label. It should be "Nayoung".
	fmt.Println(labels[catID])
}

To run it:

mkdir -p ~/go && cd ~/go  # Or cd to your $GOPATH
mkdir -p src/go-face-example && cd src/go-face-example
git clone https://github.com/Kagami/go-face-testdata testdata
edit main.go  # Paste example code
go get .
../../bin/go-face-example

It will take some time to compile go-face (~1 minute on my i7) because of the extensive use of C++ templates in dlib’s code. Luckily, Go caches build outputs, so future builds will be much faster.

The example should print “Nayoung”, indicating that the unknown image was recognized correctly.

Models

go-face requires the shape_predictor_5_face_landmarks.dat and dlib_face_recognition_resnet_model_v1.dat models to work. You can download them from the dlib-models repository:

mkdir models && cd models
wget https://github.com/davisking/dlib-models/raw/master/shape_predictor_5_face_landmarks.dat.bz2
bunzip2 shape_predictor_5_face_landmarks.dat.bz2
wget https://github.com/davisking/dlib-models/raw/master/dlib_face_recognition_resnet_model_v1.dat.bz2
bunzip2 dlib_face_recognition_resnet_model_v1.dat.bz2

They’re also available in the go-face-testdata repository, which you’ve already cloned to run the example.

Future ideas

I’m pretty satisfied with the result: the library has a simple API, decent recognition quality, and can be easily embedded into a Go application. But of course, there is always room for improvement:

  • go-face currently doesn’t jitter face images when creating a descriptor, for simplicity and speed, but it’s definitely worth adding an option for that, as it might improve recognition quality.
  • dlib supports a lot of image formats (JPEG, PNG, GIF, BMP, DNG), but go-face currently implements only JPEG; it would be good to support more.
  • As suggested by Davis, the author of dlib, a multiclass SVM might give better classification results than searching for the minimal distance, so this needs additional testing.
  • In go-face I try not to copy values unless really necessary, but I haven’t actually tested performance on huge (10,000+) collections of face samples; there might be some bottlenecks.
  • Extracting a feature vector from a face is a powerful concept because you don’t need to collect your own training data, which is quite an ambitious task (Davis mentions a dataset of 3 million faces used to create dlib’s ResNet model). But doing so may be inevitable to reach higher recognition quality, so it’s worth providing a tool for training your own model.

Published by HackerNoon on 2018/08/12