Data Set and Data Augmentation for Face Detection and Recognition

When building an artificial intelligence (AI) application, your approach must be data first, not application first.

Without considering the data first, we will continue to incur high-interest technical debt.

Dependencies on data cost more than software dependencies, but are constantly overlooked.

"Data is a precious thing and will last longer than the systems themselves", Tim Berners-Lee

To build a face detection and/or face recognition model, it is important to know which data sets are available and which data augmentation approaches to follow when training the model.

A typical Face Recognition Engine flow looks as follows:

Note that there is a difference between Face Verification and Face Identification.

Face Verification/Authentication:

It is the process of comparing a face image with a claimed identity. In other words, it is "one-to-one matching".

Face Identification/Recognition:

It is the process of comparing a face image with all registered persons. In other words, it is "one-to-many matching".
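
The two matching modes can be sketched in a few lines. This is a minimal illustration, assuming each face has already been converted to a non-zero embedding vector by some model and that cosine similarity with a fixed threshold decides a match. The function names, the threshold of 0.8, and the toy 3-dimensional embeddings are hypothetical and chosen only for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def verify(probe, claimed_template, threshold=0.8):
    """One-to-one matching: does the probe match the single claimed identity?"""
    return cosine_similarity(probe, claimed_template) >= threshold

def identify(probe, gallery, threshold=0.8):
    """One-to-many matching: best match among all registered persons, or None."""
    best_name, best_score = None, threshold
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings (hypothetical values, for illustration only)
gallery = {
    "alice": [1.0, 0.0, 0.0],
    "bob": [0.0, 1.0, 0.0],
}
probe = [0.9, 0.1, 0.0]

print(verify(probe, gallery["alice"]))  # compared against one template only
print(identify(probe, gallery))         # compared against every template
```

Verification costs one comparison per request, while identification scales with the number of registered persons, which is one reason the two tasks are usually treated separately.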

Data Augmentation:

Data augmentation provides an effective way to compensate for insufficient facial training data.
Deep learning strongly relies on large and complex training sets to generalize well in unconstrained settings.
Collecting and labeling a large quantity of real samples is widely recognized as laborious, expensive and error-prone.
Existing datasets still lack variation compared to samples in the real world.

Data Augmentation for Face Detection Data Set:

Horizontal Flip: Flip or mirror a face image so that the left side becomes the right side.
Random Cropping: Crop a random square patch from the image. If a face lies on the crop boundary, keep the overlapping part of its face box only if the box's centre is within the crop patch.
Photometric Distortion: Apply random changes to image properties such as color, contrast, and brightness.
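
For detection, the face boxes have to be transformed together with the image. The following NumPy sketch shows one possible way to do this, assuming images are H x W x 3 uint8 arrays and boxes are float rows of [x1, y1, x2, y2] in pixels. The function names and default ranges are hypothetical, and the photometric step covers only brightness and contrast.

```python
import numpy as np

def horizontal_flip(image, boxes):
    """Mirror the image left-to-right and mirror the box x-coordinates with it."""
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    out = boxes.copy()
    out[:, 0] = width - boxes[:, 2]  # new left edge is the old right edge
    out[:, 2] = width - boxes[:, 0]  # new right edge is the old left edge
    return flipped, out

def random_square_crop(image, boxes, rng, min_scale=0.5):
    """Crop a random square patch; keep boxes whose centre lies inside it."""
    height, width = image.shape[:2]
    short = min(height, width)
    size = int(rng.integers(int(short * min_scale), short + 1))
    x0 = int(rng.integers(0, width - size + 1))
    y0 = int(rng.integers(0, height - size + 1))
    patch = image[y0:y0 + size, x0:x0 + size].copy()

    cx = (boxes[:, 0] + boxes[:, 2]) / 2
    cy = (boxes[:, 1] + boxes[:, 3]) / 2
    keep = (cx >= x0) & (cx < x0 + size) & (cy >= y0) & (cy < y0 + size)
    kept = boxes[keep].astype(np.float64)
    # Shift into patch coordinates and clip to keep only the overlapping part
    kept[:, [0, 2]] = np.clip(kept[:, [0, 2]] - x0, 0, size)
    kept[:, [1, 3]] = np.clip(kept[:, [1, 3]] - y0, 0, size)
    return patch, kept

def photometric_distortion(image, rng, max_brightness=32.0, contrast_range=(0.5, 1.5)):
    """Apply a random brightness shift followed by a random contrast scale."""
    out = image.astype(np.float32)
    out += rng.uniform(-max_brightness, max_brightness)
    out *= rng.uniform(*contrast_range)
    return np.clip(out, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(120, 160, 3), dtype=np.uint8)  # synthetic image
boxes = np.array([[40.0, 30.0, 80.0, 90.0]])                      # one face box

flipped, flipped_boxes = horizontal_flip(image, boxes)
patch, patch_boxes = random_square_crop(image, boxes, rng)
distorted = photometric_distortion(image, rng)
```

In a training pipeline, each step would typically be applied with some probability per sample rather than unconditionally.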

Data Set for Face Recognition:

Suppose we aim to recognise 1M unique faces. It then makes sense to split the training process into basic training and advanced training, and to split the data set accordingly.

The data set should meet the following two criteria:

It should contain as many photos as possible (at least a couple of million) with unconstrained pose, expression, lighting, and exposure.
It should be broad (many unique people/faces) and deep (many photos of the same person).
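
Breadth and depth are easy to measure before training starts. The sketch below assumes the data set is available as one identity label per photo. The function name and the reported statistics are hypothetical choices for illustration.

```python
from collections import Counter
from statistics import median

def breadth_and_depth(labels):
    """Summarise a data set given one identity label per photo."""
    per_identity = Counter(labels)
    counts = list(per_identity.values())
    return {
        "photos": len(labels),
        "identities": len(per_identity),               # breadth
        "min_photos_per_identity": min(counts),        # depth, worst case
        "median_photos_per_identity": median(counts),  # depth, typical case
    }

# Toy labels (hypothetical): three photos of id_a, two of id_b, one of id_c
labels = ["id_a", "id_a", "id_a", "id_b", "id_b", "id_c"]
stats = breadth_and_depth(labels)
print(stats)
```

Identities with very few photos are natural candidates for the augmentation methods described next.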

Data Augmentation for Face Recognition:

Geometric Transformation: translation, rotation, reflection, flipping, zooming, scaling, cropping, padding, perspective transformation, elastic distortion, lens distortion, mirroring, etc.
Photometric Transformation: color jittering, grayscaling, filtering, lighting perturbation, noise adding, vignetting, contrast adjustment, random erasing, etc.
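
As a rough illustration of both families, the NumPy sketch below implements one geometric transformation (translation with zero padding) and two photometric ones (noise adding and random erasing). It assumes aligned face crops stored as uint8 arrays. The function names, parameter defaults, and the 112 x 112 crop size are hypothetical.

```python
import numpy as np

def translate(image, dx, dy):
    """Shift by (dx, dy) pixels and zero-pad; assumes |dx| < width, |dy| < height."""
    height, width = image.shape[:2]
    out = np.zeros_like(image)
    src_x = slice(max(0, -dx), min(width, width - dx))
    dst_x = slice(max(0, dx), min(width, width + dx))
    src_y = slice(max(0, -dy), min(height, height - dy))
    dst_y = slice(max(0, dy), min(height, height + dy))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

def add_gaussian_noise(image, rng, sigma=10.0):
    """Add zero-mean Gaussian noise and clip back to the uint8 range."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def random_erase(image, rng, max_fraction=0.3):
    """Overwrite a random rectangle with random pixels to simulate occlusion."""
    height, width = image.shape[:2]
    eh = int(rng.integers(1, max(2, int(height * max_fraction) + 1)))
    ew = int(rng.integers(1, max(2, int(width * max_fraction) + 1)))
    y0 = int(rng.integers(0, height - eh + 1))
    x0 = int(rng.integers(0, width - ew + 1))
    out = image.copy()
    fill_shape = (eh, ew) + image.shape[2:]
    out[y0:y0 + eh, x0:x0 + ew] = rng.integers(0, 256, size=fill_shape, dtype=np.uint8)
    return out

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(112, 112, 3), dtype=np.uint8)  # synthetic crop

shifted = translate(face, dx=5, dy=-3)
augmented = random_erase(add_gaussian_noise(shifted, rng), rng)
```

Unlike the detection case, no boxes need updating here because the identity label of the crop does not change under these transformations.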

Advanced Data Augmentation methods to improve accuracy on real-time data:

Component Transformation: Hairstyle, make-up, accessories
Attribute Transformation: Pose, Expression and Age

Implementation examples for these are:

Model-based Transformation (2D and 3D models)
Generative-model-based Transformation (GANs, VAEs, PixelCNN, Glow, etc.)