Papers, Please! - Know Your Customer With AI

In Papers, Please, a popular video game, you play a border checkpoint inspector who checks entrants' documents according to increasingly complex rules. The game's main mechanic is verifying that documents meet every requirement: the correct date and place of issue, a first and last name that match across all documents, valid entry papers, whether the person is on a wanted list, and more.

The game has won over players around the world with its distinctive style and unusual mechanics, but for some people it would feel more like a nightmare than a game, because checking documents is exactly what they do at their real jobs.

What Is KYC and Who Needs It

KYC stands for Know Your Customer: a set of practices aimed at gathering as much information about a new customer as possible in order to weed out unreliable customers and reduce fraud.

A comprehensive background check of the client, covering their documents, banking history, and even social media activity, is part of the standard set of KYC practices aimed at reducing risks, and those risks differ from one business to another.

A bank employee therefore has to check not only the documents the client provides but also work with the authorities to find out how reputable the person is and how likely they are to repay a loan, as well as analyze their behavior, for example, on social media.

Online services such as marketplaces and booking sites, on the other hand, do not need to analyze a user's credit history, but they do need to know that their clients are not exploiting loopholes: for example, registering again and again with temporary email addresses to reuse a promo code, or creating a second account to leave negative reviews and drag down the service's rating.

For some businesses, such as car sharing and Uber-like services, it is important to know exactly who uses their services (or provides them). Uber, for example, needs to know who was driving the car when it got into an accident.

Crypto exchanges, despite resisting almost any regulation aimed at them, have been forced to adopt KYC practices to identify their clients under pressure from national and international regulators. For example, to register on Binance, you not only need to enter your name and date of birth but also submit a photo of your ID, upload a selfie, and pass face verification using a webcam.

Online dating sites and apps also try to build as many KYC practices as possible into account registration to prevent catfishing, which can drive users away and undermine the service's reputation.

How It Usually Works

KYC measures, in fact, are often very simple, both for the client and for whoever implements them. For example, verifying an email address with a code sent by email, or a phone number with a code sent by text message, are the simplest (and least reliable) forms of KYC.
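Under the hood, code-based verification comes down to issuing a short random code, keeping something that lets the server check it later, and comparing the user's input before the code expires. Here is a minimal Python sketch; the six-digit format, the 10-minute lifetime, and the in-process signing key are assumptions for illustration:

```python
import hashlib
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # assumption: codes stay valid for 10 minutes
# Assumption: in production this key would come from a secrets store, not be generated per process.
SERVER_KEY = secrets.token_bytes(32)


def _digest(code):
    # Keyed hash, so a leaked database row alone doesn't reveal the code
    return hmac.new(SERVER_KEY, code.encode(), hashlib.sha256).hexdigest()


def issue_code(now=None):
    """Create a 6-digit code to send by email/SMS and the record to store server-side."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"
    return code, {"digest": _digest(code), "expires_at": now + CODE_TTL_SECONDS}


def verify_code(submitted, record, now=None):
    """True if the submitted code matches the stored record and hasn't expired."""
    now = time.time() if now is None else now
    if now > record["expires_at"]:
        return False
    # compare_digest runs in constant time, so timing doesn't reveal how much of the digest matched
    return hmac.compare_digest(_digest(submitted.strip()), record["digest"])
```

A production version would also cap the number of attempts per code, since six digits alone can be brute-forced.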

These measures are easy to abuse: temporary email services are widely used to create multiple accounts on the same service, for example, to extend a free trial indefinitely.

These measures are not enough for businesses that value their user base. Let's look at several ways to make user verification more thorough at the registration stage.

Step 1: Are you human?

The simplest step is implementing a Captcha, which at least partially solves the problem of mass bot registrations. It is simple, cheap, and quite effective, and the drop in bot sign-ups is usually noticeable almost immediately. However, a Captcha alone cannot solve the problem of multi-accounts, where one person registers in the system several times with malicious intent.
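The text doesn't name a Captcha provider; as one example, Google reCAPTCHA has the client send a token that your server then checks against the `siteverify` endpoint. A minimal standard-library sketch, with the 0.5 score cutoff (which only applies to reCAPTCHA v3 responses) as an assumption and the HTTP call injectable so it can be tested offline:

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _default_post(url, data):
    # urlopen sends a form-encoded POST when `data` is given
    with urllib.request.urlopen(url, data=data, timeout=5) as resp:
        return resp.read()


def verify_captcha(token, secret, remote_ip=None, min_score=0.5, post=None):
    """Ask reCAPTCHA's siteverify endpoint whether the client's token is valid."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip
    body = urllib.parse.urlencode(params).encode()
    post = post or _default_post
    result = json.loads(post(SITEVERIFY_URL, body))
    if not result.get("success", False):
        return False
    score = result.get("score")  # present for v3, absent for v2
    return score is None or score >= min_score  # min_score is a hypothetical cutoff
```

The `post` parameter exists only so the check can be exercised without network access; in production the default transport is used.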

Step 2: Determine IP

Checking your clients' IP addresses is the next step in weeding out unreliable users. If different users regularly log into your service from the same IP, this may raise doubts about their integrity and their purpose for using the service. In addition, detecting IP addresses that belong to popular VPN services can also help reduce bots and unreliable customers.
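Both IP checks fit in a few lines of Python. The VPN ranges below are reserved documentation networks standing in for a real IP-intelligence feed, and the three-accounts-per-IP limit is a hypothetical threshold. Offices and mobile carriers legitimately put many users behind one address, so a hit should raise a flag rather than block the user outright:

```python
import ipaddress
from collections import defaultdict

# Placeholder VPN ranges (RFC 5737 documentation networks); a real system would load
# these from a maintained IP-intelligence feed.
VPN_NETWORKS = [ipaddress.ip_network("203.0.113.0/24"), ipaddress.ip_network("198.51.100.0/24")]

MAX_ACCOUNTS_PER_IP = 3  # hypothetical threshold


def is_vpn(ip):
    addr = ipaddress.ip_address(ip)
    # `in` is simply False when IPv4/IPv6 versions differ
    return any(addr in net for net in VPN_NETWORKS)


def suspicious_ips(logins):
    """logins: iterable of (user_id, ip). Return {ip: user_ids} for IPs shared by too many accounts."""
    users_by_ip = defaultdict(set)
    for user_id, ip in logins:
        users_by_ip[ip].add(user_id)
    return {ip: users for ip, users in users_by_ip.items() if len(users) > MAX_ACCOUNTS_PER_IP}
```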

Step 3: Email and phone number verification

Here, we go beyond checking that the user has access to the specified phone number and email address and verify the data itself in more depth. Commonly checked are the validity of the email domain, the date the email account was registered, and its links to social network accounts.
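Domain validity and account age require DNS lookups or third-party data, but two cheap checks run entirely locally: rejecting known disposable-email domains and collapsing address aliases so one mailbox can't register twice. In the sketch below the blocklist is a tiny illustrative sample, and stripping `+tag` suffixes on every domain is an assumption (Gmail ignores dots in the local part and supports `+tag` suffixes; other providers vary):

```python
# Illustrative sample only; real deployments use a maintained disposable-domain list.
DISPOSABLE_DOMAINS = {"mailinator.com", "10minutemail.com", "guerrillamail.com"}
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(email):
    """Collapse common aliasing tricks so one mailbox maps to one canonical key."""
    local, _, domain = email.strip().lower().rpartition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    local = local.split("+", 1)[0]  # assumption: treat "+tag" as an alias on every domain
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")  # Gmail ignores dots in the local part
        domain = "gmail.com"
    return f"{local}@{domain}"


def email_red_flags(email, existing_canonical):
    """Return a list of warning labels for a new sign-up email."""
    flags = []
    canonical = normalize_email(email)
    if canonical.split("@")[1] in DISPOSABLE_DOMAINS:
        flags.append("disposable_domain")
    if canonical in existing_canonical:
        flags.append("alias_of_existing_account")
    return flags
```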

A phone number can also provide quite a lot of useful data, such as the mobile carrier, the country, and links to social network accounts. Together, these checks help identify users who created an email address solely for registration or who use a burner phone.
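As an illustration of the country signal, the sketch below normalizes a number to international format and compares its calling code with the country the user declared. The calling-code table is a small sample and the mismatch rule is a hypothetical heuristic; a library such as `phonenumbers` (the Python port of Google's libphonenumber) covers the full numbering plan and can also look up the originally assigned carrier:

```python
import re

# Small sample of ITU country calling codes (NANP "+1" is shared by the US, Canada, and others).
CALLING_CODES = {"1": "US/CA", "40": "RO", "44": "GB", "46": "SE", "49": "DE"}


def to_e164(raw):
    """Strip formatting; accept '+' or '00' international prefixes (simplified E.164 check)."""
    digits = re.sub(r"[^\d+]", "", raw)
    if digits.startswith("00"):
        digits = "+" + digits[2:]
    if not re.fullmatch(r"\+\d{8,15}", digits):
        raise ValueError(f"not an international number: {raw!r}")
    return digits


def phone_country(e164):
    # Calling codes are 1-3 digits long; check each possible length
    for length in (3, 2, 1):
        region = CALLING_CODES.get(e164[1:1 + length])
        if region:
            return region
    return None


def country_mismatch(raw_phone, declared_country):
    """Hypothetical heuristic: flag numbers whose calling code doesn't match the declared country."""
    region = phone_country(to_e164(raw_phone))
    return region is not None and declared_country not in region.split("/")
```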

Step 4: Papers, please!

This stage noticeably increases the number of registration steps for the user, but it also makes the service more reliable and significantly reduces the number of unscrupulous users. The user is asked to provide an identification document and enter its details when registering.

Checking whether an ID is valid is generally a simple task, as data on valid and invalid documents is often available online.
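Before any online lookup, passports and many ID cards allow a cheap offline sanity check: the machine-readable zone (MRZ) carries check digits computed with the scheme from ICAO Doc 9303. A passing checksum only shows that the fields were read or typed consistently; it says nothing about whether the document is genuine or still valid, so it complements the online lookup rather than replacing it. A sketch, tested against fields of the ICAO specimen passport:

```python
def mrz_check_digit(field):
    """ICAO 9303 check digit: weights 7, 3, 1 repeating; digits keep their value,
    letters A-Z map to 10-35, and the filler '<' counts as 0."""
    total = 0
    for i, ch in enumerate(field):
        if "0" <= ch <= "9":
            value = int(ch)
        elif "A" <= ch <= "Z":
            value = ord(ch) - ord("A") + 10
        elif ch == "<":
            value = 0
        else:
            raise ValueError(f"invalid MRZ character: {ch!r}")
        total += value * (7, 3, 1)[i % 3]
    return total % 10


def mrz_field_ok(field, check_char):
    """True if `check_char` is the correct check digit for `field`."""
    return len(check_char) == 1 and "0" <= check_char <= "9" and mrz_check_digit(field) == int(check_char)
```

For example, the specimen's document number `L898902C3` has check digit 6, and its date of birth `740812` has check digit 2.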

Step 5: Show your face

The previous step still leaves a loophole. Nothing prevents a user who has been banned from a service, but ardently wants to keep using it, from borrowing a passport from an unsuspecting grandmother and entering her details to register a second account. This is extremely unpleasant for services trying to maintain a healthy user base, so an additional step is needed: facial recognition.

The most convenient place to implement facial recognition is a mobile application. When registering, the user enters their ID details and photographs the ID, and the app then asks them to make faces at the camera, which keeps them occupied while the following checks are performed:

Comparison of the person’s face with their photo from the ID to determine whether the document belongs to them,
Comparison with the database of faces of already registered users so that a person does not register several times using different documents, like a driver’s license instead of an ID,
A liveness check to confirm that a real person is in front of the camera, not a photo or a prerecorded video.
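The second check in the list above is a one-to-many search: the face embedding from the new selfie is compared against the embeddings stored for all registered users. A brute-force sketch, assuming embeddings are plain float vectors produced by some face recognition model (such as FaceNet) and using cosine similarity; the 0.8 threshold is a placeholder to be tuned per model:

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_duplicate_accounts(new_embedding, registered, threshold=0.8):
    """Return (user_id, similarity) pairs, most similar first, for registered users whose
    stored embedding is close enough to suggest the same person.
    `registered` maps user_id -> embedding; `threshold` is a hypothetical value."""
    scored = [(uid, cosine_similarity(new_embedding, emb)) for uid, emb in registered.items()]
    hits = [(uid, sim) for uid, sim in scored if sim >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A linear scan is fine for a small user base; with many users, a vector index would replace it.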

Smart KYC: Real-life application and development

Having worked on several AI-powered KYC systems for both web and mobile applications, I have a good sense of how the theory plays out in practice.

All of the steps outlined above, from simple email verification to facial recognition, are used in real-life applications. There are many ways to implement them; below, I've compiled the techniques and technologies behind smart KYC in modern systems.

ID Recognition

There are many tools for document recognition, each geared toward particular document types and use cases. Most of them are paid, which is a drawback for a KYC system, since the cost per onboarded client quickly adds up.

One of the tools best suited for this application is the Google Vision API, which detects text in many languages and returns a bounding box for each word. At the time of writing, Google Vision processes the first 1,000 images each month for free, which should cover a decent share of new users for many services.

First, we use OpenCV to find the outline of the passport in the photo the user provided. This does not always go smoothly, for example, when the photo was taken against a colorful background or with other objects nearby. In that case, we take the largest contour and assume that it is the passport.

The resulting image is passed on to Google Vision, which detects each word and returns its position separately. Once the words are detected, you can search them for keywords such as “Name”, “Surname”, “Date of Birth”, “Place of Birth”, and so on.
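Google Vision's response lists the detected words together with their bounding polygons. To keep the example independent of the client library, the sketch below uses a hypothetical `Word` type holding each word's text and an axis-aligned box, and assumes the words arrive in reading order. Multi-word keywords such as “Date of Birth” are matched as consecutive words, and the box covering all of them is returned:

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical simplified OCR result: one word and its bounding box in pixels."""
    text: str
    x: int
    y: int
    w: int
    h: int


def find_keyword(words, keyword):
    """Return the (x, y, w, h) box covering a possibly multi-word keyword, matched
    case-insensitively against consecutive words; None if it isn't found."""
    target = keyword.lower().split()
    tokens = [w.text.lower().strip(":/") for w in words]  # drop trailing label punctuation
    for i in range(len(words) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            span = words[i:i + len(target)]
            x0 = min(w.x for w in span)
            y0 = min(w.y for w in span)
            x1 = max(w.x + w.w for w in span)
            y1 = max(w.y + w.h for w in span)
            return (x0, y0, x1 - x0, y1 - y0)
    return None
```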

Each passport type (in practice, each issuing country) has its own config file with the following information:

Keywords, or field titles,
The size of spaces and line spacing.

Concerning spaces

Do you think that countries differ from each other in culture, history, and their people? No, they differ in the size of spaces and line spacing in the passports of their citizens. For example, Romanian passports have short spaces, while Swedish passports have long spaces.

The size of spaces matters for locating the information we need: once a keyword (in this case, a field name) is found, the system looks for its value, that is, the field content, which may be either to the right of the keyword or below it.

Knowing the size of spaces and line spacing for a given type of passport, we know how far to move right or down to find a keyword's value. These sizes are not absolute but relative, since they scale with the size of the image.
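The lookup can be sketched as follows, assuming OCR words arrive as `(text, x, y, w, h)` tuples in pixels and that each config stores spacing as a fraction of the image size, which is what makes it independent of resolution. The class and field names, and the choice to make the search region one keyword-height tall, are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class PassportConfig:
    """Hypothetical per-country config; spacing values are fractions of the image size."""
    keywords: dict       # field name -> keyword printed on the document
    value_below: set     # fields whose value sits below the keyword (otherwise: to the right)
    space: float         # horizontal gap keyword -> value, as a fraction of image width
    line_spacing: float  # vertical gap keyword -> value, as a fraction of image height


def value_region(keyword_box, field, cfg, image_w, image_h):
    """Given the keyword's box (x, y, w, h), return the box where its value should be."""
    x, y, w, h = keyword_box
    if field in cfg.value_below:
        top = y + h + round(cfg.line_spacing * image_h)
        return (x, top, image_w - x, h)
    left = x + w + round(cfg.space * image_w)
    return (left, y, image_w - left, h)


def words_in_region(words, region):
    """Join the text of OCR words (text, x, y, w, h) whose center lies inside region."""
    rx, ry, rw, rh = region
    picked = [t for t, x, y, w, h in words
              if rx <= x + w / 2 <= rx + rw and ry <= y + h / 2 <= ry + rh]
    return " ".join(picked)
```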

Facial recognition

Another important aspect of smart KYC is facial recognition. With so many excellent face recognition algorithms available today, there is no need to reinvent the wheel with a bespoke solution; it is much more cost-efficient to use a pretrained model for the job. For example, FaceNet is a good solution for comparing two face photos and determining whether they show the same person.

A simple script can extract the photo from the ID, and FaceNet (or any facial recognition model of your choice) can then compare it with the user's face captured by the smartphone camera to determine whether the ID actually belongs to the user.
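Once both faces have been turned into embeddings (FaceNet maps a face to a compact vector in which distance corresponds to face similarity), the comparison itself is a distance check. A sketch assuming the embeddings are plain Python lists; the 1.0 threshold is a placeholder, since the right value depends on the model and should be tuned on labeled pairs:

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def same_person(id_embedding, selfie_embedding, threshold=1.0):
    """Return (match, distance): Euclidean distance between L2-normalized embeddings,
    with `threshold` as a hypothetical cutoff."""
    a, b = l2_normalize(id_embedding), l2_normalize(selfie_embedding)
    distance = math.dist(a, b)
    return distance < threshold, distance
```

Lowering the threshold rejects more impostors at the cost of turning away more genuine users; for KYC, erring on the strict side and routing borderline cases to manual review is a reasonable choice.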

Liveness detection

Another important task when it comes to KYC is liveness detection, which is a set of techniques aimed at determining whether a source is a live human or a fake presentation.

This is a vital step in eliminating multi-account fraud and preventing identity fraud. Liveness checks usually rely on motion detection and face detection algorithms to confirm that a live person is in front of the camera.

The easiest, and quite effective, way to implement liveness detection is to prompt the user to perform a series of simple movements, such as turning their head left or right, winking, or smiling.
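The point of randomizing the prompts is that a prerecorded video can't know in advance which movements will be requested. A minimal challenge-response sketch; the action names, the three-step length, and the 10-second limit are assumptions, and the detected actions are taken as given (producing them is the movement detection model's job):

```python
import random

ACTIONS = ["turn_left", "turn_right", "wink", "smile"]  # hypothetical action labels


def make_challenge(length=3, rng=None):
    """Pick a random sequence of distinct actions for the user to perform."""
    if rng is None:
        rng = random.SystemRandom()  # OS-backed randomness, so the sequence is unpredictable
    return rng.sample(ACTIONS, length)


def passes_challenge(challenge, observed, time_limit=10.0):
    """observed: list of (timestamp_seconds, action) detected from the camera feed.
    Passes if the challenge actions appear in order (other detections may be
    interleaved) within `time_limit` seconds of the first detection."""
    if not challenge or not observed:
        return False
    start = observed[0][0]
    step = 0
    for ts, action in observed:
        if ts - start > time_limit:
            break
        if action == challenge[step]:
            step += 1
            if step == len(challenge):
                return True
    return False
```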

Again, there are several algorithms and models for movement detection on the market; it's a matter of choosing the right one for the task at hand. I recommend Google ML Kit, as it's very easy to use and integrate into an app.
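ML Kit's face detector can report per-frame values such as head rotation angles and, with classification enabled, the probability that each eye is open and that the person is smiling. The sketch below turns such measurements into discrete actions. `FaceFrame` is a hypothetical stand-in rather than ML Kit's own type, the thresholds are placeholders, and the sign convention for head yaw should be checked against the detector's documentation:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FaceFrame:
    """Hypothetical per-frame measurements modeled on values a face detector like ML Kit reports."""
    timestamp: float
    head_yaw_deg: float
    left_eye_open: float   # probability 0..1
    right_eye_open: float  # probability 0..1
    smiling: float         # probability 0..1


YAW_DEG = 25.0     # placeholder thresholds
EYE_CLOSED = 0.2
SMILE = 0.8


def detect_action(frame: FaceFrame) -> Optional[str]:
    if frame.head_yaw_deg >= YAW_DEG:
        return "turn_left"  # assumed sign convention
    if frame.head_yaw_deg <= -YAW_DEG:
        return "turn_right"
    # a wink: exactly one eye closed
    if (frame.left_eye_open < EYE_CLOSED) != (frame.right_eye_open < EYE_CLOSED):
        return "wink"
    if frame.smiling > SMILE:
        return "smile"
    return None


def to_events(frames):
    """Turn a frame stream into (timestamp, action) events; holding a pose over
    consecutive frames counts as one action."""
    events, previous = [], None
    for frame in frames:
        action = detect_action(frame)
        if action is not None and action != previous:
            events.append((frame.timestamp, action))
        previous = action
    return events
```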

Summing Up

Modern AI-based services and technologies make it possible to build, quickly and inexpensively, a system that significantly reduces account and identity fraud and improves KYC efforts. Well-implemented KYC measures are easy for users to complete, yet effective enough to weed out users with bad intentions.