Originally published @ devan-sabaratnam.squarespace.com

Over the weekend, I was flicking through my Amazon console, and I noticed a new service on there called ‘Rekognition’. I guess it was the mangled spelling that caught my attention, but I wondered what this service was. Amazon has a habit of adding new services to their platform with alarming regularity, and this one slipped past my radar somehow.

So I dived in and checked it out, and it turns out that in late 2016, Amazon released their own image recognition engine, AWS Rekognition, on their platform. It not only does facial recognition, but general photo object identification too. It is still fairly new, so the details were sketchy, but I was immediately excited to try it out. Long story short, within an hour, I had knocked up a quick sample web page that could grab photos from my PC camera and perform basic facial recognition on them. Want to know how to do the same? Read on…

I had dabbled in facial recognition technology before, using third-party libraries along with the Microsoft Face API, but putting together even a rudimentary prototype was fraught with complexity and a steep learning curve. But while browsing the Rekognition docs (thin as they are), I realised that the AWS API was actually quite simple to use, while seemingly quite powerful. I couldn’t wait, and decided to jump in feet first to knock up a quick prototype.

THE OBJECTIVE

I wanted a ‘quick and dirty’ single web page that would allow me to grab a photo using my iMac camera and perform some basic recognition on the photo. Basically, I wanted to identify the user sitting in front of the PC.

The Amazon Rekognition service allows you to create one or more collections. A collection is simply a, well, collection of facial vectors for sample photos that you tell it to save.

NOTE: The service doesn’t store the actual photos, but a JSON representation of measurements obtained from a reference photo.

Once you have a collection on Amazon, you can then take a subject photo and have it compare the features of the subject to its reference collection, and return the closest match. Sounds simple, doesn’t it? And it is. To be honest, coding the front end of this web page to get the camera data actually took longer than the back end to perform the recognition, by a factor of 3 to 1!!

So, in short, the web page lets you (1) create or delete a collection of facial data on Amazon, (2) upload face data via a captured photo to your collection, and (3) compare new photos to the existing collection to find a match. Oh, and as a tricky extra (4), I also added the Amazon Polly service to this demo, so that after recognising a photo, the page will broadcast a verbal, customised greeting to the person named in the photo!

The simple app (Note: Not my real face! :D)

THE FRONT END

My first question was what library to use to capture the image using my iMac camera. After a quick Google search, I found the amazing JPEG Camera library on GitHub by amw. It allows you to use a standard HTML5 canvas to perform the capture, or fall back to a Flash widget for older browsers. I quickly grabbed the library and modified the example javascript file for my needs.

THE BACK END

For the back end, I knocked up a quick Sinatra project. Sinatra is a lightweight Ruby-based framework, and it could do all the heavy lifting with AWS. I actually use Sinatra extensively (well, Padrino actually) to build all my web apps, and highly recommend the platform.

Note: The Amazon Rekognition examples actually promote uploading the source photos used in their API to an Amazon S3 bucket first, then processing them. I wanted to avoid this double step and send the image data directly to their API instead, which I managed to do. I also managed to do a similar thing with the Polly greeting.
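The ‘no S3’ approach comes down to what you hand Rekognition as its image parameter. Below is a minimal sketch of that in Ruby. It assumes the browser posts the JPEG as a Base64 string, optionally prefixed with a data:image/jpeg;base64, header. The helpers image_from_upload and image_from_s3 are hypothetical names of my own, not code from the repo. Each one returns an image: hash that Rekognition calls such as index_faces and search_faces_by_image accept: raw bytes in the first case, an S3 bucket/key pair in the second.

```ruby
require "base64"

# Hypothetical helper: turn the photo posted by the browser into the
# `image:` parameter that Rekognition calls such as index_faces and
# search_faces_by_image accept. Assumes the browser sends a Base64 string,
# optionally prefixed with a data URL header like "data:image/jpeg;base64,".
def image_from_upload(payload)
  # Strip any "data:...," prefix so only the Base64 payload remains.
  encoded = payload.sub(/\Adata:[^,]*,/, "")
  # The AWS SDK for Ruby expects raw bytes here and Base64-encodes them
  # for the wire itself, so decode the browser's string first.
  { bytes: Base64.decode64(encoded) }
end

# Hypothetical helper for the S3 route used in Amazon's own examples:
# the photo must already have been uploaded to this bucket under this key.
def image_from_s3(bucket, key)
  { s3_object: { bucket: bucket, name: key } }
end
```

Either hash can then be passed as the image: argument of a real client call. For example, client.search_faces_by_image(collection_id: "faceapp_test", image: image_from_upload(params[:image])), where the form field name :image is an assumption.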
Instead of saving the audio to an MP3 file and playing that, I managed to encode the MP3 data directly into an <audio> tag on the page and play it from there!

THE CODE

I have placed all the code for this project on my GitHub page. Feel free to grab it, fork it and improve it as you like. I will endeavour to explain the code in more detail here.

THE STEPS

First things first, you will need an Amazon AWS account. I won’t go into the details of setting that up here, because there are many articles you can find on Google for doing so.

CREATING AN AWS IAM USER

Once you are set up on AWS, the first thing we need to do is create an Amazon IAM (Identity & Access Management) user that has permission to use the Rekognition service. Oh, we will also set up permissions for Amazon’s Polly service as well, because once I got started on these new services, I could not stop.

In the Amazon console, click on ‘Services’ in the top left corner, then choose ‘IAM’ from the vast list of Amazon services. Then, on the left-hand side menu, click on ‘Users’. This should show you a list of any IAM users you have created on the console in the past.

Click on the blue ‘Add User’ button at the top of this list to add a new IAM user.

Give the user a recognisable name (more for your own reference), and make sure you tick ‘Programmatic Access’, as you will be using this IAM user in API calls.

Next come the permissions settings. Make sure you click the THIRD box on the screen, which says ‘Attach existing policies directly’. Then, in the ‘Filter: Policy Type’ search box below that, type in ‘rekognition’ (note the Amazonian spelling) to show only the Rekognition policies. Choose ‘AmazonRekognitionFullAccess’ from the list by placing a check mark next to it.

Next, change the search filter to ‘polly’, and place a check mark next to ‘AmazonPollyFullAccess’.

Nearly there.
We now have full permission for this IAM user for Amazon Rekognition and Amazon Polly. Click on ‘Next: Review’ at the bottom right.

On the review page, you should see 2 Managed Policies giving you full access to Rekognition and Polly. If you don’t, go back and re-select the policies as per the previous step. If you do, then click ‘Create User’ at the bottom right.

Now this page is IMPORTANT. Make a note of the AWS Key and Secret that you are given on this page, as we will need to incorporate them into our application below. This is the ONLY time that you will be shown the key/secret for this user, so please copy and paste the info somewhere safe. Also download the CSV file from this page, which contains the same information, and keep it safe as well.

DOWNLOAD THE CODE

The next step is to download the sample code from my GitHub page so you can modify it as necessary. Go to this link and either download the code as a ZIP file, or perform a ‘git clone’ to clone it to your working folder.

The first thing you need to do is create a file called .env in your working folder, and enter these two lines, substituting your Amazon IAM Key and Secret (Note: These are NOT real key details below):

export AWS_KEY=A1B2C3D4E5J6K7L10
export AWS_SECRET=T/9rt344Ur+ln89we3552H5uKp901

Optional: You can also just run these two lines in your command shell (Linux and OSX) to set them as environment variables that the app can use. Windows users can run them too; just replace the ‘export’ prefix with ‘set’.

Now, if you have Ruby installed on your system (Note: No need for full Ruby on Rails, just the basic Ruby language is all you need), you can run bundle install to install all the prerequisites (Sinatra etc.), then type ruby faceapp.rb to actually run the app. This should start up a web server on port 4567, so you can fire up your browser and go to http://localhost:4567 to see the web page and begin testing.

Using the App

The web page itself is fairly simple.
You should see a live streaming image at the top centre, which is the feed from your on-board camera.

The first thing you will need to do is create a collection by clicking the link at the very bottom left of the page. This will create an empty collection on Amazon’s servers to hold your image data. Note that the default name for this collection is faceapp_test, but you can change that in the faceapp.rb Ruby code (line 17).

Then, to begin adding faces to your collection, ask several people to sit down in front of your PC or tablet/phone, and make sure that ONLY their face is in the photo frame (multiple faces will make the scan fail). Once ready, enter their name in the text input box and click the ‘Add to collection’ button. You should see a message that their facial data has been added to the database.

Once you have built up several faces in your database, you can get random people to sit down in front of the camera and click on ‘Compare image’. For people who have already been added to the collection, you should hopefully get back their name on screen, as well as a verbal greeting personalised to their name.

Please note that the usual way for Amazon Rekognition to work is to upload the JPEG/PNG photo to an Amazon S3 bucket, then run the processing from there. I wanted to bypass that double step and send the photo data directly to Rekognition as a Base64 encoded byte stream. Fortunately, the aws-sdk for Ruby allows you to do both.

Let's walk through the code now.

First of all, let's take a look at the raw HTML page itself. This is a really simple page that should be self-explanatory to anyone familiar with HTML. It is just a series of named divs, plus buttons and links. Note that we are using jQuery, and also Moment.js for the custom greeting. Of note is the faceapp.js code, which does all the tricky stuff, and the links to the JPEG Camera library.

You may also notice the <audio> tags at the bottom of the file, and you may ask what this is all about. Well, this is going to be the placeholder for the audio greeting we send to the user (see below).

Let's break down the main app js file. It sets up the JPEG Camera library to show the camera feed on screen and process the upload of the images.

The add_to_collection() function is straightforward: it takes the captured image from the camera, then does a POST to the /upload endpoint with the user’s name as a parameter. The function will not continue unless you have actually entered a name, as you need a short name as a unique identifier for this facial data.

The upload function simply checks that the call to /upload finished cleanly. It displays a success message if it did, or the error if it didn’t.

The compare_image() function is what gets called when you click the, well, ‘Compare image’ button. It simply grabs a frame from the camera and POSTs the photo data to the /compare endpoint. This endpoint will return either an error, or else a JSON structure containing the id (name) of the found face, as well as the percentage confidence.

If there is a successful face match, the function will then go ahead and send the name of the found face to the /speech endpoint. This endpoint calls the Amazon Polly service to convert the custom greeting to MP3 audio that can be played back to the user.

The Amazon Polly service returns the greeting as a binary MP3 stream, so we take this IO stream, Base64 encode it, and place it as an encoded data URI in the src of the <audio> placeholder tags on our web page. We can then call .play() on that element to play the MP3 through the user’s speakers using the HTML5 audio element.

This is also the first time I have placed encoded data in the audio src attribute, rather than a link to a physical MP3 file, and I am glad to report that it worked a treat!
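Here is a minimal sketch of the back-end half of that flow: what the /upload, /compare and /speech endpoints need to do. It is not the code from the repo. FaceAppService and its method names are hypothetical. It assumes clients that behave like Aws::Rekognition::Client and Aws::Polly::Client, passed in so the logic can be tried without touching AWS. The SDK operations used (create_collection, delete_collection, index_faces, search_faces_by_image and synthesize_speech) are real, but the 80% match threshold and the ‘Joanna’ voice are my own choices.

```ruby
require "base64"

# Hypothetical sketch of the work behind /upload, /compare and /speech,
# kept separate from Sinatra. `rekognition` and `polly` are assumed to
# behave like Aws::Rekognition::Client and Aws::Polly::Client.
class FaceAppService
  def initialize(rekognition, polly, collection_id: "faceapp_test")
    @rekognition = rekognition
    @polly = polly
    @collection_id = collection_id
  end

  def create_collection
    @rekognition.create_collection(collection_id: @collection_id)
  end

  def delete_collection
    @rekognition.delete_collection(collection_id: @collection_id)
  end

  # /upload: index the face in the photo under the person's name.
  # ExternalImageId only allows letters, digits and the characters _ . - :
  def add_face(image_bytes, name)
    @rekognition.index_faces(
      collection_id: @collection_id,
      image: { bytes: image_bytes },
      external_image_id: name
    )
  end

  # /compare: best match as { id:, confidence: }, or nil when no face in the
  # collection clears the similarity threshold (a percentage).
  def compare(image_bytes, threshold: 80)
    resp = @rekognition.search_faces_by_image(
      collection_id: @collection_id,
      image: { bytes: image_bytes },
      max_faces: 1,
      face_match_threshold: threshold
    )
    best = resp.face_matches.first
    return nil unless best

    { id: best.face.external_image_id, confidence: best.similarity }
  end

  # /speech: synthesise the greeting and return it as a data URI that can
  # be assigned directly to an <audio> element's src attribute.
  def speech_data_uri(text, voice_id: "Joanna")
    resp = @polly.synthesize_speech(output_format: "mp3", text: text, voice_id: voice_id)
    "data:audio/mpeg;base64," + Base64.strict_encode64(resp.audio_stream.read)
  end
end
```

In the Sinatra app, a route such as post "/compare" would decode the posted photo, call compare, and render the hash as JSON. The real clients would come from Aws::Rekognition::Client.new and Aws::Polly::Client.new, configured with a region and the credentials from the .env step. As far as I know, search_faces_by_image raises an InvalidParameterException when it cannot find any face in the photo, rather than returning an empty list, so that route should rescue it and report an error.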
Finally, the app js file contains the greetingTime() function. All this does is work out whether to say ‘good morning/afternoon/evening’ depending on the user’s time of day. It is a lot of code for something so simple, but I wanted the custom greeting they hear to be tailored to their time of day. Thanks to James1x0 for his code snippet that I stole!

Lastly, let's look at the Ruby code for the Sinatra app.

Pretty straightforward Sinatra stuff here. The top is just the requires that we need for the various AWS SDK and other libraries. Then there is a block setting up the AWS authentication configuration, and the default collection name that we will be using (which you can feel free to change).

The rest of the code is simply the endpoints that Sinatra will listen out for. It listens for a GET on ‘/’ in order to display the actual web page to the end user, and it also listens out for POST calls to /upload, /compare and /speech, which the javascript file above posts data to. There are only about 3 or 4 lines of code in each of these endpoints to actually carry out the facial recognition and speech tasks, all documented in the AWS SDK documentation.

That’s about all that I can think of to share at this point. Please have fun with the project, and let me know what you end up building with it. Personally, I am using this project as a starting block for some amazing new features that I would love to have in our main web app, HR Partner.

Thoughts For Improvements

Amazon Rekognition and Polly are still fairly new, so the example apps and documentation are a little basic at the moment. However, this is probably to be expected, because the API is pretty easy to use compared to most of their other services.

One thing I wish they would improve upon is the console support for these services, especially the ability to check current image collections for Rekognition.
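Until the console catches up, the API calls themselves are not hard. Here is a minimal sketch that reports how many faces each collection holds. It assumes a client that behaves like Aws::Rekognition::Client. Both list_collections and list_faces are real SDK operations that return results a page at a time, with a next_token that is nil on the last page. The helper names count_faces and collection_face_counts are hypothetical.

```ruby
# Hypothetical helpers: count the faces stored in every Rekognition
# collection. `rekognition` is assumed to behave like
# Aws::Rekognition::Client; both calls are paginated via next_token.
def count_faces(rekognition, collection_id)
  total = 0
  token = nil
  loop do
    params = { collection_id: collection_id }
    params[:next_token] = token if token # only send a token after page 1
    page = rekognition.list_faces(**params)
    total += page.faces.size
    token = page.next_token
    break unless token
  end
  total
end

# Returns a hash of collection ID => number of faces indexed in it.
def collection_face_counts(rekognition)
  counts = {}
  token = nil
  loop do
    params = {}
    params[:next_token] = token if token
    page = rekognition.list_collections(**params)
    page.collection_ids.each { |id| counts[id] = count_faces(rekognition, id) }
    token = page.next_token
    break unless token
  end
  counts
end
```

With a real client, for example Aws::Rekognition::Client.new(region: "us-east-1") (the region is an assumption), collection_face_counts returns a hash keyed by collection ID, such as faceapp_test.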
At the moment it is impossible to see how many collections you have, and how many image descriptors (or image IDs) are in each collection, without resorting to pure API calls. Hopefully this will be improved in the future.

Oh, and I also wish there was an easier way to create the image JSON profile locally, without having to send the encoded image BLOB all the way to the Amazon servers for interpretation. I hope that a future version of the API may have a localised feature for doing this. At the moment, the round trip to the server after clicking ‘Compare Image’ can take up to 4 or 5 seconds to complete the identification.

Good luck, and enjoy your facial recognition/speech synthesis journey.


Building a face recognition web app in under an hour
