Last week, Evan Spiegel of Snap Inc. unveiled the company’s first hardware product, Spectacles, to a few journalists. Wall Street Journal writer Seth Stevenson recalls how Spiegel invited him into “a small conference room” where he “draped a towel over a mysterious object sitting on a table”, and describes Spiegel as “eager to the point of jitters”.
Perhaps “eager” isn’t the right adjective.
So a white-hot, consumer-focused company with a $20 billion valuation reveals its first-ever spiffy gadget in a cramped conference room… with a towel.
What a complete and utter lack of fanfare.
What changed in Silicon Valley?
Were all the stages and conference halls booked in Palo Alto that day?
This is the flagship hardware launch of one of the hottest entities in the valley. Surely they’d want to build buzz around their new device.
Yet… no spotlights or smoke machines? No music? No crowd? Not even a black turtleneck?
None in sight.
Additionally, the product, which Spiegel refers to as a “toy”, doesn’t seem meaningfully different from Epiphany Eyewear, the video-recording glasses developed by Vergence Labs, which Snapchat acquired in 2014.
Tech journalists know that Snap Inc. has been working on a secret project for months, possibly years, all the while hiring some of the best electronics and robotics talent in the industry. And this massive effort finally culminated in… a toy?
Could the recent press release about Spectacles be an elaborate distraction, drawing attention away from a less savory product?
A patent published in July by the United States Patent and Trademark Office suggests Snapchat may have developed a facial-recognition device that displays personal information within seconds of a facial scan. The patent details a means of “executing a facial recognition technique against an individual face within the image to obtain a recognized face”.
This patent follows the acquisition of Vergence Labs, developer of Epiphany Eyewear (a product similar to Google Glass), as well as a string of high-profile hires from the consumer electronics industry. These newly hired hardware specialists reportedly joined a secret research and development lab, according to a March article by CNET. Their previous work spans wireless-video doorbells, security cameras, robotic Star Wars toys, Google Glass, GoPro, and the Oculus VR headset, according to a recent Financial Times article.
That same article reported Snap Inc. was “looking at pretty much every AR startup with computer vision skills” as a possible acquisition target.
Until last Saturday, Snapchat had not publicly announced any plans to develop hardware; nothing was teased until rumors started circulating. The conclusion became a slam dunk when Financial Times journalists discovered Snap Inc. had paid to join the Bluetooth consortium, a move they called a “clear signal of intent” to develop hardware.
So, if press coverage of their secret operation was the catalyst for passing off Spectacles, essentially a two-year-old product, as something new, what is it they’re really working on?
To use Snapchat’s silly-cartoon-face-making “Lenses” feature, the user is instructed to tap on their face, which initiates a facial scan. The scan captures the user’s face for a seemingly temporary period so they can apply dog ears and rainbow barf to their heart’s content and send the results to their friends.
Accessing “Lenses” Feature Initiates Facial Scan
Adam Geitgey’s Medium article Modern Face Recognition with Deep Learning explains how accurate facial recognition relies on a system’s ability to “pick out unique features of the face that you can use to tell it apart from other people — like how big the eyes are, how long the face is, etc”. And the system must also be able to “compare the unique features of that face to all the people you already know to determine the person’s name.”
One method of face recognition is to program a system to compare measurements of obvious facial landmarks, like the outside edges of the eyes or the distance from the top of the chin to the mouth. But the most accurate way for a system to reliably recognize a face is to let it decide which measurements matter most by feeding it millions of faces.
Determining these mysterious measurements is resource-intensive, but the results are highly accurate. Luckily, projects like OpenFace have already processed the millions of face images necessary to discover the 128 measurements that make for an accurate result. Using such a system, any 10 different pictures of the same person should give roughly the same measurements.
In machine learning, capturing these vital 128 facial measurements is called “embedding”, and the resulting measurements are unique to almost every human being.
To capture a person’s facial signature, an algorithm first encodes their facial features using a method called HOG (Histogram of Oriented Gradients), which outputs a simplified image: essentially a flattened, centered rendering of the subject’s primary facial features. That output is then passed through a neural network that knows which 128 measurements to make and saves them.
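To make that pipeline concrete, here is a minimal sketch using Adam Geitgey’s open-source face_recognition library (which wraps dlib and implements exactly this HOG-then-embed approach). The image filenames are hypothetical placeholders, and the pretrained models are the library’s, not Snapchat’s or ours.

```python
# Minimal sketch of the HOG-then-embed pipeline, assuming two photos of the
# same person exist at these hypothetical paths and each contains one face.
import face_recognition

image_a = face_recognition.load_image_file("person_photo_1.jpg")
image_b = face_recognition.load_image_file("person_photo_2.jpg")

# Step 1: locate faces using the HOG-based detector.
locations_a = face_recognition.face_locations(image_a, model="hog")
locations_b = face_recognition.face_locations(image_b, model="hog")

# Step 2: run each detected face through the pretrained network, which
# outputs the 128 measurements (the "embedding") for that face.
encoding_a = face_recognition.face_encodings(image_a, locations_a)[0]
encoding_b = face_recognition.face_encodings(image_b, locations_b)[0]

# Two photos of the same person should yield nearby embeddings; the library
# conventionally treats a Euclidean distance under 0.6 as the same person.
distance = face_recognition.face_distance([encoding_a], encoding_b)[0]
print(f"Embedding distance: {distance:.3f}")
```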
With a face captured, all the system has to do to identify someone is compare its measurements against those captured for other people and figure out whose measurements are closest to find a match.
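The matching step is equally simple to sketch. The names and image paths below are hypothetical, and the 0.6 cutoff is the face_recognition library’s conventional threshold, not a universal constant.

```python
# Sketch of the matching step: embed an unknown face, then find the closest
# stored embedding. All names and file paths here are hypothetical.
import numpy as np
import face_recognition

known_names = ["Alice", "Bob"]
known_encodings = [
    face_recognition.face_encodings(face_recognition.load_image_file(path))[0]
    for path in ["alice.jpg", "bob.jpg"]
]

unknown = face_recognition.face_encodings(
    face_recognition.load_image_file("who_is_this.jpg"))[0]

# Distance from the unknown face to every known embedding; the smallest wins,
# provided it falls under the match threshold.
distances = face_recognition.face_distance(known_encodings, unknown)
best = int(np.argmin(distances))
if distances[best] < 0.6:
    print(f"Match: {known_names[best]} (distance {distances[best]:.3f})")
else:
    print("No match found")
```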
Facial Recognition Training
In the alpha version of HiringSolved’s AR tech, we trained our system to identify me, Christopher Murray, by telling it my name and then capturing 50 frames of my face. These images were encoded using the HOG method, and the output was passed through a neural network that captured the 128 measurements which uniquely identify me.
By capturing these measurements, we succeeded in “embedding” my facial signature so it could be retrieved later whenever another image’s measurements matched.
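Our trainer itself isn’t public, but a rough approximation of the enrollment idea, capturing many webcam frames, embedding each, and averaging them into one stored signature, might look like the sketch below. OpenCV handles the capture; the function name and the 50-frame count simply mirror the description above.

```python
# Rough sketch of an enrollment ("trainer") step: capture frames from a
# webcam, embed each detected face, and average them into one signature.
import cv2
import numpy as np
import face_recognition

def enroll(name, num_frames=50):
    """Capture num_frames of a face and return (name, averaged embedding)."""
    cap = cv2.VideoCapture(0)
    encodings = []
    while len(encodings) < num_frames:
        ok, frame = cap.read()
        if not ok:
            continue
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # OpenCV is BGR; dlib wants RGB
        faces = face_recognition.face_encodings(rgb)
        if faces:                        # keep only frames where a face was found
            encodings.append(faces[0])
    cap.release()
    # Averaging across frames smooths out per-frame noise in the measurements.
    return name, np.mean(encodings, axis=0)

# Example: name, signature = enroll("Christopher Murray")
#          np.save(f"{name}.npy", signature)   # persist for later lookup
```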
Since we intend for our tech to eventually be used in real time so recruiters can easily identify talent at a conference, we built an app that encodes live faces from a camera feed and passes them to a neural network. When the system finds measurements that match, it retrieves not only the person’s name but also their work history, job title, certifications, and location, all overlaid in real time within seconds of scanning a face! (A rough sketch of this loop follows below.)
Display in Real-Time — Taken Using Droid Turbo 2
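A bare-bones approximation of that real-time loop is sketched below. The profile record and the saved signature files are hypothetical stand-ins for a real talent database, and this naive version would hit exactly the smartphone frame-rate issues discussed shortly; our production workaround is out of scope here.

```python
# Bare-bones sketch of a real-time recognize-and-overlay loop. Profile data
# and the .npy signature files (saved by an enrollment step like the one
# above) are hypothetical.
import cv2
import numpy as np
import face_recognition

profiles = {"Christopher Murray": "Engineer | HiringSolved | Phoenix, AZ"}
known_names = list(profiles)
known_encodings = [np.load(f"{name}.npy") for name in known_names]

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    locations = face_recognition.face_locations(rgb, model="hog")
    encodings = face_recognition.face_encodings(rgb, locations)
    for (top, right, bottom, left), enc in zip(locations, encodings):
        distances = face_recognition.face_distance(known_encodings, enc)
        best = int(np.argmin(distances))
        if distances[best] < 0.6:
            label = f"{known_names[best]}: {profiles[known_names[best]]}"
        else:
            label = "Unknown"
        # Overlay a box and the person's details directly on the live frame.
        cv2.rectangle(frame, (left, top), (right, bottom), (0, 255, 0), 2)
        cv2.putText(frame, label, (left, top - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("recognizer", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```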
Spectacles may not be the final form of Snap Inc.’s covert electronics project, but it does reveal some hardware advantages over a smartphone camera. First, by making their own camera hardware, they are no longer at the whim of Samsung or Apple. Spectacles’ special lens also captures a wider field of view.
Not least among the advantages is the ability to overcome the frame-rate issues that arise when conducting a facial scan on smartphone hardware. Our engineers used a workaround for this obstacle, but specialized hardware could easily make for a smoother, more instantaneous facial scan.
On the privacy center page of Snapchat’s website, they claim that the Lenses feature uses something called “object recognition”, which “isn’t the same as facial recognition”. Despite that claim, the technology they say “lets us know that a nose is a nose or an eye is an eye” bears a striking resemblance to the HOG algorithm we used to encode facial features for use with the neural net.
Snapchat’s Official Privacy Statement on “Lenses” Facial Data
The general vibe from Snapchat is that they’re not interested in collecting facial-recognition data, but they have a history of going against their general vibe when it suits them. The promise that photos would be deleted after a short life proved short-lived once features like “Memories” made their data retention obvious.
While we were able to build software that recognizes faces, the only way to make it broadly usable would be to collect a facial signature from everybody, using something like the “trainer” software that captured the 50 frames of my face above.
How could we possibly convince a population to undergo such a scan?
If only there was a way to convince people. We could make it fun. And social. Let people invite their friends. Spread it. Let them put silly cartoon items on their face. Then promise we’re deleting it all. We’d have some of the most valuable personal information in the world if this catches on.
“On any given day, the app reaches 41 percent of all 18-to-34-year-olds in the United States” — WSJ
Snapchat lulls users in with a perceived sense of privacy. They insinuate that everything you share will be stricken from the record. To users it appears that way, but of course that’s not the case.
The draw of Snapchat was photos and video that would self-destruct. No need to explain your brief stint as a Rastafarian to potential employers. No evidence of that time you let your freak flag fly if you decide to run for mayor. Your wild days were hidden, as was your right.
Snap Inc. discovered that people are very willing to share intimate things — as long as it’s impermanent.
Given the unenthusiastic reception of Spectacles in the Wall Street Journal article’s comment section and on social media, Snap Inc.’s game plan for revenue is tough to pin down.
Even the WSJ article admits Snap Inc.’s “hunt for revenue was, and remains, uncertain. A huge number of daily users does not guarantee success — just ask Twitter.”
Perhaps they’ve got something else up their sleeve. Just a clickbait-y thought.
Of course, all of this is just conjecture. Perhaps Spectacles has a major application that I’m all too ignorant of. Perhaps Snapchat’s advertising endgame is more sophisticated than I’m giving them credit for.
But the major issue with the notion that they’re not building hardware to exploit their facial-recognition capabilities is how foolish it would be not to.
Yes, they’re not saying they intend to do this. But then, Facebook never mentioned it would be selling user information to advertisers either.
Just as Facebook harnessed a database of personal details given to it willingly by users, Snapchat has accrued a database of facial profiles attached to names. They may already have an index of users’ facial data, built using a supervised machine-learning technique.
Even if they’re not mapping faces already, they easily could. It’s just a matter of running an algorithm on images they already have and processing the encoded data through a neural net.
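To illustrate how little that would take, here is a hypothetical batch pass over a stored photo archive. The directory layout and the filename convention tying each image to an account are invented for the example; nothing here reflects Snapchat’s actual storage.

```python
# Hypothetical batch pass: embed every face in an existing photo archive and
# index the 128 measurements under the account each image belongs to.
import os
import numpy as np
import face_recognition

archive = "stored_snaps"             # hypothetical directory of user photos
collected = {}                       # account name -> list of embeddings

for filename in os.listdir(archive):
    user = filename.split("_")[0]    # assume files are named "<user>_<id>.jpg"
    image = face_recognition.load_image_file(os.path.join(archive, filename))
    for encoding in face_recognition.face_encodings(image):
        collected.setdefault(user, []).append(encoding)

# Collapse each user's embeddings into a single facial signature.
index = {user: np.mean(encs, axis=0) for user, encs in collected.items()}
print(f"Indexed facial signatures for {len(index)} users")
```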