Recent developments in artificial intelligence-based image synthesis have endowed machines with the ability to generate photos and videos of the real world with an accuracy that would have seemed impossible only a few years ago. While the incredible applications for this technology are only starting to be explored, recent press coverage of fake videos shows us that malicious use of this type of technology is around the corner.
Although creating a fake video of sufficient quality to fool most people most of the time is both expensive and time-consuming, year-on-year developments in capabilities are seeing this technology commoditized at breakneck speed. Current efforts seem laughable at times, but the mere existence of deep learning-based fakery (or “deepfakes”) has already started casting doubt on our own senses, on criminal evidence, and on the institutions we trust.
In parallel with the development of deepfake technology, AI is also being developed to counter this threat: machines trained to detect malicious alterations in video for the inevitable future where we find ourselves unable to detect the forgeries ourselves.
An arms race between two fields of AI research, which may very well go on forever.
But can we tackle this challenge through a different lens?
We’ve been dealing with faked electronic data for as long as we’ve had computer networks.
Logins and passwords, chip-and-PIN cards, digital certificates — these are all forms of electronic identification and authentication that we come across in our everyday lives. The infrastructure around us — healthcare, finance, justice systems and many others — also relies on electronic identification and authentication for trust. Digital certificates and electronic signatures are constantly being exchanged in the background of our lives, all but invisible to us.
While most of the conversation to counter the looming threat of fake video to society centers around detecting malicious alterations in video by forensically analyzing the characteristics of the video — deploying good AI to fight bad AI — we can instead protect ourselves from fake video by making a few changes to the way we record and consume video content.
Cryptographically signing video at the source provides us with evidence that the video came from the device that recorded it, untouched.
Software to analyze and detect fake video is easy to integrate, as this type of software sits at the distribution stage, between when a video is uploaded and when it is played. Comparatively, authenticating video is harder to coordinate, as there are numerous stakeholders, from camera makers through to distributors, who have to agree on and adopt an authentication standard. Bringing organizations together to collaborate is how we have standards for interoperability in numerous areas of society. Just because it is hard doesn’t mean we shouldn’t try — especially if the risks justify the effort.
There are real lives at stake.
Some argue that we’ve had “fake” text and “Photoshopped” images for a while, and that video is no different, fundamentally: humans will adapt to fake video the same way they have to altered images. But even if we take the skeptical view that society eventually reorients itself, the consequences in the short term are real, as we saw with election meddling through fake news in 2016.
While it seems like video is just another step in an inevitable sequence (text → images → video), the human relationship to video is far more powerful and unique than to the other forms (we will expound on this in a future blog). To analyze the scale of the risk and impact of fake video, we need to consider a trifecta of trends: a video-first world, social networks providing targeted distribution at scale, and sophisticated deep learning AI creating realistic content at scale.
We have not yet seen a scaled, coordinated deployment of AI to create and release a mass proliferation of fakes, but the foundation for it has been sown.
In a world where we can’t trust our eyes and ears, how will we interpret events and situations?
Deepfakes that seek to deceive and spread disinformation are a problem that needs to be combated, so what are our options?
The most commonly discussed approach is to counter deepfakes with software that forensically analyzes video. Software like this may examine the characteristics of the audio and video data itself, looking for artifacts, abnormal compression signatures, or camera or microphone noise patterns. Aside from the characteristics of the data, AI may also analyze the video metadata, or even perform behavior pattern analysis on the subjects of the video.
So-called good AI used to police the bad AI.
The challenge with this approach is that the bad AI will have a feedback loop: it learns whenever a video it generated has been flagged as fake or penalized in search results.
And it will continuously learn from the knowledge.
A deepfake-generating system can keep producing content, learning with certainty what is and isn’t detectable, while the good AI will always be one step behind, slowly learning but never basing its newfound knowledge on certainty about which content was, or wasn’t, fake.
Good AI is like Norton Antivirus software: it will catch most of the viruses but not all.
There will be false positives and there will be false negatives. Ultimately, the bad AI only needs to get one video through as a false negative to deceive its intended target.
Countering fake video doesn’t have to be an arms-race.
Instead of focusing on a remedy, let’s look at the problem from the prevention side: authentication.
Video authentication works by processing video data through a hashing algorithm, which maps a collection of video data (for instance, a file) to a small string of text, or “fingerprint”. A video file’s fingerprint can travel with it throughout the life of that video, from capture through to distribution. At playback, those fingerprints are reconfirmed, proving the authenticity of the video data and confirming it is the same video that was originally recorded.
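To make the fingerprinting step concrete, here is a minimal Python sketch that maps a file to a SHA-256 fingerprint by hashing it in chunks. The function name and chunk size are ours for illustration only; a production system would define its own container-aware scheme.

```python
import hashlib

def fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """Map a video file to a short hex 'fingerprint' via SHA-256.

    Hashing in chunks keeps memory use constant even for large files.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()
```

Recomputing the fingerprint at playback and comparing it to the one recorded at capture is what “reconfirming” means here: any change to the bytes, however small, yields a completely different digest.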
This fingerprint could also be digitally signed by the recording device, providing evidence of where the content originally came from, along with the device details and other metadata — whether that is a CCTV camera, a first responder’s body camera, a journalist’s registered equipment, or the mobile app of a concerned citizen.
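In practice a device would sign fingerprints with a hardware-backed key using a standard scheme such as Ed25519 or ECDSA. Purely to illustrate the principle with nothing but hash functions, here is a toy Lamport one-time signature in Python: the device keeps random secrets, publishes their hashes as a public key, and signing a fingerprint reveals one secret per bit. This is a teaching sketch, not the scheme any real camera would use.

```python
import hashlib
import secrets

def keygen(bits: int = 256):
    """Lamport one-time keypair: secret random values; public key is their hashes."""
    sk = [[secrets.token_bytes(32), secrets.token_bytes(32)] for _ in range(bits)]
    pk = [[hashlib.sha256(x).digest() for x in pair] for pair in sk]
    return sk, pk

def sign(fingerprint: bytes, sk):
    """Reveal one secret per bit of the fingerprint (usable only once)."""
    bits = int.from_bytes(fingerprint, "big")
    n = len(sk)
    return [sk[i][(bits >> (n - 1 - i)) & 1] for i in range(n)]

def verify(fingerprint: bytes, sig, pk) -> bool:
    """Anyone holding the public key can check the signature; no secret needed."""
    bits = int.from_bytes(fingerprint, "big")
    n = len(pk)
    return all(
        hashlib.sha256(sig[i]).digest() == pk[i][(bits >> (n - 1 - i)) & 1]
        for i in range(n)
    )
```

The asymmetry is the point: the device alone can sign, but any distributor or viewer can verify against the published public key, which is what ties a fingerprint back to a specific recording device.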
When a viewer watches a recording, they could inspect an authenticity certificate, reviewing the chain of custody for that file. During playback, there should even be a visual representation of where in a video fingerprints match the original recorded content (it is authentic) and don’t match (authenticity cannot be guaranteed). This is similar to how a browser’s lock icon simply and clearly communicates authenticity of the site you are visiting.
Amber’s UI uses a border around the video to communicate where it has been altered
This form of video authentication would entail:

- If the fingerprints do not match, the video has been altered.
- If the fingerprints match, the video is authentic and unaltered.
The results are binary: match or non-match. As this approach relies on a cryptographic hash function, it is very, very difficult — likely impossible in practice — to maliciously alter a video so that it produces the same fingerprint while still being playable.
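Taking fingerprints per segment rather than per file is what lets a player show *where* a video stops matching the original. The sketch below uses fixed-size byte segments as a simplification; a real system would more likely segment along frames or groups of pictures. The names and segment size are ours for illustration.

```python
import hashlib

SEG_SIZE = 4096  # illustrative; real systems would segment along frame boundaries

def segment_fingerprints(data: bytes, seg_size: int = SEG_SIZE) -> list:
    """Fingerprint each segment so playback can localize alterations."""
    return [
        hashlib.sha256(data[i:i + seg_size]).hexdigest()
        for i in range(0, len(data), seg_size)
    ]

def verify_segments(data: bytes, recorded: list, seg_size: int = SEG_SIZE) -> list:
    """True where a segment still matches the fingerprint taken at capture."""
    current = segment_fingerprints(data, seg_size)
    return [c == r for c, r in zip(current, recorded)]
```

Each `False` in the result marks a stretch of the video whose bytes no longer match the capture-time fingerprints — the kind of information a player could render as a colored border around the affected portion.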
The net result would be analogous to Transport Layer Security technology used in browsers, creating a “truth layer” for media files.
It is critical that these fingerprints are stored in an immutable yet transparent database so multiple stakeholders can have confidence in the veracity of the video. If not, a bad actor could alter both the video and the original fingerprints to make an altered video seem authentic. Or they may alter just the fingerprints themselves, to sow doubt as to the legitimacy of a genuine video.
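One minimal way to make a fingerprint store tamper-evident is to chain entries together, with each entry committing to the hash of the one before it. The toy ledger below illustrates the idea in Python; a production system would use an actual blockchain or a public transparency log rather than this in-memory stand-in.

```python
import hashlib
import json

class FingerprintLedger:
    """Append-only ledger: each entry commits to the previous one,
    so silently rewriting an old fingerprint breaks the chain."""

    def __init__(self):
        self.entries = []

    def append(self, video_id: str, fingerprint: str) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        body = {"video_id": video_id, "fingerprint": fingerprint, "prev": prev}
        body["entry_hash"] = hashlib.sha256(
            json.dumps({k: body[k] for k in ("video_id", "fingerprint", "prev")},
                       sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(body)
        return body

    def verify_chain(self) -> bool:
        """Recompute every hash; any rewritten entry invalidates the chain."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: e[k] for k in ("video_id", "fingerprint", "prev")}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["entry_hash"] != expected:
                return False
            prev = e["entry_hash"]
        return True
```

Because each entry’s hash depends on its predecessor’s, a bad actor cannot quietly rewrite one fingerprint without invalidating every entry that follows — which is exactly the property an immutable, transparent store needs.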
For authenticated video to become commonplace in the categories where it matters, such as news or video evidence — that is, most video does not need to be authenticated and signed — the key challenge is bringing on board a number of key participants. These groups include camera manufacturers (including smartphone makers) and distributors, from traditional TV and radio broadcasters to new media platforms like Twitter, Facebook and YouTube.
In a world where we should be skeptical of our own eyes and ears (and what they are interpreting), authentication is a system design where truth is baked in at a foundational layer.
Of course, video authentication, or fake video detection, won’t stop every abuse. The choices in a system’s design could, though, prevent the spread of an altered video, or allow a viewer to compare the edited footage they’re watching against the source footage, removing some of the fuel from the fire for those who seek to discredit or misinform.
If the Rodney King beating happened today, would people doubt the legitimacy of the video, dismissing it as a deepfake? The existence of deepfakes will cast a shadow over authentic videos.
Authentication technology is not required for the latest blockbuster or the latest image filter on our social media posts. But fact-based content, especially where the stakes are high, should be recorded with trust taken into account from the outset.
At Amber, we have created a video authentication system, or “truth layer,” for video where trust is built into the foundational layer. This system includes a video recording app, storage of fingerprints and an audit trail as an immutable blockchain record, and a site to play back reconfirmed videos.
Our video authentication system mirrors the Apple App Store in a number of important respects. An iPhone (unless jailbroken) can only download and run apps from the official App Store. Apps on the App Store have been assessed for compliance with security policies. Each app was submitted by a verified developer, vetted at a level of stringency based on whether the submitter is an individual developer or a company.
Apple has created a security ring around apps in its ecosystem and thus instilled confidence in its customers of the safety and efficacy of the applications they choose to download from the App Store.
If you download an app on your iPhone, you can be almost certain it will function as claimed in its App Store description. The app (and its developer) were authenticated within the Apple ecosystem. Your phone is part of the ecosystem. And there is a lot more trust within the ecosystem.
Just as a transaction on an e-commerce site is passed through fraud-detection software even after the authentication checks have passed, Amber takes a combined approach: not only generating fingerprints of content, but also deploying the latest good-AI developments to detect fakes. We perform various analyses on videos uploaded to our platform to provide a measure not only of the likelihood that a video was altered, but of where in the image the alteration is likely to have occurred.
Video authentication is an important approach, but it may not be foolproof on its own: for example, someone could use a camera (with video fingerprinting tech) to film a screen playing a manipulated video. Using software to detect fakes and manipulations is an important complement to authentication.
At Amber, we strongly believe that one of the greatest impacts on humanity has been the adoption of the scientific method and its premise of evidence-based conclusions. We are on a mission to protect the truth of aural and visual evidence.
Deepfakes, and malicious AI in general, can cause great harm in the near future. We need to preempt the challenge today and video authentication is a critical piece of the solution.
Want to participate?
2. Contact us to find out if our verification products could be right for you: https://ambervideo.co
3. Leave us your thoughts below or send us a message: https://twitter.com/ambervid
Amber is developing the “truth layer” for video. Amber Authenticate fingerprints source videos and tracks their provenance. Amber Detect analyzes synthetic videos of unknown origin using advanced AI and counter-deepfake systems.
Contact us if you would like to find out more about what we do and how you can seamlessly integrate the Authenticate and Detect products into your workflow for frictionless video veracity, at scale.
Thank you to Roderick Hodgson, Shaan Puri, and Sikander Mohammed Khan for all your input and feedback on this post.