How to Spot a DeepFake in 2021

Written by whatsai | Published 2021/06/06
Tech Story Tags: deepfakes | deepfake | artificial-intelligence | ai | hackernoon-top-story | youtubers | youtube-transcripts | computer-vision | web-monetization

TLDR In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes. The first realistic deepfake didn't appear until 2017. We cannot see the difference between a real video or picture and a deepfake anymore. There is a website from MIT where you can test your ability to spot deepfakes. The paper covered here presents 'DefakeHop', a light-weight high-performance deepfake detector whose model is 500 times smaller than previous state-of-the-art techniques while outperforming them.

Wondering about the best ways to spot a deepfake? In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes.

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/spot-deepfakes

►Test your deepfake detection capacity: https://detectfakes.media.mit.edu/

►DefakeHop: Chen, Hong-Shuo et al., (2021), “DefakeHop: A Light-Weight High-Performance Deepfake Detector.” arXiv abs/2103.06929

►Saab Transforms: Kuo, C.-C. Jay et al., (2019), “Interpretable Convolutional Neural Networks via Feedforward Design.” J. Vis. Commun. Image Represent.

►OpenFace 2.0: T. Baltrusaitis, A. Zadeh, Y. C. Lim and L. Morency, “OpenFace 2.0: Facial Behavior Analysis Toolkit,” 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 59-66, doi: 10.1109/FG.2018.00019.

Video Transcript

00:00
While they seem like they've always been there, the very first realistic deepfake didn't appear
00:05
until 2017.
00:07
It went from the first automatically generated fake images that merely resembled someone to
00:13
today's identical copies of someone in videos, with sound.
00:16
The reality is that we cannot see the difference between a real video or picture and a deepfake
00:22
anymore.
00:23
How can we tell what's real from what isn't?
00:25
How can audio files or video files be used in court as proof if an AI can entirely generate
00:32
them?
00:33
Well, this new paper may provide answers to these questions. And the answer here may again
00:37
be the use of artificial intelligence.
00:40
The saying "I'll believe it when I'll see it" may soon change for "I'll believe it when
00:45
the AI tells me to believe it..."
00:47
I will assume that you've all seen deepfakes and know a little about them.
00:51
That will be enough for this article.
00:53
For more information about how they are generated, I invite you to watch the video I made explaining
00:58
deepfakes just below, as this video will focus on how to spot them.
01:04
More precisely, I will cover a new paper by the US DEVCOM Army Research Laboratory entitled
01:09
"DEFAKEHOP: A LIGHT-WEIGHT HIGH-PERFORMANCE DEEPFAKE DETECTOR."
01:14
Indeed, they can detect deepfakes with over 90% accuracy in all datasets and even reach
01:20
100% accuracy in some benchmark datasets.
01:24
What is even more incredible is the size of their detection model.
01:27
As you can see, this DeFakeHop model has merely 40 thousand parameters, whereas the other
01:33
techniques yielding much worse accuracy had around 20 million!
01:37
This means that their model is 500 times smaller (20 million ÷ 40 thousand) while outperforming the previous
01:43
state-of-the-art techniques.
01:44
This allows the model to run quickly on your mobile phone and lets you detect deepfakes
01:49
anywhere.
01:50
You may think that you can tell the difference between a real picture and a fake one, but
01:54
if you remember the study I shared a couple of weeks ago, it clearly showed that around
01:59
50 percent of participants failed.
02:01
It was basically a random guess on whether a picture was fake or not.
02:05
There is a website from MIT where you can test your ability to spot deepfakes if you'd
02:10
like to.
02:11
Having tried it myself, I can say it's pretty fun to do.
02:14
There are audio files, videos, pictures, etc.
02:17
The link is in the description below.
02:19
If you try it, please let me know how well you do!
02:21
And if you know any other fun apps to test yourself or help research by trying your best
02:26
to spot deepfakes, please link them in the comments.
02:29
I'd love to try them out!
02:31
Now, coming back to the paper, which can detect deepfakes much better than we can, the question
02:36
is: how is this tiny machine learning model able to achieve that while humans can't?
02:42
DefakeHop works in four steps.
02:44
Step 1:
02:45
At first, they use another model to extract 68 different facial landmarks from each video
02:50
frame.
02:51
These 68 points are used to locate the face, recenter, orient, and resize it for
02:57
consistency, and then extract specific parts of the face from the
03:02
image.
03:03
These are the "patches" of the image we will send our network, containing specific individual
03:09
face features like the eyes, mouth, nose.
03:12
This is done using another model called OpenFace 2.0.
03:16
It can accurately perform facial landmark detection, head pose estimation, facial action
03:22
unit recognition, and eye-gaze estimation in real-time.
03:26
These are all tiny patches of 32 by 32 that will all be sent into the actual network one
03:32
by one.
03:33
This makes the model super efficient because it deals with only a handful of tiny images
03:38
instead of the full image.
03:40
More details about OpenFace 2.0 can be found in the references below if you are curious
03:45
about it.
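
To make this preprocessing step more concrete, here is a minimal sketch of how it could look in Python. This is not the paper's code: I'm using dlib's 68-point landmark predictor as a stand-in for OpenFace 2.0, the region index ranges and the helper name are my own choices, and the alignment (recentering and orienting) is skipped for brevity.

```python
import cv2
import dlib
import numpy as np

# Stand-in for OpenFace 2.0: dlib also provides 68 facial landmarks.
# The model file must be downloaded separately from dlib's site.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_patches(frame, patch_size=32):
    """Crop 32x32 grayscale patches around face regions from one video frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    patches = []
    for face in detector(gray):
        landmarks = predictor(gray, face)
        pts = np.array([(p.x, p.y) for p in landmarks.parts()])  # 68 (x, y) points
        # Example regions: left eye (36-41), right eye (42-47), mouth (48-67)
        for region in (range(36, 42), range(42, 48), range(48, 68)):
            cx, cy = pts[list(region)].mean(axis=0).astype(int)
            half = patch_size // 2
            patch = gray[cy - half:cy + half, cx - half:cx + half]
            if patch.shape == (patch_size, patch_size):  # skip border cases
                patches.append(patch)
    return patches
```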
03:46
Steps 2 to 4 (left to right: blue, green, orange):
03:47
More precisely, the patches are sent to the first PixelHop++ unit, named Hop-1, as you
03:53
can see.
03:54
This first hop is shown in blue.
03:55
This unit uses an algorithm called the Saab transform to reduce the dimensionality.
03:58
It will take the 32 by 32 image and reduce it to a downscaled version of the image, but
04:04
with multiple channels representing its responses to different filters learned by the Saab
04:10
transform.
04:11
You can see the Saab transform as a convolution process, where the kernels are found using
04:16
the PCA dimension reduction algorithm, replacing the need for backpropagation to learn these
04:21
weights.
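
To show what "a convolution whose kernels come from PCA" can look like, here is a small sketch I wrote. It is a simplification under my own assumptions, not the paper's Saab implementation (the actual Saab transform also separates a DC component and adds a bias term, omitted here):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_conv_filters(images, k=3, n_filters=8):
    """Learn k x k convolution kernels as the PCA components of image patches.

    This mimics the spirit of the Saab transform: the filter weights come
    from PCA statistics instead of backpropagation.
    """
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patches.append(img[i:i + k, j:j + k].ravel())
    pca = PCA(n_components=n_filters).fit(np.array(patches))
    return pca.components_.reshape(n_filters, k, k)

def apply_filters(img, filters):
    """Valid-mode convolution: one output channel per PCA-derived filter."""
    n, k, _ = filters.shape
    h, w = img.shape
    out = np.zeros((n, h - k + 1, w - k + 1))
    for i in range(h - k + 1):
        for j in range(w - k + 1):
            out[:, i, j] = (filters * img[i:i + k, j:j + k]).sum(axis=(1, 2))
    return out  # a 32x32 patch yields (n_filters, 30, 30) responses
```

Feeding a 32 by 32 patch through apply_filters and then max-pooling gives exactly the kind of 15 by 15 multi-channel maps discussed next.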
04:22
I will come back to the PCA dimension reduction algorithm in a minute as it is repeated in
04:26
the next stage.
04:27
These filters are optimized to represent the different frequencies in the image, basically
04:32
getting activated by varying degrees of details.
04:35
The Saab transform was shown to work well against adversarial attacks compared to basic
04:40
convolutions trained with backpropagation.
04:43
You can also find more information about the Saab transformation in the references below.
04:47
If you are not used to how convolutions work, I strongly invite you to watch the video I
04:52
made introducing them:
04:56
I said Saab transforms work well against adversarial attacks.
05:00
These adversarial attacks happen when we "attack" an image by changing a few pixels or adding
05:06
noise that humans cannot see to change the results of a machine learning model processing
05:11
the image.
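
As a toy example of such an attack (my own generic illustration, not something from the paper), here is the classic fast-gradient-sign idea applied to a simple linear scorer:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=32 * 32)            # weights of a toy linear classifier
img = rng.uniform(0, 1, size=32 * 32)   # a flattened 32x32 "image"

eps = 0.01                              # perturbation too small for humans to notice
adversarial = img - eps * np.sign(w)    # nudge each pixel to lower the score

print(w @ img)          # original score
print(w @ adversarial)  # shifted by eps * sum(|w|), possibly flipping the decision
```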
05:12
So to simplify, we can basically see this PixelHop++ unit as a typical 3 by 3 convolution
05:19
here since we do not look at the training process.
05:21
Of course, it works a bit differently, but it will make the explanation much more straightforward
05:26
as the process is comparable.
05:28
Then, the "Hop" step is repeated three times to get smaller and smaller versions of the
05:33
image with concentrated general information and more channels.
05:37
These channels are simply the outputs, or responses, of the input image by filters that
05:42
react differently depending on the level of detail in the image, as I said earlier.
05:48
One new channel per filter used.
05:50
Thus, we obtain various results giving us precise information about what the image contains,
05:55
but these results get smaller and smaller, containing fewer of the spatial details unique to
06:00
that precise image sent into the network, and therefore hold more general and useful information
06:06
with regard to what the image actually contains.
06:09
The first few images are still relatively big, starting at 32 by 32, the initial
06:15
size of the patch, and thus contain all the details.
06:18
Then, it drops to 15 by 15, and finally to 7 by 7 images, meaning that we have close
06:24
to zero spatial information in the end. The 15 by 15 image will just look like a blurry
06:29
version of the initial image but still contains some spatial information, while the 7 by 7
06:35
image will basically be a very general and broad version of the image with close to no
06:40
spatial information at all.
06:43
So just like a convolutional neural network, the deeper we get, the more channels we have,
06:48
meaning that we have more filter responses reacting to different stimuli, but the smaller
06:53
they each are, ending with images of size 3 by 3.
06:56
This allows us to have a broader view in many ways, keeping a lot of unique, valuable information
07:02
even with smaller versions of the image.
07:05
The images get even smaller because each of the PixelHop units is followed by a max-pooling
07:12
step.
07:13
This simply takes the maximum value of each two by two square of pixels, halving each
07:17
spatial dimension and reducing the pixel count by a factor of four at each step.
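
For reference, 2 by 2 max-pooling fits in a few lines of NumPy. This is a generic sketch, not the paper's code:

```python
import numpy as np

def max_pool_2x2(x):
    """Keep the maximum of each 2x2 block, halving each spatial dimension.

    The pixel count drops by a factor of four: a 30x30 response map
    becomes 15x15, and a 15x15 map (trimmed to 14x14) becomes 7x7.
    """
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2  # trim odd edges
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

print(max_pool_2x2(np.arange(900.0).reshape(30, 30)).shape)  # (15, 15)
```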
07:20
Then, as you can see in the full model shown above, the outputs from each max-pooling layer
07:24
are sent for further dimension reduction using the PCA algorithm.
07:26
This is the third step, shown in green.
07:28
The PCA algorithm takes the current dimensions, for example, 15 by 15 here in
07:34
the first step, and reduces them while maintaining at least 90% of the energy of the input
07:40
image.
07:41
Here is a very simple example of how PCA can reduce the dimension, where two-dimensional
07:45
points of cats and dogs are reduced to one dimension on a line, allowing us to add a
07:51
threshold and easily build a classifier.
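
Here is a toy version of that cats-and-dogs illustration (the data is made up): 2-D points from two classes are projected onto their first principal component, and a single threshold on that 1-D line acts as the classifier.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Made-up 2-D features for two classes ("cats" and "dogs")
cats = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(50, 2))
dogs = rng.normal(loc=[3.0, 3.0], scale=0.3, size=(50, 2))
points = np.vstack([cats, dogs])
labels = np.array([0] * 50 + [1] * 50, dtype=bool)

# Project onto the first principal component: 2 dimensions -> 1
projected = PCA(n_components=1).fit_transform(points).ravel()

# A single threshold on the 1-D projection separates the classes
predictions = projected > projected.mean()
accuracy = (predictions == labels).mean()
accuracy = max(accuracy, 1 - accuracy)  # PCA's sign is arbitrary
print(f"accuracy: {accuracy:.2f}")      # ~1.00 on this toy data
```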
07:54
Each hop gives us respectively 45, 30, and 5 parameters per channel instead of having
08:00
images of size 15 by 15, 7 by 7, and 3 by 3, which would give us in the same order 225,
08:07
49, and 9 parameters.
08:11
This is a much more compact representation while maximizing the quality of information
08:16
it contains.
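
In scikit-learn terms, that "keep at least 90%" rule corresponds to passing a float to PCA's n_components. A hedged sketch with made-up response maps (the low-rank structure is artificial, just so PCA has something to exploit):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Made-up stand-in for one hop's outputs: 1000 samples of 15x15 = 225 values,
# built as low-rank structure plus noise so PCA can compress it
responses = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 225))
responses += 0.1 * rng.normal(size=(1000, 225))

# Keep just enough components to retain at least 90% of the variance ("energy")
pca = PCA(n_components=0.90).fit(responses)
print(pca.transform(responses).shape)  # roughly (1000, 10) instead of (1000, 225)
```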
08:17
All these steps were used to compress the information and make the network super fast.
08:22
You can see this as squeezing out all the helpful juice at different levels of detail of the
08:27
cropped image to finally decide whether it is fake or not, using both detailed and general
08:32
information in the decision process (step 4 in orange).
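
The video doesn't detail this final decision stage, so here is only a rough stand-in: a generic off-the-shelf classifier over placeholder features, with shapes loosely inspired by the 45 + 30 + 5 values per channel mentioned above. The paper's actual classification step may well differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder features: 45 + 30 + 5 = 80 values per channel, one row per frame
X_train = rng.normal(size=(200, 80))
y_train = rng.integers(0, 2, size=200)  # 1 = fake, 0 = real (made-up labels)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Per-frame fake probabilities for a new clip, then a video-level decision
frame_probs = clf.predict_proba(rng.normal(size=(5, 80)))[:, 1]
print("fake" if frame_probs.mean() > 0.5 else "real")
```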
08:35
I'm glad to see that research on countering these deepfakes is also advancing, and I'm
08:39
excited to see what will happen in the future with all that.
08:43
Let me know in the comments what you think will be the main consequences and concerns
08:47
regarding deepfakes.
08:48
Is it going to affect law, politics, companies, celebrities, ordinary people?
08:53
Well, pretty much everyone...
08:55
Let's have a discussion to raise awareness and spread the word that we need to be careful
09:00
and that, unfortunately, we cannot believe everything we see anymore.
09:02
This is both an incredible and dangerous new technology.
09:06
Please, do not abuse this technology and act ethically.
09:10
The goal here is to help improve this technology and not to use it for the wrong reasons.         



Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/06/06