How to Spot a DeepFake in 2021

by @whatsai

In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes. The first realistic deepfake didn't appear until 2017. We can no longer see the difference between a real video or picture and a deepfake. There is a website from MIT where you can test your ability to spot deepfakes. The paper covered here presents 'DefakeHop', a light-weight, high-performance deepfake detector. The model is about 500 times smaller than the previous state-of-the-art techniques while outperforming them.

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

Wondering about the best ways to spot a deepfake? In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes.

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/spot-deepfakes

►Test your deepfake detection capacity: https://detectfakes.media.mit.edu/

►DefakeHop: Chen, Hong-Shuo et al., (2021), “DefakeHop: A Light-Weight High-Performance Deepfake Detector.” ArXiv abs/2103.06929

►Saab Transforms: Kuo, C.-C. Jay et al., (2019), “Interpretable Convolutional Neural Networks via Feedforward Design.” J. Vis. Commun. Image Represent.

►OpenFace 2.0: T. Baltrusaitis, A. Zadeh, Y. C. Lim and L. Morency, "OpenFace 2.0: Facial Behavior Analysis Toolkit," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 59-66, doi: 10.1109/FG.2018.00019.

Video Transcript

00:00

While they seem like they've always been there, the very first realistic deepfake didn't appear

00:05

until 2017.

00:07

We went from those first automatically generated fake images that merely resembled someone to today's

00:13

identical copy of someone on videos, with sound.

00:16

The reality is that we cannot see the difference between a real video or picture and a deepfake

00:22

anymore.

00:23

How can we tell what's real from what isn't?

00:25

How can audio files or video files be used in court as proof if an AI can entirely generate

00:32

them?

00:33

Well, this new paper may provide answers to these questions. And the answer here may again

00:37

be the use of artificial intelligence.

00:40

The saying "I'll believe it when I see it" may soon change to "I'll believe it when

00:45

the AI tells me to believe it..."

00:47

I will assume that you've all seen deepfakes and know a little about them,

00:51

which will be enough for this article.

00:53

For more information about how they are generated, I invite you to watch the video I made explaining

00:58

deepfakes just below, as this video will focus on how to spot them.

01:04

More precisely, I will cover a new paper by the US DEVCOM Army Research Laboratory entitled

01:09

"DefakeHop: A Light-Weight High-Performance Deepfake Detector."

01:14

Indeed, they can detect deepfakes with over 90% accuracy in all datasets and even reach

01:20

100% accuracy in some benchmark datasets.

01:24

What is even more incredible is the size of their detection model.

01:27

As you can see, this DefakeHop model has merely 40 thousand parameters, whereas the other

01:33

techniques yielding much worse accuracy had around 20 million!

01:37

This means that their model is about 500 times smaller while outperforming the previous

01:43

state-of-the-art techniques.

01:44

This allows the model to run quickly on your mobile phone and lets you detect

01:49

deepfakes anywhere.

01:50

You may think that you can tell the difference between a real picture and a fake one, but

01:54

if you remember the study I shared a couple of weeks ago, it clearly showed that around

01:59

50 percent of participants failed.

02:01

It was basically a random guess on whether a picture was fake or not.

02:05

There is a website from MIT where you can test your ability to spot deepfakes if you'd

02:10

like to.

02:11

Having tried it myself, I can say it's pretty fun to do.

02:14

There are audio files, videos, pictures, etc.

02:17

The link is in the description below.

02:19

If you try it, please let me know how well you do!

02:21

And if you know any other fun apps to test yourself or help research by trying your best

02:26

to spot deepfakes, please link them in the comments.

02:29

I'd love to try them out!

02:31

Now, if we come back to the paper able to detect them much better than we can, the question

02:36

is: how is this tiny machine learning model able to achieve that while humans can't?

02:42

DefakeHop works in four steps.

02:44

Step 1:

02:45

At first, they use another model to extract 68 different facial landmarks from each video

02:50

frame.

02:51

These 68 points are extracted to understand where the face is, recenter, orient and resize

02:57

it to make the faces more consistent, and then extract specific parts of the face from the

03:02

image.

03:03

These are the "patches" of the image we will send to our network, containing specific individual

03:09

face features like the eyes, mouth, nose.

03:12

It is done using another model called OpenFace 2.0.

03:16

It can accurately perform facial landmark detection, head pose estimation, facial action

03:22

unit recognition, and eye-gaze estimation in real-time.

03:26

These are all tiny patches of 32 by 32 that will all be sent into the actual network one

03:32

by one.

03:33

This makes the model super efficient because it deals with only a handful of tiny images

03:38

instead of the full image.

03:40

More details about OpenFace 2.0 can be found in the references below if you are curious

03:45

about it.
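
To make Step 1 more concrete, here is a minimal sketch of how 32 by 32 patches could be cropped once the 68 landmarks are known. It does not call OpenFace 2.0; the landmarks are assumed to arrive as a 68 x 2 array of (x, y) pixel coordinates in the common 68-point layout, and the region names, the margin value and the helper functions are illustrative choices rather than the authors' code.

```python
import numpy as np

# Index ranges in the common 68-point landmark layout (0-based).
# The region names are hypothetical labels chosen for this sketch.
REGIONS = {
    "eye_a": range(36, 42),
    "eye_b": range(42, 48),
    "mouth": range(48, 68),
}


def crop_patch(frame, points, size=32, margin=0.3):
    """Crop a square around `points` and resize it to size x size.

    Resizing is done by nearest-neighbour sampling to keep the sketch
    dependency-free; a real pipeline would likely use proper interpolation.
    """
    height, width = frame.shape[:2]
    cx, cy = points.mean(axis=0)
    extent_x = points[:, 0].max() - points[:, 0].min()
    extent_y = points[:, 1].max() - points[:, 1].min()
    half = 0.5 * (1 + margin) * max(extent_x, extent_y, 1.0)
    xs = np.round(np.linspace(cx - half, cx + half, size)).astype(int)
    ys = np.round(np.linspace(cy - half, cy + half, size)).astype(int)
    xs = np.clip(xs, 0, width - 1)
    ys = np.clip(ys, 0, height - 1)
    return frame[np.ix_(ys, xs)]


def extract_patches(frame, landmarks):
    """Return one 32 x 32 patch per facial region."""
    return {
        name: crop_patch(frame, landmarks[list(indices)])
        for name, indices in REGIONS.items()
    }
```

Each returned patch has shape (32, 32, 3) for a colour frame, so the detector only ever sees a handful of tiny images per frame.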

03:46

Steps 2 to 4 (left to right, blue, green, orange):

03:47

More precisely, the patches are sent to the first PixelHop++ unit named Hop-1, as you

03:53

can see.

03:54

This is the second step, shown in blue.

03:55

It uses an algorithm called the Saab transform to reduce the dimension.

03:58

It will take the 32 by 32 image and reduce it to a downscaled version of the image but

04:04

with multiple channels representing its response from different filters learned from the Saab

04:10

transform.

04:11

You can see the Saab transform as a convolution process, where the kernels are found using

04:16

the PCA dimension reduction algorithm, replacing the need for backpropagation to learn these

04:21

weights.

04:22

I will come back to the PCA dimension reduction algorithm in a minute as it is repeated in

04:26

the next stage.

04:27

These filters are optimized to represent the different frequencies in the image, basically

04:32

getting activated by varying degrees of detail.
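
The idea of learning convolution kernels with PCA instead of backpropagation can be sketched in a few lines. This is a simplified illustration, not the Saab transform as published: it uses one constant (DC) kernel plus a few PCA kernels on single-channel images, and it leaves out the bias term and the channel-wise processing of PixelHop++. All function names are made up for this example.

```python
import numpy as np


def extract_windows(images, k=3):
    """Collect every k x k window of each (H, W) image as a flat vector."""
    n, h, w = images.shape
    windows = np.lib.stride_tricks.sliding_window_view(
        images, (k, k), axis=(1, 2)
    )
    return windows.reshape(n, h - k + 1, w - k + 1, k * k)


def fit_kernels(images, num_ac=4, k=3):
    """Learn one DC kernel plus `num_ac` PCA kernels, without backpropagation."""
    x = extract_windows(images, k).reshape(-1, k * k)
    dc = np.full((1, k * k), 1.0 / np.sqrt(k * k))  # constant, unit-norm kernel
    residual = x - (x @ dc.T) @ dc                  # remove the DC component
    residual -= residual.mean(axis=0)               # center before PCA
    # Right singular vectors of the centered data are the principal directions.
    _, _, vt = np.linalg.svd(residual, full_matrices=False)
    return np.vstack([dc, vt[:num_ac]])


def apply_kernels(images, kernels, k=3):
    """Filter responses of every window: one output channel per kernel."""
    return extract_windows(images, k) @ kernels.T
```

With 32 by 32 inputs and 3 by 3 kernels, `apply_kernels` returns 30 by 30 response maps with one channel per kernel, which is the "smaller image, more channels" effect described in the transcript.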

04:35

The Saab transform was shown to work well against adversarial attacks compared to basic

04:40

convolutions trained with backpropagation.

04:43

You can also find more information about the Saab transformation in the references below.

04:47

If you are not used to how convolutions work, I strongly invite you to watch the video I

04:52

made introducing them:

04:56

I said Saab transforms worked well against adversarial attacks.

05:00

These adversarial attacks happen when we "attack" an image by changing a few pixels or adding

05:06

noise that humans cannot see to change the results of a machine learning model processing

05:11

the image.
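
Here is a toy illustration of such an attack. It has nothing to do with the paper's model: the "detector" is an assumed linear classifier with random weights, and the attack nudges every pixel by the same tiny amount in the direction that pushes the score across the decision boundary. Pixel values are not clipped back to a valid range, to keep the sketch short.

```python
import numpy as np


def predict(weights, image):
    """Toy linear 'detector': True means the image is flagged as fake."""
    return float(image @ weights) > 0


def adversarial_example(weights, image, overshoot=1.01):
    """Shift each pixel by +/- eps so that the prediction flips."""
    score = float(image @ weights)
    # Smallest uniform per-pixel step that crosses the boundary, plus 1%.
    eps = overshoot * abs(score) / np.abs(weights).sum()
    attacked = image - np.sign(score) * eps * np.sign(weights)
    return attacked, eps


rng = np.random.default_rng(0)
weights = rng.normal(size=32 * 32)        # hypothetical detector weights
image = rng.uniform(0, 1, size=32 * 32)   # flattened 32 x 32 "image"
attacked, eps = adversarial_example(weights, image)
print("per-pixel change:", eps)
print("before:", predict(weights, image), "after:", predict(weights, attacked))
```

Every pixel moves by exactly `eps`, yet the prediction flips, which is the kind of fragility the transcript refers to.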

05:12

So to simplify, we can basically see this PixelHop++ Unit as a typical 3 by 3 convolution

05:19

here since we do not look at the training process.

05:21

Of course, it works a bit differently, but it will make the explanation much more straightforward

05:26

as the process is comparable.

05:28

Then, the "Hop" step is repeated three times to get smaller and smaller versions of the

05:33

image with concentrated general information and more channels.

05:37

These channels are simply the outputs, or responses, of the input image by filters that

05:42

react differently depending on the level of detail in the image, as I said earlier.

05:48

One new channel per filter used.

05:50

Thus, we obtain various results giving us precise information about what the image contains,

05:55

but these results are smaller and smaller, containing fewer spatial details unique to

06:00

that precise image sent in the network, and therefore have more general and useful information

06:06

with regard to what the image actually contains.

06:09

The first few images are still relatively big, starting at 32 by 32, being the initial

06:15

size of the patch, and thus contain all the details.

06:18

Then, it drops to 15 by 15, and finally to 7 by 7 images, meaning that we have close

06:24

to zero spatial information in the end. The 15 by 15 image will just look like a blurry

06:29

version of the initial image but still contains some spatial information, while the 7 by 7

06:35

image will basically be a very general and broad version of the image with close to no

06:40

spatial information at all.

06:43

So just like a convolutional neural network, the deeper we get, the more channels we have

06:48

meaning that we have more filter responses reacting to different stimuli, but the smaller

06:53

they each are, ending with images of size 5x5.

06:56

This allows us to have a broader view in many ways, keeping a lot of unique valuable information

07:02

even with smaller versions of the image.

07:05

The images get even smaller because each of the PixelHop units is followed by a max-pooling

07:12

step.

07:13

It simply takes the maximum value of each square of two by two pixels, reducing

07:17

the number of pixels by a factor of four (halving each side) at each step.
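
A small sketch of 2 by 2 max-pooling, together with a size trace through the three hops. The trace assumes unpadded 3 by 3 filters and pooling that rounds odd sizes up; that assumption is mine, not something stated in the transcript, but it reproduces the 15, 7, 5 and 3 pixel sizes mentioned in this article.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max-pooling with stride 2 on an (H, W) array.

    Odd heights or widths are padded with -inf, so the output size rounds up.
    """
    h, w = x.shape
    padded = np.full((h + h % 2, w + w % 2), -np.inf)
    padded[:h, :w] = x
    ph, pw = padded.shape
    return padded.reshape(ph // 2, 2, pw // 2, 2).max(axis=(1, 3))


def trace_sizes(size=32, hops=3, kernel=3):
    """Side length after each hop: unpadded convolution, then pooling."""
    sizes = []
    for _ in range(hops):
        size = size - kernel + 1   # 3x3 filter without padding
        size = (size + 1) // 2     # 2x2 pooling, rounding up
        sizes.append(size)
    return sizes


print(trace_sizes())
```

Under these assumptions, the third hop sees a 7 by 7 input, produces a 5 by 5 response, and pooling brings it down to 3 by 3.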

07:20

Then, as you can see in the full model shown above, the outputs from each max-pooling layer

07:24

are sent for further dimension reduction using the PCA algorithm.

07:26

This is the third step, in green.

07:28

The PCA algorithm mainly takes the current dimensions, for example, 15 by 15 here in

07:34

the first hop, and reduces them while maintaining at least 90% of the intensity of the input

07:40

image.
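
As a rough sketch of this kind of reduction, the function below keeps the smallest number of principal components whose cumulative variance reaches 90% of the total. Treating the "intensity" mentioned above as variance is an assumption on my part, and the function name and return values are illustrative only.

```python
import numpy as np


def pca_keep_energy(x, energy=0.90):
    """Project rows of `x` onto the fewest components reaching `energy`.

    Returns the reduced data, the kept components, and the retained fraction.
    """
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    ratio = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative fraction reaches the target.
    n = int(np.searchsorted(ratio, energy)) + 1
    return centered @ vt[:n].T, vt[:n], float(ratio[n - 1])
```

For a flattened 15 by 15 response map, `x` would have 225 columns, and the reduced data would have only as many columns as are needed to reach the target.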

07:41

Here is a very simple example of how PCA can reduce the dimension, where two-dimensional

07:45

points of cats and dogs are reduced to one dimension on a line, allowing us to add a

07:51

threshold and easily build a classifier.
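
The cats-and-dogs example can be reproduced with synthetic points. The two clusters below are made up for illustration; the sketch projects the 2-D points onto the first principal component and places a threshold halfway between the two projected class means.

```python
import numpy as np


def first_component(points):
    """Unit vector of largest variance, from the SVD of the centered data."""
    centered = points - points.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


def fit_threshold_classifier(cats, dogs):
    """Project 2-D points onto one line and split between the class means."""
    points = np.vstack([cats, dogs])
    mean, axis = points.mean(axis=0), first_component(points)
    cat_mean = ((cats - mean) @ axis).mean()
    dog_mean = ((dogs - mean) @ axis).mean()
    threshold = (cat_mean + dog_mean) / 2
    dogs_above = dog_mean > cat_mean  # handles the sign ambiguity of PCA

    def predict(p):
        projected = (np.atleast_2d(p) - mean) @ axis
        return (projected > threshold) == dogs_above  # True means "dog"

    return predict


rng = np.random.default_rng(0)
cats = rng.normal([0, 0], 1.0, size=(100, 2))   # synthetic "cat" points
dogs = rng.normal([5, 5], 1.0, size=(100, 2))   # synthetic "dog" points
predict = fit_threshold_classifier(cats, dogs)
accuracy = (predict(dogs).mean() + (~predict(cats)).mean()) / 2
print("accuracy:", accuracy)
```

One number per point is enough here because almost all of the variation that separates the two groups lies along a single direction.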

07:54

Each hop gives us respectively 45, 30, and 5 parameters per channel instead of having

08:00

images of size 15 by 15, 7 by 7, and 3 by 3, which would give us in the same order 225,

08:07

49, and 9 parameters.

08:11

This is a much more compact representation while maximizing the quality of information

08:16

it contains.
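
The numbers quoted above can be checked quickly. This snippet only restates the arithmetic from the transcript; the function name is made up.

```python
def compression_summary(side_lengths=(15, 7, 3), kept=(45, 30, 5)):
    """Compare full response-map sizes with the values kept per channel."""
    rows = []
    for side, k in zip(side_lengths, kept):
        full = side * side
        rows.append((full, k, round(full / k, 2)))
    return rows


for full, k, ratio in compression_summary():
    print(f"{full} values per channel -> {k} kept ({ratio}x smaller)")
```

The largest saving comes from the first hop, where 225 values per channel shrink to 45.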

08:17

All these steps were used to compress the information and make the network super fast.

08:22

You can see this as squeezing all the helpful juice at different levels of details of the

08:27

cropped image to finally decide whether it is fake or not, using both detailed and general

08:32

information in the decision process (step 4 in orange).

08:35

I'm glad to see that the research in countering these deepfakes is also advancing, and I'm

08:39

excited to see what will happen in the future with all that.

08:43

Let me know in the comments what you think will be the main consequences and concerns

08:47

regarding deepfakes.

08:48

Is it going to affect law, politics, companies, celebrities, ordinary people?

08:53

Well, pretty much everyone...

08:55

Let's have a discussion to raise awareness and spread the word that we need to be careful and that

09:00

we cannot believe what we see anymore, unfortunately.

09:02

This is both an incredible and dangerous new technology.

09:06

Please, do not abuse this technology and stay ethically correct.

09:10

The goal here is to help improve this technology and not to use it for the wrong reasons.


