Wondering about the best ways to spot a deepfake? In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes.
►Read the full article: https://www.louisbouchard.ai/spot-deepfakes
►Test your deepfake detection capacity: https://detectfakes.media.mit.edu/
►DefakeHop: Chen, Hong-Shuo et al., (2021), “DefakeHop: A Light-Weight High-Performance Deepfake Detector.” ArXiv abs/2103.06929
►Saab Transforms: Kuo, C.-C. Jay et al., (2019), “Interpretable Convolutional Neural Networks via Feedforward Design.” J. Vis. Commun. Image Represent.
►OpenFace 2.0: T. Baltrusaitis, A. Zadeh, Y. C. Lim and L. Morency, "OpenFace 2.0: Facial Behavior Analysis Toolkit," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 59-66, doi: 10.1109/FG.2018.00019.
00:00
While they seem like they've always been there, the very first realistic deepfake didn't appear
00:05
until 2017.
00:07
We went from those first automatically generated images that merely resembled someone to today's
00:13
identical copy of someone on videos, with sound.
00:16
The reality is that we cannot see the difference between a real video or picture and a deepfake
00:22
anymore.
00:23
How can we tell what's real from what isn't?
00:25
How can audio files or video files be used in court as proof if an AI can entirely generate
00:32
them?
00:33
Well, this new paper may provide answers to these questions. And the answer here may again
00:37
be the use of artificial intelligence.
00:40
The saying "I'll believe it when I see it" may soon change to "I'll believe it when
00:45
the AI tells me to believe it..."
00:47
I will assume that you've all seen deepfakes and know a little about them.
00:51
Which will be enough for this article.
00:53
For more information about how they are generated, I invite you to watch the video I made explaining
00:58
deepfakes just below, as this video will focus on how to spot them.
01:04
More precisely, I will cover a new paper by the US Army's DEVCOM Army Research Laboratory entitled
01:09
"DEFAKEHOP: A LIGHT-WEIGHT HIGH-PERFORMANCE DEEPFAKE DETECTOR."
01:14
Indeed, they can detect deepfakes with over 90% accuracy in all datasets and even reach
01:20
100% accuracy in some benchmark datasets.
01:24
What is even more incredible is the size of their detection model.
01:27
As you can see, this DeFakeHop model has merely 40 thousand parameters, whereas the other
01:33
techniques yielding much worse accuracy had around 20 million!
01:37
This means that their model is 500 times smaller while outperforming the previous
01:43
state-of-the-art techniques.
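To make that ratio concrete, here is the quick arithmetic behind it. This is only a sketch using the rounded figures I quoted above, and the memory estimate assumes 4-byte float32 weights, which is my assumption and not something stated in the paper.

```python
# Rounded figures quoted in the video; the paper's exact counts may differ slightly.
defakehop_params = 40_000       # "merely 40 thousand parameters"
baseline_params = 20_000_000    # "around 20 million"

ratio = baseline_params / defakehop_params
print(f"The larger models have about {ratio:.0f}x more parameters")

# Rough memory footprint, ASSUMING every parameter is stored as a 4-byte float32.
bytes_per_param = 4
defakehop_kb = defakehop_params * bytes_per_param / 1e3
baseline_mb = baseline_params * bytes_per_param / 1e6
print(f"DefakeHop: ~{defakehop_kb:.0f} kB, larger models: ~{baseline_mb:.0f} MB")
```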
01:44
This allows the model to run quickly on your mobile phone and lets you detect
01:49
deepfakes anywhere.
01:50
You may think that you can tell the difference between a real picture and a fake one, but
01:54
if you remember the study I shared a couple of weeks ago, it clearly showed that around
01:59
50 percent of participants failed.
02:01
It was basically a random guess on whether a picture was fake or not.
02:05
There is a website from MIT where you can test your ability to spot deepfakes if you'd
02:10
like to.
02:11
Having tried it myself, I can say it's pretty fun to do.
02:14
There are audio files, videos, pictures, etc.
02:17
The link is in the description below.
02:19
If you try it, please let me know how well you do!
02:21
And if you know any other fun apps to test yourself or help research by trying your best
02:26
to spot deepfakes, please link them in the comments.
02:29
I'd love to try them out!
02:31
Now, if we come back to the paper able to detect them much better than we can, the question
02:36
is: how is this tiny machine learning model able to achieve that while humans can't?
02:42
DefakeHop works in four steps.
02:44
Step 1:
02:45
At first, they use another model to extract 68 different facial landmarks from each video
02:50
frame.
02:51
These 68 points are extracted to understand where the face is, recenter, orient and resize
02:57
it to make the faces more consistent, and then extract specific parts of the face from the
03:02
image.
03:03
These are the "patches" of the image we will send to our network, containing specific individual
03:09
face features like the eyes, mouth, nose.
03:12
It is done using another model called OpenFace 2.0.
03:16
It can accurately perform facial landmark detection, head pose estimation, facial action
03:22
unit recognition, and eye-gaze estimation in real-time.
03:26
These are all tiny patches of 32 by 32 that will all be sent into the actual network one
03:32
by one.
03:33
This makes the model super efficient because it deals with only a handful of tiny images
03:38
instead of the full image.
03:40
More details about OpenFace 2.0 can be found in the references below if you are curious
03:45
about it.
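If you want a feel for what this cropping step looks like, here is a minimal sketch. It does not call OpenFace 2.0 at all: the face image and the 68 landmark coordinates are random placeholders, and the landmark index ranges follow the commonly used 68-point ordering, which is my assumption. The helper `crop_patch` is a hypothetical name, not something from the paper.

```python
import numpy as np

def crop_patch(frame, center_xy, size=32):
    """Crop a size x size patch centered on (x, y), shifted if needed to stay inside the frame."""
    h, w = frame.shape[:2]
    half = size // 2
    x, y = int(round(center_xy[0])), int(round(center_xy[1]))
    left = min(max(x - half, 0), w - size)
    top = min(max(y - half, 0), h - size)
    return frame[top:top + size, left:left + size]

# Placeholder data standing in for an aligned face crop and its 68 (x, y) landmarks.
rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)
landmarks = rng.uniform(20, 108, size=(68, 2))

# ASSUMED index ranges (common 68-point ordering): nose 27-35, eyes 36-41 and 42-47, mouth 48-67.
regions = {
    "nose": slice(27, 36),
    "eye_a": slice(36, 42),
    "eye_b": slice(42, 48),
    "mouth": slice(48, 68),
}

# One 32x32 patch per facial region, centered on the mean of that region's landmarks.
patches = {name: crop_patch(face, landmarks[idx].mean(axis=0)) for name, idx in regions.items()}
for name, patch in patches.items():
    print(name, patch.shape)
```

In a real pipeline the landmark array would come from the landmark detector's output for each frame instead of a random generator.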
03:46
Steps 2 to 4 (left to right, blue, green, orange):
03:47
More precisely, the patches are sent to the first PixelHop++ unit named Hop-1, as you
03:53
can see.
03:54
This is the second step, shown in blue.
03:55
It uses an algorithm called the Saab transform to reduce the dimension.
03:58
It will take the 32 by 32 image and reduce it to a downscaled version of the image but
04:04
with multiple channels representing its response from different filters learned from the Saab
04:10
transform.
04:11
You can see the Saab transform as a convolution process, where the kernels are found using
04:16
the PCA dimension reduction algorithm, removing the need for backpropagation to learn these
04:21
weights.
04:22
I will come back to the PCA dimension reduction algorithm in a minute as it is repeated in
04:26
the next stage.
04:27
These filters are optimized to represent the different frequencies in the image, basically
04:32
getting activated by varying degrees of details.
04:35
The Saab transform was shown to work well against adversarial attacks compared to basic
04:40
convolutions trained with backpropagation.
04:43
You can also find more information about the Saab transformation in the references below.
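To show the idea of kernels found with PCA instead of backpropagation, here is a simplified sketch. It is not the authors' implementation: it handles a single grayscale patch, skips the bias term of the real Saab transform, and the function names are hypothetical. It builds one constant "DC" kernel plus a few "AC" kernels taken from the principal directions of all 3 by 3 windows, then filters the patch with them.

```python
import numpy as np

def extract_windows(img, k=3):
    """Return every k x k window of a 2-D image as a row of shape (k*k,)."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    return windows.reshape(-1, k * k)

def fit_saab_like_kernels(img, k=3, num_ac=3):
    """Derive kernels from the data with PCA; no gradients or labels are involved."""
    x = extract_windows(img, k).astype(np.float64)
    dc_kernel = np.full(k * k, 1.0 / np.sqrt(k * k))   # constant kernel: responds to local mean
    dc_removed = x - x.mean(axis=1, keepdims=True)      # remove each window's own mean
    centered = dc_removed - dc_removed.mean(axis=0)     # center across all windows
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.vstack([dc_kernel, vt[:num_ac]])

def apply_kernels(img, kernels, k=3):
    """Filter the image with every kernel; the output has one channel per kernel."""
    x = extract_windows(img, k).astype(np.float64)
    out_h, out_w = img.shape[0] - k + 1, img.shape[1] - k + 1
    return (x @ kernels.T).reshape(out_h, out_w, -1)

rng = np.random.default_rng(0)
patch = rng.random((32, 32))                 # placeholder for a 32x32 face patch
kernels = fit_saab_like_kernels(patch)
responses = apply_kernels(patch, kernels)
print(kernels.shape, responses.shape)
```

Each row of `kernels` plays the role of one 3 by 3 filter, and each channel of `responses` is the patch seen through that filter.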
04:47
If you are not used to how convolutions work, I strongly invite you to watch the video I
04:52
made introducing them:
04:56
I said Saab transforms worked well against adversarial attacks.
05:00
These adversarial attacks happen when we "attack" an image by changing a few pixels or adding
05:06
noise that humans cannot see to change the results of a machine learning model processing
05:11
the image.
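Here is a toy illustration of such an attack. It has nothing to do with DefakeHop itself: the "model" is a made-up linear classifier with random weights, and the setup (positive score means real, otherwise fake) is purely an assumption for the demo. Every pixel is nudged by the same tiny amount in the direction that hurts the score most, which is enough to flip the decision. For simplicity, pixel values are not clipped back into the valid range.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=32 * 32)     # weights of a hypothetical linear classifier
x = rng.random(32 * 32)          # flattened 32x32 "image" with values in [0, 1]

def score(v):
    """Toy decision score: above zero means 'real', otherwise 'fake'."""
    return float(w @ v)

s = score(x)

# Smallest uniform per-pixel step that pushes the score just past zero.
eps = 1.01 * abs(s) / np.abs(w).sum()

# Move every pixel by eps against the current decision (sign of the gradient, which is w here).
x_adv = x - np.sign(s) * eps * np.sign(w)

print(f"per-pixel change: {eps:.4f}")
print(f"score before: {s:.3f}, score after: {score(x_adv):.3f}")
```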
05:12
So to simplify, we can basically see this PixelHop++ Unit as a typical 3 by 3 convolution
05:19
here since we do not look at the training process.
05:21
Of course, it works a bit differently, but it will make the explanation much more straightforward
05:26
as the process is comparable.
05:28
Then, the "Hop" step is repeated three times to get smaller and smaller versions of the
05:33
image with concentrated general information and more channels.
05:37
These channels are simply the outputs, or responses, produced when the input image passes through filters that
05:42
react differently depending on the level of detail in the image, as I said earlier.
05:48
One new channel per filter used.
05:50
Thus, we obtain various results giving us precise information about what the image contains,
05:55
but these results are smaller and smaller, containing fewer spatial details unique to
06:00
that precise image sent in the network, and therefore have more general and useful information
06:06
with regard to what the image actually contains.
06:09
The first few images are still relatively big, starting at 32 by 32, being the initial
06:15
size of the patch, and thus contain all the details.
06:18
Then, it drops to 15 by 15, and finally to 7 by 7 images, meaning that we have close
06:24
to zero spatial information in the end. The 15 by 15 image will just look like a blurry
06:29
version of the initial image but still contains some spatial information, while the 7 by 7
06:35
image will basically be a very general and broad version of the image with close to no
06:40
spatial information at all.
06:43
So just like a convolutional neural network, the deeper we get, the more channels we have
06:48
meaning that we have more filter responses reacting to different stimuli, but the smaller
06:53
they each are, ending with images of size 5x5.
06:56
Allowing us to have a broader view in many ways, keeping a lot of unique valuable information
07:02
even with smaller versions of the image.
07:05
The images get even smaller because each of the PixelHop units is followed by a max-pooling
07:12
step.
07:13
They are simply taking the maximum value of each square of two by two pixels, reducing
07:17
the number of pixels by roughly a factor of four (each side is halved) at each step.
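Here is a minimal sketch of that max-pooling step, plus a trace of the image sizes through the three hops. The trace rests on two assumptions of mine: each hop uses a 3 by 3 window without padding, and odd sizes are rounded up when pooling. Under those assumptions the sizes match the 15, 7, and 3 mentioned in this video, with the 5 by 5 size appearing just before the last pooling.

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max-pooling with stride 2; odd sides are padded so the output size rounds up."""
    h, w = img.shape
    padded = np.pad(img.astype(np.float64), ((0, h % 2), (0, w % 2)),
                    constant_values=-np.inf)   # padding can never win the max
    ph, pw = padded.shape
    # Group pixels into 2x2 blocks and keep the largest value of each block.
    return padded.reshape(ph // 2, 2, pw // 2, 2).max(axis=(1, 3))

example = np.arange(16).reshape(4, 4)
pooled = max_pool_2x2(example)
print(pooled)   # each entry is the max of one 2x2 block

# Size trace, ASSUMING a 3x3 window without padding in every hop.
side = 32
sizes = []
for hop in (1, 2, 3):
    after_filter = side - 2                                   # 3x3 window, no padding
    side = max_pool_2x2(np.zeros((after_filter, after_filter))).shape[0]
    sizes.append((after_filter, side))
    print(f"Hop-{hop}: {after_filter}x{after_filter} -> pooled to {side}x{side}")
```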
07:20
Then, as you can see in the full model shown above, the outputs from each max-pooling layer
07:24
are sent for further dimension reduction using the PCA algorithm.
07:26
Which is the third step, in green.
07:28
The PCA algorithm mainly takes the current dimensions, for example, 15 by 15 here in
07:34
the first hop, and reduces them while keeping about 90% of the energy (variance) of the input
07:40
image.
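As a rough sketch of what "keeping 90%" means in practice, the snippet below counts how many principal components are needed to retain a target share of the variance. The data is synthetic (random 15 by 15 "images" built from only five underlying patterns plus a little noise), and `components_for_energy` is a hypothetical helper name, not a function from the paper.

```python
import numpy as np

def components_for_energy(data, target=0.90):
    """Smallest number of principal components whose variance share reaches `target`."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    energy = singular_values ** 2                       # variance captured by each component
    cumulative = np.cumsum(energy) / energy.sum()
    # First index where the cumulative share reaches the target, converted to a count.
    return int(np.searchsorted(cumulative, target) + 1)

rng = np.random.default_rng(0)
# 500 synthetic flattened 15x15 responses that really only vary along 5 directions.
basis = rng.normal(size=(5, 15 * 15))
data = rng.normal(size=(500, 5)) @ basis + 0.01 * rng.normal(size=(500, 15 * 15))

n_keep = components_for_energy(data)
print(f"{n_keep} of {data.shape[1]} dimensions keep 90% of the energy")
```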
07:41
Here is a very simple example of how PCA can reduce the dimension, where two-dimensional
07:45
points of cats and dogs are reduced to one dimension on a line, allowing us to add a
07:51
threshold and easily build a classifier.
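The cats and dogs example can be reproduced in a few lines. The points below are synthetic and the two features are made up; the sketch projects the 2-D points onto their first principal direction and then places a threshold halfway between the two class averages on that line.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-D points, e.g. two made-up features measured for each animal.
cats = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(100, 2))
dogs = rng.normal(loc=[6.0, 5.0], scale=0.5, size=(100, 2))
points = np.vstack([cats, dogs])
labels = np.array([0] * 100 + [1] * 100)   # 0 = cat, 1 = dog

# PCA: the first right-singular vector is the direction of largest variance.
centered = points - points.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[0]               # every point is now a single number

# Threshold halfway between the two class averages on the line.
cat_mean = projected[labels == 0].mean()
dog_mean = projected[labels == 1].mean()
threshold = (cat_mean + dog_mean) / 2

predicted = (projected > threshold).astype(int)
if dog_mean < cat_mean:                    # the sign of a PCA direction is arbitrary
    predicted = 1 - predicted

accuracy = (predicted == labels).mean()
print(f"accuracy with one dimension and one threshold: {accuracy:.2f}")
```

Note that PCA itself never looks at the labels; they are only used here to place the threshold and to measure accuracy.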
07:54
Each hop gives us respectively 45, 30, and 5 parameters per channel instead of having
08:00
images of size 15 by 15, 7 by 7, and 3 by 3, which would give us in the same order 225,
08:07
49, and 9 parameters.
08:11
This is a much more compact representation while maximizing the quality of information
08:16
it contains.
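Using only the numbers quoted above, here is how much each hop's representation shrinks per channel. This is plain arithmetic on the figures from this video and nothing more.

```python
spatial_sides = [15, 7, 3]     # image side after each hop, as quoted above
kept_values = [45, 30, 5]      # values kept per channel after PCA, as quoted above

full_sizes = [side * side for side in spatial_sides]
for hop, (side, full, kept) in enumerate(zip(spatial_sides, full_sizes, kept_values), start=1):
    share = kept / full
    print(f"Hop-{hop}: {side}x{side} = {full} values -> {kept} kept ({share:.0%} of the original)")
```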
08:17
All these steps were used to compress the information and make the network super fast.
08:22
You can see this as squeezing all the helpful juice at different levels of details of the
08:27
cropped image to finally decide whether it is fake or not, using both detailed and general
08:32
information in the decision process (step 4 in orange).
08:35
I'm glad to see that the research in countering these deepfakes is also advancing, and I'm
08:39
excited to see what will happen in the future with all that.
08:43
Let me know in the comments what you think will be the main consequences and concerns
08:47
regarding deepfakes.
08:48
Is it going to affect law, politics, companies, celebrities, ordinary people?
08:53
Well, pretty much everyone...
08:55
Let's have a discussion to share awareness and spread the word to be careful and that
09:00
we cannot believe what we see anymore, unfortunately.
09:02
This is both an incredible and dangerous new technology.
09:06
Please, do not abuse this technology and stay ethically correct.
09:10
The goal here is to help improve this technology and not to use it for the wrong reasons.