Wondering about the best ways to spot a deepfake? In this video, learn about a breakthrough US Army technology that uses artificial intelligence to find deepfakes. Watch the video.

References
►Read the full article: https://www.louisbouchard.ai/spot-deepfakes
►Test your deepfake detection capacity: https://detectfakes.media.mit.edu/
►DefakeHop: Chen, Hong-Shuo et al., (2021), "DefakeHop: A Light-Weight High-Performance Deepfake Detector." ArXiv abs/2103.06929
►Saab transforms: Kuo, C.-C. Jay et al., (2019), "Interpretable Convolutional Neural Networks via Feedforward Design." J. Vis. Commun. Image Represent.
►OpenFace 2.0: T. Baltrusaitis, A. Zadeh, Y. C. Lim and L. Morency, "OpenFace 2.0: Facial Behavior Analysis Toolkit," 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), 2018, pp. 59-66, doi: 10.1109/FG.2018.00019.

Video Transcript
00:00 While they seem like they've always been there, the very first realistic deepfake didn't appear
00:05 until 2017.
00:07 It went from those first-ever, barely resembling fake images automatically generated to today's
00:13 identical copy of someone on video, with sound.
00:16 The reality is that we cannot see the difference between a real video or picture and a deepfake
00:22 anymore.
00:23 How can we tell what's real from what isn't?
00:25 How can audio or video files be used in court as proof if an AI can entirely generate
00:32 them?
00:33 Well, this new paper may provide answers to these questions. And the answer here may again
00:37 be the use of artificial intelligence.
00:40 The saying "I'll believe it when I see it" may soon change to "I'll believe it when
00:45 the AI tells me to believe it..."
00:47 I will assume that you've all seen deepfakes and know a little about them.
00:51 That will be enough for this article.
00:53 For more information about how they are generated, I invite you to watch the video I made explaining
00:58 deepfakes just below, as this video will focus on how to spot them.
01:04 More precisely, I will cover a new paper by the US DEVCOM Army Research Laboratory entitled
01:09 "DefakeHop: A Light-Weight High-Performance Deepfake Detector."
01:14 Indeed, they can detect deepfakes with over 90% accuracy in all datasets and even reach
01:20 100% accuracy on some benchmark datasets.
01:24 What is even more incredible is the size of their detection model.
01:27 As you can see, this DefakeHop model has merely 40 thousand parameters, whereas the other
01:33 techniques yielding much worse accuracy had around 20 million!
01:37 This means that their model is 500 times smaller while outperforming the previous
01:43 state-of-the-art techniques.
01:44 This allows the model to run quickly on your mobile phone, letting you detect deepfakes
01:49 anywhere.
01:50 You may think that you can tell the difference between a real picture and a fake one, but
01:54 if you remember the study I shared a couple of weeks ago, it clearly showed that around
01:59 50 percent of participants failed.
02:01 It was basically a random guess whether a picture was fake or not.
02:05 There is a website from MIT where you can test your ability to spot deepfakes if you'd
02:10 like to.
02:11 Having tried it myself, I can say it's pretty fun to do.
02:14 There are audio files, videos, pictures, etc.
02:17 The link is in the description below.
02:19 If you try it, please let me know how well you do!
02:21 And if you know any other fun apps to test yourself or help research by trying your best
02:26 to spot deepfakes, please link them in the comments.
02:29 I'd love to try them out!
02:31 Now, coming back to the paper, which can detect them much better than we can, the question
02:36 is: how is this tiny machine learning model able to achieve that while humans can't?
02:42 DefakeHop works in four steps.
02:44 Step 1:
02:45 At first, they use another model to extract 68 different facial landmarks from each video
02:50 frame.
02:51 These 68 points are extracted to understand where the face is, to recenter, orient and resize
02:57 it to make the frames more consistent, and then to extract specific parts of the face from the
03:02 image.
03:03 These are the "patches" of the image we will send our network, each containing a specific
03:09 facial feature like the eyes, mouth, or nose.
03:12 This is done using another model called OpenFace 2.0.
03:16 It can accurately perform facial landmark detection, head pose estimation, facial action
03:22 unit recognition, and eye-gaze estimation in real time.
03:26 These are all tiny patches of 32 by 32 pixels that will be sent into the actual network one
03:32 by one.
03:33 This makes the model super efficient because it deals with only a handful of tiny images
03:38 instead of the full image.
03:40 More details about OpenFace 2.0 can be found in the references below if you are curious
03:45 about it.
03:46 Steps 2 to 4 (left to right: blue, green, orange):
03:47 More precisely, the patches are sent to the first PixelHop++ unit, named Hop-1, as you
03:53 can see.
03:54 This represents step two, in blue.
03:55 This unit uses an algorithm called the Saab transform to reduce the dimension.
03:58 It will take the 32 by 32 image and reduce it to a downscaled version of the image, but
04:04 with multiple channels representing its responses to different filters learned by the Saab
04:10 transform.
04:11 You can see the Saab transform as a convolution process where the kernels are found using
04:16 the PCA dimension reduction algorithm, replacing the need for backpropagation to learn these
04:21 weights.
04:22 I will come back to the PCA dimension reduction algorithm in a minute, as it is repeated in
04:26 the next stage.
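To make Step 1 more concrete, here is a minimal sketch of landmark-based patch cropping with NumPy. The region-to-landmark mapping follows the common 68-point (iBUG) convention, and the regions and cropping logic here are my own simplification for illustration — the exact patches DefakeHop extracts, and OpenFace 2.0's actual output format, may differ.

```python
import numpy as np

# Rough 68-point landmark indices (iBUG convention) for a few regions;
# the exact regions cropped in the paper are simplified here.
REGIONS = {
    "left_eye": range(36, 42),
    "right_eye": range(42, 48),
    "mouth": range(48, 68),
}

def crop_patches(frame, landmarks, size=32):
    """Crop a size x size patch centred on each facial region.

    frame:     (H, W) grayscale image as a NumPy array
    landmarks: (68, 2) array of (x, y) points, e.g. from a landmark detector
    """
    h, w = frame.shape
    patches = {}
    for name, idx in REGIONS.items():
        pts = landmarks[list(idx)]
        cx, cy = pts.mean(axis=0).astype(int)   # centre of the region
        half = size // 2
        # clamp so the crop stays fully inside the frame
        x0 = min(max(cx - half, 0), w - size)
        y0 = min(max(cy - half, 0), h - size)
        patches[name] = frame[y0:y0 + size, x0:x0 + size]
    return patches

# toy usage: a random frame and random landmark positions
frame = np.random.rand(128, 128)
landmarks = np.random.randint(16, 112, size=(68, 2))
patches = crop_patches(frame, landmarks)
print({k: v.shape for k, v in patches.items()})  # every patch is 32 x 32
```

Each of these 32-by-32 patches is then fed to the network independently, which is what keeps the model so small.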
04:27 These filters are optimized to represent the different frequencies in the image, basically
04:32 getting activated by varying degrees of detail.
04:35 The Saab transform was shown to work well against adversarial attacks compared to basic
04:40 convolutions trained with backpropagation.
04:43 You can also find more information about the Saab transform in the references below.
04:47 If you are not used to how convolutions work, I strongly invite you to watch the video I
04:52 made introducing them.
04:56 I said Saab transforms worked well against adversarial attacks.
05:00 These adversarial attacks happen when we "attack" an image by changing a few pixels or adding
05:06 noise that humans cannot see in order to change the results of a machine learning model processing
05:11 the image.
05:12 So to simplify, we can basically see this PixelHop++ unit as a typical 3 by 3 convolution
05:19 here, since we are not looking at the training process.
05:21 Of course, it works a bit differently, but it will make the explanation much more straightforward,
05:26 as the process is comparable.
05:28 Then, this "Hop" step is repeated three times to get smaller and smaller versions of the
05:33 image with concentrated general information and more channels.
05:37 These channels are simply the outputs, or responses, of the input image to filters that
05:42 react differently depending on the level of detail in the image, as I said earlier.
05:48 One new channel per filter used.
05:50 Thus, we obtain various results giving us precise information about what the image contains,
05:55 but these results are smaller and smaller, containing fewer spatial details unique to
06:00 that precise image sent into the network, and therefore more general and useful information
06:06 with regard to what the image actually contains.
06:09 The first few images are still relatively big, starting at 32 by 32, the initial
06:15 size of the patch, and thus contain all the details.
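The core idea — learning convolution kernels with PCA instead of backpropagation — can be sketched in a few lines of NumPy. This is a simplified stand-in for the real Saab transform, which additionally separates out a constant "DC" kernel and adds bias terms; here we just take the top principal components of all 3-by-3 neighbourhoods and use them as filters.

```python
import numpy as np

def pca_filters(image, k=4, ksize=3):
    """Learn k convolution kernels from an image via PCA: a simplified
    stand-in for the Saab transform (the real Saab transform also adds
    a constant DC kernel and bias terms)."""
    h, w = image.shape
    # gather every ksize x ksize neighbourhood as a row vector
    neighbourhoods = np.array([
        image[i:i + ksize, j:j + ksize].ravel()
        for i in range(h - ksize + 1)
        for j in range(w - ksize + 1)
    ])
    neighbourhoods -= neighbourhoods.mean(axis=0)   # centre the data
    # principal components = right singular vectors of the data matrix
    _, _, vt = np.linalg.svd(neighbourhoods, full_matrices=False)
    return vt[:k].reshape(k, ksize, ksize)          # top-k components as kernels

def convolve(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.random.rand(32, 32)                  # a 32 x 32 patch
kernels = pca_filters(img, k=4)
responses = [convolve(img, k) for k in kernels]   # one channel per filter
print(len(responses), responses[0].shape)         # 4 channels of 30 x 30
```

Because the kernels come straight from a PCA of the data, no gradient descent is needed: this is the "feedforward design" idea from the Saab transform paper in the references.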
06:18 Then, it drops to 15 by 15, and finally to 7 by 7 images, meaning that we have close
06:24 to zero spatial information in the end. The 15 by 15 image will just look like a blurry
06:29 version of the initial image but still contains some spatial information, while the 7 by 7
06:35 image will basically be a very general and broad version of the image with close to no
06:40 spatial information at all.
06:43 So just like in a convolutional neural network, the deeper we get, the more channels we have,
06:48 meaning that we have more filter responses reacting to different stimuli, but the smaller
06:53 they each are, ending with images of size 3 by 3.
06:56 This allows us to have a broader view in many ways, keeping a lot of unique, valuable information
07:02 even with smaller versions of the image.
07:05 The images get even smaller because each of the PixelHop++ units is followed by a max-pooling
07:12 step.
07:13 This simply takes the maximum value of each square of two by two pixels, reducing
07:17 the image size by a factor of four at each step.
07:20 Then, as you can see in the full model shown above, the outputs from each max-pooling layer
07:24 are sent for further dimension reduction using the PCA algorithm.
07:26 This is the third step, in green.
07:28 The PCA algorithm basically takes the current dimensions, for example 15 by 15 here in
07:34 the first step, and reduces them while maintaining at least 90% of the energy of the input
07:40 image.
07:41 Here is a very simple example of how PCA can reduce the dimension, where two-dimensional
07:45 points of cats and dogs are reduced to one dimension on a line, allowing us to add a
07:51 threshold and easily build a classifier.
07:54 Each hop respectively gives us 45, 30, and 5 parameters per channel instead of having
08:00 images of size 15 by 15, 7 by 7, and 3 by 3, which would give us, in the same order, 225,
08:07 49, and 9 parameters.
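The two shrinking operations just described — 2-by-2 max-pooling and PCA that keeps at least 90% of the energy — can be sketched as follows. This is my own minimal NumPy illustration of the idea, not the paper's implementation; in particular, the random data below is only a placeholder for real filter responses.

```python
import numpy as np

def max_pool(x, size=2):
    """2 x 2 max-pooling: halves each side, quartering the pixel count."""
    h, w = x.shape
    h, w = h - h % size, w - w % size           # drop odd edge rows/cols
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def pca_reduce(x, energy=0.90):
    """Project rows of x onto the fewest principal components that keep
    at least `energy` of the total variance, mirroring the channel-wise
    dimension reduction step (step 3, in green)."""
    xc = x - x.mean(axis=0)                     # centre the data
    _, s, vt = np.linalg.svd(xc, full_matrices=False)
    var = s ** 2                                # variance per component
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    return xc @ vt[:k].T                        # shape: (n_samples, k)

# toy usage: pool one 30 x 30 response map, then PCA-reduce a batch of
# flattened 15 x 15 maps (placeholder random data)
resp = np.random.rand(30, 30)
pooled = max_pool(resp)                         # -> 15 x 15
batch = np.random.rand(200, 15 * 15)            # 200 flattened 15 x 15 maps
reduced = pca_reduce(batch)
print(pooled.shape, reduced.shape[1])           # (15, 15), fewer than 225 dims
```

On real, highly correlated filter responses, far fewer components are needed to keep 90% of the energy than on random noise — which is how the paper gets down to 45, 30, and 5 parameters per channel.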
08:11 This is a much more compact representation while maximizing the quality of the information
08:16 it contains.
08:17 All these steps were used to compress the information and make the network super fast.
08:22 You can see this as squeezing all the helpful juice out of the cropped image at different levels of
08:27 detail, to finally decide whether it is fake or not, using both detailed and general
08:32 information in the decision process (step 4, in orange).
08:35 I'm glad to see that research into countering these deepfakes is also advancing, and I'm
08:39 excited to see what will happen in the future with all that.
08:43 Let me know in the comments what you think will be the main consequences and concerns
08:47 regarding deepfakes.
08:48 Is it going to affect law, politics, companies, celebrities, ordinary people?
08:53 Well, pretty much everyone...
08:55 Let's have a discussion to raise awareness and spread the word to be careful, since
09:00 we cannot believe what we see anymore, unfortunately.
09:02 This is both an incredible and dangerous new technology.
09:06 Please do not abuse this technology and stay ethically correct.
09:10 The goal here is to help improve this technology, not to use it for the wrong reasons.