Learn how this algorithm can understand images and automatically remove the undesired object or person to save your future Instagram post! You've most certainly experienced this situation before: you take a great picture with your friend, and someone is photobombing behind you, ruining your future Instagram post. Well, that's no longer an issue. Whether it's a person or a trash can you forgot to remove before taking your selfie, this AI will automatically remove the undesired object or person from the image and save your post. It's just like having a professional Photoshop designer in your pocket, with a simple click! This task of removing part of an image and replacing it with what should appear behind it has been tackled by many AI researchers for a long time. It is called image inpainting, and it's extremely challenging. Learn more in the video!

Watch the video

References
► Complete article: https://www.louisbouchard.ai/lama/
► Suvorov, R., Logacheva, E., Mashikhin, A., Remizova, A., Ashukha, A., Silvestrov, A., Kong, N., Goka, H., Park, K. and Lempitsky, V., 2022. Resolution-robust Large Mask Inpainting with Fourier Convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 2149-2159).
► Code: https://github.com/saic-mdal/lama
► Colab Demo: https://colab.research.google.com/github/saic-mdal/lama/blob/master/colab/LaMa_inpainting.ipynb
► Product using LaMa: https://cleanup.pictures/
► Fourier Domain explained by the great @3Blue1Brown: https://youtu.be/spUNpyF58BY
► Great in-depth explanation of LaMa with the authors by @Yannic Kilcher: https://youtu.be/Lg97gWXsiQ4
► My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

You've most certainly experienced this situation once: you take a great picture with your friend, and someone is photobombing behind you, ruining your future Instagram post. Well, that's no longer an issue. Whether it's a person or a trash can you forgot to remove before taking your selfie that's ruining your picture, this AI will automatically remove the undesired object or person in the image and save your post. It's just like a professional Photoshop designer in your pocket, with a simple click. This task of removing part of an image and replacing it with what should appear behind it has been tackled by many AI researchers for a long time. It's called image inpainting, and it's extremely challenging. As you will see, the paper I want to show you achieves it with incredible results and can do it easily in high definition, unlike previous approaches you may have heard of before. You definitely want to stay until the end of the video to see that; you won't believe how great and realistic it looks for something produced in a split second by an algorithm.

As I said, this task of image inpainting is basically removing unwanted objects from your images. You should be doing the same in your work life and remove any friction. Your next step as an AI professional or student should be to do like me and try the sponsor of today's episode, Weights & Biases. If you run a lot of experiments, you should be using Weights & Biases. It will remove all the painful steps, from hyperparameter tuning to results analysis, with a handful of lines of code added, and it's entirely free for personal usage. It takes not even five minutes to set up, and you don't have anything else to do, forever. Talking about removing friction points, I don't think you can do better than that. Weights & Biases has everything you need for your code to be reproducible without you even trying. For your well-being, do like me and give Weights & Biases a try for free with the first link below.

To remove an object from an image, the machine needs to understand what should appear behind the subject, and doing this would require having a three-dimensional understanding of the world, as humans do. But it doesn't have that: it just has access to a few pixels in an image, which is why it's so complicated, whereas it looks quite simple to us, who can simply imagine the depth and guess that there should be the rest of the wall here, the window there, etc. We basically need to teach the machine what the world typically looks like. We will do that using a lot of examples of real-world images so that it can have an idea of what our world looks like in the two-dimensional picture world, which is not a perfect approach, but it does the job. Then another problem comes with the computational cost of using real-world images with way too many pixels. To fix that, most current approaches work with low-quality images, so a downsized version of the image that is manageable for our computers, and upscale the inpainted part at the end to replace it in the original image, making the final results look worse than they could be; or at least they won't look great enough to be shared on Instagram and get all the likes you deserve. You can't really feed it high-quality images directly, as it would take way too much time to process and train. Or can you?

Well, these are the main problems the researchers attacked in this paper, and here's how. Roman Suvorov et al. from Samsung Research introduced a new network called LaMa that is quite particular. As you can see, in image inpainting you will typically send the initial image as well as what you'd like to remove from it. This is called a mask: it will cover the image, as you can see here, and the network won't have access to this information anymore, as it needs to fill in those pixels.
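If you're curious what that input actually looks like in code, here is a minimal sketch (my own illustration, not the authors' code) of the usual way an image and its mask are packed together: the mask erases the region to fill, and is also stacked as an extra channel so the network knows where the hole is.

```python
import numpy as np

def prepare_input(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out the masked region and stack the mask as a fourth channel,
    so the network sees both the remaining pixels and where the hole is."""
    masked_image = image * (1.0 - mask[..., None])  # erase the region to fill
    return np.concatenate([masked_image, mask[..., None]], axis=-1)  # H x W x 4

# Toy usage: a random "photo" with a rectangular hole over the photobomber.
image = np.random.rand(256, 256, 3)   # H x W x 3 RGB image in [0, 1]
mask = np.zeros((256, 256))
mask[80:180, 100:160] = 1.0           # 1 = pixels the user brushed over
net_input = prepare_input(image, mask)
print(net_input.shape)                # (256, 256, 4)
```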
Then it has to understand the image and try to fill in those same pixels with what it thinks should fit best. So in this case, they start like any other network and downscale the image, but don't worry: their technique will allow them to keep the same quality as a high-resolution image. This is because, in the processing of the image, they use something a bit different than usual. Typically, we see different networks here in the middle, mostly convolutional neural networks. Such networks are often used on images due to how convolutions work, which I explained in other videos, like the one appearing on the top right of your screen, if you are interested in how it works. In short, the network will work in two steps. First, it will compress the image and try to only save the relevant information. The network will end up conserving mostly the general information about the image, like its color, overall style, or the general objects appearing, but not the precise details. Then it will try to reconstruct the image using the same principles but backward. We use some tricks like skip connections, which save information from the first few layers of the network and pass it along to the second step so that it can orient it toward the right objects. In short, it easily knows that there's a tower with a blue sky and trees, called global information, but it needs the skip connections to know that it's the Eiffel Tower in the middle of the screen, that there are clouds here and there, that the trees have these colors, etc.: all the fine-grained details, which we call local information.
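This compress-then-reconstruct pattern with skip connections is the classic encoder-decoder (U-Net-style) design. Here is a tiny PyTorch sketch of that general pattern (a generic illustration, not LaMa's actual architecture, which replaces this design as explained next):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal encoder-decoder with one skip connection: the encoder compresses
    the image down to coarse 'global' features, and the skip connection carries
    fine 'local' details straight to the decoder."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(4, 32, 3, stride=2, padding=1)  # compress (4 = RGB + mask)
        self.mid = nn.Conv2d(32, 32, 3, padding=1)           # process coarse features
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)    # reconstruct
        self.out = nn.Conv2d(16 + 4, 3, 3, padding=1)        # decoder also sees the skip

    def forward(self, x):
        h = torch.relu(self.enc(x))   # coarse features at half resolution
        h = torch.relu(self.mid(h))
        h = torch.relu(self.up(h))    # back to full resolution
        h = torch.cat([h, x], dim=1)  # skip connection: reinject local details
        return self.out(h)            # predicted RGB image

x = torch.randn(1, 4, 256, 256)       # masked image + mask channel
print(TinyUNet()(x).shape)            # torch.Size([1, 3, 256, 256])
```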
Following a long training with many examples, we will expect our network to reconstruct the image, or at least a very similar image that contains the same kind of objects and is very close, if not identical, to the initial image. But remember, in this case we are working with low-quality images that we need to upscale, which will hurt the quality of the results. The particularity here is that, instead of using convolutions as in regular convolutional networks, with skip connections to keep local knowledge, it uses what we call the fast Fourier convolution, or FFC. This means that the network will work in both the spatial and frequency domains and doesn't need to get back to the early layers to understand the context of the image. Each layer will work with convolutions in the spatial domain to process local features and use Fourier convolutions in the frequency domain to analyze global features.

The frequency domain is a bit special, and I linked a great video covering it in the description below if you are curious. It will basically transform your image into all possible frequencies, just like sound waves, and tell you how much of each frequency the image contains. So each new pixel of this newly created image will represent a frequency covering the whole spatial image and how much it is present, instead of colors. The frequencies here are just repeated patterns at different scales. For example, one of these frequency pixels could be highly activated by vertical lines at a specific distance from each other. In this case, it could be the same distance as the length of a brick, so it will be highly activated if there is a brick wall in the image. From this, you'd understand that there's probably a brick wall, of a size proportional to how much the pixel is activated, and you can repeat this for all pixels activated by similar patterns, giving you good hints about the overall aspect of the image, but nothing about the objects themselves or the colors; the spatial domain will take charge of this. So doing convolutions on this new Fourier image allows you to work with the whole image at each step of the convolution process. It has access to a much better global understanding of the image, even at early layers, without much computational cost, which is impossible to achieve with regular convolutions in the spatial domain. Then, both global and local results are saved and sent to the next layer, which will repeat these steps.
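To make this more concrete, here is a heavily simplified PyTorch sketch of an FFC-style block. This is my own approximation of the idea, not the paper's implementation (which, among other things, splits channels into local and global groups): a local branch applies an ordinary spatial convolution, while a global branch applies a convolution in the frequency domain and transforms back.

```python
import torch
import torch.nn as nn

class SimpleFFCBlock(nn.Module):
    """Heavily simplified FFC-style block (an approximation of the idea, not
    the paper's code). The local branch is an ordinary spatial convolution with
    a small receptive field; the global branch convolves in the frequency
    domain, where every 'pixel' is a frequency spanning the whole image."""
    def __init__(self, channels: int):
        super().__init__()
        self.local = nn.Conv2d(channels, channels, 3, padding=1)
        # 1x1 conv applied to real + imaginary parts stacked as channels
        self.spectral = nn.Conv2d(2 * channels, 2 * channels, 1)

    def forward(self, x):
        local = self.local(x)                            # local features
        freq = torch.fft.rfft2(x, norm="ortho")          # B x C x H x (W//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)  # B x 2C x H x (W//2+1)
        freq = self.spectral(freq)                       # mix global frequency info
        real, imag = freq.chunk(2, dim=1)
        glob = torch.fft.irfft2(torch.complex(real, imag),
                                s=x.shape[-2:], norm="ortho")  # back to pixels
        return torch.relu(local + glob)  # merge both branches for the next layer

x = torch.randn(1, 8, 64, 64)
print(SimpleFFCBlock(8)(x).shape)        # torch.Size([1, 8, 64, 64])
```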
You will end up with the final image that you can upscale back. The use of the Fourier domain is what makes it scalable to bigger images, as the image resolution doesn't affect the Fourier domain: it uses frequencies over the whole image instead of colors, and the repeated pattern it's looking for will be the same whatever the size of the image. This means that even when training this network on small images, you will be able to feed it much larger images afterward and get amazing results (see the small sketch after the transcript).

As you can see, the results are not perfect, but they are quite impressive, and I'm excited to see what they will do next to improve them. Of course, this was just a simple overview of this new model, and you can find more details about the implementation in the paper linked in the description below. You can also implement it yourself with the code linked below as well. I hope you enjoyed the video, and if so, please take a second to share it with a friend who could find this interesting. It would mean a lot and help this channel grow. Thank you for watching!
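As a small final illustration of the resolution argument above, here is a tiny numpy sketch (my own toy example, not from the paper): a repeated pattern with a fixed period, like our brick wall, produces a spectral peak at the same underlying frequency, and thus the same recovered period in pixels, whether the image is small or large. This is the intuition behind why frequency-domain features learned on small images transfer to much bigger ones.

```python
import numpy as np

def dominant_period(size: int, period: int = 16) -> float:
    """Build an image of vertical stripes repeating every `period` pixels,
    and recover that period from the peak of its Fourier spectrum."""
    x = np.arange(size)
    image = np.tile(np.sin(2 * np.pi * x / period), (size, 1))  # size x size stripes
    spectrum = np.abs(np.fft.rfft(image[0]))  # spectrum of one row
    spectrum[0] = 0                           # ignore the constant (DC) term
    peak = spectrum.argmax()                  # peak index = cycles per row
    return size / peak                        # recovered period in pixels

# The "brick" pattern lands at the same period whether the photo is small or big.
print(dominant_period(256))    # ~16.0
print(dominant_period(1024))   # ~16.0
```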