Have you ever wanted to edit a video to remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it? If you have already run advertising campaigns, you have certainly wanted variations of your videos for A/B testing to see what works best. Well, this new research by Niv Haim et al. can help you do all of the above from a single video and in HD! Indeed, using a simple video, you can perform any of the tasks I just mentioned in seconds, or in a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs in many ways, and it neither uses any fancy deep learning research nor requires a huge and impractical dataset! And the best thing is that this technique is scalable to high-resolution videos...

References
►Read the full article: https://www.louisbouchard.ai/vgpnn-ge...
►Paper covered: Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2021). Diverse Generation from a Single Video Made Possible. ArXiv, abs/2109.08591.
►The technique that was adapted from images to videos: Granot, N., Feinstein, B., Shocher, A., Bagon, S., & Irani, M. (2021). Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models. ArXiv, abs/2103.15545.
►Code (available soon): https://nivha.github.io/vgpnn/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

00:00 Have you ever wanted to edit a video to remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it? If you have already run advertising campaigns, you have certainly wanted variations of your videos for A/B testing to see what works best. Well, this new research by Niv Haim et al. can help you do all of these out of a single video and in high definition. Indeed, using a simple video, you can perform any of the tasks I just mentioned in seconds, or in a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs in many ways, and it neither uses any fancy deep learning research nor requires a huge and impractical dataset. And the best thing is that this technique is scalable to high-resolution videos. It is not only for research purposes with 256 by 256 pixel videos. Oh, and of course, you can use it with images.

01:04 Let's see how it works. The model is called Video-based Generative Patch Nearest Neighbors (VGPNN). Instead of using complex algorithms and models like GANs or transformers, the researchers who developed VGPNN opted for a much simpler approach and revisited the nearest neighbor algorithm.

01:22 First, they downscale the image in a pyramid way, where each level has a lower resolution than the one above. Then, they add random noise to the coarsest level to generate a different image, similar to what GANs do in the compressed space after encoding the image. Note that here I will say image for simplicity, but in this case, since it's applied to
videos, the process is made on three frames simultaneously, adding a time dimension, but the explanation stays the same, with an extra step at the end.

01:52 The image at the coarsest scale, with noise added, is divided into multiple small square patches. All patches in the image with noise added are replaced with the most similar patch from the initial scaled-down image without noise. This most similar patch is found with the nearest neighbor algorithm. As we will see, most of these patches will stay the same, but depending on the added noise, some patches will change just enough to make them look more similar to another patch in the initial image. This is the VPNN output you see here. These changes are just enough to generate a new version of the image. Then, this first output is upscaled and compared with the input image of the next scale, acting as a noisy version of it, and the same steps are repeated. In this next iteration, we split these images into small patches and replace the previously generated ones with the most similar ones at the current step.

02:48 Let's get into this VPNN module we just covered. As you can see here, the only difference from the initial step with noise added is that we compare the upscaled generated image, here denoted as Q, with an upscaled version of the previous image, just so it has the same level of detail, denoted as K. Basically, using the level below as a comparison, we compare Q and K and then select the corresponding patches in the image from the current level, V, to generate the new image for this step, which will be used for the next iteration. As you see here with the small arrows, K is just an upscaled version of the image we created by downscaling V in the initial step of this algorithm, where
we created the pyramidal scaling versions of our image. This is done to compare the same level of sharpness in both images, as the upscaled generated image from the previous layer, Q, will be much more blurry than the image at the current step, V, and it would be very hard to find similar patches.

03:50 This is repeated until we get back to the top of the pyramid with high-resolution results. Then, all these generated patches are folded into a video, and voilà! You can repeat this with different noises or modifications to generate any variations you want of your videos.

04:06 Let's do a quick recap. The image is downscaled at multiple scales. Noise is added to the coarsest-scale image, which is divided into small square patches. Each noisy patch is then replaced with the most similar patch from the same compressed image without noise, causing a few random changes in the image while keeping realism. Both the newly generated image and the image without noise of this step are upscaled and compared to find the most similar patches, with the nearest neighbor algorithm again. These most similar patches are then chosen from the image at the current resolution to generate a new image for the step. And we repeat these upscaling and comparing steps until we get back to the top of the pyramid with high-resolution results.

04:52 Of course, the results are not perfect. You can still see some artifacts, like people appearing and disappearing at weird places, or simply copy-pasting someone in some cases, making it very obvious if you focus on it. Still, it's only the first paper attacking video manipulations with the nearest neighbor algorithm and making it scalable to high-resolution videos. It's always awesome to see different approaches. I'm super excited to see the next
paper improving upon this one. Also, the results are still quite impressive, and they could be used as a data augmentation tool for models working on videos thanks to their very low run time, allowing other models to train on larger and more diverse datasets without much cost.

05:33 If you are interested in learning more about this technique, I strongly recommend reading their paper. It is the first link in the description. Thank you for watching, and to everyone supporting my work on Patreon or by commenting and liking the videos here on YouTube.
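
For readers who prefer code, here is a minimal sketch of the steps recapped in the transcript: build a pyramid, add noise at the coarsest level, replace patches with their nearest neighbors, then upscale and repeat with Q, K and V. It is not the authors' implementation. It assumes a single grayscale image stored as a list of lists (the transcript also says "image" for simplicity), a pyramid that halves the resolution at each level, an exhaustive nearest neighbor search, and overlapping patches that are folded back by averaging. All function names and parameters (`downscale`, `upscale`, `extract_patches`, `pnn`, `generate`, `levels`, `noise`, `p`) are hypothetical and chosen for illustration only.

```python
import random


def downscale(img):
    """Halve the resolution by averaging each 2x2 block (even sizes assumed)."""
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
             for x in range(0, len(img[0]), 2)]
            for y in range(0, len(img), 2)]


def upscale(img):
    """Double the resolution by repeating every pixel in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out


def extract_patches(img, p):
    """List every p x p patch as ((top, left), flattened_values)."""
    h, w = len(img), len(img[0])
    return [((y, x), [img[y + dy][x + dx] for dy in range(p) for dx in range(p)])
            for y in range(h - p + 1) for x in range(w - p + 1)]


def pnn(q, k, v, p=3):
    """Replace each patch of q by the patch of v whose counterpart in k is closest.

    q, k and v must have the same size. Overlapping patches are averaged
    when folded back into an image (an assumption of this sketch).
    """
    h, w = len(q), len(q[0])
    keys = extract_patches(k, p)
    vals = extract_patches(v, p)  # same order as keys, since k and v share a size
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for (y, x), qp in extract_patches(q, p):
        # Exhaustive nearest neighbor search using squared Euclidean distance.
        best = min(range(len(keys)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(qp, keys[i][1])))
        vp = vals[best][1]
        for dy in range(p):
            for dx in range(p):
                acc[y + dy][x + dx] += vp[dy * p + dx]
                cnt[y + dy][x + dx] += 1
    return [[acc[y][x] / cnt[y][x] for x in range(w)] for y in range(h)]


def generate(img, levels=2, noise=0.5, p=3, seed=0):
    """Generate one variation of img; its sides must be divisible by 2**levels."""
    rng = random.Random(seed)
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downscale(pyramid[-1]))
    coarse = pyramid[-1]
    # Coarsest level: the noisy image is the query, the clean one is key and value.
    noisy = [[val + rng.gauss(0.0, noise) for val in row] for row in coarse]
    out = pnn(noisy, coarse, coarse, p)
    for lvl in range(levels - 1, -1, -1):
        q = upscale(out)                 # blurry upscaled previous output
        k = upscale(pyramid[lvl + 1])    # equally blurry version of this level
        out = pnn(q, k, pyramid[lvl], p)  # paste sharp patches from this level
    return out


# Example: a 16x16 synthetic "frame" and one generated variation of it.
frame = [[float((x * y) % 7) for x in range(16)] for y in range(16)]
variation = generate(frame, levels=2, noise=0.5, seed=1)
print(len(variation), len(variation[0]))
```

Changing `seed` or `noise` gives a different variation of the same input. For video, the same idea would presumably be applied to patches that also span neighboring frames, as the transcript describes, and the exhaustive search would need to be replaced by something much faster to reach the run times mentioned above.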