Have you ever wanted to edit a video to remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it? For those of you who already ran advertisement campaigns, you certainly wanted to have variations of your videos for AB testing and see what works best.
Well, this new research by Niv Haim et al. can help you do all of the above from a single video, and in HD!
Indeed, using a single video, you can perform any of the tasks I just mentioned in seconds, or a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs, doesn't rely on any fancy deep learning research, and doesn't require a huge and impractical dataset!
And the best thing is that this technique is scalable to high-resolution videos...
►Read the full article: https://www.louisbouchard.ai/vgpnn-ge...
►Paper covered: Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2021). Diverse Generation from a Single Video Made Possible. ArXiv, abs/2109.08591.
►The technique that was adapted from images to videos: Niv Granot, Ben Feinstein, Assaf Shocher, Shai Bagon, and Michal Irani. Drop the gan: In defense of patches nearest neighbors as single image generative models. arXiv preprint arXiv:2103.15545, 2021.
►Code (available soon): https://nivha.github.io/vgpnn/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Have you ever wanted to edit a video: remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it? For those of you who already ran advertisement campaigns, you certainly wanted variations of your videos for A/B testing to see what works best. Well, this new research by Niv Haim et al. can help you do all of these from a single video, and in high definition! Indeed, using a single video, you can perform any of the tasks I just mentioned in seconds, or a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs, without any fancy deep learning research and without requiring a huge and impractical dataset. And the best thing is that this technique is scalable to high-resolution videos; it is not only for research purposes with 256-by-256-pixel videos. Oh, and of course, you can use it with images too.
Let's see how it works. The model is called Video-based Generative Patch Nearest Neighbors, VGPNN. Instead of using complex algorithms and models like GANs or transformers, the researchers that developed VGPNN opted for a much simpler approach: a revisited nearest-neighbor algorithm. First, they downscale the image in a pyramid, where each level is a lower resolution than the one above. Then, they add random noise to the coarsest level to generate a different image, similar to what GANs do in the compressed space after encoding the image. Note that here I will say "image" for simplicity, but since this is applied to videos, the process is done on three frames simultaneously, adding a time dimension; the explanation stays the same, with an extra step at the end.
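The pyramid step can be sketched in a few lines. This is a minimal sketch under assumptions of mine, not the paper's implementation: a single 2-D frame, 2x average-pooling per level, and a fixed number of levels, whereas the actual method uses spatio-temporal volumes and a gentler scale factor.

```python
import numpy as np

def build_pyramid(frame, num_levels=4):
    """Build a coarse-to-fine pyramid by repeated 2x average-pooling.

    Illustrative only: the paper downscales videos in space and time;
    here we pool a single 2-D frame for clarity.
    """
    levels = [frame]
    for _ in range(num_levels - 1):
        f = levels[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2  # crop to even size
        f = f[:h, :w]
        # 2x2 average pooling halves each spatial dimension
        pooled = f.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
        levels.append(pooled)
    return levels[::-1]  # coarsest level first, finest (original) last

# Example: a 64x64 frame shrinks to 8x8 at the coarsest level
pyramid = build_pyramid(np.random.rand(64, 64))
print([lvl.shape for lvl in pyramid])  # [(8, 8), (16, 16), (32, 32), (64, 64)]
```

Generation then starts from the small end of this list, where a little noise goes a long way.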
The image at the coarsest scale, with noise added, is divided into multiple small square patches. All patches in the noisy image are then replaced with the most similar patch from the initial scaled-down image without noise, where "most similar" is measured with the nearest-neighbor algorithm. As we will see, most of these patches will stay the same, but depending on the added noise, some patches will change just enough to look more similar to another patch in the initial image. This is the VGPNN output you see here: these changes are just enough to generate a new version of the image.
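This coarsest-level step can be sketched as follows. Again a hedged sketch: I assume non-overlapping square patches and a brute-force L2 search, while the paper uses overlapping spatio-temporal patches and a much faster approximate nearest-neighbor search.

```python
import numpy as np

def replace_with_nearest_patches(noisy, reference, patch=4):
    """Replace every patch of `noisy` with its nearest neighbor in `reference`.

    Every output patch is guaranteed to come from the clean reference,
    which is what keeps the generated image realistic.
    """
    h, w = reference.shape
    # Gather all patches of the clean reference image as candidates
    candidates = [
        reference[i:i + patch, j:j + patch]
        for i in range(0, h - patch + 1, patch)
        for j in range(0, w - patch + 1, patch)
    ]
    cand = np.stack([c.ravel() for c in candidates])  # (N, patch*patch)

    out = np.empty_like(noisy)
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            q = noisy[i:i + patch, j:j + patch].ravel()
            best = np.argmin(((cand - q) ** 2).sum(axis=1))  # nearest neighbor
            out[i:i + patch, j:j + patch] = candidates[best]
    return out

coarse = np.random.rand(16, 16)
noisy = coarse + 0.1 * np.random.randn(16, 16)
generated = replace_with_nearest_patches(noisy, coarse)
```

With small noise, most queries map back to their own patch; a few jump to a different reference patch, which is exactly the source of diversity described above.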
Then, this first output is upscaled and compared with the input image of the next scale, acting as a noisy version of it, and the same steps are repeated: at this next iteration, we again split the images into small patches and replace the previously generated ones with the most similar ones at the current step. Let's get into this VGPNN module we just covered. As you can see here, the only difference from the initial step with noise added is that we compare the upscaled generated image, here denoted as Q, with an upscaled version of the previous image, denoted as K, just so it has the same level of detail. Basically, using the level below for comparison, we compare Q and K and then select the corresponding patches in the image from the current level, V, to generate the new image for this step, which will be used in the next iteration.
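One such Q/K/V step might look like this. A sketch under my own simplifications: nearest-neighbor 2x upsampling, non-overlapping patches, and brute-force search stand in for the paper's resizing and fast approximate matching. The key idea it demonstrates is matching in the equally blurry Q/K domain while copying the sharp patch from V.

```python
import numpy as np

def upscale2x(img):
    """Nearest-neighbor 2x upsampling (a simple stand-in for a proper resize)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def vgpnn_step(prev_generated, prev_reference, cur_reference, patch=4):
    """One Q/K/V step of the coarse-to-fine loop.

    Q = upscaled output of the previous (coarser) level,
    K = upscaled reference of the previous level (same blurriness as Q),
    V = reference at the current level (sharp patches we copy from).
    """
    Q, K, V = upscale2x(prev_generated), upscale2x(prev_reference), cur_reference
    h, w = V.shape
    coords = [(i, j) for i in range(0, h - patch + 1, patch)
                     for j in range(0, w - patch + 1, patch)]
    K_flat = np.stack([K[i:i + patch, j:j + patch].ravel() for i, j in coords])

    out = np.empty_like(V)
    for i, j in coords:
        q = Q[i:i + patch, j:j + patch].ravel()
        # match against K (blurry), then copy the matching sharp patch from V
        bi, bj = coords[np.argmin(((K_flat - q) ** 2).sum(axis=1))]
        out[i:i + patch, j:j + patch] = V[bi:bi + patch, bj:bj + patch]
    return out
```

A sanity check of the design: if the previous level's output equals its reference, every Q patch matches its own K patch, so the step simply reproduces the current-level reference.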
As you see here with the small arrows, K is just an upscaled version of the image we created when downscaling V in the initial step of the algorithm, where we built the pyramidal versions of our image. This is done to compare the same level of sharpness in both images: since the upscaled generated image from the previous layer, Q, will be much blurrier than the image at the current step, V, it would otherwise be very hard to find similar patches. This is repeated until we get back to the top of the pyramid with high-resolution results. Then, all these generated patches are folded into a video, and voila! You can repeat this with different noises or modifications to generate any variations you want on your videos.
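The final "fold" can be sketched like this. An assumption-laden sketch: I paste each selected patch at its location and average overlapping contributions, for 2-D patches rather than the paper's spatio-temporal ones.

```python
import numpy as np

def fold_patches(patches, coords, shape, patch=4):
    """Fold (possibly overlapping) patches back into a frame by averaging.

    `coords` holds the top-left corner of each patch; overlapping pixels
    receive the average of all patches covering them.
    """
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for p, (i, j) in zip(patches, coords):
        acc[i:i + patch, j:j + patch] += p
        cnt[i:i + patch, j:j + patch] += 1
    return acc / np.maximum(cnt, 1)  # avoid division by zero where uncovered

# Overlapping stride-2 patches of a frame fold back into the frame itself
frame = np.random.rand(8, 8)
coords = [(i, j) for i in range(0, 5, 2) for j in range(0, 5, 2)]
patches = [frame[i:i + 4, j:j + 4] for i, j in coords]
folded = fold_patches(patches, coords, frame.shape)
```

Averaging the overlaps is what smooths the seams between independently chosen patches.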
Let's do a quick recap. The image is downscaled at multiple scales. Noise is added to the coarsest-scale image, which is divided into small square patches. Each noisy patch is then replaced with the most similar patch from the same compressed image without noise, causing a few random changes in the image while keeping realism. Both the newly generated image and the image without noise at this step are upscaled and compared to find the most similar patches with the nearest neighbor again. These most similar patches are then taken from the image at the current resolution to generate a new image for this step, and we repeat these upscaling and comparing steps until we get back to the top of the pyramid with high-resolution results.
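The whole recap can be put together in one self-contained sketch. Everything below is illustrative and assumed by me (2x average-pool pyramid, non-overlapping patches, brute-force L2 search, nearest-neighbor upsampling, the noise level); it is the shape of the loop, not the authors' implementation.

```python
import numpy as np

def generate_variation(frame, num_levels=3, patch=4, noise_std=0.5, seed=0):
    """End-to-end sketch of the coarse-to-fine VGPNN-style loop on one frame."""
    rng = np.random.default_rng(seed)

    # 1) Build the pyramid, coarsest level first
    levels = [frame]
    for _ in range(num_levels - 1):
        f = levels[-1]
        h, w = (f.shape[0] // 2) * 2, (f.shape[1] // 2) * 2
        levels.append(f[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    levels = levels[::-1]

    def nn_replace(query, key, value):
        # Replace each `query` patch by the `value` patch whose `key`
        # patch is closest in L2 distance -- the core NN step.
        h, w = value.shape
        coords = [(i, j) for i in range(0, h - patch + 1, patch)
                         for j in range(0, w - patch + 1, patch)]
        keys = np.stack([key[i:i + patch, j:j + patch].ravel() for i, j in coords])
        out = np.empty_like(value)
        for i, j in coords:
            q = query[i:i + patch, j:j + patch].ravel()
            bi, bj = coords[np.argmin(((keys - q) ** 2).sum(axis=1))]
            out[i:i + patch, j:j + patch] = value[bi:bi + patch, bj:bj + patch]
        return out

    # 2) Coarsest level: noise plays the role of the random latent
    coarse = levels[0]
    generated = nn_replace(coarse + noise_std * rng.standard_normal(coarse.shape),
                           coarse, coarse)

    # 3) Climb the pyramid: upscale (Q), match against the equally
    #    blurry reference (K), copy sharp patches from this level (V)
    for lvl in range(1, num_levels):
        up = generated.repeat(2, axis=0).repeat(2, axis=1)           # Q
        key = levels[lvl - 1].repeat(2, axis=0).repeat(2, axis=1)    # K
        generated = nn_replace(up, key, levels[lvl])                 # V
    return generated

variation = generate_variation(np.random.rand(32, 32))
```

Different seeds give different variations of the same input, which is the whole point of the method.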
Of course, the results are not perfect. You can still see some artifacts, like people appearing and disappearing in weird places, or someone simply being copy-pasted in some cases, making it very obvious if you focus on it. Still, it's only the first paper attacking video manipulation with the nearest-neighbor algorithm and making it scalable to high-resolution videos, and it's always awesome to see different approaches. I'm super excited to see the next paper improving upon this one. Also, the results are still quite impressive, and thanks to their very low runtime, they could be used as a data-augmentation tool for models working on videos, allowing other models to train on larger and more diverse datasets without much cost. If you are interested in learning more about this technique, I strongly recommend reading their paper; it is the first link in the description. Thank you for watching, and to everyone supporting my work on Patreon or by commenting and liking the videos here on YouTube.