This AI Performs Seamless Video Manipulation Without Deep Learning or Datasets

Written by whatsai | Published 2021/09/26
Tech Story Tags: artificial-intelligence | ai | video-synthesis | video-generation | computer-vision | technology | innovation | hackernoon-top-story | web-monetization


Have you ever wanted to edit a video to remove or add someone, change the background, make it last a bit longer, or change the resolution to fit a specific aspect ratio without compressing or stretching it? If you have ever run advertising campaigns, you have certainly wanted variations of your videos for A/B testing to see what works best.

Well, this new research by Niv Haim et al. can help you do all of the above from a single video, and in HD!

Indeed, using a simple video, you can perform any of the tasks I just mentioned in seconds, or in a few minutes for high-quality videos. You can basically use it for any video manipulation or video generation application you have in mind. It even outperforms GANs in all ways and doesn’t rely on any fancy deep learning research, nor does it require a huge and impractical dataset!

And the best thing is that this technique is scalable to high-resolution videos...

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/vgpnn-ge...
►Paper covered: Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2021). Diverse Generation from a Single Video Made Possible. ArXiv, abs/2109.08591.
►The technique that was adapted from images to videos: Granot, N., Feinstein, B., Shocher, A., Bagon, S., & Irani, M. (2021). Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models. ArXiv, abs/2103.15545.
►Code (available soon): https://nivha.github.io/vgpnn/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

00:00

have you ever wanted to edit a video

00:02

remove or add someone change the

00:04

background make it last a bit longer or

00:06

change the resolution to fit a specific

00:08

aspect ratio without compressing or

00:10

stretching it for those of you who

00:12

already ran advertisement campaigns you

00:14

certainly wanted to have variations of

00:16

your videos for A/B testing and see what

00:19

works best well this new research by Niv

00:22

Haim et al. can help you do all of these

00:24

out of a single video and in high

00:27

definition indeed using a simple video

00:29

you can perform any tasks i just

00:32

mentioned in seconds or in a few minutes

00:34

for high quality videos you can

00:36

basically use it for any video

00:38

manipulation or video generation

00:40

application you have in mind it even

00:42

outperforms GANs in any ways and doesn't

00:45

use any deep learning fancy research nor

00:48

requires a huge and impractical dataset

00:51

and the best thing is that this

00:52

technique is scalable to high resolution

00:55

videos it is not only for research

00:57

purposes with 256 by 256 pixel videos oh

01:01

and of course you can use it with images

01:04

let's see how it works the model is

01:06

called video based generative patch

01:08

nearest neighbors (VGPNN) instead of using

01:11

complex algorithms and models like GANs

01:14

or transformers the researchers that

01:16

developed VGPNN opted for a much simpler

01:19

approach but revisited the nearest

01:22

neighbor algorithm first they downscale

01:24

the image in a pyramid way where each

01:26

level is a lower resolution than the

01:28

one above then they add random noise to

01:31

the coarsest level to generate a

01:33

different image similar to what GANs do

01:36

in the compressed space after encoding

01:38

the image note that here i will say

01:40

image for simplicity but in this case

01:42

since it's applied to videos the process

01:45

is made on three frames simultaneously

01:48

adding a time dimension but the

01:49

explanation stays the same with an extra

01:52

step at the end the image at the

01:54

coarsest scale with noise added is

01:56

divided into multiple small square

01:59

patches all patches in the image with

02:01

noise added are replaced with the most

02:04

similar patch from the initial scaled

02:06

down image without noise this most

02:09

similar patch is measured with the

02:11

nearest neighbor algorithm as we will

02:13

see most of these patches will stay the

02:15

same but depending on the added noise

02:17

some patches will change just enough to

02:19

make them look more similar to another

02:21

patch in the initial image this is the

02:24

VPNN output you see here these changes

02:27

are just enough to generate a new

02:29

version of the image then this first

02:31

output is upscaled and used to compare

02:34

with the input image of the next scale

02:36

to act as a noisy version of it and the

02:38

same steps are repeated in this next

02:41

iteration we split these images into

02:43

small patches and replace the previously

02:45

generated ones with the most similar

02:48

ones at the current step let's get into

02:50

this VPNN module we just covered as you

02:53

can see here the only difference from

02:55

the initial step with noise added is

02:58

that we compare the upscaled generated

03:00

image here denoted as q with an upscaled

03:03

version of the previous image just so it

03:06

has the same level of details denoted as

03:09

k basically using the level below as

03:12

comparisons we compare q and k and then

03:15

select corresponding patches in the

03:17

image from this current level v to

03:20

generate the new image for this step

03:22

which will be used for the next

03:24

iteration as you see here with the small

03:26

arrows k is just an upscaled version of

03:28

the image we created downscaling v in

03:31

the initial step of this algorithm where

03:33

we created the pyramidal scaling

03:35

versions of our image this is done to

03:38

compare the same level of sharpness in

03:40

both images as the upscaled generated

03:42

image from the previous layer q will be

03:45

much more blurry than the image at the

03:48

current step v and it will be very hard

03:50

to find similar patches this is repeated

03:53

until we get back to the top of the

03:54

pyramid with high resolution results
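
As a rough illustration of the patch-matching module described in the transcript above (q, k and v), here is a minimal NumPy sketch for a single 2D grayscale image. It is not the authors' code. The function names, the 3 x 3 patch size, the plain squared Euclidean distance and the averaging of overlapping patches are all assumptions made for this example, and the real method works on space-time video patches rather than 2D ones.

```python
import numpy as np

def extract_patches(img, p):
    """Return all overlapping p x p patches of a 2D array, one flattened patch per row."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (p, p))
    return windows.reshape(-1, p * p)

def fold_patches(patches, shape, p):
    """Average overlapping p x p patches back into an image of the given shape."""
    h, w = shape
    out = np.zeros(shape)
    count = np.zeros(shape)
    idx = 0
    for i in range(h - p + 1):
        for j in range(w - p + 1):
            out[i:i + p, j:j + p] += patches[idx].reshape(p, p)
            count[i:i + p, j:j + p] += 1
            idx += 1
    return out / count

def pnn_step(q, k, v, p=3):
    """Replace each patch of q by the patch of v whose counterpart in k is closest.

    q: query image (noisy or upscaled previous output)
    k: key image (source at the same blur level as q)
    v: value image (sharp source at the current scale)
    All three must have the same shape.
    """
    q_p, k_p, v_p = (extract_patches(a, p) for a in (q, k, v))
    # Squared Euclidean distance between every query patch and every key patch
    # (assumed metric for this sketch).
    d = (q_p ** 2).sum(1)[:, None] - 2 * q_p @ k_p.T + (k_p ** 2).sum(1)[None, :]
    nearest = d.argmin(axis=1)
    return fold_patches(v_p[nearest], q.shape, p)
```

Calling `pnn_step(q, k, v)` with three arrays of the same shape returns a new image built only from patches of `v`. The distance matrix grows with the square of the number of patches, so this sketch is only practical for small inputs.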

03:57

then all these generated patches are

03:59

folded into a video and voila you can

04:02

repeat this with different noises or

04:04

modifications to generate any variations

04:06

you want on your videos let's do a quick

04:09

recap the image is downscaled at

04:11

multiple scales noise is added to the

04:13

coarsest scale image which is divided into

04:16

small square patches each noisy patch is

04:18

then replaced with the most similar

04:20

patches from the same compressed image

04:23

without noise causing few random changes

04:26

in the image while keeping realism both

04:28

the newly generated image and image

04:31

without noise of this step are upscaled

04:33

and compared to find the most similar

04:36

patches with the nearest neighbor again

04:38

these most similar patches are then

04:40

chosen from the image at the current

04:42

resolution to generate a new image for

04:45

the step again and we repeat this

04:47

upscaling and comparing steps until we

04:49

get back to the top of the pyramid with

04:52

high resolution results of course the

04:54

results are not perfect you can still

04:56

see some artifacts like people appearing

04:58

and disappearing at weird places or

05:00

simply copy-pasting someone in some

05:02

cases making it very obvious if you

05:05

focus on it still it's only the first

05:07

paper attacking video manipulations with

05:09

the nearest neighbor algorithm and

05:11

making it scalable to high resolution

05:13

videos it's always awesome to see

05:15

different approaches i'm super excited

05:18

to see the next paper improving upon

05:20

this one also the results are still

05:22

quite impressive and they could be used

05:24

as a data augmentation tool for models

05:26

working on videos due to their very low

05:29

run time allowing other models to train

05:31

on larger and more diverse datasets

05:33

without much cost if you are interested

05:35

in learning more about this technique i

05:37

will strongly recommend reading their

05:38

paper it is the first link in the

05:40

description thank you for watching and

05:42

to everyone supporting my work on

05:44

patreon or by commenting and liking the

05:46

videos here on youtube
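
The recap in the transcript (build a pyramid, add noise at the coarsest scale, then repeatedly upscale and match patches) can be sketched as the loop below. This is a simplified illustration for a single 2D grayscale image, not the authors' implementation. The factor-of-two scaling, the block-average downscaling, the pixel-repeat upscaling, the noise level and every function name are assumptions. `step(q, k, v)` is a hypothetical callable standing in for the patch nearest-neighbor replacement.

```python
import numpy as np

def downscale(img):
    """Halve the resolution by averaging 2 x 2 blocks (both sides must be even)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale(img):
    """Double the resolution by repeating each pixel (nearest-neighbor upsampling)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return [finest, ..., coarsest]; each level is half the size of the previous one."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downscale(pyramid[-1]))
    return pyramid

def generate(img, step, levels=3, noise_std=0.5, seed=0):
    """Coarse-to-fine generation driven by a patch-replacement callable step(q, k, v)."""
    rng = np.random.default_rng(seed)
    pyramid = build_pyramid(img, levels)
    coarsest = pyramid[-1]
    # Coarsest level: the query is the image plus noise, keys and values are the clean image.
    q = coarsest + rng.normal(0.0, noise_std, coarsest.shape)
    out = step(q, coarsest, coarsest)
    # Finer levels, from the second coarsest up to full resolution.
    for level in range(levels - 2, -1, -1):
        q = upscale(out)                 # previous output, blurry at this scale
        k = upscale(pyramid[level + 1])  # equally blurry version of the source
        v = pyramid[level]               # sharp source at this scale
        out = step(q, k, v)
    return out
```

Changing `seed` or `noise_std` changes the noise added at the coarsest level, which is what produces different variations of the same input. The image sides must be divisible by 2 at every level except the coarsest for this simplified scaling to work.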


Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.

Published by HackerNoon on 2021/09/26