Realistic Face Manipulation in Videos With AI

by Louis Bouchard, January 30th, 2022
Too Long; Didn't Read

Many techniques allow you to add smiles or make you look younger or older, all automatically using AI-based algorithms in videos. This is called AI-based face manipulation in videos, and here's the current state of the art in 2022! Learn more in the video covering "Stitch it in Time: GAN-Based Facial Editing of Real Videos". Read the full article: https://www.louisbouchard.ai/stitch-it-in-time/

You've most certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith appear to look much younger. This requires hundreds if not thousands of hours of work from professionals manually editing the scenes they appeared in. Instead, you could use a simple AI and do it within a few minutes.

Indeed, many techniques allow you to add smiles or make you look younger or older, all automatically using AI-based algorithms. This is called AI-based face manipulation in videos, and here's the current state of the art in 2022!

Learn more in the video:

Video Transcript

You've most certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith appear to look much younger. This requires hundreds if not thousands of hours of work from professionals manually editing the scenes they appeared in. Instead, you could use a simple AI and do it within a few minutes. Indeed, many techniques allow you to add smiles or make you look younger or older, all automatically using AI-based algorithms. They are mostly applied to images since that is much easier, but the same techniques, with small tweaks, can be applied to videos, which, as you may suspect, is quite promising for the film industry. And by the way, the results you've been seeing were all made using the technique I will discuss in this video.

The main problem is that, currently, these generated older versions and edited images not only seem weird but, when used in a video, show glitches and artifacts you surely do not want in a million-dollar movie. This is because it's much harder to get videos of people than pictures, making it even harder to train AI models that require so many different examples to understand what to do. This strong data dependency is one of the reasons why current AI is far from human intelligence. This is why researchers like Rotem Tzaban and collaborators from Tel Aviv University work hard to improve the quality of automatic AI video editing without requiring so many video examples, or, more precisely, to improve AI-based face manipulation in high-quality talking-head videos using models trained with images. It doesn't require anything except the single video you want to edit, and you can add a smile or make the person look younger or even older. It even works with animated videos. This is so cool, but what's even better is how they achieve it.

But before getting into that, let me talk about the sponsor of this video... there is no sponsor for this video, so if you could just take a second to give it a thumbs up and leave a comment about what you think of the model, how you'd apply it after watching the video, or even how you feel today, that would be amazing. And I can promise you I will answer within 12 minutes, you can time it.
So how does it work? Of course, it uses GANs, or generative adversarial networks. I won't go into the inner workings of GANs since I already covered them in a video that you can watch right here and linked below, but we will see how this approach differs from a basic GAN architecture. If you are not familiar with GANs, just take a minute to watch the video and come back, I'll still be here waiting for you. And I'm not exaggerating: the video literally takes one minute to get a simple overview of what GANs are. We will just refresh the part where you have a generative model that takes an image, or rather an encoded version of the image, and changes this code to generate a new version of the image, modifying specific aspects if possible. Controlling the generation is the challenging part, as it has so many parameters, and it's really hard to find which parameters are in charge of what and disentangle everything to edit only what you want.
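To make that encode-edit-decode idea concrete, here is a minimal, hypothetical sketch in PyTorch. The `encoder`, `generator`, `age_direction`, and `edit_image` names are stand-ins of my own, not the actual StyleGAN or STIT components (a real pipeline would use a pretrained StyleGAN generator and an encoder such as e4e or pSp); the point is only to show that the edit happens in the latent code, not in pixel space.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a pretrained encoder and generator.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 256 * 256, 512))  # image -> latent code
generator = nn.Sequential(nn.Linear(512, 3 * 256 * 256), nn.Tanh())   # latent code -> image

# A learned "edit direction" in latent space, e.g. one controlling apparent age.
age_direction = torch.randn(512)

def edit_image(image: torch.Tensor, strength: float = 2.0) -> torch.Tensor:
    """Encode an image, nudge its latent code along one direction, decode it back."""
    latent = encoder(image.unsqueeze(0))                # (1, 512) latent code
    edited_latent = latent + strength * age_direction   # move only along one attribute
    out = generator(edited_latent)                      # decode back to pixels
    return out.view(3, 256, 256)

# Usage: a random "image" stands in for a real aligned face crop.
face = torch.rand(3, 256, 256)
older_face = edit_image(face, strength=2.0)
```

The `strength` scalar is what "changing the code just a bit" means in practice: the larger it is, the stronger the edit.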

The method uses a generative architecture, StyleGAN in this case, which is simply a powerful GAN architecture for images of faces published by NVIDIA a few years ago, with still very impressive results and newer versions. But the generative model itself isn't that important, as the approach can work with any powerful GAN architecture you can find. And yes, even though these models are all trained with images, they are used here to perform video editing. Assuming that the video you send in is realistic and already consistent, the method simply focuses on maintaining realism rather than creating a consistent video from scratch, as video synthesis work has to do when it creates new videos out of the blue. So each image will be processed individually instead of sending a whole video and expecting a new video in return. This assumption makes the task way simpler, but there are more challenges to face, like maintaining a realistic video where each frame flows to the next without apparent glitches.
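In other words, the video is treated as a sequence of independent images. A rough, hypothetical sketch of that per-frame loop (frame extraction and reassembly into a video file are glossed over, and `edit_frame` is whatever single-image editor you plug in):

```python
import torch
from typing import Callable, List

def edit_video(frames: List[torch.Tensor],
               edit_frame: Callable[[torch.Tensor], torch.Tensor]) -> List[torch.Tensor]:
    """Apply the same per-image edit to every frame, one frame at a time."""
    return [edit_frame(frame) for frame in frames]

# Usage: ten random "frames" and an identity edit stand in for a real clip and a real editor.
video = [torch.rand(3, 256, 256) for _ in range(10)]
edited_video = edit_video(video, edit_frame=lambda f: f)  # plug a latent-space editor in here
```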

Here, they take each frame of the video as input, extract only the face, and align it for consistency, which is an essential step as we will see. Then, they use their pre-trained encoder and generator to encode the frames and produce new versions of each.
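The crop-and-align step can be pictured like this. The sketch below is hypothetical: the bounding box is hand-picked rather than coming from a face detector, and STIT's actual alignment follows the FFHQ/StyleGAN preprocessing, which also rotates the crop using facial landmarks so the eyes are level.

```python
import torch
import torch.nn.functional as F
from typing import Tuple

def crop_and_resize(frame: torch.Tensor, box: Tuple[int, int, int, int],
                    size: int = 256) -> torch.Tensor:
    """Crop the detected face region and resize it to the resolution the GAN expects.

    `box` is (top, left, height, width); a real pipeline would get it from a
    face/landmark detector and would also align (rotate) the crop.
    """
    top, left, h, w = box
    face = frame[:, top:top + h, left:left + w]                    # keep only the face
    face = F.interpolate(face.unsqueeze(0), size=(size, size),
                         mode="bilinear", align_corners=False)     # resize to GAN input
    return face.squeeze(0)

# Usage: a random frame and a hand-picked box stand in for a detector's output.
frame = torch.rand(3, 720, 1280)
aligned_face = crop_and_resize(frame, box=(200, 500, 200, 200))
```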

Unfortunately, this alone wouldn't fix the realism problem: the new faces may look weird or out of place when going from one frame to another, and odd lighting bugs and background differences may appear. To fix that, they further train the initial generator and use it to help make the generations across all frames more similar and globally coherent. They also introduce two other steps: an editing step and a new operation that they call stitching tuning.
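Conceptually, that extra training is a short fine-tuning of the generator so that it reconstructs the inverted frames faithfully and consistently. Below is a rough, hypothetical sketch of such a reconstruction-style fine-tuning loop; the names are my own stand-ins and the actual STIT objective is more elaborate (it also uses perceptual losses such as LPIPS).

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins: in practice `generator` is a pretrained StyleGAN and
# `latents` are the codes obtained by inverting every frame of the video.
generator = nn.Sequential(nn.Linear(512, 3 * 256 * 256), nn.Tanh())
latents = [torch.randn(1, 512) for _ in range(10)]                   # one code per frame
frames = [torch.rand(1, 3 * 256 * 256) * 2 - 1 for _ in range(10)]   # original frames, flattened

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# Fine-tune the generator so each frame's latent reproduces that frame well,
# pulling all frames' generations toward one coherent, video-specific generator.
for epoch in range(5):
    for latent, frame in zip(latents, frames):
        recon = generator(latent)
        loss = loss_fn(recon, frame)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```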

The editing step simply takes the encoded version of the image and changes it just a bit. This is the part where it learns to change the code just enough to make the person look older, in this case. So the model is trained to understand which parameters to move, and by how much, in order to modify the right features of the image and make the person look older, like adding some gray hair, adding wrinkles, etc.

Then, this stitching tuning model takes the encoded image you see here and is trained to generate, from the edited code, the image that best fits the background and the other frames. It achieves that by taking the newly generated image, comparing it with the original image, and finding the best way to replace only the face, using a mask, while keeping the rest of the cropped image unchanged. Finally, we paste the modified face back onto the frame.
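The paste-back can be pictured as a simple mask-based blend: keep the original pixels outside the face mask and take the newly generated pixels inside it. A hypothetical sketch of that blend is below; STIT's stitching tuning additionally fine-tunes the generator around the mask boundary so no seam is visible.

```python
import torch

def paste_face(original: torch.Tensor, generated: torch.Tensor,
               mask: torch.Tensor) -> torch.Tensor:
    """Blend the generated face into the original image.

    `mask` is 1 inside the face region and 0 outside; a soft (blurred) mask
    gives a smoother transition at the boundary.
    """
    return mask * generated + (1.0 - mask) * original

# Usage: random images and a crude rectangular mask stand in for real data.
original = torch.rand(3, 256, 256)
generated = torch.rand(3, 256, 256)
mask = torch.zeros(1, 256, 256)
mask[:, 64:192, 64:192] = 1.0          # face region
blended = paste_face(original, generated, mask)
```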

This process is quite clever and allows the production of really high-quality videos, since you only need to feed the cropped and aligned face to the model, drastically reducing the computation needs and complexity of the task. So even if the face needs to be small, let's say 200 pixels square, if it's only a fifth of the image, as you can see here, you can keep a pretty high-resolution video.
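To put rough numbers on that (my own back-of-the-envelope arithmetic, not figures from the paper): if the aligned face crop is 200×200 pixels and spans about a fifth of the frame's width, the full frame can be on the order of 5 × 200 = 1000 pixels wide, so the surrounding video stays near high definition even though the GAN only ever processes the small crop.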

And voilà! This is how these great researchers perform high-quality face manipulation in videos.

I hope you enjoyed this video. Let me know how you feel about this one, whether you liked it or not; any feedback will be amazing. This is the last opportunity you have to make my day by clicking the like button and leaving a comment before you go. Of course, the links to the paper and code are in the video's description. Note that the code will only be released on February 14th, as per the authors' GitHub. Thank you for watching.

[Music]

References

►Read the full article: https://www.louisbouchard.ai/stitch-it-in-time/
►What are GANs? Short video introduction: https://youtu.be/rt-J9YJVvv4
►Tzaban, R., Mokady, R., Gal, R., Bermano, A.H. and Cohen-Or, D., 2022. Stitch it in Time: GAN-Based Facial Editing of Real Videos. https://arxiv.org/abs/2201.08361
►Project link: https://stitch-time.github.io/
►Code: https://github.com/rotemtzaban/STIT
►My Newsletter (a new AI application explained weekly, straight to your inbox!): https://www.louisbouchard.ai/newsletter/