You've almost certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith were made to look much younger. This normally requires hundreds, if not thousands, of hours of work from professionals manually editing the scenes they appeared in. Instead, you could use a simple AI and do it within a few minutes.
Indeed, many techniques allow you to add smiles or make someone look younger or older, all automatically using AI-based algorithms. This is called AI-based face manipulation in videos, and here's the current state-of-the-art in 2022!
You've almost certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith appear to look much younger. This requires hundreds, if not thousands, of hours of work from professionals manually editing the scenes they appeared in. Instead, you could use a simple AI and do it within a few minutes. Indeed, many techniques allow you to add smiles or make someone look younger or older, all automatically using AI-based algorithms. They are mostly applied to images, since that is much easier, but the same techniques, with small tweaks, can be applied to videos, which, as you may suspect, is quite promising for the film industry. And by the way, the results you've been seeing were all made using the technique I will discuss in this video.

The main problem is that, currently, these generated older versions and edited images not only seem weird, but when used in a video they will have glitches and artifacts that you surely do not want in a million-dollar movie. This is because it's much harder to get videos of people than pictures, making it even harder to train such AI models, which require so many different examples to understand what to do. This strong data dependency is one of the reasons why current AI is far from human intelligence.

This is why researchers like Rotem Tzaban and collaborators from Tel Aviv University work hard to improve the quality of automatic AI video editing without requiring so many video examples, or, more precisely, to improve AI-based face manipulations in high-quality talking-head videos using models trained with images. It doesn't require anything except the single video you want to edit, and you can add a smile or make someone look younger or even older. It even works with animated videos. This is so cool, but what's even better is how they achieve it.

But before getting into that, let me talk about the sponsor of this video... um, there are no sponsors for this video. So if you could just take a second to give it a thumbs up and leave a comment about what you think of the model, how you'd apply it after watching the video, or even how you feel today, that would be amazing. And I can promise you I will answer within 12 minutes; you can time it!
So how does it work? Of course, it uses GANs, or generative adversarial networks. I won't go into the inner workings of GANs since I already covered them in a video that you can watch right here and linked below, but we will see how this approach is different from a basic GAN architecture. If you are not familiar with GANs, just take a minute to watch the video and come back; I'll still be here waiting for you. And I'm not exaggerating: the video literally takes one minute to give you a simple overview of what GANs are.

We will just refresh the part where you have a generative model that takes an image, or rather an encoded version of the image, and changes this code to generate a new version of the image, modifying specific aspects if possible. Controlling the generation is the challenging part, as it has so many parameters, and it's really hard to find which parameters are in charge of what and to disentangle everything so you edit only what you want.
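To make that encode-then-generate idea concrete, here is a minimal PyTorch-style sketch. It is only an illustration under assumptions, not the paper's exact API: `encoder` and `generator` stand in for whatever pretrained GAN-inversion encoder and StyleGAN-like generator you actually use.

```python
import torch

@torch.no_grad()
def invert_and_regenerate(face_crop, encoder, generator):
    """Encode a face crop into the GAN's latent space, then regenerate it.

    `encoder` and `generator` are hypothetical pretrained modules: the
    encoder maps a (3, H, W) RGB face crop to a latent code w, and the
    generator maps w back to an RGB image of the same face.
    """
    w = encoder(face_crop.unsqueeze(0))        # image -> latent code
    reconstruction = generator(w).squeeze(0)   # latent code -> image
    return w, reconstruction
```

Editing then amounts to nudging `w` before calling the generator, which is exactly what the editing step described below does.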
So it uses any generator-based architecture, such as StyleGAN in this case, which is simply a powerful GAN architecture for images of faces, published by NVIDIA a few years ago, with still very impressive results and newer versions. But the generative model itself isn't that important, as this can work with any powerful GAN architecture you can find. And yes, even though these models are all trained with images, they will be used to perform video editing. Assuming that the video you send is realistic and already consistent, they will simply focus on maintaining realism rather than creating a truly consistent video, as we have to do in video synthesis work, where we create new videos out of the blue. So each image will be processed individually instead of sending a whole video and expecting a new video in return. This assumption makes the task way simpler, but there are more challenges to face, like maintaining a realistic video where each frame flows smoothly into the next without apparent glitches.

Here, they will take each frame of the video as input, extract only the face, and align it for consistency, which is an essential step, as we will see. Then they will use their pre-trained encoder and generator to encode the frames and produce new versions of each.
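Sketching that per-frame loop shows how little video-specific machinery is involved. In this sketch, `detect_and_align` is a hypothetical helper (in practice you would use an off-the-shelf face-landmark aligner), and `encoder` and `generator` are the pretrained modules from the sketch above.

```python
import torch

def process_frames(frames, detect_and_align, encoder, generator):
    """Per-frame pipeline sketch: crop and align the face in every frame,
    invert it to a latent code, and regenerate it. No temporal model is
    involved; each frame is handled independently."""
    results = []
    for frame in frames:
        # Crop the face and remember the transform so the edited face
        # can be pasted back into the full frame later.
        crop, transform = detect_and_align(frame)
        with torch.no_grad():
            w = encoder(crop.unsqueeze(0))
            new_crop = generator(w).squeeze(0)
        results.append((new_crop, crop, transform))
    return results
```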
Unfortunately, this alone wouldn't fix the realism problem: the new faces may look weird or out of place when going from one frame to another, and weird lighting bugs and background differences may appear. To fix that, they further train the initial generator and use it to help make the generations across all frames more similar and globally coherent.
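Here is a minimal sketch of what fine-tuning a generator around fixed per-frame latent codes could look like, in the spirit of pivotal tuning. This is my illustration, not the authors' training code: the real method uses richer reconstruction losses and regularization than the plain pixel loss shown here.

```python
import torch
import torch.nn.functional as F

def finetune_generator(generator, crops, pivots, steps=100, lr=3e-4):
    """Fine-tune the generator so that each frame's fixed ("pivot") latent
    code reproduces that frame's aligned crop, pushing all frames toward
    one globally coherent appearance. Sketch only: a plain MSE loss stands
    in for the full objective."""
    opt = torch.optim.Adam(generator.parameters(), lr=lr)
    for _ in range(steps):
        for crop, w in zip(crops, pivots):
            loss = F.mse_loss(generator(w), crop.unsqueeze(0))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return generator
```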
They also introduce two other steps: an editing step and a new operation that they call stitching tuning. The editing step will simply take the encoded version of the image and change it just a bit. This is the part where it learns to change the code just enough to make the person look older, in this case. So the model is trained to understand which parameters to move, and by how much, to modify the right features of the image and make the person look older: adding some gray hair, adding wrinkles, etc.
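In latent space, such an edit can be as simple as moving every frame's code along one semantic direction. The sketch below assumes a hypothetical precomputed `age_direction` vector (directions like this can be found with methods such as InterFaceGAN); `strength` controls how much older, or younger if negative, the face becomes.

```python
# Sketch of the editing step: nudge each frame's latent code along a
# semantic "age" direction. `age_direction` is a hypothetical precomputed
# vector with the same shape as the codes in `pivots`.
def edit_latent(w, age_direction, strength=3.0):
    return w + strength * age_direction

edited_codes = [edit_latent(w, age_direction) for w in pivots]
```

Applying the same direction with the same strength to every frame is what keeps the edit consistent across the video.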
Then, this stitching-tuning model will take the encoded image you see here and will be trained to generate, from the edited code, the image that best fits the background and the other frames. It achieves that by taking the newly generated image, comparing it with the original image, and finding the best way to replace only the face, using a mask, while keeping the rest of the cropped image unchanged. Finally, we paste the modified face back onto the frame.
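That compositing boils down to a masked blend followed by undoing the alignment warp. Here is a rough sketch under assumptions: `paste_back` is a hypothetical helper that inverts the crop-and-align transform recorded earlier.

```python
def stitch_face(edited_crop, original_crop, face_mask, frame, transform, paste_back):
    """Sketch of the final compositing: take the face pixels from the edited
    crop and everything else from the original crop, then warp the blended
    crop back into the full frame. `face_mask` is 1 inside the face region
    and 0 outside."""
    blended = face_mask * edited_crop + (1 - face_mask) * original_crop
    return paste_back(blended, frame, transform)  # hypothetical inverse warp
```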
This process is quite clever and allows for the production of really high-quality videos, since you only need to feed the cropped and aligned face to the model, dramatically reducing the computational needs and complexity of the task. So even if the face crop needs to be small, let's say 200 pixels square, if it's only a fifth of the image, as you can see here, you can keep a pretty high-resolution video: a 200-pixel face covering a fifth of the frame still leaves you with a roughly 1,000-pixel-wide video.

And voilà! This is how these great researchers perform high-quality face manipulation in videos. I hope you enjoyed this video. Let me know how you feel about this one, whether you liked it or not; any feedback will be amazing.
This is your last opportunity to make my day by clicking the like button and leaving a comment before you go. Of course, the links to the paper and code are in the video's description. Note that the code will only be released on February 14th, as per the authors' GitHub. Thank you for watching!
References
►Read the full article: https://www.louisbouchard.ai/stitch-it-in-time/
►What are GANs? Short video introduction: https://youtu.be/rt-J9YJVvv4
►Tzaban, R., Mokady, R., Gal, R., Bermano, A.H. and Cohen-Or, D., 2022.
Stitch it in Time: GAN-Based Facial Editing of Real Videos. https://arxiv.org/abs/2201.08361
►Project link: https://stitch-time.github.io/
►Code: https://github.com/rotemtzaban/STIT
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/