You've most certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith appeared much younger than they are. This requires hundreds, if not thousands, of hours of work from professionals manually editing the scenes they appear in. Instead, you could use a simple AI and do it within a few minutes. Indeed, many techniques allow you to add smiles or make you look younger or older, all automatically, using AI-based algorithms. This is called AI-based face manipulation in videos, and here's the current state of the art in 2022!

Learn more in the video:

Video Transcript

00:01 You've most certainly seen movies like the recent Captain Marvel or Gemini Man, where Samuel L. Jackson and Will Smith appear much younger than they are. This requires hundreds, if not thousands, of hours of work from professionals manually editing the scenes they appeared in. Instead, you could use a simple AI and do it within a few minutes. Indeed, many techniques allow you to add smiles or make you look younger or older, all automatically, using AI-based algorithms. They are mostly applied to images since it's much easier, but the same techniques, with small tweaks, can be applied to videos, which, as you may suspect, is quite promising for the film industry. And by the way, the results you've been seeing were all made using the technique I will discuss in this video.

00:44 The main problem is that, currently, these generated "older version" edited images not only seem weird but, when used in a video, will have glitches and artifacts you surely do not want in a million-dollar movie. This is because it's much harder to get videos of people than pictures, making it even harder to train such AI models, which require so many different examples to understand what to do. This strong data dependency is
one of the reasons why current AI is far from human intelligence. This is why researchers like Rotem Tzaban and collaborators from Tel Aviv University work hard to improve the quality of automatic AI video editing without requiring so many video examples or, more precisely, to improve AI-based face manipulations in high-quality talking-head videos using models trained with images. It doesn't require anything except the single video you want to edit, and you can add a smile or make yourself look younger or even older. It even works with animated videos. This is so cool, but what's even better is how they achieve that.

01:46 But before doing so, let me talk about the sponsor of this video. Um... there are no sponsors for this video, so if you could just take a second to give it a thumbs up and leave a comment about what you think of the model, how you'd apply it after watching the video, or even how you feel today, that would be amazing. And I can promise you I will answer within 12 minutes. You can time it.

02:10 So how does it work? Of course, it uses GANs, or generative adversarial networks. I won't go into the inner workings of GANs since I already covered them in a video that you can watch right here and that is linked below, but we will see how this is different from a basic GAN architecture. If you are not familiar with GANs, just take a minute to watch the video and come back. I'll still be there waiting for you, and I'm not exaggerating: the video literally takes one minute to give a simple overview of what GANs are. We will just refresh the part where you have a generative model that takes an image, or rather an encoded version of the image, and changes this code to generate a new version of the image, modifying specific aspects if possible. Controlling the
generation is the challenging part, as the model has so many parameters, and it's really hard to find which parameters are in charge of what and to disentangle everything so that you only edit what you want. So it uses any generative architecture, such as StyleGAN in this case, which is simply a powerful GAN architecture for images of faces published by NVIDIA a few years ago, with still very impressive results and newer versions. But the generative model itself isn't that important, as the approach can work with any powerful GAN architecture you can find.

03:23 And yes, even if these models are all trained with images, they will use them to perform video editing. Assuming that the video you will send is realistic and already consistent, they will simply focus on maintaining realism rather than creating a real, consistent video, as we have to do in video synthesis work, where we create new videos out of the blue. So each image will be processed individually instead of sending a whole video and expecting a new video in return. This assumption makes the task way simpler, but there are more challenges to face, like maintaining such a realistic video where each frame flows smoothly into the next without apparent glitches.

03:59 Here, they will take each frame of the video as an input, extract only the face, and align it for consistency, which is an essential step, as we will see. Then, they will use their pre-trained encoder and generator to encode the frames and produce new versions for each. Unfortunately, this wouldn't fix the realism problem, where the new faces may look weird or out of place when going from one frame to another, as well as weird lighting bugs and background differences that may appear. To fix that, they will further train the initial generator and
use it to help make the generations across all frames more similar and globally coherent. They also introduce two other steps: an editing step and a new operation that they call stitching tuning.

04:42 The editing step will simply take the encoded version of the image and change it just a bit. This is the part where it will learn to change it just enough to make the person look older in this case. So the model will be trained to understand which parameters to move, and by how much, to modify the right features of the image and make the person look older, like adding some gray hair, adding wrinkles, etc. Then, this stitching tuning model will take the encoded image you see here and will be trained to generate the image from the edited code that will best fit the background and other frames. It will achieve that by taking the newly generated image, comparing it with the original image, and finding the best way to replace only the face using a mask while keeping the rest of the cropped image unchanged.

05:28 Finally, we paste the modified face back on the frame. This process is quite clever and allows for the production of really high-quality videos since you only need the cropped and aligned face in the model, greatly reducing the computation needs and complexity of the task. So even if the face needs to be small, let's say 200 pixels square, if it's only a fifth of the image, as you can see here, you can keep a pretty high-resolution video.

05:52 And voilà! This is how these great researchers perform high-quality face manipulation in videos. I hope you enjoyed this video. Let me know how you feel about this one, whether you liked it or not. Any feedback will be amazing. This is the last opportunity you have to make my day by clicking the like button and
leaving a comment before you go. Of course, the links to the paper and code are in the video's description. Note that the code will only be released on February 14th, as per the authors' GitHub. Thank you for watching!

References

►Read the full article: https://www.louisbouchard.ai/stitch-it-in-time/
►What are GANs? Short video introduction: https://youtu.be/rt-J9YJVvv4
►Tzaban, R., Mokady, R., Gal, R., Bermano, A.H. and Cohen-Or, D., 2022. Stitch it in Time: GAN-Based Facial Editing of Real Videos. https://arxiv.org/abs/2201.08361
►Project link: https://stitch-time.github.io/
►Code: https://github.com/rotemtzaban/STIT
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
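To make the editing and paste-back steps described in the transcript a bit more concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names `edit_latent` and `paste_face`, the toy latent code, and the "age" direction are all hypothetical, and the blend shown is ordinary mask compositing rather than the paper's stitching tuning, which further trains the generator. It only illustrates the two ideas of nudging an encoded image along a direction and replacing only the masked face region.

```python
import math


def edit_latent(w, direction, strength):
    """Move a latent code along a normalized direction (assumed linear edit)."""
    norm = math.sqrt(sum(d * d for d in direction))
    return [wi + strength * d / norm for wi, d in zip(w, direction)]


def paste_face(original, generated, mask):
    """Blend two grayscale crops pixel by pixel.

    mask is 1.0 where the generated face should be used, 0.0 where the
    original crop is kept, and in between on the soft boundary.
    """
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(o_row, g_row, m_row)]
        for o_row, g_row, m_row in zip(original, generated, mask)
    ]


# Toy stand-ins for a real encoder output and a learned "age" direction.
w = [0.0, 1.0, 2.0]
age_direction = [3.0, 0.0, 4.0]
w_edited = edit_latent(w, age_direction, strength=5.0)

# Toy 3x3 crops: the original is dark, the generated face is bright.
original = [[0.2] * 3 for _ in range(3)]
generated = [[0.8] * 3 for _ in range(3)]
mask = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 0.0],
]
blended = paste_face(original, generated, mask)
```

In a real pipeline, `w` would come from the encoder for each aligned frame, the edited code would be passed through the generator, and the mask would come from a face segmentation step, none of which is modeled here.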

Watch more on YouTube: https://www.youtube.com/c/WhatsAI

Realistic Face Manipulation in Videos With AI
