In a new paper titled Total Relighting, a research team at Google presents a novel per-pixel lighting representation in a deep learning framework. It explicitly models the diffuse and specular components of appearance, producing relit portraits with convincingly rendered effects like specular highlights. This would be super cool to use in your next Zoom meeting! See how it works and what it can do below.

References
The article: https://www.louisbouchard.ai/backgrounds-with-lighting/
Paper: https://augmentedperception.github.io/total_relighting/total_relighting_paper.pdf
Project link: https://augmentedperception.github.io/total_relighting/
Full reference: Pandey et al., 2021, Total Relighting: Learning to Relight Portraits for Background Replacement, doi: 10.1145/3450626.3459872

Video Transcript

00:00 Have you ever wanted to change the background of a picture but have it look realistic? If you’ve already tried that, you know that it isn’t simple. You can’t just take a picture of yourself at home and swap the background for a beach. It just looks bad and unrealistic. Anyone will say “that’s photoshopped” in a second.

00:18 For movies and professional videos, you need perfect lighting and artists to produce a high-quality image, and that’s super expensive. There’s no way you can do that with your own pictures. Or can you?

00:30 Well, this is what Google Research is trying to achieve with this new paper called Total Relighting. The goal is to properly relight any portrait based on the lighting of the new background you add.

00:42 This task is called “portrait relighting and background replacement”, which, as its name says, has two very complicated sub-tasks:

00:47 Background replacement, meaning that you need to accurately remove the current image’s background to keep only your portrait.
00:59 Portrait relighting, where you adapt your portrait to the lighting in the new background’s scene.

01:05 As you may expect, both tasks are extremely challenging: the algorithm needs to understand the image well enough to properly cut you out of it, and then understand the other image well enough to change the lighting of your portrait so it fits the new scene. The most impressive thing about this paper is that these two tasks are done without any priors, meaning that they need no information other than two pictures, your portrait and the new background, to create this new realistic image.

01:33 Let’s get back to how they attacked these two tasks in detail. The first task, removing the background of your portrait, is called image matting, or in this case human matting, where we want to accurately identify a human in a picture. The “accurate” part makes it complex because of many fine-grained details, like the loose strands of hair people have. You can’t just crop out the face without the hair. It will just look wrong.

01:58 To achieve this, they train a model that first finds the human, then predicts an approximate result specifying what we are sure is part of the person, what is part of the background, and what is uncertain. This is called a trimap, and it is found using a classic segmentation system trained to do exactly that: segment people in images.

02:21 This trimap is then refined using an encoder-decoder architecture, which I already explained in a previous video if you are interested. It basically takes this initial trimap, downscales it into condensed information, and uses this condensed information to upscale it into a better trimap.
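
To make the trimap idea concrete, here is a minimal Python sketch. It is not the paper's method: the paper uses trained networks, while this sketch simply assumes we already have a per-pixel "person probability" from some segmentation model and thresholds it. The function name `make_trimap`, the thresholds `lo` and `hi`, and the label values are all hypothetical choices for illustration.

```python
from typing import List

# Hypothetical label values, following the common grayscale trimap convention.
BACKGROUND, UNKNOWN, FOREGROUND = 0, 128, 255


def make_trimap(prob: List[List[float]], lo: float = 0.1, hi: float = 0.9) -> List[List[int]]:
    """Label each pixel as sure background, sure foreground, or unknown.

    `prob` is an assumed per-pixel probability of belonging to the person.
    The thresholds are illustrative, not values from the paper.
    """
    trimap = []
    for row in prob:
        out_row = []
        for p in row:
            if p >= hi:
                out_row.append(FOREGROUND)   # confidently the person
            elif p <= lo:
                out_row.append(BACKGROUND)   # confidently the background
            else:
                out_row.append(UNKNOWN)      # hair, edges, anything uncertain
        trimap.append(out_row)
    return trimap


# A tiny 2x4 "image": background on the left, person on the right.
soft_mask = [
    [0.02, 0.05, 0.40, 0.95],
    [0.01, 0.30, 0.97, 0.99],
]
for row in make_trimap(soft_mask):
    print(row)
# [0, 0, 128, 255]
# [0, 128, 255, 255]
```

The unknown band is where the hard work happens: those are the pixels the later refinement networks have to resolve into fractional alpha values.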
02:38 This may seem like magic, but it works because the network that transforms this trimap into a code, and the code into a better trimap, was trained on thousands of examples and learned how to achieve this.

02:49 Then, they refine this second trimap again into the final predicted human shape, which is called an alpha matte. This step also uses a neural network. So we basically have three networks involved here: one that takes the image and generates a trimap, a second that takes this image and trimap to improve the trimap, and a last one that takes all of these as inputs to generate the final alpha matte.

03:14 All these sub-steps are learned during training, where we show the networks many examples of what we want, and they work together to improve the final result iteratively. Again, it is very similar to what I previously covered in my video about MODNet, a network doing precisely that, if you want more information about human matting.

03:36 Here, all these networks make up only the first step of this algorithm: the human matting. What’s new with this paper is the second step, which they refer to as the relighting module.

03:50 Now that we have an accurate prediction of where the person is in the image, we need to make it look realistic. To do so, it is very important that the lighting on the person matches the background, so they need to relight either the person or the background scene. Here, as most would agree, the simplest is to relight the person, so they aimed for this.

04:10 This relighting was definitely the more complex of the two tasks, as they needed to understand how the human body reacts to light. As you can see here, there are again multiple networks: a geometry net, an albedo net, and a shading net. The geometry net takes the foreground we produced in the previous step and produces surface normals.
04:30 This is a model of the person’s surface that lets the network understand depth and light interactions.

04:38 Then, these surface normals are coupled with the same foreground image and sent into an albedo net that produces the albedo image. This albedo image is simply a measure of the proportion of light reflected by our object of interest, which in this case is a person, reacting to light from different sources. It tells us how the clothing and skin of the person react to the light they receive, helping us in the next step.

05:04 This next step has to do with the light of the new background. We try to understand how the new background lighting affects our portrait using learned specular reflectance and diffuse light representations of our portrait, here called light maps. These light maps are calculated using a panoramic view of your desired background. Just like the name says, these light maps basically show how the light interacts with the subject in many situations. They allow us to make the skin and clothing appear shinier or more matte depending on the background’s lighting.

05:37 Then, these light maps, the albedo image, and the foreground are fed into the third and final network, the shading network. This shading network first produces a final version of the specular light map using the albedo information coupled with all the specular light map candidates we calculated previously. Using this final light map, our diffuse map, and the albedo, we can finally render the relit person, ready to be inserted into our new background.

06:06 As you saw, all the networks looked the same, exactly like this, which is called a U-Net, or encoder-decoder architecture. Just like I already said, it takes an input, condenses it into codes representing this input, and upscales it into a new image.
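
The rendering step described above (albedo, diffuse light map, specular light map) can be illustrated with a small Python sketch. Note the assumptions: the paper uses a learned shading network and light maps derived from a panoramic environment, whereas this sketch swaps in the classic textbook formula, relit = albedo × diffuse + specular, with a single directional light and a Lambertian diffuse term. The names `lambert` and `shade_pixel` and all the sample values are hypothetical.

```python
from typing import Sequence, Tuple

RGB = Tuple[float, float, float]


def lambert(normal: Sequence[float], light_dir: Sequence[float]) -> float:
    """Diffuse term for one directional light: max(0, n . l).

    Both vectors are assumed to be unit length. This is a simplification:
    the paper works with a full environment panorama, not a single light.
    """
    return max(0.0, sum(n * l for n, l in zip(normal, light_dir)))


def shade_pixel(albedo: RGB, diffuse_light: RGB, specular_light: RGB) -> RGB:
    """Combine the three ingredients per channel and clamp to [0, 1].

    Classic additive shading, standing in for the learned shading network.
    """
    r, g, b = (min(1.0, a * d + s)
               for a, d, s in zip(albedo, diffuse_light, specular_light))
    return (r, g, b)


# One pixel whose surface normal faces the camera, lit head-on by a warm light.
normal = (0.0, 0.0, 1.0)
facing = lambert(normal, (0.0, 0.0, 1.0))       # how much the surface faces the light
warm_light = (1.0, 0.8, 0.6)                    # hypothetical light colour
diffuse = tuple(facing * c for c in warm_light)

skin_albedo = (0.8, 0.6, 0.5)                   # hypothetical albedo value
highlight = (0.1, 0.1, 0.1)                     # hypothetical specular contribution
print(shade_pixel(skin_albedo, diffuse, highlight))
```

The split matters because the diffuse part is tinted by the albedo while the specular highlight is not, which is what lets skin and clothing look shinier or more matte under different lighting.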
06:24 As I already explained in previous videos, these “encoder-decoders” just feed an image into the first part of the network, the encoder, which transforms it into condensed information called a latent code that you can see here on the right. This code basically contains the relevant information to reconstruct the image in whatever style we want it to have. Using what it learned during training, the decoder does the reverse step, using this information to produce a new image with this new style. This style can be a new lighting orientation, but also a completely different image like a surface map or even an alpha matte, just like in our first step.

07:03 This technique is extremely powerful, mainly because of the training they did. Here, they used 58 cameras with multiple lights and 70 different individuals doing various poses and expressions. But don’t worry, this is only needed for training the algorithm. The only thing needed at inference time is your picture and your new background.

07:23 Also, you may recall that I mentioned a panoramic view was needed to produce this relit image, but it can also be accurately approximated with another neural network based only on the background picture you want your portrait placed on.

07:37 And that’s it! Merging these two techniques means you just have to give two images to the algorithm, and it will do everything for you, producing a realistically relit portrait of yourself with a different background!

07:49 This paper by Pandey et al. applies it to humans, but you can imagine how useful it could be on objects as well, where you can just take pictures of objects and put them in a new scene with the correct lighting to make them look real.

08:09 Thank you for watching!
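
As a closing illustration of the very last step, inserting the relit person into the new background, here is a minimal Python sketch of standard alpha compositing: each output pixel is alpha × foreground + (1 − alpha) × background. This is the usual compositing formula rather than code from the paper, and the names `composite`, `relit_person`, `beach`, and `alpha_matte` are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]


def composite(foreground: Image, background: Image, alpha: List[List[float]]) -> Image:
    """Blend foreground over background using a per-pixel alpha matte in [0, 1]."""
    out = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        out_row = []
        for fg, bg, a in zip(fg_row, bg_row, a_row):
            # alpha = 1 keeps the person, alpha = 0 keeps the background,
            # and values in between blend the two (e.g. thin strands of hair).
            r, g, b = (a * f + (1.0 - a) * k for f, k in zip(fg, bg))
            out_row.append((r, g, b))
        out.append(out_row)
    return out


# A 1x3 toy image: person pixel, half-transparent hair pixel, background pixel.
relit_person = [[(1.0, 0.5, 0.0)] * 3]
beach = [[(0.0, 0.5, 1.0)] * 3]
alpha_matte = [[1.0, 0.5, 0.0]]

print(composite(relit_person, beach, alpha_matte))
# [[(1.0, 0.5, 0.0), (0.5, 0.5, 0.5), (0.0, 0.5, 1.0)]]
```

The middle pixel shows why a fractional alpha matte beats a hard cut-out: partially covered pixels pick up some of the new background instead of carrying over the old one.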

Watch more on YouTube: https://www.youtube.com/c/WhatsAI

Introducing Total Relighting by Google
