Neural rendering is the ability to generate a photorealistic model in space, just like this one, from pictures of the object, person, or scene of interest. In this case, you'd have a handful of pictures of this sculpture and ask the machine to figure out what the object in these pictures should look like in space. You are basically asking a machine to understand physics and shapes from images alone. This is quite easy for us since we intuitively understand the real world and depth, but it's a whole other challenge for a machine that only sees pixels. It's great that the generated model looks accurate with realistic shapes, but what about how it blends into the new scene? And what if the lighting conditions vary across the pictures taken, so the generated model looks different depending on the angle you view it from? This would immediately seem weird and unrealistic to us. These are the challenges Snapchat and the University of Southern California attacked in this new research. Watch to learn more:

References:
►Read the full article: https://www.louisbouchard.ai/neroic/
►Kuang, Z., Olszewski, K., Chai, M., Huang, Z., Achlioptas, P. and Tulyakov, S., 2022. NeROIC: Neural Rendering of Objects from Online Image Collections. https://arxiv.org/pdf/2201.02533.pdf
►Project link with great video demo: https://formyfamily.github.io/NeROIC/
►Code: https://github.com/snap-research/NeROIC
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

Neural rendering. Neural rendering is the ability to generate a photorealistic model in space, just like this one, from pictures of the object, person, or scene of interest. In this case, you'd have a handful of pictures of this sculpture and ask the machine to understand what the object in these pictures should look like in space. You are basically asking a machine to understand physics and shapes out of images. This is quite easy for us since we only know the real world and depth, but it's a whole other challenge for a machine that only sees pixels.

Then you might ask: why do we even want to do this? I'd say the answer is pretty obvious to me. There are many cool applications, from having an app that could simply take a few pictures of an object and perfectly synthesize its 3D model, to putting it in images, 3D scenes, or even video games. This is really promising, but for these models to be realistic, lighting is another challenge that comes with these applications. It's great that the generated model looks accurate with realistic shapes, but what about how it blends into the new scene? And what if the lighting conditions vary in the pictures taken and the generated model looks different depending on the angle you look at it? This will automatically seem weird and unrealistic to us. These are the challenges Snapchat and the University of Southern California attacked in this new research.

But first, a word from this episode's sponsor, Weights & Biases. Weights & Biases allows you to easily keep track of the input hyperparameters, output metrics, and any insights that you and your team have, with only a handful of lines added to your code. One aspect that's great for speeding up your experiments is Sweeps. Sweeps automate hyperparameter optimization and explore the space of all possible models without any effort on your end. It will simply run all tests, tweaking the parameters and reporting the effect of each parameter in clear graphs and reports you can share with your team to explain your final results easily. I love to do my best trying to make research look simple and clear for you all, and this is a big reason why I love Weights & Biases: they are doing the same thing with their platform, making your research look simple and reproducible. I'd love for you to check them out with the first link below, because they are helping me continue making these videos and growing this channel.
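To give a concrete picture of what a sweep looks like in practice, here is a minimal sketch of the Weights & Biases sweep workflow. The project name, the hyperparameter names and ranges, and the train() function are made up for illustration; the sweep configuration format and the wandb.sweep / wandb.agent calls are the parts that come from the library itself.

```python
import wandb

# Hypothetical search space: parameter names and ranges are placeholders,
# the dictionary structure is the standard W&B sweep configuration format.
sweep_config = {
    "method": "random",                      # random / grid / bayes
    "metric": {"name": "val_loss", "goal": "minimize"},
    "parameters": {
        "learning_rate": {"min": 1e-5, "max": 1e-2},
        "batch_size": {"values": [16, 32, 64]},
    },
}

def train():
    # Each run launched by the agent receives its own hyperparameters in wandb.config.
    with wandb.init() as run:
        cfg = wandb.config
        # ... build and train the model with cfg.learning_rate and cfg.batch_size ...
        wandb.log({"val_loss": 0.42})        # placeholder metric for the sketch

sweep_id = wandb.sweep(sweep_config, project="neroic-experiments")  # hypothetical project name
wandb.agent(sweep_id, function=train, count=20)  # runs 20 trials and logs them for comparison
```

Every trial the agent runs shows up in the same dashboard, which is what makes the "clear graphs and reports" part of the sponsor segment possible without extra bookkeeping code.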
Now, let's see how these researchers tackle the lighting and realism challenges that come with creating a virtual object out of images. The technique builds upon Neural Radiance Fields, which are widely used for reconstruction, with many models such as NeRF that we already covered on the channel. Typically, neural radiance fields need images taken under the same ideal conditions, but this is not what we want here. Their approach starts with NeRF, and as I said, I already covered it on my channel, so I won't cover it again, but feel free to take a break and watch that video to better understand how NeRF works. In short, NeRF is a neural network that is trained to infer the color, opacity, and radiance of each pixel using the images as inputs, and to guess the missing pixels for the small parts of the object that aren't present in the images.
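To make that description a bit more concrete, here is a minimal sketch of the kind of coordinate network a NeRF-style model relies on: a small MLP that takes an encoded 3D position (plus a viewing direction) and returns a volume density and a color, which are then accumulated along camera rays to form pixels. The layer sizes, encoding frequencies, and class names below are simplified assumptions for illustration, not the authors' actual architecture.

```python
import torch
import torch.nn as nn

def positional_encoding(x: torch.Tensor, n_freqs: int = 6) -> torch.Tensor:
    """Map coordinates to sines/cosines of increasing frequency (as in NeRF)."""
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin((2.0 ** i) * x), torch.cos((2.0 ** i) * x)]
    return torch.cat(feats, dim=-1)

class TinyRadianceField(nn.Module):
    """Simplified NeRF-style MLP: 3D point + view direction -> (density, RGB)."""

    def __init__(self, n_freqs: int = 6, hidden: int = 128):
        super().__init__()
        in_dim = 3 + 3 * 2 * n_freqs            # encoded 3D position
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.density_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(         # color also depends on view direction
            nn.Linear(hidden + 3, hidden // 2), nn.ReLU(),
            nn.Linear(hidden // 2, 3), nn.Sigmoid(),
        )
        self.n_freqs = n_freqs

    def forward(self, xyz: torch.Tensor, view_dir: torch.Tensor):
        h = self.backbone(positional_encoding(xyz, self.n_freqs))
        sigma = torch.relu(self.density_head(h))              # density must be >= 0
        rgb = self.color_head(torch.cat([h, view_dir], dim=-1))
        return sigma, rgb

# Querying a batch of 3D samples along camera rays (directions should be unit vectors):
model = TinyRadianceField()
dirs = nn.functional.normalize(torch.randn(1024, 3), dim=-1)
sigma, rgb = model(torch.rand(1024, 3), dirs)
```

As the transcript explains next, this plain field is not enough on its own: NeROIC extends the idea with extra branches so that the static object can be separated from per-image lighting and other varying factors.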
But this approach doesn't work for large missing parts or different lighting conditions, as it can only interpolate from the input images. Here, we need something more that can extrapolate from them and make assumptions about what should appear here and there, or how these pixels should look under this lighting or that one. Many approaches build upon NeRF to fix this, but they always require more inputs from the user, which is not what we want and is hard to obtain in many cases, especially when we want to build a good dataset to train our model on. In short, these models do not really understand the object nor the environment the object is in, so we always come back to the lighting problem.

Here, the goal is to use this architecture on online images, or in other words, images with varying lighting, cameras, environments, and poses, something NeRF can hardly do with realism. The only few things they need, other than the images of the object themselves, are a rough foreground segmentation and an estimation of the camera parameters, which can both be obtained with other available models. The foreground segmentation is basically just a mask that tells you where the object of interest is in your image, like this.

What they did differently is that they separate the rendering of the object from the environment lighting in the input images. They focus on two things, which are done in two stages. First is the object's shape, or its geometry, which is the part most similar to NeRF, here called the geometry network. It takes the input images, the segmentation masks, and the camera parameter estimates we discussed, builds a radiance field, and finds a first guess of the density and color of each pixel as in NeRF, but adapted to the varying lighting conditions in the input images. This difference relies on the two branches you see here, splitting the static content from the varying parameters like camera or shadows. This will allow us to teach our model how to correctly isolate the static content from other unwanted parameters like lighting.

But we are not finished. We then estimate the surface normals from this learned density field, which tells us how the surface is oriented, or in other words, it takes the results we just produced and finds how our object will react to light. It finds unbiased material properties of the object at this stage, or at least an estimation of them, using a 3D convolution with a Sobel kernel. It's basically a filter that we apply in three dimensions to find all edges and how sharp they are, which can look like this on a two-dimensional image and like this on a three-dimensional rendering, giving us essential information about the different textures and shapes of the object.
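As an illustration of that filtering step, here is a minimal sketch of how surface normals can be extracted from a density volume with Sobel filters along each axis: the Sobel gradient indicates in which direction the density changes fastest, and the negated, normalized gradient serves as the surface normal. The grid resolution and the synthetic sphere density are placeholders for the example; this shows the general technique named in the video, not the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import sobel

# Toy density field: a solid sphere on a 64^3 grid (placeholder data).
res = 64
coords = np.linspace(-1.0, 1.0, res)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
density = (np.sqrt(x**2 + y**2 + z**2) < 0.6).astype(np.float32)

# 3D Sobel filtering: one derivative per axis, with smoothing along the other two.
grad = np.stack(
    [sobel(density, axis=0), sobel(density, axis=1), sobel(density, axis=2)],
    axis=-1,
)  # shape (res, res, res, 3)

# Normals point outward, i.e. from high density toward low density,
# so we negate the gradient and normalize it.
norm = np.linalg.norm(grad, axis=-1, keepdims=True)
normals = -grad / np.clip(norm, 1e-8, None)

# The gradient magnitude also indicates how sharp the edge (the surface) is there.
surface_mask = norm[..., 0] > 0.1
print("voxels on the surface:", int(surface_mask.sum()))
```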
The next stage is where they fix the learned geometry and optimize the normals we just produced using the rendering network, which is very similar to the first geometry network. Here again, there are two branches: one for the material and another for the lighting. They use spherical harmonics to represent the lighting model and optimize its coefficients during training, as they explain in the paper with more detail if you are interested. Spherical harmonics are used here as a group of basis functions defined on the surface of a sphere. As Wikipedia puts it, each function defined on the surface of a sphere can be written as a sum of these spherical harmonics. This technique is often used for calculating the lighting on 3D models; it produces highly realistic shading and shadowing with comparatively little overhead. In short, it simply reduces the number of parameters to estimate while keeping the same amount of information. So instead of learning how to render the appropriate lighting for the whole object from scratch, the model learns the correct coefficients to use in the spherical harmonics that estimate the lighting coming off the surface at each pixel, simplifying the problem to a few parameters.

The other branch is trained to improve the surface normals of the object following the same trick, using the standard Phong BRDF, which models the object's material properties with only a few parameters to estimate. Finally, the outputs of the two branches, the final rendering and the lighting, are merged to find the final color of each pixel. This disentanglement of light and materials is why they are able to apply any lighting to the object and have it react realistically. Remember, this is done using only a couple of images from the internet that could all have different lighting conditions. This is amazing!

And voilà! This is how this new paper from Kuang and collaborators at Snapchat created NeROIC, a neural rendering model for objects from online images. I hope you enjoyed this short overview of the paper. All the references are linked below, as well as a link to the official project page and their code. Let me know what you think of the explanation, the technique, and how you would use it in the real world. If you are still here and enjoyed the video, please don't forget to leave a like and subscribe to the channel; it both means a lot and helps a lot. Thank you for watching!