This AI Creates Videos From a Couple of Images by @whatsai

# This AI Creates Videos From a Couple of Images

November 1st, 2021

## Too Long; Didn't Read

Researchers transformed a simple collection of photos into a 3-dimensional model. The best thing is that it didn’t even need a thousand pictures, only a few, and could create the missing information afterward. The results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs.

Believe it or not, what you see is actually not a video.

It was made from a simple collection of photos and transformed into a 3-dimensional model! The best thing is that it didn’t even need a thousand pictures, only a few, and created the missing information afterward!

As you can see, the results are amazing, but they aren’t easy to generate and require a bit more than only the images as inputs. Let’s dive in and see how the researchers achieved this as well as more fantastic examples...

## Video Transcript

00:00

Believe it or not, what you see here is actually not a video. It was made from a simple collection of photos and transformed into a three-dimensional model! The best thing is that it didn't even need a thousand pictures, only a few, and could create the missing information afterward. As you can see, the results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs. Let's rewind a little.

00:27

Imagine you want to generate a 3D model out of a bunch of pictures you took, like these ones. Instead of only using these pictures, you will also need to feed it a point cloud. A point cloud is basically the simplest form of a 3D model. You can see it as a draft version of your 3D model, represented by sparse points in 3D space, and it looks just like this. These points also have the appropriate colors and luminance from the images you took. A point cloud is made using multiple photos, triangulating the corresponding points to understand their position in 3D space. You now have your photos and a point cloud, or, as we said, your 3D draft. You are ready to improve it.
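The triangulation mentioned here can be sketched with the classic linear (DLT) method: each photo's camera matrix turns a pixel observation into two linear constraints on the 3D point. This is a minimal NumPy illustration of the idea, not the full structure-from-motion pipeline used to build real point clouds, and the function name and arguments are made up for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3D point from two views (linear DLT).

    P1, P2 : (3, 4) camera projection matrices of the two photos
    x1, x2 : (2,) pixel coordinates of the SAME physical point in each photo
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)  # least-squares solution = last right singular vector
    X = vt[-1]
    return X[:3] / X[3]          # de-homogenize
```

Running this over every matched pair of pixels across all photos, and keeping the color each point had in the images, yields the sparse colored point cloud the method starts from.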

01:08

By the way, if you find this interesting, I invite you to subscribe, like the video, and share the knowledge by sending this video to a friend. I'm sure they will love it and be grateful to learn something new because of you. And if you don't, no worries; thank you for watching!

01:22

First, you will take your images and point cloud and send them to the first module: the rasterizer. Remember, the point cloud is basically our initial 3D reconstruction, or our first draft. The rasterizer will produce the first low-quality version of our 3D image. Using the camera parameters from your pictures and the point cloud, it will basically try to fill in the holes in your initial point cloud representation, approximating colors and understanding depth. This is a very challenging task, as it has to understand both the images, which do not cover all the angles, and the sparse point cloud 3D representation. It might not be able to fill in the whole 3D image intelligently due to this lack of information, which is why it looks like this. The still-unknown pixels are replaced by the background, and this is all still very low resolution, containing many artifacts. Since it's far from perfect, this step is done at multiple resolutions to help the next module.
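As a rough illustration of what a point rasterizer does, here is a toy NumPy sketch that splats each colored point onto a single pixel with a z-buffer, leaving background-colored holes wherever no point lands. It is a stand-in for intuition only, not ADOP's differentiable renderer, and all names below are invented for the example.

```python
import numpy as np

def rasterize_points(points, colors, K, R, t, hw, background=0.0):
    """One-pixel-per-point rasterization with a z-buffer.

    points : (N, 3) world-space point cloud
    colors : (N, 3) RGB color per point
    K      : (3, 3) camera intrinsics
    R, t   : world-to-camera rotation (3, 3) and translation (3,)
    hw     : (height, width) of the output image
    """
    h, w = hw
    image = np.full((h, w, 3), background, dtype=np.float32)  # unhit pixels = holes
    zbuf = np.full((h, w), np.inf, dtype=np.float32)

    cam = points @ R.T + t               # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6          # keep only points in front of the camera
    cam, cols = cam[in_front], colors[in_front]

    proj = cam @ K.T                     # pinhole projection
    px = (proj[:, :2] / proj[:, 2:3]).round().astype(int)

    for (x, y), z, c in zip(px, cam[:, 2], cols):
        if 0 <= x < w and 0 <= y < h and z < zbuf[y, x]:
            zbuf[y, x] = z               # nearest point wins the pixel
            image[y, x] = c
    return image
```

The multi-resolution renderings the transcript mentions would come from running the same pass with scaled-down intrinsics and image sizes.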

02:21

02:22

The second module is the neural renderer. This neural renderer is just a U-Net, like we covered numerous times on my channel, made to take an image as input and generate a new version of it as output. It will take the incomplete renderings of various resolutions as images, understand them, and produce a new version of each image in higher definition, filling the holes. This will create high-resolution images for all missing viewpoints of the scene.
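The U-Net's defining hourglass-with-skips structure can be sketched without any learning at all. The toy below uses fixed average pooling and nearest-neighbor upsampling in NumPy where the real renderer uses learned convolutions; it only illustrates how skip connections carry fine detail past the bottleneck, and the names are invented for the sketch.

```python
import numpy as np

def downsample(x):
    """2x average pooling (stand-in for a learned strided convolution)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (stand-in for a learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like(image, depth=2):
    """Hourglass pass with skip connections, the defining U-Net pattern.

    Encoder: shrink the image, stashing each resolution as a skip.
    Decoder: grow it back, blending each level with its stored skip,
    so fine detail from the input survives the bottleneck.
    """
    skips = []
    x = image
    for _ in range(depth):           # encoder
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):     # decoder with skip connections
        x = 0.5 * (upsample(x) + skip)
    return x
```

In ADOP's setting, the multi-resolution rasterized images would plug naturally into the matching encoder levels, which is presumably why the rasterizer outputs several resolutions in the first place.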

02:49

Of course, when I say "understand them," it means that the two modules are trained together to achieve this. This neural renderer will produce HDR novel images of the rendering, or high dynamic range images, which are basically more realistic, high-resolution images of the 3D scene with better lighting. The HDR results basically look like images of the scene in the real world. This is because the HDR images have a much broader range of brightness than traditional JPEG-encoded images, where the brightness can only be encoded on 8 bits, with a 0 to 255 range, so they won't look great if encoded in a similar format.

03:29

A third and final module, the tone mapper, is introduced to take this broader range and learn an intelligent transformation to better fit the 8-bit encoding. This third module aims to take these HDR novel images and transform them into LDR images, or low dynamic range images, covering the whole scene. Our final outputs, the LDR images, will look much better with traditional image encodings. This module basically learns to mimic a digital camera's physical lens and sensor properties to produce similar outputs from our previous real-world-like images.

04:04

There are basically four steps in this algorithm:

1. Create a point cloud from your images to have a first 3D rendering of the scene.
2. Fill in the missing holes of this first rendering as best as possible using the images and camera information, and do this at various image resolutions.
3. Use these various image resolutions of the 3D rendering in a U-Net to create a high-quality HDR image of this rendering for any viewpoint.
4. Transform the HDR images into LDR images for better visualization.
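As a stand-in for the learned tone mapper, here is a fixed tone-mapping operator (Reinhard compression plus gamma) that performs the same job: squeezing unbounded HDR radiance into an 8-bit LDR image. The real module learns a camera-like response curve from data; the function and its defaults below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def tone_map(hdr, exposure=1.0, gamma=2.2):
    """Map unbounded HDR radiance to an 8-bit LDR image.

    ADOP's tone mapper LEARNS this curve; a fixed Reinhard operator
    plus gamma encoding stands in for it here.
    """
    x = hdr * exposure
    x = x / (1.0 + x)              # Reinhard: compress [0, inf) into [0, 1)
    x = np.power(x, 1.0 / gamma)   # gamma encoding, like a camera or JPEG
    return np.clip(np.round(x * 255.0), 0, 255).astype(np.uint8)  # 8-bit quantize
```

Note how any radiance, however bright, maps below 255 instead of clipping to white; that is the "intelligent transformation to fit the 8-bit encoding" the transcript describes, except here it is hand-picked rather than learned.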

04:36

And voila! We have the amazing-looking video of the scene we saw at the beginning of the video. As I mentioned, there are some limitations, one of which is the fact that the results are highly dependent on the quality of the point cloud given, for obvious reasons. Also, if the camera is very close to an object, or the point cloud is too sparse, it may cause holes like this one in the final rendering. Still, the results are pretty incredible considering the complexity of the task.

05:03

We've made immense progress in the past year. You can take a look at the videos I made covering other neural rendering techniques less than a year ago and compare the quality of the results; it's pretty crazy. Of course, this is just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical detail about their implementation, and check their GitHub repository with pre-trained models; both are linked in the description below. Thank you very much for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos, whether you saw any improvements recently or not, and I will see you next week!

## References

►Rückert, D., Franke, L. and Stamminger, M., 2021. ADOP: Approximate Differentiable One-Pixel Point Rendering, https://arxiv.org/pdf/2110.06635.pdf.