This AI Creates Videos From a Couple of Images by @whatsai

# This AI Creates Videos From a Couple of Images

November 1st, 2021

## Too Long; Didn't Read

Researchers transformed a simple collection of photos into a 3-dimensional model. The best thing is that it didn’t even need a thousand pictures, only a few, and could create the missing information afterward. The results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs.

Believe it or not, what you see is actually not a video.

It was made from a simple collection of photos and transformed into a 3-dimensional model! The best thing is that it didn’t even need a thousand pictures, only a few, and created the missing information afterward!

As you can see, the results are amazing, but they aren’t easy to generate and require a bit more than only the images as inputs. Let’s dive in and see how the researchers achieved this as well as more fantastic examples...

## Video Transcript

00:00

Believe it or not, what you see here is actually not a video. It was made from a simple collection of photos and transformed into a three-dimensional model! The best thing is that it didn't even need a thousand pictures, only a few, and could create the missing information afterward. As you can see, the results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs. Let's rewind a little.

00:27

Imagine you want to generate a 3D model out of a bunch of pictures you took, like these ones. Instead of only using these pictures, you will also need to feed it a point cloud. A point cloud is basically the simplest form of a 3D model. You can see it as a draft version of your 3D model, represented by sparse points in 3D space, and it looks just like this. These points also have the appropriate colors and luminance from the images you took. A point cloud is made using multiple photos, triangulating the corresponding points to understand their position in 3D space. You now have your photos and a point cloud, or, as we said, your 3D draft. You are ready to improve it.
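The triangulation mentioned here can be sketched with the classic linear (DLT) method: each photo's camera matrix turns a pixel observation into two linear constraints on the 3D point. This is a minimal NumPy illustration of the idea, not the full structure-from-motion pipeline used to build real point clouds, and the function name and arguments are made up for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3D point from two views (linear DLT).

    P1, P2 : (3, 4) camera projection matrices of the two photos
    x1, x2 : (2,) pixel coordinates of the SAME physical point in each photo
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)  # least-squares solution = last right singular vector
    X = vt[-1]
    return X[:3] / X[3]          # de-homogenize
```

Running this over every matched pair of pixels across all photos, and keeping the color each point had in the images, yields the sparse colored point cloud the method starts from.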

01:08

By the way, if you find this interesting, I invite you to subscribe, like the video, and share the knowledge by sending this video to a friend. I'm sure they will love it and be grateful to learn something new because of you. And if you don't, no worries; thank you for watching!

01:22

First, you will take your images and point cloud and send them to the first module: the rasterizer. Remember, the point cloud is basically our initial 3D reconstruction, or our first draft. The rasterizer will produce the first low-quality version of our 3D image. Using the camera parameters from your pictures and the point cloud, it will basically try to fill in the holes in your initial point cloud representation, approximating colors and understanding depth. This is a very challenging task, as it has to understand both the images, which do not cover all the angles, and the sparse point cloud 3D representation. It might not be able to fill in the whole 3D image intelligently due to this lack of information, which is why it looks like this. The still-unknown pixels are replaced by the background, and this is all still very low resolution, containing many artifacts. Since it's far from perfect, this step is done at multiple resolutions to help the next module.
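As a rough illustration of what a point rasterizer does, here is a toy NumPy sketch that splats each colored point onto a single pixel with a z-buffer, leaving background-colored holes wherever no point lands. It is a stand-in for intuition only, not ADOP's differentiable renderer, and all names below are invented for the example.

```python
import numpy as np

def rasterize_points(points, colors, K, R, t, hw, background=0.0):
    """One-pixel-per-point rasterization with a z-buffer.

    points : (N, 3) world-space point cloud
    colors : (N, 3) RGB color per point
    K      : (3, 3) camera intrinsics
    R, t   : world-to-camera rotation (3, 3) and translation (3,)
    hw     : (height, width) of the output image
    """
    h, w = hw
    image = np.full((h, w, 3), background, dtype=np.float32)  # unhit pixels = holes
    zbuf = np.full((h, w), np.inf, dtype=np.float32)

    cam = points @ R.T + t               # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6          # keep only points in front of the camera
    cam, cols = cam[in_front], colors[in_front]

    proj = cam @ K.T                     # pinhole projection
    px = (proj[:, :2] / proj[:, 2:3]).round().astype(int)

    for (x, y), z, c in zip(px, cam[:, 2], cols):
        if 0 <= x < w and 0 <= y < h and z < zbuf[y, x]:
            zbuf[y, x] = z               # nearest point wins the pixel
            image[y, x] = c
    return image
```

The multi-resolution renderings the transcript mentions would come from running the same pass with scaled-down intrinsics and image sizes.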

02:21

02:22

The second module is the neural renderer. This neural renderer is just a U-Net, like we covered numerous times on my channel, made to take an image as input and generate a new version of it as output. It will take the incomplete renderings of various resolutions as images, understand them, and produce a new version of each image in higher definition, filling the holes. This will create high-resolution images for all missing viewpoints of the scene.
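The U-Net's defining hourglass-with-skips structure can be sketched without any learning at all. The toy below uses fixed average pooling and nearest-neighbor upsampling in NumPy where the real renderer uses learned convolutions; it only illustrates how skip connections carry fine detail past the bottleneck, and the names are invented for the sketch.

```python
import numpy as np

def downsample(x):
    """2x average pooling (stand-in for a learned strided convolution)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (stand-in for a learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like(image, depth=2):
    """Hourglass pass with skip connections, the defining U-Net pattern.

    Encoder: shrink the image, stashing each resolution as a skip.
    Decoder: grow it back, blending each level with its stored skip,
    so fine detail from the input survives the bottleneck.
    """
    skips = []
    x = image
    for _ in range(depth):           # encoder
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):     # decoder with skip connections
        x = 0.5 * (upsample(x) + skip)
    return x
```

In ADOP's setting, the multi-resolution rasterized images would plug naturally into the matching encoder levels, which is presumably why the rasterizer outputs several resolutions in the first place.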

02:49

Of course, when I say "understand them," it means that the two modules are trained together to achieve this. This neural renderer will produce HDR novel images of the rendering, or high dynamic range images, which are basically more realistic, high-resolution images of the 3D scene with better lighting. The HDR results basically look like images of the scene in the real world. This is because the HDR images have a much broader range of brightness than traditional JPEG-encoded images, where the brightness can only be encoded on 8 bits, with a 0 to 255 range, so they won't look great if encoded in a similar format.

03:29

A third and final module, the tone mapper, is introduced to take this broader range and learn an intelligent transformation to better fit the 8-bit encoding. This third module aims to take these HDR novel images and transform them into LDR images, or low dynamic range images, covering the whole scene. Our final outputs, the LDR images, will look much better with traditional image encodings. This module basically learns to mimic a digital camera's physical lens and sensor properties to produce similar outputs from our previous real-world-like images.

04:04

There are basically four steps in this algorithm:

1. Create a point cloud from your images to have a first 3D rendering of the scene.
2. Fill in the missing holes of this first rendering as best as possible using the images and camera information, and do this at various image resolutions.
3. Use these various image resolutions of the 3D rendering in a U-Net to create a high-quality HDR image of this rendering for any viewpoint.
4. Transform the HDR images into LDR images for better visualization.
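As a stand-in for the learned tone mapper, here is a fixed tone-mapping operator (Reinhard compression plus gamma) that performs the same job: squeezing unbounded HDR radiance into an 8-bit LDR image. The real module learns a camera-like response curve from data; the function and its defaults below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def tone_map(hdr, exposure=1.0, gamma=2.2):
    """Map unbounded HDR radiance to an 8-bit LDR image.

    ADOP's tone mapper LEARNS this curve; a fixed Reinhard operator
    plus gamma encoding stands in for it here.
    """
    x = hdr * exposure
    x = x / (1.0 + x)              # Reinhard: compress [0, inf) into [0, 1)
    x = np.power(x, 1.0 / gamma)   # gamma encoding, like a camera or JPEG
    return np.clip(np.round(x * 255.0), 0, 255).astype(np.uint8)  # 8-bit quantize
```

Note how any radiance, however bright, maps below 255 instead of clipping to white; that is the "intelligent transformation to fit the 8-bit encoding" the transcript describes, except here it is hand-picked rather than learned.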

04:36

And voila! We have the amazing-looking video of the scene we saw at the beginning of the video. As I mentioned, there are some limitations, one of which is the fact that the results are highly dependent on the quality of the point cloud given, for obvious reasons. Also, if the camera is very close to an object, or the point cloud is too sparse, it may cause holes like this one in the final rendering. Still, the results are pretty incredible considering the complexity of the task.

05:03

We've made immense progress in the past year. You can take a look at the videos I made covering other neural rendering techniques less than a year ago and compare the quality of the results; it's pretty crazy. Of course, this is just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical detail about their implementation, and check their GitHub repository with pre-trained models; both are linked in the description below. Thank you very much for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos, whether you saw any improvements recently or not, and I will see you next week!

## References

►Rückert, D., Franke, L. and Stamminger, M., 2021. ADOP: Approximate Differentiable One-Pixel Point Rendering, https://arxiv.org/pdf/2110.06635.pdf.