Believe it or not, what you see is actually not a video.
It was made from a simple collection of photos and transformed into a 3-dimensional model! The best thing is that it didn’t even need a thousand pictures, only a few, and it creates the missing information afterward!
As you can see, the results are amazing, but they aren’t easy to generate and require a bit more than only the images as inputs. Let’s dive in and see how the researchers achieved this as well as more fantastic examples...
Believe it or not, what you see here is actually not a video. It was made from a simple collection of photos and transformed into a three-dimensional model. The best thing is that it didn't even need a thousand pictures, only a few, and could create the missing information afterward. As you can see, the results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs. Let's rewind a little.
Imagine you want to generate a 3D model out of a bunch of pictures you took, like these ones. Instead of only using these pictures, you will also need to feed it a point cloud. A point cloud is basically the simplest form of a 3D model. You can see it as a draft version of your 3D model, represented by sparse points in 3D space, that looks just like this. These points also have the appropriate colors and luminance from the images you took. A point cloud is made using multiple photos, triangulating the corresponding points to understand their positions in 3D space. You now have your photos and a point cloud, or, as we said, your 3D draft. You are ready to improve it.
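To give a rough idea of how that triangulation works, here is a minimal sketch using OpenCV's triangulatePoints function. In a real pipeline, tools like COLMAP handle the feature matching and calibration across many photos automatically; the two camera matrices and matched pixel coordinates below are made-up placeholders.

```python
import numpy as np
import cv2

# Two 3x4 projection matrices (intrinsics times [R|t]), assumed already known
# from camera calibration. Placeholder values: identical cameras, one shifted.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Normalized pixel coordinates of the same physical points seen in both photos (2 x N)
pts1 = np.array([[0.10, 0.25], [0.12, 0.24]]).T
pts2 = np.array([[0.08, 0.25], [0.10, 0.24]]).T

# Triangulate: returns homogeneous 4 x N points in 3D space
pts_h = cv2.triangulatePoints(P1, P2, pts1, pts2)
pts_3d = (pts_h[:3] / pts_h[3]).T  # ordinary XYZ coordinates, one row per point

# In a full pipeline, each 3D point would also store the color and luminance
# sampled from the photos it was triangulated from.
print(pts_3d)
```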
By the way, if you find this interesting, I invite you to subscribe, like the video, and share the knowledge by sending this video to a friend. I'm sure they will love it and be grateful to learn something new because of you. And if you don't, no worries, and thank you for watching!
First, you will take your images and point cloud and send them to the first module: the rasterizer. Remember, the point cloud is basically our initial 3D reconstruction, or our first draft. The rasterizer will produce the first low-quality version of our 3D image using the camera parameters from your pictures and the point cloud. It will basically try to fill in the holes in your initial point cloud representation, approximating colors and understanding depth. This is a very challenging task, as it has to understand both the images, which do not cover all the angles, and the sparse point cloud 3D representation. It might not be able to fill in the whole 3D image intelligently due to this lack of information, which is why it looks like this: the still-unknown pixels are replaced by the background, and everything is still very low resolution, containing many artifacts. Since it's far from perfect, this step is done at multiple resolutions to give the next module more information.
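This is not the paper's differentiable CUDA rasterizer, but a plain sketch of the underlying idea can help: project every colored point into the target camera, keep only the closest point when several land on the same pixel, and leave unreached pixels as background. All names and values here are illustrative.

```python
import numpy as np

def rasterize_points(xyz, rgb, K, R, t, h, w):
    """Render colored 3D points into an h x w image, one pixel per point.
    Pixels that no point reaches keep the background value (zeros): the 'holes'."""
    img = np.zeros((h, w, 3), dtype=np.float32)
    depth = np.full((h, w), np.inf, dtype=np.float32)

    cam = (R @ xyz.T + t.reshape(3, 1)).T           # world -> camera coordinates
    z = np.maximum(cam[:, 2], 1e-9)                  # avoid division by zero
    proj = (K @ cam.T).T                             # pinhole projection
    u = (proj[:, 0] / z).astype(int)
    v = (proj[:, 1] / z).astype(int)

    valid = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for i in np.flatnonzero(valid):
        if z[i] < depth[v[i], u[i]]:                 # keep only the closest point (z-buffer)
            depth[v[i], u[i]] = z[i]
            img[v[i], u[i]] = rgb[i]
    return img

# Repeating this at full, half, and quarter resolution (scaling K accordingly)
# produces the multi-resolution renderings handed to the next module.
```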
The second module is the neural renderer. This neural renderer is just a U-Net, like we covered numerous times on my channel, that takes an image as input and generates a new version of it as output. It will take the incomplete renderings at various resolutions as images, understand them, and produce a new version of each image in higher definition, filling the holes. This will create high-resolution images for all missing viewpoints of the scene. Of course, when I say "understand them", it means that the two modules are trained together to achieve this.
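As a reminder of what a U-Net looks like, here is a deliberately tiny PyTorch version: an encoder that shrinks the image, a decoder that grows it back, and a skip connection so fine details survive. The real ADOP renderer is larger, takes the multi-resolution renderings as input, and outputs HDR values, so treat this only as an illustration of the architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal U-Net-style network: downsample, upsample, and a skip connection."""
    def __init__(self, in_ch=3, out_ch=3):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 2, stride=2), nn.ReLU())
        self.out = nn.Conv2d(64, out_ch, 3, padding=1)   # 64 = 32 decoded + 32 skipped

    def forward(self, x):
        e1 = self.enc1(x)                                # full-resolution features
        e2 = self.enc2(e1)                               # half-resolution features
        d1 = self.dec1(e2)                               # back to full resolution
        return self.out(torch.cat([d1, e1], dim=1))      # skip connection

# One of the rasterizer's incomplete renderings goes in, a filled-in image comes out
# (after the network has been trained together with the rest of the pipeline).
sparse_render = torch.rand(1, 3, 128, 128)
filled = TinyUNet()(sparse_render)
```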
This neural renderer will produce HDR novel images of the rendering, or high dynamic range images, which are basically more realistic, high-resolution images of the 3D scene with better lighting. The HDR results basically look like images of the scene in the real world. This is because the HDR images have a much broader range of brightness than traditional JPEG-encoded images, where the brightness can only be encoded on 8 bits, with a 255:1 range, so they won't look great if encoded in a similar format.
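A toy numeric example makes the problem concrete. The values and the fixed Reinhard-style curve below are only for illustration; they are not the paper's learned mapping.

```python
import numpy as np

# Linear HDR radiance values: a dim wall, a lit face, and a bright window.
hdr = np.array([0.02, 0.5, 12.0])

# Naive 8-bit encoding: scale to 0..255 and clip. The window saturates at 255
# while the dim wall lands at barely 5, so most of the detail is lost.
naive = np.clip(hdr * 255, 0, 255).round().astype(np.uint8)       # [5, 128, 255]

# A simple fixed tone curve, x / (1 + x), compresses bright values so all three
# stay distinguishable within 8 bits.
tone_mapped = (hdr / (1 + hdr) * 255).round().astype(np.uint8)    # approx. [5, 85, 235]
print(naive, tone_mapped)
```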
A third and final module, the tone mapper, is introduced to take this broader range and learn an intelligent transformation to fit the 8-bit encoding better. This third module aims to take these HDR novel images and transform them into LDR images covering the whole scene. Our final outputs, the LDR images, or low dynamic range images, will look much better with traditional image encodings. This module basically learns to mimic a digital camera's physical lens and sensor properties to produce similar outputs from our previous real-world-like images.
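Below is a minimal sketch of what a learnable, differentiable tone mapper can look like: just a learned exposure and a gamma-like response curve. The actual ADOP module models more of the camera's lens and sensor behavior, so this stand-in is only meant to show why the mapping can be trained together with the other modules.

```python
import torch
import torch.nn as nn

class SimpleToneMapper(nn.Module):
    """Toy differentiable tone mapper: a learned exposure plus a gamma-like
    response curve. Because it is differentiable, it can be trained jointly
    with the rasterizer and the neural renderer against ordinary 8-bit photos."""
    def __init__(self):
        super().__init__()
        self.log_exposure = nn.Parameter(torch.zeros(1))       # learned exposure
        self.gamma = nn.Parameter(torch.tensor(1.0 / 2.2))     # learned response shape

    def forward(self, hdr):
        exposed = hdr * torch.exp(self.log_exposure)            # apply exposure
        ldr = exposed.clamp(min=1e-6) ** self.gamma             # camera-like response
        return ldr.clamp(0.0, 1.0)                               # ready for 8-bit output

ldr = SimpleToneMapper()(torch.rand(1, 3, 128, 128) * 10.0)      # HDR in, LDR out
```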
There are basically four steps in this algorithm:
1. Create a point cloud from your images to have a first 3D rendering of the scene.
2. Fill in the missing holes of this first rendering as best as possible using the images and camera information, and do this at various image resolutions.
3. Use these various image resolutions of the 3D rendering in a U-Net to create a high-quality HDR image of this rendering for any viewpoint.
4. Transform the HDR images into LDR images for better visualization.
And voilà, we have the amazing-looking video of the scene we saw at the beginning of the video.
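To see how these pieces fit together, here is a bit of hypothetical glue code; the function and argument names are illustrative placeholders for the modules described above, not the actual ADOP API.

```python
def render_novel_view(point_cloud, camera, rasterize_at_scale, neural_renderer, tone_mapper):
    """Hypothetical glue code for the four steps; the callables stand in for the
    modules sketched earlier."""
    # Steps 1-2: rasterize the colored point cloud for this viewpoint at several
    # resolutions (full, half, quarter) to get the incomplete renderings.
    sparse_renders = [rasterize_at_scale(point_cloud, camera, s) for s in (1, 2, 4)]
    # Step 3: the U-Net-style neural renderer fills the holes and outputs an HDR image.
    hdr_image = neural_renderer(sparse_renders)
    # Step 4: the learned tone mapper turns the HDR image into a displayable LDR image.
    return tone_mapper(hdr_image)
```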
As I mentioned, there are some limitations, one of which is that the results are highly dependent on the quality of the point cloud given, for obvious reasons. Also, if the camera is very close to an object or the point cloud is too sparse, it may cause holes like this one in the final rendering. Still, the results are pretty incredible considering the complexity of the task. We've made immense progress in the past year. You can take a look at the videos I made covering other neural rendering techniques less than a year ago and compare the quality of the results; it's pretty crazy.

Of course, this is just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical details about their implementation and check their GitHub repository with pre-trained models; both are linked in the description below. Thank you very much for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos, whether you saw any improvements recently or not, and I will see you next week!
►Read the full article: https://www.louisbouchard.ai/ai-synthesizes-smooth-videos-from-a-couple-of-images/
►Rückert, D., Franke, L. and Stamminger, M., 2021. ADOP: Approximate Differentiable One-Pixel Point Rendering, https://arxiv.org/pdf/2110.06635.pdf.
►Code: https://github.com/darglein/ADOP.
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/.