This AI Creates Videos From a Couple of Images by @whatsai

# This AI Creates Videos From a Couple of Images

November 1st, 2021

## Too Long; Didn't Read

Researchers transformed a simple collection of photos into a 3-dimensional model. The best thing is that it didn’t even need a thousand pictures, only a few, and could create the missing information afterward. The results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs.

Believe it or not, what you see is actually not a video.

It was made from a simple collection of photos and transformed into a 3-dimensional model! The best thing is that it didn’t even need a thousand pictures, only a few, and created the missing information afterward!

As you can see, the results are amazing, but they aren’t easy to generate and require a bit more than only the images as inputs. Let’s dive in and see how the researchers achieved this as well as more fantastic examples...

## Video Transcript

00:00

Believe it or not, what you see here is actually not a video. It was made from a simple collection of photos and transformed into a three-dimensional model! The best thing is that it didn't even need a thousand pictures, only a few, and could create the missing information afterward. As you can see, the results are amazing, but they aren't easy to generate and require a bit more than only the images as inputs. Let's rewind a little.

00:27

Imagine you want to generate a 3D model out of a bunch of pictures you took, like these ones. Instead of only using these pictures, you will also need to feed it a point cloud. A point cloud is basically the simplest form of a 3D model. You can see it as a draft version of your 3D model, represented by sparse points in 3D space, and it looks just like this. These points also have the appropriate colors and luminance from the images you took. A point cloud is made using multiple photos, triangulating the corresponding points to understand their position in 3D space. You now have your photos and a point cloud, or, as we said, your 3D draft. You are ready to improve it.
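The triangulation mentioned here can be sketched with the classic linear (DLT) method: each photo's camera matrix turns a pixel observation into two linear constraints on the 3D point. This is a minimal NumPy illustration of the idea, not the full structure-from-motion pipeline used to build real point clouds, and the function name and arguments are made up for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Triangulate one 3D point from two views (linear DLT).

    P1, P2 : (3, 4) camera projection matrices of the two photos
    x1, x2 : (2,) pixel coordinates of the SAME physical point in each photo
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)  # least-squares solution = last right singular vector
    X = vt[-1]
    return X[:3] / X[3]          # de-homogenize
```

Running this over every matched pair of pixels across all photos, and keeping the color each point had in the images, yields the sparse colored point cloud the method starts from.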

01:08

By the way, if you find this interesting, I invite you to subscribe, like the video, and share the knowledge by sending this video to a friend. I'm sure they will love it and be grateful to learn something new because of you. And if you don't, no worries; thank you for watching!

01:22

First, you will take your images and point cloud and send them to the first module: the rasterizer. Remember, the point cloud is basically our initial 3D reconstruction, or our first draft. The rasterizer will produce the first low-quality version of our 3D image. Using the camera parameters from your pictures and the point cloud, it will basically try to fill in the holes in your initial point cloud representation, approximating colors and understanding depth. This is a very challenging task, as it has to understand both the images, which do not cover all the angles, and the sparse point cloud 3D representation. It might not be able to fill in the whole 3D image intelligently due to this lack of information, which is why it looks like this. The still-unknown pixels are replaced by the background, and this is all still very low resolution, containing many artifacts. Since it's far from perfect, this step is done at multiple resolutions to help the next module.
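As a rough illustration of what a point rasterizer does, here is a toy NumPy sketch that splats each colored point onto a single pixel with a z-buffer, leaving background-colored holes wherever no point lands. It is a stand-in for intuition only, not ADOP's differentiable renderer, and all names below are invented for the example.

```python
import numpy as np

def rasterize_points(points, colors, K, R, t, hw, background=0.0):
    """One-pixel-per-point rasterization with a z-buffer.

    points : (N, 3) world-space point cloud
    colors : (N, 3) RGB color per point
    K      : (3, 3) camera intrinsics
    R, t   : world-to-camera rotation (3, 3) and translation (3,)
    hw     : (height, width) of the output image
    """
    h, w = hw
    image = np.full((h, w, 3), background, dtype=np.float32)  # unhit pixels = holes
    zbuf = np.full((h, w), np.inf, dtype=np.float32)

    cam = points @ R.T + t               # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6          # keep only points in front of the camera
    cam, cols = cam[in_front], colors[in_front]

    proj = cam @ K.T                     # pinhole projection
    px = (proj[:, :2] / proj[:, 2:3]).round().astype(int)

    for (x, y), z, c in zip(px, cam[:, 2], cols):
        if 0 <= x < w and 0 <= y < h and z < zbuf[y, x]:
            zbuf[y, x] = z               # nearest point wins the pixel
            image[y, x] = c
    return image
```

The multi-resolution renderings the transcript mentions would come from running the same pass with scaled-down intrinsics and image sizes.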

02:21

02:22

The second module is the neural renderer. This neural renderer is just a U-Net, like we covered numerous times on my channel, made to take an image as input and generate a new version of it as output. It will take the incomplete renderings of various resolutions as images, understand them, and produce a new version of each image in higher definition, filling the holes. This will create high-resolution images for all missing viewpoints of the scene.
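The U-Net's defining hourglass-with-skips structure can be sketched without any learning at all. The toy below uses fixed average pooling and nearest-neighbor upsampling in NumPy where the real renderer uses learned convolutions; it only illustrates how skip connections carry fine detail past the bottleneck, and the names are invented for the sketch.

```python
import numpy as np

def downsample(x):
    """2x average pooling (stand-in for a learned strided convolution)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """2x nearest-neighbor upsampling (stand-in for a learned deconvolution)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_like(image, depth=2):
    """Hourglass pass with skip connections, the defining U-Net pattern.

    Encoder: shrink the image, stashing each resolution as a skip.
    Decoder: grow it back, blending each level with its stored skip,
    so fine detail from the input survives the bottleneck.
    """
    skips = []
    x = image
    for _ in range(depth):           # encoder
        skips.append(x)
        x = downsample(x)
    for skip in reversed(skips):     # decoder with skip connections
        x = 0.5 * (upsample(x) + skip)
    return x
```

In ADOP's setting, the multi-resolution rasterized images would plug naturally into the matching encoder levels, which is presumably why the rasterizer outputs several resolutions in the first place.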

02:49

Of course, when I say "understand them," it means that the two modules are trained together to achieve this. This neural renderer will produce HDR novel images of the rendering, or high dynamic range images, which are basically more realistic, high-resolution images of the 3D scene with better lighting. The HDR results basically look like images of the scene in the real world. This is because the HDR images have a much broader range of brightness than traditional JPEG-encoded images, where the brightness can only be encoded on 8 bits, with a 0 to 255 range, so they won't look great if encoded in a similar format.

03:29

A third and final module, the tone mapper, is introduced to take this broader range and learn an intelligent transformation to better fit the 8-bit encoding. This third module aims to take these HDR novel images and transform them into LDR images, or low dynamic range images, covering the whole scene. Our final outputs, the LDR images, will look much better with traditional image encodings. This module basically learns to mimic a digital camera's physical lens and sensor properties to produce similar outputs from our previous real-world-like images.

04:04

There are basically four steps in this algorithm:

1. Create a point cloud from your images to have a first 3D rendering of the scene.
2. Fill in the missing holes of this first rendering as best as possible using the images and camera information, and do this at various image resolutions.
3. Use these various image resolutions of the 3D rendering in a U-Net to create a high-quality HDR image of this rendering for any viewpoint.
4. Transform the HDR images into LDR images for better visualization.
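As a stand-in for the learned tone mapper, here is a fixed tone-mapping operator (Reinhard compression plus gamma) that performs the same job: squeezing unbounded HDR radiance into an 8-bit LDR image. The real module learns a camera-like response curve from data; the function and its defaults below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def tone_map(hdr, exposure=1.0, gamma=2.2):
    """Map unbounded HDR radiance to an 8-bit LDR image.

    ADOP's tone mapper LEARNS this curve; a fixed Reinhard operator
    plus gamma encoding stands in for it here.
    """
    x = hdr * exposure
    x = x / (1.0 + x)              # Reinhard: compress [0, inf) into [0, 1)
    x = np.power(x, 1.0 / gamma)   # gamma encoding, like a camera or JPEG
    return np.clip(np.round(x * 255.0), 0, 255).astype(np.uint8)  # 8-bit quantize
```

Note how any radiance, however bright, maps below 255 instead of clipping to white; that is the "intelligent transformation to fit the 8-bit encoding" the transcript describes, except here it is hand-picked rather than learned.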

04:36

And voila! We have the amazing-looking video of the scene we saw at the beginning of the video. As I mentioned, there are some limitations, one of which is the fact that the results are highly dependent on the quality of the point cloud given, for obvious reasons. Also, if the camera is very close to an object, or the point cloud is too sparse, it may cause holes like this one in the final rendering. Still, the results are pretty incredible considering the complexity of the task.

05:03

We've made immense progress in the past year. You can take a look at the videos I made covering other neural rendering techniques less than a year ago and compare the quality of the results; it's pretty crazy. Of course, this is just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical detail about their implementation, and check their GitHub repository with pre-trained models; both are linked in the description below. Thank you very much for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos, whether you saw any improvements recently or not, and I will see you next week!

## References

►Rückert, D., Franke, L. and Stamminger, M., 2021. ADOP: Approximate Differentiable One-Pixel Point Rendering, https://arxiv.org/pdf/2110.06635.pdf.