
Infinite Nature: Fly Into a 2D Image and Explore it as a Drone


Louis Bouchard (@whatsai)

I explain Artificial Intelligence terms and news to non-experts.

The next step for view synthesis: Perpetual View Generation, where the goal is to take a single image, fly into it, and explore the landscape!


References

Read the full article: https://www.louisbouchard.me/infinite-nature/
Paper: Liu, A., Tucker, R., Jampani, V., Makadia, A., Snavely, N. and Kanazawa, A., 2020. Infinite Nature: Perpetual View Generation of Natural Scenes from a Single Image. https://arxiv.org/pdf/2012.09855.pdf

Project link: https://infinite-nature.github.io/

Code: https://github.com/google-research/google-research/tree/master/infinite_nature

Colab demo: https://colab.research.google.com/github/google-research/google-research/blob/master/infinite_nature/infinite_nature_demo.ipynb#scrollTo=sCuRX1liUEVM

Video Transcript

This week's paper is about a new task called "Perpetual View Generation," where the goal is to take a single image, fly into it, and explore the landscape. This is the first solution for this problem, but it is extremely impressive considering that we only feed one image into the network, and it can generate what it would look like to fly into it like a bird.

Of course, this task is extremely complex and will improve over time. As Two Minute Papers would say, imagine how useful this technology could be for video games or flight simulators just a couple of papers down the line! I'm amazed to see how well it already works, even though this is the paper introducing the task, and especially considering how complex the task is. Not only does it have to generate new viewpoints, as GANverse3D (which I covered in a previous video) does, but it also has to generate a new image at each frame, and once you pass a couple of dozen frames, you have close to nothing left from the original image to work with. And yes, this can be done over hundreds of frames while still looking a lot better than current view synthesis approaches.

Let's see how they can generate an entire bird's-eye-view video in the desired direction from a single picture, and how you can try it yourself right now without having to set up anything!

To do that, they have to use the geometry of the image, so they first need to produce a disparity map of it. This is done using a state-of-the-art network called MiDaS, which I won't go into here, but this is the output it gives. The disparity map is basically an inverse depth map, informing the network of the depths inside the scene.
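To make this step concrete, here is a minimal sketch of how one could obtain such a disparity map with MiDaS through PyTorch Hub. It mirrors the idea described above rather than the exact preprocessing in the released Infinite Nature code, and the image path is just a placeholder.

```python
# Minimal sketch: estimating a disparity (inverse depth) map with MiDaS
# via PyTorch Hub. Not the exact preprocessing used by Infinite Nature.
import torch
import cv2

# Load a small MiDaS model and its matching input transforms.
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
midas.eval()

img = cv2.cvtColor(cv2.imread("landscape.jpg"), cv2.COLOR_BGR2RGB)  # placeholder path
batch = midas_transforms.small_transform(img)  # resize + normalize to model input

with torch.no_grad():
    prediction = midas(batch)                  # raw disparity, one value per pixel
    disparity = torch.nn.functional.interpolate(
        prediction.unsqueeze(1),
        size=img.shape[:2],
        mode="bicubic",
        align_corners=False,
    ).squeeze()

# Larger disparity = closer surface; depth is proportional to 1 / disparity.
```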

Then we enter the real first step of their technique: the renderer. The goal of this renderer is to generate a new view based on the old view. The new view will be the next frame and, as you understood, the old view is the input image. This is done using a differentiable renderer. Differentiable simply means we can train it with backpropagation, just as we traditionally do with conventional deep networks.

This renderer takes the image and the disparity map to produce a three-dimensional mesh representing the scene. Then, we simply use this 3D mesh to generate an image from a novel viewpoint, P1 in this case. This gives us an amazing new picture that looks just a bit zoomed, but it is not simply zoomed in. There are some pink marks on the rendered image and black marks on the disparity map, as you can see. They correspond to occluded regions and regions outside the field of view of the previous image used as input to the renderer, since the renderer only re-projects the existing view and is unable to invent unseen details.
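To see why those holes appear, here is an illustrative NumPy sketch of the underlying geometry: lift every pixel to 3D using its disparity and the camera intrinsics, move the camera, and re-project. This is not the paper's differentiable mesh renderer; names like `K` and `relative_pose` are generic placeholders, and the point is only that pixels with no source in the old view come out empty.

```python
# Illustrative sketch (NumPy) of the geometry behind the render step.
# NOT the paper's differentiable mesh renderer; it only shows why
# occluded / out-of-view regions come out as holes.
import numpy as np

def reproject(rgb, disparity, K, relative_pose, eps=1e-6):
    """rgb: (H, W, 3), disparity: (H, W), K: (3, 3) intrinsics,
    relative_pose: (4, 4) transform from the old camera to the new one."""
    H, W, _ = rgb.shape
    ys, xs = np.mgrid[0:H, 0:W]
    depth = 1.0 / np.maximum(disparity, eps)           # disparity is inverse depth

    # Back-project pixels into 3D camera coordinates.
    pixels = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3)
    points = (np.linalg.inv(K) @ pixels.T) * depth.reshape(1, -1)

    # Move the points into the new camera's frame and project them.
    points_h = np.vstack([points, np.ones((1, points.shape[1]))])
    projected = K @ (relative_pose @ points_h)[:3]
    z = projected[2]
    u = np.round(projected[0] / np.maximum(z, eps)).astype(int)
    v = np.round(projected[1] / np.maximum(z, eps)).astype(int)

    # Splat colours into the new view; untouched pixels stay as "holes".
    new_rgb = np.zeros_like(rgb)
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (z > 0)
    new_rgb[v[valid], u[valid]] = rgb.reshape(-1, 3)[valid]
    return new_rgb
```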

This leads us to quite a problem: how can we have a complete and realistic image if we do not know what goes in those regions? Well, we can use another network that also takes this new disparity map and image as input to 'refine' them. This other network, called SPADE, is also a state-of-the-art network, but for conditional image synthesis. It is a conditional image synthesis network because we need to give our network some conditions, which in this case are the pink and black missing parts. We basically send this faulty image to the second network to fill in the holes and add the necessary details.

You can see this SPADE network as a GAN architecture where the image is first encoded into a latent code that gives us the style of the image. Then, this code is decoded to generate a new version of the initial image, filling the missing parts with new information that follows the style present in the encoded information.
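As a rough illustration of the SPADE idea, here is a toy PyTorch block: features are normalized without learned affine parameters, then re-modulated with per-pixel gamma and beta predicted from the condition, which here would be the incomplete RGB plus disparity frame. This is a minimal sketch under those assumptions, not the official SPADE implementation nor the exact refinement network used in the paper.

```python
# Toy SPADE-style block: normalize features, then re-modulate them with
# spatial gamma/beta predicted from the condition (incomplete RGB-D frame).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPADEBlock(nn.Module):
    def __init__(self, feature_channels, condition_channels, hidden=64):
        super().__init__()
        # No learned affine in the norm: the affine part comes from the condition.
        self.norm = nn.BatchNorm2d(feature_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv2d(condition_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, feature_channels, 3, padding=1)
        self.beta = nn.Conv2d(hidden, feature_channels, 3, padding=1)

    def forward(self, features, condition):
        # Resize the condition map to match the feature resolution.
        condition = F.interpolate(condition, size=features.shape[2:], mode="nearest")
        hidden = self.shared(condition)
        return self.norm(features) * (1 + self.gamma(hidden)) + self.beta(hidden)

# Usage sketch: 4 condition channels = incomplete RGB (3) + disparity (1).
features = torch.randn(1, 128, 40, 40)
condition = torch.randn(1, 4, 160, 160)
out = SPADEBlock(128, 4)(features, condition)   # -> shape (1, 128, 40, 40)
```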

And voilà! You have your new frame and its disparity (inverse depth) map. You can now simply repeat the process over and over to get all future frames, which now look like this. Using each output as input to the next iteration, you can produce as many frames as you want, always following the desired viewpoint and the context of the previous frame!

[On screen: Figure 2 from the paper, the repeated generation step, and video examples.]
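Putting the pieces together, the render-refine-repeat loop can be sketched as a small driver function. The callables `render_fn`, `refine_fn`, and `camera_fn` are hypothetical placeholders for the differentiable renderer, the SPADE-style refinement network, and the camera controller; this is a sketch of the loop's structure, not the API of the released Infinite Nature code.

```python
# Structure of the render-refine-repeat loop described above.
# render_fn, refine_fn and camera_fn are hypothetical placeholders.
from typing import Callable, List

def perpetual_view_generation(rgb, disparity, pose, num_steps: int,
                              render_fn: Callable, refine_fn: Callable,
                              camera_fn: Callable) -> List:
    """Each refined output becomes the input of the next iteration."""
    frames = []
    for _ in range(num_steps):
        pose = camera_fn(pose, disparity)                        # where to fly next
        rgb, disparity, holes = render_fn(rgb, disparity, pose)  # re-project old view
        rgb, disparity = refine_fn(rgb, disparity, holes)        # fill missing regions
        frames.append(rgb)                                       # output feeds next step
    return frames
```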

As you know, such powerful algorithms frequently need data and annotations to be trained on, and this one is no exception. To that end, they needed aerial drone footage of nature, which they took from YouTube, manually curated, and pre-processed to create their own dataset. Fortunately for other researchers wanting to attack this challenge, you don't have to do the same thing, since they released this dataset of aerial footage of natural coastal scenes used to train their algorithm. It is available for download on their project page, which is linked in the description below.

As I mentioned, you can even try it yourself, as they made the code publicly available, and they also created a demo you can run right now on Google Colab. The link is in the description below. You just have to run the first few cells, which will install the code and dependencies and load their model, and there you go. You can now free-fly around the images they provide and even upload your own! Of course, all the steps I just mentioned are already set up for you. Simply run the code and enjoy!

You can find the article covering this paper on my newly created website, as well as our Discord community, my guide to learning machine learning, and more exciting stuff I will be sharing there. Feel free to become a free member and get notified of new articles I share!

Congratulations to the winners of the NVIDIA GTC giveaway, all appearing on the screen right now. You should have received an email from me with the DLI code!

Thank you for watching.


