Have you ever imagined being able to take a picture and just magically dive into it as if it were a door to another world?
Well, whether you thought about this or not, some people did, and thanks to them, it is now possible with AI! This is just one step away from teleportation and being able to be there physically. Maybe one day AI will help with that and fix an actual problem too! I’m just kidding, this is really cool, and I’m glad some people are working on it.
This is InfiniteNature… Zero! It's called that because it's a follow-up to a paper I previously covered, InfiniteNature. What's the difference? Quality!
Learn more in the video...
►Read the full article: https://www.louisbouchard.ai/infinitenature-zero/
►Li, Z., Wang, Q., Snavely, N. and Kanazawa, A., 2022. InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images. In European Conference on Computer Vision (pp. 515-534). Springer, Cham. https://arxiv.org/abs/2207.11148
►Code and project website: https://infinite-nature-zero.github.io/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
Have you ever imagined being able to take a picture and just magically dive into it as if it were a door to another world? Well, whether you thought about this or not, some people did, and thanks to them, it's now possible with AI. This is just one step away from teleportation and being able to be there physically. Maybe one day AI will help with that and fix an actual problem too! I'm just kidding, this is really cool, and I'm glad some people are working on it.
This is InfiniteNature-Zero. It's called that because it's a follow-up to a paper I previously covered, InfiniteNature. What's the difference? Quality! Just look at that: it's so much better in only one paper, it's incredible. You can actually feel like you are diving into the picture, and it only requires one input picture. How cool is that? The only thing even cooler is how it works. Let's dive into it!
But first, allow me 10 seconds of your time for a sponsor of this video: myself! Yes, only 10 seconds; I don't think I deserve more compared to the amazing companies that usually sponsor my work. If you like the videos, first, I think you should subscribe to the channel, but I also think you will love my two newsletters: the daily one, where I share research papers and news, and the weekly one, where I share these videos and very interesting discussions related to these papers and AI ethics. You should probably follow me on Twitter as well, @Whats_AI, if you'd like to stay up to date with the news and papers in the field. Tons are coming out with the CVPR deadlines that just passed, and you don't want to miss out on those.
So how does InfiniteNature-Zero work? It all starts with a single image you send as input. Yes, a single image: it doesn't require a video, multiple views, or anything else. This is different from their previous paper, which I also covered, where they needed videos to help the model understand natural scenes during training, which is also why they call this model InfiniteNature-Zero: because it requires zero videos.
Here, their work is divided into three methods used during training to get those results. To start, the model randomly samples two virtual camera trajectories, which tell you where you are going in the image. Why two? Because the first one is necessary to generate a new view, telling you where to fly into the image to generate a second image. This is the actual trajectory you will be taking.
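If you're curious what that could look like in practice, here's a tiny Python sketch of sampling such virtual camera trajectories. To be clear, this is my own illustration with made-up step sizes and jitter values, not the authors' actual sampling code:

```python
import numpy as np

def sample_trajectory(num_steps=8, step_size=0.1, jitter=0.02, seed=None):
    """Sample a simple forward-flying camera trajectory as a list of
    (position, look-direction) pairs. Illustrative stand-in only; the
    paper's virtual camera sampling is more sophisticated."""
    rng = np.random.default_rng(seed)
    position = np.zeros(3)
    # Start looking roughly straight ahead (+z), with a small random tilt.
    direction = np.array([0.0, 0.0, 1.0]) + rng.normal(0.0, jitter, 3)
    poses = []
    for _ in range(num_steps):
        direction = direction / np.linalg.norm(direction)
        poses.append((position.copy(), direction.copy()))
        # Fly forward along the view direction, jittering the heading a bit.
        position = position + step_size * direction
        direction = direction + rng.normal(0.0, jitter, 3)
    return poses

# Two independent trajectories: one to actually fly along, and one for the
# cyclic training path described next.
flight_path = sample_trajectory(seed=0)
cycle_path = sample_trajectory(seed=1)
```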
The second virtual trajectory is used during training to dive and return to the original image, to teach the model geometry-aware view refinement during view generation in a self-supervised way, as we teach it to get back to an image we already have in our training dataset. They refer to this approach as a cyclic virtual camera trajectory, as the starting and ending views are the same: our input image.
They do that by going to a virtual, or fake, sampled viewpoint and returning to the original view afterward, just to teach the reconstruction part to the model. The viewpoints are sampled using an algorithm called the auto-pilot algorithm to find the sky and not skydive into rocks or the ground, as nobody would like to do that.
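Here's a minimal sketch of that cyclic idea, assuming a hypothetical render_and_refine(frame, pose) function that warps a frame to a new viewpoint and refines it. The name and signature are my own, just for illustration:

```python
def cyclic_training_step(input_image, render_and_refine, out_poses, back_poses):
    """Fly away from the input view, then fly back to it, so the final
    frame can be compared against the original image we already have."""
    frame = input_image
    for pose in out_poses:    # outbound half of the cycle
        frame = render_and_refine(frame, pose)
    for pose in back_poses:   # return half, ending at the starting view
        frame = render_and_refine(frame, pose)
    # `frame` should now depict the original viewpoint again; its
    # difference from `input_image` becomes the training signal.
    return frame
```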
Then, during training, we use a GAN-like approach, using a discriminator to measure how much the newly generated view looks like a real image, represented with L adversarial, or L_adv. So yes, GANs aren't dead yet! This is a very cool application of them for guiding the training when you don't have any ground truth, for example when you don't have infinite images, as in this case. Basically, they use another model, a discriminator trained on our training dataset, that can tell whether an image seems to be part of it or not. So, based on its answer, you can improve the generation to make it look like an image from our dataset, which supposedly looks realistic.
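To give you an idea of what L_adv does, here's a generic GAN loss sketch in PyTorch. The exact discriminator architecture and loss variant the authors use may differ; this only illustrates the role the adversarial term plays in training:

```python
import torch
import torch.nn.functional as F

def adversarial_losses(discriminator, real_images, fake_images):
    """Generic (non-saturating) GAN losses; illustrative, not the paper's
    exact formulation."""
    real_logits = discriminator(real_images)
    fake_logits = discriminator(fake_images)
    # Discriminator: score real views high, generated views low.
    d_loss = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    # Generator (L_adv): make generated views look real to the discriminator.
    g_loss = F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
    return d_loss, g_loss
```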
We also measure the difference between our regenerated initial image and the original one to help the model iteratively get better at reconstructing it, represented by L_rec here. And we simply repeat this process multiple times to generate our novel frames and create these kinds of videos.
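Here's how that could look, again using the same hypothetical render_and_refine step, with a simple L1 distance standing in for L_rec. The paper may combine several reconstruction terms, so treat this as a sketch:

```python
import torch.nn.functional as F

def reconstruction_loss(reconstructed, original):
    # Simple pixel-wise L1 distance between the regenerated initial
    # image and the original one; illustrative, not the exact L_rec.
    return F.l1_loss(reconstructed, original)

def generate_video(input_image, render_and_refine, trajectory):
    """Repeatedly warp and refine to turn one image into a flythrough."""
    frames = [input_image]
    for pose in trajectory:
        frames.append(render_and_refine(frames[-1], pose))
    return frames
```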
There's one last thing to tweak before getting those amazing results: they saw that with their approach, the sky, due to its infinite nature compared to the ground, changes way too quickly. To fix that, they use another segmentation model to find the sky automatically in the generated images and fix it using an intelligent blending system between the generated sky and the sky from our initial image, so that it doesn't change too quickly and unrealistically.
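Conceptually, the blending could look something like this, assuming a sky_mask in [0, 1] produced by the segmentation model (1 meaning sky) and a made-up blending weight. The paper's actual blending is more involved; this only shows the idea of anchoring the sky to the original image:

```python
def blend_sky(generated, original_sky, sky_mask, keep_original=0.8):
    # Inside the sky region, mix mostly the sky from the initial image so
    # it changes slowly; elsewhere, keep the generated content untouched.
    # `keep_original` is an illustrative weight, not the paper's value.
    sky = keep_original * original_sky + (1.0 - keep_original) * generated
    return sky_mask * sky + (1.0 - sky_mask) * generated
```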
After training with this two-step process and sky refinement, InfiniteNature-Zero allows you to have stable long-range trajectories for natural scenes, as well as to accurately generate novel views that are geometrically coherent. And voilà! This is how you can take a picture and dive into it as if you were a bird.
I invite you to read their paper for more details on their method and its limitations, especially regarding how they managed to train their model in such a clever way, as I omitted some technical details making this possible for simplicity. By the way, the code is available and linked below if you'd like to try it. Let me know if you do and send me the results; I'd love to see them! Thank you for watching, and I hope you've enjoyed this video. I will see you next week with another amazing paper!