THE BEST Photo to 3D AI Model! by @whatsai

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

As if taking a picture wasn’t a challenging enough technological feat, we are now doing the opposite: modeling the world from pictures. I’ve covered amazing AI-based models that could take images and turn them into high-quality scenes. It is a challenging task: taking a few images from the 2-dimensional picture world and recreating how the object or person would look in the real world.

Take a few pictures and instantly have a realistic model to insert into your product. How cool is that?!

The results have dramatically improved upon the first model I covered in 2020, called NeRF. And this improvement isn’t only about the quality of the results. NVIDIA made it even better.

Not only is the quality comparable, if not better, but it is also more than 1,000 times faster, after less than two years of research.

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/nvidia-photos-into-3d-scenes/
►NVIDIA's blog post (credit to video): https://blogs.nvidia.com/blog/2022/03/25/instant-nerf-research-3d-ai/
►NVIDIA's video: https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.mp4
►Paper: Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller, 2022, "Instant Neural Graphics Primitives with a Multiresolution Hash Encoding", https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf
►Project link: https://nvlabs.github.io/instant-ngp/
►Code: https://github.com/NVlabs/instant-ngp
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

0:00

As if taking a picture wasn't a challenging enough technological prowess, we are now doing the opposite: modeling the world from pictures. I've covered amazing AI-based models that could take images and turn them into high-quality scenes. It is a challenging task that consists of taking a few images in the two-dimensional picture world to create how the object or person will look in the real world. You can easily see how useful this technology is for many industries like video games, animation, movies, or advertising. Take a few pictures and instantly have a realistic model to insert into your product.

0:36

The results have dramatically improved upon the first model I covered in 2020, called NeRF. And this improvement isn't only about the quality of the results. NVIDIA made it even better. Not only is the quality comparable, if not better, but it's more than one thousand times faster with less than two years of research.

0:56

This is the pace of AI research: exponential gains in quality and efficiency, a big factor that makes this field so incredible. You will be lost with the new techniques and quality of the results if you miss just a couple of days, which is why I first created this channel and why you should also subscribe. Just look at those 3D models. These cool models only needed a dozen pictures, and the AI guessed the missing spots and created this beauty in seconds. Something like this took hours to produce with NeRF. Let's dive into how they made this much progress on so many fronts in so little time.

1:31

But first, I'd like to take a few seconds to talk about Activeloop, an amazing company I recently stumbled on, and they are now sponsoring this video. Activeloop is becoming popular with its open-source dataset format for AI, Hub, one of the top 10 Python packages in 2021. With Activeloop Hub, you can treat your datasets as NumPy-like arrays. As a result, you have a simple dataset API for creating, storing, version-controlling, and querying AI datasets of any size. It's perfect to collaborate with your team and iterate on your datasets. The feature I like the most is being able to stream my datasets while training models in PyTorch or TensorFlow. This means anyone can access any slice of the data and start training models in seconds, no matter how big the dataset is. Just like that. How cool is that? With all these neat features, Hub definitely frees me from building data pipelines so I can train my models faster. Activeloop has just released more than 100 image, video, and audio datasets, available almost instantly with a single line of code. Try them out in your workflows and let me know in the comments below how it works. I'd love to know what you build with them.

2:49

Instant NeRF attacks the task of inverse rendering, which consists of rendering a 3D representation from pictures, a dozen in this case, approximating the real shape of the object and how light will behave on it so that it looks realistic in any new scene. Here, NeRF stands for neural radiance fields. I will only do a quick overview of how NeRFs work, as I already covered this kind of network in multiple videos, which I invite you to watch for more detail and a better understanding.

3:18

Quickly, NeRFs are a type of neural network. They take images and camera settings as inputs and learn how to produce an initial 3D representation of the objects or scenes in the picture, then fine-tune this representation using learned parameters from a supervised learning setting. This means that we need a 3D object and a few images of it at different known angles to train it, and the network will learn to recreate the object. To make the results as good as possible, we need pictures from multiple viewpoints, like this, to be sure we capture all or most sides of the object. And we train this network to understand general object shapes and light radiance. We are asking it to learn how to fill in the missing parts based on what it has seen before and how light reacts to them in the 3D world.

4:06

Basically, it is like asking you to draw a human without giving any details on the hands. You'd automatically assume the person has five fingers based on your knowledge. This is easy for us, as we have many years of experience under our belt and one essential thing current AIs are lacking: our intelligence. We can create links where there are none and do many unbelievable things. On the opposite side, AI needs specific rules, or at least examples, to follow, which is why we need to give it what an object looks like in the real world during its training phase to improve. Then, after such a training process, you only feed the images with the camera angles at inference time, and it produces the final model in a few hours.

4:52

Did I say a few hours? I'm sorry, I was still in 2021. It now does that in a few seconds. This new version by NVIDIA, called Instant NeRF, is indeed 1,000 times faster than its NeRF predecessor from a year ago. Why? Because of multi-resolution hash grid encoding. Multi-what? Multi-resolution hash grid encoding. They explained it very clearly with this sentence:

5:19

"We reduce the cost with a versatile new input encoding that permits the use of a smaller network without sacrificing quality, thus significantly reducing the number of floating point and memory access operations."
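
To make the "smaller network" part of that sentence concrete, here is a minimal sketch that counts the multiply-add operations in one forward pass of a fully connected network. The layer widths below are hypothetical values chosen only for illustration; they are not taken from the paper or from NVIDIA's code.

```python
def mlp_multiply_adds(layer_widths):
    """Count multiply-adds for one forward pass of a fully connected network.

    layer_widths lists the size of the input, each hidden layer, and the output.
    Each pair of consecutive layers costs (inputs * outputs) multiply-adds.
    """
    return sum(a * b for a, b in zip(layer_widths, layer_widths[1:]))


# Hypothetical "large" network: 60 inputs, 8 hidden layers of 256, 4 outputs.
large_widths = [60] + [256] * 8 + [4]
# Hypothetical "small" network: 32 encoded inputs, 2 hidden layers of 64, 4 outputs.
small_widths = [32] + [64] * 2 + [4]

large_cost = mlp_multiply_adds(large_widths)
small_cost = mlp_multiply_adds(small_widths)

if __name__ == "__main__":
    print("large network multiply-adds per sample:", large_cost)
    print("small network multiply-adds per sample:", small_cost)
    print("ratio:", large_cost / small_cost)
```

Since this cost is paid for every sampled point along every camera ray, shrinking the network shrinks the work per rendered pixel by roughly the same ratio, as long as the input encoding keeps the quality up.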

5:35

In short, they change how the NeRF network sees the inputs, that is, our initial 3D model prediction, making them more digestible and information-efficient so that a smaller network can be used while keeping the quality of the outputs the same. Keeping such high quality using a smaller network is possible because we are not only learning the weights of the NeRF network during training but also the way we are transforming those inputs beforehand.

6:02

So the input is transformed using trained functions, here steps one to four, compressed in a hash table to focus on valuable information extremely quickly, and then sent to a much smaller network in step five, as the inputs are similarly much smaller now. They are storing values of any type in the table, with keys indicating where they are stored, for super-efficient parallel modifications and removing the lookup time for big arrays during training and inference. This transformation and a much smaller network are why Instant NeRF is so much faster and why it made it into this video. And voilà, this is how NVIDIA is now able to generate 3D models like these in seconds.
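
The encoding described above can be sketched in a few lines of plain Python. This is a simplified illustration, not NVIDIA's implementation: the class name `HashGridEncoding`, its default sizes, and the random initial values are assumptions made for this example. In the real method the table entries are trained together with the network, and here every level is hashed, even coarse ones that would fit in the table directly. The hashing constants follow the spatial hash described in the paper.

```python
import math
import random

# Per-dimension constants of the spatial hash (as given in the paper).
PRIMES = (1, 2654435761, 805459861)


def spatial_hash(ix, iy, iz, table_size):
    """Map integer grid coordinates to a slot in a fixed-size table."""
    return ((ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])) % table_size


class HashGridEncoding:
    """Toy multi-resolution hash grid encoding for points in [0, 1]^3."""

    def __init__(self, levels=4, features=2, table_size=2 ** 10,
                 base_res=4, growth=2.0, seed=0):
        rng = random.Random(seed)
        self.features = features
        self.table_size = table_size
        # One grid resolution per level, from coarse to fine.
        self.resolutions = [int(base_res * growth ** l) for l in range(levels)]
        # One table of small feature vectors per level (trainable in practice).
        self.tables = [
            [[rng.uniform(-1e-4, 1e-4) for _ in range(features)]
             for _ in range(table_size)]
            for _ in range(levels)
        ]

    def encode(self, x, y, z):
        """Return the concatenated, interpolated features of all levels."""
        out = []
        for res, table in zip(self.resolutions, self.tables):
            gx, gy, gz = x * res, y * res, z * res
            x0, y0, z0 = math.floor(gx), math.floor(gy), math.floor(gz)
            fx, fy, fz = gx - x0, gy - y0, gz - z0
            feat = [0.0] * self.features
            # Trilinear interpolation over the 8 corners of the enclosing cell.
            for dx in (0, 1):
                for dy in (0, 1):
                    for dz in (0, 1):
                        w = ((fx if dx else 1.0 - fx)
                             * (fy if dy else 1.0 - fy)
                             * (fz if dz else 1.0 - fz))
                        slot = spatial_hash(x0 + dx, y0 + dy, z0 + dz,
                                            self.table_size)
                        for k in range(self.features):
                            feat[k] += w * table[slot][k]
            out.extend(feat)
        return out
```

Calling `HashGridEncoding().encode(0.3, 0.5, 0.7)` returns a list of `levels * features` numbers. That short vector is what the much smaller network would receive instead of the raw coordinates, and the memory used stays bounded by the table size no matter how fine the grid gets.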

6:49

If this wasn't cool enough, I said that it can store values of any type, which means that this technique can be used not only with NeRFs but also with other super cool applications, like gigapixel images, that become just as incredibly efficient.

7:03

Of course, this was just an overview of this new paper attacking this super interesting task in a novel way. I invite you to read their excellent paper for more technical detail about the multi-resolution hash grid encoding approach and their implementation. A link to the paper and their code is in the description below. Thank you for watching the whole video. Please take a second to let me know what you think of the overall quality of the videos and new editing. I will see you next week with another amazing paper.



