Introducing BANMo: From Cat Pictures to Deformable 3D Models

Written by whatsai | Published 2022/08/14
Tech Story Tags: ai | nerf | 3d-modelling | 3d | cvpr | artificial-intelligence | technology | hackernoon-top-story | web-monetization | hackernoon-es | hackernoon-hi | hackernoon-zh | hackernoon-vi | hackernoon-fr | hackernoon-pt | hackernoon-ja

TL;DR: BANMo is a NeRF-inspired approach shared at the CVPR 2022 event I attended a few weeks ago. It takes pictures and videos to create deformable 3D models. The model starts with a few casually taken videos of the object you want to capture, showing how it moves and deforms. The initial result gives you information about the object's shape, appearance, and articulations: the model's understanding of your object. Learn more in the video... or in the full article: https://www.louisbouchard.ai/banmo/

Here's BANMo, a NeRF-inspired approach shared at the CVPR 2022 event I attended a few weeks ago.
BANMo takes pictures to create deformable 3D models. If you are in VFX, game development, or creating 3D scenes, this new AI model is for you. I wouldn’t be surprised to see this model or similar approaches in your creation pipeline very shortly, allowing you to spend much less time, money, and effort on making 3D models. Learn more in the video...

References

►Read the full article: https://www.louisbouchard.ai/banmo/
►Project page: https://banmo-www.github.io/
►Paper: Yang, G., Vo, M., Neverova, N., Ramanan, D., Vedaldi, A. and Joo, H., 2022. BANMo: Building animatable 3D neural models from many casual videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 2863–2873).
►Code: https://github.com/facebookresearch/banmo
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

0:00
If you are in VFX, game development, or creating 3D scenes, this new AI model is for you. I wouldn't be surprised to see this model or similar approaches in your creation pipeline very shortly, allowing you to spend much less time, money, and effort on making 3D models. Just look at that! Of course, it's not perfect, but that was done instantly with a casual video taken from a phone. It didn't need an expensive multi-camera setup or complex depth sensors. One of the beauties of AI is making complex and costly technologies available to startups or single individuals so they can create projects with professional-quality results. Just film an object and transform it into a model you can import right away. You can then fine-tune the details if you are not satisfied, but the whole model will be there within a few seconds.
0:51
What you've been seeing are the results from an AI model called BANMo, recently shared at the CVPR event I attended. I'll be honest: they got my attention because of the cats. Still, it wasn't completely clickbait; the paper and approach are actually pretty awesome. It isn't like other NeRF approaches to reconstructing objects into 3D models. BANMo tackles a task called articulated 3D shape reconstruction, which means it works with videos and pictures to model deformable objects. And what's more deformable than a cat? What's even cooler than seeing the results is understanding how it works.
1:29
The model starts with a few casually taken videos of the object you want to capture, showing how it moves and deforms itself. That's where you want to send a video of your cat slipping into a vase. BANMo takes those videos to create what they refer to as a canonical space. This initial result will give you information about the object's shape, appearance, and articulations. It's the model's understanding of your object's shape, how it moves through space, and where it sits between a brick and a blob, described by those big balls in various colors. It then takes this 3D representation and applies any pose you want, simulating the cat's behavior and articulations as close to reality as possible. Seems like magic, doesn't it?
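
To make that posing step more concrete before moving on, here is a minimal PyTorch sketch of soft blend skinning: canonical points are warped by a handful of bones, much like the colored balls above. The function and the simple distance-based weights are my own illustration, not BANMo's exact formulation (the paper uses learnable Gaussian bones with neural skinning weights).

```python
import torch

def blend_skinning(points, bone_centers, bone_transforms, temperature=0.1):
    """Warp canonical 3D points into a new pose with a set of bones.

    points:          (N, 3) canonical-space points
    bone_centers:    (B, 3) bone centers in canonical space
    bone_transforms: (B, 3, 4) per-bone rigid transforms [R | t] for the target pose
    Returns the (N, 3) posed points. Skinning weights here simply fall off
    with distance to each bone center (a stand-in for BANMo's learned weights).
    """
    dists = torch.cdist(points, bone_centers)             # (N, B) point-to-bone distances
    weights = torch.softmax(-dists / temperature, dim=1)  # closer bones get more influence

    R = bone_transforms[:, :, :3]                         # (B, 3, 3) rotations
    t = bone_transforms[:, :, 3]                          # (B, 3) translations
    moved = torch.einsum('bij,nj->bni', R, points) + t[:, None, :]  # every bone moves every point
    return torch.einsum('nb,bni->ni', weights, moved)     # blend the per-bone motions
```

Feeding the same canonical points different bone_transforms re-poses one shared shape, which is what makes the model animatable.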
2:13
That's because we are not done here. We quickly went from a video to the model, but this is where it becomes interesting. So what do they use to go from the images of a video to such a representation in this canonical space? You guessed it: a NeRF-like model. If you are not familiar with this approach, I strongly invite you to watch one of the many videos I made covering them and come back for the rest. In short, the NeRF-inspired method has to predict three essential properties for each three-dimensional pixel, or voxel, of the object, as you see here: color, density, and a canonical embedding, using a neural network trained for that.
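
A minimal sketch of what such a network could look like, assuming a plain MLP (the real model also uses positional encodings and per-frame appearance codes, and the names here are mine): one trunk, three heads, one per property.

```python
import torch
import torch.nn as nn

class CanonicalNeRF(nn.Module):
    """Maps a canonical 3D point to color, density, and a canonical embedding."""
    def __init__(self, embed_dim=16, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.color = nn.Sequential(nn.Linear(hidden, 3), nn.Sigmoid())     # RGB in [0, 1]
        self.density = nn.Sequential(nn.Linear(hidden, 1), nn.Softplus())  # non-negative opacity
        self.embedding = nn.Linear(hidden, embed_dim)                      # canonical feature

    def forward(self, xyz):
        h = self.trunk(xyz)
        return self.color(h), self.density(h), self.embedding(h)

rgb, sigma, feat = CanonicalNeRF()(torch.rand(1024, 3))  # query 1024 canonical points
```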
2:54
To achieve a 3D model with realistic articulations and movement, BANMo uses the camera's spatial location in multiple frames to understand the angle from which it is filming, allowing it to reconstruct and improve the 3D model iteratively through all frames of the videos, similar to what we would do to understand an object: move it around and look at it from all directions. This part is done automatically by observing the videos, thanks to the canonical embedding we just mentioned.
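
Sketched as a hypothetical training loop (the renderer, the frames, and the 6-DoF pose parameterization are all assumptions for illustration), that iterative refinement could look like this: every frame's camera pose is a learnable variable, optimized jointly with the canonical model by re-rendering the frame and comparing against the observation.

```python
import torch

def fit_scene(model, frames, render, num_steps=1000):
    """Jointly refine the canonical model and per-frame camera poses.

    model:  a CanonicalNeRF-like network (see the sketch above)
    frames: list of observed video frames as (H, W, 3) tensors
    render: differentiable volume renderer, render(model, camera) -> (H, W, 3)
            (assumed given; volume rendering itself is not shown here)
    """
    # One learnable 6-DoF pose (3 rotation + 3 translation params) per frame.
    cameras = torch.zeros(len(frames), 6, requires_grad=True)
    optimizer = torch.optim.Adam([*model.parameters(), cameras], lr=1e-4)

    for _ in range(num_steps):
        for i, frame in enumerate(frames):
            rendered = render(model, cameras[i])    # predicted view of frame i
            loss = (rendered - frame).abs().mean()  # photometric reconstruction loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return cameras
```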
3:20
This embedding contains all the necessary features of each part of the object, allowing you to query it with a new desired position for the object and forcing a coherent reconstruction given the observations. It will basically map the wanted position from the picture up to the 3D model, with the correct viewpoints and lighting conditions, and provide cues for the needed shape and articulations.
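
One way to picture that querying, under my own simplifying assumptions (the pixel features and names are illustrative, not the paper's API): compare an image feature against the canonical embeddings of sampled 3D points and take a similarity-weighted average, giving a soft answer to "which part of the body is this pixel?".

```python
import torch

def match_to_canonical(pixel_feats, canon_points, canon_feats, temperature=0.1):
    """Soft 2D-to-3D correspondence through the canonical embedding.

    pixel_feats:  (P, D) features extracted from image pixels (assumed given)
    canon_points: (M, 3) sampled canonical-space points
    canon_feats:  (M, D) their embeddings from the network above
    Returns (P, 3): for each pixel, a weighted average of the canonical
    points whose embeddings are most similar to it.
    """
    sim = pixel_feats @ canon_feats.T                  # (P, M) feature similarity
    weights = torch.softmax(sim / temperature, dim=1)  # soft assignment over 3D points
    return weights @ canon_points                      # expected canonical location
```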
3:45
One last thing to mention is the colors. Those colors represent the cat's body attributes shared across the different videos and images we used. This is the feature the model learns and looks at to take valuable information from all videos and merge it into the same 3D model to improve the results.
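
As a side note, visualizations like those colored cats can be produced with a simple trick, sketched here as my own assumption rather than a step from the paper: project each point's embedding onto its top three principal components and read them as RGB, so matching body parts across videos share a color.

```python
import torch

def embedding_to_colors(canon_feats):
    """Turn D-dimensional canonical embeddings into RGB colors for display."""
    centered = canon_feats - canon_feats.mean(dim=0)
    _, _, V = torch.pca_lowrank(centered, q=3)   # top-3 principal directions
    rgb = centered @ V                           # (M, 3) projection
    rgb = rgb - rgb.min(dim=0).values
    return rgb / (rgb.max(dim=0).values + 1e-8)  # normalize each channel to [0, 1]
```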
4:05
And voilà! You end up with this beautiful, deformable 3D cat you can use in your applications. Of course, this was just an overview of BANMo, and I invite you to read the paper for a deeper understanding of the model. You should definitely subscribe to the channel if this kind of AI news interests you, as I'm sharing similar exciting approaches every week. Thank you for watching until the end, and I will see you next week with another amazing paper!




Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2022/08/14