Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try on any pants or shirt you want using only an image of yourself. In this video, I explain more about this project, titled VOGUE, and how it works.

►Lewis, Kathleen M. et al., (2021), VOGUE: Try-On by StyleGAN Interpolation Optimization: https://vogue-try-on.github.io/
►Interactive examples: https://vogue-try-on.github.io/demo_r...

Hey! Tap the Thumbs Up button and Subscribe. You'll learn a lot of cool stuff, I promise.

Follow me for more AI content:
►Instagram: https://www.instagram.com/whats_ai/
►LinkedIn: https://www.linkedin.com/in/whats-ai/
►Twitter: https://twitter.com/Whats_AI
►Facebook: https://www.facebook.com/whats.artifi...
►Medium: https://medium.com/@whats_ai
►The best courses in AI: https://www.omologapps.com/whats-ai
►Join Our Discord channel, Learn AI Together: https://discord.gg/learnaitogether

Chapters:
0:00 Intro
0:40 Paper explanation
1:47 VOGUE's model
4:46 More examples

Video Transcript:

Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try on any pants or shirt you want using only one image of yourself. Let's see how they achieve that, and more impressive results.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.

A team of researchers from Google, MIT, and the University of Washington recently published a paper called VOGUE: Try-On by StyleGAN Interpolation Optimization. They use a GAN architecture to create an online fitting room where you can automatically try on any pants or shirt you want, using only an image of yourself. Also called garment transfer, the goal is to take the clothes from a person in one picture and transfer them to someone else, while preserving the correct body shape, hair, and skin color. This is a complex task, since some parts of the output image, like the garment, need to be extracted from one image, while the other parts, proper to the actual person whose identity we want to keep while trying clothes on, are taken from another picture. Well, they were able to do exactly that using a GAN-based architecture. More precisely, a pose-conditioned StyleGAN2 is at the core of their architecture. I won't go into the details of StyleGAN2 and GAN architectures here, since I've already explained them in many videos, like the one where I explain Toonify, which also uses a StyleGAN2-based architecture. I definitely invite you to watch that video before continuing this one if you are not familiar with GANs or StyleGAN2.
So, in order to work and generate photorealistic images with different outfits, VOGUE needs to train this pose-conditioned StyleGAN2 architecture. But this is harder than simply implementing StyleGAN2, since it was mainly developed for face images, which is where it got its popularity. They had to make two key modifications. First, they modified the beginning of the generator with an encoder that takes the pose keypoints of the image as inputs. This serves as the input of the first 4x4 style block of StyleGAN2, instead of a constant input, to implement the pose conditioning. Second, they trained their StyleGAN2 to output segmentations at each resolution in addition to the RGB image, as you can see here. Using this network, they were able to generate many images, and their segmentations, with the desired poses.

Following this, given an input pair of images, they could project the images into the latent space of the generator to compute the latent codes that best represent the characteristics of the pair of input images, using an optimizer to find the region of the space of combinations where the garment from the second image and the person from the first image lie. They had to maximize changes within the region of interest while minimizing changes outside of it. To do that, they used two latent codes representing the two input images: the first one from the image with the person to be generated, and the second one from the image with the garment to be transferred. As we saw, they also needed the pose heatmap as input to the StyleGAN2 generator, shown here again in grey. They then had access to the segmentations and images generated by the trained GAN architecture.

From there, they used a loss function composed of three separate terms that each optimize a part of the generated image. There's the editing-localization loss term, which encourages the network to only interpolate styles within the region of interest, defined here as M, using the segmentation outputs. Then there's the garment loss, used to transfer over the correct shape and texture of the garments. Using embeddings from a very popular convolutional neural network architecture called VGG-16, they compute the distance between the garment areas of the two images, again using the segmentation labels. This created mask is then applied to the generated RGB images. Finally, there's the identity loss, which guides the network to, as it says, preserve the identity of the person. This is again done using the segmentation labels, following the same procedure as the garment loss. Just take a second to look at how these losses affect the output image. You can clearly see when the localization loss or the identity loss is missing, and how important they are.

As they state, their method can synthesize the same style of shorts for varied poses and body shapes by fixing the style vector, and they present several different styles in multiple poses. Just look at how much better the results are with this new approach.
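To make those two generator modifications more concrete, here is a minimal PyTorch sketch. This is an illustration under my own assumptions, not the authors' implementation: the class names `PoseEncoder` and `PoseConditionedGenerator`, the encoder layout, and the `rgb_heads`/`seg_heads` naming are all hypothetical; only the two ideas themselves (a pose encoder replacing StyleGAN2's learned constant input, and a segmentation output next to the RGB output at every resolution) come from the paper.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Encodes a pose-keypoint heatmap into the 4x4 tensor that replaces
    StyleGAN2's learned constant input (hypothetical layout)."""
    def __init__(self, n_keypoints=17, channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_keypoints, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, channels, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(4),  # down to the generator's 4x4 starting resolution
        )

    def forward(self, pose_heatmap):      # (B, n_keypoints, H, W)
        return self.net(pose_heatmap)     # (B, channels, 4, 4)

class PoseConditionedGenerator(nn.Module):
    """StyleGAN2-like generator with the two changes described above:
    a pose-conditioned input, and a segmentation head at every resolution."""
    def __init__(self, style_blocks, rgb_heads, seg_heads):
        super().__init__()
        self.pose_encoder = PoseEncoder()
        self.blocks = nn.ModuleList(style_blocks)     # style-modulated conv blocks
        self.rgb_heads = nn.ModuleList(rgb_heads)     # the usual StyleGAN2 "toRGB" layers
        self.seg_heads = nn.ModuleList(seg_heads)     # added segmentation output layers

    def forward(self, pose_heatmap, styles):
        x = self.pose_encoder(pose_heatmap)           # pose replaces the constant input
        rgbs, segs = [], []
        for block, rgb, seg, w in zip(self.blocks, self.rgb_heads, self.seg_heads, styles):
            x = block(x, w)                           # one style-modulated block per resolution
            rgbs.append(rgb(x))                       # RGB image at this resolution
            segs.append(seg(x))                       # segmentation at this resolution
        return rgbs, segs
```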
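And here is a rough sketch of the try-on step itself, again under stated assumptions: the two images are assumed to have already been projected to per-layer style codes `w_person` and `w_garment`, the garment region-of-interest mask `m` is assumed to be given (in the paper it comes from the generated segmentations), and the interpolation scheme, loss weights, and optimizer settings are illustrative choices, not the authors' exact values.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 features, used to compare garment shape and texture.
vgg = vgg16(weights="IMAGENET1K_V1").features.eval()

def try_on(generator, pose, w_person, w_garment, person_img, garment_img, m, steps=200):
    """Optimize per-layer interpolation coefficients q so the generated image
    takes its garment from `w_garment` and everything else from `w_person`.
    All names here are illustrative, not from the official code."""
    n_layers = w_person.shape[1]                      # style codes: (B, n_layers, 512)
    q_logits = torch.zeros(n_layers, requires_grad=True)
    opt = torch.optim.Adam([q_logits], lr=0.05)

    with torch.no_grad():                             # reference image of the person alone
        ref_img = generator(pose, w_person.unbind(1))[0][-1]

    for _ in range(steps):
        q = torch.sigmoid(q_logits).view(1, -1, 1)    # one coefficient in [0, 1] per layer
        w = q * w_garment + (1 - q) * w_person        # per-layer style interpolation
        img = generator(pose, w.unbind(1))[0][-1]     # highest-resolution RGB output

        # Editing-localization loss: discourage changes outside the garment region M.
        loss_loc = F.l1_loss((1 - m) * img, (1 - m) * ref_img)
        # Garment loss: match VGG-16 features of the garment areas of the two images.
        loss_garment = F.l1_loss(vgg(m * img), vgg(m * garment_img))
        # Identity loss: same procedure outside M, to preserve the person's identity.
        loss_id = F.l1_loss(vgg((1 - m) * img), vgg((1 - m) * person_img))

        loss = loss_loc + loss_garment + loss_id      # illustrative equal weighting
        opt.zero_grad()
        loss.backward()
        opt.step()

    return img
```

The design point worth noticing is that only the small vector of coefficients q is optimized at try-on time; the trained generator stays frozen, which is what keeps the person's pose, body shape, and identity intact while the garment styles are swapped in.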
Of course, this was just an overview of this new paper. I strongly invite you to read it for a better technical understanding; it is the first link in the description. Please leave a like if you made it this far in the video, and since over 80 percent of you are not subscribed yet, consider subscribing to the channel to not miss any further news. Thank you for watching!