Date: 2021-01-23
I explain Artificial Intelligence terms and news to non-experts.
Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try on any pants or shirts you want using only an image of yourself.
In this video, I explain more about this project, titled VOGUE, and how it works.
References:
►Lewis, Kathleen M. et al. (2021), VOGUE: Try-On by StyleGAN Interpolation Optimization, https://vogue-try-on.github.io/
►Interactive examples: https://vogue-try-on.github.io/demo_r...
Google used a modified StyleGAN2 architecture to create an online fitting room
where you can automatically try on any pants or shirt you want
using only one image of yourself.
Let's see how they achieve that, along with more impressive results.
This is What's AI, and I share artificial intelligence news every week.
If you are new to the channel and want to stay up to date,
please consider subscribing to not miss any further news.
A team of researchers from Google, MIT, and the University of Washington
recently published a paper called VOGUE: Try-On by StyleGAN Interpolation Optimization.
They use a GAN architecture to create an online fitting room
where you can automatically try on any pants or shirts you want
using only an image of yourself.
Also called garment transfer, the goal is to take the clothes from a person in a picture
and transfer them to someone else
while conserving the correct body shape, hair, and skin color.
This is a complex task, since some parts of the output image, like the garment,
need to be extracted from one image,
while the other parts, proper to the actual person, are taken from another picture,
keeping the identity of the person we want to try the clothes on.
Well, they were able to do exactly that using a GAN-based architecture.
More precisely, a pose-conditioned StyleGAN2 is at the core of their architecture.
I won't go into the details of StyleGAN2 and GAN architectures,
since I've already explained them in many videos, like in this video
where I explain Toonify, which also uses a StyleGAN2-based architecture.
I definitely invite you to watch this video before continuing this one
if you are not familiar with GANs or StyleGAN2.
So, in order to work and generate photorealistic images with different outfits,
VOGUE needs to train this pose-conditioned StyleGAN2 architecture.
But this is harder than simply implementing StyleGAN2,
since it was mainly developed for face images, which is where it got its popularity from.
They had to make two key modifications.
First, they had to modify the beginning of the generator
with an encoder that takes the pose keypoints of the image as inputs.
The encoder output serves as the input of the first 4x4 style block of StyleGAN2,
instead of a constant input, to implement this pose conditioning.
Then, they trained their StyleGAN2 to output segmentations at each resolution
in addition to the RGB image, as you can see here.
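To make the pose input more concrete, here is a minimal sketch of how keypoints could be turned into the kind of pose heatmap the encoder receives. It is only an illustration of the input representation: the Gaussian rendering, the `sigma` value, the function name `keypoints_to_heatmaps`, and the tiny 4x4 example are all assumptions, not details taken from the paper, and the learned encoder itself is not shown.

```python
import math


def keypoints_to_heatmaps(keypoints, height, width, sigma=1.0):
    """Render one Gaussian heatmap per (x, y) keypoint.

    Hypothetical helper: returns a list with one height x width nested
    list per keypoint, with values in (0, 1] that peak at the keypoint.
    """
    heatmaps = []
    for kx, ky in keypoints:
        heatmap = [
            [
                math.exp(-((x - kx) ** 2 + (y - ky) ** 2) / (2.0 * sigma ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        heatmaps.append(heatmap)
    return heatmaps


# Made-up example: two keypoints (say, a shoulder and a hip) on a 4x4 grid.
pose = [(1, 0), (2, 3)]
maps = keypoints_to_heatmaps(pose, height=4, width=4)
print(len(maps), "heatmaps of size", len(maps[0]), "x", len(maps[0][0]))
```

In a real model, a stack of such maps would be fed to the encoder, whose output replaces the constant input of the first style block.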
Using this network, they were able to generate many images and their segmentations
with desired poses.
Following this, given an input pair of images,
they could project the images into the latent space of the generator
to compute the latent codes that will best differentiate the characteristics
of the pair of input images,
using an optimizer to find the space of combinations
where the garment comes from the second image and the person from the first image.
They had to maximize changes within the region of interest
while minimizing changes outside of the region of interest.
To do that, they used two latent codes representing the two input images:
the first one from the image with the person to be generated,
and the second one from the image with the garment to be transferred.
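The idea of combining the two latent codes can be sketched as a simple interpolation. This is a minimal sketch, assuming the blend happens per style channel with a coefficient between 0 and 1; the function name `interpolate_styles` and the example values are made up for illustration, and in the actual method the coefficients are what the optimizer searches for rather than fixed by hand.

```python
def interpolate_styles(person_style, garment_style, coeffs):
    """Blend two style vectors channel by channel.

    coeffs[i] = 0 keeps the person's value for channel i,
    coeffs[i] = 1 takes the garment's value for channel i.
    """
    if not (len(person_style) == len(garment_style) == len(coeffs)):
        raise ValueError("all three vectors must have the same length")
    return [
        p + q * (g - p)
        for p, g, q in zip(person_style, garment_style, coeffs)
    ]


# Made-up 4-channel example.
person = [0.0, 2.0, -1.0, 4.0]
garment = [1.0, 0.0, 3.0, 4.0]
q = [0.0, 1.0, 0.5, 1.0]
blended = interpolate_styles(person, garment, q)
print(blended)  # [0.0, 0.0, 1.0, 4.0]
```

Channels with a coefficient of 0 stay tied to the person, which is how the parts outside the region of interest can be left unchanged.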
As we saw, they also needed the pose heatmap as input to the StyleGAN2 generator,
shown here again in grey.
Then, they had access to the segmentations and images generated
from the trained GAN architecture.
Following this, they used a loss function composed of three separate terms
that each optimize a part of the generated image.
There's the editing-localization loss term,
which encourages the network to only interpolate styles within the region of interest,
defined here as M, using the segmentation outputs.
Then, there's the garment loss, used to transfer over the correct shape and texture of the garments.
Using embeddings from a very popular convolutional neural network architecture called VGG16,
they compute the distance between the garment areas of the two images,
using again the segmentation labels.
This created mask is then applied to the generated RGB images.
Finally, there's the identity loss, which guides the network to,
as it says, preserve the identity of the person.
This is again done using the segmentation labels,
following the same procedure as the garment loss.
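The three terms can be put together in a rough sketch like the one below. It is a simplified stand-in, not the paper's formulation: raw pixel values replace the VGG16 embeddings, images and masks are flat lists, the localization term is written as a penalty on interpolating channels with low relevance to the region of interest, and every name here (`masked_distance`, `localization_penalty`, `total_loss`, the weights) is hypothetical.

```python
def masked_distance(image_a, mask_a, image_b, mask_b):
    """Mean squared distance between two masked images (flat pixel lists).

    Stand-in for a distance between VGG16 embeddings of the masked images.
    """
    total = 0.0
    for a, ma, b, mb in zip(image_a, mask_a, image_b, mask_b):
        total += (a * ma - b * mb) ** 2
    return total / len(image_a)


def localization_penalty(coeffs, relevance):
    """Penalize interpolating channels that matter little to the region of interest.

    relevance[i] in [0, 1] is an assumed score of how much channel i
    affects the region of interest M.
    """
    return sum(q * (1.0 - m) for q, m in zip(coeffs, relevance))


def total_loss(output, person, garment, garment_mask, identity_mask,
               coeffs, relevance, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the localization, garment, and identity terms."""
    w_loc, w_garment, w_identity = weights
    loc = localization_penalty(coeffs, relevance)
    gar = masked_distance(output, garment_mask, garment, garment_mask)
    idt = masked_distance(output, identity_mask, person, identity_mask)
    return w_loc * loc + w_garment * gar + w_identity * idt


# Made-up 4-pixel images: pixel 0 is the garment, pixels 1-2 are the person.
output_img = [0.5, 0.2, 0.9, 0.1]
garment_img = [0.5, 0.7, 0.3, 0.1]
person_img = [0.0, 0.2, 0.9, 0.6]
garment_mask = [1, 0, 0, 0]
identity_mask = [0, 1, 1, 0]

good = total_loss(output_img, person_img, garment_img, garment_mask,
                  identity_mask, coeffs=[1.0, 0.0], relevance=[1.0, 0.0])
bad = total_loss(output_img, person_img, garment_img, garment_mask,
                 identity_mask, coeffs=[1.0, 1.0], relevance=[1.0, 0.0])
print(good, bad)
```

In this toy setup, the output already matches the garment inside the garment mask and the person inside the identity mask, so only the localization term separates the two cases: interpolating a channel that is irrelevant to the region of interest raises the loss.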
Just take a second to look at how these losses affect the output image.
You can clearly see when the localization loss or the identity loss is missing,
and how important they are.
As they state, "Our method can synthesize the same style shirt for varied poses
and body shapes by fixing the style vector.
We present several different styles in multiple poses."
Just look at how much better the results are with this new approach.
Of course, this was just an overview of this new paper.
I strongly invite you to read their paper for a better technical understanding.
It is the first link in the description.
Please leave a like if you went this far in the video,
and since over 80 percent of you guys are not subscribed yet,
consider subscribing to the channel to not miss any further news.
Thank you for watching!
