Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try on any pants or shirt you want using only an image of yourself. In this video, I explain more about this project, titled VOGUE, and how it works.

►Lewis, Kathleen M. et al., (2021), VOGUE: Try-On by StyleGAN Interpolation Optimization: https://vogue-try-on.github.io/
►Interactive examples: https://vogue-try-on.github.io/demo_r...

Hey! Tap the Thumbs Up button and Subscribe. You'll learn a lot of cool stuff, I promise.

Follow me for more AI content:
►Instagram: https://www.instagram.com/whats_ai/
►LinkedIn: https://www.linkedin.com/in/whats-ai/
►Twitter: https://twitter.com/Whats_AI
►Facebook: https://www.facebook.com/whats.artifi...
►Medium: https://medium.com/@whats_ai
►The best courses in AI: https://www.omologapps.com/whats-ai
►Join Our Discord channel, Learn AI Together: https://discord.gg/learnaitogether

Chapters:
0:00 Intro
0:40 Paper explanation
1:47 VOGUE's model
4:46 More examples

Video Transcript:

Google used a modified StyleGAN2 architecture to create an online fitting room where you can automatically try on any pants or shirt you want using only one image of yourself. Let's see how they achieve that, and more impressive results.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.

A team of researchers from Google, MIT, and the University of Washington recently published a paper called VOGUE: Try-On by StyleGAN Interpolation Optimization. They use a GAN architecture to create an online fitting room where you can automatically try on any pants or shirt you want, using only an image of yourself. Also called garment transfer, the goal is to take the clothes from a person in one picture and transfer them to someone else, while preserving the correct body shape, hair, and skin color. This is a complex task, since some parts of the output image, like the garment, need to be extracted from one image, while the other parts, proper to the actual person whose identity we want to keep while trying clothes on, are taken from another picture. Well, they were able to do exactly that using a GAN-based architecture. More precisely, a pose-conditioned StyleGAN2 is at the core of their architecture. I won't go into the details of StyleGAN2 and GAN architectures here, since I've already explained them in many videos, like the one where I explain Toonify, which also uses a StyleGAN2-based architecture. I definitely invite you to watch that video before continuing this one if you are not familiar with GANs or StyleGAN2.
So, in order to work and generate photorealistic images with different outfits, VOGUE needs to train this pose-conditioned StyleGAN2 architecture. But this is harder than simply implementing StyleGAN2, since it was mainly developed for face images, which is where it got its popularity. They had to make two key modifications. First, they modified the beginning of the generator with an encoder that takes the pose keypoints of the image as inputs. This serves as the input of the first 4x4 style block of StyleGAN2, instead of a constant input, to implement the pose conditioning. Second, they trained their StyleGAN2 to output segmentations at each resolution in addition to the RGB image, as you can see here. Using this network, they were able to generate many images, and their segmentations, with the desired poses.

Following this, given an input pair of images, they could project the images into the latent space of the generator to compute the latent codes that best represent the characteristics of the pair of input images, using an optimizer to find the region of the space of combinations where the garment from the second image and the person from the first image lie. They had to maximize changes within the region of interest while minimizing changes outside of it. To do that, they used two latent codes representing the two input images: the first one from the image with the person to be generated, and the second one from the image with the garment to be transferred. As we saw, they also needed the pose heatmap as input to the StyleGAN2 generator, shown here again in grey. They then had access to the segmentations and images generated by the trained GAN architecture.

From there, they used a loss function composed of three separate terms that each optimize a part of the generated image. There's the editing-localization loss term, which encourages the network to only interpolate styles within the region of interest, defined here as M, using the segmentation outputs. Then there's the garment loss, used to transfer over the correct shape and texture of the garments. Using embeddings from a very popular convolutional neural network architecture called VGG-16, they compute the distance between the garment areas of the two images, again using the segmentation labels. This created mask is then applied to the generated RGB images. Finally, there's the identity loss, which guides the network to, as it says, preserve the identity of the person. This is again done using the segmentation labels, following the same procedure as the garment loss. Just take a second to look at how these losses affect the output image. You can clearly see when the localization loss or the identity loss is missing, and how important they are.

As they state, their method can synthesize the same style of shorts for varied poses and body shapes by fixing the style vector, and they present several different styles in multiple poses. Just look at how much better the results are with this new approach.
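To make those two generator modifications more concrete, here is a minimal PyTorch sketch. This is an illustration under my own assumptions, not the authors' implementation: the class names `PoseEncoder` and `PoseConditionedGenerator`, the encoder layout, and the `rgb_heads`/`seg_heads` naming are all hypothetical; only the two ideas themselves (a pose encoder replacing StyleGAN2's learned constant input, and a segmentation output next to the RGB output at every resolution) come from the paper.

```python
import torch
import torch.nn as nn

class PoseEncoder(nn.Module):
    """Encodes a pose-keypoint heatmap into the 4x4 tensor that replaces
    StyleGAN2's learned constant input (hypothetical layout)."""
    def __init__(self, n_keypoints=17, channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_keypoints, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(128, channels, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(4),  # down to the generator's 4x4 starting resolution
        )

    def forward(self, pose_heatmap):      # (B, n_keypoints, H, W)
        return self.net(pose_heatmap)     # (B, channels, 4, 4)

class PoseConditionedGenerator(nn.Module):
    """StyleGAN2-like generator with the two changes described above:
    a pose-conditioned input, and a segmentation head at every resolution."""
    def __init__(self, style_blocks, rgb_heads, seg_heads):
        super().__init__()
        self.pose_encoder = PoseEncoder()
        self.blocks = nn.ModuleList(style_blocks)     # style-modulated conv blocks
        self.rgb_heads = nn.ModuleList(rgb_heads)     # the usual StyleGAN2 "toRGB" layers
        self.seg_heads = nn.ModuleList(seg_heads)     # added segmentation output layers

    def forward(self, pose_heatmap, styles):
        x = self.pose_encoder(pose_heatmap)           # pose replaces the constant input
        rgbs, segs = [], []
        for block, rgb, seg, w in zip(self.blocks, self.rgb_heads, self.seg_heads, styles):
            x = block(x, w)                           # one style-modulated block per resolution
            rgbs.append(rgb(x))                       # RGB image at this resolution
            segs.append(seg(x))                       # segmentation at this resolution
        return rgbs, segs
```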
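And here is a rough sketch of the try-on step itself, again under stated assumptions: the two images are assumed to have already been projected to per-layer style codes `w_person` and `w_garment`, the garment region-of-interest mask `m` is assumed to be given (in the paper it comes from the generated segmentations), and the interpolation scheme, loss weights, and optimizer settings are illustrative choices, not the authors' exact values.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

# Frozen VGG-16 features, used to compare garment shape and texture.
vgg = vgg16(weights="IMAGENET1K_V1").features.eval()

def try_on(generator, pose, w_person, w_garment, person_img, garment_img, m, steps=200):
    """Optimize per-layer interpolation coefficients q so the generated image
    takes its garment from `w_garment` and everything else from `w_person`.
    All names here are illustrative, not from the official code."""
    n_layers = w_person.shape[1]                      # style codes: (B, n_layers, 512)
    q_logits = torch.zeros(n_layers, requires_grad=True)
    opt = torch.optim.Adam([q_logits], lr=0.05)

    with torch.no_grad():                             # reference image of the person alone
        ref_img = generator(pose, w_person.unbind(1))[0][-1]

    for _ in range(steps):
        q = torch.sigmoid(q_logits).view(1, -1, 1)    # one coefficient in [0, 1] per layer
        w = q * w_garment + (1 - q) * w_person        # per-layer style interpolation
        img = generator(pose, w.unbind(1))[0][-1]     # highest-resolution RGB output

        # Editing-localization loss: discourage changes outside the garment region M.
        loss_loc = F.l1_loss((1 - m) * img, (1 - m) * ref_img)
        # Garment loss: match VGG-16 features of the garment areas of the two images.
        loss_garment = F.l1_loss(vgg(m * img), vgg(m * garment_img))
        # Identity loss: same procedure outside M, to preserve the person's identity.
        loss_id = F.l1_loss(vgg((1 - m) * img), vgg((1 - m) * person_img))

        loss = loss_loc + loss_garment + loss_id      # illustrative equal weighting
        opt.zero_grad()
        loss.backward()
        opt.step()

    return img
```

The design point worth noticing is that only the small vector of coefficients q is optimized at try-on time; the trained generator stays frozen, which is what keeps the person's pose, body shape, and identity intact while the garment styles are swapped in.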
Of course, this was just an overview of this new paper. I strongly invite you to read it for a better technical understanding; it is the first link in the description. Please leave a like if you made it this far in the video, and since over 80 percent of you are not subscribed yet, consider subscribing to the channel to not miss any further news. Thank you for watching!