Make GAN training easier for everyone by generating images that follow a sketch! Indeed, with this new method, you can control your GAN's outputs based on the simplest type of knowledge you could provide it: hand-drawn sketches.

Watch the video

Video Transcript

Machine learning models can now generate new images based on what they have seen in an existing set of images. We can't really say that the model is creative: even though each image is indeed new, the results are always heavily inspired by similar photos it has seen in the past. This type of architecture is called a generative adversarial network, or GAN. If you already know how GANs work, you can skip ahead in the video to the part where I cover what the researchers did. If not, I'll quickly go over how it works.

This powerful architecture basically takes a bunch of images and tries to imitate them. There are typically two networks: the generator and the discriminator. Their names are pretty informative... The generator tries to generate a new image, and the discriminator tries to discriminate such images. The training process goes as follows: the discriminator is shown either an image coming from our training dataset, which is our set of real images, or an image made by the generator, called a fake image. Then, the discriminator tries to say whether the image was real or fake. If it guessed "real" on a fake image, we say that the discriminator has been fooled, and we update its parameters to improve its detection ability for the next try. Conversely, if the discriminator guessed right, saying the image was fake, the generator is penalized and updated in the same way, improving the quality of the images it generates next. This process is repeated over and over until the discriminator is fooled about half the time, meaning that the generated images are very similar to what we have in our real dataset. The generated images now look like they were picked from our dataset, having the same style. (A minimal code sketch of this loop appears a bit further down.) If you'd like more details about how a generator and a discriminator model work and what they look like on the inside, I'd recommend watching one of the many videos I made covering them, like the one appearing in the top-right corner right now...

The problem here is that this process has been a black box for a while, and it is extremely difficult to train, especially when it comes to controlling what kind of images are generated. There has been a lot of progress in understanding which part of the generator network is responsible for what. Traditionally, building a model that gives you control over the generated images' style to produce what you want, like generating images of cats in a specific position, requires specialized knowledge in deep learning, engineering work, patience, and a lot of trial and error. It would also need a lot of manually curated image examples of what you aim to generate, and a good understanding of how the model works so you can adapt it correctly to your own needs. And you would have to repeat this process for any change you would like to make.
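Before getting to what the researchers changed, here is a minimal, self-contained sketch of the standard training loop described above, assuming a PyTorch-style setup. The tiny fully connected networks, the random stand-in "real" images, and all hyperparameters are illustrative assumptions only, not the networks used in any paper.

```python
# Minimal GAN training loop sketch (PyTorch). Network sizes, learning rates,
# and the random "real" images are placeholder assumptions for illustration.
import torch
import torch.nn as nn

latent_dim = 64

# Toy generator: maps random noise to a flat 32x32 grayscale "image".
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 32 * 32), nn.Tanh(),
)

# Toy discriminator: outputs a single "real vs. fake" logit per image.
discriminator = nn.Sequential(
    nn.Linear(32 * 32, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
)

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(real_images):
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # 1) Update the discriminator: reward it for calling real images real
    #    and generated (fake) images fake.
    noise = torch.randn(batch, latent_dim)
    fake_images = generator(noise).detach()
    d_loss = bce(discriminator(real_images), real_labels) + \
             bce(discriminator(fake_images), fake_labels)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Update the generator: reward it when the discriminator is fooled
    #    into labeling its fake images as real.
    noise = torch.randn(batch, latent_dim)
    g_loss = bce(discriminator(generator(noise)), real_labels)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Dummy "dataset" of random images, just so the loop runs end to end.
for step in range(100):
    real_batch = torch.rand(16, 32 * 32) * 2 - 1  # stand-in for real photos
    d_loss, g_loss = training_step(real_batch)
```

In a real setup the images would come from a dataset loader and the networks would be convolutional, but this alternating update of generator and discriminator is the core of the process described above.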
Instead, this new method by Sheng-Yu Wang et al. from Carnegie Mellon University and MIT, called Sketch Your Own GAN, can take an existing model, for example, a generator trained to generate new images of cats, and control the output based on the simplest type of knowledge you could provide it: hand-drawn sketches. Something anyone can do, making GAN training a lot more accessible. No more hard work and tweaking the model for hours to generate the cat in the position you wanted by figuring out which part of the model is in charge of which component of the image! How cool is that? It surely at least deserves a like on this video and sending it to your group chat! ;)

Of course, there's nothing special about generating a cat in a specific position, but imagine how powerful this can be. It can take a model trained to generate anything and, from a handful of sketches, control what will appear while preserving the other details and the same style! It is an architecture to re-train a generator model, encouraging it to produce images with the structure provided by the sketches while preserving the original model's diversity and the maximum image quality possible. This is also called fine-tuning a model, where you take a powerful existing model and adapt it to perform better on your task. Imagine you really wanted to build a gabled church but didn't know the colors or the specific architecture? Just send a sketch to the model and get infinite inspiration for your creation! Of course, this is still early research, and it will always follow the style of the dataset you used to train the generator, but still, the images are all *new* and can be surprisingly beautiful!

But how did they do that? What have they figured out about generative models that can be taken advantage of to control the output? There are various challenges for such a task, like the amount of data and the model expertise needed. The data problem is fixed by using a model that was already trained, which we simply adapt to our task using a handful of sketches instead of the hundreds or thousands of sketch-and-image pairs that are typically needed. To attack the expertise problem, instead of manually figuring out the changes to make to the model, they transform the generated image into a sketch representation using another model trained to do exactly that, called PhotoSketch. Then, the generator is trained similarly to a traditional GAN, but with two discriminators instead of one. The first discriminator is used to control the quality of the output, just like in a regular GAN architecture, following the same training process we described earlier. The second discriminator is trained to tell the difference between the generated sketches and the sketches made by the user, thus encouraging the generated images to match the user sketches' structure, similarly to how the first discriminator encourages the generated images to match the images in the initial training dataset. This way, the model figures out by itself which parameters to change to fit this new task of imitating the sketches, removing the model-expertise requirement to play with generative models.
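Here is a simplified, self-contained sketch of that two-discriminator idea, again assuming a PyTorch-style setup. It is not the authors' implementation: the tiny fully connected networks, the `photo_to_sketch` module standing in for PhotoSketch, and the dummy tensors are all illustrative placeholders, while the official code (linked in the references) works with a full StyleGAN2 generator and extra regularization to preserve quality and diversity.

```python
# Simplified sketch of fine-tuning a pretrained generator with two discriminators:
# one for image realism, one for matching user sketches. All modules and sizes
# below are placeholder assumptions, not the paper's actual networks.
import torch
import torch.nn as nn

latent_dim, img_dim = 64, 32 * 32

pretrained_generator = nn.Sequential(      # imagine this was already trained on cat photos
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, img_dim), nn.Tanh(),
)
photo_to_sketch = nn.Sequential(           # frozen image-to-sketch network (PhotoSketch's role)
    nn.Linear(img_dim, img_dim), nn.Tanh(),
)
for p in photo_to_sketch.parameters():
    p.requires_grad_(False)

d_image = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
d_sketch = nn.Sequential(nn.Linear(img_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

g_opt = torch.optim.Adam(pretrained_generator.parameters(), lr=1e-4)
d_opt = torch.optim.Adam(list(d_image.parameters()) + list(d_sketch.parameters()), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def finetune_step(real_images, user_sketches):
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)
    noise = torch.randn(batch, latent_dim)

    # --- Update both discriminators ---
    fake_images = pretrained_generator(noise).detach()
    fake_sketches = photo_to_sketch(fake_images)
    d_loss = (bce(d_image(real_images), ones) + bce(d_image(fake_images), zeros)          # image realism
              + bce(d_sketch(user_sketches), ones) + bce(d_sketch(fake_sketches), zeros)) # sketch match
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # --- Update the generator: fool both discriminators at once ---
    fake_images = pretrained_generator(noise)
    fake_sketches = photo_to_sketch(fake_images)
    g_loss = bce(d_image(fake_images), ones) + bce(d_sketch(fake_sketches), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

# Dummy tensors standing in for real photos and a handful of user sketches.
real_batch = torch.rand(8, img_dim) * 2 - 1
sketch_batch = torch.rand(8, img_dim) * 2 - 1
finetune_step(real_batch, sketch_batch)
```

The key point is that the generator receives gradients from both discriminators at once, so it is pushed toward images that still look like the original dataset while their sketch version matches the user's drawings.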
This field of research is exciting, allowing anyone to play with generative models and control the outputs. It is much closer to something that could be useful in the real world than the initial models, where you would need a lot of time, money, and expertise to build a model able to generate such images. Instead, from a handful of sketches anyone can draw, the resulting model can produce an infinite number of new images that resemble the input sketches, allowing many more people to play with these generative networks. Let me know what you think and whether this seems as exciting to you as it is to me! If you'd like more detail on this technique, I'd strongly recommend reading their paper, linked in the description below! Thank you for watching.

References
►Read the full article: https://www.louisbouchard.ai/make-gans-training-easier/
►Sheng-Yu Wang et al., "Sketch Your Own GAN", 2021: https://arxiv.org/pdf/2108.02774v1.pdf
►Project link: https://peterwang512.github.io/GANSketching/
►Code: https://github.com/PeterWang512/GANSketching
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/