Make GAN training easier for everyone by generating images that follow a sketch!
Indeed, with this new method, you can control your GAN's outputs based on the simplest type of knowledge you could provide it: hand-drawn sketches.
Machine learning models can now generate new images based on what they have seen in an existing set of images. We can't really say that the model is creative: even though each image is indeed new, the results are always heavily inspired by similar photos it has seen in the past. This type of architecture is called a generative adversarial network, or GAN.
If you already know how GANs work, you can skip ahead to see what the researchers did. If not, I'll quickly go over how it works. This powerful architecture basically takes a bunch of images and tries to imitate them.
There are typically two networks: the generator and the discriminator. Their names are pretty informative... The generator tries to generate new images, and the discriminator tries to discriminate between them. The training process goes as follows: the discriminator is shown either an image coming from our training dataset, our set of real images, or an image made by the generator, called a fake image. Then, the discriminator tries to say whether the image is real or fake.
If the discriminator guesses "real" when the image was actually fake, we say it has been fooled, and its parameters are updated to improve its detection ability for the next try. Conversely, if the discriminator correctly identifies the image as fake, the generator is penalized and updated in the same way, improving the quality of the images it will generate next. This process is repeated over and over until the discriminator is fooled about half the time, meaning that the generated images are very similar to what we have in our real dataset. The generated images now look like they were picked from our dataset, sharing the same style.
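To make this back-and-forth concrete, here is a minimal sketch of one standard GAN training step in PyTorch. The models, optimizers, and latent size are placeholders I've assumed for illustration; this is not the code from the paper discussed below.

```python
# Minimal GAN training step (an illustrative sketch, not any specific paper's code).
import torch
import torch.nn.functional as F

def gan_step(generator, discriminator, real_images, opt_g, opt_d, latent_dim=128):
    batch_size = real_images.size(0)
    device = real_images.device

    # --- Discriminator step: real images should score "real", fakes "fake" ---
    z = torch.randn(batch_size, latent_dim, device=device)
    fakes = generator(z).detach()                 # stop gradients flowing into the generator
    real_logits = discriminator(real_images)
    fake_logits = discriminator(fakes)
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator step: the generator is rewarded when the discriminator is fooled ---
    z = torch.randn(batch_size, latent_dim, device=device)
    fooled_logits = discriminator(generator(z))
    g_loss = F.binary_cross_entropy_with_logits(fooled_logits, torch.ones_like(fooled_logits))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    return d_loss.item(), g_loss.item()
```

At equilibrium, the discriminator's guesses on the fakes are no better than a coin flip, which is the "fooled half the time" point mentioned above.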
If you'd like more details about how the generator and discriminator models work and what they look like on the inside, I'd recommend watching one of the many videos I made covering them, like the one appearing in the top right corner right now...
The problem here is that this process has been a black box for a while, and it is extremely difficult to train, especially when it comes to controlling what kind of images are generated. There has been a lot of progress in understanding which parts of the generator network are responsible for what. Still, traditionally, building a model where you control the generated images' style to produce what you want, like generating images of cats in a specific position, requires specialized knowledge in deep learning, engineering work, patience, and a lot of trial and error. It also requires a lot of manually curated image examples of what you aim to generate, and a great understanding of how the model works to adapt it correctly to your needs. And you have to repeat this process for every change you want to make.
Instead, this new method by Sheng-Yu Wang et al. from Carnegie Mellon University and MIT, called Sketch Your Own GAN, can take an existing model, for example, a generator trained to generate new images of cats, and control the output based on the simplest type of knowledge you could provide it: hand-drawn sketches. Something anyone can do, making GAN training a lot more accessible. No more hours of hard work and model tweaking to generate the cat in the position you wanted by figuring out which part of the model is in charge of which component in the image! How cool is that? It surely at least deserves a like on this video and a share with your group chat! ;)
Of course, there's nothing special about generating a cat in a specific position, but imagine how powerful this can be. It can take a model trained to generate anything and, from a handful of sketches, control what will appear while preserving the other details and the same style! It is an architecture that re-trains a generator model, encouraging it to produce images with the structure provided by the sketches while preserving the original model's diversity and as much image quality as possible. This is also called fine-tuning a model: you take a powerful existing model and adapt it to perform better on your task.
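As a minimal illustration of what fine-tuning means in practice, the snippet below loads a pretrained generator and continues training it with a small learning rate. The tiny placeholder network and the checkpoint path are hypothetical, not from the official repository; a real generator such as StyleGAN2 is far larger.

```python
# A tiny, generic fine-tuning setup (illustrative only).
import torch
import torch.nn as nn

# Stand-in generator: a real one (e.g. StyleGAN2) is far larger and convolutional.
generator = nn.Sequential(
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 3 * 64 * 64), nn.Tanh(),
)

# Start from pretrained weights instead of training from scratch
# ("pretrained_cat_generator.pt" is a hypothetical checkpoint path).
generator.load_state_dict(torch.load("pretrained_cat_generator.pt"))

# A small learning rate keeps updates gentle, so the model adapts to the new
# objective while preserving most of its original diversity and image quality.
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)
```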
Imagine you really wanted to build a gabled church but didn't know the colors or the specific architecture? Just send a sketch to the model and get infinite inspiration for your creation! Of course, this is still early research, and the results will always follow the style of the dataset used to train the generator, but still, the images are all *new* and can be surprisingly beautiful! But how did they do that? What have they figured out about generative models that can be taken advantage of to control the output?
There are two main challenges for such a task: the amount of data and the model expertise needed. The data problem is fixed by starting from a model that has already been trained, which we simply adapt to our task using a handful of sketches instead of the hundreds or thousands of sketch-image pairs typically needed. To attack the expertise problem, instead of manually figuring out which changes to make to the model, they transform each generated image into a sketch representation using another model trained to do exactly that, called PhotoSketch.
Then, the generator is trained similarly to a traditional GAN, but with two discriminators instead of one. The first discriminator controls the quality of the output, just like in a regular GAN architecture, following the same training process we described earlier. The second discriminator is trained to tell the difference between the generated sketches and the sketches made by the user, encouraging the generated images to match the structure of the user's sketches, similar to how the first discriminator encourages the generated images to match the images in the initial training dataset. This way, the model figures out by itself which parameters to change to fit this new task of imitating the sketches, removing the model-expertise requirement for playing with generative models.
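Here is a rough sketch of what one such fine-tuning step with two discriminators could look like. The names `photosketch`, `d_image`, and `d_sketch`, as well as the loss weight, are assumptions for illustration; the authors' actual implementation is in the code repository linked below.

```python
# Illustrative two-discriminator fine-tuning step (a sketch under assumptions,
# not the authors' code). `photosketch` stands in for a frozen, pretrained
# image-to-sketch network; `d_image` and `d_sketch` are the two discriminators.
import torch
import torch.nn.functional as F

bce = F.binary_cross_entropy_with_logits  # logits -> real/fake loss

def finetune_step(generator, d_image, d_sketch, photosketch,
                  real_images, user_sketches, opt_g, opt_d,
                  latent_dim=128, image_weight=0.7):
    device = real_images.device
    z = torch.randn(real_images.size(0), latent_dim, device=device)
    fakes = generator(z)
    fake_sketches = photosketch(fakes)  # map generated images to sketch space

    # --- Discriminator step: real vs. generated images, user vs. generated sketches ---
    real_logits = d_image(real_images)
    fake_logits = d_image(fakes.detach())
    user_sk_logits = d_sketch(user_sketches)
    fake_sk_logits = d_sketch(fake_sketches.detach())
    d_loss = (bce(real_logits, torch.ones_like(real_logits))
              + bce(fake_logits, torch.zeros_like(fake_logits))
              + bce(user_sk_logits, torch.ones_like(user_sk_logits))
              + bce(fake_sk_logits, torch.zeros_like(fake_sk_logits)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # --- Generator step: try to fool both discriminators at once ---
    fakes = generator(torch.randn_like(z))
    img_logits = d_image(fakes)
    sk_logits = d_sketch(photosketch(fakes))
    g_loss = (bce(sk_logits, torch.ones_like(sk_logits))                       # match sketch structure
              + image_weight * bce(img_logits, torch.ones_like(img_logits)))   # keep image quality/style
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

The key idea is that the sketch discriminator steers the structure toward the user's drawings, while the image discriminator keeps the output anchored to the original dataset's quality and style.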
This field of research is exciting, allowing anyone to play with generative models and control their outputs. It is much closer to something that could be useful in the real world than the initial models, for which you would need a lot of time, money, and expertise to build something able to generate such images. Instead, from a handful of sketches anyone can draw, the resulting model can produce an infinite number of new images that resemble the input sketches, allowing many more people to play with these generative networks. Let me know what you think and whether this seems as exciting to you as it is to me! If you'd like more detail on this technique, I'd strongly recommend reading their paper, linked in the description below! Thank you for watching.
►Read the full article: https://www.louisbouchard.ai/make-gans-training-easier/
►Sheng-Yu Wang et al., "Sketch Your Own GAN", 2021, https://arxiv.org/pdf/2108.02774v1.pdf
►Project link: https://peterwang512.github.io/GANSketching/
►Code: https://github.com/PeterWang512/GANSketching
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/