NVIDIA ADA: Train Your GAN With 1/10th of the Data

Written by whatsai | Published 2021/05/07
Tech Story Tags: gan | gans | nvidia | artificial-intelligence | machine-learning | deep-learning | youtube-transcripts | hackernoon-top-story | web-monetization

TLDR: NVIDIA ADA lets you train your GAN with 1/10th of the data. You can train a powerful generative model with one-tenth of the images, which gives us the ability to create many computer vision applications even when we do not have access to that many images. The paper covered (https://arxiv.org/abs/2006.06676) presents a technique, adaptive discriminator augmentation, for training a GAN architecture with limited data.

With this new training method developed by NVIDIA, you can train a powerful generative model with one-tenth of the images! This gives us the ability to create many computer vision applications, even when we do not have access to so many images.

Watch the video

References

The complete article: https://www.louisbouchard.ai/nvidia-ada/
The paper covered:► https://arxiv.org/abs/2006.06676
GitHub with code:► https://github.com/NVlabs/stylegan2-ada
What are GANs ? | Introduction to Generative Adversarial Networks | Face Generation & Editing:► https://youtu.be/ZnpZsiy_p2M

Video Transcript

These types of models that generate such realistic images in different applications are called GANs, and they typically need thousands and thousands of training images. Now, with this new training method developed by NVIDIA, you can have the same results with ten times fewer images, making possible many applications that do not have access to that many images. Let's see how they achieve that.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing so you don't miss any further news.

This new paper covers a technique for training a GAN architecture. GANs are used in many applications related to computer vision where we want to generate a realistic transformation of an image following a specific style. If you are not familiar with how GANs work, I definitely recommend you watch the video I made explaining them before continuing with this one.

As you know, GAN architectures are trained in an adversarial way, meaning that there are two networks training at the same time: one training to generate a transformed image from the input, the generator, and the other one training to differentiate the generated images from the training images, the ground truths. These training-image ground truths are just the transformation result we would like to achieve for each input image. We then try to optimize both networks at the same time, thus making the generator better and better at generating images that look real.
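To make that adversarial setup concrete, here is a minimal sketch of such a training loop in PyTorch. It assumes a generator `G`, a discriminator `D`, their optimizers, and a batch of real images already exist; it illustrates the general idea, not NVIDIA's implementation.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real_images, latent_dim=512):
    batch_size = real_images.size(0)

    # --- Train the discriminator: push real images toward 1, generated toward 0 ---
    z = torch.randn(batch_size, latent_dim)
    fake_images = G(z).detach()  # detach so only D is updated in this step
    d_real = D(real_images)
    d_fake = D(fake_images)
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # --- Train the generator: try to make D classify its images as real ---
    z = torch.randn(batch_size, latent_dim)
    d_fake = D(G(z))
    loss_G = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    return loss_D.item(), loss_G.item()
```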
But in order to produce these great, realistic results, we need two things: a training dataset composed of thousands and thousands of images, and stopping the training before overfitting. Overfitting during GAN training means that our discriminator's feedback becomes meaningless and the generated images only get worse. It happens past a certain point, when you train your network too much for your amount of data, and the quality only degrades, as you can see happening here after the black dots.

These are the problems NVIDIA tackled with this paper. They realized that these are basically the same problem and could be solved with one solution: they proposed a method they called adaptive discriminator augmentation (ADA).
Their approach is quite simple in theory, and you can apply it to any GAN architecture you already have without changing anything. As you may know, in most areas of deep learning we perform what we call data augmentation to fight against overfitting. In computer vision, it often takes the form of applying transformations to the images during the training phase to multiply our quantity of training data. These transformations can be anything, from applying a rotation, adding noise, changing the colors, etc., to modify our input image and create a unique version of it, making our network train on a far more diverse dataset without having to create or find more images.
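As a rough illustration, a classical augmentation pipeline could look like the following torchvision snippet; the specific transformations and parameters are only examples, not the ones used in the paper.

```python
from torchvision import transforms

# Example-only augmentation pipeline: each training image is randomly
# rotated, color-jittered, flipped, and slightly blurred before being used.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.GaussianBlur(kernel_size=3),
    transforms.ToTensor(),
])
```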
Unfortunately, this cannot be easily applied to a GAN architecture, since the generator will learn to generate images following these same augmentations.

This is what NVIDIA's team has done: they found a way to use these augmentations to prevent the model from overfitting while ensuring that none of the transformations leak onto the generated images. They basically apply this set of image augmentations to all images shown to the discriminator, with a chosen probability for each transformation to randomly occur, and evaluate the discriminator's performance using these modified images. This high number of transformations, all applied randomly, makes it very unlikely that the discriminator ever sees an unchanged image. Of course, the generator is trained and guided to generate only clean images, without any transformations.
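Sketched in Python, the core idea could look roughly like this; `augment_pipeline` is a placeholder for the paper's set of differentiable augmentations applied with probability `p`, and none of these names come from NVIDIA's code.

```python
import torch
import torch.nn.functional as F

def d_loss_with_ada(G, D, real_images, augment_pipeline, p, latent_dim=512):
    """Discriminator loss where BOTH real and generated images pass through
    the same stochastic augmentations (each applied with probability p)
    before the discriminator ever sees them. Illustrative sketch only."""
    z = torch.randn(real_images.size(0), latent_dim)
    fake_images = G(z).detach()

    # The discriminator only ever sees augmented images.
    d_real = D(augment_pipeline(real_images, p))
    d_fake = D(augment_pipeline(fake_images, p))
    return F.softplus(-d_real).mean() + F.softplus(d_fake).mean()

def g_loss_with_ada(G, D, augment_pipeline, p, batch_size, latent_dim=512):
    # The generator is also judged through the augmented lens (so the
    # augmentations must be differentiable), but it is never rewarded for
    # producing augmented-looking images.
    z = torch.randn(batch_size, latent_dim)
    d_fake = D(augment_pipeline(G(z), p))
    return F.softplus(-d_fake).mean()
```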
They concluded that this method of training a GAN architecture with augmented data shown to the discriminator only works if each transformation's occurrence probability stays below 80%. The higher it is, the more augmentations will be applied, and thus the more diverse a training dataset you will have. They found that while this was solving the question of a limited amount of training images, there was still the overfitting issue, which appeared at different times depending on your initial dataset size. This is why they thought of an adaptive way of doing the augmentation.
Instead of having yet another hyperparameter to decide, the ideal augmentation probability, they control the augmentation strength during training, starting at 0 and then adjusting its value iteratively based on the difference between the training and validation sets, which indicates whether overfitting is happening or not. This validation set is just a different set of the same type of images that the network is not trained on. The validation set only needs to be made of images that the discriminator hasn't seen before. It is used to measure the quality of our results and quantify how much our network diverges, quantifying overfitting at the same time.
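A heavily simplified sketch of this adaptive control might look like the following; the overfitting heuristic, target value, and adjustment step here are placeholders, not the exact quantities derived in the paper.

```python
def update_augmentation_probability(p, overfitting_estimate, target=0.6,
                                    adjustment=0.01, p_max=0.8):
    """Nudge the augmentation probability p every few iterations.

    `overfitting_estimate` stands in for a scalar heuristic of how much the
    discriminator is overfitting (e.g. computed from its outputs on training
    vs. validation images); `target` and `adjustment` are illustrative values.
    """
    if overfitting_estimate > target:
        p = min(p + adjustment, p_max)   # overfitting: augment more
    else:
        p = max(p - adjustment, 0.0)     # not overfitting: augment less
    return p
```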
Here you can see the results of this adaptive discriminator augmentation for multiple training set sizes on the FFHQ dataset. Here we use the FID measure, which you can see getting better and better over time, never reaching the overfitting regime where it would only start to get worse.

The FID, or Fréchet Inception Distance, is basically a measure of the distance between the distributions of generated and real images. It measures the quality of generated image samples: the lower it is, the better our results.
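For reference, FID compares real and generated images through the feature statistics of an Inception network: with feature means μr, μg and covariances Σr, Σg for the real and generated distributions, it is computed as

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```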
The FFHQ dataset contains 70,000 high-quality faces taken from Flickr. It was created as a benchmark for generative adversarial networks. And indeed, they successfully matched StyleGAN2's results with an order of magnitude fewer images used, as you can see here, where the results are plotted for 1,000 to 140,000 training examples, using again this same FID measure on the FFHQ dataset.
Of course, the code is also completely available and easy to plug into your GAN architecture using TensorFlow. Both the code and the paper are linked in the description if you would like to implement this in your own code or gain a deeper understanding of the technique by reading the paper.

This paper was just published at NeurIPS 2020, along with another announcement by NVIDIA: a new program called the Applied Research Accelerator Program. Their goal here is to support research projects that make a real-world impact through deployment into GPU-accelerated applications adopted by commercial and government organizations, granting hardware, funding, technical guidance, support, and more to researchers. You should definitely give it a look if that fits your current needs; I linked it in the description of the video as well.
Please leave a like if you made it this far into the video, and since over 90 percent of you watching are not subscribed yet, consider subscribing to the channel to not miss any further news, clearly explained. Thank you for watching!



Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/05/07