NVIDIA ADA: Train Your GAN With 1/10th of the Data

Written by whatsai | Published 2021/05/07
Tech Story Tags: gan | gans | nvidia | artificial-intelligence | machine-learning | deep-learning | youtube-transcripts | hackernoon-top-story | web-monetization

TLDR: NVIDIA ADA lets you train your GAN with 1/10th of the data. You can train a powerful generative model with one-tenth of the images, which gives us the ability to create many computer vision applications even when we do not have access to that many images. The paper covered (https://arxiv.org/abs/2006.06676) presents a technique, adaptive discriminator augmentation, for training a GAN architecture with limited data.

With this new training method developed by NVIDIA, you can train a powerful generative model with one-tenth of the images! This gives us the ability to create many computer vision applications, even when we do not have access to so many images.

Watch the video

References

The complete article: https://www.louisbouchard.ai/nvidia-ada/
The paper covered:► https://arxiv.org/abs/2006.06676
GitHub with code:► https://github.com/NVlabs/stylegan2-ada
What are GANs ? | Introduction to Generative Adversarial Networks | Face Generation & Editing:► https://youtu.be/ZnpZsiy_p2M

Video Transcript

These types of models that generate such realistic images in different applications are called GANs, and they typically need thousands and thousands of training images. Now, with this new training method developed by NVIDIA, you can have the same results with ten times fewer images, making possible many applications that do not have access to that many images. Let's see how they achieve that.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing so you don't miss any further news.

This new paper covers a technique for training a GAN architecture. GANs are used in many applications related to computer vision where we want to generate a realistic transformation of an image following a specific style. If you are not familiar with how GANs work, I definitely recommend you watch the video I made explaining them before continuing with this one.

As you know, GAN architectures are trained in an adversarial way, meaning that there are two networks training at the same time: one training to generate a transformed image from the input, the generator, and the other one training to differentiate the generated images from the training images, the ground truths. These training-image ground truths are just the transformation result we would like to achieve for each input image. We then try to optimize both networks at the same time, thus making the generator better and better at generating images that look real.
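To make that adversarial setup concrete, here is a minimal sketch of such a training loop in PyTorch. It assumes a generator `G`, a discriminator `D`, their optimizers, and a batch of real images already exist; it illustrates the general idea, not NVIDIA's implementation.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, real_images, latent_dim=512):
    batch_size = real_images.size(0)

    # --- Train the discriminator: push real images toward 1, generated toward 0 ---
    z = torch.randn(batch_size, latent_dim)
    fake_images = G(z).detach()  # detach so only D is updated in this step
    d_real = D(real_images)
    d_fake = D(fake_images)
    loss_D = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # --- Train the generator: try to make D classify its images as real ---
    z = torch.randn(batch_size, latent_dim)
    d_fake = D(G(z))
    loss_G = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()

    return loss_D.item(), loss_G.item()
```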
But in order to produce these great, realistic results, we need two things: a training dataset composed of thousands and thousands of images, and stopping the training before overfitting. Overfitting during GAN training means that our discriminator's feedback becomes meaningless and the generated images only get worse. It happens past a certain point, when you train your network too much for your amount of data, and the quality only degrades, as you can see happening here after the black dots.

These are the problems NVIDIA tackled with this paper. They realized that these are basically the same problem and could be solved with one solution: they proposed a method they called adaptive discriminator augmentation (ADA).
Their approach is quite simple in theory, and you can apply it to any GAN architecture you already have without changing anything. As you may know, in most areas of deep learning we perform what we call data augmentation to fight against overfitting. In computer vision, it often takes the form of applying transformations to the images during the training phase to multiply our quantity of training data. These transformations can be anything, from applying a rotation, adding noise, changing the colors, etc., to modify our input image and create a unique version of it, making our network train on a far more diverse dataset without having to create or find more images.
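As a rough illustration, a classical augmentation pipeline could look like the following torchvision snippet; the specific transformations and parameters are only examples, not the ones used in the paper.

```python
from torchvision import transforms

# Example-only augmentation pipeline: each training image is randomly
# rotated, color-jittered, flipped, and slightly blurred before being used.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.GaussianBlur(kernel_size=3),
    transforms.ToTensor(),
])
```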
Unfortunately, this cannot be easily applied to a GAN architecture, since the generator will learn to generate images following these same augmentations.

This is what NVIDIA's team has done: they found a way to use these augmentations to prevent the model from overfitting while ensuring that none of the transformations leak onto the generated images. They basically apply this set of image augmentations to all images shown to the discriminator, with a chosen probability for each transformation to randomly occur, and evaluate the discriminator's performance using these modified images. This high number of transformations, all applied randomly, makes it very unlikely that the discriminator ever sees an unchanged image. Of course, the generator is trained and guided to generate only clean images, without any transformations.
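Sketched in Python, the core idea could look roughly like this; `augment_pipeline` is a placeholder for the paper's set of differentiable augmentations applied with probability `p`, and none of these names come from NVIDIA's code.

```python
import torch
import torch.nn.functional as F

def d_loss_with_ada(G, D, real_images, augment_pipeline, p, latent_dim=512):
    """Discriminator loss where BOTH real and generated images pass through
    the same stochastic augmentations (each applied with probability p)
    before the discriminator ever sees them. Illustrative sketch only."""
    z = torch.randn(real_images.size(0), latent_dim)
    fake_images = G(z).detach()

    # The discriminator only ever sees augmented images.
    d_real = D(augment_pipeline(real_images, p))
    d_fake = D(augment_pipeline(fake_images, p))
    return F.softplus(-d_real).mean() + F.softplus(d_fake).mean()

def g_loss_with_ada(G, D, augment_pipeline, p, batch_size, latent_dim=512):
    # The generator is also judged through the augmented lens (so the
    # augmentations must be differentiable), but it is never rewarded for
    # producing augmented-looking images.
    z = torch.randn(batch_size, latent_dim)
    d_fake = D(augment_pipeline(G(z), p))
    return F.softplus(-d_fake).mean()
```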
They concluded that this method of training a GAN architecture with augmented data shown to the discriminator only works if each transformation's occurrence probability stays below 80%. The higher it is, the more augmentations will be applied, and thus the more diverse a training dataset you will have. They found that while this was solving the question of a limited amount of training images, there was still the overfitting issue, which appeared at different times depending on your initial dataset size. This is why they thought of an adaptive way of doing the augmentation.
Instead of having yet another hyperparameter to decide, the ideal augmentation probability, they control the augmentation strength during training, starting at 0 and then adjusting its value iteratively based on the difference between the training and validation sets, which indicates whether overfitting is happening or not. This validation set is just a different set of the same type of images that the network is not trained on. The validation set only needs to be made of images that the discriminator hasn't seen before. It is used to measure the quality of our results and quantify how much our network diverges, quantifying overfitting at the same time.
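A heavily simplified sketch of this adaptive control might look like the following; the overfitting heuristic, target value, and adjustment step here are placeholders, not the exact quantities derived in the paper.

```python
def update_augmentation_probability(p, overfitting_estimate, target=0.6,
                                    adjustment=0.01, p_max=0.8):
    """Nudge the augmentation probability p every few iterations.

    `overfitting_estimate` stands in for a scalar heuristic of how much the
    discriminator is overfitting (e.g. computed from its outputs on training
    vs. validation images); `target` and `adjustment` are illustrative values.
    """
    if overfitting_estimate > target:
        p = min(p + adjustment, p_max)   # overfitting: augment more
    else:
        p = max(p - adjustment, 0.0)     # not overfitting: augment less
    return p
```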
Here you can see the results of this adaptive discriminator augmentation for multiple training set sizes on the FFHQ dataset. Here we use the FID measure, which you can see getting better and better over time, never reaching the overfitting regime where it would only start to get worse.

The FID, or Fréchet Inception Distance, is basically a measure of the distance between the distributions of generated and real images. It measures the quality of generated image samples: the lower it is, the better our results.
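For reference, FID compares real and generated images through the feature statistics of an Inception network: with feature means μr, μg and covariances Σr, Σg for the real and generated distributions, it is computed as

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
             + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)
```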
The FFHQ dataset contains 70,000 high-quality faces taken from Flickr. It was created as a benchmark for generative adversarial networks. And indeed, they successfully matched StyleGAN2's results with an order of magnitude fewer images used, as you can see here, where the results are plotted for 1,000 to 140,000 training examples, using again this same FID measure on the FFHQ dataset.
Of course, the code is also completely available and easy to plug into your GAN architecture using TensorFlow. Both the code and the paper are linked in the description if you would like to implement this in your own code or gain a deeper understanding of the technique by reading the paper.

This paper was just published at NeurIPS 2020, along with another announcement by NVIDIA: a new program called the Applied Research Accelerator Program. Their goal here is to support research projects that make a real-world impact through deployment into GPU-accelerated applications adopted by commercial and government organizations, granting hardware, funding, technical guidance, support, and more to researchers. You should definitely give it a look if that fits your current needs; I linked it in the description of the video as well.
Please leave a like if you made it this far into the video, and since over 90 percent of you watching are not subscribed yet, consider subscribing to the channel to not miss any further news, clearly explained. Thank you for watching!



Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/05/07