DALL·E 2 Pre-Training Mitigations

by Louis Bouchard, July 18th, 2022

Too Long; Didn't Read

Most artificial intelligence models aren’t open-source, which means regular people like us cannot use them freely. This is what we will dive into in this video. The most well-known, DALL·E 2, can generate images from random prompts, but the data used to train such models comes from more or less random images scraped from the internet. We will look at the risks OpenAI is trying to mitigate and how they filter violent and sexual images out of the training data.

You’ve all seen amazing-looking images like these, entirely generated by an artificial intelligence model. I covered multiple approaches on my channel, like Craiyon, Imagen, and the most well-known, DALL·E 2.

Most people want to try them and generate images from random prompts, but the majority of these models aren’t open-source, which means regular people like us cannot use them freely. Why? This is what we will dive into in this video...

References

►Read the full article: https://www.louisbouchard.ai/how-openai-reduces-risks-for-dall-e-2/
►OpenAI's article: https://openai.com/blog/dall-e-2-pre-training-mitigations/
►Dalle 2 video:
►Craiyon's video:
►Use Craiyon: https://www.craiyon.com/
►My Daily Newsletter: https://www.getrevue.co/profile/whats_ai

Video Transcript

You've all seen amazing-looking images like these, entirely generated by an artificial intelligence model. I covered multiple approaches on my channel, like Craiyon, Imagen, and the most well-known, DALL·E 2. Most people want to try them and generate images from random prompts, but the majority of these models aren't open source, which means regular people like us cannot use them freely. Why? This is what we will dive into in this video.

I said most of them were not open source; well, Craiyon is, and people have generated amazing memes using it. You can see how such a model can become dangerous, allowing anyone to generate anything. The risk lies not only in possible misuses of the generations but also in the data used to train such models, which comes from more or less random images on the internet, pretty much anything, with questionable content that can produce some unexpected images. The training data could also be retrieved through reverse engineering of the model, which is most likely unwanted. OpenAI also used this to justify not releasing the DALL·E 2 model to the public.

Here we will look into what they are investigating as potential risks and how they are trying to mitigate them. I go through a very interesting article they wrote covering their data pre-processing steps when training DALL·E 2. But before that, allow me a few seconds to be my own sponsor and share my most recent project, which might interest you. I recently created a daily newsletter sharing AI news and research, with a simple and clear one-liner to tell you whether the paper, code, or news is worth your time. You can subscribe to it on LinkedIn or with your email; the link is in the description below.

So what does OpenAI really have in mind when they say that they are making efforts to reduce risks?

First, and the most obvious one: they are filtering out violent and sexual images from the hundreds of millions of images scraped from the internet. This is to prevent the model from learning how to produce violent and sexual content, or even from returning the original images as generations. It's like not teaching your kid how to fight if you don't want them to get into fights: it might help, but it's far from a perfect fix. Still, I believe it's necessary to have such filters in our datasets, and it definitely helps in this case. But how do they do that exactly? They build several models trained to classify whether an image should be filtered or not, giving them a few different positive and negative examples and iteratively improving the classifiers with human feedback. Each classifier went through the whole dataset, deleting more images than needed, just in case, as it's much better for the model never to see bad data in the first place than to try to correct it afterward. Each classifier has a unique understanding of which content to filter, and together they complement each other, ensuring good filtering, if by good we mean no false negatives going through the filtering process.
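The sketch below illustrates this kind of iterative, human-in-the-loop filtering loop: a small labeled seed set trains a binary classifier over image embeddings, the most uncertain images from the unlabeled pool are sent out for human labels, and the final filter uses a deliberately low threshold so it errs on the side of removing too much. Everything here (the random stand-in embeddings, logistic regression, the 0.2 threshold) is an illustrative assumption, not OpenAI's actual pipeline.

```python
# Minimal sketch of the iterative "filter classifier" loop described above.
# All names, thresholds, and the use of logistic regression on image embeddings
# are illustrative assumptions, not OpenAI's actual implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Stand-ins for image embeddings: a small labeled seed set (1 = should be filtered)
# and a large pool of unlabeled images.
seed_X = rng.normal(size=(200, 512))
seed_y = rng.integers(0, 2, size=200)
pool_X = rng.normal(size=(10_000, 512))

clf = LogisticRegression(max_iter=1000)

for round_idx in range(3):                      # a few active-learning rounds
    clf.fit(seed_X, seed_y)
    probs = clf.predict_proba(pool_X)[:, 1]     # P(image should be filtered)

    # Send the most uncertain images to human reviewers and fold their labels
    # back into the training set (the "human" labels are simulated here).
    uncertain = np.argsort(np.abs(probs - 0.5))[:100]
    human_labels = rng.integers(0, 2, size=len(uncertain))
    seed_X = np.vstack([seed_X, pool_X[uncertain]])
    seed_y = np.concatenate([seed_y, human_labels])

# Filter conservatively: a low threshold removes more images than strictly
# necessary, trading extra false positives for fewer false negatives.
keep_mask = clf.predict_proba(pool_X)[:, 1] < 0.2
filtered_pool = pool_X[keep_mask]
print(f"kept {keep_mask.sum()} of {len(pool_X)} images")
```

The conservative threshold at the end is the code-level version of "deleting more images than needed, just in case."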

Still, it comes with downsides. First, the dataset is clearly smaller and may not accurately represent the real world, which may be good or bad depending on the use case. They also found an unexpected side effect of this data filtering process: it amplified the model's biases towards certain demographics, introducing the second thing OpenAI is doing as a pre-training mitigation: reducing the biases caused by this filtering. For example, one of the biases they noticed after filtering was that the model generated more images of men and fewer of women compared to models trained on the original dataset. They explain that one of the reasons may be that women appear more often than men in sexual content, which may bias their classifiers toward removing more false-negative images containing women from the dataset, creating a gap in the gender ratio that the model observes in training and replicates.

To fix that, they re-weight the filtered dataset to match the distribution of the initial pre-filter dataset. Here is an example they cover using cats and dogs, where the filter removes more dogs than cats: the fix is to double the training loss for images of dogs, which is like sending two images of dogs instead of one, compensating for the missing images. This is, once again, just a proxy for the actual filtering bias, but it still reduces the image distribution gap between the pre-filtered and the filtered dataset.
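The cats-and-dogs example can be written down directly: weight each surviving category by how much of it the filter removed, so the weighted distribution matches the pre-filter one. Here is a minimal sketch with made-up counts, using the per-category weighting rule as stated in the video (OpenAI's actual scheme is more involved):

```python
# Minimal sketch of the re-weighting idea using the cats-and-dogs example above.
# The counts and the weighting rule (pre-filter count / post-filter count per
# category) are illustrative assumptions, not OpenAI's exact procedure.
from collections import Counter

pre_filter_labels  = ["cat"] * 1000 + ["dog"] * 1000   # distribution before filtering
post_filter_labels = ["cat"] * 900  + ["dog"] * 500    # the filter removed more dogs

pre_counts  = Counter(pre_filter_labels)
post_counts = Counter(post_filter_labels)

# Weight each surviving image so the weighted distribution matches the original one.
# Dogs lost half their images, so each remaining dog image counts twice in the loss.
weights = {label: pre_counts[label] / post_counts[label] for label in post_counts}
print(weights)  # {'cat': ~1.11, 'dog': 2.0}

def weighted_loss(per_image_losses, labels):
    """Scale each image's training loss by its category weight."""
    return sum(loss * weights[label] for loss, label in zip(per_image_losses, labels))
```

Doubling a category's loss has roughly the same effect on training as duplicating its images, which is exactly the intuition given in the video.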

The last issue is an issue of memorization, something these models seem to be much better at than I am. As we said, it's possible to regurgitate the training data from such image generation models, which is not wanted in most cases. Here, we want to generate novel images, not simply copy-paste images from the internet. But how can we prevent that? Just like with our own memory, you cannot really decide what you remember and what goes away; once you see something, it either sticks or it doesn't. They found that, just like humans learning a new concept, if the model sees the same image numerous times in the dataset, it may accidentally know it by heart at the end of its training and generate it exactly for a similar or identical text prompt. This one has an easy and reliable fix: just find out which images are too similar and delete the duplicates. Easy? Doing this naively would mean comparing each image with every other image, hundreds of quadrillions of image pairs to compare. Instead, they simply start by grouping similar images together and then compare each image only with the images within the same cluster and a few other clusters around it, immensely reducing the complexity while still finding 97% of all duplicate pairs. Again, another fix to do within the dataset before training our DALL·E model.

OpenAI also mentions some next steps they are investigating, and if you've enjoyed this video, I definitely invite you to read their in-depth article to see all the details of this pre-training mitigation work; it's a very interesting and well-written article. Let me know what you think of their mitigation efforts and of their choice to limit the model's access to the public. Leave a comment or join the discussion in our community on Discord. Thank you for watching until the end, and I will see you next week with another amazing paper!