OpenAI's New Model is Amazing! DALL·E 2 Explained Simply by@whatsai


April 7th 2022
Last year I shared DALL·E, an amazing model by OpenAI capable of generating images from a text input with incredible results. Now it's time for its big brother, DALL·E 2, which is four times better at generating photorealistic images from text. The recent model also learned a new skill: image inpainting. It can edit those images and make them look even better, or simply add a feature you want, like some flamingos in the background. Learn more in the video!

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.


Last year I shared DALL·E, an amazing model by OpenAI capable of generating images from a text input with incredible results. Now it's time for its big brother, DALL·E 2. And you won't believe the progress in a single year! DALL·E 2 is not only better at generating photorealistic images from text: the results are four times the resolution!

As if it wasn’t already impressive enough, the recent model learned a new skill; .

DALL·E could generate images from text inputs.

DALL·E 2 can do it better, but it doesn't stop there. It can also edit those images and make them look even better, or simply add a feature you want, like some flamingos in the background.

Sound interesting? Learn more in the video!

References

►Read the full article: https://www.louisbouchard.ai/openais-new-model-dall-e-2-is-amazing/
►A. Ramesh et al., 2022, DALL-E 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf
►OpenAI's blog post: https://openai.com/dall-e-2
►Risks and limitations: https://github.com/openai/dalle-2-preview/blob/main/system-card.md
►OpenAI's DALL·E Instagram page: https://www.instagram.com/openaidalle/
►My newsletter (a new AI application explained weekly in your inbox!): https://www.louisbouchard.ai/newsletter/

Video Transcript

Last year I shared DALL·E, an amazing model by OpenAI capable of generating images from a text input with incredible results. Now it's time for its big brother, DALL·E 2. And you won't believe the progress in a single year! DALL·E 2 is not only better at generating photorealistic images from text; the results are four times the resolution. As if it wasn't already impressive enough, the recent model learned a new skill: image inpainting.

DALL·E could generate images from text inputs. DALL·E 2 can do it better, but it doesn't stop there. It can also edit those images and make them look even better, or simply add a feature you want, like some flamingos in the background. This is what image inpainting is: we take a part of an image and replace it with something else, following the style and reflections in the image, keeping it realistic. Of course, it doesn't just replace a part of the image at random; that would be too easy for OpenAI. This inpainting process is also text-guided, which means you can tell it to add a flamingo here, there, or even there.

Before diving into the nitty-gritty of this newest DALL·E model, let me talk a little about this episode's sponsor: Weights & Biases. If you are not familiar with Weights & Biases, you are most certainly new here and should definitely subscribe to the channel. Weights & Biases allows you to keep track of all your experiments with only a handful of lines added to your code. One feature I love is how you can quickly create and share amazing-looking interactive reports like this one, clearly showing your team or future self your runs, metrics, hyperparameters, and data configurations, alongside any notes you or your team had at the time. It's a powerful feature for either adding quick comments on an experiment or creating polished pieces of analysis. Reports can also be used as dashboards for reporting a smaller subset of metrics than the main workspace. You can even create public view-only links to share with anyone. Easily capturing and sharing your work is essential if you want to grow as an ML practitioner, which is why I recommend using tools that improve your work, like Weights & Biases. Just try it with the first link below and start sharing your work like a pro!
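For context, instrumenting a training script with Weights & Biases really does take only a few lines. A minimal sketch of typical usage, with the project name, config values, and metrics as illustrative placeholders:

```python
import wandb

# Start a run; the project name and config here are placeholders.
wandb.init(
    project="dalle2-notes",
    config={"learning_rate": 1e-4, "batch_size": 32},
)

# Inside a training loop, log whatever metrics you care about.
for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for a real training loss
    wandb.log({"loss": loss, "step": step})

wandb.finish()
```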

Now let's dive into how DALL·E 2 can not only generate images from text but is also capable of editing them. Indeed, this new inpainting skill the network has learned comes from its better understanding of concepts and of the images themselves, both locally and globally. What I mean by locally and globally is that DALL·E 2 has a deeper understanding of why the pixels next to each other have these colors, as it understands the objects in the scene and their interrelations. This way, it is able to understand that this water has a reflection and that the object on the right should also be reflected there. It also understands the global scene, meaning what is happening, just like if you were to describe what was going on when the person took the photo. Here, you'd say that this photo does not exist, obviously, or else I'm definitely down to try that. If we forget that this is impossible, you'd say that an astronaut is riding a horse in space. So if I were to ask you to draw the same scene but on a planet rather than in free space, you'd be able to picture something like that, since you understand that the horse and astronaut are the objects of interest to keep in the picture. This seems obvious, but it's extremely complex for a machine that only sees pixels of colors, which is why DALL·E 2 is so impressive to me.

But how exactly does the model understand the text we send it, and how can it generate an image out of it? Well, it's pretty similar to the first model I covered on the channel. It starts by using the CLIP model by OpenAI to encode both a text and an image into the same domain: a condensed representation called a latent code. Then it takes this encoding and uses a generator, also called a decoder, to generate a new image that means the same thing as the text, since it comes from the same latent code. So DALL·E 2 has two steps: CLIP to encode the information, and the new decoder model to take this encoded information and generate an image out of it.
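As a rough illustration of that first step, here is how a caption and an image can be embedded into CLIP's shared latent space using the publicly released CLIP weights through Hugging Face's transformers library. This is a generic CLIP sketch, not DALL·E 2's own code (which is unreleased); the file path and caption are illustrative.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load the public CLIP checkpoint (DALL·E 2 itself is not released).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("astronaut_horse.png")  # hypothetical local file
inputs = processor(
    text=["an astronaut riding a horse in space"],
    images=image,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    outputs = model(**inputs)

# Both embeddings live in the same latent space, so they can be compared.
text_latent = outputs.text_embeds    # shape: (1, 512)
image_latent = outputs.image_embeds  # shape: (1, 512)
print(torch.cosine_similarity(text_latent, image_latent).item())
```

A high cosine similarity between the two embeddings is exactly what the CLIP training objective enforces, and it is what lets text and image latents be treated as interchangeable starting points for generation.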

These two separate steps are also why we can generate variations of the images. We can simply change the encoded information randomly, just a little, making it move a tiny bit in the latent space, so that it still represents the same sentence while having different values, creating a different image for the same text.
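That variation trick can be pictured as a small random step in latent space. Below is a minimal sketch of the idea; `decoder` is a hypothetical stand-in for DALL·E 2's diffusion decoder, and the perturbation scale is arbitrary.

```python
import torch

def make_variations(image_latent: torch.Tensor, decoder, n: int = 4,
                    scale: float = 0.05) -> list:
    """Perturb a latent code slightly and decode each perturbed copy.

    `decoder` is a hypothetical stand-in: any callable mapping a latent
    vector to an image tensor would do here.
    """
    variations = []
    for _ in range(n):
        # A tiny Gaussian nudge keeps the latent close to the original,
        # so the decoded image still matches the same caption.
        nudged = image_latent + scale * torch.randn_like(image_latent)
        variations.append(decoder(nudged))
    return variations
```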

As we see here, the model initially takes a text input and encodes it. What we see above is the first step of the training process, where we also feed it an image and encode it using CLIP, so that images and texts are encoded similarly, following the CLIP objective. Then, to generate a new image, we switch to the section below, where we use the text encoding, guided by CLIP, to transform it into an image-ready encoding. This transformation is done using a diffusion prior, which we will cover shortly, as it is very similar to the diffusion model used for the final step. Finally, we take our newly created image encoding and decode it into a new image using the diffusion decoder.
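Put end to end, inference reads as a three-step pipeline. Here is a schematic sketch of that flow, where `text_encoder`, `prior`, and `decoder` are hypothetical stand-ins for the paper's modules rather than a real API:

```python
import torch

def dalle2_generate(caption: str, text_encoder, prior, decoder) -> torch.Tensor:
    """Schematic of DALL·E 2 inference; all three modules are stand-ins.

    text_encoder: CLIP's text tower, caption -> text latent.
    prior:        diffusion prior, text latent -> image latent.
    decoder:      diffusion decoder, image latent -> image tensor.
    """
    text_latent = text_encoder(caption)   # step 1: encode the caption
    image_latent = prior(text_latent)     # step 2: map to an image-ready latent
    image = decoder(image_latent)         # step 3: decode to pixels
    return image
```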

A diffusion decoder, or diffusion model, is a kind of model that starts with random noise and learns how to iteratively change this noise to get back to an image. It learns that by doing the opposite during training: we feed it images and apply random Gaussian noise to them, iteratively, until we can't see anything other than noise. Then we simply reverse the model to generate images from noise. If you'd like more detail about this kind of network, which is really cool, I invite you to watch the video I made about them. And voilà! This is how DALL·E 2 generates such high-quality images from text.
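The training trick described here, gradually drowning an image in Gaussian noise so the model can learn to undo it, can be sketched in a few lines. The linear noise schedule below is a simplification chosen for readability; real diffusion models such as DDPM use cumulative products of per-step alphas instead.

```python
import torch

def noise_image(x0: torch.Tensor, t: int, T: int = 1000) -> torch.Tensor:
    """Forward diffusion: blend an image with Gaussian noise at step t.

    Simplified linear schedule for illustration only: at t=0 the image is
    untouched, and at t=T it is essentially pure noise.
    """
    alpha = 1.0 - t / T                  # how much of the image survives
    noise = torch.randn_like(x0)
    return alpha**0.5 * x0 + (1 - alpha)**0.5 * noise

# Training pairs look like (noisy image, step); the network learns to
# predict the added noise, which is what lets it walk back from pure
# noise to an image at sampling time.
x0 = torch.rand(1, 3, 64, 64)            # stand-in for a training image
t = torch.randint(1, 1000, (1,)).item()
xt = noise_image(x0, t)
```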

It's super impressive, and it tells us that the model does understand the text. But does it deeply understand what it created? Well, it sure looks like it. It's the ability to inpaint images that makes us believe it understands the pictures pretty well. But why is that? How can it link a text input to an image and understand the image well enough to replace only some parts of it without affecting the realism? This is all because of CLIP, as it links a text input to an image. If we encode our newly generated image back and use a different text input to guide another generation, we can generate a second version of the image that replaces only the wanted region of our first generation, and you end up with this picture.
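One common way to get that mask-respecting behavior out of a diffusion decoder is to overwrite the known region at every denoising step. The sketch below follows that generic recipe and is an assumption about the mechanism, not OpenAI's released code; `denoise_step` is a hypothetical single reverse-diffusion step guided by the new caption (e.g., "add flamingos in the background").

```python
import torch

def noise_to_level(x0: torch.Tensor, t: int, T: int = 1000) -> torch.Tensor:
    """Re-noise the original image to match diffusion step t (simplified)."""
    alpha = 1.0 - t / T
    return alpha**0.5 * x0 + (1 - alpha)**0.5 * torch.randn_like(x0)

def inpaint(x_orig: torch.Tensor, mask: torch.Tensor, denoise_step,
            T: int = 1000) -> torch.Tensor:
    """Generic diffusion-inpainting sketch, not OpenAI's released code.

    x_orig: the original image tensor.
    mask:   1 where pixels should be regenerated, 0 where kept.
    denoise_step(x, t): hypothetical reverse-diffusion step guided by
                        the new text prompt.
    """
    x = torch.randn_like(x_orig)           # start from pure noise
    for t in reversed(range(T)):
        x = denoise_step(x, t)             # move one step toward an image
        # Clamp the unmasked region back to the (appropriately noised)
        # original, so only the masked area is actually re-imagined.
        x = mask * x + (1 - mask) * noise_to_level(x_orig, t)
    return x
```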

Unfortunately, the code isn't publicly available and is not in their API yet. The reason for that, as per OpenAI, is to study the risks and limitations of such a powerful model. They actually discuss these potential risks, and the reason for keeping it private, in their paper and in a great repository linked in the description below, if you are interested. They also opened an Instagram account to share more results, if you'd like to see that; it's also linked below. I loved DALL·E, and this one is even cooler.

Of course, this was just an overview of how DALL·E 2 works, and I strongly invite you to read their great paper, linked below, for more detail on their implementation of the model. I hope you enjoyed this video as much as I enjoyed making it, and I will see you next week with another amazing paper. Thank you for watching!



