
Introducing DALL·E: Inspired by GPT-3 and Image-GPT from OpenAI

by Louis Bouchard, January 27th, 2021

Too Long; Didn't Read

DALL·E from OpenAI can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.


OpenAI successfully trained a network that can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.

DALL·E is a new neural network developed by OpenAI based on GPT-3.
In fact, it's a smaller version of GPT-3, using 12 billion parameters instead of 175 billion. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like GPT-3's. It can create images from text captions using natural language, just like GPT-3 creates websites and stories.
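To make that concrete, here is a minimal sketch of what "generating an image with a language model" looks like: the caption becomes a short sequence of text tokens, the image a 32x32 grid of discrete codes from a learned codebook, and one decoder-only model samples the image tokens one at a time. The model here is a random stand-in, and the sizes follow what OpenAI's blog post suggests (256 text tokens, 1024 image tokens, 8192 codes), so treat them as assumptions, not their actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_VOCAB = 16384        # assumed BPE vocabulary for captions
IMAGE_VOCAB = 8192        # assumed discrete-VAE codebook for image patches
TEXT_LEN = 256            # caption length in tokens (padded)
IMAGE_LEN = 32 * 32       # image as a 32x32 grid of codebook indices

def dummy_next_token_logits(seq):
    """Stand-in for the 12-billion-parameter transformer: in the real
    model, these logits come from attending over all tokens so far."""
    return rng.normal(size=IMAGE_VOCAB)

def generate_image_tokens(text_tokens):
    """Autoregressively sample 1024 image tokens conditioned on the caption."""
    seq = list(text_tokens)
    image_tokens = []
    for _ in range(IMAGE_LEN):
        logits = dummy_next_token_logits(seq)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tok = int(rng.choice(IMAGE_VOCAB, p=probs))
        seq.append(TEXT_VOCAB + tok)   # image tokens get their own ID range
        image_tokens.append(tok)
    return np.array(image_tokens).reshape(32, 32)

caption = rng.integers(0, TEXT_VOCAB, size=TEXT_LEN)  # pretend-encoded caption
codes = generate_image_tokens(caption)
print(codes.shape)  # (32, 32) grid a learned decoder would turn into pixels
```

The key design choice is that image generation is reduced to next-token prediction: once images are discrete tokens, the same machinery behind GPT-3's text generation applies unchanged.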



Follow me for more AI content:

►Instagram: https://www.instagram.com/whats_ai/

►LinkedIn: https://www.linkedin.com/in/whats-ai/

►Twitter: https://twitter.com/Whats_AI

►Facebook: https://www.facebook.com/whats.artifi...

►Medium: https://medium.com/@whats_ai

The best courses in AI:

https://www.omologapps.com/whats-ai

Join Our Discord channel, Learn AI Together:

https://discord.gg/learnaitogether

Become a member of the YouTube community and support my work:
https://www.youtube.com/channel/UCUzG...

Video Transcript

OpenAI successfully trained a network able to generate images from text captions. It's very similar to GPT-3 and Image GPT and produces amazing results. Let's see what it's really capable of.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.

DALL·E is a new neural network developed by OpenAI based on GPT-3. In fact, it's a smaller version of GPT-3, using 12 billion parameters instead of 175 billion parameters. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like GPT-3's. It can generate images from text captions using natural language, just like GPT-3 can create websites and stories. It's a continuation of Image GPT and GPT-3, both of which I covered in previous videos; check them out if you haven't watched them yet.

DALL·E is very similar to GPT-3 in the way that it's also a transformer language model, receiving text and images as inputs to output a final transformed image in many forms. It can edit attributes of specific objects in images, as you can see here, or even control multiple objects and their attributes at the same time.
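It is worth pausing on how an image can be "input" to a language model at all. One plausible reading, sketched below with entirely hypothetical function names (this is not OpenAI's API), is that the prompt simply starts with the caption tokens followed by the tokens of the image region you want to keep, and the model completes the rest:

```python
# Hypothetical sketch: text-plus-image conditioning for edits/completions.
# Editing reuses the same next-token loop as generation; the prompt just
# begins with caption tokens followed by the image tokens being kept.
import numpy as np

rng = np.random.default_rng(0)
IMAGE_VOCAB = 8192            # assumed visual codebook size
GRID = 32                     # assumed 32x32 grid of image tokens

def sample_next_image_token(prompt_tokens):
    """Stand-in for the trained transformer's next-token distribution."""
    return int(rng.integers(0, IMAGE_VOCAB))

def complete_image(caption_tokens, kept_rows):
    """kept_rows: top rows of an existing image, as a (k, GRID) code grid."""
    prompt = list(caption_tokens) + [int(t) for t in kept_rows.ravel()]
    missing = GRID * GRID - kept_rows.size
    generated = []
    for _ in range(missing):
        tok = sample_next_image_token(prompt)
        prompt.append(tok)        # condition each step on all prior tokens
        generated.append(tok)
    grid = np.concatenate([kept_rows.ravel(), np.array(generated)])
    return grid.reshape(GRID, GRID)

caption = rng.integers(0, 16384, size=256)                # pretend caption
top_half = rng.integers(0, IMAGE_VOCAB, size=(16, GRID))  # region to keep
edited = complete_image(caption, top_half)
print(edited.shape)  # (32, 32): kept top half + newly sampled bottom half
```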

This is a very complicated task, since the network has to understand the relation between the objects and create an image based on its understanding.

Just take this example: feeding the network an emoji of a baby penguin wearing a blue hat, red gloves, a green shirt, and yellow pants. All these components need to be understood: the objects, the colors, and even the location of the objects, meaning that the gloves need to be both red and on the hands of the penguin, and the same for the rest. The results are very impressive considering the complexity of the task.

We can look at another, simpler example, where we just fed "a small red block sitting on a large green block" to the network. Now it just needs to know that there are two blocks, their colors, and that one is smaller and the other bigger. This seems very simple to us, but it needs a really high level of understanding to achieve. It is still not perfect, as you can see, but we are getting pretty close.

DALL·E is also able to change the viewpoint of a scene. For example, here we send "an extreme close-up view of an eagle on a mountain", and these are the results. Here we just changed the eagle for a fox, and this is what is generated.

Of course, a simple caption can produce an infinitude of plausible images; nobody knows what you have in mind. If you think of a painting of a fox sitting in a field at sunrise, there are many variables, such as the fox itself, its colors, where it is looking, its position, and we are not even talking about the background and the style of the painting. Fortunately, since it is very similar to GPT-3, we can add details to the input text and generate something much closer to what we expected, just as you can see here with different styles of paintings.

It can also generate images using objects that are not related to each other, like creating a realistic avocado chair, or generate original and unseen illustrations, like a new emoji.

In short, they described DALL·E as a simple decoder-only transformer. If you are not familiar with transformers, you should definitely watch the video I made covering them. As I mentioned, it receives both the text and an image as inputs in the form of tokens, just like GPT-3, to produce a transformed image. It uses self-attention, as I described in a previous video, to understand the context of the text, and sparse attention for the images.
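The difference between the two attention regimes is easy to show with masks. Below is an illustrative sketch: dense causal self-attention over text tokens, and a sparse pattern over image tokens where each token attends only to tokens in the same row or column of the grid. Until the paper is out, the exact sparse layouts are unconfirmed, so this row/column variant is an assumption for illustration only.

```python
# Illustrative attention masks: dense causal attention for text tokens,
# sparse row/column attention among image tokens (an assumed layout).
import numpy as np

TEXT_LEN, GRID = 4, 8                 # tiny sizes so the mask stays small
IMG_LEN = GRID * GRID
N = TEXT_LEN + IMG_LEN

mask = np.zeros((N, N), dtype=bool)

for q in range(N):
    for k in range(q + 1):            # causal: attend only to the past
        if q < TEXT_LEN or k < TEXT_LEN:
            mask[q, k] = True         # anything involving text is dense
        else:
            # image-to-image attention is sparse: same row or same column
            qi, ki = q - TEXT_LEN, k - TEXT_LEN
            same_row = qi // GRID == ki // GRID
            same_col = qi % GRID == ki % GRID
            mask[q, k] = same_row or same_col

print(mask.sum(), "of", N * N, "query-key pairs are attended")
```

The point of sparsity is cost: with 1024 image tokens, full self-attention touches every pair, while a row/column pattern grows far more slowly, which is what makes attending over whole images tractable.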

There are not many details about how it works or how exactly it was trained, but they will be publishing a paper explaining their approach. In short, this DALL·E network shows that manipulating visual concepts through language is now within reach, and I am excited to read their upcoming paper.

Of course, this was just an overview of this new OpenAI network called DALL·E. I strongly invite you to follow OpenAI's news about the upcoming paper for a better technical understanding, or just subscribe to my channel; I will be sure to cover it as soon as it's released. Please leave a like if you went this far in the video, and since over 80 percent of you are not subscribed yet, consider subscribing to the channel to not miss any further news. Thank you for watching!