
Introducing DALL·E: Inspired by GPT-3 and Image-GPT from OpenAI

by Louis Bouchard, January 27th, 2021

Too Long; Didn't Read

DALL·E from OpenAI can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.


OpenAI successfully trained a network that can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.

DALL·E is a new neural network developed by OpenAI based on GPT-3.
In fact, it's a smaller version of GPT-3, using 12 billion parameters instead of 175 billion. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like GPT-3's. It can create images from text captions using natural language, just like GPT-3 creates websites and stories.
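To make that concrete, here is a minimal sketch of what "generating an image with a language model" looks like: the caption becomes a short sequence of text tokens, the image a 32x32 grid of discrete codes from a learned codebook, and one decoder-only model samples the image tokens one at a time. The model here is a random stand-in, and the sizes follow what OpenAI's blog post suggests (256 text tokens, 1024 image tokens, 8192 codes), so treat them as assumptions, not their actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

TEXT_VOCAB = 16384        # assumed BPE vocabulary for captions
IMAGE_VOCAB = 8192        # assumed discrete-VAE codebook for image patches
TEXT_LEN = 256            # caption length in tokens (padded)
IMAGE_LEN = 32 * 32       # image as a 32x32 grid of codebook indices

def dummy_next_token_logits(seq):
    """Stand-in for the 12-billion-parameter transformer: in the real
    model, these logits come from attending over all tokens so far."""
    return rng.normal(size=IMAGE_VOCAB)

def generate_image_tokens(text_tokens):
    """Autoregressively sample 1024 image tokens conditioned on the caption."""
    seq = list(text_tokens)
    image_tokens = []
    for _ in range(IMAGE_LEN):
        logits = dummy_next_token_logits(seq)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        tok = int(rng.choice(IMAGE_VOCAB, p=probs))
        seq.append(TEXT_VOCAB + tok)   # image tokens get their own ID range
        image_tokens.append(tok)
    return np.array(image_tokens).reshape(32, 32)

caption = rng.integers(0, TEXT_VOCAB, size=TEXT_LEN)  # pretend-encoded caption
codes = generate_image_tokens(caption)
print(codes.shape)  # (32, 32) grid a learned decoder would turn into pixels
```

The key design choice is that image generation is reduced to next-token prediction: once images are discrete tokens, the same machinery behind GPT-3's text generation applies unchanged.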



Follow me for more AI content:

►Instagram: https://www.instagram.com/whats_ai/

►LinkedIn: https://www.linkedin.com/in/whats-ai/

►Twitter: https://twitter.com/Whats_AI

►Facebook: https://www.facebook.com/whats.artifi...

►Medium: https://medium.com/@whats_ai

The best courses in AI:

https://www.omologapps.com/whats-ai

Join Our Discord channel, Learn AI Together:

https://discord.gg/learnaitogether

Become a member of the YouTube community and support my work:
https://www.youtube.com/channel/UCUzG...

Video Transcript

OpenAI successfully trained a network able to generate images from text captions. It's very similar to GPT-3 and Image GPT and produces amazing results. Let's see what it's really capable of.

This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.

DALL·E is a new neural network developed by OpenAI based on GPT-3. In fact, it's a smaller version of GPT-3, using 12 billion parameters instead of 175 billion parameters. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like GPT-3's. It can generate images from text captions using natural language, just like GPT-3 can create websites and stories. It's a continuation of Image GPT and GPT-3, both of which I covered in previous videos; check them out if you haven't watched them yet.

DALL·E is very similar to GPT-3 in the way that it's also a transformer language model, receiving text and images as inputs to output a final transformed image in many forms. It can edit attributes of specific objects in images, as you can see here, or even control multiple objects and their attributes at the same time.
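It is worth pausing on how an image can be "input" to a language model at all. One plausible reading, sketched below with entirely hypothetical function names (this is not OpenAI's API), is that the prompt simply starts with the caption tokens followed by the tokens of the image region you want to keep, and the model completes the rest:

```python
# Hypothetical sketch: text-plus-image conditioning for edits/completions.
# Editing reuses the same next-token loop as generation; the prompt just
# begins with caption tokens followed by the image tokens being kept.
import numpy as np

rng = np.random.default_rng(0)
IMAGE_VOCAB = 8192            # assumed visual codebook size
GRID = 32                     # assumed 32x32 grid of image tokens

def sample_next_image_token(prompt_tokens):
    """Stand-in for the trained transformer's next-token distribution."""
    return int(rng.integers(0, IMAGE_VOCAB))

def complete_image(caption_tokens, kept_rows):
    """kept_rows: top rows of an existing image, as a (k, GRID) code grid."""
    prompt = list(caption_tokens) + [int(t) for t in kept_rows.ravel()]
    missing = GRID * GRID - kept_rows.size
    generated = []
    for _ in range(missing):
        tok = sample_next_image_token(prompt)
        prompt.append(tok)        # condition each step on all prior tokens
        generated.append(tok)
    grid = np.concatenate([kept_rows.ravel(), np.array(generated)])
    return grid.reshape(GRID, GRID)

caption = rng.integers(0, 16384, size=256)                # pretend caption
top_half = rng.integers(0, IMAGE_VOCAB, size=(16, GRID))  # region to keep
edited = complete_image(caption, top_half)
print(edited.shape)  # (32, 32): kept top half + newly sampled bottom half
```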

This is a very complicated task, since the network has to understand the relation between the objects and create an image based on its understanding.

Just take this example: feeding the network an emoji of a baby penguin wearing a blue hat, red gloves, a green shirt, and yellow pants. All these components need to be understood: the objects, the colors, and even the location of the objects, meaning that the gloves need to be both red and on the hands of the penguin, and the same for the rest. The results are very impressive considering the complexity of the task.

We can look at another, simpler example, where we just fed "a small red block sitting on a large green block" to the network. Now it just needs to know that there are two blocks, their colors, and that one is smaller and the other bigger. This seems very simple to us, but it needs a really high level of understanding to achieve. It is still not perfect, as you can see, but we are getting pretty close.

DALL·E is also able to change the viewpoint of a scene. For example, here we send "an extreme close-up view of an eagle on a mountain", and these are the results. Here we just changed the eagle for a fox, and this is what is generated.

Of course, a simple caption can produce an infinitude of plausible images; nobody knows what you have in mind. If you think of a painting of a fox sitting in a field at sunrise, there are many variables, such as the fox itself, its colors, where it is looking, its position, and we are not even talking about the background and the style of the painting. Fortunately, since it is very similar to GPT-3, we can add details to the input text and generate something much closer to what we expected, just as you can see here with different styles of paintings.

It can also generate images using objects that are not related to each other, like creating a realistic avocado chair, or generate original and unseen illustrations, like a new emoji.

In short, they described DALL·E as a simple decoder-only transformer. If you are not familiar with transformers, you should definitely watch the video I made covering them. As I mentioned, it receives both the text and an image as inputs in the form of tokens, just like GPT-3, to produce a transformed image. It uses self-attention, as I described in a previous video, to understand the context of the text, and sparse attention for the images.
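The difference between the two attention regimes is easy to show with masks. Below is an illustrative sketch: dense causal self-attention over text tokens, and a sparse pattern over image tokens where each token attends only to tokens in the same row or column of the grid. Until the paper is out, the exact sparse layouts are unconfirmed, so this row/column variant is an assumption for illustration only.

```python
# Illustrative attention masks: dense causal attention for text tokens,
# sparse row/column attention among image tokens (an assumed layout).
import numpy as np

TEXT_LEN, GRID = 4, 8                 # tiny sizes so the mask stays small
IMG_LEN = GRID * GRID
N = TEXT_LEN + IMG_LEN

mask = np.zeros((N, N), dtype=bool)

for q in range(N):
    for k in range(q + 1):            # causal: attend only to the past
        if q < TEXT_LEN or k < TEXT_LEN:
            mask[q, k] = True         # anything involving text is dense
        else:
            # image-to-image attention is sparse: same row or same column
            qi, ki = q - TEXT_LEN, k - TEXT_LEN
            same_row = qi // GRID == ki // GRID
            same_col = qi % GRID == ki % GRID
            mask[q, k] = same_row or same_col

print(mask.sum(), "of", N * N, "query-key pairs are attended")
```

The point of sparsity is cost: with 1024 image tokens, full self-attention touches every pair, while a row/column pattern grows far more slowly, which is what makes attending over whole images tractable.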

There are not many details about how it works or how exactly it was trained, but they will be publishing a paper explaining their approach. In short, this DALL·E network shows that manipulating visual concepts through language is now within reach, and I am excited to read their upcoming paper.

Of course, this was just an overview of this new OpenAI network called DALL·E. I strongly invite you to follow OpenAI's news about the upcoming paper for a better technical understanding, or just subscribe to my channel; I will be sure to cover it as soon as it's released. Please leave a like if you went this far in the video, and since over 80 percent of you are not subscribed yet, consider subscribing to the channel to not miss any further news. Thank you for watching!