Introducing DALL·E: Inspired by GPT-3 and Image-GPT from OpenAI

Written by whatsai | Published 2021/01/27


OpenAI successfully trained a network that can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.
DALL·E is a new neural network developed by OpenAI and based on GPT-3.
In fact, it is a smaller version of GPT-3, with 12 billion parameters instead of 175 billion. But it has been trained specifically to generate images from text descriptions, using a dataset of text-image pairs rather than a very broad dataset like the one used for GPT-3. It creates images from natural-language captions, just as GPT-3 creates websites and stories.
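
To make the idea of training on text-image pairs a bit more concrete, here is a minimal sketch of how one pair could be packed into a single token sequence for a GPT-style model. The sizes (up to 256 caption tokens from a 16,384-entry vocabulary, and a 32x32 grid of image tokens from an 8,192-entry vocabulary) are the ones I recall from OpenAI's announcement, so treat them as assumptions. The function name `build_sequence`, the padding id, and the trick of shifting image ids past the text vocabulary are hypothetical choices made only for this illustration.

```python
# Hypothetical sketch: pack one caption and one image into a single token stream.
# All constants and names are assumptions for illustration, not OpenAI's code.

TEXT_LEN = 256                        # assumed maximum number of caption tokens
TEXT_VOCAB = 16384                    # assumed text vocabulary size
IMAGE_GRID = 32                       # assumed 32x32 grid of discrete image codes
IMAGE_LEN = IMAGE_GRID * IMAGE_GRID   # 1024 image tokens
IMAGE_VOCAB = 8192                    # assumed image vocabulary size
PAD_ID = 0                            # assumption: the real padding scheme may differ


def build_sequence(text_ids, image_ids):
    """Return the caption tokens (padded) followed by the image tokens."""
    if len(text_ids) > TEXT_LEN:
        raise ValueError("caption is too long")
    if len(image_ids) != IMAGE_LEN:
        raise ValueError("image must have exactly IMAGE_LEN tokens")
    if any(not 0 <= t < TEXT_VOCAB for t in text_ids):
        raise ValueError("text token out of range")
    if any(not 0 <= i < IMAGE_VOCAB for i in image_ids):
        raise ValueError("image token out of range")

    # Pad the caption so the image always starts at the same position.
    padded_text = list(text_ids) + [PAD_ID] * (TEXT_LEN - len(text_ids))
    # Shift image ids so text and image tokens share one id space.
    shifted_image = [TEXT_VOCAB + i for i in image_ids]
    return padded_text + shifted_image


if __name__ == "__main__":
    sequence = build_sequence([5, 9, 2], [0] * IMAGE_LEN)
    print(len(sequence))  # 256 text positions + 1024 image positions
```

A model trained to predict the next token of such sequences can, at generation time, be given only the caption part and asked to continue with image tokens, which is the same mechanism GPT-3 uses to continue a piece of text.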




Video Transcript

00:00
OpenAI successfully trained a network
00:02
able to generate images from text
00:04
captions
00:05
it's very similar to GPT-3 and Image GPT
00:09
and produces amazing results let's see
00:12
what it's really capable of
00:14
this is What's AI and i share artificial
00:16
intelligence news every week
00:18
if you are new to the channel and want
00:20
to stay up to date please consider
00:21
subscribing to not miss any further news
00:24
DALL·E is a new neural network developed
00:27
by OpenAI based on GPT-3
00:29
in fact it's a smaller version of GPT-3
00:32
using
00:32
12 billion parameters instead of 175
00:36
billion parameters
00:38
but it has been specifically trained to
00:40
generate images from text descriptions
00:42
using a data set of text image pairs
00:45
instead of a very broad
00:46
data set like GPT-3 it can generate
00:50
images from text captions
00:51
using natural language just like GPT-3
00:54
can create
00:55
websites and stories it's a continuation
00:58
of Image GPT and GPT-3 that i both covered in
01:01
previous videos if you haven't watched
01:03
them yet
01:04
DALL·E is very similar to GPT-3 in the way
01:07
that it's also a transformer language
01:10
model
01:10
receiving text and images as inputs to
01:13
output a final transformed
01:15
image in many forms it can edit
01:17
attributes of specific objects
01:19
in images as you can see here or even
01:22
control multiple objects and their
01:24
attributes at the same time
01:26
this is a very complicated task since
01:29
the network has to understand the
01:30
relation between the objects
01:32
and create an image based on its
01:34
understanding
01:36
just take this example feeding to the
01:38
network an emoji
01:40
of a baby penguin wearing a blue hat
01:43
red gloves green shirt and yellow pants
01:46
all these components need to be
01:48
understood the objects colors and even
01:51
the location of the objects
01:54
meaning that the gloves need to be both
01:56
red and on the hands of the penguin
01:58
the same thing for the rest and the
02:00
results are very impressive considering
02:02
the complexity of the task
02:05
we can just see another more simple
02:07
example where we just fed
02:09
a small red block sitting on a large
02:11
green block
02:12
to the network now it just needs to know
02:15
that there are two blocks
02:16
their colors and one being smaller and
02:19
the other bigger
02:20
this seems very simple to us but it
02:23
needs a really high level of
02:25
understanding to be able to achieve this
02:28
it is still
02:29
not perfect as you can see but we are
02:31
getting pretty close
02:33
DALL·E is also able to change the
02:35
viewpoint of a scene
02:36
for example here we send an extreme
02:38
close-up view of an eagle
02:40
on a mountain and these are the results
02:43
here we just changed the eagle for a fox
02:46
and this is what is generated
02:50
of course a simple caption can produce
02:52
an infinitude of plausible images
02:54
nobody knows what you have in mind if
02:57
you think of a painting of a fox
02:59
sitting in a field at sunrise there are
03:02
many variables
03:03
such as the fox itself its colors where
03:06
it is looking
03:07
its position and we are not even talking
03:10
about the background and the style of
03:11
the painting
03:12
fortunately since it is very similar to
03:14
GPT-3 we can
03:16
add details to the input text and
03:17
generate something much closer to what
03:20
we expected
03:20
just as you can see here with different
03:22
styles of paintings
03:25
it can also generate images using
03:27
objects that are not related to each
03:30
other
03:30
like creating a realistic avocado chair
03:33
or generating original and unseen
03:35
illustrations
03:36
like a new emoji in short they described
03:39
DALL·E as a simple
03:40
decoder only transformer if you are not
03:43
familiar with transformers you should
03:44
definitely watch the video i made
03:46
covering them
03:48
as i mentioned it receives both the text
03:50
and an image as inputs
03:52
in the form of tokens just like GPT-3 to
03:55
produce a transformed image
03:57
it uses self-attention as i described in
03:59
a previous video to understand the
04:01
context of the text
04:02
and sparse attention for the images
04:05
there are not many details about how it
04:07
works or how exactly it was trained
04:09
but they will be publishing a paper
04:11
explaining their approach
04:12
in short this DALL·E network shows that
04:15
manipulating visual concepts
04:17
through language is now within reach and
04:20
i am excited to read their upcoming
04:22
paper
04:23
of course this was just an overview of
04:25
this new OpenAI network
04:27
called DALL·E i strongly invite you to
04:30
follow OpenAI's news
04:31
about the upcoming paper for a better
04:33
technical understanding
04:35
or just subscribe to my channel i will
04:38
be sure to cover it as soon as it's
04:39
released
04:40
please leave a like if you made it this far
04:42
in the video and since there's over 80
04:45
percent of you guys that are not
04:46
subscribed yet
04:47
consider subscribing to the channel to
04:49
not miss any further news
04:51
thank you for watching
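
The transcript mentions self-attention for the text and sparse attention for the images. As a rough illustration of what that can mean in a decoder-only transformer, the sketch below builds a toy attention mask: text tokens only look at earlier text tokens, while image tokens see every text token plus the earlier image tokens in their own row. The "same row" rule is just one simplified sparse pattern chosen for this example, and `build_mask` is a hypothetical helper. The actual patterns and sizes used by OpenAI may differ.

```python
# Hypothetical sketch of a combined causal + sparse attention mask.
# mask[q][k] is True when the token at position q may attend to position k.
# The "same image row" rule is a simplifying assumption, not OpenAI's exact pattern.


def build_mask(text_len, grid):
    """Build a mask for text_len text tokens followed by grid*grid image tokens."""
    total = text_len + grid * grid
    mask = [[False] * total for _ in range(total)]
    for q in range(total):
        for k in range(q + 1):  # causal: a token never looks at later positions
            if k < text_len:
                # Earlier text is always visible, for text and image queries alike.
                mask[q][k] = True
            else:
                # Both q and k are image tokens here: keep only the same grid row.
                q_row = (q - text_len) // grid
                k_row = (k - text_len) // grid
                mask[q][k] = q_row == k_row
    return mask


if __name__ == "__main__":
    toy_mask = build_mask(text_len=4, grid=3)
    for row in toy_mask:
        print("".join("x" if allowed else "." for allowed in row))
```

Printing the toy mask shows a filled triangle for the text part, then rows where the text columns are fully visible and only a short run of image columns is marked, which is what makes the image attention sparse compared with a full causal mask.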

Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/01/27