OpenAI successfully trained a network that can generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results. DALL-E is a new neural network developed by OpenAI and based on GPT-3. In fact, it is a smaller version of GPT-3, using 12 billion parameters instead of 175 billion. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like the one used for GPT-3. It can create images from text captions written in natural language, just as GPT-3 can create websites and stories.

Project link: https://openai.com/blog/dall-e/
Video Transcript

00:00 OpenAI successfully trained a network able to generate images from text captions. It's very similar to GPT-3 and Image GPT and produces amazing results. Let's see what it's really capable of.

00:14 This is What's AI, and I share artificial intelligence news every week. If you are new to the channel and want to stay up to date, please consider subscribing to not miss any further news.

00:24 DALL-E is a new neural network developed by OpenAI, based on GPT-3. In fact, it's a smaller version of GPT-3, using 12 billion parameters instead of 175 billion parameters. But it has been specifically trained to generate images from text descriptions, using a dataset of text-image pairs instead of a very broad dataset like the one used for GPT-3. It can generate images from text captions using natural language, just like GPT-3 can create websites and stories. It's a continuation of Image GPT and GPT-3, both of which I covered in previous videos, if you haven't watched them yet.

01:04 DALL-E is very similar to GPT-3 in that it's also a transformer language model, receiving text and images as inputs to output a final transformed image in many forms. It can edit attributes of specific objects in images, as you can see here, or even control multiple objects and their attributes at the same time. This is a very complicated task, since the network has to understand the relation between the objects and create an image based on its understanding.

01:36 Just take this example, feeding the network "an emoji of a baby penguin wearing a blue hat, red gloves, green shirt, and yellow pants". All these components need to be understood: the objects, the colors, and even the location of the objects, meaning that the gloves need to be both red and on the hands of the penguin. The same goes for the rest, and the results are very impressive considering the complexity of the task.

02:05 We can look at another, simpler example, where we just fed "a small red block sitting on a large green block" to the network. Now it just needs to know that there are two blocks, their colors, and that one is smaller and the other bigger. This seems very simple to us, but it needs a really high level of understanding to be able to achieve this. It is still not perfect, as you can see, but we are getting pretty close.

02:33 DALL-E is also able to change the viewpoint of a scene. For example, here we send "an extreme close-up view of an eagle on a mountain", and these are the results. Here we just swapped the eagle for a fox, and this is what is generated.

02:50 Of course, a simple caption can produce an infinitude of plausible images. Nobody knows what you have in mind. If you think of "a painting of a fox sitting in a field at sunrise", there are many variables, such as the fox itself, its colors, where it is looking, and its position, and we are not even talking about the background and the style of the painting. Fortunately, since it is very similar to GPT-3, we can add details to the input text and generate something much closer to what we expected, just as you can see here with different styles of paintings.

03:25 It can also generate images using objects that are not related to each other, like creating a realistic avocado chair, or generate original and unseen illustrations, like a new emoji.

03:36 In short, they described DALL-E as a simple decoder-only transformer. If you are not familiar with transformers, you should definitely watch the video I made covering them. As I mentioned, it receives both the text and an image as inputs, in the form of tokens, just like GPT-3, to produce a transformed image. It uses self-attention, as I described in a previous video, to understand the context of the text, and sparse attention for the images. There are not many details about how it works or how exactly it was trained, but they will be publishing a paper explaining their approach.

04:12 In short, this DALL-E network shows that manipulating visual concepts through language is now within reach, and I am excited to read their upcoming paper.

04:23 Of course, this was just an overview of this new OpenAI network called DALL-E. I strongly invite you to follow OpenAI's news about the upcoming paper for a better technical understanding, or just subscribe to my channel. I will be sure to cover it as soon as it's released.

04:40 Please leave a like if you went this far in the video, and since over 80 percent of you are not subscribed yet, consider subscribing to the channel to not miss any further news. Thank you for watching.
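
To make the 03:36 part a bit more concrete, here is a minimal sketch of what "text and image as tokens" fed to a "decoder-only transformer" could look like. It is not OpenAI's implementation: the vocabulary sizes, the function names (`build_stream`, `causal_mask`, `masked_attention_weights`) and the choice to shift image token ids past the text vocabulary are all assumptions made for illustration. It also uses a plain dense causal mask, not the sparse attention mentioned for the images.

```python
import math

# Hypothetical toy sizes, chosen only for illustration (not DALL-E's real values).
TEXT_VOCAB = 16   # assumed number of distinct text tokens
IMAGE_VOCAB = 8   # assumed number of distinct image tokens


def build_stream(text_tokens, image_tokens):
    """Concatenate text and image tokens into one sequence.

    Image token ids are shifted past the text vocabulary so the two
    kinds of token never collide in the shared id space (an assumption).
    """
    for t in text_tokens:
        if not 0 <= t < TEXT_VOCAB:
            raise ValueError(f"text token out of range: {t}")
    for i in image_tokens:
        if not 0 <= i < IMAGE_VOCAB:
            raise ValueError(f"image token out of range: {i}")
    return list(text_tokens) + [TEXT_VOCAB + i for i in image_tokens]


def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_attention_weights(scores, mask):
    """Row-wise softmax over the scores, giving zero weight to masked positions."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        # The diagonal is always allowed, so this list is never empty.
        peak = max(s for s, m in zip(row_scores, row_mask) if m)
        exps = [math.exp(s - peak) if m else 0.0
                for s, m in zip(row_scores, row_mask)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


# Three made-up text tokens followed by two made-up image tokens.
stream = build_stream([3, 7, 1], [0, 5])
mask = causal_mask(len(stream))

# Uniform scores stand in for the query-key products a real model would compute.
scores = [[0.0] * len(stream) for _ in stream]
weights = masked_attention_weights(scores, mask)
```

The point of the causal mask is that each token, text or image, can only look at the tokens before it, which is what lets a decoder-only model generate the image tokens one at a time after reading the caption.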