Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and their Google Colab notebook, available for everyone (see references). Simply take a picture of the style you want to copy, enter the text describing what you want to generate, and this algorithm will generate a new picture out of it! Just look back at the results above: such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text! If that sounds interesting, watch the video to learn more!

Watch the video

References
►Read the full article: https://www.louisbouchard.ai/clipdraw/
►CLIPDraw: Frans, K., Soros, L.B. and Witkowski, O., 2021. CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders. https://arxiv.org/abs/2106.14843
►StyleCLIPDraw: Schaldenbrand, P., Liu, Z. and Oh, J., 2021. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis. https://arxiv.org/abs/2111.03133
►CLIPDraw Colab notebook: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb
►StyleCLIPDraw code: https://github.com/pschaldenbrand/StyleCLIPDraw
►StyleCLIPDraw Colab notebook: https://colab.research.google.com/github/pschaldenbrand/StyleCLIPDraw/blob/master/Style_ClipDraw_1_0_Refactored.ipynb
►My newsletter (a new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

Have you ever dreamed of taking a picture, like this cool TikTok drawing style, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and their Google Colab notebook, available for everyone. Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it. Look at that, such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text.

Here I tried imitating the same style with another text input. To be honest, sometimes it may look a bit all over the place, especially if you select a more complicated or messy drawing style like this one.

Speaking of something messy, if you are like me and your model versioning and resource tracking looks like this, you may be the perfect candidate to try the sponsor of today's video, which is none other than Weights & Biases. I always assumed I could stack folders like this and simply add "old", "v1", "v2", "v3", and so on to my file names without any problem, until I had to work with someone. While it may be easy for me to find my old tests, it was impossible to explain my thought process behind this mess, and it was my teammate's nightmare. If you care about your teammates and reproducibility, don't do like I did and give Weights & Biases a shot. No more notebooks or results saved everywhere, as it creates a super friendly dashboard for you and your team to track your experiments, and it's super easy to set up and use. It's the first link in the description, and I promise, within a month you will be completely dependent.

As we said, this new model by Peter Schaldenbrand et al., called StyleCLIPDraw, which is an improvement upon CLIPDraw by Kevin Frans et al., takes an image and a text as inputs and can generate a new image based on your text, following the style of the image. So the model has to understand both the text and the image to correctly copy its style. As you may suspect, this is incredibly challenging, but we are fortunate enough to have a lot of researchers working on so many different challenges, like trying to link text with images, which is what CLIP can do.

Quickly, CLIP is a model developed by OpenAI that can basically associate a line of text with an image. Both the text and the images are encoded similarly, so that they will be very close to each other in the new space they are encoded in if they both mean the same thing. Using CLIP, the researchers could understand the text from the user input and generate an image out of it. If you are not familiar with CLIP yet, I would recommend watching the video I made about it together with DALL·E earlier this year.
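To make that "close in the same space" idea concrete, here is a minimal sketch of how CLIP scores text-image agreement, using OpenAI's open-source clip package. The image file name and the candidate captions are made-up placeholders for illustration:

```python
import torch
import clip  # OpenAI's CLIP package: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Encode one image and a few candidate captions into the same embedding space
image = preprocess(Image.open("cat_drawing.png")).unsqueeze(0).to(device)  # hypothetical file
texts = clip.tokenize(["a drawing of a cat", "a photo of a dog"]).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(texts)

# Normalize, then take cosine similarity: the matching caption lands closest
image_features = image_features / image_features.norm(dim=-1, keepdim=True)
text_features = text_features / text_features.norm(dim=-1, keepdim=True)
print(image_features @ text_features.T)  # higher score = closer in CLIP space
```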
But then, how did they apply a new style to it? CLIP is just linking existing images to texts; it cannot create a new image. Indeed, we also need something else to capture the style of the image sent in, both the textures and the shapes.

Well, the image generation process is quite unique. It won't simply generate an image right away. Rather, it will draw on a canvas and get better and better over time. It will just draw random lines at first and create an initial image. This new image is then sent back to the algorithm and compared with both the style image and the text, which will generate another version. This is one iteration. At each iteration, we draw random curves again, oriented by the two losses we'll see in a second. This random process is quite cool, since it will allow each new test to look different. So, using the same image and the same text as inputs, you will end up with different results that may look even better.

Here you can see a very important step called image augmentation. It will basically create multiple variations of the image and allow the model to converge on results that look right to humans, and not simply on the right numerical values for the machine. This simple process is repeated until we are satisfied with the results.

So this whole model learns on the fly over many iterations, optimizing the two losses we see here: one for aligning the content of the image with the text sent, and the other for the style. Here you can see that the first loss is based on how close the CLIP encodings are, as we said earlier, where CLIP is basically judging the results, and its decision will orient the next generation. The second one is also very simple. We send both images into a pre-trained convolutional neural network like VGG, which will encode the images similarly to CLIP. We then compare these encodings to measure how close they are to each other. This will be our second judge that will orient the next generation as well.
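To tie these pieces together, here is a simplified sketch of what one iteration of this loop could look like. It is not the authors' implementation: CLIPDraw and StyleCLIPDraw optimize Bézier curve parameters through the diffvg differentiable rasterizer, whereas this sketch optimizes a raw pixel canvas so it stays self-contained, and it stands in a plain L2 distance between VGG feature maps for the paper's actual style loss. The prompt, the random placeholder style image, and the iteration count are illustrative assumptions:

```python
import torch
import torchvision.transforms as T
from torchvision.models import vgg16
import clip  # OpenAI's CLIP package

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
clip_model = clip_model.float()  # keep everything in fp32 for simplicity
vgg = vgg16(weights="IMAGENET1K_V1").features[:16].to(device).eval()  # early conv blocks

# The "canvas" we optimize. The real papers optimize Bézier stroke parameters
# through the diffvg differentiable rasterizer; a pixel canvas is a
# simplification so this sketch stays self-contained and runnable.
canvas = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
optimizer = torch.optim.Adam([canvas], lr=0.05)

# Inputs (illustrative): the text prompt and a style image. The style image
# is a random placeholder here; you would load your own 1x3x224x224 tensor.
prompt = clip.tokenize(["a drawing of a cat"]).to(device)
style_image = torch.rand(1, 3, 224, 224, device=device)

with torch.no_grad():
    text_features = clip_model.encode_text(prompt)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    style_features = vgg(style_image)  # target style encoding

# Image augmentation: CLIP judges several randomly distorted copies each
# iteration, pushing the drawing toward images that look right to humans
augment = T.Compose([
    T.RandomPerspective(distortion_scale=0.5, p=1.0),
    T.RandomResizedCrop(224, scale=(0.7, 1.0)),
])

for step in range(200):  # a few hundred iterations, i.e. a few minutes on a GPU
    optimizer.zero_grad()
    batch = torch.cat([augment(canvas) for _ in range(4)])  # augmented copies

    # Judge 1 (content): distance between the drawing's and the text's CLIP encodings
    image_features = clip_model.encode_image(batch)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    content_loss = (1 - image_features @ text_features.T).mean()

    # Judge 2 (style): compare VGG feature encodings of the drawing and the style image
    style_loss = (vgg(canvas) - style_features).pow(2).mean()

    (content_loss + style_loss).backward()
    optimizer.step()
    canvas.data.clamp_(0, 1)  # keep pixel values valid
```

Because the augmentations and the initial strokes are random, rerunning the same prompt and style image yields different drawings, which is exactly the behavior described above.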
This way, using both judges, we can get closer to the text and the wanted style at the same time in the next generation. If you are not familiar with convolutional neural networks and encodings, I would strongly recommend watching the video I made explaining them in simple terms.

This iterative process makes the model a bit slow to generate a beautiful image, but after a few hundred iterations, or in other words after a few minutes, you have your new image, and I promise it's worth the wait. It also means that it doesn't require any other training, which is pretty cool.

Now, the interesting part you've been waiting for: indeed, you can use it right now for free, or at least pretty cheaply, using the Colab notebook linked in the description below. I had some problems running it, and I would recommend buying the Pro version of Colab if you'd like to play with it without any issues. Otherwise, feel free to ask me any questions in the comments if you encounter any problems; I pretty much went through all of them myself. To use it, you simply run all cells like that, and that's it. You can now enter a new text for the generation or send a new image for the style from a link, and voilà! Now tweak the parameters and see what you can do; a sketch of these inputs follows below. If you play with it, please send me the results on Twitter and tag me, I'd love to see them.

As they state in the paper, the results will have the same biases as the models they use, such as CLIP, which you should consider if you play with it. Of course, this was a simple overview of the paper, and I strongly invite you to read both the CLIPDraw and StyleCLIPDraw papers for more technical details and try their Colab notebooks; both are linked in the description below.

Thank you once again, Weights & Biases, for sponsoring this video, and huge thanks to you for watching until the end. I hope you enjoyed this week's video. Let me know what you think and how you will use this new model!

[Music]
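As a footnote to the usage step described above: in the notebook, the generation text and the style image link boil down to a couple of variables you edit before rerunning the cells. The variable names and the URL below are illustrative placeholders, not the notebook's actual identifiers:

```python
from io import BytesIO
import requests
from PIL import Image

# Illustrative only: the real Colab cells use their own variable names.
prompt = "a drawing of a cat"                 # the text you want to generate
style_url = "https://example.com/style.png"   # link to the style image to copy

# Fetch the style image from the link, as the notebook does with your URL
style_image = Image.open(BytesIO(requests.get(style_url).content)).convert("RGB")
# ...then rerun the generation cells with these inputs and tweak parameters
# such as the number of iterations or strokes.
```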