StyleCLIPDraw: Text-to-Drawing Synthesis with Artistic Control

Date: 2021-11-15

Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and its Google Colab notebook, available for everyone (see references).

References

►Read the full article: https://www.louisbouchard.ai/clipdraw/

►CLIPDraw: Frans, K., Soros, L.B. and Witkowski, O., 2021. CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders. https://arxiv.org/abs/2106.14843

►StyleCLIPDraw: Schaldenbrand, P., Liu, Z. and Oh, J., 2021. StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Synthesis. https://arxiv.org/abs/2111.03133

►CLIPDraw Colab notebook: https://colab.research.google.com/github/kvfrans/clipdraw/blob/main/clipdraw.ipynb

►StyleCLIPDraw code: https://github.com/pschaldenbrand/StyleCLIPDraw

►StyleCLIPDraw Colab notebook: https://colab.research.google.com/github/pschaldenbrand/StyleCLIPDraw/blob/master/Style_ClipDraw_1_0_Refactored.ipynb

Video Transcript

Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from only text, and you can try it right now with this new method and its Google Colab notebook, available for everyone. Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it. Look at that, such a big step forward! The results are extremely impressive, especially if you consider that they were made from a single line of text. Here, I tried imitating the same style with another text input. To be honest, sometimes it may look a bit all over the place, especially if you select a more complicated or messy drawing style like this one.

Speaking of something messy, if you are like me and your model versioning and resource tracking looks like this, you may be the perfect candidate to try the sponsor of today's video, which is none other than Weights & Biases. I always assumed I could stack folders like this and simply add "old", "v1", "v2", "v3", and so on to my file names without any problem, until I had to work with someone. While it may be easy for me to find my old tests, it was impossible to explain my thought process behind this mess, and it was my teammate's nightmare. If you care about your teammates and reproducibility, don't do like I did, and give Weights & Biases a shot. No more notebooks or results saved everywhere, as it creates a super friendly user dashboard for you and your team to track your experiments, and it's super easy to set up and use. It's the first link in the description, and I promise that within a month you will be completely dependent.

As we said, this new model by Peter Schaldenbrand et al., called StyleCLIPDraw, which is an improvement upon CLIPDraw by Kevin Frans et al., takes an image and text as inputs and can generate a new image based on your text and following the style in the image. So the model has to understand both what's in the text and what's in the image to correctly copy its style. As you may suspect, this is incredibly challenging, but we are fortunate enough to have a lot of researchers working on so many different challenges, like trying to link text with images, which is what CLIP can do.

CLIP is a model developed by OpenAI that can basically associate a line of text with an image. Both the text and the images are encoded similarly, so that they end up very close to each other in the new space they are encoded in if they both mean the same thing. Using CLIP, the researchers could understand the text from the user input and generate an image out of it. If you are not familiar with CLIP yet, I would recommend watching a video I made about it, together with DALL·E, earlier this year.

But then, how did they apply a new style to it? CLIP is just linking existing images to texts; it cannot create a new image. Indeed, we also need something else to capture the style of the image sent, in both the textures and shapes.

Well, the image generation process is quite unique. It won't simply generate an image right away. Rather, it will draw on a canvas and get better and better over time. It will just draw random lines at first and create an initial image. This new image is then sent back to the algorithm and compared with both the style image and the text, which will generate another version. This is one iteration. At each iteration, we draw random curves again, oriented by the two losses we'll see in a second. This random process is quite cool since it will allow each new test to look different. So, using the same image and the same text as inputs, you will end up with different results that may look even better.

Here, you can see a very important step called image augmentation. It will basically create multiple variations of the image and allow the model to converge on results that look right to humans, and not simply on the right numerical values for the machine. This simple process is repeated until we are satisfied with the results.

So this whole model learns on the fly over many iterations, optimizing the two losses we see here: one for aligning the content of the image with the text sent, and the other for the style. Here, you can see that the first loss is based on how close the CLIP encodings are, as we said earlier, where CLIP is basically judging the results, and its decision will orient the next generation. The second one is also very simple. We send both images into a pre-trained convolutional neural network, like VGG, which will encode the images similarly to CLIP. We then compare these encodings to measure how close they are to each other. This will be our second judge that will orient the next generation as well. This way, using both judges, we can get closer to the text and the wanted style at the same time in the next generation. If you are not familiar with convolutional neural networks and encodings, I strongly recommend watching the video I made explaining them in simple terms.

This iterative process makes the model a bit slow to generate a beautiful image, but after a few hundred iterations, or in other words, after a few minutes, you have your new image. And I promise it's worth the wait.
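To make the two judges a bit more concrete, here is a minimal Python sketch of how their two scores could be combined into a single number to minimize. It is not the authors' code: the function names, the `style_weight` parameter, and the use of a mean squared difference for the style term are assumptions for illustration, and the short lists of numbers stand in for the real CLIP and VGG encodings.

```python
import math


def cosine_distance(a, b):
    """Content judge: 0 when two encodings point the same way, larger otherwise."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def feature_distance(a, b):
    """Style judge (assumed form): mean squared difference between two encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(drawing_emb, text_emb, drawing_feats, style_feats, style_weight=1.0):
    """Combine both judges; style_weight is a hypothetical balancing knob."""
    content = cosine_distance(drawing_emb, text_emb)
    style = feature_distance(drawing_feats, style_feats)
    return content + style_weight * style


# Toy encodings: the drawing already matches the text, but not the style.
print(total_loss([1.0, 0.0], [2.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```

A lower total means the drawing is closer to both the text and the style image. Raising the hypothetical `style_weight` would favor the style over the content.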

It also means that it doesn't require any other training, which is pretty cool.
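Since nothing is trained, the only thing that changes from one iteration to the next is the drawing itself. The toy Python sketch below illustrates that idea under strong assumptions: a small fixed matrix (`FROZEN_WEIGHTS`) stands in for a pre-trained encoder, three numbers stand in for the curve parameters, and random jitter stands in for image augmentation. All names and values are made up for illustration and are not taken from the StyleCLIPDraw code.

```python
import random

# Hypothetical frozen "encoder": a fixed linear map standing in for a
# pre-trained network such as CLIP or VGG. These weights are never updated.
FROZEN_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]


def encode(params):
    """Encode the drawing parameters with the frozen weights."""
    return [sum(w * p for w, p in zip(row, params)) for row in FROZEN_WEIGHTS]


def loss_and_grad(params, target):
    """Squared distance to the target encoding and its gradient w.r.t. params."""
    residual = [e - t for e, t in zip(encode(params), target)]
    loss = sum(r * r for r in residual)
    grad = [
        2.0 * sum(FROZEN_WEIGHTS[i][j] * residual[i] for i in range(len(residual)))
        for j in range(len(params))
    ]
    return loss, grad


def optimize_drawing(target, steps=300, lr=0.1, copies=4, jitter=0.01, seed=0):
    """Update only the drawing parameters; the encoder stays untouched."""
    rng = random.Random(seed)
    params = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # random initial "curves"
    for _ in range(steps):
        avg_grad = [0.0] * len(params)
        # Stand-in for augmentation: average the gradient over jittered copies.
        for _ in range(copies):
            noisy = [p + rng.uniform(-jitter, jitter) for p in params]
            _, grad = loss_and_grad(noisy, target)
            avg_grad = [a + g / copies for a, g in zip(avg_grad, grad)]
        params = [p - lr * g for p, g in zip(params, avg_grad)]
    return params


final_params = optimize_drawing([0.2, -0.4])
print(loss_and_grad(final_params, [0.2, -0.4])[0])
```

Running `optimize_drawing([0.2, -0.4])` with two different `seed` values should give two different sets of parameters that both match the target closely, which mirrors how the same inputs can lead to different drawings.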

Now, the interesting part you've been waiting for: indeed, you can use it right now for free, or at least pretty cheaply, using the Colab notebook linked in the description below. I had some problems running it, and I would recommend buying the Pro version of Colab if you'd like to play with it without any issues. Otherwise, feel free to ask me any questions in the comments if you encounter any problems. I pretty much went through all of them myself. To use it, you simply run all cells like that, and that's it. You can now enter a new text for the generation or send a new image for the style from a link, and voilà! Now tweak the parameters and see what you can do. If you play with it, please send me the results on Twitter and tag me. I'd love to see them.

As they state in the paper, the results will have the same biases as the models they use, such as CLIP, which you should consider if you play with it.

Of course, this was a simple overview of the paper, and I strongly invite you to read both the CLIPDraw and StyleCLIPDraw papers for more technical details and to try their Colab notebooks. Both are linked in the description below. Thank you once again to Weights & Biases for sponsoring this video, and huge thanks to you for watching until the end. I hope you enjoyed this week's video. Let me know what you think and how you will use this new model!

[Music]





