Date: 2021-08-10

SDEdit Helps Regular People Do Complex Graphic Design Tasks

Say goodbye to complex GAN and transformer architectures for image generation. This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-provided input.

Even people like me with zero artistic skills can now generate beautiful images or modifications out of quick sketches. It may sound weird at first, but by just adding noise to the input, they can smooth out the undesirable artifacts, like the user edits, while preserving the overall structure of the image.

So the image now looks like this: almost complete noise, but we can still see some shapes of the image and strokes, and specific colors. This new noisy input is then sent to the model, which reverses the noising process and generates a new version of the image that follows this overall structure.

Meaning that it will follow the overall shapes and colors of the image, but loosely enough that it can create new features, like replacing this sketch with a real-looking beard. Learn more in the video and watch the amazing results!
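Here is a toy illustration of why a heavily noised image still carries its overall shapes and colors. It is a sketch under assumptions, not code from the SDEdit repository: the two-tone 64x64 "image", the noise level of 0.5, and the helper names (`make_image`, `add_noise`, `block_average`, `pearson`) are all made up for the example. It compares how well the noisy image matches the clean one pixel by pixel and after averaging over 8x8 blocks.

```python
import math
import random

def make_image(size=64):
    """Toy 'image': dark left half (0.2) and bright right half (0.8)."""
    return [[0.2 if col < size // 2 else 0.8 for col in range(size)]
            for _ in range(size)]

def add_noise(image, sigma, rng):
    """Perturb every pixel with independent Gaussian noise of std sigma."""
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in image]

def block_average(image, block=8):
    """Average non-overlapping block x block patches (a crude coarse view)."""
    size = len(image)
    out = []
    for r in range(0, size, block):
        for c in range(0, size, block):
            patch = [image[i][j]
                     for i in range(r, r + block)
                     for j in range(c, c + block)]
            out.append(sum(patch) / len(patch))
    return out

def flatten(image):
    """Turn a 2-D list of pixels into a flat list."""
    return [px for row in image for px in row]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)                      # fixed seed for repeatability
clean = make_image()
noisy = add_noise(clean, sigma=0.5, rng=rng)

# Fine scale: compare individual pixels. Coarse scale: compare block averages.
pixel_corr = pearson(flatten(clean), flatten(noisy))
coarse_corr = pearson(block_average(clean), block_average(noisy))
print("pixel-level correlation:", round(pixel_corr, 3))
print("block-level correlation:", round(coarse_corr, 3))
```

Individual pixels are dominated by the noise, yet the block averages should still track the original layout closely, which is the kind of coarse structure a denoising model can build on.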

►Read the full article: https://www.louisbouchard.ai/image-synthesis-from-sketches/

►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

►SDEdit, Chenlin Meng et al., 2021, https://arxiv.org/pdf/2108.01073.pdf

►Project link: https://chenlin9.github.io/SDEdit/

►Code: https://github.com/ermongroup/SDEdit

►Demo: https://colab.research.google.com/drive/1KkLS53PndXKQpPlS1iK-k1nRQYmlb4aO?usp=sharing

Video Transcript

say goodbye to complex GAN and

transformer architectures for image

generation

this new method by Chenlin Meng et al. from

stanford university and carnegie mellon

university can generate new images from

any user based inputs even people like

me

with zero artistic skills can now

generate beautiful images

or modifications out of quick sketches

it may sound weird at first but just by

adding noise to the input

they can smooth out the undesirable

artifacts like the user edits

while preserving the overall structure

of the image so the image now looks like

this

complete noise but we can still see some

shapes of the image and strokes and

specific colors

this new noisy input is then sent to the

model to reverse this process

and generate a new version of the image

following this overall structure

meaning that it will follow the overall

shapes and colors

of the image but loosely enough that

it can create

new features like replacing the sketch

with a real looking beard

the same way you can send a complete

draft of an image like this

add noise to it and it will remove the

noise by simulating the reverse steps

this way it will gradually improve the

quality of the generated image following

a specific dataset style

from any input this is why you don't

need any drawing skills anymore

since it generates an image from noise

it has no idea about and doesn't need to know

the initial input before applying noise

this is a big difference and a huge

advantage compared to other generative

networks

like conditional GANs where you train a

model to go from one style to another

with image pairs coming from two

different but related data sets

by the way if you find this interesting

don't forget to subscribe like the video

and share it with your friends or

colleagues

it helps a lot thank you this model

called SDEdit

uses stochastic differential equations

or SDEs

which means that by injecting Gaussian

noise they transform

any complex data distribution into a

known prior

distribution this known prior

distribution is seen

during training and this is what the

model is trained on to reconstruct the

image

so the model learns how to transform

this Gaussian noisy input

into a less noisy image and repeats this

step until we have an image

following the desired style this method

works with whatever type of input

because if you add enough noise to it

the image will become so noisy that it

joins the known distribution

then the model can take this known

distribution and

do the reverse steps denoising the image

based on what it was trained on
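The noise-then-denoise loop described above can be sketched on a one-dimensional toy problem. Everything here is an assumption made for illustration: the "dataset" is a Gaussian centred at 2.0, the linear noise schedule `sigma(t)` and `SIGMA_MAX` are hypothetical, and `score` is the exact score of the noised toy data, standing in for the trained network. The update is a simple discretized reverse-time step in the style of variance-exploding SDE samplers, not the authors' implementation.

```python
import math
import random

DATA_MEAN, DATA_STD = 2.0, 0.5   # toy "dataset": samples from N(2.0, 0.5^2)
SIGMA_MAX = 4.0                  # hypothetical noise scale at t = 1

def sigma(t):
    """Assumed noise schedule: std of the injected noise grows linearly in t."""
    return SIGMA_MAX * t

def score(x, t):
    """Exact score of the noised toy data, standing in for a trained network."""
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + sigma(t) ** 2)

def sdedit_toy(guide, t0, steps, rng):
    """Noise the guide up to level t0, then denoise step by step back to t = 0."""
    x = guide + sigma(t0) * rng.gauss(0.0, 1.0)        # forward: add noise
    for n in range(steps, 0, -1):                      # reverse: remove it
        t, t_prev = t0 * n / steps, t0 * (n - 1) / steps
        eps = math.sqrt(sigma(t) ** 2 - sigma(t_prev) ** 2)
        x = x + eps ** 2 * score(x, t) + eps * rng.gauss(0.0, 1.0)
    return x

def mean_output(guide, t0, trials=500, steps=100, seed=0):
    """Average result over many independent runs from the same guide."""
    rng = random.Random(seed)
    return sum(sdedit_toy(guide, t0, steps, rng) for _ in range(trials)) / trials

guide = -1.0                              # an "unrealistic" input, far from the data
heavy = mean_output(guide, t0=0.5)        # a lot of noise before denoising
light = mean_output(guide, t0=0.025)      # only a little noise before denoising
print("heavy noise, mean output:", round(heavy, 2))
print("light noise, mean output:", round(light, 2))
```

In this toy setting, starting from a high noise level pulls the result toward the training data, while a low noise level keeps it close to the input. The starting noise level `t0` is therefore the knob that trades faithfulness to the input against realism.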

indeed just like GANs we need a target

dataset

which is the kind of data or images we

want to generate

for example to generate realistic faces

we need a data set

full of realistic faces then we add

noise to these face

images and teach the model to denoise

them iteratively and this is the beauty

of this model

because once it has learned how to

denoise an image we can pretty much do

anything to the image

before adding noise to it like adding

strokes since they are blended within

the expected image distribution

from the noise we are adding typically

editing an image based on

such strokes is a challenging task for a

GAN architecture

since these strokes are extremely

different from the image and from what

the model has seen

during training a GAN architecture will

need two data sets to fix this

the target data set which will be the

one we try to imitate and a source data

set which is the images with strokes

that we are trying to edit these are

called paired

datasets because we need each image to

come in pairs

in both data sets to train our model on

we also need to define a proper loss

function to train it

making the image synthesis process very

expensive and time consuming

in our case with SDEdit we do not need

any paired data sets since the stroke

and the image styles are merged

because of this noise this makes the new

noisy image part of the known data

for the model which uses it to generate

a new image very similar to the training

data set

but taking the new structure into

account in other words

it can easily take an edited image as

input

blur it just enough but not too much to

keep global semantics and structural

detail

and denoise it to produce a new image

that magically takes your edits into

account

and the model wasn't even trained with

stroke or edit examples only with the

original images
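One common way to train a denoiser from clean images alone is denoising score matching. The sketch below is a toy one-dimensional version under stated assumptions: the Gaussian "dataset", the single noise level `SIGMA`, and the two hand-written candidate models (`good_score`, `bad_score`) are hypothetical and are not taken from the SDEdit code. It shows that the loss only ever needs clean samples plus freshly drawn noise, never strokes or edits.

```python
import random

DATA_MEAN, DATA_STD = 2.0, 0.5   # toy stand-in for "the original images"
SIGMA = 1.0                      # one hypothetical noise level

def dsm_loss(score_fn, clean_samples, rng):
    """Denoising score matching: noise a clean sample, ask the model for the
    direction back toward it, and penalise the squared error."""
    total = 0.0
    for x in clean_samples:
        z = rng.gauss(0.0, 1.0)
        x_noisy = x + SIGMA * z              # corrupt the clean sample
        target = -z / SIGMA                  # score of the Gaussian noising kernel
        total += SIGMA ** 2 * (score_fn(x_noisy) - target) ** 2
    return total / len(clean_samples)

def good_score(x):
    """Exact score of the noised toy data (what training should converge to)."""
    return -(x - DATA_MEAN) / (DATA_STD ** 2 + SIGMA ** 2)

def bad_score(x):
    """A deliberately wrong model that pulls toward 0 instead of the data mean."""
    return -x / (DATA_STD ** 2 + SIGMA ** 2)

data_rng = random.Random(0)
clean = [data_rng.gauss(DATA_MEAN, DATA_STD) for _ in range(5000)]

# Use the same noise seed for both models so the comparison is fair.
loss_good = dsm_loss(good_score, clean, random.Random(1))
loss_bad = dsm_loss(bad_score, clean, random.Random(1))
print("loss of the correct score:", round(loss_good, 3))
print("loss of the wrong score:  ", round(loss_bad, 3))
```

A real model would be a neural network trained by gradient descent over many noise levels. The point of the toy is only that the model that denoises toward the data gets the lower loss, using nothing but the original samples.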

of course in the case of a simple user

edit

they carefully designed the architecture

to only generate the edited part and not

recreate

the whole picture this is super cool

because it enables applications such as

conditional image generation

stroke based image synthesis and editing

image inpainting colorization and

other inverse problems to be solved

using a single unconditional model

without

retraining it of course this will still

work

for only one generation style which will

be the data set it was trained on

however it's still a big advantage as

you only need one data set

instead of multiple related data sets

with a GAN based

image inpainting network as we

discussed the only downside

may be the time needed to generate the

new image as

this iterative process takes much more

time than a single pass

through a more traditional GAN based

generative model

still i'd rather wait a couple of

seconds to have

great results for an image than having a

blurry fail

in real time you can try it yourself

with the code they made publicly

available

or use the demo on their website both

are linked in the description

let me know what you think of this model

i'm excited to see what will happen with

this

SDE based method in a couple of months or

even less

thank you for watching

