Say goodbye to complex GAN and transformer architectures for image generation. This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based input. Even people like me with zero artistic skills can now generate beautiful images, or modify existing ones, from quick sketches. It may sound strange at first, but just by adding noise to the input, they can smooth out the undesirable artifacts, like the user's edits, while preserving the overall structure of the image. The image then looks like complete noise, yet we can still make out some shapes, strokes, and specific colors. This noisy input is sent to the model, which reverses the process and generates a new version of the image following that overall structure: it respects the broad shapes and colors of the input, but not so precisely that it cannot create new features, like replacing a sketch with a real-looking beard. Learn more in the video and watch the amazing results!

Watch the video

References:
►Read the full article: https://www.louisbouchard.ai/image-synthesis-from-sketches/
►My Newsletter (a new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
►SDEdit, Chenlin Meng et al., 2021: https://arxiv.org/pdf/2108.01073.pdf
►Project link: https://chenlin9.github.io/SDEdit/
►Code: https://github.com/ermongroup/SDEdit
►Demo: https://colab.research.google.com/drive/1KkLS53PndXKQpPlS1iK-k1nRQYmlb4aO?usp=sharing

Video Transcript

Say goodbye to complex GAN and transformer architectures for image generation. This new method by Chenlin Meng et al. from Stanford University and Carnegie Mellon University can generate new images from any user-based input. Even people like me with zero artistic skills can now generate beautiful images, or modify existing ones, from quick sketches. It may sound strange at first, but just by adding noise to the input, they can smooth out the undesirable artifacts, like the user edits, while preserving the overall structure of the image. So the image now looks like complete noise, but we can still see some shapes, strokes, and specific colors. This new noisy input is then sent to the model to reverse the process and generate a new version of the image following this overall structure, meaning that it will follow the overall shapes and colors of the image, but not so precisely that it cannot create new features, like replacing the sketch with a real-looking beard.

In the same way, you can send a complete draft of an image, add noise to it, and the model will remove the noise by simulating the reverse steps. It thus gradually improves the quality of the generated image, following the style of a specific dataset, from any input. This is why you don't need any drawing skills anymore: since the model generates the image from noise, it doesn't need to know anything about the initial input before the noise was applied. This is a big difference and a huge advantage compared to other generative networks, like conditional GANs, where you train a model to go from one style to another with image pairs coming from two different but related datasets.
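To make the idea concrete, here is a minimal sketch of that noise-then-denoise loop in PyTorch. It is a rough illustration, not the authors' implementation: SDEdit is formulated with a stochastic differential equation, while this sketch uses DDPM-style discrete steps, and `TinyEpsModel` is a hypothetical, untrained stand-in for the pretrained denoising network the method relies on.

```python
import torch

# Hypothetical, untrained stand-in for the pretrained denoising network
# (the real model is trained on the target dataset, e.g. faces). It takes
# a noisy image and a timestep and predicts the noise that was added.
class TinyEpsModel(torch.nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.net = torch.nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x, t):
        return self.net(x)

# Standard DDPM-style linear noise schedule (assumed here, not from the paper).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

def sdedit(guide, eps_model, t0=500):
    """Noise the user-edited image partway into the diffusion (enough to
    wash out stroke artifacts, little enough to keep shapes and colors),
    then run the reverse steps back to a clean image."""
    x = torch.sqrt(alpha_bars[t0]) * guide \
        + torch.sqrt(1.0 - alpha_bars[t0]) * torch.randn_like(guide)
    for t in range(t0, 0, -1):
        eps = eps_model(x, t)
        # Reverse step: remove the predicted noise...
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        # ...and re-inject a little randomness at every step but the last.
        if t > 1:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

guide = torch.rand(1, 3, 64, 64)        # a stroke-edited input image
result = sdedit(guide, TinyEpsModel())  # same shape, denoised output
```

The one knob that matters is `t0`, how far into the noising process you start: more noise gives the model more freedom to invent realistic details, while less noise stays more faithful to your edit.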
By the way, if you find this interesting, don't forget to subscribe, like the video, and share it with your friends or colleagues. It helps a lot, thank you!

This model, called SDEdit, uses stochastic differential equations, or SDEs, which means that by injecting Gaussian noise it transforms any complex data distribution into a known prior distribution. This known prior distribution is seen during training, and it is what the model is trained on to reconstruct the image. So the model learns how to transform this Gaussian noisy input into a less noisy image, and it repeats this step until we have an image following the target style. This method works with any type of input because, if you add enough noise, the image becomes so noisy that it joins the known distribution. The model can then take this known distribution and run the reverse steps, denoising the image based on what it was trained on.

Indeed, just like GANs, we need a target dataset, which is the kind of data or images we want to generate. For example, to generate realistic faces, we need a dataset full of realistic faces. Then we add noise to these face images and teach the model to denoise them iteratively. And this is the beauty of this model: once it has learned how to denoise an image, we can do pretty much anything to the image before adding noise to it, like adding strokes, since they get blended into the expected image distribution by the noise we add.

Typically, editing an image based on such strokes is a challenging task for a GAN architecture, since these strokes are extremely different from the images the model has seen during training. A GAN architecture needs two datasets to fix this: the target dataset, which is the one we try to imitate, and a source dataset, which contains the images with strokes that we are trying to edit. These are called paired datasets because each image has to come in pairs across both datasets to train our model. We also need to define a proper loss function to train it, making the image synthesis process very expensive and time-consuming.

In our case, with SDEdit, we do not need any paired datasets, since the stroke and image styles are merged by the noise. This makes the new noisy image part of the data the model knows, which it uses to generate a new image very similar to the training dataset while taking the new structure into account. In other words, it can take an edited image as input, blur it just enough, but not too much, to keep the global semantics and structural detail, and denoise it to produce a new image that magically takes your edits into account. And the model wasn't even trained on examples of strokes or edits, only on the original images.

Of course, in the case of a simple user edit, they carefully designed the process to regenerate only the edited part rather than recreating the whole picture.
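Continuing the sketch above (same schedule and stand-in model, and again an assumed approximation of the general idea rather than the authors' exact procedure), a mask can restrict generation to the edited region: at every reverse step, the pixels outside the mask are reset to a copy of the original image noised to the matching level, so only the edit is re-synthesized while the rest is preserved.

```python
def masked_sdedit(original, mask, eps_model, t0=400):
    """Re-synthesize only the region where mask == 1; everything else is
    pinned, at every step, to the appropriately noised original."""
    x = torch.sqrt(alpha_bars[t0]) * original \
        + torch.sqrt(1.0 - alpha_bars[t0]) * torch.randn_like(original)
    for t in range(t0, 0, -1):
        eps = eps_model(x, t)
        x = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 1:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
        # Overwrite the untouched region with the original image, noised
        # to the level the sampler expects at the next step.
        noised = torch.sqrt(alpha_bars[t - 1]) * original \
            + torch.sqrt(1.0 - alpha_bars[t - 1]) * torch.randn_like(original)
        x = mask * x + (1.0 - mask) * noised
    return x

# Smoke test: regenerate a 32x32 square in the middle of a random image.
img = torch.rand(1, 3, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0
out = masked_sdedit(img, mask, TinyEpsModel())
```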
This is super cool because it enables applications such as conditional image generation, stroke-based image synthesis and editing, image inpainting, colorization, and other inverse problems to be solved using a single unconditional model, without retraining it. Of course, this still works for only one generation style, which is the dataset it was trained on. It is still a big advantage, however, as you only need one dataset instead of the multiple related datasets a GAN-based image inpainting network requires, as we discussed. The only downside may be the time needed to generate the new image, as this iterative process takes much more time than a single pass through a more traditional GAN-based generative model. Still, I'd rather wait a couple of seconds to get great results than have a blurry image in real time.

You can try it yourself with the code they made publicly available, or use the demo on their website; both are linked in the description. Let me know what you think of this model. I'm excited to see what will happen with this SDE-based method in a couple of months, or even less. Thank you for watching!