Have you ever had an image you really liked and could only manage to find a small version of it, like the image above on the left? How cool would it be if you could take that image and make it look twice as good? That's great, but what if you could make it four or even eight times more high definition? Now we're talking, just look at that. Here we enhanced the resolution of the image by a factor of four, meaning that it has four times the height and width in pixels for more details, making it look a lot smoother. The best thing is that this is done within a few seconds, completely automatically, and works with pretty much any image. Oh, and you can even use it yourself with a demo they made available... Watch more results and learn about how it works in the video!

Watch the video

References
►Read the full article: https://www.louisbouchard.ai/swinir/
►Liang, J., Cao, J., Sun, G., Zhang, K., Van Gool, L. and Timofte, R., 2021. SwinIR: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1833-1844).
►Code: https://github.com/JingyunLiang/SwinIR
►Demo: https://replicate.ai/jingyunliang/swinir
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

Have you ever had an image and really liked it, but couldn't manage to find a better version than this? How cool would it be if you could take this image and make it look twice as good? It would be great, but what if I could make it even four or eight times more high definition? Now we are talking, just look at that. Here we enhance the resolution of the image by a factor of 4, meaning that we have 4 times the height and width in pixels for more details, making it look a lot smoother. The best thing is that this is done within a few seconds, completely automatically, and works with pretty much any image. Oh, and you can even use it yourself with a demo they made available, as we will see during the video.

Speaking of enhancing resolution, I'm always looking to enhance different aspects of how I work and share what I make. If you are working on machine learning problems, there is no better way to enhance your workflows than with this episode's sponsor, Weights & Biases. Weights & Biases is an MLOps platform where you can keep track of your machine learning experiments, insights, and ideas. A feature I especially love is how you can quickly create and share amazing-looking interactive reports like this one, clearly showing your team or future self your runs, metrics, hyperparameters, and data configurations, alongside any notes you had at the time. Capturing and sharing your work is essential if you want to grow as an ML practitioner, which is why I highly recommend using tools that improve your work, like Weights & Biases. Just try it with the first link below, and I will owe you an apology if you haven't been promoted within a year!

Before getting into this amazing model, we first have to introduce the concept of upsampling, or image super-resolution. The goal here is to construct a high-resolution image from a corresponding low-resolution input image, which is a face in this case, but it can be any object, animal, or landscape.
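To make the task concrete, here is a minimal PyTorch sketch of the naive alternative: bicubic interpolation. It only spreads the existing pixels out, which is exactly the blurriness super-resolution models try to fix. The random tensor is just a stand-in for a real image, not anything from the paper's code.

```python
import torch
import torch.nn.functional as F

# Naive baseline: bicubic interpolation spreads existing pixels out, so the
# 4x result stays smooth but blurry. A super-resolution model like SwinIR
# instead predicts plausible high-frequency detail. The random tensor is a
# stand-in for a real (batch, channels, height, width) image.
low = torch.rand(1, 3, 512, 512)
up = F.interpolate(low, scale_factor=4, mode="bicubic", align_corners=False)
print(up.shape)  # torch.Size([1, 3, 2048, 2048]) -- 4x per side, 16x the pixels
```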
The low resolution will be something like 512 pixels wide or smaller. Not that blurry, but it's clearly not high definition when you have it full screen. Just take a second to put the video on full screen and you'll see the artifacts. While we are at it, you should also take a few more seconds to like the video and send it to a friend or two. I'm convinced they will love this and will thank you for it!

Anyway, we take this low-definition image and transform it into a high-definition image with a much clearer face, in this case a 2048-pixel square image, which is 4 times more HD. To achieve that, we usually have a typical U-Net-like architecture with convolutional neural networks, which I covered in many videos before, like the one appearing in the top right corner of your screen, if you'd like to learn more about how they work. The main downside is that CNNs have difficulty adapting to extremely broad datasets, since they apply the same kernels to all images. That makes them great for local results and generalization, but less powerful when we want the best overall result for each individual image. On the other hand, transformers are a promising architecture thanks to the self-attention mechanism capturing global interactions between contexts for each image, but their heavy computations are not suitable for images.

Here, instead of using CNNs or transformers alone, they created the same U-Net-like architecture with both convolution and attention mechanisms, or more precisely, using the Swin Transformer architecture. The Swin Transformer is amazing since it combines the advantages of CNNs, processing images of larger sizes and preparing them for the attention mechanisms, with attention mechanisms that create long-range connections, so that the model understands the overall image much better and can also recreate the same image in a better way. I won't go into the details of the Swin Transformer, as I already covered this architecture a few months ago and explained its differences with CNNs and the classical transformer architectures used in natural language processing. If you'd like to learn more about it and how the researchers applied transformers to vision, check out that video and come back for the explanation of this upsampling model.

The model is called SwinIR and can do many tasks, including image upsampling. As I said, it uses convolutions to allow for bigger images. More precisely, they use a convolutional layer to reduce the size of the image, which you can see here. This reduced image is then sent into the model and also passed directly to the reconstruction module, to give the model general information about the image. As we will see in a few seconds, this representation will basically look like many weird, blurry versions of the image, giving the upscaling module valuable information about how the overall image should look. Then we see the Swin Transformer layers coupled with convolutions. This is to compress the image further and extract ever more valuable, precise information about both the style and the details, while forgetting about the overall image. This is why we then add the convolved image back through a skip connection, restoring the overall information we lack. All of this is finally sent into a reconstruction module called sub-pixel, which looks like this and uses both the larger, general features and the smaller, detailed features we just created to reconstruct a higher-definition image. You can see this as a convolutional neural network in reverse, or simply a decoder, taking the condensed features we have and reconstructing a bigger image from them. Again, if you'd like to learn more about CNNs and decoders, you should check some of the videos I made covering them.

So you basically send your image into a convolutional layer and save this new representation for later, while also sending it through the Swin Transformer architecture to condense the information further and learn the most important features to reconstruct. Then you take these new features along with the saved ones and use a decoder to reconstruct the high-definition version. And voilà! Now you only need enough data and you will have results like this.
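If you are curious what "attention restricted to local windows" looks like in code, here is a minimal sketch. It is deliberately simplified: the real Swin Transformer also shifts the windows between layers and adds relative position biases, and SwinIR stacks such blocks into residual groups, none of which is shown here.

```python
import torch
import torch.nn as nn

def window_partition(x, ws):
    # (B, H, W, C) -> (B * num_windows, ws * ws, C); H and W must divide by ws
    B, H, W, C = x.shape
    x = x.reshape(B, H // ws, ws, W // ws, ws, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

def window_reverse(w, ws, H, W):
    # Inverse of window_partition: stitch the windows back into an image grid
    B = w.shape[0] // ((H // ws) * (W // ws))
    w = w.reshape(B, H // ws, W // ws, ws, ws, -1)
    return w.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, -1)

class WindowAttentionBlock(nn.Module):
    # Self-attention computed inside each ws x ws window, so the cost grows
    # with the window area instead of the whole image area. That is what
    # makes attention affordable on large images.
    def __init__(self, dim, ws=8, heads=4):
        super().__init__()
        self.ws = ws
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                        # x: (B, C, H, W)
        B, C, H, W = x.shape
        y = self.norm(x.permute(0, 2, 3, 1))     # channels-last for LayerNorm
        w = window_partition(y, self.ws)
        w, _ = self.attn(w, w, w)                # attention within each window
        y = window_reverse(w, self.ws, H, W)
        return x + y.permute(0, 3, 1, 2)         # residual connection

blk = WindowAttentionBlock(dim=64)
print(blk(torch.rand(1, 64, 32, 32)).shape)      # torch.Size([1, 64, 32, 32])
```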
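And here is a toy, end-to-end version of the data flow just recapped: shallow conv, deep feature stack, skip connection, sub-pixel reconstruction. The plain convolutions standing in for the deep Swin feature extractor and all the layer counts are my own simplifications, not SwinIR's actual configuration; nn.PixelShuffle is PyTorch's sub-pixel rearrangement operation.

```python
import torch
import torch.nn as nn

class TinySRNet(nn.Module):
    """Toy sketch of the data flow described above (not SwinIR itself)."""
    def __init__(self, dim=64, scale=4):
        super().__init__()
        # Shallow feature extraction: one conv, saved for the skip connection.
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)
        # Deep feature extraction: plain convs standing in for the residual
        # Swin Transformer blocks of the real model.
        self.deep = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.GELU(),
            nn.Conv2d(dim, dim, 3, padding=1),
        )
        # Sub-pixel reconstruction: each stage expands channels 4x, then
        # PixelShuffle(2) rearranges them into a 2x larger image.
        # Assumes the scale is a power of two.
        up = []
        for _ in range(scale.bit_length() - 1):
            up += [nn.Conv2d(dim, 4 * dim, 3, padding=1), nn.PixelShuffle(2)]
        up.append(nn.Conv2d(dim, 3, 3, padding=1))
        self.reconstruct = nn.Sequential(*up)

    def forward(self, x):                  # x: (B, 3, H, W) low-resolution
        shallow = self.shallow(x)          # general, overall information
        deep = self.deep(shallow)          # style and detail features
        feats = shallow + deep             # skip connection restores global info
        return self.reconstruct(feats)     # (B, 3, scale*H, scale*W)

net = TinySRNet()
out = net(torch.rand(1, 3, 128, 128))
print(out.shape)  # torch.Size([1, 3, 512, 512]) -- 4x upscaling
```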
Of course, as with all research, there are some limitations. In this case, probably due to the initial convolutional layer, it doesn't work really well with very small images: under 200 pixels wide, you may see artifacts and weird results like this one appear. It also seems like you can remove wrinkles using the bigger upscalers, which can be a useful artifact if that's what you are looking for! Other than that, the results are pretty crazy, and having played with it a lot over the past few days, I can say the four-times upscaling is incredible. And you can play with it too: they made the GitHub repo available to everyone, with pre-trained models and even a demo you can try right away without any code.

Of course, this was just an overview of this amazing new model, and I strongly invite you to read their paper for a deeper technical understanding. Everything is linked in the description. Let me know what you think, and I hope you've enjoyed this video. Thank you once again, Weights & Biases, for sponsoring this video, and to anyone still watching, see you next week with another exciting paper!