You can apply any design, lighting, or graphics style to your 4K image in real time using this new machine learning-based approach! If you think this looks interesting, watch the video on this topic and read more about it from the references below 👇

Watch the video

References
►Read the full article: https://www.louisbouchard.ai/4k-image-translation-in-real-time/
►Liang, Jie and Zeng, Hui and Zhang, Lei, (2021), "High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network", https://export.arxiv.org/pdf/2105.09188.pdf
►Code: https://github.com/csjliang/LPTN

Video Transcript

You've all seen these kinds of pictures where a person's face is "toonified" into an anime character. Many of you must have seen other kinds of image transformations like this, where an image is changed to follow the style of a certain artist. An even more challenging task is something like this, where an image is transformed into another season or time of day.

What you have not seen yet is the time it takes to produce these results and the actual resolution of the produced pictures. This new paper is completely transparent about both, because it attacks exactly this problem: compared to most approaches, it translates high-definition 4K images, and it does so in real time. In this work, the authors show their results on season translation, day-and-night translation, and photo retouching, which is what you have been looking at for the last minute. This task is also known as 'image-to-image translation', and all the results you see here were produced in 4K. Of course, this video is not in 4K, and the images were taken from their paper, so they might not look that high-quality here. Please look at their paper or try their code if you are not convinced!

These are the most impressive results of the paper. Here, you can see their technique, called LPTN, which stands for Laplacian Pyramid Translation Network. Look at how much less time LPTN takes to produce the image translations, while most approaches cannot even run at this resolution because it is too computationally demanding. And yes, this is in seconds. They can translate 4K images in less than a tenth of a second using a single regular GPU. It is also faster than all these approaches on 480p image translations, and not just eight times faster, but 80 times faster on average!

But how is that possible? How can they be so much more efficient and still produce such high-quality results? This is achieved by exploiting the fact that illumination and color manipulation, which relates to the style of an image, is contained in the low-frequency component of the image, whereas the content details, which we want to keep when translating an image into another style, can be adaptively refined on the high-frequency components. This is where it becomes interesting: these two components can be handled as two separate tasks performed simultaneously on the GPU.
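To make this split concrete, here is a minimal Python sketch of a single-level frequency decomposition. It is not the authors' code: OpenCV and the placeholder file name "photo.jpg" are assumptions for illustration only. The low-frequency part is a blurred, downsampled copy that carries the global colors and illumination, and the high-frequency part is the residual detail.

```python
import cv2
import numpy as np

# Placeholder input path; any photo works. Work in float so the residual can be negative.
img = cv2.imread("photo.jpg").astype(np.float32)

# Low-frequency component: Gaussian blur + 2x downsample.
# This small, smooth image carries the global colors and illumination.
low = cv2.pyrDown(img)

# High-frequency component: everything the blurred copy lost,
# i.e. the edges and fine textures we want to preserve.
upsampled = cv2.pyrUp(low, dstsize=(img.shape[1], img.shape[0]))
high = img - upsampled

# The two parts can now be processed independently (restyle `low`,
# lightly refine `high`) and summed back into a full-resolution image.
reconstructed = np.clip(upsampled + high, 0, 255).astype(np.uint8)
```

Summing the two parts back together reproduces the original image, which is why this decomposition can be used without losing detail.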
In practice, they split the image into low-frequency and high-frequency components. A network processes the low-frequency component, which carries the style of the image, and the final image is rendered by merging this processed style with the high-frequency component, the details of the image, adapted by a smaller sub-network to fit the new style. This avoids the heavy computation that would be needed if the high-resolution components had to go through the whole network.

This kind of decomposition has been studied for a long time and is achieved with a popular technique called the Laplacian pyramid. The main idea of the Laplacian pyramid is to decompose the image into high- and low-frequency bands and reconstruct it afterward. First, we produce an averaged version of the initial image, making it blurry and removing the high-frequency components. This is done with a kernel that slides over the whole image, averaging small patches of pixels together. For example, with a 3 by 3 kernel, each pixel is replaced by the average of its 3 by 3 neighborhood, smoothing out any value that stands out from its neighbors. We are basically blurring the image by softening its edges. Then, the difference between this blurry image and the initial image is saved and re-introduced at the end of the algorithm to restore the details, which are the high-frequency components. This is repeated three times, producing smaller and smaller low-frequency versions of the image with fewer and fewer high-frequency details.

If you remember, these low-frequency versions of the image contain the information about its colors and illumination. Indeed, they are basically just blurred, low-quality versions of our image, which is why the model is so much more efficient. This is convenient: not only are they much smaller than the original image, they also hold exactly the information we are trying to change when translating the image into another style. Using these low-frequency versions is therefore much more computationally efficient than using the whole image directly, and they are focused on the information we actually want to change, which is why the results are so good.

This lower-quality version of the image can be easily translated using an encoder-decoder, just like any other image translation technique we previously mentioned, but since it works on a much smaller, lower-quality image, it is far faster to process. The best thing is that the fine detail in the results comes from the high-frequency components saved at the start, which are never processed by the whole network. This high-frequency information is simply merged with the translated low-frequency image at the end of the process to restore the details.

Basically, it is so much faster because the researchers split the image's information in two: general low-frequency information and detailed high-frequency information. Then, they send through the main network only the computation-friendly part, which is exactly what we want to transform: the blurry, low-quality general style of the image, or in other words, the low-frequency information.
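Here is a rough Python sketch of that whole pipeline: build a small Laplacian pyramid, edit only the tiny low-frequency level, and reconstruct. It is only an illustration, not the LPTN implementation: OpenCV and the file names are placeholders, the hand-written color shift stands in for the learned encoder-decoder, and the high-frequency bands are left untouched here, whereas LPTN adaptively refines them with a lightweight sub-network.

```python
import cv2
import numpy as np

def build_laplacian_pyramid(img, levels=3):
    """Split `img` into `levels` high-frequency detail bands plus one
    small low-frequency residual, as described above."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)  # blur + 2x downsample
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)  # saved detail (high frequency)
        current = down
    pyramid.append(current)  # blurry low-frequency residual
    return pyramid

def reconstruct(pyramid):
    """Merge the bands back together, re-introducing the saved details."""
    current = pyramid[-1]
    for high in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(high.shape[1], high.shape[0]))
        current = current + high
    return np.clip(current, 0, 255).astype(np.uint8)

img = cv2.imread("photo.jpg")  # placeholder input path
pyr = build_laplacian_pyramid(img, levels=3)

# Toy "translation": warm up and brighten only the tiny low-frequency level.
# A real method would run a learned encoder-decoder on this level instead.
low = pyr[-1]
low[..., 2] *= 1.15          # boost the red channel (OpenCV stores BGR)
low[..., 0] *= 0.90          # reduce the blue channel
pyr[-1] = np.clip(low + 10.0, 0, 255)  # slight global brightening

cv2.imwrite("translated.jpg", reconstruct(pyr))  # placeholder output path
```

Even with this toy edit, the reconstructed image keeps its edges and textures while its overall tone changes, because only the low-frequency level was modified and the saved detail bands were merged back in at the end. That is exactly the property the method relies on, with the expensive work happening only on the smallest version of the image.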
From there, only fast and straightforward transformations are applied to the high-frequency parts of the image to resize them and merge them with the newly stylized, blurry image, improving the results by adding back the details along all the edges in the picture. And voilà! You have your results in a fraction of the time and with a fraction of the computational power normally needed. This is brilliant, and the code is publicly available if you would like to try it, which is always cool! As always, the links to the complete article and references are in the description of the video. Thank you for watching!