You can apply any design, lighting, or graphics style to your 4K image in real-time using this new machine learning-based approach! If you think this looks interesting, watch the video on this topic and read more in the references below 👇
►Read the full article: https://www.louisbouchard.ai/4k-image-translation-in-real-time/
►Liang, J., Zeng, H., & Zhang, L. (2021). "High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network". https://export.arxiv.org/pdf/2105.09188.pdf
►Code: https://github.com/csjliang/LPTN
00:00
You've all seen these kinds of pictures where a person's face is "toonified" into an anime
00:05
character.
00:06
Many of you must have seen other kinds of image transformations like this, where an
00:10
image is changed to follow the style of a certain artist.
00:13
An even more challenging task could be something like this, where an image is
00:18
transformed into another season or time of day.
00:21
What you have not seen yet is the time it takes to produce these results and the actual
00:25
resolutions of the produced pictures.
00:28
This new paper is completely transparent about this, as it attacks exactly this problem.
00:33
Indeed, compared to most approaches, they translate high-definition 4K images, and this
00:38
is done in real-time.
00:39
In this work, they showed their results on season translation, night and day translations,
00:45
and photo retouching, which you've been looking at for the last minute.
00:49
This task is also known as 'image-to-image translation', and all the results you see
00:53
here were produced in 4K.
00:55
Of course, this video is not in 4K, and the images were taken from their paper, so it
01:00
might not look that high-quality here.
01:03
Please look at their paper or try their code if you are not convinced!
01:07
These are the most amazing results of this paper.
01:09
Here, you can see their technique below called LPTN, which stands for Laplacian Pyramid Translation
01:16
Network.
01:17
Look at how much less time it took LPTN to produce these image translations, while most
01:22
approaches cannot even handle them, because this level of definition is just too computationally
01:27
demanding.
01:28
And yes, this is in seconds.
01:30
They could translate 4K images in not even a tenth of a second using a single regular
01:36
GPU.
01:37
It is faster than all these approaches on 480p image translations!
01:41
And yes, it is not eight times faster, but 80 times faster on average!
01:46
But how is that possible?
01:48
How can they be so much more efficient and still produce amazing and high-quality results?
01:53
This is achieved by exploiting the fact that illumination and color manipulation, which
01:58
relate to the style of an image, are contained in the low-frequency component of an image,
02:03
whereas the content details, which we want to keep when translating an image into another
02:08
style, can be adaptively refined on the high-frequency components.
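As a tiny, self-contained illustration of that claim (not the paper's method), you can blur an image to isolate its low-frequency part, change the illumination only there, and add the untouched high-frequency detail back; assuming a PyTorch tensor image, a sketch could look like this:

```python
import torch
import torch.nn.functional as F

# Toy illustration: edit colour/illumination on a blurred, low-frequency
# copy of the image, then add the high-frequency detail back.
img = torch.rand(1, 3, 512, 512)                        # stand-in for a photo

low = F.avg_pool2d(img, kernel_size=4)                  # low-frequency (style) part
high = img - F.interpolate(low, size=img.shape[-2:],    # high-frequency (detail) part
                           mode='bilinear', align_corners=False)

low_edit = (low * 1.3).clamp(0, 1)                      # e.g. brighten the style part only
out = F.interpolate(low_edit, size=img.shape[-2:],
                    mode='bilinear', align_corners=False) + high
# `out` has the new illumination, but the edges and textures come back
# untouched from the high-frequency part.
```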
02:13
This is where it becomes interesting.
02:15
These two components can be processed as two separate tasks that can be performed simultaneously
02:19
by the GPU.
02:21
Indeed, they split the image into low-frequency and high-frequency components, use a network
02:26
to process the low-frequency information, which carries the style of the image,
02:30
and render a final image by merging this processed style with the refined high-frequency component,
02:37
which holds the details of the image but is adapted by a smaller sub-network to fit the new style.
02:43
This dodges the unavoidably heavy computation of processing the high-resolution
02:48
components through the whole network.
02:50
This kind of decomposition has long been studied and is achieved with a popular technique called the Laplacian
02:55
Pyramid.
02:57
The main idea of this Laplacian pyramid method is to decompose the image into high and low-frequency
03:02
components and reconstruct it afterward.
03:05
First, we produce an averaged version of the initial image, making it blurry and removing the high-frequency
03:11
components.
03:12
This is done using a kernel that passes over the whole image to average patches of pixels
03:17
together.
03:18
For example, if they take a 3 by 3 kernel, it would go through the whole image averaging
03:23
each 3 by 3 patch and smoothing out the fine details.
03:26
They are basically blurring the image by softening the edges.
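To make that concrete, here is a minimal PyTorch sketch of such an averaging kernel, a plain 3 by 3 box blur; the exact smoothing kernel used in the paper may differ:

```python
import torch
import torch.nn.functional as F

# Toy example: blur an image with a 3x3 averaging kernel.
# Each output pixel becomes the mean of the 3x3 patch around it,
# which softens edges and removes high-frequency detail.
img = torch.rand(1, 3, 256, 256)                      # (batch, channels, H, W)
kernel = torch.ones(3, 1, 3, 3) / 9.0                 # one uniform 3x3 kernel per channel
blurred = F.conv2d(img, kernel, padding=1, groups=3)  # depthwise averaging, same size out
```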
03:30
Then, the difference between this blurry image and the initial image is saved to use at the
03:35
end of the algorithm to re-introduce the details, which are the high-frequency components.
03:41
This is repeated three times with bigger and bigger averaging kernels, producing smaller
03:47
and smaller low-frequency versions of the image with less and less high-frequency
03:52
detail.
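As a rough sketch of this decomposition, assuming PyTorch tensors, the loop below blurs and downsamples the image three times and stores the detail lost at each step; average pooling stands in for the paper's fixed smoothing kernel, so this is illustrative rather than the authors' implementation:

```python
import torch
import torch.nn.functional as F

def laplacian_decompose(img, levels=3):
    """Split an image (B, C, H, W) into a small low-frequency image plus
    a list of high-frequency residuals, one per level."""
    residuals = []
    low = img
    for _ in range(levels):
        down = F.avg_pool2d(low, kernel_size=2)          # blur + downsample
        up = F.interpolate(down, size=low.shape[-2:],
                           mode='bilinear', align_corners=False)
        residuals.append(low - up)                       # the detail lost by blurring
        low = down                                       # keep shrinking
    return low, residuals

# A 4K frame shrinks to 480x270 after three levels; this tiny image is
# all the heavy translation network has to look at.
frame = torch.rand(1, 3, 2160, 3840)
low, residuals = laplacian_decompose(frame)
print(low.shape)                          # torch.Size([1, 3, 270, 480])
print([tuple(r.shape[-2:]) for r in residuals])
```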
03:53
If you remember, these low-frequency versions of the image contain information about the
03:57
colors and illumination of the image.
03:59
Indeed, they are basically just a blurred low-quality version of our image, which is
04:03
why the model is so much more efficient.
04:06
This is convenient since they are smaller versions of the image, and this is the exact
04:11
information we are trying to change when translating the image into another style.
04:16
This means that using these low-frequency versions is much more computationally efficient than
04:21
using the whole image directly; they are also focused on exactly the information we want to
04:26
change in the image, which is why the results are so good.
04:30
This lower-quality version of the image can be easily translated using an encoder-decoder,
04:35
just like in any other image translation technique we previously mentioned, but since it is done
04:40
on a much smaller, lower-resolution image,
04:43
it is far faster to process.
04:47
The best thing is that the detail in the results depends mainly on the initially saved
04:52
high-frequency components of the input image, which are not processed throughout
04:57
the whole network.
04:59
This high-frequency information is simply merged at the end of the processing with the
05:03
low-frequency image to improve the details.
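Continuing the toy sketch from the decomposition above, this final merge might look as follows; `translate_low` stands for whatever encoder-decoder translates the small low-frequency image and `refine` for the lightweight sub-network that adapts the details, both hypothetical placeholders rather than the paper's actual modules:

```python
import torch.nn.functional as F

def reconstruct(translated_low, residuals, refine=None):
    """Merge the translated low-frequency image with the saved
    high-frequency residuals, from coarsest to finest."""
    out = translated_low
    for high in reversed(residuals):                      # coarsest residual first
        out = F.interpolate(out, size=high.shape[-2:],
                            mode='bilinear', align_corners=False)
        if refine is not None:
            high = refine(high)                           # lightweight detail adaptation
        out = out + high                                  # re-introduce the details
    return out

# Putting it together with the earlier decomposition sketch:
#   low, residuals = laplacian_decompose(frame)
#   result = reconstruct(translate_low(low), residuals)
```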
05:06
Basically, it is so much faster because the researchers split the image's information
05:11
in two: low-frequency general information and detailed high-frequency information.
05:17
Then, they send only the computation-friendly part of the image through the network, which is exactly what we
05:23
want to transform: the blurry, low-quality general style of the image, or in other words,
05:29
the low-frequency information.
05:30
After that, only fast and straightforward transformations are done on the high-frequency parts of the
05:36
image to resize them and merge them with the blurry, newly-stylized image,
05:42
improving the results by adding back the details along all the edges in the picture.
05:46
And voilà!
05:47
You have your results with a fraction of the time and computational power needed.
05:51
This is brilliant, and the code is publicly available if you would like to try it, which
05:56
is always cool!
05:58
As always, the links to the complete article and references are in the description of the
06:02
video.
06:03
Thank you for watching!