
High-Resolution Photorealistic Image Translation in Real Time

by Louis Bouchard, May 29th, 2021

Too Long; Didn't Read

You can apply any design, lighting, or graphics style to your 4K image in real time using this new machine learning-based approach. If you think this looks interesting, watch the video on this topic and read more about it from the references below. They could translate 4K images in less than a tenth of a second using a single regular GPU, and they are even faster than all existing approaches on 480p image translations: 80 times faster on average! But how is that possible?


You can apply any design, lighting, or graphics style to your 4K image in real-time using this new machine learning-based approach! If you think this looks interesting, watch the video on this topic and read more about it from the references below 👇

Watch the video

References

►Read the full article: https://www.louisbouchard.ai/4k-image-translation-in-real-time/

►Liang, Jie and Zeng, Hui and Zhang, Lei, (2021), "High-Resolution Photorealistic Image Translation in Real-Time: A Laplacian Pyramid Translation Network", https://export.arxiv.org/pdf/2105.09188.pdf

►Code: https://github.com/csjliang/LPTN

Video Transcript

You've all seen these kinds of pictures where a person's face is "toonified" into an anime character. Many of you must have seen other kinds of image transformations like this, where an image is changed to follow the style of a certain artist. Here, an even more challenging task could be something like this, where an image is transformed into another season or time of the day. What you have not seen yet is the time it takes to produce these results and the actual resolutions of the produced pictures.

This new paper is completely transparent about this, as it attacks exactly this problem. Indeed, compared to most approaches, they translate high-definition 4K images, and this is done in real time. In this work, they showed their results on season translation, night-and-day translation, and photo retouching, which you've been looking at for the last minute. This task is also known as 'image-to-image translation', and all the results you see here were produced in 4K. Of course, this video is not in 4K, and the images were taken from their paper, so it might not look that high-quality here. Please look at their paper or try their code if you are not convinced!

These are the most amazing results of this paper. Here, you can see their technique, called LPTN, which stands for Laplacian Pyramid Translation Network. Look at how much less time it took LPTN to produce the image translations, where most approaches cannot even handle them because this amount of definition is just too computationally demanding. And yes, this is in seconds. They could translate 4K images in less than a tenth of a second using a single regular GPU. It is faster than all these approaches on 480p image translations! And yes, it is not eight times faster, but 80 times faster on average! But how is that possible? How can they be so much more efficient and still produce amazing, high-quality results?

This is achieved by exploiting the fact that illumination and color manipulation, which relates to the style of an image, is contained in the low-frequency component of an image, whereas the content details, which we want to keep when translating an image into another style, can be adaptively refined on the high-frequency components. This is where it becomes interesting. These two components can be handled as two tasks that the GPU can perform simultaneously. Indeed, they split the image into low-resolution and high-resolution components, use a network to process the low-frequency information, which is the style of the image, and render a final image by merging this processed style with the refined high-frequency component, which holds the details of the image but is adapted by a smaller sub-network to fit the new style. This dodges the heavy computation that would otherwise be needed to push the high-resolution components through the whole network.
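
To make this split-and-merge idea concrete, here is a toy Python/OpenCV sketch of the concept, not the paper's actual network: the "style" edit, here just a warm color shift standing in for the learned translator, is applied only to a small, blurry copy of the image, and the untouched details are added back at the end. The file name and downscaling factor are placeholders.

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg").astype(np.float32) / 255.0  # hypothetical input image

# split: a small low-frequency copy plus the full-resolution details it misses
small = cv2.resize(img, None, fx=0.125, fy=0.125, interpolation=cv2.INTER_AREA)
blurry_full = cv2.resize(small, img.shape[1::-1], interpolation=cv2.INTER_LINEAR)
details = img - blurry_full                 # high-frequency residual, kept untouched

# cheap "translation" applied only to the small image: shift color/illumination (BGR order)
small_translated = np.clip(small * np.array([0.9, 1.0, 1.15]) + 0.05, 0, 1)

# merge: upsample the edited low-frequency image and re-attach the saved details
result = np.clip(cv2.resize(small_translated, img.shape[1::-1]) + details, 0, 1)
cv2.imwrite("translated.jpg", (result * 255).astype(np.uint8))
```

The expensive part of the edit touches an image 64 times smaller than the original, which is the whole point of the approach.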

Decomposing an image this way is a long-studied problem, typically tackled with a popular technique called the Laplacian pyramid. The main idea of this Laplacian pyramid method is to decompose the image into high- and low-frequency segments and reconstruct it afterward.

First, we produce an averaged version of the initial image, making it blurry and removing high-frequency components. This is done using a kernel that passes over the whole image, averaging small patches of pixels together. For example, with a 3 by 3 kernel, it would go through the whole image averaging 3 by 3 patches, smoothing out any isolated values. It basically blurs the image by softening the edges. Then, the difference between this blurry image and the initial image is saved, to be used at the end of the algorithm to re-introduce the details, which are the high-frequency components.
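
As a quick one-level illustration of that step in Python with OpenCV (the 3 by 3 averaging kernel and the file name are just examples): the blurred image is the low-frequency part, the saved difference is the high-frequency part, and adding them back together reproduces the original exactly.

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg").astype(np.float32)  # hypothetical input image

low = cv2.blur(img, (3, 3))   # 3x3 averaging kernel -> blurry, low-frequency version
high = img - low              # saved difference = the high-frequency details

# adding the saved details back gives the original image again
restored = low + high
assert np.allclose(restored, img)
```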

This is repeated three times with bigger and bigger averaging kernels, producing smaller and smaller low-frequency versions of the image with fewer and fewer high-frequency details. If you remember, these low-frequency versions of the image contain the information about colors and illumination. Indeed, they are basically just a blurred, low-quality version of our image, which is why the model is so much more efficient. This is convenient since they are smaller versions of the image, and this is exactly the information we are trying to change when translating the image into another style. This means that using these low-frequency versions is much more computationally efficient than using the whole image directly, and they are also focused on the information we want to change in the image, which is why the results are so great.
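
Here is a short sketch of such a three-level decomposition and its reconstruction, using OpenCV's standard pyrDown and pyrUp operators (the paper's own pyramid differs in its implementation details, and the input file is a placeholder): each level stores only the details lost by blurring and downsampling, and the last level is the small, blurry, low-frequency image that gets translated.

```python
import cv2
import numpy as np

def build_laplacian_pyramid(img, levels=3):
    highs, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)                         # blur + halve the resolution
        up = cv2.pyrUp(down, dstsize=current.shape[1::-1])  # back up to the current size
        highs.append(current - up)                          # details lost at this level
        current = down
    return highs, current            # per-level detail bands + small low-frequency image

def collapse_laplacian_pyramid(highs, low):
    current = low
    for high in reversed(highs):
        current = cv2.pyrUp(current, dstsize=high.shape[1::-1]) + high
    return current

img = cv2.imread("photo.jpg")                      # hypothetical input
highs, low = build_laplacian_pyramid(img)          # low is 8x smaller in each dimension
restored = collapse_laplacian_pyramid(highs, low)  # ~identical to img, up to float rounding
```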

This lower-quality version of the image can easily be translated using an encoder-decoder, just like in any other image translation technique we previously mentioned, but since it is done on a much smaller, lower-quality image, it is far faster to process.
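
For intuition, here is a deliberately tiny PyTorch encoder-decoder of the kind that could translate the low-frequency image. It is an assumption for illustration, not the authors' actual architecture: because it only ever sees the downsampled image, it stays cheap even when the original input is 4K.

```python
import torch
import torch.nn as nn

class TinyTranslator(nn.Module):
    def __init__(self, channels=3, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, width, 3, stride=2, padding=1), nn.LeakyReLU(0.2),  # encode / downsample
            nn.Conv2d(width, width, 3, padding=1), nn.LeakyReLU(0.2),               # transform the "style"
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),      # decode / upsample
            nn.Conv2d(width, channels, 3, padding=1), nn.Tanh(),
        )

    def forward(self, low_freq):
        # predict a residual so the translated image stays anchored to the input
        return low_freq + self.net(low_freq)

low_freq = torch.randn(1, 3, 270, 480)   # e.g. a 4K frame after three halvings: 3840x2160 -> 480x270
translated = TinyTranslator()(low_freq)  # same size as the input, cheap to compute
```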

The best thing is that the quality of the results only depends on the initially saved high-frequency versions of the image, which are not processed throughout the whole network. This high-frequency information is simply merged at the end of the processing with the low-frequency image to improve the details.
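
Below is a hedged sketch of that final merge step; the small sub-network that reweights the details is an assumed stand-in for the paper's lightweight refinement, not its exact layers. The translated low-frequency image is upsampled back to full resolution, the network predicts a per-pixel weighting of the saved details, and the weighted details are added on top.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighFreqRefiner(nn.Module):
    def __init__(self, channels=3, width=16):
        super().__init__()
        # tiny sub-network: looks at the detail band and the upsampled translated image,
        # and predicts a per-pixel weighting of the details (an assumed stand-in)
        self.to_mask = nn.Sequential(
            nn.Conv2d(2 * channels, width, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(width, channels, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, high, translated_low):
        # bring the translated low-frequency image back to full resolution
        up = F.interpolate(translated_low, size=high.shape[-2:],
                           mode="bilinear", align_corners=False)
        mask = self.to_mask(torch.cat([high, up], dim=1))
        return up + mask * high          # merged, detail-enhanced result

high = torch.randn(1, 3, 540, 960)             # saved high-frequency band (one pyramid level)
translated_low = torch.randn(1, 3, 270, 480)   # translated low-frequency image, one level smaller
out = HighFreqRefiner()(high, translated_low)  # full-resolution output: 1 x 3 x 540 x 960
```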

Basically, it is so much faster because the researchers split the image's information in two: the low-frequency general information and the detailed high-frequency information. Then, they send through the network only the computationally friendly part of the image, which is exactly what we want to transform: the blurry, low-quality, general style of the image, or in other words, the low-frequency information. Then, only fast and straightforward transformations are done on the high-frequency parts of the image to resize them and merge them with the blurry, newly stylized image, improving the results by adding details on all the edges in the picture. And voilà! You have your results with a fraction of the time and computational power needed.

This is brilliant, and the code is publicly available if you would like to try it, which is always cool! As always, the links to the complete article and references are in the description of the video. Thank you for watching!