This AI by Intel Makes Grand Theft Auto 5 Look Photorealistic by@whatsai

Is AI the future of video game design? This video is about a paper called Enhancing Photorealism Enhancement. EPE is an AI that can be applied live to a video game, transforming every frame to look much more natural. You can find out more about it below.

Louis Bouchard

I explain Artificial Intelligence terms and news to non-experts.

Is AI the future of video game design?

This video is about a paper called Enhancing Photorealism Enhancement. Enhancing Photorealism Enhancement, or EPE, is an AI that can be applied live to a video game, transforming every frame to look much more natural.

You can find out more about it below.

Watch the video

References:

►Read the full article: https://www.louisbouchard.ai/the-future-of-video-game

►Richter, Abu AlHaija, Koltun, (2021), "Enhancing Photorealism Enhancement", https://intel-isl.github.io/PhotorealismEnhancement/

Video Transcript

00:00

What you see here is gameplay from a very popular video game called GTA 5. It looks super realistic, but it is still obvious that this is a video game. Now, look at this... No, this is not real life. It is still the same GTA 5 gameplay, run through a new artificial intelligence model that enhances its graphics to make it look more like the real world!

00:23

Researchers from Intel Labs just published this paper, called Enhancing Photorealism Enhancement. And if you think this may be "just another GAN" that takes a picture of the video game as input and restyles it to match the natural world, let me change your mind. They worked on this model for two years to make it extremely robust. It can be applied live to the video game and transforms every frame to look much more natural.

00:50

Just imagine the possibilities where you can put a lot less effort into the game graphic,

00:54

make it super stable and complete, then improve the style using this model.

00:59

I think this is a massive breakthrough for video games, and this is just the first paper

01:03

attacking this same task applied specifically to video games!

01:07

I want to ask you a question that you can answer now or at the end of the video: do you think this is the future of video games? If you want more time to answer, that's perfect; let's get into the technique.

01:21

In general, this task is called image-to-image translation. You take an image and transform it into another, often using GANs, as I have covered numerous times in my previous videos. If you want an overview of how a typical GAN architecture works, I invite you to check out the video appearing in the top right corner, as I won't get into those details here.

01:42

As I said earlier, this model is different from basic image-to-image translation because it exploits the fact that it is applied to a video game. This is of enormous importance here: video games carry much more information than a simple picture, so why make the task harder by attempting realistic transformations from the snapshot alone? Instead, the model uses much more information already available for each frame of the game, such as surface normals, depth, materials, transparency, lighting, and even a segmentation map that tells you what the objects are and where they are. I'm sure you can already see how all this additional information can help with the task.
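To give a concrete sense of what feeding these extra buffers to a network could look like, here is a minimal sketch of stacking them into one multi-channel input. The buffer names and channel counts below are hypothetical illustrations, not the paper's actual shapes.

```python
import numpy as np

H, W = 4, 4  # a tiny frame for illustration

# Hypothetical G-buffers a game engine could expose for each frame
gbuffers = {
    "normals":      np.random.rand(H, W, 3),   # surface normals (x, y, z)
    "depth":        np.random.rand(H, W, 1),   # distance to the camera
    "materials":    np.random.rand(H, W, 4),   # e.g. glossiness, metallic, ...
    "segmentation": np.random.rand(H, W, 12),  # one channel per object class
}

# Concatenate along the channel axis to form a single network input
stacked = np.concatenate(list(gbuffers.values()), axis=-1)
assert stacked.shape == (H, W, 3 + 1 + 4 + 12)
```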

02:23

All these images are sent to a first network called the G-buffer encoder. It passes each of them independently through a classic convolutional network to extract and condense the valuable information from these different views of the initial image. This is done using multiple residual blocks, as you can see here, which is basically just a convolutional neural network architecture, more precisely a ResNet architecture.
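The "residual" in a residual block just means the input is added back onto the block's output, so the block only has to learn a correction. Here is a single-channel numpy toy of that idea, not the paper's actual layers:

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same'-padded 3x3 convolution on a (H, W) feature map."""
    H, W = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i+3, j:j+3] * w)
    return out

def residual_block(x, w1, w2):
    """y = x + conv(relu(conv(x))): the skip connection makes it 'residual'."""
    h = np.maximum(conv3x3(x, w1), 0.0)  # ReLU non-linearity
    return x + conv3x3(h, w2)

x = np.random.rand(8, 8)
# With all-zero weights the block reduces to the identity mapping:
assert np.allclose(residual_block(x, np.zeros((3, 3)), np.zeros((3, 3))), x)
```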

02:52

The information is extracted at multiple steps, as you can see, in order to obtain features from different stages of the process. Early features are vital in this task because they preserve spatial locations and fine details. Deeper features, in comparison, are essential for understanding the overall image and its style. A combination of both early and deep features is thus very powerful when used correctly!
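One simple way to picture "features at different stages" is a pyramid of progressively downsampled maps: the early, high-resolution levels keep spatial detail while the deeper, coarser levels summarize the whole image. A rough numpy sketch of that intuition (the real encoder uses learned convolutions, not plain pooling):

```python
import numpy as np

def downsample2x(x):
    """2x2 average pooling on a (H, W) map."""
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def feature_pyramid(x, levels=3):
    """Collect the map at several resolutions, fine to coarse."""
    feats = [x]
    for _ in range(levels - 1):
        feats.append(downsample2x(feats[-1]))
    return feats

feats = feature_pyramid(np.random.rand(16, 16))
assert [f.shape for f in feats] == [(16, 16), (8, 8), (4, 4)]
```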

03:20

Then all this information, referred to here as the G-buffer features, is sent to another network along with the original image from the game, called the rendered image. The different colors represent the G-buffer features extracted at the different scales we just saw, with the gray arrow showing the path of the actual image. Here again, you can see this as an enhanced version of the same residual blocks used in the G-buffer encoder, repeated multiple times, but with a little tweak to better adapt the G-buffer information before it is added to the process. This is done using what they refer to as RAD modules, which are again residual blocks built from convolutions and normalization layers.
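The paper's RAD modules are more involved than I can show here, but the core idea of normalizing the image features and then re-modulating them with values derived from the G-buffers can be sketched roughly as below. The scalar weights stand in for learned convolutions and are purely hypothetical:

```python
import numpy as np

def rad_like_modulation(img_feat, gbuf_feat, w_scale, b_scale, w_shift, b_shift):
    """Normalize the image features, then scale and shift them using
    values predicted from the G-buffer features (a simplified stand-in)."""
    scale = gbuf_feat * w_scale + b_scale  # learned in the real model
    shift = gbuf_feat * w_shift + b_shift
    norm = (img_feat - img_feat.mean()) / (img_feat.std() + 1e-5)
    return norm * scale + shift

img = np.random.rand(8, 8)
gbuf = np.random.rand(8, 8)
out = rad_like_modulation(img, gbuf, w_scale=0.0, b_scale=1.0, w_shift=0.0, b_shift=0.0)
# With unit scale and zero shift, the output is just the normalized features:
assert np.allclose(out, (img - img.mean()) / (img.std() + 1e-5))
```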

04:02

As I mentioned, this architecture is a bit more complicated than a simple encoder-decoder architecture like a regular GAN. Similarly, the training process is more elaborate.

04:12

Here, you can see two metrics: the realism score and the LPIPS score. The realism score is basically the GAN part of the training process. It compares real-world images both to game images and to enhanced game images, helping the model learn to produce a realistic, enhanced version of the game image it is given. The LPIPS component, in contrast, is a known loss used to retain the structure of the rendered image as much as possible. It scores the difference between corresponding pixels of the rendered image and the enhanced image, penalizing the network when it generates an image that spatially differs from the original. Both metrics work together to improve the overall results during the training of this algorithm.
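Put together, the generator's objective might look roughly like the sketch below. This is a simplified stand-in: the real LPIPS loss compares deep network features rather than raw pixels, the realism term comes from a learned discriminator, and the weighting is invented for illustration:

```python
import numpy as np

def structure_loss(rendered, enhanced):
    """Stand-in for LPIPS: penalize the enhanced frame for drifting
    spatially from the rendered frame (real LPIPS uses deep features)."""
    return float(np.mean((rendered - enhanced) ** 2))

def realism_loss(disc_score):
    """GAN-style term: push the discriminator's score on the enhanced
    frame towards 'real' (score -> 1)."""
    return float(-np.log(disc_score + 1e-8))

def generator_loss(rendered, enhanced, disc_score, lam=10.0):
    """Both terms work together: look real, but keep the structure."""
    return realism_loss(disc_score) + lam * structure_loss(rendered, enhanced)

frame = np.random.rand(8, 8)
# An unchanged frame incurs no structure penalty:
assert structure_loss(frame, frame) == 0.0
```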

05:04

Of course, as always, you need a large enough dataset of both the real world and the game, since the model won't generate anything it has never seen before.

05:14

And now, do you think this kind of model is the future of video games? Has your opinion changed after seeing this video? As always, the references are linked in the description below, and the full article, with more information, is available on my website, louisbouchard.ai. Thank you for watching!


