This AI by Intel Makes Grand Theft Auto 5 Look Photorealistic

Written by whatsai | Published 2021/05/28
Tech Story Tags: artificial-intelligence | ai | future-of-ai | ai-trends | ai-applications | gans | game-development | hackernoon-top-story | web-monetization

TLDR: Is AI the future of video game design? This video is about a paper called Enhancing Photorealism Enhancement. EPE is an AI model that can be applied live to a video game, transforming every frame to look much more natural. The full video transcript and a breakdown of how it works are below.

Is AI the future of video game design?
This video is about a paper called Enhancing Photorealism Enhancement. Enhancing Photorealism Enhancement, or EPE, is an AI model that can be applied live to a video game to transform every frame so it looks much more natural.
You can find out more about it below.

Watch the video

References:

►Richter, Abu AlHaija, Koltun, (2021), "Enhancing Photorealism Enhancement", https://intel-isl.github.io/PhotorealismEnhancement/

Video Transcript

 00:00
What you see here is the gameplay of a very popular video game called GTA 5.
00:04
It looks super realistic, but it is still obvious that this is a video game.
00:09
Now, look at this...
00:10
No, this is not real life.
00:13
It is still the same GTA 5 gameplay that went through a new model using artificial intelligence
00:18
to enhance its graphics and make it look more like the real world!
00:23
The researchers from Intel Labs just published this paper called Enhancing Photorealism Enhancement.
00:29
And if you think that this may be "just another GAN," taking a picture of the video game as
00:35
an input and changing it following the style of the natural world, let me change your mind.
00:40
They worked on this model for two years to make it extremely robust.
00:44
It can be applied live to the video game and transform every frame to look much more natural.
00:50
Just imagine the possibilities: you could put far less effort into the game's graphics,
00:54
make it super stable and complete, then improve the style using this model.
00:59
I think this is a massive breakthrough for video games, and this is just the first paper
01:03
attacking this task specifically for video games!
01:07
I want to ask you a question that you can already answer or wait until the end of the
01:11
video to answer: Do you think this is the future of video games?
01:16
If you want more time to answer, that's perfect, let's get into this technique.
01:21
In general, this task is called image-to-image translation.
01:24
You take an image and transform it into another, often using GANs, as I covered numerous times
01:30
in my previous videos.
01:31
If you want an overview of how a typical GAN architecture works, I invite you to check
01:35
out this video appearing on the top right corner as I won't get into the details of
01:40
how it works here.
01:42
As I said earlier, this model is different from basic image-to-image translation because it
01:47
uses the fact that it is applied to a video game.
01:50
This is of enormous importance here as video games have much more information than a simple
01:55
picture, so why make the task harder by attempting realistic transformations using
02:01
only the snapshot as input?
02:03
Instead, they use much more information already available for each image of the game like
02:08
the surface normals, depth information, materials, transparency, lighting, and even a segmentation
02:15
map which tells you what and where the objects are.
02:18
I'm sure you can already see how all this additional information can help with this
02:22
task.
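As a toy illustration, feeding these auxiliary buffers to a network amounts to stacking them channel-wise with the rendered frame, so every pixel carries geometry and semantics in addition to color. The shapes and the exact set of buffers here are illustrative assumptions, not the paper's precise inputs:

```python
import numpy as np

H, W = 4, 4  # toy frame size

# Per-pixel auxiliary buffers a game engine can expose "for free"
rendered = np.random.rand(3, H, W)   # RGB frame
normals  = np.random.rand(3, H, W)   # surface normals
depth    = np.random.rand(1, H, W)   # distance to the camera
seg_map  = np.random.rand(1, H, W)   # object class per pixel

# Channel-wise concatenation: one input tensor holding color,
# geometry, and semantics together
g_buffer_input = np.concatenate([rendered, normals, depth, seg_map], axis=0)
print(g_buffer_input.shape)  # (8, 4, 4)
```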
02:23
All these images are sent to a first network called the G-buffer Encoder.
02:29
This G-buffer encoder takes all this information, sends it into a classic convolutional network
02:34
independently to extract and condense all the valuable information from these different
02:40
versions of the initial image.
02:42
This is done using multiple residual blocks, as you can see here, which is basically just
02:47
a convolutional neural network architecture, and more precisely, a ResNet architecture.
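A residual block computes y = x + f(x): the input is carried through unchanged alongside a learned transformation. Here is a minimal NumPy caricature, with a toy elementwise map standing in for trained convolutions:

```python
import numpy as np

def residual_block(x, weight):
    """y = x + f(x): the skip connection lets gradients and
    fine detail pass through unchanged."""
    f_x = np.maximum(weight * x, 0.0)  # toy 'conv + ReLU' stand-in
    return x + f_x

x = np.array([1.0, -2.0, 3.0])
y = residual_block(x, weight=0.5)  # input preserved, transform added on top
```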
02:52
The information is extracted at multiple steps, as you can see.
02:56
This is done to obtain information at different stages of the process.
03:00
Early information is vital in this task because it preserves the spatial
03:05
location and the finer details.
03:08
In comparison, deeper information is essential to understand the overall image and its style.
03:14
A combination of both early and deep information is thus very powerful when used correctly!
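This multi-scale extraction can be sketched as keeping a feature map at each stage while the spatial resolution shrinks. Average pooling stands in here for the network's strided convolutions; everything is illustrative:

```python
import numpy as np

def downsample(x):
    """2x2 average pooling: halves resolution, keeps coarse structure."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

x = np.random.rand(8, 8)
features = [x]                                 # early: full resolution, fine detail
for _ in range(2):
    features.append(downsample(features[-1]))  # deeper = coarser, more global

print([f.shape for f in features])  # [(8, 8), (4, 4), (2, 2)]
```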
03:20
Then, all this information here, referred to as the G-buffer features, is sent to another
03:25
network with the original image from the game called the rendered image.
03:30
You can see the different colors representing the G-buffer information extracted from different
03:35
scales as we previously saw, with the gray arrow showing the process for the actual image.
03:40
Here again, you can see this as an enhanced version of the same residual blocks as for
03:46
the G-buffer encoder repeated multiple times, but with a little tweak to better adapt the
03:51
G-buffer information before being added to the process.
03:54
This is done using what they refer to as RAD here, which is again residual blocks, convolutions,
04:00
and normalization.
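As a rough sketch of that idea — adapt the G-buffer features first, then inject them into the image stream — here is a toy scale-and-shift modulation. This form is my simplification; the paper's actual RAD modules are learned residual blocks with convolutions and normalization:

```python
import numpy as np

def rad_modulate(img_feat, gbuf_feat, w_scale, w_shift):
    """Adapt G-buffer features into a per-element scale and shift,
    then apply them to the image features."""
    scale = 1.0 + w_scale * gbuf_feat   # toy linear stand-in for a learned map
    shift = w_shift * gbuf_feat
    return img_feat * scale + shift

img_feat  = np.ones((2, 2))
gbuf_feat = np.array([[0.0, 1.0], [2.0, 3.0]])
out = rad_modulate(img_feat, gbuf_feat, w_scale=0.1, w_shift=0.5)
```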
04:02
As I mentioned, this architecture is a bit more complicated than a simple encoder-decoder
04:06
architecture like a regular GAN's.
04:08
Similarly, the training process is also more elaborate.
04:12
Here, you can see two metrics: the realism score and the LPIPS score.
04:17
The realism score is basically the GAN section of the training process.
04:22
It compares a similar real-world image to the game image, and the same real image
02:28
to an enhanced game image,
02:29
helping the model learn how to produce a realistic and enhanced version of the game
02:34
image it is sent.
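One common way to score realism adversarially is a hinge loss on the discriminator's outputs. Here is a toy version over made-up scores — the paper's actual discriminator and loss details may differ:

```python
import numpy as np

def discriminator_hinge_loss(real_scores, fake_scores):
    """Push scores on real photos above +1 and scores on
    enhanced (fake) frames below -1."""
    loss_real = np.mean(np.maximum(0.0, 1.0 - real_scores))
    loss_fake = np.mean(np.maximum(0.0, 1.0 + fake_scores))
    return loss_real + loss_fake

real = np.array([1.5, 0.5])   # discriminator outputs on real photos
fake = np.array([-1.2, 0.3])  # outputs on enhanced game frames
loss = discriminator_hinge_loss(real, fake)
```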
04:36
The LPIPS component, meanwhile, is a well-known loss used to retain the structure of the rendered
04:41
image as much as possible.
04:43
This is achieved by giving a score based on the difference between the associated pixels
04:48
of the rendered image versus the enhanced image,
04:51
penalizing the network when it generates a new image that spatially differs from the
04:56
original image.
04:58
So both these metrics work together to improve the overall results during the training of
05:03
this algorithm.
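As a toy sketch of how the two terms combine: a crude pixel-space stand-in for LPIPS (the real metric compares deep network features) plus the adversarial realism term, with an assumed weight `lam`:

```python
import numpy as np

def lpips_like(rendered, enhanced):
    """Toy structural penalty: mean squared difference between
    corresponding pixels (real LPIPS compares deep features)."""
    return np.mean((rendered - enhanced) ** 2)

def total_loss(realism_loss, rendered, enhanced, lam=10.0):
    """Adversarial realism term plus structure-preserving term."""
    return realism_loss + lam * lpips_like(rendered, enhanced)

rendered = np.zeros((2, 2))
enhanced = np.full((2, 2), 0.1)   # small spatial drift from the original
loss = total_loss(realism_loss=0.9, rendered=rendered, enhanced=enhanced)
```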
05:04
Of course, as always, you need a large enough dataset of both the real world and the game,
05:10
since the model won't generate anything it has never seen before.
05:14
And now, do you think this kind of model is the future of video games?
05:18
Has your opinion changed after seeing this video?
05:21
As always, the references are linked in the description below, and the full article is
05:25
available on my website louisbouchard.ai with more information.
05:29
Thank you for watching!         



Written by whatsai | I explain Artificial Intelligence terms and news to non-experts.
Published by HackerNoon on 2021/05/28