I explain Artificial Intelligence terms and news to non-experts.
Is AI the future of video game design?
This video is about a paper called "Enhancing Photorealism Enhancement." The model it introduces, EPE, is an AI that can be applied live to a video game and transform every frame to look much more natural.
You can find out more about it below.
►Read the full article: https://www.louisbouchard.ai/the-future-of-video-game
►Richter, Abu AlHaija, Koltun, (2021), "Enhancing Photorealism Enhancement", https://intel-isl.github.io/PhotorealismEnhancement/
What you see here is the gameplay of a very popular video game called GTA 5.
It looks super realistic, but it is still obvious that this is a video game.
Now, look at this...
No, this is not real life.
It is still the same GTA 5 gameplay, run through a new model that uses artificial intelligence
to enhance its graphics and make it look more like the real world!
The researchers from Intel Labs just published this paper called Enhancing Photorealism Enhancement.
And if you think that this may be "just another GAN," taking a picture of the video game as
an input and changing it following the style of the natural world, let me change your mind.
They worked on this model for two years to make it extremely robust.
It can be applied live to the video game and transform every frame to look much more natural.
Just imagine the possibilities: you could put far less effort into the game's graphics,
make it super stable and complete, and then improve the style using this model.
I think this is a massive breakthrough for video games, and this is just the first paper
tackling this task specifically for video games!
I want to ask you a question that you can already answer or wait until the end of the
video to answer: Do you think this is the future of video games?
If you want more time to answer, that's perfect, let's get into this technique.
In general, this task is called image-to-image translation.
You take an image and transform it into another, often using GANs, as I covered numerous times
in my previous videos.
If you want an overview of how a typical GAN architecture works, I invite you to check
out this video appearing on the top right corner as I won't get into the details of
how it works here.
As I said earlier, this model is different from basic image-to-image translation, as it
exploits the fact that it is applied to a video game.
This matters enormously here: video games carry much more information than a simple
picture, so why make the task harder by attempting realistic transformations from
the snapshot alone?
Instead, they use much more information already available for each image of the game like
the surface normals, depth information, materials, transparency, lighting, and even a segmentation
map which tells you what and where the objects are.
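As a toy illustration of this idea (the buffer names and sizes here are made up for the example, not the game's actual G-buffer layout), the auxiliary buffers can be stacked channel-wise so the network sees geometry and semantics alongside the rendered colors:

```python
import numpy as np

# Hypothetical per-frame buffers a game engine can expose.
H, W = 64, 64
rendered = np.random.rand(H, W, 3)   # final RGB frame
normals  = np.random.rand(H, W, 3)   # surface normals
depth    = np.random.rand(H, W, 1)   # per-pixel depth
segmap   = np.random.rand(H, W, 1)   # semantic segmentation labels

# Stack the extra buffers channel-wise into one input tensor.
gbuffers = np.concatenate([normals, depth, segmap], axis=-1)
print(gbuffers.shape)  # (64, 64, 5)
```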
I'm sure you can already see how all this additional information can help with this task.
All these images are sent to a first network called the G-buffer Encoder.
This G-buffer encoder takes all this information and feeds each buffer independently
into a classic convolutional network to extract and condense all the valuable
information from these different versions of the initial image.
This is done using multiple residual blocks, as you can see here, which is basically just
a convolutional neural network architecture, and more precisely, a ResNet architecture.
The information is extracted at multiple steps, as you can see, to obtain features from different stages of the process.
Early features are vital in this task because they carry more information about spatial
location and the finer details.
In comparison, deeper information is essential to understand the overall image and its style.
A combination of both early and deep information is thus very powerful when used correctly!
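This multi-scale idea can be sketched as a toy feature pyramid (plain NumPy, with average pooling as a stand-in for the strided convolutions a real ResNet encoder would use):

```python
import numpy as np

def avg_pool2(x):
    """Halve spatial resolution with 2x2 average pooling."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    x = x[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))

feat = np.random.rand(64, 64, 8)   # stand-in for early conv features
pyramid = [feat]
for _ in range(3):                 # collect features at several stages
    pyramid.append(avg_pool2(pyramid[-1]))

# Early (high-res) maps keep spatial detail; deep (low-res) maps
# summarize the overall image and its style.
print([p.shape[:2] for p in pyramid])
# [(64, 64), (32, 32), (16, 16), (8, 8)]
```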
Then, all this information here, referred to as the G-buffer features, is sent to another
network with the original image from the game called the rendered image.
You can see the different colors representing the G-buffer information extracted from different
scales as we previously saw, with the gray arrow showing the process for the actual image.
Here again, you can see this as an enhanced version of the same residual blocks as in
the G-buffer encoder, repeated multiple times, but with a little tweak to better adapt
the G-buffer information before it is added to the process.
This is done using what they refer to as RAD modules, which are again residual blocks and convolutions, but ones that learn to modulate the image features based on the G-buffer information.
As I mentioned, this architecture is a bit more complicated than a simple encoder-decoder
architecture like a regular GAN's generator.
Similarly, the training process is also more elaborate.
Here, you can see two metrics: the realism score and the LPIPS score.
The realism score is basically the GAN section of the training process.
It compares a similar real-world image both to a game image and to an enhanced game
image, helping the model learn how to produce a realistic, enhanced version of the
game frame.
The LPIPS component, in contrast, is a well-known loss used to retain the structure of
the rendered image as much as possible.
This is achieved by giving a score based on the difference between the associated pixels
of the rendered image versus the enhanced image, penalizing the network when it generates
a new image that spatially differs from the rendered one.
So both these metrics work together to improve the overall results during the training of the model.
Of course, as always, you need a large enough dataset of real-world images and of game
images, as the model won't generate something it has never seen before.
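Putting the two terms together, a heavily simplified training objective might look like this (the structure term uses raw pixel differences as a stand-in for LPIPS, which actually compares deep network features, and the discriminator score is just a placeholder number):

```python
import numpy as np

rng = np.random.default_rng(1)

def realism_term(disc_score_enhanced):
    """GAN-style term: the generator is rewarded when the discriminator
    scores the enhanced frame as 'real' (score near 1)."""
    return -np.log(disc_score_enhanced + 1e-8)

def structure_term(rendered, enhanced):
    """Stand-in for LPIPS: penalize per-pixel deviation from the
    rendered frame so the enhanced image keeps the same layout."""
    return np.mean((rendered - enhanced) ** 2)

rendered = rng.random((32, 32, 3))
enhanced = rendered + rng.normal(scale=0.01, size=(32, 32, 3))
loss = realism_term(0.9) + 10.0 * structure_term(rendered, enhanced)
print(loss > 0)  # True
```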
And now, do you think this kind of model is the future of video games?
Has your opinion changed after seeing this video?
As always, the references are linked in the description below, and the full article is
available on my website louisbouchard.ai with more information.
Thank you for watching!