You've probably heard that the best players of games like Chess, Go, and even video games like DOTA, are actually AI players. Data scientists have created AI that can beat the best human players in these games, thanks to a technique called reinforcement learning.
In this video, you can discover how machines reach super-human performance in games and other domains by using reinforcement learning. This is a video submission for the IJCAI-21 AI Video Competition.
The article: https://www.louisbouchard.ai/reinforc...
By:
Malrick Costantini, http://www.malrickcostantini.com/
Elias Ilmari, https://nordicgrit.com/
Louis-François Bouchard, https://www.louisbouchard.ai/
Connect with us
Malrick, https://www.linkedin.com/in/malrick-c...
Elias, https://www.linkedin.com/in/eliasliin...
Louis, https://www.linkedin.com/in/whats-ai/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/
00:00
You have probably heard that the world champion of chess, Go, and even some video games like
00:04
Dota is a machine.
00:06
Recent progress in Artificial Intelligence allowed researchers to defeat the best human
00:10
players in the world in these games, thanks to a technique called Reinforcement Learning.
00:15
This same technique also allowed robots to walk, open doors, or even play soccer.
00:20
But what is this technique exactly?
00:22
This short article aims to introduce the basics of this technology and provide an overview
00:26
of how it works.
00:28
Reinforcement Learning is a technology inspired by living beings.
00:31
Living beings, in general, learn certain behaviors to obtain rewards or avoid punishment.
00:37
If you eat something tasty, you may want to eat it again.
00:40
If you touch a hot stove, it is quite likely that you’re not going to want to
00:44
do it again.
00:45
Reinforcement Learning is about doing the same thing: teaching machines how to obtain
00:48
positive rewards and avoid negative rewards.
00:51
We call these machines “agents.”
00:53
These agents evolve in an environment.
00:55
They are going to observe this environment and take action based on these observations.
01:00
Depending on the result of their actions, they will be given a reward, either positive
01:05
or negative.
01:06
At first, the agent will behave randomly, but it will become better and better through
01:10
trial and error.
01:11
In other words, the agent learns to maximize the amount of reward it gets throughout
01:16
its life.
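The observe, act, reward cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TwoButtonEnv`, `random_policy`, and `run` are hypothetical names made up for this sketch, and the toy world (one button that rewards, one that punishes) is not from the video.

```python
import random


class TwoButtonEnv:
    """Hypothetical toy environment: button 1 rewards, button 0 punishes."""

    def observe(self):
        # Nothing changes in this toy world, so the observation is constant.
        return "two buttons"

    def step(self, action):
        # Return the reward (positive or negative) for the chosen action.
        return 1.0 if action == 1 else -1.0


def random_policy(observation, rng):
    # An untrained agent: it ignores the observation and acts at random.
    return rng.choice([0, 1])


def run(env, policy, steps, seed=0):
    """Run the agent in the environment and return the total reward."""
    rng = random.Random(seed)
    total_reward = 0.0
    for _ in range(steps):
        observation = env.observe()        # 1. observe the environment
        action = policy(observation, rng)  # 2. take an action
        total_reward += env.step(action)   # 3. receive a reward
    return total_reward


print(run(TwoButtonEnv(), random_policy, steps=10))
```

A random agent collects a mix of positive and negative rewards here. Learning means replacing `random_policy` with one that uses past rewards to pick better actions.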
01:17
Let’s have a look at a simple example.
01:19
You are on an imaginary line, with a cake ready to eat on one side and a burning
01:24
campfire on the other side.
01:25
What would you do in this situation?
01:27
Typically, your answer would be to walk straight for the cake.
01:30
Otherwise, you will get hurt by walking into the campfire.
01:32
But how is a computer going to know this and learn the same decision-making process?
01:37
Through trial and error!
01:39
As we discussed, at first, the agent is going to behave randomly.
01:43
Half of the time, it will go to the left and the other half to the right.
01:46
But at some point, it will reach one of the rewards, either positive or negative.
01:51
At this moment, the agent learns that going to the left hurts, or, conversely, if it was
01:56
lucky enough, it learns how great a cake tastes.
01:58
That’s it!
02:00
Once it learns about these rewards, it can behave optimally in this environment
02:04
and go straight to the cake each time.
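Here is a minimal sketch of the cake-and-fire line in Python. The video does not name a specific algorithm, so this uses tabular Q-learning as one possible choice. The five-cell layout (fire at 0, start at 2, cake at 4), the reward values of +1 and -1, and the parameters `alpha` and `gamma` are all assumptions made for illustration.

```python
import random

FIRE, START, CAKE = 0, 2, 4  # positions on the line (assumed layout)
LEFT, RIGHT = 0, 1           # the two possible actions


def step(position, action):
    """Move one cell; return (new_position, reward, done)."""
    position += 1 if action == RIGHT else -1
    if position == CAKE:
        return position, 1.0, True
    if position == FIRE:
        return position, -1.0, True
    return position, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # q[position][action] = current estimate of how good that action is there.
    q = [[0.0, 0.0] for _ in range(CAKE + 1)]
    for _ in range(episodes):
        position, done = START, False
        while not done:
            # Behave randomly while learning: left or right, half the time each.
            action = rng.choice([LEFT, RIGHT])
            new_position, reward, done = step(position, action)
            # Q-learning update: nudge the estimate toward the reward just
            # received plus the discounted best estimate of what comes next.
            future = 0.0 if done else max(q[new_position])
            q[position][action] += alpha * (reward + gamma * future - q[position][action])
            position = new_position
    return q


def best_action(q, position):
    """Pick the action with the higher learned value at this position."""
    return RIGHT if q[position][RIGHT] > q[position][LEFT] else LEFT


q = train()
print([best_action(q, p) for p in (1, 2, 3)])
```

After training, the learned values favor walking right (toward the cake) from every position between the fire and the cake, even though the agent only ever moved at random while learning.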
02:06
This is a simple example, as the agent's only options are to go right or left.
02:11
However, usually, it would have many more possible paths.
02:15
Even if it has already found a good reward in such a complicated environment, it needs to
02:19
keep looking for better rewards.
02:21
In other words, maybe a bigger cake is waiting for us around the next corner, so from time
02:26
to time, we need to take the chance and have a look.
02:29
We can make a comparison with the real world.
02:31
If you’re like me, you are used to ordering the same pizza at the same pizzeria regularly,
02:36
but what if you try a new one once in a while?
02:39
You may appreciate it even more and decide that it’s your new favorite.
02:42
You would’ve never discovered this improvement without trying something new, even if you
02:47
already enjoyed the taste of the first one.
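The pizza analogy corresponds to what is usually called the exploration versus exploitation trade-off. One common way to handle it is an epsilon-greedy rule: most of the time pick the best option known so far, and once in a while try something at random. The sketch below assumes this rule. The function `order_pizzas`, the enjoyment scores, and the 10% exploration rate are hypothetical values chosen for illustration, not taken from the video.

```python
import random


def order_pizzas(enjoyment, orders=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy ordering; returns (estimated enjoyment, times ordered)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(enjoyment)  # what we currently believe about each pizza
    counts = [0] * len(enjoyment)       # how many times we ordered each pizza
    for _ in range(orders):
        if rng.random() < epsilon:
            choice = rng.randrange(len(enjoyment))    # explore: try any pizza
        else:
            choice = estimates.index(max(estimates))  # exploit: order the favorite
        reward = enjoyment[choice]
        counts[choice] += 1
        # Running average of the rewards seen so far for this pizza.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, counts


# Pizza 0 is the usual order; pizza 1 is the untried one (assumed scores).
print(order_pizzas([0.6, 0.9], epsilon=0.0))  # never explores
print(order_pizzas([0.6, 0.9], epsilon=0.1))  # tries something new now and then
```

With `epsilon=0.0` the agent never orders the second pizza and never learns it is better. With a small amount of exploration it eventually tries it and switches favorites.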
02:49
Of course, not every scenario is as simple, but in Reinforcement Learning, every problem
02:54
can be seen this way.
02:56
The only change with each new challenge the agent will face is the kind of environment
03:00
it will evolve in.
03:01
Whether it is a chessboard, a video game, or even the motor states of a robot learning
03:06
how to walk, the logic is the same: the agent tries things, sees how the environment reacts
03:12
to its actions, and adapts to do better in the future.
03:14
You can see reinforcement learning as machines learning in a Darwinian way.
03:15
If this makes sense to you, congratulations!
03:18
You now understand what reinforcement learning is and how it works.
03:21
There are, of course, more technical details, but this is the gist of this astonishing technique
03:26
and its amazing capabilities.