You've probably heard that the best players of games like chess, Go, and even video games like Dota are now AI players. Researchers have built AI systems that can beat the best human players at these games, thanks to a technique called reinforcement learning. In this video, you can discover how machines become superhuman in domains like these by using reinforcement learning. This is a video submission for the IJCAI-21 AI Video Competition.

References
The article: https://www.louisbouchard.ai/reinforc...
By: Malrick Costantini (http://www.malrickcostantini.com/), Elias Ilmari (https://nordicgrit.com/), and Louis-François Bouchard (https://www.louisbouchard.ai/)
Connect with us: https://www.linkedin.com/in/malrick-c... https://www.linkedin.com/in/eliasliin... https://www.linkedin.com/in/whats-ai/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/

Video Transcript

00:00 You have probably heard that the best players of chess, Go, and even some video games like Dota are machines. Recent progress in Artificial Intelligence has allowed researchers to build programs that defeat the best human players in the world at these games, largely thanks to a technique called Reinforcement Learning. This same technique has also taught robots to walk, open doors, and even play soccer. But what exactly is this technique? This short article introduces the basics of this technology and gives an overview of how it works.

00:28 Reinforcement Learning is a technology inspired by living beings. Living beings generally learn behaviors that bring them rewards or help them avoid punishment. If you eat something tasty, you may want to eat it again. If you touch a hot stove, you will probably not want to do it again.
00:45 Reinforcement Learning does the same thing: it teaches machines how to obtain positive rewards and avoid negative ones. We call these machines “agents.” These agents operate in an environment. They observe this environment and take actions based on these observations. Depending on the result of their actions, they receive a reward, either positive or negative. At first, an agent behaves randomly, but it gets better and better through trial and error. In other words, agents learn to maximize the total reward they collect over their lifetime.

01:17 Let’s look at a simple example. You are on an imaginary line, with a cake ready to eat on one side and a burning campfire on the other. What would you do? Most likely, you would walk straight to the cake rather than get burned by walking into the campfire. But how is a computer going to know this and learn the same decision-making process? Through trial and error! As we discussed, at first the agent behaves randomly: half of the time it goes left, and the other half it goes right. But at some point, it reaches one of the rewards, either positive or negative. At that moment, the agent learns that going left, toward the fire, hurts, or, if it was lucky enough, it learns how great the cake tastes. That’s it! Once it has learned about these rewards, it can behave optimally in this environment and go straight to the cake every time.

02:06 This example is simple because the agent can only go left or right. Usually, it would have many more possible paths. In such a complicated environment, even after the agent has found a good reward, it needs to keep looking for better ones.
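To make the cake-and-campfire example concrete, here is a minimal sketch of tabular Q-learning, one standard reinforcement learning algorithm (the article does not say which method its examples use). The five-cell line, the reward values (-1 for the fire, +1 for the cake), and the hyperparameters `alpha`, `gamma`, and `epsilon` are illustrative assumptions, as are the helper names `step`, `train`, and `greedy_policy`. The agent starts in the middle, acts randomly at first, and gradually learns which direction pays off.

```python
import random

# Hypothetical "imaginary line" world (an assumption for illustration):
# cells 0..4, campfire at the left end, cake at the right end, start in the middle.
N_CELLS = 5
FIRE, CAKE, START = 0, N_CELLS - 1, 2
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Move one cell and return (next_state, reward, done)."""
    nxt = state + action
    if nxt == FIRE:
        return nxt, -1.0, True   # burned: negative reward, episode ends
    if nxt == CAKE:
        return nxt, +1.0, True   # cake: positive reward, episode ends
    return nxt, 0.0, False       # empty cell: no reward yet


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: learn Q[state][action] purely by trial and error."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_CELLS)]  # all estimates start at zero
    for _ in range(episodes):
        state, done = START, False
        while not done:
            # Act randomly with probability epsilon, or when both actions look
            # equally good (which is always the case at the very beginning).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            nxt, reward, done = step(state, ACTIONS[a])
            # Nudge the estimate toward: reward now + discounted best value afterwards.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


def greedy_policy(q):
    """Preferred direction in each non-terminal cell after training."""
    return {s: ("left" if q[s][0] > q[s][1] else "right") for s in range(1, N_CELLS - 1)}


if __name__ == "__main__":
    print(greedy_policy(train()))  # {1: 'right', 2: 'right', 3: 'right'}
```

After training, every cell prefers "right", toward the cake. The discount factor `gamma` makes rewards that are further away worth less, which is why the values shrink the further a cell is from the cake; `epsilon` keeps a small amount of random exploration going even after the agent has found the cake.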
02:21 In other words, maybe a bigger cake is waiting around the next corner, so from time to time we need to take a chance and have a look. We can make a comparison with the real world. If you’re like me, you regularly order the same pizza at the same pizzeria. But what if you tried a new one once in a while? You might like it even more and make it your new favorite. You would never have discovered this improvement without trying something new, even though you already enjoyed the first one. In Reinforcement Learning, this balance between trying new options and sticking with the best known one is called the trade-off between exploration and exploitation.

02:49 Of course, not every scenario is this simple, but in Reinforcement Learning, every problem can be seen this way. What changes from one challenge to the next is the environment the agent operates in. Whether it is a chessboard, a video game, or even the states of the motors of a robot learning to walk, the logic is the same: the agent tries things, sees how the environment reacts to its actions, and adapts to do better in the future.

03:14 You can think of Reinforcement Learning as machines learning in a Darwinian way: behaviors that lead to rewards are kept, and the others are dropped. If this makes sense to you, congratulations! You now understand what Reinforcement Learning is and how it works. There are, of course, many more technical details, but this is the gist of this remarkable technique and its impressive capabilities.
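The "try a new pizza once in a while" idea is often implemented with an epsilon-greedy strategy. Below is a minimal sketch, assuming a hypothetical menu of three pizzas whose true average tastiness is hidden from the agent; the pizza names, their mean values, the noise range, and the function name `order_pizzas` are all invented for illustration. Most of the time the agent orders the pizza with the best average so far (exploitation); with probability `epsilon` it picks one at random (exploration).

```python
import random

# Hypothetical pizzas and their true average tastiness; the agent does not know these.
TRUE_MEANS = {"margherita": 0.5, "pepperoni": 0.4, "truffle": 0.9}


def order_pizzas(steps=2000, epsilon=0.1, seed=1):
    """Epsilon-greedy: usually order the best pizza so far, sometimes try a random one."""
    rng = random.Random(seed)
    names = list(TRUE_MEANS)
    counts = {n: 0 for n in names}        # times each pizza was ordered
    estimates = {n: 0.0 for n in names}   # running average of observed tastiness
    for _ in range(steps):
        if rng.random() < epsilon:
            choice = rng.choice(names)              # explore: try something new
        else:
            # Exploit: best average so far (ties go to the first listed, margherita).
            choice = max(names, key=estimates.get)
        # Noisy reward centered on the pizza's true mean.
        reward = rng.uniform(TRUE_MEANS[choice] - 0.1, TRUE_MEANS[choice] + 0.1)
        counts[choice] += 1
        # Incremental mean update: new_avg = old_avg + (reward - old_avg) / n
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return counts, estimates


if __name__ == "__main__":
    for eps in (0.0, 0.1):
        counts, _ = order_pizzas(epsilon=eps)
        print(f"epsilon={eps}: {counts}")
```

With `epsilon=0.0`, the agent orders the margherita on its first visit, finds it good enough, and never tries anything else. With `epsilon=0.1`, it eventually tastes the truffle pizza, discovers that it is better, and orders it most of the time afterwards.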


How Machines Beat Humans at Everything
