The brain of a human child is remarkable. Even in a situation it has never encountered before, it makes a decision based on what it already knows. Depending on the outcome, it learns and remembers the best choice for that particular scenario. At a high level, this kind of learning can be understood as a ‘trial and error’ process, in which the brain tries to maximise the occurrence of positive outcomes.
Reinforcement Learning grew out of a similar idea. An ideal machine is like a child’s brain, which can remember every decision it has taken in a given task, and its goal is likewise to optimise the results. In Reinforcement Learning, the learner isn’t told which action to take; instead, it has to discover by trying them which actions yield the maximum reward. In the most interesting and challenging cases, an action affects not only the immediate reward but also the next situation and, through it, all subsequent rewards. These two characteristics, ‘trial and error search’ and ‘delayed reward’, are the most distinguishing features of Reinforcement Learning.
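To make ‘delayed reward’ concrete, here is a minimal sketch of the discounted return, the quantity most Reinforcement Learning methods try to maximise: the sum of future rewards, each scaled by a discount factor raised to the number of steps until it arrives. The discount factor `gamma` (0.9 here) and the two reward sequences are illustrative assumptions, not something taken from the original discussion.

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r1 + gamma*r2 + gamma^2*r3 + ... for a sequence of rewards."""
    g = 0.0
    # Work backwards so each step only needs one multiply-add: G_t = r + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical example: a move that earns nothing now but sets up a big reward later...
delayed = [0.0, 0.0, 10.0]
# ...versus a greedy move that grabs a small reward immediately and nothing after.
greedy = [1.0, 0.0, 0.0]

print(discounted_return(delayed))
print(discounted_return(greedy))
```

The delayed sequence scores about 8.1 against 1.0 for the greedy one, so an agent maximising this return learns to accept a short-term nothing for a larger payoff later. A `gamma` close to 1 makes the agent far-sighted; a `gamma` near 0 makes it care almost only about the next reward.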
Many of us have heard of AlphaGo, built by Google’s DeepMind using Reinforcement Learning. In 2016 it defeated world champion Lee Sedol at Go, an abstract strategy board game. Elon Musk, in a well-known debate on AI with Jack Ma, argued that machines are becoming smarter than humans. Reinforcement Learning is one of the areas where machines have already shown that they can outperform humans.
Reinforcement Learning can be understood through the example of a video game. A typical video game consists of:
Fig: A Video Game Analogy of Reinforcement Learning
An agent (player) who moves around doing stuff.
An environment that the agent exists in (map, room).
An action that the agent takes (moves upward one space, sells cloak).
A reward that the agent acquires (coins, killing other players).
A state that the agent currently exists in (on a particular square of a map, part of a room).
A goal that the agent may have (level up, getting as many rewards as possible).
The agent runs through sequences of state-action pairs in the given environment and observes the rewards that result, in order to work out the best path to the goal.
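The loop described above can be sketched with tabular Q-learning, one common Reinforcement Learning algorithm (the article does not name a specific one). Everything here is a hypothetical setup for illustration: a five-square corridor where the agent starts on square 0, can step left or right, pays a small cost per move and is rewarded for reaching square 4; the hyperparameters `alpha`, `gamma` and `epsilon` are likewise assumed values.

```python
import random

N_STATES = 5          # hypothetical corridor: squares 0..4, square 4 is the goal
ACTIONS = (-1, +1)    # move one square left or right


def step(state, action):
    """Environment: apply the move and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # walls at both ends
    if next_state == N_STATES - 1:
        return next_state, 1.0, True     # reached the goal
    return next_state, -0.01, False      # small cost for every other move


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: estimated return for taking `action` in `state`, all starting at 0
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: usually exploit the best-known action, sometimes explore
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward, done = step(state, action)
            # Bootstrap from the best action in the next state (nothing follows the goal)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-goal square."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


q = q_learning()
print(greedy_policy(q))
```

The epsilon-greedy choice is the ‘trial and error’ part, and the `gamma * best_next` term is what lets the reward earned at the goal propagate back to earlier squares, so the learned policy should end up moving right from every square.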
The Markov decision process (MDP) lays the foundation for Reinforcement Learning: it formally describes an environment that is fully observable. There are two important parts of Reinforcement Learning:
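Returning to the Markov decision process: an MDP is usually specified by its states, its actions, the probability of each next state and reward given a state and action, and a discount factor. When all of these are known, the optimal value of each state can be computed with value iteration, which repeatedly applies the Bellman optimality update V(s) ← max over a of Σ p(s', r | s, a) · (r + γ · V(s')). The two-state MDP below, with states `A` and `B`, is a made-up example, and value iteration is one standard way to solve an MDP rather than a method specified in the original text.

```python
# Hypothetical MDP: transitions[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "A": {
        "stay": [(1.0, "A", 1.0)],
        "go": [(0.8, "B", 0.0), (0.2, "A", 0.0)],  # moving to B fails 20% of the time
    },
    "B": {
        "stay": [(1.0, "B", 2.0)],
        "go": [(1.0, "A", 0.0)],
    },
}


def q_value(v, outcomes, gamma):
    """Expected reward plus discounted value of the next state for one action."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)


def value_iteration(transitions, gamma=0.9, tol=1e-8):
    v = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            # Bellman optimality update: value of the best available action
            best = max(q_value(v, outcomes, gamma) for outcomes in actions.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best  # in-place update; still converges for gamma < 1
        if delta < tol:
            break
    # Read off the greedy policy from the converged values
    policy = {
        s: max(actions, key=lambda a: q_value(v, actions[a], gamma))
        for s, actions in transitions.items()
    }
    return v, policy


values, policy = value_iteration(TRANSITIONS)
print(values, policy)
```

Leaving `A` earns nothing immediately and only succeeds 80% of the time, yet the computed policy chooses `go` in `A` and `stay` in `B`, because `B` pays 2 on every step afterwards: another instance of delayed reward.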
Reinforcement Learning has found applications in many areas, from robotics and games to self-driving cars. Well-known researchers such as Andrew Ng, Andrej Karpathy and David Silver are betting big on its future.
To date, the idea of machines outsmarting humans in every field still seems far-fetched. But the seed has been sown, and companies like Google and Tesla have shown that when machines and humans work together, the future has many opportunities to offer. As far as Reinforcement Learning is concerned, we at Sigmoid are excited about its future and its game-changing applications.
About
Abhijeet is a Data Scientist at Sigmoid. He mainly works in the domain of Recommendation Engines, Time Series Forecasting, Reinforcement Learning and Computer Vision.
Previously published here.