Introduction

As humans, we learn from our experiences, and experience is gained through prolonged interaction with our environment. Be it designing, planning, playing, developing, or coding, the longer we engage in these activities, the better we become. As the popular saying goes, practice makes a man perfect.

Inspired by this trait of human learning, a separate branch of machine learning has emerged. This branch, which focuses on learning through interaction, is called Reinforcement Learning. From playing simple games to self-driving vehicles, reinforcement learning has found use in a wide range of applications.

Reinforcement Learning

To put it in a single statement: “Reinforcement Learning is learning what to do, in an uncertain environment, so as to maximize a reward signal”. Let us break this statement down to understand it better.

“Learning what to do” - Reinforcement learning is goal-based learning. This means that any learning problem in the reinforcement setting is set up to achieve a certain goal. This goal can be to win a game of chess or to balance a pole. Reinforcement learning aims to excel at this single task.

“In an uncertain environment” - Any reinforcement learning problem is set in an uncertain environment. With more and more interactions, the uncertainty of the environment is reduced. With better information about the environment, the learning agent can make informed decisions.

“Maximizing a reward signal” - A reward signal evaluates how desirable the state of the environment is. For any action that the learning agent takes, it receives feedback that informs it about the action performed. An action that brings the environment to a more acceptable state gives a higher reward. Thus, the learning agent interacts with its environment to maximize a reward signal. The reward signal is chosen in such a way that maximizing it aligns with the goal of the agent.

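
The interaction described above (the agent acts, the environment changes state and answers with a reward) can be sketched as a short loop. This is only an illustrative sketch: the LineWorld environment, the run_episode helper, and both policies are hypothetical names invented for this example, not part of any library.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..size-1, the goal is the last one."""

    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        goal = self.size - 1
        old_distance = goal - self.position
        # Clamp the move so the agent cannot leave the line.
        self.position = min(max(self.position + action, 0), goal)
        new_distance = goal - self.position
        # A move to a more acceptable state (closer to the goal) earns a higher reward.
        reward = old_distance - new_distance
        return self.position, reward, new_distance == 0


def run_episode(env, policy, max_steps=100):
    """Repeat the sense -> act -> reward cycle and return the accumulated reward."""
    state, total_reward = env.position, 0
    for _ in range(max_steps):
        action = policy(state)                  # the agent decides
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def random_policy(state):
    """An agent with no knowledge: pick a direction at random."""
    return random.choice([-1, 1])


def always_right(state):
    """An agent that already knows where the goal is."""
    return 1


if __name__ == "__main__":
    print(run_episode(LineWorld(), always_right))   # prints 4
    print(run_episode(LineWorld(), random_policy))
```

The reward here is chosen so that maximizing it coincides with reaching the goal, which mirrors the point made above about aligning the reward signal with the goal of the agent.
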
Agent-Environment interaction is a cycle in which the agent senses the environment, takes an appropriate action, and receives a reward from the environment for its action.

Closed-Loop Learning Problem

Reinforcement learning is also a closed-loop learning problem. By this, we mean that the actions taken by the learning agent at any time affect its future decisions as well. Any action taken by a reinforcement learning agent changes the environment, and the changed environment serves as the input for the agent's further decisions. Hence, we have a closed-loop problem.

Explore-Exploit Dilemma

Any reinforcement learning task involves an explore-exploit dilemma. To maximize the reward signal, the agent exploits its knowledge. But to expand its knowledge, the agent must explore. Exploring comes at the cost of diminished rewards in the short term, although in the long run the agent might benefit from it. Thus, there is a dilemma: neither exploration nor exploitation can be pursued exclusively without failing at the task itself. The decision between exploring and exploiting generally depends on many factors, all of which must be accounted for before choosing one path or the other.

Let us take an example to understand it better. Suppose you are playing a game in which N boxes are kept, each containing some amount of reward (think of it as money or gold). You have K moves, and in a single move you can open any one box. You are rewarded with the worth found in the box you open. Your goal is to maximize the accumulated reward. Note that the box is not emptied: you will receive the same reward if you choose the same box again in a later move.

In the game above, you can easily experience the explore-exploit dilemma. If you know that box 1 gives you a reward of 100, would you choose it in all K moves? Or would you risk opening box 2, which might contain a smaller or a larger reward?

Elements of Reinforcement Learning

We have been talking in abstract terms like agent and environment until now.

But now, we will formally define some of the elements of reinforcement learning. Broadly speaking, the following are the elements of reinforcement learning: Agent, Environment, Policy, Reward Signal, Value Function, and Model of Environment.

Agent is defined as anything that makes the decision of taking an action. An agent senses the environment and uses its knowledge to decide which action will provide the maximum reward signal. The ultimate goal of any reinforcement learning task is to make the agent learn how to achieve a goal.

Environment is everything outside of the agent. This does not necessarily mean the physical environment. It is more formally defined as anything over which the agent does not have arbitrary control. However, the agent can take actions that change the state of the environment. It is this state of the environment that the agent uses to make decisions.

Policy is the algorithm inside the agent that it uses to make any decision. As learning progresses, the agent learns the optimal policy, which gives it the maximum reward.

Reward Signal is the feedback that the agent receives from the environment for any action it takes. It is a numerical value that the agent wants to maximize over time. A higher reward signal means a step closer to the target.

Value Function is an estimate of the total reward that can be accumulated in the future, starting from the current state of the environment. It is much harder to estimate than the reward signal, and it is something the agent learns to predict over time.

Conclusion

Reinforcement learning is finding use in more and more applications. The fact that it does not need a labeled data set makes it suitable for many tasks that other branches of machine learning struggle to solve. With that, this article comes to an end. I hope you got to learn something new today!
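
As a parting sketch, the box game from the explore-exploit section can be simulated in a few lines. The strategy used here, commonly called epsilon-greedy, is an assumption of this example and is not prescribed by the article: with probability epsilon the player opens a random box (explore), otherwise the best box seen so far (exploit). The function name, parameters, and reward values are all hypothetical.

```python
import random


def play_box_game(box_rewards, num_moves, epsilon, seed=0):
    """Play the N-box game for K = num_moves moves and return the accumulated reward.

    box_rewards: fixed reward hidden in each box (boxes are never emptied).
    epsilon:     probability of exploring a random box on any given move.
    """
    rng = random.Random(seed)
    known = {}  # box index -> reward observed when that box was opened
    total = 0
    for _ in range(num_moves):
        if not known or rng.random() < epsilon:
            box = rng.randrange(len(box_rewards))  # explore: open any box
        else:
            box = max(known, key=known.get)        # exploit: best box seen so far
        reward = box_rewards[box]
        known[box] = reward
        total += reward
    return total


if __name__ == "__main__":
    boxes = [100, 20, 250]  # hypothetical rewards, unknown to the player
    for eps in (0.0, 0.1, 1.0):
        print(eps, play_box_game(boxes, num_moves=50, epsilon=eps))
```

With epsilon set to 0 the player never looks beyond the first box it happens to open, and with epsilon set to 1 it never uses what it has learned. Values in between trade some immediate reward for knowledge about the other boxes, which is exactly the dilemma described earlier.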

Reinforcement Learning: 'Practice Makes a Machine Perfect'
