
Reinforcement Learning Part 0

By Sanyam Bhutani (@init_27), on Hacker Noon

👨‍💻 H2Oai 🎙 CTDS.Show & CTDS.News 👨‍🎓 🎲 Kaggle 3x Expert

You can find me on Twitter @bhutanisanyam1 and connect with me on LinkedIn.

The series will take the form of a deep dive into the code, with explanations and walkthroughs alongside.

Next up is a series of posts that will walk you through a few core concepts in the most beginner-friendly way we can manage.

So what exactly is Reinforcement Learning?

The series will build up a mathematically and programmatically sound answer, but here is a fun one to begin with.

Imagine you're in a bakery and your Supervisor has told you to bake a delicious cake.

Your Supervisor is a rather strict person who leaves you to discover the best recipe on your own. Worse, since she hates you, she will thrash you every time you bake a bad cake (probably not the best place to work).

Now, you're a smart kid! You start an experiment, keeping track of your performance and the taste of every attempt.

Your end goal is to impress your Supervisor (maximise your reward). You start out inexperienced, so you experiment around the bakery (your environment) and keep trying until you finally impress her (the reward).

At first you add salt and burn a few things, getting thrashed every time (receiving a penalty). Since you're smart, you make sure not to repeat those mistakes (keeping track of your previous moves).

In the end, you are finally 'trained' once you've baked the best cake and earned the highest reward.

So this is how RL works:

  • There is an Agent: you.
  • It acts in an Environment: the bakery.
  • Its goal is to maximise the Reward: praise from the Supervisor.
  • The Agent keeps getting feedback from the Environment: the Supervisor.
  • The feedback is positive for a right move and negative for a wrong one.
  • The Agent has a Memory of its previous actions, based on which it learns.
  • The Agent keeps interacting until the Reward is maximised.
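To make the loop in the list concrete, here is a minimal Python sketch of the bakery story. It is a deliberate simplification, not the code from this series: it treats the problem as a multi-armed bandit (one choice per attempt, no changing state), and the recipes, their hidden taste scores, the noise level, and the names `Bakery`, `Baker`, and `train` are all hypothetical, invented for illustration. The agent's "memory" is a running average of the reward each recipe has earned, and it picks recipes with an epsilon-greedy rule: usually the best-remembered recipe, occasionally a random one.

```python
import random

# Hypothetical recipes and hidden average "taste" scores, invented for illustration.
RECIPES = ["add_salt", "burn_it", "less_sugar", "classic_sponge"]
TRUE_TASTE = {"add_salt": -1.0, "burn_it": -2.0, "less_sugar": 0.5, "classic_sponge": 1.0}


class Bakery:
    """Toy Environment: the Supervisor returns a noisy reward for each cake."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def step(self, recipe):
        # Positive reward for a good cake, negative (a "thrashing") for a bad one.
        return TRUE_TASTE[recipe] + self.rng.gauss(0.0, 0.5)


class Baker:
    """Toy Agent: remembers the average reward of every recipe it has tried."""

    def __init__(self, epsilon=0.1, seed=1):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {r: 0 for r in RECIPES}    # memory: times each recipe was tried
        self.values = {r: 0.0 for r in RECIPES}  # memory: average reward per recipe

    def act(self):
        # Explore with probability epsilon; otherwise exploit the best-remembered recipe.
        # On ties, max() returns the first recipe in RECIPES order.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(RECIPES)
        return max(RECIPES, key=lambda r: self.values[r])

    def learn(self, recipe, reward):
        # Incremental mean update: V <- V + (reward - V) / n
        self.counts[recipe] += 1
        n = self.counts[recipe]
        self.values[recipe] += (reward - self.values[recipe]) / n


def train(episodes=2000):
    # Agent and Environment interact repeatedly; the agent learns from each reward.
    env, agent = Bakery(), Baker()
    for _ in range(episodes):
        recipe = agent.act()
        reward = env.step(recipe)
        agent.learn(recipe, reward)
    return agent


if __name__ == "__main__":
    baker = train()
    best = max(RECIPES, key=lambda r: baker.values[r])
    print("Best recipe found:", best)
    print({r: round(v, 2) for r, v in baker.values.items()})
```

Because every estimate starts at zero and ties go to the first recipe in the list, the greedy choice typically tries the bad recipes first, much like the salt and burnt cakes in the story. With these made-up numbers the baker should settle on `classic_sponge`; the occasional random exploration is what lets it find that recipe instead of sticking with the first one that merely tasted acceptable.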

