Deciphering Reinforced Learning for Hybrid Controls in Robots

By Sachin Devmurari (@sachin-devmurari) on Hacker Noon

Experienced Content writer with SEO content experience. Web content writing and article writing

Recently, I was watching Terminator: Dark Fate. (By the way, I was disappointed with the whole reboot idea; for me, Judgment Day was the ultimate Terminator movie.)

Anyway, back to our discussion: the movie made me feel that filmmakers, writers, and even some journalists put robots in a bad light, as if they were some virus out to kill us all.


But the facts point the other way. Robots have been assisting us in industry for a long time, and the robotics industry has been growing fast over the past few years. The global robotics market is reported to be growing at a compound annual growth rate of 26% and is projected to reach revenues of $210 billion by 2025. So you can understand why I am saying that robots are not so bad. They can be profitable too!


The basic idea here is that robots are here to help, and we are going to discuss a method to make them more efficient at that. So, let’s begin!

Reinforcement Learning: 

Machine learning is a part of AI that uses algorithms to train machines to aggregate, analyze, and predict data patterns. There are three main types of learning methods used in AI:

  1. Supervised learning
  2. Unsupervised learning
  3. Reinforcement learning

Supervised learning is the human way of mentoring machines: the machine is trained on examples that people have already labeled. Unsupervised learning lets the machine learn by itself, finding patterns in unlabeled data. Reinforcement learning is like making a machine play the game of life: an agent acts in a specific environment, receives rewards or penalties for what it does, and learns from them how to handle the situations it meets.
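To make the reinforcement learning idea concrete, here is a minimal sketch of tabular Q-learning, one common RL algorithm (not the one discussed later in this article). The toy "corridor" environment, the reward of 1 at the right end, and all parameter values are assumptions made up for illustration.

```python
import random

def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor: start at state 0, goal at the right end."""
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 steps left, action 1 steps right
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state = 0
        while state != n_states - 1:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = min(max(state + moves[action], 0), n_states - 1)
            reward = 1.0 if next_state == n_states - 1 else 0.0
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = train_q_table()
# Greedy choice in every non-goal state after training.
greedy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(greedy)
```

Nobody tells the agent that "right" is the correct move. It only sees the reward at the end of the corridor, and the learned table ends up preferring "right" in every state.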


Let’s take the example of autonomous trucks. Elon Musk is a true revolutionary, and when he is not planning to send people to Mars, he is working on the Tesla Semi, a truck planned with self-driving features. A self-driving vehicle needs to work out its way around traffic, the proper speed on different terrain, and the destinations of its deliveries. All this seems easy on paper, but what if a car leaves its lane and comes towards the self-driving truck? That is the kind of situation reinforcement learning can help train machines for.

Continuous Hybrid Controls in Robots:

Robots use grippers and other endpoint tools to perform different tasks. In robotics, there are two kinds of robot actions:

  • Continuous actions: analog outputs, torques, or velocities
  • Discrete actions: control modes, gear switching, or discrete valves

The actions performed by the robots are powered by servo motors. Two of the most popular types of servo motors are brushed and brushless. All the controls in robots, whether they are welding steel sheets in factories or spray painting your next sports car, are programmed through control modules, increasingly with AI capabilities.

A hybrid control merges continuous and discrete actions for optimal endpoint function in robots. Using one reinforcement learning model to choose both the continuous and the discrete actions during an industrial process can make that process more reliable.
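As a rough illustration of what a hybrid action looks like in code, the sketch below bundles a continuous part and a discrete part into one object. The names `HybridAction`, `GEARS`, and `MAX_VELOCITY`, and the limit of 2.0 rad/s, are hypothetical and not taken from any real robot API.

```python
from dataclasses import dataclass
from typing import Tuple

GEARS = ("low", "medium", "high")  # hypothetical discrete modes
MAX_VELOCITY = 2.0                 # hypothetical joint limit in rad/s

@dataclass(frozen=True)
class HybridAction:
    """One control step: continuous commands plus a discrete mode choice."""
    joint_velocities: Tuple[float, ...]  # continuous part
    gear: int                            # discrete part, index into GEARS

def clip_action(action: HybridAction) -> HybridAction:
    """Clamp the continuous part to its limits and validate the discrete part."""
    if not 0 <= action.gear < len(GEARS):
        raise ValueError(f"unknown gear index {action.gear}")
    clipped = tuple(
        max(-MAX_VELOCITY, min(MAX_VELOCITY, v)) for v in action.joint_velocities
    )
    return HybridAction(clipped, action.gear)

safe = clip_action(HybridAction((0.5, -3.1), gear=1))
print(safe.joint_velocities, GEARS[safe.gear])
```

The two parts need different handling: a continuous value can be clamped to a range, while a discrete value is either one of the allowed options or invalid.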

Hybrid MPOs:

Here, we are going to consider reinforcement learning with a hybrid agent in a Markov Decision Process (MDP). The entire RL model is based on Maximum a Posteriori Policy Optimisation (MPO). It differs from the conventional formulations of reinforcement learning algorithms, where the aim is to find a trajectory that maximizes the return, that is, the total reward collected.

MPO instead explores a paradigm where inference formulations are used. These start from a prior distribution over trajectories, assume a desired outcome such as achieving high reward, and then estimate the posterior distribution over trajectories that is consistent with that outcome.
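One way to picture the inference view is the reweighting step used in MPO-style updates: actions sampled from the current policy get weights proportional to exp(Q / temperature), and the policy is then pulled toward the highly weighted ones. The sketch below is a simplified illustration only. The sampled actions, the Q estimates, and the fixed `temperature` are made-up values, and full MPO also learns the temperature and constrains how far the policy may move in each update.

```python
import math

def mpo_style_weights(q_values, temperature):
    """Softmax of Q / temperature over actions sampled from the current policy."""
    scaled = [q / temperature for q in q_values]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical one-dimensional actions sampled from the current policy,
# with hypothetical value estimates for each of them.
sampled_actions = [0.1, 0.4, 0.9]
q_estimates = [1.0, 2.0, 0.5]

weights = mpo_style_weights(q_estimates, temperature=0.5)

# A weighted average stands in for fitting the new policy to the weighted samples.
new_mean = sum(w * a for w, a in zip(weights, sampled_actions))
print([round(w, 3) for w in weights], round(new_mean, 3))
```

A lower temperature concentrates the weight on the best-looking action, while a higher one keeps the new policy closer to the old samples.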

If you are a fan of the TV series Person of Interest, you will understand it easily. Remember the episode where the AI machine in the series predicts more than a thousand ways to ultimately reach the same outcome? The RL model for hybrid MPO works on a similar idea.

Execution of Hybrid MPO for Continuous-Hybrid Controls:

Every robotic action, whether continuous or discrete, is controlled by programs in machine language, executed by a processor in the robotic system, which drives the servo motors that convert those commands into mechanical motion.

Here, the programmed commands are accessible through APIs, or Application Programming Interfaces. An API is a set of protocols that dictate data access, authorization, and validation across different platforms. But before the RL model sends a command program to the robot through these interfaces, it needs to execute a hybrid policy.

A hybrid policy integrates continuous and discrete actions to create asynchronous hybrid control, and it is trained to maximize the reward for the task. Let’s take the example of drilling a hole in a steel plate.

A robot needs to drill a 0.75 mm hole into a high-gauge steel plate. There are two types of actions here. One is the forward push of the drill tool, which comes from the continuous action of torque or velocity.

The other is switching gears to reach the right torque for the safety of the tool, which is a discrete action. Too much velocity can kill the tool by overheating.

So, hybrid MPO executes a hybrid policy that exposes multiple “modes” to the agent, and the robot can select the right combination of continuous and discrete actions.
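For the drilling example, a hybrid policy could be sketched as two distributions sampled together: a categorical distribution over gears and a Gaussian over the feed velocity. Treating the two parts as independent is an assumption of this sketch, and the gear names, logits, mean, and standard deviation are made-up numbers. In a trained agent they would come from a network that looks at the robot's current state.

```python
import math
import random

GEARS = ("low", "high")  # hypothetical discrete modes for the drill

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_log_pdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

def sample_hybrid_action(gear_logits, feed_mean, feed_std, rng):
    """Sample a gear (discrete) and a feed velocity (continuous) independently."""
    probs = softmax(gear_logits)
    gear = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    feed = rng.gauss(feed_mean, feed_std)
    # With independent parts, the joint log-probability is the sum of both.
    log_prob = math.log(probs[gear]) + gaussian_log_pdf(feed, feed_mean, feed_std)
    return gear, feed, log_prob

rng = random.Random(0)
gear, feed, log_prob = sample_hybrid_action(
    [2.0, 0.5], feed_mean=0.3, feed_std=0.05, rng=rng
)
print(GEARS[gear], round(feed, 3), round(log_prob, 3))
```

The joint log-probability is what a learning algorithm would adjust during training, so one update can shift both the gear preference and the feed velocity.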


Robotics has been evolving for quite some time. The dream of Industry 4.0 is already here, and we are seeing new advances in robotic automation. Here, I have tried to decipher the RL model and its application to robot controls. It is an amazing advancement in automated industrial robotics, and one that will help us create more efficient processes.

