In reinforcement learning (RL), agents are trained with a reward and punishment mechanism: the agent is rewarded for correct moves and punished for wrong ones. In doing so, the agent learns to avoid the wrong moves and maximize the right ones.
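In code, this reward-driven loop has a simple shape. Below is a minimal sketch, assuming hypothetical `env` and `agent` objects loosely modeled on the Gym convention rather than any specific library's API:

```python
# A minimal sketch of the agent-environment loop (the `env` and `agent`
# interfaces are illustrative, not a specific library's API).

def run_episode(env, agent):
    """One episode: the agent acts, is rewarded or punished, and learns."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(state)                      # pick a move
        next_state, reward, done = env.step(action)    # environment responds
        agent.learn(state, action, reward, next_state, done)  # reinforce
        state = next_state
        total_reward += reward
    return total_reward  # the quantity the agent learns to maximize
```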
In this article, we’ll look at some of the real-world applications of reinforcement learning.
Various papers have proposed deep reinforcement learning for autonomous driving. In self-driving cars, there are many aspects to consider, such as speed limits in different areas, drivable zones, and collision avoidance, just to mention a few.
Some of the autonomous driving tasks where reinforcement learning could be applied include trajectory optimization, motion planning, dynamic pathing, controller optimization, and scenario-based learning policies for highways.
For example, parking can be achieved by learning automatic parking policies. Lane changing can be achieved using Q-learning, while overtaking can be implemented by learning an overtaking policy that avoids collisions and maintains a steady speed afterwards.
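As a rough illustration of the Q-learning idea behind lane changing, the sketch below updates a Q-table over a coarsely discretized traffic state; the state variables, actions, and rewards are invented for illustration, not taken from any published system:

```python
from collections import defaultdict

# Illustrative discretized state: (gap ahead in own lane, gap in target lane),
# each bucketed as "small"/"large". Actions: keep lane or change lane.
ACTIONS = ["keep", "change"]
Q = defaultdict(float)       # Q[(state, action)] -> estimated long-term value
alpha, gamma = 0.1, 0.95     # learning rate and discount factor

def q_update(state, action, reward, next_state):
    """Standard Q-learning backup applied to the lane-change decision."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Hypothetical transition: with a small gap ahead and a large gap in the
# target lane, changing lanes safely earns a positive reward; a collision
# would be strongly penalized instead.
q_update(("small", "large"), "change", reward=+1.0, next_state=("large", "large"))
```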
AWS DeepRacer is an autonomous racing car designed to test out RL on a physical track. It uses cameras to visualize the track and a reinforcement learning model to control the throttle and steering.
Wayve.ai has successfully applied reinforcement learning to teach a car to drive in a day. They used a deep reinforcement learning algorithm to tackle the lane-following task; their network architecture was a deep network with 4 convolutional layers and 3 fully connected layers, operating on the driver's perspective of the road.
In industry, reinforcement learning-based robots are used to perform various tasks. Apart from being more efficient than human beings, they can also perform tasks that would be dangerous for people.
A great example is DeepMind's use of AI agents to cool Google's data centers, which led to a 40% reduction in the energy used for cooling. The centers are now controlled by the AI system without the need for direct human intervention, though data center experts still supervise it. The system works as follows: every five minutes, it takes a snapshot of the data center's sensor data and feeds it to deep neural networks, which predict how different combinations of actions will affect future energy consumption. It then identifies the actions that minimize power consumption while satisfying safety constraints and sends them to the data center, where the actions are verified by the local control system before being implemented.
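A hedged sketch of that control loop is shown below; the model, candidate settings, and safety check are hypothetical placeholders, since DeepMind's actual components are not public:

```python
def control_step(snapshot, model, candidate_settings, is_safe, send_to_plant):
    """One iteration of the cooling control loop (illustrative structure;
    the model, action set, and safety check are hypothetical placeholders)."""
    # Score each candidate combination of settings by predicted energy use.
    scored = [(model.predict(snapshot, action), action)
              for action in candidate_settings]
    # Discard any combination that violates the safety constraints.
    safe = [(energy, action) for energy, action in scored if is_safe(action)]
    if not safe:
        return None  # fall back to the existing local controller
    # Submit the cheapest safe action; the local control system verifies it
    # before implementation.
    _, best_action = min(safe, key=lambda pair: pair[0])
    return send_to_plant(best_action)
```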
Supervised time series models can be used for predicting future sales as well as stock prices. However, these models don't determine what action to take at a particular price. Enter reinforcement learning: an RL agent can decide whether to hold, buy, or sell. The RL model is evaluated against market benchmarks to ensure that it's performing optimally.
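As a minimal sketch of how such an agent could see the problem, the toy environment below exposes hold/buy/sell actions and rewards the step-to-step change in portfolio value; the price series and state encoding are illustrative assumptions:

```python
# Illustrative trading environment: the agent holds either cash or one unit
# of the asset, and the reward is the change in portfolio value per step.
HOLD, BUY, SELL = 0, 1, 2

class ToyTradingEnv:
    def __init__(self, prices):
        self.prices = prices   # a toy price series
        self.t = 0
        self.position = 0      # 0 = in cash, 1 = holding the asset

    def step(self, action):
        if action == BUY and self.position == 0:
            self.position = 1
        elif action == SELL and self.position == 1:
            self.position = 0
        self.t += 1
        # Reward: profit or loss from holding (or not) over this step.
        price_change = self.prices[self.t] - self.prices[self.t - 1]
        reward = self.position * price_change
        done = self.t == len(self.prices) - 1
        return (self.t, self.position), reward, done

env = ToyTradingEnv(prices=[100, 101, 99, 102, 103])
```

An agent would be trained on episodes of this environment like any other RL task, with the cumulative reward corresponding to the profit and loss of its decisions.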
This automation brings consistency to the process, unlike previous methods where analysts had to make every single decision. IBM, for example, has a sophisticated reinforcement learning-based platform that makes financial trades, computing the reward function based on the profit or loss of every transaction.
In NLP, RL can be used for text summarization, question answering, and machine translation, to mention a few.
The authors of this paper, Eunsol Choi, Daniel Hewlett, and Jakob Uszkoreit, propose an RL-based approach to question answering over long texts. Their method first selects a few sentences from the document that are relevant for answering the question; a slow RNN is then employed to produce answers from the selected sentences.
A combination of supervised and reinforcement learning is used for abstractive text summarization in this paper by Romain Paulus, Caiming Xiong, and Richard Socher. Their goal is to address the problems that attentional, RNN-based encoder-decoder models face when summarizing longer documents. The authors propose a neural network with a novel intra-attention mechanism that attends over the input and the continuously generated output separately. Their training method combines standard supervised word prediction with reinforcement learning.
On the machine translation side, authors from the University of Colorado and the University of Maryland propose a reinforcement learning-based approach to simultaneous machine translation. What makes this work interesting is that the system learns when to trust the predicted words, using RL to determine when to wait for more input.
Researchers from Stanford University, Ohio State University, and Microsoft Research have proposed deep RL for dialogue generation. Deep RL can be used to model future rewards in a chatbot dialogue. Conversations are simulated between two virtual agents, and policy gradient methods reward sequences that display important conversational attributes such as coherence, informativity, and ease of answering.
More NLP applications can be found here.
In healthcare, patients can receive treatment from policies learned by RL systems. RL is able to find optimal policies from previous experience without needing prior knowledge of a mathematical model of the biological system, which makes the approach more applicable in healthcare than many other control-based systems.
RL in healthcare is categorized into dynamic treatment regimes (DTRs) in chronic disease or critical care, automated medical diagnosis, and other general domains.
In DTRs, the input is a set of clinical observations and assessments of a patient, which play the role of states in RL; the outputs are the treatment options at every stage, which correspond to actions. Applying RL to DTRs is advantageous because it can make time-dependent decisions, determining the best treatment for a patient at a specific time.
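To make the mapping concrete, here is a minimal sketch of a DTR framed as a sequential decision problem; the fields, treatments, and reward are invented placeholders, not clinical guidance:

```python
from dataclasses import dataclass

# Illustrative framing of a dynamic treatment regime as a sequential
# decision problem; all names and numbers are hypothetical.
@dataclass
class PatientState:          # the "state": clinical observations at this stage
    biomarker: float
    symptom_score: int

TREATMENTS = ["drug_A", "drug_B", "no_change"]  # the "actions" at each stage

def reward(prev: PatientState, curr: PatientState) -> float:
    # A toy outcome signal: symptom improvement, penalizing biomarker rise.
    return (prev.symptom_score - curr.symptom_score) \
        - 0.1 * max(0.0, curr.biomarker - prev.biomarker)

# A policy learned by RL would map PatientState -> treatment at every stage,
# chosen to maximize long-term outcome rather than the immediate response.
```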
The use of RL in healthcare also enables the improvement of long-term outcomes by factoring in the delayed effects of treatments.
RL has also been used for the discovery and generation of optimal DTRs for chronic diseases.
You can dive deeper into RL applications in healthcare by exploring this paper.
On the engineering frontier, Facebook has developed an open-source reinforcement learning platform called Horizon. The platform uses reinforcement learning to optimize large-scale production systems. Facebook has used Horizon internally to personalize suggestions, deliver more meaningful notifications to users, and optimize streaming video quality.
Horizon also contains workflows for simulated environments and a distributed platform for data preprocessing, training, and exporting models to production.
A classic example of reinforcement learning in video streaming is serving a user a low or high bitrate video based on the state of the video buffers and estimates from other machine learning systems.
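A hedged sketch of how that decision could be framed for an RL agent is below; the bitrate ladder, state encoding, and reward weights are invented for the example:

```python
# Illustrative state/action/reward for RL-driven bitrate selection;
# all numbers and weights are invented for the example.
BITRATES_KBPS = [300, 1500, 4000]   # low / medium / high (the action set)

def make_state(buffer_seconds, predicted_bandwidth_kbps, last_bitrate):
    """The observation the agent acts on: buffer level plus estimates
    from other ML systems (e.g., a bandwidth predictor)."""
    return (buffer_seconds, predicted_bandwidth_kbps, last_bitrate)

def reward(bitrate, rebuffered, last_bitrate):
    """Reward higher quality; punish stalls and jarring quality switches."""
    quality = bitrate / max(BITRATES_KBPS)
    stall_penalty = 4.0 if rebuffered else 0.0
    smoothness_penalty = 0.5 * abs(bitrate - last_bitrate) / max(BITRATES_KBPS)
    return quality - stall_penalty - smoothness_penalty
```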
Horizon is capable of handling production-like concerns such as deploying at scale, feature normalization, distributed learning, and serving and handling large datasets with high-dimensional data and thousands of feature types.
User preferences can change frequently, so recommending news to users based on past reviews and likes can become obsolete quickly. With reinforcement learning, the system can instead track the reader's return behavior.
Constructing such a system involves obtaining news features, reader features, context features, and reader-news interaction features. News features include, but are not limited to, the content, headline, and publisher. Reader features describe how the reader interacts with content, e.g., clicks and shares. Context features include news aspects such as the timing and freshness of the news. A reward is then defined based on these user behaviors.
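As a minimal illustration, the reward could combine immediate engagement with the delayed return signal described above; the weights here are invented assumptions:

```python
# Illustrative reward for news recommendation: immediate engagement plus a
# delayed signal for returning readers. Weights are invented; the idea of
# combining clicks with return behavior follows the description above.
def recommendation_reward(clicked: bool, returned_within_24h: bool) -> float:
    click_reward = 1.0 if clicked else 0.0
    return_reward = 0.5 if returned_within_24h else 0.0
    return click_reward + return_reward

# The agent's state would concatenate news, reader, and context features;
# each candidate article is an action scored by the learned value function.
```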
Let's look at an application on the gaming frontier, specifically AlphaGo Zero. Using reinforcement learning, AlphaGo Zero was able to learn the game of Go from scratch by playing against itself. After 40 days of self-training, AlphaGo Zero was able to outperform the version of AlphaGo known as Master, which had defeated world number one Ke Jie. It used only the black and white stones from the board as input features and a single neural network, with a simple tree search that relies on this single network to evaluate positions and sample moves, without any Monte Carlo rollouts.
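At a high level, the self-play loop can be sketched as follows; the `game`, `net`, and `search` interfaces are placeholders, and the real system's Monte Carlo tree search and training procedure are considerably more involved:

```python
import random

def sample_move(move_probs):
    """Sample a move from a {move: probability} dict."""
    moves, probs = zip(*move_probs.items())
    return random.choices(moves, weights=probs)[0]

def self_play_game(game, net, search):
    """Play one game against itself, recording training targets
    (the game, network, and search interfaces are placeholders)."""
    history, state = [], game.initial_state()
    while not game.is_over(state):
        # The tree search relies only on the network's (policy, value)
        # outputs to evaluate positions and sample moves -- no rollouts.
        move_probs = search(state, net)
        history.append((state, move_probs))
        state = game.play(state, sample_move(move_probs))
    z = game.outcome(state)  # +1 / -1 / 0 from the first player's view
    # The single network is later trained to predict both the search's
    # move distribution and the eventual winner from each position.
    return [(s, probs, z) for s, probs in history]
```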
In this paper, the authors propose real-time bidding with multi-agent reinforcement learning. The large number of advertisers is handled by clustering them and assigning each cluster a strategic bidding agent. To balance the trade-off between competition and cooperation among advertisers, a Distributed Coordinated Multi-Agent Bidding (DCMAB) approach is proposed.
In marketing, the ability to accurately target an individual is crucial, because the right targets lead to a high return on investment. The study in this paper was based on Taobao, the largest e-commerce platform in China. The proposed method outperforms state-of-the-art single-agent reinforcement learning approaches.
The combination of deep learning and reinforcement learning can train robots to grasp a variety of objects, even those unseen during training. This can, for example, be used for building products on an assembly line.
This is achieved by combining large-scale distributed optimization with a variant of deep Q-learning called QT-Opt. QT-Opt's support for continuous action spaces makes it well suited to robotics problems. A model is first trained offline and then deployed and fine-tuned on the real robot.
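Because the action space is continuous, QT-Opt cannot simply take an argmax over actions; instead it optimizes the action against the learned Q-function with a stochastic optimizer (a cross-entropy method). Below is a minimal sketch of that selection step, with `q_func` standing in for the trained network and illustrative hyperparameters:

```python
import numpy as np

def cem_action_selection(q_func, state, action_dim, iters=3, pop=64, elite=6):
    """Pick a continuous action by maximizing a learned Q-function with the
    cross-entropy method; q_func is a placeholder for the trained network,
    and the hyperparameters are illustrative."""
    mean, std = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(iters):
        # Sample candidate actions, score them with Q(s, a), keep the elites.
        actions = np.random.randn(pop, action_dim) * std + mean
        scores = np.array([q_func(state, a) for a in actions])
        elites = actions[np.argsort(scores)[-elite:]]
        # Refit the sampling distribution to the best candidates.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # the action believed to maximize Q(state, action)
```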
Google AI applied this approach to robotic grasping, where seven real-world robots ran for 800 robot-hours over a four-month period.
In this experiment, the QT-Opt approach succeeded in 96% of the grasp attempts across 700 trial grasps on previously unseen objects. Google AI's previous method had a 78% success rate.
While reinforcement learning is still a very active research area, significant progress has been made to advance the field and apply it in real life.
In this article, we have barely scratched the surface of reinforcement learning's application areas. Hopefully, this has sparked some curiosity that will drive you to dive deeper into the field. If you want to learn more, check out this awesome repo (no pun intended), and this one as well.
This article was originally written by Derrick Mwiti and posted on the Neptune blog. You can find more in-depth articles for machine learning practitioners there.