Reinforcement <a href="https://hackernoon.com/tagged/learning" target="_blank">Learning</a> provides a framework for training agents to solve problems in the world. One limitation of these agents, however, is their inflexibility once trained. They can <strong><em>learn a policy</em></strong> to solve a specific problem (formalized as an <a href="https://en.wikipedia.org/wiki/Markov_decision_process" target="_blank">MDP</a>), but that learned policy is often useless in new problems, even relatively similar ones.