
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning: Related Work


Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Table of Links

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

6 Related Work

Text-based Reinforcement Learning: TBGs have recently emerged as promising environments for studying grounded language understanding and have drawn significant research interest. Zahavy et al. (2018) introduced the Action-Elimination Deep Q-Network (AE-DQN), which learns to predict invalid actions in the text-adventure game Zork. Côté et al. (2018) designed TextWorld, a sandbox learning environment for training and evaluating RL agents on text-based games. Building on this, Murugesan et al. (2021a) introduced TWC, a set of games that require agents to use commonsense knowledge. The LeDeepChef system (Adolphs and Hofmann, 2019) achieved strong results on the First TextWorld Problems competition (Trischler et al., 2019) by supervising the model with entities from FreeBase, which allowed the agent to generalize to unseen objects. A recent line of work learns symbolic (typically graph-structured) representations of the agent's belief: notably, Ammanabrolu and Riedl (2019) proposed KG-DQN, and Adhikari et al. (2020b) proposed GATA. The instruction-following work of Tuli et al. (2022), also focused on the TW-Cooking domain, makes strong assumptions about the game environment and supplies the agent with many manual instructions. In contrast, EXPLORER learns its rules automatically, in an online manner.


Symbolic Rule Learning Approaches: Learning symbolic rules with inductive logic programming (ILP) has a long history of research. Following the success of Answer Set Programming (ASP), many systems capable of learning non-monotonic logic programs have emerged, such as FOLD (Shakerin et al., 2017), ILASP (Law et al., 2014), XHAIL (Ray, 2009), and ASPAL (Corapi et al., 2011). However, few of these efforts lift the learned rules to a generalized form and then learn exceptions, and they tend to perform poorly on noisy data. To address this, ILP has been combined with differentiable programming (Evans and Grefenstette, 2018; Rocktäschel and Riedel, 2017), but such approaches require large amounts of training data. In our work, EXPLORER instead uses a simple information-gain-based inductive learning approach, since it must learn rules after each episode from very few examples (sometimes with zero negative examples).
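To make the information-gain criterion concrete, the sketch below shows a generic FOIL-style gain computation for choosing which literal to add when specializing a rule. This is an illustrative sketch only, not EXPLORER's actual implementation; the function names and example coverage counts are assumptions.

```python
import math

def foil_gain(p0: int, n0: int, p1: int, n1: int) -> float:
    """FOIL-style information gain for specializing a rule with a new literal.

    p0, n0: positive/negative examples covered by the rule before specialization.
    p1, n1: positive/negative examples still covered after adding the literal.
    The gain weights the improvement in purity by the p1 positives retained.
    """
    if p1 == 0:
        return 0.0  # a literal that drops all positives is never useful
    before = math.log2(p0 / (p0 + n0))
    after = math.log2(p1 / (p1 + n1))
    return p1 * (after - before)

def best_literal(candidates, coverage, p0, n0):
    """Pick the candidate literal with the highest information gain.

    candidates: iterable of literal names (hypothetical).
    coverage: dict mapping literal -> (p1, n1) coverage after specialization.
    """
    return max(candidates, key=lambda lit: foil_gain(p0, n0, *coverage[lit]))

# Example: a rule covering 4 positive and 4 negative examples; literal "A"
# keeps all 4 positives but excludes 3 negatives, literal "B" keeps only 2
# positives but excludes every negative.
chosen = best_literal(["A", "B"], {"A": (4, 1), "B": (2, 0)}, p0=4, n0=4)
```

Note that when a rule already covers zero negative examples (a case the text mentions), the "before" term is log2(1) = 0, so any specialization can only reduce coverage and the gain stays non-positive, which matches the intuition that a pure rule needs no further literals.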


This paper is available on arxiv under CC BY 4.0 DEED license.



