
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning: Dataset


Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Table of Links

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

5.1 Dataset

In our work, we want to show that if an RL agent uses symbolic and neural reasoning in tandem, where the neural module is mainly responsible for exploration and the symbolic component for exploitation, then the performance of that agent improves drastically in text-based games. First, we verify our approach on the TW-Cooking domain (Adhikari et al., 2020a), using levels 1-4 from the GATA dataset[3] for testing. As the name suggests, this game suite is about collecting various cooking ingredients and preparing a meal by following an in-game recipe.


To showcase the importance of generalization, we have also tested our EXPLORER agent on TextWorld Commonsense (TWC) games with out-of-distribution (OOD) data. In these games, the goal is to tidy up the house by putting objects in their commonsense locations. With the help of the TWC framework (Murugesan et al., 2021a), we have generated a set of games at 3 difficulty levels: (i) easy level, which contains 1 room with 1 to 3 objects; (ii) medium level, which contains 1 or 2 rooms with 4 or 5 objects; and (iii) hard level, a mix of games with either a high number of objects (6 or 7 objects in 1 or 2 rooms) or a high number of rooms (3 or 4 rooms containing 4 or 5 objects).
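
As a rough illustration of how the three difficulty levels partition the generated games, the room and object ranges can be written as a small classifier. This is only a sketch under stated assumptions: the names `GameSpec` and `difficulty` are hypothetical, chosen for this example, and are not part of the TWC framework or of the EXPLORER code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GameSpec:
    """Hypothetical summary of one generated game (not a TWC API type)."""
    rooms: int
    objects: int


def difficulty(spec: GameSpec) -> Optional[str]:
    """Map a game to 'easy', 'medium', or 'hard' using the ranges in the text.

    Returns None for room/object combinations the text does not assign
    to any level.
    """
    r, o = spec.rooms, spec.objects
    # Easy: 1 room with 1 to 3 objects.
    if r == 1 and 1 <= o <= 3:
        return "easy"
    # Medium: 1 or 2 rooms with 4 or 5 objects.
    if r in (1, 2) and o in (4, 5):
        return "medium"
    # Hard: many objects (6 or 7 in 1 or 2 rooms) or
    # many rooms (3 or 4 rooms containing 4 or 5 objects).
    if (r in (1, 2) and o in (6, 7)) or (r in (3, 4) and o in (4, 5)):
        return "hard"
    return None


if __name__ == "__main__":
    for spec in (GameSpec(1, 2), GameSpec(2, 5), GameSpec(4, 4), GameSpec(2, 3)):
        print(spec, "->", difficulty(spec))
```

Note that the stated ranges do not cover every combination (for example, 2 rooms with 3 objects), so the sketch returns `None` in those cases rather than guessing a level.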


We chose TW-Cooking and TWC games as our test-bed because these are benchmark datasets for evaluating neuro-symbolic agents in text-based games (Chaudhury et al., 2021, 2023; Wang et al., 2022; Kimura et al., 2021; Basu et al., 2022a). Also, these environments require the agents to exhibit skills such as exploration, planning, reasoning, and OOD generalization, which makes them ideal environments to evaluate EXPLORER.


This paper is available on arXiv under the CC BY 4.0 DEED license.


[3] https://github.com/xingdi-eric-yuan/GATA-public
