
EXPLORER: Exploration-guided Reasoning for Textual Reinforcement Learning: Experiments


Authors:

(1) Kinjal Basu, IBM Research;

(2) Keerthiram Murugesan, IBM Research;

(3) Subhajit Chaudhury, IBM Research;

(4) Murray Campbell, IBM Research;

(5) Kartik Talamadupula, Symbl.ai;

(6) Tim Klinger, IBM Research.

Table of Links

Abstract and 1 Introduction

2 Background

3 Symbolic Policy Learner

3.1 Learning Symbolic Policy using ILP

3.2 Exception Learning

4 Rule Generalization

4.1 Dynamic Rule Generalization

5 Experiments and Results

5.1 Dataset

5.2 Experiments

5.3 Results

6 Related Work

7 Future Work and Conclusion, Limitations, Ethics Statement, and References

5.2 Experiments

To demonstrate that EXPLORER works better than a neural-only agent, we selected two neural baseline models for each of our datasets (TWC and TW-Cooking) and compared them with EXPLORER. In our evaluation, for both datasets, we used LSTM-A2C (Narasimhan et al., 2015) as the Text-Only agent, which selects the best action using the encoded history of observations. For TW-Cooking, we compared EXPLORER with the SOTA model on that domain, the Graph Aided Transformer Agent (GATA) (Adhikari et al., 2020a). We also conducted a comparative study of neuro-symbolic models on TWC (Section 5.3) against the SOTA neuro-symbolic model CBR (Atzeni et al., 2022), using the SOTA neural model BiKE (Murugesan et al., 2021b) as the neural module in both EXPLORER and CBR.


We tested four neuro-symbolic settings of EXPLORER: one without generalization (EXPLORER-w/o-GEN) and three that use EXPLORER with different generalization settings. The generalization settings are detailed below:


Exhaustive Rule Generalization: This setting lifts the rules exhaustively with all WordNet hypernyms up to level 3 from an object; in other words, it selects those hypernyms of an object whose path distance from the object is ≤ 3.
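The path-distance criterion above can be sketched as a breadth-first walk up a hypernym hierarchy. The toy taxonomy and function names below are illustrative only; EXPLORER itself queries WordNet rather than a hand-coded dictionary.

```python
# Minimal, self-contained sketch of exhaustive rule generalization:
# lift an object to every hypernym within path distance <= 3.
# TOY_TAXONOMY stands in for WordNet's hypernym links (assumption).

TOY_TAXONOMY = {
    "apple": ["edible_fruit"],
    "edible_fruit": ["fruit"],
    "fruit": ["plant_part"],
    "plant_part": ["natural_object"],
}

def lift(obj, taxonomy, max_depth=3):
    """Return hypernyms of `obj` whose path distance is <= max_depth."""
    frontier, lifted = {obj}, set()
    for _ in range(max_depth):
        # climb one hypernym level per iteration
        frontier = {h for w in frontier for h in taxonomy.get(w, [])}
        lifted |= frontier
    return lifted

# "natural_object" sits at distance 4, so it is excluded at max_depth=3.
print(sorted(lift("apple", TOY_TAXONOMY)))
# -> ['edible_fruit', 'fruit', 'plant_part']
```

A lifted rule then replaces the concrete object (e.g. "apple") with one of these more general concepts, letting the policy transfer to unseen objects of the same kind.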


IG-based Generalization (hypernym level 2/3): Here, EXPLORER uses the rule generalization algorithm (Algorithm 1), which takes WordNet hypernyms up to level 2 or 3 from an object.
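As a rough illustration of the information-gain (IG) idea behind this setting, a candidate hypernym can be scored by how well membership in it separates successful from unsuccessful actions. The formula below is the standard IG computation, shown here as an assumption about the scoring criterion; it is not taken verbatim from Algorithm 1.

```python
# Hedged sketch: score a candidate hypernym by the information gain of
# splitting (positive, negative) action outcomes on "object is-a hypernym".
# All function names and the exact split are illustrative assumptions.
from math import log2

def entropy(pos, neg):
    """Binary entropy of a pos/neg example set."""
    total = pos + neg
    if total == 0 or pos == 0 or neg == 0:
        return 0.0
    p = pos / total
    return -(p * log2(p) + (1 - p) * log2(1 - p))

def info_gain(pos, neg, pos_in, neg_in):
    """IG of splitting (pos, neg) examples by hypernym membership,
    where (pos_in, neg_in) are the examples under the candidate hypernym."""
    total = pos + neg
    pos_out, neg_out = pos - pos_in, neg - neg_in
    remainder = ((pos_in + neg_in) / total) * entropy(pos_in, neg_in) \
              + ((pos_out + neg_out) / total) * entropy(pos_out, neg_out)
    return entropy(pos, neg) - remainder

# A hypernym covering all positives and no negatives is maximally informative.
print(info_gain(5, 5, 5, 0))  # -> 1.0
```

Under such a criterion, a hypernym at level 2 or 3 would be kept for rule lifting only when its IG is high, rather than lifting exhaustively as in the previous setting.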


For both datasets, in all settings, agents are trained for 100 episodes with a maximum of 50 steps per episode. On the TW-Cooking domain, it is worth mentioning that while we performed the pre-training tasks (graph encoder, graph updater, action scorer, etc.) for GATA as in (Adhikari et al., 2020a), neither the Text-Only agent nor EXPLORER has any pre-training advantage to boost performance.


Table 2: TW-Cooking domain — Comparison Results (with Mean and SD)


This paper is available on arxiv under CC BY 4.0 DEED license.

