Authors:
(1) Kinjal Basu, IBM Research;
(2) Keerthiram Murugesan, IBM Research;
(3) Subhajit Chaudhury, IBM Research;
(4) Murray Campbell, IBM Research;
(5) Kartik Talamadupula, Symbl.ai;
(6) Tim Klinger, IBM Research.
Table of Links
3.1 Learning Symbolic Policy using ILP
4.1 Dynamic Rule Generalization
5 Experiments and Results
7 Future Work and Conclusion, Limitations, Ethics Statement, and References
5.2 Experiments
To demonstrate that EXPLORER works better than a neural-only agent, we selected two neural baseline models for each of our datasets (TWC and TW-Cooking) and compared them with EXPLORER. In our evaluation, for both datasets we used LSTM-A2C (Narasimhan et al., 2015) as the Text-Only agent, which uses the encoded history of observations to select the best action. For TW-Cooking, we compared EXPLORER with the SOTA model on that domain, the Graph Aided Transformer Agent (GATA) (Adhikari et al., 2020a). We also conducted a comparative study of neuro-symbolic models on TWC (Section 5.3) against the SOTA neuro-symbolic model CBR (Atzeni et al., 2022), using the SOTA neural model BiKE (Murugesan et al., 2021b) as the neural module in both EXPLORER and CBR.
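As a rough illustration of the Text-Only baseline, the sketch below shows how an LSTM-A2C-style agent can encode the observation history and score admissible actions. This is a minimal PyTorch sketch under our own assumptions (layer sizes, dot-product action scoring, a single value head), not the exact implementation of Narasimhan et al. (2015):

```python
import torch
import torch.nn as nn

class LSTMA2CAgent(nn.Module):
    """Minimal text-only agent sketch: an LSTM encodes the observation
    history, each admissible action is encoded separately, and actions are
    scored against the observation encoding (actor); a value head serves as
    the A2C critic. All sizes and names here are illustrative assumptions."""

    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.obs_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.act_encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_ids, action_ids_list):
        # obs_ids: (1, T) token ids of the concatenated observation history
        _, (h_obs, _) = self.obs_encoder(self.embed(obs_ids))
        h_obs = h_obs[-1]                       # (1, hidden_dim)
        scores = []
        for act_ids in action_ids_list:         # one (1, L) tensor per action
            _, (h_act, _) = self.act_encoder(self.embed(act_ids))
            scores.append((h_obs * h_act[-1]).sum(dim=-1))  # dot-product score
        logits = torch.stack(scores, dim=-1)    # (1, num_actions)
        return torch.distributions.Categorical(logits=logits), self.value_head(h_obs)
```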
We tested four neuro-symbolic settings of EXPLORER: one without generalization (EXPLORER-w/o-GEN) and three that use EXPLORER with different generalization settings. The generalization settings are detailed below:
Exhaustive Rule Generalization: This setting lifts the rules exhaustively with all WordNet hypernyms up to level 3 from an object; in other words, it selects those hypernyms of an object whose path distance from the object is ≤ 3 (see the sketch after this list).
IG-based Generalization (hypernym level 2/3): Here, EXPLORER uses the information-gain-based rule generalization algorithm (Algorithm 1), which takes WordNet hypernyms up to level 2 or 3 from an object.
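To make these two settings concrete, the sketch below shows (i) how hypernyms within a bounded WordNet path distance can be collected, and (ii) the standard information-gain score that a selection step like Algorithm 1 can use to decide which hypernym to lift a rule to. This is an illustration using NLTK's WordNet interface; the helper names and the exact scoring inside Algorithm 1 are our assumptions, and the retrieved hypernym sets depend on the installed WordNet version:

```python
import math
from nltk.corpus import wordnet as wn  # requires nltk.download("wordnet")

def hypernyms_up_to_level(word, max_level=3):
    """Collect hypernyms whose WordNet path distance from `word` is <= max_level.

    max_level=3 mirrors the exhaustive setting; max_level=2 or 3 mirrors the
    candidate pool for the IG-based setting. (Illustrative helper, not code
    from the paper.)"""
    candidates = set()
    frontier = wn.synsets(word, pos=wn.NOUN)
    for _ in range(max_level):
        frontier = [h for s in frontier for h in s.hypernyms()]
        candidates.update(s.name().split(".")[0] for s in frontier)
    return candidates

def info_gain(pos, neg, covered_pos, covered_neg):
    """Standard information gain of splitting (pos, neg) example objects by
    whether a candidate hypernym covers them; one plausible way to score
    the selection step of Algorithm 1 (our assumption)."""
    def entropy(p, n):
        if p == 0 or n == 0:
            return 0.0
        fp, fn = p / (p + n), n / (p + n)
        return -(fp * math.log2(fp) + fn * math.log2(fn))
    total = pos + neg
    cov = covered_pos + covered_neg
    split = (cov / total) * entropy(covered_pos, covered_neg) + \
            ((total - cov) / total) * entropy(pos - covered_pos, neg - covered_neg)
    return entropy(pos, neg) - split

# e.g. hypernyms_up_to_level("apple") may include 'edible_fruit' and 'food'
# (the exact set depends on the WordNet version).
```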
For both datasets, in all settings, agents are trained for 100 episodes with a maximum of 50 steps per episode. On the TW-Cooking domain, it is worth mentioning that while we performed the pre-training tasks (graph encoder, graph updater, action scorer, etc.) for GATA as in Adhikari et al. (2020a), neither the Text-Only agent nor EXPLORER has any pre-training advantage to boost performance.
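For concreteness, the training budget above corresponds to a loop of the following shape, assuming TextWorld's gym-style API (env.reset() / env.step()); agent is a stand-in for any of the agents compared here, and agent.act / agent.update are hypothetical method names:

```python
# Illustrative skeleton of the reported budget: 100 episodes, <= 50 steps each.
# `env` is assumed to be a TextWorld gym environment that returns the
# admissible commands in `infos`; `agent.act`/`agent.update` are hypothetical.
NUM_EPISODES, MAX_STEPS = 100, 50

for episode in range(NUM_EPISODES):
    obs, infos = env.reset()
    for step in range(MAX_STEPS):
        action = agent.act(obs, infos["admissible_commands"])
        obs, score, done, infos = env.step(action)
        if done:
            break
    agent.update()  # end-of-episode policy / rule update (agent-specific)
```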
This paper is available on arxiv under CC BY 4.0 DEED license.