Date: 2024-04-18. This paper is available on arXiv under the CC 4.0 DEED license. Authors: (1) Dhruv Shah, UC Berkeley (equal contribution); (2) Michael Equi, UC Berkeley (equal contribution); (3) Blazej Osinski, University of Warsaw; (4) Fei Xia, Google DeepMind; (5) Brian Ichter, Google DeepMind; (6) Sergey Levine, UC Berkeley and Google DeepMind.

7 Discussion

We presented LFG, a method for utilizing language models for semantic guesswork to help navigate to goals in new and unfamiliar environments. The central idea in our work is that, while language models can bring to bear rich semantic understanding, their ungrounded inferences about how to perform navigational tasks are better used as suggestions and heuristics rather than plans. We formulate a way to derive a heuristic score from language models that we can then incorporate into a planning algorithm, and use this heuristic planner to reach goals in new environments more effectively. This way of using language models benefits from their inferences when they are correct, and reverts to a more conventional unguided search when they are not.
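
The idea of treating language-model output as a heuristic rather than a plan can be illustrated with a minimal sketch. This is not the implementation used in the paper: the `Frontier` class, the `select_frontier` function, the additive cost `distance - weight * llm_score`, the `weight` value, and the example scores are all assumptions made for illustration. The sketch only shows the fallback property described above: when the language model offers no signal, the choice reduces to ordinary nearest-frontier exploration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Frontier:
    """Hypothetical candidate subgoal at the edge of the explored map."""
    name: str        # illustrative text label for the frontier
    distance: float  # travel cost from the robot to the frontier


def select_frontier(
    frontiers: List[Frontier],
    llm_scores: Dict[str, float],
    weight: float = 5.0,  # assumed trade-off between distance and semantics
) -> Frontier:
    """Return the frontier with the lowest combined cost.

    cost = distance - weight * llm_score (an assumed form, not the paper's).
    Frontiers absent from llm_scores get a score of 0, so an empty score
    table yields plain nearest-frontier selection.
    """
    if not frontiers:
        raise ValueError("no frontiers to choose from")

    def cost(frontier: Frontier) -> float:
        return frontier.distance - weight * llm_scores.get(frontier.name, 0.0)

    return min(frontiers, key=cost)


frontiers = [
    Frontier("hallway", 2.0),
    Frontier("kitchen door", 4.0),
    Frontier("bedroom door", 3.0),
]

# Made-up scores standing in for language-model output on "find a stove".
guided = select_frontier(frontiers, {"kitchen door": 0.9, "bedroom door": 0.1})

# No language-model signal: the planner falls back to the nearest frontier.
unguided = select_frontier(frontiers, {})

print(guided.name, "|", unguided.name)
```

With the made-up scores, the farther but semantically likelier frontier wins; with an empty score table, the nearest one does. How the scores are actually obtained and weighted is left open here.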

Limitations and future work: While our experiments validate our key hypothesis, they have a number of limitations. First, we test only in indoor environments, both in simulation and in the real world, yet the role of semantics in navigation likely differs drastically across domains; for example, navigating a forest might implicate semantics very differently than navigating an apartment building. Exploring the applicability of semantics derived from language models in other settings is a promising and exciting direction for future work. Second, we acknowledge that issuing multiple requests to cloud-hosted LLMs with chain-of-thought (CoT) prompting is slow and requires an internet connection, which severely limits real-world deployment of the proposed method. We hope that ongoing advances in quantizing LLMs for edge deployment and fast inference will address this limitation.

Acknowledgments

This research was partly supported by AFOSR FA9550-22-1-0273 and DARPA ANSR. The authors would like to thank Bangguo Yu, Vishnu Sashank Dorbala, Mukul Khanna, Theophile Gervet, and Chris Paxton for their help in reproducing baselines. The authors would also like to thank Ajay Sridhar for supporting real-world experiments, and Devendra Singh Chaplot, Jie Tan, Peng Xu, and Tingnan Zhang for useful discussions at various stages of the project.

