This paper is available on arxiv under CC 4.0 license.
Authors:
(1) Oded Blumenthal, Software and Information Systems Engineering, Ben Gurion University, Israel;
(2) Guy Shani, Software and Information Systems Engineering, Ben Gurion University, Israel. Table of Links Introduction
Background
Related Work
POMCP for Stochastic Contingent Planning
Domain Independent Heuristics for POMCP
Empirical Evaluation
Conclusion and References 7 Conclusion In this paper we suggested to model goal POMDPs as stochastic contingent planning models, which allows us to use domain independent heuristics developed in the automated planning community to estimate the utility of belief states. We implemented our domain independent heuristics into the rollout mechanism of POMCP — a well known online POMDP planner that constructs a search tree to evaluate which action to take next. We provide an empirical evaluation showing how heuristics provide much leverage, especially in complex domains that require a long planning horizon, compared to the standard uniform rollout policy that is often used in POMCP. For future research we intend to integrate our methods into other solvers, such as RTDP-BEL, or into point-based planners as a method to gather good belief points. We can also experiment with additional heuristics, other than the hadd heuristic used in this paper. References [1] Alexandre Albore, Héctor Palacios & Hector Geffner (2009): A Translation-Based Approach to Contingent Planning. In: IJCAI, 9, pp. 1623–1628, doi:10.5555/1661445.1661706. Available at https://dl.acm. org/doi/10.5555/1661445.1661706. [2] Blai Bonet & Héctor Geffner (2001): Planning as heuristic search. Artificial Intelligence 129(1-2), pp. 5–33, doi:10.1016/S0004-3702(01)00108-4. [3] Blai Bonet & Hector Geffner (2009): Solving POMDPs: RTDP-Bel vs. Point-based Algorithms. In Craig Boutilier, editor: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, pp. 1641–1646, doi:10.5555/1661445.1661709. Available at https://dl.acm.org/doi/10.5555/1661445.1661709. [4] Blai Bonet & Hector Geffner (2011): Planning under Partial Observability by Classical Replanning: Theory and Experiments. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pp. 1936–1941, doi:10.5591/978-1-57735-516-8/IJCAI11-324. [5] Ronen I Brafman & Guy Shani (2016): Online belief tracking using regression for contingent planning. Artificial Intelligence 241, pp. 131–152, doi:10.1016/j.artint.2016.08.005. [6] Hector Geffner & Blai Bonet (2022): A concise introduction to models and methods for automated planning. Springer Nature. [7] Malte Helmert & Carmel Domshlak (2009): Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In Lubos Brim, Stefan Edelkamp, Eric A. Hansen & Peter Sanders, editors: Graph Search Engineering, 29.11. - 04.12.2009, Dagstuhl Seminar Proceedings 09491, Schloss Dagstuhl -LeibnizZentrum für Informatik, Germany, doi:10.1609/icaps.v19i1.13370. Available at http://drops.dagstuhl. de/opus/volltexte/2010/2432/. [8] Malte Helmert, Patrik Haslum, Jörg Hoffmann & Raz Nissim (2014): Merge-and-shrink abstraction: A method for generating lower bounds in factored state spaces. Journal of the ACM (JACM) 61(3), pp. 1–63, doi:10.1145/2559951. [25] Guy Shani, Joelle Pineau & Robert Kaplow (2013): A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27, pp. 1–51, doi:10.1007/s10458-012-9200-2. [26] William Shen, Felipe Trevizan, Sam Toyer, Sylvie Thiébaux & Lexing Xie (2019): Guiding search with generalized policies for probabilistic planning. In: Proceedings of the International Symposium on Combinatorial Search, 10, pp. 97–105, doi:10.1609/socs.v10i1.18507. [27] David Silver & Joel Veness (2010): Monte-Carlo planning in large POMDPs. Advances in neural information processing systems 23, doi:10.5555/2997046.2997137. [28] Richard D Smallwood & Edward J Sondik (1973): The optimal control of partially observable Markov processes over a finite horizon. Operations research 21(5), pp. 1071–1088, doi:10.1287/opre.21.5.1071. [29] Adhiraj Somani, Nan Ye, David Hsu & Wee Sun Lee (2013): DESPOT: Online POMDP planning with regularization. Advances in neural information processing systems 26, doi:10.1613/jair.5328. [30] Zachary Sunberg & Mykel Kochenderfer (2018): Online algorithms for POMDPs with continuous state, action, and observation spaces. In: Proceedings of the International Conference on Automated Planning and Scheduling, 28, pp. 259–263, doi:10.48550/arXiv.1709.06196. [31] Yuchen Xiao, Sammie Katt, Andreas ten Pas, Shengjian Chen & Christopher Amato (2019): Online planning for target object search in clutter under partial observability. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp. 8241–8247, doi:10.1109/ICRA.2019.8793494. [32] Nan Ye, Adhiraj Somani, David Hsu & Wee Sun Lee (2017): DESPOT: Online POMDP planning with regularization. Journal of Artificial Intelligence Research 58, pp. 231–266, doi:10.1613/jair.5328. This paper is available on arxiv under CC 4.0 license. Authors: (1) Oded Blumenthal, Software and Information Systems Engineering, Ben Gurion University, Israel; (2) Guy Shani, Software and Information Systems Engineering, Ben Gurion University, Israel. This paper is available on arxiv under CC 4.0 license. Authors: Authors: (1) Oded Blumenthal, Software and Information Systems Engineering, Ben Gurion University, Israel; (2) Guy Shani, Software and Information Systems Engineering, Ben Gurion University, Israel. Table of Links Introduction Background Related Work POMCP for Stochastic Contingent Planning Domain Independent Heuristics for POMCP Empirical Evaluation Conclusion and References Introduction Introduction Background Background Related Work Related Work POMCP for Stochastic Contingent Planning POMCP for Stochastic Contingent Planning Domain Independent Heuristics for POMCP Domain Independent Heuristics for POMCP Empirical Evaluation Empirical Evaluation Conclusion and References Conclusion and References 7 Conclusion In this paper we suggested to model goal POMDPs as stochastic contingent planning models, which allows us to use domain independent heuristics developed in the automated planning community to estimate the utility of belief states. We implemented our domain independent heuristics into the rollout mechanism of POMCP — a well known online POMDP planner that constructs a search tree to evaluate which action to take next. We provide an empirical evaluation showing how heuristics provide much leverage, especially in complex domains that require a long planning horizon, compared to the standard uniform rollout policy that is often used in POMCP. For future research we intend to integrate our methods into other solvers, such as RTDP-BEL, or into point-based planners as a method to gather good belief points. We can also experiment with additional heuristics, other than the h add heuristic used in this paper. add References [1] Alexandre Albore, Héctor Palacios & Hector Geffner (2009): A Translation-Based Approach to Contingent Planning. In: IJCAI, 9, pp. 1623–1628, doi:10.5555/1661445.1661706. Available at https://dl.acm. org/doi/10.5555/1661445.1661706. [2] Blai Bonet & Héctor Geffner (2001): Planning as heuristic search. Artificial Intelligence 129(1-2), pp. 5–33, doi:10.1016/S0004-3702(01)00108-4. [3] Blai Bonet & Hector Geffner (2009): Solving POMDPs: RTDP-Bel vs. Point-based Algorithms. In Craig Boutilier, editor: IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11-17, 2009, pp. 1641–1646, doi:10.5555/1661445.1661709. Available at https://dl.acm.org/doi/10.5555/1661445.1661709. [4] Blai Bonet & Hector Geffner (2011): Planning under Partial Observability by Classical Replanning: Theory and Experiments. In: IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, Barcelona, Catalonia, Spain, July 16-22, 2011, pp. 1936–1941, doi:10.5591/978-1-57735-516-8/IJCAI11-324. [5] Ronen I Brafman & Guy Shani (2016): Online belief tracking using regression for contingent planning. Artificial Intelligence 241, pp. 131–152, doi:10.1016/j.artint.2016.08.005. [6] Hector Geffner & Blai Bonet (2022): A concise introduction to models and methods for automated planning. Springer Nature. [7] Malte Helmert & Carmel Domshlak (2009): Landmarks, Critical Paths and Abstractions: What’s the Difference Anyway? In Lubos Brim, Stefan Edelkamp, Eric A. Hansen & Peter Sanders, editors: Graph Search Engineering, 29.11. - 04.12.2009, Dagstuhl Seminar Proceedings 09491, Schloss Dagstuhl -LeibnizZentrum für Informatik, Germany, doi:10.1609/icaps.v19i1.13370. Available at http://drops.dagstuhl. de/opus/volltexte/2010/2432/. [8] Malte Helmert, Patrik Haslum, Jörg Hoffmann & Raz Nissim (2014): Merge-and-shrink abstraction: A method for generating lower bounds in factored state spaces. Journal of the ACM (JACM) 61(3), pp. 1–63, doi:10.1145/2559951. [25] Guy Shani, Joelle Pineau & Robert Kaplow (2013): A survey of point-based POMDP solvers. Autonomous Agents and Multi-Agent Systems 27, pp. 1–51, doi:10.1007/s10458-012-9200-2. [26] William Shen, Felipe Trevizan, Sam Toyer, Sylvie Thiébaux & Lexing Xie (2019): Guiding search with generalized policies for probabilistic planning. In: Proceedings of the International Symposium on Combinatorial Search, 10, pp. 97–105, doi:10.1609/socs.v10i1.18507. [27] David Silver & Joel Veness (2010): Monte-Carlo planning in large POMDPs. Advances in neural information processing systems 23, doi:10.5555/2997046.2997137. [28] Richard D Smallwood & Edward J Sondik (1973): The optimal control of partially observable Markov processes over a finite horizon. Operations research 21(5), pp. 1071–1088, doi:10.1287/opre.21.5.1071. [29] Adhiraj Somani, Nan Ye, David Hsu & Wee Sun Lee (2013): DESPOT: Online POMDP planning with regularization. Advances in neural information processing systems 26, doi:10.1613/jair.5328. [30] Zachary Sunberg & Mykel Kochenderfer (2018): Online algorithms for POMDPs with continuous state, action, and observation spaces. In: Proceedings of the International Conference on Automated Planning and Scheduling, 28, pp. 259–263, doi:10.48550/arXiv.1709.06196. [31] Yuchen Xiao, Sammie Katt, Andreas ten Pas, Shengjian Chen & Christopher Amato (2019): Online planning for target object search in clutter under partial observability. In: 2019 International Conference on Robotics and Automation (ICRA), IEEE, pp. 8241–8247, doi:10.1109/ICRA.2019.8793494. [32] Nan Ye, Adhiraj Somani, David Hsu & Wee Sun Lee (2017): DESPOT: Online POMDP planning with regularization. Journal of Artificial Intelligence Research 58, pp. 231–266, doi:10.1613/jair.5328.

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Rollout Heuristics for Online Stochastic Contingent Planning: Conclusion and References

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

5 Smart Heuristic Methods for Solving Bin Packing Problems

Learn Agent Instructions: Fine-Tuning Autonomous AI Agent Trajectory

Building Your Own Custom Tool in SuperAGI: A Step-by-Step Guide

Exploring Crypto ‘Vending Machines’: Physical Machines, Dispensers, and Autonomous Agents

How to Use Contracts in Obyte: Smart, Prosaic, and With Arbitration

It's Time for Lawmakers to Regulate the Development and Use of AI

5 Smart Heuristic Methods for Solving Bin Packing Problems

Learn Agent Instructions: Fine-Tuning Autonomous AI Agent Trajectory

Building Your Own Custom Tool in SuperAGI: A Step-by-Step Guide

Exploring Crypto ‘Vending Machines’: Physical Machines, Dispensers, and Autonomous Agents

How to Use Contracts in Obyte: Smart, Prosaic, and With Arbitration

It's Time for Lawmakers to Regulate the Development and Use of AI

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps