This paper is available on arXiv under a CC 4.0 license.
Authors:
(1) Oded Blumenthal, Software and Information Systems Engineering, Ben Gurion University, Israel;
(2) Guy Shani, Software and Information Systems Engineering, Ben Gurion University, Israel.

Table of Links

Introduction
Background
Related Work
POMCP for Stochastic Contingent Planning
Domain Independent Heuristics for POMCP
Empirical Evaluation
Conclusion and References

5 Domain Independent Heuristics for POMCP

We now describe the main contribution of this paper: two domain independent rollout heuristics that leverage methods developed in the automated planning community, using the structure specified in the stochastic contingent planning problem.

5.1 Delete Relaxation Heuristics

Delete relaxation heuristics are built upon the notion that if actions have only positive effects, then the number of actions that can be executed before the state becomes fixed is finite, and in many cases, small. Also, because actions cannot destroy the preconditions of other actions, one can execute actions in parallel.

Algorithm 2 portrays a delete relaxation heuristic. Delete relaxation heuristics create a layered graph, interleaving action and fact layers. The first layer, which is a fact layer, contains all the facts that hold in the state for which the heuristic is computed (line 2). The second layer, which is an action layer, contains all the actions whose preconditions hold given the facts in the first layer (line 5). The next layer, which is again a fact layer, contains all the positive effects of the actions in the previous layer, as well as all facts from the previous layer (line 6), and so forth. We stop developing the graph once no new facts can be obtained (line 8).

5.2 Heuristics in Belief Space

A major disadvantage of the above heuristics is that they focus on a single state. When the agent is aware of the true state of the system, observations have no value. Hence, the above heuristics, as well as any heuristic that is based on a single state, do not provide an estimate for the value of information, which is a key advantage of POMDPs.

We therefore propose a heuristic that is computed over a set B of possible states (Algorithm 3). We again compute the delete relaxation graph, with a few modifications. First, we compute a separate fact layer for each state in B.
An action can be applied only if its preconditions are satisfied in the fact layers of all states in B (line 7). This is equivalent to the requirement in contingent planning that an action is applicable only if it is applicable in all states in the current belief, where B serves as an approximation of the true belief state.

Second, our method leverages the deterministic observations, which allow us to filter out states that are inconsistent with the received observation (lines 8-14). When a sensing action can be applied, all states that do not agree with the value of s on the observation are discarded from B (lines 10-14). That is, we remove the fact layers corresponding to these states, and no longer consider them when computing which actions can be applied.

We stop when no states were discarded and no new facts were obtained (line 17). This process must take sensing actions into account in order to remove states that are incompatible with s; at the next iteration, action preconditions then need to be satisfied in fewer states, and hence additional actions can be executed.
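The two layered constructions described in Sections 5.1 and 5.2 can be illustrated with a minimal Python sketch. It is not a reproduction of Algorithm 2 or Algorithm 3, which are not shown here. The `Action` class, the function names `relaxed_depth` and `belief_relaxed_depth`, the toy key domain, and the choice to return the number of action layers as the estimate are all assumptions made for illustration. The sketch also assumes that s is a member of B, and that a sensing action reports whether a single fact holds in the original, unrelaxed state.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence

Facts = FrozenSet[str]


@dataclass(frozen=True)
class Action:
    """Delete-relaxed action: only preconditions and positive effects are kept."""
    name: str
    pre: Facts
    add: Facts = frozenset()
    senses: Optional[str] = None  # fact reported by a sensing action (assumed encoding)


def relaxed_depth(state: Facts, actions: Sequence[Action], goal: Facts) -> Optional[int]:
    """Single-state relaxed graph: count action layers until the goal holds."""
    facts = set(state)  # first fact layer
    depth = 0
    while not goal <= facts:
        # Action layer: every action whose preconditions hold in the fact layer.
        applicable = [a for a in actions if a.pre <= facts]
        # Next fact layer: previous facts plus all positive effects.
        grown = facts.union(*(a.add for a in applicable))
        if grown == facts:  # no new facts: stop, goal unreachable
            return None
        facts, depth = grown, depth + 1
    return depth


def belief_relaxed_depth(belief: List[Facts], true_state: Facts,
                         actions: Sequence[Action], goal: Facts) -> Optional[int]:
    """Belief-space variant: one fact layer per state, filtered by sensing."""
    if true_state not in belief:
        raise ValueError("this sketch assumes true_state is one of the states in belief")
    # Pair each surviving original state with its own growing fact layer.
    layers = [(s, set(s)) for s in belief]
    depth = 0
    while not all(goal <= facts for _, facts in layers):
        # An action is applicable only if it is applicable in every fact layer.
        applicable = [a for a in actions
                      if all(a.pre <= facts for _, facts in layers)]
        before = len(layers)
        for a in applicable:
            if a.senses is not None:
                observed = a.senses in true_state
                # Discard states that disagree with true_state on the observation.
                layers = [(s, f) for s, f in layers if (a.senses in s) == observed]
        discarded = len(layers) < before
        new_facts = False
        for _, facts in layers:
            size = len(facts)
            facts.update(*(a.add for a in applicable))
            new_facts = new_facts or len(facts) > size
        if not discarded and not new_facts:  # nothing changed: stop
            return None
        depth += 1
    return depth


# Hypothetical toy domain: the key is on the left or on the right.
s1 = frozenset({"at_a", "key_left"})
s2 = frozenset({"at_a", "key_right"})
actions = [
    Action("sense_key", pre=frozenset({"at_a"}), senses="key_left"),
    Action("take_left", pre=frozenset({"key_left"}), add=frozenset({"has_key"})),
    Action("take_right", pre=frozenset({"key_right"}), add=frozenset({"has_key"})),
    Action("open", pre=frozenset({"has_key"}), add=frozenset({"open"})),
]
goal = frozenset({"open"})

print(relaxed_depth(s1, actions, goal))                   # 2
print(belief_relaxed_depth([s1, s2], s1, actions, goal))  # 3
```

In this toy domain the single-state estimate is 2, because the key location is treated as known. The belief-space estimate is 3, because neither take action is applicable in both states until the sensing action has discarded the state that disagrees with s. The extra layer is the cost of obtaining information, which a single-state heuristic cannot reflect.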

Rollout Heuristics: Domain Independent Heuristics for POMCP
