This paper is available on arXiv under the CC 4.0 DEED license.
Authors:
(1) Leah Chrestien, Czech Technical University in Prague;
(2) Tomáš Pevný, Czech Technical University in Prague, and Gen Digital, Inc.;
(3) Stefan Edelkamp, Czech Technical University in Prague;
(4) Antonín Komenda, Czech Technical University in Prague.
In imitation learning for planning, parameters of heuristic functions are optimized against a set of solved problem instances. This work revisits the necessary and sufficient conditions for strictly optimally efficient heuristics in forward search algorithms, mainly A* and greedy best-first search, under which they expand only states on the returned optimal path. It then proposes a family of ranking-based loss functions, each tailored to a given variant of the forward search algorithm. Furthermore, from a learning-theory point of view, it discusses why optimizing the cost-to-goal h∗ is unnecessarily difficult. An experimental comparison on a diverse set of problems unequivocally supports the derived theory.
Automated planning finds a sequence of actions that reaches a goal in a model of the environment provided by the user. It is considered one of the core problems in Artificial Intelligence and underlies some of its successful applications [27, 39, 46]. Early analysis of planning tasks [31] indicated that optimizing the heuristic function steering the search towards the goal for a given problem domain can dramatically improve the performance of the search. Automated optimization of the heuristic function has therefore become a central goal in improving the performance of planners [6, 13].
Learning in planning means optimizing heuristic functions from plans of already solved problems and their instances. This definition includes the selection of proper heuristics from a set of pattern databases [12, 17, 22, 33], the selection of a planner from a portfolio [25], learning planning operators from instances [32, 56], learning macro-operators and entanglements [9, 29], and learning heuristic functions with general function approximators (e.g., neural networks) [5, 16, 19, 45].
The majority of research [16, 19, 45, 53] optimizes the heuristic function to estimate the cost-to-goal, as it is well known to be an optimal heuristic for many best-first heuristic search algorithms, including A* and IDA* [21, 28]. These works cast the optimization of the heuristic function as a regression problem. But even the true cost-to-goal h∗ does not guarantee that the forward search will find the optimal solution while expanding the minimal number of states. This paper defines a stricter version of optimal efficiency [10, 24] as follows. A forward search (or its heuristic) is called strictly optimally efficient iff it expands only states on the one optimal solution path it returns, i.e., if this optimal solution path has l + 1 states, the search expands exactly l states (the goal state is never expanded).
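To make the definition concrete, here is a minimal Python sketch (ours, not the paper's; the toy graph, the heuristic values, and names such as `strictly_optimally_efficient` are illustrative assumptions): a textbook A* that records which states it expands, followed by a check of the strict-optimal-efficiency condition.

```python
import heapq

def astar(graph, h, start, goal):
    """A* search returning (path, expanded): `expanded` records states
    popped from the open list, in order. `graph` maps a state to a list
    of (successor, edge_cost) pairs; `h` maps a state to its heuristic value."""
    open_list = [(h(start), 0, start, [start])]   # entries: (f, g, state, path)
    closed, expanded = set(), []
    while open_list:
        f, g, s, path = heapq.heappop(open_list)
        if s == goal:
            return path, expanded                 # the goal state is never expanded
        if s in closed:
            continue
        closed.add(s)
        expanded.append(s)
        for t, c in graph[s]:
            if t not in closed:
                heapq.heappush(open_list, (g + c + h(t), g + c, t, path + [t]))
    return None, expanded

def strictly_optimally_efficient(path, expanded):
    """The definition above: a returned optimal path with l + 1 states
    must cost exactly l expansions, all of them on that path."""
    return path is not None and expanded == path[:-1]

# A toy instance where the perfect cost-to-goal h* ranks states well:
graph = {"s": [("a", 1), ("b", 1)], "a": [("g", 1)], "b": [("g", 3)], "g": []}
h_star = {"s": 2, "a": 1, "b": 3, "g": 0}.get
path, expanded = astar(graph, h_star, "s", "g")
print(path, expanded, strictly_optimally_efficient(path, expanded))
# ['s', 'a', 'g'] ['s', 'a'] True
```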
This work focuses exclusively on forward search with merit functions. It presents theoretical arguments as to why the order (rank) of states induced by the heuristic function is more important than the precise estimate of the cost-to-goal, and it formalizes the necessary and sufficient condition for a strictly optimally efficient search. It also argues why, from a statistical learning theory point of view, learning to rank is a simpler problem than regression estimating the cost-to-goal. The main motivation of the paper is similar to that of [38], but the derived theory and implementation are much simpler. The sufficiency of optimizing the rank has already been recognized in [59] for beam search and in [18] for greedy best-first search (GBFS), though, unlike this work, neither can guarantee strict optimal efficiency.
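As a rough illustration of why ranking is the easier target (our own sketch, not the paper's loss family): a pairwise hinge loss penalizes only mis-ordered pairs, so every heuristic that ranks on-path states ahead of off-path ones, including any monotone transformation of h∗, attains zero loss, whereas a regression loss singles out h∗ alone.

```python
import torch

def pairwise_ranking_loss(h_on_path, h_off_path, margin=1.0):
    """Hinge loss over all (on-path, off-path) pairs: an off-path state
    should receive a strictly higher heuristic value than an on-path one.
    Only the ordering of states matters, not their absolute values."""
    diff = h_on_path.unsqueeze(1) - h_off_path.unsqueeze(0)  # broadcast all pairs
    return torch.clamp(diff + margin, min=0.0).mean()

# Correct ordering with wildly wrong absolute values still gives zero loss,
# while a regression loss against h* would be enormous here:
h_on, h_off = torch.tensor([0.0, 10.0]), torch.tensor([100.0, 200.0])
print(pairwise_ranking_loss(h_on, h_off))  # tensor(0.)
```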
We emphasize that this work neither addresses nor questions the existence of a strictly optimally efficient heuristic function for the forward search. It is assumed that the space of possible heuristic functions is sufficiently large that it contains one which is sufficiently good. The theoretical results are supported by experimental comparisons to the prior art on eight domains.
The contributions of this paper are as follows.
1. We state necessary and sufficient conditions for strictly optimally efficient heuristic functions for forward search algorithms in deterministic planning.
2. We show that when optimizing a heuristic function, it is better to solve a ranking problem than a regression problem.
3. We argue why learning to rank is easier than the regression used in learning the cost-to-goal.
4. We instantiate the ranking approach for A* and GBFS, leading to two slightly different loss functions (sketched after this list).
5. We experimentally demonstrate on eight problems (three grid-based and five PDDL domains) that optimizing the rank is always better.
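As a hedged reading of contribution 4 (our own hypothetical code, not the paper's exact formulations), the two losses plausibly differ only in which merit value gets ranked: f = g + h for A*, versus h alone for GBFS.

```python
import torch

def rank_loss(merit_on, merit_off, margin=1.0):
    """Hinge loss over all (on-path, off-path) pairs of merit values."""
    diff = merit_on.unsqueeze(1) - merit_off.unsqueeze(0)
    return torch.clamp(diff + margin, min=0.0).mean()

def astar_rank_loss(g_on, h_on, g_off, h_off):
    # A* orders its open list by f = g + h, so the f-values are ranked
    return rank_loss(g_on + h_on, g_off + h_off)

def gbfs_rank_loss(h_on, h_off):
    # GBFS orders its open list by h alone, so only h-values are ranked
    return rank_loss(h_on, h_off)
```

Note that in this sketch the path costs g are known constants taken from the solved training plans, so in the A* variant they merely shift each pair's margin while only the heuristic h is learned.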