Theory and Algorithms for Partial Order Based Reduction in Planning
Search is a major technique for planning. It amounts to exploring a state space of planning domains typically modeled as a directed graph. However, prohibitively large sizes of the search space make search expensive. Developing better heuristic functions has been the main technique for improving search efficiency. Nevertheless, recent studies have shown that improving heuristics alone has certain fundamental limits on improving search efficiency. Recently, a new direction of research called partial order based reduction (POR) has been proposed as an alternative to improving heuristics. POR has shown promise in speeding up searches.
POR has been extensively studied in model checking research and is a key enabling technique for scalability of model checking systems. Although the POR theory has been extensively studied in model checking, it has never been developed systematically for planning before. In addition, the conditions for POR in the model checking theory are abstract and not directly applicable in planning. Previous works on POR algorithms for planning did not establish the connection between these algorithms and existing theory in model checking.
In this paper, we develop a theory for POR in planning. The new theory we develop connects the stubborn set theory in model checking and POR methods in planning. We show that previous POR algorithms in planning can be explained by the new theory. Based on the new theory, we propose a new, stronger POR algorithm. Experimental results on various planning domains show further search cost reduction using the new algorithm.
Categories: I.2.8 [Artificial Intelligence]: Problem Solving, Control Methods, and Search (Graph and tree search strategies)
Terms: AI planning, State-space search, Partial order reduction, Stubborn set
State space search is a fundamental and pervasive approach to artificial intelligence in general and planning in particular. It is among the most successful approaches to planning. A major concern with state space search is that it has a high time and space cost since the state space that needs to be explored is usually very large.
Much research on classical planning has focused on the design of better heuristic functions. For example, new heuristic functions have recently been developed by analyzing the domain transition graphs (DTGs) and causal graphs on top of the SAS+ formalism [Briel et al. (2007), Helmert and Röger (2008)]. Despite the success of using domain-independent heuristics for classical planning, heuristic planners still face scalability challenges for large-scale problems. As shown by recent work, search even with almost perfect heuristic guidance may still lead to very high search cost [Helmert and Röger (2008)]. Therefore, it is important to improve other components of the search algorithm that are orthogonal to the development of heuristics.
Recently, partial order based reduction (POR), a new way to reduce the search cost from an orthogonal perspective, has been studied for classical planning [Chen et al. (2009), Chen and Yao (2009)]. POR as a method to reduce search space has been extensively studied in model checking with solid theoretical investigation. However, the theoretical properties of POR in planning have still not been fully investigated. There are three key questions.
1) POR algorithms have been extensively studied in model checking. In fact, POR is an enabling technique for model checking, which would not be practical without POR due to its high time complexity. The theory of POR in model checking has also been researched extensively. What are the relationships between the previous POR methods designed for model checking and existing work for planning? Understanding these relationships can not only help us understand both problems better, but can also potentially lead to better POR algorithms for planning.
2) In essence, all POR based algorithms reduce the search space by restricting certain actions from expanding at each state. Although these POR algorithms all look similar, what are the differences in the quality of reduction that significantly affect search efficiency? We think it is important to investigate the reduction powers of different POR algorithms.
3) Given the fact that there is more than one POR reduction algorithm for planning, are there other, stronger POR algorithms? To answer this question, in essence, we need to find the sufficient and/or necessary conditions for partial-order based pruning. There are sufficient conditions for POR in model checking. Nevertheless, those conditions are abstract and not directly applicable in planning.
The main contribution of this work is to establish the relationship between the POR methods for model checking and those for planning. We leverage the existing POR theory for model checking and develop a counterpart theory for planning. This new theory allows existing POR algorithms for planning to be explained in a unified framework. Moreover, based on the conditions given by this theory, we develop a new POR algorithm for planning that is stronger than previous ones. Experimental results also show that our proposed algorithm leads to more reduction.
This paper is organized as follows. We first give basic definitions in Section 2. In Section 3, we present a general theory that gives sufficient conditions for POR in planning. In Section 4, we use the new theory to explain two previous POR algorithms. Based on the theory, in Section 5, we propose a new POR algorithm for planning which is different from and stronger than previous ones. We report experimental results in Section 7, review some related work in Section 8, and give conclusions in Section 9.
Planning is a core area of artificial intelligence. It entails arranging a course of actions to achieve certain goals under given constraints. Classical planning is the most fundamental form of planning, which deals with only propositional logic. In this paper, we work on the SAS+ formalism [Jonsson and Bäckström (1998)] of classical planning. The SAS+ formalism has recently attracted a lot of attention due to a number of advantages it has over the traditional STRIPS formalism. In the following, we review this formalism and introduce our notation.
A SAS+ planning task is defined as a tuple of four elements, .
is a set of multi-valued state variables, each with an associated finite domain .
is a set of actions and each action is a tuple , where both and define some partial assignments of state variables in the form . is a partial assignment that defines the goal.
is the set of states. A state is a full assignment to all the state variables. is the initial state. A state is a goal state if .
Two partial assignment sets are conflict-free if and only if they do not assign different values to the same state variable.
For a SAS+ planning task, for a given state and an action , when all variable assignments in are met in state , action is applicable in state . After applying to , the state variable assignment will be changed to a new state according to : the state variables that appear in will be changed to the assignments in while other state variables remain the same. We denote the resulting state after applying an applicable action to as . is undefined if is not applicable in . The planning task is to find a path, or a sequence of actions, that transits the initial state to a goal state that includes .
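The notions of conflict-free assignments, applicability, and action application can be made concrete with a minimal Python sketch. The representation (assignments as dictionaries, an `Action` tuple with `pre` and `eff` fields), the function names, and the two-variable task are illustrative assumptions of ours, not notation from the formalism.

```python
from typing import Dict, NamedTuple, Optional

Assignment = Dict[str, int]  # partial or full assignment: variable -> value


class Action(NamedTuple):
    name: str
    pre: Assignment  # precondition assignments
    eff: Assignment  # effect assignments


def conflict_free(p: Assignment, q: Assignment) -> bool:
    """True iff p and q never assign different values to the same variable."""
    return all(q[v] == val for v, val in p.items() if v in q)


def applicable(state: Assignment, a: Action) -> bool:
    """An action is applicable when all of its preconditions hold in the state."""
    return all(state.get(v) == val for v, val in a.pre.items())


def apply(state: Assignment, a: Action) -> Optional[Assignment]:
    """Return the successor state, or None when the action is not applicable."""
    if not applicable(state, a):
        return None
    successor = dict(state)
    successor.update(a.eff)  # variables outside eff keep their values
    return successor


# Hypothetical task used only for illustration.
move = Action("move", pre={"loc": 0}, eff={"loc": 1})
load = Action("load", pre={"loc": 1, "cargo": 0}, eff={"cargo": 1})
s0 = {"loc": 0, "cargo": 0}
s1 = apply(s0, move)
s2 = apply(s1, load)
```

Here `load` is undefined (returns `None`) in `s0` because its precondition on `loc` is not met, while the path (`move`, `load`) is valid from `s0`.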
An important structure for a given SAS+ task is the domain transition graph defined as follows:
For a SAS+ planning task, each state variable corresponds to a domain transition graph (DTG) , a directed graph with a vertex set , where is a special vertex, and an edge set determined by the following.
If there is an action such that and , then belongs to and we say that is associated with the edge (denoted as ). It is conventional to call the edges in DTGs transitions.
If there is an action such that and no assignment to is in , then belongs to and we say that is associated with the transition (denoted as ).
Intuitively, a SAS+ task can be decomposed into multiple objects, each corresponding to one DTG, which models the transitions of the possible values of that object.
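As a rough illustration of the two edge rules, the following sketch derives the transitions of every DTG from a list of actions. It assumes, as our reading of the definition, that the special vertex is the source of transitions for actions that assign a variable without any precondition on it; the constant `ANY`, the function `build_dtgs`, and the example actions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class Action(NamedTuple):
    name: str
    pre: Dict[str, int]
    eff: Dict[str, int]


ANY = None  # stands in for the special vertex (assumption, see lead-in)
Edge = Tuple[Optional[int], int]


def build_dtgs(actions: List[Action]) -> Dict[str, Dict[Edge, Set[str]]]:
    """Map each variable to its transitions: {(from, to): names of associated actions}."""
    dtgs: Dict[str, Dict[Edge, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for a in actions:
        for var, new_val in a.eff.items():
            # Use the precondition value when present, else the special vertex.
            old_val = a.pre.get(var, ANY)
            dtgs[var][(old_val, new_val)].add(a.name)
    return {var: dict(edges) for var, edges in dtgs.items()}


ACTIONS = [
    Action("move", {"loc": 0}, {"loc": 1}),
    Action("back", {"loc": 1}, {"loc": 0}),
    Action("load", {"loc": 1, "cargo": 0}, {"cargo": 1}),
    Action("reset", {}, {"cargo": 0}),
]
dtgs = build_dtgs(ACTIONS)
```

In this example `load` is associated only with the DTG of `cargo`: its precondition on `loc` does not create a transition in the DTG of `loc`, because `load` has no effect on that variable.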
For a SAS+ planning task, an action is associated with a DTG (denoted as ) if contains an assignment to .
For a SAS+ planning task, a DTG is goal-related if the partial assignments in that define the goal states include an assignment in . A goal-related DTG is unachieved in state if in and .
A SAS+ planning task can also specify a preference that needs to be optimized. A preference is a mapping from a path to a numerical value. In this paper we assume an action set invariant preference. A preference is action set invariant if two paths have the same preference whenever they contain the same set of actions (possibly in different orders). Most popular preferences, such as plan length and total action cost, are action set invariant.
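To make action set invariance tangible, the sketch below checks three preferences on all reorderings of one small path. The action names, the cost table, and the order-sensitive `position_weighted_cost` are hypothetical and serve only as a contrast; the check covers a single path and is not a proof of invariance in general.

```python
from itertools import permutations
from typing import Callable, Sequence

# Hypothetical action costs, used only for illustration.
COST = {"a": 1, "b": 3, "c": 2}


def plan_length(path: Sequence[str]) -> int:
    return len(path)


def total_cost(path: Sequence[str]) -> int:
    return sum(COST[name] for name in path)


def position_weighted_cost(path: Sequence[str]) -> int:
    """An order-sensitive preference: later actions are weighted more heavily."""
    return sum((i + 1) * COST[name] for i, name in enumerate(path))


def is_invariant_on(preference: Callable[[Sequence[str]], int],
                    path: Sequence[str]) -> bool:
    """True iff every reordering of this path gets the same preference value."""
    return len({preference(p) for p in permutations(path)}) == 1


path = ("a", "b", "c")
```

Plan length and total cost give the same value for every ordering of `path`, whereas the position-weighted preference does not, so the results of this paper on optimality would not cover it.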
3 Partial Order Reduction Theory for Planning
Partial order based reduction (POR) algorithms have been extensively studied for model checking [Varpaaniemi (2005), Clarke et al. (2000)], which also requires examining a state space in order to prove certain properties. POR is a technique that allows a search to explore only part of the entire search space and still maintain completeness and/or optimality. Without POR, model checking would be too expensive to be practical [Holzmann (1997)]. However, POR has not been studied systematically for planning.
In this section, we will first introduce the concept of search reduction. Then, we will present a general POR theory for planning, which gives sufficient conditions that guide the design of practical POR algorithms.
3.1 Search reduction for planning
We first introduce the concept of search reduction. A standard search, such as breadth-first search (BFS), depth-first search, or A* search, needs to explore a state space graph. A reduction algorithm is an algorithm that reduces the state space graph into a subgraph, so that a search will be performed on the subgraph instead of the original one. We begin by defining the state space graph. In our presentation, for any graph , we use to denote the set of vertices and the set of edges. For a directed graph , for any vertex , a vertex is its successor if and only if .
For a SAS+ planning task, a state space graph for the task is a directed graph in which each state is a vertex and each directed edge represents an action that will be explored during a search process. Most search algorithms work on the original state space graph as defined below.
For a SAS+ planning task, its original state space graph is a directed graph in which each state is a vertex and there is a directed edge if and only if there exists an action such that . We say that action marks the edge .
For a SAS+ planning task, for a state space graph , the successor set of a state , denoted by , is the set of all the successor states of . The expansion set of a state , denoted by , is the set of actions
Intuitively, the successor set of a state includes all the successor states that shall be generated by a search upon expanding , while the expansion set includes all the actions to be expanded at .
In general, a reduction method is a method that maps each input state space graph to a subgraph of . The POR algorithms we study remove edges from . More specifically, each state is only connected to a subset of all its successors in the reduced subgraph. We note that, by removing edges, a POR algorithm may also reduce the number of vertices that are reachable from the initial state, hence reducing the number of nodes examined by a search. The decision whether a successor state would still be a successor in the reduced subgraph can be made locally by checking certain conditions related to the current state and some precomputed information. Hence, a POR algorithm can be combined with various search algorithms.
For a SAS+ task, a solution sequence in its state space graph is a pair , where is a non-goal state, is a sequence of actions, and, let , is an edge in for and is a goal state. We now define some generic properties of reduction methods.
For a SAS+ planning task, a reduction method is completeness-preserving if for any solution sequence in the state space graph, there also exists a solution sequence in the reduced state space graph.
For a SAS+ planning task, a reduction method is optimality-preserving if, for any solution sequence in the state space graph, there also exists a solution sequence in the reduced state space graph satisfying that has the same preference that does.
For a SAS+ planning task, a reduction method is action-preserving if, for any solution sequence in the state space graph, there also exists a solution sequence in the reduced state space graph satisfying that the actions in are a permutation of the actions in .
Clearly, being action-preserving is a sufficient condition for being completeness-preserving. When the preference is action set invariant, being action-preserving is also a sufficient condition for being optimality-preserving.
3.2 Stubborn set theory for planning
Although there are many variations of POR methods, a popular and representative POR algorithm is the stubborn set method [Valmari (1988), Valmari (1989), Valmari (1990), Valmari (1998), Valmari (1991), Valmari (1993)], used for model checking based on Petri nets. The basic idea is to form a stubborn set of applicable actions for each state and expand only the actions in the stubborn set during search. By expanding a small subset of applicable actions in each state, stubborn set methods can reduce the search space without compromising completeness.
Since planning also examines a large search space, we propose to develop a stubborn set theory for planning. To achieve this, we need to handle various subtle issues arising from the differences between model checking and planning. We first define the concept of stubborn sets for planning, adapted from the concepts in model checking.
Definition 11 (Stubborn Set for Planning)
For a SAS+ planning task, a set of actions is a stubborn set at state if and only if
For any action and actions , if is a prefix of a path from to a goal state, then is a valid path from and leads to the same state that does; and
Any valid path from to a goal state contains at least one action in .
The above definition is schematically illustrated in Figure 1. Once we define the stubborn set at each state , we in effect reduce the state space graph to a subgraph: only the edges corresponding to actions in the stubborn sets are kept in the subgraph.
For a SAS+ planning task, given a stubborn set defined at each state , the stubborn set method reduces its state space graph to a subgraph such that and there is an edge in if and only if there exists an action such that .
A stubborn set method for planning is a reduction method that reduces the original state space graph to a subgraph according to Definition 12. In other words, a stubborn set method expands actions only in a stubborn set in each state. In the sequel, we show that such a reduction method preserves actions, hence, it also preserves completeness and optimality.
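The following sketch illustrates how a stubborn set method plugs into a standard search: the search is unchanged except that, at each state, it expands only the actions returned by a selection function. Everything here is an illustrative assumption: the breadth-first `search` routine, the three-variable task, and in particular `first_unachieved`, a hand-written selector whose output happens to be a stubborn set for this task because the three actions touch disjoint variables. It is not a general procedure for computing stubborn sets.

```python
from collections import deque
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

State = Dict[str, int]


class Action(NamedTuple):
    name: str
    pre: Dict[str, int]
    eff: Dict[str, int]


def applicable(state: State, a: Action) -> bool:
    return all(state.get(v) == val for v, val in a.pre.items())


def search(initial: State, goal: State, actions: List[Action],
           choose: Callable[[State, List[Action]], List[Action]]
           ) -> Tuple[Optional[List[str]], int]:
    """BFS expanding only choose(state, applicable actions); returns (plan, #expanded)."""
    frontier = deque([(initial, [])])
    seen = {tuple(sorted(initial.items()))}
    expanded = 0
    while frontier:
        state, plan = frontier.popleft()
        if all(state[v] == val for v, val in goal.items()):
            return plan, expanded
        expanded += 1
        candidates = [a for a in actions if applicable(state, a)]
        for a in choose(state, candidates):
            nxt = dict(state)
            nxt.update(a.eff)
            key = tuple(sorted(nxt.items()))
            if key not in seen:
                seen.add(key)
                frontier.append((nxt, plan + [a.name]))
    return None, expanded


def expand_all(state: State, candidates: List[Action]) -> List[Action]:
    return candidates


def first_unachieved(state: State, candidates: List[Action]) -> List[Action]:
    """Hand-written selector: actions for the first variable not yet at its goal value."""
    for var in ("x", "y", "z"):
        if state[var] != 1:
            return [a for a in candidates if var in a.eff]
    return []


ACTIONS = [Action("set_" + v, {v: 0}, {v: 1}) for v in ("x", "y", "z")]
INIT = {"x": 0, "y": 0, "z": 0}
GOAL = {"x": 1, "y": 1, "z": 1}
full_plan, full_expanded = search(INIT, GOAL, ACTIONS, expand_all)
reduced_plan, reduced_expanded = search(INIT, GOAL, ACTIONS, first_unachieved)
```

On this toy task the unreduced search expands all seven non-goal states, while the reduced search expands three, and the plan it returns uses the same set of actions.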
Any stubborn set method for planning is action-preserving.
We prove that for any solution sequence in the original state space graph , there exists a solution sequence in the reduced state space graph resulting from the stubborn set method, such that is a permutation of actions in . We prove this fact by induction on , the length of .
When , let be the only action in . According to the second condition in Definition 11, is in . Thus, is also a solution sequence in , and the stubborn set method is action-preserving in the base case.
When , the induction assumption is that any path in with length less than or equal to has a permutation in that leads to the same final state. Now we consider a solution sequence in : . Let . If , we can invoke the induction assumption for the state and prove our induction assumption for .
We now consider the case where . Let be the first action in such that . Such an action must exist because of the condition A2 in Definition 11.
Consider the sequence . According to condition A1 in Definition 11, is also a valid sequence from which leads to the same state that does. Hence, we know that is also a solution path. Therefore, let , we know is an executable action sequence starting from . Let , is a solution sequence in . From the induction assumption, we know there is a sequence which is a permutation of , such that is a solution sequence in . Since , we know that followed by is a solution sequence from and is a permutation of actions in , which is a permutation of actions in . Thus, the stubborn set method is action-preserving. \endproof
Since being action-preserving is a sufficient condition for being completeness-preserving and optimality-preserving, when the preference is action set invariant, we have the following result.
A stubborn set method for planning is completeness-preserving. In addition, it is optimality-preserving when the preference is action set invariant.
3.3 Left commutativity in SAS+ planning
Note that although Theorem 1 provides an important result for reduction, it is not directly applicable since the conditions in Definition 11 are abstract and not directly implementable in algorithms. We need to find sufficient conditions for Definition 11 that can facilitate the design of reduction algorithms. In the following, we define several concepts that can lead to sufficient conditions for Definition 11.
Definition 13 (State-Dependent Left Commutativity)
For a SAS+ planning task, an ordered action pair is left commutative in state , if is a valid path at , and is also a valid path at and results in the same state. We denote such a relationship by .
Definition 14 (State-Independent Left Commutativity)
For a SAS+ planning task, an ordered action pair is left commutative if, for any state , it is true that . We denote such a relationship by .
Note the following. 1) Left commutativity is not a symmetric relationship. does not imply . 2) The order in the notation suggests that we should always try only during the search instead of trying both and . Also, not every state-dependent left commutative action pair is state-independent left commutative. For instance, in a SAS+ planning task with three state variables , action with , eff and action with , eff are left commutative in state but not in state as is not applicable in state .
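State-dependent left commutativity can be checked directly from its definition by executing both orderings. The sketch below does so; the helper names `run` and `swappable_in` and the two actions are hypothetical stand-ins of ours, chosen so that the pair commutes in one state but not in another where one of the actions is initially inapplicable.

```python
from typing import Dict, List, NamedTuple, Optional

State = Dict[str, int]


class Action(NamedTuple):
    name: str
    pre: Dict[str, int]
    eff: Dict[str, int]


def run(state: State, path: List[Action]) -> Optional[State]:
    """Execute path from state; return the final state, or None if a step is inapplicable."""
    current = dict(state)
    for a in path:
        if any(current.get(v) != val for v, val in a.pre.items()):
            return None
        current.update(a.eff)
    return current


def swappable_in(state: State, first: Action, second: Action) -> bool:
    """True iff (first, second) is valid at state and (second, first) is also
    valid there and reaches the same state."""
    forward = run(state, [first, second])
    backward = run(state, [second, first])
    return forward is not None and forward == backward


# Hypothetical actions: a establishes y=1, which b requires.
a = Action("a", pre={"x": 0}, eff={"y": 1})
b = Action("b", pre={"y": 1}, eff={"z": 1})
s_ok = {"x": 0, "y": 1, "z": 0}   # b is already applicable here
s_bad = {"x": 0, "y": 0, "z": 0}  # b only becomes applicable after a
```

In `s_ok` both orderings are valid and reach the same state. In `s_bad` the path (`a`, `b`) is valid but `b` cannot be moved in front of `a`, so the pair is not left commutative in that state and hence not state-independently left commutative either.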
We introduce state-independent left commutativity as it can be used to derive sufficient conditions for finding stubborn sets.
Definition 15 (State-Independent Left Commutative Set)
For a SAS+ planning task, a set of actions is a state-independent left commutative set at a state if and only if
For any action and any action , if there exists a valid path from to a goal state that contains both and , then it is the case that ; and
Any valid path from to a goal state contains at least one action in .
For a SAS+ planning task, for a state , if a set of actions is a state-independent left commutative set, it is also a stubborn set.
For an action and actions , if is a prefix of a path from to a goal state, then according to L1, we see that , for . According to the definition of left commutativity, we see that and can be swapped and that the resulting path is still a valid path that leads to the same state that does. We can subsequently swap with , , and to obtain equivalent paths, before finally obtaining , as shown in the schematic illustration in the right part of Figure 2. Hence, we have shown that if is a prefix of a path from to a goal state, then is also a valid path from that leads to the same state that does, which is exactly condition A1 in Definition 11. Condition A2 is identical to L2. \endproof
From the above proof, we see that the requirement of state-independent left commutativity in Definition 15 is unnecessarily strong. Instead, only certain state-dependent left commutativity is necessary. In fact, when we change to , we only require where is the state after is executed. Similarly, when we change to , we only require where is the state after is executed. Based on the above analysis, we can refine the sufficient conditions.
Definition 16 (State-Dependent Left Commutative Set)
For a SAS+ planning task, a set of actions is a state-dependent left commutative set at a state if and only if
For any action and actions , if is a prefix of a path from to a goal state, then , where is the state after is executed; and
Any valid path from to a goal state contains at least one action in .
We only need to slightly modify the proof of Theorem 2 in order to prove the following theorem.
For a SAS+ planning task, for a state , if a set of actions is a state-dependent left commutative set, it is also a stubborn set.
The above result gives sufficient conditions for finding stubborn sets in planning. The concept of state-dependent left commutative set requires a less stringent condition than the state-independent left commutative set. Therefore, it will result in smaller sets and stronger reduction. This nuance also explains why previous POR algorithms differ in performance. Next, we present our algorithm for finding such a set at each state to satisfy these conditions.
3.4 Determining left commutativity
Theorem 3 provides a key result for POR. However, the conditions in Definition 13 are still abstract and not directly implementable. The key issue is to efficiently find left commutative action pairs. Now we give necessary and sufficient conditions for Definition 13 that can practically determine left commutativity and facilitate the design of reduction algorithms.
For a SAS+ planning task, for a valid action path in state , we have if and only if and , and , and are all conflict-free and is applicable at .
First, from the definition of , we know that action is applicable in state . This implies that and eff are conflict-free. Symmetrically, since action is applicable in state , and eff are also conflict-free. Now we prove eff and eff are conflict-free by contradiction. If eff and eff are not conflict-free, without loss of generality, we can assume that contains and contains . Thus, the value of is for state and for state , i.e., differs from . This contradicts our assumption that and are left commutative. Thus, and are conflict-free.
Second, if and , and , and are all conflict-free, since is applicable in , is also applicable in state as and eff are conflict-free. Hence, is a valid path at . Also, for any state variable , its value in states and are the same, because eff and eff are conflict-free. Therefore, we have . Hence, we have . \endproof
Theorem 4 gives necessary and sufficient conditions for deciding whether two actions are left-commutative or not. Based on this result, we later develop practical POR algorithms that find stubborn sets using left commutativity.
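The conflict-free test can be compared against direct execution on a small example. The sketch below assumes our reading of the theorem: for a valid path (`a`, `b`) at a state, the two actions can be swapped exactly when `b` is applicable at that state and the precondition and effect assignments of the two actions are pairwise conflict-free. The helper names and the five actions are hypothetical, and the exhaustive loop is only a sanity check on this toy task.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Optional

State = Dict[str, int]


class Action(NamedTuple):
    name: str
    pre: Dict[str, int]
    eff: Dict[str, int]


def conflict_free(p: Dict[str, int], q: Dict[str, int]) -> bool:
    return all(q[v] == val for v, val in p.items() if v in q)


def applicable(state: State, a: Action) -> bool:
    return all(state.get(v) == val for v, val in a.pre.items())


def run(state: State, path: List[Action]) -> Optional[State]:
    current = dict(state)
    for a in path:
        if not applicable(current, a):
            return None
        current.update(a.eff)
    return current


def by_execution(state: State, a: Action, b: Action) -> bool:
    """Definition-based test; the caller guarantees that (a, b) is valid at state."""
    return run(state, [b, a]) == run(state, [a, b])


def by_conditions(state: State, a: Action, b: Action) -> bool:
    """Syntactic test (our reading of the conditions); same precondition on the caller."""
    return (applicable(state, b)
            and conflict_free(a.pre, b.eff)
            and conflict_free(b.pre, a.eff)
            and conflict_free(a.eff, b.eff))


ACTIONS = [
    Action("a", {"x": 0}, {"y": 1}),
    Action("b", {"y": 1}, {"z": 1}),
    Action("c", {}, {"y": 0}),
    Action("d", {"z": 0}, {"x": 1, "y": 1}),
    Action("e", {"x": 0}, {"x": 1}),
]
checked = mismatches = 0
for values in product((0, 1), repeat=3):
    state = dict(zip(("x", "y", "z"), values))
    for a, b in product(ACTIONS, repeat=2):
        if run(state, [a, b]) is None:
            continue  # the theorem only concerns valid paths (a, b)
        checked += 1
        mismatches += by_execution(state, a, b) != by_conditions(state, a, b)
```

The practical appeal of the syntactic test is that the three conflict-free checks depend only on the action pair and can be precomputed, leaving a single applicability check to perform at search time.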
4 Explanation of previous POR algorithms
Previously, we proposed two POR algorithms for planning: expansion core (EC) [Chen and Yao (2009)] and stratified planning (SP) [Chen et al. (2009)], both of which showed good performance in reducing the search space. However, we did not have a unified explanation for them. We now show how our theory accounts for both algorithms. Full details of the two algorithms can be found in our papers [Chen and Yao (2009), Chen et al. (2009)].
4.1 Explanation of EC
The expansion core (EC) algorithm is a POR-based reduction algorithm for planning. We will see that, in essence, the EC algorithm exploits the SAS+ formalism to find a left commutative set for each state. To describe the EC algorithm, we need the following definitions.
For a SAS+ task, for each DTG , for a vertex , an edge is a potential descendant edge of (denoted as ) if 1) is goal-related and there exists a path from to the goal state in that contains ; or 2) is not goal-related and is reachable from .
For a SAS+ task, for each DTG , for a vertex , a vertex is a potential descendant vertex of (denoted as ) if 1) is goal-related and there exists a path from to the goal state in that contains ; or 2) is not goal-related and is reachable from .
For a SAS+ task, given a state , for any , we call a potential precondition of the DTG if there exist and such that
For a SAS+ task, given a state , for any , we call a potential dependent of the DTG if there exists , and such that
For a SAS+ task, for a state , its potential dependency graph PDG() is a directed graph in which each DTG corresponds to a vertex, and there is an edge from to , , if and only if is a potential precondition or potential dependent of .
Figure 3 illustrates the above definitions. In PDG(), points to as is a potential precondition of and points to as is a potential dependent of .
For a directed graph , a subset of is a dependency closure if there do not exist and such that .
Intuitively, a DTG in a dependency closure may depend on other DTGs in the closure but not those DTGs outside of the closure. In Figure 3, and form a dependency closure of PDG().
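The closure property amounts to a simple check on outgoing edges. In the sketch below, the adjacency-list encoding, the function name, and the three-vertex graph are hypothetical; an edge from one vertex to another is read as "the first depends on the second", following the intuition above.

```python
from typing import Dict, Iterable, List, Set


def is_dependency_closure(graph: Dict[str, List[str]], subset: Iterable[str]) -> bool:
    """True iff no edge leaves the subset, i.e. no member depends on an outsider."""
    members: Set[str] = set(subset)
    return all(w in members for v in members for w in graph.get(v, ()))


# Hypothetical dependency graph: G1 and G2 depend on each other, G3 depends on G1.
PDG = {"G1": ["G2"], "G2": ["G1"], "G3": ["G1"]}
```

Here `{"G1", "G2"}` is a dependency closure, `{"G3"}` is not (it depends on `G1`), and the full vertex set is trivially one.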
The EC algorithm is defined as follows:
Definition 23 (Expansion Core Algorithm)
For a SAS+ planning task, the EC method reduces its state space graph to a subgraph such that and for each vertex (state) , it expands actions in the following set :
where is the set of executable actions in and is an index set satisfying:
The DTGs form a dependency closure in PDG(); and
There exists such that is goal-related and is not the goal state in .
Intuitively, the EC method can be described as follows. To reduce the original state-space graph, for each state, instead of expanding actions in all the DTGs, it only expands actions in DTGs that belong to a dependency closure of PDG() under the condition that at least one DTG in the dependency closure is goal-related and not at a goal state.
The set can always be found for any non-goal state since PDG() itself is always such a dependency closure. If there is more than one such closure, theoretically any dependency closure satisfying the above conditions can be used in EC. In practice, when there are multiple such dependency closures, EC picks the one with the fewest actions in order to get stronger reduction. EC has adopted the following scheme to find the dependency closure for any state .
Given a PDG(), EC first finds its strongly connected components (SCCs). If each SCC is contracted to a single vertex, the resulting graph is a directed acyclic graph . Note that each vertex in with a zero out-degree corresponds to a dependency closure. It then topologically sorts all the vertices in to get a sequence of SCCs: , and picks the minimum such that includes a goal-related DTG that is not in its goal state. It chooses all the DTGs in as the dependency closure.
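One way to realize this scheme is sketched below. It rests on assumptions of ours: the SCCs are ordered so that components with no outgoing edges come first (Tarjan's algorithm emits them in that order), and the chosen closure is the union of all components up to and including the first one that contains an unachieved goal-related DTG. The function names, the graph, and the `unachieved` set are hypothetical, and this is not claimed to be the original EC implementation.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # every vertex must appear as a key


def tarjan_sccs(graph: Graph) -> List[Set[str]]:
    """Strongly connected components, emitted sinks-first (reverse topological order)."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    sccs: List[Set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a component
            component: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def choose_closure(pdg: Graph, unachieved: Set[str]) -> Set[str]:
    """Union of SCCs, sinks first, up to the first one holding an unachieved goal DTG."""
    closure: Set[str] = set()
    for component in tarjan_sccs(pdg):
        closure |= component
        if component & unachieved:
            return closure
    return closure  # no unachieved goal-related DTG: the whole graph


# Hypothetical PDG: G1 and G2 depend on each other, G3 depends on G1, G4 is isolated.
PDG = {"G1": ["G2"], "G2": ["G1"], "G3": ["G1"], "G4": []}
closure = choose_closure(PDG, unachieved={"G3"})
```

Because every component is emitted only after all components it can reach, any prefix of the sequence has no outgoing edges and is therefore a dependency closure. The result can depend on the order in which vertices are visited: with the dictionary order above, the isolated `G4` is emitted last and is left out of the closure.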
Now we explain the EC algorithm using the POR theory we developed in Section 3. We show that the EC algorithm can be viewed as an algorithm for finding a state-dependent left-commutative set in each state.
For a SAS+ planning task, the EC algorithm defines a state-dependent left commutative set for each state.
Consider an action and actions such that is a prefix of a path from to a goal state, we show that , where is the state after is applied to .
Let be the index set of the DTGs that form a dependency closure, as used in in (3). Since , there must exist such that . Let the state after applying to be . We see that we must have because otherwise there must exist a that changes the assignment of state variable . However, that would imply that . Since is applicable in , we see that .
If there exists a state variable such that an assignment to is in both eff and , then will point to the DTG as is a potential dependent of , forcing to be included in the dependency closure, i.e. . However, as , it will violate our assumption that . Hence, none of the precondition assignments of is added by . Therefore, since is applicable in , it is also applicable in .
On the other hand, if has a precondition assignment in a DTG that is associated with, then will point to that DTG since is a potential precondition of , forcing that DTG to be in , which contradicts the assumption that . Hence, does not alter any precondition assignment of . Therefore, since is applicable in , it is also applicable in the state .
Finally, if there exists a state variable such that an assignment to is altered by both and , then we know and . In this case, will point to since is a potential precondition of , making , which contradicts our assumption. Hence, eff and eff correspond to assignments to distinct sets of state variables. Therefore, applying and to will lead to the same state.
From the above, we see that is applicable in , is applicable in , and hence is applicable in . Further we see that leads to the same state as does when applied to . We conclude that and satisfies L1’.
Moreover, for any goal-related DTG , if in a state , its assignment is not the goal state in , then some actions associated with have to be executed in any solution path from . Since includes all the actions in at least one goal-related DTG , any solution path must contain at least one action in . Therefore, also satisfies A2 and it is indeed a state-dependent left commutative set. \endproof
For any SAS+ planning task, the EC algorithm defines a stubborn set in each state.
4.2 Explanation of SP
The stratified planning (SP) algorithm exploits commutativity of actions directly [Chen et al. (2009)]. To describe the SP algorithm, we need the following definitions first.
Given a SAS+ planning task with state variable set , the causal graph () is a directed graph with as the vertex set. There is an edge if and only if and there exists an action such that and or eff.
For a SAS+ task , a stratification of the causal graph as is a partition of the node set : in such a way that there exists no edge where and .
By stratification, each state variable is assigned a level , where if . Subsequently, each action is assigned a level , . is the level of the state variable(s) in eff. Note that all state variables in the same eff must be in the same level, hence, our is well-defined.
Definition 26 (Follow-up Action)
For a SAS+ task , an action is a follow-up action of (denoted as ) if or .
The SP algorithm can be combined with standard search algorithms, such as breadth-first search, depth-first search, and best-first search (including A*). During the search, for each state that is going to be expanded, the SP algorithm examines the action that leads to . Then, for each applicable action in state , SP makes the following decisions.
Definition 27 (Stratified Planning Algorithm)
For a SAS+ planning task, in any non-initial state , assuming is the action that leads directly to , and is an applicable action in , then SP does not expand if and is not a follow-up action of . Otherwise, SP expands . In the initial state , SP expands all applicable actions.
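The expansion rule can be sketched as a predicate over the previous action and a candidate action. Two details are assumptions on our part, inferred from the surrounding proofs rather than stated in this text: the candidate is skipped when its level is strictly higher than the level of the previous action and it is not a follow-up, and an action counts as a follow-up when its preconditions or effects mention a variable assigned by the previous action. The names `action_level`, `is_follow_up`, `sp_expands`, and the example task are hypothetical.

```python
from typing import Dict, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: Dict[str, int]
    eff: Dict[str, int]


def action_level(a: Action, var_level: Dict[str, int]) -> int:
    """Level of the variables in eff(a); they must all share one level."""
    levels = {var_level[v] for v in a.eff}
    if len(levels) != 1:
        raise ValueError("effects of %s span several levels" % a.name)
    return levels.pop()


def is_follow_up(prev: Action, b: Action) -> bool:
    """Assumed reading: b mentions a variable that prev's effects assign."""
    return bool(set(prev.eff) & (set(b.pre) | set(b.eff)))


def sp_expands(prev: Optional[Action], b: Action, var_level: Dict[str, int]) -> bool:
    """Whether b is expanded in a state reached by prev (None for the initial state)."""
    if prev is None:
        return True  # all applicable actions are expanded in the initial state
    higher = action_level(b, var_level) > action_level(prev, var_level)  # assumed direction
    return not (higher and not is_follow_up(prev, b))


# Hypothetical stratification: x on level 1, y on level 2.
LEVEL = {"x": 1, "y": 2}
set_x = Action("set_x", {}, {"x": 1})
set_y = Action("set_y", {}, {"y": 1})
use_x = Action("use_x", {"x": 1}, {"y": 2})
```

Under these assumptions, of the two interchangeable orderings of `set_x` and `set_y`, only the one that applies `set_y` first is explored, while `use_x` is still expanded after `set_x` because it reads the variable that `set_x` assigns.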
The following result shows the relationship between the SP algorithm and our new POR theory.
If an action is not SP-expandable after , and state is the state before action , then .
Since is not SP-expandable after , following the SP algorithm, we have and is not a follow-up action of . According to Definition 26, we have . These imply that eff and are conflict-free, and that eff and are conflict-free. Also, since is applicable in and eff and are conflict-free, must be applicable in (Otherwise eff must change the value of at least one variable in , which means eff and are not conflict-free).
Now we prove that and eff are conflict-free by showing . If their intersection is non-empty, we assume a state variable is assigned by both and eff. By the definition of stratification, is in layer . However, since is assigned by , there must be an edge from layer to layer since . In this case, we know that from the definition of stratification. Nevertheless, this contradicts with the assumption that . Thus, , and and eff are conflict-free.
With all three conflict-free pairs, we have according to Theorem 2. ∎
Although SP reduces the search space by avoiding the expansion of certain actions, it is in fact not a stubborn set based reduction algorithm. We have the following theorem for the SP algorithm.
For a SAS+ planning task , a valid path is an SP-path if and only if is a path in the search space of the SP algorithm applied to S.
For a SAS+ planning task , for any initial and any valid path from , there exists a path from such that is an SP-path, and both and lead to the same state from , and is a permutation of actions in .
We prove by induction on the number of actions.
When , since there is no action before , any valid path will also be a valid path in the search space of the SP algorithm.
Now we assume this proposition is true of for and prove the case when . For a valid path , by our induction hypothesis, we can rearrange the first actions to obtain a path .
Now we consider a new path . There are two cases. First, if , or and is a follow-up action of , then is already an SP-path. Otherwise, we have and is not a follow-up action of . In this case, by Lemma 2, path is also a valid path that leads to the same state as does.
By the induction hypothesis, if is still not an SP-path, we can rearrange the first actions in to get a new path . Otherwise we let . Comparing and , we know , namely, the level value of the last action in is strictly larger than that in . We can repeat the above process to generate as long as is not an SP-path. Our transformation from to also ensures that every is a valid path from and leads to the same state that does.
Since the level value of the last action in each is monotonically decreasing as increases, the process must stop after a finite number of iterations. Suppose it stops at ; then we must have or and is a follow-up action of . Hence, is an SP-path. We then assign to , and the induction step is proved. ∎
Theorem 6 shows that the SP algorithm cannot reduce the number of states expanded in the search space. The reason is as follows: for any state in the original search space that is reachable from the initial state via a path , there is still an SP-path that reaches . Therefore, every reachable state in the search space is still reachable by the SP algorithm. In other words, SP reduces the number of generated states, but not the number of expanded states.
SP is not a stubborn set based reduction algorithm. This can be illustrated by the following example.
Consider a SAS+ planning task that contains two state variables and , where both and have domain , with the initial state as and the goal as . Actions and are two actions in where is and eff is and is and eff is . It is easy to see that and are not follow-up actions of each other, and that will be in different layers after stratification. Without loss of generality, we can assume . Therefore, action will not be expanded after action in state . However, is the goal. Not expanding in state violates condition in Definition 11, which requires that any valid path from to a goal state contain at least one action in the expansion set of .
We can also see in the above example that the search space explored by SP contains four states, namely, the initial state , , and the goal state. Meanwhile, under the EC algorithm, in state , the DTGs for and are not in each other’s dependency closures. This implies that in , EC expands either action or , but not both. Therefore, EC expands three states while SP expands four. This illustrates our conclusion in Theorem 6 that the SP algorithm cannot reduce the number of expanded states.
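The two-variable example can be replayed with a small script. This is only a sketch under stated assumptions: the concrete values (both variables go from 0 to 1), the level assignment, the reading of "follow-up" as sharing a variable with the effects of the preceding action, and the direction of the level comparison are all our own choices for illustration. The names `ACTIONS`, `sp_allows` and `reachable_states` are hypothetical. Because the example is symmetric in the two actions, the counts do not depend on which action is placed at the higher level.

```python
from collections import deque

# Hypothetical encoding: (name, pre, eff, level). All values are assumed.
ACTIONS = [
    ("a", {"x1": 0}, {"x1": 1}, 2),
    ("b", {"x2": 0}, {"x2": 1}, 1),
]
INIT = {"x1": 0, "x2": 0}


def applicable(state, pre):
    return all(state[var] == val for var, val in pre.items())


def is_follow_up(prev, nxt):
    # Assumed reading: eff(prev) shares a variable with pre(nxt) or eff(nxt).
    return bool(set(prev[2]) & (set(nxt[1]) | set(nxt[2])))


def sp_allows(prev, nxt):
    # Assumed direction: prune nxt if its level is lower and it is not a follow-up.
    if prev is None:  # initial state: expand everything
        return True
    return not (nxt[3] < prev[3] and not is_follow_up(prev, nxt))


def reachable_states(use_sp):
    """Breadth-first search; returns (distinct states reached, successors generated)."""
    start = tuple(sorted(INIT.items()))
    seen_states = {start}
    seen_nodes = {(start, None)}  # a search node is (state, last action name)
    queue = deque([(INIT, None)])
    generated = 0
    while queue:
        state, prev = queue.popleft()
        for act in ACTIONS:
            if not applicable(state, act[1]):
                continue
            if use_sp and not sp_allows(prev, act):
                continue
            generated += 1
            succ = {**state, **act[2]}
            key = tuple(sorted(succ.items()))
            seen_states.add(key)
            if (key, act[0]) not in seen_nodes:
                seen_nodes.add((key, act[0]))
                queue.append((succ, act))
    return len(seen_states), generated
```

Under these assumptions, the pruned search still reaches all four states but generates one successor fewer than the unpruned search, which is the behaviour described for SP above.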
5 A New POR Algorithm Framework for Planning
We have developed a POR theory for planning and explained two previous POR algorithms using the theory. Now, based on the theory, we propose a new POR algorithm which is stronger than the previous EC algorithm.
Our theory shows in Theorem 3 that the condition for enabling POR reduction is strongly related to left commutativity of actions. In fact, constructing a stubborn set can be reduced to finding a left commutativity set. As we show in Theorem 5, the EC algorithm follows this idea. However, the basic unit of reduction in EC is the DTG (i.e., either all actions in a DTG are expanded or none of them are), which is not necessary according to our theory. Based on this insight, we propose a new algorithm that operates at the granularity of actions instead of DTGs.
For a state , an action set is a landmark action set if and only if any valid path starting from to a goal state contains at least one action in .
For a SAS+ task, an action is supported by an action if and only if .
For a state , its action support graph (ASG) at is defined as a directed graph in which each vertex is an action, and there is an edge from to if and only if is not applicable in and is supported by .
The above definition of ASG is a direct extension of the definition of a causal graph. Instead of having domains as basic units, here we directly use actions as basic units.
For an action and a state , the action core of at , denoted by , is the set of actions that are in the transitive closure of in . The action core for a given set of actions is the union of action cores of every action in .
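The action core can be computed by a simple graph traversal. The sketch below makes two assumptions that are not legible in the definitions above: an action is taken to be supported by another action when one of the latter's effects is one of its preconditions, and ASG edges are taken to point from a non-applicable action to each action that supports it (the direction the proof of the following lemma suggests). The dictionary encoding and the names `supports` and `action_core` are hypothetical.

```python
def applicable(state, action):
    return all(state.get(var) == val for var, val in action["pre"].items())


def supports(supporter, action):
    # Assumed reading: some effect of `supporter` is a precondition of `action`.
    return any(action["pre"].get(var) == val
               for var, val in supporter["eff"].items())


def action_core(name, state, actions):
    """Actions reachable from `name` by following assumed ASG edges at `state`."""
    core, stack = set(), [name]
    while stack:
        current = actions[stack.pop()]
        if applicable(state, current):
            continue  # assumed: applicable actions have no outgoing edges
        for other_name, other in actions.items():
            if other_name not in core and supports(other, current):
                core.add(other_name)
                stack.append(other_name)
    return core


# Toy task, invented for illustration only.
STATE = {"key": 0, "door": 0, "room": 0, "waved": 0}
ACTIONS = {
    "get_key": {"pre": {}, "eff": {"key": 1}},
    "open_door": {"pre": {"key": 1}, "eff": {"door": 1}},
    "enter": {"pre": {"door": 1}, "eff": {"room": 1}},
    "wave": {"pre": {}, "eff": {"waved": 1}},
}
```

In this toy task the core of `enter` consists of `open_door` and `get_key`, while the unrelated action `wave` stays outside it.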
For a state , if an action is not applicable in and there is a valid path starting from whose last action is , then contains an action .
We prove this by induction on the length of .
In the base case where , we assume . Since is not applicable in , it must be supported by . Thus, . Suppose the lemma holds for ; we prove the case for . For a valid path , again there exists an action before that supports . If is applicable in , then . Otherwise, we have a path with . Thus, by the induction assumption, contains at least one action in , which is a subset of , according to Definitions 31 and 32. ∎
Given a SAS+ planning task with as the set of all actions, for a state and a set of actions , the action closure of action set at , denoted by , is a subset of and a superset of such that for any applicable action at and any action , and are conflict-free. In addition, if , and are conflict-free.
Intuitively, actions in can be executed without affecting the completeness and optimality of search. Specifically, because any applicable action in and any action not in will not assign different values to the same state variable, for action and action at , path will lead to the same state that does. Additionally, because and are conflict-free when , executing action will not affect the applicability of action in the future. Therefore, actions in can be safely expanded first during the search, while actions outside it can be expanded later.
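The commutation argument can be checked on a toy pair of actions. The sketch below takes two assignment sets to be conflict-free when they never assign different values to the same variable, as the paragraph above describes. The dictionary encoding, the example actions, and the helper names `conflict_free` and `apply_action` are our own assumptions for illustration.

```python
def conflict_free(first, second):
    """True if no variable is given different values by the two assignment sets."""
    return all(second[var] == val for var, val in first.items() if var in second)


def apply_action(state, action):
    if any(state[var] != val for var, val in action["pre"].items()):
        raise ValueError("action is not applicable")
    return {**state, **action["eff"]}


# Toy actions on two independent variables (assumed values).
A = {"pre": {"x": 0}, "eff": {"x": 1}}
B = {"pre": {"y": 0}, "eff": {"y": 1}}
S = {"x": 0, "y": 0}

# The three pairs that must be conflict-free for the two orders to agree.
pairwise_ok = (
    conflict_free(A["eff"], B["eff"])
    and conflict_free(A["pre"], B["eff"])
    and conflict_free(B["pre"], A["eff"])
)
state_ab = apply_action(apply_action(S, A), B)
state_ba = apply_action(apply_action(S, B), A)
```

Here all three pairs are conflict-free, both orders are executable, and they end in the same state.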
A simple procedure, shown in Algorithm 1, can be used to find the action closure for a given action set .
The proposed POR algorithm, called stubborn action core (SAC), works as follows. At any given state , the expansion set of state is determined by Algorithm 2.
There are various ways to find a landmark action set for a given state. Here we give one example that is used in our current implementation. To find a landmark action set at , we utilize the DTGs associated with the SAS+ formalism. We first find a transition set that includes all possible transitions in an unachieved goal-related DTG where is the current state of in . It is easy to see that all actions that mark transitions in this set make up a landmark action set, because is unachieved and at least one action starting from has to be performed in any solution plan.
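The DTG-based construction just described can be sketched as follows. This is not the authors' implementation. We assume that a transition "starting from" the current value corresponds to an action that changes the chosen variable away from that value and whose precondition on the variable, if any, equals the current value. Choosing the first unachieved goal variable is an arbitrary simplification, and the function name and encoding are hypothetical.

```python
def landmark_action_set(state, goal, actions):
    """Actions that move some unachieved goal variable away from its current value."""
    for var, target in goal.items():
        if state[var] == target:
            continue  # this goal is already achieved
        current = state[var]
        return {
            name
            for name, act in actions.items()
            if var in act["eff"]
            and act["eff"][var] != current
            # assumed: no precondition on var means the action may start anywhere
            and act["pre"].get(var, current) == current
        }
    return set()  # every goal variable already has its goal value


# Toy task, invented for illustration only.
STATE = {"x": 0, "y": 0}
GOAL = {"x": 2, "y": 0}
ACTIONS = {
    "inc01": {"pre": {"x": 0}, "eff": {"x": 1}},
    "inc12": {"pre": {"x": 1}, "eff": {"x": 2}},
    "jump": {"pre": {}, "eff": {"x": 2}},
    "flip_y": {"pre": {"y": 0}, "eff": {"y": 1}},
}
```

In the toy task only `inc01` and `jump` leave the current value of `x`, so any plan that reaches the goal must contain at least one of them.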
There are also other ways to find a landmark action set. For instance, the pre-processor in the LAMA planner [Richter et al. (2008)] can be used to find landmark facts, and all actions that lead to these landmark facts also make up a landmark action set.
For a state , the expansion set defined by the SAC algorithm is a stubborn set at .
We first prove that our expansion set satisfies condition in Definition 11, namely, for any action , and actions , if is a valid path from , then is also a valid path, and leads to the same state that does.
To simplify this proof, we can treat action sequence as a “macro” action where an assignment in if and only if is in the precondition of some and is not in the effects of a previous action , and an assignment is in if and only if is in the effect set of some , and is not assigned to any value other than in the effects of later action . In the following proof, we use the macro action in place of the path .
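The macro construction can be written out directly. The sketch below follows the two rules just stated, assuming the action sequence is valid (each action is applicable when it is reached). The dictionary encoding and the function name `compose_macro` are hypothetical.

```python
def compose_macro(actions):
    """Fold a valid action sequence into a single macro action."""
    pre, eff = {}, {}
    for act in actions:
        for var, val in act["pre"].items():
            # Keep a precondition only if no earlier action already established it.
            if eff.get(var) != val or var not in eff:
                pre.setdefault(var, val)
        # Later effects overwrite earlier ones on the same variable.
        eff.update(act["eff"])
    return {"pre": pre, "eff": eff}


def apply_action(state, action):
    if any(state[var] != val for var, val in action["pre"].items()):
        raise ValueError("action is not applicable")
    return {**state, **action["eff"]}


# Toy sequence, invented for illustration only.
B1 = {"pre": {"x": 0}, "eff": {"x": 1}}
B2 = {"pre": {"x": 1, "y": 0}, "eff": {"x": 2, "y": 1}}
MACRO = compose_macro([B1, B2])
START = {"x": 0, "y": 0}
```

In this toy sequence the precondition `x = 1` of the second action is supplied by the first action, so it does not appear in the macro's precondition, and the macro's effect on `x` is the last value assigned.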
To prove , we only need to prove that if is a valid path, then . According to Theorem 4, if and only if the following four propositions are true.
a) Action must be applicable in . We prove this by contradiction. Let , if is not applicable in , but applicable in , then supports . Since all effects of are from actions in the path , there exists an action such that supports . However, according to Definition 32, is in the transitive closure of in . According to our algorithm,