# Goal-constrained planning domain model verification of safety properties

## Abstract

The verification of planning domain models is crucial to ensure the safety, integrity and correctness of planning-based automated systems. This task is usually performed using model checking techniques. However, unconstrained application of model checkers to verify planning domain models can result in false positives, i.e. counterexamples that are unreachable by a sound planner when using the domain under verification during a planning task. In this paper, we discuss the downside of unconstrained planning domain model verification. We then introduce the notion of a valid planning counterexample, and demonstrate how model checkers, as well as state trajectory constraints planning techniques, should be used to verify planning domain models so that invalid planning counterexamples are not returned.


## 1 Introduction

Planning and task scheduling techniques are increasingly applied to real world problems such as activity sequencing, constraint solving and resource management. These processes are implemented in planning-based automated systems which are already used in space missions [24, 6, 1], search and rescue [20], logistics [30] and many other domains. Since the failure of such systems could have catastrophic consequences, these applications are regarded as safety-critical. Therefore, verification methods that are robust, trustworthy and systematic are crucial to gain confidence in the safety, integrity and correctness of these systems.

The literature is rich with studies on verification of planning systems. For instance, Smith et al. [28] carried out scenario-based testing and model-based validation of the remote agent that controlled the Deep Space 1 mission. Another example is the verification of the safety of the autonomous science agent design that was deployed on the Earth Orbiter 1 spacecraft [7].

A typical planning system consists of a planning domain model, planning problem, planner, plan, executive, and monitor. Planners take as input a domain model, which describes application-specific states and actions, and a problem that specifies the planning goal and the initial state. From these inputs, a sequence of actions that can achieve the goal starting from the initial state is returned as a plan. The plan is then executed by an executive to change the world state so that it matches the desired goal.

Our research focuses on the verification of planning domain models wrt. safety properties. Domain models provide the foundations for planning. They describe real-world actions by capturing their pre-conditions and effects. Due to modelling errors, a domain model might be inconsistent, incomplete, or inaccurate [4]. This could cause the planner to fail in finding a plan or to generate unrealistic plans that will fail to execute in the real world. Moreover, erroneous domain models could lead planners to produce unsafe plans that, when executed, could have catastrophic consequences in the real world.

This paper addresses the fact that the state-of-the-art verification methods for planning domain models are vulnerable to false positive counterexamples. In particular, unconstrained verification tasks might return counterexamples that are unreachable by planners when using the domain under verification (DUV) during a planning task. Such counterexamples can mislead designers to unnecessarily restrict domain models, thereby potentially blocking valid and possibly necessary behaviours. In addition, false positive counterexamples can lead verification engineers to overlook counterexamples that are reachable by planners. According to the Electronic Engineering Times, a leading technological news website in the electronics industry:

> "When a design is under-constrained, illegal inputs can lead to the formal tool exploring illegal design states. The tool may report false bugs, resulting in the verification team spending time pursuing 'wild-goose chases'. Under-constrained designs can also lead to a false sense of achieving the desired coverage." [13]

This is a well studied problem in the Verification and Validation (V&V) community. For instance, Nguyen et al. [25] mentioned that false counterexamples can be avoided by constraining the property with reachability information. However, the literature, e.g. Smith et al. [29], suggests that this aspect has been overlooked in the context of planning domain model verification. We discuss this further in Section 3. To address this oversight, we propose to employ planning goals as constraints during verification.

Thus, we introduce goal-constrained planning domain model verification, a concept transferred from V&V research that eliminates invalid planning counterexamples per se. We formally prove that goal-constrained planning domain model verification of safety properties is guaranteed to return only valid planning counterexamples, if and only if any exist. We also demonstrate two different ways to perform goal-constrained planning domain model verification, one using model checkers and the other using state trajectory constraints planning techniques. We illustrate our method using the Cave Diving planning domain model as an example [21]. Additionally, we perform empirical experiments to demonstrate the feasibility and investigate the behaviour of our approach using the Spin model checker [19] and the MIPS-XXL-IPC5 planner [11]. To the best of our knowledge, this work is the first to introduce the concept of goal-constrained planning domain model verification.

The rest of this paper is organised as follows. Section 2 contrasts the concepts presented here with related work. Section 3 informally discusses the problem of invalid planning counterexamples in planning domain model verification. A verification concept of planning domain models that avoids returning invalid planning counterexamples is presented in Section 4. Section 5 discusses the application of this concept on our example domain. Section 6 reports and discusses the experimental results. Finally, Section 7 concludes the paper and suggests future work.

## 2 Related Work

Closely related, but different, is the work by Albarghouthi et al. [2]. Their main objective is to treat verification as a planning task, whereas our aim is to demonstrate how model checkers and planners can be used for domain model verification. They proposed to perform system model verification using classical planners. To do this, they first translated the model of the system under verification into a planning domain model. Then, the negation of the safety property to be established is used as the goal for the planner, which is then consulted to find a plan that acts as a counterexample for the given property. In our study, because our aim is to verify domain models against a given safety property with respect to a specific goal, we use state trajectory constraints to restrict counterexamples to plans that achieve the planning goal while falsifying the safety property. In their work, the negation of the safety property is used as the goal, whereas in our method the negation of the safety property is represented as a state trajectory constraint and the goal is the given planning goal.

Raimondi et al. [27] also applied verification as planning to verify planning domain models, starting from LTL specifications [10]. This work fundamentally differs from our work. They tested the impact of individual atomic propositions on the validity of the overall verified property by translating the specification properties into trap formulas. However, their method does not consider the interaction between property testing and the original planning goal. Note that finding a planning constraint to exercise a specific atomic proposition is not enough to ensure the constraint itself would be exercised during the planning process. For example, a planning goal might be achieved through a state trajectory that does not exercise the hard constraint used to represent the tested property. Our work is mainly focused on investigating this interaction. Therefore, we use state trajectory constraints to guarantee the property is tested while achieving the planning goal.

Goldman et al. [17] also used classical planners for planning systems verification, but they examined verifying plans rather than domain models. They proposed an approach that uses classical planners to find counterexamples for a given planning problem and plan instance. Their work and ours are related in that both suggest performing planning verification for a specific planning problem rather than attempting unconstrained verification of a planning system. However, their work is limited to the verification of single plan instances, whereas our method verifies all potential plans that can be spun from a domain model for a specific goal.

Among others, the researchers in [26, 22, 29, 18, 5] used model checkers to verify planning domain models. They translated the respective domain models into the input language of the selected model checker. The model checker is then applied to verify the domain model wrt. a given specification property. Similarly, we also propose a method to verify domain models using model checkers. However, our method differs from the others in two aspects. First, in the way we define the planning domain model verification problem, and, second, in the way we use model checkers to perform verification. As explained in Section 4, we constrain the verification of planning domain models with a specific goal. In contrast, previous studies perform unconstrained verification of domain models, i.e. they leave the goal open. As discussed in Section 3, the unconstrained goal may cause the model checker to return counterexamples that are unreachable when a planner uses the DUV. On the other hand, when the goal is constrained for verification, then we show that the returned counterexamples, if any, are guaranteed to be reachable by any sound planner. The second difference is that, after the planning domain model is translated to the model checker’s input language, we augment the model transitions, introducing the negation of the goal as a new constraint, thereby forcing the model checker to terminate once the goal is reached. This modification prevents the model checker from returning counterexamples that falsify the given property after satisfying the goal; these are unreachable by planners.

## 3 Invalid counterexamples in planning domain model verification

Planning domain model verification aims to demonstrate that any produced plan satisfies a set of properties. To achieve this, formal planning domain model verification methods leave the planning goal open. We define this as unconstrained verification of planning domain models, i.e. the verification is expected to hold for any potential goal. Unconstrained verification searches the domain model for a sequence of actions that can falsify the given property, regardless of any other conditions. In particular, it does not take into account whether or not a planner would consider this sequence to be a plan. This is a critical oversight because, when the domain model is used to solve a specific planning problem, the sequence of actions that constitutes such a counterexample might in fact be "pruned away" by the planner if it does not satisfy the planning goal. Therefore, for a specific planning problem, counterexamples that do not achieve the planning goal are unreachable from the planner's perspective. Hence, we consider them to be invalid planning counterexamples.

To illustrate this, we use a modified version of the microwave oven example, introduced in [10], as presented in Figure 1. A safety requirement would be that the domain model does not allow the generation of erroneous plans, in LTL , where is the LTL globally operator. Unconstrained verification will return as a counterexample that when applied to will produce which is an error state. However, when this model is used to find a plan that achieves the goal , this sequence will not be considered by the planner as it does not lead to a state that achieves the goal. Moreover, we observe that the valid plan does satisfy the property , i.e. is error-free. Thus, the sequence from to is an invalid planning counterexample; it does not achieve the goal, nor is it part of a valid plan towards the goal.

Counterexamples that are unreachable by planners, while searching for specific goals, exist in the literature. For example, Smith et al. [29] used the Spin model checker to verify whether a planning domain model would permit an automated planning system to select plans that would waste resources and therefore not meet the mission’s science goals. To express this requirement, they used “five data-producing activities must be scheduled by any returned plan” as a property for model checking. The automated system has two data-producing and two data-consuming activities, and a buffer that can hold four data blocks. The goal of the planner is to schedule five data-producing activity instances. The counterexample returned by the model checker represented a plan with the two data-consuming activities scheduled before four data-producing activities. This plan did not contain a fifth data-producing task, because the data buffer was full after four data-producing activities and the only two data-consuming tasks that would have freed space in the buffer, were scheduled at the beginning of the plan with no data in the buffer. Though the model checker found a counterexample to falsify the property, we argue that any sound planner would not generate such a plan, because it does not achieve the planning goal. As such, this counterexample would have been pruned during the planner’s goal search, and consequently, it would never have been returned as a plan, i.e. it is unreachable for the planner, yet reachable by a goal-ignorant model checker. For this reason it constitutes an invalid planning counterexample.

The problem with invalid planning counterexamples is that they mislead the designer to unnecessarily restrict the domain model in the process of removing them. Consequently, debugging is made harder and genuine counterexamples could potentially be introduced in the process. To overcome this, we observe that planning is performed for a specific goal. To exploit this observation for domain model verification, we propose to use the goal given to the planner as a constraint to ensure that the counterexamples returned by a model checker, or other tools used in this context, falsify the given safety property while also achieving the planning goal. Thus, instead of performing unconstrained domain model verification, we propose goal-constrained verification of planning domain models. The details of this method are further explained in the next section.
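The distinction between a counterexample found by unconstrained search and one that a planner could actually return can be illustrated with a minimal sketch. The four-state transition system below is hypothetical (it is not the model of Figure 1); the state names, the action names and the helper `shortest_path` are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical toy domain: state -> {action: successor}.
# "error" is a dead end, so no plan that reaches "goal" passes through it.
TRANSITIONS = {
    "init": {"risky": "error", "prepare": "ready"},
    "ready": {"finish": "goal"},
    "error": {},
    "goal": {},
}


def shortest_path(start, target):
    """Breadth-first search; returns a list of actions or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == target:
            return actions
        for action, successor in TRANSITIONS[state].items():
            if successor not in visited:
                visited.add(successor)
                queue.append((successor, actions + [action]))
    return None


# Unconstrained verification: any path to an error state is reported.
unconstrained_cex = shortest_path("init", "error")

# A sound planner only returns sequences that end in the goal.
plan = shortest_path("init", "goal")

# The reported counterexample cannot be extended to a plan, so it is
# an invalid planning counterexample for this goal.
extends_to_goal = shortest_path("error", "goal") is not None

print(unconstrained_cex, plan, extends_to_goal)
```

In this toy model the unconstrained search reports the sequence leading to the error state, although the only plan for the goal never visits that state.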

## 4 Goal-constrained verification of planning domain models

Planning domain model verification covers different objectives, including the domain’s correctness, completeness, robustness, effectiveness and safety. The intent of safety verification in this context is to verify that any plan produced from the DUV will satisfy a given safety property. In other words, a domain is considered safe if the domain is guaranteed only to produce plans that satisfy the given safety property when used by a sound planner. This verification task can be performed using advanced search algorithms, such as model checkers or classical planners, to find a valid counterexample for the given safety property.

We define a valid planning counterexample to be a sequence of actions that, firstly, can falsify the given safety property, secondly, can achieve the planning goal from the given initial state, and, thirdly, none of the sub-sequences of the counterexample can achieve the goal.

Formally, this is defined as follows: Let the planning problem, $P$, be a tuple, $P = \langle D, s_0, g \rangle$, where $D$ is the domain model that describes the set of all available actions, $A$, $s_0$ is the initial state and $g$ is the desired goal. The plan $\pi$ is a solution to the planning problem $P$, defined as a sequence of actions, $\pi = \langle a_1, a_2, \dots, a_n \rangle$. These actions are chosen from $A$, $a_i \in A$, such that $\pi(s_0) \models g$. In other words, when $\pi$ is applied to the initial state $s_0$ it yields a sequence of states $S_\pi = \langle s_0, s_1, \dots, s_n \rangle$, where the last state satisfies the planning goal $g$, $s_n \models g$. We say a plan $\pi$ satisfies a property $p$, $\pi \models p$, if the sequence of states $S_\pi$, generated by the plan $\pi$, satisfies the property $p$, $S_\pi \models p$.

Furthermore, as defined in [16], we call a plan $\pi$ a redundant plan, if $\pi$ contains a subsequence, $\pi'$, $\pi' \subset \pi$, that achieves the goal $g$.

Definition 1: A valid planning counterexample for a safety property, $p$, of a planning problem $P$ is a non-redundant plan, $\pi$, that falsifies the safety property, $\pi \not\models p$.

Plans are required to be non-redundant in the definition of valid planning counterexamples to exclude any plans that are enriched with action sequences which are unnecessary to achieve the planning goal but required to falsify the given safety property. Since sound planners can produce valid plans that have redundant subsequences, the scope of our method is limited to non-redundant planners i.e. planners that are guaranteed to produce non-redundant plans.
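The three conditions of Def. 1 can be checked mechanically for a candidate action sequence. The following is a minimal sketch under stated assumptions: the safety property is taken to be an invariant ("every visited state is safe"), a subsequence is read as a proper, order-preserving subsequence that is executable from the initial state, and all function names (`run`, `is_plan`, `is_redundant`, `is_valid_counterexample`) as well as the integer counter domain are hypothetical. The redundancy check enumerates all subsequences, so it is only practical for short sequences.

```python
from itertools import combinations


def run(actions, state, apply_action):
    """Apply actions in order; return visited states, or None if one is inapplicable."""
    states = [state]
    for action in actions:
        state = apply_action(state, action)
        if state is None:
            return None
        states.append(state)
    return states


def is_plan(actions, initial, goal, apply_action):
    states = run(actions, initial, apply_action)
    return states is not None and goal(states[-1])


def is_redundant(actions, initial, goal, apply_action):
    """True if some proper, order-preserving subsequence is itself a plan."""
    n = len(actions)
    for k in range(n):  # subsequence lengths 0 .. n-1
        for keep in combinations(range(n), k):
            if is_plan([actions[i] for i in keep], initial, goal, apply_action):
                return True
    return False


def is_valid_counterexample(actions, initial, goal, safe, apply_action):
    """Def. 1, assuming an invariant-style property 'every state is safe'."""
    states = run(actions, initial, apply_action)
    if states is None or not goal(states[-1]):
        return False  # not a plan
    if all(safe(s) for s in states):
        return False  # does not falsify the property
    return not is_redundant(actions, initial, goal, apply_action)


# Hypothetical counter domain: the state is an integer, the goal is the value 2.
def step(state, action):
    return state + 1 if action == "inc" else state - 1


def goal(state):
    return state == 2


direct = ["inc", "inc"]                # visits 0, 1, 2
detour = ["dec", "inc", "inc", "inc"]  # visits 0, -1, 0, 1, 2

valid = is_valid_counterexample(direct, 0, goal, lambda s: s != 1, step)
padded = is_valid_counterexample(detour, 0, goal, lambda s: s != -1, step)
print(valid, padded)
```

The detour reaches the unsafe value -1 only through actions that are unnecessary for the goal, so it is rejected as redundant, whereas the direct sequence cannot avoid the unsafe value 1 and is accepted.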

To ensure the returned counterexamples are valid, we constrain the verification problem with a goal, and we exclude any counterexample that is a redundant plan. More formally, the verification problem associated with planning task $P$ is defined as the tuple $V = \langle D, s_0, g, p \rangle$, where $p$ is a formal safety property extracted from a given specification and required to hold over all valid paths that achieve the goal $g$ from the initial state $s_0$.

It is important to highlight that although constraining the domain model verification with planning goals limits the verification results to planning problem instances, this is a necessary measure to obtain a verification method that is robust against invalid planning counterexamples. We note that none of the current planning domain model verification methods verify planning domain models in their generality. Firstly, current methods require a grounded model, which represents a finite set of planning problems, in contrast to the infinite set of planning problems that non-ground domain models represent. Secondly, all methods need a specific initial state to be able to perform the verification tasks. In our approach, by using the planning goal as one further constraint, we perform verification of single planning instances. This restriction is the cost associated with delivering a verification method that is robust against invalid planning counterexamples.

In this section, we introduced and formally defined the concept of goal-constrained verification of planning domain models. In the following subsections, we demonstrate how this concept can be realised using model checkers and state trajectory constraints planning techniques.

### 4.1 Goal-constrained planning domain model verification using model checkers

Model checkers verify safety properties by searching for counterexamples that falsify those properties. In the case of planning applications, any sequence of actions that does not achieve the given goal will be pruned by any sound planner. Therefore, in the verification of planning problems, any counterexample that does not achieve the goal of the planning problem should be eliminated on the basis that this counterexample is unreachable by the planner.

To force model checkers to only return valid planning counterexamples, the safety property is first negated and then joined with the planning goal in a conjunction. This conjunction is then negated and supplied to the model checker as an input property. The final property requires the model checker to find a counterexample that both falsifies the safety property and satisfies the planning goal. Note that, unlike Def. 1, this permits sequences that falsify the property after satisfying the goal. However, once the goal is achieved, planners terminate the search, thus rendering such sequences unreachable. To eliminate these sequences, model transitions should be augmented with an additional guard, representing the negation of the goal, to restrict all transitions once the goal is achieved. With this modification, the model checker is forced to return counterexamples that falsify the safety property before achieving the goal, because once the goal is satisfied no further transitions will be permitted.

For a verification problem, $V = \langle D, s_0, g, p \rangle$, we first check whether the planning goal is achievable, then we translate the domain model $D$ into the model checker's input language, obtaining the model $M$ that incorporates the initial state $s_0$. Then, the model $M$ is modified to $M'$ by augmenting the guards of all transitions with the negation of the goal condition. From the definition of $M'$, we can derive two properties: First, P1: all plans generated from $M$ are also plans that can be generated from $M'$. Proof: any sequence of transitions from $M$ that ends with a transition that achieves the goal is also a sequence of transitions from $M'$. These sequences represent valid plans as they terminate with a transition that achieves the goal. Therefore, all plans generated from $M$ are also plans in $M'$. Second, P2: any valid counterexample for $M'$ is also a valid counterexample for $M$. Proof: as $M'$ is a more constrained version of $M$, the set of all legal transitions of $M'$, $T_{M'}$, is contained in the set of all legal transitions of $M$, $T_M$, i.e. $T_{M'} \subseteq T_M$. It follows that any valid counterexample in $M'$ is also in $M$.
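The guard augmentation can be sketched independently of any particular model checker. In the sketch below, a model is represented by a successor function that returns `(action, next_state)` pairs; this representation, the bounded counter model and the name `augment_with_goal_guard` are assumptions for illustration and do not reflect Promela syntax.

```python
def successors(state, low=0, high=3):
    """Hypothetical counter model M: one variable that can be incremented or decremented."""
    moves = []
    if state < high:
        moves.append(("inc", state + 1))
    if state > low:
        moves.append(("dec", state - 1))
    return moves


def augment_with_goal_guard(successor_fn, goal):
    """Build M' from M: every transition gets the extra guard 'not goal'."""
    def guarded(state):
        if goal(state):
            return []  # guard is false, so no transition is enabled
        return successor_fn(state)
    return guarded


def goal(state):
    return state == 2


guarded_successors = augment_with_goal_guard(successors, goal)

# The transitions of M' are a subset of those of M (the premise of P2) ...
subset_holds = all(
    set(guarded_successors(s)) <= set(successors(s)) for s in range(4)
)
# ... and no transition leaves a goal state.
print(subset_holds, guarded_successors(2))
```

Wrapping the successor function rather than editing each transition keeps the original model untouched, which mirrors the fact that the augmentation only adds guards and never adds behaviour.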

The model checker is then applied to the verification problem $\langle M', \varphi \rangle$, where $\varphi$ is defined using $\Diamond$, the LTL eventually operator [10], as follows:

$$\varphi = \neg(\neg p \wedge \Diamond g) \tag{1}$$

There are two possible outcomes. If the model checker returns a counterexample $\pi$:

$$\pi \not\models \neg(\neg p \wedge \Diamond g) \tag{2}$$

$$\pi \models \neg p \wedge \Diamond g \tag{3}$$

From the definition of the LTL eventually operator $\Diamond$, with $S_\pi$ the sequence of states generated by $\pi$:

$$\pi \models \neg p \tag{4}$$

$$\exists\, s_i \in S_\pi : s_i \models g \tag{5}$$

It follows that there is at least one sequence $\pi$ that falsifies the property $p$, and there is a state in that sequence which satisfies the goal $g$, according to (4) and (5). In addition to that, in the sequence $\pi$, $p$ is guaranteed to be falsified before $g$ is satisfied. This is because $\pi \in M'$ and $M'$ is constrained to not produce any transitions after achieving the goal. Thus, the counterexample $\pi$ is a valid planning counterexample in $M'$ for the original safety property $p$ as per Def. 1. Furthermore, from (P2), $\pi$ is also a counterexample in $M$. This proves that the DUV does not satisfy the safety property $p$ with respect to the goal $g$.

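The other potential outcome is that the model checker fails to find a counterexample. Then:

$$M' \models \neg(\neg p \wedge \Diamond g) \tag{6}$$

$$\forall \pi \in M' : \pi \models \neg(\neg p \wedge \Diamond g) \tag{7}$$

$$\forall \pi \in M' : \pi \models p \vee \neg \Diamond g \tag{8}$$

$$\forall \pi \in M' : \pi \models p \vee \Box \neg g \tag{9}$$

$$\forall \pi \in M' : \pi \models \neg \Box \neg g \rightarrow p \tag{10}$$

$$\forall \pi \in M' : \pi \models \Diamond g \rightarrow p \tag{11}$$

where $\Box$ is the LTL always operator. This means $p$ is always true for any sequence of actions in $M'$ that achieves the goal, i.e. for all possible plans. Since from (P1) all plans generated from $M$ are also plans in $M'$, and from (11) all plans in $M'$ are safe, we can conclude that all plans in $M$ are safe. This proves that the DUV satisfies the original property $p$ with respect to the goal.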
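The two outcomes above can be reproduced on a small explicit-state model. The sketch below is not Spin's algorithm; it is a plain breadth-first search, assuming an invariant-style safety property and the same hypothetical successor-function representation of a model. Goal states are never expanded, which plays the role of the "not goal" guard, and each search node carries a flag recording whether an unsafe state has been visited. The sketch only rules out traces that continue after the goal; it does not perform a general redundancy check.

```python
from collections import deque


def find_valid_counterexample(initial, successors, goal, safe):
    """Search the guarded model for a trace that violates 'always safe' and ends in a goal state.

    Returns a list of actions, or None if every goal-reaching trace is safe.
    """
    start = (initial, not safe(initial))
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        (state, violated), actions = queue.popleft()
        if goal(state):
            if violated:
                return actions  # falsifies the property and achieves the goal
            continue            # goal reached safely: this branch stops here
        for action, nxt in successors(state):
            node = (nxt, violated or not safe(nxt))
            if node not in visited:
                visited.add(node)
                queue.append((node, actions + [action]))
    return None


# Hypothetical bounded counter model with values 0..4 and goal value 2.
def successors(state):
    moves = []
    if state < 4:
        moves.append(("inc", state + 1))
    if state > 0:
        moves.append(("dec", state - 1))
    return moves


def goal(state):
    return state == 2


cex_before_goal = find_valid_counterexample(0, successors, goal, lambda s: s != 1)
cex_after_goal = find_valid_counterexample(0, successors, goal, lambda s: s != 3)
print(cex_before_goal, cex_after_goal)
```

When the unsafe value lies between the initial state and the goal, a counterexample is returned; when it can only be reached by continuing past the goal, the search reports none.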

### 4.2 Goal-constrained planning domain model verification using planning techniques

Domain models can be verified to only produce valid plans, in terms of satisfying a given safety property, for a specific goal using planners that use breadth-first search. This is achieved by consulting the planner over the DUV to produce a plan that can satisfy the goal and the negation of the property. If the domain model permits producing plans that, along with achieving the goal, contradict the safety property, then an unsafe plan can be found. Thus, the returned plan is a counterexample that demonstrates that the safety property does not hold. On the other hand, if the domain model does not permit the generation of plans that can satisfy the negation of the safety property while achieving the goal, then the planner will fail. Thus, the property holds in any plan produced for the given goal. The following subsection provides a description of how state trajectory constraints can be used to verify planning domain models for a specific goal.

#### Goal-constrained planning domain verification using planning techniques with state trajectory constraints

The PDDL3.0 state trajectory constraints [14] can be used to perform planning domain model verification. First, the negation of the given property is expressed using PDDL3.0 modal operators and embedded in the original domain model as a state trajectory constraint. The modified model is then used by a planner, as described earlier, to perform the verification.

For a verification problem, $V = \langle D, s_0, g, p \rangle$, the safety property, $p$, is negated and expressed in terms of PDDL3.0 modal operators as shown in [15]. The result is added as a state trajectory constraint to the original domain model.

Using the algorithm proposed in [11], the new model is transformed into a PDDL2 compatible version. This is performed by first translating the state trajectory constraint into a non-deterministic finite state automaton (NFA). The NFA which can capture property violations is then embodied in the model in terms of additional predicates and conditional effects. These additions observe the behaviour of the automaton that represents the constraint and stop goal satisfaction unless the constraint is satisfied too.

This yields a new planning problem, $P' = \langle D', s_0', g' \rangle$, where $D'$, $s_0'$ and $g'$ are modified instances of $D$, $s_0$ and $g$ that are supplemented with the additional predicates and conditional effects of the automaton that represents the introduced constraint. Let the set of legal sequences of actions that can be generated from $D$ be $\Pi_D$ and from $D'$ be $\Pi_{D'}$. Note that $D'$ is an augmented version of $D$ and the additions to $D'$ do not affect the number of original operators, their preconditions, or their effects. Furthermore, the additional conditional effects do not affect the original predicates. Hence, $\Pi_{D'} = \Pi_D$.

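Then, a planner is applied to $P'$ with two possible outcomes. If the planner finds a plan $\pi$ then: $\pi(s_0') \models g'$. Since the satisfaction of $g'$ implies both the satisfaction of the original goal $g$ at the last state of the sequence of states generated by $\pi$, and the satisfaction of the state trajectory constraint (the negation of the safety property) by that sequence: $(\pi(s_0) \models g) \wedge (\pi \models \neg p)$. Furthermore, since $\Pi_{D'} = \Pi_D$:

$$\exists\, \pi \in \Pi_D : (\pi(s_0) \models g) \wedge (\pi \models \neg p) \tag{12}$$

Furthermore, from (12) it follows that $\pi \not\models p$, confirming that there is at least one sequence of actions from $D$ that achieves the goal while not respecting the safety property. Therefore, this sequence is a valid planning counterexample for that property as per Def. 1. Hence, the DUV does not satisfy the property wrt. the planning goal. Alternatively, if the planner fails to find a plan, then, as opposed to (12), we have:

$$\neg \exists\, \pi \in \Pi_{D'} : \pi(s_0') \models g' \tag{13}$$

$$\neg \exists\, \pi \in \Pi_D : (\pi(s_0) \models g) \wedge (\pi \models \neg p) \tag{14}$$

$$\forall \pi \in \Pi_D : \neg\big((\pi(s_0) \models g) \wedge (\pi \models \neg p)\big) \tag{15}$$

$$\forall \pi \in \Pi_D : (\pi(s_0) \not\models g) \vee (\pi \models p) \tag{16}$$

$$\forall \pi \in \Pi_D : (\pi(s_0) \models g) \rightarrow (\pi \models p) \tag{17}$$

Hence, any sequence of actions from $D$ that achieves the goal also satisfies the safety property. Therefore, the property holds for the planning domain model wrt. the given goal.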
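The compilation idea can be sketched without PDDL. The code below is not the algorithm of [11]; it is a simplified stand-in that assumes an invariant-style safety property, so that the automaton for its negation reduces to a single monitor bit. The function names `compile_constraint` and `bfs_plan`, the tuple representation of augmented states and the counter domain are all hypothetical. Whether a returned plan is non-redundant in the sense of Def. 1 depends on the planner; the sketch does not check it.

```python
from collections import deque


def compile_constraint(successors, goal, safe):
    """Augment states with a monitor bit and strengthen the goal accordingly."""
    def successors_aug(aug_state):
        state, violated = aug_state
        # Conditional effect: the bit is set once an unsafe state is entered.
        return [(a, (s, violated or not safe(s))) for a, s in successors(state)]

    def goal_aug(aug_state):
        state, violated = aug_state
        return goal(state) and violated  # original goal plus satisfied constraint

    return successors_aug, goal_aug


def bfs_plan(initial, successors, goal):
    """Plain breadth-first planner; it knows nothing about the safety property."""
    queue = deque([(initial, [])])
    visited = {initial}
    while queue:
        state, actions = queue.popleft()
        if goal(state):
            return actions
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


# Hypothetical bounded counter domain with values 0..3 and goal value 2.
def successors(state):
    moves = []
    if state < 3:
        moves.append(("inc", state + 1))
    if state > 0:
        moves.append(("dec", state - 1))
    return moves


def goal(state):
    return state == 2


succ_unsafe, goal_unsafe = compile_constraint(successors, goal, lambda s: s != 1)
plan_unsafe = bfs_plan((0, False), succ_unsafe, goal_unsafe)

succ_safe, goal_safe = compile_constraint(successors, goal, lambda s: True)
plan_safe = bfs_plan((0, False), succ_safe, goal_safe)
print(plan_unsafe, plan_safe)
```

The planner itself is unchanged; only the problem it receives is augmented, which is why the original operators, preconditions and effects are unaffected.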

## 5 Example

In this section, we discuss how goal-constrained planning domain verification can verify safety properties using both the Spin model checker and the MIPS-XXL planner with the breadth-first search option. We perform constrained and unconstrained verification tasks to show that, unlike unconstrained verification, our method does not return unreachable counterexamples. As an example, we consider the classical cave diving planning domain taken from the Satisficing Track of the IPC-2014 [21]. The problem consists of an underwater cave system with a finite number of partially interconnected locations. Divers can enter the cave from a specific location, the entrance, and swim from one location to an adjacent connected one. They can hold up to four oxygen tanks and they consume one for every swim and take-photo action. Only one diver can be in the cave at a time. Finally, divers have to perform a decompression manoeuvre to go to the surface and this can be done only at the entrance. Additionally, divers can drop tanks in or take tanks from any location if they hold at least one tank or there is one tank available at the location, respectively.

The planning goals of this domain, as provided in the problem files in the IPC-2014, consist of two parts. The first part dictates the underwater location of which a photo is to be taken (we call it the mission target) and the second part mandates that the divers return to the surface after completing the mission (we call it the safety target).

A critical safety property, $p$, is that divers should not drown, i.e. they should not be in an underwater location, other than the entrance, where neither the divers nor the location has at least one full oxygen tank.

To enable the planner and the model checkers to explore the entire state space, we considered only one diver and we modified some actions to enable the diver to go back into the water after a dive. These modifications are further explained in the commented simplified planning domain model PDDL file, which is provided online along with the task problem PDDL and Promela files.^{4}

First, we translated the planning domain model from PDDL to Promela. Thus, the verification results using the translated model only hold provided that the translation is valid. The verification of the translation is outside the scope and focus of this paper and left for future work.

In this example, the chosen planning goal is to have a photo of the first location and to get the diver out of the water. The verification tasks are:

1. Unconstrained verification with only the safety property $p$: Both Spin and MIPS-XXL found a counterexample, for example prepare-tank, enter-water, swim. Indeed, this counterexample leads the diver to a drowning state. At the end of this sequence, the diver will have consumed their oxygen tank and will be in an underwater location that is not the entrance, so they cannot surface and they do not have an oxygen tank to swim back to the entrance. However, this is not a plan because it does not achieve any useful goal. Therefore, it will not be produced by any sound planner when it is used in a practical scenario (taking a photo of any location).

2. Verification with safety property and incomplete goal (mission target only): Both Spin and MIPS-XXL returned prepare-tank, prepare-tank, enter-water, swim, take-photo. This counterexample achieves the goal and violates the property. However, without the safety part of the goal, it would be possible to generate plans in which divers swim to an underwater location and take a photo of it without being required to return to the surface. These kinds of plans are illegal as they do not respect the safety part of the goal. Therefore, such sequences are unreachable counterexamples, i.e. they will never be produced by any sound planner while planning for a legal goal.

3. Verification using Spin with both safety property and proper goal but without the augmented model $M'$: Spin found a counterexample prepare-tank, prepare-tank, prepare-tank, prepare-tank, enter-water, swim, take-photo, swim, decompress, enter-water, swim. This counterexample achieves the goal and violates the safety property, but only after the goal is achieved. Therefore, this is also an unreachable counterexample, because a non-redundant planner will terminate after achieving the goal and any counterexample that violates the property after achieving the goal will not be returned.

4. Goal-constrained planning domain verification, as presented in this paper: No plan is returned by the planner MIPS-XXL with complete exploration and no counterexample is returned by Spin in exhaustive verification mode. This means the planning domain model does not permit a plan that violates the safety property before achieving the goal, i.e. this model is safe with respect to the given property and goal pair.
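The four tasks can be mimicked on a heavily simplified model of the domain. The sketch below is not the authors' PDDL or Promela model: it assumes a single diver, a single underwater location `l1` adjacent to the entrance, and no actions for dropping or picking up tanks, so drowning reduces to being at `l1` without a tank. The state encoding and the function `search` are likewise assumptions made for illustration.

```python
from collections import deque

MAX_TANKS = 4
INITIAL = ("surface", 0, False)  # (diver location, full tanks held, photo taken)


def successors(state):
    loc, tanks, photo = state
    result = []
    if loc == "surface" and tanks < MAX_TANKS:
        result.append(("prepare-tank", (loc, tanks + 1, photo)))
    if loc == "surface":
        result.append(("enter-water", ("entrance", tanks, photo)))
    if loc in ("entrance", "l1") and tanks > 0:
        other = "l1" if loc == "entrance" else "entrance"
        result.append(("swim", (other, tanks - 1, photo)))
    if loc == "l1" and tanks > 0:
        result.append(("take-photo", (loc, tanks - 1, True)))
    if loc == "entrance":
        result.append(("decompress", ("surface", tanks, photo)))
    return result


def safe(state):
    loc, tanks, _ = state
    return not (loc == "l1" and tanks == 0)  # stranded without oxygen


def mission_goal(state):
    return state[2]


def full_goal(state):
    return state[2] and state[0] == "surface"


def search(goal=None, stop_at_goal=True):
    """Shortest trace that violates `safe` and, if a goal is given, also achieves it."""
    start = (INITIAL, not safe(INITIAL), goal is not None and goal(INITIAL))
    queue, visited = deque([(start, [])]), {start}
    while queue:
        (state, violated, reached), trace = queue.popleft()
        if violated and (goal is None or reached):
            return trace
        if goal is not None and stop_at_goal and reached:
            continue  # the 'not goal' guard disables all transitions
        for action, nxt in successors(state):
            node = (nxt, violated or not safe(nxt),
                    reached or (goal is not None and goal(nxt)))
            if node not in visited:
                visited.add(node)
                queue.append((node, trace + [action]))
    return None


task1 = search()                               # unconstrained
task2 = search(mission_goal)                   # incomplete goal
task3 = search(full_goal, stop_at_goal=False)  # proper goal, no guard
task4 = search(full_goal)                      # goal-constrained
print(task1, task2, task3, task4, sep="\n")
```

In this simplified model a drowning state has no outgoing transitions, so the only way to combine a violation with the full goal is to drown after surfacing, which the goal guard rules out.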

Though the counterexamples returned by the incomplete verification tasks one, two and three are obviously unreachable in this example and should not mislead designers into overcomplicating the model, in a real-world-sized application such invalid planning counterexamples can be critical and much more difficult to recognise and avoid. We expect that our proposed concept can save practitioners many person-hours otherwise spent altering planning domain models to remove behaviours that their planners will never exhibit in practice.

## 6 Experiments

To evaluate the feasibility and the behaviour of our approach, we designed two experiments to investigate how constraining the verification with the planning goal impacts the verification cost. This cost is measured by the number of states evaluated by the verification tools to confirm whether or not a counterexample exists. We consider the number of evaluated states to be an objective measure that is repeatable on any hardware platform, as opposed to measuring execution time. The scripts to repeat the experiments, along with the data, are available online.^{5}

### 6.1 Experiment setup

The first experiment focuses on comparing the cost of both unconstrained and goal-constrained verification tasks while varying the safety property violation depth in order to explore situations with and without a valid planning counterexample. The safety property violation depth is hereafter termed “error depth”. We synthesised a fully reachable model that consists of one critical and three independent variables, each with a range from 0 to 31. Each variable has two actions, one to increase and one to decrease its value by one. The goal is achieved when the critical variable value reaches 14. The error condition is varied from the critical variable having the value 1 to it having the value 31. The upper bound of 31 is chosen to expose any possible trends. Consequently, the number of variables is set to four to allow the model to be explored within a memory limit of 10 GB.
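To make the structure of this synthetic model concrete, the following sketch builds it as a tuple of bounded counters and counts the states a plain breadth-first search evaluates before a stopping condition holds. It is an illustration only: the all-zero initial state, the placement of the critical variable at index 0, and the names `successors` and `count_until` are assumptions, and the counts it produces are not expected to match those reported by Spin or MIPS-XXL-IPC5.

```python
from collections import deque
from typing import Callable, Iterator, Tuple

State = Tuple[int, ...]


def successors(state: State, upper: int) -> Iterator[State]:
    """Apply every increase/decrease action that keeps values in [0, upper]."""
    for i, value in enumerate(state):
        if value < upper:
            yield state[:i] + (value + 1,) + state[i + 1:]
        if value > 0:
            yield state[:i] + (value - 1,) + state[i + 1:]


def count_until(num_vars: int, upper: int, stop: Callable[[State], bool]) -> int:
    """Breadth-first search from the (assumed) all-zero initial state.

    Returns the number of states evaluated until `stop` holds, or the size
    of the reachable state space if `stop` never holds.
    """
    start = (0,) * num_vars
    seen = {start}
    queue = deque([start])
    evaluated = 0
    while queue:
        state = queue.popleft()
        evaluated += 1
        if stop(state):
            return evaluated
        for nxt in successors(state, upper):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return evaluated
```

With this sketch, an unconstrained check for an error at depth 5 in the first model would be written as `count_until(4, 31, lambda s: s[0] == 5)`; smaller values of `num_vars` and `upper` keep the run short when experimenting.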

The second experiment investigates the effect of the early termination of the verification process, after achieving the goal, on the cost of verification tasks while increasing the depth of the planning goal. The model used in this experiment has one critical and four independent variables, each with a range from 0 to 15. Variables have two actions as in the previous model. This time, there is no error in the model and the goal condition is varied from critical value 1 to 14. The upper bound of the variables is reduced to 15 to permit increasing the number of variables to five while keeping the required memory within the 10 GB constraint. Both experiments are performed using the Spin model checker and the MIPS-XXL-IPC5 planner with the breadth-first search option.
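As a quick check on the sizing argument, consider only the combinations of variable values and ignore any auxiliary state added by the tools (an assumption of this illustration). Writing $S_1$ and $S_2$ for the sets of variable valuations of the first and second model, symbols introduced here for convenience, the two models have the same size:

$$
|S_1| = 32^{4} = (2^{5})^{4} = 2^{20} = 1{,}048{,}576,
\qquad
|S_2| = 16^{5} = (2^{4})^{5} = 2^{20} = 1{,}048{,}576 .
$$

This is consistent with halving the number of values per variable when a fifth variable is added in order to stay within the same memory limit.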

### 6.2 Results and discussion

The results of applying our approach in comparison to the unconstrained verification methods are as follows. The states evaluated by Spin and MIPS-XXL-IPC5 are presented in Figure 2. In the first experiment, our approach showed broadly similar behaviour when applied using Spin in Figure 1(a) and MIPS-XXL-IPC5 in Figure 1(c). Note that the aim of these experiments is to showcase the feasibility of our approach and to explore its behaviour, rather than to compare the performance of the verification tools. We believe such a comparison depends heavily on the model under verification; for more insights, the reader is referred to [12, 2, 23]. Therefore, we focus our discussion on the results obtained from model checking (Figure 1(a)).

The vertical line in Figure 1(a) marks the goal level (critical value of 14) and splits the graph into two areas. On the right-hand side, the errors are deeper than the goal, i.e. the errors can only be reached after the goal is achieved. Thus, these errors are regarded as invalid planning counterexamples by our method as per Def. 1. Therefore, unlike unconstrained verification approaches, our method continues its exhaustive search to confirm the non-existence of any valid planning counterexamples. Thus, our method evaluates the maximum number of states for these verification tasks as shown in Figure 1(a)-(1).

On the left-hand side, the errors are shallower than the goal, i.e. the errors are reachable before achieving the goal. Hence, these errors are considered as valid planning counterexamples according to Def. 1. For the same verification task, Figure 1(a)-(2) shows that our method assesses more states than the unconstrained approaches as depicted in Figure 1(a)-(3). This is due to the fact that after finding an error, a safety property violation, our method keeps exploring and searching for a path to the planning goal while traditional methods terminate as soon as an error is found. However, the short counterexamples returned by these methods may or may not be valid planning counterexamples, whereas our method is guaranteed to return valid planning counterexamples only. The extra states visited by our approach are the cost associated with this guarantee.

In Figure 1(a)-(2) (and in Figure 1(c)-(2), respectively), we notice a drop in the number of evaluated states by our method as the error depth gets closer to the goal depth. This is attributed to the fact that the safety property (state trajectory constraint) in the model checker (planner) is translated into an automaton. This automaton influences the state space exploration during the verification process. The automaton has a transition that is activated when an error is reached. Therefore, if an error is reached in an early stage in the verification, the error transition is triggered and the verification tool is forced to explore more states than if the error transition was triggered closer to the goal. Once both the error and the goal transitions are triggered, then the automaton reaches an acceptance state. Thus, the search terminates with a valid planning counterexample.
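The role of the automaton described above can be sketched as a search over pairs of a model state and a flag recording whether the error transition has fired. This is a simplified reading of the paragraph, not the never claim used by Spin or the automaton built by MIPS-XXL-IPC5: the all-zero initial state, the pruning of paths that reach the goal without an error, and the names `successors` and `goal_constrained_search` are assumptions, and the sketch does not attempt to reproduce the state counts in the figures.

```python
from collections import deque
from typing import Callable, Iterator, Tuple

State = Tuple[int, ...]


def successors(state: State, upper: int) -> Iterator[State]:
    """Apply every increase/decrease action that keeps values in [0, upper]."""
    for i, value in enumerate(state):
        if value < upper:
            yield state[:i] + (value + 1,) + state[i + 1:]
        if value > 0:
            yield state[:i] + (value - 1,) + state[i + 1:]


def goal_constrained_search(
    num_vars: int,
    upper: int,
    goal: Callable[[State], bool],
    error: Callable[[State], bool],
) -> Tuple[bool, int]:
    """Search the product of the model and a two-flag monitor.

    Returns (found, evaluated): whether a path exists that triggers the
    error transition and then reaches the goal, and how many product
    states were evaluated before the search stopped.
    """
    start = (0,) * num_vars
    start_node = (start, error(start))
    seen = {start_node}
    queue = deque([start_node])
    evaluated = 0
    while queue:
        state, error_seen = queue.popleft()
        evaluated += 1
        if goal(state):
            if error_seen:
                # Both transitions fired: acceptance, search terminates.
                return True, evaluated
            # Goal reached without an error: a non-redundant planner
            # would stop here, so this path is not extended.
            continue
        for nxt in successors(state, upper):
            node = (nxt, error_seen or error(nxt))
            if node not in seen:
                seen.add(node)
                queue.append(node)
    return False, evaluated
```

For example, `goal_constrained_search(4, 31, lambda s: s[0] == 14, lambda s: s[0] == 5)` would mirror a first-experiment task with an error shallower than the goal. Carrying the error flag in the search node is what lets an early error enlarge the explored product space, as discussed above.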

In the second experiment, Figure 1(b) shows that when using Spin with a planning domain model with no counterexample, our approach explores fewer states than unconstrained verification methods. This reduction in the verification cost is caused by the early termination of the verification search once the goal is achieved and no error could be found at shallower depths. This advantage of the goal-constrained verification approach comes at the cost of limiting the verification results to a single planning goal. Additionally, the number of states evaluated by the goal-constrained verification method rises as the goal depth increases. This is caused by the expansion of the part of the model that needs to be checked as the goal depth increases. On the other hand, the unconstrained methods visit a constant number of states as they are independent of the goal depth by definition.

Another interesting observation when using the planner in Figure 1(d) is that our method explores more states than the unconstrained verification approaches when the planning goal depth is more than three. This behaviour is caused by the interaction of two factors. In our approach, MIPS-XXL-IPC5 translates the state trajectory constraint to an automaton which is then incorporated in the planning domain model. Thus, the model used in our method is more complicated than the model used by the unconstrained approaches, where state trajectory constraints are not used. After a certain depth of the planning goal, the extra states evaluated as a result of the additional state trajectory constraint in our method outweigh the saving from the early termination of the verification process.

## 7 Conclusions and future work

The verification of planning domain models is essential to guarantee the safety of planning-based automated systems. Invalid planning counterexamples returned by unconstrained planning domain model verification techniques undermine the verification results. They can mislead system designers into performing unnecessary remediations that can themselves be prone to errors. In this paper, we introduced goal-constrained verification, a new concept to address this problem, which restricts the verification task to a specific goal. This limits counterexamples to those practically reachable by a planner that is tasked with achieving the goal. Consequently, our method verifies the domain model only with respect to a specific goal. We consider this to be an acceptable limitation, given that planners also operate on this basis. Since we have excluded redundant plans from our definition of valid planning counterexamples, the verification results of our method are limited to the application of non-redundant planners. A weaker form of non-redundancy will be considered in future work.

We have demonstrated how model checkers and planning techniques can be used to perform goal-constrained planning domain model verification. Our experimental evaluation confirms the feasibility of our method and presents its benefits and limitations compared to unconstrained verification methods. The proposed technique is simple, which makes it readily usable in practice. It is also effective, as formally proven in the paper.

We note that a grounded planning domain model defines a finite set of planning problems. For our method to completely verify such a set, it has to be repeatedly applied to every planning goal. In practice, we have noticed that only a small number of predicates is typically used to specify the planning goals. Thus, we expect that in applications where the set of planning goals is relatively small, our method could exhaustively verify the complete set of planning problems, especially if the tools take advantage of the latest optimisation techniques to reduce computational complexity.

## Acknowledgements

The authors are grateful to Derek Long for his useful comments.

### Footnotes

- thanks: Supported by EPSRC grant EP/P510427/1 in collaboration with Schlumberger.
- email: {first.last}@bristol.ac.uk
- email: {first.last}@bristol.ac.uk
- https://github.com/Anas-Shrinah/Goal-constrained-planning-domain-model-verification-repository
- https://github.com/Anas-Shrinah/Goal-constrained-planning-domain-model-verification-repository

### References

- M. Ai-Chang, J. Bresina, L. Charest, A. Chase, J. C. Hsu, A. Jonsson, B. Kanefsky, P. Morris, Kanna Rajan, J. Yglesias, B. G. Chafin, W. C. Dias, and P. F. Maldague. Mapgen: mixed-initiative planning and scheduling for the mars exploration rover mission. IEEE Intelligent Systems, 19(1):8–12, Jan 2004.
- Aws Albarghouthi, Jorge A. Baier, and Sheila A. McIlraith. On the use of planning technology for verification. In VVPS'09: Proceedings of the ICAPS Workshop on Verification & Validation of Planning & Scheduling Systems, 2009.
- Gerd Behrmann, Kim G. Larsen, Henrik R. Andersen, Henrik Hulgaard, and Jørn Lind-Nielsen. Verification of hierarchical state/event systems using reusability and compositionality. In W. Rance Cleaveland, editor, Tools and Algorithms for the Construction and Analysis of Systems, pages 163–177, Berlin, Heidelberg, 1999. Springer Berlin Heidelberg.
- Saddek Bensalem, Klaus Havelund, and Andrea Orlandini. Verification and validation meet planning and scheduling. International Journal on Software Tools for Technology Transfer, 16(1):1–12, Feb 2014.
- Amedeo Cesta, Alberto Finzi, Simone Fratini, Andrea Orlandini, and Enrico Tronci. Validation and verification issues in a timeline-based planning system. The Knowledge Engineering Review, 25(3):299–318, 2010.
- S. Chien, R. Sherwood, D. Tran, B. Cichy, G. Rabideau, R. Castano, A. Davies, R. Lee, D. Mandl, S. Frye, B. Trout, J. Hengemihle, J. D’Agostino, S. Shulman, S. Ungar, T. Brakke, D. Boyer, J. Van Gaasbeck, R. Greeley, T. Doggett, V. Baker, J. Dohm, and F. Ip. The EO-1 autonomous science agent. In Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004., pages 420–427, July 2004.
- Benjamin Cichy, Steve Chien, Gregg Rabideau, and Daniel Tran. Validating the autonomous EO-1 science agent. 2004.
- Edmund M Clarke, Orna Grumberg, and David E Long. Model checking and abstraction. ACM transactions on Programming Languages and Systems (TOPLAS), 16(5):1512–1542, 1994.
- Edmund M Clarke, E Allen Emerson, Somesh Jha, and A Prasad Sistla. Symmetry reductions in model checking. In International Conference on Computer Aided Verification, pages 147–158. Springer, 1998.
- Edmund M Clarke, Orna Grumberg, and Doron Peled. Model checking. MIT press, 1999.
- Stefan Edelkamp, Shahid Jabbar, and Mohammed Nazih. Cost-optimal planning with constraints and preferences in large state spaces. In International Conference on Automated Planning and Scheduling (ICAPS) Workshop on Preferences and Soft Constraints in Planning, pages 38–45, 2006.
- Stefan Edelkamp. Limits and possibilities of PDDL for model checking software. Edelkamp, & Hoffmann (Edelkamp & Hoffmann, 2003), 2003.
- EE Times. Formal verification with constraints - it doesn’t have to be like tightrope walking. https://www.eetimes.com/document.asp?doc_id=1277001, 2010. [Online; accessed 06-November-2019].
- Alfonso Gerevini and Derek Long. Preferences and soft constraints in PDDL3. In ICAPS workshop on planning with preferences and soft constraints, pages 46–53, 2006.
- Alfonso E Gerevini, Patrik Haslum, Derek Long, Alessandro Saetti, and Yannis Dimopoulos. Deterministic planning in the fifth international planning competition: PDDL3 and experimental evaluation of the planners. Artificial Intelligence, 173(5-6):619–668, 2009.
- Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: theory and practice. Elsevier, 2004.
- Robert P Goldman, Ugur Kuter, and A Schneider. Using classical planners for plan verification and counterexample generation. In Proceedings of AAAI Workshop on Problem Solving Using Classical Planning. To appear, 2012.
- Klaus Havelund, Alex Groce, Gerard Holzmann, Rajeev Joshi, and Margaret Smith. Automated testing of planning models. In International Workshop on Model Checking and Artificial Intelligence, pages 90–105. Springer, 2008.
- Gerard J Holzmann. The SPIN model checker: Primer and reference manual, volume 1003. Addison-Wesley Reading, 2004.
- Version Hugh, Hugh Cottam, Nigel Shadbolt, and John Kingston. Knowledge level planning in the search and rescue domain. In Research and Development in Expert Systems XII, proceedings of BCS Expert Systems’95, pages 309–326. SGES Publications, 1995.
- IPC. International planning competition 2014 web site. https://helios.hud.ac.uk/scommv/IPC-14/domains.html, 2014. [Online; accessed 15-July-2019].
- Lina Khatib, Nicola Muscettola, and Klaus Havelund. Verification of plan models using UPPAAL. In International Workshop on Formal Approaches to Agent-Based Systems, pages 114–122. Springer, 2000.
- Yi Li, Jing Sun, Jin Song Dong, Yang Liu, and Jun Sun. Planning as model checking tasks. In 2012 35th Annual IEEE Software Engineering Workshop, pages 177–186. IEEE, 2012.
- Nicola Muscettola, P. Pandurang Nayak, Barney Pell, and Brian C. Williams. Remote agent: to boldly go where no AI system has gone before. Artificial Intelligence, 103(1):5–47, 1998. Artificial Intelligence 40 years later.
- Minh D Nguyen, Max Thalmaier, Markus Wedler, Jörg Bormann, Dominik Stoffel, and Wolfgang Kunz. Unbounded protocol compliance verification using interval property checking with invariants. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(11):2068–2082, 2008.
- John Penix, Charles Pecheur, and Klaus Havelund. Using model checking to validate AI planner domain models. In Proceedings of the 23rd Annual Software Engineering Workshop, NASA Goddard, 1998.
- Franco Raimondi, Charles Pecheur, and Guillaume Brat. PDVer, a tool to verify PDDL planning domains. 2009.
- B. Smith, W. Millar, J. Dunphy, Yu-Wen Tung, P. Nayak, E. Gamble, and M. Clark. Validation and verification of the remote agent for spacecraft autonomy. In 1999 IEEE Aerospace Conference. Proceedings (Cat. No.99TH8403), volume 1, pages 449–468 vol.1, March 1999.
- Margaret H Smith, Gerard J Holzmann, Gordon C Cucullu, and BD Smith. Model checking autonomous planners: Even the best laid plans must be verified. In Aerospace Conference, 2005 IEEE, pages 1–11. IEEE, 2005.
- Austin Tate, Brian Drabble, and Jeff Dalton. O-Plan: a knowledge-based planner and its application to logistics. University of Edinburgh, Artificial Intelligence Applications Institute, 1996.