Negotiating the Probabilistic Satisfaction of Temporal Logic Motion Specifications
We propose a human-supervised control synthesis method for a stochastic Dubins vehicle such that the probability of satisfying a specification given as a formula in a fragment of Probabilistic Computation Tree Logic (PCTL) over a set of environmental properties is maximized. Under some mild assumptions, we construct a finite approximation for the motion of the vehicle in the form of a tree-structured Markov Decision Process (MDP). We introduce an efficient algorithm, which exploits the tree structure of the MDP, for synthesizing a control policy that maximizes the probability of satisfaction. For the proposed PCTL fragment, we define the specification update rules that guarantee the increase (or decrease) of the satisfaction probability. We introduce an incremental algorithm for synthesizing an updated MDP control policy that reuses the initial solution. The initial specification can be updated, using the rules, until the supervisor is satisfied with both the updated specification and the corresponding satisfaction probability. We propose an offline and an online application of this method.
Temporal logics, such as Linear Temporal Logic (LTL) and Computational Tree Logic (CTL), have been recently employed to express complex robot behaviors such as “go to region A and avoid region B unless regions C or D are visited” (see, for example, [KGFP07], [KF08], [KB08b], [WTM09], [BKV10]).
In order to use existing model checking and automata game tools for motion planning (see [BK08]), many of the above-mentioned works rely on the assumption that the motion of the vehicle in the environment can be modeled as a finite system [CGP99] that is either deterministic [DLB12], nondeterministic [KB08a], or probabilistic ([LAB12]). If a system is probabilistic, probabilistic temporal logics, such as Probabilistic CTL (PCTL) and Probabilistic LTL (PLTL), can be used for motion planning and control. In particular, given a robot specification expressed as a probabilistic temporal logic formula, probabilistic model checking and automata game techniques can be adapted to synthesize control policies that maximize the probability that the robot satisfies the specification ([LAB12], [CB12]).
However, in many complex tasks, it is critically important to keep humans in the loop and engaged in the overall decision-making process. For example, during deployment, by using its local sensors, a robot might discover that some environmental properties have changed since the initial computation of the control strategy. As a result, the satisfaction probability may decrease, and the human operator should be asked whether the probability is acceptable. Alternatively, the user can change the specification according to the new environmental properties to bring the satisfaction probability over a desired threshold. Thus, it is of great interest to investigate how humans and control synthesis algorithms can best jointly contribute to decision-making.
To answer this question, we propose a theoretical framework for a human-supervised control synthesis method. In this framework, the supervisor is relieved of low-level tasking and only specifies an initial robot specification and decides whether or not to deploy the vehicle, based on a given specification and the corresponding satisfaction probability. The control synthesis part deals with generating control policies and the corresponding satisfaction probabilities, as well as proposing updated motion specifications, to the supervisor, guaranteed to increase (or decrease) the satisfaction probability.
We focus on controlling a stochastic version of a Dubins vehicle such that the probability of satisfying a specification given as a formula in a fragment of PCTL over a set of properties of the regions in the environment is maximized. We assume that the vehicle can determine its precise initial position in a known map of the environment. However, inspired by practical applications, we assume that the vehicle is equipped with noisy actuators and, during its motion in the environment, it can only measure its angular velocity using a limited accuracy gyroscope. We extend our approach presented in [CB12] to construct a finite abstraction of the motion of the vehicle in the environment in the form of a tree-structured Markov Decision Process (MDP). For the proposed PCTL fragment, which is rich enough to express complex motion specifications, we introduce the specification update rules that guarantee the increase (or decrease) of the satisfaction probability.
We introduce two algorithms for synthesizing MDP control policies. The first provides an initial policy and the corresponding satisfaction probability, and the second is used for obtaining an updated solution. In general, given an MDP and a PCTL formula, solving a synthesis problem requires solving a Linear Programming (LP) problem (see [BK08, LAB12]). By exploiting the special tree structure of the MDP, obtained through the abstraction process, as well as the structure of the PCTL fragment, we show that our algorithms produce the optimal solution in a fast and efficient manner without solving an LP. Moreover, the second algorithm produces an updated optimal solution by reusing the initial solution. Once the MDP control policy is obtained, by establishing a mapping between the states of the MDP and sequences of measurements obtained from the gyroscope, the policy is mapped to a vehicle feedback control strategy. We propose an offline and an online application of the method and we illustrate the method with simulations.
The work presented in this paper is, to the best of our knowledge, novel. In [Fai11] the authors introduce the problem of automatic formula revision for LTL motion planning specifications. Namely, if a specification cannot be satisfied on a particular environment, the framework returns information to the user regarding how the specification can be updated so that it becomes satisfiable. The presented work addresses a different but related problem: automatic formula revision for PCTL motion planning specifications. Additionally, our framework allows for noisy sensors and actuators and for environmental changes during the deployment. [JKG12, GKP11] address the problem of probabilistic satisfaction of specifications for robotic applications. In [JKG12] noisy sensors are assumed, and in [GKP11] the probabilities arise from the way the car-like robot is abstracted to a finite state representation. In both cases the probability with which a temporal logic specification is satisfied is calculated. These methods differ from our work in that they assume perfect actuators, whereas we relax this assumption.
The remainder of this paper is organized as follows. In Sec. II, we introduce the necessary notation and review some preliminary results. We formulate the problem and outline the approach in Sec. III. The construction of the MDP model is described in Sec. IV. In Sec. V we propose two algorithms, one for generating an initial MDP control policy and the other for generating an updated MDP control policy. Case studies illustrating our method are presented in Sec. VI.
In this section, by following the standard notation for Markov Decision Processes (MDP) [BK08], we introduce a tree-structured MDP and give an informal introduction to Probabilistic Computation Tree Logic (PCTL).
Definition 1 (Tree-Structured MDP)
A tree-structured MDP is a tuple $\mathcal{M} = (S, s_0, Act, A, P, \Pi, L)$, where $S$ is a finite set of states; $s_0 \in S$ is the initial state; $Act$ is a finite set of actions; $A: S \to 2^{Act}$ is a function specifying the enabled actions at a state $s$; $P: S \times Act \times S \to [0, 1]$ is a transition probability function such that 1) for all states $s \in S$ and actions $a \in A(s)$: $\sum_{s' \in S} P(s, a, s') = 1$, 2) for all actions $a \notin A(s)$ and $s' \in S$, $P(s, a, s') = 0$, and 3) for all states $s' \in S \setminus \{s_0\}$ there exists exactly one state-action pair $(s, a)$, s.t. $P(s, a, s') > 0$; $\Pi$ is the set of propositions; and $L: S \to 2^{\Pi}$ is a function that assigns some propositions in $\Pi$ to each state of $S$.
In other words, in a tree-structured MDP each state has only one incoming transition, i.e., there are no cycles. A path through a tree-structured MDP is a sequence of states $\omega = s_0 s_1 s_2 \ldots$ that satisfies the transition probability of the MDP, i.e., for every $i \geq 0$, $P(s_i, a, s_{i+1}) > 0$ for some $a \in A(s_i)$. $Path^{fin}$ denotes the set of all finite paths.
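The tree-structured MDP of Definition 1 can be represented directly in code. The following sketch (illustrative names, not the paper's implementation) stores, for each state, its label set and a map from enabled actions to successor distributions, and checks requirement 1) of the definition:

```python
from dataclasses import dataclass, field

@dataclass
class TreeMDPState:
    """One state of a tree-structured MDP (exactly one incoming transition)."""
    name: str
    props: frozenset                                  # propositions labeling this state
    transitions: dict = field(default_factory=dict)   # action -> [(child_state, prob)]

def is_valid_distribution(state, action, tol=1e-9):
    """P(s, a, .) must sum to 1 over the successors of an enabled action."""
    return abs(sum(p for _, p in state.transitions[action]) - 1.0) < tol
```

Because every non-initial state has a unique parent, the whole MDP is reachable from the root by following `transitions`, with no cycles.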
Definition 2 (MDP Control Policy)
A control policy of an MDP is a function $\mu: Path^{fin} \to Act$ that specifies the next action to be applied after every finite path.
Informally, Probabilistic Computation Tree Logic (PCTL) is a probabilistic extension of Computation Tree Logic (CTL) that includes the probabilistic operator $\mathcal{P}$. Formulas of PCTL are constructed by connecting propositions from a set $\Pi$ using Boolean operators ($\neg$ (negation), $\wedge$ (conjunction), and $\Rightarrow$ (implication)), temporal operators ($\bigcirc$ (next), $\mathcal{U}$ (until)), and the probabilistic operator $\mathcal{P}$. For example, the formula $\mathcal{P}_{max=?}[\neg\pi_2\,\mathcal{U}\,\pi_1]$ asks for the maximum probability of reaching the states of an MDP satisfying $\pi_1$, without passing through states satisfying $\pi_2$. The more complex formula $\mathcal{P}_{max=?}[\neg\pi_3\,\mathcal{U}\,(\pi_1 \wedge \mathcal{P}_{>p}[\neg\pi_3\,\mathcal{U}\,\pi_2])]$ asks for the maximum probability of eventually visiting states satisfying $\pi_1$ and then, with probability greater than $p$, states satisfying $\pi_2$, while always avoiding states satisfying $\pi_3$. Probabilistic model-checking tools, such as PRISM (see [KNP04]), can be used to find these probabilities. Simple adaptations of the model checking algorithms, such as the one presented in [LAB12], can be used to find the corresponding control policies.
III Problem Formulation
In this paper, we develop a human-supervised control synthesis method, with an offline and online phase. In the offline phase (i.e., before the deployment) the supervisor gives an initial specification and the control synthesis algorithm returns the initial satisfaction probability. If the supervisor is not satisfied with the satisfaction probability, the system generates a set of specification relaxations that guarantee an increase in the satisfaction probability. The offline phase ends when the supervisor agrees with a specification and the corresponding satisfaction probability.
In the online phase (i.e., during the deployment), events occurring in the environment can affect the satisfaction probability. If such an event occurs, the system returns the updated control policy, and if necessary (i.e., if the probability decreases) proposes an updated specification that will increase the satisfaction probability. At the end of a negotiation process similar to the one described above, the supervisor agrees with one of the options recommended by the system. While the robot is stopped during the negotiation process, it is necessary that the time required for recomputing the policies be short.
III-A Models and specifications
Motion model: A Dubins vehicle ([Dub57]) is a unicycle with constant forward speed and bounded turning radius moving in a plane. In this paper, we consider a stochastic version of a Dubins vehicle, which captures actuator noise:

$\dot{x} = \cos\theta, \quad \dot{y} = \sin\theta, \quad \dot{\theta} = u + \epsilon, \quad u \in U,$ (1)

where $(x, y)$ and $\theta$ are the position and orientation of the vehicle in a world frame, $u$ is the control input (angular velocity before being corrupted by noise), $U$ is the control constraint set, and $\epsilon$ is a random variable modeling the actuator noise. For simplicity, we assume that $\epsilon$ is uniformly distributed on the bounded interval $[-\epsilon_{max}, \epsilon_{max}]$. However, our approach works for any continuous probability distribution supported on a bounded interval. The forward speed is normalized to 1. We denote the state of the system by $q = (x, y, \theta)$.
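As an illustration of these dynamics, the following sketch simulates one stage of the noisy unicycle under the stated assumptions (unit forward speed, uniform noise held constant over the stage); the function and parameter names are ours, not the paper's:

```python
import math
import random

def simulate_stage(q, u, eps_max, dt=0.01, duration=1.0):
    """Euler-integrate one stage of the noisy Dubins unicycle
    x' = cos(theta), y' = sin(theta), theta' = u + eps (unit speed).

    The noise eps is drawn once and held constant over the stage,
    matching the piecewise-constant noise assumption."""
    x, y, theta = q
    w = u + random.uniform(-eps_max, eps_max)   # applied control input
    for _ in range(int(round(duration / dt))):
        x += math.cos(theta) * dt
        y += math.sin(theta) * dt
        theta += w * dt
    return (x, y, theta), w
```

With zero noise bound and zero input, the vehicle simply drives straight ahead for one unit of distance.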
Motivated by the fact that the optimal Dubins paths use only three inputs ([Dub57]), we assume $U = \{-1/\rho, 0, 1/\rho\}$, where $\rho$ is the minimum turn radius. We define

$W = \{u + \epsilon \mid u \in U,\ \epsilon \in [-\epsilon_{max}, \epsilon_{max}]\}$

as the set of applied control inputs, i.e., the set of angular velocities that are applied to the system in the presence of noise. We assume that time is uniformly discretized (partitioned) into stages (intervals) of length $\Delta t$, where stage $k$ is from $(k-1)\Delta t$ to $k\Delta t$. The duration of the motion is finite and it is denoted by $T$. (Since PCTL has infinite time semantics, we implicitly assume that after $T$ the system remains in the state achieved at $T$.) We denote the control input and the applied control input at stage $k$ as $u_k$ and $w_k$, respectively.
We assume that the noise is piecewise constant, i.e., it can only change at the beginning of a stage. This assumption is motivated by practical applications, in which a servo motor is used as an actuator for the turning angle (see e.g., [Maz04]). This implies that the applied control is also piecewise constant, i.e., $w_k$ is constant over each stage.
Sensing model: We assume that the vehicle is equipped with only one sensor, which is a limited accuracy gyroscope. At stage $k$, the gyroscope returns the measured interval containing the applied control input $w_k$. Motivated by practical applications, we assume that the measurement resolution of the gyroscope, i.e., the length of the measured interval, is constant, and we denote it by $\Delta w$. For simplicity of presentation, we also assume that $2\epsilon_{max} = n\Delta w$ for some $n \in \mathbb{N}$. Then, $[-\epsilon_{max}, \epsilon_{max}]$ can be partitioned (throughout the paper, we relax the notion of a partition by allowing the endpoints of the intervals to overlap) into $n$ intervals. We denote the set of all noise intervals as $\mathcal{E}$. At stage $k$, if the applied control input is $w_k = u_k + \epsilon_k$, the gyroscope will return the measured interval corresponding to the noise interval containing $\epsilon_k$. Since $\epsilon$ is uniformly distributed, each interval is measured with probability $\Delta w / (2\epsilon_{max}) = 1/n$. (2)
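Under these assumptions, the measurement intervals and their probabilities are straightforward to compute. A minimal sketch (hypothetical names; `eps_max` is the noise bound and `n` the number of intervals):

```python
def noise_intervals(eps_max, n):
    """Partition [-eps_max, eps_max] into n measurement intervals of
    equal length delta_w = 2*eps_max/n (endpoints may overlap)."""
    delta_w = 2.0 * eps_max / n
    return [(-eps_max + k * delta_w, -eps_max + (k + 1) * delta_w)
            for k in range(n)]

def measurement_probability(eps_max, n):
    """With uniform noise on [-eps_max, eps_max], each interval is
    reported by the gyroscope with probability delta_w/(2*eps_max) = 1/n."""
    return (2.0 * eps_max / n) / (2.0 * eps_max)
```

Note that the probability of each measured interval is independent of the commanded input, which is what makes the quantization in Sec. IV tractable.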
Environment model and specification: The vehicle moves in a static environment in which regions of interest are present. Let be a finite set of propositions satisfied at the regions in the environment. Let be a map such that , , is the set of all positions in satisfying all and only propositions . Inspired by a realistic scenario of an indoor vehicle leaving its charging station, we assume that the vehicle can precisely determine its initial state in a known map of the environment. Specification: In this work, we assume that the vehicle needs to carry out a motion specification expressed as a PCTL formula over :
, where , and are PCTL formulas constructed by connecting properties from a set of propositions using only Boolean operators in Conjunctive Normal Form (CNF) and Disjunctive Normal Form (DNF) (a formula is in CNF if it is a conjunction of clauses, where a clause is a disjunction of propositions; a formula is in DNF if it is a disjunction of clauses, where a clause is a conjunction of propositions), respectively, and . We assume that the formula is in Negation Normal Form (NNF), i.e., the negation operator appears only in front of the propositions. In order to better explain the different steps in our framework, we consider the following example throughout the paper.
Consider the environment shown in Fig. 1. Let , where label pick-up, test1, test2, drop-off1, drop-off2 and the unsafe regions, respectively. Consider the following motion specification:
Specification 1: Starting from an initial state, reach a pick-up region, while avoiding the test1 regions, to pick up a load. Then, reach a test1 region or a test2 region. Finally, reach a drop-off1 or a drop-off2 region to drop off the load. Always avoid the unsafe regions.
The specification translates to PCTL formula :
Note that the proposed PCTL fragment (Eqn. (3)) can capture the usual properties of interest: reachability while avoiding regions, and sequencing (see [FGKGP09]). For example, the formula asks for the maximum probability of avoiding the unsafe, test1, and test2 regions until a pick-up region is reached. The formula asks for the maximum probability of visiting a pick-up, a test1, and a drop-off1 region, in that order.
Next, we define the satisfaction of (Eqn. 3) by a trajectory of the system from Eqn. (1). The word corresponding to a state trajectory is a sequence , , , generated according to the following rules, for all and , : 1) ; 2) if and , then s.t. a) and b) , , ; 3) if then . Informally, the word produced by is the sequence of sets of propositions satisfied by the position of the robot as time evolves. A trajectory satisfies PCTL formula iff the corresponding sequence satisfies the formula.
As time evolves and a sequence is generated, we can check what part of is satisfied so far. If part of is satisfied we say is satisfied up to , (for more details see Sec. V-B).
Assume that at , for some , the motion specification is updated. Then, given satisfied up to , , the updated PCTL formula, denoted , is obtained from by removing the already satisfied part of , and then by 1) adding or removing a conjunction clause from , or 2) adding or removing a disjunction clause from , or 3) increasing or decreasing , for any . Formal definitions are given in Sec. V-B. To illustrate this idea, consider the following example:
Consider Specification 1 and assume that at the vehicle enters a pick-up region, while avoiding the test1 and the unsafe regions, and additionally, that the drop-off2 regions become unavailable for the drop off, i.e., the vehicle is allowed to drop off the load only at the drop-off1 regions. Then, the updated formula is:
where is obtained from by removing the already satisfied part of , , and by removing the conjunction clause, , from .
While the vehicle moves, gyroscope measurements are available at each stage . We define a vehicle control strategy as a map that takes as input a sequence of measured intervals and returns the control input at stage .
III-B Problem formulation and approach
We are ready to formulate the main problem that we consider in this paper:
Given a set of regions of interest in environment satisfying propositions from set , a vehicle model described by Eqn. (1) with initial state , and initial and updated motion specifications, expressed as PCTL formulas and , respectively, over (Eqn. (3)), find a vehicle control strategy that maximizes the probability of satisfying and then .
Our approach to Problem 1 can be summarized as follows. We start by using the abstraction method presented in [CB12]: by discretizing the noise interval, we define a finite subset of the set of possible applied control inputs. We use this to define a Quantized System (QS) that approximates the original system given by Eqn. (1). Next, we capture the uncertainty in the position of the vehicle and map QS to a tree-structured MDP. Then, we develop an efficient algorithm, which exploits the tree structure of the MDP, for obtaining an initial control policy that maximizes the probability of satisfying the initial specification. Next, for the PCTL formulas given by Eqn. (3), we introduce the specification update rules that guarantee the increase (or decrease) of the satisfaction probability. We also develop an efficient algorithm for obtaining an updated control policy that exploits the MDP structure and the structure of the PCTL formulas (Eqn. (3)), and reuses the initial control policy. From [CB12] it follows that each control policy can be mapped to a vehicle control strategy and that the probability that the vehicle satisfies the corresponding specification in the original environment is bounded from below by the maximum probability of satisfying the specification on the MDP.
IV Construction of an MDP Model
The fact that we have introduced the initial PCTL formula (Eqn. (3)) in NNF enables us to classify the propositions in according to whether they represent regions that must be reached (no negation in front of the proposition) or avoided (a negation operator appears in front of the proposition).
The abstraction process from [CB12] can only deal with PCTL formulas where the propositions are classified into two nonintersecting sets according to whether they represent regions that must be reached or avoided. In this paper, we do not make this limiting assumption. For example, consider the PCTL formula given by Eqn. (4) where the test1 regions (i.e., proposition ) need to be both avoided and reached.
IV-A PCTL formula transformation
In order to use the method presented in [CB12], we start by removing any negation operators that appear in the initial formula. To do so, we use the approach presented in [FGKGP09] as follows. We introduce an extended set of propositions: for each proposition, we define a new symbol representing its negation, and we take the union of the original propositions and the new symbols. We also define a translation function pos, which takes as input a PCTL formula in NNF and returns a formula in which every occurrence of a negated proposition has been replaced by the corresponding new symbol. Since we have a new set of propositions, we need to define a new map for their interpretation. This is straightforward: a position satisfies the new symbol corresponding to a negated proposition if and only if it does not satisfy the original proposition (for more details see Fig. 3).
It can easily be seen that, given a formula, a map, and a trajectory of the system from Eqn. (1), the trajectory satisfies the original formula if and only if it satisfies the translated formula under the new map. Thus, since the original formula is equivalent to the translated formula pos under the corresponding maps, the results that follow are given with respect to the translated formula and the new map. We denote all PCTL formulas in NNF without any negation operator using bold Greek letters.
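A small sketch of the translation, assuming formulas are encoded as nested tuples (our hypothetical encoding, not the paper's):

```python
def pos(formula):
    """Translate an NNF formula so negations disappear: each ('not', p)
    becomes a fresh positive symbol 'not_'+p (hypothetical encoding;
    formulas are nested tuples like ('and', 'a', ('not', 'b')))."""
    if isinstance(formula, str):
        return formula
    op, *args = formula
    if op == 'not':
        return 'not_' + args[0]
    return (op,) + tuple(pos(a) for a in args)

def extended_label(props, all_props):
    """New interpretation map: a state satisfies 'not_'+p exactly
    when it does not satisfy p."""
    return set(props) | {'not_' + p for p in all_props if p not in props}
```

Because the labeling assigns `'not_'+p` exactly where `p` is absent, a trajectory satisfies the translated formula iff it satisfies the original one.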
At this point we have distinguished the regions that must be avoided () and the regions that must be reached ().
We use and , , to denote the state trajectory and the constant applied control at stage , respectively. With a slight abuse of notation, we use to denote the end of state trajectory , i.e., . Given a state , the state trajectory can be derived by integrating the system given by Eqn. (1) from the initial state , and taking into account that the applied control is constant and equal to . Throughout the paper, we will also denote this trajectory by , when we want to explicitly capture the initial state and the constant applied control .
For each interval in we define a representative value , . i.e., is the midpoint of interval . We denote the set of all representative values as . We define as a finite set of applied control inputs. Also, let be a random variable, where with the probability mass function (follows from Eqn. (2)).
Finally, we define a Quantized System (QS) that approximates the original system as follows: The set of applied control inputs in QS is ; for a state and a control input , QS returns
with probability , where .
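The quantized system can be sketched as follows: the commanded input is perturbed by each representative midpoint in turn, and each successor is assigned probability 1/n (names and encoding are illustrative):

```python
def quantized_successors(q, u, eps_max, n, integrate):
    """Successors of state q in the quantized system: the noise interval
    is replaced by its n midpoints nu_k, so the commanded input u yields
    applied input u + nu_k with probability 1/n. `integrate` propagates
    the dynamics for one stage under a constant applied input."""
    delta_w = 2.0 * eps_max / n
    succs = []
    for k in range(n):
        nu_k = -eps_max + (k + 0.5) * delta_w   # representative value
        succs.append((integrate(q, u + nu_k), 1.0 / n))
    return succs
```

Each call expands one node of the reachability graph; applying it recursively for all input sequences of length up to the horizon yields the tree over which the MDP is built.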
Next, we denote , in which gives a control input at stage , as a finite sequence of control inputs of length . Let denote the set of all such sequences. For the initial state and , we define the reachability graph (see [LaV06] for a related definition), which encodes the set of all state trajectories originating from that can be obtained, with a positive probability, by applying sequences of control inputs from according to QS given by Eqn. (5) (an example is given in Fig. 2).
IV-C Position uncertainty and MDP construction
As explained before, in order to answer whether some state trajectory satisfies PCTL formula (Eqn. (3)), it is sufficient to know its projection in . Therefore, we focus only on the position uncertainty.
The position uncertainty of the vehicle when its nominal position is is modeled as a disc centered at with radius , where denotes the uncertainty: where denotes the Euclidean distance. The way we model the uncertainty along is given in [CB12]. Briefly, first, we obtain the uncertainty at state , denoted , by using a worst case scenario assumption: if is the applied control input for QS, the corresponding applied control input at stage for the original system was or , where . Then, we define as an approximated uncertainty trajectory and we set , , , i.e., we set the uncertainty along the state trajectory equal to the maximum value of the uncertainty along , which is at state (for more details see Fig. 3).
A tree-structured MDP M that models the motion of the vehicle in the environment and the evolution of the position uncertainty is defined as a tuple where:
is the finite set of states. The meaning of the state is as follows: means that along the state trajectory , the uncertainty trajectory is ; the noise interval is ; and is the set of satisfied propositions along the state trajectory when is the uncertainty trajectory (see Fig. 3 for an example).
is the initial state, where is the set of propositions satisfied at .
is the set of actions ( is a dummy action);
gives the enabled actions at each state;
is a transition probability function;
is the set of propositions;
assigns proposition from to states according to the following rule: given , , iff .
We generate and while building starting from . Given , and the corresponding , , , first, we generate a sequence , , where is the set of satisfied propositions along the state trajectory , when the corresponding uncertainty trajectory is , for , , according to the following rules:
Let . Then, and .
If , and , then:
, , .
Next, for each , , we generate a state of the MDP such that and , and and are such that . Finally, the newly generated state , , , is added to and the transition probability function is updated, as follows:
If , and , and otherwise, i.e., if , and , .
The former follows from the fact that is not reached and the control input for the next stage does not need to be chosen. Under the dummy action , with probability , the system makes a transition to the next state in the sequence satisfying a different set of propositions.
The latter follows from the fact that is reached and the control input for the next stage needs to be chosen. Given a control input the applied control input will be , , with probability , and given a new state trajectory (Eqn. (5)) the first corresponding state will be (see Fig. 3).
If the termination time is reached, we set and . Such a state is called a leaf state.
The model defined above is a valid tree-structured MDP, i.e., it satisfies the Markov property, is a valid transition probability function and each state has exactly one incoming transition.
Proof: The proof follows from the construction of the MDP. Given a current state and an action , the conditional probability distribution of future states depends only on the current state , not on the sequence of events that precede it (see the rules stated above). Thus, the Markov property holds. In addition, since , it follows that is a valid transition probability function. Finally, the fact that M is a tree-structured MDP follows from the following: for each , a unique sequence of states , , is generated. Each state in that sequence has exactly one incoming transition. Thus, according to Def. 1, is a tree-structured MDP.
V PCTL Control Policy Generation
V-A Control policy for the initial PCTL formula
The proposed PCTL control synthesis is an adaptation of the approach from [LAB12]. Specifically, we exploit the tree-like structure of and develop an efficient algorithm for generating a control policy for that maximizes the probability of satisfying a PCTL formula (Eqn. (3)).
Given a tree-structured MDP and a PCTL formula , we are interested in obtaining the control policy that maximizes the probability of satisfying , as well as the corresponding probability value, denoted , where . Specifically, for , is the action to be applied at and is the probability of satisfying at under control policy . To solve this problem we propose the following approach:
Step 1: Solve , i.e., find the set of initial states from which is satisfied with probability greater than or equal to and determine the corresponding control policy . To solve this problem, first, let , and compute the maximizing probabilities . This can be done by dividing into three subsets (states satisfying with probability ), (states satisfying with probability ), and (the remaining states):
where and are the set of states satisfying and , respectively. The computation of maximizing probabilities for the states in can be obtained as a unique solution of the following system:
and the control policy at each state is equal to the action that gives rise to this optimal solution, i.e., , .
In general (i.e., for non-tree-structured MDPs containing cycles), solving Eqn. (6) requires solving a linear programming problem ([BK08, LAB12]). For a tree-structured MDP, the solution can be obtained in a simple fashion: from each leaf state of the MDP, move backwards, visiting parent states until the initial state is reached; at each visited state, perform the maximization from Eqn. (6). The fact that the MDP contains no cycles is sufficient to see that this procedure results in the maximizing probabilities.
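The leaf-to-root sweep described above amounts to backward induction. A minimal sketch on a dictionary-encoded tree MDP (hypothetical encoding; `yes` and `no` play the roles of the sets of states satisfying the Until formula with probability 1 and 0, respectively):

```python
def max_sat_probability(mdp, state, yes, no):
    """Backward induction on a tree-structured MDP: since there are no
    cycles, a single leaf-to-root pass yields the maximum probability of
    reaching a state in `yes` without entering `no`, with no LP required.

    mdp[state] = {action: [(child, prob), ...]}. Returns the maximum
    satisfaction probability and the maximizing action at `state`."""
    if state in yes:
        return 1.0, None
    if state in no or not mdp.get(state):
        return 0.0, None            # leaf that never reaches `yes`
    best_p, best_a = 0.0, None
    for action, succs in mdp[state].items():
        # expected value over children, each child solved recursively
        p = sum(prob * max_sat_probability(mdp, child, yes, no)[0]
                for child, prob in succs)
        if p > best_p:
            best_p, best_a = p, action
    return best_p, best_a
```

Each state is visited once, so the computation is linear in the size of the tree, in contrast to the LP formulation needed for general MDPs.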
The state formula requires reaching a state in by going through states in with probability greater than or equal to . Thus, s.t. we set , and otherwise, i.e., s.t. we set . Finally, , and the set of initial states is
Step 2: Solve , i.e., find the set of initial states from which is satisfied with probability greater than or equal to . To solve this problem, again, begin by solving . Start by dividing into three subsets:
Note that, is the set of states satisfying intersected with . Next, perform the same procedure as in Step 1 for obtaining , and .
Step 3: Repeat Step 2 for , i.e., until , and are obtained where .
By the nature of the PCTL formulas, to ensure the execution of all specified tasks in , we construct a history-dependent control policy of the following form:
Apply policy until a state in is reached. Then, apply policy until a state in is reached. Finally, apply until a state in is reached.
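The phase-switching behavior of this history-dependent policy can be sketched as a closure that advances to the next per-phase policy whenever the current target set is reached (illustrative names):

```python
def history_dependent_policy(policies, target_sets):
    """Compose per-phase policies: apply policies[i] until a state in
    target_sets[i] is reached, then switch to policies[i+1], and so on.
    Returns None once the final target set has been reached."""
    phase = 0
    def act(state):
        nonlocal phase
        if phase < len(target_sets) and state in target_sets[phase]:
            phase += 1              # current sub-task completed
        if phase >= len(policies):
            return None             # all phases done
        return policies[phase](state)
    return act
```

The closure keeps only the phase index as history, which is all the until-sequence structure of the formula requires.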
For the same reason as stated above, , the maximum probability of satisfying , cannot be found directly because it is not known which state in will be reached first. However, since the probability of satisfying from each state in is available, a bound on the probability of satisfying can be defined. The lower and upper bounds are and , where and denote the minimum and maximum probability of satisfying from .
In [CB12] we show that a sequence of measured intervals corresponds to a unique state on the MDP. Thus, the desired vehicle control strategy returns the control input for the next stage by mapping the sequence to the state of the MDP; the control input corresponds to the optimal action, under , at that state.
V-B Control policy for the updated PCTL formula
Next, assume that at the end of stage , for some , is updated into . As noted in the previous subsection, given a sequence of measured intervals, we can follow the vehicle's progress on . We denote the current state as (if it is at the initial state, then ). We develop an efficient algorithm for obtaining , and , that reuses and , and exploits the structure of formulas given by Eqn. (3) and the fact that is a tree-structured MDP.
First, we formally define what it means for to be satisfied up to , . Note that, if under the execution of , is reached, it is guaranteed that