\text{STyLuS}^{*}: A Temporal Logic Optimal Control Synthesis Algorithm for Large-Scale Multi-Robot Systems

# STyLuS∗: A Temporal Logic Optimal Control Synthesis Algorithm for Large-Scale Multi-Robot Systems

Yiannis Kantaros,  and Michael M. Zavlanos,  Yiannis Kantaros and Michael M. Zavlanos are with the Department of Mechanical Engineering and Materials Science, Duke University, Durham, NC 27708, USA. @duke.edu. This work is supported in part by ONR under grant .
###### Abstract

This paper proposes proposes a new highly scalable optimal control synthesis algorithm from linear temporal logic specifications, called for large-Scale optimal Temporal Logic Synthesis, that is designed to solve complex temporal planning problems in large-scale multi-robot systems. Existing planning approaches with temporal logic specifications rely on graph search techniques applied to a product automaton constructed among the robots. In our previous work, we have proposed a more tractable sampling-based algorithm that builds incrementally trees that approximate the state-space and transitions of the synchronous product automaton and does not require sophisticated graph search techniques. Here, we extend our previous work by introducing bias in the sampling process which is guided by transitions in the Bchi automaton that belong to the shortest path to the accepting states. This allows us to synthesize optimal motion plans from product automata with hundreds of orders more states than those that existing optimal control synthesis methods or off-the-shelf model checkers can manipulate. We show that is probabilistically complete and asymptotically optimal and has exponential convergence rate. This is the first time that convergence rate results are provided for sampling-based optimal control synthesis methods. We provide simulation results that show that can synthesize optimal motion plans for very large multi-robot systems which is impossible using state-of-the-art methods.

Temporal logic, optimal control synthesis, formal methods, sampling-based motion planning, multi-robot systems.

## I Introduction

Control synthesis for mobile robots under complex tasks, captured by Linear Temporal Logic (LTL) formulas, builds upon either bottom-up approaches when independent LTL expressions are assigned to robots [1, 2, 3, 4] or top-down approaches when a global LTL formula describing a collaborative task is assigned to a team of robots [5, 6], as in this work. Common in the above works is that they rely on model checking theory [7, 8] to find paths that satisfy LTL-specified tasks, without optimizing task performance. Optimal control synthesis under local and global LTL specifications has been addressed in [9, 10] and [11, 12, 13], respectively. In top-down approaches [11, 12, 13], optimal discrete plans are derived for every robot using the individual transition systems that capture robot mobility and a Non-deterministic Bchi Automaton (NBA) that represents the global LTL specification. Specifically, by taking the synchronous product among the transition systems and the NBA, a synchronous product automaton can be constructed. Then, representing the latter automaton as a graph and using graph-search techniques, optimal motion plans can be derived that satisfy the global LTL specification and optimize a cost function. As the number of robots or the size of the NBA increases, the state-space of the product automaton grows exponentially and, as a result, graph-search techniques become intractable. Consequently, these motion planning algorithms scale poorly with the number of robots and the complexity of the assigned task. A more tractable approach is presented in [14, 15] that identifies independent parts of the LTL formula and builds a local product automaton for each agent. Nevertheless, this approach can be applied only to finite LTL missions and does not have optimality guarantees.

To mitigate these issues, in our previous work we proposed a sampling-based optimal control synthesis algorithm that avoids the explicit construction of the product among the transition systems and the NBA [16]. Specifically, this algorithm builds incrementally directed trees that approximately represent the state-space and transitions among states of the product automaton. The advantage is that approximating the product automaton by a tree rather than representing it explicitly by an arbitrary graph, as existing works do, results in significant savings in resources both in terms of memory to save the associated data structures and in terms of computational cost in applying graph search techniques.

In this work, we propose a new highly scalable optimal control synthesis algorithm from LTL specifications, called STyLuS* for large-Scale optimal Temporal Logic Synthesis, that is designed to solve complex temporal planning problems in large-scale multi-robot systems. In fact, extends the sampling-based synthesis algorithm proposed in [16] by introducing bias in the sampling process. For this, we first exploit the structure of the atomic propositions to prune the NBA by removing transitions that can never be enabled. Then, we define a metric over the state-space of the NBA that captures the shortest path, i.e., the minimum number of feasible transitions, between any two NBA states. Given this metric, we define a probability distribution over the nodes that are reachable from the current tree so that nodes that are closer to the final/accepting states of the NBA are sampled with higher probability; no particular sampling probability is proposed in [16]. We show that introducing bias in the sampling process does not violate the sampling assumptions in [16] so that inherits the same probabilistic completeness and asymptotic optimality guarantees. Moreover, we provide exponential convergence bounds; the first in the field of optimal control synthesis methods. We show that by biasing the sampling process we can synthesize optimal motion plans from product automata with order states and beyond, which is hundreds of orders more states than those that any existing optimal control synthesis algorithms [17, 18, 12, 13, 16] can handle. For example, our algorithm in [16], when implemented with uniform sampling, can optimally solve problems with order states, which are still more than existing methods [17, 18, 12, 13]. Compared to off-the-shelf model-checkers, such as NuSMV [19] and nuXmv [20], that can design feasible but not optimal motion plans, our proposed biased sampling-based algorithm can find feasible plans much faster. NuSMV can solve problems with order states, while nuXmv can handle infinite-state synchronous transition systems but it is slower than . Note that can be implemented in a distributed way, as in our recent work [21], which can further decrease the computational time.

Relevant sampling-based control synthesis methods are also presented in [22, 23]. These methods consider continuous state spaces and employ sampling-based methods to build discrete abstractions of the environment that are represented by graphs of arbitrary structure, e.g., as in [24, 25] for point-to-point navigation. Once these abstractions become expressive enough to generate motion plans that satisfy the LTL specification, graph search methods are applied to the respective PBA to design a feasible path. However, representing the environment using graphs of arbitrary structure compromises scalability of temporal logic planning methods since, as the size of these graphs increases, more resources are needed to save the associated structure and search for optimal plans using graph search methods. While our proposed sampling-based approach assumes that a discrete abstraction of the environment is available, as in [26, 27, 28, 29, 30], it build trees, instead of arbitrary graphs, to approximate the product automaton. Therefore, it is more economical in terms of memory requirements and does not require the application of expensive graph search techniques to find the optimal motion plan. Instead, it tracks sequences of parent nodes starting from desired accepting states. Combined with the proposed biased sampling approach, our method can handle much more complex planning problems with more robots and LTL tasks that correspond to larger NBAs. A more detailed comparison with [22, 23] can be found in [16].

A preliminary version of this work can be found in [31]. In [31] it is shown that the proposed biased sampling process satisfies the assumptions in [16] so that the proposed algorithm inherits the same probabilistic completeness and asymptotic optimality guarantees. Compared to [31], here we additionally provide exponential convergence rate bounds. To the best of our knowledge, this the first sampling-based motion planning algorithm with temporal logic specifications that also has convergence-rate guarantees. Moreover, we provide additional extensive simulation studies that show the effect of bias in sampling in the convergence rate of the algorithm, as well as scalability of our method with respect to the number of robots, the size of the transition systems, and the size of the NBA. We also compare our method to relevant state-of-the-art methods. To the best of our knowledge, this the first optimal control synthesis method for global temporal logic specifications with optimality and convergence guarantees that can be applied to large-scale multi-robot systems.

The rest of the paper is organized as follows. In Section II we present the problem formulation. In Section III we describe our proposed sampling-based planning algorithm and in Section IV we examine its correctness, optimality, and convergence rate. Numerical experiments are presented in Section V.

## Ii Problem Formulation

Consider mobile robots that live in a complex workspace . We assume that there are disjoint regions of interest in . The -th region is denoted by and it can be of any arbitrary shape.111For simplicity of notations we consider disjoint regions . However, overlapping regions can also be considered by introducing additional states to defined in Definition II.1 that capture the presence of robot in more than one region. Given the robot dynamics, robot mobility in the workspace can be represented by a weighted Transition System (wTS). The wTS for robot is defined as follows:

###### Definition II.1 (wTS)

A weighted Transition System (wTS) for robot , denoted by is a tuple where: (a) is the set of states, where a state indicates that robot is at location ; (b) is the initial state of robot ; (c) is the transition relation for robot . Given the robot dynamics, if there is a control input that can drive robot from location to , then there is a transition from state to denoted by ; (d) is a cost function that assigns weights/costs to each possible transition in wTS. For example, such costs can be associated with the distance that needs to be traveled by robot in order to move from state to state ; (e) is the set of atomic propositions, where is true if robot is inside region and false otherwise; and (f) is an observation/output function defined as , for all .

Given the definition of the wTS, we can define the synchronous Product Transition System (PTS) as follows [12]:

###### Definition II.2 (Pts)

Given transition systems , the product transition system is a tuple where (a) is the set of states; (b) is the initial state, (c) is the transition relation defined by the rule , where with slight abuse of notation , . The state is defined accordingly. In words, this transition rule says that there exists a transition from to if there exists a transition from to for all ; (d) is a cost function that assigns weights/cost to each possible transition in PTS, defined as , where , and stands for the projection of state onto the state space of . The state is obtained by removing all states in that do not belong to ; (e) is the set of atomic propositions; and, (f) is an observation/output function giving the set of atomic propositions that are satisfied at a state .

In what follows, we give definitions related to the PTS that we will use throughout the rest of the paper. An infinite path of a PTS is an infinite sequence of states, such that , , and , , where is an index that points to the -th entry of denoted by . The trace of an infinite path of a PTS, denoted by , where denotes infinite repetition, is an infinite word that is determined by the sequence of atomic propositions that are true in the states along , i.e., . A finite path of a PTS can be defined accordingly. The only difference with the infinite path is that a finite path is defined as a finite sequence of states of a PTS. Given the definition of the weights in Definition II.2, the cost of a finite path , denoted by , can be defined as

 ^J(τ)=|τ|−1∑k=1wPTS(τ(k),τ(k+1)), (1)

where, stands for the number of states in . In words, the cost (1) captures the total cost incurred by all robots during the execution of the finite path .

We assume that the robots have to accomplish a complex collaborative task captured by a global LTL statement defined over the set of atomic propositions . Due to space limitations, we abstain from formally defining the semantics and syntax of LTL. A detailed overview can be found in [7]. Given an LTL formula , we define the language , where is the satisfaction relation, as the set of infinite words that satisfy the LTL formula . Any LTL formula can be translated into a Nondeterministic Bchi Automaton (NBA) over denoted by defined as follows [32].

###### Definition II.3 (Nba)

A Nondeterministic Bchi Automaton (NBA) over is defined as a tuple , where is the set of states, is a set of initial states, is an alphabet, is the transition relation, and is a set of accepting/final states.

Given the PTS and the NBA that corresponds to the LTL , we can now define the Product Bchi Automaton (PBA) , as follows [7]:

###### Definition II.4 (Pba)

Given the product transition system and the NBA , we can define the Product Bchi Automaton as a tuple where (a) is the set of states; (b) is a set of initial states; (c) is the transition relation defined by the rule: . Transition from state to , is denoted by , or ; (d) , where and ; and (e) is a set of accepting/final states.

Given and the PBA an infinite path of a PTS satisfies if and only if , which is equivalently denoted by . Specifically, if there is a path satisfying , then there exists a path that can be written in a finite representation, called prefix-suffix structure, i.e., , where the prefix part is executed only once followed by the indefinite execution of the suffix part . The prefix part is the projection of a finite path that lives in onto . The path starts from an initial state and ends at a final state , i.e., it has the following structure with . The suffix part is the projection of a finite path that lives in onto . The path is a cycle around the final state , i.e., it has the following structure , where . Then our goal is to compute a plan , where stands for the projection on the state-space , so that the following objective function is minimized

 J(τ)=β^J(τpre)+(1−β)^J(τsuf), (2)

where and stand for the cost of the prefix and suffix part, respectively and is a user-specified parameter. Specifically, in this paper we address the following problem.

###### Problem 1

Given a global LTL specification , and transition systems , for all robots , determine a discrete team plan that satisfies , i.e., , and minimizes the cost function (2).

### Ii-a A Solution to Problem 1

Problem 1 is typically solved by applying graph-search methods to the PBA. Specifically, to generate a motion plan that satisfies , the PBA is viewed as a weighted directed graph , where the set of nodes is indexed by the set of states , the set of edges is determined by the transition relation , and the weights assigned to each edge are determined by the function . Then, to find the optimal plan , shortest paths towards final states and shortest cycles around them are computed. More details about this approach can be found in [9, 12, 13, 17] and the references therein.

## Iii Sampling-based Optimal Control Synthesis

In this section, we build upon our previous work [16] and propose a biased sampling-based optimal control synthesis algorithm that can synthesize optimal motion plans in prefix-suffix structure, i.e., , that satisfy a given global LTL specification from PBA with arbitrarily large state-space. The procedure is based on the incremental construction of a directed tree that approximately represents the state-space and the transition relation of the PBA defined in Definition II.4. In what follows, we denote by the tree that approximately represents the PBA . Also, we denote by the root of . The set of nodes contains the states of that have already been sampled and added to the tree structure. The set of edges captures transitions between nodes in , i.e., , if there is a transition from state to state . The function assigns the cost of reaching node from the root of the tree. In other words, , where and is the path in the tree that connects the root to .

The construction of the prefix and the suffix part is described in Algorithm 1. In lines 1-1, first the LTL formula is translated to a NBA and then is pruned by removing transitions that can never happen. Then, in lines 1-1, the prefix parts are constructed, followed by the construction of their respective suffix parts in lines 1-1. Finally, using the constructed prefix and suffix parts, the optimal plan is synthesized in lines 1-1.

### Iii-a Feasible Words

In this section, given the NBA that corresponds to the assigned LTL formula , we define a function that returns the minimum number of feasible NBA transitions that are required to reach a state starting from a state [lines 1- 1, Alg. 1]. This function will be used in the construction of the prefix and suffix parts to bias the sampling process. A feasible NBA transition is defined as follows; see also Figure 1.

###### Definition III.1 (Feasible NBA transitions)

A transition is feasible if the finite word is a feasible word, i.e., if can be generated by the PTS defined in Definition II.2.

To characterize the words that are feasible, we first need to define the words that are feasible, i.e, the words that can be generated by defined in Definition II.1.

###### Definition III.2 (Feasible words σ∗i∈Σi)

A word is feasible if and only if , where is a Boolean formula defined as

 binfi=∨∀rj(∨∀re(πrji∧πrei)). (3)

Note that the Boolean formula is satisfied by any finite word that requires robot to be present in two or more disjoint regions, simultaneously. For instance, the word satisfies .222Note that if we consider regions that are not necessarily disjoint, then the Boolean formula is defined as . Note also that definition of infeasible words depends on the problem at hand, i.e, the definition of atomic propositions included in the sets . Next, we define the feasible words .

###### Definition III.3 (Feasible words σ∗∈Σ)

A word is feasible if and only if , for all robots , where , is defined in (3), and stands for the projection of the word onto .333For instance, .

To define the proposed function , we first construct sets that collect feasible finite words that enable a transition from a state to according to [line 1, Alg. 1]. To construct these sets, sets that collect all finite (feasible or infeasible) words that enable a transition from to , for all , are required.444Note that the sets can be computed during the translation of an LTL formula to a NBA; see e.g., the software package [33] that relies on [34] for the construction of the NBA. Then, the sets can be constructed by removing from all words that are not feasible, for all .

Next, we view the NBA as a directed graph , where the set of nodes is indexed by the states and the set of edges collects the edges from nodes/states to denoted by , where exists if [line 1, Alg. 1]. Assigning weights equal to one to all edges in the set , we define the function as

 d(qB,q′B)={|SPqB,q′B|,if SPqB,q′B exists,∞,         otherwise, (4)

where denotes the shortest path in from to and stands for its cost., i.e., the number of transitions/edges in .

### Iii-B Construction of Optimal Prefix Parts

In this section we describe the construction of the tree that will be used for the synthesis of the prefix part [lines 1-1, Alg. 1]. Since the prefix part connects an initial state to an accepting state , with , we can define the goal region for the tree , as [line 1, Alg. 1]:

 Xpregoal={qP=(qPTS, qB)∈QP | qB∈QFB}. (5)

The root of the tree is an initial state of the PBA and the following process is repeated for each initial state , in parallel [line 1-1, Alg. 1]. The construction of the tree is described in Algorithm 2 [line 1, Alg. 1]. In line 1 of Algorithm 1, stands for the -th initial state assuming an arbitrary enumeration of the elements of the set . The set initially contains only the root , i.e., an initial state of the PBA [line 2 , Alg. 2] and, therefore, the set of edges is initialized as [line 2, Alg. 2]. By convention, we assume that the cost of is zero [line 2, Alg. 2]. Given the root we select a feasible final state , such that (i) and (ii) . Among all final states that satisfy both (i) and (ii), we select one randomly denoted by [line 2, Alg. 2]. If there does not exist such a state , then this means that there is no prefix-suffix plan associated with the initial state . In this case, the construction of the tree stops without having detected any final states around which a loop exists [lines 2-2, Alg. 2]. The final state will be used in the following subsection in order to bias the exploration of the PBA towards this state. We also define the set that collects the nodes that have the minimum distance among all nodes in , i.e.,

 Dmin={qP=(qPTS,qB)∈VT | d(qB,qF,feasB)=dmin}, (6)

where and stands for the projection of all states onto . The set initially collects only the root [line 2, Alg. 2].

#### Iii-B1 Sampling a state qnewP∈QP

The first step in the construction of the graph is to sample a state from the state-space of the PBA. This is achieved by a sampling function Sample; see Algorithm 3. Specifically, we first create a state , where is sampled from a given discrete distribution and stands for the projection of onto the state-space of the PTS [line 3, Alg. 3]. The probability density function defines the probability of selecting the state as the state at iteration of Algorithm 2 given the set . The distribution is defined as follows:

 frand(qP|VT,Dmin)=⎧⎪⎨⎪⎩prand1|Dmin|,            if qP∈Dmin(1−prand)1|VT∖D% min|, otherwise, (7)

where stands for the probability of selecting any node to be . Note that can change with iterations but it should always satisfy so that states are selected more often to be [line 3, Alg. 3].

###### Remark III.4 (Density function frand)

Observe that the discrete density function in (7) is defined in a uniform-like way, since all states in the set are selected with probability while the states in are selected with probability , where . However, alternative definitions for are possible as long as (a) satisfies Assumptions 4.1(i)-(iii) made in [16]; and (b) is biased to generate states that belong to more often than states that belong to . Assumptions 4.1(i) and 4.1(iii) in [16] are required to guarantee that the proposed algorithm is probabilistically complete while Assumptions 4.1(i)-(iii) are required for the asymptotic optimality of the proposed method; see Section IV. The fact that the density function (7) satisfies Assumption 4.1 in [16] is shown in Section IV. Finally, bias in the sampling increases scalability of the proposed control synthesis method; see Section V. Note that, as it will be discussed later, can also change with iterations of Algorithm 2.

Given , in our previous work [16], we sample a state from a discrete distribution so that is reachable from . Here, we sample a state so that it is both reachable from and also it can lead to a final state by following the shortest path in that connects to . The proposed biased sampling process is illustrated in Figure 2.

##### Selection of qminB∈QB

First, given , we first construct the reachable set that collects all states that can be reached in one hop in from given the observation (see also Figure 2) [line 3, Alg. 3], i.e.,

 RB(qrandB)={qB∈QB | (qrandB,LPTS(q