Simultaneous Task Allocation and Planning Under Uncertainty

Fatma Faruq, Bruno Lacerda, Nick Hawes and David Parker
School of Computer Science, University of Birmingham, {fxf603|d.a.parker}@cs.bham.ac.uk
Oxford Robotics Institute, University of Oxford, {bruno|nickh}@robots.ox.ac.uk
Abstract

We propose novel techniques for task allocation and planning in multi-robot systems operating in uncertain environments. Task allocation is performed simultaneously with planning, which provides more detailed information about individual robot behaviour, but also exploits the independence between tasks to do so efficiently. We use Markov decision processes to model robot behaviour and linear temporal logic to specify tasks and safety constraints. Building upon techniques and tools from formal verification, we show how to generate a sequence of multi-robot policies, iteratively refining them to reallocate tasks if individual robots fail, and providing probabilistic guarantees on the performance (and safe operation) of the team of robots under the resulting policy. We implement our approach and evaluate it on a benchmark multi-robot example.

I Introduction

In many service robot applications, such as intra-logistics, surveillance or stock monitoring, it is desirable for a collection of tasks to be allocated to a team of robots. In this paper, we address applications such as these where tasks are independent (there are no inter-task dependencies) and each task only requires a single robot to complete it. Most existing approaches for solving this class of problems divide the problem into separate task allocation (TA) and planning processes. TA determines which robot should complete which tasks, and planning determines how each task, or conjunction of tasks, should be completed. This separation is made to reduce the computational complexity of the problem. It allows each robot to plan separately for its own task set, avoiding the need for a joint planning model which is typically exponential in the number of team members.

This separation also allows specialised algorithms to be used for the TA and planning parts, increasing the efficiency with which the task-directed behaviour of the team can be generated. When doing this, TA usually assumes a greatly simplified model of planning in order to be able to efficiently compute allocations. However, this separation also means that the TA process cannot be informed by the plans of the individual robots, which prevents it from exploiting opportunities, or avoiding hindrances, that are only evident once planning has been performed. For example, if the individual robots plan with time-based models, a task may be much quicker to complete at a particular time of day, but with TA separated from planning, this information cannot be exploited in the allocation process.

To address this limitation, recent work has considered the problem of simultaneous task allocation and planning (STAP) [1], which solves the complete problem in a single process, and can therefore take the plans of each robot (and their costs, etc.) into account during the allocation process. In this paper, we build upon the STAP approach and make the following contributions. We present the first formalisation of simultaneous task allocation and planning under uncertainty (STAPU), and a method for solving this problem using techniques from formal verification. Our formalisation explicitly includes the possibility of robot failures, so we also contribute an extension of the approach to deal with reallocation of tasks when a robot fails.

Individual robots’ capabilities and environments are described using Markov decision processes (MDPs). The set of tasks to be completed by the team of robots is formally specified using linear temporal logic (LTL). More precisely, tasks are defined in the co-safe fragment of LTL. Furthermore, a safe LTL formula is provided to specify safety constraints to be obeyed by all robots. Using techniques and tools from formal verification, more specifically from probabilistic verification, we propose methods to generate multi-robot policies that maximise the probability of successful task completion and minimise costs. We tackle the basic task allocation and planning problem using a team MDP that exploits the independence between tasks and adopts a sequential modelling approach. We then iteratively improve the joint policy, by incorporating the possibility of reallocating tasks in the event of individual robot failures. In the latter phase, we construct and solve a precise joint-agent model of the synchronised execution of the generated sequential policies, yielding probabilistic guarantees on the performance (and safe operation) of the team of robots. We implement our approach as an extension of the widely used probabilistic verification tool PRISM [2], and evaluate its performance on a benchmark multi-robot example.

II Related Work

When looking at the existing literature in this area, we can consider the following distinctions: 1) multi-agent path finding (MAPF) approaches, which have rich models of inter-agent spatial interactions but can only solve path planning problems, versus more general planning approaches which can reason about a greater range of tasks; 2) planning approaches which explicitly model uncertainty, versus those with deterministic models; 3) approaches which produce solutions using verification-based methods (and thus can produce guaranteed behaviour), versus other solution approaches; and 4) approaches which integrate task allocation and planning, versus those that separate these processes.

In robotics, MAPF is a widely-studied problem which is closely related to most of the STAP instances in the literature. Starting from approaches which do not consider uncertainty, MAPF solutions (e.g. [3, 4, 5]) typically focus on ensuring efficient, collision-free movement of a robot team through an environment. By focusing purely on path finding, domain-specific heuristics and algorithms can be used to efficiently solve larger problem instances than would otherwise be possible. A more general class of problems is multi-agent planning, which allows robot actions to have preconditions and effects across state variables in the problem, and can therefore represent a wider range of robot tasks [6, 7]. Within the planning community, multi-agent planning approaches typically focus on problems where interactions and coordination between agents are required to solve a single task (e.g. one robot needs to place an object on to a second robot), but these works do not often involve explicit allocation of tasks.

Existing TA approaches operate over both MAPF [8, 9] and more general planning problems [10, 11]. When a single agent has multiple tasks, TA becomes over-subscription planning [12]. When we integrate multiple (independent or otherwise) planning agents into a single problem via a separate TA process (e.g. [9, 13]), TA can only use fixed models of agent behaviour created as abstractions of the planning process. To overcome this, a small number of approaches combine multi-agent TA and planning into a single problem [1, 14, 15, 16].

Of particular relevance to this paper is the work described in [1]. They use a logical model of robot operation plus a team task specification in LTL, and propose an algorithm which allocates tasks to robots in order to minimise the maximum cost any agent incurs to complete its tasks. Combining TA with planning allows TA to reason directly about how each agent can perform a task, but introduces the complexity of reasoning in a space which grows exponentially in the number of agents. To overcome this, the authors propose an approach in which separate planning models for each agent are linked sequentially by switch transitions, which allow one agent to pass tasks to the next agent. They exploit these transitions to produce multi-agent plans which allocate tasks across agents so as to optimise the aforementioned metric, i.e., the largest single-agent cost. This paper takes that work as a basis, but extends it to include uncertainty in the effects of robot actions. In multi-robot systems, if uncertainty is not modelled, it is usually dealt with sub-optimally through execution monitoring and replanning.

The various planning fields surveyed above also have analogues which include uncertainty. MAPF approaches have included uncertainty to account for the reliability of mobile robot localisation and navigation [17, 18, 19]. Many single agent planning approaches assume that the environment is fully observable but responds probabilistically to robot actions and thus formulate planning problems using MDPs [20, 21]. Approaches in this space include our prior work [22, 23] which uses verification-based methods to produce probabilistically-guaranteed behaviour policies for a mobile robot, where elements of the MDP are learnt from experience. By generating policies to meet a formal LTL specification the behaviour of a robot system can be fully characterised in advance, and users can control the behaviour simply by altering this specification (as opposed to modifying reward functions etc.). When extending MDP planning approaches to multi-agent settings, authors either assume some (limited) communication between robots in order to maintain the full observability of the problem [24, 25, 13], or otherwise use the computationally-demanding decentralised, partially-observable MDP (DecPOMDP) formalisation which accounts for the unknown state of other agents in the problem [26]. As is appropriate in many service robot domains, we make the assumption of perfect communication, thus allowing this work to retain the MDP formalisation.

Whilst there is work on TA under uncertainty [27], there is very little existing literature which addresses STAP under uncertainty. [14] deals with TA and MAPF under uncertainty in a warehouse environment, but focuses on scalability and thus only addresses pathfinding and uses an approximate solution method. [16] ignores uncertainty, but uses an online execution policy to ensure robustness. In contrast, the work in this paper allows for rich LTL task specifications, and solves the problem exactly to provide probabilistic guarantees over the resulting behaviour.

III Preliminaries

III-A Markov Decision Processes

We use Markov decision processes (MDPs) to model the stochastic evolution of robots and their environment. An MDP is a tuple $M = \langle S, \bar{s}, A, \delta_M, AP, \mathit{Lab} \rangle$, where: $S$ is a finite set of states; $\bar{s} \in S$ is the initial state; $A$ is a finite set of actions; $\delta_M : S \times A \times S \to [0, 1]$ is a probabilistic transition function, where for all $s \in S$ and $a \in A$, $\sum_{s' \in S} \delta_M(s, a, s') \in \{0, 1\}$; $AP$ is a set of atomic propositions; and $\mathit{Lab} : S \to 2^{AP}$ is a labelling function, such that $p \in \mathit{Lab}(s)$ iff $p$ is true in $s \in S$.

We define the set of enabled actions in $s$ as $A(s) = \{a \in A \mid \delta_M(s, a, s') > 0 \text{ for some } s' \in S\}$. An infinite path through an MDP is a sequence $\sigma = s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \dots$ where $\delta_M(s_i, a_i, s_{i+1}) > 0$ for all $i \in \mathbb{N}$. A finite path is a prefix of an infinite path. The choice of action to take at each step of the execution of an MDP is made by a policy, which can base its decision on the history of $M$ up to the current state. Formally, a policy is a function $\pi$ from finite paths of $M$ to actions in $A$ such that, for any finite path $\sigma$ ending in state $s_n$, $\pi(\sigma) \in A(s_n)$. In this work, we will use memoryless policies (which only base their choice on the current state) and finite-memory policies (which need to track only a finite set of “modes”).
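To make these definitions concrete, the following is a minimal Python sketch of the MDP formalism and of a memoryless policy; the class, field and proposition names are illustrative and not part of our implementation.

    from dataclasses import dataclass

    @dataclass
    class MDP:
        """Minimal MDP: states, initial state, actions, probabilistic
        transitions, and a labelling with atomic propositions."""
        states: set
        initial: str
        actions: set
        transitions: dict   # (s, a) -> {s': probability}
        label: dict         # s -> set of atomic propositions true in s

        def enabled(self, s):
            """Actions enabled in s (those with at least one successor)."""
            return {a for a in self.actions if self.transitions.get((s, a))}

    # A tiny two-state example: action 'go' succeeds with probability 0.9.
    mdp = MDP(
        states={"s0", "s1"},
        initial="s0",
        actions={"go", "stay"},
        transitions={
            ("s0", "go"): {"s1": 0.9, "s0": 0.1},
            ("s0", "stay"): {"s0": 1.0},
            ("s1", "stay"): {"s1": 1.0},
        },
        label={"s0": set(), "s1": {"goal"}},
    )

    # A memoryless policy maps each state to one of its enabled actions.
    policy = {"s0": "go", "s1": "stay"}
    assert policy["s0"] in mdp.enabled("s0")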

III-B Linear Temporal Logic

Linear temporal logic (LTL) is an extension of propositional logic which allows reasoning about infinite sequences of states. We provide a brief overview of LTL and its safe/co-safe fragments here, and direct the reader to [28] and [29], respectively, for a more complete introduction to these two topics. LTL formulas $\varphi$ over a set of atomic propositions $AP$ are defined using the following grammar:

$\varphi ::= \mathrm{true} \mid p \mid \varphi \wedge \varphi \mid \neg\varphi \mid \mathrm{X}\,\varphi \mid \varphi\,\mathrm{U}\,\varphi$, where $p \in AP$.

The operator $\mathrm{X}$ is read “next”, meaning that the formula it precedes will be true in the next state. The operator $\mathrm{U}$ is read “until”, meaning that its second argument will eventually become true in some state, and the first argument will be continuously true until this point. The other propositional connectives can be derived from the ones above in the usual way. Moreover, other useful LTL operators can be derived from the ones above. Of particular interest for our work are the “eventually” operator $\mathrm{F}$, which requires that $\varphi$ is satisfied in some future state, and the “always” operator $\mathrm{G}$, which requires $\varphi$ to be satisfied in all future states: $\mathrm{F}\,\varphi \equiv \mathrm{true}\,\mathrm{U}\,\varphi$ and $\mathrm{G}\,\varphi \equiv \neg\mathrm{F}\,\neg\varphi$.

Given an infinite path $\sigma$, we write $\sigma \models \varphi$ to denote that $\sigma$ satisfies formula $\varphi$. Furthermore, we write $Pr^{\max}_{M}(s \models \varphi)$ to denote the maximum probability (over all policies) of satisfying $\varphi$ from state $s$ in MDP $M$. The semantics of full LTL is defined over infinite paths. However, in this work, we are interested in specifying events that occur within finite time. So, we use two well-known subsets of LTL for which properties are meaningful when evaluated over finite paths: safe and co-safe LTL. These are based on the notions of bad prefix and good prefix. A bad prefix for $\varphi$ is a finite path that cannot be extended in such a way that $\varphi$ is satisfied, and a good prefix for $\varphi$ is a finite path that cannot be extended in such a way that $\varphi$ is not satisfied. Safe LTL is defined as the set of LTL formulas for which all non-satisfying infinite paths have a finite bad prefix. Conversely, co-safe LTL is the set of LTL formulas for which all satisfying infinite paths have a finite good prefix.

For simplicity, we assume a syntactic restriction for safe and co-safe LTL. We assume all formulas to be in positive normal form (negation can only appear next to atomic propositions). Syntactically safe LTL is then the set of formulas in which only the $\mathrm{X}$ and $\mathrm{G}$ temporal operators occur, and syntactically co-safe LTL is the set of formulas in which only the $\mathrm{X}$, $\mathrm{F}$ and $\mathrm{U}$ temporal operators occur.

For any (co-)safe LTL formula $\varphi$ written over $AP$, we can build a deterministic finite automaton (DFA) $A_{\varphi} = \langle Q, \bar{q}, Q_F, 2^{AP}, \delta_{A_{\varphi}} \rangle$, where: $Q$ is a finite set of states; $\bar{q} \in Q$ is the initial state; $Q_F \subseteq Q$ is the set of accepting (i.e., final) states; $2^{AP}$ is the alphabet; and $\delta_{A_{\varphi}} : Q \times 2^{AP} \to Q$ is a transition function. If $\varphi$ is safe, $A_{\varphi}$ is a DFA that accepts exactly the finite paths (or, more precisely, the sequences of state labellings from those paths) that are a bad prefix for $\varphi$. Conversely, if $\varphi$ is co-safe, $A_{\varphi}$ is a DFA that accepts exactly the finite paths that are a good prefix for $\varphi$ [29].
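As an illustration, a DFA for the co-safe formula $\mathrm{F}\, goal$ needs only two states. The following Python sketch (with illustrative names, not tied to any particular tool) accepts exactly the label sequences that are good prefixes of that formula:

    # Hypothetical two-state DFA for the co-safe formula F goal: it accepts
    # exactly the finite label sequences that contain 'goal', i.e. the good
    # prefixes of the formula.
    DFA_F_GOAL = {
        "initial": "q0",
        "accepting": {"q1"},
        # transition function over sets of atomic propositions
        "delta": lambda q, labels: "q1" if (q == "q1" or "goal" in labels) else "q0",
    }

    def is_good_prefix(dfa, label_sequence):
        """Run the DFA over the labels of a finite path and report acceptance."""
        q = dfa["initial"]
        for labels in label_sequence:
            q = dfa["delta"](q, labels)
        return q in dfa["accepting"]

    print(is_good_prefix(DFA_F_GOAL, [set(), {"goal"}, set()]))  # True
    print(is_good_prefix(DFA_F_GOAL, [set(), set()]))            # False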

III-C Optimal Policies for (Co-)Safe Specifications

For any (co-)safe LTL specification $\varphi$ and MDP $M$, we can build a product MDP $M \otimes A_{\varphi}$. $M \otimes A_{\varphi}$ behaves like the original MDP $M$, but is augmented with information about the satisfaction of $\varphi$. Once a path of $M \otimes A_{\varphi}$ reaches an accepting state (i.e., a state of the form $(s, q)$ with $q \in Q_F$), it corresponds to a bad prefix for $\varphi$ if $\varphi$ is safe, or a good prefix if it is co-safe. We then know that $\varphi$ is not satisfied, or satisfied, respectively. The construction of the product MDP is well known (see, e.g., [30]) and is such that it preserves the probabilities of paths from $M$. So we can reduce, for example, the problem of finding an optimal policy for a co-safe $\varphi$ to a reachability problem in the product MDP $M \otimes A_{\varphi}$. Optimal policies are memoryless in $M \otimes A_{\varphi}$, and thus finite-memory in $M$, with at most $|Q|$ modes.
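A minimal sketch of this product construction, reusing the illustrative MDP and DFA structures above (this is the idea of the construction, not the implementation used in PRISM):

    def product_mdp(mdp, dfa):
        """Sketch of the product M (x) A_phi: a product state is a pair
        (s, q); the DFA component is updated with the label of the successor
        MDP state, so accepting product states mark the relevant prefixes."""
        q0 = dfa["delta"](dfa["initial"], mdp.label[mdp.initial])
        initial = (mdp.initial, q0)
        states, trans, frontier = set(), {}, [initial]
        while frontier:
            s, q = frontier.pop()
            if (s, q) in states:
                continue
            states.add((s, q))
            for a in mdp.enabled(s):
                dist = {}
                for s2, p in mdp.transitions[(s, a)].items():
                    succ = (s2, dfa["delta"](q, mdp.label[s2]))
                    dist[succ] = dist.get(succ, 0.0) + p
                    frontier.append(succ)
                trans[((s, q), a)] = dist
        accepting = {(s, q) for (s, q) in states if q in dfa["accepting"]}
        return states, initial, trans, accepting

    # e.g. product_mdp(mdp, DFA_F_GOAL), using the illustrative objects above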

IV Simultaneous Task Allocation and Planning under Uncertainty (STAPU)

IV-A Problem Formulation

Let $R = \{r_1, \dots, r_n\}$ be a set of robots (agents). The operation of each individual robot $r_i \in R$ as it attempts to perform tasks is modelled by an MDP $M_i = \langle S_i, \bar{s}_i, A_i, \delta_i, AP, \mathit{Lab}_i \rangle$. Probabilities in the MDP may represent either uncertainty in its environment or the possibility of failure. For the latter, we assume that $M_i$ has a designated failure state from which, once reached, the robot cannot execute more tasks.

We assume a mission $\Phi = \{\varphi_1, \dots, \varphi_m, \varphi_{safe}\}$, where $\varphi_1, \dots, \varphi_m$ is a set of independent co-safe LTL task specifications and $\varphi_{safe}$ is a safety specification: a formula of safe LTL independent of all $\varphi_k$. We assume the mission fulfils the two decomposition properties used in [1]: the LTL formulas must be independent, i.e., (non-)satisfaction of one specification must not violate any other specification in the mission; and completion of all mission specifications implies completion of the overall mission (defined as the conjunction of all specifications).
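As a concrete illustration of such a mission (the proposition names here are ours, chosen to match the style of the benchmark in Section VI, and are not fixed by the formulation), consider:

$\varphi_1 = \mathrm{F}\, goal_1, \qquad \varphi_2 = \mathrm{F}\, goal_2, \qquad \varphi_{safe} = \mathrm{G}\, \neg hazard$

Each $\varphi_k$ requires some robot to eventually reach a location labelled $goal_k$; the two tasks can be completed independently and by different robots, and $\varphi_{safe}$ requires every robot to always avoid states labelled $hazard$, so the decomposition properties above hold.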

We define a simultaneous task allocation and planning under uncertainty (STAPU) problem as finding a task allocation mapping $\alpha : \{1, \dots, m\} \to R$ such that:

$\alpha \in \arg\max_{\alpha'} \prod_{r_i \in R} Pr^{\max}_{M_i}\big(\bar{s}_i \models \varphi_{safe} \wedge \varphi^{\alpha'}_i\big)$   (1)

where the LTL specification $\varphi^{\alpha}_i$ is defined as the conjunction of the tasks allocated to robot $r_i$ under task allocation $\alpha$:

$\varphi^{\alpha}_i = \bigwedge_{k \in \{1, \dots, m\} : \alpha(k) = r_i} \varphi_k$   (2)

We also need to compute the corresponding optimal policies for the MDPs $M_i$. Since we assume task independence, solving a STAPU problem effectively amounts to finding a joint policy (i.e., an allocation of tasks and the actions performed by each robot) that maximises the probability of the team achieving the mission. For now, we assume that a task that is in progress when a robot fails is never completed; we will see how to deal with this situation in Section V.

IV-B Solution

In order to solve the problem described above efficiently, we extend the approach proposed in [1], which exploits the assumption that tasks have no interdependencies and can each be completed by a single robot. Although tasks will ultimately be executed by the robots in parallel, a single STAPU instance is solved by ignoring interactions between robots. This uses a sequential model in which we consider each robot independently in turn, avoiding construction of a fully synchronised team model (whose size is exponential in the number of robots).

Each independent model is a local product MDP, encoding the dynamics of an individual robot, the definitions of the tasks, and the extent to which they have so far been completed. These models are joined into a team MDP through the use of switch transitions, which represent changes in the allocation of a task from robot $r_i$ to $r_{i+1}$ (the next robot in the sequential model). For the STAPU problem, we add switch transitions from every state in robot $r_i$'s model where it completes a task to every initial state of $r_{i+1}$. This next robot has an initial state for every possible combination of allocated tasks. Considered sequentially, this model allows each robot a choice, on task completion, of whether it or the subsequent robot should tackle the next task. When the model is solved to create a team policy, the switch transitions result in a policy which encodes the optimal (for a given solution metric) task allocation across the team, along with the optimal action choices for each task, i.e. a solution to the STAPU problem. We formalise these steps below.

IV-B1 Local Product MDPs

The first step of the approach is encoding the task definitions into each robot model $M_i$. To do so, we build the product MDP $M_i^{\otimes} = M_i \otimes A_{\varphi_{safe}} \otimes A_{\varphi_1} \otimes \dots \otimes A_{\varphi_m}$, i.e., we include a separate DFA for each LTL formula making up the mission specification $\Phi$. The state space of the resulting MDP is $S_i \times Q_{safe} \times Q_1 \times \dots \times Q_m$, allowing us to keep track of the state of satisfaction of each part of the mission specification separately. Finding a policy that maximises the probability of achieving a subset of the mission tasks can be done by calculating the policy that maximises the probability, in the product MDP $M_i^{\otimes}$, of reaching the accepting states of the corresponding DFAs.
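This reachability problem can be solved by standard value iteration over the product. The following is a Python sketch of maximum-reachability value iteration; PRISM provides exact implementations of this, and the sketch only assumes the transition format of the earlier illustrative code:

    def max_reach_probability(states, trans, accepting, eps=1e-8):
        """Value iteration for the maximum probability of reaching an
        accepting state, together with a memoryless policy attaining it.
        trans[(s, a)] is a dictionary of successor probabilities, as in the
        earlier sketches."""
        actions = {}
        for (s, a) in trans:
            actions.setdefault(s, []).append(a)
        value = {s: (1.0 if s in accepting else 0.0) for s in states}
        policy = {}
        while True:
            delta = 0.0
            for s in states:
                if s in accepting or s not in actions:
                    continue
                best_v, best_a = value[s], policy.get(s)
                for a in actions[s]:
                    v = sum(p * value[t] for t, p in trans[(s, a)].items())
                    if v > best_v:
                        best_v, best_a = v, a
                delta = max(delta, best_v - value[s])
                value[s], policy[s] = best_v, best_a
            if delta < eps:
                return value, policy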

IV-B2 Team MDP

The team MDP is the union of the local product MDPs $M_1^{\otimes}, \dots, M_n^{\otimes}$. As mentioned above, instead of building a fully synchronised joint model of the team, we build upon the approach of [1] and construct a team MDP that represents each robot sequentially. More precisely, we define $M_T = \langle S_T, \bar{s}_T, A_T, \delta_T \rangle$ where:

  • $S_T = \{(i, s) \mid i \in \{1, \dots, n\},\ s \in S_i^{\otimes}\}$ keeps track of the robot currently being considered and its current state (within its local product MDP);

  • $\bar{s}_T = (1, \bar{s}_1^{\otimes})$, i.e., we start planning for robot $r_1$, with it in the initial state of its local product MDP;

  • $A_T = A_1 \cup \dots \cup A_n \cup \{\zeta\}$, i.e., the action space comprises the individual robot actions plus a special switch action $\zeta$ that will be used to make the planning process move from allocating tasks for $r_i$ to allocating tasks for $r_{i+1}$;

  • For $a \in A_i$, the transition function mirrors the corresponding local product transition function: $\delta_T((i, s), a, (i, s')) = \delta_i^{\otimes}(s, a, s')$.

    For the switch action $\zeta$, the transition function updates the system state such that planning can start for the next robot, i.e., $\delta_T((i, s), \zeta, (j, s')) = 1$ if all of the following conditions hold:

    • $j = (i \bmod n) + 1$, i.e., we connect the robots in a ring topology;

    • The state of all the tasks is preserved and we do not switch during the execution of a task, i.e., all the DFA components of $s$ and $s'$ are equal, and each of them is either the initial state or an accepting state of the corresponding DFA $A_{\varphi_k}$;

    • The MDP component of $s'$ is the initial state $\bar{s}_j$ of robot $r_j$'s MDP, i.e., $s'$ corresponds to robot $r_j$ starting from its initial MDP state with the task (DFA) components carried over from $s$.

    For all other pairs of states, $\delta_T$ is 0. We omit details of the atomic propositions and labelling function of $M_T$, which are not required here.

In the team MDP above, maximising the probability of reaching an accepting state for all DFA components is equivalent to solving the STAPU problem, as defined in IV-A. This follows from the independent task and non-interacting robots assumptions. More concretely, any allocation of tasks to robots is valid and the order in which tasks are completed is not relevant, due to task independence. Furthermore, we can plan for each robot ignoring the state of the other robots due to the non-interacting robot assumption. Given that the team MDP encompasses all possible task allocations and all possible ways a robot might complete each task, by solving it optimally we in fact find a sequence of policies that solve the basic STAPU problem optimally when executed jointly in a parallel fashion.
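The following sketch illustrates how the switch transitions above can be generated. The state layout (a team state written as a tuple of robot index, MDP state and one DFA state per task) and the names local_states, dfas and initial_robot_state are assumptions made for illustration; this is not the encoding used in our PRISM implementation:

    def add_switch_transitions(team_trans, local_states, dfas, num_robots,
                               initial_robot_state):
        """Sketch: add deterministic switch transitions from robot i to robot
        i+1 in a ring.  A team state is written (i, s, q_1, ..., q_m); a
        switch is only allowed when every DFA component is an initial or an
        accepting state, and it moves the next robot to its own initial MDP
        state while carrying over the task (DFA) components."""
        def task_boundary(qs):
            return all(q == d["initial"] or q in d["accepting"]
                       for d, q in zip(dfas, qs))

        for i in range(num_robots):
            j = (i + 1) % num_robots            # next robot in the ring
            for (s, *qs) in local_states[i]:
                if task_boundary(qs):
                    source = (i, s, *qs)
                    target = (j, initial_robot_state[j], *qs)
                    team_trans[(source, "switch")] = {target: 1.0}
        return team_trans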

V STAPU with Reallocation

In the previous section, when a task fails to be completed by a robot, it is removed from the set achievable by the system. However, an advantage of a multi-robot system is that this task can be reallocated to another member of the team for completion. In this section we extend the approach described above so that tasks are reallocated to the other team members when a robot fails. In order to do so, we add new switch transitions so that tasks which a failing robot has not completed can be reallocated to the rest of the team.

When a failure occurs, robots have already started executing policies obtained from the solution of the STAPU problem. So, the outcomes of the switch transitions for reallocation must describe in which states the robots might be at the point in time when the failure occurred. This need for synchronisation across multiple robots breaks the assumptions which allowed us to build a sequential model for the basic STAPU solution.

In order to tackle this issue, while avoiding building a full joint multi-robot MDP for the problem, we build a joint model representing the synchronised evolution of the system just under the computed STAPU policies, i.e., we build the joint synchronised multi-robot policy for the STAPU solution. This model has a set of fail states, corresponding to situations where some robot has failed. We then choose one of these fail states as the initial state of a new STAPU instance, adding switch transitions to it.
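The following Python sketch illustrates this construction: because every robot's choices are fixed by its STAPU policy, the synchronised model is a Markov chain over tuples of local product states, and its fail states can be collected as it is built. The function and parameter names (local_inits, local_policies, local_trans, is_fail) are illustrative assumptions, not part of our implementation:

    from itertools import product as cartesian

    def joint_policy_chain(local_inits, local_policies, local_trans, is_fail):
        """Sketch of the synchronised evolution of the team under fixed
        per-robot policies.  Since every robot's choice is fixed, the joint
        model is a Markov chain over tuples of local product states; states
        in which some robot has failed are collected for reallocation."""
        init = tuple(local_inits)
        states, trans, fail_states, frontier = set(), {}, set(), [init]
        while frontier:
            joint = frontier.pop()
            if joint in states:
                continue
            states.add(joint)
            if any(is_fail(r, s) for r, s in enumerate(joint)):
                fail_states.add(joint)      # handled by a later STAPU instance
                continue
            # each robot takes one step under its own policy, independently
            per_robot = []
            for r, s in enumerate(joint):
                a = local_policies[r].get(s)
                per_robot.append(local_trans[r].get((s, a), {s: 1.0}).items())
            dist = {}
            for combo in cartesian(*per_robot):
                succ = tuple(s2 for s2, _ in combo)
                prob = 1.0
                for _, p in combo:
                    prob *= p
                dist[succ] = dist.get(succ, 0.0) + prob
                frontier.append(succ)
            trans[joint] = dist
        return states, trans, fail_states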

More precisely, let $(s_1, \dots, s_n)$ be such a fail state in the synchronised multi-robot policy, where $s_i$ is the local state of the failed robot. We create a new (sequential) team MDP as described in IV-B2, except that we now take this fail state as the initial state, and the switch transitions now point to the state each next robot occupies in the fail state rather than to its initial state. By solving this new STAPU instance, we effectively find a reallocation policy from the chosen fail state. We can then use the reallocation policy to continue building the synchronised multi-robot policy from $(s_1, \dots, s_n)$, and then choose a new fail state to address. Fail states are addressed in decreasing order of their reachability probability, i.e., we start with the most probable ones, and the procedure terminates when all fail states have been addressed. Because of this ordering, the algorithm is an anytime algorithm: it can be terminated at any point and still provide a solution, and the longer it runs the more failure cases it covers. Fig 1 shows a summary of the full procedure for STAPU with reallocation.
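The overall procedure can be summarised by the following sketch, in which solve_stapu, build_joint and time_left are hypothetical stand-ins for solving a (sequential) team MDP instance, constructing or extending the synchronised joint policy, and checking the time budget:

    import heapq

    def stapu_with_reallocation(solve_stapu, build_joint, time_left):
        """Sketch of the reallocation loop: anytime, handling the most
        probable fail states first."""
        policies = solve_stapu(initial_state=None)   # plan from the initial state
        joint, fail_probs = build_joint(policies)    # fail state -> reach. prob.
        queue = [(-p, fs) for fs, p in fail_probs.items()]
        heapq.heapify(queue)
        handled = set()
        while queue and time_left():
            _, fail_state = heapq.heappop(queue)
            if fail_state in handled:
                continue
            handled.add(fail_state)
            realloc = solve_stapu(initial_state=fail_state)
            joint, new_fails = build_joint(realloc, extend=joint,
                                           from_state=fail_state)
            for fs, p in new_fails.items():
                if fs not in handled:
                    heapq.heappush(queue, (-p, fs))
        return joint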

Fig. 1: Outline of the overall approach. The mission MDP for each robot is built as the product of the robot MDP and the specification DFAs. Then, we build the team MDP and solve a STAPU instance for the initial state of the robots. The obtained sequential policies are then used to build a synchronised joint policy. A fail state is chosen from this joint policy; the initial state of the team MDP, along with its switch transitions, is updated to represent the chosen fail state, and a new STAPU instance is solved. We keep choosing new fail states and solving new STAPU instances until no more fail states exist, or the procedure is interrupted, at which point the current joint policy is returned.

We finish by noting that this approach can be extended to also minimise the expected sum of costs across the team, by defining a cost structure (representing, for example, navigation duration) for each robot and using nested value iteration (NVI) [31] to generate policies. NVI enforces a strict preference between objectives, by choosing, from all policies that maximise the probability of success, one that minimises the total expected cost.
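A simplified sketch of this nested objective is given below, reusing the max_reach_probability sketch from Section IV-B1. It first restricts to actions attaining the optimal probability and then minimises expected cost over that restriction; the full treatment of cost accumulation in [31] handles cases this sketch glosses over, such as states from which success is not almost sure:

    def nested_value_iteration(states, trans, accepting, cost, eps=1e-8):
        """Sketch of the nested objective: among actions attaining the maximum
        probability of reaching an accepting state, minimise the expected
        accumulated cost.  cost[(s, a)] is the cost of action a in state s.
        Assumes the restricted model reaches accepting states almost surely
        (the general case is handled in [31])."""
        prob, _ = max_reach_probability(states, trans, accepting, eps)
        # keep only actions whose one-step value matches the optimal probability
        optimal = {}
        for (s, a), dist in trans.items():
            if s in accepting:
                continue
            v = sum(p * prob[t] for t, p in dist.items())
            if abs(v - prob[s]) < 1e-6:
                optimal.setdefault(s, []).append(a)
        # value iteration minimising expected cost over the restricted actions
        exp_cost = {s: 0.0 for s in states}
        policy = {}
        while True:
            delta = 0.0
            for s, acts in optimal.items():
                best_c, best_a = float("inf"), None
                for a in acts:
                    c = cost.get((s, a), 0.0) + sum(
                        p * exp_cost[t] for t, p in trans[(s, a)].items())
                    if c < best_c:
                        best_c, best_a = c, a
                delta = max(delta, abs(best_c - exp_cost[s]))
                exp_cost[s], policy[s] = best_c, best_a
            if delta < eps:
                return prob, exp_cost, policy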

VI Evaluation

We implement STAPU with reallocation in the PRISM model checker [2], which supports solving MDPs for LTL properties and provides a rich set of structures that can be used to implement verification-based planning methods. We evaluate the scalability of the approach with respect to the number of robots and tasks in the model. The number of times reallocation is performed depends on the number of robots as well as the number of times these robots fail. Therefore we also evaluate how well the algorithm scales as the number of states leading to a fail state is increased.

For our evaluation we create a topological map for one of the environments in the Patrolling Sim simulator [32]. The map consists of 30 states. We assume homogeneous robots performing simple reachability tasks (each requiring some robot to eventually reach a given location, specified with the F operator) and a safety task (requiring every robot to always avoid a given set of states, specified with the G operator). This is similar to a travelling salesman problem. NVI is used to generate policies for individual STAPU instances.

To illustrate the anytime behaviour of the algorithm, we do not generate reallocation policies for fail states whose reachability probability falls below a fixed threshold. We also impose a timeout of 3 minutes, after which STAPU is terminated.

Tasks | STAPU with reallocation | Full Product
      | States      Time (s)    | States      Time (s)
  2   |    240        0.159     |    3600       0.816
  4   |    960        0.246     |   14400       2.729
  6   |   3840        0.681     |   57600      29.413
  8   |  15360        2.712     |  230400     406.467*

TABLE I: Comparing STAPU with reallocation and full product

Table I compares the number of states in STAPU and the full product approach, as well as the planning time, for 2 robots (30 MDP states each) with an increasing number of tasks. * denotes that the algorithm was timed out after 3 min. The full product approach was also implemented in PRISM, by running NVI over the full joint multi-robot MDP. For STAPU, the number of tasks dominates the size of the team MDP, which is linear with respect to the number of robots. Furthermore, STAPU with reallocation is able to compute policies much faster than the approach using the synchronised multi-robot MDP. For the same scenario with more than 2 robots, the multi-robot MDP approach did not get past model construction.

Since STAPU with reallocation redistributes tasks among robots only when a robot fails, the algorithm's cost depends on the number of states that can lead to fail states when following the generated policy. As the number of such states increases, more fail states will be discovered, increasing the required number of reallocation steps and therefore the computation time. However, the anytime nature of our approach allows us to control how many of these fail states we want to explore. Fig 2 shows the increase in computation time as the number of tasks increases. FS indicates the number of states which lead to a fail state and R indicates the number of robots. Fig 3 shows the corresponding number of fail states explored. The dashed line marks the timeout limit, after which the algorithm was terminated. The trend in the fail states explored illustrates the anytime nature of the algorithm: despite an increase in the size of the team MDP, the algorithm is able to generate policies that reallocate tasks for the most probable failures before it times out.

Fig. 2: Computation time (seconds) with respect to the number of tasks
Fig. 3: Fail states explored with respect to the number of tasks

VII Conclusion

We presented an approach to simultaneous task allocation and planning in multi-robot systems which operate in uncertain environments and are prone to failures, using MDPs and LTL, and building on techniques from formal verification. We evaluated its applicability by developing a prototype implementation. Future work will include optimising the policy generation process, e.g. through the re-use of failure states; incorporating more realistic models of time and robot collisions; and exploiting our anytime approach for online policy execution.

References

  • [1] P. Schillinger, M. Bürger, and D. V. Dimarogonas, “Decomposition of finite LTL specifications for efficient multi-agent planning,” in DARS, 2016.
  • [2] M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of probabilistic real-time systems,” in CAV, 2011.
  • [3] A. Felner, R. S. Solomon, E. Shimony, I. E. Boyarski, I. M. Goldenberg, G. Sharon, N. Sturtevant, G. Wagner, and P. Surynek, “Search-Based Optimal Solvers for the Multi-Agent Pathfinding Problem: Summary and Challenges,” in SoCS, 2017.
  • [4] J. Scharpff, D. M. Roijers, F. A. Oliehoek, M. T. J. Spaan, and M. de Weerdt, “Solving multi-agent MDPs optimally with conditional return graphs,” in MSDM, 2015.
  • [5] I. Saha, R. Ramaithitima, V. Kumar, G. J. Pappas, and S. A. Seshia, “Automated composition of motion primitives for multi-robot systems from safe LTL specifications,” in IROS, 2014.
  • [6] S. Zhang, Y. Jiang, G. Sharon, and P. Stone, “Multirobot symbolic planning under temporal uncertainty,” in AAMAS, 2017.
  • [7] J. Tumova and D. V. Dimarogonas, “Multi-agent planning under local LTL specifications and event-based synchronization,” Automatica, vol. 70, 2016.
  • [8] S. Yoon and J. Kim, “Efficient multi-agent task allocation for collaborative route planning with multiple unmanned vehicles,” IFAC, vol. 50, no. 1, pp. 3580–3585, 2017.
  • [9] M. Turpin, K. Mohta, N. Michael, and V. Kumar, “Goal Assignment and Trajectory Planning for Large Teams of Aerial Robots,” 2013.
  • [10] W. E. Walsh and M. P. Wellman, “A market protocol for decentralized task allocation,” in ICMAS, 1998.
  • [11] F. Leofante, E. Ábrahám, T. Niemueller, G. Lakemeyer, and A. Tacchella, “On the synthesis of guaranteed-quality plans for robot fleets in logistics scenarios via optimization modulo theories,” CoRR, vol. abs/1711.04259, 2017.
  • [12] D. Smith, “Choosing Objectives in Over-Subscription Planning,” 2004.
  • [13] A. Desai, I. Saha, J. Yang, S. Qadeer, and S. A. Seshia, “DRONA: A Framework for Safe Distributed Mobile Robotics,” 2017.
  • [14] D. Claes, F. Oliehoek, H. Baier, and K. Tuyls, “Decentralised online planning for multi-robot warehouse commissioning,” in AAMAS, 2017.
  • [15] H. Ma and S. Koenig, “Optimal target assignment and path finding for teams of agents,” in AAMAS, 2016.
  • [16] I. Gavran, R. Majumdar, and I. Saha, “Antlab: A Multi-Robot Task Server,” ACM TECS, vol. 16, no. 5s, pp. 1–19, 2017.
  • [17] G. Wagner and H. Choset, “Path Planning for Multiple Agents Under Uncertainty,” in ICAPS, 2017.
  • [18] H. Ma, T. K. S. Kumar, and S. Koenig, “Multi-agent path finding with delay probabilities,” in AAAI, 2017.
  • [19] J. Scharpff, D. M. Roijers, F. A. Oliehoek, M. T. J. Spaan, and M. M. De Weerdt, “Solving Transition-Independent Multi-agent MDPs with Sparse Interactions (Extended version),” AAAI, 2016.
  • [20] M. Lahijanian and M. Kwiatkowska, “Specification revision for Markov decision processes with optimal trade-off,” in CDC, 2016.
  • [21] S. L. Smith, J. Tůmová, C. Belta, and D. Rus, “Optimal path planning for surveillance with temporal-logic constraints,” IJRR, vol. 30, no. 14, 2011.
  • [22] B. Lacerda, D. Parker, and N. Hawes, “Optimal policy generation for partially satisfiable co-safe LTL specifications,” in IJCAI, 2015.
  • [23] ——, “Optimal and dynamic planning for Markov decision processes with co-safe LTL specifications,” in IROS, 2014.
  • [24] A. Viseras, V. Karolj, and L. Merino, “An asynchronous distributed constraint optimization approach to multi-robot path planning with complex constraints,” in SAC, 2017.
  • [25] M. Guo and M. M. Zavlanos, “Distributed data gathering with buffer constraints and intermittent communication,” in ICRA, 2017.
  • [26] C. Amato, G. Konidaris, A. Anders, G. Cruz, J. P. How, and L. P. Kaelbling, “Policy Search for Multi-Robot Coordination under Uncertainty,” RSS, 2015.
  • [27] D. Claes, P. Robbel, F. A. Oliehoek, K. Tuyls, D. Hennes, and W. van der Hoek, “Effective approximations for multi-robot coordination in spatially distributed tasks,” in AAMAS, 2015.
  • [28] A. Pnueli, “The temporal semantics of concurrent programs,” Theoretical Computer Science, vol. 13, 1981.
  • [29] O. Kupferman and M. Y. Vardi, “Model checking of safety properties,” Formal Methods in System Design, vol. 19, no. 3, 2001.
  • [30] C. Baier and J.-P. Katoen, Principles of model checking.   MIT press, 2008.
  • [31] B. Lacerda, D. Parker, and N. Hawes, “Nested value iteration for partially satisfiable co-safe LTL specifications,” in AAAI Fall Symposium on Sequential Decision Making for Intelligent Agents, 2015.
  • [32] D. Portugal and R. P. Rocha, “Distributed multi-robot patrol: A scalable and fault-tolerant framework,” Robot. Auton. Syst., vol. 61, no. 12, 2013.