Integrated Task and Motion Planning for Multiple Robots under Path and Communication Uncertainties
Abstract
We consider a problem called task ordering with path uncertainty (TOPU) where multiple robots are provided with a set of task locations to visit in a bounded environment, but the length of the path between a pair of task locations is initially known only coarsely by the robots. The objective of the robots is to find the order of tasks that reduces the path length (or, energy expended) to visit the task locations in such a scenario. To solve this problem, we propose an abstraction called a task reachability graph (TRG) that integrates the task ordering with the path planning by the robots. The TRG is updated dynamically based on intertask path costs calculated using a samplingbased motion planner, and, a Hidden Markov Model (HMM)based technique that calculates the belief in the current path costs based on the environment perceived by the robot’s sensors and task completion information received from other robots. We then describe a Markov Decision Process (MDP)based algorithm that can select the paths that reduce the overall path length to visit the task locations and a coordination algorithm that resolves path conflicts between robots. We have shown analytically that our task selection algorithm finds the lowest cost path returned by the motion planner, and, that our proposed coordination algorithm is deadlock free. We have also evaluated our algorithm on simulated Corobot robots within different environments while varying the number of task locations, obstacle geometries and number of robots, as well as on physical Corobot robots. Our results show that the TRGbased approach can perform considerably better in planning and locomotion times, and number of replans, while traveling almostsimilar distances as compared to a closest first, no uncertainty (CFNU) task selection algorithm.
I Introduction
Multirobot task planning and path planning are important problems in multirobot systems when robots have to perform tasks at different locations within an environment. The problem is encountered in many applications of multirobot systems such as automated surveillance [1], robotic demining [2], and automated inspection of engineering structures [3]. As a motivating example, we consider a scenario for performing standoff detection of explosives or landmines using autonomous robots where multiple robots are provided with a coarse map containing locations of objects of interest. The robots are required to autonomously plan their paths to get in proximity of each object of interest so that they can analyse the object with their detection sensors. For realizing this, the main computational problem is to calculate a suitable task plan or ordering among the tasks for each robot so that a performance metric, such as the energy expended or the time required by robots to perform the tasks gets reduced. Researchers have proposed Multirobot Task Allocation (MRTA) techniques [4] as well as multirobot path planning techniques [5] to address this problem. However, on one hand, most MRTA techniques assume that the costs or distances between the task locations are fixed and known to all the robots as soon as they become aware of the task. This criterion might not be valid if the robots have a coarse map of the environment and the path cost between tasks can change dynamically as the robots discover obstacles in the environment, or if due to communication constraints, the delivery of a task completed message is delayed. On the other hand, path planning techniques account for dynamically discovered obstacles but they focus mainly on finding collision and conflictfree paths for robots and do not adjust the ordering between the waypoints or task locations that are being visited by the robots. Keeping a fixed order between tasks might result in unnecessary longer paths to complete the task schedule, especially when a dynamically updated path around an obstacle could induce a shorter task schedule.
To address the above problem, it would make sense to investigate techniques that have a closer integration between the task and motion planning operations of robots. Researchers have proposed the Simultaneous Task and Motion Planning (STAMP) problem to investigate this problem in the context of robotic manipulation [6]. Our work advances this direction of research by proposing a framework called Task Ordering under Path Uncertainty (TOPU) to address the STAMP problem in the context of a search and exploration scenario using wheeled, mobile robots. An example scenario shown in Figure 1 illustrates an example of the reduced time taken and distance traveled when robots use our proposed algorithm, in contrast with Closest First No Uncertainty (CFNU) algorithm used for task selection, that does not consider uncertainty in path costs and availabilities in its calculations. The main contributions of our paper are the following: We present a formalization of the problem called task ordering with path uncertainty (TOPU), where multiple robots have to visits tasks whose locations are uncertain due to the presence of obstacles in the environment, while reducing the distances traveled between the tasks. We propose a data structure called a task reachability graph (TRG) that is used to model the problem and a Markov Decision Process (MDP)based algorithm that each robot uses to dynamically calculate its task schedule in realtime using the TRG. We also propose a distributed coordination algorithm for resolving deadlock scenarios due to path conflicts between multiple robots using our algorithm. We have proved analytically that our task scheduling algorithm is optimal and the coordination algorithm is deadlock free. We also provide extensive experimental results on simulated and physical Coroware Corbot robots, with different number of robots and tasks within environments with different obstacles geometries and task distributions. Our results show that our proposed TRGbased approach could perform up to better in planning and locomotion times with fewer replans, while traveling similar distances as compared to a closest first, no uncertainty (CFNU) selection algorithm. The rest of our paper is structured as follows: in the next section, we discuss existing research on MRTA and motion planning techniques. In Section III, we formalize the TOPU problem and describe the robot task scheduling and multirobot coordination algorithms. Sections IV and V describe our analytical and experimental results and finally we conclude.
Ii Related Work
Motion and task planning have been important problems in robotics. Several approaches for solving them have been proposed in literature over the past two decades, although these problems have largely been treated separately. In motion planning the objective is to find a collision free path for a robot so that it can navigate within its environment [7]. Samplingbased motion planners like probabilistic roadmap (PRM) [8] and Rapidlyexploring Random Trees (RRT) [9] have been used widely for motion planning. Recently, researchers have proposed extensions to these techniques by using methods to reduce the time required to calculate the paths and address the problem of moving through narrow passages [10, 11, 12], and handling uncertainty in obstacle locations [13, 14]. Researchers have also investigated the problem of coordinating the paths between multiple robots [15, 5] where robots exchange their individual paths with each other and mutually exclusive paths are calculated for each robot in the robots’ joint configuration space. To reduce the complexity of planning in the joint configuration space, a lightweight protocol was proposed in [16] where robots iteratively make way for one robot at a time to reach its goal until a consistent set of maneuvers have been determined for all robots to reach their goal. We use a complementary approach in this paper that allows robots to mostly calculate path plans individually in their local configuration space but if a set of robots get within close proximity of each other they use a conflict resolution algorithm to find collision free paths. Path planning in dynamic environments where the cost between the source and goal locations can change abruptly was addressed in [17, 18] using Markov Decision Processes (MDPs). In our proposed approach, path costs are also dynamically updated using the robots’ sensor information, and the updated path costs are used immediately to recalculate the task schedule using an MDP to allow for switching between tasks to reduce the cost of the total path length to visit all the tasks.
The problem of finding a suitable ordering of operations or tasks to perform by multiple robots has been researched as the MultiRobot Task Allocation (MRTA) problem [19], excellent reviews of MRTA are available in [20, 4]. Most of the approaches focus mainly on finding a suitable ordering of tasks while assuming appropriate robot motion planning techniques. Recently, researchers have addressed more tightly coupled task and motion planning under the simultaneous task and motion planning (STAMP) problem. The proposed solution techniques combine symbolic task planning with control based techniques [21, 6, 22] for a mobile manipulation problem where task interdependencies form a critical aspect and reasoning using symbolic task planning is critical to determine the task precedence. In contrast, for our setting, reducing the cost of the task schedule is more critical than the order of tasks, and our algorithm uses probabilistic methods to quickly incorporate robots’ perceptions about the environment into its plan.
Iii Task Ordering with Path Uncertainty
We consider a set of wheeled, mobile robots, , deployed within an environment. Robots are capable of localizing themselves within the environment and can also communicate wirelessly with each other. The environment contains a set of tasks, . Robots have to visit the locations of tasks to perform operations required to complete the tasks. Each task can require visits by one of more robots to get completed; the information about how many robots are required to complete a task is provided a priori to the robots. We consider tasks that are loosely coupled and all robots required to complete a task do not necessarily need to visit the task’s location at the same time. Each robot is initially aware of the locations of the tasks, but does not know the exact paths between the tasks^{1}^{1}1In the rest of the paper, we have referred to task locations as tasks for legibility. or the obstacles along those paths. To represent this path uncertainty, each robot uses a task reachability graph (TRG), a fully connected graph with task locations as its vertices. Formally, a TRG is denoted by where:

is the vertex set and is the robot’s current location. Each corresponds to a task location the robot is aware of at time

is the edge set connecting the vertices in the TRG

is the expected distance or cost expended by a robot to traverse the path underlying edge . denotes the expected cost from the robot’s current location () to

is the probability that edge is not available
Owing to path uncertainties between task locations (TRG vertices), and are estimated from perceived sensor data and they get updated by the robot as it discovers obstacles and task availabilities while navigating between tasks. Let denote a function that returns an ordering over the set of tasks. Each robot maintains its own TRG and plans its path using its TRG. The problem facing each robot to find a suitable order for visiting the tasks is specified by the Task Ordering under Path Uncertainty (TOPU) problem below:
TOPU Problem. Given representing the set of tasks, intertask costs and task availabilities at time , determine a schedule that induces an ordering over the tasks, given by:
subject to:
(1) 
where is the battery available to a robot at current time . Note that represents the path through the TRG with the minimum expected cost, weighted with availability. The second constraint above ensures that the robot is able to complete this path with its currently available battery. Note that can change dynamically for a robot as tasks can get completed by other robots. An instance of the TOPU problem corresponds to the wellknown traveling salesman problem (TSP) [23]. However, solving the conventional TSP might not guarantee an optimal path as edge costs () could change dynamically as robots discover previously unknown obstacles while traveling between tasks, while edge availabilities () could change dynamically because some tasks got completed by other robots. To address the dynamic nature of the problem, we propose a Hidden Markov Model (HMM)based method to update the edge availabilities, and then use the updated information within an MDP to find the desired ordering of the TRG vertices to solve the TOPU problem. In the rest of the paper, for legibility, we have omitted the time notation from the TRG parameters, assuming it to be understood from context.
Iiia Dynamically Updating Edge Cost and Availability
Edge Cost Update. TRG edge costs correspond to the distance that the robot requires to travel to reach from one TRG vertex to another. Each edge cost is initialized to the Euclidean distance between the pair of TRG vertices forming the edge. However, if there exist previously unknown obstacles in the path between a pair of TRG vertices, then the distance the robot travels might exceed the Euclidean distance between the vertices. To accommodate the path distance uncertainty, the robot uses a probabilistic roadmap(PRM)based path planner [14] to dynamically update the expected edge cost. The PRM planner works by first generating a set of sampled points from the robot’s configuration space. It then uses the available information about obstacles perceived by the robot from its current location to determine path segments that are close to obstacles and might result in collision with high probability; such segments are associated with a high penalty value. Following [14], the TRG edge cost calculations are given by the following steps.

Calculate the cost of each path segment that connects any two sampled points ( is the free space in the environment) as:
where the probability of collision of segment based on its distance to perceived obstacles, is an arbitrary large number used to discourage paths that have a high probability of collision and is the Euclidean distance between and

Calculate the physical path corresponding to TRG edge as a sequence of path segments , given by:

Calculate the expected cost for TRG edge as the sum of the costs of path segments in path calculated in step above, as:
(2)
Edge Availability Update. In our scenario, tasks are completed in a distributed manner by different robots and a task (or, TRG vertex for a robot) might get completed by other robots before the robot reaches it. When a task is completed, the last robot visiting the task broadcasts a task completion message to all other robots. Each robot then needs to remove the TRG vertex for the task from its TRG. Because message communication in unstructured environments might be unreliable, task completion message might be lost due to noisy communication, or, the robot broadcasting the message might be outside the communication range of some robots. To handle this uncertainty, it would be useful if the robot could infer whether the task was still available or not, from information related to the task’s availability that it can directly observe. For our problem, we assume that this observable information related to task availability is the distance or path length remaining to reach the task  if the path length is very large, it could be due to the task becoming unavailable^{2}^{2}2Note that when a TRG edge is removed, it can be looked upon as the edge length becoming infinitely large.. One caveat to using the path length as an indicator of task availability is that it is also affected by obstacles along the path; it changes dynamically as the robot encounters obstacles while going towards the task. The problem facing the robot then is to observe the path length values over the recent past and infer from it whether the task is still available.
To model this inference problem, each robot uses a Hidden Markov Model (HMM) [24]; one HMM, , is used to update the availability of each TRG edge . The crucial HMM variable is Path Length Long () that evolves temporally as the robot encounters static obstacles () and mobile obstacles () on its range sensor, or, receives a task completion message called task not available (), as shown in Figure 2. Variables and are binaryvalued and they too evolve with time as the robot moves towards the task and encounters obstacles, or receives task completion message. The temporal transition model is given in Figure 2 via the arrows moving between dashed boxes, the sample temporal probabilities of these variables are also provided. The dependencies between these variables affecting are captured in each slice of the HMM as shown inside the dashed boxes in Figure 2. Because each of the variables affecting  static obstacles, mobile obstacles and task not available  do not affect each other and can be considered as independent of each other, their probabilistic effect on can be combined relatively easily from the individual inhibition probabilities for these variables using a noisyOR model. An example noisyOR based probability calculation for is shown alongside Figure 2. We assume that the environment has a communication success rate, or communication failure rate, leading to the probability value given in Figure 2.
To solve the problem of calculating the probability of a task being still available from values, the robot first calculates the observed value of variable for the current time step. For , the observed value of variable for current time step is determined by assuming that is very large when it is times more than the minimum cost of any edge in the current TRG, as given by the equation below:
(3) 
where is a user defined constant that is based on system and environment factors such as battery remaining, terrain and navigation conditions. The sequence of values for is recorded, and used to estimate the probability of the state variable , given by , using the ForwardBackward algorithm [24]. The forward stage is given by the equation:
(4) 
and the backwards stage is given by:
where is the combination of the set of state variables, and at time , is set of evidence (PLL observations) from time through , and is the value of the variable at time . Finally, to integrate the calculated value of with the TRG edge , we model the task availability as probabilistic availability of TRG edge . gets a value when the robot is certain that the task is available and there exists a finite distance path to reach it, and, when the path to reach the task is infinitely large meaning the task is not available; intermediate probabilities represent the uncertainty of the task not being completed by other robots and still remaining available to the robot. is calculated by normalizing Equation 3, given by:
(5) 
The normalization ensures that the robot has a probability of leaving TRG vertex through at least one of its incident edges.
IiiB TOPU Solution using Markov Decision Process
Following the update of the edge costs and availabilities, the robot has to select the TRG edge with the minimum expected cost, weighted with availability to solve the TOPU problem (Equation 1). Because of the uncertainties in edge costs and availabilities, a Markov Decision Process(MDP) is used to do this. An MDP [24] consists of a set of states, a set of actions to transition between states, along with a probability distribution and reward for each action at each state. The output of an MDP is a policy that prescribes an action at each state, which maximizes the cumulative, expected reward to the robot to reach a desired or goal state from its current state. A more thorough discussion on MDPs and solution techniques is given in [24]. For our TOPU problem, the TRG’s vertices, , represent the MDP’s states, the set of actions at each state (TRG vertex) of the MDP correspond to the edges from that TRG vertex, TRG edge availabilities give the transition probabilities between MDP states, while the inverse of the TRG edge costs correspond to the reward for reaching each state in the MDP (lower edge costs corresponds to larger rewards). The policy calculated by the MDP gives the maximum expected reward (or minimum expected cost, weighted by availability) for the robot to visit the TRG vertices. The MDP is solved using the value iteration algorithm, that solves the following equation:
(6) 
where is the inverse expected cost from the robots current location to , is a userdefined, rewarddiscount factor and is the probability that the robot will reach task when starting at task and attempting to follow the edge towards task which may or may not be the same task as . happens if the robot was to attempt going to task , but due to obstacles, communications, etc. it determines that it is better to instead go to task . The equation for is given below, which if , the probability is the edge availability, and if , then it is the probability of the edge not being available distributed evenly to the remaining tasks.
(7) 
IiiC Robot Navigation and Multirobot Path Coordination Algorithms
The main algorithm used by a robot for selecting tasks to visit is shown in Algorithm 1. The main idea of the algorithm is to select the task, , determined by the MDP policy, and plan a path to reach it. If the path results in potential collisions with other robots’ paths, path conflicts are resolved (line ). Every time the path cost to a task changes due to obstacles, or a task completed message from another robot is received, a TRG update is triggered (line ). This might result in switching the task the robot is headed to. The robot continues to move towards its currently selected task until it is reached and upon reaching the task its removes its vertex from the TRG and broadcasts task complete message to other robots (lines ).
The algorithm used to update the TRG is shown in Algorithm 2. When a robot’s TRG vertex set or path costs on the TRG change, it calculates a new navigation path to its destination vertex (lines ) and new edge availability values using its current perception in the HMM (line ). These updated values are incorporated into the MDP and the MDP’s policy is recalculated to yield the new destination vertex (line ). If the recalculated policy prescribes a new target vertex, then the robot performs a task switch and its destination vertex is changed from to (line ). The algorithm also handles the case where all tasks in a robot’s TRG might get completed by other robots before it reached those tasks; in that case the algorithm returns a null vertex and empty path(lines ) so that the robot stops.
IiiD Coordinating paths between robots to avoid collisions
If robots determine their paths individually using Algorithm 1, it could lead to robot collisions when the planned paths of two or more robots intersect with each other. To avoid this scenario, we have used a collision avoidance algorithm shown in Algorithm 3. Each robot uses the locations broadcast by other robots to check if there are other robots within a radius of , called the collision circle, of itself (lines ). When a set of robots are within the collision circle of each other, all the robots stop and the robots exchange their identifiers, representing their priorities, with each other. A leader election algorithm called the bully algorithm [25] is then used to select the robot with the highest priority as the winner. The winner robot holds the winner token, which gives it the right to move (lines ). All other robots in the collision circle, which do not hold the winner token, remain stationary (line ). The winner robot uses the PRM planner in conjunction with updating the TRG using Algorithm 2 to find a path to its destination vertex . The path returned by the PRM planner is executed and the moving robot releases the winner token once it is outside its collision circle (lines ). If the PRM planner is not able to find a path to the goal, e.g., if the goal is unreachable because there is another robot within the collision circle that is stopped right at the goal location, the moving robot relinquishes its right to move by setting its priority to a high value () (lines ). Another robot from within the set of stopped robots gets a chance to run the bully algorithm and attempts to move. This protocol ensures that at least one robot exits the collision circle with each execution of the bully algorithm, and finally there is only one robot left inside the collision circle. This robot then reverts to using Algorithm 1 to plan its path.
Iv Theoretical Results
(a)  (b) 
In this section, we prove some properties related to our proposed algorithms.
Iva Task Selection Algorithm Properties
Theorem IV.1.
Algorithm TRGTaskSelect finds a solution that is admissible, that is, it never overestimates the expected cost to a task calculated using the TRG.
Proof.
(By Contradiction.) Let us suppose there is another algorithm that selects TRG edges with lower expected cost than Algorithm TRGTaskSelect. Let and () respectively denote the task (vertex) selected by Algorithm TRGTaskSelect and , when the robot is at vertex . Using Equations 6 and 7, the MDP in Algorithm TRGTaskSelect (Alg. 2, line ) calculates the expected cost to reach from using TRG edge as . Algorithm , which does not follow the vertex recommendation made by Algorithm TRGTaskSelect, selects a different TRG edge , which has cost with probability and TRG edge with probability . The robot’s expected cost to reach using Algorithm is then . Since, by Equation 6, the MDP selects the edge with the minimum cost, it follows that, ; let , . By our assumption that the expected cost calculated by is lower than that of Algorithm TRGTaskSelect, we get:
(8)  
Because, by definition, and are both , we get , which is not valid as . Therefore, the expected cost returned by cannot be lower. Hence proved. ∎
Lemma IV.2.
The solution found by Algorithm TRGTaskSelect remains admissible when there are errors in the edge cost and availability estimates.
Proof.
Let and denote the estimated values and errors in edge costs and availabilities returned by the PRM and HMM respectively,(using Algorithm 2, lines and ), where and and . If the true cost and availability of edge is denoted by and respectively, and . For Algorithm TRGTaskSelect to not overestimate edge costs with these estimated values Theorem IV.1 should hold when they are used in Equation IVA. That is, . Substituting above values of and we get:
(9) 
As has limited domain of , and the above equation is linear in , we can check the two boundary points of the domain; if the inequality is satisfied for both boundaries, then it is satisfied for all points inside the domain. Note that then when , and when , . Substituting in Equation 9, we get . Since and , each of the terms on l.h.s. of the last inequality is and the inequality holds. Similarly, substituting in Equation 9, we get . Since when , we substitute the boundary values of in the last equation; when , we get which is valid from the definition of ; when , we get , which is valid, because, by definition . ∎
Theorem IV.3.
The solution found by Algorithm TRGTaskSelect is consistent.
Proof.
Suppose the robot is at vertex and is the next vertex selected by Algorithm TRGTaskSelect.
For consistency property, we need to show that for any . We prove by contradiction  suppose . Note that the TRG is a complete graph and vertices and form a triangle. Consequently, , where is the Euclidean distance between and . From Section IIIA, the path for navigating the robot along TRG edge has cost ; it is found by the path planner and from Equation 2, it is guaranteed to be the minimum cost, collisionfree path connecting and . In other words , satisfying . Because a task location has to be in the free space in the environment, and, so, the last inequality is valid for . Therefore, . Therefore, our assumption was invalid. Hence proved. ∎
Theorem IV.4.
The solution found by Algorithm TRGTaskSelect is optimal.
Proof.
Similar to Lemma IV.2, it can be shown that the solution found by Algorithm TRGTaskSelect is consistent and hence optimal, even with errors in path cost estimates.
IvB Completeness of Coordination Algorithm
Next, we analyse the synchronization properties of our proposed multirobot coordination algorithm to show that it does not give rise to deadlock or livelock conditions between robots, resulting in their inability to move and reach tasks. To facilitate this analysis, we consider the movement of the robots between sets, corresponding to robot states defined by the algorithm.
Let dist(, ) denote the Euclidean distance between robots . We define the collision circle for robot as dist(, ), where is the distance away from robot that we consider robots to be in immediate risk of collision. An example of a collision circle can be seen in Figure 4(a), where robot 1 has robots 2 and 3 in its collision circle. This means that . We next define the concept of a collision shape. A collision shape is the group of all robots that are either in each others collision circle, or, through sharing collision circles with other robots, can reach the collision circle of another robot without exiting any overlapping collision circles. As shown in Figure 4(b), robot 3 is not in the collision circle of robot 1, however it is in the same collision shape as robot 1, because by traveling through the collision circle of robot 2, robot 1 can reach the collision circle of robot 3. On the other hand, robot 6 is not in the collision shape of robot 1 because there is no way to move into the collision circle of robot 6 without leaving collision circles. To help define the collision shape, we first define a recursive helper set , corresponding to robot at recursive step , as and . As , becomes the set of all robots in the current collision shape. We can now define a collision shape as:
Definition 1.
Let denote a helper set of robot , as defined above. Then collision shape
The collision circle and collision shape of a robot get updated as the algorithm proceeds. We use and to denote the collision circle and collision shape of robot at round respectively. We now define three subsets of used in our analysis.

: The set of all robots in the collision shape of robot waiting for their turn to move

: The set of all robots in the collision shape of robot that are allowed to move

: The set of all robots in the collision shape of robot that have surrendered their right to move
We also define a set as the set of all robots that were originially in the collision shape of robot , but have since left. Formally, . The elements of the above sets are mutually exclusive and the subsets fully partition the set , in other words the following properties hold: , , , , and
Lemma IV.5.
If during round , there are robots currently waiting to be allowed to move, then in the next round, there has to be at least one robot that is selected for movement. That is, if then .
Proof.
Line 3 selects one element from each where for membership in . The robot selected can be a member of one of two sets, or . If , then , because will remain in . And if , then because in the next time step, , which means . Therefore, we can conclude that if , then in the next time step . ∎
Lemma IV.6.
coordinatePath deadlocks when no robots are allowed to move and there are no robots available to be selected for movement. This occurs when , and .
Proof.
As, is the termination case, is a necessary condition. When the robots deadlock, that means that for the rest of time, no robots are allowed to move. Or, mathematically, such that . By Lemma IV.5, this implies that . This implies that for each round that is in deadlock . ∎
Theorem IV.7.
Algorithm coordinatePath does not deadlock.
We analyse the operation of the collision avoidance algorithm as a method to move robots between two sets, and . Initially, all robots in are placed in , and from , movements through and are possible until the movement into is possible. Based on the above descriptions, a deadlock can only occur when no robot is allowed to move, meaning that no winners have been selected, and there are no available waiting robots to become winners.
Proof.
By Lemma IV.6, coordinatePath deadlocks when and . As , and , this implies that . Line 3 tests for this condition when all robots remaining in the collision shape have a priority of . This causes joint configuration space planning to be called, which is guaranteed to find paths for all robots, if such paths exist, or a failure when paths do not exist, but does not deadlock. If paths do exist, all of the robots in the suspended state are transitioned into the waiting state , on line 3. In other words and . The conditions for deadlock given in Lemma IV.6 no longer holds as . ∎
Definition 2.
A livelock occurs for algorithm coordinatePath when there is no robot in the collision shape that is able to reach the sink state X through a finite set of state transitions across the W, L, and S states.
Lemma IV.8.
Any cycle in the state transition graph that contains state cannot be a livelocking cycle
Proof.
Once a robot enters state , there are only two scenarios, stay in or exit state by going only to state . In the first case, there will come a time where either all robots have entered . This triggers a joint planning, which is guaranteed to not result in livelock. In the second case, the only way a robot in is allowed to reach is if another robot makes a transition from to . This implies, from the definition of livelock, that algorithm coordinatePath is not in livelock, because robots outside of the coordinate path algorithm can make progress towards their tasks uninhibited. ∎
Lemma IV.9.
There are only a finite number of cycles or paths through the state graph in Figure 5 that could feasibly result in a livelock scenario
Proof.
Based on Lemma IV.8, any cycle that moves through can be eliminated from the list, which leaves only two possible states to move through. Then based on the properties of finite state machines, all cycles which result in multiple visits of the same state in a row can be collapsed into the equivalent cycle visiting that state only once. As such, this leaves us with only three potential cycles, , , and . ∎
Lemma IV.10.
It is not possible for robots to eternally stay in state .
Proof.
All robots in must remain moving and may not remain stopped for any time not required for planning or sensing. The paths returned by PRM are the shortest path in the roadmap to the goal possition. Based on these two facts, the robot must either move out of the collision shape, or enter a collision with another robot causing its transition either into or depending on if the collision was with a higher priority robot, or if the collision caused all feasible paths for the robot to become blocked. ∎
Lemma IV.11.
It is not possible for robots to eternally cycle between sets to and back to .
Proof.
Transitions from to occur when a robot is the highest priority robot in its collision circle . Transitions from back to occur when a robot loses its position as the highest priority robot in its collision circle, when it was moving. The robot would still be able to move towards its goal if the higher priority robot was not there. Also, consider the worst case scenario, where, as each robot selected from to get into , moves and encounters a robot of higher priorty, causing it to transition from back to . Even assuming that contains only one robot at a time, as the set of robots is finite, and all robot ids are unique inside or , there must exist at least one robot with highest priority, which, once entering , can not be moved back to as there is no other higher prority robot to prevent its movement. This highest priority robot only has two destination states that it can reach, or ; either of which breaks the loop . Therefore, a scenario where a robot cycles between states cannot exist. ∎
Lemma IV.12.
It is not possible for robots to eternally stay in the state
Proof.
This would be considered a deadlock state. For robots to stay in , there must be a higher priority robot inside their collision circle that is selected to move to . All members of must remain in motion, otherwise they become members of , and no longer the highest priority robot in the collision circle, causing a new robot from to be selected. Robots in can not remain perennially inside the collision circle as the path planner PRM generates a path to goal. If a feasible path is found by the planner, the robot will leave the collision circle by transitioning from to . On the other hand, if a feasible path cannot be determined, the robot enters . ∎
Theorem IV.13.
Algorithm coordinatePath does not result in livelock.
Proof.
Livelock is when the robots are still able to move, however they make no progress towards completing their goals. To do this, the robots must remain in states that allow motion, but never reach their goal locations (tasks), or exit their collision shapes. Figure 5 shows all of the possible transitions between states of the coordinatePath algorithm. Through inspection of this figure, we can determine that livelock occurs when the robots follow any cycle in the graph without any robots entering state . This provides the following livelock possibilities for the robots:

Stay in

Stay in

Cycle between states states and
V Experimental Results
Tasks  Robots  Visits  Load 

5  3  1  1.67 
10  3  1  3.33 
5  1  1  5 
10  3  2  6.67 
10  1  1  10 
15  3  2  10 
15  1  1  15 
15  3  3  15 
(a)  (b) 
(c)  (d) 
To verify our approach, we have run several experiments on the Webots robot simulator using a model of the Coroware Corobot robot, as well as on physical Corobot robots. The Corobot is a 4wheel cm cm, skidsteer robot which we control in a differential wheel manner. It is equipped with an indoor Stargazer localization system [26] which provides 2D location and heading, Hokuyo laser distance sensor which provides a range up to m in distance at a resolution of , and wireless communications. The onboard computer has an 1.6GHz Intel Atom 330 processor with 4GB of RAM. The simulator provides a Gaussian distributed noise to sensor and actuator motions. The environments have a size of m; obstacles were placed at different locations within the environment. The number of tasks () was varied over and . The task locations were selected in a way that if there were a single robot in the environment following a closest first task selection strategy, it would cause the robot to switch tasks for of the replans it does. When there are multiple robots, one of them is selected arbitrarily and placed at a location that would effect the aforementioned task switching for replans. Any other robots are placed randomly in the environment, while keeping an even distribution. For the different settings we have used, the robot positions are the same for all runs in an environment. The number of robots () used were and . For each task a certain number of robots had to visit it to complete it; the variable for visits per task () was varied over , , and . To present our results in a concise manner, we have used a parameter representing each robot’s average task load, given by . The different combinations of and , and corresponding values of used for our experiments are given in Table I. Userdefined constants were set to (Discount rate of MDP in Equation 6) and (PLL parameter in Equation 3), unless otherwise stated. Probability values used in the HMM were similar to those given in Figure 2, which were based on average times for a robot for avoiding static and mobile obstacles within the environments used for the experiments. All results were averaged over simulation runs. We have compared the quality of task ordering performed by our proposed technique with an approach where the task ordering is done using CFNU [27]  each robot selects the task that has the least cost on its TRG from its current location, without modeling uncertainties in the intertask paths or updating TRG edge availabilities. The CFNU algorithm switches tasks when another task is closer to it than the currently selected task, as measured using straight line distance. To enable comparison, both task ordering approaches use the same underlying path planner and multirobot coordination mechanism, when necessary. We have reported three metrics for quantifying the performance of the algorithms  the distances traveled by the robots, the number of replans (with and without task switches) and the times (planning and locomotion) taken to visit all tasks. Also, to understand only the performance of our MDPbased task selection method, we have reported the metrics separately for single robot scenarios, where . That is,there is only robot that has to visit all tasks once, and edge availability is affected only by previously unknown obstacles; effects due to communication uncertainties of TaskComplete messages sent by other robots do not arise.
(a)  (b) 
(c)  (d) 
Figures 6(a) and (b) show the number of replans made by each robot, resulting in and not resulting in task switches respectively for the two algorithms for the different load values shown in Table I. We observe that, on average, the TRGbased approach results in less replanning and less task switching than the CFNU approach. The reduced planning and task switching by the TRGbased algorithm can be attributed to its ability to reason more efficiently about task availabilities using its costs and beliefs about paths in the MDP based approach, along with realtime sensor data incorporated into its decisions using the HMM. In contrast, the CFNU approach uses only Euclidean distances to select tasks and consequently performs poorly. In Figures 6(c) and (d), the number of replans resulting in and not resulting in tasks switches are shown for the single robot cases. We see that the switching replans are the same for both approaches because there are no other robots in the environment which could complete the task before the robot reaches it first. We also see that the TRG performs fewer nonswitching replans.
(a)  (b) 
In Figures 7(a) and (b), we show the average time taken for both approaches, this includes path planning and task ordering times, and, locomotion times for each robot, which includes time taken to resolve interrobot path conflicts using Algorithm 3. The TRGbased approach takes much less time for both planning and navigation compared to the CFNU approach. In this case, the TRGbased approach requires up to less planning time, and, up to lower locomotion and coordination times than the CFNU approach. This is because the TRGbased approach accounts for both the known obstacles between tasks and the likelihood that the task will become unavailable. The CFNU approach behaves myopically and selects the closest task to visit, which could be on the other side of a large obstacle and require considerable planning and locomotion times to reach. In contrast, the TRGbased approach uses the robotâs perception of the environment to weight the path costs to tasks with the corresponding path belief to reduce the overall path costs. Note that when the number of tasks is small, or the average task load per robot is close to 1, both algorithms have comparable performance for all three metrics as each robot has to visit only one task and there is no task ordering required. Figures 7(c) and (d) show the average time taken by TRG and CFNU approaches in the single robot case as the average robot load increases. We can see that our TRG approach takes less planning time compared to the CFNU approach.
We also observe that as the average task load of the robots increase (from left to right on the xaxis), the distances traveled by the robots increases. The robots using the TRGbased approach travel distances between (more) to (less) than the CFNU approach, with an average improvement of about (less distance traveled) across all experiments. The TRGbased approach sometimes travels a small amount more than the CFNU approach when the robot decides to abandon its current task for another task. In that case, the CFNU approach will switch as soon as the other task becomes closer, which in some cases is the best decision, where as the TRGbased approach will continue to follow its previous task even though another task is closer. In some cases this might be the best thing to do because the closer task might be on the other side of a wall that the robot has yet to discover and actually require more distance to explore enough of the wall to realize that the task is no longer the closest; TRG performs better in such scnearios. In some of the environments with a larger load value, the TRGbased approach starts to perform better.
We also tested our proposed algorithm on physical Corobot robots. We used the environment shown in Table II (left) where the white dots represent the tasks that the robot must visit. This environment was also designed such that a CFNU closesttaskfirst algorithm would switch of the time. We tested for a fixed value of , and tested for , , and . Results were averaged over three runs, and are shown in Table II (right). In both environments, the robots were able to navigate and visit the tasks with the required number of visits. In comparing these results to those shown in Figure 6, we can see that the hardware performed fairly similar. For example, Table II shows that the five tasks, one robot, one visit environment had four switching replans, which is within the margin of error for the switching replans.
Rob.  Vis.  Dist. Traveled (cm)  # Switches  # Nonswitches  Plan Time (s)  Navigation Time (s)  

Vi Conclusions and Future work
In this paper, we introduced the TOPU problem where robots have to determine the order to visit a set of task locations when the path costs between a pair of tasks can vary dynamically as the robot discovers obstacles while navigating between tasks. We proposed a data structure called a task reachability graph (TRG) along with techniques to integrate the task ordering with uncertainty in path costs and availabilities. Our analytical results show that our proposed algorithm is optimal and complete. The algorithm’s performance was also evaluated extensively through experiments and was shown to result in reduced time and fewer computations (less task switching) as compared to an algorithm that does not consider uncertainty in path costs and task availabilities. As future work, we are considering analyzing situations with stricter constraints such as a partial ordering over the task set. In the present work, multirobot coordination is handled using a lightweight coordination mechanism where all robots, except one, stop. We are investigating techniques to integrate tighter, but fast, coordination approaches to improve the coordination of robots, such as exchanging path plans and calculating a plan in the joint configuration space, only when robots get within close proximity of each other.
References
 [1] R. Zlot and A. Stentz, “Marketbased multirobot coordination for complex tasks,” I. J. Robotic Res., vol. 25, no. 1, pp. 73–101, 2006.
 [2] W. Lenagh, P. Dasgupta, and A. MuñozMeléndez, “A spatial queuingbased algorithm for multirobot task allocation,” Robotics, vol. 4, no. 3, pp. 316–340, 2015.
 [3] S. Rutishauser, N. Correll, and A. Martinoli, “Collaborative coverage using a swarm of networked miniature robots,” Robotics and Autonomous Systems, vol. 57, no. 5, pp. 517–525, 2009.
 [4] G. A. Korsah, A. Stentz, and M. B. Dias, “A comprehensive taxonomy for multirobot task allocation,” I. J. Robotic Res., vol. 32, no. 12, pp. 1495–1512, 2013.
 [5] G. Wagner and H. Choset, “Subdimensional expansion for multirobot path planning,” Artif. Intell., vol. 219, pp. 1–24, 2015. [Online]. Available: http://dx.doi.org/10.1016/j.artint.2014.11.001
 [6] I. Sucan and L. Kavraki, “Accounting for uncertainty in simultaneous task and motion planning using task motion multigraphs,” in IEEE International Conference on Robotics and Automation, St. Paul, May 2012, pp. 4822–4828.
 [7] H. Choset, W. Burgard, S. Hutchinson, G. Kantor, L. Kavraki, K. Lynch, and S. Thrun, Principles of Robot Motion: Theory, Algorithms, and Implementation. MIT Press, June 2005.
 [8] R. Bohlin and L. Kavraki, “Path planning using lazy prm,” in ICRA, vol. 1, 2000, pp. 521–528.
 [9] S. Lavalle, “Rapidlyexploring random trees: A new tool for path planning,” Tech. Rep., 1998.
 [10] J. Gammell, S. Srinivasa, and T. Barfoot, “Informed rrt*: Optimal samplingbased path planning focused via direct sampling of an admissible ellipsoidal heuristic,” in 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems, Chicago, IL, USA, September 1418, 2014, 2014, pp. 2997–3004. [Online]. Available: http://dx.doi.org/10.1109/IROS.2014.6942976
 [11] K. Hauser, “Lazy collision checking in asymptoticallyoptimal motion planning,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on, May 2015, pp. 2951–2957.
 [12] C. Voss, M. Moll, and L. Kavraki, “A heuristic approach to finding diverse short paths,” in Robotics and Automation (ICRA), 2015 IEEE International Conference on, May 2015, pp. 4173–4179.
 [13] M. Kneebone and R. Dearden, “Navigation planning in probabilistic roadmaps with uncertainty,” in International Conference on Automated Planning and Scheduling, 2009. [Online]. Available: http://aaai.org/ocs/index.php/ICAPS/ICAPS09/paper/view/717/1113
 [14] P. Missiuro and N. Roy, “Adapting probabilistic roadmaps to handle uncertain maps,” in Robotics and Automation, 2006. ICRA 2006. Proceedings 2006 IEEE International Conference on, 2006, pp. 1261–1267.
 [15] V. Desaraju and J. How, “Decentralized path planning for multiagent teams with complex constraints,” Autonomous Robots, vol. 32, no. 4, pp. 385–403, 2012.
 [16] R. Luna and K. Bekris, “Efficient and complete centralized multirobot path planning,” in International Conference on Intelligent Robots and Systems, IROS 2011, 2011, pp. 3268–3275.
 [17] B. Marthi, “Robust navigation execution by planning in belief space,” in Proceedings of Robotics: Science and Systems, Sydney, Australia, July 2012.
 [18] S. Loibl, D. MeyerDelius, and P. Pfaff, “Probabilistic timedependent models for mobile robot path planning in changing environments,” in 2013 IEEE International Conference on Robotics and Automation, Karlsruhe, Germany, May 610, 2013, 2013, pp. 5545–5550.
 [19] B. Gerkey and M. Mataric, “A formal analysis and taxonomy of task allocation in multirobot systems,” The International Journal of Robotics Research, vol. 23, no. 9, pp. 939–954, 2004. [Online]. Available: http://ijr.sagepub.com/content/23/9/939.abstract
 [20] M. B. Dias, R. Zlot, N. Kalra, and A. Stentz, “Marketbased multirobot coordination: A survey and analysis,” Proceedings of the IEEE, vol. 94, no. 7, pp. 1257–1270, July 2006.
 [21] L. Kaelbling and T. LozanoPérez, “Integrated task and motion planning in belief space,” International Journal of Robotics Research, vol. 32, no. 910, 2013.
 [22] J. Wolfe, B. Marthi, and S. Russell, “Combined task and motion planning for mobile manipulation,” in International Conference on Automated Planning and Scheduling, Toronto, Canada, 2010.
 [23] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms. MIT Press, June 2009.
 [24] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Prentice Hall, June 2009.
 [25] H. GarciaMolina, “Elections in a distributed computing system,” IEEE Trans. Computers, vol. 31, no. 1, pp. 48–59, 1982.
 [26] Stargazer from hagisonic inc. [Online]. Available: http://eng.hagisonic.kr/cnt/prod/prod010102?uid=11$&$cateID=2$&$fieldName=$&$orderBy=
 [27] B. Woosley and P. Dasgupta, “Multirobot task allocation with realtime path planning,” in FLAIRS Conference, 2013, pp. 574–579.