DeepMNavigate: Deep Reinforced Multi-Robot Navigation Unifying Local & Global Collision Avoidance
We present a novel algorithm (DeepMNavigate) for global multi-agent navigation in dense scenarios using deep reinforcement learning (DRL). Our approach uses local and global information for each robot from motion information maps. We use a three-layer CNN that takes these maps as input to generate a suitable action to drive each robot to its goal position. Our approach is general, learns an optimal policy using a multi-scenario, multi-state training algorithm, and can directly handle raw sensor measurements for local observations. We demonstrate the performance on dense, complex benchmarks with narrow passages and environments with tens of agents. We highlight the algorithm’s benefits over prior learning methods and geometric decentralized algorithms in complex scenarios.
Multi-robot systems are increasingly being used for different applications, including surveillance, quality control systems, autonomous guided vehicles, warehouses, cleaning machines, etc. A key challenge is to develop efficient algorithms for navigating such robots in complex scenarios while avoiding collisions with each other and the obstacles in the environment. As larger numbers of robots are used, more efficient methods are needed that can handle dense and complex scenarios.
Multi-agent navigation has been studied extensively in robotics, AI, and computer animation. At a broad level, previous approaches can be classified into centralized [32, 33, 18, 24, 28, 36] or decentralized planners [31, 9, 12, 7]. One benefit of decentralized methods is that they can scale to a large number of agents, though it is difficult to provide any global guarantees on the resulting trajectories or to handle challenging scenarios with narrow passages (see Figures 1 and 2).
Recently, there has been considerable work on developing new, learning-based planning algorithms [6, 21, 4, 5, 15, 16] for navigating one or more robots through dense scenarios. Most of these learning methods tend to learn an optimal policy using a multi-scenario, multi-stage training algorithm. However, current learning-based methods are limited to using only the local information and do not exploit any global information. Therefore, it is difficult to use them in dense environments or narrow passages.
Main Results: We present a novel, multi-robot navigation algorithm (DeepMNavigate) based on reinforcement learning that exploits a combination of local and global information. We use a multi-stage training scheme that uses various multi-robot simulation scenarios with global information. In terms of training, we represent the global information using a motion information map that includes the location of each agent or robot. We place the robot information in the corresponding position to generate a bit-map and use the resulting map as an input to a three-layer CNN. Our CNN considers this global information along with local observations in the scene to generate a suitable action to drive each robot to the goal without collisions. We have evaluated our algorithm in dense environments with many tens of robots (e.g., 90 robots) navigating in tight scenarios with narrow passages. As compared to prior multi-robot methods, our approach offers the following benefits:
We use global knowledge in the form of motion information maps in our network to improve the performance of DRL. This also results in higher reward values.
Our approach scales with the number of robots and is able to compute collision-free and smooth trajectories. It takes tens of seconds on a PC with a 32-core CPU and one NVIDIA RTX GPU for multi-robot systems with many tens of robots.
Our algorithm is general and can handle a wide variety of scenes. We highlight its performance on benchmarks that are quite different from the training scenarios.
Our algorithm easily handles challenging multi-robot scenarios, such as interchanging robot positions or multiple narrow corridors, which are difficult for prior geometric decentralized or local learning methods. We highlight the performance on five difficult environments.
II Related Work
II-A Geometric Multi-Robot Navigation Algorithms
Most prior algorithms are based on geometric techniques such as sampling-based methods, geometric optimization, or complete motion planning algorithms. Centralized methods assume that each robot has access to complete information about the state of the other robots via some global data structure or communication system [17, 27, 35, 30] and compute a safe, optimal, and complete solution for navigation. However, they do not scale to large multi-robot systems with tens of robots. Many practical geometric decentralized methods for multi-agent systems are based on reciprocal velocity obstacles or their variants. These synthetic methods can be used during the training phase of learning algorithms.
II-B Learning-Based Navigation Methods
Learning-based collision avoidance techniques usually try to optimize a parameterized policy using the data collected from different tasks. Many navigation algorithms adopt a supervised learning paradigm to train collision avoidance policies. Muller et al.  present a vision-based static obstacle avoidance system using a 6-layer CNN to map input images to steering angles. Zhang et al.  describe a successor-feature-based deep reinforcement learning algorithm for robot navigation tasks based on raw sensory data. Barreto et al.  apply transfer learning to deploy a policy for new problem instances. Sergeant et al.  propose an approach based on multimodal deep autoencoders that enables a robot to learn how to navigate by observing a dataset of sensor inputs and motor commands collected while being tele-operated by a human. Ross et al.  adapt an imitation learning technique to train reactive heading policies based on the knowledge of a human pilot. Pfeiffer et al.  map the laser scan and goal positions to motion commands using expert demonstrations. To be effective, these methods need to collect training data in different environments, and the performance is limited by the quality of the training sets.
To overcome the limitations of supervised learning, Tai et al. present a mapless motion planner trained end-to-end without any manually designed features or prior demonstrations. Kahn et al. propose an uncertainty-aware, model-based learning algorithm that estimates the probability of collision and then uses that information to minimize collisions at training time. To extend learning-based methods to highly dynamic environments, several decentralized techniques have been proposed. Godoy et al. propose a Bayesian inference approach that computes a plan minimizing the number of collisions while driving the robot to its goal. Chen et al. [4, 3] and Everett et al. present multi-robot collision avoidance policies based on deep reinforcement learning, which require deploying multiple sensors to estimate the states of nearby agents and moving obstacles. Yoon et al. extend the framework of centralized training with decentralized execution to perform additional optimization for inter-agent communication. Fan et al. and Long et al. [15, 16] describe a decentralized multi-robot collision avoidance framework in which each robot makes navigation decisions independently, without any communication with other agents. This framework has been extended with multiple sensors and explicit pedestrian motion prediction. Other methods account for social constraints. However, none of these methods utilizes global information about the robots or the environment, which could be used to improve the optimality of the resulting paths and to handle challenging narrow scenarios.
III Multi-Robot Navigation
III-A Problem Formulation and Notation
We consider the multi-robot navigation problem for non-holonomic differential drive robots. Our goal is to design a scheme that avoids collisions with obstacles and other robots and works well in dense and general environments. We describe the approach for 2D, but it can be extended to 3D workspaces and to robots with other dynamics constraints.
Let the number of robots be $N$. We represent each robot as a disc with radius $R$. At each timestep $t$, the $i$-th robot has access to an observation $\mathbf{o}_t^i$ and then computes an action $\mathbf{a}_t^i$ that drives it towards its goal $\mathbf{g}^i$ from the current position $\mathbf{p}_t^i$. The observation of each robot includes four parts: $\mathbf{o}_t^i = [\mathbf{o}_z, \mathbf{o}_g, \mathbf{o}_v, \mathbf{o}_m]$, where $\mathbf{o}_z$ denotes the sensor measurement (e.g., laser sensor) of its surrounding environment, $\mathbf{o}_g$ stands for its relative goal position, $\mathbf{o}_v$ refers to its current velocity, and $\mathbf{o}_m$ is the robot motion information, which includes the global state of the system, discussed in Section IV. In this paper, we focus on analyzing and incorporating motion information into the navigation system. Meanwhile, there are static obstacles in the environment. We use $B_k$ to denote the area occupied by the $k$-th static obstacle. The computed action drives the robot to its goal while avoiding collisions with other robots and obstacles within the time horizon until the next observation is received.
Let $L = \{l_i,\ i = 1, \dots, N\}$ be the set of trajectories for all robots, subject to each robot's kinematic constraints.
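The per-robot observation and action above can be sketched as a small data structure. This is an illustrative Python sketch, not the paper's code; the field shapes (e.g., a 512-sample laser scan, an 8x8 map) are assumptions:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative sketch of the observation o^i_t = [o_z, o_g, o_v, o_m] and
# action a^i_t defined above. Field shapes (e.g., 512 laser samples) are
# assumptions, not the paper's exact dimensions.
@dataclass
class Observation:
    o_z: List[float]            # raw sensor (laser) measurements
    o_g: Tuple[float, float]    # relative goal position (polar coordinates)
    o_v: Tuple[float, float]    # current translational / rotational velocity
    o_m: List[List[int]]        # motion information map (Sec. IV)

@dataclass
class Action:
    v: float                    # translational velocity
    w: float                    # rotational velocity

obs = Observation(o_z=[0.0] * 512, o_g=(3.0, 0.5), o_v=(0.2, 0.0),
                  o_m=[[0] * 8 for _ in range(8)])
act = Action(v=0.5, w=0.1)
```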
III-B Multi-Agent Navigation Using Reinforcement Learning
Our approach builds on prior reinforcement learning approaches that use local information comprised of various observations. Some of them utilize only three of the four elements mentioned in Sec. III-A. The term $\mathbf{o}_z$ may include the measurements of the last three consecutive frames from a sensor. The relative goal position $\mathbf{o}_g$ in these cases is a 2D vector representing the goal position in polar coordinates with respect to the robot's current position. The observed velocity $\mathbf{o}_v$ includes the current velocity of the robot. These observations are normalized using the statistics aggregated over training. The action of a differential drive robot includes the translational and rotational velocity, i.e., $\mathbf{a}_t = [v_t, \omega_t]$. We use the following reward function to guide a team of robots:
$$r_t^i = (r_g)_t^i + (r_c)_t^i + (r_w)_t^i.$$
When the robot gets closer to or reaches its goal, it is rewarded as
$$(r_g)_t^i = \begin{cases} r_{arrival} & \text{if } \|\mathbf{p}_t^i - \mathbf{g}^i\| < \epsilon, \\ \omega_g \left(\|\mathbf{p}_{t-1}^i - \mathbf{g}^i\| - \|\mathbf{p}_t^i - \mathbf{g}^i\|\right) & \text{otherwise}. \end{cases}$$
When there is a collision, it is penalized using the function
$$(r_c)_t^i = \begin{cases} r_{collision} & \text{if a collision occurs}, \\ 0 & \text{otherwise}. \end{cases}$$
In addition to collision avoidance, one of our goals is generating a smooth path. A simple technique is to impose penalties whenever there are large rotational velocities. This can be expressed as $(r_w)_t^i = \omega_w |\omega_t^i|$ whenever $|\omega_t^i|$ exceeds a threshold, and $0$ otherwise.
Here $\omega_g$, $\omega_w$, and the terminal rewards $r_{arrival}$ and $r_{collision}$ are parameters that control the reward.
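The three reward terms can be combined in a short function. This is an illustrative sketch: the parameter values and thresholds below are assumptions, not the paper's tuned settings:

```python
import math

# Illustrative sketch of the per-step reward r_t^i = (r_g) + (r_c) + (r_w)
# described above. All constants here are assumed values for illustration.
R_ARRIVAL = 15.0    # terminal reward for reaching the goal
R_COLLISION = -15.0 # terminal penalty for a collision
W_G = 2.5           # weight on progress towards the goal
W_W = -0.1          # penalty weight on large rotational velocities
GOAL_EPS = 0.1      # distance at which the goal counts as reached
W_THRESH = 0.7      # rotational-velocity threshold

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def step_reward(p_prev, p_curr, goal, omega, collided):
    """Reward for one robot and one timestep."""
    # Goal term: terminal bonus on arrival, otherwise reward progress.
    if dist(p_curr, goal) < GOAL_EPS:
        r_g = R_ARRIVAL
    else:
        r_g = W_G * (dist(p_prev, goal) - dist(p_curr, goal))
    # Collision term.
    r_c = R_COLLISION if collided else 0.0
    # Smoothness term: penalize large rotational velocities.
    r_w = W_W * abs(omega) if abs(omega) > W_THRESH else 0.0
    return r_g + r_c + r_w
```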
IV DeepMNavigate: Trajectory Computation Using Global Information
In this section, we present our novel, learning-based multi-agent navigation algorithm that uses positional information of other agents. Our formulation is based on motion information maps and uses a three-layer CNN to generate a suitable action for each agent.
IV-A Motion Information Maps
Prior rule-based decentralized methods use information corresponding to the position and velocity of each agent to compute a locally-optimal, collision-free trajectory. Our goal is to compute similar state information to design better learning-based navigation algorithms. Such state information can either be gathered through communication with nearby robots or computed using a deep network that uses raw sensor data. In our formulation, we use maps that consist of each agent's location as input. In particular, we use two different map representations: one covers all the robots in the world coordinate system and is called the global-map; the other is centered at each robot's current location, uses a relative coordinate system, and is called the local-map.
We use the following method to compute the global-map and the local-map. During each timestep $t$, we specify the $i$-th robot's position in the world frame as $\mathbf{p}_t^i$. We also use the goal positions $\mathbf{g}^i$ and the obstacle information $B_k$ to build the map $M^i$ for the $i$-th robot. Assume the size of the simulated robot scenario is $H \times W$, where $H$ represents the height, $W$ represents the width, and the origin of the world frame is located at the corner of the scenario. Each pixel $M^i(x, y)$ of the map indicates which kind of object lies in the corresponding small area of the world frame. Assuming each object has radius $r$, $M^i(x, y)$ is assigned as follows: “1” represents the corresponding robot, “2” represents the neighboring robots, “3” represents the robot's goal, “4” represents the obstacles, and “0” represents the empty background (i.e., free space). This is highlighted in the pipeline figure.
In some scenarios, there could be restrictions on the robot's movement in terms of static obstacles or regions that are not accessible. Our global-map computation takes this into account by representing the obstacle regions $B_k$. However, these maps may not capture a scene without clear boundaries. In such cases, we use a local-map for each agent instead of considering the whole scenario: these maps only account for information in a relatively small neighborhood of fixed size around each agent. The size of the local neighborhood can be tuned to obtain better performance for different applications. In addition to the position information, these maps may contain other state information of the robot, such as velocity, orientation, or dynamics constraints, as additional channels.
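The global-map and local-map construction can be sketched as follows. This is an illustrative sketch: the grid resolution (64x64), world size, and local window size are assumptions, not the paper's settings:

```python
import numpy as np

# Illustrative sketch of the global-map / local-map construction described
# above. Cell codes follow the text: 0 = free space, 1 = this robot,
# 2 = neighboring robots, 3 = this robot's goal, 4 = obstacles.
FREE, SELF, OTHER, GOAL, OBSTACLE = 0, 1, 2, 3, 4

def to_cell(p, shape=(64, 64), world=(32.0, 32.0)):
    """World coordinates (x, y) -> grid indices (row, col)."""
    H, W = shape
    return (min(int(p[1] / world[1] * H), H - 1),
            min(int(p[0] / world[0] * W), W - 1))

def global_map(positions, goals, obstacles, i, shape=(64, 64)):
    """Bit-map M^i for robot i in world coordinates."""
    m = np.zeros(shape, dtype=np.uint8)
    for o in obstacles:
        m[to_cell(o, shape)] = OBSTACLE
    m[to_cell(goals[i], shape)] = GOAL
    for j, p in enumerate(positions):       # robots drawn last, on top
        m[to_cell(p, shape)] = SELF if j == i else OTHER
    return m

def local_map(gmap, positions, i, half=7):
    """(2*half+1)-sized crop of the global map centred at robot i,
    zero-padded at the map borders."""
    r, c = to_cell(positions[i], gmap.shape)
    padded = np.zeros((gmap.shape[0] + 2 * half, gmap.shape[1] + 2 * half),
                      dtype=gmap.dtype)
    padded[half:half + gmap.shape[0], half:half + gmap.shape[1]] = gmap
    return padded[r:r + 2 * half + 1, c:c + 2 * half + 1]
```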
IV-B Proximal Policy Optimization
We use proximal policy optimization to optimize the overall system. This training algorithm has the stability and reliability of trust-region methods: it tries to compute an update at each step that minimizes the cost function while ensuring that the deviation from the previous policy is relatively small. The resulting proximal policy algorithm updates the network using all the steps after several continuous simulations (i.e., after each robot in the entire system reaches its goal or stops running due to collisions) instead of using only one step, which ensures the stability of the network optimization. In this setting, if we stored the robot positions or motion information as dense matrices corresponding to the map formulation above, it would require a significant amount of memory and also increase the overall training time. Instead, we use a sparse matrix representation for $M^i$. We compute the non-zero entries of $M^i$ from the current position of each robot, the goal positions, and the obstacle information using the map formulation above. To feed the input to our neural network, we generate dense representations from the temporary sparse storage. This design choice allows us to train the system using trajectories from 58 agents performing 450 actions using only 2.5GB of memory. More details on the training step are given in Sec. V.
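The sparse storage scheme can be sketched with a simple coordinate dictionary, an illustrative stand-in for a sparse-matrix library; the map shapes are assumptions:

```python
# Illustrative sketch of the sparse storage scheme described above: instead
# of keeping one dense H x W map per robot per timestep, we keep only the
# non-zero entries as (row, col) -> code, and densify on demand when
# feeding the network.
def to_sparse(dense):
    """Dense map (list of lists) -> {(row, col): code} with zeros dropped."""
    return {(r, c): v
            for r, row in enumerate(dense)
            for c, v in enumerate(row) if v != 0}

def to_dense(sparse, shape):
    """{(row, col): code} -> dense list-of-lists of the given shape."""
    H, W = shape
    dense = [[0] * W for _ in range(H)]
    for (r, c), v in sparse.items():
        dense[r][c] = v
    return dense
```

For $N$ robots, each map has only a handful of non-zero cells, so the stored size per map drops from $O(H \cdot W)$ to roughly $O(N)$ across a whole trajectory buffer.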
To analyze a large matrix and produce a low-dimensional feature for the input $\mathbf{o}_m$, we use a convolutional neural network (CNN), since such network structures are effective at handling image-like input (such as our map representation). Our network has three convolutional layers, as shown in the pipeline figure. The network extracts the relative locations of different objects in the environment and guides the overall planner to avoid other agents and obstacles while reaching the goal.
Our approach to handling raw sensor data (e.g., 2D laser scanner data) uses the same structure as local methods, i.e., a two-layer 1D convolutional network. Overall, we use a two-layer, fully-connected network that takes as input the observation features, including the features generated by the 1D and 2D CNNs, the relative goal position, and the observed velocity. This generates the action output and the local path for each robot. We highlight the whole pipeline of the network for local and global information in the pipeline figure.
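The map-processing branch can be sketched in plain NumPy. The filter counts and kernel sizes below are assumptions (the architecture table's entries did not survive extraction), and the loop-based convolution is written for clarity, not speed:

```python
import numpy as np

# Illustrative sketch of the three-layer map CNN described above: three
# stride-1 'SAME' convolutions with ReLU, each followed by 2x2 max pooling,
# flattened into a low-dimensional feature vector.
def conv2d_same(x, w):
    """x: (C, H, W); w: (F, C, k, k). Stride-1 'SAME' convolution + ReLU."""
    F, C, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    H, W = x.shape[1:]
    out = np.zeros((F, H, W))
    for f in range(F):
        for r in range(H):
            for c in range(W):
                out[f, r, c] = np.sum(xp[:, r:r + k, c:c + k] * w[f])
    return np.maximum(out, 0.0)  # ReLU

def max_pool2(x):
    """2x2 max pooling on (C, H, W)."""
    C, H, W = x.shape
    return x[:, :H // 2 * 2, :W // 2 * 2].reshape(
        C, H // 2, 2, W // 2, 2).max(axis=(2, 4))

def map_features(m, weights):
    """Map (H, W) -> flat feature vector via three conv + pool stages."""
    x = m[None].astype(float)            # single input channel
    for w in weights:
        x = max_pool2(conv2d_same(x, w))
    return x.ravel()

rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 1, 3, 3)) * 0.1,   # 64x64 -> 32x32
           rng.normal(size=(8, 4, 3, 3)) * 0.1,   # 32x32 -> 16x16
           rng.normal(size=(8, 8, 3, 3)) * 0.1]   # 16x16 -> 8x8
feat = map_features(np.zeros((64, 64)), weights)  # 8 channels * 8 * 8 cells
```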
[Table: map-network architecture, listing for each of the three convolution and max-pooling stages the convolutional filter, stride, padding ('SAME'), activation function, and output size.]
IV-D Network Training
Our training strategy extends the method used by learning algorithms based on local information [15, 4]. To accelerate the training process, we divide the overall training computation into two stages. In the first stage, we use $k$ robots with random initial positions and random goals in a fully-free environment. In the second stage, we include more challenging environments, such as narrow passages and random obstacles. We generate a large number of scenes with different kinds of narrow passages and random obstacles, with many tens of robots and many tens of obstacles. These varied training environments and the large number of robots in the system result in a good overall policy. Moreover, the use of global information results in a much larger number of network parameters, including a fully-connected (FC) layer, which is more difficult to train than the relatively small network that only accounts for local information. To accelerate the training process and generate accurate results, we do not train the entire network from scratch; instead, we use the multi-stage approach: we first train the network part that handles local information and afterward retrain the network with the additional structures corresponding to the global information (i.e., the global-map), initialized from the pretrained local-information network. We highlight the reward as a function of the training time during the second stage of the training and compare it with the local method in the reward figure. Notice that, since we use a 2D convolutional neural network, our overall training algorithm needs more time during each iteration of training. As a result, we do not perform the same number of training iterations as the local method, as shown in the reward figure.
V Implementation and Performance
In this section, we discuss the performance of our multi-agent navigation algorithm (DeepMNavigate) on complex scenarios and highlight its benefits over prior reinforcement learning methods that only use local information.
During simulation, the robot radius is fixed. In the current implementation, we use a fixed resolution for the global-map and a smaller, fixed-size window for each local-map. Although a larger map could capture more details of the system, it would significantly increase the network size and the final running time; for instance, the memory requirement of the CNN increases quadratically with the input size. In the current implementation, we use an NVIDIA RTX 2080 Ti GPU with 11GB of memory. The parameters of the reward function are chosen so that the network converges relatively quickly while achieving a high reward, striking a balance between training time and accuracy.
[Tables: evaluation metrics for each method versus the number of agents (30 to 90, with circle radius 8 or 12 units) in the circle-crossing scenario, and versus the number of agents in the other benchmarks.]
V-B Evaluation Metrics and Benchmarks
To evaluate the performance of our navigation algorithm, we use the following metrics:
Success rate: the ratio of the number of robots reaching their goals in a certain time limit without any collisions to the total number of robots in the environment.
Collision or stuck rate: the ratio of robots that collide or cannot reach their destination within the time limit; such robots are counted as in-collision or stuck, respectively.
Extra time: the difference between the average travel time of all robots and the lower bound of the robots’ travel time. The latter is computed as the average travel time when going straight towards the goal at the maximum speed without checking for any collisions.
Average speed: the average speed of all robots during the navigation.
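The metrics above can be computed from recorded trajectories as follows; the trajectory format, goal tolerance, and time limit are illustrative assumptions:

```python
import math

# Illustrative sketch of the four evaluation metrics above. Trajectories
# are per-robot lists of (x, y) positions sampled every dt seconds; the
# goal tolerance eps and the time limit are assumed values.
def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def evaluate(trajs, goals, collided, time_limit, max_speed, dt=0.1, eps=0.1):
    """Returns success rate, collision-or-stuck rate, extra time, avg speed."""
    n = len(trajs)
    successes, travel, lower, speeds = 0, [], [], []
    for traj, goal, hit in zip(trajs, goals, collided):
        t = (len(traj) - 1) * dt                        # travel time
        reached = _dist(traj[-1], goal) < eps
        if not hit and reached and t <= time_limit:
            successes += 1
        travel.append(t)
        # Lower bound: going straight to the goal at maximum speed.
        lower.append(_dist(traj[0], goal) / max_speed)
        length = sum(_dist(a, b) for a, b in zip(traj, traj[1:]))
        speeds.append(length / t if t > 0 else 0.0)
    return {
        "success_rate": successes / n,
        "failure_rate": 1.0 - successes / n,            # collision or stuck
        "extra_time": sum(travel) / n - sum(lower) / n,
        "avg_speed": sum(speeds) / n,
    }
```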
We have evaluated our algorithm in five challenging and representative benchmarks:
Narrow Corridor: Two groups of robots exchange their positions through a narrow corridor. This benchmark is hard for geometric decentralized methods, which cannot navigate robots through narrow passages (see the corridor figure).
Room with Obstacles: The robots cross a room full of obstacles, from one side to the other. Methods using only local information spend more time finding a path to the goal and may even fail due to the lack of global information (see the obstacles figure).
Random Starts and Goals: The robots start from random initial positions and move to random goal positions. In addition, there are obstacles with random locations and orientations. Global information helps find safer and faster paths (see the random-scene figure).
Room Evacuation: The robots start from random initial positions and evacuate to the outside of the room. They need to pass through one small door while avoiding collisions (see the evacuation figure).
V-C New and Unseen Benchmarks
We have evaluated the performance on benchmarks that are quite different from the training data or that use a different number of agents. In the Circle Crossing benchmark, we evaluate in a scene similar to training but with a different number of agents. Furthermore, some of the benchmarks, like Narrow Corridor and Room Evacuation, are quite different from the training dataset. DeepMNavigate is still able to compute smooth, collision-free trajectories for all the agents, and each agent arrives at its goal position. As shown in Fig. 2, the local learning method does not consider the global map information and fails in such scenarios. In contrast, our approach enables the robots to learn a reciprocal navigation behavior from the global map information without any communication about action decisions.
V-D Quantitative Evaluation
We have evaluated the performance of our algorithm in terms of the different evaluation metrics described above. In the circle crossing scenario, the failure rate of the local method rises with the increasing number or density of robots. In contrast, our approach consistently results in stable performance and can avoid deadlock situations. At times, some of the robots may need to take a longer path to avoid congestion, which can reduce a robot's efficiency. Overall, we obtain better performance than local learning methods or decentralized collision avoidance methods in high-density benchmarks like Circle Crossing.
Unlike the circle crossing scenario, the other benchmarks incorporate static obstacles in the environment. In this case, our method integrates the map information into the policy network and utilizes that information to handle such static obstacles and narrow passages. As the experimental results show, our approach outperforms the local method both in terms of the success rate and the efficiency metrics. In addition, our method behaves robustly even with increasing robot density.
Another important criterion for evaluating the performance of multi-robot systems is the collision or stuck rate, which measures the number of navigation failures. As shown in Tables II and III, our collision rate is zero for all these benchmarks. On the other hand, techniques based on local navigation information result in some failures on different benchmarks. Furthermore, the number of failures increases as the number of agents or the density increases.
The running times for different numbers of robots are shown in the running-time figure. We observe near-linear performance in the number of agents. This is because the decision process of our method is decentralized: each agent computes its action by itself, based on the information it receives.
VI Conclusion, Limitations, and Future Work
We present a novel multi-agent navigation algorithm based on deep reinforcement learning. We show how global information about the environment can be used in the form of the global-map and present a novel network architecture to compute a collision-free trajectory for each robot. We demonstrate its performance on many challenging scenarios and show benefits over geometric decentralized methods and reinforcement learning methods that only use local information. Even when trained on simple scenarios, our DeepMNavigate algorithm can robustly handle complex scenarios that cannot be handled by prior learning-based methods or decentralized multi-agent navigation algorithms. Furthermore, we demonstrate the performance on new, unseen benchmarks that differ from the training scenarios. Overall, our approach demonstrates the value of global information, in the form of discretized maps, for DRL-based methods.
Our results are promising and there are many ways to improve the performance. We need better techniques to compute the optimal size of the global-map and the local-map, and we can also include other components of the state information like velocity, orientation, or dynamics constraints. We also need to extend the approach to handle general dynamic scenes where no information is available about the motion of the moving obstacles. The use of global information increases the complexity of the training computation and we use a two-stage algorithm to reduce its running time. One other possibility is to use an auto-encoder to automatically derive the low-dimensional feature representations and then use them as a feature extractor. It may be possible to split the global and local navigation computation to combine our approach with local methods to avoid collisions with other agents or dynamic obstacles [31, 9]. Such a combination of local and global methods has been used to simulate large crowds  and it may be useful to develop a similar framework for learning-based algorithms.
We have only demonstrated the application of our DRL-method on challenging, synthetic environments. A good area for future work is extending to real-world scenes, where we need to use other techniques to generate the motion information maps. It would be useful to combine our learning methods with SLAM techniques to improve the navigation capabilities.
- (2013) Optimal reciprocal collision avoidance for multiple non-holonomic robots. In Distributed Autonomous Robotic Systems, pp. 203–216. Cited by: §II-A.
- (2017) Successor features for transfer in reinforcement learning. In Advances in neural information processing systems, pp. 4055–4065. Cited by: §II-B.
- (2017) Socially aware motion planning with deep reinforcement learning. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350. Cited by: §II-B.
- (2017) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In ICRA, pp. 285–292. Cited by: Fig. 1, §I, §II-B, §IV-D, Fig. 5.
- (2018) Motion planning among dynamic, decision-making agents with deep reinforcement learning. In IROS, pp. 3052–3059. Cited by: §I, §II-B.
- (2018) Fully distributed multi-robot collision avoidance via deep reinforcement learning for safe and efficient navigation in complex scenarios. arXiv preprint arXiv:1808.03841. Cited by: Fig. 1, §I, §II-B.
- (1997) The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine 4 (1), pp. 23–33. Cited by: §I, §II.
- (2004) Inevitable collision states: a step towards safer robots? Advanced Robotics 18 (10), pp. 1001–1024. Cited by: §I, §II.
- (2008) Using the corridor map method for path planning for a large number of characters. In International Workshop on Motion in Games, pp. 11–22. Cited by: §I, §VI.
- (2016) Moving in a crowd: safe and efficient navigation among heterogeneous agents.. In IJCAI, pp. 294–300. Cited by: §II-B.
- (2017) Efficient multi-agent global navigation using interpolating bridges. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4391–4398. Cited by: §II.
- (1995) Social force model for pedestrian dynamics. Physical review E 51 (5), pp. 4282. Cited by: §I.
- (2020) DenseCAvoid: real-time navigation in dense crowds using anticipatory behaviors. arXiv preprint. Cited by: §II-B.
- (2017) Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182. Cited by: §II-B.
- (2018) Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6252–6259. Cited by: Fig. 1, §I, 1(c), 1(d), Fig. 2, §II-B, Fig. 4, §IV-C, §IV-D, Fig. 5, Fig. 8, 1st item, 3rd item, §V-C, §V-D, §V-D, TABLE II, TABLE III, §V.
- (2017) Deep-learned collision avoidance policy for distributed multiagent navigation. IEEE Robotics and Automation Letters 2 (2), pp. 656–663. Cited by: §I, §II-B.
- (2011) Efficient and complete centralized multi-robot path planning. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3268–3275. Cited by: §II-A.
- (2011) Push and swap: fast cooperative path-finding with completeness guarantees. In Twenty-Second International Joint Conference on Artificial Intelligence, Cited by: §I.
- (2006) Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems, pp. 739–746. Cited by: §II-B.
- (2009) Aggregate dynamics for dense crowd simulation. In ACM transactions on graphics (TOG), Vol. 28, pp. 122. Cited by: §VI.
- (2016) From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots. arXiv preprint arXiv:1609.07910. Cited by: §I.
- (2017) From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots. In 2017 IEEE international conference on robotics and automation (icra), pp. 1527–1533. Cited by: §II-B.
- (2013) Learning monocular reactive uav control in cluttered natural environments. In 2013 IEEE international conference on robotics and automation, pp. 1765–1772. Cited by: §II-B.
- (2002) Using a prm planner to compare centralized and decoupled planning for multi-robot systems. In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), Vol. 2, pp. 2112–2119. Cited by: §I.
- (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §IV-B.
- (2015) Multimodal deep autoencoders for control of a mobile robot. In Proc. of Australasian Conf. for Robotics and Automation (ACRA), Cited by: §II-B.
- (2015) Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219, pp. 40–66. Cited by: §II-A.
- (2016) On the hardness of unlabeled multi-robot motion planning. The International Journal of Robotics Research 35 (14), pp. 1750–1759. Cited by: §I.
- (2017) Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 31–36. Cited by: §II-B.
- (2018) Hold or take optimal plan (hoop): a quadratic programming approach to multi-robot trajectory generation. The International Journal of Robotics Research 37 (9), pp. 1062–1084. Cited by: §II-A.
- (2011) Reciprocal n-body collision avoidance. In Robotics research, pp. 3–19. Cited by: §I, Fig. 2, §II-A, §IV-A, 1st item, 2nd item, §VI.
- (2005) Prioritized motion planning for multiple robots. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 430–435. Cited by: §I, §II.
- (2009) Centralized path planning for multiple robots: optimal decoupling into sequential plans.. In Robotics: Science and systems, Vol. 2, pp. 2–3. Cited by: §I.
- (2019) Learning to communicate: a machine learning framework for heterogeneous multi-agent robotic systems. In AIAA Scitech 2019 Forum, pp. 1456. Cited by: §II-B.
- (2016) Optimal multirobot path planning on graphs: complete algorithms and effective heuristics. IEEE Transactions on Robotics 32 (5), pp. 1163–1177. Cited by: §II-A.
- (2018) An effective algorithmic framework for near optimal multi-robot path planning. In Robotics Research, pp. 495–511. Cited by: §I.
- (2017) Deep reinforcement learning with successor features for navigation across similar environments. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2371–2378. Cited by: §II-B.