DeepMNavigate: Deep Reinforced Multi-Robot Navigation Unifying Local & Global Collision Avoidance


We present a novel algorithm (DeepMNavigate) for global multi-agent navigation in dense scenarios using deep reinforcement learning (DRL). Our approach uses local and global information for each robot from motion information maps. We use a three-layer CNN that takes these maps as input to generate a suitable action to drive each robot to its goal position. Our approach is general, learns an optimal policy using a multi-scenario, multi-state training algorithm, and can directly handle raw sensor measurements for local observations. We demonstrate the performance on dense, complex benchmarks with narrow passages and environments with tens of agents. We highlight the algorithm’s benefits over prior learning methods and geometric decentralized algorithms in complex scenarios.



I Introduction

Multi-robot systems are increasingly being used for different applications, including surveillance, quality control systems, autonomous guided vehicles, warehouses, cleaning machines, etc. A key challenge is to develop efficient algorithms for navigating such robots in complex scenarios while avoiding collisions with each other and the obstacles in the environment. As larger numbers of robots are used, more efficient methods are needed that can handle dense and complex scenarios.

Multi-agent navigation has been studied extensively in robotics, AI, and computer animation. At a broad level, previous approaches can be classified into centralized [32, 33, 18, 24, 28, 36] or decentralized planners [31, 9, 12, 7]. One benefit of decentralized methods is that they can scale to a large number of agents, though it is difficult to provide any global guarantees on the resulting trajectories [8] or to handle challenging scenarios with narrow passages (see Figures 1 or 2).

Recently, there has been considerable work on developing new, learning-based planning algorithms [6, 21, 4, 5, 15, 16] for navigating one or more robots through dense scenarios. Most of these learning methods tend to learn an optimal policy using a multi-scenario, multi-stage training algorithm. However, current learning-based methods are limited to using only the local information and do not exploit any global information. Therefore, it is difficult to use them in dense environments or narrow passages.

Fig. 1: Circle Crossing: Simulated trajectories of robots in circle-crossing scenarios generated by our algorithm, which uses global information. The yellow points are the initial positions of the robots and the red points are the diametrically opposite goals of the robots. Our DeepMNavigate algorithm can handle such scenarios without collisions along the trajectories, and all the robots reach their goals. Prior learning methods that only use local information [15, 6, 4] cannot handle such scenarios, as the robots tend to get stuck.

Main Results: We present a novel, multi-robot navigation algorithm (DeepMNavigate) based on reinforcement learning that exploits a combination of local and global information. We use a multi-stage training scheme that uses various multi-robot simulation scenarios with global information. In terms of training, we represent the global information using a motion information map that includes the location of each agent or robot. We place the robot information in the corresponding position to generate a bit-map and use the resulting map as an input to a three-layer CNN. Our CNN considers this global information along with local observations in the scene to generate a suitable action to drive each robot to the goal without collisions. We have evaluated our algorithm in dense environments with many tens of robots (e.g., 90 robots) navigating in tight scenarios with narrow passages. As compared to prior multi-robot methods, our approach offers the following benefits:

  1. We use global knowledge in the form of motion information maps in our network to improve the performance of DRL. This also results in a higher reward value.

  2. Our approach scales with the number of robots and is able to compute collision-free and smooth trajectories. It takes many tens of seconds on a PC with a 32-core CPU and one NVIDIA RTX GPU for multi-robot systems with many tens of robots.

  3. Our algorithm is general and can handle a wide variety of scenes. We highlight its performance on benchmarks that are quite different from the training scenarios.

  4. Our algorithm easily handles challenging multi-robot scenarios, such as interchanging robot positions or multiple narrow corridors, which are difficult for prior geometric decentralized or local learning methods. We highlight the performance on five difficult environments.

II Related Work

II-A Geometric Multi-Robot Navigation Algorithms

Most prior algorithms are based on geometric techniques such as sampling-based methods, geometric optimization, or complete motion planning algorithms. The centralized methods assume that each robot has access to complete information about the state of the other robots based on some global data structure or communication system [17, 27, 35, 30] and compute a safe, optimal, and complete solution for navigation. However, they do not scale to large multi-robot systems with tens of robots. Many practical geometric decentralized methods for multi-agent systems are based on reciprocal velocity obstacles [31] or its variants [1]. These synthetic methods can be used during the training phase of learning algorithms.

II-B Learning-Based Navigation Methods

Learning-based collision avoidance techniques usually try to optimize a parameterized policy using the data collected from different tasks. Many navigation algorithms adopt a supervised learning paradigm to train collision avoidance policies. Muller et al. [19] present a vision-based static obstacle avoidance system using a 6-layer CNN to map input images to steering angles. Zhang et al. [37] describe a successor-feature-based deep reinforcement learning algorithm for robot navigation tasks based on raw sensory data. Barreto et al. [2] apply transfer learning to deploy a policy for new problem instances. Sergeant et al. [26] propose an approach based on multimodal deep autoencoders that enables a robot to learn how to navigate by observing a dataset of sensor inputs and motor commands collected while being tele-operated by a human. Ross et al. [23] adapt an imitation learning technique to train reactive heading policies based on the knowledge of a human pilot. Pfeiffer et al. [22] map the laser scan and goal positions to motion commands using expert demonstrations. To be effective, these methods need to collect training data in different environments, and the performance is limited by the quality of the training sets.

To overcome the limitations of supervised learning, Tai et al. [29] present a mapless motion planner trained end-to-end without any manually designed features or prior demonstrations. Kahn et al. [14] propose an uncertainty-aware model-based learning algorithm that estimates the probability of collision, then uses that information to minimize the collisions at training time. To extend learning-based methods to highly dynamic environments, some decentralized techniques have been proposed. Godoy et al. [10] propose a Bayesian inference approach that computes a plan that minimizes the number of collisions while driving the robot to its goal. Chen et al. [4, 3] and Everett et al. [5] present multi-robot collision avoidance policies based on deep reinforcement learning, requiring the deployment of multiple sensors to estimate the state of nearby agents and moving obstacles. Yoon et al. [34] extend the framework of centralized training with decentralized execution to perform additional optimization for inter-agent communication. Fan et al. [6] and Long et al. [15, 16] describe a decentralized multi-robot collision avoidance framework where each robot makes navigation decisions independently without any communication with other agents. It has been extended in terms of multiple sensors and explicit pedestrian motion prediction [13]. Other methods account for social constraints [5]. However, none of these methods utilizes global information about the robots or the environment, which could be used to improve the optimality of the resulting paths or to handle challenging narrow scenarios.

(a) Our algorithm is able to compute collision-free trajectories for the robots.
(b) We highlight some of the computed trajectories with temporal information computed using our algorithm.
(c) Trajectories computed using the local learning method [15]. Not all agents reach their goal positions.
(d) Selected trajectories computed by [15] with temporal information. The agents get stuck.
Fig. 2: Narrow Corridor: We show the trajectories computed by DeepMNavigate and [15] for two groups of robots ( total) exchanging their positions through narrow corridors. In (a) and (c), the yellow points correspond to the initial positions and the red points correspond to the final positions. In (b) and (d), we highlight the temporal information along the trajectories using color and transparency. Prior local planning methods [15] or geometric decentralized methods [31] cannot handle such cases, and we highlight the benefits of our approach. This benchmark is quite different from the training datasets.

III Multi-Robot Navigation

III-A Problem Formulation and Notation

We consider the multi-robot navigation problem for non-holonomic differential drive robots. Our goal is to design a scheme that avoids collisions with obstacles and other robots and works well in dense and general environments. We describe the approach for 2D, but it can be extended to 3D workspaces and to robots with other dynamics constraints.

Let the number of robots be . We represent each robot as a disc with radius . At each timestep , the -th robot has access to an observation and then computes an action that drives it towards its goal from the current position . The observation of each robot includes four parts: , where denotes the sensor measurement (e.g., a laser scan) of its surrounding environment, stands for its relative goal position, refers to its current velocity, and is the robot motion information, which includes the global state of the system, discussed in Section IV. In this paper, we focus on analyzing and incorporating motion information into the navigation system. In addition, there are static obstacles in the environment. We use to denote the area occupied by the -th static obstacle. The computed action drives the robot to its goal while avoiding collisions with other robots and obstacles within the time horizon until the next observation is received.
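For concreteness, the four-part observation can be sketched as a simple container (a Python sketch; the field names and example values are ours, not the paper's notation):

```python
from collections import namedtuple

# Per-robot observation: sensor scan, relative goal, current velocity,
# and the motion-information map (Sec. IV). Field names are illustrative.
Observation = namedtuple("Observation", ["scan", "rel_goal", "velocity", "motion_map"])

obs = Observation(scan=[0.5, 0.7, 0.9],   # e.g., laser ranges
                  rel_goal=(3.0, 1.5),    # goal relative to the robot
                  velocity=(0.4, 0.1),    # translational, rotational
                  motion_map=None)        # filled in by the map builder
```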

Let be the set of trajectories for all robots, subject to each robot's kinematic constraints.

III-B Multi-Agent Navigation Using Reinforcement Learning

Our approach builds on prior reinforcement learning approaches that use local information comprising various observations. Some of them utilize only three of the four elements mentioned in III-A. The term may include the measurements of the last three consecutive frames from a sensor. The relative goal position in these cases is a 2D vector representing the goal position in polar coordinates with respect to the robot's current position. The observed velocity includes the current velocity of the robot. These observations are normalized using the statistics aggregated over training. The action of a differential robot includes the translational and rotational velocity, i.e. . We use the following reward function to guide a team of robots:

$$r^i_t = ({}^g r)^i_t + ({}^c r)^i_t + ({}^w r)^i_t.$$

When the robot gets closer to or reaches its goal, it is rewarded as

$$({}^g r)^i_t = \begin{cases} r_{arrival} & \text{if } \|\mathbf{p}^i_t - \mathbf{g}^i\| < \epsilon_g, \\ \omega_g \left( \|\mathbf{p}^i_{t-1} - \mathbf{g}^i\| - \|\mathbf{p}^i_t - \mathbf{g}^i\| \right) & \text{otherwise.} \end{cases}$$

When there is a collision, it is penalized using the function

$$({}^c r)^i_t = \begin{cases} r_{collision} & \text{if a collision occurs,} \\ 0 & \text{otherwise.} \end{cases}$$

In addition to collision avoidance, one of our goals is generating a smooth path. A simple technique is to impose penalties whenever there are large rotational velocities. This can be expressed as

$$({}^w r)^i_t = \begin{cases} \omega_w |w^i_t| & \text{if } |w^i_t| > \epsilon_w, \\ 0 & \text{otherwise.} \end{cases}$$

$r_{arrival}$, $r_{collision}$, $\omega_g$, and $\omega_w$ are parameters to control the reward; $\epsilon_g$ and $\epsilon_w$ are fixed thresholds.
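The reward terms above can be sketched as follows (a minimal Python sketch; the parameter values and thresholds are illustrative placeholders, not the values used in our experiments):

```python
import math

# Illustrative reward parameters (placeholders, not the paper's values).
R_ARRIVAL, R_COLLISION = 15.0, -15.0   # terminal rewards/penalties
W_G, W_W = 2.5, -0.1                   # progress and smoothness weights
GOAL_EPS, W_LIMIT = 0.1, 0.7           # goal-reached and rotation thresholds

def step_reward(prev_pos, pos, goal, collided, rot_vel):
    """Sum of the goal-progress, collision, and smoothness terms."""
    dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
    # Goal term: arrival bonus, otherwise reward progress towards the goal.
    if dist(pos, goal) < GOAL_EPS:
        r_g = R_ARRIVAL
    else:
        r_g = W_G * (dist(prev_pos, goal) - dist(pos, goal))
    # Collision term: fixed penalty when a collision occurs.
    r_c = R_COLLISION if collided else 0.0
    # Smoothness term: penalize large rotational velocities.
    r_w = W_W * abs(rot_vel) if abs(rot_vel) > W_LIMIT else 0.0
    return r_g + r_c + r_w
```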

Fig. 3: We highlight the architecture of our policy network (DeepMNavigate), including the (global and local) maps used by our approach. The global map is based on the world coordinate system and the local map is centered at the corresponding robot's current location. The red robot represents the map's corresponding robot, black robots represent the neighboring robots, the yellow star represents the goal, and the blue area is an obstacle. In our implementation, the map is discretized and assigned different values. We use a 2D convolutional neural network to handle the additional global information input from the map and a fully-connected network to compute the action for each robot.

IV DeepMNavigate: Trajectory Computation Using Global Information

In this section, we present our novel, learning-based, multi-agent navigation algorithm that uses positional information of other agents. Our formulation is based on motion information maps and uses a three-layer CNN to generate a suitable action for each agent.

IV-A Motion Information Maps

Prior rule-based decentralized methods such as [31] use information corresponding to the position and velocity of each agent to compute a locally-optimal and collision-free trajectory. Our goal is to compute similar state information to design better learning-based navigation algorithms. Such state information can be either gathered based on some communication with nearby robots or computed using a deep network that uses raw sensor data. In our formulation, we use maps that consist of each agent’s location as input. In particular, we use two different map representations: one corresponds to all the robots based on the world coordinate system and is called the global-map; the other map is centered at each robot’s current location and uses the relative coordinate system and is called the local-map.

We use the following method to compute the global-map and the local-map. During each timestep , we specify the -th robot's position in the world frame. We also use the goal positions and the obstacle information to build the map for the -th robot. Assume the size of the simulated robot scenario is $H \times W$, where $H$ represents the height, $W$ represents the width, and the origin of the world frame is located at . Each pixel $M_{x,y}$ of the map indicates which kind of object lies in the corresponding small area of the world frame. Assuming each object's radius is , then $M_{x,y}$ corresponds to:

$$M_{x,y} = \begin{cases} 1 & \text{the corresponding robot,} \\ 2 & \text{a neighboring robot,} \\ 3 & \text{the robot's goal,} \\ 4 & \text{an obstacle,} \\ 0 & \text{empty background,} \end{cases}$$

where "1" represents the corresponding robot, "2" represents the neighboring robots, "3" represents the robot's goal, "4" represents the obstacles, and "0" represents the empty background (i.e., free space). This is highlighted in Fig. 3.

In some scenarios, there could be a restriction on the robot's movement in terms of static obstacles or regions that are not accessible. Our global-map computation takes this into account when representing . However, these maps may not capture a scene with no clear boundaries. In these cases, we use the local-map for each agent instead of considering the whole scenario with size . These maps only account for information in a relatively small neighborhood with a fixed size . The size of the local neighborhood () can be tuned to achieve better performance for different applications. In addition to the position information, these maps may contain other state information of the robot, including velocity, orientation, or dynamics constraints, as additional channels.
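The map construction described above can be sketched as follows (a hedged Python sketch; the grid resolution, the one-cell-per-object rasterization, and the zero-padded border handling are our simplifying assumptions):

```python
import numpy as np

# Label values follow the map definition: 1 = this robot, 2 = neighbor,
# 3 = goal, 4 = obstacle, 0 = free space.
def build_global_map(robot_id, positions, goals, obstacles, h, w, res=1.0):
    """Rasterize the scene into an (h/res, w/res) grid for one robot."""
    m = np.zeros((int(h / res), int(w / res)), dtype=np.int8)
    def put(p, v):
        r, c = int(p[1] / res), int(p[0] / res)
        if 0 <= r < m.shape[0] and 0 <= c < m.shape[1]:
            m[r, c] = v
    for obs in obstacles:                 # simplification: one cell per object
        put(obs, 4)
    for j, p in enumerate(positions):
        put(p, 1 if j == robot_id else 2)
    put(goals[robot_id], 3)
    return m

def crop_local_map(global_map, center, size, res=1.0):
    """Fixed-size window centered at the robot, zero-padded at the borders."""
    half = size // 2
    padded = np.pad(global_map, half, mode="constant")
    r, c = int(center[1] / res) + half, int(center[0] / res) + half
    return padded[r - half : r + half, c - half : c + half]
```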

IV-B Proximal Policy Optimization

We use proximal policy optimization [25] to optimize the overall system. This training algorithm has the stability and reliability of trust-region methods: i.e., it tries to compute an update at each step that minimizes the cost function while ensuring that the deviation from the previous policy is relatively small. The resulting proximal policy algorithm updates the network using all the steps after several continuous simulations (i.e., after each robot in the entire system reaches its goal or stops running due to collisions) instead of using only one step, to ensure the stability of the network optimization. In these cases, if we stored the robot positions or motion information as dense matrices corresponding to the map formulation in Section IV-A, it would require a significant amount of memory and also increase the overall training time. Instead, we use a sparse matrix representation for the maps. We compute the non-zero entries based on the current position of each robot, the goal positions, and the obstacle information using the map definition in Section IV-A. To feed the input to our neural network, we generate dense representations using temporary sparse storage. This design choice allows us to train the system using trajectories from 58 agents performing 450 actions using only 2.5GB of memory. More details on the training step are given in Section V.
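The sparse-storage idea can be sketched as follows (an illustrative Python sketch; the class and its byte-count estimate are ours, not the implementation used for training):

```python
import numpy as np

class SparseMap:
    """Store a mostly-zero motion map as coordinate triplets and densify
    only when the map is batched as network input."""
    def __init__(self, shape):
        self.shape = shape
        self.entries = {}              # (row, col) -> label value

    def set(self, r, c, v):
        self.entries[(r, c)] = v

    def to_dense(self):
        # Temporary dense representation fed to the CNN.
        m = np.zeros(self.shape, dtype=np.int8)
        for (r, c), v in self.entries.items():
            m[r, c] = v
        return m

    def nbytes(self):
        # Rough footprint: three integers per non-zero entry,
        # versus a full dense grid.
        return 3 * 8 * len(self.entries)
```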

IV-C Network

To analyze a large matrix and produce a low-dimensional feature for input , we use a convolutional neural network (CNN), because such network structures are useful for handling image-like input (such as our map representation). Our network has three convolutional layers, with the architecture shown in Fig. 3 and Table I. Our network extracts the relative locations of different objects in the environment and can guide the overall planner to avoid other agents and obstacles while reaching the goal.

Our approach to handling raw sensor data (e.g., 2D laser scanner data) uses the same structure as local methods [15], i.e. a two-layer, 1D convolutional network. Overall, we use a two-layer, fully-connected network that takes as input the observation features, including the features generated by the 1D & 2D CNNs, the relative goal position, and the observed velocity. This generates the action output and local path for each robot. We highlight the whole pipeline of the network for local and global information in Fig. 3.

Layer Convolutional Filter Stride Padding Activation Function Output Size
Input - - - -
Conv 1 ‘SAME’ ReLU
Max Pooling 1 ‘SAME’ -
Conv 2 ‘SAME’ ReLU
Max Pooling 2 ‘SAME’ -
Conv 3 ‘SAME’ ReLU
Max Pooling 3 ‘SAME’ -
Flatten - - - -
Fully connected - - ReLU
Fully connected - - ReLU
TABLE I: Architecture and hyper-parameters of a convolutional neural network that considers the global information.
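The shape bookkeeping behind Table I can be sketched as follows (a Python sketch; the last-layer filter count and the input resolution are assumptions, since the exact hyper-parameters appear in the table):

```python
# With 'SAME' padding, a stride-1 convolution preserves spatial size,
# and each 2x2 max-pool halves it; three conv+pool stages therefore
# divide the input resolution by 8 before flattening.
def conv_same(h, w, stride=1):
    return (h + stride - 1) // stride, (w + stride - 1) // stride

def pool(h, w, k=2):
    return h // k, w // k

def cnn_output_shape(h, w, n_filters_last=64):
    """Flattened feature length after three conv + max-pool stages."""
    for _ in range(3):
        h, w = conv_same(h, w)
        h, w = pool(h, w)
    return h * w * n_filters_last
```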
Fig. 4: We highlight the reward as a function of the number of iterations during the second stage of the overall training algorithm. We compare the performance of a reinforcement learning algorithm that only uses local information [15] to our method, which uses local and global information. Our method obtains a higher reward than [15] due to the global information.

IV-D Network Training

Our training strategy extends the method used by learning algorithms based on local information [15, 4]. To accelerate the training process, we divide the overall training computation into two stages. In the first stage, we use robots (e.g. k=) with random initial positions and random goals in a fully-free environment. In the second stage, we include more challenging environments, such as narrow passages, random obstacles, etc. We generate a large number of scenes with different kinds of narrow passages and random obstacles, with up to or robots and many tens of obstacles. These varied training environments and the large number of robots in the system result in a good overall policy. Moreover, the use of global information results in a much larger number of network parameters, including a FC layer, which makes the network more difficult to train than a relatively simple, small network that only accounts for local information. To accelerate the training process and generate accurate results, we do not train the entire network from scratch; instead, we use the multi-stage approach. After the first stage, we retrain the network with additional structures corresponding to the global information (i.e., the global-map), initialized from the pretrained local-information part. We highlight the reward as a function of the training time during the second stage of the training and compare it with the local method [15] in Fig. 4. Notice that, since we use a 2D convolutional neural network, our overall training algorithm needs more time during each iteration of training. As a result, we do not perform the same number of training iterations as [15], as shown in Fig. 4.

V Implementation and Performance

In this section, we discuss the performance of our multi-agent navigation algorithm (DeepMNavigate) on complex scenarios and highlight the benefits over prior reinforcement learning methods that only use local information [15].

Fig. 5: Room with Obstacles: We highlight the trajectories of robots in a room with multiple obstacles. We highlight the initial position of each agent (yellow) and the final position (red), along with multiple obstacles. Prior learning methods that only use local information [15, 4] will take more time and may not be able to handle such scenarios as the number of obstacles or the number of agents increases.
Fig. 6: Random Start and Goal Positions: Simulated trajectories of robots from and to random positions in a scene with obstacles. The yellow points are the initial positions and the red points are the final positions. The blue areas are obstacles with random locations and orientations.

V-A Parameters

During simulation, the radius is set as . In the current implementation, we set for the global-map and for each local-map. In both situations, we set . Although a larger map (e.g., ) could include more details of the system, it would significantly increase the network size and the final running time. For instance, the memory requirement of the CNN increases quadratically with the input size. In the current implementation, we use an NVIDIA RTX 2080 Ti GPU with 11GB of memory. For the parameters in the reward function, we set , , , and . The current parameter choices let the network converge relatively quickly while achieving a higher reward, striking a balance between training time and accuracy.

Metrics Methods # of agents (circle radius (unit: ))
30 (8) 40 (8) 50 (8) 60 (8) 70 (8) 80 (12) 90 (12)
Success Rate [15]
Stuck/Collision Rate [15]
Extra Time [15]
Average Speed [15]
Metrics Methods # of agents
8 12 16 20
Success Rate [15]
Stuck/Collision Rate [15]
Extra Time [15] -
Average Speed [15] -
Metrics Methods # of agents
5 10 15 20
Success Rate [15]
Stuck/Collision Rate [15]
Extra Time [15]
Average Speed [15]
TABLE II: The performance of our method (DeepMNavigate) and prior methods based on local information on different benchmarks (Top: Circle Crossing; Bottom left: Narrow Corridor; Bottom right: Room with Obstacles.), measured in terms of various metrics using different numbers of agents. The bold entries represent the best performance. Our method can guarantee a higher success rate in dense environments as compared to prior multi-agent navigation algorithms.
Metrics Methods # of agents
20 30 40 50
Success Rate [15]
Stuck/Collision Rate [15]
Extra Time [15]
Average Speed [15]
TABLE III: The performance of our proposed method and prior learning algorithms on the Random Starts and Goals benchmark. Our method demonstrates better results (bold face) than prior multi-agent navigation algorithms.
Fig. 7: Room Evacuation: Simulated trajectories of robots evacuating a room with our algorithm. The yellow points are the initial positions and the red points are the final positions. Even with a narrow passage, our DeepMNavigate algorithm works well. This benchmark is quite different from training data.

V-B Evaluation Metrics and Benchmarks

To evaluate the performance of our navigation algorithm, we use the following metrics:

  • Success rate: the ratio of the number of robots that reach their goals within a certain time limit without any collisions to the total number of robots in the environment.

  • Collision or stuck rate: robots that cannot reach their destinations within the time limit or that collide are counted as stuck or in-collision, respectively.

  • Extra time: the difference between the average travel time of all robots and the lower bound on the robots' travel time. The latter is computed as the average travel time when going straight towards the goal at the maximum speed, without checking for any collisions.

  • Average speed: the average speed of all robots during navigation.
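The four metrics above can be sketched as follows (a hedged Python sketch; the per-robot record fields are illustrative names, not the logging format used in our experiments):

```python
# Each outcome record describes one robot's run; field names are ours.
def success_rate(outcomes):
    ok = sum(1 for o in outcomes if o["reached"] and not o["collided"])
    return ok / len(outcomes)

def stuck_or_collision_rate(outcomes):
    bad = sum(1 for o in outcomes if o["collided"] or not o["reached"])
    return bad / len(outcomes)

def extra_time(outcomes, v_max):
    # Lower bound: straight-line distance to the goal at maximum speed.
    avg_travel = sum(o["travel_time"] for o in outcomes) / len(outcomes)
    avg_bound = sum(o["straight_dist"] / v_max for o in outcomes) / len(outcomes)
    return avg_travel - avg_bound

def average_speed(outcomes):
    return sum(o["path_len"] / o["travel_time"] for o in outcomes) / len(outcomes)
```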

We have evaluated our algorithm in five challenging and representative benchmarks:

  • Circle Crossing: The robots are uniformly placed around a circle, and the goal position of each robot is on the opposite end of the circle based on the diameter. This scenario is widely used in prior multi-agent navigation algorithms [31, 15] (Fig. 1).

  • Narrow Corridor: Two groups of robots exchange their positions through a narrow corridor. This benchmark is hard for geometric decentralized methods [31], which cannot navigate robots through narrow passages (Fig. 2).

  • Room with Obstacles: The robots cross a room full of obstacles, from one side to the other. Methods using only local information [15] will spend more time finding a path to the goal and may even fail due to the lack of global information (Fig. 5).

  • Random Starts and Goals: The robots start from random initial positions and move to random goal positions. There are also obstacles with random locations and orientations. Global information helps find safer and faster paths (Fig. 6).

  • Room Evacuation: The robots start from random initial positions and evacuate to the outside of the room. They need to pass through one small door while avoiding collisions (Fig. 7).
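The Circle Crossing setup can be sketched as follows (a Python sketch; the robot count and circle radius are example arguments):

```python
import math

def circle_crossing(n_robots, radius):
    """Place robots evenly on a circle; each goal is diametrically
    opposite its start position."""
    starts, goals = [], []
    for i in range(n_robots):
        a = 2.0 * math.pi * i / n_robots
        starts.append((radius * math.cos(a), radius * math.sin(a)))
        goals.append((-radius * math.cos(a), -radius * math.sin(a)))
    return starts, goals
```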

V-C New and Unseen Benchmarks

We have evaluated the performance on benchmarks that are quite different from the training data or use a different number of agents. In the Circle Crossing benchmark, we train with only agents, but evaluate in a similar scene with agents. Furthermore, some of the benchmarks, like Narrow Corridor and Room Evacuation, are quite different from the training dataset. DeepMNavigate is still able to compute collision-free and smooth trajectories for all the agents, and each agent arrives at its goal position. As shown in Fig. 2, the local learning method [15] does not consider the global map information and fails in such scenarios. In contrast, our approach enables robots to learn a reciprocal navigation behavior according to the global map information, without any communication about action decisions.

V-D Quantitative Evaluation

We have evaluated the performance of our algorithm in terms of the different evaluation metrics described above. In the circle crossing scenario, the failure rate of [15] rises as the number or density of robots increases. However, our approach always results in stable performance and can avoid deadlock situations. At times, some of the robots may need to take a longer path to avoid congestion, which could reduce their efficiency. Nevertheless, we obtain better performance than [15] or the decentralized collision-avoidance methods in high-density benchmarks like Circle Crossing.

Unlike the circle crossing scenario, the other benchmarks incorporate static obstacles in the environment. In this case, our method integrates the map information into the policy network and utilizes it to handle such static obstacles and narrow passages. As the experimental results show, our approach outperforms [15] both in terms of the success rate and the efficiency metrics. In addition, our method behaves robustly even as the density of robots increases.

Another important criterion for evaluating the performance of multi-robot systems is the collision or stuck rate, which measures the number of navigation failures. As shown in Tables II and III, our collision rate is zero for all these benchmarks. On the other hand, techniques based on local navigation information result in some failures on different benchmarks. Furthermore, the number of failures increases as the number of agents or the density increases.

V-E Scalability

The running times for different numbers of robots are shown in Fig. 8. We observe that the running time scales linearly with the number of agents. This is because the decision process of our method is decentralized: each agent computes its action by itself, based on the information it receives.

Fig. 8: We show the running time with different numbers of agents in several benchmarks. We use a CPU with 32 cores and an NVIDIA RTX 2080 Ti GPU to generate these performance graphs. This timing graph demonstrates that our approach is practical for many tens of robots. We also compare the running time with a learning-based algorithm that only uses local information [15]. The additional overhead in the running time due to the use of global information is rather small.

VI Conclusion, Limitations, and Future Work

We present a novel, multi-agent navigation algorithm based on deep reinforcement learning. We show how global information about the environment can be used based on the global-map and present a novel network architecture to compute a collision-free trajectory for each robot. We highlight its performance on many challenging scenarios and show its benefits over geometric decentralized methods and reinforcement learning methods that only use local information. Even when trained on simple scenarios, our DeepMNavigate algorithm can robustly handle complex scenarios that cannot be handled by prior learning-based methods or decentralized multi-agent navigation algorithms. Furthermore, we demonstrate the performance on new, unseen benchmarks that are different from the training scenarios. Overall, our approach demonstrates the value of global information in terms of discretized maps for DRL-based methods.

Our results are promising and there are many ways to improve the performance. We need better techniques to compute the optimal size of the global-map and the local-map, and we can also include other components of the state information like velocity, orientation, or dynamics constraints. We also need to extend the approach to handle general dynamic scenes where no information is available about the motion of the moving obstacles. The use of global information increases the complexity of the training computation and we use a two-stage algorithm to reduce its running time. One other possibility is to use an auto-encoder to automatically derive the low-dimensional feature representations and then use them as a feature extractor. It may be possible to split the global and local navigation computation to combine our approach with local methods to avoid collisions with other agents or dynamic obstacles [31, 9]. Such a combination of local and global methods has been used to simulate large crowds [20] and it may be useful to develop a similar framework for learning-based algorithms.

We have only demonstrated the application of our DRL method on challenging, synthetic environments. A good area for future work is extending it to real-world scenes, where we need other techniques to generate the motion information maps. It would be useful to combine our learning method with SLAM techniques to improve the navigation capabilities.


  1. J. Alonso-Mora, A. Breitenmoser, M. Rufli, P. Beardsley and R. Siegwart (2013) Optimal reciprocal collision avoidance for multiple non-holonomic robots. In Distributed Autonomous Robotic Systems, pp. 203–216. Cited by: §II-A.
  2. A. Barreto, W. Dabney, R. Munos, J. J. Hunt, T. Schaul, H. P. van Hasselt and D. Silver (2017) Successor features for transfer in reinforcement learning. In Advances in neural information processing systems, pp. 4055–4065. Cited by: §II-B.
  3. Y. F. Chen, M. Everett, M. Liu and J. P. How (2017) Socially aware motion planning with deep reinforcement learning. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350. Cited by: §II-B.
  4. Y. F. Chen, M. Liu, M. Everett and J. P. How (2017) Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning. In ICRA, pp. 285–292. Cited by: Fig. 1, §I, §II-B, §IV-D, Fig. 5.
  5. M. Everett, Y. F. Chen and J. P. How (2018) Motion planning among dynamic, decision-making agents with deep reinforcement learning. In IROS, pp. 3052–3059. Cited by: §I, §II-B.
  6. T. Fan, P. Long, W. Liu and J. Pan (2018) Fully distributed multi-robot collision avoidance via deep reinforcement learning for safe and efficient navigation in complex scenarios. arXiv preprint arXiv:1808.03841. Cited by: Fig. 1, §I, §II-B.
  7. D. Fox, W. Burgard and S. Thrun (1997) The dynamic window approach to collision avoidance. IEEE Robotics & Automation Magazine 4 (1), pp. 23–33. Cited by: §I, §II.
  8. T. Fraichard and H. Asama (2004) Inevitable collision states—a step towards safer robots?. Advanced Robotics 18 (10), pp. 1001–1024. Cited by: §I, §II.
  9. R. Geraerts, A. Kamphuis, I. Karamouzas and M. Overmars (2008) Using the corridor map method for path planning for a large number of characters. In International Workshop on Motion in Games, pp. 11–22. Cited by: §I, §VI.
  10. J. Godoy, I. Karamouzas, S. J. Guy and M. L. Gini (2016) Moving in a crowd: safe and efficient navigation among heterogeneous agents.. In IJCAI, pp. 294–300. Cited by: §II-B.
  11. L. He, J. Pan and D. Manocha (2017) Efficient multi-agent global navigation using interpolating bridges. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 4391–4398. Cited by: §II.
  12. D. Helbing and P. Molnar (1995) Social force model for pedestrian dynamics. Physical review E 51 (5), pp. 4282. Cited by: §I.
  13. A. Jagan Sathyamoorthy, J. Liang, U. Patel, T. Guan, R. Chandra and D. Manocha (2020) DenseCAvoid: real-time navigation in dense crowds using anticipatory behaviors. arXiv preprint. Cited by: §II-B.
  14. G. Kahn, A. Villaflor, V. Pong, P. Abbeel and S. Levine (2017) Uncertainty-aware reinforcement learning for collision avoidance. arXiv preprint arXiv:1702.01182. Cited by: §II-B.
  15. P. Long, T. Fanl, X. Liao, W. Liu, H. Zhang and J. Pan (2018) Towards optimally decentralized multi-robot collision avoidance via deep reinforcement learning. In 2018 IEEE International Conference on Robotics and Automation (ICRA), pp. 6252–6259. Cited by: Fig. 1, §I, 1(c), 1(d), Fig. 2, §II-B, Fig. 4, §IV-C, §IV-D, Fig. 5, Fig. 8, 1st item, 3rd item, §V-C, §V-D, §V-D, TABLE II, TABLE III, §V.
  16. P. Long, W. Liu and J. Pan (2017) Deep-learned collision avoidance policy for distributed multiagent navigation. IEEE Robotics and Automation Letters 2 (2), pp. 656–663. Cited by: §I, §II-B.
  17. R. Luna and K. E. Bekris (2011) Efficient and complete centralized multi-robot path planning. In 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3268–3275. Cited by: §II-A.
  18. R. J. Luna and K. E. Bekris (2011) Push and swap: fast cooperative path-finding with completeness guarantees. In Twenty-Second International Joint Conference on Artificial Intelligence, Cited by: §I.
  19. U. Muller, J. Ben, E. Cosatto, B. Flepp and Y. L. Cun (2006) Off-road obstacle avoidance through end-to-end learning. In Advances in neural information processing systems, pp. 739–746. Cited by: §II-B.
  20. R. Narain, A. Golas, S. Curtis and M. C. Lin (2009) Aggregate dynamics for dense crowd simulation. In ACM transactions on graphics (TOG), Vol. 28, pp. 122. Cited by: §VI.
  21. M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart and C. Cadena (2016-09) From Perception to Decision: A Data-driven Approach to End-to-end Motion Planning for Autonomous Ground Robots. arXiv e-prints, pp. arXiv:1609.07910. External Links: 1609.07910 Cited by: §I.
  22. M. Pfeiffer, M. Schaeuble, J. Nieto, R. Siegwart and C. Cadena (2017) From perception to decision: a data-driven approach to end-to-end motion planning for autonomous ground robots. In 2017 IEEE international conference on robotics and automation (icra), pp. 1527–1533. Cited by: §II-B.
  23. S. Ross, N. Melik-Barkhudarov, K. S. Shankar, A. Wendel, D. Dey, J. A. Bagnell and M. Hebert (2013) Learning monocular reactive uav control in cluttered natural environments. In 2013 IEEE international conference on robotics and automation, pp. 1765–1772. Cited by: §II-B.
  24. G. Sanchez and J. Latombe (2002) Using a prm planner to compare centralized and decoupled planning for multi-robot systems. In Proceedings 2002 IEEE International Conference on Robotics and Automation (Cat. No. 02CH37292), Vol. 2, pp. 2112–2119. Cited by: §I.
  25. J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov (2017) Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. Cited by: §IV-B.
  26. J. Sergeant, N. Sünderhauf, M. Milford and B. Upcroft (2015) Multimodal deep autoencoders for control of a mobile robot. In Proc. of Australasian Conf. for Robotics and Automation (ACRA), Cited by: §II-B.
  27. G. Sharon, R. Stern, A. Felner and N. R. Sturtevant (2015) Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence 219, pp. 40–66. Cited by: §II-A.
  28. K. Solovey and D. Halperin (2016) On the hardness of unlabeled multi-robot motion planning. The International Journal of Robotics Research 35 (14), pp. 1750–1759. Cited by: §I.
  29. L. Tai, G. Paolo and M. Liu (2017) Virtual-to-real deep reinforcement learning: continuous control of mobile robots for mapless navigation. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 31–36. Cited by: §II-B.
  30. S. Tang, J. Thomas and V. Kumar (2018) Hold or take optimal plan (hoop): a quadratic programming approach to multi-robot trajectory generation. The International Journal of Robotics Research 37 (9), pp. 1062–1084. Cited by: §II-A.
  31. J. Van Den Berg, S. J. Guy, M. Lin and D. Manocha (2011) Reciprocal n-body collision avoidance. In Robotics research, pp. 3–19. Cited by: §I, Fig. 2, §II-A, §IV-A, 1st item, 2nd item, §VI.
  32. J. P. Van Den Berg and M. H. Overmars (2005) Prioritized motion planning for multiple robots. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 430–435. Cited by: §I, §II.
  33. J. van Den Berg, J. Snoeyink, M. C. Lin and D. Manocha (2009) Centralized path planning for multiple robots: optimal decoupling into sequential plans.. In Robotics: Science and systems, Vol. 2, pp. 2–3. Cited by: §I.
  34. H. Yoon, H. Chen, K. Long, H. Zhang, A. Gahlawat, D. Lee and N. Hovakimyan (2019) Learning to communicate: a machine learning framework for heterogeneous multi-agent robotic systems. In AIAA Scitech 2019 Forum, pp. 1456. Cited by: §II-B.
  35. J. Yu and S. M. LaValle (2016) Optimal multirobot path planning on graphs: complete algorithms and effective heuristics. IEEE Transactions on Robotics 32 (5), pp. 1163–1177. Cited by: §II-A.
  36. J. Yu and D. Rus (2018) An effective algorithmic framework for near optimal multi-robot path planning. In Robotics Research, pp. 495–511. Cited by: §I.
  37. J. Zhang, J. T. Springenberg, J. Boedecker and W. Burgard (2017) Deep reinforcement learning with successor features for navigation across similar environments. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 2371–2378. Cited by: §II-B.