LAC-Nav: Collision-Free Mutiagent Navigation Based on The Local Action Cells

Li Ning and Yong Zhang
Shenzhen Institutes of Advanced Technology, Chinese Academy of Science
{li.ning, zhangyong}

Collision avoidance is one of the primary problems in decentralized multiagent navigation: while the agents move towards their own targets, they must avoid colliding with one another. In this paper, we introduce the concept of the local action cell, which provides each agent with a set of velocities that are safe to perform. Consequently, as long as the local action cells are updated on time and each agent selects its motion within the corresponding cell, no collision is caused. Furthermore, we couple the local action cell with an adaptive learning framework, in which the performance of the selected motions is evaluated and used as a reference for the decisions in the following updates. The efficiency of the proposed approaches is demonstrated through experiments on three commonly considered scenarios, with comparisons against several well-studied strategies.

1 Introduction

Collision-free navigation is a fundamental problem in the design of multiagent systems, which are widely applied in fields such as robot control and traffic engineering. When moving the agents in an environment with static or dynamic obstacles, it is usually necessary to plan the trajectories so that no collision is caused. As the number of agents increases and the environment becomes large, planning the real-time motions for all agents in a centralized manner requires a huge amount of computation, and is often restricted by the efficiency of the communication between the agents and the planning monitor. Therefore, it is natural (and sometimes necessary) to consider decentralized navigation approaches, in which each individual agent is responsible for sensing the nearby obstacles and performing a proper motion to progress towards its destination without causing any collisions. On the other hand, as a consequence of the decentralized navigation, it is in general difficult for the agents to fully coordinate before making their independent moves. Thus the agents should also be treated, and avoided, as obstacles to each other.

As noted in existing works, when avoiding collisions with the other agents, it is important to take into account the fact that they are also intelligent enough to perform collision-avoidance motions (otherwise, undesirable oscillations may be observed during the navigation). Consequently, no individual agent needs to take all the responsibility for making sure that the performed motion is safe. ORCA [Berg2011] is a well-known decentralized approach that guarantees to generate the optimal reciprocal collision-free velocities, except under certain conditions with densely packed agents. BVC [Zhou2017] has been proposed to restrict the agents to moving inside non-intersecting areas, and thus collision avoidance is guaranteed. After the safe field of motions (i.e. the safe range of velocities or the safe area of positions) is determined, both the ORCA-based and the BVC-based approaches usually select the motion within the safe field that is closest to the preferred motion. Such a greedy strategy is natural and widely used in local-search-based optimization. However, it may lead to less efficient performance in multiagent navigation, as the agents may refuse to detour until there is no chance to approach the target. In the worst case, with the greedy selection, agents may get stuck in a loop of two or more situations (also known as deadlocks). Although some tricks have been proposed to fix such drawbacks (including the ideas described in [Zhou2017]), they are not always valid in concrete implementations, and the improvements vary from case to case.

In this work, in order to improve the navigation efficiency, we extend the buffered Voronoi cell [Zhou2017] to the velocity space, and consider the relative velocities for their effect on potential conflicts. In selecting the motion to perform, the traveling progress is also considered, and consequently the agents may detour earlier, as long as approaching the target directly leads to less progress in the moving distance.

Problem formulation.

In this work, we consider a set of disk-shaped agents moving in the plane. At any time point, agent of position is free to change its velocity , and after a short time , it moves to , provided that there are no collisions between the agents (i.e. the distance between any pair of agents is at least the sum of their radii). A decentralized navigation approach runs independently on each individual agent , and based on observations of the environment, it updates the velocity in order to guide agent to the given and fixed destination/target . As the measure of an approach’s performance, we want all the agents to arrive at their destinations/targets as soon as possible, without causing any collisions.
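The motion model and the collision condition stated above can be sketched as follows. This is only an illustration: the function names `step` and `collision_free`, and the representation of positions and velocities as 2-tuples, are assumptions that do not come from the paper.

```python
import math


def step(position, velocity, tau):
    """Move one agent for a duration tau at a constant velocity."""
    return (position[0] + velocity[0] * tau,
            position[1] + velocity[1] * tau)


def collision_free(positions, radii):
    """Return True if every pair of disks is at least the sum of radii apart."""
    count = len(positions)
    for i in range(count):
        for j in range(i + 1, count):
            gap = math.dist(positions[i], positions[j])
            if gap < radii[i] + radii[j]:
                return False
    return True
```

Two disks that exactly touch are treated as non-colliding here, matching the "at least the sum of their radii" wording.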

Our contributions.

We introduce the concept of the local action cell to specify the underlying choices for the selection of the motion to perform, and propose two approaches (LAC-Nav and LAC-Learn) that guarantee collision-free navigation. While the LAC-Nav approach simply performs the action of the largest penalized length (among all choices in the local action cell), the LAC-Learn approach evaluates the performed actions and adjusts the selection based on an adaptive learning framework. The experimental results show that the proposed approaches achieve shorter completion times (formally defined in the section “Experiments”) than several well-studied approaches.

Related works.

Velocity-based collision-free navigation has been extensively studied in the last two decades. The idea of reciprocal velocity obstacles (RVO, [Berg2008]) was introduced to reduce the problem of calculating the collision-free motion to solving a low-dimensional linear program, based on the definition of velocity obstacles [Fiorini1998], and it was further improved to derive the optimal reciprocal collision avoidance (ORCA, [Berg2011]) framework, which guarantees the optimal reciprocal collision-free motions, except under certain conditions with densely packed agents. While the safety of the final motion is guaranteed by ORCA, the ALAN [Godoy2015] online learning framework has been proposed for adapting the preferred motions of multiple agents without the need for offline training, and CNav [Godoy2016] is designed to allow the agents to take the others’ preferred motions into account and adjust accordingly to achieve better coordination in crowded environments. Notice that although the efficiency of CNav has been demonstrated through experiments, it requires sharing some private information of the agents, such as their preferred motions or their targets, which is often a controversial issue in practical applications.

As the well known Voronoi diagram can be used to divide the working space into non-intersecting areas, it has been also adopted for the collision-free path planning with multiple robots [Garrido2006, Bhattacharya2008]. Inspired by the algorithms for the coverage control of the agents [Pimenta2008], and a Voronoi-cell-based algorithm [Bandyopadhyay2014] which is introduced to avoid collisions within a larger probabilistic swarm, the buffered Voronoi cell (BVC, [Zhou2017]) approach has been proposed to achieve the collision avoidance guarantee for the multiagent navigation, based on only the information of the positions. With the up-to-date information of the others’ positions, the agents are restricted to move in the non-intersecting areas, and thus there should be no collisions. In [Senbaslar2019], a trajectory planning algorithm was proposed to navigate the agents under the higher-order dynamic limits, in which BVC is used as the low-level strategy to avoid collisions.

2 The Local Action Cells

In this paper, we assume that all the agents in have the same radius for the simplicity of the argument (for the case when the agents have different radii, the arguments in this paper can be directly extended by substituting the classical Voronoi diagram with its weighted variant). Thus for any time and any pair of non-colliding agents and , it always holds that , where stands for .

Recall that in [Zhou2017], the buffered Voronoi cell of agent is defined as

which implies a safe velocity domain

for agent to change and maintain its velocity in order to reach a point in , where is the length of the time interval between two consecutive updates. Equivalently, domain can be presented as

where is the unit vector along the same direction with , i.e. . Obviously, domain is the intersection of the half-planes ’s for each agent , with

Assuming that agent is moving at velocity and agent is moving at velocity , we estimate the colliding risk by calculating


and define the safe half-plane of agent according to agent as a subset of

where is the relax factor indicating how much weight the agent gives to long-sighted decisions; it is set to throughout this paper.

Now, we are ready to define the local action cell (LAC) of agent , denoted by , as a subset of velocities in the intersection of all the safe half-planes, i.e.

where indicates the maximum moving speed, is the destination/target of agent , denotes the angle (in radians) of the clockwise rotation of the argument vector to align with the positive direction of the -axis, and is a set of candidate angles which is defined by

throughout this paper. (See Figure 1 for an illustration of the local action cell of an agent moving through two neighbors.)

Figure 1: The local action cell of an agent (the black one) moving through two neighbors.
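As a point of reference for the construction above, the following sketch tests membership in a buffered Voronoi cell and in the safe velocity domain it implies. It covers only the geometric idea recalled from [Zhou2017] (the Voronoi boundary towards each neighbor is retracted by the agent radius), and it does not include the relative-velocity adjustment or the relax factor of the safe half-planes. The function names and the tuple representation are assumptions made for illustration.

```python
import math


def in_buffered_voronoi_cell(point, p_i, neighbor_positions, radius):
    """Check whether `point` lies in the buffered Voronoi cell of the agent at p_i.

    For each neighbor, the point must be at least `radius` away from the
    perpendicular bisector, on the side of p_i.
    """
    for p_j in neighbor_positions:
        dx, dy = p_j[0] - p_i[0], p_j[1] - p_i[1]
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            raise ValueError("agent and neighbor share the same position")
        mid_x, mid_y = (p_i[0] + p_j[0]) / 2.0, (p_i[1] + p_j[1]) / 2.0
        # Signed distance from the bisector, positive towards the neighbor.
        signed = ((point[0] - mid_x) * dx + (point[1] - mid_y) * dy) / dist
        if signed > -radius:
            return False
    return True


def velocity_is_safe(velocity, p_i, neighbor_positions, radius, tau):
    """A velocity is safe if holding it for tau keeps the agent inside its cell."""
    reached = (p_i[0] + velocity[0] * tau, p_i[1] + velocity[1] * tau)
    return in_buffered_voronoi_cell(reached, p_i, neighbor_positions, radius)
```

Each test is a half-plane condition, so the safe velocity domain is an intersection of half-planes, which is the property the local action cell builds on.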

3 Collision-Free Navigation

In this section, we introduce a distributed approach, named LAC-Nav, for collision-free navigation with multiple agents. As shown in Algorithm 1, the approach is straightforward, with the following steps executed in a loop: for each agent , calculate the current local action cell, and then select a proper velocity from the cell.

Algorithm 2 follows the definition of the local action cell and describes the calculation details; Algorithm 3 shows how the new velocity is selected. Given the current local action cell , each velocity is first evaluated according to the penalized length , where is the factor that is initialized as and decreases exponentially with the angle between and the direction of . Finally, the velocity with the maximum penalized length is returned as the result.

1 while  is not at the destination do
      2 LAC;
      3 SelectVel;
      4 agent moves at velocity ;
Algorithm 1 LAC-Nav: The LAC-based navigation algorithm running on agent .
1 ;
2 ;
3 for  to  do
      4 calculate such that and mod ;
6 for agent with  do
      7 calculate the safe half-plane ;
      8 for  do
            9 ;
            10 ;
11 Return: ;
Algorithm 2 LAC: Calculate the current local action cell of agent .
1 for  do
      2 mod ;
      3 ;
      4 ;
6 Return: ;
Algorithm 3 SelectVel: Select a velocity inside cell as the new velocity to move at.
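The selection step of Algorithm 3 can be sketched as below. This is a hedged illustration, not the authors' implementation: the cell is represented as a list of `(angle_steps, velocity)` pairs, `decay` stands in for the penalty factor, and applying one power of `decay` per angular step away from the goal direction is an assumption about how the exponential decrease is realized.

```python
import math


def penalized_length(velocity, angle_steps, decay):
    """Length of the velocity, discounted once per angular step from the goal."""
    return (decay ** angle_steps) * math.hypot(velocity[0], velocity[1])


def select_velocity(cell, decay=0.9):
    """Return the velocity of maximum penalized length from the cell.

    `cell` is a list of (angle_steps, velocity) pairs; an empty cell
    yields the zero velocity (an assumed fallback).
    """
    if not cell:
        return (0.0, 0.0)
    best = max(cell, key=lambda item: penalized_length(item[1], item[0], decay))
    return best[1]
```

For example, with `decay=0.5` a goal-directed velocity that has been shortened to length 0.1 loses against a full-length velocity one angular step away (penalized length 0.5), which is the early-detour behavior described in the text.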

While calculating the local action cells, it is not necessary to consider all the agents in the environment. When the distance between agent and agent is at least , it holds directly that and . Thus the corresponding safe half-planes can be ignored in the calculation of the agents’ local action cells, which implies it is sufficient to consider only the neighbors within distance .

Processing complexity.

When considering only the agents within a distance , the number of an agent’s neighbors is at most , since there is no overlap between the neighbors and, for each of them, at least of the body is covered by the disk of radius . Consequently, the loop of Lines is executed a constant number of times within one update step of an individual agent. Thus, the processing complexity of LAC is determined by the efficiency of detecting the neighbors in the specified range. In the simulations, the neighbors can be efficiently obtained by querying a KD-Tree that maintains all the positions. In more practical cases, the neighbor detection is often executed in a parallel process, and it can be assumed that the required information is always ready when it is needed.
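The neighbor filtering described above can be sketched with a plain linear scan. Note that this deliberately replaces the KD-Tree query mentioned in the text with a brute-force loop to keep the example dependency-free; the function name and the `cutoff` parameter (standing for the distance beyond which the safe half-planes can be ignored) are assumptions.

```python
import math


def neighbors_within(index, positions, cutoff):
    """Indices of the agents closer than `cutoff` to agent `index`.

    Agents at distance `cutoff` or more are skipped, since their safe
    half-planes do not constrain the local action cell.
    """
    own = positions[index]
    return [j for j, other in enumerate(positions)
            if j != index and math.dist(own, other) < cutoff]
```

A spatial index such as a KD-Tree would return the same set of indices; only the cost of the lookup changes.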

Learning with LAC.

In LAC-Nav, the new velocity is selected according to the penalized length, which can be roughly seen as an estimate of the traveling distance of the next move. On the other hand, it is also common to evaluate the performed actions and record the results, which also provides the information that may be useful for making decisions in the future. In the case when a specific behavior should perform well for a period of time, selecting the action of the best known evaluation should be more promising than trying based on the estimates only. Generally, the evaluations are learned as the agent keeps running in the “sense-evaluate-act” cycles.

Following the ALAN learning framework [Godoy2015], we propose the LAC-Learn approach, in which the reward of the latest performed action is defined as the sum of the penalized lengths of the velocities in the resulting local action cell. Notice that by this definition, the reward naturally incorporates both the goal-oriented performance and the politeness performance, which are treated as two separate components in ALAN. In fact, the lengths of the velocities approaching the destination reflect how efficiently the performed action gets the agent closer to the goal, and the lengths of the velocities in the local action cell as a whole reflect the efficiency in avoiding crowded situations. Apart from the definition of the action reward, LAC-Learn also selects the new velocity in a different way from ALAN. With LAC-Learn, the selected new velocity is the one corresponding to the action that maximizes a linear combination of the reward and the penalized velocity length.

1 while  is not at the destination do
      2 LAC;
      3 CalcWeights;
      4 UpdateReward;
      5 UpdateWUCB;
      6 SelectAct;
      7 ;
      8 agent moves at velocity ;
Algorithm 4 LAC-Learn: Navigation algorithm of agent while learning with the local action cells.
1 ;
2 Null;
3 if  then
      4 if  then
            5 ;
            6 ;
            8 ;
9 if  Null then
      10 take from uniformly at random;
      11 if  then
            12 ;
            14 ;
15 Return: ;
Algorithm 5 SelectAct: Select the action for agent to perform.

Inside an execution cycle of some agent , after the local action cell is calculated by LAC (Algorithm 2), the penalized length of each velocity in is calculated as in Line  of SelectVel (Algorithm 3), and saved in a set . In UpdateReward, the reward of the last performed action is updated to the sum of all weights in , as mentioned earlier.

Notice that although the velocities given by may vary from step to step, in the local view, they can always be interpreted as the actions corresponding to the angles specified in . For example, without considering the variation of the length, the velocity pointing to the destination can always be interpreted as the action corresponding to angle . For an action/angle , we use to denote the velocity such that .

Following the ALAN learning framework, we calculate (by UpdateWUCB) and maintain (in ) the upper confidence bound within a moving time window (i.e. a sequence of consecutive time steps), which is used when the agent explores in the action space. As defined in [Godoy2015], the wUCB score of action during the last steps is defined by

where is the average reward of action , denotes the number of times action has been chosen, and denotes the total number of performed actions, all with respect to the moving time window.
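The windowed score described above can be sketched as follows, assuming the usual UCB1-style form: the mean reward of an action plus the square root of 2 ln(n) / n_a, where n is the number of actions in the window and n_a the number of times the action appears in it. The class name `WindowedUCB` and the convention of returning infinity for an action not seen in the window are illustrative assumptions, not taken from the paper.

```python
import math
from collections import deque


class WindowedUCB:
    """Upper confidence bound computed over the most recent `window` steps."""

    def __init__(self, window):
        # Each entry is an (action, reward) pair; old entries drop out.
        self.history = deque(maxlen=window)

    def record(self, action, reward):
        self.history.append((action, reward))

    def score(self, action):
        rewards = [r for a, r in self.history if a == action]
        if not rewards:
            # Assumed convention: unseen actions are explored first.
            return math.inf
        total = len(self.history)
        mean = sum(rewards) / len(rewards)
        return mean + math.sqrt(2.0 * math.log(total) / len(rewards))
```

Because the window has a fixed length, an action that performed well long ago gradually loses its recorded rewards and becomes attractive for exploration again.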

Similar to the context-aware action selection approach proposed in [Godoy2015], SelectAct (Algorithm 5) decides with the “win-stay, lose-shift” strategy and the adaptive -greedy strategy in which the wUCB suggested action is chosen for the exploration.

When the agent is in the winning state (i.e. the goal-oriented action was performed in the last update and is still a good choice in the sense that the corresponding velocity is little constrained), it is natural to keep heading towards the goal. Otherwise, if the agent is in the losing state, it follows the -greedy strategy and exploits the action that maximizes a linear combination of the action reward and the penalized length of the corresponding velocity. With a small and adaptively adjusted probability, the agent instead explores and performs the action that maximizes the wUCB score.
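The decision rule described above can be sketched as below. This is a simplified illustration under several assumptions: actions are list indices with index 0 taken to be the goal-oriented action, the linear combination is written as `mix * reward + (1 - mix) * length`, and the adaptive adjustment of the exploration probability is left out. None of these names come from the paper.

```python
import random

GOAL_ACTION = 0  # assumed index of the action pointing at the destination


def select_action(winning, rewards, lengths, wucb, mix, epsilon, rng=random):
    """Win-stay, lose-shift with an epsilon-greedy choice in the losing state.

    rewards, lengths and wucb are lists indexed by action.
    """
    if winning:
        # Win-stay: keep moving towards the goal.
        return GOAL_ACTION
    actions = range(len(rewards))
    if rng.random() < epsilon:
        # Explore: take the action suggested by the wUCB score.
        return max(actions, key=lambda a: wucb[a])
    # Exploit: best mix of learned reward and penalized length.
    return max(actions, key=lambda a: mix * rewards[a] + (1.0 - mix) * lengths[a])
```

Passing a seeded `random.Random` instance as `rng` makes a simulation run reproducible.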

The hyper-parameters (Line  in Algorithm 5) and (Line  in Algorithm 5) are chosen depending on the scenario.

4 Experiments

In this section, we present the results of running experiments with LAC-Nav and LAC-Learn on a computer with an Intel Core i7-6700 CPU ( GHz). The simulations are implemented in Python , and the update processes of individual agents are sped up by applying a multitasking scheme. Each second, updates are performed for each agent, and therefore we set in the implementation of LAC-Nav and LAC-Learn.


For the experiments, we considered three scenarios (Figure 2): the reflection scenario, the circle scenario and the crowd scenario, where

  • in the reflection scenario, two groups of agents start from the left side and right side of the area, respectively (Figure 2(a)). For each agent, the target is the position on the other side that is symmetric to its start position (Figure 2(d)). Through navigating the agents to the target positions, the picture of the starting configuration is reflected.

  • in the circle scenario, the agents start in layers of circles (Figure 2(b)), and each agent targets the antipodal position (Figure 2(e)). That is, the picture of starting configuration is going to be “rotated” by half of a circle, around the origin/center.

  • in the crowd scenario, the start positions (Figure 2(c)) and target positions (Figure 2(f)) are randomly picked from a small area.
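The target assignment in the first two scenarios can be sketched as follows. The coordinate frame is an assumption: the reflection is taken across the vertical axis x = 0 and the circles are centered at the origin by default. The helper names and the single-circle start layout are illustrative only.

```python
import math


def reflection_target(start):
    """Mirror a start position across the vertical axis x = 0."""
    return (-start[0], start[1])


def antipodal_target(start, center=(0.0, 0.0)):
    """Point opposite to `start` with respect to `center`."""
    return (2.0 * center[0] - start[0], 2.0 * center[1] - start[1])


def circle_starts(count, radius):
    """Evenly spaced start positions on one circle around the origin."""
    return [(radius * math.cos(2.0 * math.pi * k / count),
             radius * math.sin(2.0 * math.pi * k / count))
            for k in range(count)]
```

Several layers of circles can be produced by calling `circle_starts` once per radius and concatenating the results.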

Figure 2: Experiment scenarios. Panels: (a) Reflection: start; (b) Circle: start; (c) Crowd: start; (d) Reflection: target; (e) Circle: target; (f) Crowd: target.

For each of these three scenarios, we compare the performances of LAC-Nav and LAC-Learn, with the performances of approaches including BVC [Zhou2017], CNav [Godoy2016], ALAN [Godoy2015] and ORCA [Berg2011].

In this work, we consider three measurements as the evaluation of an algorithm’s performance on the multiagent navigation tasks: the completion time, the average detour-distance ratio and the average detour-time ratio.

  • the completion time of running a navigation algorithm is defined as the time (in seconds) when the last agent arrives at its target, assuming all the agents start from time ;

  • the average detour-distance ratio is defined as the average of the ratios between the actual travel distance and the optimal travel distance (i.e. the length of the straight line from the start position to the target position), over all the agents.

  • the average detour-time ratio is defined as the average of the ratios between the actual travel time and the optimal travel time (i.e. the time of moving in a straight line from the start position to the target position, at the maximum speed), over all the agents.

While the completion time reflects the algorithm’s global performance in finishing the navigation tasks, the detour-distance and detour-time ratios provide a view of how the behavior of individual agents varies across algorithms.
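The three measurements defined above can be computed as in the following sketch. It assumes that each agent's trajectory is recorded as a list of positions from start to target and that arrival times are measured from the common start time; the function names are illustrative.

```python
import math


def completion_time(arrival_times):
    """Time at which the last agent reaches its target."""
    return max(arrival_times)


def average_detour_distance_ratio(paths):
    """Mean of (traveled distance / straight-line distance) over all agents.

    Each path is a list of positions whose first entry is the start and
    whose last entry is the target.
    """
    ratios = []
    for path in paths:
        traveled = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
        ratios.append(traveled / math.dist(path[0], path[-1]))
    return sum(ratios) / len(ratios)


def average_detour_time_ratio(arrival_times, starts, targets, max_speed):
    """Mean of (travel time / straight-line travel time at maximum speed)."""
    ratios = [t / (math.dist(s, g) / max_speed)
              for t, s, g in zip(arrival_times, starts, targets)]
    return sum(ratios) / len(ratios)
```

Both ratios are at least 1 for every agent, and they equal 1 only for an agent that travels straight to its target (at maximum speed, in the case of the time ratio).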

In the experiments for all scenarios, the agent’s radius is uniformly set as , and the maximum moving speed is set as . In addition, as mentioned at the beginning of this section, within each second there are updates performed for each of the agents, which implies that the time interval between two consecutive updates is , i.e.  in all the experiments.

Recall that when calculating the local action cells, the hyper-parameter is needed to locate the safe half-planes. Throughout all experiments involving the local action cells, we set . In addition, the penalty factor is needed when selecting among the velocities of the local action cells (thus it is required when running LAC-Nav and LAC-Learn). Throughout the experiments, we set , with which the goal-orthogonal action (with angle from the direction pointing to the goal) is penalized by , and the goal-opposite action (along the direction leaving the goal) is penalized by about .

For LAC-Learn, we set the mixing factor (Line  of Algorithm 5) as for the reflection scenario and the circle scenario, and for the crowd scenario. The length of the moving time window (for the calculation of wUCB) is set as , which is the minimum choice as there are actions in the used . Furthermore, the incremental step (for adjusting the exploration probability) is set as (Line  of Algorithm 5).

For ORCA, the collision-free time window is set as , i.e. twice the length of the update interval. Recall that with CNav and ALAN, ORCA is also called to make sure the performed velocity is collision-free, and there the time window is also set as . Notice that the time window in ORCA has a different meaning from the hyper-parameter using the same symbol in the calculation of the local action cells, even though they are both related to the avoidance of potential collisions.

For CNav, the hyper-parameter for mixing the goal-oriented reward and the constrained-reduction reward is set as for the reflection scenario, and for the circle scenario and the crowd scenario; the number of constrained neighbors of which the action’s effect is estimated is set as ; the number of the neighbor-based actions is set as .

For ALAN, the hyper-parameter for mixing the goal-oriented reward and the politeness reward is set as ; the length of the moving time window for the calculation of wUCB is set as ; and the incremental step for adjusting the exploration probability is set as .


Figure 3 shows the experiment results for

  • the reflection scenario of agents ( agents on each side);

  • the circle scenario of agents (in circles around the same center point);

  • the crowd scenario of agents located in the area of size .

Overall, LAC-Nav and LAC-Learn outperform almost all the other approaches in the completion time. The only exception is the reflection scenario, where BVC shows an advantage and completes earlier than LAC-Learn.

Figure 3: Experiment results, where ctime (s) stands for the completion time (in seconds); addr stands for the average detour-distance ratio; and adtr stands for the average detour-time ratio.

In general, the efficiency of the LAC-based approaches is due to the fact that they consider both the task of arriving at the target and the intention to move as much as possible in every step. The latter consideration prevents the agent from halting unnecessarily before it arrives at the target. By maintaining and comparing the penalized lengths of all the candidate actions (according to ), the agent detours whenever another action provides a better (penalized) velocity, even if it still has the chance to move directly towards the target. As shown in the reflection scenario, this kind of active detouring results in a more fluent navigation as the agents (of the antagonistic moving directions) pass by each other (Figure 4).

Figure 4: The antagonistic agents pass by each other in the reflection scenario, where the points of black outline and colored (red/blue) inside are the current positions of the agents, and the simply colored (red/blue) points are the target positions of the agents. Panels: (a) LAC-Nav; (b) CNav; (c) ORCA.

Recall that according to the definition of the safe half-planes, the local action cell is depressed if there are neighbors approaching. Therefore, with the same relative position, it is easier for an agent to “follow” a neighbor that is moving away, if they have similar preferred trajectories. As a consequence, in cases with more conflicts, such as the circle scenario, after gathering around the central area, the agents with the LAC-based approaches spin as a whole to resolve the conflicts instead of squeezing through, as happens with the other approaches (Figure 5).

Figure 5: Agents resolve the conflicts in the circle scenario, where the points of black outline and colored inside are the current positions of the agents, and the simply colored points are the target positions of the agents. Panels: (a) LAC-Nav; (b) CNav; (c) ORCA.

Although the local action cell can be seen as a variant or extension of the buffered Voronoi cell, it should be noticed that the LAC-based approaches behave noticeably differently from BVC, except in simple situations such as the reflection scenario. In the more crowded situations (like the circle scenario and the crowd scenario), the individual agents with LAC-Nav or LAC-Learn spend more time on average, while on the other hand the global completion time is shorter. Investigating the experiment processes shows that the LAC-based approaches cause fewer stuck agents than the other approaches do. This fact is also reflected in the completion time of the first arrivals (Figure 6), in which the approaches’ performances are less distinguishable.

Figure 6: The completion time (in seconds) of the first arrivals.

5 Discussions

In this work, we introduced the definition of the local action cells, and proposed two approaches, LAC-Nav and LAC-Learn, whose efficiency in the completion time has been experimentally demonstrated. In order to improve the approaches’ performance, besides trying different parameter values, there are some natural directions for extending the proposed approaches or deriving variants.

Adaptive .

Recall that in the definition of the safe half-plane, we have set the relax factor as . Intuitively, indicates how much the agents would like to compromise in the next move, in order to avoid the collisions that may happen in the near future. Although it is valid to select any value from a theoretical perspective, it should be noted that a very small may depress the local action cell too much, and a very large may help little with the long-sighted consideration. The value is a balanced choice, and it also follows from an important idea in reciprocal collision avoidance: each agent takes half of the responsibility for avoiding the coming collisions. However, it would be more interesting if could be dynamically adjusted as the agents learn more information about the environment.

Non-uniform .

In the candidate set used throughout this paper, the angles between any consecutive actions are uniform. While it is natural to use another uniform candidate set of different size, say of size , or , it is also valid to include actions between which the angles are arbitrary, such as the neighbor-based actions considered in CNav. In order to prevent the actions from being penalized too much, it should be better to set close to or to bound the maximum penalty, when the size of becomes large.

Continuous LAC.

In this paper, we defined the local action cells as sets of finite number of actions. However, it may be more natural to consider the continuous area spanned by the velocities in a cell. There is a direct way to extend the definition of the local action cell to include all the linear combinations between every pair of the adjacent velocities. Formally speaking, we can define,

which is a continuous area in the velocity space. With the continuous version of the local action cell, the agents are no longer restricted to selecting actions from , and they can move at any angle as long as the corresponding velocity has a positive length.
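One way to read the continuous extension is through blends of two adjacent cell velocities, as sketched below. This assumes that the combinations are convex (weights between 0 and 1 that sum to 1); the formal definition in the text remains the reference, and the function name `blend` is illustrative.

```python
def blend(v_a, v_b, lam):
    """Convex combination lam * v_a + (1 - lam) * v_b of two velocities."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return (lam * v_a[0] + (1.0 - lam) * v_b[0],
            lam * v_a[1] + (1.0 - lam) * v_b[1])
```

Under this reading the collision-avoidance property carries over: an intersection of half-planes is convex, so a blend of two velocities that both satisfy every safe half-plane constraint satisfies them as well.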

