# Ergodic Control Strategy for Multi-Agent Environment Exploration

## Abstract

This study introduces an ergodic environment exploration problem for a centralized multi-agent system. Given a reference distribution represented by a Mixture of Gaussians (MoG), ergodicity is achieved when the time-averaged robot distribution is identical to the reference distribution. The major challenge is to determine the proper timing for a team of agents (robots) to visit each Gaussian component of the reference MoG. An ergodic function is defined as a measure of ergodicity, and a condition for its convergence is derived from a timing analysis. A control strategy for the agents is then proposed to speed up convergence toward ergodicity. A formal algorithm for centralized multi-agent control is provided, and simulation results are presented to validate it.

## I Introduction

In recent years, ergodic environment exploration for autonomous robots has attracted considerable interest owing to the efficiency of the exploration scheme and its applicability to areas such as search and rescue, disaster response, surveillance and reconnaissance, wildlife and weather monitoring, and space exploration. Efficiency in this context means that the agents cover a spatial domain according to the priority, or degree of importance, associated with each part of the environment. The agents therefore need to survey the environment such that the time-averaged robot distribution is identical to the given reference distribution.

The first approach to ergodic exploration for an autonomous multi-agent system was introduced in [1]. That study provides a measure of ergodicity, based on Fourier basis functions, that quantifies the difference between the time-averaged robot distribution and the reference spatial distribution. This ergodic metric is then used to develop feedback control laws for first- and second-order robot dynamics in centralized multi-agent systems. In [2], a strategy to obtain an optimal trajectory for a team of autonomous robots is proposed for data acquisition tasks. The study designs trajectories via optimal control such that the time the agents spend in a region is proportional to the probability of acquiring informative data there. It also employs the Fourier basis function-based ergodic metric to assess the ergodicity of the robot distribution. Similar work is reported in [3], where the objective is to generate trajectories for efficient exploration based on a probabilistic information density map of the given region; another contribution of that study is the consideration of general nonlinear robot dynamics. In [4], an ergodic exploration scheme is presented for multiple agents deployed to survey a region with obstacles and restricted areas, where coordination among agents with different sensing capabilities is used to demonstrate ergodic exploration of the given domain. A finite receding-horizon optimal control-based algorithm, termed Ergodic Environmental Exploration (E3), is proposed in [5] to survey an unknown environment consisting of regions with varying degrees of importance. The algorithm minimizes both the control effort and the difference between the distribution of the time-averaged system trajectory and that of the information gain. In [6], an iterative optimal control algorithm for general nonlinear dynamics is studied. The authors demonstrate two approaches to discrete-time iterative optimization, first-order discretization and symplectic integration, and show that the control and state trajectories are significantly influenced by the choice of discretization. A receding-horizon control approach to achieve ergodic coverage is proposed in [7]. The study shows that the algorithm improves ergodicity between the information density distribution of the spatial domain and the time-averaged trajectory of the agents. The algorithm enables the agents to explore the domain independently and to share coverage information with other agents across a communication network. A trajectory optimization approach is presented in [8] for ergodic area exploration that accounts for stochastic nonlinear sensor dynamics. The results suggest that the proposed algorithm can generate trajectories with greater and more predictable ergodicity. A decentralized multi-agent ergodic control algorithm with nonlinear dynamics is developed in [9]. To realize decentralized exploration, this algorithm requires each agent to share with the others only a coefficient associated with its own action.

It is noteworthy that all of the aforementioned works on ergodic environment exploration are built on the Fourier basis function-based metric introduced in [1]. The downside of this metric is that any practical implementation inevitably involves an approximation, because the metric contains an infinite summation that must be truncated. A related concept for attaining ergodicity is discussed in [10, 11, 12, 13, 14], where ergodicity is achieved through the global behavior of a multi-agent system described at the macroscopic level by a partial differential equation. However, the desired behavior can only be attained if an extremely large number of agents is deployed in the domain. Another ergodic exploration scheme, inspired by optimal transport theory, is presented in [15]; it also involves an approximation due to the sample-based representation of the reference spatial distribution.

In our previous work [16], an ergodic environment exploration plan is proposed based on the timing analysis. This plan is, however, only applicable to a single-agent system. In this study, a new strategy to realize a multi-agent ergodic exploration is developed. A Mixture of Gaussian is considered as a reference spatial distribution and the agents are assumed to generate a mass in the form of a skinny Gaussian. The main problem is to find proper timing for the team of agents to survey and exit a Gaussian component. The ergodic function, a measure of ergodicity, is defined and a condition that guarantees the convergence of the ergodic function is derived, which is one of the major contributions of this paper. Further, a control law for the agent position update is provided and the formal algorithm for realizing multi-agent ergodic exploration is presented. To verify the validity of the proposed method, simulations are carried out and simulation results are provided.

## II Problem Description

Notation: The sets of real and natural numbers are denoted by $\mathbb{R}$ and $\mathbb{N}$, respectively. Further, $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$. The symbols $\|\cdot\|$ and $(\cdot)^{T}$ denote the Euclidean norm and the transpose operator, respectively. The variable $k \in \mathbb{N}_0$ is used to denote discrete time.

This section introduces the ergodic environment exploration problem for multi-agent systems. The spatial distribution $\phi(x)$ is given as the reference distribution and is assumed to be in the form of an MoG, as follows.

###### Assumption 1

The given spatial distribution $\phi(x)$ is expressed as an MoG in the following form:

$$\phi(x) = \sum_{j=1}^{n} \alpha_j \, \mathcal{N}(x; \mu_j, \Sigma_j), \tag{1}$$

where $\alpha_j$ is a weight such that $\sum_{j=1}^{n} \alpha_j = 1$ with $\alpha_j > 0$, $\mathcal{N}(x; \mu_j, \Sigma_j)$ is a Gaussian distribution with mean $\mu_j$ and covariance $\Sigma_j$, and $n$ is the total number of Gaussian distributions in the given MoG.

Throughout this paper, it is also assumed that $\phi(x)$ is stationary (i.e., it does not change over time).

Given $N$ agents to explore the domain, each agent is assumed to generate a unit mass concentrated at its current location in the form of a skinny Gaussian distribution, as illustrated in Fig. 1.

In continuous time, the agent located at $x_i(t)$ keeps generating a unit mass with a skinny Gaussian distribution, which is formalized as follows.

###### Assumption 2

Suppose that the position of agent $i$ at any time $t$ is given as $x_i(t)$. Then, the mass generated by the agent is represented by a skinny Gaussian $\mathcal{N}(x; x_i(t), \Sigma_r)$, where the covariance $\Sigma_r$ is stationary, is chosen such that the distribution is narrow, and is identical for all agents.

Mathematically, for the two-dimensional case, $\mathcal{N}(x; x_i(t), \Sigma_r)$ has the following structure:

$$\mathcal{N}(x; x_i(t), \Sigma_r) = \frac{1}{2\pi\sqrt{|\Sigma_r|}} \exp\!\left(-\frac{1}{2}\,(x - x_i(t))^{T} \Sigma_r^{-1} (x - x_i(t))\right), \tag{2}$$

where $|\cdot|$ is the determinant and $x$ represents any point in the domain.

The time-averaged distribution formed by the multi-agent system is then defined as

$$\rho^{t}(x) := \frac{1}{N t} \int_{0}^{t} \sum_{i=1}^{N} \mathcal{N}(x; x_i(\tau), \Sigma_r)\, d\tau, \tag{3}$$

where $\mathcal{N}(x; x_i(\tau), \Sigma_r)$ denotes the skinny Gaussian for agent $i$ in the continuous-time case.

Notice that in the above form, the integral is taken with respect to time with a summation over all agents, followed by division by the number of agents and the total elapsed time. Thus, $\rho^{t}(x)$ is averaged not only over time but also over agents.

The discrete-time counterpart is written as

$$\rho^{k}(x) := \frac{1}{N k} \sum_{\tau=1}^{k} \sum_{i=1}^{N} \mathcal{N}(x; x_i^{\tau}, \Sigma_r), \tag{4}$$

where $\mathcal{N}(x; x_i^{\tau}, \Sigma_r)$ stands for the skinny Gaussian of agent $i$ at discrete time step $\tau$. The above discrete-time equation will be used throughout the paper to derive the timing condition for ergodicity.

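The difference between the time-averaged distribution and the given spatial distribution at any time $k$ is written as

$$e^{k}(x) := \rho^{k}(x) - \phi(x). \tag{5}$$

Furthermore, the ergodic function is defined as the integral of the absolute value of $e^{k}(x)$ over the given domain $\Omega$:

$$E(k) := \int_{\Omega} \left| e^{k}(x) \right| dx. \tag{6}$$

The value of $E(k)$ always lies between $0$ and $2$, because $\rho^{k}(x)$ and $\phi(x)$ are both nonnegative and each integrates to at most one over $\Omega$.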
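The definitions of the time-averaged distribution, the error, and the ergodic function can be checked numerically on a grid. The following is a minimal sketch, not the authors' implementation: the grid resolution, the two-component reference MoG, the covariance values, and the helper names `gaussian_2d` and `ergodic_function` are all illustrative assumptions.

```python
import numpy as np

def gaussian_2d(points, mean, cov):
    """Evaluate a 2-D Gaussian density at each row of `points` (shape (P, 2))."""
    diff = points - np.asarray(mean, dtype=float)
    quad = np.einsum("pi,ij,pj->p", diff, np.linalg.inv(cov), diff)
    return np.exp(-0.5 * quad) / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))

def ergodic_function(trajectory, points, cell_area, ref, cov_r):
    """Riemann-sum approximation of the integral of |time-averaged - reference|.

    trajectory: array of shape (K, N, 2), positions of N agents over K steps.
    """
    K, N, _ = trajectory.shape
    rho = np.zeros(len(points))
    for step in trajectory:          # sum over time steps
        for pos in step:             # sum over agents
            rho += gaussian_2d(points, pos, cov_r)
    rho /= K * N                     # average over time and agents
    return float(np.sum(np.abs(rho - ref)) * cell_area)

# Assumed example setup on the unit square.
axis = np.linspace(0.0, 1.0, 101)
gx, gy = np.meshgrid(axis, axis)
points = np.column_stack([gx.ravel(), gy.ravel()])
cell_area = (axis[1] - axis[0]) ** 2

cov_ref = 0.005 * np.eye(2)          # assumed component covariance
cov_r = 0.001 * np.eye(2)            # assumed skinny-Gaussian covariance
ref = (0.6 * gaussian_2d(points, [0.3, 0.3], cov_ref)
       + 0.4 * gaussian_2d(points, [0.7, 0.7], cov_ref))

# Two agents, five steps: 3 steps at the first mean, 2 at the second (60/40).
inside = np.array([[[0.3, 0.3]] * 2] * 3 + [[[0.7, 0.7]] * 2] * 2)
# Two agents parked far away from both components for five steps.
outside = np.tile([0.85, 0.15], (5, 2, 1))

E_inside = ergodic_function(inside, points, cell_area, ref, cov_r)
E_outside = ergodic_function(outside, points, cell_area, ref, cov_r)
```

In this setup the team that dwells inside the components in proportion to their weights yields a smaller value than the team parked elsewhere, and both values stay within the interval from 0 to 2 up to discretization error.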

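Fig. 1 illustrates multiple agents generating mass in the shape of skinny Gaussians, together with the given spatial distribution $\phi(x)$ as an MoG, which is drawn with a negative sign as in (5). The spatial domain $\Omega$ is divided into two regions: the region consisting of the holes, denoted by $\Omega_h$ (blue dashed lines in Fig. 1), and the remaining area of $\Omega$ outside $\Omega_h$, denoted by $\Omega_o$ (red solid lines in Fig. 1). Mathematically, $\Omega_h$ and $\Omega_o$ are defined as

$$\Omega_h := \bigcup_{j=1}^{n} \Omega_j, \qquad \Omega_o := \Omega \setminus \Omega_h, \tag{7}$$

where $\Omega_j$ denotes the domain belonging to the $j$-th component-wise Gaussian $\alpha_j \mathcal{N}(x; \mu_j, \Sigma_j)$.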

The main objective of this paper is to achieve ergodicity, meaning that the time-averaged robot distribution converges to the given spatial distribution as time approaches infinity ($E(k) \to 0$ as $k \to \infty$). In other words, ergodicity requires a proper control strategy for the multi-agent system, which is equivalent to determining the robot positions $x_i^{k}$, $i = 1, \ldots, N$, at each discrete time $k$.

One may try to achieve this goal by making the agents stay at each hole (i.e., each component-wise weighted Gaussian in the given MoG, as shown in Fig. 1) for the portion of time given by its weight $\alpha_j$. However, this approach is too simplistic and does not work for this problem, for the following reasons. The time-averaged dynamics in (4) can be written recursively as

$$\rho^{k}(x) = \frac{k-1}{k}\,\rho^{k-1}(x) + \frac{1}{N k} \sum_{i=1}^{N} \mathcal{N}(x; x_i^{k}, \Sigma_r). \tag{8}$$

It is evident from (8) that the influence of the current mass generation on the time-averaged distribution decays nonlinearly as $1/k$, which makes ergodicity difficult to attain. Moreover, even if one hole is completely filled with mass generated by the agents, that mass gradually vanishes as soon as the agents depart from the hole. Lastly, the agents cannot jump from one hole to another and hence generate unnecessary mass while traveling through the region $\Omega_o$. These issues make it unclear when the agents should visit each hole and how long they should stay there.
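The recursive form of the time average, and the shrinking weight of each new deposit, can be verified with a few lines of code. The sketch below is one-dimensional for brevity and uses assumed values throughout (grid, `sigma`, a random three-agent trajectory); the names `skinny` and `recursive_update` are hypothetical helpers, and the averaging convention (steps 1 through k) is an assumption.

```python
import numpy as np

def skinny(x, center, sigma):
    """1-D narrow Gaussian density centered at an agent position."""
    return np.exp(-0.5 * ((x - center) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def recursive_update(rho_prev, k, positions_k, x, sigma):
    """Time average at step k from the average at step k-1.

    rho^k = (k-1)/k * rho^{k-1} + 1/(N k) * sum_i skinny(x; x_i^k)
    """
    mass = np.mean([skinny(x, p, sigma) for p in positions_k], axis=0)
    return ((k - 1) * rho_prev + mass) / k

x = np.linspace(0.0, 1.0, 201)
sigma = 0.02                                   # assumed skinny-Gaussian width
rng = np.random.default_rng(0)
traj = rng.uniform(0.1, 0.9, size=(50, 3))     # 50 steps, 3 agents (assumed)

rho = np.zeros_like(x)
for k, positions_k in enumerate(traj, start=1):
    rho = recursive_update(rho, k, positions_k, x, sigma)

# Direct (batch) average over all steps and agents, for comparison.
batch = np.mean([skinny(x, p, sigma) for p in traj.ravel()], axis=0)

# Weight carried by the most recent step's deposit in the average.
weight_of_last_step = 1.0 / len(traj)
```

The recursive and batch results agree, and the weight of the newest deposit after 50 steps is only 1/50, which is the decay that makes late corrections slow.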

In the following section, we thus provide an analysis that guarantees that the ergodic function is decreasing under a certain condition.

## III Error Analysis of Ergodic Operation

The proposed multi-agent exploration scheme is designed such that all agents act as a team, exploring a hole together and proceeding to another hole once the current hole is filled to a certain amount. The variable $k_o$ denotes the averaged time that the agents spend inside $\Omega_o$. Similarly, $k_h$ denotes the averaged time that the agents spend in a hole to explore it. That is, $k_o$ and $k_h$ can be written as

$$k_o := \frac{1}{N} \sum_{i=1}^{N} k_{o,i}, \qquad k_h := \frac{1}{N} \sum_{i=1}^{N} k_{h,i}, \tag{9}$$

where $k_{o,i}$ and $k_{h,i}$ denote the time spent by agent $i$ in $\Omega_o$ and in a hole, respectively. For a given spatial distribution in MoG form with multiple holes, a subscript $j$ will be used to indicate a specific hole. Before proceeding to the error analysis, the following proposition sheds light on how the time-averaged distribution changes as the team of agents moves in the domain.

###### Proposition 1

Given the time-averaged distribution at any time , the variation in the time-averaged distribution after time steps, , can be calculated by

From (8), the time-averaged distribution at time can be written as

(10) |
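As a cross-check on the recursive form, unrolling it over $m$ consecutive steps gives a closed-form expression for how the time-averaged distribution varies. The following is a sketch under assumed notation ($\rho^{k}$ for the time-averaged distribution over steps $1, \ldots, k$, $N$ agents, $x_i^{\tau}$ for the position of agent $i$ at step $\tau$, and $\Sigma_r$ for the skinny-Gaussian covariance). It follows directly from the definition of the average and is not claimed to match the exact form used in Proposition 1.

```latex
\rho^{k+m}(x) = \frac{k}{k+m}\,\rho^{k}(x)
  + \frac{1}{N(k+m)} \sum_{\tau=k+1}^{k+m} \sum_{i=1}^{N} \mathcal{N}(x; x_i^{\tau}, \Sigma_r),
\qquad
\rho^{k+m}(x) - \rho^{k}(x) = \frac{1}{k+m}
  \left[ \frac{1}{N} \sum_{\tau=k+1}^{k+m} \sum_{i=1}^{N} \mathcal{N}(x; x_i^{\tau}, \Sigma_r)
  - m\,\rho^{k}(x) \right].
```

The second expression makes the trade-off explicit: mass deposited during the $m$ steps enters with weight $1/(k+m)$, while the existing distribution is scaled down by $k/(k+m)$ everywhere, including in holes that were already filled.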

The agents need to spend time in the outside region $\Omega_o$ while traveling to a hole from the previous one; therefore, the ergodic function goes up, as the agents are spending time where they should not be. On the contrary, exploring a hole so that the time-averaged distribution matches the given spatial distribution in that hole results in a drop of the ergodic function. Based on these observations, the proposed method to attain ergodicity can be explained as follows.

While the team of agents travels from one hole to another, spending an average of $k_o$ time steps in $\Omega_o$, the ergodic function $E$ increases since mass is generated outside the holes. On the other hand, $E$ decreases while the agents stay in a hole for $k_h$ time steps. This is illustrated in Fig. 2. Over the period from time step $k$ to $k + k_o + k_h$, the piece-wise decreasing property of $E$ is guaranteed if the decrement of the ergodic function from $k + k_o$ to $k + k_o + k_h$ is greater than the increment from $k$ to $k + k_o$. Satisfying this condition throughout the multi-agent exploration ensures that the ergodic function converges to zero, which is defined as piece-wise convergence. To this end, the following theorem is developed for the piece-wise convergence of the ergodic function.
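The travel-then-stay pattern described above can be illustrated numerically. The sketch below is one-dimensional with a single agent and entirely assumed values (reference MoG, initial time-averaged distribution, travel path, hole extent). It finds the stay duration by direct simulation rather than through a closed-form condition, and the greedy in-hole placement is an illustrative stand-in, not the control law of this paper.

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]
ref = 0.5 * gauss(x, 3.0, 0.5) + 0.5 * gauss(x, 7.0, 0.5)   # assumed two-hole MoG
SIGMA_R = 0.1                                               # assumed skinny width

def ergodic_fn(rho):
    """Riemann-sum approximation of the integral of |rho - ref|."""
    return float(np.sum(np.abs(rho - ref)) * dx)

def advance(rho, k, pos):
    """Time average after one more step with the agent at `pos`."""
    return (k * rho + gauss(x, pos, SIGMA_R)) / (k + 1)

# Assumed state at step k: both holes underfilled, some mass outside them.
k = 100
rho = 0.45 * gauss(x, 3.0, 0.5) + 0.35 * gauss(x, 7.0, 0.5) + 0.2 * gauss(x, 5.0, 0.3)
E_start = ergodic_fn(rho)

# Travel phase: k_o steps between the holes, depositing mass outside them.
k_o = 5
for pos in np.linspace(4.5, 5.5, k_o):
    rho = advance(rho, k, pos)
    k += 1
E_mid = ergodic_fn(rho)

# Stay phase: deposit where the error is most negative inside the second hole
# until the ergodic function drops below its value before the travel phase.
hole = np.abs(x - 7.0) <= 1.5
k_h = None
for n in range(1, 201):
    err = rho - ref
    pos = x[hole][np.argmin(err[hole])]
    rho = advance(rho, k, pos)
    k += 1
    if ergodic_fn(rho) < E_start:
        k_h = n
        break
E_end = ergodic_fn(rho)
```

Here `E_mid` exceeds `E_start` after the travel phase, and `k_h` records the first stay duration for which the decrement outweighs that increment.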

###### Theorem 1

Consider the multi-agent ergodicity problem to realize as . Given time steps for the team of agents in to reach a certain hole, the ergodic function is a piece-wise contraction mapping, if the agents stay at the hole for time steps given by

(11) |

In this case, we have .

Fig. 2 illustrates that for time steps, the ergodic function increases such that and hence, . Then, is obtained by

(12) |

For an ideal case, is positive (negative) in (). From this observation, one can proceed with (III) by replacing in (III) with (5) as

(13) |

From Proposition 1, the above equation can be rewritten as

(14) |

In (III), some terms in the right hand side can be simplified as follows:

(15) | ||||

(16) |

In the next step, the condition for such that is derived. For time steps, the decrement of the ergodic function can be observed from Fig. 2 and hence, . By following the same procedure to obtain (17), the expression for can be derived and written as

(18) |

where

(19) |

Note that, while the agents are travelling through , the agents do not generate any mass in (no term). Thus, from (10), we have

(20) |

resulting in

(21) |

To guarantee the piece-wise convergence of the ergodic function, the following condition must be satisfied:

(22) |

The condition guaranteeing (22) can be derived by replacing both sides terms with preceding results as follows.

or equivalently,

Theorem 1 provides the duration $k_h$ for which the team of agents should stay in a certain hole, given that the team spends $k_o$ time steps in the outside region $\Omega_o$. This condition must be satisfied to ensure that the ergodic function is piece-wise decreasing.

In the following section, the robot control law is provided to describe how the team of agents explores the domain. It is separate from the timing analysis presented in this section and hence constitutes another contribution of this research.

## IV Robot Control Law

In the previous section, the timing analysis was provided to guarantee the piece-wise convergence of the ergodic function. As this result does not specify how the team of robots should explore the holes, this section presents a robot control law that combines two different control methods: the nearest point method and the gradient method.

The nearest point method makes each agent visit the point in a hole where the error in (5) is negative and that is closest to the agent's current location. According to this control scheme, the position of agent $i$ is updated by

(23) |

where is the maximum velocity attainable by the robot and denotes the nearest location with negative obtained by

(24) |

The nearest point-based control law may lead to a successful exploration for the centralized multi-agent system; however, the method is nearsighted in that it always drives each agent towards the nearest point where the error is negative.
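The nearest point idea can be sketched on a discretized domain as follows. This is only an illustration under stated assumptions: the error is sampled at a finite set of grid points, the agent moves at most `v_max` per step along the straight line to the target, and the function name `nearest_point_step` is hypothetical. The exact update used in (23) and (24) may differ in form.

```python
import numpy as np

def nearest_point_step(pos, grid, err, v_max):
    """Move `pos` at most `v_max` toward the closest grid point with negative error.

    pos:  array (2,), current agent location
    grid: array (P, 2), sample locations in the domain
    err:  array (P,), error values at the sample locations
    """
    candidates = grid[err < 0.0]
    if len(candidates) == 0:
        return pos.copy()                  # nothing left to fill
    dists = np.linalg.norm(candidates - pos, axis=1)
    j = int(np.argmin(dists))
    target, dist = candidates[j], dists[j]
    if dist <= v_max:
        return target.copy()               # reachable within one step
    return pos + v_max * (target - pos) / dist

# Assumed toy data: two locations with negative error, one with positive error.
grid = np.array([[1.0, 0.0], [0.0, 3.0], [2.0, 2.0]])
err = np.array([-0.1, -0.2, 0.3])
pos = np.array([0.0, 0.0])

short_step = nearest_point_step(pos, grid, err, v_max=0.5)
long_step = nearest_point_step(pos, grid, err, v_max=2.0)
```

The example shows the nearsightedness noted above: the agent heads for the closest negative-error point at (1, 0) even though the point at (0, 3) has a larger deficit.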

The second control law, the gradient method, is introduced as follows:

(25) |

where the gradient of the error is evaluated at the current location of the agent. In general, this gradient-based control law provides information about the magnitude and direction in which the agent should move to fill the hole with the generated mass. However, the gradient-based method does not work efficiently once the agent gets stuck at a local minimum or maximum.
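A gradient-based position update, and the way it stalls at a stationary point, can be sketched as follows. The assumptions here are loud ones: the agent descends the error (moves toward more negative values), the step is normalized to length `v_max`, and the gradient is taken by central differences on a callable `err_fn`. None of these choices is confirmed by (25), and the helper names are hypothetical.

```python
import numpy as np

def numerical_gradient(f, pos, h=1e-5):
    """Central-difference gradient of scalar function `f` at `pos`."""
    grad = np.zeros_like(pos, dtype=float)
    for d in range(len(pos)):
        step = np.zeros_like(pos, dtype=float)
        step[d] = h
        grad[d] = (f(pos + step) - f(pos - step)) / (2.0 * h)
    return grad

def gradient_step(pos, err_fn, v_max, eps=1e-9):
    """Move `pos` by `v_max` along the direction of steepest descent of the error."""
    g = numerical_gradient(err_fn, pos)
    norm = np.linalg.norm(g)
    if norm < eps:
        return pos.copy()      # stationary point: the gradient gives no direction
    return pos - v_max * g / norm

# Assumed toy error field: a single hole (negative error) centered at (1, 1).
center = np.array([1.0, 1.0])
err_fn = lambda p: -np.exp(-np.sum((p - center) ** 2))

moved = gradient_step(np.array([0.0, 0.0]), err_fn, v_max=0.1)
stuck = gradient_step(center.copy(), err_fn, v_max=0.1)
```

From the origin the agent moves toward the hole, while at the bottom of the hole the gradient vanishes and the agent does not move, which is the failure mode that motivates combining this method with the nearest point method.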

To compensate for the weaknesses of the two different methods, we provide the combination of the two as follows:

(26) |

where is defined by

(27) |

Here, and indicate the values of time-averaged robot distribution and given spatial distribution at current agent location, i.e., and . Notice that the values of always vary between and according to the given definition.

The parameters and in (26) are defined as the weights assigned to the nearest point and gradient methods, which are included to help the agents decide which method to prioritize to update the position based on and . For example, implies that at the current agent location , there is excess mass (, and thus, ). So the agent is driven towards nearest point with negative . On the other hand, indicates mass demand at , so the agent needs to move towards the nearest local minima of by prioritizing the gradient method. Thus, the agents can realize efficient exploration of the given domain by switching priorities between (23) and (25) depending on the values of and . As a consequence, (26) has higher convergence speed compared to (23) and (25).

## V Algorithm

This section provides the formal algorithm to attain the ergodicity for the multi-agent system. Fig. 3 illustrates how the team of agents travels in the domain . The red and blue points in the figure are given as the starting points for the two-agent system. Initially, the agents search for the first target hole to visit using the following steps. The value is defined by the distance from the mean location of Gaussian component in the MoG to the agent. Then, the target hole for all agents to visit is determined from the following equation:

(28) |

The above equation shows that not only distances but also the weight of each hole are taken into account to determine the first target hole. As a result, even if a hole with a greater weight is not the closest one, it may be selected to be visited initially due to its weight.
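The trade-off between distance and weight can be illustrated with a small sketch. Because (28) is not reproduced here, the score below (component weight divided by the team-averaged distance to the component mean) is purely a hypothetical stand-in chosen for illustration; the function name `select_first_hole` and the example values are likewise assumptions.

```python
import numpy as np

def select_first_hole(agent_positions, means, weights):
    """Pick a target hole index using a hypothetical weight-over-distance score.

    agent_positions: array (N, 2); means: array (M, 2); weights: array (M,)
    """
    # Distance from every agent to every component mean, shape (N, M).
    d = np.linalg.norm(agent_positions[:, None, :] - means[None, :, :], axis=2)
    mean_dist = d.mean(axis=0)                        # team-averaged distance per hole
    score = weights / np.maximum(mean_dist, 1e-12)    # assumed score, not (28)
    return int(np.argmax(score))

# Case 1: a moderately weighted hole close to the team is selected.
means_a = np.array([[0.0, 0.0], [10.0, 0.0], [4.0, 0.0]])
weights_a = np.array([0.6, 0.1, 0.3])
agents_a = np.array([[5.0, 0.0], [5.0, 1.0]])
choice_a = select_first_hole(agents_a, means_a, weights_a)

# Case 2: a heavier hole is selected even though it is not the closest one.
means_b = np.array([[3.0, 0.0], [1.0, 0.0]])
weights_b = np.array([0.9, 0.1])
agents_b = np.array([[0.0, 0.0]])
choice_b = select_first_hole(agents_b, means_b, weights_b)
```

Under this assumed score, the second case reproduces the behavior described above: the hole with the greater weight is visited first despite being farther away.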

For the given example in Fig. 3, hole is found as the first target hole according to (28), as a result, the agents approach and explore the hole. The agents generate masses in the shape of skinny Gaussian distribution in every time step, as described in Assumption 2. The agents update their position using (26) as discussed in the previous section.

The necessary exploration time in the current hole is given by (11) to guarantee the convergence of the ergodic function; however, the theorem provides only a lower bound on the hole exploration time. Consequently, the convergence of the ergodic function may be too slow if the team of agents leaves the current hole as soon as (11) is satisfied. Another condition is therefore required so that the departure time is later than the time given by (11). The following condition is included to determine a proper departure time for the agents from the current hole:

(29) |

where with and being some positive coefficients and as a cycle number. This cycle number increases when the team of agents visit all the holes and arrive at the initial hole again.

The condition in (29) is incorporated to make the agents explore the hole until the accumulated error in the current hole exceeds the threshold in (29), meaning that the hole is to be filled with a certain amount of mass.

The coefficient is defined as the above form due to the following observations. First of all, the term in (4) implies that at lower time steps, the mass generated by the agents has greater contribution to the time-averaged distribution and error . As a result, the longer exploration time may lead to the increment of . Additionally, the agents cannot move outside the hole as soon as they decide to exit, which results in spilling some extra masses in the hole. Thus, the error may achieve a value that is positive, given is zero, which is undesired. Hence, is defined in a way that at lower time steps, the agents decide to leave the hole with some existing error in the hole and at later explorations, the agents exit the hole with less error in it.

Once the exploration time for the current hole has a value higher than , the team of agents heads towards the next hole, which is predetermined by the given configuration of the MoG. Here, and denote the time when the team of agents first satisfies (11) and (29). Each agent takes the shortest possible path to travel from one hole to another, although the path varies from agent to agent as the agents are located at different positions at the time of departure. In Fig. 3, the agents approach hole instead of hole because hole is closer to hole .

The agents explore hole in a similar way that they explored hole . They fill up the hole with mass and update their position using (26). As soon as the exploration time is greater than , they exit hole and travel to hole . After the exploration of hole 3, the team will again approach hole and thus, they explore the domain in a cyclic manner. While traversing , the agents may not follow the same path to go from one hole to another because of the variation of , and from different exploration cycle. Here, is defined as the time spent in by agent to reach the hole from hole.

A pseudo code is provided to illustrate the formal procedure of the proposed multi-agent centralized ergodic algorithm.

## VI Simulations

Numerical simulation results are provided in this section to verify the correctness of the proposed methods as well as the effectiveness of the ergodic exploration algorithm.

The spatial reference distribution is given as an MoG such that

with .

The three-agent system is considered here with their initial positions given as , , . The covariance matrix for the mass generation in the form of the skinny Gaussian is . The maximum velocity of all agents is limited to .

In Fig. 4 (a), the initial robot positions (red triangle symbols) and the negated spatial distribution along with their associated hole numbers are illustrated. Starting from the initial position, the robots head towards the initial target hole that is determined by equation (28), which in this case, is the lower-right hole (hole ). To satisfy the proposed convergence (11) and departure (29) conditions, the robots spend a designated time, , , and , to explore the hole. After that, the robots move to the succeeding hole determined by the given configuration of the MoG, and then spend another designated time in that hole as determined by (11) and (29).

In Fig. 5 (a), the ergodic function is plotted against time on a log scale to show the convergence results. Numerous simulations were conducted to compare the rate of convergence of the ergodic function with respect to the number of agents and the robot control law. For the single-agent case, the nearest point control law is compared with the proposed combination method (nearest point + gradient). It is observed that (26) (labeled "nearest + grad" in the figure) performs better than the nearest point-only control law. Fig. 5 (a) does not clearly show whether an increase in the number of agents leads to faster convergence, as the curves for the 3-, 5-, and 8-agent cases cross one another. The integral of the ergodic function in Fig. 5 (b), however, clearly shows that an increase in the number of agents yields faster convergence. The ergodic function keeps decreasing in these results; hence, it can be concluded that the multi-agent system will achieve ergodicity as time approaches infinity.

## VII Conclusion

In this paper, a centralized multi-agent exploration strategy is developed to realize ergodicity. Each agent is assumed to generate a mass with a skinny Gaussian distribution, and the reference spatial distribution is given as an MoG. The piece-wise convergence condition to attain ergodicity is derived based on a timing analysis. A multi-robot control strategy is also developed for faster ergodic exploration, and a formal algorithm for achieving ergodicity is provided. Simulations were performed to support the validity and effectiveness of the proposed multi-agent ergodic exploration method.

### References

- [1] G. Mathew and I. Mezić, “Metrics for ergodicity and design of ergodic dynamics for multi-agent systems,” Physica D: Nonlinear Phenomena, vol. 240, no. 4-5, pp. 432–442, 2011.
- [2] Y. Silverman, L. M. Miller, M. A. MacIver, and T. D. Murphey, “Optimal planning for information acquisition,” in 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5974–5980, IEEE, 2013.
- [3] L. M. Miller and T. D. Murphey, “Trajectory optimization for continuous ergodic exploration,” in 2013 American Control Conference, pp. 4196–4201, IEEE, 2013.
- [4] E. Ayvali, H. Salman, and H. Choset, “Ergodic coverage in constrained environments using stochastic trajectory optimization,” in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5204–5210, IEEE, 2017.
- [5] R. O’Flaherty and M. Egerstedt, “Optimal exploration in unknown environments,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5796–5801, IEEE, 2015.
- [6] A. Prabhakar, K. Flaßkamp, and T. D. Murphey, “Symplectic integration for optimal ergodic control,” in 2015 54th IEEE Conference on Decision and Control (CDC), pp. 2594–2600, IEEE, 2015.
- [7] A. Mavrommati, E. Tzorakoleftherakis, I. Abraham, and T. D. Murphey, “Real-time area coverage and target localization using receding-horizon ergodic exploration,” IEEE Transactions on Robotics, vol. 34, no. 1, pp. 62–80, 2017.
- [8] G. De La Torre, K. Flaßkamp, A. Prabhakar, and T. D. Murphey, “Ergodic exploration with stochastic sensor dynamics,” in 2016 American Control Conference (ACC), pp. 2971–2976, IEEE, 2016.
- [9] I. Abraham and T. D. Murphey, “Decentralized ergodic control: distribution-driven sensing and exploration for multiagent systems,” IEEE Robotics and Automation Letters, vol. 3, no. 4, pp. 2987–2994, 2018.
- [10] D. Milutinović and P. Lima, “Modeling and optimal centralized control of a large-size robotic population,” IEEE Transactions on Robotics, vol. 22, no. 6, pp. 1280–1285, 2006.
- [11] H. Hamann and H. Wörn, “A framework of space–time continuous models for algorithm design in swarm robotics,” Swarm Intelligence, vol. 2, no. 2-4, pp. 209–239, 2008.
- [12] J. Qi, R. Vazquez, and M. Krstic, “Multi-agent deployment in 3-D via PDE control,” IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 891–906, 2014.
- [13] S. Ivić, B. Crnković, and I. Mezić, “Ergodicity-based cooperative multiagent area coverage via a potential field,” IEEE Transactions on Cybernetics, vol. 47, no. 8, pp. 1983–1993, 2016.
- [14] U. Eren and B. Açıkmeşe, “Velocity field generation for density control of swarms using heat equation and smoothing kernels,” IFAC-PapersOnLine, vol. 50, no. 1, pp. 9405–9411, 2017.
- [15] R. H. Kabir and K. Lee, “Receding-horizon ergodic exploration planning using optimal transport theory,” in 2020 American Control Conference (ACC), IEEE, 2020. To appear. Preprint available with DOI: 10.13140/RG.2.2.22013.31202/1.
- [16] R. H. Kabir and K. Lee, “On the ergodicity of an autonomous robot for efficient environment explorations,” 2020 ASME Dynamic Systems and Control Conference (DSCC), 2020. To appear. Preprint available as arXiv:2005.01959.