Probabilistic and Distributed Control of a LargeScale Swarm of Autonomous Agents
Abstract
We present a novel method for guiding a largescale swarm of autonomous agents into a desired formation shape in a distributed and scalable manner. Our Probabilistic Swarm Guidance using Inhomogeneous Markov Chains (PSG–IMC) algorithm adopts an Eulerian framework, where the physical space is partitioned into bins and the swarm’s density distribution over each bin is controlled. Each agent determines its bin transition probabilities using a timeinhomogeneous Markov chain. These timevarying Markov matrices are constructed by each agent in realtime using the feedback from the current swarm distribution, which is estimated in a distributed manner. The PSG–IMC algorithm minimizes the expected cost of the transitions per time instant, required to achieve and maintain the desired formation shape, even when agents are added to or removed from the swarm. The algorithm scales well with a large number of agents and complex formation shapes, and can also be adapted for area exploration applications. We demonstrate the effectiveness of this proposed swarm guidance algorithm by using results of numerical simulations and hardware experiments with multiple quadrotors.
I Introduction
A largescale swarm of space robotic systems could collaboratively complete tasks that are very difficult for a single agent, with significantly enhanced flexibility, adaptability, and robustness [1]. Moreover, a largescale swarm (having – or more agents) may be deployed for challenging missions like constructing a complex formation shape (see Fig. 1) [2]–[8] or exploring an unknown environment [9]–[14]. The control algorithm for such a largescale swarm of autonomous agents should be:

Distributed: The algorithm should not need a centralized supervisor or controller.

Versatile: The algorithm can be easily tailored for multiple applications such as maintaining the formation shape or exploring the target area.

Robust: Since a small fraction of agents in the swarm might get lost during the course of an operation or new agents might get added to the swarm, the algorithm should seamlessly adapt to loss or addition of agents. Moreover, the algorithm should accommodate measurement errors, actuation errors, and other uncertainties.

Scalable: The algorithm should scale well with the number of agents and the size of the area.
In this paper, we lay the theoretical foundations of a distributed, versatile, robust, and scalable algorithm for controlling the shape of largescale swarms.
In fluid mechanics, the motion of fluids is described using either a Lagrangian framework, where each fluid particle’s motion is tracked; or an Eulerian framework, where the fluid’s density distribution at specific locations is tracked. Analogously, distributed control algorithms for swarms can be classified into two categories [15, 16]: the individualagentbased Lagrangian framework, where each agent’s trajectory is generated separately [2]–[14]; and the continuumbased Eulerian framework, where the collective properties of the swarm (e.g., its density distribution) is controlled. In the Lagrangian framework, the computation cost for generating each agent’s trajectory increases rapidly with a large number (– or more) of agents. For example, the computational complexity of each agent’s target assignment problem increases at least quadratically with the number of agents [5]–[8]. The Eulerian framework decouples these two tasks by first solving the target assignment problem at a coarser spatial resolution. Moreover, the Lagrangian framework does not efficiently handle loss or addition of agents, nor does it scale well with the size of the area and arbitrary formation shapes [2]–[4]. Consequently, we adopt the Eulerian framework in this paper.
Ia Literature Review
Numerous path planning algorithms within the Lagrangian framework are discussed in the survey papers on swarm robotics [17, 18]. In this subsection, we focus on guidance algorithms that use an Eulerian approach [19]–[21]. For shape formation and reconfiguration applications, the physical space over which the swarm is distributed is first tessellated (i.e., partitioned) into discrete bins [22, 23]. The bin size is determined by the spatial resolution of the desired formation shape. Assuming that the number of agents is much larger than the number of nonempty bins, the density distribution of the swarm over these bins is controlled to achieve the desired formation shape.
Within the Eulerian framework, Homogeneous Markov Chain (HMC) based algorithms are a popular choice for shape formation [24]–[27], area exploration [28, 29], task allocation [30, 31], and surveillance applications [32, 33]. In such algorithms, the agent’s transition probability between bins is encoded in a constant Markov matrix that has the desired formation shape as its stationary distribution. Such an approach is probabilistic, as opposed to deterministic, because each agent determines its next bin location by inversetransform sampling of the Markov matrix [34]. Since the number of agents in the swarm can vary with time and the agents do not keep track of the timevarying number of agents in the swarm, a probabilistic approach is preferred because a deterministic target assignment algorithm needs to keep track of the changes in the number of agents and targets [7]. Moreover, as shall be seen in the paper, our probabilistic approach can also handle measurement uncertainties and actuation errors in a robust manner. The HMCbased algorithms possess the aforementioned benefits of robustness and scalability, because addition or removal of agents from the swarm does not affect the convergence of the HMC to the stationary distribution.
However, the major drawback of these HMCbased algorithms is that they are inherently openloop strategies which cannot incorporate any feedback. Clearly, the effectiveness of these algorithms can be greatly improved by refining the Markov matrix at each time step using some feedback. Such refinement results in an Inhomogeneous Markov Chain (IMC), which is at the core of our algorithm.
In this paper, we derive the Probabilistic Swarm Guidance using Inhomogeneous Markov Chains (PSG–IMC) algorithm, which incorporates the feedback from the current swarm density distribution at each time step. The PSG–IMC algorithm is a closedloop distributed guidance strategy that retains the original robustness and scalability properties associated with a Markovian approach. Another disadvantage of HMCbased algorithms is that they suffer undesirable transitions, i.e., transitions from bins that are deficient in agents to bins with surplus agents. Such undesirable transitions prevent the swarm from converging to the desired formation. The PSG–IMC algorithm suppresses such undesirable transitions between bins, thereby reducing the control effort needed for achieving and maintaining the formation. This benefit also results in smaller convergence errors than HMCbased algorithms.
The motion of swarm agents can be formulated as an optimal transport problem with respect to a given cost function [35]. If perfect feedback of the current swarm distribution is available, then our prior work on the Probabilistic Swarm Guidance using Optimal Transport (PSG–OT) algorithm [36] gives good performance. However, there are two major disadvantages of such an approach. First, the performance of an optimal transportbased algorithm drops precipitously with estimation errors in the feedback loop. Measurement and estimation errors are routinely encountered in practice and it is often impossible or impractical to generate perfect feedback of the current swarm distribution. Second, the computation time of the optimization problem increases very fast with an increasing number of bins. This is a notable drawback because a large number of bins are necessary for capturing fine spatial details in the desired formation shape. The PSG–IMC algorithm proposed in the present paper can overcome both challenges, since it works effectively in the presence of errorprone feedback and scales well with a large number of bins.
A different approach to swarm formation within the Eulerian framework is to model the continuum dynamics using partial differential equations (PDEs) [37]–[39]. Since our goal is to achieve arbitrary formation shapes that are not limited to to the equilibrium states of the PDE, we do not consider a PDEbased approach. Our approach is also different from the multiagent Markov decision process approach in [40, 41] because our agents do not keep track of the states and actions of other agents.
IB Main Contributions
The first contribution of this paper is a novel technique for constructing feedbackbased Markov matrices for a given stationary distribution which represents the desired formation, where the expected cost of transitions at each time instant is minimized. Each Markov matrix satisfies the motion constraints that might arise due to the dynamics or other physical constraints. The Markov matrix converges to the identity matrix when the swarm converges to the desired formation, thereby ensuring that the swarm settles down and reducing unnecessary transitions.
Second, we describe the PSG–IMC algorithm for shape formation. We rigorously derive the convergence proofs for the PSG–IMC algorithm based on the analysis of IMC, which is more involved than the convergence proof for HMC. We show that each agent’s IMC strongly ergodically converges to the desired formation shape. Further, we provide a timevarying probabilisticbound on the convergence error as well as a lower bound on the number of agents for ensuring that the final convergence error is below the desired threshold. Furthermore, we present an extension of the PSG–IMC algorithm for area exploration applications.
Along with a lowerlevel collisionfree motion planner, we demonstrate using multiple quadrotors that the PSG–IMC algorithm can be executed in realtime to achieve multiple desired formation shapes. Using results of numerical simulations, we show that the PSG–IMC algorithm yields a smaller convergence error and more robust convergence than the HMCbased and PSG–OT algorithms, while significantly reducing the number of transitions in the presence of estimation errors. Thus, the PSG–IMC algorithm is bestsuited for largescale swarms with errorprone feedback and complex desired formations with a large number of bins.
Compared to our conference paper [45], we have added detailed proofs in the convergence analysis section and extensions of our algorithm for shape formation and area exploration applications. We have also added extensive numerical and experimental results to this paper.
This paper is organized as follows. The problem statement for shape formation is discussed in Section II. In Section III, we present our novel techniques for constructing Markov matrices. The PSG–IMC algorithm for shape formation is presented in Section IV. The PSG–IMC algorithm for area exploration is presented in Section V. Results of numerical simulations and experimentation are presented in Section VI. This paper is concluded in Section VII.
Symbol  Explanation 

Motion constraints matrix of the agent (Definition 4)  
Bins, where (Definition 1)  
Markov matrix of the agent (6)  
Condition for escaping transient bins (17)  
Cost matrix at the time instant (Definition 5)  
Number of agents in the swarm (Assumption 1)  
Actual bin position of the agent (Assumption 3)  
Probability vector of the agent predicted position (22)  
Probability vector of the desired formation (Definition 2)  
Estimation error between and (3)  
Current swarm distribution (Definition 3)  
Estimate of by the agent  
Hellingerdistancebased feedback gain (Definition 6) 
IC Notations
The time index is denoted by a right subscript and the agent index is denoted by a lowercase right superscript. Frequently used symbols are listed in Table I.
The symbol refers to the probability of an event. Let and be the set of natural numbers (positive integers) and real numbers respectively. The matrix denotes the diagonal matrix of appropriate size with the vector as its diagonal elements. Let , , , and denote the ones (column) vector, the identity matrix, the zero matrix of appropriate sizes, and the empty set respectively. Let denote the vector norm. Let denote the minimum of the positive elements.
A probability vector is a row vector with nonnegative elements, where the sum of all the elements is 1 (i.e., , ) [46, pp. 92]. The metric represents the distance between probability vectors and , where the subscript represents the type of metric.
Ii Preliminaries and Problem Statement
In this section, we state the problem statement for shape formation in Section IIB and introduce the PSG–IMC algorithm for shape formation in Section IIC.
Definition 1.
(Bins ) The compact physical space over which the swarm is distributed is partitioned into disjoint bins. These bins are represented by for all . The size of the bins is determined by the spatial resolution of the desired formation shape. For example, in Fig. 2.
Definition 2.
(Desired Formation and Recurrent Bins) The desired formation shape is a probability vector in . Each element represents the desired swarm density distribution in the corresponding bin . The bins that have nonzero elements in are called recurrent bins. Let denote the number of recurrent bins. The remaining bins, with zero elements in , are called transient bins. Without loss of generality, we relabel the bins such that the first bins are recurrent bins (i.e., for all ) and the remaining bins are transient bins (i.e., for all ). For example, see Fig. 2.
Note that representing the desired formation as a distribution over bins is analogous to representing a 2D image using pixels or a 3D shape using voxels. In this paper, the complex desired formation shapes of the Taj Mahal in Fig. 1 and the Eiffel Tower in Fig. 6 are generated in this manner.
Assumption 1.
Let the scalar denote the number of agents in the swarm at the time instant. We assume that and the agents do not keep track of . In Section IVA, we provide a lowerbound on for achieving the desired convergence error. Moreover, we can only achieve the best quantized representation of using agents, due to the quantization error of . For example, if and , then the bestquantized representation of that can be achieved is .
Assumption 2.
Assumption 3.
We assume that each agent can sense the bin it belongs to. Note that this requirement is less stringent than sensing the precise location. For example, outdoor robots and spacecraft in low Earth orbit can sense their position in a distributed manner using the Global Positioning System (GPS) or other relative navigation technologies (e.g., cellphone towers, radio beacons).
The indicator (row) vector represents the actual bin position of the agent at the time instant. If the element , then the agent is present inside the bin at the time instant; otherwise .
Definition 3.
(Current Swarm Distribution ) The current swarm distribution is a probability vector in . It is given by the ensemble mean of actual bin positions of the agents:
(1) 
Each element gives the swarm density distribution in the corresponding bin at the time instant.
Definition 4.
(Matrix of Motion Constraints) An agent in a particular bin can only transition to some bins but cannot transition to other bins, because of the dynamics or physical constraints. These (possibly timevarying) motion constraints are specified by the matrix , where each element is given by:
(2) 
We assume that satisfies the following properties:

The matrix is symmetric and irreducible^{1}^{1}1A matrix is irreducible if and only if the graph conforming to that matrix is strongly connected.,

for all agents, bins, and time instants,

Since the first bins are recurrent bins, the submatrix encapsulate the motion constraints between the recurrent bins. The matrix is irreducible.
These properties are visualized in Fig. 3.
As shown in Fig. 3, the recurrent bins need not be contiguous. Therefore, the desired distribution can have multiple disconnected components. Note that the matrix is different from the Markov matrix introduced in Section III.
Definition 5.
(Cost Matrix ) Consider a matrix that encapsulates the cost of transitions between bins. Each element denotes the cost incurred by an agent while transitioning from bin to bin at the time instant. This cost represents the control effort or the fuel consumed or any other metric that the agents seek to minimize. We assume that the agents do not incur any cost if they remain in their present bin and the agents incur some positive cost if they transition out of their present bin (i.e., and for all and ).
Assumption 4.
(A Priori Information Available with Each Agent) We assume that the physical space over which the swarm is distributed, the position of the bins , and the desired formation shape are known before the algorithm starts. Subsequently, the timevarying cost matrices and the motion constraints matrices , which depend on the position of the bins, are computed and stored beforehand. Finally, the four design variables (namely , , , and ), which are introduced later, are also stored on board each agent.
Iia Distributed Estimation of the Current Swarm Distribution
The algorithms in this paper use feedback of the current swarm distribution . In order to generate this estimate in a distributed manner, we need the following assumption.
Assumption 5.
The timevarying communication network topology of the swarm is assumed to be stronglyconnected.
Under Assumption 5, several algorithms exist in the literature for estimating in a distributed manner [49, 50, 51]. For example, the distributed consensus algorithm [51] is used to estimate in Remark 13 in Appendix.
Any distributed estimation algorithm will have some residual estimation error between the current swarm distribution and the agent’s estimate of the current swarm distribution at the time instant, which is represented by the probability vector . Let the positive parameter represent the maximum estimation error between and :
(3) 
We later show that our algorithm works remarkably well in the presence of this estimation error (3).
IiB Problem Statement for Shape Formation
Under Assumptions 1–5, the objectives of the PSG–IMC algorithm for shape formation are as follows:

Each agent independently determines its bintobin trajectory using a Markov chain, which obeys motion constraints , so that the overall swarm converges to a desired formation shape .

The algorithm automatically detects and repairs damages to the formation.
IiC Outline of the PSG–IMC Algorithm
The key steps in the proposed PSG–IMC algorithm for shape formation are shown in Fig. 4. The agent first determines its present bin and estimates the current swarm distribution (Section IIA). If the agent is in a transient bin, then it selects another bin using the condition for escaping transient bins (Section IIIC). Otherwise, the agent computes the Markov matrix (Section IIIA) and then modifies it to suppress undesirable transitions (Section IV). Finally, the agent uses inverse transform sampling to select the next bin (Remark 14 in Appendix). The agent uses a lowerlevel guidance and control algorithm, which depends on the agent’s dynamics, to go from its present bin to the selected bin in a collisionfree manner. Such lowerlevel algorithms based on realtime optimal control and Voronoi partitions are presented in [36, 52, 7] and also discussed briefly in Remark 15 in Appendix. The pseudocode for the PSG–IMC algorithm for shape formation is given in Method 2 in Section IV.
Iii Construction of Feedbackbased Markov Matrix
In this section, we present our novel techniques for constructing Markov matrices.
Iiia Construction of Minimum Cost Markov Matrix
In this subsection, we construct Markov matrices that minimize the expected cost of transitions at each time instant.
Definition 6.
(Feedback Gain and Desired Convergence Error ) The feedback gain is given by the Hellinger distance (HD) between the current swarm distribution and the desired formation :
(4) 
The HD is a symmetric measure of the difference between two probability distributions and bounded by [53, 54].
Let represent the desired convergence error threshold between the final swarm distribution and .
Remark 1.
(Advantages of Hellinger Distance) The HD between and in (4) is bounded as follows [55]:
(5) 
We choose HD, over other popular metrics like and distances, because of its properties illustrated in Fig. 5. The distances for the cases (, , ) from are equal. But in Case 1, the wrong agent is in a bin where there should be no agent, hence HD heavily penalizes this case. If all the agents are only in those bins which have positive weights in , then HD is significantly smaller. Finally, if an agent is missing from a bin that has fewer agents in (Case 2) compared to a bin that has more agents in (Case 3), then HD penalizes Case 2 slightly more than Case 3. These properties are useful for swarm guidance.
Consider the Markov matrix in that encapsulates the transition probabilities between bins. Each element represents the probability that the agent in bin at the time instant will transition to bin at the time instant:
(6) 
Therefore, the Markov matrix is row stochastic (i.e., ). Its stationary distribution is defined as follows.
Definition 7.
(Stationary Distribution) The stationary distribution of the Markov matrix is given by the solution of , where is a probability (row) vector in (i.e., , ). The stationary distribution is unique if the Markov matrix is irreducible [46, pp. 119].
Definition 8.
(Expected Cost of Transitions at Each Time Instant) The expected cost of transitions for the agent at the time instant is given by , where the cost matrix is defined in Definition 5.
The following Method 1, Theorem 1, and Corollary 1 present our construction of the optimal Markov matrix that minimizes this expected cost of transitions at the each time instant. Our construction technique has no relation with the wellknown MetropolisHastings (MH) algorithm, which is commonly used for constructing Markov matrices with a given stationary distribution [56, 57]. In the MH algorithm, the proposal distribution is used to iteratively generate the next sample, which is accepted or rejected based on the desired stationary distribution. There is no direct method for incorporating feedback into the MH algorithm. In contrast, the feedback of the current swarm distribution is directly incorporated within our construction process using the feedback gain.
Method 1: Construction of Optimal Markov Matrix Under Assumptions 1–5, the optimal Markov matrix that minimizes the expected cost of transitions at each time instant is constructed as follows: (A) If , then set . (B) Otherwise, is computed as follows: If , then set for all bins . If , then set for all bins with . The remaining elements in are computed using the following linear program (LP):
Remark 2.
The Markov matrix designed in Method 1 has the following desirable properties:

If (see Definition 6), the swarm is deemed to have converged to the desired formation. Then, is set to an identity matrix so that the agents do not transition anymore. Hence, Step (A) ensures that the swarm remains converged without additional movement.

If the swarm has not converged to the desired formation (i.e., ), then Step (B) is initiated.

(CS1) prevents those transitions that are not allowed by motion constraints.

(CS2) prevents transitions into transient bins.

(LP1) ensures that is row stochastic.

(LP2) ensures that has as its stationary distribution (i.e., ).

The lower bound in (LP3) ensures that there is nonzero probability for agents to remain in their present bin when . The upper bound in (LP3) is derived from (LP1).

The lower bound in (LP4) ensures that the minimum transition probability to a target bin is nonzero and directly proportional to both the feedback gain and the target bin’s desired distribution . But the minimum transition probability decreases with increasing cost of transition to the target bin.

The upper bound in (LP4) ensures that the maximum transition probability is also directly proportional to the feedback gain .
A salient feature of the constraints (LP3,4) is that they depend on the feedback gain . Therefore, if the swarm distribution converges to (i.e., ), then (because ) and based on these constraints. The identity matrix ensures that agents settle down after the desired formation is achieved, thereby reducing unnecessary transitions. In Section IVA, we show that these constraints also help prove the convergence of the algorithm.
Theorem 1.
The feasible set of Markov matrices that satisfy the constraints (CS1,2) and the linear constraints in LP (7) in Method 1 is nonempty. The optimal Markov matrix is rowstochastic, has as its stationary distribution, and only allows transitions into recurrent bins.

The optimization problem in (7) is an LP because the constraints are all linear inequalities or equalities and the objective function is linear. An optimal solution for the LP exists if the feasible set of Markov matrices is nonempty. We now show that the following family of Markov matrices are within the set of feasible solutions:
(8) (9) where and is a positive column vector in , with for all bins.
The matrix satisfies (CS1) due to (8). If , then the offdiagonal element satisfies and the matrix satisfies (CS2).
We now show that the matrix satisfies (LP1):
where . We now prove that the matrix satisfies (LP2):
The matrix satisfies (LP3) because each diagonal element is lower bounded by , which is one of the positive terms in (9). The term is lower bounded by because .
We now prove that the matrix satisfies (LP4). Since the element , the offdiagonal element is upper bounded by because , , and . On the other hand, the offdiagonal element is lower bounded by because and as . Therefore, because . Thus the matrix satisfies (LP4).
Therefore, the feasible set is nonempty and the optimal Markov matrix posses the desirable properties discussed in Remark 2.
Remark 3.
(Computation Time) Although each agent only needs the row of the Markov matrix corresponding to its present bin, it has to solve the entire LP (7). The computation time for an LP increases with an increasing number of bins because the number of variables in is approximately equal to . For example, if the desired formation is given by or in Fig. 6, then the computation time is a few minutes on a standard desktop computer. If the desired formation is given by (with variables) or (with variables), then the LP is impractical for realtime computation. This escalating computation time with increasing number of bins is an issue for all LPbased algorithms [27, 36].
Therefore, we need a faster method for computing the Markov matrices. The following corollary gives the analytical formula of the optimal Markov matrix when the cost matrix is symmetric.
Corollary 1.
The optimal Markov matrix of the LP (7) in Method 1 is given by:
(10) 
(11) 
if the cost matrix is symmetric (i.e., ).

We first state a simpler LP by neglecting the constraints (LP1,2) from the original LP (7). We state this simpler LP (12) using the following substitutions for all positive elements and :
(12) subject to:
According to Definition 5, and for all . The minimum cost of this simpler LP (12) is obtained when , because the second term in the cost function is a constant. Therefore, all the positive offdiagonal elements are equal to their respective lower bounds in the optimal solution of the simpler LP (12). This optimal solution of the simpler LP (12) is given by the Markov matrix (10)(11).
If the optimal solution of the simpler LP (12) also satisfies the constraints (LP1,2) that we neglected previously, then it is the optimal solution of the original LP (7). It follows from the construction of the diagonal elements in (11) that it satisfies (LP1). The diagonal elements of are given by:
(13) Note that the matrix is a reversible Markov matrix because of the symmetric cost matrix, i.e., for all . Hence, the matrix also satisfies (LP2) because:
Therefore, the matrix is the optimal solution of the original LP (7).
If the cost matrix is symmetric, then the Markov matrix (10)(11) gives significant savings in computation time because each agent can directly compute the row of the optimal Markov matrix. For example, the computation times for all four cases in Fig. 6 are less than minutes on a standard desktop computer.
Remark 4.
(Alternative Functions for Constraints) Note that our construction technique holds even if the term in the constraint (LP4) in Method 1 and (10) in Corollary 1 is replaced by any monotonic function in that decreases with an increasing . Similarly, the term in the constraints (LP3,4) can be replaced by any monotonic function in that decreases with a decreasing . For example, see Fig. 12(b) in Section VIB.
IiiB Construction of Fastest Mixing IMC
In this subsection, we construct the fastest mixing IMC, where the IMC’s convergence rate to the rank one matrix is optimized. The convergence rate of HMC, with timeinvariant Markov matrix , is determined by the second largest eigenvalue modulus (i.e., ) [58, 59]. On the other hand, the convergence rate of IMC is determined by the coefficient of ergodicity [46, pp. 137]. Since the first bins are recurrent bins, the Markov matrix can be decomposed as follows:
(14) 
where the submatrix encapsulates the bin transition probabilities between the recurrent bins.
Definition 9.
[46, pp. 137–139] (Coefficient of Ergodicity) For the stochastic matrix , the coefficient of ergodicity is defined as:
(15) 
where , are probability row vectors in and .
We define as the graph diameter in the graph conforming to the matrix ; i.e., it is the greatest number of edges in a shortest path between any pair of recurrent bins [60]. If , then there exist recurrent bins and such that either or for all . Substituting these bins into (15) show that when . In order to avoid this trivial case, we minimize the coefficient of ergodicity of the positive matrix [61, Theorem 8.5.2, pp. 516].
Corollary 2.
IiiC Condition for Escaping Transient Bins
In this subsection, we discuss the condition for escaping transient bins.
Definition 10.
(Trapping Bins) If an agent is inside a transient bin () and its motion constraints matrix only allows transitions to other transient bins, then that transient bin is called a trapping bin. This agent is trapped in this bin because the Markov matrix does not allow transitions out of this bin. Let represent the set of trapping bins for the agent at the time instant. For example, see Fig. 7.
Since the irreducible motion constraints matrices are known a priori (Assumption 4 and Definition 4), the deterministic path for exiting the set of trapping bins is stored on board each agent. For each trapping bin , the agent transitions to a transient bin , chosen a priori, such that the transition from bin to bin is allowed by motion constraints. This deterministic path, which is chosen a priori, ensures that the agent exits the set of trapping bins as soon as possible. This bin has to be chosen on a casebycase basis depending on the motion constraints matrix . For example, in Fig. 7, for the trapping bin 5, the best option is bin 3 in case (a) and bin 7 in case (b). Therefore, the agent can follow this path to deterministically exit the set of trapping bins in finite time instants.
If an agent is in a transient bin, but not in a trapping bin, then its motion constraint matrix allows transitions to some recurrent bins. We can speed up the process of exiting this transient bin by forcing the agent to transition to any reachable recurrent bin, with equal probability, during the current time instant. Thus the agent transitions from its current transient bin to a recurrent bin in one time instant.
The matrix encapsulates the condition for escaping transient bins. If is a transient bin (i.e., ), then each element in the corresponding row is given by:
(17) 
where is the number of recurrent bins that the agent can transition to, from bin at the time instant. This condition is used only when the agent is in a transient bin, as shown in Method 1. In Section IVA, we show that the agent exits the set of transient bins within finite time instants due to this condition.
Iv PSG–IMC Algorithm for Shape Formation
In this section, we first state the PSG–IMC algorithm for shape formation and then present its convergence analysis and its property of robustness.
Method 2: PSG–IMC Algorithm for Shape Formation  

1:  One iteration of agent during time instant, 
where the agent is in bin  
2:  Given , , , , , , , and 
3:  if , then 
4:  Compute using (17) 
5:  Generate a random number 
6:  Select bin such that 
7:  else 
8:  Compute the feedback gain using (4) 
9:  Compute using Corollary 1 
or compute using Method 1  
10:  Compute the term using (20) 
11:  Compute using (18) and (19) 
12:  Generate a random number 
13:  Select bin such that 
14:  end if 
15:  Go to bin while avoiding collisions 
The pseudocode for the PSG–IMC algorithm for shape formation is given in Method 2, whose key steps are shown in Fig. 4. At the start, the agent knows the desired formation shape , the timevarying cost matrix , and its timevarying motion constraints matrix (Assumption 4). During each iteration, the agent determines the bin it belongs to (Assumption 3, here we assume that the agent is in bin ) and the current swarm distribution from Section IIA (lines 1–2).
If the agent is in a transient bin (line 3), then it uses inverse transform sampling (Remark 14 in Appendix) to select the next bin from the corresponding row of the matrix (lines 4–6).
Otherwise, the agent first computes the HDbased feedback gain (line 8). If the cost matrix is symmetric, then the agent can directly compute the row using Corollary 1 (line 9). Otherwise, the agent computes the entire Markov matrix using Method 1 (line 9).
In order to avoid undesirable transitions from bins that are deficient in agents (i.e., where ), the agent modifies its Markov matrix row as follows:
(18)  
(19)  