A Distributed ADMM Approach to Non-Myopic Path Planning for Multi-Target Tracking
Abstract
This paper investigates non-myopic path planning of mobile sensors for multi-target tracking. Such problems pose high computational complexity and/or require high-level decision making. Existing works tackle these issues by heuristically assigning targets to each sensing agent and solving the split problem for each agent. However, such heuristic methods reduce the target estimation performance because they do not account for how the target state estimates change over time. In this work, we circumvent the task-assignment problem by reformulating the general non-myopic planning problem as a distributed optimization problem with respect to targets. By combining the alternating direction method of multipliers (ADMM) and a local trajectory optimization method, we solve the problem and induce consensus (i.e., high-level decisions) among the targets automatically. In addition, we propose a modified receding-horizon control (RHC) scheme and an edge-cutting method for efficient real-time operation. The proposed algorithm is validated through simulations in various scenarios.
I. Introduction
Technological advances in sensor networks have enabled many applications such as monitoring pollution/atmospheric phenomena, reconnaissance and surveillance missions in both defense and civilian areas, traffic monitoring, forest fire localization and wildlife tracking [12, 44].
These sensor networks are based on two types of sensors: static and mobile sensors. It is widely demonstrated that mobile sensors offer distinctive advantages over static ones in terms of the quality of sensing and estimation [6, 15, 33, 27, 32, 30, 14, 36, 17], coverage area, data offloading [5] and adaptability to dynamic environments [34].
However, for mobile sensor networks, it is crucial to operate the sensing agents efficiently to maximize information gain about the target systems while satisfying constraints on the entire system.
This problem is described as a sensor planning problem that determines the future utilization of sensing agents given the current states of resources and environments.
One typical approach to the sensor planning problem is making myopic plans covering one or a few time steps to reduce the complexity of the problem.
For instance, Hoffmann and Tomlin [15] proposed a single step mobile sensor planning algorithm for single target tracking based on an informationtheoretic metric.
They plan the one step ahead control inputs of the sensing agents to maximize the mutual information between the sensor measurements and the estimated target state.
In multi-target tracking tasks, most existing works adopt a two-phase approach in which they cluster and assign targets to each sensing agent and then solve the split problem for each agent [30, 14].
However, such myopic planning algorithms are vulnerable to bad local optima and show poor performance for sensor networks with sensing holes [23, 20]. Optimizing over only the near future cannot reflect long-term goals such as taking targets out of sensing holes by changing the positions and directions of sensors.
Thus, we focus on non-myopic planning algorithms, which plan over multiple time steps.
There are several existing approaches to nonmyopic sensor planning problems.
One natural way is to extend sensor scheduling methods [43, 8, 21] for static sensor networks.
As these methods select operating sensors among finite candidates, the control inputs of the mobile sensors are discretized and the same optimization methods are applied to the resulting finite set of optimization variables [6, 20].
However, this approach suffers from a critical computational issue: the time complexity grows exponentially as the planning horizon increases.
Another approach is sampling-based planning.
The Information-rich Rapidly-exploring Random Tree (IRRT) algorithm [22] connects random sample points in the sensing space so that the sensor measurements along the connected paths maximize the information on a single target. Luders et al. [25] apply the IRRT algorithm in a two-phase approach so that it tracks multiple targets. However, these sampling-based works do not guarantee any optimality of their solutions [18].
A better candidate for non-myopic planning of mobile sensors is local trajectory optimization.
It searches for a solution to the optimization problem near a given initial guess using gradient descent algorithms.
This gradient-based trajectory optimization has been successfully applied in many single-target tracking applications by employing analytically differentiable filter models for target estimation [33, 27, 32, 34, 36].
It shows much faster computation than the discrete optimization methods and, unlike the sampling-based methods, guarantees local optimality.
However, in multitarget tracking problems, local trajectory optimization methods can be easily trapped in a bad local optimum unless a proper initial guess is given.
In this paper, we investigate non-myopic path planning of mobile sensors for multi-target tracking problems.
There is a key challenge in employing local trajectory optimization for multi-target tracking: to make a good initial guess, we need a high-level decision-making process that determines which mobile sensors should track which targets and in what order the targets should be tracked.
This problem is also referred to as the task-assignment problem.
Unfortunately, the problem is hard to formulate in a way that reflects the mobility of the sensors and the expected changes in the states of the sensors and the targets.
Instead, existing works with the two-phase approach, while myopic, use heuristics for the formulation [30, 14], which degrades the quality of the initial guess and, thus, the target estimation performance.
Therefore, the objective of this study is to solve the non-myopic path planning problem for multi-target tracking without heuristic task-assignment.
The main contribution of this paper is a new practical algorithm for the non-myopic path planning problem for multi-target tracking that requires no heuristic task-assignment. More specifically, we

1) reformulate the general non-myopic path planning problem for multi-target tracking into a distributed optimization problem with respect to targets,

2) solve the transformed problem by combining ADMM and a local trajectory optimization method, and

3) propose a modified receding-horizon control (RHC) scheme and an edge-cutting method for efficient real-time operation.
Our algorithm circumvents the task-assignment problem by solving subproblems for each target, each of which has only a single task, and by automatically inducing consensus among those subproblems. Consequently, our distributed framework integrates the decision-making process into the trajectory optimization process so that they are solved simultaneously, without the aid of heuristic task-assignment algorithms.

The alternating direction method of multipliers (ADMM) [3, 29, 45] has also been effectively applied to other problems in robotic control and sensor scheduling [2, 31, 10, 28, 24]. For instance, Bento et al. [2] adopted ADMM for multi-agent trajectory planning with collision-avoidance, using it to incorporate various constraints, such as energy minimization, into the planning objective. Ong et al. [31] also developed a message-passing algorithm based on ADMM to solve a collision-avoidance problem with communication constraints. Mordatch et al. [28] combined trajectory optimization and global policy (neural network) learning using ADMM in a robotic control problem. Liu et al. [24] used ADMM to solve an optimization problem with cardinality functions in scheduling static sensor networks for state estimation of dynamical systems. Like these works, we formulate a problem on which ADMM can work effectively, and we successfully solve it.

The rest of this paper is organized as follows. In Section II, we define the general non-myopic path planning problem for multi-target tracking. Then, we reformulate the original problem as a distributed trajectory optimization problem in Section III. Section IV presents the details of our algorithm, including the modified RHC scheme. In Section V, the computational complexity of the proposed algorithm is analyzed and the edge-cutting method is introduced to reduce the complexity. Finally, the simulation results of the proposed algorithm are given in Section VI.
II. Problem Formulation
II-A Non-Myopic Path Planning for Multi-Target Tracking
We consider the problem of non-myopically planning multiple mobile sensors to efficiently track multiple targets. We formulate this problem based on the known dynamics of targets and sensors, as well as the sensing model. From these models, we plan an optimal path along which each sensor takes successive measurements to estimate the target states efficiently. In this work, targets are assumed to follow linear stochastic dynamics, which is a common assumption in the literature [20, 34, 6, 33]. Let the index set of targets be $\mathcal{T} = \{1, \dots, N\}$, where the number of targets, $N$, is known and fixed. The $i$th ($i \in \mathcal{T}$) target's state, $x^i_t$, is updated as:
$$x^i_{t+1} = A^i x^i_t + w^i_t, \quad w^i_t \sim \mathcal{N}(0, Q^i) \qquad (1)$$
where $w^i_t$ is Gaussian random noise independent of the other targets and of the measurements. $A^i$ and $Q^i$ are the transition matrix and noise covariance matrix of the $i$th target, respectively. For the sensor dynamics, we employ a general model, $s^j_{t+1} = g(s^j_t, u^j_t)$, for each sensor. Let the index set of mobile sensors be $\mathcal{S} = \{1, \dots, M\}$. The dynamics of the $j$th ($j \in \mathcal{S}$) sensor's state, $s^j_t$, is expressed as:
$$s^j_{t+1} = g(s^j_t, u^j_t) \qquad (2)$$
where $u^j_t$ is the control input at time $t$. Here, each mobile sensor is assumed to be carried by one fixed-wing UAV for simplicity.¹ (¹In this paper, we use the terms 'mobile sensor' and 'UAV' interchangeably. For simplicity, we assume that each UAV carries one sensor. In general, some platforms may carry more than one sensor, and the sensors may be heterogeneous; in such cases, the sensors on the same vehicle share the same dynamics.) Denoting the measurement taken by the $j$th UAV of the $i$th target at time $t$ as $z^{i,j}_t$, the sensor measurement model can be expressed as follows:
$$z^{i,j}_t = h(s^j_t, x^i_t) + v^{i,j}_t, \quad v^{i,j}_t \sim \mathcal{N}(0, R^{i,j}) \qquad (3)$$
where $v^{i,j}_t$ is Gaussian random noise, independent of the other measurements and of the targets' motion noise, $w^i_t$. These models are incorporated to form an objective function, $J$, which indicates the uncertainty of the target state estimation. The goal of the general target tracking problem is to estimate the target states accurately, which corresponds to minimizing this uncertainty. Since we are solving a non-myopic path planning problem, the optimization variables are the sequence of states/control inputs of the mobile sensors over the future time steps, $(s_{1:T}, u_{0:T-1})$. Thus, in general, non-myopic path planning for multi-target tracking can be formulated as [20, 34, 6]:
$$\min_{u_{0:T-1}} \; \sum_{t=1}^{T} \Big[ \sum_{i \in \mathcal{T}} J^i_t + \sum_{j \in \mathcal{S}} u^{j\top}_{t-1} R\, u^j_{t-1} \Big] \quad \text{s.t.} \;\; s^j_{t+1} = g(s^j_t, u^j_t), \;\; C(s_t) \le 0 \qquad (4)$$
where
Here, the optimal trajectory, $(s^*_{1:T}, u^*_{0:T-1})$, satisfies the dynamics constraint and the given initial condition. $J^i_t$ is the uncertainty of the $i$th target at time $t$, the term $u^{j\top}_{t-1} R\, u^j_{t-1}$ penalizes the control effort of the mobile sensors along the trajectory, and $C(\cdot)$ is an additional constraint, such as collision-avoidance, related to the states of the mobile sensors. $R$ is a weighting matrix.
II-B Target State Estimation
The target states are estimated through the extended Kalman filter (EKF), which is widely used for state estimation of nonlinear systems [4]. Note that any other nonlinear Gaussian filter, such as the unscented Kalman filter, can be incorporated instead. The targets are assumed to have independent motions and to be measured independently via data association methods [1]. Then, their joint distribution can be factorized and a single tracking filter is used for each target [36]. For each target, its belief state, $(\hat{x}_t, \Sigma_t)$, is updated by (for simplicity, the superscript $i$ is temporarily dropped):
$$\hat{x}_{t+1} = A\hat{x}_t + K_{t+1}\big(z_{t+1} - h(s_{t+1}, A\hat{x}_t)\big) \qquad (5)$$
$$\Sigma_{t+1} = (I - K_{t+1} H_{t+1})\,\bar{\Sigma}_{t+1} \qquad (6)$$
where
$$\bar{\Sigma}_{t+1} = A \Sigma_t A^\top + Q, \quad H_{t+1} = \frac{\partial h}{\partial x}(s_{t+1}, A\hat{x}_t), \quad K_{t+1} = \bar{\Sigma}_{t+1} H_{t+1}^\top \big(H_{t+1} \bar{\Sigma}_{t+1} H_{t+1}^\top + R\big)^{-1}.$$
Equations (5) and (6) define the belief dynamics. The second term in (5), called the innovation term, depends on the measurement $z_{t+1}$. Equation (6) evolves from the current covariance of the target regardless of the measurement. Our optimization problem in (4) evaluates the target state estimates at future time steps. Since future measurements are unknown in advance, we treat the innovation term in (5) as stochastic, following [41]. Defining the belief state of the $i$th target as $b^i_t = (\hat{x}^i_t, \mathrm{vec}(\Sigma^i_t))$ (where $\mathrm{vec}(\cdot)$ stacks the upper-triangular portion of the input matrix, since $\Sigma^i_t$ is symmetric), the stochastic belief dynamics of the target is given by:
$$b^i_{t+1} = \phi(b^i_t, s_{t+1}) + W(b^i_t, s_{t+1})\,\xi_t, \quad \xi_t \sim \mathcal{N}(0, I) \qquad (7)$$
where
$$\phi(b^i_t, s_{t+1}) = \begin{bmatrix} A^i\hat{x}^i_t \\ \mathrm{vec}\big((I - K_{t+1}H_{t+1})\bar{\Sigma}_{t+1}\big) \end{bmatrix}, \quad W(b^i_t, s_{t+1}) = \begin{bmatrix} \sqrt{K_{t+1}H_{t+1}\bar{\Sigma}_{t+1}} \\ 0 \end{bmatrix}. \qquad (8)$$
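To make the update concrete, the following is a minimal sketch of the per-target EKF step in (5)-(6). The measurement function `h` and its Jacobian `H_jac` are passed in as placeholders for the sensing model in (3); the function names and signatures are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def ekf_belief_update(x_hat, Sigma, z, s, A, Q, R, h, H_jac):
    """One EKF belief update per (5)-(6): predict with the linear target
    model (1), then correct using a measurement z taken from sensor
    state s. h(s, x) and H_jac(s, x) model the sensing in (3)."""
    # Prediction with the linear target dynamics
    x_pred = A @ x_hat
    S_pred = A @ Sigma @ A.T + Q
    # Correction: the innovation term depends on the measurement z
    H = H_jac(s, x_pred)
    K = S_pred @ H.T @ np.linalg.inv(H @ S_pred @ H.T + R)
    x_new = x_pred + K @ (z - h(s, x_pred))
    # Covariance update is measurement-independent, as noted for (6)
    S_new = (np.eye(len(x_hat)) - K @ H) @ S_pred
    return x_new, S_new
```

Note that `S_new` depends only on the predicted covariance and the sensor state, which is what makes the covariance propagation in (6) usable for planning before measurements arrive.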
II-C Uncertainty Measure
The uncertainty term $J^i_t$ in (4) represents the inaccuracy of the target state estimation. For this measure, we use
$$J^i_t = \mathbb{E}\big[\mathrm{tr}(\Sigma^i_t)\big] \qquad (9)$$
The trace of the covariance matrix of the target state estimate physically corresponds to the average variance of the estimation error and is one of the most widely used metrics [22, 33, 14, 34]. This measure is on a quadratic scale with respect to physical distance error. The expectation is taken because we employ the stochastic belief dynamics in (7). Note that any other differentiable metric can be incorporated instead.
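Under the stochastic belief dynamics, the expectation in (9) can be approximated by averaging traces over sampled future covariances; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def uncertainty(cov_samples):
    """Monte-Carlo approximation of the metric in (9): the expected
    trace of the estimation-error covariance, averaged over covariance
    matrices sampled from the stochastic belief dynamics (7)."""
    return float(np.mean([np.trace(S) for S in cov_samples]))
```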
III. Transformation to a Distributed Trajectory Optimization Problem
III-A Challenge of the Non-Myopic Problem
The non-myopic path planning problem in (4) generally has an abundance of local optima due to the non-convexity of its cost function. This non-convexity primarily comes from the dependency of the optimal sensor trajectories on high-level decision-making: the trajectories vary significantly depending on which target is assigned to which sensor and on the order in which each sensor tracks its assigned targets. Since most trajectory optimization methods find solutions near an initial guess, our problem, with its numerous local optima, is very sensitive to initialization. Therefore, we need to guide the solution by properly initializing the optimization variables. Finding a good initial guess induces another hard problem: task-assignment. Unfortunately, it is very difficult to construct a task-assignment problem in our setting that reflects the mobility of the sensors and the expected changes in the states of the sensors and the targets. One feasible option is to heuristically assign tasks based on the initial states of targets and sensors utilizing the traveling salesman problem (TSP) [35], CBBA [7, 25], and/or clustering [30, 14]. However, it remains hard to sufficiently account for the changes of the targets' and sensors' states over time, and this option may yield a bad local optimum.
III-B Transformed Problem
Instead of solving this hard problem, we circumvent the difficulty and reformulate the original problem (4) into a distributed trajectory optimization problem. For each target $i \in \mathcal{T}$, we solve:
$$\min_{s^{(i)},\, u^{(i)}} \; \sum_{t=1}^{T} \Big[ J^i_t + \sum_{j \in \mathcal{S}} u^{(i)j\top}_{t-1} R\, u^{(i)j}_{t-1} \Big] \quad \text{s.t.} \;\; \big(s^{(i)}, u^{(i)}\big) = (\bar{s}, \bar{u}) \qquad (10)$$
where $(s^{(i)}, u^{(i)})$ are local copies of the sensor trajectories dedicated to target $i$, and $(\bar{s}, \bar{u})$ is the common global trajectory shared by all targets.
Trajectory optimization for each target is much easier than that for multiple targets because no task-assignment needs to be considered. Also, it is not difficult to generate an initial guess for the problem of tracking a single target. By solving the trajectory optimization for each target and reaching consensus, this distributed optimization is solved through the Alternating Direction Method of Multipliers (ADMM) [3, 42].
III-B1 Distributed Alternating Direction Method of Multipliers
ADMM is an algorithm that efficiently optimizes objective functions composed of terms each of which has an efficient solution method [3]. For our purposes, we use the consensus and sharing optimization form of ADMM, also known as distributed ADMM [42, 29, 45]. We consider the following minimization problem:
$$\min_{x} \; \sum_{i=1}^{N} f_i(x) \qquad (11)$$
This problem can be rewritten with local variables $x_i$ and a common global variable $z$:
$$\min_{x_1, \dots, x_N} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \;\; x_i = z, \;\; i = 1, \dots, N \qquad (12)$$
This is called the global consensus problem, as the constraint is that all the local variables should be equal (i.e., agree). Solving this problem is equivalent to optimizing the augmented Lagrangian:
$$L_\rho(x_{1:N}, z, \lambda_{1:N}) = \sum_{i=1}^{N} \Big( f_i(x_i) + \lambda_i^\top (x_i - z) + \frac{\rho}{2}\,\|x_i - z\|_2^2 \Big) \qquad (13)$$
where $\lambda_i$ is the Lagrange multiplier for the constraint $x_i = z$, and $\rho > 0$ is a penalty weight. The ADMM algorithm updates the local variables $x_i$, the common global variable $z$, and the Lagrange multipliers $\lambda_i$ as follows. For $k = 0, 1, \dots$, we iteratively execute the following three steps:
$$x_i^{k+1} = \arg\min_{x_i}\; f_i(x_i) + \lambda_i^{k\top} x_i + \frac{\rho}{2}\,\|x_i - z^k\|_2^2, \quad z^{k+1} = \frac{1}{N}\sum_{i=1}^{N}\Big(x_i^{k+1} + \frac{1}{\rho}\,\lambda_i^k\Big), \quad \lambda_i^{k+1} = \lambda_i^k + \alpha\rho\,\big(x_i^{k+1} - z^{k+1}\big) \qquad (14)$$
where $\alpha$ is a step-size parameter. Note that the first and last steps can be carried out independently for each $i$. Thus, depending on the implementation, we can run the algorithm in parallel with multiple (but no more than $N$) computational devices, where each device performs the update on its own set of variables. The optimization results are communicated back to a master node, which performs the global variable update and returns the result to the worker nodes. The processing element that handles the global variable is sometimes called the central collector or the fusion center.
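The three steps above can be sketched on a toy consensus problem. The example below uses the equivalent scaled-dual form of the updates in (14) to minimize a sum of scalar quadratics, whose consensus solution is the mean of the $a_i$; the problem instance and parameter values are illustrative assumptions.

```python
import numpy as np

def consensus_admm(a, rho=1.0, iters=100):
    """Toy consensus ADMM (scaled-dual form): minimize sum_i (x - a_i)^2
    subject to all local copies x_i agreeing on a global z. The loop
    mirrors the three steps of (14): local solve, averaging, dual update."""
    n = len(a)
    x = np.zeros(n)   # local variables (one per subproblem)
    u = np.zeros(n)   # scaled Lagrange multipliers
    z = 0.0           # common global variable
    for _ in range(iters):
        # 1) local updates, parallelizable across subproblems
        x = (2.0 * a + rho * (z - u)) / (2.0 + rho)
        # 2) global averaging at the fusion center
        z = np.mean(x + u)
        # 3) dual ascent on the consensus constraint x_i = z
        u = u + x - z
    return z
```

The local step has a closed form here because each $f_i$ is quadratic; in the paper's setting that step is replaced by a trajectory optimization.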
IV. Algorithm Description
IV-A Overall Algorithm
We now describe the distributed trajectory optimization algorithm that solves problem (10). After applying the initial conditions, our algorithm works iteratively based on distributed ADMM with three major steps. First, the original problem (10) is split into a separate optimization problem for each target, and these are solved in parallel. Based on the optimization results, the common global variables are updated for consensus. Then, the Lagrange multipliers are updated to push the local variables closer to the common global variables. The complete procedure is summarized in Algorithm 1. To solve the augmented Lagrangian, the split optimization problem for each target is:
$$\min_{s^{(i)},\, u^{(i)}} \; \sum_{t=1}^{T} \Big[ J^i_t + \sum_{j \in \mathcal{S}} u^{(i)j\top}_{t-1} R\, u^{(i)j}_{t-1} \Big] + \lambda_s^{i\top}\big(s^{(i)} - \bar{s}\big) + \frac{\rho}{2}\big\|s^{(i)} - \bar{s}\big\|_2^2 + \lambda_u^{i\top}\big(u^{(i)} - \bar{u}\big) + \frac{\rho}{2}\big\|u^{(i)} - \bar{u}\big\|_2^2 \qquad (15)$$
where $s^{(i)}$ and $u^{(i)}$ are the local variables for the states and control inputs of the mobile sensors, respectively. This is a trajectory optimization problem with two additional quadratic cost terms and can be solved with the existing trajectory optimization method described in Section IV-B. The split problems are independent and can therefore be solved in parallel, as described in Section III-B1. In this work, the trajectory variables in (15) are initialized by the method described in Section IV-B2, and they are averaged to initialize the global common variables $(\bar{s}, \bar{u})$. The updates of the common global variables are as follows:
$$\bar{s} = \frac{1}{N}\sum_{i=1}^{N}\Big(s^{(i)} + \frac{1}{\rho}\,\lambda_s^i\Big), \quad \bar{u} = \frac{1}{N}\sum_{i=1}^{N}\Big(u^{(i)} + \frac{1}{\rho}\,\lambda_u^i\Big) \qquad (16)$$
This is the mean consensus process, which averages the results of the per-target trajectory optimizations. The common global variables then affect the update of the trajectory variables for each target in (15), which drives the results of the per-target trajectory optimizations to converge to the same solution. The updates of the Lagrange multipliers for each target are as follows:
$$\lambda_s^i \leftarrow \lambda_s^i + \alpha\rho\,\big(s^{(i)} - \bar{s}\big), \quad \lambda_u^i \leftarrow \lambda_u^i + \alpha\rho\,\big(u^{(i)} - \bar{u}\big) \qquad (17)$$
The important role of the multipliers is making high-level decisions, i.e., task-assignment. They push local variables that deviate greatly from the global variables toward consensus by penalizing the deviation more strongly. By affecting the cost function in (15), the optimal trajectory solutions are guided into more informative regions. This process lets the task-assignment be accomplished automatically so that the mobile sensors can further reduce the uncertainty of the targets.
IV-B Trajectory Optimization
We adopt belief space iterative Linear Quadratic Gaussian (belief space iLQG) [41] to solve the trajectory optimization problems described in (15). Belief space iLQG extends the iLQG method [39] to Partially Observable Markov Decision Processes (POMDPs) with Gaussian beliefs, instead of fully observable states. In the forward pass, belief space iLQG uses a standard EKF to compute the belief at the next time step. In the backward pass, it linearizes the covariance in addition to quadratizing the states and control inputs. Although applications of this approach have focused on control problems, it is directly applicable to estimation problems (e.g., target tracking) due to the duality of control and estimation [40]. Belief space iLQG operates on the belief dynamics of the whole system of targets and sensors. The belief dynamics of the targets is given in (7). By integrating it with the dynamics of the mobile sensors, the belief dynamics of the entire system is represented by:
$$\mathbf{b}_{t+1} = \Phi(\mathbf{b}_t, u_t) + \mathbf{W}(\mathbf{b}_t, u_t)\,\xi_t, \quad \xi_t \sim \mathcal{N}(0, I) \qquad (18)$$
where $\mathbf{b}_t$ concatenates the target belief states $b^i_t$ in (7) with the sensor states $s^{1:M}_t$, whose dynamics (2) are deterministic.
IV-B1 Control Policy
Here, we explain the details of solving (15) with belief space iLQG. By linearizing the dynamics around the nominal trajectory distribution, the approximate dynamics is expressed as:
(19) 
where
$W^{(l)}$ denotes the $l$th column of the matrix $\mathbf{W}$ in (18); note that $\mathbf{W}$ has as many columns as the dimension of the state. We approximate the non-quadratic cost function in (15) by a quadratic one along the nominal belief and control trajectory $(\bar{\mathbf{b}}_t, \bar{u}_t)$. For notational simplicity, the cost function in (15) is denoted by $c_t(\mathbf{b}_t, u_t)$:
(20) 
where $\delta\mathbf{b}_t = \mathbf{b}_t - \bar{\mathbf{b}}_t$ and $\delta u_t = u_t - \bar{u}_t$ are the deviations from the nominal trajectory. The terms with subscripts denote the Jacobian and Hessian matrices of their respective functions.
Given the linear dynamics (19) and quadratic cost (20), we obtain a quadratic approximation of the value function along the nominal trajectory $(\bar{\mathbf{b}}_t, \bar{u}_t)$:
(21) 
Following the dynamic programming principle [16], the Bellman equation for the value function and the control policy in discrete time are specified as:
(22) 
where
By substituting (19) and (20) into (22), the Q-function is given by:
(23) 
where
(24) 
To find the optimal control policy, we compute the local variation in the control that minimizes the quadratic approximation of the Q-function in (22):
$$\delta u_t = -Q_{uu,t}^{-1}\big(Q_{u,t} + Q_{u\mathbf{b},t}\,\delta\mathbf{b}_t\big) \qquad (25)$$
The optimal control is then $u_t = \bar{u}_t + \delta u_t$. Substituting (25) into (22) gives the value function as a function of $\delta\mathbf{b}_t$ only, in the form of (21):
(26) 
This recursion then continues by computing the control policy for time step $t-1$.
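As a sanity check on the backward recursion, the following is a minimal sketch of the special case with deterministic linear dynamics and time-invariant quadratic cost (plain finite-horizon LQR), to which the structure of (21)-(26) reduces; the matrices A, B, C, D are illustrative assumptions, not the paper's belief-space quantities.

```python
import numpy as np

def lqr_backward(A, B, C, D, T):
    """Finite-horizon LQR backward pass for dynamics x' = A x + B u and
    stage cost x^T C x + u^T D u. Returns feedback gains L_t such that
    du = -L_t dx, mirroring the minimizing variation in (25) and the
    value-function (Riccati) recursion in (26)."""
    S = C.copy()                       # value-function Hessian at horizon
    gains = []
    for _ in range(T):
        Quu = D + B.T @ S @ B          # Q-function Hessians, as in (23)-(24)
        Qub = B.T @ S @ A
        L = np.linalg.solve(Quu, Qub)  # minimizing control variation
        S = C + A.T @ S @ (A - B @ L)  # Riccati recursion for V_t
        gains.append(L)
    return gains[::-1]                 # earliest time step first
```

In the belief-space version, the cost and dynamics matrices are time-varying linearizations around the nominal trajectory, and the noise term in (18) adds expectation terms; the recursion pattern is the same.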
IV-B2 Initial Guess Generation
For fast convergence to a better solution, good initial guesses are still required for the subproblems (15), although finding a good one is much easier than for the original problem (4). For the initialization, we derive an initial guess generator for the case of a fixed-wing UAV carrying a sensor whose boresight is perpendicular to the flight direction. The generator is based on Dubins paths [11], which can mimic the behavior of a fixed-wing UAV. We first generate a continuous path. Let $r^j$, $d^j$ and $\hat{p}^i$ be the minimum turning radius of the $j$th UAV, the distance between the nadir point and the center of the $j$th UAV's sensor footprint, and the expected position of target $i$, respectively. As shown in Fig. 1(a), a path is created to arrive as soon as possible at any point of the circle with radius $d^j$ centered on $\hat{p}^i$, with the sensor boresight toward the center. The remaining path is then set for the sensor to rotate along the circle. Here, $\hat{p}^i$ should be set to an appropriate value reflecting the movement of the target; it can be predicted using the current estimates and the dynamic model of the target. The generated path is then discretized at specific intervals and used as waypoints to generate a sequence of control inputs. For the sequence of desired waypoints, we construct a controller that reduces the angle between the line connecting two adjacent waypoints and the heading direction of the UAV. A damping effect is also added to the controller for stability. The geometry of this path-tracking problem is depicted in Fig. 1(b). In our simulation, the control input of the PD controller for the dynamics (30) is computed as follows:
$$u^j_t = k_p\,\theta_e + k_d\,\dot{\theta}_e \qquad (27)$$
where $\theta_e$ is the angle between the UAV heading and the line connecting the two adjacent waypoints, and $k_p$ and $k_d$ are the gains of the controller.
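A minimal sketch of such a PD waypoint-tracking law follows; the function name, gain values, saturation limit, and finite-difference derivative are illustrative assumptions rather than the paper's exact controller.

```python
import math

def pd_bank_command(heading, wp_from, wp_to, prev_err=0.0,
                    kp=1.0, kd=0.5, dt=0.1, max_bank=math.radians(35)):
    """PD waypoint-tracking law in the spirit of (27): the error is the
    angle between the UAV heading and the line joining two successive
    waypoints; the derivative term provides the damping effect.
    Returns a saturated bank-angle command and the current error."""
    desired = math.atan2(wp_to[1] - wp_from[1], wp_to[0] - wp_from[0])
    # Wrap the heading error to [-pi, pi]
    err = math.atan2(math.sin(desired - heading),
                     math.cos(desired - heading))
    cmd = kp * err + kd * (err - prev_err) / dt   # PD law
    return max(-max_bank, min(max_bank, cmd)), err
```

The saturation reflects the physical bank-angle limit of a fixed-wing platform; the returned error is fed back as `prev_err` on the next call.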
IV-C Modified Receding-Horizon Control for Real-Time Operation
For real-time operation, planning should be done while the UAVs are executing the previous plan; otherwise, they would stop and wait for the next plan at every planning cycle. Also, the established plan can be corrected to reflect information newly obtained during its execution. For these purposes, we propose a modified receding-horizon control (RHC) scheme that introduces the notion of a control horizon. RHC is a form of feedback control that executes only the forepart of a plan and restarts planning over a shifted time horizon with updated information [19]. Extending this concept, our algorithm plans over the next $T$ time steps while executing the previous plan for $T_c$ time steps, with predefined control horizon $T_c$ and planning horizon $T$. Note that $T_c$ is determined by the available computational resources and the communication delay, and $T$ should be large enough that the trajectories of the mobile sensors can cover the overall domain of interest. The modified RHC is described in Algorithm 2. Here, we explain a single cycle starting at time $t_0$. Precalculating the future plan for $t_0 + T_c$ to $t_0 + T_c + T$ requires the target belief at $t_0 + T_c$. We use the maximum-likelihood assumption and estimate the future belief without actual measurements (line 6). A new plan for $t_0 + T_c$ to $t_0 + T_c + T$ is created using the proposed distributed trajectory optimization algorithm (line 7). This planning is done in parallel with executing the previous plan. While the algorithm plans $T$ future time steps, only the first $T_c$ steps are actually executed (line 9). The process is then repeated for the next time window.
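The cycle just described can be sketched as a loop over time windows; the three callbacks are placeholders for the planner, the executor, and the maximum-likelihood belief predictor, and their names and signatures are assumptions for illustration.

```python
def modified_rhc(plan_fn, execute_fn, predict_fn, belief, T, Tc, cycles):
    """Sketch of the modified receding-horizon scheme (cf. Algorithm 2):
    while the first Tc steps of the current plan are executed, the next
    T-step plan is prepared from the belief predicted Tc steps ahead
    under the maximum-likelihood (no-measurement) assumption."""
    plan = plan_fn(belief, T)
    for _ in range(cycles):
        future_belief = predict_fn(belief, Tc)  # ML belief prediction
        next_plan = plan_fn(future_belief, T)   # planned "in parallel"
        belief = execute_fn(plan[:Tc], belief)  # execute only fore part
        plan = next_plan                        # shift the time window
    return belief
```

In a real system the planning call runs concurrently with execution; the sequential loop here only shows the data flow between the two.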
V. Complexity Reduction
V-A Complexity Analysis
The computational complexity of each iteration of the distributed ADMM is dominated by solving the subproblems defined in (15), since computing the global common variables and the Lagrange multipliers requires only very simple computations. The subproblems are solved by the belief space iLQG method described in Section IV-B, whose computational complexity in control problems is already well analyzed in [41]. The bottleneck of the running time in our problem lies in the calculation of the matrices in (23). We first consider the case of a single target. Let $n$ and $m$ be the state dimensions of the target and of the mobile sensors, respectively. As the target belief contains the covariance matrix of the state, the dimension of the belief is $n + n(n+1)/2$. Accordingly, the belief dimension of the entire system is $n_b = n + n(n+1)/2 + m$, and computing the matrix products in (24) takes $O(n_b^3)$ time. Evaluating the required derivatives using numerical differentiation (central differences) can be done in $O(n_b^4)$ time. The remaining elements do not form bottlenecks, as discussed in [41]. Then, since one cycle of value iteration takes $T$ steps, its complexity is $O(T n_b^4)$. The number of cycles cannot be expressed in dimensional notation, but convergence can be expected after 10-100 cycles in practice. Furthermore, belief space iLQG is known to converge at a second-order rate to a local optimum [41]. For multiple targets, our algorithm operates in a distributed manner: the complexity is multiplied by the number of iterations of distributed ADMM if the computation is fully parallelized. ADMM is known to require a few tens of iterations to converge with modest accuracy in many applications [3, 24, 26]. Note that when the original problem (4) is solved non-distributedly, the complexity grows with the number of targets $N$ even when the independence among the targets is exploited. This is more complex than our distributed approach when $N$ is large, and it additionally requires high-level decision-making, as mentioned in Section III-A.
There has been much research on fast convergence of general distributed ADMM [2, 10, 38]. In this paper, we propose an edge-cutting method that is tailored to our algorithm and effectively shortens its running time.
V-B Edge-Cutting Method
The computational load of our algorithm can be reduced effectively by considering the characteristics of our problem. As pointed out earlier, our distributed scheme automatically makes high-level decisions (i.e., task-assignment). A mobile sensor may choose a path specialized for certain targets during the consensus and disregard the others. In this case, the sensor cannot contribute to tracking the disregarded targets, and it is thus a poor investment of computational resources to solve the trajectory optimization of that sensor for those targets. Based on this observation, we propose a heuristic method to detect this inefficiency and halt the unprofitable investment. The edge-cutting method detects unavailing sensors for each target at every consensus step of ADMM. For a given threshold $\epsilon$, the method excludes sensor $j$ from the tracking task of target $i$ if
$$J^i_{-j} - J^i < \epsilon \qquad (28)$$
where $J^i_{-j}$ denotes the uncertainty of target $i$ when sensor $j$ is excluded. The threshold $\epsilon$ judges whether the sensor reduces the uncertainty of the target enough or not. The uncertainty is evaluated for the global trajectory obtained in the consensus phase (16), excluding sensor $j$. This pruned architecture is reset to the original one for each new time window. When $\epsilon$ is properly chosen (not too large), the method has empirically shown target estimation results similar to those obtained without it in most cases, with significantly reduced computation time. The method is quantitatively evaluated in Section VI-D. Our edge-cutting method is similar in concept to, but distinct from, the three-weight message-passing algorithm (TWA) in [2], which solves multi-agent trajectory optimization for energy minimization and collision-avoidance. As shown in Fig. 2, our algorithm can also be expressed with a bipartite graph, as in [2], based on the structure of ADMM. Each blue node on the left side represents the trajectory optimizer solving (15) for each target, and each red node on the right side is the consensus node solving (16) for each sensor. For the consensus node, TWA employs weighted averages and excludes an edge from the consensus process by applying a zero weight, without any change to the graph. In contrast, our method prunes an edge of the graph when (28) is satisfied. This effectively reduces the complexities of the EKF in Section II-B and the iLQG in Section IV-B by reducing the dimensions of the sensor states and the measurement vector. Note that our algorithm allows pruning an edge since the trajectory optimization for a target always produces feasible solutions when solved for subsets of sensors; this is not the case for the collision-avoidance optimizers in [2].
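The pruning test in (28) can be sketched as follows; `uncertainty_fn` is a placeholder for evaluating (9) along the consensus trajectory with a given subset of sensors, and the function names are illustrative assumptions.

```python
def cut_edges(uncertainty_fn, sensors, eps):
    """Sketch of the edge-cutting test in (28): sensor j is cut from a
    target's subproblem when removing it raises the target uncertainty
    by less than eps, i.e., the sensor barely contributes.
    uncertainty_fn(subset) evaluates the uncertainty using only the
    sensors in `subset` along the consensus trajectory."""
    full = uncertainty_fn(frozenset(sensors))
    kept = []
    for j in sensors:
        gain = uncertainty_fn(frozenset(sensors) - {j}) - full
        if gain >= eps:          # sensor j reduces uncertainty enough
            kept.append(j)
    return kept
```

This evaluation reuses the consensus-phase trajectory, so it needs no extra trajectory optimization, only re-running the uncertainty propagation with a reduced sensor set.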
VI. Simulations
In this section, we numerically investigate the computational properties and the applicability of the proposed algorithms. To ascertain the effectiveness of the distributed trajectory optimization algorithm, we compare it with existing approaches: non-myopic trajectory planning methods with heuristic task-assignment, as well as a myopic one. We also evaluate the modified RHC scheme and the edge-cutting method, which are proposed for effective real-time operation.
VI-A Simulation Model
Number of targets | 2 | 3 | 4
Myopic trajectory planning [30] | 314.74 | 262.45 | 318.38
Non-myopic trajectory planning with initial guess | 259.21 | 209.98 | 234.99
Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning | 171.94 | 179.64 | 182.58
Uncertainty-based task-assignment [14] + non-myopic trajectory planning | 167.63 | 172.67 | 173.88
Proposed distributed trajectory optimization algorithm | 124.52 | 150.76 | 149.11
We assume that the UAVs fly at different altitudes from each other to alleviate collision-avoidance constraints. Throughout the entire set of simulations, the planning horizon, $T$, is set to 100 seconds, which is enough for the UAVs to cover all areas of interest at their given flying speed. Furthermore, the penalty parameter $\rho$ in (15) and the parameters in (17) are fixed throughout. Setting the penalty parameters properly is important for fast convergence of ADMM. While they are chosen empirically in this work, many extensions of distributed ADMM have been explored to adaptively select the penalty parameters; we refer the readers to [3, 38]. For the target motion model, we consider motion in the XY-plane with a four-dimensional state vector of positions and velocities (in the x and y axes) for each target. The matrix parameters of the linear stochastic dynamics in (1) are set as:
$$A^i = \begin{bmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad Q^i = q \begin{bmatrix} \Delta t^3/3 & 0 & \Delta t^2/2 & 0 \\ 0 & \Delta t^3/3 & 0 & \Delta t^2/2 \\ \Delta t^2/2 & 0 & \Delta t & 0 \\ 0 & \Delta t^2/2 & 0 & \Delta t \end{bmatrix} \qquad (29)$$
where $\Delta t$ is the time interval between two successive measurements, and $q$ is the process noise intensity representing the strength of the deviations from the motion predicted by the dynamic model. When $q$ is small, this model represents a nearly constant velocity. For the simulations, $\Delta t$ and $q$ are fixed. For the sensor platforms, we consider a set of fixed-wing Unmanned Aerial Vehicles (UAVs). The dynamics of the UAVs is given by:
$$x^j_{t+1} = x^j_t + v^j \Delta t \cos\psi^j_t, \quad y^j_{t+1} = y^j_t + v^j \Delta t \sin\psi^j_t, \quad \psi^j_{t+1} = \psi^j_t + \frac{g\,\Delta t}{v^j}\tan\phi^j_t \qquad (30)$$
where $(x^j_t, y^j_t)$, $v^j$, $\psi^j_t$ and $\phi^j_t$ are the position, speed, heading angle and bank angle of the $j$th UAV at time $t$, respectively. Gravitational acceleration is denoted by $g$. The sensors are mounted on the left or right side of the UAVs and are aimed at a fixed angle downward, as shown in Fig. 3. Each sensor measures the signal-to-noise ratio (SNR) [37]:
(31) 
where the model parameters are selected to represent the SNR characteristics of the sensor, and $d$ and $\theta$ represent the distance between the sensor and the target and the angle between the sensor's boresight and the direction from the sensor to the target, respectively. When the angle-dependent parameter is set to zero, the sensor measures a quasi-distance, which is a commonly used model [9, 43, 15]. The parameter values are fixed throughout the simulations.
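The nearly-constant-velocity target model of (29) can be sketched as follows; the function name is illustrative, and the continuous white-noise-acceleration discretization is the standard form assumed here.

```python
import numpy as np

def ncv_model(dt, q):
    """Nearly-constant-velocity model on the XY-plane with state
    [x, y, vx, vy]: transition matrix A and process noise covariance Q
    for the linear dynamics (1), with sampling interval dt and process
    noise intensity q (continuous white-noise-acceleration form)."""
    A = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = q * np.array([[dt**3 / 3, 0, dt**2 / 2, 0],
                      [0, dt**3 / 3, 0, dt**2 / 2],
                      [dt**2 / 2, 0, dt, 0],
                      [0, dt**2 / 2, 0, dt]])
    return A, Q
```

With small `q`, sampled trajectories stay close to straight-line constant-velocity motion, matching the description above.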
VI-B Process of Convergence
First, we consider a toy example in which a single UAV tracks two targets with zero initial velocities, to examine the convergence process of the proposed distributed trajectory optimization algorithm. Figs. 4(a-d) present snapshots of the optimization process. At the start, the algorithm generates an initial guess (trajectory) of the UAV for each target independently, without any high-level decision to consider both targets simultaneously. Note that in the early phase the consensus trajectory deviates largely from the distributively optimized trajectory for each target. As the number of iterations increases, the deviation shrinks and eventually vanishes. This convergence is possible because each subproblem reflects the result of the other subproblem from the previous iteration through the Lagrange multipliers in (15). By repeating this interaction, the algorithm produces a non-myopic trajectory plan that tracks the two targets sequentially, as shown in Fig. 4(e). Beyond observing convergence, we evaluate how well the converged trajectory reduces the uncertainty of each target. As components of the posterior uncertainty after executing the plan, the variances of the target position estimates are logged over time, as shown in Fig. 4(f). The variance of each target is effectively reduced while the target is within sensing range, whereas it grows gradually, due to the stochastic motion of the target, while the target is in a sensing hole.
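The consensus mechanism described above can be sketched as a minimal consensus-ADMM loop. The scalar "trajectories" and quadratic per-target costs below are toy stand-ins for the paper's actual subproblems, chosen only to show how the local/consensus deviation vanishes through the dual variables.

```python
# Consensus-ADMM sketch: each target j keeps its own copy x[j] of the
# sensor trajectory (a scalar here for brevity), the consensus variable z
# couples them, and the scaled Lagrange multipliers y[j] carry each
# subproblem's result to the other across iterations, as in (15).
a = [0.0, 10.0]        # per-target "ideal" trajectories (toy costs (x - a_j)^2)
rho = 1.0              # ADMM penalty parameter
x = [0.0, 0.0]         # local copies
y = [0.0, 0.0]         # scaled dual variables
z = 0.0                # consensus trajectory

deviation = []
for _ in range(60):
    # Local step: argmin_x (x - a_j)^2 + (rho/2) * (x - z + y_j)^2
    x = [(2*a[j] + rho*(z - y[j])) / (2 + rho) for j in range(2)]
    z = sum(x[j] + y[j] for j in range(2)) / 2    # consensus step
    y = [y[j] + x[j] - z for j in range(2)]       # dual update
    deviation.append(max(abs(x[j] - z) for j in range(2)))

# The local/consensus deviation shrinks geometrically, and z settles at
# the jointly optimal compromise (here the midpoint, 5.0).
```

The early iterations mirror Figs. 4(a-d): the local copies sit near their individually optimal values while z is far from both, and the dual updates progressively pull everything into agreement.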
TABLE II
Method | Uncertainty | Posterior Uncertainty | Localization RMSE [m]
Myopic trajectory planning [30] | 252.15 | 252.47 | 8.4400
Non-myopic trajectory planning with initial guess | 204.01 | 204.23 | 7.1812
Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning | 194.56 | 194.32 | 6.8059
Uncertainty-based task-assignment [14] + non-myopic trajectory planning | 169.23 | 169.50 | 6.7532
Proposed distributed trajectory optimization algorithm | 147.39 | 146.95 | 6.5595
TABLE III (average uncertainty)
Method | 10 targets | 15 targets
Target clustering & Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning | 557.01 | 498.53
Target clustering & uncertainty-based task-assignment [14] + non-myopic trajectory planning | 566.55 | 501.75
Proposed distributed trajectory optimization algorithm | 389.59 | 377.06
VI-C Single UAV Tracking a Few Targets
In the second example, we compare the proposed algorithm with existing approaches in scenarios where one UAV tracks a few targets with zero initial velocities. The initial positions of the UAV and the targets on the XY-plane are generated uniformly at random within a square area. The targets are on ground level and the UAV flies at a fixed altitude. The initial covariance matrices of the target estimates are also randomly generated as diagonal matrices whose diagonal entries are picked uniformly between 10 and 100. Since the existing works [14, 25] on multi-target tracking are myopic approaches, we extend them to non-myopic ones: we adopt their task-assignment algorithms to produce initial guesses for the original undistributed problem (4) and solve it through belief-space iLQG, as in our algorithm. These algorithms assign clustered targets to the closest mobile sensor (with respect to the center of each cluster) and determine the priorities of the clusters when multiple clusters are assigned to the same sensor. In [14], the priority is determined by the sum of the current uncertainties of the targets in each cluster, while [25] uses the Euclidean distance of each cluster from its assigned mobile sensor and the number of targets in the cluster. Initial guesses are generated in a similar way as described in Section IV-B2, shown as the dotted lines in Figs. 5(c-d), so that the UAV visits the clusters (a single target per cluster in this case) sequentially based on the task-assignment results. Additionally, we tested two more undistributed algorithms: first, applying a myopic algorithm [30] to (4) repeatedly, covering the planning horizon with single-step plans; second, solving (4) with a naive initial guess.
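The random test instances described above can be sketched as follows. The side length of the area is illustrative (the paper's value was elided), and the function and field names are hypothetical; only the uniform sampling of positions and the [10, 100] diagonal covariances come from the setup described in the text.

```python
import random

def random_topology(n_targets, side=1000.0, seed=None):
    """One randomly generated test instance, as in the comparison setup:
    uniform positions in a square area and diagonal initial covariance
    matrices with diagonal entries drawn uniformly from [10, 100].
    `side` is an assumed area size, not the paper's.
    """
    rng = random.Random(seed)
    uav = (rng.uniform(0, side), rng.uniform(0, side))
    targets = []
    for _ in range(n_targets):
        pos = (rng.uniform(0, side), rng.uniform(0, side))
        # Diagonal of the 4x4 initial covariance (positions + velocities).
        cov_diag = [rng.uniform(10, 100) for _ in range(4)]
        targets.append({"pos": pos, "cov_diag": cov_diag})
    return uav, targets

uav, targets = random_topology(3, seed=0)
```

Averaging each algorithm's uncertainty over many such instances (100 topologies per target count in the paper) is what makes the comparison in Table I meaningful rather than topology-specific.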


TABLE IV
Method | Uncertainty | Posterior Uncertainty | Localization RMSE [m]
Clustering-based heuristic task-assignment [25, 14] + non-myopic trajectory planning | 584.72 | 576.08 | 15.8290
Proposed distributed trajectory optimization algorithm | 349.22 | 357.95 | 13.3126
Table I summarizes the average result over the 100 randomly generated topologies for each number of targets. To evaluate the planned trajectory, we compare the uncertainty of the target estimates, which is the primary objective minimized in our problem, averaged over targets, timesteps, and topologies. Since the uncertainty corresponds to the average variance of the target estimate, the values can be compared approximately on a quadratic scale. Our algorithm shows the best performance among the tested algorithms. To analyze the details, we investigate a single instance of the randomly generated topologies in depth. Fig. 5 shows the result of each algorithm in a scenario of a single UAV tracking three targets. As shown in Fig. 5(a), the myopic algorithm plans a trajectory that minimizes only the control effort until a target enters the sensing range of the UAV, since the UAV cannot reduce the uncertainty without observing the targets. In Fig. 5(b), the resulting trajectory tracks only a single target, showing that non-myopic trajectory optimization without proper decision making can get stuck in a bad local optimum. The two-phase approaches in Figs. 5(c-d) produce better trajectories that sense more targets; however, these trajectories include a larger portion of idle time without observing a target than the proposed algorithm in Fig. 5(e). To compare the results numerically, we examine the average performance over 1000 randomly generated instances on the topology of Fig. 5. As shown in Table II, we evaluate the posterior uncertainty and the root mean square error (RMSE) of target localization in addition to the uncertainty. Even though the topology is simple enough for intuitive decision making, the proposed algorithm still outperforms the others.
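To make the "quadratic scale" remark concrete: the uncertainty metric is an average variance, so its square root is the quantity commensurate with distance-valued errors such as the localization RMSE. A one-line sketch:

```python
import math

def rmse_scale(avg_variance):
    """Map an average-variance uncertainty value onto a distance scale,
    so it can be compared (roughly) with RMSE-like quantities in meters."""
    return math.sqrt(avg_variance)

# E.g., comparing two uncertainty values from Table II on the distance
# scale: the gap 252.15 vs 147.39 in variance corresponds to a milder
# gap of roughly 15.9 vs 12.1 in distance units.
gap = rmse_scale(252.15) - rmse_scale(147.39)
```

This is why the variance-scale gaps between methods look larger than the corresponding RMSE gaps in the tables.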
While the two-phase approaches define approximate/heuristic cost functions for task-assignment, and their decision making considers only the current status of the sensor network, the proposed algorithm incorporates decision making into the optimization process itself, accounting for the mobility of the sensor platforms and the target states through the distributed formulation.
VI-D Multiple UAVs Tracking Numerous Targets
The third example considers a situation in which two UAVs track numerous targets with zero initial velocities. To compare the proposed algorithm with the two-phase approaches [14, 25] involving target clustering, the topologies of the targets and UAVs are randomly generated with a three-cluster structure: the cluster centers, as well as the number and positions of the targets in each cluster, are chosen randomly. For the clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN) [13] is used, with the minimum number of targets per cluster set to 2 and the maximum cluster radius set to 150. Table III summarizes the results. While the two-phase approaches show similar average uncertainty values, the proposed algorithm outperforms both. We further investigate one instance of the randomly generated topologies to analyze the performance gap in detail. Fig. 6 shows the resulting trajectories of the two-phase approaches and the proposed algorithm in a scenario with 10 targets. The Euclidean heuristic-based and uncertainty-based task-assignments provide the same initial guesses, where each cluster is assigned to a single UAV. The proposed algorithm, however, automatically generates trajectories in which the two UAVs visit the same cluster. Table IV shows that these trajectories outperform those of the two-phase approaches. Even in this well-clustered topology, the proposed algorithm plans trajectories better optimized for the mobile sensors to visit the target clusters, without the aid of clustering and task-assignment algorithms.
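DBSCAN with these parameters (minimum 2 points per cluster, neighborhood radius 150) can be sketched in a few dozen stdlib-only lines; this is a minimal illustrative implementation of the classic algorithm, not the paper's code, and `math.dist` requires Python 3.8+.

```python
import math

def dbscan(points, eps=150.0, min_pts=2):
    """Minimal DBSCAN sketch with the parameters used in this experiment.
    Returns one label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n

    def neighbors(i):
        # eps-neighborhood, including the point itself (standard DBSCAN).
        return [j for j in range(n)
                if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1            # noise (a cluster may reclaim it later)
            continue
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point: label, don't expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
        cluster += 1
    return labels

# Toy target positions: two tight pairs and one isolated target.
pts = [(0, 0), (50, 0), (400, 400), (450, 420), (1000, 0)]
labels = dbscan(pts)
```

In the two-phase baselines, each resulting cluster is then assigned to a UAV as a unit, which is exactly the coupling that the proposed distributed formulation avoids.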






VI-E Evaluation of Edge-Cutting Method
We evaluate the edge-cutting method for the scenario in Fig. 6(b). Varying the threshold $\epsilon$, the time at which each edge is cut is analyzed in Figs. 7(a-c). As expected, a larger $\epsilon$ implies an earlier cutting time, since it acts as a weaker threshold. However, as shown in Fig. 7(d), too large a value makes the trajectory optimization converge to a bad local optimum with a larger cost, because edges that are cut too easily alter the original problem beyond the degree of proper approximation. When $\epsilon$ is set appropriately (not too large), the edge-cutting method produces the same result as when it is not applied, while effectively reducing the dimensions of the optimization. In Figs. 7(e-f), the actual computation time is compared between the proposed algorithm with and without the edge-cutting method. The comparison is performed on a 3.4 GHz Quad-Core Intel(TM) i7 PC. The results show that the edge-cutting method reduces the computation time effectively by rapidly reaching a solution with lower cost. The message-passing algorithm TWA [2] is also tested to verify the effectiveness of the proposed method on our problem; TWA reduces the computation time compared with not applying it, but by a much smaller amount than the proposed method. Comparing Figs. 7(e) and (f) also verifies that parallel computing further reduces the computation time, since our ADMM-based algorithm is amenable to parallelization.
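The thresholding role of $\epsilon$ can be sketched schematically. The paper's actual cutting criterion is defined in its method section and is not reproduced here; the `scores` below are an invented stand-in for whatever per-edge quantity is compared against $\epsilon$, and the sketch only illustrates why a larger threshold cuts edges earlier and shrinks the problem.

```python
def maybe_cut_edges(scores, active, eps):
    """Drop edges whose coupling score has fallen below the threshold eps.

    scores: dict mapping edge (subproblem) id -> current score
    active: set of edge ids still included in the ADMM iteration
    A larger eps is a weaker threshold: edges are cut sooner, removing
    their variables from subsequent iterations, at the risk of
    over-approximating the original problem.
    """
    for j in list(active):
        if scores[j] < eps:
            active.discard(j)
    return active

active = {0, 1, 2}
active = maybe_cut_edges({0: 0.5, 1: 0.01, 2: 0.2}, active, eps=0.05)
# Edge 1 is cut; only subproblems 0 and 2 remain in the iteration.
```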






VI-F Real-Time Operation Test
In the last example, shown in Fig. 8, we consider a scenario in which two UAVs track three targets, one with an initial velocity of 3 m/s in the x-axis direction and the others with zero initial velocities. We apply the distributed trajectory optimization algorithm, the edge-cutting method, and the modified RHC scheme together. It is assumed that a particle filter is used to estimate the states of the targets in the actual operating environment of the UAVs. The control horizon is set to 25 seconds, considering the computation time of the algorithms. It is assumed that the sensors are attached to the left side of one UAV and the right side of the other, and that the UAVs fly at two different altitudes. Figs. 8(a-d) show some of the simulated results under the assumption that the UAVs operate for 400 seconds. A notable point in this example is that one UAV automatically hands the tracking mission over to the other UAV after about 160 seconds. Figs. 8(e-f) show the state estimation results for the targets, confirming that all targets are tracked reliably through the modified RHC scheme.
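The receding-horizon operation used in this test can be sketched as a loop: plan over the full planning horizon (100 s here), execute only the control horizon (25 s), re-estimate the target states, and replan from the new belief. The `plan`, `execute`, and `estimate` callables below are hypothetical stubs standing in for the trajectory optimizer, the simulator/vehicle, and the particle filter.

```python
def rhc_loop(plan, execute, estimate, total_time, horizon, control_horizon):
    """Schematic receding-horizon control loop with replanning.

    plan(belief, horizon)            -> trajectory over the full horizon
    execute(trajectory, ctrl_horizon)-> measurements gathered while flying
                                        only the first ctrl_horizon seconds
    estimate(measurements)           -> updated target belief (e.g. from a
                                        particle filter)
    """
    t, log = 0.0, []
    belief = estimate(None)                 # initial target belief
    while t < total_time:
        trajectory = plan(belief, horizon)  # non-myopic plan over `horizon`
        measurements = execute(trajectory, control_horizon)
        belief = estimate(measurements)     # re-estimate before replanning
        t += control_horizon
        log.append(t)
    return log

# With 400 s of operation and a 25 s control horizon, the plan is
# recomputed 16 times.
log = rhc_loop(lambda b, h: [], lambda tr, ch: [], lambda m: {},
               total_time=400, horizon=100, control_horizon=25)
```

Because only the first 25 s of each 100 s plan is ever executed, the long horizon serves purely to keep the executed segment non-myopic, which is what enables the automatic mission hand-over observed around 160 s.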
VII Conclusions
In this paper, we investigated a distributed optimization approach to trajectory planning for the multi-target tracking problem. In order to solve the trajectory optimization and task-assignment problems simultaneously, we formulated a distributed trajectory optimization problem and solved it by integrating a variant of differential dynamic programming, the iterative Linear-Quadratic-Gaussian (iLQG) algorithm, with the distributed Alternating Direction Method of Multipliers (ADMM). In addition, we proposed an edge-cutting method to reduce the computation time of the algorithm and a modified RHC scheme for real-time operation. Numerical experiments were conducted, and the results demonstrate the applicability and validity of the proposed approach.
References
 [1] (2011) Tracking and data fusion. YBS Publishing. Cited by: §II-B.
 [2] (2013) A message-passing algorithm for multi-agent trajectory planning. In Advances in Neural Information Processing Systems, pp. 521–529. Cited by: §I, §V-A, §V-B, §VI-E.
 [3] (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning 3 (1), pp. 1–122. Cited by: §I, §III-B1, §III-B, §V-A, §VI-A.
 [4] (2003) Bayesian filtering: from Kalman filters to particle filters, and beyond. Statistics 182 (1), pp. 1–69. Cited by: §II-B.
 [5] (2018) UAV trajectory optimization for data offloading at the edge of multiple cells. IEEE Transactions on Vehicular Technology 67 (7), pp. 6732–6736. Cited by: §I.
 [6] (2006) Non-myopic sensor scheduling and its efficient implementation for target tracking applications. EURASIP Journal on Applied Signal Processing 2006, pp. 9–9. Cited by: §I, §II-A.
 [7] (2009) Consensus-based decentralized auctions for robust task allocation. IEEE Transactions on Robotics 25 (4), pp. 912–926. Cited by: §III-A.
 [8] (2013) An outer-approximation approach for information-maximizing sensor selection. Optimization Letters 7 (4), pp. 745–764. Cited by: §I.
 [9] (2009) On the roles of smoothing in planning of informative paths. In American Control Conference, 2009 (ACC '09), pp. 2154–2159. Cited by: §VI-A.
 [10] (2013) An improved three-weight message-passing algorithm. arXiv preprint arXiv:1305.1961. Cited by: §I, §V-A.
 [11] (1957) On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of Mathematics 79 (3), pp. 497–516. Cited by: §IV-B2.
 [12] (2012) Robots for environmental monitoring: significant advancements and applications. IEEE Robotics & Automation Magazine 19 (1), pp. 24–39. Cited by: §I.
 [13] (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, Vol. 96, pp. 226–231. Cited by: §VI-D.
 [14] (2017) A scalable multi-target tracking system for cooperative unmanned aerial vehicles. IEEE Transactions on Aerospace and Electronic Systems 53 (4), pp. 1947–1961. Cited by: §I, §II-C, §III-A, §VI-C, §VI-D, TABLE I, TABLE II, TABLE III, TABLE IV.
 [15] (2010) Mobile sensor network control using mutual information methods and particle filters. IEEE Transactions on Automatic Control 55 (1), pp. 32–47. Cited by: §I, §VI-A.
 [16] (1970) Differential dynamic programming. Cited by: §IV-B1.
 [17] (2018) Online path planning of autonomous UAVs for bearing-only standoff multi-target following in threat environment. IEEE Access 6, pp. 22531–22544. Cited by: §I.
 [18] (2011) Sampling-based algorithms for optimal motion planning. The International Journal of Robotics Research 30 (7), pp. 846–894. Cited by: §I.
 [19] (2004) Three-dimensional receding horizon control for UAVs. In AIAA Guidance, Navigation, and Control Conference and Exhibit, pp. 5144. Cited by: §IV-C.
 [20] (2018) Potential game-based non-myopic sensor network planning for multi-target tracking. IEEE Access 6, pp. 79245–79257. Cited by: §I, §II-A.
 [21] (2018) Efficient sensor network planning based on approximate potential game. International Journal of Distributed Sensor Networks 14 (6), pp. 1–17. Cited by: §I.
 [22] (2010) Information-rich path planning with general constraints using rapidly-exploring random trees. Cited by: §I, §II-C.
 [23] (2003) Multi-step information-directed sensor querying in distributed sensor networks. In 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Vol. 5, pp. V–145. Cited by: §I.
 [24] (2014) Optimal periodic sensor scheduling in networks of dynamical systems. IEEE Transactions on Signal Processing 62 (12), pp. 3055–3068. Cited by: §I, §V-A.
 [25] (2011) Information-rich task allocation and motion planning for heterogeneous sensor platforms. In Infotech@Aerospace 2011, pp. 1588. Cited by: §I, §III-A, §VI-C, §VI-D, TABLE I, TABLE II, TABLE III, TABLE IV.
 [26] (2012) Sparsity-promoting extended Kalman filtering for target tracking in wireless sensor networks. IEEE Signal Processing Letters 19 (12), pp. 845–848. Cited by: §V-A.
 [27] (2009) A POMDP framework for coordinated guidance of autonomous UAVs for multi-target tracking. EURASIP Journal on Advances in Signal Processing 2009, pp. 2. Cited by: §I.
 [28] (2014) Combining the benefits of function approximation and trajectory optimization. In Robotics: Science and Systems. Cited by: §I.
 [29] (2013) D-ADMM: a communication-efficient distributed algorithm for separable optimization. IEEE Transactions on Signal Processing 61 (10), pp. 2718–2723. Cited by: §I, §III-B1.
 [30] (2015) Coordinated standoff tracking of moving target groups using multiple UAVs. IEEE Transactions on Aerospace and Electronic Systems 51 (2), pp. 1501–1514. Cited by: §I, §III-A, §VI-C, TABLE I, TABLE II.
 [31] (2015) Cooperative collision avoidance via proximal message passing. In 2015 American Control Conference (ACC), pp. 4124–4130. Cited by: §I.
 [32] (1999) Optimization of observer trajectories for bearings-only target localization. IEEE Transactions on Aerospace and Electronic Systems 35 (3), pp. 892–902. Cited by: §I.
 [33] (2008) Trajectory optimization for target localization using small unmanned aerial vehicles. Ph.D. Thesis, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics. Cited by: §I, §II-A, §II-C.
 [34] (2013) UAV path planning in a dynamic environment via partially observable Markov decision process. IEEE Transactions on Aerospace and Electronic Systems 49 (4), pp. 2397–2412. Cited by: §I, §II-A, §II-C.
 [35] (1991) TSPLIB—a traveling salesman problem library. ORSA Journal on Computing 3 (4), pp. 376–384. Cited by: §III-A.
 [36] (2007) UAV path and sensor planning methods for multiple ground target search and tracking: a literature survey. Linköping University Electronic Press. Cited by: §I, §II-B.
 [37] (1970) Radar handbook. Cited by: §VI-A.
 [38] (2016) Fast ADMM algorithm for distributed optimization with adaptive penalty. In AAAI, pp. 753–759. Cited by: §V-A, §VI-A.
 [39] (2005) A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005 (Proceedings of the 2005), pp. 300–306. Cited by: §IV-B.
 [40] (2008) General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control, pp. 4286–4292. Cited by: §IV-B.
 [41] (2012) Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research 31 (11), pp. 1263–1278. Cited by: §II-B, §IV-B, §V-A.
 [42] (2012) Distributed alternating direction method of multipliers. Cited by: §III-B1, §III-B.
 [43] (2007) Approximate dynamic programming for communication-constrained sensor network management. IEEE Transactions on Signal Processing 55 (8), pp. 4300–4311. Cited by: §I, §VI-A.
 [44] (2018) Optimal UAV path planning: sensing data acquisition over IoT sensor networks using multi-objective bio-inspired algorithms. IEEE Access 6, pp. 13671–13684. Cited by: §I.
 [45] (2014) Asynchronous distributed ADMM for consensus optimization. In International Conference on Machine Learning, pp. 1701–1709. Cited by: §I, §III-B1.