A Distributed ADMM Approach to Non-Myopic Path Planning for Multi-Target Tracking

A Distributed ADMM Approach to Non-Myopic Path Planning for Multi-Target Tracking

Soon-Seo Park\authorrefmark1, Youngjae Min\authorrefmark1, Jung-Su Ha, Doo-Hyun Cho and Han-Lim Choi\authorrefmark2 Soon-Seo Park and Han-Lim Choi are with the Department of Aerospace Engineering & KI for Robotics, KAIST. Youngjae Min is with School of Electrical Engineering, KAIST. Jung-Su Ha is with Machine Learning and Robotics Lab, University of Stuttgart and Max Planck Institute for Intelligent Systems. Doo-Hyun Cho is with Mechatronics R&D center, Samsung Electronics.\authorrefmark1Equal contribution. \authorrefmark2Corresponding author: hanlimc@kaist.ac.kr
Abstract

This paper investigates non-myopic path planning of mobile sensors for multi-target tracking. Such problem has posed a high computational complexity issue and/or the necessity of high-level decision making. Existing works tackle these issues by heuristically assigning targets to each sensing agent and solving the split problem for each agent. However, such heuristic methods reduce the target estimation performance in the absence of considering the changes of target state estimation along time. In this work, we detour the task-assignment problem by reformulating the general non-myopic planning problem to a distributed optimization problem with respect to targets. By combining alternating direction method of multipliers (ADMM) and local trajectory optimization method, we solve the problem and induce consensus (i.e., high-level decisions) automatically among the targets. In addition, we propose a modified receding-horizon control (RHC) scheme and edge-cutting method for efficient real-time operation. The proposed algorithm is validated through simulations in various scenarios.

I Introduction

Technological advances in sensor networks have enabled many applications such as monitoring pollution/atmospheric phenomena, reconnaissance and surveillance missions in both defense and civilian areas, traffic monitoring, forest fire localization and wildlife tracking [12, 44]. These sensor networks are based on two types of sensors: static and mobile sensors. It is widely demonstrated that mobile sensors offer distinctive advantages over static ones in terms of the quality of sensing and estimation [6, 15, 33, 27, 32, 30, 14, 36, 17], coverage area, data offloading [5] and adaptability to dynamic environments [34]. However, for mobile sensor networks, it is crucial to efficiently operate sensing agents to maximize information gain about target systems while satisfying constraints on entire systems. This problem is described as a sensor planning problem that determines the future utilization of sensing agents given the current states of resources and environments. One typical approach for the sensor planning problem is making myopic plans covering one or few time steps to reduce the complexity of the problem. For instance, Hoffmann and Tomlin [15] proposed a single step mobile sensor planning algorithm for single target tracking based on an information-theoretic metric. They plan the one step ahead control inputs of the sensing agents to maximize the mutual information between the sensor measurements and the estimated target state. In multi-target tracking tasks, most existing works adopt the two-phase approach in which they cluster and assign targets to each sensing agent and, then, solve the split problem. [30, 14]. However, such myopic planning algorithms are vulnerable to bad local optima and show poor performances for sensor networks with sensing holes [23, 20]. Their optimization over the near future cannot reflect long-term goals such as taking targets out of sensing holes by changing the positions and directions of sensors. Thus, we focus on non-myopic planning algorithms which explore multiple time steps. There are several existing approaches to non-myopic sensor planning problems. One natural way is to extend sensor scheduling methods [43, 8, 21] for static sensor networks. As the methods select operating sensors among finite candidates, control inputs of the mobile sensors are discretized and the same optimization methods are applied for the finite optimization variables [6, 20]. However, this approach suffers from a critical computational issue that the time complexity grows exponentially as the planning horizon increases. Another approach is the sampling-based planning. Information-rich Rapidly-exploring Random Tree (IRRT) algorithm [22] connects random sample points on the sensing space so that the sensor measurements along the connected paths maximize the information on a single target. Luders et al. [25] apply this IRRT algorithm in a two-phase approach so that it tracks multiple targets. However, these sampling-based works do not guarantee any optimality in their solutions [18]. A better candidate for non-myopic planning of mobile sensors is local trajectory optimization method. It searches for a solution to the optimization problem near the given initial guess using gradient descent algorithms. This gradient-based trajectory optimization has been successfully applied in many single-target tracking applications by employing analytically differentiable filter models on target estimation [33, 27, 32, 34, 36]. It shows much faster computation time than the discrete optimization methods and guarantees local optimality unlike the sampling-based methods. However, in multi-target tracking problems, local trajectory optimization methods can be easily trapped in a bad local optimum unless a proper initial guess is given. In this paper, we investigate non-myopic path planning of mobile sensors for multi-target tracking problems. There is a key challenge to employ local trajectory optimization method in multi-target tracking problems; to make a good initial guess, we need a high-level decision-making process that determines which mobile sensors should track which targets and in what order the targets should be tracked. This problem is also referred to as the task-assignment problem. Unfortunately, the problem is hard to be formulated to reflect the mobility of the sensors and expected changes in the states of the sensors and the targets. Instead, existing works, while they are myopic, with two-phase approach use heuristics for the formulation [30, 14], which degrade the quality of initial guess and, thus, the target estimation performance. Therefore, the objective of this study is to solve the non-myopic path planning problem for multi-target tracking without heuristic task-assignment. The main contribution of this paper is to provide a new practical algorithm without heuristic task-assignment on the non-myopic path planning problem for multi-target tracking. More specifically, we

  1. reformulate the general non-myopic path planning problem for multi-target tracking to a distributed optimization problem with respect to targets,

  2. solve the transformed problem by combining ADMM and local trajectory optimization method, and

  3. propose a modified receding-horizon control (RHC) scheme and edge-cutting method for efficient real-time operation

Our algorithm circumvents the task-assignment problem by solving subproblems for each target, which have only single task, and inducing consensus among those subproblems automatically. Consequently, our distributed framework integrates the decision-making process into the trajectory optimization process so that they are solved simultaneously without the aid of heuristic task-assignment algorithms. The alternating direction method of multipliers (ADMM) [3, 29, 45] has been also effectively applied to other problems in robotic control and sensor scheduling [2, 31, 10, 28, 24]. For instance, Bento et al. [2] adopted ADMM to multi-agent trajectory planning for collision-avoidance. ADMM is used to incorporate various constraints, such as energy minimization, with the planning objective. Ong et al. [31] also developed a message-passing algorithm based on ADMM to solve a collision-avoidance problem with communication constraints. Mordatch et al. [28] combined trajectory optimization and global policy (neural network) learning using ADMM in a robotic control problem. Liu et al.[24] used ADMM to solve an optimization problem with cardinality functions in scheduling static sensor networks for state estimation of dynamical systems. Like these works, we formulate a problem on which ADMM can work effectively and successfully solve the problem. The rest of this paper is organized as follows. In Section II, we define the general non-myopic path planning problem for multi-target tracking. Then, we reformulate the original problem as a distributed trajectory optimization problem in Section III. Section IV presents the details of our algorithm including the modified RHC scheme. In Section V, computational complexity of the proposed algorithm is analyzed and the edge-cutting method is introduced to reduce the complexity. Finally, the simulation results of the proposed algorithm are given in Section VI.

Ii Problem Formulation

Ii-a Non-Myopic Path planning for multi-target tracking

We consider the problem of non-myopically planning multiple mobile sensors to efficiently track multiple targets. We formulate this problem based on the known dynamics of targets and sensors, as well as the sensing model. From these models, we plan an optimal path that each sensor takes successive measurements to estimate target states efficiently. In this work, targets are assumed to follow linear stochastic dynamics which is a common assumption in the literature [20, 34, 6, 33]. Let the index set of targets be , where the number of targets, , is known and fixed. The -th () target’s state, , is updated as:

(1)

where is the Gaussian random noise independent of other targets and measurements. and are the transition matrices and noise covariance matrices of the -th target, respectively. For sensor dynamics, we employ a general model, , for each sensor. Let the index set of mobile sensors be . The dynamics for the -th () sensor’s state, , is expressed as:

(2)

where is the control input at time . Here, each mobile sensor is assumed to be carried by one fixed-wing UAV for simplicity. 111 In this paper, we use the terms ‘mobile sensor’ and ‘UAV’ interchangeably. For simplicity, we assume that each UAV has one sensor on it. In general, some platforms may carry more than one sensor, and the sensors may be heterogeneous. In such cases, the sensors on the same vehicle belong to the same dynamics. Denoting the measurement taken by the -th UAV against the target at time as , the sensor measurement model can be expressed as follows:

(3)

where is the Gaussian random noise, independent of other measurements and targets’ motion noise, . These models are incorporated to form an objective function, , which indicates the uncertainty of the target state estimation. The goal of the general target tracking problem is to estimate target states accurately, which corresponds to minimizing the uncertainty. Since we are solving a non-myopic path planning problem, the optimization variables are a sequence of states/control inputs of mobile sensors over certain future time steps, . Thus, in general, non-myopic path planning for multi-target tracking can be formulated as [20, 34, 6]:

(4)

where

Here, the optimal trajectory, , satisfies the dynamics constraint and the given initial condition. is the uncertainty of the -th target at time , penalizes the control effort of the mobile sensors along the trajectory, and is an additional constraint, such as collision-avoidance, related to the states of the mobile sensors. is a weighting matrix.

Ii-B Target State Estimation

The target states are estimated through the extended Kalman filter (EKF), which is widely used for state estimation of non-linear systems [4]. Note that any other non-linear Gaussian filters, such as the unscented Kalman filter, can be incorporated instead. The targets are assumed to have independent motions and be measured independently via data association methods[1]. Then, their joint distribution could be factorized and a single tracking filter is used for each target [36]. For each target, its belief states, , are updated by (for simplicity, the superscript is temporarily dropped):

(5)
(6)

where

h

Equations (5) and (6) define the belief dynamics. The second term in (5), called the innovation term, depends on the measurements . Equation (6) evolves given the current covariance of the target, regardless of the measurement. Our optimization problem in (4) evaluates the target state estimations in future time steps. Since future measurements are unknown in advance, we treat the innovation term in (5) as stochastic, which follows [41]. Defining the belief state of the -th target as ( is a vector with upper triangular portion of the input matrix since is symmetric), the stochastic belief dynamics of the target is given by:

(7)

where

(8)

Ii-C Uncertainty Measure

The uncertainty term in (4) represents the inaccuracy of target state estimation. For this measure, we use

(9)

The trace of the covariance matrix of target state estimation physically means the average variance of the estimation error and is one of the widely used metrics [22, 33, 14, 34]. This measure is in a quadratic scale for physical distance error. The expectation is taken since we are employing stochastic belief dynamics as in (7). Note that any other differentiable metric can be incorporated instead.

Iii Transformation to Distributed Trajectory Optimization Problem

Iii-a Challenge of Non-Myopic Problem

The non-myopic path planning problem in (4) generally faces an abundance of local optima due to the non-convexity of its cost function. This non-convexity primarily comes from the dependency of optimal sensor trajectories on high-level decision-making. They significantly vary upon which target is assigned to which sensor and the order in which each sensor tracks the assigned targets. Since most trajectory optimization methods find solutions around an initial guess, our problem with numerous local optima is very sensitive to initialization. Therefore, we need to guide a solution by properly initializing the optimization variables. Finding a good initial guess for the problem induces another hard problem, task-assignment. Unfortunately, it is very difficult to construct the task-assignment problem in our setting to reflect the mobility of the sensors and expected changes in the states of the sensors and the targets. One feasible option is to heuristically assign tasks based on the initial states of targets and sensors utilizing the traveling salesman problem (TSP) [35], CBBA [7, 25], and/or clustering [30, 14]. However, it is still hard to consider the change of targets and sensors’ states over time sufficiently, and the option may provide a bad local optimum.

Iii-B Transformed Problem

Instead of solving the hard problem, we detour the difficulty and reformulate the original problem (4) into a distributed trajectory optimization problem. For each target, we solve:

(10)

where

Trajectory optimization for each target is much easier than that for multiple targets because task-assignment is not considered. Also, it is not difficult to generate the initial guess for the problem of tracking a single target. By solving the trajectory optimization for each target and making consensus, this distributed optimization is solved through Alternating Direction Method of Multipliers (ADMM) [3, 42].

Iii-B1 Distributed Alternating Direction Method of Multipliers

ADMM is an algorithm that efficiently optimizes objective functions composed of terms that each has efficient solution methods [3]. For our purposes, we use a consensus and sharing optimization form of ADMM, also known as distributed ADMM [42, 29, 45]. We consider the following minimization problem:

(11)

This problem can be rewritten with the local variable and common global variable :

(12)

This is called the global consensus problem, as the constraint is that all the local variables should be equal (i.e., agree). Solving this problem is equivalent to optimizing the augmented Lagrangian:

(13)

where is the Lagrange multipliers for , and is a penalty weight. The ADMM algorithm updates the local variable , the common global variable and the Lagrange multiplier as follows. For we iteratively execute the following three steps:

(14)

where is a step-size parameter. Here, note that the first and the last steps can be carried out independently for each . Thus, we can run the algorithm in parallel depending on the implementation with multiple but no more than computational devices, where each device performs an update on its set of variables. The optimization results are communicated back to a master node, which performs the global variable update and returns the result back to other worker nodes. The processing element that handles the global variable is sometimes called the central collector or the fusion center.

Iv Algorithm Description

Iv-a Overall Algorithm

We now describe the distributed trajectory optimization algorithm to solve the problem (10). After applying initial conditions, our algorithm works iteratively based on the distributed ADMM with three major steps. First, the original problem (10) is split into a separate optimization problem for each target and solved in parallel. Based on the optimization results, the common global variables are updated for consensus. Then, the Lagrange multipliers are updated to enforce the local variables to be closer to the common global variables. The complete procedure is summarized in Algorithm 1. To solve the augmented Lagrangian, the split optimization problem for each target is:

(15)

where and are the local variables for the states and control inputs of the mobile sensors, respectively. This is a trajectory optimization problem with two additional quadratic cost terms and can be solved with the existing trajectory optimization method described in Section IV-B. The split problems are independent and, then, able to be solved in parallel, as described in Section III-B1. In this work, the trajectory variables in (15) are initialized by the method described in Section IV-B2, and they are averaged to initialize the global common variables . The updates for the common global variables are as follows:

(16)

This is the mean consensus process that takes the average of the results for each trajectory optimization. Then, the common global variables affect the update of the trajectory variables for each target (15), which causes the results of trajectory optimization for each target to converge to the same solution. The updates of the Lagrange multipliers for each target are as follows:

(17)

The important role of the multipliers is making high-level decisions, i.e., task-assignment. They put more forces on local variables which are largely deviated from the global variables to be close by reducing its magnitude more. By affecting the cost function in (15), the optimal trajectory solutions are guided into more informative regions. This process lets the task-assignment be automatically accomplished so that mobile sensors can further reduce the uncertainty of the targets.

1:Choose penalty weight and step-size
2:Generate initial guess for each target
3:while not converged do
4:     Update by solving each trajectory optimization problems (15) in parallel (see Section IV-B)
5:     Update by averaging the results for each trajectory optimization
6:     Update using (17)
7:end while
Algorithm 1 Distributed Trajectory Optimization

Iv-B Trajectory Optimization

We adopt belief space iterative Linear Quadratic Gaussian (belief space iLQG) [41] to solve the trajectory optimization problems described in (15). Belief space iLQG extends the iLQG method[39] to Partially Observable Markov Decision Processes (POMDPs) using a Gaussian belief, instead of a fully observable state. In the forward pass, belief space iLQG uses a standard EKF to compute the next time step belief. For a backward pass, belief space iLQG linearizes covariance in addition to quadratizing states and control inputs. Although the applications of this approach are focused on control problems, it is directly applicable to estimation problems (e.g., target tracking) due to the duality of control and estimation [40] The belief space iLQG operates with the belief dynamics of the whole system with targets and sensors. That of the targets is given in (7). By integrating it with the dynamics of mobile sensors, the belief dynamics of the entire system is represented by:

(18)

where

Iv-B1 Control Policy

Here, we explain the details of solving (15) with the belief space iLQG. By linearizing the dynamics around the nominal trajectory distribution, the approximate dynamics is expressed as:

(19)

where

is the -th column of matrix . Note that has columns, where is the dimension of the state. We approximate the nonquadratic cost function in (15) to quadratic one along the nominal belief and control trajectory . For the notional simplicity, the cost function (15) is denoted by ,

(20)

where . and are the deviations from the nominal trajectory. The terms with subscripts denote Jacobian and Hessian matrices of their respective functions.

(a) Modified dubins path
(b) Path tracking scheme
Fig. 1: The path is generated by a modified Dubins path algorithm, which is used to provide waypoints for the generation of the initial guess. Control inputs are generated so that UAV follows the path connecting the waypoints through the PD-controller, and these are used as the initial guess of the local trajectory optimization method.

Given the linear dynamics (19) and quadratic cost (20), we obtain a quadratic approximation of the value function along a nominal trajectory :

(21)

Following the dynamic programming principle [16], the Bellman equation for the value function and control policy in discrete-time are specified as:

(22)

where

By substituting equations (19) and (20) into (22), the Q-function is given by:

(23)

where

(24)

In order to find the optimal control policy, we compute the local variations in the control that minimize the quadratic approximation of the -function in (22):

(25)

The optimal control can be found as . Substituting (25) into (22) gives the value function as a function of only in the form of (21):

(26)

This recursion then continues by computing the control policy for time step .

Iv-B2 Initial Guess Generation

For fast convergence to a better solution, good initial guesses are still required in the subproblems (15), in which it is much easier to find a good one than the original problem (4). For the initialization, we derive an initial guess generator for a case of fixed-wing UAV mounting a sensor with perpendicular boresight. The generator is based on Dubins paths[11] that can mimic the behavior of a fixed-wing UAV. We first generate a continuous path. Let , and be the minimum turning radius of the -th UAV, the distance between the nadir point and the center point of the -th UAV’s sensor footprint and the expected position of the target , respectively. As shown in Fig. 1(a), a path is created to arrive as soon as possible at any point of a circle with radius centered on with the sensor boresight toward the center. Then, the remaining path is set for the sensor to rotate along the circle. Here, should be set to an appropriate value reflecting the movement of the target. The expected position of the target can be predicted using the current estimates and the dynamic model of the target. Then, the generated path is discretized at specific intervals and used as waypoints to generate a sequence of control inputs. For the sequence of desired waypoints , we construct a controller that reduces the angle between the line connecting the two adjacent waypoints and the heading direction of the UAV. A damping effect is also added to the controller for the stability. The geometry of this path tracking problem is depicted in Fig. 1(b). In our simulation, the control input of the PD-controller can be computed with (30) as follows:

(27)

where and are the gains of the controller.

Iv-C Modified Receding Horizon Control for Real-Time Operation

For real-time operation, the planning should be done while the UAVs are executing the previous plan. Otherwise, they stop and wait for the next plan for every planning cycle. Also, the established plan could be corrected by reflecting newly obtained information during its execution. For these purposes, we propose a modified receding-horizon control (RHC) scheme with introducing a new concept of control horizon. RHC is a form of a feedback control system which executes only the forepart of a plan and start over the planning for a shifted time horizon with updated information [19]. Extending this concept, our algorithm plans for future time steps while executing the previous plan for time steps with predefined control horizon and planning horizon . Note that is determined by the available computational resources and the communication delay, and should be large enough so that the trajectories of the mobile sensors can cover the overall domain of interest. The modified RHC is described in Algorithm 2. Here, we explain a single cycle starting at . Pre-calculating the future plan for to requires the target belief estimation at . We use the maximum likelihood assumption and estimate the future belief without actual measurements (line 6). A new plan for to is created using the proposed distributed trajectory optimization algorithm (line 7). This planning is done in parallel to executing the previous plan. While the algorithm plans for future time steps, only the fore steps are actually executed (line 9). The process is then repeated for the next time window.

1:Set control horizon and planning horizon
2:Input , and initial plan
3:while operation time is remaining do
4:     for  do
5:         if  then (in parallel)
6:              
7:              
8: Algorithm 1.
9:         end if
10:         
11:     end for
12:     
13:     
14:end while
Algorithm 2 Modified Receding-Horizon Control

V Complexity Reduction

V-a Complexity Analysis

The computational complexity of each iteration of the distributed ADMM is dominated by solving the subproblems defined in (15) because computing the global common variable and Lagrange multipliers requires very simple computations. The subproblems are solved by the belief space iLQG method described in Section IV-B , and its computational complexity in control problems is already well analyzed in [41]. The bottleneck of running time in our problem lies in the calculation of the matrix in (23). We first consider the case of a single target. Let and be the state dimensions of the target and the mobile sensors, respectively. As the target belief contains the covariance matrix of the state, the dimension of the belief is . Accordingly, the belief dimension of the entire system is , and computing the product in (24) takes time. Evaluating using numerical differentiation (central differences) can be done in time. The remaining elements do not form bottlenecks during computation as discussed in [41]. Then, since one cycle of value iteration takes steps, its complexity is . The number of cycles cannot be expressed in terms of dimensional notation, but convergence can be expected after 10-100 cycles in practice. Furthermore, it is known that belief space iLQG converges with a second-order rate to a local optimum [41]. For multiple targets, our algorithm operates distributedly. The complexity is, then, multiplied by the number of iterations of distributed ADMM if the computation is fully parallelized. ADMM is known to require a few tens of iterations to converge with modest accuracy for many applications [3, 24, 26]. Note that when the original problem (4) is solved non-distributedly, the complexity is multiplied by , instead, regarding the independence among the targets. This is more complex than our distributed approach when is large, and it even requires high-level decision-making as mentioned in Section III-A. There have been many researches on fast convergence for general distributed ADMM[2, 10, 38]. In this paper, we propose an edge-cutting method which is tailored to our algorithm and effectively shortens its running time.

Fig. 2: (a) Fully connected bipartite graph, and (b) bipartite graph after applying the edge-cutting method

V-B Edge-cutting Method

The computational load of our algorithm can be effectively reduced when we consider the characteristics of our problem. As pointed out earlier, our distributed scheme automatically makes high-level decisions (i.e., task-assignment). A mobile sensor may choose a path specialized for certain targets during the consensus and disregards the others. In this case, the sensor cannot contribute to tracking the disregarded targets, and, thus, it is a bad investment of computational resources to solve the trajectory optimization of the sensor for those targets. Based on this observation, we propose a heuristic method to detect this inefficacy and halt the unprofitable investment. The edge-cutting method detects unavailing sensors for each target for every consensus step in ADMM. For given , the method excludes sensor in the tracking task of target if

(28)

where denotes the uncertainty of the target when the sensor is excluded. The is chosen to be a threshold for judging whether the sensor reduces the uncertainty of the target enough or not. The uncertainty is evaluated for obtained in the consensus phase (16) except for the sensor . This pruned architecture is reset to the original one for new time windows. When is properly chosen (not too large), the method has empirically shown similar target estimation results as not applying it in most cases with significantly reduced computation time. The method is quantitatively evaluated in Section VI-D. Our edge-cutting method has similar, but different, concept with the three weight message-passing algorithm (TWA) in [2], which solves multi-agent trajectory optimization for energy minimization and collision-avoidance. As shown in Fig. 2, our algorithm can be also expressed with a bipartite graph as in [2] based on the structure of ADMM. Each blue node on the left side represents the trajectory optimizer solving (15) for each target, and each red node on the right side is the consensus node solving (16) for each sensor. For the consensus node, TWA employs weighted averages and excludes an edge from the consensus process by applying a zero weight without any change in the graph. On the other hand, our method prunes an edge of the graph when (28) is satisfied. This effectively reduces the complexities of the EKF in Section II-B and the iLQG in Section IV-B by reducing the dimensions of the sensor states and the measurement vector. Note that our algorithm allows pruning an edge since the trajectory optimization for a target always produces feasible solutions when solved for subsets of sensors. This is not the case for the optimizers for collision-avoidance in [2].

Vi Simulations

In this section, we numerically investigate the computational properties and the applicability of the proposed algorithms. To ascertain the effectiveness of the distributed trajectory optimization algorithm, we compare it with existing approaches; non-myopic trajectory planning methods with heuristic task-assignments as well as the myopic one. We also evaluate the modified RHC scheme and the edge cutting algorithm which are proposed for effective real-time operations.

Fig. 3: Sensor model for target tracking

Vi-a Simulation Model

(a) Initial guess
(b) Number of iterations = 2
(c) Number of iterations = 5
(d) Number of iterations = 25
(e) Proposed algorithm
(f) Resulting variances of target estimations
Fig. 4: (a)-(d) process of convergence and (e) final trajectory of the proposed non-myopic planning in a scenario with a single UAV and two stationary targets, and (f) the variances of the target estimations over the execution of the plan
Number of targets 2 3 4
Myopic trajectory planning [30] 314.74 262.45 318.38
Non-myopic trajectory planning with initial guess 259.21 209.98 234.99
Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning 171.94 179.64 182.58
Uncertainty-based task-assignment [14] + non-myopic trajectory planning 167.63 172.67 173.88
Proposed distributed trajectory optimization algorithm 124.52 150.76 149.11
TABLE I: Average uncertainty over 100 simulations for each algorithm in the single UAV scenarios.

We assume that the UAVs fly at different altitudes from each other to alleviate collision-avoidance constraints. Throughout the entire simulations, the planning horizon, , is set to 100 seconds, which is enough for the UAVs to cover all areas of interest with flying speed of . Furthermore, we set in (15), and , , and in (17). It is important to set the penalty parameters, , properly for fast convergence of ADMM. While they are chosen empirically in this work, many extensions of the distributed ADMM algorithm have been explored to adaptively select the penalty parameters. We would like to refer the readers to [3, 38]. For target motion model, we consider the motion on the XY-plane with a four-dimensional state vector of positions and velocities (in x and y axes) for a target. The matrix parameters of the linear stochastic dynamics in (1) are set as:

(29)

where is the time interval between two successive measurements, and is the process noise intensity representing the strength of the deviations from the predicted motion by the dynamic model. When is small, this model represents a nearly constant velocity. For simulations, we set and . For sensor platforms, we consider a set of fixed-wing Unmanned Aerial Vehicles (UAVs). The dynamics of the UAVs is given by:

(30)

where , , and are the position, speed, heading angle and bank angle of -th UAV at time , respectively. Gravity acceleration is denoted as . The sensors are mounted on the left or right sides of the UAVs and are aimed in a -degree downward direction as shown in Fig. 3. Each sensor measures the signal-to-noise ratio (SNR) [37]:

(31)

where , , and are selected to model the SNR of the sensor. and represent the distance between the sensor and the target as well as the angle between the sensor’s boresight and the direction from the sensor to the target, respectively. When is set to zero, the sensor measures the quasi-distance, which is a commonly used model [9, 43, 15]. We set , and .

Vi-B Process of Convergence

First, we consider a toy example in which a single UAV tracks two targets with zero initial velocities to explore the convergence process of the proposed distributed trajectory optimization algorithm. Figs. 4 (a-d) present the snapshots of the optimization process. In the beginning, the algorithm generates an initial guess (trajectory) of the UAV for each target independently, without any high-level decisions to consider both targets simultaneously. In the early phase, note that the consensus trajectory largely deviates from the distributively optimized one for each target. As the number of iterations increases, the deviation gets smaller and eventually vanishes. This convergence is enabled as each subproblem reflects the result of the other subproblem from the previous iteration through the Lagrange multipliers in (15). By repeating this interaction, the algorithm provides a non-myopic trajectory planning result that sequentially tracks two targets, as shown in Fig. 4(e). In addition to observing the convergence, we evaluate the performance of the converged trajectory in reducing the uncertainty of each target. As the components of the posterior uncertainties after executing the plan, the variances of the target state (position) estimations are logged along the time as shown in 4(f). The variances of each target are effectively reduced when the target is in the sensing range, while they increase gradually due to the stochastic motion of the target when it is in sensing holes.

(a) Myopic trajectory planning (b) Non-myopic trajectory planning with initial guess (c) Euclidean heuristic-based task-assignment + non-myopic trajectory planning (d) Uncertainty-based task-assignment + non-myopic planning
(e) Proposed algorithm
Fig. 5: Comparison of existing approaches and the proposed algorithm in a scenario with a single UAV and three targets
Uncertainty Posterior Uncertainty Localization RMSE [m]
Myopic trajectory planning [30] 252.15 252.47 8.4400
Non-myopic trajectory planning with initial guess 204.01 204.23 7.1812
Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning 194.56 194.32 6.8059
Uncertainty-based task-assignment [14] + non-myopic trajectory planning 169.23 169.50 6.7532
Proposed distributed trajectory optimization algorithm 147.39 146.95 6.5595
TABLE II: Average result over 1000 simulations for each algorithm in the single UAV scenario.
Number of targets 10 15
Target clustering & Euclidean heuristic-based task-assignment [25] + non-myopic trajectory planning 557.01 498.53
Target clustering & uncertainty-based task-assignment [14] + non-myopic trajectory planning 566.55 501.75
Proposed distributed trajectory optimization algorithm 389.59 377.06
TABLE III: Average uncertainty over 100 simulations for each algorithm in two-UAV scenarios.

Vi-C Single UAV tracking a Few Targets

In the second example, we compare the proposed algorithm with existing approaches under scenarios where one UAV tracks a few targets with zero initial velocities. The initial positions of the UAV and the targets on the XY-plane are randomly (uniformly) generated within a area. The targets are on the ground level and the UAV is flying at a fixed altitude of . The initial covariance matrices of the target estimations are also randomly generated as diagonal matrices of which diagonals are picked uniformly between 10 and 100. As the existing works [14, 25] on multi-target tracking tasks are myopic approaches, we extend them to non-myopic ones. We adopt their task-assignment algorithms to produce initial guesses on the original undistributed problem (4) and solve it through belief space iLQG as in our algorithm. The algorithms assign clustered targets to the closest mobile sensor (w.r.t. the center of the cluster) and determine the priorities of the clusters when they assigned to the same sensor. In [14], the priority is determined as the sum of the uncertainties (at ) of the targets in each cluster, while [25] uses the Euclidean distance of each cluster from its assigned mobile sensor and the number of targets in the cluster. Initial guesses are generated in a similar way as described in Section IV-B2, as shown as the dotted lines in Figs. 5(c-d), so that the UAV can visit the clusters (single target for each cluster in this case) sequentially based on the task-assignment results. Additionally, we tested two more undistributed algorithms. First, we apply a myopic algorithm [30] to (4) repeatedly to cover the planning horizon with single-step plans. The other is solving (4) with a naive initial guess, .

(a) Target clustering & Euclidean heuristic/uncertainty-based task-assignment + non-myopic trajectory planning
(b) Proposed algorithm
Fig. 6: Comparison of two-phase approaches and the proposed algorithm for a scenario with two UAVs and 10 targets
Uncertainty Posterior Uncertainty Localization RMSE [m]
Clustering based heuristic task-assignment [25, 14] + non-myopic trajectory planning 584.72 576.08 15.8290
Proposed distributed trajectory optimization algorithm 349.22 357.95 13.3126
TABLE IV: Average result over 1000 simulations for each algorithm in the two-UAV scenario.

Table I summarizes the average result over the 100 randomly generated topologies for each number of targets. To evaluate the planned trajectory, we compare the uncertainty of target estimation, which is the primary objective to minimize in our problem, averaged over targets, timesteps, and topologies. Since the uncertainty has a meaning of the average variance of target estimation, we can compare the values approximately in a quadratic scale. Our algorithm shows the best performance among the tested algorithms. To analyze the details, we investigate a single instance of the randomly generated topologies in depth. Fig. 5 shows the result of each algorithm in a scenario of single UAV tracking three targets. As shown in the Fig. 5(a), the myopic algorithm just plans a trajectory that minimizes only the control efforts until any target gets in the sensing range of the UAV. This is because the UAV cannot reduce the uncertainty without observing the targets. In Fig. 5(b), the resulting trajectory tracks only a single target. This shows that it could be stuck in a bad local optimum to perform non-myopic trajectory optimization without proper decision making. The two-phase approaches in Figs. 5(c-d) show better trajectories so that they sense more targets. However, the trajectories include more portion of idle time without observing a target than the proposed algorithm in Fig. 5(e). To compare the results numerically, we examine their average performance over 1000 randomly generated instances on the topology of Fig. 5. As shown in Table II, we evaluate posterior uncertainty and root mean square error (RMSE) of target localization in addition to uncertainty. Even though the topology was simple and easy to have intuitive decision making, the proposed algorithm still outperforms the others. While the two-phase approaches define new approximate/heuristic cost functions for the task-assignment and that decision making considers only the current status of the sensor networks, the proposed algorithm incorporates the decision making into the optimization process while considering the mobility of the sensor platform and the target state through the distributed formulation.

Vi-D Multiple UAVs tracking Numerous Targets

The third example considers the situation in which two UAVs track numerous targets with zero initial velocities. To compare the proposed algorithm to the two-phase approaches [14, 25] involving target clustering, the topologies of the targets and UAVs are randomly generated based on three-cluster structure. (Choose centers of the clusters and the number and positions of the targets for each cluster randomly.) For the clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN) [13] is used. As parameters of DBSCAN, we set the minimum number of targets for a single cluster and the maximum radius of a cluster to 2 and 150, respectively. Table III summarizes the results. While the two-phase approaches show similar average uncertainty values, the proposed algorithm outperforms them. We further investigate an instance of the randomly generated topologies to analyze the performance gap in detail. Fig. 6 shows the resulting trajectories of the two-phase approaches and the proposed algorithm in a scenario with 10 targets. The Euclidean heuristic and uncertainty-based task-assignments provide the same initial guesses, where each cluster is assigned to a single UAV. However, the proposed algorithm automatically generates the trajectories for the two UAVs to visit the same cluster. Table IV shows that the trajectories outperform those from the two-phase approaches. Even in this well-clustered topology, the proposed algorithm plans trajectories more optimized for the mobile sensors to visit the target clusters without the aid of clustering and task-assignment algorithms.

(a) = 0 (not applied)
(b) = 2000
(c) = 6000
(d) Cost comparison w.r.t. value
(e) Results without parallel computing
(f) Results with parallel computing
Fig. 7: (a)-(c) Cutting time analysis for the edge-cutting method with three different values (Each solid line corresponds to an edge and indicates that the UAV is considering the target in the trajectory optimization process), (d) cost comparison for different values and (e)-(f) comparison of the computation times

Vi-E Evaluation of Edge-Cutting Method

We evaluate the edge-cutting method for the scenario in Fig. 6(b). Varying the values, the time when each edge is cut is analyzed in Fig. 7(a-c). As expected, larger implies faster cutting time since it plays a role of a weaker threshold. However, as shown in Fig. 7(d), too large values make the solution of the trajectory optimization converge to a bad local optimum with a larger cost. This is because too easily cut edges change the original problem significantly beyond the degree of proper approximation. When the epsilon value is set appropriately, not too large, the edge-cutting method produces the same result as it is not applied () with effectively reduced dimensions of the optimization. In Figs. 7(e-f), the actual computation time is compared between the proposed algorithms with and without the edge-cutting method. The comparison is performed on a 3.4 GHz Quad-Core Intel(TM) i7 PC. The results show that the edge-cutting method reduces the computation time effectively by rapidly achieving a solution with lower cost. The message-passing algorithm, TWA, in [2] is also tested to verify the effectiveness of the proposed method in our problem. TWA reduces the computation time compared to not applying it, but the amount is much smaller than the proposed method. Comparing Figs. 7(e) and (f), they verify that parallel computing further reduces the computation time effectively since our algorithm based on ADMM is capable of parallelization.

(a) time = 32 sec.
(b) time = 96 sec.
(c) time = 160 sec.
(d) time = 256 sec.
(e) Position error
(f) Velocity error
Fig. 8: (a)-(d) Snapshots of the current plan (solid line) and future plan (dashed line) for each mobile sensor in the replanning process scenario with two UAVs, one moving target and two stationary targets, and (e)-(f) particle filter estimation results

Vi-F Real-Time Operation Test

In the last example, shown in Fig. 8, we consider a scenario in which two UAVs track three targets, one with an initial velocity of 3m/s in the x-axis direction and the others with zero initial velocities. We applied the distributed trajectory optimization algorithm, the edge-cutting method, and the modified RHC scheme all together. It is assumed that a particle filter is used to estimate the states of the targets in the actual operating environment of the UAVs. The control horizon, , is set to 25 seconds considering the computation time of the algorithms. It is assumed that the sensors are attached to the left side of one UAV and to the right side of the other UAV, and that UAVs fly at different altitudes of and , respectively. Figs. 8(a-d) show some of the simulated results under the assumption that the UAVs operate for 400 seconds. A notable point in this example is that the UAV automatically passes the tracking mission to the other UAV after about 160 seconds. Figs. 8(e-f) show the state estimation results of the targets, where it can be confirmed that all targets are tracked reliably through the modified RHC scheme.

Vii Conclusions

In this paper, we investigated a distributed optimization approach to trajectory planning for a multi-target tracking problem. In order to solve the trajectory optimization and task-assignment problems simultaneously, the distributed trajectory optimization problem was formulated, and then solved by integrating a variant of a differential dynamic programming algorithm called iterative Linear-Quadratic-Gaussian (iLQG) algorithm with the distributed Alternating Direction Method of Multipliers (ADMM). In addition, we proposed an edge-cutting method to reduce the computation time of the algorithm and the modified RHC scheme for real-time operation. Numerical experiments were conducted and the results presented to demonstrate the applicability and validity of the proposed approach.

References

  • [1] Y. Bar-Shalom, P. K. Willett, and X. Tian (2011) Tracking and data fusion. YBS publishing. Cited by: §II-B.
  • [2] J. Bento, N. Derbinsky, J. Alonso-Mora, and J. S. Yedidia (2013) A message-passing algorithm for multi-agent trajectory planning. In Advances in neural information processing systems, pp. 521–529. Cited by: §I, §V-A, §V-B, §VI-E.
  • [3] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning 3 (1), pp. 1–122. Cited by: §I, §III-B1, §III-B, §V-A, §VI-A.
  • [4] Z. Chen et al. (2003) Bayesian filtering: from kalman filters to particle filters, and beyond. Statistics 182 (1), pp. 1–69. Cited by: §II-B.
  • [5] F. Cheng, S. Zhang, Z. Li, Y. Chen, N. Zhao, F. R. Yu, and V. C. Leung (2018) UAV trajectory optimization for data offloading at the edge of multiple cells. IEEE Transactions on Vehicular Technology 67 (7), pp. 6732–6736. Cited by: §I.
  • [6] A. S. Chhetri, D. Morrell, and A. Papandreou-Suppappola (2006) Nonmyopic sensor scheduling and its efficient implementation for target tracking applications. EURASIP Journal on Applied Signal Processing 2006, pp. 9–9. Cited by: §I, §II-A.
  • [7] H. Choi, L. Brunet, and J. P. How (2009) Consensus-based decentralized auctions for robust task allocation. IEEE transactions on robotics 25 (4), pp. 912–926. Cited by: §III-A.
  • [8] H. Choi, J. P. How, and P. I. Barton (2013) An outer-approximation approach for information-maximizing sensor selection. Optimization Letters 7 (4), pp. 745–764. Cited by: §I.
  • [9] H. Choi and J. P. How (2009) On the roles of smoothing in planning of informative paths. In American Control Conference, 2009. ACC’09., pp. 2154–2159. Cited by: §VI-A.
  • [10] N. Derbinsky, J. Bento, V. Elser, and J. S. Yedidia (2013) An improved three-weight message-passing algorithm. arXiv preprint arXiv:1305.1961. Cited by: §I, §V-A.
  • [11] L. E. Dubins (1957) On curves of minimal length with a constraint on average curvature, and with prescribed initial and terminal positions and tangents. American Journal of mathematics 79 (3), pp. 497–516. Cited by: §IV-B2.
  • [12] M. Dunbabin and L. Marques (2012) Robots for environmental monitoring: significant advancements and applications. IEEE Robotics & Automation Magazine 19 (1), pp. 24–39. Cited by: §I.
  • [13] M. Ester, H. Kriegel, J. Sander, X. Xu, et al. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise.. In Kdd, Vol. 96, pp. 226–231. Cited by: §VI-D.
  • [14] N. Farmani, L. Sun, and D. J. Pack (2017) A scalable multitarget tracking system for cooperative unmanned aerial vehicles. IEEE Transactions on Aerospace and Electronic Systems 53 (4), pp. 1947–1961. Cited by: §I, §II-C, §III-A, §VI-C, §VI-D, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [15] G. M. Hoffmann and C. J. Tomlin (2010) Mobile sensor network control using mutual information methods and particle filters. IEEE Transactions on Automatic Control 55 (1), pp. 32–47. Cited by: §I, §VI-A.
  • [16] D. H. Jacobson and D. Q. Mayne (1970) Differential dynamic programming. Cited by: §IV-B1.
  • [17] H. Jiang and Y. Liang (2018) Online path planning of autonomous uavs for bearing-only standoff multi-target following in threat environment. IEEE Access 6, pp. 22531–22544. Cited by: §I.
  • [18] S. Karaman and E. Frazzoli (2011) Sampling-based algorithms for optimal motion planning. The international journal of robotics research 30 (7), pp. 846–894. Cited by: §I.
  • [19] Y. Kuwata and J. How (2004) Three dimensional receding horizon control for uavs. In AIAA Guidance, Navigation, and Control Conference and Exhibit, pp. 5144. Cited by: §IV-C.
  • [20] S. Lee, S. Park, and H. Choi (2018) Potential game-based non-myopic sensor network planning for multi-target tracking. IEEE Access 6, pp. 79245–79257. Cited by: §I, §II-A.
  • [21] S. Lee, Y. Park, and H. Choi (2018) Efficient sensor network planning based on approximate potential game. International Journal of Distributed Sensor Networks 14 (6), pp. 1–17. Cited by: §I.
  • [22] D. Levine, B. Luders, and J. P. How (2010) Information-rich path planning with general constraints using rapidly-exploring random trees. Cited by: §I, §II-C.
  • [23] J. Liu, D. Petrovic, and F. Zhao (2003) Multi-step information-directed sensor querying in distributed sensor networks. In 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings.(ICASSP’03)., Vol. 5, pp. V–145. Cited by: §I.
  • [24] S. Liu, M. Fardad, E. Masazade, and P. K. Varshney (2014) Optimal periodic sensor scheduling in networks of dynamical systems. IEEE Transactions on Signal Processing 62 (12), pp. 3055–3068. Cited by: §I, §V-A.
  • [25] B. Luders, D. Levine, S. Ponda, and J. How (2011) Information-rich task allocation and motion planning for heterogeneous sensor platforms. In Infotech@ Aerospace 2011, pp. 1588. Cited by: §I, §III-A, §VI-C, §VI-D, TABLE I, TABLE II, TABLE III, TABLE IV.
  • [26] E. Masazade, M. Fardad, and P. K. Varshney (2012) Sparsity-promoting extended kalman filtering for target tracking in wireless sensor networks. IEEE Signal Processing Letters 19 (12), pp. 845–848. Cited by: §V-A.
  • [27] S. A. Miller, Z. A. Harris, and E. K. Chong (2009) A pomdp framework for coordinated guidance of autonomous uavs for multitarget tracking. EURASIP Journal on Advances in Signal Processing 2009, pp. 2. Cited by: §I.
  • [28] I. Mordatch and E. Todorov (2014) Combining the benefits of function approximation and trajectory optimization.. In Robotics: Science and Systems, Cited by: §I.
  • [29] J. F. Mota, J. M. Xavier, P. M. Aguiar, and M. Püschel (2013) D-admm: a communication-efficient distributed algorithm for separable optimization. IEEE Transactions on Signal Processing 61 (10), pp. 2718–2723. Cited by: §I, §III-B1.
  • [30] H. Oh, S. Kim, H. Shin, and A. Tsourdos (2015) Coordinated standoff tracking of moving target groups using multiple uavs. IEEE Transactions on Aerospace and Electronic Systems 51 (2), pp. 1501–1514. Cited by: §I, §III-A, §VI-C, TABLE I, TABLE II.
  • [31] H. Y. Ong and J. C. Gerdes (2015) Cooperative collision avoidance via proximal message passing. In 2015 American Control Conference (ACC), pp. 4124–4130. Cited by: §I.
  • [32] Y. Oshman and P. Davidson (1999) Optimization of observer trajectories for bearings-only target localization. IEEE Transactions on Aerospace and Electronic Systems 35 (3), pp. 892–902. Cited by: §I.
  • [33] S. S. Ponda, R. M. Kolacinski, and E. Frazzoli (2008) Trajectory optimization for target localization using small unmanned aerial vehicles. Ph.D. Thesis, Massachusetts Institute of Technology, Department of Aeronautics and Astronautics. Cited by: §I, §II-A, §II-C.
  • [34] S. Ragi and E. K. Chong (2013) UAV path planning in a dynamic environment via partially observable markov decision process. IEEE Transactions on Aerospace and Electronic Systems 49 (4), pp. 2397–2412. Cited by: §I, §II-A, §II-C.
  • [35] G. Reinelt (1991) TSPLIB—a traveling salesman problem library. ORSA journal on computing 3 (4), pp. 376–384. Cited by: §III-A.
  • [36] P. Skoglar (2007) UAV path and sensor planning methods for multiple ground target search and tracking-a literature survey. Linköping University Electronic Press. Cited by: §I, §II-B.
  • [37] M. I. Skolnik (1970) Radar handbook. Cited by: §VI-A.
  • [38] C. Song, S. Yoon, and V. Pavlovic (2016) Fast admm algorithm for distributed optimization with adaptive penalty.. In AAAI, pp. 753–759. Cited by: §V-A, §VI-A.
  • [39] E. Todorov and W. Li (2005) A generalized iterative lqg method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005. Proceedings of the 2005, pp. 300–306. Cited by: §IV-B.
  • [40] E. Todorov (2008) General duality between optimal control and estimation. In 2008 47th IEEE Conference on Decision and Control, pp. 4286–4292. Cited by: §IV-B.
  • [41] J. Van Den Berg, S. Patil, and R. Alterovitz (2012) Motion planning under uncertainty using iterative local optimization in belief space. The International Journal of Robotics Research 31 (11), pp. 1263–1278. Cited by: §II-B, §IV-B, §V-A.
  • [42] E. Wei and A. Ozdaglar (2012) Distributed alternating direction method of multipliers. Cited by: §III-B1, §III-B.
  • [43] J. L. Williams, J. W. Fisher, and A. S. Willsky (2007) Approximate dynamic programming for communication-constrained sensor network management. IEEE Transactions on Signal Processing 55 (8), pp. 4300–4311. Cited by: §I, §VI-A.
  • [44] Q. Yang and S. Yoo (2018) Optimal uav path planning: sensing data acquisition over iot sensor networks using multi-objective bio-inspired algorithms. IEEE Access 6, pp. 13671–13684. Cited by: §I.
  • [45] R. Zhang and J. Kwok (2014) Asynchronous distributed admm for consensus optimization. In International Conference on Machine Learning, pp. 1701–1709. Cited by: §I, §III-B1.
Comments 0
Request Comment
You are adding the first comment!
How to quickly get a good reply:
  • Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
  • Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
  • Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
""
The feedback must be of minimum 40 characters and the title a minimum of 5 characters
   
Add comment
Cancel
Loading ...
398357
This is a comment super asjknd jkasnjk adsnkj
Upvote
Downvote
""
The feedback must be of minumum 40 characters
The feedback must be of minumum 40 characters
Submit
Cancel

You are asking your first question!
How to quickly get a good answer:
  • Keep your question short and to the point
  • Check for grammar or spelling errors.
  • Phrase it like a question
Test
Test description