Non-Myopic Target Tracking Strategies for State-Dependent Noise
Abstract
We study the problem of devising a closed-loop strategy to control the position of a robot that is tracking a possibly moving target. The robot is capable of obtaining noisy measurements of the target's position. The key idea in active target tracking is to choose control laws that drive the robot to measurement locations that will reduce the uncertainty in the target's position. The challenge is that the measurement uncertainty is often a function of the (unknown) relative positions of the target and the robot. Consequently, a closed-loop control policy is desired, one that maps the current estimate of the target's position to an optimal control law for the robot.
Our main contribution is to devise a closed-loop control policy for target tracking that plans for a sequence of control actions instead of acting greedily. We consider scenarios where the measurement noise is a function of the state of the target. We seek to minimize the maximum uncertainty (trace of the posterior covariance matrix) over all possible measurements. We exploit the structural properties of the Kalman filter to build a policy tree that is orders of magnitude smaller than naive enumeration while still preserving optimality guarantees. We show how to obtain even more computational savings by relaxing the optimality guarantees. The resulting algorithms are evaluated through simulations.
I Introduction
Tracking a moving, possibly adversarial target is a fundamental problem in robotics and has long been a subject of study [1, 2, 3, 4, 5, 6]. Target tracking finds applications in many areas such as surveillance [7], telepresence [8], assisted living [9], and habitat monitoring [10, 11]. Target tracking broadly refers to two classes of problems: (i) estimating the position of the target using noisy sensor measurements; and (ii) actively controlling the sensor position to improve the performance of the estimator. The second problem is distinguished as active target tracking and is the subject of this paper.
The main challenge in active target tracking is that the value of future measurement locations can be a function of the unknown target state. Take, as an example, a simple instance of estimating the unknown position of a stationary target where the measurement noise is a function of the distance between the robot and the target. If the true location of the target were known, the robot would always choose a control sequence that drives it closer to the target. Since, in practice, the true target location is unknown, we cannot determine such a control sequence exactly. A possible strategy in this case is to plan with respect to the probability distribution of the target. However, the probability distribution itself evolves as a function of the actual measurement values. This becomes even more challenging if the target is mobile. We use a Kalman filter (KF) to estimate the state of a moving target with a possibly state-dependent measurement model, where the measurement noise is a function of the distance between the robot and the target.
When planning non-myopically (i.e., for multiple steps into the future), one can enumerate all possible future measurements in the form of a tree. In particular, a minimax tree can be used to find the optimal (in the min-max sense) control policy for actively tracking a target [12]. The size of the tree grows exponentially with the time horizon. The tree can be pruned using alpha pruning [13]. Our main contribution is to show how the properties of a KF can be exploited to prune a larger number of nodes without losing optimality. In doing so, we extend the pruning techniques first proposed by Vitus et al. [14] for linear systems. Using a minimax tree, we generalize these results to state-dependent, time-variant dynamical systems. Our pruning techniques allow us to trade off the size of the tree (equivalently, computation time) against the performance guarantees of the algorithm. We demonstrate this effect in simulations.
The rest of the paper is organized as follows. We start with the related work in Section II. The problem formulation is presented in Section III. Our main algorithm is presented in Section IV, and its error analysis in Section V. We validate the algorithm through simulations described in Section VI. Finally, we conclude with a brief discussion of future work in Section VII.
II Related Work
The target tracking problem has been studied under various settings. Bar-Shalom et al. [1] present many of the commonly-used estimation techniques in target tracking. The five-part survey by Li and Jilkov [2, 3, 4, 5, 6] covers commonly-used control and maneuvering techniques for active target tracking. In the rest of this section, we discuss the works most closely related to our formulation.
Vitus et al. [14] presented an algorithm that computes the optimal scheduling of measurements for a linear dynamical system. Their formulation does not directly model a target tracking problem. Instead, the goal is to track a linear dynamical system using a set of sensors such that only one sensor can be activated at any time instant. The posterior covariance of a linear system estimated with a Kalman filter depends on the prior covariance and the sensor variance, but not on the actual measurement values (unlike the case for nonlinear systems). Thus, one can build a search tree enumerating all possible sensor selections and choose the one that minimizes the final covariance. The main contribution of Vitus et al. was a pruning technique that reduces the size of the tree while still preserving optimality.
Atanasov et al. [15] extended this result to active target tracking with a single robot. A major contribution was to show that robot trajectories that are nearby in space can be pruned away (under certain conditions), leading to further computational savings. This result was based on a linear system assumption. In this paper, we build on these works and make progress towards generalizing the solution to state-dependent observation systems.
A major bottleneck in planning for state-dependent observation systems is that the future covariance is no longer independent of the actual measurement values, as was the case in linear state-independent systems. The covariance update equations use the linearized models and as such depend on the state estimate and the measurements. Thus, the search tree must include all possible measurement values and not just all possible measurement locations. Furthermore, finding an optimal path is no longer sufficient. Instead, one must find an optimal policy that prescribes the optimal set of actions for all possible measurements. We show how to use a minimax tree to find such an optimal policy while still leveraging the computational savings that hold in the linear case.
III Problem Formulation
We assume that the position of the robot is known accurately using onboard sensors (e.g., GPS, laser, IMU, cameras). The motion model of the robot is given by:
$x_{t+1} = f(x_t, u_t)$ (1)
where $x_t$ is the position of the robot and $u_t \in \mathcal{U}$ is the control input at time $t$. $\mathcal{U}$ is a finite space of control inputs; we assume there are $|\mathcal{U}|$ actions available as control inputs at any time.
The robot is mounted with a sensor that is capable of obtaining a measurement of the target. We assume that the target's motion model is given by:
$y_{t+1} = A y_t + w_t$ (2)
where $y_t$ is the 2-dimensional position of the target and $w_t \sim \mathcal{N}(0, Q)$ is Gaussian noise of known covariance.
The task of the robot is to track the target using its noisy measurements. The measurements, $z_t$, can be a nonlinear function of the states of the target and the robot:
$z_t = h(x_t, y_t) + v_t$ (3)
The measurement noise, $v_t$, is Gaussian with a variance that depends on the distance between the robot and the target:
$v_t \sim \mathcal{N}\left(0, \sigma^2(x_t, y_t)\right)$ (4)
where,
$\sigma^2(x_t, y_t) = \eta\, d(x_t, y_t)$ if $d(x_t, y_t) \le d_{\max}$, and $\sigma^2(x_t, y_t) = \eta\, d_{\max}$ otherwise, (5)
with $d(x_t, y_t)$ denoting the true distance between the robot and the target. When the true distance between the robot and target is within $d_{\max}$, we assume that the measurement variance is proportional to the true distance. When the distance is greater than $d_{\max}$, the variance is assumed to be a constant maximum value of $\eta\, d_{\max}$.
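As a concrete illustration, the saturating noise model of Equations (4)-(5) can be sketched as follows; the gain `eta` and the cutoff `d_max` are illustrative values, not parameters taken from the paper:

```python
import math

def measurement_noise_variance(robot_pos, target_pos, eta=0.1, d_max=10.0):
    """Distance-dependent measurement noise variance.

    The variance grows in proportion to the true robot-target distance
    up to the cutoff d_max, beyond which it saturates at its maximum
    value eta * d_max. Both eta and d_max are illustrative constants.
    """
    d = math.dist(robot_pos, target_pos)  # Euclidean robot-target distance
    return eta * min(d, d_max)
```

In planning, the true distance is unavailable, so the same function would be evaluated at the estimated target position instead.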
The estimated position and covariance matrix of the target at time $t$, $\hat{y}_t$ and $\Sigma_t$, are given by a Kalman filter. The uncertainty in the estimate of the target's position is measured by the trace of the covariance matrix. The problem considered in this paper can be formally stated as follows.
Problem: Given an initial robot position $x_0$ and an initial target estimate $(\hat{y}_0, \Sigma_0)$, find a sequence of control laws for the robot, from time $t = 0$ to $t = T$, to minimize the worst-case trace of the covariance in the target's estimate at time $T$. That is,
$\min_{u_0, \ldots, u_{T-1}} \max_{z_1, \ldots, z_T} \operatorname{tr}(\Sigma_T)$. (6)
The Riccati equation, $r_t(\cdot)$, maps the current covariance matrix $\Sigma_t$, under a new measurement, to the covariance matrix at the next time step:
$\Sigma_{t+1} = r_t(\Sigma_t) = \left( (A \Sigma_t A^\top + Q)^{-1} + H_t^\top R_t^{-1} H_t \right)^{-1}$ (7)
where $H_t$ is the Jacobian of the measurement equation computed around the target estimate at time $t$, and $R_t$ is the covariance of the measurement noise given in Equations (3)-(5).
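The covariance propagation of the Riccati equation can be sketched in its equivalent predict-then-correct form; the matrix names follow the text ($A$, $Q$ for the target dynamics and $H$, $R$ for the linearized measurement model), and the code is a minimal numerical sketch rather than the paper's implementation:

```python
import numpy as np

def riccati_update(Sigma, A, Q, H, R):
    """One step of the Kalman covariance (Riccati) map.

    Predict the covariance through the target dynamics, then correct it
    with a measurement whose linearized model H and noise covariance R
    may depend on the current state estimate, and hence must be
    re-evaluated at every node of the policy tree.
    """
    Sigma_pred = A @ Sigma @ A.T + Q                      # prediction step
    S = H @ Sigma_pred @ H.T + R                          # innovation covariance
    K = Sigma_pred @ H.T @ np.linalg.inv(S)               # Kalman gain
    return (np.eye(Sigma.shape[0]) - K @ H) @ Sigma_pred  # posterior covariance
```

Because $H$ and $R$ are functions of the estimated target position, the output of this map changes with the measurements received, which is precisely why an open-loop plan no longer suffices.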
The true position of the target is unknown, making it impossible to determine $H_t$ exactly. Consequently, we compute an estimate of $H_t$ using the estimated target position $\hat{y}_t$. Therefore, the optimal solution for the problem defined above will be a closed-loop policy that maps the estimated target position to the optimal control action for the robot.
IV Closed-Loop Control Policy
References [14, 15] solve a similar problem but for a linear Gaussian system. The linearity assumption makes the Riccati equation independent of the position of the target. Consequently, they show that an open-loop policy can determine the optimal control sequence for the robot. In our case, with a state-dependent observation model, the optimal control policy will be an adaptive (closed-loop) policy. However, this generalization comes at the expense of discretizing the set of possible target measurements. Specifically, we assume that the measurement at any time step is chosen from a finite set of candidate measurements.
These candidate measurements can be obtained by, for example, sampling from the continuous distribution of zero-mean sensor noise around the current estimate of the target. For example, we can choose candidate measurements within 3 standard deviations of the mean value, a region that contains 99.7% of the possible measurements.
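One way to realize this discretization is sketched below; the rejection-sampling scheme and the parameter names (`n`, `n_sigma`) are illustrative choices, since the paper only requires some finite candidate set near the current estimate:

```python
import numpy as np

def candidate_measurements(z_mean, R, n=5, n_sigma=3.0, rng=None):
    """Discretize the continuous measurement distribution into n candidates.

    Samples are drawn from N(z_mean, R) and rejected if they fall outside
    n_sigma standard deviations of the mean (per axis), so the candidate
    set covers roughly the 99.7% probable region of the measurements.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    std = np.sqrt(np.diag(R))  # per-axis standard deviations
    out = []
    while len(out) < n:
        z = rng.multivariate_normal(z_mean, R)
        if np.all(np.abs(z - z_mean) <= n_sigma * std):
            out.append(z)
    return out
```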
IV-A Optimal Decisions: The Minimax Algorithm
Minimizing the maximum trace can be thought of as playing a game against an adversary: the robot chooses control actions to minimize the trace, whereas the adversary (i.e., nature) chooses measurement noise to maximize it. By optimizing the worst-case trace, the robot determines the best conservative policy.
We can find this optimal strategy by building a minimax tree. This tree enumerates all possible control laws and all possible measurements that the robot can obtain. A node on the $t$-th level of the tree stores the position of the robot, $x_t$, the estimated position of the target, $\hat{y}_t$, and the covariance matrix $\Sigma_t$. Each node at an odd level has one branch per control action. Each node at an even level has one branch per candidate measurement. The robot's state and the target's estimate are updated along the control and measurement branches using the state transition equation (1) and the Kalman filter update equation, respectively. The minimax value is computed at each leaf node and is equal to the trace of the covariance matrix at that node. These values are propagated upwards to compute the optimal strategy. Figure 2 shows an example.
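The upward propagation of minimax values can be sketched generically as follows; the expansion callbacks (`expand_controls`, `expand_measurements`) are placeholders for the state-transition and Kalman updates performed along the branches, and `trace` scores a leaf:

```python
def minimax(node, depth, expand_controls, expand_measurements, trace):
    """Minimax value of a policy-tree node.

    Control (min) levels pick the action minimizing the worst case;
    measurement (max) levels assume nature picks the worst candidate
    measurement. depth counts remaining control/measurement rounds.
    """
    if depth == 0:
        return trace(node)  # leaf: trace of the covariance matrix
    best = float("inf")
    for child in expand_controls(node):          # min level: robot's choice
        worst = max(                             # max level: nature's choice
            minimax(g, depth - 1, expand_controls, expand_measurements, trace)
            for g in expand_measurements(child))
        best = min(best, worst)
    return best
```

For instance, on a toy tree where nodes are numbers, controls add 1 or 2, and nature can double the value, the recursion returns the min over actions of the worst doubled outcome.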
IV-B Alpha Pruning
The number of nodes in a naive min-max tree is exponential in the depth of the tree (i.e., in the planning horizon). As a first step in reducing the size of the tree, we implement alpha pruning [13]. The main idea in alpha pruning is that once we have explored a part of the tree, we have an upper bound on the optimal minimax value. When exploring a new node, if we find that the minimax value of the subtree rooted at that node is greater than this upper bound, the subtree does not need to be explored further. This is because an optimal strategy will never pass through such a node: there exists a better control policy in another part of the tree. Note that the pruned node must be a control node. Measurement nodes cannot be pruned since the robot has no control over the actual measurement values. The pruning algorithm is based on the general alpha-beta pruning [13] used in adversarial games. Fig. 2 shows a simple example of a policy tree built using alpha pruning.
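Folding the cutoff into the minimax recursion gives a sketch like the following; all names are illustrative, and only control (min) branches are ever cut:

```python
def minimax_alpha(node, depth, children_min, children_max, score,
                  alpha=float("inf")):
    """Minimax with alpha pruning over the control (min) levels.

    alpha is the best (smallest) worst-case value found so far above this
    node; a measurement subtree whose running maximum already meets alpha
    cannot belong to the optimal policy and is abandoned early. Nature's
    measurement branches are otherwise enumerated in full, since the
    robot cannot choose the measurement value.
    """
    if depth == 0:
        return score(node)
    best = float("inf")
    for child in children_min(node):
        worst = -float("inf")
        for g in children_max(child):
            worst = max(worst, minimax_alpha(g, depth - 1, children_min,
                                             children_max, score,
                                             min(alpha, best)))
            if worst >= min(alpha, best):
                break  # prune: this action is already no better than the incumbent
        best = min(best, worst)
    return best
```

The pruned recursion returns the same value as full enumeration while skipping subtrees that cannot improve the incumbent.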
IV-C Algebraic Redundancy Pruning
In addition to alpha pruning, we adapt the ideas presented in [14] for linear systems to obtain even further computational savings. If there are multiple nodes at the same level with the same robot state but different target estimates, we can prune a node if it is clearly "dominated" by the others. The notion of domination encodes the property that the dominated node will never be part of an optimal (min-max) policy. Reference [14] formalized this notion by defining an algebraic redundancy constraint. We adapt this result to our notation as follows.
Theorem 1 (Algebraic Redundancy [14])
Let $\Sigma_A, \Sigma_1, \ldots, \Sigma_k$ be the covariance matrices of a set of nodes at the same level of the tree. If there exist non-negative constants $\alpha_1, \ldots, \alpha_k$ with $\sum_{i=1}^{k} \alpha_i = 1$ such that,
$\Sigma_A \succeq \sum_{i=1}^{k} \alpha_i \Sigma_i,$
then the node with covariance $\Sigma_A$ is regarded as algebraically redundant^{1} with respect to the other nodes, and it and all of its descendants can be pruned without eliminating the optimal solution from the tree. (^{1}$X \succeq Y$ represents that $X - Y$ is positive semidefinite.)
They prove that the trace of any successor of an algebraically redundant node cannot be lower than that of one of the successors of the remaining nodes. Our main insight is that a similar redundancy constraint can be defined for the state-dependent case with suitable additional constraints, as described below.
Theorem 2 (State-Dependent Algebraic Redundancy)
Let $\mathcal{S} = \{S_1, \ldots, S_k\}$ be a set of nodes at the same level. If there exists a node $A$ at the same level such that:

1. the robot states are identical, i.e., $x^A = x^{S_i}$ for all $S_i$ in $\mathcal{S}$;

2. the least common ancestor of $A$ with any other node in $\mathcal{S}$ is a control (i.e., min) node; and

3. there exist non-negative constants $\alpha_1, \ldots, \alpha_k$ with $\sum_{i=1}^{k} \alpha_i = 1$ such that:
$\Sigma^A \succeq \sum_{i=1}^{k} \alpha_i \Sigma^{S_i},$ (8)

then there exists a node in $\mathcal{S}$, say $S^*$, whose worst-case trace is no greater than that of $A$. That is, the node $A$ can be pruned from the minimax tree without eliminating the optimal policy.
The proof is presented in the appendix.
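A conservative numerical check of the redundancy condition can be sketched as follows. Verifying the full convex-combination condition requires solving a small feasibility program; the sketch below tests only the single-node combinations (all weight on one candidate), a sufficient special case of the theorem's condition:

```python
import numpy as np

def is_psd(M, tol=1e-9):
    """True if the symmetric matrix M is positive semidefinite."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

def dominated_by_one(Sigma_A, others):
    """Sufficient check for algebraic redundancy of a node.

    If Sigma_A dominates some other node's covariance in the PSD order
    (a convex combination with all weight on that node), the node with
    covariance Sigma_A can be pruned. Failure of this check does not
    rule out redundancy under a general convex combination.
    """
    return any(is_psd(Sigma_A - S) for S in others)
```

In the state-dependent setting, this check would be applied only after confirming the identical-robot-state and common-ancestor conditions of Theorem 2.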
IV-D Suboptimal Pruning Algorithm
Combining alpha pruning with state-dependent algebraic redundancy pruning, we can remove a significant number of branches from the search tree while still guaranteeing optimality. We can further reduce the number of branches at the expense of losing optimality. This can be achieved by relaxing the alpha pruning and algebraic redundancy constraints. We use two parameters, $\epsilon_1$ and $\epsilon_2$, as relaxation parameters for alpha pruning and algebraic redundancy pruning, respectively. In each case, we bound the loss in optimality as a function of the parameters.
Specifically, while building the tree, we prune away a node if it satisfies either of the following two conditions. When checking for alpha pruning, we prune a node if its alpha value is greater than or equal to the best min-max value found so far minus $\epsilon_1$. Similarly, we replace the constraint in Theorem 2 with the following:
$\Sigma^A + \epsilon_2 I \succeq \sum_{i=1}^{k} \alpha_i \Sigma^{S_i}.$ (9)
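The two relaxed tests can be sketched directly; as before, the redundancy check below covers only the single-node convex combinations, and all parameter names are illustrative:

```python
import numpy as np

def relaxed_alpha_prune(node_alpha, best_so_far, eps1):
    """Relaxed alpha-pruning test: prune when the node's alpha value is
    within eps1 of (or worse than) the best min-max value found so far."""
    return node_alpha >= best_so_far - eps1

def relaxed_redundancy(Sigma_A, others, eps2, tol=1e-9):
    """Relaxed algebraic-redundancy test (single-node special case):
    Sigma_A + eps2 * I must PSD-dominate some other node's covariance."""
    slack = Sigma_A + eps2 * np.eye(Sigma_A.shape[0])
    return any(bool(np.all(np.linalg.eigvalsh(slack - S) >= -tol))
               for S in others)
```

Larger `eps1` and `eps2` prune more aggressively, trading optimality for a smaller tree, as quantified in the error analysis below.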
By varying $\epsilon_1$ and $\epsilon_2$, we can vary the number of nodes in the search tree. Next, we bound the resulting loss in the optimality of the algorithm.
V Error analysis
In this section, we bound the value returned by the relaxed algorithm with respect to the optimal algorithm.
Theorem 3 ($\epsilon_1$-Alpha Pruning)
Let $\nu^*$ be the optimal minimax value returned by the full enumeration tree. If $\nu_{\epsilon_1}$ is the value returned by the $\epsilon_1$-alpha pruning algorithm, then
$\nu^* \le \nu_{\epsilon_1} \le \nu^* + \epsilon_1.$
The proof follows directly from the fact that if a node on the optimal policy is pruned away, then its alpha value is at least the alpha value of some other node present in the tree minus $\epsilon_1$. The alpha value of that node cannot be less than the value returned by the algorithm. The full proof is given in the appendix. The bound for algebraic redundancy pruning is more complicated.
Theorem 4 ($\epsilon_2$-State-Dependent Algebraic Redundancy)
Let $\nu^*$ be the optimal minimax value returned by the full enumeration tree of $T$ levels. If $\nu_{\epsilon_2}$ is the value returned by the $\epsilon_2$-algebraic redundancy pruning algorithm, then
$\nu^* \le \nu_{\epsilon_2} \le \nu^* + \epsilon_2 \Delta_T,$
where $\Delta_T$ is a constant depending on the system matrices and the Kalman gain, accumulated by applying the Riccati equation $r_t$ over the remaining measurement steps.
By combining the two results, we obtain a total suboptimality bound of $\epsilon_1 + \epsilon_2 \Delta_T$.
VI Simulations
In this section, we present results from simulations to evaluate our pruning techniques. We carry out three types of evaluations. First, we investigate the savings of our algorithm by comparing the number of nodes in the pruned minimax tree and the full enumeration tree. Then, we study the effect of varying the $\epsilon_1$ and $\epsilon_2$ parameters on the number of nodes. Finally, we take the control policy given by our algorithm and execute it by drawing actual measurements from a random distribution. This represents a realistic scenario where the measurements are not necessarily adversarial. We demonstrate how our strategy can be used in such a case, and compare the average-case performance with the worst-case performance.
In all simulations, the robot can choose from four actions, each of which moves the robot a constant step in one of four fixed directions. We build the tree using five candidate measurements at each step; the five values are randomly generated with Gaussian noise.
VI-A Comparing the Number of Nodes
Fig. 3 and Fig. 4 show the number of nodes left in the minimax tree after pruning as compared to a full enumeration tree. We prune a node by comparing it to the nodes already explored. More nodes are pruned if the nodes encountered first are "closer" to the optimal policy. For instance, if the first set of nodes explored happens to correspond to control laws that drive the robot close to the target, then we expect the nodes encountered later to be pruned closer to the root. To provide a fair assessment, we generate the search trees for various true positions of the target and report the average and standard deviation of the number of nodes in Fig. 4.
Fig. 4 shows that our algorithm prunes orders of magnitude of nodes from the full enumeration tree. For a tree of depth 13, enumerating all nodes is prohibitive, but the same optimal solution can be computed from a small fraction of the nodes with our pruning strategy.
Even though our algorithm prunes a large percentage of the nodes, the number of nodes still grows exponentially. By sacrificing optimality, we can prune even more nodes. We evaluate this by varying $\epsilon_1$ and $\epsilon_2$ individually first, and then jointly. As shown in Fig. 5, $\epsilon_1$-alpha pruning is relatively better at reducing the complexity of the min-max tree. This is intuitive because the alpha pruning condition compares nearly every pair of nodes at the same depth. $\epsilon_2$-algebraic redundancy pruning, on the other hand, requires more conditions (identical robot states) to be satisfied. Nevertheless, Fig. 5 shows that by sacrificing optimality, the number of nodes can be substantially reduced.
VI-B Online Execution of the Search Tree
So far, we have discussed the problem of building the minimax tree. Once the tree is built, the robot can execute the optimal policy. At the root node, the robot executes the first control action along the optimal min-max path found. Next, the robot obtains a measurement. This measurement may not correspond to the worst-case measurement. Furthermore, the actual value of the measurement may not even be among the candidate measurements in the tree, and the updated target estimate may not correspond to any node in the tree. Instead, we compute the distance between the actual measurement and each candidate and find the closest match in the candidate set. The corresponding child node becomes the new root of the tree, and the optimal policy starting at that node is executed, iteratively. Fig. 6 shows the execution in a simple instance.
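The nearest-candidate step of the execution loop can be sketched as follows; the Euclidean metric is the natural choice here, though the paper does not prescribe one:

```python
import numpy as np

def closest_candidate(actual_z, candidates):
    """Map a received measurement to the nearest candidate in the tree.

    During execution the real measurement will generally not equal any of
    the discretized candidates, so the policy follows the measurement
    branch whose candidate is closest in Euclidean distance. Returns the
    index of that branch.
    """
    dists = [np.linalg.norm(np.asarray(actual_z) - np.asarray(c))
             for c in candidates]
    return int(np.argmin(dists))
```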
VII Conclusion
We investigated the problem of devising closed-loop control policies for state-dependent target tracking. Unlike state-independent tracking, the value of a candidate control law in state-dependent target tracking is a function of the history of measurements obtained. Consequently, planning over a horizon requires taking into account all possible measurement values. In this paper, we focused on minimizing the worst-case uncertainty. Our solution consisted of building a search tree to obtain the control policy. A full enumeration tree has size exponential in the number of measurements, control laws, and the planning horizon. Instead, we exploited the structural properties of the Kalman filter to yield a tree with significantly fewer nodes without sacrificing optimality guarantees. We also showed how two parameters, $\epsilon_1$ and $\epsilon_2$, can be used to yield even more computational savings at the expense of optimality.
One disadvantage of the generalization is the need to discretize the set of possible future measurements. Our immediate future work is to bound the suboptimality as a function of the number of discrete samples chosen to represent the continuous set of future measurements. Once such a bound is obtained, the user may choose the desired trade-off between computation time and solution quality. A second avenue of future work focuses on extending these results to multi-robot, multi-target scenarios. Our prior work [17] has shown that a greedy assignment of robot trajectories to targets yields provably approximate solutions for one-step planning. We will extend this to planning over a finite horizon using the results presented in this paper.
References
 [1] Y. Bar-Shalom, X. R. Li, and T. Kirubarajan, Estimation with Applications to Tracking and Navigation: Theory, Algorithms, and Software. John Wiley & Sons, 2004.
 [2] X. R. Li and V. P. Jilkov, “Survey of maneuvering target tracking. Part I: Dynamic models,” IEEE Transactions on Aerospace and Electronic Systems, vol. 39, no. 4, pp. 1333–1364, 2003.
 [3] ——, “Survey of maneuvering target tracking. part ii: motion models of ballistic and space targets,” IEEE Transactions on Aerospace and Electronic Systems, vol. 46, no. 1, pp. 96–119, 2010.
 [4] ——, “A survey of maneuvering target tracking. Part III: Measurement models,” in Proceedings of SPIE, vol. 4473, 2001, p. 424.
 [5] ——, “A survey of maneuvering target tracking. Part IV: Decision-based methods,” in Proceedings of SPIE, vol. 4728, 2002, p. 512.
 [6] ——, “Survey of maneuvering target tracking. part v. multiplemodel methods,” IEEE Transactions on Aerospace and Electronic Systems, vol. 41, no. 4, pp. 1255–1321, 2005.
 [7] B. Rao, H. F. DurrantWhyte, and J. Sheen, “A fully decentralized multisensor system for tracking and surveillance,” The International Journal of Robotics Research, vol. 12, no. 1, pp. 20–44, 1993.
 [8] N. Karnad and V. Isler, “Modeling human motion patterns for multi-robot planning,” in Proceedings of IEEE International Conference on Robotics and Automation (ICRA), 2012, pp. 3161–3166.
 [9] M. Montemerlo, J. Pineau, N. Roy, S. Thrun, and V. Verma, “Experiences with a mobile robotic guide for the elderly,” in Proceedings of the AAAI National Conference on Artificial Intelligence, 2002, pp. 587–592.
 [10] V. Isler, N. Noori, P. Plonski, A. Renzaglia, P. Tokekar, and J. Vander Hook, “Finding and tracking targets in the wild: Algorithms and field deployments,” in International Symposium on Safety, Security, and Rescue Robotics, 2015.
 [11] P. Tokekar, E. Branson, J. Vander Hook, and V. Isler, “Tracking aquatic invaders: Autonomous robots for invasive fish,” IEEE Robotics and Automation Magazine, 2013.
 [12] P. Tokekar, J. Vander Hook, and V. Isler, “Active target localization for bearing based robotic telemetry,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2011.
 [13] S. Russell and P. Norvig, Artificial Intelligence: A modern approach. PrenticeHall, 1995.
 [14] M. P. Vitus, W. Zhang, A. Abate, J. Hu, and C. J. Tomlin, “On efficient sensor scheduling for linear dynamical systems,” Automatica, vol. 48, no. 10, pp. 2482–2493, October 2012.
 [15] N. Atanasov, J. Le Ny, K. Daniilidis, and G. J. Pappas, “Information acquisition with sensing robots: Algorithms and error bounds,” in Proceedings of IEEE International Conference on Robotics and Automation, 2014, pp. 6447–6454.
 [16] P. R. Kumar and P. Varaiya, Stochastic systems: Estimation, identification, and adaptive control. Prentice Hall, NJ, 1986, vol. 986.
 [17] P. Tokekar, V. Isler, and A. Franchi, “Multi-target visual tracking with aerial robots,” in Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems, 2014.
Appendix A Proof of Theorem 5
Theorem 5
(Monotonicity of the state-dependent Riccati equation)
If two nodes of a linear stochastic system satisfy the conditions $x^A = x^B$ and $\Sigma^A \succeq \Sigma^B$, then after applying the one-step Riccati map, we have $r_t(\Sigma^A) \succeq r_t(\Sigma^B)$,
where,
(10) 
(11) 
We define:
Note that
Then,
(12) 
Define
(13) 
So, we have,
(14) 
Proof of Theorem 2
Proof: We first prove the special case when $\mathcal{S}$ consists of only one node, $S$. That is, we have:
where, .
To prove:
1) Base case: Show that the statement holds for $k = 1$. From the Kalman Riccati map,
(15) 
2) Inductive step: Show that if the claim holds for $k$, then it also holds for $k + 1$. This can be done as follows: assume the claim holds for $k$. Based on the conditions of Theorem 2, we have,
that is,
Similar to the step (1):
thereby showing that the claim indeed holds for $k + 1$.
Since both the base case and the inductive step have been established, by mathematical induction the statement holds for all natural numbers $k$.
Then, we extend the proof from comparing two nodes to the case of arbitrarily many nodes. Suppose we have
(16) 
Without loss of generality, we assume $\Sigma^{S_1}$ is the minimum covariance matrix (in the positive semidefinite sense) among the nodes in $\mathcal{S}$.
(17) 
Using the result from the induction above, after any number of min-max tree steps, there always exists a node $S^*$ such that,
(18) 
Therefore, node $A$ can be pruned without reducing the optimality of the min-max tree.
Proof of Theorem 4
Proof: For some level $t$, suppose that we prune a node on the optimal policy. We have,
From [14], we know that:
Applying the trace operator to the above equation, we get,
Let the series of covariance matrices along the optimal min-max trajectory be given, and suppose that the sequence of covariance matrices along the trajectory returned by the $\epsilon_2$-algebraic redundancy pruning algorithm is as above. We get,
By combining the two results, we obtain the desired bound: