Mutual InformationBased Planning for Informative Windowed Forecasting of ContinuousTime Linear Systems
Abstract
This paper presents expression of mutual information that defines the information gain in planning of sensing resources, when the goal is to reduce the forecast uncertainty of some quantities of interest and the system dynamics is described as a continuoustime linear system. The method extends the smoother approach in [5] to handle more general notion of verification entity  continuous sequence of variables over some finite time window in the future. The expression of mutual information for this windowed forecasting case is derived and quantified, taking advantage of underlying conditional independence structure and utilizing the fixedinterval smoothing formula with correlated noises. Two numerical examples on (a) simplified weather forecasting with moving verification paths, and (b) sensor network scheduling for tracking of multiple moving targets are considered for validation of the proposed approach.
kaist]HanLim Choi
291 Daehakro, Yuseong, Deajeon 305701, Republic of Korea.
Key words: Mutual information, Windowed forecasting, Informative planning, Continuoustime system
1 Introduction
Planning on utilization of sensing resources to gather information out of environment has been spotlighted in many contexts, the objective of this planning often being uncertainty reduction of some entities of interest – termed verification entities herein. Mutual information has been one of the most popular metrics adopted to define/represent this objective for various context: tracking of kinematic variables of moving targets by measurement along mobile sensor trajectories [11, 12], weather forecast improvement over some region of interest in the future with UAV sensor networks [7, 6, 5], prediction accuracy in spatially distributed field described by Gaussian processes [14], informative management of deployed fixed sensor networks [23, 8], adaptive landmark selection in simultaneous localization and mapping of mobile robots [15], and Bayesian belief propagation over the gridbased search space [13].
While many of these mutual informationbased planning studies have dealt with the case where the verification time is same or just one timestep further of the planning horizon, there is a class of problem termed informative forecasting that takes particular care for the case where the verification time is significantly greater than the mission horizon. Although less popular in the literature, the informative forecasting problem can handle applications such as (i) adaptive sampling in the context of numerical weather prediction considers design of sensor networks deployed in the near future (e.g., in 24 hours) while the goal is to improve forecast in the far future (e.g., 35 days later), and (ii) prediction of indoor contaminant distribution in some future time with wireless indoor sensor networks taken over short period of time. The present author has presented methods to efficiently but correctly quantify the mutual information in this context of informative forecasting for discrete selection case [7], discrete constrained path design [6], and continuous trajectory planning[5], taking advantage of underlying properties of mutual information.
This paper extends the approach in [5] in that a more general notion of verification quantities is introduced. For some applications, it may make more sense to reduce uncertainty in the entities of interest over some finite window of time instead of a single particular time instance (for example, weather forecast over the weekend). The smoother form in [5] cannot directly be used for this windowed forecasting case, because the mutual information between two continuous random processes (as opposed to one finitedimensional random vector and one random process) needs to be calculated. This paper presents a formula for the mutual information for this windowed forecasting that is indeed quite similar to the form in [5], while the only difference is in the process of calculating the conditional initial covariance conditioned on the verification entity. An optimalcontrol based method for fixedinterval Kalman smoothing with correlated noise in[18] is adopted for this calculation. Two numerical examples are presented to validate the proposed method for the cases with (i) timevarying verification entity, and (ii) highorder differentiable verification entity. While a preliminary work [4] proposed a relevant concept of the windowed forecasting, this article presents elaborated and corrected theoretical results for a more general problem setting and also provides much more sophisticated numerical case studies.
2 Problem Description
2.1 ContinuousTime Linear System Model
Consider the dynamics of objects/environment of interest with a finite dimensional state vector that is described by the following linear (timevarying) system:
(1) 
where is a zeromean Gaussian process noise with , which is independent of . The prime sign () denotes the transpose of a matrix. The initial condition of the state, is normally distributed as .
The system (2.1) is observed by sensors with additive Gaussian noise and admits the following measurement model for :
(2) 
where is zeromean Gaussian with , which is independent of and . Also, a measurement history over the time window is defined as
(3) 
Definition 1.
The verification variables are a possibly timevarying linear combination of the state variables whose uncertainty reduction is of interest:
(4) 
with termed as verification matrix, which is assumed to be differentiable herein. A continuous sequence of (timevarying) verification variables, termed verification path, is also defined as:
(5) 
2.2 Informative Pointwise Forecasting
One case of interest is when the verification entity is the verification variables at some fixed verification time, . In this case, the informative forecasting problem can be written as the following optimization:
(IPF) 
with some , where denoted the mutual information between two random quantities and (e.g. random variables, random processes, random functions), which represents entropy reduction of by knowledge of (or equivalently, entropy reduction of by [9]. Thus, (IPF) finds the (continuous) measurement sequence over that is expected to result in largest reduction of entropy in .
Exploiting conditional independence, we proposed an expression for the mutual information, , as the difference between the unconditioned and the conditioned mutual information for a filtering problem [5]:
(6) 
With (6), the smoother form of the mutual information for forecasting is derived as
(7) 
with and , where stands for of a positive definite matrix. The matrices , , and are determined by the following matrix differential equations:
(8)  
(9)  
(10) 
with initial conditions , and . The conditional initial covariance can be calculated in advance by a fixedpoint smoothing process, or simply by
where is the state transition matrix from to , which becomes for the timeinvariance case.
In [5], we demonstrated that the smoother form is preferred to the filter form, which explicitly calculates the prior and the posterior entropies of by integrating the Lypunov and the Riccati equation over , in terms of the computational efficiency and accessibility to onthefly knowledge of information accumulation.
2.3 Informative Windowed Forecasting
This paper newly considers a more general version of the informative forecasting problem where the entity of interest is the verification path over some time window . This generalized problem can be written as:
(IWF) 
with . This formulation allows for handling more diverse types of sensing missions such as to weather forecast over the coming weekend, better predicting behaviors of some targets of interest between 9am and noon, and so on. Note also that because the verification matrix is allowed to be timevarying, forecasting along a given path can be dealt with (e.g., weather forecasting along my itinerary in the weekend).
However, the generalization (IWF) gives rise to challenges in quantifying the mutual information in the objective function. For discretetime representation, in which and can be represented by finite dimensional random vectors, the similar generalization as in (IWF) would not incur any additional difficulty in computation of mutual information other than computational cost due to the increased dimension of the verification entity. Thus, quantification and optimization methods developed for the discretetime counterpart of (IPF) can trivially extended for the discretetime counterpart of (IWF).
In contrast, for continuoustime representation considered herein, the objective term in (IWF) is mutual information between two continuoustime random processes, while that in (IPF) is mutual information between a finitedimensional random vector and a continuoustime random process. If the mutual information is computed as a difference between the prior and the posterior entropy of one of the two random processes, a mechanism to calculate an entropy of a continuous random process is needed. Although there have been researches on the calculation of entropy of a continuous random process [19, 10], these approaches were to statistically estimate the entropy value from experimentallyobtained time series, and thus are not suitable for quantifying the information by a future measurement that is not taken yet at the decision time.
3 Main Results
3.1 Mutual Information for Windowed Forecasting
As the present author presented in [5], the mutual information in (IPF), , can be expressed in terms of unconditioned and conditional entropies of finitedimensional random vectors, exploiting the popular conditional independence between the future and the past given the present. For linear Gaussian cases, these entropy terms are represented by functions of covariance matrices. Inspired by this observation, this paper provides an expression for the mutual information in (IWF) as a function of covariance matrices for some finitedimensional random vectors as follows.
Proposition 1.
If unconditioned and conditioned covariances of the initial state vector, and , are available. The mutual information can be obtained as:
(11) 
with and . The matrices , , and are determined by integrating the following matrix differential equations from time 0 to :
(12)  
(13)  
(14) 
Proof.
The similar conditional independence exploited to derive the smoother form for (IPF) can also be used for (IWF), because the verification variables of a future time window is conditionally independent of the (past) measurement sequence , conditioned on the (current) state variables . Thus, we have:
which is derived using the fact that due to the conditional independence. Notice that the first term in the lefthand side is identical to that in the expression for (IPF). The second term represents the difference between two conditional entropies, and . Since the conditional distribution of a Gaussian vector conditioned on some Gaussian random process is still Gaussian, these two entropy expressions can be represented by of the corresponding covariance matrices:
where and . Note that the smoother form for (IPF) utilized the (symmetric) twofilter approach to fixedinterval smoothing in [21] to express the conditional covariance in terms of , , and . The key insight [21] has identified is: the information matrix for the fixed interval smoothing, i.e., , consists of the information from the past measurement, i.e., , and the information from the future (fictitious) measurement, i.e., in this context, minus the doublecounted information from the underlying dynamics, i.e., , thus yielding Notice that this key insight in fixedinterval smoothing still holds even in case the future measurement (of ) is taken over a finite window . Thus, we have
and thus the only term that did not appear in (IPF) is (or equivalently, ). This quantity this conditional covariance (or inverse covariance) can be obtained by integrating a Lyapunovlike equation in the same form as (9):
if the respective initial condition is available as assumed in this proposition. ∎
Notice that the only difference from the IPF case is in the conditional covariance on the future verification entity; the modification from IPF amounts to calculation of the initial conditional covariance. Thus, once is computed, the generalization in IWF does not incur additional complexity. However, calculation of this conditional covariance requires a sophisticated procedure, which is detailed in section 3.2, in order to deal with noisefreeness of (fictitious) measurement of taken over finite time interval.
Corollary 1.
In case the verification variables are the whole state variables, i.e., , the mutual information for (IWF) is reduced to that for (IPF):
where . This can be shown as follows. The state history over can be decomposed as . Notice that for any , it is conditionally independent of conditioned on ; thus, . Together with the chain rule of mutual information [9], this yields
3.2 Calculation of Initial Conditioned Covariance
The key difference between the mutual information for the windowed forecasting is that the initial conditional covariance needs to be calculated conditioned on a random process rather than a finitedimensional random vector. A typical way of calculating conditioned initial covariance is to pose a fixedpoint smoothing problem with the fixedpoint of interest being the initial time. But, note that for the windowed forecasting case, (1) plays a role of the measurement equation and thus there is no sensing noise in the measurement process. This lack of sensing noise prevents direct implementation of a conventional Kalman smoothing method (such as state augmentation) for computation of .
In this work, is computed by the following threestep procedure:

Calculate that is prior state covariance at by integrating the Lyapunov equation:
(15) over with given initial condition .

Calculate ; this step is not trivial and section 3.2.1 presents detailed derivation and procedure.
3.2.1 Calculation of
Note that with perfect measurement of over some time interval, its time derivative can also be obtained. For example, gives ; sensing of over is equivalent to sensing of and over . If is again differentiable, i.e., , higherorder differentiation can also be reconstructed by repeatedly taking derivatives until the derivative contains white noise component. This observation was first identified by [2] in the context of linear filtering with colored measurement noise and extended to a linear smoothing case [18].
Suppose that is times differentiable, i.e., th derivative of contains white noise while lowerorder derivatives do not contain white noise. Then,
where and
Note in this case that sensing of is equivalent to sensing of .
Assumption 1.
The verification variables are assumed to satisfy:

is either zero or full rank. Specifically, the rank is zero for and is for .

is timeinvariant over .
The first assumption can be easily relaxed by decomposing the verification variables into multiple sets depending on the order of differentiability. The second assumption readily holds for timeinvariant . For timevarying , it does not hold in general; but, in this case the verification window can be decomposed into a sequence of multiple subwindows.
Proposition 2.
The conditional covariance is given by:
(16) 
where and are obtained by a system of matrix differential equations:
(17)  
(18)  
where
and the boundary conditions are given as:
where .
Proof.
See Appendix A ∎
Remark 1.
The system of matrix differential equations in (17) and (18) is a twopoint boundary value problem for which boundary conditions for are given at the initial time, , and those for are given at the final time, . However, since the Riccati equation for is decoupled from the equation in (18), the system of equations can be solved in two steps:

The Riccati equation in (17) is integrated forward over to obtain ,
3.3 Onthefly Information and Mutual Information Rate
One benefit of exploiting conditional independence in computing mutual information is that it facilitates access to onthefly knowledge of how much information has been gathered and in what rate information is being gathered. Extending the development in [5] for IPF, onthefly information quantities can be obtained for IWF. For arbitrary time , the mutual information accumulated up to is computed by:
Also, the rate of change of this onthefly information is obtained as
(19) 
where . The derivation is straightforward from the proof of mutual information rate for informative pointwise forecasting given in [5, Proposition 4]. The mutual information rate in (19) is particularly useful in order to visualize information distribution over space, because for many cases the observation matrix is a function of the sensor location and thus the mutual information rate represents how high rate information can be obtained if sensing a certain location.
4 Numerical Examples
4.1 Idealized Weather Forecasting Along Moving Paths
The first example addresses design of continuous sensing trajectory for weather forecast improvement, when the simplified weather dynamic is described as the twodimensional Lorenz2003 model [17, 3]. The system equations of are
where for . The subscript denotes the westtoeastern grid index, while denotes the southtonorth grid index; represents some scalar meteorological variable at th grid. The boundary conditions of and , in advection terms, are applied to model the midlatitude area of the northern hemisphere as an annulus. The parameter values are and . The size of grid corresponds to 347 km 347 km in real distance, and unit time in this model is equivalent to 5 days in real time. The overall system is tracked by a nonlinear estimation scheme, specifically an ensemble squareroot filter (EnSRF) [22] data assimilation scheme, that incorporates measurements from a fixed observation network of size 186.
The path planning problem is posed for a linearized model over some local region (therefore, ) in the entire grid space. A linear timeinvariant model is obtained by deriving the Jacobian matrix of the dynamics around the nonlinear estimate for ’s at the grid points in the local region. Thus, the state vector represents the perturbation of ’s from the ensemble mean:
where represents the location vector of th grid point, and denotes the perturbation of the Lorenz variable. The prior covariance of the state, is provided EnSRF data assimilation scheme.
The continuous trajectory planning allows for sensing at an offgird point. To represent measurement at an offgrid point, squaredexponential kernel function often used in Kriging[krig] and Gaussian process regression is adopted; observation variable at some location is expressed as a linear combination of the state:
(20) 
with , where and is the th element of the matrix . The lengthscale of is used in this example.
The goal of planning is to design a 6hr flight path ( 6 hrs) for a single UAV sensor platform to improve forecast over two moving verification object of interest between 60hrs and 84hrs; the two objects are to move one grid (km) over this verification window. The motion of the sensor platform is described as 2dimensional holonomic motion:
(21) 
with constant speed grid/hr (km/hr); the path angle is the control variable to optimize on; along the trajectory the sensor continuously measure in (20) with additive Gaussian noise with .
The verification path is assumed to be a straight line starting at some point and ending at another point, each of which corresponds to some linear combination of the state variables.
where and are the verification matrices for and , respectively. This allows for computing the time derivative of verification matrix:
which will be needed to construct matrix in section 3.2.
Four different planning strategies are compared: (a) optimal IWF trajectory, (b) gradientascent steering with IWF information potential field, (c) optimal trajectory for IPF problem with and , and (d) gradientascent steering associated with IPF mutual information. To obtain the optimal solution, the control is parameterized as a piecewise linear function of time with 6 segments and the 36 straightline paths with different flight path angles are considered as initial guess to the optimization; for solving resulting nonlinear programs, TOMLAB/SNOPT [20] is used. For the twodimensional holonomic sensor motion in (21), the gradientascent steering law is obtained by taking the partial derivative of the information rate (in 19 for IWF), which is a function of spatiallydependent matrix, with respect to the spatial coordinates:
where is an dimensional column vector with (and is defined similarly.)
Fig. 2 illustrates snapshots of trajectories of the four planning strategies overlaid on the information potential field associated with the optimal IWF solution. Two verification paths are depicted with red triangles while the shape of triangle is aligned with the direction of movement of the verification paths. In the potential field, a dark part is more informative than brighter one; note that his potential field differs from that of the IWF gradient ascent solution except for the initial time, as in (19) depends on the sensing choice up to . Although not depicted, in this particular example, the IPF potential field looks substantially different from the IWF field in that the lower information peak around does not exist for the IPF case. That may be the reason the IPFbased trajectories move left up while the IWFbased trajectories start moving right down. Fig. 2 illustrates information accumulation along the four trajectories. Notice that the optimal IWF solution might not provide the most information earlier in the trajectory, but results in gathering most amount of information in the end.
4.2 Sensor Network Management for Target Tracking
The second example considers management of fixed sensor networks deployed in a twodimensional space for tracking moving targets. There are targets and each target’s motion is represented by the Singer model [16], i.e., kinematic model with firstorder Markov diffusion in acceleration:
(22) 
where the superscript denotes th target and is used in this work. The state vector is defined as and its initial covariance is computed from an extended Kalman filter run with randomly chosen measurements.
The sensor network consists of sensors that can measure the pseudorange between themselves and the targets. The linearized pseudorange measurement sensing model is given by with
where is the location vector of th sensor, is the nominal position of the th target; this linearized model is obtained by calculating the Jacobian of the pseudorange, around ; is used in this example.
The sensor management problem is to determine which sensors (out of ) to turn on to best track the targets; the total planning horizon is dividend into intervals within which the same set of sensors are turned on to make continuous observation of the targets. To goal of this management is to reduce uncertainty in velocities of the targets over the verification window. Problem parameters in this example are as follows: , , , , , , , and . The verification matrix in this case is given by where denotes the Kronecker product.
With the system described in (22), the verification variables are twice differentiable, in other words, perfect measurement of the velocity over some finite window is equivalent to perfect sensing of velocity and acceleration at and noisy sensing of the jerk. For this reason, the initial value of the Riccati matrix for the smoothing, i.e., has nonzero elements in columns and rows corresponding only to the position variables.
Remark 2.
Note that the position dynamics does not contain white noise, so does the velocity dynamics. Therefore, by taking perfect measurement of position over , the velocity and the acceleration can also be determined perfectly; in other words, if with perfect position information over some finite time window, the entire state can be reconstructed. Therefore, although the problem is formulated as an informative windowed forecasting, this is equivalent to the case in Corollary 1: .
Fig. 3 illustrates the optimal scheduling of sensor networks overlaid with the information potential calculated at the start time of the respective time period. It can be found that the optimal selection for each period does not necessarily selects the point with the highes potential point as the selection is made for the best combination of two sensors that is most informative for predicting the future. Table 1 summarizes the mutual information value acquired during each decision period; the same quantities for the cases when target positions or accelerations are the verification entity are also shown for comparison. In this particular example, the selected sensor sequences are not significantly different for different choices of verification entity (in particular for the first two decision periods).
Case  0 to 1  1 to 2  2 to 3  3 to 4 

Pos.  6.9507  6.1875  6.0211  5.9962 
Vel.  2.6486  2.7414  2.6148  2.4116 
Acc.  0.0384  0.0531  0.0527  0.0335 
Observe that the mutual information value is greatest for the position case and smallest for the acceleration case. This can be well explained on the basis of commutativity of mutual information: . Namely, informativeness of measurement to predict verification entity is equivalent to informativeness of the verification entity to predict the measurement. With the kinematic model in this example, knowing future position over finite window means knowing every kinematic variables (likewise, knowing velocity means knowing acceleration as well); thus, position is more informative than velocity, and acceleration is nearly noninformative as it is corrupted by white noise.
5 Conclusions
This paper has presented a formula of mutual information in the informative forecasting problem that is to reduce uncertainty in the future verification variables over finite time window. While underlying conditional independence relation has allowed for exploiting the smoother form of mutual information, fixedinterval smoothing with perfect measurement is utilized to compute the conditional initial covariance matrix that provides initial conditions for required matrix differential equations. Numerical examples highlighting temporal variation and differentiability of the verification entity have demonstrated the validity of the proposed approach.
Acknowledgements
This work was supported in part by AFOSR grant (FA95501210313), and in part by the KI Project via KI for Design of Complex Systems.
A Proof of Proposition 2
This section proves Proposition 2 by solving an optimal control problem equivalent to the fixedinterval smoothing problem to compute . While the key concept in the procedure is adopted from [18, 1], extended expression for highorder differentiable is newly detailed herein.
As [18, 1] pointed out, a fixedinterval smoothing problem can be viewed as an optimal control problem to find out optimal (continuous) sequence of process noise over the interval and the optimal initial state under the constraint of measurements obtained over the interval. Therefore, the smoothing problem in section 3.2 can equivalently be written as an optimal control problem with associated adjoint variables: