Mutual Information-Based Planning for Informative Windowed Forecasting of Continuous-Time Linear Systems

Mutual Information-Based Planning for Informative Windowed Forecasting of Continuous-Time Linear Systems


This paper presents expression of mutual information that defines the information gain in planning of sensing resources, when the goal is to reduce the forecast uncertainty of some quantities of interest and the system dynamics is described as a continuous-time linear system. The method extends the smoother approach in [5] to handle more general notion of verification entity - continuous sequence of variables over some finite time window in the future. The expression of mutual information for this windowed forecasting case is derived and quantified, taking advantage of underlying conditional independence structure and utilizing the fixed-interval smoothing formula with correlated noises. Two numerical examples on (a) simplified weather forecasting with moving verification paths, and (b) sensor network scheduling for tracking of multiple moving targets are considered for validation of the proposed approach.

kaist]Han-Lim Choi

291 Daehak-ro, Yuseong, Deajeon 305-701, Republic of Korea. 

Key words:  Mutual information, Windowed forecasting, Informative planning, Continuous-time system


11footnotetext: Corresponding author; Tel: +82-42-350-3727; Fax: +82-42-350-3710; E-mail:

1 Introduction

Planning on utilization of sensing resources to gather information out of environment has been spotlighted in many contexts, the objective of this planning often being uncertainty reduction of some entities of interest – termed verification entities herein. Mutual information has been one of the most popular metrics adopted to define/represent this objective for various context: tracking of kinematic variables of moving targets by measurement along mobile sensor trajectories [11, 12], weather forecast improvement over some region of interest in the future with UAV sensor networks [7, 6, 5], prediction accuracy in spatially distributed field described by Gaussian processes [14], informative management of deployed fixed sensor networks [23, 8], adaptive landmark selection in simultaneous localization and mapping of mobile robots [15], and Bayesian belief propagation over the grid-based search space [13].

While many of these mutual information-based planning studies have dealt with the case where the verification time is same or just one time-step further of the planning horizon, there is a class of problem termed informative forecasting that takes particular care for the case where the verification time is significantly greater than the mission horizon. Although less popular in the literature, the informative forecasting problem can handle applications such as (i) adaptive sampling in the context of numerical weather prediction considers design of sensor networks deployed in the near future (e.g., in 24 hours) while the goal is to improve forecast in the far future (e.g., 3-5 days later), and (ii) prediction of indoor contaminant distribution in some future time with wireless indoor sensor networks taken over short period of time. The present author has presented methods to efficiently but correctly quantify the mutual information in this context of informative forecasting for discrete selection case [7], discrete constrained path design [6], and continuous trajectory planning[5], taking advantage of underlying properties of mutual information.

This paper extends the approach in [5] in that a more general notion of verification quantities is introduced. For some applications, it may make more sense to reduce uncertainty in the entities of interest over some finite window of time instead of a single particular time instance (for example, weather forecast over the weekend). The smoother form in [5] cannot directly be used for this windowed forecasting case, because the mutual information between two continuous random processes (as opposed to one finite-dimensional random vector and one random process) needs to be calculated. This paper presents a formula for the mutual information for this windowed forecasting that is indeed quite similar to the form in [5], while the only difference is in the process of calculating the conditional initial covariance conditioned on the verification entity. An optimal-control based method for fixed-interval Kalman smoothing with correlated noise in[18] is adopted for this calculation. Two numerical examples are presented to validate the proposed method for the cases with (i) time-varying verification entity, and (ii) high-order differentiable verification entity. While a preliminary work [4] proposed a relevant concept of the windowed forecasting, this article presents elaborated and corrected theoretical results for a more general problem setting and also provides much more sophisticated numerical case studies.

2 Problem Description

2.1 Continuous-Time Linear System Model

Consider the dynamics of objects/environment of interest with a finite dimensional state vector that is described by the following linear (time-varying) system:


where is a zero-mean Gaussian process noise with , which is independent of . The prime sign () denotes the transpose of a matrix. The initial condition of the state, is normally distributed as .

The system (2.1) is observed by sensors with additive Gaussian noise and admits the following measurement model for :


where is zero-mean Gaussian with , which is independent of and . Also, a measurement history over the time window is defined as

Definition 1.

The verification variables are a possibly time-varying linear combination of the state variables whose uncertainty reduction is of interest:


with termed as verification matrix, which is assumed to be differentiable herein. A continuous sequence of (time-varying) verification variables, termed verification path, is also defined as:


2.2 Informative Pointwise Forecasting

One case of interest is when the verification entity is the verification variables at some fixed verification time, . In this case, the informative forecasting problem can be written as the following optimization:


with some , where denoted the mutual information between two random quantities and (e.g. random variables, random processes, random functions), which represents entropy reduction of by knowledge of (or equivalently, entropy reduction of by  [9]. Thus, (IPF) finds the (continuous) measurement sequence over that is expected to result in largest reduction of entropy in .

Exploiting conditional independence, we proposed an expression for the mutual information, , as the difference between the unconditioned and the conditioned mutual information for a filtering problem [5]:


With (6), the smoother form of the mutual information for forecasting is derived as


with and , where stands for of a positive definite matrix. The matrices , , and are determined by the following matrix differential equations:


with initial conditions , and . The conditional initial covariance can be calculated in advance by a fixed-point smoothing process, or simply by

where is the state transition matrix from to , which becomes for the time-invariance case.

In [5], we demonstrated that the smoother form is preferred to the filter form, which explicitly calculates the prior and the posterior entropies of by integrating the Lypunov and the Riccati equation over , in terms of the computational efficiency and accessibility to on-the-fly knowledge of information accumulation.

2.3 Informative Windowed Forecasting

This paper newly considers a more general version of the informative forecasting problem where the entity of interest is the verification path over some time window . This generalized problem can be written as:


with . This formulation allows for handling more diverse types of sensing missions such as to weather forecast over the coming weekend, better predicting behaviors of some targets of interest between 9am and noon, and so on. Note also that because the verification matrix is allowed to be time-varying, forecasting along a given path can be dealt with (e.g., weather forecasting along my itinerary in the weekend).

However, the generalization (IWF) gives rise to challenges in quantifying the mutual information in the objective function. For discrete-time representation, in which and can be represented by finite dimensional random vectors, the similar generalization as in (IWF) would not incur any additional difficulty in computation of mutual information other than computational cost due to the increased dimension of the verification entity. Thus, quantification and optimization methods developed for the discrete-time counterpart of (IPF) can trivially extended for the discrete-time counterpart of (IWF).

In contrast, for continuous-time representation considered herein, the objective term in (IWF) is mutual information between two continuous-time random processes, while that in (IPF) is mutual information between a finite-dimensional random vector and a continuous-time random process. If the mutual information is computed as a difference between the prior and the posterior entropy of one of the two random processes, a mechanism to calculate an entropy of a continuous random process is needed. Although there have been researches on the calculation of entropy of a continuous random process [19, 10], these approaches were to statistically estimate the entropy value from experimentally-obtained time series, and thus are not suitable for quantifying the information by a future measurement that is not taken yet at the decision time.

3 Main Results

3.1 Mutual Information for Windowed Forecasting

As the present author presented in [5], the mutual information in (IPF), , can be expressed in terms of unconditioned and conditional entropies of finite-dimensional random vectors, exploiting the popular conditional independence between the future and the past given the present. For linear Gaussian cases, these entropy terms are represented by functions of covariance matrices. Inspired by this observation, this paper provides an expression for the mutual information in (IWF) as a function of covariance matrices for some finite-dimensional random vectors as follows.

Proposition 1.

If unconditioned and conditioned covariances of the initial state vector, and , are available. The mutual information can be obtained as:


with and . The matrices , , and are determined by integrating the following matrix differential equations from time 0 to :


The similar conditional independence exploited to derive the smoother form for (IPF) can also be used for (IWF), because the verification variables of a future time window is conditionally independent of the (past) measurement sequence , conditioned on the (current) state variables . Thus, we have:

which is derived using the fact that due to the conditional independence. Notice that the first term in the left-hand side is identical to that in the expression for (IPF). The second term represents the difference between two conditional entropies, and . Since the conditional distribution of a Gaussian vector conditioned on some Gaussian random process is still Gaussian, these two entropy expressions can be represented by of the corresponding covariance matrices:

where and . Note that the smoother form for (IPF) utilized the (symmetric) two-filter approach to fixed-interval smoothing in [21] to express the conditional covariance in terms of , , and . The key insight [21] has identified is: the information matrix for the fixed interval smoothing, i.e., , consists of the information from the past measurement, i.e., , and the information from the future (fictitious) measurement, i.e., in this context, minus the double-counted information from the underlying dynamics, i.e., , thus yielding Notice that this key insight in fixed-interval smoothing still holds even in case the future measurement (of ) is taken over a finite window . Thus, we have

and thus the only term that did not appear in (IPF) is (or equivalently, ). This quantity this conditional covariance (or inverse covariance) can be obtained by integrating a Lyapunov-like equation in the same form as (9):

if the respective initial condition is available as assumed in this proposition. ∎

Notice that the only difference from the IPF case is in the conditional covariance on the future verification entity; the modification from IPF amounts to calculation of the initial conditional covariance. Thus, once is computed, the generalization in IWF does not incur additional complexity. However, calculation of this conditional covariance requires a sophisticated procedure, which is detailed in section 3.2, in order to deal with noise-freeness of (fictitious) measurement of taken over finite time interval.

Corollary 1.

In case the verification variables are the whole state variables, i.e., , the mutual information for (IWF) is reduced to that for (IPF):

where . This can be shown as follows. The state history over can be decomposed as . Notice that for any , it is conditionally independent of conditioned on ; thus, . Together with the chain rule of mutual information [9], this yields

3.2 Calculation of Initial Conditioned Covariance

The key difference between the mutual information for the windowed forecasting is that the initial conditional covariance needs to be calculated conditioned on a random process rather than a finite-dimensional random vector. A typical way of calculating conditioned initial covariance is to pose a fixed-point smoothing problem with the fixed-point of interest being the initial time. But, note that for the windowed forecasting case, (1) plays a role of the measurement equation and thus there is no sensing noise in the measurement process. This lack of sensing noise prevents direct implementation of a conventional Kalman smoothing method (such as state augmentation) for computation of .

In this work, is computed by the following three-step procedure:

  1. Calculate that is prior state covariance at by integrating the Lyapunov equation:


    over with given initial condition .

  2. Calculate ; this step is not trivial and section 3.2.1 presents detailed derivation and procedure.

  3. Compute from by backward integrating the Lyapunov-like equation:

    from to , coupled with (15\@footnotemark\@footnotetextThe information form in (13) with (12) can equivalently be used..

3.2.1 Calculation of

Note that with perfect measurement of over some time interval, its time derivative can also be obtained. For example, gives ; sensing of over is equivalent to sensing of and over . If is again differentiable, i.e., , higher-order differentiation can also be reconstructed by repeatedly taking derivatives until the derivative contains white noise component. This observation was first identified by [2] in the context of linear filtering with colored measurement noise and extended to a linear smoothing case [18].

Suppose that is -times differentiable, i.e., -th derivative of contains white noise while lower-order derivatives do not contain white noise. Then,

where and

Note in this case that sensing of is equivalent to sensing of .

Assumption 1.

The verification variables are assumed to satisfy:

  1. is either zero or full rank. Specifically, the rank is zero for and is for .

  2. is time-invariant over .

The first assumption can be easily relaxed by decomposing the verification variables into multiple sets depending on the order of differentiability. The second assumption readily holds for time-invariant . For time-varying , it does not hold in general; but, in this case the verification window can be decomposed into a sequence of multiple sub-windows.

Proposition 2.

The conditional covariance is given by:


where and are obtained by a system of matrix differential equations:



and the boundary conditions are given as:

where .


See Appendix A

Remark 1.

The system of matrix differential equations in (17) and (18) is a two-point boundary value problem for which boundary conditions for are given at the initial time, , and those for are given at the final time, . However, since the Riccati equation for is decoupled from the equation in (18), the system of equations can be solved in two steps:

  1. The Riccati equation in (17) is integrated forward over to obtain ,

  2. The system of equations (17) and (18) are integrated together backwards to obtain .

3.3 On-the-fly Information and Mutual Information Rate

One benefit of exploiting conditional independence in computing mutual information is that it facilitates access to on-the-fly knowledge of how much information has been gathered and in what rate information is being gathered. Extending the development in [5] for IPF, on-the-fly information quantities can be obtained for IWF. For arbitrary time , the mutual information accumulated up to is computed by:

Also, the rate of change of this on-the-fly information is obtained as


where . The derivation is straightforward from the proof of mutual information rate for informative point-wise forecasting given in [5, Proposition 4]. The mutual information rate in (19) is particularly useful in order to visualize information distribution over space, because for many cases the observation matrix is a function of the sensor location and thus the mutual information rate represents how high rate information can be obtained if sensing a certain location.

4 Numerical Examples

4.1 Idealized Weather Forecasting Along Moving Paths

The first example addresses design of continuous sensing trajectory for weather forecast improvement, when the simplified weather dynamic is described as the two-dimensional Lorenz-2003 model [17, 3]. The system equations of are

where for . The subscript denotes the west-to-eastern grid index, while denotes the south-to-north grid index; represents some scalar meteorological variable at -th grid. The boundary conditions of and , in advection terms, are applied to model the mid-latitude area of the northern hemisphere as an annulus. The parameter values are and . The size of grid corresponds to 347 km 347 km in real distance, and unit time in this model is equivalent to 5 days in real time. The overall system is tracked by a nonlinear estimation scheme, specifically an ensemble square-root filter (EnSRF) [22] data assimilation scheme, that incorporates measurements from a fixed observation network of size 186.

The path planning problem is posed for a linearized model over some local region (therefore, ) in the entire grid space. A linear time-invariant model is obtained by deriving the Jacobian matrix of the dynamics around the nonlinear estimate for ’s at the grid points in the local region. Thus, the state vector represents the perturbation of ’s from the ensemble mean:

where represents the location vector of -th grid point, and denotes the perturbation of the Lorenz variable. The prior covariance of the state, is provided EnSRF data assimilation scheme.

The continuous trajectory planning allows for sensing at an off-gird point. To represent measurement at an off-grid point, squared-exponential kernel function often used in Kriging[krig] and Gaussian process regression is adopted; observation variable at some location is expressed as a linear combination of the state:


with , where and is the -th element of the matrix . The length-scale of is used in this example.

The goal of planning is to design a 6-hr flight path ( 6 hrs) for a single UAV sensor platform to improve forecast over two moving verification object of interest between 60hrs and 84hrs; the two objects are to move one grid (km) over this verification window. The motion of the sensor platform is described as 2-dimensional holonomic motion:


with constant speed grid/hr (km/hr); the path angle is the control variable to optimize on; along the trajectory the sensor continuously measure in (20) with additive Gaussian noise with .

The verification path is assumed to be a straight line starting at some point and ending at another point, each of which corresponds to some linear combination of the state variables.

where and are the verification matrices for and , respectively. This allows for computing the time derivative of verification matrix:

which will be needed to construct matrix in section 3.2.

Four different planning strategies are compared: (a) optimal IWF trajectory, (b) gradient-ascent steering with IWF information potential field, (c) optimal trajectory for IPF problem with and , and (d) gradient-ascent steering associated with IPF mutual information. To obtain the optimal solution, the control is parameterized as a piece-wise linear function of time with 6 segments and the 36 straightline paths with different flight path angles are considered as initial guess to the optimization; for solving resulting nonlinear programs, TOMLAB/SNOPT [20] is used. For the two-dimensional holonomic sensor motion in (21), the gradient-ascent steering law is obtained by taking the partial derivative of the information rate (in 19 for IWF), which is a function of spatially-dependent matrix, with respect to the spatial coordinates:

where is an -dimensional column vector with (and is defined similarly.)

Fig. 2 illustrates snapshots of trajectories of the four planning strategies overlaid on the information potential field associated with the optimal IWF solution. Two verification paths are depicted with red triangles while the shape of triangle is aligned with the direction of movement of the verification paths. In the potential field, a dark part is more informative than brighter one; note that his potential field differs from that of the IWF gradient ascent solution except for the initial time, as in (19) depends on the sensing choice up to . Although not depicted, in this particular example, the IPF potential field looks substantially different from the IWF field in that the lower information peak around does not exist for the IPF case. That may be the reason the IPF-based trajectories move left up while the IWF-based trajectories start moving right down. Fig. 2 illustrates information accumulation along the four trajectories. Notice that the optimal IWF solution might not provide the most information earlier in the trajectory, but results in gathering most amount of information in the end.

Fig. 1: Trajectories by different strategies overlaid with information potential field for the optimal IWF trajectory

Fig. 2: Information accumulation along different sensing trajectories

4.2 Sensor Network Management for Target Tracking

The second example considers management of fixed sensor networks deployed in a two-dimensional space for tracking moving targets. There are targets and each target’s motion is represented by the Singer model [16], i.e., kinematic model with first-order Markov diffusion in acceleration:


where the superscript denotes -th target and is used in this work. The state vector is defined as and its initial covariance is computed from an extended Kalman filter run with randomly chosen measurements.

The sensor network consists of sensors that can measure the pseudo-range between themselves and the targets. The linearized pseudo-range measurement sensing model is given by with

where is the location vector of -th sensor, is the nominal position of the -th target; this linearized model is obtained by calculating the Jacobian of the pseudo-range, around ; is used in this example.

The sensor management problem is to determine which sensors (out of ) to turn on to best track the targets; the total planning horizon is dividend into intervals within which the same set of sensors are turned on to make continuous observation of the targets. To goal of this management is to reduce uncertainty in velocities of the targets over the verification window. Problem parameters in this example are as follows: , , , , , , , and . The verification matrix in this case is given by where denotes the Kronecker product.

With the system described in (22), the verification variables are twice differentiable, in other words, perfect measurement of the velocity over some finite window is equivalent to perfect sensing of velocity and acceleration at and noisy sensing of the jerk. For this reason, the initial value of the Riccati matrix for the smoothing, i.e., has non-zero elements in columns and rows corresponding only to the position variables.

Remark 2.

Note that the position dynamics does not contain white noise, so does the velocity dynamics. Therefore, by taking perfect measurement of position over , the velocity and the acceleration can also be determined perfectly; in other words, if with perfect position information over some finite time window, the entire state can be reconstructed. Therefore, although the problem is formulated as an informative windowed forecasting, this is equivalent to the case in Corollary 1: .

Fig. 3: Optimal sensor schedules overlaid on information field

Fig. 3 illustrates the optimal scheduling of sensor networks overlaid with the information potential calculated at the start time of the respective time period. It can be found that the optimal selection for each period does not necessarily selects the point with the highes potential point as the selection is made for the best combination of two sensors that is most informative for predicting the future. Table 1 summarizes the mutual information value acquired during each decision period; the same quantities for the cases when target positions or accelerations are the verification entity are also shown for comparison. In this particular example, the selected sensor sequences are not significantly different for different choices of verification entity (in particular for the first two decision periods).

Case 0 to 1 1 to 2 2 to 3 3 to 4
Pos. 6.9507 6.1875 6.0211 5.9962
Vel. 2.6486 2.7414 2.6148 2.4116
Acc. 0.0384 0.0531 0.0527 0.0335
Table 1: Information gains per decision period

Observe that the mutual information value is greatest for the position case and smallest for the acceleration case. This can be well explained on the basis of commutativity of mutual information: . Namely, informativeness of measurement to predict verification entity is equivalent to informativeness of the verification entity to predict the measurement. With the kinematic model in this example, knowing future position over finite window means knowing every kinematic variables (likewise, knowing velocity means knowing acceleration as well); thus, position is more informative than velocity, and acceleration is nearly non-informative as it is corrupted by white noise.

5 Conclusions

This paper has presented a formula of mutual information in the informative forecasting problem that is to reduce uncertainty in the future verification variables over finite time window. While underlying conditional independence relation has allowed for exploiting the smoother form of mutual information, fixed-interval smoothing with perfect measurement is utilized to compute the conditional initial covariance matrix that provides initial conditions for required matrix differential equations. Numerical examples highlighting temporal variation and differentiability of the verification entity have demonstrated the validity of the proposed approach.


This work was supported in part by AFOSR grant (FA9550-12-1-0313), and in part by the KI Project via KI for Design of Complex Systems.

A Proof of Proposition 2

This section proves Proposition 2 by solving an optimal control problem equivalent to the fixed-interval smoothing problem to compute . While the key concept in the procedure is adopted from [18, 1], extended expression for high-order differentiable is newly detailed herein.

As [18, 1] pointed out, a fixed-interval smoothing problem can be viewed as an optimal control problem to find out optimal (continuous) sequence of process noise over the interval and the optimal initial state under the constraint of measurements obtained over the interval. Therefore, the smoothing problem in section 3.2 can equivalently be written as an optimal control problem with associated adjoint variables: