Universal, ContinuousDiscrete Nonlinear Yau Filtering I: Affine, Linear State Model with StateIndependent Diffusion Matrix
Abstract:
The continuousdiscrete filtering problem requires the solution of a partial differential equation known as the FokkerPlanckKolmogorov forward equation (FPKfe). In this paper, it is pointed out that for a state model with an affine, linear drift and stateindependent diffusion matrix the fundamental solution can be obtained using only linear algebra techniques. In particular, no differential equations need to be solved. Furthermore, there are no restrictions on the size of the time step size, or on the measurement model. Also discussed are important computational aspects that are crucial for potential realtime implementation for higherdimensional problems. The solution is universal in the sense that the initial distribution may be arbitrary.
1 Introduction
The problem of continuousdiscrete (continuouscontinuous) filtering is to estimate the state that is described by a continuoustime stochastic process from the observations of a related discretetime (continuoustime) stochastic process called the measurement process. The complete solution of the filtering problem, in the Bayesian sense, is given by the conditional probability density function. Given the conditional probability density we can define optimality under any criteria. Usually, the conditional mean, which is the minimum meansquares estimate, is studied. A solution is said to be universal if the initial distribution is arbitrary (see, for example, [1]).
Continuousdiscrete filtering requires the solution of a partial differential equation known as the FokkerPlanckKolmogorov forward equation (FPKfe) [2]. In general, such problems are often solved via discretization of the PDE, such as using the method of finite differences, [3, 4], spectral methods [5] and finite element methods[6]. The numerical solution of a PDE is nontrivial. In particular, there are severe time step size and grid spacing restrictions (especially for high accuracy solutions) for stability, accuracy, and consistency of the numerical method. In addition, a plain implementation places very large computational and memory requirements, especially for higher dimensional problems. It is therefore desirable to be able to exactly solve for the fundamental solution of the FPKfe, in terms of which the FPKfe can be solved.
In this paper it is pointed out that the fundamental solution of the FPKfe for the affine, linear state model with stateindependent diffusion matrix, a special case of the Yau filter (discussed below), can be computed using only linear algebra methods. In particular, the fundamental solution for such a model can be computed exactly for an arbitrary time step size without having to solve a differential equation. It is also noted that the computational requirements are considerably less than that suggested by the size of the grid space. In particular, the conditional probability density can be computed efficiently (especially, concerning memory requirements) and rapidly with the use of sparse multilinear data structures. In many applications, such as target tracking, the state model is linear and the measurement model is nonlinear. Therefore, the results in the paper are applicable to a large number of practical problems.
The Yau filter is defined to be one where state model drift a linear plus a gradient term satisfying the Beněs condition and with a time and state independent diffusion matrix [7]. In continouscontinuous filtering theory, the Yau filtering system is defined as one with an additional restriction that the measurement process is a continuoustime stochastic process with linear drift. The Yau filtering system plays a fundamental role in continuouscontinuous filtering. The reason is that it has been shown that a finitedimensional filter of maximal rank has to be a Yau filter[7]. In light of its importance, some realtime implementable solutions for the Yau filtering system have been investigated. In [8, 9], the solution of the Yau equation has been shown to be equivalent to a solution of a system of linear ODEs. The case of nonlinear observations has also been investigated in some papers. In particular, it is shown in [10] that some nonlinear observation models can also be solved via ODEs. Finally, in [11], a solution is presented in terms of a power series.
In this paper, the special case of the continuousdiscrete Yau filter, namely one with an affine, linear state model drift and stateindependent (but possibly timedependent) diffusion matrix, is studied. Note that the measurement model in the continuousdiscrete Yau case can be arbitrary, and so a more general filtering problem is solved than for the continuouscontinuous Yau case. In a following paper, it is shown that the fundamental solution of the FPKfe for another special case of the Yau filter, namely one with a gradient drift satisfying a certain condition (and also known as the Beněs model drift), can also be solved exactly using such elementary methods.
It is noted that the methods developed for solving the continuouscontinuous filtering problem quoted in the papers above can be used to solve the prediction part of the continuousdiscrete filtering problem. However, they require solution of ODEs, or a power series expression, rather than the closed form expressions derived here.
The relevant aspects of filtering theory are reviewed in Section 2. The fundamental solutions are derived for the additive noise case in Section 3 and 4. Some implementational aspects are discussed in Section 5 and related comments on computational aspects of discretediscrete filtering are presented in Section 6. An example is presented in Section 7. In Appendix A, the formula for the state transition matrix for the general timedependent case is reviewed. In Appendix B, the linear manoeuvering target tracking models reviewed in [12] are explicitly solved. Some of the results are wellknown and have been derived elsewhere, sometimes presented in a different form. Here use is made of the elegant method of [13] that enables a simple and systematic derivation of explicit formulas for arbitrary affine, linear models up to fourdimensional state space problems.
2 Review of ContinuousDiscrete Filtering Theory
2.1 Langevin Equation, the FPKfe and its Fundamental Solution
The general continuoustime state model is described by the following stochastic differential equation (SDE):
(1) 
Here and are dimensional column vectors, the diffusion vielbein is an matrix and is a dimensional column vector. The noise process is assumed to be Brownian with covariance and the quantity is termed the diffusion matrix. All functions are assumed to be sufficiently smooth. Equation 1 is also referred to as the Langevin equation. It is interpreted in the Itô sense.
Let be the initial probability distribution of the state process. Then, the evolution of the probability distribution of the state process described the Langevin equation, , is described by the FPKfe, i.e.,
(2) 
Let , and consider the following PDE:
(3) 
The quantity is known as the fundamental solution of the FPKfe. In this instance, the physical interpretation is that it is the transition probability density.
From the fundamental solution one can compute the probability at a later time for an arbitrary initial condition as follows^{1}^{1}1In this paper, all integrals are assumed to be from to , unless otherwise specified.:
(4) 
Therefore, in order to solve the FPKfe it is sufficient to solve for the transition probability density . Note that this solution is universal in the sense that the initial distribution can be arbitrary.
2.2 ContinuousDiscrete Filtering
In ths paper, it is assumed that the measurement model is described by the following discretetime stochastic process
(5) 
where , and the noise process is assumed to be a white noise process. Note that . It is assumed that is known.
Then, the universal continuousdiscrete filtering problem can be solved as follows. Let the initial distribution be and let the measurements be collected at time instants . We use the notation . Prior to incorporating the measurements, the state evolves according to the FPKfe, i.e.,
(6) 
This is the prediction step.
According to Bayes’ rule
(7) 
Now and since measurement noise is white it follows that . Hence, the ‘corrected’ conditional density at time is
(8) 
This is then the initial condition of the FPKfe for the next prediction step which results in
(9) 
and so on.
3 Fundamental Solution I: Additive Noise
In this section, the following general affine, linear state model with additive noise is considered:
(10) 
It is assumed that commutes at different times, i.e.,
(11)  
The case when does not commute with itself at different times is discussed in Appendix A. Also, the matrix notation is used here.
The state model can be written as the following Langevin equation
(12)  
where is correlated and Gaussian distributed as follows:
(13)  
The formal solution of Equation 12 is
(14) 
where the state transition matrix is the solution of the following equation:
(15) 
This can be verified as follows. The Leibniz rule is
(16) 
where are assumed to be sufficiently smooth functions of their arguments. From the Leibniz rule it therefore follows that
(17)  
Also, satisfies the semigroup property:
(18) 
which immediately implies that
(19) 
Since the matrix commutes at all times
(20) 
If the matrix is timeindependent
(21) 
Since a linear transformation of Gaussian variables is also Gaussian, is also a Gaussian random variable. Therefore, it is completely characterized by the mean vector and covariance matrix.
The mean vector is
(22)  
and the covariance matrix is
(23)  
Combining the results for and , the fundamental solution is
(24) 
Finally, observe that
(25)  
where the Leibniz rule (Equation 16) has been used in the second step. This result will be used in the following section.
4 Fundamental Solution II: StateIndependent Diffusion Matrix
In Section 3, the fundamental solution was derived for the timeindependent affine, linear state model with additive noise. In this section, it is pointed out that a similar result follows if the noise is multiplicative but with state dependent diffusion matrix. This is a straightforward generalization of the result derived in [14]; it also assumes that the diffusion matrix is timeindependent.
The state model considered here is
(26) 
The state model is assumed to be affine and linear and possibly explicitly time dependent. The diffusion matrix is assumed to be independent of state, but possibly (explicitly) time dependent.
Observe that this includes the case studied in the previous section. Since the diffusion vielbein is state dependent (or the state model noise is multiplicative), the methods used in the previous section cannot be applied to this case. This model is also more general than the Yau filter case where no explicit time dependence is assumed.
The FPKfe for the transition probability density is
(27) 
Let be the Fourier transform of with respect to , i.e.,
(28) 
Since
(29)  
it follows that
(30) 
Since is Gaussian, so is and we can choose the following ansatz for :
(31) 
Without loss of generality, . Note that if there is no explicit time dependence, the argument of and is the difference between the times, i.e., .
Therefore
(32) 
This implies that and obey the following ODEs:
(33) 
and
(34) 
From the discussion in the previous section, the solution of these firstorder ODEs are
(35) 
and
(36) 
so that
(37) 
Thus, the expression is the same as derived earlier even for the more general case (i.e., although constant diffusion vielbein implies constant diffusion matrix (for constant , constant difusion matrix does not imply constant diffusion vielbein).
5 Practical Implementation
In this section, some practical implementational aspects are discussed. Specifically, a computationally efficient filtering algorithm based the results derived in the previous sections is presented in Section 5.1. Some additional aspects are discussed in Section 5.2.
It is important to note that the transition probability is given in terms of an exponential function. This implies that is nonnegligible only in a very small region, or the transition probability density tensor is sparse. The sparsity property is crucial for storage and computation speed.
5.1 Sparse Kernel Grid Filtering Algorithm
The implementation of the continousdiscrete filtering solution exploiting the sparsity property of the transtition probability density, or kernel, is thus straightforward.

Precompute the transition probability density is given by ()
(38) 
Threshold the transition probability density and save it as a sparse tensor. That is, zero the entries corresponding to exponent larger than a certain (userspecified) threshold.

At time

The prediction step is accomplished by
(39) Note that is simply the initial condition .

The measurement at time are incorporated in the correction step via
(40)

This algorithm is referred to as the sparse grid filtering (SGF) algorithm. The thresholding step is crucial for realtime implementation.
Note that the precomputation step assumes that the grid spacing and time intervals between measurements are known.
In many applications, the measurement model is described by an additive Gaussian noise model, i.e.,
(41) 
with . Then, at observation time , the conditional density is given by
(42) 
where is given by
(43) 
and is given by
(44) 
5.2 Additional Remarks
Observe also that the fundamental solution has a simple and clear physical interpretation. Specifically, when the signal model noise is small the transition probability is significant only near trajectories satisfying the noiseless equation. The noise variance quantifies the extent to which the state may deviate from the noiseless trajectory.
The following additional observations can be made :

It is noted that some of the results derived in this paper are wellknown and used in EKF. In particular, the result of Section 3 is used to derive the discretetime version of the continuoustime state model (see, for instance, [15]).
However, the use of these results is very different from that in EKF. Specifically, the EKF linearizes the nonlinear measurement model and propagates the conditional mean and variance, assuming the prior conditional probablility density is Gaussian. Thus, the EKF fails if the nonlinearity is not benign, or if the Gaussian approximation for the prior is not valid (e.g., multimodal). In the gridbased filtering, the standard deviation computed from the conditional probability density is a reliable measure of the filter performance. This is not the case of suboptimal methods like the EKF.
In other words, the SGF uses the exact expression for the fundamental solution that is given by a Gaussian function, while the EKF make the unjustified approximation that the conditional probability density at any time is Gaussian.

The conditional probability density calculated at grid points is exact (ignoring roundff errors) at those grid points. So, an interpolated version of the fundamental solution at coarser grid is close to the actual value. This suggests that a practical way of verifying the validity of the approximation is to note if the variation in the statistics with grid spacing, such as the conditional mean, is minimal.
Observe that the time step size may be arbitrary. This means that the optimal solution (in the sense of being the correct solution) can be computed using this using a grid of sufficiently small grid spacing. However, the conditional state estimation will not be as good for large time steps, a fundamental limitation due to large measurement sample interval.

In this paper, the correction step was carried out using a uniform grid. It is clear that a much faster approach for higher dimensional problems would be to use Monte Carlo integration methods.

The prediction step computation can be speeded up considerably by restricting calculation only in areas with significant conditional probability mass (or more accurately, in the union of the region of significant probability mass of and .

Although it is possible to compute the transition probability density offline, it may be more appropriate to compute it online computing only those terms where initial points are in the region where the density is significant. It is also then possible to use an ‘adaptive grid’, both in resolution and location/domain. Observe that the onestep approximation of the path integral can be stored more compactly as it is Gaussian. Note that when the noise is small, grid spacing needs to be finer in order to adequately sample the conditional probability density.

Although the exact analytical expression of the conditional probability density is unknown, the accuracy of the gridbased method can be made very high and robust with small enough grid spacing. This situation is similar to the case of numerical solution of partial differential equations, where the exact solution is not known, but high accuracy numerical solutions (even machine precision) can still be obtained. Thus, for practical purposes “numerically exact” solutions obtained with a fine grid are optimal. Note that in the case of the PDEs, even when the exact solution is known, it often does not yield accurate numerical solutions.
6 Comments on DiscreteDiscrete Sparse Kernel Grid Filtering and Particle Filtering
6.1 DiscreteDiscrete Filtering
In Section 3, the continuoustime state process has been converted to a discretetime state sequence. This result is simple and exact because the model is linear. This is not possible for a general nonlinear state model, i.e., there is no such general formula relating the continuoustime process with its equivalent discretetime sequence. In fact, a simple, nonlinear state process is not equivalent to a simple nonlinear state sequence.
Observe that a discretetime sequence description is more general than a continuoustime process. This is because less is assumed in the specification of the state sequence (for example, no assumption on continuity, differentiability are needed). However, the fundamental dynamical laws of classical physics are expressed in terms of continuoustime ODEs. Note that a simple, nonlinear dynamical model may not result in a “simple” discretetime dynamical model.
Often, a nonlinear discretetime state process, or sequence, is studied. This can be either as a simple (e.g., Euler) approximation of a continuoustime state model, or directly via other means. In this section, the state sequence model is assumed to be given. Thus, one is led to consider the following general discretediscrete filtering problem where the signal and measurement sequences are as follows:
(45) 
Here, is a nonlinear function of the state is an independent, identically distributed (i.i.d) process noise sequence, and are dimensions of the state and process noise vectors. Similarly, is a nonlinear function, is an i.i.d. measurement noise sequance, and are dimensions of the measurement and measurement noise vectors.
The universal discretediscrete filtering problem is to evolve the conditional probability distribution of the state at time , based on the set of all available measurements , i.e., . The recursive solution is wellknown and similar to that discussed in Section 2. Thus, if is known (note is the prior, or initial pdf), then
(46) 
Note that due to the firstorder Markov property of the state process.
Observe that “discrete” here refers to the time, not the state; the states take values in the continuum ().
This solution assumes that and are known. Of course, they are defined by the state and measurement processes. In a practical situation, a closed form expression for these quantities is desired. This is indeed possible for several cases of practical interest, such as additive Gaussian noise.
Of course, no closed form expression for the conditional probability density is known for an arbitrary initial distribution. Also, such a closed form solution, if it exists, may be unsuitable for an accurate numerical evaluation.
6.2 Sparse Kernel Grid Filtering
It is usually stated that gridbased methods are computationally prohibitive when dealing with highdimensional spaces. However, the transition probability density tensor is sparse, with the sparsity determined by the grid spacing, grid size and signal model noise. Likewise, the correction due to measurements is going to be sparse. Then, the conditional probability density is significant only in a small fraction of the grid space. Therefore, the prediction step needs to be carried out in a very small region of the grid space. This implies that the memory requirements and the number of flops is considerably less than that suggested by naive counting.
The arguments in Section 5 carry over to this case as well. In particular, naïve flops and memory estimate are unduly pessimistic for excellent performance using SGF.
6.3 Some Remarks on Particle Filtering
An alternative to grid based techniques is particle filtering (see, for instance, [16]). A recursive Bayesian filter is obtained by Monte Carlo simulations. The idea is to represent the required conditional probability density by a set of random samples with associated weights and then to compute estimates based on these samples and weights. As the number of samples becomes very large, this approaches the optimal Bayesian solution. It has been shown that the performance of a PF is very good for many cases. An important aspect of particle filtering is a good choice of the proposal density. For a discretetime state process with linear measurement process, optimal proposal density known for PF. Note that PF solution is not robust for the case when the measurement model is nonlinear.
A common problem with the particle filter is the degeneracy phenomenon, where after a few iterations, all but one particle will have negligible weight. In fact, the variance of the importance weights can only increase over time, and thus, it is impossible to avoid the degeneracy phenomenon. This degeneracy implies that a large computational effort is devoted to updating particles whose contribution to the approximation to is almost zero. Methods of alleviating this problem include choosing a good importance density and resampling.
From our discussion, it follows that the sample requirements in PF are analogous to the number of grid points in SGF (which is set by the threshold). The key point is that the SGF method does not require evaluation at all grid points due to the sparsity of the conditional probability density and the transition probability density.
It is also noted that as long as the grid spacing is small enough to adequately sample the conditional probability density, there are no problems in propagating the conditional probability density.
The SGF approach is likely to be useful in providing an estimate of the number of particles needed for the particle filtering. It is an interesting exercise to compare the computational load of the SGF with that of a robust PF solution (which is problem dependent).
7 Example
In the examples, use is made of the Tensor toolbox in MATLAB developed by Bader and Kolda [17]. It has the multininear sparse tensor class, essential for SGF.
Consider the 2D coordinated turn model with with and time step size is 0.5 s. The measurement model drift is , or the angle, with noise variance . Thus, the measurement model is manifestly nonlinear function of the state variables. The transition probability density was computed in the interval with grid spacing along using the Tensor toolbox in MATLAB [18]. Note that the Courant number so that a onestep explicit scheme is unstable (an implicit scheme such as the CrankNicholson scheme is stable, but not as accurate [19]).
Finally, the sparsity of the grid is about 99% when the threshold is set to 10 (i.e., of exponent is greater than 10, the transition probability multilinear array element is set to be 0). The sparsity of tensor is crucial in enabling a realtime computation of the on a current PC. In addition, it places considerably less memeory requirements.
Figures 1 and 2 plot that the result obtained used the formula for the fundamental solution and using SGF. Also plotted are the conditional means and bounds using the conditional probability density. It is seen that the tracking performance is very good in the sense that the conditional mean is found to be within a most of the time. The root mean square errors for the two states over 100 Monte Carlo runs is found to be around 3 at . Observe that the transition probability denisty tensor needed to be computed only once.
In summary, proposed SGF method is seen to be stable and accurate even for such large time steps; the errors introuced in sprasifying the transition probability density are seen to be minimal, while the computational savings (in terms of speed and memory) are immense.
8 Conclusion
In this paper it has been shown that the continuousdiscrete filtering problem with an affine, linear state model and with state independent diffusion matrix can be solved accurately using the exact fundamental solution of the corresponding FPKfe valid for an arbitrary time step size. Unlike the continuouscontinuous case studied by Stephen Yau and collaborators, there are no restrictions on the measurement model in the continuousdiscrete case. The sparsity of the transition probability density is then exploited to demonstrate that sparse grid filtering techniques can be used to solve higher dimensional problems with considerably less computational effort. Similar comments apply to the discretediscrete filtering problem as well.
It is also interesting to note that the path integral approach can be used to derive the expression for the fundamental solution for the FPKfe for the nonlinear state model case. Results for the additive and multiplicative noise cases are derived in [20] and [21]. In fact, it has been shown that the curdest approximation of the path integral formulas leads to very accurate results (see [22]). Thus, the sparse grid filtering approach yields a unified approach to tackling the general nonlinear filtering problem. Since it enables one to focus computation in regions of significant probability mass, it offers the possibility of solving higher dimensional filtering problems in realtime.
Appendix A General Time Dependent Case
The state transition matrix is the solution of the following equation:
(47) 
The results derived in Section 3 and 4 are not valid when the timedependent matrix, , does not commute at different times (note that , i.e., it commutes at equal times).
A formal solution for the general timedependent case can be written down using the concept of timeordered product. Let be a string of timedependent matrices and let . The, the timeordered product
(48) 
In other words, the timeordered product of a string of operators (with time as its argument) is defined to be the string rearranged so that the eariler time operators are to the right of the later ones. In general, there is an ambiguity when the operators do not commute at equal times. In the case of interest, , so they commute at equal times.
Observe that under the timeordering symbol, everything commutes. Hence, a time derivative can be taken in the usual manner:
(49)  
Here the exponential of the matrix is defined via the usual power series expansion of the exponential function. In the second step, use is made of the fact that is the latest time, so that timeordering puts it on the leftist. Since is the latest time, it can be taken out of the timeordering:
(50) 
Thus, the solution of Equation 47 is
(51) 
The satisfaction of the boundary condition is obvious. Also, uniqueness of the solution follows because it is the solution of a firstorder differential equation with a given initial value.
An alternative form is the following:
(52) 
The correctness of the formula follows from observing that when the right hand side is differentiated with respect to time, each term gives the previous one times . Observe that the various factors of are timeordered, i.e., matrices on later time are at the left. In fact, equality of this expression to Equation 51 follows from the following identity: