Universal Nonlinear Filtering Using Feynman Path Integrals II: The Continuous-Continuous Model with Additive Noise

Universal Nonlinear Filtering Using Feynman Path Integrals II: The Continuous-Continuous Model with Additive Noise

Bhashyam Balaji
Radar Systems Section,
Defence Research and Development Canada, Ottawa,
3701 Carling Avenue,
Ottawa ON K1A 0Z4 Canada
Email: Bhashyam.Balaji@drdc-rddc.gc.ca

In this paper, the Feynman path integral formulation of the continuous-continuous filtering problem, a fundamental problem of applied science, is investigated for the case when the noise in the signal and measurement model is additive. It is shown that it leads to an independent and self-contained analysis and solution of the problem. A consequence of this analysis is Feynman path integral formula for the conditional probability density that manifests the underlying physics of the problem. A corollary of the path integral formula is the Yau algorithm that has been shown to be superior to all other known algorithms. The Feynman path integral formulation is shown to lead to practical and implementable algorithms. In particular, the solution of the Yau PDE is reduced to one of function computation and integration.

Fokker-Planck Equation, Kolmogorov Equation, Optimal Nonlinear Filtering, Duncan-Mortensen-Zakai equation, Yau filters

1 Introduction

1.1 Motivation

The fundamental dynamical laws of physics, both classical and quantum mechanical, are described in terms of variables continuous in time. The continuous nature of the dynamical variables has been verified at all length scales probed so far, even though the relevant dynamical variables, and the fundamental laws of physics, are very different in the microscopic and macroscopic realms. In practical situations, one often deals with macroscopic objects whose state variables are phenomenologically well-described by classical deterministic laws modified by external disturbances that can be modelled as random noise. It is therefore natural to consider the problem of the evolution of a state of a signal of interest described by a continuous-time stochastic dynamical model. Roughly speaking, the state of the system is described by a noisy version of a deterministic dynamical system termed the state model, i.e., the dynamics is governed by a system of first-order differential equations in the state variable () with an additional contribution due to noise that is random(). The noise in the state model is referred to as the signal noise. If the noise is Gaussian (or more generally multiplicative Gaussian) the state process is a Markov process. Since the process is stochastic, the state process is completely characterized by a probability density function. The Fokker-Planck-Kolmogorov foward equation (FPKfe) describes the evolution of this probability density function (or equivalently, the transition probability density function) and is the complete solution of the state evolution problem. For an excellent discussion of stochastic processes and filtering theory, see [1].

However, in many applications the signal, or state variables, cannot be directly observed. Instead, what is measured is a nonlinearly related stochastic process () called the measurement process. The measurement process can often be modelled as yet another continuous stochastic dynamical system called the measurement model. Loosely speaking, the observations, or measurements, are discrete-time samples drawn from a different system of noisy first order differential equations. The noise in the measurement dynamical system is referred to as measurement noise. For simplicity, the signal and measurement noise are often assumed to be independent.

The problem of optimal nonlinear filtering is to estimate the state of a stochastic dynamical system optimal under a certain criterion, given the observations that are samples of a related stochastic measurement process. The conditional probability density function of the state parameters given the observations is the complete solution of the filtering problem since it contains all the probabilistic information about the state process that is in the measurements and in the initial condition. This is the Bayesian approach, i.e., the a priori initial data about the signal process contained in the initial probability distribution of the state() is incorporated into the solution. Given the conditional probability density, optimality may be defined under various criterion. Usually, the conditional mean, which is the least mean-squares estimate, is studied due to its richness in results and mathematical elegance. The solution of the optimal nonlinear filtering problem is termed universal, if the initial distribution can be arbitrary.

1.2 Fundamental Sochastic Filtering Results

When the state and measurement processes are linear, the linear filtering problem was considered by Kalman and Bucy[2], [3]. The Kalman filter has been successfully applied to a large number of problems.

Neverthless, the Kalman filter suffers from some major limitations. The Kalman filter is not optimal even for the linear filtering case if the initial distribution is not Gaussian111It may still be optimal for a linear system under certain criteria, such as minimum mean square error, but not a general criterion.. In other words, the Kalman filter is not a universal optimal filter, even when the filtering problem is linear. In the general nonlinear case, the filter state is infinite dimensional, namely the whole conditional probability distribution function, and the Kalman filter cannot be optimal, although it may still work quite well if the nonlinearity is mild enough.

The continuous-continous nonlinear filtering problem (i.e., continuous-time state and measurement stochastic processes) was studied in [4, 5], [6] and [7]. This led to a stochastic differential equation, called the Kushner equation, for the conditional probability density in the continuous-continuous filtering problem. It was noted in[8], [9], and [10] that the stochastic differential equation satisfied by the unnormalized conditional probability density, called the Duncan-Mortensen-Zakai (DMZ) equation, is linear and hence considerably simpler than the Kushner equation. In [11] and [12] were derived the robust DMZ equation, a partial differential equation (PDE) that follows from the DMZ equation via a gauge transformation.

A disadvantage of the robust DMZ equation is that the coefficients depend on the measurements. Thus, one does not know the PDE to solve prior to the measurements. As a result, real-time solution is impossible. In [13], it was proved that the robust DMZ equation is equivalent to a partial differential equation that is independent of the measurements, which is referred to as the Yau Equation (YYe) in this paper (see discussion at the end of Section 2.6 for why it is being distinguished from the FPKfe). Specifically, the measurements only enter as initial condition at each measurement step. Thus, no on-line solution of a PDE is needed; all PDE computations can be done off-line. As discussed later, there are other real-time solutions possible; however, the approximation error in those algorithms are not as well controlled.

However, numerical solution of partial differential equations presents several challenges. A naïve discretization may not be convergent, i.e., the approximation error may not vanish as the grid size is reduced. Alternatively, when the discretization spacing is decreased, it may tend to a different equation, i.e., be inconsistent. Furthermore, the numerical method may be unstable. Finally, since the solution of the YYe is a probability density, it must be positive which may not be guaranteed by the discretization.

A different approach to solving the PDE was taken in [14] and [15]. An explicit expression for the fundamental solution of the YYe as an ordinary integral was derived. It was shown that the formal solution to the YYe may be written down as an ordinary, but somewhat complicated, multi-dimensional integral, with the integrand an infinite series. In addition, an estimate of the time needed for the solution to converge to the true solution was presented.

1.3 Outline of the Paper

In this paper, the Feynman path integral (FPI) formulation is employed to tackle the continuous-continuous nonlinear filtering problem. Specifically, phrasing the stochastic filtering problem in a language common in physics, the solution of the stochastic filtering problem is presented. In particular, no other result in filtering theory (such as DMZ equation, robust DMZ equation, etc) is used. The path integral formulation leads to a path integral formula for the transition probability density for the general aditive noise case. A corollary of the FPI formalism is the path integral formula for the fundamental solution of the YYe and the Yau algorithm—the central result of nonlinear filtering theory. It is noted that this paper provides detailed derivation of results that were used in [16]. In view of the wide applicability of the results derived here, this paper provides a brief summary of the central results of filtering theory and pedagogical discussion of Feynman path itegral methods that may be found to be useful by a wider audience, i.e., people specializing in different areas of applied science and engineering.

The Feynman path integral was introduced by Feynman in quantum mechanics[17, 18]. It has had a long and impressive history of results in theoretical physics. Inspired by an observation/remark of Dirac[19, 20], Feynman discovered the path integral representation of quantum physics [17, 18]. The importance of path integrals grew (in the mid 1960’s) when Fadeev and Popov used the path integrals to derive Feynman rules for the non-Abelian gauge theories. Its status was further enhanced when ’t Hooft and Veltman used path integral methods to prove that the Yang-Mills theories were consistent (i.e., renormalizable). Since then, the path integral formulation of quantum field theory has led to extraordinary insights into quantum field theory, such as renormalization and renormalization group, anomalies, skyrmions, monopoles, instantons, and supersymmetry (see, for instance, the freely available text [21]). Much of the advance in the past 40 years in particle theory would not have been possible without the Feynman path integral. The path integral has also led to numerous insights into modern mathematics, especially in the work of Edward Witten.

A path integral formulation is especially attractive from a computational point of view. An excellent example is lattice quantum chromodynamics (QCD) (see, for instance, [22]). Specifically, lattice QCD has enabled high-precision calculations of observables in the non-Abelian gauge theory of strong interactions—an infinite dimensional, highly nonlinear problem. The numerical results derived using the Feynman path integral based lattice QCD cannot be achieved by any other known technique.

The following point needs to be emphasized to readers familiar with the discussion of standard filtering theory—Feynman path integral is different from the Feynman-Kǎc path integral. In filtering theory literature, it is the Feynman-Kǎc formalism that is used. The Feynman-Kǎc formulation is a rigorous formulation and has led to several rigorous results in filtering theory. However, in spite of considerable effort it has not been proven to be useful in the development of relaible practical algorithms with desirable numerical properties. It also obscures the physics of the problem.

In contrast, it is shown that the Feynman path integral leads to formulas that are eminently suitable for numerical implementation. It also provides a simple and clear physical picture. Finally, the theoretical insights provided by the Feynman path integral are highly valuable, as evidenced by numerous examples in modern theoretical physics (see, for instance, [21]), and shall be employed in subsequent papers.

The outline of this paper is as follows. In Section 2, aspects of continuous-continuous filtering are reviewed and some of the recent work on continuous-continuous filtering is also discussed. In the following section, some topics in path integrals are reviewed and the interpretation of transition probability density as ensemble averages. For more details on the path integral methods, see any modern text on quantum field theory, such as [21], and especially [23] which discusses application of Feynman path integral to the study of stochastic processes. In Sections 4 and 5, the path integral formulas for the transition probability density are derived for the time-independent and time-dependent cases. In Section 6, the partial differential equation satisfied by the path integral formula (derived for sampled measurements) in the previous sections is derived. It is shown that in the time-dependent case the partial differential equation is simply the YYe of continuous-continuous filtering. The conclusions are presented in Section 10.

2 Continuous-Continuous Filtering and the Yau Equation

In this section, the main results of (continuous-continuous) nonlinear filtering theory are summarized. The most important results are the YYe and the Yau algorithm. In subsequent sections, it will be seen that they follow directly (and simply) from the Feynman path integral method.

2.1 The Continuous-Continuous Model

The signal and observation model in continuous-continuous filtering is the following:


Here and are , , and valued stochastic processes, respectively, and and . These are defined in the Itô sense. The Brownian processes and are assumed to be independent222Hence, the large body of work on the correlated noise case is not considered in this paper with and covariance matrices and , respectively. We denote by , a matrix. Also, is referred to as the drift, as the diffusion vielbein, and as the diffusion matrix.

In this paper, the additive noise case is considered, i.e., . Then, the Itô and Stratanovich forms are the same. In a future paper, the more general, multiplicative and correlated noise case shall be tackled.

In this section, some of the relevant work on continuous-continuous filtering is summarized. Hence, it is assumed that , and are vector-valued smooth functions, is an orthogonal matrix-valued smooth function, is a identity matrix, and and are identity matrices. No explicit time dependence is assumed in the model.

2.2 The DMZ Stochastic Differential Equation

The unnormalized conditional probability density, of the state given the observations satisfies the DMZ equation:




where is the zero-degree differential operator of multiplication by , is the probability density of the initial time , and


The DMZ equation is to be interpreted in the Stratanovich sense. Note that






2.3 The Robust DMZ Partial Differential Equation

Following [11] and [12] introduce a new unnormalized density


Under this transformation, the DMZ SDE is transformed into the following time-varying PDE


This is called the robust DMZ equation. Here is the Laplacian. The solution of a PDE when the initial condition is a delta function is called the fundamental solution.

2.4 Universal Linear and Finite Dimensional Filtering I: Lie Algebra Method and the Maximal Rank Case

The Kalman filter is not a universal optimal linear filter since it assumes that the initial distribution is Gaussian, although it may still be optimal under certain criteria such as minumum variance. In the general case, the DMZ equation needs to be considered. The use of Lie algebraic methods to solve the DMZ equation had been proposed in [24] and [25]. The Lie algebraic viewpoint has been useful in giving a deeper understanding of the DMZ equation.

The estimation algebra associated with the continuous-continuous model is the Lie algebra generated by . It is said to be of maximal rank if for any there exist constants such that are in . When , if , where is of maximal rank, then is of maximal rank. Note that the explicit recursive solution for the Beněs filtering system (see [26]) was originally derived only for the maximal rank case. In some cases, maximal rank condition is assumed in deriving the explicit solutions.

If the linear filtering system is completely reachable, controllable, and observable, the estimation algebra is dimensional with basis


The Wei-Norman approach to solve the robust DMZ equation, a time-variant PDE, is to note that the solution is of the form[27]


where satisfy a system of ODEs. Therefore, the Wei-Norman approach requires the solution of a system of ODEs, linear PDEs, and an FPKfe type of equation. However, if the linear system is not completely reachable or completely observable, the basis of the estimation algebra is not known beforehand and therefore needs to to calculated. Hence, one cannot even write down the finite system of ODEs explicitly.

2.5 Universal Linear and Finite Dimensional Filtering II: Direct Solution

A direct method for solving the universal linear filtering problem was first presented in [28]. It was shown that by a combination of a gauge transformation and a time-dependent spatial translation of the unnormalized conditional probability density, the robust DMZ equation for the linear filtering problem reduces to an on-line system of ODEs and an off-line FPKfe. This solution does not require controllability or observability assumptions, and is also universal as the initial distribution could be arbitrary. This solution was further simplified and extended in [29] and [30].

The direct solution for the Beněs case was also derived by noting that if is a vector field of a potential function , then


If , then the DMZ equation is


where is the Laplacian operator. If is a quadratic polynomial in , the semigroup generated by the differential operator is known and can be used to explicitly derive solutions to the equation when are linear in . This technique is also related to the concept of equivalence of parabolic equations (see [28] for details). The solution was further generalized and simplified to the case of a more general finite-dimensional filter (Yau filter333A Yau filter is one where the drift in the signal model is the sum of a linear term and a gradient of a function and the measurement model is linear. The function is assumed to have no quadratic form piece; else it can be absorbed into the linear term. When the gradient term vanishes, the Kalman filter results. When the linear term vanishes and the gradient term is such that is quadratic, the Beněs filter[26] is obtained. with quadratic ). In summary, they showed that if one considers defined as


then the robust DMZ equation for is reduced to a system of ODEs for , and a FPKfe-type of equation for . It has since been shown that even the off-line FPKfe-type of equation can be reduced to a system of linear ODEs [31, 32]. Note that, like the Lie algebra solution, the state and measurement models are assumed to be explicitly time-independent.

2.6 The Yau Equation

The finite dimensional filtering problem is a very restrictive class of the general nonlinear filtering problem. Recently, it was proved that the real-time solution of the general nonlinear filtering problem can be obtained reliably[13], [33]. Let be a partition of the time interval , and let the norm of the partition be defined as . If satisfies the equation


in the time interval , then the function defined as


satisfies the parabolic partial differential equation


in the same time interval. The converse of the statement is also true. In [34], it was also noted that it is sufficient to use the previous observation, i.e., satisfies Equation 15 if and only if defined as


satisfies Equation 17 in the time interval . We refer to Equations 16 (18) and Equation 17 as the post-measurement (pre-measurement) forms of the YYe.

Observe that Equation 15 is obtained by setting to in Equation 9444In contrast, note that defined for solving the finite-dimensional Yau filter in Equation 14 in the aforementioned papers is equivalent to solving the robust DMZ equation for at each time instant.. It was proved that the solution of Equation 15 approximates very well the solution of the robust DMZ equation (Equation 9), i.e., it converges to in both pointwise sense and sense. Thus, solving Equation 9 is equivalent to solving Equation 17. Finally,


Thus, the solution of the YYe (as ) is the desired unnormalized conditional probability density.

Observe that when , it is simply the FPKfe. However, unlike the FPKfe, the YYe does not satisfy the current conservation condition, i.e., the right-hand term is not a total divergence. This means that it does not conserve probability. This fundamental difference is traced to the fact that the FPKfe evolves the normalized probability density (and preserves the normalization), while the YYe evolves the unnormalized conditional probability density. Therefore, this distinction is made between the two equations in this paper.

2.7 The Yau Algorithm

We may summarize the real-time algorithm, based on both the pre- and post-measurement forms of the YYe, of Yau as follows. Suppose measurements are available at times


We seek the solution , which is the solution of the robust DMZ equation. Let the initial distribution be . Then, evolve the initial distribution to the first measurement instant, , using the YYe:


The solution of equation 21 at time is . Note that is given by


Next, solve the YYe to the next measurement instant with initial condition , i.e.,


to obtain . In fact, for , can be computed from , where satisfies the equation


The initial condition in Equation 24 follows from noting that (since )


Thus the solution of the robust DMZ equation at the th step is given by


Note that the Yau algorithm is a recursive algorithm as it does not need any previous observation data. Furthermore, the YYe is independent of data and so can be computed off-line, and that the YYe is much simpler than the robust DMZ equation. Finally, note that the output of the Yau algorithm is the desired unnormalized conditional probability density.

2.8 Other Real-Time Algorithms

It is noted that there have been several other numerical methods proposed for solving the DMZ equation such as [35, 36, 37, 38]. Some numerical examples were presented in [39]. As noted in [39], most papers on the approximate solution of the DMZ equation deal with algorithms that are not (or, at least not easily) implementable. The most well-known of thsese “implementable” methods for the general filtering problem (rather than those applicable to only problems like finite-dimensional filters) are now briefly reviewed.

Recall that the DMZ SDE for the unnormalized conditional probability density is given by


where , i.e., is the adjoint of the infinitesimal generator of the state Markov process, or the forward Fokker-Planck-Kolmogorov (FPK) operator.

The simplest scheme is the explicit one-step Euler scheme:


Some other algorithms follow from observing that Equation 28 can be considered as the combination of a linear stochastic equation and the FPKfe:


The standard idea of operator splitting for solving deterministic higher dimensional PDEs can then be applied to this case. Since the linear stochastic equation can be solved explicitly, the following two splitting-up approximations result:


Here is the operator corresponding to the evolution via the FPK forward equation (FPKfe).

The spectral separation scheme utilizes the fact that the conditional probability density can be written as a two-tensor basis with one of them obeying an observation-independent FPKfe type of equation (deterministic Hermite-Fourier coefficients) and the other a Wick polynomial that are completely defined by the observation process (see [38] for details). This separation of parameters and observations also enables real-time implementation. It can be shown that it includes the explicit Euler one-step and splitting-up approximations.

Finally, it will be seen that the path integral formulas derived here can be used to solve the FPKfe arising in these techniques. In fact, this formula already arises in the investigation of the continuous-discrete filtering problem (see [40]). Note that in the Yau algorithm, the “prediction” part is accomplished via the YYe. In contrast, in the Splitting-up method, it is done using the FPKfe, as in continuous-discrete filtering theory.

2.9 Why the Yau Algorithm is the most suitable algorithm

The following two reasons explain why the Yau algorithm is superior to all other known algorithms:

Firstly, recall that the coefficients of the robust DMZ equation (Equation 9) contain the measurements. This means that the partial differential equation to solve for continuous-continuous filtering is not even known prior to the measurements. In other words, the robust DMZ equation has to be solved on-line; its solution cannot be pre-computed. In contrast, in the YYe (Equation 17), measurements are absent from the partial differential equation. The measurements only enter the initial condition at each measurement step. This implies that the YYe can be be solved off-line; there is no need for on-line solution of PDEs555This assumes that the measurement steps are equidistant or known.. This makes real-time solution feasible even if state dimension is large. Specifically, suppose the measurements are performed at equidistant time intervals. The set of all possible initial conditions may be expressed as a linear combination of basis functions. Then, one may precompute and store the solution to the YYe for the basis functions. Upon arrival of measurements, the initial condition incorporating the measurements is computed. This is projected onto the basis functions to ascertain the components. The solution (due to linearity of the PDE), till the next measurement, is then simply the sum of the computed off-line solutions for the basis functions.

Secondly, all other known algorithms assume the signal and measurement model drift were bounded. In fact, error bounds derived for these methods depended (often exponentially) on the bounds so that there are severe time-step restrictions. Note that these results cannot even cover the Kalman filter (i.e., linear drift) case, which is a major limitation in solving practical problems. In contrast, the results of [13] and [41] assume only that the drifts are nonlinear with linear growth. The errors are independent of the bounds on the signal or measurement model drift. In [33], this is further weakened: is is sufficient that the growth of the observation drift be greater than the growth of the signal model drift.

It will be seen that the Feynman path integral approach naturally leads to the Yau algorithm. In that sense, the Feynman path integral formulation gives a physical explanation why the Yau algorithm is the natural choice. In addition, the path integral formula explicitly solves the PDE.

2.10 Integral Formula for the Solution of the YYe

A solution of the YYe in terms of ordinary integrals was presented in [14], [15]. The formal solution derived is an infinite series in (for details see the reference [15]). In addition, an estimate of the time interval on which this solution converges was presented. For the purposes of this paper, it is sufficient to note that it has better numerical properties (than solving PDEs) since integration can be computed reliably, as opposed to partial differential equations. However, the integrand is rather complicated (e.g., convolutions) and it does not appear to be easily implementable for a general model. We shall see that the path integral formula is considerably simpler to evaluate.

3 Preliminaries

In this section, a brief review of relevant aspects of functional methods common in theoretical physics is presented. For more details on these topics, the reader is referred to [23].

3.1 A Brief Review of Functional Calculus

The infinite dimensional generalization of a function is called a functional, , which gives a number for each function . Loosely speaking, the functional derivative may be defined analogous to the ordinary derivative as follows:


Alternatively, the functional derivative, , of the functional with respect to variation of the function at is defined by the following equation


Note that the functional derivative of a functional is also a functional. Functional differentiation obeys the standard algebraic rules (linearity and Leibnitz’s rule):


Just as the derivative of with respect to is 1, the functional derivative of with respect to is the unit matrix in infinite dimensions, namely the delta function:


The functional derivatives of a formal power series in function can be seen to be analogous to the ordinary derivative of a power series in the variable . Specifically, it is straightforward to verify that if




In particular, a functional representable as a power series in functions, like the exponential functional, can be functionally differentiated using this result. Thus


The functional delta function may be written as


In the application at hand, determinants of operators (matrices) arise in Gaussian integration in the infinite (finite) dimesional case. In particular, the operator needs to be evaluated for the problem at hand, for some operator . It is assumed that traces of all powers of exist. From the identity


it follows that


Note that if , only the first term is nonzero.

Finally, recall the the sequence of identities from multivariable calculus


Then, it follows that




Using the Fourier integral representation of the delta function, it is clear that


The functional analogue of Equation 43 is


Finally, the Gaussian integral identity




3.2 Langevin Equation and Gaussian Measure

In the path integral formulation, as in physics literature, the continuous-continuous model is interpreted as


Such equations are called Langevin equations. Heuristically,


Gaussian noise process is represented by a path integral measure


where is a real vector for each .

It is straightforward to verify that the mean is zero


and the variance is 666This is also consistent with the previous discussion in terms of Itô calculus, i.e., implies that (54) :


and higher-order moments can be easily written down using Wick’s theorem for bosonic (commuting) variables. The results are in accord with the expectation that the measure represents a Gaussian process with these first two moments. Finally, implicit in the use of these formal technques is the use of the Feynman convention, or symmetric discretization for the drift.

3.3 Transition Probability Density and Signal and Measurement Ensemble Averages

Consider an ensemble of systems with state variables described by the Langevin equation. Due to the random noise, each system leads to a different vector that depends on time. Although only one realization of the stochastic process is ever observed, it is meaningful to speak about an ensemble average. For fixed times , the probability density of finding the random vector in the (dimensional)interval is given by


where is an dimensional column vector and denotes averaging with respect to the signal model noise . The complete information on the random vector is contained in the infinite hierarchy of such probability densities. The quantity of interest here is the conditional probability density


Now the process described by the Langevin equation with correlated Langevin force is a Markov process, i.e., the conditional probability density depends only on the value at the immediate previous time:


Hence the complete information for a Markov process is contained in the transition probabilities