State Sensitivity Evaluation within UD based Array Covariance Filters
This technical note addresses the UD factorization based Kalman filtering (KF) algorithms. Using this important class of numerically stable KF schemes, we extend its functionality and develop an elegant and simple method for computation of sensitivities of the system state to unknown parameters required in a variety of applications. For instance, it can be used for efficient calculations in sensitivity analysis and in gradient-search optimization algorithms for the maximum likelihood estimation. The new theory presented in this technical note is a solution to the problem formulated by Bierman et al. in , which has been open since 1990s. As in the cited paper, our method avoids the standard approach based on the conventional KF (and its derivatives with respect to unknown system parameters) with its inherent numerical instabilities and, hence, improves the robustness of computations against roundoff errors.
Linear discrete-time stochastic state-space models, with associated Kalman filter (KF), have been extensively used in practice. Application of the KF assumes a complete a priori knowledge of the state-space model parameters, which is a rare case. As mentioned in , the classical way of solving the problem of uncertain parameters is to use adaptive filters where the model parameters are estimated together with the dynamic state. This requires determination of sensitivities of the system state to unknown parameters. Other applications with similar requirements arise, for instance, in the field of optimal input design  etc.
Straight forward differentiation of the KF equations is a direct approach to compute the state sensitivities to unknown parameters. This leads to a set of vector equations, known as the filter sensitivity equations and a set of matrix equations, known as the Riccati-type sensitivity equations. The main disadvantage of the standard approach is the problem of numerical instability of the conventional KF (see, for instance, discussion in ). The alternative approach can be found in, so-called, square-root (SR) algorithms developed to deal with the problem of the numerical instability. Among all existing implementation methods the array SR filters are currently the most preferable for practical implementations. Such methods have the property of better conditioning and reduced dynamical range. They also imply utilization of numerically stable orthogonal transformations for each recursion step that leads to more robust computations (see  for an extended explanation). Since the appearance of the KF in 1960s a number of array SR filters have been developed for the recursive KF  and smoothing formulas . Recently, some array SR methods have been designed for the estimation in . Here, we deal with the array UD factorization based filters; see . They represent an important class of numerically stable KF implementation methods.
For the problem of the sensitivities computation, the following question arises: would it be possible to update the sensitivity equations in terms of the variables that appear naturally in the mentioned numerically favored KF algorithms? The first attempt to answer this question belongs to Bierman et al. In  the authors proposed an elegant method that naturally extends the square-root information filter (SRIF), developed by Dyer and McReynolds , on the case of the log likelihood gradient (Log LG) evaluation. Later this approach was generalized to the class of covariance-type filters in . However, the problem of utilizing the UD based KF algorithms to generate the required quantities has been open since 1990s. In this technical note we propose and justify an elegant and simple solution to this problem. More precisely, we present a new theory that equips any array UD based KF algorithm with a means for simultaneous computation of derivatives of the filter variables with respect to unknown system parameters. As in , this avoids implementation of the conventional KF (and its direct differentiation with respect to unknown system parameters) because of its inherent numerical instability and, hence, we improve the robustness of computations against roundoff errors. The new results can be used, e.g., for efficient calculations in sensitivity analysis and in gradient-search optimization algorithms for the maximum likelihood estimation of unknown system parameters.
2Problem statement and the Conventional Kalman Filter
the discrete-time linear stochastic system
where and are, respectively, the state and the measurement vectors; is a discrete time, i.e. means . The process noise, , and the measurement noise, , are Gaussian white-noise processes, with covariance matrices and , respectively. All random variables have known mean values, which we can take without loss of generality to be zero. The noises , and the initial state are taken from mutually independent Gaussian distributions.
The associated KF yields the linear least-square estimate, , of the state vector given the measurements that can be computed as follows :
where , are innovations of the KF, and , , . The error covariance matrix
satisfies the difference Riccati equation
In practice, the matrices characterizing the dynamic model are often known up to certain parameters. Hence, we move to a more complicated problem. Assume that system (Equation 1)–( ?) is parameterized by a vector of unknown system parameters that needs to be estimated. This means that the entries of the matrices , , , , and are functions of . For the sake of simplicity we will suppress the corresponding notations below, i.e instead of , , etc. we will write , , etc.
Solving the parameter estimation problem by the method of maximum likelihood requires maximization of the likelihood function (LF) with respect to unknown system parameters. It is often done by using the gradient approach
3UD Based Array Covariance Filter
Notations to be used: Let denotes a diagonal matrix; and are, respectively, unit upper and lower triangular matrices; and are, respectively, strictly upper and lower triangular matrices. We use Cholesky decomposition of the form , where is an upper triangular matrix. For convenience we will write , and implies the partial derivative of the matrix with respect to the th component of (we assume that the entries of are differentiable functions of a parameter ). Besides, we use modified Cholesky decomposition of the form .
The first UD based filter was developed by Bierman . Later, Jover and Kailath  have presented the advantageous array form for the UD filters. In this technical note we use the UD based array covariance filter (UD-aCF) from . First, one have to set the initial values , and use the modified Cholesky decomposition to compute the factors , , . Then, we recursively update the as follows (): given a pair of the pre-arrays
apply the modified weighted Gram-Schmidt (MWGS) orthogonalization  of the columns of with respect to the weighting matrix to obtain a pair of the post-arrays
where , , is the MWGS transformation that produces the block upper triangular matrix and diagonal matrix .
The state estimate can be computed as follows:
where , .
Instead of the conventional KF, which is known to be numerically unstable, we wish to utilize stable UD-aCF filter presented above to compute the Log LF:
where is -step measurement history and the innovations, , , are generated by the discrete-time KF.
One can easily obtain the expression for Log LF (Equation 5) in terms of the UD-aCF variables:
Let denote the vector of parameters with respect to which the likelihood function is to be differentiated. Then from (Equation 6), we have
Taking into account that the matrix is diagonal and using Jacobi’s formula, we obtain the expression for the Log LG evaluation in terms of the UD-aCF variables ():
Our goal is to compute Log LF (Equation 6) and Log LG (Equation 7) by using the UD-aCF variables. As can be seen, the elements and are readily available from UD based filter (Equation 2)–(Equation 4). Hence, our aim is to explain how the last two terms, i.e. and , , can be computed using quantities available from the UD-aCF algorithm.
In this section, we present a simple and convenient technique that naturally augments any array UD based filter for computing derivatives of the filter variables. To begin constructing the method, we note that each iteration of the UD based implementation has the following form: given a pair of the pre-arrays , compute a pair of the post-arrays by means of the MWGS orthogonalization, i.e.
where , and is the MWGS transformation that produces the block upper triangular matrix . The diagonal matrices , satisfy and (see  for an extended explanation).
For the sake of simplicity we transpose the first equation in (Equation 8) to obtain . As shown in , the matrix can be represented in the form of where is the matrix with orthonormal columns, i.e. , and is an identity matrix. Next, we define and note that . The mentioned matrices , exist since the , are invertible. Indeed, the diagonal matrix is a positive definite matrix, i.e. , and satisfies (see  for further details).
Multiplication of both sides of the equality by the matrix and, then, their differentiation with respect to yield
Now, we consider the product in (Equation 9). For any diagonal matrix , we have , and .
Taking into account that , we further obtain
Next, we study the last term in (Equation 10). First, taking into account that is a diagonal matrix, we derive
Thus, the above formula and an equality yield
Furthermore, we show that the term required in (Equation 10) is a skew symmetric matrix. For that, we differentiate both sides of the formula with respect to and arrive at , or in the equivalent form
The latter implies that the matrix is skew symmetric and can be presented as a difference of two matrices, i.e. where is a strictly upper triangular matrix.
Next, from and , we derive . Further multiplication of both sides of (Equation 12) by yields
Now, let us discuss equation (Equation 13) in details. We note that the term in (Equation 13) is a full matrix. Hence, it can be represented in the form of where , and are, respectively, strictly lower triangular, diagonal and strictly upper triangular parts of . Next, the matrix product in (Equation 13) is a symmetric matrix and, hence, it has the form where and are, respectively, diagonal and strictly upper triangular parts of . Thus, equation (Equation 13) can be represented in the following form:
Next, we note that the left-hand side matrix in (Equation 14), i.e. the matrix , is a strictly lower triangular (since is a unit lower triangular matrix). Hence, the matrix on the right-hand side of (Equation 13) should be also a strictly lower triangular matrix. In other words, the strictly upper triangular and diagonal parts of the matrix on the right-hand side of (Equation 14) should be zero. Hence, the formulas , imply that
Clearly, the first equation in (Equation 15) is exactly the second formula in ( ?). Eventually, the substitution of both formulas in (Equation 15) into (Equation 14) validates the first relation in ( ?). More precisely, it results in , and the latter formula means that where stands for . This completes the proof of Lemma ?.
We see that the proposed computational procedure utilizes only the pre-arrays , , their derivatives with respect to the unknown system parameters, the post-arrays , and the MWGS orthogonal transformation in order to compute the derivatives of the post-arrays.
5UD Filter Sensitivity Evaluation
Further, we suggest a general computation scheme that naturally extends any array UD based filtering algorithm to the above-mentioned derivative evaluation. We stress that our method allows the filter and Riccati-type sensitivity equations to be updated in terms of stable array UD filters.
5.1Summary of Computations. Extended UD based KF scheme.
6.1Simple Test Problem
First, we would like to check our theoretical derivations presented in Lemma 1. To do so, we apply the proposed UD based computational scheme to the following simple test problem.
Since the pre-arrays and are fixed by ( ?), we skip Steps 0–3 in the above-presented computational scheme. We also skip steps 8 and 17, because we do not need to compute the state estimate (and its derivatives) in this simple test problem. Besides, the unknown parameter is a scalar value, i.e. . Next, we show the detailed explanation of only one iteration step, i.e. we set in Step 4. The obtained results are summarized in Table 1. As follows from (Equation 8), we have . Thus, the derivatives of both sides of the latter formula must also agree. We compute the norm . This confirms the correctness of the calculation and the above theoretical derivations presented in Lemma 1.
6.2Application to the Maximum Likelihood Estimation of Unknown System Parameters
The computational approach presented in this paper can be used for the efficient evaluation of the Log LF and its gradient required in gradient-search optimization algorithms for the maximum likelihood estimation of unknown system parameters. To demonstrate this, we apply the proposed UD based algorithm to the problem from aeronautical equipment engineering.
In our simulation experiment we assume that the true value of is . We compute the negative Log LF and its gradient by the proposed array UD based scheme and then compare the results to those produced by the classical, i.e. the conventional KF approach. The computations are done on the interval with the step . The outcomes of these experiments are illustrated by Fig. ?. As can be seen from Fig. ?, all algorithms for the gradient evaluation produce exactly the same result and give the same zero point. Besides, the zero point coincides with the minimum point of the negative Log LF; see Fig. ?. Furthermore, it is readily seen that the obtained maximum likelihood estimate, , coincides with the true value . All these evidences substantiate our theoretical derivations in Sections ?, ?.
6.3Ill-Conditioned Test Problems
Although we learnt from the previous examples that both methods, i.e. the conventional KF approach and the proposed array UD based scheme, produce exactly the same results, numerically they no longer agree. To illustrate this, we consider the following ill-conditioned test problem.
When , Example ? coincides with well-known ill-conditioned test problem from . The difficulty to be explored is in the matrix inversion. As can be seen, although , for any fixed value of the parameter , the matrices and are ill-conditioned in machine precision, i.e. as . This leads to a failure of the conventional KF approach and, as a result, destroys the entire computational scheme. So, we plan to observe and compare performance of the “conventional KF technique” and our new “array UD based approach” in such a situation. Additionally, we compare these techniques with the “array SR based approach” presented in . Also, we have to stress that Example ? represents numerical difficulties only for the covariance-type implementations. As a result, the SRIF based method  can not be used in our comparative study because of its information-type. This follows from  where numerical insights of various KF implementations are analyzed and discussed at large.
In order to judge the quality of each above-mentioned computational technique, we conduct the following set of numerical experiments. Given the “true” value of the parameter , say , the system is simulated for samples for various values of the problem conditioning parameter . Then, we use the generated data to solve the inverse problem, i.e. to compute the maximum likelihood estimates by the above-mentioned three different approaches. The same gradient-based optimization method with the same initial value of is applied in all estimators. More precisely, the optimization method utilizes the negative Log LF and its gradient that are calculated by the examined techniques. All methods are run in MATLAB with the same precision (64-bit floating point). We performed Monte Carlo simulations and report the posterior mean for , the root mean squared error (RMSE) and the mean absolute percentage error (MAPE).
Having carefully analyzed the obtained results presented in Table ?, we conclude the following. When , all the considered approaches produce exactly the same result. The posterior means from all the estimators are all close to the “true” value, which is equal to . The RMSE and MAPE are equally small. Hence, we conclude that all three methods work similarly well in this well-conditioned situation. However, as , which corresponds to growing ill-conditioning, the conventional KF approach degrades much faster than the array UD and SR based approaches. Already for , the RMSE [MAPE] of the classical technique (i.e. the “differentiated” KF) is times [ times] greater that of the RMSE [MAPE] from the array UD and SR methods. For , the output generated by the “differentiated” KF is of no sense because of huge errors. On the other hand, the new array UD based scheme and the previously proposed array SR method work robustly, i.e. with small errors, until , inclusively. Besides, we may note that both the array UD and SR approaches work equally well. Indeed, the RMSE and MAPE of these techniques are approximately the same. Moreover, their results change slightly for the above ’s (see the last two panels in Table ?).
In conclusion, the “differentiated” KF performs markedly worse compared to the proposed array UD scheme (and previously known array SR approach). Moreover, both the UD and SR based techniques work equally well on the considered test problems. This creates a strong argument for their practical applications. However, further investigation and comparative study have to be performed to provide a rigorous theoretical and numerical analysis of the existing methods for the filter sensitivity computations. These are open questions for a future research. Here, we just mention that the derived UD based method requires utilization of the MWGS orthogonal transformation while the array SR based approach uses any orthogonal rotation. Thus, the latter technique is more flexible in practice, but might produce less accurate answers for some problems because of its flexibility.
- More about computational aspects of maximum likelihood estimation, different gradient-based nonlinear programming methods and their applicability to maximum likelihood estimation could be found in .
- G. J. Bierman, M. R. Belzer, J. S. Vandergraft, and D. W. Porter, “Maximum likelihood estimation using square root information filters,” IEEE Trans. Automat. Contr., vol. 35, no. 12, pp. 1293–1298, Dec. 1990.
- S. Särkkä and A. Nummenmaa, “Recursive noise adaptive kalman filtering by variational bayesian approximation,” IEEE Trans. Automat. Contr., vol. 54, no. 3, pp. 596–600, Mar. 2009.
- R. K. Mehra, “Optimal input signals for parameter estimation in dynamic systems – survey and new results,” IEEE Trans. Automat. Contr., vol. AC-19, no. 6, pp. 753–768, Dec. 1974.
- N. K. Gupta and R. K. Mehra, “Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations,” IEEE Trans. Automat. Contr., vol. AC-19, no. 6, pp. 774–783, Dec. 1974.
- M. Verhaegen and P. Van Dooren, “Numerical aspects of different kalman filter implementations,” IEEE Trans. Automat. Contr., vol. AC-31, no. 10, pp. 907–917, Oct. 1986.
- T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation.1em plus 0.5em minus 0.4emNew Jersey: Prentice Hall, 2000.
- M. Grewal and A. Andrews, Kalman filtering: theory and practice. 1em plus 0.5em minus 0.4emNew Jersey: Prentice Hall, 2001.
- P. G. Kaminski, A. E. Bryson, and S. F. Schmidt, “Discrete square-root filtering: a survey of current techniques,” IEEE Trans. Automat. Contr., vol. AC-16, no. 6, pp. 727–735, Dec. 1971.
- M. Morf, G. Sidhu, and T. Kailath, “Some new algorithms for recursive estimation in constant, linear discrete-time systems,” IEEE Trans. Automat. Contr., vol. AC-19, no. 4, pp. 315–323, Aug. 1974.
- M. Morf and T. Kailath, “Square-root algorithms for least-squares estimation,” IEEE Trans. Automat. Contr., vol. AC-20, no. 4, pp. 487–497, Aug. 1975.
- P. Park and T. Kailath, “New square-root algorithms for alman filtering,” IEEE Trans. Automat. Contr., vol. 40, no. 5, pp. 895–899, May 1995.
- A. H. Sayed and T. Kailath, “Extended handrasekhar recursion,” IEEE Trans. Automat. Contr., vol. AC-39, no. 3, pp. 619–622, Mar. 1994.
- P. Park and T. Kailath, “Square-root ryson-razier smoothing algorithm,” IEEE Trans. Automat. Contr., vol. 40, no. 4, pp. 761–766, Apr. 1995.
- ——, “Square-root smoothing algorithm,” Int. J. Control, vol. 62, no. 5, pp. 1049–1060, Nov. 1995.
- ——, “New square-root smoothing algorithms,” IEEE Trans. Automat. Contr., vol. 41, no. 5, pp. 727–732, May 1996.
- R. G. Gibbs, “Square root modified ryson-razier smoother,” IEEE Trans. Automat. Contr., vol. 56, no. 2, pp. 452–456, Feb. 2011.
- B. Hassibi, T. Kailath, and A. H. Sayed, “Array algorithms for estimation,” IEEE Trans. Automat. Contr., vol. 45, no. 4, pp. 702–706, Apr. 2000.
- G. J. Bierman, Factorization Methods For Discrete Sequential Estimation.1em plus 0.5em minus 0.4emNew York: Academic Press, 1977.
- P. Dyer and S. McReynolds, “Extensions of square root filtering to include process noise,” J. Opt. Theory Appl., vol. 3, no. 6, pp. 444–459, Jun. 1969.
- M. V. Kulikova, “Likelihood gradient evaluation using square-root covariance filters,” IEEE Trans. Automat. Contr., vol. 54, no. 3, pp. 646–651, Mar. 2009.
- J. M. Jover and T. Kailath, “A parallel architecture for kalman filter measurement update and parameter estimation,” Automatica, vol. 22, no. 1, pp. 43–57, 1986.
- A. Björck “Solving least squares problems by orthogonalization,” BIT, vol. 7, pp. 1–21, 1967.
- C. Broxmeyer, Inertial Navigation Systems.1em plus 0.5em minus 0.4emBoston: McGraw-Hill Book Company, 1956.