
Analysis, detection and correction of misspecified discrete time state space models

Salima El Kolei ENSAI, UBL, Campus de Ker Lann, rue Blaise Pascal, BP 37203, 35172 Bruz cedex - France  and  Frédéric Patras Université Côte d’Azur
CNRS, UMR 7351
Parc Valrose
06108 NICE Cedex 2 - France
September 24, 2019
Abstract.

Misspecifications (i.e. errors on the parameters) of state space models lead to incorrect inference of the hidden states. This paper studies weakly nonlinear state space models with additive Gaussian noises and proposes a method for detecting and correcting misspecifications. The latter induce a biased estimator of the hidden state but also happen to induce correlation on innovations and other residues. This property is used to find a well-defined objective function for which an optimisation routine is applied to recover the true parameters of the model. It is argued that this method can consistently estimate the bias on the parameter. We demonstrate the algorithm on various models of increasing complexity.

Key words and phrases: Kalman filter, Extended Kalman filter, State space models, Misspecified models, Robust estimation

1. Introduction

This paper is concerned with the following family of discrete time state space models with additive Gaussian noises:

(1)

The variables are assumed to be independent standard normal variables, , (resp. ) are (resp. , with positive definite) matrices, and stands for the vector of parameters of the model. The functions are assumed to be differentiable. The hidden states (or unobserved signal process) take value in and the observations in . We also denote the noise covariance matrices , where stands for the transpose.

The aim of filtering is to make inference about the hidden state conditionally on the observations denoted thereafter. To do so, there are various ways to estimate the parameters, which in most situations of interest are unknown and have to be approximated. They may, for example, be estimated using standard techniques (MLEs…), or be incorporated into the set of random quantities to be estimated. To quote only one example from the recent literature, Particle Gibbs samplers have proven to be a good way to simulate the joint distribution of hidden processes and model parameters in hidden Markov chain models; see e.g. [Andrieu et al., 2010, Chopin and Singh, 2015, del Moral et al., 2016].

Here, we face a different problem: we consider the situation where has been incorrectly estimated, for example using a given biased estimator such that (the way the estimator has been devised is of no importance for our purposes). Our interest in these questions originated in the study of random volatility models such as Heston’s, where some parameters are difficult to estimate. We wanted to understand how errors on the model parameters could impact the volatility estimates. The error detection method that is the subject of the present article first arose from statistical phenomena observed in numerical simulations. We soon realized that the phenomena were universal and related to theoretical properties of misspecified models. Application domains include, for example, engineering and control, where the parameters may be known at inception but may change to a new value , for example due to a mechanical problem, so that becomes a wrong value for the true model parameters. Detecting the change from to may then be useful not only to improve the inference process, but also to detect the underlying problem.

It is well-known that using such incorrect filter models deteriorates the filter performance and may even cause the filter to diverge. Various results have been obtained in the literature on the impact of on the estimator of the hidden state; error covariance matrices have been studied and compared with the covariance matrices of the conditional distribution of and knowing . These results are described in [Jazwinski, 2007], where the reader can also find a survey of the classical literature on the subject.

The aim of the present article is different: we want to take advantage of the theoretical properties of misspecified state space models, not only to understand the impact of on the estimation of the hidden states but also, ultimately, to use its statistical properties in order to get a correct set of parameters for the state space model.

The key result underlying our analysis is that misspecifications do certainly induce a biased estimator of the hidden state but also, and most importantly for our purposes, they happen to induce correlation on the innovations and other residues associated to observations. This property is used to find a well-defined objective function for which an optimisation routine is applied to recover the true parameters of the model. It is argued that this method can consistently estimate the bias on the parameter. The method is easy to implement and runs fast. We demonstrate the algorithm on various models of increasing complexity.

Discrete time state space models are ubiquitous; their use is discussed in most textbooks on filtering, from the early [Kalman, 1960, Jazwinski, 2007, Sage and Melsa, 1971, Anderson and Moore, 1979] to the recent literature; we refer e.g. to [Durbin and Koopman, 2012] for a survey. Application domains of our results include, besides finance, control and engineering: ecology, economics, epidemiology, meteorology and neuroscience.

The paper is organized as follows. Section 2 presents the model assumptions and introduces various estimators and processes, including the “interpolation process” (Eq. (4)) that plays a central role in the article. Section 3 states the theoretical results. In Section 4, we describe the method, and in Section 5 we demonstrate the algorithms on three examples. The first application is largely pedagogical and studies an elementary autoregressive linear model for which our approach can be easily understood. We then move to a nonlinear (square root) model and, to conclude, apply our approach to a complex nonlinear model, the Heston model, widely used in finance for option pricing and portfolio hedging. Parameter estimation for this last model is notoriously difficult; our method nevertheless behaves quite satisfactorily. Finally, in Section 6, we compare our method and estimator (based on the interpolation process) with the estimator using the same strategy but based instead on innovations. Some concluding remarks are provided in the last section. The technical proofs are gathered in Appendices A and B.

The theoretical results on misspecified models underlying the constructions in this article were mostly obtained in the first author’s PhD thesis [El-Kolei, 2012].

Notation: for any continuously differentiable function , denotes the vector of the partial derivatives of w.r.t .

2. The misspecified (Extended) Kalman Filter

In the linear case, the model (1) reads ():

(2)

If the vector of parameters is perfectly known, the optimal filter is Gaussian and the Kalman filter gives exactly the first two conditional moments: and . In particular, the Kalman filter estimator is the BLUE (Best Linear Unbiased Estimator).

In most real applications, the linearity assumption of the functions and is not satisfied. A linearization by a first order Taylor series expansion can be performed and the Extended Kalman filter (EKF) consists in applying the Kalman filter on this linearized model. Concretely, for the EKF, the matrix is the derivative of the function with respect to (w.r.t.) computed at the point where . The matrix is the derivative of the function w.r.t. computed at the point and the functions and are defined as:
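The linearize-then-filter idea described above can be illustrated by a minimal Python sketch of one EKF step for a scalar state and a scalar observation. It is only an illustration under stated assumptions: the function name `ekf_step`, its argument names, and the example functions at the bottom are hypothetical, and the convention used (state derivative taken at the last filtered state, observation derivative taken at the predicted state) is the usual textbook one, assumed here to match the description above.

```python
from typing import Callable, Tuple


def ekf_step(
    x_filt: float,
    p_filt: float,
    y: float,
    b: Callable[[float], float],
    h: Callable[[float], float],
    db: Callable[[float], float],
    dh: Callable[[float], float],
    q: float,
    r: float,
) -> Tuple[float, float]:
    """One scalar EKF step (hypothetical helper, textbook convention).

    b, h   : state and observation functions (parameters already plugged in)
    db, dh : their derivatives with respect to the state
    q, r   : state and observation noise variances
    """
    # Prediction: propagate the state, linearise b at the last filtered state.
    x_pred = b(x_filt)
    a = db(x_filt)
    p_pred = a * p_filt * a + q

    # Update: linearise h at the predicted state, then apply the Kalman update.
    c = dh(x_pred)
    s = c * p_pred * c + r          # innovation variance
    gain = p_pred * c / s           # Kalman gain
    x_new = x_pred + gain * (y - h(x_pred))
    p_new = (1.0 - gain * c) * p_pred
    return x_new, p_new


# Hypothetical weakly nonlinear example (not the model of Section 5.2).
a_coef = 0.8
x1, p1 = ekf_step(
    0.5, 1.0, y=0.9,
    b=lambda x: a_coef * x, h=lambda x: x + 0.1 * x ** 2,
    db=lambda x: a_coef, dh=lambda x: 1.0 + 0.2 * x,
    q=0.5, r=0.2,
)
```

When `b` and `h` are linear, the step reduces to the ordinary Kalman filter update, which gives a simple sanity check of the sketch.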

In this paper, we assume that the vector of parameters is not perfectly known, so that the inference of the hidden state conditionally on is made with a parameter , where stands for the specification error. This case is frequent in practice since in general the vector of parameters is unknown and needs to be estimated by an ordinary method. The resulting estimator can be biased and this bias is propagated to the estimation of the hidden state by the filter.

We now run the Kalman filter (resp. the EKF in the nonlinear case) with the misspecified model. The filter design therefore reads as follows (note that we still use the notations , … for the estimator of , its variance… but from now on the notation will refer to the estimators built using the biased parameter ):

(3)

with initial conditions , (recall that , ).
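To illustrate what running the filter with a biased parameter means in practice, here is a minimal Python sketch on a toy scalar linear model. Everything in it is an assumption made for illustration and is not taken from the paper: the model x_t = phi x_{t-1} + noise, y_t = x_t + noise, the function names, and the numerical values. The same recursion is run once with the true coefficient and once with a biased one, and the mean squared filter errors are compared.

```python
import math
import random


def simulate(phi, q, r, n, seed):
    """Simulate the toy model x_t = phi*x_{t-1} + N(0,q), y_t = x_t + N(0,r)."""
    rng = random.Random(seed)
    x, xs, ys = 0.0, [], []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        xs.append(x)
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return xs, ys


def kalman_filter(ys, phi, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter run with the supplied (possibly biased) phi."""
    x_filt, p_filt = x0, p0
    x_pred_path, x_filt_path = [], []
    for y in ys:
        x_pred = phi * x_filt                   # one-step prediction
        p_pred = phi * p_filt * phi + q
        gain = p_pred / (p_pred + r)            # Kalman gain (observation matrix = 1)
        x_filt = x_pred + gain * (y - x_pred)   # measurement update
        p_filt = (1.0 - gain) * p_pred
        x_pred_path.append(x_pred)
        x_filt_path.append(x_filt)
    return x_pred_path, x_filt_path


def filter_mse(xs, x_filt_path):
    """Mean squared error between the true and the filtered states."""
    return sum((x - xf) ** 2 for x, xf in zip(xs, x_filt_path)) / len(xs)


xs, ys = simulate(phi=0.9, q=1.0, r=1.0, n=2000, seed=0)
mse_true = filter_mse(xs, kalman_filter(ys, 0.9, 1.0, 1.0)[1])
mse_biased = filter_mse(xs, kalman_filter(ys, 0.5, 1.0, 1.0)[1])
print(f"filter MSE with true phi: {mse_true:.3f}, with biased phi: {mse_biased:.3f}")
```

In this toy setting the biased run is expected to show the larger error, which is the deterioration of filter performance mentioned in the introduction.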

We also introduce the residues, called respectively the filter error, the innovation and the interpolation processes.

(4)

where

(5)

Notice in particular the introduction of the interpolation process, which we specifically designed for tracking parameter errors.
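The three residues can be computed directly from the filter output, as in the following hedged Python sketch. Two assumptions are made here and should be checked against Eqs. (4) and (5): the innovation is taken as the observation minus the prediction of the observation, and the interpolation process is taken as the a posteriori residue, that is the observation minus the observation function evaluated at the filtered state (as suggested by the phrase "residues a posteriori" in Appendix A). The function name, argument names and sign convention of the filter error are hypothetical.

```python
def residual_processes(x_true, y, x_pred, x_filt, h=lambda x: x):
    """Return (filter error, innovation, interpolation) as three lists.

    x_true : true hidden states (only available in simulations)
    y      : observations
    x_pred : one-step-ahead predictions of the state
    x_filt : filtered (a posteriori) state estimates
    h      : observation function with the biased parameter plugged in
    """
    # Filter error: true state minus filtered estimate (assumed sign).
    filter_error = [xt - xf for xt, xf in zip(x_true, x_filt)]
    # Innovation: observation minus predicted observation.
    innovation = [obs - h(xp) for obs, xp in zip(y, x_pred)]
    # Interpolation (assumed definition): observation minus a posteriori fit.
    interpolation = [obs - h(xf) for obs, xf in zip(y, x_filt)]
    return filter_error, innovation, interpolation


errors, innovations, interpolations = residual_processes(
    x_true=[1.0, 2.0], y=[1.5, 1.5], x_pred=[0.5, 1.0], x_filt=[1.0, 1.25]
)
```

Only the innovation and the interpolation process are observable in practice, since the filter error requires the true hidden state.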

3. Main result

The empirical and theoretical properties of the interpolation process (precisely, its auto-covariance) are the object of the present section. They lead us to propose a method to detect a misspecified model. Although detection is useful in practice in its own right, we also give a new method to approximate the bias and thus to estimate the true parameter .

Let us consider first the linear case. Recall that, by assumption, the functions are differentiable. Recall also that if the vector of parameters is exactly known, the error a posteriori at time is given by the following formula:

where is the Kalman matrix which minimizes the variance matrix of the hidden state (see [Kalman, 1960]).

The following theorem gives the propagation of the a posteriori error and of for the Kalman filter when is not exactly known.

Theorem 3.1.

Consider the model (2). If , then:

(6)

with:

(7)
(8)
(9)

Additionally, the interpolation process is equal to:

(10)

with:

(11)
Proof.

See Appendix A. ∎

We note that the terms depending on : , and (resp. , and ) are the corrective terms coming from the bias of the parameter estimates; they do not appear when the model is well specified.
Besides, we can see in Eq. (6) that at time , the propagation of the state error depends on but also on the state variable . Notice in particular the term that contributes non-trivially to the auto-correlation of the process ; this term is proportional to but, contrary to the other terms, its contribution to the expansion is not proportional to a filter error term (such as ) or to a noise term (such as ).

For linear and Gaussian state space models we can express this auto-covariance explicitly for all and :

We can now express the auto-covariance of the interpolation process .

Proposition 3.2.

Let be defined as in (10) and ; keeping leading contributions, we have

where the various covariance terms can be computed explicitly.

The computation of the most complex covariance term () is detailed as an example in Appendix B.

4. Parameter estimation: method

The main idea of the approach consists in empirically minimizing the auto-covariance between in order to reduce as far as possible the corrective terms that appear in the propagation equations (6) and (10). The results obtained on a variety of examples detailed later in the article support the approach.

Let us denote the following objective function:

where denotes the auto-covariance of the coordinate of the vector for the lag when model parameters are chosen to be (recall that we know only and want to estimate ).

We use as estimator the empirical covariance given by:

Definition 4.1.
(12)

where is the mean of the .
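As an illustration, an empirical auto-covariance at a given lag can be computed as in the short Python sketch below. The function name is hypothetical, and the normalisation by the full sample size (rather than by the number of available pairs) is an assumption; Definition 4.1 should be consulted for the exact convention.

```python
def empirical_autocov(series, lag):
    """Empirical auto-covariance of a scalar series at a non-negative lag.

    The series is centred with its sample mean; the sum of lagged products
    is divided by the full sample size n (assumed normalisation).
    """
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    return sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    ) / n


# Small worked example: deviations from the mean are -1.5, -0.5, 0.5, 1.5.
gamma0 = empirical_autocov([1.0, 2.0, 3.0, 4.0], 0)
gamma1 = empirical_autocov([1.0, 2.0, 3.0, 4.0], 1)
```

For a vector-valued interpolation process the same function would be applied to each coordinate separately.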

We will therefore minimize the following objective function

(13)

As we will see in the numerical application, the choice of the lag range has no strong impact on the results.

An estimator of the bias is obtained as

This means that is estimated as a function of the tracking error .
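The whole procedure can be sketched end to end on a toy example. The Python code below is a minimal illustration under several assumptions that are not taken from the paper: a scalar linear model x_t = phi x_{t-1} + noise, y_t = x_t + noise with known noise variances, the interpolation process taken as the a posteriori residue, an objective equal to the sum of squared empirical auto-covariances over lags 1 to 5, and a plain grid search standing in for the optimisation routine. All names and numerical values are hypothetical.

```python
import math
import random


def simulate_observations(phi, q, r, n, seed):
    """Observations from x_t = phi*x_{t-1} + N(0,q), y_t = x_t + N(0,r)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys


def interpolation_residuals(ys, phi, q, r):
    """A posteriori residues y_t - x_filt_t of a Kalman filter run with phi."""
    x_filt, p_filt, residuals = 0.0, 1.0, []
    for y in ys:
        x_pred = phi * x_filt
        p_pred = phi * phi * p_filt + q
        gain = p_pred / (p_pred + r)
        x_filt = x_pred + gain * (y - x_pred)
        p_filt = (1.0 - gain) * p_pred
        residuals.append(y - x_filt)
    return residuals


def autocov(series, lag):
    """Empirical auto-covariance (centred, normalised by the sample size)."""
    n = len(series)
    mean = sum(series) / n
    return sum((series[t] - mean) * (series[t + lag] - mean)
               for t in range(n - lag)) / n


def objective(ys, phi, q, r, max_lag=5):
    """Sum of squared auto-covariances of the residues (assumed form)."""
    res = interpolation_residuals(ys, phi, q, r)
    return sum(autocov(res, lag) ** 2 for lag in range(1, max_lag + 1))


def estimate_phi(ys, q, r, grid, max_lag=5):
    """Grid search: the candidate with the least autocorrelated residues."""
    return min(grid, key=lambda phi: objective(ys, phi, q, r, max_lag))


PHI_TRUE, Q, R = 0.9, 1.0, 1.0
ys = simulate_observations(PHI_TRUE, Q, R, n=3000, seed=0)
grid = [0.50 + 0.01 * i for i in range(50)]     # candidates 0.50, ..., 0.99
phi_hat = estimate_phi(ys, Q, R, grid)
print(f"true phi = {PHI_TRUE}, estimated phi = {phi_hat:.2f}")
```

A gradient-free or gradient-based optimiser started at the biased value could replace the grid search; the grid is used here only because it keeps the sketch dependency-free.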

5. Applications

5.1. Estimation of the linear AR(1) process

Let us consider the following autoregressive process:

(14)

where and . The noises and are assumed to be i.i.d. centered standard Gaussian variables. The variances and are equal to and respectively.

We have run a Kalman filter estimation by assuming that the two parameters and are biased. We choose and to construct the function and we apply the minimization procedure to estimate the true parameter .
The Mean Squared Error (MSE) was used to measure the quality of the estimation of with (number of Monte Carlo simulations) equal to 100. The result is summarized in Table 5.1.

Table: MSE for for MC=100 with and .
0.907 2.97
MSE
CPU (sec) 0.22

5.2. Estimation of a weakly Nonlinear model

Let us consider the following nonlinear model

(15)

where and . The noises and are assumed to be i.i.d. centered standard Gaussian variables. The variances and are equal to and respectively.

Since this model is nonlinear we apply an EKF estimation by assuming that the two parameters and are biased. For the initialisation we choose and to construct the function given in (13) and we apply the minimization procedure to estimate the parameter . The results are summarized in Table 5.2.

Table: MSE for for MC=100 with and .
4.99 0.0081
MSE
CPU (sec) 0.23

5.3. Estimation of a strongly Nonlinear model: the Heston model

In 1993, Heston extended the Black-Scholes model by making the volatility parameter stochastic. More precisely, the volatility is modeled by a Cox-Ingersoll-Ross (CIR) process and the stock price follows the well-known Black-Scholes stochastic differential equation. The Heston stochastic volatility model is widely used in practice for option pricing. The reliability of the calibration of its parameters is important since a possible bias will be passed on to the volatility estimates and, ultimately, to option prices and hedging strategies.

The model is given by

(16)

where and are two correlated standard Brownian motions such that , is the initial variance, the mean reversion rate, the long run variance and the volatility of variance. We set .

The volatility process is always positive and cannot reach zero under the Feller condition . Furthermore, under this assumption, the process has a Gamma invariant distribution with and .
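The Feller condition and the parameters of the invariant law are easy to check numerically. The Python sketch below assumes the standard notation (kappa for the mean reversion rate, theta for the long run variance, sigma for the volatility of variance), the usual form of the condition, 2*kappa*theta >= sigma^2, and the standard Gamma invariant law of a CIR process with shape 2*kappa*theta/sigma^2 and scale sigma^2/(2*kappa). The function names and the numerical values are hypothetical and are not claimed to be those used in the paper.

```python
def feller_condition(kappa, theta, sigma):
    """True when 2*kappa*theta >= sigma**2 (usual form of the condition)."""
    return 2.0 * kappa * theta >= sigma ** 2


def gamma_invariant_params(kappa, theta, sigma):
    """Shape and scale of the Gamma invariant law of a CIR process."""
    shape = 2.0 * kappa * theta / sigma ** 2
    scale = sigma ** 2 / (2.0 * kappa)
    return shape, scale


# Illustrative values only.
kappa, theta, sigma = 4.0, 0.03, 0.4
ok = feller_condition(kappa, theta, sigma)
shape, scale = gamma_invariant_params(kappa, theta, sigma)
# The mean of the invariant law, shape * scale, equals the long run variance.
print(ok, shape, scale, shape * scale)
```

Checking the condition before simulating avoids variance paths that can touch zero, where the square root diffusion term degenerates.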

5.3.1. Simulated Data

We sample the trajectory of the CIR variance with a time step day over days. Conditionally on this trajectory, we sample the trajectory of the logarithm of the stock price given by Itô’s formula and discretized by a classical Euler scheme.
For the CIR process we use the discrete time transition equation, given up to a constant by a non-central chi-square distribution:

where is the degree of freedom, is the parameter of non-centrality and

We assume that each day , the observation corresponds to nine call prices for different strikes . Here, of the stock prices and . The data length is days.
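Returning to the variance path, the exact CIR transition mentioned above can be sketched in Python with the standard library only. The sketch uses the standard result that the next variance is a constant times a non-central chi-square variable, with constant c = sigma^2 (1 - exp(-kappa dt)) / (4 kappa), degrees of freedom 4 kappa theta / sigma^2 and non-centrality v exp(-kappa dt) / c. The notation (kappa, theta, sigma), the function name and the numerical values are assumptions. The non-central chi-square draw uses the decomposition into a squared shifted normal plus an independent central chi-square, which requires more than one degree of freedom (this holds under the Feller condition).

```python
import math
import random


def cir_step(v, kappa, theta, sigma, dt, rng):
    """Draw the variance at time t + dt given the variance v at time t."""
    decay = math.exp(-kappa * dt)
    c = sigma ** 2 * (1.0 - decay) / (4.0 * kappa)    # scaling constant
    dof = 4.0 * kappa * theta / sigma ** 2            # degrees of freedom
    nonc = v * decay / c                              # non-centrality
    if dof <= 1.0:
        raise ValueError("this sampler needs more than one degree of freedom")
    # Non-central chi-square(dof, nonc) = (Z + sqrt(nonc))^2 + chi-square(dof - 1).
    shifted_normal = (rng.gauss(0.0, 1.0) + math.sqrt(nonc)) ** 2
    central_part = rng.gammavariate((dof - 1.0) / 2.0, 2.0)
    return c * (shifted_normal + central_part)


# Illustrative values only: daily step, start at the long run variance.
rng = random.Random(1)
kappa, theta, sigma, dt = 4.0, 0.03, 0.4, 1.0 / 252.0
draws = [cir_step(theta, kappa, theta, sigma, dt, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)
print(f"mean of one-step draws: {sample_mean:.5f}")
```

Starting from the long run variance, the conditional mean of the next value is the long run variance itself, so the sample mean of the draws gives a quick check of the sampler.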

Then, the discrete time Heston model is given by the following nonlinear state space model with additive noises:

(17)

where the functions and (see [Duan and Simonato, 1995]) are given by

The call prices are computed by the Heston formula given in [Heston, 1993]. We assume that these prices are observed with Gaussian measurement error with zero mean and variance independent of . These measurement errors can reflect the presence of different prices (bid-ask prices, closing prices, human errors in data handling) in financial markets.
For the vector of parameters we choose which is consistent with empirical applications of daily data (see [Chen, 2007]) and the risk free interest rate is equal to 0.05.

5.3.2. Empirical detection of misspecified models

Since the Heston model is not linear, we run an EKF estimation of by assuming for convenience that only the coordinate of the estimator of denoted by is biased, the others are equal to for .
For each parameter , we represent the autocorrelation of the interpolation process .
For each parameter of the Heston model, we observe correlation in the interpolation process when the model is misspecified (see Figures 5.3.2 up to 5.3.2). We can also remark that this correlation is stronger for the mean reversion speed parameter and for the long run variance .

Figure: Parameter : Autocorrelation of of the EKF estimation with .

Figure: Parameter : Autocorrelation of of the EKF estimation with .

Figure: Parameter : Autocorrelation of of the EKF estimation with .

Figure: Parameter : Autocorrelation of of the EKF estimation with .

Furthermore, in order to illustrate the behaviour of the autocorrelation with respect to the bias, we apply an EKF estimation in the three following cases (only for the mean reversion speed parameter; the conclusion is the same for the other parameters): a) (that is, the model is well-specified); b) ; c) . The comparison is illustrated in Figure 5.3.2. As expected, we observe that no correlation appears when the model is well-specified (Case a), whereas when a bias is introduced a correlation of the interpolation process appears and, most importantly, this correlation grows with the bias (see Figure 5.3.2, Cases b and c).
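The same qualitative behaviour can be reproduced on a much simpler model than Heston's. The Python sketch below is a toy illustration only: it uses a scalar linear model x_t = phi x_{t-1} + noise, y_t = x_t + noise, takes the interpolation process to be the a posteriori residue, and compares the lag-1 autocorrelation of that residue with the usual white-noise band 1.96/sqrt(n) in three cases (no bias, moderate bias, large bias). The model, names, band and numerical values are assumptions and are not taken from the paper.

```python
import math
import random


def simulate_observations(phi, q, r, n, seed):
    """Observations from x_t = phi*x_{t-1} + N(0,q), y_t = x_t + N(0,r)."""
    rng = random.Random(seed)
    x, ys = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, math.sqrt(q))
        ys.append(x + rng.gauss(0.0, math.sqrt(r)))
    return ys


def a_posteriori_residuals(ys, phi, q, r):
    """Residues y_t - x_filt_t of a Kalman filter run with the given phi."""
    x_filt, p_filt, residuals = 0.0, 1.0, []
    for y in ys:
        x_pred = phi * x_filt
        p_pred = phi * phi * p_filt + q
        gain = p_pred / (p_pred + r)
        x_filt = x_pred + gain * (y - x_pred)
        p_filt = (1.0 - gain) * p_pred
        residuals.append(y - x_filt)
    return residuals


def lag1_autocorrelation(series):
    """Lag-1 empirical autocorrelation of a centred series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    c1 = sum(dev[t] * dev[t + 1] for t in range(n - 1))
    return c1 / c0


PHI_TRUE, Q, R, N = 0.9, 1.0, 1.0, 2000
ys = simulate_observations(PHI_TRUE, Q, R, N, seed=1)
band = 1.96 / math.sqrt(N)                  # approximate 95% white-noise band
rho = {
    phi: lag1_autocorrelation(a_posteriori_residuals(ys, phi, Q, R))
    for phi in (0.9, 0.7, 0.5)              # case a, case b, case c
}
for phi, value in rho.items():
    flag = "outside band" if abs(value) > band else "inside band"
    print(f"phi used = {phi}: lag-1 autocorrelation = {value:+.3f} ({flag})")
```

A residue autocorrelation well outside the band is the detection signal; its growth with the size of the bias is what the minimisation exploits.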

Figure: Autocorrelation of the for the three cases. Top: Case a. Bottom left: Case b. Bottom right: Case c.

5.3.3. Parameter estimation

In Figure 5.3.3, we represent the objective function defined in (13) with respect to the parameters of the Heston model. We show it only for the long run variance parameter since the result is the same for the other parameters. We can see that the function is minimal at the true value of , that is (see Figure 5.3.3).

Figure: Function with respect to the parameter and .

5.3.4. Estimation of the long run variance in the Heston model

In a first step, we have run an EKF estimation by assuming that only the long run variance parameter is biased. We choose and we recall that its true value is . The number of observations used for the construction of the function given in (13) is here and we apply the minimization procedure to recover the parameter .
The MSE was used to measure the quality of the estimation of the parameter with equal to 50. The results are summarized in Table 5.3.4 and Figure 5.3.4.

Table: MSE for for MC=50 with and .
MC 50
0.03
MSE
CPU (sec) 300

Figure: Boxplot of the estimation of for and .

5.3.5. Sensitivity w.r.t. the lag

In order to see the impact of the lag on the autocorrelation, we ran our approach for different lags and computed the MSE (with MC=50). We note (see Table 5.3.5) that the choice of does not have a strong impact on the results. Hence, for the next numerical application we choose equal to .

Table: MSE for for MC=50 for different lags and .
2 6 8 10
0.03 0.03 0.03 0.03
MSE 2.5554e-09 2.3431e-09 2.7503e-09

In Table 5.3.5, we report the MSE for different numbers of observations, up to . We see that the estimation is poor for small , which is not surprising since in this case the empirical estimator of is not yet accurate. As expected, the MSE decreases with the number of observations.

Table: MSE for for MC=50 and different numbers of observations and .
20 30 50 70 80 90 100 110
0.0018 0.0224 0.0274 0.0277 0.0279 0.0293 0.03 0.03
MSE 1.46e-04 2.23e-05 1.82e-05 1.53e-05 5.09e-06 2.34e-09 2.31e-09

5.3.6. Estimation of the Heston model

In this part, we want to estimate all parameters of the Heston model. We therefore consider that all parameters are biased with different biases (see Tables 5.3.6 and 5.3.6) and that the true parameter is given by .

Table: Estimation of for MC=1, and .
3.7809 0.0250 0.4294 -0.5498
3.9671 0.0301 0.4000 -0.4774

Table: Estimation of for MC=1, and .
3.7853 0.0250 0.4309 -0.5514
3.9950 0.0302 0.4107 -0.4858

In Table 5.3.6 we repeat our estimation procedure with MC equal to . We note that our approach makes it possible to estimate all parameters simultaneously. We also note that the long run variance parameter and the mean reversion speed parameter are easier to estimate than the other parameters. Furthermore, we have seen in Figures 5.3.2 and 5.3.2 of Section 5.3.2 that the correlation was stronger for these two parameters.

Table: Estimation of for MC=50, and .
3.9970 0.0302 0.4087 -0.4836
MSE 6.8984e-05 5.0153e-05 8.9008e-04 4.1223e-04

For , the choice of the initial condition for is with and where stands for the centered standard Gaussian law.

6. Comparison with the use of standard innovations


Methods for detecting departures from optimality are usually based on the innovation process . Performance analysis of Kalman filters based on the innovation was introduced in [Wei et al., 1991]. In their paper, the authors propose a test based on the innovations for fault detection and a two-step Kalman filtering procedure to estimate the parameters. Let us also mention [Grewal and Andrews, 2015] where, on page 370, the authors briefly discuss the detection of unmodeled state dynamics by Fourier analysis of the filter innovations.

In this part, we compare our minimisation routine (13) with the analogous minimisation routine when one replaces the interpolation process with the innovation process in order to estimate the parameters.
For this comparison we use the three models defined in the previous section and assume that only one parameter is biased for each model.
We note that the MSE is significantly smaller when one uses the interpolation process instead of the standard innovations and, most importantly, that using the interpolation process to correct the bias is better for complex models with nonlinear effects. The results are summarized in Table 6.

Table: MSE, comparison with standard innovations: MC=50, and (in bold: the parameter that we biased; in gray: the smallest MSE). Gaussian model: Nonlinear model: Heston model:

7. Conclusion

In this paper, we propose a new approach to detect misspecification and estimate the parameters of weakly nonlinear hidden state models. These models are supposed to be misspecified due to the choice of incorrect parameters. We propose to exploit the autocorrelation of a suitably defined interpolation process based on the estimate of the hidden state with the biased parameters. We then vary the model parameters around the initial misspecified value and apply an optimization procedure to minimize the auto-covariance of this process. We show that this approach makes it possible to detect misspecified models and to estimate the parameters. The computation is fast and the implementation is easy. Furthermore, we note that the autocorrelation lag parameter does not have a strong impact on the results. All results are illustrated on various models of increasing complexity and in particular on the Heston model, widely used in practice for portfolio hedging.

Appendix A. Proof of Theorem 3.1


The proof is essentially based on a first order Taylor expansion of the functions and with respect to . We have

where we used Since , and similarly for the other functions of , we get

(18)

Furthermore,

so that:

Rewriting

we get:

Define,

we obtain:

By combining Eq.(18) and Eq.(A), we have:

where,

One can deduce the propagation of the interpolations (or a posteriori residues):