Asymptotic Analysis of Semi-Parametric
Weighted Null-Space Fitting Identification
Standard system identification methods often provide biased estimates with closed-loop data. With the prediction error method (PEM), the bias issue is solved by using a noise model that is flexible enough to capture the noise spectrum. However, a too flexible noise model (i.e., too many parameters) can cause additional numerical problems for PEM. In this paper, we perform a theoretical analysis of the weighted null-space fitting (WNSF) method when a parametric noise model is not estimated. With this method, the system is first captured using a non-parametric ARX model, which is then reduced to a parametric model of interest. In the reduction step, a noise model does not need to be estimated if it is not of interest. In open loop, this still provides asymptotically efficient estimates of the dynamic model. In closed loop, the estimates are consistent, and their covariance is optimal for a non-parametric noise model. In this paper, we prove these results, which require additional technical details compared with the case with a full parametric model structure. In particular, we use a geometric approach for variance analysis, deriving a new result that will be instrumental to our end. Finally, we use a simulation study to illustrate the benefits of the method when the noise model cannot be parametrized by a low-order model.
The prediction error method (PEM) is a benchmark for estimation of linear parametric models. If the model orders are correct and the noise is Gaussian, PEM with a quadratic cost function is asymptotically efficient [ljung99]: the asymptotic covariance of the estimates is the Cramér-Rao (CR) bound—the lowest covariance attainable by a consistent estimator.
Two models can typically be distinguished in a parametric model structure: the dynamic model and the noise model. Because the noise sequence is often the result of different noise contributions aggregated in a complex or even intractable manner, the concept of a “correct order” for the noise model is often inappropriate in practice. If the noise-model order is chosen too small with PEM, the dynamic-model estimate will be consistent in open loop, but biased in closed loop.
The bias issue with closed-loop data is not exclusive of PEM. Instrumental variable methods [ivbook_soderstrom_stoica] require the reference signal to construct the instruments in closed loop [cliv, clriv]. For classical subspace methods [verdew92, vanodem94], the bias issue in closed loop has been overcome by more recent algorithms [verhaegen:closedloop, jansson03].
With PEM, the bias issue can in theory be solved by letting the noise model structure be more flexible (i.e., letting the number of parameters become very large), guaranteeing that the noise spectrum is captured. If the global minimum of the PEM cost function is found, in open loop this will asymptotically not affect the dynamic-model estimates; in closed loop, consistency is attained but not efficiency. The problem is that, because the noise model might require many parameters, the PEM cost function will potentially have more local minima, thus aggravating the numerical search for the global minimum.
Some methods use a semi-parametric approach, estimating a non-parametric noise model estimated in a first step, which is then used in a second step to estimate the dynamic model. Possible approaches have been delineated both in the frequency [freqdom_nonparH, Pintelon2010a, Pintelon2010b] and time domains [zhu_book, bjsm, MORSM].
The weighted null-space fitting (WNSF) method [galrinho14] also first estimates a non-parametric model, and then reduces it to a full parametric model (i.e., dynamic and noise models). Moreover, WNSF does not apply non-linear optimization techniques, but uses weighted least squares iteratively. In this sense, the method can be seen as belonging to the family of iterative least-squares methods. These methods date back to [sanathanankoerner], and have later been denoted as iterative quadratic maximum likelihood (IQML) methods, with applications to filter design [evansfischl73, bresler86] and identification of dynamical systems [shaw94, shawmisra94, lemmerlingettal01]. The Steiglitz-McBride method [stmcb_original] belongs also to this class of methods, being equivalent to IQML for an impulse-input case [mcclellanlee91]. However, unlike in the aforementioned identification works, WNSF is asymptotically efficient in one iteration in open and closed loop for Box-Jenkins systems [wnsf_tac].
Instead of simultaneously estimating the dynamic and noise models, WNSF also allows to disregard the parametric noise model, reducing the non-parametric model estimate to obtain a parametric dynamic model only. This guarantees asymptotically efficient estimates in open loop, and consistent estimates in closed loop with optimal asymptotic covariance when an infinite-order noise model is used. This distinguishes this method from [bjsm, MORSM], which are developed for open loop. The asymptotic properties of the proposed method correspond to the asymptotic properties of PEM with an infinite-order noise model [Forssell_cl] in both open and closed loop, but performed with a robust numerical procedure.
In [wnsf_tac], an extensive study of the fully parametric WNSF method has been conducted, including theoretical analysis and simulations. The case where WNSF is used with no parametric noise-model estimate—denoted semi-parametric WNSF—has been addressed and illustrated in [wnsf_tac], but a formal proof of the asymptotic properties has not been provided. In this paper, we perform the analysis of this setting, which has additional difficulties because of the non-squared dimension of some matrices. For this purpose, we use the geometric approach of [geometric_jonas], extending the results therein to the setting of our problem. Thus, our contributions are: first, we derive a result for variance analysis in system identification; second, we use this result to perform a theoretical analysis of the semi-parametric WNSF method, proving consistency and deriving the asymptotic covariance of the estimates in open and closed loop. This analysis is complemented with a simulation study to illustrate the benefits of the method.
Essentially, the same notation, definitions, and assumptions used in [wnsf_tac] apply to this paper. For convenience, we reproduce the notation and assumptions here; for the definitions of -stability and -quasi-stationarity, see [ljung&wahlberg92, wnsf_tac].
, with an vector.
, with a matrix and a vector of appropriate dimensions.
(i.e., the Frobenius norm), with a matrix.
, with a transfer matrix.
denotes any constant, which need not be the same in different expressions.
, where is the backward time-shift operator.
is the complex conjugate transpose of the matrix .
is the lower-triangular Toeplitz matrix of size () with first column , where . The dimension may be infinity, denoted .
denotes expectation of the random vector .
means that the function tends to zero at a rate not slower than , as , w.p.1.
, where and are transfer matrices of appropriate sizes.
Assumption 1 (True system and parametric model).
The system has scalar input , scalar output , and is subject to scalar noise . These signals are related by
where and are rational functions according to
The transfer functions , , and are assumed to be stable. The polynomials and —as well as and —do not share common factors.
We parametrize as
is the parameter vector to estimate, with known orders and . The noise model orders and are not known.
Assumption 2 (Noise).
The noise sequence is a stochastic process that satisfies
Assumption 3 (Input).
The input sequence is defined by under the following conditions.
The sequence is assumed to be independent of , -quasi-stationary with , and uniformly bounded: , .
Let the spectral factorization of the power spectral density of be . Then, is assumed to be -stable, with .
The closed loop system is -stable with .
The feedback transfer function is bounded on the unit circle.
The spectral density of the process is bounded from below by the matrix , (this implies an informative experiment).
Operation in open loop is obtained by taking .
Alternatively to (1), the true system can be written as
are stable (Assumption 1). In an intermediate step, WNSF estimates truncated and , using the ARX model
Because the order needs to be infinite for the system to be in the ARX model set, we make the model order depend on the sample size —denoted —according to the following assumption.
Assumption 4 (ARX-model order).
The ARX model order is selected according to:
, as ;
, for some , as .
Ii-C Prediction Error Method
The prediction error method minimizes a cost function of the prediction errors
where is some noise model parametrization, function of a parameter vector . Using a quadratic cost function, which is optimal when the noise sequence is Gaussian, the PEM estimate of the parameters is obtained by minimizing
where is the sample size.
Let be such that there exists for which . Denoting by the parameter vector that (together with some ) minimizes (12), the estimate is asymptotically distributed as [ljung99]
where stands for the Gaussian distribution. The covariance matrix satisfies
with (arguments often omitted for notational simplicity)
and the spectrum of
where is the sensitivity function.
In open loop (in which case is simply the input spectrum), , and it corresponds to the CR bound. In closed loop, when the number of parameters in tends to infinity. In this case, does not correspond to the CR bound, but to the best possible covariance (from a prediction error perspective) with a non-parametric noise model [Forssell_cl].
The interest of estimating a non-parametric noise model in closed loop is that even if the noise spectrum needs to be captured by a high-order model, it will still be possible to obtain an estimate of the dynamic model . However, estimating a non-parametric noise model simultaneously with a parametric dynamic model with PEM is not realistic. The reason is that, as the number of parameters in increases, the prediction error (11) becomes a more complicated function of , and finding the global minimum of the non-convex cost function (12) is a more difficult task. Consequently, the result that PEM with a non-parametric noise model provides estimates with covariance corresponding to may not always be useful in practice. With WNSF, this setting can be handled without increasing the difficulty of the problem.
Iii Semi-Parametric Weighted Null-Space Fitting
The WNSF method consists of three steps [wnsf_tac]. First, we estimate a non-parametric ARX model, with least squares. Second, we reduce this estimate to a parametric model, with least squares. Third, we re-estimate the parametric model, with weighted least squares. We now detail the procedure for each step, without estimating a parametric noise model.
For the first step, consider (8) in the regression form
Then, the least-squares estimate of is obtained by
The asymptotic distribution of the estimates is [ljung&wahlberg92]
Because we are not interested in estimating a parametric noise model, we do not consider the part of (7) from where such a model could be obtained. By convolution, (23) can be written in matrix form as
Motivated by (24), we replace by its estimate (and the same for , which is a part of ) and obtain an estimate of with least squares:
For the third step, we re-estimate taking into account the errors in . As is replaced by in (24), the residuals are given by where
The variance of the estimate is then minimized by solving a weighted least-squares problem, where the weighting is the inverse of the covariance of the residuals. Although this covariance is dependent on the true parameters, a consistent estimate is available by
and the estimate obtained in this step is thus given by
The algorithm may be summarized as follows.
Optionally, we may continue to iterate, potentially improving the estimation quality for finite sample size. However, we show in the next section that Algorithm 1 has the same asymptotic properties as PEM with an infinite-order noise model. However, WNSF estimates the non-parametric noise model in a separate step, as part of an ARX model, which is linear in the model parameters. Thus, it does not make the problem computationally more difficult, unlike if (12) is minimized with an arbitrary large number of noise model parameters.
Iv Theoretical Analysis
In this section, we perform a theoretical analysis of the semi-parametric WNSF method. In particular, we will show that Step 3 in Algorithm 1 provides a consistent and asymptotically efficient estimate. For that, we will need some auxiliary results.
Iv-a Auxiliary Results
To show the aforementioned results, we will need that the estimate obtained in Step 2 of Algorithm 1 is consistent. For that, we have the following result.
The result is a specific case of [wnsf_tac, Theorem 1]. ∎
Although consistency of Step 2 with semi-parametric WNSF is a specific case of the fully-parametric results [wnsf_tac], consistency and asymptotic distribution of Step 3 require additional results. To get further insight into why, we begin by writing, for the estimate from Step 3,
where and , recalling that is a function of according to Assumption 4 (for notational simplicity, we use only instead of in matrix subscripts). To study consistency and asymptotic distribution, the limit value of (33) and the asymptotic distribution of will be analyzed. The challenge in this analysis (compared to [wnsf_tac]) is in how to treat the matrix , and consequently also the matrix , which contains . From (30), , which in [wnsf_tac] may be written as , as the matrix is square therein. However, in the semi-parametric version treated in this paper, is not square, and we cannot analyze by taking inverses of the individual matrices it consists of.
To deal with this issue, we use the approach in [geometric_jonas], writing the aforementioned matrices as projections of the rows of some matrix onto the subspace spanned by the rows of another matrix. This will be applied to the limit value of the matrix , defined by
we may express as
Following the approach in [geometric_jonas], we recognize that the term in (36) can be written as
where denotes the projection of the rows of onto the subspace spanned by the rows of . As , the dimensions of the matrix increase, and the subspace spanned by its rows approaches . Then, the limit value of the projection will be the causal part of the projected matrix.
For a simplified case, suppose that were causal and that its dimension did not depend on (i.e., ). In this case, we would have . In turn, we would then have that . If we now reintroduce that the dimension of depends on (), and we assume that the rows of span as , we have that . These arguments follow from results in [geometric_jonas]. However, handling the dimensional increase of with requires additional technical developments. One of the key results for the asymptotic analysis in this paper is that the aforementioned result (i.e., that ) still holds when the dimensions of increase with . This is considered in the following theorem.
where () are exponentially stable (i.e., , ), , and and are exponentially stable. Then, if ,
See Appendix B. ∎
Iv-B Consistency and Asymptotic Efficiency
Using the auxiliary results derived above, we now show that semi-parametric WNSF is consistent and asymptotically efficient. Regarding consistency of Step 3 in Algorithm 1, we have the following result.
See Appendix C. ∎
Theorem 3 implies that semi-parametric WNSF provides consistent estimates of .
Regarding the asymptotic distribution and covariance of Step 3 in Algorithm 1, we have the following result.
See Appendix D. ∎
In this section, we perform two simulation studies. In the first, we illustrate the asymptotic properties of the method. In the second, we illustrate how estimating a non-parametric noise model with WNSF may be useful in scenarios where a low-order parametrization for the noise model does not capture the noise spectrum accurately enough.
V-a Illustration of asymptotic properties
According to the results in Section IV, semi-parametric WNSF is asymptotically efficient in open loop, with the asymptotic covariance of the dynamic-model estimates given by . In closed loop, the asymptotic covariance is still given by , but in this case it does not correspond to the CR bound, but to the optimal asymptotic covariance when the noise-model order tends to infinity.
To illustrate this, we perform open- and closed-loop simulations such that the closed-loop data are generated by
and the open-loop data by
where and are independent Gaussian white sequences with unit variance, , and
We perform 1000 Monte Carlo runs, with sample sizes . We apply WNSF with an ARX model of order 50 with the open- and closed-loop data. Performance is evaluated by the mean-squared error of the estimated parameter vector of the dynamic model,
As this simulation has the purpose of illustrating asymptotic properties, initial conditions are assumed known—that is, the sums in (20) start at instead of .
The results are presented in Fig. 1, with the average MSE plotted as function of the sample size (closed loop in solid line, open loop in dashed line). We plot also (dotted line), which the average MSE attains both in open and closed loop: because the data were generated such that , the spectrum of (16), is the same for both data sets, both scenarios have the same asymptotic covariance, in accordance to our theoretical results.
V-B Random noise model
When a low-order parametrization of the noise model is not enough to capture the noise spectrum, the noise model may require many parameters. With PEM, a simultaneous estimate of the dynamic model and a long noise model is not numerically robust due to the non-convexity of the cost function. The semi-parametric WNSF is appropriate to deal with this scenario, because the noise spectrum is captured beforehand with a non-parametric ARX model.
Modeling the correct noise spectrum is particularly important in closed loop, where the estimates of the plant will be biased if the noise model is not flexible enough to capture the noise spectrum. For this reason, we consider a closed-loop setting, where data are generated by
The signals and are Gaussian white noise sequences with variances 1 and 4, respectively. The system is given by
the controller by , and the true noise model by
with , where is drawn from a Gaussian distribution with zero mean and unit variance. Here, differently than Assumption 1, stability of is not ensured. However, this is not an issue if the noise is Gaussian, as there must exists an inversely stable for which the noise sequence has the same spectrum.
It may be difficult to find an appropriate low-order parametrization for the noise model (50) in the form
In this case, a Box-Jenkins model is estimated. The most appropriate noise model order may be chosen by using some information criterion, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) [ljung99].
The alternative is to use a high-order model for the noise model. For example,
In particular, the choice (54) has the same structure as the noise model estimated by semi-parametric WNSF.
Motivated by these alternatives, we compare the following methods:
semi-parametric WNSF, with non-parametric model order ;
PEM, with default MATLAB initialization, and noise model (51) with , where the order is chosen using the AIC or BIC criterion (denoted and , respectively);
PEM, with default MATLAB initialization, and noise model (54) with (denoted , where ‘np’ stands for non-parametric).
PEM uses the implementation in MATLAB2016b System Identification Toolbox. All the methods use a maximum of 100 iterations, but stop early upon convergence (default settings for PEM, as tolerance for the normalized relative change in the parameter estimates for WNSF). The search algorithm used by PEM is chosen automatically. The noise model (53) was not used with PEM for computational reasons: the optimization becomes extremely slow as stability of the inverse of the noise model when computing the prediction errors (11) is difficult to fulfill with so many parameters, while the inverse of any estimate of (54) is always stable.
We use sample sizes and perform 100 Monte Carlo runs. Performance is evaluated by the FIT of the impulse response of the dynamic model, given by
in percent, where is a vector with the impulse response parameters of ( is its average), and similarly for but for the estimated model. In (56), sufficiently long impulse responses are taken to make sure that the truncation of their tails does not affect the FIT.
The FITs for the different sample sizes are shown in Fig. 2. For , WNSF has the most robust performance, with smaller whiskers than the remaining algorithms and no low-performance occurrences, which occur when PEM is used with AIC/BIC or with a a non-parametric noise model. Among the PEM alternatives, an AIC/BIC criterion with a Box-Jenkins model with orders up to 30 performed better than using a non-parametric noise model. For , WNSF and PEM with non-parametric noise model have similar median performance, put the algorithm for PEM failed more often. Here, PEM with AIC/BIC had no low-performance outliers, but the median performance was poorer than for WNSF and PEM. Similar conclusions can be drawn for , where we observe that PEM does not necessarily provide better results with more data samples, potentially due to numerical problems.
Overall, WNSF showed more robust performance among the sample sizes used. However, an even more evident advantage is the computational time. Table I shows the average times, in seconds, required for the identification using WNSF, PEM with all the orders computed for AIC and BIC, and , for the different sample sizes (all the computations were performed in the same computer). Here, we observe how WNSF requires much lower computational time than the alternatives. This is a consequence of PEM estimating the noise model in the non-linear optimization procedure, while in WNSF the high-order model is estimated in a previous least-squares step, which is numerically robust. Moreover, the time required for WNSF and PEM with AIC/BIC does not change significantly with , unlike with PEM. In this case, the time does not necessarily decrease for smaller . The problem arising when using smaller is that the cost function more likely becomes ill-conditioned at some parameter values during the optimization.
Many standard system identification methods provide biased estimates with closed-loop data. In the particular case of PEM, the bias issue is avoided by choosing a noise-model order that is large enough to capture the noise spectrum. An appropriate order is often difficult to choose, and making it arbitrarily large increases the numerical problems of PEM. In [wnsf_tac], where WNSF is analyzed, it is stated that the method can be used without a parametric noise-model estimate, named the semi-parametric WNSF. In this paper, we deepen this discussion.
A simulation study illustrates the importance of separating the dynamic- and noise-model identification when a high-order noise model is required, both in terms of performance and computational time. With WNSF, this separation always occurs, as the method first estimates a high-order ARX model. Then, a low-order noise model does not need to be obtained, as the noise spectrum has been captured in the first step.
We also provide a theoretical analysis of the asymptotic properties. To this end, we extend the geometric approach in [geometric_jonas], deriving a more general result that may also be useful for variance analysis of other methods. We show that semi-parametric WNSF provides consistent estimates of the dynamic model with closed-loop data. With open-loop data, the estimates are also asymptotically efficient; with closed-loop data, the asymptotic covariance of the estimates corresponds to the best possible covariance with a non-parametric noise model. This gives WNSF attractive features in terms of flexibility of noise-model structure and asymptotic properties: if a correct parametric noise model is estimated, the dynamic-model estimates are asymptotically efficient; if not, they are consistent and optimal for a non-parametric noise model. We used a simulation study to illustrate how semi-parametric WNSF is an appropriate method for scenarios where the noise model cannot be accurately modeled with a low-order parametrization.
Appendix A Auxiliary Results for the Proof of Theorem 2
The following results will be used to prove Theorem 2.
Extension of Cauchy-Schwarz inequality for transfer-matrix inner products
Let and be transfer matrices and and be vectors of appropriate dimensions. Then, we have
Bound for spectral norm of inner product of transfer matrices
Let be a transfer matrix. Then, we have
where the second inequality follows from for a positive semi-definite matrix .
Bound for Toeplitz operators of stable filters
Let . From [peller, Chapter 3], we have
where the inequality is due to the matrix on the right-hand side of (59) being a block of the infinite-dimensional matrix on the left-hand side.
Appendix B Proof of Theorem 2
In this appendix, we prove Theorem 2.