Asymptotic Analysis of Semi-Parametric Weighted Null-Space Fitting Identification
Abstract
Standard system identification methods often provide biased estimates with closed-loop data. With the prediction error method (PEM), the bias issue is solved by using a noise model that is flexible enough to capture the noise spectrum. However, an overly flexible noise model (i.e., one with too many parameters) can cause additional numerical problems for PEM. In this paper, we perform a theoretical analysis of the weighted null-space fitting (WNSF) method when a parametric noise model is not estimated. With this method, the system is first captured using a nonparametric ARX model, which is then reduced to a parametric model of interest. In the reduction step, a noise model does not need to be estimated if it is not of interest. In open loop, this still provides asymptotically efficient estimates of the dynamic model. In closed loop, the estimates are consistent, and their covariance is optimal for a nonparametric noise model. We prove these results, which require additional technical details compared with the case with a full parametric model structure. In particular, we use a geometric approach for variance analysis, deriving a new result that will be instrumental to this end. Finally, we use a simulation study to illustrate the benefits of the method when the noise model cannot be parametrized by a low-order model.
I Introduction
The prediction error method (PEM) is a benchmark for estimation of linear parametric models. If the model orders are correct and the noise is Gaussian, PEM with a quadratic cost function is asymptotically efficient [ljung99]: the asymptotic covariance of the estimates is the Cramér-Rao (CR) bound, the lowest covariance attainable by a consistent estimator.
Two models can typically be distinguished in a parametric model structure: the dynamic model and the noise model. Because the noise sequence is often the result of different noise contributions aggregated in a complex or even intractable manner, the concept of a “correct order” for the noise model is often inappropriate in practice. If the noise-model order is chosen too small with PEM, the dynamic-model estimate will be consistent in open loop, but biased in closed loop.
The bias issue with closed-loop data is not exclusive to PEM. Instrumental variable methods [ivbook_soderstrom_stoica] require the reference signal to construct the instruments in closed loop [cliv, clriv]. For classical subspace methods [verdew92, vanodem94], the bias issue in closed loop has been overcome by more recent algorithms [verhaegen:closedloop, jansson03].
With PEM, the bias issue can in theory be solved by letting the noise model structure be more flexible (i.e., letting the number of parameters become very large), guaranteeing that the noise spectrum is captured. If the global minimum of the PEM cost function is found, in open loop this will asymptotically not affect the dynamicmodel estimates; in closed loop, consistency is attained but not efficiency. The problem is that, because the noise model might require many parameters, the PEM cost function will potentially have more local minima, thus aggravating the numerical search for the global minimum.
Some methods use a semiparametric approach: a nonparametric noise model is estimated in a first step, and then used in a second step to estimate the dynamic model. Possible approaches have been delineated both in the frequency [freqdom_nonparH, Pintelon2010a, Pintelon2010b] and time domains [zhu_book, bjsm, MORSM].
The weighted null-space fitting (WNSF) method [galrinho14] also first estimates a nonparametric model, and then reduces it to a full parametric model (i.e., dynamic and noise models). Moreover, WNSF does not apply nonlinear optimization techniques, but uses weighted least squares iteratively. In this sense, the method can be seen as belonging to the family of iterative least-squares methods. These methods date back to [sanathanankoerner], and have later been denoted iterative quadratic maximum likelihood (IQML) methods, with applications to filter design [evansfischl73, bresler86] and identification of dynamical systems [shaw94, shawmisra94, lemmerlingettal01]. The Steiglitz-McBride method [stmcb_original] also belongs to this class of methods, being equivalent to IQML in an impulse-input case [mcclellanlee91]. However, unlike in the aforementioned identification works, WNSF is asymptotically efficient in one iteration in open and closed loop for Box-Jenkins systems [wnsf_tac].
Instead of simultaneously estimating the dynamic and noise models, WNSF also allows one to disregard the parametric noise model, reducing the nonparametric model estimate to obtain a parametric dynamic model only. This guarantees asymptotically efficient estimates in open loop, and consistent estimates in closed loop with optimal asymptotic covariance when an infinite-order noise model is used. This distinguishes the method from [bjsm, MORSM], which are developed for open loop. The asymptotic properties of the proposed method correspond to those of PEM with an infinite-order noise model [Forssell_cl] in both open and closed loop, but are attained with a numerically robust procedure.
In [wnsf_tac], an extensive study of the fully parametric WNSF method has been conducted, including theoretical analysis and simulations. The case where WNSF is used with no parametric noise-model estimate—denoted semiparametric WNSF—has been addressed and illustrated in [wnsf_tac], but a formal proof of the asymptotic properties has not been provided. In this paper, we perform the analysis of this setting, which has additional difficulties because some of the matrices involved are not square. For this purpose, we use the geometric approach of [geometric_jonas], extending the results therein to the setting of our problem. Thus, our contributions are: first, we derive a result for variance analysis in system identification; second, we use this result to perform a theoretical analysis of the semiparametric WNSF method, proving consistency and deriving the asymptotic covariance of the estimates in open and closed loop. This analysis is complemented with a simulation study to illustrate the benefits of the method.
II Preliminaries
Essentially, the same notation, definitions, and assumptions used in [wnsf_tac] apply to this paper. For convenience, we reproduce the notation and assumptions here; for the definitions of stability and quasistationarity, see [ljung&wahlberg92, wnsf_tac].
II-A Notation

, with an vector.

, with a matrix and a vector of appropriate dimensions.

(i.e., the Frobenius norm), with a matrix.

, with a transfer matrix.

.

denotes any constant, which need not be the same in different expressions.

, where is the backward time-shift operator.

is the complex conjugate transpose of the matrix .

is the lower-triangular Toeplitz matrix of size () with first column , where . The dimension may be infinity, denoted .

denotes expectation of the random vector .

.

means that the function tends to zero at a rate not slower than , as , w.p.1.

, where and are transfer matrices of appropriate sizes.
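The lower-triangular Toeplitz operator from the notation above is used repeatedly in the reduction and weighting steps of the method. As a minimal illustrative sketch (the function name and zero-padding convention are our own, not from the paper), it can be built as:

```python
import numpy as np

def lower_toeplitz(c, n):
    """n x n lower-triangular Toeplitz matrix whose first column is c,
    zero-padded (or truncated) to length n."""
    c = np.concatenate([np.asarray(c, dtype=float),
                        np.zeros(max(0, n - len(c)))])[:n]
    T = np.zeros((n, n))
    for j in range(n):
        T[j:, j] = c[: n - j]   # column j is the first column shifted down by j
    return T
```

For example, `lower_toeplitz([1, 2, 3], 3)` yields the matrix with rows `[1, 0, 0]`, `[2, 1, 0]`, `[3, 2, 1]`.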
II-B Assumptions
Assumption 1 (True system and parametric model).
The system has scalar input , scalar output , and is subject to scalar noise . These signals are related by
(1) 
where and are rational functions according to
(2)  
The transfer functions , , and are assumed to be stable. The polynomials and —as well as and —do not share common factors.
We parametrize as
(3) 
where
(4) 
is the parameter vector to estimate, with known orders and . The noise model orders and are not known.
Assumption 2 (Noise).
The noise sequence is a stochastic process that satisfies
(5) 
Assumption 3 (Input).
The input sequence is defined by under the following conditions.

The sequence is assumed to be independent of , quasistationary with , and uniformly bounded: , .

Let the spectral factorization of the power spectral density of be . Then, is assumed to be stable, with .

The closed-loop system is stable with .

The feedback transfer function is bounded on the unit circle.

The spectral density of the process is bounded from below by the matrix , (this implies an informative experiment).
Operation in open loop is obtained by taking .
Alternatively to (1), the true system can be written as
(6) 
where
(7)  
are stable (Assumption 1). In an intermediate step, WNSF estimates truncated and , using the ARX model
(8) 
where
(9)  
(10) 
Because the order needs to be infinite for the system to be in the ARX model set, we make the model order depend on the sample size —denoted —according to the following assumption.
Assumption 4 (ARX-model order).
The ARX model order is selected according to:

, as ;

, for some , as .
II-C Prediction Error Method
The prediction error method minimizes a cost function of the prediction errors
(11) 
where is some noise model parametrization, function of a parameter vector . Using a quadratic cost function, which is optimal when the noise sequence is Gaussian, the PEM estimate of the parameters is obtained by minimizing
(12) 
where is the sample size.
Let be such that there exists for which . Denoting by the parameter vector that (together with some ) minimizes (12), the estimate is asymptotically distributed as [ljung99]
(13) 
where stands for the Gaussian distribution. The covariance matrix satisfies
(14) 
with (arguments often omitted for notational simplicity)
(15) 
and the spectrum of
(16) 
where is the sensitivity function.
In open loop (in which case is simply the input spectrum), , and it corresponds to the CR bound. In closed loop, when the number of parameters in tends to infinity. In this case, does not correspond to the CR bound, but to the best possible covariance (from a prediction error perspective) with a nonparametric noise model [Forssell_cl].
The interest of estimating a nonparametric noise model in closed loop is that, even if the noise spectrum needs to be captured by a high-order model, it will still be possible to obtain an estimate of the dynamic model . However, estimating a nonparametric noise model simultaneously with a parametric dynamic model with PEM is not realistic. The reason is that, as the number of parameters in increases, the prediction error (11) becomes a more complicated function of , and finding the global minimum of the nonconvex cost function (12) becomes more difficult. Consequently, the result that PEM with a nonparametric noise model provides estimates with covariance corresponding to may not always be useful in practice. With WNSF, this setting can be handled without increasing the difficulty of the problem.
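To make the nonconvexity concrete, the following sketch minimizes a quadratic prediction-error cost for a simple output-error model with a general-purpose optimizer. The system values and names here are illustrative, not from the paper; the point is that the predictor is a nonlinear function of the parameters, so the numerical search can in general be trapped in local minima:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.signal import lfilter

rng = np.random.default_rng(0)

# Illustrative data: y = G(q)u + e with G(q) = b q^-1 / (1 + f q^-1)
N = 2000
u = rng.standard_normal(N)
b_true, f_true = 0.5, -0.7
y = lfilter([0.0, b_true], [1.0, f_true], u) + 0.1 * rng.standard_normal(N)

def pem_cost(theta):
    b, f = theta
    if abs(f) >= 0.999:                      # guard against an unstable predictor
        return 1e6
    y_hat = lfilter([0.0, b], [1.0, f], u)   # one-step predictor (white-noise model)
    return np.mean((y - y_hat) ** 2)         # quadratic PEM cost, cf. (12)

res = minimize(pem_cost, x0=np.array([0.1, 0.0]), method="Nelder-Mead")
b_hat, f_hat = res.x
```

With this short predictor the search converges easily, but each extra noise-model parameter makes the cost surface harder to search, which is the issue the text describes.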
III Semi-Parametric Weighted Null-Space Fitting
The WNSF method consists of three steps [wnsf_tac]. First, we estimate a nonparametric ARX model, with least squares. Second, we reduce this estimate to a parametric model, with least squares. Third, we reestimate the parametric model, with weighted least squares. We now detail the procedure for each step, without estimating a parametric noise model.
For the first step, consider (8) in the regression form
(17)  
(18) 
Then, the least-squares estimate of is obtained by
(19) 
where
(20) 
The asymptotic distribution of the estimates is [ljung&wahlberg92]
(21) 
where
(22)  
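As a minimal sketch of this first step (the data-generating values are illustrative, not from the paper), the truncated high-order ARX model is estimated with ordinary least squares on a regression matrix of delayed outputs and inputs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: y_t = 0.6 y_{t-1} + 0.5 u_{t-1} + e_t (an exact ARX system)
N, n = 5000, 20                     # sample size and high ARX order n = n(N)
u = rng.standard_normal(N)
e = 0.1 * rng.standard_normal(N)
y = np.zeros(N)
for t in range(1, N):
    y[t] = 0.6 * y[t - 1] + 0.5 * u[t - 1] + e[t]

# Row t of the regression matrix: [-y_{t-1}, ..., -y_{t-n}, u_{t-1}, ..., u_{t-n}]
Phi = np.array([np.concatenate([-y[t - n:t][::-1], u[t - n:t][::-1]])
                for t in range(n, N)])
eta_hat = np.linalg.lstsq(Phi, y[n:], rcond=None)[0]   # least-squares ARX estimate
a_hat, b_hat = eta_hat[:n], eta_hat[n:]                # A- and B-polynomial coefficients
```

This step is a linear least-squares problem regardless of how large n is chosen, which is what makes the high-order noise description numerically cheap.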
For the second step, we obtain an estimate of , from the nonparametric ARX-model estimate. For this purpose, we use (2) and (7) to write
(23) 
Because we are not interested in estimating a parametric noise model, we do not consider the part of (7) from where such a model could be obtained. By convolution, (23) can be written in matrix form as
(24) 
where is given by (9) evaluated at the true coefficients of (7), consists of the last coefficients of , and
(25)  
(26) 
Motivated by (24), we replace by its estimate (and the same for , which is a part of ) and obtain an estimate of with least squares:
(27) 
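A minimal sketch of this reduction step (all numeric values are illustrative, and exact truncated coefficients stand in for the ARX estimate): with a plant G = L/F and high-order ARX polynomials A and B, the relation B(q)F(q) = L(q)A(q) is linear in the coefficients of F and L once the leading coefficient of F is fixed to one, so the reduction is a linear least-squares problem:

```python
import numpy as np

# Illustrative low-order plant G(q) = l1 q^-1 / (1 + f1 q^-1)
l1_true, f1_true = 0.5, -0.7
n, m = 20, 1                                   # ARX truncation order, plant order

# Stand-in for the high-order ARX estimate: exact truncated coefficients,
# with A(q) = 1 - 0.3 q^-1 playing the role of the inverse noise model
A = np.concatenate(([1.0, -0.3], np.zeros(n - 1)))        # length n + 1
g = l1_true * (-f1_true) ** np.arange(n)                  # impulse response of G
B = np.convolve(np.concatenate(([0.0], g)), [1.0, -0.3])[: n + 1]

def conv_matrix(p, k):
    """Matrix M with M @ x = np.convolve(p, x) for len(x) == k."""
    M = np.zeros((len(p) + k - 1, k))
    for j in range(k):
        M[j : j + len(p), j] = p
    return M

# B(q) F(q) = L(q) A(q), with F = 1 + f1 q^-1 + ... and L = l1 q^-1 + ...
Cb = conv_matrix(B, m + 1)                     # multiplies [1, f1, ..., fm]
Ca = conv_matrix(A, m + 1)                     # multiplies [0, l1, ..., lm]
M = np.hstack([Cb[:, 1:], -Ca[:, 1:]])
theta_hat = np.linalg.lstsq(M, -Cb[:, 0], rcond=None)[0]
f_hat, l_hat = theta_hat[:m], theta_hat[m:]
```

With noisy ARX estimates in place of the exact coefficients, the same linear system gives the unweighted Step 2 estimate.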
For the third step, we reestimate taking into account the errors in . As is replaced by in (24), the residuals are given by where
(28)  
(29) 
The variance of the estimate is then minimized by solving a weighted leastsquares problem, where the weighting is the inverse of the covariance of the residuals. Although this covariance is dependent on the true parameters, a consistent estimate is available by
(30) 
and the estimate obtained in this step is thus given by
(31) 
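The weighted least-squares computation itself is standard; a minimal generic sketch follows (the actual weighting in WNSF is the inverse residual-covariance estimate in (30), which we abstract here as a given positive definite matrix W):

```python
import numpy as np

def weighted_lstsq(M, r, W):
    """Solve min_x (r - M x)^T W (r - M x) via the normal equations."""
    return np.linalg.solve(M.T @ W @ M, M.T @ W @ r)

# Toy usage: two measurements of the same quantity, the first three times
# as reliable; the weighted estimate tilts toward the more reliable one
M = np.array([[1.0], [1.0]])
r = np.array([1.0, 3.0])
W = np.diag([3.0, 1.0])
x_hat = weighted_lstsq(M, r, W)   # (3*1 + 1*3) / (3 + 1) = 1.5
```

Replacing M and r with the Step 2 quantities and W with the estimated inverse covariance of the residuals gives the Step 3 estimate.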
The algorithm may be summarized as follows.
Algorithm 1.
Optionally, we may continue to iterate, potentially improving the estimation quality for finite sample size. We show in the next section that Algorithm 1 has the same asymptotic properties as PEM with an infinite-order noise model. Unlike PEM, however, WNSF estimates the nonparametric noise model in a separate step, as part of an ARX model, which is linear in the model parameters. Thus, it does not make the problem computationally more difficult, unlike minimizing (12) with an arbitrarily large number of noise-model parameters.
IV Theoretical Analysis
In this section, we perform a theoretical analysis of the semiparametric WNSF method. In particular, we will show that Step 3 in Algorithm 1 provides a consistent and asymptotically efficient estimate. For that, we will need some auxiliary results.
IV-A Auxiliary Results
To show the aforementioned results, we will need that the estimate obtained in Step 2 of Algorithm 1 is consistent. For that, we have the following result.
Proof.
The result is a specific case of [wnsf_tac, Theorem 1]. ∎
Although consistency of Step 2 with semiparametric WNSF is a specific case of the fully parametric results [wnsf_tac], consistency and asymptotic distribution of Step 3 require additional results. To get further insight into why, we begin by writing, for the estimate from Step 3,
(33) 
where and , recalling that is a function of according to Assumption 4 (for notational simplicity, we use only instead of in matrix subscripts). To study consistency and asymptotic distribution, the limit value of (33) and the asymptotic distribution of will be analyzed. The challenge in this analysis (compared to [wnsf_tac]) is in how to treat the matrix , and consequently also the matrix , which contains . From (30), , which in [wnsf_tac] may be written as , as the matrix is square therein. However, in the semiparametric version treated in this paper, is not square, and we cannot analyze by taking inverses of the individual matrices it consists of.
To deal with this issue, we use the approach in [geometric_jonas], writing the aforementioned matrices as projections of the rows of some matrix onto the subspace spanned by the rows of another matrix. This will be applied to the limit value of the matrix , defined by
(34) 
Writing , , and (defined in (26), (29), and (20), respectively) in the frequency domain as
(35)  
we may express as
(36) 
where
(37)  
Following the approach in [geometric_jonas], we recognize that the term in (36) can be written as
(38) 
where denotes the projection of the rows of onto the subspace spanned by the rows of . As , the dimensions of the matrix increase, and the subspace spanned by its rows approaches . Then, the limit value of the projection will be the causal part of the projected matrix.
For a simplified case, suppose that were causal and that its dimension did not depend on (i.e., ). In this case, we would have . In turn, we would then have that . If we now reintroduce that the dimension of depends on (), and we assume that the rows of span as , we have that . These arguments follow from results in [geometric_jonas]. However, handling the dimensional increase of with requires additional technical developments. One of the key results for the asymptotic analysis in this paper is that the aforementioned result (i.e., that ) still holds when the dimensions of increase with . This is considered in the following theorem.
Theorem 2.
Let
(39)  
(40) 
where () are exponentially stable (i.e., , ), , and and are exponentially stable. Then, if ,
(41) 
Proof.
See Appendix B. ∎
IV-B Consistency and Asymptotic Efficiency
Using the auxiliary results derived above, we now show that semiparametric WNSF is consistent and asymptotically efficient. Regarding consistency of Step 3 in Algorithm 1, we have the following result.
Proof.
See Appendix C. ∎
Theorem 3 implies that semiparametric WNSF provides consistent estimates of .
Regarding the asymptotic distribution and covariance of Step 3 in Algorithm 1, we have the following result.
Proof.
See Appendix D. ∎
V Simulations
In this section, we perform two simulation studies. In the first, we illustrate the asymptotic properties of the method. In the second, we illustrate how estimating a nonparametric noise model with WNSF may be useful in scenarios where a low-order parametrization for the noise model does not capture the noise spectrum accurately enough.
V-A Illustration of asymptotic properties
According to the results in Section IV, semiparametric WNSF is asymptotically efficient in open loop, with the asymptotic covariance of the dynamic-model estimates given by . In closed loop, the asymptotic covariance is still given by , but in this case it does not correspond to the CR bound, but to the optimal asymptotic covariance when the noise-model order tends to infinity.
To illustrate this, we perform open- and closed-loop simulations such that the closed-loop data are generated by
(44)  
and the open-loop data by
(45)  
where and are independent Gaussian white sequences with unit variance, , and
(46) 
We perform 1000 Monte Carlo runs, with sample sizes . We apply WNSF with an ARX model of order 50 with the open- and closed-loop data. Performance is evaluated by the mean-squared error of the estimated parameter vector of the dynamic model,
(47) 
As this simulation has the purpose of illustrating asymptotic properties, initial conditions are assumed known—that is, the sums in (20) start at instead of .
The results are presented in Fig. 1, with the average MSE plotted as a function of the sample size (closed loop in solid line, open loop in dashed line). We also plot (dotted line), which the average MSE attains both in open and closed loop: because the data were generated such that , the spectrum of (16) is the same for both data sets, and both scenarios have the same asymptotic covariance, in accordance with our theoretical results.
V-B Random noise model
When a low-order parametrization of the noise model is not enough to capture the noise spectrum, the noise model may require many parameters. With PEM, simultaneously estimating the dynamic model and a long noise model is not numerically robust due to the nonconvexity of the cost function. Semiparametric WNSF is appropriate for this scenario, because the noise spectrum is captured beforehand with a nonparametric ARX model.
Modeling the correct noise spectrum is particularly important in closed loop, where the estimates of the plant will be biased if the noise model is not flexible enough to capture the noise spectrum. For this reason, we consider a closedloop setting, where data are generated by
(48)  
The signals and are Gaussian white noise sequences with variances 1 and 4, respectively. The system is given by
(49) 
the controller by , and the true noise model by
(50) 
with , where is drawn from a Gaussian distribution with zero mean and unit variance. Here, unlike in Assumption 1, stability of is not ensured. However, this is not an issue if the noise is Gaussian, as there must exist an inversely stable for which the noise sequence has the same spectrum.
It may be difficult to find an appropriate loworder parametrization for the noise model (50) in the form
(51) 
where
(52) 
In this case, a Box-Jenkins model is estimated. The most appropriate noise-model order may be chosen by using some information criterion, such as the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) [ljung99].
The alternative is to use a highorder model for the noise model. For example,
(53) 
or
(54) 
where
(55) 
In particular, the choice (54) has the same structure as the noise model estimated by semiparametric WNSF.
Motivated by these alternatives, we compare the following methods:

semiparametric WNSF, with nonparametric model order ;

PEM, with default MATLAB initialization, and noise model (51) with , where the order is chosen using the AIC or BIC criterion (denoted and , respectively);

PEM, with default MATLAB initialization, and noise model (54) with (denoted , where ‘np’ stands for nonparametric).
PEM uses the implementation in the MATLAB 2016b System Identification Toolbox. All the methods use a maximum of 100 iterations, but stop early upon convergence (default settings for PEM, as tolerance for the normalized relative change in the parameter estimates for WNSF). The search algorithm used by PEM is chosen automatically. The noise model (53) was not used with PEM for computational reasons: the optimization becomes extremely slow, as stability of the inverse of the noise model when computing the prediction errors (11) is difficult to fulfill with so many parameters, while the inverse of any estimate of (54) is always stable.
We use sample sizes and perform 100 Monte Carlo runs. Performance is evaluated by the FIT of the impulse response of the dynamic model, given by
(56) 
in percent, where is a vector with the impulse response parameters of ( is its average), and similarly for but for the estimated model. In (56), sufficiently long impulse responses are taken to make sure that the truncation of their tails does not affect the FIT.
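Consistent with the description of (56), the FIT is the standard normalized impulse-response fit in percent; a minimal sketch (function name ours):

```python
import numpy as np

def fit_percent(g, g_hat):
    """100 * (1 - ||g - g_hat|| / ||g - mean(g)||); 100 means a perfect fit."""
    g = np.asarray(g, dtype=float)
    g_hat = np.asarray(g_hat, dtype=float)
    return 100.0 * (1.0 - np.linalg.norm(g - g_hat)
                    / np.linalg.norm(g - g.mean()))
```

A perfect match gives 100; the value is unbounded below, so poor estimates can produce large negative FITs.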
The FITs for the different sample sizes are shown in Fig. 2. For , WNSF has the most robust performance, with smaller whiskers than the remaining algorithms and no low-performance occurrences, which occur when PEM is used with AIC/BIC or with a nonparametric noise model. Among the PEM alternatives, an AIC/BIC criterion with a Box-Jenkins model with orders up to 30 performed better than using a nonparametric noise model. For , WNSF and PEM with a nonparametric noise model have similar median performance, but the algorithm for PEM failed more often. Here, PEM with AIC/BIC had no low-performance outliers, but the median performance was poorer than for WNSF and PEM. Similar conclusions can be drawn for , where we observe that PEM does not necessarily provide better results with more data samples, potentially due to numerical problems.
Overall, WNSF showed the most robust performance across the sample sizes used. However, an even more evident advantage is the computational time. Table I shows the average times, in seconds, required for the identification using WNSF, PEM with all the orders computed for AIC and BIC, and , for the different sample sizes (all the computations were performed on the same computer). Here, we observe that WNSF requires far less computational time than the alternatives. This is a consequence of PEM estimating the noise model in the nonlinear optimization procedure, while in WNSF the high-order model is estimated in a prior least-squares step, which is numerically robust. Moreover, the time required for WNSF and PEM with AIC/BIC does not change significantly with , unlike with PEM. In this case, the time does not necessarily decrease for smaller . The problem arising when using smaller is that the cost function more likely becomes ill-conditioned at some parameter values during the optimization.
Method           N = 1000   N = 5000   N = 10000
WNSF                 1.03      0.907        1.29
PEM (AIC/BIC)        30.3       26.8        38.1
PEM (np)              641        133         236
VI Conclusion
Many standard system identification methods provide biased estimates with closed-loop data. In the particular case of PEM, the bias issue is avoided by choosing a noise-model order that is large enough to capture the noise spectrum. An appropriate order is often difficult to choose, and making it arbitrarily large increases the numerical problems of PEM. In [wnsf_tac], where WNSF is analyzed, it is stated that the method can be used without a parametric noise-model estimate, termed semiparametric WNSF. In this paper, we deepen this discussion.
A simulation study illustrates the importance of separating the dynamic- and noise-model identification when a high-order noise model is required, both in terms of performance and computational time. With WNSF, this separation always occurs, as the method first estimates a high-order ARX model. Then, a low-order noise model does not need to be obtained, as the noise spectrum has been captured in the first step.
We also provide a theoretical analysis of the asymptotic properties. To this end, we extend the geometric approach in [geometric_jonas], deriving a more general result that may also be useful for variance analysis of other methods. We show that semiparametric WNSF provides consistent estimates of the dynamic model with closed-loop data. With open-loop data, the estimates are also asymptotically efficient; with closed-loop data, the asymptotic covariance of the estimates corresponds to the best possible covariance with a nonparametric noise model. This gives WNSF attractive features in terms of flexibility of noise-model structure and asymptotic properties: if a correct parametric noise model is estimated, the dynamic-model estimates are asymptotically efficient; if not, they are consistent and optimal for a nonparametric noise model. We used a simulation study to illustrate how semiparametric WNSF is an appropriate method for scenarios where the noise model cannot be accurately modeled with a low-order parametrization.
Appendix A Auxiliary Results for the Proof of Theorem 2
The following results will be used to prove Theorem 2.
Extension of Cauchy-Schwarz inequality for transfer-matrix inner products
Let and be transfer matrices and and be vectors of appropriate dimensions. Then, we have
(57)  
Bound for spectral norm of inner product of transfer matrices
Let be a transfer matrix. Then, we have
(58)  
where the second inequality follows from for a positive semidefinite matrix .
Bound for Toeplitz operators of stable filters
Let . From [peller, Chapter 3], we have
(59) 
where the inequality is due to the matrix on the right-hand side of (59) being a block of the infinite-dimensional matrix on the left-hand side.
Appendix B Proof of Theorem 2
In this appendix, we prove Theorem 2.
Inner Projection:
Let
(60) 
where
(61)  