Gray-Box Identification of State-Space Models Using Difference-of-Convex Programming
Abstract
Gray-box identification is prevalent in modeling physical and networked systems. However, due to the nonconvex nature of the gray-box identification problem, good initial parameter estimates are crucial for a successful application. In this paper, a new identification method is proposed by exploiting the low-rank and structured Hankel matrix of the impulse response. The identification problem is recast into a difference-of-convex programming problem, which is then solved by a sequential convex programming approach, with the associated initialization obtained by nuclear-norm optimization. The presented method aims to achieve the maximum impulse-response fit while not requiring additional (nonconvex) conditions to secure nonsingularity of the similarity transformation relating the given state-space matrices to the gray-box parameterized ones. This overcomes a persistent shortcoming in a number of recent contributions on this topic, and the new method can be applied to the structured state-space realization even if the involved system parameters are unidentifiable. The method can be used both for directly estimating the gray-box parameters and for providing initial parameter estimates for further iterative search in a conventional gray-box identification setup.
I. Introduction
The control and identification of structured state-space models have attracted great attention in the control community. There are two main sources of structured state-space models: the modeling of practical physical systems [1, 2, 3] and the description of networked systems [4, 5]. When modeling physical systems, the nonzero entries of the system matrices often have physical meanings, such as masses, velocities, and accelerations. Identification of the physical parameters can provide a better understanding of the inner physical mechanism of the investigated object. On the other hand, a networked system can often be represented as a structured system, with the structure straightforwardly determined by the interconnections among the involved subsystems. Identification of such structured system models provides the foundation for model-based network control.
In the literature, there are two kinds of methods to identify structured state-space models. The first is the traditional gray-box setup: identify the parameterized state-space model directly from the input-output (I/O) data using the prediction-error method [2], [3]. Since the involved identification problem is generally a nonconvex optimization problem, the conventional nonlinear optimization methods, such as the regularized Gauss-Newton method [2, Section 10.2] and the gradient projection method [3, Chapter 7], are sensitive to the initial parameter estimate. This traditional setup thus requires reasonable knowledge of the parameters and structures to be identified. Since the gray-box situation starts from some physical insight, this knowledge may be sufficient in some cases, but too demanding in others. Resorting to testing random initial parameters may not be feasible for large problems.
The other approach to structured state-space models is to first estimate an unstructured, black-box model using, e.g., subspace identification methods, followed by the recovery of the physical parameters embedded in the concerned structured model. Using the classical subspace identification methods, such as MOESP and N4SID [2, 3], the system matrices in the first step can be consistently estimated under some mild conditions. The parameter recovery in the second step turns out to be a small-scale bilinear optimization problem.
To solve the bilinear optimization problem involved in the gray-box identification, an alternating minimization algorithm was developed in [6] and a null-space-based method was proposed in [7]. In order to prevent a singular similarity transformation, a nonsmooth optimization approach was presented in [8]. Furthermore, in order to avoid estimating the similarity transformation, an $H_\infty$-norm-based identification algorithm was proposed in [9]. The above-mentioned algorithms are sensitive to initial conditions. To cope with this problem, the bilinear optimization problem was reformulated into a sum-of-squares polynomial problem, which is then solved by semidefinite programming [10]; however, this technique is limited to solving small-scale problems having only a few unknown variables.
In this paper, a difference-of-convex programming (DCP) based method is developed for the identification of structured state-space models. This approach estimates the system parameters by the structured factorization of a block Hankel matrix of the system impulse response, which is inspired by the Ho-Kalman realization method [3]. More explicitly, the proposed method boils down to solving a low-rank structured matrix factorization problem. This nonconvex optimization problem is transformed into a DCP problem, which is then tackled by the sequential convex programming technique [11].
The advantages of the proposed method over many recently developed methods are as follows. Different from the identification method in [8], the proposed algorithmic framework avoids the nonsingularity constraint on the similarity transformation and can be applied to the realization of non-identifiable gray-box models. Unlike the model-matching method [9], which requires solving an infinite-dimensional optimization problem, the proposed identification method is finite-dimensional and thus more computationally amenable. Moreover, compared with other gradient-based or alternating minimization methods [6, 3], the proposed identification method performs well in practice thanks to the high-quality initial conditions obtained by solving the convexified low-rank and structured matrix factorization problem.
The rest of the paper is organized as follows. Section II formulates the identification problem of gray-box models. Section III reviews gray-box identification using the classical prediction-error method. Section IV gives an alternative route for gray-box identification, which is to identify a black-box model first, followed by the recovery of the system parameters via a bilinear optimization problem. Section V provides a new method for solving the bilinear optimization problem. Section VI demonstrates the performance of the proposed identification method, followed by some conclusions in Section VII.
II. Problem Statement
We consider a parameterized state-space model as follows:

$$\dot{x}(t) = A(\theta)\,x(t) + B(\theta)\,u(t), \qquad y(kh) = C(\theta)\,x(kh) + v(kh), \tag{1}$$

where $u(t) \in \mathbb{R}^m$, $x(t) \in \mathbb{R}^n$, $y(kh) \in \mathbb{R}^p$, and $v(kh) \in \mathbb{R}^p$ are the system input, state, output, and measurement noise, respectively; $\theta \in \mathbb{R}^{n_\theta}$ is the parameter vector; $t$ and $k$ represent continuous and discrete time indices, respectively; $h$ is the sampling period.
The parameter vector $\theta$ in (1) typically represents unknown values of physical coefficients. Here, we assume that the structured system matrices are affine with respect to $\theta$, i.e.,

$$A(\theta) = A_0 + \sum_{i=1}^{n_\theta}\theta_i A_i, \qquad B(\theta) = B_0 + \sum_{i=1}^{n_\theta}\theta_i B_i, \qquad C(\theta) = C_0 + \sum_{i=1}^{n_\theta}\theta_i C_i,$$

where the coefficient matrices $\{A_i\}$, $\{B_i\}$, and $\{C_i\}$ are known. Besides the structures of the system matrices, the system order $n$ of (1) is known as a priori knowledge as well.
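As an illustrative sketch (a hypothetical second-order example, not taken from the paper), the affine parameterization can be assembled as follows; the coefficient matrices below are assumptions chosen only for illustration.

```python
import numpy as np

# Hypothetical coefficient matrices for A(theta) = A0 + theta_1*A1 + theta_2*A2.
A0 = np.array([[0.0, 1.0], [0.0, 0.0]])
A1 = np.array([[0.0, 0.0], [-1.0, 0.0]])   # e.g. a stiffness-like coefficient
A2 = np.array([[0.0, 0.0], [0.0, -1.0]])   # e.g. a damping-like coefficient

def A_of(theta):
    """Assemble A(theta) = A0 + sum_i theta_i * A_i."""
    return A0 + theta[0] * A1 + theta[1] * A2

theta = np.array([4.0, 0.5])
print(A_of(theta))   # [[0, 1], [-4, -0.5]]
```

The same pattern applies to $B(\theta)$ and $C(\theta)$.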
Denote the corresponding true continuous-time transfer function by

$$G(s, \theta^*) = C(\theta^*)\bigl(sI - A(\theta^*)\bigr)^{-1}B(\theta^*). \tag{2}$$
Although the state-evolution equation in (1) is continuous, we can only obtain sampled I/O data in practice, with sampling period $h$. Denote the discrete-time system, obtained with the sampling period $h$ and the system input being piecewise constant between the sampling instants (zero-order hold), as

$$x\bigl((k+1)h\bigr) = A_d(\theta)\,x(kh) + B_d(\theta)\,u(kh), \qquad y(kh) = C_d(\theta)\,x(kh) + v(kh), \tag{3}$$

where

$$A_d(\theta) = e^{A(\theta)h}, \qquad B_d(\theta) = \int_0^h e^{A(\theta)\tau}\,\mathrm{d}\tau\; B(\theta), \qquad C_d(\theta) = C(\theta).$$
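The zero-order-hold maps $A_d$ and $B_d$ can be computed jointly with the standard augmented-matrix-exponential trick; a minimal NumPy/SciPy sketch follows (the double-integrator example and sampling period are hypothetical, for illustration only).

```python
import numpy as np
from scipy.linalg import expm

def c2d_zoh(A, B, h):
    """Zero-order-hold discretization: Ad = e^{A h}, Bd = (int_0^h e^{A s} ds) B,
    obtained from the matrix exponential of the augmented matrix [[A, B], [0, 0]]."""
    n, m = B.shape
    M = np.zeros((n + m, n + m))
    M[:n, :n] = A
    M[:n, n:] = B
    E = expm(M * h)
    return E[:n, :n], E[:n, n:]

# Example: double integrator with h = 0.1
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Ad, Bd = c2d_zoh(A, B, 0.1)
# Ad = [[1, 0.1], [0, 1]], Bd = [[0.005], [0.1]]
```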
Given the sampled I/O data $\{u(kh), y(kh)\}$ that are generated from model (1) for a certain true value $\theta^*$, the concerned gray-box identification problem is to estimate the parameter vector $\theta$ from the measured I/O data.
To address the concerned identification problem, the following assumptions are made throughout the paper.

A1) The system $(A(\theta^*), B(\theta^*), C(\theta^*))$ in (1) is minimal;

A2) The magnitudes of the imaginary parts of the eigenvalues of $A(\theta^*)$ are less than the Nyquist frequency $\pi/h$;

A3) The input sequence $u(kh)$ is persistently exciting of any finite order [2, Chapter 13];

A4) The measurement noise $v(kh)$ is uncorrelated with the system input $u(kh)$.

Assumption A2 ensures that the corresponding discrete-time model (3) of (1) is minimal when Assumption A1 is satisfied [12]. Assumptions A3-A4 are standard assumptions for the consistent identification of the discrete-time system model (3).
III. Gray-Box Approach
The estimation of the parameter vector $\theta$ using the sampled I/O data is typically a gray-box identification problem. The traditional identification method is the prediction-error method [2], in which the predicted or simulated outputs $\hat y(kh\,|\,\theta)$ are computed using the discrete-time model (3) for any given $\theta$. The corresponding prediction-error criterion is written as

$$\hat\theta = \arg\min_{\theta}\; V_N(\theta), \qquad V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N}\bigl\|y(kh) - \hat y(kh\,|\,\theta)\bigr\|_2^2. \tag{4}$$
This general method has the best possible asymptotic accuracy, but its main drawback is that the optimization problem is (highly) nonconvex and may have many local minima. Gradient-based optimization algorithms such as the Gauss-Newton method [2, Section 10.2] and the gradient projection method [3, Chapter 7] can be used to solve (4). However, their performance mainly relies on the selection of the initial parameter estimate. The gray-box structure information may be sufficient to provide initial estimates that are in the domain of attraction of the global minimum, but otherwise one may have to resort to random initial parameters. It is shown in [10] that the chances to reach the global minimum of (4) from random starting points may be very slim for problems of realistic sizes.
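For intuition, the criterion (4) can be sketched for a hypothetical noise-free first-order model $x(k+1) = \theta x(k) + u(k)$, $y(k) = x(k)$; this toy model and the data below are illustrative assumptions, not from the paper.

```python
import numpy as np

def simulate(theta, u):
    """Simulate y(k) = x(k), x(k+1) = theta*x(k) + u(k), zero initial state."""
    x, ys = 0.0, []
    for uk in u:
        ys.append(x)
        x = theta * x + uk
    return np.array(ys)

def pe_cost(theta, u, y):
    """V_N(theta) = (1/N) * sum_k ||y(k) - yhat(k|theta)||^2, as in (4)."""
    e = y - simulate(theta, u)
    return float(np.mean(e ** 2))

rng = np.random.default_rng(0)
u = rng.standard_normal(200)
y = simulate(0.5, u)           # data generated at the true theta* = 0.5
print(pe_cost(0.5, u, y))      # zero at the true parameter (noise-free)
print(pe_cost(0.4, u, y) > 0)  # positive elsewhere
```

Even in this scalar example the cost is nonconvex in general model structures, which is why initialization matters.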
IV. Black-Box + Algebraic Approach
Besides the gray-box approach, there exist other routes to estimate the parameter vector $\theta$ from the sampled I/O data. Even though the gray-box approach may end up in local minima, it is still possible to find the true system from data by a black-box approach. Subspace approaches like N4SID and MOESP [2, 3] can, under mild conditions, obtain the true discrete-time system as the length of the I/O data tends to infinity. That discrete-time system can be easily transformed to continuous time using the zero-order-hold interpolation approach [13]. As a result, the continuous-time transfer function will be known, but in an unknown state-space basis $(A, B, C)$:

$$G(s) = C\,(sI - A)^{-1}B. \tag{5}$$
The identification problem has now been transformed into an algebraic problem:

Given the values of $(A, B, C)$, determine the parameter vector $\theta$ satisfying

$$C(\theta)\bigl(sI - A(\theta)\bigr)^{-1}B(\theta) = C\,(sI - A)^{-1}B. \tag{6}$$
The estimate of $\theta$ obtained in this way can then be used as the initial estimate in the minimization of (4).
This approach was discussed in [6, 10, 8].
V. Solving the Algebraic Problem
To solve the algebraic problem in (6), two routes are provided here: one is the similarity transformation of the state-space realization, and the other is the low-rank and structured factorization of the block Hankel matrix of the impulse response.
V-A. Using a Similarity Transformation
Equation (6) means that there exists a similarity transformation $T \in \mathbb{R}^{n \times n}$ such that

$$TA = A(\theta)T, \qquad TB = B(\theta), \qquad C = C(\theta)T. \tag{7}$$

From that we can form the criterion

$$\min_{T,\,\theta}\; \bigl\|TA - A(\theta)T\bigr\|_F^2 + \bigl\|TB - B(\theta)\bigr\|_F^2 + \bigl\|C - C(\theta)T\bigr\|_F^2. \tag{8}$$
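As a quick numerical sanity check of this criterion, the sketch below (hypothetical random third-order system, illustration only) verifies that (8) vanishes when $(A, B, C)$ is generated from a $\theta$-system through a nonsingular $T$.

```python
import numpy as np

def crit(T, A, B, C, Ath, Bth, Cth):
    """Squared Frobenius residuals of the similarity relations (7)."""
    return (np.linalg.norm(T @ A - Ath @ T) ** 2
            + np.linalg.norm(T @ B - Bth) ** 2
            + np.linalg.norm(C - Cth @ T) ** 2)

rng = np.random.default_rng(0)
n, m, p = 3, 1, 1
Ath = rng.standard_normal((n, n))    # stand-ins for A(theta), B(theta), C(theta)
Bth = rng.standard_normal((n, m))
Cth = rng.standard_normal((p, n))
T = rng.standard_normal((n, n))      # a generic (hence nonsingular) T
A = np.linalg.solve(T, Ath @ T)      # A = T^{-1} A(theta) T
B = np.linalg.solve(T, Bth)          # B = T^{-1} B(theta)
C = Cth @ T                          # C = C(theta) T
print(crit(T, A, B, C, Ath, Bth, Cth))   # ~0 at the true (T, theta)
```

The bilinearity enters through the products $A(\theta)T$ and $C(\theta)T$, which couple the two unknowns.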
The optimization problem in (8) is a bilinear estimation problem, and an alternating minimization method was proposed in [6]. In [10], the optimization problem in (8) was minimized by a convex sum-of-squares method in the case where $A(\theta)$, $B(\theta)$, $C(\theta)$ are affine in $\theta$; however, this method is limited to solving small-scale problems having only rather few unknown variables.
The equation group in (7) can be equivalently written in a vector form using Kronecker products:

$$\begin{bmatrix} A^{\top} \otimes I_n - I_n \otimes A(\theta) \\ B^{\top} \otimes I_n \\ I_n \otimes C(\theta) \end{bmatrix} \operatorname{vec}(T) = \begin{bmatrix} 0 \\ \operatorname{vec}\bigl(B(\theta)\bigr) \\ \operatorname{vec}(C) \end{bmatrix}. \tag{9}$$
To solve the above bilinear equation, a gradient projection method was given in [3, Chapter 7.5.4], a null-space-based optimization method was developed in [7], and a difference-of-convex based method was proposed in [14].
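The Kronecker-product vectorization used above rests on the identity $\operatorname{vec}(TA - A(\theta)T) = (A^{\top} \otimes I_n - I_n \otimes A(\theta))\operatorname{vec}(T)$ (with column-stacking $\operatorname{vec}$), which can be checked numerically; the random matrices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 3
A = rng.standard_normal((n, n))      # black-box A
Ath = rng.standard_normal((n, n))    # stand-in for A(theta)
T = rng.standard_normal((n, n))

vec = lambda X: X.flatten('F')       # column-stacking vec(.)
lhs = vec(T @ A - Ath @ T)
rhs = (np.kron(A.T, np.eye(n)) - np.kron(np.eye(n), Ath)) @ vec(T)
print(np.allclose(lhs, rhs))         # True
```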
Even if we can find a globally optimal solution by one of the above-mentioned methods, it might not be meaningful for the identification purpose stated in Section II. The reason is that the optimal solution $T$ might be singular, so that the obtained transfer function $C(\theta)(sI - A(\theta))^{-1}B(\theta)$ might not be equal to $G(s)$. In fact, equations (6) and (7) are equivalent if and only if $T$ is nonsingular [15]. To deal with this problem, a condition-number constraint on $T$ was considered in [8], which turns the task into a nonsmooth and highly nonconvex optimization problem.
Another way to deal with the possible mismatch between equations (6) and (7) is to minimize a model-matching criterion using either the $H_2$ norm or the $H_\infty$ norm, as suggested in [10]. The $H_\infty$-norm-based model-matching method has been investigated in [9]. Compared with the minimization of (8), the $H_\infty$-norm-based model-matching method reduces the number of unknown variables, but at the price of introducing a semi-infinite and nonsmooth program [9].
V-B. Using the Hankel Matrix of the Impulse Response
In this section, aiming to deal with the possible drawback of minimizing equation (8), a new identification approach is developed by exploiting the structured and low-rank factorization of the block Hankel matrix of the impulse response.
After obtaining a fully parameterized state-space realization $(A, B, C)$, we can obtain the associated impulse-response sequence, denoted by

$$g_k = C A^{k-1} B$$

for $k = 1, 2, \ldots$. Let $H_{q,r}$ be the block Hankel matrix constructed from these Markov parameters:

$$H_{q,r} = \begin{bmatrix} g_1 & g_2 & \cdots & g_r \\ g_2 & g_3 & \cdots & g_{r+1} \\ \vdots & \vdots & \ddots & \vdots \\ g_q & g_{q+1} & \cdots & g_{q+r-1} \end{bmatrix}, \tag{10}$$
where the subscripts $q$ and $r$, satisfying $\min\{q, r\} > n$, denote the number of block rows and the number of block columns, respectively. Given the block Hankel matrix $H_{q,r}$, the concerned gray-box identification problem is formulated as

$$\min_{\theta}\; \bigl\| H_{q,r} - H_{q,r}(\theta) \bigr\|_F^2, \tag{11}$$

where $H_{q,r}(\theta)$ is the block Hankel matrix built from the Markov parameters of $(A(\theta), B(\theta), C(\theta))$.
In the above equation, the block Hankel matrix $H_{q,r}(\theta)$ has a low-rank factorization as

$$H_{q,r}(\theta) = \mathcal{O}_q(\theta)\,\mathcal{C}_r(\theta), \tag{12}$$

where

$$\mathcal{O}_q(\theta) = \begin{bmatrix} C(\theta) \\ C(\theta)A(\theta) \\ \vdots \\ C(\theta)A(\theta)^{q-1} \end{bmatrix} \quad \text{and} \quad \mathcal{C}_r(\theta) = \begin{bmatrix} B(\theta) & A(\theta)B(\theta) & \cdots & A(\theta)^{r-1}B(\theta) \end{bmatrix}$$

denote the extended observability and controllability matrices, respectively.
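The Hankel construction (10) and the factorization (12) can be sketched numerically as follows (random third-order system, illustrative only); the rank of $H_{q,r}$ equals the system order $n$ when $\min\{q, r\} > n$.

```python
import numpy as np

def markov(A, B, C, k):
    """k-th Markov parameter g_k = C A^{k-1} B, k = 1, 2, ..."""
    return C @ np.linalg.matrix_power(A, k - 1) @ B

def hankel_blocks(A, B, C, q, r):
    """Block Hankel matrix H_{q,r} as in (10)."""
    return np.block([[markov(A, B, C, i + j + 1) for j in range(r)]
                     for i in range(q)])

rng = np.random.default_rng(0)
n, m, p, q, r = 3, 2, 2, 4, 4
A = rng.standard_normal((n, n)) / n
B = rng.standard_normal((n, m))
C = rng.standard_normal((p, n))

H = hankel_blocks(A, B, C, q, r)
O = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(q)])      # O_q
Ctrb = np.hstack([np.linalg.matrix_power(A, j) @ B for j in range(r)])   # C_r
print(np.allclose(H, O @ Ctrb))       # H_{q,r} = O_q C_r, as in (12)
print(np.linalg.matrix_rank(H))       # n (= 3): low rank
```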
Denote by $O \in \mathbb{R}^{qp \times n}$ and $C \in \mathbb{R}^{n \times rm}$ matrix variables playing the roles of $\mathcal{O}_q(\theta)$ and $\mathcal{C}_r(\theta)$. By exploiting the shift properties embedded in the extended observability and controllability matrices, the optimization problem (11) can be recast into a low-rank structured matrix factorization problem:

$$\begin{aligned} \text{find}\;\; & O,\; C,\; \theta \\ \text{s.t.}\;\; & H_{q,r} = OC, \\ & O^{(1)} = C(\theta), \qquad C^{(1)} = B(\theta), \\ & O^{(2:q)} = O^{(1:q-1)}A(\theta), \qquad C^{(2:r)} = A(\theta)\,C^{(1:r-1)}, \end{aligned} \tag{13}$$

where $O^{(i:j)}$ denotes the submatrix of $O$ formed by its block rows $i$ through $j$, and $C^{(i:j)}$ the submatrix of $C$ formed by its block columns $i$ through $j$.
In the above optimization problem, the first and the last two constraints are bilinear. To solve this problem, the DCP-based identification framework [14] will be adopted, which contains the following three steps: (i) the bilinear optimization problem is transformed into a rank-constrained optimization problem; (ii) the rank-constrained problem is recast into a DCP problem; (iii) the DCP problem is then solved using the sequential convex programming technique.
Step 1: The first constraint, $H_{q,r} = OC$, in (13) can be equivalently written as a rank constraint.

Lemma 1

[16] The bilinear equation $H_{q,r} = OC$ is equivalent to the rank constraint

$$\operatorname{rank}\begin{bmatrix} H_{q,r} & O \\ C & I_n \end{bmatrix} = n. \tag{14}$$
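The rank characterization in Lemma 1 can be checked numerically on random factors (an illustrative sketch): the block matrix has rank $n$ exactly when the bilinear equation holds, and a larger rank otherwise.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
X = rng.standard_normal((5, n))       # tall factor (observability-like)
Y = rng.standard_normal((n, 4))       # wide factor (controllability-like)
P = X @ Y                             # bilinear product, rank n

M = np.block([[P, X], [Y, np.eye(n)]])
print(np.linalg.matrix_rank(M))       # n: the equality P = XY holds

M_bad = np.block([[P + 1.0, X], [Y, np.eye(n)]])   # perturb P
print(np.linalg.matrix_rank(M_bad))   # > n when P != XY
```

The underlying identity is that column operations reduce the block matrix to $\operatorname{diag}(P - XY,\, I_n)$, so its rank is $n + \operatorname{rank}(P - XY)$.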
The equivalent rank constraints for the last two constraints of (13) are derived below. To simplify the notation, we denote by $\underline{O}$ and $\overline{O}$ the submatrices of $O$ consisting of its first and last $q-1$ block rows, respectively, and by $\underline{C}$ and $\overline{C}$ the submatrices of $C$ consisting of its first and last $r-1$ block columns, respectively. The last two constraints can then be represented as

$$\overline{O} = \underline{O}\,A(\theta), \qquad \overline{C} = A(\theta)\,\underline{C}. \tag{15}$$
An equivalent form of the combination of the fourth and fifth constraints is given in the following lemma.
Lemma 2
The constraint equation (15) is equivalent to
$$\operatorname{rank}\begin{bmatrix} \overline{O} & \underline{O} & Z_1 \\ A(\theta) & I_n & \underline{C} \\ Z_2 & A(\theta) & \overline{C} \end{bmatrix} = n, \tag{16}$$

where $Z_1$ and $Z_2$ are auxiliary matrix variables of compatible dimensions.
To prove the lemma, it is sufficient to show that the variables in (15) and those in (16) are in one-to-one correspondence. On the one hand, the variables in (16) can be uniquely determined from those in (15) by assigning $Z_1 = \underline{O}\,\underline{C}$ and $Z_2 = A(\theta)^2$. On the other hand, the variables in (15) can be uniquely determined by the SVD decomposition of the matrix on the left-hand side of the following equation:

$$\begin{bmatrix} \overline{O} & \underline{O} & Z_1 \\ A(\theta) & I_n & \underline{C} \\ Z_2 & A(\theta) & \overline{C} \end{bmatrix} = \begin{bmatrix} \underline{O} \\ I_n \\ A(\theta) \end{bmatrix}\begin{bmatrix} A(\theta) & I_n & \underline{C} \end{bmatrix}.$$
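A small numerical sanity check of this rank characterization (illustrative sketch with random blocks; the block sizes are hypothetical): when both shift constraints hold and the auxiliary blocks are assigned as in the proof, the stacked matrix has rank $n$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rows, cols = 3, 5, 4              # hypothetical sizes of the shifted blocks
A = rng.standard_normal((n, n))      # stand-in for A(theta)
O_under = rng.standard_normal((rows, n))
C_under = rng.standard_normal((n, cols))
O_bar = O_under @ A                  # shift constraint on the observability factor
C_bar = A @ C_under                  # shift constraint on the controllability factor

M = np.block([[O_bar, O_under, O_under @ C_under],   # Z1 = O_under C_under
              [A, np.eye(n), C_under],
              [A @ A, A, C_bar]])                    # Z2 = A^2
print(np.linalg.matrix_rank(M))      # n (= 3) when both constraints hold
```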
By Lemmas 1 and 2, the bilinear optimization problem in (13) can be equivalently formulated as the following rank-constrained optimization problem:

$$\begin{aligned} \text{find}\;\; & O,\; C,\; \theta,\; Z_1,\; Z_2 \\ \text{s.t.}\;\; & \operatorname{rank}\begin{bmatrix} H_{q,r} & O \\ C & I_n \end{bmatrix} = n, \\ & \operatorname{rank}\begin{bmatrix} \overline{O} & \underline{O} & Z_1 \\ A(\theta) & I_n & \underline{C} \\ Z_2 & A(\theta) & \overline{C} \end{bmatrix} = n, \\ & O^{(1)} = C(\theta), \qquad C^{(1)} = B(\theta). \end{aligned} \tag{17}$$
The above optimization contains two rank constraints. To deal with them, we shall further formulate the problem as a difference-of-convex optimization problem.
Step 2: For notational simplicity, we denote

$$N_1 = \begin{bmatrix} H_{q,r} & O \\ C & I_n \end{bmatrix}, \qquad N_2 = \begin{bmatrix} \overline{O} & \underline{O} & Z_1 \\ A(\theta) & I_n & \underline{C} \\ Z_2 & A(\theta) & \overline{C} \end{bmatrix}.$$

Let $\sigma_i(N)$ be the $i$th largest singular value of $N$ for $i = 1, 2, \ldots$. Define

$$f_k(N) = \sum_{i=1}^{k}\sigma_i(N).$$

It is remarked that $f_k(\cdot)$ is the Ky Fan $k$-norm, which is a convex function [17].
Inspired by the truncated nuclear norm method in [18, 14], the rank constraint $\operatorname{rank}(N) = n$ can be replaced by

$$\sum_{i > n}\sigma_i(N) = 0. \tag{18}$$

The above equation means that all the singular values of $N$ except the $n$ largest ones are zero. Since $\sum_{i > n}\sigma_i(N) = \|N\|_* - f_n(N)$, where $\|N\|_*$ denotes the nuclear norm, the above equation can be represented as

$$\|N\|_* - f_n(N) = 0.$$
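The rank surrogate $\|N\|_* - f_n(N)$ can be illustrated on random matrices (illustrative sketch): the quantity vanishes exactly on matrices of rank at most $n$ and is strictly positive otherwise.

```python
import numpy as np

def kyfan(N, k):
    """Ky Fan k-norm: sum of the k largest singular values."""
    return np.linalg.svd(N, compute_uv=False)[:k].sum()

rng = np.random.default_rng(3)
n = 2
N_low = rng.standard_normal((5, n)) @ rng.standard_normal((n, 4))   # rank n
N_full = rng.standard_normal((5, 4))                                 # full rank

# ||N||_* - f_n(N) = sum of the singular values beyond the n largest:
print(kyfan(N_low, 4) - kyfan(N_low, n))     # ~0  -> rank(N_low) <= n
print(kyfan(N_full, 4) - kyfan(N_full, n))   # > 0 -> rank(N_full) > n
```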
Using the above strategy, instead of directly solving the rank-constrained optimization problem in (17), we try to solve the following regularized optimization problem:

$$\begin{aligned} \min_{O,\,C,\,\theta,\,Z_1,\,Z_2}\;\; & \lambda_1\bigl(\|N_1\|_* - f_n(N_1)\bigr) + \lambda_2\bigl(\|N_2\|_* - f_n(N_2)\bigr) \\ \text{s.t.}\;\; & O^{(1)} = C(\theta), \qquad C^{(1)} = B(\theta), \end{aligned} \tag{19}$$

where $\lambda_1, \lambda_2 > 0$ are regularization parameters. It is remarked that all the constraints in (19) are linear with respect to the unknown variables and the objective function is a difference of convex functions. Although the formulations (17) and (19) may not be strictly equivalent, they have the same global optimum.
Step 3: We shall develop a sequential convex programming method to solve the DC optimization problem in (19). To this end, it is essential to linearize the concave terms in the objective function of (19). Let $N_1^{(j)}$ be the estimate of $N_1$ at the $j$th iteration and let its SVD decomposition be given as

$$N_1^{(j)} = U^{(j)}\,\Sigma^{(j)}\,\bigl(V^{(j)}\bigr)^{\!\top}, \tag{20}$$

where the columns of $U_n^{(j)}$ and $V_n^{(j)}$ are respectively the left and right singular vectors corresponding to the $n$ largest singular values. It can be established that [17]

$$f_n(N) = \max_{U^{\top}U = I_n,\; V^{\top}V = I_n}\; \operatorname{tr}\bigl(U^{\top} N V\bigr). \tag{21}$$
Then, the linearization of $f_n(N_1)$ at the point $N_1^{(j)}$ is

$$f_n(N_1) \approx \operatorname{tr}\Bigl(\bigl(U_n^{(j)}\bigr)^{\!\top} N_1\, V_n^{(j)}\Bigr). \tag{22}$$
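The linearization (22) is exact at the expansion point and minorizes the Ky Fan norm elsewhere, which can be verified numerically (illustrative sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
N = rng.standard_normal((6, 5))
k = 2
U, s, Vt = np.linalg.svd(N)
Uk, Vk = U[:, :k], Vt[:k, :].T    # top-k left/right singular vectors

f_k = s[:k].sum()                             # Ky Fan k-norm of N
lin = np.trace(Uk.T @ N @ Vk)                 # linearization evaluated at N
print(np.isclose(f_k, lin))                   # exact at the expansion point

M = rng.standard_normal((6, 5))               # any other point
f_k_M = np.linalg.svd(M, compute_uv=False)[:k].sum()
print(f_k_M >= np.trace(Uk.T @ M @ Vk))       # minorization, by (21)
```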
Similarly, let $N_2^{(j)}$ be the estimate of $N_2$ at the $j$th iteration and let its SVD decomposition be given as

$$N_2^{(j)} = \tilde U^{(j)}\,\tilde\Sigma^{(j)}\,\bigl(\tilde V^{(j)}\bigr)^{\!\top}, \tag{23}$$

where the columns of $\tilde U_n^{(j)}$ and $\tilde V_n^{(j)}$ are respectively the left and right singular vectors corresponding to the $n$ largest singular values. Then, the linearization of $f_n(N_2)$ at the point $N_2^{(j)}$ is

$$f_n(N_2) \approx \operatorname{tr}\Bigl(\bigl(\tilde U_n^{(j)}\bigr)^{\!\top} N_2\, \tilde V_n^{(j)}\Bigr). \tag{24}$$
Based on the linearizations in (22) and (24), the convex optimization problem to be solved at the $j$th iteration is as follows:

$$\begin{aligned} \min_{x}\;\; & \lambda_1\Bigl(\|N_1\|_* - \operatorname{tr}\bigl((U_n^{(j)})^{\top} N_1 V_n^{(j)}\bigr)\Bigr) + \lambda_2\Bigl(\|N_2\|_* - \operatorname{tr}\bigl((\tilde U_n^{(j)})^{\top} N_2 \tilde V_n^{(j)}\bigr)\Bigr) + \mu\bigl\|x - x^{(j)}\bigr\|_2^2 \\ \text{s.t.}\;\; & O^{(1)} = C(\theta), \qquad C^{(1)} = B(\theta), \end{aligned} \tag{25}$$

where $x$ stacks all the unknown variables $(O, C, \theta, Z_1, Z_2)$, $\mu$ is a very small positive regularization parameter, and the proximal term $\mu\|x - x^{(j)}\|_2^2$ is added to ensure the convergence of the sequential convex programming approach, as shown in Lemma 3.
For ease of reference, the above sequential convex programming procedure is summarized in Algorithm 1.

Algorithm 1 Sequential convex programming method for (19)
1) Set $j = 0$ and choose an initial estimate $x^{(0)}$.
2) Repeat
2-1) Compute respectively the left and right singular vectors of $N_1^{(j)}$ and $N_2^{(j)}$ as shown in (20) and (23).
2-2) Obtain the estimate $x^{(j+1)}$ by solving (25), and set $j \leftarrow j + 1$.
3) Until $\|x^{(j)} - x^{(j-1)}\| \le \epsilon\,\|x^{(j-1)}\|$, with $\epsilon$ a small tolerance value.
The convergence of the iterative optimization method in Algorithm 1 is guaranteed by the following lemma.

Lemma 3

The sequence of objective values of (19) generated by Algorithm 1 is monotonically non-increasing and convergent.

The above lemma can be proven by the results of Theorems 1-2 in [19].
Since the difference-of-convex optimization problem in (19) is still nonconvex, the performance of the provided sequential convex programming procedure relies on the initial conditions. However, by setting $U_n^{(0)} = 0$ and $\tilde U_n^{(0)} = 0$, so that the linear terms in (25) vanish, we can find that the resulting first subproblem is a nuclear-norm relaxation of the rank-constrained optimization problem in (17). Due to the fact that the nuclear norm is the convex envelope of the rank function on the unit spectral-norm ball [20], the associated nuclear-norm optimization usually yields a good candidate for the starting point of the sequential convex programming procedure.
VI. Numerical Simulations
In this section, two simulation examples are carried out to demonstrate the performance of the proposed method, Algorithm 1. For comparison purposes, the prediction-error method (PEM) [2], [10] and the difference-of-convex programming (DCP) method [14] are also simulated. The implementation details of these three methods are given below.

Algorithm 1 is simulated by empirically setting the regularization parameters $\lambda_1$, $\lambda_2$ and $\mu$ in (25); the tolerance $\epsilon$ of the relative error is set to a small value.

PEM is simulated by first configuring the model-structure object using the Matlab command idgrey (see [21]) and then carrying out the identification using the Matlab command pem. The initial conditions are randomly generated following the standard Gaussian distribution.

DCP is simulated by empirically setting the regularization parameter in equation (17) of [14]; the tolerance of the relative error is set to the same small value.
In the simulations, the maximum number of iterations for these three methods is set to 100.
VI-A. Randomly Generated Structures
The first simulation example is conducted following the setup in [10]. The state-space model (1) is randomly generated by the Matlab command rss, and the system parameters to be estimated are randomly picked from the generated models. Since the system models are randomly generated, it is difficult to find a unified sampling period for all systems. Therefore, when simulating DCP and Algorithm 1, the system matrices $(A, B, C)$ in (6) are assumed to be known.
To ensure the identifiability of the system parameters, the number of unknown parameters cannot be too large; however, even a smaller number of parameters may not always be identifiable [2, 3]. Therefore, we use the impulse-response fit to measure the identification performance. In the simulation, the system order and the input/output dimensions are fixed across trials. For each fixed number of free parameters, we carry out 100 Monte Carlo trials by randomly generating the system model and randomly picking a fixed number of free parameters. The success rate is obtained by counting the number of successful trials, where success is declared when the impulse-response fit defined below exceeds a given threshold. Denote by $\hat\theta_i$ the estimate of $\theta^*$ at the $i$th Monte Carlo trial. The impulse-response fit (IRF) of the $i$th Monte Carlo trial is defined as
$$\mathrm{IRF}_i = 1 - \frac{\bigl\|H_{q,r}(\hat\theta_i) - H_{q,r}(\theta^*)\bigr\|_F}{\bigl\|H_{q,r}(\theta^*)\bigr\|_F}, \tag{26}$$

where the block Hankel matrices and the dimension parameters $q$ and $r$ are defined in (10).
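This fit metric can be sketched in a few lines (illustrative NumPy code; the impulse-response sequences below are hypothetical):

```python
import numpy as np

def irf_fit(g_true, g_est):
    """Impulse-response fit in the style of (26):
    1 - ||G_est - G_true||_F / ||G_true||_F over stacked Markov parameters."""
    g_true, g_est = np.asarray(g_true), np.asarray(g_est)
    return 1.0 - np.linalg.norm(g_est - g_true) / np.linalg.norm(g_true)

g_true = [1.0, 0.5, 0.25, 0.125]                # hypothetical true sequence
print(irf_fit(g_true, g_true))                  # 1.0: perfect fit
print(irf_fit(g_true, [0.9, 0.5, 0.3, 0.1]))    # < 1 for an imperfect estimate
```

A trial is then counted as successful when this fit exceeds the chosen threshold.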
The identification performance of PEM, DCP and Algorithm 1 is shown in Fig. 1, from which we can draw the following conclusions.

When the number of parameters is larger than 3, DCP and Algorithm 1 perform much better than PEM. This is because DCP and Algorithm 1 can find good initializations by nuclear-norm regularized optimization. In contrast, when the number of parameters is less than or equal to 3, PEM has a slightly better performance than DCP and Algorithm 1. This might be related to the selection of the regularization parameters of DCP and Algorithm 1.

When the number of parameters is larger than 6, the success rate of Algorithm 1 is up to 20% higher than that of DCP. This might be caused by the fact that DCP does not consider the nonsingularity of the similarity transformation, while Algorithm 1 implicitly guarantees it. However, when the number of parameters is less than or equal to 6, DCP and Algorithm 1 have similar performance. This might be because, when the number of free parameters becomes smaller, singular similarity transformations are less likely to occur.
VI-B. Compartmental Structures
The second simulation example is to identify a compartmental structure of the form [4]

$$\dot{x}(t) = A(\theta)\,x(t) + B(\theta)\,u(t), \qquad y(kh) = C(\theta)\,x(kh) + v(kh), \tag{27}$$

where $A(\theta)$ is a compartmental matrix: its off-diagonal entries are nonnegative transfer-rate parameters and its diagonal entries are such that each column sum is nonpositive.
For each fixed system order, 100 Monte Carlo trials are carried out by randomly generating the system parameters. The success rate is again obtained by counting the number of successful trials with a threshold on the impulse-response fit (26).
Fig. 2 shows the identification performance of the three investigated methods in terms of IRF. It can be seen that the success rate of Algorithm 1 stays around 90% for different system orders, which demonstrates the better performance of Algorithm 1 relative to DCP and PEM. This can be explained by the fact that Algorithm 1 can obtain good initializations and guarantees the nonsingularity of the similarity transformation.
VII. Conclusions
In this paper, we have proposed a new gray-box identification method by exploiting the low-rank and structured factorization of the Hankel matrix of the impulse response. This method uses the system impulse-response fit as the objective while avoiding an explicit nonsingularity constraint on the similarity transformation; thus, it can be applied to the state-space realization of non-identifiable gray-box models. Compared with the classical prediction-error method initialized at random parameter values, the proposed method can yield better performance since it can find a good initialization by nuclear-norm based optimization.
Although the proposed identification algorithm demonstrates good performance in terms of impulse-response fitting, its computational complexity is higher than that of the classical prediction-error method. Our future work will therefore investigate improving the computational efficiency of the proposed identification method.
References
 [1] R. C. Dorf and R. H. Bishop, Modern control systems. Pearson, 2011.
 [2] L. Ljung, System Identification: Theory for the User. Pearson Education, 1999.
 [3] M. Verhaegen and V. Verdult, Filtering and System Identification: A Least Squares Approach. Cambridge University Press, 2007.
 [4] R. Bellman and K. J. Åström, “On structural identifiability,” Mathematical biosciences, vol. 7, no. 3, pp. 329–339, 1970.
 [5] J. Van den Hof, “Structural identifiability of linear compartmental systems,” IEEE Transactions on Automatic Control, vol. 43, no. 6, pp. 800–818, 1998.
 [6] L.L. Xie and L. Ljung, “Estimate physical parameters by black box modeling,” in Proceedings of the 21st Chinese Control Conference, pp. 673–677, 2002.
 [7] O. Prot, G. Mercère, and J. Ramos, “A null-space-based technique for the estimation of linear time-invariant structured state-space representations,” in Proceedings of the IFAC Symposium on System Identification, 2012.
 [8] G. Mercère, O. Prot, and J. Ramos, “Identification of parameterized gray-box state-space systems: From a black-box linear time-invariant representation to a structured one,” IEEE Transactions on Automatic Control, vol. 59, pp. 2873–2885, Nov. 2014.
 [9] D. Vizer, G. Mercère, O. Prot, and E. Laroche, “H∞-norm-based optimization for the identification of gray-box LTI state-space model parameters,” Systems & Control Letters, vol. 92, pp. 34–41, 2016.
 [10] L. Ljung and P. Parrilo, “Initialization of physical parameter estimates,” 2003.
 [11] S. Boyd, “Sequential convex programming,” Lecture Notes, Stanford University, 2008.
 [12] T. Chen and B. Francis, Optimal Sampled-Data Control Systems. Communications and Control Engineering, Springer London, 1994.
 [13] G. F. Franklin, J. D. Powell, and M. L. Workman, Digital Control of Dynamic Systems, vol. 3. Addison-Wesley, Menlo Park, 1998.
 [14] C. Yu, M. Verhaegen, S. Kovalsky, and R. Basri, “Identification of structured LTI MIMO state-space models,” arXiv preprint arXiv:1509.08692, 2015.
 [15] T. Kailath, Linear systems, vol. 1. PrenticeHall Englewood Cliffs, NJ, 1980.
 [16] R. Doelman and M. Verhaegen, “Sequential convex relaxation for convex optimization with bilinear matrix equalities,” in 2016 European Control Conference, 2016.
 [17] L. Qi and R. S. Womersley, “On extreme singular values of matrix valued functions,” Journal of Convex Analysis, vol. 3, pp. 153–166, 1996.
 [18] Y. Hu, D. Zhang, J. Ye, X. Li, and X. He, “Fast and accurate matrix completion via truncated nuclear norm regularization,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 9, pp. 2117–2130, 2013.
 [19] C. Lu, J. Tang, S. Yan, and Z. Lin, “Nonconvex nonsmooth low rank minimization via iteratively reweighted nuclear norm,” IEEE Transactions on Image Processing, vol. 25, no. 2, pp. 829–839, 2016.
 [20] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimumrank solutions of linear matrix equations via nuclear norm minimization,” SIAM review, vol. 52, no. 3, pp. 471–501, 2010.
 [21] L. Ljung, The System Identification Toolbox: The Manual. Natick, MA, USA: The MathWorks Inc. 1st edition 1986, Edition 8.3 2013, 2013.