Input Design for Model Discrimination and Fault Detection via Convex Relaxation

Input Design for Model Discrimination and Fault Detection
via Convex Relaxation

Seunggyun Cheong and Ian R. Manchester This work was supported by the Australian Research Council.The authors are with Australian Centre for Field Robotics (ACFR) and School of the Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, NSW 2006, Australia {s.cheong,i.manchester}

This paper addresses the design of input signals for the purpose of discriminating among a finite set of models dynamic systems within a given finite time interval. A motivating application is fault detection and isolation. We propose several specific optimization problems, with objectives or constraints based on signal power, signal amplitude, and probability of successful model discrimination. Since these optimization problems are nonconvex, we suggest a suboptimal solution via a random search algorithm guided by the semidefinite relaxation (SDR) and analyze the accuracy of the suboptimal solution. We conclude with a simple example taken from a benchmark problem on fault detection for wind turbines.

I Introduction

In many applications of control and automation there will occur events that necessitate re-identifying a system model or detecting that the dynamics have changed. For example, it may be that the system dynamics have slowly changed due to aging, or abruptly changed to a fault. It is desirable to adjust the current control law accordingly or fix the fault. In this paper we investigate design of “probing signals” that improve the reliability of such a process. In particular, we consider the problem of discriminating among a fixed finite set of models, and finding the one which best matches the current system behaviour.

There is a long history of research into input design for system identification. The majority of papers have considered continuous parameterizations of models, with identification quality measured by estimated parameter variances or the Fisher information matrix. Basic approaches are summarised in [1], [2], including pseudo-random binary signals and optimized multi-sine signals. In [3] semidefinite programming was used to design optimal signals in the frequency domain subject power constraints. Robust procedures were proposed in [4]. In [5], [6], time domain signals were designed subject to power and amplitude constraints using semidefinite relaxation.

Discriminating among a finite set of models is obviously more limiting in some ways than a continuous parameterization, but it also offers certain advantages, e.g. one can easily have different models of different order and structure. In the case of fault detection and isolation, there are frequently a finite number modes of operation, corresponding to failures of different components, each of which is well understood: see, e.g., [7] and references therein.

The majority of papers on fault detection assume the input signal is “given”, and cannot be adjusted, and focus on the statistical estimation [7]. In this paper we consider the case when the input can be adjusted to some small degree. The input design problem then could be posed in two general formats: maximize model discrimination probability subject to signal constraints related to nominal system operation, or minimize probing signal magnitude (in some sense) subject to constraints on reliability of model discrimination. These are related to the “traditional” and “least costly” input design, respectively, for continuously parameterized model sets [8].

In [9], a frequency-domain approach to input design problem was proposed for model discrimination in terms of cumulative sum and probability ratio tests. In [10] it was assumed that initial conditions of the system and disturbance signals are bounded by a known value, and the objective is an input signal with the least power such that it is impossible for models to have the same output signal. Although this method brings an absolute discrimination, the corresponding optimization problem is quite demanding. In contrast, [11] proposes to maximizing the Kullback-Leibler (K-L) divergence [12] of the probability density functions (PDFs), corresponding to the output signals of models, from each other, assuming that initial conditions of the system can be chosen and a measurement noise signal is an iid random process with a normal distribution.

In the time domain, input-design problems have the structure of a nonconvex quadratic program. Recently, semidefinite relaxation techniques have been applied successfully to such problems in a wide variety of application areas [13] and in some cases can be proven to be very accurate, e.g. [14]. In this paper, we utilize and extend some of these methods for the problem of model discrimination.

The main contributions of this paper are: Section II: a model-selecting criterion, based on a modified version of the prediction error method (PEM) [2] which admits rigorous analysis in terms of hypothesis testing; Section III: a family of optimization problems for input design, subject to different discrimination criteria and signal constraints; Section IV: an approximate solution method for these (nonconvex) optimization problems using semidefinite relaxation; Section V: some cases when the proposed method is optimal, and VI: an analysis of the quality of the solutions when the method is sub-optimal. We conclude by presenting a simple example based on a benchmark problem in fault detection for a wind turbine. All the proofs of the main results are provided in APPENDIX.

We use the following notation conventions: denotes Euclidean norm of a finite-dimensional vector, is the -norm of a vector, is the induced norm of a matrix with respect to . The set of symmetric positive semidefinite matrices is denoted by , the operator selects the diagonal elements of a square matrix, the expectation operator is denoted by , and is the critical value of the chi-squared distribution with degrees of freedom and significance level .

Ii Model Discrimination

We consider an uncertain system of the form


with single-input single-output (SISO) operators and , and where is an input signal that we apply to the system, is an observed output signal, and is an unobserved signal that produces the additive disturbance, and is an initial condition (if present). We suppose that, based on a priori knowledge of the system in (1), there is available a finite number of causal, linear time-invariant (LTI), discrete-time, and SISO models


for each of which satisfies the following:

  1. The initial conditions of each model is described by a finite-dimensional vector where and are predetermined vector and matrix, respectively, and is a Gaussian random vector that has a zero mean and a covariance matrix for some constant .

  2. The disturbance dynamics is invertible and monic and has a zero initial condition at time .

  3. The signal is an iid random process and is also independent of . Each has a normal distribution, denoted by , with a zero mean value and some variance .

Note that the appearance of in the first and third assumption is just for notational simplicity, and does not involve any loss of generality because of the flexibility in .

Naturally, we assume that the models are distinct. Furthermore, the models may have different orders, which means that the dimensions of ’s may be different. For simplicity, the elements of and have the same variance , which works as a fitting parameter later.

Let be a positive integer and represent the length of an experiment. Then, using (2), we can describe the output signal of the -th model in “lifted form”: by


with and where the matrices , , and represent the -th model. For example, if is a state-space representation of the -th model, then we have

with and for . The matrices ’s are defined similarly to ’s. Note that the matrices and are lower triangular due to the causality of the models and, in particular, the ’s are invertible.

For the purposes of model discrimination, given input-output data of the real system: and , we construct for each model a vector and a signal like so:


where and means the Moore-Penrose pseudo inverse. These signals represent the initial condition and noise signals that would have been necessary for system to have produced the observed input-output data set, and parallels the use of “fictitious” reference signals in unfalsified adaptive control (See, for example, [15] and [16] for details).

If the elements of in (4) are plausible realisations of the random process associated with the -th model, then we may conclude that the -th model successfully describes the input-output data and . Note that if the initial condition is a deterministic vector, i.e. , we remove and in (4) and the signal reduces to the one-step-ahead prediction error [2] corresponding to the -th model. The presence of initial conditions for our criterion is important in fault detection because we expect to be examining the system during its normal operation, and the statistics of the initial conditions may come from, e.g., a bank of Kalman filters [7].

Based on the signals , we can use the maximum likelihood method to select a particular model. Denote the dimension of by so that we have . Also, denote by the PDF of a -dimensional multivariate normal distribution with a zero mean vector and a covariance matrix . Given in (4), an estimate


provides the maximum likelihood (over ) for the -th model, and then we select a model that has the least value for , i.e.


The selection criterion in (6) provides a definitive selection and only one model stands after this method is applied to collected data. On the other hand, sometimes we may want to consider all the models that do not show some evident inadequacy. We formulate this as a hypothesis testing for each model with

Null Hypothesis : The -th model produced the data with some less than or equal to a known value .

Unless this null hypothesis is rejected based on the collected data, we keep the -th model as a potential origin of the data. Thus, possibly multiple models remain as candidates after this testing.

For the -th model, we may reject the null hypothesis if the data shows a significant evidence that the estimated variance in (5) is greater than . Thus, using a chi-squared test, we reject the null hypothesis when a statistic is greater than . Therefore, the candidate models based on the collected data are described by a set


Iii Input Design for Model Discrimination

If the real system in (1) is well-described by one of the models, it is important to be able to distinguish it from other models. In this section, we consider design of a probing input signal that ensures the model selection procedure in Section II is successful.

Suppose that we wish to discriminate between models and , when in reality the system in (1) is compatible with model with a noise variance bounded by a known constant . Then, using (3), the output signal is given by

Now, if we test the data against model by applying the procedure in (4), we find that is


where and



Since is a Gaussian random vector with a zero mean vector and a covariance , it follows, from (8), that is a normally distributed random vector with a mean vector and a covariance matrix . Note in particular that is an affine function of and all other quantities are independent of . In the following theorem it is shown that if is sufficiently large for any , then the hypothesis testing in Section II brings statistically reliable results.

Theorem 1

Suppose that real system data are compatible with the -th model with some less than or equal to a known value . If, for any , either




then, with at least probability, only the -th model is not rejected by the hypothesis test (7).

Note that the probability of the correct model being selected increases as we increase the critical values of the chi-squared distribution in (10) and (11).

Based on this proposition, we design an input signal satisfying


for all satisfying where

Note that, for any given unordered pair or any given two models, there is only one condition imposed by (12). Thus, the total number of conditions is .

Roughly speaking, purpose of the conditions (10) and (11) is to make and for any to be far apart from each other with high probability by placing the vectors and away from the zero vector. This can be achieved by suitably chosen . Alternatively, when the dimensions of and are the same, we can also pursue the same goal by increasing the Kullback-Leibler divergence

of from where and are the PDFs of and , respectively. This formulation was studied in [11], however our condition in (12), combined with the hypothesis testing in Section II, provides an explicit reliability measure for model discrimination.

Iii-a A Family of Nonconvex Quadratic Optimization Problems

We have shown that model discrimination is improved by increasing the norm of the vectors for each pair of models . With this in mind, there are a number of reasonable formulations of specific optimization problems, trading off different measures of signal size and model discrimination ability.

Since each is an affine functions of the control input , unless constraints are imposed on the input it is clear that maximizing model discrimination leads to an unbounded and ill-posed optimization problem. Given some positive bounds we consider some natural constraints:


where the subscripts refer to input and output, and 2-norm and -norm. Notice that the constraints on the output are the maximum over all models in the set. There may be other constraints that are natural to consider, e.g. move size (change in ), or the size of other states in a state-space model, or a weighted combination of inputs and outputs as in a linear quadratic regulator. These could also easily be used in the framework we propose.

Let be the total number of the unordered pairs of the models, and


We consider two natural measures of model discrimination, a “worst case” criterion, derived from (12):


and a somewhat easier “weighted average” case:


where ’s are the weights. The latter may be appropriate if based on prior data certain models are highly likely and should be favored for discrimination.

Note that all of (13)-(16) and (19), (20) are convex quadratic functions of . With these signal properties and discrimination factors, we can consider either the “least costly” input signal for guaranteed discrimination:


with and . Alternatively, we can consider maximizing discrimination reliability subject to hard constraints on the input:


Both of these formulations are nonconvex quadratic optimization problems, due to the constraint in (21) and the maximization in (22). There is no known polynomial-time algorithm for nonconvex quadratic optimization, and it is generally considered unlikely one will be found since they belong to the class of NP-hard problems [13]. For this reason, we pursue a convex relaxation technique.

Iv Relaxation to a Semidefinite Program

Semidefinite relaxation is a general approach for problems of the form subject to where and , . In particular, it is not assumed that are positive semidefinite (which would bring convexity). These functions can be homogenised as

The simple fact that , and the fact that any matrix which is rank one can be decomposed as leads to the equivalent problem:

This is an exact reformulation of the problem, and all constraints are convex except for the rank constraint. Semidefinite relaxation consists of dropping the rank constraint, which results in a semidefinite program [14], [13]. Relaxations always give an “optimistic” value, since they optimize the same objective function over a larger feasible set. The quality of a relaxation is determined by the gap between the true optimum and the optimum of the relaxation.

To apply this idea to the model discrimination problem, we construct homogenised forms of the key quantities from the previous section. The discrimination factor is given by


and the various signal constraints can be represented as


where is the indicator vector for element , and can be imposed for any vector by .

These can be equivalently represented in terms of a matrix variable :

with additional conditions , , and . The optimization problems are now described in terms of instead of . The SDR is completed by dropping the rank constraint .

Given the various and from the previous section, we denote the semidefinite relaxation forms of these by and . So we can again have general problems of the form




as relaxations of (21) and (22), respectively. Clearly, one can also add multiple constraints (e.g. on the input and output) and retain the same SDP structure, or alternative cost functions such as LQR. We do not give every possible permutation here, but for example, the relaxation of


is given by




The SDP relaxations generally give “optimistic” results, i.e. under-estimations of required signal power in (21) and over-estimations of discrimination power in (22). However, the advantage is that they can be efficiently solved (polynomial time to a given accuracy) using freely available solvers such as Sedumi [17] and interfaces such as Yalmip [18] and CVX [19].

V Optimal Solutions of the True Problems From the Relaxation

Although in general the objective value of (21) and (25), or (22) and (26), will be different, there are some cases in which they are the same. That is, there is no gap between the optimal values of the relaxed problem and the true problem. In particular, this is the case for either the “least costly” formulation (21) or “traditional” formulation (22) under the following situations:

  1. The signal conditions are of “power” type, i.e. or or another quadratic, e.g. LQR, and

    1. The model discrimination constraint is of the weighted average form , or

    2. The model discrimination is absolute but there is only two models to discriminate between, giving in (19).

Under any of these scenarios, (21) or (22) become problems to optimize (minimize or maximize) a quadratic function subject to a single quadratic constraint. For this problem structure, it is known that the semidefinite relaxation has no gap [20], [21, Appendix B].

Let us also consider a more general case involving more constraints or models to distinguish:


where and are positive integers, is a constant, and ’s and ’s are matrices. It should be clear that SDPs in (25) and (26) are special cases of this SDP. In particular, the “zero gap” cases we described above correspond to .

Then, the following proposition is a straightforward extension [22], [23], [24].

Proposition 1

If the optimization problem in (31) with has an optimal solution, then there exists an optimal solution satisfying

Remark 1

When , as in the cases described above, Proposition 1 guarantees that a solution to the corresponding SDP has rank , which implies that there exists satisfying . And, this is the optimal solution to the original optimization problem and satisfies and .

In the case that , we develop a slightly weaker result:

Proposition 2

If the optimization problem in (31) has an optimal solution , then there exists an optimal solution satisfying


The proof of this proposition provides a simple rank reduction algorithm and we can employ this algorithm in order to search for an optimal solution of the lowest possible rank.

Vi Suboptimal Solutions via the Sdr and Randomization

If an optimal solution of one of the SDPs in (25) and (26) satisfies as studied in Section V, then the solution can be easily decomposed into an optimal solution of the original optimization problem. If is of low rank, but greater than 1, then this in a sense reduces the dimensionality of the search space for a control input. In fact, there are some cases where it can be proved that near-optimal feasible solutions can be generated from the relaxation and sampling schemes.

Considering problem (22) with input amplitude is constrained, i.e. with , and the weighted average model discrimination as an objective function, i.e. , then the results of [14, Sec 4.1] can be directly applied to prove that

Furthermore, a simple randomization procedure with rounding will achieve that accuracy in expectation. We omit details here due to space, but it is the same as in the proof of the main result in [14], and also applied in [5].

Moving to more general problems, consider the optimization problem in (21) with the choice of (19) and its stochastic version

with a random vector that has a standard normal distribution. Then, we can rewrite this stochastic optimization problem as


It can be shown that this optimization problem is the same as the SDP in (25) in the sense that for an optimal solution to the SDP in (25) and an optimal solution and to the SDP in (33). Thus, the SDP in (25) can be viewed as a stochastic version of the original optimization problem in (21). Similarly, the other SDPs can be viewed as stochastic versions of their original optimization problems in (21) and (22), respectively.

In light of the fact that the SDPs are stochastic versions of the original optimization problems, randomization approaches (e.g. [13] and [6]) are natural choices in order to extract, from the solutions of the SDPs, a feasible solution to the original optimization problems. We employ the following algorithm for each optimization problem.

Algorithm 1

Given an SDP from (25) and (26), denote its optimal solution by .

Step : Generate a realization of a standard normal distribution.

Step : Search for a constant such that (i) is a feasible solution to the original optimization problem corresponding to the SDP and (ii) produces a better optimal value for the original optimization problem than with any other constant . If such a constant does not exist, go to Step .

Step : Update if this vector produces the best objective value so far through this algorithm. If the number of the generations of is less than a certain positive number, then go to Step . Otherwise, terminate the algorithm.

Even though the vectors that are generated in Step suggest a solution to the original optimization problem in the sense of average, each vector may not satisfy the constraints of the original optimization problem. Thus, the vectors are scaled by in Step in order to meet the constraints and, at the same time, to find a better solution than . This can be viewed as a line search.

It is possible that a constant in Step does not exist for some . For example, consider the optimization problem in (21) with a choice of (19) and its corresponding SDP in (25) for Algorithm 1. If, for a vector generated in Step , there exists an such that and , then the scaling scheme does not produce a feasible solution and, hence, Algorithm 1 returns back to Step and generate another vector. However, since ’s and are not zero matrices and a random vector has a continuous PDF, we have , which means that there exists in Step , with probability . Thus, every iteration of Algorithm 1 produces, with probability , a feasible solution to the original optimization problems.

When Algorithm 1 is performed on an optimization problem whose optimal value is approximately known, we can modify Algorithm 1 to be terminated in Step if the current vector produces an objective which is sufficiently accurate. In the next section, we construct such an approximation of the optimal value.

Vi-a Quality of the suboptimal solutions

Although the SDPs and Algorithm 1 can provide fairly good solutions with high probability, the optimal values of the original optimization problems are unknown. Instead, in this section, we attain, for some optimization problems in (21) and (22), regions where the optimal values reside in. These regions can provide some ideas about the accuracy of the computations via SDR.

We consider the optimization problem in (21) and its SDP in (25) in Lemma 1 below. This lemma is an extension from Theorem 1 in [13] and, thus, the proof of the lemma follows, in general, the proof of the theorem.

Lemma 1

Let and be the optimal solutions of the optimization problems in (21) with choices of in (19) and in (13) or (15) and its corresponding SDP in (25), respectively. Then, we have

where is a random vector with a standard normal distribution and

where , , are the singular values of .

Note that, with the choice of , we have and . Further, if , then Proposition 3 below shows that and , from which it follows that and and we have

where is replaced with since there is no additional constraint from homogenization (see Theorem in [13]).

The upper bound in Lemma 1 depends on , the least sum of the squares of the singular values of . A larger value for is preferable for a tighter bound and the following proposition provides its property.

Proposition 3

Let and be optimal solutions to the optimization problem in (21) with choices of in (19) and in (13) or (15) and the SDP in (25), respectively. The constant in Lemma 1 satisfies

Furthermore, in the case that , either we have or a matrix is also an optimal solution to the SDP in (25), which leads to

Vii An Example

In this section, the input signal design algorithm is applied to a fault detection problem for wind turbines.

A pitch angle of a blade of a wind turbine is the angle between the rotor plane and the blade chord line and, thus, a pitch angle means that the blade is aligned in parallel with the rotor plane. The blade is rotated by a hydraulic system and a popular model of this actuator is a closed-loop transfer function between the pitch angle and a reference angle

where and are the damping ratio and the natural frequency, respectively. See, for example, [25] for the details. In a normal condition, the parameters are and .

There are two major faults that can happen in the pitch angle control and we consider only one of them that is caused by an abrupt drop of the hydraulic pressure. In this case, the parameters change to and .

In order to detect the fault, i.e. to distinguish between two models based on their input-output signal, we first discretize two models of the actuators to obtain and corresponding to and , respectively. The discretization is performed with a sampling time and a zero-order hold. In order to complete the model structures as in (2), we use identity operators for both and and the initial conditions, the current pitch angle and pitch angular velocity, of both models have mean vectors and for their covariances. And, we assume that the values and are less than or equal to .

Then, we search for an input signal, for a time horizon , using an optimization problem in (22) with choices of in (19) and in (13) combined with , i.e. we maximize the level of model discrimination while keeping the power of the input signal below . As guaranteed by Remark 1, its corresponding SDP in (26) has an optimal solution of rank . Thus, this SDP is solved by the CVX followed by a rank reduction procedure. The designed input signal is shown in Fig. 1 (Top).

Fig. 1: The designed input signal (Top) and the corresponding empirical PDFs of and when the system is in the normal condition (Middle) and in the faulty condition (Bottom).

First, we apply the designed input signal to the first model , which corresponds to the normal condition, with from time to and, then, compute estimates and . This simulation is repeated times to obtain empirical PDFs of and , which are shown in Fig. 1 (Middle). As shown in the figure, the second model, which corresponds to the faulty condition, produces greater values for the estimate with high probability and, thus, the model discrimination method in Section II selects the first model, which is the correct model, with high probability. In Fig. 1 (Bottom), empirical PDFs of and are shown when the system is in the faulty condition, from which it is evident that the model discrimination method selects the correct model, which is the second model, with high probability.

For comparison, we employ a step input signal with as its amplitude, shown in Fig. 2 (Top), for the same simulation and obtain empirical PDFs of and in Fig. 2 (Middle) and (Bottom) when the system is in the normal condition and the faulty condition, respectively.

Fig. 2: A step input signal (Top) and the corresponding empirical PDFs of and when the system is in the normal condition (Middle) and in the faulty condition (Bottom).

This input signal is common in practice but, as can be seen from the figures, model discrimination is impossible.

Viii Conclusion

In this paper, we give a procedure for the design of probing input signals for model discrimination and fault detection over finite time intervals. The design method uses a likelihood-based model selection criteria, which we obtain from a modification of PEM to accommodate probabilistic structures of initial conditions of models. From this, we obtain conditions on input signals that guarantee that the hypothesis testing distinguishes models from each other with a given level of confidence.

From this general setting, several specific optimization problems are constructed from different constraints on the input, and different criteria for model discrimination. These optimization problems are nonconvex, and difficult to solve in general, so we suggest a solution procedure based on semidefinite relaxation and sampling.

The quality of this relaxation scheme is assessed based on some known results from duality of nonconvex quadratic programming and randomization algorithms. The utility of the method is assessed with an example of fault detection in a wind turbine.

An interesting future application will be to combine this design scheme with some preexisting control laws to form a so-called dual control, that optimizes both model discrimination ability and some other control objectives. This may be useful for switching adaptive control. It is straightforward to include in our proposed method linear and quadratic costs, so a natural controller to combine with would be MPC.


Viii-a Proof of Theorem 1

Since is a Gaussian random vector with a zero mean vector and a covariance , we have, from the definition of the value ,

from which, together with (5) and the fact that and , it follows that