An optimization framework for resilient batch estimation in CyberPhysical Systems
Abstract
This paper proposes a class of resilient state estimators for LTV discretetime systems. The dynamic equation of the system is assumed to be affected by a bounded process noise. As to the available measurements, they are potentially corrupted by a noise of both dense and impulsive natures. The latter in addition to being arbitrary in its form, need not be strictly bounded. In this setting, we construct the estimator as the setvalued map which associates to the measurements, the minimizing set of some appropriate performance functions. We consider a family of such performance functions each of which yielding a specific instance of the general estimator. It is then shown that the proposed class of estimators enjoys the property of resilience, that is, it induces an estimation error which, under certain conditions, is independent of the extreme values of the (impulsive) measurement noise. Hence the estimation error may be bounded while the measurement noise is virtually unbounded. Moreover, we provide several error bounds (in different configurations) whose expressions depend explicitly on the degree of observability of the system being observed and on the considered performance function. Finally, a few simulation results are provided to illustrate the resilience property.
1.05
I Introduction
Context
We consider in this work the problem of designing state estimators which would be resilient against an (unknown) sparse noise sequence affecting the measurements. By sparse noise we refer here to a signal sequence which is of impulsive nature, that is, a sequence which is most of the time equal to zero, except at a few instants where it can take on arbitrarily large values. The problem is relevant for example, in the supervision of CyberPhysical Systems (CPS) [7]. In this application, the supervisory data may be collected by spatially distributed sensors and then sent to a distant processing unit through some communication network. During the transmission, the data may incur intermittent packet losses or adversarial attacks consisting in e.g., the injection of arbitrary signals. Beyond CPS, there are many other applications where the sparse noise model of uncertainty is relevant: robust statistics [13], hybrid system identification [1], intermittent sensor fault detection, etc.
Related works
The problem of estimating the state of CPS under attacks has been investigated recently through many different approaches.
Since the measurements are assumed to be affected by a sequence of outliers which is sparse in time, a natural scheme of solution to the state estimation problem may be to first process the data so as to detect the occurrences of the nonzero instances of that sparse noise, remove the corrupted data and then proceed with classical estimation methods such as the Kalman filter or the Luenberger type of observer [16, 19]. Regarding the scenarios where the sporadic noise is modeled in a probabilistic setting, there exists a body of interesting results providing performance limits of estimation schemes [24, 17, 20].
Another category of approaches, which are inspired by some recent results in compressive sampling [6, 10], rely on sparsityinducing optimization techniques. A striking feature of these methods is that they do not treat separately the tasks of detection, data cleaning and estimation. Instead, an implicit discrimination of the wrong data is induced by some specific properties of the tobeminimized cost function. One of the first works that puts forward this approach for the resilient state estimation problem is the one reported in [9]. There, it is assumed that only a fixed number of sensors are subject to attacks (sparse over time but otherwise arbitrary disturbances). The challenge then resides in the fact that at each time instant, one does not know which sensor is compromised. Note however that the assumptions in [9] were quite restrictive as no process noise or measurement noise (other than the sparse attack signal) was considered.
These limitations open ways for later extensions in many directions. For example, [23] suggests a reformulation which is argued to reduce computational cost by using the concept of eventtriggered update
; [18] considers an observation model which includes dense noise along with the sparse attack signal. In [8], the assumption of a fixed number of attacked sensors is relaxed. Finally, the recent paper [12] proposes a unified framework for analyzing resilience capabilities of most of these (convex) optimizationbased estimators. Although a bound on the estimation error was derived in this paper, it is not quantitatively related to the properties (e.g., observability) of the dynamic system being observed. The state estimation problem treated there is rather viewed as a linear regression problem similarly to [2, 5].
Contributions
The contributions of the current paper consist in the design and the analysis of a class of optimizationbased resilient estimators for Linear TimeVarying (LTV) discretetime systems. The available model of the system assumes bounded noise in both the dynamics and the observation equation with the latter being possibly affected by an unknown but sparse attack signal. Contrary to the settings considered in some existing works, we did not impose here any restriction on the number of sensors which are subject to attacks, that is, any sensor can be compromised at any time. Note also that no statistical significance is attached to the uncertainties modeled by noise. In this setting, by generalizing our previous work reported in [15], the current paper proposes a general (robust) estimation framework for the state of LTV systems. We propose a class of state estimators constructed as the setvalued maps which associate to the output measurements, the minimizing set of some appropriate performance functions. A variety of performance functions are considered for the design of the estimator and handled in a unified analysis framework: convex nonsmooth/smooth loss functions and nonconvex saturated ones. Our main theoretical results concern the resilience analysis of the proposed class of estimators. We show that the estimation error associated with the new class of estimators can be made, under certain conditions, insensitive to the (possibly very large) amplitude of the sparse attack signal. The proposed analysis relies on new quantitative characterizations of the observability property of the system whose state is being observed. Although the derived error bounds may be conservative, they have the important advantage of being explicitly expressible in function of the properties of the considered dynamic system and those of the optimized loss functions. This makes them valuable qualitative tools for assessing the impact of the estimator’s design parameters and that of the system matrices on the quality of the estimation. For example, the proposed error bounds reflect the intuition that the more observable the system is with respect to the new criteria, the larger the number of instances of gross values (of the output noise) it can handle and the smaller the values of the bounds. Finally the paper shows that for some choice of the design functions (loss functions), some instances of the proposed family of estimators enjoy the exact recoverability property in the particular situation where the measurements are corrupted only by sparse noise. We present a condition for this property that can be numerically verified by means of convex optimization.
Outline
The rest of the paper is structured as follows. We start by introducing in Section II, the settings for the resilient state estimation problem. We then define in Section III the new class of optimizationbased estimators proposed here to address this problem. The analysis of this new framework is presented in Section IV. In Section V, we discuss further the property of exact recoverability of the proposed set of estimators in the presence of only sparse measurement noise. In Section VI, we comment on the numerical verification of the conditions derived in the analysis part. Some numerical results are described in Section VII and finally, concluding remarks are given in Section VIII.
Notations
(respectively ) is the set of nonnegative (respectively positive) reals. designates the set of real numbers excluding zero. We note the set of (column) vectors with real elements and , the set of real matrices with rows and columns. If , then will designate the transposed matrix of . will refer to the (square) identity matrix of appropriate dimension. The notation will denote a norm over a given set (which will be specified when necessary). denotes the norm (for ) or the quasinorm (for ) defined for in by . The limit of this when gives the socalled norm of , i.e., the number of nonzero entries in . Its limit when gives the infinity norm denoted and returning the maximum value of the . For , refers to the exponential function applied to .
If is a set, then is the power set of . If is a finite set, the notation refers to the cardinality of .
functions [14]. We name the set of functions which are continuous, zero at zero, strictly increasing and satisfy . Similarly, we use the notation to denote the set of saturated functions which are continuous, zero at zero, strictly increasing on and such that for all . For any invertible function , we use to denote its inverse function.
Supremum. Given a function over and a subset of , the notation , with , will mean that for all in , . This notation includes the case where the supremum is but is not attained by any element of .
Ii The Resilient Estimation Problem
Consider a discretetime Linear TimeVarying (LTV) system described by
(1) 
where is the state vector of the system at time and is the output vector at time ; and are families of matrices with appropriate dimensions; is an unobserved bounded noise sequence. As to , it is regarded here as an (unobserved) arbitrary noise sequence affecting the measurements. For clarity of the exposition, it may be convenient to view as a combination of two types of sequences: a bounded sequence and a sparse sequence (this decomposition is indeed always possible for an arbitrary noise signal). Hence we may write
(2) 
where is a sensor noise of moderate amplitude and is a sparse noise sequence in the sense that its (entrywise and/or timewise) components are mostly equal to zero but its nonzero elements can take on (possibly) arbitrarily large values. Such a sparse sequence may account for adversarial attacks in the same spirit as in [9, 12], intermittent sensor faults, or data losses, in particular when a communication network is involved in the data acquisitiontransmission chain. In the sequel, we may also refer to and in (1) and (2) as dense noises and to the largest elements of as outliers.
Problem
The problem considered in this paper is the one of estimating the states of the system (1) on a time period given measurements of the system output. We shall seek an accurate estimate of the state despite the uncertainties in the system equations (1) modeled by and the characteristics of which are described above. In particular, we would like the tobedesigned estimator to produce an estimate such that the estimation error is, when possible, independent of the maximum amplitude of . Such an estimator will be called resilient.
Denote with
(3) 
the actual state trajectory of the system on a finite time horizon of length . Similarly, we use the notation
(4) 
to refer to the collection of measurements on the same time horizon. The state estimation problem is approached here from an offline perspective, therefore is fixed. For the sake of simplicity, the index will be dropped from the variable names and it will be assumed that matrices without an index represent values on the period . To simplify further the formulas, we also pose while will be a set indexing the sensors (or the rows of the matrices in (1)).
Iii Optimizationbased approach to Resilient state estimation
Iiia The state estimator
In this section we present an optimizationbased framework for solving the state estimation problem defined above. To define formally the proposed state estimator, let us first introduce the tobeminimized objective function. Given the matrices of the system (1) and output measurements , we consider a performance function defined by
(5) 
where is a hypothetical trajectory matrix with denoting the th column of ; is a userdefined parameter which aims at balancing the contributions of the two terms involved in the expression of the performance index . and are two families of positive functions (called here loss functions) defined on and respectively. For the sake of simplicity, we will assume throughout the paper that for all in , and can be expressed by
(6)  
(7) 
where and are two fixed loss functions and and are two families of nonsingular weighting matrices with appropriate dimensions.
Definition 1.
Given a system such as the one in (1) and given an output measurement matrix , we define a state estimator to be a setvalued map which maps to a subset of the space of possible trajectories of the system.
We consider a class of state estimators defined by
(8) 
As such the estimator is welldefined if for any fixed , admits a non empty minimizing set, that is, if there exists at least one such that for all . To ensure this property we will need to put an observability assumption on the system whose state is being estimated and require some further properties on the loss functions and entering in the definition of the objective function .
IiiB Welldefinedness of the estimator
Let us start by stating the properties required for the loss functions involved in the definition of . Due to the multiple usages that will be made of these properties, it is convenient to state them for a generic loss function defined on a set of matrices (of which vectors constitute a special case). Throughout this paper, a loss function is a positive function which will be required to satisfy a subset (depending of the specific usage) of the following properties:

Positive definiteness: and for all

Continuity: is continuous

Symmetry: for all

Generalized Homogeneity (GH): There exists a function such that for all and for all ,
(9)

Generalized Triangle Inequality (GTI): There exists a positive real number such that for all , in
(10)
It can be usefully observed for the future developments that (10) can be equivalently written as
Examples of loss functions
Note that norms on satisfy naturally the properties 1–5 with and , hence yielding the classic homogeneity property and triangle inequality. It can also be checked that functions of the form with , fully qualify as loss functions in the sense that they fulfill all the properties 1–5. In this case, in (10) can be taken equal to if and otherwise. Lastly we note that if satisfies 1–3 and 5, then so does the function defined by (see Lemma 7 in the appendix). Similarly, saturated functions of the form for some satisfy 1–3 and 5. In the case of convex functions, a link can be established between 4 and 5.
Lemma 1 ([15]).
Observe that quadratic functions of the form with being a positive definite matrix and referring to the trace of a matrix, satisfy properties 1–4 with a function . Since such functions are convex, it follows from Lemma 1 above that they also verify 5 for .
We now recall from [15] a technical lemma which will play a fundamental role in analyzing the properties of the estimator (8). In particular, our proof of welldefinedness relies on this lemma.
Lemma 2 (Lower Bound of a loss function).
Proof.
We start by observing that the unit hypersphere is a compact set in the topology induced by the norm . By the extreme value theorem, being continuous, admits necessarily a minimum value on , i.e., there is such that for all . For any nonzero , so that . On the other hand, by the relaxed homogeneity of ,
Moreover, this inequality holds for . It therefore holds true for any . ∎
Proposition 1 (Welldefinedness of the estimator).
Hence the condition of the proposition guarantees that is non empty for all . Before proving this result, we first make the following observation.
Lemma 3 (Equivalent condition of Observability).
A proof of this lemma is reported in Appendix B. The function can be interpreted here as a gain function which measures how much the system is observable with regards to the two families and : the more the system is observable, the more amplifies its argument magnitude, making different trajectories more discernible.
Proof of Proposition 1: The idea of the proof is to show that is coercive (i.e., continuous and radially unbounded) for any given and then apply a result^{1}^{1}1Note that radial unboundedness is equivalent to the levelboundedness in the terminology of [21]. in [21, Thm 1.9] to conclude on the attainability of the infimum (which certainly exists since is a positive function). Clearly, is continuous as a consequence of and being continuous by assumption (see property 2). We then just need to prove the radial unboundedness of , i.e., for an arbitrary norm on the space and for all fixed . Since satisfies property 5, there exists a constant such that . Applying this property leads naturally to
where
(15) 
It can then be shown (following a similar reasoning as in Appendix B), under the observability assumption, that satisfies the conditions of Lemma 2. It follows that for any norm on , there exists a function such that
Combining this with the inequality above, we obtain that
which implies the radial unboundedness of for any fixed . Hence the estimator (8) is welldefined as stated. ∎
As it turns out from Proposition 1, observability of system (1) and properties 1–4 imposed on and ensure that is a non empty set for any given . We then call any member of , an estimate of the state trajectory of system (1) on the time interval . In particular, is called an estimate of the state at time .
To conclude this section, note that the definition of the estimator in (8) does not require any convexity assumption on the objective function . Hence the theoretical analysis to be presented in the next sections does not make use of convexity either. However, we may prefer in practice to select convex loss functions and . In effect, the elements of are not necessarily expressible through an explicit formula. So, in practice one would resort instead to numerical solvers to approach the solution of the underlying optimization problem. And the numerical search process is more efficient when is a convex function of [3, 11].
Iv The resilience property of the proposed class of estimators
In this section, we prove that the state estimator proposed in (8) possesses the resilience property under some conditions. More specifically, our main result states that the estimation error, i.e., the difference between the real state trajectory and the estimated one, is upper bounded by a bound which does not depend on the amplitude of the outliers contained in provided that the number of such outliers is below some threshold.
For convenience, let us introduce a few more notations. Let and be defined by
(16)  
(17) 
We also introduce the partial cost function defined for any by . We will assume throughout the paper that the loss functions and satisfy a subset of the properties 1–5 and in particular, when they are required to satisfy the GTI 5, we will denote the associated positive constants with and respectively. Finally, let us pose
(18) 
We will organize the resilience analysis for the estimator (8) along two cases: first, the scenario where the gross error vector sequence in (2) is blocksparse in time and then the situation where it is both componentwise and temporally sparse. To be more precise, if we denote with the matrix formed from the sequence , then the first case refers to columnwise blocksparsity of while the second is related to an entrywise sparsity. Note that the two cases coincide when the system of interest is singleinput singleoutput (SISO).
Iva Resilience to intermittent timewise blocksparse errors
We start by introducing the concept of Resilience index of an estimator such as the one in (8), a measure which depends of the system matrices, the structure of the performance function and on the loss functions and .
Definition 2.
Let be a nonnegative integer. Assume that the system in (1) is observable on . We then define the Resilience index of the estimator in (8) (when applied to ) to be the real number given by
(19) 
where is as defined in (18). The supremum is taken here over all nonzero in and over all subsets of with cardinality equal to .
The index can be interpreted as a quantitative measure of the observability of the system . The observability is needed here to ensure that the denominator of (19) is different from zero whenever . Furthermore, it should be remarked that for any , which implies that the defining suprema of are welldefined.
In order to state the resilience result for the estimator (8) when applied to system , let us introduce a last notation to be used in the analysis. Let be a given number. For any admissible sequence in (1), we can split the time index set into two disjoint label sets,
(20) 
indexing those which are upper bounded by and indexing those which are possibly unbounded. It is important to keep in mind that is just a parameter for decomposing the noise sequence in two parts in view of the analysis (and not necessarily a bound on elements of the sequence ). For example, taking would be appropriate for analyzing the properties of the estimator when is strictly sparse and each of its nonzero elements is treated as an outlier.
Theorem 1 (Upper bound on the estimation error).
Consider the system defined by (1) with output together with the state estimator (8) in which the loss functions and are assumed to obey 1–5. Denote with and the constants associated with the GTI 5 for and respectively.
Let and set .
If the system is observable on and , then there exists a function such that for any norm on ,
(21) 
with denoting the true state matrix from (1), and and being defined by
(22)  
(23) 
Proof.
Let in . By definition of in (8), we have , which gives explicitly
(24) 
Using the fact that from (1) and applying the GTI and the symmetry properties of , we can write
with . It follows that the first term on the left hand side of (24) is lower bounded as follows
(25) 
Similarly, by making use of (1), observe that . We now apply the GTI and symmetry of in two different ways depending on whether belongs to or :
These inequalities imply that the second term on the left hand side of (24) is lower bounded as follows
(26)  
Combining (24), (25) and (26) gives
which, by using (17), (18), (22), can be written as
with . As has elements, applying the definition of gives
(27) 
By the assumption that we have that
, and consequently, that
(28) 
Given that the system is observable on , it can be shown, thanks to Lemma 5 in the Appendix that satisfies properties 1–4 (the proof of this is quite similar that of Lemma 3 in Appendix B). We can therefore apply Lemma 2 to conclude that for any norm , there exists a function such that
(29) 
with defined by . Finally, the result follows by selecting in (21) to be with denoting the inverse of . ∎
Strict resilience
Now we state our (strict) resilience result as a consequence of Theorem 1 when the output error measuring loss function satisfies the triangle inequality.
Corollary 1 (Resilience property).
Let the conditions of Theorem 1 hold with the additional requirement that . Then
(30) 
Proof.
The resilience property of the estimator (8) lies here in the fact that, under the conditions of Theorem 1 and Corollary 1, the bound in (30) on the estimation error does not depend on the magnitudes of the extreme values of the noise sequence . Considering in particular the function , we remark that it can be overestimated as follows
(31) 
The first term on the left hand side of (31) represents the uncertainty brought by the dense noise over the whole state trajectory. It is bounded since is bounded by assumption (see the description of the system in Section II). The second term is a bound on the sum of those instances of whose magnitude is smaller that .
Because is a function of , the bound in (30) represents indeed a family of bounds parameterized by . Since is a mere analysis device, a question would be how to select it for the analysis to achieve the smallest bound. Such favorable values, say , satisfy
Another interesting point is that the inequality stated by Theorem 1 holds for any norm on . Note though that the value of the bound depends (through the parameter ) on the specific norm used to measure the estimation error. Moreover, different choices of the performancemeasuring norm will result in different geometric forms for the uncertain set, that is, the ball (in the chosen norm) centered at the true state with radius equal to the upper bound displayed in (30).
We also observe that the smaller the parameter is, the tighter the error bound will be, which suggests that the estimator is more resilient when is lower. A similar reasoning applies to the number which is desired to be large here. These two parameters (i.e., and ) reflect properties of the system whose state is being estimated. They can be interpreted, to some extent, as measures of the degree of observability of the system. In conclusion, the estimator inherits partially its resilience property from characteristics of the system being observed. This is consistent with the wellknown fact that the more observable a system is, the more robustly its state can be estimated from output measurements.
Approximate resilience
As discussed above, when the loss function does not satisfy the triangle inequality (i.e., ), the term in (21) is unlikely to vanish completely. However we can prevent it from growing excessively by an appropriate choice of in (7). To see this, assume for example that is defined by . Then since for all , saturates to a constant value regardless of how large the are for . On the other hand, this choice introduces a new technical challenge related to the fact that the function in (29) is no longer a function but a bounded (saturated) function. Handling this situation will require some additional condition on the upper bound in (28). To sum up, by selecting a saturated loss function for , we obtain the following approximate resilience result.
Corollary 2 (Case where ).
Let the conditions of Theorem 1 hold. Assume further that the loss function in (7) is defined by where satisfies 1–5 and that let be such that
(32) 
where and . Then there exists a continuous and strictly increasing function (obeying and ) such that for any norm on ,
(33) 
with in (32) defined as in the proof of Theorem 1 using the norm .
Proof.
That the particular function specified in the statement of the corollary satisfies the properties 1–3 and 5 is a question which is fully answered by Lemma 7 in Section C of the appendix. Consequently, let us observe that (28) still holds true here. As to (29) it also holds as well but with the slight difference that is just a saturated function in (as defined in the notation section) with bounded range . This results in fact from Lemma 7 and the proof of Lemma 2. We can therefore write
with . Note from the definition of the class , that implies that (since otherwise we would have ). Letting be the restriction of such a function on , we have with being invertible. We can now apply to each member of this inequality to reach the desired result since lies in the range of . ∎
IvB Resilience to attacks on the individual sensors
We now consider the situation where the matrix formed from in (2) may be sparse entrywise. This case is relevant when any individual sensor may be faulty (or compromised by an attacker) at any time. To address the resilient state estimation problem in this scenario, we select the loss functions to have a separable structure. To be more specific, let be such that for any
(34) 
where, consistently with (7), with and , , being some loss functions on enjoying the properties 1–5. As in the statement of Corollary 1, we shall require that . It follows that one can set to be the absolute value without loss of generality. Let therefore set so that and
(35) 
with being a diagonal matrix having the , , on its diagonal.
To state the resilience property in this particular setting, we partition the index set of the entries of as
(36)  
with denoting the th entry of the vector . Also, in order to account for the specificity of the new scenario, let us refine slightly the Resilience index (19) to be
(37) 
where is defined as in (18) from in (35) and is th row of the observation matrix . The difference between in (19) and in (37) resides in the index sets for counting possible error occurrences which are and , respectively.
With these notations, we can provide the following theorem which is the analog of Corollary 1 in the case where the disturbance matrix (see Eq. (2)) is entrywise sparse.
Theorem 2 (Upper bound on the estimation error: Separable case).
Consider the system defined by (1) with output together with the state estimator (8) in which is assumed to obey 1–5 and is defined as in (35).
Let and set with defined in (36).
If the system is observable on and if , then there exists a function such that for all norm on ,
(38) 
with denoting the true state matrix from (1) and defined by
is defined as in the statement of Theorem 1 with being constructed from in (35).
To some extent, Theorem 2 can be viewed as a special case of Theorem 1 in which the function is taken to be the norm and the data set is modified to be . Hence the proof follows a similar line of arguments as that of Theorem 2.
An interesting property of the estimator can be stated in the absence of dense noise, i.e., when only the sparse noise is active:
Corollary 3.
Proof.
This follows directly from the fact that in the case where there is no dense noise and . ∎
Therefore, the estimator (8) has the exact recoverability property, that is, it is able to recover exactly the true state of the system (1) when only the sparse noise is active in the measurement equation provided that the number of nonzero in the sequence is small enough for to be less than . According to our analysis, the number of outliers that can be handled by the estimator in this case can be underestimated by
(39) 
V Further discussions on the exact recoverability property of the estimator
In this section, we discuss further the exact recoverability property of the estimator (8) evoked by Corollary 3. For this purpose let us assume that the process noise in (1) is identically equal to zero and that the sequence is sparse in the sense that its dense component displayed in (2) does not exist.
In this setting we can obtain a more resilient (to sparse noise in the measurement) estimator than (8) by making it aware of the absence of dense process noise. This can be achieved by contraining the searched state matrix to be in the set defined by