# Optimal Attack Strategies Subject to Detection Constraints Against Cyber-Physical Systems

Yuan Chen, Soummya Kar, and José M. F. Moura

Yuan Chen {(412)-268-7103}, Soummya Kar {(412)-268-8962}, and José M. F. Moura {(412)-268-6341, fax: (412)-268-3890} are with the Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15217 {yuanche1, soummyak, moura}@andrew.cmu.edu.

This material is based upon work supported by the Department of Energy under Award Number DE-OE0000779 and by DARPA under agreement number DARPA FA8750-12-2-0291. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

###### Abstract

This paper studies an attacker against a cyber-physical system (CPS) whose goal is to move the state of a CPS to a target state while ensuring that his or her probability of being detected does not exceed a given bound. The attacker’s probability of being detected is related to the nonnegative bias induced by his or her attack on the CPS’s detection statistic. We formulate a linear quadratic cost function that captures the attacker’s control goal and establish constraints on the induced bias that reflect the attacker’s detection-avoidance objectives. When the attacker is constrained to be detected at the false-alarm rate of the detector, we show that the optimal attack strategy reduces to a linear feedback of the attacker’s state estimate. In the case that the attacker’s bias is upper bounded by a positive constant, we provide two algorithms – an optimal algorithm and a sub-optimal, less computationally intensive algorithm – to find suitable attack sequences. Finally, we illustrate our attack strategies in numerical examples based on a remotely-controlled helicopter under attack.

## I Introduction

Security vulnerabilities in cyber-physical systems (CPS), systems that interface sensing, communication, and control with an underlying physical process, allow for sophisticated cyber attacks that cause catastrophic physical harm. In the past, events such as StuxNet [1] and the Maroochy Sewage Control Incident [2] have demonstrated the vulnerability of industrial processes. More recently, cyber-physical attacks have targeted automobiles [3], military vehicles [4], and commercial drones [5]. These examples show that CPS remain susceptible to cyber-attacks, and, in response, there have been significant efforts to improve the security of CPS.

Part of the effort in improving cyber-physical security has been devoted to categorizing different types of attacks and developing security countermeasures for each type [1, 6]. One particular type of attack is the integrity attack, in which an attacker manipulates the CPS’s sensor readings and alters its actuator control signals [6, 7]. Prior work has analyzed sensor attacks on CPS, determining the fundamental limits of attack detection [8] and developing methods to reconstruct sensor attacks [9, 10, 11]. Existing work has also studied the capabilities of an integrity attacker, relating the ability of the attacker to perform undetectable attacks to certain geometric control-theoretic properties of the CPS [12, 13, 14]. For systems affected by process and sensor noise, references [7] and [15] characterize the state estimation error caused by an attacker who tries to avoid detection.

In addition to analyzing the ability of an integrity attacker to cause damage and evade detection, prior work has also studied how an attacker should behave in order to achieve his or her objectives. Reference [16] considers a noisy CPS and designs an attack to optimally disrupt the system’s feedback controller while avoiding detection. Instead of attackers who seek to cause general disruption and damage to a CPS, our previous work studies attackers with specific control objectives [17, 18]. In [18], we considered an attacker whose goal is to move the CPS to a target state while evading detection, formulated a cost function that penalizes the deviation from the target state and the magnitude of the detection statistic, and determined that the optimal attack for such a cost function reduces to a linear feedback of the attacker’s state estimate.

This paper studies an attacker who wishes to move the system to a target state, but, unlike [18], we impose an explicit bound on the probability of an attack being detected. We model the CPS as a linear dynamical system subject to sensor and process noise, equipped with a Kalman filter for state estimation and an LQG controller. The CPS uses a $\chi^2$ detector to detect attacks; the detector reports an attack if the energy of the Kalman filter innovation exceeds a certain threshold [19]. This model has been used in the literature to model CPS under attack (see, e.g., [20, 21]). The attacker’s goal is to design a sequence of attacks that counters the system’s LQG controller and minimizes the deviation of the CPS’s state from the target state subject to a bound on the (non-negative) bias induced on the detection statistic.

We define a linear quadratic cost function that captures the attacker’s control objective by penalizing the distance between the system’s state and the target state. Then, we formulate the attack design problem as an optimization problem of finding a sequence of attacks that minimizes the cost function subject to an upper bound constraint on the bias induced in the detection statistic. This differs from our previous work [18], which studies unconstrained attack design where the attacker’s detection-avoidance goal is an additional term of the overall cost function. This paper, unlike [18], requires attacks at each time step to satisfy explicit constraints. Since we compute the optimal attack sequence in a causal manner, we must ensure that attacks at each time step are recursively feasible [22] to guarantee that it is possible to satisfy the constraints of future time steps. We use geometric control properties (similar to those studied in [23]) of the CPS model to express the constraint placed on the detection statistic bias as a linear constraint on the attack at each time step. From a practical perspective, this paper provides guarantees on the optimal attacks’ probability of being detected (reference [18] does not provide such guarantees).

We consider two cases separately: 1) the induced bias is constrained to be zero, and 2) the induced bias is upper bounded by a positive constant. When the bias is zero, which restricts the attacker to be detected at the false alarm rate of the detector, we apply constrained dynamic programming to show that the optimal attack reduces to a linear feedback strategy. For bounded bias, we provide two algorithms to determine a suitable sequence of attacks. The first algorithm is more computationally intensive but finds an optimal sequence of attacks. The second, less computationally intensive, algorithm finds a (sub-optimal) sequence of attacks that satisfies the detection constraint. A preliminary version of part of this work appears in [24], which designs attacks constrained to be detected at the false alarm rate for a CPS that is not equipped with an LQG controller. When the CPS is equipped with its own controller, which we consider here, the attacker must account for the system input in designing his or her attack.

The rest of the paper is organized as follows. Section II provides the model and assumptions for the CPS and attacker, reviews the detector and the concept of recursive feasibility, and formally states the problems we address. In Section III, we determine the set of all recursively feasible attacks at each time step. In Section IV, we use dynamic programming to find an optimal strategy when the attacker’s probability of being detected is constrained to be the detector’s false alarm rate. Section V studies the case when the bias induced in the detection statistic is upper bounded by a positive constant; we provide two algorithms for computing attack sequences that achieve the attacker’s objectives. We provide numerical examples of a remotely-controlled helicopter under attack (from each of our proposed strategies) in Section VI, and we conclude in Section VII.

## II Background

### II-A Notation

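Let $\mathbb{R}$ denote the reals, $\mathbb{R}^n$ denote the space of $n$-dimensional real (column) vectors, and $\mathbb{R}^{m \times n}$ denote the space of real $m$ by $n$ matrices. The multivariate Gaussian distribution with mean $\mu$ and covariance $\Sigma$ is denoted as $\mathcal{N}(\mu, \Sigma)$. The $n$ by $n$ identity matrix is denoted as $I_n$. For a matrix $M$, $\mathscr{R}(M)$ denotes the range space of $M$, $\mathscr{N}(M)$ denotes the null space of $M$, and $M^\dagger$ denotes the Moore-Penrose pseudoinverse. For a symmetric matrix $S$, $S \succeq 0$ denotes that $S$ is positive semidefinite, and $S \succ 0$ denotes that $S$ is positive definite. For $S \succeq 0$, let $\|\cdot\|_S$ denote the $S$-weighted $2$-norm. That is, for $x \in \mathbb{R}^n$ and $S \in \mathbb{R}^{n \times n}$, $\|x\|_S^2 = x^T S x$.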

### II-B System Model

We use the same linearized CPS model as [18] (if the CPS is nonlinear, the model (1) represents its dynamics after linearization about an operating point):

$$x_{t+1} = A x_t + B u_t + \Gamma e_t + w_t, \qquad y_t = C x_t + \Psi e_t + v_t, \tag{1}$$

where $x_t \in \mathbb{R}^n$ describes the system’s state, $u_t$ is the system input, $e_t \in \mathbb{R}^s$ is the attacker’s input, and $w_t$ and $v_t$ are the process and sensor noise, respectively. The attack $e_t$ also models the case in which the attacker may separately attack the CPS’s actuators and sensors: $e_t$ then stacks the actuator attack and the sensor attack, with $\Gamma$ and $\Psi$ defined accordingly. The sensor and process noise are independent and identically distributed (i.i.d.) in time and mutually independent; $w_t$ has distribution $\mathcal{N}(0, \Sigma_w)$, and $v_t$ has distribution $\mathcal{N}(0, \Sigma_v)$. The system starts running at time $t = -\infty$, and the initial state of the system is Gaussian and independent of the noise processes. The pair $(A, C)$ is observable, and the pair $(A, B)$ is controllable. The matrices $\Gamma$ and $\Psi$ describe the attacker. The model (1) is commonly adopted in studies of CPS under attack [18, 21, 20].

The system knows the matrices $A$, $B$, and $C$ and the statistics of the noise processes and initial state, but does not know the matrices $\Gamma$ and $\Psi$ (since they describe the attacker). The system causally knows the system input $u_t$ and the sensor output $y_t$, but not the attack $e_t$. We assume the system’s goal is to regulate the system state to the origin. Because the system cannot directly observe the state $x_t$, it uses its sensor measurements to construct an estimate of the state using a Kalman filter. Then, the system performs feedback control on the state estimate to regulate the state to the origin. The system constructs its Kalman filter and controller assuming nominal operating conditions (i.e., $e_t = 0$ for all $t$).

Under nominal operating conditions, the system’s Kalman filter calculates $\hat{x}_t$, the minimum mean square error (MMSE) estimate of $x_t$ given all sensor measurements up to time $t$ and input up to time $t-1$. Since the system starts at $t = -\infty$, the Kalman filter has fixed gain:

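$$K = P C^T \left(C P C^T + \Sigma_v\right)^{-1}, \tag{2}$$

$$P = A P A^T + \Sigma_w - A P C^T \left(C P C^T + \Sigma_v\right)^{-1} C P A^T, \tag{3}$$

$$\hat{x}_t = \hat{x}_{t|t-1} + K\left(y_t - C\hat{x}_{t|t-1}\right), \tag{4}$$

$$\hat{x}_{t+1|t} = A\hat{x}_t + B u_t. \tag{5}$$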
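As a rough numerical illustration of equations (2) and (3), the fixed gain can be approximated by iterating the Riccati recursion until it stops changing. This is only a sketch: the function name `steady_state_kalman`, the iteration limits, and the two-state example matrices are assumptions made for illustration and do not come from the paper.

```python
import numpy as np

def steady_state_kalman(A, C, Sigma_w, Sigma_v, iters=10_000, tol=1e-12):
    """Iterate the Riccati recursion (3) to a fixed point P, then form the gain K of (2)."""
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + Sigma_v  # innovation covariance for the current P
        P_next = A @ P @ A.T + Sigma_w - A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    Sigma_nu = C @ P @ C.T + Sigma_v
    K = P @ C.T @ np.linalg.inv(Sigma_nu)
    return K, P, Sigma_nu

# Hypothetical two-state system (values chosen only for illustration).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
Sigma_w = 0.01 * np.eye(2)
Sigma_v = np.array([[0.1]])
K, P, Sigma_nu = steady_state_kalman(A, C, Sigma_w, Sigma_v)
```

Iterating the recursion is used here because it needs nothing beyond basic linear algebra; a dedicated algebraic Riccati solver would be the more common choice in practice.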

To regulate the state $x_t$, the system has a feedback controller of the form

$$u_t = L\hat{x}_t, \tag{6}$$

where the feedback matrix $L$ is chosen such that $A + BL$ is stable. One controller that takes the form of equation (6) is the infinite horizon LQG controller that minimizes the cost function where , , and the pair is observable.

The CPS is equipped with a $\chi^2$ detector to determine if, for some $t$, $e_t \neq 0$. The $\chi^2$ attack detector [19] uses the innovations sequence of the Kalman filter, $\nu_t$, defined as $\nu_t = y_t - C\hat{x}_{t|t-1}$, to determine whether or not an attack has occurred. The term $\hat{x}_{t|t-1}$ is the MMSE estimate of $x_t$ given all sensor measurements and system input up to time $t-1$, assuming nominal operating conditions. When there is no attack (i.e., $e_t = 0$ for all $t$), the innovations sequence is i.i.d. $\mathcal{N}(0, \Sigma_\nu)$, where $\Sigma_\nu = C P C^T + \Sigma_v$, and $\nu_t$ is orthogonal to $\hat{x}_{t|t-1}$ [25]. The $\chi^2$ detector reports an attack if the statistic $g_t$, the sum of the terms $\nu_k^T \Sigma_\nu^{-1} \nu_k$ over the detector’s window, exceeds a threshold $\tau$, which is chosen a priori to balance the false alarm and missed detection probabilities [19]. In this paper, we consider a detector with window size $1$, so $g_t = \nu_t^T \Sigma_\nu^{-1} \nu_t$.
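For a window of size one, the detector’s test reduces to a single weighted inner product of the innovation. The following minimal sketch shows that computation; the function names and the numerical values are illustrative assumptions, not part of the paper.

```python
import numpy as np

def chi2_statistic(nu, Sigma_nu):
    """Detection statistic for window size one: nu^T Sigma_nu^{-1} nu."""
    nu = np.asarray(nu, dtype=float)
    return float(nu @ np.linalg.solve(Sigma_nu, nu))

def reports_attack(nu, Sigma_nu, tau):
    """The detector raises an alarm when the statistic exceeds the threshold tau."""
    return chi2_statistic(nu, Sigma_nu) > tau

# Hypothetical innovation and covariance (two sensors).
Sigma_nu = np.diag([1.0, 4.0])
nu = np.array([1.0, 2.0])
g = chi2_statistic(nu, Sigma_nu)  # 1/1 + 4/4
```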

There are attack detectors other than the $\chi^2$ attack detector (see, e.g., [9, 12, 26]). These detectors require noiselessness [9, 12], bounded energy noise [26], or batch measurements [9, 26]. This paper studies attacks against CPS under broader conditions on the noise and provides a recursive solution. We consider an on-line detector for systems with process and sensor noise. The linear, state space model with a Kalman filter, feedback controller, and $\chi^2$ attack detector with window size $1$ is a standard model for a CPS subject to attack [18, 21, 20].

### II-C Attacker Model

The attacker knows the system model and statistical properties, the controller feedback matrix $L$, and which sensors and actuators he or she can attack (i.e., the matrices $\Gamma$ and $\Psi$). Following [18], we assume, without loss of generality, that the matrix $\begin{bmatrix} \Gamma^T & \Psi^T \end{bmatrix}^T$ is injective. The attacker causally knows the sensor output $y_t$ and the attack $e_t$. Additionally, the attacker causally knows $\tilde{y}_t$, the value of the sensor output at time $t$ before it is altered by the attack at time $t$. (In future work, we will consider defense strategies against such attackers and study the interaction between the attacker and CPS in a game theoretic framework. Thus, in this paper, we assume the worst-case, most powerful attacker who knows the system model perfectly. In addition, future work will study attack strategies that are robust to imperfections in the attacker’s knowledge of the system model.)

The attacker performs Kalman filtering, separately from the system, to estimate the state. The attacker also uses his or her knowledge to compute the estimate produced by the system’s Kalman filter. The attacker knows $L$ and the system’s state estimate, so he or she knows $u_t$. We design attack strategies that depend on the attacker knowing the system’s input. In general, so long as the attacker knows $u_t$ for all $t$, the CPS’s control input need not be restricted to the form of equation (6). For this paper, we only consider the case of feedback control, but our methodology may be tailored toward other control laws. The attack begins at time $t = 0$, i.e., for $t < 0$, $e_t = 0$. During the time interval $(-\infty, 0)$, the attacker observes the system output $y_t$ and keeps track of the state estimate $\hat{x}_t$.

The attacker’s objective is to design an attack sequence over the finite time interval $t = 0$ to $t = N$ that moves the system state to a target state $x^*$ while satisfying a detection-avoidance constraint. The attacker chooses the sequence

$$\gamma(0, N) = \{e_0, \dots, e_N\},$$

to accomplish his or her goals such that at time $t$, the attack $e_t$ only depends on the attacker’s available information at time $t$, $\mathcal{I}_t$. Following [18], $\mathcal{I}_t$ is the classical information pattern [25]. If a nonzero attack occurs, the attacker’s Kalman filter then produces a different estimate than the system’s Kalman filter and becomes:

$$\tilde{x}_{t+1|t} = A\hat{x}_t + B u_t + \Gamma e_t, \tag{7}$$

$$\tilde{x}_t = \hat{x}_{t|t-1} + K\left(\tilde{y}_t - C\hat{x}_{t|t-1}\right). \tag{8}$$

The attacker’s Kalman filter produces the MMSE state estimate given $\mathcal{I}_t$, i.e., $\tilde{x}_t = E\left[x_t \mid \mathcal{I}_t\right]$.

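The attack $e_t$ induces a bias $\epsilon_t$ in the system’s innovation $\nu_t$. Under an attack, we have $\nu_t = \nu_t^0 + \epsilon_t$, where $\nu_t^0$ is the value of the system’s innovation in the case that there had been no attack. The following state space dynamical system describes the relationship between $e_t$ and $\epsilon_t$ [18]:

$$\theta_{t+1} = \hat{A}\theta_t + \hat{B}e_t, \qquad \epsilon_t = \hat{C}\theta_t + \hat{D}e_t, \tag{9}$$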

where , , , , and .
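To make the role of system (9) concrete, the sketch below propagates an attack sequence through (9) and checks the per-step bound on the weighted norm of the induced bias. The matrices are treated as given inputs (their definitions are not reproduced here), the zero initial state follows the later remark that system (9) starts from zero, and all names and numbers are illustrative assumptions.

```python
import numpy as np

def induced_bias(A_hat, B_hat, C_hat, D_hat, attacks, theta0=None):
    """Run system (9) over an attack sequence and return the bias at each step."""
    theta = np.zeros(A_hat.shape[0]) if theta0 is None else np.asarray(theta0, dtype=float)
    biases = []
    for e in attacks:
        biases.append(C_hat @ theta + D_hat @ e)   # epsilon_t
        theta = A_hat @ theta + B_hat @ e          # theta_{t+1}
    return biases

def satisfies_constraint(biases, Sigma_nu, delta):
    """Check that the Sigma_nu^{-1}-weighted squared norm of every bias is at most delta."""
    return all(float(eps @ np.linalg.solve(Sigma_nu, eps)) <= delta + 1e-12 for eps in biases)

# Hypothetical scalar instance of (9).
A_hat = np.array([[0.5]])
B_hat = np.array([[1.0]])
C_hat = np.array([[1.0]])
D_hat = np.array([[0.0]])
Sigma_nu = np.array([[1.0]])
biases = induced_bias(A_hat, B_hat, C_hat, D_hat, [np.array([1.0]), np.array([0.0])])
```

In this toy instance the attack at time 0 leaves the bias at time 0 untouched but shows up one step later, which is why feasibility has to be checked along the whole horizon rather than step by step.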

The $\Sigma_\nu^{-1}$-weighted $2$-norm of $\epsilon_t$ relates to the probability of the attack being detected at time $t$ [27]. Let $P_{D,t}$ be the detection probability at time $t$. If the induced bias is zero, then, for any positive detector window size, the probability of detection at time $t$ is equal to the false alarm probability of the detector, since there is no induced bias in $\nu_t$. For nonzero bias (and detector window size $1$), the following lemma relates the bound on $\|\epsilon_t\|^2_{\Sigma_\nu^{-1}}$ to the probability of being detected.

###### Lemma 1 (Detection Probability Bound [27]).

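For any bound $\delta$, if $\|\epsilon_t\|^2_{\Sigma_\nu^{-1}} \leq \delta$, then

$$P_{D,t} \leq P\left(g_t^0 > \left(\sqrt{\tau} - \sqrt{\delta}\right)^2\right),$$

where $g_t^0$ is the value of the $g_t$ statistic when there is no attack (the statistic $g_t^0$ is i.i.d. in time and $\chi^2$ distributed, with as many degrees of freedom as there are sensor measurements in $y_t$).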
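The bound in Lemma 1 is easy to evaluate when the no-attack statistic has two degrees of freedom, because the $\chi^2$ tail then has the closed form $e^{-x/2}$. The sketch below assumes exactly that special case (a system with two sensor outputs) and additionally assumes $0 \leq \delta \leq \tau$; both restrictions, and the function names, are assumptions of this example rather than statements from the paper. For other degrees of freedom, a survival function such as `scipy.stats.chi2.sf` could be substituted.

```python
import math

def false_alarm_rate_2dof(tau):
    """P(g0 > tau) when g0 is chi-squared with 2 degrees of freedom."""
    return math.exp(-tau / 2.0)

def detection_bound_2dof(tau, delta):
    """Upper bound of Lemma 1 on the detection probability, 2-degree-of-freedom case."""
    if not 0.0 <= delta <= tau:
        raise ValueError("this sketch assumes 0 <= delta <= tau")
    shifted_threshold = (math.sqrt(tau) - math.sqrt(delta)) ** 2
    return math.exp(-shifted_threshold / 2.0)

# Hypothetical threshold and bias bounds.
tau = 9.0
bound_no_bias = detection_bound_2dof(tau, 0.0)
bound_small_bias = detection_bound_2dof(tau, 1.0)
```

With zero allowed bias the bound coincides with the false alarm rate, and it grows as the allowed bias grows, which matches the trade-off the attacker faces.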

To model the attacker’s control objectives, define the cost function:

$$J = E\left[\sum_{t=0}^{N} \left\|x_t - x^*\right\|^2_{Q_t}\right], \tag{10}$$

with . The cost function penalizes deviation of the state from the target state. The attacker’s goal is to design an attack that achieves cost

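$$J^* = \min_{\gamma(0,N)} E\left[\sum_{t=0}^{N} \left\|x_t - x^*\right\|^2_{Q_t}\right] \quad \text{s.t.} \quad \left\|\epsilon_t\right\|^2_{\Sigma_\nu^{-1}} \leq \delta, \quad \forall t = 0, \dots, N, \tag{11}$$

the minimum cost of $J$ subject to constraints on the $\Sigma_\nu^{-1}$-weighted $2$-norm of $\epsilon_t$. The constraints in the optimization problem (11) model the attacker’s goals of evading detection.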

### II-D Recursive Feasibility

The attacker designs the attack in real time: at time $t$, the attacker chooses the attack $e_t$ based on his or her information $\mathcal{I}_t$. Note that the constraint in (11) is for all times $t = 0, \dots, N$. It is necessary that the attack be recursively feasible [22]: the attack $e_t$ must be chosen such that $\|\epsilon_t\|^2_{\Sigma_\nu^{-1}} \leq \delta$ and, for all future times $k = t+1, \dots, N$, there exist attacks $e_k$ such that $\|\epsilon_k\|^2_{\Sigma_\nu^{-1}} \leq \delta$. The recursive feasibility of (11) is related to the output minimization problem presented in [23]:

###### Lemma 2 ([23]).

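Consider the system in (9) with arbitrary initial state $\theta_0$. Then, for any horizon $k$,

$$\min_{e_0, \dots, e_{k-1}} \sum_{t=0}^{k-1} \left\|\epsilon_t\right\|^2_{\Sigma_\nu^{-1}} = \theta_0^T \hat{P}_k \theta_0, \tag{12}$$

where $\hat{P}_k$ follows the solution to the Riccati equation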

 (13)

with .

Furthermore, the matrix is positive semidefinite, and if and only if  [23].

### II-E Augmented State Space Notation

For the remainder of this paper, we use the augmented state space description of the cyber-physical system and attacker provided in [18]. Define the augmented state

 (14)

where denotes the system’s state estimate in the case that . The state follows the dynamics

$$\xi_{t+1} = \mathcal{A}\xi_t + \mathcal{B}e_t + \mathcal{K}\tilde{\nu}_{t+1}, \tag{15}$$

where denotes the attacker’s innovation at time , , , , and .

Further define , , and . Then, we have

$$\epsilon_t = \tilde{C}\xi_t + \tilde{D}e_t, \tag{16}$$

$$\tilde{x}_t - x^* = H\xi_t. \tag{17}$$

One important property of $\xi_t$ is that, given $\mathcal{I}_t$, the attacker can exactly determine the value of $\xi_t$ [18]. Accordingly, the attacker can use $\xi_t$ to determine his or her attack at time $t$.

Following [18] and [25], we manipulate the cost function by substituting , where is the estimation error. It is well known [25] that, given , is conditionally distributed as , where and is conditionally orthogonal to . Performing this substitution, the optimal attack design problem becomes.

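$$\min_{\gamma(0,N)} \sum_{t=0}^{N} \operatorname{trace}\left(\hat{P}Q_t\right) + E\left[\sum_{t=0}^{N} \left\|H\xi_t\right\|^2_{Q_t}\right] \quad \text{s.t.} \quad \left\|\epsilon_t\right\|^2_{\Sigma_\nu^{-1}} \leq \delta, \quad \forall t = 0, \dots, N, \tag{18}$$

where $\sum_{t=0}^{N} \operatorname{trace}(\hat{P}Q_t)$ does not depend on $\gamma(0,N)$.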
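The step from (11) to (18) can be checked in one line. As a sketch, write $n_t = x_t - \tilde{x}_t$ for the estimation error (the symbol $n_t$ is a placeholder chosen here, not necessarily the paper’s), and assume that, given $\mathcal{I}_t$, it has zero conditional mean and conditional covariance $\hat{P}$, the matrix appearing in (18). Using (17),

$$
\begin{aligned}
E\left[\left\|x_t - x^*\right\|^2_{Q_t} \,\middle|\, \mathcal{I}_t\right]
&= \left\|\tilde{x}_t - x^*\right\|^2_{Q_t} + 2\left(\tilde{x}_t - x^*\right)^T Q_t\, E\left[n_t \mid \mathcal{I}_t\right] + E\left[n_t^T Q_t n_t \mid \mathcal{I}_t\right] \\
&= \left\|H\xi_t\right\|^2_{Q_t} + \operatorname{trace}\left(\hat{P}Q_t\right).
\end{aligned}
$$

The cross term vanishes because the conditional mean of the error is zero, and the last term follows from $E[n_t^T Q_t n_t] = \operatorname{trace}(Q_t E[n_t n_t^T])$. Taking the expectation over $\mathcal{I}_t$ and summing over $t$ gives the objective in (18).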

### II-F Problem Statement

This paper addresses three main problems. Consider the optimal attack design problem (18). First, determine, for any $\delta \geq 0$ and any time $t$, the set of recursively feasible attacks. Second, find an optimal attack sequence when $\delta = 0$. This corresponds to finding the optimal attack under the constraint that the probability of being detected at any time is equal to the false alarm probability of the detector. Third, find an optimal attack sequence when $\delta > 0$.

## III Feasibility Sets

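In this section, we determine which attacks are recursively feasible at time $t$. Recursively feasible attacks $e_t$ are attacks such that $\|\epsilon_t\|^2_{\Sigma_\nu^{-1}} \leq \delta$ and there exist $e_{t+1}, \dots, e_N$ such that $\|\epsilon_k\|^2_{\Sigma_\nu^{-1}} \leq \delta$ for $k = t+1, \dots, N$. From equations (15) and (16), we see that the recursive feasibility of an attack $e_t$ depends on the state $\xi_t$. Define the sets $\Xi_t$, $t = 0, \dots, N$, as follows:

$$\Xi_N = \left\{\xi_N \in \mathbb{R}^{6n} \,\middle|\, \exists e_N,\ \left\|\tilde{C}\xi_N + \tilde{D}e_N\right\|^2_{\Sigma_\nu^{-1}} \leq \delta\right\},$$

$$\Xi_t = \left\{\xi_t \in \mathbb{R}^{6n} \,\middle|\, \exists e_t,\ \left\|\tilde{C}\xi_t + \tilde{D}e_t\right\|^2_{\Sigma_\nu^{-1}} \leq \delta,\ \mathcal{A}\xi_t + \mathcal{B}e_t \in \Xi_{t+1}\right\}, \quad t = 0, \dots, N-1. \tag{19}$$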

In the definition of , we have the condition , which ignores the term . From the structure of , , and , we see that membership in depends only on the component of , which is unaffected by . That is, we have if and only if for any .

We use the sets $\Xi_t$ to determine the existence of recursively feasible attacks at time $t$.

###### Lemma 3.

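There exists a recursively feasible attack $e_t$ if and only if $\xi_t \in \Xi_t$. That is, there exists a sequence of attacks $e_t, \dots, e_N$ such that $\|\epsilon_k\|^2_{\Sigma_\nu^{-1}} \leq \delta$ for $k = t, \dots, N$ if and only if $\xi_t \in \Xi_t$.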

The proof of Lemma 3 is found in the appendix. The set $\Xi_t$ is nonempty for all $t$: if the $\theta_t$ component of $\xi_t$ is equal to $0$, then $\xi_t \in \Xi_t$. This is because, if $\theta_t = 0$, then, following system (9), the attack sequence $e_t = \dots = e_N = 0$ is one such that $\epsilon_t = \dots = \epsilon_N = 0$. Recall that system (9) has initial state $\theta_0 = 0$, so we have $\xi_0 \in \Xi_0$. This means that the optimization problem (18) is feasible for any nonnegative value of $\delta$, i.e., the attacker can always satisfy the detection constraint by choosing not to attack the system.

## IV Attacks Under False Alarm Constraints

In this section, we find an attack sequence that minimizes the cost function $J$ under the constraint that $\delta = 0$, corresponding to finding the optimal attack under the restriction that the probability of being detected is equal to the false alarm probability of the detector. For the case of $\delta = 0$, we can relate the sets $\Xi_t$ to the output minimization problem presented in Lemma 2 and [23]. Define

$$G = \begin{bmatrix} 0 & I_{3n} & 0 & 0 \end{bmatrix}. \tag{20}$$

The matrix $G$ selects the variable $\theta_t$ from $\xi_t$ (i.e., $\theta_t = G\xi_t$).

###### Lemma 4.

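For $\delta = 0$ and for $t = 0, \dots, N$, the set $\Xi_t$ is the null space of $\hat{P}_{N+1-t}G$. That is, $\Xi_t = \left\{\xi_t \in \mathbb{R}^{6n} \mid \hat{P}_{N+1-t}G\xi_t = 0\right\}$.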

The proof of Lemma 4 is found in the appendix.

The following theorem gives the optimal sequence of attacks when $\delta = 0$.

###### Theorem 1 (Optimal Attack Strategy with δ=0 Detection Constraint).

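An attack sequence that solves (18) with $\delta = 0$ is

$$e_t = -F_t\left(F_t^T\mathcal{B}^T\mathcal{Q}_{t+1}\mathcal{B}F_t\right)^\dagger F_t^T\mathcal{B}^T\mathcal{Q}_{t+1}\left(\mathcal{A} - \mathcal{B}D_t^\dagger C_t\right)\xi_t - D_t^\dagger C_t\xi_t, \tag{21}$$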

where

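$$C_N = \tilde{C}, \qquad D_N = \tilde{D}, \tag{22}$$

and, for $t = 0, \dots, N-1$,

$$C_t = \begin{bmatrix} \tilde{C} \\ \hat{P}_{N-t}G\mathcal{A} \end{bmatrix}, \qquad D_t = \begin{bmatrix} \tilde{D} \\ \hat{P}_{N-t}G\mathcal{B} \end{bmatrix}, \tag{23}$$

$$F_t = I_s - D_t^\dagger D_t. \tag{24}$$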

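The matrix $\mathcal{Q}_t$ is given recursively backward in time by

$$\mathcal{Q}_t = H^TQ_tH + \left(\mathcal{A} - \mathcal{B}D_t^\dagger C_t\right)^T\mathcal{Q}_{t+1}\left(\mathcal{A} - \mathcal{B}D_t^\dagger C_t\right) - \left(\mathcal{A} - \mathcal{B}D_t^\dagger C_t\right)^T\mathcal{Q}_{t+1}\mathcal{B}F_t\left(F_t^T\mathcal{B}^T\mathcal{Q}_{t+1}\mathcal{B}F_t\right)^\dagger F_t^T\mathcal{B}^T\mathcal{Q}_{t+1}\left(\mathcal{A} - \mathcal{B}D_t^\dagger C_t\right), \tag{25}$$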

with terminal condition .

Theorem 1 states that the optimal attack under the detection constraint $\delta = 0$ is a linear feedback of the state $\xi_t$, which is exactly determined by the attacker information $\mathcal{I}_t$. Equation (21) shows that the optimal attack depends on the matrix $F_t$, which in turn depends on the matrix $D_t$. If the matrix $D_t$ has full column rank, then $F_t = 0$, since, by definition, $F_t$ is the orthogonal projector onto the null space of $D_t$. If the matrix $D_t$ has full column rank for all $t$, then the optimal attack becomes $e_t = 0$ for all $t$. This corresponds to the case in which the attacker is not powerful enough, and his or her only option to satisfy the detection constraint is to not attack the system.
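The backward recursion (25) and the feedback law (21) can be sketched numerically as follows. This is an illustrative sketch only: the function name, the random test matrices, and in particular the terminal condition (the cost-to-go matrix is taken to be zero after time $N$) are assumptions of this example, and the stacked matrices $C_t$ and $D_t$ are supplied by the caller rather than built from the paper’s definitions.

```python
import numpy as np

def theorem1_gains(A_aug, B_aug, H, Q_weights, C_list, D_list):
    """Return gains M_t such that e_t = M_t xi_t, following the form of (21) and (25)."""
    N = len(Q_weights) - 1
    dim, s = B_aug.shape
    Q_next = np.zeros((dim, dim))  # ASSUMED terminal condition: zero cost-to-go after time N
    gains = [None] * (N + 1)
    for t in range(N, -1, -1):
        D_pinv = np.linalg.pinv(D_list[t])
        F = np.eye(s) - D_pinv @ D_list[t]            # projector onto null(D_t), as in (24)
        A_cl = A_aug - B_aug @ D_pinv @ C_list[t]
        W = np.linalg.pinv(F.T @ B_aug.T @ Q_next @ B_aug @ F)
        free_part = F @ W @ F.T @ B_aug.T @ Q_next
        gains[t] = -free_part @ A_cl - D_pinv @ C_list[t]            # feedback gain of (21)
        Q_next = (H.T @ Q_weights[t] @ H + A_cl.T @ Q_next @ A_cl
                  - A_cl.T @ Q_next @ B_aug @ free_part @ A_cl)      # recursion (25)
    return gains

# Hypothetical small instance: 3 augmented states, 2 attack inputs, horizon N = 2.
rng = np.random.default_rng(1)
A_aug = 0.5 * rng.standard_normal((3, 3))
B_aug = rng.standard_normal((3, 2))
H = np.array([[1.0, 0.0, 0.0]])
N = 2
Q_weights = [np.eye(1)] * (N + 1)
C_list = [rng.standard_normal((1, 3)) for _ in range(N + 1)]
D_list = [np.array([[1.0, 0.0]])] * (N + 1)   # one constraint row, one free attack direction
gains = theorem1_gains(A_aug, B_aug, H, Q_weights, C_list, D_list)
```

When every `D_list[t]` has full column rank, the projector `F` is zero and each gain collapses to `-pinv(D_t) @ C_t`, which mirrors the remark above that the attacker then has no freedom left.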

Before we prove Theorem 1, we provide intermediate results that show that the optimal attack exists and that the optimal attack sequence is unique (the attack may not be unique). The proofs are found in the appendix.

###### Lemma 5.

For all , there exists such that

###### Lemma 6.

For all , .

Define the set

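$$Z_t(\psi) = \left\{z \in \mathbb{R}^s \,\middle|\, -F_t^T\mathcal{B}^T\mathcal{Q}_{t+1}\mathcal{B}F_t z = F_t^T\mathcal{B}^T\psi\right\}. \tag{26}$$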

One consequence of Lemma 6 is that is nonempty for all and for all .

###### Lemma 7.

For any and for any , if , then .

###### Proof:

We resort to dynamic programming to solve (18) with $\delta = 0$. The term $\sum_{t=0}^{N} \operatorname{trace}(\hat{P}Q_t)$ in (18) does not depend on $\gamma(0,N)$. Define the optimal cost-to-go function for information $\mathcal{I}_t$ as follows:

 (27) (28)

Equations (27) and (28) restrict the attack at each time to be recursively feasible.

We begin with . At time , the attack does not affect the value of , so we choose only to satisfy the constraint . Thus, we have and where Proceeding to , we first reformulate the constraints. Applying Lemma 4, the constraint becomes

$$\hat{P}_1 G\left(\mathcal{A}\xi_{N-1} + \mathcal{B}e_{N-1}\right) = 0. \tag{29}$$

Combining (29) with the constraint and using the fact that , we have

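$$C_{N-1}\xi_{N-1} + D_{N-1}e_{N-1} = 0, \tag{30}$$

where $C_{N-1}$ and $D_{N-1}$ are given by (23). To solve (28), we eliminate the constraint in (30) (following [28]) and consider attacks of the form

$$e_{N-1} = F_{N-1}z_{N-1} - D_{N-1}^\dagger C_{N-1}\xi_{N-1}, \tag{31}$$

where $z_{N-1} \in \mathbb{R}^s$ is a free variable. Equation (31) describes all recursively feasible $e_{N-1}$: because a recursively feasible attack exists, $C_{N-1}\xi_{N-1}$ lies in the range of $D_{N-1}$, so $-D_{N-1}^\dagger C_{N-1}\xi_{N-1}$ is a particular solution of (30), and $F_{N-1}z_{N-1}$ ranges over the null space of $D_{N-1}$.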
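The constraint-elimination step can be checked numerically: any attack of the form (31) satisfies the linear constraint (30) as long as the constraint is solvable. The sketch below uses small made-up matrices (all values and dimensions are assumptions for illustration) in which the stacked matrix has full row rank, so the constraint is solvable for every state.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical constraint C xi + D e = 0 with 2 equations and 3 attack inputs.
D = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
C = rng.standard_normal((2, 4))
xi = rng.standard_normal(4)

D_pinv = np.linalg.pinv(D)
F = np.eye(3) - D_pinv @ D          # orthogonal projector onto the null space of D

residuals = []
for _ in range(5):
    z = rng.standard_normal(3)      # free variable: any z gives a feasible attack
    e = F @ z - D_pinv @ C @ xi     # attack of the form (31)
    residuals.append(np.max(np.abs(C @ xi + D @ e)))
```

Every entry of `residuals` is zero up to rounding, so the remaining optimization can be carried out over the unconstrained variable `z` instead of the constrained attack.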

After eliminating constraints and performing algebraic manipulations, (28) becomes

 (32)

where The optimal satisfies

$$0 = F_{N-1}^T\mathcal{B}^T\mathcal{Q}_N\bar{\xi}_{N-1}. \tag{33}$$

As a consequence of Lemma 6, such a exists. One particular that satisfies (33) is

 (34)

There may be more than one that satisfies (34). Manipulating (34), we have that satisfies

$$-\left(F_{N-1}^T\mathcal{B}^T\mathcal{Q}_N\mathcal{B}F_{N-1}\right)z_{N-1} = F_{N-1}^T\mathcal{B}^T\psi, \tag{35}$$

with By definition, all that satisfy (35) belong to . Then, since we have, from Lemma 7, that the optimal attack is unique.

Substituting (34) into (32) and performing algebraic manipulations, we have

$$J_{N-1}^*\left(\mathcal{I}_{N-1}\right) = \xi_{N-1}^T H^T Q_{N-1} H \xi_{N-1} + \Pi_{N-1}, \tag{36}$$

where

 (37)

and Repeating the dynamic programming procedure for , we find that the optimal attack has the same form as (34), where we replace with . ∎

## V Attacks Under General Detection Constraints

In this section, we solve (18) with positive $\delta$. We design a procedure to find the sequence $\gamma(0,N)$ that minimizes $J$ under the constraint $\|\epsilon_t\|^2_{\Sigma_\nu^{-1}} \leq \delta$ for $t = 0, \dots, N$ (the optimal attack does not have a closed form). This procedure becomes computationally intensive for large $N$. Thus, we also design a less computationally intensive procedure that finds a sub-optimal and feasible attack sequence.

### V-A Optimal Attack with δ>0

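For this section only, we introduce the following notation: let $E_{\{\xi_k\}_{t+1}^{N}}$ denote the expectation taken over $\xi_{t+1}, \dots, \xi_N$, and let $E_{\xi_{t+1}}$ denote the expectation taken over $\xi_{t+1}$. Further, define the operator $\pi_t$ as the operator that takes an attack sequence starting at time $t$ and returns its first attack, $e_t$. To solve (18) with $\delta > 0$, we consider, for $t = 0, \dots, N$, the problem

$$\gamma_t^*(t,N) = \arg\min_{\gamma_t(t,N)} E_{\{\xi_k\}_{t+1}^{N}}\left[\sum_{k=t}^{N} \left\|H\xi_k\right\|^2_{Q_k}\right] \quad \text{s.t.} \quad \left\|\epsilon_k\right\|^2_{\Sigma_\nu^{-1}} \leq \delta, \quad k = t, \dots, N, \tag{38}$$

where $\gamma_t(t,N)$ is an attack sequence in which each attack only depends on $\xi_t$ (the state $\xi_t$ is a sufficient statistic for the information set $\mathcal{I}_t$). This differs from the definition of $\gamma(t,N)$, in which each attack $e_k$ depends on $\mathcal{I}_k$, respectively. Problem (38) has a convex objective and convex constraints, so it can be efficiently solved.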

###### Theorem 2 (Optimal Attack Strategy with δ>0 Detection Constraint).

Algorithm 1 gives an attack sequence that solves (18) with $\delta > 0$.

Algorithm 1 works as follows. At time step $t$, we find $\gamma_t^*(t,N)$, the sequence of attacks depending only on $\xi_t$ that solves problem (38). The attack $e_t$ is then set as the first attack in the sequence $\gamma_t^*(t,N)$. In the last ($t = N$) time step, the attack $e_N$ is set as the last attack component of the planned sequence. By construction, every attack produced by Algorithm 1 is recursively feasible: after attacking the system with $e_t$, the remaining subsequence of $\gamma_t^*(t,N)$ is a feasible attack sequence at time $t+1$. In order to prove Theorem 2, we require the following lemma from [25]:
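The re-planning structure described above can be sketched as a short loop. This is not the paper’s Algorithm 1 itself: the solver for problem (38) is left as a caller-supplied function (`solve_problem_38` is a hypothetical name), the loop applies the first-attack operator at every step as in the sequence form (41), and the scalar stub used in the example is made up purely to exercise the loop.

```python
from typing import Callable, List

def first_attack(plan: List[float]) -> float:
    """Operator pi_t: return the first attack of a planned sequence."""
    return plan[0]

def run_replanning(xi0: float, N: int,
                   solve_problem_38: Callable[[int, float], List[float]],
                   step: Callable[[float, float], float]) -> List[float]:
    """At each time t, re-solve the planning problem from the current state and
    apply only the first planned attack."""
    xi = xi0
    attacks = []
    for t in range(N + 1):
        plan = solve_problem_38(t, xi)   # hypothetical solver: plan for times t..N
        e = first_attack(plan)
        attacks.append(e)
        xi = step(xi, e)                 # state after the attack (noise omitted in this stub)
    return attacks

# Stub planner and dynamics for a scalar toy state (assumptions for illustration only).
def stub_solver(t: int, xi: float) -> List[float]:
    return [-0.5 * xi] * (3 - t)         # plan of length N - t + 1 with N = 2

def stub_step(xi: float, e: float) -> float:
    return xi + e

attacks = run_replanning(8.0, 2, stub_solver, stub_step)
```

In a real implementation the stub would be replaced by a convex solver for (38), since the text notes that the problem has a convex objective and convex constraints.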

###### Lemma 8 ([25]).

Let be a function such that, for any , exists and is a class of functions for which exists. Then,

###### Proof:

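From problem (18), we have that the optimal cost-to-go function at time $t$, $J_t^*(\xi_t)$, is defined as

$$J_N^*(\xi_N) = \left\|H\xi_N\right\|^2_{Q_N}, \tag{39}$$

$$J_t^*(\xi_t) = \left\|H\xi_t\right\|^2_{Q_t} + \min_{e_t} E_{\xi_{t+1}}\left[J_{t+1}^*(\xi_{t+1}) \mid \xi_t\right] \quad \text{s.t.} \quad \left\|\epsilon_t\right\|^2_{\Sigma_\nu^{-1}} \leq \delta, \quad \mathcal{A}\xi_t + \mathcal{B}e_t \in \Xi_{t+1}. \tag{40}$$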

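Let $\tilde{\gamma}(0,N)$ denote the attack sequence produced by Algorithm 1. The attack sequence has the form

$$\tilde{\gamma}(0,N) = \left\{\pi_0\left(\gamma_0^*(0,N)\right), \dots, \pi_N\left(\gamma_N^*(N,N)\right)\right\}. \tag{41}$$

To show that $\tilde{\gamma}(0,N)$ is an optimal attack sequence, we show that each attack $\pi_t(\gamma_t^*(t,N))$ is the optimal attack at time $t$, for $t = 0, \dots, N-1$ (we ignore the attack $e_N$ because it does not affect the cost).

In order to show that $\pi_t(\gamma_t^*(t,N))$ is the optimal attack, we prove the intermediate result that

$$J_t^*(\xi_t) = \min_{\gamma_t(t,N)} E_{\{\xi_k\}_{t+1}^{N}}\left[\sum_{k=t}^{N} \left\|H\xi_k\right\|^2_{Q_k}\right] \quad \text{s.t.} \quad \left\|\epsilon_k\right\|^2_{\Sigma_\nu^{-1}} \leq \delta, \quad k = t, \dots, N. \tag{42}$$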

We resort to induction. In the base case, we show that (42) is true for . () Consider the right hand side of (42) for . Expressed in terms of , (42) becomes