# Real-time Evasion Attacks with Physical Constraints on Deep Learning-based Anomaly Detectors in Industrial Control Systems

Alessandro Erba¹, Riccardo Taormina², Stefano Galelli², Marcello Pogliani³, Michele Carminati³, Stefano Zanero³, Nils Ole Tippenhauer¹

¹ CISPA Helmholtz Center for Information Security
{alessandro.erba, tippenhauer}@cispa.saarland
² Singapore University of Technology and Design
{riccardo_taormina, stefano_galelli}@sutd.edu.sg
³ Politecnico di Milano
{marcello.pogliani, michele.carminati, stefano.zanero}@polimi.it

A major part of this work was done while Alessandro Erba was a student at Politecnico di Milano, visiting SUTD.

###### Abstract

Recently, a number of deep learning-based anomaly detection algorithms have been proposed to detect attacks in dynamic industrial control systems. The detectors operate on measured sensor data, leveraging physical process models learned a priori. Evading detection by such systems is challenging, as an attacker needs to manipulate a constrained number of sensor readings in real-time with realistic perturbations according to the current state of the system. In this work, we propose a number of evasion attacks (with different assumptions on the attacker’s knowledge), and compare the attacks’ cost and efficiency against replay attacks. In particular, we show that a replay attack on a subset of sensor values can be detected easily as it violates physical constraints. In contrast, our proposed attacks leverage manipulated sensor readings that observe learned physical constraints of the system. Our proposed white box attacker uses an optimization approach with a detection oracle, while our black box attacker uses an autoencoder (or a convolutional neural network) to translate anomalous data into normal data. Our proposed approaches are implemented and evaluated on two different datasets pertaining to the domain of water distribution networks. We then demonstrate the efficacy of the real-time attack on a realistic testbed. Results show that the accuracy of the detection algorithms can be significantly reduced through real-time adversarial actions: for the BATADAL dataset, the attacker can reduce the detection accuracy from 0.6 to 0.14. In addition, we discuss and implement an Availability attack, in which the attacker introduces detection events with minimal changes of the reported data, in order to reduce confidence in the detector.

## I Introduction

Computational and physical infrastructures are nowadays interconnected. Computers, communication networks, sensors and actuators make it possible to control physical processes. Data are retrieved from the sensors and communicated to computers, where they are analyzed, and decisions are made. Finally, these decisions are sent from computers back to the physical infrastructures as commands to actuators. Such systems are commonly referred to as cyber-physical systems (CPS). Examples of such systems are industrial control systems (ICS), autonomous vehicles, smart grids and, more generally, all systems falling under the umbrella definition of “Internet of Things” [cardenas11attacks]. Since these systems operate in the physical world, they should guarantee security, safety and reliability in order to succeed in their tasks without harming the environment in which they operate. Moreover, CPS can have a strategic role, such as controlling interconnected critical infrastructures like power grids [liu2011false] and water supply systems [AminLitricoSastryBayen2013PartI].

The integration of modern security features into existing ICS is challenging, as industrial devices are resource-constrained, and protocols need to be legacy compliant (i.e., they have to be backward compatible with decades-old devices in the field, which do not support authentication or encryption). For that reason, complementary security solutions such as passive process data monitoring are promising. In recent years, a number of authors have proposed such solutions, and implemented anomaly detection approaches based on a broad range of techniques, including control theory [giraldo18survey] and Machine Learning (ML) [zhu2010scada, ghaeini16hamids, goh2017anomaly, aoudi18truth, ahmed18noiseprint, kravchik2018detecting, taormina2018deep]. In general, the goal of such systems is to leverage reported sensor data in order to detect attacks and anomalies that affect actuators.

Adversarial Machine Learning (AML) plays an important role in exploring the robustness of machine learning-based anomaly detectors against manipulations. So far, the potential of AML has been explored in a few areas of computer science (e.g., image or speech recognition), but little is known about the potential of AML approaches to evade attack detection in ICS. Evasion attacks in our context are challenging as they need to manipulate (in real-time) reported data from one or multiple sensors to induce a wrong classification of the system’s state, while matching physical laws imposed by the system. While in other contexts, universal adversarial perturbations [moosavi2017universal, li2019adversarial] are used to perform real-time manipulations (using precomputed patterns), manipulations in ICS cannot be precomputed as they need to be consistent with the current dynamic conditions of the system (with a large potential state space). In particular, successful application of AML algorithms in the ICS domain must account for two key features characterizing process-based anomaly detectors. First, process-based anomaly detectors typically account for the spatial and temporal correlation characterizing the underlying physical processes [taormina18battle]. Second, detectors in the ICS domain are trained to detect not only outliers, but also contextual anomalies (i.e., observations classified as abnormal only when viewed against other variables that characterize the behavior of the physical process [hayes2015contextual]). In contrast to related work that assumes unlimited computational power to compute perturbations [carlini2019evaluating], AML algorithms for ICS will also need to produce adversarial examples in real-time, to react to the dynamic system. (By real-time, we mean that examples are crafted with respect to the current dynamic state of the system, in less time than the sampling interval.)

In this work, we propose and evaluate attacks on process-based anomaly detectors for simulated and real-world ICS, and propose two techniques to craft adversarial examples in real-time. (We differentiate between a sample, i.e., the original set of sensor readings, and an adversarial example, i.e., the manipulated set of sensor readings.) In particular, the classifier under attack is the anomaly detection system, while the samples are the sensor readings that the classifier uses to decide if the system is ‘safe’ or ‘under attack’. The attacker’s goal is to change the classification outcome by manipulating a subset of sensor readings, in order to hide an ongoing manipulation over the physical process (called Integrity attack in [huang2011adversarial], described as ‘Integrity attacks result in intrusion points being classified as normal’). We explore attacks on such detectors in two settings with different information available to the attacker, and compare them against replay attacks on a subset of sensors. Our results show that a) constrained replay attacks are easily detected as they violate physical correlations, b) using our white box model a powerful attacker can leverage knowledge on the system to perform efficient (but computationally expensive) attacks, and c) using our proposed black box attacks it is possible to craft effective adversarial examples in real-time.

In addition, we explore Availability attacks [huang2011adversarial] (‘availability attacks cause so many classification errors, both false negatives and false positives, that the system becomes effectively unusable’), in which the attacker looks for small perturbations to legitimate features that will—seemingly incorrectly—trigger ML-based attack detection schemes. This is useful to force the defender to increase detection thresholds (reducing its detection rate), or to eventually ignore alarms.

We summarize our main contributions as follows:

• We propose and design evasion attacks on ICS process-based anomaly detectors that produce examples which do not violate physical constraints, leveraging the knowledge of the Anomaly Detection System (white box).

• We propose and design a system to hide attacks from an unknown Deep Learning Anomaly Detection System (black box) using adversarially trained autoencoders, enabling dynamic attacks in real-time.

• We evaluate and discuss the proposed attacks, and compare their performance against replay attacks. The evaluation is conducted over a simulated ICS process dataset and a real ICS process dataset, both containing data of water distribution systems.

• We practically implement and demonstrate the attacks in a real-world Industrial Control System testbed, and show that they are possible in real-time.

• We also show that it is possible to use our framework for Availability attacks, i.e., to produce false positives, causing the detector to raise alarms without any actual physical process manipulation.

The remainder of this work is structured as follows. Background concepts are introduced in Section II. We present the problem of adversarial learning attacks on ML-based detectors in Section III. Our design of attacks is proposed in Section IV, followed by their implementation and evaluation. We then discuss our work and next steps, summarize related work, and conclude the paper.

## II Background on Evasion Attacks

In this section, we provide a brief overview of evasion attacks. A more complete review of related work is presented in the related work section. In adversarial learning, an evasion attack is launched by an adversary to control the output behavior of a machine learning model through crafted inputs, called adversarial examples. Several evasion attacks and defense mechanisms have been proposed in the context of image recognition, speech recognition and malware detection. The attacker’s scope and constraints vary from context to context [biggio18wild].

In the case of image recognition, the attacker’s goal could be the misclassification of the sample, either into a random target class or into a desired target class. In both cases, a constraint is that the adversarial example must be indistinguishable from the original sample to a human, e.g., an attacker who perturbs an image of a dog so that it is classified as a cat should not change how a human perceives the image. This is achieved by solving an optimization problem that minimizes the distance between the sample and the adversarial example, e.g., by minimizing one of the norms $L_0$, $L_2$, $L_\infty$. The work by [szegedy13intriguing] is the first that specifically studies adversarial manipulation of image classification using neural networks. The authors found that only a small portion of the image needs to be modified to achieve the attacker’s goal.
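As a rough sketch of the optimization problem mentioned above, written in generic notation that is assumed here rather than taken from the cited works ($x$ is the original sample, $\delta$ the perturbation, $g$ the classifier, $t$ the class sought by the attacker, and $p \in \{0, 2, \infty\}$ the chosen norm), a targeted attack could be stated as:

```latex
\min_{\delta} \; \lVert \delta \rVert_{p}
\quad \text{subject to} \quad g(x + \delta) = t
```

For an untargeted attack, the constraint would instead read $g(x + \delta) \neq g(x)$. Practical attacks typically relax this constraint into a loss term, so the statement above is only meant to fix ideas.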

In the case of malware detection, the task is binary (malware vs. benign software), so the attacker’s goal is the misclassification of a malware sample. The constraint over the adversarial example is to leave malware behavior unchanged, meaning that the distortion introduced to the malware should not eliminate its malicious properties. Works such as [grosse17adversarial] craft highly effective adversarial examples for neural networks used for malware classification.

The authors of [biggio18wild] characterize attacks on machine learning models using a 4-tuple representation of the system under attack. The tuple is characterized by the training dataset $\mathcal{D}$, the feature set $\mathcal{X}$ (i.e., the set of features used to train the model), the learning algorithm $f$, and the trained parameters $w$. In an adversarial setting, an attacker can have complete or partial knowledge of each component of the system; limited knowledge of a component is denoted with the symbols $\hat{\mathcal{D}}$, $\hat{\mathcal{X}}$, $\hat{f}$, and $\hat{w}$ respectively. In particular, the authors characterize three types of attack scenario: Perfect-knowledge white box attacks, characterized by the tuple $(\mathcal{D}, \mathcal{X}, f, w)$; Limited-knowledge gray box attacks, in which only some components are known exactly; and Zero-knowledge black box attacks, characterized by the tuple $(\hat{\mathcal{D}}, \hat{\mathcal{X}}, \hat{f}, \hat{w})$. In Section III, we use that notation to introduce our proposed solution and position it within the related literature.

## III Evasion Attacks on Process-based Anomaly Detection

In this section, we introduce our system and attacker model, and our general problem statement for concealment and Availability Attacks. Then, we present our abstract approach for the white and black box attacker.

### III-A System Model

We consider a system under attack (Figure 1) consisting of a number of sensors and actuators, connected to one or more PLCs, which are in turn connected to a SCADA system that gathers data from the PLCs. In our work, we assume that the SCADA is passive, so it does not send control commands to the PLCs (e.g., to actively probe for manipulations). The SCADA feeds an attack detection system, whose goal is to accurately identify the instances in which the attacker manipulates the physical process, while minimizing the number of false detections. The attack detection system generally consists of two main components: a system model, which is used to generate additional features, and a classifier, which, for each time step, classifies the system as either under attack or under normal operating conditions (see the related work section for more details on prior work on classifiers in this context).

### III-B Attacker Model

Attacker Goal and Capabilities. In an ICS environment, an attacker can perform an evasion attack to achieve one of the two following goals.

A first goal (Integrity Attack) is to conceal ongoing manipulations of the physical process, which requires changing the commands sent to the actuators. We assume that the attacker is already able to precisely control a subset of the actuators, and that the attacker manipulates a subset of traffic signals from the PLCs to SCADA (i.e., the sensor data) to conceal this attack from the detector.

An alternative second goal is an Availability attack: the attacker aims to introduce alarms into the detection system with minimal changes in the reported sensor data (and no change in the underlying process). When such alarms are investigated, the reported sensor data will be sufficiently close to the state of the process, and thus the efficacy of the detection system will be questioned, potentially leading to future alarms being taken less seriously.

Attacker Knowledge. Using the notation introduced in Section II, an evasion attack is characterized by the knowledge of the attacker about the training dataset $\mathcal{D}$, feature set $\mathcal{X}$, learning algorithm $f$, and trained parameters $w$. In particular, we classify attacks as white box, black box, and replay. For all attacks, we assume that the attacker aims to manipulate the subset of sensor readings that will change the detector’s classification label, knowing them explicitly (white box) or not (replay and black box).

The attacks are conducted in real time (i.e., per time step), not a posteriori (i.e., applied retrospectively to a longer sequence of sensor readings after they are fully received by the attacker).

White Box attack. In a white box attack, the attacker knows the exact system model and its variables (such as the currently estimated system state), and the exact thresholds of the classification system. Thus, the white box attacker is characterized by the tuple $(\mathcal{D}, \mathcal{X}, f, w)$. With that information, the attacker could run a basic exhaustive search, basic optimization strategies, or more complex approaches (especially solutions that use the gradient signal from the attacked model).

Black Box attack. In a black box attack, the attacker is aware of the general detection scheme (e.g., type of system model), but unaware of internal variables of the system model and exact thresholds used in the classification. We note that our black box attack differs from the one defined in [biggio18wild], $(\hat{\mathcal{D}}, \hat{\mathcal{X}}, \hat{f}, \hat{w})$, in three respects.

First, our attack does not require the knowledge of $f$ or its approximation $\hat{f}$. In the usual setting, even if the attack does not require building a surrogate model $\hat{f}$, the attacker is assumed to be able to query the classifier under attack in a black-box fashion. This allows him to get feedback on the provided labels or confidence scores (this is done for example in [tramer2016stealing, xu2016automatically, chen2017zoo, dang2017evading]).

However, in our case, the nature of the environment imposes that the attacker cannot query the system even in a black-box manner, as this could raise an alarm. Thus, the only assumption of the attacker with respect to $f$ is that Deep Learning techniques are used for anomaly detection.

The second difference imposed by the ICS environment is the knowledge of the feature set (sensor readings). In order to detect anomalies using information coming from sensors, the defender is likely to use all the information he has. Under this assumption, the attacker crafts adversarial examples leveraging the complete set of features that he intercepts between PLC and SCADA. Referring to $\mathcal{X}$ or $\hat{\mathcal{X}}$ is the same, since the attacker assumes that the best case for the defender is to use all available features.

Finally, we assume that the attacker can collect an approximation of the training dataset (i.e., network traffic captured and decoded during the normal operation of the system). In the ICS case, recording normal operations at different time steps gives samples from the same dynamical physical process (assuming overall periodic operations with multiple stages). The more data the attacker collects, the better the training dataset is approximated. In fact, collecting more data lets the attacker observe the realization of different stages of the ICS (potentially all stages involved in normal ICS operations). In general, we can say that the attacker is able to collect $\hat{\mathcal{D}}$, but, depending on the time spent collecting data, the attacker can reach the complete knowledge of $\mathcal{D}$. Thus, our black box attacker is characterized only by the approximated dataset $\hat{\mathcal{D}}$ and the feature set $\mathcal{X}$, since the attacker does not need the remaining elements (the learning algorithm and the trained parameters).

Replay Attacks. In this work, we use replay attacks (proposed in related work [mo2009secure]) as a baseline for comparison. In a replay attack, the attacker records sensor readings for a certain amount of time and repeats them afterwards, e.g., while manipulating a physical process by sending an exogenous control input [mo2009secure]. By doing so, the attacker aims to avoid detection by a monitoring system based on reported sensor data. In this work, we assume that the attacker was able to record selected data in the system over a certain length of time (e.g., one day), and will then replay that data at the start of the attack. In this kind of attack there is no adversarial learning involved. The resulting tuple of a replay attack corresponds to that of the black box attack.

### III-C Problem Statement

The goal of the attacker is to launch an evasion attack on an ICS to hide the true state of the process from an anomaly detector. In particular, we assume that the anomalous physical process results in a feature vector $\vec{x}$, which triggers the detection system. The attacker thus needs to find an alternative vector that prevents detection of the attack.

Integrity Attack. We formalize the integrity attack as follows: given a feature vector $\vec{x}$ and a classification function $y$ such that the detector correctly classifies $\vec{x}$ as ‘under attack’, the attacker is looking for a perturbation such that the perturbed vector is classified as ‘safe’. We assume two different settings for the attacker. In the unconstrained attack, the attacker can manipulate all the features in $\vec{x}$, and her perturbations are bounded in terms of $L_0$ distance. In the constrained attack, the attacker can perturb only a subset of the variables in $\vec{x}$, and her perturbations are again bounded in terms of $L_0$ distance, so that they do not exceed the size of that subset.

Availability Attack. We formalize the availability attack problem as follows: given normal operations sensor readings correctly classified as ‘safe’, the attacker aims to distort them in order to cause false alarms by the detector. More formally, given a feature vector $\vec{x}$ and a classification function $y$ such that the detector correctly classifies $\vec{x}$ as ‘safe’, the attacker is looking for a modification such that the modified vector is classified as ‘under attack’. As in the Integrity Attack, we consider $L_0$ attacks.
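The two problem statements above can be written compactly as follows. This is only a sketch: $\vec{x}$ and $y$ are the feature vector and classification function used throughout this section, while the perturbation $\vec{\delta}$, the bound $k$, and the index set $S$ of sensors the attacker can modify are notation assumed here for illustration and may differ from the symbols used in the original formulation.

```latex
% Integrity attack (S = \{1,\dots,n\} gives the unconstrained case)
\text{given } y(\vec{x}) = \text{`under attack'}, \quad
\text{find } \vec{\delta} \text{ s.t. } y(\vec{x} + \vec{\delta}) = \text{`safe'}, \quad
\lVert \vec{\delta} \rVert_{0} \le k, \quad
\delta_i = 0 \;\; \forall i \notin S

% Availability attack
\text{given } y(\vec{x}) = \text{`safe'}, \quad
\text{find } \vec{\delta} \text{ s.t. } y(\vec{x} + \vec{\delta}) = \text{`under attack'}, \quad
\lVert \vec{\delta} \rVert_{0} \le k
```

Here $\lVert \vec{\delta} \rVert_{0}$ counts the number of sensor readings that are changed, which is why it is a natural cost measure when the attacker controls only some of the communication links.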

### III-D Example of an Integrity Attack

We now illustrate an example of concealment over one time step of a water distribution system. Consider an attacker that aims to empty a water tank by changing the control signal to the pumps, i.e., by forcing them to be OFF even after the water in the tank falls below the level triggering their activation. An anomaly detection system could detect this anomalous condition by comparing the resulting sensor data with the readings realized during normal operations. In order to hide the attack, the adversary has to modify some sensor readings so that the system state is classified as ‘safe’.

Since the data reflect a physical process, a control command sent to an actuator affects several system components, not only those that are the target of the attacker’s manipulation. In our case, manipulating only the sensor readings related to the target water pumps and tank does not guarantee that the attack remains stealthy. For example, as illustrated in the simplified example of Figure 2, even if the attacker’s process manipulation is only targeting Tank 2, in order to remain stealthy, the attacker needs to manipulate four sensor readings. Two of them (Tank 2, Pump 2) are explicitly related to the actuator manipulation, while the other two are consequently modified to be consistent with the learned physical model, even if the corresponding physical process is not manipulated.

### III-E Proposed Framework for Attack Computation

For both the white box and black box case, the attacker is assumed to intercept and manipulate sensor readings in real time. The white box attacker is able to interactively query a classification oracle to determine which features to manipulate, and to which values to set those features. For the black box attacker, the target features to manipulate and their manipulated value are computed without oracle’s feedback.

For the white box attack, we propose to compute the manipulations using an iterative algorithm (without using a more complex machine learning-based approach). This algorithm calculates solutions that are ‘safe’ from the detector’s perspective. The algorithm is tunable, i.e., the attacker can act on some algorithm parameters that affect the computation time and, consequently, the evasion efficacy. Moreover, the algorithm is constrainable, i.e., the attacker can decide the maximum number of features to be modified for each time step. Again, this speeds up computation but can impact the solution quality. Keeping the solution simple underlines the fact that an attacker who steals the model does not strictly need a strong theoretical background to succeed. As we shall see later, even such simple white box attacks are quite effective (although expensive).

For the black box attack, we propose the use of a deep neural network that is capable of outputting concealed sensor readings. The attacker adversarially trains the neural network to learn how the detector expects the ICS to behave. This trained neural network then receives the traffic coming from the PLC. When the attacker manipulates the commands sent to the actuators, the neural network adjusts the anomalous data to resemble ‘safe’ data. This manipulated version is sent to the SCADA. This method can also be used for an Availability Attack: first, we learn how the system behaves when targeted by an attack on the actuators; then, we use the network to transform sensor readings to resemble ‘under attack’.

## IV Replay, Black Box, and White Box Evasion

We now present a detailed design for the three attacks that we consider. We start with details on the autoencoder-based attack detector (proposed in prior work [taormina2018deep]), then introduce the replay attack (proposed in prior work [mo2009secure]). We provide details on the white box attack (which uses a classification oracle to optimize the manipulations). We then conclude with the black box approach, which leverages an online concealment method without any prior knowledge about the physical process that generates the sensor readings and the detection scheme (except that it uses Deep Learning). Given these premises, we note that, while adversarial examples found using the white box approach depend on the internal structure of the attacked anomaly detector, examples crafted through the black box approach are independent from the addressed detection scheme.

### IV-A Deep Learning-based Attack Detector

In this work, we focus on the anomaly detection systems proposed in [goh2017anomaly, kravchik2018detecting, taormina2018deep], which are based on the same underlying idea (see the related work section). The anomaly detector consists of two parts, namely a Deep Learning model (with features as input and output) trained over the normal operation sensor readings of an ICS, and a comparison analysis between the input and output of the model. The idea is that the deep model has learned to reproduce the system behaviour under normal operating conditions with a low reconstruction error, so it produces a higher reconstruction error when fed with anomalous sensor readings (sensor readings are anomalous either if sensor values are outside normal operation ranges or if there are contextual anomalies among values). The comparison between input and output of the deep model is used to decide if the system is ‘safe’ or ‘under attack’.

In particular, we use the specific autoencoder proposed in [taormina2018deep], which is available as open source [aeed18repository]. The autoencoder (AE) receives as input the $n$-dimensional vector $\vec{x}$ of sensor readings and outputs an $n$-dimensional vector in which each entry is the reconstructed value of the corresponding input reading. In order to decide if the system is under attack, the mean squared reconstruction error between observed and predicted features is computed. If the mean squared reconstruction error exceeds a threshold $\theta$, the system is classified as under attack. The authors chose $\theta$ as the 99.5th percentile (Q99.5) of the average reconstruction error over the training set.

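We formalize this as follows. Given an input $\vec{x}$, we define $\vec{e}$ as the $n$-dimensional reconstruction error vector, with components $d_i$, and $\varepsilon(\vec{e})$ as the corresponding average reconstruction error:

$$\varepsilon(\vec{e}) = \frac{1}{n} \sum_{i=1}^{n} d_i^{2}, \tag{1}$$

and $y(\vec{x})$ as the classified state of the water distribution system output by the AE Intrusion Detection System. Given an input $\vec{x}$, $y(\vec{x})$ is ‘under attack’ if $\varepsilon(\vec{e})$ is greater than $\theta$:

$$y(\vec{x}) = \begin{cases} \text{‘under attack’} & \text{if } \varepsilon(\vec{e}) > \theta \\ \text{‘safe’} & \text{otherwise} \end{cases} \tag{2}$$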

Moreover, the authors propose a window parameter that takes into consideration the mean of $\varepsilon(\vec{e})$ over the last window time steps to decide if the current tuple is ‘safe’. This helps reduce the number of false positives, since an alarm is raised only if the mean of $\varepsilon(\vec{e})$ over the last window time steps is above $\theta$.
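The decision rule of Equations (1) and (2), together with the percentile threshold and the window parameter, can be sketched in a few lines of Python. This is a minimal illustration and not the code of the referenced open-source detector: the function names are hypothetical, the reconstruction is passed in directly instead of being produced by an autoencoder, and each $d_i$ is assumed to be the difference between an input reading and its reconstruction.

```python
from statistics import mean


def avg_reconstruction_error(x, x_hat):
    """Eq. (1): mean of the squared per-feature errors.

    Assumption: d_i is the difference between reading i and its reconstruction.
    """
    return mean((xi - xh) ** 2 for xi, xh in zip(x, x_hat))


def percentile(values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def fit_threshold(training_errors, q=99.5):
    """Threshold theta chosen as a percentile of the training-set errors."""
    return percentile(training_errors, q)


def classify(error_history, theta, window=1):
    """Eq. (2), applied to the mean error over the last `window` time steps."""
    recent = error_history[-window:]
    return "under attack" if mean(recent) > theta else "safe"


# Toy usage with made-up numbers (not taken from the paper's datasets).
history = [0.0, 0.05]
label_no_window = classify(history, theta=0.03, window=1)
label_windowed = classify(history, theta=0.03, window=2)
```

In this toy run, a single high-error time step raises an alarm when `window=1`, but not when it is averaged with the preceding low-error step (`window=2`), which is the false-positive reduction effect described above.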

### IV-B Replay Attack

In the replay attack setting (prior work, used here as baseline), the attacker does not know how detection is performed. In order to avoid detection, the attacker replays sensor readings that have been recorded while no anomalies were occurring in the system. In particular, we assume that the attacker was able to record selected data occurring exactly a whole number of days before. For example, if the evasion attack starts at 10 a.m., the attacker starts replaying data from 10 a.m. one day before.
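A replay on a subset of sensors could look like the sketch below. The data layout (one dictionary per time step), the function name, and the `period` parameter (number of time steps in one day) are assumptions made for illustration only.

```python
def replay_reading(current, recorded, t, period, replayed_sensors):
    """Return the reading reported to the SCADA at time step t.

    current:          dict sensor -> value measured at time step t
    recorded:         list of dicts captured during normal operation,
                      indexed by time step
    period:           number of time steps in one day (assumed sampling)
    replayed_sensors: sensors whose values the attacker can overwrite
    """
    if t - period < 0:
        raise ValueError("no recording available one period before t")
    past = recorded[t - period]
    spoofed = dict(current)          # untouched sensors keep their real value
    for sensor in replayed_sensors:
        spoofed[sensor] = past[sensor]
    return spoofed


# Toy usage: the attacker forces the pump OFF and replays only the tank level.
recorded = [{"tank": 3.0, "pump": 1}, {"tank": 3.2, "pump": 1}]
current = {"tank": 0.5, "pump": 0}
spoofed = replay_reading(current, recorded, t=2, period=2,
                         replayed_sensors=["tank"])
```

In this toy case the reported tank level looks normal, but the pump status still reflects the manipulation. A replay restricted to a subset of sensors can produce such mixed states, which may not match any combination observed during normal operation.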

### IV-C White Box Attack

In the white box setting, the attacker knows how detection is performed, all thresholds and parameters of the detector, as well as the normal operation ranges for each of the model features. For example, the attacker knows which sensor readings are common during normal operation of the physical process. As a result, the attacker essentially has access to an oracle of the autoencoder, to which the attacker can provide arbitrary features and from which she gets the individual values of the reconstruction error vector $\vec{e}$.

The attacker then computes $\vec{e}$ and finds the sensor reading $r_i$ with the highest reconstruction error.

In order to bring the average reconstruction error $\varepsilon(\vec{e})$ below $\theta$, the attacker attempts to decrease the reconstruction error by changing $r_i$. Sensor readings are modified in the range of normal operating values; this guides the computation to a solution that is consistent with the physical process learned by the detector. For example, the attacker tries to substitute the value of $r_i$ with values from the normal operating range of sensor $i$, to see if the related reconstruction error decreases. This results in a modified input, for which the reconstruction error vector and the average reconstruction error are recomputed. Figure 3 shows the steps followed by the attacker in such context, while Algorithm 1 is the pseudo-code applied to compute sensor reading modifications.

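In order to find the value of $r_i$ that decreases $\varepsilon(\vec{e})$ the most, we can introduce $X$ as the matrix containing the $m$ mutations of the input w.r.t. $r_i$:

$$X = \begin{bmatrix} r_1 & \dots & r_i^{1} & \dots & r_n \\ r_1 & \dots & r_i^{2} & \dots & r_n \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ r_1 & \dots & r_i^{m} & \dots & r_n \end{bmatrix}$$

where $r_i^{1}, \dots, r_i^{m}$ are the candidate values for $r_i$, drawn from its normal operating range. Among all the mutations, we select the one that generates the lowest reconstruction error. After choosing the best value for the variable $r_i$, the algorithm repeats until a solution with average reconstruction error lower than $\theta$ is found.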

Two stopping criteria are put in place: patience and budget. It could happen that no lower reconstruction error is found by changing the value of a chosen reading $r_i$. In this case, we try to change the other readings in descending order of reconstruction error. The patience mechanism is put in place to avoid wasting computation: if no improved solution is found in patience iterations, the input is no longer optimized.

Depending on the communication mechanism between PLCs and SCADA, the attacker may be constrained to send the data within a certain amount of time. budget is the maximum number of times that the loop at Line 8 (Algorithm 1) can be performed. After budget attempts without finding a set of modified readings whose average reconstruction error is lower than $\theta$, the input is no longer optimized, and no solution is found.

Exiting the loop at Line 8 due to a stopping criterion does not provide a misclassified example. Even though a solution with average reconstruction error lower than $\theta$ is not found, the resulting tuple is likely to have a lower average reconstruction error than the original input.
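The iterative search with its two stopping criteria can be sketched as follows. This is an illustrative reading of the description above, not a transcription of Algorithm 1: the function and parameter names are hypothetical, `allowed` models the constrained setting, and `toy_oracle` is a made-up stand-in for the autoencoder that expects a tank level in [2, 5] and a pressure equal to ten times that level.

```python
def avg_error(error_vector):
    """Eq. (1): mean of the squared per-feature reconstruction errors."""
    return sum(d * d for d in error_vector) / len(error_vector)


def white_box_conceal(readings, oracle, candidates, theta,
                      patience=3, budget=20, allowed=None):
    """Greedy search for readings whose average error is at most theta.

    oracle(readings) returns the per-feature reconstruction error vector;
    candidates[i] lists values from the normal operating range of reading i;
    allowed restricts which readings may be modified (None means all).
    Returns (readings, average error, True if classified as 'safe').
    """
    allowed = list(range(len(readings))) if allowed is None else list(allowed)
    best = list(readings)
    best_err = avg_error(oracle(best))
    rank, stale = 0, 0   # position in the error ranking, failed attempts
    for _ in range(budget):
        if best_err <= theta or stale >= patience:
            break
        errors = oracle(best)
        order = sorted(allowed, key=lambda k: abs(errors[k]), reverse=True)
        if rank >= len(order):
            break        # every modifiable reading was tried without success
        i = order[rank]
        trial_best, trial_err = None, best_err
        for value in candidates[i]:          # one row of the mutation matrix
            trial = best[:i] + [value] + best[i + 1:]
            err = avg_error(oracle(trial))
            if err < trial_err:
                trial_best, trial_err = trial, err
        if trial_best is None:               # no improvement: next reading
            rank, stale = rank + 1, stale + 1
        else:                                # improvement: restart ranking
            best, best_err = trial_best, trial_err
            rank, stale = 0, 0
    return best, best_err, best_err <= theta


def toy_oracle(readings):
    """Hypothetical detector model: tank in [2, 5], pressure = 10 * tank."""
    tank, pressure = readings
    tank_hat = min(max(tank, 2.0), 5.0)
    return [tank - tank_hat, pressure - 10.0 * tank_hat]


candidates = [[2.0, 3.0, 4.0, 5.0], [20.0, 30.0, 40.0, 50.0]]
result = white_box_conceal([0.5, 5.0], toy_oracle, candidates, theta=0.01)
```

With this toy oracle, the search first replaces the pressure reading and then the tank level, ending in a pair that respects the learned relation between the two. Reducing `budget` to 1 stops after the first replacement, which lowers the error without reaching the threshold.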

### IV-D Black Box Attack

In the black box setting, the attacker does not know anything about the detection mechanism except the fact that it relies on a Deep Learning Model: the attacker can only intercept and manipulate the communication between the PLCs and the SCADA. However, the nature of the ICS environment allows us to assume that a detection mechanism trained over a specific CPS should represent its physical rules in order to spot anomalies.

In this case, a reasonable attack scheme could be divided into five steps (Figure LABEL:fig:BlackBox_zoom). First, the attacker intercepts traffic from PLCs to SCADA in order to collect information on how the ICS behaves under normal conditions. Second, the collected data are used to train a Deep Learning model that learns how the system behaves normally. Third, the attacker manipulates the physical process; anomalous data are generated as a consequence. Fourth, the adversarially trained model is used to conceal anomalous readings, by morphing them into concealed data that will be classified as ‘safe’. Finally, the concealed data are forwarded to the SCADA.
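The flow of these steps can be illustrated with a deliberately simplified stand-in. Instead of the Deep Learning model described above, the sketch below uses a nearest-neighbour lookup over the recorded normal readings; this swap is an assumption made to keep the example short and dependency-free, and all function names are hypothetical.

```python
def train_concealer(normal_samples):
    """Steps 1-2: 'training' here only stores the recorded normal readings.

    Stand-in for the adversarially trained Deep Learning model.
    """
    return [list(sample) for sample in normal_samples]


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def conceal(model, reading, controllable=None):
    """Step 4: morph an intercepted reading towards a known normal state.

    controllable: indices of readings the attacker may overwrite
                  (None means all of them).
    """
    nearest = min(model, key=lambda s: squared_distance(s, reading))
    if controllable is None:
        return list(nearest)
    return [nearest[i] if i in controllable else value
            for i, value in enumerate(reading)]


# Toy usage: readings are [tank level, pressure], values are made up.
model = train_concealer([[3.0, 30.0], [4.0, 40.0], [2.0, 20.0]])
anomalous = [0.5, 5.0]                       # step 3: process under attack
forwarded = conceal(model, anomalous)        # steps 4-5: sent to the SCADA
partial = conceal(model, anomalous, controllable={1})
```

Unlike the white box search, nothing here queries the detector: the concealed reading is computed only from data the attacker recorded, which is what allows it to run once per time step. A learned model would generalize to states that were never recorded exactly, which a lookup cannot do.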
