A Supervisory Control Approach to Dynamic Cybersecurity
Abstract
An analytical approach to a dynamic cybersecurity problem that captures progressive attacks on a computer network is presented. We formulate the dynamic security problem from the defender’s point of view as a supervisory control problem with imperfect information, modeling the computer network’s operation by a discrete event system. We consider a min-max performance criterion and use dynamic programming to determine, within a restricted set of policies, an optimal policy for the defender. We study and interpret the behavior of this optimal policy as we vary certain parameters of the supervisory control problem.
Keywords:
Cybersecurity, Computer Networks, Discrete Event Systems, Finite State Automata, Dynamic Programming
1 Introduction
Cybersecurity has attracted much attention recently due to its increasing importance in the safety of many modern technological systems. These systems are ubiquitous in our modern-day life, ranging from computer networks, the internet, and mobile networks to the power grid and even implantable medical devices. This ubiquity highlights the essential need for a large research effort in order to strengthen the resiliency of these systems against attacks, intentional and unintentional misuse, and inadvertent failures.
The study of cybersecurity problems in the existing literature can be divided into two main categories: static and dynamic.
Static problems concern settings where the agents, commonly considered to be an attacker and a defender, receive no new information during the time horizon in which decisions are made. Problems of this type in the security literature can largely be classified under the category of resource allocation, where both the defender and attacker make a single decision as to where to allocate their respective resources. The main bodies of work involve infrastructure protection [bier2007, bohme2010, hart2008discrete] and mitigation of malware and virus spread in a network [bloem2007, bloem2009, chen2006, mastroleon2009]. Some of the above works consider settings where the agents are strategic [bier2007, hart2008discrete]. The presence of strategic agents results in a game between the attacker and defender. The strategic approaches in the above works are commonly referred to as allocation games. The survey by Roy et al. [roy2010survey], as well as [blotto], provide useful outlines of some static game models in security.
Dynamic security problems are those that evolve over time, with the defender taking actions while observing some new information from the environment (this new information could consist of the attacker’s actions, events in nature, or the state of some underlying system). The formulation of a security problem as a dynamic problem, instead of a static one, offers numerous advantages. The first advantage is clear: since real-world security problems have an inherently dynamic aspect, dynamic models can capture realistic security settings more easily than static models. Also, most attacks in cybersecurity settings are progressive, meaning more recent attacks build upon previous attacks (such as denial-of-service attacks, brute-force attacks, and the replication of viruses, malware, and worms, to name a few). This progressive nature is more easily modeled in a dynamic setting than in a static setting.
The literature within the dynamic setting can be further subdivided into two areas: models based on control theory [rowe2012, khouzani2012maximum, schneider2000, ligatti2005, ligatti2009] and models based on game theory [khouzani2012saddle, yin2010, van2013, roy2010survey].
The control theory based security models in the literature differ in how the dynamics are modeled. The work by Khouzani et al. [khouzani2012maximum] studies the problem of a malware attack in a mobile wireless network; the dynamics of the malware spread are modeled using differential equations. A large part of the literature on control theory based models focuses on problems where the dynamics are modeled by finite state automata. The works of [ligatti2005, ligatti2009, schneider2000] implement specific control policies (protocols) for security purposes. The work of Schneider [schneider2000] uses a finite state automaton to describe a setting where signals are sent to a computer. Given a set of initial possible states, the signals cause the state of the computer to evolve over time. An entity termed the observer monitors the evolution of the system and enforces security in real time. Extensions of Schneider’s model center around including additional actions for the observer. Ligatti et al. [ligatti2005] extend Schneider’s model by introducing a variety of abstract machines which can edit the actions of a program, at runtime, when deviation from a specified control policy is observed. More recent work [ligatti2009] develops a formal framework for analyzing the enforcement of more general policies. Another category of dynamic defense concerns scenarios where the defender selects an adaptive attack surface (for example, by changing the network topology) in order to change the possible attack and defense policies. A notion termed moving target defense (a term for dynamic system reconfiguration) is one class of such dynamic defense policies. The work of Rowe et al. [rowe2012] develops control theoretic mechanisms to determine maneuvers that modify the attack surface in order to mitigate attacks. The work first develops algorithms for estimating the security state of the system, then formalizes a method for determining the cost of a given maneuver. The model uses a logical automaton to describe the evolution of the state of the system; however, it does not propose an analytical approach for determining an optimal defense policy.
The next set of security models in the literature is based on the theory of dynamic games. The work in [lye2005] considers a stochastic dynamic game to model the environment of conflict between an attacker and a defender. In this model, the state of the system evolves according to a Markov chain. This paper has many elements in common with our model; however, it assumes the attacker and defender have perfect observations of the system state. In our paper, we consider the problem from the defender’s point of view and assume that the defender has imperfect information about the system state. The work by Khouzani [khouzani2012saddle] studies a zero-sum two-agent (malware agent and network agent) dynamic game with perfect information. The malware agent chooses a strategy that trades off malware spread and network damage, while the network agent chooses a countermeasure strategy. The authors illustrate that saddle-point strategies exhibit a threshold form. The work of Yin et al. [yin2010] (a dynamic game version of [bier2007]) studies a Stackelberg game where the defender moves first and commits to a strategy. The work addresses how the defender should choose a strategy when it is uncertain whether the attacker will observe the first move. Van Dijk et al. [van2013] propose a two-player dynamic game, termed FlipIt, which models a general setting where a defender and an attacker fight (in continuous time) over control of a resource. The results concern the determination of scenarios where there exist dominant strategies for both players. We refer the reader to Roy et al. [roy2010survey], and references therein, for a survey on the application of dynamic games to problems in security.
While models based on game theory have generated positive results in the static setting, there has been little progress in the dynamic setting. We believe this is for two reasons: first, dynamic security has not been fully investigated in a non-strategic context, and second, the results in the theory of dynamic games are limited.
In this paper, we develop a (supervisory) control theory approach to a dynamic cybersecurity problem and determine the optimal defense policy against progressive attacks. We consider a network of computers, each of which can be in one of four security states, as seen in Figure 1. The state of the system is the tuple of the computer states and evolves in time with both defender and attacker actions. We use a finite state logical automaton to model the dynamics of the system. The defender adjusts to attacks based on the information available.
Our model takes a different approach than the existing papers in the literature. One fundamental difference between our work and the existing literature that makes use of automata is the development of an analytical framework for determining optimal defense policies within a restricted set of policies. Other works involving automata propose methods for enforcing a predetermined policy, rather than determining an optimal policy. Also, our control theoretic approach considers imperfect information regarding attacker actions, which we feel is an aspect that is ingrained in security problems.
1.1 Contribution
The contribution of this paper is the development of a formal model for analyzing a dynamic cybersecurity problem from the defender’s point of view. Our approach has the following desirable features: (i) it captures the progressive nature of attacks; (ii) it captures the fact that the defender has imperfect knowledge regarding the state of the system; this uncertainty is a result of the fact that all attacks are uncontrollable by the defender and most are unobservable to it; (iii) it allows us to quantify the cost incurred at every possible state of the system, as well as the cost due to every possible defender action; (iv) it allows us to quantify the performance of various defender policies and to determine the defender’s optimal control policy, within a restricted set of policies, with respect to a min-max performance criterion.
1.2 Organization
The paper is organized as follows. In Section 2 we discuss our dynamic defense model. This is done by introducing the assumptions on the computer network and corresponding state, as well as the events which drive the evolution of the system state. In Section 3, we model the defender’s problem of keeping the computer network as secure as possible while subjected to progressive attacks. We provide a simplified problem formulation that is tractable. In Section 4, we determine an optimal control policy for the defender based on dynamic programming. We discuss the nature of the optimal policy in Section 5. We offer conclusions and reflections in Section 6.
2 The Dynamic Defense Model
The key features of our model are characterized by assumptions (A1) – (A6). We first describe the assumptions related to the computer network, discussed in assumption (A1). In assumption (A2) we introduce the notion of the computer network system state. Next, in assumptions (A3) – (A5), we discuss the events that can occur within the system. We describe how the events cause the system state to evolve, as well as specify which events are controllable and observable by the defender. In (A6) we discuss an assumption on the rules of interaction between the attacker and the defender. As mentioned in the introduction, we consider the cybersecurity problem from the defender’s viewpoint; the model we propose reflects this viewpoint.
Assumption 1  Computer Network: We assume a set of n networked computers. Each computer i ∈ {1, …, n} can be at a security level in S, the set of security states.
Each computer is assumed to have three security boundaries, denoted b1, b2, and b3, representative of a layered structure to its security. These security boundaries partition the set of security states S. Throughout this paper, we assume that the set of security states is S = {N, C, FC, RC}, defined as follows.

- Normal (N): Computer i is in the normal state if none of the security boundaries have been passed by the attacker.
- Compromised (C): Computer i is compromised when security boundary b1 has been passed by the attacker. In this state, the attacker has exploited some vulnerability on the computer and has managed to obtain user-level access privilege to the computer.
- Fully Compromised (FC): Computer i is fully compromised when both boundaries b1 and b2 have been passed by the attacker. The attacker has exploited some additional vulnerability on the computer and has managed to obtain root-level or execute privilege to the computer.
- Remote Compromised (RC): Computer i is remote compromised when all security boundaries b1, b2, and b3 have been passed by the attacker. The attacker has managed to obtain enough privileges to attack another computer and obtain user-level access privilege on that computer.
Assumption 2  System State: We assume that the computer network operates over an infinite time horizon t = 0, 1, 2, …. The state of the computer network, which evolves with time, is the combination of the states of all of the computers at that time. Each state has a corresponding cost.
The state of the network is an ordered tuple of all of the computer states; for example, the state of a three computer network is an ordered triple of computer states, and two tuples with the same components in a different order are distinct network states. Since each of the n computers can be in one of four security states, the set of all possible system states contains 4^n elements.
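To make the size of this state space concrete, the following Python sketch enumerates the ordered state tuples of an n-computer network; the shorthand labels N, C, FC, RC for the four security states are a notational assumption.

```python
import itertools

S = ("N", "C", "FC", "RC")  # per-computer security states (A1), shorthand labels

def all_network_states(n):
    """Enumerate the system state space of an n-computer network:
    every ordered n-tuple of per-computer security states."""
    return list(itertools.product(S, repeat=n))

assert len(all_network_states(3)) == 4 ** 3   # 64 distinct system states
assert ("N", "C", "N") != ("C", "N", "N")     # tuples are ordered: order matters
```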
The cost of the network state is defined by the costs of the states of the computers. We assign a cost c(x_i) to each computer i depending upon its state x_i. This cost is defined as follows:

c(x_i) ∈ {c_N, c_C, c_FC, c_RC}, according to whether x_i = N, C, FC, or RC, (1)

with c_N ≤ c_C ≤ c_FC ≤ c_RC. The cost of the network state x = (x_1, …, x_n) is then defined as

C(x) = c(x_1) + c(x_2) + … + c(x_n). (2)
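As a small illustration of the cost structure of Equations (1) and (2), the sketch below computes the cost of a network state as the sum of per-computer state costs; the numeric cost values are illustrative assumptions, not values prescribed by the model.

```python
# Hypothetical per-computer state costs (illustrative values only); the model
# requires costs that do not decrease with the severity of compromise.
STATE_COST = {"N": 0.0, "C": 1.0, "FC": 2.0, "RC": 4.0}

def network_cost(x):
    """Cost of a network state, Eq. (2): the sum of the per-computer
    costs of Eq. (1) over all computers in the network."""
    return sum(STATE_COST[s] for s in x)

# A three-computer network with one compromised and one remote compromised machine:
assert network_cost(("N", "C", "RC")) == 5.0
```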
The state of the network evolves in time due to events, which we discuss in the next set of assumptions.
Assumption 3  Events: There is a set of events consisting of the attacker’s actions and the defender’s actions.
We assume that the attacker has access to three types of actions. The set of attacker actions is defined as follows.

- Null: The attacker takes no action. The null action does not change the system state and is admissible at any state of a computer.
- Security boundary attack: Attacking a security boundary of computer i causes the security state of computer i to transition across that boundary. Specifically, a boundary attack causes computer i to transition from normal (N) to compromised (C), from C to fully compromised (FC), or from FC to remote compromised (RC); the three boundary attacks are only admissible from states N, C, and FC, respectively.
- Network attack: Using a computer in the remote compromised state (RC) to attack any other computer in the network that is in the normal or compromised state, bringing the target computer to a compromised security state. The network attack is admissible only when the attacking computer is in state RC and the target computer is in state N or C.
We assume that the defender knows the set of attacker actions as well as the state transitions resulting from each attacker action.
The defender has access to three types of costly actions. These actions are admissible at any computer state. The set of defender actions is defined as follows.

- Null: The defender takes no action. The null action does not change the system state.
- Sense computer i: The sense action reveals the state of computer i to the defender. The sense action does not change the system state.
- Reimage computer i: The reimage action brings computer i back to the normal state from any state that it is currently in. For example, reimaging a computer in the fully compromised state returns it to the normal state.
Each of the defender’s actions has an associated nonnegative cost, with the null action being the cheapest; the cost of the reimage action is a parameter that we vary in the sensitivity analysis of Section 5.
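The transition rules of (A3) can be sketched as follows. The state labels, and the assumption that a network attack leaves the target with user-level access (state C), are our interpretation of the description above rather than a definitive specification.

```python
LEVELS = ["N", "C", "FC", "RC"]  # security states ordered by severity (A1)

def boundary_attack(x, i):
    """Attacker event: computer i crosses its next security boundary (A3)."""
    assert x[i] != "RC", "no boundary left to pass"
    y = list(x)
    y[i] = LEVELS[LEVELS.index(x[i]) + 1]
    return tuple(y)

def network_attack(x, i, j):
    """Attacker event: computer i in RC attacks computer j in N or C (A3).
    We assume the attacker gains user-level access on the target, i.e.
    computer j moves to C; this resulting state is our interpretation."""
    assert x[i] == "RC" and x[j] in ("N", "C") and i != j
    y = list(x)
    y[j] = "C"
    return tuple(y)

def reimage(x, i):
    """Defender action: bring computer i back to N from any state (A3)."""
    y = list(x)
    y[i] = "N"
    return tuple(y)

assert boundary_attack(("N", "C"), 1) == ("N", "FC")
assert reimage(("FC", "RC"), 1) == ("FC", "N")
```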
Assumption 4  Defender’s Controllability of Events: The attacker’s actions are uncontrollable whereas the defender’s actions are controllable.
Since the problem is viewed from the perspective of the defender, all of the defender’s actions are controllable. For the same reason, the defender is unable to control any of the attacker’s actions.
Assumption 5  Defender’s Observability of Events: All of the defender’s actions and some of the attacker’s actions are assumed to be observable.
Again, due to taking the defender’s viewpoint, all of the defender’s actions are observable. Although we assume that the defender knows the attacker’s action set, we assume that it cannot observe the attacker’s null action or any of the security boundary attacks; it can only observe network attacks. One justification for this is that the network attack involves passing sensitive information about the target computer (such as its login credentials) through the routing layer of the system to the attacking computer. We assume that the routing layer is able to detect the transfer of sensitive data through the network, and thus the defender is aware when a network attack occurs.
Assumption 6  Defender’s Decision Epochs: The defender acts at regular, discrete time intervals. At these time instances, the defender takes exactly one of its actions. The attacker takes one action between each pair of successive defender actions.
We require that the defender consider taking a single action at regular time instances. We assume that between any two such instances, the attacker can only take one action. This order of events is illustrated in Figure 2 for a given time t. We introduce intermediate states, which represent the system states at which attacker events are admissible (that is, the states in which the attacker takes an action). The system states are the states at which defender actions are admissible.
Assumption (A6) is, in our opinion, reasonable within the security context. Since time has value in security problems (a computer that is compromised by the attacker for two time steps is more costly to the defender than a computer that is compromised for one time step), the defender should take actions at regular time intervals (note that at these instances the defender may choose the null action, that is, choose to do nothing). In general, a finite number of attacker events may occur between any two successive defender actions; however, to reduce the dimensionality of the problem, we assume that only one attacker event can occur.
One important implication of assumption (A6) is related to the defender’s observability of attacker events. By (A6), the defender knows when the attacker acts. Since the network attack is observable, if the defender does not observe a network attack at an instance when an attacker event is known to occur, then it knows that one of the unobservable events, the null action or one of the security boundary attacks, has occurred. To incorporate this fact into the defender’s knowledge about the system’s evolution, we group the above mentioned unobservable events into one event. This philosophy is used in constructing the system automaton from the defender’s point of view, as well as in defining the defender’s information state (discussed in Section 3). As a result of the above grouping, every event in the resulting event set is observable by the defender. Notice, however, that by performing this grouping, we have introduced nondeterminism into the system; that is, the grouped event can take the system to many possible system states. All unobservable events in the problem have been eliminated due to Assumption (A6) and the grouping of unobservable events.
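The nondeterminism introduced by this grouping can be made concrete. The sketch below, using shorthand state labels N, C, FC, RC (a notational assumption), lists every state reachable from a given state when the attacker’s move is not observed: its null action or any one admissible boundary attack.

```python
LEVELS = ["N", "C", "FC", "RC"]  # security states ordered by severity

def unobservable_successors(x):
    """States reachable from x via the grouped unobservable attacker event:
    the null action or any single admissible boundary attack ((A3), (A6))."""
    succ = {x}  # the null action leaves the state unchanged
    for i, s in enumerate(x):
        if s != "RC":  # a boundary attack on computer i is admissible
            y = list(x)
            y[i] = LEVELS[LEVELS.index(s) + 1]
            succ.add(tuple(y))
    return succ

# From (N, RC), the unobserved attacker move leads to (N, RC) or (C, RC):
assert unobservable_successors(("N", "RC")) == {("N", "RC"), ("C", "RC")}
```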
As a result of assumptions (A1) – (A6), the evolution of the system state, from the defender’s viewpoint, can be modeled by a discrete event system represented by a finite state automaton, which we term the system automaton. Due to assumption (A6), we duplicate the system states by forming the set of intermediate states. The intermediate states represent the states at which an attacker event can occur, while the system states are the states at which the defender takes an action. The set of events that can occur consists of the defender’s actions, the observable network attacks, and the grouped unobservable attacker event; the transitions due to these events follow the rules discussed in assumption (A3). The system automaton takes the form of a bipartite graph, as seen in Figure 3.
Notice that, like the null action, the sense actions do not change the underlying system state. The purpose of sensing is to update the defender’s information state, which will be defined and explained in the following section.
3 The Defender’s Problem
We now formulate the defender’s problem: protecting the computer network. The defender must decide which costly action to take, at each time step, in order to keep the system as secure as possible given that it has imperfect knowledge of the network’s state.
3.1 The Defender’s Optimization Problem
Let g = (g_0, g_1, g_2, …) denote a control policy of the defender, where the defender’s action at time t is

d_t = g_t(d_{0:t-1}, o_{0:t}), (3)

and d_{0:t-1} and o_{0:t} denote the defender’s actions and observations up to time t, respectively. Let G denote the space of admissible control policies for the defender.
The defender’s optimization problem is

(P)  min over g ∈ G of the worst case, over state trajectories consistent with g, of Σ_{t=0}^{∞} β^t [ C(x_t) + c(d_t) ],

subject to the system dynamics of Section 2, where x_0, x_1, x_2, … denotes a sequence of states generated by control policy g, d_t is the defender’s action at time t generated according to Equation (3), c(d_t) is its cost, and β ∈ (0, 1) is a discount factor. Problem (P) is a supervisory control problem with imperfect observations.
3.2 Discussion of Problem (P)
The notion of an information state [kumar1986stochastic] is a key concept in supervisory (and general) control problems with imperfect information. Because of the nature of the performance criterion and the fact that the defender’s information is imperfect, an appropriate information state for the defender at time t is the field generated by the defender’s actions and observations up to t. Using such an information state, one can, in principle, write the dynamic program for Problem (P). Such a dynamic program is computationally intractable. For this reason, we formulate another problem, called Problem (P′), where we restrict attention to a set of defense policies that have a specific structure; for this problem we can obtain a computationally tractable solution.
3.3 Specification of Problem (P′)
We define the defender’s observer as follows. The defender’s observer is built using the defender’s observable events and its actions. The observer’s state at time t, denoted by Y_t, consists of the possible states that the network can be in at time t from the defender’s perspective. We denote by 𝒴 the space to which Y_t belongs, for any t.
The evolution of the observer’s state is described by a function Φ. The observer’s state follows the update

Y_{t+1} = Φ(Y_t, d_t, o_{t+1}),

where d_t is the realization of the defender’s action and its effect at time t, and o_{t+1} is the realization of the defender’s observation at t+1. The precise form of the function Φ is determined by the dynamic defense model of Section 2. Thus, the dynamics of the defender’s observer are described by a finite state automaton with state space 𝒴 and transitions that obey the dynamics defined by the function Φ.
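A minimal sketch of the defender half-step of such an observer update is given below. It represents an observer state as a set of candidate network states and omits the subsequent expansion by attacker events, so it illustrates the role of the update function rather than the full construction.

```python
def observer_update(Y, action, obs=None):
    """One defender half-step of the observer map (a simplified sketch).
    Y is a frozenset of candidate network states; action is ("null",),
    ("sense", i) with obs the revealed level of computer i, or ("reimage", i).
    The attacker half-step (expansion by attacker events) is omitted here."""
    kind = action[0]
    if kind == "null":
        return frozenset(Y)
    if kind == "sense":
        i = action[1]
        # Sensing discards every candidate inconsistent with the observation.
        return frozenset(x for x in Y if x[i] == obs)
    if kind == "reimage":
        i = action[1]
        # Reimaging maps computer i back to normal in every candidate state.
        return frozenset(x[:i] + ("N",) + x[i + 1:] for x in Y)
    raise ValueError(kind)

Y = frozenset({("N", "C"), ("N", "FC")})
assert observer_update(Y, ("sense", 1), obs="C") == frozenset({("N", "C")})
assert observer_update(Y, ("reimage", 1)) == frozenset({("N", "N")})
```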
Using the defender’s observer we formulate Problem (P′) as follows:

(P′)  min over g of the worst case, over state trajectories consistent with g, of Σ_{t=0}^{∞} β^t [ C(x_t) + c(d_t) ],

subject to the system dynamics of Section 2 and the observer dynamics Y_{t+1} = Φ(Y_t, d_t, o_{t+1}), where d_t = g(Y_t); that is, we restrict attention to policies that choose the defender’s action as a function of the current observer state.
4 Dynamic Programming Solution for the Defender’s Problem
4.1 The Dynamic Program
We solve Problem (P′) using dynamic programming. The dynamic program corresponding to Problem (P′) is

V(Y) = min_{d} max_{x ∈ Y} [ C(x) + c(d) + β max_{Y′ ∈ R(Y, d, x)} V(Y′) ] (4)

for every observer state Y (see [kumar1986stochastic, bertsekas1995dynamic]), where R(Y, d, x) is the set of observer states that can be reached from Y when the defender’s action is d and the true system state in Y is x. The set R(Y, d, x) is determined as follows. If at time t the observer’s state is Y and the defender takes action d then, before the effect of d at time t and the observation at time t+1 are realized, there will be several potential candidate observer states at t+1. Only a subset of these possible observer states can occur when the true state of the system at time t is x. This subset is R(Y, d, x). We illustrate the form of the set R(Y, d, x) by the following example.
Example 1. Consider a network of three computers and suppose that the current observer state contains several candidate system states that differ in the state of one computer, say computer 2. If the defender takes the action of sensing computer 2 then, before the effect of the action and the observation at the next time are realized, the possible observer states correspond to the different values that the sensed state of computer 2 can take. If the true system state is one particular candidate x, then R(Y, d, x) contains only the observer states consistent with the value of computer 2 under x.
4.2 Solution of the Dynamic Program
We obtain the solution of the dynamic program, Equation (4), via value iteration [kumar1986stochastic, bertsekas1995dynamic]. To that end, we define the operator T, acting on value functions V, by

(TV)(Y) = min_{d} max_{x ∈ Y} [ C(x) + c(d) + β max_{Y′ ∈ R(Y, d, x)} V(Y′) ]. (5)
We prove the following result.
Theorem 4.1
The operator T, defined by Equation (5), is a contraction map.
Proof
We use Blackwell’s sufficiency theorem (Theorem 5, [blackwell1965]) to show that the operator T defined by Equation (5) is a contraction mapping. We show:
- Bounded value functions: The state costs C(x) and action costs c(d) are bounded, and the discount factor satisfies 0 < β < 1. Starting from any bounded value function V, the function TV is therefore bounded for all observer states Y.
- Monotonicity: Assume V ≤ V′ pointwise. Then, for every observer state Y, defender action d, and candidate state x ∈ Y,
C(x) + c(d) + β max_{Y′ ∈ R(Y, d, x)} V(Y′) ≤ C(x) + c(d) + β max_{Y′ ∈ R(Y, d, x)} V′(Y′).
Taking the maximum over x ∈ Y and the minimum over d preserves the inequality. Hence, TV ≤ TV′.
- Discounting: Let r ≥ 0 be a constant. Adding r to the value function increases each bracketed term in Equation (5) by βr, so that T(V + r) ≤ TV + βr.
By Blackwell’s sufficiency theorem, the operator T is a contraction mapping.
Since the operator defined by Equation (5) is a contraction mapping, we can use value iteration to obtain the solution to Equation (4), which we term the stationary value function V*. From the stationary value function, we can obtain an optimal policy g* as follows:

g*(Y) ∈ argmin_{d} max_{x ∈ Y} [ C(x) + c(d) + β max_{Y′ ∈ R(Y, d, x)} V*(Y′) ].
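Value iteration for the min-max dynamic program can be sketched as follows. The toy single-computer instance at the end (two states, two actions, illustrative costs, and an assumed worst-case reachability map) is only meant to show the iteration converging to a fixed point; it is not the network model of Section 2.

```python
def value_iteration(observer_states, actions, state_cost, action_cost,
                    reachable, beta=0.9, tol=1e-8):
    """Min-max value iteration for Eq. (4) (a sketch). `reachable(Y, d, x)`
    plays the role of R(Y, d, x): the observer states reachable from Y
    under defender action d when the true system state is x."""
    V = {Y: 0.0 for Y in observer_states}
    while True:
        V_new = {
            Y: min(
                max(state_cost(x) + action_cost(d)
                    + beta * max(V[Yn] for Yn in reachable(Y, d, x))
                    for x in Y)
                for d in actions)
            for Y in observer_states
        }
        if max(abs(V_new[Y] - V[Y]) for Y in observer_states) < tol:
            return V_new
        V = V_new

# Toy instance (all numbers are illustrative assumptions): perfect
# observation, one computer with states N and C, actions null and reimage.
A, B = frozenset({"N"}), frozenset({"C"})

def reach(Y, d, x):
    # Assumed worst case: reimage restores the machine; otherwise the
    # attacker compromises it (or keeps it compromised).
    return [A] if d == "reimage" else [B]

V = value_iteration(
    [A, B], ["null", "reimage"],
    state_cost={"N": 0.0, "C": 1.0}.__getitem__,
    action_cost={"null": 0.0, "reimage": 0.5}.__getitem__,
    reachable=reach)
assert abs(V[A] - 5.0) < 1e-5 and abs(V[B] - 6.0) < 1e-5
```

At the fixed point, the optimal toy policy is to stay idle in the clean state and reimage once compromised, matching the hand-computed values V(A) = 5 and V(B) = 6.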
The optimal policy g* is not always unique. That is, for a given observer state Y, there could be multiple actions which achieve the same minimum value in Equation (4). We denote by D*(Y) the set of optimal actions for a given observer state Y. In the event that D*(Y) is not a singleton for a given state Y, we choose a single action based on a quantity we define as the confidentiality threat. The confidentiality threat is a measure of the degree to which computer i is presumed (by the defender) to be compromised and is defined as

CT_i(Y) = Σ_{x ∈ Y} c(x_i),

where c(x_i) is the cost, as defined in Equation (1), of the state of computer i in the candidate system state x. Summing over all candidate system states in the observer state Y for a given computer i, we obtain the confidentiality threat CT_i(Y). Next, we compare the confidentiality threat of each computer and choose the action in D*(Y) that corresponds to the computer with the highest confidentiality threat. In the case of equal confidentiality threats (which arise when the observer state is symmetric), we choose the action in D*(Y) corresponding to the computer with the lower index (this choice is arbitrary; we could randomize it as well).
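The tie-breaking quantity can be computed directly from an observer state, as in the sketch below; the state labels and numeric cost values are illustrative assumptions.

```python
def confidentiality_threat(Y, i, state_cost):
    """Confidentiality threat of computer i under observer state Y: the sum,
    over all candidate system states x in Y, of the cost of computer i's state."""
    return sum(state_cost[x[i]] for x in Y)

COST = {"N": 0.0, "C": 1.0, "FC": 2.0, "RC": 4.0}  # illustrative values
Y = frozenset({("N", "RC"), ("C", "RC")})
assert confidentiality_threat(Y, 0, COST) == 1.0
assert confidentiality_threat(Y, 1, COST) == 8.0
# The tie-break acts on computer 1, whose presumed compromise is greater.
```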
5 Optimal Defender’s Policy
We now discuss the characteristics of the optimal policy for the restricted problem, henceforth referred to as the optimal policy. We illustrate sensitivity analysis via numerical results for both a two computer and a three computer network. We also discuss some qualitative observations about the optimal policy.
First we note that determining the set of observer states and its associated dynamics is not a trivial computational task, even for moderately sized networks. Our calculations show that already for a two computer network the defender’s observer automaton consists of a substantial number of states and transitions, and extending the system to a three computer network increases both counts considerably. To automate the procedure, we have developed a collection of programs that makes use of the UMDESLIB software library [umdes1]. The specific procedure is discussed in Appendix A.
The sensitivity analysis studies how the cost of reimaging affects the optimal policy. For both the two computer and three computer networks, we increase the reimage cost and observe how the optimal policy behaves. Since the number of observer states in the two computer network is modest, we are able to plot the behavior for each observer state, as seen in Figure 4(a) (the ordering of these states is arbitrary). In the three computer network, the observer state space is much larger than that of the two computer network. As a result, we plot the percentage of observer states for which each action is optimal, and analyze how these percentages change as we increase the reimage cost, as seen in Figure 4(b).
The behavior of the optimal policy as the reimage cost increases is intuitive. As the reimage cost increases, the optimal policy exhibits a threshold form (in the simulations that we have performed), switching from specifying more expensive actions to less expensive actions. For very low reimage costs, the optimal policy specifies the reimage action in the majority of the observer states. As the reimage cost increases, observer states for which reimage was optimal switch to either sense or null. Once the optimal action is null, it remains null for all higher reimage costs. For the observer states where the action switched to sense, a further increase in the reimage cost may result in a switch to null; however, there exist some observer states where the optimal action is sense for all higher reimage costs. This threshold behavior is clearly depicted in Figure 4(a).
As a result of the aforementioned threshold behavior, for high enough reimage costs the optimal policy eventually specifies only the sense or null action for every observer state. The argument for why there is no reimage action at high reimage costs is straightforward: at these values the cost of reimaging is prohibitively expensive and the defender would rather incur the cost of being in a poor system state (see Equation (2)).
An interesting (related) observation can be made by analyzing the characteristics of the observer states and how these characteristics influence when the policy undergoes a switch as the reimage cost increases. Consider Figure 4(a). There is a collection of observer states (with indices 74 – 87) that contain the element (RC, RC), that is, both computers in the remote compromised state, where the optimal policy specifies a switch from reimage to null. In these observer states, the defender believes that the true system state is so poor that, even if a computer were to be reimaged, the attacker’s actions would cause the system to transition back to a poor state in so few iterations that the defender would just be wasting its resources by reimaging. That is, the number of time steps that it takes for the system to return to a poor state is not high enough to justify the cost that the defender must incur to keep the system in a secure operating mode. For this reason, in these observer states, the defender exhibits the passive behavior of giving up by choosing the cheapest action, null. A related observation is that for the other observer states in the system (those that do not contain the element (RC, RC)) the optimal policy specifies a switch away from reimage at a higher reimage cost. In these observer states the defender views the process of securing the system as economically efficient because the system can be returned to a secure operating mode in a small enough number of iterations (compared to the observer states that contain the system state (RC, RC)). This observed behavior reflects the fact that attacks are progressive and that time has value in our model.
Another observation is that there are sets of parameters for which the sense action is useful (as seen in Figures 4(a) and 4(b) for intermediate reimage costs). In these cases the act of sensing a computer results in a split observer state that has a lower future cost than if the defender were to choose either null or reimage. Thus, paying the cost to sense can result in the defender having a better idea of the underlying system state and thus making a wiser decision on which future action to take. However, for low reimage costs, we can see that the defender prefers to reimage rather than obtain a better estimate of the system (and similarly, for high reimage costs, the defender prefers to take the null action). This behavior highlights the duality between estimation and control.
Interestingly, sensing remains an optimal action even for high reimage costs, when no reimage action is prescribed by the optimal defense policy. In these cases, even though sensing does not change the state of the network, it refines the defender’s information, which then results in a lower future cost for the defender. Even though the sense action is more expensive than the null action, this lower future cost causes the defender to choose sense over null.
The intent of determining an optimal policy is to offer a set of procedures for the defender such that the network can be kept as secure as possible. After the defender specifies its costs for actions and costs for states, the optimal policy specifies a procedure that the defender should follow. For each action the defender takes and each event it observes, the resulting observer state is known through the dynamics of the observer. For each observer state resulting from the sequence of defender actions and observed events, the optimal policy specifies whether to sense or reimage a particular computer, or to wait and do nothing. The resulting defender behavior will keep the network as secure as possible under the min-max cost criterion.
6 Conclusion and Reflections
In this paper we have proposed a supervisory control approach to dynamic cybersecurity. We have taken the viewpoint of the defender whose task is to defend a network of computers against progressive attacks. Some of the attacker actions are unobservable by the defender, thus the defender does not have perfect knowledge of the true system state. We define an observer state for the defender to capture this lack of perfect knowledge.
We have assumed that the defender takes a conservative approach to preserving the security of the system. We have used the minmax performance criterion to capture the defender’s conservative approach.
Dynamic programming was used to obtain an optimal defender policy for the problem. The numerical results show that the optimal policy exhibits a threshold behavior when the costs of actions are varied. We have also observed the duality of estimation and control in our optimal policy.
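The minmax dynamic program can be sketched as a backward recursion over observer states: the defender minimizes over actions while the worst admissible event maximizes the continuation cost. The sketch below is a minimal illustration under assumed names (it is not the paper's code); `next_obs` plays the role of the observer dynamics, and `c_state` gives the worst-case state cost charged at an observer state.

```python
def minmax_dp(obs_states, actions, events, next_obs, c_act, c_state, T):
    """Finite-horizon minmax recursion over observer states:
    V(o) = min_a [ c_act(a) + max_e ( c_state(o') + V(o') ) ],
    where o' = next_obs[(o, a, e)] ranges over admissible events."""
    V = {o: c_state(o) for o in obs_states}  # terminal cost
    policy = {}
    for _ in range(T):
        new_V = {}
        for o in obs_states:
            best_val, best_act = float("inf"), None
            for a in actions:
                nxts = [next_obs[(o, a, e)]
                        for e in events if (o, a, e) in next_obs]
                if not nxts:          # action inadmissible at this state
                    continue
                val = c_act(a) + max(c_state(n) + V[n] for n in nxts)
                if val < best_val:
                    best_val, best_act = val, a
            new_V[o], policy[o] = best_val, best_act
        V = new_V
    return V, policy
```

On a toy two-state instance (a secure and a compromised observer state, with a reimage action cheaper than staying compromised), the recursion recovers the expected threshold-style behavior: reimage when compromised, do nothing when secure.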
We believe that our approach is suitable for modeling interactions between an attacker and a defender in general security settings. In general, we can use our approach to study dynamic defense against attacks in a network of resources each with (orderable) security levels and security boundaries. The attack actions can penetrate through some of these boundaries to compromise a resource, or use a compromised resource to attack other resources in the network. Some of these actions can be unobservable to the defender. On the other hand, the defender can take actions to change the state of resources to a more secure operating mode or sense the system state to obtain more refined information about the system’s status.
The model we have defined is rich enough to be extended to capture more complicated environments. Some examples of such environments include heterogeneity of the network’s computers (placing an importance weight on each computer) or the introduction of a dummy computer (one that contains no sensitive information and is meant to mislead the attacker) into the system so as to increase the network’s resiliency to attacks.
One bottleneck of our approach is that the number of states and transitions grows exponentially with the number of computers. One solution is to use a hierarchical decomposition of the system. For example, an Internet Service Provider (ISP) can model a collection of nodes in its network as one region (resource). Once a nonsecure region is observed in the system, the ISP can more carefully analyze the nodes within that region and take appropriate actions. Approximate dynamic programming methods could also be useful in dealing with systems with a large number of computers.
Acknowledgement
This work was supported in part by NSF grant CNS-1238962 and ARO MURI grant W911NF-13-1-0421. The authors are grateful to Eric Dallal for helpful discussions.
A Appendix – UMDESLIB
The UMDESLIB library [umdes1] is a collection of C routines built to study discrete event systems modeled by finite state automata. Through specification of the states and events of a system automaton (along with the controllability and observability of events), the library can construct an entity termed the observer automaton. In our problem the observer automaton is the defender’s observer automaton, since we take the viewpoint of the defender. Thus, the observer automaton consists of the defender’s observer states.
In this appendix we describe an automated process (source code is available upon request) for extracting the defender’s observer state from the system automaton that makes use of UMDESLIB. This requires first constructing the system automaton in an acceptable format for the library while preserving all the features of our model. After running the library on the provided system automaton, we extract the defender’s observer state from the observer automaton output.
This method allows one to construct the defender’s observer state for any number of computers (the only bottleneck being the potentially large dimensionality of the problem).
Constructing the System Automaton. The input that we provide to UMDESLIB is the system automaton from the defender’s viewpoint, as illustrated earlier in Figure 3.
In order to preserve all features of our model in the resulting observer automaton, we need to introduce additional sensing actions. Recall that the sense action causes the system automaton to transition to the same state as the null action (see Figure 3). However, as stated in Section 2, the sense action updates the information state of the defender. In order to ensure that UMDESLIB captures this functionality, we expand the sense action for each computer into distinct actions, one for each state the sensed computer can be in. This reduces the defender’s uncertainty, as it splits the observer state into at most as many sets as there are possible states of the sensed computer. The admissible expanded sense actions at a given system state are those that correspond to the true state of each computer. This expansion is perhaps worrisome at first glance: if the only admissible sense actions from the current state are the ones that correspond to the current state of the computer, then the defender would seemingly know the current state of each computer, eliminating the need for a sense action. However, the observer state obtained from each expanded sense action is the same as the observer state that would be obtained if the defender were to observe the true, unknown state of the computer.
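The effect of the expanded sense actions on the observer state can be sketched concisely: sensing computer i partitions the current observer state according to computer i's component, and the defender lands in exactly one block, the one matching the true (unknown) state. The sketch below is illustrative only; the tuple encoding of system states is an assumption carried over from earlier examples.

```python
def split_by_sense(observer_state, i):
    """Expanded sense action for computer i: partition the observer
    state (a set of system-state tuples) by the i-th component.
    Returns a dict mapping each possible state of computer i to the
    block of system states consistent with that reading."""
    blocks = {}
    for s in observer_state:
        blocks.setdefault(s[i], set()).add(s)
    return {x: frozenset(b) for x, b in blocks.items()}
```

For example, sensing computer 2 in an observer state where that computer may be working or compromised yields two blocks, each strictly more informative than the original observer state.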
Running UMDESLIB on the system automaton with the expanded sense actions results in the observer automaton.
Extracting the Defender’s Observer State. The output of UMDESLIB is the observer automaton, from which we must extract the defender’s observer state. First, since the defender does not have the ability to choose the expanded sense actions, we regroup them into a single, nondeterministic sense action for each computer. Next, we need to extract the observer’s transition function from the observer automaton. The observer automaton generated by UMDESLIB takes the form of a bipartite graph; one collection of states of the bipartite graph consists of observer states over system states, whereas the other collection consists of observer states over intermediate states. Defender actions are the only admissible actions from observer states over system states. A defense action causes a transition (possibly nondeterministic due to the sense action) to an observer state over intermediate states, from which only system events are admissible. Each event causes a transition back to an observer state over system states. Repeating this process for all such observer states, actions, and events defines the observer’s transition function. To construct the set of observer states we follow the approach described in Section 4.1 and illustrated by Example 1.
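The core of this extraction is the standard subset construction for partially observed automata: after an observable event, the observer state is the set of states reachable via that event, closed under unobservable transitions. The sketch below illustrates one such observer transition under assumed names (it is not the UMDESLIB code); `delta[(state, event)]` lists the possible successor states.

```python
def observer_transition(obs, event, delta, unobservable):
    """One transition of the observer automaton: apply the observable
    `event` to every state in the current estimate `obs`, then close
    the result under the unobservable events (subset construction)."""
    frontier = {s2 for s in obs for s2 in delta.get((s, event), ())}
    closure, stack = set(frontier), list(frontier)
    while stack:
        s = stack.pop()
        for u in unobservable:
            for s2 in delta.get((s, u), ()):
                if s2 not in closure:
                    closure.add(s2)
                    stack.append(s2)
    return frozenset(closure)
```

The closure step is what makes unobservable attacker actions invisible to the defender: states reachable by such actions are folded into the same observer state.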