# A Polynomial Time Algorithm for Spatio-Temporal Security Games

Soheil Behnezhad, Mahsa Derakhshan, MohammadTaghi Hajiaghayi: Department of Computer Science, University of Maryland. Email: {soheil,mahsaa,hajiagha}@cs.umd.edu. Supported in part by NSF CAREER award CCF-1053605, NSF BIGDATA grant IIS-1546108, NSF AF:Medium grant CCF-1161365, DARPA GRAPHS/AFOSR grant FA9550-12-1-0423, and another DARPA SIMPLEX grant.

Aleksandrs Slivkins: Microsoft Research. Email: slivkins@microsoft.com.

###### Abstract

An ever-important issue is protecting infrastructure and other valuable targets from a range of threats from vandalism to theft to piracy to terrorism. The “defender” can rarely afford the needed resources for a 100% protection. Thus, the key question is, how to provide the best protection using the limited available resources.

We study a practically important class of security games that is played out in space and time, with targets and “patrols” moving on a real line. A central open question here is whether the Nash equilibrium (i.e., the minimax strategy of the defender) can be computed in polynomial time. We resolve this question in the affirmative. Our algorithm runs in time polynomial in the input size, and only polylogarithmic in the number of possible patrol locations. Further, we provide a continuous extension in which patrol locations can take arbitrary real values. Prior work obtained polynomial-time algorithms only under a substantial assumption, e.g., a constant number of rounds. Further, all these algorithms have running times polynomial in the number of possible patrol locations, which can be very large.

## 1 Introduction

Protecting infrastructure and other valuable targets from a range of threats from vandalism to theft to piracy to terrorism is an ever-important issue around the world, aggravated recently by increased threats of piracy and terrorism. Providing 100% protection usually requires more money or other resources than the “defender” can commit. Thus, the key question is, how to provide the best protection using the limited resources that are available.

A successful recent approach casts this issue in game-theoretic terms, modeling it as a security game: a zero-sum game between the defender who has some targets to protect, and the attacker who strives to inflict damage on these targets. Usually the defender needs to commit to a particular allocation of resources, such as the schedule of patrols, whereas the attacker can strike at will; this corresponds to a classic game-theoretic model called a Stackelberg game. The defender can (and should) randomize, e.g. so as to prevent the attacker from exploiting a particular gap in the patrol schedule. The attacker can be strategic and optimize his attack according to his beliefs about the defender’s strategy. The literature has mostly adopted a pessimistic view, in which the attack is the exact best response to the defender’s actual strategy. Thus, the defender’s goal is to use an optimal (minimax) strategy.

This approach has resulted in a flurry of research activity, including several awards and nominations. Further, it has been adopted in a number of real-world deployments, ranging from patrol boats to airport checkpoints to US air marshals to an urban transit system to wildlife protection. (Many of them have been recognized with commendations and awards.) Other potential applications include protecting aid convoys in unstable regions, and protecting ships from piracy.

Most applications of security games are spatio-temporal in the sense that the patrols move from one location to another with a limited speed, and only protect targets that are sufficiently close. Then the defender’s strategy is a rather complicated object: a pure strategy should specify a trajectory for every patrol, possibly choosing from a very large number of possible locations. Further, in many applications targets have their own trajectories that need to be taken into consideration.

The first-order question in spatio-temporal security games is computing the equilibrium, i.e., the minimax strategy for the defender. More specifically, we focus on exact equilibrium computation in polynomial time. The prior work (e.g., [8, 5, 20]), as well as the present paper, considers a one-dimensional space (that is, patrol and target locations are on the real line), and discretizes it uniformly into the possible patrol locations. The relevant parameters are: the number of patrols (), the number of targets (), the number of rounds of scheduling (), and the number of possible patrol locations (). The input specifies trajectories of targets and their values; the trajectories may be arbitrary, and the values may change over time. Thus, the input size is . What makes the problem particularly challenging is that the number of pure strategies — tuples of patrol trajectories — is as large as .

A central open question here is whether the Nash equilibrium (i.e., the minimax strategy of the defender) can be computed in polynomial time. We resolve this question in the affirmative: our main result is an algorithm that computes the exact equilibrium in time polynomial in the input size. In particular, the running time scales only polylogarithmically in the number of possible patrol locations. Moreover, we provide a continuous extension in which patrol locations can take arbitrary real values, under a mild technical assumption that the target locations are rational. The dependence on the number of patrols is argued away: while a pure strategy of the defender must specify a trajectory for each patrol, we prove that patrols suffice to protect all targets. The output is a distribution over -many pure strategies.

We improve over the state-of-the-art prior work [8, 20] in several ways. First, the running time in [8] is exponential in the number of patrols () and becomes impractical even for [20]. Second, [20] achieves a polynomial running time only under a substantial assumption: either a constant number of rounds, or that all targets have a unit value at all times, or that the “protection ranges” of the patrols are so small that they cannot overlap for any two adjacent patrol locations. Third, the running times in [8, 20] depend polynomially on the number of patrol locations (). Finally, the polynomial running time in [20] relies on the Ellipsoid Algorithm for solving linear programs, which is notoriously slow in practice.

Our techniques. Our main algorithm works on the discretized version of the problem, in which the possible locations for patrols are integers from to (there is no such restriction on the location of targets). The algorithm consists of three parts: partitioning the spatial domain, formulating patrol placements in a single time point, and combining them to find the optimal strategy for all time points. Below we describe these three parts one by one.

First, we partition the spatial domain into a relatively small number of intervals so that the patrol locations inside each interval are “equivalent” to one another as far as our problem is concerned. Then we use these intervals as “atomic” patrol locations, thereby replacing the dependence on the number of patrol locations with a dependence on the number of intervals. The partitioning algorithm starts from the last round and goes backwards in time: for each round $t$ it constructs a collection of intervals based on the target locations at time $t$ and the intervals constructed for time $t+1$, so as to ensure the desired “equivalence” property. We bound the number of intervals by a polynomial in the input size (Lemma 3.2).

Second, for each time point we construct a graph which models any possible snapshot of the patrol placements at this time. More specifically, every patrol placement at time can be mapped to a specific path in , and every randomized patrol placement can be mapped to a specific unit flow in . Furthermore, we define the cost of a path/flow in such that it equals the maximum utility of the attacker under the corresponding (randomized) patrol placement at time , and can be computed via a linear program.

Third, we create a linear program that “unifies” the graphs , and use this LP to construct the minimax strategy for the defender. The LP ensures that the (randomized) patrol placements computed in each are consistent with one another, in the sense that there is a valid transition from one round to the next, without violating the speed restriction. To accomplish this, the LP finds min-cost flows in each , and includes additional linear constraints that guarantee consistency. We post-process the solution of this LP and remove the crossing edges in the flows. Finally, we incrementally construct a mixed strategy of the defender based on the post-processed solution and prove that it is indeed the optimal strategy.

In the continuous version of the problem, patrol locations can take arbitrary real values, and the target locations are rational. We first re-scale all target locations to integers, and prove that this re-scaled problem instance admits a discrete solution. Then we use the algorithm from the discretized version. It is essential that the running time of the latter is polylogarithmic in the number of patrol locations.

Related work. Security games have been studied extensively in the past decade, see the book [18] as well as more recent work, e.g. [8, 5, 12, 20, 19]. The research concerned both theoretical foundations as well as applications. Publicized real-world deployments include: US Coast Guard patrol boats [8], canine-patrol and vehicle-checkpoints scheduling in Los Angeles airport (LAX) [16], scheduling flights for air marshals by US Federal Air Marshal Service [10], airport passenger screening by US Transportation Security Administration [6], fare inspection in Los Angeles transit system [21], and wildlife protection in Malaysia [9].

Most relevant to the present work are papers on computing the minimax strategy in zero-sum spatio-temporal security games. While the initial work assumed static targets [18], some of the later work addressed moving targets [5, 8, 20] (as discussed above). On a related note, if the patrols are allowed to accelerate, with an upper bound on the acceleration, then computing the defender’s minimax strategy becomes NP-hard [20, 19]. Other work concerned solving security games that are not (necessarily) spatio-temporal or zero-sum, e.g. [7, 11, 20]. A notable line of work in security games assumes that the defender does not fully know attackers’ values for the targets, but can learn more about them over multiple rounds of interaction with the said attackers (see [15] for a recent survey of a subset of this work, as well as [13, 14, 4, 2]).

In a broader game-theoretic context, our work is related to Stackelberg games and equilibrium computation. Originally introduced to model competing firms, the Stackelberg game is a classic concept in game theory which appears in many textbooks and countless papers.

Computing Nash Equilibria is a central problem in algorithmic economics. While this problem is known to be PPAD-hard in general, polynomial-time algorithms exist for many natural classes of games, particularly for zero-sum games (for background, see a survey [17] and references therein). Yet, these algorithmic results are insufficient for games in which the number of pure strategies can be exponential in the input size (see [1, 3] for examples of such games).

Further directions. Many ideas in this paper may be useful for solving other spatio-temporal security games. In particular, the overall algorithmic framework of locally solving each “time layer” under some compatibility constraints and then merging the “time layers” to compute the global optimum solution appears broadly applicable.

We believe our techniques can be extended to achieve a polynomial-time algorithm for several extensions of the model. In particular, we can incorporate additional constraints on the patrols, such as obstacles that the patrols cannot cross over, or speed limits that depend on a particular location. We can also handle scenarios when the spatial domain or the timeline are not evenly discretized. (For ease of presentation, we do not include these extensions in the present paper.)

A general way to model such extensions is to assume that the range of valid movements for each location is given in the input. Whenever we still have the property that the patrols do not need to cross each other in an optimal solution (which indeed is a very natural property for homogeneous patrols), our techniques achieve an algorithm that runs in time polynomial in the input size. However, the input size for this extended model gets large, and no longer scales polylogarithmically in the number of patrol locations.

That said, some important special cases allow for succinct input. For example, a small number of obstacles can be specified directly, rather than given implicitly via the ranges of valid movements. Designing a polynomial-time algorithm for such cases requires a problem-specific pre-processing step for partitioning the locations (which could potentially be very different from the partitioning step in this paper). However, the rest of the algorithm could be essentially the same.

It is very tempting to extend our model to a two-dimensional space. We believe some of our techniques can be useful for this extension, most importantly the compatibility constraint technique from Section 3.3. The main challenge in extending our approach is an appropriate generalization of the “day graphs”.

## 2 Preliminaries

Our goal is to find an optimal patrol scheduling strategy for the defender to protect a set of mobile targets in a one-dimensional space. Figure 1 illustrates the problem in a 2-D diagram; the x-axis denotes the evenly discretized temporal domain containing time points, and the y-axis denotes the one-dimensional space of length . We say a spatial position is above , if and similarly define a spatial position to be under , if .

The defender has homogeneous patrols to protect a set of moving targets from a potential attack. Patrols have a maximum speed of . This means, a move from a position at time to at time is invalid if . We consider two models of the problem: discretized model, denoted by (Figure 1-a), and continuous model, denoted by (Figure 1-b). In the discretized model, the position of any patrol at any time point is an integer between 0 and , but in the continuous model, the patrol locations are not restricted to be integers (note that the temporal domain is still discretized). Furthermore, for any target , and respectively denote its position and weight at time . Note that in both models, there is no restriction on the position and the speed of targets and the weight of the targets could change from time to time (e.g., ferries may not carry the same number of people at different times). The patrols protect any target within their protection radius (a fixed number for all patrols). That is, a patrol at position at time , protects a target , if . In Figure 1, the grey ranges around patrols denote the area they protect. We denote the set of targets, the set of spatial positions, the set of patrols and the set of all time points by , , and respectively.
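The two geometric rules above can be sketched in a few lines of Python. The names `max_speed` and `radius` are placeholders for the speed limit and the protection radius, and the non-strict inequalities are an assumption about how the boundary cases are treated.

```python
def is_valid_move(x_now: float, x_next: float, max_speed: float) -> bool:
    """A patrol may move at most max_speed between consecutive time points."""
    return abs(x_next - x_now) <= max_speed


def protects(patrol_pos: float, target_pos: float, radius: float) -> bool:
    """A patrol protects every target within its protection radius."""
    return abs(patrol_pos - target_pos) <= radius
```

For instance, under these assumptions a patrol at position 4 with radius 1 protects a target at 4.5 but not one at 5.5.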

A patrol path is a sequence of positions , such that for any , a move from to does not violate the speed limit (the black paths in Figure 1 denote patrol paths). A pure strategy of the defender is a set of patrol paths denoted by . A mixed strategy of the defender is a probability distribution over her pure strategies. A pure strategy of the attacker is a single target-time pair , which means the attacker attacks target at time . Let be the pure strategy of the defender and be the pure strategy of the attacker. The attacker’s utility is if target is protected by at least one patrol at time , and it is if it is not protected by any patrol. We assume the game is zero-sum and find minimax strategies.
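As an illustration of the payoff rule, the following sketch evaluates the attacker's expected utility against a mixed strategy and picks a best response. It assumes, in line with the path costs defined in Section 3.2, that attacking an unprotected target yields that target's weight at the attack time and attacking a protected target yields zero. The data layout (`positions[t][a]`, `weights[t][a]`, patrol paths as lists of positions indexed by time) is hypothetical.

```python
def attacker_utility(pure_strategy, t, a, positions, weights, radius):
    """Utility of attacking target a at time t against a set of patrol paths."""
    covered = any(abs(path[t] - positions[t][a]) <= radius for path in pure_strategy)
    return 0.0 if covered else weights[t][a]


def best_response(mixed_strategy, positions, weights, radius):
    """mixed_strategy is a list of (probability, pure_strategy) pairs.

    Returns (expected utility, (t, a)) for an attack that maximizes expected utility.
    """
    best = None
    for t in range(len(positions)):
        for a in range(len(positions[t])):
            value = sum(prob * attacker_utility(pure, t, a, positions, weights, radius)
                        for prob, pure in mixed_strategy)
            if best is None or value > best[0]:
                best = (value, (t, a))
    return best
```

The defender's minimax problem is then to choose the mixed strategy that makes the first component of `best_response` as small as possible.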

Without loss of generality, we can assume . This observation comes from the fact that with only patrols, the defender can provide a 100% protection without needing any more patrols. To do this, for any target-time pair , the defender can put a still patrol at the location of target at time .

## 3 Discrete Model

The main goal of this section is to prove the following theorem:

###### Theorem 3.1

There is a polynomial time (in input size) algorithm to solve .

### 3.1 Partitioning The Positions

The number of pure strategies in a single time point, even with only one patrol, is not polynomial in the input size, since the number of possible locations, , could be exponentially larger than the input size. To overcome this difficulty, we partition the spatial positions into polynomially many sets of consecutive positions which we call intervals and we only keep track of these intervals instead of maintaining the exact position of a patrol within the intervals. For example, assume there is only one target, one patrol and , then it only matters whether the patrol’s protection range contains the location of the target or not and the exact position of the patrol does not matter.

We use Algorithm 1 to partition the positions into meaningful intervals. For any given time point , the function GetIntervalPoints, generates a sorted array of numbers that we call interval points and GetIntervals uses these generated interval points to partition the spatial positions of any time point to intervals:

$$I_t = \langle [p_1, p_2), [p_2, p_3), \ldots, [p_{n_t-1}, p_{n_t}) \rangle.$$

The intervals are assumed to be left-closed and right-open to simplify the calculations. We use to denote the -th interval in and use to denote the set of consecutive intervals .

The following lemma proves the total number of intervals is polynomial in the input size and as a corollary of that, Algorithm 1 runs in polynomial time in the input size.

###### Lemma 3.2

The total number of intervals created by Algorithm 1 is .

Proof. To prove this, we charge any interval point to a target/time pair and show no target/time pair will be charged more than times and since there are at most target/time pairs, there will not be more than interval points. Note that there are two ways for interval points to be added to a partitioning set (Figure 2):

1. For any target at time point , two interval points and are added to . We charge these two interval points to .

2. For any interval point in two interval points and are added to . We recursively charge these two intervals to the target/time pair that the interval point is charged to.

This means an interval point could be charged to a target/time pair , only if or for any integer where . Although this condition is only necessary and not sufficient, it implies that at most interval points could be charged to an arbitrary target/time pair, and therefore the total number of interval points is .

Note that since by Lemma 3.2, the total number of interval points at any time point is polynomial in the input size, function GetIntervalPoints(, ), which is simply a loop over and the targets, runs in polynomial time.

###### Corollary 3.3

Algorithm 1 halts in polynomial time in the input size.
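To make the backward construction concrete, here is a minimal Python sketch that follows the two rules used in the proof of Lemma 3.2. It is not a transcription of Algorithm 1: the exact offsets are assumptions chosen so that, with an integer speed limit and patrol positions `0..max_position`, all positions inside one interval protect the same targets and reach the same intervals of the next round. The names `interval_points`, `intervals`, `max_position`, `radius` and `speed` are hypothetical.

```python
import math


def interval_points(target_pos, max_position, radius, speed):
    """target_pos[t][a] is the position of target a at round t (0-based).

    Returns, for every round, the sorted list of interval points.
    """
    lo, hi = 0, max_position + 1            # patrol positions are the integers lo..hi-1
    points = [None] * len(target_pos)
    nxt = set()                             # interval points of the following round
    for t in range(len(target_pos) - 1, -1, -1):   # last round first
        cur = {lo, hi}
        for p in target_pos[t]:
            cur.add(math.ceil(p - radius))         # first integer position protecting p
            cur.add(math.floor(p + radius) + 1)    # first integer position past the range
        for q in nxt:                       # pull the next round's points back in time
            cur.add(q - speed)
            cur.add(q + speed)
        cur = {q for q in cur if lo <= q <= hi}
        points[t] = sorted(cur)
        nxt = cur
    return points


def intervals(points_t):
    """Left-closed, right-open intervals between consecutive interval points."""
    return list(zip(points_t, points_t[1:]))
```

For two rounds with a single target at positions 5 and then 2.5, `max_position=10`, `radius=1` and `speed=2`, the sketch yields the points `[0, 2, 4, 11]` for the last round and `[0, 2, 4, 6, 7, 9, 11]` for the first, independently of how large `max_position` is beyond the clamping.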

The following lemmas prove two important properties of the partitions generated by Algorithm 1. These properties basically imply all patrol locations within the same interval are equivalent as far as the problem is concerned.

###### Lemma 3.4

Let and be two patrols in the same interval at any time . The set of targets that and protect at time are equal.

###### Lemma 3.5

Let and be two arbitrary intervals in and respectively. If there exists a feasible move from an arbitrary position in to a position in , for any position in , there exists a feasible move to a position in .

We can now define the feasible set of a set of consecutive intervals:

###### Definition 3.6 (feasible sets)

We define the feasible set of , denoted by , to be a subset of , containing an interval iff there exists a feasible move from a position in some interval in to . We may occasionally abuse this notation and use the simpler form of instead of (Figure 3).
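A feasible set can be computed directly from interval endpoints. The sketch below assumes integer patrol positions, an integer speed limit named `speed`, and intervals stored as left-closed, right-open `(start, end)` pairs; the function name and argument layout are illustrative only.

```python
def feasible_set(next_intervals, lo, hi, speed):
    """Indices of the intervals in next_intervals reachable from positions lo..hi-1.

    For a run of consecutive intervals, lo is the start of the first interval
    and hi is the end of the last one.
    """
    reach_lo, reach_hi = lo - speed, hi - 1 + speed   # inclusive range of reachable positions
    return [idx for idx, (s, e) in enumerate(next_intervals)
            if s <= reach_hi and e - 1 >= reach_lo]
```

With next-round intervals `[(0, 2), (2, 4), (4, 11)]` and `speed=2`, positions 0..1 reach the first two intervals and positions 9..10 reach only the last one. Because the reachable positions form one contiguous range, the returned indices are always consecutive.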

It is easy to see the following corollary of Definition 3.6:

###### Corollary 3.7

For any there exists a consecutive interval set such that

$$\mathrm{feas}_t(I_t[i:j]) = I_{t+1}[i':j'].$$

###### Definition 3.8 (interval path)

We define an interval path to be a sequence of intervals , such that for any time point , . Moreover, for any interval path , we define to be the set of all patrol paths that are within . More formally, a patrol path is within if and only if for any time point , is in interval .

Note that any patrol path is within exactly one interval path, since the intervals do not overlap and together cover all locations. Conversely, the following lemma shows that is never empty for an interval path . The proof chooses any position in and follows valid moves until it reaches a position in .

###### Lemma 3.9

For any interval path , there is at least one patrol path in .

Note that by Lemma 3.4, two patrols that are in the same interval, protect the same set of targets at that specific time. This implies that the patrol paths that are within the same interval path, protect the same set of targets at all times and could be replaced with one another in any strategy, without changing the utilities. This means the amount of information encoded in an interval path is sufficient to describe the important characteristics of strategies and find the optimal one.

### 3.2 Strategies In a Single Time Point

In this section we explain how to locally find the best strategy for a single time point ignoring the speed limitations. Note that although we proved the number of intervals is polynomial in the input size, there are still exponentially many different ways to place our patrols in them. This section describes how we can resolve this problem and find the best strategy.

We use the term snapshot to denote a patrol placement at a single time point and formally define it as follows:

###### Definition 3.10 (snapshots)

A pure snapshot at time point , is an assignment of patrols to intervals of time . We denote it by a sorted sequence of intervals such that for any , . A mixed snapshot, denoted by is a probability distribution over pure snapshots where denotes the probability of choosing pure snapshot and .

For any time point , we construct a weighted directed graph (called a day graph) and give a one-to-one mapping between pure snapshots at time and paths from (source vertex) to (sink vertex), where and are two specific vertices of . Moreover, we map any mixed snapshot at time point , to a network flow of 1 unit from to . This mapping is not necessarily one-to-one and many mixed snapshots may be mapped to the same network flow; however, the maximum utility of the attacker in all such mixed snapshots, will be the same.

In Definition 3.11 we formally explain how is constructed. An informal explanation of it is as follows: the vertex set of , as shown in Figure 4-a, includes a vertex , a vertex and a grid of vertices (recall that is the number of intervals at time ) each denoted by . There is an edge from to any vertex in the first column of the grid , and there is an edge from any vertex in the last column of the grid to . Also for any , , and , there is an edge from to if . Furthermore, we define a canonical path to be any path from to and define a canonical flow to be any flow of unit 1 from to (Definition 3.12). We give a one-to-one mapping between canonical paths in and pure snapshots at time point in Definition 3.13 and map any mixed snapshot to a canonical flow in Definition 3.14. Figure 4-b shows a sample day graph, a canonical path in it and its equivalent pure snapshot. Moreover, we assign weights to the edges of such that the maximum payoff of the attacker for a pure (mixed) snapshot equals the cost of its corresponding canonical path (flow). The cost of a canonical path and a canonical flow is defined in Definition 3.12.

For any target and any intervals and at time , we define the binary value to be if the following two conditions hold: (1) target is located in a position between and (non-inclusive) at time , and, (2) could not be protected by any patrol at any arbitrary position in or ; otherwise we set to be 0. We similarly define two binary variables and for the border cases. We set to be 1 iff target is in a position below where no patrol in can protect it and set to be 1 iff target is in a position above where no patrol in can protect it. Assume a patrol placement does not protect a target at time . Let and be the closest intervals to target , that contain at least one patrol and are below and above respectively (set them to be if no such interval exists), then by definition . Using this definition, we can now formally define a day graph and canonical path/flow.

###### Definition 3.11 (day graph)

Given a time point , we construct graph as follows:

1. Graph contains a vertex (source), a vertex (sink) and other vertices, each denoted by for and .

2. For any such that , there is an edge from to . If denotes an edge of this kind, for any target , we define to be .

3. For any such that , there is an edge from to . If denotes an edge of this kind, for any target , we define to be .

4. For any two vertices and , if , there is an edge from to . If denotes this edge, for any target , we define .

###### Definition 3.12 (canonical path/flow)

In a day graph any path from to is a canonical path. Let be the set of edges in a canonical path, and be an arbitrary target. We define the cost of this canonical path for target to be:

$$\sum_{e \in E} c_{e,a} \cdot w_{t,a} \tag{1}$$

Also, any flow of unit 1 from to is a canonical flow. We denote any canonical flow with a function , where denotes the flow passing through an edge . The cost of an arbitrary canonical flow , for day graph is:

$$\max_{a \in [A]} \sum_{e \in E(G_t)} f(e) \cdot c_{e,a} \cdot w_{t,a} \tag{2}$$

Intuitively speaking, is 1 if and only if having edge in a canonical path implies that in the “equivalent” pure snapshot of , target is not covered by any patrol. The formal mapping of canonical paths and flows to snapshots is as follows.

###### Definition 3.13 (pure snapshot mapping)

Let be a pure snapshot (recall that for any ). We map to the following canonical path:

$$p_t = \langle S_t, V_t[1, y_1], V_t[2, y_2], \ldots, V_t[K, y_K], S'_t \rangle$$

and similarly map to and say and are equivalent.

###### Definition 3.14 (mixed snapshot mapping)

Let be a mixed snapshot at time . Also let denote a flow of unit from to through the edges of the equivalent canonical path of . We construct flow as follows: for any edge of , . Note that since by definition of a mixed snapshot, , is a flow of unit 1 from to , and hence is a canonical flow. We map to .

In Lemma 3.15 we prove that the payoff of the attacker if he attacks target at time while the placement of patrols is represented by the pure strategy , equals the cost of the target in the canonical path equivalent to . Then in Lemma 3.16 we prove that the maximum payoff of the attacker at time while the strategy of the defender is represented by the mixed snapshot is equal to the cost of the canonical flow equivalent to . Both lemmas follow directly from the definitions; due to space limitations, their formal proofs are left to the appendix.

###### Lemma 3.15

Let be a pure snapshot at time , and be an arbitrary target. The payoff of the attacker with respect to , if he attacks the target at time , equals the cost of the target in the canonical path equivalent (Definition 3.13) to .
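The statement can be checked numerically on small instances. The sketch below is an index-based variant of the edge costs, not the paper's exact definition: it assumes every target has a non-empty, contiguous range of protecting intervals (which holds, for example, when the radius is at least 1/2 and the intervals separate protecting from non-protecting positions, as in Lemma 3.4). An edge between two occupied intervals is charged for a target exactly when that whole range lies strictly between them. All function names are hypothetical.

```python
def protecting_range(intervals, target_pos, radius):
    """Indices (first, last) of the intervals whose positions protect the target.

    By Lemma 3.4 one position per interval suffices; we test the left endpoint.
    """
    idx = [i for i, (s, _) in enumerate(intervals) if abs(s - target_pos) <= radius]
    return idx[0], idx[-1]


def path_cost(snapshot, intervals, target_pos, weight, radius):
    """Cost of the canonical path of a sorted pure snapshot for one target."""
    first, last = protecting_range(intervals, target_pos, radius)
    # Virtual intervals -1 and len(intervals) stand for the source and sink edges.
    stops = [-1] + list(snapshot) + [len(intervals)]
    uncovered_edges = sum(1 for i, j in zip(stops, stops[1:]) if i < first and last < j)
    return uncovered_edges * weight


def direct_payoff(snapshot, intervals, target_pos, weight, radius):
    """Attacker payoff computed straight from the patrol placement."""
    covered = any(abs(intervals[y][0] - target_pos) <= radius for y in snapshot)
    return 0.0 if covered else weight
```

On the intervals `[(0, 2), (2, 4), (4, 6), (6, 7), (7, 9), (9, 11)]` with a target at position 5 and radius 1, the two functions agree for every sorted placement of two patrols, which is the behaviour the lemma describes.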

###### Lemma 3.16

Let be a mixed snapshot at time and let denote the canonical flow that is mapped to. The maximum expected payoff of the attacker at time with respect to , equals the cost of .
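A small sketch of the mapping from a mixed snapshot to a canonical flow, and of the flow cost in Eq. (2), may help. It uses a simplified, index-based edge cost that is an assumption of this example: each target is described by the (first, last) indices of the intervals that protect it, and an edge is charged for the target when that range lies strictly between the edge's two endpoints. The identifiers are hypothetical.

```python
from collections import defaultdict


def canonical_flow(mixed_snapshot, num_intervals):
    """mixed_snapshot: list of (probability, sorted tuple of interval indices)."""
    flow = defaultdict(float)
    for prob, snapshot in mixed_snapshot:
        # -1 stands for the source and num_intervals for the sink.
        stops = [-1] + list(snapshot) + [num_intervals]
        for k, edge in enumerate(zip(stops, stops[1:])):
            flow[(k, edge)] += prob          # k is the grid column the edge leaves from
    return flow


def flow_cost(flow, protecting_ranges, weights):
    """Maximum over targets of the sum over edges of f(e) * c_{e,a} * w_{t,a}."""
    return max(
        w * sum(f for (_, (i, j)), f in flow.items() if i < first and last < j)
        for (first, last), w in zip(protecting_ranges, weights)
    )
```

For example, mixing equally between a placement that covers both of two targets and one that covers neither gives each target an uncovered probability of 1/2, so the flow cost is half the larger weight.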

### 3.3 Best Strategy For All Time Points

Lemma 3.16 implies that if our goal is to minimize the maximum payoff of the attacker at a single time point , it suffices to find a canonical flow in with minimum cost. This works for the special case when , but it does not account for the movement of patrols and their speed limits. More precisely, a pure strategy for the defender could be shown as a sequence of pure snapshots . However, there is one important condition: for any , there must be a feasible transition from to . The same holds for mixed snapshots: two consecutive ones are not necessarily compatible. In this section we resolve this issue and prove Theorem 3.1.

Our algorithm to find the optimal strategy of the defender consists of three main steps. In the first step, which is explained in more detail in Section 3.3.1, we run an LP that returns a canonical flow for each day graph , …, . Apart from the constraints to ensure we get valid canonical flows with minimum overall cost, our LP contains an extra constraint for compatibility of these canonical flows. In the second step (Section 3.3.2), while keeping the overall characteristics of these canonical flows unchanged, we adjust them in a way to make sure no two crossing edges in any of the day graphs have a positive flow. Finally, in the third step (Section 3.3.3), we construct a mixed strategy for the defender based on the adjusted canonical flows.

Let and denote two pure snapshots representing the placement of patrols in a valid pure strategy at two consecutive times. In the following lemma we prove that there exists a feasible move from -th interval of to the -th interval of (recall that intervals in pure snapshots are sorted based on their position). We prove this lemma by induction on the number of patrols. At each step we prove that there exists a feasible move from the top most interval in to the top most interval in and we prove if we match these two together and remove them, we can construct another pure strategy that contains the remaining intervals.

###### Lemma 3.17

If and are two pure snapshots at time and in at least one valid pure strategy , then for any we have .
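The idea behind the lemma can be illustrated at the level of individual positions rather than intervals: if some assignment of patrols between two consecutive rounds respects the speed limit, then matching the patrols in sorted order does too. The brute-force check below is only an illustration under that position-level simplification; the function names and the `speed` parameter are hypothetical.

```python
from itertools import permutations


def sorted_matching_is_valid(before, after, speed):
    """Match the k-th lowest position before with the k-th lowest position after."""
    return all(abs(x - y) <= speed for x, y in zip(sorted(before), sorted(after)))


def some_matching_is_valid(before, after, speed):
    """Try every assignment of patrols to next-round positions."""
    return any(all(abs(x - y) <= speed for x, y in zip(before, perm))
               for perm in permutations(after))
```

Whenever `some_matching_is_valid` holds, `sorted_matching_is_valid` holds as well, which is why the defender never needs patrols to cross each other.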

In the following definition, we define what it means for a patrol path to be intervally above, below or equal to another patrol path:

###### Definition 3.18

Let and be two patrol paths. We say and are intervally equal if for any , and are in the same interval. We also define to be intervally under , if for any , either or and are in the same interval. Similarly, we define to be intervally above , if is intervally under .

Next, in Lemma 3.19 we prove that there exists an optimal strategy of the defender that for any pure strategy in its support, there is an ordering of interval paths in such that, -th interval path is always intervally under -th interval path if . To prove this we use Lemma 3.17 that indicates there exists a possible move from the -th interval in pure snapshot to the -th interval in pure snapshot if and represent the patrols’ placement of a pure strategy in two consecutive times and . For any patrol , we construct an interval path that contains the -th interval of all the pure snapshots in p, and we assign patrol to this interval path. It is easy to see that if we order the patrols from 1 to the interval path assigned to patrol is under the interval path assigned to patrol if . This lemma is very similar to Lemma 3 of [20] but adapted to interval paths.

###### Lemma 3.19

There exists an optimal mixed strategy of the defender, such that for every pure strategy in its support there is an ordering of interval paths such that the following condition holds for this ordering: for any two interval paths and , in the pure strategy p, is intervally under if .

Again, due to space limitations, the full proof is left to the appendix. Intuitively, however, starting from any given optimal solution one can swap the remaining paths of any two patrols that cross each other without changing the utilities. Repeating this eventually resolves all crossings and gives the desired optimal solution.

Let denote an optimal strategy of the defender that satisfies the condition mentioned in Lemma 3.19, and let denote the ordering of interval paths in pure strategy in support of such that is intervally under if . Without loss of generality we assume for any the same patrol is assigned to interval path for all in support of , and it is denoted by . Therefore, if denotes a pure snapshot that represents the patrols’ placement in an arbitrary time point in pure strategy in support of , is the position of patrol in pure strategy at this time. Moreover, let denote the mixed snapshot of strategy at time . The flow passing through the vertex , denotes the probability with which patrol is placed in the -th interval at time . So, all the data related to position of patrol at time is in the column of the day graph of time . We use this later in the paper.

#### 3.3.1 Linear Programming

In this section we explain how the first step of our algorithm, the LP, works.

Note that a flow of 1 unit in with minimum weight, minimizes the attacker’s payoff at time . To minimize the attacker’s payoff at all time points, we need to minimize the cost of the canonical flow with the maximum cost. To do this, for any edge in any day graph we define an LP variable which specifies the amount of flow passing through . Moreover for any time point , we include the following constraints in our LP:

1. For each vertex of (except for the source vertex and the sink vertex ) the amount of ingoing flow to is equal to the amount of outgoing flow from .

2. The amount of outgoing flow from is 1.

3. The amount of ingoing flow to is 1.

4. The amount of flow passing through any edge is not negative.

5. The cost of flow through any edge and for any target in , specified by is not more than .
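As a minimal sketch of constraints 1 to 4 above, the following Python function checks whether a given assignment of edge flows is a unit flow from a source to a sink. The names `is_unit_flow`, `"s"`, and `"t"` are illustrative assumptions and do not come from the paper; constraint 5 is omitted because it depends on the cost function of the day graph.

```python
from collections import defaultdict
from fractions import Fraction


def is_unit_flow(flow, source="s", sink="t"):
    """Check constraints 1-4 for a dict mapping edges (u, v) to flow amounts."""
    # Constraint 4: no edge carries negative flow.
    if any(f < 0 for f in flow.values()):
        return False
    out_flow = defaultdict(Fraction)
    in_flow = defaultdict(Fraction)
    for (u, v), f in flow.items():
        out_flow[u] += f
        in_flow[v] += f
    # Constraints 2 and 3: one unit leaves the source and enters the sink.
    if out_flow[source] != 1 or in_flow[sink] != 1:
        return False
    # Constraint 1: conservation at every other vertex.
    inner = (set(out_flow) | set(in_flow)) - {source, sink}
    return all(in_flow[v] == out_flow[v] for v in inner)
```

Exact rationals (`Fraction`) are used so that equality tests are not affected by floating-point rounding.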

And we set the objective function of our LP to minimize , which is the overall cost of canonical flows. However, as we said earlier the canonical flows we find must be compatible; therefore apart from the aforementioned constraints, we define a compatibility constraint. Recall that denotes a collection of intervals at time and contains an interval , iff there is a valid move from an interval in to . We define a very similar concept for day graphs:

###### Definition 3.20

Let denote the set of consecutive grid vertices in . Recall that by definition of , any vertex in is equivalent to an interval in . We define as follows: is a subset of grid vertices of containing a vertex if and only if is equivalent to an interval in .

The compatibility constraint we use in our LP is as follows: for any set of consecutive vertices , the amount of flow passing through the vertices in is not more than the amount of flow passing through the vertices in . Intuitively, this constraint indicates that for any set of consecutive intervals , the probability that there exists a patrol in it, should not be more than the probability of having a patrol in its feasible set () in the next time point. Note that by definition, contains all of the valid intervals that a patrol in can move to; therefore it is obvious why this constraint is necessary. The sufficiency of this constraint to prove compatibility of snapshots, however, comes later when we explain how we construct an optimal strategy based on the adjusted LP solution. The formal definition of the LP is given in Linear Program 1.
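The compatibility constraint can be illustrated with a small check on vertex flows. The sketch below is only an illustration under simplifying assumptions: it looks at a single column of grid vertices at two consecutive time points, and it takes the feasible set as an input `reach`, where `reach[i]` is a hypothetical inclusive index range of vertices at the next time point that can be reached from vertex `i`. None of these names appear in the paper.

```python
from fractions import Fraction


def is_compatible(mass_now, mass_next, reach):
    """Check that every run of consecutive vertices carries no more flow
    than its feasible set at the next time point."""
    n = len(mass_now)
    for a in range(n):
        feasible = set()
        total = 0
        # Grow the consecutive set [a..b] one vertex at a time.
        for b in range(a, n):
            total += mass_now[b]
            lo, hi = reach[b]
            feasible.update(range(lo, hi + 1))
            if total > sum(mass_next[j] for j in feasible):
                return False
    return True
```

For example, with two vertices that can each only stay in place, moving all of the flow to the first vertex at the next time point violates the constraint for the second vertex.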

By the end of Section 3.3, we prove the solution of LP 1 is equal to the utility of the attacker if both players play their optimal strategies. Lemma 3.21 proves a weaker claim:

###### Lemma 3.21

The solution of Linear Program 1 gives a lower bound for the utility of the attacker when both players play their optimal strategies.

To prove Lemma 3.21, we start from an optimal strategy of the defender satisfying the condition of Lemma 3.19 that the interval paths do not cross each other. Based on this strategy, we construct a feasible solution for LP 1 in which the value of is equal to the maximum possible utility of the attacker. Note that this only proves LP 1 gives a lower bound for the utility of the attacker when both players play their minimax strategies since the feasible solution we considered is not necessarily the optimum solution of the LP (although as we said before, we will later prove that they are exactly the same).

#### 3.3.2 Adjusting The LP Solution

In this section we give an algorithm that adjusts any optimal solution of LP 1 to resolve its “crossing flows”. We later use the adjusted solution to construct the defender’s optimal strategy.

We start by defining what we mean by crossing edges and crossing flows:

###### Definition 3.22 (crossing edges and crossing flow)

Let and be two arbitrary edges of day graph where . We say and cross, if and only if . Moreover, if is a canonical flow of , the crossing edge pair is a crossing flow in , if and .

In the following definition, we give a total ordering on the crossing flows in a canonical flow, which we later use in the algorithm we provide to resolve them.

###### Definition 3.23 (crossing flows’ ordering)

Let and be two crossing flows of a canonical flow. Also let , , , and be the vertices of these edges. We say if and only if one of the following conditions holds:

1. and .

2. and and .

3. and and and .

4. and and and and .

Algorithm 2 is the formal pseudo-code of how we resolve all crossing flows. At each step, the algorithm resolves the minimum crossing flow (minimum based on the total ordering defined in Definition 3.23) and continues this process until there is no other one. Figure 5 illustrates how a single crossing flow is resolved.
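The uncrossing step can be sketched as follows. This is not Algorithm 2 itself: it is a simplified illustration under stated assumptions. Edges are pairs `(i, j)` going from row `i` of one column to row `j` of the next column, two edges are assumed to cross when their tails and heads are in opposite order, the uncrossed edges are assumed to exist in the graph, and the pair to resolve is picked in plain lexicographic order rather than by the ordering of Definition 3.23. Costs are not modeled, and the sketch is intended for rational flow values only.

```python
from fractions import Fraction


def resolve_crosses(flow):
    """Repeatedly uncross pairs of positive-flow edges between two columns."""
    flow = {e: f for e, f in flow.items() if f > 0}
    while True:
        # Find a pair (a, b), (c, d) with a < c but b > d.
        pair = next(
            ((e1, e2) for e1 in sorted(flow) for e2 in sorted(flow)
             if e1[0] < e2[0] and e1[1] > e2[1]),
            None,
        )
        if pair is None:
            return flow
        (a, b), (c, d) = pair
        delta = min(flow[(a, b)], flow[(c, d)])
        # Move delta units from the crossing pair onto the uncrossed pair;
        # the flow through each of the four vertices stays the same.
        for e, change in (((a, b), -delta), ((c, d), -delta),
                          ((a, d), delta), ((c, b), delta)):
            flow[e] = flow.get(e, 0) + change
            if flow[e] == 0:
                del flow[e]
```

Each step removes at least one of the two crossing edges and leaves the total flow in and out of every vertex unchanged.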

###### Lemma 3.24

Function ResolveCrosses of Algorithm 2 does not increase the total cost of the input canonical flow.

To prove Lemma 3.24, we consider all different locations of targets for which changing the flows might affect the utility of players and prove in none of these cases the total cost is increased. The complete proof is left to the appendix.

###### Lemma 3.25

The running time of the function ResolveCrosses in the Algorithm 2 is polynomial.

The proof scheme of Lemma 3.25 is to show the minimum crossing flow at step is strictly less than the minimum crossing flow at step (after the previously minimum crossing flow is resolved). Consequently, since the total number of possible crossing edges of a day graph is polynomial, the number of steps until the algorithm halts is polynomial. Again, due to space limitations, we leave the formal proof of this lemma to the appendix.

Note that another property of Algorithm 2 is that the flow passing through a vertex does not change; only the amount of flow passing through the edges changes (Figure 5). This proves that most of the constraints of LP 1 still hold. The only two constraints that consider the flow passing through the edges, and not the vertices, are number 7 and number 8. The former holds since the process does not produce any negative flow, and the latter holds since by Lemma 3.24 the total cost does not increase.

Consequently, the following statement is true since we can first solve LP 1 by any polynomial time LP-solver and then run Algorithm 2 on its solution.

###### Corollary 3.26

There exists a polynomial time algorithm that finds , a collection of canonical flows, that is a solution of LP 1, and for any , does not contain any crossing flow.

#### 3.3.3 Constructing A Strategy

Assuming is a non-crossing solution of LP 1, this section gives an algorithm to find a mixed strategy of the defender that is equivalent to the set of canonical flows in and finally proves Theorem 3.1. We first define what we mean by the top-most flow path of a non-crossing canonical flow:

###### Definition 3.27 (Top-Most Flow Path)

Let be a canonical flow of without any crossing flows. We say a canonical path of is a flow path of if for any (), . The top-most flow path of is the flow path of that is above all other flow paths of (that is well-defined because does not have any crossing flow). The size of the flow path of , denoted by , is if for any , and there exists an edge of such that .

###### Lemma 3.28

Let be a collection of non-crossing canonical flows that satisfies the compatibility constraint (constraint number 9) of LP 1. For any that , the top-most flow paths of and are compatible.

Proof. Let and respectively denote the top-most flow paths of and . It suffices to prove for any , there is a valid movement from the corresponding interval of to the corresponding interval of . To do so, we assume this is not the case and obtain a contradiction. Let for some , then one of the following conditions should hold:

1. is below the feasible range .

2. is above the feasible range .

If the first condition is true, the contradiction is that cannot be in the top-most flow path of . To see this, note that we know by constraint 9 of LP 1 that the total flow passing through the vertices of is not less than the flow passing through , therefore there is a vertex in (and above ) with a positive flow and thus cannot be in the top-most flow path of .

If the second condition is true, the contradiction is that constraint 9 cannot be satisfied. To see this, note that since is in the top-most flow path of , no flow passes through the vertices above it and therefore . However, since is above the feasible range , which means constraint 9 that indicates the value of the latter summation should not be less than the former one, cannot be satisfied.

###### Theorem 3.29

Let be a solution of LP 1 without any crossing flows. There exists a polynomial time algorithm to find a mixed strategy of the defender that is equivalent to .

To prove Theorem 3.29, we show the following iterative algorithm constructs the desired mixed strategy in polynomial time:

1. Find the top-most flow paths of .

2. Construct the pure strategy , corresponding to .

3. Add to with probability .

4. For any edge of any , decrease to .

5. If there is any flow left in , repeat all the steps.

Note that by Lemma 3.28, if the compatibility constraint of LP 1 is satisfied, the top-most flow paths are compatible. Since in the first round of the algorithm we have an actual solution of the LP, the compatibility constraint is obviously satisfied. To completely prove the correctness of this algorithm, we also need to show that after each iteration, changing the flows does not violate the compatibility constraint. Due to space limitations, we leave this part of the proof to the appendix. Furthermore, the running time of this algorithm is polynomial in the input size since in each iteration, the flow passing through at least one edge decreases to zero and the total number of edges is polynomial.
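The iterative construction can be sketched in Python as follows. This is an illustration under assumptions that are not taken from the paper: each canonical flow is a dict mapping edges `(u, v)` to rational amounts, vertices are `"s"`, `"t"`, or `(column, row)` pairs, a larger row index is treated as "higher", and the inputs are already non-crossing unit flows. The sketch does not check compatibility, and it modifies the input flows in place.

```python
from fractions import Fraction


def top_most_path(flow):
    """Follow, from the source, the positive-flow edge with the highest head."""
    path, u = [], "s"
    while u != "t":
        heads = [v for (a, v), f in flow.items() if a == u and f > 0]
        v = "t" if "t" in heads else max(heads, key=lambda x: x[1])
        path.append((u, v))
        u = v
    return path


def decompose(flows):
    """Peel off top-most flow paths until no flow is left.

    Returns a list of (probability, rows) pairs, where rows holds, for
    each time point, the tuple of rows visited by the top-most path."""
    strategy = []
    while any(f > 0 for flow in flows for f in flow.values()):
        paths = [top_most_path(flow) for flow in flows]
        # The probability is the smallest flow on any edge of any path.
        p = min(flow[e] for flow, path in zip(flows, paths) for e in path)
        for flow, path in zip(flows, paths):
            for e in path:
                flow[e] -= p
        rows = [tuple(v[1] for (_, v) in path if v != "t") for path in paths]
        strategy.append((p, rows))
    return strategy
```

Because every iteration drives at least one edge to zero, the number of iterations is at most the total number of edges, which mirrors the running-time argument above.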

We are now ready to prove Theorem 3.1.

Proof of Theorem 3.1: Recall that by Lemma 3.21, the optimal solution of LP 1 is a lower bound for the utility of the attacker when both players play their optimal (minimax) strategies. Also note that by Corollary 3.26 and Theorem 3.29, we can construct a mixed strategy of the defender that is equivalent to the optimal solution of LP 1 in polynomial time. This means the maximum utility of the attacker when the defender plays , is equal to its lower bound and therefore minimizes the maximum expected utility of the attacker: i.e., it is a minimax strategy of the defender.

## 4 Continuous Model

In this section we prove the following theorem for the continuous model:

###### Theorem 4.1

There exists a polynomial time algorithm to find an optimal solution for .

The given proof is based on a technical assumption that all numbers in the input are rational.

Proof of Theorem 4.1: The main idea of this proof is to reduce any instance of to an instance of , for which we know there exists a polynomial time algorithm.

Recall that any rational number can be represented by a fraction such that both and are integers. Let be the set of denominators in this fractional representation of all numbers in the input. We define to be the product of all numbers in . Note that the number of digits needed to represent is polynomial in the input size since every number in appears in the input. To create an instance of , we multiply all target positions, , and , given in the instance of by .

To use the algorithm for , it suffices to prove in the scaled solutions, there exists an optimal solution that places patrols only in the integer locations. To do this, we prove that for any given patrol path in the scaled version, there exists a patrol path that covers the same set of targets and for any , is an integer position. It suffices to set to be . Note that since the position of all targets and the protecting ranges of the patrols in the scaled version are all integers, this patrol path protects exactly the same set of targets. Now we can use the algorithm of Theorem 3.1 to find the optimal solution of this scaled input and then scale it back to the original size by dividing the patrols’ locations by .
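The scaling step of this reduction can be sketched as follows. The function name `scale_to_integers` is an illustrative assumption; the sketch only shows how multiplying by the product of the distinct denominators turns rational inputs into integers, and how a value is scaled back afterwards.

```python
from fractions import Fraction
from math import prod


def scale_to_integers(values):
    """Multiply rational inputs by the product of their distinct denominators."""
    denominators = {Fraction(v).denominator for v in values}
    factor = prod(denominators)
    scaled = [int(Fraction(v) * factor) for v in values]
    return factor, scaled


factor, scaled = scale_to_integers([Fraction(1, 2), Fraction(3, 4), 5])
# Dividing by the same factor recovers the original values.
restored = [Fraction(x, factor) for x in scaled]
```

Since every denominator divides the factor, each product is an integer, so the conversion with `int` is exact.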

## References

• [1] AmirMahdi Ahmadinejad, Sina Dehghani, MohammadTaghi Hajiaghayi, Brendan Lucier, Hamid Mahini, and Saeed Seddighin. From duels to battlefields: Computing equilibria of blotto and other games. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016.
• [2] Maria-Florina Balcan, Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Commitment without regrets: Online learning in stackelberg security games. In Proceedings of the Sixteenth ACM Conference on Economics and Computation, EC, pages 61–78, 2015.
• [3] Soheil Behnezhad, Sina Dehghani, Mahsa Derakhshan, MohammadTaghi HajiAghayi, and Saeed Seddighin. Faster and simpler algorithm for optimal strategies of blotto game. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
• [4] Avrim Blum, Nika Haghtalab, and Ariel D. Procaccia. Learning optimal commitment to overcome insecurity. In Annual Conference on Neural Information Processing Systems (NIPS), pages 1826–1834, 2014.
• [5] Branislav Bošanskỳ, Viliam Lisỳ, Michal Jakob, and Michal Pěchouček. Computing time-dependent policies for patrolling games with mobile targets. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 3, pages 989–996. International Foundation for Autonomous Agents and Multiagent Systems, 2011.
• [6] Matthew Brown, Arunesh Sinha, Aaron Schlenker, and Milind Tambe. One size does not fit all: A game-theoretic approach for dynamically and effectively screening for threats. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 425–431, 2016.
• [7] Vincent Conitzer and Tuomas Sandholm. Computing the optimal strategy to commit to. In Proceedings 7th ACM Conference on Electronic Commerce (EC-2006), pages 82–90, 2006.
• [8] Fei Fang, Albert Xin Jiang, and Milind Tambe. Optimal patrol strategy for protecting moving targets with multiple mobile resources. In Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems, pages 957–964. International Foundation for Autonomous Agents and Multiagent Systems, 2013.
• [9] Fei Fang, Thanh Hong Nguyen, Rob Pickles, Wai Y. Lam, Gopalasamy R. Clements, Bo An, Amandeep Singh, Milind Tambe, and Andrew Lemieux. Deploying PAWS: field optimization of the protection assistant for wildlife security. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, pages 3966–3973, 2016.
• [10] Christopher Kiekintveld, Manish Jain, Jason Tsai, James Pita, Fernando Ordóñez, and Milind Tambe. Computing optimal randomized resource allocations for massive security games. In Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems-Volume 1, pages 689–696. International Foundation for Autonomous Agents and Multiagent Systems, 2009.
• [11] Dmytro Korzhyk, Vincent Conitzer, and Ronald Parr. Complexity of computing optimal stackelberg strategies in security resource allocation games. In AAAI, 2010.
• [12] Joshua Letchford and Vincent Conitzer. Solving security games on graphs via marginal probabilities. In Proceedings of the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013.
• [13] Joshua Letchford, Vincent Conitzer, and Kamesh Munagala. Learning and approximating the optimal strategy to commit to. In Algorithmic Game Theory, Second International Symposium, SAGT, pages 250–262, 2009.
• [14] Janusz Marecki, Gerald Tesauro, and Richard Segal. Playing repeated stackelberg games with unknown opponents. In International Conference on Autonomous Agents and Multiagent Systems, AAMAS, pages 821–828, 2012.
• [15] Giuseppe De Nittis and Francesco Trovò. Machine learning techniques for stackelberg security games: a survey. Technical report on arxiv.org, abs/1609.09341, 2016.
• [16] James Pita, Manish Jain, Janusz Marecki, Fernando Ordóñez, Christopher Portway, Milind Tambe, Craig Western, Praveen Paruchuri, and Sarit Kraus. Deployed armor protection: the application of a game theoretic model for security at the los angeles international airport. In Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems: industrial track, pages 125–132. International Foundation for Autonomous Agents and Multiagent Systems, 2008.
• [17] Tim Roughgarden. Computing equilibria: A computational complexity perspective. Economic Theory, 42(1):193–236, 2010.
• [18] Milind Tambe. Security and game theory: algorithms, deployed systems, lessons learned. Cambridge University Press, 2011.
• [19] Haifeng Xu. The mysteries of security games: Equilibrium computation becomes combinatorial algorithm design. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC, pages 497–514, 2016.
• [20] Haifeng Xu, Fei Fang, Albert Xin Jiang, Vincent Conitzer, Shaddin Dughmi, and Milind Tambe. Solving zero-sum security games in discretized spatio-temporal domains. In AAAI, pages 1500–1506. Citeseer, 2014.
• [21] Zhengyu Yin, Albert Xin Jiang, Milind Tambe, Christopher Kiekintveld, Kevin Leyton-Brown, Tuomas Sandholm, and John P Sullivan. Trusts: Scheduling randomized patrols for fare inspection in transit systems using game theory. AI Magazine, 33(4):59, 2012.

## Appendix A Missing Proofs

### Lemma 3.4

Statement. Let and be two patrols in the same interval at any time . The set of targets that and protect at time are equal.

Proof. We suppose this is not the case and obtain a contradiction. Without loss of generality, assume protects a target at time that does not. Lines 14 and 15 of Algorithm 1 indicate there are two interval points and at and respectively. Assume is so small that there is no valid patrol position in the non-inclusive range between and . This means a patrol has a distance of at most