Anchoring Theory in Sequential Stackelberg Games

Abstract

An underlying assumption of Stackelberg Games (SGs) is perfect rationality of the players. However, in real-life situations (which are often modeled by SGs) the followers (terrorists, thieves, poachers or smugglers) – as humans in general – may not act in a perfectly rational way, as their decisions may be affected by biases of various kinds which bound the rationality of their decisions. One of the popular models of bounded rationality (BR) is Anchoring Theory (AT), which claims that humans have a tendency to flatten probabilities of available options, i.e. they perceive a distribution of these probabilities as being closer to the uniform distribution than it really is. This paper proposes an efficient formulation of AT in sequential extensive-form SGs (named ATSG), suitable for Mixed-Integer Linear Program (MILP) solution methods. ATSG is implemented in three MILP/LP-based state-of-the-art methods for solving sequential SGs and two recently introduced non-MILP approaches: one relying on Monte Carlo sampling (O2UCT) and the other (EASG) employing Evolutionary Algorithms. Experimental evaluation indicates that both non-MILP heuristic approaches scale better in time than MILP solutions while providing optimal or close-to-optimal solutions. Apart from competitive time scalability, an additional asset of non-MILP methods is the flexibility of potential BR formulations they are able to incorporate. While MILP approaches accept BR formulations with linear constraints only, no restrictions on the BR form are imposed in either of the two non-MILP methods.

1 Introduction

Stackelberg Games (SGs) [16, 30] are a game-theoretic model which has attracted considerable interest in recent years, in particular in the Security Games area [27]. In its simplest form a Stackelberg Security Game (SSG) assumes two players: a leader who commits to a (mixed) strategy first, and a follower who makes their commitment already knowing the leader's decision. The above asymmetry of the players corresponds well to interactions between law enforcement forces (leaders) and smugglers, terrorists or poachers (followers) modeled by SSGs [7, 8, 34, 25].

A fundamental assumption in SGs is that the follower will make an optimal, perfectly rational decision exploiting knowledge about the leader’s commitment. However, in real-life scenarios the follower may suffer from cognitive biases or bounded rationality leading them to suboptimal decisions [33, 21, 7].

On a general note, bounded rationality (BR) [26] in problem-solving refers to limitations of decision-makers that lead them to take non-optimal actions. Apart from limited cognitive abilities, BR can be attributed to partial knowledge about the problem, limited resources, or an imprecisely defined goal [1, 24]. The most popular models of BR are Prospect Theory (PT) [9], Anchoring Theory (AT) [28], Quantal Response (QR) [18] and Framing Effect (FE) [29]. Each of these models has specific problem-related assumptions and each of them possesses certain experimental justification, though none of them can be regarded as a widely agreed-upon leading BR formulation.

The concept of BR plays an important role in SSGs, as in their real-world applications the follower's role is played by humans, e.g. terrorists, poachers or criminals, who usually suffer from BR limitations. One of the most popular BR approaches in the SSG domain is COBRA [21, 22], which modifies the DOBSS MILP (Mixed-Integer Linear Program) [20] to address AT together with ε-optimal follower responses. A similar approach was taken by Yang et al. [33], who proposed BR models relying on PT and QR, respectively, and demonstrated their suitability in SSGs based on experiments involving humans. The SHARP system [10] points out certain game-related aspects (e.g. past performance, similarity between game conditions, etc.) which need to be taken into account by an algorithm for repeated SSGs when playing against human adversaries. The MATCH approach [23] optimizes the leader's strategy against a worst-case outcome within some error bound (i.e. assuming certain deviations from the follower's optimal strategy). Another approach, BRQR [32], proposed by Yang et al., is based on the idea of QR, further improved in the SU-BRQR system [19] which introduces a subjective utility function for the follower, with parameters tuned in experiments involving humans. Despite clear variability, all existing implementations of BR in the SSG domain employ MILP for finding the game solution (Stackelberg Equilibrium) and all of them are limited to one-step (non-sequential) games.

In this paper, the AT approach implemented in COBRA [21, 22] for single-step normal-form games is extended to the case of sequential extensive-form games in a way that avoids non-linear constraints, which makes it suitable for a wide range of MILP/LP approaches. Consequently, modifications to three state-of-the-art methods for solving extensive-form SSGs [4, 5, 2] which implement AT are proposed. Furthermore, two other non-MILP heuristic methods for solving SSGs, relying on Monte Carlo sampling [14, 13] and an Evolutionary Algorithm [35], respectively, are also adequately modified to incorporate AT principles. All five methods are experimentally evaluated on a set of Warehouse Games [12, 17].

1.1 Definitions

Throughout the paper a notation based on [4] will be used so as to easily refer to the method proposed in that paper. Sequential games will be represented as Extensive-Form Games (EFGs), i.e. tuples (N, H, Z, A, ρ, u, ℐ), where N = {l, f} is the set of players, the leader and the follower respectively. H is a set of game nodes which compose a game tree with a root node representing the initial game position. Z ⊂ H is the set of leaves representing terminal game states. A is a family of sets A(h), which define possible actions from each non-terminal node h. ρ : H ∖ Z → N is a function that defines the acting player in a given node. u = (u_l, u_f) is a family of utility functions u_i : Z → ℝ that assign a game outcome in a terminal node to the respective player. ℐ is a family of Information Sets (ISs); each IS I ∈ ℐ defines states that are indistinguishable to the acting player. ℐ satisfies the following conditions:

  • ℐ partitions H ∖ Z,

  • ρ(h) = ρ(h′) for all h, h′ ∈ I, I ∈ ℐ – all nodes in a given IS have the same acting player,

  • A(h) = A(h′) for all h, h′ ∈ I, I ∈ ℐ – the set of possible actions is the same for all nodes from a given IS.

Additionally, A(I) will denote the set of actions available in IS I, and ℐ_i the family of ISs with acting player i (i ∈ N).

Moreover, the games are assumed to satisfy the perfect recall property, i.e. throughout the game each player is fully aware of the ISs previously visited by them and the actions taken in those ISs.

In an EFG a pure strategy of a player assigns to each IS in which that player is the acting one a particular action to be played in that IS. A mixed strategy of a player is a probability distribution over pure strategies of that player. Π_i / Δ_i will denote the set of pure / mixed strategies of player i, resp. Elements of Π_i and Δ_i will be denoted by π_i and δ_i with adequate indices, resp.

A behavior strategy is an assignment of a probability distribution of actions for each IS that a player can reach during the game. It can be viewed as a tree with nodes representing player’s ISs and edges representing actions (labeled with their probabilities). The notions of mixed strategy and behavior strategy will be used interchangeably as they are equivalent in games with perfect recall.

We will overload the notation of the utility functions, so that u_i(π_l, π_f) denotes the i-th player's utility after the pure strategy profile (π_l, π_f) was played. Similarly, u_i(δ_l, δ_f) will denote the expected utility value of the i-th player in reference to the mixed strategy profile (δ_l, δ_f).

Each node in a game tree is uniquely defined by a pair of sequences: the leader's actions and the follower's actions which lead to that node. These sequences will be denoted by σ_l and σ_f, resp. We will say that a pair of sequences (σ_l, σ_f) is compatible if it leads to a terminal node in the game tree. Utility values in terminal nodes pointed to by a compatible pair of sequences will be denoted by u_i(σ_l, σ_f). Following [4], for any pair (σ_l, σ_f) we will define an auxiliary function g_i(σ_l, σ_f) which yields the value of u_i(σ_l, σ_f) if the sequences are compatible and 0 otherwise.

Finally, seq_i(h) will denote the sequence of moves of the i-th player which led to node h, and inf(σ) the IS in which the last action from σ was played.

The goal of SG is to find a Stackelberg Equilibrium (SE), i.e. a strategy profile that is a solution of the following set of equations:

δ_l* ∈ argmax_{δ_l ∈ Δ_l} u_l(δ_l, BR(δ_l)),   δ_f* = BR(δ_l*),   where BR(δ_l) = argmax_{δ_f ∈ Δ_f} u_f(δ_l, δ_f).    (1)

Please observe that SE is not well defined when there is more than one best follower's response. For this reason SE is often extended to the form of Strong Stackelberg Equilibrium (SSE) [3] in which, in addition to (1), in the case of a tie among the follower's best response strategies, the one that maximizes the leader's utility is selected (if there is more than one such strategy, any of them is chosen). The SSE version of SE is considered in this paper.

1.2 Motivation

The paper combines the following two concepts, which are generally considered separately in the literature: (1) bounded rationality models in Security Games and (2) efficient solutions for sequential SGs. In both areas significant progress has been observed in recent years.

To our knowledge, the concept of BR has been addressed in Security Games only in the context of single-step games, for instance, in the recently emerged, fast-growing genre of Green Security Games [31, 7], in which game-theoretical models exploit the not-perfectly-rational behavior of attackers (e.g. poachers or illegal forest extractors) to maximize the effectiveness of protection activities.

At the same time, in reference to large-scale sequential SGs, several algorithms utilizing different techniques, e.g. sequence-form [2], correlated equilibria [4], game abstraction [5], Evolutionary Algorithms [11, 35] or Monte Carlo sampling [12, 13], which visibly extended the range of tractable SGs, have been proposed recently.

We believe that successful studies at the crossroads of these two research directions will allow for tackling large-scale problems that better (more realistically) model security situations involving humans.

Among several BR models introduced in the literature, we have chosen the AT approach since it was already successfully applied to single-step SGs [21] and, furthermore, is intuitively justified in the cases when only limited observation of the leader’s strategy is possible.

Generally speaking, AT [28] assumes the existence of a bias of a person who observes some events (for instance, surveils the opponent's strategy in SSG) towards the uniform distribution. Formally, for any probability distribution q over a finite set X, let us denote the probability of x ∈ X as q(x). The observer believes that this probability is equal to (1 − α)·q(x) + α/|X|, where α ∈ [0, 1] is a parameter of the AT bias and |X| is the cardinality of X.

In SGs the leader, being aware of the follower’s AT bias, can exploit this knowledge in their mixed strategy formulation.
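
The effect of this bias on a single probability distribution can be shown with a minimal numerical sketch (illustrative Python; the names and numbers below are ours, not taken from any of the evaluated implementations):

    def anchor(probs, alpha):
        """Flatten a probability distribution towards the uniform one (AT bias)."""
        n = len(probs)
        return [(1 - alpha) * p + alpha / n for p in probs]

    true_strategy = [0.7, 0.2, 0.1]          # leader's actual mixed strategy over 3 options
    perceived = anchor(true_strategy, 0.3)   # follower's anchored perception
    print(perceived)                         # [0.59, 0.24, 0.17] - closer to uniform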

1.3 Contribution

The main contribution of this paper can be summarized as follows:

  • Introduction of efficient MILP-suitable extension of Anchoring Theory to the genre of sequential (multi-step) Stackelberg Games (ATSG);

  • Implementation of ATSG in three MILP/LP-based state-of-the-art methods for solving sequential Stackelberg Games (two exact and one approximate) and in two approximate non-MILP approaches, relying on Monte Carlo sampling and Evolutionary Algorithm, respectively;

  • Experimental evaluation of five above-mentioned methods in BR settings with respect to the quality of payoffs and time efficiency.

2 Anchoring Theory in Sequential Games

As we mentioned above, implementations of AT in SGs presented in the literature are limited to single-step games only. There are two straightforward ways to generalize AT to sequential games.

The first one is to transform an extensive-form game to its normal-form where each player’s actions are equivalent to pure strategies. Such an approach, however, would introduce a global distortion of probabilities and is, therefore, inaccurate when the opponent’s behavior is considered separately in each decision point.

The other possibility is to apply AT distortion locally - i.e. to a probability distribution in each IS that forms player’s behavior strategy. Such an approach seems to be more intuitive, especially considering the fact that a behavior strategy is usually a natural way of perceiving a mixed strategy by humans. Unfortunately, due to non-linear constraints, such a generalization poses problems for sequence-form MILP methods.

The following subsection presents a solution that yields a similar distortion to the one described above while avoiding non-linear constraints.

2.1 Sequence-Form based MILPs

A state-of-the-art approach to calculating SSE in sequential games [4] – referred to as C2016 in this paper – is an iterative method which alternates two phases: solving a MILP/LP for finding a Stackelberg Extensive-Form Correlated Equilibrium (SEFCE) in the sequence-form game representation, and SEFCE refinement with a dedicated procedure relying on LP modification towards SSE. Since the equilibrium refinement part is not affected by the implementation of AT, it will not be discussed here. The SEFCE part of the method, defined by the set of equations (2)-(8), is built around variables that define probabilities of playing particular sequences by the players [4]. The following LP definition employs the notion of relevant sequence pairs, formally introduced in Definition 3 of [4].

(2)
s.t. (3)
(4)
(5)
(6)
(7)
(8)

The main variables in the above LP describe the correlation plan and represent probabilities that the correlation device will suggest playing the respective sequences of moves by the players. Implicitly, they define the resulting players' strategies. Objective (2) maximizes the leader's utility. Constraints (3) – (5) ensure that the correlation plan is correct, i.e. the probability of playing a given sequence is a sum of probabilities of playing the sequences that are built from it by adding one action. Auxiliary variables guarantee that the suggested strategy is the follower's best response. The crucial constraints (from the AT perspective) are (6) and (7), which assure that the selected follower's strategy yields an outcome no worse than that of any other strategy. Implementation of AT requires changing the perception of these variables so as to include the anchoring bias – the details are presented in the following sections.

Solving the above LP is iteratively alternated with a refinement procedure mentioned above. No modifications to this procedure, compared to its original formulation [4], are required.

2.2 Anchoring Theory modification

ATSG is implemented as a distorted follower's perception of the leader's behavior strategy. Let's denote by p(a) the probability of the leader choosing action a in a given IS, stemming from the leader's behavior strategy. The most straightforward implementation of AT (though non-linear in sequence-form games) is to change the perceived probability of taking this action to (1 − α)·p(a) + α/n, where n is the number of actions available in this IS. However, in sequence-form games, for a given leader's sequence of actions σ = (a_1, ..., a_k), the probability of playing it, based on the behavior strategy, would be p(σ) = p(a_1)·...·p(a_k), and the distorted AT probability would become

p̂_AT(σ) = ∏_{i=1}^{k} ((1 − α)·p(a_i) + α/n_i),    (9)

where n_i is the number of actions available in the IS in which a_i is played.

Please observe that the variables in the LP formulation (2)-(8) are products of the values appearing in (9) and, as such, cannot be expressed in a linear form with respect to the p(a_i). Consequently, applying the above AT modification to MILP (2)–(8) would end up with non-linear constraints, inadequate for a MILP formulation.

Therefore, we propose to simplify the above ATSG by dropping the distortion coefficients from all but the last probability:

p̃_AT(σ) = (1 − α)·p(σ) + (α/n_k)·p(σ_{-1}),    (10)

where σ_{-1} denotes the output of a function which removes the last move from a sequence, and n_k is the number of actions available in the IS in which the last action of σ is played. A simplified version of ATSG (eq. (10)) is well suited to MILP/LP formulations of sequence-form games.

Please note that the relations among probabilities of the leader's actions within a single IS are the same according to both eqs. (9) and (10), i.e. p̂_AT(σ)/p̂_AT(σ′) = p̃_AT(σ)/p̃_AT(σ′) for any two sequences σ, σ′ that differ only in the last action (played in the same IS), where p̂_AT and p̃_AT denote the probability of a sequence calculated according to (9) and (10), resp. Furthermore, for a given sequence σ and small values of α, the difference |p̂_AT(σ) − p̃_AT(σ)| is also small.

Please also note that the resulting values do not represent a proper probability distribution since they do not sum up to one. Their normalization is not needed though, as they are used only to make comparisons between distorted utilities of various follower's strategies. The results of such comparisons are independent of normalization.
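
The two variants can be contrasted with the following short sketch (illustrative Python; the helper names and the example numbers are ours):

    def distorted_exact(probs, sizes, alpha):
        """Eq. (9): every factor of the leader's sequence probability is anchored.
        probs[i] - behavior-strategy probability of the i-th action of the sequence,
        sizes[i] - number of actions available in the IS where it is played."""
        result = 1.0
        for p, n in zip(probs, sizes):
            result *= (1 - alpha) * p + alpha / n
        return result

    def distorted_simplified(probs, sizes, alpha):
        """Eq. (10): only the last factor is anchored, which keeps the
        expression linear in the sequence-form probabilities."""
        prefix = 1.0
        for p in probs[:-1]:
            prefix *= p
        return prefix * ((1 - alpha) * probs[-1] + alpha / sizes[-1])

    probs, sizes, alpha = [0.7, 0.4], [2, 3], 0.2
    print(distorted_exact(probs, sizes, alpha))       # ~0.2552
    print(distorted_simplified(probs, sizes, alpha))  # ~0.2707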

2.3 Modification of MILP/LP based methods

ATSG formulation (10) was incorporated into three state-of-the-art methods for sequential SGs.

SEFCE method

In the first method [4], briefly summarized in section 2.1, ATSG is implemented through modification of constraints (6) and (7), which are replaced by constraints (11) and (12) presented below:

(11)
(12)

The above formulation is a result of application of eq. (10) to LP constraints.

Please observe that the LP in C2016 does not contain variables describing probabilities of playing the leader's sequences alone, but refers to a correlation plan which provides suggestions on playing pairs of sequences. Moreover, the probability of playing a leader's sequence equals the respective marginal probability of the correlation plan only if the correlation plan suggests the follower to play a pure strategy (condition (*)). In the above ATSG version of C2016, defined by equations (11)-(12), conditions (*) may not initially hold for all sequences, but must all be fulfilled at completion of C2016, since they constitute a stopping condition of this method.

Game abstraction method

In 2018 a new approach to extensive-form SSGs was proposed that folds game subtrees into nodes called gadgets and then incrementally unfolds them to refine the solution [5]. The method (henceforth referred to as CBK2018) internally employs C2016 to solve the abstracted (smaller) games. CBK2018 was formulated by its authors in two variants: as an exact method and as a heuristic time-optimized approach, with experimental evaluation provided only for the latter variant [5]. Consequently, we also focus on the heuristic formulation of CBK2018 and, following the recommendation of the authors of [5], set the internal method's parameters to values which assure fast convergence, albeit at the cost of some deviation from the optimal results. In the ATSG modification of CBK2018 the original C2016 formulation is replaced with its ATSG version (11)-(12).

Sequence-form method

The third MILP-based method for finding SE in sequential games considered in this paper is the approach proposed in [2] (henceforth referred to as BC2015). BC2015 directly utilizes the sequence-form game representation and, unlike C2016, is not an iterative method, i.e. it relies on solving a single MILP instance to obtain the game solution. Generally, its performance is expected to be worse than that of C2016 due to the substantial number of integer variables in the MILP (one variable per each possible follower's sequence of moves). Below, a modified MILP which incorporates eq. (10) into the BC2015 formulation is presented:

(13)
(14)
(15)
(16)
(17)
(18)
(19)

3 Heuristic Approximations of ATSG

The above three ATSG modifications of MILP/LP methods are compared with two heuristic non-MILP approaches to solving sequential extensive-form SSGs, with adequate ATSG adjustments.

3.1 A summary of the O2UCT method

The first approach (referred to as O2UCT - double-oracle UCT sampling) [14, 13] relies on guided sampling of the follower's strategy space interleaved with finding a feasible leader's strategy using a double-oracle method.

In the first step, a follower's strategy is obtained using the Upper Confidence bounds applied to Trees (UCT) algorithm [15] - a variant of guided Monte Carlo sampling. Then, for the sampled follower's strategy, a process of building a corresponding leader's strategy is performed. The constructed leader's strategy must satisfy the following conditions: (1) the sampled follower's strategy is the best response against it; (2) it provides as high as possible leader's utility when played against the best follower's response. An algorithm for finding the requested leader's strategy is outlined below and detailed in [13].

In the first step, the follower's best response to the current leader's strategy is calculated. Then the algorithm checks whether this best response equals the sampled follower's strategy. If so, a procedure for adjusting the leader's strategy so as to obtain better utility against the sampled strategy is applied (the positive pass). Otherwise, the leader's strategy is adjusted so as to make the sampled follower's strategy become the best response (the feasibility pass).

The two above-mentioned phases – sampling of the follower's strategy (against the current leader's strategy) and adjustment of the leader's strategy – are iteratively alternated in O2UCT.
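
The alternation described above can be summarized with the following high-level sketch (illustrative Python-style pseudocode; all helper names are placeholders standing for components described in [13] and do not correspond to the actual O2UCT code):

    def o2uct(game, iterations):
        """High-level sketch of the O2UCT loop; every helper below is a placeholder."""
        leader = initial_leader_strategy(game)
        for _ in range(iterations):
            # phase 1: UCT-guided sampling of a follower's strategy
            sampled = sample_follower_strategy_uct(game, leader)
            # phase 2: adjust the leader's strategy w.r.t. the sampled strategy
            if follower_best_response(game, leader) == sampled:
                # positive pass: improve the leader's utility while keeping
                # the sampled strategy a best response
                leader = improve_leader_utility(game, leader, sampled)
            else:
                # feasibility pass: push the leader's strategy so that the
                # sampled strategy becomes the follower's best response
                leader = make_sampled_strategy_feasible(game, leader, sampled)
        return leader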

ATSG implementation

ATSG implementation in O2UCT required two changes. In the follower's best response oracle, which in O2UCT works by exhaustive search of possible pure strategies, the procedure that calculates the follower's utility was modified so as to use the distorted probabilities (10) when calculating the expected value. Similarly, in the procedure that calculates the difference between the follower's utilities for two strategies, the way the expected utility is calculated was adapted so as to use the distorted strategy (as perceived by the follower in ATSG).

Please observe that in the case of O2UCT, contrary to the MILP/LP ATSG implementations, the potential existence of non-linearities in the formulas defining the distorted follower's probabilities is not harmful, and - in principle - any other BR modification could be used instead of eq. (10). For comparability reasons, we use the linear form (10) in the experiments.
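
For illustration only, the distorted expected-utility computation in the best-response oracle could look roughly as follows (reusing distorted_simplified from the sketch in section 2.2; the data layout of the terminal nodes is an assumption of ours, not the actual O2UCT data structures):

    def distorted_follower_utility(terminals, follower_strategy, alpha):
        """Follower's expected utility under the AT-distorted perception of the
        leader's strategy; 'terminals' and its fields are illustrative."""
        total = 0.0
        for node in terminals:
            if not node.compatible_with(follower_strategy):
                continue
            # leader's sequence probability as perceived by the follower, cf. eq. (10)
            p = distorted_simplified(node.leader_action_probs, node.leader_is_sizes, alpha)
            total += p * node.follower_payoff
        return total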

3.2 A summary of the EASG method

The other heuristic method applicable to sequential SGs considered in this paper [35] utilizes an Evolutionary Algorithm (EA) to find the leader's mixed strategy and, to our knowledge, is the first generic evolutionary approach proposed in this domain. We are aware of only one other application of EAs to solving sequential SGs [11], which, however, is specifically designed for games on a plane with moving targets.

P ← population of randomly selected leader's pure strategies
while (generations limit is not reached) do
    E ← chromosomes with the highest fitness function values  /* elite */
    C ← random population subset for crossover
    C ← crossover(C)  /* crossover merges pairs of chromosomes */
    M ← random population subset for mutation
    M ← mutation(M)  /* mutation changes actions in a randomly selected element of a chromosome */
    evaluate(P ∪ C ∪ M)  /* calculate fitness function value - the leader's payoff against the optimal follower's response to a strategy encoded in a chromosome */
    P ← selection(P ∪ C ∪ M) ∪ E  /* choose strategies for the next generation based on fitness evaluation */
end while
return best leader's strategy
Algorithm 1: A pseudocode of EASG.

EASG follows a standard evolutionary algorithm scheme and is presented in Algorithm 1. A population of individuals evolves for a fixed number of generations. In each generation crossover and mutation operators are applied with certain probabilities, and then a population for the next generation is created by selection procedure, based on fitness function value computed for each individual.

Population. Each chromosome represents some leader's mixed strategy in the form of a vector of pure strategies π_i with their probabilities p_i:

((π_1, p_1), ..., (π_k, p_k)),

where k is the length of the chromosome. Each strategy π_i is a list of the leader's actions in consecutive rounds. Each chromosome in the initial population includes one randomly selected pure strategy with probability equal to 1.

Crossover. A crossover operator combines two randomly chosen chromosomes by aggregating all pure strategies they contain and halving their probabilities (if a given strategy belongs to both chromosomes, the resulting probabilities are summed up). Crossover is applied to a chosen pair of chromosomes with a certain probability.

Mutation. In the mutation operation, a pair (pure strategy, round number) is uniformly selected in a chromosome. Then, starting from the selected round until the last one, a leader's action is uniformly chosen in each round (among all actions available in this round) and placed in the chromosome in place of the existing action. Mutation affects each individual with a certain probability.

The role of the mutation operator is to boost exploration of the leader's strategy space, while crossover combines existing solutions and has a more exploitative nature. Both operators are sketched below.
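
A minimal sketch of both variation operators (illustrative Python; a chromosome is represented here as a list of (pure_strategy, probability) pairs, which is our notation rather than the authors' code):

    import random

    def crossover(chrom_a, chrom_b):
        """Merge two chromosomes: union of their pure strategies with halved
        probabilities; strategies present in both parents get their halves summed."""
        merged = {}
        for pure, prob in chrom_a + chrom_b:
            merged[pure] = merged.get(pure, 0.0) + prob / 2.0
        return list(merged.items())

    def mutation(chrom, available_actions):
        """Pick a (pure strategy, round) pair uniformly and re-sample the tail of
        that pure strategy; available_actions[r] listing the leader's actions in
        round r is an illustrative simplification of the real action sets."""
        idx = random.randrange(len(chrom))
        pure, prob = chrom[idx]
        actions = list(pure)
        start = random.randrange(len(actions))
        for r in range(start, len(actions)):
            actions[r] = random.choice(available_actions[r])
        chrom = list(chrom)
        chrom[idx] = (tuple(actions), prob)
        return chrom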

Selection. Chromosomes are selected to the next generation through a binary tournament with a certain selection pressure p_s, i.e. among two randomly chosen chromosomes the higher-fitted one is promoted with probability p_s and the lower-fitted one with probability 1 − p_s. A fixed number of individuals (called elite) from the current population with the highest fitness values are directly promoted to the next generation population (unconditionally).

Evaluation. The fitness function is defined as the leader’s utility obtained when playing a strategy encoded by a chromosome. This utility is calculated by computing game payoffs against all possible follower’s strategies and choosing the one that yields the highest value for the follower, while breaking ties in favor of the leader (SSE condition).
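
The evaluation step can be sketched as follows (illustrative Python; play_out is a placeholder returning the pair of expected payoffs of the encoded leader's strategy against a given follower's pure strategy):

    def fitness(chromosome, follower_pure_strategies, play_out):
        """Leader's payoff against the follower's best response,
        with ties broken in favor of the leader (SSE condition)."""
        best_follower_payoff = None
        best_leader_payoff = None
        for fs in follower_pure_strategies:
            leader_payoff, follower_payoff = play_out(chromosome, fs)
            if (best_follower_payoff is None
                    or follower_payoff > best_follower_payoff
                    or (follower_payoff == best_follower_payoff
                        and leader_payoff > best_leader_payoff)):
                best_follower_payoff = follower_payoff
                best_leader_payoff = leader_payoff
        return best_leader_payoff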

ATSG implementation

Similarly to O2UCT, an important advantage of EASG formulation is its flexibility, understood as the ease of adaptation to various SG formulations. In the context of BR various types of perturbations to the optimal follower’s response can be implemented in EASG by adjusting the chromosome evaluation procedure.

Incorporation of ATSG into EASG relies on considering a distorted version of the leader's mixed strategy when calculating the best follower's response. This distorted leader's strategy is obtained in the following three steps.

  1. First, in order to directly apply eq. (10), a strategy encoded by a chromosome is transformed to the form of a tree.

Formally, let's denote by P(s) the sum of probabilities of all pure strategies in the chromosome with prefix s. The probability of an edge in the game tree between the nodes corresponding to a prefix s and its one-action extension s + a is computed as P(s + a) / P(s).

  2. Then, all probabilities in this tree are modified according to eq. (10).

  3. Finally, the tree (with changed probabilities) is transformed back to a list of pure strategies with assigned probabilities through a reversed procedure.

Technically, each pure strategy in the resultant chromosome is created based on a unique path in the tree, from the root to a leaf node, and its probability is equal to the product of probabilities of all edges on that path. The follower's best response strategy is then computed against this distorted leader's strategy.

Next, this follower's response is used to calculate the players' utilities, but this time using the original, unmodified strategy from the chromosome (without distortion of probabilities).

In other words, a distorted strategy is used only in calculation of the follower’s utility. The leader’s strategy is assumed to be perfectly rational and therefore their utility (chromosome fitness value) is calculated with no distortion.
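
The three steps can be summarized by the following sketch (illustrative Python; distort_edge is a placeholder for the chosen ATSG distortion rule, and the chromosome layout of (pure_strategy, probability) pairs is our notation):

    def chromosome_to_tree_probs(chromosome):
        """Conditional probability of each edge: P(prefix + action) / P(prefix),
        where P(s) is the summed probability of all pure strategies with prefix s."""
        prefix_prob = {}
        for pure, prob in chromosome:
            for i in range(len(pure) + 1):
                prefix_prob[pure[:i]] = prefix_prob.get(pure[:i], 0.0) + prob
        edges = {}
        for prefix, p in prefix_prob.items():
            if prefix:  # non-root node: edge coming from its parent
                edges[prefix] = p / prefix_prob[prefix[:-1]]
        return edges

    def distort_and_rebuild(chromosome, distort_edge):
        """Steps 1-3: build the tree, distort the edge probabilities with the
        chosen ATSG rule, and turn root-to-leaf paths back into pure strategies
        whose probabilities are products of the (distorted) edge probabilities."""
        edges = chromosome_to_tree_probs(chromosome)
        distorted = []
        for pure, _ in chromosome:
            p = 1.0
            for i in range(1, len(pure) + 1):
                p *= distort_edge(pure[:i], edges[pure[:i]])
            distorted.append((pure, p))
        return distorted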

Similarly to O2UCT, instead of eq. (10), the non-linear form of ATSG described in section 2.2 could be used as well. For the sake of comparison with the MILP/LP methods, the simplified linear ATSG formulation (10) is used.

4 Experimental evaluation

In what follows the ATSG versions of all five considered methods will be referred to with prefix AT-, i.e. AT-C2016, AT-CBK2018, AT-BC2015, AT-O2UCT and AT-EASG, resp.

4.1 Benchmark games

(a) An example of a warehouse layout: the narrow black path denotes the main corridor, squares are storage spaces. Room numbers correspond to vertex labels in the resulting game graph presented in the right figure.
(b) The corresponding game graph (instance smallbuilding-102). Rectangular vertices are targets, a triangle vertex is the attacker's starting point, a blue shaded circle vertex is the defender's starting point.
Figure 1: An example Warehouse Game. Values in the right figure denote payoffs for the attacker and the defender, resp. in the case of an interception of the attacker in a given vertex. Additional utilities, in the case of a successful attack, are assigned in targets (the second column). All games are defined on a grid.

Experimental evaluation was performed on a set of benchmark Warehouse Games introduced in [12] (all game instances were downloaded from the website [17]). Each game resembles a situation of patrolling a warehouse building. The game area is modeled in the form of a graph with some vertices containing valuable resources (referred to as targets). There are two players in a game: a defender and an attacker. Each player possesses a single unit, located in one of the vertices (warehouse spaces). In each round, each unit can either stay in the currently occupied vertex or move to an adjacent one (change the room). If the units meet in a common vertex an interception occurs and the defender receives a reward (positive utility value) while the attacker receives a penalty (negative utility). If the attacker reaches any of the target vertices (rooms) without being intercepted by the defender, he/she is rewarded and the defender is penalized. In either of the above cases the game ends. Otherwise the game is played for a fixed number of rounds. Once the round limit is reached, both players are assigned a neutral utility of 0.
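
The payoff rule can be condensed into the following sketch (illustrative Python; the numeric utilities are placeholders, since the real games assign instance-specific, vertex-dependent payoffs, cf. Figure 1, and the neutral end-of-game utility of 0 is assumed as described above):

    def round_outcome(defender_pos, attacker_pos, targets, round_no, max_rounds):
        """Returns (defender_utility, attacker_utility, game_over) after one round."""
        if defender_pos == attacker_pos:        # interception
            return (+1.0, -1.0, True)
        if attacker_pos in targets:             # successful attack
            return (-1.0, +1.0, True)
        if round_no >= max_rounds:              # round limit reached
            return (0.0, 0.0, True)
        return (0.0, 0.0, False)                # game continues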

The benchmark set consists of games generated on a grid, with general-sum utilities. An example game layout created by the warehouse generator is presented in Figure 1(a) (this is an auxiliary game representation). The corresponding game graph (the actual game representation) is depicted in Figure 1(b). A detailed description of the game generator settings is presented in [12]. In this paper games of several lengths (numbers of rounds) are considered, albeit for the longest ones the exact methods were unable to compute solutions within the allotted time and memory.

4.2 Experimental setup

For each game instance (game layout and game length) AT-O2UCT and AT-EASG were run multiple times, and for each other (deterministic) MILP method a single trial was performed.

Tests were run on an Intel Xeon Silver 4116 @ 2.10GHz with 256GB RAM. Experiments with AT-O2UCT and AT-EASG were run in parallel, each with 8GB RAM assigned. Tests with AT-C2016, AT-CBK2018, AT-BC2015 were run sequentially with all 256GB RAM available in each trial. All tests were limited by a fixed time budget (per single test) and were forcibly terminated if not completed within the allotted time.

Performance of both heuristic methods was analyzed in two dimensions: quality of results (an expected leader’s payoff) and time efficiency. Results for all games were merged based on the number of game nodes of an extensive-form game. This grouping followed formula (20) presented below:

bucket(G) = round(log10(#nodes(G)))    (20)

where round(·) rounds a number to the nearest integer. Consequently, test games were grouped by the orders of magnitude of the respective numbers of game nodes. Such a grouping combines two aspects of game complexity: the underlying game graph structure and the game length. In the rest of the paper the i-th bucket denotes the group of all games G for which bucket(G) = i. In order to streamline the notation, unions of adjacent buckets will also be referred to as single groups of games.
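
For reference, the grouping in eq. (20) amounts to a one-liner (illustrative Python):

    import math

    def bucket(num_game_nodes):
        """Bucket index used in eq. (20): order of magnitude of the game size."""
        return round(math.log10(num_game_nodes))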

The AT-BC2015 method (like BC2015 [2]) is parameterless. In AT-C2016 the SI-LP variant of C2016 [4] is considered. For AT-CBK2018, the fast-converging variant of CBK2018 [5] is implemented.

AT-O2UCT is parameterized by the following stopping conditions (cf. the algorithm figure in [13]): either the number of executions of the positive pass exceeds a fixed limit, or the improvement of the leader's payoff in subsequent iterations falls below a fixed threshold, or the number of subsequent executions of the feasibility pass without entering the positive pass exceeds a fixed limit (infeasible strategy).

In AT-EASG the values of the steering parameters – population size, mutation probability, crossover probability, selection pressure, and the number of elitist chromosomes – were fixed across all experiments. The algorithm is run either for a fixed number of generations or until no improvement of the leader's strategy is observed in a number of subsequent generations (whichever occurs first).

Parameters for the last two methods were selected based on a limited number of preliminary simulations.

4.3 Payoffs

The average expected utilities of the leader obtained by each method are presented in Fig. 2. Since both AT-C2016 and AT-BC2015 are exact methods, their results are the highest and the respective plots overlap. Both non-MILP heuristic methods perform slightly worse, although AT-EASG is a close runner-up for all but the largest games, in which range it outperforms AT-O2UCT.

For the largest games, the best-performing method is AT-O2UCT, which outperforms AT-EASG (the only remaining competitor) by a clear margin. Neither of the two exact MILP methods was capable of solving games of this size, and the approximate MILP approach (AT-CBK2018) solved only some of the game instances and failed on the remaining ones. Consequently, for the sake of fair comparison, the results of AT-CBK2018 are not presented for the largest games.

Generally speaking, AT-CBK2018 yields the weakest outcomes across the entire range of game sizes and its performance deteriorates as game complexity increases.

Figure 2: The average expected leader’s utility.

4.4 Time scalability

Fig. 3 presents the time scalability of the methods. While all of them scale exponentially, the running times of both non-MILP approaches grow at slower paces. Beyond certain game sizes (reached at different points by AT-EASG and AT-O2UCT, resp.) they already outperform both exact MILP methods.

On the other hand, it shouldn’t be forgotten that the main asset of AT-C2016 and AT-BC2015 is convergence to the optimal solution and therefore a comparison of their running times with heuristic approaches needs to be considered with care.

Nevertheless, it seems reasonable to conclude that beyond a certain level of game complexity the exact methods become infeasible and, in such scenarios, both heuristic approaches present a viable alternative.

The third MILP method is a state-of-the-art algorithm for approximately solving extensive-form games. Following a recommendation from [5], AT-CBK2018 was parameterized in a way which assures fast convergence so as to make a fair time comparison with the remaining two heuristic methods. It can be concluded from Fig. 3 that for the set of most complex games AT-EASG and AT-O2UCT are faster than AT-CBK2018, at the same time providing much better leader's payoffs (cf. Fig. 2).

Please note that since AT-CBK2018 solved only some of the game instances, the solution times for the remaining ones were capped at the time limit, which was in favor of AT-CBK2018 (in comparison with AT-O2UCT and AT-EASG) since both heuristic methods solved all of the largest games within the allotted time (hence, in their case the actual times are reported).

Figure 3: The average time requirements.

4.5 Stability of AT-O2UCT and AT-EASG

Figure 4: The average standard deviation of the expected leader’s utility for AT-O2UCT and AT-EASG methods.
Figure 5: The average and maximum of the expected leader’s utility for AT-O2UCT and AT-EASG methods.

While both non-MILP methods proved efficient in both time scalability and returned payoffs, one of the interesting observations from Fig. 2 is the deterioration of the average payoffs obtained by AT-EASG for the largest games. Apparently, as the games' complexity increases further, the variance of AT-EASG results also increases. In particular, for two game instances from the considered benchmark set [17] (smallbuilding-89-6 and smallbuilding-89-7) the standard deviations of the results were particularly high, which had a visible impact on the average payoffs. Figure 4 compares the average standard deviations (stddev.) for AT-EASG and AT-O2UCT. Clearly, for the largest games the AT-EASG stddev. increases, while the stddev. of AT-O2UCT remains approximately on the same level. In terms of stability, AT-O2UCT appears to have a clear advantage over AT-EASG.

On the other hand, despite the high variance, AT-EASG is still able to obtain very good solutions, with the best ones practically equal to those of AT-O2UCT. Figure 5 shows both the average and maximum payoffs of both methods. The dotted curves (which represent the maximum payoff averaged across all benchmark games from a given bucket) are very close to each other, and the differences between the best solutions found by AT-O2UCT and AT-EASG are marginal for each bucket.

5 Conclusions

This work considers the SG formulation in which the follower is not perfectly rational. Such a setting is motivated by real SG scenarios in which humans, when performing the role of the follower, are prone to certain inefficiencies in perception and assessment of the leader's strategy. The particular implementation of the follower's bounded rationality considered in this paper refers to Anchoring Theory [28]. AT assumes the existence of a certain distortion (towards the uniform distribution of probabilities of possible actions) of the follower's perception of the leader's mixed strategy. The leader, being aware of that distortion, can exploit this weakness in their strategy formulation.

The paper proposes an efficient MILP-suitable formulation of AT in the context of sequential extensive-form SGs. This formulation (ATSG) is implemented in three state-of-the-art MILP methods – two exact ones: BC2015 [2] and C2016 [4] and one approximate: CBK2018 [5], as well as in two heuristic non-MILP approaches: O2UCT [14, 13] and EASG [35].

Experimental results on a set of benchmark games show that the non-MILP methods provide optimal or close-to-optimal leader's payoffs while being visibly faster than the exact MILP approaches. At the same time, they clearly outperform the time-optimized approximate MILP method in both payoff quality and time efficiency.

An additional asset of non-MILP solutions in the context of BR is their flexibility, stemming from virtually no restrictions imposed on the form in which BR is represented. Unlike MILP methods, which require a linear form of the BR-related constraints, non-MILP solutions allow the implementation of more complex, non-linear BR formulations.

Our current focus is on verification of the efficacy of the proposed AT formulation in experiments that involve human players in the role of the follower. To this end, for a selection of Warehouse Games, the following leader's strategies will be precomputed: (a) the SSE strategy (rational, not distorted), (b) the ATSG SSE strategy and (c) a strategy stemming from eq. (9), approximated by AT-EASG or AT-O2UCT, respectively. In each game, a human participant will play the follower's role against (randomly assigned) one of the above strategies. A comparison of the average leader's payoffs in case (a) vs (b) and (c) will indicate whether potential distortions of SSE can, in fact, provide any advantage for the leader, stemming from the fact that the role of the follower is indeed played by a human.

Acknowledgments

The work was supported by the National Science Centre grant number 2017/25/B/ST6/02061.

References

  1. R. J. Aumann (1997) Rationality and bounded rationality. Games and Economic Behavior 21 (1-2), pp. 2–14.
  2. B. Bosanský and J. Cermak (2015) Sequence-form algorithm for computing Stackelberg equilibria in extensive-form games. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI'15), pp. 805–811.
  3. M. Breton, A. Alj and A. Haurie (1988) Sequential Stackelberg equilibria in two-person games. Journal of Optimization Theory and Applications 59 (1), pp. 71–97.
  4. J. Cermak, B. Bosansky, K. Durkota, V. Lisy and C. Kiekintveld (2016) Using correlated strategies for computing Stackelberg equilibria in extensive-form games. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI'16), pp. 439–445.
  5. J. Cerny, B. Bosansky and C. Kiekintveld (2018) Incremental strategy generation for Stackelberg equilibria in extensive-form games. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC'18), pp. 151–168.
  6. E. Elkind, M. Veloso, N. Agmon and M. E. Taylor (Eds.) (2019) Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'19). International Foundation for Autonomous Agents and Multiagent Systems.
  7. F. Fang, P. Stone and M. Tambe (2015) When Security Games go green: designing defender strategies to prevent poaching and illegal fishing. In IJCAI, pp. 2589–2595.
  8. M. Jain, J. Tsai, J. Pita, C. Kiekintveld, S. Rathi, M. Tambe and F. Ordóñez (2010) Software assistants for randomized patrol planning for the LAX airport police and the Federal Air Marshal Service. Interfaces 40 (4), pp. 267–290.
  9. D. Kahneman and A. Tversky (2013) Prospect theory: an analysis of decision under risk. In Handbook of the Fundamentals of Financial Decision Making: Part I, pp. 99–127.
  10. D. Kar, F. Fang, F. D. Fave, N. Sintov and M. Tambe (2015) A Game of Thrones: when human behavior models compete in repeated Stackelberg security games. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pp. 1381–1390.
  11. J. Karwowski, J. Mańdziuk, A. Żychowski, F. Grajek and B. An (2019) A memetic approach for sequential security games on a plane with moving targets. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence (AAAI'19), pp. 970–977.
  12. J. Karwowski and J. Mańdziuk (2019) A Monte Carlo Tree Search approach to finding efficient patrolling schemes on graphs. European Journal of Operational Research 277 (1), pp. 255–268.
  13. J. Karwowski and J. Mańdziuk (2019) Double-oracle sampling method for Stackelberg equilibrium approximation in general-sum extensive-form games. arXiv:1909.03934 (accepted for AAAI'20).
  14. J. Karwowski and J. Mańdziuk (2019) Stackelberg equilibrium approximation in general-sum extensive-form games with double-oracle sampling method. In Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS'19), pp. 2045–2047.
  15. L. Kocsis and C. Szepesvári (2006) Bandit based Monte-Carlo planning. In Machine Learning: ECML 2006, pp. 282–293.
  16. G. Leitmann (1978) On generalized Stackelberg strategies. Journal of Optimization Theory and Applications 26 (4), pp. 637–643.
  17. J. Mańdziuk, J. Karwowski and A. Żychowski (2019) Website with the benchmark game instances used in this paper.
  18. R. D. McKelvey and T. R. Palfrey (1995) Quantal response equilibria for normal form games. Games and Economic Behavior 10 (1), pp. 6–38.
  19. T. H. Nguyen, R. Yang, A. Azaria, S. Kraus and M. Tambe (2013) Analyzing the effectiveness of adversary modeling in security games. In Twenty-Seventh AAAI Conference on Artificial Intelligence, pp. 718–724.
  20. P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordonez and S. Kraus (2008) Playing games for security: an efficient exact algorithm for solving Bayesian Stackelberg games. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems – Vol. 2, pp. 895–902.
  21. J. Pita, M. Jain, F. Ordóñez, M. Tambe, S. Kraus, R. Magori-Cohen and M. Tambe (2011) Effective solutions for real-world Stackelberg games: when agents must deal with human uncertainties. In Security and Game Theory, pp. 193–212.
  22. J. Pita, M. Jain, M. Tambe, F. Ordóñez and S. Kraus (2010) Robust solutions to Stackelberg games: addressing bounded rationality and limited observations in human cognition. Artificial Intelligence 174, pp. 1142–1171.
  23. J. Pita, R. John, R. Maheswaran, M. Tambe, R. Yang and S. Kraus (2012) A robust approach to addressing human adversaries in security games. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems – Vol. 3, pp. 1297–1298.
  24. A. Rubinstein (1998) Modeling bounded rationality. MIT Press.
  25. E. Shieh, B. An, R. Yang, M. Tambe, C. Baldwin, J. DiRenzo, B. Maule and G. Meyer (2012) PROTECT: a deployed game theoretic system to protect the ports of the United States. In Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems – Vol. 1, pp. 13–20.
  26. H. A. Simon (1957) Models of man: social and rational. Wiley.
  27. A. Sinha, F. Fang, B. An, C. Kiekintveld and M. Tambe (2018) Stackelberg security games: looking beyond a decade of success. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), pp. 5494–5501.
  28. A. Tversky and D. Kahneman (1974) Judgment under uncertainty: heuristics and biases. Science 185 (4157), pp. 1124–1131.
  29. A. Tversky and D. Kahneman (1981) The framing of decisions and the psychology of choice. Science 211 (4481), pp. 453–458.
  30. H. von Stackelberg (1934) Marktform und Gleichgewicht. Springer, Vienna.
  31. H. Xu, B. Ford, F. Fang, B. Dilkina, A. Plumptre, M. Tambe, M. Driciru, F. Wanyama, A. Rwetsiba, M. Nsubaga and J. Mabonga (2017) Optimal patrol planning for green security games with black-box attackers. In Decision and Game Theory for Security, pp. 458–477.
  32. R. Yang, C. Kiekintveld, F. Ordonez, M. Tambe and R. John (2011) Improving resource allocation strategy against human adversaries in security games. In Twenty-Second International Joint Conference on Artificial Intelligence, pp. 458–464.
  33. R. Yang, C. Kiekintveld, F. Ordóñez, M. Tambe and R. John (2013) Improving resource allocation strategies against human adversaries in security games: an extended study. Artificial Intelligence 195, pp. 440–469.
  34. Z. Yin, A. X. Jiang, M. Tambe, C. Kiekintveld, K. Leyton-Brown, T. Sandholm and J. P. Sullivan (2012) TRUSTS: scheduling randomized patrols for fare inspection in transit systems using game theory. AI Magazine 33 (4), p. 59.
  35. A. Żychowski and J. Mańdziuk (2019) A generic metaheuristic approach to sequential security games. arXiv:1911.05706.