
# Computation of Stackelberg Equilibria of Finite Sequential Games

## Abstract

The Stackelberg equilibrium is a solution concept that describes optimal strategies to commit to: Player 1 (the leader) first commits to a strategy that is publicly announced, then Player 2 (the follower) plays a best response to the leader’s choice. We study the problem of computing Stackelberg equilibria in finite sequential (i.e., extensive-form) games and provide new exact algorithms, approximate algorithms, and hardness results for finding equilibria for several classes of such two-player games.

## 1 Introduction

The Stackelberg competition is a game-theoretic model introduced by von Stackelberg [25] for studying market structures. The original formulation of a Stackelberg duopoly captures the scenario of two firms that compete by selling homogeneous products. One firm, the leader, first decides the quantity to sell and announces it publicly, while the second firm, the follower, decides its own production only after observing the announcement of the first firm. The leader firm must have commitment power (e.g., it is the monopoly in an industry) and cannot undo its publicly announced strategy, while the follower firm (e.g., a new competitor) plays a best response to the leader’s chosen strategy.

The Stackelberg competition has been an important model in economics ever since (see, e.g., [22, 15, 1, 24, 19, 11]), while the solution concept of a Stackelberg equilibrium has been studied in a rich body of literature in computer science, with a number of important real-world applications developed in the last decade [23]. The Stackelberg equilibrium concept can be applied to any game with two players (e.g., in normal or extensive form) and stipulates that the leader first commits to a strategy, while the follower observes the leader’s choice and best responds to it. The leader must have commitment power; in the context of firms, the act of moving first in an industry, such as by opening a shop, requires financial investment and is evidently a form of commitment. In other scenarios, the leader’s commitment refers to ways of responding to future events, should certain situations be reached, and in such cases the leader must have a way of enforcing credible threats. The leader can always commit to a Nash equilibrium strategy; however, it can often obtain a better payoff by choosing some other strategy profile. While there exist generalizations of Stackelberg equilibrium to multiple players, the real-world implementations to date have been derived from understanding the two-player model, and for this reason we will focus on the two-player setting.

One of the notable applications using the conceptual framework of Stackelberg equilibrium has been the development of algorithms for protecting airports and ports in the United States (deployed so far in Boston, Los Angeles, and New York). More recent ongoing work (see, e.g., [20]) explores additional problems such as protecting wildlife, forests, and fisheries. The general task of defending valuable resources against attacks can be cast in the Stackelberg equilibrium model as follows. The role of the leader is taken by the defender (e.g., police forces), who commits to a strategy, such as the allocation of staff members to a patrolling schedule of locations to check. The role of the follower is played by a potential attacker, who monitors the empirical distribution (or even the entire schedule) of the strategy chosen by the defender, and then best responds by devising an optimal attack given this knowledge. The crucial question is how to minimize the damage from potential threats by computing an optimal schedule for the defender. Solving this problem in practice involves several nontrivial steps, such as estimating the payoffs of the participants for the resources involved (e.g., the attacker’s reward for destroying a section of an airport) and computing the optimal strategy that the defender should commit to.

In this paper, we are interested in the following fundamental question:

Given the description of a game in extensive form, compute the optimal strategy that the leader should commit to.

We study this problem for multiple classes of two-player extensive-form games (EFGs) and variants of the Stackelberg solution concept that differ in the kinds of strategies the leader can commit to, and provide both efficient algorithms and computational hardness results. We emphasize the positive results in the main text of the submission and fully state technical hardness results in the appendix.

### 1.1 Our Results

The problem of computing a Stackelberg equilibrium in EFGs can be classified by the following parameters:

• Information. Information captures how much a player knows about the opponent’s moves (past and present). We study turn-based games (TB), where for each state there is a unique player that can perform an action, and concurrent-move games (CM), where the players act simultaneously in at least one state.

• Chance. A game with chance nodes allows stochastic transitions between states; otherwise, the transitions are deterministic (made through actions of the players).

• Graph. We focus on trees and directed acyclic graphs (DAGs) as the main representations. Given such a graph, each node represents a different state in the game, while the edges represent the transitions between states.

• Strategies. We study several major types of strategies that the leader can commit to, namely pure (P), behavioral (B), and correlated behavioral (C).

The results are summarized in Table 1 and can be divided into three categories.

First, we design a more efficient algorithm for computing optimal strategies for turn-based games on DAGs. Compared to the previous state of the art (due to Letchford and Conitzer [17][16]), we reduce the complexity by a factor proportional to the number of terminal states (see row 1 in Table 1).

Second, we show that correlation often reduces the computational complexity of finding optimal strategies. In particular, we design several new polynomial time algorithms for computing the optimal correlated strategy to commit to for both turn-based and concurrent-move games (see rows 3, 7, 9).

Third, we study approximation algorithms for the NP-hard problems in this framework and provide fully polynomial time approximation schemes for finding pure and behavioral Stackelberg equilibria for turn-based games on trees with chance nodes (see rows 5, 6). We leave open the question of finding an approximation for concurrent-move games on trees without chance nodes (see row 8).

### 1.2 Related Work

There is a rich body of literature studying the problem of computing Stackelberg equilibria. The computational complexity of the problem is known for one-shot games [7], Bayesian games [7], and selected subclasses of extensive-form games [17] and infinite stochastic games [18, 13, 14]. Similarly, many practical algorithms are known; they are typically based on solving multiple linear programs [7], or mixed-integer linear programs for Bayesian [21] and extensive-form games [2].

For one-shot games, the problem of computing a Stackelberg equilibrium is polynomial [7] in contrast to the PPAD-completeness of a Nash equilibrium [9, 5]. The situation changes in extensive-form games where Letchford and Conitzer showed [17] that for many cases the problem is NP-hard, while it still remains PPAD-complete for a Nash equilibrium [8]. More specifically, computing Stackelberg equilibria is polynomial only for:

• games with perfect information with no chance on DAGs where the leader commits to a pure strategy,

• games with perfect information with no chance on trees.

Introducing chance or imperfect information leads to NP-hardness. However, several cases were unexplored by the existing work, namely extensive-form games with perfect information and concurrent moves. We address this subclass in this work.

The computational complexity can also change when the leader commits to correlated strategies. This extension of the Stackelberg notion to correlated strategies appeared in several works [6, 18, 27]. Conitzer and Korzhyk [6] analyzed correlated strategies in one-shot games, providing a single linear program for their computation. Letchford et al. [18] showed that the problem of finding optimal correlated strategies to commit to is NP-hard in infinite discounted stochastic games. Xu et al. [27] focused on using correlated strategies in a real-world security-based scenario.

However, the impact of allowing the leader to commit to correlated strategies has not been analyzed in detail in the existing work. We address this extension and study the complexity for multiple subclasses of extensive-form games. Our results show that for many cases the problem of computing Stackelberg equilibria in correlated strategies is polynomial, in contrast to the NP-hardness in behavioral strategies. Finally, these theoretical results also have practical algorithmic implications. An algorithm that computes a Stackelberg equilibrium in correlated strategies can be used to compute a Stackelberg equilibrium in behavioral strategies, allowing a significant speed-up in computation time [3].

## 2 Preliminaries

We consider finite two-player sequential games. Note that for every finite set , denotes probability distributions over and denotes the set of all subsets of .

###### Definition 1 (2-player sequential game)

A two-player sequential game is given by a tuple , where:

• is a set of two players;

• is a set of non-terminal states;

• is a set of terminal states;

• is a function that defines which player(s) act in a given state, or whether the node is a chance node (case in which );

• is a set of actions; we overload the notation to restrict the actions only for a single player as and for a single state as ;

• is a transition function between states depending on the actions taken by all the players that act in this state. Overloading notation, also denotes the children of a state : ;

• are the chance probabilities on the edges outgoing from each chance node , such that ;

• Finally, is the utility function for player .

In this paper we study Stackelberg equilibria, thus player 1 will be referred to as the leader and player 2 as the follower.

We say that a game is turn-based if there is a unique player acting in each state (formally, ) and with concurrent moves if both players can act simultaneously in some state. Moreover, the game is said to have no chance if there exist no chance nodes; otherwise the game is with chance.

A pure strategy of a player is an assignment of an action to play in each state of the game (). A behavioral strategy is a probability distribution over actions in each state such that .

The expected utility of player $i$ given a pair of strategies $(\sigma_1, \sigma_2)$ is defined as follows:

$$u_i(\sigma_1, \sigma_2) = \sum_{z \in Z} u_i(z)\, p_\sigma(z),$$

where $p_\sigma(z)$ denotes the probability that leaf $z$ will be reached if both players follow the strategies from $\sigma$, taking into account the stochastic transitions corresponding to $C$.
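As an illustration of this definition, the following minimal sketch evaluates the expected utilities by recursing over a small tree. The dictionary encoding, the node names, and the payoffs are assumptions made for this example only; they are not taken from the paper.

```python
from fractions import Fraction as F

# Hypothetical toy game (encoding, names and payoffs are illustrative assumptions).
# Leaves store (u1, u2); decision nodes map actions to children;
# chance nodes map children to fixed probabilities.
GAME = {
    "root": {"kind": "decision", "children": {"L": "z1", "R": "c"}},
    "c": {"kind": "chance", "probs": {"f": F(1, 2), "z4": F(1, 2)}},
    "f": {"kind": "decision", "children": {"l": "z2", "r": "z3"}},
    "z1": {"kind": "leaf", "u": (2, 1)},
    "z2": {"kind": "leaf", "u": (0, 0)},
    "z3": {"kind": "leaf", "u": (4, 2)},
    "z4": {"kind": "leaf", "u": (0, 0)},
}

# A behavioral strategy profile: one distribution over actions per decision node.
SIGMA = {
    "root": {"L": F(1, 2), "R": F(1, 2)},
    "f": {"l": F(1, 4), "r": F(3, 4)},
}


def expected_utilities(game, sigma, state="root"):
    """Return (u1, u2), the leaf utilities weighted by their reach probabilities."""
    node = game[state]
    if node["kind"] == "leaf":
        return node["u"]
    if node["kind"] == "chance":
        branches = [(p, child) for child, p in node["probs"].items()]
    else:
        branches = [(sigma[state][a], child) for a, child in node["children"].items()]
    u1 = u2 = 0
    for p, child in branches:
        c1, c2 = expected_utilities(game, sigma, child)
        u1 += p * c1
        u2 += p * c2
    return (u1, u2)
```

Exact rational arithmetic is used so that the result can be compared without rounding concerns.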

A strategy of player is said to represent a best response to the opponent’s strategy if . Denote by the set of all the pure best responses of player to strategy . We can now introduce formally the Stackelberg Equilibrium solution concept:

###### Definition 2 (Stackelberg Equilibrium)

A strategy profile $(\sigma_1, \sigma_2)$ is a Stackelberg Equilibrium if $\sigma_1$ is an optimal strategy of the leader given that the follower best-responds to its choice. Formally, a Stackelberg equilibrium in pure strategies is defined as

$$(\sigma_1, \sigma_2) = \arg\max_{\sigma'_1 \in \Pi_1,\ \sigma'_2 \in BR(\sigma'_1)} u_1(\sigma'_1, \sigma'_2)$$

while a Stackelberg equilibrium in behavioral strategies is defined as

$$(\sigma_1, \sigma_2) = \arg\max_{\sigma'_1 \in \Sigma_1,\ \sigma'_2 \in BR(\sigma'_1)} u_1(\sigma'_1, \sigma'_2)$$
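To make the pure-strategy definition concrete, here is a brute-force sketch on a tiny turn-based tree without chance. It enumerates every pure commitment of the leader, collects the follower's pure best responses, and breaks follower ties in the leader's favour, which is how we read the joint argmax above. The game (an entry-deterrence toy), its encoding, and all names are assumptions for illustration; the enumeration is exponential and is not one of the algorithms of this paper.

```python
from itertools import product

# Hypothetical entry-deterrence game; leaves store (u1, u2).
# The follower (player 2) moves at the root, the leader (player 1) at "inc".
GAME = {
    "root": {"player": 2, "children": {"in": "inc", "out": "z_out"}},
    "inc": {"player": 1, "children": {"fight": "z_fight", "acc": "z_acc"}},
    "z_out": {"u": (4, 1)},
    "z_fight": {"u": (0, 0)},
    "z_acc": {"u": (2, 2)},
}


def play(game, profile, state="root"):
    """Follow a pure strategy profile from the root down to a leaf."""
    while "u" not in game[state]:
        state = game[state]["children"][profile[state]]
    return game[state]["u"]


def pure_strategies(game, player):
    """Yield every assignment of one action to each state of the player."""
    states = [s for s, node in game.items() if node.get("player") == player]
    for choice in product(*(game[s]["children"] for s in states)):
        yield dict(zip(states, choice))


def pure_stackelberg(game):
    """Return (utilities, leader strategy, follower best response)."""
    best = None
    for p1 in pure_strategies(game, 1):
        replies = [(play(game, {**p1, **p2}), p2) for p2 in pure_strategies(game, 2)]
        top = max(u[1] for u, _ in replies)  # follower's best-response value
        # Among the follower's best responses, pick the one the leader prefers.
        u, p2 = max((r for r in replies if r[0][1] == top), key=lambda r: r[0][0])
        if best is None or u[0] > best[0][0]:
            best = (u, p1, p2)
    return best
```

In this toy game, backward induction without commitment would have the leader accommodate and the follower enter; committing to `fight` changes the follower's best response to `out`.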

Next, we describe the notion of a Stackelberg equilibrium where the leader can commit to a correlated strategy in a sequential game. The concept was suggested and investigated by Letchford et al. [18], but no formal definition exists. Formalizing such a definition below, we observe that the definition is essentially the “Stackelberg analogue” of the notion of Extensive-Form Correlated Equilibria (EFCE), introduced by von Stengel and Forges [26]. This parallel turns out to be technically relevant as well.

###### Definition 3 (Stackelberg Extensive-Form Correlated Equilibrium)

A probability distribution on pure strategy profiles is called a Stackelberg Extensive-Form Correlated Equilibrium (SEFCE) if it maximizes the leader’s utility (that is, ) subject to the constraint that whenever the play reaches a state where the follower can act, the follower is recommended an action according to such that the follower cannot gain by unilaterally deviating from in state (and possibly in all succeeding states), given the posterior on the probability distribution of the strategy of the leader, defined by the actions taken by the leader so far.

We give an example to illustrate both variants of the Stackelberg solution concept.

###### Example 1

Consider the game in Figure 1, where the follower moves first (in states ) and the leader second (in states ). By committing to a behavioral strategy, the leader can gain utility in the optimal case – leader commits to play left in state and right in . The follower will then prefer playing right in and left in , reaching the leaf with utilities . Note that the leader cannot gain more by committing to strictly mixed behavioral strategies.

Now, consider the case when the leader commits to correlated strategies. We interpret the probability distribution over strategy profiles as signals sent to the follower in each node where the follower acts, while the leader is committing to play with respect to and the signals sent to the follower. This can be shown in node , where the leader sends one of two signals to the follower, each with probability . In the first case, the follower receives the signal to move left, while the leader commits to play uniform strategy in and action left in reaching the utility value if the follower plays according to the signal. In the second case, the follower receives the signal to move right, while the leader commits to play right in and left in reaching the utility value if the follower plays according to the signal. By using this correlation, the leader is able to get the utility of , while ensuring the utility of for the follower; hence, the follower will follow the only recommendation in node to play left.

The situation can be visualized using a two-dimensional space, where the $x$-axis represents the utility of the follower and the $y$-axis represents the utility of the leader. This type of visualization was also used in [17] and we use it further in the proof of Theorem 2. While the black nodes correspond to the utility points of the leafs, the solid black lines correspond to outcomes when the leader randomizes between the leafs. The follower plays a best-response action in each node; hence, in order to force the follower to play action left in , the leader must guarantee the follower the utility of at least in the sub-game rooted in node since the follower can get at least this value by playing right in . Therefore, each state of the follower restricts the set of possible outcomes of the game. These restrictions are visualized as the vertical dashed lines: one corresponds to the described situation in node , and the second one is due to the leaf following node . Considering only commitments to behavioral strategies, the best of all possible outcomes for the leader is the point . With correlation, however, the leader can achieve a mixture of points and (the blue dashed line). This can also be interpreted as forming a convex hull over all possible outcomes in the sub-tree rooted in node . Note that without correlation, the set of all possible outcomes is not generally a convex set. Finally, after restricting this set of possible solutions due to leaf in node , the intersection point represents the expected utility for the Stackelberg Extensive-Form Correlated Equilibrium solution concept.

The example gives an intuition about the structure of the probability distribution in SEFCE. In each state of the follower, the leader sends a signal to the follower and commits to follow the correlated strategy if the follower follows the recommendation, while simultaneously committing to punish the follower for each deviation. This punishment is simply a strategy that minimizes the follower’s utility and will be useful in many proofs; next we introduce some notation for it.

Let $\sigma^m$ denote a behavioral strategy profile, where in each sub-game the leader plays a minmax behavioral strategy based on the utilities of the follower and the follower plays a best response. Moreover, for each state $s$, we denote by $\mu(s)$ the expected utility of the follower in the sub-game rooted in state $s$ if both players play according to $\sigma^m$ (i.e., the value of the corresponding zero-sum sub-game defined by the utilities of the follower).
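For turn-based games without chance, the follower's punishment value of every sub-game can be obtained by plain backward induction: the leader minimizes and the follower maximizes the follower's utility. The sketch below assumes that restricted setting (concurrent-move states would require solving a zero-sum matrix game in each state, and chance nodes would take expectations, neither of which is shown). The game encoding and names are illustrative assumptions.

```python
# Hypothetical turn-based tree without chance; leaves store (u1, u2).
GAME = {
    "root": {"player": 2, "children": {"in": "inc", "out": "z_out"}},
    "inc": {"player": 1, "children": {"fight": "z_fight", "acc": "z_acc"}},
    "z_out": {"u": (4, 1)},
    "z_fight": {"u": (0, 0)},
    "z_acc": {"u": (2, 2)},
}


def minmax_value(game, state):
    """Follower's utility when the leader punishes and the follower best-responds."""
    node = game[state]
    if "u" in node:
        return node["u"][1]  # only the follower's utility matters here
    values = [minmax_value(game, child) for child in node["children"].values()]
    # Player 1 (leader) minimizes the follower's utility; player 2 maximizes it.
    return min(values) if node["player"] == 1 else max(values)


def minmax_table(game):
    """Punishment value of the sub-game rooted in every state."""
    return {state: minmax_value(game, state) for state in game}
```

The recursion above recomputes sub-trees when building the table; a single bottom-up pass would avoid that and is the natural choice for larger inputs.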

Note that being a probability distribution over pure strategy profiles, a SEFCE is, a priori, an object of exponential size in the size of the description of the game, when it is described as a tree. This has to be dealt with before we can consider computing it. The following lemma gives a compact representation of the correlated strategies in a SEFCE and the proof yields an algorithm for constructing the probability distribution from the compact representation. It is this compact representation that we seek to compute.

###### Lemma 1

For any turn-based or concurrent-move game in tree form, there exists a SEFCE that can be compactly represented as a behavioral strategy profile such that and corresponds to the following behavior:

• the follower receives signals in each state according to for each action

• the leader chooses the action in each state according to for each action if the state was reached by following the recommendations

• both players switch to the minmax strategy after a deviation by the follower.

#### Proof:

Let be a SEFCE. We construct the behavioral strategy profile from and then show how an optimal strategy can be constructed from and .

To construct , it is sufficient to specify a probability for each action in each state . We use the probability of state being reached (denoted ) that corresponds to the sum of pure strategy profiles such that the actions in strategy profile allow state to be reached.

Formally, there exists a sequence of states and actions (starting at the root), such that for every it holds that , (or is the next decision node of some player if is a chance node), , and . Let denote a set of pure strategy profiles for which such a sequence exists for state , and the strategy profiles that not only reach , but also prescribe action to be played in state . We have:

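$$\sigma(a) = \frac{\sum_{\pi' \in \Pi(s,a)} \phi'(\pi')}{\phi'(s)}, \quad \text{where } \phi'(s) = \sum_{\pi' \in \Pi(s)} \phi'(\pi')$$

In case $\phi'(s) = 0$, we set the behavior strategy in $s$ arbitrarily.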
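The construction amounts to conditioning the distribution over pure strategy profiles on the event that a state is reached. A minimal sketch for a tree without chance nodes follows; the game, the distribution, and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical tree without chance; only the shape matters for this example.
GAME = {
    "root": {"children": {"in": "inc", "out": "z_out"}},
    "inc": {"children": {"fight": "z_fight", "acc": "z_acc"}},
    "z_out": {}, "z_fight": {}, "z_acc": {},
}

# A distribution over pure strategy profiles, given as (profile, probability) pairs.
PHI = [
    ({"root": "in", "inc": "acc"}, F(1, 2)),
    ({"root": "in", "inc": "fight"}, F(1, 4)),
    ({"root": "out", "inc": "fight"}, F(1, 4)),
]


def reaches(game, profile, target, state="root"):
    """True if following the profile from the root passes through target."""
    while state != target:
        children = game[state].get("children")
        if not children:
            return False
        state = children[profile[state]]
    return True


def behavioral_probability(game, phi, s, a):
    """Probability of action a in state s, conditioned on s being reached.

    Returns None when s is reached with probability zero, in which case
    the behavior in s can be chosen arbitrarily.
    """
    reach = sum(p for profile, p in phi if reaches(game, profile, s))
    if reach == 0:
        return None
    hit = sum(p for profile, p in phi if reaches(game, profile, s) and profile[s] == a)
    return hit / reach
```

Here the state `inc` is reached with probability 3/4, of which 1/2 prescribes `acc`, so the conditional probability of `acc` is 2/3.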

Next, we construct a strategy that corresponds to the desired behavior and show that it is indeed an optimal SEFCE strategy. We need to specify a probability for every pure strategy profile . Consider the sequence of states and actions that corresponds to executing the actions from the strategy profile . Let be one of possible sequences of states and actions (there can be multiple such sequences due to chance nodes), such that , , (or is one of the next decision nodes of some player immediately following the chance node(s) ), , and . The probability for the strategy profile corresponds to the probability of executing the sequences of actions multiplied by the probability that the remaining actions prescribe minmax strategy in case the follower deviates:

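$$\phi(\pi) = \left( \prod_{l=1}^{q} \prod_{j=0}^{k_l - 1} \sigma(a^l_j) \right) \cdot \prod_{a' = \pi(s') \,\mid\, s' \in S \setminus \{ s^1_0, \ldots, s^1_{k_1 - 1}, s^2_0, \ldots, s^q_{k_q - 1} \}} \sigma^m(a').$$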

#### Correctness

By construction of and , it holds that probability distribution over leafs remains the same as in ; hence, and thus the expected utility of for the players is the same as in .

Second, we have to show that the follower has no incentive to deviate from the recommendations in . By deviating to some action in state , the follower gains , since both players play according to after a deviation. In , the follower can get for the same deviation at best some utility value , which by the definition of the minmax strategies is greater or equal than . Since the expected utility of the follower for following the recommendations is the same in as in , and the follower has no incentive to deviate in because of the optimality, the follower has no incentive to deviate in either.

## 3 Computing Exact Strategies in Turn-Based Games

We start our computational investigation with turn-based games.

###### Theorem 1

There is an algorithm that takes as input a turn-based game in DAG form with no chance nodes and outputs a Stackelberg equilibrium in pure strategies. The algorithm runs in time .

#### Proof:

Our algorithm performs three passes through all the nodes in the graph.

First, the algorithm computes the minmax values of the follower for each node in the game by backward induction.

Second, the algorithm computes a capacity for each state in order to determine which states of the game are reachable (i.e., there exists a commitment of the leader and a best response of the follower such that the state can be reached by following their strategies). The capacity of state , denoted , is defined as the minimum utility of the follower that needs to be guaranteed by the outcome of the sub-game starting in state in order to make this state reachable. By convention and we initially set and mark them as open.

Third, the algorithm evaluates each open state , for which all parents have been marked as closed. We distinguish whether the leader, or the follower makes the decision:

• is a leader node: the algorithm sets for all children ;

• is a follower node: the algorithm sets for all children .

Finally, we mark state as closed.

We say that leaf is a possible outcome, if . Now, the solution is such a possible outcome that maximizes the utility of the leader, i.e. . The strategy is now constructed by following nodes from leaf back to the root while using nodes with capacities . Due to the construction of capacities, such a path exists and forms a part of the Stackelberg strategy. The leader commits to the strategy leading to utility for the follower in the remaining states that are not part of this path.

#### Complexity Analysis

Computing the values can be done in by backward induction due to the fact the graph is a DAG. In the second pass, the algorithm solves the widest-path problem from a single source to all leafs. In each node, the algorithm calculates capacities for every child. In nodes where the leader acts, there is a constant-time operation performed for each child. However, we need to be more careful in nodes where the follower acts. For each child the algorithm computes a maximum value of all of the siblings. We can do this efficiently by computing two maximal values of for all (say ) and for each child then the term equals either to if , or to if . Therefore, the second pass can again be done in . Finally, finding the optimal outcome and constructing the optimal strategy is again at most linear in the size of the graph. Therefore the algorithm takes at most steps.
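The "two maximal values" device used for follower nodes is a general trick for computing, for every element of a list, the maximum over all other elements in linear total time. A minimal sketch, with a hypothetical function name:

```python
def max_excluding_each(values):
    """For every index i, return max(values[j] for j != i), in O(n) total.

    Only the largest and second-largest entries are needed: every index
    sees the largest value, except the index holding it, which sees the
    second-largest one.
    """
    n = len(values)
    if n < 2:
        return [float("-inf")] * n  # no sibling to compare against
    first = max(range(n), key=values.__getitem__)
    second = max(values[j] for j in range(n) if j != first)
    return [second if i == first else values[first] for i in range(n)]
```

For instance, with sibling values `[3, 7, 5]` the result is `[7, 5, 7]`. Ties are handled naturally: with `[4, 4]` both entries see `4`.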

Next we provide an algorithm for computing a Stackelberg extensive-form correlated equilibrium for turn-based games with no chance nodes.

###### Theorem 2

There is an algorithm that takes as input a turn-based game in tree form with no chance nodes and outputs an SEFCE in the compact representation. The algorithm runs in time .

#### Proof:

We improve the algorithm from the proof of Theorem 4 in [17]. The algorithm contains two steps: (1) a bottom-up dynamic program that for each node computes the set of possible outcomes, (2) a downward pass constructing the optimal correlated strategy in the compact representation.

For each node $s$ we keep a set of points $H_s$ in two-dimensional space, where the $x$-dimension represents the utility of the follower and the $y$-dimension represents the utility of the leader. These points define the convex set of all possible outcomes of the sub-game rooted in node $s$ (we assume that $H_s$ contains only the points on the boundary of the convex hull). We keep each set $H_s$ sorted by polar angle.

#### Upward pass

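In leaf $z$, we set $H_z = \{(u_2(z), u_1(z))\}$. In nodes $s$ where the leader acts, the set of points $H_s$ is equal to the convex hull of the corresponding sets $H_w$ of the children $w$. That is, $H_s = \mathrm{Conv}\left(\bigcup_{w \in T(s)} H_w\right)$.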

In nodes where the follower acts, the algorithm performs two steps. First, the algorithm removes from each set of child the outcomes from which the follower has an incentive to deviate. To do this, the algorithm uses the maxmin values of all other children of except and creates a new set that we call the restricted set. The restricted set is defined as an intersection of the convex set representing all possible outcomes and all outcomes defined by the halfspace restricting the utility of the follower by the inequality:

$$x \geq \max_{w' \in T(s);\ w' \neq w}\ \min_{p' \in H_{w'}} u_2(p').$$

Second, the algorithm computes the set $H_s$ by creating a convex hull of the corresponding restricted sets of the children $w$.

Finally, in the root of the game tree, the outcome of the Stackelberg Extensive-Form Correlated Equilibrium is the point of $H_{s_{root}}$ with maximal payoff of player 1.
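The upward pass can be sketched as follows. This is a simplified illustration, not the linear-time procedure analyzed below: it recomputes hulls with a standard monotone-chain routine in O(n log n) rather than merging pre-sorted boundaries. Points are (follower utility, leader utility) pairs; the function names and the toy instance are assumptions.

```python
from fractions import Fraction as F


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull; returns the boundary vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def clip_x_at_least(polygon, c):
    """Intersect a convex polygon (vertex list) with the half-plane x >= c."""
    out = []
    n = len(polygon)
    for i in range(n):
        p, q = polygon[i], polygon[(i + 1) % n]
        if p[0] >= c:
            out.append(p)
        if (p[0] - c) * (q[0] - c) < 0:  # the edge p-q crosses the line x = c
            t = (c - p[0]) / (q[0] - p[0])
            out.append((c, p[1] + t * (q[1] - p[1])))
    return convex_hull(out)


def leader_node(child_sets):
    """Outcomes at a leader node: hull of the union of the children's sets."""
    return convex_hull([p for H in child_sets for p in H])


def follower_node(child_sets):
    """Outcomes at a follower node: hull of the restricted children's sets."""
    restricted = []
    for i, H in enumerate(child_sets):
        # Threat level: the best guarantee the follower has among the siblings.
        threat = max((min(p[0] for p in other)
                      for j, other in enumerate(child_sets) if j != i),
                     default=float("-inf"))
        restricted += clip_x_at_least(H, threat)
    return convex_hull(restricted)


# Toy instance: a follower node whose first child is a leader node over two
# leaves, and whose second child is a single leaf.
H_w1 = leader_node([[(F(0), F(4))], [(F(2), F(1))]])
H_w2 = [(F(1), F(0))]
H_root = follower_node([H_w1, H_w2])
best_for_leader = max(H_root, key=lambda p: p[1])
```

In the toy instance, the leaf guaranteeing the follower utility 1 cuts the leader's segment at x = 1, so the leader's best attainable point is (1, 5/2).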

#### Downward pass

We now construct the compact representation of commitment to correlated strategies that ensures the outcome calculated in the upward pass. The method for determining the optimal strategy in each node is similar to the method used in the proof of Theorem 4 in [17].

Given a node $s$ and a point $p$ that lies on the boundary of $H_s$, this method specifies how to commit to correlated strategies in the sub-tree rooted in node $s$. Moreover, the proof in [17] also showed that it is sufficient to consider mixtures of at most two actions in each node, and allowing correlated strategies does not violate their proof. We consider leader and follower nodes separately:

For each node where the leader acts, the algorithm needs to find two points in the boundaries of children and , such that the desired point is a convex combination of and . If , then the strategy in node is to commit to pure strategy leading to node . If , then the strategy to commit to in node is a mixture: with probability to play action leading to and with probability to play action leading to , where is such that . Finally, for every child we call the method strategy with appropriate (or ) in case (or ), and with the threat value corresponding to for every other child.

For each node where the follower acts, the algorithm again needs to find two points in the restricted boundaries of children and , such that the desired point is a convex combination of and . The reason for using the restricted sets is because the follower must not have an incentive to deviate from the recommendation.

Similarly to the previous case, if , then the correlated strategy in node is to send the follower signal leading to node while committing further to play in sub-tree rooted in node , and to play the minmax strategy in every other child corresponding to value .

If , then there is a mixture of possible signals: with probability the follower receives a signal to play the action leading to and with probability signal to play the action leading to , where is again such that . As before, by sending the signal to play certain action, the leader commits to play method (or ) in sub-tree rooted in node (or ) and committing to play the minmax strategy leading to value for every other child .
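In both the leader and the follower case, the mixing probability is determined by writing the desired point as a convex combination of the two chosen points. A minimal sketch of that computation, using exact rationals; the function name and the convention that the weight is placed on the first point are assumptions for this example.

```python
from fractions import Fraction


def mixing_weight(p, p1, p2):
    """Return alpha in [0, 1] with p == alpha * p1 + (1 - alpha) * p2.

    Returns None when p does not lie on the segment between p1 and p2.
    Coordinates are expected to be ints or Fractions.
    """
    if p1 == p2:
        return Fraction(1) if p == p1 else None
    # Solve along a coordinate in which the two endpoints differ.
    axis = 0 if p1[0] != p2[0] else 1
    alpha = Fraction(p[axis] - p2[axis], p1[axis] - p2[axis])
    on_line = all(alpha * a + (1 - alpha) * b == c for a, b, c in zip(p1, p2, p))
    return alpha if on_line and 0 <= alpha <= 1 else None
```

For example, the point (1, 5/2) is the even mixture of (0, 4) and (2, 1), so `mixing_weight((1, Fraction(5, 2)), (0, 4), (2, 1))` evaluates to `Fraction(1, 2)`.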

#### Correctness

Due to the construction of the set of points $H_s$ that are maintained for each node $s$, these points correspond to the convex hull of all possible outcomes in the sub-game rooted in node $s$. In leafs, the algorithm adds the point corresponding to the leaf. In the leader’s nodes, the algorithm creates convex combinations of all possible outcomes in the children of the node. The only places where the algorithm removes some outcomes from these sets are nodes of the follower. If a point is removed in node $s$, there exists an action of the follower in $s$ that guarantees the follower a strictly better expected payoff than the expected payoff of the outcome that corresponds to the removed point. Therefore, such an outcome is not possible as the follower will have an incentive to deviate. The outcome selected in the root node is the possible outcome that maximizes the payoff of the leader among all possible outcomes; hence, it is optimal for the leader. Finally, the downward pass constructs the compact representation of the optimal correlated strategy to commit to that reaches the optimal outcome.

#### Complexity Analysis

Computing boundary of the convex hull takes time in each level of the game tree since the children sets are already sorted [10, p. 6]. Moreover, since we keep only nodes on the boundary of the convex hull, the inequality for all nodes in a single level of the game tree also bounds the number of lines that need to be checked in the downward pass. Therefore, each pass takes at most time.

Interestingly, the algorithm described in the proof of Theorem 2 can also be modified to handle games that contain chance, as shown in the next theorem. This is in contrast to computing a Stackelberg equilibrium, which is NP-hard with chance.

###### Theorem 3

There is an algorithm that takes as input a turn-based game in tree form with chance nodes and outputs the compact form of an SEFCE for the game. The algorithm runs in time .

#### Proof:

We can use the proof from Theorem 2, but need to analyze what happens in chance nodes in the upward pass. The algorithm computes in chance nodes the Minkowski sum of all convex sets in child nodes and since all sets are sorted and this is a planar case, this operation can be again performed in linear time [10, p. 279]. The size of set is again bounded by the number of all leafs [12].
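A small sketch of the chance-node step follows. It assumes, as our reading of the construction, that each child's set is scaled by the probability of the corresponding chance outcome before the Minkowski sum is taken, since the outcome at a chance node is the expectation over its children. For brevity it forms all pairwise sums and takes their hull, which is quadratic rather than the linear-time merge of sorted boundaries cited above. All names are illustrative.

```python
from fractions import Fraction as F


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Monotone-chain convex hull; returns the boundary vertices."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_sum(P, Q):
    """Minkowski sum of two convex vertex sets (naive pairwise version)."""
    return convex_hull([(p[0] + q[0], p[1] + q[1]) for p in P for q in Q])


def chance_node(children):
    """children: list of (probability, vertex set) pairs, one per chance outcome."""
    acc = [(0, 0)]
    for prob, H in children:
        acc = minkowski_sum(acc, [(prob * x, prob * y) for x, y in H])
    return acc


# Two equally likely children, each contributing a segment of outcomes.
H_chance = chance_node([
    (F(1, 2), [(0, 0), (2, 2)]),
    (F(1, 2), [(0, 4), (4, 0)]),
])
```

The sum of two non-parallel segments is a parallelogram, which is what the toy instance produces.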

## 4 Computing Exact Strategies in Concurrent-Move Games

Next we analyze concurrent-move games and show that while the problem of computing a Stackelberg equilibrium in behavior strategies is NP-hard (even without chance nodes), the problem of computing a Stackelberg extensive-form correlated equilibrium can be solved in polynomial time.

###### Theorem 4

Given a concurrent-move game in tree form with no chance nodes and a number , it is NP-hard to decide if the leader achieves payoff at least in a Stackelberg equilibrium in behavior strategies.

The proof of the above hardness result is included in the appendix, Section A.1; the proof uses a reduction from the NP-complete problem Knapsack.

###### Theorem 5

For a concurrent-move game in tree form, the compact form of an SEFCE for the game can be found in polynomial time by solving a single linear program.

#### Proof:

We construct a linear program (LP) based on the LP for computing Extensive-Form Correlated Equilibria (EFCE) [26]. We use the compact representation of SEFCE strategies (described by Lemma 1) represented by variables $\delta(s)$ that denote the joint probability that state $s$ is reached when both players, and chance, play according to SEFCE strategies.

The size of the original EFCE LP—both the number of variables and constraints—is quadratic in the number of sequences of players. However, the LP for EFCE is defined for a more general class of imperfect-information games without chance. In our case, we can exploit the specific structure of a concurrent-move game and together with the Stackelberg assumption reduce the number of constraints and variables.

First, the deviation from a recommended strategy causes the game to reach a different sub-game in which the strategy of the leader can be chosen (almost) independently to the sub-game that follows the recommendation.

Second, the strategy that the leader should play after a deviation is a minmax strategy, with which the leader punishes the follower by minimizing the utility of the follower as much as possible. Thus, by deviating to action $a'_2$ in state $s$, the follower can get at best the minmax value of the sub-game starting in node $T(s, a_1 \times a'_2)$, which we denote as $\mu(T(s, a_1 \times a'_2))$. The values $\mu(s)$ for each state $s \in S$ can be computed beforehand using backward induction.

The linear program is as follows.

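$$
\begin{aligned}
\max_{\delta, v_2}\quad & \sum_{z \in Z} \delta(z)\, u_1(z) && && (1)\\
\text{subject to:}\quad & \delta(s_{root}) = 1 && && (2)\\
& 0 \le \delta(s) \le 1 && \forall s \in S && (3)\\
& \delta(s) = \sum_{s' \in T(s)} \delta(s') && \forall s \in S;\ \rho(s) = \{1,2\} && (4)\\
& \delta(T(s, a_c)) = \delta(s)\, C(s, a_c) && \forall s \in S\ \forall a_c \in A_c(s);\ \rho(s) = \{c\} && (5)\\
& v_2(z) = u_2(z)\, \delta(z) && \forall z \in Z && (6)\\
& v_2(s) = \sum_{s' \in T(s)} v_2(s') && \forall s \in S && (7)\\
& \sum_{a_1 \in A_1(s)} v_2(T(s, a_1 \times a_2)) \ge \sum_{a_1 \in A_1(s)} \delta(T(s, a_1 \times a_2))\, \mu(T(s, a_1 \times a'_2)) && \forall s \in S\ \forall a_2, a'_2 \in A_2(s) && (8)
\end{aligned}
$$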

The interpretation is as follows. Variables $\delta$ represent the compact form of the correlated strategies.

Equation (2) ensures that the probability of reaching the root state is $1$, while Equation (3) ensures that for each state $s$, we have $\delta(s)$ between $0$ and $1$.

Network-flow constraints: the probability of reaching a state equals the sum of probabilities of reaching all possible children (Equation (4)) and it must correspond with the probability of actions in chance nodes (Equation (5)). The objective function ensures that the LP finds a correlated strategy that maximizes the leader’s utility.

The follower has no incentive to deviate from the recommendations given by $\delta$. To this end, variables $v_2(s)$ represent the expected payoff for the follower in the sub-game rooted in node $s$ when played according to $\delta$, as defined by Equations (6)-(7). Each action that is recommended by $\delta$ must guarantee the follower at least the utility she gets by deviating from the recommendation. This is ensured by Equation (8), where the expected utility for the recommended action $a_2$ is expressed by the left side of the constraint, while the expected utility for deviating to $a'_2$ is expressed by the right side of the constraint.

Note that the expected utility on the right-hand side of Equation (8) is calculated by considering the posterior probability after receiving the recommendation $a_2$ and the minmax values of the children states after playing $a'_2$, namely $\mu(T(s, a_1 \times a'_2))$.

Therefore, the variables $\delta$ found by solving this linear program correspond to the compact representation of the optimal SEFCE strategy.
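To make constraint (8) concrete, here is a small illustrative instance (not taken from the paper). Assume a game with a single concurrent-move state $s_{root}$, leader actions $A_1 = \{U, D\}$, follower actions $A_2 = \{l, r\}$, and four leaves $z_{a_1 a_2} = T(s_{root}, a_1 \times a_2)$. The names $U, D, l, r$ are hypothetical, and we assume that the minmax value of a leaf is simply the follower's payoff there, $\mu(z) = u_2(z)$. Substituting (6) into (8) for the recommendation $a_2 = l$ and the deviation $a'_2 = r$ gives

$$
\delta(z_{Ul})\, u_2(z_{Ul}) + \delta(z_{Dl})\, u_2(z_{Dl}) \;\ge\; \delta(z_{Ul})\, u_2(z_{Ur}) + \delta(z_{Dl})\, u_2(z_{Dr}),
$$

together with the flow constraint $\delta(z_{Ul}) + \delta(z_{Ur}) + \delta(z_{Dl}) + \delta(z_{Dr}) = \delta(s_{root}) = 1$ from (2) and (4). The constraint is linear in $\delta$ because both sides are weighted by the unnormalized probabilities of the recommendation $l$, so no division by the posterior is needed.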

## 5 Approximating Optimal Strategies

In this section, we describe fully polynomial-time approximation schemes for finding a Stackelberg equilibrium in behavioral strategies as well as in pure strategies for turn-based games on trees with chance nodes.

We start with the problem of computing behavioral strategies for turn-based games on trees with chance nodes.

###### Theorem 6

There is an algorithm that takes as input a turn-based game on a tree with chance nodes and a parameter , and computes a behavioral strategy for the leader. That strategy, combined with some best response of the follower, achieves a payoff that differs by at most from the payoff of the leader in a Stackelberg equilibrium in behavioral strategies. The algorithm runs in time , where , is the size of the game tree and is its height.

#### Proof:

The exact version of this problem was shown to be NP-hard by Letchford and Conitzer [17]. Their hardness proof was a reduction from Knapsack, and our algorithm is closely related to the classical approximation scheme for this problem. We present the algorithm here and defer the proof of correctness to the appendix.

Our scheme uses dynamic programming to construct a table of values for each node in the tree. Each table contains a discretized representation of the possible tradeoffs between the utility that the leader can get and the utility that can at the same time be offered to the follower. In the appendix, we show that the cumulative error in the leader’s utility is bounded additively by the height of the tree; it does not depend on the magnitude of the utilities. By initially scaling the leader’s utility by a suitable factor, the error can be made arbitrarily small, at the cost of extra computation time. This scaling is equivalent to discretizing the leader’s payoff to multiples of some small step. For simplicity, we only describe the scheme for binary trees, since nodes with higher branching factor can be replaced by small equivalent binary trees.

An important property is that only the leader’s utility is discretized, since we need to be able to reason correctly about the follower’s actions. The tables are indexed by the leader’s utility and contain values that are the follower’s utility. More formally, for each sub-tree $T$ we will compute a table $A_T$ with the following guarantee for each index $k$:

1. the leader has a strategy for the game tree $T$ that offers the follower utility $A_T[k]$ while securing utility at least $k$ to the leader.

2. no strategy of the leader can (starting from sub-tree $T$) offer the follower utility strictly more than $A_T[k]$, while securing utility at least $k + H_T$ to the leader, where $H_T$ is the height of the tree $T$.

This also serves as our induction hypothesis for proving correctness. For commitment to pure strategies, a similar table is used with the same guarantee, except quantifying over pure strategies instead.

We will now examine each type of node, and for each show how the table is constructed. For each node $T$, we let $L$ and $R$ denote the two successors (if any), and we let $A_T$, $A_L$, and $A_R$ denote their respective tables. All tables have the same number of entries.

#### If T is a leaf

with utility $(u_1, u_2)$, the table can be filled directly from the definition:

$$
A_T[k] := \begin{cases} u_2, & \text{if } k \le u_1\\ -\infty, & \text{otherwise} \end{cases}
$$

Both parts of the induction hypothesis are trivially satisfied by this.
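As a minimal sketch of the leaf case, assume that table indices $k = 0, \dots, n-1$ coincide with the (already scaled, non-negative integer) leader utilities; the function name `leaf_table` and the parameter `n` are illustrative and not from the paper.

```python
NEG_INF = float("-inf")

def leaf_table(u1, u2, n):
    """Table for a leaf with utilities (u1, u2).

    Assumption: index k stands for 'leader utility at least k',
    with k ranging over 0..n-1 in the scaled game.
    """
    # The follower can be offered u2 as long as the demanded
    # leader utility k does not exceed u1; otherwise infeasible.
    return [u2 if k <= u1 else NEG_INF for k in range(n)]

# A leaf giving the leader 0 and the follower 4:
# leaf_table(0, 4, 3) -> [4, -inf, -inf]
```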

#### If T is a leader node,

and the leader plays $L$ with probability $p$, followed up by the strategies that gave the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1-p)j$, while being able to offer $pA_L[i] + (1-p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed tradeoffs becomes: $A_T[k] := \max_{i,j,p}\{pA_L[i] + (1-p)A_R[j] \mid pi + (1-p)j \ge k\}$. This table can be computed by looping over all $i$, $j$, and $k$, and taking the maximum with the extremal feasible values of $p$.
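The leader-node rule can be sketched as a triple loop that, for each $(i, j, k)$, evaluates the objective only at the endpoints of the feasible interval for $p$. This is an illustrative sketch under the assumption that indices equal scaled integer leader utilities; `leader_table` and `mix` are hypothetical names, and an entry of $-\infty$ marks an infeasible tradeoff.

```python
NEG_INF = float("-inf")

def mix(p, a, b):
    """Follower utility p*a + (1-p)*b; a -inf branch is harmless
    only if it is played with probability zero."""
    if p == 1:
        return a
    if p == 0:
        return b
    if a == NEG_INF or b == NEG_INF:
        return NEG_INF
    return p * a + (1 - p) * b

def leader_table(A_L, A_R):
    n = len(A_L)
    A_T = [NEG_INF] * n
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Feasible p satisfy p*i + (1-p)*j >= k: an interval in [0, 1].
                if i >= k and j >= k:
                    endpoints = (0.0, 1.0)
                elif i >= k:      # j < k <= i: need p >= (k-j)/(i-j)
                    endpoints = ((k - j) / (i - j), 1.0)
                elif j >= k:      # i < k <= j: need p <= (j-k)/(j-i)
                    endpoints = (0.0, (j - k) / (j - i))
                else:
                    continue      # no feasible p
                # A linear objective attains its maximum at an endpoint.
                for p in endpoints:
                    A_T[k] = max(A_T[k], mix(p, A_L[i], A_R[j]))
    return A_T

# Leaves with utilities (0, 4) and (2, 0):
# leader_table([4, NEG_INF, NEG_INF], [0, 0, 0]) -> [4, 2.0, 0]
```

In the example, securing leader utility 1 forces the leader to play the right child with probability at least one half, which halves what can be offered to the follower.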

#### If T is a chance node,

where the probability of $L$ is $p$, and the leader combines the strategies that gave the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1-p)j$ while being able to offer $pA_L[i] + (1-p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed tradeoffs becomes: $A_T[k] := \max_{i,j}\{pA_L[i] + (1-p)A_R[j] \mid pi + (1-p)j \ge k\}$. The table can thus be filled by looping over all $i$, $j$, and $k$, and the running time can even be improved by a simple optimization.
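A direct, unoptimized sketch of the chance-node rule follows. It assumes the chance probability $p$ lies strictly between 0 and 1 and uses exact rationals for the feasibility test; `chance_table` is a hypothetical name, and the simple optimization mentioned above is not implemented.

```python
from fractions import Fraction

NEG_INF = float("-inf")

def chance_table(A_L, A_R, p):
    """Combine child tables at a chance node that moves to L with probability p."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    n = len(A_L)
    A_T = [NEG_INF] * n
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Leader's expected utility must reach k.
                if p * i + (1 - p) * j < k:
                    continue
                # Both children are reached, so both must be feasible.
                if A_L[i] == NEG_INF or A_R[j] == NEG_INF:
                    continue
                A_T[k] = max(A_T[k], p * A_L[i] + (1 - p) * A_R[j])
    return A_T

# Leaves (0, 4) and (2, 0), each reached with probability 1/2:
# chance_table([4, NEG_INF, NEG_INF], [0, 0, 0], Fraction(1, 2)) -> [2, 2, -inf]
```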

#### If T is a follower node,

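then if the leader combines the strategy for $A_L[k]$ in $L$ with the minmax strategy for $R$, then the follower’s best response is $L$ iff $A_L[k] \ge \mu(R)$, and similarly it is $R$ if $A_R[k] \ge \mu(L)$. Thus, the optimal combination becomes

$$
A_T[k] := \max\bigl(A_L[k]\downarrow_{\mu(R)},\ A_R[k]\downarrow_{\mu(L)}\bigr), \qquad x\downarrow_{\mu} := \begin{cases} x, & \text{if } x \ge \mu\\ -\infty, & \text{otherwise} \end{cases}
$$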

The table can be filled in time .
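The follower-node rule is a single pass over the two child tables. In the sketch below, `mu_L` and `mu_R` stand for the follower's minmax values $\mu(L)$ and $\mu(R)$, assumed to be precomputed; the function names are illustrative.

```python
NEG_INF = float("-inf")

def threshold(x, mu):
    """x restricted by mu: keep x only if the follower prefers it to the threat value mu."""
    return x if x >= mu else NEG_INF

def follower_table(A_L, A_R, mu_L, mu_R):
    # Going left is a best response only if it beats the punishment in R,
    # and symmetrically for going right.
    return [max(threshold(a_l, mu_R), threshold(a_r, mu_L))
            for a_l, a_r in zip(A_L, A_R)]

# Leaves (0, 4) and (2, 0), so mu_L = 4 and mu_R = 0:
# follower_table([4, NEG_INF, NEG_INF], [0, 0, 0], 4, 0) -> [4, -inf, -inf]
```

In the example the follower can never be induced to go right, since the left leaf guarantees her 4 regardless of what the leader does.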

Putting it all together, each table can be computed in time , and there is one table for each node in the tree, which gives the desired running time. Let be the table for the root node, and let . The strategy associated with guarantees utility that is at most from the best possible guarantee in the scaled game, and therefore at most from the best possible guarantee in the original game.

This completes the proof of the theorem.

Next, we prove the analogous statement for the case of pure strategies. Again, the exact problem was shown to be NP-hard by Letchford and Conitzer [17].

###### Theorem 7

There is an algorithm that takes as input a turn-based game on a tree with chance nodes and a parameter , and computes a pure strategy for the leader. That strategy, combined with some best response of the follower, achieves a payoff that differs by at most from the payoff of the leader in a Stackelberg equilibrium in pure strategies. The algorithm runs in time , where , is the size of the game tree and is its height.

#### Proof:

The algorithm is essentially the same as the one for behavioral strategies, except that leader nodes only have $p \in \{0, 1\}$. The induction hypothesis is similar, except that the quantifications are over pure strategies instead. For a given $k$, the optimal combination of the computed tradeoffs becomes:

$$
A_T[k] := \max\{A_c[i] \mid i \ge k \wedge c \in \{L, R\}\}.
$$

The table can be computed in time .
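The pure-strategy leader rule is a suffix maximum over both child tables, which a single backward pass computes. A minimal sketch, with the hypothetical name `pure_leader_table`:

```python
NEG_INF = float("-inf")

def pure_leader_table(A_L, A_R):
    """A_T[k] = max of A_c[i] over i >= k and c in {L, R}."""
    n = len(A_L)
    A_T = [NEG_INF] * n
    best = NEG_INF
    # Walk from the largest index down, carrying the running maximum.
    for k in range(n - 1, -1, -1):
        best = max(best, A_L[k], A_R[k])
        A_T[k] = best
    return A_T

# Leaves (0, 4) and (2, 0):
# pure_leader_table([4, NEG_INF, NEG_INF], [0, 0, 0]) -> [4, 0, 0]
```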

The performance of the algorithm is slightly better than in the behavioral case, since the most expensive type of node in the behavioral case can now be handled in linear time. Thus, computing each table now takes at most time, which gives the desired running time.

## 6 Discussion

Our paper settles several open questions on the complexity of computing a Stackelberg equilibrium in finite sequential games. The problem is NP-hard for many subclasses of extensive-form games, and we show that the hardness also holds for games in tree form with concurrent moves. However, there are important subclasses that admit either an efficient polynomial algorithm or a fully polynomial-time approximation scheme (FPTAS); we provide an FPTAS for games on trees with chance. A question left unanswered within the scope of the paper is whether there exists a (fully) polynomial-time approximation scheme for games in tree form with concurrent moves. Our conjecture is that the answer is negative.

Second, we formalize a Stackelberg variant of the Extensive-Form Correlated Equilibrium solution concept (SEFCE) where the leader commits to correlated strategies. We show that the complexity of the problem is often reduced (to polynomial) compared to NP-hardness when the leader commits to behavioral strategies. However, this does not hold in general, as shown by our hardness result for games on DAGs.

Our paper does not address many other variants of computing a Stackelberg equilibrium where the leader commits to correlated strategies. First of all, we consider only two-player games with one leader and one follower. Even though computing an Extensive-Form Correlated Equilibrium in games with multiple players is solvable in polynomial time, a recent result showed that computing an SEFCE on trees with no chance with 3 or more players is NP-hard [4]. Second, we consider only behavioral strategies (or memoryless strategies) in games on DAGs. Extending the concept of SEFCE to strategies that can use some fixed-size memory is a natural continuation of the present work.

## Appendix A: Hardness Results

In this section we provide the missing proof of NP-hardness.

### A.1 Computing Exact Strategies in Concurrent-Move Games

For the analysis in this section we use a variant of the NP-complete problem Knapsack, which we call Knapsack with unit-items:

Knapsack with unit-items:Given items with positive integer weights and values , a weight budget , and a target value , and such that at least of the items have weight and value 1, does there exist such that and ?

The following lemma will be useful.

###### Lemma 2

The Knapsack with unit-items problem is NP-complete.

#### Proof:

We can reduce from the ordinary Knapsack problem. So given items with weights and values , and weight budget and target , we form items. The weight and values of the first items are given by and , for . The next items are given weight and value 1. The weight budget is unchanged , but the new target value is .

We can now prove the main result of this section.

Theorem 4 (restated). Given a concurrent-move game in tree form with no chance nodes and a target value, it is NP-hard to decide whether the leader achieves payoff at least this value in a Stackelberg equilibrium in behavior strategies.

#### Proof:

Consider an instance of Knapsack with unit-items. We define a concurrent-move extensive-form game in a way so that the optimal utility attainable by the leader is equal to the optimal solution value of the Knapsack with unit-items instance.

The game tree consists of two levels (see Figure 2)—the root node consisting of actions of the leader and actions of the follower. denotes a large constant that we use to force the leader to select a uniform strategy in the root node. More precisely, we choose as the smallest integer such that and for . In the second level, there is a state corresponding to item that models the decision of the leader to include items in the subset (action ), or not (action ).

Consider a feasible solution to the Knapsack with unit-items problem with unit-items. This translates into a strategy for the leader as follows. In the root node she plays the uniform strategy, and in sub-game plays with probability 1 if and plays with probability 1 otherwise. We can now observe that the follower plays in sub-games where , since ties are broken in favor of the leader, and the follower plays in sub-games where . In the root node, action for the follower thus leads to payoff . Actions for leads to payoff

$$
\frac{1}{N}(NM - W - M) + \frac{N-1}{N}(-W - M) = -W.
$$
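As a quick check of this identity, expanding both terms and collecting the multiples of $(W + M)$ gives

$$
\frac{1}{N}(NM - W - M) + \frac{N-1}{N}(-W - M) = M - \frac{W + M}{N} - \frac{(N-1)(W + M)}{N} = M - (W + M) = -W.
$$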

Since ties are broken in favor of the leader, the follower plays action , which means that the leader receives payoff , which is the value of the Knapsack with unit-items solution.

Consider, on the other hand, an optimal strategy for the leader. By the structure of the game we have the following claim.

###### Claim 1

Without loss of generality the leader plays using a pure strategy in each sub-game .

#### Proof:

If in sub-game the leader commits to playing with probability 1, the follower will choose to play L due to ties being broken in favor of the leader. If on the other hand the leader plays with probability strictly lower than 1, the follower will choose to play R, leading to utility 0 for the leader, and at most 0 for the follower. Since the leader can only obtain positive utility if the follower plays action in the root node, there is thus no benefit for the leader in decreasing the utility for the follower by committing to a strictly mixed strategy. In other words, if the leader plays with probability strictly lower than 1, the leader might as well play with probability 0.

Thus from now on we assume that the leader plays using a pure strategy in each sub-game . Let $J$ be the set of indices of the sub-games where the leader commits to action .

###### Claim 2

If the strategy of the leader ensures positive utility it chooses an action uniformly at random in the root node.

#### Proof:

Let be such that the leader commits to playing action with probability . Then if the follower plays action , the leader obtains payoff

$$
\sum_{i \in J} v_i + N \sum_{i \in J} \varepsilon_i v_i
$$

and the follower obtains payoff

$$
-\sum_{i \in J} w_i - N \sum_{i \in J} \varepsilon_i w_i.
$$

If the follower plays action , for , the leader obtains payoff and the follower obtains payoff .

Let be such that , and assume to the contrary that . Note that

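$$
\varepsilon_k \ge \frac{1}{N} \sum_{i : \varepsilon_i > 0} \varepsilon_i = -\frac{1}{N} \sum_{i : \varepsilon_i < 0} \varepsilon_i.
$$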

We now proceed by case analysis.

#### Case 1 ($\sum_{i \in J} w_i \ge W$):

By definition of and we have

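$$
\varepsilon_k M \ge \left(-\frac{1}{N} \sum_{i \in J : \varepsilon_i < 0} \varepsilon_i\right) M > -\frac{1}{N} \sum_{i \in J : \varepsilon_i < 0} \varepsilon_i (N w_i) = -\sum_{i \in J : \varepsilon_i < 0} \varepsilon_i w_i \ge -\sum_{i \in J} \varepsilon_i w_i
$$

Multiplying both sides of the inequality by $N$ and using the inequality $-W \ge -\sum_{i \in J} w_i$, which holds in this case, we have

$$
\varepsilon_k N M - W > -\sum_{i \in J} w_i - N \sum_{i \in J} \varepsilon_i w_i,
$$

which means that action $k$ is preferred by the follower. Thus the leader receives payoff 0.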

#### Case 2 ($\sum_{i \in J} w_i < W$):

Since we have a Knapsack with unit-items instance, there is a knapsack solution that obtains value , which corresponds to a strategy for the leader that obtains the same utility. Since the current strategy is optimal for the leader we must have , which means that , and thus . We then have by definition of that

$$
\varepsilon_k N M - W \ge \frac{NM}{N^2 \max_i v_i} - W > 0.
$$

Thus the payoff for the follower is strictly positive for the action , and this is thus preferred to , thus leading to payoff 0 to the leader.

Since there is a strategy for the leader that obtains strictly positive payoff, we can thus assume that the strategy for the leader chooses an action uniformly at random in the root node, and the follower chooses action . Since is preferred by the follower to any other action this means that , and the leader obtains payoff . Thus this corresponds exactly to a feasible solution to the Knapsack with unit-items instance of the same value.

## Appendix B: Approximating Optimal Strategies

In this section we provide the missing details for the algorithms that approximate the optimal strategies for the leader to commit to, for both the behavioral and pure case.

Theorem 6 (restated) There is an algorithm that takes as input a turn-based game on a tree with chance nodes and a parameter , and computes a behavioral strategy for the leader. That strategy, combined with some best response of the follower, achieves a payoff that differs by at most from the payoff of the leader in a Stackelberg equilibrium in behavioral strategies. The algorithm runs in time , where , is the size of the game tree and is its height.

We have provided the algorithm in the main body of the paper; its correctness and runtime will follow from the next lemma.

###### Lemma 3

The algorithm of Theorem 6 is correct and has the desired runtime.

#### Proof:

Recall that we are given a turn-based game on a tree with chance nodes together with an approximation parameter, and the goal is to compute a behavioral strategy for the leader. We constructed the algorithm in Theorem 6 so that it uses dynamic programming to store a table of values for each node in the tree, i.e., a discretized representation of the possible tradeoffs between the utility that the leader can get and the utility that can simultaneously be offered to the follower. The crucial part of proving correctness is arguing that the cumulative error in the leader’s utility is bounded additively by the height of the tree.

For clarity, we repeat the induction hypothesis here. For each sub-tree $T$, the table associated with it, $A_T$, has the following guarantee at each index $k$ in the table:

1. the leader has a strategy for the game tree $T$ that offers the follower utility $A_T[k]$ while securing utility at least $k$ to the leader.

2. no strategy of the leader can (starting from sub-tree $T$) offer the follower utility strictly more than $A_T[k]$, while securing utility at least $k + H_T$ to the leader, where $H_T$ is the height of the tree $T$.

We now argue this holds for each type of node in the tree. Note that the base case holds trivially by construction, since it is associated with the leaves of the tree.

Let $T$ be a leader node, with successors $L$ and $R$, each with tables $A_L$ and $A_R$. If the leader plays $L$ with probability $p$ and plays $R$ with the remaining probability $1 - p$, followed up by the strategies that gave the guarantees for $A_L[i]$ and $A_R[j]$, then the leader would get an expected $pi + (1-p)j$, while being able to offer $pA_L[i] + (1-p)A_R[j]$ to the follower. For a given $k$, the optimal combination of the computed tradeoffs becomes:

$$
A_T[k] := \max_{i,j,p}\{pA_L[i] + (1-p)A_R[j] \mid pi + (1-p)j \ge k\}
$$

For part 1 of the induction hypothesis, the strategy that guarantees $A_T[k]$ simply combines the strategies for the maximizing $A_L[i]$ and $A_R[j]$ along with the probability $p$ at node $T$. For a given $i$, $j$, and $k$, finding the optimal value of $p$ amounts to maximizing a linear function over an interval, i.e., it will attain its maximum at one of the end points of the interval. The table can thus be filled in time cubic in the number of entries in each table by looping over all $i$, $j$, and $k$.

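For part 2 of the induction hypothesis, assume for contradiction that some strategy $\sigma$ yields utilities $(u_1^{\sigma}, u_2^{\sigma})$ with

$$
u_1^{\sigma} \ge k + H_T \quad \text{and} \quad u_2^{\sigma} > A_T[k] \qquad (9)
$$

Let $p_{\sigma}$ be the probability that $\sigma$ assigns to the action $L$, and let $u_l^{\sigma,L}$ and $u_l^{\sigma,R}$ be the utilities of player $l$ from playing $\sigma$ and the corresponding follower strategy in the left and right child respectively. By definition,

$$
u_l^{\sigma} = p_{\sigma} \cdot u_l^{\sigma,L} + (1 - p_{\sigma}) \cdot u_l^{\sigma,R}, \quad \forall l \in \{1, 2\} \qquad (10)
$$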

By the induction hypothesis,

 uσ,c2≤Ac[⌊uσ,c1⌋