Reputation Games for Undirected Graphs


David Avis, School of Informatics, Kyoto University and School of Computer Science, McGill University (avis@cs.mcgill.ca)
Kazuo Iwama, School of Informatics, Kyoto University
Daichi Paku, School of Informatics, Kyoto University
October 19, 2012
Abstract

J. Hopcroft and D. Sheldon originally introduced network reputation games to investigate the self-interested behavior of web authors who want to maximize their PageRank on a directed web graph by choosing their outlinks in a game theoretic manner. They gave best response strategies for each player and characterized properties of web graphs which are Nash equilibria. In this paper we consider three different models for PageRank games on undirected graphs, such as certain social networks. In undirected graphs players may delete links at will, but typically cannot add links without the other player's permission. In the deletion-model players are free to delete any of their bidirectional links but may not add links. We study the problem of determining whether a given graph represents a Nash equilibrium in this model. We give an $O(n^2)$ time algorithm when the graph is a tree, and a parametric algorithm for general graphs, where the parameter $\lambda$ is the maximum vertex degree in any biconnected component of the graph. In the request-delete-model players are free to delete any bidirectional links and to add any directed links, since these additions can be done unilaterally and can be viewed as requests for bidirected links. For this model we give a polynomial time algorithm for verifying Nash equilibria in trees. Finally, in the add-delete-model we allow a node to make arbitrary deletions and the addition of a single bidirectional link, provided it would also increase the PageRank of the other endpoint. In this model we give a parametric algorithm for verifying Nash equilibria in general graphs and characterize so-called $\alpha$-insensitive Nash equilibria. We also give a result exhibiting a large class of graphs in which there is an edge addition that causes the PageRank of both of its endpoints to increase, suggesting convergence towards complete subgraphs.

1 Introduction

Introduced by Larry Page and Sergey Brin [13], the PageRank of a web page is an important basis of the Google search engine and possibly one of the most successful applications of a mathematical concept in the IT world. PageRank is a value that is assigned to each web page according to the stationary distribution of an $\alpha$-random walk on the web graph. Here an $\alpha$-random walk is a random walk modified to make a random jump with probability $\alpha$ at each step, where a random jump is a move to a node chosen according to a given distribution vector $v$.
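
To fix ideas, the following sketch computes this stationary distribution by power iteration. It is our illustration, not code from the paper: the names are ours, and nodes without outlinks are assumed to always jump, a convention the definition above leaves open.

```python
import numpy as np

def pagerank(out_neighbours, alpha, v=None, iters=10_000, tol=1e-12):
    """Stationary distribution of the alpha-random walk by power iteration.

    out_neighbours[u] lists u's outlinks; alpha is the jump probability;
    v is the jump distribution (uniform if None).  Nodes without outlinks
    are assumed to always jump (an assumption of this sketch)."""
    n = len(out_neighbours)
    v = np.full(n, 1.0 / n) if v is None else np.asarray(v, dtype=float)
    pi = v.copy()
    for _ in range(iters):
        nxt = np.zeros(n)
        jump_mass = 0.0
        for u, nbrs in enumerate(out_neighbours):
            if nbrs:
                jump_mass += alpha * pi[u]          # jump with probability alpha
                share = (1.0 - alpha) * pi[u] / len(nbrs)
                for w in nbrs:                      # follow a uniform outlink
                    nxt[w] += share
            else:
                jump_mass += pi[u]                  # dangling node: always jump
        nxt += jump_mass * v
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi
```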

Unlike rankings based on content such as keywords, tags, etc., PageRank focuses solely on the hyperlink structure of the given web graph. Web links themselves possess strategic worth and hence web authors often try to boost the PageRank of their web pages by carefully choosing links to other pages. Since these authors behave strategically in a self-interested way, this is a typical example of a non-cooperative game. In fact, Hopcroft and Sheldon recently introduced the PageRank game as a game theoretic model played over a directed graph [6]. Each player is identified with a node, and a strategy is a specification of a set of outlinks to other nodes. The payoff for each player is the PageRank value for their node which is calculated on the resulting directed graph. The obvious goal of each player is to maximize their payoff.

In [6], the authors proved a nice property of this game, namely that the best strategy of a player $u$ is to place her outlinks to the nodes having the largest potential values. The potential of a node $w$ measures the probability that a walk starting at $w$ returns to $u$ before the first jump, and it does not depend on the outlinks from $u$, provided the other nodes do not change their outlinks. Thus, a simple greedy algorithm exists for deciding if a given graph is in Nash equilibrium, and a nice characterization of Nash equilibria graphs is possible. Interestingly, it turns out that graphs representing Nash equilibria have very strong regularity properties (see Section 3 for details). The purpose of this paper is to study similar problems on undirected graphs.

Motivation. Social networks have become one of the defining paradigms of our time, with enormous influence on how decisions are taken and events unfold. As with web graphs, content by itself will rarely be enough to explain the dynamics of these networks. The underlying graph structure itself surely plays a role in how new relations are formed and old relations broken. Considering two major social networks, Facebook and Twitter, a casual glance shows radically different graph structures in spite of the fact that they have a comparable number of similar users. Facebook, an undirected graph, has few nodes with degree more than 1000. Twitter, a directed graph, has nodes (such as Katy Perry) with in-degree 28 million and out-degree just 115. A basic difference in the dynamics of the two networks is edge addition, which requires the approval of both nodes in an undirected graph but not in a directed graph. The ability to add and delete links instantly in a directed network allows for an extremely rapid, dynamically changing graph structure. Anecdotal evidence points to a much more stable graph structure in Facebook, which apparently consists of a large number of relatively small, very dense subgraphs.

Another example of an undirected network is the graph of international bilateral agreements between universities. We might consider PageRank as measuring how prestigious a university is, and universities might only accept agreements that increase their prestige. Finally we might consider the coauthorship graph, possibly one of the oldest social networks, defined so that people could find their “Erdős number”. Here edge deletions are not permitted, but edge additions could conceivably be influenced by the PageRank of the given nodes.

Our motivation is to build models for undirected graphs and to study their dynamics. Our basic tool will be to adopt PageRank as a quantity that users try to optimize. Under this assumption we will study how undirected networks evolve, what networks in equilibrium look like, and contrast this to the case of directed networks. Whether users of these networks actually behave in this way is beyond the scope of this paper.

Outline of the paper. We introduce three different models for PageRank games on undirected graphs. Our study mainly focuses on the deletion-model (described in Section 3) where a player cannot create a new link but may unilaterally delete an existing one. In the directed web graph model described above, what a player $u$ intuitively does is to cut its links to nodes having smaller potential values, assuming that the other endpoint $w$ will not delete its edge to $u$. In the deletion-model, if $u$ cuts its outlink to $w$ we assume that $w$ also cuts its outlink to $u$, either automatically or as a form of revenge. In Figure 1 deleting the edge $\{u, w\}$ increases $u$'s PageRank but decreases $w$'s, so $u$ may unilaterally choose to do this. Note that after the edge deletion, if $w$ proposed to reinstate the edge, $u$ would refuse.

Figure 1: Edge deletion

Unlike the directed graph model, in the undirected model a node cannot add a new edge by acting unilaterally, but must seek the permission of the other endpoint. It turns out that the class of equilibria graphs in the deletion-model is larger than in the directed model. Unfortunately, the nice property of the original model, that the potentials with respect to a node do not depend on that node's own outlinks, no longer holds. Hence the greedy algorithm for the Nash decision problem does not work either, and there seems to be no obvious way of checking the equilibrium condition.

In Section 3.1 we give an $O(n^2)$ time algorithm for the case where the graph is a tree. In Section 3.2 we give a parametric algorithm for general graphs, where the parameter $\lambda$ is the maximum degree that any vertex has within a biconnected component of the graph. Biconnected components roughly correspond to local clusters of web pages, where one could expect the parameter to be relatively small. Nodes linking biconnected clusters may have arbitrarily large degree without changing the time complexity; the resulting algorithm runs in time polynomial in $n$ for each fixed $\lambda$.

Our second model is the request-delete-model where a player can unilaterally delete any existing edges and can also unilaterally create any new directed outlinks, but cannot create a new inlink. In Section 4, we give a polynomial time algorithm for trees which determines if the given graph is a Nash equilibrium in the request-delete-model. It draws on the algorithm in Section 3.1.

The third model is called the add-delete-model where a player $u$ can delete any existing edges and can also add an undirected edge to another player $w$ if the PageRank of both players improves. In Figure 2 adding the edge $\{u, w\}$ increases both $u$'s and $w$'s PageRank, so both parties would accept this addition. Note that the PageRank of every other player decreases.

Figure 2: Edge addition

While it may seem a great restriction to consider single edge additions, recall that we are interested in Nash equilibria. Multiple edge additions require the simultaneous decisions of multiple players. A player in this group cannot know the actions of the other players and hence cannot predict the new graph structure. Therefore it is not possible in general for a player to calculate whether or not an edge addition would improve her PageRank. In Section 5, we give a parametric algorithm for general graphs which determines if the given graph is a Nash equilibrium in the add-delete-model. It draws on the algorithm in Section 3.2. We also give two structural type theorems. The first shows that the only $\alpha$-insensitive equilibria are complete graphs. The second says that in symmetric graphs edge addition will occur. This gives some theoretical justification for the anecdotal evidence cited earlier with respect to Facebook.

Our results begin with the study of trees. This is not because social networks are likely to be trees but because the more complex parametric algorithms for general graphs are based on these results.

Related work. Although it focuses less on game theoretic aspects, there is a large literature on optimal linking strategies to maximize the PageRank of given nodes in directed graphs. On the positive side, Avrachenkov and Litvak [2] give a polynomial-time algorithm for maximizing the PageRank of a single node by selecting its outlinks. de Kerchove et al. [7] extend this result to maximizing the sum of the PageRank values of a given set of nodes. Csáji et al. [4] give a polynomial-time algorithm for maximizing the PageRank of a single node with any given set of controllable links.

On the negative side, [4] also shows that the problem becomes NP-hard if some pairs of controllable links in the set are mutually exclusive. Olsen [9] proved that maximizing the minimum PageRank within a given set of nodes is NP-hard if we are allowed to add new links. He also proved that the problem remains NP-hard if we restrict the node set to a single node and the links to only incoming ones to that node [10], and he gives a constant factor approximation algorithm for this problem [11]. The question of whether there are $\alpha$-sensitive Nash equilibria was recently affirmatively answered by Chen et al. [3].

This paper is an extended version of an earlier paper presented at ISAAC 2011 [1], which only covered the edge deletion model.

2 Preliminaries

2.1 PageRank values

Initially we describe the Hopcroft-Sheldon directed graph model. Let $G = (V, A)$ be a simple directed graph on node set $V$ and arc set $A$, and let $v$ be a probability distribution on $V$. Throughout the paper we let $n$ denote the number of nodes. For $u \in V$, let $\Gamma(u)$ denote the set of $u$'s out-neighbours. A random jump is a move to a node chosen according to the distribution vector $v$ instead of using one of the outlinks of the current node. An $\alpha$-random walk on $G$ is a random walk that is modified to make a random jump with fixed probability $\alpha$ ($0 < \alpha < 1$) at each step. The PageRank vector $\pi$ over the vertices of $G$ is defined as the stationary distribution of the $\alpha$-random walk. We define the potential matrix $\Phi = (\phi_{wu})$ such that for vertices $w, u$, $\phi_{wu}$ is the probability that a random walk that starts from $w$ visits $u$ before the first random jump ($\phi_{uu} = 1$ if $w = u$), which can be written as

$$\phi_{wu} = \frac{1-\alpha}{|\Gamma(w)|} \sum_{x \in \Gamma(w)} \phi_{xu} \quad (w \neq u), \qquad \phi_{uu} = 1. \qquad (1)$$

In order to calculate $\pi$, we have the following equation [6]:

$$\pi_u = \frac{\alpha \sum_{w \in V} v_w\, \phi_{wu}}{1 - \dfrac{1-\alpha}{|\Gamma(u)|} \sum_{x \in \Gamma(u)} \phi_{xu}}. \qquad (2)$$

Chen et al. [3] proved that PageRank is continuous as a function of $\alpha$ for $0 < \alpha < 1$.
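
The potentials and equation (2) can be checked numerically. The sketch below (names ours; we assume every node has at least one outlink) solves the linear system (1) for one potential column and verifies equation (2) against PageRank computed directly from the stationary equations, on a small example.

```python
import numpy as np

def walk_matrix(out_neighbours):
    # Row-stochastic matrix of the underlying walk (no dangling nodes assumed).
    n = len(out_neighbours)
    P = np.zeros((n, n))
    for w, nbrs in enumerate(out_neighbours):
        for x in nbrs:
            P[w, x] = 1.0 / len(nbrs)
    return P

def pagerank_exact(out_neighbours, alpha, v):
    # Solve the stationary equations pi = alpha*v + (1-alpha) * P^T pi.
    n = len(out_neighbours)
    P = walk_matrix(out_neighbours)
    return np.linalg.solve(np.eye(n) - (1.0 - alpha) * P.T, alpha * np.asarray(v))

def potential_column(out_neighbours, alpha, u):
    # phi[w] = Pr[walk from w visits u before the first jump]; system (1).
    n = len(out_neighbours)
    A = np.eye(n) - (1.0 - alpha) * walk_matrix(out_neighbours)
    A[u, :] = 0.0
    A[u, u] = 1.0                       # boundary condition phi_uu = 1
    b = np.zeros(n)
    b[u] = 1.0
    return np.linalg.solve(A, b)

if __name__ == "__main__":
    # Bidirected 4-cycle with the uniform jump distribution.
    G = [[1, 3], [0, 2], [1, 3], [0, 2]]
    alpha, v, u = 0.15, np.full(4, 0.25), 0
    pi = pagerank_exact(G, alpha, v)
    phi = potential_column(G, alpha, u)
    rhs = alpha * v.dot(phi) / (1.0 - (1.0 - alpha) * phi[G[u]].mean())
    assert np.isclose(pi[u], rhs)       # equation (2)
```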

2.2 Directed PageRank games

In the PageRank games of [6] the players are the nodes of a directed graph and they attempt to optimize their PageRank by strategic link placement. A strategy for a node $u$ is a set of outlinks. An outcome is an arc set $A$ consisting of the outlinks chosen by each player. The payoff of each player $u$ is her PageRank value $\pi_u$, calculated on the outcome graph.

We say a player $u$ is in best response if $u$ takes a strategy which maximizes $u$'s PageRank in the outcome graph. A directed graph is a Nash equilibrium if the set of outlinks for each node is a best response: no player can increase her PageRank value by choosing different outlinks. Several results for best response strategies and for Nash equilibria were given in [6]. In particular the authors gave a characterization of $\alpha$-insensitive Nash equilibria, which are graphs that are Nash equilibria for all values of the jump parameter $\alpha$.

In this paper we study PageRank games for undirected graphs. Let $G = (V, E)$ be an undirected graph on vertex set $V$ and edge set $E$. Define the directed graph $\hat{G} = (V, A)$ on the same vertex set $V$, where each edge $\{u, w\}$ in $E$ gives rise to two arcs $(u, w)$ and $(w, u)$ in $A$. In our model, the payoff of each player $u$ for the graph $G$ is the PageRank of $u$ in the corresponding directed graph $\hat{G}$.

3 Deletion Only Models

In this section, we study the deletion-model for undirected PageRank games, where a player cannot unilaterally create a bidirected link to another node, but can delete a bidirectional link. We consider the problem of determining whether a given graph is a Nash equilibrium in the deletion-model. In Section 3.1 we give a quadratic algorithm for the special case when $G$ is a tree, and in Section 3.2 we give a parametric algorithm for general graphs, where the parameter $\lambda$ is the maximum vertex degree in any biconnected component of $G$.

In the deletion-model, we say that a player $u$ is in best response if $u$ cannot increase her PageRank by any deletion of her (bidirectional) links. A Nash equilibrium is a graph for which every player is in best response. We consider the following problem:

Input:

An undirected graph $G = (V, E)$, jump probability $\alpha$, and distribution $v$.

Output:

Is the input a Nash equilibrium? (yes/no)

An equivalent formulation is to decide whether no player can increase her PageRank for the given input, where she is only allowed to delete edges to her neighbours. As for directed graphs, we let $\Gamma(u) = \{u_1, \ldots, u_d\}$ denote the neighbours of vertex $u$ in $G$, where $d = |\Gamma(u)|$. A strategy for $u$ is to retain a subset of neighbours and delete the edges to her other neighbours. Let $x$ be a 0/1 vector of length $d$ which indicates $u$'s strategy. Formally, if the edge to $u_i$ is retained then $x_i = 1$, otherwise $x_i = 0$, for $1 \le i \le d$. Let $\phi_w(x)$ denote the potential function (1) for the subgraph of $G$ formed by deleting the edges $\{u, u_i\}$ with $x_i = 0$. By (2) applied to the corresponding directed graph, the PageRank of $u$ can be written as

$$\pi_u(x) = \frac{\alpha \sum_{w \in V} v_w\, \phi_w(x)}{1 - \dfrac{1-\alpha}{\|x\|} \sum_{i=1}^{d} x_i\, \phi_{u_i}(x)} \qquad (3)$$

where $\|x\| = \sum_{i=1}^{d} x_i$ denotes the Hamming weight of $x$. Let $\mathbf{1}_d$ denote a vector of ones of length $d$; usually the length is clear from the context, so for simplicity we may drop the subscript. If the input is a Nash equilibrium then $u$ is using a best response and no edge deletions by $u$ will raise her PageRank. Therefore $\pi_u(\mathbf{1}) \ge \pi_u(x)$ for any 0/1 vector $x$. The approach we use to solve the problem described in this section is to compute the maximum of $\pi_u(x)$ over all 0/1 vectors $x$ of length $d$, for each vertex $u$.
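
Before developing the efficient algorithms, it is useful to state the decision problem operationally. The brute-force sketch below is our own (exponential in the maximum degree, so suitable only for small examples): it enumerates every deletion vector $x$ of every node and tests the best-response condition $\pi_u(\mathbf{1}) \ge \pi_u(x)$ directly. Nodes that delete all their links are assumed to always jump.

```python
import itertools
import numpy as np

def pagerank_exact(adj, alpha, v):
    # adj[u] = set of u's out-neighbours; dangling nodes jump (assumption).
    n = len(adj)
    v = np.asarray(v, dtype=float)
    P = np.zeros((n, n))
    for u, nbrs in enumerate(adj):
        if nbrs:
            for w in nbrs:
                P[u, w] = 1.0 / len(nbrs)
        else:
            P[u] = v
    return np.linalg.solve(np.eye(n) - (1.0 - alpha) * P.T, alpha * v)

def is_nash_deletion(edges, n, alpha, v):
    """Brute-force check of the deletion-model: no player can raise her
    PageRank by deleting any subset of her bidirectional links."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    base = pagerank_exact(adj, alpha, v)
    for u in range(n):
        nbrs = sorted(adj[u])
        for keep in itertools.product([0, 1], repeat=len(nbrs)):
            if all(keep):
                continue
            mod = [s.copy() for s in adj]
            for x, w in zip(keep, nbrs):
                if not x:   # deleting {u,w} removes both directed arcs
                    mod[u].discard(w); mod[w].discard(u)
            if pagerank_exact(mod, alpha, v)[u] > base[u] + 1e-12:
                return False, u, keep   # improving strategy found
    return True, None, None
```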

We give some examples in Figure 3. Graphs (a) and (b) are $\alpha$-insensitive Nash equilibria in directed PageRank games, and are also Nash equilibria in the deletion-model for any given $\alpha$ and $v$. Graph (c) is an example which is not a Nash equilibrium in the directed game, since a node can increase its PageRank by deleting its arc to the neighbour with the smaller potential. However (c) is a Nash equilibrium in the deletion-model for suitable $\alpha$ and the uniform distribution $v$, since if a node cuts one of its bidirectional edges then its own PageRank decreases. Graph (d), in which one of the subgraphs is a complete graph for example, is a Nash equilibrium in neither the directed model nor the deletion-model for suitable $\alpha$ and the uniform distribution $v$. In this graph the potentials from two of a node's neighbours are much less than the others, so the node may try to cut the edge to one or the other of them, but a single edge deletion decreases its PageRank. Interestingly, the deletion of both edges leads to a greater PageRank.

Figure 3: Examples

3.1 Trees

In this section, we study the problem where the graph is a tree. We prove the following theorem.

Theorem 3.1

Given a tree $T$, jump probability $\alpha$ and distribution $v$, we can determine in $O(n^2)$ time whether $T$ is a Nash equilibrium in the deletion-model and, if not, give an improving strategy for at least one player.

The remainder of this section is devoted to the proof of this theorem.

Let $u$ be a node in $T$, let $\Gamma(u) = \{u_1, \ldots, u_d\}$ be the set of neighbours of $u$, and let $d = |\Gamma(u)|$. Consider any strategy for $u$ as described above and let $x$ be the 0/1 vector that represents it. For $1 \le i \le d$, let $D_i$ be the set of nodes which are descendants of $u_i$ (including $u_i$ itself) in the subtree of $T$ rooted at $u$. For a node $w \in D_i$,

$$\phi_w(x) = x_i\, \phi_w(\mathbf{1}) \qquad (4)$$

since the potentials of all nodes in $D_i$ depend only on the link $\{u, u_i\}$ and the other links of $u$ do not affect these potentials. This is because if $u$ cuts the link $\{u, u_i\}$, all nodes in $D_i$ are disconnected from the other nodes in $T$.

Therefore (3) can be rewritten:

$$\pi_u(x) = \frac{\alpha \left( v_u + \sum_{i=1}^{d} x_i \sum_{w \in D_i} v_w\, \phi_w \right)}{1 - \dfrac{1-\alpha}{\|x\|} \sum_{i=1}^{d} x_i\, \phi_{u_i}} \qquad (5)$$

where we abbreviate $\phi_w = \phi_w(\mathbf{1})$.

Note that the potential matrix on $T$ can be computed in $O(n^2)$ time, by using Gaussian elimination for each column vector defined by equation (1). Since $T$ is a tree we can apply the elimination steps in post-order, where we consider $u$ to be the root of $T$. There are at most $n$ forward eliminations and backward substitutions because every node except the root has only one parent. Therefore it costs $O(n)$ time to compute each column.
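
The elimination just described can be written out explicitly. The following sketch is our own rendering of the post-order elimination, with invented names: a forward pass expresses each node's potential as a multiple $\gamma_w$ of its parent's potential, and a backward pass substitutes down from the root, whose potential is 1.

```python
def tree_potentials(adj, r, alpha):
    """Potential column phi[w] = Pr[walk from w hits r before jumping],
    for a tree given as adjacency lists, in O(n) time."""
    n = len(adj)
    parent = [-1] * n
    order = [r]
    for u in order:                      # BFS fixes a rooting at r
        for w in adj[u]:
            if w != parent[u]:
                parent[w] = u
                order.append(w)
    gamma = [0.0] * n                    # phi_w = gamma_w * phi_parent(w)
    csum = [0.0] * n                     # sum of gamma over children
    for u in reversed(order):            # forward pass: children first
        if u == r:
            continue
        c = (1.0 - alpha) / len(adj[u])
        gamma[u] = c / (1.0 - c * csum[u])
        csum[parent[u]] += gamma[u]
    phi = [0.0] * n
    phi[r] = 1.0
    for u in order:                      # backward pass: parents first
        if u != r:
            phi[u] = gamma[u] * phi[parent[u]]
    return phi
```

Running it once per choice of root gives the full potential matrix in $O(n^2)$ time, matching the bound above.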

Let $a_i = \sum_{w \in D_i} v_w\, \phi_w$ and let $b_i = \phi_{u_i}$ for $1 \le i \le d$. Consider the fractional integer programming problem $P$ of maximizing $\pi_u(x)$, in the form (5), over all 0/1 vectors $x$, where $v_u$ and $a_i, b_i$ for $1 \le i \le d$ are known constants.

In order to solve problem $P$, we fix the Hamming weight of $x$ at $k$ and solve, for each $k = 1, \ldots, d$, the problem $P_k$ of maximizing $\pi_u(x)$ subject to $\|x\| = k$.

Problem $P_k$ can be solved directly by Megiddo's method [8], and it can also be solved by Newton's method [12]. However we are able to specialize Megiddo's method to our problem to obtain a faster algorithm. Our approach initially follows the technique described in [8].
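
For comparison, here is the simpler Newton (Dinkelbach-style) alternative just mentioned, specialized to maximizing a linear-fractional objective over 0/1 vectors with exactly $k$ ones. The function and variable names are ours, and the coefficients fed to it depend on the form (5); the specialized method developed below avoids the repeated sorting by walking a line arrangement instead.

```python
def maximize_fraction_k(p0, p, q0, q, k, max_iter=100):
    """Maximize (p0 + p.x) / (q0 + q.x) over 0/1 vectors x with sum(x) = k,
    assuming the denominator stays positive.  Dinkelbach scheme: for a
    guess t, maximize (p0 - t*q0) + sum_i (p_i - t*q_i) x_i by taking the
    k largest coefficients, then move t to the ratio achieved."""
    d = len(p)
    assert 1 <= k <= d and len(q) == d
    t = 0.0
    for _ in range(max_iter):
        idx = sorted(range(d), key=lambda i: p[i] - t * q[i], reverse=True)[:k]
        num = p0 + sum(p[i] for i in idx)
        den = q0 + sum(q[i] for i in idx)
        if den <= 0:
            raise ValueError("denominator must remain positive")
        if abs(num - t * den) < 1e-12:   # g_k(t) = 0: t is the optimum
            return t, sorted(idx)
        t = num / den
    return t, sorted(idx)
```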

Since $1 \le \|x\| \le d$, we can solve problem $P$ by solving problem $P_k$ for each $k$. We next consider the linearized maximization problem $Q_k(t)$ for a fixed parameter $t$.

For $1 \le i \le d$ let $\ell_i(t)$ denote the linear function of $t$ giving the contribution of index $i$ to the objective of $Q_k(t)$, and let $\sigma(t)$ be the sequence of indices ordered by decreasing values of $\ell_i(t)$. Problem $Q_k(t)$ is easily solved by choosing the first $k$ indices in $\sigma(t)$. Let $g_k(t)$ be the optimal value of problem $Q_k(t)$ for a given $t$. When $g_k(t^*) = 0$, then $t^*$ is equal to the optimal value of problem $P_k$. On the other hand, if $g_k(t) > 0$ then $t < t^*$, and if $g_k(t) < 0$ then $t > t^*$; that is, $t^*$ is the unique root of $g_k$ (see Figure 5). The task in solving problem $P_k$ is therefore to find the value $t^*$ for which $g_k(t^*) = 0$ by performing tests on $g_k$. The key point is how many values of $t$ have to be tested to find $t^*$. Since the optimal solution of $Q_k(t)$ can change only when the order $\sigma(t)$ changes, we only have to test at the intersection values of the lines $\ell_i$. Using some results from computational geometry, we are able to do this efficiently.

Figure 4: Solving $Q_k(t)$
Figure 5: Path for $g_k$ on the line arrangement

First we compute the arrangement of the lines $\ell_1, \ldots, \ell_d$. Namely, we define the planar graph formed by the subdivision of the plane induced by these lines. Then we regard all edges of the arrangement as directed according to the positive direction of $t$.

For each $k$ we test values of $t$ at the change points of the $k$-th entry of $\sigma(t)$, which are the nodes of the arrangement lying on its $k$-th layer. An example is shown in Figure 5.

We summarize the algorithm for solving $P$. First compute the constants $v_u$ and $a_i$, $b_i$ for $1 \le i \le d$, and the arrangement of the lines $\ell_1, \ldots, \ell_d$. Compute the initial order $\sigma$ by sorting the values of the $\ell_i$. For $k = 1, \ldots, d$, do the following steps. Let $x_i = 1$ if $i$ is within the first $k$ entries of $\sigma$, and $x_i = 0$ otherwise, and compute the numerator and denominator of the objective at $x$. From the starting edge, which is the $k$-th edge from the top, follow the $k$-th layer as follows. While following an edge on a line $\ell_i$, on visiting the node which is the intersection of $\ell_i$ and $\ell_j$, test $g_k$ there. If $g_k(t) = 0$ then output $t$ as the solution; otherwise update the numerator and denominator by exchanging $i$ and $j$, and go to the next node by following the edge on the line $\ell_j$.

Finally, we analyze the running time of the algorithm. Computing the line arrangement takes $O(d^2)$ time, which can be done by the incremental method or by topological sweep (see Edelsbrunner [5]). Since the number of nodes in the arrangement is $O(d^2)$ and each of them is visited at most twice, we can find $t^*$ for all $k$ in $O(d^2)$ total time. Therefore this algorithm solves problem $P$ in $O(d^2)$ time. Note that if $u$ is not in best response the solution to $P$ gives an improving strategy for $u$.

As we have seen, we can test whether a node $u$ is in best response in $O(d^2)$ time, where $d$ is the degree of $u$, after $O(n)$ time spent computing the potentials with respect to $u$. Because $T$ is a tree we have $\sum_u d_u = 2(n-1)$, and hence $\sum_u d_u^2 = O(n^2)$. Therefore we can test whether all nodes are in best response in $O(n^2)$ time. This concludes the proof of Theorem 3.1.

3.2 General Graphs

In this section we give a parametric algorithm for general connected graphs based on a parameter $\lambda$ defined as follows. If $G$ is a tree we set $\lambda = 1$; otherwise $\lambda$ is the maximum vertex degree in any biconnected component of $G$. Note that $\lambda$ can be computed in linear time by decomposing $G$ into its biconnected components and finding the maximum vertex degree in every such component. Note also that graphs can have a large maximum vertex degree but a small parameter $\lambda$. This occurs whenever the large degree vertices are cut vertices. In a network setting the biconnected components could represent small groups of well connected web pages with relatively few links per page. These groups would be linked together by a few pages containing many more links. We prove the following theorem.

Theorem 3.2

Given a graph $G$ with parameter $\lambda$, jump probability $\alpha$ and distribution $v$, we can determine, in time polynomial in $n$ for each fixed $\lambda$, whether $G$ is a Nash equilibrium in the deletion-model and, if not, give an improving strategy for at least one player.

The remainder of this section is devoted to the proof of this theorem.

Let $u$ be a node in $G$ and let $G_1, \ldots, G_r$ be the connected components of the subgraph induced by deleting $u$ from $G$. It follows from the definition of $\lambda$ that $u$ has at most $\lambda$ links to $G_i$, for $1 \le i \le r$. Let $\Gamma_i(u)$ denote the set of $u$'s neighbours in $G_i$, for $1 \le i \le r$; we have $|\Gamma_i(u)| \le \lambda$ by the definition of $\lambda$. Consider any strategy for $u$, as described in Section 3, and let $x$ be the 0/1 vector of length $d$ that represents it. We write $x = (x^{(1)}, \ldots, x^{(r)})$ as the concatenation of the 0/1 vectors representing the strategy restricted to the components $G_1, \ldots, G_r$. Then, for $w \in G_i$, the potential of $w$ is written as follows:

$$\phi_w(x) = \phi_w(x^{(i)}), \qquad w \in G_i, \qquad (6)$$

where $\phi_w(x^{(i)})$ is the potential from $w$ in the subgraph of $G$ formed by deleting the edges from $u$ to the neighbours in $\Gamma_i(u)$ not selected by $x^{(i)}$. This is because the potentials to $u$ from all nodes in $G_i$ depend only on the links from $u$ to $\Gamma_i(u)$ and never depend on the other links of $u$. To compute the column vectors $\phi(x^{(i)})$ for each choice of $x^{(i)}$ and each $i$, we solve the linear systems defined by (1) using Gaussian elimination.

We have the following formula for the PageRank of $u$:

(7)

where $A_i$ and $B_i$ are constants such that

(8)
(9)

Note that $B_i > 0$ for all $i$ and that the denominator of (7) is always positive.

In order to determine whether $u$ maximizes her PageRank, consider the fractional integer programming problem $P'$ of maximizing $\pi_u(x)$, in the form (7), over all 0/1 vectors $x$.

The method for problem $P'$ is similar to that used in Section 3.1. We fix the Hamming weight of $x$ at $k$, and consider the fractional integer programming problem $P'_k$ of maximizing the objective subject to $\|x\| = k$.

Since $1 \le \|x\| \le d$, we can solve problem $P'$ by solving problem $P'_k$ for each $k$. Let $t$ be a positive real number used to linearize the objective, as in Section 3.1.

Let $g_k(t)$ denote the optimal value of the linearized problem for a given $t$. Note that we do not have to find the optimum value of problem $P'_k$, since our goal is only to determine whether $x = \mathbf{1}$ maximizes $u$'s PageRank. All we have to do is to solve the linearized problem at $t = \pi_u(\mathbf{1})$ for each $k$, and determine whether $g_k(t) \le 0$ or not.

In order to convert the objective of $P'_k$ into a linear function, we introduce 0/1 variables $z_{i,j}$ and constants $c_{i,j}$, for $1 \le i \le r$ and $0 \le j \le |\Gamma_i(u)|$, defined as follows.

The variable $z_{i,j}$ indicates whether the number of edges retained in the strategy from $u$ to $G_i$ is equal to $j$ or not, and $c_{i,j}$ is the largest contribution to the objective achievable by retaining exactly $j$ such edges. If $z_{i,j} = 1$, let the $j$ chosen edges be ones achieving $c_{i,j}$; we then consider that $z_{i,j}$ earns value $c_{i,j}$ at cost (weight) $j$. In an optimal strategy, if $z_{i,j} = 1$ for some $i$ and $j$ then the chosen edges must achieve $c_{i,j}$, as any other assignment can be improved to such a one. We then have the following integer linear program equivalent to $P'_k$:

(10)
(11)

This problem is similar to a knapsack problem where each item has positive integer weight $j$ and value $c_{i,j}$, and the total weight must be exactly $k$. The only difference is the constraint (11), which forces exactly one weight $j$ to be chosen for each component $G_i$.

Dynamic programming can be used to solve it. Let $T[i][w]$, for $0 \le i \le r$ and $0 \le w \le k$, denote the maximum value achievable with total weight exactly $w$ using only the first $i$ items. Let $T[0][0] = 0$; then

$$T[i][w] = \max_{0 \le j \le \min(w,\, |\Gamma_i(u)|)} \left( T[i-1][w-j] + c_{i,j} \right).$$

For each $i$, we can compute $T[i][w]$ for all $w \le k$ in $O(k\lambda)$ time. Since $r \le d$ and $k \le d$, the computation time for solving the program, given the constants $c_{i,j}$, is $O(d^2\lambda)$.
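
A sketch of this dynamic program, with hypothetical inputs $c_{i,j}$ (the best contribution achievable by keeping exactly $j$ of $u$'s edges into component $G_i$); choosing exactly one $j$ per component is constraint (11).

```python
NEG = float("-inf")

def grouped_knapsack(c, k):
    """c[i][j] = value c_{i,j} of keeping exactly j edges into component i
    (0 <= j <= |Gamma_i(u)|).  Returns the maximum total value with total
    weight exactly k, choosing one j per component; O(r * k * lambda) time."""
    r = len(c)
    T = [[NEG] * (k + 1) for _ in range(r + 1)]
    T[0][0] = 0.0
    for i in range(1, r + 1):
        for w in range(k + 1):
            best = NEG
            for j in range(min(w, len(c[i - 1]) - 1) + 1):
                prev = T[i - 1][w - j]
                if prev > NEG and prev + c[i - 1][j] > best:
                    best = prev + c[i - 1][j]
            T[i][w] = best
    return T[r][k]

# Example: two components, budgets of 1 and 2 retained edges respectively.
# grouped_knapsack([[0.0, 0.5], [0.0, 0.3, 0.4]], 2) == 0.8
```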

In order to determine whether $u$ is in best response, we test $P'_k$ for each $k = 1, \ldots, d$. The dominant cost per vertex is the computation of the constants $c_{i,j}$, which requires solving the linear system (1) for each of the at most $2^\lambda$ subsets of $\Gamma_i(u)$ in each component; here we use the assumption $|\Gamma_i(u)| \le \lambda$. Therefore, in order to determine whether every node is in best response, the total time is polynomial in $n$ for each fixed $\lambda$. This completes the proof of Theorem 3.2.

4 Request-Delete Model

In this section, we study the request-delete-model for undirected PageRank games, where a player can not only unilaterally delete bidirected links, but can also create outlinks to any non-neighbours. We may consider these outlinks to be a form of request to the other node to establish a link. We consider the problem of determining whether the given input is a Nash equilibrium in the request-delete-model. We give a polynomial time algorithm for this problem when the underlying graph is a bidirected tree.

Let $T$ be a tree and let $u$ be a node in $T$. Let $\Gamma(u) = \{u_1, \ldots, u_d\}$ denote the set of neighbours of $u$ and let $d = |\Gamma(u)|$. For $1 \le i \le d$, let $D_i$ be the set of nodes which are descendants of $u_i$ (including $u_i$ itself) in the subtree of $T$ rooted at $u$. In the request-delete-model, a player $u$ is in best response if no combination of edge deletions and creations of outlinks can increase the PageRank of $u$. A Nash equilibrium is a graph for which every player is in best response. A strategy for $u$ is to retain a subset of neighbours and to choose a subset of her non-neighbours to receive outlinks. Let $x$ and $y$ be 0/1 vectors of length $d$ and of length $n - d - 1$ respectively, which indicate $u$'s strategy. Formally, for $1 \le i \le d$, $x_i = 1$ if the edge to $u_i$ is retained, and $x_i = 0$ otherwise. Similarly, for each non-neighbour $w_j$, $y_j = 1$ if $u$ places an outlink to $w_j$, and $y_j = 0$ otherwise. Let $\pi_u(x, y)$ be the PageRank of $u$ on the resulting graph for the strategy $(x, y)$. A node $u$ is in best response if $\pi_u(\mathbf{1}, \mathbf{0}) \ge \pi_u(x, y)$ for all vectors $x$ and $y$. Our approach to verifying a Nash equilibrium is to compute the maximum of $\pi_u(x, y)$ over all vectors $(x, y)$, for each vertex $u$.

Since $T$ is a tree and the outlinks of $u$ do not affect $\phi_w$ for any $w$, equation (4) holds. By equation (2), the PageRank of $u$ can be written as:

(12)

where the remaining coefficients are constants determined by the potentials $\phi_w$ and the distribution $v$.

The proof of the following lemma can be found in [6].

Lemma 4.1

[6] Let $G$ be a directed graph. For a node $u$ in $G$, if a node $w$ has the maximum potential with respect to $u$, then $w$ is an in-neighbour of $u$.

We prove the following two lemmas.

Lemma 4.2

Let $T$ be a tree, let $u$ be a node in $T$, and let $D_1, \ldots, D_d$ be defined as above. For all $i$ and for all $w \in D_i \setminus \{u_i\}$, we have $\phi_w < \phi_{u_i}$.

Proof. Let $T_i$ be the subtree of $T$ induced by $u$ and all nodes in $D_i$. Since $T$ is a tree, a walk starting at a node in $D_i$ cannot reach the nodes outside $T_i$ without visiting $u$. Thus the potential from each vertex of $D_i$ with respect to $u$ is the same as the potential in $T_i$. By Lemma 4.1, only a neighbour of $u$ can have the maximum potential to $u$ in $T_i$, and the only such neighbour in $T_i$ is $u_i$; the other potentials are strictly less. This means $\phi_w < \phi_{u_i}$ for all $w$ in $D_i \setminus \{u_i\}$, which concludes the proof.

By Lemma 4.2, we have the following strict inequality.

(13)

Let $y^{(m)}$ denote a 0/1 vector over the non-neighbours of $u$ such that $y^{(m)}_j = 1$ if $\phi_{w_j}$ is among the $m$ largest values over all entries, and $y^{(m)}_j = 0$ otherwise.

Lemma 4.3

Let $x$ and $y^{(m)}$ be defined as above. For $0 \le m \le n - d - 1$, the maximum of $\pi_u(x, y)$ over all $y$ with $\|y\| = m$ is attained at $y = y^{(m)}$.

Proof. By contradiction. When $m = 0$, the claim obviously holds, so we consider the case $m \ge 1$.

Assume that $y$ is a maximum assignment for $\pi_u(x, y)$ subject to $\|y\| = m$, and that $\pi_u(x, y) > \pi_u(x, y^{(m)})$. Then there exist non-neighbours $w$ and $w'$ of $u$ such that $y$ places an outlink to $w$ but not to $w'$, while $\phi_{w'} > \phi_w$. Let $u_i$ denote the neighbour of $u$ such that $w'$ is in $D_i$.

Suppose first that $w$ and $w'$ lie in the same subtree $D_i$. Then the assignment obtained from $y$ by moving the outlink from $w$ to $w'$ decreases the denominator in equation (12), since $\phi_{w'} > \phi_w$. This gives an improvement for $u$ and does not change the Hamming weights of $x$ and $y$, a contradiction. Therefore we may take $w$ and $w'$ in different subtrees. However, the same exchange again decreases the denominator in equation (12), since $\phi_{w'} > \phi_w$ by (13). This gives an improvement and does not change $\|y\|$. This contradiction concludes the proof.

Lemma 4.3 means that if a player $u$ is in best response, $u$ places her outlinks at the nodes having the highest potentials with respect to $u$.
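
Lemma 4.3 licenses a simple preprocessing step: for each outlink budget $m$, the only candidate set is the $m$ non-neighbours of largest potential. A sketch (names ours):

```python
def best_outlink_sets(phi, neighbours, u):
    """For each budget m, yield (m, the m non-neighbours of u with the
    largest potentials to u); by Lemma 4.3 these are the only candidate
    outlink sets in a best response."""
    others = [w for w in range(len(phi)) if w != u and w not in neighbours]
    others.sort(key=lambda w: phi[w], reverse=True)
    for m in range(len(others) + 1):
        yield m, others[:m]
```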

Theorem 4.4

Given a bidirected tree $T$, jump probability $\alpha$ and distribution $v$, we can determine in polynomial time whether $T$ is a Nash equilibrium in the request-delete-model and, if not, give an improving strategy for at least one player.

Proof. Consider the fractional integer programming problem $R$ of maximizing $\pi_u(x, y)$ over all 0/1 vectors $x$ and $y$.

We solve optimization problem $R$ for each $u$ in $V$. For the given $u$ and each $m$, we compute $y^{(m)}$ as defined just before Lemma 4.3. We then consider the optimization problem $R_m$ of maximizing $\pi_u(x, y^{(m)})$ over $x$, for fixed $m$,

where the coefficients are those of (12). By using the algorithm for problem $P$ in Section 3.1, we can solve problem $R_m$ for all $m$ efficiently. By Lemma 4.3, the solutions to the problems $R_m$ also give the solution to $R$, since the maximum over all $y$ with $\|y\| = m$ is attained at $y^{(m)}$. Thus, for each node $u$ in $V$, we can determine whether $u$ is in best response. It follows that we can determine whether the input is a Nash equilibrium.

Finally, we analyze the running time of the algorithm. It takes $O(n^2)$ time to compute all the potentials in $T$. For each $m$, it takes polynomial time to solve the problem $R_m$. Therefore we can determine whether a node is in best response in polynomial time. Since $T$ is a tree, $\sum_u d_u = 2(n-1)$, and so we can test whether all nodes are in best response in polynomial time.

5 Add-Delete Model

In this section, we introduce the add-delete-model, where each player $u$ can delete any of her edges and can also add one edge $\{u, w\}$ to a non-neighbour $w$, provided that by so doing the PageRank of $w$ increases. Otherwise we may presume that $w$ would simply delete the edge $\{u, w\}$. We consider the problem of determining whether or not the input is a Nash equilibrium, and give a parametric algorithm for general graphs, where $\lambda$ is the maximum vertex degree in any biconnected component of the graph.

Let $G$ be an undirected graph and let $\lambda$ be the parameter defined as follows. If $G$ is a tree we set $\lambda = 1$; otherwise $\lambda$ is the maximum vertex degree in any biconnected component of $G$. Let $\Gamma(u)$ denote the set of neighbours of a node $u$ in $G$, and let $d = |\Gamma(u)|$. In the add-delete-model, a strategy for a player $u$ is to retain a subset of the neighbours in $\Gamma(u)$ and, possibly, to choose one non-neighbour $w$ and add the edge $\{u, w\}$; the PageRank of $w$ must increase under the strategy. A player $u$ is in best response if $u$ cannot increase her PageRank by any other possible strategy. A Nash equilibrium is a graph where every player is in best response.
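
As in Section 3, the definition can be tested directly on small instances. The brute-force sketch below is our own (exponential, for checking examples only): for each node it tries every deletion subset together with at most one added edge, counting an addition only if the other endpoint's PageRank also strictly increases. We compare against the input graph's PageRank, one plausible reading of the acceptance condition.

```python
import itertools
import numpy as np

def pagerank_exact(adj, alpha, v):
    # As in the Section 3 sketch; dangling nodes jump (assumption).
    n = len(adj)
    v = np.asarray(v, dtype=float)
    P = np.zeros((n, n))
    for u, nbrs in enumerate(adj):
        if nbrs:
            for w in nbrs:
                P[u, w] = 1.0 / len(nbrs)
        else:
            P[u] = v
    return np.linalg.solve(np.eye(n) - (1.0 - alpha) * P.T, alpha * v)

def is_nash_add_delete(edges, n, alpha, v):
    """Brute-force add-delete check: a deviation for u is any subset of its
    edges to delete plus at most one added edge {u,w}, where the addition
    only counts if w's PageRank strictly increases as well."""
    adj = [set() for _ in range(n)]
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    base = pagerank_exact(adj, alpha, v)
    for u in range(n):
        nbrs = sorted(adj[u])
        non_nbrs = [w for w in range(n) if w != u and w not in adj[u]]
        for keep in itertools.product([0, 1], repeat=len(nbrs)):
            for w in [None] + non_nbrs:
                if all(keep) and w is None:
                    continue            # the trivial deviation
                mod = [s.copy() for s in adj]
                for x, z in zip(keep, nbrs):
                    if not x:
                        mod[u].discard(z); mod[z].discard(u)
                if w is not None:
                    mod[u].add(w); mod[w].add(u)
                new = pagerank_exact(mod, alpha, v)
                if w is not None and new[w] <= base[w] + 1e-12:
                    continue            # w would refuse the new edge
                if new[u] > base[u] + 1e-12:
                    return False, u
    return True, None
```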

Let $u$ be a node in $G$ and let $G_1, \ldots, G_r$ be the connected components of the subgraph induced by deleting $u$ from $G$. It follows from the definition of $\lambda$ that $u$ has at most $\lambda$ links to $G_i$, for $1 \le i \le r$. Let $\Gamma_i(u)$ be the set of $u$'s neighbours in $G_i$, for $1 \le i \le r$. Let $x$ be a vector of length $d$ which indicates the strategy of $u$ for edge deletion. We write $x = (x^{(1)}, \ldots, x^{(r)})$ as the concatenation of the 0/1 vectors representing the strategy restricted to the components $G_1, \ldots, G_r$. The potential of each node $w$ in $G_i$ is given by equation (6).

Theorem 5.1

Given a graph $G$ with parameter $\lambda$, jump probability $\alpha$ and distribution $v$, we can determine, in time polynomial in $n$ for each fixed $\lambda$, whether $G$ is a Nash equilibrium in the add-delete-model and, if not, give an improving strategy for at least one player.

Proof. We give an algorithm which determines whether a player $u$ is in best response, for each vertex $u$ in $V$.

We fix $u$ and, for each of its non-neighbours $w$, perform the following steps. Let $G_j$ be the component containing $w$. We initially decide the strategy vector for $G_j$ by choosing a subset $S \subseteq \Gamma_j(u)$. Let $G'$ be the graph formed by adding the edge $\{u, w\}$ to $G$ and deleting the edges from $u$ to the nodes of $\Gamma_j(u)$ not in $S$; otherwise we retain the edges to the nodes in $\Gamma_i(u)$, for each $i \ne j$. For each node $w'$ in $G_j$, let $\phi_{w'}$ be the potential calculated on $G'$. Since $u$ is a cut vertex of $G'$, these potentials are invariant to the strategy chosen on the other components. By equation (2), $u$'s PageRank on the resulting graph, as a function of the strategy on the remaining components, can be written as follows:

(14)

where $A_i$ and $B_i$ are defined by equations (8) and (9), and the remaining coefficients are known constants. We consider the following problem.

In order to determine whether $u$ maximizes her PageRank, we fix the Hamming weight of $x^{(i)}$ for each $i$ and consider the following problem:

(15)
(16)

We now apply the PageRank formulae to each node $w$ in $V$.

Since the nodes in $G_i$ are not adjacent to the nodes in $G_j$ for $i \ne j$, we can calculate the PageRank of each node as follows.