Power-law weighted networks from local attachments

P. Moriano and J. Finke
Department of Electrical Engineering and Computer Science
Pontificia Universidad Javeriana, Santiago de Cali
pamoriano@javerianacali.edu.co, finke@ieee.org
Abstract

This letter introduces a mechanism for constructing, through a process of distributed decision-making, substrates for the study of collective dynamics on extended power-law weighted networks with both a desired scaling exponent and a fixed clustering coefficient. The analytical results show that the connectivity distribution converges to the scaling behavior often found in social and engineering systems. To illustrate the approach of the proposed framework we generate network substrates that resemble steady state properties of the empirical citation distributions of publications indexed by the Institute for Scientific Information from 1981 to 1997; patents granted by the U.S. Patent and Trademark Office from 1975 to 1999; and opinions written by the Supreme Court and the cases they cite from 1754 to 2002.

Keywords: complex networks, weighted digraphs, extended power-law distributions.

1 Introduction

Understanding structure lies at the very heart of the study of complex networks. A network is a collection of a large number of interconnected elements (units or agents) whose interaction with each other and with the surroundings leads to characteristic properties that can only be attributed to the network as a whole [1]. Networks often develop distinct structural steady state patterns. Studying these patterns promises to enhance our understanding of the dynamics underlying collective human responses [2], corrupt behavior [3], and economic development [4].

Random graph models fail to capture key features of real-world networks (e.g., clustering coefficients and degree correlations). Recent efforts to understand network structure have focused on connectivity distributions underlying a number of social and engineering systems which, rather than following the Poisson distribution of random networks (bounded by Chebyshev’s inequality), have heavy tails [5]. Heavy-tailed distributions in empirical data suggest the existence of causal mechanisms that shape the structure and function of real-world networks [6]. In the era of “big data,” the development of formal frameworks that quantify patterns of interaction of networks has set the research agendas across various disciplines (e.g., more recently across the data-driven computational social sciences).

Power-laws, a particular type of heavy-tailed distribution, have received significant attention in recent years. For a network with an extended power-law connectivity distribution, if the number of connections of a node is much larger than , the probability that the node connects to other nodes is proportional to for some positive constants and [7]. As a result, the tail of the distribution has no exponential bound and the connectivity of the nodes of the network comprises different orders of magnitude, with a few nodes being highly connected.
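To make the "different orders of magnitude" remark concrete, the following minimal sketch samples from a standard continuous power law with a lower threshold. It is not the extended power-law form used in this letter; the names `gamma`, `x_min`, `sample_power_law`, and `power_law_ccdf` are illustrative assumptions.

```python
import random

def sample_power_law(gamma, x_min, n, rng):
    """Draw n samples from p(x) ~ x**(-gamma) for x >= x_min (inverse transform)."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and each sample is >= x_min.
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)]

def power_law_ccdf(x, gamma, x_min):
    """P(X >= x) for the same continuous power law."""
    if x <= x_min:
        return 1.0
    return (x / x_min) ** (-(gamma - 1.0))

rng = random.Random(0)  # fixed seed so that runs are repeatable
samples = sample_power_law(gamma=3.0, x_min=1.0, n=10_000, rng=rng)
# The largest sample is typically orders of magnitude above the smallest one.
print(min(samples), max(samples))
```

Under these assumptions the tail decays polynomially, so a few samples (the analogue of hubs) are far larger than the typical value.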

Key to modeling power-law networks is the characterization of hubs (highly interconnected nodes). In the context of the spread of disease, measuring patterns in regions that are more vulnerable to infection (hubs) allows us to respond more effectively to the potential spread of large-scale epidemics [8]. The ability to understand and recreate the structure of epidemic networks allows us to design strategies that embrace how interconnected regions influence one another (as a result of the evolution of social systems) in order to quantify and predict the dimensions of disease.

To capture the relationships between the elements of a network, e.g., duration, emotional intensity, or intimacy, models define weights as an inherent property between nodes [9]. Recent models of weighted networks have focused on attachment strategies in which nodes are added according to probability distributions on the existing weights across the entire network. The network model introduced in [10] captures the evolution of weights driven by preferential strength attachment, a mechanism in which newly added nodes are more likely to connect to nodes associated with larger weights. Lacking local competitive factors between nodes, the resulting networks exhibit power-law distributions where the hubs correspond to the nodes that have been part of the network the longest.

This letter introduces a wide class of attachment strategies which promote the formation of hubs based on both the length of time a node has been part of the network (i.e., node longevity) and its ability to compete for weights with surrounding neighbors (i.e., node fitness). Because the connectivity dynamics of the nodes depend on their attractiveness to compete for weights (as in [11]), older nodes are not necessarily more successful in acquiring weights. To our knowledge the proposed mechanism is novel in that it generates weighted directed networks with extended power-law strength distributions (i) in a distributed fashion (decision-making strategies are based on local information; we do not assume any type of global information to generate the desired network structure); (ii) for an arbitrary scaling exponent and a fixed clustering coefficient (as in [12]); and (iii) for values greater than a particular threshold (for the case when only the tail of the distribution obeys a power-law).

The remaining sections are organized as follows. First we introduce a model that captures the connectivity and growth dynamics of the gradual addition of nodes to an existing network component and proposes attachment strategies for local rearrangement of weights between pairs of nodes. We prove that for any connected network there exists a distribution of the total weight from neighboring nodes (node strength) that is asymptotically stable (i.e., the proposed strategies lead to a Nash equilibrium [13]). Moreover, as the network grows, consecutive achievements of this network state lead to weighted directed networks with extended power-law strength distributions and distinctive clustering coefficients (defined as the ratio of the total average weight of transitive triplets over the total weight of possible triplets). We present simulations that capture the effect of node fitness and illustrate the application of the proposed model to generate various citation networks. Finally, we draw some conclusions and outline future research directions.

2 A model of network topology and growth

Consider a directed network that captures weighted relationships between a set of nodes. As the network grows, more nodes join the network, each possessing a small “budget” used to construct directed links to some existing nodes. When node establishes a link to node (by passing some of its budget to node ), node has more budget to spend, which it may do by increasing its weighted connections to other nodes. Broadly speaking, every node wishes to spend its budget, but the more it spends the less willing it is to spend more. Nodes will locally rearrange their weights until every node reaches an equilibrium. At the equilibrium all nodes have associated gains that are equal and there are no further incentives to rearrange connections.

To formalize this idea let us introduce the following notation. Let be a finite set of nodes at generation . Nodes represent elements (acting units) that establish connections to other nodes. We represent the relationship between nodes using a weighted matrix , where quantifies the relationship between node and . If , then there exists some kind of action from to with weight . It may capture, for instance, the extent to which node influences node . Let represent the network at generation (because in general , the network is modeled as a directed graph). For a fixed generation, let represent all nodes which influence node (incoming neighbors). Similarly, let represent all nodes influenced by node (outgoing neighbors). A gain function is associated to each node and characterizes the marginal benefit that results from its current set of connections, where , . Note that is a scalar that represents the incoming strength of node (referred to as node strength hereafter). The following network assumptions are needed:

  • Finite network strength: The total weight of the initial network , , is finite. In other words, the extent to which any node in the network can be influenced by other nodes is bounded.

  • Connectedness: Every node is influenced to some extent by another node. At each generation , .

  • Bounded marginal gains: The gain function associated to node satisfies

    (1)

    for any , and some constants . In other words, the marginal gain associated with each node decreases with increasing strength. Equation (1) eliminates the possibility that a very small difference in node strength may result in an unbounded change in gain. Note that if is differentiable and has a negative derivative it satisfies eq. (1).

Next, we use to specify the time index of events. Let be the time instant when a new node is added to form the network (i.e., the start of generation ). Let be the instant right before the new node is added to (i.e., the start of generation ). When , evolves into . For generation let the set of states

be the simplex over which the connectivity dynamics evolve. Constraints on our model below will ensure that for all nodes , for all . We assume that as , the time allowed for the events that drive the connectivity dynamics during generation goes to infinity. Let be the state vector for at time (i.e., the incoming strength distribution of the entire network).

2.1 Connectivity dynamics

We first focus on the dynamics of for (i.e., within a fixed generation). In particular, we want to define the singleton

(2)

such that any strength distribution that belongs to this set represents a distribution where all nodes in have equal gain levels. To capture the connectivity dynamics that lead to , let represent the decision of node to weaken its relation from some nodes in while strengthening its relation to other nodes in . Let the list such that and be composed of elements that denote the weight to be added to (or created on) the link between node and node . For convenience, we will denote this list by . Similarly, let the list be composed of elements that denote the weight to be subtracted from the link where node .

Let denote the set of all possible combinations of how node can weaken or strengthen its relations to other nodes. Let the set of events be described by (() denotes the power set). We call , , events of type ; they drive the connectivity dynamics within a network generation. Notice that each event is defined as a set, with each element of representing the potential rearrangement of multiple weights between nodes, and multiple elements in representing the simultaneous rearrangements among multiple nodes.

An event may occur only if it belongs to the set defined by an enable function , specified for node as follows

  • If for all , then such that and is the only enabled event. Hence, node does not modify its relationships to other nodes (i.e., the strength of node does not change).

  • If for some , then the only are ones with and such that

    C1
    C2
    C3

for some , for all and . The parameter  regulates the speed at which weights are rearranged and affects the transitivity of the network (i.e., if a node is connected to node and node to node , the probability that node is also connected to node ). Low values of lead to slower convergence processes which increase the probability of forming transitive triples and lead to higher clustering coefficients.

Condition C1 implies that a node can only establish or strengthen its relations to other nodes by weakening incoming weights (the sum of incoming weights must equal the sum of outgoing weights). It implies that conserves total network strength, i.e., is constant. To interpret C2 and C3 it is useful to remember that reducing (increasing) the strength of a node always increases (decreases, respectively) its gain. Both conditions constrain how nodes can modify their weights in terms of the gain of outgoing neighbors. Condition C2 implies that if the gain of node differs from any of its outgoing neighbors, then the relation to some neighbor with the highest gain must be strengthened by some amount. Condition C3 implies that when node weakens incoming weights, node cannot exceed the highest gain of at least one outgoing neighbor. Together they guarantee that the highest gain of the network is strictly monotonically decreasing over time (as we prove in Theorem 1).

Next, state transitions are defined by the operator where . For a fixed generation , if , , then , where

(3)

Equation (3) means that the strength at node at time equals the strength of node at time , plus the total weight added by the nodes that strengthened their relationship to node , minus the total weight reduced by nodes that weakened their relation to node at time .

Let denote the set of all infinite sequences of events . Let denote the sequence of events and let the value of the function denote the state reached at time from the initial state by the application of the sequence of events of type 1. We assume that each event of type 1 occurs infinitely often on each event trajectory , . This assumption is met if nodes persistently try to rearrange weights. The enable function together with the state transition operator define the evolution of the connectivity dynamics of the network.
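The connectivity dynamics described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the exact enable function of the model: the gain is taken to be the hypothetical linear function f_i(s_i) = offset_i - s_i (decreasing in strength), nodes act one at a time, and the names `gains`, `rearrange_sweep`, `offset`, and `eta` are illustrative.

```python
def gains(strength, offset):
    """Hypothetical linear gain f_i(s_i) = offset_i - s_i (decreasing in strength)."""
    return [c - s for s, c in zip(strength, offset)]

def rearrange_sweep(strength, offset, out_neighbors, eta):
    """One asynchronous sweep of local weight rearrangements.

    Each node i looks only at its outgoing neighbors. If the neighbor j with
    the highest gain has a higher gain than i, node i gives up part of its
    strength and passes it to j.
    """
    s = list(strength)
    for i in range(len(s)):
        if not out_neighbors[i]:
            continue
        # Outgoing neighbor with the highest current gain (spirit of C2).
        j = max(out_neighbors[i], key=lambda k: offset[k] - s[k])
        gap = (offset[j] - s[j]) - (offset[i] - s[i])
        if gap > 0:
            # With eta in (0, 1], the new gain of i does not exceed the new
            # gain of j (spirit of C3); the same amount leaves i and reaches j,
            # so total strength is conserved (spirit of C1).
            delta = eta * gap / 2.0
            s[i] -= delta
            s[j] += delta
    return s

# Usage: five nodes, each able to pass weight to every other node.
n = 5
offset = [1.0, 2.0, 3.0, 4.0, 5.0]
strength = [4.0] * n
out_neighbors = {i: [k for k in range(n) if k != i] for i in range(n)}
for _ in range(200):
    strength = rearrange_sweep(strength, offset, out_neighbors, eta=0.5)
print(gains(strength, offset))  # gains approach a common level
```

In this sketch every exchange leaves the two gains involved between their previous values, so the highest gain in the network never increases, which mirrors the monotonicity argument used for Theorem 1.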

2.2 Growth dynamics

We now turn our attention to the evolution of the network as it grows. To capture a node’s longevity advantage let be the generation when node is added and define as the fraction of generations node has not been part of the network component. Moreover, to capture a node’s competitive advantage in acquiring weights we associate to every node a fitness , where . Let be a constant amount of strength such that . Let the gain function (marginal utility) associated to node during generation be

(4)

Higher values of characterize nodes that are more attractive in the sense that they can carry more weight without greatly reducing their gain. Both high values of (representing the fact that node has been part of the growing network for only a few generations) and low values of (representing the fact that the node has a low competitive advantage for acquiring weights) have a negative effect on the gain of node . Below we will see how allows us to define the scaling exponent of extended power-law strength distributions.

Let represent the attachment of a new node to the network at the beginning of generation (when ). Let be the total (constant) weight of a newly added node. A node attaches to the network component by randomly distributing its weight across some nodes and establishing a non-empty set of incoming neighbors (i.e., some node must connect to it). We call the attachment of nodes to , events of type 2. Let denote all possible combinations of how node can attach to the network component. An event may occur if it is defined by an enable function , specified for a newly added node as follows

  • Node attaches to the network only if the associated gain function follows the general form of (4) with longevity and fitness parameters that satisfy

C4
C5

Condition C4 follows from letting for the newly added node (at generation node has been part of the network for one generation). Condition C5 specifies an equal fitness value for every node (as is the case for networks with linear growth under preferential attachment).

The transition is defined by the operator . If , then where only if node is the newly added node. Let denote the set of all infinite sequences of events . Let denote the sequence of events of type 2, . We assume that each event of type 2 occurs infinitely often on each event trajectory . The assumption is met if nodes constantly attach to the existing network component. The enable function together with the transition operator define the growth dynamics of the network.

3 Analysis

Next, we present stability properties of the invariant set and deduce the average gain level of the network . We then prove that, for values greater than a threshold , the strength distribution converges to a scaling behavior.

Theorem 1: Suppose A1-3 and C1-3 hold. Then is an invariant set and has a region of asymptotic stability equal to .

Theorem 1 guarantees that for any generation , initial network state , and event sequence , as for generation . Broadly speaking, the conditions in Theorem 1 capture the dynamic coupling between different nodes that lead to a Nash equilibrium. By attaining the same gain level no node can increase its gain by changing its connections unilaterally without making the average gain of all other nodes worse off. When is reached the average gain of the network an instant before the start of generation is given by

(5)

As the network grows, the behavior of the average gain is characterized by the following lemma.

Lemma 1: Suppose A1-3 and C1-5 hold. Moreover, let then  as .

Lemma 1 implies that at the desired strength distribution , the average gain tends to as .

The following theorem implies that as the network grows, it develops an extended power-law structure driven by the marginal benefit of the allocation of weights across nodes and quantifies the value above which the scaling behavior emerges.

Theorem 2: Suppose A1-3 and C1-5 hold. Moreover, let . Then the strength distribution of the network follows an extended power-law with scaling exponent as . The scaling behavior holds for values greater than .

Note that if the model yields power-law rather than extended power-law distributions (as in preferential attachment with linear growth) for values greater than [10].

Extended power-law distributions emerge as a result of both the interaction between local mechanisms that lead to Nash equilibria and the continual attachment of new nodes to the network. In particular, when the network is at a Nash equilibrium and a new node is added, it introduces a perturbation to the existing set of strategies. Conditions C1-3 force the network to return to a state which again represents a Nash equilibrium, with subsequent achievements of Nash equilibria shaping the structure of the network.

4 Simulations

To gain insight into the connectivity dynamics let , , , and consider a network after generations. Figure 1 shows the value of the clustering coefficient (i.e., the ratio of the total average weight of transitive triplets over the total weight of possible triplets) as a function of the size. Note that for any the clustering properties remain constant as the network grows.

Figure 1: Clustering coefficient as a function of network size at various values of and as a function of .
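The clustering coefficient plotted in Figure 1 can be computed along the following lines. This is a sketch of one possible reading of the verbal definition above, not necessarily the exact formula used for the figure: a triplet is assumed to be a directed path i → j → k, its weight is assumed to be the average of its two link weights, and it counts as transitive when the link i → k also exists. The function name `weighted_clustering` is illustrative.

```python
def weighted_clustering(W):
    """Weighted clustering of a directed network given as a weight matrix.

    W[i][j] > 0 means there is a link from node i to node j with that weight.
    Returns (total weight of transitive triplets) / (total weight of all triplets).
    """
    n = len(W)
    total = 0.0   # weight of all triplets i -> j -> k
    closed = 0.0  # weight of triplets that are closed by a link i -> k
    for i in range(n):
        for j in range(n):
            if j == i or W[i][j] <= 0:
                continue
            for k in range(n):
                if k == i or k == j or W[j][k] <= 0:
                    continue
                # Assumed triplet weight: average of the two link weights.
                triplet_weight = (W[i][j] + W[j][k]) / 2.0
                total += triplet_weight
                if W[i][k] > 0:
                    closed += triplet_weight
    return closed / total if total > 0 else 0.0

# Usage: four nodes; only the triplet 0 -> 1 -> 2 is closed (by the link 0 -> 2).
W = [
    [0, 2, 1, 0],
    [0, 0, 4, 0],
    [0, 0, 0, 2],
    [0, 0, 0, 0],
]
print(weighted_clustering(W))
```

The triple loop is cubic in the number of nodes, which is adequate for small illustrative networks but not for networks of the size reported in Table 1.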

Figure 2 shows the effect of varying node fitness, where is chosen from a uniform distribution with support . Figure 2a shows the evolution of the node’s strength for different values of . Note that follows a power-law for all values of . Because of their competitive advantage, there are some nodes with more strength which have been part of the network for only a few generations. It is possible for a node to join the network at a more recent generation and become more attractive than other nodes that have been part of the network for longer. In particular, fig. 2a shows that the node added at generation with overcomes older nodes with and . In fig. 2b, the cumulative strength distribution for the entire network suggests a power-law with a logarithmic corrective term similar to the theoretical prediction in [11] where with .

Figure 2: (a) Evolution of the strength of three nodes added to the network using fitness , , and from with and when . (b) Cumulative strength distribution  or where is the exponential integral function (i.e., a power-law with an inverse logarithmic correction term emerges).

Finally, fig. 3 shows empirical data on the citation distribution of articles indexed by the Institute for Scientific Information (ISI); patents granted by the U.S. Patent and Trademark Office; and opinions written by the U.S. Supreme Court and the cases they cite. Figure 3a illustrates the case for scientific papers published in 1981 and cited between 1981 and 1997 [14]. The authors of [6] estimated both the scaling exponent and the threshold at which the scaling behavior emerges. Figure 3b represents citations on the main subnetwork of U.S. patents granted between 1963 and 1999 and references made to these patents between 1975 and 1999 [15]. Figure 3c shows the majority opinions written by the U.S. Supreme Court and the cases they cite from 1754 to 2002 [16]. All three citation networks follow extended power-law distributions (for the last two examples we estimate the values of , , and from empirical data).

Figure 3: Cumulative probability distribution for (a) the paper citation network presented in [6]; (b) the U.S. patent citation network presented in [15]; and (c) the U.S. Supreme Court citation network presented in [16].
Note: For the paper citation network we use the data and the distribution predicted by the model introduced in [6]. For the U.S. patent citation network we use the data presented in [15] and the distribution predicted by the model introduced in [7] with and . For the U.S. Supreme Court citation network we use the data presented in [16] and the distribution predicted by the model introduced in [7] with and respectively.
                                                  Papers         Patents      Court cases
Empirical network        Nodes                    783339         240547       30288
                         Links                    6716198        561060       220500
                         Scaling exponent         3.16           4.68         4.29
                         Threshold                160 ± 35       8 ± 2        55 ± 20
                         Clustering coefficient   Not available  0.037        0.107
Generated network model  Scaling exponent         3.13           4.63         4.25
                         Threshold                13.3           7.2          40.8
                         Clustering coefficient   0.713          0.044        0.112
Model parameters                                  7              2.9          7
                                                  18             7            44
                                                  0.47           0.28         0.31
                                                  0.5            0.98         0.93
D-statistic              Proposed model           0.2792         0.1656       0.1761
                         Previous models          0.9910         0.3681       0.2995
Sum of squares           Proposed model           0.7358         0.0599       0.0499
                         Previous models          2.3829         0.1416       0.1803
Range                                             [160, 8904]    [0, 173]     [0, 248]
Table 1: Model parameters for the three citation networks.

We also compare the distributions from empirical data with the distributions predicted by the proposed and previous models [6], [7]. We measure the greatest discrepancy between the empirical and the expected distribution (D-statistic), as well as the sum of squares of the deviations between the two distributions. Table 1 summarizes the model parameters and the results. Note that the performance of other (perhaps simpler) models degrades when the entire range is considered.
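The two goodness-of-fit measures can be sketched as follows. This is a minimal illustration, assuming that both measures are evaluated on cumulative distributions at the observed values; the exact evaluation grid behind Table 1 is not stated here, and the names `fit_discrepancy` and `model_cdf` are illustrative.

```python
from bisect import bisect_left, bisect_right

def fit_discrepancy(samples, model_cdf):
    """Return (D-statistic, sum of squared deviations) between data and a model.

    samples   : observed values (e.g., citation counts per node)
    model_cdf : function x -> P(X <= x) under the model being tested
    """
    xs = sorted(samples)
    n = len(xs)
    d_stat = 0.0
    sum_sq = 0.0
    for x in sorted(set(xs)):
        below = bisect_left(xs, x) / n   # empirical CDF just below x
        at = bisect_right(xs, x) / n     # empirical CDF at x
        f = model_cdf(x)
        # Greatest discrepancy, checked on both sides of each jump.
        d_stat = max(d_stat, abs(at - f), abs(f - below))
        # Squared deviation between the two cumulative distributions at x.
        sum_sq += (at - f) ** 2
    return d_stat, sum_sq

# Usage with a toy model: uniform distribution on [0, 4].
uniform_cdf = lambda x: min(max(x / 4.0, 0.0), 1.0)
print(fit_discrepancy([1, 2, 2, 4], uniform_cdf))
```

Restricting `samples` to values above a threshold, or passing the full range, corresponds to the two evaluation ranges contrasted in the paragraph above.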

5 Discussion

The proposed model generates extended power-law distributions from consecutive achievements of stable strength distributions . Although it does not pretend to empirically validate real-world mechanisms behind citation networks, the model may be of interest in the following context. First, it can be shown that the state is a Nash equilibrium, which implies that when a network reaches the equilibrium no node can gain by unilaterally rearranging weights to neighboring nodes (there are no incentives to change or establish new relationships). By focusing on the dynamics that drive the network to we capture the coupling between different nodes, characterizing how relationships between any pair of nodes affect other nodes in the network. Second, the proposed strategies allow us to control the connectivity dynamics of nodes based on local attachment strategies (C1-5), allowing us to generate network substrates through distributed decision-making. Finally, the ability to control the rate at which attachment strategies lead to the scaling behavior allows us to obtain non-negligible clustering coefficients for large networks.

We focused on two types of network incentives: longevity rewards nodes that have been part of the network for a long time (they have the ability to acquire more weight compared to recently added ones); fitness rewards nodes that are highly competent (they are more suitable to compete and maintain weights). Modeling nodes with varying fitness allows “latecomers” to overcome nodes that have been in the network for more generations.

Following similar ideas as in Theorems 1 and 2, the proposed framework can be extended to generate exponential strength distributions. In particular, if we consider the gain function of the general form where , the proposed strategies lead to weighted networks with . A mathematical framework that allows us to generate various strength distributions for different domain intervals provides an important direction for future research.

6 Appendix

Proof of Theorem 1.

First, we define a metric on the distribution of the strength and a Lyapunov function . We then show that for the choices of and , for and there exist two positive constants and such that for all . Finally, to prove asymptotic stability of we show that for any initial distribution and any class of rewiring strategies that satisfy C1-3, i.e., for all such that , the functional as for any fixed generation .

Let and choose

(6)

and

(7)

Note that for , since for all and . To show that is bounded from below by a class function , note that according to eq. (1) for all and all , it must be the case that for any and , there is some node such that . Let and be a constant such that

(8)

Since eq. (8) applies for any such that , it must apply for some node

(9)

Using the definition of and applying eq. (8) to node yields

(10)

Note that for any strength distribution and , one of the following must be true: In the first case, if , i.e., if node needs more weight from its neighbors to achieve the desired state, then there must exist some other node such that

(11)

In other words, there must exist another node that needs to weaken its relationship to neighboring nodes to achieve desired state . Because, and

Similarly, if , i.e., if node needs to weaken its relationship to neighboring nodes to achieve the desired state, then there must also exist some other node such that

In other words, there must exist another node that needs to strengthen its relationships to achieve desired state . Because

Thus, eq. (10) can be bounded from above by

(12)

Next, note that

Using eq. (12) we get

(13)

Thus, for all .

Next, we will show that there exists a constant such that for all . Let . Recall that for all and , . Note also that if , then according to eq. (1)

(14)

and similarly, if , then

(15)

By adding eq. (14) and eq. (15) we get

Since , and

Moreover

Hence, if

(16)

Since eq. (16) applies to any and and according to the definition of

(17)

Thus, for all .

Next, in order to show that is globally asymptotically stable, we must show that for all and all such that ,

(18)

(i.e., along all possible motions of the system). This part of the proof is similar to the proof of Theorem 3.4 in [17]. If , then there must exist some node with the highest gain among all nodes (there might actually be more than one). There must also exist another node such that and . Because of the restrictions imposed by , we know that events of type 1 are guaranteed to occur infinitely often. According to condition C2, when each event of type 1 occurs, the gain of node is guaranteed to decrease by a fixed fraction of . Hence, if , then . Regardless of how many nodes with the highest gain there are, since there are only a finite number of nodes in the network , it is inevitable that eventually the highest gain must decrease. Note that according to condition C3 no node can increase its gain beyond the gain of the highest nodes by weakening its relation from neighboring nodes. In other words, must eventually decrease as long as . Note also that since , the Lyapunov function can be bounded by . Hence, for every , there exists such that as long as so that as , and has a region of asymptotic stability equal to . ∎

Proof of Lemma 1.

We show that the value of converges as . Let . Using Theorem 1 we know

Following assumption A2 for each generation , then , so we have

Next, consider the difference in average gain between two consecutive generations

Moreover,

Because and