Powerlaw weighted networks from local attachments
Abstract
This letter introduces a mechanism for constructing, through a process of distributed decisionmaking, substrates for the study of collective dynamics on extended powerlaw weighted networks with both a desired scaling exponent and a fixed clustering coefficient. The analytical results show that the connectivity distribution converges to the scaling behavior often found in social and engineering systems. To illustrate the approach of the proposed framework we generate network substrates that resemble steady state properties of the empirical citation distributions of publications indexed by the Institute for Scientific Information from 1981 to 1997; patents granted by the U.S. Patent and Trademark Office from 1975 to 1999; and opinions written by the Supreme Court and the cases they cite from 1754 to 2002.
Keywords: complex networks, weighted digraphs, extended powerlaw distributions.
1 Introduction
Understanding structure lies at the very heart of the study of complex networks. A network is a collection of a large number of interconnected elements (units or agents) whose interaction with each other and with the surroundings leads to characteristic properties that can only be attributed to the network as a whole [1]. Networks often develop distinct structural steady state patterns. Studying these patterns promises to enhance our understanding of the dynamics underlying collective human responses [2], corrupt behavior [3], and economic development [4].
Random graph models fail to capture key features of real-world networks (e.g., clustering coefficients and degree correlations). Recent efforts to understand network structure have focused on connectivity distributions underlying a number of social and engineering systems which, rather than following the Poisson distribution of random networks (bounded by Chebyshev’s inequality), have heavy tails [5]. Heavytailed distributions in empirical data suggest the existence of causal mechanisms that shape the structure and function of real-world networks [6]. In the era of “big data,” the development of formal frameworks that quantify patterns of interaction of networks has set the research agendas across various disciplines (e.g., more recently across the data driven computational social sciences).
Powerlaws, a particular type of heavytailed distributions, have received significant attention in recent years. For a network with an extended powerlaw connectivity distribution, if the number of connections of a node is much larger than , the probability that the node connects to other nodes is proportional to for some positive constants and [7]. As a result, the tail of the distribution has no exponential bound and the connectivity of the nodes of the network comprises different orders of magnitude, with a few nodes being highly connected.
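As a rough illustration of what a scaling tail above a threshold looks like in practice, the following sketch draws samples from a pure continuous power-law tail and recovers the exponent with the standard maximum-likelihood estimator. The names `gamma` and `x_min`, and the use of a pure (rather than extended) power law, are assumptions made for the example; they are not the notation or the estimator of this letter.

```python
import math
import random

def sample_power_law_tail(n, gamma, x_min, rng):
    """Draw n values with density proportional to x**(-gamma) for x >= x_min."""
    # Inverse-transform sampling: x = x_min * (1 - u)**(-1 / (gamma - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0))
            for _ in range(n)]

def estimate_exponent(values, x_min):
    """Maximum-likelihood estimate of gamma from the values at or above x_min."""
    tail = [x for x in values if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

# Hypothetical parameters chosen only for illustration.
rng = random.Random(0)
data = sample_power_law_tail(20000, gamma=3.0, x_min=10.0, rng=rng)
gamma_hat = estimate_exponent(data, x_min=10.0)
print(round(gamma_hat, 2))
```

With this many samples the estimate should land close to the exponent used for sampling; on empirical data the threshold itself also has to be estimated, which the sketch does not attempt.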
Key to modeling powerlaw networks is the characterization of hubs (highly interconnected nodes). In the context of the spread of disease, measuring patterns in regions that are more vulnerable to infection (hubs) allows us to respond more effectively to the potential spread of largescale epidemics [8]. The ability to understand and recreate the structure of epidemic networks allows us to design strategies that embrace how interconnected regions influence one another (as a result of the evolution of social systems) in order to quantify and predict the dimensions of disease.
To capture the relationships between the elements of a network, e.g., duration, emotional intensity, or intimacy, models define weights as an inherent property between nodes [9]. Recent models of weighted networks have focused on attachment strategies in which nodes are added according to probability distributions on the existing weights across the entire network. The network model introduced in [10] captures the evolution of weights driven by preferential strength attachment, a mechanism in which newly added nodes are more likely to connect to nodes associated with larger weights. Lacking local competitive factors between nodes, the resulting networks exhibit powerlaw distributions where the hubs correspond to the nodes that have been part of the network the longest.
This letter introduces a wide class of attachment strategies which promote the formation of hubs based on both the length of time a node has been part of the network (i.e., node longevity) and its ability to compete for weights with surrounding neighbors (i.e., node fitness). Because the connectivity dynamics of the nodes depend on their attractiveness to compete for weights (as in [11]), older nodes are not necessarily more successful in acquiring weights. To our knowledge the proposed mechanism is novel in that it generates weighted directed networks with extended powerlaw strength distributions in a distributed fashion (decisionmaking strategies are based on local information; we do not assume any type of global information to generate the desired network structure); for an arbitrary scaling exponent and a fixed clustering (as in [12]); and for values greater than a particular threshold (for the case when only the tail of the distribution obeys a powerlaw).
The remaining sections are organized as follows. First we introduce a model that captures the connectivity and growth dynamics of the gradual addition of nodes to an existing network component and proposes attachment strategies for local rearrangement of weights between pairs of nodes. We prove that for any connected network there exists a distribution of the total weight from neighboring nodes (node strength) that is asymptotically stable (i.e., the proposed strategies lead to a Nash equilibrium [13]). Moreover, as the network grows, consecutive achievements of this network state lead to weighted directed networks with extended powerlaw strength distributions and distinctive clustering coefficients (defined as the ratio of the total average weight of transitive triplets over the total weight of possible triplets). We present simulations that capture the effect of node fitness and illustrate the application of the proposed model to generate various citation networks. Finally, we draw some conclusions and future research directions.
2 A model of network topology and growth
Consider a directed network that captures weighted relationships between a set of nodes. As the network grows, more nodes join the network, each possessing a small “budget” used to construct directed links to some existing nodes. When node establishes a link to node (by passing some of its budget to node ), node has more budget to spend, which it may do by increasing its weighted connections to other nodes. Broadly speaking, every node wishes to spend its budget, but the more it spends the less willing it is to spend more. Nodes will locally rearrange their weights until every node reaches an equilibrium. At the equilibrium all nodes have associated gains that are equal and there are no further incentives to rearrange connections.
To formalize this idea let us introduce the following notation. Let be a finite set of nodes at generation . Nodes represent elements (acting units) that establish connections to other nodes. We represent the relationship between nodes using a weighted matrix , where quantifies the relationship between node and . If , then there exists some kind of action from to with weight . It may capture, for instance, the extent to which node influences node . Let represent the network at generation (because in general , the network is modeled as a directed graph). For a fixed generation, let represent all nodes which influence node (incoming neighbors). Similarly, let represent all nodes influenced by node (outgoing neighbors). A gain function is associated to each node and characterizes the marginal benefit that results from its current set of connections, where , . Note that is a scalar that represents the incoming strength of node (referred to as node strength hereafter). The following network assumptions are needed:

A1 Finite network strength: The total weight of the initial network , , is finite. In other words, the extent to which any node in the network can be influenced by other nodes is bounded.

A2 Connectedness: Every node is influenced to some extent by another node. At each generation , .

A3 Bounded marginal gains: The gain function associated to node satisfies
(1) for any , and some constants . In other words, the marginal gain associated with each node decreases with increasing strength. Equation (1) eliminates the possibility that a very small difference in node strength may result in an unbounded change in gain. Note that if is differentiable and has a negative derivative it satisfies eq. (1).
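The notation above can be made concrete with a small sketch. It assumes, purely for illustration, that the weight matrix is stored as a nested list `W` in which `W[i][j]` is the weight of the link from node `i` to node `j`, so that the incoming strength of a node is a column sum; the helper names are hypothetical. The last two lines check the finite-strength and connectedness assumptions for the toy matrix.

```python
def incoming_strength(W, j):
    """Total weight pointing at node j (the node strength)."""
    return sum(W[i][j] for i in range(len(W)))

def incoming_neighbors(W, j):
    """Nodes that influence node j."""
    return [i for i in range(len(W)) if W[i][j] > 0]

def outgoing_neighbors(W, i):
    """Nodes influenced by node i."""
    return [j for j in range(len(W)) if W[i][j] > 0]

# Toy three-node directed network (values are arbitrary).
W = [
    [0.0, 2.0, 0.0],
    [0.5, 0.0, 1.5],
    [1.0, 0.0, 0.0],
]

strengths = [incoming_strength(W, j) for j in range(len(W))]
total = sum(strengths)  # finite total network strength
connected = all(incoming_neighbors(W, j) for j in range(len(W)))  # every node has an incoming neighbor
```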
Next, we use to specify the time index of events. Let be the time instant when a new node is added to form the network (i.e., the start of generation ). Let be the instant right before the new node is added to (i.e., the start of generation ). When , evolves into . For generation let the set of states
be the simplex over which the connectivity dynamics evolve. Constraints on our model below will ensure that for all nodes , for all . We assume that as , the time allowed for the events that drive the connectivity dynamics during generation goes to infinity. Let be the state vector for at time (i.e., the incoming strength distribution of the entire network).
2.1 Connectivity dynamics
We first focus on the dynamics of for (i.e., within a fixed generation). In particular, we want to define the singleton
(2) 
such that any strength distribution that belongs to this set represents a distribution where all nodes in have equal gain levels. To capture the connectivity dynamics that lead to , let represent the decision of node to weaken its relation from some nodes in while strengthening its relation to other nodes in . Let the list such that and be composed of elements that denote the weight to be added to (or created on) the link between node and node . For convenience, we will denote this list by . Similarly, let the list be composed of elements that denote the weight to be subtracted from the link where node .
Let denote the set of all possible combinations of how node can weaken or strengthen its relations to other nodes. Let the set of events be described by (() denotes the power set). We call , , events of type ; they drive the connectivity dynamics within a network generation. Notice that each event is defined as a set, with each element of representing the potential rearrangement of multiple weights between nodes, and multiple elements in representing the simultaneous rearrangements among multiple nodes.
An event may occur only if it belongs to the set defined by an enable function , specified for node as follows

If for all , then such that and is the only enabled event. Hence, node does not modify its relationships to other nodes (i.e., the strength of node does not change).

If for some , then the only are ones with and such that
C1 C2 C3
for some , for all and . The parameter regulates the speed at which weights are rearranged and affects the transitivity of the network (i.e., if a node is connected to node and node to node , the probability that node is also connected to node ). Low values of lead to slower convergence processes which increase the probability of forming transitive triples and lead to higher clustering coefficients.
Condition C1 implies that a node can only establish or strengthen its relations to other nodes by weakening incoming weights (the sum of incoming weights must equal the sum of outgoing weights). It implies that conserves total network strength, i.e., is constant. To interpret C2 and C3 it is useful to remember that reducing (increasing) the strength of a node always increases (decreases, respectively) its gain. Both conditions constrain how nodes can modify their weights in terms of the gain of outgoing neighbors. Condition C2 implies that if the gain of node differs from any of its outgoing neighbors, then the relation to some neighbor with the highest gain must be strengthened by some amount. Condition C3 implies that when node weakens incoming weights, node cannot exceed the highest gain of at least one outgoing neighbor. Together they guarantee that the highest gain of the network is strictly monotonically decreasing over time (as we prove in Theorem 1).
Next, state transitions are defined by the operator where . For a fixed generation , if , , then , where
(3)  
Equation (3) means that the strength at node at time equals the strength of node at time , plus the total weight added by the nodes that strengthened their relationship to node , minus the total weight reduced by nodes that weakened their relation to node at time .
Let denote the set of all infinite sequences of events . Let denote the sequence of events and let the value of the function denote the state reached at time from the initial state by the application of the sequence of events of type 1. We assume that each event of type 1 occurs infinitely often on each event trajectory , . This assumption is met if nodes persistently try to rearrange weights. The enable function together with state transition operator define the evolution of the connectivity dynamics of the network.
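The flavor of the connectivity dynamics can be sketched with a deliberately simplified simulation. The sketch assumes a hypothetical gain function `-strength / fitness` (strictly decreasing in strength), lets any pair of nodes exchange strength (the model restricts exchanges to neighbors), and uses a rate `lam` in (0, 1] as a stand-in for the parameter that regulates the speed of rearrangement. None of these choices are taken from the letter; they only illustrate that local transfers toward higher-gain nodes conserve total strength, never raise the highest gain, and drive all gains to a common level.

```python
import random

def gain(strength, fitness):
    # Hypothetical gain: strictly decreasing in strength, flatter for fitter nodes.
    return -strength / fitness

def rearrange(strengths, fitness, lam, steps, rng):
    """Repeatedly shift strength from a lower-gain node to a higher-gain node."""
    s = list(strengths)
    highest_gain = []
    for _ in range(steps):
        i, j = rng.sample(range(len(s)), 2)
        if gain(s[i], fitness[i]) > gain(s[j], fitness[j]):
            i, j = j, i  # make j the node with the higher gain
        # Transfer that would equalize the two gains, scaled by lam.
        full = (s[i] * fitness[j] - s[j] * fitness[i]) / (fitness[i] + fitness[j])
        delta = lam * full
        s[i] -= delta  # node i gives up strength, which raises its gain
        s[j] += delta  # node j receives strength, which lowers its gain
        highest_gain.append(max(gain(x, f) for x, f in zip(s, fitness)))
    return s, highest_gain

# Arbitrary initial strengths and fitness values for five nodes.
start = [4.0, 1.0, 0.5, 0.5, 2.0]
fitness = [1.0, 2.0, 1.0, 0.5, 1.5]
final, trace = rearrange(start, fitness, lam=0.5, steps=2000, rng=random.Random(0))
final_gains = [gain(x, f) for x, f in zip(final, fitness)]
```

Under the assumed gain, the equal-gain state gives each node a share of the total strength proportional to its fitness, which is one way to see how fitness shapes the equilibrium distribution.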
2.2 Growth dynamics
We now turn our attention to the evolution of the network as it grows. To capture a node’s advantage of longevity let be the generation when node is added and define as the fraction of generations node has not been part of the network component. Moreover, to capture a node’s competitive advantage in acquiring weights we associate to every node a fitness , where . Let be a constant amount of strength such that . Let the gain function (marginal utility) associated to node during generation be
(4) 
Higher values of characterize nodes that are more attractive in the sense that they can carry more weight without greatly reducing their gain. Both high values of (representing the fact that node has been part of the growing network for only a few generations) and low values of (representing the fact that the node has a low competitive advantage for acquiring weights) have a negative effect on the gain of node . Below we will see how allows us to define the scaling exponent of extended powerlaw strength distributions.
Let represent the attachment of a new node to the network at the beginning of generation (when ). Let be the total (constant) weight of a newly added node. A node attaches to the network component by randomly distributing its weight across some nodes and establishing a nonempty set of incoming neighbors (i.e., some node must connect to it). We call the attachment of nodes to , events of type 2. Let denote all possible combinations of how node can attach to the network component. An event may occur if it is defined by an enable function , specified for a newly added node as follows

Node attaches to the network only if the associated gain function follows the general form of (4) with longevity and fitness parameters that satisfy
C4  
C5 
Condition C4 follows from letting for the newly added node (at generation node has been part of network for one generation). Condition C5 specifies an equal fitness value for every node (as is the case for networks with linear growth under preferential attachment).
The transition is defined by the operator . If , then where only if node is the newly added node. Let denote the set of all infinite sequences of events . Let denote the sequence of events of type 2, . We assume that each event of type 2 occurs infinitely often on each event trajectory . The assumption is met if nodes constantly attach to the existing network component. The enable function together with the transition operator define the growth dynamics of the network.
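One part of an attachment event, the random distribution of a new node's fixed weight across existing nodes, can be sketched as follows. The function name, the parameter `k` (number of targets), and the nested-list matrix with `W[i][j]` as the weight from `i` to `j` are assumptions for illustration. The sketch covers only the outgoing links of the new node; the incoming neighbors required by the model and conditions C4 and C5 are not represented.

```python
import random

def attach_new_node(W, budget, k, rng):
    """Append a node that splits `budget` at random across k existing nodes."""
    n = len(W)
    targets = rng.sample(range(n), k)
    # Random split of the unit interval into k shares that sum to 1.
    cuts = sorted(rng.random() for _ in range(k - 1))
    shares = [b - a for a, b in zip([0.0] + cuts, cuts + [1.0])]
    for row in W:
        row.append(0.0)  # no existing node points at the new node yet
    new_row = [0.0] * (n + 1)
    for target, share in zip(targets, shares):
        new_row[target] = budget * share
    W.append(new_row)
    return targets

# Toy two-node network; the new node distributes a budget of 1.0.
W = [[0.0, 1.0],
     [1.0, 0.0]]
targets = attach_new_node(W, budget=1.0, k=2, rng=random.Random(1))
```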
3 Analysis
Next, we present stability properties of the invariant set and deduce the average gain level of the network . We then prove that, for values greater than a threshold , the strength distribution converges to a scaling behavior.
Theorem 1: Suppose A1-A3 and C1-C3 hold. Then is an invariant set and has region of asymptotic stability equal to .
Theorem 1 guarantees that for any generation , initial network state , and event sequence , as for generation . Broadly speaking, the conditions in Theorem 1 capture the dynamic coupling between different nodes that lead to a Nash equilibrium. By attaining the same gain level no node can increase its gain by changing its connections unilaterally without making the average gain of all other nodes worse off. When is reached the average gain of the network an instant before the start of generation is given by
(5) 
As the network grows, the behavior of the average gain is characterized by the following lemma.
Lemma 1: Suppose A1-A3 and C1-C5 hold. Moreover, let then as .
Lemma 1 implies that at the desired strength distribution , the average gain tends to as .
The following theorem implies that as the network grows, it develops an extended powerlaw structure driven by the marginal benefit of the allocation of weights across nodes and quantifies the value above which the scaling behavior emerges.
Theorem 2: Suppose A1-A3 and C1-C5 hold. Moreover, let . Then the strength distribution of the network follows an extended powerlaw with scaling exponent as . The scaling behavior holds for values greater than .
Note that if the model yields powerlaw rather than extended powerlaw distributions (as in preferential attachment with linear growth) for values greater than [10].
Extended powerlaw distributions emerge as a result of both the interaction between local mechanisms that lead to Nash equilibria and the continual attachment of new nodes to the network. In particular, when the network is at a Nash equilibrium and a new node is added, it introduces a perturbation to the existing set of strategies. Conditions C1-C3 force the network to return to a state which again represents a Nash equilibrium, with subsequent achievements of Nash equilibria shaping the structure of the network.
4 Simulations
To gain insight into the connectivity dynamics let , , , , and consider a network after generations. Figure 1 shows the value of the clustering coefficient (i.e., the ratio of the total average weight of transitive triplets over the total weight of possible triplets) as a function of the size. Note that for any the clustering properties remain constant as the network grows.
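The verbal definition of the clustering coefficient leaves some room for interpretation, so the following sketch fixes one reading as an assumption: a possible triplet is a directed path i → j → k over distinct nodes, its weight is the average of the two link weights, and it counts as transitive when the link i → k also exists. The function name and the matrix convention (`W[i][j]` is the weight from `i` to `j`) are hypothetical.

```python
def weighted_clustering(W):
    """Weight of transitive triplets divided by the weight of all possible triplets."""
    n = len(W)
    closed = 0.0
    possible = 0.0
    for i in range(n):
        for j in range(n):
            if j == i or W[i][j] <= 0:
                continue
            for k in range(n):
                if k == i or k == j or W[j][k] <= 0:
                    continue
                triplet_weight = (W[i][j] + W[j][k]) / 2.0
                possible += triplet_weight
                if W[i][k] > 0:  # the shortcut i -> k closes the triplet
                    closed += triplet_weight
    return closed / possible if possible > 0 else 0.0

# Toy four-node network: only the path 0 -> 1 -> 2 is closed (by 0 -> 2).
W = [
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
]
c = weighted_clustering(W)  # 2.0 / 4.5
```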
Figure 2 shows the effect of varying node fitness, where is chosen from a uniform distribution with support . Figure 2a shows the evolution of the node’s strength for different values of . Note that follows a powerlaw for all values of . Because of their competitive advantage, there are some nodes with more strength which have been part of the network for only a few generations. It is possible for a node to join the network at a more recent generation and become more attractive than other nodes that have been part of the network for longer. In particular, fig. 2a shows that the node added at generation with overcomes older nodes with and . In fig. 2b, the cumulative strength distribution for the entire network suggests a powerlaw with a logarithmic corrective term similar to the theoretical prediction in [11] where with .
Finally, fig. 3 shows empirical data on the citation distribution of articles indexed by the Institute for Scientific Information (ISI); patents granted by the U.S. Patent and Trademark Office; and opinions written by the U.S. Supreme Court and the cases they cite. Figure 3a illustrates the case for scientific papers published in 1981 and cited between 1981 and 1997 [14]. The authors of [6] estimated both the scaling exponent and the threshold at which the scaling behavior emerges. Figure 3b represents citations on the main subnetwork of U.S. patents granted between 1963 and 1999 and references made to these patents between 1975 and 1999 [15]. Figure 3c shows the majority opinions written by the U.S. Supreme Court and the cases they cite from 1754 to 2002 [16]. All three citation networks follow extended powerlaw distributions (for the last two examples we estimate the values of , , and from empirical data).
                                           Papers          Patents     Court cases
Empirical network
  Nodes                                    783339          240547      30288
  Links                                    6716198         561060      220500
  Scaling exponent                         3.16            4.68        4.29
  Threshold                                160 ± 35        8 ± 2       55 ± 20
  Clustering coefficient                   Not available   0.037       0.107
Generated network model
  Scaling exponent                         3.13            4.63        4.25
  Threshold                                13.3            7.2         40.8
  Clustering coefficient                   0.713           0.044       0.112
Model parameters
                                           7               2.9         7
                                           18              7           44
                                           0.47            0.28        0.31
                                           0.5             0.98        0.93
D-statistic
  Proposed model                           0.2792          0.1656      0.1761
  Previous models                          0.9910          0.3681      0.2995
Sum of squares
  Proposed model                           0.7358          0.0599      0.0499
  Previous models                          2.3829          0.1416      0.1803
Range                                      [160, 8904]     [0, 173]    [0, 248]
Finally, we compare the distributions from empirical data with the distributions predicted by the proposed and previous models [6], [7]. We measure the greatest discrepancy between the empirical and the expected distribution (D-statistic), as well as the sum of squares of the deviations between the two distributions. Table 1 summarizes the model parameters and the results. Note that the performance of other (perhaps simpler) models degrades when the entire range is considered.
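The two goodness-of-fit measures can be sketched on a small example. The sketch assumes that both distributions are compared as complementary cumulative distributions evaluated on a common grid; whether the letter uses the cumulative or the complementary form is not stated, and the model values below are hypothetical numbers chosen only to exercise the functions.

```python
def empirical_ccdf(values, grid):
    """Fraction of observations greater than or equal to each grid point."""
    n = len(values)
    return [sum(1 for v in values if v >= x) / n for x in grid]

def d_statistic(p, q):
    """Greatest absolute discrepancy between two distributions on the same grid."""
    return max(abs(a - b) for a, b in zip(p, q))

def sum_of_squares(p, q):
    """Sum of squared deviations between two distributions on the same grid."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

# Toy data: five observed citation counts and a hypothetical model prediction.
observed = [1, 2, 2, 3, 5]
grid = [1, 2, 3, 4, 5]
empirical = empirical_ccdf(observed, grid)  # [1.0, 0.8, 0.4, 0.2, 0.2]
model = [1.0, 0.7, 0.5, 0.3, 0.2]

d = d_statistic(empirical, model)
ss = sum_of_squares(empirical, model)
```

Restricting `grid` to the tail or extending it over the whole range gives the two evaluation regimes contrasted in the text.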
5 Discussion
The proposed model generates extended powerlaw distributions from consecutive achievements of stable strength distributions . Although it does not pretend to empirically validate real-world mechanisms behind citation networks, the model may be of interest in the following context. First, it can be shown that the state is a Nash equilibrium, which implies that when a network reaches the equilibrium no node can gain by unilaterally rearranging weights to neighboring nodes (there are no incentives to change or establish new relationships). By focusing on the dynamics that drive the network to we capture the coupling between different nodes, characterizing how relationships between any pair of nodes affect other nodes in the network. Second, the proposed strategies allow us to control the connectivity dynamics of nodes based on local attachment strategies (C1-C5), allowing us to generate network substrates through distributed decisionmaking. Finally, the ability to control the rate at which attachment strategies lead to the scaling behavior allows us to obtain nonnegligible clustering coefficients for large networks.
We focused on two types of network incentives: Longevity rewards nodes that have been part of the network for a long time (they have the ability to acquire more weight compared to recently added ones); Fitness rewards nodes that are highly competent (they are more suitable to compete and maintain weights). Modeling nodes with varying fitness allows “latecomers” to overcome nodes that have been in the network for longer generations.
Following similar ideas as in Theorems 1 and 2, the proposed framework can be extended to generate exponential strength distributions. In particular, if we consider the gain function of the general form where , the proposed strategies lead to weighted networks with . A mathematical framework that allows us to generate various strength distributions for different domain intervals provides an important direction for future research.
6 Appendix
Proof of Theorem 1.
First, we define a metric on the distribution of the strength and a Lyapunov function . We then show that for the choices of and , for and there exist two positive constants and such that for all . Finally, to prove asymptotic stability of we show that for any initial distribution and any class of rewiring strategies that satisfy C1-C3, i.e., for all such that , the functional as for any fixed generation .
Let and choose
(6) 
and
(7) 
Note that for , since for all and . To show that is bounded from below by a class function , note that according to eq. (1) for all and all , it must be the case that for any and , there is some node such that . Let and be a constant such that
(8) 
Since eq. (8) applies for any such that , it must apply for some node
(9) 
Using the definition of and applying eq. (8) to node yields
(10)  
Note that for any strength distribution and , one of the following must be true: In the first case, if , i.e., if node needs more weight from its neighbors to achieve the desired state, then there must exist some other node such that
(11) 
In other words, there must exist another node that needs to weaken its relationship to neighboring nodes to achieve desired state . Because, and
Similarly, if , i.e., if node needs to weaken its relationship to neighboring nodes to achieve the desired state, then there must also exist some other node such that
In other words, there must exist another node that needs to strengthen its relationships to achieve desired state . Because
Thus, eq. (10) can be bounded from above by
(12) 
Next, note that
Using eq. (12) we get
(13) 
Thus, for all .
Next, we will show that there exists a constant such that for all . Let . Recall that for all and , . Note also that if , then according to eq. (1)
(14) 
and similarly, if , then
(15) 
By adding eq. (14) and eq. (15) we get
Since , and
Moreover
Hence, if
(16) 
Since eq. (16) applies to any and and according to the definition of
(17) 
Thus, for all .
Next, in order to show that is globally asymptotically stable, we must show that for all and all such that ,
(18) 
(i.e., along all possible motions of the system). This part of the proof is similar to the proof of Theorem 3.4 in [17]. If , then there must exist some node with the highest gain among all nodes (there might actually be more than one). There must also exist another node such that and . Because of the restrictions imposed by , we know that events of type 1 are guaranteed to occur infinitely often. According to condition C2, when each event of type 1 occurs, the gain of node is guaranteed to decrease by a fixed fraction of . Hence, if , then . Regardless of how many nodes with the highest gain there are, since there are only a finite number of nodes in the network , it is inevitable that eventually the highest gain must decrease. Note that according to condition C3 no node can increase its gain beyond the gain of the highest nodes by weakening its relation from neighboring nodes. In other words, must eventually decrease as long as . Note also that since , the Lyapunov function can be bounded by . Hence, for every , there exists such that as long as so that as , and has a region of asymptotic stability equal to . ∎
Proof of Lemma 1.
We show that the value of converges as . Let . Using Theorem 1 we know
Following assumption A2 for each generation , then , so we have
Next, consider the difference in average gain between two consecutive generations
Moreover,  
Because and