Modeling Network Growth under Resource Constraints
Abstract.
We propose a resource-constrained network growth model that explains the emergence of key structural properties of real-world directed networks: heavy-tailed indegree distribution, high local clustering and degree-clustering relationship. In real-world networks, individuals form edges under constraints of limited network access and partial information. However, well-known growth models that preserve multiple structural properties do not incorporate these resource constraints. Conversely, existing resource-constrained models do not jointly preserve multiple structural properties of real-world networks. We propose a random walk growth model that explains how real-world network properties can jointly arise from edge formation under resource constraints. In our model, each node that joins the network selects a seed node from which it initiates a random walk. At each step of the walk, the new node either jumps back to the seed node or chooses an outgoing or incoming edge to visit another node. It links to each visited node with some probability and stops after forming a few edges. Our experimental results against four well-known growth models indicate improvement in accurately preserving structural properties of five citation networks. Our model also preserves two structural properties that most growth models cannot: skewed local clustering distribution and bivariate indegree-clustering relationship.
1. Introduction
We develop a resource-constrained model of network growth that explains the emergence of key structural properties. The problem is important for several reasons. Individuals form real-world networks while acting under resource constraints and using only local information. The networks that they form exhibit rich structural properties. However, we lack an understanding of mechanisms that are resource-constrained, use only local information and still explain the emergence of these structural properties.
Classic models of network growth make unrealistic assumptions about what the agents who form edges do. Consider, as a simple stylized example, the process of finding a set of papers to cite when writing an article. In the preferential attachment model (Barabási and Albert, 1999) of network growth, a node making citations would pick a paper uniformly at random from all papers in the domain, and either cite it or copy one of its references. It would repeat this process until it has exhausted its budget of references. Notice that the process assumes access to the entire dataset, and that one would pick papers uniformly at random. An equivalent formulation of this copying model is to cite papers from the entire dataset in proportion to their indegree. The latter formulation assumes that agents making citations know the entire indegree distribution. While preferential attachment models explain the emergence of the power-law degree distribution, the attachment model is an unrealistic representation of how agents make decisions on edge formation.
The problem of developing a model of network growth in which agents act under resource constraints, including access to only local information, is hard. The difficulty lies in identifying simple mechanisms with few parameters, in which agents use only local information, that jointly preserve multiple structural properties.
We propose a random walk based model of network growth that jointly explains the emergence of the following properties: heavy-tailed indegree distributions, local clustering and clustering-degree relationships. In the growth model, an incoming node picks a recent node as the seed. It links to this node with some constant linking probability. Then, it follows an outgoing or incoming link of this seed node and arrives at a new node. At each new node, it decides whether to link with the same constant linking probability. It then decides whether to jump back to the seed node or to follow an incoming or outgoing link. The process repeats until the agent has exhausted its budget for linking. To summarize, new nodes concurrently acquire information and form edges by exploring the local neighborhoods of existing nodes, without access to the entire network.
Our main contributions are as follows:

We propose a model of network growth using a local edge formation mechanism that incorporates the resource constraints that influence individuals’ edge formation mechanisms in real-world networks.

We propose a model that jointly explains multiple structural properties, including indegree distribution, clustering, degree-clustering relationship and edge densification.
We conduct extensive experiments against state-of-the-art baselines on large citation network datasets. We show that our growth model outperforms the best competing model in jointly and accurately preserving multiple structural properties (degree distribution, clustering and degree-clustering relationship) by a significant margin.
The rest of the paper is organized as follows. In Section 2, we describe the related work. Then, in Section 3, we define key structural properties and introduce the datasets. We formally state the goal of the paper in Section 4. In Section 5 and Section 6, we report prominent structural characteristics of citation networks and propose a network growth model respectively. This is followed by Section 7, where we validate our model against multiple baselines.
2. Related Work
There has been extensive work on network growth models that explain how a subset of structural properties of real world networks emerge from edge formation mechanisms over time. In this section, we describe four edge formation mechanisms underlying network growth models: preferential attachment (Jeong et al., 2003) and its extensions, fitness (Albert and Barabási, 2002), triangle closing mechanisms (Bianconi et al., 2014) and random walks (Vazquez, 2000). These edge formation mechanisms explain the emergence of multiple structural properties of real networks, but make one or more strong assumptions.
Preferential attachment models such as the Barabási-Albert model (Barabási and Albert, 1999) and Vertex Copying model (Kumar et al., 2000) explain the emergence of power law degree distributions of the form $P(k) \propto k^{-\alpha}$, commonly observed in real networks. In preferential attachment models, new nodes that join the network form edges to existing nodes with probability proportional to their degree. This implies that high degree nodes accumulate edges quicker than low degree nodes. An intuitive explanation of preferential attachment is that new nodes are more likely to link to “popular” high degree nodes than relatively unknown, low degree nodes. However, preferential attachment implicitly assumes that edge formation depends only on degree and cannot explain why real networks exhibit high clustering or degree distributions that do not follow a power law.
Unlike preferential attachment models, fitness models are flexible enough to generate networks with varying degree distributions and degree correlation. The inability of preferential attachment to preserve multiple structural properties suggests that factors other than degree influence edge formation. In fitness models, new nodes that join the network form edges to existing nodes with probability proportional to their fitness. The fitness of a node is a function of intrinsic nodal properties that influence edge formation. The structural properties that a fitness model preserves depend on the exact definition of fitness. For example, the fitness model introduced in (Bianconi and Barabási, 2001) increases fitness as a function of degree and node recency. This can preserve temporal dynamics such as decay in popularity (Wang et al., 2013) of old nodes in citation networks. Simpler fitness models can generate degree distributions that follow power law, exponential or lognormal (Medo et al., 2011) distributions. However, since new nodes form each edge independently, fitness alone cannot explain the emergence of high local clustering or the bivariate relationship between degree and clustering observed in real networks.
Edge formation mechanisms proposed by the above network growth models make two strong assumptions.

Complete access to information. These mechanisms require nodes to link uniformly at random to any node in the network or to have explicit knowledge of the degree or fitness of every node in the network. This assumption is unrealistic because nodes in real networks form edges under constraints of partial information and limited access.

Successive edge formations are independent. There is a strong, implicit assumption that a node’s decision to link to another node is independent of the nodes to which it has already linked. This assumption contradicts a key empirical finding that the probability of edge formation increases as a function of neighborhood overlap (Kossinets and Watts, 2006) in social, information and citation networks.
Extensions of preferential attachment and fitness models (Holme and Kim, 2002; Klemm and Eguiluz, 2002) using triangle closing mechanisms explain why social & information networks have high average local clustering (Newman, 2001). In these models, a new node that joins the network “closes triangles” by linking to neighbors of nodes it has already linked to based on degree or fitness. Closing triangles increases the number of edges between neighbors, thereby increasing the average local clustering. Triangle closing mechanisms essentially model triadic closure, a sociological process that explains why two nodes with mutual neighbor(s) have an increased probability of connection. In Section 7, we show that these models are not flexible enough to capture the skewness and variance in clustering distributions of real networks.
A few models (Mossa et al., 2002; Zeng et al., 2005; Wang et al., 2009) adapt preferential attachment and fitness to model network growth under constraints of limited access and information. These models incorporate constraints by restricting access to recent nodes or a small set of nodes uniformly sampled from the network. However, these simple models are proof-of-concept methods that do not generate networks with varying degree distributions and realistic local clustering distributions.
Random walk models jointly explain multiple structural properties of real networks under constraints of limited access and information. New nodes explore neighborhoods of existing nodes without any assumption of global information and use simple rules to form edges. New nodes that join the network perform one or more random walks to link to existing nodes. For example, the Random Surfer model (Blum et al., 2006), in which new nodes link to the terminal nodes of short random walks, generates networks that exhibit power law degree distributions. Importantly, this model explains preferential attachment as an emergent property of local processes. Random walk models in which new nodes perform random walk(s) and link to any visited node incorporate triadic closure and generate networks with heavy-tailed degree distribution and high local clustering (Herrera and Zufiria, 2011). In Section 7, we show that models based on random walks outperform well-known global edge formation mechanisms in preserving structural properties of citation networks. However, current random walk models are either inflexible or too simple to accurately capture local clustering observed in real networks.
To summarize, network growth models use one or more edge formation mechanisms to explain structural properties of real networks. Structural properties preserved by global edge formation mechanisms such as preferential attachment can be preserved by local processes such as random walks as well. However, unlike random walks, extensions of global processes such as preferential attachment & fitness models make strong, unrealistic assumptions.
3. Preliminaries
In this section, we first define important structural characteristics that describe network structure. Then, in Section 3.2, we describe the network datasets used in this paper.
3.1. Structural Properties
Now, we discuss four well-known structural properties: degree distribution, local clustering coefficient, the relationship between degree and local clustering, and average path length. These properties are widely used (Barabási, 2003), compact statistical descriptors of network structure.
The degree distribution of an undirected graph is the probability distribution $P(k)$ of nodes with degree $k$. With directed graphs, we can compute the degree distribution separately for indegree and outdegree. The well-known PageRank centrality measure has positive correlation with indegree (Fortunato et al., 2006). Therefore, the indegree distribution indicates how node centrality is distributed in directed networks.
The local clustering coefficient of a node measures the edge density of the node’s neighborhood. For example, the clustering coefficient of an individual in an undirected social network is the ratio of observed friendships amongst neighbors to all possible friendships amongst neighbors. In directed networks, the neighborhood of node $i$ can refer to the set of nodes that link to $i$, the set of nodes that $i$ links to, or the union of both. In this paper, we define the neighborhood of $i$ to be the set of all nodes that link to $i$. More formally, the local clustering coefficient $C_i$ of node $i$ with neighborhood $N_i$ and indegree $k_i$ in a directed network is defined as follows.
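One common way to write this definition, consistent with the description that follows, is sketched below. The notation is assumed for illustration: $N_i$ is the set of nodes that link to node $i$, $k_i = |N_i|$ is its indegree, $E$ is the edge set and $e_{jl}$ denotes a directed edge from $j$ to $l$.

```latex
C_i = \frac{\left|\{\, e_{jl} \in E : j, l \in N_i \,\}\right|}{k_i \,(k_i - 1)}, \qquad k_i \geq 2
```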
This equation states that the local clustering coefficient of node $i$ in a directed network is the number of observed directed relationships divided by the maximum possible number of directed relationships in the neighborhood of $i$.
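As an illustration, the ratio described above can be computed as in the following minimal sketch. It assumes the graph is stored as two dictionaries of adjacency sets (`in_nbrs`, `out_nbrs`), that the maximum number of directed relationships among $k$ neighbors is $k(k-1)$, and that nodes with fewer than two in-neighbors get a coefficient of 0. These names and conventions are assumptions of the sketch and are not taken from the paper.

```python
from typing import Dict, Set


def local_clustering(
    node: int,
    in_nbrs: Dict[int, Set[int]],
    out_nbrs: Dict[int, Set[int]],
) -> float:
    """Directed local clustering of `node`, using the nodes that link to it."""
    nbrs = in_nbrs.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention when the ratio is undefined
    # Count directed edges j -> l whose endpoints both link to `node`.
    links = sum(
        1
        for j in nbrs
        for l in out_nbrs.get(j, set())
        if l in nbrs and l != j
    )
    return links / (k * (k - 1))
```

For example, a paper cited by three papers that share three citations among themselves has a coefficient of 3/6 = 0.5 under this convention.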
The bivariate relationship between degree and local clustering coefficient is important. This property sheds light on the variation of node neighborhood density as a function of node degree. In real directed networks, average local clustering decreases as indegree increases (Vázquez, 2003).
The average path length is the expected length of the shortest path between two randomly picked nodes in a network. First studied by Milgram (Milgram and van Gasteren, 1974) and subsequently validated by experiments (Leskovec and Horvitz, 2008), large real networks tend to have small average path length.
3.2. Datasets
In this paper, we consider five citation networks. Citation networks are directed networks in which nodes are papers and edges are citations from one paper to another.
We focus on citation networks for three reasons. First, nodes form all edges to existing network nodes at the time of joining the citation network. Since nodes do not form or delete edges at a later time, citation networks allow us to carefully analyze how new nodes that join the network form edges. Edge dynamics such as the deletion and addition of edges are important and we plan to investigate them at a later time. Second, citation network datasets include the time (e.g. publication year) at which papers join the network. Therefore, structural properties can be better understood by studying network “snapshots” at different stages of the growth process. Third, citation networks are large networks that span many years. As a result, the structural properties, defined in Section 3.1, are distinct and well-defined.
We consider the citation networks of academic papers, patents and judicial cases; Table 1 provides the basic statistics of these networks:

ArXiv HEP-PH (hepph) (Gehrke et al., 2003) is an academic citation network of HEP-PH (high energy physics phenomenology) papers in the ArXiv e-print archive.

U.S Patents (patents) (Leskovec et al., 2005) is a citation network of U.S. utility patents maintained by the National Bureau of Economic Research.

APS Journals (aps) (https://journals.aps.org/datasets) is an academic citation network maintained by the American Physical Society (APS) that consists of articles published in APS journals.

Semantic Scholar (semantic) (http://labs.semanticscholar.org/corpus/) is a citation network of all Computer Science and Neuroscience papers made public in June 2017 by Semantic Scholar, an academic search engine.

U.S. Supreme Court Cases (ussc) (Fowler and Jeon, 2008) is a citation network in which nodes are U.S. Supreme Court cases. There is an edge from one case to another if and only if the former cites the latter in its majority opinion.
Table 1: Basic statistics of the citation networks.
Network   Nodes      Edges       Time range
USSC      30,228     216,738     1754-2002
HEPPH     34,546     421,533     1992-2002
APS       577,046    6,967,873   1941-2015
Patents   3,923,922  16,522,438  1975-1999
Semantic  7,706,506  59,079,055  1991-2016
In this section, we reviewed key structural properties that network growth models try to preserve. Then, we briefly described the citation network datasets that we use in this paper.
4. Problem Statement
Extensive research on network growth has led to the development of well-known growth models that generate realistic networks. However, the edge formation mechanisms of most network growth models tend to make strong assumptions about either knowledge (e.g. complete degree/fitness distribution known) or access (e.g. pick nodes uniformly at random).
The goal of this paper is to model network growth under information and resource constraints using edge formation mechanisms. The growth model should be able to jointly explain global structural properties of real networks such as degree distribution, clustering coefficient distribution, degree-clustering relationship and degree correlations. The model should incorporate the information and resource constraints that influence edge formation in real networks.
5. Empirical Analyses
In this section, we analyze the structure of citation networks to show that these networks exhibit similar structural properties. We begin by analyzing the rate of network growth and indegree distributions of citation networks. Then, we study the local clustering distribution and the relationship between indegree and local clustering. Finally, we briefly discuss the observed average path length. Figure 1 plots the structural properties of the USSC and APS citation networks; the other three networks described in Section 3.2 have similar structural properties. Note that we preprocess the citation networks to remove a small fraction of nodes for which the time information is unknown. We conclude this section by motivating the need to study the edge formation mechanisms that lead to these structural trends.
In many real networks, the average outdegree of nodes joining the network increases nonlinearly as a function of time $t$ and as a function of network size $n(t)$. Figure 1 shows that the average number of citations made by nodes drastically increases over time in both citation networks. Moreover, networks densify over time, as the number of edges in the network at time $t$, $e(t)$, increases superlinearly as a function of network size $n(t)$. Leskovec et al. (Leskovec et al., 2005) show that densification in real networks follows a power law and can explain why the diameters of real networks shrink over time. Table 2 lists the densification power law (DPL) exponent of all citation networks. In our proposed model, we increase the average outdegree of nodes that join the network to realistically model the rate at which real networks grow.
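A densification power law relates edge and node counts as $e(t) \propto n(t)^{a}$, so the exponent $a$ can be read off as the slope of a straight-line fit in log-log space. The sketch below estimates it with ordinary least squares over a sequence of network snapshots. The function name and the use of plain least squares are assumptions made for illustration; the fitting procedure behind the exponents in Table 2 is not specified in the text.

```python
import math
from typing import Sequence


def dpl_exponent(num_nodes: Sequence[float], num_edges: Sequence[float]) -> float:
    """Least-squares slope of log e(t) against log n(t) over snapshots."""
    xs = [math.log(n) for n in num_nodes]
    ys = [math.log(e) for e in num_edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

An exponent above 1 indicates that edges grow superlinearly in the number of nodes.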
Citation networks have highly skewed, heavy-tailed indegree distributions. This suggests that while most nodes receive zero or a few citations, a small but significant fraction of nodes receive many citations and become “popular”. This structural property is important because it helps test the extent to which popularity influences underlying edge formation mechanisms. Figure 1 shows the observed indegree distribution along with its power law fit for each citation network in blue and red respectively. While the power law fits can explain the heavy tail, they do not capture the initial concavity in the observed distribution. In Section 7, we show that our growth model can accurately capture the indegree distributions of citation networks in their entirety.
The average local clustering coefficient (CC) in real networks tends to be high. Note that we define the neighborhood of node $i$ as the set of nodes that cite $i$. Table 2 lists the average local clustering in all citation networks. High clustering indicates that a significant fraction of nodes that cite $i$ tend to be connected to each other as well. Local clustering is fundamental to two well-known phenomena observed in real networks. First, clustering is one of the two components that explain the small-world phenomenon, in which two randomly picked nodes in large, sparse real networks are connected by a short path. Second, the clustering coefficient quantifies the extent to which triadic closure influences the underlying edge formation mechanism. By explicitly accounting for the fact that a node is likely to link to neighbors of nodes it has already linked to, our model can not only generate networks with high average clustering but also capture the local clustering distribution observed in real networks.
We observe that the distribution over the local clustering coefficient of all nodes in real networks is skewed. Figure 1 depicts two local clustering distributions for each citation network; the observed distribution in solid blue and the distribution of a random network, generated using the observed degree sequence, in dashed green. The difference between the two distributions indicates that high local clustering is an inherent structural property of these networks. The skewness in these distributions highlights the high variance in the local clustering of real networks. Despite its widespread use, average local clustering coefficient is not a representative statistic of the skewed clustering distributions. As a result, network growth models that focus on generating networks with high average local clustering do not realistically capture the skewed clustering distribution of real networks. In Section 6 and Section 7, we show that our proposed edge formation mechanism can intuitively explain the emergence of the skewed clustering distribution observed in real networks.
In real networks, the average local clustering decreases as the indegree of a node increases (Vázquez, 2003). This suggests that low indegree nodes have small, tightly knit neighborhoods and high indegree nodes have large, star-shaped neighborhoods. In Figure 1, we show that the degree-clustering relation in APS and USSC initially decreases as a linear function of the logarithm of indegree. In Section 7, we show that well-known growth models that generate networks with tunable average clustering are not flexible enough to explain the degree-clustering trend shown in Figure 1.
Citation networks are clustered, sparse networks that exhibit small average path length. Table 2 lists the average path length (APL) of all citation networks. We use a Monte Carlo method (Leskovec and Horvitz, 2008) to estimate the average path lengths as the citation networks are prohibitively large.
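A sampling-based estimate of the average path length can be sketched as follows: draw a few source nodes at random, run a breadth-first search from each and average the distances to the nodes reached. This is only an illustrative sketch. Treating edges as undirected (a symmetric adjacency dictionary `adj`), sampling sources uniformly with replacement and ignoring unreachable pairs are assumptions, and the estimator of (Leskovec and Horvitz, 2008) may differ in its details.

```python
import random
from collections import deque
from typing import Dict, Set


def estimate_apl(adj: Dict[int, Set[int]], num_sources: int, seed: int = 0) -> float:
    """Average BFS distance from randomly sampled sources to reachable nodes."""
    rng = random.Random(seed)
    nodes = list(adj)
    total, count = 0, 0
    for _ in range(num_sources):
        src = rng.choice(nodes)
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        count += len(dist) - 1  # exclude the source itself
    return total / count if count else float("nan")
```

Each sampled source costs one breadth-first search, so the total cost grows with the number of samples rather than with the number of node pairs.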
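Table 2: Densification power law (DPL) exponent, average local clustering coefficient (CC) and average path length (APL) of the citation networks.
Network   DPL exponent  Avg. local CC  APL
HEPPH     1.894         0.120          4.391
Patents   1.158         0.039          7.791
APS       1.334         0.108          5.001
Semantic  1.900         0.054          6.079
USSC      2.613         0.115          4.328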
To summarize, citation networks are small-world networks that undergo accelerated network growth. These networks exhibit heavy-tailed indegree distributions, skewed local clustering distributions and a negatively correlated degree-clustering relationship. The global structural similarity of citation networks prompts the question: do individuals use the same criteria to form edges?
In the next section, we propose a growth model that can jointly explain the emergence of these structural properties using a single edge formation mechanism.
6. Proposed Model
In this section, we present a resource-constrained growth model in which new nodes that join the directed network use a random walk edge formation process to link to existing nodes. In Section 6.1, we provide a detailed interpretation and description of our resource-constrained growth model. Next, in Section 6.2, we briefly explain the methods used to fit our model to observed networks. The goal of our resource-constrained model is to generate networks that follow structural properties of real networks discussed in Section 5.
6.1. Model Description
In this section, we describe three key components of our growth model. First, we explain how nodes join the network over time. Second, we describe how each node joins the network through an “entry point” under limited access constraint. Third, we describe the random walk mechanism that nodes use to form edges. We conclude by providing two natural interpretations of our growth model.
In our model, a directed network grows over time as new nodes join the network. The number of edges increases over time to reflect the nonlinear growth and densification of real networks. More formally, at every discrete time step $t$, a new node $u$ joins the network and forms $m_t$ edges to existing nodes. At time $t = 0$, the initial network $G_0$ consists of $n_0$ nodes and $e_0$ edges. Similarly, the network at time $t$, $G_t$, consists of $n_0 + t$ nodes and $e_0 + \sum_{s=1}^{t} m_s$ edges. In Section 6.2, we discuss the issue of initializing $G_0$ and increasing the outdegree $m_t$ of new nodes over time.
The processes that new nodes use to select an entry point into the network and subsequently form edges intuitively correspond to how we expect researchers to find references to cite. A researcher first finds one or more relevant papers as a “starting point”. Then, under time and information constraints, he or she searches for potential references by navigating through a chain of references. After repeating this process one or more times, the researcher chooses to cite a subset of these papers. Similarly, in our model, every node that joins the network selects a seed node from which it initiates the random walk process to search for potential links. Nodes terminate the random walk process after linking to a subset of visited nodes.
New nodes that join real networks select one or more “entry points” into the network under constraints of limited network access. We use a constant recency parameter $r$ to model the limited network access constraint under which nodes select entry points, or seed nodes. A new node $u$ selects a seed node $s_u$ uniformly at random from the fraction $r$ of nodes that have most recently joined the network. For example, when one node joins per time step, a new node that joins the network at time $t$ can only select a seed node that joined the network after approximately time $(1 - r)t$.
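The seed selection rule can be sketched as below, assuming that existing nodes are kept in a list ordered by arrival time and that the recency parameter is a fraction in (0, 1]. The function name, the rounding of the window size with `ceil` and the minimum window of one node are assumptions of this sketch.

```python
import math
import random
from typing import List


def select_seed(nodes_by_arrival: List[int], recency: float, rng: random.Random) -> int:
    """Pick a seed uniformly from the most recent `recency` fraction of nodes."""
    # Window of the most recently joined nodes; at least one node is eligible.
    window = max(1, math.ceil(recency * len(nodes_by_arrival)))
    return rng.choice(nodes_by_arrival[-window:])
```

With a recency of 1 every existing node is eligible, while small values restrict new nodes to entering the network through its newest members.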
After selecting the seed node, a new node forms one or more edges to existing nodes. As discussed in Section 2 and Section 5, edge formation in real networks depends on local mechanisms such as triadic closure and does not require global information about every node in the network. In our model, new nodes use a random walk process to jointly explore the network and form edges. A random walk incorporates the idea of limited information: the walker can only access its seed node and neighbors of nodes it visits. More formally, a new node $u$ that joins the network at time $t$ initiates a random walk from seed node $s_u$ to form $m_t$ edges.
The random walk process, visualized in Figure 2, can be described in four steps:

1. At each step of the random walk, node $u$ visits an existing node $v$. It links to this node with probability $p_{link}$.

2. Then, with jumping probability $p_{jump}$, $u$ moves back to seed node $s_u$.

3. Otherwise, with probability $1 - p_{jump}$, $u$ picks an outgoing edge with probability $p_{out}$ or an incoming edge with probability $1 - p_{out}$, to visit a random neighbor of $v$. If $v$ does not have any incoming edges, $u$ picks an outgoing edge to visit a node neighboring $v$.

4. Node $u$ repeats steps 1-3 until it forms $m_t$ distinct edges.
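The steps above can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is stored as dictionaries of adjacency sets, the walk starts at the seed, a `max_steps` cap guards against walks that cannot reach enough distinct nodes, and the walker returns to the seed or follows incoming edges when a node has no outgoing edges. All function and parameter names are hypothetical.

```python
import random
from typing import Dict, List, Set


def random_walk_links(
    seed: int,
    out_nbrs: Dict[int, Set[int]],
    in_nbrs: Dict[int, Set[int]],
    num_edges: int,
    p_link: float,
    p_jump: float,
    p_out: float,
    rng: random.Random,
    max_steps: int = 100_000,
) -> List[int]:
    """Return the distinct existing nodes that a new node links to."""
    linked: List[int] = []
    current = seed
    for _ in range(max_steps):  # safety cap, not part of the described process
        if len(linked) >= num_edges:
            break
        # Step 1: link to the visited node with probability p_link.
        if current not in linked and rng.random() < p_link:
            linked.append(current)
        # Step 2: jump back to the seed with probability p_jump.
        if rng.random() < p_jump:
            current = seed
            continue
        # Step 3: otherwise follow an outgoing or an incoming edge.
        outs = out_nbrs.get(current, set())
        ins = in_nbrs.get(current, set())
        if outs and (not ins or rng.random() < p_out):
            current = rng.choice(sorted(outs))
        elif ins:
            current = rng.choice(sorted(ins))  # assumed fallback when no out-edges
        else:
            current = seed  # isolated node: assumed fallback
    return linked
```

Because the walker only ever reads the neighbor sets of the node it currently occupies, the sketch uses no global information such as the degree of every node.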
To summarize, we propose a growth model that incorporates constraints of limited network access and partial information that affect edge formation in real networks. In the next section, we show that our resource-constrained model can preserve key structural properties of real networks as well.
6.2. Model Fitting
Given a citation network $G$, a model fit should generate a directed network $\hat{G}$ that preserves the structural properties observed in $G$. In this subsection, we describe methods to initialize $\hat{G}$, densify $\hat{G}$ over time and estimate the model parameters that generate networks structurally similar to $G$.
We now describe and justify the method used to initialize networks generated by our model. The random walk edge formation process is sensitive to a large number of weakly connected components in the initial graph $\hat{G}_0$. This is because a new node $u$ that joins $\hat{G}$ cannot form edges to nodes that are not in the same weakly connected component as its seed node $s_u$. To ensure that the initial graph is weakly connected, we perform an undirected breadth-first search on $G$, starting from the oldest node, that terminates after visiting a small fixed fraction of all nodes in $G$ (a smaller fraction if $G$ is large). The initial graph $\hat{G}_0$ is the small subgraph induced by the set of visited nodes. After obtaining $\hat{G}_0$, new nodes sequentially join and form edges using the random walk process until $\hat{G}$ has as many nodes as $G$.
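The initialization can be sketched as a breadth-first search that ignores edge direction and stops once a target number of nodes has been visited. The `fraction` argument and the function name are placeholders, since the exact fraction used is not recoverable from the text; the returned node set induces the initial graph.

```python
from collections import deque
from typing import Dict, Set


def bfs_initial_nodes(
    out_nbrs: Dict[int, Set[int]],
    in_nbrs: Dict[int, Set[int]],
    oldest: int,
    fraction: float,
) -> Set[int]:
    """Nodes reached by an undirected BFS from `oldest`, up to a size target."""
    n_total = len(set(out_nbrs) | set(in_nbrs))
    target = max(1, int(fraction * n_total))
    visited = {oldest}
    queue = deque([oldest])
    while queue and len(visited) < target:
        u = queue.popleft()
        # Ignore edge direction so that the visited set is weakly connected.
        for v in sorted(out_nbrs.get(u, set()) | in_nbrs.get(u, set())):
            if v not in visited:
                visited.add(v)
                queue.append(v)
                if len(visited) >= target:
                    break
    return visited
```

Every visited node is reached through an already visited node, so the induced subgraph is weakly connected by construction.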
Citation networks densify over time, with the number of edges growing superlinearly in the number of nodes. As shown in figure X, the average number of citations made by papers that join hepph and aps per year increases in a nonlinear fashion. We incorporate densification into our model by increasing the outdegree of new nodes that sequentially join the network. Each new node $u$ that joins the network $\hat{G}$ corresponds to some paper that joins the citation network $G$ in year $y$. The number of edges that $u$ forms is equal to the average number of citations made by papers that join $G$ in year $y$. As a result, the rate of growth in networks generated by our model coarsely reflects the rate of growth in $G$.
The recency parameter $r$, link probability $p_{link}$, jump probability $p_{jump}$ and outgoing edge probability $p_{out}$ jointly shape the random walk process that new nodes use to form edges. This subsequently determines the structural properties of the network generated by the model. We use a straightforward grid search method to estimate the parameter values of $r$, $p_{link}$, $p_{jump}$ and $p_{out}$. In Section 7.1, we discuss the exact evaluation metrics and criteria used to select the parameter values that generate a network most structurally similar to $G$.
To summarize this section, we first described and justified our growth model in which nodes use a random walk process to form edges under limited information and network access constraints. The growth model relies on four parameters: recency parameter $r$, link probability $p_{link}$, jump probability $p_{jump}$ and outgoing edge probability $p_{out}$. Then, we briefly discussed methods used to initialize $\hat{G}$, incorporate the observed growth rate into $\hat{G}$ and estimate the four model parameters. In the next section, we conduct experiments to evaluate whether our random walk model can jointly preserve structural properties of citation networks described in Section 3.2.
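The yearly outdegree schedule can be derived from the observed network as in the sketch below, which assumes a mapping `year_of` from each paper to its publication year. The names are hypothetical, and because the averages are generally not integers, how they are rounded to an edge count is left open here.

```python
from collections import defaultdict
from typing import Dict, Set


def mean_outdegree_by_year(
    out_nbrs: Dict[int, Set[int]],
    year_of: Dict[int, int],
) -> Dict[int, float]:
    """Average number of citations made by papers that join in each year."""
    totals: Dict[int, int] = defaultdict(int)
    counts: Dict[int, int] = defaultdict(int)
    for paper, year in year_of.items():
        totals[year] += len(out_nbrs.get(paper, set()))
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in counts}
```

A generated node assigned to a given year would then draw its edge budget from the entry for that year.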
7. Experiments
In this section, we present experimental results against four well-known baselines on citation networks described in Section 3.2. In Section 7.1, we describe the evaluation metrics and baselines used in our experiments. In Section 7.2, we describe and interpret the experimental results.
7.1. Experimental Setup
We first briefly summarize the baselines used in the experiments. Then, we describe the evaluation metrics used to quantify the extent to which growth models preserve structural properties of the citation networks.
We compare our model, abbreviated as rw, against four well-known growth models that are representative of the common edge formation mechanisms discussed in Section 2. Note that we do not consider graph generation models such as the Kronecker model (Leskovec et al., 2010), in which nodes do not join the network over time. The four baselines are:

Dorogovtsev-Mendes-Samukhin model (dms) (Dorogovtsev et al., 2000) is an extension of the Barabási-Albert model [X] that generates directed scale-free graphs using preferential attachment. In this model, the probability of linking to a node is proportional to its indegree and “initial attractiveness”.

Holme-Kim model (hk) (Holme and Kim, 2002) is a preferential attachment model that generates scale-free, clustered, undirected networks using an additional triangle-closing mechanism. We modify the model to create directed edges and thereby generate directed networks.

Herrera-Zufiria model (hz) (Herrera and Zufiria, 2011) is a random walk model that generates scale-free undirected networks with “tunable” average clustering. We modify the model to generate directed networks by allowing the random walk process to traverse edges in any direction.

Forest Fire model (ff) (Leskovec et al., 2005) is a recursive random walk model that generates directed networks which exhibit densification and decreasing diameter over time.
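As a concrete illustration of the preferential attachment rule used by the dms baseline above, the following sketch computes linking probabilities proportional to indegree plus initial attractiveness. The function and argument names are hypothetical, and the sketch covers only the choice of a single target node, not the full baseline.

```python
import random

def dms_attachment_probs(indegree, attractiveness):
    """Linking probability of each node, proportional to indegree + attractiveness.

    indegree maps node -> current indegree. The attractiveness must be
    positive so that nodes with indegree zero can still be chosen.
    """
    weights = {v: k + attractiveness for v, k in indegree.items()}
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

def dms_pick_target(indegree, attractiveness, rng):
    """Sample one target node according to the probabilities above."""
    nodes = list(indegree)
    weights = [indegree[v] + attractiveness for v in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]
```

With indegrees `{"a": 0, "b": 3}` and attractiveness 1, node "a" is chosen with probability 1/5 and node "b" with probability 4/5.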
To ensure a fair comparison, we update the baseline models in two ways. First, models that do not have an explicitly defined initial graph use the initial network described in Section 6.2. Second, we extend models in which every node has the same outdegree to account for densification using the method described in Section 6.2.
Next, we describe the evaluation metrics used to measure the accuracy of the growth models in preserving the observed structural properties. We use three evaluation metrics in our experiments:

Kolmogorov-Smirnov (KS) statistic computes the distance between univariate distributions, such as the indegree distribution and the local clustering distribution, of the observed network and the generated network.

Absolute difference computes the distance between two point estimates such as the average local clustering.

Weighted relative error measures the difference in the bivariate indegree-clustering trend of the observed and generated networks. It aggregates, over all indegree values, the relative error between the average local clustering of nodes with a given indegree in the observed network and in the generated network. The weight of each term equals the probability mass of that indegree in the observed network.
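The first and third metrics can be sketched in a few lines of code. This is one reading of the descriptions above rather than the exact definitions used in the experiments: the function names are hypothetical, the relative error is assumed to be normalized by the observed average clustering, an indegree absent from the generated network is assumed to have zero clustering, and indegrees whose observed average clustering is zero are skipped.

```python
from bisect import bisect_right

def ks_statistic(xs, ys):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    return max(abs(bisect_right(xs, t) / len(xs) - bisect_right(ys, t) / len(ys))
               for t in points)

def _mean_clustering_by_indegree(indegrees, clustering):
    """Return ({indegree: mean clustering}, {indegree: node count})."""
    sums, counts = {}, {}
    for k, c in zip(indegrees, clustering):
        sums[k] = sums.get(k, 0.0) + c
        counts[k] = counts.get(k, 0) + 1
    return {k: sums[k] / counts[k] for k in sums}, counts

def weighted_relative_error(indeg_obs, clust_obs, indeg_gen, clust_gen):
    """Sum over indegrees k of P_obs(k) * |c_obs(k) - c_gen(k)| / c_obs(k)."""
    c_obs, n_obs = _mean_clustering_by_indegree(indeg_obs, clust_obs)
    c_gen, _ = _mean_clustering_by_indegree(indeg_gen, clust_gen)
    n = len(indeg_obs)
    error = 0.0
    for k, c in c_obs.items():
        if c == 0:
            continue  # assumption: relative error is undefined here, so skip
        error += (n_obs[k] / n) * abs(c - c_gen.get(k, 0.0)) / c
    return error
```

Each function takes per-node values, so the indegree and local clustering coefficient of every node must be computed beforehand. Under this reading the weighted relative error is not bounded above by one, since the generated clustering may exceed the observed clustering by more than a factor of two.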
We estimate the four model parameters (recency parameter, link probability, jump probability and outgoing edge probability) using a grid search that fits our model to a real network. We select the parameter values that minimize the L2 norm of the above evaluation metrics. We fit baseline growth models that lack a prespecified model fitting criterion using the same grid search. After selecting the model parameters, our model can generate graphs that are structurally similar to the observed network. In the next section, we compare the performance of our model against that of the four baseline models using the evaluation metrics discussed in this subsection.
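The selection rule described above can be sketched as an exhaustive search over a parameter grid. The `evaluate` callable is hypothetical: it stands in for generating a network with the given parameters and returning its metric values against the observed network. The parameter names and grid values in the usage example are likewise placeholders.

```python
from itertools import product
from math import sqrt

def grid_search(evaluate, grids):
    """Return (best_params, best_score) minimizing the L2 norm of the metrics.

    evaluate: callable mapping a parameter dict to a sequence of metric values.
    grids: dict mapping each parameter name to its list of candidate values.
    """
    names = list(grids)
    best_params, best_score = None, float("inf")
    for values in product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        score = sqrt(sum(m * m for m in evaluate(params)))
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy usage: the "metrics" are distances to an arbitrary target point.
toy = lambda p: [abs(p["p_link"] - 0.5), abs(p["p_jump"] - 0.25)]
best, score = grid_search(toy, {"p_link": [0.0, 0.5, 1.0], "p_jump": [0.0, 0.25]})
```

Because the growth model is stochastic, a practical `evaluate` would likely average the metrics over several generated networks before the norm is taken; that choice is not specified in the text.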
7.2. Experimental Results
We present experimental results that demonstrate the effectiveness of our growth model in preserving three structural properties (indegree distribution, local clustering distribution and the indegree-clustering relationship) of the citation networks described in Section 3.2. Tables 3, 4 and 5 report the accuracy of our model and the four baseline growth models in preserving these properties on all five citation networks. Figure 3 illustrates the performance of all growth models in preserving the three structural properties of the aps network. We measure how well each model preserves these properties using the evaluation metrics described in Section 7.1.
We now provide a brief overview of the experimental results, followed by an interpretation of each result table. A common characteristic of the baseline growth models is that they cannot accurately preserve multiple structural properties observed in real networks. For example, the Dorogovtsev-Mendes-Samukhin (dms) model can preserve the indegree distribution but does not account for local clustering in real networks. Similarly, the Forest Fire (ff) model captures the skewed local clustering distribution in some networks but overestimates average local clustering as a function of indegree.
Table 3: KS statistic for the indegree distribution (lower is better).
Model  USSC   HEP-PH  APS    Patents  Semantic
DMS    0.020  0.017   0.017  0.033    0.052
HK     0.124  0.191   0.126  0.063    0.167
HZ     0.182  0.211   0.131  0.155    0.180
FF     0.168  0.171   0.277  0.141    0.121
RW     0.034  0.064   0.055  0.080    0.025
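In Table 3, we summarize the accuracy of each model in preserving the indegree distribution of the citation networks. We observe that the dms model performs better than our model rw on four of the five networks by a small margin (at most 0.047 in KS statistic), while rw performs better on the Semantic network. This is because the dms model specifically captures the initial concavity in the distribution using an “attractiveness” parameter. Note that the gap between dms and rw on the one hand and the other three models on the other is substantial: apart from hk on the Patents network (0.063), every KS statistic of hk, hz and ff exceeds 0.12.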
Table 4: KS statistic for the local clustering distribution (lower is better).
Model  USSC   HEP-PH  APS    Patents  Semantic
DMS    0.808  0.805   0.826  0.490    0.569
HK     0.415  0.480   0.525  0.062    0.147
HZ     0.087  0.273   0.338  0.081    0.090
FF     0.321  0.081   0.327  0.517    0.440
RW     0.043  0.020   0.037  0.039    0.054
Next, we show that the baseline growth models cannot accurately capture the skewness of the local clustering distribution in citation networks. Table 4 lists the KS statistic of each model for all citation networks. Our model attains the lowest KS statistic on all five networks (at most 0.054), as it captures the skewness of the observed clustering distribution. As shown in Figure 3, three of the four baselines (dms, hk and hz) do not capture the variance and skewness in the clustering distribution observed in the aps network.
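Table 5: Weighted relative error for the indegree-clustering relationship (lower is better).
Model  USSC   HEP-PH  APS    Patents  Semantic
DMS    0.657  0.681   0.757  0.589    0.592
HK     0.403  0.493   0.554  0.097    0.129
HZ     0.108  0.304   0.375  0.086    0.154
FF     0.437  0.504   0.369  2.023    1.170
RW     0.038  0.052   0.047  0.048    0.086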
Next, we discuss the accuracy of the growth models in preserving average clustering as a function of indegree. Table 5 lists the weighted relative error, defined in Section 7.1, of each model for all citation networks. Our model attains the lowest error on all five networks. The dms model has a high relative error on every network (0.589 to 0.757), as it does not preserve local clustering. As shown in Figure 3, the Holme-Kim (hk) and Herrera-Zufiria (hz) models, which generate networks with tunable clustering, underestimate the clustering of low-indegree nodes. Conversely, the Forest Fire (ff) model significantly overestimates the clustering of low-indegree nodes.
7.3. Parameter Space of the rw Model
Through a series of extensive experiments, we observe that our model rw is able to reproduce multiple structural characteristics of real-world networks. However, the fitted parameters differ for each dataset, suggesting possibly different local growth mechanisms in each network. Table 6 lists the best-fitted parameters for the five citation networks used in our experiments.
Table 6: Best-fitted parameter values for the five citation networks.
USSC  HEP-PH  APS   Patents  Semantic
0.80  0.80    0.15  0.25     0.40
0.30  0.65    0.65  0.05     0.15
0.95  0.95    0.80  1.00     0.95
0.50  0.80    0.85  0.45     0.60
To summarize, the experimental results on five citation networks show that our resource-constrained model (rw) outperforms four baseline growth models in jointly preserving the indegree distribution, the local clustering distribution and the indegree-clustering relationship.
8. Limitations
Now, we discuss the limitations of our work. First, our work is limited to bibliographic datasets because of the availability of temporal data. We use the temporal outdegree sequence of incoming nodes to model network growth. In the absence of temporal information, our growth model can be adapted by relying on the densification power law exponent. Second, our random walk model is sensitive to the initial graph. Since random walks explore the locality of a network and cannot access the entire network, the initial graph should have a giant weakly connected component. We recognize that the initialization problem can be addressed by using a nonlocal source of information such as multiple seed nodes. Third, we note that our model fails to preserve certain network properties such as the path length distribution. This is because our model does not account for nodes that serve as “local bridges” in the network. Modeling local and global processes simultaneously in a joint random walk model should lead to preservation of the key network properties discussed.
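The requirement on the initial graph can be checked before running the model. The following minimal sketch computes the fraction of nodes in the largest weakly connected component using union-find. The function name is hypothetical, nodes are assumed to be labeled 0 to num_nodes - 1 with num_nodes at least 1, and what fraction counts as "giant" is left to the reader.

```python
def largest_wcc_fraction(num_nodes, edges):
    """Fraction of nodes in the largest weakly connected component.

    edges is an iterable of directed (u, v) pairs; direction is ignored,
    which is what makes the components "weakly" connected.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    sizes = {}
    for v in range(num_nodes):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / num_nodes
```

For instance, a graph on five nodes with edges (0, 1), (2, 1) and (3, 4) has components {0, 1, 2} and {3, 4}, so the function returns 0.6.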
9. Conclusion
In this paper, we propose a resource-constrained network growth model in which nodes use a random walk process to form edges under constraints of limited information and network access. The problem is important because edge formation in real networks is usually a local process. In typical network growth scenarios, nodes either have limited information about the other nodes in the network or the system allows access to only a restricted portion of the existing network. It therefore becomes imperative to model how the local processes of link formation give rise to network characteristics. In this work, we show that multiple structural properties of real networks can arise from the local process of exploration and link formation. Our results indicate a substantial improvement over the next best competing model, hz (Herrera and Zufiria, 2011).
References
 Albert and Barabási [2002] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47, 2002.
 Barabási [2003] Albert-László Barabási. Linked: The new science of networks, 2003.
 Barabási and Albert [1999] Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.
 Bianconi and Barabási [2001] Ginestra Bianconi and Albert-László Barabási. Bose-Einstein condensation in complex networks. Physical Review Letters, 86(24):5632, 2001.
 Bianconi et al. [2014] Ginestra Bianconi, Richard K Darst, Jacopo Iacovacci, and Santo Fortunato. Triadic closure as a basic generating mechanism of communities in complex networks. Physical Review E, 90(4):042806, 2014.
 Blum et al. [2006] Avrim Blum, TH Hubert Chan, and Mugizi Robert Rwebangira. A random-surfer web-graph model. In 2006 Proceedings of the Third Workshop on Analytic Algorithmics and Combinatorics (ANALCO), pages 238–246. SIAM, 2006.
 Dorogovtsev et al. [2000] Sergey N Dorogovtsev, Jose Ferreira F Mendes, and Alexander N Samukhin. Structure of growing networks: Exact solution of the Barabási–Albert’s model. arXiv preprint cond-mat/0004434, 2000.
 Fortunato et al. [2006] Santo Fortunato, Marián Boguñá, Alessandro Flammini, and Filippo Menczer. Approximating PageRank from in-degree. In International Workshop on Algorithms and Models for the Web-Graph, pages 59–71. Springer, 2006.
 Fowler and Jeon [2008] James H Fowler and Sangick Jeon. The authority of Supreme Court precedent. Social Networks, 30(1):16–30, 2008.
 Gehrke et al. [2003] Johannes Gehrke, Paul Ginsparg, and Jon Kleinberg. Overview of the 2003 KDD Cup. ACM SIGKDD Explorations Newsletter, 5(2):149–151, 2003.
 Herrera and Zufiria [2011] Carlos Herrera and Pedro J Zufiria. Generating scale-free networks with adjustable clustering coefficient via random walks. In Network Science Workshop (NSW), 2011 IEEE, pages 167–172. IEEE, 2011.
 Holme and Kim [2002] Petter Holme and Beom Jun Kim. Growing scale-free networks with tunable clustering. Physical Review E, 65(2):026107, 2002.
 Jeong et al. [2003] Hawoong Jeong, Zoltan Néda, and Albert-László Barabási. Measuring preferential attachment in evolving networks. EPL (Europhysics Letters), 61(4):567, 2003.
 Klemm and Eguiluz [2002] Konstantin Klemm and Victor M Eguiluz. Highly clustered scale-free networks. Physical Review E, 65(3):036123, 2002.
 Kossinets and Watts [2006] Gueorgi Kossinets and Duncan J Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, 2006.
 Kumar et al. [2000] Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, D Sivakumar, Andrew Tomkins, and Eli Upfal. Stochastic models for the web graph. In Foundations of Computer Science, 2000. Proceedings. 41st Annual Symposium on, pages 57–65. IEEE, 2000.
 Leskovec and Horvitz [2008] Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web, pages 915–924. ACM, 2008.
 Leskovec et al. [2005] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.
 Leskovec et al. [2010] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks. Journal of Machine Learning Research, 11(Feb):985–1042, 2010.
 Medo et al. [2011] Matúš Medo, Giulio Cimini, and Stanislao Gualdi. Temporal effects in the growth of networks. Physical Review Letters, 107(23):238701, 2011.
 Milgram and van Gasteren [1974] Stanley Milgram and L van Gasteren. Das Milgram-Experiment. Rowohlt Reinbek, 1974.
 Mossa et al. [2002] Stefano Mossa, Marc Barthelemy, H Eugene Stanley, and Luis A Nunes Amaral. Truncation of power law behavior in “scale-free” network models due to information filtering. Physical Review Letters, 88(13):138701, 2002.
 Newman [2001] Mark EJ Newman. Clustering and preferential attachment in growing networks. Physical Review E, 64(2):025102, 2001.
 Vazquez [2000] Alexei Vazquez. Knowing a network by walking on it: emergence of scaling. arXiv preprint cond-mat/0006132, 2000.
 Vázquez [2003] Alexei Vázquez. Growing network with local rules: Preferential attachment, clustering hierarchy, and degree correlations. Physical Review E, 67(5):056104, 2003.
 Wang et al. [2013] Dashun Wang, Chaoming Song, and Albert-László Barabási. Quantifying long-term scientific impact. Science, 342(6154):127–132, 2013.
 Wang et al. [2009] Li-Na Wang, Jin-Li Guo, Han-Xin Yang, and Tao Zhou. Local preferential attachment model for hierarchical networks. Physica A: Statistical Mechanics and its Applications, 388(8):1713–1720, 2009.
 Zeng et al. [2005] Jianyang Zeng, Wen-Jing Hsu, and Suiping Zhou. Construction of scale-free networks with partial information. Lecture Notes in Computer Science, 3595:146, 2005.