Consistency of Hill Estimators in a Linear Preferential Attachment ModelThis work was supported by Army MURI grant W911NF-12-1-0385 to Cornell University.

# Consistency of Hill Estimators in a Linear Preferential Attachment Model

## Abstract

Preferential attachment is widely used to model power-law behavior of degree distributions in both directed and undirected networks. Practical analyses on the tail exponent of the power-law degree distribution use Hill estimator as one of the key summary statistics, whose consistency is justified mostly for iid data. The major goal in this paper is to answer the question whether the Hill estimator is still consistent when applied to non-iid network data. To do this, we first derive the asymptotic behavior of the degree sequence via embedding the degree growth of a fixed node into a birth immigration process. We also need to show the convergence of the tail empirical measure, from which the consistency of Hill estimators is obtained. This step requires checking the concentration of degree counts. We give a proof for a particular linear preferential attachment model and use simulation results as an illustration in other choices of models.

###### Keywords:
Hill estimators Power laws Preferential AttachmentContinuous Time Branching Processes
###### Msc:
60G70 60B10 60G55 60G57 05C80 62E20

## 1 Introduction.

The preferential attachment model gives a random graph in which nodes and edges are added to the network based on probabilistic rules, and is used to mimic the evolution of networks such as social networks, collaborator and citation networks, as well as recommender networks. The probabilistic rule depends on the node degree and captures the feature that nodes with larger degrees tend to attract more edges. Empirical analysis of social network data shows that degree distributions follow power laws. Theoretically, this is true for linear preferential attachment models which makes preferential attachment a popular choice for network modeling [Bollobás et al., 2003, Durrett, 2010, Krapivsky et al., 2001, Krapivsky and Redner, 2001, van der Hofstad, 2017]. The preferential attachment mechanism has been applied to both directed and undirected graphs. Limit theory for degree counts can be found in Resnick and Samorodnitsky [2016], Bhamidi [2007], Krapivsky and Redner [2001] for the undirected case and Wang and Resnick [2017], Samorodnitsky et al. [2016], Resnick and Samorodnitsky [2015], Wang and Resnick [2016], Krapivsky et al. [2001] for the directed case. This paper only focuses on the undirected case.

One statistical issue is how to estimate the index of the degree distribution power-law tail. In practice, this is often done by combining a minimum distance method Clauset et al. [2009] with the Hill estimator Hill [1975]. Data repositories of large network datasets such as KONECT (http://konect.uni-koblenz.de/) Kunegis [2013] provide for each dataset key summary statistics including Hill estimates of degree distribution tail indices. However, there is no theoretical justification for such estimates and consistency of the Hill estimator has been proved only for data from a stationary sequence of random variables, which is assumed to be either iid Mason [1982] or satisfy certain structural or mixing assumptions, e.g. Resnick and Stărică [1995, 1998], Rootzén et al. [1990], Hsing [1991]. Therefore, proving/disproving the consistency of Hill estimators for network data is a major concern in this paper.

The Hill estimator and other tail descriptors are often analyzed using the tail empirical estimator. Using standard point measure notation, let

 ϵx(A)={1, if x∈A,0, if x∉A.

For positive iid random variables whose distribution has a regularly varying tail with index , we have the following convergence in the space of Radon measures on of the sequence of empirical measures

 n∑i=1ϵXi/b(n)(⋅)⇒PRM(να(⋅)),with να(y,∞]=y−α,y>0, (1.1)

to the limit Poisson random measure with mean measure From (1.1) other extremal properties of follow [Resnick, 1987, Chapter 4.4]. See for example the application given in this paper after Theorem 5. Further, for any intermediate sequence , as , the sequence of tail empirical measures also converge to a deterministic limit,

 1knn∑i=1ϵXi/b(n/kn)(⋅)⇒να(⋅), (1.2)

which is one way to prove consistency of the Hill estimator for iid data Resnick [2007, Chapter 4.4]. We seek a similar dual pair as (1.1) and (1.2) for network models that facilitates the study of the Hill estimator and extremal properties of node degrees.

With this goal in mind, we first find the limiting distribution for the degree sequence in a linear preferential attachment model, from which a similar convergence result to (1.1) follows. Embedding the network growth model into a continuous time branching process (cf. Bhamidi [2007], Athreya [2007], Athreya et al. [2008]) is a useful tool in this case. We model the growth of the degree of each single node as a birth process with immigration. Whenever a new node is added to the network, a new birth immigration process is initiated. In this embedding, the total number of nodes in the network growth model also forms a birth immigration process. Using results from the limit theory of continuous time branching processes (cf. Resnick [1992, Chapter 5.11]; Tavaré [1987]), we give the limiting distribution of the degree of a fixed node as well as the maximal degree growth.

Empirical evidence for simulated networks lead s to the belief that the Hill estimator is consistent. However, proving the analogue of (1.2) is challenging and requires showing concentration inequalities for expected degree counts. We have only succeeded for a particular linear preferential attachment model, where each new node must attach to one of the existing nodes in the graph. We are not sure the concentration inequalities always hold for preferential attachment and discussion of limitations of the Hill estimator for network data must be left for the future. For a more sophisticated model where we could not verify the concentration inequalities, we illustrate consistency of the Hill estimator coupled with a minimum distance method (introduced in Clauset et al. [2009]) via simulation for a range of parameter values; however the asymptotic distribution of the Hill estimator in this case is confounding and it is not obviously normal. Whether this possible non-normality is due to the minimum distance threshold selection or due to network data (rather than iid data) being used, we are not sure at this point.

The rest of the paper is structured as follows. After giving background on the tail empirical measure and Hill estimator in the rest of this section, Section 2 gives two linear preferential attachment models. Section 3 summarizes key facts about the pure birth and the birth-immigration processes. We analyze social network degree growth in Section 4 using a sequence of birth-immigration processes and give the limiting empirical measures of normalized degrees in the style of (1.1) for both models under consideration. We prove the consistency of the Hill estimator for the simpler model in Section 5 and give simulation results in Section 6 that illustrate the behavior of Hill estimators in the other model.

Parameter estimation based on maximum likelihood or approximate MLE for directed preferential attachment models is studied in Wan et al. [2017]. A comparison between MLE model based methods and asymptotic extreme value methods is forthcoming.

### 1.1 Background

Our approach to the Hill estimator considers it as a functional of the tail empirical measure so we start with necessary background and review standard results (cf. Resnick [2007, Chapter 3.3.5]).

For , let be the set of non-negative Radon measures on . A point measure is an element of of the form

 m=∑iϵxi. (1.3)

The set is the set of all Radon point measures of the form (1.3) and is a closed subset of in the vague metric.

For iid and non-negative with common regularly varying distribution tail , , there exists a sequence such that for a limiting Poisson random measure with mean measure and for , written as PRM(), we have

 n∑i=1ϵXi/b(n)⇒PRM(να), in Mp((0,∞]), (1.4)

and for some , ,

 1knn∑i=1ϵXi/b(n/kn)⇒να, in M+((0,∞]), (1.5)

Note the limit in (1.4) is random while that in (1.5) is deterministic. Define the Hill estimator based on upper order statistics of as in Hill [1975]

 Hk,n:=1kk∑i=1logX(i)X(k+1),

where are order statistics of . In the iid case there are many proofs of consistency (cf. Mason [1982], Mason and Turova [1994], Hall [1982], de Haan and Resnick [1998], Csörgö et al. [1991a]): For , we have

 Hkn,n\lx@stackrelP→1/αas% n→∞. (1.6)

The treatment in Resnick [2007, Theorem 4.2] approaches consistency by showing (1.6) follows from (1.5) and we follow this approach for the network context where the iid case is inapplicable. The next section constructs two undirected preferential attachment models, labelled A and B, and gives behavior of , the degree of node at the th stage of construction. Theorem 5 shows that for , a parameter in the model construction, the degree sequences in either Model A or B have empirical measures

 n∑i=1ϵDi(n)/n1/(2+δ) (1.7)

that converge weakly to some random limit point measure in . The question then becomes whether there is an analogy to (1.5) in the network case so that

 1knn∑i=1ϵDi(n)/b(n/kn)⇒ν2+δ, in M+((0,∞]), (1.8)

with some function and intermediate sequence . This would facilitate proving consistency of the Hill estimator. We successfully derive (1.8) for Model A in Section 5 and discuss why we failed for Model B. For Model A, we give the consistency of the Hill estimator.

## 2 Preferential Attachment Models.

### 2.1 Model setup.

We consider an undirected preferential attachment model initiated from the initial graph , which consists of one node and a self loop. Node then has degree 2 at stage . For , we obtain a new graph by appending a new node to the existing graph . The graph consists of edges and nodes. Denote the set of nodes in by . For , is the degree of in . We consider two ways to construct the random graph and refer to them as Model A and B.
Model A: Given , the new node is connected to one of the existing nodes with probability

 f(Di(n))+δ∑ni=1(f(Di(n))+δ), (2.1)

where the preferential attachment function is deterministic and non-decreasing. In this case, the new node for , is always born with degree 1.
Model B: In this model, given graph , the graph is obtained by either:

• Adding a new node and a new edge connecting to an existing node with probability

 f(Di(n))+δ∑ni=1(f(Di(n))+δ)+f(1)+δ, (2.2)

where is a parameter;

or

• Adding a new node with a self loop with probability

 f(1)+δ∑ni=1(f(Di(n))+δ)+f(1)+δ. (2.3)

Linear case: If the preferential attachment function is for , then the model is called the linear preferential attachment model. Since every time we add a node and an edge the degree of 2 nodes is increased by 1, we have for both model A or B that Therefore, the attachment probabilities in (2.1), (2.2) and (2.3) are

 Di(n)+δ(2+δ)n,Di(n)+δ(2+δ)n+1+δ, and 1+δ(2+δ)n+1+δ,

respectively, where is a constant.

### 2.2 Power-law tails.

Continuing with , suppose is a random graph generated by either Model A or B after steps. Let be the number of nodes in with degree equal to , i.e.

 Nk(n):=n∑i=11{Di(n)=k}, (2.4)

then , , is the number of nodes in with degree strictly greater than . For , we set .

For both models A and B, it is shown in van der Hofstad [2017, Theorem 8.3] using concentration inequalities and martingale methods that for fixed , as ,

 Nk(n)n\lx@stackrelP→pk=(2+δ)Γ(k+δ)Γ(3+2δ)Γ(k+3+2δ)Γ(1+δ)\definecolorpgfstrokecolorrgb0,0,0\pgfsys@color@gray@stroke0\pgfsys@color@gray@fill0∼(2+δ)Γ(3+2δ)Γ(1+δ)k−(3+δ); (2.5)

is a pmf and the asymptotic form, as , follows from Stirling. Let be the complementary cdf and by Scheffé’s lemma as well as van der Hofstad [2017, Equation (8.4.6)], we have

 N>k(n)n\lx@stackrelP→p>k:=Γ(k+1+δ)Γ(3+2δ)Γ(k+3+2δ)Γ(1+δ), (2.6)

and again by Stirling’s formula we get from (2.6) as ,

 p>k∼\definecolorpgfstrokecolorrgb0,0,0\pgfsys@color@gray@stroke0\pgfsys@color@gray@fill0c⋅k−(2+δ),c=Γ(3+2δ)Γ(1+δ).

In other words, the tail distribution of the asymptotic degree sequence in a linear preferential attachment model is asymptotic to a power law with tail index .

In practice, the Hill estimator is widely used to estimate this tail index. Absent prior justification for using the Hill estimator on network data, we investigate its use.

## 3 Preliminaries: Continuous Time Markov Branching Processes.

In this section, we review two continuous time Markov branching processes needed in Section 4.1, where we embed the degree sequence of a fixed network node into a continuous time branching process and derive the asymptotic limit of the degree growth.

### 3.1 Linear birth processes.

A linear birth process is a continuous time Markov process taking values in the set and having a transition rate

 qi,i+1=λi,i∈N+,λ>0.

The linear birth process is a mixed Poisson process; see Resnick [1992, Theorem 5.11.4], Kendall [1966] and Waugh [1970] among other sources. If then the representation is

 ζ(t)=1+N0(W(eλt−1),t≥0, (3.1)

where is a unit rate homogeneous Poisson on with and is a unit exponential random variable independent of . Since almost surely as , it follows immediately that

 ζ(t)eλt\lx@stackrela.s.⟶W,as t→∞. (3.2)

We use these facts in Section 4.2 to analyze the asymptotic behavior of the degree growth in a preferential attachment network.

### 3.2 Birth processes with immigration.

Apart from individuals within the population giving birth to new individuals, population size can also increase due to immigration which is assumed independent of births. The linear birth process with immigration (B.I. process), , having lifetime parameter and immigration parameter is a continuous time Markov process with state space and transition rate

 qi,i+1=λi+θ.

When there is no immigration and the B.I. process becomes a pure birth process.

For , the B.I. process starting from 0 can be constructed from a Poisson process and an independent family of iid linear birth processes Tavaré [1987]. Suppose that is the counting function of homogeneous Poisson points with rate and independent of this Poisson process we have independent copies of a linear birth process with parameter and for . Let , then the B.I. process is a shot noise process with form

 BI(t):=∞∑i=1ζi(t−τi)1{t≥τi}=Nθ(t)∑i=1ζi(t−τi). (3.3)

Theorem 1 modifies slightly the statement of Tavaré [1987, Theorem 5] summarizing the asymptotic behavior of the B.I. process.

###### Theorem 1

For as in (3.3), we have as ,

 e−λtBI(t)\lx@stackrela.s.⟶∞∑i=1Wie−λτi\definecolorpgfstrokecolorrgb0,0,0\pgfsys@color@gray@stroke0\pgfsys@color@gray@fill0=:σ (3.4)

where are independent unit exponential random variables satisfying for each ,

 Wi=limt→∞e−tζi(t).

The random variable in (3.4) is a.s. finite and has a Gamma density given by

 f(x)=1Γ(θ/λ)xθ/λ−1e−x,x>0.

The form of in (3.4) and its Gamma density is justified in Tavaré [1987]. It can be guessed from (3.3) and some cavalier interchange of limits and infinite sums. The density of comes from transforming Poisson points , summing and recognizing a Gamma Lévy process at .

## 4 Embedding Process.

Our approach to the weak convergence of the sequence of empirical measures in (1.7) embeds the degree sequences into a B.I. process. The embedding idea is proposed in Athreya et al. [2008] and we tailor it for our setup finding it flexible enough to accommodate both linear preferential attachment Models A and B introduced in Section 2.1.

### 4.1 Embedding.

Here is how we embed the network growth model using a sequence of independent B.I. processes.

#### Model A and B.I. processes.

Model A is the simpler case where a new node is not allowed to have self loop. Let be independent B.I. processes such that

 BI1(0)=2,BIi(0)=1,∀i≥2. (4.1)

Each has transition rate is , . For , let be the jump times of the B.I. process and set for all . Then for ,

 BI1(τ(1)k)=k+2,BIi(τ(i)k)=k+1,i≥2.

Therefore,

 τ(1)k−τ(1)k−1∼Exp(k+1+δ),and τ(i)k−τ(i)k−1∼Exp(k+δ),i≥2.

and are independent.

Set and relative to define

 TA2:=τ(1)1, (4.2)

i.e. the first time that jumps. Start the new B.I. process at and let be the first time after that either or jumps so that,

 TA3=min{TAi+τ(i)k:k≥1,TAi+τ(i)k>TA2,i=1,2}.

Start a new, independent B.I. process at . See Figure 4.1, which assumes . Continue in this way. When lines have been created, define to be the first time after that one of the processes jumps, i.e.

 TAn+1:=min{TAi+τ(i)k:k≥1,TAi+τ(i)k>TAn,1≤i≤n}. (4.3)

At , start a new, independent B.I. process .

#### Model B and BI processes.

In Model B, a new node may be born with a self loop but the B.I. process framework can still be used. We keep the independent sequence of initialized as in (4.1), as well as the definition of for .

Set and start two B.I. processes and at . At time with , there exist B.I. processes. We define as the first time after that one of the processes jumps, i.e.

 TBn+1:=min{TBi−1+τ(i)k:k≥1,TBi−1+τ(i)k>TBn,1≤i≤n+1}, (4.4)

and start a new, independent B.I. process at .

#### Embedding.

The following embedding theorem is similar to the one proved in Athreya et al. [2008] and summarizes how to embed in the B.I. constructions.

###### Theorem 2

Fix .
(a) For Model A, suppose

 \definecolorpgfstrokecolorrgb0,0,0\pgfsys@color@gray@stroke0\pgfsys@color@gray@fill0DA(n):=(DA1(n),…,DAn(n))

is the degree sequence of nodes in the graph and is defined as in (4.3). For each fixed , define

 ˜DA(n) :=(BI1(TAn),BI2(TAn−TA2),…,BIn−1(TAn−TAn−1),BIn(0)),

and then and have the same distribution in .
(b) Analogously, for Model B, the degree sequence

 DB(n):=(DB1(n),…,DBn(n))

and

 ˜DB(n):=(BI1(TBn),BI2(TBn),…,BIn−1(TBn−TBn−2),BIn(TBn−TBn−1))

have the same distribution in .

###### Proof

By the construction of Model A, at each , , we start a new B.I. process with initial value equal to 1 and one of , also increases by 1. This makes the sum of the values of , , increase by 2 so that

 n∑i=1(BIi(TAn−TAi)+δ)=(2+δ)n.

The rest is essentially the proof of Athreya et al. [2008, Theorem 2.1] which we now outline.

Both and are Markov on the state space since

 DA(n+1)= (DA(n),1)+(e(n)Jn+1,0), ˜DA(n+1)= (˜DA(n),1)+(e(n)Ln+1,0),

where for is a vector of length of ’s except for a in the -th entry and

 P[Jn+1=j|DA(n)]=DAj(n)+δ(2+δ)n,1≤j≤n,

and records which B.I. process in is the first to have a new birth after .

When ,

 ˜DA(1)=BI1(0)=2=DA1(1)=DA(1),

so to prove equality in distribution for any , it suffices to verify that the transition probability from to is the same as that from to .

According to the preferential attachment setup, we have

 P(DA(n+1) =(d1,d2,…,di+1,di+1,…,dn,1)∣∣DA(n)=(d1,d2,…,dn)) =di+δ(2+δ)n,1≤i≤n. (4.5)

At time , there are B.I. processes and each of them has a population size of , . Therefore, is the minimum of independent exponential random variables, , with means

 (BIi(TAn−TAi)+δ)−1,1≤i≤n,

which gives for any

 P (Ln+1=i∣∣˜DA(n)=(d1,d2,…,dn)) = P(˜DA(n+1)=(d1,d2,…,di+1,di+1,…,dn,1)∣∣˜DA(n)=(d1,…,dn)) = P(E(i)n

This agrees with the transition probability in (4.5), thus completing the proof for Model A.

For Model B, the proof follows in a similar way except that for each , is the minimum of independent exponential random variables with means

 (BIi(TBn−TBi−1)+δ)−1,1≤i≤n+1,

so that for ,

 P(˜DB(n+1) =(d1,d2,…,di+1,di+1,…,dn,1)∣∣˜DB(n)=(d1,d2,…,dn)) =BIi(TBn−TBi−1)+δ∑n+1i=1(BIi(TBn−TBi−1)+δ)=di+δ(2+δ)n+1+δ, and P(˜DB(n+1) =(d1,d2,…,dn,2)∣∣˜DB(n)=(d1,d2,…,dn)) =1+δ(2+δ)n+1+δ.
###### Remark 3

This B.I. process construction can also be generalized for other choices of the preferential attachment functions . For example, its applications to the super- and sub-linear preferential attachment models are studied in Athreya [2007].

### 4.2 Asymptotic properties.

One important reason to use the embedding technique specified in Section 4.1 is that asymptotic behavior of the degree growth in a preferential attachment model can be characterized explicitly. These asymptotic properties then help us derive weak convergence of the empirical measure, which is analogous to (1.4) in the iid case.

#### Branching times.

We first consider the asymptotic behavior of the branching times and , which are defined in Section 4.1.

###### Proposition 4

For and defined in (4.3) and (4.4) respectively, we have

 ne(2+δ)TAn\lx@stackrel% a.s.⟶WA,WA ∼Exp(1); (4.6) and ne(2+δ)TBn\lx@stackrel% a.s.⟶WB,WB ∼Gamma(3+2δ1+δ,1). (4.7)
###### Proof

Define two counting processes

 NA(t):=12∞∑i=1BIi(t−TAi)1{t≥TAi}

in Model A, and

 NB(t):=12∞∑i=1BIi(t−TBi−1)1{t≥TBi}

in Model B. In either case, we have

 Nl(t)1{t∈[Tln,Tln+1)}=n,l=A,B.

In other words, are the jump times of the counting process , for , with the following structure

 {TAn+1−TAn:n≥1} \lx@stackreld={Ai(2+δ)i,i≥1}, (4.8) {TBn+1−TBn:n≥1} \lx@stackreld={Bi(2+δ)i+1+δ,i≥1}, (4.9)

where and are iid unit exponential random variables.

From (4.8), we see that is a pure birth process with and transition rate

 qAi,i+1=(2+δ)i,i≥1.

Replacing with in (3.2) gives (4.6). By (4.9), is a B.I. process with and transition rate , . In order to apply Theorem 1 which assumes , we define for all . Then is a B.I. process with and transition rate , for . Therefore, (4.7) follows directly from Theorem 1.

#### Convergence of the measure.

Using embedding techniques, Theorem 5 gives the convergence of the empirical measure, which draws an analogy to (1.4) in the iid case.

###### Theorem 5

Suppose that

1. , are distributed as in (4.8) and (4.9).

2. , are limit random variables as given in (4.6) and (4.7).

3. is a sequence of independent Gamma random variables specified in (4.12) and (4.13) below.

Then in , we have for ,

 n∑i=1 ϵDAi(n)/n1/(2+δ)(⋅)⇒∞∑i=1ϵσie−TAi/W1/(2+δ)A(⋅), (4.10a) n∑i=1 ϵDBi(n)/n1/(2+δ)(⋅)⇒∞∑i=1ϵσie−TBi−1/W1/(2+δ)B(⋅). (4.10b)
###### Remark 6

From (4.10a) we get for any fixed , that in ,

 (4.11)

where a subscript inside parentheses indicates ordering so that and the limit on the right side of (4.11) represents the ordered largest points from the right side of (4.10a). A similar result for Model B follows from (4.10b).

To prove Theorem 5, we first need to show the following lemma, which gives the asymptotic limit of the degree sequence under the B.I. process framework.

###### Lemma 7

Suppose that

1. , are distributed as in (4.8) and (4.9).

2. , are limit random variables as given in (4.6) and (4.7).

Then we have the following convergence results pertinent to the degree sequence , for :

1. For each ,

 B