Edgeworth expansions for network moments

# Supplemental Material for: Edgeworth expansions for network moments

Yuan Zhang (yzhanghf@stat.osu.edu) and Dong Xia (madxia@ust.hk)

The Ohio State University and Hong Kong University of Science and Technology

Yuan Zhang
Department of Statistics
The Ohio State University
229 Cockins Hall
1958 Neil Avenue
Columbus, Ohio, USA 43210
E-mail: yzhanghf@stat.osu.edu

Dong Xia
Department of Mathematics
Hong Kong University of Science and Technology
Room 3485
Clear Water Bay
Kowloon, Hong Kong
E-mail: madxia@ust.hk

## Abstract

Network method of moments [14] is an important tool for nonparametric network inferences. However, there has been little investigation into accurate descriptions of the sampling distributions of network moment statistics. In this paper, we present the first higher-order accurate approximation to the sampling CDF of a studentized network moment by Edgeworth expansion. In sharp contrast to the classical literature on noiseless U-statistics, we show that the Edgeworth expansion of a network moment statistic, viewed as a noisy U-statistic, can achieve higher-order accuracy without non-lattice or smoothness assumptions, requiring only weak regularity conditions. Behind this result is our surprising discovery that the two typically-hated factors in network analysis, namely, sparsity and edge-wise observational errors, jointly play a blessing role: they contribute a crucial self-smoothing effect to the network moment statistic and make it analytically tractable. Our assumptions match the minimum requirements in related literature.

For practitioners, our empirical Edgeworth expansion is highly accurate, computationally efficient and easy to implement. These properties are demonstrated by comprehensive simulation studies.

We showcase three applications of our results in network inference. We proved, to our knowledge, for the first time that some network bootstraps enjoy higher-order accuracy, and provided theoretical guidance for tuning network sub-sampling. We also derived a one-sample test and Cornish-Fisher confidence interval for any given moment, both with analytical formulation and explicit error rates.

MSC classification: Primary 62E17, 91D30; secondary 60F05.

Keywords: network inferences, method of moments, Edgeworth expansion, noisy U-statistic, network bootstrap.

## 1 Introduction

### 1.1 Overview

Network moments are frequencies of particular patterns, called motifs, that repeatedly occur in networks [84], such as triangles, stars and wheels. They provide informative sketches of the potentially very high-dimensional network population distribution. Pioneered by [14, 78], the method of moments for network data has become a powerful tool for frequentist nonparametric network inferences [4, 83, 106, 3, 79]. Compared to model-based network inference methods [74, 17, 103, 77], the network method of moments offers several unique advantages.

First, the evaluation of network moments is completely model-free, making them objective evidence for the specification and comparison of network models [23, 94, 101, 87]. They are the building blocks of the well-known exponential random graph models (ERGM) [64, 110]. Moreover, the deep theory by [14] (Theorem 3) and [22] (Theorem 2.1) shows that knowing all population moments can uniquely determine a general exchangeable network model up to a weak isomorphism map, despite no available inversion formula. Second, in a big data era, many high-valued business and industry networks contain or even more nodes [36, 75]. In such a regime, efficiency becomes a substantive practical concern. Model-fitting based network inferences might face challenges in handling huge networks, while the moment method equipped with proper sampling techniques [92, 39] can be very scalable. Third, many network moments are themselves informative descriptive statistics that attract considerable research interest, such as clustering coefficient [62, 105], degree distribution [89, 98], transitivity [91], and so on.

Despite a surging literature on network method of moments in recent years, the answer to the following core question remains under-explored:

What is the sampling distribution of a network moment?

For a given network motif $R$, let $\hat U_n$ denote its sample relative frequency with expectation $\mu_n := E[\hat U_n]$. Let $\hat S_n^2$ be an estimator of $\mathrm{Var}(\hat U_n)$ to be chosen later. We are mainly interested in finding the distribution of the studentized form $\hat T_n := (\hat U_n - \mu_n)/\hat S_n$. It is well-known that under the widely-studied exchangeable network model, $\hat T_n \stackrel{d}{\to} N(0,1)$ uniformly [14, 13, 51], but the $N(0,1)$ approximation is rough unless the network is large, so one naturally seeks a finer approximation. To this end, several network bootstrap methods have been proposed recently [14, 13, 51, 76, 79] attempting to address this question, and they quickly inspired many follow-up works [100, 99, 50, 32] that clearly reflected the interest from the application side in accurate approximations. However, compared to their empirical effectiveness, the theoretical support of network bootstraps remains weak. Almost all existing justifications of network bootstraps critically depend on the following type of results

$$|\hat U_n^* - \hat U_n| = o_p(n^{-1/2}), \quad \text{or similarly,} \quad |\hat T_n^* - \hat T_n| = o_p(1),$$

where $\hat U_n^*$ or $\hat T_n^*$ are bootstrapped statistics, combined with the asymptotic normality of $\hat U_n$ or $\hat T_n$. But this approach cannot show whether network bootstraps have any accuracy advantage over a simple normal approximation, especially considering the much higher computational cost of bootstrapping.

In this paper, we propose the first provable higher-order approximation to the sampling distribution of a given studentized network moment. We briefly summarize our main theorems into an informal statement as follows.

###### Theorem 1.1 (Informal statement of main theorems).

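Assume the network is generated from an exchangeable model. Define the Edgeworth expansion for a given network moment $R$ with $r$ nodes and $s$ edges as follows:

$$G_n(x) := \Phi(x) + \frac{\varphi(x)}{\sqrt{n}\cdot\xi_1^3}\cdot\left\{ \frac{2x^2+1}{6}\cdot E[g_1^3(X_1)] + \frac{r-1}{2}\cdot(x^2+1)\, E[g_1(X_1)g_1(X_2)g_2(X_1,X_2)] \right\},$$

where $\Phi$ and $\varphi$ are the CDF and PDF of $N(0,1)$, and $\xi_1$, $E[g_1^3(X_1)]$ and $E[g_1(X_1)g_1(X_2)g_2(X_1,X_2)]$ are estimable quantities depending only on the graphon $f$ and the motif $R$, to be defined in Section 3. Let $\rho_n$ denote the network sparsity parameter. Under the following assumptions: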

1. ,

2. Either is acyclic and , or cyclic and ,

3. Either , or ,

we have

$$\big\| F_{\hat T_n}(u) - G_n(u) \big\|_\infty = O\big(M(\rho_n, n; R)\big), \tag{1.1}$$

where , and $M(\rho_n, n; R)$, defined in (3.8), satisfies . Under the same conditions, the empirical Edgeworth expansion $\hat G_n$ with estimated coefficients (see (3.11)) satisfies

$$\big\| F_{\hat T_n}(u) - \hat G_n(u) \big\|_\infty = O_p\big(M(\rho_n, n; R)\big). \tag{1.2}$$

### 1.2 Our contributions

Our contributions are three-fold. First, we established the first accurate distribution approximation for network moments (1.1), that originated from our novel insights on the surprising roles that network noise and sparsity play in this setting. Second, we proposed a provably highly accurate and computationally efficient empirical Edgeworth approximation (1.2) for practical use. Third, our results pave the way towards future developments in accurate and fast nonparametric network inferences.

To understand the strength of our main results (1.1) and (1.2), notice that for mildly sparse networks, we achieved higher-order accuracy in distribution approximation without non-lattice or smoothness assumption. The non-lattice assumption is universally imposed in all related literature known to the authors where higher-order accuracy is pursued. However, this assumption is violated by some popular network models, including stochastic block model, arguably the most important network model. Waiving the graphon smoothness assumption makes our approach a powerful tool for model-free, exploratory network analysis and for analyzing networks with irregularities.

The key insight is our novel view of the sample network moment as a noisy U-statistic, where “noise” refers to edge-wise observational errors in $A$. Our analysis reveals the connection and differences between the noisy and the conventional noiseless U-statistic settings. We discovered, with surprise, the blessing roles that the two typically-hated factors, namely, edge-wise observational errors and network sparsity, jointly play in this setting:

1. The errors behave like a smoother that tames potential distribution discontinuity due to a lattice or discrete network population;

2. Network sparsity then boosts the smoothing effect of the error term to a sufficient level such that becomes analytically tractable.

In our proofs, we present original analysis that carefully quantifies the impact of such smoothing effect. Our proof techniques are very different from those in network bootstrap papers [13, 51, 76, 79]. It seems unlikely that our assumptions can be substantially relaxed since they match well-known minimum conditions in related settings.

Our empirical Edgeworth expansion (1.2) is very fast, much more scalable than network bootstraps, and easily permits parallel computing.

As an application of our theory, we present the first proof of the higher-order accuracy of some mainstream network bootstrap techniques under certain conditions, which their original proposing papers did not prove. Our results also enable rich future works on accurate and highly efficient network inferences. We present two immediate applications in testing and confidence intervals for network moments with explicit accuracy guarantees.

### 1.3 Paper organization

The rest of this paper is organized as follows. In Section 2, we formally set up the problem and provide a detailed literature review. In Section 3, we present our core ideas, derive the Edgeworth expansion and establish its uniform approximation error bound. We also discuss different versions of the studentization form. In Section 4, we present three applications of our results: bootstrap accuracy, one-sample test, and one-sample Cornish-Fisher confidence interval. Section 5 presents simulation studies. Section 6 discusses our results and future work.

## 2 Problem set up and literature review

### 2.1 Exchangeable networks and graphon model

The base model of this paper is the exchangeable network model [42, 15]. Exchangeability describes the unlabeled nature of many networks in social, knowledge and biological contexts, where node indexes do not carry meaningful information. It is a very rich family that contains many popular models as special cases, including the stochastic block model and its variants [61, 115, 111, 1, 69, 114, 66], the configuration model [35, 85], latent space models [60, 52] and general smooth graphon models [33, 49, 113].

Exchangeable networks can be succinctly formulated by the Aldous-Hoover representation [2, 63]: the nodes correspond to latent space positions $X_1, \ldots, X_n \stackrel{\text{i.i.d.}}{\sim} \mathrm{Uniform}[0,1]$. Network generation is governed by a measurable latent graphon function $f$ that encodes all structures. The edge probability between nodes $(i,j)$ is

$$W_{ij} = W_{ji} := \rho_n \cdot f(X_i, X_j); \quad 1 \le i < j \le n, \tag{2.1}$$

where the sparsity parameter $\rho_n$ absorbs the constant factor, and . We only observe the adjacency matrix $A$:

$$A_{ij} \mid W \sim \mathrm{Bernoulli}(W_{ij}); \ \text{and } A_{ij} = A_{ji}; \quad 1 \le i < j \le n. \tag{2.2}$$

The model (2.1) and (2.2) has a well-known issue that both and are only identifiable up to equivalence classes [29]. This may pose significant challenges for some model-based network inferences, as is a natural part in modeling the population of networks. Meanwhile, network moments are permutation-invariant and thus clearly immune to the identification issue.
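
To make the generative model (2.1) and (2.2) concrete, the following minimal sketch draws one network from it using only the Python standard library. The helper names `sample_graphon_network` and `two_block_graphon`, the two-community graphon values, and the choice of `n` and `rho_n` are all hypothetical and serve only as an illustration.

```python
import random


def sample_graphon_network(n, rho_n, f, rng):
    """Draw latent positions X, edge probabilities W and adjacency matrix A."""
    X = [rng.random() for _ in range(n)]  # latent positions on [0, 1]
    W = [[0.0] * n for _ in range(n)]
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = rho_n * f(X[i], X[j])  # edge probability, as in (2.1)
            if not 0.0 <= w <= 1.0:
                raise ValueError("rho_n * f must be a probability")
            W[i][j] = W[j][i] = w
            a = 1 if rng.random() < w else 0  # Bernoulli draw, as in (2.2)
            A[i][j] = A[j][i] = a
    return X, W, A


def two_block_graphon(x, y):
    """Hypothetical block-model graphon with two equal-size communities."""
    same_community = (x < 0.5) == (y < 0.5)
    return 1.5 if same_community else 0.5


rng = random.Random(0)
X, W, A = sample_graphon_network(50, 0.3, two_block_graphon, rng)
```

Only `A` would be observed in practice; `X` and `W` are returned here purely so that the noiseless quantities can be inspected in a simulation.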

### 2.2 Network moment statistics

To formalize network moments, it is more convenient to first define the sample version and then the population version. Each network moment is indexed by its corresponding motif $R$. For simplicity, we follow the convention to focus on connected motifs. Let $R$ also represent the adjacency matrix of the motif, which has $r$ nodes and $s$ edges. For any $r$-node sub-network $A_{i_1,\ldots,i_r}$ of $A$, define

$$h(A_{i_1,\ldots,i_r}) := \mathbb{1}[A_{i_1,\ldots,i_r} \cong R], \quad \text{for all } 1 \le i_1 < \cdots < i_r \le n,$$

where “$\cong$” means there exists a permutation map , such that , where is defined as . Since we consider an arbitrary but fixed $R$ throughout this paper, without causing confusion, we drop the dependency on $R$ in symbols such as $h$ to simplify notation. Define the sample network moment as

$$\hat U_n := \frac{1}{\binom{n}{r}} \sum_{1 \le i_1 < \cdots < i_r \le n} h(A_{i_1,\ldots,i_r}),$$

and its sample-population version and population version are defined to be $U_n := E[\hat U_n \mid W]$ and $\mu_n := E[U_n] = E[\hat U_n]$, respectively. We call $\hat U_n$ a noisy U-statistic for it is based on $A$ and call the conventional $U_n$ a noiseless U-statistic for it is based on $W$. Similar to the advantage of studentization in the i.i.d. setting (Section 3.5 of [104]), we study

$$\hat T_n := \frac{\hat U_n - \mu_n}{\hat S_n},$$

where $\hat S_n$ will be specified later. Similarly, the noiseless versions of can be defined by and , respectively, where and is a proper estimator for based on .
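
As an illustration of the sample network moment, the sketch below evaluates the relative frequency of a motif by brute-force enumeration of all $r$-node subsets. It covers only the edge and triangle motifs, for which the indicator $h$ reduces to a product of adjacency entries; the function names and the toy graph are hypothetical, and the enumeration is meant for small examples rather than large networks.

```python
from itertools import combinations
from math import comb


def edge_indicator(A, nodes):
    """h for the edge motif (r = 2): 1 if the two nodes are connected."""
    i, j = nodes
    return A[i][j]


def triangle_indicator(A, nodes):
    """h for the triangle motif (r = 3): 1 if all three edges are present."""
    i, j, k = nodes
    return A[i][j] * A[j][k] * A[i][k]


def sample_moment(A, r, h):
    """Average of h over all r-node subsets of the network."""
    n = len(A)
    total = sum(h(A, nodes) for nodes in combinations(range(n), r))
    return total / comb(n, r)


# Toy graph: a triangle on nodes 0, 1, 2 plus a pendant edge between 2 and 3.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
edge_density = sample_moment(A, 2, edge_indicator)            # 4 edges out of 6 pairs
triangle_frequency = sample_moment(A, 3, triangle_indicator)  # 1 triangle out of 4 triples
```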


### 2.3 Edgeworth expansions for i.i.d. data and noiseless U-statistic

Edgeworth expansion [44, 102] refines the central limit theorem. It is the supporting pillar in the justification of bootstrap’s higher-order accuracy. In this subsection, we review the literature on Edgeworth expansions for i.i.d. data and for U-statistics, due to their close connection. Under mild conditions, the one-term Edgeworth expansion for the standardized sample mean of i.i.d. mean-zero and unit-variance $X_1, \ldots, X_n$ is $G_n(x) := \Phi(x) - n^{-1/2} \cdot E[X_1^3](x^2-1)\varphi(x)/6$, where $\Phi$ and $\varphi$ are the CDF and PDF of $N(0,1)$, respectively. Higher order Edgeworth terms can be derived [57] but are not practically meaningful without knowing the true population moments appearing in the coefficients. The minimax rate for estimating $E[X_1^3]$ is $O_p(n^{-1/2})$, so $O(n^{-1})$ is the best possible practical remainder control for an Edgeworth expansion. For further references, see [18, 93, 12, 55, 56, 7] and textbooks [57, 40, 104].
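
The classical one-term expansion for a standardized i.i.d. sample mean can be sketched in a few lines. The code below uses the textbook formula $\Phi(x) - n^{-1/2} E[X_1^3](x^2-1)\varphi(x)/6$ for mean-zero, unit-variance data; the function names are hypothetical, and the third moment is passed in as a known constant, which is exactly the oracle knowledge that is unavailable in practice.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(x):
    """CDF of N(0, 1)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """PDF of N(0, 1)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def iid_edgeworth_cdf(x, n, third_moment):
    """One-term Edgeworth approximation to the CDF of sqrt(n) * sample mean,
    for i.i.d. data with mean 0, variance 1 and the given third moment."""
    correction = third_moment * (x * x - 1.0) * norm_pdf(x) / (6.0 * sqrt(n))
    return norm_cdf(x) - correction


# Example: centered Exp(1) data have third central moment 2.
approx_at_zero = iid_edgeworth_cdf(0.0, 25, 2.0)
```

With a zero third moment the correction vanishes and the function reduces to the plain normal approximation, which makes the role of skewness in the first Edgeworth term easy to see.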

The literature on Edgeworth expansions for U-statistics concentrates on the noiseless version. In the early 1980s, [26, 65, 27] established the asymptotic normality of and then with an remainder. Then [25, 16, 73] approximated degree-two (i.e. $r = 2$) standardized U-statistics with an remainder, and [9] established an bound under relaxed conditions for more general symmetric statistics. Empirical Edgeworth expansions were studied in [58, 90], which established bounds. For finite populations, [8, 20, 19, 21] established the earliest results, and we will use some of their results in our analysis of network bootstraps. An incomplete list of other notable works includes [10, 53, 67, 80, 11, 68].

### 2.4 The non-lattice condition and lattice Edgeworth expansions in the i.i.d. setting

A major assumption called the non-lattice condition is critical for achieving accuracy in Edgeworth expansions, including all results in the i.i.d. setting not requiring oracle moment knowledge and all results for noiseless U-statistics, but this condition is clearly not required by an accuracy bound. A random variable is called lattice, if it is supported on for some where . General discrete distributions are “nearly lattice”. A distribution is essentially non-lattice if it contains a continuous component. In many works, the non-lattice condition is replaced by the stronger Cramer’s condition [38]:
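
For reference, the textbook form of Cramer’s condition for a random variable $X_1$ is sketched below. The symbols $t$ and $X_1$ follow the usual convention and are an assumption about the notation intended here.

```latex
\[
  \limsup_{t \to \infty} \left| E\!\left[ e^{\mathrm{i} t X_1} \right] \right| < 1 .
\]
```

A lattice variable violates this condition because its characteristic function is periodic in $t$ and returns to modulus one infinitely often.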

For U-statistics, this condition is imposed on $g_1(X_1)$. Cramer’s condition can be relaxed [5, 82, 96, 97] towards a non-lattice condition, but all known essential relaxations come at the price of essentially depreciated error bounds. Therefore, for simplicity, in Theorems 3.1 and 4.1, we use Cramer’s condition to represent the non-lattice setting.

However, in network analysis, Cramer’s condition is violated by the stochastic block model, arguably the most important network model. In a block model, $g_1(X_i)$ only depends on node $i$’s community membership, thus is discrete. Also, the non-lattice or Cramer’s condition is difficult to check in practice. Moreover, some non-constant smooth models may even yield a lattice $g_1(X_1)$ if paired with some motifs but not with others. For example, the graphon yields a lattice $g_1(X_1)$ when $R$ is an edge, but a non-lattice $g_1(X_1)$ when $R$ is a triangle.

Next, we present a detailed review of the approaches to treat a lattice in the literature and the key inspiration for our work. So far, latticeness can only be analytically remedied in the i.i.d. setting without losing accuracy. Existing approaches fall into two main categories: (1) adding an artificial error term to the sample mean to smooth out lattice-induced discontinuity [95, 72]; and (2) formulating the lattice version Edgeworth expansion with a jump function [95]. The seminal work [95] added a uniform error with bandwidth , and by reversing its impact in the smoothed distribution function, exactly formulated the lattice Edgeworth expansion with remainder. Another classical work [72] used a normal error instead of uniform, and showed that the Gaussian bandwidth must be and to smooth sufficiently without introducing an distribution distortion. Other notable works include [107, 70, 6].

The intrinsic difficulty of the lattice problem obstructed significant further advances. First, the artificial error term, despite reinstating a tractable formula, brings an distortion to the original distribution. Second, the exact formulation of the one-term lattice Edgeworth expansion contains an jump term with jump locations depending on true population moments [95], laying an uncrossable barrier for any empirical CDF approximation method.

## 3 Edgeworth expansions for network moments

### 3.1 Outline and core ideas to analyze ˆTn

Our key discovery is that the studentized noisy U-statistic can be decomposed as follows:

$$\hat T_n = \tilde T_n + \hat\Delta_n + \text{Ignorable remainder}, \tag{3.1}$$

where $\tilde T_n$ can be roughly understood as a studentized noiseless U-statistic, similarly to , and .

Our decomposition (3.1) revives the spirit of [95] and [72], but with the following crucial differences. First and most important, the error term in our formula is not artificial, but naturally a constituting component of $\hat T_n$. Therefore, the smoother does not distort the objective distribution, that is, $\hat T_n$ is self-smoothed. The second difference lies in the bandwidth of the smoothing error term. The Gaussian bandwidth is not ours to choose, unlike in [95] and [72], but is governed by the network sparsity, so if is lattice, we would need to gain sufficient smoothing power. This echoes the lower bound on the Gaussian bandwidth in [72]. We also need $\rho_n$ to be lower bounded for other reasons, see Lemma 3.1. Third, our error term is dependent on through . In our analysis, we carefully handle this dependency with original analysis.

### 3.2 Decomposition of the stochastic variations of ˆUn

To simplify narration, in this subsection, we focus on analyzing $\hat U_n$, and the analysis of $\hat T_n$ is conceptually similar. The stochastic variations in $\hat U_n$ come from two sources: the randomness in $U_n$ due to $W$ and ultimately $X_1, \ldots, X_n$, and the randomness in $\hat U_n - U_n$ due to $A \mid W$, the edge-wise observational errors.

The stochastic variations in $U_n$ as a conventional noiseless U-statistic are well-understood due to Hoeffding’s decomposition [59]

$$U_n - \mu_n = \frac{r}{n} \sum_{i=1}^n g_1(X_i) + \frac{r(r-1)}{n(n-1)} \sum_{1 \le i < j \le n} g_2(X_i, X_j) + \cdots, \tag{3.2}$$

where are defined as follows. To avoid complicated subscripts, without confusion we define ’s for special indexes . For indexes , (only when ) and , define , for and . From classical literature, we know that , where the strict subset could be , and unless and . Consequently, the linear part in the Hoeffding’s decomposition is dominant. Define

$$\xi_1^2 := \mathrm{Var}(g_1(X_1)). \tag{3.3}$$

We focus on discussing the stochastic variations in $\hat U_n - U_n$. The typical treatment in network bootstrap literature is to simply bound and ignore this component, such as Lemma 7 in [51]. But we shall reveal its key smoothing effect by a refined analysis. To better understand the impact of $\hat U_n - U_n$, let us inspect two simple examples.

###### Example 3.1.

Let $R$ be an edge with $r = 2$ and $s = 1$, so that $\hat U_n$ is simply the sample edge density. By definition, all $A_{ij}$ terms are mutually independent given $W$. Then with a uniform Berry-Esseen CDF approximation error.
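
For the edge motif, the conditional variance of the noise component can be written out explicitly. The following worked equation is a sketch that relies only on the Bernoulli model (2.2) and on reading $U_n$ as the conditional mean of $\hat U_n$ given $W$, which is an assumption about the intended definition.

```latex
\[
  \hat U_n - U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} (A_{ij} - W_{ij}),
  \qquad
  \mathrm{Var}\big( \hat U_n - U_n \,\big|\, W \big)
    = \binom{n}{2}^{-2} \sum_{1 \le i < j \le n} W_{ij} (1 - W_{ij}) .
\]
```

Since the summands are independent and bounded given $W$, a Berry-Esseen argument applies to this sum directly. When every $W_{ij}$ is of order $\rho_n$, the conditional variance is of order $\rho_n \cdot n^{-2}$.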

The next example shows that the insight of Example 3.1 generalizes.

###### Example 3.2.

Let $R$ be a triangular motif with $r = s = 3$, so that $\hat U_n$ is the empirical triangle frequency. We can decompose $\hat U_n - U_n$ as follows:

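$$\hat U_n - U_n = \frac{1}{\binom{n}{3}} \sum_{1 \le i_1 < i_2 < i_3 \le n} \big( A_{i_1 i_2} A_{i_2 i_3} A_{i_1 i_3} - W_{i_1 i_2} W_{i_2 i_3} W_{i_1 i_3} \big),$$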

where and . The linear part is order and dominating if , noticing that all and terms are mutually uncorrelated given .
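
The split of the triangle noise term into linear and higher-order parts can be checked by a direct expansion. The sketch below introduces the shorthand $\eta_{ij} := A_{ij} - W_{ij}$, which is a hypothetical notation chosen for this illustration, and expands one summand for the node triple $(1, 2, 3)$.

```latex
\begin{align*}
  A_{12} A_{23} A_{13} - W_{12} W_{23} W_{13}
  &= (W_{12} + \eta_{12})(W_{23} + \eta_{23})(W_{13} + \eta_{13}) - W_{12} W_{23} W_{13} \\
  &= \underbrace{\eta_{12} W_{23} W_{13} + W_{12}\, \eta_{23} W_{13} + W_{12} W_{23}\, \eta_{13}}_{\text{linear in } \eta} \\
  &\quad + \underbrace{\eta_{12} \eta_{23} W_{13} + \eta_{12} W_{23}\, \eta_{13} + W_{12}\, \eta_{23} \eta_{13}}_{\text{quadratic in } \eta}
    + \underbrace{\eta_{12} \eta_{23} \eta_{13}}_{\text{cubic in } \eta} .
\end{align*}
```

Given $W$, the $\eta_{ij}$ are independent with mean zero, so products involving distinct edges are uncorrelated with the linear terms.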

The insights of the two examples are generalized in Lemma 3.1-(b). When the network is moderately dense, the linear part in dominates. Consequently, the overall contribution of the stochastic variations in is approximately Gaussian with an Berry-Esseen bound.

### 3.3 Studentization form

The understanding of $\hat U_n - U_n$ in Section 3.2 prepares us to fully specify $\hat T_n$. We now design $\hat S_n$. In , we observe and . We shall assume , so dominates. There are two main choices of $\hat S_n$. The conventional choice for studentizing noiseless U-statistics [27, 58, 90] suggests the jackknife estimator

$$n \cdot \hat S^2_{n;\text{jackknife}} := (n-1) \sum_{i=1}^n \big( \hat U_n^{(-i)} - \hat U_n \big)^2, \tag{3.4}$$

where $\hat U_n^{(-i)}$ is calculated on the sub-network of $A$ induced by removing the $i$th node. Despite its conceptual straightforwardness, the jackknife estimator unnecessarily complicates analysis. Therefore, we use an estimator with a simpler formulation. In , replace by its moment estimator. We design $\hat S_n$ as follows
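
The jackknife estimator (3.4) can be sketched directly from its definition: recompute the moment on each leave-one-node-out sub-network and sum the squared deviations. The code below does this by brute force for the triangle motif; the function names and the toy graph are hypothetical, and the cubic-time enumeration is only suitable for small networks.

```python
from itertools import combinations
from math import comb


def triangle_frequency(A, nodes):
    """Sample triangle frequency on the sub-network induced by `nodes`."""
    nodes = list(nodes)
    count = sum(A[i][j] * A[j][k] * A[i][k] for i, j, k in combinations(nodes, 3))
    return count / comb(len(nodes), 3)


def jackknife_variance(A, moment):
    """Jackknife estimate of Var(U_hat), following the form of (3.4)."""
    n = len(A)
    full = moment(A, range(n))
    # Leave-one-node-out versions of the moment statistic.
    leave_one_out = [moment(A, [v for v in range(n) if v != i]) for i in range(n)]
    n_times_s2 = (n - 1) * sum((u - full) ** 2 for u in leave_one_out)
    return n_times_s2 / n


# Toy graph: triangle on nodes 0, 1, 2, then a path 2 - 3 - 4.
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
s2_jackknife = jackknife_variance(A, triangle_frequency)
```

Each leave-one-out evaluation repeats the full enumeration, which illustrates why a closed-form alternative is attractive for large networks.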

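$$n \cdot \hat S_n^2 := \frac{r^2}{n} \sum_{i=1}^n \left\{ \frac{1}{\binom{n-1}{r-1}} \sum_{\substack{1 \le i_1 < \cdots < i_{r-1} \le n \\ i_1, \ldots, i_{r-1} \ne i}} h(A_{i, i_1, \ldots, i_{r-1}}) - \hat U_n \right\}^2.$$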

We will show in Theorem 3.3 that the two estimators are in fact equivalent.
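
For the edge motif ($r = 2$), both variance estimators have closed forms in terms of node degrees, which allows a quick numerical comparison. The sketch below assumes that the simpler estimator averages the squared centered node-wise motif frequencies, i.e. $n \cdot \hat S_n^2 = (r^2/n) \sum_i \hat g_1^2(X_i)$ with $\hat g_1(X_i)$ equal to the degree of node $i$ divided by $n - 1$ minus $\hat U_n$; this reading, like all function names below, is an assumption made for illustration.

```python
from math import comb


def simple_variance_edge(A):
    """Moment-based variance estimate for the edge density (assumed form)."""
    n = len(A)
    r = 2
    degrees = [sum(row) for row in A]
    u_hat = sum(degrees) / (n * (n - 1))            # sample edge density
    g1_hat = [d / (n - 1) - u_hat for d in degrees]  # centered node-wise frequency
    n_times_s2 = (r ** 2 / n) * sum(g * g for g in g1_hat)
    return n_times_s2 / n


def jackknife_variance_edge(A):
    """Jackknife variance estimate for the edge density via node degrees."""
    n = len(A)
    degrees = [sum(row) for row in A]
    edges = sum(degrees) // 2
    u_hat = edges / comb(n, 2)
    # Removing node i deletes exactly degrees[i] edges.
    leave_one_out = [(edges - d) / comb(n - 1, 2) for d in degrees]
    return (n - 1) / n * sum((u - u_hat) ** 2 for u in leave_one_out)


# Toy graph: triangle on nodes 0, 1, 2, then a path 2 - 3 - 4.
A = [
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
s2_simple = simple_variance_edge(A)
s2_jackknife = jackknife_variance_edge(A)
ratio = s2_jackknife / s2_simple
```

Under this reading, a short calculation for the edge motif shows that the two estimates differ only by the deterministic factor $n(n-1)/(n-2)^2$, which tends to one as $n$ grows.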

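Next, we expand $\hat T_n$. For simplicity, define the following shorthand

$$U_n^* := \frac{1}{\sqrt{n} \cdot \xi_1} \sum_{i=1}^n g_1(X_i), \qquad \Delta_n := \frac{r-1}{\sqrt{n}\,(n-1)\,\xi_1} \sum_{1 \le i < j \le n} g_2(X_i, X_j),$$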

where in (3.5), the technical intermediate term $\hat\sigma_n^2$ is defined as

 n⋅ˆσ2n :=r2nn∑i=1{1(n−1r−1)∑1≤i1<⋯

We now show that $\hat T_n$ can be expanded as follows.

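$$\begin{aligned} \hat T_n &= \big( U_n^* + \Delta_n + \hat\Delta_n + O_p(n^{-1}) \big) \cdot \big( 1 + \hat\delta_n + \delta_n \big)^{-1/2} \\ &= \tilde T_n + \hat\Delta_n + O_p\big(M(\rho_n, n; R)\big), \end{aligned} \tag{3.6}$$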

where

$$\tilde T_n := U_n^* + \Delta_n - \frac{1}{2} U_n^* \cdot \delta_n. \tag{3.7}$$

The form (3.6) is partially justified by the Taylor expansion , with [80]; and the remaining justification comes from our main lemma, i.e. Lemma 3.1.
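
The passage from the first to the second line of (3.6) can be sketched as a first-order expansion of the inverse square root. The grouping of cross terms into a remainder below is a heuristic outline, under the assumption that those cross terms are of the order controlled by Lemma 3.1; it is not a substitute for the formal argument.

```latex
\begin{align*}
  (1 + \hat\delta_n + \delta_n)^{-1/2}
    &= 1 - \tfrac{1}{2} (\hat\delta_n + \delta_n) + O\big( (\hat\delta_n + \delta_n)^2 \big), \\
  \big( U_n^* + \Delta_n + \hat\Delta_n \big) \big( 1 - \tfrac{1}{2} (\hat\delta_n + \delta_n) \big)
    &= \underbrace{U_n^* + \Delta_n - \tfrac{1}{2} U_n^* \delta_n}_{\tilde T_n}
       + \hat\Delta_n
       - \tfrac{1}{2} U_n^* \hat\delta_n
       - \tfrac{1}{2} (\Delta_n + \hat\Delta_n)(\hat\delta_n + \delta_n) .
\end{align*}
```

The last two groups of terms are products of small quantities and are the ones to be absorbed into the remainder of (3.6).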

###### Definition 3.1 (Acyclic and cyclic motifs, see also [14, 13, 76]).

A motif $R$ is called acyclic, if its edge set is a subset of an $r$-tree. The motif is called cyclic, if it is connected and contains at least one cycle. In other words, a cyclic motif is connected but not a tree.

###### Definition 3.2.

To simplify the statements of our method’s error bound under different motif shapes, especially in Table 1 and proof steps, define the following shorthand

$$M(\rho_n, n; R) := \begin{cases} (\rho_n \cdot n)^{-1}, & \text{for acyclic } R, \\ \rho_n^{-r/2} \cdot n^{-1}, & \text{for cyclic } R. \end{cases} \tag{3.8}$$

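
The shorthand (3.8) is simple enough to evaluate directly, which helps when comparing it with the $n^{-1/2}$ accuracy of a normal approximation. The following sketch is a hypothetical helper; the function name `error_rate` and the example values of `rho_n` and `n` are chosen purely for illustration.

```python
def error_rate(rho_n, n, r, cyclic):
    """Evaluate the rate M(rho_n, n; R) of (3.8) for a motif with r nodes."""
    if not 0.0 < rho_n <= 1.0 or n <= 0:
        raise ValueError("need 0 < rho_n <= 1 and n > 0")
    if cyclic:
        return rho_n ** (-r / 2) / n
    return 1.0 / (rho_n * n)


n = 10_000
rho_n = 0.1
rate_edge = error_rate(rho_n, n, r=2, cyclic=False)     # acyclic motif
rate_triangle = error_rate(rho_n, n, r=3, cyclic=True)  # cyclic motif
normal_rate = n ** -0.5                                 # benchmark n^{-1/2}
```

For these example values both rates fall below the $n^{-1/2}$ benchmark, and the cyclic rate is the larger of the two.
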
###### Lemma 3.1.

Assume the following conditions hold:

1. , ,

2. Either is acyclic and , or cyclic and ,

where is a universal constant. We have the following results:

1. ,

2. We have

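$$\frac{\hat U_n - U_n}{\sigma_n} = \hat\Delta_n + O_p\big(M(\rho_n, n; R)\big).$$

$$\Big\| F_{\hat\Delta_n \mid W}(u) - F_{N(0,\, (\rho_n \cdot n)^{-1} \sigma_w^2)}(u) \Big\|_\infty = O_p\big( \rho_n^{-1/2} \cdot n^{-1} \big), \tag{3.9}$$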

where the definition of $\sigma_w$ is lengthy and formally stated in Section 7 in supplementary material. As , we have .

3. ,

4. We have

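$$\delta_n = \frac{1}{n} \sum_{i=1}^n \frac{g_1^2(X_i) - \xi_1^2}{\xi_1^2} + \frac{2(r-1)}{n(n-1)} \sum_{\substack{1 \le \{i,j\} \le n \\ i \ne j}} \frac{g_1(X_i)\, g_2(X_i, X_j)}{\xi_1^2} + O_p(n^{-1}).$$

###### Remark 3.1.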

Assumption (i) is a standard non-degeneration assumption in literature. It should not be confused with a graphon smoothness assumption. A globally smooth Erdos-Renyi graphon leads to a degenerate . In the degenerate setting, both the standardization/studentization and the analysis would be very different. Asymptotic results for motifs under an Erdos-Renyi graphon were established in [48, 47]. Degenerate U-statistics are outside the scope of this paper.

###### Remark 3.2.

Assumption (ii) regards the randomness in and guarantees the domination of the linear part of . The seemingly higher requirement of our Assumption (ii) compared to its counterparts in [14, 13, 51], which require for cyclic and for acyclic , is purely due to our pursuit of higher-order accuracy. Under their sparsity conditions, our approach achieves a Berry-Esseen bound , still better than their rates. However, letting their analysis assume our Assumption (ii) does not clearly lead to an improvement of their error rates.

###### Remark 3.3.

In Lemma 3.1, Parts (a) and (d) are similar to classical literature, but here we accounted for . Parts (b) and (c) are unique to the network setting. Especially in the proof of Part (b), we refined the analysis of the randomness in in [13] and [51].

###### Remark 3.4.

Our result (3.9) in Lemma 3.1-(b) should not be confused with Theorem 1 of [14]. There are three distinct quantities: the true , the estimated and . The convergence rate of is much faster than . Our result (3.9) regards , thus avoiding the bottleneck; whereas [14] and later [79] focused on .

Overall, Lemma 3.1 clarifies the asymptotic orders of the leading terms in the expansion of $\hat T_n$. In fact, Lemma 3.1 also holds for a jackknife $\hat S_n$, in view of Theorem 3.3, but we do not present it due to page limit.

### 3.4 Population and empirical Edgeworth expansions for network moments

In this subsection, we present our main theorems.

###### Theorem 3.1 (Population network Edgeworth expansion).

Define

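$$G_n(x) := \Phi(x) + \frac{\varphi(x)}{\sqrt{n} \cdot \xi_1^3} \cdot \left\{ \frac{2x^2+1}{6} \cdot E[g_1^3(X_1)] + \frac{r-1}{2} \cdot (x^2+1)\, E[g_1(X_1) g_1(X_2) g_2(X_1, X_2)] \right\}, \tag{3.10}$$

where $\Phi$ and $\varphi$ are the CDF and PDF of $N(0,1)$. Assume the conditions of Lemma 3.1 hold, and additionally that either or Cramer’s condition holds. Then we have

$$\big\| F_{\hat T_n}(x) - G_n(x) \big\|_\infty = O\big(M(\rho_n, n; R)\big).$$

###### Remark 3.5.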

The assumed upper bound on $\rho_n$ in the absence of Cramer’s condition serves to sufficiently boost the smoothing power of $\hat\Delta_n$, quantified in (3.9) of Lemma 3.1. This assumption is unlikely to be improvable, since its required Gaussian variance matches the minimum Gaussian standard deviation requirement in Remark 2.4 in [72] for the i.i.d. setting.
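
Once the three coefficients are available, evaluating the network Edgeworth expansion is a closed-form computation. The sketch below implements the formula with the same term structure as (3.11), taking the coefficients as plain numbers; the function name `network_edgeworth_cdf` and the argument names `xi1`, `m3` and `m12` are hypothetical, and the example coefficient values are made up for illustration.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(x):
    """CDF of N(0, 1)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def norm_pdf(x):
    """PDF of N(0, 1)."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)


def network_edgeworth_cdf(x, n, r, xi1, m3, m12):
    """One-term network Edgeworth approximation to the CDF of T_hat.

    xi1 : standard deviation of g1(X1)
    m3  : E[g1(X1)^3]
    m12 : E[g1(X1) g1(X2) g2(X1, X2)]
    """
    bracket = (2.0 * x * x + 1.0) / 6.0 * m3 + (r - 1) / 2.0 * (x * x + 1.0) * m12
    return norm_cdf(x) + norm_pdf(x) / (sqrt(n) * xi1 ** 3) * bracket


# Example with made-up coefficients for an edge motif (r = 2) and n = 100.
value_at_zero = network_edgeworth_cdf(0.0, 100, 2, 1.0, 0.6, 0.2)
```

The evaluation cost does not depend on the network size once the coefficients are computed, which is what makes the expansion cheap to apply on a fine grid of `x` values.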

In (3.10), the Edgeworth coefficients depend on true population moments. In practice, they need to be estimated from data. Define

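$$\hat g_1(X_i) := \frac{1}{\binom{n-1}{r-1}} \sum_{\substack{1 \le i_1 < \cdots < i_{r-1} \le n \\ i_1, \ldots, i_{r-1} \ne i}} h(A_{i, i_1, \ldots, i_{r-1}}) - \hat U_n,$$

where we write “” rather than “” for cleanness. We stress that the evaluation of $\hat g_1(X_i)$ and $\hat g_2(X_i, X_j)$ only requires the indexes $i, j$ but not the latent $X_i, X_j$. Then the Edgeworth coefficients can be estimated by

$$\begin{aligned} \hat\xi_1^2 &:= \frac{n \cdot \hat S_n^2}{r^2} = \frac{1}{n} \sum_{i=1}^n \hat g_1^2(X_i), \quad \text{and} \quad \hat E[g_1^3(X_1)] := \frac{1}{n} \sum_{i=1}^n \hat g_1^3(X_i), \\ \hat E[g_1(X_1) g_1(X_2) g_2(X_1, X_2)] &:= \frac{1}{\binom{n}{2}} \sum_{1 \le i < j \le n} \hat g_1(X_i)\, \hat g_1(X_j)\, \hat g_2(X_i, X_j). \end{aligned}$$

###### Theorem 3.2 (Empirical network Edgeworth expansion).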

Define the empirical Edgeworth expansion as follows:

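$$\hat G_n(x) := \Phi(x) + \frac{\varphi(x)}{\sqrt{n} \cdot \hat\xi_1^3} \cdot \left\{ \frac{2x^2+1}{6} \cdot \hat E[g_1^3(X_1)] + \frac{r-1}{2} \cdot (x^2+1)\, \hat E[g_1(X_1) g_1(X_2) g_2(X_1, X_2)] \right\}. \tag{3.11}$$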

Under the conditions of Theorem 3.1, we have

 ∥∥FˆTn(