Edgeworth expansions for network moments
Supplemental Material for: Edgeworth expansions for network moments
Abstract
The network method of moments [14] is an important tool for nonparametric network inference. However, there has been little investigation into accurate descriptions of the sampling distributions of network moment statistics. In this paper, we present the first higher-order accurate approximation to the sampling CDF of a studentized network moment by Edgeworth expansion. In sharp contrast to the classical literature on noiseless U-statistics, we show that the Edgeworth expansion of a network moment statistic, viewed as a noisy U-statistic, can achieve higher-order accuracy without non-lattice or smoothness assumptions, requiring only weak regularity conditions. Behind this result is our surprising discovery that the two typically-hated factors in network analysis, namely sparsity and edgewise observational errors, jointly play a blessing role, contributing a crucial self-smoothing effect to the network moment statistic and making it analytically tractable. Our assumptions match the minimum requirements in the related literature.
For practitioners, our empirical Edgeworth expansion is highly accurate, computationally efficient, and easy to implement, as demonstrated by comprehensive simulation studies.
We showcase three applications of our results in network inference. We prove, to our knowledge for the first time, that some network bootstraps enjoy higher-order accuracy, and we provide theoretical guidance for tuning network subsampling. We also derive a one-sample test and a Cornish-Fisher confidence interval for any given moment, both with analytical formulations and explicit error rates.
Running title: Network Edgeworth expansion
MSC subject classifications: Primary 62E17; 91D30; secondary 60F05
Keywords: network inferences, method of moments, Edgeworth expansion, noisy U-statistic, network bootstrap
1 Introduction
1.1 Overview
Network moments are frequencies of particular patterns, called motifs, that repeatedly occur in networks [84], such as triangles, stars and wheels. They provide informative sketches of the potentially very high-dimensional network population distribution. Pioneered by [14, 78], the method of moments for network data has become a powerful tool for frequentist nonparametric network inference [4, 83, 106, 3, 79]. Compared to model-based network inference methods [74, 17, 103, 77], the network method of moments enjoys several unique values and advantages.
First, the evaluation of network moments is completely model-free, making them objective evidence for the specification and comparison of network models [23, 94, 101, 87]. They are the building blocks of the well-known exponential random graph models (ERGM) [64, 110]. Moreover, the deep theory of [14] (Theorem 3) and [22] (Theorem 2.1) shows that knowing all population moments uniquely determines a general exchangeable network model up to a weak isomorphism map, despite the lack of an available inversion formula. Second, in the big-data era, many high-valued business and industry networks contain or even more nodes [36, 75]. In such a regime, efficiency becomes a substantive practical concern. Model-fitting based network inference may face challenges in handling huge networks, while the moment method, equipped with proper sampling techniques [92, 39], can be very scalable. Third, many network moments are themselves informative descriptive statistics and attract substantial research interest, such as the clustering coefficient [62, 105], degree distribution [89, 98], transitivity [91], and so on.
Despite a surging literature on network method of moments in recent years, the answer to the following core question remains underexplored:
What is the sampling distribution of a network moment?
For a given network motif , let denote its sample relative frequency with expectation . Let be an estimator of to be chosen later. We are mainly interested in finding the distribution of the studentized form . It is well-known that under the widely-studied exchangeable network model, uniformly [14, 13, 51], but this is rough unless the network is very large, so one naturally yearns for a finer approximation. To this end, several network bootstrap methods have been proposed recently [14, 13, 51, 76, 79] to address this question, and they quickly inspired many follow-up works [100, 99, 50, 32] that clearly reflect the interest from the application side in accurate approximations. However, compared to their empirical effectiveness, the theoretical support for network bootstraps remains weak. Almost all existing justifications of network bootstraps critically depend on the following type of result
where or are bootstrapped statistics, combined with the asymptotic normality of or . But this approach cannot show whether network bootstraps have any accuracy advantage over a simple normal approximation, especially considering the much higher computational cost of bootstrapping.
In this paper, we propose the first provably higher-order accurate approximation to the sampling distribution of a given studentized network moment. We briefly summarize our main theorems in the following informal statement.
Theorem 1.1 (Informal statement of main theorems).
Assume the network is generated from an exchangeable model. Define the Edgeworth expansion for a given network moment, whose motif has nodes and edges, as follows:
where are the CDF and PDF of , and , , and are estimable quantities depending only on the graphon and the motif, to be defined in Section 3. Let denote the network sparsity parameter. Under the following assumptions:
(i) ,
(ii) either is acyclic and , or cyclic and ,
(iii) either , or ,
we have
(1.1) 
where , and , defined in (3.8), satisfies . Under the same conditions, the empirical Edgeworth expansion with estimated coefficients (see (3.11)) satisfies
(1.2) 
1.2 Our contributions
Our contributions are threefold. First, we establish the first accurate distribution approximation for network moments (1.1), originating from our novel insights on the surprising roles that network noise and sparsity play in this setting. Second, we propose a provably highly accurate and computationally efficient empirical Edgeworth approximation (1.2) for practical use. Third, our results pave the way for future developments in accurate and fast nonparametric network inference.
To understand the strength of our main results (1.1) and (1.2), notice that for mildly sparse networks, we achieve higher-order accuracy in distribution approximation without non-lattice or smoothness assumptions. The non-lattice assumption is universally imposed in all related literature known to the authors in which higher-order accuracy is pursued. However, this assumption is violated by some popular network models, including the stochastic block model, arguably the most important network model. Waiving the graphon smoothness assumption makes our approach a powerful tool for model-free, exploratory network analysis and for analyzing networks with irregularities.
The key insight is our novel view of the sample network moment as a noisy U-statistic, where "noise" refers to the edgewise observational errors in . Our analysis reveals the connections and differences between the noisy and the conventional noiseless U-statistic settings. We discovered, with surprise, the blessing roles that the two typically-hated factors, namely edgewise observational errors and network sparsity, jointly play in this setting:
(1) The errors behave like a smoother that tames the potential distribution discontinuity due to a lattice or discrete network population;
(2) Network sparsity then boosts the smoothing effect of the error term to a sufficient level such that becomes analytically tractable.
In our proofs, we present an original analysis that carefully quantifies the impact of this smoothing effect. Our proof techniques are very different from those in the network bootstrap papers [13, 51, 76, 79]. It seems unlikely that our assumptions can be substantially relaxed, since they match well-known minimum conditions in related settings.
Our empirical Edgeworth expansion (1.2) is very fast, much more scalable than network bootstraps, and easily permits parallel computing.
As an application of our theory, we present the first proof of the higher-order accuracy of some mainstream network bootstrap techniques under certain conditions, which their original proposing papers did not establish. Our results also enable rich future work on accurate and highly efficient network inference. We present two immediate applications, testing and confidence intervals for network moments, both with explicit accuracy guarantees.
1.3 Paper organization
The rest of this paper is organized as follows. In Section 2, we formally set up the problem and provide a detailed literature review. In Section 3, we present our core ideas, derive the Edgeworth expansion, and establish its uniform approximation error bound. We also discuss different versions of the studentization form. In Section 4, we present three applications of our results: bootstrap accuracy, a one-sample test, and a one-sample Cornish-Fisher confidence interval. Section 5 presents simulation studies. Section 6 discusses our results and future work.
2 Problem setup and literature review
2.1 Exchangeable networks and graphon model
The base model of this paper is the exchangeable network model [42, 15]. Exchangeability describes the unlabeled nature of many networks in social, knowledge, and biological contexts, where node indexes do not carry meaningful information. It is a very rich family that contains many popular models as special cases, including the stochastic block model and its variants [61, 115, 111, 1, 69, 114, 66], the configuration model [35, 85], latent space models [60, 52], and general smooth graphon models [33, 49, 113].
Exchangeable networks can be succinctly formulated by the Aldous-Hoover representation [2, 63]: the nodes correspond to latent positions drawn i.i.d. from the Uniform(0,1) distribution. Network generation is governed by a measurable latent graphon function that encodes all structures. The edge probability between nodes is
(2.1) 
where the sparsity parameter absorbs the constant factor, and . We only observe the adjacency matrix :
(2.2) 
The model (2.1)-(2.2) has a well-known issue: both and are only identifiable up to equivalence classes [29]. This may pose significant challenges for some model-based network inferences, since is a natural part of modeling the network population. Meanwhile, network moments are permutation-invariant and thus clearly immune to this identification issue.
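The generative model (2.1)-(2.2) can be sketched in a few lines of code. The following is a minimal illustration under assumed inputs; the function name and the example graphon are our own choices for demonstration, not part of the paper:

```python
import numpy as np

def sample_graphon_network(n, graphon, rho=1.0, rng=None):
    """Sample one network in the spirit of (2.1)-(2.2): latent positions
    X_i ~ Uniform(0,1), edge probabilities rho * graphon(X_i, X_j), and
    conditionally independent Bernoulli edges (no self-loops)."""
    rng = np.random.default_rng(rng)
    x = rng.uniform(size=n)                                  # latent positions
    w = np.clip(rho * graphon(x[:, None], x[None, :]), 0.0, 1.0)
    coins = rng.random((n, n)) < w                           # Bernoulli draws
    a = np.triu(coins, k=1)                                  # upper triangle only
    return (a | a.T).astype(int), x                          # symmetrize

# illustrative smooth graphon (an assumption for this sketch)
A, x = sample_graphon_network(200, lambda u, v: (u + v) / 2, rho=0.5, rng=0)
```

Here `rho` plays the role of the sparsity parameter: shrinking it makes every edge probability, and hence the expected degree, proportionally smaller.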
2.2 Network moment statistics
To formalize network moments, it is more convenient to define the sample version first and then the population version. Each network moment is indexed by its corresponding motif . For simplicity, we follow the convention of focusing on connected motifs. Let represent the adjacency matrix of , which has nodes and edges. For any node subnetwork of , define
(2.3) 
where “” means there exists a permutation map , such that , where is defined as . Define the sample network moment as
(2.4) 
and its sample-population version and population version are defined as and , respectively. We call a noisy U-statistic because it is based on , and we call the conventional noiseless U-statistic
where will be specified later. Similarly, the noiseless versions of can be defined by and , respectively, where and is a proper estimator for based on .
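For concreteness, the sample moment in (2.4) can be computed by brute-force enumeration for a small motif. The sketch below handles the triangle motif; the function name is ours, and the enumeration is only practical for small networks:

```python
import numpy as np
from itertools import combinations

def triangle_moment(A):
    """Sample relative frequency of the triangle motif: the fraction of
    all 3-node subsets whose induced subnetwork contains all three edges."""
    n = A.shape[0]
    hits = sum(int(A[i, j] and A[j, k] and A[i, k])
               for i, j, k in combinations(range(n), 3))
    return hits / (n * (n - 1) * (n - 2) // 6)
```

For triangles specifically, the raw count also equals `np.trace(A @ A @ A) / 6`, which is much faster in practice than Python-level enumeration.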
2.3 Edgeworth expansions for i.i.d. data and noiseless U-statistics
Edgeworth expansion [44, 102] refines the central limit theorem. It is the supporting pillar in the justification of the bootstrap's higher-order accuracy. In this subsection, we review the literature on Edgeworth expansions for i.i.d. data and for U-statistics, due to their close connection. Under mild conditions, the one-term Edgeworth expansion for i.i.d. mean-zero, unit-variance is , where and are the CDF and PDF of , respectively. Higher-order Edgeworth terms can be derived [57] but are not practically meaningful without knowing the true population moments appearing in the coefficients. The minimax rate for estimating is , so is the best possible practical remainder control for an Edgeworth expansion. For further references, see [18, 93, 12, 55, 56, 7] and the textbooks [57, 40, 104].
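The one-term i.i.d. expansion reviewed above can be written out directly. A minimal sketch follows (function names are ours); it uses the standard one-term formula G_n(x) = Phi(x) + phi(x) * skew * (1 - x^2) / (6 sqrt(n)), whose O(n^{-1/2}) correction vanishes when the third cumulant is zero, recovering the normal approximation:

```python
from math import erf, exp, pi, sqrt

def norm_pdf(x):
    # standard normal density phi(x)
    return exp(-x * x / 2) / sqrt(2 * pi)

def norm_cdf(x):
    # standard normal CDF Phi(x), via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def edgeworth_cdf(x, skew, n):
    """One-term Edgeworth approximation to the CDF of the standardized
    mean of n i.i.d. mean-zero, unit-variance variables with third
    cumulant `skew`."""
    return norm_cdf(x) + norm_pdf(x) * skew * (1 - x * x) / (6 * sqrt(n))
```

With `skew=0` the function reduces exactly to `norm_cdf`, illustrating why lattice or skewed populations are precisely the cases where the correction term matters.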
The literature on Edgeworth expansions for U-statistics concentrates on the noiseless version. In the early 1980s, [26, 65, 27] established the asymptotic normality of and then with an remainder. Then [25, 16, 73] approximated degree-two (i.e., ) standardized U-statistics with an remainder, and [9] established an bound under relaxed conditions for more general symmetric statistics. Empirical Edgeworth expansions were studied in [58, 90], which established bounds. For finite populations, [8, 20, 19, 21] established the earliest results, and we will use some of them in our analysis of network bootstraps. An incomplete list of other notable works includes [10, 53, 67, 80, 11, 68].
2.4 The non-lattice condition and lattice Edgeworth expansions in the i.i.d. setting
A major assumption called the non-lattice condition is critical for achieving accuracy in Edgeworth expansions, including all results in the i.i.d. setting that do not require oracle moment knowledge and all results for noiseless U-statistics; this condition, however, is clearly not required by an accuracy bound
For U-statistics, this condition is imposed on . Cramér's condition can be relaxed [5, 82, 96, 97] toward a non-lattice condition, but all known essential relaxations come at the price of essentially depreciated error bounds
In network analysis, however, Cramér's condition is violated by the stochastic block model, arguably the most important network model. In a block model, depends only on node 's community membership and is thus discrete. Also, the non-lattice and Cramér's conditions are difficult to check in practice. Moreover, some non-constant smooth models may even yield a lattice when paired with some motifs but not with others. For example, the graphon yields a lattice when is an edge, but a non-lattice when is a triangle.
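The block-model latticeness can be seen numerically: under a two-block graphon, the projection of the graphon onto a single latent position is piecewise constant, so the dominant linear term of the moment statistic is supported on finitely many values. A sketch with assumed illustrative block probabilities (the parameters below are ours, chosen so the projection is non-degenerate):

```python
import numpy as np

# Assumed two-block SBM graphon: within-block probabilities 0.8 (block 1,
# u < 1/2) and 0.4 (block 2, u >= 1/2), and 0.2 across blocks.
def W(u, v):
    u, v = np.asarray(u), np.asarray(v)
    b1 = (u < 0.5) & (v < 0.5)
    b2 = (u >= 0.5) & (v >= 0.5)
    return np.where(b1, 0.8, np.where(b2, 0.4, 0.2))

def g(u, grid=10_000):
    """Projection g(u) = integral of W(u, v) dv (midpoint rule); this is
    what drives the dominant linear term for the edge motif."""
    v = (np.arange(grid) + 0.5) / grid
    return np.array([W(ui, v).mean() for ui in np.atleast_1d(u)])

u = np.random.default_rng(0).uniform(size=40)
support = np.unique(np.round(g(u), 6))   # piecewise constant: two values only
```

Every latent position maps to one of just two values of `g`, so the linearized statistic is discrete, which is exactly the situation where Cramér-type conditions fail.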
Next, we present a detailed review of the approaches in the literature for treating a lattice , and the key inspiration for our work. So far, latticeness can only be analytically remedied in the i.i.d. setting without losing accuracy. Existing approaches fall into two mainstreams: (1) adding an artificial error term to the sample mean to smooth out lattice-induced discontinuity [95, 72]; and (2) formulating the lattice-version Edgeworth expansion with a jump function [95]. The seminal work [95] added a uniform error with bandwidth and, by reversing its impact in the smoothed distribution function, exactly formulated the lattice Edgeworth expansion with remainder. Another classical work [72] used a normal error instead of a uniform one, and showed that the Gaussian bandwidth must be and to smooth sufficiently without introducing an distribution distortion. Other notable works include [107, 70, 6].
The intrinsic difficulty of the lattice problem has obstructed significant further advances. First, the artificial error term, despite reinstating a tractable formula, brings an distortion to the original distribution
3 Edgeworth expansions for network moments
3.1 Outline and core ideas to analyze
Our key discovery is that the studentized noisy U-statistic can be decomposed as follows:
(3.1) 
where can be roughly understood as a studentized noiseless U-statistic, similarly to , and .
Our decomposition (3.1) revives the spirit of [95] and [72], but with the following crucial differences. First and most importantly, the error term in our formula is not artificial but is naturally a constituent of . Therefore, the smoother does not distort the objective distribution; that is, is self-smoothed. The second difference lies in the bandwidth of the smoothing error term. The Gaussian bandwidth is not ours to choose, as in [95] and [72], but is governed by the network sparsity, so if is lattice, we need to gain sufficient smoothing power. This echoes the lower bound on the Gaussian bandwidth in [72]. We also need to be lower bounded for other reasons; see Lemma 3.1. Third, our error term depends on through . In our analysis, we carefully handle this dependency with original arguments.
3.2 Decomposition of the stochastic variations of
To simplify the narration, in this subsection we focus on analyzing , as the analysis of is conceptually similar. The stochastic variations in come from two sources: the randomness in due to and ultimately , and the randomness in due to , the edgewise observational errors.
The stochastic variation in as a conventional noiseless U-statistic is well understood thanks to Hoeffding's decomposition [59]
(3.2) 
where are defined as follows. To avoid complicated subscripts, without confusion we define ’s for special indexes . For indexes , (only when ) and , define , for and . From the classical literature, we know that , where the strict subset could be , and unless and . Consequently, the linear part of Hoeffding’s decomposition is dominant. Define
(3.3) 
We focus on discussing the stochastic variation in . The typical treatment in the network bootstrap literature is to simply bound and ignore this component, as in, e.g., Lemma 7 of [51]. We shall instead reveal its key smoothing effect through a refined analysis. To better understand the impact of , let us inspect two simple examples.
Example 3.1.
Let be an edge with and , so that is simply the sample edge density. By definition, all terms are mutually independent given . Then with a uniform Berry-Esseen CDF approximation error.
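The conditional-independence structure in Example 3.1 can be checked by simulation: fixing the latent positions, the sample edge density fluctuates around its conditional mean with variance given exactly by the sum of the edgewise Bernoulli variances over node pairs. A minimal Monte Carlo sketch (the graphon and all parameters are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 150, 4000
x = rng.uniform(size=n)                                      # fixed latents
w = (0.5 * x[:, None] * x[None, :])[np.triu_indices(n, 1)]   # edge probabilities
m = w.size                                                   # number of node pairs

# B independent networks sharing the same latent positions;
# record the sample edge density of each draw
u_hat = np.array([(rng.random(m) < w).mean() for _ in range(B)])

# conditional variance formula: sum_ij W_ij (1 - W_ij) / m^2,
# since the edge errors A_ij - W_ij are independent given the latents
theory = (w * (1 - w)).sum() / m**2
```

The empirical variance of `u_hat` across the repeated draws should match `theory` up to Monte Carlo error, confirming that, given the latents, the noise part is an average of independent centered Bernoulli errors.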
The next example shows that the insight of Example 3.1 generalizes.
Example 3.2.
Let be a triangular motif with , so that is the empirical triangle frequency. We can decompose as follows:
where and . The linear part is of order and dominates if , noticing that all and terms are mutually uncorrelated given .
3.3 Studentization form
The understanding of developed in Section 3.2 prepares us to fully specify . We now design . In , we observe and . We shall assume , so that dominates. There are two main choices of . The conventional choice for studentizing noiseless U-statistics [27, 58, 90] suggests the jackknife estimator
(3.4) 
where is calculated on the subnetwork of induced by removing the th node. Despite its conceptual straightforwardness, the jackknife estimator unnecessarily complicates the analysis. Therefore, we use an estimator with a simpler formulation: in , we replace by its moment estimator. We design as follows
We will show in Theorem 3.3 that the two estimators are in fact equivalent.
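The delete-one jackknife idea behind (3.4) can be sketched generically: recompute the statistic on every (n-1)-node induced subnetwork and rescale the spread of the leave-one-out values. This is a hedged illustration with our own function names; `edge_density` is a hypothetical example statistic, not the paper's estimator:

```python
import numpy as np

def jackknife_variance(A, stat):
    """Delete-one jackknife variance estimate in the spirit of (3.4):
    evaluate `stat` on each (n-1)-node induced subnetwork, then rescale
    the squared spread of the leave-one-out values by (n-1)/n."""
    n = A.shape[0]
    loo = np.array([stat(np.delete(np.delete(A, i, 0), i, 1))
                    for i in range(n)])
    return (n - 1) / n * np.sum((loo - loo.mean()) ** 2)

def edge_density(A):
    # illustrative network moment: average over the upper triangle
    n = A.shape[0]
    return A[np.triu_indices(n, 1)].mean()
```

The recomputation over all n subnetworks is exactly why the jackknife is more expensive than a plug-in moment estimator, which motivates the simpler studentization adopted in the text.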
Next, we expand . For simplicity, define the following shorthand
(3.5)  
where in (3.5), the technical intermediate term is defined as
We now show that can be expanded as follows.
(3.6) 
where
(3.7) 
The form (3.6) is partially justified by the Taylor expansion , with [80]; the remaining justification comes from our main lemma, Lemma 3.1.
Definition 3.1 (Acyclic and cyclic motifs, see also [14, 13, 76]).
A motif is called acyclic if its edge set is a subset of a tree. The motif is called cyclic if it is connected and contains at least one cycle. In other words, a cyclic motif is connected but not a tree.
Definition 3.2.
To simplify the statements of our method's error bounds under different motif shapes, especially in Table 1 and in the proof steps, define the following shorthand
(3.8) 
Lemma 3.1.
Assume the following conditions hold:
(i) , ,
(ii) either is acyclic and , or cyclic and ,
where is a universal constant. We have the following results:
(1) ;
(2) we have
(3.9) where the definition of is lengthy and is formally stated in Section 7 of the supplementary material. As , we have ;
(3) ;
(4) we have
Remark 3.1.
Assumption (i) is a standard non-degeneracy assumption in the literature. It should not be confused with a graphon smoothness assumption. A globally smooth Erdős-Rényi graphon leads to a degenerate . In the degenerate setting, both the standardization/studentization and the analysis would be very different. Asymptotic results for motifs under an Erdős-Rényi graphon were established in [48, 47]. Degenerate U-statistics are outside the scope of this paper.
Remark 3.2.
Assumption (ii) concerns the randomness in and guarantees the domination of the linear part of
Remark 3.3.
Remark 3.4.
3.4 Population and empirical Edgeworth expansions for network moments
In this subsection, we present our main theorems.
Theorem 3.1 (Population network Edgeworth expansion).
Define
(3.10) 
where and are the CDF and PDF of . Assume the conditions of Lemma 3.1 hold, and additionally that either or Cramér's condition holds. Then we have
Remark 3.5.
The upper bound assumed on in the absence of Cramér's condition serves to sufficiently boost the smoothing power of , as quantified in Lemma 3.1(3.9). This assumption is unlikely to be improvable, since its required Gaussian variance matches the minimum Gaussian standard-deviation requirement in Remark 2.4 of [72] for the i.i.d. setting.
In (3.10), the Edgeworth coefficients depend on the true population moments. In practice, they need to be estimated from data. Define
where we write "" rather than "" for cleanness. We stress that the evaluation of and requires only the indexes, not the latent . Then the Edgeworth coefficients can be estimated by
Theorem 3.2 (Empirical network Edgeworth expansion).
Define the empirical Edgeworth expansion as follows:
(3.11) 
Under the conditions of Theorem 3.1, we have