An Iterative Step-Function Estimator for Graphons
Exchangeable graphs arise via a sampling procedure from measurable functions known as graphons. A natural estimation problem is how well we can recover a graphon given a single graph sampled from it. One general framework for estimating a graphon uses step-functions obtained by partitioning the nodes of the graph according to some clustering algorithm. We propose an iterative step-function estimator (ISFE) that, given an initial partition, iteratively clusters nodes based on their edge densities with respect to the previous iteration’s partition. We analyze ISFE and demonstrate its performance in comparison with other graphon estimation techniques.
Latent variable models of graphs can be used to model hidden structure in large networks and have been applied to a variety of problems such as community detection [?] and link prediction [?]. Furthermore, many graphs are naturally modeled as exchangeable when the nodes have no particular ordering [?]. Examples of exchangeable graph models include the stochastic block model (SBM) [?] and its extensions [?], latent feature models [?], and latent distance models [?].
Several key inference problems in exchangeable graph models can be formulated in terms of estimating symmetric measurable functions $W \colon [0,1]^2 \to [0,1]$, known as graphons. There is a natural sampling procedure that produces an exchangeable (undirected) random graph from a graphon $W$ by first sampling a countably infinite set of independent uniform random variables $\{U_i\}_{i \in \mathbb{N}}$, and then sampling an edge between every pair of distinct vertices $i$ and $j$ according to an independent Bernoulli random variable with weight $W(U_i, U_j)$. In the case where the graphon is constant or piecewise constant with a finite number of pieces, this procedure recovers the standard notions of Erdős–Rényi graphs and stochastic block models, respectively. But this procedure is much more general; indeed, [?] and [?] showed, via what can be viewed as a higher-dimensional analogue of de Finetti’s theorem, that the distribution of any exchangeable graph is some mixture of such sampling procedures from graphons.
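The two-step sampling procedure can be sketched in a few lines of Python. This is a minimal illustration rather than code from the paper; the names `sample_graph`, `constant_graphon`, and `two_block_graphon` are hypothetical, and a graphon is assumed to be given as a symmetric Python function on the unit square.

```python
import random

def sample_graph(W, n, seed=0):
    """Sample an n-vertex graph from a graphon W (symmetric, values in [0, 1])."""
    rng = random.Random(seed)
    # Step 1: one latent uniform position per vertex.
    U = [rng.random() for _ in range(n)]
    # Step 2: an independent Bernoulli edge for each unordered pair i < j.
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < W(U[i], U[j]):
                A[i][j] = A[j][i] = 1
    return U, A

def constant_graphon(p):
    # A constant graphon yields an Erdos-Renyi graph with edge probability p.
    return lambda x, y: p

def two_block_graphon(p_in, p_out):
    # Piecewise constant with two equal steps: a two-class stochastic block model.
    return lambda x, y: p_in if (x < 0.5) == (y < 0.5) else p_out

U, A = sample_graph(two_block_graphon(0.8, 0.1), n=50, seed=1)
```

Returning the latent positions `U` alongside the adjacency matrix `A` is a convenience for synthetic experiments only; in the estimation problems considered here, `U` is not observed.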
Graphon estimation has been studied in two contexts: (1) graphon function estimation [?], where we are concerned with inverting the entire sampling procedure to recover a measurable function from a single sampled graph, and (2) graphon value estimation, where we are interested in inverting just the second step of the sampling procedure, to obtain estimates of the latent values $W(U_i, U_j)$ from a single graph [?] (or several [?]) sampled using the sequence $\{U_i\}_{i \in \mathbb{N}}$.
Graphons are well-approximated by step-functions in the cut distance [?], a notion of distance between graphs that extends to graphons, which we describe in Section 2. Although the topology on the collection of graphons induced by the cut distance is coarser than that induced by $L^2$ (as used in MSE and MISE risk), two graphons are close in the cut distance precisely when their random samples (after reordering) differ by a small fraction of edges. Hence it is natural to consider graphon estimators that produce step-functions; this has been extensively studied with the stochastic block model.
A standard approach to approximating graphons using step-functions is to first partition the vertices of the sampled graph and then return the step-function graphon determined by the average edge densities between classes of the partition. In this way, every clustering algorithm can be seen to induce a graphon estimation procedure (Section 3.2). While many clustering algorithms thereby give rise to tractable graphon estimators, one challenge is to produce clustering algorithms that induce good estimators. In this paper, we introduce a method, motivated by the cut distance, that takes a vertex partition and produces another partition that yields an improved graphon estimate. By iterating this method, even better estimates can be obtained. We describe and analyze the graphon estimator that results from this iterative procedure applied to the result of a clustering algorithm.
We propose iterative step-function estimation (ISFE), a computationally tractable graphon estimation procedure motivated by the goal, suggested by the Frieze–Kannan weak regularity lemma, of finding a partition that induces a step-function estimate close in cut distance to the original graphon. ISFE iteratively improves a partition of the vertices of the sampled graph by considering the average edge densities between each vertex and each of the classes of the existing partition (Section 3).
We analyze a variant of ISFE on graphs sampled from a -step stochastic block model, and demonstrate a sense in which ISFE correctly classifies an arbitrarily large fraction of the vertices, as the number of vertices of the sampled graph and number of classes in the partition increase (Section ?).
Finally, we evaluate our graphon estimation method on data sampled from several graphons, comparing ISFE against several other graphon estimation methods (Section ?). ISFE quickly recovers detailed structure in samples from graphons having block structure, and remains competitive with other tractable graphon estimators on various classes of continuous graphons while making fewer structural assumptions.
2 Background and related work
Throughout this paper, graphs are undirected and simple; we consider sequences of graphs that are dense, in that a graph with $n$ vertices has $\Omega(n^2)$ edges. For natural numbers $n \geq 1$, we define a graph on $[n]$ to be a graph with set of vertices $[n] := \{1, \ldots, n\}$; its adjacency matrix is the $\{0,1\}$-valued $n \times n$ matrix $(G_{ij})_{i,j \in [n]}$, where $G_{ij} = 1$ iff $G$ has an edge between vertices $i$ and $j$. Graphs on $\mathbb{N}$, and their adjacency matrices, are defined similarly. We write $X \overset{d}{=} Y$ when two random variables $X$ and $Y$ are equal in distribution, and abbreviate almost surely and almost everywhere by a.s. and a.e., respectively.
For detailed background on graphons and the relationship between graphons and exchangeable graphs, see the book by [?] and surveys by [?] and [?]. Here we briefly present the key facts that we will use.
A random graph on $\mathbb{N}$ is exchangeable when its distribution is invariant under arbitrary permutations of $\mathbb{N}$. In particular, if such a graph is not a.s. empty, then the marginal probability of an edge between any two vertices is positive.
Note that a random graph on $\mathbb{N}$ is exchangeable precisely when its adjacency matrix is jointly exchangeable. We now define a sampling procedure that produces exchangeable graphs.
A graphon can be thought of as a continuum-sized, edge-weighted graph. We can sample from a graphon in the following way.
Every $W$-random graph is exchangeable, as is any mixture of $W$-random graphs. Conversely, the following statement is implied by the Aldous–Hoover theorem, a two-dimensional generalization of de Finetti’s theorem, which characterizes exchangeable sequences as mixtures of i.i.d. sequences.
The Aldous–Hoover representation has since been extended to higher dimensions, more general spaces of random variables, and weaker notions of symmetry; for a detailed presentation, see [?].
Since every exchangeable graph is a mixture of graphon sampling procedures, many network models can be described in this way [?]. The stochastic block model [?] is such an example, as explored further by [?] and others; it plays a special role as one of the simplest models that can approximate arbitrary graphon sampling procedures. Some Bayesian nonparametric models, including the eigenmodel [?], Mondrian process graph model [?], and random function model [?], were built knowing the Aldous–Hoover representation. Furthermore, many other such models are naturally expressed in terms of a distribution on graphons [?], including the infinite relational model (IRM) [?], the latent feature relational model (LFRM) [?], and the infinite latent attribute model (ILA) [?].
Two different graphons can give rise to the same distribution on graphs, in which case we say that they are weakly isomorphic. For example, modifying a graphon on a measure zero subset does not change the distribution on graphs. Moreover, applying a measure-preserving transformation to the unit interval, before sampling the graphon, leaves the distribution on graphs unchanged. The following is a consequence of Proposition 7.10 and Equation (10.3) of [?].
Thus, the graphon from which an exchangeable graph is sampled is non-identifiable; see [?]. Such measure-preserving transformations are essentially the only freedom allowed. Hence the appropriate object to estimate is a graphon up to weak isomorphism.
As a result of Theorem ?, when considering the problem of estimating a graphon, we only ask to recover the graphon up to a measure-preserving transformation; this is analogous to a key aspect of the definitions of cut distance and of distance between graphons, which we describe in Appendix ?.
2.2 The graphon estimation problems
Given a graph with adjacency matrix $(G_{ij})$ sampled according to Equation ?, there are two natural ways one may seek to invert this sampling procedure, corresponding to inverting one or both of the sampling steps. The “graphon value estimation problem” aims to invert the second step of the sampling procedure, and hence can be thought of as finding the local underlying structure of a graph sampled from a graphon (without concluding anything about the graphon at any location not involved in the sample). Suppose we sample the $W$-random graph $\mathbb{G}(n, W)$ using $\{U_i\}_{i \in [n]}$ as in Equation ?. Graphon value estimation consists of giving an estimator $\hat{M}$ for the matrix $M = (M_{ij})_{i,j \in [n]}$ where each $M_{ij} := W(U_i, U_j)$. One measure of success for the graphon value estimation problem is given by the mean squared error:
$$\mathrm{MSE}(\hat{M}) := \mathbb{E}\Bigl( \frac{1}{n^2} \sum_{i,j \in [n]} \bigl( M_{ij} - \hat{M}_{ij} \bigr)^2 \Bigr),$$
as used by [?] and [?] (see also [?]). Whereas MSE in nonparametric function estimation is typically with respect to particular points of the domain (see, e.g., [?]), here the random sequence $\{U_i\}_{i \in [n]}$ is latent, and so we take the expectation also with respect to the randomness in the terms $U_i$ (and hence in the terms $M_{ij}$), following [?].
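As a rough illustration of how this mean squared error could be approximated in a synthetic experiment, the sketch below averages the per-sample squared error over repeated draws of the latent uniform variables and the graph. The function names (`latent_matrix`, `squared_error`, `monte_carlo_mse`, `density_estimator`) are hypothetical, and the constant-density estimator is used only as a simple baseline, not as a method from the paper.

```python
import random

def latent_matrix(W, U):
    # Matrix of latent graphon values W(U_i, U_j).
    return [[W(u, v) for v in U] for u in U]

def squared_error(M, M_hat):
    # Average squared difference over all n^2 entries.
    n = len(M)
    return sum((M[i][j] - M_hat[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)

def monte_carlo_mse(W, estimator, n, trials=20, seed=0):
    """Approximate the expectation by averaging over fresh samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        U = [rng.random() for _ in range(n)]
        A = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                if rng.random() < W(U[i], U[j]):
                    A[i][j] = A[j][i] = 1
        total += squared_error(latent_matrix(W, U), estimator(A))
    return total / trials

def density_estimator(A):
    # Baseline: estimate every entry by the overall edge density (needs n >= 2).
    n = len(A)
    p = sum(map(sum, A)) / (n * (n - 1))
    return [[p] * n for _ in range(n)]

mse = monte_carlo_mse(lambda x, y: 0.5, density_estimator, n=30)
```

The estimator only ever sees the adjacency matrix `A`; the latent matrix is used solely for scoring, which is possible because the run is synthetic.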
The “graphon function estimation problem” aims to invert the entire sampling procedure to recover a graphon (i.e., symmetric measurable function). A notion of success for the graphon function estimation problem, used by [?], [?], and [?], is given by the mean integrated squared error for an estimator $\hat{W}$ of a graphon $W$:
$$\mathrm{MISE}(\hat{W}) := \mathbb{E} \inf_{\varphi} \int_{[0,1]^2} \bigl( W(x, y) - \hat{W}(\varphi(x), \varphi(y)) \bigr)^2 \, dx \, dy,$$
where $\varphi$ ranges over measure-preserving transformations of $[0,1]$. However, as we describe in Appendix ?, there are pairs of graphons whose random graphs are close in distribution, but which are themselves far in distance. An alternative global notion of success for the function estimation problem is to use the distribution of such random graphs directly [?], or to use the cut distance, defined in terms of the cut along which two graphs differ the most in their edge densities, which also captures this notion of subsamples being close in distribution; see Appendix ?.
The distinction between these two problems is analogous to the typical distinction between MSE and MISE in nonparametric function estimation [?]; see also the two estimation problems in [?].
In general, it is impossible to recover a measurable function from its values at a countable number of points. However, if we assume that the measurable function has specific structure (e.g., is continuous, Lipschitz, a step-function, etc.), then it may become possible. As a result, many graphon estimation methods, which we describe below, require the graphon to have a representation of a certain form. However, the problem of recovering a real-valued function from its values at a random set of inputs, under various assumptions on the function, may be treated separately from the estimation of these values. Hence in this paper, while we illustrate the step-function graphon provided by ISFE, we evaluate its graphon value estimate using MSE.
2.3 Graphon estimation methods
The first study of graphon estimation was by [?] in the more general context of exchangeable arrays. This work predates the development of the theory of graphons; for details, see [?].
A number of graphon estimators have been proposed in recent years. Here we mention several that are most closely related to our approach. The stochastic block model approximation (SBA) [?] requires multiple samples on the same vertex set, but is similar to our approach in some respects, as it partitions the vertex set according to the metric on their edge vectors (in essence, the vector of average edge densities with respect to the discrete partition). Sorting and smoothing (SAS) [?] takes a different approach to providing a computationally tractable estimator, requiring the graphon to have absolutely continuous degree distribution.
Several estimators use spectral methods, including universal singular value thresholding (USVT) [?]. Rather than estimating a specific cluster and using this to define a step-function, [?] first estimate a co-cluster matrix and then obtain a graphon estimate from this matrix by using eigenvalue truncation and $k$-means.
Other recent work in graphon estimation has focused on minimax optimality, histogram bin width, estimation using moments, or consequences of the graphon satisfying certain Lipschitz or Hölder conditions [?].
The estimation problem for latent space models can also be seen as graphon estimation, as such models are equivalent to graphon sampling procedures for graphons having nicer properties than mere measurability [?].
Many of the above graphon estimators are formulated in the setting of bipartite graphs and separate exchangeability, where the distribution is invariant under separate permutations of the rows and columns. For notational simplicity, we focus on the case of arbitrary undirected graphs, whose adjacency matrices are symmetric, and for which joint exchangeability is the appropriate notion, but many of our results have straightforward analogues for bipartite graphs.
3 Iterative step-function estimation
We first discuss how a partition of a finite graph’s vertex set induces a step-function graphon and how clustering algorithms produce step-function graphon estimators. Next we propose iterative step-function estimation (ISFE), an approach to iteratively improving such estimates by forming a new partition whose classes contain vertices that have similar edge densities with respect to the old partition.
3.1 Step-function estimators for graphons
A step-function graphon can be associated with any finite graph given a partition of its vertices. Our presentation largely follows §7.1 and §9.2 of [?], with modified notation.
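A graphon $W$ is called a step-function when there is a partition $\{S_1, \ldots, S_k\}$ of $[0,1]$ into finitely many measurable pieces, called steps, such that $W$ is constant on each set $S_i \times S_j$. Suppose $H$ is a vertex-weighted, edge-weighted graph on $[n]$, with vertex weights $\alpha_i$ and edge-weights $\beta_{ij}$ for $i, j \in [n]$. Then the step-function graphon $W_H$ associated with $H$ is defined by $W_H(x, y) = \beta_{ij}$ for $x \in J_i$ and $y \in J_j$, where the steps $J_1, \ldots, J_n$ form a partition of $[0,1]$ into consecutive intervals of size $\alpha_i / \sum_{j \in [n]} \alpha_j$ for $i \in [n]$. (Consider an unweighted finite graph $G$ to be the weighted graph with vertex weights $\alpha_i = 1$ and edge weights $\beta_{ij} = G_{ij}$.)
Given a graph $G$ on $[n]$ and vertex sets $X, Y \subseteq [n]$, write $c_G(X, Y) := \sum_{i \in X} \sum_{j \in Y} G_{ij}$ for the number of edges across the cut $(X, Y)$. Then the edge density in $G$ between $X$ and $Y$ is defined to be $e_G(X, Y) := c_G(X, Y) / (|X| \, |Y|)$; when $X$ and $Y$ are disjoint, this quantity is the fraction of possible edges between $X$ and $Y$ that $G$ contains.
Now suppose $G$ is a graph on $[n]$ and $\mathcal{P} = \{P_1, \ldots, P_k\}$ is a partition of the vertices of $G$ into $k$ classes. The quotient graph $G/\mathcal{P}$ is defined to be the weighted graph on $[k]$ with respective vertex weights $|P_i|/n$ and edge weights $e_G(P_i, P_j)$. For our estimation procedure, we will routinely pass from a sampled graph $G$ and a partition $\mathcal{P}$ of its vertex set to the graphon $W_{G/\mathcal{P}}$ formed from the quotient $G/\mathcal{P}$.
One may similarly define the step-function graphon $W_{\mathcal{P}}$ of $W$ with respect to a measurable partition $\mathcal{P} = \{P_1, \ldots, P_k\}$ of $[0,1]$ as the step-function graphon of the weighted graph with each vertex weight $\alpha_i$ equal to the measure $\lambda(P_i)$ of $P_i$ and edge weight $\beta_{ij} = \frac{1}{\lambda(P_i) \lambda(P_j)} \int_{P_i \times P_j} W(x, y) \, dx \, dy$.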
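To make the passage from a graph and a vertex partition to a step-function concrete, here is a small sketch. It assumes the usual conventions (edge density as the sum of adjacency entries over a pair of vertex sets divided by the product of their sizes, and steps laid out as consecutive intervals proportional to class sizes); the function names `edge_density`, `quotient`, and `step_function` are hypothetical.

```python
def edge_density(A, X, Y):
    # Sum of adjacency entries over X x Y, divided by |X| |Y|.
    # For X = Y this counts each internal edge twice and includes the zero diagonal.
    return sum(A[i][j] for i in X for j in Y) / (len(X) * len(Y))

def quotient(A, partition):
    """Vertex weights and edge weights of the quotient of A by a partition."""
    n = len(A)
    weights = [len(P) / n for P in partition]
    dens = [[edge_density(A, P, Q) for Q in partition] for P in partition]
    return weights, dens

def step_function(weights, dens):
    """Step-function on [0,1]^2 with consecutive intervals of the given sizes."""
    bounds, acc = [], 0.0
    for w in weights:
        acc += w
        bounds.append(acc)

    def index(x):
        for i, b in enumerate(bounds):
            if x < b:
                return i
        return len(bounds) - 1  # x == 1.0, or rounding at the right endpoint

    return lambda x, y: dens[index(x)][index(y)]

# Two disjoint edges {0,1} and {2,3}, partitioned into their components.
A = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
weights, dens = quotient(A, [[0, 1], [2, 3]])
W_hat = step_function(weights, dens)
```

In this toy example the within-class density is 0.5 rather than 1 because the diagonal entries of the adjacency matrix are zero; the effect shrinks as classes grow.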
The Frieze–Kannan weak regularity lemma [?] implies that every graphon is well-approximated in the cut distance by such step-functions formed from measurable partitions; moreover, a bound on the quality of such an approximation is determined by the number of classes in the partition, uniformly in the choice of graphon. For further details, see Appendix ?.
3.2 Graphon estimation via clustering
The partition of a finite graph described in Section 3.1, which the step-function utilizes, can be formed by clustering the nodes using some general clustering method, such as $k$-means [?], hierarchical agglomerative clustering [?], random assignment, or simpler clusterings, such as the trivial partition, in which all vertices are assigned to a single class, or the discrete partition, in which all vertices are in separate classes.
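The way a clustering induces an estimator can be sketched as follows: any function that assigns a class label to each vertex determines block densities, and these are read back as a value estimate for every vertex pair. The helper names below (`classes_from_labels`, `block_densities`, `value_estimate`, and the three labelling functions) are illustrative assumptions, and only the simplest clusterings named above are shown.

```python
import random

def classes_from_labels(labels):
    # Group vertex indices by their class label.
    groups = {}
    for v, lab in enumerate(labels):
        groups.setdefault(lab, []).append(v)
    return [groups[lab] for lab in sorted(groups)]

def block_densities(A, partition):
    # Average adjacency entry over each pair of classes.
    return [[sum(A[i][j] for i in P for j in Q) / (len(P) * len(Q))
             for Q in partition] for P in partition]

def value_estimate(A, labels):
    """Estimate each entry by the density between the classes of its endpoints."""
    partition = classes_from_labels(labels)
    dens = block_densities(A, partition)
    where = {v: c for c, P in enumerate(partition) for v in P}
    n = len(A)
    return [[dens[where[i]][where[j]] for j in range(n)] for i in range(n)]

def trivial_labels(A):
    return [0] * len(A)          # one class containing every vertex

def discrete_labels(A):
    return list(range(len(A)))   # every vertex in its own class

def random_labels(A, k, seed=0):
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in A]

# Path graph 0 - 1 - 2.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
flat = value_estimate(A, trivial_labels(A))
exact = value_estimate(A, discrete_labels(A))
```

The two extremes behave as expected: the trivial partition returns a constant matrix at the overall density, while the discrete partition returns the adjacency matrix itself, which fits the sample perfectly but does no smoothing.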
In Figure ?, we display the result of estimating a graphon according to several clustering algorithms. In all graphon figures, we use a grayscale gradient for values on , where darker values are closer to .
Within the graphon estimator literature, several techniques produce step-functions, but the analysis has generally focused on the choice of partition size [?] or on the convergence rates for optimal partitions [?], or else the technique requires multiple observations [?]. Here we aim to exploit structural aspects of graphs, such as weak regularity (i.e., their uniform approximability in the cut distance), via an algorithm for forming a new partition that improves the step-function estimate produced by any given partition.
3.3 Iterative step-function estimation
In Algorithm ?, we describe iterative step-function estimation (ISFE), which can be used to produce graphon function and value estimates.
Given a finite graph , consider the following graphon function estimator procedure: (a) partition the vertices of according to some clustering algorithm; (b) repeatedly improve this partition by iteratively running Algorithm ? for iterations; and (c) report the step-function graphon , where is the final partition produced, with its classes sorted according to their average edge densities. Let be a graphon and , and suppose is a sample of the -random graph . The ISFE procedure on can be evaluated as a graphon function estimate in terms of MISE by directly comparing to .
ISFE can also be used to produce a graphon value estimate from a graph on . Let be the number of classes in an initial partition of . Implicit in the ISFE procedure is a map sending each vertex of to the index of its class in . A graphon value estimate is then given by . In other words, a regular grid of points within is chosen as a set of representatives of the piecewise constant regions of , in some order that corresponds to how the vertices of were rearranged into the partition . In a synthetic run, where is a sample of the -random graph and we retain the history of how was formed from the values , MSE can be evaluated by comparing with .
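Since the algorithm itself is not reproduced in this section, the following is only a sketch of one possible refinement step consistent with the description above: each vertex is summarized by its vector of edge densities with respect to the classes of the current partition, and vertices with similar vectors are grouped together. The greedy threshold rule for picking centroids, the class-size-weighted $L^1$ distance, the parameter `eps`, and the names `density_vectors`, `refine`, and `isfe_sketch` are all assumptions made for illustration, not the authors' algorithm.

```python
def density_vectors(A, partition):
    # For each vertex v, its edge density with respect to each class.
    return [[sum(A[v][u] for u in P) / len(P) for P in partition]
            for v in range(len(A))]

def weighted_l1(x, y, weights):
    return sum(w * abs(a - b) for a, b, w in zip(x, y, weights))

def refine(A, partition, eps):
    """One refinement step (assumed rule): greedy centroids, then nearest-centroid."""
    n = len(A)
    weights = [len(P) / n for P in partition]
    vecs = density_vectors(A, partition)
    # A vertex becomes a centroid if it is at least eps away from all earlier ones.
    centroids = []
    for v in range(n):
        if all(weighted_l1(vecs[v], vecs[c], weights) >= eps for c in centroids):
            centroids.append(v)
    # Assign every vertex to its nearest centroid.
    classes = [[] for _ in centroids]
    for v in range(n):
        nearest = min(range(len(centroids)),
                      key=lambda k: weighted_l1(vecs[v], vecs[centroids[k]], weights))
        classes[nearest].append(v)
    return [c for c in classes if c]

def isfe_sketch(A, partition, eps, iterations):
    for _ in range(iterations):
        partition = refine(A, partition, eps)
    return partition

# Two disjoint 4-cliques, starting from the discrete partition.
A = [[1 if i != j and (i < 4) == (j < 4) else 0 for j in range(8)] for i in range(8)]
start = [[v] for v in range(8)]
final = isfe_sketch(A, start, eps=0.5, iterations=2)
# final == [[0, 1, 2, 3], [4, 5, 6, 7]]
```

In this toy example the first step already separates the two cliques and the second step leaves the partition unchanged. The resulting classes could then be passed to a block-density computation to obtain function and value estimates; sorting classes by average edge density, as in step (c), is omitted here.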