Sparse Exchangeable Graphs and Their Limits via Graphon Processes


Christian Borgs (borgs@microsoft.com)
Microsoft Research
One Memorial Drive
Cambridge, MA 02142, USA

Jennifer T. Chayes (jchayes@microsoft.com)
Microsoft Research
One Memorial Drive
Cambridge, MA 02142, USA

Henry Cohn (cohn@microsoft.com)
Microsoft Research
One Memorial Drive
Cambridge, MA 02142, USA

Nina Holden (ninah@math.mit.edu)
Department of Mathematics
Massachusetts Institute of Technology
Cambridge, MA 02139, USA
Abstract

In a recent paper, Caron and Fox suggest a probabilistic model for sparse graphs which are exchangeable when associating each vertex with a time parameter in $\mathbb R_+$. Here we show that by generalizing the classical definition of graphons as functions over probability spaces to functions over σ-finite measure spaces, we can model a large family of exchangeable graphs, including the Caron-Fox graphs and the traditional exchangeable dense graphs as special cases. Explicitly, modelling the underlying space of features by a σ-finite measure space $(S,\mathcal S,\mu)$ and the connection probabilities by an integrable function $W\colon S\times S\to[0,1]$, we construct a random family $(G_t)_{t\ge0}$ of growing graphs such that the vertices of $G_t$ are given by a Poisson point process on $S$ with intensity $t\mu$, with two points $x,y$ of the point process connected with probability $W(x,y)$. We call such a random family a graphon process. We prove that a graphon process has convergent subgraph frequencies (with possibly infinite limits) and that, in the natural extension of the cut metric to our setting, the sequence converges to the generating graphon. We also show that the underlying graphon is identifiable only as an equivalence class over graphons with cut distance zero. More generally, we study metric convergence for arbitrary (not necessarily random) sequences of graphs, and show that a sequence of graphs has a convergent subsequence if and only if it has a subsequence satisfying a property we call uniform regularity of tails. Finally, we prove that every graphon is equivalent to a graphon on $\mathbb R_+$ equipped with Lebesgue measure.


Keywords: graphons, graph convergence, sparse graph convergence, modelling of sparse networks, exchangeable graph models

1 Introduction

The theory of graphons has provided a powerful tool for sampling and studying convergence properties of sequences of dense graphs. Graphons characterize limiting properties of dense graph sequences, such as properties arising in combinatorial optimization and statistical physics. Furthermore, sequences of dense graphs sampled from a (possibly random) graphon are characterized by a natural notion of exchangeability via the Aldous-Hoover theorem. This paper presents an analogous theory for sparse graphs.

In the past few years, graphons have been used as non-parametric extensions of stochastic block models, to model and learn large networks. There have been several rigorous papers on the subject of consistent estimation using graphons (see, for example, papers by , 2009, , 2011, , 2011, , 2012, , 2013, , 2015, , 2015, , 2017, and , 2015, as well as references therein), and graphons have also been used to estimate real-world networks, such as Facebook and LinkedIn (E. M. Airoldi, private communication, 2015). This makes it especially useful to have graphon models for sparse networks with unbounded degrees, which are the appropriate description of many large real-world networks.

In the classical theory of graphons as studied by, for example, Borgs, Chayes, Lovász, Sós, and Vesztergombi (2006), Lovász and Szegedy (2006), Borgs, Chayes, Lovász, Sós, and Vesztergombi (2008), Bollobás and Riordan (2009), Borgs, Chayes, and Lovász (2010), and Janson (2013), a graphon is a symmetric $[0,1]$-valued function defined on a probability space. In our generalized theory we let the underlying measure space of the graphon be a σ-finite measure space; i.e., we allow the space to have infinite total measure. More precisely, given a σ-finite measure space $\mathscr S=(S,\mathcal S,\mu)$ we define a graphon to be a pair $\mathcal W=(W,\mathscr S)$, where $W\colon S\times S\to\mathbb R$ is a symmetric integrable function, with the special case when $W$ is $[0,1]$-valued being most relevant for the random graphs studied in the current paper. We present a random graph model associated with these generalized graphons which has a number of properties making it appropriate for modelling sparse networks, and we present a new theory for convergence of graphs in which our generalized graphons arise naturally as limits of sparse graphs.

Given a $[0,1]$-valued graphon $\mathcal W=(W,\mathscr S)$ with $\mathscr S=(S,\mathcal S,\mu)$ a σ-finite measure space, we will now define a random process which generalizes the classical notion of $W$-random graphs, introduced in the statistics literature (Hoff, Raftery, and Handcock, 2002) under the name latent position graphs, in the context of graph limits (Lovász and Szegedy, 2006) as $W$-random graphs, and in the context of extensions of the classical random graph theory (Bollobás, Janson, and Riordan, 2007) as inhomogeneous random graphs. Recall that in the classical setting where $W$ is defined on a probability space, $W$-random graphs are generated by first choosing $n$ points $x_1,\dots,x_n$ i.i.d. from the probability distribution $\mu$ over the feature space $S$, and then connecting the vertices $i$ and $j$ with probability $W(x_i,x_j)$. Here, inspired by Caron and Fox (2014), we generalize this to arbitrary σ-finite measure spaces by first considering a Poisson point process $\Gamma_t$¹ with intensity $t\mu$ on $S$ for any fixed $t>0$, and then connecting two points $x,y\in\Gamma_t$ with probability $W(x,y)$. As explained in the next paragraph, this leads to a family of graphs $(\tilde G_t)_{t\ge0}$ such that the graphs $\tilde G_t$ have almost surely at most countably infinitely many vertices and (assuming appropriate integrability conditions on $W$, e.g., $W\in L^1$) a finite number of edges. Removing all isolated vertices from $\tilde G_t$, we obtain a family of graphs $(G_t)_{t\ge0}$ that are almost surely finite. We refer to the families $(\tilde G_t)_{t\ge0}$ and $(G_t)_{t\ge0}$ as graphon processes; when it is necessary to distinguish the two, we call them graphon processes with or without isolated vertices, respectively.

¹ We will make this construction more precise in Section 2.4; in particular, we will explain that we may associate $\Gamma_t$ with a collection of random variables. The same result holds for the Poisson point process considered in the next paragraph.

Figure 1: This figure illustrates how we can generate a graphon process $(G_t)_{t\ge0}$ from a graphon $\mathcal W=(W,\mathscr S)$, where $\mathscr S=(S,\mathcal S,\mu)$ is a σ-finite measure space. The two coordinate axes on the middle figure represent our feature space $S$, where the red (resp. blue) dots on the axes represent vertices born during $[0,t_1]$ (resp. $(t_1,t_2]$) for $0<t_1<t_2$, and the red (resp. blue) dots in the interior of the first quadrant represent edges in $\tilde G_t$ for $t=t_1$ (resp. $t=t_2$). The graph $G_t$ is an induced subgraph of a graph $\tilde G_t$ with infinitely many vertices in the case $\mu(S)=\infty$, such that $G_t$ is obtained from $\tilde G_t$ by removing isolated vertices. At time $t$ the marginal law of the features of $V(\tilde G_t)$ is a Poisson point process on $S$ with intensity $t\mu$. Two distinct vertices with features $x$ and $y$, respectively, are connected to each other by an undirected edge with probability $W(x,y)$. The coordinate axes on the right figure represent time. We get the graph $G_t$ by considering the edges restricted to $[0,t]^2$. Note that the coordinate axes in the right figure and the graphs in the left figure are slightly inaccurate if we assume $\mu(S)=\infty$, since in this case there are infinitely many isolated vertices in $\tilde G_t$ for each $t>0$. We have chosen to label the vertices by the order in which they appear in $G_t$, where ties are resolved by considering the time the vertices were born, i.e., by considering the time they appeared in $\tilde G_t$.

To interpret the graphon process as a family of growing graphs we will need to couple the graphs $G_t$ for different times $t$. To this end, we consider a Poisson point process $\Gamma$ on $\mathbb R_+\times S$ with intensity $\lambda\times\mu$ (with $\mathbb R_+$ being equipped with the Borel σ-algebra and Lebesgue measure $\lambda$). Each point $(t,x)\in\Gamma$ corresponds to a vertex of an infinite graph $\tilde G$, where the coordinate $t$ is interpreted as the time the vertex is born and the coordinate $x$ describes a feature of the vertex. Two distinct vertices $(t,x)$ and $(t',x')$ are connected by an undirected edge with probability $W(x,x')$, independently for each possible pair of distinct vertices. For each fixed time $t\ge0$ define a graph $G_t$ by considering the induced subgraph of $\tilde G$ corresponding to vertices which are born at time $t$ or earlier, where we do not include vertices which would be isolated in $G_t$. See Figure 1 for an illustration. The family of growing graphs $(G_t)_{t\ge0}$ just described includes classical dense $W$-random graphs (up to isolated vertices) and the sparse graphs studied by Caron and Fox (2014) and Herlau, Schmidt, and Mørup (2016) as special cases, and is (except for minor technical differences) identical to the family of random graphs studied by Veitch and Roy (2015), a paper which was written in parallel with our paper; see our remark at the end of this introduction.
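The construction can be simulated directly. The following Python sketch is illustrative only and rests on assumptions that are not part of the text: the feature space is $\mathbb R_+$ with Lebesgue measure, and $W$ vanishes outside $[0,L]^2$, so that vertices with features larger than $L$ are always isolated and may be ignored. All function names are hypothetical. Each pair of vertices draws its coin once, which is what couples the graphs $G_t$ for different $t$.

```python
import random


def sample_graphon_process(W, L, T, seed=None):
    """Sample the vertices of the infinite graph with birth time <= T and feature <= L.

    Assumes the feature space is R_+ with Lebesgue measure and W(x, y) = 0 unless
    x, y <= L. The Poisson process on [0, T] x [0, L] with intensity
    (Lebesgue x Lebesgue) has birth times arriving at rate L and uniform features.
    """
    rng = random.Random(seed)
    births, features = [], []
    t = rng.expovariate(L)
    while t <= T:
        births.append(t)
        features.append(rng.uniform(0.0, L))
        t += rng.expovariate(L)
    # One independent coin per pair of vertices, shared by all times t.
    edges = set()
    for i in range(len(births)):
        for j in range(i + 1, len(births)):
            if rng.random() < W(features[i], features[j]):
                edges.add((i, j))
    return births, features, edges


def graph_at_time(births, edges, t):
    """G_t: edges among vertices born by time t, with isolated vertices dropped."""
    edges_t = {(i, j) for (i, j) in edges if births[i] <= t and births[j] <= t}
    vertices_t = {v for edge in edges_t for v in edge}
    return vertices_t, edges_t


def W(x, y):
    # Example [0,1]-valued graphon supported in [0, 2]^2 (illustrative choice).
    return 1.0 if x + y <= 2.0 else 0.0


births, features, edges = sample_graphon_process(W, L=2.0, T=10.0, seed=1)
V5, E5 = graph_at_time(births, edges, 5.0)
V10, E10 = graph_at_time(births, edges, 10.0)
print(len(V5), len(E5), len(V10), len(E10))
```

Because the coins are shared, the sketch reproduces the projectivity of the model: $G_5$ is the induced subgraph of $G_{10}$ on the vertex set of $G_5$. For a graphon without compact support, truncating the features at $L$ only approximates the process.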

The graphon process satisfies a natural notion of exchangeability. Roughly speaking, in our setting this means that the features of newly born vertices are homogeneous in time. More precisely, it can be defined as joint exchangeability of a random measure on $\mathbb R_+^2$, where the two coordinates correspond to time, and each edge of the graph corresponds to a point mass. We will prove that graphon processes as defined above, with $W$ integrable and possibly random, are characterized by exchangeability of the random measure on $\mathbb R_+^2$ along with a certain regularity condition we call uniform regularity of tails. See Proposition 26 in Section 2.4. This result is an analogue, for possibly sparse graphs satisfying the aforementioned regularity condition, of the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979), which characterizes $W$-random graphs over probability spaces as graphs that are invariant in law under permutation of their vertices.

The graphon processes defined above also have a number of other properties making them particularly natural to model sparse graphs or networks. They are suitable for modelling networks which grow over time since no additional rescaling parameters (like the explicitly given density dependence on the number of vertices specified by Bollobás and Riordan, 2009, and Borgs, Chayes, Cohn, and Zhao, 2014a) are necessary; all information about the random graph model is encoded by the graphon $\mathcal W$ alone. The graphs are projective in the sense that if $s<t$, the graph $G_s$ is an induced subgraph of $G_t$. Finally, a closely related family of weighted graphs is proven by Caron and Fox (2014) to have power law degree distribution for certain $W$, and our graphon processes are expected to behave similarly. The graphon processes studied in this paper have a different qualitative behavior than the sparse $W$-random graphs studied by Bollobás and Riordan (2009) and Borgs, Chayes, Cohn, and Zhao (2014a, b) (see Figure 2), with the only overlap of the two theories occurring when the graphs are dense. If the sparsity of the graphs is caused by the degrees of the vertices being scaled down approximately uniformly over time, then the model studied by Bollobás and Riordan (2009) and Borgs, Chayes, Cohn, and Zhao (2014a, b) is most natural. If the sparsity is caused by later vertices typically having lower connectivity probabilities than earlier vertices, then the model presented in this paper is most natural. The sampling method we will use in our forthcoming paper (Borgs, Chayes, Cohn, and Holden, 2017) generalizes both of these methods.

[Figure 2 panels. Left: rescaled graphon on probability space. Right: graphon with non-compact support.]

Figure 2: The adjacency matrices of graphs sampled as described by Borgs, Chayes, Cohn, and Zhao (2014a) (left) and in this paper (right), where we used the graphon (left) and the graphon (right), with for and for . Black (resp. white) indicates that there is (resp. is not) an edge. We rescaled the height of the graphon by on the left figure. As described by Borgs, Chayes, Cohn, and Zhao (2014a, b) the type of each vertex is sampled independently and uniformly from , and each pair of vertices is connected with probability . In the right figure the vertices were sampled by a Poisson point process on of intensity , and two vertices were connected independently with a probability given by ; see Section 2.4 and the main text of this introduction. The two graphs have very different qualitative properties. In the left graph most vertices have a degree close to the average degree, where the average degree depends on our scaling factor . In the right graph the edges are distributed more inhomogeneously: most of the edges are contained in induced subgraphs of constant density, and the sparsity is caused by a large number of vertices with very low degree.

To compare different models, and to discuss notions of convergence, we introduce the following natural generalization of the cut metric for graphons on probability spaces to our setting. For two graphons $\mathcal W_1=(W_1,\mathscr S_1)$ and $\mathcal W_2=(W_2,\mathscr S_2)$, this metric is easiest to define when the two graphons are defined over the same space. However, for applications we want to compare graphons over different spaces, say two Borel spaces $\mathscr S_1=(S_1,\mathcal S_1,\mu_1)$ and $\mathscr S_2=(S_2,\mathcal S_2,\mu_2)$. Assuming that both Borel spaces have infinite total measure, the cut distance between $\mathcal W_1$ and $\mathcal W_2$ can then be defined as

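$$\delta_\square(\mathcal W_1,\mathcal W_2)=\inf_{\phi_1,\phi_2}\ \sup_{U,V\subseteq\mathbb R_+}\left|\int_{U\times V}\bigl(W_1^{\phi_1}(x,y)-W_2^{\phi_2}(x,y)\bigr)\,dx\,dy\right|, \tag{1}$$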

where we take the infimum over measure-preserving maps $\phi_j\colon\mathbb R_+\to S_j$ for $j=1,2$, we write $W_j^{\phi_j}(x,y):=W_j(\phi_j(x),\phi_j(y))$ for $x,y\in\mathbb R_+$, and the supremum is over measurable sets $U,V\subseteq\mathbb R_+$. (See Definition 5 below for the definition of the cut distance for graphons over general spaces, including the case where one or both spaces have finite total mass.) We call two graphons equivalent if they have cut distance zero. As we will see, two graphons are equivalent if and only if the random families generated from these graphons have the same distribution; see Theorem 27 below.
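For two graphs with the same number $n$ of vertices, each embedded as a step function in which vertex $i$ occupies an interval of length $1/n$ in $[0,1]$ (with the function extended by zero to $\mathbb R_+^2$), relabeling the vertices permutes these intervals. Extended by the identity outside $[0,1]$, such a permutation is a measure-preserving bijection of $\mathbb R_+$, so minimizing over relabelings gives an upper bound on (1). The brute-force sketch below uses hypothetical helper names and has exponential running time, so it is meant for tiny graphs only. It relies on two elementary facts: for a step function with $n$ equal steps the supremum over $U,V$ is attained on unions of steps, and for a fixed $U$ the best $V$ collects either all positive or all negative column sums.

```python
from itertools import permutations, product


def cut_norm_step(D):
    """Cut norm of the step function on [0,1]^2 with n equal steps and values D[i][j]."""
    n = len(D)
    best = 0.0
    for u in product((0, 1), repeat=n):  # U = union of the steps i with u[i] = 1
        cols = [sum(D[i][j] for i in range(n) if u[i]) for j in range(n)]
        # For fixed U, the best V takes all positive or all negative column sums.
        best = max(best, sum(c for c in cols if c > 0), -sum(c for c in cols if c < 0))
    return best / n**2  # each step has area 1/n^2


def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple graph on vertices 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A


def cut_distance_upper_bound(A, B):
    """Minimum over vertex relabelings p of the cut norm of A - B relabeled by p."""
    n = len(A)
    return min(
        cut_norm_step([[A[i][j] - B[p[i]][p[j]] for j in range(n)] for i in range(n)])
        for p in permutations(range(n))
    )


path = adjacency(3, [(0, 1), (1, 2)])
triangle = adjacency(3, [(0, 1), (1, 2), (0, 2)])
print(cut_distance_upper_bound(path, triangle))  # 2/9
```

Isomorphic graphs get the bound $0$, as expected; for general graphons the infimum in (1) ranges over far more maps than relabelings, so this quantity is only an upper bound.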

To compare graphs and graphons, we embed a graph $G$ on $n$ vertices into the set of step functions over $[0,1]^2$ in the usual way by decomposing $[0,1]$ into $n$ adjacent intervals $I_1,\dots,I_n$ of lengths $1/n$, and define a step function $W^G$ as the function which is equal to $1$ on $I_i\times I_j$ if $i$ and $j$ are connected in $G$, and equal to $0$ otherwise. Extending $W^G$ to a function on $\mathbb R_+^2$ by setting it to zero outside of $[0,1]^2$, we can then compare graphs to graphons on measure spaces of infinite mass, and in particular we get a notion of convergence in metric of a sequence $(G_n)_{n\in\mathbb N}$ of graphs to a graphon $\mathcal W$.

In the classical theory of graph convergence, such a sequence will converge to the zero graphon whenever the sequence is sparse.² We resolve this difficulty by rescaling the input arguments of the step function $W^G$ so as to get a “stretched graphon” $W^{G,s}(x,y):=W^G\bigl(\|W^G\|_1^{1/2}x,\|W^G\|_1^{1/2}y\bigr)$ satisfying $\|W^{G,s}\|_1=1$. Equivalently, we may interpret $W^{G,s}$ as a graphon where the measure of the underlying measure space is rescaled. See Figure 3 for an illustration, which also compares the rescaling in the current paper with the rescaling considered by Borgs, Chayes, Cohn, and Zhao (2014a). We say that $(G_n)_{n\in\mathbb N}$ converges to a graphon $\mathcal W$ (with $\|W\|_1=1$) for the stretched cut metric if $\delta_\square(W^{G_n,s},\mathcal W)\to0$. Graphons on σ-finite measure spaces of infinite total measure may therefore be considered as limiting objects for sequences of sparse graphs, just as graphons on probability spaces are considered limits of dense graphs. We prove that graphon processes converge to the generating graphon in the stretched cut metric; see Proposition 28 in Section 2.4. We will also consider another family of random sparse graphs associated with a graphon over a σ-finite measure space, and prove that these graphs also converge for the stretched cut metric.

² Here, as usual, a sequence of simple graphs is considered sparse if the number of edges divided by the square of the number of vertices goes to zero.
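The two embeddings can be checked numerically. In this minimal sketch (hypothetical function names), $W^G$ is the step function on $[0,1)^2$ with steps of width $1/n$, extended by zero, and the stretched graphon is taken to be $W^{G,s}(x,y)=W^G(cx,cy)$ with $c=\|W^G\|_1^{1/2}$. Since $\|W^G\|_1=2|E(G)|/n^2$, the change of variables gives $\|W^{G,s}\|_1=\|W^G\|_1/c^2=1$, and $W^{G,s}$ is supported on $[0,n/\sqrt{2|E(G)|}]^2$, a square whose side grows without bound along a sparse sequence.

```python
import math


def step_graphon(n, edges):
    """W^G on R_+^2 for a graph on vertices 0..n-1: value 1 on I_i x I_j if ij is an edge.

    Here I_i = [i/n, (i+1)/n); W^G is extended by zero outside [0,1)^2.
    """
    adjacent = {(i, j) for i, j in edges} | {(j, i) for i, j in edges}

    def W(x, y):
        if not (0 <= x < 1 and 0 <= y < 1):
            return 0.0
        return 1.0 if (int(x * n), int(y * n)) in adjacent else 0.0

    return W


def stretched_graphon(n, edges):
    """Return W^{G,s}(x, y) = W^G(c x, c y) with c = ||W^G||_1^{1/2}, together with 1/c."""
    W = step_graphon(n, edges)
    c = math.sqrt(2 * len(edges)) / n  # ||W^G||_1 = 2|E| / n^2
    return (lambda x, y: W(c * x, c * y)), 1 / c  # 1/c: side of the support square


def midpoint_integral(W, side, m):
    """Midpoint Riemann sum of W over [0, side]^2 on an m x m grid."""
    h = side / m
    return sum(W((a + 0.5) * h, (b + 0.5) * h) for a in range(m) for b in range(m)) * h * h


path = [(0, 1), (1, 2), (2, 3)]
Ws, side = stretched_graphon(4, path)
print(midpoint_integral(step_graphon(4, path), 1.0, 40))  # ||W^G||_1 = 6/16
print(midpoint_integral(Ws, side, 40))                     # ||W^{G,s}||_1 = 1
```

Choosing the grid size as a multiple of $n$ aligns grid cells with the steps, so the midpoint sums are exact up to floating-point rounding.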

[Figure 3 panels. Left: canonical graphon. Middle: rescaled graphon. Right: stretched graphon.]

Figure 3: The figure shows three graphons associated with the same simple graph on five vertices. In the classical theory of graphons every sparse sequence of simple graphs converges to the zero graphon. We may prevent this by renormalizing the graphons, either by rescaling the height of the graphon (middle) or by stretching the domain on which it is defined (right). The first approach was chosen by Bollobás and Riordan (2009) and Borgs, Chayes, Cohn, and Zhao (2014a, b), and the second approach is chosen in this paper. In our forthcoming paper (Borgs, Chayes, Cohn, and Holden, 2017) we choose a combined approach, where the renormalization depends on the observed graph.

Particular random graph models of special interest arise by considering certain classes of graphons $\mathcal W$. Caron and Fox (2014) consider graphons of the form $W(x,y)=1-\exp\bigl(-2f^{-1}(x)f^{-1}(y)\bigr)$ on $\mathbb R_+$ (with a slightly different definition on the diagonal, since they also allow for self-edges) for certain decreasing functions $f$. In this model $f^{-1}(x)$ represents a sociability parameter of each vertex. A multi-edge version of this model allows for an alternative sampling procedure to the one we present above (Caron and Fox, 2014, Section 3). Herlau, Schmidt, and Mørup (2016) introduced a generalization of the model of Caron and Fox (2014) to graphs with block structure. In this model each node is associated to a type from a finite index set $[K]=\{1,\dots,K\}$ for some $K\in\mathbb N$, in addition to its sociability parameter, such that the probability of two nodes connecting depends both on their type and their sociability. More generally we can obtain sparse graphs with block structure by considering integrable functions $W_{kk'}\colon\mathbb R_+\times\mathbb R_+\to[0,1]$ for $k,k'\in[K]$, with $W_{k'k}(x',x)=W_{kk'}(x,x')$ to ensure symmetry, and defining $S=[K]\times\mathbb R_+$ and $W((k,x),(k',x'))=W_{kk'}(x,x')$. As compared to the block model of Herlau, Schmidt, and Mørup (2016), this allows for a more complex interaction within and between the blocks. An alternative generalization of the stochastic block model to our setting is to consider infinitely many disjoint intervals $I_k\subseteq\mathbb R_+$ for $k\in\mathbb N$, and define $W(x,y)=p_{kk'}$ for $x\in I_k$ and $y\in I_{k'}$, for symmetric constants $p_{kk'}=p_{k'k}\in[0,1]$. For the block model of Herlau, Schmidt, and Mørup (2016) and our first generalization above (with $K<\infty$), the degree distribution of the vertices within each block will typically be strongly inhomogeneous; by contrast, in our second generalization above (with infinitely many blocks), all vertices within the same block have the same connectivity probabilities, and hence the degree distribution will be more homogeneous.
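As an illustration of a block graphon with infinitely many blocks, one may take, purely as a hypothetical example, the intervals $I_k=[k-1,k)$ for $k\in\mathbb N$ and $p_{kk'}=2^{-(k+k')}$, with $W(x,y)=p_{kk'}$ for $x\in I_k$, $y\in I_{k'}$. Integrability of $W$ amounts to $\sum_{k,k'}p_{kk'}|I_k||I_{k'}|<\infty$; for this choice the sum equals $(\sum_k 2^{-k})^2=1$. The sketch evaluates $W$ and its truncated $L^1$ norms.

```python
import math


def block_graphon(p):
    """Step graphon on R_+: W(x, y) = p(k, j) for x in I_k = [k-1, k) and y in I_j."""
    return lambda x, y: p(math.floor(x) + 1, math.floor(y) + 1)


def p(k, j):
    # Hypothetical symmetric block connection probabilities, all in [0, 1].
    return 2.0 ** (-(k + j))


def truncated_l1(p, K):
    """sum_{k, j <= K} p(k, j) |I_k| |I_j|: the L^1 norm of W restricted to [0, K)^2."""
    return sum(p(k, j) for k in range(1, K + 1) for j in range(1, K + 1))


W = block_graphon(p)
for K in (5, 10, 20):
    print(K, truncated_l1(p, K))  # increases towards (sum_k 2^-k)^2 = 1
```

Every vertex with feature in $I_k$ has the same connection probabilities, which is the homogeneity within blocks mentioned above.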

We can also model sparse graphs with mixed membership structure within our framework. In this case we let $\Delta=\{p\in[0,1]^K:\sum_{k=1}^K p_k=1\}$ be the standard simplex, and define $S=\Delta\times\mathbb R_+$. For a vertex with feature $(p,x)$ the first coordinate $p=(p_1,\dots,p_K)$ is a vector such that $p_k$ for $k\in[K]$ describes the proportion of time the vertex is part of community $k$, and the second coordinate $x$ describes the role of the vertex within the community; for example, $x$ could be a sociability parameter. For each $k,k'\in[K]$ let $W_{kk'}$ be a graphon describing the interactions between the communities $k$ and $k'$. We define our mixed membership graphon by

$$W\bigl((p,x),(p',x')\bigr)=\sum_{k,k'=1}^{K}p_k\,p'_{k'}\,W_{kk'}(x,x').$$

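Alternatively, we could define $S=\Delta\times\mathbb R_+^K$ and $W((p,x),(p',x'))=\sum_{k,k'=1}^{K}p_k\,p'_{k'}\,W_{kk'}(x_k,x'_{k'})$ for $x=(x_1,\dots,x_K)$ and $x'=(x'_1,\dots,x'_K)$, which would provide a model where, for example, the sociability of a node varies depending on which community it is part of.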
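A sketch of both mixed-membership kernels, assuming the forms $W((p,x),(p',x'))=\sum_{k,k'}p_kp'_{k'}W_{kk'}(x,x')$ and, in the alternative, $\sum_{k,k'}p_kp'_{k'}W_{kk'}(x_k,x'_{k'})$ with a vector $x=(x_1,\dots,x_K)$ of community-specific roles. The community kernels below are hypothetical placeholders, chosen integrable on $\mathbb R_+^2$ and $[0,1]$-valued. Since the weights $p_kp'_{k'}$ sum to $1$, the result is a convex combination and stays in $[0,1]$ whenever every $W_{kk'}$ does.

```python
import math


def mixed_membership(Wkk):
    """W((p, x), (q, y)) = sum_{k, j} p_k q_j W_kj(x, y): one role x shared by all communities."""
    K = len(Wkk)

    def W(a, b):
        (p, x), (q, y) = a, b
        return sum(p[k] * q[j] * Wkk[k][j](x, y) for k in range(K) for j in range(K))

    return W


def mixed_membership_per_community(Wkk):
    """Alternative: W((p, x), (q, y)) = sum_{k, j} p_k q_j W_kj(x_k, y_j), x = (x_1, ..., x_K)."""
    K = len(Wkk)

    def W(a, b):
        (p, x), (q, y) = a, b
        return sum(p[k] * q[j] * Wkk[k][j](x[k], y[j]) for k in range(K) for j in range(K))

    return W


# Placeholder integrable kernels on R_+^2 with values in [0, 1].
def within(x, y):
    return math.exp(-(x + y))


def between(x, y):
    return 0.1 * math.exp(-(x + y))


Wkk = [[within, between], [between, within]]
W = mixed_membership(Wkk)
W_alt = mixed_membership_per_community(Wkk)
print(W(((0.7, 0.3), 1.0), ((0.2, 0.8), 0.5)))
print(W_alt(((0.7, 0.3), (1.0, 0.2)), ((0.2, 0.8), (0.5, 3.0))))
```

Symmetry of the resulting graphon requires $W_{kk'}(x,y)=W_{k'k}(y,x)$, which the placeholder kernels satisfy.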

In the classical setting of dense graphs, many papers only consider graphons defined on the unit square, instead of graphons on more general probability spaces. This is justified by the fact that every graphon with a probability space as base space is equivalent to a graphon with base space $[0,1]$. The analogue in our setting would be graphons over $\mathbb R_+$ equipped with the Lebesgue measure. As the examples in the preceding paragraphs illustrate, for certain random graph models it is more natural to consider another underlying measure space. For example, each coordinate in some higher-dimensional space may correspond to a particular feature of the vertices, and changing the base space can disrupt certain properties of the graphon, such as smoothness conditions. For this reason we consider graphons defined on general σ-finite measure spaces in this paper. However, we will prove that every graphon is equivalent to a graphon on $\mathbb R_+$ equipped with the Borel σ-algebra and Lebesgue measure, in the sense that their cut distance is zero; see Proposition 10 in Section 2.2. As stated before, our results then imply that they correspond to the same random graph model.

The set of $[0,1]$-valued graphons on probability spaces is compact for the cut metric (after identifying graphons at cut distance zero). For the possibly unbounded graphons studied by Borgs, Chayes, Cohn, and Zhao (2014a), which are real-valued and defined on probability spaces, compactness holds if we consider closed subsets of the space of graphons which are uniformly upper regular (see Section 2.3 for the definition). In our setting, where we look at graphons over spaces of possibly infinite measure, the analogous regularity condition is uniform regularity of tails if we restrict ourselves to, say, $[0,1]$-valued graphons. In particular our results imply that a sequence of simple graphs with uniformly regular tails is subsequentially convergent, and conversely, that every convergent sequence of simple graphs has uniformly regular tails. See Theorem 15 in Section 2.3 and the two corollaries following this theorem.

In the setting of dense graphs, convergence for the cut metric is equivalent to left convergence, meaning that subgraph densities converge. This equivalence does not hold in our setting, or for the unbounded graphons studied by Borgs, Chayes, Cohn, and Zhao (2014a, b); its failure is characteristic of sparse graphs, because deleting even a tiny fraction of the edges in a sparse graph can radically change the densities of larger subgraphs (see the discussion by Borgs, Chayes, Cohn, and Zhao, 2014a, Section 2.9). However, randomly sampled graphs do satisfy a notion of left convergence; see Proposition 30 in Section 2.5.

As previously mentioned, in our forthcoming paper (Borgs, Chayes, Cohn, and Holden, 2017) we will generalize and unify the theories and models presented by Bollobás and Riordan (2009), Borgs, Chayes, Cohn, and Zhao (2014a, b), Caron and Fox (2014), Herlau, Schmidt, and Mørup (2016), and Veitch and Roy (2015). Along with the introduction of a generalized model for sampling graphs and an alternative (and weaker) cut metric, we will prove a number of convergence properties of these graphs. Since the graphs in this paper are obtained as a special case of the graphs in our forthcoming paper, the mentioned convergence results also hold in our setting.

In Section 2 we will state the main results of this paper, which will be proved in the subsequent appendices. In Appendix A we prove that the cut metric is well defined. In Appendix B we prove that any graphon is equivalent to a graphon with underlying measure space $\mathbb R_+$. We also prove that under certain conditions on the underlying measure space we may define the cut metric in a number of equivalent ways. In Appendix C, we deal with some technicalities regarding graph-valued processes. In Appendix D we prove that certain random graph models derived from a graphon $\mathcal W$, including the graphon processes defined above, give graphs converging to $\mathcal W$ for the cut metric. We also prove that two graphons are equivalent (i.e., they have cut distance zero) if and only if the corresponding graphon processes are equal in law. In Appendix E we prove that uniform regularity of tails is sufficient to guarantee subsequential metric convergence for a sequence of graphs; conversely, we prove that every convergent sequence of graphs with non-negative edge weights has uniformly regular tails. In Appendix F we prove some basic properties of sequences of graphs which are metric convergent, for example that metric convergence implies unbounded average degree if the number of edges diverges and the graphs do not have too many isolated vertices; see Proposition 22 below. We also compare the notion of metric graph convergence in this paper to the one studied by Borgs, Chayes, Cohn, and Zhao (2014a). In Appendix G we use the Kallenberg theorem for jointly exchangeable measures to prove that graphon processes for integrable $W$ are uniquely characterized as exchangeable graph processes satisfying uniform tail regularity. We also describe more general families of graphs that may be obtained from the Kallenberg representation theorem if this regularity condition is not imposed. Finally, in Appendix H we prove our results on left convergence of graphon processes.

Remark 1

After writing a first draft of this work, but a little over a month before completing the paper, we became aware of parallel, independent work by Veitch and Roy (2015), who introduce a closely related model for exchangeable sparse graphs and interpret it with reference to the Kallenberg theorem for exchangeable measures. The random graph model studied by Veitch and Roy (2015) is (up to minor differences) the same as the graphon processes introduced in the current paper. Aside from both introducing this model, the results of the two papers are essentially disjoint. While Veitch and Roy (2015) focus on particular properties of the graphs in a graphon process (in particular, the expected number of edges and vertices, the degree distribution, and the existence of a giant component under certain assumptions on $W$), our focus is graph convergence, the cut metric, and the question of when two different graphons lead to the same graphon process.

See also the subsequent paper by Janson (2016) expanding on the results of our paper, characterizing in particular when two graphons are equivalent, and proving additional compactness results for graphons over σ-finite spaces.

2 Definitions and Main Results

We will work mainly with simple graphs, but we will allow the graphs to have weighted vertices and edges for some of our definitions and results. We denote the vertex set of a graph $G$ by $V(G)$ and the edge set of $G$ by $E(G)$. The sets $V(G)$ and $E(G)$ may be infinite, but we require them to be countable. If $G$ is weighted, with edge weights $\beta_{ij}$ and vertex weights $\alpha_i$, we require the vertex weights to be non-negative, and we often (but not always) require that $\|G\|_1:=\sum_{i,j\in V(G)}\alpha_i\alpha_j|\beta_{ij}|<\infty$ (note that $\|G\|_1$ is defined in such a way that for an unweighted graph, it is equal to $2|E(G)|$, as opposed to the density, which is ill-defined if $|V(G)|=\infty$). We define the edge density of a finite simple graph $G$ to be $\rho(G):=2|E(G)|/|V(G)|^2$. Letting $\mathbb N$ denote the positive integers, a sequence $(G_n)_{n\in\mathbb N}$ of simple, finite graphs will be called sparse if $\rho(G_n)\to0$ as $n\to\infty$, and dense if $\liminf_{n\to\infty}\rho(G_n)>0$. When we consider graph-valued stochastic processes $(G_t)_{t\ge0}$ or $(G_n)_{n\in\mathbb N}$ of simple graphs, we will assume each vertex is labeled by a distinct number in $\mathbb N$, so we can view $V(G_t)$ as a subset of $\mathbb N$ and $E(G_t)$ as a subset of $\mathbb N^2$. The labels allow us to keep track of individual vertices in the graph over time. In Section 2.4 we define a topology and σ-algebra on the set of such graphs.
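A short sketch of these two quantities, assuming $\|G\|_1=\sum_{i,j}\alpha_i\alpha_j|\beta_{ij}|$ with the sum running over ordered pairs of vertices, and the normalization $\rho(G)=2|E(G)|/|V(G)|^2$ for finite simple graphs. The dictionary representation and function names are our own, hypothetical choices. With all weights equal to one, every edge is counted once in each direction, which is why $\|G\|_1=2|E(G)|$ for an unweighted graph.

```python
def l1_norm(alpha, beta):
    """||G||_1 = sum over ordered pairs (i, j) of alpha_i * alpha_j * |beta_ij|.

    alpha: dict vertex -> non-negative weight; beta: dict (i, j) -> edge weight,
    stored for both orientations of an undirected edge.
    """
    return sum(alpha[i] * alpha[j] * abs(b) for (i, j), b in beta.items())


def unweighted(edges):
    """A simple graph viewed as a weighted graph with unit vertex and edge weights."""
    alpha = {v: 1.0 for e in edges for v in e}
    beta = {}
    for i, j in edges:
        beta[(i, j)] = beta[(j, i)] = 1.0
    return alpha, beta


def edge_density(num_vertices, num_edges):
    # Assumed normalization rho(G) = 2|E(G)| / |V(G)|^2.
    return 2 * num_edges / num_vertices**2


edges = [(1, 2), (2, 3), (3, 1), (3, 4)]
alpha, beta = unweighted(edges)
print(l1_norm(alpha, beta), edge_density(4, len(edges)))
```

Unlike the density, $\|G\|_1$ needs no vertex count, so it remains meaningful for graphs with countably many (possibly isolated) vertices.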

2.1 Measure-theoretic Preliminaries

We start by recalling several notions from measure theory.

For two measure spaces $(S,\mathcal S,\mu)$ and $(S',\mathcal S',\mu')$, a measurable map $\phi\colon S\to S'$ is called measure-preserving if for every $A\in\mathcal S'$ we have $\mu(\phi^{-1}(A))=\mu'(A)$. Two measure spaces $(S,\mathcal S,\mu)$ and $(S',\mathcal S',\mu')$ are called isomorphic if there exists a bimeasurable, bijective, and measure-preserving map $\phi\colon S\to S'$. A Borel measure space is defined as a measure space that is isomorphic to a Borel subset of a complete separable metric space equipped with a Borel measure.

Throughout most of this paper, we consider σ-finite measure spaces, i.e., spaces $(S,\mathcal S,\mu)$ such that $S$ can be written as a countable union of sets $A_n\in\mathcal S$ with $\mu(A_n)<\infty$. Recall that a set $A\in\mathcal S$ is an atom if $\mu(A)>0$ and if every measurable $B\subseteq A$ satisfies either $\mu(B)=0$ or $\mu(A\setminus B)=0$. The measure space is atomless if it has no atoms. Every atomless σ-finite Borel space of infinite measure is isomorphic to $(\mathbb R_+,\mathcal B,\lambda)$, where $\mathcal B$ is the Borel σ-algebra and $\lambda$ is Lebesgue measure; for the convenience of the reader, we prove this as Lemma 33 below.

We also need the notion of a coupling, a concept well known for probability spaces: if $(S_i,\mathcal S_i,\mu_i)$ is a measure space for $i=1,2$ and $\mu_1(S_1)=\mu_2(S_2)$, we say that $\mu$ is a coupling of $\mu_1$ and $\mu_2$ if $\mu$ is a measure on $S_1\times S_2$ with marginals $\mu_1$ and $\mu_2$, i.e., if $\mu(A\times S_2)=\mu_1(A)$ for all $A\in\mathcal S_1$ and $\mu(S_1\times B)=\mu_2(B)$ for all $B\in\mathcal S_2$. Note that this definition of coupling is closely related to the definition of coupling of probability measures, which is the special case $\mu_1(S_1)=\mu_2(S_2)=1$. For probability spaces, it is easy to see that every pair of measures has a coupling (for example, the product measure $\mu_1\times\mu_2$). We prove the existence of a coupling for σ-finite measure spaces in Appendix A, where this fact is stated as part of a more general lemma, Lemma 34.
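For finite measure spaces the product example extends directly: if $\mu_1(S_1)=\mu_2(S_2)=m$ with $0<m<\infty$, then $\mu=(\mu_1\times\mu_2)/m$ is a coupling, since $\mu(A\times S_2)=\mu_1(A)\mu_2(S_2)/m=\mu_1(A)$, and similarly for the second marginal. The case of infinite total mass is not covered by this formula and needs a different argument. The sketch below, for measures with finitely many atoms represented as dictionaries (our own representation, with hypothetical function names), builds this coupling and recomputes its marginals.

```python
def product_coupling(mu1, mu2):
    """Coupling (mu1 x mu2) / m of two finite atomic measures of equal total mass m > 0."""
    m1, m2 = sum(mu1.values()), sum(mu2.values())
    if abs(m1 - m2) > 1e-12 or m1 <= 0:
        raise ValueError("measures must have the same positive, finite total mass")
    return {(a, b): wa * wb / m1 for a, wa in mu1.items() for b, wb in mu2.items()}


def marginals(mu):
    """First and second marginals of a measure on S1 x S2 given by its atoms."""
    left, right = {}, {}
    for (a, b), w in mu.items():
        left[a] = left.get(a, 0.0) + w
        right[b] = right.get(b, 0.0) + w
    return left, right


mu1 = {"x": 1.0, "y": 2.0}
mu2 = {"u": 0.5, "v": 0.5, "w": 2.0}
coupling = product_coupling(mu1, mu2)
print(marginals(coupling))
```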

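Finally, we say that a measure space $(S',\mathcal S',\mu')$ extends a measure space $(S,\mathcal S,\mu)$ if $S\in\mathcal S'$, $\mathcal S=\{A\in\mathcal S':A\subseteq S\}$, and $\mu(A)=\mu'(A)$ for all $A\in\mathcal S$. We say that $(S,\mathcal S,\mu)$ is a restriction of $(S',\mathcal S',\mu')$, or, if $S$ is specified, the restriction of $(S',\mathcal S',\mu')$ to $S$.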

2.2 Graphons and Cut Metric

We will work with the following definition of a graphon.

Definition 2

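A graphon is a pair $\mathcal W=(W,\mathscr S)$, where $\mathscr S=(S,\mathcal S,\mu)$ is a σ-finite measure space satisfying $\mu(S)>0$ and $W\colon S\times S\to\mathbb R$ is a symmetric real-valued function that is measurable with respect to the product σ-algebra $\mathcal S\times\mathcal S$ and integrable with respect to $\mu\times\mu$. We say that $\mathcal W$ is a graphon over $\mathscr S$.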

Remark 3

Most literature on graphons defines a graphon to be the function $W$ instead of the pair $(W,\mathscr S)$. We have chosen the above definition since the underlying measure space will play an important role. Much literature on graphons requires $W$ to take values in $[0,1]$, and some of our results will also be restricted to this case. The major difference between the above definition and the definition of a graphon in the existing literature, however, is that we allow the graphon to be defined on a measure space of possibly infinite measure, instead of a probability space.³

³ The term “graphon” was coined by Borgs, Chayes, Lovász, Sós, and Vesztergombi (2008), but the use of this concept in combinatorics goes back to at least Frieze and Kannan (1999), who considered a version of the regularity lemma for functions over $[0,1]^2$. As a limit object for convergent graph sequences it was introduced by Lovász and Szegedy (2006), where it was called a -function, and graphons over general probability spaces were first studied by Borgs, Chayes, and Lovász (2010) and Janson (2013).

Remark 4

One may relax the integrability condition for $W$ in the above definition such that the corresponding random graph model (as defined in Definition 25 below) still gives graphs with finitely many vertices and edges for each bounded time. This more general definition is used by Veitch and Roy (2015). We work with the above definition since the majority of the analysis in this paper is related to convergence properties and graph limits, and our definition of the cut metric is most natural for integrable graphons. An exception is the notion of subgraph density convergence in the corresponding random graph model, which we discuss in the more general setting of not necessarily integrable graphons; see Remark 31 below.

We will mainly study simple graphs in the current paper, in particular graphs without self-edges. However, the theory can be generalized in a straightforward way to graphs with self-edges, in which case we would also impose an integrability condition for $W$ along its diagonal.

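If $\mathscr S=(S,\mathcal B,\lambda)$, where $S$ is a Borel subset of $\mathbb R^d$, $\mathcal B$ is the Borel σ-algebra on $S$, and $\lambda$ is Lebesgue measure, we write $S$ instead of $\mathscr S$ to simplify notation. For example, we write $(W,\mathbb R_+)$ instead of $(W,(\mathbb R_+,\mathcal B,\lambda))$.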

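For any measure space $(S,\mathcal S,\mu)$ and integrable function $W\colon S\times S\to\mathbb R$, define the cut norm of $W$ over $(S,\mathcal S,\mu)$ by

$$\|W\|_{\square,S,\mu}:=\sup_{U,V\in\mathcal S}\left|\int_{U\times V}W(x,y)\,d\mu(x)\,d\mu(y)\right|.$$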

If $S$ and/or $\mu$ is clear from the context we may write $\|W\|_{\square,\mu}$ or $\|W\|_\square$ to simplify notation.
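For a step function, that is, a graphon over a space partitioned into finitely many pieces $P_1,\dots,P_n$ of measures $a_i=\mu(P_i)$ with $W=w_{ij}$ on $P_i\times P_j$, the integral over $U\times V$ equals $\sum_{i,j}u_iv_ja_ia_jw_{ij}$ with $u_i=\mu(U\cap P_i)/a_i\in[0,1]$ and similarly for $v_j$. Since this expression is bilinear, the supremum is attained with $u,v\in\{0,1\}^n$. The brute-force sketch below (exponential in $n$; hypothetical names) uses this reduction, together with the observation that for fixed $U$ the best $V$ collects either all positive or all negative column sums, and compares the result with the elementary bound $\|W\|_\square\le\|W\|_1$.

```python
from itertools import product


def cut_norm_weighted(a, w):
    """sup_{U,V} |int_{U x V} W| for the step function W = w[i][j] on P_i x P_j, mu(P_i) = a[i]."""
    n = len(a)
    best = 0.0
    for u in product((0, 1), repeat=n):  # U = union of the pieces P_i with u[i] = 1
        cols = [sum(a[i] * a[j] * w[i][j] for i in range(n) if u[i]) for j in range(n)]
        best = max(best, sum(c for c in cols if c > 0), -sum(c for c in cols if c < 0))
    return best


def l1_norm_weighted(a, w):
    """||W||_1 for the same step function."""
    n = len(a)
    return sum(a[i] * a[j] * abs(w[i][j]) for i in range(n) for j in range(n))


a = [1.0, 2.0, 0.5]
w = [[0.0, 1.0, -0.5], [1.0, 0.0, 0.2], [-0.5, 0.2, 1.0]]
print(cut_norm_weighted(a, w), l1_norm_weighted(a, w))
```

For a non-negative $W$ the two norms coincide (take $U=V=S$); cancellations between positive and negative values are what make the cut norm smaller.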

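Given a graphon $\mathcal W=(W,\mathscr S)$ with $\mathscr S=(S,\mathcal S,\mu)$ and a set $\tilde S\in\mathcal S$, we say that $\tilde{\mathcal W}=(\tilde W,\tilde{\mathscr S})$ is the restriction of $\mathcal W$ to $\tilde S$ if $\tilde{\mathscr S}$ is the restriction of $\mathscr S$ to $\tilde S$ and $\tilde W=W|_{\tilde S\times\tilde S}$. We say that $\tilde{\mathcal W}=(\tilde W,\tilde{\mathscr S})$ is the trivial extension of $\mathcal W$ to $\tilde{\mathscr S}=(\tilde S,\tilde{\mathcal S},\tilde\mu)$ if $\mathscr S$ is the restriction of $\tilde{\mathscr S}$ to $S$ and $\tilde W(x,y)=W(x,y)$ for $x,y\in S$, $\tilde W(x,y)=0$ otherwise. For measure spaces $\mathscr S=(S,\mathcal S,\mu)$ and $\mathscr S'=(S',\mathcal S',\mu')$, a graphon $\mathcal W=(W,\mathscr S)$, and a measurable map $\phi\colon S'\to S$, we define the graphon $\mathcal W^\phi=(W^\phi,\mathscr S')$ by $W^\phi(x,y):=W(\phi(x),\phi(y))$ for $x,y\in S'$. We say that $W^\phi$ (resp. $\mathcal W^\phi$) is a pullback of $W$ (resp. $\mathcal W$) onto $\mathscr S'$. Finally, let $\|\cdot\|_1$ denote the $L^1$ norm.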
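To illustrate the pullback, take, purely as an example, $S=S'=\mathbb R_+$ with Lebesgue measure and the measure-preserving bijection $\phi$ that exchanges $[0,1)$ and $[1,2)$ by translation and fixes every other point. Then $W^\phi(x,y)=W(\phi(x),\phi(y))$ has the same integral as $W$. In the sketch below (hypothetical names; the graphon $e^{-xy}$ restricted to $[0,3)^2$ is our own choice), a midpoint Riemann sum on a grid aligned with the integers reproduces this, because $\phi$ permutes the grid cells.

```python
import math


def pullback(W, phi):
    """W^phi(x, y) = W(phi(x), phi(y))."""
    return lambda x, y: W(phi(x), phi(y))


def swap_first_two_units(x):
    # Measure-preserving bijection of R_+ exchanging [0, 1) and [1, 2) by translation.
    if x < 1:
        return x + 1
    if x < 2:
        return x - 1
    return x


def midpoint_integral(W, side, m):
    """Midpoint Riemann sum of W over [0, side]^2 on an m x m grid."""
    h = side / m
    return sum(W((i + 0.5) * h, (j + 0.5) * h) for i in range(m) for j in range(m)) * h * h


def W(x, y):
    # Example symmetric graphon with values in (0, 1], supported in [0, 3)^2.
    return math.exp(-x * y) if x < 3 and y < 3 else 0.0


Wphi = pullback(W, swap_first_two_units)
print(midpoint_integral(W, 3.0, 150), midpoint_integral(Wphi, 3.0, 150))
```

The two functions differ pointwise (for instance at $(0.5,0.5)$), yet they have the same integral, the same cut norm, and cut distance zero from each other; this is the kind of ambiguity that the infimum over couplings in the cut metric is designed to remove.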

Definition 5

For $i=1,2$, let $\mathcal W_i=(W_i,\mathscr S_i)$ with $\mathscr S_i=(S_i,\mathcal S_i,\mu_i)$ be a graphon.

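  • (i) If $\mu_1(S_1)=\mu_2(S_2)$, the cut metric $\delta_\square$ and invariant $L^1$ metric $\delta_1$ are defined by

    $$\delta_\square(\mathcal W_1,\mathcal W_2):=\inf_{\mu}\bigl\|W_1^{\pi_1}-W_2^{\pi_2}\bigr\|_{\square,S_1\times S_2,\mu}\quad\text{and}\quad\delta_1(\mathcal W_1,\mathcal W_2):=\inf_{\mu}\bigl\|W_1^{\pi_1}-W_2^{\pi_2}\bigr\|_{1,S_1\times S_2,\mu}, \tag{2}$$

    where $\pi_i\colon S_1\times S_2\to S_i$ denotes projection for $i=1,2$, and we take the infimum over all couplings $\mu$ of $\mu_1$ and $\mu_2$.

  • (ii) If $\mu_1(S_1)\neq\mu_2(S_2)$, let $\mathscr S_i'=(S_i',\mathcal S_i',\mu_i')$ be a σ-finite measure space extending $\mathscr S_i$ for $i=1,2$ such that $\mu_1'(S_1')=\mu_2'(S_2')$. Let $\mathcal W_i'$ be the trivial extension of $\mathcal W_i$ to $\mathscr S_i'$, and define $\delta_\square(\mathcal W_1,\mathcal W_2):=\delta_\square(\mathcal W_1',\mathcal W_2')$ and $\delta_1(\mathcal W_1,\mathcal W_2):=\delta_1(\mathcal W_1',\mathcal W_2')$.

  • (iii) We call two graphons $\mathcal W_1$ and $\mathcal W_2$ equivalent if $\delta_\square(\mathcal W_1,\mathcal W_2)=0$.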
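For graphons over spaces with finitely many atoms, the quantities in this definition become finite sums. In the sketch below, a graphon is represented (our own, hypothetical representation) by a list of atom masses and a symmetric matrix of values; `trivial_extension` adds one atom on which the extended graphon vanishes, and `l1_distance_for_coupling` evaluates $\|W_1^{\pi_1}-W_2^{\pi_2}\|_{1,S_1\times S_2,\mu}$ for a given coupling $\mu$. By definition, every coupling yields an upper bound on $\delta_1$; in the example the coupling is unique, so the bound is the exact value.

```python
def trivial_extension(mass, W, total):
    """Add one atom of mass total - sum(mass) on which the extended graphon is 0."""
    extra = total - sum(mass)
    if extra < 0:
        raise ValueError("total mass must be at least the current mass")
    n = len(mass)
    return list(mass) + [extra], [list(row) + [0.0] for row in W] + [[0.0] * (n + 1)]


def l1_distance_for_coupling(W1, W2, coupling):
    """||W1^{pi_1} - W2^{pi_2}||_1 under the coupling with coupling[a][b] = mu({(a, b)})."""
    atoms = [(a, b, m) for a, row in enumerate(coupling) for b, m in enumerate(row) if m > 0]
    return sum(m * mm * abs(W1[a][aa] - W2[b][bb]) for a, b, m in atoms for aa, bb, mm in atoms)


# W1 lives on two atoms of mass 1; W2 on a single atom of mass 2 with a constant value.
mass1, W1 = [1.0, 1.0], [[0.0, 1.0], [1.0, 0.0]]
mass2, W2 = [2.0], [[0.5]]
# Equal total mass, so part (i) applies; the only coupling puts mass1[a] on (a, 0).
coupling = [[1.0], [1.0]]
print(l1_distance_for_coupling(W1, W2, coupling))
```

When the total masses differ, part (ii) corresponds to calling `trivial_extension` on the lighter graphon before choosing a coupling.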

The following proposition will be proved in Appendix A. Recall that a pseudometric on a set $X$ is a function from $X\times X$ to $[0,\infty)$ which satisfies all the requirements of a metric, except that the distance between two different points might be zero.

Proposition 6

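The metrics $\delta_\square$ and $\delta_1$ given in Definition 5 are well defined; in other words, under the assumptions of (i) there exists at least one coupling $\mu$, and under the assumptions of (ii) the definitions of $\delta_\square(\mathcal W_1,\mathcal W_2)$ and $\delta_1(\mathcal W_1,\mathcal W_2)$ do not depend on the choice of extensions $\mathscr S_1'$ and $\mathscr S_2'$. Furthermore, $\delta_\square$ and $\delta_1$ are pseudometrics on the space of graphons.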

An important input to the proof of the proposition (Lemma 42 in Appendix A) is that the $\delta_\square$ (resp. $\delta_1$) distance between two graphons over spaces of equal measure, as defined in Definition 5(i), is invariant under trivial extensions. The lemma is proved by first showing that it holds for step functions (where the proof more or less boils down to an explicit calculation) and then using the fact that every graphon can be approximated by a step function.

We will see in Proposition 48 in Appendix B that under additional assumptions on the underlying measure spaces $\mathbf S_1$ and $\mathbf S_2$ the cut metric can be defined equivalently in a number of other ways, giving, in particular, the equivalence of the definitions (1) and (2) in the case of two Borel spaces of infinite mass. Similar results hold for the metric $\delta_1$; see Remark 49.

While the two metrics $\delta_\square$ and $\delta_1$ are not equivalent, a fact which is already well known from the theory of graph convergence for dense graphs, it turns out that the statement that two graphons have distance zero in the cut metric is equivalent to the same statement in the invariant $L^1$ metric. This is the content of our next proposition.

Proposition 7

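Let $\mathcal W_1$ and $\mathcal W_2$ be graphons. Then $\delta_\square(\mathcal W_1,\mathcal W_2)=0$ if and only if $\delta_1(\mathcal W_1,\mathcal W_2)=0$.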

The proposition will be proved in Appendix B. (We will actually prove a generalization of this proposition involving an invariant version of the metric; see Proposition 50.) The proof proceeds by first showing (Proposition 51) that if $\delta_\square(\mathcal W_1,\mathcal W_2)=0$ for graphons $\mathcal W_i=(W_i,\mathbf S_i)$ with $\mathbf S_i=(S_i,\mathcal S_i,\mu_i)$ for $i=1,2$, then there exists a particular measure $\mu$ on $S_1\times S_2$ such that $W_1^{\pi_1}=W_2^{\pi_2}$ holds $(\mu\times\mu)$-almost everywhere. Under certain conditions we may assume that $\mu$ is a coupling measure, in which case it follows that the infimum in the definition of $\delta_\square$ is a minimum; see Proposition 8 below.

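To state our next proposition we define a coupling between two graphons $\mathcal W_i=(W_i,\mathbf S_i)$ with $\mathbf S_i=(S_i,\mathcal S_i,\mu_i)$ for $i=1,2$ as a pair of graphons $(\widetilde{\mathcal W}_1,\widetilde{\mathcal W}_2)$ over a space of the form $(S_1\times S_2,\mathcal S_1\times\mathcal S_2,\mu)$, where $\mu$ is a coupling of $\mu_1$ and $\mu_2$ and $\widetilde{\mathcal W}_i=(W_i^{\pi_i},(S_1\times S_2,\mathcal S_1\times\mathcal S_2,\mu))$, and where as before, $\pi_i$ denotes the projection from $S_1\times S_2$ onto $S_i$ for $i=1,2$.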

Proposition 8

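Let $\mathcal W_1,\mathcal W_2$ be graphons over $\sigma$-finite Borel spaces $\mathbf S_i=(S_i,\mathcal S_i,\mu_i)$, and let $S_i'=\{x\in S_i:\int_{S_i}|W_i(x,y)|\,d\mu_i(y)>0\}$ for $i=1,2$. If $\delta_\square(\mathcal W_1,\mathcal W_2)=0$, then the restrictions of $\mathcal W_1$ and $\mathcal W_2$ to $S_1'$ and $S_2'$ can be coupled in such a way that they are equal almost everywhere.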

The proposition will be proved in Appendix B. Note that Janson (2016, Theorem 5.3) independently proved a similar result, building on a previous version of the present paper which did not yet contain Proposition 8. His result states that if the cut distance between two graphons over -finite Borel spaces is zero, then there are trivial extensions of these graphons such that the extensions can be coupled so as to be equal almost everywhere. It is easy to see that our result implies his, but we believe that with a little more work, it should be possible to deduce ours from his as well.

Remark 9

Note that the classical theory of graphons on probability spaces appears as a special case of the above definitions by taking $\mathbf S$ to be a probability space. Our definition of the cut metric is equivalent to the standard definition for graphons on probability spaces; see, for example, papers by Borgs, Chayes, Lovász, Sós, and Vesztergombi (2008) and Janson (2013). Note that $\delta_\square$ is not a true metric, only a pseudometric, but we call it a metric to be consistent with existing literature on graphons. However, it induces a metric on the set of equivalence classes of the equivalence relation in Definition 5(iii).

We work with graphons defined on general $\sigma$-finite measure spaces, rather than graphons on $\mathbb R_+$, since for certain random graphs or networks particular underlying spaces are more natural to consider. However, the following proposition shows that every graphon is equivalent to a graphon over $\mathbb R_+$.

Proposition 10

For each graphon $\mathcal W$ there exists a graphon $\mathcal W'=(W',\mathbb R_+)$ such that $\delta_\square(\mathcal W,\mathcal W')=0$.

The proof of the proposition follows a strategy similar to that of the proof of the analogous result for probability spaces by Borgs, Chayes, and Lovász (2010, Theorem 3.2) and Janson (2013, Theorem 7.1), and will be given in Appendix B. The proof uses, in particular, the result that an atomless $\sigma$-finite Borel space is isomorphic to an interval equipped with Lebesgue measure (Lemma 33).

2.3 Graph Convergence

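To define graph convergence in the cut metric, one traditionally (Borgs, Chayes, Lovász, Sós, and Vesztergombi, 2006; Lovász and Szegedy, 2006; Borgs, Chayes, Lovász, Sós, and Vesztergombi, 2008) embeds the set of graphs into the set of graphons via the following map. Given any finite weighted graph $G$ we define the canonical graphon $\mathcal W^G=(W^G,[0,1])$ as follows. Let $v_1,\dots,v_n$ be an ordering of the vertices of $G$. For any $i\in[n]$ let $\alpha_i$ denote the weight of $v_i$, for any $i,j\in[n]$ let $\beta_{ij}$ denote the weight of the edge $\{v_i,v_j\}$, and for $U\subseteq V(G)$ define $\alpha_U:=\sum_{v_i\in U}\alpha_i$. By rescaling the vertex weights if necessary we assume without loss of generality that $\alpha_{V(G)}=1$. If $G$ is simple all vertices have weight $1/n$, and we define $\beta_{ij}:=1$ if $\{v_i,v_j\}\in E(G)$ and $\beta_{ij}:=0$ otherwise. Let $I_1,\dots,I_n$ be a partition of $[0,1]$ into adjacent intervals of lengths $\alpha_1,\dots,\alpha_n$ (say the first one closed, and all others half open), and finally define $W^G$ by

$$W^G(x,y):=\beta_{ij}\qquad\text{if }(x,y)\in I_i\times I_j.$$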
Note that $W^G$ depends on the ordering of the vertices, but that different orderings give graphons with cut distance zero. We define a sequence $(G_n)_{n\in\mathbb N}$ of weighted, finite graphs to be sparse⁴ if $\|W^{G_n}\|_1\to 0$ as $n\to\infty$. Note that this generalizes the definition we gave at the very beginning of Section 2 for simple graphs.

⁴ In the case of weighted graphs there are multiple natural definitions of what it means for a sequence of graphs to be sparse or dense. Instead of considering the $L^1$ norm as in our definition, one may for example consider the fraction of edges with non-zero weight, either weighted by the vertex weights or not. In the current paper we do not define what it means for a sequence of weighted graphs to be dense, since it is not immediate which definition is most natural, and since the focus of this paper is sparse graphs.
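For a simple graph the canonical graphon and its $L^1$ norm are easy to compute directly. The Python sketch below is our own illustration: the vertices are labeled $0,\dots,n-1$ instead of $v_1,\dots,v_n$, the helper names are hypothetical, and it assumes the specific half-open convention $I_0=[0,1/n]$, $I_i=(i/n,(i+1)/n]$. Because each edge $\{i,j\}$ contributes the two squares $I_i\times I_j$ and $I_j\times I_i$ of area $1/n^2$, we get $\|W^G\|_1=2|E(G)|/n^2$. For paths this tends to $0$, so paths form a sparse sequence.

```python
import math
from fractions import Fraction


def interval_index(x, n):
    """Index i with x in I_i, where I_0 = [0, 1/n] and I_i = (i/n, (i+1)/n]."""
    return max(0, math.ceil(n * x) - 1)


def canonical_graphon(n, edges):
    """W^G of a simple graph on vertices 0, ..., n-1 (each of weight 1/n):
    W^G = 1 on I_i x I_j if {i, j} is an edge and 0 otherwise."""
    edge_set = {frozenset(e) for e in edges}

    def W(x, y):
        i, j = interval_index(x, n), interval_index(y, n)
        return 1 if frozenset((i, j)) in edge_set else 0

    return W


def canonical_l1_norm(n, edges):
    """||W^G||_1 = 2|E| / n^2 for a simple graph (no self-edges)."""
    return Fraction(2 * len(edges), n * n)


def path(n):
    """Edge list of the path 0 - 1 - ... - (n-1)."""
    return [(i, i + 1) for i in range(n - 1)]


W = canonical_graphon(3, path(3))
print(W(Fraction(1, 6), Fraction(1, 2)))  # 1
print([canonical_l1_norm(n, path(n)) for n in (10, 100, 1000)])
```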

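A sequence $(G_n)_{n\in\mathbb N}$ of graphs is then defined to be convergent in metric if $(\mathcal W^{G_n})_{n\in\mathbb N}$ is a Cauchy sequence in the metric $\delta_\square$, and it is said to be convergent to a graphon $\mathcal W$ if $\delta_\square(\mathcal W^{G_n},\mathcal W)\to 0$. Equivalently, one can define convergence of $(G_n)_{n\in\mathbb N}$ by identifying a weighted graph $G$ with the graphon $\mathcal W_G=(W_G,\mathbf S_G)$, where $\mathbf S_G$ consists of the vertex set $V(G)$ equipped with the probability measure given by the weights $\alpha_i$ (or the uniform measure if $G$ has no vertex weights), and $W_G$ is the function that maps $(v_i,v_j)$ to $\beta_{ij}$.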

In the classical theory of graph convergence a sequence of sparse graphs converges to the trivial graphon $\mathcal W=(W,[0,1])$ with $W\equiv 0$. This follows immediately from the fact that $\delta_\square(\mathcal W^{G_n},\mathcal W)\le\|W^{G_n}\|_1\to 0$ for sparse graphs. To address this problem, Bollobás and Riordan (2009) and Borgs, Chayes, Cohn, and Zhao (2014a) considered the sequence of reweighted graphons $(\mathcal W^{G_n,r})_{n\in\mathbb N}$, where $\mathcal W^{G,r}=(W^{G,r},[0,1])$ with $W^{G,r}:=\|W^G\|_1^{-1}W^G$ for any graph $G$, and defined $(G_n)_{n\in\mathbb N}$ to be convergent iff $(\mathcal W^{G_n,r})_{n\in\mathbb N}$ is convergent. The theory developed in the current paper considers a different rescaling, namely a rescaling of the arguments of the function $W^G$, which, as explained after Definition 11 below, is equivalent to rescaling the measure of the underlying measure space.
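As a worked instance of this degeneration (our own example), take $G_n=P_n$, the path on $n$ vertices, which has $n-1$ edges. Since the cut norm is bounded by the $L^1$ norm, the distance to the zero graphon $(0,[0,1])$ vanishes, whereas the graphon rescaled by its total edge mass keeps $L^1$ norm one:

$$\delta_\square\big(\mathcal W^{P_n},(0,[0,1])\big)\le\|W^{P_n}\|_1=\frac{2(n-1)}{n^2}\to 0\quad\text{as }n\to\infty,\qquad\text{while}\qquad\bigg\|\frac{W^{P_n}}{\|W^{P_n}\|_1}\bigg\|_1=1\ \text{ for every }n.$$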

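We define the stretched canonical graphon $\mathcal W^{G,s}$ to be identical to $\mathcal W^G$ except that we “stretch” the function $W^G$ to a function $W^{G,s}\colon\mathbb R_+^2\to\mathbb R$ such that $\|W^{G,s}\|_1=1$. More precisely, $\mathcal W^{G,s}=(W^{G,s},\mathbb R_+)$, where

$$W^{G,s}(x,y):=\begin{cases}W^G\big(\|W^G\|_1^{1/2}x,\ \|W^G\|_1^{1/2}y\big)&\text{if }0\le x,y\le\|W^G\|_1^{-1/2},\\0&\text{otherwise.}\end{cases}$$

Note that in the case of a simple graph $G$, each node in $G$ corresponds to an interval of length $1/|V(G)|$ in the canonical graphon $\mathcal W^G$, while it corresponds to an interval of length $1/\sqrt{2|E(G)|}$ in the stretched canonical graphon.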
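Both claims follow from a short substitution. The following computation is our own illustration, assuming $\|W^G\|_1>0$. Write $c=\|W^G\|_1^{1/2}$, so that $W^{G,s}(x,y)=W^G(cx,cy)$ on $[0,1/c]^2$ and $W^{G,s}=0$ elsewhere. Substituting $u=cx$, $v=cy$ gives

$$\|W^{G,s}\|_1=\int_0^{1/c}\!\!\int_0^{1/c}\big|W^G(cx,cy)\big|\,dx\,dy=\frac{1}{c^2}\int_0^1\!\!\int_0^1\big|W^G(u,v)\big|\,du\,dv=\frac{\|W^G\|_1}{c^2}=1.$$

For a simple graph, $\|W^G\|_1=2|E(G)|/|V(G)|^2$, so an interval of length $1/|V(G)|$ is mapped to one of length $\frac{1}{|V(G)|}\cdot\frac{1}{c}=\frac{1}{|V(G)|}\cdot\frac{|V(G)|}{\sqrt{2|E(G)|}}=\frac{1}{\sqrt{2|E(G)|}}$.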

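It will sometimes be convenient to define stretched canonical graphons for graphs with infinitely many vertices (but finitely⁵ many edges). Our definition of $\mathcal W^G$ makes no sense for simple graphs with infinitely many vertices, because they cannot all be crammed into the unit interval. Instead, given a finite or countably infinite graph $G$ with vertex weights $(\alpha_i)$ which do not necessarily sum to $1$ (and may even sum to $\infty$), we define a graphon $\widetilde{\mathcal W}^G=(\widetilde W^G,\mathbb R_+)$ by setting $\widetilde W^G(x,y):=\beta_{ij}$ if $(x,y)\in\widetilde I_i\times\widetilde I_j$, and $\widetilde W^G(x,y):=0$ if there exists no such pair $(i,j)$, with $\widetilde I_i$ being the interval $[a_{i-1},a_i)$, where we assume the vertices of $G$ have been labeled $v_1,v_2,\dots$, and $a_i:=\sum_{j=1}^{i}\alpha_j$ for $i\ge 0$. The stretched canonical graphon $\mathcal W^{G,s}$ will then be defined as the graphon $(W^{G,s},\mathbb R_+)$ with

$$W^{G,s}(x,y):=\widetilde W^G\big(\|\widetilde W^G\|_1^{1/2}x,\ \|\widetilde W^G\|_1^{1/2}y\big),$$

a definition which can easily be seen to be equivalent to the previous one if $G$ is a finite graph.

⁵ More generally, in the setting of weighted graphs, we can allow for infinitely many edges as long as $\|\widetilde W^G\|_1=\sum_{i,j}\alpha_i\alpha_j|\beta_{ij}|<\infty$.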
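The claimed equivalence for finite graphs can be spot-checked numerically. The sketch below makes several assumptions of its own: the helper names are hypothetical, it handles simple graphs only, it labels vertices $0,1,2,\dots$ rather than $v_1,v_2,\dots$, and it requires at least one edge. It builds the stretched graphon in two ways. The first starts from the canonical graphon on $[0,1]$ with vertex weights $1/n$. The second starts from the graphon on $\mathbb R_+$ in which every vertex has weight $1$ and vertex $k$ occupies $[k,k+1)$. It then compares the two at the centres of the stretched vertex intervals, since the constructions can differ only on interval endpoints, a null set.

```python
import math


def stretched_from_unit_interval(n, edges):
    """Stretched graphon built from W^G on [0, 1], vertex i of weight 1/n
    occupying I_0 = [0, 1/n] or I_i = (i/n, (i+1)/n]. Needs >= 1 edge."""
    edge_set = {frozenset(e) for e in edges}
    c = math.sqrt(2 * len(edges)) / n          # ||W^G||_1^{1/2}

    def W(x, y):
        if x > 1 / c or y > 1 / c:
            return 0
        i, j = (max(0, math.ceil(n * c * t) - 1) for t in (x, y))
        return int(frozenset((i, j)) in edge_set)

    return W


def stretched_from_unit_weights(edges):
    """Stretched graphon built from the R_+ version with every vertex of
    weight 1; only the finite edge list is used, so the vertex set may be
    infinite. Needs >= 1 edge."""
    edge_set = {frozenset(e) for e in edges}
    c = math.sqrt(2 * len(edges))              # L^1 norm of the R_+ version, ^(1/2)

    def W(x, y):
        i, j = math.floor(c * x), math.floor(c * y)
        return int(frozenset((i, j)) in edge_set)

    return W


edges = [(0, 1), (1, 2), (2, 3)]               # path on 4 vertices
A = stretched_from_unit_interval(4, edges)
B = stretched_from_unit_weights(edges)
s = math.sqrt(2 * len(edges))
# Centres of the stretched vertex cells (length 1/s), away from endpoints.
pts = [(m + 0.5) / s for m in range(6)]
print(all(A(x, y) == B(x, y) for x in pts for y in pts))  # True
```

The second construction never uses the number of vertices. This is what allows it to be applied to graphs with infinitely many vertices but finitely many edges.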

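Alternatively, one can define a stretched graphon as a graphon over $V(G)$ equipped with the measure $\widetilde\mu_G$, where

$$\widetilde\mu_G(U):=\Big(\sum_{i,j}\alpha_i\alpha_j|\beta_{ij}|\Big)^{-1/2}\sum_{v_i\in U}\alpha_i$$

for any $U\subseteq V(G)$. In the case where $\sum_i\alpha_i=1$, this graphon is obtained from the graphon representing $G$ by rescaling the probability measure

$$\mu_G(U):=\sum_{v_i\in U}\alpha_i$$

to the measure $\widetilde\mu_G$, while the function $(v_i,v_j)\mapsto\beta_{ij}$ is left untouched.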
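Under our reading that the rescaled measure assigns mass $\alpha_i/(\sum_{i,j}\alpha_i\alpha_j|\beta_{ij}|)^{1/2}$ to $v_i$, which is the rescaling of the vertex weights that makes the $L^1$ norm equal to one, the following Python sketch checks this normalization for a small weighted graph. It also checks that for a simple graph each vertex receives mass $1/\sqrt{2|E(G)|}$, matching the interval length in the stretched canonical graphon. The function names are hypothetical.

```python
import math


def stretched_vertex_measure(alpha, beta):
    """Masses alpha_i / (sum_{i,j} alpha_i alpha_j |beta_ij|)^(1/2) of the
    vertices under the rescaled measure (assumes the sum is positive)."""
    n = len(alpha)
    mass = sum(alpha[i] * alpha[j] * abs(beta[i][j])
               for i in range(n) for j in range(n))
    return [a / math.sqrt(mass) for a in alpha]


def l1_norm(mu, beta):
    """L^1 norm of (v_i, v_j) -> beta_ij with respect to the vertex masses mu."""
    n = len(mu)
    return sum(mu[i] * mu[j] * abs(beta[i][j])
               for i in range(n) for j in range(n))


alpha = [0.2, 0.3, 0.5]
beta = [[0, 2, 0], [2, 0, -1], [0, -1, 1]]
mu = stretched_vertex_measure(alpha, beta)
print(math.isclose(l1_norm(mu, beta), 1.0))  # True

# Simple path on 3 vertices (2 edges): every vertex gets mass 1/sqrt(4) = 1/2.
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(stretched_vertex_measure([1 / 3] * 3, path3))
```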

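Note that any graphon $\mathcal W=(W,[0,1])$ can be “stretched” in the same way as $\mathcal W^G$. In other words, given any graphon $\mathcal W=(W,[0,1])$ we may define a graphon $\mathcal W^s=(W^s,\mathbb R_+)$, where $W^s(x,y):=W(\varphi(x),\varphi(y))$ if $\varphi(x),\varphi(y)\in[0,1]$ and $W^s(x,y):=0$ otherwise, and $\varphi\colon\mathbb R_+\to\mathbb R_+$ is defined to be the linear map such that $\|W^s\|_1=1$, except when $\|W\|_1=0$, in which case we define the stretched graphon to be $\mathcal W^s:=\mathcal W$. But for graphons over general measure spaces this rescaling is not defined, since a general measure space carries no linear structure along which its points could be stretched. Instead, we consider a different, but related, notion of rescaling, namely rescaling the measure of the underlying space, a notion which directly generalizes the second (measure-rescaling) definition of the stretched canonical graphon given above.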

Definition 11
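  • For two graphons $\mathcal W_i=(W_i,\mathbf S_i)$ with $\mathbf S_i=(S_i,\mathcal S_i,\mu_i)$ for $i=1,2$, define the stretched cut metric $\delta_\square^s$ by

    $$\delta_\square^s(\mathcal W_1,\mathcal W_2):=\delta_\square(\mathcal W_1^s,\mathcal W_2^s),$$

    where $\mathcal W_i^s=(W_i,\mathbf S_i^s)$ with $\mathbf S_i^s=(S_i,\mathcal S_i,\mu_i^s)$ and $\mu_i^s:=\|W_i\|_{1,\mu_i}^{-1/2}\,\mu_i$. (In the particular case where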