Edge-exchangeable graphs and sparsity

Diana Cai (dcai@uchicago.edu)
Department of Statistics, University of Chicago, Chicago, IL, USA 60637

Trevor Campbell (tdjc@mit.edu) and Tamara Broderick (tbroderick@csail.mit.edu)
Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, Cambridge, MA, USA 02139

Abstract

Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2015), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.

Keywords: exchangeability, graph, edge exchangeability, Bayesian nonparametrics

1 Introduction

In recent years, network data have appeared in a growing number of applications, such as online social networks, biological networks, and networks representing communication patterns. As a result, there is growing interest in developing models for such data and studying their properties. Crucially, individual network data sets also continue to increase in size; we typically assume that the number of vertices is unbounded as time progresses. We say a graph sequence is dense if the number of edges grows quadratically in the number of vertices, and a graph sequence is sparse if the number of edges grows sub-quadratically as a function of the number of vertices. Sparse graph sequences are more representative of real-world graph behavior. However, many popular network models (see, e.g., Lloyd et al. (2012) for an extensive list) share the undesirable scaling property that they yield dense sequences of graphs with probability one. The poor scaling properties of these models can be traced back to a seemingly innocent assumption: that the vertices in the model are exchangeable, that is, any finite permutation of the rows and columns of the graph adjacency matrix does not change the distribution of the graph. Under this assumption, the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) implies that such models generate dense or empty graphs with probability one (Orbanz and Roy, 2015).

This fundamental model misspecification motivates the development of new models that can achieve sparsity. One recent focus has been on models in which an additional parameter is employed to uniformly decrease the probabilities of edges as the network grows (e.g., Bollobás et al. (2007); Borgs et al. (2014); Wolfe and Olhede (2013); Borgs et al. (2015)). While these models allow sparse graph sequences, the sequences are no longer projective. In projective sequences, vertices and edges are added to a graph as a graph sequence progresses—whereas in the models above, there is not generally any strict subgraph relationship between earlier graphs and later graphs in the sequence. Projectivity is natural in streaming modeling. For instance, we may wish to capture new users joining a social network and new connections being made among existing users—or new employees joining a company and new communications between existing employees.

Caron and Fox (2015) have pioneered initial work on sparse, projective graph sequences. Instead of the vertex exchangeability that yields the Aldous-Hoover theorem, they consider a notion of graph exchangeability based on the idea of independent increments of subordinators (Kallenberg, 2005), explored in depth by Veitch and Roy (2015). However, since this Kallenberg-style exchangeability introduces a new countable infinity of latent vertices at every step in the graph sequence, its generative mechanism seems particularly suited to the non-stationary domain. By contrast, we are here interested in exploring stationary models that grow in complexity with the size of the data set. Consider classic Bayesian nonparametric models such as the Chinese restaurant process (CRP) and Indian buffet process (IBP); these engender growth by using a single infinite latent collection of parameters to generate a finite but growing set of instantiated parameters. Similarly, we propose a framework that uses a single infinite latent collection of vertices to generate a finite but growing set of vertices that participate in edges and thereby in the network. We believe our framework will be a useful component in more complex, non-stationary graphical models, just as the CRP and IBP are often combined with hidden Markov models or other explicit non-stationary mechanisms. Additionally, Kallenberg exchangeability is intimately tied to continuous-valued labels of the vertices, and here we are interested in providing a characterization of the graph sequence based solely on its topology.

In this work, we introduce a new form of exchangeability, distinct from both vertex exchangeability and Kallenberg exchangeability. In particular, we say that a graph sequence is edge exchangeable if the distribution of any graph in the sequence is invariant to the order in which edges arrive—rather than the order of the vertices. We will demonstrate that edge exchangeability admits a large family of sparse, projective graph sequences.

In the remainder of the paper, we start by defining dense and sparse graph sequences rigorously. We review vertex exchangeability before introducing our new notion of edge exchangeability in Section 2, which we also contrast with Kallenberg exchangeability in more detail in Section 4. We define a family of models, which we call graph frequency models, based on random measures in Section 3. We use these models to show that edge-exchangeable models can yield sparse, projective graph sequences via theoretical analysis in Section 5 and via simulations in Section 6. Along the way, we highlight other benefits of the edge exchangeability and graph frequency model frameworks.

2 Exchangeability in graphs: old and new

Let $(G_n)_n := G_1, G_2, \ldots$ be a sequence of graphs, where each graph $G_n = (V_n, E_n)$ consists of a (finite) set of vertices $V_n$ and a (finite) multiset of edges $E_n$. Each edge $e \in E_n$ is a set of two vertices in $V_n$. We assume the sequence is projective (or growing), so that $V_n \subseteq V_{n+1}$ and $E_n \subseteq E_{n+1}$. Consider, e.g., a social network with more users joining the network and making new connections with existing users. We say that a graph sequence is dense if $|E_n| = \Omega(|V_n|^2)$, i.e., the number of edges is asymptotically lower bounded by $c \cdot |V_n|^2$ for some constant $c > 0$. Conversely, a sequence is sparse if $|E_n| = o(|V_n|^2)$, i.e., the number of edges is asymptotically upper bounded by $c \cdot |V_n|^2$ for all constants $c > 0$. In what follows, we consider random graph sequences, and we focus on the case where $|V_n| \to \infty$ almost surely.

2.1 Vertex-exchangeable graph sequences

If the number of vertices in the graph sequence grows to infinity, the graphs in the sequence can be thought of as subgraphs of an “infinite” graph with infinitely many vertices and a correspondingly infinite adjacency matrix. Traditionally, exchangeability in random graphs is defined as the invariance of the distribution of any finite submatrix of this adjacency matrix (corresponding to any finite collection of vertices) under finite permutation. Equivalently, we can express this form of exchangeability, which we henceforth call vertex exchangeability, by considering a random sequence of graphs $(G_n)_n$ with $V_n = [n]$, where $[n] := \{1, \ldots, n\}$. In this case, only the edge sequence is random. Let $\pi$ be any permutation of the integers $[n]$. If $e = \{v, w\}$, let $\pi(e) := \{\pi(v), \pi(w)\}$. If $E_n = \{e_1, \ldots, e_m\}$, let $\pi(E_n) := \{\pi(e_1), \ldots, \pi(e_m)\}$.

Definition. Consider the random graph sequence $(G_n)_n$, where $G_n$ has vertices $V_n = [n]$ and edges $E_n$. $(G_n)_n$ is (infinitely) vertex exchangeable if for every $n \in \mathbb{N}$ and for every permutation $\pi$ of the vertices $[n]$, $G_n \overset{d}{=} \tilde{G}_n$, where $\tilde{G}_n$ has vertices $[n]$ and edges $\pi(E_n)$.

A great many popular models for graphs are vertex exchangeable; see Appendix B and Lloyd et al. (2012) for a list. However, it follows from the Aldous-Hoover theorem (Aldous, 1981; Hoover, 1979) that any vertex-exchangeable graph is a mixture of sampling procedures from graphons. Further, any graph sampled from a graphon is almost surely dense or empty (Orbanz and Roy, 2015). Thus, vertex-exchangeable random graph models are misspecified models for sparse network datasets, as they generate dense graphs.

2.2 Edge-exchangeable graph sequences

Vertex-exchangeable sequences have distributions invariant to the order of vertex arrival. We introduce edge-exchangeable graph sequences, which will instead be invariant to the order of edge arrival. As before, we let $G_n = (V_n, E_n)$ be the $n$th graph in the sequence. Here, though, we consider only active vertices, that is, vertices that are connected via some edge. That lets us define $V_n$ as a function of $E_n$; namely, $V_n$ is the union of the vertices in $E_n$. Note that a graph that has sub-quadratic growth in the number of edges as a function of the number of active vertices will necessarily have sub-quadratic growth in the number of edges as a function of the number of all vertices, so we obtain strictly stronger results by considering active vertices. In this case, the graph $G_n$ is completely defined by its edge set $E_n$.

As above, we suppose that $E_n \subseteq E_{n+1}$. We can emphasize this projectivity property by augmenting each edge with the step on which it is added to the sequence. Let $E'_n$ be a collection of tuples, in which the first element is the edge and the second element is the step (i.e., index) on which the edge is added: $E'_n = \{(e_1, s_1), \ldots, (e_m, s_m)\}$. We can then define a step-augmented graph sequence $(E'_n)_n = (E'_1, E'_2, \ldots)$ as a sequence of step-augmented edge sets. Note that there is a bijection between the step-augmented graph sequence and the original graph sequence.
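The step augmentation can be made concrete with a small sketch. The code below is only an illustration under our own assumptions: the function names `step_augment` and `strip_steps`, the representation of each edge multiset as a Python list, and the toy edge sequence are all hypothetical and do not come from the paper.

```python
from collections import Counter


def step_augment(edge_sets):
    """Map a projective sequence of edge multisets E_1, E_2, ... to the
    step-augmented edge sets E'_1, E'_2, ... (lists of (edge, step) pairs)."""
    augmented = []
    current = []            # (edge, step) pairs accumulated so far
    previous = Counter()    # multiplicities of the edges in the previous graph
    for step, edges in enumerate(edge_sets, start=1):
        counts = Counter(frozenset(e) for e in edges)
        if previous - counts:
            raise ValueError("edge sets must grow: an earlier edge is missing")
        new = counts - previous               # copies added at this step
        for edge in sorted(new, key=sorted):  # deterministic order within a step
            current.extend([(edge, step)] * new[edge])
        augmented.append(list(current))
        previous = counts
    return augmented


def strip_steps(augmented):
    """Inverse direction: drop the step labels to recover E_1, E_2, ...."""
    return [[edge for edge, _ in pairs] for pairs in augmented]


# Hypothetical toy sequence: one edge arrives per step, the first pair twice.
edge_sets = [[(1, 2)], [(1, 2), (1, 2)], [(1, 2), (1, 2), (3, 4)]]
augmented = step_augment(edge_sets)
```

Dropping the steps again with `strip_steps` recovers the original multisets, which is one way to see the bijection mentioned above.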

Figure 1: Upper, left four: Step-augmented graph sequence from Ex. 2.2. At each step $n$, the step value is always at least the maximum vertex index. Upper, right two: Two graphs with the same probability under vertex exchangeability. Lower, left four: Step-augmented graph sequence from Ex. 2.2. Lower, right two: Two graphs with the same probability under edge exchangeability.
Example. In the setup for vertex exchangeability, we assumed $V_n = [n]$ and every edge is introduced as soon as both of its vertices are introduced. In this case, the step of any edge in the step-augmented graph is the maximum vertex value. For example, in Figure 1, we have

In general step-augmented graphs, though, the step need not equal the max vertex, as we see next.

Example. Suppose we have a graph given by the edge sequence (see Figure 1):

The step-augmented graph is

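Roughly, a random graph sequence is edge exchangeable if its distribution is invariant to finite permutations of the steps. Let $\pi$ be a permutation of the integers $[n]$. For a step-augmented edge set $E'_n = \{(e_1, s_1), \ldots, (e_m, s_m)\}$, let $\pi(E'_n) = \{(e_1, \pi(s_1)), \ldots, (e_m, \pi(s_m))\}$.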

Definition. Consider the random graph sequence $(G_n)_n$, where $G_n$ has step-augmented edges $E'_n$ and $V_n$ are the active vertices of $E_n$. $(G_n)_n$ is (infinitely) edge exchangeable if for every $n \in \mathbb{N}$ and for every permutation $\pi$ of the steps $[n]$, $G_n \overset{d}{=} \tilde{G}_n$, where $\tilde{G}_n$ has step-augmented edges $\pi(E'_n)$ and associated active vertices.

See Figure 1 for visualizations of both vertex exchangeability and edge exchangeability. It remains to show that there are non-trivial models that are edge exchangeable (Section 3) and that edge-exchangeable models admit sparse graphs (Section 5).
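To illustrate what a permutation of the steps does, the following sketch relabels the steps of a step-augmented edge set. The helper names `permute_steps` and `arrival_order`, the dictionary encoding of the permutation, and the toy data are our own hypothetical choices.

```python
def permute_steps(augmented_edges, perm):
    """Relabel steps: each (edge, step) pair becomes (edge, perm[step]).

    `perm` is a dict that must be a bijection on {1, ..., n}.
    """
    n = len(perm)
    steps = list(range(1, n + 1))
    if sorted(perm.keys()) != steps or sorted(perm.values()) != steps:
        raise ValueError("perm must be a permutation of 1..n")
    return [(edge, perm[step]) for edge, step in augmented_edges]


def arrival_order(augmented_edges):
    """Edges listed in the order in which they arrive."""
    return [edge for edge, _ in sorted(augmented_edges, key=lambda pair: pair[1])]


# Hypothetical step-augmented edge set with n = 3 steps.
e_aug = [((1, 2), 1), ((1, 2), 2), ((3, 4), 3)]
pi = {1: 3, 2: 1, 3: 2}
e_perm = permute_steps(e_aug, pi)
```

The permuted set contains the same edges with the same multiplicities; only the order of arrival changes. Edge exchangeability asks that the two arrival orders be equally probable.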

3 Edge-exchangeable graph frequency models

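We next demonstrate that a wide class of models, which we call graph frequency models, exhibit edge exchangeability. Consider a latent infinity of vertices indexed by the positive integers $\mathbb{N} = \{1, 2, \ldots\}$, along with an infinity of edge labels $(\theta_{\{i,j\}})$, each in a set $\Theta$, and positive edge rates (or frequencies) $(w_{\{i,j\}})$ in $\mathbb{R}_+$. We allow both the $(\theta_{\{i,j\}})$ and $(w_{\{i,j\}})$ to be random, though this is not mandatory. For instance, we might choose $\theta_{\{i,j\}} = (i, j)$ for $i \le j$, and $\Theta = \mathbb{R}^2$. Alternatively, the $\theta_{\{i,j\}}$ could be drawn iid from a continuous distribution such as $\mathrm{Unif}[0,1]$. For any choice of $(\theta_{\{i,j\}})$ and $(w_{\{i,j\}})$,

$$W := \sum_{\{i,j\} : i,j \in \mathbb{N}} w_{\{i,j\}} \, \delta_{\theta_{\{i,j\}}} \tag{1}$$

is a measure on $\Theta$. Moreover, it is a discrete measure since it is always atomic. If either $(\theta_{\{i,j\}})$ or $(w_{\{i,j\}})$ (or both) are random, $W$ is a discrete random measure on $\Theta$ since it is a random, discrete-measure-valued element. Given the edge rates (or frequencies) $(w_{\{i,j\}})$ in $W$, we next show some natural ways to construct edge-exchangeable graphs.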

Single edge per step

If the rates $(w_{\{i,j\}})$ are normalized such that $\sum_{\{i,j\} : i,j \in \mathbb{N}} w_{\{i,j\}} = 1$, then $(w_{\{i,j\}})$ is a distribution over all possible vertex pairs. In other words, $W$ is a probability measure. We can form an edge-exchangeable graph sequence by first drawing values for $(w_{\{i,j\}})$ and $(\theta_{\{i,j\}})$, and setting $E_0 = \emptyset$. We recursively set $E_{n+1} = E_n \cup \{e\}$, where $e$ is an edge $\{i,j\}$ chosen from the distribution $(w_{\{i,j\}})$. This construction introduces a single edge in the graph each step, although it may be a duplicate of an edge that already exists. Therefore, this technique generates multigraphs one edge at a time. Since the edge every step is drawn conditionally iid given $W$, we have an edge-exchangeable graph.
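A minimal sketch of the single-edge-per-step construction follows. It assumes a finite dictionary of rates as a stand-in for the infinite collection of vertex pairs; the function name `sample_single_edge_sequence` and the example rates are hypothetical.

```python
import random


def sample_single_edge_sequence(rates, n_steps, seed=0):
    """Draw E_1, ..., E_n by adding one edge per step, iid from normalized rates.

    `rates` maps vertex pairs (i, j) to nonnegative rates; it is a finite
    stand-in for the infinite collection of rates in the text.
    """
    rng = random.Random(seed)
    pairs = list(rates)
    total = sum(rates.values())
    probs = [rates[pair] / total for pair in pairs]   # normalize so rates sum to 1
    edges = []                                        # the initial edge set is empty
    sequence = []
    for _ in range(n_steps):
        edge = rng.choices(pairs, weights=probs, k=1)[0]
        edges = edges + [edge]                        # add the new edge (duplicates allowed)
        sequence.append(edges)
    return sequence


# Hypothetical rates on three vertex pairs.
rates = {(1, 2): 0.5, (1, 3): 0.3, (2, 3): 0.2}
sequence = sample_single_edge_sequence(rates, n_steps=10, seed=1)
```

Because each edge is drawn iid given the rates, every reordering of the drawn edges has the same conditional probability, which is the source of edge exchangeability in this construction.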

Multiple edges per step

Alternatively, the rates $(w_{\{i,j\}})$ may not be normalized. Then $W$ may not be a probability measure. Let $f(m \mid w)$ be a distribution over non-negative integers $m$ given some rate $w \in \mathbb{R}_+$. We again initialize our sequence by drawing $(w_{\{i,j\}})$ and $(\theta_{\{i,j\}})$ and setting $E_0 = \emptyset$. In this case, recursively, on the $n$th step, start by setting $F = \emptyset$. For every possible edge $e = \{i,j\}$, we draw the multiplicity of the edge $e$ in this step as $m_e \overset{ind}{\sim} f(\cdot \mid w_e)$ and add $m_e$ copies of edge $e$ to $F$. Finally, $E_{n+1} = E_n \cup F$. This technique potentially introduces multiple edges in each step, in which edges themselves may have multiplicity greater than one and may be duplicates of edges that already exist in the graph. Therefore, this technique generates multigraphs, multiple edges at a time. If we restrict $f$ and $W$ such that finitely many edges are added on every step almost surely, we have an edge-exchangeable graph, as the edges in each step are drawn conditionally iid given $W$.

Given a sequence of edge sets $E_0, E_1, \ldots$ constructed via either of the above methods, we can form a binary graph sequence $\bar{E}_0, \bar{E}_1, \ldots$ by setting $\bar{E}_i$ to have the same edges as $E_i$ except with multiplicity $1$. Although this binary graph is not itself edge exchangeable, it inherits many of the properties (such as sparsity, as shown in Section 5) of the underlying edge-exchangeable multigraph.
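The multiple-edges-per-step construction and the binary restriction can be sketched as follows. This is an illustration under stated assumptions: the rates live on a finite dictionary of vertex pairs, the per-step multiplicity distribution defaults to a Bernoulli draw (one admissible choice), and the names `bernoulli`, `sample_multi_edge_sequence`, and `binary_restriction` are hypothetical.

```python
import random


def bernoulli(rng, w):
    """One possible multiplicity distribution: 1 with probability w, else 0."""
    return 1 if rng.random() < w else 0


def sample_multi_edge_sequence(rates, n_steps, draw=bernoulli, seed=0):
    """At every step, draw a multiplicity for each vertex pair and add that
    many copies of the pair to the running edge multiset."""
    rng = random.Random(seed)
    edges = []                              # the initial edge multiset is empty
    sequence = []
    for _ in range(n_steps):
        new_edges = []                      # edges introduced at this step
        for pair, w in rates.items():
            new_edges.extend([pair] * draw(rng, w))
        edges = edges + new_edges           # union of multisets
        sequence.append(edges)
    return sequence


def binary_restriction(edges):
    """Keep each distinct edge once (multiplicity 1)."""
    return sorted(set(edges))


# Hypothetical, unnormalized rates chosen so the outcome is deterministic.
rates = {(1, 2): 1.0, (1, 3): 0.0}
sequence = sample_multi_edge_sequence(rates, n_steps=3)
```

With these degenerate rates the pair (1, 2) is added once per step and (1, 3) never appears, so the multigraph after three steps has three copies of one edge while its binary restriction has a single edge.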

The choice of the distribution on the measure $W$ has a strong influence on the properties of the resulting edge-exchangeable graph sampled via one of the above methods. For example, one choice is to set $w_{\{i,j\}} = w_i w_j$, where the $(w_i)_i$ are a countable infinity of random values generated according to a Poisson point process (PPP). We say that $(w_i)_i$ is distributed according to a Poisson point process parameterized by rate measure $\nu$, $(w_i)_i \sim \mathrm{PPP}(\nu)$, if (a) $\#\{i : w_i \in A\} \sim \mathrm{Poisson}(\nu(A))$ for any set $A$ with finite measure $\nu(A)$ and (b) $\#\{i : w_i \in A_j\}$ are independent random variables across any finite collection of disjoint sets $(A_j)_{j=1}^J$. In Section 5 we examine a particular example of this graph frequency model, and demonstrate that sparsity is possible in edge-exchangeable graphs.

4 Related work and connection to nonparametric Bayes

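Given a unique label $\theta_i \in \mathbb{R}_+$ for each vertex $i \in \mathbb{N}$, and denoting $g_{ij} = g_{ji}$ to be the number of undirected edges between vertices $i$ and $j$, the graph itself can be represented as the discrete random measure $G = \sum_{i,j} g_{ij} \delta_{(\theta_i, \theta_j)}$ on $\mathbb{R}_+^2$. A different notion of exchangeability for graphs than the ones in Section 2 can be phrased for such atomic random measures: a point process $G$ on $\mathbb{R}_+^2$ is (jointly) exchangeable if, for all finite permutations $\pi$ of $\mathbb{N}$ and all $h > 0$,

$$\big(G(A_i \times A_j)\big)_{(i,j) \in \mathbb{N}^2} \overset{d}{=} \big(G(A_{\pi(i)} \times A_{\pi(j)})\big)_{(i,j) \in \mathbb{N}^2}, \qquad \text{where } A_i := [h \cdot (i-1), h \cdot i].$$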

This form of exchangeability, which we refer to as Kallenberg exchangeability, can intuitively be viewed as invariance of the graph distribution to relabeling of the vertices, which are now embedded in $\mathbb{R}_+$. As such it is analogous to vertex exchangeability, but for discrete random measures (Caron and Fox, 2015, Sec. 4.1). Exchangeability for random measures was introduced by Aldous (1985), and a representation theorem was given by Kallenberg (1990; 2005, Ch. 9). The use of Kallenberg exchangeability for modeling graphs was first proposed by Caron and Fox (2015), and then characterized in greater generality by Veitch and Roy (2015) and Borgs et al. (2016). Edge exchangeability is distinct from Kallenberg exchangeability, as shown by the following example.

Example (Edge exchangeable but not Kallenberg exchangeable). Consider the graph frequency model developed in Section 3, with and . Since the edges at each step are drawn iid given $W$, the graph sequence is edge exchangeable. However, the corresponding graph measure (where ) is not Kallenberg exchangeable, since the probability of generating edge is directly related to the positions and in of the corresponding atoms in (in particular, the probability is decreasing in ).

(a) Graph frequency model (fixed $y$, $n$ steps)
(b) Caron–Fox, PPP on $[0, y] \times [0, y]$ (1 step, $y$ grows)
Figure 2: A comparison of a graph frequency model (Section 3 and Equation 2) and the generative model of Caron and Fox (2015). Any interval $[0, y]$ contains a countably infinite number of atoms with a nonzero weight in the random measure; a draw from the random measure is plotted at the top (and repeated on the right side). Each atom corresponds to a latent vertex. Each point corresponds to a latent edge. Darker point colors on the left occur for greater edge multiplicities. On the left, more latent edges are instantiated as more steps $n$ are taken. On the right, the edges within $[0, y]^2$ are fixed, but more edges are instantiated as $y$ grows.

Our graph frequency model is reminiscent of the Caron and Fox (2015) generative model, but has a number of key differences. At a high level, this earlier model generates a weight measure $W = \sum_{i,j} w_{ij} \delta_{(\theta_i, \theta_j)}$ (Caron and Fox (2015) used, in particular, the outer product of a completely random measure), and the graph measure $G$ is constructed by sampling $g_{ij}$ once given $w_{ij}$ for each pair $i, j$. To create a finite graph, the graph measure $G$ is restricted to the subset $[0, y] \times [0, y]$ for $0 < y < \infty$; to create a projective growing graph sequence, the value of $y$ is increased. By contrast, in the analogous graph frequency model of the present work, $y$ is fixed, and we grow the network by repeatedly sampling the number of edges $g_{ij}$ between vertices $i$ and $j$ and summing the result. Thus, in the Caron and Fox (2015) model, a latent infinity of vertices (only finitely many of which are active) is added to the network each time $y$ increases. In our graph frequency model, there is a single collection of latent vertices, which are all gradually activated by increasing the number of samples that generate edges between the vertices. See Figure 2 for an illustration.

Increasing $n$ in the graph frequency model has the interpretation of both (a) time passing and (b) new individuals joining a network because they have formed a connection that was not previously there. In particular, only latent individuals that will eventually join the network are considered. This behavior is analogous to the well-known behavior of other nonparametric Bayesian models such as, e.g., a Chinese restaurant process (CRP). In this analogy, the Dirichlet process (DP) corresponds to our graph frequency model, and the clusters instantiated by the CRP correspond to the vertices that are active after $n$ steps. In the DP, only latent clusters that will eventually appear in the data are modeled. Since the graph frequency setting is stationary like the DP/CRP, it may be more straightforward to develop approximate Bayesian inference algorithms, e.g., via truncation (Campbell et al., 2016b).

Edge exchangeability first appeared in work by Crane and Dempsey (2015a, b); Williamson (2016), and Broderick and Cai (2015a, b); Cai and Broderick (2015). Broderick and Cai (2015a, b) established the notion of edge exchangeability used here and provided characterizations via exchangeable partitions and feature allocations, as in Appendix C. Broderick and Cai (2015a); Cai and Broderick (2015) developed a frequency model based on weights generated from a Poisson process and studied several types of power laws in the model. Crane and Dempsey (2015a) established a similar notion of edge exchangeability in the context of a larger statistical modeling framework. Crane and Dempsey (2015b, a) provided sparsity and power law results for the case where the weights are generated from a Pitman-Yor process and power law degree distribution simulations. Williamson (2016) described a similar notion of edge exchangeability and developed an edge-exchangeable model where the weights are generated from a Dirichlet process, a mixture model extension, and an efficient Bayesian inference procedure. In work concurrent to the present paper, Crane and Dempsey (2016) re-examined edge exchangeability, provided a representation theorem, and studied sparsity and power laws for the same model based on Pitman-Yor weights. By contrast, we here obtain sparsity results across all Poisson point process-based graph frequency models of the form in Equation 2 below, and use a specific three-parameter beta process rate measure only for simulations in Section 6.

5 Sparsity in Poisson process graph frequency models

We now demonstrate that, unlike vertex exchangeability, edge exchangeability allows for sparsity in random graph sequences. We develop a class of sparse, edge-exchangeable multigraph sequences via the Poisson point process construction introduced in Section 3, along with their binary restrictions.

Model

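Let $W = (w_i)_i$ be a Poisson process on $[0, 1]$ with a nonatomic, $\sigma$-finite rate measure $\nu$ satisfying $\nu([0,1]) = \infty$ and $\int_0^1 w \, \nu(dw) < \infty$. These two conditions on $\nu$ guarantee that $W$ is a countably infinite collection of rates in $[0, 1]$ and that $\sum_{w \in W} w < \infty$ almost surely. We can use $W$ to construct the set of rates: $w_{\{i,j\}} = w_i w_j$ if $i \neq j$, and $w_{\{i,i\}} = 0$. The edge labels $\theta_{\{i,j\}}$ are unimportant in characterizing sparsity, and so can be ignored.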

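To use the multiple-edges-per-step graph frequency model from Section 3, we let $f(\cdot \mid w)$ be Bernoulli with probability $w$. Since edge $\{i,j\}$ is added in each step with probability $w_i w_j$, its multiplicity $M_{\{i,j\}}$ after $n$ steps has a binomial distribution with parameters $n, w_i w_j$. Note that self-loops are avoided by setting $w_{\{i,i\}} = 0$. Therefore, the graph after $n$ steps is described by:

$$W \sim \mathrm{PPP}(\nu), \qquad M_{\{i,j\}} \overset{ind}{\sim} \mathrm{Binom}(n, w_i w_j) \quad \text{for } i < j \in \mathbb{N}. \tag{2}$$

As mentioned earlier, this generative model yields an edge-exchangeable graph, with edge multiset $E_n$ containing $\{i,j\}$ with multiplicity $M_{\{i,j\}}$, and active vertices $V_n = \{i : \sum_j M_{\{i,j\}} > 0\}$. Although this model generates multigraphs, it can be modified to sample a binary graph $(\bar{V}_n, \bar{E}_n)$ by setting $\bar{V}_n = V_n$ and $\bar{E}_n$ to the set of edges $\{i,j\}$ such that $\{i,j\}$ has multiplicity $\geq 1$ in $E_n$. We can express the number of vertices and edges, in the multi- and binary graphs respectively, as

$$|\bar{V}_n| = |V_n| = \sum_i \mathbb{1}\Big(\sum_{j \neq i} M_{\{i,j\}} > 0\Big), \qquad |E_n| = \frac{1}{2} \sum_{i \neq j} M_{\{i,j\}}, \qquad |\bar{E}_n| = \frac{1}{2} \sum_{i \neq j} \mathbb{1}\big(M_{\{i,j\}} > 0\big).$$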
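The counts of active vertices and edges can be simulated directly from the binomial description. The sketch below assumes a finite list of vertex rates in [0, 1] (a truncation of the infinite collection) and draws each binomial multiplicity as a sum of Bernoulli trials; the function name `simulate_counts` and the example rates are hypothetical.

```python
import random


def simulate_counts(vertex_rates, n_steps, seed=0):
    """Return (active vertices, multigraph edges, binary edges) after n steps.

    The multiplicity of pair {i, j} is Binomial(n_steps, w_i * w_j), drawn here
    as a sum of Bernoulli trials; self-loops are excluded.
    """
    rng = random.Random(seed)
    active = set()
    multi_edges = 0
    binary_edges = 0
    for i in range(len(vertex_rates)):
        for j in range(i + 1, len(vertex_rates)):
            p = vertex_rates[i] * vertex_rates[j]
            multiplicity = sum(rng.random() < p for _ in range(n_steps))
            if multiplicity > 0:
                active.update((i, j))      # both endpoints become active
                multi_edges += multiplicity
                binary_edges += 1          # counted once in the binary graph
    return len(active), multi_edges, binary_edges


# Hypothetical rates chosen so that the result is deterministic.
counts = simulate_counts([1.0, 1.0, 0.0], n_steps=5)
```

Here only the pair of the first two vertices has a nonzero rate product, so the sketch reports two active vertices, five multigraph edges, and one binary edge.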

Moments

Recall that a sequence of graphs is considered sparse if $|E_n| = o(|V_n|^2)$. Thus, sparsity in the present setting is an asymptotic property of a random graph sequence. Rather than consider the asymptotics of the (dependent) random sequences $|E_n|$ and $|V_n|$ in concert, the first lemma below allows us to consider the asymptotics of their first moments, which are deterministic sequences and can be analyzed separately. We use $\sim$ to denote asymptotic equivalence, i.e., $a_n \sim b_n \iff \lim_{n \to \infty} a_n / b_n = 1$. For details on our asymptotic notation and proofs for this section, see Appendix D.

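Lemma.

The number of vertices and edges for both the multi- and binary graphs satisfy

$$|\bar{V}_n| = |V_n| \overset{a.s.}{\sim} \mathbb{E}(|V_n|), \qquad |E_n| \overset{a.s.}{\sim} \mathbb{E}(|E_n|), \qquad |\bar{E}_n| \overset{a.s.}{\sim} \mathbb{E}(|\bar{E}_n|), \qquad n \to \infty.$$

Thus, we can examine the asymptotic behavior of the random numbers of edges and vertices by examining the asymptotic behavior of their expectations, which are provided by the next lemma.

Lemma.

The expected numbers of vertices and edges for the multi- and binary graphs are

$$\mathbb{E}(|\bar{V}_n|) = \mathbb{E}(|V_n|) = \int \Big[ 1 - \exp\Big( -\int \big(1 - (1 - wv)^n\big) \, \nu(dv) \Big) \Big] \nu(dw),$$

$$\mathbb{E}(|E_n|) = \frac{n}{2} \iint wv \, \nu(dw) \, \nu(dv), \qquad \mathbb{E}(|\bar{E}_n|) = \frac{1}{2} \iint \big(1 - (1 - wv)^n\big) \, \nu(dw) \, \nu(dv).$$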
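As an illustration of how such expectations can arise, the following sketch derives the expected number of multigraph edges. It assumes the binomial model of Equation 2 and the standard second-moment (Campbell) formula for Poisson point processes; the symbols $M_{\{i,j\}}$ for edge multiplicities, $w_i$ for vertex rates, and $w, v$ for the integration variables are notational assumptions.

```latex
\begin{align*}
\mathbb{E}|E_n|
  &= \tfrac{1}{2}\, \mathbb{E}\Big[ \sum_{i \neq j} \mathbb{E}\big( M_{\{i,j\}} \mid W \big) \Big]
   = \tfrac{n}{2}\, \mathbb{E}\Big[ \sum_{i \neq j} w_i w_j \Big] \\
  &= \tfrac{n}{2} \iint w\, v \,\nu(dw)\,\nu(dv)
   = \tfrac{n}{2} \Big( \int_0^1 w\, \nu(dw) \Big)^{2}.
\end{align*}
```

The first line uses the binomial mean $n w_i w_j$, and the second uses the fact that the sum over distinct pairs of points of a Poisson process has mean given by the product of the rate measure with itself. Under the condition $\int_0^1 w \, \nu(dw) < \infty$, the result is finite and grows linearly in $n$.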

Sparsity

We are now equipped to characterize the sparsity of this random graph sequence:

Theorem. Suppose $\nu$ has a regularly varying tail, i.e., there exist $\alpha \in (0, 1)$ and a slowly varying function $\ell$ s.t.

$$\int_x^1 \nu(dw) \sim x^{-\alpha} \ell(x^{-1}), \qquad x \to 0.$$

Then as $n \to \infty$,

The theorem above implies that the multigraph is sparse when $\alpha \in (1/2, 1)$, and that the restriction to the binary graph is sparse for any $\alpha \in (0, 1)$. See Remark D.3 for a discussion. Thus, edge-exchangeable random graph sequences allow for a wide range of sparse and dense behavior.

6 Simulations

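In this section, we explore the behavior of graphs generated by the model from Section 5 via simulation, with the primary goal of empirically demonstrating that the model produces sparse graphs. We consider the case when the Poisson process generating the weights in Equation 2 has the rate measure of a three-parameter beta process (3-BP) on $(0, 1)$ (Teh and Görür, 2009; Broderick et al., 2012):

$$\nu(dw) = \gamma \, \frac{\Gamma(1 + \beta)}{\Gamma(1 - \alpha) \, \Gamma(\beta + \alpha)} \, w^{-1-\alpha} (1 - w)^{\beta + \alpha - 1} \, dw, \tag{3}$$

with mass $\gamma$, concentration $\beta$, and discount $\alpha$. In order for the 3-BP to have finite total mass $\sum_j w_j < \infty$, we require that $\beta > -\alpha$. We draw realizations of the weights from a $\text{3-BP}(\gamma, \beta, \alpha)$ according to the stick-breaking representation given by Broderick, Jordan, and Pitman (2012). That is, the $w_i$ are the atom weights of the measure $W$ for

$$W = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{i,j}^{(i)} \prod_{l=1}^{i-1} \big(1 - V_{i,j}^{(l)}\big) \, \delta_{\psi_{i,j}}, \qquad C_i \overset{iid}{\sim} \mathrm{Pois}(\gamma), \quad V_{i,j}^{(l)} \overset{ind}{\sim} \mathrm{Beta}(1 - \alpha, \beta + l\alpha), \quad \psi_{i,j} \overset{iid}{\sim} B_0,$$

and any continuous (i.e., non-atomic) choice of distribution $B_0$.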

Since simulating an infinite number of atoms is not possible, we truncate the outer summation in $W$ to 2000 rounds, resulting in $\sum_{i=1}^{2000} C_i$ weights. The parameters of the beta process were fixed to and , as they do not influence the sparsity of the resulting graph frequency model, and we varied the discount parameter $\alpha$. Given a single draw $W$ (at some specific discount $\alpha$), we then simulated the edges of the graph, where the number of Bernoulli draws varied between 50 and 2000.
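A truncated stick-breaking draw of this kind might be sketched as below. This is a sketch under assumptions, not the simulation code used for the figures: it assumes the stick-breaking representation takes the form in which round $i$ contributes a Poisson number of atoms, each with weight given by a Beta-distributed stick proportion times the product of $i - 1$ complementary proportions. The function names, the Poisson sampler (Knuth's multiplication method), and the parameter values in the usage line are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw via Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def three_bp_weights(mass, concentration, discount, rounds, seed=0):
    """Atom weights from a stick-breaking construction truncated to `rounds`.

    Assumed form: round i contributes Poisson(mass) atoms, each with weight
    V_i * prod_{l < i} (1 - V_l), where V_l ~ Beta(1 - discount,
    concentration + l * discount) independently.
    """
    rng = random.Random(seed)
    weights = []
    for i in range(1, rounds + 1):
        for _ in range(poisson(rng, mass)):
            remaining = 1.0
            for l in range(1, i):
                remaining *= 1.0 - rng.betavariate(1.0 - discount,
                                                   concentration + l * discount)
            stick = rng.betavariate(1.0 - discount, concentration + i * discount)
            weights.append(stick * remaining)
    return weights


# Hypothetical parameter values, not the settings used in the paper.
weights = three_bp_weights(mass=3.0, concentration=1.0, discount=0.5, rounds=20)
```

The cost of this direct implementation grows quadratically in the number of rounds, since an atom in round $i$ requires $i$ Beta draws; that is acceptable for a few thousand rounds but is a design choice made here for clarity rather than speed.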

Figure 3(a) shows how the number of edges varies versus the total number of active vertices for the multigraph, with different colors representing different random seeds. To check whether the generated graph was sparse, we determined the exponent by examining the slope of the data points (on a log-scale). In all plots, the black dashed line is a line with slope 2. In the multigraph, we found that for the discount parameter settings , the slopes were below 2; for , the slopes were greater than 2. This corresponds to our theoretical results; for $\alpha < 0.5$ the multigraph is dense with slope greater than 2, and for $\alpha > 0.5$ the multigraph is sparse with slope less than 2. Furthermore, the sparse graphs exhibit power law relationships between the number of edges and vertices, i.e., , where , as suggested by the linear relationship in the plots between the quantities on a log-scale. Note that there are necessarily fewer edges in the binary graph than in the multigraph, and thus this plot implies that the binary graph frequency model can also capture sparsity. Figure 3(b) confirms this observation; it shows how the number of edges varies with the number of active vertices for the binary graph. In this case, across , we observe slopes that are less than 2. This agrees with our theory from Section 5, which states that the binary graph is sparse for any $\alpha \in (0, 1)$.
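The slope check described above amounts to an ordinary least-squares fit on the log-log scale. The sketch below shows one way to compute it; the function name `loglog_slope` and the synthetic data are hypothetical and are not the simulation output reported in the figures.

```python
import math


def loglog_slope(num_vertices, num_edges):
    """Least-squares slope of log(edges) against log(vertices)."""
    xs = [math.log(v) for v in num_vertices]
    ys = [math.log(e) for e in num_edges]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data that follow an exact power law with exponent 1.5.
vertices = [10, 100, 1000]
edges = [v ** 1.5 for v in vertices]
slope = loglog_slope(vertices, edges)
```

A fitted slope below 2 is consistent with sub-quadratic growth of the number of edges in the number of active vertices, while a slope of 2 or more is consistent with dense behavior.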

(a) Multigraph edges vs. active vertices
(b) Binary graph edges vs. active vertices
Figure 3: Data simulated from a graph frequency model with weights generated according to a 3-BP. Colors represent different random draws. The dashed line has a slope of 2.

7 Conclusions

We have proposed an alternative form of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We have demonstrated that edge-exchangeable graph sequences, unlike traditional vertex-exchangeable sequences, can be sparse by developing a class of edge-exchangeable graph frequency models that provably exhibit sparsity. Simulations using edge frequencies drawn according to a three-parameter beta process confirm our theoretical results regarding sparsity. Our results suggest that a variety of future directions would be fruitful—including theoretically characterizing different types of power laws within graph frequency models, characterizing the use of truncation within graph frequency models as a means for approximate Bayesian inference in graphs, and understanding the full range of distributions over sparse, edge-exchangeable graph sequences.

Acknowledgments

We would like to thank Bailey Fosdick and Tyler McCormick for helpful conversations.

Appendix A Overview

In Appendix B, we provide more examples of graph models that are either vertex exchangeable or Kallenberg exchangeable. In Appendix C, we establish characterizations of edge exchangeability in graphs via existing notions of exchangeability for combinatorial structures such as random partitions and feature allocations. In Appendix D, we provide full proof details for the theoretical results in the main text.

Appendix B More exchangeable graph models

Many popular graph models are vertex exchangeable. These models include the classic Erdős–Rényi model (Erdős and Rényi, 1959), as well as Bayesian generative models for network data, such as the stochastic block model (Holland et al., 1983), the mixed membership stochastic block model (Airoldi et al., 2008), the infinite relational model (Kemp et al., 2006; Xu et al., 2007), the latent space model (Hoff et al., 2002), the latent feature relational model (Miller et al., 2009), the infinite latent attribute model (Palla et al., 2012), and the random function model (Lloyd et al., 2012). See Orbanz and Roy (2015) and Lloyd et al. (2012) for more examples and discussion.

Recently, a number of extensions to the Kallenberg-exchangeable model of Caron and Fox (2015), which builds on early work on bipartite graphs by Caron (2012), have also been developed. These models include extensions to stochastic block models (Herlau et al., 2016), mixed membership stochastic block models (Todeschini and Caron, 2016), and dynamic network models (Palla et al., 2016).

Appendix C Characterizations of edge-exchangeable graph sequences

We introduced edge exchangeability, a new notion of exchangeability for graphs. Just as the Aldous-Hoover theorem provides a characterization of the distribution of vertex-exchangeable graphs, it is desirable to provide a characterization of edge exchangeability in graphs. Below we show how characterization theorems that already exist for other combinatorial structures can be readily applied to provide characterizations for edge exchangeability in graphs.

We first develop mappings from edge-exchangeable graph sequences to familiar combinatorial structures—such as partitions (Pitman, 1995), feature allocations (Broderick et al., 2013b), and trait allocations (Broderick et al., 2015; Campbell et al., 2016a)—showing that edge exchangeability in the graph corresponds to exchangeability in those structures. In this manner, we provide characterizations of the case where one edge is added to the graph per step in Section C.1.1, where multiple unique edges may be added per step in Section C.1.2, and where multiple (non)unique edges may be added in Section C.1.3.

A limitation of these connections is that it is not immediately clear how to recover the connectivity in the graph from the mapped combinatorial object; for instance, given a particular feature allocation, the graph to which it corresponds is not identifiable. This issue has been addressed in a purely combinatorial context via vertex allocations and the graph paintbox (Campbell et al., 2016a) using the general theory of trait allocations. In Section C.2, we provide an alternative connection to ordered combinatorial structures (Broderick et al., 2013b; Campbell et al., 2016a) under the assumption that vertex labels are provided. This assumption is often reasonable in the setting of network data where the vertices and edges are observed directly. By contrast, it is unusual to assume that labels are provided for blocks in the case of partitions, feature allocations, and trait allocations since, in these cases, the combinatorial structure is typically entirely latent in real data analysis problems. For instance, in clustering applications, finding parameters that describe each cluster is usually part of the inference problem. In the graph case, though, the use of an ordered structure identifies the particular pair of vertices corresponding to each edge in the graph, allowing recovery of the graph itself.

C.1 The step collection sequence and connections to other forms of combinatorial exchangeability

In order to analyze edge-exchangeable graphs using the existing combinatorial machinery of random partitions, feature allocations, and trait allocations, we introduce a new combinatorial structure, the step collection sequence, which can take the form of a sequence of partitions, feature allocations, or trait allocations. As we will now see, the step collection sequence can be constructed from the step-augmented graph sequence in the following way.

Suppose we assign a unique label to each pair of vertices. Then if a pair of vertices is labeled , we may imagine that any particular edge between this pair of vertices is assigned label when it appears. Let be the th such unique edge label.

Recall that we consider a sequence of graphs defined by its step-augmented edge sequence . Let be the set of steps up to the current step in which any edge labeled was added. If edges labeled were added in a single step , appears in with multiplicity . So each element is an element of . Let be the number of unique vertex pairs seen among edges introduced up until the current step . Then we may define to be the collection of step sets across edges that have appeared by step :

Finally, we can define the step collection sequence as the sequence of for . Note that it is not clear how to recover the original edge connectivity of the graph from the step collection sequence, or whether it is possible to modify the sequence (or the labels ) such that it is easy to recover connectivity while maintaining the (non-trivial) connections to combinatorial exchangeability provided in Sections C.1.1, C.1.2, and C.1.3 below.

Example. Suppose we have the edge sequence

with step-augmentation

for . Now we label the unique edges in . Using an order-of-appearance scheme (Broderick et al., 2013b) to index the labels, becomes

where the labels correspond to the four unique vertex pairs: . The step collection sequence for is

Here each element of is a set corresponding to one of the four unique labels and contains all step indices up to step in which an edge with that label was added to the graph sequence.
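The construction of the step collection can also be sketched in code. The following minimal Python sketch assumes that the step-augmented edge sequence is given as a list of (vertex pair, step) tuples sorted by step, that edges are undirected, and that labels are indexed by order of appearance. The function name `step_collection` and the sample data are hypothetical and are not taken from the example above.

```python
from collections import OrderedDict


def step_collection(step_edges, n):
    """Return the step collection at step n, with labels in order of appearance.

    step_edges: iterable of ((u, v), step) pairs, assumed sorted by step.
    Each returned element is a sorted list (a multiset) of the steps <= n
    in which an edge on that vertex pair was added.
    """
    steps_by_pair = OrderedDict()
    for (u, v), step in step_edges:
        if step > n:
            continue
        # Assumption: edges are undirected, so (u, v) and (v, u) share a label.
        pair = (min(u, v), max(u, v))
        steps_by_pair.setdefault(pair, []).append(step)
    return [sorted(steps) for steps in steps_by_pair.values()]


# Hypothetical data: six edges added over four steps.
edges = [((2, 3), 1), ((1, 4), 1), ((1, 4), 2),
         ((5, 6), 3), ((2, 3), 4), ((2, 3), 4)]
for n in range(1, 5):
    print(n, step_collection(edges, n))
```

In this sketch a step appears twice in a block when two edges on the same vertex pair are added in that step, which mirrors the multiplicity convention described above.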

To see that the step collection sequence can be interpreted as a familiar combinatorial object, we recall the following definitions. A partition of is a set whose blocks, or clusters, are mutually exclusive, i.e., , and exhaustive, i.e., . Feature allocations relax the definition of partitions by no longer requiring the blocks to be mutually exclusive and exhaustive. A feature allocation of is a multiset of subsets of , such that any datapoint in occurs in finitely many features (Broderick et al., 2013b). A trait allocation generalizes the feature allocation where now each , called a trait, may itself be a multiset (Broderick et al., 2015; Campbell et al., 2016a).

We see that the step collection can be interpreted as follows. If a single edge is added to the graph at each round, is a partition of , and the step collection sequence is a projective partition sequence. If at most one edge is added between any pair of vertices at each step, is a feature allocation of , and the step collection sequence is a projective sequence of feature allocations. In the most general case, when multiple edges may be added between any pair of vertices at each step, is a trait allocation of , and the step collection sequence is a projective sequence of trait allocations.
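These three cases can be told apart mechanically. The sketch below is one possible check, assuming a step collection is represented as a list of lists of step indices (one list per vertex pair, with repeated indices encoding multiplicity); the function name `classify_step_collection` is hypothetical. It returns the most specific of the three structures that the collection satisfies.

```python
from collections import Counter


def classify_step_collection(collection, n):
    """Classify a step collection over steps 1..n as the most specific structure."""
    # Multiplicity of each step within each block (one block per vertex pair).
    block_counts = [Counter(block) for block in collection]
    if any(s < 1 or s > n for counts in block_counts for s in counts):
        raise ValueError("step index outside 1..n")
    # A repeated step inside one block means several edges on one pair in one step.
    if any(m > 1 for counts in block_counts for m in counts.values()):
        return "trait allocation"
    # Number of blocks that contain each step.
    blocks_per_step = Counter(s for counts in block_counts for s in counts)
    if all(blocks_per_step[s] == 1 for s in range(1, n + 1)):
        return "partition"  # every step added exactly one edge
    return "feature allocation"


print(classify_step_collection([[1, 3], [2], [4]], 4))
print(classify_step_collection([[1, 2], [1], [3, 4], [4]], 4))
print(classify_step_collection([[1, 1, 2], [2], [4]], 4))
```

Since every partition is also a feature allocation and every feature allocation is also a trait allocation, the check proceeds from the most general violation (repeated steps within a block) to the most restrictive requirement (each step in exactly one block).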

In the following examples, corresponding to Figure 4, we show different step collection sequences that correspond to a partition, a feature allocation, and a trait allocation.

Example (Partition). Consider the step collection . The edges form a partition of the steps. Here exactly one edge arrives in each step.

Example (Feature allocation). Consider the step collection . This step collection forms a feature allocation of the steps. Thus in this case, there may be multiple unique edges arriving in each step.

Example (Trait allocation). In a trait allocation, there may be multiple edges (not necessarily unique) at each step. Consider the step collection . This collection forms a trait allocation of the steps, where elements of are now multisets.

(a) Partition
(b) Feature allocation
(c) Trait allocation
Figure 4: Connection of edge-exchangeable graphs with partitions, feature allocations, and trait allocations. Light blocks represent 0, dark blocks either represent 1 or the specified count. In a partition, exactly one edge arrives in each step. In a feature allocation, multiple edges may arrive at each step, but at most one edge arrives between any two vertices at each step. In a trait allocation, there may be multiple edges of any type.

In this section, we have connected certain types of edge-exchangeable graphs to partitions, feature allocations, and trait allocations. In the next three sections, we make use of known characterizations of these combinatorial objects to characterize edge exchangeability in graphs.

C.1.1 Partition connection

First consider the connection to partitions. In this case, suppose that each index in appears exactly once across all of the subsets of . This assumption on is equivalent to assuming that in the original graph sequence , we have that always has exactly one more edge than . In this case, is exactly a partition of ; that is, is a set of mutually exclusive and exhaustive subsets of . If the edge sequence is random, then is random as well.

We say that a partition sequence , where is a (random) partition of and for all , is infinitely exchangeable if, for all , permuting the indices in does not change the distribution of the (random) partitions (Pitman, 1995). Permuting the indices in the partition sequence corresponds to permuting the order in which edges are added in our graph sequence . As an example of a model that generates a step collection sequence corresponding to a partition sequence, consider the frequency model we introduced in Section 3 where the weights are normalized. At each step, we choose a single edge according to the resulting probability distribution over pairs of vertices.
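A one-edge-per-step sampler of this kind might look like the following Python sketch. It makes several assumptions that are not stated in this section: the vertex set is truncated to finitely many vertices, self-loops are excluded, and the unnormalized weight of a vertex pair is the product of the two vertex weights. The function name `sample_one_edge_per_step` and the weights are hypothetical.

```python
import itertools
import random


def sample_one_edge_per_step(weights, n_steps, rng):
    """Draw one vertex pair per step, i.i.d. from normalized pair weights."""
    pairs = list(itertools.combinations(range(len(weights)), 2))
    # Assumption: the unnormalized weight of pair {i, j} is w_i * w_j.
    pair_weights = [weights[i] * weights[j] for i, j in pairs]
    # random.choices normalizes the weights internally.
    return rng.choices(pairs, weights=pair_weights, k=n_steps)


rng = random.Random(0)
edge_sequence = sample_one_edge_per_step([0.5, 0.3, 0.15, 0.05], n_steps=6, rng=rng)
print(edge_sequence)
```

Because the draws are independent and identically distributed given the weights, reordering the steps does not change the distribution of the sampled sequence, which is the exchangeability property discussed above.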

Given this connection to exchangeable partitions, the Kingman paintbox theorem (Kingman, 1978) provides a characterization of edge exchangeability in graph sequences that introduce one edge per step: in particular, it guarantees that a graph sequence that adds exactly one edge per step is edge exchangeable if and only if the associated step collection sequence has a Kingman paintbox representation. An alternate characterization of edge exchangeability in graph sequences that introduce one edge per step is provided by exchangeable partition probability functions (EPPFs) (Pitman, 1995). In particular, a graph sequence that introduces one edge per step is edge-exchangeable if and only if the marginal distribution of (the step collection at step ) is given by an EPPF for all .

C.1.2 Feature allocation connection

Next we notice that it need not be the case that exactly one edge is added at each step of the graph sequence, e.g. between and . If we allow multiple unique edges at any step, then the step collection is just a set of subsets of , where each subset has at most one of each index in . Suppose that any belongs to only finitely many subsets in for any . That is, we suppose that only finitely many edges are added to the graph at any step. Then is an example of a feature allocation (Broderick et al., 2013b). Again, if is random, then is random as well.

We say that a (random) feature allocation sequence is infinitely exchangeable if, for any , permuting the indices of does not change the distribution of the (random) feature allocations (Broderick et al., 2013a, b). Permuting the indices in the sequence corresponds to permuting the steps when edges are added in the edge sequence . Consider the following example of a graph frequency model that produces a step collection sequence corresponding to an exchangeable feature allocation. For , we draw whether the graph has an edge at time step as Bernoulli with probability . Thus, in each step, we draw at most one edge per unique vertex pair. But we may draw multiple edges in the same step.
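The Bernoulli scheme just described can be sketched as follows. This is an illustration under assumptions rather than the model's definition: the vertex set is truncated to finitely many vertices, self-loops are excluded, and the per-step probability for a vertex pair is taken to be the product of the two vertex weights. The function name `sample_bernoulli_steps` and the weights are hypothetical.

```python
import itertools
import random


def sample_bernoulli_steps(weights, n_steps, rng):
    """Return {pair: [steps]}; each pair appears at most once per step."""
    pairs = list(itertools.combinations(range(len(weights)), 2))
    steps_by_pair = {}
    for step in range(1, n_steps + 1):
        for i, j in pairs:
            # Assumption: edge {i, j} appears with probability w_i * w_j.
            if rng.random() < weights[i] * weights[j]:
                steps_by_pair.setdefault((i, j), []).append(step)
    return steps_by_pair


rng = random.Random(0)
steps_by_pair = sample_bernoulli_steps([0.9, 0.8, 0.5], n_steps=5, rng=rng)
print(steps_by_pair)
```

The values of the returned dictionary are the blocks of a feature allocation of the steps: a step can belong to several blocks, but never to the same block twice.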

Similarly to the partition case in Section C.1.1, we can apply known results from feature allocations to characterize edge exchangeability in graph models of this form. For instance, we know that the feature paintbox (Broderick et al., 2013b; Campbell et al., 2016a) characterizes distributions over exchangeable feature allocations (and therefore the step collection sequence for graphs of this form) just as the Kingman paintbox characterizes distributions over exchangeable partitions (and therefore the step collection sequence for edge-exchangeable graphs with exactly one new edge per step).

We may also consider feature paintbox distributions with extra structure. For instance, the step collection sequence is said to have an exchangeable feature probability function (EFPF) (Broderick et al., 2013b) if the probability of each step collection in the sequence can be expressed as a function only of the total number of steps and the subset sizes within (i.e. the edge multiplicities in the graph), and is symmetric in the subset sizes. As another example, the step collection sequence is said to have a feature frequency model if there exists a (random) sequence of probabilities associated with edges and a number , conditioned on which the step collection sequence arises from the graph built by adding edge at each step independently (this is conditional independence, since the probabilities may be random) with probability for all values of , along with an additional Poisson-distributed number of edges that never share a vertex with any other edge in the sequence. In other words, the graph is constructed with a graph frequency model as in the main text of the present work (modulo the aforementioned additional Poisson number of edges). Theorem 17 (“Equivalence of EFPFs and feature frequency models”) from Broderick et al. (2013b) shows that these two examples are actually equivalent: if the step collection sequence has an EFPF, it has a feature frequency model, and vice versa.

C.1.3 Further extensions

Finally, we may consider the case where at every step, any non-negative (finite) number of edges may be added and those edges may have non-trivial (finite) multiplicity; that is, the multiplicity of any edge at any step can be any non-negative integer. By contrast, in Section C.1.2, each unique edge occurred at most once at each step. In this case, the step collection is a set of subsets of . The subsets need not be unique or exclusive since we assume any number of edges may be added at any step. And the subsets themselves are multisets since an edge may be added with some multiplicity at step . We say that is a trait allocation, which we define as a generalization of a feature allocation where the subsets of are multisets. As above, if is random, is as well.

We say that a (random) trait allocation sequence is infinitely exchangeable if, for any , permuting the indices of does not change the distribution of the (random) trait allocation. Here, permuting the indices of corresponds to permuting the steps when edges are added in the edge sequence . A graph frequency model that generates a step collection sequence as a trait allocation sequence is the multiple-edge-per-step frequency model sampling procedure described in Section 3. Here, at each step, multiple edges can appear each with multiplicity potentially greater than 1, requiring the full generality of a trait allocation sequence.

Campbell et al. (2016a) characterize exchangeable trait allocations via, e.g., probability functions and paintboxes and thereby provide a characterization over the corresponding step collection sequences of such edge-exchangeable graphs.

C.2 Connections to exchangeability in ordered combinatorial structures

As noted earlier, it is not immediately clear how to recover the connectivity in an edge-exchangeable graph from the step collection sequence, nor how to do so in a way that preserves non-trivial connections to other exchangeable combinatorial structures. Campbell et al. (2016a) consider an alternative to the step collection sequence in which the (multi)subsets in the combinatorial structure correspond to vertices rather than edges, known as a vertex allocation. This allows for the characterization of edge-exchangeable graphs via the graph paintbox using the general theory of trait allocations, while maintaining an explicit representation of the structure of the graph, i.e., the connection between edges that share a vertex.

If we are willing to eschew the unordered nature of the step collection sequence, and assume that we have an a priori labeling on the vertices, there is yet another alternative using the ordered step collection sequence. The availability of labeled vertices is often a reasonable assumption in the setting of network data, where the vertices and edges are typically observed directly. Suppose the vertices are labeled using the natural numbers . Then we can use the ordering of the vertex labels to order the vertex pairs in a diagonal manner, i.e. . Note that, for the purpose of building this diagonal ordering, we consider the lowest-valued index in each vertex pair first. We build the step collection sequence in the same manner as before, except that each step collection is no longer an unordered collection of subsets; the subsets derive their order from the vertex pairs they represent. For example, if we observe edges at vertex pairs and at step 1, and edges at vertex pairs and at step 2, then

and

Since we know the order of the subsets in each as they relate to the vertex pairs in the graph and their connectivity, we can recover the graph sequence from the ordered step collection sequence . Exchangeability in an ordered step collection sequence means that the distribution is invariant to permutations of the indices within the subsets (although the ordering of the subsets themselves cannot be changed). Given this notion of exchangeability, the earlier connections to exchangeable partitions, feature allocations, and trait allocations remain true, modulo the fact that they must themselves be ordered. Broderick et al. (2013b) provide a paintbox characterization of ordered exchangeable feature allocations, thereby providing characterizations (via the earlier connections to partitions and feature allocations) of edge-exchangeable graphs that add either one or multiple unique edges per step. Note that, in these cases, this is a full characterization of edge-exchangeable graphs, by contrast to Section C.1, where we provided a characterization only of edge exchangeability in graphs. We suspect that a similar characterization of edge-exchangeable graphs with multiple (non)unique edges per step is available by examining characterizations of exchangeable ordered trait allocations.
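The recovery of the graph from an ordered step collection can be illustrated with a short Python sketch. The specific diagonal ordering used here is an assumption: vertex pairs with lower index first are ordered as (1,1), (1,2), (2,2), (1,3), (2,3), (3,3), and so on, with self-loops included and with empty blocks kept as placeholders for unobserved pairs. The function names and the sample data are hypothetical.

```python
import math


def pair_to_index(i, j):
    """1-based position of vertex pair {i, j} in the assumed diagonal order."""
    lo, hi = min(i, j), max(i, j)
    return hi * (hi - 1) // 2 + lo


def index_to_pair(k):
    """Inverse of pair_to_index: the k-th vertex pair as (lo, hi)."""
    hi = (1 + math.isqrt(8 * k - 7)) // 2
    lo = k - hi * (hi - 1) // 2
    return lo, hi


def ordered_step_collection(step_edges, n):
    """List whose (k-1)-th entry holds the steps <= n for the k-th vertex pair."""
    observed = [(pair_to_index(u, v), s) for (u, v), s in step_edges if s <= n]
    size = max((k for k, _ in observed), default=0)
    collection = [[] for _ in range(size)]
    for k, s in observed:
        collection[k - 1].append(s)
    return [sorted(block) for block in collection]


def recover_edges(collection):
    """Rebuild the step-augmented edges, sorted by step and then pair position."""
    edges = [(index_to_pair(k), s)
             for k, block in enumerate(collection, start=1) for s in block]
    return sorted(edges, key=lambda e: (e[1], pair_to_index(*e[0])))


# Hypothetical data: two edges at step 1 and two edges at step 2.
step_edges = [((1, 2), 1), ((2, 3), 1), ((1, 2), 2), ((1, 3), 2)]
collection = ordered_step_collection(step_edges, 2)
print(collection)
print(recover_edges(collection))
```

The position of a block encodes its vertex pair, so no separate labels are needed; this is what distinguishes the ordered structure from the unordered step collection of Section C.1.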

Appendix D Proofs

The proof of the main theorem in the paper (Theorem 5) follows from a collection of lemmas below. Lemma 5 characterizes the expected number of vertices and edges; Lemma D.2 establishes a useful transformation of those expectations; and Lemma D.3 shows that the two sets of expectations are asymptotically equivalent, so it is enough to consider the transformed expectation. Lemma D.3 provides the asymptotics of the transformed expectations. Finally, Lemma 5 shows that the random sequences converge almost surely to their expectations, yielding the final result.

D.1 Preliminaries

Notation

We first define the asymptotic notation used in the main paper and appendix. We use the notation “a.s.” to mean almost surely, or with probability 1. Let be two random sequences. We say that if a.s., and that if a.s. We say that if a.s. Lastly, we say that if and .

Let be the respective sets of active vertices and edges at step in the multigraph, and be their respective cardinalities, as defined in the main text. We use the notation and to represent these analogous vertex and edge sets for the binary graph. Note that is the same as .

Useful results

We present two useful theorems for analyzing expectations involving random sums of functions of points from Poisson point processes. Below, we will apply these theorems repeatedly to get expectations of graph quantities. The first theorem is Campbell’s theorem, which is used to compute the moments of functionals of a Poisson process. We state it below for completeness, and refer to Kingman (1993, Sec. 3.2) for details.

Theorem (Campbell’s theorem). Let be a Poisson point process on with rate measure , and let be measurable. If , then

for any , and furthermore,

The second theorem is a specific form of the Slivnyak-Mecke theorem, which is useful for computing the expected sum of a function of each point and over all points in a Poisson point process . If each point in is thought of as relating to a particular vertex in a graph, the Slivnyak-Mecke theorem allows us to take expectations of the sum (over all possible vertices in the graph) of a function of each vertex and all its possible edges. For example, it is used below to compute the expected number of active vertices by taking the expected sum of vertices that have nonzero degree. We state it below for completeness, and refer to Daley and Vere-Jones (2008, Prop. 13.1.VII) and Baddeley et al. (2007, Thm. 3.1, Thm. 3.2) for details.

Theorem (Slivnyak-Mecke theorem). Let be a Poisson point process on with rate measure , and let be measurable. Then
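For reference, the standard textbook forms of Campbell's theorem and the Slivnyak-Mecke theorem can be written as in the following LaTeX sketch. The symbols used ($\Pi$ for the Poisson point process, $S$ for its domain, $\nu$ for the rate measure, $f$ for the measurable function, $\theta$ for the exponent) are generic notation assumed here and need not match the notation used elsewhere in this appendix. The first identity is taken for any complex $\theta$ for which the integral on the right converges, and the last identity is stated for nonnegative measurable $f$.

```latex
% Campbell's theorem, assuming \int_S \min(|f(x)|, 1)\,\nu(dx) < \infty:
\begin{align*}
\mathbb{E}\left[\exp\left(\theta \sum_{x \in \Pi} f(x)\right)\right]
  &= \exp\left(\int_S \left(e^{\theta f(x)} - 1\right)\nu(dx)\right), \\
\mathbb{E}\left[\sum_{x \in \Pi} f(x)\right]
  &= \int_S f(x)\,\nu(dx).
\end{align*}

% Slivnyak-Mecke theorem, for f depending on a point and the remaining points:
\begin{align*}
\mathbb{E}\left[\sum_{x \in \Pi} f\left(x, \Pi \setminus \{x\}\right)\right]
  &= \int_S \mathbb{E}\left[f(x, \Pi)\right]\nu(dx).
\end{align*}
```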

D.2 Graph moments

In this section, we give the expected number of vertices and expected number of edges for the multi- and binary graph cases. We begin by defining the degree of vertex in the multigraph and the degree of vertex in the binary graph, respectively, as

(4)

Now we present the expected number of edges and vertices. We note that both the multi- and binary graphs have the same number of (active) vertices, and so their expectations are the same.

Lemma 5 (main text). The expected number of vertices and edges for the multi- and binary graphs are

Proof.

Using the tower property of conditional expectation and Fubini’s theorem, we have that the expected number of vertices is

followed by the definition of degree in Equation 4 and the binomial density,

Using the Slivnyak-Mecke theorem (Section D.1),

and finally by Campbell’s theorem (Section D.1) on the inner expectation,

For the expected number of edges, we can again apply the tower property and Fubini’s theorem followed by repeated applications of Slivnyak-Mecke to the expectations to get:

The expected number of edges for the binary case is obtained similarly via Fubini and Slivnyak-Mecke:

The asymptotic behavior of these quantities is difficult to derive directly due to the discreteness of the indices . Therefore, we rely on a technique called Poissonization, which allows us to bypass this difficulty by instead considering a continuous analog of the quantities in order to get asymptotic behaviors. Below, we introduce primed notation to represent the Poissonized quantities for the vertices, multigraph edges, binary edges, and the degree of a vertex, where the index now represents a continuous quantity. These will be defined such that has the same asymptotic behavior as , has the same asymptotic behavior as , and so on.

Given , let be the Poisson process generated with rate if and rate 0 if , and let . Let , which is a Poisson process with rate via Poisson process superposition (Kingman, 1993, Sec. 2.2). If we think of as continuous time passing, the process represents the times at which new edges are added between vertices and , and represents the times at which any new edges involving vertex are added.
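The superposition property used here can be checked numerically. The Python sketch below is a generic illustration, not the construction of this appendix: it simulates independent homogeneous Poisson processes with hypothetical rates (standing in for the per-pair processes attached to one vertex), merges them, and compares the average number of points on a time window with the sum of the rates times the window length. The function names, rates, and horizon are all assumptions made for the example.

```python
import random


def poisson_times(rate, horizon, rng):
    """Arrival times in [0, horizon] of a homogeneous Poisson process."""
    times, t = [], 0.0
    if rate <= 0:
        return times  # a rate-0 process has no points
    while True:
        t += rng.expovariate(rate)  # i.i.d. exponential gaps with mean 1 / rate
        if t > horizon:
            return times
        times.append(t)


def vertex_process(pair_rates, horizon, rng):
    """Superpose the independent per-pair processes attached to one vertex."""
    merged = []
    for rate in pair_rates:
        merged.extend(poisson_times(rate, horizon, rng))
    return sorted(merged)


rng = random.Random(1)
pair_rates = [0.5, 0.25, 0.25]  # hypothetical rates for pairs containing one vertex
horizon, trials = 10.0, 4000
mean_count = sum(len(vertex_process(pair_rates, horizon, rng))
                 for _ in range(trials)) / trials
print(mean_count, sum(pair_rates) * horizon)
```

The two printed values should be close, consistent with the merged process being a Poisson process whose rate is the sum of the individual rates.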

Thus, we define the Poissonized degree of vertex in the multi- and binary graph cases, respectively, to be a function of the continuous parameter ,

We can define the Poissonized graph quantities of interest using these two quantities: