# Weighted dependency graphs

Valentin Féray, Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
###### Abstract.

The theory of dependency graphs is a powerful toolbox to prove asymptotic normality of sums of random variables. In this article, we introduce a more general notion of weighted dependency graphs and give normality criteria in this context. We also provide generic tools to prove that some weighted graph is a weighted dependency graph for a given family of random variables.

To illustrate the power of the theory, we give applications to the following objects: uniform random pair partitions, the random graph model $G(n,M)$, uniform random permutations, the symmetric simple exclusion process and multilinear statistics on Markov chains. The application to random permutations gives a bivariate extension of a functional central limit theorem of Janson and Barbour. For Markov chains, we give a positive answer to an open question of Bourdon and Vallée on the asymptotic normality of subword counts in random texts generated by a Markovian source.

###### Key words and phrases:
dependency graphs, combinatorial central limit theorems, cumulants, spanning trees, random graphs, random permutations, simple exclusion process, Markov chains.
###### 2010 Mathematics Subject Classification:
Primary: 60F05. Secondary: 60C05, 05C80, 82C05, 60J10.
VF is partially supported by SNF grant nb 149461 “Dual combinatorics of Jack polynomials”.

## 1. Introduction

### 1.1. Background: dependency graphs

The central limit theorem is one of the most famous results in probability theory: it states that suitably renormalized sums of independent identically distributed random variables with finite variance converge towards a standard Gaussian variable.

It is rather easy to relax the identically distributed assumption. The Lindeberg criterion, see e.g. [11, Chapter 27], gives a sufficient (and almost necessary) criterion for a sum of independent random variables to converge towards a Gaussian law (after suitable renormalization).

Relaxing independence is more delicate and there is no universal theory to do it. One of the ways, among many others, is given by the theory of dependency graphs. A dependency graph encodes the dependency structure in a family of random variables: roughly we take a vertex for each variable in the family and connect dependent random variables by edges. The idea is that, if the degrees in a sequence of dependency graphs do not grow too fast, then the corresponding variables behave as if independent and the sum of the corresponding variables is asymptotically normal. Precise normality criteria using dependency graphs have been given by Petrovskaya/Leontovich, Janson, Baldi/Rinott and Mikhailov [58, 42, 6, 53].

These results are powerful black boxes to prove asymptotic normality of sums of partially dependent variables and can be applied in many different contexts. The original motivation of Petrovskaya and Leontovich came from the mathematical modelling of cell populations [58]. Janson, on the other hand, was interested in random graph theory: dependency graphs are used to prove central limit theorems for some statistics, such as subgraph counts, in [6, 42, 46]; see also [55] for applications to geometric random graphs. The theory then found a field of application in geometric probability, where central limit theorems have been proven for various statistics on random point configurations: the lengths of the nearest-neighbour graph, of the Delaunay triangulation and of the Voronoi diagram of these random points [5, 56], or the area of their convex hull [7]. More recently, it has been used to prove asymptotic normality of pattern counts in random permutations [13, 38]. Dependency graphs also generalize the notion of $m$-dependence [40, 10], widely used in statistics [24]. All these examples illustrate the importance of the theory of dependency graphs.

### 1.2. Overview of our results

The goal of this article is to introduce a notion of weighted dependency graphs; see Definition 4.5. As with usual dependency graphs, we want to prove the asymptotic normality of sums of random variables. Again, we take a vertex for each variable in the family and connect dependent random variables with an edge. The difference is that edges may now carry weights in $[0,1]$. If two variables are almost independent (in a sense that will be made precise), the weight of the corresponding edge is small. Our main result, Theorem 4.11, is a normality criterion for weighted dependency graphs: roughly, instead of involving degrees, as in Janson’s or Baldi/Rinott’s criteria, it involves the weighted degree, which is in general smaller.

This of course requires quantifying, in some sense, the “dependency” between random variables. This is done using the notion of joint cumulants and maximum spanning trees of weighted graphs (the latter being a classical topic in the algorithmics literature; see Section 3).

As explained in Section 4.5, our normality criterion contains Janson’s criterion and natural applications of Mikhailov’s criterion. Unfortunately, we are not able to deal with variables having only a few finite moments, as in the result of Baldi and Rinott.

On the other hand, and most importantly, the possibility of having small weights on edges significantly extends the range of applications of the theory. Indeed, in this article we provide several examples where weighted dependency graphs are used to prove asymptotic normality of sums of pairwise dependent random variables (for such families, the only usual dependency graph is the complete graph, and the standard theory of dependency graphs is useless). Examples given in the article involve pair partitions, the random graph model $G(n,M)$, permutations, statistical mechanics and finally Markov chains.

Before describing specifically the results obtained on each of these objects, let us mention that weighted dependency graphs can also be used to prove multivariate asymptotic normality and functional central limit theorems; rather than giving a cumbersome general theorem for that, we refer the reader to examples in Sections 9.3, 8.3 and 8.2.

#### 1.2.1. Random pair partitions

Our first example deals with uniform random pair partitions of a finite set with an even number of elements. This model is the starting point of the configuration model in random graph theory (see e.g. [46, Chapter 9]) and has also recently appeared in theoretical physics [23].

We consider the number of crossings in such a random pair partition. This is a natural statistic in the combinatorics literature; see e.g. [17]. A central limit theorem for this statistic has been given by Flajolet and Noy [36]. We give an alternative proof of this result (see Theorem 6.5) that does not rely on the explicit formula for the generating function.
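The crossing statistic is easy to compute by brute force for small sizes. The following Python sketch is only an illustration (it is not the method used in the paper, and all function names are ours): two blocks $\{a<b\}$ and $\{c<d\}$ are taken to cross when $a<c<b<d$ or $c<a<d<b$. By symmetry, the four endpoints of two blocks are paired in one of three equally likely ways, exactly one of which is crossing, so the mean number of crossings of a uniform pair partition of $\{1,\dots,2n\}$ should be $\binom{n}{2}/3=n(n-1)/6$; the code checks this by exhaustive enumeration, which is only feasible for small $n$.

```python
from fractions import Fraction
from itertools import combinations


def pair_partitions(points):
    """Yield every pair partition (perfect matching) of a list of even length."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for matching in pair_partitions(remaining):
            yield [(first, partner)] + matching


def crossings(matching):
    """Number of pairs of blocks {a < b}, {c < d} with a < c < b < d or c < a < d < b."""
    count = 0
    for (a, b), (c, d) in combinations(matching, 2):
        a, b = min(a, b), max(a, b)
        c, d = min(c, d), max(c, d)
        if a < c < b < d or c < a < d < b:
            count += 1
    return count


def mean_crossings(n):
    """Exact mean number of crossings of a uniform pair partition of {1, ..., 2n}."""
    matchings = list(pair_partitions(list(range(1, 2 * n + 1))))
    return Fraction(sum(crossings(m) for m in matchings), len(matchings))


for n in range(1, 5):
    # Brute-force mean next to the predicted value n(n-1)/6.
    print(n, mean_crossings(n), Fraction(n * (n - 1), 6))
```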

#### 1.2.2. Random graphs

The second example deals with the random graph model $G(n,M)$, that is, a uniform random graph among all graphs with vertex set $[n]$ and $M$ edges. This is the model considered by Erdős and Rényi in their seminal paper of 1959 [32].

Since the number of edges is prescribed, the events that distinct edges are present are not independent, unlike in the model $G(n,p)$, where each edge is present independently with probability $p$. Therefore the usual theory of dependency graphs cannot be used, but weighted dependency graphs work fine on this model.

To illustrate this, we consider subgraph count statistics, i.e. we fix a finite graph $H$ and look at the number of copies of $H$ in the random graph $G(n,M)$. We prove a central limit theorem for these statistics when $n$ and $M$ tend to infinity together in a suitable way (Theorem 7.5).
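As a toy version of the subgraph count statistic, the Python sketch below (our own illustration; taking the fixed graph to be a triangle is an arbitrary choice, and the function names are hypothetical) enumerates all graphs with $m$ edges on $n$ labelled vertices for tiny $n$ and computes the exact mean number of triangles. With $N=\binom{n}{2}$, three given edges are simultaneously present with probability $m(m-1)(m-2)/\big(N(N-1)(N-2)\big)$, which differs from the value $(m/N)^3$ one would get if edges were independent; this is the dependence mentioned above.

```python
from fractions import Fraction
from itertools import combinations


def triangle_count(edges, vertices):
    """Number of triangles in the simple graph with the given edges."""
    edge_set = {frozenset(e) for e in edges}
    return sum(
        1
        for x, y, z in combinations(vertices, 3)
        if {frozenset((x, y)), frozenset((y, z)), frozenset((x, z))} <= edge_set
    )


def exact_mean_triangles(n, m):
    """Exact mean number of triangles in a uniform graph on [n] with m edges (brute force)."""
    vertices = range(1, n + 1)
    all_edges = list(combinations(vertices, 2))
    graphs = list(combinations(all_edges, m))
    return Fraction(sum(triangle_count(g, vertices) for g in graphs), len(graphs))


def formula_mean_triangles(n, m):
    """C(n,3) times the probability that 3 given edges lie in a uniform m-subset of N edges."""
    big_n = n * (n - 1) // 2
    prob = Fraction(m * (m - 1) * (m - 2), big_n * (big_n - 1) * (big_n - 2))
    return (n * (n - 1) * (n - 2) // 6) * prob


print(exact_mean_triangles(5, 4), formula_mean_triangles(5, 4))
```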

This central limit theorem is a weaker version of a theorem of Janson [44, Theorem 19] (who obtains the same result under slightly weaker hypotheses). We nevertheless think that the proof given here is interesting, since it completely parallels the proof with usual dependency graphs that can be done for the companion model $G(n,p)$: we refer to [46, Chapter 6] for the application of dependency graphs to central limit theorems for subgraph counts in $G(n,p)$. In comparison, Janson’s approach involves martingales in the continuous-time model and a stopping time argument.

#### 1.2.3. Random permutations

The study of uniform random permutations is a wide subject in probability theory and, as for random graphs, it would be hopeless to attempt a comprehensive presentation of it. Relevant to this paper, Hoeffding [39] has given a central limit theorem for what can be called simply indexed permutation statistics, that is, statistics of the form

$$X_n=\sum_{i=1}^{n} a^{(n)}\big(i,\pi(i)\big),$$

where $\pi$ is a uniform random permutation of size $n$ and $(a^{(n)})_{n\ge 1}$ is a sequence of real $n\times n$ matrices satisfying appropriate conditions.
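To make the definition concrete, here is a short Python sketch (illustrative only; the matrix is an arbitrary example, indices start at 0, and the function names are ours) that evaluates the statistic for a given permutation and computes its exact mean by enumerating all permutations. Since each $\pi(i)$ is uniformly distributed, the mean is $\frac{1}{n}\sum_{i,j}a^{(n)}(i,j)$, which the code checks against the brute-force value.

```python
from fractions import Fraction
from itertools import permutations


def hoeffding_statistic(a, perm):
    """X = sum_i a[i][perm[i]] for a square matrix a and a permutation perm of range(n)."""
    return sum(a[i][perm[i]] for i in range(len(a)))


def exact_mean(a):
    """Exact E[X] for a uniform random permutation, by enumerating all n! permutations."""
    n = len(a)
    values = [hoeffding_statistic(a, p) for p in permutations(range(n))]
    return Fraction(sum(values), len(values))


def formula_mean(a):
    """Since pi(i) is uniform on {0, ..., n-1}, E[X] = (1/n) * sum over i, j of a[i][j]."""
    n = len(a)
    return Fraction(sum(sum(row) for row in a), n)


example = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
print(exact_mean(example), formula_mean(example))
```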

Hoeffding’s result has been extended and refined in many directions, including the following ones.

• In [12], Bolthausen used Stein’s method to give an upper bound for the speed of convergence in Hoeffding’s central limit theorem.

• This work was then extended to doubly indexed permutation statistics (called DIPS for short) by Zhao, Bai, Chao and Liang [67]. Barbour and Chen [8] later gave new bounds on the speed of convergence that are sharper in many situations. DIPS have been used in various contexts in statistics; we refer the reader to [67, 8] and references therein for background on these objects.

• In another direction, Barbour and Janson have established a functional central limit theorem for simply indexed permutation statistics [9].

Using weighted dependency graphs, we provide a functional central limit theorem for doubly indexed permutation statistics; see Theorem 8.7. This can be seen as an extension of Barbour and Janson’s theorem or as a functional version of Zhao, Bai, Chao and Liang’s result (note however that, in the simply indexed case, our hypotheses are slightly stronger than those of Barbour and Janson and that we cannot provide a speed of convergence). There is a priori no obstruction to obtaining an extension to $k$-indexed permutation statistics, for any fixed $k$, except maybe that the general statement and the computation of covariance limits in specific examples may quickly become cumbersome.

#### 1.2.4. Stationary configuration of SSEP

The symmetric simple exclusion process (SSEP) is a classical model of statistical physics that represents a system out of equilibrium. Its success in the physics literature is mainly due to the fact that it is mathematically tractable and displays phase transition phenomena. We refer the reader to [25] for a survey of results on SSEP and related models from a mathematical physics viewpoint.

The description of the invariant measure, or steady state, of SSEP (and more generally of its asymmetric version ASEP) has also attracted the interest of the combinatorics community in recent years. This question is indeed connected to the hierarchy of orthogonal polynomials and has led to the study of new combinatorial objects, such as permutation tableaux and staircase tableaux [22, 21].

In this paper we prove that the indicator random variables of the presence of particles at given locations in the steady state have a natural weighted dependency graph structure. As an application, we give a functional central limit theorem for the particle distribution function in the steady state, Theorem 9.4. An analogous result for the density function, which is roughly the derivative of the particle distribution function, has been given by Derrida, Enaud, Landim and Olla [26]. Their result holds in the more general setting of ASEP and it would be interesting to generalize our approach to ASEP as well.

#### 1.2.5. Markov chains

Our last application deals with the number of occurrences of a given subword in a text generated by a Markov source. More precisely, consider an aperiodic irreducible Markov chain on a finite state space (the alphabet), whose initial state is distributed according to the stationary distribution of the chain, and let the random text be the word formed by its first $n$ states. We are interested in the number of times that a given fixed word occurs as a subword (i.e. as a not necessarily contiguous subsequence) of the random text, possibly with some additional constraints, such as adjacency in the text of some letters of the fixed word.
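Counting occurrences of a word as a (not necessarily contiguous) subword of a fixed text can be done with a standard dynamic program over the prefixes of the word. The Python sketch below is our own illustration, not code from the paper: it handles only the unconstrained case (no adjacency constraints), and it takes a fixed string as input rather than a sample of the Markov chain.

```python
def subword_occurrences(text, word):
    """Number of index tuples i_1 < ... < i_k with text[i_j] == word[j] for every j.

    counts[j] is the number of occurrences of the prefix word[:j] as a subword
    of the part of the text read so far (the empty prefix occurs exactly once).
    """
    counts = [1] + [0] * len(word)
    for letter in text:
        # Update longer prefixes first, so that the current letter of the text
        # is used at most once in each occurrence.
        for j in range(len(word), 0, -1):
            if word[j - 1] == letter:
                counts[j] += counts[j - 1]
    return counts[len(word)]


print(subword_occurrences("banana", "ana"))
```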

This problem, motivated by intrusion detection in computer science and by the identification of meaningful bits of DNA in molecular biology, attracted the attention of the analysis of algorithms community in the 1990s; we refer the reader to [37] for detailed motivations and references on the subject.

A central limit theorem for this number of occurrences was obtained in some particular cases:

• when only consecutive occurrences of the word are counted, i.e. the number of factors of the text that are equal to the word (see Régnier and Szpankowski [59], or Bourdon and Vallée [15] for an extension to probabilistic dynamical sources);

• or when the letters of the text are independent (see Flajolet, Szpankowski and Vallée [37]).

• Another related result is a central limit theorem by Nicodème, Salvy and Flajolet [54] for the number of occurrence positions, i.e. positions where an occurrence of the pattern terminates. This statistic is quite different from the number of occurrences itself, since the number of occurrence positions is always bounded by the length of the text.

Despite all these results, the number of occurrences in the general subword case with a Markov source was left open by these authors; see [14, Section 4.4]. Using weighted dependency graphs, we are able to fill this gap; see Theorem 10.5.

Note that there is a rich literature on central limit theorems for linear statistics on Markov chains, that is, statistics obtained by summing the values of a fixed function on the state space along the trajectory of the chain. We refer the reader to [47] and references therein for numerous results in this direction, in particular on infinite state spaces. In [61], the authors use cumulants to study linear statistics on mixing sequences (including Markov chains; Chapter 4) and multilinear statistics on independent identically distributed random variables (Chapter 5). It seems, however, that there is a lack of tools to study multilinear statistics on Markov chains, such as the subword count statistics considered above. The theory of weighted dependency graphs introduced here is such a tool.

#### 1.2.6. Homogeneity versus spatial structure

It is worth noticing that the previous examples have various structures. The first three are homogeneous, in the sense that there is a transitive automorphism group acting on the model. This is reflected in the corresponding weighted dependency graphs, whose weights are all equal.

In comparison, the last two examples have a linear structure: particles in SSEP live on a line and a Markov chain is canonically indexed by $\mathbb{N}$. For Markov chains, this is reflected in the corresponding weighted dependency graph, since the weights decrease exponentially with the distance. In contrast, SSEP has a homogeneous weighted dependency graph (all weights are equal), which came as a surprise to the author and indicates a dependency structure quite different from the Markov chain setting.

The ability to cover models with various dependency structures is, in the author’s opinion, a nice feature of weighted dependency graphs.

### 1.3. Finding weighted dependency graphs

The proof of our normality criterion (Theorem 4.11) is quite elementary, so that the main contribution of this paper is to identify a good framework that covers many different objects. Nevertheless, one could argue that the difficulty of proving a central limit theorem has only been shifted to the difficulty of finding an appropriate weighted dependency graph. Indeed, proving that a given weighted graph is a weighted dependency graph for a given family of random variables $\{Y_\alpha\}$ consists in establishing bounds on all joint cumulants $\kappa(Y_\alpha;\alpha\in B)$, where $B$ is a multiset of indices. We refer to this problem as proving the correctness of the weighted dependency graph. Attacking it head-on is rather challenging. (The definition of joint cumulants is given in Eq. (2); the precise bound that should be proved can be found in Eq. (9), but is not relevant for the discussion here.)

To circumvent this difficulty, we give in Section 5 three general results that help in proving the correctness of a weighted dependency graph. These results make the application of our normality criterion much easier in general, and almost immediate in some cases.

Before describing these three tools, let us observe that proving the correctness of a usual dependency graph is generally straightforward: most of the time, it is an immediate consequence of the definition of the model under consideration. Therefore the existing literature does not provide any tool for this task.

1. Our first tool (Proposition 5.2) is the equivalence between the definition and a slightly different set of inequalities involving cumulants of products of random variables. When the random variables are Bernoulli random variables, we can then use the trivial fact that $Y^2=Y$ for any Bernoulli variable $Y$ to reduce (most of the time significantly) the number of inequalities to establish.

2. The second tool (Proposition 5.8) shows the equivalence between bounds on cumulants and bounds on an auxiliary quantity defined as

$$P_r=\prod_{\delta\subseteq[r]}\mathbb{E}\Big[\prod_{i\in\delta}Y_{\alpha_i}\Big]^{(-1)^{|\delta|}}.$$

At first sight, one might think that this new expression is not simpler to bound than cumulants, but its advantage is that it is multiplicative: if moments have a natural factorization, then $P_r$ factorizes accordingly and we can bound each factor separately.

3. The third tool (Proposition 5.11) is a stability property of weighted dependency graphs under products. Namely, if we prove that some basic variables admit a weighted dependency graph, we obtain for free a weighted dependency graph for monomials in these basic variables. A typical example of application is the following: in the random graph setting, we prove that the indicator variables of the presence of edges have a weighted dependency graph, and we automatically obtain a similar result for the presence of triangles or of copies of any given fixed graph.
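Since $P_r$ only involves joint moments, it is easy to evaluate exactly for Bernoulli variables with a known finite joint law. The Python sketch below is our own illustration: the encoding of laws as dictionaries and all function names are hypothetical, the variables are indexed $0,\dots,r-1$, and all joint moments are assumed to be nonzero. It also shows the multiplicativity at work: when the variables split into two mutually independent groups, each moment factorizes and the factors cancel, so that $P_r=1$ for $r\ge 2$.

```python
from fractions import Fraction
from itertools import combinations, product


def joint_moment(law, indices):
    """E[prod_{i in indices} Y_i] for a finite law given as {outcome tuple: probability}."""
    total = Fraction(0)
    for outcome, prob in law.items():
        term = prob
        for i in indices:
            term *= outcome[i]
        total += term
    return total


def p_r(law, r):
    """P_r = prod over delta subset of {0, ..., r-1} of E[prod_{i in delta} Y_i] ** ((-1) ** |delta|).

    Assumes every joint moment is nonzero (otherwise P_r is undefined).
    """
    result = Fraction(1)
    for size in range(r + 1):
        for delta in combinations(range(r), size):
            moment = joint_moment(law, delta)
            # Even-size subsets contribute the moment, odd-size ones its inverse.
            result *= moment if size % 2 == 0 else 1 / moment
    return result


def independent_bernoulli(ps):
    """Joint law of independent Bernoulli(p) variables, one for each p in ps."""
    law = {}
    for outcome in product((0, 1), repeat=len(ps)):
        prob = Fraction(1)
        for bit, p in zip(outcome, ps):
            prob *= p if bit else 1 - p
        law[outcome] = prob
    return law


def product_law(law1, law2):
    """Joint law of two independent random vectors (outcome tuples are concatenated)."""
    return {o1 + o2: p1 * p2 for o1, p1 in law1.items() for o2, p2 in law2.items()}


# Y_0 = Y_1, a single Bernoulli(1/2) variable observed twice: strongly dependent.
two_copies = {(1, 1): Fraction(1, 2), (0, 0): Fraction(1, 2)}
print(p_r(two_copies, 2))  # E[Y_0 Y_1] / (E[Y_0] E[Y_1])
# Adding a third variable independent of (Y_0, Y_1) makes P_3 collapse to 1.
print(p_r(product_law(two_copies, independent_bernoulli([Fraction(1, 3)])), 3))
```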

Items 1 and 3 are both used in all applications described in Section 1.2 and reduce the proof of correctness of the relevant weighted dependency graph to bounding specific simple cumulants. For random pair partitions, random permutations and random graphs, this bound directly follows from an easy computation of joint moments and item 2 above. In summary, the proof of correctness of the weighted dependency graph is rather immediate in these cases.

For SSEP, we also make use of an induction relation for joint cumulants obtained by Derrida, Lebowitz and Speer [28] (joint cumulants are called truncated correlation functions in this context). The Markov chain setting uses linear algebra considerations and a recent expression of joint cumulants in terms of the so-called Boolean cumulants, due to Arizmendi, Hasebe, Lehner and Vargas [2] (see also [61, Lemma 1.1]). Boolean cumulants were introduced in non-commutative probability theory [64, 49] and their appearance here is rather intriguing.

To conclude this section, let us mention that in each case, the proof of correctness of the weighted dependency graph relies on some expression for the joint moments of the basic variables. This expression may take various forms (explicit formulas in the first three cases, an induction relation in the case of SSEP, a matrix expression for Markov chains), but such an expression is needed. In other words, weighted dependency graphs can be used to study what could be called locally integrable systems, that is, systems in which the joint moments of the basic variables can be computed. Such systems are not necessarily integrable: there may be no tractable expression for the generating function or the moments of the sum, so that classical asymptotic methods cannot a priori be used. In particular, in all the examples above, it seems hopeless to analyse the moments of the sum by expanding them directly in terms of joint moments of the basic variables.

### 1.4. Usual dependency graphs: beyond the central limit theorem

We have focused so far on the question of asymptotic normality. However, usual dependency graphs can be used to establish other kinds of results. The first family of such results consists of refinements of central limit theorems.

• In their original paper [6], Baldi and Rinott combined dependency graphs with Stein’s method. In addition to giving a central limit theorem, this approach provides precise estimates for the Kolmogorov distance between a renormalized version of the sum and the Gaussian distribution. For more general, and in some cases sharper, bounds, we also refer the reader to [16]. An alternative to Stein’s method, based on mod-Gaussian convergence and Fourier analysis, can also be used to establish sharp bounds in Kolmogorov distance in the context of dependency graphs; see [34].

• Another direction, addressed in [29, 35], is the validity domain of the central limit theorem.

The Gaussian law is not the only limit law that is accessible with the dependency graph approach. Convergence to a Poisson distribution can also be proved this way, as demonstrated in [4]; again, this result has found applications, e.g., in the theory of random geometric graphs [55].

We now leave convergence in distribution to discuss probabilities of rare events:

• In [45], S. Janson has established a large deviation upper bound involving the fractional chromatic number of the dependency graph.

• Another important use of dependency graphs, historically the first one, is the Lovász local lemma [31, 65]. The goal here is to find a lower bound for the probability that $\sum_\alpha Y_\alpha=0$ when the $Y_\alpha$ are indicator random variables, that is, the probability that none of the $Y_\alpha$ is equal to $1$. This inequality has found a wide range of applications in proving, by probabilistic arguments, the existence of an object (often a graph) with given properties: this is known as the probabilistic method; see [1, Chapter 5].

### 1.5. Future work

We believe that weighted dependency graphs may be useful in a number of different models and that they are worth being studied further. A recent preprint with Dousse shows an application of weighted dependency graphs to the $d$-dimensional Ising model [30]. In a work in progress with Thiel, we use them to study statistics of uniform set partitions and obtain a far-reaching generalization of a result of Chern, Diaconis, Kane and Rhoades [18].

Proving the correctness of these weighted dependency graphs again uses the tools from Section 5 of this paper. In the case of the Ising model, we also need the theory of cluster expansions.

Another source of examples of weighted dependency graphs is given by determinantal point processes (see, e.g., [41, Chapter 4]): indeed, for such processes, Soshnikov has observed that cumulants have rather nice expressions [63, Lemma 1]. This fits into the framework of weighted dependency graphs, and the stability under taking monomials in the initial variables may make it possible to study multilinear statistics on such models. This is a direction that we plan to investigate in future work.

The results of the present article also suggest considering the following models.

• Uniform $d$-regular graphs: the weighted dependency graph for pair partitions presented in Section 6 gives bounds on joint cumulants in the configuration model. It would be interesting to have similar bounds for uniform $d$-regular graphs, especially when $d$ tends to infinity, in which case the graph given by the configuration model is simple with probability tending to $0$. The fact that joint moments of edge indicators have no simple expression for $d$-regular graphs is an important source of difficulty here.

• The asymmetric version of SSEP, called ASEP: finding a weighted dependency graph for this statistical mechanics model is closely related to the conjecture made in [28] on the scaling limit of the truncated correlation functions.

• Markov chains on infinite state spaces: as mentioned earlier, there is an important body of literature on CLTs for linear statistics on such models; see [47]. Does Proposition 10.4, which gives a weighted dependency graph for Markov chains on a finite state space, generalize under some of these criteria? This would potentially give access to CLTs for multilinear statistics on these models.

Finally, because of the diversity of examples, it would be of great interest to adapt some of the results mentioned in Section 1.4 to weighted dependency graphs. One approach would be to use recent results on mod-Gaussian convergence [35, 34]. Unfortunately, this requires uniform bounds on cumulants of the sum, which are at the moment out of reach for weighted dependency graphs in general.

### 1.6. Outline of the paper

The paper is organized as follows.

• Standard notation and definitions are given in Section 2.

• Section 3 gives some background about maximum spanning trees, a notion used in our bounds for cumulants.

• The definition of weighted dependency graphs and the associated normality criterion are given in Section 4.

• Section 5 provides tools to prove the correctness of weighted dependency graphs.

• The next five sections (from 6 to 10) are devoted to the applications described in Section 1.2.

• The appendices give, respectively, a technical proof, some variance estimates and adequate tightness criteria for the functional central limit theorems.

## 2. Preliminaries

### 2.1. Set partitions

The combinatorics of set partitions is central in the theory of cumulants and is important in this article. We recall here some well-known facts about them.

A set partition $\pi$ of a set $S$ is a (non-ordered) family of non-empty disjoint subsets of $S$ (called blocks of the partition) whose union is $S$. We denote by $\#(\pi)$ the number of blocks of $\pi$.

Denote by $\mathcal{P}(S)$ the set of set partitions of a given set $S$. Then $\mathcal{P}(S)$ may be endowed with a natural partial order: the refinement order. We say that $\pi$ is finer than $\pi'$, or $\pi'$ coarser than $\pi$ (and denote $\pi\le\pi'$), if every part of $\pi$ is included in a part of $\pi'$.

Endowed with this order, $\mathcal{P}(S)$ is a complete lattice, which means that each family $F$ of set partitions admits a join (the finest set partition which is coarser than all set partitions in $F$, denoted $\bigvee_{\pi\in F}\pi$) and a meet (the coarsest set partition which is finer than all set partitions in $F$, denoted $\bigwedge_{\pi\in F}\pi$). In particular, there is a maximal element $\{S\}$ (the partition with only one part) and a minimal element (the partition into singletons).
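For two partitions, the meet and join can be computed explicitly: the meet consists of the non-empty intersections of blocks, and the join is obtained by merging blocks that share elements, i.e. by taking connected components. The following Python sketch is an illustration under our own conventions (partitions are encoded as sets of frozensets, and the function names are hypothetical); it uses a small union-find structure for the join and checks the refinement order on an example.

```python
def meet(p1, p2):
    """Coarsest partition finer than both: the non-empty intersections of blocks."""
    return {b1 & b2 for b1 in p1 for b2 in p2 if b1 & b2}


def join(p1, p2):
    """Finest partition coarser than both: merge blocks sharing an element (union-find)."""
    parent = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for block in list(p1) + list(p2):
        for x in block:
            parent.setdefault(x, x)
        first = next(iter(block))
        for x in block:
            parent[find(x)] = find(first)
    classes = {}
    for x in parent:
        classes.setdefault(find(x), set()).add(x)
    return {frozenset(c) for c in classes.values()}


def is_finer(p1, p2):
    """True if every block of p1 is included in some block of p2."""
    return all(any(b1 <= b2 for b2 in p2) for b1 in p1)


pi1 = {frozenset({1, 2}), frozenset({3, 4})}
pi2 = {frozenset({1, 3}), frozenset({2, 4})}
print(meet(pi1, pi2), join(pi1, pi2))
```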

Lastly, denote by $\mu$ the Möbius function of the partition lattice $\mathcal{P}(S)$. In this paper, we only use evaluations of $\mu$ at pairs $(\pi,\{S\})$, i.e. where the second argument is the maximum element of $\mathcal{P}(S)$. In this case, the value of the Möbius function is given by:

$$\mu(\pi,\{S\})=(-1)^{\#(\pi)-1}\,\big(\#(\pi)-1\big)!. \tag{1}$$
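For small sets, the partition lattice and formula (1) can be explored directly. The following Python sketch (our own illustration; the function names are hypothetical) enumerates set partitions recursively and evaluates (1). As a sanity check, the defining recursion of the Möbius function implies that the values $\mu(\pi,\{S\})$ sum to $0$ over all $\pi\in\mathcal{P}(S)$ as soon as $S$ has at least two elements, and the number of partitions should follow the Bell numbers $1, 2, 5, 15, 52, 203$.

```python
from math import factorial


def set_partitions(elements):
    """Yield every set partition of the list `elements`, each as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Either `first` forms a block on its own...
        yield [[first]] + partition
        # ...or it joins one of the existing blocks.
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]


def mobius_to_top(partition):
    """mu(pi, {S}) = (-1)^(#(pi) - 1) * (#(pi) - 1)!, as in (1)."""
    k = len(partition)
    return (-1) ** (k - 1) * factorial(k - 1)


for n in range(1, 7):
    partitions = list(set_partitions(list(range(1, n + 1))))
    # Number of partitions and sum of mu(pi, {S}) over all partitions pi.
    print(n, len(partitions), sum(mobius_to_top(p) for p in partitions))
```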

### 2.2. Joint cumulants

For random variables $X_1,\dots,X_r$ with finite moments living in the same probability space (with expectation denoted $\mathbb{E}$), we define their joint cumulant (or mixed cumulant) as

$$\kappa(X_1,\dots,X_r)=[t_1\cdots t_r]\,\log\Big(\mathbb{E}\big(\exp(t_1X_1+\cdots+t_rX_r)\big)\Big). \tag{2}$$

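As usual, $[t_1\cdots t_r]F$ stands for the coefficient of $t_1\cdots t_r$ in the series expansion of $F$ in positive powers of $t_1,\dots,t_r$. Under the finite moment assumption, this coefficient is well defined when the expression is understood as a formal power series; it is an actual analytic function around $0$ when, for instance, the random variables are bounded. If all random variables are equal to the same variable $X$, we denote $\kappa_r(X)=\kappa(X,\dots,X)$; this is the usual cumulant of a single random variable.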

Joint cumulants have a long history in statistics and theoretical physics and it is rather hard to give a reference for their first appearance. Their most useful properties are summarized in [46, Proposition 6.16]; see also [50].

• It is a symmetric multilinear functional.

• If the set of variables can be split into two mutually independent sets of variables, then the joint cumulant vanishes;

• Cumulants can be expressed in terms of joint moments and vice-versa, as follows:

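$$\mathbb{E}(X_1\cdots X_r)=\sum_{\pi\in\mathcal{P}([r])}\ \prod_{C\in\pi}\kappa(X_i;\,i\in C); \tag{3}$$

$$\kappa(X_1,\dots,X_r)=\sum_{\pi\in\mathcal{P}([r])}\mu(\pi,\{[r]\})\prod_{C\in\pi}\mathbb{E}\Big(\prod_{i\in C}X_i\Big). \tag{4}$$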

Hence, knowing all joint cumulants amounts to knowing all joint moments.
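Formula (4) can be turned directly into a small exact computation. The Python sketch below is illustrative: the finite joint law, its dictionary encoding and all function names are our own. It computes joint cumulants of multisets of variables (repeated indices are allowed, so that for instance $\kappa(X,X)$ is a variance), recovers joint moments through (3), and illustrates the vanishing property for a variable independent of the others.

```python
from fractions import Fraction
from math import factorial


def set_partitions(elements):
    """Yield every set partition of the positions of `elements` (repeats are kept apart)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition
        for k in range(len(partition)):
            yield partition[:k] + [[first] + partition[k]] + partition[k + 1:]


def moment(law, indices):
    """E[prod_{i in indices} X_i] for a finite law {outcome tuple: probability}."""
    total = Fraction(0)
    for outcome, prob in law.items():
        term = prob
        for i in indices:
            term *= outcome[i]
        total += term
    return total


def joint_cumulant(law, indices):
    """kappa(X_i; i in indices) for a multiset of indices, using formula (4)."""
    total = Fraction(0)
    for partition in set_partitions(list(indices)):
        k = len(partition)
        term = Fraction((-1) ** (k - 1) * factorial(k - 1))  # mu(pi, {[r]}) from (1)
        for block in partition:
            term *= moment(law, block)
        total += term
    return total


def moment_from_cumulants(law, indices):
    """Right-hand side of (3): sum over partitions of products of joint cumulants."""
    total = Fraction(0)
    for partition in set_partitions(list(indices)):
        term = Fraction(1)
        for block in partition:
            term *= joint_cumulant(law, block)
        total += term
    return total


# X_0 = X_1 is a Bernoulli(1/2) variable; X_2 ~ Bernoulli(1/3) is independent of it.
two_copies = {(1, 1): Fraction(1, 2), (0, 0): Fraction(1, 2)}
with_independent = {
    (a, b, c): p * (Fraction(1, 3) if c else Fraction(2, 3))
    for (a, b), p in two_copies.items()
    for c in (0, 1)
}
print(joint_cumulant(two_copies, [0, 1]))  # covariance of X_0 and X_1
print(joint_cumulant(with_independent, [0, 1, 2]))  # vanishes by independence
```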

Because of the symmetry, it is natural to consider joint cumulants of multisets of random variables.

The second property above has a converse. Since we have not been able to find it in the literature, we provide it with a proof.

###### Proposition 2.1.

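Let $A$ be a finite set, partitioned into two parts $A_1$ and $A_2$. Let $\{Y_\alpha,\alpha\in A\}$ be a family of random variables defined on the same probability space, such that each $Y_\alpha$ is determined by its moments. We assume that for each multiset $B$ of elements of $A$ that contains elements of both $A_1$ and $A_2$,

$$\kappa(Y_\alpha;\alpha\in B)=0.$$

Then $\{Y_\alpha,\alpha\in A_1\}$ and $\{Y_\alpha,\alpha\in A_2\}$ are independent.

###### Proof.

Since each $Y_\alpha$ is determined by its moments, a theorem of Petersen [57] ensures that the multivariate random variable $(Y_\alpha)_{\alpha\in A}$ is also determined by its joint moments, or equivalently by its joint cumulants. Consider random variables $\{Z_\alpha,\alpha\in A\}$ such that $(Z_\alpha)_{\alpha\in A_1}$ (resp. $(Z_\alpha)_{\alpha\in A_2}$) has the same (multivariate) distribution as $(Y_\alpha)_{\alpha\in A_1}$ (resp. $(Y_\alpha)_{\alpha\in A_2}$) and such that $(Z_\alpha)_{\alpha\in A_1}$ and $(Z_\alpha)_{\alpha\in A_2}$ are independent. Because of the equalities of multivariate distributions, if the multiset $B$ is composed either only of elements of $A_1$ or only of elements of $A_2$, then

$$\kappa(Z_\alpha,\alpha\in B)=\kappa(Y_\alpha,\alpha\in B).$$

On the other hand, if $B$ contains elements of both $A_1$ and $A_2$, then $\{Z_\alpha,\alpha\in B\}$ can be split into two mutually independent sets: $\{Z_\alpha,\alpha\in B\cap A_1\}$ and $\{Z_\alpha,\alpha\in B\cap A_2\}$. Therefore,

$$\kappa(Z_\alpha,\alpha\in B)=0.$$

But, for such $B$, one has $\kappa(Y_\alpha,\alpha\in B)=0$ by hypothesis.

Finally, all joint cumulants of $(Y_\alpha)_{\alpha\in A}$ and $(Z_\alpha)_{\alpha\in A}$ coincide and, therefore, both random vectors have the same distribution (recall that the first one is determined by its joint moments). Therefore $\{Y_\alpha,\alpha\in A_1\}$ and $\{Y_\alpha,\alpha\in A_2\}$ are independent, as claimed. ∎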

###### Remark 2.2.

We do not know whether the hypothesis “determined by its moments” can be relaxed.

### 2.3. Multisets

As mentioned above it is natural to consider joint cumulants of multisets of random variables, so let us fix some terminology.

For a multiset, we distinguish its total number of elements (i.e. counted with multiplicities) from its number of distinct elements. Furthermore, the disjoint union of two multisets is by definition the multiset in which the multiplicity of each element is the sum of its multiplicities in the two multisets.

The set of multisets of elements of is denoted by , while is the subset of multisets with .

### 2.4. Graphs

###### Definition 2.3.

A graph $G$ is a pair $(V,E)$, where $V$ is the vertex set and $E$ the edge set. Elements of $E$ are 2-element subsets of $V$ (our graphs are simple loopless graphs). All graphs considered in this paper are finite. We will also need the number of connected components of a graph $G$.

Two types of graphs appear here: dependency graphs throughout the paper and random graphs in Section 7. The former are tools to prove central limit theorems, while the latter are objects of study, and they should not be confused. Following [46], we use the letter $L$ for dependency graphs, and we reserve the more classical $G$ for random graphs.

If $B$ is a multiset of vertices of a graph $L$, we can consider the graph $L[B]$ induced by $L$ on $B$, defined as follows: the vertices of $L[B]$ correspond to elements of $B$ (if $B$ contains an element with multiplicity $m$, then $m$ vertices correspond to this element), and there is an edge between two vertices if the corresponding vertices of $L$ are equal or connected by an edge in $L$.

Finally, we say that two subsets (or multisets) $B_1$ and $B_2$ of vertices of a graph $L$ are disconnected if they are disjoint and there is no edge in $L$ that has an extremity in $B_1$ and an extremity in $B_2$.

### 2.5. Weighted graphs

An edge-weighted graph, or weighted graph for short, is a graph in which each edge $e$ is assigned a weight $w_e$. In this article we restrict ourselves to weights with $0<w_e\le 1$. Edges not in the graph can be thought of as edges of weight $0$; all our definitions are consistent with this convention.

The induced graph $L[B]$ of a weighted graph $L$ on a multiset $B$ has a natural weighted graph structure: put on each edge of $L[B]$ the weight of the corresponding edge in $L$ (if the edge connects two copies of the same vertex of $L$, there is no corresponding edge in $L$, and we put weight $1$).

If $B_1$ and $B_2$ are subsets (or multisets) of vertices of a weighted graph $L$, we write $W(B_1,B_2)$ for the maximal weight of an edge connecting a vertex of $B_1$ and a vertex of $B_2$. If $B_1\cap B_2\neq\emptyset$, then $W(B_1,B_2)=1$. On the contrary, if $B_1$ and $B_2$ are disconnected, we set $W(B_1,B_2)=0$.
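The conventions of this subsection can be encoded in a few lines. In the following Python sketch (illustrative; weighted graphs are stored as dictionaries from 2-element frozensets to weights, and the function names are hypothetical), absent edges have weight $0$, two copies of the same vertex in an induced graph are joined with weight $1$, and the maximal weight between two vertex multisets is $1$ when they share a vertex and $0$ when they are disconnected. These values are our reading of the conventions stated above.

```python
from itertools import combinations


def edge_weight(weights, u, v):
    """Weight of the edge {u, v}; pairs that are not edges have weight 0."""
    return weights.get(frozenset((u, v)), 0)


def induced_weighted_graph(weights, multiset):
    """Weighted graph induced on a multiset given as a list (repeats allowed).

    Vertices of the result are positions 0..len(multiset)-1; two copies of the
    same vertex get weight 1, other pairs inherit the weight of the original edge.
    Only pairs with positive weight are kept.
    """
    induced = {}
    for i, j in combinations(range(len(multiset)), 2):
        u, v = multiset[i], multiset[j]
        w = 1 if u == v else edge_weight(weights, u, v)
        if w > 0:
            induced[(i, j)] = w
    return induced


def max_weight_between(weights, b1, b2):
    """Maximal weight of an edge between b1 and b2: 1 if they share a vertex, 0 if disconnected."""
    if set(b1) & set(b2):
        return 1
    return max((edge_weight(weights, u, v) for u in b1 for v in b2), default=0)


example_weights = {frozenset({1, 2}): 0.5, frozenset({2, 3}): 0.25}
print(induced_weighted_graph(example_weights, [1, 1, 2]))
print(max_weight_between(example_weights, [1], [2, 3]))
```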

This enables us to define powers of weighted graphs.

###### Definition 2.4.

Let be a weighted graph with vertex set and be a positive integer. The -th power of is the graph with vertex set such that and are linked by an edge unless and are disjoint and disconnected in . Moreover the edge has weight . We denote this weighted graph .

### 2.6. Asymptotic notation

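We use the symbol $a_n\asymp b_n$ (resp. $a_n\ll b_n$, $a_n\gg b_n$) to say that $\lim a_n/b_n$ exists and is a nonzero constant (resp. is $0$, is $+\infty$) as $n\to\infty$. In particular, $b_n$ should be nonzero for $n$ sufficiently large.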

## 3. Spanning trees

As we shall see in the next section, our definition of weighted dependency graphs involves the maximal weight of a spanning tree of a given weighted graph. In this section, we recall this notion and prove a few lemmas that we use later in the paper.

### 3.1. Maximum spanning tree

###### Definition 3.1.

A spanning tree of a graph $L=(V,E)$ is a subset $E'$ of $E$ such that $(V,E')$ is a tree.

More generally, we say that a subset $E'$ of $E$ forms a spanning subgraph of $L$ if $(V,E')$ is connected.

If $\tilde L$ is a weighted graph, we say that the weight $w(T)$ of a spanning tree $T$ of $\tilde L$ is the product of the weights of the edges in $T$. The maximum weight of a spanning tree of $\tilde L$ is denoted $M(\tilde L)$. This parameter is central in our work.

If $\tilde L$ is disconnected, we conveniently set $M(\tilde L)=0$.

###### Example 3.2.

An easy case which appears a few times in the paper is the case of a connected graph with $r$ vertices and all weights equal to the same value, say $\varepsilon$. Then all spanning trees have weight $\varepsilon^{r-1}$, so that $M(\tilde L)=\varepsilon^{r-1}$.

For a less trivial example, consider the weighted graph of Fig. 1. The red edges form a spanning tree, whose weight is the product of their weights. It is easy to check that there is no spanning tree with bigger weight, so that $M(\tilde L)$ is equal to this product in this case.

Finding a spanning tree with maximum weight is a well-studied question in the algorithmics literature: see [19, Chapter 23] (the usual convention is to define the weight of a spanning tree as the sum of the weights of its edges and to look for a spanning tree of minimal weight, but this is of course equivalent, up to replacing weights with the logarithms of their inverses).

### 3.2. Prim’s algorithm and the reordering lemma

There are several classical algorithms to find a spanning tree with maximum weight. We describe here Prim’s algorithm, which is useful for our work.

Assume $\tilde L$ is a connected weighted graph. Choose a vertex $v_1$ of the graph arbitrarily and set initially $V_1=\{v_1\}$ and $E_1=\emptyset$. We iterate the following procedure: at step $i$, find the edge $e_i$ with maximum weight connecting a vertex in $V_i$ with a vertex $v_{i+1}$ outside $V_i$ (since $\tilde L$ is connected, there is at least one such edge), then set $V_{i+1}=V_i\cup\{v_{i+1}\}$ and $E_{i+1}=E_i\cup\{e_i\}$. It is easy to check that at each step, $(V_i,E_i)$ is a tree with vertex set $V_i$, and a general result ensures that at each step, $E_i$ is included in a spanning tree of maximum weight of $\tilde L$ [19, Corollary 23.2]. Note also that the weight of the edge $e_i$ is equal to $W(\{v_{i+1}\},V_i)$. We stop the iteration when $V_i$ is the vertex set of $\tilde L$, and $E_i$ is then a spanning tree of maximum weight.
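The procedure above can be sketched in Python as follows. This is an illustrative, unoptimised version (cubic in the number of vertices) working directly with products of weights rather than sums of logarithms; the dictionary representation of weights and the name `prim_max_product` are our own assumptions, not code from the paper.

```python
def prim_max_product(vertices, weights, start=None):
    """Prim's algorithm, adapted to find a spanning tree of maximum product weight.

    `weights` maps frozenset({u, v}) to a weight in (0, 1]; absent pairs have
    weight 0.  Returns (order, tree, product), where `order` lists vertices in
    the order they are added, `tree` the chosen edges and `product` the product
    of their weights.  If the graph is disconnected, `product` is 0.0.
    """
    vertices = list(vertices)
    if not vertices:
        return [], [], 1.0
    first = vertices[0] if start is None else start
    order, tree, product = [first], [], 1.0
    inside = {first}
    while len(inside) < len(vertices):
        best = None  # (weight, vertex inside, vertex outside)
        for v in vertices:
            if v in inside:
                continue
            for u in inside:
                w = weights.get(frozenset((u, v)), 0.0)
                if w > 0 and (best is None or w > best[0]):
                    best = (w, u, v)
        if best is None:  # no edge leaves the current tree: disconnected graph
            return order, tree, 0.0
        w, u, v = best
        # w equals W({v}, inside): the heaviest edge from v back into the tree
        inside.add(v)
        order.append(v)
        tree.append((u, v))
        product *= w
    return order, tree, product


# Complete graph on 4 vertices with all weights 0.1, as in Example 3.2 (r = 4).
K4 = {frozenset(e): 0.1 for e in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]}
order, tree, product = prim_max_product(range(4), K4)
print(order, product)  # product is 0.1**3 up to floating-point rounding
```

The optional `start` argument reflects the fact that the first vertex can be chosen arbitrarily.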

The correctness of this algorithm implies the following lemma.

###### Lemma 3.3.

Let $\tilde L$ be a weighted graph with $r$ vertices. There exists an ordering $(\beta_1,\dots,\beta_r)$ of its vertex set such that

$$\prod_{j=1}^{r-1}W(\{\beta_{j+1}\},\{\beta_1,\dots,\beta_j\})=M(\tilde L).\tag{5}$$

###### Proof.

Adding edges of weight $0$ to the graph does not change either side of the above equality, so we can assume that $\tilde L$ is connected.

We apply Prim’s algorithm, as described above, and we denote the vertices of $\tilde L$ by $\beta_1,\dots,\beta_r$ in the order in which they are added to the tree. Then $W(\{\beta_{j+1}\},\{\beta_1,\dots,\beta_j\})$ is the weight of the edge added in the $j$-th iteration of the algorithm. Therefore the left-hand side of Eq. (5) is the weight of the spanning tree constructed by Prim’s algorithm. Since this is a spanning tree of maximum weight, this weight is $M(\tilde L)$. ∎
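Lemma 3.3 can be checked by brute force on tiny examples. For any ordering, the edges realising each factor $W(\{\beta_{j+1}\},\{\beta_1,\dots,\beta_j\})$ form a spanning tree, so the left-hand side of (5) never exceeds $M(\tilde L)$; the lemma says that the maximum over orderings is exactly $M(\tilde L)$. The sketch below compares both quantities on a random complete weighted graph with 5 vertices (the seed, the weight range and the helper names are arbitrary choices of ours).

```python
import itertools
import math
import random


def lhs_eq5(order, w):
    """Left-hand side of Eq. (5) for one ordering (complete graph assumed)."""
    product = 1.0
    for j in range(1, len(order)):
        product *= max(w[frozenset((order[j], b))] for b in order[:j])
    return product


def is_connected(vertices, edges):
    """Depth-first search connectivity test."""
    neighbours = {v: set() for v in vertices}
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, stack = set(), [vertices[0]]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.add(x)
            stack.extend(neighbours[x] - seen)
    return len(seen) == len(vertices)


def max_spanning_tree_weight(vertices, w):
    """M(L~) by enumerating all sets of r-1 edges (only for tiny graphs)."""
    pairs = list(itertools.combinations(vertices, 2))
    best = 0.0
    for edges in itertools.combinations(pairs, len(vertices) - 1):
        if is_connected(vertices, edges):  # r-1 edges and connected: a tree
            best = max(best, math.prod(w[frozenset(e)] for e in edges))
    return best


random.seed(1)
vertices = list(range(5))
w = {frozenset(e): random.uniform(0.01, 1.0)
     for e in itertools.combinations(vertices, 2)}
M = max_spanning_tree_weight(vertices, w)
best_order = max(itertools.permutations(vertices), key=lambda p: lhs_eq5(p, w))
print(math.isclose(lhs_eq5(best_order, w), M))  # True
```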

###### Remark 3.4.

In the special case where $\tilde L$ has only edges of weight $1$, the lemma states the following: if $L$ is connected, there exists an ordering $(\beta_1,\dots,\beta_r)$ of its vertices such that each $\beta_{j+1}$ (for $1\le j<r$) is in the closed neighbourhood of $\{\beta_1,\dots,\beta_j\}$. This easy particular case is used in the dependency graph literature, but with weighted dependency graphs, we need Lemma 3.3 in its full generality.

### 3.3. Inequalities on maximal weights of spanning trees

We now state some inequalities on maximal weights, that are useful in the sequel. We first introduce some notation.

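If $\Delta$ is a subset of a (multi)set $V$, we denote by $\pi(\Delta)$ the multiset partition of $V$ which has $\Delta$ and singletons as blocks. Furthermore, if $\Delta_1,\dots,\Delta_s$ is a family of subsets of $V$, then we denote $\pi(\Delta_1,\dots,\Delta_s)=\pi(\Delta_1)\vee\dots\vee\pi(\Delta_s)$. Note that if $\Delta_1$, …, $\Delta_s$ are the parts of a partition $\pi$, then trivially $\pi(\Delta_1,\dots,\Delta_s)=\pi$.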

Finally, for a weighted graph $\tilde L$ we denote by $L_1$ the subgraph formed by edges with weight $1$.

###### Lemma 3.5.

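Let $\tilde L$ be a weighted graph with vertex set $V$ and $\Delta_1,\dots,\Delta_s$ a family of subsets of $V$. We assume that $\pi(\Delta_1,\dots,\Delta_s)\vee\pi_{L_1}=\{V\}$, where $\pi_{L_1}$ is the partition of $V$ into the vertex sets of the connected components of $L_1$. Then

$$\prod_{i=1}^{s}M(\tilde L[\Delta_i])\le M(\tilde L).$$
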
###### Proof.

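Consider a spanning tree $T_i$ of maximum weight in each induced graph $\tilde L[\Delta_i]$. Each $T_i$ can be seen as a subset of edges of the original graph $\tilde L$. Let $S$ be the union of the $T_i$ and of the set of edges of weight $1$. The assumption of the lemma ensures that the edge set $S$ forms a spanning subgraph of $\tilde L$. Therefore we can extract from it a spanning tree $T$. Then, since all weights are at most $1$,

$$w(T)=\prod_{e\in T}w(e)\ge\prod_{e\in S}w(e)\ge\prod_{i=1}^{s}\prod_{e\in T_i}w(e)=\prod_{i=1}^{s}M(\tilde L[\Delta_i]).$$

But, since $T$ is a spanning tree of $\tilde L$, we have $w(T)\le M(\tilde L)$, which completes the proof. ∎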

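Our next lemma uses the notion of $m$-th power of a weighted graph, which was defined in Section 2.5.

###### Lemma 3.6.

Let $I_1,\dots,I_r$ be multisets of vertices of a weighted graph $\tilde L$, each of size at most $m$. We consider a partition $\pi$ of $I_1\uplus\dots\uplus I_r$ such that

$$\pi\vee\{I_1,\dots,I_r\}=\{I_1\uplus\dots\uplus I_r\}.\tag{6}$$

Then, denoting by $\Delta_1,\dots,\Delta_s$ the parts of $\pi$, we have

$$\prod_{i=1}^{s}M(\tilde L[\Delta_i])\le M\big(\tilde L^m[\{I_1,\dots,I_r\}]\big),$$

where $\tilde L^m$ is the $m$-th power of $\tilde L$ and $\{I_1,\dots,I_r\}$ is seen as a multiset of vertices of $\tilde L^m$.

###### Proof.

The multiset $I_1\uplus\dots\uplus I_r$ can be explicitly represented by

$$\{(v,j):\ j\le r\text{ and }v\in I_j\}.$$

For each $i$, consider a spanning tree $T_i$ of maximum weight of $\tilde L[\Delta_i]$. Edges of $T_i$ are pairs $\{(v,j),(v',j')\}$. For such an edge $e$ with $j\neq j'$, we can consider the corresponding edge $\bar e=\{I_j,I_{j'}\}$ in $\tilde L^m[\{I_1,\dots,I_r\}]$. By definition of power graphs, $\bar e$ has at least the same weight as $e$, since $w(\bar e)=W(I_j,I_{j'})\ge w(e)$. Doing so for each edge of $T_i$ with $j\neq j'$, we get a set of edges $S_i$ in $\tilde L^m[\{I_1,\dots,I_r\}]$ such that (since all weights are at most $1$)

$$\prod_{\bar e\in S_i}w(\bar e)\ge\prod_{\substack{e\in T_i\\ j\neq j'}}w(e)\ge\prod_{e\in T_i}w(e).$$

As in the proof of the previous lemma, we now consider the union $S$ of the $S_i$’s. The condition (6) ensures that $S$ forms a spanning subgraph of $\tilde L^m[\{I_1,\dots,I_r\}]$ and hence we can extract from it a spanning tree $T$. Then

$$w(T)=\prod_{\bar e\in T}w(\bar e)\ge\prod_{\bar e\in S}w(\bar e)\ge\prod_{i=1}^{s}\prod_{\bar e\in S_i}w(\bar e)\ge\prod_{i=1}^{s}\prod_{e\in T_i}w(e)=\prod_{i=1}^{s}M(\tilde L[\Delta_i]).$$

But, since $T$ is a spanning tree of $\tilde L^m[\{I_1,\dots,I_r\}]$, we have $w(T)\le M\big(\tilde L^m[\{I_1,\dots,I_r\}]\big)$, which concludes the proof. ∎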

## 4. Weighted dependency graphs

### 4.1. Usual dependency graphs

Consider a family of random variables $\{Y_\alpha,\alpha\in A\}$. A dependency graph for this family is an encoding of the dependency relations between the variables in a graph structure. We take here the definition given by Janson [42]; see also papers of Malyshev [51] and Petrovskaya/Leontovich [58] for earlier appearances of the notion with slightly different names.

###### Definition 4.1.

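A graph $L$ is a dependency graph for the family $\{Y_\alpha,\alpha\in A\}$ if the two following conditions are satisfied:

1. the vertex set of $L$ is $A$;

2. if $A_1$ and $A_2$ are disconnected subsets in $L$, then $\{Y_\alpha,\alpha\in A_1\}$ and $\{Y_\alpha,\alpha\in A_2\}$ are independent.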

A trivial example is that any family of independent variables admits the discrete graph (with no edges) as a dependency graph. A more interesting example is the following.

###### Example 4.2.

Consider the Erdős-Rényi random graph model $G(n,p)$, that is, $G$ has vertex set $[n]$ and has an edge between $i$ and $j$ with probability $p$, all these events being independent of each other. Let $A$ be the set of 3-element subsets of $[n]$ and, if $\alpha=\{i,j,k\}\in A$, let $Y_\alpha$ be the indicator function of the event “the graph $G$ contains the triangle with vertices $i$, $j$ and $k$”.

Let $L$ be the graph with vertex set $A$ and the following edge set: $\alpha$ and $\beta$ are linked if $|\alpha\cap\beta|=2$ (that is, if the corresponding triangles share an edge in $G$). Then $L$ is a dependency graph for the family $\{Y_\alpha,\alpha\in A\}$.
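A small Python sketch (helper names are ours) that builds the dependency graph of this example for a given $n$ and counts degrees. It illustrates that there are $\binom{n}{3}$ vertices and that every triangle has exactly $3(n-3)$ neighbours, since each of its three edges lies in $n-3$ other triangles.

```python
from itertools import combinations


def triangle_dependency_graph(n):
    """Vertices are the 3-subsets of {0, ..., n-1}; two triangles are linked
    when they share exactly two vertices, i.e. an edge of G."""
    A = [frozenset(t) for t in combinations(range(n), 3)]
    edges = [(a, b) for a, b in combinations(A, 2) if len(a & b) == 2]
    return A, edges


A, edges = triangle_dependency_graph(6)
degree = {a: 0 for a in A}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
print(len(A), set(degree.values()))  # 20 {9}
```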

Note also that the complete graph on is a dependency graph for any family of variables indexed by . In particular, given a family of variables, it may admit several dependency graphs. The fewer edges a dependency graph has, the more information it encodes and, thus, the more interesting it is. It would be tempting to consider the dependency graph with fewest edges, but such a graph is not always uniquely defined.

As said in the introduction, dependency graphs are a valuable toolbox to prove central limit theorems for sums of partially dependent variables. Denote by $\mathcal N(0,1)$ a standard normal random variable. The following theorem is due to Janson [42, Theorem 2].

###### Theorem 4.3 (Janson’s normality criterion).

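Suppose that, for each $n$, $\{Y_{n,i},1\le i\le N_n\}$ is a family of bounded random variables; $|Y_{n,i}|\le M_n$ a.s. Suppose further that $L_n$ is a dependency graph for this family and let $\Delta_n$ be the maximal degree of $L_n$. Let $X_n=\sum_{i=1}^{N_n}Y_{n,i}$ and $\sigma_n^2=\mathrm{Var}(X_n)$.

Assume that there exists an integer $s$ such that

$$\Big(\frac{N_n}{\Delta_n}\Big)^{1/s}\frac{\Delta_n\,M_n}{\sigma_n}\to0\quad\text{as }n\to\infty.\tag{7}$$

Then, in distribution,

$$\frac{X_n-\mathbb E X_n}{\sigma_n}\xrightarrow{d}\mathcal N(0,1)\quad\text{as }n\to\infty.\tag{8}$$
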
###### Example 4.4.

We use the same model and notation as in Example 4.2. Assume to simplify that $p$ is bounded away from $1$. Then one has $N_n=\binom{n}{3}$, $\Delta_n=3(n-3)$ and $M_n=1$. An easy computation (see, e.g., [46, Lemma 3.5]) shows that $\sigma_n^2$ is of order $\max(n^3p^3,n^4p^5)$. Thus the hypothesis (7) in Janson’s theorem is fulfilled if $p\ge n^{-2/5+\varepsilon}$ for some $\varepsilon>0$.

When this holds, Theorem 4.3 implies that, after rescaling, the number of triangles in $G(n,p)$ is asymptotically normal. The latter is in fact true under the less restrictive hypothesis $np\to\infty$, as proved by Ruciński [60], but this cannot be obtained from Theorem 4.3.
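For triangles in $G(n,p)$, the variance can in fact be written exactly: only pairs of triangles sharing an edge contribute covariances (edge-disjoint triangles are independent), which gives $\sigma_n^2=\binom n3\big(p^3-p^6+3(n-3)(p^5-p^6)\big)$. When $p$ is bounded away from $1$, this is of order $\max(n^3p^3,n^4p^5)$. The Python sketch below (function names are ours) checks this identity with exact rational arithmetic by summing over all graphs on a few vertices.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def triangle_variance_formula(n, p):
    """Var of the triangle count in G(n, p): diagonal terms plus ordered pairs
    of triangles sharing an edge (edge-disjoint triangles are independent)."""
    return comb(n, 3) * (p**3 - p**6 + 3 * (n - 3) * (p**5 - p**6))


def triangle_variance_bruteforce(n, p):
    """Exact variance by summing over all 2**(n choose 2) graphs (tiny n only)."""
    pairs = list(combinations(range(n), 2))
    triangles = list(combinations(range(n), 3))
    mean = second = Fraction(0)
    for bits in product((0, 1), repeat=len(pairs)):
        present = {e for e, b in zip(pairs, bits) if b}
        k = sum(bits)
        prob = p**k * (1 - p) ** (len(pairs) - k)
        x = sum(1 for i, j, l in triangles if {(i, j), (i, l), (j, l)} <= present)
        mean += prob * x
        second += prob * x * x
    return second - mean**2


p = Fraction(1, 3)
print(triangle_variance_formula(5, p) == triangle_variance_bruteforce(5, p))  # True
```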

To finish this section, let us mention a stronger normality criterion, due to Mikhailov [53]. Roughly, he replaces the number of vertices $N_n$ and the degree $\Delta_n$ by some quantities defined using conditional expectations of the variables. If (7) holds with these new quantities, then we can also conclude that one has Gaussian fluctuations. His theorem has a larger range of applications than Janson’s: e.g., for triangles in random graphs, it proves asymptotic normality in its whole range of validity, that is, if $np\to\infty$ and $n^2(1-p)\to\infty$; see [46, Example 6.19].

### 4.2. Definition of weighted dependency graphs

The goal of the present article is to relax the independence hypothesis in the definition of dependency graphs. As we shall see in the next sections, this allows us to include many more examples.

As above, $\{Y_\alpha,\alpha\in A\}$ is a family of random variables defined on the same probability space. We suggest the following definition.

###### Definition 4.5.

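Let $\mathbf C=(C_1,C_2,\dots)$ be a sequence of positive real numbers. Let $\Psi$ be a function on multisets of elements of $A$.

A weighted graph $\tilde L$ is a $(\Psi,\mathbf C)$ weighted dependency graph for $\{Y_\alpha,\alpha\in A\}$ if, for any multiset $B=\{\alpha_1,\dots,\alpha_r\}$ of elements of $A$, one has

$$\big|\kappa(Y_\alpha;\alpha\in B)\big|\le C_r\,\Psi(B)\,M(\tilde L[B]).\tag{9}$$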

Our definition implies in particular that all cumulants, or equivalently all moments, of the $Y_\alpha$ are finite. This might seem restrictive, but in most applications the $Y_\alpha$ are Bernoulli random variables. Note also that we already have this restriction in Janson’s and Mikhailov’s normality criteria.

###### Remark 4.6.

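It is rather easy to ensure inequality (9). For any family $\{Y_\alpha,\alpha\in A\}$, take

$$\Psi(B)=\big|\kappa(Y_\alpha;\alpha\in B)\big|,\qquad\mathbf C=(1,1,\dots),$$

and let $\tilde L$ be the complete graph on $A$ with weight $1$ on each edge. Then $\tilde L$ is trivially a $(\Psi,\mathbf C)$ weighted dependency graph for $\{Y_\alpha,\alpha\in A\}$. But examples of this type do not yield interesting results.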

We are interested in constructing examples where:

• $\Psi$ may depend on $n$, but $\mathbf C$ is constant along a sequence of weighted dependency graphs;

• has a rather simple form, such as for some (the case gives a good intuition);

• Edge weights also have a very simple expression and most of them tend to $0$ along a sequence of weighted dependency graphs.

Intuitively, Eq. (9) should be thought of as follows: variables that are linked by edges of small weight in $\tilde L$ are almost independent, in the sense that their joint cumulants are required to be small (because of the factor $M(\tilde L[B])$). Indeed, the smaller the weights in $\tilde L[B]$ are, the smaller $M(\tilde L[B])$ is.

###### Example 4.7.

Most of this paper is devoted to the treatment of examples: proving that they are indeed weighted dependency graphs and inferring some central limit theorems. Nevertheless, to guide the reader’s intuition, let us give right away an example without proof.

Consider the Erdős-Rényi random graph model $G(n,M)$, i.e. $G$ is a graph with vertex set $[n]$ and an edge set of size $M$, chosen uniformly at random among all possible edge sets of size $M$.

If we set $N=\binom{n}{2}$, then each edge belongs to $G$ with probability $M/N$, but the corresponding events are not independent anymore. Indeed, since the total number of edges is fixed, if we know that one given edge is in $G$, it is less likely that another given edge is also in $G$.

As in Example 4.2, let $A$ be the set of 3-element subsets of $[n]$ and, if $\alpha=\{i,j,k\}\in A$, let $Y_\alpha$ be the indicator function of the event “the graph $G$ contains the triangle with vertices $i$, $j$ and $k$”. Since presences of edges are no longer independent events, neither are presences of edge-disjoint triangles, and the only dependency graph of this family in the classical sense is the complete graph on $A$.

Consider the complete graph $\tilde L$ with vertex set $A$ and weights on the edges determined as follows:

• If $|\alpha\cap\beta|=2$ (that is, if the corresponding triangles share an edge in $G$), then the edge $\{\alpha,\beta\}$ in $\tilde L$ has weight $1$;

• If $|\alpha\cap\beta|\le1$, then the edge $\{\alpha,\beta\}$ in $\tilde L$ has weight $1/M$.

We will prove in Section 7 that $\tilde L$ is a $(\Psi,\mathbf C)$ weighted dependency graph with $\Psi(B)=(M/N)^{e(B)}$, where $e(B)$ is the total number of distinct edges in $B$ (recall that $B$ is here a multiset of triangles), and the sequence $\mathbf C$ does not depend on $n$.

Intuitively, this means that presences of edge-disjoint triangles are almost independent events. Moreover, the weight $1/M$ quantifies this almost-independence. This is rather logical: the bigger $M$ is, the less knowing that a given edge is in $G$ influences the probability that another given edge is also in $G$ (and hence the same holds for presences of edge-disjoint triangles).
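As a numerical illustration (not a proof) of this almost-independence, the Python sketch below computes exactly the covariance, i.e. the second joint cumulant, of the indicators of two edge-disjoint triangles in $G(n,M)$. It compares it with $\Psi(B)\,M(\tilde L[B])=(M/N)^6\cdot\frac1M$, taking as in the example the weight $1/M$ and $\Psi(B)=(M/N)^{e(B)}$ with $e(B)=6$. The choice $M=\lfloor N/2\rfloor$ and the function names are arbitrary. The printed ratio stays bounded; a first-order expansion suggests that it approaches $9(1-M/N)$.

```python
from fractions import Fraction


def prob_edges_present(N, M, k):
    """P(k prescribed edges all belong to G(n, M)), N = n(n-1)/2 possible edges."""
    prob = Fraction(1)
    for i in range(k):
        prob *= Fraction(M - i, N - i)
    return prob


def disjoint_triangles_covariance(n, M):
    """Cov(Y_alpha, Y_beta) for two edge-disjoint triangles alpha and beta."""
    N = n * (n - 1) // 2
    return prob_edges_present(N, M, 6) - prob_edges_present(N, M, 3) ** 2


for n in (10, 20, 40, 80):
    N = n * (n - 1) // 2
    M = N // 2
    cov = disjoint_triangles_covariance(n, M)
    # |Cov| divided by Psi(B) * (weight 1/M), with Psi(B) = (M/N)**6
    ratio = abs(cov) / (Fraction(M, N) ** 6 / M)
    print(n, float(cov) < 0, round(float(ratio), 2))
```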

### 4.3. A criterion for asymptotic normality

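Let $\tilde L$ be a $(\Psi,\mathbf C)$ weighted dependency graph for a family of variables $\{Y_\alpha,\alpha\in A\}$. We introduce the following parameters (for $\ell\ge1$):

$$R=\sum_{\alpha\in A}\Psi(\{\alpha\});\tag{10}$$

$$T_\ell=\max_{\alpha_1,\dots,\alpha_\ell\in A}\Big[\sum_{\beta\in A}W(\{\beta\},\{\alpha_1,\dots,\alpha_\ell\})\,\frac{\Psi(\{\alpha_1,\dots,\alpha_\ell,\beta\})}{\Psi(\{\alpha_1,\dots,\alpha_\ell\})}\Big].\tag{11}$$
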
###### Remark 4.8.

Despite the complicated definition of $T_\ell$, its order of magnitude is usually not hard to determine in examples (recall that $\Psi$ and the weights usually have rather simple expressions).

###### Remark 4.9.

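Let us consider the special case where $\Psi$ is the constant function equal to $1$. One has

• $R=|A|$, which is the number of vertices of $\tilde L$;

• using the easy observation $W(\{\beta\},\{\alpha_1,\dots,\alpha_\ell\})=\max_{1\le i\le\ell}W(\{\beta\},\{\alpha_i\})$, we see that

$$\Delta\le T_\ell\le\ell\,\Delta,\quad\text{where }\Delta:=\max_{\alpha\in A}\sum_{\beta\in A}W(\{\beta\},\{\alpha\})$$

is the maximal weighted degree in $\tilde L$ (each vertex being counted as a neighbour of itself, with weight $1$). In particular, for fixed $\ell$, each $T_\ell$ has the same order of magnitude as $\Delta$.

In general, $R$ and $T_\ell$ should be thought of as deformations of the number of vertices and of the maximal weighted degree. Considering $R$ and $T_\ell$ rather than simply $|A|$ and $\Delta$ leads to a more general normality criterion, in the same way as Mikhailov’s criterion extends Janson’s.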
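The parameters $R$ and $T_\ell$ can be computed by brute force on small examples. The Python sketch below does so for a random weighted graph on 6 vertices with $\Psi\equiv1$ (the default argument) and checks the bounds $\Delta\le T_\ell\le\ell\Delta$ of Remark 4.9. Here $\Delta=T_1$, the maximum over lists is taken over all lists with repetitions, and multisets are passed to `psi` as tuples; these conventions, the weight values and the names are our own assumptions.

```python
import itertools
import random


def W(w, beta, alphas):
    """W({beta}, {alpha_1, ..., alpha_l}); equals 1 if beta is one of the alphas."""
    return max(1.0 if beta == a else w.get(frozenset((beta, a)), 0.0) for a in alphas)


def T(A, w, ell, psi=lambda B: 1.0):
    """Brute-force T_ell of Eq. (11), maximising over all lists of length ell."""
    best = 0.0
    for alphas in itertools.product(A, repeat=ell):
        total = sum(W(w, b, alphas) * psi(alphas + (b,)) / psi(alphas) for b in A)
        best = max(best, total)
    return best


def R(A, psi=lambda B: 1.0):
    """R of Eq. (10)."""
    return sum(psi((a,)) for a in A)


random.seed(2)
A = list(range(6))
w = {frozenset(e): random.choice([0.05, 0.3, 1.0])
     for e in itertools.combinations(A, 2) if random.random() < 0.6}
Delta = T(A, w, 1)  # maximal weighted degree, each vertex counting itself with weight 1
for ell in (2, 3):
    print(ell, Delta <= T(A, w, ell) <= ell * Delta + 1e-12)  # True for both
```

The lower bound comes from repeating a maximising $\alpha$, and the upper bound from replacing the maximum over $i$ by a sum.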

The following lemma bounds cumulants in terms of the two above defined quantities.

###### Lemma 4.10.

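Let $\tilde L$ be a $(\Psi,\mathbf C)$ weighted dependency graph for a family of variables $\{Y_\alpha,\alpha\in A\}$. Define $R$ and $T_\ell$ (for $\ell\ge1$) as above. Then, for $r\ge1$,

$$\Big|\kappa_r\Big(\sum_{\alpha\in A}Y_\alpha\Big)\Big|\le C_r\,r!\,R\,T_1\cdots T_{r-1}.$$
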
###### Proof.

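By multilinearity,

$$\kappa_r\Big(\sum_{\alpha\in A}Y_\alpha\Big)=\sum_{\alpha_1,\dots,\alpha_r\in A}\kappa(Y_{\alpha_1},\dots,Y_{\alpha_r}).$$

Applying the triangle inequality and Eq. (9),

$$\Big|\kappa_r\Big(\sum_{\alpha\in A}Y_\alpha\Big)\Big|\le C_r\sum_{\alpha_1,\dots,\alpha_r\in A}M_{\alpha_1,\dots,\alpha_r},$$

where, by definition, $M_{\alpha_1,\dots,\alpha_r}=\Psi(B)\,M(\tilde L[B])$ for $B=\{\alpha_1,\dots,\alpha_r\}$ (in particular $M_{\alpha_1,\dots,\alpha_r}$ is invariant under permutation of the indices).

We also define, for a list $(\beta_1,\dots,\beta_r)$ of elements of $A$ and $B=\{\beta_1,\dots,\beta_r\}$,

$$M'_{\beta_1,\dots,\beta_r}=\Big[\prod_{j=1}^{r-1}W(\{\beta_{j+1}\},\{\beta_1,\dots,\beta_j\})\Big]\Psi(B).$$

We say that a list $(\beta_1,\dots,\beta_r)$ of elements of $A$ is well-ordered if

$$M(\tilde L[B])=\prod_{j=1}^{r-1}W(\{\beta_{j+1}\},\{\beta_1,\dots,\beta_j\}),\tag{12}$$

which implies $M_{\beta_1,\dots,\beta_r}=M'_{\beta_1,\dots,\beta_r}$. By Lemma 3.3 (applied to $\tilde L[B]$), each list $(\alpha_1,\dots,\alpha_r)$ admits a well-ordered permutation. Conversely, a well-ordered list is a permutation of at most $r!$ lists $(\alpha_1,\dots,\alpha_r)$. Therefore

$$\sum_{\alpha_1,\dots,\alpha_r\in A}M_{\alpha_1,\dots,\alpha_r}\le r!\sum_{(\beta_1,\dots,\beta_r)}M_{\beta_1,\dots,\beta_r}=r!\sum_{(\beta_1,\dots,\beta_r)}M'_{\beta_1,\dots,\beta_r},\tag{13}$$

where both sums on the right-hand side run over well-ordered lists of elements of $A$. Extending the last sum to all lists of elements of $A$ only increases the right-hand side (all terms are nonnegative), so that we get:

$$\Big|\kappa_r\Big(\sum_{\alpha\in A}Y_\alpha\Big)\Big|\le C_r\,r!\sum_{\beta_1,\dots,\beta_r\in A}M'_{\beta_1,\dots,\beta_r}.\tag{14}$$

By definition, one has, for any $\ell\ge1$ and elements $\beta_1,\dots,\beta_{\ell+1}$ in $A$:

$$M'_{\beta_1,\dots,\beta_{\ell+1}}=W(\{\beta_{\ell+1}\},\{\beta_1,\dots,\beta_\ell\})\,\frac{\Psi(\{\beta_1,\dots,\beta_\ell,\beta_{\ell+1}\})}{\Psi(\{\beta_1,\dots,\beta_\ell\})}\,M'_{\beta_1,\dots,\beta_\ell}.$$

Fixing $\beta_1,\dots,\beta_\ell$ and summing over $\beta_{\ell+1}$ in $A$, we get, by definition (11) of $T_\ell$,

$$\sum_{\beta_{\ell+1}\in A}M'_{\beta_1,\dots,\beta_{\ell+1}}\le T_\ell\,M'_{\beta_1,\dots,\beta_\ell}.$$

Since $\sum_{\beta_1\in A}M'_{\beta_1}=\sum_{\beta_1\in A}\Psi(\{\beta_1\})=R$, an immediate induction yields

$$\sum_{\beta_1,\dots,\beta_r\in A}M'_{\beta_1,\dots,\beta_r}\le R\,T_1\cdots T_{r-1}.$$

Together with Eq. (14), this ends the proof of the lemma. ∎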
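The first step of the proof, the expansion of $\kappa_r\big(\sum_\alpha Y_\alpha\big)$ by multilinearity, can be checked exactly on a toy example. The Python sketch below computes joint cumulants through the standard moment-cumulant formula over set partitions, on an arbitrary finite probability space carrying three dependent $0/1$ variables. The sample space, the variables and the function names are illustrative assumptions of ours.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def set_partitions(items):
    """Yield all set partitions of the list `items`, as lists of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        yield [[first]] + partition  # `first` alone in a new block
        for i in range(len(partition)):  # or `first` added to block i
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


def expectation(probs, values):
    return sum(p * v for p, v in zip(probs, values))


def joint_cumulant(probs, variables):
    """kappa(X_1, ..., X_r) = sum over set partitions pi of [r] of
    (-1)^(|pi|-1) (|pi|-1)! prod over blocks C of E[prod_{i in C} X_i]."""
    total = Fraction(0)
    for pi in set_partitions(list(range(len(variables)))):
        term = Fraction((-1) ** (len(pi) - 1) * factorial(len(pi) - 1))
        for block in pi:
            block_values = [prod(variables[i][o] for i in block) for o in range(len(probs))]
            term *= expectation(probs, block_values)
        total += term
    return total


# A toy sample space with 4 outcomes and three dependent 0/1 variables.
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(1, 4), Fraction(1, 4)]
Y = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 1]]
S = [sum(col) for col in zip(*Y)]  # values of Y_0 + Y_1 + Y_2 on each outcome

lhs = joint_cumulant(probs, [S, S, S])  # kappa_3 of the sum
rhs = sum(joint_cumulant(probs, [Y[a], Y[b], Y[c]])
          for a, b, c in product(range(3), repeat=3))
print(lhs == rhs)  # True
```

Exact `Fraction` arithmetic is used so that the identity can be tested with `==` rather than up to rounding.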

We can now give an asymptotic normality criterion, using weighted dependency graphs.

###### Theorem 4.11.

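Suppose that, for each $n$, $\{Y_{n,i},1\le i\le N_n\}$ is a family of random variables with finite moments defined on the same probability space. For each $n$, let $\Psi_n$ be a function on multisets of elements of $[N_n]$. We also fix a sequence $\mathbf C=(C_1,C_2,\dots)$, not depending on $n$.

Assume that, for each $n$, one has a $(\Psi_n,\mathbf C)$ weighted dependency graph $\tilde L_n$ for $\{Y_{n,i},1\le i\le N_n\}$ and define the corresponding quantities $R_n$, $T_{1,n}$, $T_{2,n}$, …, by Eqs. (10) and (11).

Let $X_n=\sum_{i=1}^{N_n}Y_{n,i}$ and $\sigma_n^2=\mathrm{Var}(X_n)$.

Assume that there exist numbers $D_r$ (not depending on $n$) and $Q_n$ (not depending on $r$) and an integer $s$ such that

$$T_{r,n}\le D_r\,Q_n\quad\text{for all }r\ge1\text{ and all }n,\tag{15}$$

$$\Big(\frac{R_n}{Q_n}\Big)^{1/s}\frac{Q_n}{\sigma_n}\to0\quad\text{as }n\to\infty.\tag{16}$$

Then, in distribution,

$$\frac{X_n-\mathbb E X_n}{\sigma_n}\xrightarrow{d}\mathcal N(0,1)\quad\text{as }n\to\infty.\tag{17}$$
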
###### Proof.

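From Lemma 4.10 and (15), we know that, for $r\ge1$,

$$\Big|\kappa_r\Big(\sum_{i=1}^{N_n}Y_{n,i}\Big)\Big|\le C_r\,r!\,R_n\,D_1\cdots D_{r-1}\,Q_n^{r-1}.\tag{18}$$

Setting $\tilde X_n=(X_n-\mathbb E X_n)/\sigma_n$ and $C'_r=C_r\,r!\,D_1\cdots D_{r-1}$, we get that for $r\ge2$,

$$|\kappa_r(\tilde X_n)|=\frac{1}{\sigma_n^r}|\kappa_r(X_n)|\le C'_r\,\frac{R_nQ_n^{r-1}}{\sigma_n^r}=C'_r\Big(\frac{R_nQ_n^{s-1}}{\sigma_n^s}\Big)^{\frac{r-2}{s-2}}\Big(\frac{\sigma_n^2}{R_nQ_n}\Big)^{\frac{r-s}{s-2}}.$$

Eq. (18) for $r=2$ ensures that $\sigma_n^2/(R_nQ_n)$ is bounded, so that the last factor is bounded for $r\ge s$, while the middle factor tends to $0$ by hypothesis (16) (note that $R_nQ_n^{s-1}/\sigma_n^s$ is the $s$-th power of the left-hand side of (16)). We conclude that $\kappa_r(\tilde X_n)$ tends to $0$ for $r\ge s$. The convergence towards a normal law then follows from [42, Theorem 1]. ∎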
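The factorisation used in this proof is a purely algebraic identity, valid for any positive $R_n$, $Q_n$, $\sigma_n$ and any $s\neq2$. The Python sketch below sanity-checks it numerically on random positive inputs; it drops the constant $C'_r$ and uses arbitrary sampling ranges and hypothetical function names.

```python
import math
import random


def direct_bound(R, Q, sigma, r):
    """R Q^(r-1) / sigma^r (the constant C'_r is omitted)."""
    return R * Q ** (r - 1) / sigma ** r


def factored_bound(R, Q, sigma, r, s):
    """The same quantity, written as in the proof (requires s != 2)."""
    middle = (R * Q ** (s - 1) / sigma ** s) ** ((r - 2) / (s - 2))
    last = (sigma ** 2 / (R * Q)) ** ((r - s) / (s - 2))
    return middle * last


random.seed(3)
ok = True
for _ in range(1000):
    R, Q, sigma = (random.uniform(0.5, 50.0) for _ in range(3))
    s = random.randint(3, 6)
    r = random.randint(2, 10)
    ok &= math.isclose(direct_bound(R, Q, sigma, r),
                       factored_bound(R, Q, sigma, r, s), rel_tol=1e-9)
print(ok)  # True
```

For $r\ge s$ both exponents $\frac{r-2}{s-2}$ and $\frac{r-s}{s-2}$ are nonnegative, which is what makes the two factors controllable in the proof.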

###### Remark 4.12.

Continuing Remark 4.9, when $\Psi_n$ is constant equal to $1$, one has $R_n=N_n$ and one can choose $Q_n=\Delta_n$ and $D_r=r$, where $\Delta_n$ is the maximal weighted degree in $\tilde L_n$. Then hypothesis (16) says that the quotient $\Delta_n/\sigma_n$ tends to $0$ reasonably fast, namely faster than $(\Delta_n/N_n)^{1/s}$ for some integer $s$. Roughly, one has a central limit theorem as soon as the weighted degree is smaller than the standard deviation. (In particular, except in pathological cases, the standard deviation should tend to infinity.)

###### Remark 4.13.

In most examples of application, $R_n$ is immediate to evaluate, while a good upper bound for $T_{r,n}$, and thus a sequence $Q_n$ as in the theorem, can be found by a relatively easy combinatorial case analysis. The most difficult part in applying the theorem is to find a lower bound for $\sigma_n$ (Lemma 4.10 gives a usually sharp upper bound). In this sense, the weighted dependency graph structure, once uncovered, reduces the central limit theorem to a variance estimation.

###### Remark 4.14.

[42, Theorem 1] also ensures the convergence of all moments. Therefore, in Theorem 4.11 above and in all applications, we have convergence of all moments, in addition to the convergence in distribution.

###### Remark 4.15.

Apart from Lemma 3.3 (see Remark 3.4), the proof of our normality criterion is largely inspired by the case of usual dependency graphs. The difficulty here was to find a good definition of weighted dependency graphs, not to adapt the theorem to this new setting.

### 4.4. Multidimensional convergence and bounds for joint cumulants

Bounds on cumulants, and thus weighted dependency graphs, can also be used to obtain the convergence of a random vector towards a multidimensional Gaussian vector or the convergence of a random function towards a Gaussian process.

To avoid a heavily technical theorem, we do not state a general result, but refer the reader to examples in Sections 8.2, 8.3 and 9.3. We nevertheless give here a useful bound on joint cumulants, whose proof is a straightforward adaptation of that of Lemma 4.10.

###### Lemma 4.16.

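Let $\tilde L$ be a $(\Psi,\mathbf C)$ weighted dependency graph for a family of variables $\{Y_\alpha,\alpha\in A\}$. Consider subsets $A_1,\dots,A_r$ of $A$. Then, with the notation of the previous section,

$$\Big|\kappa\Big(\sum_{\alpha\in A_1}Y_\alpha,\dots,\sum_{\alpha\in A_r}Y_\alpha\Big)\Big|\le C_r\,r!\,R\,T_1\cdots T_{r-1}.$$
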
###### Remark 4.17.

It is also possible in the above bound to replace $R$ by

$$R_1=\sum_{\alpha\in A_1}\Psi(\{\alpha\})$$

and/or the product $T_1\cdots T_{r-1}$ by $T^{2}_{\le r-1}\cdots T^{r}_{\le r-1}$, where

$$T^{i}_{\le r-1}=\max_{1\le\ell\le r-1}\ \max_{\alpha_1,\dots,\alpha_\ell\in A}\Big[\sum_{\beta\in A_i}W(\{\beta\},\{\alpha_1,\dots,\alpha_\ell\})\,\frac{\Psi(\{\alpha_1,\dots,\alpha_\ell,\beta\})}{\Psi(\{\alpha_1,\dots,\alpha_\ell\})}\Big].$$

The maximum over $\ell$ in the equation above comes from the reordering argument, that is, the use of Lemma 3.3 in the proof of Lemma 4.10. We do not know the index of the element taken from $A_i$ in the reordered sequence $(\beta_1,\dots,\beta_r)$. The only thing we can ensure is that $\beta_1\in A_1$ (since we can choose the first vertex in Prim’s algorithm arbitrarily; see the proof of Lemma 3.3), which allows us to use $R_1$ instead of $R$.

This slight improvement of the bound is not used in the applications given in this paper. It could however be useful if we wanted to prove, say, a multivariate convergence result for numbers of copies of subgraphs of different sizes in $G(n,M)$; see Section 7 for the corresponding univariate statement.

Note that, with this improvement, the bound given for the joint cumulant is not symmetric in $A_1,\dots,A_r$, while the quantity to bound obviously is.

### 4.5. Comparison between usual and weighted dependency graphs

In this section, we compare at a formal level the notions of weighted dependency graphs and of usual dependency graphs. The results of this section are not needed in the rest of the paper, and the section can safely be skipped.

The key observation here is the following: if the induced weighted graph $\tilde L[B]$ is disconnected, then $M(\tilde L[B])$ is $0$ by convention, and hence (9) states that the corresponding joint cumulant should be $0$.

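For the next proposition, we need to introduce some terminology. Let $\{Y_\alpha,\alpha\in A\}$ be a family of random variables defined on the same probability space. We say that a function $\Psi$ on multisets of elements of $A$ dominates joint moments if, for any multiset $B$ of elements of $A$ and any multiset partition $\pi$ of $B$:

$$\Big|\prod_{C\in\pi}\mathbb E\Big(\prod_{\alpha\in C}Y_\alpha\Big)\Big|\le\Psi(B).$$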

Examples include:

• Assume that the variables are uniformly bounded by a constant , i.e. , for any , one has