Subgraph statistics in subcritical graph classes

Michael Drmota (MD), TU Wien, Institute of Discrete Mathematics and Geometry, 1040 Wien, Austria, michael.drmota@tuwien.ac.at, http://www.dmg.tuwien.ac.at/drmota/
Lander Ramos (LR), Universitat Politècnica de Catalunya, Departament de Matemàtica Aplicada, 08034 Barcelona, Spain, lander.ramos@upc.edu, http://www-ma2.upc.edu/lramos/
Juanjo Rué (JR), Freie Universität Berlin, Institut für Mathematik und Informatik, 14195 Berlin, Germany, jrue@zedat.fu-berlin.de, http://www-ma2.upc.edu/jrue/
Abstract.

Let $H$ be a fixed graph and $\mathcal{G}$ a subcritical graph class. In this paper we show that the number of occurrences of $H$ (as a subgraph) in a graph of size $n$ chosen uniformly at random from $\mathcal{G}$ follows a normal limiting distribution with linear expectation and variance. The main ingredient in our proof is the analytic framework developed by Drmota, Gittenberger and Morgenbesser to deal with infinite systems of functional equations [11]. As a case study, we get explicit expressions for the number of triangles and cycles of length four for the family of series-parallel graphs.

M.D. was supported by the SFB F50 Algorithmic and Enumerative Combinatorics of the Austrian Science Foundation FWF. J.R. was partially supported by the FP7-PEOPLE-2013-CIG project CountGraph (ref. 630749), the Spanish MICINN projects MTM2014-54745-P and MTM2014-56350-P, the DFG within the Research Training Group Methods for Discrete Structures (ref. GRK1408), and the Berlin Mathematical School.

1. Introduction

The study of subgraphs in random discrete structures is a central area in graph theory, which dates back to the seminal works of Erdős and Rényi in the sixties [14]. Since then, a lot of effort has been devoted to locating the threshold function for the appearance of a given subgraph in the $G(n,p)$ model, as well as the limiting distribution of the corresponding counting random variable (see for instance [27, 25, 39], and the monograph [26, Chapter 3]). The number of appearances of a fixed graph and its statistics have also been addressed in different restricted graph classes, including random regular graphs and random graphs with specified vertex degrees (see for instance [31, 17, 29, 28, 33], see also [32]) and random planar maps [18, 19].

In this paper we study subgraphs in a random graph from a so-called subcritical class. Roughly speaking, a graph class is called subcritical if the largest block of a random graph in the class with $n$ vertices has $O(\log n)$ vertices (see the precise analytic definition in Section 3). Indeed, graphs in these classes typically have a tree-like structure and share several properties with trees. Prominent subcritical graph classes are forests, cactus graphs, outerplanar graphs and series-parallel graphs, and more generally graph families defined by a finite set of 3-connected components (see [22]). Let us mention that the analysis of subcritical graph classes is intimately related to the study of the random planar graph model: it is conjectured that a graph class defined by a set of excluded minors is subcritical if and only if at least one of the excluded graphs is planar (see [35]).

The systematic study of subcritical graph classes started in [2] when studying the expected number of vertices of given degree. Later, in [8] the authors extended the analysis to unlabelled graph classes, and obtained normal limiting probability distributions for different parameters, including the number of cut-vertices, blocks, edges and the vertex degree distribution. Drmota and Noy [12] investigated several extremal parameters in these graph classes. They showed, for instance, that the expected diameter $D_n$ of a random connected graph on $n$ vertices in a subcritical graph class satisfies $c_1\sqrt{n} \le \mathbb{E}[D_n] \le c_2\sqrt{n\log n}$ for some constants $c_1$ and $c_2$. More recently, the precise asymptotic estimate has been deduced to be of order $\sqrt{n}$ [36]. Furthermore, the normalized metric space $(C_n, n^{-1/2} d_{C_n})$ (where $d_{C_n}(u,v)$ denotes the number of edges in a shortest path that contains $u$ and $v$ in $C_n$) is shown to converge with respect to the Gromov-Hausdorff metric to the so-called Brownian Continuum Random Tree multiplied by a scaling factor that depends only on the class under study (see [36] for details, and also [40] for extensions to the unlabelled setting). Let us also mention that even more recently, the Benjamini-Schramm convergence has been addressed in [20, 40] for these graph families. Finally, the maximum degree and the degree sequence of a random series-parallel graph have been studied in [10] and [2, 9], respectively.

Our results: This paper is a contribution to the understanding of the shape of a random graph from these graph classes. More precisely, we present a very general framework to deal with subgraph statistics in subcritical graph classes. Our main result is the following theorem:

Theorem 1.1.

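Let $\mathcal{C}_n$ be the set of connected graphs of size $n$ in a subcritical graph class $\mathcal{G}$, and let $H$ be a fixed (connected) graph. Let $X_n$ be the number of copies of $H$ in an object of $\mathcal{C}_n$ chosen uniformly at random. Then,

$$\mathbb{E}[X_n] = \mu n + O(1) \qquad\text{and}\qquad \mathrm{Var}[X_n] = \sigma^2 n + O(1)$$

for some constants $\mu$, $\sigma^2$ that only depend on $H$ (and on the subcritical graph class under study). Moreover, if $\sigma^2 > 0$, then

$$\frac{X_n - \mathbb{E}[X_n]}{\sqrt{\mathrm{Var}[X_n]}} \to N(0,1).$$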

The strategy we use in the proof is based on analytic combinatorics. More precisely, given a subgraph $H$ we are able to get expressions for the counting formulas encoding the number of copies of $H$. As we will show, even if $H$ has a very simple structure, we will need infinitely many equations and infinitely many variables to encode all the possible appearances. We are then able to fully analyze the resulting infinite system of equations using an adapted version of the main theorem of Drmota, Gittenberger and Morgenbesser [11], which provides the necessary analytic ingredient to study infinite systems of functional equations. This result extends the classical Drmota-Lalley-Woods theorem for (finite) systems of functional equations (see for instance [6]).

Let us also discuss some similar results from the literature. The study of induced subgraphs (also called patterns) in random trees was carried out in [5], showing normal limiting distributions with linear expectation and variance. This covers in particular the distribution of the number of vertices of given degree in random trees. In the more general setting of subcritical graph classes, the number of vertices of degree $k$ was studied in [8]. In another direction, appearances of a fixed subgraph (also called pendant copies) in a subcritical graph class were studied in [22] (see the proper definition of appearance in [30]), showing again normal limiting distributions with linear expectation. As every appearance defines a subgraph, this result shows that the number of subgraphs in a uniformly random subcritical graph is (at least) linear. Our result considerably strengthens this fact by giving the precise limiting probability distribution.

As a case study, we get explicit constants for series-parallel graphs and for specific subgraphs. Recall that a graph is series-parallel if it excludes $K_4$ as a minor. Equivalently, a series-parallel graph has treewidth at most 2. We are able to show the following result for triangles:

Theorem 1.2.

The number of copies of $K_3$, $X_n$, in a series-parallel graph with $n$ vertices chosen uniformly at random is asymptotically normal, with

where and .

Our encoding also lets us analyze the asymptotic number of triangle-free series-parallel graphs on $n$ vertices. The more involved case of the number of copies of $C_4$, as well as series-parallel graphs with a given girth, is discussed in Section 6.

Plan of the paper: The paper is organized as follows. Section 2 fixes the notation concerning generating functions. Section 3 covers the analytic preliminaries of the paper. This section includes a modified version of the main theorem of Drmota, Gittenberger and Morgenbesser in [11], which is our main analytic ingredient in the proof of Theorem 1.1. Section 4 deals with the easier situation where the subgraph under study is 2-connected. The arguments to deal with the general connected case are developed in Section 5. In order to prepare the reader for the involved notation used to deal with general subgraphs, some easy cases are fully developed. Section 6 is devoted to explicit computations in the family of series-parallel graphs. Finally, Section 7 discusses the results obtained so far and possible future investigations.

2. Graph preliminaries

In our work, all graphs we study are assumed to be simple (no loops and no multiple edges) and labelled. A graph on $n$ vertices will always be labelled with different elements in $\{1, \dots, n\}$.

2.1. Combinatorial classes. Exponential generating functions.

We follow the notation and definitions in [16]. A labelled combinatorial class is a set $\mathcal{A}$ together with a size function, such that for each $n \ge 0$ the set of elements of size $n$, denoted by $\mathcal{A}_n$, is finite. Each object in $\mathcal{A}$ is built by using labelled atoms of size $1$. In graph classes, atoms are precisely the vertices.

Two elements in $\mathcal{A}$ are said to be isomorphic if one is obtained from the other by relabelling. In particular, two isomorphic elements have the same size. We always assume that a combinatorial class is stable under relabelling, namely, $a \in \mathcal{A}$ if and only if all relabellings of $a$ are also elements of $\mathcal{A}$. For counting purposes we consider the exponential generating function (EGF for short) associated to the labelled class $\mathcal{A}$:

$$A(x) = \sum_{n \ge 0} |\mathcal{A}_n| \frac{x^n}{n!}.$$

In our setting, we use the (exponential) indeterminate $x$ to encode vertices. In the opposite direction, we also write $|\mathcal{A}_n| = n!\,[x^n]A(x)$. The basic constructions we consider in this paper are described in Table 1. In particular, we consider the disjoint union of labelled classes, the labelled product of classes, the sequence construction, the set construction, the cycle and the substitution (see [16] for all the details).

We additionally consider classes of graphs of various types depending on whether one marks vertices or not. A (vertex-)pointed graph is a graph with a distinguished (labelled) vertex. A derived graph is a graph where one vertex is distinguished but not labelled (the other vertices have distinct labels in $\{1, \dots, n-1\}$). In particular, isomorphisms between two pointed graphs (or between two derived graphs) have to respect the distinguished vertex.

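Given a graph class $\mathcal{G}$, the pointed class $\mathcal{G}^\bullet$ is the class of pointed graphs arising from $\mathcal{G}$. Similarly, the derived graph class $\mathcal{G}'$ is obtained by taking all derived graphs built from $\mathcal{G}$. Hence, $|\mathcal{G}^\bullet_n| = n|\mathcal{G}_n|$ and $|\mathcal{G}'_{n-1}| = |\mathcal{G}_n|$, and we have respectively $G^\bullet(x) = xG'(x)$ and $G'(x) = \frac{d}{dx}G(x)$.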

Construction Class Equations
Sum
Product
Sequence
Set
Restricted Set
Cycle
Substitution
Pointing
Deriving
Table 1. The Symbolic Method translating combinatorial constructions into operations on counting series.

Pointing and deriving operators will only be used over vertices. When dealing with ordinary parameters over combinatorial classes (for instance, edges or copies of a fixed subgraph) we use extra variables in the corresponding counting formulas. The partial derivatives of counting series with respect to parameters are denoted by subindices of the corresponding indeterminate. For instance, a generating function of the form $A_u(x,u)$ means $\frac{\partial}{\partial u}A(x,u)$.

2.2. Graph decompositions

A block of a graph $G$ is a maximal 2-connected subgraph of $G$. A graph class $\mathcal{G}$ is block-stable if it contains the edge-graph (the unique connected graph with two labelled vertices), and satisfies the property that a graph $G$ belongs to $\mathcal{G}$ if and only if all the blocks of $G$ belong to $\mathcal{G}$. Block-stable classes cover a wide variety of natural graph families, including graph classes specified by a finite list of forbidden minors that are all 2-connected. Planar graphs ($\mathrm{Ex}(K_5, K_{3,3})$) and series-parallel graphs ($\mathrm{Ex}(K_4)$) are block-stable.

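For a graph class $\mathcal{G}$, we write $\mathcal{C}$ and $\mathcal{B}$ for the subfamilies of connected and 2-connected graphs in $\mathcal{G}$, respectively. In particular, the following combinatorial specifications hold (see [1, 7, 23]):

$$\mathcal{G} = \mathrm{Set}(\mathcal{C}), \qquad \mathcal{C}^\bullet = \bullet \times \mathrm{Set}(\mathcal{B}' \circ \mathcal{C}^\bullet).$$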

By means of Table 1 these expressions translate into equations of EGFs in the following way:

$$G(x) = \exp(C(x)), \qquad C^\bullet(x) = x \exp\bigl(B'(C^\bullet(x))\bigr). \tag{1}$$

See [41] for further results on graph decompositions and connectivity on graphs.
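The block decomposition can be checked numerically on small examples. The following is a minimal sketch, assuming the decomposition takes the standard form $C^\bullet(x) = x\exp(B'(C^\bullet(x)))$; the helper names (`mul`, `compose`, `series_exp`, `pointed_connected_counts`) are hypothetical and not taken from the paper. It iterates the equation on truncated power series with exact rational arithmetic, taking as input the coefficients of $B'(x)$.

```python
from fractions import Fraction
from math import factorial

def mul(a, b, N):
    """Product of two power series (coefficient lists), truncated at degree N."""
    out = [Fraction(0)] * (N + 1)
    for i, ai in enumerate(a[:N + 1]):
        if ai == 0:
            continue
        for j, bj in enumerate(b[:N + 1 - i]):
            out[i + j] += ai * bj
    return out

def compose(p, c, N):
    """p(c(x)) truncated at degree N, by Horner's rule (assumes c[0] == 0)."""
    res = [Fraction(0)] * (N + 1)
    for coeff in reversed(p[:N + 1]):
        res = mul(res, c, N)
        res[0] += coeff
    return res

def series_exp(a, N):
    """exp(a(x)) truncated at degree N, for a[0] == 0, using E' = a' E."""
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def pointed_connected_counts(b_prime, N):
    """Numbers of pointed connected graphs on 1..N vertices.

    b_prime holds the coefficients of B'(x) (assumption: B'(0) = 0).
    Each pass of C <- x * exp(B'(C)) fixes one more coefficient.
    """
    c = [Fraction(0)] * (N + 1)
    for _ in range(N):
        e = series_exp(compose(b_prime, c, N), N)
        c = [Fraction(0)] + e[:N]  # multiply by x
    return [int(c[n] * factorial(n)) for n in range(1, N + 1)]

# Forests: the only block is the edge, so B(x) = x^2/2 and B'(x) = x.
print(pointed_connected_counts([0, 1], 5))  # rooted labelled trees: [1, 2, 9, 64, 625]
```

For forests the output reproduces Cayley's formula $n^{n-1}$ for rooted labelled trees. Other block-stable classes only require a different coefficient list for $B'(x)$, for instance `[0, 1, Fraction(1, 2), Fraction(1, 2), ...]` for cactus graphs, whose blocks are edges and cycles.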

3. Analytic preliminaries

In this part we include the analytic results necessary in the forthcoming sections of the paper.

3.1. Subcritical graphs

We start with the notion of subcritical graph class. Further details concerning these graph classes can be found in [8].

Definition 1.

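A block-stable class $\mathcal{G}$ of (vertex labelled) graphs is called subcritical if

$$C^\bullet(\rho) < \eta,$$

where $\rho$ denotes the radius of convergence of $C^\bullet(x)$ and $\eta$ the radius of convergence of $B'(x)$.

Roughly speaking, the subcritical condition means that the singular behaviour of $B'(x)$ does not interfere with the singular behaviour of $C^\bullet(x)$. Only the behaviour of $B'(x)$ for $|x| \le C^\bullet(\rho) + \epsilon$ matters (where $\epsilon > 0$ is arbitrarily small). From general theory (see for instance [16]) it follows that $C^\bullet(x)$ becomes singular for $x = \rho$ if $\rho$ (and $y_0 = C^\bullet(\rho)$) satisfies the system of equations

$$y_0 = \rho \exp(B'(y_0)), \qquad 1 = \rho \exp(B'(y_0))\, B''(y_0),$$

or equivalently if

$$y_0 B''(y_0) = 1.$$

In particular, we just have to ensure that the equation $y B''(y) = 1$ has a solution $y_0 < \eta$. Equivalently this is granted if

$$\eta B''(\eta) > 1.$$

It also follows from general theory that the solution function $C^\bullet(x)$ has a square-root type singularity at $x = \rho$ and can be (locally) written in the form

$$C^\bullet(x) = g(x) - h(x)\sqrt{1 - \frac{x}{\rho}},$$

where $g(x)$ and $h(x)$ are analytic functions at $x = \rho$ and satisfy the condition $g(\rho) = C^\bullet(\rho)$ and $h(\rho) > 0$.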
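To make the subcritical condition concrete, here is a minimal numerical sketch. It assumes the characteristic equation reads $y_0 B''(y_0) = 1$ with $\rho = y_0 \exp(-B'(y_0))$, and that $y B''(y)$ is increasing on the search interval; the function name `critical_point` and the two example classes are illustrative choices, not notation from the paper.

```python
import math

def critical_point(b1, b2, upper):
    """Solve y * B''(y) = 1 on (0, upper) by bisection.

    b1, b2 : callables evaluating B'(y) and B''(y)
    upper  : a point strictly inside the disc of convergence of B'
    Returns (y0, rho) with rho = y0 * exp(-B'(y0)), or None when
    y * B''(y) stays below 1 on the whole interval.
    """
    if upper * b2(upper) <= 1:
        return None
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid * b2(mid) < 1:
            lo = mid
        else:
            hi = mid
    y0 = (lo + hi) / 2
    return y0, y0 * math.exp(-b1(y0))

# Forests: B'(y) = y, B''(y) = 1 (B' is entire, any finite upper bound works).
forest = critical_point(lambda y: y, lambda y: 1.0, 10.0)

# Cactus graphs: blocks are edges and cycles, so
# B'(y) = y + y^2 / (2(1 - y)) with radius of convergence eta = 1.
cactus = critical_point(
    lambda y: y + y * y / (2 * (1 - y)),
    lambda y: 1 + (2 * y - y * y) / (2 * (1 - y) ** 2),
    1 - 1e-9,
)
print(forest, cactus)
```

For forests the solution is $y_0 = 1$ and $\rho = 1/e$. For cactus graphs $B''$ blows up at $\eta = 1$, so a solution $y_0 < \eta$ always exists, which is the mechanism behind subcriticality in this example.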

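It is convenient to assume that our graph class is an a-periodic class. That is, we have $[x^n]\,C^\bullet(x) > 0$ for $n \ge n_0$. Then it follows that $x = \rho$ is the only singularity on the circle of convergence $|x| = \rho$. Additionally, there is an analytic continuation of $C^\bullet(x)$ to a domain of the form $\{x \in \mathbb{C} : |x| < \rho + \epsilon,\ |\arg(x - \rho)| > \theta\}$ for some real number $\epsilon > 0$ and some positive angle $\theta$. We call such a domain a $\Delta$-region or a domain dented at $\rho$.

More precisely, if $|x| = \rho$ but $x \ne \rho$ then

$$\bigl|C^\bullet(x)\, B''(C^\bullet(x))\bigr| < C^\bullet(\rho)\, B''(C^\bullet(\rho)) = 1.$$

Thus by the Implicit Function Theorem $C^\bullet(x)$ has no singularity there and can be analytically continued. Consequently, we get by singularity analysis over $C^\bullet(x)$ that, for some constant $c > 0$,

$$[x^n]\, C^\bullet(x) = c\, n^{-3/2} \rho^{-n} \left(1 + O\!\left(\frac{1}{n}\right)\right).$$

Since $C^\bullet(x) = x C'(x)$ we also obtain the local singular behavior of $C(x)$, which is of the form

$$C(x) = g_2(x) + h_2(x) \left(1 - \frac{x}{\rho}\right)^{3/2}$$

for some functions $g_2(x)$ and $h_2(x)$ which are analytic at $x = \rho$. Since $G(x) = e^{C(x)}$ this also provides the local singular behavior of $G(x)$:

$$G(x) = g_3(x) + h_3(x) \left(1 - \frac{x}{\rho}\right)^{3/2},$$

where again $g_3(x)$ and $h_3(x)$ are analytic at $x = \rho$. This implies (applying again singularity analysis) that, for some constants $c_2, c_3 > 0$,

$$[x^n]\, C(x) \sim c_2\, n^{-5/2} \rho^{-n} \qquad\text{and}\qquad [x^n]\, G(x) \sim c_3\, n^{-5/2} \rho^{-n}.$$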

In what follows we will make heavy use of these properties of subcritical graph classes.
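The $n^{-3/2}\rho^{-n}$ behaviour produced by a square-root singularity can be observed on the simplest subcritical class. The sketch below is an illustration under the assumption that the class is that of rooted labelled trees, where $[x^n]C^\bullet(x) = n^{n-1}/n!$ and $\rho = 1/e$; the constant $1/\sqrt{2\pi}$ is the one given by Stirling's formula for this example and the function names are hypothetical.

```python
import math

def log_exact_coefficient(n):
    """log of [x^n] C^•(x) = n^(n-1) / n! for rooted labelled trees."""
    return (n - 1) * math.log(n) - math.lgamma(n + 1)

def log_predicted_coefficient(n):
    """log of (1 / sqrt(2 pi)) * n^(-3/2) * rho^(-n) with rho = 1/e."""
    return -0.5 * math.log(2 * math.pi) - 1.5 * math.log(n) + n

def ratio(n):
    """Exact coefficient divided by the singularity-analysis prediction."""
    return math.exp(log_exact_coefficient(n) - log_predicted_coefficient(n))

for n in (10, 100, 1000):
    print(n, ratio(n))
```

The ratio approaches 1 from below with an error of order $1/n$, in line with the $1 + O(1/n)$ factor in the general estimate. Working with logarithms avoids overflow for large $n$.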

3.2. A single equation

We first state a central limit theorem that is a slight modification of [7, Theorem 2.23]. Let be an analytic function in , around , and is a complex parameter with . Suppose that the following conditions hold: , and all coefficients of are real and non-negative. Suppose also that for it is true that . Finally, assume that the function is at least three times continuously differentiable and all derivatives are analytic, too, in and . Then, by the implicit function Theorem it is clear that the functional equation

$$y = F(x,y,u) \tag{2}$$

has a unique analytic solution $y = y(x,u) = \sum_{n} y_n(u) x^n$ with $y(0,u) = 0$ that is three times continuously differentiable with respect to $u$ if $|u| = 1$. Furthermore the coefficients $y_n(1)$ are non-negative.

It is easy to show that there exists an integer $d \ge 1$ and a residue class $r$ modulo $d$ such that $y_n(1) = 0$ if $n \not\equiv r \bmod d$. In order to simplify the following presentation we assume that $d = 1$ (namely, we discuss the a-periodic case). The general case can be reduced to this case by a proper substitution in the original equation.

We also assume that the region of convergence of $F(x,y,u)$ is large enough such that there exist non-negative solutions $x = x_0$ and $y = y_0$ of the system of equations

$$y = F(x,y,1), \qquad 1 = F_y(x,y,1), \tag{3}$$

with $F_x(x_0,y_0,1) \ne 0$ and $F_{yy}(x_0,y_0,1) \ne 0$.

Theorem 3.1.

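Let $F(x,y,u)$ satisfy the above assumptions and let $y(x,u)$ be a power series in $x$ that is the (analytic) solution of the functional equation $y = F(x,y,u)$. Suppose that $X_n$ is a sequence of random variables such that

$$\mathbb{E}[u^{X_n}] = \frac{[x^n]\, y(x,u)}{[x^n]\, y(x,1)},$$

where $|u| = 1$. Set

$$\mu = \frac{F_u}{x_0 F_x},$$

$$\sigma^2 = \mu + \mu^2 + \frac{1}{x_0 F_x^3 F_{yy}} \Bigl( F_x^2 (F_{yy}F_{uu} - F_{yu}^2) - 2F_xF_u (F_{yy}F_{xu} - F_{yx}F_{yu}) + F_u^2 (F_{yy}F_{xx} - F_{yx}^2) \Bigr),$$

where all partial derivatives are evaluated at the point $(x_0, y_0, 1)$ solution to the system of equations (3). Then we have that

$$\mathbb{E}[X_n] = \mu n + O(1) \qquad\text{and}\qquad \mathrm{Var}[X_n] = \sigma^2 n + O(1),$$

and if $\sigma^2 > 0$ then

$$\frac{X_n - \mathbb{E}[X_n]}{\sqrt{\mathrm{Var}[X_n]}} \to N(0,1).$$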

Proof.

The proof runs along the same lines as that of [7, Theorem 2.23]. We just indicate the differences.

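By the Implicit Function Theorem it follows that there exist functions $x_0(u)$ and $y_0(u)$ (for $|u| = 1$ and $|u - 1| < \epsilon$ for some $\epsilon > 0$) which are three times differentiable with respect to $u$ if $|u| = 1$ and that satisfy

$$y_0(u) = F(x_0(u), y_0(u), u), \qquad 1 = F_y(x_0(u), y_0(u), u),$$

with $x_0(1) = x_0$ and $y_0(1) = y_0$. Furthermore, by applying a proper variant of the Weierstrass Preparation Theorem it follows (as in the proof of [7, Theorem 2.23]) that we have a representation of the form

$$y(x,u) = g(x,u) - h(x,u) \sqrt{1 - \frac{x}{x_0(u)}} \tag{4}$$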

locally around , , where , and are analytic in and three times continuously differentiable with respect to if , where and

Since we also get

(5)

uniformly for and . Hence,

(6)

By using the local expansion of we get for

which directly implies

By Lévy's continuity theorem this proves the central limit theorem. ∎
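As an illustration of how the constant $\mu$ arises, consider a worked example that is not taken from the paper: the number of leaves (vertices without children) in rooted labelled trees, encoded by $F(x,y,u) = x(e^y - 1 + u)$. Assuming the mean constant has the form $\mu = F_u/(x_0 F_x)$ as in [7, Theorem 2.23], one gets $x_0 = 1/e$, $y_0 = 1$ and $\mu = 1/e$. The sketch below checks this against exact expectations, using the identity $y_u(x,1) = x/(1 - T(x))$ obtained by differentiating the equation at $u = 1$, where $T(x) = y(x,1)$. The function name `mean_number_of_leaves` is hypothetical.

```python
from fractions import Fraction
from math import factorial, exp

def mean_number_of_leaves(N):
    """Exact expected number of leaves in rooted labelled trees of size 1..N.

    Uses E[X_n] = [x^n] y_u(x,1) / [x^n] T(x) with y_u(x,1) = x / (1 - T(x)).
    """
    # t[n] = [x^n] T(x) = n^(n-1) / n!
    t = [Fraction(0)] + [Fraction(n ** (n - 1), factorial(n)) for n in range(1, N + 1)]
    # inv = 1 / (1 - T): inv[n] = sum_{k=1..n} t[k] * inv[n-k]
    inv = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        inv[n] = sum(t[k] * inv[n - k] for k in range(1, n + 1))
    # Multiplying by x shifts coefficients: [x^n] x/(1-T) = inv[n-1].
    return {n: inv[n - 1] / t[n] for n in range(1, N + 1)}

means = mean_number_of_leaves(30)
mu = exp(-1)  # predicted constant F_u / (x_0 F_x) for this example
for n in (5, 10, 30):
    print(n, float(means[n]), mu * n)
```

The difference between the exact mean and $\mu n$ stays bounded as $n$ grows, which is the $\mu n + O(1)$ behaviour stated in the theorem.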

Remark 3.2.

In our applications, the function will be the generating function of connected graphs. Since it follows that

and, thus, it is sufficient to work with instead of . However, if we are interested in all graphs (not necessarily connected) we need to study the behaviour of . By means of the set construction we have to replace by the function

and the new random variable that is defined by . Indeed, has a slightly different singular behaviour: from (4) we obtain

and consequently

for a proper function . However, from that expression we obtain the same kind of asymptotic behavior as in (6) and a central limit theorem for with the same asymptotic behaviour for mean and variance as for .
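The passage from connected graphs to all graphs through the set construction can be tested on a small case. The following sketch assumes the relation $G(x) = \exp(C(x))$ between the EGFs of all graphs and of connected graphs in the class, and applies it to forests; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def series_exp(a, N):
    """exp(a(x)) truncated at degree N, for a[0] == 0, using E' = a' E."""
    e = [Fraction(1)] + [Fraction(0)] * N
    for n in range(1, N + 1):
        e[n] = sum(k * a[k] * e[n - k] for k in range(1, n + 1)) / n
    return e

def all_graph_counts(connected_counts):
    """Counts of all graphs on 1..N vertices from counts of connected ones."""
    N = len(connected_counts)
    c = [Fraction(0)] + [
        Fraction(connected_counts[n - 1], factorial(n)) for n in range(1, N + 1)
    ]
    g = series_exp(c, N)
    return [int(g[n] * factorial(n)) for n in range(1, N + 1)]

# Unrooted labelled trees on n vertices: n^(n-2), with the convention 1 for n = 1.
tree_counts = [1, 1, 3, 16, 125]
print(all_graph_counts(tree_counts))  # labelled forests: [1, 2, 7, 38, 291]
```

Since the exponential is analytic, it does not change the location or the type of the dominant singularity, which is why the same central limit theorem carries over to the non-connected setting.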

Remark 3.3.

In most of the applications, the condition is satisfied. As it is shown in [8, Lemma 4], if satisfies some natural analytic conditions (see [8]), and assuming that there are three integer vectors , with , with

and for , then .

Remark 3.4.

Finally we remark that Theorem 3.1 extends to a finite system of equations , , provided that the system is strongly connected (compare with [7, Theorem 2.35]). We will use this extension in Section 6.2.

3.3. An infinite system of equations

The main reference for this subsection is the work [11]. We start again with an equation of the form , where satisfies (almost) the same assumptions as that of Theorem 3.1 (we just omit the conditions concerning ). In particular this means that the solution has a square-root type singularity at and the coefficients have an asymptotic expansion of the form (5), where .

Next, for a parameter with , we suppose that there exist functions , , such that

(7)

and that the functions satisfy the (infinite) system of equations

(8)

where has a power series expansion

with coefficients that satisfy . In particular, these coefficients are non-negative for . Moreover, we assume that for every there exists a function with

(9)

and that

(10)

Informally speaking, this means that the infinite system can be interpreted as a partition of the main equation . Hence, we refer to this property in what follows as the partition property.

From these properties it immediately follows that is well defined (and also analytic) for and for which is analytic (recall that ). Consequently, under the same conditions, is convergent. Actually we only need convergence for and for some .

This property suggests to work in the space for . However, in the present situation we have to be slightly more careful since we have to take also into account derivatives with respect to (with ). For this purpose we use weighted spaces of the form

for some non-negative real number (see also Remark 3.6). Since the functions are also well defined (and analytic) if and for some .

Finally we assume that, for each , is three times continuously differentiable with respect to with such that the series

(11)

converges absolutely for and (for some ). Note that the case just says that for each the mapping is well defined in the space with and (for some ).

The main theorem in this context is the following:

Theorem 3.5.

Let be a power series in , , where the set of power series , , satisfies an infinite system of equations satisfying the above assumptions. Suppose that is a sequence of random variables such that

for . Then we have

for some real constants and . Furthermore if then

Remark 3.6.

We note that a corresponding theorem for a finite system is also true ([6, 7]) but in our context we just need the infinite version.

Furthermore, Theorem 3.5 even holds in slightly more general situations. For example, if the functions are not indexed by an integer but by a multi-index of integers then we can also adapt the space to the space

Actually we will need this generalization if we consider subgraphs with more than one cut-vertex.

Finally, as for Theorem 3.1 the central limit theorem transfers to that is defined with the help of . Compare this fact with Remark 3.2.

Proof.

We first note that Theorem 3.5 will be deduced from [11, Theorem 1] with two slight adaptations. The first one concerns $u$: here we just require differentiability with respect to $u$ if $|u| = 1$, and not analyticity. The second one concerns the underlying space: we replace by . Actually the modification concerning $u$ can be treated as in the proof of Theorem 3.1 and the change of the underlying space does not change the proof at all, so we will not discuss these issues.

Next we note that (9) implies

where is the solution of the equation . Thus, we study two cases. First, if does not depend on then is analytic at . This also implies that is analytic for and for for some . Let denote the set of indices with this property. Furthermore, since is also analytic at it also follows that is analytic in for and for .

In the second case has a square-root singularity of the form

which is inherited from that of . Furthermore it follows that depends on all variables , . Let denote the set of indices of the second case.

If we reduce now the infinite system to those equations with , where we consider with as already known functions, then we get a strongly connected system of equations

that satisfies all regularity assumptions of [11, Theorem 1]. In particular, since

and is analytic (at least) in the region where is analytic, it follows that the function is well defined (and analytic in and ) for in a proper neighborhood of , in a proper neighborhood of in and with .

The only remaining assumption that has to be checked is that the operator

is compact. Since the property

is satisfied, it follows that

is independent of the choice of . Hence the rank of equals which implies that is a compact operator.

Thus we can apply [11, Theorem 1] and obtain that all functions , , have a common square-root type singularity, and an expression of the form

with functions that are three times differentiable in , where and analytic in around .

Summing up, we thus obtain a square-root singularity for . So we are precisely in the same situation as in the proof of Theorem 3.1, and the result follows. ∎

4. 2-Connected Subgraphs

The purpose of this section is to consider 2-connected subgraphs . This case is much easier than the general case since a 2-connected subgraph can only appear in a block. Due to its shortness, we include the proof for this specific subgraph case.

Theorem 4.1.

Suppose that $H$ is a 2-connected graph that appears as a subgraph in an a-periodic subcritical graph class $\mathcal{G}$. Let $X_n$ denote the number of occurrences of $H$ as a subgraph in a connected or general random graph of size $n$ of $\mathcal{G}$.

Then, $X_n$ satisfies a central limit theorem with $\mathbb{E}[X_n] \sim \mu n$ and $\mathrm{Var}[X_n] \sim \sigma^2 n$ as $n \to \infty$.

Proof.

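Let $b_{n,k}$ denote the number of rooted 2-connected graphs in $\mathcal{G}$ with $n$ non-root vertices such that $H$ appears precisely $k$ times as a subgraph. Furthermore let

$$B'(x,u) = \sum_{n,k} b_{n,k} \frac{x^n}{n!} u^k$$

be the corresponding generating function.

Let $C^\bullet(x,u)$ be the corresponding generating function of connected graphs in $\mathcal{G}$ (where the root is not discounted). Since $H$ is assumed to be 2-connected, the number of occurrences of $H$ in a connected graph is just the sum of its occurrences in the 2-connected components. Hence we have

$$C^\bullet(x,u) = x \exp\bigl(B'(C^\bullet(x,u), u)\bigr).$$

If $u = 1$ then $B'(x) = B'(x,1)$ and $C^\bullet(x) = C^\bullet(x,1)$ are the usual counting functions that satisfy the equation $C^\bullet(x) = x \exp(B'(C^\bullet(x)))$.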

In order to prove Theorem 4.1 we just have to check the conditions of Theorem 3.1. By the subcritical condition we certainly have and that satisfy

Furthermore, since the region of convergence of is large enough.

The only missing assumption that has to be (finally) checked is that the mapping is three times continuously differentiable in . Of course it is sufficient to study the mapping . First we note that . From this it follows that exists (and is also analytic in ) for all and for . Next we note that the number of occurrences of a graph of size in a graph with vertices is bounded by . Write for the number of rooted 2-connected graphs in with non-root vertices. Thus it follows that

for with ; for notational convenience we have taken the derivatives formally with respect to . However, since all derivatives are finite it follows that all derivatives exist for . (Alternatively we can use the bound for every which implies that

Consequently all assumptions of Theorem 3.1 are satisfied and the result follows for the connected case. In the general case, where we have to work with , we get the same result, see Remark 3.2. ∎
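The combinatorial fact behind this proof, that copies of a 2-connected graph never straddle a cut-vertex, can be illustrated on a toy example. The sketch below counts triangles by brute force; the graph and its list of blocks are a hypothetical example written out by hand (two triangles sharing a vertex, plus a pendant edge), and the blocks are not computed by the code.

```python
from itertools import combinations

def count_triangles(edges):
    """Number of copies of K_3 in the simple graph given by its edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

# Blocks of the example graph; vertices 3 and 5 are its cut-vertices.
blocks = [
    [(1, 2), (2, 3), (1, 3)],  # triangle on {1, 2, 3}
    [(3, 4), (4, 5), (3, 5)],  # triangle on {3, 4, 5}
    [(5, 6)],                  # pendant edge
]
graph = [edge for block in blocks for edge in block]

total = count_triangles(graph)
per_block = [count_triangles(block) for block in blocks]
print(total, per_block)
```

The total count equals the sum of the counts over the blocks, which is what allows a single variable $u$ to be attached to blocks in the functional equation. For subgraphs that are not 2-connected this additivity fails, since copies can be spread over several blocks.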

5. Connected Subgraphs

The purpose of this section is to extend Theorem 4.1 to subgraphs that are not 2-connected, and hence prove the main theorem (Theorem 1.1). The main difference between the 2-connected case and the (general) connected case is that occurrences of $H$ are not necessarily separated by cut-vertices. This means that we also have to cut $H$ into pieces (more precisely, into blocks) and count all combinations of these pieces when two (or several) blocks are joined by a cut-vertex, or by several cut-vertices.

We start this section by illustrating the arguments with the base case $H = P_2$, which is the simplest case of a graph that is not 2-connected. Later, as a warm-up for the general case (where notation can be especially involved), we show the combinatorics behind two particular cases: copies of subgraphs with 1 cut-vertex and exactly 3 blocks (Subsection 5.2) and the number of copies of (Subsection 5.3). In both cases we show again the type of functional equations we obtain in this setting and the main difficulties that arise when encoding the counting formulas. At the end of the section we indicate how the method can be modified to cover the general case, both combinatorially and analytically.

5.1. Counting copies of $P_2$

Although this example does not cover the full general case, the proof of the main theorem in Subsection 5.4 uses the same type of arguments once a convenient encoding is found.

If $H$ is just a path of length 2 then the situation is relatively simple since $H$ is separated by a cut-vertex into just two edges. For example, if we join two blocks at a cut-vertex and the corresponding degrees of the two blocks at this cut-vertex are $k_1$ and $k_2$ then there are