Automorphism groups of Gaussian chain graph models

Abstract

In this paper we extend earlier work on groups acting on Gaussian graphical models to Gaussian Bayesian networks and, more generally, to Gaussian models defined by chain graphs. We discuss the maximal group which leaves a given model invariant and provide basic statistical applications of this result, including equivariant estimation, maximal invariants and robustness. The computation of the group requires finding the essential graph. However, by applying Studený’s theory of imsets we show that, for DAGs, the computations can be performed efficiently without building the essential graph. In our proof we derive simple necessary and sufficient conditions on vanishing sub-minors of the concentration matrix in the model.

1 Introduction

Having an explicit group action on a parametric statistical model gives a better understanding of equivariant estimation or invariant testing for the model under consideration [4]. In [5] we have identified the largest group that acts on an undirected Gaussian graphical model and we have shown how this group can be used to study equivariant estimators of the covariance matrix in this model class. In the present paper we extend our discussion to chain graph models.

A chain graph is a graph with both directed and undirected edges that contains no semi-directed cycles, that is, sequences of nodes $k_1,\dots,k_n,k_{n+1}=k_1$ such that for every $r=1,\dots,n$ either $k_r\to k_{r+1}$ or $k_r - k_{r+1}$, but at least one edge is directed. In this paper we focus on chain graphs without flags (NF-CGs), that is, chain graphs with no induced subgraphs of the form $i\to j - k$. Note that both undirected graphs and directed acyclic graphs (DAGs) are chain graphs without flags. For more details on these graph-theoretic notions see Section 2.1.

Gaussian models on chain graphs constitute a flexible family of graphical models, which contains both undirected Gaussian graphical models and Gaussian Bayesian networks defined by directed acyclic graphs (DAGs). Let be a NF-CG. Let denote the space of all matrices such that if in ; let denote the space of all symmetric positive definite matrices and let be the subspace of consisting of matrices such that if and in . The Gaussian chain graph model of a NF-CG consists of all Gaussian distributions on with mean zero and concentration matrices of the form

The set of all matrices of this form will be denoted by .

Let be a Gaussian vector with the covariance matrix . A linear transformation yields another Gaussian vector . A basic question of equivariant inference is for which the covariance matrix of still lies in . More formally, the general linear group acts on by . Fix a chain graph . We study the problem of finding:

In other words, find the stabilizer of in .

The problem in (Equation 2) can be alternatively phrased in terms of concentration matrices, which will be more useful in our case. Let act on by . Now find all such that .

1.1 The group G

Example ? showed that two different chain graphs may define the same chain graph model. We discuss this in more detail in Section 2.2. For any NF-CG denote by the unique graph without flags with the largest number of undirected edges which induces the same Gaussian model as . The fact that such a unique graph exists follows from Proposition ? given later. For example for the DAG in Example ? such a graph is given by the undirected graph . By we denote the children of in , so . Similarly by we denote the set of neighbours of in , that is, nodes connected to by an undirected edge, which we denote by . We write

Our main results can be summarized as follows. For a fixed chain graph without flags with set of nodes given by consider the set of invertible matrices given by

Further, an automorphism of a chain graph is any permutation of its nodes that maps directed edges to directed edges and undirected edges to undirected edges.

In the undirected case, this theorem reduces to [5]. However, the proof in our current, more general setting is much more involved, first because the set is not a linear space, and second because the characterization is in terms of the essential graph rather than the graph itself.

Note that for some graphs there may be two nodes such that . In this case the transposition of and already lies in , which shows that and may have a non-trivial intersection. In Section 4 we prove a more refined version of Theorem ? that gets rid of this redundancy.

Given a set of edges defining a chain graph without flags, we would like to find by listing all pairs for such that for all . Since our theorem depends on computing the essential graph , a natural question concerns the complexity of this computation. In Section 5 we show how can be computed efficiently in the case of DAGs, using an algorithm that does not require computing the essential graph .

1.2 Existence and robustness of equivariant estimators

The description of the group can be used to analyse inference for chain graph models. Let denote a random sample of size from the model . An estimator of the covariance matrix of is any map . In this paper we are interested in equivariant estimators, that is, estimators satisfying

where the action of on is

An important example of an equivariant estimator is the maximum likelihood estimator. A natural theoretical question is how large the sample size needs to be so that an equivariant estimator exists with probability one (see [5]). Define .

Our next result is the formula for the maximal invariant (see [12], [5]). It uses the equivalence relation on defined by if and only if . We write for the equivalence class of and for the set of all equivalence classes.

In [5] we also used the structure of the group to provide non-trivial bounds on the finite sample breakdown point for all equivariant estimators of the covariance matrix for undirected Gaussian graphical models. These results extend to chain graphs without flags.

Unlike the proof of Theorem ?, the proofs of Theorem ?, Theorem ? and Proposition ? are similar to the undirected case, because they depend on only through the induced poset defined by the ordering relation , which determines the zero pattern of the group . The proofs of these three results are therefore omitted; see [5] for details.

Organization of the paper

In Section 2.1 we provide some basic graph-theoretical definitions. The theory of Markov equivalence of chain graphs will be discussed in Section 2.2. In Section 3 we provide new results that give necessary and sufficient vanishing conditions for subdeterminants of the concentration matrix . In Section 4 we analyze the structure of the group in order to prove Theorem ?. In Section 5 we show that in the case of DAG models, structural imsets give us all the required information to identify without constructing the essential graphs. Section 6 contains some simple examples of Theorem ?.

2 Preliminaries

In this section we discuss basic notions of the theory of chain graphs and chain graph models.

2.1 Basics of chain graphs

Let be a hybrid graph, that is, a graph with both directed and undirected edges, but neither loops nor multiple edges. In particular, this excludes the situation in which two nodes are connected by both an undirected and a directed edge. We assume that the set of nodes of is labelled with . A directed edge (arrow) from $i$ to $j$ is denoted by $i\to j$ and an undirected edge between $i$ and $j$ is denoted by $i - j$. We write , and say that $i$ and $j$ are linked, whenever either $i\to j$, or $j\to i$, or $i - j$.

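An undirected path between $i$ and $j$ in a hybrid graph is any sequence $k_1,\dots,k_n$ of nodes such that $k_1=i$, $k_n=j$ and $k_r - k_{r+1}$ for every $r=1,\dots,n-1$. A semi-directed path between $i$ and $j$ is any sequence $k_1,\dots,k_n$ of nodes such that $k_1=i$, $k_n=j$, either $k_r\to k_{r+1}$ or $k_r - k_{r+1}$ for every $r=1,\dots,n-1$, and $k_r\to k_{r+1}$ for at least one $r$. A directed path between $i$ and $j$ in a hybrid graph is any sequence $k_1,\dots,k_n$ of nodes such that $k_1=i$, $k_n=j$ and $k_r\to k_{r+1}$ for every $r=1,\dots,n-1$. A semi-directed cycle in a hybrid graph is a sequence $k_1,\dots,k_n,k_{n+1}=k_1$ of nodes such that $k_1,\dots,k_n$ are distinct and this sequence forms a semi-directed path. In a similar way we define an undirected cycle and a directed cycle.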

A set of nodes is connected in , if for every there exists an undirected path between and . Maximal connected subsets in with respect to set inclusion are called components in . The class of components of is denoted by . The elements of form a partition of the set of nodes of . For any subset of the set of vertices we define the induced graph on , denoted by , as the graph with set of nodes and for any two we have , or if and only if , or in , respectively.

Define the set of parents of , denoted by , as the set of such that in for some . The set of children is the set of such that in for some ; and the set of neighbors is the set of all such that in for some . In addition we define

If is a connected set in a chain graph , then there are no arrows between elements in , for otherwise there would exist a semi-directed cycle. In particular, the induced graph on is an undirected graph and is disjoint from for any . In addition, for every the induced subgraph of a chain graph is a chain graph itself. A clique in an undirected graph is a subset of nodes such that any two nodes are linked. We say that a clique is maximal if it is maximal with respect to inclusion.

Undirected graphs and DAGs are chain graphs without flags. We often use the following basic fact.

The notion of meta-arrow is important in the considerations of equivalence classes of chain graphs, which we discuss in the next section.

2.2 Equivalence classes of chain graphs

A chain graph model is given by all concentration matrices of the form (Equation 1). In Example ? we saw that two different chain graphs may give the same Gaussian model or, equivalently, the same set of conditional independence statements. If two NF-CGs and define the same chain graph model, we say that they are graph equivalent (or simply equivalent). For example, the three DAGs in Figure 3 are equivalent.

The equivalence class of in the set of NF-CGs is denoted by :

Equivalence of CGs and DAGs has been discussed in many papers, see for example [1]. We briefly list the most relevant results.

The original statement of this result, given by Frydenberg in [9], is more general and applies to any chain graph in the LWF definition of chain graph models.
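For DAGs, the usual form of this equivalence criterion is the Verma–Pearl characterization: two DAGs are equivalent if and only if they have the same skeleton and the same immoralities $i\to k\leftarrow j$ with $i$ and $j$ not linked. The minimal Python sketch below assumes exactly that DAG criterion (it does not handle undirected edges), and its function names are hypothetical.

```python
from itertools import combinations


def skeleton(arrows):
    """Set of unordered adjacent pairs."""
    return {frozenset(a) for a in arrows}


def immoralities(nodes, arrows):
    """Triples (i, k, j) with i -> k <- j, i < j and i, j not adjacent."""
    skel = skeleton(arrows)
    parents = {v: {a for a, b in arrows if b == v} for v in nodes}
    result = set()
    for k in nodes:
        for i, j in combinations(sorted(parents[k]), 2):
            if frozenset((i, j)) not in skel:
                result.add((i, k, j))
    return result


def markov_equivalent_dags(nodes, arrows1, arrows2):
    """Verma-Pearl criterion: same skeleton and same immoralities."""
    return (skeleton(arrows1) == skeleton(arrows2)
            and immoralities(nodes, arrows1) == immoralities(nodes, arrows2))


nodes = [1, 2, 3]
chain, fork, collider = [(1, 2), (2, 3)], [(2, 1), (2, 3)], [(1, 2), (3, 2)]
print(markov_equivalent_dags(nodes, chain, fork))      # True
print(markov_equivalent_dags(nodes, chain, collider))  # False
```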

As remarked in [14], considering meta-arrows helps to understand equivalence classes of chain graphs. Suppose that we want to obtain one chain graph from another with the same skeleton by changing some of the arrows to or . Changing only a subset of arrows in a meta-arrow is not permitted, as it would introduce semi-directed cycles. Hence the only permitted operations on arrows of , if we work in the class of CGs, are either changing the directions of all the elements of or changing all arrows of into undirected edges. The following basic operation on a chain graph was defined in [14].

See for example the proof of Lemma 22 in [15].

For two distinct CGs , with the same skeleton we write if, whenever in , then either or in , and whenever in , then in . We write if and .

By the following proposition there is always a unique NF-CG representing with the largest number of undirected edges.

By definition has the same skeleton as , and an edge is essential if and only if it occurs as an arrow with the same orientation in every ; all other edges are undirected. For example, the essential graph for any of the graphs in Figure 3 is the undirected graph , whereas the essential graph of is itself. By Theorem ?, every arrow that participates in an immorality in is essential, but may contain other essential arrows. For example, in the DAG in Figure 4 all arrows are essential but not all of them form immoralities.

The following result has been independently observed in [14].

3 Subdeterminants of concentration matrices

Let be any chain graph on . We want to determine which sub-determinants of the concentration matrix of the corresponding model are identically zero on the model. This provides simple necessary conditions for a concentration matrix to lie in . We will use the following combinatorial notions.

For the graph there is no self-avoiding cup system from to but there is such a system between and .

Let denote the -submatrix of . By expanding the entries, we find that

where the sum is over all cup systems from to . In this expression cancellation can occur because of the signs (not because of the signs in the , which we might as well have taken as new variables). The following proposition captures exactly which terms cancel. For more details on the arguments, we refer to [17].

To see that the sum in can be restricted to self-avoiding cup systems , we proceed as in the Lindström-Gessel-Viennot lemma [10] and give a sign-reversing involution on the set of non-self-avoiding cup systems, as follows. Order any cup system from to as where starts in . If is not self-avoiding, let be minimal such that the entries are not all distinct, and let be a lexicographically minimal pair such that . Then is the cup system obtained from by replacing and by their swaps at position . For instance, if , then and ; and similarly for . Now and is indeed an involution. This proves the expression in the proposition. The second statement is more subtle, but it follows by applying [7] to the DAG obtained from by reversing all arrows and replacing all undirected edges by a pair of arrows, where is a new vertex. Indeed, self-avoiding cup systems in correspond to special types of trek systems without sided intersection in that new graph.
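The sign-reversing involution above follows the Lindström–Gessel–Viennot argument. As a sanity check of the classical lemma (not of the cup-system version used here), the minimal Python sketch below compares, on a hypothetical toy DAG with unit edge weights, the determinant of the path-count matrix with the signed number of vertex-disjoint path systems. Both quantities are computed by brute force.

```python
from itertools import permutations, product
from math import prod


def all_paths(succ, start, end):
    """All directed paths from start to end in a DAG, as tuples of vertices."""
    if start == end:
        return [(start,)]
    return [(start,) + rest
            for nxt in succ.get(start, ())
            for rest in all_paths(succ, nxt, end)]


def perm_sign(perm):
    """Sign of a permutation of 0..n-1 given as a tuple of images."""
    sign, seen = 1, set()
    for i in range(len(perm)):
        if i in seen:
            continue
        length, j = 0, i
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        if length % 2 == 0:  # a cycle of even length is an odd permutation
            sign = -sign
    return sign


def path_determinant(succ, sources, sinks):
    """det M, where M[r][c] is the number of directed paths sources[r] -> sinks[c]."""
    n = len(sources)
    M = [[len(all_paths(succ, a, b)) for b in sinks] for a in sources]
    return sum(perm_sign(p) * prod(M[r][p[r]] for r in range(n))
               for p in permutations(range(n)))


def signed_disjoint_systems(succ, sources, sinks):
    """Signed count of path systems sources[r] -> sinks[p[r]] sharing no vertex."""
    n, total = len(sources), 0
    for p in permutations(range(n)):
        choices = [all_paths(succ, sources[r], sinks[p[r]]) for r in range(n)]
        for system in product(*choices):
            vertices = [v for path in system for v in path]
            if len(vertices) == len(set(vertices)):
                total += perm_sign(p)
    return total


# Toy DAG: two sources, two sinks and a shared middle vertex x.
succ = {"a1": ["b1", "x"], "a2": ["b2", "x"], "x": ["b1", "b2"]}
print(path_determinant(succ, ["a1", "a2"], ["b1", "b2"]),
      signed_disjoint_systems(succ, ["a1", "a2"], ["b1", "b2"]))  # 3 3
```

All non-disjoint systems cancel in the determinant, exactly as the involution predicts: here the only surviving terms are the three systems that do not share the vertex `x`.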

Note that the set of covariance matrices in the model is determined by which subdeterminants vanish identically on it (indeed, the conditional independence statements already suffice for this, and they are expressed by vanishing determinants; see for example [6]), but we do not know whether the same holds for the set of concentration matrices. Therefore, Proposition ? may well have other statistical applications; in what follows, however, we will mostly use the following direct consequence.

In the next section we begin our analysis of the group , defined in (Equation 2), with a study of its connected component of the identity.

4 The group G

4.1 The connected component of the identity

Denote by the matrix in with all entries zero apart from the -th element, which is . By we denote the normal subgroup of which forms the connected component of the identity matrix. The subgroup of all invertible diagonal matrices is contained in the group , because rescaling the coordinates of the vector does not affect conditional independencies. By [5], to compute it suffices to check for which the one-parameter groups , , lie in ; or, equivalently, for which , where is the Lie algebra of .
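In the undirected case this criterion can be probed numerically. The minimal Python sketch below is a heuristic check, not a proof, and it rests on several assumptions. It assumes the action $K\mapsto gKg^T$ on concentration matrices; with the opposite convention the roles of $i$ and $j$ swap. It samples random matrices in the model, applies $g=I+tE_{ij}$ for a generic $t$, and tests whether the zero pattern survives. Positive definiteness is automatic, because congruence by an invertible matrix preserves it. Under this convention, a direct computation shows that the pattern is preserved exactly when the closed neighbourhood of $j$ is contained in that of $i$, and the sketch uses this condition as its combinatorial comparison. All function names are hypothetical.

```python
import itertools

import numpy as np


def random_concentration(adj, rng):
    """Random positive definite K with K[a, b] = 0 whenever a != b are not adjacent."""
    m = len(adj)
    K = np.zeros((m, m))
    for a, b in itertools.combinations(range(m), 2):
        if adj[a][b]:
            K[a, b] = K[b, a] = rng.uniform(-1.0, 1.0)
    # A strictly diagonally dominant matrix with positive diagonal is positive definite.
    np.fill_diagonal(K, np.abs(K).sum(axis=1) + 1.0)
    return K


def preserves_model(adj, i, j, trials=5, seed=0):
    """Heuristic: does K -> g K g^T with g = I + t E_ij keep the zero pattern?"""
    rng = np.random.default_rng(seed)
    m = len(adj)
    for _ in range(trials):
        K = random_concentration(adj, rng)
        g = np.eye(m)
        g[i, j] = rng.uniform(0.5, 2.0)  # a generic value of t
        K_new = g @ K @ g.T
        for a, b in itertools.combinations(range(m), 2):
            if not adj[a][b] and abs(K_new[a, b]) > 1e-9:
                return False
    return True


def closed_neighbourhood(adj, v):
    return {v} | {w for w in range(len(adj)) if adj[v][w]}


path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # the undirected path 0 - 1 - 2
for i, j in itertools.permutations(range(3), 2):
    numeric = preserves_model(path, i, j)
    combinatorial = closed_neighbourhood(path, j) <= closed_neighbourhood(path, i)
    print(i, j, numeric, combinatorial)
```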

Before we provide the main result of this section we recall [5].

If is a NF-CG such that is an undirected graph then Proposition ? can be used to characterize for by passing to the essential graph. However, it is not immediately clear how this result extends to all chain graphs without flags. We first note that one direction of the above result holds in general.

If then the statement is clear so suppose that . We have only if either or in . Suppose first that . We have

where ; if ; if ; and . The fact that lies in follows from and hence for every if then .

If in then and by Lemma ?. By Proposition ? applied to the undirected part of we can write for some . Therefore

where we now show that there exists such that

Indeed,

where the last term must vanish because . Hence is obtained from by adding a multiple of the -th column to the -th column and by adding a multiple of the -th row to the -th row. The fact that lies in follows from the fact that and , that is, the -th column has the same support as the -th column and the support of the -th row is contained in the support of the -th row.

The converse of the lemma does not hold for general NF-CG . Consider for instance . By Example ?, the element lies in but . Nevertheless, the converse of the lemma above does hold when is essential; this is the main result of this section.

The proof is deferred to the Appendix.

As we noted at the beginning of this section, the set of all already gives complete information on the group . Hence Theorem ? gives the description of in (Equation 3).

4.2 The component group

Note that given in Theorem ? is in general not the whole group . For example, both for the model and for any of the equivalent DAGs in Figure 3, the permutation matrix

lies in but not in . The following result shows that permutation matrices form the basis for understanding the remaining part of the group . For the proof see [5].

An automorphism of a hybrid graph is any bijection of its nodes such that for every we have if and only if and if and only if .
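For very small graphs, automorphisms in this sense can simply be enumerated. The minimal Python sketch below brute-forces all $m!$ bijections and is meant only as an illustration; the function name and the edge-list encoding are hypothetical. Applied to the essential graph, it lists the permutations discussed next.

```python
from itertools import permutations


def hybrid_automorphisms(nodes, directed, undirected):
    """All bijections sigma of the nodes with i -> j iff sigma(i) -> sigma(j)
    and i - j iff sigma(i) - sigma(j), found by brute force over m! candidates."""
    arrows = set(directed)
    edges = {frozenset(e) for e in undirected}
    autos = []
    for image in permutations(nodes):
        sigma = dict(zip(nodes, image))
        # sigma is a bijection, so mapping each edge set onto itself is
        # equivalent to the two "if and only if" conditions.
        if ({(sigma[a], sigma[b]) for a, b in arrows} == arrows
                and {frozenset(sigma[v] for v in e) for e in edges} == edges):
            autos.append(sigma)
    return autos


# The collider 1 -> 2 <- 3 has the identity and the swap of 1 and 3.
print(len(hybrid_automorphisms([1, 2, 3], [(1, 2), (3, 2)], [])))  # 2
```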

The model is uniquely defined by the set of conditional independence statements (see for example [11]). Given a set of such statements that come from a chain graph the equivalence class is determined uniquely. The essential graph is the unique representative of with the largest number of undirected edges. Since any permutation applied to gives a NF-CG with the same number of undirected and directed edges (it simply relabels the nodes), lies in the model if and only if is an automorphism of .

By Lemma ? we can conclude that is generated by and the automorphism group of , which proves Theorem ?.

Define an equivalence relation on by whenever . For example if then and hence . The equivalence class of is denoted by .

As explained in the introduction, the expression is not minimal in the sense that and may intersect. To get rid of that intersection, we define to be the graph with vertex set and () in if and only if () in . We first show that is well defined.

If then and are necessarily linked. Since and we conclude that in fact in . By Lemma ?, since , we also have . This shows that if and only if , and , which shows that the definition of the arrows and edges in is independent of the representative and .

Define and view as a coloring of the vertices of by natural numbers. Let denote the group of automorphisms of preserving the coloring. There is a lifting defined as follows: the element is mapped to the unique bijection that maps each equivalence class to the equivalence class by sending the -th smallest element of (in the natural linear order on ) to the -th smallest element of , for .
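The lifting is completely explicit. In the minimal Python sketch below, the equivalence classes are a list of sets, and `class_perm[c]` is the index of the image of class `c`. The colouring condition reduces to the requirement that class sizes be preserved. The names are hypothetical, and the code does not check that `class_perm` is an automorphism of the quotient graph.

```python
def lift(classes, class_perm):
    """Lift a permutation of equivalence classes (class c goes to class_perm[c])
    to a permutation of nodes: the r-th smallest element of a class is sent to
    the r-th smallest element of its image class."""
    node_perm = {}
    for c, cls in enumerate(classes):
        target = classes[class_perm[c]]
        if len(cls) != len(target):
            raise ValueError("class_perm must preserve class sizes (the colouring)")
        for src, dst in zip(sorted(cls), sorted(target)):
            node_perm[src] = dst
    return node_perm


classes = [{1, 4}, {2}, {3, 5}]
print(lift(classes, [2, 1, 0]))  # {1: 3, 4: 5, 2: 2, 3: 1, 5: 4}
```

Because the order-preserving matching is applied class by class, lifting a composition of class permutations gives the composition of the lifts, so the lifting is a group homomorphism.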

It is a standard result from Lie group theory that the connected component of the identity is a normal subgroup of . Hence, to show that we need to show that and . The first part follows by Proposition ? and Lemma ?. To show that , note that the transposition of and lies in precisely when and are equivalent, and hence precisely when it does not lie in .

Computing the essential graph is not always a simple task. In Section 5 we show how to identify the group without finding in the case when is a DAG. In Section 6 we illustrate Theorem ? with some basic examples.

5 Efficient computations for DAG models

In this section we present some efficient techniques for computing the group in the case when is a DAG. The following characterization of essential graphs of DAGs will be useful.

For any DAG on the set of nodes , the standard imset for is an integer-valued function , where is the set of all subsets of , defined by

where satisfies if and is zero otherwise. For example, it is easy to verify that all DAGs in Figure 3 give rise to the imset represented by Figure 7.
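The defining display did not survive extraction. The minimal Python sketch below assumes that it is Studený's standard formula $u_G=\delta_N-\delta_\emptyset+\sum_{i\in N}\big(\delta_{\mathrm{pa}(i)}-\delta_{\{i\}\cup\mathrm{pa}(i)}\big)$ for a DAG on the node set $N$, which is consistent with the description of $\delta$ above. The imset is stored as a dictionary from subsets to nonzero integers, and the function name is hypothetical.

```python
from collections import Counter


def standard_imset(nodes, arrows):
    """delta_N - delta_emptyset + sum_i (delta_pa(i) - delta_{i} u pa(i)),
    as a dict from frozensets to their nonzero integer values."""
    u = Counter()
    u[frozenset(nodes)] += 1
    u[frozenset()] -= 1
    for i in nodes:
        pa = frozenset(a for a, b in arrows if b == i)
        u[pa] += 1
        u[pa | {i}] -= 1
    return {A: c for A, c in u.items() if c != 0}


chain = standard_imset([1, 2, 3], [(1, 2), (2, 3)])
fork = standard_imset([1, 2, 3], [(2, 1), (2, 3)])
collider = standard_imset([1, 2, 3], [(1, 2), (3, 2)])
print(chain == fork, chain == collider)  # True False
```

Equivalent DAGs produce the same imset, while the collider does not. In this small example the empty set lies in the support for the collider but not for the chain.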

The support of for a DAG has been described in [20] directly in terms of the essential graph. To provide this result we introduce some useful notions related to chain graphs.

By [15] every chain graph has a unique maximal idle set of nodes (which may be empty), which we denote by . The complement of the largest idle set is called the core of and denoted . Directly from the definition it follows that is a union of connected components of . Therefore, the core is also a union of connected components. The class of core-components, that is, components in contained in is denoted by .

Because there is a directed arrow from any node outside to any node in , every component of lies either inside or outside of . Since all nodes in are linked, there is a meta-arrow between any two distinct components of and each component is a clique. Without loss of generality pick such that is the only child-component of . First note that forms a clique. Second, the parent-components of are . Indeed, if a component , such that , lies outside of then by definition. If then because and are necessarily linked and has no other children than . Thus, by Definition ?, and can be legally merged, which contradicts the fact that is essential.

Note that is precisely the set of vertices such that , where .

From now on, will always denote a DAG. By Theorem ?, each component induces a decomposable graph . We recall that a decomposable graph is an undirected graph with no induced cycles of length at least four. An alternative definition, which will be useful in this section, is that its maximal cliques can be ordered into a sequence satisfying the running intersection property (see [11]), that is

By [19] the collection of sets for does not depend on the choice of ordering that satisfies (Equation 7). We call these sets separators of the graph. The multiplicity of a separator is then defined as the number of indices such that . This number also does not depend on the choice of an ordering that satisfies (Equation 7).
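As an illustration of these definitions, the Python sketch below computes the separators and their multiplicities by brute force. It assumes the usual form of the running intersection property: for every $i\ge 2$, the set $C_i\cap(C_1\cup\dots\cup C_{i-1})$ is contained in some $C_k$ with $k<i$, and these intersections are the separators. Efficient methods based on maximum cardinality search exist, but the exhaustive search over orderings here is only meant for small examples, and the names are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def maximal_cliques(nodes, edges):
    """Brute-force maximal cliques (fine for small illustrative graphs)."""
    adj = {frozenset(e) for e in edges}

    def is_clique(S):
        return all(frozenset(p) in adj for p in combinations(S, 2))

    cliques = [frozenset(S) for r in range(1, len(nodes) + 1)
               for S in combinations(nodes, r) if is_clique(S)]
    return [C for C in cliques if not any(C < D for D in cliques)]


def rip_ordering(cliques):
    """Find an ordering with the running intersection property, or None."""
    for order in permutations(cliques):
        union, ok = set(order[0]), True
        for i in range(1, len(order)):
            S = order[i] & union
            if not any(S <= order[k] for k in range(i)):
                ok = False
                break
            union |= order[i]
        if ok:
            return list(order)
    return None  # no such ordering exists: the graph is not decomposable


def separators(nodes, edges):
    """Separators C_i ∩ (C_1 ∪ ... ∪ C_{i-1}), i >= 2, with their multiplicities."""
    order = rip_ordering(maximal_cliques(nodes, edges))
    if order is None:
        raise ValueError("graph is not decomposable")
    union, seps = set(order[0]), Counter()
    for C in order[1:]:
        seps[frozenset(C & union)] += 1
        union |= C
    return seps


# A star with centre 0: the separator {0} has multiplicity 2.
print(dict(separators([0, 1, 2, 3], [(0, 1), (0, 2), (0, 3)])))
```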

By we denote the collection of maximal cliques of , by the collection of its separators, and by the multiplicity of in . A set is called a parent set in if it is non-empty and there exists a component with . The multiplicity of is the number of with . The collection of all parent sets in is denoted by . Finally, by we denote the number of initial components of , that is the components such that .

For the following result we refer to [20].

By Lemma 5.2 in [20], unless is a complete graph, the terms in the above formula never cancel each other. In particular the support of is the collection of all sets of the form:

• the core of

• for and

• for and

• for

The empty set may or may not appear in the support of , but this does not play any role in the following arguments.

Lemma ? gives the support of in terms of , see also items (i)-(iv) above. For the forward direction first note that if then , which follows immediately from . This implies that if lies in the core then also lies in the core. Suppose now that for some and . If then we have just shown that . If then because . The arguments for the subsets of type (iii) and (iv) above are the same.

For the opposite direction first note that if implies for all in the support of then taking where is the connected component of and we find that either or and hence . Let . Suppose first that . If then . To see that take any such that , which implies that . Similarly, if then , which follows by considering a parent set of the component containing . Consequently . The case is similar.

Proposition ? gives an efficient procedure for checking when without constructing the essential graph , which gives the description of . We present this procedure in the pseudocode below.

In addition, note that the size of the support of is . The fact that it is is obvious from (Equation 6). But also any initial vertex in will have and hence and will cancel each other. It follows that the number of operations needed to construct is quadratic in . In fact, all loops are linear in apart from the penultimate one.

The imset in fact gives the complete description of the group .

This follows from the fact that is in one-to-one correspondence with the DAG model of .

Lemma ? does not provide an efficient algorithm to find the automorphism group of , which in general is a hard problem.

6 Special graphs and small examples

Some DAG models are equivalent to undirected graphical models, in which case we refer to [5] for examples. To obtain a new set of examples we first consider two simple DAGs: the sprinkle graph in Figure 9 and the Verma graph in Figure 10.

The essential graph of the sprinkle graph is also given in Figure 9. There are no non-trivial equivalence classes and therefore . The only non-trivial relation between neighbouring sets is , so the matrices in have only one non-zero off-diagonal element, in position . The automorphism group of has only one non-trivial element, which permutes and . Hence the matrices in are of one of the following two forms:

The essential graph of the Verma graph is equal to the Verma graph itself. All equivalence classes are singletons. Moreover, no two distinct vertices satisfy , and hence is equal to the group of all invertible diagonal matrices. Since has no non-trivial automorphisms, the whole group in fact consists solely of diagonal matrices.

For a slightly more general example consider the DAGs defining factor models as given in Figure 11. We have for every and there are no other containment relations. The only non-zero off-diagonal elements of matrices in are in position for all . For example if and then they are of the form

Any automorphism of is the product of a permutation of and a permutation of . Consequently, all matrices in look like the matrices in , with the two diagonal blocks replaced by arbitrary monomial matrices.

A Proof of Theorem

To prove this theorem, we will use the following two lemmas, in which is the concentration matrix of the model.

Recall that the one-parameter group acts on via

In words, this matrix is obtained from by adding a multiple of the -th row to the -th row and adding a multiple of the -th column to the -th column. Now consider the effect of this operation on . Since either or else both , adding the -th column to the -th has either no effect on or else is just an elementary column operation on . This means that it does not affect the rank of . On the other hand, since is non-zero, the rows of are linearly independent, and since is zero, the -th row lies in the span of the rows of . This is not true for the -th row , hence the -submatrix of has full rank for generic . This means that does not preserve the model, hence it does not lie in the group .

Since , only if the determinant of the -submatrix of is zero. To show that it is not zero, it suffices to show that the linear term of does not vanish. To study this linear term, we instead study the linear term of after further specializing to . Because has rank , the determinant of the -submatrix of is a polynomial of degree two in . To find the coefficient of its linear term we can set . Matrix is obtained by adding a multiple of the -th row to the -th row. Suppose that the elements of are and the elements of are . Let be such that . The determinant of its -submatrix can be computed by expanding along the -th row (which corresponds to the -th row of ):

Similar computations for the coefficient of give

Hence the coefficient of in the determinant of is . If this sum does not identically vanish on the model then .

Lemma ? gives one direction of the proof of Theorem ?; we need only prove that if and , then . First of all, if there is no cup from to , then is identically zero, while is not. Hence (this is the special case of Lemma ? with and ). Thus in what follows we may assume that there do exist cups from to . We treat the various types of cups from to separately; in each case, we assume that cups of the previous types do not exist. Before we get going, we remark that, since there are no flags, for any cup with also is a cup. The following lemma will also be useful.

By Corollary ? it is enough to show that there is no self-avoiding cup system from to . It is clear that the second element of every cup starting in needs to lie in , simply because it is either equal to or it is equal to such that in . Also, every cup from needs to have its second entry in . Indeed, let be such a cup. The node is either equal to or it is a child of , in which case it lies in . So suppose that , and we show that this leads to a contradiction. If then is either or a neighbour of . If then must be a parent of , which cannot be a vertex of (because otherwise there is a semi-directed cycle in ) and it cannot be because there is no arrow (by assumption). If then must be a parent of and, by the no-flag assumption, also a parent of . This situation is also impossible because cannot lie in . Hence, by the pigeonhole principle, in any cup system from to , two of the elements after one step coincide, and this proves the claim.

In what follows we assume that is essential.

I. Vertex i lies in nH(j)∪cH(j).

In that case there must exist

Let denote the set of all children of together with their descendants. We have and thus and have the same cardinalities. By Lemma ? with , we have . On the other hand, there does exist a self-avoiding cup system from to that links directly to without crossing and each to itself via and hence by Corollary ?. Now by Lemma ?.

II. There is no arrow i→j.

In that case let be the set of all children of together with their descendants. Set and . By Lemma ? . But clearly .

Mid-proof break.

We pause a moment to point out that we have used that has no flags, but not yet that it is essential. This will be exploited in the following arguments. Indeed, in the remaining cases, there must be an arrow . This arrow must be essential, hence either the parents of in the undirected component of do not form a clique, or else one of has a parent outside that is not a parent of the other. We deal with these cases as follows.

III. There is an arrow k→j with k in the component of i at distance at least 2.

In that case let be the set of all children of together with their descendants. Set and . By Lemma ? . But, as in the first case, because there is a self-avoiding cup system from to given by for and . Again, we conclude that .

IV. There is an induced subgraph like in Figure .

Let be the set of all children of together with their descendants. Set and and note that both and contain . We again have by Lemma ?. However, both and are nonzero. Moreover, the sum of these two determinants is also nonzero, because has a monomial that does not appear in : consider the cup system from to given by , and for all . By Proposition ?, this system corresponds to a monomial in . On the other hand, this monomial cannot appear in because it contains only one element of , namely , and only one off-diagonal element of , namely . This means that it must correspond to a cup system between and that contains only one undirected edge and one arrow . However, any cup from to must contain either an arrow for some or an undirected edge . By Lemma ? we conclude that .

V. There is an arrow k→j with k∉T and no arrow between k and i.

So we have the induced subgraph . Let be the set of all children of together with their descendants. Set and . By Lemma ? . On the other hand, , because of the self-avoiding cup system from to consisting of and and for all . Again, we may apply Lemma ?, this time with both in