
# Stable mixed graphs

Received October 2011; revised May 2012.
###### Abstract

In this paper, we study classes of graphs with three types of edges that capture the modified independence structure of a directed acyclic graph (DAG) after marginalisation over unobserved variables and conditioning on selection variables, using the m-separation criterion. These include MC, summary, and ancestral graphs. As a modification of MC graphs, we define the class of ribbonless graphs (RGs), which permits the use of the m-separation criterion. RGs contain summary and ancestral graphs as subclasses, and each RG can be generated by a DAG after marginalisation and conditioning. We derive simple algorithms to generate RGs from given DAGs or RGs, and, by further extension of the RG-generating algorithm, to generate summary and ancestral graphs in a simple way. This enables us to develop a parallel theory on these three classes and to study the relationships between them as well as the use of each class.

Bernoulli 19(5B), 2013, 2330–2358. DOI: 10.3150/12-BEJ454.


Keywords: ancestral graph; directed acyclic graph; independence model; m-separation criterion; marginalisation and conditioning; MC graph; summary graph.

## 1 Introduction

Introduction and motivation. In graphical Markov models, graphs are used to represent conditional independence statements among sets of random variables. Nodes of the graph correspond to random variables, and edges typically capture dependencies. Different classes of graphs, with different interpretations of independencies, have been defined and studied in the literature.

One of the most important classes of graphs in graphical models is the class of directed acyclic graphs (DAGs) kii84 (), lau96 (). Their corresponding Markov models, often known under the name of Bayesian networks pea88 (), have direct applications to a wide range of areas including econometrics, social sciences, and artificial intelligence. When, however, some variables are unobserved, also called latent or hidden, one can in general no longer capture the implied independence model among the observed variables by a DAG. In this sense, DAG models are not stable under marginalisation. A similar problem occurs because DAG models are not stable under conditioning wer94 (), pea94 (), cox96 ().

This makes it necessary to identify and study a class of graphs that includes DAGs and is stable under marginalisation and conditioning in the sense that it is able to express the induced independence model after marginalisation and conditioning through an object of the same class. The methods that have been used to solve this problem employ three different types of edges instead of a single type.

Three classes of graphs have previously been suggested for this purpose in the literature. We specifically call these stable mixed graphs (under marginalisation and conditioning); they include MC graphs (MCGs) kos02 (), summary graphs (SGs) wer94 (), wer08 (), and ancestral graphs (AGs) ric02 ().

MCGs do not use the same interpretation of independencies, the m-separation criterion, as the other types of stable mixed graphs. In this paper, we use methods similar to those in kos02 () to derive a modification of the class of MCGs that uses m-separation, which we call ribbonless graphs (RGs). The class of RGs is exactly the class of graphs with three types of edges that is generated after marginalisation over and conditioning on node sets of a DAG. More importantly, we extend the RG-generating algorithm to generate summary and ancestral graphs in a theoretically neat way. These algorithms run in polynomial time, although we shall not go through their computational complexity in this paper. Defining these algorithms leads to a parallel theory for the different classes and to a study of the similarities, differences, and relationships among them.

Structure of the paper. In the next section, we define some basic concepts of graph theory and independence models needed in this paper.

In Section 3, we define the class of RGs, give some basic graph-theoretical definitions for these, and define the m-separation criterion for interpreting the independence structure on them.

In Section 4, we formally define marginalisation and conditioning for independence models in such a way that it conforms with marginalisation and conditioning for probability distributions. We also formally define stable classes of graphs.

Each of the next three sections of this paper deals with one type of stable mixed graph. We discuss RGs in Section 5, SGs in Section 6, and AGs in Section 7. In each section, we introduce a straightforward algorithm to generate the stable mixed graph from DAGs or from graphs of the same type. For each type of stable mixed graph, we prove that the graphs and algorithms are well-defined in the sense that, instead of marginalising over or conditioning on a set of nodes in one pass, one can split the marginalisation or conditioning set into two subsets, marginalise over or condition on the first subset, then marginalise over or condition on the second, and obtain the same graph. We also prove that the generated graphs induce the modified independence model after marginalisation and conditioning, meaning that the generated classes are stable under marginalisation and conditioning.

In Section 8, we scrutinise the relationships between the three types of stable mixed graphs. In Section 9, we provide a discussion on the use of the different classes of stable mixed graphs.

In the Appendix, we provide the proofs of the lemmas, propositions, and theorems given in the previous sections.

## 2 Basic definitions and concepts

Independence models and graphs. An independence model J over a set V is a set of triples ⟨A, B | D⟩ (called independence statements), where A, B, and D are disjoint subsets of V; D may be empty, and ⟨∅, B | D⟩ and ⟨A, ∅ | D⟩ are always included in J. The independence statement ⟨A, B | D⟩ is interpreted as “A is independent of B given D”. Notice that independence models contain probabilistic independence models as a special case. For further discussion on independence models, see stu05 ().

A graph is a triple consisting of a node set or vertex set V, an edge set E, and a relation that associates with each edge two nodes (not necessarily distinct), called its endpoints. When nodes i and j are the endpoints of an edge, they are adjacent, and we write i ∼ j. We say the edge is between its two endpoints. We usually refer to a graph as an ordered pair G = (V, E). Graphs G1 = (V1, E1) and G2 = (V2, E2) are called equal if (V1, E1) = (V2, E2); in this case we write G1 = G2.

Notice that the graphs that we use in this paper (and in general in the context of graphical models) are so-called labelled graphs, that is, every node is considered a different object. Hence, for example, the graph consisting of the single edge i ∼ j is not equal to the graph consisting of the single edge j ∼ k.

We use the notation J(G) for an independence model defined over the node set of a graph G. Among the independence models over the node set of G, those that are of interest to us conform with G, meaning that i ∼ j in G implies ⟨i, j | C⟩ ∉ J(G) for any C. Henceforth, we assume that independence models conform with the graph, unless otherwise stated. Notice also that henceforth we use the notation ⟨i, B | C⟩ instead of ⟨{i}, B | C⟩ for a subset consisting of a single element in an independence statement.

Basic graph theoretical definitions. Here we introduce some basic graph theoretical definitions. A loop is an edge with the same endpoints. Multiple edges are edges with the same pair of endpoints. A simple graph has neither loops nor multiple edges.

If a graph assigns an ordered pair of nodes to each edge, then the graph is a directed graph. We say that the edge is from the first node of the ordered pair to the second. We use an arrow, i → j, to draw an edge from i to j in a directed graph. We also call node i a parent of j and node j a child of i, and we use the notation pa(j) for the set of all parents of j in the graph.

A walk is a list ⟨v0, e1, v1, …, ek, vk⟩ of nodes and edges such that, for 1 ≤ i ≤ k, the edge ei has endpoints v(i−1) and vi. A path is a walk with no repeated node or edge. A cycle is a walk with no repeated node or edge except v0 = vk. If the graph is simple, then a path or a cycle is determined uniquely by an ordered sequence of nodes. Throughout this paper, however, we use node sequences for describing paths and cycles even in graphs with multiple edges, but we suppose that the edges of the path are all determined; usually it is apparent from the context or from the type of the path which edge of a multiple edge belongs to the path. We say a path is between the first and the last nodes of the list. We call the first and last nodes endpoints of the path and all other nodes inner nodes.

A path (or a cycle) in a directed graph is direction-preserving if all its arrows point in the same direction along the path (v0 → v1 → ⋯ → vk). A directed graph is acyclic if it has no direction-preserving cycle.

If there is a direction-preserving path from a node i to a node j, then i is an ancestor of j, and j a descendant of i. We use the notation an(j) for the set of all ancestors of j.

## 3 Independence model for ribbonless graphs

Loopless mixed graphs. The graphs discussed in this paper are subclasses of loopless mixed graphs. A mixed graph is a graph containing three types of edges, denoted by arrows (i → j), arcs (two-headed arrows, i ↔ j), and lines (i — j). Mixed graphs may have multiple edges of different types between a pair of nodes but not multiple edges of the same type. We do not distinguish between i ↔ j and j ↔ i, or between i — j and j — i, but we do distinguish between i → j and j → i. Thus there are up to four edges as a multiple edge between any two nodes. A loopless mixed graph (LMG) is a mixed graph that does not contain any loops (a loop may be formed by a line, arrow, or arc).

Some definitions for mixed graphs. For a mixed graph, we keep the terminology introduced before for directed and undirected graphs. We say that i is a neighbour of j if they are the endpoints of a line, and i is a parent of j if there is an arrow from i to j. We also say that i is a spouse of j if they are the endpoints of an arc. We use the notations ne(j), pa(j), and sp(j) for the set of all neighbours, parents, and spouses of j, respectively.

In the cases of i → j or i ↔ j, we say that there is an arrowhead pointing to (at) j. A path is said to be from i to j if its first edge is either a line or an arrow from i, and its last edge is either an arc or an arrow pointing to j.

A V-configuration, or simply a V, is a path with three nodes and two edges. In a mixed graph, the inner node of the three Vs i → c ← j, i ↔ c ← j, and i ↔ c ↔ j is a collider, and the inner node of every other V is a non-collider, in the V or, more generally, on a path on which the V lies. We also call a V with a collider or non-collider inner node a collider or non-collider V, respectively. We may say that a node is a collider or non-collider without mentioning the V or path when this is apparent from the context. Notice that originally, in kii84 () and in most texts, the endpoints of a V are required not to be adjacent, whereas we do not use this restriction.

Two paths (including Vs or single edges) are called endpoint-identical if the presence or absence of arrowheads pointing to the endpoints of the path is the same in both. For example, the paths i → j, i → k → j, and i — k → j are all endpoint-identical, as each has an arrowhead pointing to j but no arrowhead pointing to i.

Ribbonless graphs and their subclasses. The largest subclass of LMGs studied in this paper is the class of ribbonless graphs.

A ribbon is a collider V ⟨i, c, j⟩ such that

1. there is no endpoint-identical edge between i and j, that is, there is no arc between i and j in the case of i ↔ c ↔ j; there is no line between i and j in the case of i → c ← j; and there is no arrow from j to i in the case of i ↔ c ← j;

2. c or a descendant of c is an endpoint of a line or lies on a direction-preserving cycle.

A ribbonless graph (RG) is an LMG that does not contain ribbons as induced subgraphs.

Figure 1 illustrates ribbons. Figure 2(a) illustrates a graph containing a ribbon. Figure 2(b) illustrates a ribbonless graph; notice that the collider V in it is not a ribbon, since there is a line between its endpoints.

The three classes of undirected graphs (UGs) (used for concentration graph models), bidirected graphs (BGs) (used for covariance graph models), and DAGs are subclasses of RGs. SGs and AGs, which are studied in this paper, are also subclasses of RGs. We use corresponding notations for the sets of all RGs, SGs, AGs, UGs, BGs, and DAGs. The common feature of all these classes is that they all entail independence models using the same separation criterion, called m-separation, which will be defined shortly.

The m-separation criterion for RGs. The following definition was given in ric02 ().

Let C be a subset of the node set of an RG. A path is m-connecting given C if all its collider nodes are in C ∪ an(C) and none of its non-collider nodes is in C. For two other disjoint subsets A and B of the node set, we may call such a path between A and B an m-connecting path given C between A and B. We say A ⊥m B | C if there is no m-connecting path between A and B given C.

Notice that the m-separation criterion induces an independence model Jm(G) on a graph G, defined by ⟨A, B | C⟩ ∈ Jm(G) if and only if A ⊥m B | C in G.
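
As an illustration of the criterion, the following sketch checks m-separation on small mixed graphs by brute-force enumeration of simple paths. The edge encoding ((u, v, '-->') for an arrow from u to v, '<->' for an arc, '---' for a line) and all function names are our own illustrative choices, not notation from the paper.

```python
# Brute-force m-separation on a small mixed graph (illustrative sketch).

def head_at(edge, node):
    """True if the edge has an arrowhead pointing at `node`."""
    u, v, kind = edge
    return kind == '<->' or (kind == '-->' and node == v)

def ancestors(edges, targets):
    """`targets` together with all nodes that reach them via arrows."""
    anc, changed = set(targets), True
    while changed:
        changed = False
        for u, v, kind in edges:
            if kind == '-->' and v in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def has_m_connecting_path(edges, a, b, C):
    """Search simple paths between a and b; a path is m-connecting given C
    if every collider on it is in C ∪ an(C) and no non-collider is in C."""
    anc_C = ancestors(edges, C)
    adj = {}
    for e in edges:
        u, v, _ = e
        adj.setdefault(u, []).append((v, e))
        adj.setdefault(v, []).append((u, e))

    def ok(nodes, path_edges):
        for i in range(1, len(nodes) - 1):
            w = nodes[i]
            collider = head_at(path_edges[i - 1], w) and head_at(path_edges[i], w)
            if (collider and w not in anc_C) or (not collider and w in C):
                return False
        return True

    def dfs(nodes, path_edges):
        if nodes[-1] == b:
            return ok(nodes, path_edges)
        return any(dfs(nodes + [n], path_edges + [e])
                   for n, e in adj.get(nodes[-1], []) if n not in nodes)

    return dfs([a], [])

def m_separated(edges, A, B, C):
    return not any(has_m_connecting_path(edges, a, b, C) for a in A for b in B)
```

On the collider a → c ← b, for instance, a and b are m-separated given the empty set but not given {c}, while on the chain a → c → b the situation is reversed.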

## 4 Marginalisation, conditioning and stability

Marginal and conditional independence models. Consider an independence model J over a set V. For a subset M of V, the independence model after marginalisation over M, denoted by α(J; M, ∅), is the subset of J whose triples do not contain members of M, that is,

 α(J;M,∅) = {⟨A,B|D⟩ ∈ J : (A∪B∪D)∩M = ∅}.

One can observe that α(J; M, ∅) is an independence model over V ∖ M.

For a subset C of V, the independence model after conditioning on C, denoted by α(J; ∅, C), is

 α(J;∅,C) = {⟨A,B|D⟩ : ⟨A,B|D∪C⟩ ∈ J and (A∪B∪D)∩C = ∅}.

One can also observe that α(J; ∅, C) is an independence model over V ∖ C.

Combining these definitions, for disjoint subsets M and C of V, the independence model after marginalisation over M and conditioning on C is

 α(J;M,C) = α(α(J;∅,C);M,∅),

which is an independence model over V ∖ (M ∪ C).

Notice here that α is a function from the set of independence models, together with two disjoint subsets of the underlying set, to the set of independence models. Notice also that the operations of marginalisation and conditioning commute.
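
For finite independence models, stored as sets of triples of frozensets, the two displayed maps can be transcribed directly; the function names are illustrative.

```python
# Independence models as finite sets of triples (A, B, D) of frozensets.

def marginalise(J, M):
    """α(J; M, ∅): keep the triples whose A, B, D avoid M."""
    M = frozenset(M)
    return {(A, B, D) for (A, B, D) in J if not ((A | B | D) & M)}

def condition(J, C):
    """α(J; ∅, C): record each <A,B|D∪C> in J, with A, B disjoint
    from C, as the new triple <A,B|D>."""
    C = frozenset(C)
    return {(A, B, D - C) for (A, B, D) in J
            if C <= D and not ((A | B) & C)}

def alpha(J, M, C):
    """α(J; M, C): condition on C, then marginalise over M."""
    return marginalise(condition(J, C), M)
```

On any such finite model, the two operations indeed commute: marginalising over M and then conditioning on C gives the same result as the reverse order.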

Marginalisation and conditioning in probability conform with marginalisation and conditioning for independence models. Consider a set V and a collection of random variables (Xv), v ∈ V, with a joint density, and associate with it the independence model consisting of its conditional independence statements. It can be shown that if J is the independence model associated with (Xv), v ∈ V, then the independence model associated with the variables indexed by V ∖ (M ∪ C), after marginalising over M and conditioning on C, is α(J; M, C).

Stability under marginalisation and conditioning for RGs and their subclasses. Consider a family of graphs F. If, for every graph G in F and all disjoint subsets M and C of its node set, there is a graph G′ in F such that Jm(G′) = α(Jm(G); ∅, C), then F is stable under conditioning; if there is a graph G′ in F such that Jm(G′) = α(Jm(G); M, ∅), then F is stable under marginalisation. We call F stable (under marginalisation and conditioning) if there is a graph G′ in F such that Jm(G′) = α(Jm(G); M, C).

Notice that if the node set of such a graph G′ is V′, then V′ = V ∖ (M ∪ C).

We shall see that RGs, SGs, AGs, UGs, and BGs are stable. On the other hand, the class of DAGs is not stable: Figure 3 shows a DAG whose induced marginal independence model cannot be represented by a DAG, and a DAG whose induced conditional independence model cannot be represented by a DAG. We leave the details as an exercise to the reader.

Stable mixed graphs. As the class of DAGs is not stable, we look for stable classes of graphs that include the class of DAGs as a subclass. In this paper, we discuss three such types of graphs, namely RGs (as a modification of MCGs), SGs, and AGs, and specifically call these stable mixed graphs. We will see that in these graphs arcs are related to marginalisation and lines are related to conditioning.

For the graph G′ for which Jm(G′) = α(Jm(G); M, C), we use a notation of the form αF(G; M, C), where F indicates the class. For each type of stable mixed graph, we later define this map precisely via specific algorithms. We call such a map a generating function or, more specifically, an F-generating function.

## 5 Ribbonless graphs

MC graphs and ribbonless graphs. MCGs contain only the three desired types of edges. However, they are not loopless and, in addition, a different separation criterion is used on MCGs for inducing the independence model. Nevertheless, from an MCG that can be generated by marginalisation and conditioning over DAGs, a minor modification produces an RG that induces the same independence model. This modification consists of adding edges between pairs of nodes connected by a ribbon, such that the generated edges preserve the arrowheads at the endpoints of the ribbon, and of removing all loops. We shall not go through the details of this modification in this paper, but refer readers to sad12 ().

### 5.1 Generating ribbonless graphs

A local algorithm to generate RGs from RGs. Here we present an algorithm to generate an RG from a given RG and two subsets of its node set that are marginalised over and conditioned on. The algorithm is local in the sense that, after determining the ancestor set of the conditioning set, it looks solely for Vs in the graph and not for longer paths. Later in this section, we show that a graph generated by the algorithm is an RG and that it induces the marginal and conditional independence model of the input graph under m-separation.

Suppose that H is an RG and consider two disjoint subsets M and C of its node set. The possible non-isomorphic Vs in an RG are displayed in Table 1. Notice that this table generates endpoint-identical edges to the given Vs. We now define the following algorithm, derived from wer94 () and kos02 (). See also the appendix of wer08 (). {alg} (Generating an RG, αRG(H; M, C), from a ribbonless graph H):

Start from H.

Generate an endpoint-identical edge between the endpoints of collider Vs with inner node in C ∪ an(C) and non-collider Vs with inner node in M; that is, generate an appropriate edge as in Table 1 between the endpoints of every such V if an edge of the same type does not already exist.

Apply the previous step until no further edge can be generated. Then remove all nodes in M ∪ C.

This method is a generalisation of the method of lau90 (), called moralisation, used as a separation criterion on DAGs. Notice that the order in which the steps of Table 1 are applied in Algorithm 5.1 is irrelevant, since adding an edge does not destroy other Vs in the graph.

Figure 4 illustrates how to apply Algorithm 5.1 step by step to a DAG. We start from step 1 of Table 1 and proceed step by step, returning to step 1 at the end if any applicable steps remain. Since every DAG is an RG, one can also use Algorithm 5.1 to generate an RG from a DAG. Notice that it is not enough to simply apply steps 1, 4, and 10 of Table 1 to a DAG.
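
A brute-force sketch of the algorithm follows, under our reading that edges are generated for collider Vs with inner node in C ∪ an(C) and for non-collider Vs with inner node in M; the edge encoding ((u, v, '-->') an arrow from u to v, '<->' an arc, '---' a line) and the function names are illustrative assumptions, not notation from the paper.

```python
# Sketch of Algorithm 5.1 on an edge set.

def head_at(edge, node):
    """True if the edge has an arrowhead pointing at `node`."""
    u, v, kind = edge
    return kind == '<->' or (kind == '-->' and node == v)

def ancestors(edges, targets):
    """`targets` together with all nodes that reach them via arrows."""
    anc, changed = set(targets), True
    while changed:
        changed = False
        for u, v, kind in edges:
            if kind == '-->' and v in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def endpoint_identical_edge(u, v, head_u, head_v):
    """The edge between u and v matching the V's marks at its endpoints."""
    if head_u and head_v:
        return (u, v, '<->')
    if head_u:
        return (v, u, '-->')   # arrowhead at u only
    if head_v:
        return (u, v, '-->')
    return (u, v, '---')

def present(E, edge):
    u, v, k = edge
    return edge in E or (k != '-->' and (v, u, k) in E)

def generate_rg(edges, M, C):
    """Repeatedly add the endpoint-identical edge for every collider V
    with inner node in C ∪ an(C) and every non-collider V with inner
    node in M; finally delete the nodes in M ∪ C."""
    E, changed = set(edges), True
    while changed:
        changed = False
        anc_C = ancestors(E, C)
        for e1 in list(E):
            for e2 in list(E):
                if e1 == e2:
                    continue
                for w in {e1[0], e1[1]} & {e2[0], e2[1]}:  # inner node
                    u = e1[0] if e1[1] == w else e1[1]
                    v = e2[0] if e2[1] == w else e2[1]
                    if len({u, v, w}) < 3:
                        continue
                    collider = head_at(e1, w) and head_at(e2, w)
                    if (collider and w in anc_C) or (not collider and w in M):
                        new = endpoint_identical_edge(
                            u, v, head_at(e1, u), head_at(e2, v))
                        if not present(E, new):
                            E.add(new)
                            changed = True
    drop = set(M) | set(C)
    return {e for e in E if e[0] not in drop and e[1] not in drop}
```

Marginalising over the inner node of a → m → b yields the arrow a → b, and conditioning on the collider of a → s ← b yields a line between a and b, as expected.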

Global interpretation of the algorithm. The following lemma explains the global characteristics of the process of marginalisation and conditioning.

###### Lemma 1

Let H be a ribbonless graph. There exists an edge between i and j in the ribbonless graph αRG(H; M, C) if and only if there exists an endpoint-identical path between i and j in H whose inner nodes all lie in M ∪ C and which is m-connecting given C.

Basic properties of αRG. We show here that αRG is an RG-generating function.

###### Proposition 1

Graphs generated by Algorithm 5.1 are RGs.

Notice that, for every ribbonless graph H, it holds that αRG(H; ∅, ∅) = H.

Surjectivity of αRG. The following result shows that the class of RGs is exactly the class of graphs generated from DAGs after marginalisation and conditioning.

###### Proposition 2

The map αRG is surjective.

### 5.2 Two necessary properties of RG-generating functions

Here we establish the two important properties that αRG (or indeed every generating function) must have. In short, it must be well-defined, and it must generate a stable class of graphs.

Well-definition of αRG. The following theorem shows that αRG is well-defined. This means that, instead of directly generating an RG, we can split the nodes that we marginalise over and condition on into two parts, first generate the RG related to the first part, and then, from the generated RG, generate the desired RG related to the second part.

###### Theorem 1

For a ribbonless graph H and disjoint subsets M, C, M1, and C1 of its node set,

 αRG(αRG(H;M,C);M1,C1)=αRG(H;M∪M1,C∪C1).

Stability of the graphs generated by αRG. Here we introduce the second important property that αRG must have. This property is the core idea in defining RGs and, in general, stable mixed graphs: the modification applied by the function should generate a graph that induces the marginal and conditional independence model.

###### Theorem 2

For a ribbonless graph H and disjoint subsets A, B, M, C, and C1 of its node set,

 A ⊥m B | C1 in αRG(H;M,C) ⟺ A ⊥m B | C∪C1 in H.
###### Corollary 1

For a ribbonless graph H and disjoint subsets M and C of its node set,

 α(Jm(H);M,C)=Jm(αRG(H;M,C)).
###### Corollary 2

The class of RGs is stable.

The following result has been implicitly discussed in the literature, for example, see cox96 ().

###### Corollary 3

The classes of UGs and BGs are stable.

{pf}

The result follows from the fact that, from UGs and BGs, Algorithm 5.1 generates UGs and BGs, respectively.

{example*}

Figure 5 illustrates a DAG together with two subsets M and C of its node set. Figure 6(a) illustrates the ribbonless graph generated by Algorithm 5.1, together with two subsets M1 and C1 of its node set, and Figure 6(b) illustrates the RG generated by the algorithm from the graph in part (a).

For example, consider the graph in Figure 6(b) and an appropriate choice of nodes i and j and conditioning set C1 therein. One sees that i is not m-separated from j given C1, since there is an m-connecting path between i and j given C1. By Theorem 2, we conclude that i is not m-separated from j given C ∪ C1 in the original graph; the same conclusion is reached by observing an m-connecting path in that graph.

## 6 Summary graphs

Definition of summary graphs. A summary graph (SG) is a loopless mixed graph which contains no arrowhead pointing to a line (that is, no configuration i → j — k or i ↔ j — k) and no direction-preserving cycle as a subgraph. Notice that there are consequently no multiple edges in SGs except those consisting of an arrow and an arc.

Obviously, the class of SGs is a subclass of RGs. Figure 7 illustrates an SG and an RG that is not an SG (for two reasons: there is an arrowhead pointing to a line, and there is a multiple edge consisting of a line and an arrow).

### 6.1 Generating summary graphs

A local algorithm to generate SGs. We now present a local algorithm (local after determining the ancestor set of the conditioning set) to generate an SG from an SG. {alg} (Generating an SG, αSG(H; M, C), from a summary graph H):

Start from H. Label the nodes in an(C).

1. Apply Algorithm 5.1.

2. Remove every edge (arrow or arc) with an arrowhead pointing to a labelled node, and replace it by the edge with that arrowhead removed (a line instead of an arrow, an arrow instead of an arc) if such an edge does not already exist.

Continually apply step 1 until it cannot be applied further before moving to step 2. Figure 8 illustrates how to apply Algorithm 6.1 step by step to a DAG. Notice that, as stated in the description of the algorithm, the order of applying the steps does matter here.
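
Step 2 alone can be sketched as follows, assuming the output of step 1 is given together with the set of labelled nodes recorded at the start; the edge encoding ((u, v, '-->') an arrow from u to v, '<->' an arc, '---' a line) and the function names are illustrative assumptions.

```python
# Step 2 of Algorithm 6.1 in isolation: strip arrowheads at labelled nodes.

def strip_arrowheads(edges, labelled):
    """Repeatedly remove an arrowhead pointing at a labelled node and
    replace the edge by its weakened form (arrow -> line, arc -> arrow),
    unless that edge is already present."""
    def present(E, e):
        u, v, k = e
        return e in E or (k != '-->' and (v, u, k) in E)

    E, changed = set(edges), True
    while changed:
        changed = False
        for e in list(E):
            u, v, k = e
            new = None
            if k == '-->' and v in labelled:
                new = (u, v, '---')      # drop the only arrowhead
            elif k == '<->' and v in labelled:
                new = (v, u, '-->')      # drop head at v, keep head at u
            elif k == '<->' and u in labelled:
                new = (u, v, '-->')      # drop head at u, keep head at v
            if new is not None:
                E.discard(e)
                if not present(E, new):
                    E.add(new)
                changed = True
    return E
```

An arrow into a labelled node becomes a line, and an arc between two labelled nodes is weakened in two passes, first to an arrow and then to a line.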

The map αSG and its basic properties. For SGs, we prove results analogous to those for RGs.

###### Proposition 3

Graphs generated by Algorithm 6.1 are SGs.

The map from RGs to SGs and its properties. Notice that step 1 of Algorithm 6.1 generates an RG before the nodes in M ∪ C are removed. Hence, step 2 of the algorithm generates an SG from an RG together with some extra nodes that are conditioned on. We denote these two steps by a map from RGs to SGs. This shows that, for generating the SG from the RG, only the conditioned nodes are additionally needed.

###### Proposition 4

Let H be a ribbonless graph and M and C be disjoint subsets of its node set. Then the summary graph αSG(H; M, C) is obtained by applying step 2 of Algorithm 6.1 to αRG(H; M, C).

Surjectivity of αSG. The following result shows that every member of the class of SGs can be generated from a DAG after marginalisation and conditioning.

###### Proposition 5

The map αSG is surjective.

### 6.2 Two necessary properties of SG-generating functions

Here, we establish for graphs generated by αSG the two important results that were introduced for graphs generated by αRG.

Well-definition of αSG. This property is analogous to the well-definition of αRG as established in the previous section. For a proof based on matrix representations of graphs and on properties of corresponding matrix operators, see wer08 ().

###### Theorem 3

For a summary graph H and disjoint subsets M, C, M1, and C1 of its node set,

 αSG(αSG(H;M,C);M1,C1)=αSG(H;M∪M1,C∪C1).

Stability of the graphs generated by αSG. We prove that, analogously to RGs, graphs generated by αSG induce the marginal and conditional independence model. This result can also be derived from the discussion in wer08 ().

###### Theorem 4

For a summary graph H and disjoint subsets A, B, M, C, and C1 of its node set,

 A ⊥m B | C1 in αSG(H;M,C) ⟺ A ⊥m B | C∪C1 in H.
###### Corollary 4

For a summary graph H and disjoint subsets M and C of its node set,

 α(Jm(H);M,C)=Jm(αSG(H;M,C)).
###### Corollary 5

The class of SGs is stable.

{example*}

Figure 9(a) illustrates the SG generated from the DAG in Figure 5 by Algorithm 6.1, together with two subsets M1 and C1 of its node set. Figure 9(b) illustrates the SG generated by the algorithm from the SG in part (a).

## 7 Ancestral graphs

An ancestral graph (AG) is a simple mixed graph that has the following properties for every node i:

1. i ∉ an(pa(i) ∪ sp(i));

2. if ne(i) ≠ ∅, then pa(i) ∪ sp(i) = ∅.

This means that there is no arrowhead pointing to a line, there is no direction-preserving cycle, and there is no arc with one endpoint that is an ancestor of the other endpoint in the graph.

AGs are obviously a subclass of SGs, and therefore of RGs. Figure 10 illustrates an SG that is not ancestral (because of an arc with one endpoint that is an ancestor of the other).

### 7.1 Generating ancestral graphs

A local algorithm to generate AGs. In ric02 (), there is a method to generate AGs (in fact, maximal AGs) globally, by looking at so-called inducing paths. Here we introduce an algorithm to generate AGs locally, by looking only for Vs after determining the ancestor set of the conditioning set. {alg} (Generating an AG, αAG(H; M, C), from an ancestral graph H):

Start from H.

1. Apply Algorithm 6.1.

2. Generate, respectively, an arrow from i to j or an arc between i and j for the V i → k ↔ j or the V i ↔ k ↔ j when k ∈ an(j), if the arrow or the arc does not already exist.

3. Remove the arc between i and j in the case that i ∈ an(j), and replace it by an arrow from i to j if that arrow does not already exist.

Continually apply each step until it cannot be applied further before moving to the next step. Figure 11 illustrates how to apply Algorithm 7.1 step by step to a DAG.
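
Step 3 alone can be sketched as follows; the edge encoding ((u, v, '-->') an arrow from u to v, '<->' an arc) and the function names are illustrative assumptions, not notation from the paper.

```python
# Step 3 of Algorithm 7.1 in isolation: replace arcs between ancestrally
# related nodes by arrows.

def ancestors(edges, targets):
    """`targets` together with all nodes that reach them via arrows."""
    anc, changed = set(targets), True
    while changed:
        changed = False
        for u, v, kind in edges:
            if kind == '-->' and v in anc and u not in anc:
                anc.add(u)
                changed = True
    return anc

def replace_ancestral_arcs(edges):
    """While some arc u <-> v has u an ancestor of v via arrows, remove
    the arc and put the arrow u --> v in its place unless it is
    already present."""
    E, changed = set(edges), True
    while changed:
        changed = False
        for e in list(E):
            u, v, k = e
            if k != '<->' or e not in E:
                continue
            for x, y in ((u, v), (v, u)):
                if x != y and x in ancestors(E, {y}):
                    E.discard(e)
                    if (x, y, '-->') not in E:
                        E.add((x, y, '-->'))
                    changed = True
                    break
    return E
```

A multiple edge consisting of the arrow a → b and an arc between a and b collapses to the arrow alone, and an arc whose endpoint reaches the other through a directed path is replaced by the corresponding arrow.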

The map αAG and its basic properties. The basic properties of Algorithm 7.1 and its corresponding function αAG are analogous to those established for RGs and SGs.

###### Proposition 6

Graphs generated by Algorithm 7.1 are AGs.

As before, we consider αAG as a function from the set of AGs, together with two disjoint subsets of their node sets, to the set of AGs.

Notice that by extending the generated AG to a maximal AG (as explained in ric02 ()), the same maximal AG is obtained as that generated by the method of ric02 (); hence, the two graphs induce the same independence model. This also provides the global interpretation of the algorithm. We will not give the details in this paper.

The map from SGs to AGs and its properties. Notice that step 1 of Algorithm 7.1 generates an SG. Hence, steps 2 and 3 of the algorithm generate an AG from an SG; we denote these two steps by a map from the set of SGs to the set of AGs.

###### Proposition 7

It holds that αAG(H; M, C) is obtained by applying steps 2 and 3 of Algorithm 7.1 to αSG(H; M, C).

Surjectivity of . The following result shows that every member of can be generated by a DAG after marginalisation and conditioning.

###### Proposition 8

The map αAG is surjective.

{pf}

The result follows from Proposition 5, the fact that every AG is an SG, and the fact that applying steps 2 and 3 to an SG generated from a DAG yields the corresponding AG (Proposition 7).

### 7.2 Two necessary properties of AG-generating functions

Again, we discuss the two important properties that we have proven for the other two types of stable mixed graphs.

Well-definition of αAG. The well-definition of αAG is analogous to that of αRG and αSG as established in the previous sections.

###### Theorem 5

For an ancestral graph H and disjoint subsets M, C, M1, and C1 of its node set,

 αAG(αAG(H;M,C);M1,C1)=αAG(H;M∪M1,C∪C1).

Stability of the graphs generated by αAG. Analogously to RGs and SGs, graphs generated by αAG induce the marginal and conditional independence model. An analogous result was proven in ric02 () for the maximal AGs generated in that paper.

###### Theorem 6

For an ancestral graph H and disjoint subsets A, B, M, C, and C1 of its node set,

 A ⊥m B | C1 in αAG(H;M,C) ⟺ A ⊥m B | C∪C1 in H.
###### Corollary 6 ((ric02 ()))

For an ancestral graph H and disjoint subsets M and C of its node set,

 α(Jm(H);M,C)=Jm(αAG(H;M,C)).
###### Corollary 7

The class of AGs is stable.

{example*}

Figure 12(a) illustrates the AG generated from the DAG in Figure 5. Figure 12(b) illustrates the AG generated by the algorithm from the AG in part (a).

## 8 The relationship between different types of stable mixed graphs

Thus far, we have defined RGs (as a modification of MCGs), SGs, and AGs, introduced algorithms to generate each of these from a graph of the same class or from a DAG, and described some maps that act between these classes. Despite the similarities of the definitions and generating algorithms of these different classes, as well as the parallel theory developed for them, it is of interest to investigate the exact relationship between these types of graphs.

Corresponding stable mixed graphs. When one starts from a DAG and generates the different types of stable mixed graphs after marginalisation over and conditioning on two specific subsets of its node set, the generated graphs must induce the same independence model. This leads to the definition of corresponding stable mixed graphs: for a directed acyclic graph G and two disjoint subsets M and C of its node set, the graphs αRG(G; M, C), αSG(G; M, C), and αAG(G; M, C) are called, respectively, the corresponding RG, SG, and AG.

We observe that the corresponding RGs, SGs, and AGs of a DAG induce the same independence model. This fact, without being formulated in this way, was discussed in all three papers that define these graphs kos02 (), ric02 (), wer08 ().

###### Proposition 9

For a directed acyclic graph G and disjoint subsets M and C of its node set,

 Jm(αRG(G;M,C))=Jm(αSG(G;M,C))=Jm(αAG(G;M,C)).
{pf}

The result follows from Corollaries 1, 4 and 6.

As was shown, SGs and AGs satisfy extra structural properties. We know that AGs form a subclass of SGs, which in turn form a subclass of RGs. The AG corresponding to an SG can be generated by applying steps 2 and 3 of Algorithm 7.1, as outlined in Proposition 7. However, we cannot generate the SG corresponding to an RG by knowing only the RG and not the DAG (or the conditioning set used for the DAG): two different DAGs, with different conditioning sets, may yield the same RG but different SGs. The same is true for AGs in place of SGs.

It is possible, however, to introduce an algorithm that generates an SG inducing the same independence model as a given RG, by removing arrowheads pointing to a line or to a node that is an ancestor of a node that is the endpoint of a line.

We have also seen that the image of each generating function is large enough to cover all graphs in the corresponding class of stable mixed graphs, since the generating functions are surjective. On the other hand, it is easy to show that the generating functions are not injective. The relationship between the three types of stable mixed graphs is therefore summarised by the diagram in Figure 13, in which one can move only in the directions of the arrows.

## 9 Discussion on the use of different types of stable mixed graphs

From the preceding discussion, if G is a DAG with latent variables and selection variables, then stable mixed graphs are classes of graphs that represent the independence model implied among the remaining variables, conditional on the selection variables. However, each of the three types has been used in different contexts and for different purposes.

Why MCGs or RGs? MCGs were introduced in order to deal straightforwardly with the problem of finding a class of graphs that is closed under marginalisation and conditioning, via a simple process of deriving these graphs from DAGs. In fact, the class of MCGs is much larger than what is really needed for representing independence models after marginalisation and conditioning: we have noted that only MCGs that are ribbonless can be generated this way.

Why SGs? The main goal of defining SGs is to trace effects after marginalisation and conditioning, as explained briefly later in this section. By using binary matrix representations of graphs, called edge matrices, and corresponding matrix operators wer06 (), the edge matrix of an SG is obtained. It contains three types of edge matrices: those for solid lines, those for dashed lines (corresponding to arcs), and those for arrows. In the family of joint Gaussian distributions, solid lines in concentration graphs correspond to concentration matrices, dashed lines in covariance graphs to covariance matrices, and arrows to equation parameters in structural equation models.

SGs are used when the generating DAG is known. Even with knowledge of the structure of the generating DAG, SGs are still of interest in at least three situations: (1) for models with a large number of unobserved and selection variables; (2) for the comparison of models when the unobserved or selection variables of one are a subset of those of the other; and (3) for detecting some types of confounding, as shown in werc08 () and described briefly later.

Why AGs? The main goal of defining AGs is to represent and parametrise sets of distributions obeying Markov properties. Although we discussed the class of AGs in this paper to sustain a theory parallel to that for RGs and SGs, the class of maximal AGs possesses some desirable properties that general AGs do not. These include the fact that under the Gaussian path diagram parametrisation a maximal AG implies only independence constraints, while a general AG implies other types of constraints as well. We give a short discussion of maximality in this section. Maximal AGs are the simplest structures that capture the modified independence model, and they are also of interest when the generating DAG is not known but a set of conditional independencies is. In the Gaussian case, maximal AGs are identified. In contrast to DAG models with hidden variables, the models are curved exponential families ric02 (), and a conditional fitting algorithm for maximum likelihood estimation exists drt04 ().

Maximal stable mixed graph. A graph is called maximal if adding any edge to it changes (shrinks) the independence model it induces. Therefore, in maximal graphs, every missing edge corresponds to at least one independence statement in the induced independence model. This leads to the validity of a so-called pairwise Markov property.

In ric02 (), maximality of the subclass of AGs was studied. This result also holds for RGs and says that a ribbonless graph is maximal if and only if it does not contain any primitive inducing path, that is, a path of the form , on which and, for every , , is a collider on the path and . We shall not give the details in this paper.

Therefore, to generate a maximal stable mixed graph from a stable mixed graph, one should repeatedly generate arrows from to for primitive inducing paths between non-adjacent and where there is no arrowhead pointing to , and generate arcs between and for primitive inducing paths between non-adjacent and where there are arrowheads pointing to and . Notice that by applying this algorithm after the generating algorithms, one can generate a maximal AG, SG, or RG.

As discussed, maximal AGs possess many desirable properties that general AGs do not. For SGs, it is conjectured that maximal SGs possess the same statistical properties that maximal AGs and SGs both possess. Establishing this requires further work.

The structure of different types of stable mixed graphs. If stable mixed graphs are used only to represent the independence model after marginalisation and conditioning, then all types can be considered equally appropriate. The question then reduces to how simply and quickly each type of graph can be generated. We have seen that AGs have the simplest structure among the three types of stable mixed graphs, and RGs the most complex. Accordingly, as we have also seen, it is more complex to generate an AG than to generate an SG, and more complex to generate an SG than to generate an RG. On the other hand, a simpler structure allows faster checking of independence statements. Hence, there is a tradeoff that depends on the relative size of the marginalisation and conditioning sets.

When generating stable mixed graphs from DAGs, one always loses some information in exchange for a simpler structure. RGs lose the least information among the three types of stable mixed graphs, and AGs the most. Here we discuss the lost information in the context of regression analysis.

Multivariate regression and stable mixed graphs. The problem of constructing stable mixed graphs was originally posed by wer94 () in the context of multivariate statistics based on regression analysis. In that literature, the DAG model is defined by a sequence of univariate recursive regressions, called a linear triangular system by werc04 (): for , each single response variable is regressed on , where the parents of are a subset of . Linear triangular systems can be written as , where is an upper-triangular matrix with unit diagonal elements and is a vector of zero-mean, uncorrelated random variables, called residuals. Here the nonzero regression coefficient of on can be attached to the arrow from to in the DAG and is called the direct effect of on ; see cox93 ().
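A triangular system of this kind can be written down concretely. The following numerical sketch (with hypothetical coefficient values, chosen to match the three-variable example of Figure 14) encodes the system as A Y = ε with A upper triangular with unit diagonal, so that the implied covariance matrix of Y, for unit-variance uncorrelated residuals, is A⁻¹A⁻ᵀ.

```python
import numpy as np

# Hypothetical triangular system A @ Y = eps encoding
# Y1 = beta*Y2 + delta*Y3 + eps1, Y2 = gamma*Y3 + eps2, Y3 = eps3.
beta, delta, gamma = 0.5, 0.2, 0.3
A = np.array([[1.0, -beta, -delta],
              [0.0,  1.0,  -gamma],
              [0.0,  0.0,   1.0 ]])

# With unit-variance uncorrelated residuals, cov(Y) = A^{-1} A^{-T}.
Ainv = np.linalg.inv(A)
Sigma = Ainv @ Ainv.T

# Simulate the system and compare the sample covariance to Sigma.
rng = np.random.default_rng(0)
eps = rng.standard_normal((100_000, 3))
Y = np.linalg.solve(A, eps.T).T

assert np.allclose(np.cov(Y.T), Sigma, atol=0.05)
```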

In particular, for linear triangular systems, RGs alert one to distortions due to so-called over-conditioning via multiple edges consisting of a line and an arrow. Over-conditioning arises from conditioning on a variable that is a response of two variables, one of which is itself a response to the other.

For example, in Figure 14, the generating process is given by three linear equations,

\[
Y_1=\beta Y_2+\delta Y_3+\varepsilon_1,\qquad Y_2=\gamma Y_3+\varepsilon_2,\qquad Y_3=\varepsilon_3,
\]

where each residual has mean zero and is uncorrelated with the explanatory variables on the right-hand side of the equation.

By conditioning on , the conditional dependence of on only is obtained, which consists of the direct effect and an indirect effect of on via . This may be seen by direct calculation, assuming that the residuals have a Gaussian distribution, which leads to

\[
\mathrm{E}(Y_2\mid Y_1,Y_3)=\bigl(\gamma-\bigl\{(1-\gamma^2)/\bigl(1-\rho_{13}^2\bigr)\bigr\}\beta\rho_{13}\bigr)Y_3+\cdots,\qquad\text{where }\rho_{13}=\delta+\beta\gamma.
\]

Thus, the direct effect is distorted by . The potential presence of this distortion is represented in (b) by the addition of an arrow.
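Assuming standardized variables with unit variances (so that ρ13 = δ + βγ is the correlation of Y1 and Y3), the distorted coefficient can be verified exactly against the normal equations of the regression of Y2 on (Y1, Y3); the parameter values below are hypothetical.

```python
import numpy as np

beta, delta, gamma = 0.5, 0.2, 0.3
rho12 = beta + delta * gamma       # corr(Y1, Y2), standardized variables
rho13 = delta + beta * gamma       # corr(Y1, Y3)
rho23 = gamma                      # corr(Y2, Y3)

# Coefficient of Y3 in the linear regression of Y2 on (Y1, Y3):
R = np.array([[1.0,   rho13],
              [rho13, 1.0  ]])
b1, b3 = np.linalg.solve(R, np.array([rho12, rho23]))

# It equals the distorted direct effect given in the display above.
distorted = gamma - ((1 - gamma**2) / (1 - rho13**2)) * beta * rho13
assert np.isclose(b3, distorted)
assert not np.isclose(b3, gamma)   # the direct effect gamma is distorted
```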

In addition, the existence of multiple edges with an arrow and an arc, and of arcs with one endpoint an ancestor of the other, neither of which is permissible in AGs, alerts one to distortions due to so-called direct and indirect confounding, respectively.

With the same generating process as explained for Figure 14, integrating out in Figure 15 yields the conditional dependence of on only, which consists of the direct effect and an indirect effect of on via . This leads to

\[
\mathrm{E}(Y_1\mid Y_2)=(\beta+\delta\gamma)Y_2.
\]

Thus, the direct effect is distorted by . The potential presence of this distortion is represented in (b) by the addition of an arc. This example illustrates a distortion due to direct confounding; see werc08 (). Indirect confounding was also studied in werc08 () for marginalising only over a full set of background variables, and more generally in wer08 (), relating SGs to corresponding maximal AGs.
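The direct-confounding distortion can likewise be checked by simulation. The sketch below uses hypothetical coefficients, and residual variances are chosen (an assumption of ours) so that all three variables have unit variance.

```python
import numpy as np

beta, delta, gamma = 0.5, 0.2, 0.3
rng = np.random.default_rng(1)
n = 200_000

# Residual scales chosen so that Var(Y1) = Var(Y2) = Var(Y3) = 1.
Y3 = rng.standard_normal(n)
Y2 = gamma * Y3 + np.sqrt(1 - gamma**2) * rng.standard_normal(n)
s1 = np.sqrt(1 - beta**2 - delta**2 - 2 * beta * delta * gamma)
Y1 = beta * Y2 + delta * Y3 + s1 * rng.standard_normal(n)

# After marginalising over Y3, the slope of Y1 on Y2 alone is
# beta + delta*gamma, not the direct effect beta.
slope = np.cov(Y1, Y2)[0, 1] / np.var(Y2, ddof=1)
assert abs(slope - (beta + delta * gamma)) < 0.02
assert abs(slope - beta) > 0.03
```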

## Appendix: Proofs

Here we present the proofs of the lemmas, propositions, and theorems of this paper, but first we state some observations, used repeatedly in our proofs, as the following lemmas.

###### Lemma 2

If in , then in one of the following holds: (1) ; (2) or a descendant of is the endpoint of a line; (3) .

{pf}

We know that there is a direction-preserving path in . Consider the -edge. By Lemma 1 in given and , there is an -connecting path between and , on which there is no arrowhead pointing to . One can observe that if this path is not a direction-preserving path then one of the following holds: (1)  is an ancestor of a collider node on the path, which is in . Hence, ; (2) is the endpoint of a line or an ancestor of a node that is the endpoint of a line on the path. If (1) or (2) holds, then we are done, hence assume that . By the same argument and by induction along the nodes of , we conclude the result.

###### Lemma 3

For and outside , if in then one of the following holds: (1)  in ; (2) in .

{pf}

We know that there is a direction-preserving path in . Consider the -edge. We now have three cases: (1) If , then in and we are done. (2) If , then Algorithm 5.1 generates an arrow from to . (3) If , then in . By the same argument and by induction along the nodes of , we conclude the result.

The following lemma deals with the concatenation of -connecting paths. We shall not give the details of the proof here; see sad12 ().

###### Lemma 4

In an RG, suppose that given and there are -connecting paths between and and between and . In this case, there is an -connecting path given and between and if one of the following holds:

(a) is collider and ;

(a2) with arrowhead pointing to on the -edge and ;

(b) is non-collider and ;

(b2) with no arrowhead pointing to on the -edge and ;

(c) is collider and or a descendant of is the endpoint of a line or a direction-preserving cycle;

(c2) with arrowhead pointing to on the -edge and or a descendant of is the endpoint of a line or a direction-preserving cycle.

{pf*}

Proof of Proposition 1 Graphs generated by Algorithm 5.1 obviously have the three desired types of edges and are loopless.

Now suppose, for contradiction, that there is a ribbon in a generated graph . By Lemma 1 in given and , there are -connecting paths between and and between and such that there are arrowheads at on both - and -edges. (Notice that it is possible that and it is also possible that or in .)

We also know that, in , the node is the endpoint of a line or on a direction-preserving cycle or there is a direction-preserving path from to such that is the endpoint of a line or on a direction-preserving cycle. Now we consider two cases. In case I, we suppose that such a does not exist and in case II we suppose that such a exists.

Case I. In case I.1, we suppose that is the endpoint of a line and in case I.2 we suppose that is on a direction-preserving cycle.

Case I.1. Suppose that is the endpoint of an -line in . By Lemma 1 in given and there is an -connecting path between and , on which there is no arrowhead pointing to or . One can observe that is an ancestor of either (1) a collider node on the path or (2) a node that is the endpoint of a line on the path. Thus, we have the following two cases:

(1) If is an ancestor of a collider node , then in since . Hence, by Lemma 4(a), there is an -connecting path given and between and in .

(2) If is an ancestor of a node that is the endpoint of a line on the path, then by Lemma 4(c) there is an -connecting path given and between and in .

By Lemma 1, both cases imply that in and the -edge is endpoint-identical to the -connecting path. Therefore, is not a ribbon, a contradiction.

Case I.2. Suppose that is on a direction-preserving cycle in . By Lemma 2 in one of the following holds: (1) ; (2) or a descendant of is the endpoint of a line; (3) . Cases (2) and (3) lead to contradiction as explained in case I.1. Therefore, suppose that is on a direction-preserving cycle in . This by Lemma 4(c) implies that there is an -connecting path given and between and in , which implies that is not a ribbon. This is a contradiction.

Case II. By Lemma 2 in one of the following holds: (1) ; (2) or a descendant of is the endpoint of a line; (3) . Cases (2) and (3) lead to a contradiction as explained in case I.1. Hence, it holds that in . This, together with the same argument as that of case I (for instead of ), leads to a contradiction.

{pf*}

Proof of Lemma 1 () If an edge between and in does not exist in , then it has been generated via intermediate graphs, each of which has been obtained by adding one edge to the previous graph by one of the steps of Table 1. We denote these graphs by the sequence , where is the last step before removing and .

We prove by reverse induction on that in all , , between and there exists a path on which non-collider inner nodes are in and collider inner nodes or their descendants are either in or the endpoint of a line. For , there is obviously an edge between and . We show that if there is such a path in then we can find the same type of path between and in .

If all edges along the path exist in , then we must check that a collider node that is an ancestor of a member of or an ancestor of a node that is the endpoint of a line in is an ancestor of a member of or an ancestor of a node that is the endpoint of a line in . If an arrow has been generated along the direction-preserving path in , then it has been generated by the Vs of the first three steps or the V of step 8 of Table 1. If it is step 1, then we can replace the -arrow by to obtain a direction-preserving path. If it is step 2 or 3, then node is the endpoint of a line and we are done. If it is step 8, then, since , the inner node of the V is in and we are done.

Thus, suppose that an -edge along the -connecting path is the edge that has been generated by this step. This has been generated by one of Vs of Table 1. Since in all cases the V is endpoint-identical to the -edge, and since all inner nodes of the non-collider Vs are in and all inner nodes of the collider Vs are in , by placing the V instead of the -edge on the path, we still get a path whose non-collider inner nodes are in and either whose collider inner nodes are in or whose collider nodes or a descendant of them are the endpoint of a line, as required.

Therefore, by reverse induction, there exists a path, as described above, in . However, since is ribbonless, the path cannot contain a collider V such that or a descendant of is the endpoint of a line unless and the -edge is endpoint-identical to . In this case, the -edge can be used instead of and, by induction, we obtain an -connecting path given and between and .

The fact that in Table 1 the Vs are endpoint-identical to the generated -edges implies that all discussed paths in each are endpoint-identical.

() Suppose that there is an -connecting path given and between and in , . We prove that, as long as , if there is an -connecting path given and between and in with inner nodes, then there is an -connecting path given and between and in with inner nodes. By induction, we finally obtain an -connecting path between and without inner nodes, that is, an edge between and .

Consider an -connecting path given and between and in with inner nodes. Consider an arbitrary inner node on the path. If this node is a collider, then one of the Vs 8, 9, or 10 of Table 1 is employed to generate an edge between the endpoints of the V in . Since the generated edge is endpoint-identical to the V, one can use the generated edge instead of the V to obtain an -connecting path with inner nodes. If the arbitrary node is a non-collider, then one of the other Vs of Table 1 is used.

It is easy to check that the generated edges are endpoint-identical to the -connecting paths in the final graph. This implies the result.

{pf*}

Proof of Proposition 2 Let . We generate a directed graph from as follows: We leave arrows that are not on any direction-preserving cycle unchanged. For direction-preserving cycles, instead of one arbitrary arrow from to on the cycle we place , where and , and leave all other arrows unchanged. Instead of an arc between and we place a V between and with inner source node in . Instead of a line between and we place a V between and with inner collider node in . The graph is obviously a directed graph. Furthermore, all newly generated nodes have degree 2 and the direction of arrows changes at them; hence they cannot be on any direction-preserving cycle. In addition, if and are in and in then or in . Therefore, the existence of a direction-preserving cycle in implies the existence of the same direction-preserving cycle in . But by the nature of the construction of we know that direction-preserving cycles in do not produce direction-preserving cycles in , hence is acyclic.

We must prove that . Let . Obviously . Suppose that (, , or ) in . Then we have the active alternating path or one of the Vs between and , which by Algorithm 5.1 forms exactly the same type of edge in .

Conversely, suppose that (, , or ) in . By Lemma 1, we know that there is an endpoint-identical -connecting path given and in . Consider a shortest endpoint-identical -connecting path . Since in there is no transition node in , is active alternating with respect to and . If has no collider node in , then by the nature of the construction of we know that it has two edges (if both endpoints are children or parents) or three (if it is from to on a direction-preserving cycle) and that it has been generated by an edge (arrow, arc, or line) in . Suppose, for contradiction, that there is a collider node on . We have that , and by the process of generating a DAG explained here, the only place where a node in has been generated is by a line or an arrow on a direction-preserving cycle in . Therefore, for a node that is the endpoint of a line or is on a direction-preserving cycle. Hence, contains a ribbon, or the endpoints of the collider V with as inner node are adjacent by an endpoint-identical edge. The former contradicts that is ribbonless, and the latter contradicts that is shortest.

{pf*}

Proof of Theorem 1 () If there is an -edge in , then by Lemma 1 there is an -connecting path between and given and in that is endpoint-identical to the edge.

For the V on , again by Lemma 1, given and there are -connecting paths between and and between and