# Unifying Markov Properties for Graphical Models

University of Copenhagen
Universitetsparken 5
2100 Copenhagen
Denmark
Statistical Laboratory
Centre for Mathematical Sciences
Cambridge, CB3 0WA
United Kingdom
University of Copenhagen and University of Cambridge
###### Abstract

Several types of graphs with different conditional independence interpretations — also known as Markov properties — have been proposed and used in graphical models. In this paper we unify these Markov properties by introducing a class of graphs with four types of edges — lines, arrows, arcs, and dotted lines — and a single separation criterion. We show that independence structures defined by this class specialize to each of the previously defined cases, when suitable subclasses of graphs are considered. In addition, we define a pairwise Markov property for the subclass of chain mixed graphs which includes chain graphs with the LWF interpretation, as well as summary graphs (and consequently ancestral graphs). We prove the equivalence of this pairwise Markov property to the global Markov property for compositional graphoid independence models.

Work of the second author was partially supported by grant FA9550-14-1-0141 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA).

AMS subject classifications: Primary 62H99; secondary 62A99.

Keywords: AMP Markov property, $c$-separation, chain graph, compositional graphoid, $d$-separation, independence model, LWF Markov property, $m$-separation, mixed graph, pairwise Markov property, regression chain Markov property.

## 1 Introduction

Graphical models provide a strong and clear formalism for studying conditional independence relations that arise in different statistical contexts. Originally, graphs with a single type of edge were used; see, for example, [3] for undirected graphs (originating from statistical physics [11]), and [40; 13] for directed acyclic graphs (originating from genetics [43]).

With the introduction of chain graphs [18], and other types of graphs with edges of several types [2; 38; 26; 22] as well as different interpretations of chain graphs [1; 6], a plethora of Markov properties have emerged. These have been introduced with different motivations: chain graphs as a unification of directed and undirected graphs, the so-called AMP Markov property to describe dependence structures among regression residuals, bidirected graphs to represent structures of marginal independence, and other mixed graphs to represent selection effects and incomplete observations in causal models. Despite the similarities among these, the lack of a general theory as well as the use of different definitions and notation has undermined the original conceptual simplicity of graphical models. This motivates a unification of the corresponding Markov properties. In [29], we attempted this for different types of mixed graphs, but failed to include chain graph Markov properties. Here we follow an analogous approach using a single separation criterion, but using four types of edges: line, arrow, arc, and dotted line. To the best of our knowledge, this unifies most graphical independence models previously discussed in the literature. One exception is Drton’s [6] type III chain graph Markov property which has several unfortunate properties and so far has not played any specific role; we have chosen to avoid introducing a fifth type of edge to accommodate this property; another exception is the reciprocal graphs of Koster [14], which allow feedback cycles; other exceptions use graphs to describe conditional independence in dynamical systems [8; 5] which we do not discuss here. Our unification includes summary graphs — which include ancestral graphs as well as chain graphs with the multivariate regression Markov property [2] — chain graphs with the LWF Markov property [18; 9], and chain graphs with the AMP Markov property [1].

In addition to the unification of the (global) Markov property, we provide a unified pairwise Markov property. However, it seems technically complex to include the pairwise Markov property for chain graphs with the AMP interpretation, and hence we only discuss this for the subclass of graphs with three types of edges where cycles of specific types are absent. Such graphs were called chain mixed graphs (CMGs) in [28], and their corresponding independence models unify those of summary graphs (and ancestral graphs) as well as chain graphs with the LWF Markov property. For CMGs, we first discuss the notion of maximality and show that every missing edge in a maximal CMG corresponds to an independence statement, thus forming a potential basis for specifying pairwise Markov properties. For CMGs we prove the equivalence of pairwise and global Markov properties for abstract independence models which are compositional graphoids.

The structure of the paper is as follows: In the next section, we define graphs with four types of edges and provide basic graph theoretical definitions. In Section 3, we discuss general independence models and compositional graphoids, provide a single separation criterion for such graphs, and show that the induced independence models are compositional graphoids. Further we demonstrate how the various independence models discussed in the literature are represented within this unification. In Section 4, we define the notion of maximal graphs, provide conditions under which a CMG is maximal, and show that any CMG can be modified to become maximal without changing its independence model. In Section 5, we provide a pairwise Markov property for CMGs, and prove that for compositional graphoids, the pairwise Markov property is equivalent to the global Markov property. Finally, we conclude the paper with a discussion in Section 6.

## 2 Graph terminology

### 2.1 Graphs

A graph is a triple consisting of a node set or vertex set , an edge set , and a relation that with each edge associates two nodes (not necessarily distinct), called its endpoints. When nodes and are the endpoints of an edge, these are adjacent and we write . We say the edge is between its two endpoints. We usually refer to a graph as an ordered pair . Graphs and are called equal if . In this case we write .

The graphs that we use are labeled graphs, i.e. every node is considered a different object. Hence, for example, the graph is not equal to the graph .

In addition, in this paper, we use graphs with four types of edges denoted by arrows, arcs (solid lines with two-headed arrows), lines (solid lines), and dotted lines; as will be seen in Section 3, we shall use dotted lines to represent chain graphs with the AMP Markov property. Henceforth, by ‘graph’, we mean a graph with these four possible types of edges. We do not distinguish between and , between and , or between and , but we do distinguish between and .

A loop is an edge whose endpoints are identical. In this paper, we only consider graphs that do not contain loops. Multiple edges are edges sharing the same pair of endpoints. A simple graph has neither loops nor multiple edges. The graphs we consider in this paper may generally contain multiple edges, even of the same type. However, we emphasize that, for all purposes in the present paper, multiple edges of the same type are redundant, and hence at most one edge of each type is needed to represent the objects we discuss.

We say that is a neighbor of if these are endpoints of a line; if there is an arrow from to , is a parent of and is a child of . We also say that is a spouse of if these are endpoints of an arc, and is a partner of if they are endpoints of a dotted line. We use the notations , , , and for the set of all neighbours, parents, spouses, and partners of respectively. More generally, for a set of nodes we let and similarly for , , and .

A subgraph of a graph $G_1$ is a graph $G_2$ such that $V(G_2) \subseteq V(G_1)$ and each edge present in $G_2$ also occurs in $G_1$ and has the same type there. An induced subgraph by a subset $A$ of the node set is a subgraph that contains all and only nodes in $A$ and all edges between two nodes in $A$.

A walk is a list of nodes and edges such that for , the edge has endpoints and . We allow a walk to consist of a single node . If the graph is simple then a walk can be determined uniquely by a sequence of nodes. Also, a non-trivial walk is always determined by its edges, so we may write without ambiguity. Throughout this paper, however, we often use only node sequences to describe walks even in graphs with multiple edges, when it is apparent from the context or the type of the walk which edges are involved. The first and the last nodes of a walk are its endpoints. All other nodes are inner nodes of the walk. We say a walk is between its endpoints. A cycle is a walk with at least two edges and no repeated node except . A path is a walk with no repeated node.

A subwalk of a walk is a walk that is a subsequence of between two occurrences of nodes (, ). If a subwalk forms a path then it is also a subpath of .

In this paper we need different types of walks as defined below. Consider a walk $\omega = \langle i = i_0, i_1, \dots, i_n = j\rangle$. We say that

• $\omega$ is undirected if it only consists of solid lines;

• $\omega$ is directed from $i$ to $j$ if all edges $i_r i_{r+1}$, $0 \le r \le n-1$, are arrows pointing from $i_r$ to $i_{r+1}$;

• $\omega$ is semi-directed from $i$ to $j$ if it has at least one arrow, no arcs, and every arrow $i_r i_{r+1}$ is pointing from $i_r$ to $i_{r+1}$;

• $\omega$ is anterior from $i$ to $j$ if it is semi-directed from $i$ to $j$ or if it is composed of lines and dotted lines.

Thus a directed walk is also semi-directed and a semi-directed walk is also an anterior walk. If there is a directed walk from to () then is an ancestor of . We denote the set of ancestors of by . If there is an anterior walk from to () then we also say that is anterior of . We use the notation for the set of all anteriors of . For a set , we define . We also use the notations and for the set of reflexive ancestors and anteriors of so that and . In addition, we define a set to be anterior if for all ; in other words, is anterior if .

In fact, we are only interested in these walks when we discuss graphs without dotted lines. For example, consider the following walk (path) in such a graph:

Here it holds that there is an undirected walk between and and hence , but there is no semi-directed walk from to . In addition, we have that and , while there is a semi-directed walk from to . There is also no anterior walk from to .
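The definitions of ancestors and anteriors translate directly into a backward graph search. The following Python sketch is only an illustration: the edge-list encoding `(u, v, kind)` with `kind` in `"line"`, `"arrow"`, `"arc"`, `"dotted"` (an arrow pointing from `u` to `v`) and the function names are assumptions of this example, not notation from the paper. It reads an anterior walk as one that contains no arcs and whose arrows all point towards the target.

```python
def _reach_backwards(target, edges, symmetric_kinds):
    """Nodes (other than target) that reach target by walking arrows
    forwards and edges of the given symmetric kinds in either direction."""
    found, stack = set(), [target]
    while stack:
        v = stack.pop()
        for a, b, kind in edges:
            if kind == "arrow" and b == v:
                prev = a                      # arrow a -> v
            elif kind in symmetric_kinds and v in (a, b):
                prev = a if b == v else b     # line or dotted line at v
            else:
                continue
            if prev != target and prev not in found:
                found.add(prev)
                stack.append(prev)
    return found


def ancestors(target, edges):
    """Nodes with a directed walk to target (target itself excluded)."""
    return _reach_backwards(target, edges, ())


def anteriors(target, edges):
    """Nodes with an anterior walk to target (target itself excluded)."""
    return _reach_backwards(target, edges, ("line", "dotted"))


# Hypothetical example graph: a - b -> c <-> d
example = [("a", "b", "line"), ("b", "c", "arrow"), ("c", "d", "arc")]
print(ancestors("c", example), anteriors("c", example))
```

In this hypothetical graph, `b` is the only ancestor of `c`, while both `a` and `b` are anteriors of `c`; the arc contributes to neither set.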

Notice that, unlike most places in the literature (e.g. [26]), we use walks instead of paths to define ancestors and anteriors. Using walks instead of paths is immaterial for this purpose as the following lemma shows.

###### Lemma 1.

There is a directed or anterior walk from $i$ to $j$ if and only if there is a directed or anterior path from $i$ to $j$ respectively.

###### Proof.

If there is a path, there is a walk, as a path is also a walk. Conversely, assume there is a directed or anterior walk from $i$ to $j$. If $i = j$ then we are done by definition. Otherwise, start from $i$ and move on the walk towards $j$. Consider the first place where a node $k$ is repeated on the walk. The subwalk from $k$ to $k$ forms a cycle. If we remove this cycle from the walk, the resulting walk remains directed; similarly, the walk resulting from an anterior walk remains anterior. Successively removing all cycles along the walk in this way implies the result. ∎
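The cycle-removal step in the proof can be sketched in a few lines of Python. This is a minimal illustration under assumed conventions: a walk is given as a node list together with the list of edges joining consecutive nodes, and the function name `erase_cycles` is hypothetical. Whenever a node reappears, everything visited since its first occurrence is cut out, so the surviving edges are a subsequence of the original ones and keep their orientation.

```python
def erase_cycles(nodes, edges):
    """Turn a walk into a path with the same endpoints by cutting out cycles.

    nodes: [i_0, ..., i_n]; edges: [e_1, ..., e_n], where e_k joins
    nodes[k-1] and nodes[k]. Returns (path_nodes, path_edges).
    """
    path_nodes, path_edges = [nodes[0]], []
    position = {nodes[0]: 0}              # node -> index in path_nodes
    for k in range(1, len(nodes)):
        v = nodes[k]
        if v in position:                 # v is repeated: remove the cycle
            cut = position[v]
            for u in path_nodes[cut + 1:]:
                del position[u]
            del path_nodes[cut + 1:]
            del path_edges[cut:]
        else:
            position[v] = len(path_nodes)
            path_nodes.append(v)
            path_edges.append(edges[k - 1])
    return path_nodes, path_edges


# Hypothetical walk a, b, c, b, d with labelled edges.
walk_nodes = ["a", "b", "c", "b", "d"]
walk_edges = ["a->b", "b->c", "c-b", "b->d"]
print(erase_cycles(walk_nodes, walk_edges))
```

Because only edges are deleted and none are added or reversed, a directed walk yields a directed path and an anterior walk yields an anterior path, which is the content of Lemma 1.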

A section of a walk is a maximal subwalk consisting only of solid lines, meaning that there is no other subwalk that only consists of solid lines and includes . A walk decomposes uniquely into sections; sections may also be single nodes. The section is an inner section on the walk if all nodes on the section are inner nodes on the walk and an endpoint section if it contains an endpoint of the walk. A section on a walk is called a collider section if one of the four following walks is a subwalk of : , , , , and , i.e., a section is a collider if two arrowheads meet at or an arrowhead meets a dotted line. All other sections on are called non-collider sections; these are sections that are an endpoint of or the following sections: , , , , and . We may speak of collider or non-collider sections (or nodes) without mentioning the relevant walk when this is apparent from the context. Notice that a section may be a collider on one part of the walk and a non-collider on another. For example, in Fig. 1(a), the section is a collider on the walk . It is also a collider on via the edge , but a non-collider on via the edge . Notice also that is a non-collider on .

A tripath is a path with three distinct nodes. Note that [27] used the term V-configuration for such a path. If the inner node on a tripath is a collider or a non-collider, we shall also say that the tripath itself is a collider or a non-collider, respectively.

### 2.2 Subclasses of graphs

Most graphs discussed in the literature are subclasses of the graphs considered here. In addition, the global Markov property defined in the next section specializes to the independence structures previously discussed. Exceptions include MC graphs [15] and ribbonless graphs [27]. However, any independence structure represented by an MC graph or a ribbonless graph can also be represented by a summary graph or an ancestral graph [29], which are also covered in this paper.

Although we do not set any constraints on the class of graphs with four types of edges for the purpose of defining a global Markov property in Section 3, the most general class of graphs for which we explicitly define a pairwise Markov property in Section 5 is the class of chain mixed graphs (CMGs) [28]. CMGs are graphs without dotted lines and without semi-directed cycles; hence reciprocal graphs as in [14] are not CMGs. CMGs may have multiple edges of all types except a combination of arrows and lines, or arrows in opposite directions, as such combinations would constitute semi-directed cycles. The graph in Fig. 1(a) is an example of a graph with four types of edges, and the graph in Fig. 1(b) is not a CMG because it contains a semi-directed cycle.

It is helpful to classify subclasses of graphs into three categories: basic graphs, chain graphs, and mixed graphs, as briefly described below.

#### Basic graphs

These are graphs that only contain one type of edge; they include undirected graphs (UGs), containing only lines; bidirected graphs (BGs), containing only bidirected edges; dotted line graphs (DGs), containing only dotted lines; and directed acyclic graphs (DAGs), containing only arrows without any directed cycle. Clearly, a graph without arrows has no semi-directed cycles, and a semi-directed cycle in a graph with only arrows is a directed cycle. Note that [2; 12; 39; 7] use the terms concentration graphs and covariance graphs for UGs and BGs, respectively, referring to their independence interpretation associated with concentration and covariance matrices for Gaussian graphical models. DGs have not been studied specifically; as we shall see, any independence structure associated with a DG is Markov equivalent to the corresponding UG, where dotted lines are replaced by lines. DAGs have in particular been useful to describe causal Markov relations; see for example [13; 24; 17; 10; 31].

#### Chain graphs

A chain graph (CG) is a graph with the two following properties: 1) if we remove all arrows, all connected components of the resulting graph — called chain components — contain one type of edge only; 2) if we replace every chain component by a node then the resulting graph is a DAG. DAGs, UGs, DGs, and BGs are all instances of chain graphs. For a DAG, all chain components are singletons, and for a chain graph without arrows, the chain components are simply the connected components of the graph.

If all chain components contain lines, the chain graph is an undirected chain graph (UCG) (here associated with the LWF Markov property); if all contain arcs, it is a bidirected chain graph (BCG) (here associated with the multivariate regression chain graph Markov property); and if all contain dotted lines, it is a dotted line chain graph (DCG) (here associated with the AMP Markov property). For example, in Fig. 2(a) the graph is a chain graph with chain components , , and , but in Fig. 2(c) the graph is not a chain graph because of the semi-directed cycle .
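The two defining properties of a chain graph can be checked mechanically. The Python sketch below is illustrative only: the edge encoding `(u, v, kind)` and the function names are assumptions of this example, and treating an arrow inside a chain component as a violation (it would become a loop once the component is collapsed to a node) is this sketch's reading of property 2.

```python
def chain_components(nodes, edges):
    """Label each node with a representative of its connected component
    in the graph obtained by removing all arrows (union-find)."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, kind in edges:
        if kind != "arrow":
            parent[find(u)] = find(v)
    return {v: find(v) for v in nodes}


def is_chain_graph(nodes, edges):
    comp = chain_components(nodes, edges)
    # Property 1: each chain component contains one type of edge only.
    kinds = {}
    for u, v, kind in edges:
        if kind != "arrow":
            kinds.setdefault(comp[u], set()).add(kind)
    if any(len(k) > 1 for k in kinds.values()):
        return False
    # Property 2: collapsing components to nodes must give a DAG.
    succ = {c: set() for c in set(comp.values())}
    indeg = {c: 0 for c in succ}
    for u, v, kind in edges:
        if kind == "arrow":
            cu, cv = comp[u], comp[v]
            if cu == cv:                  # arrow inside a component
                return False
            if cv not in succ[cu]:
                succ[cu].add(cv)
                indeg[cv] += 1
    ready = [c for c in succ if indeg[c] == 0]   # Kahn's algorithm
    removed = 0
    while ready:
        c = ready.pop()
        removed += 1
        for d in succ[c]:
            indeg[d] -= 1
            if indeg[d] == 0:
                ready.append(d)
    return removed == len(succ)


V = ["a", "b", "c", "d"]
ok = [("a", "b", "arrow"), ("b", "c", "line"), ("c", "d", "arrow")]
cyclic = [("a", "b", "arrow"), ("b", "c", "line"), ("c", "a", "arrow")]
mixed = [("a", "b", "line"), ("b", "c", "arc")]
print(is_chain_graph(V, ok), is_chain_graph(V, cyclic), is_chain_graph(V, mixed))
```

The second hypothetical graph fails because its arrows and line form a semi-directed cycle; the third fails because a single chain component mixes lines and arcs.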

Regression graphs [42] are chain graphs consisting of lines and arcs (although dashed undirected edges have previously been used instead of arcs in the literature), where there is no arrowhead pointing to nodes that are endpoints of lines.

#### Mixed graphs

Marginalization and conditioning in DCGs (studied in [22]) lead to marginal AMP graphs (MAMPs); in our formulation, where we use dotted lines in place of full lines, MAMPs are graphs without solid lines that satisfy three additional conditions:

1. has no quasi-directed cycles in the sense it has no walk containing at least one arrow and every arrow is pointing from to ;

2. has no cycles composed of dotted lines and one arc;

3. If and for some , then .

Graphs discussed here also contain different types of mixed graphs, a term previously used to denote graphs with lines, arrows, and arcs. These were introduced to describe independence structures obtained by marginalization and conditioning in DAG independence models; see for example [27] for a general discussion of this issue. Examples are summary graphs (SGs) [37], ancestral graphs (AGs) [26] and acyclic directed mixed graphs (ADMGs) [32; 25]. Summary graphs are CMGs that have no arrowhead pointing to nodes that are endpoints of lines. Ancestral graphs satisfy in addition that there are no arcs with one endpoint being an ancestor of the other endpoint. Note that in many papers about summary graphs, dashed undirected edges have been used in place of bidirected edges.

ADMGs are summary graphs without lines. Alternative ADMGs (AADMGs) were defined in [23], where arcs in ADMGs were replaced by dotted lines with our notation, although lines were used in the original definition.

CMGs are also mixed graphs, and were originally defined in [28] in order to describe independence structures obtained by marginalization and conditioning in chain graph independence models. Anterial graphs (AnGs) were also defined in [28] for the same purpose; they are CMGs in which an endpoint of an arc cannot be an anterior of the other endpoint.

The diagram in Fig. 3 illustrates the hierarchy of subclasses of graphs with four types of edges. Below we shall provide a unified separation criterion for all graphs with four types of edges, and thus the associated independence models share the same hierarchy. The diagram is to be read transitively in the sense that, for example, BGs are also AGs, since the class of BGs forms a subclass of BCGs, which again forms a subclass of AGs; thus we omit the corresponding arrow from AG to BG.

The dashed arrow from DCG to UG indicates that although UGs are not DCGs, their associated independence models contain all independence models given by UGs and similarly for the dashed arrow from UCG to DG. The dotted arrow from SG to AG indicates that although AG is a subclass of SG, their associated independence models are the same. The dotted link between UG and DG indicates that the associated independence models are the same. These facts will be demonstrated in the next section.

## 3 Graphical independence models

Graphs are used to encode independence structures for graphical models; in this section we shall demonstrate how this can be done.

### 3.1 Independence models and compositional graphoids

An independence model $\mathcal{J}$ over a finite set $V$ is a set of triples $\langle A, B \mid C\rangle$ (called independence statements), where $A$, $B$, and $C$ are disjoint subsets of $V$; $C$ may be empty, but $\langle \emptyset, B \mid C\rangle$ and $\langle A, \emptyset \mid C\rangle$ are always included in $\mathcal{J}$. The independence statement $\langle A, B \mid C\rangle$ is read as “$A$ is independent of $B$ given $C$”. Independence models may have a probabilistic interpretation (see Section 3.4 for details), but this need not be the case. Similarly, not all independence models can be easily represented by graphs. For further discussion on general independence models, see [35].

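An independence model $\mathcal{J}$ over a set $V$ is a semi-graphoid if it satisfies the four following properties for disjoint subsets $A$, $B$, $C$, and $D$ of $V$:

(S1) $\langle A, B \mid C\rangle \in \mathcal{J}$ if and only if $\langle B, A \mid C\rangle \in \mathcal{J}$ (symmetry);

(S2) if $\langle A, B \cup D \mid C\rangle \in \mathcal{J}$ then $\langle A, B \mid C\rangle \in \mathcal{J}$ and $\langle A, D \mid C\rangle \in \mathcal{J}$ (decomposition);

(S3) if $\langle A, B \cup D \mid C\rangle \in \mathcal{J}$ then $\langle A, B \mid C \cup D\rangle \in \mathcal{J}$ and $\langle A, D \mid C \cup B\rangle \in \mathcal{J}$ (weak union);

(S4) $\langle A, B \mid C \cup D\rangle \in \mathcal{J}$ and $\langle A, D \mid C\rangle \in \mathcal{J}$ if and only if $\langle A, B \cup D \mid C\rangle \in \mathcal{J}$ (contraction).

A semi-graphoid for which the reverse implication of the weak union property holds is said to be a graphoid; that is, it also satisfies

(S5) if $\langle A, B \mid C \cup D\rangle \in \mathcal{J}$ and $\langle A, D \mid C \cup B\rangle \in \mathcal{J}$ then $\langle A, B \cup D \mid C\rangle \in \mathcal{J}$ (intersection).

Furthermore, a graphoid or semi-graphoid for which the reverse implication of the decomposition property holds is said to be compositional; that is, it also satisfies

(S6) if $\langle A, B \mid C\rangle \in \mathcal{J}$ and $\langle A, D \mid C\rangle \in \mathcal{J}$ then $\langle A, B \cup D \mid C\rangle \in \mathcal{J}$ (composition).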

### 3.2 Independence models induced by graphs

The notion of separation is fundamental for using graphs to represent models of independence. For a simple, undirected graph, separation has a direct intuitive meaning, so that a set of nodes $A$ is separated from a set $B$ by a set $C$ if all walks from $A$ to $B$ intersect $C$. Notice that simple separation in an undirected graph will trivially satisfy all of the properties (S1)–(S6) above, and hence compositional graphoids are abstractions of independence models given by separation in undirected graphs. For more general graphs, separation may be more subtle, to be elaborated below.
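The claim that undirected separation satisfies the six properties can be checked by brute force on a small example. The Python sketch below is a sanity check rather than a proof, and its names (`separated_ug`, `satisfies_s1_to_s6`) and the adjacency-dictionary encoding are assumptions of this illustration. It enumerates every assignment of the nodes to four disjoint sets (or to none of them) and tests each property for separation by graph search.

```python
from itertools import product


def separated_ug(adj, A, B, C):
    """True if every walk from A to B in the undirected graph meets C.
    A, B, C are assumed pairwise disjoint."""
    A, B, C = set(A), set(B), set(C)
    seen, stack = set(A), list(A)
    while stack:
        v = stack.pop()
        if v in B:
            return False
        for w in adj.get(v, ()):
            if w not in seen and w not in C:
                seen.add(w)
                stack.append(w)
    return True


def satisfies_s1_to_s6(adj):
    """Exhaustively test the six properties for all disjoint A, B, C, D."""
    nodes = sorted(adj)
    for labels in product("ABCD-", repeat=len(nodes)):
        A, B, C, D = ({v for v, l in zip(nodes, labels) if l == s}
                      for s in "ABCD")

        def sep(X, Y, Z):
            return separated_ug(adj, X, Y, Z)

        joint = sep(A, B | D, C)
        checks = [
            sep(A, B, C) == sep(B, A, C),                        # symmetry
            not joint or (sep(A, B, C) and sep(A, D, C)),        # decomposition
            not joint or (sep(A, B, C | D) and sep(A, D, C | B)),  # weak union
            (sep(A, B, C | D) and sep(A, D, C)) == joint,        # contraction
            not (sep(A, B, C | D) and sep(A, D, C | B)) or joint,  # intersection
            not (sep(A, B, C) and sep(A, D, C)) or joint,        # composition
        ]
        if not all(checks):
            return False
    return True


# Hypothetical example: the four-cycle a - b - c - d - a.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
print(separated_ug(cycle, {"a"}, {"c"}, {"b", "d"}), satisfies_s1_to_s6(cycle))
```

On the four-cycle, `a` and `c` are separated by `{b, d}` but not by `{b}` alone, and the exhaustive check finds no violation of any property.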

We say that a walk $\omega$ in a graph is connecting given $C$ if all collider sections of $\omega$ intersect $C$ and all non-collider sections are disjoint from $C$. For pairwise disjoint subsets $\langle A, B, C\rangle$, we say that $A$ and $B$ are separated by $C$ if there are no connecting walks between $A$ and $B$ given $C$, and we use the notation $A \perp B \mid C$. The set $C$ is called an $(A, B)$-separator.
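Since walks may revisit nodes, the separation criterion can be decided by a search over a finite set of states rather than by enumerating walks. The following Python sketch is one possible way to do this and is not taken from the paper: the edge encoding `(u, v, kind)`, the mark names `head`, `tail`, `dot`, and the function names are all assumptions of this illustration. A state records the current node, the edge mark with which the current section was entered, and whether the section has met the conditioning set so far; a section is treated as a collider when two arrowheads meet at it or an arrowhead meets a dotted line, as described in Section 2.1.

```python
from collections import deque

# (entry mark, exit mark) pairs that make a section a collider.
COLLIDERS = {("head", "head"), ("head", "dot"), ("dot", "head")}


def _steps(edges):
    """For each node, list (exit mark, neighbour, entry mark) traversals."""
    steps = {}
    for u, v, kind in edges:
        if kind == "arrow":                       # arrow u -> v
            fwd, bwd = ("tail", "head"), ("head", "tail")
        elif kind == "arc":
            fwd = bwd = ("head", "head")
        elif kind == "dotted":
            fwd = bwd = ("dot", "dot")
        elif kind == "line":
            fwd = bwd = ("line", "line")
        else:
            raise ValueError(f"unknown edge kind: {kind}")
        steps.setdefault(u, []).append((fwd[0], v, fwd[1]))
        steps.setdefault(v, []).append((bwd[0], u, bwd[1]))
    return steps


def separated(edges, A, B, C):
    """True if there is no connecting walk between A and B given C.
    A, B, C are assumed pairwise disjoint."""
    A, B, C = set(A), set(B), set(C)
    steps = _steps(edges)
    start = [(a, "start", a in C) for a in A]
    seen, queue = set(start), deque(start)
    while queue:
        node, entry, hit = queue.popleft()
        # An endpoint section is a non-collider and must avoid C.
        if node in B and not hit:
            return False
        for exit_mark, nxt, next_entry in steps.get(node, []):
            if exit_mark == "line":               # stay in the same section
                state = (nxt, entry, hit or nxt in C)
            else:                                 # the section is closed here
                collider = (entry, exit_mark) in COLLIDERS
                if collider != hit:               # collider must meet C, others not
                    continue
                state = (nxt, next_entry, nxt in C)
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return True


# Hypothetical graphs a -> c - d <- b (line) and a -> c ... d <- b (dotted).
lwf = [("a", "c", "arrow"), ("c", "d", "line"), ("b", "d", "arrow")]
amp = [("a", "c", "arrow"), ("c", "d", "dotted"), ("b", "d", "arrow")]
print(separated(lwf, {"a"}, {"b"}, {"c"}), separated(amp, {"a"}, {"b"}, {"c"}))
```

The two hypothetical graphs differ only in the type of the symmetric edge, yet conditioning on `c` connects `a` and `b` in the first (the section containing `c` and `d` is a single collider section) and leaves them separated in the second (both `c` and `d` are collider nodes, and `d` is not conditioned on).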

The notion of separation above is a generalization of the $c$-separation for UCGs as defined in [34; 36]. The idea of using walks to simplify the separation theory was proposed by [15], who showed that, for DAGs, this notion of separation was identical to $d$-separation [24].

For example, in the graph of Fig. 4, and do not hold. The former can be seen by looking at the connecting walk , where the only node and the node of the collider sections and are in the potential separator set . The latter can be seen by looking at the connecting walk , where the non-collider sections and are outside , but collider sections (nodes) and are inside However, for example, and since, in the former case, collider section is blocking all the walks and, in the latter case, one of the collider sections or is blocking any walk.

A graph $G$ induces an independence model $\mathcal{J}(G)$ by separation, letting $\langle A, B \mid C\rangle \in \mathcal{J}(G) \iff A \perp B \mid C$. It turns out that any independence model defined in this way shares the six fundamental properties of undirected graph separation. More precisely we have the following:

###### Theorem 1.

For any graph $G$, the independence model $\mathcal{J}(G)$ is a compositional graphoid.

###### Proof.

Let , and consider disjoint subsets , , , and of . We verify each of the six properties separately.

1) Symmetry: If then : If there is no connecting walk between and given then there is no connecting walk between and given .

2) Decomposition: If then : If there is no connecting walk between and given then there is a fortiori no connecting walk between and given .

3) Weak union: If then : Using decomposition 2) yields and . Suppose, for contradiction, that there exists a connecting walk between and given . If there is no collider section on then there is a connecting walk between and given , a contradiction. On , all collider sections must have a node in . If all collider sections have a node in then there is a connecting walk between and given , again a contradiction. Hence consider first the collider section nearest on that only has nodes in on ; next, consider the closest node to on that is in . The subwalk between and then contradicts .

4) Contraction: If and then : Suppose, for contradiction, that there exists a connecting walk between and given . Consider a shortest walk (i.e. a walk with fewest number of edges) of this type and call it . The walk is either between and or between and . The walk being between and contradicts . Therefore, is between and . In addition, since all collider sections on have a node in and , a non-collider section of must exist that has a node in , and, therefore, in . This contradicts the fact that is a shortest connecting walk between and given .

5) Intersection: If and then : Suppose, for contradiction, that there exists a connecting walk between and given . Consider a shortest walk of this type and call it . The walk is either between and or between and . Because of symmetry between and in the formulation, it is enough to suppose that is between and . Since all collider sections on have a node in and , a non-collider section of must exist that has a node in , and, therefore, in . This contradicts the fact that is a shortest connecting walk between and given .

6) Composition: If and then : Suppose, for contradiction, that there exist connecting walks between and given . Consider a walk of this type and call it . Walk is either between and or between and . Because of symmetry between and in the formula it is enough to suppose that is between and . But this contradicts . ∎

This theorem implies that we can focus on establishing conditional independence for pairs of nodes, formulated in the corollary below.

###### Corollary 1.

For a graph $G$ and disjoint subsets of nodes $A$, $B$, and $C$, it holds that $A \perp B \mid C$ if and only if $i \perp j \mid C$ for every pair of nodes $i \in A$ and $j \in B$.

###### Proof.

The result follows from the fact that $\mathcal{J}(G)$ satisfies decomposition and composition. ∎
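The argument can be spelled out as a short chain of implications. The following is only a sketch, writing $i$ and $j$ for single nodes and using symmetry alongside decomposition and composition:

$$
\begin{aligned}
A \perp B \mid C &\;\Longrightarrow\; i \perp j \mid C \quad (i \in A,\ j \in B) && \text{(decomposition, symmetry)}\\
i \perp j \mid C \ \text{for all } j \in B &\;\Longrightarrow\; i \perp B \mid C && \text{(composition)}\\
i \perp B \mid C \ \text{for all } i \in A &\;\Longrightarrow\; A \perp B \mid C && \text{(symmetry, composition)}
\end{aligned}
$$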

### 3.3 Relation to other separation criteria

Four different types of independence models have previously been associated with chain graphs. These are known as the LWF Markov property, defined by [18] and later studied by e.g. [9; 36]; the AMP Markov property, defined and studied by [1]; and the multivariate regression (MR) Markov property, introduced by [2] and studied e.g. by [20]. In addition, Drton [6] briefly considered a type III chain graph Markov property, which we do not consider further here.

Traditionally these have been formulated using undirected chain graphs but with different separation criteria. In contrast, here we use a single notion of separation and the different independence models appear by varying the type of chain graph. In particular, the LWF Markov property corresponds to UCGs, the MR Markov property to BCGs, and the AMP Markov property to DCGs, as we shall see below.

Table 1 gives an overview of different types of colliders used in the various independence models associated with chain graphs.

For summary graphs and their subclasses, [29] showed that the unifying separation concept was that of $m$-separation, defined as follows. A path $\pi$ is $m$-connecting given $C$ if every collider node on $\pi$ is in $C \cup \mathrm{an}(C)$ and every non-collider node on $\pi$ is outside $C$. Notice that $m$-separation considers nodes, but the fact that there is no arrowhead pointing to a node that is an endpoint of a line in a summary graph implies that every collider section of any walk consists of a single node. For pairwise disjoint subsets $\langle A, B, C\rangle$, $A$ and $B$ are $m$-separated by $C$ if there are no $m$-connecting paths between $A$ and $B$ given $C$, and we use the notation $A \perp_m B \mid C$ to indicate this. The following lemma establishes that for summary graphs (and all subclasses of these), $m$-separation is equivalent to the separation we have defined here. The idea is similar to that employed in [15].

###### Lemma 2.

Suppose that $G$ is a summary graph. Then

$$A \perp B \mid C \iff A \perp_m B \mid C.$$

###### Proof.

We need to show that for , there is a connecting walk between and if and only if there is an -connecting path between and given . If there is an -connecting path between and then there exists a connecting walk between and by taking and add the possible directed path from a collider node on to and its reverse from to .

Thus suppose that there is a connecting walk between and . Since there are no arrowheads pointing to nodes that are endpoints of lines, all collider sections on are single nodes; and hence we can talk of collider nodes instead of sections. Consider the walk between and obtained from by replacing any subwalk of type (for a subwalk ) by a single node subwalk . First of all, it is clear that the resulting walk is a path. Denote this path by . We show that an -connecting path can be constructed from :

It is not possible that a node that occurs (at least once) as a collider on and occurs also as a member of a non-collider section on : If is a collider node on then it is in . This means that there is an arrowhead at on all tripaths with inner node on . Hence, regardless of which two edges of with endpoint are on , the corresponding tripath remains collider.

Therefore, all non-collider nodes on are outside . If all collider nodes are in then we are done. Thus suppose that there is a collider node (on collider tripath ) on that is not in . This means that, on , is always within a non-collider section. Consider an edge on that is a part of the subwalk of , and notice that this edge is not on . The edge is not a line as otherwise there is an arrowhead pointing to an endpoint of a line. As the edge itself has no arrowhead at it must be an arrow from to . Following through from , inductively, we have three cases: 1) There exists a directed cycle, which is impossible. 2) is an ancestor of a collider node : We have that , and hence is an ancestor of . 3) is an ancestor of or : Without loss of generality, assume that . In this case, we modify by replacing the subwalk between and by a directed path from to . Notice that no node on this path is in . This completes the proof. ∎

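For MAMPs, [22] provides a generalization of the $p$-separation [19] for AMP chain graphs. In the language and notation of this paper, it is defined as follows: A path $\pi$ is $z$-connecting given $C$ ($z$ is our notation) for MAMPs if every collider node on $\pi$ is in $C \cup \mathrm{an}(C)$ and every non-collider node $k$ is outside $C$ unless there is a subpath of $\pi$ formed by two dotted lines meeting at $k$, such that $\mathrm{sp}(k) \neq \emptyset$ or $\mathrm{pa}(k) \setminus C \neq \emptyset$. We say that $A$ and $B$ are $z$-separated given $C$, and write $A \perp_z B \mid C$, if there is no $z$-connecting path between $A$ and $B$ given $C$.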

###### Lemma 3.

Suppose that $G$ is a MAMP. Then

$$A \perp B \mid C \iff A \perp_z B \mid C.$$

###### Proof.

We need to show that for , there is a connecting walk between and if and only if there is a -connecting path between and given . If there is a -connecting path between and we may construct a connecting walk between and by modifying as follows: 1) for a collider node , add a directed path from to and its reverse from to ; 2) for a non-collider node within in (see the definition of -separation), we distinguish two cases: if then one has by the definition of MAMP and one can shorten the tripath on ; if but exists then add the edge and its reverse to .

Thus suppose that there is a connecting walk between and . Since there are no lines, all sections on are single nodes; and hence we can talk of collider and non-collider nodes instead of sections. Similar to Lemma 2, consider the walk between and obtained from , and whenever there is a node with repeated occurrence in , replace the cycle from to in by a single occurrence of . The resulting walk is a path, denoted by . We show that -connecting path can be constructed from :

The only case where a node is a collider node on and it turns into a non-collider node on is when is the inner node of the tripath on . Therefore, all non-collider nodes on are outside unless this mentioned case occurs. However, in this case either or , which ensures that the condition of the definition of a -connecting path is still satisfied.

If all collider nodes are in then we are done. Thus suppose that there is a collider node (on collider tripath ) on that is not in . This means that, on , is always a non-collider node. There is an arrowhead at on at least one of the or the edges. Without loss of generality, assume that it is the edge. Consider an edge on that is a part of the subwalk of , and notice that this edge is not on . As the edge itself has no arrowhead at and is not a dotted line, it must be an arrow from to . Following through from , inductively, we have three cases: 1) There exists a directed cycle, which is impossible. 2) is an ancestor of a collider node : We have that , and hence is an ancestor of . 3) is an ancestor of or : Without loss of generality, assume that . In this case, we modify by replacing the subwalk between and by a directed path from to . Notice that no node on this path is in . This completes the proof. ∎

We are now ready to show that our concept of separation unifies the independence models discussed.

###### Theorem 2.

Independence models generated by separation in graphs with four types of edges are identical to the independence models associated with the subclasses in Fig. 3.

###### Proof.

It is shown in [29] that -separation, as defined above, unifies independence models for SGs and subclasses thereof, and by Lemma 2 -separation is equivalent to our separation. The separation criterion in [28] for CMGs is identical to the separation given here when there are no dotted lines in the graph. Hence, our separation criterion unifies the independence models for all the subclasses of CMGs. Lemma 3 shows that Peña’s separation criterion becomes identical to ours once dotted lines replace lines. For AADMGs, Criterion 2, defined as the global Markov property in [23], is trivially a special case of the separation defined here. Therefore, our criterion unifies independence models in all subclasses of graphs. ∎

Notice that most of the associated classes of independence models presented in the diagram of Fig. 3 are distinct; exceptions are AGs and SGs, which are alternative representations of the same class of independence models, and the same holds for DGs and UGs. In addition, it can be seen from Table 1 that every type of chain graph needs a different type of symmetric edge, since each of them forms different colliders; hence, the unification for the general class of graphs with four types of edges is not achieved by graphs with three types of edges.

### 3.4 Probabilistic independence models and the global Markov property

Consider a set and a collection of random variables with state spaces and joint distribution . We let etc. for each subset of . For disjoint subsets , , and of we use the short notation to denote that is conditionally independent of given [4; 16], i.e. that for any measurable and -almost all and ,

$$P(X_A \in \Omega \mid X_B = x_B, X_C = x_C) = P(X_A \in \Omega \mid X_C = x_C).$$

We can now induce an independence model by letting

$$\langle A, B \mid C \rangle \in J(P) \ \text{ if and only if } \ A \perp\!\!\!\perp B \mid C \ \text{ w.r.t. } P.$$
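For finite state spaces the defining identity above can be checked mechanically. The following is a minimal sketch, not part of the paper: the function names `marginal` and `cond_indep`, the representation of a joint distribution as a dictionary from outcome tuples to probabilities, and the binary chain used as an example are all assumptions made for illustration. It relies on the fact that, for discrete variables, the identity is equivalent to P(a, b, c) P(c) = P(a, c) P(b, c) for all configurations.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx (a list of positions)."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def cond_indep(joint, A, B, C):
    """True if X_A is independent of X_B given X_C under the pmf `joint`.

    Checks P(a, b, c) * P(c) == P(a, c) * P(b, c) for all a, b, c,
    which is equivalent to the conditional identity for finite state spaces.
    """
    A, B, C = list(A), list(B), list(C)
    p_abc = marginal(joint, A + B + C)
    p_ac = marginal(joint, A + C)
    p_bc = marginal(joint, B + C)
    p_c = marginal(joint, C)
    zero = Fraction(0)
    for a, b, c in product(marginal(joint, A), marginal(joint, B), p_c):
        lhs = p_abc.get(a + b + c, zero) * p_c[c]
        rhs = p_ac.get(a + c, zero) * p_bc.get(b + c, zero)
        if lhs != rhs:
            return False
    return True


def stay(u, v):
    """Hypothetical transition probability: keep the previous value w.p. 3/4."""
    return Fraction(3, 4) if u == v else Fraction(1, 4)


# Hypothetical example: a binary chain X0 -> X1 -> X2.
chain = {
    (x0, x1, x2): Fraction(1, 2) * stay(x0, x1) * stay(x1, x2)
    for x0, x1, x2 in product((0, 1), repeat=3)
}

given_middle = cond_indep(chain, [0], [2], [1])  # X0 vs X2 given X1
marginally = cond_indep(chain, [0], [2], [])     # X0 vs X2 with nothing given
```

In this example the two endpoints of the chain are independent given the middle variable but not marginally. Exact rational arithmetic is used so that the equality test is not affected by rounding.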

We note that for a probabilistic independence model , the marginal independence model to a set is the independence model generated by the marginal distribution. More formally, we define the marginal independence model over a subset of the node set as follows:

 α(J,M)={⟨A,B|C⟩:⟨A,B|C⟩∈J and (A∪B∪C)∩M=∅},

which is defined over .
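The marginalization operation can be written directly from the displayed definition. This is a small sketch under assumed conventions that are not in the paper: an independence model is stored as a Python set of triples `(A, B, C)` of frozensets, and `fs` and `marginal_model` are hypothetical helper names.

```python
def fs(*items):
    """Shorthand for building a frozenset of node labels."""
    return frozenset(items)


def marginal_model(J, M):
    """alpha(J, M): keep the triples <A, B | C> of J that do not meet M."""
    M = frozenset(M)
    return {(A, B, C) for (A, B, C) in J if not ((A | B | C) & M)}


# Hypothetical independence model over the nodes {1, 2, 3, 4}.
J = {
    (fs(1), fs(3), fs(2)),
    (fs(1), fs(4), fs()),
    (fs(1), fs(3), fs(2, 4)),
}

# Marginalizing over node 4 drops every statement that mentions node 4.
J_marg = marginal_model(J, {4})
```

Only the statement ⟨1, 3 | 2⟩ survives here, since the other two statements involve the marginalized node either as an endpoint or in the conditioning set.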

###### Lemma 4.

Let be a probabilistic independence model; its marginal independence model is the independence model generated by the marginal distribution, i.e. for we have

###### Proof.

This is immediate. ∎

For a graph , an independence model defined over satisfies the global Markov property w.r.t. a graph , if for disjoint subsets , , and of it holds that

 A⊥B|C⟹⟨A,B|C⟩∈J.

If satisfies the global Markov property w.r.t. a graph , we also say that is Markov w.r.t. . We say that an independence model is probabilistic if there is a distribution such that . We then also say that is faithful to . If is faithful to for a graph then we also say that is faithful to . Thus, if is faithful to it is also Markov w.r.t. .
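The difference between being Markov and being faithful amounts to inclusion versus equality of two sets of statements. The sketch below is only an illustration of that distinction: the names `is_markov` and `is_faithful` are hypothetical, the graph-induced statements are listed by hand rather than computed from a separation criterion, and only statements between single nodes are listed.

```python
def fs(*items):
    """Shorthand for building a frozenset of node labels."""
    return frozenset(items)


def is_markov(graph_model, J):
    """Global Markov property: every separation statement is in J."""
    return graph_model <= J


def is_faithful(graph_model, J):
    """Faithfulness: J contains exactly the separation statements."""
    return graph_model == J


# Statements induced by the undirected chain 1 - 2 - 3, listed by hand
# (an assumption for illustration; restricted to single nodes).
graph_model = {
    (fs(1), fs(3), fs(2)),
    (fs(3), fs(1), fs(2)),
}

# A hypothetical model with one extra (marginal) independence.
J_extra = graph_model | {
    (fs(1), fs(3), fs()),
    (fs(3), fs(1), fs()),
}

markov_only = is_markov(graph_model, J_extra) and not is_faithful(graph_model, J_extra)
```

The model `J_extra` is Markov with respect to the chain but not faithful to it, because it contains an independence that the graph does not imply.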

Probabilistic independence models are always semi-graphoids [24], whereas the converse is not necessarily true; see [33]. If, for example, has strictly positive density, the induced independence model is always a graphoid; see e.g. Proposition 3.1 in [16]. If the distribution is a regular multivariate Gaussian distribution, is a compositional graphoid; see e.g. [35].

Probabilistic independence models with positive densities are not in general compositional; composition holds only for special types of multivariate distributions, such as Gaussian distributions and the symmetric binary distributions used in [41]. However, the following statement implies that it is not uncommon for a probabilistic independence model to satisfy composition:

###### Proposition 1.

If there is a graph to which is faithful, then is a compositional graphoid.

###### Proof.

The result follows from Theorem 1 since then . ∎
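To make the failure of composition concrete, the sketch below checks a standard counterexample that is not taken from the paper: two independent fair bits and their exclusive-or. Its distribution is not strictly positive, so it only illustrates that probabilistic independence models need not be compositional in general. The helper names and the dictionary representation of the distribution are assumptions. By Proposition 1, a distribution like this one cannot be faithful to any graph in the class considered here.

```python
from fractions import Fraction
from itertools import product


def marginal(joint, idx):
    """Marginal pmf of the coordinates listed in idx (a list of positions)."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def cond_indep(joint, A, B, C):
    """True if X_A is independent of X_B given X_C (finite state spaces)."""
    A, B, C = list(A), list(B), list(C)
    p_abc = marginal(joint, A + B + C)
    p_ac = marginal(joint, A + C)
    p_bc = marginal(joint, B + C)
    p_c = marginal(joint, C)
    zero = Fraction(0)
    for a, b, c in product(marginal(joint, A), marginal(joint, B), p_c):
        if p_abc.get(a + b + c, zero) * p_c[c] != p_ac.get(a + c, zero) * p_bc.get(b + c, zero):
            return False
    return True


# X1, X2 are independent fair bits and X3 = X1 XOR X2 (positions 0, 1, 2).
xor = {(x1, x2, x1 ^ x2): Fraction(1, 4) for x1, x2 in product((0, 1), repeat=2)}

first = cond_indep(xor, [0], [1], [])      # <X1, X2 | empty set>
second = cond_indep(xor, [0], [2], [])     # <X1, X3 | empty set>
joint_stmt = cond_indep(xor, [0], [1, 2], [])  # <X1, {X2, X3} | empty set>

# Composition would require: first and second imply joint_stmt.
composition_holds = (not (first and second)) or joint_stmt
```

Both pairwise statements hold while the joint statement fails, so this independence model violates composition.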

## 4 Maximality for graphs

A graph is called maximal if adding an edge between any two non-adjacent nodes in changes the independence model . Notice that in [29] the non-adjacency condition was incorrectly omitted.

Conditions 2 and 3 of Section 2.2, which MAMPs satisfy, ensure that MAMPs are maximal; see [22]. However, graphs are not maximal in general. For example, there exist non-maximal ancestral and summary graphs [26; 29]. See also Fig. 5 for an example of a graph that is neither a summary graph (hence it is not ancestral) nor maximal. This CMG induces no independence statements of the form for any choice of : if we condition on or or both, the path is connecting since is a collider section; conditioning on makes the walk a connecting walk, and if we do not condition on anything, the walk is connecting.

The notion of maximality is important for pairwise Markov properties, to be discussed in the next section. For a non-maximal ancestral or summary graph, one can obtain a maximal ancestral or summary graph with the same induced independence model by adding edges to the original graph [26; 29]. As we shall show below, this is also true for general CMGs, but it is not generally the case for graphs containing dotted lines or directed cycles. Fig. 6 displays two small non-maximizable graphs, where the graph in (a) contains a directed cycle.

For example, in the directed graph of Fig. 6(a), in order to make the graph maximal, one must connect and , and similarly and . Now notice that in the original graph it holds that and . However, after introducing new and edges, regardless of what type of edge we add, one of or does not hold.

To characterise maximal CMGs we need the following notion: A walk is a primitive inducing walk between and () if and only if it is an edge or where for every , , it holds that

(i) all inner sections of are colliders;

(ii) endpoint sections of are single elements;

(iii) .

This definition is an extension of the notion of a primitive inducing path as defined for ancestral graphs in [26]. For example, in Fig. 5, is a primitive inducing walk. Next we need the following lemmas:

###### Lemma 5.

In a CMG, inner nodes of a walk between and that are on a non-collider section are either in or anteriors of a collider section on .

###### Proof.

Let be an inner node of and on a non-collider section on a walk in a CMG . Then from at least one side (say from ) there is no arrowhead on pointing to the section containing . By moving towards on the path as long as , , is on a non-collider section on the walk, we obtain that . This implies that if no is on a collider section then . ∎

###### Lemma 6.

For nodes and in a CMG that are not connected by any primitive inducing walks (and hence ), it holds that .

###### Proof.

Suppose that there is a connecting walk between and given .

If or are on a non-collider inner section on then is contained in since otherwise any other node in would be in , which is impossible. Then contains either only or only since is not an edge in the graph. Thus, is either single or single . In such a case remove the cycle between and (or between and ), which is a subwalk of . Repeat this process until there are no such non-collider sections. Denote the resulting walk by . We shall show that is primitive inducing:

(i) If, for contradiction, there is a node on an inner non-collider section of then, by Lemma 5, is either in or it is an anterior of nodes of a collider section on , but since is connecting given , collider sections intersect and hence are in themselves. (Hence, .) Now, contradicts the fact that is connecting given .

(ii) Unless is a line, endpoint sections of are single elements since they are non-collider on and, if not single elements, their members, excluding or , are in , which is impossible.

(iii) This condition is clear since all inner nodes are in collider sections and consequently (except for possibly or ) in . ∎

###### Lemma 7.

The only primitive inducing walk between and without arrowheads at its endpoints (i.e.  and ) is the line .

###### Proof.

Consider such a walk : Suppose, for contradiction, that there are nodes other than on , and assume that is the node adjacent to the endpoint on (i.e., there is ). Notice that since otherwise , , for some , and lead to a contradiction.

Then the lack of semi-directed cycles implies that and hence there is another node on . Similarly for adjacent to on , . But we may then construct a semi-directed cycle by taking the edge, the anterior path from to , the edge, and the anterior path from back to , a contradiction. ∎

Next we say that two walks and (including edges) between and are endpoint-identical if there is an arrowhead pointing to the endpoint section containing in if and only if there is an arrowhead pointing to the endpoint section containing in and similarly for . For example, the paths , , and are all endpoint-identical as they have an arrowhead pointing to the section containing but no arrowhead pointing to the section containing on the paths, but they are not endpoint-identical to . We then have the following:

###### Lemma 8.

Let be a CMG with the node set . If there is a primitive inducing walk between and in , and , then there exists a connecting walk between and given that is endpoint-identical to .

###### Proof.

We denote the sections of the primitive inducing walk by and note that if a section intersects for any set , it holds that . By Lemma 7, it is enough to consider two cases:

Case 1) There is an arrowhead at and no arrowhead at on : First notice that the edge is an arrow from to . We construct an endpoint-identical connecting walk given between and . We start from and move towards on via where . As long as along , a section , , intersects , we do the following: If then we let move to . If but then we let move from to via an anterior path and back to by reversing this path, subsequently continuing to using the corresponding edge from .

So suppose that possibly reaches a section not intersecting . Note that cannot only contain since otherwise it intersects (through ). If only contains then we already have a connecting walk. Hence, the only case that is left is when there is a such that . If then notice that is an anterior of through and , which is impossible. Thus with no nodes on the anterior path in . We can now complete by letting it move to via this anterior path.

Notice that is endpoint-identical to since both have an arrowhead at and no arrowhead at .

Case 2) There is an arrowhead at and an arrowhead at on : We follow the same method as in Case 1 to construct . The only difference is that can be in without being an anterior of . (In fact, and may be on the same section on .) In this case we entirely replace the already constructed part of by the reverse of the anterior path from to (which is from to ), and let proceed to .

Again it is clear that the constructed and have an arrowhead at . If and are not in the same section or are not connected by an undirected path then it is clear that there is an arrowhead at , which is a single-node section on . If and are in the same section or are connected by an undirected path then there is an arrowhead at the endpoint section of that contains . ∎

Next, in Theorem 3 we give a necessary and sufficient condition for a CMG to be maximal. The analogous result for ancestral graphs was proved in Theorem 4.2 of [26].

###### Theorem 3.

A CMG is maximal if and only if does not contain any primitive inducing walks between non-adjacent nodes.

###### Proof.

(⇒) Let be a primitive inducing walk between non-adjacent nodes and . By Lemma 8, there is therefore an endpoint-identical connecting walk between and given any choice of ; thus, there is clearly no separation of the form . Let us add an endpoint-identical edge to . If a separation is destroyed then the edge is a part of the connecting walk given between and . Now by replacing by on , we clearly obtain a walk in that is connecting given . This implies that adding does not change ; hence, is not maximal.

(⇐) By letting for every non-adjacent pair of nodes and and using Lemma 6, we conclude that for every missing edge there is an independence statement in . This implies that is maximal. ∎

It now follows that for maximal graphs, every missing edge corresponds to a pairwise conditional independence statement in :

###### Corollary 2.

A CMG is maximal if and only if every missing edge in corresponds to a pairwise conditional independence statement in .

###### Proof.

(⇐) is clear. (⇒) follows from Theorem 3 and Lemma 6. ∎

Also, we have the following corollary.

###### Corollary 3.

If is a non-maximal CMG then it can be made maximal by adding edges without changing its independence model.

###### Proof.

We begin with a non-maximal CMG , and show that we can “close” all the primitive inducing walks in order to obtain a maximal graph with the same induced independence model. For every primitive inducing walk between and where in , add an edge that is endpoint-identical to if an edge of the same type does not already exist.

First we show that the resulting graph is a CMG: It is enough to show that an added edge does not generate a semi-directed cycle. By Lemma 7, the added edge is either an arrow or an arc. Since arcs never lie on a semi-directed cycle, adding an arc cannot generate one. Thus suppose that the added edge is an arrow from to . Notice that the node adjacent to on the primitive inducing walk is in and the edge is an arrow from to . Hence, if, for contradiction, the added arrow generates a semi-directed cycle, a semi-directed cycle already existed in the original graph, where is replaced by the anterior walk that consists of the arrow and the anterior walk from to . This is a contradiction.

Now, since the resulting graph does not contain any primitive inducing walks between non-adjacent nodes, it is maximal. In addition, by Lemma 8, there is a connecting walk between and , which is endpoint-identical to the primitive inducing walk. One can replace the endpoint-identical edge to this walk in any connecting walk in that contains as a subwalk. ∎

For example, in Fig. 5, is a primitive inducing walk; hence this graph is not maximal. Adding the edge makes it maximal.

## 5 Pairwise Markov properties for chain mixed graphs

### 5.1 A pairwise Markov property

It is possible to consider a general pairwise Markov property for specific subclasses of graphs with four types of edges (that actually have the four types) by including the results of [22; 23], which define pairwise Markov properties for marginal AMP chain graphs and alternative ADMGs and show the equivalence of pairwise and global Markov properties for such graphs. However, such a unification would be technically complex. We therefore focus on CMGs from here on; in particular, the considerations here concerning pairwise Markov properties do not cover AMP chain graphs.

A pairwise Markov property provides independence statements for non-adjacent pairs of nodes in the graph. For maximal graphs any non-adjacent nodes and are independent given some set , but a pairwise Markov property yields a specific choice of for every non-adjacent pair . The choice we provide here for any CMG immediately extends the choice in [29]. We show that for a maximal CMG, this pairwise Markov property is equivalent to the global Markov property for compositional graphoid independence models; in other words, the pairwise statements combined with the compositional graphoid axioms generate the full independence model. The maximality is critical for the pairwise statements to hold, as discussed above.

An independence model defined over satisfies the pairwise Markov property (P) w.r.t. a CMG if for every pair of nodes and with it holds that

 (P):⟨i,j|ant({i,j})⟩∈J.

The pairwise Markov property simplifies for specific subclasses of graphs. For connected UGs we have and hence the standard pairwise Markov property appears; and for BGs we have , so the property is identical to pairwise independence of non-adjacent nodes. For SGs and AGs (which include DAGs), a semi-direction preserving path is of the form , hence the anterior path (and consequently (P)) specializes to those in [29] and [26] respectively.

Strictly speaking, the unification only contains “connected” UGs. It is not possible to extend the unification to all UGs and at the same time keep the pairwise Markov properties defined in the literature for other classes under any unified pairwise Markov property: In principle, it is fine to add nodes that are not in the connected component(s) of and to the conditioning set in any pairwise Markov property. However, although the well-known pairwise Markov property for UGs contains all such nodes, the known pairwise Markov properties for other classes do not.
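The conditioning sets used by (P) can be computed by a backward search. The sketch below is an illustration under stated assumptions, since the relevant definitions lie outside this section: an anterior path into a node is taken to consist of lines and of arrows pointing towards that node (no arcs), and ant(A) is taken to be the set of nodes outside A with an anterior path to some member of A. The function names and the edge-list representation are hypothetical, and the code does not check that the input is a CMG or that it is maximal.

```python
from itertools import combinations


def anteriors(nodes, lines, arrows, A):
    """Nodes outside A with an anterior path into A (assumed definition)."""
    back = {v: set() for v in nodes}
    for u, v in lines:       # line u --- v can be traversed both ways
        back[u].add(v)
        back[v].add(u)
    for u, v in arrows:      # arrow u --> v is traversed backwards from v
        back[v].add(u)
    seen = set(A)
    stack = list(A)
    while stack:
        v = stack.pop()
        for u in back[v] - seen:
            seen.add(u)
            stack.append(u)
    return frozenset(seen - set(A))


def pairwise_sets(nodes, lines, arrows, arcs):
    """Map each non-adjacent pair (i, j) to the conditioning set ant({i, j})."""
    adjacent = {frozenset(e) for e in list(lines) + list(arrows) + list(arcs)}
    return {
        (i, j): anteriors(nodes, lines, arrows, {i, j})
        for i, j in combinations(sorted(nodes), 2)
        if frozenset((i, j)) not in adjacent
    }


# Directed chain a -> b -> c.
dag = pairwise_sets(["a", "b", "c"], [], [("a", "b"), ("b", "c")], [])

# Connected undirected path 1 - 2 - 3 - 4.
ug = pairwise_sets([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)], [], [])

# Bidirected chain a <-> b <-> c.
bg = pairwise_sets(["a", "b", "c"], [], [], [("a", "b"), ("b", "c")])
```

In these three small examples the conditioning sets reduce to the middle node of the directed chain, to all remaining nodes of the connected undirected graph, and to the empty set for the bidirected graph, which matches the specializations described above.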

### 5.2 Equivalence of pairwise and global Markov properties

Before establishing the main result of this section, we need several lemmas. We shall need to consider marginalization of independence models and use that it preserves the compositional graphoid property, as shown in Lemma 8 of [29]:

###### Lemma 9.

Let be a compositional graphoid over a set and a subset of . It then holds that the marginal independence model is also a compositional graphoid.

Moreover, we have

###### Lemma 10.

Let be the independence model induced by a CMG and . If is an anterior set, the marginal model is determined by the induced subgraph :

 α(J(G),M)=J(G[D]).

###### Proof.

We need to show that for we have that if and only if this is true in the induced subgraph . Clearly, if a connecting walk between and runs entirely within it also connects in . Assume for contradiction that there is a connecting walk which has a node outside and consider an excursion on the walk that leaves at , reaches , and reenters into at . Since the walk is connecting, there are no collider sections on this excursion and thus it follows from Lemma 5 that is either anterior to or to , which contradicts the fact that is an anterior set. ∎
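The hypothesis of Lemma 10 is easy to test on small graphs. The following sketch assumes, consistently with the proof above, that a set is anterior when no node outside it has an anterior path into it, where anterior paths are taken to use lines and forward-pointing arrows only. The function names and the edge-list representation are hypothetical, and the code only checks the hypothesis and forms the induced subgraph; it does not compute any independence model.

```python
def anteriors(nodes, lines, arrows, A):
    """Nodes outside A with an anterior path into A (assumed definition)."""
    back = {v: set() for v in nodes}
    for u, v in lines:
        back[u].add(v)
        back[v].add(u)
    for u, v in arrows:
        back[v].add(u)
    seen = set(A)
    stack = list(A)
    while stack:
        v = stack.pop()
        for u in back[v] - seen:
            seen.add(u)
            stack.append(u)
    return frozenset(seen - set(A))


def is_anterior_set(nodes, lines, arrows, D):
    """True if no node outside D is anterior to a member of D."""
    return not anteriors(nodes, lines, arrows, D)


def induced_subgraph(lines, arrows, arcs, D):
    """Keep only the edges with both endpoints in D."""
    D = set(D)

    def keep(edges):
        return [(u, v) for u, v in edges if u in D and v in D]

    return keep(lines), keep(arrows), keep(arcs)


# Hypothetical graph: a -> b -> c and c <-> d.
nodes = ["a", "b", "c", "d"]
lines = []
arrows = [("a", "b"), ("b", "c")]
arcs = [("c", "d")]

ok = is_anterior_set(nodes, lines, arrows, {"a", "b"})       # closed under anteriors
not_ok = is_anterior_set(nodes, lines, arrows, {"b", "c"})   # misses the anterior a
sub = induced_subgraph(lines, arrows, arcs, {"a", "b"})
```

Here the set {a, b} is anterior, so Lemma 10 applies to it and the induced subgraph consists of the single arrow from a to b, whereas {b, c} is not anterior because a is an anterior of b.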

The following important lemma and its corollary imply that for any non-adjacent pair in a maximal CMG we can always find an -separator with .

###### Lemma 11.

For a pair of distinct nodes and and a subset of the node set in a maximal CMG, if for , then there is a node in such that .

###### Proof.

Let be arbitrary. If there is an so that but , then replace by , and repeat this process until it terminates, which is ensured by the transitivity of semi-directed walks and the lack of semi-directed cycles in the CMG. Call the final node . Thus, if for then we also have that . The lack of semi-directed cycles implies that this is equivalent to and being connected by lines.

We now claim that . Suppose, for contradiction, that there is a connecting walk between and given . If is not on then is also connecting given . In addition, we have that is on a non-collider section on . There is no arrowhead at from at least one side of the section, say from the side. We move towards on and denote the corresponding subwalk of by . As long as , , is on a non-collider section on , we obtain that there is a semi-directed walk from to . This implies that if no is on a collider section then there is an anterior walk from to , which is impossible.

Therefore, by moving towards from , we first reach an on that lies on a collider section and is in . Transitivity of anterior walks and the fact that there is no anterior walk from to or now imply that there is no anterior walk from to or . The construction of implies that and are on the same section, and hence is not on a non-collider section on , a contradiction. Hence we conclude that . ∎

###### Corollary 4.

For a pair of nodes and and a subset of the node set in a maximal CMG, if , then .

###### Proof.

Lemma 11 implies that we can repeatedly remove single nodes in and preserve separation to obtain that . This concludes the proof. ∎

Lemma 6 and Theorem 3 together imply that the induced independence model for a maximal CMG satisfies the pairwise Markov property (P):

###### Proposition 2.

If are non-adjacent nodes in a maximal CMG , it holds that .

Finally we are ready to show the main result of this section.

###### Theorem 4.

Let be a maximal CMG. If an independence model over the node set of is a compositional graphoid, then satisfies the pairwise Markov property (P) w.r.t.  if and only if it satisfies the global Markov property w.r.t. .

###### Proof.

That the global Markov property implies the pairwise property (P) follows directly from Proposition 2.

Now suppose that satisfies the pairwise Markov property (P) and compositional graphoid axioms. For subsets , </