Mutual Conditional Independence and Its Applications to Inference in Markov Networks


Niharika Gauraha (niharika.gauraha@gmail.com)
Indian Statistical Institute
8th Mile, Mysore Road Bangalore, India
Abstract

The fundamental concepts underlying Markov networks are conditional independence and the set of rules, called Markov properties, that translate conditional independence constraints into graphs. In this article we introduce the concept of a mutual conditional independence relationship among the elements of an independent set of a Markov network. We first prove that the mutual conditional independence property holds among the elements of a maximal independent set. We then prove the equivalence between the set of mutual conditional independence relations encoded by all the maximal independent sets and the three Markov properties (pairwise, local and global) under certain regularity conditions. The proof employs diverse methods involving graphoid axioms, factorization of the joint probability density function and graph theory. We present inference methods for decomposable and non-decomposable graphical models that exploit the newly revealed mutual conditional independence property.


Keywords: Markov Networks, Mutual Conditional Independence, Graphical Models

1 Introduction

A Markov network is a way of specifying conditional independence constraints between the components of a multivariate distribution. Markov properties are the rules that determine how conditional independence constraints are translated into a graph. For details on Markov networks we refer the reader to Lauritzen (1996) and Jordan (2004). The three Markov properties usually considered for Markov networks are the pairwise, local and global Markov properties. These Markov properties are equivalent to one another for positive distributions; for details on the equivalence of Markov properties see Matus (1992).

In an undirected graph, an independent set consists of mutually non-adjacent vertices; equivalently, the elements of an independent set are mutually separated by the rest of the vertices. For example, let $G = (V, E)$ be an undirected graph and let $I$ be an independent set of $G$; then the vertices of $I$ are mutually separated by $V \setminus I$.

We extend the correspondence between separation in a graph and conditional independence in probability to a correspondence between mutual separation in a graph and mutual conditional independence in probability. The proof draws on methods from several disciplines: graphoid axioms, probability theory and graph theory.

Using graph-theoretic concepts, we first prove that All the Maximal Independent Sets (AMIS) uniquely determine the graph, so that there is a one-to-one relationship between graphs and AMIS. Then, applying concepts from probability theory, we prove that the Mutual Conditional Independence Property (MCIP) holds among the elements of a maximal independent set. Since every Markov network has a unique set of AMIS, it also has a unique set of mutual conditional independence relations. Considering all the mutual conditional independence relations obtained from AMIS, we derive an alternative formulation of the three Markov properties. We then prove the equivalence between the mutual conditional independence property and the Markov properties under the positive distribution assumption.

Finally, we shift our focus to the problem of probabilistic inference in Markov networks. In a multivariate setting, inference is the problem of computing the conditional probability distribution of a set of components when the values of some of the other components are given. For details on inference in graphical models we refer to Wainwright and Jordan (2008).

We introduce inference methods that take the MCIP of the model into account. These methods give quick answers to queries that meet specific structural criteria. For example, let G = (V, E) be any Markov network graph, let U represent the set of unobserved components and O the set of observed components (the evidence), and suppose we wish to compute the conditional probability of U given O. In the simplest case, U forms an independent set and the corresponding separator set $V \setminus U$ is the same as the evidence set O; it is then straightforward to conclude that, given the evidence O, the elements of U are mutually conditionally independent. We note that the time complexity of checking whether a set of vertices forms an independent set is linear in the number of vertices in the graph.
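As an illustration of the simplest query described above, the following minimal Python sketch checks whether U is an independent set whose complement is exactly the evidence set O. It is not part of the original experiments (which used R); the function names, the adjacency-set representation and the 4-cycle graph are assumptions made for this example only.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of a simple undirected graph


def is_independent_set(graph: Graph, nodes: Iterable[Hashable]) -> bool:
    """Return True if no two of the given nodes are adjacent in graph."""
    node_set = set(nodes)
    # A set is independent iff no member has a neighbour inside the set.
    return all(graph[v].isdisjoint(node_set) for v in node_set)


def mcip_applies(graph: Graph, unobserved: Iterable[Hashable],
                 observed: Iterable[Hashable]) -> bool:
    """True if U is independent and the evidence O is exactly V minus U."""
    u, o = set(unobserved), set(observed)
    return is_independent_set(graph, u) and o == set(graph) - u


# Hypothetical 4-cycle a - b - c - d - a, used only for illustration.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(mcip_applies(cycle, {"a", "c"}, {"b", "d"}))  # True
print(mcip_applies(cycle, {"a", "b"}, {"c", "d"}))  # False: a and b are adjacent
```

When the check succeeds, the conditional distribution of U given O factorizes into one term per element of U, so each term can be queried separately.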

Our MCIP-based approach to inference is particularly useful for non-decomposable models, where a closed-form solution does not exist, and for applications in which examining the relationships among components is more important than computing probabilities, such as the analysis of categorical data and gene expression arrays.

The discussion below is organized as follows. In Section 2 we give a brief overview of the mathematical foundations of the theory of Markov networks. Section 3 proves that the MCIP holds among the elements of an independent set. Section 4 derives the global Markov property from the MCIP and proves the equivalence between them. In Section 5 we discuss some applications of mutual conditional independence relations to statistical inference for decomposable and non-decomposable graphical models. In Section 6 we give the computational details of our statistical inferences. In Section 7 we conclude and discuss the future scope and applicability of the MCIP.

2 Overview and Mathematical Foundations

This section gives a general overview and mathematical foundations of Markov networks.

A graphical model is a representation of the conditional independencies between the variables of a multivariate probability distribution. The nodes or vertices of the graph correspond to random variables. The absence of an edge between two random variables denotes a conditional independence relation between them. Several classes of graphs with various conditional independence interpretations have been described in the literature. Undirected graphical models (Markov networks) and graphical models based on directed acyclic graphs (Bayesian networks) are the most commonly known. In this article we consider only undirected graphical models, also known as Markov random fields or Markov networks. For details on the foundations of the theory of Markov networks we refer to Lauritzen (1996), Whittaker (1990), Preston (1974) and Spitzer (1971).

2.1 Graph Theory

This section provides the concepts and definitions from graph theory that we use in later sections. For details on graph theory for graphical models we refer to Lauritzen (1996).

Notations

A graph G is a pair G = (V, E), where V is a set of vertices and E is a set of edges.

Definition (Undirected Graphs). A graph is said to be an undirected graph if its vertices are connected by undirected edges. We consider only simple graphs, which have neither self-loops nor multiple edges.

Definition (Maximal Independent Set). An independent set of a graph G is a subset S of nodes such that no two nodes in S are adjacent. An independent set is said to be maximal if no node can be added to S without violating the independent set property.

Theorem (Uniqueness of Maximal Independent Sets). Given the complete list of maximal independent sets of a graph, the graph is uniquely determined.

Proof. Let $I_1, \ldots, I_m$ be the complete list of the maximal independent sets of nodes of a graph. We have to show that the node set V is the union $I_1 \cup \cdots \cup I_m$ and that the edge set E consists of all unordered pairs $\{x, y\}$ of distinct elements of V such that $\{x, y\}$ is not contained in any of the $I_j$.

Clearly V contains the above union. Conversely, if $x \in V$ then $\{x\}$ is an independent set and hence is contained in some maximal independent set $I_j$; then $x \in I_j$ and hence x belongs to the union. This determines V.

Also, it is clear that any edge $\{x, y\}$ (where x, y are distinct elements of V) is not contained in any $I_j$. Conversely, if $\{x, y\}$ is a pair of nodes which is not an edge, then $\{x, y\}$ is an independent set and hence is contained in some $I_j$. Thus the edge set E is also uniquely determined, as stated.
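The construction used in the proof can be sketched in a few lines of Python: the vertex set is the union of the maximal independent sets, and a pair of vertices is an edge exactly when no maximal independent set contains both. The function name is an assumption of this sketch; the input list is the one given for the worked example in Section 4.

```python
from itertools import combinations


def graph_from_maximal_independent_sets(mis_list):
    """Recover (V, E) from the complete list of maximal independent sets."""
    sets = [frozenset(s) for s in mis_list]
    vertices = set().union(*sets)
    # A pair is an edge iff no maximal independent set contains both ends.
    edges = {
        frozenset(pair)
        for pair in combinations(sorted(vertices), 2)
        if not any(set(pair) <= s for s in sets)
    }
    return vertices, edges


# Maximal independent sets listed for the worked example in Section 4.
amis = [{"A", "C", "F"}, {"A", "C", "G"}, {"A", "E"},
        {"B", "D", "F"}, {"B", "D", "G"}, {"B", "E"}]
vertices, edges = graph_from_maximal_independent_sets(amis)
print(sorted("".join(sorted(e)) for e in edges))
# ['AB', 'AD', 'BC', 'CD', 'CE', 'DE', 'EF', 'EG', 'FG']
```

The recovered edges agree with the cliques (A,B), (A,D), (B,C), (C,D,E) and (E,F,G) that appear in the factorization of the same example.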

2.2 Conditional Independence

In this section we define conditional independence in probability and the Markov properties for Markov networks.

Definition (Conditional Independence). Let X, Y, Z be random variables with joint distribution P. The random variables X and Y are said to be conditionally independent given the random variable Z if the following holds.

\[ X \perp\!\!\!\perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z) \iff P(X \mid Y, Z) = P(X \mid Z) \]
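For discrete variables the definition can be checked numerically. The sketch below is an illustration only: the function name, the dictionary representation of the joint pmf and the probability values are assumptions. It tests the division-free form P(x, y, z) P(z) = P(x, z) P(y, z), which is equivalent to the definition wherever P(z) > 0.

```python
from collections import defaultdict
from itertools import product
from math import isclose


def is_conditionally_independent(joint, tol=1e-12):
    """Check X independent of Y given Z for a pmf {(x, y, z): probability}."""
    p_z, p_xz, p_yz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[x, z] += p
        p_yz[y, z] += p
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    return all(
        isclose(joint.get((x, y, z), 0.0) * p_z[z],
                p_xz[x, z] * p_yz[y, z], abs_tol=tol)
        for x, y, z in product(xs, ys, list(p_z))
    )


# Hypothetical binary example built so that X and Y are independent given Z.
p_x_given_z = {0: 0.2, 1: 0.7}  # assumed values of P(X = 1 | Z = z)
p_y_given_z = {0: 0.6, 1: 0.1}  # assumed values of P(Y = 1 | Z = z)
joint = {}
for x, y, z in product((0, 1), repeat=3):
    px = p_x_given_z[z] if x else 1 - p_x_given_z[z]
    py = p_y_given_z[z] if y else 1 - p_y_given_z[z]
    joint[x, y, z] = 0.5 * px * py  # P(Z = z) = 0.5

print(is_conditionally_independent(joint))  # True
```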

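2.3 Properties of Conditional Independence (Graphoid Axioms)

We state some properties of conditional independence in terms of the graphoid axioms as follows.

\begin{align}
\text{Symmetry:} \quad & X \perp\!\!\!\perp Y \mid Z \implies Y \perp\!\!\!\perp X \mid Z \tag{1} \\
\text{Decomposition:} \quad & X \perp\!\!\!\perp (Y \cup W) \mid Z \implies X \perp\!\!\!\perp Y \mid Z \tag{2} \\
\text{Weak union:} \quad & X \perp\!\!\!\perp (Y \cup W) \mid Z \implies X \perp\!\!\!\perp Y \mid (Z \cup W) \tag{3} \\
\text{Contraction:} \quad & X \perp\!\!\!\perp Y \mid Z \ \text{and}\ X \perp\!\!\!\perp W \mid (Z \cup Y) \implies X \perp\!\!\!\perp (Y \cup W) \mid Z \tag{4} \\
\text{Intersection:} \quad & X \perp\!\!\!\perp Y \mid (Z \cup W) \ \text{and}\ X \perp\!\!\!\perp W \mid (Z \cup Y) \implies X \perp\!\!\!\perp (Y \cup W) \mid Z \tag{5}
\end{align}

For an alternative complete set of axioms we refer to Geiger and Pearl (1993).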

Definition (Semi-graphoid and Graphoid). A semi-graphoid is a dependency model which satisfies axioms (1)-(4). If the intersection axiom (5) also holds, it is called a graphoid.

Definition (Probabilistic Graphoid). In probability, conditional independence defined as

\[ P(X \mid Y, Z) = P(X \mid Z) \]

is a semi-graphoid. When P is strictly positive, conditional independence becomes a graphoid.

Definition (Graph Separation as Graphoid). Graph separation in an undirected graph satisfies the graphoid axioms. For details we refer to Lauritzen (1996) and Dawid (1979).

2.4 Markov Properties of Undirected Graphs

In this section we define the following three Markov properties for Markov networks. Let G = (V, E) be an undirected graph and let P be a probability distribution over the variables of G.

Definition ((P) Pairwise Markov Property). The probability distribution P satisfies the pairwise Markov property for the graph G if, for every pair of non-adjacent vertices X and Y, X is independent of Y given everything else.

\[ X \perp\!\!\!\perp Y \mid V \setminus \{X, Y\} \]

Definition ((L) Local Markov Property). The probability distribution P satisfies the local Markov property for the graph G if every variable X is conditionally independent of its non-neighbours in the graph, given its neighbours.

\[ X \perp\!\!\!\perp V \setminus (\{X\} \cup \mathrm{bd}(X)) \mid \mathrm{bd}(X) \]

where bd(X) denotes the boundary of X.

Definition ((G) Global Markov Property). The probability distribution P is said to be global Markov with respect to an undirected graph G if, for any disjoint subsets of nodes A, B, C such that C separates A and B in the graph, the distribution satisfies

\[ A \perp\!\!\!\perp B \mid C \]

Proposition. Let G be an undirected graph. For any probabilistic independence model that satisfies the semi-graphoid axioms with respect to G, the following holds. For a proof we refer to Lauritzen (1996).

\[ (G) \implies (L) \implies (P) \]

Proposition. Let G be an undirected graph. For any probabilistic independence model that satisfies the graphoid axioms with respect to G, the following holds. For a proof we refer to Pearl (1988) and Dawid (1979).

\[ (G) \iff (L) \iff (P) \]
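The separation condition in the global Markov property is purely graph-theoretic and can be tested with a breadth-first search that is not allowed to enter C. The following Python sketch is illustrative only: the function name is an assumption, and the edge list is an assumed reading of the cliques (A,B), (A,D), (B,C), (C,D,E), (E,F,G) used in the worked example of Section 4.

```python
from collections import deque


def separates(graph, a, b, c):
    """Return True if every path from A to B passes through C.

    graph maps each vertex to the set of its neighbours; a, b and c are
    disjoint collections of vertices.
    """
    a, b, c = set(a), set(b), set(c)
    seen = set(a)
    queue = deque(a)
    while queue:
        v = queue.popleft()
        if v in b:
            return False  # reached B without crossing C
        for w in graph[v]:
            if w not in seen and w not in c:
                seen.add(w)
                queue.append(w)
    return True


# Assumed edge list for the example graph of Section 4.
EDGES = ["AB", "AD", "BC", "CD", "CE", "DE", "EF", "EG", "FG"]
graph = {}
for u, v in EDGES:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(separates(graph, {"A"}, {"C"}, {"B", "D"}))       # True
print(separates(graph, {"A"}, {"C"}, {"B"}))            # False: path A - D - C
print(separates(graph, {"A", "B"}, {"F", "G"}, {"E"}))  # True
```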

2.5 Markov Network Graphs and Markov Network

Having discussed graph theory, conditional independence and the Markov properties for undirected graphs, we are now ready to define Markov network graphs and Markov networks.

Definition (Markov Network Graph). A Markov network graph is an undirected graph G = (V, E) where the vertex set V corresponds to the random variables of a multivariate distribution.

Definition (Markov Network). A Markov network is a tuple consisting of a Markov network graph G and a set of non-negative functions $\psi_a$, one for each clique $a \in C_m$, such that the joint pdf can be decomposed into factors as

\[ P(x) = \frac{1}{Z} \prod_{a \in C_m} \psi_a(x) \]

where Z is a normalizing constant.

Theorem (Hammersley-Clifford theorem). Consider a Markov network with graph G and random vector X, and let the probability density function of the distribution of X be strictly positive. Then X satisfies the global Markov property with respect to the graph G if and only if its density factorizes as follows.

\[ P(x) = \prod_{a \in C_m} \psi_a(x) \]

where $C_m$ is the set of maximal cliques of G and $\psi_a$ depends on x through $x_a$ only.

It follows from the above discussion that if a strictly positive probability distribution factorizes with respect to G, then it also satisfies all the Markov properties (pairwise, local and global) with respect to G.

3 Mutual Conditional Independence

In this section we prove that the elements of an independent set are mutually conditionally independent given the rest.

Theorem (Mutual Conditional Independence in Markov Networks). Let G be a Markov network graph and let P(X) be a strictly positive probability distribution which supports the conditional independence relations required to satisfy the pairwise, local and global Markov properties for G. Then the elements of an independent set I of G are mutually conditionally independent given the rest, $V \setminus I$.

Proof. Let $I = \{X_1, \ldots, X_k\}$ be an independent set of G. Since $X_1, \ldots, X_k$ are mutually non-adjacent, when we condition on $V \setminus I$, or equivalently when we remove the nodes $V \setminus I$ from G, the remaining vertices are disconnected, which corresponds in probability to complete independence among the vertices of I.

Since $X_1, \ldots, X_k$ form an independent set, they belong to separate cliques, say $X_i \in C_i$ for $i = 1, \ldots, k$, where $C_i$ is a maximal clique in G. Without loss of generality we can assume that there are exactly k maximal cliques. From the Hammersley-Clifford theorem, P factorizes as follows.

\[ P = \psi_1(X_1, Y_1)\,\psi_2(X_2, Y_2) \cdots \psi_k(X_k, Y_k) \]

where $Y_1, \ldots, Y_k$ are the sets of nodes that connect two or more of the $X_i$, and each $\{X_i\} \cup Y_i$ forms a maximal clique in G. Note that $Y_i$ can be empty in the case of a disconnected graph, and that the union of the $Y_i$ is $V \setminus I$.
The conditional probability can be expressed as

\begin{align*}
P(I \mid Y_1 = y_1, \ldots, Y_k = y_k) &\propto \psi_1(X_1, y_1)\,\psi_2(X_2, y_2) \cdots \psi_k(X_k, y_k) \\
P(I \mid V \setminus I) &= \phi_1(X_1)\,\phi_2(X_2) \cdots \phi_k(X_k)
\end{align*}

where the normalizing constant, which depends only on $y_1, \ldots, y_k$, is absorbed into the $\phi_i$. Hence $X_1, \ldots, X_k$ are mutually conditionally independent given $V \setminus I$.

4 Mutual Conditional Independence and the Markov Properties

In this section we present an alternative way to derive the conditional independence relations required to satisfy the Markov properties of Markov networks. Specifically, we prove the equivalence between the MCIP and the pairwise Markov property; by Proposition 12, the equivalence then extends to the other Markov properties (local and global).

Theorem (Equivalence of MCIP and Markov Properties). Let G be a Markov network graph and let P be a strictly positive probability distribution which satisfies the mutual conditional independence relations implied by the maximal independent sets of G. Then the conditional independence relations required to satisfy the pairwise, local and global Markov properties for G also hold in P.

Proof. We prove the equivalence of the MCIP and the pairwise Markov property. Under the assumption of a positive distribution, the MCIP is then equivalent to the local and global Markov properties as well.

Let $I_1, \ldots, I_m$ be the complete list of the maximal independent sets of nodes of G. Then, as stated before, $V = I_1 \cup \cdots \cup I_m$ and E consists of the pairs $\{x, y\}$ of distinct vertices that are not contained in any $I_j$.

If the conditional independence relations required to satisfy the pairwise Markov property hold in P with respect to G, then the elements of each $I_j$ are mutually conditionally independent given $V \setminus I_j$ by Theorem 16. So we have proved that

\[ \text{pairwise Markov property} \implies \text{MCIP} \]

Recall that mutual conditional independence implies pairwise conditional independence. Since the elements of an $I_j$ are mutually conditionally independent given $V \setminus I_j$, they are also pairwise independent given $V \setminus I_j$.

Now suppose that $\{x, y\}$ is a pair of nodes which is not an edge. Then $\{x, y\}$ is an independent set and hence is contained in some $I_j$, so x and y are pairwise independent given the rest. Therefore

\[ \text{pairwise Markov property} \iff \text{MCIP} \]

We illustrate the proof by an example as follows. Let us consider the Markov network as given in figure (1).

All the maximal independent sets of the graph G are:

 S={{A,C,F},{A,C,G},{A,E},{B,D,F},{B,D,G},{B,E}}
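The list S can be reproduced by brute-force enumeration. In the Python sketch below, the edge list is an assumption: it is read off the cliques (A,B), (A,D), (B,C), (C,D,E), (E,F,G) that appear in the factorization used later in this example. The function name is likewise hypothetical, and the exhaustive search is only suitable for small graphs.

```python
from itertools import combinations

# Assumed edge list, read off the cliques (A,B), (A,D), (B,C), (C,D,E), (E,F,G).
EDGES = ["AB", "AD", "BC", "CD", "CE", "DE", "EF", "EG", "FG"]


def maximal_independent_sets(vertices, edges):
    """Enumerate all maximal independent sets by exhaustive search."""
    vertices = sorted(vertices)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            independent = all(adj[v].isdisjoint(s) for v in s)
            # Maximal: every outside vertex has a neighbour inside the set.
            maximal = all(adj[v] & s for v in vertices if v not in s)
            if independent and maximal:
                found.append(s)
    return found


for s in maximal_independent_sets("ABCDEFG", EDGES):
    print("".join(sorted(s)))
# prints AE, BE, ACF, ACG, BDF, BDG (one per line)
```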

Let us consider the first maximal independent set and suppose that the MCIP holds, which implies that A, C, F are mutually independent given the rest of the random variables B, D, E, G.
Equivalently, the independence relation can be expressed as

\[ A \perp\!\!\!\perp C \perp\!\!\!\perp F \mid (B, D, E, G) \]

Applying the weak union graphoid axiom (Equation (3)) to the above independence relation, we get

\begin{align*}
A &\perp\!\!\!\perp F \mid (B, D, E, G) \cup C \\
C &\perp\!\!\!\perp F \mid (B, D, E, G) \cup A \\
A &\perp\!\!\!\perp C \mid (B, D, E, G) \cup F
\end{align*}

Applying similar arguments to the other maximal independent sets, we can show that for every non-adjacent pair

\[ x \perp\!\!\!\perp y \mid V \setminus \{x, y\} \]

which is precisely the definition of the pairwise Markov property.

Conversely, given the pairwise Markov property, we have to show that the MCIP holds. From the Hammersley-Clifford theorem, it is clear that under the positive distribution assumption P satisfies the pairwise Markov property with respect to the graph G if and only if it factorizes as follows.

\[ P(x) = \psi_1(A, B)\,\psi_2(A, D)\,\psi_3(B, C)\,\psi_4(C, D, E)\,\psi_5(E, F, G) \]

Let us consider the probability of (A, C, F) conditioned on (B, D, E, G). We obtain the conditional probability as

\begin{align*}
P(A, C, F \mid B = b, D = d, E = e, G = g) &= \phi_{11}(A, b)\,\phi_{12}(A, d)\,\phi_{21}(C, b)\,\phi_{22}(C, d, e)\,\phi_3(F, e, g) \\
&= \phi_1(A)\,\phi_2(C)\,\phi_3(F)
\end{align*}

From the above factorization of the pdf it follows that A, C, F are mutually independent conditioned on (B, D, E, G).
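This conclusion can be checked numerically on a small instance. The Python sketch below builds a strictly positive joint distribution over seven binary variables from the five clique factors above and compares P(A, C, F | B, D, E, G) with the product of the single-variable conditionals. The binary state space and the particular potential function are assumptions chosen for illustration; any strictly positive potentials would serve.

```python
from itertools import product
from math import isclose

NODES = "ABCDEFG"
CLIQUES = ["AB", "AD", "BC", "CDE", "EFG"]  # cliques of the factorization above


def potential(values):
    """Illustrative strictly positive potential: favours equal values."""
    return 1.5 if len(set(values)) == 1 else 1.0


def unnormalised(full):
    """Product of clique potentials for a full 0/1 assignment in NODES order."""
    state = dict(zip(NODES, full))
    weight = 1.0
    for clique in CLIQUES:
        weight *= potential(tuple(state[v] for v in clique))
    return weight


assignments = list(product((0, 1), repeat=len(NODES)))
total = sum(unnormalised(a) for a in assignments)
joint = {a: unnormalised(a) / total for a in assignments}


def marginal(keep):
    """Marginal pmf over the nodes in keep, keyed in the order given."""
    index = [NODES.index(v) for v in keep]
    out = {}
    for full, p in joint.items():
        key = tuple(full[i] for i in index)
        out[key] = out.get(key, 0.0) + p
    return out


def mutually_independent_given_rest(subset):
    """Check P(subset | rest) == product over v of P(v | rest) everywhere."""
    rest = "".join(v for v in NODES if v not in subset)
    p_rest = marginal(rest)
    p_with = {v: marginal(v + rest) for v in subset}
    for full, p in joint.items():
        state = dict(zip(NODES, full))
        r = tuple(state[v] for v in rest)
        lhs = p / p_rest[r]
        rhs = 1.0
        for v in subset:
            rhs *= p_with[v][(state[v],) + r] / p_rest[r]
        if not isclose(lhs, rhs, rel_tol=1e-9):
            return False
    return True


print(mutually_independent_given_rest("ACF"))  # True: {A, C, F} is independent
print(mutually_independent_given_rest("AB"))   # False: A and B are adjacent
```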

Similarly, we can show the mutual conditional independence relations for the remaining maximal independent sets. Hence it follows that

 MCIP⟺ pair-wise Markov property

Applying Proposition 12, we get the following equivalence relation, which completes the proof.

 MCIP⟺P⟺L⟺G

5 Applications and Illustrations

In the following, we illustrate applications of the MCIP to inference in Markov networks for discrete and continuous data sets.

5.1 Inference in Graphical Log-linear Models

First we consider a discrete data set, the Reinis data taken from the gRbase R package (risk factors for coronary heart disease; for details on the Reinis dataset see Reinis et al. (1981)).

Example (Decomposable Graphical Model for the Reinis Dataset). The Reinis data is shown in Table 1.

Using stepwise model selection for the Reinis data, the best decomposable model we get is the one given in Figure 2, with the following $X^2$ and $G^2$ test statistics. We use Wermuth's backward elimination algorithm; for details see Wermuth (1976).

\[ X^2 = 51.11705, \quad G^2 = 51.35869, \quad \mathrm{df} = 46 \]

Since $X^2 < \chi^2_{0.95,\,46} = 62.8$, the data supports the model selected.

We note that the variable set {phys, systol, family} forms an independent set in the Markov network in Figure 2.

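Now we derive a closed-form expression for the expected count using the MCIP as follows.

Let $X_1, \ldots, X_6$ represent the random variables Family, Protein, Systol, Phys, Smoke, Mental respectively. By the definition of conditional probability,

\[ P(X_1, X_3, X_4 \mid X_2, X_5, X_6) = \frac{P(X_1, X_2, X_3, X_4, X_5, X_6)}{P(X_2, X_5, X_6)} \]

so the joint pdf can be expressed as

\[ P(X_1, X_2, X_3, X_4, X_5, X_6) = P(X_1, X_3, X_4 \mid X_2, X_5, X_6)\,P(X_2, X_5, X_6) \]

Since $X_1, X_3, X_4$ are mutually independent conditioned on $X_2, X_5, X_6$, the conditional distribution can be factorized as

\[ P(X_1, X_3, X_4 \mid X_2, X_5, X_6) = P(X_1 \mid X_2, X_5, X_6)\,P(X_3 \mid X_2, X_5, X_6)\,P(X_4 \mid X_2, X_5, X_6) \]

Hence the joint pdf can be written as

\begin{align*}
P(X_1, X_2, X_3, X_4, X_5, X_6) &= P(X_1 \mid X_2, X_5, X_6)\,P(X_3 \mid X_2, X_5, X_6)\,P(X_4 \mid X_2, X_5, X_6)\,P(X_2, X_5, X_6) \\
&= \frac{P(X_1, X_2, X_5, X_6)\,P(X_3, X_2, X_5, X_6)\,P(X_4, X_2, X_5, X_6)}{P(X_2, X_5, X_6)^2}
\end{align*}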

After simplification we get the following closed-form expression for the maximum likelihood estimator of the expected cell counts. For details on computing closed-form expressions for the expected cell counts of decomposable log-linear models, see Bishop et al. (1989 edition).

\[ \hat{m}_{ijklmn} = \frac{n_{ij\cdot\cdot mn}\; n_{\cdot jk\cdot mn}\; n_{\cdot j\cdot lmn}}{n_{\cdot j\cdot\cdot mn}^{2}} \]

where $n_{ijklmn}$ is the observed count in cell (i, j, k, l, m, n) and $\hat{m}_{ijklmn}$ is the maximum likelihood estimate of the expected cell count $m_{ijklmn}$.
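The closed-form estimator only requires four marginal tables of the observed counts. The following Python sketch computes the fitted values for a six-way table stored as a dictionary keyed by (i, j, k, l, m, n). The function name is an assumption, and the counts are synthetic values generated for illustration; they are not the Reinis data.

```python
from collections import Counter
from itertools import product


def fitted_counts(counts):
    """Fitted values when X1, X3, X4 are mutually independent given (X2, X5, X6).

    counts maps cells (i, j, k, l, m, n) to observed counts.
    """
    n_x1, n_x3, n_x4, n_rest = Counter(), Counter(), Counter(), Counter()
    for (i, j, k, l, m, n), c in counts.items():
        n_x1[i, j, m, n] += c   # margin over (X1, X2, X5, X6)
        n_x3[j, k, m, n] += c   # margin over (X2, X3, X5, X6)
        n_x4[j, l, m, n] += c   # margin over (X2, X4, X5, X6)
        n_rest[j, m, n] += c    # margin over (X2, X5, X6)
    return {
        (i, j, k, l, m, n):
            n_x1[i, j, m, n] * n_x3[j, k, m, n] * n_x4[j, l, m, n]
            / n_rest[j, m, n] ** 2
        for (i, j, k, l, m, n) in counts
    }


# Synthetic 2x2x2x2x2x2 table of strictly positive counts (illustration only).
counts = {cell: 5 + sum(cell) + 2 * cell[0] * cell[1]
          for cell in product((0, 1), repeat=6)}
fitted = fitted_counts(counts)
print(sum(counts.values()), round(sum(fitted.values()), 6))
```

The fitted table reproduces the observed total and the observed margins over (X1, X2, X5, X6), (X2, X3, X5, X6) and (X2, X4, X5, X6), which is a quick sanity check on any implementation of the formula.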

Under the mutual conditional independence assumption, the fitted values are given in Table 2.

To test whether the above model holds, we perform Pearson's chi-square test. We use the following formula.

\[ X^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

where $O_i$ denotes the observed cell count and $E_i$ the expected cell count.
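Both statistics reported in this section can be computed directly from the observed and fitted cell counts. The sketch below implements Pearson's $X^2$ as defined above together with the likelihood ratio statistic in its standard form, $G^2 = 2 \sum_i O_i \ln(O_i / E_i)$. The function names and the four-cell example are assumptions for illustration and are unrelated to the Reinis table.

```python
from math import log


def pearson_chi_square(observed, expected):
    """Pearson statistic: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def likelihood_ratio(observed, expected):
    """Deviance G^2 = 2 * sum of O * ln(O / E); empty cells contribute zero."""
    return 2.0 * sum(o * log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical four-cell example.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]
print(round(pearson_chi_square(observed, expected), 2))  # 4.32
print(round(likelihood_ratio(observed, expected), 2))
```

Each statistic is then compared with the chi-square critical value for the model's degrees of freedom, as done in the text.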
We get the following test statistics.

\[ X^2 = 35.01022, \quad \mathrm{df} = 46, \quad \chi^2_{0.95,\,46} = 62.8 \]

According to the chi-square test, the data supports the mutual conditional independence among {phys, systol, family} conditioned on {protein, smoke, mental}. For details on graphical log-linear models we refer the reader to Christensen (1997) and Bishop et al. (1989 edition).

Example (Non-Decomposable Graphical Model for the Reinis Dataset). Let us consider the Reinis data once again. We get the best non-decomposable graphical model, given in Figure 3, with the following $X^2$ and $G^2$ test statistics.

\[ X^2 = 61.87653, \quad G^2 = 62.84262, \quad \mathrm{df} = 49 \]

Since $X^2 < \chi^2_{0.95,\,49} = 66.3$, the data supports the model selected.

We notice that in this model the following mutual conditional independence relation holds.

\[ (\text{phys} \perp\!\!\!\perp \text{systol} \perp\!\!\!\perp \text{family}) \mid (\text{protein}, \text{smoke}, \text{mental}) \]

Since the above relation is the same as the one we obtained for the decomposable model, the factorization of the joint pdf, the closed-form expression for the expected cell counts and the chi-square test of the model are the same as those computed for the decomposable model in Example 1.

We also note that the closed-form expression we get for the estimated cell counts of a non-decomposable graphical model must be the same as that of one of the decomposable models. Hence the MCIP can be used directly to get closed-form estimates for non-decomposable graphical models without converting them to decomposable graphical models.

5.2 Inference in Gaussian Graphical Models

In this section, we illustrate the application of the MCIP to inference in Gaussian graphical models (GGM). We consider the "seeds" dataset, which is available at the UCI machine learning repository
https://archive.ics.uci.edu/ml/datasets/seeds. For details on GGMs we refer to some selected classical and recent research papers: Dempster (1972), Mohan et al. (2014), Tan et al. (2014) and Janzamin and Anandkumar (2014).

Example (A Decomposable Model for the Seeds Dataset). The best decomposable graphical model we get using the stepwise selection method is given in Figure 4.

As per the graph in Figure 4, the vertex set $\{V_1, V_4, V_6\}$ forms an independent set. The following conditional independence test results also support that the variables $V_1, V_4, V_6$ are pairwise conditionally independent given the rest, $\{V_2, V_3, V_5, V_7\}$.

Test:
Chi-Square test statistic : df: p-value:

Test
Chi-Square test statistic: df: p-value:

Test
Chi-Square test statistic: df: p-value:

For jointly normal variables, zero correlation implies independence and pairwise independence implies mutual independence. Hence the variables $V_1, V_4, V_6$ are also mutually conditionally independent given $\{V_2, V_3, V_5, V_7\}$. Equivalently, this can be expressed as

\[ V_1 \perp\!\!\!\perp V_4 \perp\!\!\!\perp V_6 \mid V_2, V_3, V_5, V_7 \]
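In a Gaussian graphical model this relation can be read off the precision matrix $K = \Sigma^{-1}$: the conditional covariance of a block of variables given all the others is the inverse of the corresponding block of K, so the block is mutually conditionally independent exactly when that block of K is diagonal. The Python sketch below illustrates this with a hypothetical precision matrix. The edge list is invented for the example and is not the graph of Figure 4; the only property it shares with the text is that V1, V4 and V6 are mutually non-adjacent.

```python
from math import sqrt


def partial_correlation(K, i, j):
    """Partial correlation of variables i and j given all other variables."""
    return -K[i][j] / sqrt(K[i][i] * K[j][j])


def block_is_diagonal(K, subset, tol=1e-12):
    """True if the block of K indexed by subset has no off-diagonal entries.

    For a Gaussian vector this means the variables in subset are mutually
    conditionally independent given all the remaining variables.
    """
    return all(abs(K[i][j]) <= tol for i in subset for j in subset if i != j)


# Hypothetical edges among V1..V7 (1-based); V1, V4, V6 are mutually non-adjacent.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6), (5, 7), (6, 7)]
n = 7
# Diagonally dominant symmetric matrix, hence a valid (positive definite) K.
K = [[4.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for u, v in EDGES:
    K[u - 1][v - 1] = K[v - 1][u - 1] = -0.5

print(block_is_diagonal(K, [0, 3, 5]))  # True: V1, V4, V6
print(block_is_diagonal(K, [0, 1]))     # False: V1 and V2 share an edge
print(partial_correlation(K, 0, 1))     # 0.125
```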

6 Computational details

All the experiments in this paper were carried out using R 3.1.3 with the packages gRim and MASS. All packages used are available at http://CRAN.R-project.org/.

7 Conclusion

In summary, we discussed the different Markov properties for the class of Markov networks and derived an alternative formulation of them. We have given a new perspective on conditional independence over an independent set, namely mutual conditional independence. We have proved the equivalence between the MCIP and the Markov properties under the positive distribution assumption. We have presented an MCIP-based approach to inference and reported experimental results for this approach on discrete and continuous datasets. The MCIP can be a promising new direction for model selection and inference in Markov networks.

References

• Bishop et al. (1989 edition) Y.M.M. Bishop, S.E. Fienberg, and P.W. Holland. Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge, MA, 1989 edition.
• Christensen (1997) R. Christensen. Log-Linear Models and Logistic Regression. Springer, 2nd edition, 1997.
• Dawid (1979) A. P. Dawid. Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B, 41(1):1–31, 1979.
• Dempster (1972) A. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.
• Geiger and Pearl (1993) Dan Geiger and Judea Pearl. Logical and algorithmic properties of conditional independence and graphical models. The Annals of Statistics, 21(4):2001–2021, 1993.
• Janzamin and Anandkumar (2014) Majid Janzamin and Animashree Anandkumar. High-dimensional covariance decomposition into sparse markov and independence models. Journal of Machine Learning Research, 15:1549–1591, 2014.
• Jordan (2004) Michael I. Jordan. Graphical models. Statistical Science, 19(1):140–155, 2004.
• Lauritzen (1996) S. L. Lauritzen. Graphical Models. Oxford University Press Inc., New York, 2nd edition, 1996.
• Matus (1992) F. Matus. On equivalence of Markov properties over undirected graphs. Journal of Applied Probability, 29:745–749, 1992.
• Mohan et al. (2014) Karthik Mohan, Palma London, Maryam Fazel, Daniela Witten, and Su-In Lee. Node-based learning of multiple gaussian graphical models. Journal of Machine Learning Research, 15:445–488, 2014.
• Pearl (1988) Judea Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann Publishers Inc. San Francisco, 1988.
• Preston (1974) C. Preston. Random fields. Berlin, Germany: Springer-Verlag, 1974.
• Reinis et al. (1981) Z. Reinis, J. Pokorny, V. Basika, J. Tiserova, K. Gorican, D. Horakova, E. Stuchlikova, T. Havranek, and F. Hrabovsky. Prognostic significance of the risk profile in the prevention of coronary heart disease. Bratis. lek. Listy, 76:137–150, 1981.
• Spitzer (1971) F. Spitzer. Random fields and interacting particle systems. M.A.A.Summer Seminar Notes, 1971.
• Tan et al. (2014) Kean Ming Tan, Palma London, Karthik Mohan, Su-In Lee, Maryam Fazel, and Daniela Witten. Learning graphical models with hubs. Journal of Machine Learning Research, 15:3297–3331, 2014.
• Wainwright and Jordan (2008) Martin J. Wainwright and Michael I. Jordan. Graphical Models, Exponential Families, and Variational Inference. Foundations and Trends in Machine Learning, 2008.
• Wermuth (1976) Nanny Wermuth. Model search among multiplicative models. Biometrics, 32:253–263, 1976.
• Whittaker (1990) J. Whittaker. Graphical Models in Applied Multivariate Statistics. Chichester: Wiley, 2nd edition, 1990.