Bayesian inference for Gaussian graphical models beyond decomposable graphs

Kshitij Khare, University of Florida, USA
Bala Rajaratnam, Stanford University, USA
Abhishek Saha, University of Florida, USA
Abstract

Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference, on the other hand, has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its own strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a novel, scalable and flexible Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized G-Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the G-Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler is scalable to significantly higher dimensional problems as compared to using an accept-reject or a Metropolis-Hastings algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

Keywords: Gaussian graphical models, Gibbs sampler, Generalized Bartlett graph, Generalized G-Wishart distribution, Scalable Bayesian inference

1 Introduction

Gaussian graphical models have found widespread use in many application areas. Besides standard penalized likelihood based approaches (see Khare et al. (2015) and references therein), Bayesian methods have also been proposed in the literature for analyzing undirected Gaussian graphical models (see Asci and Piccioni, 2007; Dawid and Lauritzen, 1993; Letac and Massam, 2007; Mitsakakis et al., 2011; Rajaratnam et al., 2008; Roverato, 2000, 2002; Wang and Carvalho, 2010, to name just a few). Bayesian methods have the distinct and inherent advantage that they can incorporate prior information and yield a full posterior for the purposes of uncertainty quantification (and not just a point estimate), whereas standard frequentist approaches for uncertainty quantification (such as the bootstrap) may be computationally burdensome and/or break down in high dimensional settings. However, it is well known that Bayesian methods for graphical models in high dimensional settings lag severely behind their regularized likelihood based counterparts, in the sense that they are not scalable except under restrictive assumptions on the underlying sparsity pattern (such as for decomposable graphs). Hence a scalable and more general approach to graphical models, with theoretical and computational safeguards, is critical to leveraging the advantages of posterior inference.

To outline the issues with current Bayesian methods more clearly, consider i.i.d. vectors $Y_1, \ldots, Y_n$ drawn from a $p$-variate normal distribution with mean vector $\mu$ and a sparse inverse covariance matrix $\Omega$. The sparsity pattern in $\Omega$ can be encoded in terms of a graph $G$ on the set of $p$ variables as follows: if the variables $i$ and $j$ do not share an edge in $G$, then $\Omega_{ij} = 0$. Hence, an undirected (or concentration) graphical model corresponding to $G$ restricts the inverse covariance matrix to a submanifold of the cone of positive definite matrices (referred to as $\mathbb{P}_G$). A Bayesian statistical analysis of these models requires specification of a prior distribution (supported on $\mathbb{P}_G$) for $\Omega$. Dawid and Lauritzen (1993) introduced a class of prior distributions for the covariance matrix called the Hyper Inverse Wishart (HIW) distributions. The induced class of prior distributions for $\Omega$ (supported on $\mathbb{P}_G$) is known as the class of G-Wishart distributions (see Roverato (2000)). This class of prior distributions is quite useful and popular, and has several desirable properties, including the fact that it corresponds to the Diaconis-Ylvisaker class of conjugate priors for the concentration graph model corresponding to the graph $G$.

Closed form computations of relevant quantities corresponding to the G-Wishart distribution, such as the expected value of the precision matrix and quantiles, are in general available only if the underlying graph is decomposable, i.e., does not have any induced cycle of length greater than or equal to 4. A variety of approaches have been developed in the literature to generate samples from the G-Wishart distribution corresponding to a general non-decomposable graph. Asci and Piccioni (2007) have developed a maximal clique based Markov Chain Monte Carlo (MCMC) approach to sample from the G-Wishart distribution corresponding to a general graph $G$. Lenkoski (2013) develops a direct sampler for G-Wishart distributions corresponding to a general graph $G$. This approach uses an iterative algorithm to minimize an objective function over the space of positive definite matrices with appropriate sparsity constraints. Wang and Carvalho (2010) have developed an accept-reject algorithm to generate direct samples from the G-Wishart distribution corresponding to a general graph $G$. Mitsakakis et al. (2011) have developed a Metropolis-Hastings based MCMC approach for the same.

While the G-Wishart prior is clearly very useful for Bayesian inference in graphical models, it has an important drawback. In particular, the G-Wishart distribution has only one shape parameter, which makes it potentially inflexible and restrictive in terms of prior specification. Letac and Massam (2007) address this issue by constructing the so-called $W_{P_G}$ and $W_{Q_G}$ families of distributions, which are flexible in the sense that they have multiple shape parameters. These distributions include the G-Wishart as a special case, and form a standard conjugate family of prior distributions for undirected decomposable graphical models. The construction of the Letac and Massam distributions uses the structure associated with decomposable graphs. It would thus be useful to develop a class of prior distributions which is flexible (multiple shape parameters) and leads to tractable Bayesian inference for non-decomposable graphs.

In this paper, we aim to develop a scalable and flexible Bayesian approach for estimation and model selection in Gaussian undirected graphical models for general graphs. Our approach preserves the attractive properties of previous approaches, while overcoming their drawbacks. We first develop a class of generalized G-Wishart distributions (for an arbitrary underlying graph), which has multiple shape parameters and contains the G-Wishart distributions as a special case. These distributions form a family of standard conjugate prior distributions for Gaussian concentration graph models. Developing methods for efficient posterior draws from generalized G-Wishart distributions is crucial for scalable Bayesian inference. We proceed to introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm (with Gaussian or GIG conditionals) to simulate from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. For example, any cycle of length greater than 3 is Generalized Bartlett, but is not decomposable. Our approach has the flexibility of using multiple shape parameters (as opposed to the single shape parameter of the G-Wishart), but goes beyond the class of decomposable graphs without losing tractability.

For the generalized G-Wishart case, the conditional densities for any maximal clique of $G$ are intractable to sample from. Hence, the sampling approaches in (Asci and Piccioni, 2007; Lenkoski, 2013) for G-Wisharts on a general graph do not extend to the generalized G-Wishart. On the other hand, we show that the accept-reject and Metropolis-Hastings based methods in Wang and Carvalho (2010) and Mitsakakis et al. (2011) can be easily extended to the generalized G-Wishart case. We compare the performance and scalability of these two approaches with our Gibbs sampler in Section 7.2.1 and Section 7.2.2.

The rest of the paper is organized as follows. Section 2 contains a brief overview of relevant concepts from graph theory and matrix theory. In Section 3 and Section 4, we define generalized G-Wishart distributions and GB graphs respectively, and establish some basic properties. In Section 5, we derive a tractable Gibbs sampling algorithm to simulate from the generalized G-Wishart distribution corresponding to a GB graph. Section 6 provides additional examples and properties of GB graphs. Section 7 contains a comprehensive simulation and real data analysis study for the Bayesian approach developed in the paper. The proofs of most of the technical results in the paper and additional numerical work are provided in the Supplemental Document.

2 Preliminaries

2.1 Graph theoretic preliminaries

For any positive integer $p$, let $[p] = \{1, 2, \ldots, p\}$. Let $G = (V, E)$ denote an undirected graph, where $V$ represents the finite vertex set and $E$ denotes the corresponding edge set. A function $\sigma$ is defined to be an ordering of $G$ if $\sigma$ is a bijection from $V$ to $\{1, 2, \ldots, |V|\}$. An undirected graph $G$ and an ordering $\sigma$ of $G$ can be used to construct an ordered graph $G_\sigma = (V, E_\sigma)$, in which the vertices are relabeled according to $\sigma$, so that $(\sigma(u), \sigma(v)) \in E_\sigma$ if and only if $(u, v) \in E$.

Definition 1.

An undirected graph $G$ is called decomposable if it does not have a cycle of length greater than or equal to 4 as an induced subgraph.

Such graphs are also called triangulated or chordal graphs. A useful concept associated with decomposable graphs is that of a perfect elimination ordering (see Lauritzen (1996)).

Definition 2.

An ordering $\sigma$ for an undirected graph $G$ is defined to be a perfect elimination ordering if, for each vertex $v \in V$, the set of neighbors of $v$ that appear after $v$ in the ordering, i.e., $\{u : (u, v) \in E, \ \sigma(u) > \sigma(v)\}$, forms a clique.

In fact, an undirected graph is decomposable if and only if it has a perfect elimination ordering (see Paulsen et al. (1989)).
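For concreteness, this characterization can be checked computationally. The following Python sketch is an illustration added here, not part of the original development, and the function names are ours: it tests whether a given ordering is a perfect elimination ordering under the convention that the higher-ordered neighbors of each vertex must form a clique, and tests decomposability by greedily eliminating simplicial vertices, which succeeds if and only if the graph is chordal.

from itertools import combinations

def is_clique(adj, vertices):
    # Every pair of the given vertices must be joined by an edge.
    return all(v in adj[u] for u, v in combinations(vertices, 2))

def is_perfect_elimination_ordering(adj, order):
    # adj: dict vertex -> set of neighbors; order: list of vertices (sigma).
    # For each vertex, its neighbors appearing later in the ordering must form a clique.
    pos = {v: i for i, v in enumerate(order)}
    return all(is_clique(adj, [u for u in adj[v] if pos[u] > pos[v]]) for v in order)

def is_decomposable(adj):
    # A graph is decomposable (chordal) iff we can repeatedly remove a
    # simplicial vertex, i.e. one whose neighbors form a clique.
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    while adj:
        simplicial = next((v for v in adj if is_clique(adj, adj[v])), None)
        if simplicial is None:
            return False
        for u in adj[simplicial]:
            adj[u].discard(simplicial)
        del adj[simplicial]
    return True

# The 4-cycle 1-2-3-4-1 is the smallest non-decomposable graph.
four_cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(is_decomposable(four_cycle))                                # False
print(is_perfect_elimination_ordering(four_cycle, [1, 2, 3, 4]))  # False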

Definition 3.

For a given undirected graph $G = (V, E)$, a graph $G' = (V, E')$ is called a decomposable cover of $G$ if $G'$ is decomposable and $E \subseteq E'$.

Decomposable covers are also known as triangulations in the graph theory literature (see Parra and Scheffler (1997)).

2.2 Matrix theoretic preliminaries

We denote the set of $p \times p$ symmetric matrices by $\mathbb{S}_p$, and the space of positive definite symmetric matrices by $\mathbb{P}_p$. Given an ordered graph $G_\sigma = (V, E_\sigma)$, we define

$\mathbb{P}_{G_\sigma} = \{\Omega \in \mathbb{P}_p : \Omega_{ij} = 0 \text{ whenever } i \neq j \text{ and } (i, j) \notin E_\sigma\}$

and

$\mathcal{L}_{G_\sigma} = \{L : L_{ii} = 1 \text{ for all } i, \ L_{ij} = 0 \text{ for } i < j, \ \text{and } L_{ij} = 0 \text{ whenever } i > j \text{ and } (i, j) \notin E_\sigma\}.$

The space $\mathbb{P}_{G_\sigma}$ is a submanifold of the space of positive definite matrices, where the entries are restricted to be zero whenever the corresponding edge is missing from $E_\sigma$. Similarly, the space $\mathcal{L}_{G_\sigma}$ is a subset of the lower triangular matrices with diagonal entries equal to 1, such that the entries in the lower triangle are restricted to be zero whenever the corresponding edge is missing from $E_\sigma$.

A positive definite matrix $A$ can be uniquely expressed as $A = LDL^T$, where $L$ is a lower triangular matrix with diagonal entries equal to 1, and $D$ is a diagonal matrix with positive diagonal entries. Such a decomposition is known as the modified Cholesky decomposition of $A$ (see for example Daniels and Pourahmadi (2002)). Paulsen et al. (1989) showed that if $\Omega \in \mathbb{P}_{G_\sigma}$ with modified Cholesky decomposition $\Omega = LDL^T$, then $L \in \mathcal{L}_{G_\sigma}$ if and only if $G$ is decomposable and $\sigma$ is a perfect elimination ordering. If either of these two conditions is violated, then the sparsity pattern in $L$ is a strict subset of the sparsity pattern in $\Omega$. The entries $L_{ij}$ (with $i > j$ and $(i, j) \notin E_\sigma$) such that $L_{ij}$ is not (functionally) zero are known as “fill-in” entries. The problem of finding an ordering which minimizes the number of fill-in entries is well-known and well-studied in numerical analysis and in computer science/discrete mathematics. Although this problem is NP-hard, several effective greedy algorithms for reducing the number of fill-in entries have been developed and implemented in standard software such as MATLAB and R (see Davis (2006) for instance).
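As a small illustration of the fill-in phenomenon (our own example, not taken from the paper), the sketch below computes the modified Cholesky decomposition $A = LDL^T$ of a precision matrix whose zero pattern encodes a 4-cycle, and shows that one below-diagonal zero is preserved in $L$ while another becomes a fill-in entry.

import numpy as np

def modified_cholesky(A):
    # Return (L, D) with A = L @ D @ L.T, L unit lower triangular and
    # D diagonal with positive entries, via the ordinary Cholesky factor.
    C = np.linalg.cholesky(A)      # A = C @ C.T with C lower triangular
    L = C / np.diag(C)             # rescale columns so that diag(L) = 1
    D = np.diag(np.diag(C) ** 2)
    return L, D

# Precision matrix for the 4-cycle 1-2-3-4-1: zeros at positions (1,3) and (2,4).
Omega = np.array([[2.0, 0.9, 0.0, 0.9],
                  [0.9, 2.0, 0.9, 0.0],
                  [0.0, 0.9, 2.0, 0.9],
                  [0.9, 0.0, 0.9, 2.0]])

L, D = modified_cholesky(Omega)
print(np.allclose(L @ D @ L.T, Omega))   # True
print(np.isclose(L[2, 0], 0.0))          # True: the zero at (3,1) is preserved
print(np.isclose(L[3, 1], 0.0))          # False: (4,2) is a fill-in entry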

In subsequent sections, we will consider a reparametrization from the inverse covariance matrix $\Omega$ to its modified Cholesky decomposition $(L, D)$. Such a reparametrization inherently assumes an ordering of the variables. In many applications (such as longitudinal data), a natural ordering is available. In the absence of a natural ordering, one can choose a fill-reducing ordering using one of the available fill-reducing algorithms mentioned previously. We will see that a fill-reducing ordering helps in reducing the computational complexity of the proposed Markov chain Monte Carlo procedures.
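If no natural ordering is available, an ordering can be computed from the sparsity pattern alone. The sketch below uses SciPy's reverse Cuthill-McKee routine as one such heuristic; RCM reduces bandwidth, which often (though not always) also reduces Cholesky fill-in, and dedicated fill-reducing orderings such as minimum degree are available in other packages. The sparsity pattern here is a toy example of our own.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

# Sparsity pattern (adjacency plus diagonal) of a precision matrix on 6 variables.
pattern = np.array([[1, 0, 0, 1, 0, 1],
                    [0, 1, 1, 0, 0, 0],
                    [0, 1, 1, 0, 1, 0],
                    [1, 0, 0, 1, 0, 0],
                    [0, 0, 1, 0, 1, 1],
                    [1, 0, 0, 0, 1, 1]])

perm = reverse_cuthill_mckee(csr_matrix(pattern), symmetric_mode=True)
print(perm)                                  # a permutation of 0..5
reordered = pattern[np.ix_(perm, perm)]      # relabel variables before factorizing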

2.3 Undirected graphical models and the G-Wishart distribution

Let $G = (V, E)$ be an undirected graph with $|V| = p$, and let $\sigma$ be an ordering of $G$. The undirected graphical model corresponding to the ordered graph $G_\sigma$ is the family of distributions

$\mathcal{N}_{G_\sigma} = \left\{ N_p\!\left(\mathbf{0}, \Omega^{-1}\right) : \Omega \in \mathbb{P}_{G_\sigma} \right\}.$

Let $Y_1, Y_2, \ldots, Y_n$ be i.i.d. observations from a distribution in $\mathcal{N}_{G_\sigma}$. Note that the joint density of $Y_1, \ldots, Y_n$ given $\Omega$ is given by

$(2\pi)^{-np/2} \det(\Omega)^{n/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(\Omega S) \right\}, \quad \text{where } S = \sum_{i=1}^n Y_i Y_i^T.$

The G-Wishart distribution on $\mathbb{P}_{G_\sigma}$ is a natural choice of prior for $\Omega$ (see Dawid and Lauritzen (1993) and Roverato (2000)). The density of the G-Wishart distribution with parameters $\delta > 0$ and a positive definite matrix $U$ is proportional to

$\det(\Omega)^{(\delta - 2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}(\Omega U) \right\}, \quad \Omega \in \mathbb{P}_{G_\sigma}.$

Thus the posterior density of $\Omega$ given $Y_1, \ldots, Y_n$ is proportional to

$\det(\Omega)^{(\delta + n - 2)/2} \exp\left\{ -\tfrac{1}{2} \mathrm{tr}\!\left(\Omega (U + S)\right) \right\}, \quad \Omega \in \mathbb{P}_{G_\sigma},$

and corresponds to a G-Wishart distribution with parameters $\delta + n$ and $U + S$, which implies that the family of G-Wishart priors is conjugate for the family of distributions $\mathcal{N}_{G_\sigma}$.
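In code, the conjugate update above amounts to adding the sample size to the shape parameter and the matrix of sums of squares and cross-products to the scale matrix. The snippet below is a minimal sketch under the $(\delta, U)$ parametrization used above; the function and variable names are ours.

import numpy as np

def g_wishart_posterior_params(Y, delta, U):
    # Y: (n, p) data matrix; (delta, U): prior hyperparameters.
    # The posterior hyperparameters are (delta + n, U + sum_i Y_i Y_i^T);
    # the graph G only constrains the support, not this update.
    n = Y.shape[0]
    return delta + n, U + Y.T @ Y

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 4))                      # toy data with p = 4
delta_post, U_post = g_wishart_posterior_params(Y, delta=3.0, U=np.eye(4))
print(delta_post, U_post.shape)                       # 53.0 (4, 4)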

3 Generalized G-Wishart distributions

In this section we propose a generalization of the G-Wishart distribution that is endowed with multiple shape parameters, and contains the G-Wishart family as a special case. We shall show in later sections that the flexibility offered by the multiple shape parameters is very useful in high dimensional settings.

3.1 Definition

We now define a multiple shape parameter generalization of the G-Wishart distribution for a general graph $G$. To do this, we transform the matrix $\Omega$ to its Cholesky decomposition. Consider the modified Cholesky decomposition $\Omega = LDL^T$, where $L$ is a lower triangular matrix with diagonal entries equal to 1, and $D$ is a diagonal matrix with positive diagonal entries. The (unnormalized) density of the generalized G-Wishart distribution with parameters $U$ (a positive definite matrix) and the shape vector $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_p)$ is given by

(3.1)

We note that other generalizations of the Wishart have also been considered in Ben-David et al. (2015); Daniels and Pourahmadi (2002); Dawid and Lauritzen (1993); Khare and Rajaratnam (2011); Letac and Massam (2007). It is clear that the G-Wishart density arises as a special case of the generalized G-Wishart (by considering all the $\alpha_i$'s to be equal and noting that $\det(\Omega) = \prod_{i=1}^p D_{ii}$), and that the family of generalized G-Wishart distributions defined above is a conjugate family of prior distributions for undirected graphical models. In fact, the posterior density of $\Omega$ given the data corresponds to a generalized G-Wishart distribution with scale matrix $U + \sum_{i=1}^n Y_i Y_i^T$ and appropriately updated shape parameters.

3.2 Some properties of the generalized G-Wishart distribution

We now proceed to derive properties of the generalized G-Wishart distribution. To do so, we transform $\Omega$ to its modified Cholesky decomposition $(L, D)$ as defined in Section 2.2.

We define to be the set of functionally independent elements of . Then the transformation is a bijection from to with Jacobian equal to , where for . Then the (unnormalized) generalized G-Wishart density for is given by

(3.2)

We first establish sufficient conditions for the density to be proper.

Theorem 1.

If is positive definite and ,

Also under these conditions, , .

The proof of Theorem 1 is provided in Supplemental Section A.1. Under the conditions in Theorem 1, we will refer to the normalized version of (3.2) as the generalized G-Wishart density.

From Roverato (2000), if $\Omega$ follows a G-Wishart distribution with parameters $\delta$ and $U$, then

We now provide an extension of this result to the case of generalized G-Wishart distributions.

Theorem 2.

Let $\Omega$ be generalized G-Wishart with parameters $U$ and $\boldsymbol{\alpha}$ for some positive definite matrix $U$. Denote as the principal submatrix of , and let denote the matrix with as its appropriate principal submatrix and the rest of the elements equal to zero. Define the matrix as . If , then

The proof of Theorem 2 is quite detailed and technical and is thus provided in Supplemental Section A.2. Theorem 2 provides a useful tool to monitor convergence of any Markov chain Monte Carlo method for sampling from the generalized G-Wishart distribution (and particularly the Gibbs sampler introduced in Section 5).

We also undertake a comparison between the generalized G-Wishart distribution and the useful priors introduced by Letac and Massam (2007) for the case of decomposable graphs. A careful analysis demonstrates that the generalized G-Wishart and Letac-Massam priors are quite different for decomposable graphs. The generalized G-Wishart coincides with the Letac-Massam Type II Wishart in the special case when $G$ is homogeneous. See Supplemental Section B for details.

4 Generalized Bartlett graphs

As discussed earlier, the class of decomposable graphs is endowed with many properties helpful for closed form computation of posterior quantities. The assumption of decomposability can be rather restrictive in higher dimensions, as decomposable graphs constitute a very small fraction of all graphs (see Figure 1(a)). We develop in this section a class of graphs which is substantially larger than the class of decomposable graphs. We will show in later sections that for this class of graphs, we can generate posterior samples from the generalized G-Wishart using a tractable Gibbs sampling algorithm.

4.1 Preliminaries and Definitions

We now provide the definition of Generalized Bartlett graphs. First consider the following procedure to obtain a decomposable cover (see Section 2.1) of an arbitrary ordered graph $G_\sigma$.

Denote $E^{(0)} = E_\sigma$ and set $i = 1$.
while $i < p$ do
     $E^{(i)} = E^{(i-1)} \cup \{(k, l) : l > k > i, \ (i, k) \in E^{(i-1)} \text{ and } (i, l) \in E^{(i-1)}\}$
     $i = i + 1$
end while
Algorithm 1 Triangulation algorithm for an ordered graph $G_\sigma$

We use the above algorithm to construct a decomposable cover for $G$ as follows. Let $G_\sigma^0 = (V, E_\sigma^0)$, where $E_\sigma^0 = E^{(p-1)}$ is the edge set produced by Algorithm 1. It follows by construction that the ordering $\sigma$ is a perfect vertex elimination scheme for $G_\sigma^0$. Hence, $G_\sigma^0$ is a decomposable cover for $G$. Note that two different orderings may give rise to different decomposable covers. We now define Generalized Bartlett graphs.
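The elimination-based construction described above is straightforward to implement. The following Python sketch is our own rendering of Algorithm 1 (the function and variable names are not from the paper): it takes an edge set and an ordering, successively joins all pairs of higher-ordered neighbors of each vertex, and returns the edge set of the resulting decomposable cover.

from itertools import combinations

def decomposable_cover(edges, order):
    # edges: set of frozensets {u, v}; order: list of vertices (the ordering sigma).
    # For each vertex in turn, join all pairs of its neighbors that come later
    # in the ordering; the result is the edge set of the cover.
    cover = set(edges)
    pos = {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in order if pos[u] > pos[v] and frozenset((u, v)) in cover]
        cover.update(frozenset(pair) for pair in combinations(later, 2))
    return cover

# The 5-cycle 1-2-3-4-5-1 with the natural ordering: two fill-in edges are added.
cycle5 = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]}
extra = decomposable_cover(cycle5, [1, 2, 3, 4, 5]) - cycle5
print(sorted(tuple(sorted(e)) for e in extra))   # [(2, 5), (3, 5)]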

Definition 4.

An undirected graph $G$ is said to be a Generalized Bartlett graph if there exists an ordering $\sigma$ with the property that there do not exist vertices $u, v, w \in V$ satisfying $(u, v), (u, w), (v, w) \in E_\sigma^0$ and $(u, v), (u, w), (v, w) \notin E$.

In such a case (i.e., when this property is satisfied), $\sigma$ is called a Generalized Bartlett ordering of $G$. When it exists, the Generalized Bartlett ordering may not be unique. The following lemma helps in proving an alternate characterization of Generalized Bartlett graphs, one that does not depend on any ordering of the vertices. The lemma shows that Algorithm 1 leads to a collection of minimal decomposable covers, in the sense that the edge set of any decomposable cover of $G$ has to contain $E_\sigma^0$ for some ordering $\sigma$.

Lemma 1.

For any undirected graph $G$ and a decomposable cover $G' = (V, E')$ of $G$, there exists an ordering $\sigma$ of $V$ such that $E_\sigma^0 \subseteq E'$.

Proof.

Let $G' = (V, E')$ be a decomposable cover of $G$. Since $G'$ is decomposable, let $\sigma$ be a perfect elimination ordering of it. We shall prove inductively that $E^{(i)} \subseteq E'$ for every $i$. Since $E_\sigma^0 = E^{(p-1)}$, that will prove this lemma. It is trivial to note that $E^{(0)} = E_\sigma \subseteq E'$. Let us assume that the claim holds for $i - 1$. Consider any $(k, l) \in E^{(i)}$ such that $(k, l) \notin E^{(i-1)}$. Thus $k, l > i$ with $(i, k), (i, l) \in E^{(i-1)} \subseteq E'$. Since $\sigma$ is a perfect elimination ordering for $G'$, the higher-ordered neighbors of $i$ in $G'$ form a clique, so $(k, l) \in E'$. Thus $E^{(i)} \subseteq E'$, which completes the induction step and proves the lemma. ∎

We now provide a second definition of Generalized Bartlett graphs.

Lemma 2.

An undirected graph $G$ satisfies the Generalized Bartlett property if and only if $G$ has a decomposable cover $G' = (V, E')$ such that every triangle in $G'$ contains an edge from $E$. That is, for any $u, v, w \in V$ such that $(u, v), (u, w), (v, w) \in E'$, at least one of $(u, v), (u, w), (v, w)$ belongs to $E$.

Proof.

If $G$ satisfies the Generalized Bartlett property, then by definition there exists an ordering $\sigma$ of $V$ such that any triangle in $G_\sigma^0$ contains an edge from $E$. In that case we can simply take $G' = G_\sigma^0$. On the other hand, let $G' = (V, E')$ be a decomposable cover of $G$ such that every triangle in $G'$ contains an edge from $E$. By Lemma 1, there exists an ordering $\sigma$ of $V$ such that $E_\sigma^0 \subseteq E'$. Thus any triangle in $G_\sigma^0$ is also a triangle in $G'$ and hence has an edge in $E$. This makes $\sigma$ a Generalized Bartlett ordering and $G$ a Generalized Bartlett graph. ∎
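Lemma 2, combined with the minimality of the Algorithm 1 covers (Lemma 1), suggests a direct computational check of whether a particular ordering is a Generalized Bartlett ordering: build the cover and verify that no triangle of the cover consists entirely of fill-in edges. The sketch below is our own reading of this criterion; note that certifying that a graph is not Generalized Bartlett would in principle require checking all orderings.

from itertools import combinations

def decomposable_cover(edges, order):
    # Edge set of the cover produced by Algorithm 1 (see the sketch in Section 4.1).
    cover, pos = set(edges), {v: i for i, v in enumerate(order)}
    for v in order:
        later = [u for u in order if pos[u] > pos[v] and frozenset((u, v)) in cover]
        cover.update(frozenset(pair) for pair in combinations(later, 2))
    return cover

def is_gb_ordering(edges, order):
    # True iff every triangle of the cover contains at least one original edge.
    cover = decomposable_cover(edges, order)
    for u, v, w in combinations(order, 3):
        tri = [frozenset((u, v)), frozenset((u, w)), frozenset((v, w))]
        if all(e in cover for e in tri) and not any(e in edges for e in tri):
            return False
    return True

cycle5 = {frozenset(e) for e in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]}
print(is_gb_ordering(cycle5, [1, 2, 3, 4, 5]))    # True: cycles are Generalized Bartlett

k33 = {frozenset((a, b)) for a in (1, 2, 3) for b in (4, 5, 6)}
print(is_gb_ordering(k33, [1, 2, 3, 4, 5, 6]))    # False for this particular ordering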

In the subsequent arguments we will also refer to Generalized Bartlett graphs as satisfying the Generalized Bartlett property. Some common Generalized Bartlett graphs are:

  1. All decomposable graphs (follows by taking $G' = G$ in Lemma 2).

  2. Any cycle (see Section 6.1 for a proof). This is the simplest example of a non-decomposable graph which satisfies the Generalized Bartlett property; note that the 4-cycle is also the simplest non-decomposable graph.

  3. Any lattice with fewer than 4 rows or fewer than 4 columns (see Section 6.2 for a proof). Such graphs are useful in spatial applications (see Section 7.3).

Figure 1: (left and middle) Examples of Generalized Bartlett graphs. (right) The bipartite graph $K_{3,3}$: the only 6-vertex connected graph which is not Generalized Bartlett.

It is a natural question to ask how much larger the class of Generalized Bartlett graphs is compared to the class of decomposable graphs. It is quite difficult to obtain a closed form expression for the exact (or approximate) number of connected decomposable graphs (or Generalized Bartlett graphs) with a given number of vertices. However, a list of all possible connected non-isomorphic graphs having at most 10 vertices is available at http://cs.anu.edu.au/~bdm/data/graphs.html. Using this list, we computed the number of decomposable and Generalized Bartlett graphs with at most 10 vertices. Figure 1(a) provides a graphical comparison of these proportions, and Table 2(a) gives the actual values of these proportions. It is quite clear that the proportion of Generalized Bartlett graphs is much larger than the proportion of decomposable graphs. As expected, the proportions of both classes of graphs decrease as the total number of vertices increases. However, the rate of decrease is much larger for decomposable graphs. For example, less than 1% of connected non-isomorphic graphs with 10 vertices are decomposable, but more than 85% of connected non-isomorphic graphs with 10 vertices are Generalized Bartlett. In this case the number of Generalized Bartlett graphs is approximately 95 times larger.

Figure 2: (a) Plot comparing the percentage of Generalized Bartlett graphs with decomposable graphs. (b) A grid which is a Generalized Bartlett graph.
Order   No. of graphs   % Decomposable   % Gen. Bartlett   Ratio (Decomp./Gen. Bart.)
2       1               100              100               100%
3       2               100              100               100%
4       6               83               100               83%
5       21              71               100               71%
6       112             52               99                53%
7       853             32               98                32%
8       11117           15               97                15%
9       261080          4.5              94                5%
10      11716571        0.9              86                1%

Figure 3: (a) Percentages of Generalized Bartlett graphs and decomposable graphs among connected non-isomorphic graphs with at most 10 vertices (table above). (b) A grid where the dotted lines represent the extra edges in its Generalized Bartlett cover.

4.2 Clique Sum Property of Generalized Bartlett graphs

An unordered graph $G = (V, E)$ is said to have a decomposition into components $G_{A \cup C}$ and $G_{B \cup C}$ if the vertex set can be partitioned as $V = A \cup B \cup C$, where $A$ and $B$ are non-empty, such that the induced subgraph on $C$ is complete, and $C$ separates $A$ from $B$ (i.e., if $a \in A$ and $b \in B$ then $(a, b) \notin E$). If $G$ cannot be decomposed in the above manner it is called a prime graph. Hence any graph is either prime or can be broken down into several prime components by repeated application of the above procedure. It is well known that a graph is decomposable if and only if all of its prime components are complete. The following lemma provides a similar characterization for Generalized Bartlett graphs.

Lemma 3.

If all the prime components of a graph are Generalized Bartlett then the graph is also Generalized Bartlett.

Proof.

Note that it is enough to prove the result for a graph with two prime components. Let $G = (V, E)$ be an undirected graph with $V = A \cup B \cup C$ such that the induced subgraph on $C$ is complete, and $C$ separates $A$ and $B$. Let $G_1$ and $G_2$ be the induced subgraphs on $A \cup C$ and $B \cup C$ respectively, which are Generalized Bartlett. Then by Lemma 2, we can construct decomposable covers $G_1' = (A \cup C, E_1')$ and $G_2' = (B \cup C, E_2')$ of $G_1$ and $G_2$ respectively, such that every triangle in $G_i'$ contains an edge in $E_i$ (the edge set of $G_i$) for $i = 1, 2$. Define $E' = E_1' \cup E_2'$ and $G' = (V, E')$. Note that $G'$ is a cover for $G$. We claim that $G'$ is decomposable. Suppose to the contrary that there exists an induced cycle of length greater than or equal to 4 in $G'$ on a set of vertices, say $W$. Since $G_1'$ and $G_2'$ are both decomposable, the vertices of $W$ cannot lie exclusively in $A \cup C$ or in $B \cup C$. Hence, there exist $a, b \in W$ such that $a \in A$ and $b \in B$. Since $A$ and $B$ are separated by $C$ in $G'$, $(a, b) \notin E'$. Thus the subgraph induced by $G'$ on the set of vertices $W$ contains two paths, both starting from $a$ and ending in $b$ and intersecting nowhere in between. Let $P_1$ be one of those paths. If $P_1$ did not intersect $C$, then there would exist points in $A$ and $B$ connected to each other in $G'$, which is not possible. Thus there exists $c_1 \in P_1$ such that $c_1 \in C$. Similarly there exists $c_2 \in C$ corresponding to the second path. Since $C$ is complete, $(c_1, c_2) \in E'$. Hence $(c_1, c_2)$ is a chord in the cycle, giving us a contradiction. Hence, $G'$ is a decomposable cover of $G$.

To prove that $G$ is a Generalized Bartlett graph, we shall prove that every triangle in the decomposable cover $G'$ contains an edge in $E$. Assume to the contrary that $(u, v), (u, w), (v, w) \in E'$ for some $u, v, w \in V$, but none of these edges belongs to $E$. Since every triangle in $G_i'$ has at least one edge from $E_i$ for $i = 1, 2$, it follows that $u, v, w$ cannot belong exclusively to $A \cup C$ or to $B \cup C$. Without loss of generality, let $u \in A$ and $v \in B$. This implies $(u, v) \notin E'$, giving us a contradiction. Thus $G$ is Generalized Bartlett. ∎

4.3 Constructing Generalized Bartlett covers

Given an ordered graph $G_\sigma$, Algorithm 2 below provides a Generalized Bartlett graph $G^* = (V, E^*)$ with $E \subseteq E^*$. Such a graph is referred to as a Generalized Bartlett cover of $G$.

Set $E^* = E$.
while there exists an entry which, when expressed as a polynomial, violates Property A or B do
     add an edge to $E^*$ so that the violation is removed
end while
Algorithm 2 Construction of a Generalized Bartlett cover

Recall from Section 6.2 that the 4 × 4 grid is the smallest example of a grid which is not a Generalized Bartlett graph. The Generalized Bartlett cover for a 4 × 4 grid obtained using Algorithm 2 is provided in Figure 2(b). Note that this cover has only three extra edges.

5 A tractable Gibbs sampler for generalized G-Wisharts

In Section 4 we studied the graph theoretic properties of GB graphs. In this section we shall investigate the statistical properties of GB graphs. In particular, we develop a tractable Gibbs sampling algorithm to sample from the generalized G-Wishart distribution when the underlying graph is Generalized Bartlett. The first step in achieving this goal requires considering a further transformation of the Cholesky parameter from Section 3.2.

5.1 A reparametrization of the Cholesky parameter

Let $G = (V, E)$ be an undirected graph with $|V| = p$, and $\sigma$ an ordering for $G$. Let $\Omega = LDL^T$ be the modified Cholesky decomposition of $\Omega$. To facilitate our analysis, we consider a one-to-one transformation of $(L, D)$ defined as follows:

where and for . The following lemma shows that terms of the form can be expressed as a polynomial in the entries of $L$ and $D$ (with negative powers allowed for entries of $D$).

Lemma 4.

For , terms of the form , which appear in the modified Cholesky expansion of , are either functionally zero, or can be expressed as a sum of terms, where each term has the following form:

(5.1)

Proof: Note that

for all , and the Jacobian of this transformation is . Hence, the posterior density of is proportional to

(5.2)

where for . Note that if and , then , which implies

Making repeated substitutions in the RHS of the above equation, it follows that the entry is either functionally zero, or can be expressed as a sum of terms, where each term looks like

(5.3)

for suitable non-negative integers , and non-positive integers . It is easy to see that

Hence, the term in question can be expressed as a polynomial in the entries of $L$ and $D$. The result now follows. ∎

Note that for every $(i, j) \notin E_\sigma$ with $i > j$, the functionally dependent entry $L_{ij}$ can be expressed in terms of the functionally independent entries of $L$ and the entries of $D$ as in (5.3). The above analysis indicates that, in general, the posterior is a complicated function of these entries. However, we will show that if $G$ is a Generalized Bartlett graph, and $\sigma$ is a Generalized Bartlett ordering for $G$, then the full conditional posterior distributions of all individual entries are either Gaussian or Generalized Inverse Gaussian distributions (and therefore easy to sample from). This property will then be used to derive a Gibbs sampling algorithm to sample from the posterior density in (5.2).

5.2 The Gibbs sampler

We now derive a Gibbs sampling algorithm to sample from the posterior density in (5.2). We start by defining two properties which will be crucial to the development of the Gibbs sampler.

Definition 5.

Let $G_\sigma$ be an ordered graph, and for $\Omega \in \mathbb{P}_{G_\sigma}$, let $\Omega = LDL^T$ be the modified Cholesky decomposition.

  1. The ordered graph is defined to have Property-A if for every such that , the following holds: for every with , is a linear function of (keeping other entries of and fixed). In other words, in , can only be or for every with .

  2. The ordered graph is defined to have Property-B if for every such that , the following holds: for every , is a linear function of (keeping other entries of and fixed). In other words, in , can only be or for every .

We now state three lemmas which will be useful in our analysis. The first lemma provides an equivalent formulation of Property-B. The proofs of these lemmas are provided in Supplemental Section A.4, A.5 and A.6 respectively.

Lemma 5.

The following statements are equivalent.

  1. The ordered graph satisfies Property-B.

  2. For every , can be expressed as a polynomial in entries of and (with negative powers allowed for entries of ). Furthermore for every term in the expansion of , the power of any entry of is either or .

Let with . As noted above, for , both and can be expressed as polynomials in entries of and (with negative powers allowed for entries of ).

Lemma 6.

Both and are either functionally zero, or any term in the expansion of cannot be the exact negative (functionally) of any term in the expansion of .

The next lemma shows that Generalized Bartlett graphs satisfy Properties A and B.

Lemma 7.

Let $G$ be a Generalized Bartlett graph, and let $\sigma$ be a Generalized Bartlett ordering for $G$. Then, the ordered graph $G_\sigma$ satisfies both Property-A and Property-B.

We are now ready to state and prove the main result of this section, which provides a Gibbs sampler for the posterior density in (5.2).

Theorem 3.

Let $G$ be a Generalized Bartlett graph, and let $\sigma$ be a Generalized Bartlett ordering for $G$. Suppose $\Omega$ follows a generalized G-Wishart distribution with parameters $U$ and $\boldsymbol{\alpha}$ for some positive definite matrix $U$. If $\Omega = LDL^T$ is the modified Cholesky decomposition of $\Omega$, and we consider the reparametrization of $(L, D)$ defined in Section 5.1, then

  • the conditional posterior density of each functionally independent entry of $L$, given all other entries of $(L, D)$ and the data, is univariate normal,

  • the conditional posterior density of each diagonal entry of $D$, given $L$ and the other entries of $D$ (and the data), is either a Generalized Inverse Gaussian or a Gamma distribution.

Proof.

Note that for $i > j$ with $(i, j) \notin E_\sigma$, the entry $L_{ij}$ can be expressed as a polynomial in the functionally independent entries of $L$ and the entries of $D$. Recall that $G_\sigma^0$ is the decomposable cover obtained by the triangulation process described in Algorithm 1. We first establish that for $i > j$, $L_{ij}$ is functionally non-zero iff $(i, j) \in E_\sigma^0$. We begin by noticing that at each step in the construction of $G_\sigma^0$ we are adding extra edges, but never removing any. So if $L_{ij}$ is an independent entry, i.e. $(i, j) \in E_\sigma$, then $(i, j) \in E_\sigma^0$.

Now let us assume to the contrary that is the first dependent but functionally non-zero entry s.t. . Since is dependent but non-zero s.t. appears in the expansion of . Now can be independent and hence . Otherwise is non-zero dependent. Since (the first non-zero dependent not in ) comes after , . We recall that, for any , while constructing , we only join vertices higher than . Thus must have been joined before construction of , i.e. . By a similar argument . Thus and we have a contradiction. Hence,

To prove the reverse implication we note that for , implies that is independent and hence . Now we use induction and assume that the claim holds up to . If for , then and hence appears in the expansion of . Since , and by assumption and which with the help of Lemma 6 implies . Thus the assumption holds for , which completes the induction step.

It follows by (5.2) and Property-A that for every , the conditional posterior density of given all other entries of and is proportional to

for appropriate constants and . Hence the conditional posterior density of is a Gaussian density. Similarly, it follows from (5.2) and Property-B that for every , the conditional posterior density of given all entries of and is proportional to

for appropriate constants and . Hence the conditional posterior density of is Gamma, and for , the conditional posterior density of is a Generalized Inverse Gaussian density. ∎

The results in Theorem 3 can be used to construct a Gibbs sampling algorithm, where the iterations involve sequentially sampling from the conditional densities of each element of $(L, D)$. It is well known that the joint posterior density of $(L, D)$ is invariant for the Gibbs transition density. Since the Gaussian density is supported on the entire real line, and the Generalized Inverse Gaussian density is supported on the entire positive real line, it follows that the Markov transition density of the Gibbs sampler is strictly positive. Hence, the corresponding Markov chain is aperiodic and $\phi$-irreducible, where $\phi$ is the Lebesgue measure on the appropriate Euclidean space (Meyn and Tweedie (1993), Pg 87). Also, the existence of an invariant probability density together with $\phi$-irreducibility implies that the chain is positive Harris recurrent (see Asmussen and Glynn (2011) for instance). We formalize the convergence of our Gibbs sampler below. The following lemma on the convergence of the Gibbs sampling Markov chain facilitates computation of expected values for generalized G-Wishart distributions.
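To make the structure of the sampler concrete, the following skeleton sketches one systematic-scan sweep in Python. The conditional-parameter computations (the "appropriate constants" in the proof of Theorem 3, which depend on $U$, the shape parameters and the current state) are left as clearly marked placeholders, so this illustrates only the sweep order and the distributional families, not the paper's exact formulas; all names here are ours.

import numpy as np
from scipy.stats import geninvgauss

rng = np.random.default_rng(0)

def gibbs_sweep(L, D_diag, free_entries, cond_normal, cond_gig):
    # One systematic-scan sweep over the free entries of L and the diagonal of D.
    # cond_normal(i, j, L, D_diag) must return the mean and variance of the
    # Gaussian full conditional of L[i, j]; cond_gig(i, L, D_diag) must return
    # GIG parameters (p, b, scale) for D_diag[i].  Both are placeholders for
    # the conditionals implied by (5.2); they are NOT the paper's formulas.
    for (i, j) in free_entries:                      # (i, j) in E_sigma with i > j
        mean, var = cond_normal(i, j, L, D_diag)
        L[i, j] = rng.normal(mean, np.sqrt(var))
    for i in range(len(D_diag)):
        p, b, scale = cond_gig(i, L, D_diag)
        D_diag[i] = scale * geninvgauss.rvs(p, b, random_state=rng)
    return L, D_diag

# Dummy placeholder conditionals, used only to show that the sweep runs.
dummy_normal = lambda i, j, L, D: (0.0, 1.0)
dummy_gig = lambda i, L, D: (2.0, 1.0, 1.0)

L = np.eye(4)
D_diag = np.ones(4)
free = [(1, 0), (2, 1), (3, 0), (3, 2)]              # lower-triangular edges of a 4-cycle
L, D_diag = gibbs_sweep(L, D_diag, free, dummy_normal, dummy_gig)
print(np.round(L, 2), np.round(D_diag, 2))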

Lemma 8.

Let $G$ be a Generalized Bartlett graph, and let $\sigma$ be a Generalized Bartlett ordering for $G$. Then, the Markov chain corresponding to the Gibbs sampling algorithm in Theorem 3 is positive Harris recurrent.

5.3 Maximality of Generalized Bartlett graphs

Note that the Gibbs sampling algorithm described in Theorem 3 is feasible only if Property-A and Property-B hold. The following theorem shows that if a graph is not Generalized Bartlett, then at least one of Property-A and Property-B does not hold.

Theorem 4.

If an ordered graph $G_\sigma$ satisfies Property-A and Property-B, then $G$ is a Generalized Bartlett graph and $\sigma$ is a Generalized Bartlett ordering for $G$.

Proof.

Suppose there exists s.t. but i.e. . Hence is in the expansion of . The power of in the expansion of and is . . Also we know that, no term of can cancel with any term of . Hence in the expansion of the power of is . Thus Property-B is violated. The result now follows by Definition 7.

Theorem 4 demonstrates that the class of Generalized Bartlett graphs is maximal, in the sense that the conditional distributions considered in Theorem 3 are Gaussian/Generalized-Inverse-Gaussian only if the underlying graph is Generalized Bartlett. In other words the above tractability is lost for graphs outside the Generalized Bartlett class.

5.4 Improving efficiency using decomposable subgraphs

It is generally expected that ‘blocking’ or ‘grouping’ improves the speed of convergence of Gibbs samplers (see Liu et al. (1994)). Suppose $\Omega$ follows a generalized G-Wishart distribution. In this section, we will show that under appropriate conditions, the conditional density of a block of variables in $L$ (given the other variables) is multivariate normal. Based on the discussion above, this result can be used to sample more efficiently from the joint density of $(L, D)$.

Lemma 9.

Let $G$ be a Generalized Bartlett graph with $p$ vertices, and suppose $\Omega$ follows a generalized G-Wishart distribution with parameters $U$ and $\boldsymbol{\alpha}$. Suppose that for some subset of the vertices, the induced subgraph of $G$ is decomposable with a perfect elimination ordering. Then the corresponding block of entries of $L$, given the other variables, follows a multivariate normal distribution.

Proof.

We partition the matrix as

where has dimension and correspondingly,