Bayesian inference for Gaussian graphical models beyond decomposable graphs

Bayesian inference for Gaussian graphical models beyond decomposable graphs

Abstract

Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference on the other hand has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its respective strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a scalable and flexible novel Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized -Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the -Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized -Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler is scalable to significantly higher dimensional problems as compared to using an accept-reject or a Metropolis-Hasting algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

Gaussian graphical models, Gibbs sampler, Generalized Bartlett graph, Generalized G-Wishart distribution, Scalable Bayesian inference

1Introduction

Gaussian graphical models have found widespread use in many application areas. Besides standard penalized likelihood based approaches (see [?] and references therein), Bayesian methods have also been proposed in the literature for analyzing undirected Gaussian graphical models [?]. Bayesian methods have the distinct and inherent advantage that they can incorporate prior information and yield a full posterior for the purposes of uncertainty quantification (and not just a point estimate), whereas standard frequentist approaches for uncertainty quantification (such as the bootstrap) may be computationally burdensome and/or break down in high dimensional settings. However, it is well known that Bayesian methods for graphical models in high dimensional settings lag severely behind their regularized likelihood based counterparts, in the sense that they are not scalable except under restrictive assumptions on the underlying sparsity pattern (such as for decomposable graphs). Hence a scalable and more general approach to graphical models, with theoretical and computational safeguards, is critical to leveraging the advantages of posterior inference.

To outline the issues with current Bayesian methods more clearly, consider i.i.d. vectors drawn from a -variate normal distribution with mean vector and a sparse inverse covariance matrix . The sparsity pattern in can be encoded in terms of a graph on the set of variables as follows. If the variables and do not share an edge in , then . Hence, an undirected (or concentration) graphical model corresponding to restricts the inverse covariance matrix to a submanifold of the cone of positive definite matrices (referred to as ). A Bayesian statistical analysis of these models requires specification of a prior distribution (supported on ) for . [?] introduced a class of prior distributions for called the Hyper Inverse Wishart (HIW) distributions. The induced class of prior distributions for (supported on ) is known as the class of -Wishart distributions (see [?]). This class of prior distributions is quite useful and popular, and has several desirable properties, including the fact that it corresponds to the Diaconis-Ylvisaker class of conjugate priors for the concentration graph model corresponding to the graph .

Closed form computations of relevant quantities corresponding to the -Wishart distribution, such as expected value of the precision matrix and quantiles, are in general available only if the underlying graph is decomposable, i.e., does not have any induced cycle of length greater than or equal to 4. A variety of approaches have been developed in the literature to generate samples from the -Wishart distribution corresponding to a general non-decomposable graph. [?] have developed a maximal clique based Markov Chain Monte Carlo (MCMC) approach to sample from the -Wishart distribution corresponding to a general graph . [?] develops a direct sampler for -Wishart distributions corresponding to a general graph . This approach uses an iterative algorithm to minimize an objective function over the space of positive definite matrices with appropriate sparsity constraints. [?] have developed an accept-reject algorithm to generate direct samples from the -Wishart distribution corresponding to a general graph . [?] have developed a Metropolis-Hastings based MCMC approach for the same.

While the -Wishart prior is clearly very useful for Bayesian inference in graphical models, it has an important drawback. In particular, the -Wishart distribution has only one shape parameter, which makes it potentially inflexible and restrictive in terms of prior specification. [?] address this issue by constructing the so-called and families of distributions which are flexible in the sense that they have multiple shape parameters. These distributions include the -Wishart as a special case, and form a standard conjugate family of prior distributions for undirected decomposable graphical models. The construction of the Letac and Massam distributions uses the structure associated with decomposable graphs. It would thus be useful to develop a class of prior distributions which is flexible (multiple shape parameters) and leads to tractable Bayesian inference for non-decomposable graphs.

In this paper, we aim to develop a scalable and flexible Bayesian approach for estimation and model selection in Gaussian undirected graphical models for general graphs. Our approach preserves the attractive properties of previous approaches, while overcoming their drawbacks. We first develop a class of generalized -Wishart distributions (for an arbitrary underlying graph), which has multiple shape parameters and contains the -Wishart distributions as a special case. These distributions form a family of standard conjugate prior distributions for Gaussian concentration graph models. Developing methods for efficient posterior draws from generalized -Wishart distributions is crucial for scalable Bayesian inference. We proceed to introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm (with Gaussian or GIG conditionals) to simulate from generalized -Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. For example, any cycle of length greater than is Generalized Bartlett, but is not decomposable. Our approach has the flexibility of using multiple shape parameters (as opposed to the single parameter -Wishart), but goes beyond the class of decomposable graphs without losing tractability.

For the generalized -Wishart case, the conditional densities for any maximal clique of are intractable to sample from. Hence, the sampling approaches in [?] for -Wisharts on a general graph do not extend to the generalized -Wishart. On the other hand, we show that the accept-reject and Metropolis-Hastings based methods in [?] and [?] can be easily extended to the generalized -Wishart case. We compare the performance and scalability of these two approaches with our Gibbs sampler in Section ? and Section ?.

The rest of the paper is organized as follows. Section 2 contains a brief overview of relevant concepts from graph theory and matrix theory. In Section 3 and Section 4, we define generalized -Wishart distributions and GB graphs respectively, and establish some basic properties. In Section 5, we derive a tractable Gibbs sampling algorithm to simulate from the generalized -Wishart distribution corresponding to a GB graph. Section 6 provides additional examples and properties of GB graphs. Section 7 contains a comprehensive simulation and real data analysis study for the Bayesian approach developed in the paper. The proofs of most of the technical results in the paper and additional numerical work are provided in the Supplemental Document.

2Preliminaries

2.1Graph theoretic preliminaries

For any positive integer , let . Let denote an undirected graph, where represents the finite vertex set and denotes the corresponding edge set. A function is defined to be an ordering of if is a bijection from to . An undirected graph and an ordering of can be used to construct an ordered graph , where if and only if .

Such graphs are also called triangulated, or chordal graphs. A useful concept associated to decomposable graphs is that of a perfect elimination ordering (see [?]).

In fact, an undirected graph is decomposable if and only if it has a perfect elimination ordering (see [?]).

Decomposable covers are also known as triangulations in graph theory literature (see [?]).

2.2Matrix theoretic preliminaries

We denote the set of symmetric matrices by , and the space of positive definite symmetric matrices by . Given an ordered graph , we define

and

The space is a submanifold of the space of positive definite matrices, where the elements are restricted to be zero whenever the corresponding edge is missing from . Similarly the space is a subspace of lower triangular matrices with diagonal entries equal to , such that the elements in the lower triangle are restricted to be zero whenever the corresponding edge is missing from .

A positive definite matrix can be uniquely expressed as , where is a lower triangular matrix with diagonal entries equal to , and is a diagonal matrix with positive diagonal entries. Such a decomposition is known as the modified Cholesky decomposition of (see for example [?]). [?] showed that if , then if and only if is decomposable and is a perfect elimination ordering. If either of these two conditions is violated, then the sparsity pattern in is a strict subset of the sparsity pattern in . The entries (with ) such that is not (functionally) zero, are known as “fill-in” entries. The problem of finding an ordering which minimizes the number of fill-in entries is well-known and well-studied in numerical analysis and in computer science/discrete mathematics. Although this problem is NP-hard, several effective greedy algorithms for reducing the number of fill-in entries have been developed and implemented in standard software such as MATLAB and R (see [?] for instance).

In subsequent sections, we will consider a reparametrization from (the inverse covariance) matrix to its modified Cholesky decomposition. Such a reparametrization inherently assumes an ordering of the variables. In many applications (such as longitudinal data), a natural ordering is available. In the absence of a natural ordering, one can choose a fill-reducing ordering using one of the available fill-reducing algorithms mentioned previously. We will see that a fill-reducing ordering will help in reducing the computational complexity of proposed Markov chain Monte Carlo procedures.

2.3Undirected graphical models and -Wishart distribution

Let be an undirected graph with , and be an ordering of . The undirected graphical model corresponding to the the ordered graph is the family of distributions

Let be i.i.d. observations from a distribution in . Note that the joint density of given is given by

The -Wishart distribution on is a natural choice of prior for (see [?] and [?]). The density of the -Wishart distribution with parameters and is proportional to

Thus the posterior density of given is proportional to

and corresponds to a -Wishart distribution with parameters and , which implies that the family of -Wishart priors are conjugate for the family of distributions .

3Generalized G-Wishart distributions

In this section we propose a generalization of the -Wishart distribution that is endowed with multiple shape parameters, and contains the -Wishart family as a special case. We shall show in later sections that the flexibility offered by the multiple shape parameters is very useful in high dimensional settings.

3.1Definition

We now define a multiple shape parameter generalization of the -Wishart distribution for a general graph . To do this, we transform the matrix to its Cholesky decomposition. Consider the modified Cholesky decomposition , where is a lower triangular matrix with diagonal entries equal to , and is a diagonal matrix with positive diagonal entries. The (unnormalized) density of the generalized -Wishart distribution with parameters and is given by

We note that other generalizations of the Wishart have also been considered in [?]. It is clear that the -Wishart density arises as a special case of the generalized -Wishart (by considering all the ’s to be equal and noting that ), and that the family of generalized -Wishart distributions defined above is a conjugate family of prior distributions for undirected graphical models. In fact, the posterior density of corresponds to a generalized -Wishart distribution with parameters and .

3.2Some properties of the generalized G-Wishart distribution

We now proceed to derive properties of the generalized -Wishart distribution. To do so, we transform to its modified Cholesky decomposition as defined in Section 2.2.

We define to be the set of functionally independent elements of . Then the transformation is a bijection from to with Jacobian equal to , where for . Then the (unnormalized) generalized -Wishart density for is given by

We first establish sufficient conditions for the density to be proper.

The proof of Theorem ? is provided in Supplemental Section Appendix B.1. Under the conditions in Theorem ?, we will refer to the normalized version of as .

From [?], if follows -Wishart with parameters then for or ,

We now provide an extension of this result for the case of generalized -Wishart distributions.

The proof of Theorem ? is quite detailed and technical and is thus provided in Supplemental Section Appendix B.2. Theorem ? provides a useful tool to monitor convergence of any Markov chain Monte Carlo method for sampling from (and particularly the Gibbs sampler introduced in Section 5) .

We also undertake a comparison between generalized -Wishart distribution and the useful priors introduced by [?] for the case of decomposable graphs. A careful analysis demonstrates that the generalized -Wishart and Letac-Massam priors are quite different for decomposable graphs. The generalized -Wishart coincides with the Letac-Massam Type II Wishart in the special case when is homogeneous. See Supplemental Section ? for details.

4Generalized Bartlett graphs

As discussed earlier, the class of decomposable graphs is endowed with many properties helpful for closed form computation of posterior quantities. The assumption of decomposability can be rather restrictive in higher dimensions, as they constitute a very small fraction of all graphs see (Figure 2). We develop in this section a class of graphs, which is substantially larger than the class of decomposable graphs. We will show in later sections that for this class of graphs, we can generate posterior samples from the generalized G-Wishart, using a tractable Gibbs sampling algorithm.

4.1Preliminaries and Definitions

We now provide the definition of Generalized Bartlett graphs. First consider the following procedure to obtain a decomposable cover (see Section 2.1) of an arbitrary ordered graph .

We use the above algorithm to construct a decomposable cover for as follows. Let , and let denote the edge set of . It follows by construction that the ordering is a perfect vertex elimination scheme for . Hence, is a decomposable cover for . Note that, two different orderings may give rise to different decomposable covers. We now define Generalized Bartlett graphs.

In such a case (i.e., when this property is satisfied), is called a Generalized Bartlett ordering of . When it exists, the Generalized Bartlett ordering may not be unique. The following lemma helps in proving an alternate characterization of Generalized Bartlett graphs, one that does not depend on any ordering of the vertices. The lemma shows that Algorithm ? leads to a collection of minimal decomposable covers, in the sense that the edge set of any decomposable cover of has to contain for some ordering .

Let be a decomposable cover of . Since is decomposable let be the perfect elimination ordering of it. We shall prove inductively that for in , . Since , that will prove this lemma. It is trivial to note that . Let us assume that the claim holds for . Consider any such that . Thus . Since is the perfect elimination ordering for , . Thus which completes the induction step and proves the lemma.

We now provide a second definition of Generalized Bartlett graphs.

If satisfies the Generalized Bartlett property then by definition an ordering of such that any triangle in contains an edge from . In that case we can simply take . On the other hand, let be a decomposable cover of such that every triangle in contains an edge from . By Lemma ?, an ordering of , such that . Thus any triangle in is also a triangle in and hence has an edge in . This makes the Generalized Bartlett ordering and an Generalized Bartlett graph.

In the subsequent arguments we will also refer to Generalized Bartlett graphs as satisfying the Generalized Bartlett property. Some common Generalized Bartlett graphs are:

  1. All decomposable graphs (follows by using in Lemma ?).

  2. Any cycle (see Section 6.1 for a proof). This is the simplest example of a non-decomposable graph which satisfies the Generalized Bartlett property. Also note that cycle is the simplest non-decomposable graph.

  3. Any lattice with less than rows or less than columns (see Section 6.2 for a proof). Such graphs are useful in spatial applications (see Section 7.3).

Figure 1: (left and middle) Examples of Generalized Bartlett graphs. (right) The bipartite graph K_{3,3}: the only 6-vertex connected graph which is not Generalized Bartlett.
Figure 1: (left and middle) Examples of Generalized Bartlett graphs. (right) The bipartite graph : the only 6-vertex connected graph which is not Generalized Bartlett.

It is a natural question to ask how much larger the class of Generalized Bartlett graphs is compared to the class of decomposable graphs. It is quite difficult to obtain a closed form expression for the exact (or approximate) number of connected decomposable graphs (or Generalized Bartlett graphs) with a given number of vertices. However, a list of all possible connected non-isomorphic graphs having at most vertices is available at http://cs.anu.edu.au/~bdm/data/graphs.html. Using this list, we computed the number of decomposable and Generalized Bartlett graphs with at most vertices. Figure 2 provides a graphical comparison of these proportions, and Table ? gives the actual values of these proportions. It is quite clear that the proportion of Generalized Bartlett graphs is much larger than the proportion of decomposable graphs. As expected, the proportions of both classes of graphs decreases as the total number of vertices increases. However, the rate of decrease in the proportions is much larger for decomposable graphs. For example, less than of connected isomorphic graphs with vertices is decomposable, but more than of connected isomorphic graphs with vertices is Generalized Bartlett. In this case the number of Generalized Bartlett graphs are approximately times larger.

Figure 2: Plot comparing the percentage of Generalized Bartlett graphs with decomposable graphs
Figure 2: Plot comparing the percentage
of Generalized Bartlett graphs with
decomposable graphs
Figure 3: A (k+1) \times 3 grid which is a  Generalized Bartlett graph.
Figure 3: A grid which is a
Generalized Bartlett graph.
Order No. of graphs Ratio
Decom posable Gen. Bart.
2 1 100 100 100%
3 2 100 100 100%
4 6 83 100 83%
5 21 71 100 71%
6 112 52 99 53%
7 853 32 98 32%
8 11117 15 97 15%
9 261080 4.5 94 5%
10 11716571 0.9 86 1%

Figure 4: A 4 \times 4 grid where the dotted lines represent the extra edges in its Generalized Bartlett cover.
Figure 4: A grid where the dotted lines represent the extra edges in its Generalized Bartlett cover.

4.2Clique Sum Property of Generalized Bartlett graphs

An unordered graph is said to have a decomposition into components and if the vertex set can be decomposed as where such that the induced subgraph on is complete, and separates from (i.e., if and then ). If cannot be decomposed in the above manner it is called a prime graph. Hence any graph is either prime or can be broken down into several prime components by repeated application of the above procedure. It is well known that a graph is decomposable iff all of its prime components are complete. The following lemma provides a similar characterization for Generalized Bartlett graphs.

Note that, it is enough to prove the theorem, for a graph with two prime components. Let be an undirected graph with such that induced subgraph on is complete, and separates and . Let and be the corresponding induced subgraphs which are Generalized Bartlett. Then by Lemma 2, we can construct decomposable covers and of and respectively, such that every triangle in contains an edge in for . Define and . Note that is a cover for . We claim that is decomposable. Suppose to the contrary, that for for some there exists an induced cycle in on a set of vertices, say . Since and are both decomposable, all of cannot belong to exclusively in or in . Hence, there exist such that and . Since and are separated by , . Thus the subgraph induced by on the set of vertices contains two paths, both arising from and ending in and intersecting no where in between. Let be one of those paths. If , then there exists points in and connected to each other in , which is not possible. Thus s.t. . Similarly corresponding to the second path. Since is complete . Hence is a chord in the cycle giving us a contradiction. Hence, is a decomposable cover of .

To prove that is a Generalized Bartlett graph we shall prove that every triangle in the decomposable cover contains an edge in . Let us assume to the contrary that for , but . Since every triangle in has at least one edge from for , it follows that all cannot belong exclusively to or . Without loss of generality, let and . This implies giving us a contradiction. Thus is Generalized Bartlett.

4.3Constructing Generalized Bartlett covers

Given an ordered graph , Algorithm ? below provides a Generalized Bartlett graph such that . Such a graph is referred to as a Generalized Bartlett cover of .

Recall from Section 6.2 that a grid is the smallest example of a grid which is not a Generalized Bartlett graph. The Generalized Bartlett cover for a grid using Algorithm ? is provided in Figure 4. Note that this cover has only three extra edges.

5A tractable Gibbs sampler for generalized -Wisharts

In Section 4 we studied the graph theoretic properties of GB graphs. In this section we shall investigate the statistical/properties of GB graphs. In particular, we develop a tractable Gibbs sampling algorithm to sample from the generalized -Wishart distribution when the underlying graph is Generalized Bartlett. The first step in achieving this goal requires considering a further transformation of the Cholesky parameter from Section 3.2.

5.1A reparametrization of the Cholesky parameter

Let be an undirected graph with , and an ordering for . Let be the modified Cholesky decomposition of . To facilitate our analysis, we consider a one-to-one transformation of defined as follows:

where and for . The following lemma shows that terms of the form can be expressed as a polynomial in the entries of and (with negative powers allowed for entries of ).

: Note that

for all , and the Jacobian of this transformation is . Hence, the posterior density of is proportional to

where for . Note that if and , then , which implies

Making repeated substitutions in the RHS of the above equation, it follows that the entry is either functionally zero, or can be expressed as a sum of terms, where each term looks like

for suitable non-negative integers , and non-positive integers . It is easy to see that

Hence, can be expressed as a polynomial in entries of and . The results now follows from .

Note that for every with , the functionally dependent entry can be expressed in terms of and as in (Equation 4). The above analysis indicates that, in general, the posterior is a complicated function of . However, we will show that if is a Generalized Bartlett graph, and is a Generalized Bartlett ordering for , then the full conditional posterior distributions of all individual entries of are either Gaussian or Generalized Inverse Gaussian distributions (and therefore easy to sample from). This property will then be used to derive a Gibbs sampling algorithm to sample from the posterior density in (Equation 3).

5.2The Gibbs sampler

We now derive a Gibbs sampling algorithm to sample from the posterior density in (Equation 3). We start by defining two properties which will be crucial to the development of the Gibbs sampler.

We now state three lemmas which will be useful in our analysis. The first lemma provides an equivalent formulation of Property-B. The proofs of these lemmas are provided in Supplemental Section Appendix B.4, Appendix B.5 and Appendix B.6 respectively.

Let with . As noted above, for , both and can be expressed as polynomials in entries of and (with negative powers allowed for entries of ).

The next lemma shows that Generalized Bartlett graphs satisfies Properties A and B.

We are now ready to state and prove the main result of this section, which provides a Gibbs sampler for the posterior density in (Equation 3).

Note that for , can be expressed as a polynomial in the entries of and . Recall that is the decomposable cover obtained by the triangulation process described in Algorithm ?. We first establish that for , is functionally non-zero iff . We begin by noticing that at each step in the construction of we are adding some extra edges to to get , but never deducting anything. So if was an independent entry, i.e. , then .

Now lets assume to the contrary that is the first dependent but functionally non-zero entry s.t. . Since is dependent but non-zero s.t. appears in the expansion of . Now can be independent and hence . Otherwise is non-zero dependent. Since (the first non-zero dependent not in ) comes after , . We recall that, for any , while constructing , we only join vertices higher than . Thus must have been joined before construction of , i.e. . By a similar argument . Thus and we have a contradiction. Hence,

To prove the reverse implication we note that for , implies that is independent and hence . Now we use induction and assume that the claim holds upto . If for , then and hence appears in the expansion of . Since , and by assumption and which with the help of Lemma ? implies . Thus the assumption holds for , which completes the induction step.

It follows by (Equation 3) and Property-A that for every , the conditional posterior density of given all other entries of and is proportional to

for appropriate constants and . Hence the conditional posterior density of is a Gaussian density. Similarly, it follows from (Equation 3) and Property-B that for every , the conditional posterior density of given all entries of and is proportional to

for appropriate constants and . Hence the conditional posterior density of is Gamma, and for , the conditional posterior density of is a Generalized Inverse Gaussian density.

The results in Theorem ? can be used to construct a Gibbs sampling algorithm, where the iterations involve sequentially sampling from the conditional densities of each element of . It is well known that the joint posterior density of is invariant for the Gibbs transition density. Since the Gaussian density is supported on the entire real line, and the Generalized Inverse Gaussian density is supported on the entire positive real line, it follows that the Markov transition density of the Gibbs sampler is strictly positive. Hence, the corresponding Markov chain is aperiodic and -irreducible where is the Lebesgue measure on ([?], Pg 87). Also, the existence of an invariant probability density together with -irreducibility imply that the chain is positive Harris recurrent (see [?] for instance). We formalize the convergence of our Gibbs sampler below. The following lemma on the convergence of the Gibbs sampling Markov chain facilitates computation of expected values for generalized -Wishart distributions.

5.3Maximality of Generalized Bartlett graphs

Note that the Gibbs sampling algorithm described in Theorem ? is feasible only if Property-A and Property-B hold. The following theorem shows that if a graph is not Generalized Bartlett, then at least one of Property-A and Property-B does not hold.

Suppose there exists s.t. but i.e. . Hence is in the expansion of . The power of in the expansion of and is . . Also we know that, no term of can cancel with any term of . Hence in the expansion of the power of is . Thus Property-B is violated. The result now follows by Definition ?.

Theorem ? demonstrates that the class of Generalized Bartlett graphs is maximal, in the sense that the conditional distributions considered in Theorem ? are Gaussian/Generalized-Inverse-Gaussian only if the underlying graph is Generalized Bartlett. In other words the above tractability is lost for graphs outside the Generalized Bartlett class.

5.4Improving efficiency using decomposable subgraphs

It is generally expected that ‘blocking’ or ‘grouping’ improves the speed of convergence of Gibbs samplers (see [?]). Suppose follows a generalized -Wishart distribution. In this section, we will show that under appropriate conditions, the conditional density of a block of variables in (given the other variables) is multivariate normal. Based on the discussion above, this result can be used to sample more efficiently from the joint density of .

We partition the matrix as

where has dimension and correspondingly,

Note that the density of is proportional to

A sample calculation gives:

Consider such that . Since the induced subgraph on is a decomposable graph with a perfect elimination ordering, there does not exist such that . It follows that is a function of entries in . Hence, all the dependent entries in are functions of . It follows by (Equation 6) that given , is a quadratic form in the independent entries of . Hence, the log of the conditional density of given the other entries in is a quadratic form. This proves the required result.

5.5Closed form expressions for decomposable graphs

A closed form expression for the mean of can be obtained if is assumed to be a decomposable graph. For and positive definite, the generalized -Wishart density on is,

Let us define,

Also let be the diagonal matrix with -th element as and is a matrix whose -th element is if , is if and otherwise. The following theorem provides closed form expectations of the elements of the matrix .

The proof is provided in the Supplemental Section Appendix B.3.

6Classes of Generalized Bartlett Graphs

As mentioned earlier, the class of Generalized Bartlett graphs contains the class of decomposable graphs. In this section we will consider two naturally occurring examples of non-decomposable Generalized Bartlett graphs. We then provide schemes for combining a group of Generalized Bartlett graphs to produce a bigger Generalized Bartlett graph.

6.1The -cycle

We show that the -cycle (with its standard ordering) satisfies Property-A and Property-B, and is hence a Generalized Bartlett graph. Let , where , is the identity permutation and

The independent entries of are . After some straightforward algebraic manipulations, the dependent entries of can be calculated as follows:

It is clear from the expressions in the above equation that Property-A and Property-B are satisfied and hence by Theorem 4, the -cycle is Generalized Bartlett.

6.2Grids

A grid is an undirected graph formed by the intersection of rows and columns where the vertices correspond to the intersection points and as a result edges are formed. In this section we shall prove that for some particular ordering all and grid are Generalized Bartlett. We order an grid row wise starting from the top as shown in Figure 3.

Let be an ordered graph, , and denote the modified Cholesky decomposition of . Note that Property-A and Property-B have been defined for ordered graphs, but we extend these notions to polynomials as follows. For any polynomial of , we say that satisfies Property-A if the power of any independent can be . Similarly we say satisfies Property-B if the power of any can be . We note that if and satisfies Property-B then also satisfies Property-B.

6.3Expansion property of Generalized Bartlett graphs

In this section we develop two methods, which combine an arbitrary number of Generalized Bartlett graphs in a suitable manner to produce a larger Generalized Bartlett graph.

Maximum vertex based expansion

We start by proving a lemma which will be useful for further analysis.

Since is Generalized Bartlett, let be the decomposable cover of such that any triangle in has at least one edge in . Let be the induced subgraph of for . Then is decomposable since it is a induced subgraph of a decomposable graph. Also any triangle in is a triangle in and thus has atleast one edge in and hence in . Thus by Lemma ?, is Generalized Bartlett. Moreover from the proof of Lemma ? we can observe that if is the Generalized Bartlett ordering for then the same ordering works for .

Let be a Generalized Bartlett graph with, . Suppose we replace each vertex of , by a Generalized Bartlett graph , where for , . Here and are the sizes of the graphs. Note that the graphs being considered here are already ordered. For ease of exposition, we will suppress the ordered graph notation, and refer to the graphs as just .

Hence is constructed from by replacing the -th vertex of by . An edge between and in translates to an edge between the maximal vertices of and namely and . For any , if , the notation shall denote , and shall denote

The proof this theorem is given in the Supplemental Section Appendix B.7.

Tree based expansion

Consider a tree with vertices . For each consider an arbitrary number of GB graphs say . We add an edge from to each vertex in for every . Denote the resulting graph by . Next the vertices in are labeled in the following order.

The labeling is done in such a way that the induced ordering on each is a GB ordering and every parent vertex in gets a higher label than any of its children in . Again for ease of exposition we will suppress the ordering notation and refer to the resulting ordered graph as .

The proof of this theorem is provided in the Supplemental Section Appendix B.8.

7Illustrations and Applications

We now illustrate the advantages of our Generalized Bartlett approach on both simulated and real data and demonstrate that the proposed GB method is scalable to significantly higher dimensions. In Section 7.1, we illustrate the advantage of having multiple shape parameters in the generalized -Wishart distribution. In Section ? and Section ?, we undertake a comparison of our algorithm with the accept-reject and Metropolis-Hastings approaches. Section 7.3 contains a real data analysis using data from a temperature study. Although the main focus of this paper is development of the flexible class of -Wishart distributions, and tractable methods to sample from these distributions, we also illustrate that the methods developed in this paper can be used for high-dimensional graphical model selection in conjunction with existing penalized likelihood methods (see Supplemental Sections Appendix D and Appendix E).

7.1Comparing -Wishart with generalized -Wisharts

In this section, we present a simulation experiment to demonstrate that the multiple shape parameters in the generalized -Wishart distribution can yield differential shrinkage and improved estimation as compared to the single parameter -Wishart in higher dimensional setting.

For the purposes of this experiment we consider a Generalized Bartlett graph with vertices, defined as follows. Let,

A graph is constructed by forming the -cycle and then connecting with all elements of for . An inverse covariance matrix is then constructed by taking , where if . Here is a lower triangular matrix with independent entries equal to , and dependent entries chosen such that .

We then generate samples from a