On the Geometric Ergodicity of Two-Variable Gibbs Samplers
A Markov chain is geometrically ergodic if it converges to its invariant distribution at a geometric rate in total variation norm. We study geometric ergodicity of deterministic and random scan versions of the two-variable Gibbs sampler. We give a sufficient condition which simultaneously guarantees both versions are geometrically ergodic. We also develop a method for simultaneously establishing that both versions are subgeometrically ergodic. These general results allow us to characterize the convergence rate of two-variable Gibbs samplers in a particular family of discrete bivariate distributions.
Jones is partially supported by the National Institutes of Health. Hobert is partially supported by the National Science Foundation.
AMS subject classifications: Primary 60J10; secondary 62F15.
Keywords and phrases: Geometric ergodicity, Gibbs sampler, Markov chain, Monte Carlo.
1 Introduction

Let $\pi$ be a probability distribution having support $\mathsf{X} \times \mathsf{Y}$, and let $f(x \mid y)$ and $f(y \mid x)$ denote the associated conditional densities. We assume it is possible to simulate directly from $f(x \mid y)$ and $f(y \mid x)$. Then there are two Markov chains having $\pi$ as their invariant distribution, each of which could be called a two-variable Gibbs sampler (TGS). The most common version of a TGS is the deterministic scan Gibbs sampler (DGS), which is now described. Suppose the current state of the chain is $(X_n, Y_n) = (x, y)$; then the next state, $(X_{n+1}, Y_{n+1})$, is obtained as follows.
Iteration of DGS:
1. Draw $X_{n+1} \sim f(\cdot \mid y)$, and call the observed value $x'$.
2. Draw $Y_{n+1} \sim f(\cdot \mid x')$.
An alternative TGS is the random scan Gibbs sampler (RGS). Fix $r \in (0, 1)$ and suppose the current state of the RGS chain is $(X_n, Y_n) = (x, y)$. Then the next state, $(X_{n+1}, Y_{n+1})$, is obtained as follows.
Iteration of RGS:
1. Draw $U \sim \mathrm{Uniform}(0, 1)$.
2. If $U \le r$, then draw $X' \sim f(\cdot \mid y)$ and set $(X_{n+1}, Y_{n+1}) = (X', y)$.
3. If $U > r$, then draw $Y' \sim f(\cdot \mid x)$ and set $(X_{n+1}, Y_{n+1}) = (x, Y')$.
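To make the two updating schemes concrete, here is a minimal Python sketch for a toy target, the standard bivariate normal with correlation $\rho$, whose full conditionals are the normal distributions used below. The target and all function names are our own illustrative choices, not part of the development above.

```python
import random
import math

RHO = 0.9  # correlation of the illustrative bivariate normal target

def draw_x_given_y(y):
    # X | Y = y ~ N(rho * y, 1 - rho^2) for the standard bivariate normal
    return random.gauss(RHO * y, math.sqrt(1 - RHO ** 2))

def draw_y_given_x(x):
    # Y | X = x ~ N(rho * x, 1 - rho^2)
    return random.gauss(RHO * x, math.sqrt(1 - RHO ** 2))

def dgs_step(x, y):
    # Deterministic scan: update x from f(x|y), then y from f(y|x').
    x_new = draw_x_given_y(y)
    y_new = draw_y_given_x(x_new)
    return x_new, y_new

def rgs_step(x, y, r=0.5):
    # Random scan: with probability r refresh x only, else refresh y only.
    if random.random() <= r:
        return draw_x_given_y(y), y
    return x, draw_y_given_x(x)

def run(step, n, x0=0.0, y0=0.0):
    # Simulate n transitions and return the whole path.
    path = [(x0, y0)]
    for _ in range(n):
        path.append(step(*path[-1]))
    return path
```

Both samplers leave the bivariate normal invariant; they differ only in whether one or both coordinates are refreshed per iteration.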
Despite their simple structure, these algorithms are widely applicable in the posterior analysis of complex Bayesian models. A TGS also arises naturally when $\pi$ is created via data augmentation techniques (Hobert, 2011; Tanner and Wong, 1987).
Inference based on $\pi$ often requires calculation of an intractable expectation. Let $g : \mathsf{X} \times \mathsf{Y} \to \mathbb{R}$ and let $E_\pi g$ denote the expectation of $g$ with respect to $\pi$. If a TGS Markov chain $\{(X_n, Y_n)\}$ is ergodic (see Tierney, 1994) and $E_\pi |g| < \infty$, then, with probability 1,
$$\bar{g}_n := \frac{1}{n} \sum_{i=1}^{n} g(X_i, Y_i) \to E_\pi g \quad \text{as } n \to \infty.$$
Thus estimation of $E_\pi g$ is simple. However, the estimator will be more valuable if we can attach an estimate of the unknown Monte Carlo error $\bar{g}_n - E_\pi g$. An approximation to the sampling distribution of the Monte Carlo error is available when a Markov chain central limit theorem (CLT) holds:
$$\sqrt{n}\,(\bar{g}_n - E_\pi g) \stackrel{d}{\to} N(0, \sigma_g^2) \quad \text{as } n \to \infty,$$
with $\sigma_g^2 \in (0, \infty)$. The variance $\sigma_g^2$ accounts for the serial dependence in a TGS Markov chain, and consistent estimation of it requires specialized techniques such as batch means, spectral variance methods, or regenerative simulation. Let $\hat{\sigma}_n^2$ be an estimator of $\sigma_g^2$. If, with probability 1, $\hat{\sigma}_n^2 \to \sigma_g^2$ as $n \to \infty$, then an asymptotically valid Monte Carlo standard error is $\hat{\sigma}_n / \sqrt{n}$. These tools allow the practitioner to use the results of a TGS simulation with the same level of confidence one would have if the observations were a random sample from $\pi$. For more on this approach the interested reader can consult Geyer (1992), Geyer (2011), Flegal, Haran and Jones (2008), Flegal and Jones (2010), Flegal and Jones (2011), Hobert et al. (2002), Jones et al. (2006), and Jones and Hobert (2001).
The CLT will obtain if $E_\pi |g|^{2 + \delta} < \infty$ for some $\delta > 0$ and the Markov chain is rapidly mixing (Chan and Geyer, 1994). In particular, we require that the Markov chain be geometrically ergodic; that is, that it converge to the target distribution in total variation norm at a geometric rate. Under these same conditions, methods such as batch means and regenerative simulation provide strongly consistent estimators of $\sigma_g^2$. Thus establishing geometric ergodicity is a key step in ensuring the reliability of a TGS as a method for estimating features of $\pi$.
The convergence rate of DGS Markov chains has received substantial attention. In particular, sufficient conditions for geometric ergodicity have been developed for several DGS chains for practically relevant statistical models (see e.g. Hobert and Geyer, 1998; Johnson and Jones, 2010; Jones and Hobert, 2004; Marchev and Hobert, 2004; Roberts and Polson, 1994; Roberts and Rosenthal, 1999; Román and Hobert, 2011; Román, 2012; Rosenthal, 1996; Roy and Hobert, 2007; Tan and Hobert, 2009). In contrast, the convergence rates of RGS chains have received almost no attention, even though random scan algorithms are often useful in practice. Liu, Wong and Kong (1995) did investigate geometric convergence of RGS chains, but the required regularity conditions are daunting and, to our knowledge, have never been verified for practically relevant statistical models. Recently, Johnson, Jones and Neath (2011) gave conditions which simultaneously establish geometric ergodicity of both the DGS chain and the corresponding RGS chain. These authors also conjectured that if the RGS chain is geometrically ergodic, then so is the DGS chain; that is, that the qualitative convergence properties of TGS chains coincide. We are not able to resolve this conjecture in general, but in our main application (see Section 5) this is indeed the case.
A TGS chain which converges subgeometrically (i.e., more slowly than any geometric rate) is not as useful as one which is geometrically ergodic, although with additional moment conditions it is still possible for a CLT to hold (Jones, 2004). Thus it would be useful to have criteria for establishing subgeometric convergence. We are unaware of any previous work investigating subgeometric convergence of TGS Markov chains.
In the rest of this paper, we extend the results of Johnson, Jones and Neath (2011) and provide a condition which can be used to simultaneously establish geometric ergodicity of DGS and RGS Markov chains. We then turn our attention to developing a condition which ensures that both the DGS and RGS chains converge subgeometrically. Finally, we apply our results to a class of bivariate distributions for which we are able to characterize the convergence properties of the DGS and RGS chains. We begin, however, with some Markov chain background and a formal definition of the Markov chains we study.
2 Background and Notation
Let $\mathsf{X}$ be a topological space and let $\mathcal{B}(\mathsf{X})$ denote its Borel $\sigma$-algebra. Also, let $\Phi = \{\Phi_n : n = 0, 1, 2, \ldots\}$ be a Markov chain having Markov transition kernel $P$. That is, $P : \mathsf{X} \times \mathcal{B}(\mathsf{X}) \to [0, 1]$ is such that for each $A \in \mathcal{B}(\mathsf{X})$, $P(\cdot, A)$ is a nonnegative measurable function, and for each $x \in \mathsf{X}$, $P(x, \cdot)$ is a probability measure. As usual, $P$ acts to the left on measures, so that if $\mu$ is a measure on $\mathcal{B}(\mathsf{X})$ and $A \in \mathcal{B}(\mathsf{X})$, then
$$\mu P(A) = \int_{\mathsf{X}} P(x, A)\, \mu(dx).$$
For any $n \in \{1, 2, 3, \ldots\}$, the $n$-step Markov transition kernel is given by
$$P^n(x, A) = \int_{\mathsf{X}} P^{n-1}(y, A)\, P(x, dy), \qquad P^1 = P.$$
Let $\pi$ be an invariant probability measure for $P$, that is, $\pi P = \pi$. Denote the total variation norm by $\|\cdot\|$. If $\Phi$ is ergodic, then for all $x \in \mathsf{X}$ we have $\|P^n(x, \cdot) - \pi(\cdot)\| \to 0$ as $n \to \infty$. Our goal is to study the rate of this convergence. Suppose there exist a real-valued function $M$ on $\mathsf{X}$ and a constant $t \in (0, 1)$ such that for all $x \in \mathsf{X}$ and all $n$,
$$\|P^n(x, \cdot) - \pi(\cdot)\| \le M(x)\, t^n. \qquad (1)$$
If (1) holds, then $\Phi$ is geometrically ergodic; otherwise it is subgeometrically ergodic.
2.1 Two-variable Gibbs samplers
In this section we define the Markov kernels associated with the DGS and RGS chains described in Section 1. We also introduce a third Markov chain which will prove crucial to our study of the other Markov chains.
Recall that $\pi$ is a probability distribution having support $\mathsf{X} \times \mathsf{Y}$. Let $f(x, y)$ be a density of $\pi$ with respect to a measure $\mu = \mu_1 \times \mu_2$. Then the marginal densities are given by
$$f_X(x) = \int_{\mathsf{Y}} f(x, y)\, \mu_2(dy),$$
and similarly for $f_Y$. The conditional densities are $f(x \mid y) = f(x, y) / f_Y(y)$ and $f(y \mid x) = f(x, y) / f_X(x)$.
Consider the DGS Markov chain $\Phi_{DGS} = \{(X_n, Y_n) : n = 0, 1, 2, \ldots\}$ and let
$$k_{DGS}(x', y' \mid x, y) = f(x' \mid y)\, f(y' \mid x').$$
Then the Markov kernel for $\Phi_{DGS}$ is defined by
$$P_{DGS}\big((x, y), A\big) = \int_A k_{DGS}(x', y' \mid x, y)\, \mu(dx' \times dy'), \qquad A \in \mathcal{B}(\mathsf{X} \times \mathsf{Y}).$$
It is well known that the two marginal sequences comprising $\Phi_{DGS}$, namely $\{X_n\}$ and $\{Y_n\}$, are themselves Markov chains (Liu, Wong and Kong, 1994). We now consider the marginal sequence $\Phi_X = \{X_n : n = 0, 1, 2, \ldots\}$ and define
$$k_X(x' \mid x) = \int_{\mathsf{Y}} f(x' \mid y)\, f(y \mid x)\, \mu_2(dy).$$
The Markov kernel for $\Phi_X$ is
$$P_X(x, A) = \int_A k_X(x' \mid x)\, \mu_1(dx'), \qquad A \in \mathcal{B}(\mathsf{X}).$$
Note that $P_{DGS}$ has $\pi$ as its invariant distribution, while $P_X$ has the marginal $f_X$ as its invariant distribution.
Finally, consider the RGS Markov chain $\Phi_{RGS} = \{(X_n, Y_n) : n = 0, 1, 2, \ldots\}$. Let $r \in (0, 1)$ and let $\delta$ denote Dirac's delta. Define
$$k_{RGS}(x', y' \mid x, y) = r\, f(x' \mid y)\, \delta(y' - y) + (1 - r)\, f(y' \mid x)\, \delta(x' - x).$$
Then the Markov kernel for $\Phi_{RGS}$ is
$$P_{RGS}\big((x, y), A\big) = \int_A k_{RGS}(x', y' \mid x, y)\, \mu(dx' \times dy').$$
It is easy to show via direct computation that $\pi$ is invariant for $P_{RGS}$.
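For completeness, the direct computation runs as follows (treating the integration against the Dirac deltas formally):

```latex
\begin{align*}
\int \big[\, r f(x' \mid y)\,\delta(y'-y) + (1-r) f(y' \mid x)\,\delta(x'-x) \,\big]
     f(x, y)\, \mu(dx \times dy)
&= r\, f(x' \mid y') \int_{\mathsf{X}} f(x, y')\, \mu_1(dx)
   + (1-r)\, f(y' \mid x') \int_{\mathsf{Y}} f(x', y)\, \mu_2(dy) \\
&= r\, f(x' \mid y')\, f_Y(y') + (1-r)\, f(y' \mid x')\, f_X(x') \\
&= r\, f(x', y') + (1-r)\, f(x', y') \;=\; f(x', y') ,
\end{align*}
```

so a single random scan update maps the density of $\pi$ back to itself.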
It is well known that $\Phi_{DGS}$ and $\Phi_X$ converge to their respective invariant distributions at the same rate (Diaconis, Khare and Saloff-Coste, 2008; Liu, Wong and Kong, 1994; Robert, 1995; Roberts and Rosenthal, 2001). Thus if one is geometrically ergodic, then so is the other. This relationship has been routinely exploited in the study of TGS chains for practically relevant statistical models (cf. Hobert and Geyer, 1998; Johnson and Jones, 2010; Jones and Hobert, 2004; Roy and Hobert, 2007; Tan and Hobert, 2009), since one of the two chains may be easier to analyze than the other. Recently, Johnson, Jones and Neath (2011) connected the geometric ergodicity of $\Phi_{RGS}$ to that of $\Phi_X$. Thus establishing geometric ergodicity of TGS algorithms often comes down to analyzing $\Phi_X$. This is exactly the approach we take in Sections 3 and 5.
3 Conditions for Geometric Ergodicity
In this section we develop general conditions which ensure that $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ are geometrically ergodic. First we need a couple of concepts from Markov chain theory. Recall the notation from Section 2; that is, $P$ is a Markov kernel on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$. Then $P$ is Feller if for any open set $O \in \mathcal{B}(\mathsf{X})$, $P(\cdot, O)$ is a lower semicontinuous function. The Markov kernel acts to the right on functions, so that for measurable $h : \mathsf{X} \to \mathbb{R}$,
$$P h(x) = \int_{\mathsf{X}} h(y)\, P(x, dy).$$
A drift condition holds if there exist a function $V : \mathsf{X} \to [0, \infty)$ and constants $\lambda \in (0, 1)$ and $L < \infty$ satisfying
$$P V(x) \le \lambda V(x) + L \quad \text{for all } x \in \mathsf{X}. \qquad (2)$$
Recall that a function $V$ is said to be unbounded off compact sets if the sublevel set $\{x \in \mathsf{X} : V(x) \le d\}$ is compact for every $d > 0$. If $P$ is Feller, $V$ is unbounded off compact sets, and $V$ satisfies (2), then $\Phi$ is geometrically ergodic. See Meyn and Tweedie (1993) and Roberts and Rosenthal (2004) for details; Jones and Hobert (2001) give an introduction to the use of drift conditions.
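As a simple illustration of these ingredients (a toy example of our own, not one taken from the references above), consider the AR(1) chain $X_{n+1} = \rho X_n + \varepsilon_{n+1}$ on $\mathbb{R}$, where $|\rho| < 1$ and the $\varepsilon_n$ are i.i.d. $N(0, 1)$. Taking $V(x) = |x|$,

```latex
P V(x) = E\,\lvert \rho x + \varepsilon_1 \rvert
       \le \lvert \rho \rvert\, \lvert x \rvert + E\lvert \varepsilon_1 \rvert
       = \lvert \rho \rvert\, V(x) + \sqrt{2/\pi},
```

so the drift condition (2) holds with $\lambda = |\rho|$ and $L = \sqrt{2/\pi}$. Since $V$ is unbounded off compact sets and the Gaussian transition kernel is easily seen to be Feller, the chain is geometrically ergodic.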
3.1 Two-variable Gibbs samplers
Johnson, Jones and Neath (2011) gave a set of conditions which simultaneously imply that $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ are geometrically ergodic. We build on their work and show how a drift condition for $P_X$ naturally provides a drift condition for $P_{RGS}$. This allows us to develop an alternative set of conditions which are sufficient for the geometric ergodicity of all three chains. The application of this method is illustrated in Section 5.
The following result was essentially proved by Johnson, Jones and Neath (2011), but it was not stated in their paper; see Johnson (2009) for related material. Thus we provide a proof for the sake of completeness. First we set some notation. Suppose $V : \mathsf{X} \to [0, \infty)$ and let
$$g(y) = \int_{\mathsf{X}} V(x)\, f(x \mid y)\, \mu_1(dx).$$
Also, for $c > 0$ define
$$V_c(x, y) = V(x) + c\, g(y). \qquad (3)$$

Lemma 1. Suppose there exist constants $\lambda \in (0, 1)$ and $L < \infty$ such that for all $x \in \mathsf{X}$,
$$P_X V(x) \le \lambda V(x) + L.$$
If $c > r/(1 - r)$ and $c < r/[\lambda(1 - r)]$, then there exists $\rho \in (0, 1)$ such that for all $(x, y) \in \mathsf{X} \times \mathsf{Y}$,
$$P_{RGS} V_c(x, y) \le \rho\, V_c(x, y) + (1 - r)\, c L.$$

Proof. A random scan update either replaces $x$ by a draw from $f(\cdot \mid y)$ (with probability $r$) or replaces $y$ by a draw from $f(\cdot \mid x)$ (with probability $1 - r$), and $E[\, g(Y') \mid x \,] = P_X V(x)$. Hence, using the drift condition for $P_X$,
$$P_{RGS} V_c(x, y) = r\,[\, g(y) + c\, g(y) \,] + (1 - r)\,[\, V(x) + c\, P_X V(x) \,] \le r (1 + c)\, g(y) + (1 - r)(1 + c \lambda)\, V(x) + (1 - r)\, c L.$$
Set $\rho = \max\{\, r(1 + c)/c,\ (1 - r)(1 + c \lambda) \,\}$, so that
$$P_{RGS} V_c(x, y) \le \rho\,[\, V(x) + c\, g(y) \,] + (1 - r)\, c L = \rho\, V_c(x, y) + (1 - r)\, c L.$$
All that remains is to show that $\rho < 1$. Now $r(1 + c)/c < 1$ since $c > r/(1 - r)$, and $(1 - r)(1 + c \lambda) < 1$ since $c < r/[\lambda(1 - r)]$; note that the interval $\big( r/(1 - r),\ r/[\lambda(1 - r)] \big)$ is nonempty because $\lambda < 1$. ∎
The following is an easy consequence of Lemma 1 and the material stated at the beginning of this section.
Proposition 1. Suppose $P_X$ and $P_{RGS}$ are Feller. If there exists a function $V : \mathsf{X} \to [0, \infty)$ such that both $V$ and the corresponding $V_c$ (as defined at (3)) are unbounded off compact sets, and there exist constants $\lambda \in (0, 1)$ and $L < \infty$ such that for all $x \in \mathsf{X}$
$$P_X V(x) \le \lambda V(x) + L,$$
then $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ are geometrically ergodic.
4 Conditions for Subgeometric Convergence
Our goal in this section is to develop a condition which ensures that $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ converge subgeometrically, but first we need a few concepts from general Markov chain theory. Recall the notation of Section 2. In particular, $P$ is a Markov kernel on $(\mathsf{X}, \mathcal{B}(\mathsf{X}))$ having invariant distribution $\pi$. A Markov kernel defines an operator on the space of measurable functions that are square integrable with respect to the invariant distribution, denoted $L^2(\pi)$. Also, let
$$L_0^2(\pi) = \Big\{ h \in L^2(\pi) : \int_{\mathsf{X}} h \, d\pi = 0 \Big\}.$$
For $h, g \in L^2(\pi)$, define the inner product as
$$\langle h, g \rangle = \int_{\mathsf{X}} h(x)\, g(x)\, \pi(dx),$$
and $\|h\|^2 = \langle h, h \rangle$. The norm of the operator $P : L_0^2(\pi) \to L_0^2(\pi)$ is
$$\|P\| = \sup_{h \in L_0^2(\pi),\ \|h\| = 1} \|P h\|.$$
If $P$ is symmetric with respect to $\pi$, that is, if
$$P(x, dy)\, \pi(dx) = P(y, dx)\, \pi(dy),$$
then $P$ is self-adjoint, so that $\langle P h, g \rangle = \langle h, P g \rangle$. If $P$ is $\pi$-symmetric, then $\Phi$ is geometrically ergodic if and only if $\|P\| < 1$ (Roberts and Rosenthal, 1997). Moreover, if $h \in L_0^2(\pi)$ and $\|h\| = 1$, then
$$\|P h\|^2 = \langle P^2 h, h \rangle = \int_{\mathsf{X}} \Big( \int_{\mathsf{X}} h(y)\, P^2(x, dy) \Big)\, h(x)\, \pi(dx).$$
The first equality is a property of self-adjoint operators, while the second equality follows directly from the definition of the inner product.
4.1 Two-variable Gibbs samplers
It is easy to see that $P_X$ is $f_X$-symmetric and $P_{RGS}$ is $\pi$-symmetric, but $P_{DGS}$ is not $\pi$-symmetric. Because $P_X$ and $P_{RGS}$ are symmetric, the operator theory described above applies. In particular, if $h \in L_0^2(f_X)$ and $\|h\| = 1$, then
$$\|P_X h\|^2 = \langle P_X^2 h, h \rangle,$$
while if $h \in L_0^2(\pi)$ and $\|h\| = 1$, then
$$\|P_{RGS} h\|^2 = \langle P_{RGS}^2 h, h \rangle.$$
Note that despite our use of $\|\cdot\|$ for both operator norms, the two norms are different since they are based on different spaces.
If we can show that $\|P_X\| = 1$, then we will be able to conclude that $\Phi_X$, $\Phi_{DGS}$, and $\Phi_{RGS}$ are subgeometrically ergodic. First, we need convenient characterizations of the operator norms.
Lemma 2. The operator norms satisfy
$$\|P_{RGS}\| \ge r\, \|P_X\| + (1 - r).$$
In particular, if $\|P_X\| = 1$, then $\|P_{RGS}\| = 1$.

Proof. Suppose $h \in L_0^2(f_X)$ and $\|h\| = 1$, and write $(F h)(y) = E[\, h(X) \mid Y = y \,]$. Then
$$\langle P_X h, h \rangle = \int_{\mathsf{X}} (P_X h)(x)\, h(x)\, f_X(x)\, \mu_1(dx) = \langle F h, F h \rangle = \|F h\|^2.$$
In the above, the second equality follows from Liu, Wong and Kong (1994, Lemma 3.2) and the last equality holds since for $g \in L^2(f_Y)$, $\langle g, g \rangle = \|g\|^2$. Thus $P_X$ is a positive operator, and hence
$$\|P_X\| = \sup\{\, \langle P_X h, h \rangle : h \in L_0^2(f_X),\ \|h\| = 1 \,\}.$$
Now consider $P_{RGS}$. Suppose $h \in L_0^2(f_X)$ with $\|h\| = 1$, and set $\tilde{h}(x, y) = h(x)$, so that $\tilde{h} \in L_0^2(\pi)$ and $\|\tilde{h}\| = 1$. Then
$$\langle P_{RGS} \tilde{h}, \tilde{h} \rangle = r\, \|F h\|^2 + (1 - r)\, \|h\|^2 = r\, \langle P_X h, h \rangle + (1 - r).$$
Now since $\|P_{RGS}\| \ge \langle P_{RGS} \tilde{h}, \tilde{h} \rangle$,
$$\|P_{RGS}\| \ge r\, \sup\{\, \langle P_X h, h \rangle : h \in L_0^2(f_X),\ \|h\| = 1 \,\} + (1 - r) = r\, \|P_X\| + (1 - r).$$
The result follows easily. ∎
Proposition 2. Suppose there exists a sequence $\{h_k\} \subseteq L_0^2(f_X)$ such that $\|h_k\| = 1$ for each $k$ and
$$\langle P_X h_k, h_k \rangle \to 1 \quad \text{as } k \to \infty.$$
Then $\|P_X\| = \|P_{RGS}\| = 1$. Hence $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ are subgeometrically ergodic.
Proof. Since $\langle P_X h_k, h_k \rangle \le \|P_X\| \le 1$ for every $k$, we must have $\|P_X\| = 1$, so $\Phi_X$ is not geometrically ergodic. The claim now follows easily since, if $\|P_X\| = 1$, then by Lemma 2
$$\|P_{RGS}\| \ge r\, \|P_X\| + (1 - r) = 1.$$
Thus we conclude that $\Phi_X$ and $\Phi_{RGS}$ are subgeometrically ergodic. Since $\Phi_X$ and $\Phi_{DGS}$ are either both geometrically ergodic or both subgeometric, it follows that $\Phi_{DGS}$ also converges subgeometrically. ∎
5 A Discrete Example
We introduce a family of simple discrete distributions to which the TGS algorithms apply. We then apply our general results, which allow us to very nearly characterize the members of the family that admit geometrically ergodic TGS Markov chains.
Let and be strictly positive sequences satisfying
Also, let . Let the family consist of the discrete bivariate distributions having density with respect to counting measure on given by
Hence the marginals are given by
The full conditionals are easily seen to be
Then for the DGS
and hence for the marginal chain
It is easy to see that the kernel is Feller. If , then for the random scan Gibbs sampler (RGS) we have
Since for any open set
it is easy to see that $P_{RGS}(\cdot, O)$ is lower semicontinuous, and hence $P_{RGS}$ is Feller.
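Because the state space here is discrete, both Gibbs updates can be simulated exactly by inverse-CDF draws from the full conditionals. The sketch below is generic and uses an illustrative joint of our own choosing (it is not the family studied in this section): $f(x, y) \propto 2^{-x}$ on $0 \le y \le x \le N$, so that $f(y \mid x)$ is uniform on $\{0, \ldots, x\}$ and $f(x \mid y) \propto 2^{-x}$ on $\{y, \ldots, N\}$.

```python
import random

def sample_pmf(weights):
    # Inverse-CDF draw from an unnormalized finite pmf given as a list
    # of (value, weight) pairs.
    total = sum(w for _, w in weights)
    u = random.random() * total
    acc = 0.0
    for v, w in weights:
        acc += w
        if u <= acc:
            return v
    return weights[-1][0]  # guard against floating-point round-off

# Illustrative joint on {0,...,N}^2: f(x, y) proportional to 2^{-x}
# for 0 <= y <= x <= N (NOT the family studied in the text).
N = 20

def x_given_y(y):
    # f(x | y) proportional to 2^{-x} on x in {y, ..., N}
    return sample_pmf([(x, 2.0 ** (-x)) for x in range(y, N + 1)])

def y_given_x(x):
    # f(y | x) uniform on {0, ..., x}
    return random.randint(0, x)

def rgs_update(x, y, r=0.5):
    # One random scan Gibbs update for the discrete toy target.
    if random.random() <= r:
        return x_given_y(y), y
    return x, y_given_x(x)
```

Each update draws exactly one coordinate from its full conditional, so the support constraint $0 \le y \le x \le N$ is preserved at every step.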
We are now in a position to establish sufficient conditions for the geometric ergodicity of $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$.
then $\Phi_X$, $\Phi_{DGS}$ and $\Phi_{RGS}$ are geometrically ergodic.
We need only verify the conditions of Proposition 1, and we have already seen that both $P_X$ and $P_{RGS}$ are Feller. Now fix a drift function $V$, depending on a constant to be determined later, and note the following. For any $d > 0$, the sublevel set $\{x : V(x) \le d\}$ is bounded. Since $V$ is a continuous function, the sublevel set is also closed, hence compact. Therefore $V$ is unbounded off compact sets on $\mathsf{X}$. Similarly, for any $c > 0$, the function $V_c$ is unbounded off compact sets on $\mathsf{X} \times \mathsf{Y}$, because for any $d > 0$ the sublevel set $\{(x, y) : V_c(x, y) \le d\}$ is bounded and closed, hence compact. Now all that remains is to construct a drift condition for $P_X$. Note that for $x \in \mathsf{X}$,