Smoothness of marginal loglinear parameterizations
Abstract
We provide results demonstrating the smoothness of some marginal loglinear parameterizations for distributions on multiway contingency tables. First we give an analytical relationship between loglinear parameters defined within different margins, and use this to prove that some parameterizations are equivalent to ones already known to be smooth. Second we construct an iterative method for recovering joint probability distributions from marginal loglinear pieces, and prove its correctness in particular cases. Finally we use Markov chain theory to prove that certain cyclic conditional parameterizations are also smooth. These results are applied to show that certain conditional independence models are curved exponential families.
1 Introduction
Models for multiway contingency tables may include restrictions on various marginal or conditional distributions, especially in the context of longitudinal or causal models (see, for example, Lang and Agresti, 1994; Bergsma et al., 2009; Evans and Richardson, 2013, and references therein). Such models can often be parameterized by combining loglinear parameters from within different marginal tables. The resulting marginal loglinear parameterizations, introduced by Bergsma and Rudas (2002), provide an elegant and flexible way to parameterize a multivariate discrete probability distribution.
Setting these marginal loglinear parameters to zero can be used to define arbitrary conditional independence models (Rudas et al., 2010; Forcina et al., 2010), including those corresponding to undirected graphical models or Bayesian networks. If these zero parameters can be embedded into a larger smooth parameterization of the joint distribution, then the model defined by the conditional independence constraints is a curved exponential family, and therefore possesses good statistical properties. This approach is applied by Rudas et al. (2010) and Evans and Richardson (2013) to classes of graphical models.
Unfortunately, there exist models of conditional independence which—though believed to be curved exponential families—cannot be embedded into parameterizations currently known to be smooth. Forcina (2012) studies examples of models defined by ‘loops’ of conditional independences, such as
(1) 
which can be defined by constraints on the conditional distributions , and respectively. However it is not clear whether a smooth parameterization of the joint distribution can be constructed using these conditionals. The model can also be defined by setting a particular collection of marginal loglinear parameters to zero (see Section 5 for details), but there is no way to embed these parameters into a smooth parameterization of the kind studied by Bergsma and Rudas (2002), so their results do not apply. Forcina (2012) gives a numerical test for this model which is highly suggestive of smoothness, but no formal proof is available.
The contribution of this paper is to show that the class of smooth discrete parameterizations which can be constructed using marginal loglinear (MLL) parameters is considerably larger than had previously been known, and that models such as (1) can indeed be embedded into these parameterizations. We give three different methods for demonstrating smoothness in this context. First we provide an analytical expression for the relationship between loglinear parameters defined within different marginal distributions; this allows us to prove the equivalence of various parameterizations. Second we show that particular fixed point maps relating different parameters are contractions, and hence can be used to uniquely recover the joint probability distribution. Lastly we use Markov chain theory to show that we can smoothly recover joint probability distributions from ‘cyclic’ conditional distributions; this is used to show that certain conditional independence models, including the one above, are curved exponential families of distributions.
The rest of the paper is organized as follows: Section 2 reviews marginal loglinear parameters and their properties. Section 3 specifies the relationship between loglinear parameters defined within different margins, enabling certain parameterizations to be proven equivalent. Section 4 extends this by constructing fixed point methods that smoothly recover a joint distribution. Section 5 further extends the results of Section 3 using Markov chain theory, and demonstrates that certain conditional independence models are curved exponential families. Section 6 contains discussion, and a conjecture on the precise characterization of smooth MLL parameterizations.
2 Marginal Loglinear Parameters
We consider multivariate distributions over a finite collection of binary random variables , for ; we denote their joint distribution by . All the results herein also hold (or have analogues) in the case of general finite discrete variables, but the notation becomes more cumbersome. For we denote the marginal distribution over by , and for disjoint we denote the relevant conditional distribution by . Distributions are assumed to be strictly positive: .
Definition 2.1.
Let be the strictly positive probability simplex of dimension . We say that a homeomorphism onto an open set is a smooth parameterization of if is twice continuously differentiable, and its Jacobian has full rank everywhere.
The canonical smooth parameterization of is via loglinear parameters , defined by the Möbius expansion
here is the number of 1s in . It follows by Möbius inversion that
(2) 
see, for example, Lauritzen (1996). For example, if ,
It is well known that the collection provides a smooth parameterization of the joint distribution with .
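The loglinear parameterization and its Möbius inversion can be illustrated numerically. The sign convention below is an assumption on our part (the paper's displays are not reproduced here): each parameter is computed as an alternating sum of log probabilities, weighted by the parity of the number of 1s within the effect, and the expansion is then verified to recover the log probabilities exactly.

```python
import math
import random
from itertools import combinations, product

def loglinear_params(p, k):
    """Compute lambda_A for every A subseteq {0,...,k-1} from a strictly
    positive joint distribution p over {0,1}^k (dict: tuple -> prob).
    Sign convention (assumed): s_A(x) = (-1)^(number of 1s of x in A)."""
    subsets = [frozenset(c) for r in range(k + 1)
               for c in combinations(range(k), r)]
    return {A: sum((-1) ** sum(x[v] for v in A) * math.log(p[x])
                   for x in product((0, 1), repeat=k)) / 2 ** k
            for A in subsets}

def log_prob(lam, x):
    """Reconstruct log p(x) via the Mobius expansion over all effects A."""
    return sum((-1) ** sum(x[v] for v in A) * l for A, l in lam.items())

# A random strictly positive joint distribution on three binary variables.
random.seed(0)
w = {x: random.random() + 0.1 for x in product((0, 1), repeat=3)}
Z = sum(w.values())
p = {x: v / Z for x, v in w.items()}

lam = loglinear_params(p, 3)
assert all(abs(log_prob(lam, x) - math.log(p[x])) < 1e-10 for x in p)
```

The 2^k parameters here play the role of the ordinary loglinear parameterization of the full table; the marginal version applies the same transform to a margin p_M for effects within M.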
Definition 2.2.
Clearly and, for example,
which is the log-odds ratio between and . In order to fit a model with the constraint we could choose a parameterization that includes , and fix it to be zero.
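This can be checked numerically. Under the sign and scaling conventions assumed in this sketch, the two-way parameter for a pair of binary variables is a quarter of the log cross-product (odds) ratio, and it vanishes exactly when the variables are independent:

```python
import math
from itertools import product

# A product distribution p(x1, x2) = q1(x1) * q2(x2): the two binary
# variables are independent by construction.
q1, q2 = (0.3, 0.7), (0.6, 0.4)
p = {(a, b): q1[a] * q2[b] for a, b in product((0, 1), repeat=2)}

# Two-way loglinear parameter: a quarter of the log-odds ratio
# (sign and scaling conventions assumed).
lam12 = 0.25 * math.log(p[0, 0] * p[1, 1] / (p[0, 1] * p[1, 0]))
assert abs(lam12) < 1e-12  # independence forces the two-way effect to zero
```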
One way to characterize the main idea of Bergsma and Rudas (2002) is as follows: given some arbitrary margins of a joint distribution , what additional information does one need to smoothly reconstruct the full joint distribution ? They show that one possibility is to take the collection of loglinear parameters where for any .
It follows that given any inclusion-respecting sequence of margins (i.e. only if ), we can smoothly parameterize with marginal loglinear parameters of the form , where but for any .
Example 2.3.
Take the inclusion-respecting sequence of margins , , . This gives us the smooth parameterization consisting of the vector below. The pairs are summarized (grouped by margin) in the adjacent table. (Note that here, and in the sequel, we abbreviate sets of integers by omitting the braces and commas in order to avoid overburdening the notation: so, for example, 123 means {1, 2, 3}.)
Now, let be an arbitrary collection of effect-margin pairs such that . Define
to be the corresponding vector of marginal loglinear parameters. The main question considered by this paper is: under what circumstances does constitute a smooth parameterization of ?
2.1 Existing Results
We say that is complete if every nonempty subset of appears as an effect in exactly once. If, in addition, the margins can be ordered so that each effect appears with the first margin of which it is a subset, we say that is hierarchical. Parameterizations that can be constructed from an inclusion-respecting sequence of margins in the manner of Example 2.3 correspond precisely to hierarchical . Bergsma and Rudas (2002) show that if is complete and hierarchical then gives a smooth parameterization of the joint distribution; in addition, they show that completeness is necessary for smoothness. Forcina (2012) shows that if is complete and contains only two distinct margins , then is smooth.
To our knowledge, these are the only existing results on the smoothness of marginal loglinear parameterizations. No example has been provided of a complete parameterization which is nonsmooth. In Sections 3, 4 and 5 we will show that, in fact, many more complete parameterizations are smooth than had previously been known.
The issue of smoothness in non-hierarchical models was raised by Forcina (2012) in the context of loop models of conditional independence, and expanded upon by Colombi and Forcina (2014) for models of context-specific conditional independence; the latter consider a more general class of models than we do, but there is no overlap in the theoretical results. Examples of ordinary conditional independence models that require non-complete parameterizations (and are therefore not curved exponential families) are found in Drton (2009).
3 An Analytical Map between Margins
To parameterize a marginal distribution we can use the marginal loglinear parameters . An analogous result holds for conditional distributions: for disjoint define
in other words, all the MLL parameters for the margin whose effect contains some element of . Then constitutes a smooth parameterization of the conditional distribution
This result helps us to understand the relationship between loglinear parameters defined within different margins. Theorem 3 of Bergsma and Rudas (2002) shows that distinct MLL parameters corresponding to the same effect in different margins (i.e. and with ) are linearly dependent at certain points in the parameter space, and that therefore no smooth parameterization can include two such parameters. The following theorem elucidates the exact relationship between such parameters, and will later be used to demonstrate the smoothness of certain non-hierarchical parameterizations.
Theorem 3.1.
Let be disjoint subsets of . The loglinear parameter may be decomposed as
(3) 
for a smooth function , which vanishes whenever for some .
In addition, if
(4) 
(where are held fixed).
Proof.
We have
Since the second term is a smooth function of the conditional probabilities , it follows that it is also a smooth function of the claimed parameters. The implication of independence follows from Lemma 2.9 of Evans and Richardson (2013).
Now,
and similarly
Hence the derivative of (3) in the case becomes
and, since there is no dependence upon , this is the same as  
Then note that simply counts the number of 1s in and in , so is even if and only if is. Hence
which gives the required result. ∎
Remark 3.2.
We have shown that if the conditional distribution of given is fixed, then the relationship between and (and indeed any parameter of the form for ) is linear. In particular, if we know , then and become interchangeable as part of a parameterization, preserving smoothness and (when relevant) variation independence.
3.1 Constructing Smooth Parameterizations
The following example shows how Theorem 3.1 can be used to prove the smoothness of a parameterization.
Example 3.3.
Consider the complete collections and below.

is not hierarchical because in any inclusion-respecting ordering the margin 23 must precede 123, in which case the effect 2 (contained in the pair ) is not associated with the first margin of which it is a subset. Existing results therefore cannot tell us whether or not is smooth. However, by fixing the parameters Theorem 3.1 shows that and are interchangeable. Hence is smooth if and only if is also smooth; since the latter satisfies the conditions of a hierarchical parameterization, both are indeed smooth. In addition, and are both variation independent parameterizations (i.e. any corresponds to a valid probability distribution).
We generalize the approach used in the preceding example with the following definition and proposition.
Definition 3.4.
Let be a collection of MLL parameters, and define
That is, all effects involving are removed, and any margins containing are replaced by .
Proposition 3.5.
Let be a complete collection of marginal loglinear parameters over such that the variable is not in any margin except . Then is a smooth parameterization of if and only if is a smooth parameterization of . In addition, is variation independent if and only if is.
Proof.
Since is the only margin containing and the parameterization is complete, we have the parameters . Hence we can smoothly parameterize the distribution of with these parameters.
By Theorem 3.1, any other parameter such that is (having fixed the distribution of ) a smooth function of . It follows that we have a smooth map between and . Since is a function of , and smoothly parameterizes , it follows that smoothly parameterizes if and only if smoothly parameterizes .
Lastly, the two pieces and are variation independent of one another because this is a parameter cut, and parameters within are all variation independent since they are just ordinary loglinear parameters; therefore is variation independent if and only if is. ∎
Corollary 3.6.
Any complete parameterization in which the margins are strictly nested () is smooth and variation independent.
Lemma 6 of Forcina (2012) deals with the special case , which to our knowledge was the only prior result showing that a non-hierarchical MLL parameterization may be smooth.
Example 3.7.
Proposition 3.8.
Let be a complete parameterization, and suppose that for some , and every , the sets and appear as effects within the same margin in .
Then is a smooth parameterization of if and only if is a smooth parameterization of . In addition, is variation independent if and only if is variation independent.
Proof.
Since and appear in the same margin, say , set
which is zero unless , leaving  
But notice this is of the same form as an MLL parameter for the pair over the conditional distribution . It follows that for fixed the parameters form a complete MLL collection of the form for the conditional distribution of . If is smooth then we can smoothly recover the conditional distribution . Furthermore, if the effect is in a margin , then using (3) we obtain
and smoothly recover . In addition is variation independent of (since , constitutes a parameter cut) and has range , so the same is true of .
Conversely if is smooth, then given parameters we can set up a dummy distribution on in which for each , and , thus smoothly recovering . ∎
4 Fixed Point Mappings
The previous section gives analytical maps between some parameterizations, but Propositions 3.5 and 3.8 only apply directly to a relatively small number of cases. In this section we build on these results by presenting conditions for the existence of a smooth map, even without a closed form expression.
Given a particular complete MLL parameterization , the identity (3) in Theorem 3.1 can be written in vector form as
For a given this suggests that might be recovered using fixed point methods; the identity (4) gives us information about the Jacobian of .
Example 4.1.
Consider the parameterization based on
If we can smoothly recover , and from then it follows that is a smooth parameterization. From (3) we have
since , and are given in the parameterization we can assume these to be fixed, so abusing notation slightly  
Similarly, for some smooth , so is a solution to the equation
If can be shown to be a contraction mapping, then we are guaranteed to find a unique solution, and therefore recover the joint distribution. In addition, if is a contraction for all , then since it varies smoothly in we will have shown that is a smooth parameterization.
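The mechanism invoked here can be sketched in miniature. Since the map for this example is not reproduced above, the toy contraction below is purely illustrative: Banach's fixed point theorem gives a unique fixed point, found by iteration, and it varies smoothly with any parameter on which the contraction depends smoothly.

```python
import math

def fixed_point(f, x0, tol=1e-13, max_iter=10_000):
    """Iterate a contraction mapping f from x0 until successive values
    agree to within tol; Banach's theorem guarantees convergence to the
    unique fixed point when |f'| <= c < 1 everywhere."""
    x = x0
    for _ in range(max_iter):
        x_new = f(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("iteration did not converge")

# A toy family of contractions f_t(x) = t*cos(x) with |f_t'| <= t < 1;
# the fixed point x*(t) varies smoothly with the parameter t.
for t in (0.2, 0.5, 0.8):
    x_star = fixed_point(lambda x: t * math.cos(x), 0.0)
    assert abs(x_star - t * math.cos(x_star)) < 1e-10
```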
Define to be the smallest amount of probability assigned to any cell in our joint distribution, and to be the probability simplex consisting of such distributions. The Jacobian of an otherwise smooth parameterization can become singular on the boundary of the probability simplex, so it is useful to have control over this quantity.
The next result allows us to control the magnitude of the columns (or rows) of the Jacobian of in certain examples. The proof is given in the appendix.
Lemma 4.2.
Let , and . Then
Alternatively, if , then
Example 4.3.
Returning to the parameterization in Example 4.1, the derivative of is
which is the dot product of the vectors
By applying the two parts of Lemma 4.2, these vectors each have magnitude at most . Hence , and is a contraction on for every . It follows that the equation has a unique solution among all positive probability distributions (and this can be found by iteratively applying to any initial distribution), and by the inverse function theorem it is a smooth function of . Hence is indeed smooth.
Remark 4.4.
Lemma 4.2 enables us to formulate the following generalization of the idea used in the example above.
Lemma 4.5.
Let be complete and such that for any with , there is at most one other margin in with . Then is smooth.
Proof.
By Theorem 3.1,
Since is the only margin in such that , it follows that all the parameters in are known and fixed except for , where is the set of effects contained in the margin . Hence
(5) 
Now, consider the vector equation obtained by stacking (5) over all pairs . This defines a fixed point equation whose solution is , and the column of the Jacobian corresponding to has nonzero entries
From Lemma 4.2, each column has magnitude at most , and therefore the mapping is a contraction on for each . It follows that the fixed point equation has a unique solution which, by the inverse function theorem, is a smooth function of . ∎
From this result we obtain the following corollary, the conditions of which are easy to verify.
Corollary 4.6.
Any complete parameterization with at most three margins is smooth.
Proof.
Since one of the margins must be , it is clear that the conditions of Lemma 4.5 hold. ∎
Example 4.7.
Consider below.
Although it does not satisfy the conditions of Lemma 4.5 directly, one can use the basic idea to set up a smooth contraction mapping from to ; since is hierarchical, both parameterizations are smooth.
5 Cyclic Parameterizations
This section takes a third approach to determining smoothness, by using Markov chain theory to recover certain marginal distributions. This method allows us to demonstrate the smoothness of certain conditional independence models.
Forcina (2012, Example 2) considers the model defined (up to some relabelling) by the conditional independences
(6) 
which is equivalent to setting the parameters
(7) 
to zero. Note that we cannot embed these parameters into a larger hierarchical parameterization, because each pairwise effect will ‘belong’ to a margin preceding it; for example, is a subset of , so for hierarchy the margin must precede ; by a similar argument, must precede which must precede . We therefore have a cyclic parameterization, referred to as a ‘loop’ by Forcina. None of the methods used in the previous sections seem well suited to dealing with this situation.
Forcina (2012) presents an algorithm for recovering joint distributions given parameterizations of this kind, together with a condition under which it is guaranteed to converge to the unique solution. However, this condition is on the spectral radius of a complicated Jacobian, and is difficult to verify except in a few special cases: a numerical test is suggested, but this does not constitute a proof of smoothness. Here we show that, at least in some cases, Forcina’s algorithm can be recast as a Markov chain whose stationary distribution is some margin of the relevant probability distribution.
Theorem 5.1.
Let be a disjoint sequence of sets with such that the conditional distributions for are known, together with . Then the marginal distributions are smoothly recoverable.
Proof.
Define a matrix with entries
This is a (right) stochastic matrix with strictly positive entries, and the marginal distribution satisfies
In other words, is an invariant distribution for the Markov chain with transition matrix defined by . Since has a finite state space and all transition probabilities are positive, the chain is positive recurrent and the equations have a unique solution (see, e.g., Norris, 1997). Hence is defined by the kernel of the matrix , and this is a smooth function of the original conditional probabilities. ∎
Remark 5.2.
The Markov chain corresponding to is that which would be obtained by picking some , and then evolving using until we get back to . The equations can be solved iteratively by repeatedly right multiplying any positive vector by , so that it converges to the stationary distribution of the chain; this corresponds precisely to Forcina’s algorithm.
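The construction in Theorem 5.1 and the iteration described in this remark can be sketched for a three-cycle. The conditional tables below are invented for illustration; composing them gives a strictly positive stochastic matrix, whose stationary distribution is recovered by repeated right multiplication of a positive row vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_stochastic(n, m):
    """An invented strictly positive right-stochastic matrix (rows sum
    to 1), standing in for a conditional distribution table."""
    M = rng.random((n, m)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

# Stand-ins for the conditionals p(x2|x1), p(x3|x2) and p(x1|x3)
# appearing in a three-cycle version of Theorem 5.1.
P12, P23, P31 = (random_stochastic(2, 2) for _ in range(3))

# One sweep around the cycle: strictly positive, hence an irreducible,
# aperiodic chain with a unique stationary distribution.
P = P12 @ P23 @ P31

# Forcina-style iteration: right-multiply a positive row vector by P.
pi = np.full(2, 0.5)
for _ in range(1000):
    pi = pi @ P

assert np.allclose(pi, pi @ P)      # invariance: pi P = pi
assert abs(pi.sum() - 1.0) < 1e-12  # normalization is preserved
```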
Example 5.3 (Forcina (2012), Example 9).
Consider the cyclic parameterization .
The parameters corresponding to the first three margins in are equivalent to the conditional distributions , and . Using the conditionals in the manner suggested by Theorem 5.1, we can smoothly recover (for example) the margin (or equivalently ), and consequently is equivalent to the hierarchical parameterization .
Example 5.4.
Example 5.5.
Consider the model defined by
it consists of setting the parameters in below to zero.
We can embed in the complete parameterization . Note that , together with the fact that , allows us to construct the conditional distribution . Similarly we have , and . In a manner analogous to the previous example, we can set up a Markov chain whose stationary distribution is the marginal as follows. First pick . Now, for

draw from the distribution ;

draw from the distribution ;

draw from the distribution ;

draw from the distribution .
Then the distribution of converges to . We can therefore smoothly recover a distribution satisfying the conditional independence constraints from the 7 free parameters. The dimension of the model is full, so we have a smooth parameterization of the model, which is therefore a curved exponential family (Lauritzen, 1996).
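A deterministic version of this four-step chain can be sketched by composing transition kernels on the pair (X1, X2). The conditional tables here are invented placeholders for the conditionals listed in the example (in practice they would be built from the MLL parameters), and the stationary distribution is found by solving the invariance equations directly, as in the proof of Theorem 5.1, rather than by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]

def cond_table():
    """Invented strictly positive conditional p(new | pair): rows are
    indexed by the conditioning pair, columns by the new binary value."""
    M = rng.random((4, 2)) + 0.1
    return M / M.sum(axis=1, keepdims=True)

# Placeholders for p(x3|x1,x2), p(x4|x2,x3), p(x1|x3,x4), p(x2|x4,x1).
steps = [cond_table() for _ in range(4)]

def kernel(C):
    """Transition (u, v) -> (v, w): keep the second coordinate and draw
    the new one from C; every step reuses the same pair indexing."""
    K = np.zeros((4, 4))
    for i, (u, v) in enumerate(pairs):
        for j, (v2, w) in enumerate(pairs):
            if v2 == v:
                K[i, j] = C[i, w]
    return K

# One full sweep of the cycle returns to a distribution over (x1, x2).
P = kernel(steps[0]) @ kernel(steps[1]) @ kernel(steps[2]) @ kernel(steps[3])

# Solve pi P = pi together with sum(pi) = 1 as a least-squares system.
A = np.vstack([P.T - np.eye(4), np.ones(4)])
b = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(pi @ P, pi) and np.all(pi > 0)
```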
Note that the construction of the Markov chain in Example 5.5 is only possible when the conditional independence constraints hold, so—unlike in Examples 5.3 and 5.4—we have not actually demonstrated that is generally smooth, only that the model defined by setting is a curved exponential family.
Remark 5.6.
Some conditional independence models are non-smooth: e.g. the model defined by and (Drton, 2009). This is essentially because it requires that , and setting repeated (non-redundant) effects to zero always leads to non-smooth parameterizations.
We remark that all discrete conditional independence models on four variables either require repeated effects to be constrained in different margins, or can be shown to be smooth using the results of this section. However, the next example shows that for five variables the picture is incomplete.
Example 5.7.
The conditional independence model defined by
contains no repeated effects, and yet does not appear to be approachable using the methods outlined above. Empirically, Forcina’s algorithm seems to converge to the correct solution, which suggests that the model is indeed smooth.
6 Discussion
We have presented three new approaches to demonstrating that complete but non-hierarchical marginal loglinear parameterizations are smooth, although a general result eludes us. Note that each of the approaches provides an explicit algorithm for obtaining the probabilities from the parameterization: the analytical map in Section 3, the fixed point iteration in Section 4, or the Markov chain in Section 5.
There are 104 complete MLL parameterizations on three variables, of which 23 are hierarchical and a further 4 consist of only two margins, so are smooth by the results of Bergsma and Rudas (2002) and Forcina (2012) respectively. These 27 were the only ones known to be smooth prior to this paper.
A further 5 can be shown smooth using Proposition 3.5, and one using Proposition 3.8 (Example 3.9). Another 26 can be dealt with using Lemma 4.5 in combination with other methods, and the approach in Example 4.7 can be applied to three more. Example 5.3 brings the total number of known smooth models to 63.
In addition, of the remaining 41 complete parameterizations, there are smooth mappings between a group of four and a group of three, so it remains to establish the smoothness (or otherwise) of at most 36 distinct parameterizations. As an example of a parameterization whose smoothness is still not established, consider:
We conjecture that any complete parameterization is smooth, a result which would enable us to show that models such as that given in Example 5.7 are curved exponential families of distributions.
Conjecture 6.1.
Any complete MLL parameterization is smooth.
References
 Bergsma et al. (2009) W. Bergsma, M. A. Croon, and J. A. Hagenaars. Marginal models: For dependent, clustered, and longitudinal categorical data. Springer Science & Business Media, 2009.
 Bergsma and Rudas (2002) W. P. Bergsma and T. Rudas. Marginal models for categorical data. Annals of Statistics, 30(1):140–159, 2002.
 Colombi and Forcina (2014) R. Colombi and A. Forcina. A class of smooth models satisfying marginal and context specific conditional independencies. Journal of Multivariate Analysis, 126:75–85, 2014.
 Drton (2009) M. Drton. Discrete chain graph models. Bernoulli, 15(3):736–753, 2009.
 Evans and Richardson (2013) R. J. Evans and T. S. Richardson. Marginal loglinear parameterizations for graphical Markov models. Journal of the Royal Statistical Society, Series B, 75:743–768, 2013.
 Forcina (2012) A. Forcina. Smoothness of conditional independence models for discrete data. Journal of Multivariate Analysis, 106:49–56, 2012.
 Forcina et al. (2010) A. Forcina, M. Lupparelli, and G. M. Marchetti. Marginal parameterizations of discrete models defined by a set of conditional independencies. Journal of Multivariate Analysis, 101:2519–2527, 2010.
 Lang and Agresti (1994) J. B. Lang and A. Agresti. Simultaneously modeling joint and marginal distributions of multivariate categorical responses. Journal of the American Statistical Association, 89(426):625–632, 1994.
 Lauritzen (1996) S. L. Lauritzen. Graphical Models. Clarendon Press, Oxford, UK, 1996.
 Norris (1997) J. R. Norris. Markov Chains. Cambridge University Press, 1997.
 Rudas et al. (2010) T. Rudas, W. P. Bergsma, and R. Németh. Marginal loglinear parameterization of conditional independence models. Biometrika, 94:1006–1012, 2010.
Appendix A Technical Proofs
A.1 Proof of Lemma 4.2
Lemma A.1.
Let be a vector indexed by subsets . Then if and only if for any ,
Proof.
The matrix with th entry is orthogonal, and therefore preserves vector lengths. Then the vector has entries with magnitude at most , and therefore has total magnitude at most 1. The same is therefore true of . ∎
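The orthogonality fact underlying this proof can be checked directly. With the indexing assumed in this sketch, the sign of entry (A, x) is the parity of the number of 1s of x within A; the resulting 2^k by 2^k matrix of ±1 entries satisfies H Hᵀ = 2^k I, so H / √(2^k) is orthogonal and preserves Euclidean length.

```python
import numpy as np
from itertools import combinations, product

k = 3
cells = list(product((0, 1), repeat=k))
subsets = [frozenset(c) for r in range(k + 1)
           for c in combinations(range(k), r)]

# H[A, x] = (-1)^(number of 1s of x within A): a +/-1 character matrix.
H = np.array([[(-1) ** sum(x[v] for v in A) for x in cells]
              for A in subsets])

# Distinct rows are orthogonal, so H / sqrt(2^k) preserves vector length.
assert np.array_equal(H @ H.T, 2 ** k * np.eye(2 ** k, dtype=int))
```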
Proof of Lemma 4.2.
For , define
so that for . Given ,
and note that
where the expression in braces is 2 if or 0 otherwise, so  
Hence
which is an alternating sum of probabilities which sum to one, so has absolute value at most . The result follows from Lemma A.1. The second result is essentially identical, due to the symmetry between in (4). ∎