A General Framework for Mixed Graphical Models
Abstract
“Mixed Data” comprising a large number of heterogeneous variables (e.g. count, binary, continuous, skewed continuous, among other data types) are prevalent in varied areas such as genomics and proteomics, imaging genetics, national security, social networking, and Internet advertising. There have been limited efforts at statistically modeling such mixed data jointly, in part because of the lack of computationally amenable multivariate distributions that can capture direct dependencies between such mixed variables of different types. In this paper, we address this by introducing a novel class of Block Directed Markov Random Fields (BDMRFs). Using the basic building block of node-conditional univariate exponential families from Yang et al. (2012), we introduce a class of mixed conditional random field distributions that are then chained according to a block-directed acyclic graph to form our class of BDMRFs. The Markov independence graph structure underlying a BDMRF thus has both directed and undirected edges. We introduce conditions under which these distributions exist and are normalizable, study several instances of our models, and propose scalable penalized conditional likelihood estimators with statistical guarantees for recovering the underlying network structure. Simulations as well as an application to learning mixed genomic networks from next-generation sequencing expression data and mutation data demonstrate the versatility of our methods.
1 Introduction
1.1 Motivation
Acquiring and storing data is steadily becoming cheaper in this Big Data era. A natural consequence is more varied and complex data sets that consist of variables of mixed types; by mixed types, we mean variables measured on the same set of samples that can each belong to differing domains such as binary, categorical, ordinal, counts, continuous, and/or skewed continuous, among others. Consider, for instance, three popular Big Data applications, all of which involve such mixed data:

High-throughput biomedical data. With new biomedical technologies, scientists can now measure full genomic, proteomic, and metabolomic scans as well as biomedical imaging on a single subject or biological sample.

National security data. Technologies exist to collect varied information such as call logs, geographic coordinates, text messages, tweets, demographics, purchasing history, and Internet browsing history, among others.

Internet-scale marketing data. Internet companies, in an effort to optimize advertising revenues, collect information such as Internet browsing history, social media postings, friends in social networks, status updates, tweets, purchasing history, ad-click history, and online video viewing history, among others.
In each of these examples, variables of many different types are collected on the same samples, and these variables are clearly dependent. Consider the motivating example of high-throughput “omics” data in further detail. There has been a recent proliferation of genomics technologies that can measure nearly every molecular aspect of a given sample. These technologies, however, produce mixed types of variables: mutations and aberrations such as SNPs and copy number variations are binary or categorical, functional genomics comprising gene expression and miRNA expression as measured by RNA-sequencing are count-valued, and epigenetics as measured by methylation arrays are continuous. Clearly, all of these genomic biomarkers are closely interrelated as they belong to the same complex biological system. Multivariate distributions such as graphical models applied to one type of data, typically gene expression, are popularly used for tasks ranging from data visualization to finding important biomarkers and estimating regulatory relationships. Yet, understanding the complete molecular basis of diseases requires us to understand relationships not only within a specific type of biomarker, but also between different types of biomarkers. Thus, to holistically model the genomic system, we need a class of multivariate distributions that can directly model dependencies between genes based on gene expression levels (counts) as well as the mutations (binary) that influence gene expression levels. Few such multivariate distributions exist, however, that can directly model dependencies between mixed types of variables. In this paper, our goal is to address this lacuna and define a parametric family of multivariate distributions that can model rich dependence structures over mixed variables. We leverage the theory of exponential family distributions to do so, and term our resulting novel class of statistical models mixed Block Directed Markov Random Fields (BDMRFs).
A popular class of statistical studies of such mixed, multivariate data sidestep multivariate densities altogether. Instead, they relate a set of multivariate response variables of one type to multivariate covariate variables of another type, using multiple regression models or multi-task learning models [2]. These are especially popular in expression quantitative trait loci (eQTL) analyses, which seek to link changes in functional gene expression levels to specific genomic mutations [22]. Recent approaches [51] further allow these multiple regression models to associate covariates with mixed types of responses. More general regression and predictive models such as Classification and Regression Trees have also been proposed for such mixed types of covariates [17]. Other approaches implicitly account for variables of mixed types in many machine learning procedures using suitable distance or entropy-based measures [17, 18]. There have also been non-parametric extensions of probabilistic graphical models using copulas [13, 33] or rank-based estimators [45, 31], which could potentially be used for mixed data; non-parametric methods, however, may suffer from a loss of statistical efficiency when compared to parametric families, especially under very high-dimensional sampling regimes. Others have proposed to build network models based on random forests [14], which are able to handle mixed types of variables, but these do not correspond to a multivariate density.
Among parametric statistical modeling approaches for such mixed data, the most popular, especially in survey statistics and spatial statistics [12], are hierarchical models that permit dependencies through latent variables. For example, Sammel et al. [37] propose a latent variable model for mixed continuous and count variables, while Rue et al. [36] propose latent Gaussian models that permit dependencies through a latent Gaussian MRF. While these methods provide statistical models for mixed data, they model dependencies between observed variables via a latent layer that is not observed; estimating these models with strong statistical guarantees is thus typically computationally expensive and possibly intractable.
In this paper, we seek to specify parametric multivariate distributions over mixed types of variables that directly model dependencies among these variables, without recourse to latent variables, and that are computationally tractable with statistical guarantees for high-dimensional data. Due in part to the importance of this problem, there have been several recent proposals for such direct parametric statistical models, building on seminal earlier work by Lauritzen and Wermuth [26]. We review these in the next subsection after first providing some background on Markov Random Fields (MRFs). As we will show in the next subsection, these are however largely targeted to the case where there are variables of two types: discrete and continuous. Some other recent proposals, including some of our prior work, consider more general mixed multivariate distributions, but as we will show these are not sufficiently expressive to allow for rich dependencies between disparate data types. Our proposed class of Block Directed Markov Random Fields (BDMRFs) serves as a vast generalization of these proposals, and indeed as the teleological porting of the work by Lauritzen and Wermuth [26] to the completely heterogeneous mixed data setting.
1.2 Background and Related Work
1.2.1 Markov Random Fields
Suppose $X = (X_1, \ldots, X_p)$ is a random vector, with each variable $X_s$ taking values in a set $\mathcal{X}_s$. Let $G = (V, E)$ be an undirected graph over $p$ nodes corresponding to the $p$ variables $\{X_s\}_{s=1}^{p}$. The graphical model over $X$ corresponding to $G$ is a set of distributions that satisfy Markov independence assumptions with respect to the graph $G$ [25]. By the Hammersley-Clifford theorem [11], any such distribution that is strictly positive over its domain also factors according to the graph in the following way. Let $\mathcal{C}$ be a set of cliques (fully-connected subgraphs) of the graph $G$, and let $\{\phi_C(X_C)\}_{C \in \mathcal{C}}$ be a set of clique-wise sufficient statistics. Then, any strictly positive distribution of $X$ within the graphical model family represented by the graph $G$ takes the form:
$$P(X) \propto \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C \, \phi_C(X_C) \Big\}, \tag{1}$$
where $\{\theta_C\}_{C \in \mathcal{C}}$ are weights over the sufficient statistics. Popular instances of this model include Ising models [44] for discrete-valued qualitative variables, and Gaussian MRFs [40] for continuous-valued quantitative variables. Ising models specify joint distributions over a set of binary variables, with the form
$$P(X) \propto \exp\Big\{ \sum_{(s,t) \in E} \theta_{st} \, X_s X_t \Big\},$$
where we have ignored singleton terms for simplicity. Gaussian MRFs on the other hand specify joint distributions over a set of continuous real-valued variables each with domain $\mathbb{R}$, with the form
$$P(X) \propto \exp\Big\{ -\tfrac{1}{2} (X - \mu)^\top \Sigma^{-1} (X - \mu) \Big\}. \tag{2}$$
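The factorization in (1) and the Markov property it encodes can be checked numerically on a very small Ising model. The sketch below is only illustrative: the three-node chain graph, the parameter values, the choice of domain {0, 1}, and the helper names `ising_joint`, `marginal`, and `max_ci_violation` are all assumptions introduced here, not part of the original text. It enumerates the joint distribution and confirms that the two end nodes are conditionally independent given the middle node.

```python
import itertools
import math

def ising_joint(theta, p):
    """Joint table for P(x) proportional to exp(sum_{(s,t)} theta[s,t] * x_s * x_t).

    The domain {0, 1} for each node is an assumption made for this sketch.
    """
    states = list(itertools.product([0, 1], repeat=p))
    weights = [math.exp(sum(w * x[s] * x[t] for (s, t), w in theta.items()))
               for x in states]
    total = sum(weights)
    return {x: w / total for x, w in zip(states, weights)}

def marginal(joint, idx):
    """Marginal table over the coordinates listed in idx."""
    out = {}
    for x, pr in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + pr
    return out

def max_ci_violation(joint):
    """Largest gap between P(x0, x2 | x1) and P(x0 | x1) * P(x2 | x1)."""
    p1 = marginal(joint, [1])
    p01 = marginal(joint, [0, 1])
    p12 = marginal(joint, [1, 2])
    worst = 0.0
    for (x0, x1, x2), pr in joint.items():
        lhs = pr / p1[(x1,)]
        rhs = (p01[(x0, x1)] / p1[(x1,)]) * (p12[(x1, x2)] / p1[(x1,)])
        worst = max(worst, abs(lhs - rhs))
    return worst

# Hypothetical chain graph 0 - 1 - 2 with illustrative edge weights.
theta = {(0, 1): 0.8, (1, 2): -0.5}
joint = ising_joint(theta, 3)
violation = max_ci_violation(joint)
```

Because the chain has no edge between nodes 0 and 2, `violation` should be zero up to floating-point error; adding an edge `(0, 2)` to `theta` would generally break this.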
1.2.2 Conditional Gaussian Models
We now review conditional Gaussian models, which were the first proposed class of mixed graphical models, introduced in [26], and further studied in [16, 24, 27, 25]. Let $Y$ be a continuous response random vector, and let $X$ be a discrete covariate random vector. Taken together, $(X, Y)$ is then a mixed random vector with both continuous and discrete components. For such a mixed random vector, [26] proposed the following joint distribution:
$$P(X, Y) \propto \exp\Big\{ g(X) + h(X)^\top Y - \tfrac{1}{2} \, Y^\top K(X) \, Y \Big\}, \tag{3}$$
parameterized by $g(X)$, $h(X)$, and $K(X)$, such that $K(X) \succ 0$. They termed this model a conditional Gaussian (CG) model, since the conditional distribution of $Y$ given $X$ under the joint distribution in (3) is given by a multivariate Gaussian distribution:
$$Y \mid X \sim \mathcal{N}\big( K(X)^{-1} h(X), \; K(X)^{-1} \big).$$
Consider the sets of vertices $V_X$ and $V_Y$ corresponding to random variables in $X$ and $Y$ respectively, and consider the joint set of vertices $V = V_X \cup V_Y$. Would Markov assumptions with respect to a graph $G$ over all the vertices entail restrictions on $g(X)$, $h(X)$, and $K(X)$ in the joint distribution in (3)? In the following theorem from Lauritzen and Wermuth [26], which we reproduce for completeness, they provide an answer to this question:
Theorem 1.
If the CG distribution in (3) is Markov with respect to a graph , then can be written as
where we use to denote and

depend on only through the subvector ,

for any subset which is not complete with respect to , the corresponding components are zero, so that .
Recently, there have been several proposals for estimating the graph structure of these CG models in high-dimensional settings. Lee and Hastie [28] consider a specialization of CG models involving only pairwise interactions between any two variables and propose sparse node-wise estimators for graph selection. Cheng et al. [8] further consider three-way interactions between two binary variables and one continuous variable and also propose sparse node-wise estimators to select the network structure.
1.2.3 Graphical Models beyond Ising and Gaussian MRFs, via Exponential Families
A key caveat with the CG models as defined in (3) and [26], however, is that they are specifically constructed for mixed discrete and thin-tailed continuous data. What if the responses and/or the covariates were count-valued, or skewed-continuous, or belonged to some other data type that is neither categorical nor thin-tailed continuous? This is the question we address in this paper. Towards this, we first briefly review here a recent line of work [46, 48, 50] (which extends earlier work by Besag [5]) which specified undirected graphical model distributions where the variables all belong to one data type, but which could be any among a wide class of data types. Their development was as follows. Consider the general class of univariate exponential family distributions (which include many popular distributions such as Bernoulli, Gaussian, Poisson, negative binomial, and exponential, among others):
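$$P(Z) = \exp\big\{ \theta \, B(Z) + C(Z) - D(\theta) \big\}, \tag{4}$$
with sufficient statistics $B(Z)$, base measure $C(Z)$, and log-normalization constant $D(\theta)$. Suppose that a $p$-dimensional random vector $X = (X_1, \ldots, X_p)$ has node-conditional distributions specified by an exponential family,
$$P(X_s \mid X_{V \setminus s}) = \exp\big\{ E(X_{V \setminus s}) \, B(X_s) + C(X_s) - \bar{D}(X_{V \setminus s}) \big\},$$
where the function $E(X_{V \setminus s})$ is any function that depends on the rest of all random variables except $X_s$. Further suppose that the corresponding joint distribution factors according to the set of cliques $\mathcal{C}$ of a graph $G$. Yang et al. [46] then showed that such a joint distribution consistent with the above node-conditional distributions exists, and moreover necessarily has the form
$$P(X) = \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C \prod_{s \in C} B(X_s) + \sum_{s \in V} C(X_s) - A(\theta) \Big\}, \tag{5}$$
where the function $A(\theta)$ is the so-called log-partition function, which provides the log-normalization constant for the multivariate distribution.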
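As a concrete reading of the univariate exponential family form, with sufficient statistic, base measure, and log-normalization constant, the sketch below writes three familiar distributions in that form and checks numerically that each one sums or integrates to one. The dictionary `FAMILIES`, the function `density`, the parameter values, and the unit-variance Gaussian are illustrative assumptions, and the keys `B`, `C`, `D` are simply labels chosen here for the three ingredients.

```python
import math

# Each family is described by (B, C, D): sufficient statistic, base measure,
# and log-normalization constant, so that P(z) = exp(theta*B(z) + C(z) - D(theta)).
FAMILIES = {
    # Bernoulli on {0, 1} with natural parameter theta = logit(p).
    "bernoulli": dict(B=lambda z: z,
                      C=lambda z: 0.0,
                      D=lambda th: math.log1p(math.exp(th))),
    # Poisson on {0, 1, 2, ...} with natural parameter theta = log(rate).
    "poisson": dict(B=lambda z: z,
                    C=lambda z: -math.lgamma(z + 1),
                    D=lambda th: math.exp(th)),
    # Gaussian with unit variance (an assumption) and mean theta.
    "gaussian": dict(B=lambda z: z,
                     C=lambda z: -0.5 * z * z - 0.5 * math.log(2 * math.pi),
                     D=lambda th: 0.5 * th * th),
}

def density(family, theta, z):
    """Evaluate exp(theta * B(z) + C(z) - D(theta)) for the named family."""
    f = FAMILIES[family]
    return math.exp(theta * f["B"](z) + f["C"](z) - f["D"](theta))

# Total mass of each distribution (exact sum, truncated sum, Riemann sum).
bern_total = sum(density("bernoulli", 0.3, z) for z in (0, 1))
pois_total = sum(density("poisson", 0.7, z) for z in range(200))
dz = 0.001
gauss_total = sum(density("gaussian", 1.2, -10.0 + i * dz)
                  for i in range(22000)) * dz
```

Swapping in a different `(B, C, D)` triple is all that is needed to change the data type of a node, which is the sense in which node-conditional exponential families act as a building block.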
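Yang et al. [48] then developed the conditional extension of (5) above. Consider a response random vector $Y = (Y_1, \ldots, Y_p)$ and a covariate random vector $X$. Suppose that the node-conditional distributions of $Y_s$ for all $s \in V$ follow exponential families:
$$P(Y_s \mid Y_{V \setminus s}, X) = \exp\big\{ E(Y_{V \setminus s}, X) \, B(Y_s) + C(Y_s) - \bar{D}(Y_{V \setminus s}, X) \big\},$$
and that the corresponding joint factors according to the set of cliques $\mathcal{C}$ among random variables in $Y$. Then, [48] showed that such a joint distribution consistent with the node-conditional distributions does exist, and necessarily has the form:
$$P(Y \mid X) = \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C(X) \prod_{s \in C} B(Y_s) + \sum_{s \in V} C(Y_s) - A\big(\theta(X)\big) \Big\}, \tag{6}$$
where $\theta_C(X)$ is any function that only depends on the random vector $X$.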
1.2.4 Mixed MRFs
While (5) and (6) specify multivariate distributions for variables of varied data types, they are nonetheless specified for the setting where all the variables belong to the same type. Accordingly, there have been some recent extensions [50, 7] of the above for the more general setting of interest in this paper, where each variable belongs to a potentially different type. Their construction was as follows, and can be seen to be an extension not only of the class of exponential family MRFs in (5), but also of the class of conditional Gaussian models in (3). Suppose that the node-conditional distribution of each variable $X_s$ for $s \in V$ now belongs to a potentially different univariate exponential family:
$$P(X_s \mid X_{V \setminus s}) = \exp\big\{ E_s(X_{V \setminus s}) \, B_s(X_s) + C_s(X_s) - \bar{D}_s(X_{V \setminus s}) \big\},$$
while as before, we require the corresponding joint distribution to factor according to the set of cliques $\mathcal{C}$ of a graph $G$. In a precursor to this paper (Yang et al. [50]), we showed that such a joint distribution consistent with the node-conditional distributions does exist, and moreover necessarily has the form
$$P(X) = \exp\Big\{ \sum_{C \in \mathcal{C}} \theta_C \prod_{s \in C} B_s(X_s) + \sum_{s \in V} C_s(X_s) - A(\theta) \Big\}, \tag{7}$$
where the sufficient statistics $B_s(\cdot)$ and base measure $C_s(\cdot)$ in this case can be different across random variables. While these provide multivariate distributions over heterogeneous variables, as we will show in the main section of the paper, this class of distributions sometimes has stringent normalizability restrictions on the set of parameters $\theta$. In this paper, we thus develop a far more extensive generalization that leverages block-directed graphs.
1.3 Summary & Organization
This paper is organized as follows. In Section 2, we first introduce the class of Elementary Block Directed Markov Random Fields (EBDMRFs), which are the simplest subclass of our class of graphical models. Here, we assume the heterogeneous set of variables can be grouped into two groups: $X$ and $Y$ (each of which could have heterogeneous variables in turn). Our class of EBDMRFs is then specified via a simple application of the chain rule as $P(X, Y) = P(Y \mid X) \, P(X)$, where $P(X)$ is set to an exponential family mixed MRF as in (7) from [50], and $P(Y \mid X)$ is a novel class of what we call exponential family mixed CRFs, which extend our prior work on exponential family CRFs in (6) from [48], and exponential family mixed MRFs in (7) from [50]. This can be seen as a generalization of the seminal mixed graphical models work of [26]. We then discuss the properties of this class of distributions, including conditions or restrictions on the parameter space under which the distribution is normalizable. As we show, our formulation of EBDMRFs has substantially weaker normalizability restrictions when compared with our preliminary work on exponential family mixed MRFs [50].
In Section 3, we then extend this construction further by recursively applying the chain rule respecting a directed acyclic graph over blocks of variables, resulting in a class of graphical models we call Block Directed Markov Random Fields (BDMRFs). The overall underlying graph of this class of graphical models thus has both directed edges between blocks of variables and undirected edges within blocks of variables. Our construction yields a very general and flexible class of mixed graphical models that directly parameterizes dependencies over mixed variables. We study the problem of parameter estimation and graph structure learning for our class of BDMRF models in Section 4, providing statistical guarantees on the recovery of our models even under high-dimensional regimes. Finally, in Section 5 and Section 6, we demonstrate the flexibility and applicability of our models via both an empirical study of our estimators through a series of simulations, as well as an application to high-throughput cancer genomics data.
2 Elementary Block Directed Markov Random Fields (EBDMRFs)
In this section, we will introduce a simpler subclass of our eventual class of graphical models which we term Elementary Block Directed Markov Random Fields (EBDMRFs). Before doing so, we first develop a key building block: a novel class of conditional distributions.
2.1 Exponential Family Mixed CRFs
We now consider the modeling of the conditional distribution of a heterogeneous random response vector $Y$, conditioned on a heterogeneous random covariate vector $X$. Suppose that we have a graph $G_Y = (V_Y, E_Y)$, with nodes $V_Y$ associated with variables in $Y$. Denote the set of cliques of this graph by $\mathcal{C}_Y$, and the set of response neighbors in $V_Y$ for any response node $s \in V_Y$ by $N_Y(s)$. Suppose further that we also have a set of nodes $V_X$ associated with the covariate variables in $X$, and that for any response node $s \in V_Y$, we have a set of covariate neighbors in $V_X$ denoted by $N_X(s)$.
Suppose that the response variables $Y$ are locally Markov with respect to their specified neighbors, so that
$$P(Y_s \mid Y_{V_Y \setminus s}, X) = P\big(Y_s \mid Y_{N_Y(s)}, X_{N_X(s)}\big). \tag{8}$$
Moreover, suppose that this conditional distribution of $Y_s$ conditioned on the rest of $Y$ and $X$ is given by an arbitrary univariate exponential family:
$$P(Y_s \mid Y_{V_Y \setminus s}, X) = \exp\big\{ E_s(Y_{V_Y \setminus s}, X) \, B_s(Y_s) + C_s(Y_s) - \bar{D}_s(Y_{V_Y \setminus s}, X) \big\}, \tag{9}$$
with sufficient statistic $B_s(\cdot)$ and base measure $C_s(\cdot)$. Note that there is no assumption on the form of the functions $E_s(\cdot)$. The following theorem then specifies the algebraic form of the conditional distribution $P(Y \mid X)$.
Theorem 2.
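Consider a $p$-dimensional random vector $Y = (Y_1, \ldots, Y_p)$ denoting the set of responses, and let $X = (X_1, \ldots, X_q)$ be a $q$-dimensional covariate vector. Then, the node-wise conditional distributions satisfying the Markov condition in (8) as well as the exponential family condition in (9) are indeed consistent with a graphical model joint distribution that factors according to $G_Y$, and has the form:
$$P(Y \mid X) = \exp\Big\{ \sum_{C \in \mathcal{C}_Y} \theta_C\big(X_{N_X(C)}\big) \prod_{s \in C} B_s(Y_s) + \sum_{s \in V_Y} C_s(Y_s) - A_{Y|X}\big(\theta(X)\big) \Big\}, \tag{10}$$
where $N_X(C) = \bigcap_{s \in C} N_X(s)$ and $A_{Y|X}(\theta(X))$ is the log-normalization function, which is a function of the set of parameters $\{\theta_C(X_{N_X(C)})\}_{C \in \mathcal{C}_Y}$.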
The proof of the theorem follows along similar lines to that of the Hammersley-Clifford Theorem, and is detailed in Appendix A.1. In Appendix A.5, we also discuss constraints on the covariate parameter functions that ensure the distribution is normalizable. Note that to ensure the Global Markov Property holds, these functions can be arbitrarily specified as long as they are functions solely of the covariate neighborhoods.
We term this class of conditional distributions exponential family mixed CRFs. Notice that this framework provides a multivariate density over the random response vector $Y$, but not a joint density over both $X$ and $Y$ as we ultimately desire.
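To make the idea of heterogeneous node-conditionals inside a single conditional distribution concrete, the sketch below builds a tiny mixed CRF with one Bernoulli response and one Poisson response, conditioned on a scalar covariate. Everything specific here is an assumption introduced for illustration: the linear covariate functions, the parameter values, the domain {0, 1} for the binary node, the truncation of the count at 150, and the helper names. The check is that the node-conditionals recovered from the enumerated joint are exactly Poisson and Bernoulli.

```python
import math

# Hypothetical parameters for a two-response mixed CRF.
A1, B1 = 0.2, 0.5    # theta_1(x) = A1 + B1 * x   (Bernoulli response Y1)
A2, B2 = 0.1, -0.3   # theta_2(x) = A2 + B2 * x   (Poisson response Y2)
T12 = 0.4            # interaction between Y1 and Y2

def log_weight(y1, y2, x):
    """Unnormalized log-density: theta_1(x)*y1 + theta_2(x)*y2 + T12*y1*y2 - log(y2!)."""
    th1 = A1 + B1 * x
    th2 = A2 + B2 * x
    return th1 * y1 + th2 * y2 + T12 * y1 * y2 - math.lgamma(y2 + 1)

def crf_table(x, y2_max=150):
    """Enumerate P(y1, y2 | x), truncating the count variable at y2_max."""
    w = {(y1, y2): math.exp(log_weight(y1, y2, x))
         for y1 in (0, 1) for y2 in range(y2_max + 1)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def poisson_pmf(k, rate):
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

x = 1.0
table = crf_table(x)
th1, th2 = A1 + B1 * x, A2 + B2 * x

# Node-conditional of Y2 given Y1 = 1 should be Poisson with rate exp(th2 + T12).
row1 = sum(table[(1, k)] for k in range(151))
y2_given_y1 = [table[(1, k)] / row1 for k in range(10)]

# Node-conditional of Y1 given Y2 = k should be Bernoulli with logit th1 + T12 * k.
y1_given_y2 = [table[(1, k)] / (table[(0, k)] + table[(1, k)]) for k in range(10)]
```

Because the binary node is bounded, this particular pairing is normalizable for any value of `T12`; pairings of two unbounded types are where normalizability restrictions become binding.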
2.2 Model Specification: EBDMRFs
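We assume that the heterogeneous set of variables can be partitioned into two groups $X$ and $Y$; note that each group could in turn be heterogeneous. Such a delineation of the overall set of variables into two groups is natural in many settings. For instance, the variables in $X$ could be the set of covariates, while the variables in $Y$ could be the set of responses, or $X$ could be cause variables, while $Y$ could be effect variables, and so on. Given the ordering of variables $(X, Y)$, suppose it is of interest to specify dependencies among $Y$ conditioned on $X$, and then the marginal dependencies among $X$.
Towards this, suppose that we have an undirected graph $G_X = (V_X, E_X)$, with nodes $V_X$ associated with variables in $X$, and set of cliques $\mathcal{C}_X$, and in addition, an undirected graph $G_Y = (V_Y, E_Y)$, with nodes $V_Y$ associated with variables in $Y$, and set of cliques $\mathcal{C}_Y$. Suppose in addition, we have directed edges $E_{XY}$ from nodes in $V_X$ to $V_Y$. Thus, the overall graph structure has both undirected edges $E_X$ and $E_Y$ among nodes solely in $V_X$ and $V_Y$ respectively, as well as directed edges $E_{XY}$ from nodes in $V_X$ to $V_Y$, as shown in Fig. 1. For any response node $s \in V_Y$, we will denote the set of response-specific neighbors in $V_Y$ by $N_Y(s)$, and we will again denote covariate-specific neighbors in $V_X$ by $N_X(s)$.
Armed with this notation, we propose the following natural joint distribution over $(X, Y)$:
$$P(X, Y) = P(Y \mid X) \, P(X),$$
where the two pieces, $P(Y \mid X)$ and $P(X)$, are specified as follows. The conditional distribution $P(Y \mid X)$ is specified by an exponential family mixed CRF as in (10),
$$P(Y \mid X) = \exp\Big\{ \sum_{C \in \mathcal{C}_Y} \theta_C\big(X_{N_X(C)}\big) \prod_{s \in C} B_s(Y_s) + \sum_{s \in V_Y} C_s(Y_s) - A_{Y|X}\big(\theta(X)\big) \Big\},$$
while the marginal distribution $P(X)$ is specified by an exponential family mixed MRF as in (7),
$$P(X) = \exp\Big\{ \sum_{C \in \mathcal{C}_X} \theta_C \prod_{t \in C} B_t(X_t) + \sum_{t \in V_X} C_t(X_t) - A_X(\theta) \Big\}.$$
Thus, the overall joint distribution, which we call an Elementary Block Directed Markov Random Field (EBDMRF), is given as:
$$P(X, Y) = \exp\Big\{ \sum_{C \in \mathcal{C}_Y} \theta_C\big(X_{N_X(C)}\big) \prod_{s \in C} B_s(Y_s) + \sum_{s \in V_Y} C_s(Y_s) - A_{Y|X}\big(\theta(X)\big) + \sum_{C \in \mathcal{C}_X} \theta_C \prod_{t \in C} B_t(X_t) + \sum_{t \in V_X} C_t(X_t) - A_X(\theta) \Big\}. \tag{11}$$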
We provide additional intuition on our class of EBDMRF distributions in the next few sections. First, we compare the conditional independence assumptions entailed by the mixed graph of the EBDMRF with the Global Markov assumptions entailed by an undirected graph in Section 2.3. Then in Section 2.4, we compare the form of the EBDMRF class of distributions with that of the mixed MRF distributions introduced earlier in Section 1.2.4. Next, we analyze the domain of the parameters of the distribution by considering parameter restrictions to ensure normalizability in Section 2.5. Finally, in Section 2.6, we provide several examples of our EBDMRF distributions, compare these to Mixed MRF distribution counterparts, and place them in the context of the larger graphical models literature.
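One practical consequence of the chain-rule construction is that a sample from the joint can be drawn block by block: first from the marginal over the covariate block, then from the conditional over the response block. The sketch below does this for a small Gaussian-Ising instance. It is an illustration under stated assumptions, not the paper's procedure: the binary domain {0, 1}, the linear covariate functions, the Gaussian sign convention with a precision matrix `omega`, the parameter values, and the function names `sample_ising` and `sample_gaussian_crf` are all hypothetical.

```python
import itertools
import numpy as np

def sample_ising(theta_x, theta_xx, n, rng):
    """Exact sampling of X in {0,1}^q by enumeration.

    P(x) is proportional to exp(theta_x . x + sum_{t<t'} theta_xx[t,t'] x_t x_t'),
    with theta_xx symmetric and zero on the diagonal.
    """
    states = np.array(list(itertools.product([0, 1], repeat=len(theta_x))), dtype=float)
    log_w = states @ theta_x + 0.5 * np.einsum("ni,ij,nj->n", states, theta_xx, states)
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    return states[rng.choice(len(states), size=n, p=probs)]

def sample_gaussian_crf(x, theta_y, theta_yx, omega, rng):
    """Draw Y | X = x ~ N(omega^{-1} (theta_y + theta_yx x), omega^{-1}) for each row of x."""
    cov = np.linalg.inv(omega)
    means = (theta_y + x @ theta_yx.T) @ cov  # cov is symmetric
    noise = rng.multivariate_normal(np.zeros(len(theta_y)), cov, size=len(x))
    return means + noise

# Hypothetical parameters: two binary covariates, two Gaussian responses.
theta_x = np.array([0.2, -0.1])
theta_xx = np.array([[0.0, 0.8], [0.8, 0.0]])
theta_y = np.array([0.5, -0.5])
theta_yx = np.array([[1.0, 0.0], [0.0, -1.0]])   # rows: responses, columns: covariates
omega = np.array([[2.0, -0.5], [-0.5, 1.5]])     # positive definite precision matrix

rng = np.random.default_rng(0)
X = sample_ising(theta_x, theta_xx, 20000, rng)           # step 1: X ~ P(X)
Y = sample_gaussian_crf(X, theta_y, theta_yx, omega, rng)  # step 2: Y ~ P(Y | X)
```

Neither step requires the joint normalization over both blocks, only the normalization of each piece separately.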
2.3 Global Markov Structure
The EBDMRF distribution is specified by a mixed graph with both directed edges from $X$ to $Y$ and undirected edges within $X$ and $Y$. A natural question that arises then is: what are the conditional independence assumptions specified by the edges in the mixed graph? Consider the undirected edges in $E_X$: by construction, these represent the Markov conditional independence assumptions in the marginal distribution $P(X)$. The remaining undirected and directed edges incident on the response nodes $V_Y$ in turn represent Markov assumptions in the conditional distribution $P(Y \mid X)$ as described in (8) in the previous section. It can thus be seen that the set of conditional independence assumptions entailed by the mixed graph differs from those of a purely undirected graph obtained from the mixed graph by dropping the orientations of the edges from nodes in $V_X$ to nodes in $V_Y$.
But under what additional restrictions would an EBDMRF entail Markov independence assumptions with respect to an undirected graph $G$ over the joint set of vertices $V = V_X \cup V_Y$? This was precisely the question asked in the classical mixed graphical model work of [26]. Specifically, we are interested in outlining what restrictions on the covariate parameter functions $\theta(X)$ in the joint distribution in (11) would entail global Markov assumptions with respect to a graph $G$ over all the vertices $V$. In the following theorem, we provide an answer to this question:
Theorem 3.
Consider an EBDMRF distribution of the form (11), with graph structure specified by undirected edges among nodes in , and among nodes in , as well as directed edges from nodes in to . Then, if this distribution is globally Markov with respect to a graph with nodes and edges , then

.

For all response cliques ,
where depends on only through the subvector , and that for any subset such that is not complete with respect to , .
The theorem thus entails that the covariate parameter functions in (11) factor with respect to the overall graph $G$. In other words, to ensure the Global Markov structure holds, the covariate parameters can be arbitrarily specified as long as they are functions solely of the covariate neighborhoods.
2.4 Comparison to Mixed MRFs
The previous section investigated the implications as far as global Markov properties for our EBDMRF distribution with both directed and undirected edges as compared to the undirected edges of the mixed MRF distribution. Here, we compare and contrast the factorized form of these distributions, in part because the key reason for the popularity of graphical model distributions is the factored form of their joint distributions.
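Suppose we have two sets of variables, $X$ and $Y$, and consider the case of pairwise interactions. Then, this special case of the mixed MRF distributions in Section 1.2.4 can be written as:
$$P(X, Y) = \exp\Big\{ \sum_{s \in V_Y} \theta_s^{y} B_s(Y_s) + \sum_{(s,s') \in E_Y} \theta_{ss'}^{yy} B_s(Y_s) B_{s'}(Y_{s'}) + \sum_{s \in V_Y} \sum_{t \in N_X(s)} \theta_{st}^{yx} B_s(Y_s) B_t(X_t) + \sum_{s \in V_Y} C_s(Y_s) + \sum_{t \in V_X} \theta_t^{x} B_t(X_t) + \sum_{(t,t') \in E_X} \theta_{tt'}^{xx} B_t(X_t) B_{t'}(X_{t'}) + \sum_{t \in V_X} C_t(X_t) - A(\theta) \Big\}. \tag{12}$$
For this same pairwise case, the EBDMRF class of distributions in (11) can be written as:
$$P(X, Y) = \exp\Big\{ \sum_{s \in V_Y} \theta_s^{y}(X) B_s(Y_s) + \sum_{(s,s') \in E_Y} \theta_{ss'}^{yy}(X) B_s(Y_s) B_{s'}(Y_{s'}) + \sum_{s \in V_Y} C_s(Y_s) - A_{Y|X}\big(\theta(X)\big) + \sum_{t \in V_X} \theta_t^{x} B_t(X_t) + \sum_{(t,t') \in E_X} \theta_{tt'}^{xx} B_t(X_t) B_{t'}(X_{t'}) + \sum_{t \in V_X} C_t(X_t) - A_X(\theta) \Big\}. \tag{13}$$
Noting that the covariate functions $\theta_s^{y}(X)$ and $\theta_{ss'}^{yy}(X)$ can be set arbitrarily, it can be seen that the two distributions have very similar forms when we set the covariate functions as
$$\theta_s^{y}(X) = \theta_s^{y} + \sum_{t \in N_X(s)} \theta_{st}^{yx} B_t(X_t), \qquad \theta_{ss'}^{yy}(X) = \theta_{ss'}^{yy}. \tag{14}$$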
In this case, the EBDMRF distribution in (13) and the mixed MRF distribution in (12) differ primarily due to the nonlinear term $A_{Y|X}(\theta(X))$, the log-normalization function of the conditional distribution $P(Y \mid X)$. Notice that in (12), $A(\theta)$ is only dependent on parameters and hence the form (12) and indeed more generally all mixed MRFs belong to the class of multivariate exponential family distributions. On the other hand, $A_{Y|X}(\theta(X))$ in (13) depends on $X$ and hence even when the covariate functions are simple linear forms as in (14), these are not exponential family distributions.
Similarly, the conditional distribution of the mixed MRF distribution in (12) can be easily derived as:
while the conditional distribution for the case of the EBDMRF, when setting the covariate functions as in (14), can be written as
which can again be seen to differ primarily due to the term $A_{Y|X}(\theta(X))$.
It is also instructive to consider the differing forms of the conditional distributions in general. For pairwise mixed MRFs, this can be written as
(15) 
while that for the pairwise EBDMRF can be written as
(16) 
Thus, the two distributions differ only because the covariate functions in the EBDMRF distribution can be set arbitrarily. However, when they are set precisely equal to the expressions in (14), these two distributions can be seen to be identical. Overall, the pairwise EBDMRF and mixed MRF distributions are strikingly similar in the case where the EBDMRF distribution has linear covariate functions, differing only by the normalization terms. This seemingly minor difference, however, has important consequences in terms of normalizability, discussed next.
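The claim that the two constructions share the same response conditional yet differ as joint distributions can be verified by brute force in the smallest possible case. The sketch below assumes one binary covariate and one binary response, both on {0, 1}, with linear covariate functions; the parameter values and function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical parameters for one binary X and one binary Y, both in {0, 1}.
TH_X, TH_Y, TH_YX = 0.4, -0.3, 1.1

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def mixed_mrf_joint():
    """Undirected model: P(x, y) proportional to exp(TH_X*x + TH_Y*y + TH_YX*x*y)."""
    w = {(x, y): math.exp(TH_X * x + TH_Y * y + TH_YX * x * y)
         for x in (0, 1) for y in (0, 1)}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def ebdmrf_joint():
    """Block-directed model: P(x) proportional to exp(TH_X*x), then Y | x is Bernoulli
    with natural parameter TH_Y + TH_YX*x (the linear covariate function)."""
    joint = {}
    for x in (0, 1):
        px = sigmoid(TH_X) if x == 1 else 1.0 - sigmoid(TH_X)
        py1 = sigmoid(TH_Y + TH_YX * x)
        joint[(x, 1)] = px * py1
        joint[(x, 0)] = px * (1.0 - py1)
    return joint

def cond_y1_given_x(joint, x):
    return joint[(x, 1)] / (joint[(x, 0)] + joint[(x, 1)])

def marg_x1(joint):
    return joint[(1, 0)] + joint[(1, 1)]

mrf, ebd = mixed_mrf_joint(), ebdmrf_joint()
cond_gap = max(abs(cond_y1_given_x(mrf, x) - cond_y1_given_x(ebd, x)) for x in (0, 1))
marg_gap = abs(marg_x1(mrf) - marg_x1(ebd))
```

Here `cond_gap` is zero up to rounding while `marg_gap` is not: the same parameters induce a different marginal over the covariate block, because in the undirected model the covariate marginal absorbs the response normalization.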
2.5 Normalizability
An important advantage of our EBDMRF class of distributions is that the parameter restrictions for normalizability of (11) can be characterized simply. Recall that the problem of normalizability refers to the set of restrictions on the parameter space that are required to ensure the joint density integrates to one; this entails ensuring that the log-partition functions are finite.
Theorem 4.
For any given set of parameters, the joint distribution in (11) exists and is well-defined, so long as the corresponding marginal MRF $P(X)$, as well as the conditional distribution $P(Y \mid X)$, are well-defined.
Thus the normalizability conditions for the joint distribution in (11) reduce to those for the marginal MRF $P(X)$ and for the mixed CRF $P(Y \mid X)$.
In the previous section, we saw that the pairwise EBDMRF and the pairwise mixed MRF distributions have very similar forms when the covariate functions are set as in (14). It will thus be instructive to compare the normalizability restrictions imposed on the mixed MRF parameters with those on the EBDMRF parameters in this special case.
Let us introduce shorthand notations for the following expressions:
Then, the log-partition function in the mixed MRF distribution (12) can be written as
(17) 
On the other hand, the pairwise EBDMRF distribution in (13) has the following two “normalization” terms:
Thus, the overall “normalization” term in (13) can be written as:
(18) 
In contrast to those of the mixed MRF, these terms are not normalization constants, as the term $A_{Y|X}(\theta(X))$ is a function of $X$. Comparing (17) and (18), it can be seen that the two expressions become identical in the special case where the interaction parameters between $X$ and $Y$ are zero for all pairs, which would entail that $X$ and $Y$ are independent, and that $A_{Y|X}$ no longer varies with $X$.
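The relationship between the single log-partition function of the undirected model and the two normalization terms of the block-directed model can be checked numerically in a toy case. The sketch below again assumes one binary covariate and one binary response on {0, 1} with a linear covariate function; the function names and parameter values are hypothetical. It verifies that the undirected log-partition function equals the log-sum over the covariate of its own terms plus the conditional log-normalizer, and that it splits into the sum of the two separate terms when the cross interaction is zero.

```python
import math

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def a_mixed_mrf(th_x, th_y, th_yx):
    """Log-partition function of P(x, y) proportional to exp(th_x*x + th_y*y + th_yx*x*y)."""
    return log_sum_exp([th_x * x + th_y * y + th_yx * x * y
                        for x in (0, 1) for y in (0, 1)])

def a_x(th_x):
    """Log-normalizer of the marginal MRF over X alone."""
    return log_sum_exp([th_x * x for x in (0, 1)])

def a_y_given_x(th_y, th_yx, x):
    """Log-normalizer of P(y | x); it varies with x whenever th_yx is nonzero."""
    return log_sum_exp([(th_y + th_yx * x) * y for y in (0, 1)])

th_x, th_y, th_yx = 0.4, -0.3, 1.1

# Identity: A(theta) = log sum_x exp(th_x * x + A_{Y|X}(x)).
lhs = a_mixed_mrf(th_x, th_y, th_yx)
rhs = log_sum_exp([th_x * x + a_y_given_x(th_y, th_yx, x) for x in (0, 1)])

# With no cross interaction, A(theta) = A_X + A_{Y|X}.
split_gap = abs(a_mixed_mrf(th_x, th_y, 0.0) - (a_x(th_x) + a_y_given_x(th_y, 0.0, 0)))
```

The identity shows where extra restrictions on the undirected model can come from: its normalization must absorb the conditional log-normalizer inside a further sum or integral over the covariate block.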
But more generally, how would the two different normalization terms affect the normalizability of the two classes of distributions? Would one necessarily be more restrictive than the other? Interestingly, the following theorem indicates that the EBDMRF distribution imposes strictly weaker conditions for normalizability.
Theorem 5.
We provide a proof of this assertion in Appendix A.4. Namely, if the log-partition function $A(\theta)$ in (17) is finite, then both $A_{Y|X}$ and $A_X$ in (18) must also be finite; thus, if pairwise mixed MRFs are normalizable, then so are EBDMRFs. The converse of the statement in Theorem 5 does not hold in general, as demonstrated by several counterexamples discussed in the next section.
2.6 Examples
We provide several examples of EBDMRFs to better illustrate the properties and implications of our models, and relate them to the broader graphical models literature.
Gaussian-Ising EBDMRFs
Gaussian-Ising mixed graphical models have been well studied in the literature [26, 8, 28, 50]. Here, we show that each of these studied classes is a special case of our class of EBDMRF models, which in addition provides other formulations of these mixed models. Suppose we have a set of binary-valued random variables $X$, and a set of continuous-valued random variables $Y$, each with domain $\mathbb{R}$. We can specify the node-conditional distributions associated with $X$ as Bernoulli, and similarly the node-conditionals associated with $Y$ as Gaussian, each with the corresponding sufficient statistics and base measure. Given these two sets of binary and real-valued random variables, there are then three primary ways of specifying joint distributions: the mixed MRF from Yang et al. [50] and Chen et al. [7], the EBDMRF specified by $P(Y \mid X) \, P(X)$, and the EBDMRF specified by $P(X \mid Y) \, P(Y)$; note that these various formulations do not coincide and give distinct ways of modeling a joint density. When we consider only pairwise interactions and linear covariate functions as in (14), these models all take a similar form, only differing in the log-normalization terms:
(19)  
(20)  
(21) 
As discussed in Section 2.4, the only differences between these three models are the differing normalization terms, which are determined by the directionality of the edges between nodes of different types. If we define as the matrix, , then as discussed in Yang et al. [50] the GaussianIsing mixed MRF is normalizable when . Then following from Theorem 5, both forms of the GaussianIsing EBDMRF are also normalizable when . Upon inspection, we can see that this restriction cannot be further relaxed. We further examine these three formulations of pairwise MRF models for a set of binary and continuous random variables through numerical examples in Section 5.
Notice that the form of (19) is precisely that of a special case of the conditional Gaussian (CG) models first proposed by Lauritzen and Wermuth [26] and reviewed previously. This class of models can also be extended to the case of higher-order interactions. For example, the higher-order interactions in Cheng et al. [8] for the CG model are another special case of mixed MRF models. Also, this formulation of the mixed MRF model can easily be extended further to consider categorical random variables, as Lauritzen and Wermuth [26] and more recently Lee and Hastie [28] considered for the CG models.
Besides the class of mixed MRFs, our EBDMRF models in (20) and (21) are also normalizable joint distributions, and have many potential applications. Returning to our genomics motivating example, Gaussian-Ising EBDMRFs may be particularly useful for joint network modeling of binary mutation variables (SNPs) and continuous gene expression variables (microarrays). Biologically, SNPs are fixed point mutations that influence the dynamic and tissue-specific gene expression. Thus, we can take the directionality of our EBDMRFs to follow from known biological processes, where binary random variables associated with SNPs influence, and hence form directed edges with, continuous random variables associated with genes.
GaussianPoisson EBDMRFs
Existing classes of mixed MRF distributions do not permit dependencies between types of variables for GaussianPoisson graphical models [50, 7], which is a major limitation. Here, we show that in contrast our EBDMRF formulation permits a full dependence structure with GaussianPoison graphical models, due to its weaker normalizability conditions. Consider a set of countvalued random variables , each with domain , and a set of continuous realvalued random variables , each with domain , with corresponding nodeconditional distributions specified by the Poisson and Gaussian distributions respectively.
Let us consider the simple case of pairwise models with linear covariate functions from (14). Then to specify normalizable EBDMRF distributions, Theorem 4 says that we need only show that the corresponding CRF and MRF distributions are normalizable. First, consider the EBDMRF defined by and consider the conditional Gaussian CRF distribution, , defined as
(22) 
This conditional Gaussian distribution is well defined for any value of as long as where is the matrix defined in the previous section. Hence, as long as for all and the Poisson MRF is normalizable [49], then the EBDMRF distribution given by is normalizable and has the following form:
(23) 
where is the log-normalization constant of the conditional Gaussian CRF and is that of the Poisson MRF.
Similarly, consider the EBDMRF over count-valued and continuous-valued variables given by . Again, we have that the Gaussian MRF is normalizable if ; the Poisson CRF is also normalizable if for all as discussed in Yang et al. [48]. Thus, both forms of our Gaussian-Poisson EBDMRF permit nontrivial dependencies between count-valued and continuous variables.
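To illustrate the first of these constructions, the following is a minimal sketch of ancestral sampling from a two-node model in which a Poisson marginal is chained with a conditional Gaussian. The parameter names (theta_y, theta_x, theta_xy), the unit conditional variance, and the function name are assumptions made for illustration; they are not the paper's notation. With a single count node, the Poisson MRF reduces to an ordinary Poisson distribution with rate exp(theta_y), and the conditional Gaussian has mean theta_x + theta_xy * y.

```python
import numpy as np

def sample_poisson_to_gaussian(n, theta_y, theta_x, theta_xy, seed=0):
    """Draw n samples (x, y) by sampling y first, then x given y."""
    rng = np.random.default_rng(seed)
    # Single-node Poisson MRF: an ordinary Poisson with rate exp(theta_y).
    y = rng.poisson(lam=np.exp(theta_y), size=n)
    # Conditional Gaussian CRF (assumed unit variance): the mean is linear in y.
    x = rng.normal(loc=theta_x + theta_xy * y, scale=1.0)
    return x, y

x, y = sample_poisson_to_gaussian(20000, theta_y=1.0, theta_x=0.0, theta_xy=0.8)
# A nonzero theta_xy yields a clearly nonzero dependence between x and y.
corr = np.corrcoef(x, y)[0, 1]
print(round(float(corr), 2))
```

Because each factor is normalized on its own, the product is a valid joint distribution for any real theta_xy in this two-node case, which is the sense in which the directed construction admits an interaction term.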
This interesting consequence should be contrasted with that for the mixed MRF (2.4), which does not allow for interaction terms between the count-valued and continuous variables, since otherwise the log partition function (2.5) cannot be bounded [50, 7]. In other words, the only way for a Gaussian-Poisson mixed MRF distribution to exist would be as a product of independent distributions over the count-valued random vector and the continuous-valued vector. Thus, our EBDMRF construction has important implications, permitting nontrivial dependencies in certain classes of mixed graphical models that were previously unachievable.
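The unboundedness of the mixed MRF log partition function can be seen in a two-node example. The sketch below assumes an unnormalized joint density of the form exp(theta_y * y + theta_xy * x * y - x^2 / 2 - log y!) for a real x and a count y; this parameterization and the names used are illustrative assumptions. Integrating out x in closed form leaves a sum over y whose y-th term has logarithm theta_y * y + (theta_xy * y)^2 / 2 - log y! + log sqrt(2 * pi). The quadratic term eventually dominates log y!, which grows only like y log y, so the sum diverges whenever theta_xy is nonzero.

```python
import math

def log_summands(theta_y, theta_xy, y_max):
    """Log of each term in the sum over y after integrating out x.

    Uses the Gaussian integral
    int exp(theta_xy*x*y - x**2/2) dx = sqrt(2*pi) * exp((theta_xy*y)**2 / 2).
    """
    const = 0.5 * math.log(2.0 * math.pi)
    return [
        theta_y * y + 0.5 * (theta_xy * y) ** 2 - math.lgamma(y + 1) + const
        for y in range(y_max + 1)
    ]

# No interaction: the terms decay super-exponentially, so the sum converges.
independent = log_summands(theta_y=0.0, theta_xy=0.0, y_max=200)
# With an interaction term: the terms grow without bound, so the sum diverges.
coupled = log_summands(theta_y=0.0, theta_xy=0.5, y_max=200)
print(independent[200] < independent[100], coupled[200] > coupled[100])
```

The sign of theta_xy is irrelevant here because it enters only through its square.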
Other Examples of pairwise EBDMRF models
As our EBDMRF framework yields a flexible class of models, there are many other possible forms that these can take. Here, we outline classes of normalizable homogeneous pairwise EBDMRFs for easy reference. Note that we state normalizability conditions for these classes of models without proof, as these can easily be derived from Theorem 4 and the conditions outlined in [50].

Poisson-Ising EBDMRFs. If we let be a count-valued random vector and a binary random vector, then we can specify the appropriate node-conditional distributions of as Poisson and of as Bernoulli. This gives us three ways of modeling Poisson-Ising mixed graphical models: via the mixed MRF, or via either of our two EBDMRFs. All three of these mixed graphical model distributions are normalizable only if for all .

Exponential-Ising EBDMRFs. Similar to the above classes of models, now let be a positive real-valued random vector and specify its node-conditional distributions via the exponential distribution, and let be a binary random vector with Bernoulli node-conditional distributions as before. Then, the normalizability conditions for the construction of will be simply and for all . The constructions of and the mixed MRF require the additional condition that for all .

Gaussian-Exponential EBDMRFs. Let be a positive real-valued random vector and a real-valued random vector; then we can specify the appropriate node-conditional distributions of as exponential and of as Gaussian. Similar to the Gaussian-Poisson case, the mixed MRF distribution does not permit dependencies between and [50, 7]. But again, our EBDMRF distribution exists, permits nontrivial dependencies between nodes in and , and is normalizable under conditions very similar to those of the Gaussian-Poisson EBDMRF case.

Exponential-Poisson EBDMRFs. Interestingly, we can specify all three classes of mixed graphical model distributions for , a count-valued random vector with node-conditionals specified as Poisson, and for , a positive real-valued random vector with node-conditionals specified as exponential. Here, the normalizability conditions for the construction of will be for all and , for all . The constructions of and the mixed MRF additionally require the condition that for all .
Note also that other univariate exponential family distributions can be used to specify these homogeneous pairwise EBDMRFs. A particularly interesting class of these could be the variants of the Poisson distribution proposed by Yang et al. [49] to build Poisson graphical models that permit both positive and negative conditional dependencies. Within our EBDMRFs, these could be used to expand the possible formulations of mixed Poisson graphical models that are not restricted to negative conditional dependence relationships.
Additionally, we have only studied homogeneous pairwise models, but heterogeneous pairwise EBDMRFs may be of interest in many applications. For example, suppose we have count-valued nodes, , associated with Poisson node-conditional distributions, and let be a set of mixed nodes with binary-valued associated with Bernoulli node-conditionals and