[

# [

## Abstract

We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated measures and longitudinal structures, and the third involves a spatio-temporal analysis of rainfall data. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of different types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures.

Generalized Kronecker product; Linear covariance model; Matrix linear predictor; Non-normal data; Pearson estimating function; Quasi-likelihood; Spatio-temporal data

McGLM]Multivariate Covariance Generalized Linear Models Bonat, W.H and Jørgensen, B.]Wagner Hugo Bonat Bonat, W.H and Jørgensen, B.]Bent Jørgensen

## 1 Introduction

The analysis of non-normal multivariate data currently involves a choice between a considerable array of different modelling frameworks, ranging from, say, generalized estimating equations (GEE) and time-series models to generalized linear mixed models and model-based geostatistics. Each framework allows the modelling of a specific type of dependence or correlation structure, without fitting into any clear overall pattern. Current software implementations have, as we shall see below, limited capacity in terms of the complexity and size of data that can be handled.

This situation stands in sharp contrast to the univariate case, where Nelder and Wedderburn’s (1972) generalized linear models (GLMs) provide a unified and versatile approach to regression modelling of normal and non-normal data, implemented in an efficient fitting algorithm. A further advantage of the GLM approach is that estimation and inference for GLMs require only second-moment assumptions.

In order to obtain a multivariate modelling framework of comparable range and versatility, we shall propose the class of multivariate covariance generalized linear models (McGLMs), which, following Pourahmadi (1999), are specified via separate link functions and linear predictors for the mean vector and covariance matrix, respectively. This allows a unified approach to analysis of multivariate correlated data, taking into account response variable of mixed types, and allowing a wide range of covariance structures for repeated measures, longitudinal, spatial and spatio-temporal data. The models are fitted by means of quasi-likelihood and Pearson estimating functions, based on second-moment assumptions, and implemented in an efficient Newton scoring algorithm.

The idea of modelling a function of the covariance matrix by a linear structure goes back at least as far as Anderson (1973), followed later by Chiu et al. (1996), who used the matrix logarithm as covariance link function. More recently, the idea was extended in several different ways by Pourahmadi (1999, 2011), Pan and Mackenzie (2003) and Zhang et al. (2015), among others. These authors consider mainly the multivariate normal distribution, whereas we shall use a variance function to take non-normality into account in the style of Liang and Zeger (1986). Contrary to the latter authors we shall, however, emphasize the need to model the covariance structure explicitly, rather than treating it as a nuisance parameter.

The availability of standard software is an indicator for which kind of statistical methods are in currently use by the statistical and scientific communities. It is hence interesting to note that well-established R packages such as lme4 (Bates et al., 2014) and nlme (Pinheiro et al., 2013) do not deal with multivariate response variables. In the Bayesian context the flexible packages INLA (Rue et al., 2014) and MCMCpack (Martin et al., 2011) do not deal with multivariate response variables, judging from the package documentation. In R, there are at least two generalized linear mixed models packages that can deal with multivariate response variables, namely MCMCglmm (Hadfield, 2010), which uses Monte Carlo Markov Chain (MCMC) methods in the Bayesian framework, and the package SabreR (Crouchley, 2012), which uses marginal likelihood, but is limited to dealing with at most three response variables. The modelling of the covariance structure is currently restricted to making a selecting from a short list of pre-specified covariance structures, such as autoregression or compound symmetry. We were not able to find any R packages for fitting joint mean-covariance models, not even in the multivariate normal case. In SAS the GLIMMIX procedure for generalized linear mixed models (GLMMs) deals with multivariate response variables, but is limited to the exponential family of distributions and a few pre-determined covariance structures (SAS Institute, 2011). Other software platforms for fitting generic random effects models via MCMC, such as JAGS (Plummer, 2003) or WinBUGS (Lunn et al., 2000), can deal with multivariate response variables, but carry substantial overheads in terms of computational times and convergence checks, while being restricted to a small set of pre-specified covariance structures and probability distributions. These limitations on current software availability for joint mean-covariance modelling of multivariate response variables may reflect either a lack of interest on the part of software users, or a lack of sufficiently flexible modelling frameworks. In any case, we will use the latter as motivation for developing the new class of McGLMs.

We now present three correlated data examples along with a short review of currently available methods for each type of data. The examples were selected in order to highlight some of the limitations of current methodology, while illustrating the range of different problems that may be handled by the McGLM method.

### 1.1 Data set 1: Australian health survey

The first data set is from the Australian Health Survey for 1987–1988 (Deb and Trivedi, 1997; Cameron and Trivedi, 1998). We selected the following five count response variables for our analysis: number of consultations with a doctor or specialist (Ndoc) or with health professionals (Nndoc); total number of prescribed and non prescribed medications used in the past two days (Nmed); number of nights in a hospital during the most recent admission (Nhosp) and number of admissions to a hospital, psychiatric hospital, nursing or convalescence home in the past months (Nadm). The data set had nine covariates concerning social conditions (see Appendix for details). There were respondents and no missing data.

This example illustrates the fairly common situation of a multivariate regression problem with non-normal (discrete) response variables. The histograms in Figure 1 suggest that the five error distributions may not be identical, and hint at potential problems with excess of zeroes and under/overdispersion. These problems may, in turn, reflect on the solution to the main questions of the analysis, namely assessing the effects of the covariates on each outcome, and determining the residual correlation structure.

Given currently available software, it is a daunting task to select a suitable marginal error distribution for each of the five response variables. Besides the classical Poisson and negative binomial distributions, other distributions such as the Neyman Type A (Dobbie and Welsh, 2001) or the Poisson-inverse Gaussian (PIG) (Holla, 1967) may be relevant. Different distributions may have to be fitted by separate software packages, each of which comes with its own set of problems due to badly behaved likelihood function etc.

If we decide to use formal methods of model selection, we are faced with the choice of selection criterion, such as the Akaike or Bayesian information criterion in the likelihood framework, or the deviance information criterion in the Bayesian framework. The Bayesian case involves additional work due to the need for choosing suitable prior distributions. These problems persist in the special case where all error distributions belong to the same family. One option is the multivariate Poisson regression (Tsionas, 1999), which is suitable for multivariate count data, but is restricted to positive correlations and equidispersed data. A second option is the multivariate negative binomial distribution proposed by Shi and Valdez (2014). Such models are not easy to fit, and require careful attention to the implementation of algorithms and starting values. The assumption of a common error distribution required for these models may, however, not be satisfied in practice, and methods for handling the case of unequal marginal distributions do not seem to be easily available.

A different approach for correlated data is the family of generalized linear mixed models (GLMM) (Breslow and Clayton, 1993; Fong et al., 2010), which is based on specifying a GLM conditionally on a multivariate latent distribution, often the multivariate normal. A specific example of a GLMM for multivariate count data was presented by Rodrigues-Motta et al. (2013). GLMMs are computationally demanding, and many different algorithms have been proposed in the past three decades, see McCulloch (1997) and Fong et al. (2010) for reviews and further references.

A further aspect of GLMMs that gives rise to concern is the general lack of a closed-form expression for the likelihood and the marginal distribution of the data vector. This makes model selection even more complicated than for the marginal models discussed above. A related question is the special interpretation of parameters inherent from the construction of GLMMs. Thus, the covariate effects are conditional on the latent variables, whereas the correlation structure is marginal for the latent variables rather than for the response variables. An interesting discussion of random-effects and marginal models may be found in Lee and Nelder (2004).

Additional methods for specifying models for multivariate response variables include the copula models (Krupskii and Joe, 2013) and the class of hierarchical generalized linear models (Lee and Nelder, 1996). The fact that several different approaches are available for multivariate regression modelling, none of which is particularly easy to use, amplifies our call for a universal multivariate modelling framework, preferably one that facilitates model selection and allows marginal interpretation of parameters.

### 1.2 Data set 2: Respiratory physiotherapy on premature newborns

We consider some aspects of a prospective study to assess the effect of respiratory physiotherapy on the cardiopulmonary function of ventilated preterm newborn infants with birth weight lower than 1500 g. The study had three response variables: respiratory rate (RR), heart rate (HR) and oxygen saturation (OSat). The HR and OSat data were collected by electronic monitoring and RR by means of a stopwatch. Response variables were taken three times: before starting the physiotherapy (Evaluation 1), immediately after finishing (Evaluation 2), and five minutes after finishing the physiotherapy (Evaluation 3). Sixteen newborns were evaluated in consecutive sessions by two therapists at the neonatal unit. The number of evaluation days varied between and days. The data set has covariates concerning health conditions and there are cases (see the Appendix). Figure 2 shows the individual and average trajectories by outcome and evaluation.

The main goal of the investigation was to assess the effect of respiratory physiotherapy on the outcome variables, while taking into account the effects of covariates and the correlation induced by the repeated measures and the longitudinal structures. A special feature of these data is that the outcome variables are of mixed types. Thus, the variables HR and RR are continuous, whereas the oxygen saturation variable OSat takes values in the unit interval, including about exact ones, making it hard to propose a suitable probability distribution for this variable. We may, of course, use for example the beta (Bonat et al., 2015) or the simplex distribution (Zhang, 2014) with some ad hoc method for dealing with the exact ones. A better option may be to use the beta distribution inflated with ones (Ospina and Ferrari, 2010), but this model is complicated to fit and interpret. It may hence be preferable in this situation to use a quasi-likelihood method based on second-moment assumptions, which is easier to fit and interpret.

Similar to what we saw in Example , the literature may be divided into two main approaches: marginal models, mostly based on the GEE approach (O’Brien and Fitzmaurice, 2004; Rochon, 1996; Gray and Brookmeyer, 2000), and random-effects models based on GLMMs, see Verbeke et al. (2014). These authors also provide an extensive review of models for response variables of mixed type, whereas Fieuws et al. (2007) reviewed random-effects models for multivariate repeated measures. The question of how to model the covariance structure for repeated measures and longitudinal data is often solved by choosing from a short list of options, such as compound symmetry, autoregressive, banded and unstructured (Diggle et al., 2002). Such choices are, however, not suitable for the combination of repeated measures and longitudinal data found in the present data, thereby motivating the development of a more general and flexible approach for covariance modelling in multivariate data analysis.

### 1.3 Data set 3: Venezuelan rainfall data

This example concerns monthly rainfall data from stations in the Venezuelan state of Guaárico for a period of years ( months). The data set has cases with missing data. We also have the spatial coordinates (latitude and longitude) of the stations available, along with the covariate height (height above sea level). The data were previously analyzed by Sansó and Guenni (1999) using Bayesian MCMC methods, based on a censored and transformed multivariate normal distribution.

The statistical modelling of rainfall data involves a number of challenges, such as the need for simultaneous modelling of seasonal and geographical variation, the complicated nature of the spatio-temporal correlation structure, the special form of the marginal distribution (having a discrete component at zero), and the possible influence of the sampling scale on the form of the analysis (Dunn, 2004). The plots shown in Figure 3 illustrate some of these features for the Venezuelan rainfall data. In particular, the histogram in panel D highlights the right-skewed distribution and the considerable proportion of exact zeroes (around ), whereas the approximate linearity of the Taylor plot in Panel C suggests a variance function of power form.

A simple model for the marginal distribution of total rainfall over a certain time period is to write , where is the number of rainfall episodes, assumed to be Poisson distributed, and the i.i.d. variables are the amounts of rain for each episode, with the convention for , corresponding to a discrete component at zero. A special case of this compound Poisson model is the Tweedie family (Jørgensen, 1997) (where the are gamma distributed), with power variance functions, in agreement with the Taylor plot of Figure 3. The Tweedie model has been successfully applied to rainfall data by Dunn (2004) and Hasan and Dunn (2010, 2012). These authors, however, assume independent data, which is not realistic for the present data set.

A popular approach for analyzing rainfall data (Chandler and Wheater, 2002; Sigrist et al., 2012) is to use separate models for the discrete component, indicating the number of wet periods, and the continuous component, indicating the amount of rain for wet periods (Stern and Coe, 1984; Wilks, 1990). A variety of distributions have been proposed for modelling the continuous component of rainfall under the independence assumption, including the log-normal, Weibull, generalized log-normal, gamma and mixed gamma distributions (Hasan and Dunn, 2010, 2012). While these distributions may have their merits for analyzing rainfall data, the above compound Poisson model seems more natural, and the Tweedie family is flexible enough to mimic many of the shapes of other distributions.

Turning now to the question of spatio-temporal modelling of rainfall data, one possibility is to use models based on marked point processes (Wheater et al., 2000; Cowpertwait et al., 2006), which may be useful for detailed simulation studies. Another approach is to follow the conventional geostatistical paradigm, assuming a parametric covariance function (Diggle and Ribeiro, 2007). There are several parametric families available for modelling the joint space-time covariance structure (Cressie and Huang, 1999; Gneiting, 2002), although there are issues with their interpretability and computational complexity, making it difficult to handle large data sets with this approach.

A different approach to spatio-temporal modelling is to take into account the fundamental difference between the spatial and temporal dimensions, the latter obeying a natural ordering which is not present in the spatial dimension. It may hence be natural to assume a dynamic temporal evolution model in combination with spatially correlated errors, see Sansó and Guenni (1999, 2004); Sigrist et al. (2012) and the monograph by Cressie and Wikle (2011). While providing a flexible form of spatio-temporal modelling, this method is also computationally demanding, and handles response variables with a discrete component at zero by means of a censored multivariate normal distribution, which does not provide as reasonable an interpretation as the Tweedie model.

A significant simplification may be obtained by assuming that the spatial domain is discrete, rather than being continuous as in the last two methodologies discussed above. This approach is used for example in disease mapping (Besag et al., 1991), where the covariance structure is determined by a neighborhood matrix. This is computationally less demanding, because for a given neighborhood structure we may specify the inverse covariance (or precision) matrix. The precision matrix, in turn, contains information about the structure of conditional independence of the data (Rue and Held, 2005). The proposed simplification may hence be seen as a reasonable compromise between model complexity and the capacity to model real data sets, achieveable by modelling the covariance structure using a linear combination of neighborhood matrices. To accommodate rainfall data, such a modelling strategy should allow for Tweedie distributed response variables with power variance functions. Section 2 presents the class of McGLMs, and Section 3 considers the Newton scoring algorithm. The three data examples presented here are analyzed in Section 4 using McGLMs. The results are discussed in Section 5, including some directions for future investigations.

## 2 Multivariate covariance generalized linear models

In this Section we will present the McGLM approach as an extension of GLMs. Let be an response vector, an design matrix and a regression parameter vector. A GLM can be written in the following form:

 E(Y) = μ=g−1(Xβ) Var(Y) = Σ=V(μ;p)12(τ0I)V(μ;p)12 (1)

where is the link function, , is a diagonal matrix whose main entries are given by the variance function applied elementwise to the vector . Finally and are the power and dispersion parameters, respectively, and denotes the identity matrix.

The success enjoyed by the GLM framework comes from its ability to deal with a wide range of non-normal data using just two separate functions, namely the link and variance functions. The variance function plays an important role for GLMs, since different choices imply different assumptions about the response variable distribution. The power variance functions are a frequent choice in the GLM framework. It characterizes the Tweedie family of distributions, whose most important special cases are the normal , Poisson , Gamma and inverse Gaussian distributions (Jørgensen, 1987, 1997). But in spite of its flexibility the GLM approach has some limitations: it deals only with independent and univariate response variables, and the variance function is assumed to be known.

Our main objectives are to extend the GLM approach to deal with first non-independent data and second multivariate response variables. A third objective is to estimate the power parameter, which works as automatic model selection.

The Tweedie family is quite flexible for handling continuous response variables, but it is less flexible for discrete response variables. Therefore, we propose to use the Poisson-Tweedie family to deal with discrete data (El-Shaarawi et al., 2011). The Poisson-Tweedie family has variance function , and many important models for count data are special cases, for example the Hermite , Neyman Type A , negative binomial and Poisson-inverse Gaussian , see Jørgensen and Kokonendji (2014). When using the Poisson-Tweedie family, the matrix in (2) takes the special form because the dispersion parameter appears only in the second term. Another important case is when the response variable is binary, bounded, or the number of successes within a given number of trials. In that case the binomial variance function may be useful.

It is important to emphasize that by using just these three sets of variance functions we can deal with most frequently occurring types of response variables. Such flexibility is very useful, for example when analysing data set , where the choice of count distribution for each response variable is not obvious. Using the Poisson-Tweedie variance function we can deal with zero-inflation and overdispersion, such as that observed in data set . A similar situation appears for data set , where we have a bounded response variable with exact ones, which can be well modelled using the binomial variance function. The Tweedie family, through its power variance function, can model zero-inflated and right skewed response variables, such as the monthly rainfall data.

In Eq. (2) it is easy to see where the assumption of independent observations appears in the covariance matrix, which in turn suggests how to introduce dependence between observations. It is enough to change the identity matrix to a non-diagonal matrix . This approach is similar to the idea of a working correlation matrix in the Generalized Estimation Equation (GEE) framework (Liang and Zeger, 1986; Zeger et al., 1988). Our approach differs from GEE in that we propose to model in terms of a linear combination of known matrices, following the ideas of Anderson (1973) and Pourahmadi (2000), i.e.

 h(Ω(τ))=τ0Z0+⋯+τDZD. (2)

Here is the covariance link function, with are known matrices reflecting the covariance structure, and is a parameter vector. This structure is a natural analogue of the linear predictor of the mean structure, and we call it a matrix linear predictor. Plugging the matrix linear predictor (2) into Eq. (2), we obtain a so-called covariance generalized linear model.

Two new issues appear here, concerning how to specify the covariance link function and how to define the matrices . The first issue was discussed by Pinheiro and Bates (1996) and Pourahmadi (2011). In this paper we will focus on well-known covariance link functions, such as the identity and the inverse functions. In Section 4 we show how to specify the matrices in order to obtain some well-known models for time series, spatial and space-time data.

Many authors claim that a suitable covariance link function must provide an unrestricted and interpretable parametrization. While laudable, such a goal is probably over-optimistic, and does not seem to have been achieved yet, at least not for the general case (Pourahmadi, 2000; Pinheiro and Bates, 1996). The modified Cholesky decomposition proposed by Pourahmadi et al. (2007) presents both features, but is restricted to the case where there is a natural ordering of the observations. In general, the identity and inverse covariance link functions allow for simple interpretations, but these covariance link functions do not provide unrestricted parametrizations. In fact it is quite hard to define the parameter space for . In Section 3 we propose the so-called reciprocal likelihood algorithm where we use a tuning constant in order to control the step length of the algorithm and avoid unrealistic values for the parameter vector . From an algorithmic point of view, there is hence no need to require an unrestricted parametrization.

The second main contribution of this paper is to extend the covariance generalized linear model to deal with multivariate response variables. Let be a response variable matrix and let denote the corresponding matrix of expected values. To indicate that each response variable has its own covariance matrix we use the notation . It is important to emphasize that this matrix models the covariances within each response variable. We introduce the correlation matrix to model the correlation between response variables. To specify the joint covariance matrix for all response variables, we adopt the generalized Kronecker product proposed by Martinez-Beneito (2013) in the context of multivariate disease mapping. We hence define the McGLM by

 E(Y) = M={g−11(X1β1),…,g−1R(XRβR)} Var(Y) = C=ΣRG⊗Σb (3)

where is the generalized Kronecker product. The matrix denotes the lower triangular matrix of the Cholesky decomposition of . The operator denotes a block diagonal matrix and denotes an identity matrix.

## 3 Estimation and inference

In this Section we describe the estimating function approach used to estimate the model parameters (Jørgensen and Knudsen, 2004). We divide the set of parameters into two subsets, . In this notation denotes a vector containing all regression parameters. Similarly, we let be a vector of all dispersion parameters.

To simplify the discussion, let be the stacked vector of the response variable matrix by columns. Similarly, let be the stacked vector of the expected values matrix by columns.

We adopt the following quasi-score function for the regression parameters:

 ψβ(β,λ)=D⊤C−1(Y−M),

where is an matrix, and denotes the gradient operator. The matrix

 Sβ=E(∇βψβ)=−D⊤C−1D (4)

is the sensitivity matrix of and the matrix

 Vβ=Var(ψβ)=D⊤C−1D (5)

is the variability matrix of .

Similarly, we adopt the Pearson estimating function, defined by the components

 ψλi(β,λ)=tr(Wλi(r⊤r−C))fori=1,…,Q, (6)

where and . Details on how to compute these weight matrices are given in Section 3.1.

The entry of the sensitivity matrix of is given by,

 Sλij=E(∂∂λiψλj)=−tr(WλiCWλjC). (7)

We may show using results about characteristic functions of linear and quadratic forms of non-normal variables (Knight, 1985), that the entry of the variability matrix of is given by

 Vλij=Cov(ψλi,ψλj)=2tr(WλiCWλjC)+NR∑l=1k(4)l(Wλi)ll(Wλj)ll, (8)

where denotes the fourth cumulant of to be discussed below, see Eq. (14). To take into account the covariance between the vectors and , we compute the cross-sensitivity and -variability matrices. The entry of the cross-sensitivity matrix between and is given by

 Sβiλj=E(∂∂λjψβi)=0. (9)

In a similar way the entry of the cross-sensitivity matrix between and is given by

 Sλiβj=E(∂∂βjψλi)=−tr(WλiCWβjC). (10)

We can show that the entry of the cross-variability matrix between and is given by

 Vλiβj=E[NR∑k=1NR∑l=1NR∑m=1W(lm)λiA(j)krkrlrm], (11)

where and denotes the column of . In a similar way denotes the entry of the matrix . Furthermore, the joint sensitivity matrix of and is given by

 Sθ=(SβSβλSλβSλ),

whose entries are defined by (4), (7), (9) and (10). Finally, the joint variability matrix of and is given by

 Vθ=(VβV⊤λβVλβVλ),

whose entries are defined by (5), (8) and (11).

Let be the estimating function estimator of . Then the asymptotic distribution of is

 ^θ∼N(θ,J−1θ)

where is the inverse of Godambe information matrix,

 J−1θ=S−1θVθS−Tθ,

where .

Jørgensen and Knudsen (2004) proposed the modified chaser algorithm to solve the system of equations and , defined by

 β(i+1) = β(i)−S−1βψβ(β(i),λ(i)) λ(i+1) = λ(i)−S−1λψλ(β(i+1),λ(i)). (12)

The modified chaser algorithm uses the insensitivity property (9), which allows us to use two separate equations to update and . This procedure was implemented in R (R Core Team, 2015) and some generic functions are made available in the supplement material. The modified chaser algorithm is often quite efficient, but it does not have any way to control the step length. Thus, based on ideas from Jensen et al. (1991) we propose the reciprocal likelihood algorithm involving an additional tuning constant to control the step length. The reciprocal likelihood algorithm replaces the second equation of (3) by

 λ(i+1) = λ(i)−[αψλ(β(i+1),λ(i))Tψλ(β(i+1),λ(i))V−1λSλ+Sλ]−1ψλ(β(i+1),λ(i)). (13)

The strategy for choosing used in this paper consists of starting the algorithm with , and continuing with as long as the proposed value of corresponds to a positive-definite covariance matrix. In the opposite case, we increase the value of by a small quantity (e.g. ) and try again until the covariance matrix becomes positive-definite, after which we return to , corresponding to the modified chaser algorithm. Compared with conventional step length methods, our method is adaptive in the sense that directions where the estimating function is far from zero are being less penalized.

To compute the variance of the dispersion parameter estimators we used the empirical fourth cumulants, i.e.

 k(4)l=(yl−^μl)4−3^C2ll. (14)

The empirical third central moment was computed based on equation (11), ignoring the expectation. The main advantage of using empirical third and fourth moments is that the resulting method depends on second-moment assumptions only. The additional variability induced by the use of empirical moments implies, however, increased variability of the asymptotic covariance of the dispersion parameter estimators, in particular for small sample sizes.

The Pearson estimating function (6) is unbiased only if the vector of regression parameters is known. Jørgensen and Knudsen (2004) proposed a bias-correction for the Pearson estimating function. The th bias-correction term is given by

 bλi=−tr(J(λi)βJ−1β), (15)

where denotes the Godambe information matrix for and . The corrected Pearson estimating function may be solved using the same algorithm as for the Pearson estimating function. The variability matrix does not depend on the bias-correction term. This is not the case for the sensitivity matrix, but the contribution of the correction term to the sensitivity is so small that it can be ignored.

### 3.1 Derivatives of the covariance matrix

The key calculation in relation to the fitting algorithm is to compute the derivative of the covariance matrix . In this Section we will provide details of this calculation for the model presented in Eq. (2). Let for denote the correlation parameters. We use the convention to stack the lower triangle of the correlation matrix by columns. Let be an vector of power parameters. Finally, let be a vector of dispersion parameters. To denote a specific element we use the notation for and .

The weight matrix is defined by

 Wλi=−∂C−1∂λi=C−1∂C∂λiC−1.

The partial derivative of with respect to the element is given by

 ∂C∂ρi=Bdiag(~Σ1,…,~ΣR)(∂Σb∂ρi⊗I)Bdiag(~Σ1,…,~ΣR).

Using elementary matrix calculus the partial derivative of with respect to the element is given by

 ∂C∂pr=Bdiag(0,…,∂~Σr∂pr,…,0)(Σb⊗I)Bdiag(~ΣT1,…,~ΣTR) (16)

Similar equation may be obtained with respect to the elements in the vector . Given the block diagonal structure of Eq. (3.1), it is enough to compute the derivatives and insert in Eq. (3.1). Based on the results from Särkkä (2013) the partial derivative of with respect to and are given by

 ∂~Σr∂pr=~ΣrΦ(~Σ−1r∂Σr∂pr~Σ−1r),

and

 ∂~Σr∂τrd=~ΣrΦ(~Σ−1r∂Σr∂τrd~Σ−1r),

respectively, where the function returns the lower triangular part of the argument and half of its diagonal. Now, recalling that , we may hence see that the partial derivative with respect to and are given by

 ∂Σr∂pr=∂Vr(μ;p)12∂prΩr(τr)Vr(μ;p)12+Vr(μ;p)12Ωr(τr)∂Vr(μ;p)12∂pr, (17)

and

 ∂Σr∂τrd=Vr(μ;p)12∂Ωr(τr)∂τrdVr(μ;p)12,

respectively, where

 ∂Ωr(τr)∂τrd=∂h−1(U)∂UZrd (18)

and where . The derivative in Eqs. (17) and (18) depends on the derivative of the variance function and covariance link function, respectively, and it should be evaluated accordingly.

## 4 Data analyses

### 4.1 Results from data set 1

In this section we apply the McGLM approach to analyze the multivariate count data set presented in Section 1.1. We adopted the log link function, the Poisson-Tweedie variance function and the identity covariance link function for the five count response variables. The matrix linear predictor is composed of an identity matrix, since we have independent respondents. The linear predictor is composed of nine covariates plus the intercept for each response variable. The covariance structure is described by five power parameters, five dispersions and ten correlation parameters. We fitted this model using the modified chaser algorithm (3) and the correction term in (15). Table 1 shows the estimates and standard errors for the power and dispersion parameters.

The results in Table 1 show that for the response variables Ndoc, Nndoc and Nmed the suggested distribution is the Neyman Type A (), which indicates zero inflation relative to the Poisson distribution. Regarding the response variable Nhosp the model indicates that the Pólya-Aeppli distribution () is suitable. Finally, the model indicates that for Nadm both Neyman Type A, Pólya-Aeppli and negative binomial () distributions are suitable. This result is obtained because the dispersion parameter is not different from zero; hence the response variable Nadm is equidispersed and all these distributions work well, including the Poisson. In this case, we do not have enough information in the data to distinguish between these distributions. Therefore, we suggest to opt for the simplest possibility, i.e. the Poisson model. The dispersion estimates show weak overdispersion for the response variables Ndoc, Nndoc and Nmed and high overdispersion for Nhosp. In order to compare the regression coefficients with a conventional model, Figure 4 shows the estimates and confidence intervals for McGLM and a conventional Poisson log-linear model for each response variable. The intercept is not shown in order to avoid scale issues.

The results in Figure 4 show that the two approaches agree in terms of estimates, but differ in terms of standard errors. The differences may be explained by the covariance structure. The Poisson model assumes equidispersion, whereas the McGLM models allow for a flexible modelling of the covariance structure, allowing in particular various degrees of overdispersion and zero-inflation. For the response variable Nadm, the model shows that equidispersion is suitable, making the McGLM and Poisson confidence intervals similar. On the other hand, for the response variable Nhosp, where the overdispersion is strong, the McGLM confidence intervals are about five times wider than the Poisson ones. In a similar way, the McGLM confidence intervals for Nndoc, Ndoc and Nmed are on average , and wider than the corresponding Poisson intervals. These results highlight the importance of modelling the covariance structure even when the main interest is in the regression parameters, because the covariance structure controls the standard errors for the regression parameters.

An additional feature of McGLM is that we can estimate the correlation between response variables. It is important to emphasize that the estimation of the correlation matrix does not inflate the standard errors for the regression coefficients, due to the insensitivity of the quasi-score function with respect to the covariance parameters. The estimates and standard errors for the entries of the matrix were as follows:

 ^Σb=⎡⎢ ⎢ ⎢ ⎢ ⎢ ⎢⎣10.1066(0.0161)10.1708(0.0156)0.0601(0.0144)10.0905(0.0164)0.0679(0.0156)0.0478(0.0144)10.1503(0.0160)0.0688(0.0147)0.0699(0.0140)0.5464(0.0510)1⎤⎥ ⎥ ⎥ ⎥ ⎥ ⎥⎦.

All correlations are significantly different from zero, but only the correlation between Nhosp and Nadm is substantial in size. The standard errors are all of a similar magnitude, which is natural since all are computed using the same sample size. Furthermore, these correlations take into account the effect of all covariates, zero inflation and overdispersion. We know of no other statistical method that allows estimation of correlations taking into account all these important features.

### 4.2 Results from data set 2

In this section, we apply the McGLM approach to analyze data set 2 from Section 1.2, which has response variables of mixed types. There are three response variables, namely HR, RR and OSat, the first two being continuous and the last being confined to the unit interval, having exact zeroes. We adopted the constant variance function, identity link function, and identity covariance link function for HR and RR, reflecting a belief that HR and RR are normally distributed. For OSat we adopted the logit link function combined with the binomial variance function and identity covariance link function. We fitted the model using the modified chaser algorithm (3) and the correction term (15).

The matrix linear predictor is composed of a diagonal matrix (intercept) combined with two sets of matrices to model the longitudinal and repeated measures structures. The longitudinal structure is modeled by a compound symmetry matrix (of ones), the reciprocal of Euclidean distances and reciprocal of Euclidean distances squared. The repeated measures structure is described by an unstructured covariance matrix. Since we have three Evaluations to represent this structure we need three matrices. Therefore, the matrix linear predictor is a linear combination of seven known matrices and it is described by dispersion parameters (seven for each outcome). Details of the matrix linear predictor is available in the supplementary material. In this example we have no power parameters and the matrix contains three parameters. Table 2 shows the estimates and standard errors for the dispersion parameters. The parameters ( and ) and ( and ) are associated to the repeated measures and longitudinal structures, respectively.

The results in Table 2 show that the longitudinal structure is not significant for all response variables. The repeated measures structure is significant for RR and HR. For the outcome RR the estimate of is not significant, which means that the covariance between Evaluation 1 and Evaluation 3 is not different from zero. For the outcome OSat there are no significant dispersion coefficients, so we may assume independent observations. The final model is composed of the repeated measures structure for the response variables RR and HR and independent structure (only ) for OSat.

We have a set of covariates entering the linear predictor, and we used a stepwise procedure to select the most significant set of covariates. This procedure selected a different set of covariates for each outcome. After completing this procedure, we included the covariate of particular interest, namely treat, which is a factor with two levels. Our goal is to assess whether or not the treatment has an effect on each response variable.

In order to evaluate the effects of the covariance structure on the regression coefficients, Figure 5 shows estimates and confidence intervals obtained from the final McGLM and a quasi GLM using the same link and variance functions as for the McGLM. The linear predictors for the outcome RR, HR and OSat are composed of , and regression coefficients, respectively. The intercept is not shown, in order to avoid scaling issues. It is important to emphasize that the last two regression coefficients (numbered 9–10, 12–13 and 9–10, respectively) measure the treatment effects.

The results in Figure 5 show that in general the confidence intervals from McGLM are wider than the corresponding ones based on quasi GLM. For the outcome RR and HR the standard errors from McGLM are on average and greater than the corresponding quasi GLM ones, respectively. These results are as expected, because correlation within response variables generally implies less information in the data on the regression coefficients. It is hence interesting to note that, in contras to the other regression coefficients, the two treatment coefficients for each response variable have smaller standard errors using McGLM than those obtained using quasi GLM. We attribute this effect to the fact that the treatment covariate is also used for modelling the covariance structure, which apparently improves on the estimation of the treatment effect, although we are uncertain if this is a general feature, or if it is specific to this data set.

Regarding the outcome OSat the standard errors from McGLM are on average smaller than those obtained by quasi GLM. This may be explained by the difference between the estimates of , which are and for McGLM and quasi binomial, respectively. This small difference seems to be due to the use of the corrected Pearson estimator in our model.

Regarding the treatment effects, the final model shows that for RR the Evaluation 1 differs from Evaluation 3, but not from Evaluation 2. On the other hand, for HR the model shows that the Evaluation 1 differs from Evaluation 2, but does not differ from Evaluation 3. This result contrasts with the quasi GLM analysis, which does not show any significant difference between Evaluation 1 and Evaluation 2. Finally, our final model shows that for OSat both constrasts are significant. The estimated correlation matrix between response variables was as follows:

 ^Σb=⎡⎢⎣10.1682(0.0607)1$−$0.0482(0.0607)$−$0.0733(0.0608)1⎤⎥⎦

We observe that there is a significant but weak correlation between RR and HR, whereas the other two correlation estimates are not significant.

### 4.3 Results from data set 3

In this section we apply the McGLM approach to the space-time data set presented in Section 1.3. The response variable monthly rainfall is right-skewed with a positive probability at zero. We hence adopted the log link function, the power variance function and the inverse covariance link function. The linear predictor is expressed in terms of Fourier harmonics (seasonal variation) and B-splines (general trend),

 g(μtj)=β0+β1cos(2πt/12)+β2sin(2πt/12)+4∑k=1βk+2Bk(j)

where indexes months, and indexes years. The form a B-spline basis with four degree of freedom. We used only the first term of the Fourier harmonics, since the second and third terms were not significant.

The main challenge in the analysis was to model the covariance structure suitably, in order to take into account the spatial and temporal autocorrelation and, if necessary, the interaction between space and time. We propose to model the space-time structure using a linear combination of neighborhood matrices. Let us motivate our approach using the Conditional Autoregressive (CAR) model. The CAR model specifies the inverse of the covariance matrix by

 Ω−1(τ,ρ)=τ(D−ρW)

where is a neighborhood matrix and is a diagonal matrix with the number of neighbors in the main diagonal. The model is parametrized by the precision () and autocorrelation () parameters. The matrices and can model both space, time and space-time interaction in a straightforward way, by using different neighborhood matrices.

For the Venezuelan rainfall data, we used the spatial coordinates (latitude and longitude) to build a Voronoi tessellation (see Figure 3B). The tessellation structure helps us to specify a neighborhood structure. Let us denote this structure by and . Temporal neighbors are naturally specified by the time structure. Let us denote these matrices by and . In the space-time case we have replicates of the space and time structures, so assuming independent replicates, the full neighborhood matrix is block-diagonal. Finally, the interaction between space and time is described by the Kronecker product between the space and time neighborhood structures. Let us denote these matrices by and . Thus, the matrix linear predictor is given by

 Ω−1(τ,ρ)=τt(Dt+ρtWt)+τs(Ds+ρsWs)+τst(Dst+ρstWst),

which may be written as a linear combination of known matrices as follows:

 Ω−1(τ)=τ0Z0+τ1Z1+τ2Z2+τ3Z3+τ4Z4+τ5Z5, (19)

where , , , , and . In a similar way, we find that , and . Finally, , and . In practical situations, fitting this model may be hard if the autocorrelation parameters are near . In the Bayesian context, it is common to fix the autocorrelation parameter at the value 1, which is the so-called Intrinsic Conditional Autoregressive (ICAR) model.

In order to investigate the space-time structure we fitted three models, cf. Table 3. The first (Model 1) considers time effects. The second (Model 2) considers space effects and the third (Model 3) the space-time interaction effects. After this procedure we decided that the autocorrelation parameters associated with space and interaction effects must be fixed at the value . We then fitted the final model (Model 4) with all three components time, space and space-time interaction. All models were fitted using the reciprocal likelihood algorithm (13). Table 3 presents estimates and standard errors for the power and dispersion parameters for each of the four models.

The results in Table 3 show a moderate temporal autocorrelation, but high spatial and space-time interaction autocorrelations. All the power parameter estimates are in the interval , suggesting a compound Poisson distribution, as expected, since the response variable is continuous with exact zeros. The fitted values and confidence interval are shown in Figure 3A above.

The model allows us to make prediction using the Best Linear Unbiased Predictor (BLUP). Furthermore, we may obtain predictions for different response variable measures, such as the mean number of precipitations events per month, the avarage amount of precipitation per event, or the probability of no precipitation. Such extensions are straightforward and will be presented elsewhere.

## 5 Discussion

In this paper we have developed a comprehensive statistical modelling framework for correlated data, obtained by using separate pairs of link functions and linear predictors for the mean and covariance structures in the style of Pourahmadi (2011). Motivated by three data examples, we have shown that the McGLM framework can deal with a wide variety of correlation structures where existing modelling approaches have difficulties. Following Nelder and Wedderburn (1972), there are obvious pedagogical advantages to the modular specification of models in McGLM, incentivating the researcher to think constructively about the covariance structure, while drawing on previous experiences from GLM modelling. The generalized Kronecker product facilitates the specification of the covariance structure for multivariate response variables.

The main features of the McGLM framework include the ability to deal with most common types of response variables, such as continuous, count and binary. Characteristics such as symmetry/asymmetry, excess of zeros and overdispersion are easily handled due to the flexibility of the model class. We can model many different types of dependencies, such as repeated measures, longitudinal, time series, spatial and space-time data. All of these features extend to multivariate response variables, including the case of mixed response variable types, while maintaining the population average interpretations of regression parameters as well as for covariance parameters. This gives a very flexible modelling of the covariance structure compared with for example current GEE implementations, where the researcher must choose from a small set of pre-defined covariance models.

The main technical advantage of the McGLM framework is the simplicity of the fitting method, which amounts to finding the root for a set of non-linear equations. Based on second-moment assumptions, we use a quasi-score function for the regression parameters and a Pearson estimating function for the covariance parameters. The modified chaser algorithm of Jørgensen and Knudsen (2004) requires an approximate derivative matrix in the form of the sensitivity matrix, which is usually relatively simple. The new reciprocal likelihood algorithm requires the additional calculation of the variability matrix in order to stabilize the covariance parameter update, resulting in a very efficient algorithm, although the sensitivity matrix may be hard to calculate for big data. In such cases a numerical approximation for the sensitivity matrix may be used. For both algorithms a careful choice of initial values is required. In any case, we avoid using computationally more intensive methods such as MCMC, numerical optimization or numerical integration, which are common in the context of random effects models.

An important feature of the McGLM fitting method is that while the mean parameter estimators depend relatively little on the form of the covariance structure, this is not the case for the standard errors of the mean parameter estimators, which depend directly on the choice of covariance structure. A related matter is that the discussion of the efficiency of the mean and covariance parameter estimators is difficult due to the lack of a fully specified probability model.

The current version of the fitting algorithm (available in the supplementary material) is a preliminary implementation of the McGLM method. We plan to develop a full McGLM R package with a GLM-style interface that takes full advantage of the modular specification of the models. There are many possible extensions to the basic model discussed in the present paper, including for example facilities for censored data in survival analysis and other special types of data, or new estimating functions to handle data not missing at random. The specification of the matrix linear predictor is one of the key points of the McGLM approach. While we have used some simple and easily interpretable matrices here, there is wide scope for further research on the proper choice of the matrix linear predictor. Similarly, other covariance link functions, such as the matrix-logarithm Chiu et al. (1996) may also be explored. It is also possible to incorporate penalized splines into the mean and covariance structures, and to use regularization for high-dimensional data, with important applications in genetics. Furthermore, McGLMs may be scaled to test for a common exposure effect in the style of Roy et al. (2003).

## Appendix - Data sets description

Data set 1: Australian health survey
Response variables: Ndoc - Number of consultations with a doctor or specialist. Nndoc - Number of consultations with health professionals. Nmed - Total number of prescribed and non prescribed medications used in the past two days. Nhosp - Number of nights in a hospital during the most recent admission. Nadm - Number of admissions to a hospital, psychiatric hospital, nursing or convalescence home in the past 12 months.
Covariates: sex - factor, two levels (0-Male; 1-Female). age - respondent’s age in years divided by 100. income - respondent’s annual income in Australian dollars divided by 1000. levyplus - factor, two levels (1- if respondent is covered by private health insurance fund for private patients in public hospital (with doctor of choice); 0 - otherwise). freepoor - factor, two levels (1 - if respondent is covered by government because low income, recent immigrant, unemployed; 0 - otherwise). freerepa - factor, two levels (1 - if respondent is covered free by government because of old-age or disability pension, or because invalid veteran or family of deceased veteran; 0 - otherwise). illness - number of illnesses in past 2 weeks, with 5 or more weeks coded as 5. actdays - number of days of reduced activity in the past two weeks due to illness or injury. hscore - respondent’s general health questionnaire score using Goldberg’s method; high score indicates poor health. chcond1 - factor, two levels (1 - if respondent has chronic condition(s) but is not limited in activity; 0 - otherwise). chcond2 - factor, two levels (1 if respondent has chronic condition(s) and is limited in activity; 0 - otherwise). id - respondent’s index.
Data set 2: Respiratory physiotherapy on premature newborns
Response variables
: RR - Respiratory rate (continuous). HR - Heart rate (continuous). O2Sat - Oxygen saturation (bounded).
Covariates: Sex - factor, two levels (Female; Male). GA - Gestational age (weeks). BW - Birth weight (mm). APGAR1M - APGAR index in the first minute of life. APGAR5M - APGAR index in the fifth minute of life. PRE - factor, two levels (Premature: yes; no). HD - factor, two levels (Hansen’s disease, yes; no). SUR - factor, two levels (Surfactant, yes; no). JAU - factor, two levels (Jaundice, yes; no). PNE - factor, two levels (Pneumonia, yes; no). PDA - factor, two levels (Persistence of ductus arteriosus, yes; no). PPI - factor, two levels (Primary pulmonary infection, yes; no). OTHERS - factor, two levels (Other diseases, yes; no). DAYS - Age (days). AUX - factor, two levels (Type of respiratory auxiliary, HOOD; OTHERS). TREAT - factor, three levels (Respiratory physiotherapy, Evaluation 1; Evaluation 2; Evaluation 3). UNIT - Unit sample code. TIME - Days of treatment.
Data set 3: Venezuelan rainfall data
Response variable
: rainfall - monthly rainfall (mm)
Covariates: month - month code. Longitude - Longitude (UTM). Latitude - Latitude (UTM). height - height above sea level (m).
Supplement material web page

## Acknowledgements

The first author is supported by CAPES (Coordenação de Aperfeiçoamento de Pessoal de Nível Superior) - Brazil.

### References

1. Anderson, T. W. (1973). Asymptotically efficient estimation of covariance matrices with linear structure, The Annals of Statistics 1(1): 135–141.
2. Bates, D., Maechler, M., Bolker, B. and Walker, S. (2014). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-6.
3. Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration with two applications in spatial statistics, Annals of Institute of Statistical Mathematics 43(1): 1–59.
4. Bonat, W. H., Ribeiro Jr, P. J. and Zeviani, W. M. (2015). Likelihood analysis for a class of beta mixed models, Journal of Applied Statistics 42(2): 252–266.
5. Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models, Journal of the American Statistical Association 88(421): 9–25.
6. Cameron, A. C. and Trivedi, P. K. (1998). Regression Analysis of Count Data, Econometric society monographs, Cambridge University Press, Cambridge (UK).
7. Chandler, R. E. and Wheater, H. S. (2002). Analysis of rainfall variability using generalized linear models: A case study from the west of Ireland, Water Resources Research 38(10): 1–11.
8. Chiu, T. Y. M., Leonard, T. and Tsui, K. (1996). The matrix-logarithmic covariance model, Journal of the American Statistical Association 91(433): 198–210.
9. Cowpertwait, P. S. P., Lockie, T. and Davis, M. D. (2006). A stochastic spatial-temporal disaggregation model for rainfall, Journal of Hydrology (New Zealand) 45(1): 1–12.
10. Cressie, N. and Huang, H. (1999). Classes of nonseparable, spatio-temporal stationary covariance functions, Journal of the American Statistical Association 94(448): 1330–1339.
11. Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-Temporal Data, Wiley Series in Probability and Statistics, John Wiley & Sons, Inc., Hoboken, NJ.
12. Crouchley, R. (2012). sabreR: Multivariate Generalized Linear Mixed Models. R package version 2.0.
13. Deb, P. and Trivedi, P. K. (1997). Demand for medical care by the elderly: A finite mixture approach, Journal of Applied Econometrics 12(3): 313–36.
14. Diggle, P. J., Heagerty, P., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, Oxford Statistical Science Series, Oxford.
15. Diggle, P. and Ribeiro, P. (2007). Model-based Geostatistics, Springer Series in Statistics, Springer-Verlag New York.
16. Dobbie, M. J. and Welsh, A. H. (2001). Models for zero-inflated count data using the Neyman Type A distribution, Statistical Modelling 1(1): 65–80.
17. Dunn, P. K. (2004). Occurrence and quantity of precipitation can be modelled simultaneously, International Journal of Climatology 24(10): 1231–1239.
18. El-Shaarawi, A. H., Zhu, R. and Joe, H. (2011). Modelling species abundance using the Poisson-Tweedie family, Environmetrics 22(2): 152–164.
19. Fieuws, S., Verbeke, G. and Molenberghs, G. (2007). Random-effects models for multivariate repeated measures, Statistical Methods in Medical Research 16(5): 387–397.
20. Fong, Y., Rue, H. and Wakefield, J. (2010). Bayesian inference for generalized linear mixed models, Biostatistics 11(3): 397–412.
21. Gneiting, T. (2002). Nonseparable, stationary covariance functions for space-time data, Journal of the American Statistical Association 97(458): 590–600.
22. Gray, S. M. and Brookmeyer, R. (2000). Multidimensional longitudinal data: Estimating a treatment effect from continuous, discrete, or time-to-event response variables, Journal of the American Statistical Association 95(450): 396–406.
23. Hadfield, J. D. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package, Journal of Statistical Software 33(2): 1–22.
24. Hasan, M. M. and Dunn, P. K. (2010). A simple Poisson-gamma model for modelling rainfall occurrence and amount simultaneously, Agricultural and forest meteorology 150(10): 1319–1330.
25. Hasan, M. M. and Dunn, P. K. (2012). Understanding the effect of climatology on monthly rainfall amounts in Australia using Tweedie GLMs, International Journal of Climatology 32(7): 1006–1017.
26. Holla, M. (1967). On a Poisson-inverse Gaussian distribution, Metrika 11(1): 115–121.
27. Jensen, S. T., Johansen, S. and Lauritzen, S. L. (1991). Globally convergent algorithms for maximizing likelihood function, Biometrika 78(4): 867–877.
28. Jørgensen, B. (1987). Exponential dispersion models, Journal of the Royal Statistical Society. Series B (Methodological) 49(2): 127–162.
29. Jørgensen, B. (1997). The Theory of Dispersion Models, Chapman & Hall.
30. Jørgensen, B. and Knudsen, S. J. (2004). Parameter orthogonality and bias adjustment for estimating functions, Scandinavian Journal of Statistics 31(1): 93–114.
31. Jørgensen, B. and Kokonendji, C. C. (2014). Discrete dispersion models and their Tweedie asymptotics, ArXiv e-prints .
32. Knight, J. L. (1985). The joint characteristic function of linear and quadratic forms of non-normal variables, Sankhyā: The Indian Journal of Statistics, Series A (1961-2002) 47(2): 231–238.
33. Krupskii, P. and Joe, H. (2013). Factor copula models for multivariate data, Journal of Multivariate Analysis 120(0): 85–101.
34. Lee, Y. and Nelder, J. A. (1996). Hierarchical generalized linear models, Journal of the Royal Statistical Society. Series B (Methodological) 58(4): 619–678.
35. Lee, Y. and Nelder, J. A. (2004). Conditional and Marginal models: Another view, Statistical Science 19(2): 219–238.
36. Liang, K.-Y. and Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models, Biometrika 73(1): 13–22.
37. Lunn, D. J., Thomas, A., Best, N. and Spiegelhalter, D. (2000). Winbugs: A bayesian modelling framework: Concepts, structure, and extensibility, Statistics and Computing 10(4): 325–337.
38. Martin, A. D., Quinn, K. M. and Park, J. H. (2011). MCMCpack: Markov Chain Monte carlo in R, Journal of Statistical Software 42(9): 22.
39. Martinez-Beneito, M. A. (2013). A general modelling framework for multivariate disease mapping, Biometrika 100(3): 539–553.
40. McCulloch, C. E. (1997). Maximum likelihood algorithms for generalized linear mixed models, Journal of the American Statistical Association 92(437): 162–170.
41. Nelder, J. A. and Wedderburn, R. W. M. (1972). Generalized linear models, Journal of the Royal Statistical Society. Series A 135(3): 370–384.
42. O’Brien, L. M. and Fitzmaurice, G. M. (2004). Analysis of longitudinal multiple-source binary data using generalized estimating equations, Journal of the Royal Statistical Society: Series C (Applied Statistics) 53(1): 177–193.
43. Ospina, R. and Ferrari, S. L. P. (2010). Inflated beta distributions, Statistical Papers 51(1): 111–126.
44. Pan, J. and Mackenzie, G. (2003). On modelling mean-covariance structures in longitudinal studies, Biometrika 90(1): 239–244.
45. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D. and R Core Team (2013). nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-113.
46. Pinheiro, J. C. and Bates, D. M. (1996). Unconstrained parametrizations for variance-covariance matrices, Statistics and Computing 6(3): 289–296.
47. Plummer, M. (2003). JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling, Proceedings of the 3rd International Workshop on Distributed Statistical Computing.
48. Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation, Biometrika 86(3): 677–690.
49. Pourahmadi, M. (2000). Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix, Biometrika 87(2): 425–435.
50. Pourahmadi, M. (2011). Covariance estimation: The GLM and regularization perspectives, Statistical Science 26(3): 369–387.
51. Pourahmadi, M., Daniels, M. J. and Park, T. (2007). Simultaneous modelling of the Cholesky decomposition of several covariance matrices, Journal of Multivariate Analysis 98(3): 568–587.
52. R Core Team (2015). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
53. Rochon, J. (1996). Analyzing bivariate repeated measures for discrete and continuous outcome variables, Biometrics 52(2): 740–750.
54. Rodrigues-Motta, M., Pinheiro, H. P., Martins, E. G., Araújo, M. S. and dos Reis, S. F. (2013). Multivariate models for correlated count data, Journal of Applied Statistics 40(7): 1586–1596.
55. Roy, J., Lin, X. and Ryan, L. M. (2003). Scaled marginal models for multiple continuous outcomes, Biostatistics 4(3): 371–383.
56. Rue, H. and Held, L. (eds) (2005). Gaussian Markov Random Fields: Theory and Applications, Chapman & Hall, London.
57. Rue, H., Martino, S., Lindgren, F., Simpson, D., Riebler, A. and Krainski, E. T. (2014)). INLA: Functions which allow to perform full Bayesian analysis of latent Gaussian models using Integrated Nested Laplace Approximaxion. R package version 0.0-1404466487.
58. Sansó, B. and Guenni, L. (1999). Venezuelan rainfall data analysed by using a Bayesian space-time model, Journal of the Royal Statistical Society: Series C (Applied Statistics) 48(3): 345–362.
59. Sansó, B. and Guenni, L. (2004). A Bayesian approach to compare observed rainfall data to deterministic simulations, Environmetrics 15(6): 597–612.
60. Särkkä, S. (2013). Bayesian Filtering and Smoothing, Cambridge University Press, London. Cambridge Books Online.
61. SAS Institute (2011). SAS/STAT 9.3 User’s Guide: The GLIMMIX Procedure (Chapter), North Carolina, USA.
62. Shi, P. and Valdez, E. A. (2014). Multivariate negative binomial models for insurance claim counts, Insurance: Mathematics and Economics 55(0): 18–29.
63. Sigrist, F., Künsch, H. R. and Stahel, W. A. (2012). A dynamic nonstationary spatio-temporal model for short term prediction of precipitation, The Annals of Applied Statistics 6(4): 1452–1477.
64. Stern, R. and Coe, R. (1984). A model fitting analysis of daily rainfall data, Journal of the Royal Statistical Society. Series A (General) 1(147): 1–34.
65. Tsionas, E. G. (1999). Bayesian analysis of the multivariate Poisson distribution, Communications in Statistics - Theory and Methods 28(2): 431–451.
66. Verbeke, G., Fieuws, S., Molenberghs, G. and Davidian, M. (2014). The analysis of multivariate longitudinal data: A review, Statistical Methods in Medical Research 23(1): 42–59.
67. Wheater, H. S., Isham, V. S., Cox, D. R., Chandler, R. E., Kakou, A., Northrop, P. J., Oh, L., Onof, C. and Rodriguez-Iturbe, I. (2000). Spatial-temporal rainfall fields: modelling and statistical aspects, Hydrology and Earth System Sciences 4(4): 581–601.
68. Wilks, D. S. (1990). Maximum likelihood estimation for the gamma distribution using data containing zeros, Journal of Climate 3(12): 1495–1501.
69. Zeger, S. L., Liang, K.-Y. and Albert, P. S. (1988). Models for longitudinal data: A generalized estimating equation approach, Biometrics 44(4): 1049–1060.
70. Zhang, P., Q. Z. (2014). Regression analysis of proportional data using simplex distribution, Scientia Sinica Mathematica 44(1): 89–104.
71. Zhang, W., Leng, C. and Tang, C. Y. (2015). A joint modelling approach for longitudinal studies, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 77(1): 219–238.
You are adding the first comment!
How to quickly get a good reply:
• Give credit where it’s due by listing out the positive aspects of a paper before getting into which changes should be made.
• Be specific in your critique, and provide supporting evidence with appropriate references to substantiate general statements.
• Your comment should inspire ideas to flow and help the author improves the paper.

The better we are at sharing our knowledge with each other, the faster we move forward.
The feedback must be of minimum 40 characters and the title a minimum of 5 characters