Rubén LoaizaMaya is a PhD student and Michael Smith is Chair of Management (Econometrics), both at Melbourne Business School, University of Melbourne. Correspondence should be directed to Michael Smith at mike.smith@mbs.edu.
Variational Bayes Estimation of Time Series Copulas for Multivariate Ordinal and Mixed Data
Abstract
We propose a new variational Bayes method for estimating highdimensional copulas with discrete, or discrete and continuous, margins. The method is based on a variational approximation to a tractable augmented posterior, and is substantially faster than previous likelihoodbased approaches. We use it to estimate drawable vine copulas for univariate and multivariate Markov ordinal and mixed time series. These have dimension , where is the number of observations and is the number of series, and are difficult to estimate using previous methods. The vine paircopulas are carefully selected to allow for heteroskedasticity, which is a common feature of ordinal time series data. When combined with flexible margins, the resulting time series models also allow for other common features of ordinal data, such as zero inflation, multiple modes and under or overdispersion. Using data on homicides in New South Wales, and also U.S bankruptcies, we illustrate both the flexibility of the time series copula models, and the efficacy of the variational Bayes estimator for copulas of up to 792 dimensions and 60 parameters. This far exceeds the size and complexity of copula models for discrete data that can be estimated using previous methods.
Key Words: Augmented posterior; Drawable vines; Heteroskedasticity; Mixture copula; Ordinal time series; Sparse Gaussian variational approximation; Stochastic gradient ascent.
1 Introduction
Copula models for discretevalued variables are difficult to estimate because their likelihood involves evaluations of the copula function, so that it is computationally intractable for even moderate dimensions. To avoid this problem, Pitt et al. (2006) and Smith and Khaled (2012) propose using Markov chain Monte Carlo (MCMC) data augmentation, where a tractable augmented likelihood is employed instead. However, this approach becomes slow for copulas with higher dimensions and sample sizes. Gunawan et al. (2016) were the first to suggest using a variational Bayes (VB) estimator as an alternative. Their method — based on that in Tran et al. (2017) and labeled VBIL — uses an unbiased estimate to the intractable likelihood computed using importance sampling. However, as the copula dimension and the number of copula parameters increase, computing the unbiased estimate of the likelihood via importance sampling also makes this method computationally infeasible.
In this paper, we propose a new VB estimator for copulas with a substantially higher dimension and number of parameters than can be estimated by either MCMC data augmentation or VBIL. It uses a variational approximation to the tractable augmented likelihood of Smith and Khaled (2012), instead of the intractable likelihood. We use the new VB method to estimate multivariate times series copula models for ordinal, or a combination of continuous and ordinal (ie. ‘mixed’), time series variables. The models combine arbitrary timeinvariant margins with a copula that captures serial and crosssectional dependence jointly (Beare and Seo, 2015, Smith, 2015). These copulas are challenging to estimate because they have dimension , where is the number of time series observations and is the number of series. We first show that our method is accurate when compared to the exact posterior computed using (much slower) MCMC data augmentation for some univariate () ordinalvalued time series. We then employ it to estimate multivariate times series copulas, where neither data augmentation nor VBIL can be used in practice.
Ordinalvalued time series data arise in many fields, such as criminology (Mohler et al., 2013), marketing (Ravishanker et al., 2016) and finance (Bien et al., 2011, Aktekin et al., 2013). These series often exhibit over or underdispersion, multiple modes, truncation and zeroinflation in the margin, along with serial correlation in the level and also conditional variance (ie. heteroskedasticity). There is an extensive literature on models that can capture one or more of these features in univariate series; see Zeger (1988), Harvey and Fernandes (1989) and Davis et al. (2016) for some examples. In contrast, extension to multiple series is difficult and less common; for examples, see Quoreshi (2006), Heinen and Rengifo (2007), Pedeli and Karlis (2011) and Aktekin et al. (2017). In comparison, time series copula models allow for over or underdispersion, multiple modes, truncation and zero inflation in a time series through the choice of an arbitrary margin. They also extend readily to multiple ordinal or mixed time series. However, a major challenge is selecting a highdimensional copula that can capture both persistence in the mean and heteroskedasticity parsimoniously. To do so, we employ a drawable vine (Dvine) (Aas et al., 2009), which Beare and Seo (2015) and Smith (2015) show is parsimonious for Markov and stationary multivariate time series. Following (LoaizaMaya et al., 2017), the component ‘paircopulas’ are carefully selected to capture heteroskedasticity in the series. This is important because it is a feature exhibited by most ordinal time series. Last, we note that lowdimensional copulas have been used previously to capture crosssectional dependence between multiple ordinalvalued time series (Heinen and Rengifo, 2007). However, this is very different from what we propose here, where we employ intrinsically highdimensional copulas to capture both serial and crosssectional dependence jointly.
Two empirical examples illustrate both the efficacy of our new VB estimator, and the proposed copula time series model. The first example is monthly counts of the three crimes of murder, attempted murder and manslaughter in the Australian state of New South Wales. Univariate time series copula models show that both counts of murder and attempted murder exhibit strong serial dependence, along with heteroskedasticity. When compared to the (effectively exact) posterior computed using MCMC data augmentation, the VB estimates prove accurate, yet are much faster to compute. The VB estimator is then applied to a trivariate time series copula model of the three crime count series. The 792dimensional copula captures a rich multivariate serial dependence structure, and is difficult to estimate using MCMC data augmentation in reasonable time. The second example illustrates the mixed margin case, where we estimate the bivariate serial dependence structure of monthly counts of U.S. bankruptcies and the continuousvalued volatility index (VIX). The marginal distribution of the VIX is highly irregular, making the copula model an attractive approach because it can be modeled nonparametrically. The estimated 658dimensional copula captures heteroskedasticity in both series, and indicates that the VIX is a leading indicator of U.S. bankruptcies. These copula models cannot be estimated using the VB method of Gunawan et al. (2016) in reasonable time.
The paper outline is as follows. Section 2 outlines copula models for discrete data, and the Dvine copula for univariate time series. Section 3 presents our new VB method, including the variational approximation to the augmented posterior, a comparison with VBIL and MCMC data augmentation, and the univariate homicide examples. Section 4 extends the vine copula model to the case of multiple ordinal time series, and Section 5 to mixed series. Section 6 concludes.
2 Copula Model
2.1 Copula with Discrete Margins
Following Sklar (1959), the joint distribution function of a discretevalued random vector can be written as
(1) 
where , , , is the marginal distribution function of , and is a dimensional copula function that captures all dependence in . In the copula modeling literature it is usual to select a parametric copula for , with parameter vector . Because for a finite or countably infinite set , then at Equation (1) is only uniquely defined on its sample space (Genest and Nešlehová, 2007). Nevertheless, remains welldefined for any given parametric copula function . Let , and be the lefthand limit of at , then the corresponding probability mass function is
(2) 
where the difference notation of (Nelsen, 2006, p. 43) is employed with vector of differencing variables . Direct computation of Equation (2) is impractical as it involves evaluation of a total of times. However, following Smith and Khaled (2012), likelihoodbased estimation can be undertaken by considering the augmented density
(3) 
with copula density , and the indicator variable if is true, and otherwise. The margin in of Equation (3) is the required mass function at Equation (2).
When there are multiple independent observations on , as with most crosssectional or longitudinal datasets (Smith et al., 2010, Nikoloulopoulos, 2017), then the augmented likelihood is the product of augmented density terms over the observations. For the time series case that is the focus of this paper, the augmented likelihood is given directly by Equation 3.
2.2 Time Series Copula
We consider the case where is a strongly stationarity ordinalvalued stochastic process with Markov order . Then is time invariant and can be written as , and the main challenge in using the copula model at Equation (1) is the selection of to capture the serial dependence in the series. We note that ordinal time series usually exhibit persistence in both the mean and variance, so that should capture this feature. To do so, we adopt a Dvine copula (Aas et al., 2009) with paircopula components carefully selected to capture persistence in the first two moments.
In general, a Dvine copula density is equal to the product of bivariate copula densities called ‘paircopulas’. However, when the series has Markov order , the number of paircopulas is much smaller. Moreover, when the series is also stationary, the number of unique paircopulas is equal to the Markov order (Beare and Seo, 2015, Smith, 2015). For , by denoting , and , this parsimonious Dvine copula density is
(4)  
where and are the paircopula densities. Given , the arguments are computed using a recursive algorithm given in Smith (2015).
LoaizaMaya et al. (2017) show that is able to capture persistence in the variance if one or more allows for concentration of the probability mass in the four quadrants of the unit square. To do so they suggest the following mixture of rotated copulas:
(5) 
Here, , is a weight, and are two parametric bivariate copula densities with nonnegative Kendall’s tau and parameters and respectively. In our empirical work, for the mixture components and we employ the ‘convex Gumbel’ defined as follows. Let be the density of a Gumbel copula parameterized (uniquely) in terms of its Kendall tau value . Then the convex Gumbel has a density equal to the convex combination of that of the Gumbel and it’s rotation 180 degrees (ie. the survival copula), so that
with . This copula was suggested by Junker and May (2005). When employed for and in Equation (5), it gives a five parameter bivariate copula with , , and a density that is equal to a mixture of all four 90 degree rotations of the Gumbel copula. We use independent uniform priors on the elements of in our empirical work.
To measure the level of serial dependence captured by our copula model, we use the Spearman’s correlation between and for . Following Genest and Nešlehová (2007), for ordinalvalued variables this is
where is the copula function of the distribution of , which varies with when is stationary (Smith, 2015). This copula is constructed by simulating (many) draws of from , and then constructing the bivariate empirical copula from the draws of the two relevant elements of .
3 Bayesian Estimation
From Equation (3), the augmented posterior density is
(6) 
where is the prior. This admits the posterior as one of its margins and Smith and Khaled (2012) propose a MCMC data augmentation method for its evaluation. Despite being exact, this data augmentation approach is generally slow, and computationally infeasible for highdimensional copulas. Variational Bayes (VB) is an alternative inferential method to MCMC, with both methods typically applicable to the same problems. Here, we use the augmented posterior to develop a new VB estimator for .
3.1 Variational Bayes Estimator
VB makes possible the estimation of copula models with discrete margins, even for copulas in high dimensions and with a large number of parameters. Here, is approximated by a tractable density with parameters , called the variational approximation. Estimation consists of finding values of that minimize the KullbackLeibler divergence
This can be shown (Ormerod and Wand, 2010) to correspond to maximizing the lower bound of the logarithm of the marginal likelihood , given by
In selecting , it is common to assume independence between some or all parameters (McGrory and Titterington, 2007, Wand et al., 2011), and we do so here between between and . The variational approximation we use is
(7) 
where , is a tractable dimensional copula density with parameters , and is a tractable density with parameters . We specify and in greater detail later.
We follow much of the literature and use stochastic gradient ascent (SGA) methods to maximize . This approach only requires that (i) generation from is possible, and that (ii) the target distribution is tractable and can be evaluated up to proportionality. Condition (i) is met by our choices for and outlined below. Condition (ii) is met because the augmented posterior is tractable, whereas based on Equation (2) is not. To implement SGA, initial values for the parameters, , are selected and then the lower bound is sequentially optimized by values obtained by the updating formula
Here, is an unbiased estimate of the lower bound’s gradient , and is the learning rate, set using the ADADELTA method described in the Appendix A. To compute the gradient, SGA methods resort to the “logderivative trick” (), and show that
(8) 
with , and is the expectation with respect to . Notice from Equation (8) that an unbiased estimate is , where
with as the number of elements in . An advantage of our choice of variational approximation is that th element of the gradient simplifies to . For , the values and constrained to .
Algorithm 1 presents how the SGA optimization works within variational Bayes. Step (1b) is based on the work by Tran et al. (2017), which employs a vector of control variates, , for variance reduction of the unbiased estimate of the gradient. The stopping rule is commonly set as a fixed number of SGA steps taken (Ong et al., 2017).


Generate for

Estimate with
Cov(.) and Var(.) are sample estimates of covariance and variance based on the S samples from step (a).

.


Repeat until some stopping rule is satisfied

Generate for

Compute with

Estimate as in Step 1(b).
Compute using the ADADELTA method.
Set .
3.2 Variational Approximation
While a number of different choices for and are possible in Equation (7), we outline here specific choices that we find work well when estimating our copula model. Denoting the number of parameters in as , the most popular choice for is the density of a distribution, because it is quick to generate from and the gradient is available in closed form (Opper and Archambeau 2009, Challis and Barber 2013, Titsias and LázaroGredilla 2014, Kucukelbir et al. 2016, Salimans et al. 2013). To ensure is positive definite, is typically a convenient reparametrization of and . In applications where has a large number of elements, a sparse representation of helps to improve the accuracy of the gradient estimate and its speed of computation. We follow Ong et al. (2017), who suggest the factor representation of the covariance matrix , where the matrix is of dimension , is the number of factors and . All the elements in the upper triangle of are set to zero. is a diagonal matrix such that , where is the element of the vector .
One advantage of a factor decomposition is that the gradients required to implement Steps 1(b) and 2(b) can be computed in closed form. To succinctly present these the following notation is introduced. For a matrix of dimension , the function is defined as with for . Also, the vector of diagonal entries of the square matrix is written as . Employing this notation, the vector of parameters can be written as with , and the gradient
with
Fast calculation of these gradients can be undertaken using the Woodbury forumla; see Ong et al. (2017) for further details.
In our empirical work we select the independence copula density , so that . This is a convenient choice that further reduces the number of computations required to implement Algorithm 1, because generation of in Steps 1(a) and 2(a) consists of generating only independent uniforms. While this choice for is likely to result in a poor variational approximation to the marginal posterior , this is not a concern because the latent variables are introduced only to make the SGA efficient. We find in our empirical work that is an accurate approximation to .
Last, we make two additional observations. First, the VBIL estimator of Gunawan et al. (2016) uses an unbiased estimator of the intractable likelihood in Equation (2) computed using importance sampling. This involves drawing values of , at which is repeatedly evaluated. Whenever evaluating the copula density is computationally intensive — such as for here or with other highdimensional or complex copulas — then this approach will be slow. In comparison, our approach does not involve computing an approximation to the likelihood, so that it is many times faster for these cases. Second, we employ the variational approximation at Equation (7) because of its computational tractability. Alternative approximations where or , increase the computations required to maximize the lower bound by SGA greatly.
3.3 Data Augmentation
We now outline MCMC data augmentation, tailored for the parsimonious Dvine copula in Section 2.2. Key to implementation is the evaluation of the conditional densities and distribution functions below. If and , then
where is the conditional paircopula function. This is given in Appendix C1 of LoaizaMaya et al. (2017) for the mixture copula defined at Equation (5).
The values are integrated out of the augmented posterior as part of an MCMC sampling scheme. The scheme generates from the conditional posteriors (1) , and (2) . Given the values , step (2) can be undertaken using (adaptive) random walk MetropolisHastings (MH), where is generated conditional on for . However, step (1) is more involved, with the latent variables generated jointly using a MH step with proposal density , where and
Therefore, a proposal iterate can be obtained from by generating sequentially from the univariate densities . Each of these is a constrained univariate distribution with known distribution function, so that iterates can be generated easily using the inverse distribution method. An advantage of the proposal is that the MH acceptance ratio is fast to compute. The probability of accepting over the previous value is
In all our empirical work we employ a burnin sample of 10,000 iterates, followed by a further 20,000 iterates from which we compute posterior inference.
3.4 Examples: New South Wales Homicide
We illustrate the copula time series model, and the efficacy of the VB estimator, using three examples. These are monthly counts of the crimes of Murder, Attempted Murder and Manslaughter in the Australian state of New South Wales (NSW) between January 1995 and December 2016. The data is sourced from the NSW Bureau of Crime Statistics and Research, and are counts of crimes in the broader category of Homicide. Figure 1 gives both relative frequency histograms and time series plots of the observations of the three series.
We set , and fit the copula using paircopula components for , and . For and we chose convex Gumbels, so that for . This dimensional Dvine copula has a total of parameters. We set to the empirical distribution functions in Figure 1(b,d,f). To estimate the copula parameters we first use data augmentation to compute the exact posterior as outlined in Section 3.3. Table 1 reports the posterior means and intervals of the copula parameters. To summarize the serial dependence captured by the copula, Table 2 reports the posterior of the Spearman correlations for for the three crimes. Correlation is strong for Attempted Murder, but not for Murder and Manslaughter. However, this measures correlation in the level of the series, and not more general dependence, such as in higher order moments. Figure 2 presents the logdensities of the paircopulas at the posterior mean values. Note that most of these copula densities are far from uniform, which indicates more general serial dependence exists in these series. Most of the paircopulas have probability mass in the offdiagonal corners of the unit square, which LoaizaMaya et al. (2017) show is indicative of serial correlation in conditional variance (ie. heteroskedasticity).
This DVine was also estimated using the VB method proposed in Section 3.1. Because all parameters are bounded between 0 and 1, we transform them to the real line as , where . Estimation was implemented separately for factors. To compute the gradient estimate we set , and 5000 VB steps are used when . The estimates of and when were used as the initial values for the remaining cases when , for which an 3000 additional VB steps were used. Figure 3(ac) shows how the lower bound increases with , and any increase in the lower bound is small for . Figure 3(df) plots the lower bound against VB step when , suggesting that the SGA algorithm converges within 1000 steps.
To illustrate the accuracy of the VB approximation, Figure 4 plots the posterior means and standard deviations of from the VB approximation against their (effectively exact) values computed via MCMC. Both moments of the VB approximations are close to those of the true posterior. The first three rows in Table 3 present the copula specifications and total estimation times for the MCMC and VB methods for the three univariate examples. The computations were undertaken on a Dell Precision workstation using Matlab, and in parallel for key computations for both estimators. The results show that the VB approach is considerably faster than MCMC data augmentation. Moreover, the main computation of the VB estimator is the repeated evaluation of at Step 2(b). This is slow because computing the arguments of the paircopulas is computationally intensive, and the VB method proves even faster for simpler copulas.
4 Multivariate Ordinal Time Series
In this section we extend the time series copula to capture the dependence in multiple ordinalvalued series.
4.1 Copula Model and Estimation
Consider an dimensional stationary stochastic process , where , and each element is ordinalvalued with margin . Then if and , the augmented likelihood is
(9) 
where and . The copula density in Equation (9) is of dimension , and captures both crosssectional and serial dependence jointly. For this, Biller and Nelson (2003), Biller (2009) and Smith and Vahey (2016) use a Gaussian copula, with parameter matrix equal to the correlation matrix of a stationary vector autoregression. However, the Gaussian copula cannot capture the high level of persistence in the variance often exhibited in ordinal time series. Instead, we follow Beare and Seo (2015), Brechmann and Czado (2015) and Smith (2015) and again use a Dvine copula, but with a parsimonious form corresponding to a stationary Markov multivariate series. The paircopula components are of the form at Equation (5) to account for heteroskedasticity.
This Dvine has density
(10) 
The functionals are each products of blocks of paircopula densities, and do not vary with for stationary series. They are defined as
where is a bivariate paircopula density with parameters . When , there are of these associated with , and they collectively capture crosssectional dependence between the variables. For example, if they were each equal to the bivariate independence copula with density , then and the variables would be independent contemporaneously. When , there are paircopulas associated with block that capture serial dependence at lag . In total, there are unique paircopulas, which is much less than the in an unconstrained Dvine. The indices of the paircopula arguments are and , and the argument values are computed using the Algorithm 1 of LoaizaMaya et al. (2017). Last, we note that if , then and , so that with the notation , the copula densities at Equations (4) and (10) are the same.
To measure the dependence between and , with , we use the Spearman’s correlation
(11) 
Here, is the probability mass function corresponding to , while is the copula function of the bivariate marginal of . The latter is computed by simulating from and then constructing the empirical copula function for .
The augmented posterior of this copula time series model is
(12) 
Because of the very large number of elements in , estimation using data augmentation is computationally infeasible for even moderate values of and . However, our VB estimator can be employed with variational approximation
where is a dimensional copula density. Again, we assume , so that at Steps 1(a) and 2(a) of Algorithm 1, we generate from uniform distributions on for , , and .
4.2 Example: New South Wales Homicide
We consider a trivariate time series copula model for the NSW monthly Homicide counts, with the empirical distributions as univariate marginals. The copula density is given Equation (10), where we set and adopt paircopula densities of the form . The dimension of the Dvine copula is , and Table 3 reports its specification. The copula parameters are estimated using the VB estimator with factors. To estimate the gradient in Algorithm 1, and 5000 VB steps used with . Again, the estimates for and when , were the initial values for the remaining factor specifications, which employed additional VB steps. Figure 5 plots the variational lower bound against in panel (a), and against the VB step when in panel (b). A total of factors appears sufficient for . Table 4 reports the posterior means and intervals of the copula parameters.
Table 5 reports the estimates of the pairwise Spearman correlations. The contemporaneous correlations () are given in the top two rows, and first order serial correlations () in the bottom rows. There is positive contemporaneous correlation between Attempted Murder and Murder, and also (weakly) with Manslaughter. There is first order serial correlation in Attempted Murder, but not in the other two crimes. The most striking result is that Attempted Murder is positively correlated with Murder and Manslaughter one month later, suggesting it is a leading indicator of these two forms of homicide. However, these correlations measure dependence in the level only. Figure 6 displays the logarithm of the 15 unique paircopula densities. Most have mass in the offdiagonal corners of the unit square, indicating that the copula is capturing heteroskedasticity and ‘variance spillovers’ between the three series.
5 Mixed Multivariate Time Series
5.1 Copula Model and Estimation
Consider the case of a stochastic process , where consists of ordinal and continuous valued variables, which we refer to as ‘mixed’. Without loss of generality, assume that the first elements are ordinal, and that , with and . Similarly, we set with and . Then the likelihood augmented with is
where , and . From this, the augmented posterior is
(13) 
Note that the augmented posterior only includes latent variables , whereas are constants for . We employ the parsimonious DVine outlined in Section 4 for .
Smith and Khaled (2012) discuss how to implement data augmentation for copulas with a mixture of discrete and continuous margins. However, this can be slow or computationally infeasible for values of that occur frequently in time series analysis. To employ the VB estimator here, we adopt the variational approximation
to the augmented posterior above, where as before. Algorithm 1 can be employed with replaced by , and where the elements of are generated at Steps 1(a) and 2(a), whereas are not because they are constants.
Equation (4.1) can be used to compute the Spearman correlation between two ordinalvalued variables with and . If both variables are continuousvalued, then . But if is ordinal and is continuous, then
where the integral can be computed numerically.
5.2 Example: Bankruptcy and the VIX
We study the dependence between the continuousvalued VIX index, which measures U.S. market volatility, and the number of public company bankruptcy cases filed in U.S. courts. We employ the monthly average value of the VIX obtained from the FRED website, while the bankruptcies are monthly counts sourced from the UCLALoPucki Bankruptcy Research Database. The time series are from December 1989 to April 2017, so that . A positive relationship between market volatility and bankruptcies has been documented previously (Bauer and Agarwal, 2014). Figure 7(a) plots both series, while Figure 7(b) displays the empirical distribution of the VIX, conditional on the number of bankruptcies. The positive correlation between the two series is apparent in both panels.
We employ the DVine copula model with and a total number of parameters; Table 3 reports the copula specification. Separate variational approximations with factor decompositions for were estimated. The same values for and number of VB steps were adopted as in Section 4.2. Figure 8(a) plots the lower bound against , and it varies little for . This is consistent with the empirical results in Ong et al. (2017), who found that a higher number of factors are needed for the accurate approximation of more complex posteriors.
The copula parameter estimates are reported in the Online Appendix, while Table 6 reports the pairwise Spearman correlations. Both the number of bankruptcies and the VIX exhibit serial correlation, although the latter more so. The two series are positively correlated, both contemporaneously and in the lagged values. However, the lagged values of the VIX are more highly correlated with later bankruptcies, suggesting that the VIX is a leading indicator of public company bankruptcy filings. The fitted paircopulas densities are plotted in the Online Appendix, and their form is consistent with heteroskedastic time series. For example, and have mass concentrated in all four corners, indicating positive crosscorrelation in the variance of the two series at different lags; ie. volatility ‘spillover’.
6 Discussion
This paper makes two main contributions. The first is to propose a new VB estimator for copula models with discrete, or a combination of discrete and continuous, variables. The approach can be used to estimate copulas with a higher dimension and number of parameters than previous methods. We illustrate this using time series copulas of up to 792 dimensions and 60 parameters, although the method can be used to estimate copula models for crosssectional or longitudinal data just as readily. The second main contribution of the paper is to propose a new time series model for multivariate ordinalvalued variables, where a copula captures serial and crosssectional dependence jointly. Our proposed copula is a parsimonious Dvine that can capture serial dependence in both the level and conditional variance — with the latter being an important feature in much ordinalvalued data. The time series model is highly flexible, where any marginal features in a time series can be captured by an arbitrary distribution, and is easily extended to mixed series.
In our VB approach, a key observation is that it is computationally advantageous to employ a variational approximation to the augmented posterior , rather than the intractable posterior . This is consistent with Tan and Nott (2017) and Ong et al. (2017), who find that for mixed effects generalized linear models, using a variational approximation to the posterior augmented with the random effects values can also be computationally efficient. In selecting our variational approximation , we follow Ong et al. (2017) and employ a Gaussian approximation with factor covariance structure for at Equation (7). But for we suggest an approximating density of the form . Possible choices for include the Gaussian or t copula densities, although we find that the independence copula with density works well for the time series copula models estimated here, and greatly reduces the computations in the SGA algorithm.
Ordinalvalued time series frequently exhibit both serial dependence in the level and heteroskedasticity. Few existing copulas can capture both jointly, yet the Dvine used here can do so for Markov and stationary series. Smith et al. (2010) use a vine copula to model serial dependence in longitudinal data, although the copula was only 16dimensional and homoskedastic. As far as we are aware, this is the first paper to use highdimensional copulas to capture the serial dependence in ordinal time series data. An advantage of such copula time series models is that they allow for the more accurate modeling of data with multimodal and irregular margins, as well as being readily extended to multivariate series. However, their estimation is computationally challenging, and our proposed VB approach provides a solution that can work well, as illustrated by our examples.
Appendix A Adadelta
The learning rate can be set using different methods. For example Tran et al. (2017) set it as a sequence with manually tuned parameters. Ong et al. (2016) propose an adaptive learning rate based on previous work by Ranganath et al. (2013). Here, we employ the ADADELTA method because it provides reliable convergence of the SGA algorithms (Zeiler, 2012, Ong et al., 2017). This method consists of individually updating the step size for each element in as
with , denoting the element of and is given by