Embarrassingly Parallel Sequential Markov-chain Monte Carlo for Large Sets of Time Series
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines “divide and conquer” ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
Key words: Big Data, Panel of Time Series, Parallel Monte Carlo, Sequential Markov-Chain Monte Carlo.
There is little doubt that one of the main challenges brought on by the advent of Big Data in Bayesian statistics is to develop Markov chain Monte Carlo (MCMC) algorithms for sampling a posterior distribution derived from a very large sample. While MCMC has become the default tool to study posterior distributions when they are not available in closed form, many commonly used sampling algorithms, e.g. the Metropolis-Hastings samplers, can become computationally prohibitive when a large number of likelihood calculations are needed at each iteration.
In recent years we have witnessed a large research effort devoted to dividing the MCMC computational load among a number of available processors and recombining the results with as little loss in statistical efficiency as possible. For instance, the approaches developed in Scott et al. (2013) and Neiswanger et al. (2014) divide the available data in smaller batches and sample the resulting partial posteriors obtained from each batch of data. They propose different methods to combine the resulting draws so that the efficiency of the resulting Monte Carlo estimator is close to the one that would have been obtained if the full data posterior had been sampled. In the consensus MCMC of Scott et al. (2013) this is achieved via reweighting the partial posterior samples, while the embarrassingly parallel approach of Neiswanger et al. (2014) relies on kernel density approximations of the partial posteriors to produce an approximation of the full one. In Wang and Dunson (2013) the authors propose a refined recombination strategy based on the Weierstrass transformation of all partial posteriors.
While dividing the whole data into batches can be done easily when the data are independent, one must proceed cautiously when the data exhibit long range dependencies, as is the case in time series. In such cases, simply splitting time series into blocks can lead to poor estimates of the parameters. Instead, one can sometimes bypass the computational load by sequentially updating the posterior over time (see, for instance, Dunson and Yang, 2013).
Sequential sampling may be improved when combined with parallel and possibly interacting Monte Carlo methods that were used elsewhere, e.g. for parallel adaptive MCMC (Craiu et al., 2009), for interacting MTM (Casarin et al., 2013), for population Monte Carlo (Cappe et al., 2004), for Sequential Monte Carlo (Del Moral and Miclo, 2000; Del Moral, 2004) and for massively parallel computing (Lee et al., 2010) .
Sequential estimation is useful in many applied contexts such as on-line inference of econometric models for both out-of-sample and in-sample analyses. However, sequential estimation is a challenging issue in Bayesian analysis due to the computational cost of the numerical procedures for posterior approximation. Moreover, the computational cost rapidly increases with the dimension of the model and the inferential task becomes impossible. In this sense our paper contributes to the recent stream of the literature on the use of Central Processing Unit (CPU) and Graphics Processing Unit (GPU) parallel computing in econometrics (e.g., see Doornik et al. (2002), Swann (2002), Creel (2005), Creel and Goffe (2008), Suchard et al. (2010)).
Our contribution here is to consider possible ways to combine strategies following the work of Neiswanger et al. (2014) with sequential MCMC in order to address difficulties that appear when studying large panel time series data models. Analyses of panel time series data appear frequently in the econometrics literature as discussed in the review papers of Canova and Ciccarelli (2013) and Hsiao (2015). Moreover, the use of latent variables for time series panel models in combination with a Bayesian inference approach (Kaufmann, 2010, 2015; Billio et al., 2015) can be quite challenging, due to the computational burden required for the latent variable estimation. These challenges motivate this work in which the Bayesian stochastic volatility model recently proposed in Windle and Carvalho (2014) and discussed in Casarin (2014) is adapted to the context of large panel of time series.
In Sections 2 and 3 we introduce, respectively, the issues related to sequential sampling in models with latent variables and present our algorithm. Section 4 contains the simulation studies and the real data analysis. The paper closes with conclusions and future directions.
2 Posterior Distribution Factorization
Consider a time series sample , with probability density function (pdf) where is a parameter vector. Of interest is sampling from the posterior distribution of . We are considering here the case in which the latter task is made easier if a data augmentation approach is adopted. We introduce auxiliary variables , that exhibit Markovian serial dependence, i.e. each has pdf . If is the conditional pdf of given and , then for prior distribution we obtain the joint posterior of the data and the latent variables as
where is the normalizing constant of , and .
Henceforth, we assume that the time series data has panel structure such that if we consider all the data collected up to time , , then it is possible to partition them into blocks of size each,
where the th block contains the measurements up to time for the components of (for notational simplicity we set but other allocations are possible), i.e. for all
Each set of the partition contains the th panel of dependent components of . An important assumption of the model is that, conditional on a parameter value , the components in two partition sets are independent, i.e.
for any .
Corresponding to the partition (1) there is an equivalent partition of the auxiliary variables
where the components of correspond to the components of included in . A second crucial assumption for the validity of our algorithm is the independence of the auxiliary variables contained in two elements of the partition (3), i.e.
for all . Finally, we assume that depends only on those auxiliary variables included in , i.e. given the latter we have
These assumptions are not unusual in the context of dynamic panel data models as they are used for theoretical derivations in Anderson and Hsiao (1982), Blundell and Bond (1998) and Boneva et al. (2015) as well as in applications (see, for instance, Abrevaya and Shen, 2014; Kaufmann, 2010, 2015; Billio et al., 2015).
The basic principle underlying our approach is to learn sequentially over time using different cross-sectional data blocks. Our algorithm relies on samples from the the joint posterior distribution of and , , conditional on the sub-sample
where , and is the -th block normalizing constant.
3 Embarrassingly Parallel SMCMC
So far we have discussed the factorization of the posterior distribution based on the panel-type structure of the data. In this section we show how the algorithm handles the serial dependence in the data and samples from for each using a sequential MCMC strategy. For expository purposes we assume that the parameter estimates are updated every time a new observation become available, but in the application we will update the estimates after every th observation is collected.
Let us define , , the time sequence of augmented parameter vectors with non-decreasing dimension , . In order to take advantage of the partition described in the previous section we also introduce , a parameter vector of dimension . Since the augmented parameter vector can then be partitioned as and for all , the correctness of the algorithms relies on the compatibility condition for the priors on at each time and for each th block of data. Specifically, we assume that if the prior is , where , then it satisfies the compatibility condition
Our embarrassing SMCMC algorithm iterates over time and data blocks. At each time , the algorithm consists of two steps. In the first step, for each data block , we use parallel SMCMC chains, each of which yields samples from , i.e. we generate , , and from . Based on all samples , , and we produce the kernel density estimates of the marginal sub-posteriors .
In the second step we take advantage of the factorization (7) and the posterior is approximated by combining the approximate sub-posteriors , . Samples from this distribution can be obtained by applying the asymptotically exact posterior sampling procedure detailed in Algorithm 1 of Neiswanger et al. (2014). It is worth noting that (7) holds for any so, if needed, one can approximate at intermediate times by combining the ’s. However, is not used directly in the final posterior so if one is interested only in the latter then the merging of partial posteriors is performed only at time .
The pseudocode of the proposed EP-SMCMC is given in Algorithm 1 and the details of the SMCMC and of the merge step are detailed in the following sections.
Embarrassingly Parallel SMCMC (EP-SMCMC)
For draw , , and from by using the SMCMC transition.
When needed (usually at time ):
Compute the kernel density estimate of the marginal sub-posteriors by using the samples , , and .
approximate the posterior by combining the approximate sub-posteriors , .
3.1 Sequential MCMC
In this section we discuss the construction of the SMCMC samplers that are used in the first step of the EP-SMCMC. To simplify the notation we drop the index indicative of the data block. In the SMCMC algorithm a population of parallel inhomogeneous Markov chains are used to generate the samples with , and from the sequence of posterior distributions , . Each Markov chain of the population is defined by a sequence of transition kernels , , that are operators from to , such that is a probability measure for all , and is measurable for all .
The kernel has as stationary distribution and results from the composition of a jumping kernel, and a transition kernel, , that is
where the fixed dimension transition is defined as
with , and is the identity kernel. We assume that the jumping kernel satisfies
where and . This condition ensures that the error propagation through the jumping kernel can be controlled over the SMCMC iterations.
In order to apply the SMCMC one need to specify the transition kernel and the jumping kernel at the iteration . The transition kernel at the iteration allows each parallel chain to explore the sample space of a given dimension, , and to generate samples , from the posterior distribution . The jumping kernel at the iteration , allows the chains to go from a space of dimension to one of dimension .
3.2 Merge step
The merge step relies on the following approximation the posterior distribution
where with and , , and is a Gaussian kernel with bandwidth parameter . Following the embarrassing MCMC approach the posterior distribution can be written as
where , and
3.3 Parameter tuning
As suggested by Dunson and Yang (2013), the number of iterations of the Sequential MCMC sampler at each time are chosen accordingly to the correlation across the parallel chains. We let the number of iterations at the iteration , , be the smallest integer such that , where is the rate function associated with the transition kernel and is a given threshold level. An upper bound for the rate function is provided by chain autocorrelation function at the -th lag. It can be estimated sequentially using the output of all the parallel chains: where
with the -th element of the vector of the parameters and the latent states generated up to time by the –th chain at the the –th iteration. is the average of the draws over the parallel chains.
4 Numerical Experiments
4.1 Simulated data
We consider a time series model in which is the realized variance for the -th log-return series at time , and is the latent stochastic volatility process. In order to capture the time variations in the volatility of the series we consider the exponential family state space model for positive observations recently proposed in Windle and Carvalho (2014) and extend it to the context of panel data. The model can be mathematically described then as:
for , where denotes the gamma distribution and the beta distribution of the first type.
We generate 1000 time series of 1000 observations each, and obtain a dataset of 1 million observations. In Fig. 1 we illustrate such a simulated dataset. Inference for a nonlinear latent variable model with this large a sample via MCMC sampling can be computationally challenging. In the simulation we set , and , with initial condition , . For each series we have generated 3,000 realizations, but the initial 2000 samples were discarded so that dependence on initial conditions can be considered negligible.
We aim to estimate the common parameters , and and assume a uniform prior distribution for , i.e. and proper vague prior distributions for and , that is and truncated on the region .
In order to apply the EP-SMCMC algorithm we assume that the series are split into blocks of series each. The updates are performed every observations, so the total number of updates is . Let us denote with the column vector a collection of variables with . We define the -th block of series and the -th block of latent variables as the -matrices and , respectively, with , and . Then, the complete-data likelihood function at time for the -th block is
where . Then the sub-sample posterior, based on the complete-data likelihood function of the -th block is
At time , and for the -th block, the -th SMCMC parallel chain has the transition kernel of a Gibbs sampler which iterates over the following steps:
with , and is a -dim matrix where the -th row elements drawn from the jumping kernel at time . At time , as a new observation become available, the dimension of the state space for the -th block SMCMC chains increase from to . We choose as jumping kernel of the -th parallel chain to be the transition kernel of a Gibbs sampler with the following full conditional distribution
where . The details of the sampling procedures for the three steps are given in Appendix A.
In the simulation example we compare our EP-SMCMC with a MCMC repeated sequentially over time and a Sequential Monte Carlo (SMC) with unknown parameters. For the EP-SMCMC we used parallel chains and a number of iterations close to 50, on average. For the MCMC we use a multi-move Gibbs sampler where the latent states are sampled in one block by applying a forward-filtering backward sampling (FFBS) procedure. The filtering and smoothing relationships are given in Appendix C. In order to update the parameters we have used a Metropolis-Hastings step. The MCMC chain of our multi-move sampler is mixing quite well due to the FFBS and also the Metropolis step has acceptance rates about 0.3 which is a good rate for many models as argued in Roberts and Rosenthal (2001).
In our MCMC analysis we considered two cases. The first one is based on samples of 1,000 iterations after a burn-in phase of 500 iterations in order to have a computational complexity similar to the one of the EP-SMCMC. The second one is based on samples of 25,000 iterations after a burn-in phase of 20,000 iterations and is used to have reliable MCMC estimates of the parameter posterior distribution based on the whole sample.
For the SMC we also consider the regularized auxiliary particle filter (RAFP) combined with the embarrassingly parallel algorithm to obtain a EP-RAPF. In Appendix B we present the computational details of running the r-APF for our stochastic volatility model. We refer to Liu and West (2001) for the definition of regularized particle filter and to Casarin and Marin (2009) for a comparison of different regularized filters for stochastic volatility models.
When we compare the EP-SMCMC, MCMC and EP-RAPF algorithms, we use as a measure of efficiency the mean square errors for the parameters , and after the last time-block of data has been processed. The mean square errors have been estimated using 40 independent runs of each algorithm. The same set of data has been used to reduce the standard error of the MSE estimates. The experiments have been conducted on a cluster multiprocessor system with 4 nodes; each node comprises of four Xeon E5-4610 v2 2.3GHz CPUs, with 8 cores, 256GB ECC PC3-12800R RAM, Ethernet 10Gbit, 20TB hard disk system with Linux. The algorithms have been implemented in Matlab (see The MathWorks, Inc. (2011)) and the parallel computing makes use of the Matlab parallel computing toolbox.
The structure of the EP-SMCMC sampler allows for sequential data acquisition and for parallel estimation based on different cross-sectional blocks. Thus, the MSE obtained after processing the last block is given in Table 1 for different dimensions of the time blocks (rows) and of the cross section blocks (columns). Fig. 2 shows the posterior approximation obtained from one run of the EP-SMCM on the dataset shown in Fig. 1. From our experiments we find that the parameters that are most difficult to estimate are and , whereas has lower MSEs regardless of the choice of block size and type of algorithm. For the EP-SMCMC sampler a larger size of the time blocks reduces the MSE, possibly due to a reduction in the propagation of the approximation error over the iterations. The behaviour of the MSE with respect to the cross-sectional block size is not monotonic. The MSE initially decreases with , but for larger it increases. From our experiments, values of between 20 and 40 yield the best results. The and MSEs for the MCMC are higher then their EP-SMCMC counterparts, likely due to the lack of convergence of the MCMC in the 1,500 iterations per time block. The large MSEs for the EP-RAPF are due to artificial noise introduced by the regularization step for the parameters and also to the hidden state estimation. In the EP-RAPF implementation we have used 1,500 particles and the artificial noise is necessary to avoid degeneration of their weights. Note that within each time block the EP-RAFP is using the filtered states instead of the smoothed states. A combination of the RAFP with a MH step or a particle MCMC would improve the performance of the EP-RAFP algorithm in the estimation of both states and parameters, at a price of increasing computing . We leave the further study of this issue for a future communication.
Also, we compare EP-SMCMC, MCMC and EP-RAPF in terms of computing time. For the EP-SMCMC we consider cross-sectional blocks of size . The computing times are given in Table 2 for different time acquisition rates (rows) and different CPUs working in parallel (columns). We conclude that the EP-SMCMC implemented on cluster multiprocessor system can be up to 80 times faster than the standard MCMC. These results are in line with the ones obtained in previous studies on parallel Monte Carlo methods (Casarin et al. (2015), Lee et al. (2010), Geweke and Durham (2012)). The calculation in all our experiments have been carried out using double precision. If an application allows for a lower degree of precision, then single precision calculation can lead to large gains in computing time as documented by Lee et al. (2010) in the context of GPU parallel computing.
4.2 Real data
We consider a panel of 12,933 assets of the US stock market and collect prices at a daily frequency from 29 December 2000 to 22 August 2014, which yields 3,562 time observations. Then we compute logarithmic returns for all stocks and obtain a dataset of 46,067,346 observations.
In order to control for the liquidity of the assets and consequently for long sequences of zero returns, we impose that each stock has been traded a number of days corresponding to at least 40% of the sample size. Also we focus on the last part of the sample which include observations from 8 February 2013 to 22 August 2014. After cleaning the dataset we obtain 6,799 series and 400 time observations, and the size of the dataset reduces to 2,846,038. The original and the cleaned datasets are give in Fig. 3.
In order to capture the time variations in the volatility of the series we consider the panel state space model presented in the previous section. We apply our EP-SMCMC sampler to the panel of 6,799 time series. The data are acquired sequentially over time in blocks of 100 time observations and each panel consists of series. At each point in time, for each parameter , and we obtain 500 posterior densities from each parallel SMCMC chain (see C.1-C.3 in Appendix C an example for ).
Figure 4 shows the sequential posterior inference after the merge step of the embarrassingly parallel SMCMC algorithm is applied to the output of the SMCMC samples. At each point in time we obtain an approximated posterior density for the whole cross-section from the embarrassingly parallel step (see solid lines in 4 for ).
The approximation of the posterior produced by the EP-SMCMC is close to the approximation based on a MCMC analysis of the full posterior. This can be seen in Fig. 4) where the solid line shows the EP-SMCMC approximation at the last update () and the dashed line represents the full-sample estimate based on a standard MCMC with 25,000 iterations after a burn-in period of 20,000 iterations. The posterior mean and posterior quantiles approximated by EP-SMCMC are given in Tab. 3. In order to approximate the high posterior density (HPD) intervals and the posterior mean we generate samples from the posterior distribution given in Eq. 8 by applying the independent Metropolis within Gibbs algorithm given in Neiswanger et al. (2014).
We propose a new MCMC algorithm which combines embarrassingly parallel MCMC and Sequential MCMC. The algorithm is developed for data that exhibit dependent patterns, in particular for large sets of time series for which a standard MCMC-based analysis would be very slow. Here we take advantage of the independence between the unit-specific observations and latent variables of the panel to partition the data and factor the full posterior in a product of partial posteriors. In the absence of clear independent panel units, an interesting and difficult question concerns alternative strategies to divide the data and combine the partial posterior samples.
It is apparent that the development of novel MCMC algorithms for big data is evolving rapidly. While “divide and conquer” strategies continue to develop, one must devise techniques to handle the additional approximations that are introduced by the current existing methods, including EP-SMCMC. Quantifying and controlling the error introduced by these approximations remains central to the success of MCMC for Big Data.
We thank the Editor, the Associate Editor and three referees for comments and suggestions that have greatly improved the paper. This research used the SCSCF multiprocessor cluster system at University Ca’ Foscari of Venice. Roberto Casarin’s research is supported by funding from the European Union, Seventh Framework Programme FP7/2007-2013 under grant agreement SYRTO-SSH-2012-320270, by the Institut Europlace of Finance, “Systemic Risk grant”, the Global Risk Institute in Financial Services, the Louis Bachelier Institute, “Systemic Risk Research Initiative”, and by the Italian Ministry of Education, University and Research (MIUR) PRIN 2010-11 grant MISURA. Radu Craiu’s research is supported by the Natural Sciences and Engineering Research Council of Canada. Fabrizio Leisen’s research is supported by the European Community’s Seventh Framework Programme FP72007-2013 under grant agreement number 630677.
- Abrevaya and Shen (2014) Abrevaya, J. and Shen, S. (2014). Estimation of censored panel-data models with slope heterogeneity. J. of Applied Econometrics 29 523–548.
- Anderson and Hsiao (1982) Anderson, T. and Hsiao, C. (1982). Formulation and estimation of dynamic models using panel data. Journal of Econometrics 18 47–82.
- Billio et al. (2015) Billio, M., Casarin, R., Ravazzolo, R. and van Dijk, H. K. (2015). Interactions between eurozone and US booms and busts: A Bayesian panel Markov-switching VAR model. Journal of Applied Econometrics forthcoming.
- Blundell and Bond (1998) Blundell, R. and Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87 115–143.
- Boneva et al. (2015) Boneva, L., Linton, O. and Vogt, M. (2015). A semiparametric model for heterogeneous panel data with fixed effects. Journal of Econometrics to appear.
- Canova and Ciccarelli (2013) Canova, F. and Ciccarelli, M. (2013). Panel vector autoregressive models: a survey. In VAR Models in Macroeconomics: New Developments and Applications: Essays in Honor of Christopher A. Sims (L. Thomas, B. Fomby and A. Murphy, eds.).
- Cappe et al. (2004) Cappe, O., Guillin, A., Marin, J.M. and Robert, C.P. (2004). Population Monte Carlo. J. Comput. Graph. Statist. 13 907–929.
- Casarin (2014) Casarin, R. (2014). A comment on a tractable state-space model for symmetric positive-definite matrices. Bayesian Analysis 9 793–804.
- Casarin et al. (2013) Casarin, R., Craiu, R. V. and Leisen, F. (2013). Interacting multiple try algorithms with different proposal distributions. Statistics and Computing 23 185–200.
- Casarin et al. (2015) Casarin, R., Grassi, S., Ravazzolo, F. and van Dijk, H. K. (2015). Parallel sequential Monte Carlo for efficient density combination: the DeCo Matlab toolbox. Jounal of Statistical Software forthcoming.
- Casarin and Marin (2009) Casarin, R. and Marin, J.-M. (2009). Online data processing: Comparison of Bayesian regularized particle filters. Electronic Journal of Statistics 3 239–258.
- Craiu et al. (2009) Craiu, R. V., Rosenthal, J. S. and Yang, C. (2009). Learn from thy neighbor: Parallel-chain adaptive and regional MCMC. Journal of the American Statistical Association 104 1454–1466.
- Creel (2005) Creel, M. (2005). User-Friendly Parallel Computations with Econometric Examples. Computational Economics 26 107 – 128.
- Creel and Goffe (2008) Creel, M. and Goffe, W. L. (2008). Multi-core CPUs, Clusters, and Grid Computing: A Tutorial. Computational Economics 32 353 – 382.
- Del Moral (2004) Del Moral, P. (2004). Feynman-Kac Formulae. Genealogical and Interacting Particle Systems with Applications. Springer.
- Del Moral and Miclo (2000) Del Moral, P. and Miclo, L. (2000). Branching and interacting particle systems approximations of feynmanc-kac formulae with applications to non linear filtering. In Séminaire de Probabilités XXXIV. Lecture Notes in Mathematics, No. 1729. Springer, 1–145.
- Doornik et al. (2002) Doornik, J. A., Hendry, D. F. and Shephard, N. (2002). Computationally-intensive econometrics using a distributed matrix-programming language. Philosophical Transactions of the Royal Society of London, Series A 360 1245 – 1266.
- Dunson and Yang (2013) Dunson, D. and Yang, Y. (2013). Sequential Markov chain Monte Carlo. ArXiv: http://arxiv.org/abs/1308.3861.
- Geweke and Durham (2012) Geweke, J. and Durham, G. (2012). Massively Parallel Sequential Monte Carlo for Bayesian Inference. Working papers, National Bureau of Economic Research, Inc.
- Hsiao (2015) Hsiao, C. (2015). Challenges for panel financial analysis. In Econometrics of Risk, Studies in Computational Intelligence (V.-N. Huynh, V. Kreinovich, S. Sriboonchitta and K. Suriya, eds.).
- Kaufmann (2010) Kaufmann, S. (2010). Dating and forecasting turning points by Bayesian clustering with dynamic structure: A suggestion with an application to Austrian data. Journal of Applied Econometrics 25 309–344.
- Kaufmann (2015) Kaufmann, S. (2015). K-state switching models with time-varying transition distributions - Does loan growth signal stronger effects of variables on inflation? Journal of Econometrics 187 82–94.
- Lee et al. (2010) Lee, A., Christopher, Y., Giles, M. B., Doucet, A. and Holmes, C. C. (2010). On the Utility of Graphic Cards to Perform Massively Parallel Simulation with Advanced Monte Carlo Methods. Journal of Computational and Graphical Statistics 19 769 – 789.
- Liu and West (2001) Liu, J. S. and West, M. (2001). Combined parameter and state estimation in simulation based filtering. In Sequential Monte Carlo Methods in Practice (A. Doucet, N. de Freitas and N. Gordon, eds.). Springer-Verlag.
- Neiswanger et al. (2014) Neiswanger, W., Y., Y. and Xing, E. (2014). Asymptotically exact, embarrassingly parallel MCMC. In Proceedings of the 30th International Conference on Conference on Uncertainty in Artificial Intelligence.
- Roberts and Rosenthal (2001) Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis-Hastings algorithms. Statist. Sci. 16 351–367.
- Scott et al. (2013) Scott, S., Blocker, A. and Bonassi, F. (2013). Bayes and big data: The consensus Monte Carlo algorithm. In EFaBBayes 250 conference, vol. 16.
- Suchard et al. (2010) Suchard, M., Holmes, C. and West, M. (2010). Some of the what?, why?, how?, who? and where? of graphics processing unit computing for Bayesian analysis. Bulletin of the International Society for Bayesian Analysis 17 12 – 16.
- Swann (2002) Swann, C. A. (2002). Maximum Likelihood Estimation Using Parallel Computing: An Introduction to MPI. Computational Economics 19 145 – 178.
The MathWorks, Inc. (2011)
The MathWorks, Inc. (2011).
MATLAB – The Language of Technical Computing, Version
The MathWorks, Inc., Natick, Massachusetts.
- Wang and Dunson (2013) Wang, X. and Dunson, D. (2013). Parallel MCMC via Weierstrass sampler. ArXiv: http://arxiv.org/abs/1312.4605.
- Windle and Carvalho (2014) Windle, J. and Carvalho, Y. (2014). A tractable state-space model for symmetric positive-definite matrices. Bayesian Analysis 9 759–792.
Appendix A Computational details
a.1 Transition kernel
As regards the Step 1.1. of the transition kernel, the distribution is not tractable and we applied a Metropolis-Hastings. At the -th iteration of the -th SMCMC chain we generate the MH proposal from a Gaussian random walk on the transformed parameter space , and .
In the Step 1.2 of the transition kernel, we exploit the tractability of the state space model and apply a multi-move Gibbs sampler, where the hidden states are updated in one step. By applying Proposition 1 in Windle and Carvalho (2014) with , one gets the following filtered, and prediction distributions
where , and the backward smoothed distribution
which is used to generate given and .
a.2 Jumping kernel
As regard to the jumping kernel, when , it is given by a Gibbs sampler transition kernel with full conditional distribution
Appendix B Computational details for the EP-RAPF
Let us consider the following reparametrization . Given the initial sets of weighted random samples , the regularized APF performs the following steps, for , and :
Simulate where and are the empirical variance matrix and the empirical mean respectively and and ,
Simulate , with for
Update the weights
If , simulate from and set . Otherwise set
is the effective sample size.
In the merge step of our EP-RAPF, the particle set is used to build the following approximation of the posterior distribution
by applying the embarrassingly parallel algorithm, as in the EP-SMCMC.
Appendix C SMCMC output
Figures C.1-C.3 show an example of sequential SMCMC approximation of the posterior densities of the parameters , and for the different blocks of observations (different lines in each plot) and at different point in time (different plots) for our panel of time series. We consider cross-sectional blocks with observations each.