Embarrassingly Parallel Sequential Markov-chain Monte Carlo for Large Sets of Time Series
Abstract
Bayesian computation crucially relies on Markov chain Monte Carlo (MCMC) algorithms. In the case of massive data sets, running the Metropolis-Hastings sampler to draw from the posterior distribution becomes prohibitive due to the large number of likelihood terms that need to be calculated at each iteration. In order to perform Bayesian inference for a large set of time series, we consider an algorithm that combines “divide and conquer” ideas previously used to design MCMC algorithms for big data with a sequential MCMC strategy. The performance of the method is illustrated using a large set of financial data.
Key words: Big Data, Panel of Time Series, Parallel Monte Carlo, Sequential Markov-Chain Monte Carlo.
1 Introduction
There is little doubt that one of the main challenges brought on by the advent of Big Data in Bayesian statistics is to develop Markov chain Monte Carlo (MCMC) algorithms for sampling a posterior distribution derived from a very large sample. While MCMC has become the default tool to study posterior distributions when they are not available in closed form, many commonly used sampling algorithms, e.g. the Metropolis-Hastings sampler, can become computationally prohibitive when a large number of likelihood calculations are needed at each iteration.
In recent years we have witnessed a large research effort devoted to dividing the MCMC computational load among a number of available processors and recombining the results with as little loss in statistical efficiency as possible. For instance, the approaches developed in Scott et al. (2013) and Neiswanger et al. (2014) divide the available data in smaller batches and sample the resulting partial posteriors obtained from each batch of data. They propose different methods to combine the resulting draws so that the efficiency of the resulting Monte Carlo estimator is close to the one that would have been obtained if the full data posterior had been sampled. In the consensus MCMC of Scott et al. (2013) this is achieved via reweighting the partial posterior samples, while the embarrassingly parallel approach of Neiswanger et al. (2014) relies on kernel density approximations of the partial posteriors to produce an approximation of the full one. In Wang and Dunson (2013) the authors propose a refined recombination strategy based on the Weierstrass transformation of all partial posteriors.
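To fix ideas, the recombination idea behind these “divide and conquer” schemes can be sketched for a scalar parameter with a consensus-style merge, in which each batch's draws are weighted by their estimated precision; the rule is exact when every subposterior is Gaussian. The sketch below is in Python with illustrative names (the papers cited above give the general, possibly non-Gaussian, versions):

```python
import numpy as np

def consensus_combine(subposterior_draws):
    """Combine equal-length draws from M subposteriors, consensus-style.

    subposterior_draws: list of M arrays of shape (S,), one per data batch.
    Each combined draw is a precision-weighted average of the M batch
    draws, which is exact when every subposterior is Gaussian.
    """
    draws = np.asarray(subposterior_draws)       # shape (M, S)
    weights = 1.0 / draws.var(axis=1, ddof=1)    # precision of each batch
    return (weights[:, None] * draws).sum(axis=0) / weights.sum()

# Toy check: two Gaussian "subposteriors" centred at the same value but
# with different precisions; the merge should concentrate around 1.0.
rng = np.random.default_rng(0)
a = rng.normal(1.0, 0.2, size=5000)
b = rng.normal(1.0, 0.4, size=5000)
theta = consensus_combine([a, b])
```

The combined draws have smaller variance than either batch alone, reflecting the pooling of information across batches.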
While dividing the whole data set into batches can be done easily when the data are independent, one must proceed cautiously when the data exhibit long-range dependence, as is the case in time series. In such cases, simply splitting time series into blocks can lead to poor estimates of the parameters. Instead, one can sometimes reduce the computational load by sequentially updating the posterior over time (see, for instance, Dunson and Yang, 2013).
Sequential sampling may be improved when combined with parallel and possibly interacting Monte Carlo methods that were used elsewhere, e.g. for parallel adaptive MCMC (Craiu et al., 2009), for interacting multiple-try Metropolis (Casarin et al., 2013), for population Monte Carlo (Cappe et al., 2004), for sequential Monte Carlo (Del Moral and Miclo, 2000; Del Moral, 2004) and for massively parallel computing (Lee et al., 2010).
Sequential estimation is useful in many applied contexts, such as online inference of econometric models for both out-of-sample and in-sample analyses. However, sequential estimation is a challenging issue in Bayesian analysis due to the computational cost of the numerical procedures for posterior approximation. Moreover, the computational cost rapidly increases with the dimension of the model, and the inferential task can become infeasible. In this sense our paper contributes to the recent stream of the literature on the use of Central Processing Unit (CPU) and Graphics Processing Unit (GPU) parallel computing in econometrics (e.g., see Doornik et al. (2002), Swann (2002), Creel (2005), Creel and Goffe (2008), Suchard et al. (2010)).
Our contribution here is to consider possible ways to combine strategies following the work of Neiswanger et al. (2014) with sequential MCMC in order to address difficulties that appear when studying large panel time series data models. Analyses of panel time series data appear frequently in the econometrics literature, as discussed in the review papers of Canova and Ciccarelli (2013) and Hsiao (2015). Moreover, the use of latent variables for time series panel models in combination with a Bayesian inference approach (Kaufmann, 2010, 2015; Billio et al., 2015) can be quite challenging due to the computational burden required for the latent variable estimation. These challenges motivate this work, in which the Bayesian stochastic volatility model recently proposed in Windle and Carvalho (2014) and discussed in Casarin (2014) is adapted to the context of large panels of time series.
In Sections 2 and 3 we introduce, respectively, the issues related to sequential sampling in models with latent variables and present our algorithm. Section 4 contains the simulation studies and the real data analysis. The paper closes with conclusions and future directions.
2 Posterior Distribution Factorization
Consider a time series sample , with probability density function (pdf) , where is a parameter vector. Of interest is sampling from the posterior distribution of . We consider here the case in which this task is made easier by adopting a data augmentation approach. We introduce auxiliary variables that exhibit Markovian serial dependence, i.e. each has pdf . If is the conditional pdf of given and , then for prior distribution we obtain the joint posterior of the data and the latent variables as
where is the normalizing constant of , and .
Henceforth, we assume that the time series data has panel structure such that if we consider all the data collected up to time , , then it is possible to partition them into blocks of size each,
(1) 
where the th block contains the measurements up to time for the components of (for notational simplicity we set but other allocations are possible), i.e. for all
Each set of the partition contains the th panel of dependent components of . An important assumption of the model is that, conditional on a parameter value , the components in two partition sets are independent, i.e.
(2) 
for any .
Corresponding to the partition (1) there is an equivalent partition of the auxiliary variables
(3) 
where the components of correspond to the components of included in . A second crucial assumption for the validity of our algorithm is the independence of the auxiliary variables contained in two elements of the partition (3), i.e.
(4) 
for all . Finally, we assume that depends only on those auxiliary variables included in , i.e. given the latter we have
(5) 
These assumptions are not unusual in the context of dynamic panel data models as they are used for theoretical derivations in Anderson and Hsiao (1982), Blundell and Bond (1998) and Boneva et al. (2015) as well as in applications (see, for instance, Abrevaya and Shen, 2014; Kaufmann, 2010, 2015; Billio et al., 2015).
The basic principle underlying our approach is to learn sequentially over time using different cross-sectional data blocks. Our algorithm relies on samples from the joint posterior distribution of and , , conditional on the subsample
(6) 
where , and is the th block normalizing constant.
From (7) we can infer that the type of factorization of the posterior distribution used in Neiswanger et al. (2014) holds in this case for every since
(8)  
3 Embarrassingly Parallel SMCMC
So far we have discussed the factorization of the posterior distribution based on the panel-type structure of the data. In this section we show how the algorithm handles the serial dependence in the data and samples from for each using a sequential MCMC strategy. For expository purposes we assume that the parameter estimates are updated every time a new observation becomes available, but in the application we will update the estimates after every th observation is collected.
Let us define , , the time sequence of augmented parameter vectors with non-decreasing dimension , . In order to take advantage of the partition described in the previous section we also introduce , a parameter vector of dimension . Since the augmented parameter vector can then be partitioned as and for all , the correctness of the algorithm relies on the compatibility condition for the priors on at each time and for each th block of data. Specifically, we assume that if the prior is , where , then it satisfies the compatibility condition
(9) 
Our embarrassingly parallel SMCMC algorithm iterates over time and data blocks. At each time , the algorithm consists of two steps. In the first step, for each data block , we use parallel SMCMC chains, each of which yields samples from , i.e. we generate , , and from . Based on all samples , , and we produce the kernel density estimates of the marginal subposteriors .
In the second step we take advantage of the factorization (7) and the posterior is approximated by combining the approximate subposteriors , . Samples from this distribution can be obtained by applying the asymptotically exact posterior sampling procedure detailed in Algorithm 1 of Neiswanger et al. (2014). It is worth noting that (7) holds for any so, if needed, one can approximate at intermediate times by combining the ’s. However, is not used directly in the final posterior so if one is interested only in the latter then the merging of partial posteriors is performed only at time .
The pseudocode of the proposed EPSMCMC is given in Algorithm 1 and the details of the SMCMC and of the merge step are given in the following sections.
Algorithm 1. Embarrassingly Parallel SMCMC (EPSMCMC)

For :

1. For draw , , and from by using the SMCMC transition.

2. When needed (usually at time ):

   (a) compute the kernel density estimate of the marginal subposteriors by using the samples , , and ;

   (b) approximate the posterior by combining the approximate subposteriors , .
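The control flow of the two steps above can be made concrete with a schematic skeleton in which the SMCMC transition and the merge rule are left as user-supplied placeholders (Python sketch with illustrative names; the toy instantiation exists only to show the loop structure, not the paper's kernels):

```python
import numpy as np

def ep_smcmc(data_blocks, smcmc_step, merge, n_steps):
    """Schematic skeleton of the EPSMCMC loop (cf. Algorithm 1).

    data_blocks : list of G per-block data streams (arrays over time)
    smcmc_step  : user-supplied SMCMC transition for one block at one time
    merge       : user-supplied combination of the G subposterior outputs
    """
    chains = [None] * len(data_blocks)       # per-block chain states
    for t in range(n_steps):
        # Step 1: advance each block's SMCMC chain (in parallel in practice,
        # serially here) using the data observed up to time t.
        for g, block in enumerate(data_blocks):
            chains[g] = smcmc_step(chains[g], block[: t + 1])
    # Step 2: merge the subposterior outputs (usually only at the final time).
    return merge(chains)

# Toy instantiation: each block's "subposterior sample" is its running mean,
# and the merge step averages across blocks (both are placeholders).
blocks = [np.arange(10, dtype=float) + g for g in range(4)]
est = ep_smcmc(blocks,
               smcmc_step=lambda state, data: data.mean(),
               merge=lambda chains: float(np.mean(chains)),
               n_steps=10)
```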
3.1 Sequential MCMC
In this section we discuss the construction of the SMCMC samplers that are used in the first step of the EPSMCMC. To simplify the notation we drop the index indicative of the data block. In the SMCMC algorithm a population of parallel inhomogeneous Markov chains is used to generate the samples with , and from the sequence of posterior distributions , . Each Markov chain of the population is defined by a sequence of transition kernels , , that are operators from to , such that is a probability measure for all , and is measurable for all .
The kernel has as stationary distribution and results from the composition of a jumping kernel and a transition kernel , that is
where the fixed dimension transition is defined as
with , and is the identity kernel. We assume that the jumping kernel satisfies
where and . This condition ensures that the error propagation through the jumping kernel can be controlled over the SMCMC iterations.
In order to apply the SMCMC one needs to specify the transition kernel and the jumping kernel at iteration . The transition kernel at iteration allows each parallel chain to explore the sample space of a given dimension, , and to generate samples , from the posterior distribution . The jumping kernel at iteration allows the chains to move from a space of dimension to one of dimension .
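The composition of the two kernels can be illustrated as follows: the jumping kernel extends the chain's state to the new dimension, after which a fixed-dimension Metropolis sweep targets the time-t posterior. This is a minimal Python sketch with a toy Gaussian target; the random-walk proposals, scales, and target are our own assumptions, not the kernels used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def jumping_kernel(state):
    """Extend the state from dimension d_t to d_t + 1: draw the new latent
    component given the current ones (placeholder: a random-walk guess
    centred at the last latent value)."""
    return np.append(state, state[-1] + rng.normal(0.0, 0.1))

def transition_kernel(state, data, log_post, n_sweeps=5):
    """Fixed-dimension Metropolis sweep targeting the time-t posterior."""
    for _ in range(n_sweeps):
        prop = state + rng.normal(0.0, 0.1, size=state.size)
        if np.log(rng.uniform()) < log_post(prop, data) - log_post(state, data):
            state = prop
    return state

# One SMCMC update per new observation: jump to the new dimension,
# then move within it (toy target: independent N(y_t, 1) latents).
y = np.array([0.0, 0.5, 1.0])
log_post = lambda x, d: -0.5 * np.sum((x - d) ** 2)
state = np.array([0.0])
for t in range(1, len(y)):
    state = jumping_kernel(state)                          # d_t -> d_t + 1
    state = transition_kernel(state, y[: t + 1], log_post, n_sweeps=20)
```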
3.2 Merge step
The merge step relies on the following approximation of the posterior distribution
(10) 
where
(11)  
where with and , , and is a Gaussian kernel with bandwidth parameter . Following the embarrassingly parallel MCMC approach the posterior distribution can be written as
(12) 
where , and
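The product-of-KDEs merge can be illustrated for a scalar parameter by evaluating the unnormalized log posterior, i.e. the sum of the log subposterior kernel density estimates, on a grid (Python sketch; the bandwidth and grid are illustrative assumptions, and Neiswanger et al. (2014) sample from this product rather than evaluating it on a grid):

```python
import numpy as np

def gauss_kde(samples, h):
    """Return a function evaluating a Gaussian kernel density estimate."""
    def pdf(x):
        z = (x[:, None] - samples[None, :]) / h
        return np.exp(-0.5 * z**2).sum(axis=1) / (samples.size * h * np.sqrt(2 * np.pi))
    return pdf

def merged_log_density(subposterior_samples, grid, h=0.05):
    """Unnormalized log of the product of the G subposterior KDEs on a grid."""
    logp = np.zeros_like(grid)
    for s in subposterior_samples:
        logp += np.log(gauss_kde(np.asarray(s), h)(grid) + 1e-300)
    return logp

# Two toy subposteriors centred at the same value; the mode of the merged
# density should sit near the common centre 0.5.
rng = np.random.default_rng(2)
subs = [rng.normal(0.5, 0.1, 2000), rng.normal(0.5, 0.2, 2000)]
grid = np.linspace(0.0, 1.0, 201)
mode = grid[np.argmax(merged_log_density(subs, grid))]
```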
3.3 Parameter tuning
As suggested by Dunson and Yang (2013), the number of iterations of the sequential MCMC sampler at each time is chosen according to the correlation across the parallel chains. We let the number of iterations at iteration , , be the smallest integer such that , where is the rate function associated with the transition kernel and is a given threshold level. An upper bound for the rate function is provided by the chain autocorrelation function at the th lag. It can be estimated sequentially using the output of all the parallel chains: where
(13) 
with the th element of the vector of the parameters and the latent states generated up to time by the th chain at the th iteration, and the average of the draws over the parallel chains.
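The tuning rule can be sketched as follows: the lag-m autocorrelation is estimated by pooling the parallel chains, and the number of iterations is taken as the smallest lag at which the estimate falls below the threshold. The pooled estimator below is an illustrative assumption (Python sketch; the toy AR(1) chains have autocorrelation 0.8^m, so the rule should return a lag near 11 for a 0.1 threshold):

```python
import numpy as np

def lag_autocorr(chains, lag):
    """Lag-`lag` autocorrelation estimated by pooling N parallel chains.
    chains: array (N, M) of N chains of length M for one scalar quantity."""
    x = chains - chains.mean()
    return (x[:, :-lag] * x[:, lag:]).mean() / x.var()

def choose_iterations(chains, threshold=0.1, max_lag=None):
    """Smallest lag m with estimated autocorrelation below `threshold`
    (used as a proxy upper bound for the kernel's rate function)."""
    max_lag = max_lag or chains.shape[1] - 1
    for m in range(1, max_lag + 1):
        if lag_autocorr(chains, m) < threshold:
            return m
    return max_lag

# Toy check: AR(1) chains with coefficient 0.8.
rng = np.random.default_rng(3)
N, M, phi = 50, 2000, 0.8
chains = np.zeros((N, M))
for t in range(1, M):
    chains[:, t] = phi * chains[:, t - 1] + rng.normal(size=N)
m_star = choose_iterations(chains, threshold=0.1)
```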
4 Numerical Experiments
4.1 Simulated data
We consider a time series model in which is the realized variance for the th log-return series at time , and is the latent stochastic volatility process. In order to capture the time variations in the volatility of the series we consider the exponential family state space model for positive observations recently proposed in Windle and Carvalho (2014) and extend it to the context of panel data. The model can then be written as:
(14)  
(15) 
for , where denotes the gamma distribution and the beta distribution of the first type.
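Since the displayed equations are not reproduced here, the following Python sketch simulates a toy panel in the same spirit: a positive latent process evolving through multiplicative Beta shocks and conditionally gamma observations. The shapes and rates below are illustrative assumptions, not the exact Windle and Carvalho (2014) specification:

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate_panel(n_series=5, T=200, k=4.0, a=20.0, b=1.0):
    """Simulate a toy panel of positive observations with latent volatility.

    Illustrative parameterization (an assumption): the latent precision x_t
    evolves multiplicatively through mean-one Beta shocks, and y_t | x_t is
    gamma-distributed with rate proportional to x_t, so E[y_t | x_t] = 1/x_t.
    """
    x = np.ones((n_series, T))
    y = np.empty((n_series, T))
    for t in range(T):
        if t > 0:
            # Beta shock rescaled to mean one keeps x_t positive, persistent
            x[:, t] = x[:, t - 1] * rng.beta(a, b, n_series) * (a + b) / a
        y[:, t] = rng.gamma(shape=k, scale=1.0 / (k * x[:, t]))
    return y, x

y, x = simulate_panel()
```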
We generate 1,000 time series of 1,000 observations each, and obtain a dataset of 1 million observations. In Fig. 1 we illustrate such a simulated dataset. Inference for a nonlinear latent variable model with this large a sample via MCMC sampling can be computationally challenging. In the simulation we set , and , with initial condition , . For each series we generated 3,000 realizations, but the initial 2,000 samples were discarded so that dependence on the initial conditions can be considered negligible.
We aim to estimate the common parameters , and and assume a uniform prior distribution for , i.e. and proper vague prior distributions for and , that is and truncated on the region .
In order to apply the EPSMCMC algorithm we assume that the series are split into blocks of series each. The updates are performed every observations, so the total number of updates is . Let us denote with the column vector a collection of variables with . We define the th block of series and the th block of latent variables as the matrices and , respectively, with , and . Then, the complete-data likelihood function at time for the th block is
(16)  
(17) 
where . Then the subsample posterior, based on the complete-data likelihood function of the th block, is
(18) 
At time , and for the th block, the th SMCMC parallel chain has the transition kernel of a Gibbs sampler which iterates over the following steps:

1. generate from ;

2. generate from ;
with , and a dim matrix whose th row elements are drawn from the jumping kernel at time . At time , as a new observation becomes available, the dimension of the state space for the th block SMCMC chains increases from to . We choose the jumping kernel of the th parallel chain to be the transition kernel of a Gibbs sampler with the following full conditional distribution

, with
where . The details of the sampling procedures for the three steps are given in Appendix A.
In the simulation example we compare our EPSMCMC with an MCMC repeated sequentially over time and a Sequential Monte Carlo (SMC) algorithm with unknown parameters. For the EPSMCMC we used parallel chains and a number of iterations close to 50, on average. For the MCMC we use a multi-move Gibbs sampler where the latent states are sampled in one block by applying a forward-filtering backward-sampling (FFBS) procedure. The filtering and smoothing relationships are given in Appendix A. In order to update the parameters we have used a Metropolis-Hastings step. The MCMC chain of our multi-move sampler mixes quite well thanks to the FFBS, and the Metropolis step has acceptance rates of about 0.3, which is a good rate for many models, as argued in Roberts and Rosenthal (2001).
In our MCMC analysis we considered two cases. The first one is based on samples of 1,000 iterations after a burn-in phase of 500 iterations, in order to have a computational complexity similar to that of the EPSMCMC. The second one is based on samples of 25,000 iterations after a burn-in phase of 20,000 iterations and is used to obtain reliable MCMC estimates of the parameter posterior distribution based on the whole sample.
For the SMC we also consider the regularized auxiliary particle filter (RAPF) combined with the embarrassingly parallel algorithm to obtain an EPRAPF. In Appendix B we present the computational details of running the RAPF for our stochastic volatility model. We refer to Liu and West (2001) for the definition of the regularized particle filter and to Casarin and Marin (2009) for a comparison of different regularized filters for stochastic volatility models.
When we compare the EPSMCMC, MCMC and EPRAPF algorithms, we use as a measure of efficiency the mean square error (MSE) for the parameters , and after the last time block of data has been processed. The mean square errors have been estimated using 40 independent runs of each algorithm. The same set of data has been used to reduce the standard error of the MSE estimates. The experiments have been conducted on a cluster multiprocessor system with 4 nodes; each node comprises four Xeon E5-4610 v2 2.3GHz CPUs with 8 cores, 256GB ECC PC3-12800R RAM, 10Gbit Ethernet, and a 20TB hard disk system running Linux. The algorithms have been implemented in Matlab (see The MathWorks, Inc. (2011)) and the parallel computing makes use of the Matlab Parallel Computing Toolbox.
The structure of the EPSMCMC sampler allows for sequential data acquisition and for parallel estimation based on different cross-sectional blocks. Thus, the MSE obtained after processing the last block is given in Table 1 for different dimensions of the time blocks (rows) and of the cross-sectional blocks (columns). Fig. 2 shows the posterior approximation obtained from one run of the EPSMCMC on the dataset shown in Fig. 1. From our experiments we find that the parameters that are most difficult to estimate are and , whereas has lower MSEs regardless of the choice of block size and type of algorithm. For the EPSMCMC sampler a larger size of the time blocks reduces the MSE, possibly due to a reduction in the propagation of the approximation error over the iterations. The behaviour of the MSE with respect to the cross-sectional block size is not monotonic: the MSE initially decreases with , but for larger it increases. From our experiments, values of between 20 and 40 yield the best results. The and MSEs for the MCMC are higher than their EPSMCMC counterparts, likely due to the lack of convergence of the MCMC in the 1,500 iterations per time block. The large MSEs for the EPRAPF are due to the artificial noise introduced by the regularization step for the parameters and also to the hidden state estimation. In the EPRAPF implementation we have used 1,500 particles, and the artificial noise is necessary to avoid degeneration of their weights. Note that within each time block the EPRAPF uses the filtered states instead of the smoothed states. A combination of the RAPF with a MH step or a particle MCMC would improve the performance of the EPRAPF algorithm in the estimation of both states and parameters, at the price of increased computing time. We leave the further study of this issue for future work.
Table 1. Parameter MSE

EPSMCMC  MCMC  EPRAPF  
()  ()  ()  ()  ()  
50  3.91  2.37  2.07  4.81  7.02  29.68  1.14 
100  2.09  1.68  2.11  3.06  6.75  32.33  0.90 
200  3.07  1.68  1.20  0.77  3.98  25.06  6.90 
Parameter MSE  
EPSMCMC  MCMC  EPRAPF  
()  ()  ()  ()  ()  
50  45.04  48.52  39.73  53.55  51.90  78.20  31.18 
100  28.41  29.14  20.52  39.01  43.52  56.19  101.14 
200  29.83  21.01  11.92  28.54  39.07  58.23  53.12 
Parameter MSE  
EPSMCMC  MCMC  EPRAPF  
()  ()  ()  ()  ()  
50  0.02  0.01  0.01  0.01  0.01  0.01  0.02 
100  0.02  0.01  0.01  0.01  0.01  0.01  0.01 
200  0.01  0.01  0.01  0.01  0.01  0.01  0.01 
Table 2. Computing time
EPSMCMC  MCMC

()  ()  ()  ()  ()  
50  10.87  6.66  2.66  1.64  1.01  49.39 
100  5.13  2.45  1.88  0.70  0.47  28.10 
200  2.57  1.43  0.71  0.32  0.19  15.27 
Also, we compare EPSMCMC, MCMC and EPRAPF in terms of computing time. For the EPSMCMC we consider cross-sectional blocks of size . The computing times are given in Table 2 for different time acquisition rates (rows) and different numbers of CPUs working in parallel (columns). We conclude that the EPSMCMC implemented on a cluster multiprocessor system can be up to 80 times faster than the standard MCMC. These results are in line with the ones obtained in previous studies on parallel Monte Carlo methods (Casarin et al. (2015), Lee et al. (2010), Geweke and Durham (2012)). The calculations in all our experiments have been carried out using double precision. If an application allows for a lower degree of precision, then single-precision calculations can lead to large gains in computing time, as documented by Lee et al. (2010) in the context of GPU parallel computing.
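The block-level parallelism can be mimicked in a few lines by mapping a per-block routine over the cross-sectional blocks with a worker pool (Python sketch with illustrative names; the paper's implementation uses the Matlab Parallel Computing Toolbox, and CPU-bound chains would typically use process-based workers or MPI rather than the threads shown here):

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def run_block(block):
    """Placeholder for one block's SMCMC run: here just a summary statistic."""
    return float(np.mean(block))

# Eight toy cross-sectional blocks processed by four workers.
blocks = [np.full(1000, g, dtype=float) for g in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_block, blocks))
```

Threads are used only for portability of the sketch; because the per-block runs are independent, the same map distributes unchanged over processes or cluster nodes.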
4.2 Real data
We consider a panel of 12,933 assets of the US stock market and collect prices at a daily frequency from 29 December 2000 to 22 August 2014, which yields 3,562 time observations. Then we compute logarithmic returns for all stocks and obtain a dataset of 46,067,346 observations.
In order to control for the liquidity of the assets, and consequently for long sequences of zero returns, we impose that each stock has been traded on a number of days corresponding to at least 40% of the sample size. We also focus on the last part of the sample, which includes observations from 8 February 2013 to 22 August 2014. After cleaning the dataset we obtain 6,799 series and 400 time observations, and the size of the dataset reduces to 2,846,038 observations. The original and the cleaned datasets are given in Fig. 3.
In order to capture the time variations in the volatility of the series we consider the panel state space model presented in the previous section. We apply our EPSMCMC sampler to the panel of 6,799 time series. The data are acquired sequentially over time in blocks of 100 time observations and each panel consists of series. At each point in time, for each parameter , and , we obtain 500 posterior densities from each parallel SMCMC chain (see Figures C.1–C.3 in Appendix C for an example).
Figure 4 shows the sequential posterior inference after the merge step of the embarrassingly parallel SMCMC algorithm is applied to the output of the SMCMC samplers. At each point in time we obtain an approximated posterior density for the whole cross-section from the embarrassingly parallel step (see the solid lines in Fig. 4).
The approximation of the posterior produced by the EPSMCMC is close to the approximation based on an MCMC analysis of the full posterior. This can be seen in Fig. 4, where the solid line shows the EPSMCMC approximation at the last update () and the dashed line represents the full-sample estimate based on a standard MCMC with 25,000 iterations after a burn-in period of 20,000 iterations. The posterior means and posterior quantiles approximated by EPSMCMC are given in Table 3. In order to approximate the high posterior density (HPD) intervals and the posterior mean we generate samples from the posterior distribution given in Eq. 8 by applying the independent Metropolis within Gibbs algorithm given in Neiswanger et al. (2014).
Table 3.

EPSMCMC            MCMC
      HPD                HPD
0.66  (0.61,0.71)  0.54  (0.31,0.58)
0.87  (0.79,0.95)  0.91  (0.86,1.01)
0.98  (0.97,0.99)  0.96  (0.94,0.97)
5 Conclusion
We propose a new MCMC algorithm which combines embarrassingly parallel MCMC and sequential MCMC. The algorithm is developed for data that exhibit dependence patterns, in particular for large sets of time series for which a standard MCMC-based analysis would be very slow. Here we take advantage of the independence between the unit-specific observations and latent variables of the panel to partition the data and factor the full posterior into a product of partial posteriors. In the absence of clearly independent panel units, an interesting and difficult question concerns alternative strategies to divide the data and combine the partial posterior samples.
It is apparent that the development of novel MCMC algorithms for big data is evolving rapidly. While “divide and conquer” strategies continue to develop, one must devise techniques to handle the additional approximations that are introduced by the existing methods, including EPSMCMC. Quantifying and controlling the error introduced by these approximations remains central to the success of MCMC for Big Data.
Acknowledgement
We thank the Editor, the Associate Editor and three referees for comments and suggestions that have greatly improved the paper. This research used the SCSCF multiprocessor cluster system at University Ca’ Foscari of Venice. Roberto Casarin’s research is supported by funding from the European Union, Seventh Framework Programme FP7/20072013 under grant agreement SYRTOSSH2012320270, by the Institut Europlace of Finance, “Systemic Risk grant”, the Global Risk Institute in Financial Services, the Louis Bachelier Institute, “Systemic Risk Research Initiative”, and by the Italian Ministry of Education, University and Research (MIUR) PRIN 201011 grant MISURA. Radu Craiu’s research is supported by the Natural Sciences and Engineering Research Council of Canada. Fabrizio Leisen’s research is supported by the European Community’s Seventh Framework Programme FP720072013 under grant agreement number 630677.
References
 Abrevaya and Shen (2014) Abrevaya, J. and Shen, S. (2014). Estimation of censored panel-data models with slope heterogeneity. Journal of Applied Econometrics 29 523–548.
 Anderson and Hsiao (1982) Anderson, T. and Hsiao, C. (1982). Formulation and estimation of dynamic models using panel data. Journal of Econometrics 18 47–82.
 Billio et al. (2015) Billio, M., Casarin, R., Ravazzolo, F. and van Dijk, H. K. (2015). Interactions between eurozone and US booms and busts: A Bayesian panel Markov-switching VAR model. Journal of Applied Econometrics forthcoming.
 Blundell and Bond (1998) Blundell, R. and Bond, S. (1998). Initial conditions and moment restrictions in dynamic panel data models. Journal of Econometrics 87 115–143.
 Boneva et al. (2015) Boneva, L., Linton, O. and Vogt, M. (2015). A semiparametric model for heterogeneous panel data with fixed effects. Journal of Econometrics to appear.
 Canova and Ciccarelli (2013) Canova, F. and Ciccarelli, M. (2013). Panel vector autoregressive models: a survey. In VAR Models in Macroeconomics: New Developments and Applications: Essays in Honor of Christopher A. Sims (L. Thomas, B. Fomby and A. Murphy, eds.).
 Cappe et al. (2004) Cappe, O., Guillin, A., Marin, J.M. and Robert, C.P. (2004). Population Monte Carlo. J. Comput. Graph. Statist. 13 907–929.
 Casarin (2014) Casarin, R. (2014). A comment on a tractable state-space model for symmetric positive-definite matrices. Bayesian Analysis 9 793–804.
 Casarin et al. (2013) Casarin, R., Craiu, R. V. and Leisen, F. (2013). Interacting multiple try algorithms with different proposal distributions. Statistics and Computing 23 185–200.
 Casarin et al. (2015) Casarin, R., Grassi, S., Ravazzolo, F. and van Dijk, H. K. (2015). Parallel sequential Monte Carlo for efficient density combination: the DeCo Matlab toolbox. Journal of Statistical Software forthcoming.
 Casarin and Marin (2009) Casarin, R. and Marin, J.M. (2009). Online data processing: Comparison of Bayesian regularized particle filters. Electronic Journal of Statistics 3 239–258.
 Craiu et al. (2009) Craiu, R. V., Rosenthal, J. S. and Yang, C. (2009). Learn from thy neighbor: Parallelchain adaptive and regional MCMC. Journal of the American Statistical Association 104 1454–1466.
 Creel (2005) Creel, M. (2005). User-Friendly Parallel Computations with Econometric Examples. Computational Economics 26 107 – 128.
 Creel and Goffe (2008) Creel, M. and Goffe, W. L. (2008). Multicore CPUs, Clusters, and Grid Computing: A Tutorial. Computational Economics 32 353 – 382.
 Del Moral (2004) Del Moral, P. (2004). FeynmanKac Formulae. Genealogical and Interacting Particle Systems with Applications. Springer.
 Del Moral and Miclo (2000) Del Moral, P. and Miclo, L. (2000). Branching and interacting particle systems approximations of Feynman-Kac formulae with applications to nonlinear filtering. In Séminaire de Probabilités XXXIV. Lecture Notes in Mathematics, No. 1729. Springer, 1–145.
 Doornik et al. (2002) Doornik, J. A., Hendry, D. F. and Shephard, N. (2002). Computationally-intensive econometrics using a distributed matrix-programming language. Philosophical Transactions of the Royal Society of London, Series A 360 1245 – 1266.
 Dunson and Yang (2013) Dunson, D. and Yang, Y. (2013). Sequential Markov chain Monte Carlo. ArXiv: http://arxiv.org/abs/1308.3861.
 Geweke and Durham (2012) Geweke, J. and Durham, G. (2012). Massively Parallel Sequential Monte Carlo for Bayesian Inference. Working papers, National Bureau of Economic Research, Inc.
 Hsiao (2015) Hsiao, C. (2015). Challenges for panel financial analysis. In Econometrics of Risk, Studies in Computational Intelligence (V.N. Huynh, V. Kreinovich, S. Sriboonchitta and K. Suriya, eds.).
 Kaufmann (2010) Kaufmann, S. (2010). Dating and forecasting turning points by Bayesian clustering with dynamic structure: A suggestion with an application to Austrian data. Journal of Applied Econometrics 25 309–344.
 Kaufmann (2015) Kaufmann, S. (2015). K-state switching models with time-varying transition distributions – Does loan growth signal stronger effects of variables on inflation? Journal of Econometrics 187 82–94.
 Lee et al. (2010) Lee, A., Yau, C., Giles, M. B., Doucet, A. and Holmes, C. C. (2010). On the Utility of Graphics Cards to Perform Massively Parallel Simulation with Advanced Monte Carlo Methods. Journal of Computational and Graphical Statistics 19 769 – 789.
 Liu and West (2001) Liu, J. S. and West, M. (2001). Combined parameter and state estimation in simulation based filtering. In Sequential Monte Carlo Methods in Practice (A. Doucet, N. de Freitas and N. Gordon, eds.). SpringerVerlag.
 Neiswanger et al. (2014) Neiswanger, W., Wang, C. and Xing, E. (2014). Asymptotically exact, embarrassingly parallel MCMC. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence.
 Roberts and Rosenthal (2001) Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various MetropolisHastings algorithms. Statist. Sci. 16 351–367.
 Scott et al. (2013) Scott, S., Blocker, A. and Bonassi, F. (2013). Bayes and big data: The consensus Monte Carlo algorithm. In EFaBBayes 250 conference, vol. 16.
 Suchard et al. (2010) Suchard, M., Holmes, C. and West, M. (2010). Some of the what?, why?, how?, who? and where? of graphics processing unit computing for Bayesian analysis. Bulletin of the International Society for Bayesian Analysis 17 12 – 16.
 Swann (2002) Swann, C. A. (2002). Maximum Likelihood Estimation Using Parallel Computing: An Introduction to MPI. Computational Economics 19 145 – 178.

 The MathWorks, Inc. (2011) The MathWorks, Inc. (2011). MATLAB – The Language of Technical Computing, Version R2011b. The MathWorks, Inc., Natick, Massachusetts. URL http://www.mathworks.com/products/matlab/
 Wang and Dunson (2013) Wang, X. and Dunson, D. (2013). Parallel MCMC via Weierstrass sampler. ArXiv: http://arxiv.org/abs/1312.4605.
 Windle and Carvalho (2014) Windle, J. and Carvalho, C. M. (2014). A tractable state-space model for symmetric positive-definite matrices. Bayesian Analysis 9 759–792.
Appendix A Computational details
A.1 Transition kernel
As regards Step 1.1 of the transition kernel, the distribution is not tractable and we apply a Metropolis-Hastings step. At the th iteration of the th SMCMC chain we generate the MH proposal from a Gaussian random walk on the transformed parameter space , and .
In Step 1.2 of the transition kernel, we exploit the tractability of the state space model and apply a multi-move Gibbs sampler, where the hidden states are updated in one step. By applying Proposition 1 in Windle and Carvalho (2014) with , one gets the following filtered and prediction distributions
(A.1)  
(A.2) 
where , and the backward smoothed distribution
(A.3) 
which is used to generate given and .
A.2 Jumping kernel
As regards the jumping kernel, when , it is given by a Gibbs sampler transition kernel with full conditional distribution
(A.4) 
Appendix B Computational details for the EPRAPF
Let us consider the following reparametrization . Given the initial sets of weighted random samples , the regularized APF performs the following steps, for , and :

1. Simulate where with .

2. Simulate , where and are the empirical variance matrix and the empirical mean, respectively, and , .

3. Simulate , with for .

4. Update the weights .

5. If , simulate from and set ; otherwise set , where is the effective sample size.
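The effective sample size criterion and the regularization step above can be sketched as follows (Python, with illustrative names; the plain Gaussian jitter stands in for the regularization, whereas Liu and West (2001) use a shrinkage correction that we omit here):

```python
import numpy as np

rng = np.random.default_rng(5)

def ess(weights):
    """Effective sample size of (unnormalized) importance weights."""
    w = weights / weights.sum()
    return 1.0 / np.sum(w ** 2)

def regularized_resample(particles, weights, h=0.1):
    """Multinomial resampling followed by a Gaussian jitter: the
    'regularization' that avoids duplicated parameter particles."""
    w = weights / weights.sum()
    idx = rng.choice(particles.size, size=particles.size, p=w)
    return particles[idx] + h * rng.normal(size=particles.size)

# Resample-move only when the ESS falls below a fraction of the particle count.
n = 1000
particles = rng.normal(size=n)
skewed_w = np.exp(5.0 * particles)      # highly uneven weights
if ess(skewed_w) < 0.5 * n:
    particles = regularized_resample(particles, skewed_w)
```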
In the merge step of our EPRAPF, the particle set is used to build the following approximation of the posterior distribution
(B.1) 
by applying the embarrassingly parallel algorithm, as in the EPSMCMC.
Appendix C SMCMC output
Figures C.1–C.3 show an example of the sequential SMCMC approximation of the posterior densities of the parameters , and for the different blocks of observations (different lines in each plot) and at different points in time (different plots) for our panel of time series. We consider cross-sectional blocks with observations each.