Locally adaptive factor processes for multivariate time series
In modeling multivariate time series, it is important to allow time-varying smoothness in the mean and covariance process. In particular, there may be certain time intervals exhibiting rapid changes and others in which changes are slow. If such time-varying smoothness is not accounted for, one can obtain misleading inferences and predictions, with over-smoothing across erratic time intervals and under-smoothing across times exhibiting slow variation. This can lead to mis-calibration of predictive intervals, which can be substantially too narrow or wide depending on the time. We propose a locally adaptive factor process for characterizing multivariate mean-covariance changes in continuous time, allowing locally varying smoothness in both the mean and covariance matrix. This process is constructed utilizing latent dictionary functions evolving in time through nested Gaussian processes and linearly related to the observed data with a sparse mapping. Using a differential equation representation, we bypass usual computational bottlenecks in obtaining MCMC and online algorithms for approximate Bayesian inference. The performance is assessed in simulations and illustrated in a financial application.
Keywords: Bayesian nonparametrics; locally varying smoothness; long-range dependence; multivariate time series; nested Gaussian process; stochastic volatility.
1.1 Motivation and setting
In analyzing multivariate time series data, collected in financial applications, monitoring of influenza outbreaks and other fields, it is often of key importance to accurately characterize dynamic changes over time in not only the mean of the different elements (e.g., assets, influenza levels at different locations) but also the covariance. As shown in Figure 1, it is typical in many domains to cycle irregularly between periods of rapid and slow change; most statistical models are insufficiently flexible to capture such locally varying smoothness in assuming a single bandwidth parameter. Inappropriately restricting the smoothness to be constant can have a major impact on the quality of inferences and predictions, with over-smoothing occurring during times of rapid change. This leads to an under-estimation of uncertainty during such volatile times and an inability to accurately predict risk of extremal events.
Let denote a random vector at time , with and . Our focus is on Bayesian modeling and inference for the multivariate mean-covariance stochastic process, with . Of particular interest is allowing locally-varying smoothness, meaning that the rate of change in the process is varying over time. To our knowledge, there is no previous proposed stochastic process for a coupled mean-covariance process, which allows locally-varying smoothness. A key to our construction is the use of latent processes, which have time-varying smoothness. This results in a locally adaptive factor (LAF) process. We review the relevant literature below and then describe our LAF formulation.
1.2 Relevant literature
There is a rich literature on modeling a time-varying mean vector , covering multivariate generalizations of autoregressive models (VAR, e.g. Tsay, 2005), Kalman filtering (Kalman, 1960), nonparametric mean regression via Gaussian processes (GP) (Rasmussen and Williams, 2006), polynomial spline (Huang, Wu and Zhou, 2002), smoothing spline (Hastie and Tibshirani, 1990) and kernel smoothing methods (Wolpert, Clyde and Tu, 2011). Such approaches perform well for slowly-changing trajectories with constant bandwidth parameters regulating implicitly or explicitly global smoothness; however, our interest is allowing smoothness to vary locally in continuous time. Possible extensions for local adaptivity include free knot splines (MARS) (Friedman, 1991), which perform well in simulations but the different strategies proposed to select the number and the locations of knots (stepwise knot selection (Friedman, 1991), Bayesian knot selection (Smith and Kohn, 1996) or via MCMC methods (George and McCulloch, 1993)) prove to be computationally intractable for moderately large . Other flexible approaches include wavelet shrinkage (Donoho and Johnstone, 1995), local polynomial fitting via variable bandwidth (Fan and Gijbels, 1995) and linear combination of kernels with variable bandwidths (Wolpert, Clyde and Tu, 2011).
There is a separate literature on estimating a time-varying covariance matrix . This is particular of interest in applications where volatilities and co-volatilities evolve through non constant paths. One popular approach estimates via an exponentially weighted moving average (EWMA; see, e.g., Tsay, 2005). This approach uses a single time-constant smoothing parameter , with extensions to accommodate locally-varying smoothness not straightforward due to the need to maintain positive semidefinite at every time. To allow for higher flexibility in the dynamic of the covariances, generalizations of EWMA have been proposed including the diagonal vector ARCH model (DVEC), (Bollerslev, Engle and Wooldridge, 1988) and its variant, the BEKK model (Engle and Kroner, 1995). These models are computationally demanding and are not designed for moderate to large . DCC-GARCH (Engle, 2002) improves the computational tractability of the previous approaches through a two-step formulation. However, the univariate GARCH assumed for the conditional variances of each time series and the higher level GARCH models with the same parameters regulating the evolution of the time varying conditional correlations, restrict the evolution of the variance and covariance matrices. PC-GARCH (Ding, 1994 and Burns, 2005) and O-GARCH (Alexander, 2001) perform dimensionality reduction through a latent factor formulation (see also van der Wiede, 2002). However, time-constant factor loadings and uncorrelated latent factors constrain the evolution of .
Such models fall far short of our goal of allowing to be fully flexible with the dependence between and varying with not just the time-lag but also with time. In addition, these models do not handle missing data easily and tend to require long series for accurate estimation (Burns, 2005). Accommodating changes in continuous time is important in many applications, and avoids having the model be critically dependent on the time scale, with inconsistent models obtained as time units are varied.
Wilson and Ghahramani (2010) join machine learning and econometrics efforts by proposing a model for both mean and covariance regression in multivariate time series, improving previous work of Bru (1991) on Wishart processes in terms of computational tractability and scalability, allowing a more complex structure of dependence between and . Specifically, they propose a continuous time Generalised Wishart Process (GWP), which defines a collection of positive semi-definite random matrices with Wishart marginals. Nonparametric mean regression for is also considered via GP priors; however, the trajectories of means and covariances inherit the smooth behavior of the underlying Gaussian processes, limiting the flexibility of the approach in times exhibiting sharp changes.
Even for iid observations from a multivariate normal model with a single time stationary covariance matrix, there are well known problems with Wishart priors motivating a rich literature on dimensionality reduction techniques based on factor and graphical models. There has been abundant recent interest in applying such approaches to dynamic settings. Refer to Nakajima and West (2012) and the references cited therein for recent literature on Bayesian dynamic factor models for multivariate stochastic volatility. Their approach allows the factor loadings to evolve dynamically over time, while including sparsity through a latent thresholding approach, leading to apparently improved performance in portfolio allocation. They utilize a time-varying discrete-time autoregressive model, which allows the dependence in the covariance matrices and to vary as a function of both and . However, the result is an extremely richly parameterized and computationally challenging model, with selection of the number of factors proceeding by cross validation. Our emphasis is instead on developing continuous time stochastic processes for and , which accommodate locally-varying smoothness.
Fox and Dunson (2011) propose an alternative Bayesian covariance regression (BCR) model, which defines the covariance matrix as a regularized quadratic function of time-varying loadings in a latent factor model, characterizing the latter as a sparse combination of a collection of unknown Gaussian process (GP) dictionary functions. Although their approach provides a continuous time and highly flexible model that accommodates missing data and scales to moderately large , there are two limitations motivating this article. Firstly, their proposed covariance stochastic process assumes a stationary dependence structure, and hence tends to under-smooth during periods of stability and over-smooth during periods of sharp changes. Secondly, the well known computational problems with usual GP regression are inherited, leading to difficulties in scaling to long series and issues in mixing of MCMC algorithms for posterior computation.
1.3 Contribution and outline
Our proposed LAF process instead includes dictionary functions that are generated from nested Gaussian processes (nGP) (Zhu and Dunson, 2012). Such nGP reduces the GP computational burden involving matrix inversions from to , with denoting the length of the time series, while also allowing flexible locally-varying smoothness. Marginalizing out the latent factors, we obtain a stochastic process that inherits these advantages. We also develop a different and more computationally efficient approach to computation under this new model and propose online implementation, which can accommodate streaming data. In Section 2, we describe LAF structure with particular attention to prior specification. Section 3 explores the main features of the Gibbs sampler for posterior computation and outlines the steps for a fast online updating approach. In Section 4 we compare our model to BCR and to some of the most quoted models for multivariate stochastic volatility, through simulation studies. Finally in Section 5 an application to stock market indices across countries is examined.
2 Locally Adaptive Factor Processes
2.1 Notation and motivation
Our focus is on defining a novel locally adaptive factor (LAF) process for . In particular, taking a Bayesian approach, we define a prior , where is a probability measure over the space of -variate mean-covariance processes on . In particular, each element of corresponds to a realization of the stochastic process , and the measure assigns probabilities to a -algebra of subsets of .
Although the proposed class of LAF processes can be used much more broadly, in conducting inferences in this article, we focus on the simple case in which data consist of vectors collected at times , for . These times can be unequally-spaced, or collected under an equally-spaced design with missing observations. An advantage of using a continuous-time process is that it is trivial to allow unequal spacing, missing data, and even observation times across which only a subset of the elements of are observed. We additionally make the simplifying assumption that
It is straightforward to modify the methodology to accommodate substantially different observation models.
2.2 LAF specification
A common strategy in modeling of large matrices is to rely on a lower-dimensional factorization, with factor analysis providing one possible direction. Sparse Bayesian factor models have been particularly successful in challenging cases, while having advantages over frequentist competitors in incorporating a probabilistic characterization of uncertainty in the number of factors as well as the parameters in the loadings and residual covariance. For recent articles on Bayesian sparse factor analysis for a single large covariance matrix, refer to Bhattacharya and Dunson (2011), Pati et al. (2012) and the references cited there-in.
In our setting, we are instead interested in letting the mean vector and the covariance matrix vary flexibly over time. Extending the usual factor analysis framework to this setting, we say that if \cref@addtoresetequationparentequation
where is a matrix of constant coefficients, , while and are matrices comprising continuous dictionary functions evolving in time through nGP, and .
Restricting our attention on the generic element of the matrix (the same holds for ), the nGP provides a highly flexible stochastic process on the dictionary functions whose smoothness, explicitly modeled by their th order derivatives via stochastic differential equations (SDEs), is expected to be centered on a local instantaneous mean function , which represents a higher-level Gaussian Process (GP), that induces adaptivity to locally-varying smoothing. Specifically, we let
where , , and are independent Gaussian white noise processes with mean , for all , and covariance function if , otherwise. This formulation naturally induces a stochastic process for with varying smoothness, where , and initialization at based on the assumption
The Markovian property implied by SDEs in (2a) and (2b) represents a key advantage in terms of computational tractability as it allows a simple state space formulation. In particular, referring to Zhu and Dunson (2012) for and (this can be easily extended for higher and ), and for sufficiently small, the process for along with its first order derivative and the local instantaneous mean follow the approximated state equation
where , with .
Similarly to the nGP specification for the elements in , we can represent the nested Gaussian Process for with the following state equation
independently for , where , with . Similarly to
There are two crucial aspects to highlight. Firstly, this formulation allows continuous time and an irregular grid of observations over by relating the latent states at to those at through the distance between and where represents a discrete order index and the time value related to the th observation. Secondly, compared to Zhu and Dunson (2012) our approach represents an important generalization in: (i) extending the analysis to the multivariate case (i.e. is a p-dimensional vector instead of a scalar) and (ii) accommodating locally adaptive smoothing not only on the mean but also on the time-varying covariance functions.
2.3 LAF interpretation
where with and elements for . In LAF formulation we assume moreover that the time-varying factor loadings matrix is a sparse linear combination, with respect to the weights of the matrix , of a much smaller set of continuous nested Gaussian Processes comprising the , with , matrix . As a result
Such a decomposition plays a crucial role in further reducing the number of nGP processes to be modeled from to leading to a more computationally tractable formulation in which the induced follows a locally adaptive factor process where \cref@addtoresetequationparentequation
There is a literature on using Bayesian factor analysis with time-varying loadings, but essentially all the literature assumes discrete-time dynamics on the loadings while our focus is instead on allowing the loadings, and hence the induced processes, to evolve flexibly in continuous time. Hence, we are most closely related to the literature on Gaussian process latent factor models for spatial and temporal data; refer, for example, to Lopes, Salazar and Gamerman (2008) and Lopes, Gamerman and Salazar (2011). In these models, the factor loadings matrix characterizes spatial dependence, with time varying factors accounting for dynamic changes.
Fox and Dunson (2011) instead allow the loadings matrix to vary through a continuous time stochastic process built from latent dictionary functions independently for all , with the squared exponential correlation function having . In our work we follow the lead of Fox and Dunson (2011) in using a nonparametric latent factor model as in (5)-(6), but induce fundamentally different behavior on by carefully modifying the stochastic processes for the dictionary functions.
Note that the above decomposition of is not unique. Potentially we could constrain the loadings matrix to enforce identifiability (Geweke and Zhou, 1996), but this approach induces an undesirable order dependence among the responses (Aguilar and West, 2000, West, 2003, Lopes and West, 2004, Carvalho et al., 2008). Given our focus on estimation of we follow Ghosh and Dunson (2009) in avoiding identifiability constraints, as such constraints are not necessary to ensure identifiability of the induced mean and covariance . The characterization of the class of time-varying covariance matrices is proved by Lemma 2.1 of Fox and Dunson (2011) which states that for and sufficiently large, any covariance regression can be decomposed as in (1b). Similar results are obtained for the mean process.
2.4 Prior Specification
We adopt a hierarchical prior specification approach to induce a prior on with the goal of maintaining simple computation and allowing both covariances and means to evolve flexibly over continuous time. Specifically
Recalling the nGP assumption for the elements of : within LAF representation, we assume for each each element and of the matrices and respectively, the following priors
independently for each ; where denotes the Inverse Gamma distribution with shape and scale .
Similarly, the variances and in the state equation representation of the nGP for each are assumed
independently for each .
To address the issue related to the selection of the number of dictionary elements a shrinkage prior is proposed for . In particular, following Bhattacharya and Dunson (2011) we assume:
Note that if the expected value for is greater than . As a result, as goes to infinity, tends to infinity shrinking towards zero. This leads to a flexible prior for with a local shrinkage parameter and a global column-wise shrinkage factor which allows many elements of being close to zero as increases.
Finally for the variances of the error terms in vector , we assume the usual inverse gamma prior distribution. Specifically
independently for each .
3 Posterior Computation
For a fixed truncation level and a latent factor dimension , the algorithm for posterior computation alternates between a simple and efficient simulation smoother step (Durbin and Koopman, 2002) to update the state space formulation of the nGP in LAF prior, and standard Gibbs sampling steps for updating the parametric component parameters from their full conditional distributions.
3.1 Gibbs Sampling
We outline here the main features of the algorithm for posterior computation based on observations for , while the complete algorithm is provided in the Appendix.
A. Given and , a multivariate version of the MCMC algorithm proposed by Zhu and Dunson (2012) draws posterior samples from each dictionary element’s function , its first order derivative , the corresponding instantaneous mean , the variances in the state equations , and the variances of the error terms in the observation equation with .
B. Given , , and we implement a block sampling of , , ,, and following a similar approach as in step A.
C. Conditioned on , , and , and recalling the shrinkage prior for the elements of in (8), we update , each local shrinkage hyperparameter and the global shrinkage hyperparameters following the standard conjugate analysis.
D. Given the posterior samples from , , and the realization of LAF process for conditioned on the data is
3.2 Hyperparameter interpretation
We now focus our attention on the hyperparameters of the priors for , , and . Several simulation studies have shown that the higher the variances in the latent state equations, the better our formulation accommodates locally adaptive smoothing for sudden changes in . A theoretical support for this data-driven consideration can be identified in the connection between the nGP and the nested smoothing splines. It has been shown by Zhu and Dunson (2012) that the posterior mean of the trajectory with reference to the problem of nonparametric mean regression under the nGP prior can be related to the minimizer of the equation
where is the locally instantaneous function and and regulate the smoothness of the unknown functions and respectively, leading to less smoothed patterns when fixed at low values. The resulting inverse relationship between these smoothing parameters and the variances in the state equation, together with the results in the simulation studies, suggest to fix the hyperparameters in the Inverse Gamma prior for , , and so as to allow high variances in the case in which the time series analyzed are expected to have strong changes in their covariance (or mean) dynamic. A further confirmation of the previous discussion is provided by the structure of the simulation smoother required to update the dictionary functions in our Gibbs Sampling for posterior computation. More specifically, the larger the variances of , and , in the state equations, with respect to those of the vector of observations , the higher is the weight associated to innovations in the filtering and smoothing techniques, allowing for less smoothed patterns both in the covariance and mean structures (see Durbin and Koopman, 2002).
In practical applications, it may be useful to obtain a first estimate of to set the hyperparameters. More specifically, can be the output of a standard moving average on each time series , while can be obtained by a simple estimator, such as the EWMA procedure. With these choices, the recursive equation
become easy to implement.
3.3 Online Updating
The problem of online updating represents a key point in multivariate time series with high frequency data. Referring to our formulation, we are interested in updating an approximated posterior distribution for once a new vector of observations is available, instead of rerunning posterior computation for the whole time series.
Using the posterior estimates of the Gibbs sampler based on observations available up to time , , it is easy to implement (see in Appendix) a highly computationally tractable online updating algorithm which alternates between steps A, B and D outlined in the previous section for the new set of observations, and that can be initialized at using the one step ahead predictive distribution for the latent state vectors in the state space formulation.
Note that the initialization procedure for latent state vectors in the algorithm depends on the sample moments of the posterior distribution for the latent states at . As is known for Kalman smoothers (see, e.g., Durbin and Koopman, 2001), this could lead to computational problems in the online updating due to the larger conditional variances of the latent states at the end of the sample (i.e., at ). To overcome this problem, we replace the previous assumptions for the initial values with a data-driven initialization scheme. In particular, instead of using only the new observations for the online updating, we run the algorithm for , with small, and choosing a diffuse but proper prior for the initial states at . As a result the distribution of the smoothed states at is not anymore affected by the problem of large conditional variances leading to better online updating performance.
4 Simulation Studies
The aim of the following simulation studies is to compare the performance of our proposed LAF with respect to BCR, and to the models for multivariate stochastic volatility most widely used in practice, specifically: EWMA, PC-GARCH, GO-GARCH and DCC-GARCH. In order to assess whether and to what extent LAF can accommodate, in practice, even sharp changes in the time-varying means and covariances and to evaluate the costs of our flexible approach in settings where the mean and covariance functions do not require locally adaptive estimation techniques, we focus on two different sets of simulated data. The first is based on an underlying structure characterized by locally varying smoothness processes, while the second has means and covariances evolving in time through smooth processes. In the last subsection we also analyze the performance of the proposed online updating algorithm.
4.1 Simulated Data
A. Locally varying smoothness processes: We generate a set of -dimensional observations for each in the discrete set , from the latent factor model in (5) with . To allow sharp changes of means and covariances in the generating mechanism, we consider a (i.e. ) matrix of time-varying functions adapted from Donoho and Johnstone (1994) with locally-varying smoothness (more specifically we choose ‘bumps’ functions). The latent mean dictionary elements are simulated from a Gaussian process with length scale , while the elements in matrix can be obtained from the shrinkage prior in (8) with . Finally the elements of the diagonal matrix are sampled independently from .
B. Smooth processes: We consider the same dataset of -dimensional observations with investigated in Fox and Dunson (2011, section 4.1). The settings are similar to the previous with exception of which are (i.e. ) matrices of smooth GP dictionary functions with length-scale .
4.2 Estimation Performance
A. Locally varying smoothness processes:
Posterior computation for LAF is performed by using truncation levels (at higher level settings we found that the shrinkage prior on results in posterior samples of the elements in the additional columns concentrated around ). We place a prior on the precision parameters and choose . As regards the nGP prior for each dictionary element with and , we choose diffuse but proper priors for the initial values by setting and place an prior on each and in order to allow less smoothed behavior according to a previous graphical analysis of estimated via EWMA. Similarly we set in the prior for the initial values of the latent state equations resulting from the nGP prior for , and consider to balance the rough behavior induced on the nonparametric mean functions by the settings of the nGP prior on , as suggested from previous graphical analysis. Note also that for posterior computation, we first scale the predictor space to , leading to for .
For inference in BCR we consider the same previous hyperparameters setting for and priors as well as the same truncation levels and , while the length scale in GP prior for and has been set to 10 using the data-driven heuristic outlined in Fox and Dunson (2011). In both cases we run Gibbs iterations discarding the first as burn-in and thinning the chain every samples.
As regards the other approaches, EWMA has been implemented by choosing the smoothing parameter that minimizes the mean squared error (MSE) between the estimated covariances and the true values. PC-GARCH algorithm follows the steps provided by Burns (2005) with GARCH(1,1) assumed for the conditional volatilities of each single time series and the principal components. GO-GARCH and DCC-GARCH recall the formulations provided by van der Wiede (2002) and Engle (2002) respectively, assuming a GARCH(1,1) for the conditional variances of the processes analyzed, which proves to be a correct choice in many financial applications and also in our setting. Note that, differently from LAF and BCR, the previous approaches do not model explicitly the mean process but work directly on the innovations . Therefore in these cases we first model the conditional mean via smoothing spline and in a second step we estimate the models working on the innovations. The smoothing parameter for spline estimation has been set to , which was found to be appropriate to best reproduce the true dynamic of .
B. Smooth processes:
We mainly keep the same setting of the previous simulation study with few differences. Specifically, and has been fixed to and respectively (also in this case the choice of the truncation levels proves to be appropriate, reproducing the same results provided in the simulation study of Fox and Dunson (2011) where and ). Moreover the scale parameters in the Inverse Gamma prior on each and has been set to in order to allow a smoother behavior according to a previous graphical analysis of estimated via EWMA, but without forcing the nGP prior to be the same as a GP prior. Following Fox and Dunson (2011) we run Gibbs iterations which proved to be enough to reach convergence, and discarded the first as burn-in.
In the first set of simulated data, we analyzed mixing by the Gelman-Rubin procedure (see e.g. Gelman and Rubin, 1992), based on potential scale reduction factors computed for each chain by splitting the sampled quantities in pieces of same length. The analysis shows slower mixing for BCR compared with LAF. Specifically, in LAF of the chains have a potential reduction factor lower than , with a median equal to , while in LAF the th quantile is and the median equals . Less problematic is the mixing for the second set of simulated data, with potential scale reduction factors having median equal to for both approaches and th quantiles equal to and for LAF and BCR, respectively.
Figure 2 compares, in both simulated samples, true and posterior mean of the process over the predictor space together with the point-wise highest posterior density (hpd) intervals for LAF and BCR. From the upper plots we can clearly note that our approach is able to capture conditional heteroscedasticity as well as mean patterns, also in correspondence of sharp changes in the time-varying true functions. The major differences compared to the true values can be found at the beginning and at the end of the series and are likely to be related to the structure of the simulation smoother which also causes a widening of the credibility bands at the very end of the series; for references regarding this issue see Durbin and Koopman (2001). However, even in the most problematic cases, the true values are within the bands of the hpd intervals. Much more problematic is the behavior of the posterior distributions for BCR which badly over-smooth both covariance and mean functions leading also to many hpd intervals not containing the true values. Bottom plots in Figure 2 show that the performance of our approach is very close to that of BCR, when data are simulated from a model where the covariances and means evolve smoothly across time and local adaptivity is therefore not required. This happens even if the hyperparameters in LAF are set in order to maintain separation between nGP and GP prior, suggesting large support property for the proposed approach.
|Mean||90th Quantile||95th Quantile||Max|
|Mean||90th Quantile||95th Quantile||Max|
The comparison of the summaries of the squared errors between true process and the estimated elements of standardized with the range of the true processes and respectively, once again confirms the overall better performance of our approach relative to all the considered competitors. Table 1 shows that, when local adaptivity is required, LAF provides a superior performance having standardized residuals lower than those of the other approaches. EWMA seems to provide quite accurate estimates, but it is important to underline that we choose the optimal smoothing parameter in order to minimize the MSE between estimated and true parameters, which are clearly not known in practical applications. Different values of reduces significantly the performance of EWMA, which shows also lack of robustness. The closeness of the summaries of LAF and BCR in Table 2 confirms the flexibility of LAF even in settings where local adaptivity is not required and highlights the better performance of the two approaches with respect to the other competitors also when smooth processes are investigated.
To better understand the improvement of our approach in allowing locally varying smoothness and to evaluate the consequences of the over-smoothing induced by BCR on the distribution of with consider Figure 3 which shows, for some selected series in the first simulated dataset, the time varying mean together with the point-wise and quantiles of the marginal distribution of induced respectively by the true mean and true variance, the posterior mean of and from our proposed approach and the posterior mean of the same quantities from BCR. We can clearly see that the marginal distribution of induced by BCR is over-concentrated near the mean, leading to incorrect inferences. Note that our proposal is also able to accommodate heavy tails, a typical characteristic in financial series.
4.3 Online Updating Performance
To analyze the performance of the online updating algorithm in LAF model, we simulate new observations with , considering the same and used in the generating mechanism for the first simulated dataset and taking the subsequent observations of the bumps functions for the dictionary elements ; finally the additional latent mean dictionary elements are simulated as before maintaining the continuity with the previously simulated functions .
According to the algorithm described in subsection , we fix , , , , and at their posterior mean from the previous Gibbs sampler and consider the last three observations , and (i.e. ) to initialize the simulation smoother in through the proposed data-driven initialization approach. Posterior computation shows good performance in terms of mixing, and convergence is assessed after Gibbs iterations with a small burn-in of .
Figure 4 compares true mean and covariance to posterior mean of a select set of components of including also the hpd intervals. The results clearly show that the online updating is characterized by a good performance which allows to capture the behavior of new observations conditioning on the previous estimates. Note that the posterior distribution of the approximated mean and covariance functions tends to slightly over-estimate the patterns of the functions at sharp changes, however also in these cases the true values are within the bands of the credibility intervals. Finally note that the data-driven initialization ensures a good behavior at the beginning of the series, while the results at the end have wider uncertainty bands as expected.
5 Application Study
Spurred by the recent growth of interest in the dynamic dependence structure between financial markets in different countries, and in its features during the crises that have followed in recent years, we applied our LAF to the multivariate time series of the main national stock market indices.
5.1 National Stock Indices (NSI), Introduction and Motivation
National Stock Indices represent technical tools that allow, through the synthesis of numerous data on the evolution of the various stocks, to detect underlying trends in the financial market, with reference to a specific basis of currency and time. More specifically, each Market Index can be defined as a weighted sum of the values of a set of national stocks, whose weighting factors is equal to the ratio of its market capitalization in a specific date and overall of the whole set on the same date.
In this application we focus our attention on the multivariate weekly time series of the main (i.e. ) National Stock Indices from to . Figure 5 shows the main features in terms of stationarity, mean patterns and volatility of two selected NSI downloaded from http://finance.yahoo.com/. The non-stationary behavior, together with the different bases of currency and time, motivate the use of logarithmic returns , where is the value of the National Stock Index at time . Beside this, the marginal distribution of log returns shows heavy tails and irregular cyclical trends in the nonparametric estimation of the mean, while EWMA estimates highlight rapid changes of volatility during the financial crises observed in the recent years. All these results, together with large settings and high frequency data typical in financial fields, motivate the use of our approach to obtain a better characterization of the time-varying dependence structure among financial markets.
5.2 LAF for National Stock Index (NSI)
Posterior computation is performed by first rescaling the predictor space to and using the same setting of the first simulation study, with the exception of the truncation levels fixed at and (which we found to be sufficiently large from the fact that the last few columns of the posterior samples for assumed values close to ) and the hyperparameters of the nGP prior for each and with and , set to and to capture also rapid changes in the mean functions according to Figure 5. Missing values in our dataset do not represent a limitation since the Bayesian approach allows us to update our posterior considering solely the observed data. We run Gibbs iterations with a burn-in of . Examination of trace plots of the posterior samples for showed no evidence against convergence.
Posterior distributions for the variances in Figure 6 demonstrate that we are clearly able to capture the rapid changes in the dynamics of volatility that occur during the world financial crisis of , in early with the Greek debt crisis and in the summer of with the financial speculation in government bonds of European countries together with the rejection of the U.S. budget and the downgrading of the United States rating.
Moreover, the resulting marginal distribution of the log returns induced by the posterior mean of and , shows that we are also able to accommodate heavy tails as well as mean patterns cycling irregularly between slow and more rapid changes.
From the correlations between NASDAQ and the other National Stock Indices (based on the posterior mean of the covariances function) in Figure 7, we can immediately notice the presence of a clear geo-economic structure in world financial markets (more evident in LAF than in BCR), where the dependence between the U.S. and European countries is systematically higher than that of South East Asian Nations (Economic Tigers), showing also different reactions to crises. Plots at the top of the Figure 8 confirms the above considerations showing how Western countries exhibit more connection with countries closer in terms of geographical, political and economic structure; the same holds for Eastern countries where we observe a reversal of the colored curves. As expected, Russia is placed in a middle path between the two blocks. A further element that our model captures about the structure of the markets is shown in the plots at the bottom of Figure 8. The time-varying regression coefficients obtained from the standard formulas of the conditional normal distribution based on the posterior mean of highlight clearly the increasing dependence of European countries with higher crisis in sovereign debt and Germany, which plays a central role in Eurozone as expected.
The flexibility of the proposed approach and the possibility of accommodating varying smoothness in the trajectories over time, allow us to obtain a good characterization of the dynamic dependence structure according with the major theories on financial crisis. Top plot in Figure 7 shows how the change of regime in correlations occurs exactly in correspondence to the burst of the U.S. housing bubble (A), in the second half of . Moreover we can immediately notice that the correlations among financial markets increase significantly during the crises, showing a clear international financial contagion effect in agreement with other theories on financial crisis (see, e.g., Baig and Goldfaijn, 1999, and Claessens and Forbes, 2009). As expected the persistence of high levels of correlation is evident during the global financial crisis between late-2008 and end-2009 (C), at the beginning of which our approach also captures a sharp variation in the correlations between the U.S. and Economic Tigers, which lead to levels close to those of Europe. Further rapid changes are identified in correspondence of Greek crisis (D), the worsening of European sovereign-debt crisis and the rejection of the U.S. budget (F) and the recent crisis of credit institutions in Spain together with the growing financial instability Eurozone (G). Finally, even in the period of U.S. financial reform launched by Barack Obama and EU efforts to save Greece (E), we can notice two peaks representing respectively Irish debt crisis and Portugal debt crisis. Note also that BCR, as expected, tends to over-smooth the dynamic dependence structure during the financial crisis, proving to be not able to model the sharp change in the correlations between USA NASDAQ and Economic Tigers during late-2008, and the two peaks representing respectively Irish and Portugal debt crisis at the beginning of 2011.
5.3 National Stock Indices, Updating and Predicting
The possibility to quickly update the estimates and the predictions as soon as new data arrive, represents a crucial aspect to obtain quantitative informations about the future scenarios of the crisis in financial markets. To answer this goal, we apply the online updating algorithm presented in subsection , to the new set of weekly observations from to conditioning on posterior estimates of the Gibbs sampler based on observations available up to . We initialized the simulation smoother algorithm with the last observations of the previous sample.
Plots at the top of Figure 9 show, for selected National Stock Indices, the new observed log returns (black) together with the mean and the and quantiles of the marginal distribution (red) and conditional distribution (green) of with . We use standard formulas of the multivariate normal distribution based on the posterior mean of the updated after Gibbs iterations with a burn in of . This is sufficient for convergence based on examining trace plots of the time-varying mean and covariance matrices. From these results, we can clearly notice the good performance of our proposed online updating algorithm in obtaining a characterization for the distribution of new observations. Also note that the multivariate approach together with a flexible model for the mean and covariance, allow for significant improvements when the conditional distribution of an index given the others are analyzed.