Forecasting with time-varying vector autoregressive models

# Forecasting with time-varying vector autoregressive models

K. Triantafyllopoulos111Department of Probability and Statistics, Hicks Building, University of Sheffield, Sheffield S3 7RH, UK, Email: k.triantafyllopoulos@sheffield.ac.uk, Tel: +44 114 222 3741, Fax: +44 114 222 3759.
July 5, 2019
###### Abstract

The purpose of this paper is to propose a time-varying vector autoregressive model (TV-VAR) for forecasting multivariate time series. The model is casted into a state-space form that allows flexible description and analysis. The volatility covariance matrix of the time series is modelled via inverted Wishart and singular multivariate beta distributions allowing a fully conjugate Bayesian inference. Model performance and model comparison is done via the likelihood function, sequential Bayes factors, the mean of squared standardized forecast errors, the mean of absolute forecast errors (known also as mean absolute deviation), and the mean forecast error. Bayes factors are also used in order to choose the autoregressive order of the model. Multi-step forecasting is discussed in detail and a flexible formula is proposed to approximate the forecast function. Two examples, consisting of bivariate data of IBM shares and of foreign exchange (FX) rates for 8 currencies, illustrate the methods. For the IBM data we discuss model performance and multi-step forecasting in some detail. For the FX data we discuss sequential portfolio allocation; for both data sets our empirical findings suggest that the TV-VAR models outperform the widely used VAR models.

Some key words: Bayesian forecasting, multivariate time series, stochastic volatility, state space, foreign exchange rates, portfolio allocation.

## 1 Introduction

Over the past 30 years there has been an emerging literature on multivariate time series (Hamilton, 1995; Tsay, 2002; Lütkepohl, 2005). Multivariate time series forecasting is required in many financial applications, for example to enable optimal portfolio allocation or to construct trading strategies over sectors of the market, or exchange rates. In this direction vector autoregressive models (Sims, 1980; Tiao and Tsay, 1989; Ni and Sun, 2003) are well established as flexible and useful multivariate models. A comprehensive treatment of these models can be found in Lütkepohl (2005).

A recent advance in time series analysis is the development of autoregressive (AR) models, and more generally autoregressive moving average models, with time-varying coefficients. Such models are developed as in Kitagawa and Gersch (1985, 1996), Dahlhaus (1997), West et al. (1999), Prado and Huerta (2002), Andrieu et al. (2003), Lundbergh et al. (2003), Francq and Gautier (2004), Moulines et al. (2005), Huerta and Prado (2006), Abramovich et al. (2007), Triantafyllopoulos and Nason (2007), and Zhu and Wu (2007). All these studies refer to univariate time series; attempts to model vector time series with time-varying autoregressive (TV-VAR) models include Jiang and Kitagawa (1993), Sarantis (2006), Sato et al. (2007), and Triantafyllopoulos (2007). Although such models are capable of capturing the time-varying behaviour of time series data, it is desirable that multivariate stochastic volatility is included in the model, in particular for situations of financial time series forecasting. For example Uhlig (1997) uses a vector autoregressive (VAR) model that incorporates a stochastic volatility component, but its application is relatively limited due to the fact that one needs to resort to simulation-based estimation techniques (in particular Monte Carlo simulation). In a similar direction Aguilar and West (2000) describe the use of particle filters for portfolio allocation, but for forecasting and in particular for sequential multi-step forecasting, it is desirable to resort to analytic methods that are fast and computationally more stable than their simulation-based counterparts.

Suppose that the vector time series , which is observed at roughly equal intervals of time , is generated by the autoregressive model

 yt=ϕ0t+Φ1tyt−1+⋯+Φdtyt−d+ϵt,ϵt∼Np(0,Σt),t>d, (1)

where is a vector, is a matrix , is a positive integer (known as the lag order of the autoregression), is a covariance matrix, the sequence of innovations is independent and follows a -variate Gaussian distribution.

In VAR estimation, a common practice is to recast the model in VAR form of order 1, known as the reduced form, see e.g. Tiao and Tsay (1989). It is possible to extend this to our model (1), and write

 ⎡⎢ ⎢ ⎢ ⎢ ⎢⎣ytyt−1⋮yt−d+1⎤⎥ ⎥ ⎥ ⎥ ⎥⎦=⎡⎢ ⎢ ⎢ ⎢⎣ϕ0t0⋮0⎤⎥ ⎥ ⎥ ⎥⎦+⎡⎢ ⎢ ⎢ ⎢ ⎢⎣Φ1tΦ2t⋯Φd−1,tΦdtIp0⋯00⋮⋮⋱⋮⋮00⋯Ip0⎤⎥ ⎥ ⎥ ⎥ ⎥⎦⎡⎢ ⎢ ⎢ ⎢ ⎢⎣yt−1yt−2⋮yt−d⎤⎥ ⎥ ⎥ ⎥ ⎥⎦+⎡⎢ ⎢ ⎢ ⎢⎣ϵt0⋮0⎤⎥ ⎥ ⎥ ⎥⎦,

which can be expressed as , where , , and . Now the model for is a TV-VAR model of order 1. Although it can be claimed that possesses a more compact model as that of , in principle it is harder to derive multi-step forecasting equations for , as the forecast horizon will be included in the dimension of and this can cause technical problems. Furthermore, we notice that has a singular covariance matrix (only elements of are non-degenerate) and so if one wishes to apply a prior distribution on the covariance matrix of , this has to be a singular multivariate distribution, e.g. a singular Wishart distribution. As a result the above reduced formulation causes some technical difficulties. These difficulties may be reasonably overcome when , the dimension of , and , the AR order, are not too large, but if at least one of or are large, then the advantage of employing the reduced form model, seems to be lost.

The aim of this paper is to suggest alternative state-space forms which will enable the modeller to overcome many of the difficulties mentioned above. In particular, this paper proposes Bayesian estimation for (1) after it is put in an appropriate state-space form. The proposed model includes a stochastic evolution that can update the volatility covariance matrix from time to . This evolutionary law is supported by inverted Wishart and singular multivariate beta distributions, following developments of Uhlig (1994) and Triantafyllopoulos (2008). We discuss multi-step forecasting for this model and we propose a simple formula that approximates the true forecast function. The choice of the autoregressive order is an important problem (see e.g. de Waele and Broersen, 2003; or Abramovich et al., 2007), which here is resolved by using sequential Bayes factors as a means of model selection. For model judgment and model comparison we discuss several measures of goodness of fit (mean of squared standardized forecast errors, mean absolute forecast error, and mean forecast error) as well as a likelihood-based criterion and sequential Bayes factors. Two examples, consisting of bivariate data of IBM shares and foreign exchange (FX) rates of 8 currencies, illustrate the methods. For the IBM data set we discuss in detail model performance and multi-step forecasting; for the FX data we discuss a sequential portfolio selection strategy. For both data sets the time-varying model is found to outperform its time-invariant counterparts and as a result we propose that TV-VAR models are superior to VAR models.

The rest of the paper is organized as follows. In the next section we define the model, which estimation and forecasting is discussed in Section 3. Sections 4 and 5 include the data analyses and the paper closes with brief concluding comments.

## 2 Model setting

Consider model (1), with and generated by a random walk. This model postulates that and . Other evolutionary models may be considered, e.g. a Markovian evolution, but here we focus our attention on the random walk model.

Write

 Φ′t=[ϕ0tΦ1t⋯Φdt],

where denotes the transpose matrix of so that is a matrix and is a matrix.

Then we can express the above random walk as

 Φt=Φt−1+Ωt, (2)

where we assume that the random matrix follows a matrix-variate Gaussian distribution, i.e. . This implies that follows the -variate Gaussian distribution , where denotes the column stacking operator of a matrix and denotes the Kronecker product. Here the covariance matrix is assumed known, but later we discuss its specification using several discount factors. It is also assumed that is independent of and that for any , is independent of .

Model (1) can be written in state-space form as

 (3)

where is generated by evolution (2). We use the notation for the information set up to time , i.e. .

It remains to define an evolution for . Here we use the model of Triantafyllopoulos (2008), which adopts the multiplicative evolutionary law of Uhlig (1994), based on the convolution of the Wishart and the singular multivariate beta distribution. For the purpose of this paper, we assume that is a symmetric positive-definite matrix and we propose the following law

 Σ−1t=kU(Σ−1t−1)′BtU(Σ−1t−1), (4)

where denotes the upper triangular matrix of the Choleski decomposition of , follows independently of a singular multivariate beta distribution with degrees of freedom and , written . The singularity of this beta distribution is reflected on the fact that , and so the matrix is singular. The singular multivariate beta density, which is defined on the Stiefel manifold, replaces the determinant of (which is zero) by the product of the positive eigenvalues of that matrix; for more details on the beta distribution the reader is referred to Uhlig (1994). Here we set , where is a discount factor and is defined as . The choice of the beta distribution and the choice of are done so that resembles a random walk type evolution, i.e. and , where denotes covariance matrix and denotes the column stacking operator of a covariance matrix. This simply says that going from time to , the expectation of the volatility remains unchanged, and the respective covariance matrix of the vectorized form of the volatility is increased.

For , the model consists of equations (2), (3) and (4), together with the priors

 Φd∼N(dp+1)×pN(md,Pd,Σd)andΣ−1d∼Wp(n+2p,Sd), (5)

the latter of which, denotes a Wishart distribution with degrees of freedom and parameter matrix . The quantities are assumed known. A weakly informative prior specification suggests , and . The covariance matrix , which is responsible for the shocks of , can be defined by using a discount matrix , i.e. , with and being known at time so that . The discount matrix is the diagonal matrix of discount factors , i.e. . Similar discount models are discussed in West and Harrison (1997). The discount factors and can be regarded as hyperparameters and they can be specified by considering goodness of fit criteria.

We note that, although the above model is very general, subclasses of this include interesting and applicable models. For example, if , then we obtain a traditional stochastic volatility model and (Triantafyllopoulos, 2008). If , are all time-invariant, then we obtain a VAR model with stochastic volatility (Uhlig, 1997). If is time-invariant (this can be achieved by setting , but we need to replace by ), then we obtain the model of Triantafyllopoulos (2007); such a model may be useful for very short periods of time (when the volatility can be assumed time-invariant), or for time series that do not exhibit a heteroscedastic behaviour (such data may arise in environmental studies).

## 3 Estimation and forecasting

### 3.1 Bayesian estimation

Bayesian estimation follows from a generalization of the Kalman filter, the details of which can be found in Triantafyllopoulos (2008). Here we briefly describe the algorithm. Suppose that at time , the posterior distributions of , given , and are given by and (here denotes the inverted Wishart distribution so that follows a Wishart distribution). Moving on to time with information , we have and , where and are defined in the previous section. Then by observing , with information , we update the posteriors at as and , with , , , , , and . is called the adaptive vector (which plays a similar role as the Kalman gain in the standard Kalman filter), and is the one-step ahead forecast error. At time , starting with the priors (5), the above posterior distributions give a recursive algorithm for any .

Given some data the likelihood function of can be derived by

 L(Σd+1,…,ΣN;yN)=N∏t=d+1p(yt|Σt)p(Σt|Σt−1),

where is the density function of conditional on , and is the density function of , given . From the Kalman filter we have and from the singular multivariate beta density of model (4), we obtain

 p(Σt|Σt−1)=π−p/2k−p(n−p)/2Γ((n+1)/2)Γ(n/2)|Lt|−p/2|Σt−1|(n−p)/2|Σt|−(n−p−1)/2,

where and are defined in Section 2, and is the diagonal matrix with elements the non-zero eigenvalues of the matrix . This result is derived by the evolution of the volatility (4) and the singular multivariate beta density of , details of which can be found in Triantafyllopoulos (2008). Then the log-likelihood function of is

 ℓ(Σd+1,…,ΣN;yN) = c−12N∑t=d+1(yt−m′t−1Ft)′Σ−1t(yt−m′t−1Ft)+n−p2N∑t=d+1log|Σt−1| −n−p2N∑t=d+1log|Σt|−p2N∑t=d+1log|Lt|,

where

 c=−Np2log(2π2)−Np(n−p)2logk+NlogΓp((n+1)/2)Γp(n/2).

From , by replacing the posterior mean of in , we can evaluate the above log-likelihood function for the given data .

### 3.2 Stability of Pt

In this section we show that if is a bounded sequence, then the sequence is also bounded. This is an important property because if were unbounded, then the estimates of and the forecasts of would be totally unstable, possibly having infinite variances. In particular, given data , we assume that , where denotes the Eucledian norm of a vector or matrix and is some positive number. This in principle means that by writing , all are bounded scalar time series.

Since is bounded, is also bounded and thus is bounded. From the recursion we have (for all ) and so we can write

 P−1t=Δ(t−d)/2P−1dΔ(t−d)/2+t−d∑i=0Δi/2Ft−iF′t−iΔi/2≈t−d∑i=0Δi/2Ft−iF′t−iΔi/2,

by adopting the weakly informative prior . Then

 ∥P−1t∥≤t−d∑i=0∥Δi/2∥2∥Ft−iF′t−i∥≤Mt−d∑i=0(δi1+⋯+δidp+1)≤Mdp+1∑j=111−δj<∞,

which shows that is bounded. If , one needs to add a factor in the bound of , which again shows that is bounded, except if . Thus, for any prior , the sequence is bounded. Since is non-singular, for any , it follows that the sequence is also bounded, for any prior .

### 3.3 Choice of order d

The performance of the model is usually judged against its forecast performance, which is best described by its forecast distribution. Thus sequential Bayes factors, which for univariate time series are described in West and Harrison (1997), can be used in order to compare and contrast models which differ in the order of the AR. Suppose we have two TV-VAR models, which have respectively lag orders and and thus they consequently differ in the design vectors and , which are functions of and . According to the above, the one-step forecast distributions are Student , i.e. and thus the Bayes factor of model 1 against model 2 is defined by

 Ht(1)=p(yt+1|yt,model 1)p(yt+1|yt,model 2)=(|Q∗2t(1)||Q∗1t(1)|)1/2(βn+e′1,t+1(Q∗1t(1))−1e1,t+1βn+e′2,t+1(Q∗2t(1))−1e2,t+1)−(βn+p)/2,

where is the one-step forecast mean of (under model ), and is the scale matrix of the distribution (see also the next section). Comparison of with 1 can indicate preference of model 1 (if ) or preference of model 2 (if ), while if , the two models are equivalent. Thus we choose the order that yields larger values in the respective forecast function. The above formula for the Bayes factor can also be used more generally for comparison of two models that differ in the values of the hyperparameters, e.g. in the discount factors. the advantage of the above scheme is that it can be applied sequentially and thus indicate preference of a model at each time point. This can lead to sequential model comparison and in the context of this section one can consider schemes where the order of the AR is time-varying, although this is not developed further here.

### 3.4 Multi-step forecasting

Forecasting commences by considering the density of , for some positive integer , known as the forecast horizon. Denote with the -step forecast mean of , i.e. and with the respective -step forecast covariance matrix. Note that in the state-space formulation (3), can be stochastic, as it includes values of . We denote with the estimate of by replacing the unknown values of by the known , for all . Note that if we have

 ˆF′t+h=[1y′t(h−1)⋯y′t(h−d)]

while if we get

 ˆF′t+h=[1y′t(h−1)⋯y′t(1)y′t⋯y′t+h−d]

From the evolution (2) of we obtain

 Φt+h=Φt+h∑i=1Ωt+i.

From the posterior distribution we obtain and so from , the -step forecast density of can be approximated as , where and . Here denotes the -variate Student distribution, from which it follows

 yt(h)=m′tˆFt+handQt(h)=(ˆF′t+hRt(h)ˆFt+h+1)(1−β)k−13β−2St, (6)

where . For , the forecast mean vector and the forecast covariance matrix are exact, as , which, given is not stochastic.

We can define the -step forecast errors as , which approximately follow a distribution, i.e. , where . We can then define the standardized -step forecast errors as , where denotes the symmetric square root matrix of , so that, approximately, , from which we can derive quantiles and so we can build credible bounds for . Finally, from and , we can define standardized versions of as so that and . Then, a measure of goodness of fit is the mean of squared standardized -step forecast errors, defined as , which, if the model fit is good should be close to the vector . A second measure of goodness of fit is the mean absolute forecast error, known also as the mean absolute deviation, defined as , where and indicates absolute value. Finally a third measure of goodness of fit is the mean forecast error, defined by , which can be used to measure how biased are the forecasts. the log-likelihood function (evaluated at the posterior mean of the volatility) and the Bayes factors (see also the previous section) can be used for model performance and model comparison.

For the volatility, from the evolution (4) we have and so we can find some symmetric random matrix with zero mean so that , defining a random walk evolution for . This implies that and one way to support this evolution of is by defining , where follows, independently of , a singular multivariate beta distribution with degrees of freedom and 1/2 respectively. Then, from the Wishart distribution of , we have and so . Then we can write

 St(h)=E(Σt+h|yt)=E(Σt+1|yt)=(1−β)k−13β−2St.

## 4 Forecasting IBM shares and S&P 500 index data

We consider bivariate time series data consisting of 888 log-returns of IBM shares and of the S&P 500 index. The data, which are plotted in Figure 1, are collected in a daily frequency and they are discussed in Tsay (2002, Chapter 9). Here we propose the use of TV-VAR models with stochastic volatility in order to forecast time series and the volatility. Table 1 shows the log-likelihood function (evaluated at the posterior mean of ), the mean of squared standardized one-step forecast errors () and the mean absolute one-step forecast error (), for a variety of TV-VAR models. Here, for the evolution of , we have used a single discount factor so that . As gets small (here ), the performance of the models deteriorates. The largest log-likelihood function (for the given data) is obtained for the model with , , , and it is indicated with boldface in the table. This model produces also decent values in the , being close to 1. It is worth mentioning that large values of the order of the TV-VAR results in models that can approximate time-varying vector moving average models, e.g. here . However, according to Table 1 all models with are inferior to that model with , and . For we obtain a time-invariant VAR model with stochastic volatility, but this is inferior to the TV-VAR models, even compared with the less favourable TV-VAR models using (results not shown here). It turns out that a slow evolution of the AR matrices (corresponding to ) produces the best results.

In order to further compare the chosen model with other models, we use the Bayes factors as discussed in the previous section. We compare model 2 () with models 1,3,4,5,6 having respectively , where for all 6 models and . Figure 2 shows 5 plots of the Bayes factors of model 2 against the other 5 models. From these plots the superiority of model 2 is clear; for most of the time points model 2 produces Bayes factors larger than 1. For each of the comparisons the mean of the Bayeas factor is larger than 1, i.e. 3.487 (model 2 vs model 1), 1.688 (model 2 vs model 3), 2.083 (model 2 vs model 4), 14.965 (model 2 vs model 5), and 9.021 (model 2 vs model 6).

Figure 3 shows the one-step forecast of the volatility and the one-step forecast of the correlation between the IBM shares and the S&P 500 index. The forecast of the correlation indicates that IBM shares are highly correlated with values of the S&P 500 index and that this correlation is dynamic.

In the previous section it is shown that the sequence of the matrices is bounded and this was a key argument on the stability of the forecast covariance matrix. Figure 4 plots the values of , from which we can see that the diagonal elements of are bounded above by 1, while the off-diagonal elements of are bounded below by -0.030 and above by 0.007. However, we should note that, although is bounded, it does not converge to a stable matrix , something that is usual in dynamic linear models with time-invariant design components.

One of the advantages of the proposed model is its capability to forecast future values of the time series vector, but also to forecast the volatility. Table 2 shows three performance measures of multi-step forecasting, ranging from (one day forecast) to (twenty day forecast). As expected the forecast performance deteriorates as increases, but the model is still usable for a 5-day forecast (corresponding to a weekly forecast, considering only trading days). The 20-day forecast (corresponding to a monthly forecast) produces very large and . It is interesting to note that throughout the range of , the retains reasonable values (being close to 1), where for the forecast covariance matrix is slightly underestimated.

## 5 Sequential portfolio allocation: foreign exchange data

We consider optimal portfolio allocation for FX data. The exchange rates are the Australian dollar (AUD), British pounds (GBP), Canadian dollar (CAD), German Deutschmark (GDM), Dutch guilder (DUG), French frank (FRF), Japanese yen (JPY) and Swiss franc (SWF), all expressed as number of units of the foreign currency per US dollar. The data, part of which are discussed in Franses and Dijk (2000), consist of 774 daily observations from 2 January 1995 to 31 December 1997. Let be the exchange rate of currency and be the 8-dimensional time series vector () consisting of the geometric returns (shown in Figure 5) of the above exchange rates; being the return of AUD, , being the return of SWF. Here, as in Aguilar and West (2000), we have chosen to use geometric returns for the analysis, but log-returns can also be used (Soyer and Tanyeri, 2006).