Predicting cryptocurrencies using sparse non-Gaussian state space models
Abstract
In this paper we forecast daily returns of cryptocurrencies using a wide variety of different econometric models. To capture salient features commonly observed in financial time series like rapid changes in the conditional variance, non-normality of the measurement errors and sharply increasing trends, we develop a time-varying parameter VAR with t-distributed measurement errors and stochastic volatility. To control for overparameterization, we rely on the Bayesian literature on shrinkage priors that enables us to shrink coefficients associated with irrelevant predictors and/or perform model specification in a flexible manner. Using around one year of daily data we perform a real-time forecasting exercise and investigate whether any of the proposed models is able to outperform the naive random walk benchmark. To assess the economic relevance of the forecasting gains produced by the proposed models we moreover run a simple trading exercise.
Keywords:  Stochastic volatility, t-distributed errors, Bitcoin, density forecasting 
JEL Codes:  C11, C32, E51, G12 
1 Introduction
In the present paper we develop a non-Gaussian state space model to predict the price of three cryptocurrencies. Taking a Bayesian stance enables us to introduce shrinkage into the modeling framework, effectively controlling for model and specification uncertainty within the general class of state space models. To control for potential outliers we propose a time-varying parameter VAR model (Cogley and Sargent, 2005; Primiceri, 2005) with heavy-tailed innovations (for a recent exposition on how to introduce flexible error distributions in VAR models with stochastic volatility, see Chiu et al., 2017) as well as a stochastic volatility specification for the error variances. Since the literature on robust determinants of price movements in cryptocurrencies is relatively sparse (for an example, see Cheah and Fry, 2015), we apply Bayesian shrinkage priors to decide whether using information from a set of potential predictors improves predictive accuracy.
The recent price dynamics of various cryptocurrencies point towards a set of key empirical features an appropriate modeling strategy should accommodate. First, conditional heteroscedasticity appears to be an important regularity commonly observed (Chu et al., 2017). This implies that volatility changes over time in a persistent manner. If this feature is neglected, predictive densities are either too wide (during tranquil times) or too narrow (in the presence of tail events, i.e. pronounced movements in the price of a given asset); controlling for heteroscedasticity in macroeconomic and financial data proves to be an important task when it comes to prediction, see Clark (2011); Clark and Ravazzolo (2015); Huber and Feldkircher (2017). Second, the conditional mean of the process is changing. This implies that, within a standard regression framework, the relationship between an asset price and a set of exogenous covariates is time-varying. In the case of various cryptocurrencies this could be due to changes in the degree of adoption by institutional and/or private investors, regulatory changes, issuance of additional cryptocurrencies or general technological shifts (Böhme et al., 2015). Thus, it might be necessary to allow for such shifts by means of time-varying regression coefficients. Third, and finally, there is a rather strong degree of comovement between various cryptocurrencies (see Urquhart, 2017). In our paper, we consider Bitcoin, Ethereum and Litecoin, three popular choices. All three tend to be strongly correlated with each other, implying that a successful econometric framework should incorporate this information.
The goal of this paper is to systematically assess how different empirically relevant forecasting models perform when used to predict daily changes in the price of Bitcoin, Ethereum and Litecoin. The models considered include a wide range of univariate and multivariate models that are flexible along several dimensions. We consider vector autoregressions that feature drifting parameters as well as time-varying error variances. To cope with the curse of dimensionality we introduce recent shrinkage priors (see Feldkircher et al., 2017) and a flexible specification for the law of motion of the regression parameters (Huber et al., 2017). In addition, we introduce a heavy-tailed measurement error distribution to capture potential outlying observations (see, among others, Carlin et al., 1992; Geweke and Tanizaki, 2001).
We jointly forecast the three cryptocurrencies considered by using daily data from October 2016 to October 2017, with the last 160 days being used as a holdout period. In a forecasting comparison, we find that time-varying parameter VARs with some form of shrinkage perform well, beating univariate benchmarks like the AR(1) model with stochastic volatility (SV) as well as a random walk with SV. Constant parameter VARs tend to be inferior to their counterparts that feature time-varying parameters, but still prove to be relevant competitors. Especially during days which are characterized by large price changes, controlling for heteroscedasticity in combination with a flexible error variance-covariance structure pays off in terms of predictive accuracy. These findings are generally corroborated by considering probability integral transforms, showing that more flexible models lead to better calibrated predictive distributions. Moreover, a trading exercise provides a comparable picture. Models that perform well in terms of predictive likelihoods also tend to do well when used to generate trading signals.
The remainder of this paper is structured as follows. Section 2 provides an overview of the data as well as empirical key features of the three cryptocurrencies considered. Moreover, this section details how the additional explanatory variables are constructed. Section 3 introduces the econometric framework adopted, providing a brief discussion of the model as well as the Bayesian prior setup and posterior simulation. Section 4 presents the empirical forecasting exercise while Section 5 focuses on applying the proposed models to perform portfolio allocation tasks. Finally, the last section summarizes and concludes the paper.
2 Empirical key features
In this section we first identify important empirical key features of cryptocurrencies and then propose a set of covariates that aim to explain the low to medium frequency behavior of the underlying price changes.
For the present paper, we focus on the daily change in the log price of Bitcoin, Ethereum and Litecoin. To explain movements in the price of the three cryptocurrencies considered, we include information on equity prices (measured through the log returns of the S&P500 index), the relative number of search queries for each respective cryptocurrency from Google trends, the number of English Wikipedia page views as well as the difference between the weekly cumulative price trend from common mining hardware and similar, but mining-unsuitable, GPU-related products to capture the effect of supply-side factors.
The data spans the period from 26th November 2016 to 3rd October 2017, yielding a panel of 316 daily observations. Bitcoin, Ethereum and Litecoin closing prices are taken from a popular cryptocurrency meta-platform.^{3}^{3}3For more information, see coinmarketcap.com. They originate from major crypto exchanges and are averaged according to their daily trading volume. Furthermore, alternative financial investments are represented by the S&P500 index's daily closing prices. Additionally, demand-side predictors like the relative number of worldwide search operations from Google trends and the number of Wikipedia page views (in English) are used. Because large-scale cryptocurrency mining impacts supply and prices for the required equipment at the same time, hardware price trends are utilized to express changes in supply. To capture these effects, we gather GPU prices from Amazon's bestseller lists and extract the price trend of common mining hardware. We construct this predictor by computing the difference between the weekly cumulative price trend from common mining hardware (e.g., AMD Radeon RX 480 graphics cards) and similar, but GPU-related products that are unsuitable for mining activities (e.g., an AMD Radeon R5 230 graphics card).
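The transformation of closing prices into the modeled series can be sketched as follows; the price values here are hypothetical placeholders, not actual quotes from the dataset:

```python
import numpy as np

# Hypothetical daily closing prices (placeholders, not actual data).
close = np.array([740.0, 752.5, 748.1, 770.3, 781.9])

# Daily change in the log price, i.e. the log return used throughout the paper.
log_ret = np.diff(np.log(close))
```

The same transformation applies to the S&P500 series, so that all price-based variables enter the model as daily log returns.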
To provide additional information on the recent behavior of cryptocurrencies, Fig. 1 presents the log returns (left panel) as well as the squared log returns (right panel) for all three currencies under scrutiny.
At least two features are worth emphasizing. First, notice that in the first part of the sample (i.e. the end of 2016 and the beginning of 2017), price changes have been comparatively small. This can be seen in both panels of the figure for Bitcoin and Litecoin. For Ethereum, the pattern is slightly different but we still observe a general increase in variation during the second part of 2017.
Second, the degree of comovement between the three currencies increased markedly in 2017, where most major peaks and troughs coincide. This carries over to the squared returns, where we find that especially the sharp increase in volatility in September 2017 was common to all three currencies considered.
These two empirical regularities suggest that the proposed model should be able to capture comovement between Bitcoin, Litecoin and Ethereum prices as well as changes in the first moment of the sampling density. Moreover, the right panel indicates that large shocks appear to be quite common, calling for a flexible error distribution that allows for heteroscedasticity.
In order to provide further information on the amount of comovement in our dataset, Fig. 2 shows a heatmap of the lower Cholesky factor of the empirical correlation matrix of the nine time series included.
The upper part of the figure reveals that all three assets display a pronounced degree of comovement. This indicates that each individual time series might carry important information on the behavior of the remaining two time series, pointing towards the necessity to control for this empirical regularity. For the remaining factors we do find nonzero correlation but these correlations appear to be rather muted. Nevertheless, we conjecture that the set of fundamentals above should be a reasonable starting point to explain movements in the price of cryptocurrencies.
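As a minimal sketch of this comovement check, the following computes an empirical correlation matrix and its lower Cholesky factor (the object visualized in Fig. 2); the data are simulated placeholders for the nine series in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)

# Simulated stand-in for the 316 x 9 data panel used in the paper.
X = rng.standard_normal((316, 9))
X[:, 1] += 0.8 * X[:, 0]   # induce comovement between two of the series

corr = np.corrcoef(X, rowvar=False)   # empirical correlation matrix
L = np.linalg.cholesky(corr)          # lower Cholesky factor: corr = L @ L.T
```

Large off-diagonal elements in the first columns of the lower triangular factor correspond to the strong cross-currency correlations described above.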
3 Econometric framework
3.1 A multivariate state space model
To capture the empirical features of the three cryptocurrencies, a flexible econometric model is needed. We assume that the three cryptocurrencies as well as the additional covariates are stored in an $M$-dimensional vector $y_t$ that follows a VAR($p$) model with time-varying coefficients,
\[ y_t = A_{1t} y_{t-1} + \dots + A_{pt} y_{t-p} + \varepsilon_t, \tag{3.1} \]
with $A_{jt}$ (for $j = 1, \dots, p$) being a set of $M \times M$ dimensional coefficient matrices and $\varepsilon_t$ is a multivariate vector of reduced-form shocks with a time-varying variance-covariance matrix $\Sigma_t$,
\[ \varepsilon_t \sim \mathcal{N}(0, \Sigma_t), \qquad \Sigma_t = V_t H_t V_t'. \tag{3.2} \]
Hereby we let $V_t$ be a lower unitriangular matrix with $\operatorname{diag}(V_t) = \iota_M$ and $\iota_M$ being an $M$-dimensional vector of ones. Moreover, $H_t$ is a diagonal matrix with typical diagonal element $e^{h_{it}}$. The logarithmic volatilities are assumed to follow an AR(1) process,
\[ h_{it} = \mu_i + \rho_i (h_{i,t-1} - \mu_i) + \sigma_i \nu_{it}, \qquad \nu_{it} \sim \mathcal{N}(0, 1). \tag{3.3} \]
$\mu_i$ denotes the unconditional mean of the log-volatility process while $\rho_i$ and $\sigma_i^2$ are the persistence and variance parameters, respectively.
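The AR(1) log-volatility process can be simulated in a few lines; the parameter values below are illustrative assumptions, not estimates from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative values: unconditional mean, persistence, and volatility of log-vol.
mu, rho, sigma = -1.0, 0.95, 0.2
T = 1000

h = np.empty(T)
h[0] = mu
for t in range(1, T):
    # AR(1) law of motion for the log-volatility (Eq. (3.3))
    h[t] = mu + rho * (h[t - 1] - mu) + sigma * rng.standard_normal()

# Heteroscedastic shocks: the variance e^{h_t} changes persistently over time.
eps = np.exp(h / 2) * rng.standard_normal(T)
```

With persistence close to one, the simulated shocks display the volatility clustering visible in the squared returns of Fig. 1.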
Following Carriero et al. (2015) and Feldkircher et al. (2017) we rewrite Eq. (3.1) as follows,
\[ y_t = A_{1t} y_{t-1} + \dots + A_{pt} y_{t-p} + V_t \eta_t, \tag{3.4} \]
where $\eta_t = V_t^{-1} \varepsilon_t$ is a vector of orthogonal shocks with a time-varying variance-covariance matrix $H_t$.
Note that the $i$th equation (for $i = 1, \dots, M$) of this system can be written as,
\[ y_{it} = a_{it}' x_t + \sum_{j=1}^{i-1} \tilde{v}_{ij,t} (-\varepsilon_{jt}) + \eta_{it}. \tag{3.5} \]
We let $x_t = (y_{t-1}', \dots, y_{t-p}')'$ be the $K$-dimensional ($K = Mp$) stacked vector of covariates and $A_t = (A_{1t}, \dots, A_{pt})$ is the $M \times K$ matrix of stacked coefficients, with $a_{it}'$ selecting the $i$th row of the matrix concerned and $\tilde{v}_{ij,t}$ denoting a typical element of $V_t^{-1}$. Eq. (3.5) is a simple regression model with heteroscedastic innovations and the (negative) of the reduced-form shocks of the preceding equations as additional regressors. In the case of $i = 1$, Eq. (3.5) reduces to a simple univariate regression with $x_t$ as covariates. It proves to be convenient to rewrite Eq. (3.5) as follows
\[ y_{it} = z_{it}' \beta_{it} + \eta_{it}, \tag{3.6} \]
where $\beta_{it} = (a_{it}', \tilde{v}_{i1,t}, \dots, \tilde{v}_{i,i-1,t})'$ is a $K_i$-dimensional ($K_i = K + i - 1$) vector of regression coefficients and $z_{it} = (x_t', -\varepsilon_{1t}, \dots, -\varepsilon_{i-1,t})'$. One important implication of Eq. (3.6) is that the covariance parameters are effectively estimated in one step alongside the VAR coefficients.
We assume that $\beta_{it}$ evolves according to a random walk process,
\[ \beta_{it} = \beta_{i,t-1} + w_{it}, \qquad w_{it} \sim \mathcal{N}(0, \Theta_i). \tag{3.7} \]
The shocks to the states follow a Gaussian distribution with diagonal variance-covariance matrix $\Theta_i = \operatorname{diag}(\theta_{i1}, \dots, \theta_{iK_i})$. To facilitate variable selection/shrinkage we follow Frühwirth-Schnatter and Wagner (2010); Belmonte et al. (2014); Bitto and Frühwirth-Schnatter (2016) and rewrite the model given by Eqs. (3.6)–(3.7) as follows,
\[ y_{it} = z_{it}' \beta_{i0} + z_{it}' \sqrt{\Theta_i}\, \tilde{\beta}_{it} + \eta_{it}, \tag{3.8} \]
\[ \tilde{\beta}_{it} = \tilde{\beta}_{i,t-1} + u_{it}, \tag{3.9} \]
\[ u_{it} \sim \mathcal{N}(0, I_{K_i}). \tag{3.10} \]
The matrix $\sqrt{\Theta_i}$ is a matrix square root such that $\sqrt{\Theta_i}\sqrt{\Theta_i} = \Theta_i$ with typical element $\sqrt{\theta_{ij}}$, and the $j$th element of $\beta_{it}$ reads $\beta_{ij,t} = \beta_{ij,0} + \sqrt{\theta_{ij}}\,\tilde{\beta}_{ij,t}$. This parameterization, labeled the non-centered parameterization, implies that the state innovation variances are moved into the observation equation (see Eq. (3.8)) and treated as standard regression coefficients. Thus, if $\sqrt{\theta_{ij}} = 0$, the coefficient associated with the $j$th element in $z_{it}$ is constant over time.
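For a scalar coefficient, the non-centered parameterization can be illustrated numerically; the values of the initial coefficient and the innovation variance below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500

# Non-centered form: beta_t = beta_0 + sqrt(theta) * btil_t, where btil_t is a
# standard random walk and sqrt(theta) acts as a regression coefficient in the
# observation equation.
beta0, theta = 0.5, 0.01
btil = np.cumsum(rng.standard_normal(T))     # btil_t = btil_{t-1} + N(0, 1)
beta = beta0 + np.sqrt(theta) * btil

# If sqrt(theta) is shrunk exactly to zero, the coefficient is constant:
beta_const = beta0 + 0.0 * btil
```

Shrinking the "coefficient" sqrt(theta) towards zero therefore amounts to selecting between a time-varying and a constant parameter.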
Up to this point we have remained silent on the distributional assumptions on the measurement errors. In what follows we depart from the literature on TVP-VARs and assume that the measurement errors are heavy-tailed and follow a t-distribution. This choice is based on evidence in the literature (Geweke, 1994; Gallant et al., 1997; Jacquier et al., 2004) which calls for heavy-tailed distributions when used to model daily financial market data. As can be seen in Fig. 1, we also observe multiple outlying observations for all three cryptocurrencies under consideration.
Since the assumption of non-Gaussian errors would render typical estimation methods like the Kalman filter infeasible, we follow Harrison and Stevens (1976); West (1987); Gordon and Smith (1990) and use a scale mixture of Gaussians to approximate the t-distribution,
\[ \eta_{it} \sim \mathcal{N}(0, \xi_{it}\, e^{h_{it}}), \tag{3.11} \]
\[ \xi_{it} \sim \mathcal{G}^{-1}(\nu_i/2, \nu_i/2). \tag{3.12} \]
Notice that the degrees of freedom parameter $\nu_i$ is equation-specific, implying that the excess kurtosis of the underlying error distribution is allowed to change across equations, a feature that might be important given the different time series involved. The latent process $\xi_{it}$ simply serves to rescale the Gaussian distribution in case of large shocks. Notice that if $\xi_{it} = 1$ for all $i$ and $t$ we obtain the standard time-varying parameter VAR as in Primiceri (2005).
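The scale-mixture representation can be checked by simulation: mixing a Gaussian over inverse-Gamma scales reproduces a Student-t exactly. The degrees of freedom value below is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 5.0          # illustrative degrees of freedom
n = 200_000

# xi ~ Inverse-Gamma(nu/2, nu/2): draw a Gamma variate and invert it.
xi = 1.0 / rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)

# N(0, xi) mixed over xi is exactly Student-t with nu degrees of freedom.
mix = np.sqrt(xi) * rng.standard_normal(n)
t_direct = rng.standard_t(df=nu, size=n)

# Both samples share the t variance nu / (nu - 2).
```

This is the mechanism that lets a conditionally Gaussian sampler accommodate heavy-tailed measurement errors: conditional on the latent scales, all updating steps remain Gaussian.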
3.2 Prior specification
The prior setup adopted closely follows Feldkircher et al. (2017). More specifically, we use a Normal-Gamma (NG) shrinkage prior on the elements of $\beta_{i0}$ and $\sqrt{\Theta_i}$.
The NG prior comprises a Gaussian prior on the coefficients alongside a set of local and global shrinkage parameters for the first $K$ elements of $\beta_{i0}$ and $\sqrt{\Theta_i}$,
\[ \beta_{ij,0} \mid \tau_{ij}^2 \sim \mathcal{N}(0, \tau_{ij}^2), \tag{3.13} \]
\[ \sqrt{\theta_{ij}} \mid \tilde{\tau}_{ij}^2 \sim \mathcal{N}(0, \tilde{\tau}_{ij}^2), \tag{3.14} \]
for $i = 1, \dots, M$ and $j = 1, \dots, K$. Here we let $\tau_{ij}^2$ and $\tilde{\tau}_{ij}^2$ (for all $i$ and $j$) denote local shrinkage parameters with
\[ \tau_{ij}^2,\, \tilde{\tau}_{ij}^2 \sim \mathcal{G}(\vartheta, \vartheta \lambda_\ell^2 / 2). \tag{3.15} \]
$\vartheta$ is a hyperparameter specified by the researcher and $\lambda_\ell^2$ is a global shrinkage parameter that is lag-specific, i.e. applied to the elements in $\beta_{i0}$ and $\sqrt{\Theta_i}$ associated with the $\ell$th lag of $y_t$, and constructed as follows
\[ \lambda_\ell^2 = \prod_{k=1}^{\ell} \pi_k, \qquad \pi_k \sim \mathcal{G}(c, d). \tag{3.16} \]
This implies that if $\pi_k > 1$, the prior introduces more shrinkage with increasing lag orders. The degree of overall shrinkage is controlled through the hyperparameters $c$ and $d$.
Notice that this specification pools the parameters that control the amount of time variation as well as the time-invariant regression parameters. This captures the notion that if a variable is not included initially, the probability of having a time-varying coefficient also decreases (by increasing the lag-specific shrinkage parameter $\lambda_\ell^2$).
For the covariance parameters, indexed by $j > K$, the prior is specified analogously to Eqs. (3.13)–(3.14) but with $\lambda_\ell^2$ replaced by a separate global shrinkage parameter $\kappa^2$. This choice implies that all covariance parameters as well as the corresponding process innovation variances are pushed to zero simultaneously. For $\kappa^2$ we again use a Gamma distributed prior,
\[ \kappa^2 \sim \mathcal{G}(e_0, e_1), \tag{3.17} \]
with $e_0$ and $e_1$ being hyperparameters.
This prior specification has the convenient property that the parameters $\lambda_\ell^2$ and $\kappa^2$ introduce prior dependence, pooling information across different coefficient types (i.e. regression coefficients and process innovation variances) and introducing strong global shrinkage on all coefficients concerned. By contrast, the introduction of the local scaling parameters $\tau_{ij}^2$ and $\tilde{\tau}_{ij}^2$ serves to provide flexibility in the presence of strong overall shrinkage introduced by $\lambda_\ell^2$ and $\kappa^2$. Thus, even if the aforementioned global scaling parameters are large (i.e. heavy shrinkage is introduced in the model), the local scalings provide sufficient flexibility to drag posterior mass away from zero, allowing for non-zero coefficients. The role of the hyperparameter $\vartheta$ is to control the tail behavior of the prior. If $\vartheta$ is small (close to zero), the prior places more mass on zero but the tails of the marginal prior obtained after integrating over the local scales become thicker (see Griffin et al., 2010, for a discussion).
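The local/global interplay can be visualized by sampling from the marginal NG prior. The parameterization below (coefficient variance scaled by a Gamma local scale, following Griffin et al., 2010) and all numerical values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

def draw_ng(theta, lam2, size, rng):
    """Draw from an NG prior: psi ~ Gamma(theta, rate=theta*lam2/2),
    beta | psi ~ N(0, psi). Marginal variance of beta is 2/lam2."""
    psi = rng.gamma(shape=theta, scale=2.0 / (theta * lam2), size=size)
    return rng.standard_normal(size) * np.sqrt(psi)

light = draw_ng(theta=1.0, lam2=4.0, size=n, rng=rng)   # Laplace-type prior
spiky = draw_ng(theta=0.1, lam2=4.0, size=n, rng=rng)   # small theta: spike + fat tails
```

For the same global shrinkage, the small-theta draws concentrate much more mass near zero while retaining heavier tails, which is exactly the trade-off described in the text.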
For the parameters of the log-volatility equation in Eq. (3.3) we follow Kastner and Frühwirth-Schnatter (2014); Kastner (2015a) and use a normally distributed prior on $\mu_i$, a Beta prior on $\rho_i$ and a Gamma prior on $\sigma_i^2$. In addition, we specify a uniform prior on $\nu_i$, effectively ruling out the limiting case of a Gaussian distribution that would arise if $\nu_i$ became excessively large.
3.3 Full conditional posterior simulation
Estimation of the model is carried out using Markov chain Monte Carlo (MCMC) techniques. Our MCMC algorithm consists of the following blocks:

The full history of the log-volatility process as well as the parameters of Eq. (3.3) are obtained by relying on the algorithm proposed in Kastner and Frühwirth-Schnatter (2014) and implemented in the R package stochvol (Kastner, 2015b).

The time-invariant components $\beta_{i0}$ as well as $\sqrt{\Theta_i}$ are simulated from a multivariate Gaussian posterior that takes a standard form (see Feldkircher et al., 2017).

The sequence of local scaling parameters is simulated from a generalized inverse Gaussian (GIG) posterior distribution given by,
\[ \tau_{ij}^2 \mid \bullet \sim \mathcal{GIG}\!\left(\vartheta - \tfrac{1}{2},\; \vartheta \lambda_\ell^2,\; \beta_{ij,0}^2\right), \tag{3.18} \]
\[ \tilde{\tau}_{ij}^2 \mid \bullet \sim \mathcal{GIG}\!\left(\vartheta - \tfrac{1}{2},\; \vartheta \lambda_\ell^2,\; \theta_{ij}\right), \tag{3.19} \]
for $i = 1, \dots, M$ and $j = 1, \dots, K$. The posterior distribution for the scalings associated with the covariance parameters is similar, with $\lambda_\ell^2$ replaced by $\kappa^2$.

We obtain draws from the posterior of the lag-specific shrinkage parameter $\lambda_\ell^2$ associated with the $\ell$th lag by combining the likelihood with the prior on $\lambda_\ell^2$. The resulting posterior distribution is a Gamma distribution,
\[ \lambda_\ell^2 \mid \bullet \sim \mathcal{G}\!\left(c + \vartheta n_\ell,\; d + \frac{\vartheta}{2} \sum_{(i,j) \in \mathcal{I}_\ell} \left(\tau_{ij}^2 + \tilde{\tau}_{ij}^2\right)\right), \tag{3.20} \]
with the $\bullet$ indicating the conditioning on everything else and $n_\ell$ denoting the number of local scaling parameters attached to the $\ell$th lag. The set $\mathcal{I}_\ell$ selects all coefficients associated with the $\ell$th lag of $y_t$.
Similarly, the conditional posterior of $\kappa^2$ is given by
\[ \kappa^2 \mid \bullet \sim \mathcal{G}\!\left(e_0 + \vartheta R,\; e_1 + \frac{\vartheta}{2} \sum_{(i,j) \in \mathcal{I}_0} \left(\tau_{ij}^2 + \tilde{\tau}_{ij}^2\right)\right), \tag{3.21} \]
where $R$ denotes the number of covariance parameters in addition to the number of process variances for the corresponding parameters, and $\mathcal{I}_0$ selects the covariance-related coefficients.

The full history of $\xi_{it}$ is obtained by independently simulating from an inverted Gamma distribution (see Kastner, 2015c),
\[ \xi_{it} \mid \bullet \sim \mathcal{G}^{-1}\!\left(\frac{\nu_i + 1}{2},\; \frac{\nu_i + \eta_{it}^2 e^{-h_{it}}}{2}\right), \tag{3.22} \]
for $t = 1, \dots, T$.

To simulate the degrees of freedom $\nu_i$, we perform an independent Metropolis-Hastings (MH) step as described in Kastner (2015c).
This algorithm is repeated a large number of times, with an initial set of draws being discarded as burn-in (in the empirical application we use 30,000 overall iterations, with the first 15,000 being discarded as burn-in). Notice that the equation-by-equation algorithm yields significant computational gains relative to competing estimation algorithms that rely on full-system estimation of the VAR model.
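The GIG draws for the local scaling parameters can be implemented with SciPy. Its `geninvgauss` distribution uses a standardized one-parameter form, so the three-parameter GIG($p$, $a$, $b$) with density proportional to $x^{p-1}\exp(-(ax + b/x)/2)$ must be rescaled; the parameter values below are illustrative, not the paper's:

```python
import numpy as np
from scipy.stats import geninvgauss

def rgig(p, a, b, size, random_state=None):
    """Draw from GIG(p, a, b) via SciPy's standardized geninvgauss:
    if Y ~ geninvgauss(p, sqrt(a*b)), then sqrt(b/a)*Y ~ GIG(p, a, b)."""
    return geninvgauss.rvs(p, np.sqrt(a * b), scale=np.sqrt(b / a),
                           size=size, random_state=random_state)

# Illustrative values playing the role of (theta - 1/2, theta*lambda^2, beta^2).
draws = rgig(p=0.2, a=2.0, b=1.5, size=100_000, random_state=4)
```

Drawing the local scales in a vectorized fashion like this keeps the per-iteration cost of the shrinkage step negligible relative to the state simulation.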
4 Forecasting results
4.1 Model specification and design of the forecasting exercise
In this section, we briefly describe model specification and the design of the forecasting exercise. The prior setup for our benchmark specification (henceforth labeled the tTVP NG model) closely follows the existing literature on NG shrinkage priors (Griffin et al., 2010; Bitto and Frühwirth-Schnatter, 2016; Huber and Feldkircher, 2017; Feldkircher et al., 2017). More specifically, we choose $c$ and $d$ to center the prior on $\lambda_\ell^2$ above unity, while $\vartheta$ is set to a small value. This choice for $\vartheta$ implies that we place a large amount of prior mass on zero while at the same time allowing for relatively thick tails. Our choice for the Gamma prior on $\kappa^2$ introduces heavy shrinkage on the covariance parameters as well as the corresponding process standard deviations.
For all models (i.e. the competitors introduced in the next subsection as well as the proposed model) we include a single lag of the endogenous variables. Higher lag orders are generally possible, but given the high dimension of the state space and the increased computational complexity we stick to one lag. In addition, experimenting with slightly higher lag orders leads to models that are relatively unstable at several points in time in our estimation sample.
The design of our forecasting exercise is the following. We start with an initial estimation period that spans the period from the end of November 2016 (22nd of November) to the end of April 2017 (26th of April). The remaining 160 days are used as a holdout period. After obtaining the one-step-ahead predictive density for the 27th of April 2017, we subsequently expand the estimation sample by a single day until the end of the sample is reached. This yields a sequence of 160 one-day-ahead predictive densities.
To assess the predictive fit of our models we use the log predictive likelihood (LPL), motivated in, e.g., Geweke and Amisano (2010), and the root mean square forecast error (RMSE). Using LPLs enables us to assess not only how well the model fits in terms of point predictions but also how well higher moments of the predictive density are captured. In addition, to assess model calibration we use univariate probability integral transforms (Diebold et al., 1998; Clark, 2011; Amisano and Geweke, 2017).
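For a Gaussian predictive density, both metrics can be computed directly; the forecasts and realizations below are made-up placeholders (the paper's predictive densities are simulation-based rather than exactly Gaussian):

```python
import numpy as np
from scipy.stats import norm

# Placeholder one-day-ahead forecasts: predictive mean m, predictive sd s,
# and realized return y for four holdout days.
y = np.array([0.01, -0.02, 0.005, 0.03])
m = np.array([0.00, -0.01, 0.000, 0.01])
s = np.array([0.02, 0.02, 0.015, 0.025])

# Log predictive likelihood: log density of the realization under the forecast.
lpl = norm.logpdf(y, loc=m, scale=s).sum()

# Root mean square forecast error of the point forecasts.
rmse = np.sqrt(np.mean((y - m) ** 2))
```

The LPL rewards both accurate point forecasts and well-calibrated predictive spread, whereas the RMSE only evaluates the first moment.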
4.2 Competing models
Our set of competing models ranges from univariate benchmark models that feature SV to a wide set of multivariate benchmark models. The first set of models considered are a random walk (RWSV) and the AR(1) model (henceforth labeled ARSV), both estimated with SV. We use non-informative priors on the AR(1) regression coefficient and the same prior setup for the log-volatility equation as discussed in the previous section. These two models serve to illustrate whether a multivariate modeling approach pays off and, in addition, whether allowing for structural changes in the underlying regression parameters improves predictive capabilities.
In addition, we consider a set of nested multivariate benchmark models. To quantify the accuracy gains of time-varying parameter specifications, we estimate three constant parameter VARs with SV. The first VAR uses the prior setup described above but with $\sqrt{\Theta_i} = 0$ for all $i$. The second model is a non-conjugate Minnesota VAR with asymmetric shrinkage across equations. To select the hyperparameters we follow Giannone et al. (2015), place hyperpriors on all hyperparameters and estimate them using a random walk Metropolis-Hastings step. The last VAR we consider is a model that features a stochastic search variable selection (SSVS) prior specified as in George et al. (2008). This implies that a two-component Gaussian prior is used, with the Gaussians differing in terms of their prior variance. One component features a large prior variance (labeled the slab distribution), which introduces relatively little prior information, whereas the second component has a prior variance close to zero (the spike component) that strongly forces the posterior of the respective coefficient towards zero. We set the hyperparameters (i.e. the prior standard deviations) for the slab distribution by using the OLS standard deviation times a constant (ten in our case), while the prior standard deviation on the spike component is set equal to a small fraction of the OLS standard deviation.
Moreover, we include two time-varying parameter models with SV and Gaussian measurement errors. The first TVP-VAR considered (labeled TVP) is based on an uninformative prior (obtained by setting the prior variances to unity for both the initial states and the process standard deviations). The next benchmark model (called TVP NG) is our proposed specification with a NG prior but with Gaussian errors (i.e. $\xi_{it} = 1$ for all $i$ and $t$). This choice serves to assess whether additional flexibility on the measurement errors is needed.
Finally, the last model considered is the most flexible specification in terms of the law of motion of the latent states. This model, labeled the threshold TVP-VAR (TTVP), is based on Huber et al. (2017) and captures the notion that parameter movements are only allowed if they are sufficiently large. To achieve this, a threshold specification for the process variances is adopted. This specification depends on a latent indicator that, in turn, is driven by the absolute size of parameter changes. Thus, if the change in a given regression parameter is large (i.e. exceeds a certain threshold we estimate), we use a large variance in Eq. (3.7). By contrast, if the change is small the process variance is set to a small constant that is close to zero. The prior specification adopted here closely follows the benchmark specification outlined in Huber et al. (2017) and we refer to the original paper for additional details.
4.3 Out of sample forecasting performance
We start by considering the forecasting performance in terms of log predictive scores (LPS). Table 1 displays the LPS as well as the RMSEs for the competing models. The first column shows the joint LPS for the three cryptocurrencies considered, while the next three columns display the marginal LPS for each cryptocurrency. The final three columns show the RMSEs.
Log predictive score  Root mean square error  
JointLPS  Bitcoin  Litecoin  Ethereum  Bitcoin  Litecoin  Ethereum  
TTVP  621.023  286.360  134.231  153.201  0.050  0.084  0.078  
TVP  451.631  187.474  106.946  97.300  0.074  0.133  0.134  
TVP NG  632.410  286.134  144.629  159.562  0.050  0.083  0.079  
tTVP NG  643.873  277.679  161.768  166.988  0.050  0.084  0.078  
MinnVAR  577.779  283.399  123.580  153.274  0.051  0.085  0.078  
NGVAR  592.391  286.483  130.194  148.553  0.051  0.084  0.078  
SSVS  586.083  286.255  122.346  153.081  0.051  0.084  0.078  
RWSV  483.952  240.751  131.410  112.487  0.073  0.112  0.114  
ARSV  598.936  280.487  158.899  159.725  0.051  0.085  0.078 
Considering the joint LPS indicates that, across models, the tTVP NG specification outperforms the remaining models. This points towards the necessity of allowing for both a flexible error distribution and time-varying parameters with appropriate shrinkage priors. Especially when compared to the constant parameter VAR models, all three TVP-VAR specifications with some form of shrinkage yield pronounced accuracy gains. Notice also that the AR(1) model with SV proves to be a tough competitor relative to the set of Bayesian VARs.
The necessity of introducing shrinkage in the TVP-VAR framework can be seen by comparing the joint forecasting performance of the TVP model with the remaining TVP-VARs considered. Notice that in our medium-scale model, a TVP-VAR with relatively little shrinkage leads to overfitting issues which in turn are detrimental for forecasting performance.
Zooming into the results for the three cryptocurrencies, we generally observe that models performing well in terms of the joint LPS also do well on average. One interesting exception is our proposed tTVP NG specification. While its performance gains for Litecoin and Ethereum appear to be substantial vis-à-vis the competing models, we find that its Bitcoin predictions are inferior relative to the TTVP and TVP NG specifications. If the researcher is interested in predicting the price of Bitcoin, the two best performing models are the TTVP specification and the Bayesian VAR with a Normal-Gamma shrinkage prior. Interestingly, notice that the comparatively weaker joint performance of the BVAR models stems from weaker Litecoin and Ethereum predictions, whereas their Bitcoin predictions appear to be rather precise.
Considering point forecasting performance generally corroborates the findings for density forecasts. Here we again observe that models which yield precise predictive densities also work well when only point predictions are considered. Notice, however, that the differences in terms of RMSE between multivariate models and the univariate AR(1) model are negligible. This somewhat highlights that forecasting gains in terms of predictive likelihoods stem from higher moments of the predictive density like the predictive variance (in terms of the marginal log scores) or a more appropriate modeling strategy for the predictive variancecovariance structure.
Next, we investigate whether differences in forecasting performance are time-varying. Fig. 3 shows the log predictive Bayes factors relative to the random walk with SV. Comparing model performances over time points towards a pronounced degree of heterogeneity. Panel (a) shows that for Bitcoin the two best performing models are the TTVP and TVP NG specifications. While the former yields a slightly better performance over time, the latter proves to be the best performing model during the first part of the holdout period. For the remaining models we find only relatively little time variation in their predictive performance.
Considering the results for Litecoin (see panel (b)) we find pronounced movements in relative forecasting accuracy. More specifically, forecasting performance appears to be homogeneous during the first months of the holdout period; from May 2017 onward, however, the tTVP NG specification starts to perform extraordinarily well, improving upon all competitors by large margins.
Finally, panels (c) and (d) show the performance for Ethereum as well as the overall performance over time. Here we generally find results that are comparable with the findings described above. Notice that the overall log predictive likelihood displays a pattern similar to that of the marginal LPS for the individual cryptocurrencies. However, compared to panel (a) we observe that the tTVP NG specification also excels in terms of joint density predictions. The main difference is that the superior performance of the tTVP NG model in predicting Litecoin prices lifts its log predictive Bayes factor above the ones obtained for all competing models.
4.4 Model evaluation using probability integral transforms
Following Diebold et al. (1998); Clark (2011); Amisano and Geweke (2017), if a given model $m$ is correctly specified one can show that
\[ z_{it}^{(m)} = \Phi^{-1}\!\left(F_{it}^{(m)}\!\left(y_{it}^{o}\right)\right) \sim \mathcal{N}(0, 1), \tag{4.1} \]
for $i = 1, \dots, 3$ and $t = t_0, \dots, T$, with $t_0$ indicating the first observation of the holdout period (i.e. the 27th of April 2017). Hereby we let $\Phi^{-1}$ denote the inverse distribution function of the standard normal distribution and $F_{it}^{(m)}$ denotes the cumulative distribution function associated with the underlying predictive distribution of model $m$, evaluated at the realization $y_{it}^{o}$. If the model is correctly specified, the sequence of normalized forecast errors is independent and identically standard normally distributed.
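With MCMC output, the transform in Eq. (4.1) is typically approximated through the empirical CDF of the predictive draws; a sketch on simulated data (all values are placeholders for the paper's draws and realizations):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)

T_hold, n_draws = 160, 5000
draws = rng.standard_normal((T_hold, n_draws))   # predictive draws per holdout day
real = rng.standard_normal(T_hold)               # realized values

# PIT: share of predictive draws below the realization, per day.
u = (draws < real[:, None]).mean(axis=1)
u = np.clip(u, 1e-4, 1 - 1e-4)   # guard against infinite quantiles
z = norm.ppf(u)                  # normalized forecast errors
```

Since the realizations here are drawn from the predictive distribution itself, the resulting z-scores are approximately iid standard normal, which is exactly the correctly-specified benchmark the tests in Table 2 check against.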
Fig. 4 (a) to (c) shows the normalized forecast errors across models and for all three cryptocurrencies considered, while Table 2 provides statistical tests that aim to support our visual assessment of Fig. 4. In the case of Bitcoin and Litecoin, we find that the mean appears to be close to zero. This finding is corroborated by the first column in Table 2, which displays the empirical mean obtained by regressing the normalized forecast errors on a constant, with p-values in parentheses. Notice that for Ethereum, we find the normalized forecast errors of the majority of models to be centered above zero. The two exceptions are the TVP NG specification and the Minnesota prior VAR. Considering again panel (c) reveals that these deviations from zero are mainly driven by the failure to capture the conditional mean during the beginning of the holdout period.
Mean (p-value)  Variance (p-value)  Persistence (p-value)  
Bitcoin  
TTVP  0.060 (0.401)  0.821 (0.024)  0.078 (0.329) 
TVP  0.013 (0.838)  0.649 (0.000)  0.085 (0.283) 
TVP NG  0.004 (0.948)  0.683 (0.000)  0.439 (0.000) 
tTVP NG  0.051 (0.466)  0.783 (0.005)  0.060 (0.454) 
MinnVAR  0.007 (0.902)  0.490 (0.000)  0.135 (0.089) 
NGVAR  0.022 (0.756)  0.809 (0.018)  0.093 (0.243) 
SSVS  0.058 (0.410)  0.799 (0.007)  0.052 (0.513) 
RWSV  0.082 (0.246)  0.796 (0.007)  0.058 (0.470) 
ARSV  0.098 (0.168)  0.804 (0.011)  0.051 (0.518) 
Litecoin  
TTVP2  0.121 (0.255)  1.790 (0.030)  0.023 (0.772) 
TVP2  0.096 (0.254)  1.120 (0.544)  0.011 (0.891) 
TVP NG2  0.009 (0.912)  1.154 (0.347)  0.052 (0.516) 
tTVP NG2  0.115 (0.202)  1.295 (0.187)  0.027 (0.731) 
MinnVAR2  0.049 (0.472)  0.732 (0.007)  0.084 (0.292) 
NGVAR2  0.114 (0.254)  1.596 (0.047)  0.008 (0.917) 
SSVS2  0.128 (0.253)  2.001 (0.031)  0.018 (0.821) 
RWSV2  0.144 (0.188)  1.920 (0.018)  0.017 (0.831) 
ARSV2  0.152 (0.177)  2.020 (0.021)  0.025 (0.756) 
Ethereum  
TTVP3  0.201 (0.025)  1.285 (0.212)  0.121 (0.127) 
TVP3  0.208 (0.026)  1.393 (0.047)  0.059 (0.461) 
TVP NG3  0.090 (0.250)  0.980 (0.93)  0.026 (0.743) 
tTVP NG3  0.148 (0.042)  0.848 (0.306)  0.072 (0.367) 
MinnVAR3  0.023 (0.672)  0.478 (0.000)  0.047 (0.556) 
NGVAR3  0.223 (0.014)  1.335 (0.107)  0.100 (0.207) 
SSVS3  0.188 (0.043)  1.393 (0.056)  0.075 (0.343) 
RWSV3  0.194 (0.040)  1.429 (0.071)  0.065 (0.417) 
ARSV3  0.176 (0.058)  1.380 (0.065)  0.052 (0.514) 
Considering the variances reveals that in the case of Bitcoin, the variances of the normalized errors are all well below unity, indicating that the estimated predictive variance is generally too high. Put differently, too many actual observations fall in the center of the predictive distribution. This finding is strongly supported by the second column of Table 2, which displays the estimated variance of the normalized forecast errors obtained by regressing the squared errors on a constant. For the t-TVP NG and TTVP specifications we find slightly higher variances. Our interpretation is that allowing for a flexible error specification, either by directly using non-Gaussian shocks in conjunction with stochastic volatility or by introducing more flexibility in the law of motion of the latent states, helps to push the variances towards one.
For Litecoin and Ethereum, the variances appear to be closer to one for all TVP specifications except for the TTVP model (in the case of Litecoin). It is noteworthy that especially for Litecoin, constant parameter models with SV tend to either underestimate the predictive variance or fail to capture observations in the tail of the empirical distribution.
Finally, considering the persistence of $e_{it}^{(m)}$ reveals that most models produce normalized errors which display muted persistence levels. This is corroborated by the final column of Table 2, which shows the persistence parameter obtained by estimating AR(1) models in $e_{it}^{(m)}$, along with the corresponding $p$-values.
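The three diagnostic checks behind Table 2 can be sketched as follows. Here the mean and variance checks are implemented as one-sample t-tests, which is equivalent to regressing $e_{it}^{(m)}$ and its square on a constant; the function name and the simulated errors are illustrative, not from the paper:

```python
import numpy as np
from scipy import stats

def pit_diagnostics(e):
    """Mean, variance and AR(1) persistence checks for normalized errors e."""
    e = np.asarray(e)
    _, p_mean = stats.ttest_1samp(e, 0.0)        # H0: mean equals zero
    _, p_var = stats.ttest_1samp(e**2, 1.0)      # H0: variance equals one
    slope, _, _, p_ar, _ = stats.linregress(e[:-1], e[1:])  # AR(1) regression
    return {"mean": e.mean(), "p_mean": p_mean,
            "variance": (e**2).mean(), "p_var": p_var,
            "persistence": slope, "p_persistence": p_ar}

# A correctly specified model should pass all three checks; simulate that case.
rng = np.random.default_rng(0)
diag = pit_diagnostics(rng.standard_normal(300))
```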
5 Economic performance criteria: A simple trading exercise
To assess which model performs well in terms of economic performance criteria, we run a trading exercise in which each model is used to generate a set of optimal weights attached to each of the three cryptocurrencies considered. Using the models discussed in the previous sections, as well as two additional investment strategies based on equal weights and a simple passive investment in Bitcoin, allows us to infer whether constructing a trading strategy based on more sophisticated econometric models pays off in terms of generating superior returns.
We assume that investors adopt two strategies to find an optimal sequence of portfolio weights $\omega_t$. The first one is the standard minimum variance portfolio problem that aims to allocate money between the three assets considered such that the portfolio variance is minimized. This implies that for each $t$ the investor solves
(5.1)  $\min_{\omega_t} \ \omega_t' \Sigma_t^{(m)} \omega_t \quad \text{subject to} \quad \omega_t' \iota = 1,$

where $\iota$ is a $3$-dimensional vector of ones and $\Sigma_t^{(m)}$ denotes the variance of model $m$'s one-step-ahead predictive density.
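Under the budget constraint the minimizer has the familiar closed form $\omega_t = \Sigma_t^{-1}\iota / (\iota' \Sigma_t^{-1} \iota)$, which can be sketched as follows (the covariance matrix shown is made up purely for illustration):

```python
import numpy as np

def min_variance_weights(sigma):
    """Minimum variance weights: w = Sigma^{-1} iota / (iota' Sigma^{-1} iota).

    sigma is the one-step-ahead predictive covariance matrix; the weights sum
    to one but may be negative, i.e. short positions are allowed.
    """
    ones = np.ones(sigma.shape[0])
    w = np.linalg.solve(sigma, ones)   # Sigma^{-1} iota without explicit inversion
    return w / (ones @ w)

# Hypothetical 3 x 3 predictive covariance for (Bitcoin, Litecoin, Ethereum).
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w = min_variance_weights(sigma)
```

By construction the resulting portfolio variance is no larger than that of the equal-weights portfolio under the same covariance matrix.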
The second strategy adds a specific portfolio target return $\kappa_t$ to the optimization problem in Eq. (5.1), i.e.,

(5.2)  $\min_{\omega_t} \ \omega_t' \Sigma_t^{(m)} \omega_t \quad \text{subject to} \quad \omega_t' \iota = 1 \ \text{and} \ \omega_t' \mu_t^{(m)} = \kappa_t.$

Here we let $\mu_t^{(m)}$ denote the one-step-ahead predictive mean of model $m$ and $\kappa_t$ is a potentially time-varying target return the investor wants to match. This strategy, called the target mean-variance portfolio, tries to minimize the overall portfolio variance while at the same time maintaining the desired return (see Markowitz, 1952).
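A minimal sketch of this equality-constrained quadratic programme solves the associated KKT system in a single linear solve; the predictive moments below are made up for illustration:

```python
import numpy as np

def target_mean_variance_weights(sigma, mu, kappa):
    """Solve min_w w' Sigma w subject to w'1 = 1 and w'mu = kappa.

    sigma and mu are the one-step-ahead predictive covariance and mean of a
    given model; kappa is the target portfolio return for the period.
    """
    n = len(mu)
    ones = np.ones(n)
    # Stationarity condition 2*Sigma*w + a*1 + b*mu = 0 stacked on top of the
    # two equality constraints, solved jointly for (w, a, b).
    kkt = np.block([[2.0 * sigma, ones[:, None], mu[:, None]],
                    [ones[None, :], np.zeros((1, 2))],
                    [mu[None, :], np.zeros((1, 2))]])
    rhs = np.concatenate([np.zeros(n), [1.0, kappa]])
    return np.linalg.solve(kkt, rhs)[:n]

# Hypothetical predictive moments for the three cryptocurrencies.
sigma = np.diag([0.04, 0.09, 0.16])
mu = np.array([0.001, 0.002, 0.004])
w = target_mean_variance_weights(sigma, mu, kappa=0.002)
```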
Table 3: Annualized Sharpe ratios across models and portfolio strategies.

Min-variance  Target mean-variance (three increasingly ambitious target returns)
TTVP  2.379  2.900  2.923  2.978 
TVP  2.579  2.015  2.019  2.031 
TVP NG  2.510  2.069  2.053  1.995 
t-TVP NG  2.365  2.452  2.465  2.498
MinnVAR  2.066  0.313  0.243  0.004 
NGVAR  2.023  2.845  2.725  2.312 
SSVS  1.997  2.942  2.948  2.943 
RWSV  2.040  1.399  1.415  1.464 
ARSV  2.201  2.390  2.407  2.453 
Equal weights  2.528  2.528  2.528  2.528 
only BTC  2.419  2.419  2.419  2.419 
Table 3 shows annualized Sharpe ratios for the minimum-variance portfolio strategy as well as for the target mean-variance portfolio strategy under three different target returns. Considering the performance of the minimum variance portfolio (see the first column of Table 3) shows that performance differences across models are relatively small. This indicates that the weights generated by the set of econometric models are similar and, compared to the other strategies, more stable over time. Inspection of the weights (not shown) also suggests that this strategy seldom yields weights above one in absolute value (i.e. leveraged long/short positions). The single best performing model is the no-shrinkage TVP specification, closely followed by the TVP NG model. Notice that using simple equal weights also yields favorable risk/return ratios.
Considering the target mean-variance strategy for different target returns yields more heterogeneous model performances. The two best performing models are the TTVP model and the constant parameter VAR coupled with the SSVS prior. For the TVP VAR and the TVP NG model, performance decreases compared to the minimum variance portfolio strategy, while for the proposed t-TVP NG we observe increasing Sharpe ratios. Comparing the different target returns yields no discernible differences, with most models that do well for modest target returns also performing well when target returns become more ambitious.
Across strategies it is worth noting that a passive investment in Bitcoin only (i.e. setting the corresponding weight equal to one for all $t$) also works well, but one can still improve upon that strategy by considering more flexible portfolio allocation strategies.
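For reference, the annualized Sharpe ratios underlying Table 3 can be computed from a sequence of realized daily portfolio returns as sketched below; the 365-day annualization convention (cryptocurrencies trade every calendar day) and the zero risk-free rate are our assumptions, not taken from the paper:

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=365):
    """Annualized Sharpe ratio from daily portfolio returns.

    Assumes a zero risk-free rate and scales the daily ratio by the square
    root of the number of trading periods per year.
    """
    r = np.asarray(returns)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

# Made-up daily returns from a hypothetical trading strategy.
rng = np.random.default_rng(42)
sr = annualized_sharpe(rng.normal(0.003, 0.02, size=300))
```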
6 Conclusive remarks
In this paper we perform a systematic comparison of univariate and multivariate time series models in terms of predicting one-day-ahead returns for three cryptocurrencies, namely Bitcoin, Litecoin and Ethereum. To match the pronounced degree of volatility observed in daily returns of cryptocurrencies, we propose a medium-scale multivariate state space model that features heavy-tailed measurement errors and stochastic volatility, a combination that turns out to be advantageous for density predictions. More generally, we find that allowing for time-varying parameters and a flexible error distribution pays off only if suitable shrinkage priors are introduced. These priors select the subset of time-varying coefficients in a flexible manner. To gauge the economic significance of our findings we also perform a trading exercise. The results show that models which perform well in forecasting also tend to work well when used to guide investment decisions.
References
Amisano and Geweke (2017) Amisano G and Geweke J (2017) Prediction using several macroeconomic models. Review of Economics and Statistics 99(5), 912–925
Belmonte et al. (2014) Belmonte MA, Koop G and Korobilis D (2014) Hierarchical shrinkage in time-varying parameter models. Journal of Forecasting 33(1), 80–94
Bitto and Frühwirth-Schnatter (2016) Bitto A and Frühwirth-Schnatter S (2016) Achieving shrinkage in a time-varying parameter model framework. arXiv preprint arXiv:1611.01310
Böhme et al. (2015) Böhme R, Christin N, Edelman B and Moore T (2015) Bitcoin: Economics, technology, and governance. The Journal of Economic Perspectives 29(2), 213–238
Carlin et al. (1992) Carlin BP, Polson NG and Stoffer DS (1992) A Monte Carlo approach to nonnormal and nonlinear state-space modeling. Journal of the American Statistical Association 87(418), 493–500
Carriero et al. (2015) Carriero A, Clark TE and Marcellino M (2015) Large vector autoregressions with asymmetric priors. Technical report, School of Economics and Finance, Queen Mary University of London
Carter and Kohn (1994) Carter CK and Kohn R (1994) On Gibbs sampling for state space models. Biometrika 81(3), 541–553
Cheah and Fry (2015) Cheah ET and Fry J (2015) Speculative bubbles in Bitcoin markets? An empirical investigation into the fundamental value of Bitcoin. Economics Letters 130, 32–36
Chiu et al. (2017) Chiu CW, Mumtaz H and Pintér G (2017) Forecasting with VAR models: Fat tails and stochastic volatility. International Journal of Forecasting 33(4), 1124–1143
Chu et al. (2017) Chu J, Chan S, Nadarajah S and Osterrieder J (2017) GARCH modelling of cryptocurrencies. Journal of Risk and Financial Management 10(4), 17
Clark (2011) Clark TE (2011) Real-time density forecasts from Bayesian vector autoregressions with stochastic volatility. Journal of Business & Economic Statistics 29(3)
Clark and Ravazzolo (2015) Clark TE and Ravazzolo F (2015) Macroeconomic forecasting performance under alternative specifications of time-varying volatility. Journal of Applied Econometrics 30(4), 551–575
Cogley and Sargent (2005) Cogley T and Sargent TJ (2005) Drift and volatilities: Monetary policies and outcomes in the post WWII U.S. Review of Economic Dynamics 8(2), 262–302
Diebold et al. (1998) Diebold FX, Gunther TA and Tay AS (1998) Evaluating density forecasts with applications to financial risk management. International Economic Review 39(4), 863
Feldkircher et al. (2017) Feldkircher M, Huber F and Kastner G (2017) Sophisticated and small versus simple and sizeable: When does it pay off to introduce drifting coefficients in Bayesian VARs? arXiv preprint arXiv:1711.00564
Frühwirth-Schnatter (1994) Frühwirth-Schnatter S (1994) Data augmentation and dynamic linear models. Journal of Time Series Analysis 15(2), 183–202
Frühwirth-Schnatter and Wagner (2010) Frühwirth-Schnatter S and Wagner H (2010) Stochastic model specification search for Gaussian and partial non-Gaussian state space models. Journal of Econometrics 154(1), 85–100
Gallant et al. (1997) Gallant AR, Hsieh D and Tauchen G (1997) Estimation of stochastic volatility models with diagnostics. Journal of Econometrics 81(1), 159–192
George et al. (2008) George EI, Sun D and Ni S (2008) Bayesian stochastic search for VAR model restrictions. Journal of Econometrics 142(1), 553–580
Geweke (1994) Geweke J (1994) [Bayesian Analysis of Stochastic Volatility Models]: Comment. Journal of Business & Economic Statistics 12(4), 397–399
Geweke and Amisano (2010) Geweke J and Amisano G (2010) Comparing and evaluating Bayesian predictive distributions of asset returns. International Journal of Forecasting 26(2), 216–230
Geweke and Tanizaki (2001) Geweke J and Tanizaki H (2001) Bayesian estimation of state-space models using the Metropolis-Hastings algorithm within Gibbs sampling. Computational Statistics & Data Analysis 37(2), 151–170
Giannone et al. (2015) Giannone D, Lenza M and Primiceri GE (2015) Prior selection for vector autoregressions. Review of Economics and Statistics 97(2), 436–451
Gordon and Smith (1990) Gordon K and Smith A (1990) Modeling and monitoring biomedical time series. Journal of the American Statistical Association 85(410), 328–337
Griffin et al. (2010) Griffin JE and Brown PJ (2010) Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 5(1), 171–188
Harrison and Stevens (1976) Harrison PJ and Stevens CF (1976) Bayesian forecasting. Journal of the Royal Statistical Society, Series B (Methodological), 205–247
Huber and Feldkircher (2017) Huber F and Feldkircher M (2017) Adaptive shrinkage in Bayesian vector autoregressive models. Journal of Business & Economic Statistics, 1–13
Huber et al. (2017) Huber F, Kastner G and Feldkircher M (2017) A new approach toward detecting structural breaks in vector autoregressive models. arXiv preprint arXiv:1607.04532v3
Jacquier et al. (2004) Jacquier E, Polson NG and Rossi PE (2004) Bayesian analysis of stochastic volatility models with fat-tails and correlated errors. Journal of Econometrics 122(1), 185–212
Kastner (2015a) Kastner G (2015a) Dealing with stochastic volatility in time series using the R package stochvol. Journal of Statistical Software. URL http://cran.r-project.org/web/packages/stochvol/vignettes/article.pdf
Kastner (2015b) Kastner G (2015b) Dealing with stochastic volatility in time series using the R package stochvol. Journal of Statistical Software, forthcoming
Kastner (2015c) Kastner G (2015c) Heavy-tailed innovations in the R package stochvol. ePubWU Institutional Repository
Kastner and Frühwirth-Schnatter (2014) Kastner G and Frühwirth-Schnatter S (2014) Ancillarity-sufficiency interweaving strategy (ASIS) for boosting MCMC estimation of stochastic volatility models. Computational Statistics & Data Analysis 76, 408–423
Markowitz (1952) Markowitz H (1952) Portfolio selection. The Journal of Finance 7(1), 77–91
Primiceri (2005) Primiceri GE (2005) Time varying structural vector autoregressions and monetary policy. The Review of Economic Studies 72(3), 821–852
Urquhart (2017) Urquhart A (2017) Price clustering in Bitcoin. Economics Letters 159, 145–148
West (1987) West M (1987) On scale mixtures of normal distributions. Biometrika 74(3), 646–648