Bayesian analysis of predictive NonHomogeneous hidden Markov models using PólyaGamma data augmentation^{1}^{1}1The authors acknowledge financial support from the Research Funding Program Large ShocksAristeia II5413 which is cofinanced by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program ”Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF).
Abstract
We consider NonHomogeneous Hidden Markov Models (NHHMMs) for forecasting univariate time series. We introduce two state NHHMMs where the time series are modeled via different predictive regression models for each state. Also, the timevarying transition probabilities depend on exogenous variables through a logistic function. In a hidden Markov setting, inference for logistic regression coefficients becomes complicated and in some cases impossible due to convergence issues. To address this problem, we use a new latent variable scheme, that utilizes the PólyaGamma class of distributions, introduced by Polson et al. (2013). Given an available set of predictors, we allow for model uncertainty regarding the predictors that affect the series both linearly – in the mean – and nonlinearly – in the transition matrix. Predictor selection and inference on the model parameters are based on a MCMC scheme with reversible jump steps. Singlestep and multiplestepsahead predictions are obtained based on the most probable model, median probability model or a Bayesian Model Averaging (BMA) approach. Simulation experiments, as well as an empirical study on real financial data, illustrate the performance of our algorithm in various setups, in terms of mixing properties, model selection and predictive ability.
Keywords: Non Homogeneous Hidden Markov Models; Model selection; Bayesian model averaging; Forecasting; PólyaGamma Data Augmentation, Continuous Ranked Probability Score
1 Introduction
This paper follows and extends the work of Meligkotsidou and Dellaportas (2011) by constructing a broad class of discretetime, finite statespace, NonHomogeneous Hidden Markov Models (NHHMMs) that can be used for modeling and predicting univariate timeseries. We introduce a twostates NHHMM where the time series are modeled via different predictive regression models for each state, whereas the transition probabilities are modeled via logistic regressions. Given an available set of predictors we allow for model uncertainty, regarding the predictors that affect the series both linearly, that is directly in the mean regressions, and nonlinearly, that is in the transition matrix.
Discretetime finite statespace Homogeneous Hidden Markov models (HHMMs) have been extensively studied and widely used to model stochastic processes consisting of an observed process and a latent (hidden) sequence of states which is assumed to affect the observation sequence, see for example Cappé et al. (2005) and Billio et al. (1999). We refer to Scott (2002) and Rydén (2008) for a review of a Bayesian approach of HHMMs. Bayesian inference, using Markov chain Monte Carlo (MCMC) techniques, has enhanced the applicability of HHMMs and has led to the construction of more complex model specifications including NHHMMs. Initially Diebold et al. (1994) studied the twostate Gaussian NHMMs where the time varying transition probabilities were modeled via logistic functions and their approach was based on the ExpectationMaximization algorithm (EM). Filardo and Gordon (1998) adopted a Bayesian perspective to overcome technical and calculation issues of classical approaches. Since then, various Bayesian methods have been proposed in the literature. For example Spezia (2006) modeled the timevarying transition probabilities via a logistic function depending on exogenous variables and performed model selection based on the Bayes factor. In the same spirit, Meligkotsidou and Dellaportas (2011) considered a mstage NHMM and assumed that the elements of the transition matrix are linked through exogenous variables with a multinomial logistic link whereas the observed process conditional on the unobserved process follows an autoregressive process of order p.
This paper considers inference and variable selection for predictive NHHMMs which exploit a set of available predictors by allowing them to affect the transition probabilities and/or the statespecific predictive regressions. We perform Bayesian inference for the proposed models based on MCMC. Our MCMC scheme aims at overcoming difficulties and solving convergence issues arising with existing MCMC algorithms, thus offering an improved inferential procedure for estimation and variable selection in NHHMMs. We exploit the missing data representation of hidden Markov models and construct an MCMC algorithm based on data augmentation, consisting of several steps. First, the latent sequence of states is sampled via the Scaled ForwardBackward algorithm (Scott, 2002), which is a modification of the ForwardBackward algorithm of Baum et al. (1970) who used it to implement the classical EM algorithm. Then we simulate the parameters of the mean predictive regression model for each stage, via Gibbs sampling step and following Meligkotsidou and Dellaportas (2011) we use a logistic regression representation of the transition probabilities. Using the data augmentation scheme of Holmes and Held (2006) as Meligkotsidou and Dellaportas (2011), may result in serious convergence issues, especially in cases that there exists model uncertainty. FrühwirthSchnatter and Frühwirth (2010) give a detailed comparison between various methods for dealing with binary and multinomial logit data and argue about the efficiency of Holmes and Held (2006) data augmentation scheme. Exploring various options, to deal with this problem we decided to use the PólyaGamma data augmentation scheme of Polson et al. (2013) which has a significantly improved performance. The recent work of Holsclaw et al. (2017) confirms that using PólyaGamma data augmentation to parametrize the transition probabilities, the algorithm mixes well and gives adequate estimates of the parameters. Finally, we account for model uncertainty by using the wellknown model selection method of Green (1995), the reversible jump algorithm. We perform a couple of reversible jump steps within our algorithm since we allow for different covariates affecting the mean equation and the transition probabilities.
Regarding stochastic variable selection in the transition matrix of the NHHMM we are the first to perform a reversible jump algorithm using the PólyaGamma data augmentation representation for the logistic regression coefficients. Recently, Holsclaw et al. (2017) proposed a model similar to ours for modeling multivariate meteorological time series data. In that paper, the hidden states are modeled via multinomial logistic regression affected by exogenous variables. The authors use the BIC criterion for choosing the best model among some prefixed models.
Apart from developing a stable algorithm for inferring NHHMMs in the presence of model uncertainty, another main goal of our work is to apply our model to forecasting real financial data. Financial data sets are characterized by large volatilities due to high market uncertainties. We argue that the logarithmic scoring rule for evaluating different models and their predictive accuracy is not appropriate, since the data have multimodal and fattailed distributions. We propose the use of Continuous Ranked Probability Score as a better alternative for assessing the quality of forecasts as well as for validating the model performance.
The paper proceeds as follows. In Section 2 we briefly describe the proposed model and in Section 3 we present analytically the Bayesian inference for this model, specifically for the model with a fixed number of predictors (covariates), as well as for the model with unknown number of predictors. Then, in Section 4, we present the forecasting criteria we used to asses the predictive ability of our method. To check our methodology we performed extensive experiments with simulated data. We provide in Section 5 some indicative case studies. We illustrate our results regarding the variable selection and forecasting evaluation and we make comparisons with other simpler models such as the Homogeneous Hidden Markov Model and the regression model with autoregressive terms. Next, we apply our methodology in monthly realized volatility data set (Section 6). Finally a summary of our work and possible extensions are discussed in LABEL:Discussion.
2 The NonHomogeneous Hidden Markov Model
In this section, we present the proposed NonHomogeneous Hidden Markov Model (NHHMM) for modeling univariate time series. Consider an observed random process and a hidden underlying process which is a twostate nonhomogeneous discretetime Markov chain that determines the states of the observed process. Let and be the realizations of the observed random process and of the hidden process , respectively. We assume that at time , depends on the current state and not on the previous states. Consider also a set of available predictors with realization at time . A subset of the predictors is used in the regression model for the observed process and a subset is used to model/describe the dynamics of the timevarying transition probabilities. Thus, we allow the covariates to affect the observed process in a nonlinear fashion.
In the case of a univariate random process , the NHHMM can be written in the form
where and . We use to denote the normal distribution with mean and variance . In a less formal way, if represents the hidden states, the observed series given the unobserved process has the form
The dynamics of the unobserved process can be described by the timevarying transition probabilities, which depend on the predictors and are given by the following relationship
where is the vector of the logistic regression coefficients to be estimated and is the set of covariates that affect the transition probabilities. Note that for identifiability reasons we adopt the convention of setting, for each row of the transition matrix, one of the ’s to be a vector of zeros. The unknown quantities of the NHHMM are , that is the parameters in the mean predictive regression equation and the parameters in the logistic regression equation for the transition probabilities of the unobserved process , . Our model can easily be generalized into an mstate NHHMM using almost the same methodology as the one that we describe.
3 Bayesian Inference and Computational Strategy
This section presents the Bayesian approach to inference for the NonHomogeneous Hidden Markov model. The key steps in our proposed framework are as follow. First, for a given Hidden Markov model with timevarying probabilities, we construct a Markov chain which has as a stationary distribution the posterior distribution of the model parameters. Simulation of this Markov chain provides, after some burnin period and adequately many iterations, samples from the posterior distribution of interest; see, for details, Besag et al. (1995). Second, for a given set of competing models each including a different set of predictors in the mean regression and/or in the transition probabilities equation, we base our inference about the models on their posterior probabilities. Thus, we avoid the usual approach which considers the models separately and chooses the best model via significance tests or via model selection criteria (AIC, BIC, DIC, WBIC).
3.1 Inference for fixed sets of predictors
Below we provide detailed guidelines on how to estimate the parameters of a given NHHMM, i.e. for fixed sets of predictors and . The proposed approach to Bayesian inference on the parameters of the NHHMM is based on constructing a Markov chain Monte Carlo algorithm which updates, in turn, the mean regression parameters, the logistic regression coefficients, and the latent variables . Let be the history of the observed process, the sequence of states up to time , and let denote the normal probability density function of , and the initial distribution of . The joint likelihood function of the observed data, , and the unobserved sequence of states, , is given by
If a prior distribution is specified for the model parameters, then inference on all the unknown quantities in the model is based on their joint posterior distribution
For the parameters in the mean predictive regression equation, we use conjugate prior distributions, i.e.
where denotes the InvertedGamma distribution. To make inference about the logistic regression coefficients we use the auxiliary variables method of Polson et al. (2013) as described in Subsection 3.1.1. Given an auxiliary variable , a conjugate prior for the logistic regression coefficients , is multivariate normal distribution .
The joint likelihood of the observed data and the hidden states is
We use the notation for the number of times the chain was in state , that is , the with the indicator function.
The MCMC sampling scheme is constructed with recursive updates of (i) the latent variables given the current value of the model parameters by using the scaled Forwardbackward algorithm (Scott (2002)) (ii) the logistic regression coefficients by adopting the auxiliary variables method of Polson et al. (2013) given the sequence of states , and (iii) the mean regression coefficients conditional on by using the Gibbs sampling algorithm.
3.1.1 Simulation of the logistic regression coefficients
In a twostate NHHMM, as Meligkotsidou and Dellaportas (2011) observe, we can model the two diagonal elements of probability transition matrix by linking them to the set of covariates using a logistic link. However, the algorithm adopting the auxiliary representation of Holmes and Held (2006) in the method of Meligkotsidou and Dellaportas (2011) does not converge in some cases, especially if there is model uncertainty in transition probabilities. Many dataaugmentation or MetropolisHastings algorithms are proposed to model the logistic regression model, see for example O’Brien and Dunson (2004); Fussl et al. (2013); Polson et al. (2013). We follow Polson et al. (2013) since as shown in their work, using PólyaGamma data augmentation gives superior results, it terms of convergence and mixing, among all the competing models.
Given the unobserved (latent) data we define . In words is the number of times where the chain was at the same stage for two consecutive time periods. Then,
Polson et al. (2013) proved that binomial likelihoods (thus Bernoulli likelihoods in our simpler case) parametrized by log odds can be represented as mixtures of Gaussian distributions with respect to PólyaGamma distribution. The main result of Polson et al. (2013) is that letting be the density of a latent variable with for , the following integral identity holds for all :
where . Furthermore, the conditional distribution of is also PólyaGamma, . Using the previous result and setting as a set of latent variables the likelihood for each state is
Conditioning on , one can derive the proportion
Assuming as prior distributions and , simulation from the posterior distribution can be done iteratively in two steps:
where denotes the PólyaGamma distribution and .
3.1.2 Simulation of the mean equation parameters using the Gibbs algorithm
We used conditionally conjugate prior distributions on the parameters of the mean predictive distribution, i.e.
After some straightforward algebra we derive the conditional and the marginal posterior distribution for the state specific parameters and ,
with
3.2 Inference under model uncertainty
Here, we consider the full model comparison problem where the uncertainty about which predictors should be included in the mean regression model is taken into account together with the uncertainty about the predictors that should be included in the transition probability equation. The proposed model is flexible, since we do not decide a priori which covariates affect the observed or the unobserved process. Instead, we have a common pool of covariates and within the MCMC algorithm we gauge which covariates are included in subset affecting the mean predictive equation of the observed process, and which covariates are included in subset affecting the timevarying transition probabilities. It is worth mentioning that within the framework of the proposed model, a set of covariates implies that there are possible models, hence the model selection problem becomes complicated.
Different approaches have been used in the literature to cope with the model selection problem. The use of information criteria, such as Akaike’s Information Criterion (AIC, Akaike et al. (1973)), Bayesian Information Criterion (BIC) of Schwarz (1978) or Deviance Information Criterion (DIC, Spiegelhalter et al. (2002)) or Widely applicable Bayesian Information Criterion (WBIC, Watanabe (2013)), is another approach to variable selection.
We propose a probabilistic approach to inference, which is based on the calculation of the posterior distribution of different NHHMMs or equivalently on computing the posterior probabilities of different hidden Markov models. Posterior probabilities can be used either for selecting the most probable model (i.e. making inference using the model with the highest posterior probability), or for Bayesian model averaging (i.e. producing inferences averaged over different NHHMM). Barbieri and Berger (2004) argue that the optimal predictive model is not necessarily the model with highest posterior probability but the median probability model, which is defined as the model consisting of those covariates which have overall posterior probability of being included in the model (inclusion probability) greater or equal to 0.5. Our method allows us to calculate the posterior probability of the model as well as the probabilities of inclusion. To this end, we develop a Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (Green (1995),Green and Hastie (2009)) which explores the model space by jumping between different hidden Markov models. A more applied tutorial based on the lines of Green (1995), can be found Waagepetersen and Sorensen (2001). A study for comparing variable selection methods is well presented in O’Hara and Sillanpää (2009) whilst Dellaportas et al. (2002) study the variable selection methods in the context of model choice.
As Green and Hastie (2009) noticed, reversible jump MCMC is in fact a MetropolisHastings algorithm, formulated to allow sampling from a distribution on a union of spaces of differing dimension and to permit statedependent choice of move type. Suppose that a prior is specified over models in a countable set and for each we are given a prior distribution along with a likelihood for data . The joint prior for and is and obviously the joint posterior distribution is
The standard formulation of the MetropolisHastings algorithm relies on the construction of a timereversible Markov chain, that satisfies the detailed balance condition. This condition means that the probability that the state of a chain is in a general set and moves to a general set is the same with the probability that the state is in the set and moves to set . Note that this is a simple way to ensure that the limiting distribution of the chain is the desired target distribution. Define as the state and state space of chain. At the current state we generate random numbers from a known joint density . The new state of chain is constructed by some suitable deterministic function such that . The inverse move is made by generating from a suitable known density and then using the inverse function of to move from to . In practice, the construction of proposal moves between different models is achieved via the concept of ’dimension matching’. Specifically, if the dimensions of and are and respectively, then . If the move from to is accepted with probability then, the reverse move is accepted with probability . Under this formulation the detailed balance condition is
If the transformation from to and its inverse are differentiable then the equality holds if
where is the Jacobian of the transformation. Thus, the acceptance probability , for moving is
Holsclaw et al. (2017) recently introduced NHHMMs using the Pólyagamma latent data method. As a model selection strategy, they use BIC values to choose the best model among a prespecified set of models. We apply a reversible jump algorithm to account model uncertainty. Due to the fact that in the proposed model there is uncertainty both in the mean predictive equation, as well as in the transition probability equation, we will have to perform two reversible jump steps within our MCMC algorithm. Let us now assume that in a reversible jump step we propose a move type from to . Let denote the probability that move is attempted at state and the probability of the reverse move attempted at state . We accept the proposed move with probability where
We adopt the proposal of Meligkotsidou and Dellaportas (2011) to implement the above algorithm. In each step, we choose to add or remove one covariate with probability and then we randomly choose which covariate we will add/remove. We propose a new value for the mean equation coefficients or for the regression equation coefficients from the full conditional posterior density, conditionally on the other coefficients, thus the Jacobian of transformation will be equal to unity. To be more specific, if we want to update the covariates in the mean equation, the proposal distribution is just the product of the two conditional posterior distributions for derived in Subsection 3.1.2 given and if we want to update the covariates that affect the transition matrix, the proposal distribution is the product of conditional normal distributions derived in Subsection 3.1.1 given for . With some straightforward matrix algebra, the acceptance probability for the mean equation is and the acceptance probability for the transition matrix is where
and
3.3 MCMC Sampling Scheme
In the next lines, we summarize the predictive MCMC algorithm that we have constructed for joint inference on model specification and model parameters.

Start with initial values of .

Calculate the probabilities of timevarying transition matrix for .

Run a Scaled ForwardBackward (Scott (2002)) algorithm to simulate the hidden states given the parameters of the model.

Simulate the parameters of the regression model for the mean via a Gibbs sampler method.

Simulate the coefficients using the Pólyagamma representation by Polson et al. (2013).

Do a double reversible jump algorithm to define which covariates will affect the transition matrix and which covariates will affect the mean regression model.

Make onestepahead predictions conditional on the simulated unknown quantities.
Repeat steps 36 until convergence and then repeat steps 37.
4 Bayesian Forecasting and Scoring rules
4.1 Onestepahead predictions
The proposed modeling and inferential approach is used for forecasting. The posterior predictive density can not be found in closed form, but it can instead be evaluated numerically. Given model , the predictive distribution of is
where
In practice we follow an iterative procedure within the MCMC algorithm to draw a sample from the posterior predictive distribution. At the th iteration of our algorithm, the algorithm chooses model . Also, the hidden states and the unknown parameters are simulated as described before. Let be the hidden state is at time . To make an onestepahead prediction (i.e. simulate ), we first simulate the hidden state for time , from the discrete distribution based on the transition probabilities and then conditional on the hidden state we draw a value from
Given , the hidden state and the covariates , we update the transition matrix , simulate and simulate the prediction from the predicting distribution, in sequence.
4.2 Forecasting criteria
One way to evaluate a model is through the accuracy of its forecasts (Geweke and Whiteman (2006)). As Gelman et al. (2014) observe, predictive accuracy is valued not only for its own sake but also for comparing different models. Advances in numerical integration via MCMC algorithms made probabilistic forecasts possible, which are in most cases, preferable. Besides, having the posterior predictive distribution, one can obtain a point effective forecasts, using suitable scoring functions (Gneiting (2011)).
Scoring rules provide summary measures for the evaluation of probabilistic forecasts, by assigning a numerical score based on the forecast and on the event or value that materializes. We refer to Gneiting and Raftery (2007) for a review on the theory and properties of scoring rules. If the forecaster quotes the predictive distribution and the event materializes, then his reward is . If is the forecaster judgment then is the expected value of his reward under . A scoring rule is said to be proper if and if the equality holds if and only if , then the rule is strictly proper. Intuitively, proper scoring rules encourage a forecaster to report the truth about his judgment distribution and they are strictly proper is the issued forecasts are the same with the forecaster’s judgment.
There are various proper scoring rules for continuous variables, such as the probabilistic score, the Linear Score (LinS), the Quadratic Score (QS), the Logarithmic Score (LogS) and the Continuous Ranked Probability Score (CRPS), among others (Machete (2013); Gneiting and Raftery (2007)). Probabilistic and Linear score are often used, nevertheless they are improper and can have misleading results. A widely used and quite powerful criterion is the Logarithmic Score. It is based on the posterior predictive density evaluated at the observed value and there is a strong connection with the classical KullbackLeibler divergence, which is one of the reasons for being studied quite intensively (see for example Gelman et al. (2014), GschlÃ¶Ãl and Czado (2007)). The Logarithmic Score is defined as follows: Let be the total sample size and the number of outofsample forecasts. For let be the real observed values of the forecasts, the estimated forecasts and the unknown quantities. Finally, let be the number of draws of the simulation. Using the notation for the posterior predictive density of the new data, for the distribution of the true model and for the likelihood of the data, the predictive fit for the th observation would be,
Having in mind though that the new data are unknown Gelman et al. (2014) defined the expected outofsample log predictive density for the th observation as
To compute the predictive density for estimated forecasts in practice, since is unknown, we shall calculate the
Although the Logarithmic Score is a strictly proper rule, it lacks robustness as it involves harsh penalty for low probability events and thus is sensitive to extreme cases (Boero et al. (2011)). Besides, comparing the entropies of the forecasts, Machete (2013) showed that LS prefers the forecast density that is less informative. As Gneiting and Raftery (2007) notice, measures which are not sensitive to distance give no credit for assigning high probabilities to values near but not identical to the one materializing. Sensitivity to distance seems desirable when predictive distributions tend to be multimodal, which is the case for our model. To deal with this, one could calculate the Continuous Ranked Probability Score which is based on the cumulative predictive distribution. Recently, Boero et al. (2011) made a comparison between LS, CRPS and QS and they argued that when density forecasts are collected in histogram format, then the ranked probability score has advantages over the other scoring rules. Note than when the forecast is deterministic, CRPS is reduced to the Mean Absolute Forecast Error and therefore it is easily interpretable and provides a direct way for comparing various deterministic and probabilistic forecasts.
The posterior predictive cumulative density function is defined as , where is the forecast of interest and y is the history of the predictive quantity. Using the same notation as in Logarithmic score, the CRPS for the real observed value is defined as,
where denotes a step function along the real line that attains the value 1 if and the value 0 otherwise, is the cumulative distribution of the real value and is the known Heaviside function (Hersbach (2000))
As we described above, at each iteration of the MCMC algorithm, we obtain a sample of length from the predictive distributions of . Hence we can evaluate both the posterior predictive probability and cumulative distribution function. Instead,there is an easier way of evaluating CRPS using the identity of SzÃ©kely and Rizzo (2005).
were are independent copies of a random variable with distribution function (see also GschlÃ¶Ãl and Czado (2007)).
Finally, for the sake of completeness we estimate the mean, mode and median of the predictive distribution and compare them using two well known point forecasting criteria, the Mean Square Forecast Error, and the Mean Absolute Forecast Error, . Based on the MCMC sample, the computed point forecasting criteria is just the average score of MSFE or MAFE for all the sample values.
5 Simulation Study
We have tested our method using a number of simulated data sets. We conducted a series of simulation experiments to validate the superiority of our method. We tested quite extensively our algorithms, accounting for model uncertainty or not, using different sample sizes and assigning various values to the parameters. In all our experiments our model outperforms the competing models in forecasting the observed process . We compared our model with a Homogeneous Hidden Markov model and a linear model with autoregression terms (qAR). Although considered standard, for the sake of completeness, we give the definitions of HHM and qAR models. In the Homogeneous Hidden Markov model, covariates affect the mean equation but the transition probability matrix is constant. That is,
In the linear regression with autocovariates, besides covariates , lagged values of are assumed to affect the mean equation. Specifically,
The data were generated either from a Homogeneous Hidden Markov model (HHMM) or from a NHHMM model with independent normal distributed covariates. We found that the mean equation parameters converge rapidly whereas the logistic regression coefficients needed some burn in period to converge. The hidden chain was well estimated. For each iteration we kept a replication of the hidden chain (thus the number of the hiddenchain replications is the same with the MCMC sample size) and compared it with the real simulated hidden chain, using a 10 loss function. Also the reversible jump algorithm behaved really well in all of our simulations. Even in the case where the data were homogeneous, no covariates were selected in the logistic regression equation, thus the transition probability matrix remained constant through time. Furthermore, in order to test the predictive ability of our model we kept outofsample observations and we calculated the CRPS for every observation. We confirm that our forecasts were more accurate than the forecasts of the competing models. A summary of our comparisons are presented and discussed in the following subsections.
5.1 The fixed model
Below, we present two experiments with simulated data using the fixed NHHMM. For all our experiments we used noninformative priors for the unknown parameters , that is , and finally . Inferences are based on an MCMC sample of 15000 iterations after 10000 burn in period. In the first experiment we used a sample of observations. From a common pool of covariates we chose to affect the mean equation and to affect the transition matrix. The mean equation parameters were and whereas the logistic regression coefficients where and for states respectively. Secondly, we run another experiment with sample size . We used 3 covariates affecting the mean equation and the transition matrix. The mean equation parameters were and whereas the logistic regression coefficients where and for states respectively. For both experiments, we kept 10 outofsample observations, we computed a sequence of onestepahead forecasts of the real observed process. For completenes we present in Section 7 various plots that confirm the estimation performance and convergence of our methodology, parameter estimates and summary statistics of these experiment studies (Tables 9 and 7) and a comparison of the three competing models in terms of forecasting (Tables 10 and 8).
5.2 Model selection
In this section we present two case studies (experiments 3 and 4). We compared outputs of three models (NHHMM, HHMM, qAR) with model uncertainty. We performed stochastic variable selection with reversible jump for all the competing models. As in Subsection 5.1we used the same non informative priors and burnin period. For the first experiment we simulated data of size . We used the same two covariates affecting the mean equation and transition matrix via the logistic regression. To check the variable selection ability of our method we included three more covariates in the common pool of predictors. Next, we present a second, more complicated, experiment. We used 3 covariates affecting the mean equation and 5 covariates affecting the transition probability matrix. In Tables 3 and 1 we show that the most probable model and the Median probability model is the true one for the two experiments. Additionally, as in the fixed model case studies we kept 10 outofsample observations for forecasting evaluation exercise and we confirmed that our model performed better than the other models in terms of CRPS, MAFE and MSFE (see Tables 4 and 2). Finally, for each MCMC iteration we kept a replicated chain of the hidden process and we compared it with the true simulated chain. Using the 01 Loss function we computed the mean misestimated states in each chain. On average from the chain with 1200 hidden states of the first experiment, our model failed to recognize 22 states per iteration and on average from the chain of 1500 hidden states of the second experiment we failed to recognize 8 states. We note that we did not encounter any label switching problems.
In the following pages we present the mean estimated probabilities (for being at the same stage two consecutive time periods, that is or ) of the transition matrix and the differences with the true simulated probabilities (Figure 7) for the first case study and in Figure 11 for the second case study. A realization of the observed process and hidden process as well as one realization of the simulated hidden process for the two case studies presented in Figures 12 and 8 respectively. Lastly for each experiment, the predictive distribution using a kernel density function is plotted, for both NHHM and HHM models, are presented in Figures 14 and 10 and the quantiles of the predictive distributions in Figures 13 and 9.
True model  

Included Covariates, Mean  
Included Covariates, Transition Matrix  
Total Covariates  
Highest Posterior Probability model  
Included Covariates, Mean  
Included Covariates, Transition Matrix  
Posterior probability  0.9958 
Median Probability model  
Included Covariates, Mean  
Included Covariates, Transition Matrix 
Probabilistic Forecasting Criterion  

NHHMM  HHMM  qAR  
0.2428  0.2735  6.0555  
0.7669  5.5975  6.2297  
0.9958  2.6073  6.0018  
2.3435  5.1726  6.6054  