Bayesian analysis of predictive Non-Homogeneous Hidden Markov models using Pólya-Gamma data augmentation*

*The authors acknowledge financial support from the Research Funding Program Large Shocks-Aristeia II-5413, which is co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF).

Constandina Koki (kokiconst@aueb.gr), Department of Statistics, Athens University of Economics and Business; Loukia Meligkotsidou (meligots@math.uoa.gr), Department of Mathematics, National and Kapodistrian University of Athens; Ioannis Vrontos (vrontos@aueb.gr), Department of Statistics, Athens University of Economics and Business
Abstract

We consider Non-Homogeneous Hidden Markov Models (NHHMMs) for forecasting univariate time series. We introduce two-state NHHMMs where the time series are modeled via different predictive regression models for each state, and the time-varying transition probabilities depend on exogenous variables through a logistic function. In a hidden Markov setting, inference for the logistic regression coefficients becomes complicated, and in some cases impossible, due to convergence issues. To address this problem, we use a new latent variable scheme that utilizes the Pólya-Gamma class of distributions, introduced by Polson et al. (2013). Given an available set of predictors, we allow for model uncertainty regarding the predictors that affect the series both linearly – in the mean – and non-linearly – in the transition matrix. Predictor selection and inference on the model parameters are based on an MCMC scheme with reversible jump steps. One-step-ahead and multi-step-ahead predictions are obtained based on the most probable model, the median probability model or a Bayesian Model Averaging (BMA) approach. Simulation experiments, as well as an empirical study on real financial data, illustrate the performance of our algorithm in various setups, in terms of mixing properties, model selection and predictive ability.

Keywords: Non-Homogeneous Hidden Markov Models; Model selection; Bayesian model averaging; Forecasting; Pólya-Gamma data augmentation; Continuous Ranked Probability Score

1 Introduction

This paper follows and extends the work of Meligkotsidou and Dellaportas (2011) by constructing a broad class of discrete-time, finite state-space, Non-Homogeneous Hidden Markov Models (NHHMMs) that can be used for modeling and predicting univariate time series. We introduce a two-state NHHMM where the time series are modeled via different predictive regression models for each state, whereas the transition probabilities are modeled via logistic regressions. Given an available set of predictors, we allow for model uncertainty regarding the predictors that affect the series both linearly, that is directly in the mean regressions, and non-linearly, that is in the transition matrix.

Discrete-time finite state-space Homogeneous Hidden Markov Models (HHMMs) have been extensively studied and widely used to model stochastic processes consisting of an observed process and a latent (hidden) sequence of states which is assumed to affect the observation sequence; see for example Cappé et al. (2005) and Billio et al. (1999). We refer to Scott (2002) and Rydén (2008) for a review of the Bayesian approach to HHMMs. Bayesian inference, using Markov chain Monte Carlo (MCMC) techniques, has enhanced the applicability of HHMMs and has led to the construction of more complex model specifications, including NHHMMs. Initially, Diebold et al. (1994) studied two-state Gaussian NHHMMs where the time-varying transition probabilities were modeled via logistic functions, with an approach based on the Expectation-Maximization (EM) algorithm. Filardo and Gordon (1998) adopted a Bayesian perspective to overcome technical and computational issues of classical approaches. Since then, various Bayesian methods have been proposed in the literature. For example, Spezia (2006) modeled the time-varying transition probabilities via a logistic function depending on exogenous variables and performed model selection based on the Bayes factor. In the same spirit, Meligkotsidou and Dellaportas (2011) considered an m-state NHHMM in which the elements of the transition matrix are linked to exogenous variables through a multinomial logistic link, whereas the observed process conditional on the unobserved process follows an autoregressive process of order p.

This paper considers inference and variable selection for predictive NHHMMs which exploit a set of available predictors by allowing them to affect the transition probabilities and/or the state-specific predictive regressions. We perform Bayesian inference for the proposed models based on MCMC. Our MCMC scheme aims at overcoming difficulties and solving convergence issues arising with existing MCMC algorithms, thus offering an improved inferential procedure for estimation and variable selection in NHHMMs. We exploit the missing data representation of hidden Markov models and construct an MCMC algorithm based on data augmentation, consisting of several steps. First, the latent sequence of states is sampled via the Scaled Forward-Backward algorithm (Scott, 2002), which is a modification of the Forward-Backward algorithm of Baum et al. (1970), who used it to implement the classical EM algorithm. Then, we simulate the parameters of the mean predictive regression model for each state via a Gibbs sampling step and, following Meligkotsidou and Dellaportas (2011), we use a logistic regression representation of the transition probabilities. Using the data augmentation scheme of Holmes and Held (2006), as Meligkotsidou and Dellaportas (2011) do, may result in serious convergence issues, especially in cases where there exists model uncertainty. Frühwirth-Schnatter and Frühwirth (2010) give a detailed comparison between various methods for dealing with binary and multinomial logit data and discuss the efficiency of the Holmes and Held (2006) data augmentation scheme. Having explored various options to deal with this problem, we decided to use the Pólya-Gamma data augmentation scheme of Polson et al. (2013), which has a significantly improved performance. The recent work of Holsclaw et al. (2017) confirms that, when the Pólya-Gamma data augmentation is used to parametrize the transition probabilities, the algorithm mixes well and gives adequate estimates of the parameters. Finally, we account for model uncertainty by using the well-known model selection method of Green (1995), the reversible jump algorithm. We perform two reversible jump steps within our algorithm, since we allow for different covariates affecting the mean equation and the transition probabilities.

Regarding stochastic variable selection in the transition matrix of the NHHMM, we are the first to perform a reversible jump algorithm using the Pólya-Gamma data augmentation representation for the logistic regression coefficients. Recently, Holsclaw et al. (2017) proposed a model similar to ours for modeling multivariate meteorological time series data. In that paper, the hidden states are modeled via multinomial logistic regressions affected by exogenous variables, and the authors use the BIC criterion for choosing the best model among some prefixed models.

Apart from developing a stable algorithm for inferring NHHMMs in the presence of model uncertainty, another main goal of our work is to apply our model to forecasting real financial data. Financial data sets are characterized by large volatilities due to high market uncertainty. We argue that the logarithmic scoring rule for evaluating different models and their predictive accuracy is not appropriate, since the data have multimodal and fat-tailed distributions. We propose the use of the Continuous Ranked Probability Score (CRPS) as a better alternative for assessing the quality of forecasts, as well as for validating the model performance.

The paper proceeds as follows. In Section 2 we briefly describe the proposed model, and in Section 3 we present in detail the Bayesian inference for this model, both with a fixed number of predictors (covariates) and with an unknown number of predictors. Then, in Section 4, we present the forecasting criteria used to assess the predictive ability of our method. To check our methodology, we performed extensive experiments with simulated data; we provide in Section 5 some indicative case studies. We illustrate our results regarding variable selection and forecasting evaluation and make comparisons with simpler models, such as the Homogeneous Hidden Markov Model and the regression model with autoregressive terms. Next, we apply our methodology to a monthly realized volatility data set (Section 6). Finally, a summary of our work and possible extensions are discussed in the concluding section.

2 The Non-Homogeneous Hidden Markov Model

In this section, we present the proposed Non-Homogeneous Hidden Markov Model (NHHMM) for modeling univariate time series. Consider an observed random process $\{Y_t\}$ and a hidden underlying process $\{Z_t\}$, which is a two-state non-homogeneous discrete-time Markov chain that determines the states of the observed process. Let $y_t$ and $z_t$ be the realizations of the observed random process $\{Y_t\}$ and of the hidden process $\{Z_t\}$, respectively. We assume that at time $t$, $y_t$ depends on the current state $z_t$ and not on the previous states. Consider also a set of available predictors $\{X_t\}$ with realization $x_t$ at time $t$. A subset $X^{(1)}$ of the predictors is used in the regression model for the observed process and a subset $X^{(2)}$ is used to describe the dynamics of the time-varying transition probabilities. Thus, we allow the covariates to affect the observed process in a non-linear fashion.

In the case of a univariate random process $\{Y_t\}$, the NHHMM can be written in the form

$$Y_{t+1} \mid Z_{t+1} = s \;\sim\; N\!\left(x_t^{(1)} B_s,\; \sigma_s^2\right), \qquad s \in \{1, 2\},$$

where $B_s$ and $\sigma_s^2$ are the state-specific regression coefficients and variance. We use $N(\mu, \sigma^2)$ to denote the normal distribution with mean $\mu$ and variance $\sigma^2$. In a less formal way, if $z_{t+1}$ represents the hidden state, the observed series given the unobserved process has the form

$$y_{t+1} = \begin{cases} x_t^{(1)} B_1 + \varepsilon_{t+1}, & \varepsilon_{t+1} \sim N(0, \sigma_1^2), & \text{if } z_{t+1} = 1,\\ x_t^{(1)} B_2 + \varepsilon_{t+1}, & \varepsilon_{t+1} \sim N(0, \sigma_2^2), & \text{if } z_{t+1} = 2.\end{cases}$$

The dynamics of the unobserved process can be described by the time-varying transition probabilities, which depend on the predictors and are given by the following relationship

$$P(Z_{t+1} = s \mid Z_t = s) = p^{(t)}_{ss} = \frac{\exp\left(x_t^{(2)} \beta_s\right)}{1 + \exp\left(x_t^{(2)} \beta_s\right)}, \qquad s \in \{1, 2\},$$

where $\beta_s$ is the vector of the logistic regression coefficients to be estimated and $X^{(2)}$ is the set of covariates that affect the transition probabilities. Note that for identifiability reasons we adopt the convention of setting, for each row of the transition matrix, one of the $\beta$'s to be a vector of zeros. The unknown quantities of the NHHMM are $\theta = (B_1, B_2, \sigma_1^2, \sigma_2^2, \beta_1, \beta_2)$, that is, the parameters in the mean predictive regression equation and the parameters in the logistic regression equation for the transition probabilities of the unobserved process. Our model can easily be generalized to an m-state NHHMM using almost the same methodology as the one that we describe.
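To make the data-generating process concrete, the following minimal Python sketch simulates a series from this two-state NHHMM. The sample size, covariates and parameter values are illustrative assumptions, not settings used in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_nhhmm(T, X, B, sigma2, beta):
        """Simulate a two-state NHHMM.

        X      : (T, k) covariate matrix (first column can be a constant).
        B      : (2, k) state-specific mean-regression coefficients.
        sigma2 : (2,)   state-specific error variances.
        beta   : (2, k) logistic coefficients; beta[s] drives the probability
                 of staying in state s, p_ss(t) = logistic(x_t' beta[s]).
        """
        z = np.zeros(T, dtype=int)
        y = np.zeros(T)
        z[0] = rng.integers(2)
        # y_t is driven by x_{t-1} (predictive regression); y[0] uses X[0]
        # for simplicity, since x_{-1} is not available.
        y[0] = X[0] @ B[z[0]] + rng.normal(0.0, np.sqrt(sigma2[z[0]]))
        for t in range(1, T):
            s = z[t - 1]
            p_stay = 1.0 / (1.0 + np.exp(-X[t - 1] @ beta[s]))  # p_ss(t-1)
            z[t] = s if rng.random() < p_stay else 1 - s
            y[t] = X[t - 1] @ B[z[t]] + rng.normal(0.0, np.sqrt(sigma2[z[t]]))
        return y, z

    # Illustrative setup: intercept plus one exogenous predictor.
    T = 500
    X = np.column_stack([np.ones(T), rng.normal(size=T)])
    y, z = simulate_nhhmm(T, X,
                          B=np.array([[0.5, 1.0], [-0.5, -1.0]]),
                          sigma2=np.array([0.5, 1.5]),
                          beta=np.array([[1.0, 2.0], [1.0, -2.0]]))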

3 Bayesian Inference and Computational Strategy

This section presents the Bayesian approach to inference for the Non-Homogeneous Hidden Markov model. The key steps in our proposed framework are as follows. First, for a given hidden Markov model with time-varying probabilities, we construct a Markov chain which has as its stationary distribution the posterior distribution of the model parameters. Simulation of this Markov chain provides, after some burn-in period and adequately many iterations, samples from the posterior distribution of interest; see, for details, Besag et al. (1995). Second, for a given set of competing models, each including a different set of predictors in the mean regression and/or in the transition probabilities equation, we base our inference about the models on their posterior probabilities. Thus, we avoid the usual approach which considers the models separately and chooses the best model via significance tests or via model selection criteria (AIC, BIC, DIC, WBIC).

3.1 Inference for fixed sets of predictors

Below we provide detailed guidelines on how to estimate the parameters of a given NHHMM, i.e. for fixed sets of predictors $X^{(1)}$ and $X^{(2)}$. The proposed approach to Bayesian inference on the parameters of the NHHMM is based on constructing a Markov chain Monte Carlo algorithm which updates, in turn, the mean regression parameters, the logistic regression coefficients, and the latent variables $z_{1:T}$. Let $y_{1:t} = (y_1, \ldots, y_t)$ be the history of the observed process, $z_{1:t} = (z_1, \ldots, z_t)$ the sequence of states up to time $t$, and let $f(y_t \mid z_t)$ denote the normal probability density function of $y_t$ and $\pi(z_1)$ the initial distribution of $Z_1$. The joint likelihood function of the observed data, $y_{1:T}$, and the unobserved sequence of states, $z_{1:T}$, is given by

$$L(\theta; y_{1:T}, z_{1:T}) = \pi(z_1)\, f(y_1 \mid z_1) \prod_{t=2}^{T} p^{(t-1)}_{z_{t-1} z_t}\, f(y_t \mid z_t).$$

If a prior distribution $\pi(\theta)$ is specified for the model parameters, then inference on all the unknown quantities in the model is based on their joint posterior distribution

$$\pi(\theta, z_{1:T} \mid y_{1:T}) \propto L(\theta; y_{1:T}, z_{1:T})\, \pi(\theta).$$

For the parameters in the mean predictive regression equation, we use conjugate prior distributions, i.e.

$$B_s \sim N(b_0, V_0), \qquad \sigma_s^2 \sim IG(a_0, d_0), \qquad s = 1, 2,$$

where $IG(a_0, d_0)$ denotes the Inverted-Gamma distribution. To make inference about the logistic regression coefficients we use the auxiliary variables method of Polson et al. (2013), as described in Subsection 3.1.1. Given an auxiliary variable $\omega$, a conjugate prior for the logistic regression coefficients $\beta_s$ is the multivariate normal distribution $N(m_0, \Sigma_0)$.

The joint likelihood of the observed data and the hidden states can be written more explicitly as

$$L(\theta; y_{1:T}, z_{1:T}) \propto \prod_{s=1}^{2} (\sigma_s^2)^{-n_s/2} \exp\left\{-\frac{1}{2\sigma_s^2} \sum_{t:\, z_t = s} \left(y_t - x_{t-1}^{(1)} B_s\right)^2\right\} \prod_{t=2}^{T} p^{(t-1)}_{z_{t-1} z_t}.$$

We use the notation $n_s$ for the number of times the chain was in state $s$, that is $n_s = \sum_{t=1}^{T} \mathbb{1}(z_t = s)$, with $\mathbb{1}(\cdot)$ the indicator function.

The MCMC sampling scheme is constructed with recursive updates of (i) the latent variables $z_{1:T}$ given the current values of the model parameters, by using the Scaled Forward-Backward algorithm (Scott, 2002), (ii) the logistic regression coefficients, by adopting the auxiliary variables method of Polson et al. (2013) given the sequence of states $z_{1:T}$, and (iii) the mean regression coefficients conditional on $z_{1:T}$, by using Gibbs sampling.

3.1.1 Simulation of the logistic regression coefficients

In a two-state NHHMM, as Meligkotsidou and Dellaportas (2011) observe, we can model the two diagonal elements of the probability transition matrix by linking them to the set of covariates $X^{(2)}$ using a logistic link. However, the algorithm adopting the auxiliary representation of Holmes and Held (2006) within the method of Meligkotsidou and Dellaportas (2011) does not converge in some cases, especially when there is model uncertainty in the transition probabilities. Many data-augmentation or Metropolis-Hastings algorithms have been proposed for the logistic regression model; see for example O'Brien and Dunson (2004); Fussl et al. (2013); Polson et al. (2013). We follow Polson et al. (2013) since, as shown in their work, the Pólya-Gamma data augmentation gives superior results, in terms of convergence and mixing, among all the competing methods.

Given the unobserved (latent) data $z_{1:T}$, we define the binary variables $d_t = \mathbb{1}(z_{t+1} = z_t)$. In words, $d_t$ indicates whether the chain was at the same state for two consecutive time periods. Then,

$$d_t \mid z_t = s \;\sim\; \text{Bernoulli}\left(p^{(t)}_{ss}\right), \qquad p^{(t)}_{ss} = \frac{\exp(\psi_t)}{1 + \exp(\psi_t)}, \quad \psi_t = x_t^{(2)} \beta_s.$$
Polson et al. (2013) proved that binomial likelihoods (thus Bernoulli likelihoods in our simpler case) parametrized by log-odds can be represented as mixtures of Gaussian distributions with respect to the Pólya-Gamma distribution. The main result of Polson et al. (2013) is that, letting $p(\omega)$ be the density of a latent variable $\omega \sim PG(b, 0)$ with $b > 0$, the following integral identity holds for all $\psi \in \mathbb{R}$:

$$\frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}} = 2^{-b} e^{\kappa \psi} \int_0^{\infty} e^{-\omega \psi^2 / 2}\, p(\omega)\, d\omega,$$

where $\kappa = a - b/2$. Furthermore, the conditional distribution of $\omega$ given $\psi$ is also Pólya-Gamma, $\omega \mid \psi \sim PG(b, \psi)$. Using the previous result and setting $\omega = (\omega_t)$ as a set of latent variables, the likelihood for each state $s$ is

$$L(\beta_s) = \prod_{t:\, z_t = s} \frac{\exp(\psi_t)^{d_t}}{1 + \exp(\psi_t)} \;\propto\; \prod_{t:\, z_t = s} e^{\kappa_t \psi_t} \int_0^{\infty} e^{-\omega_t \psi_t^2 / 2}\, p(\omega_t)\, d\omega_t, \qquad \kappa_t = d_t - 1/2.$$

Conditioning on $\omega$, one can derive the proportionality

$$\pi(\beta_s \mid \omega, z) \;\propto\; \pi(\beta_s) \prod_{t:\, z_t = s} \exp\left\{-\frac{\omega_t}{2}\left(\psi_t - \kappa_t / \omega_t\right)^2\right\},$$

which is a Gaussian kernel in $\beta_s$. Assuming as prior distributions $\beta_s \sim N(m_0, \Sigma_0)$ and $\omega_t \sim PG(1, 0)$, simulation from the posterior distribution can be done iteratively in two steps:

$$\omega_t \mid \beta_s \sim PG(1, \psi_t), \qquad \beta_s \mid \omega, z \sim N(m_\omega, V_\omega),$$

where $PG(\cdot, \cdot)$ denotes the Pólya-Gamma distribution, $V_\omega = \left(X^{(2)\prime} \Omega X^{(2)} + \Sigma_0^{-1}\right)^{-1}$ and $m_\omega = V_\omega \left(X^{(2)\prime} \kappa + \Sigma_0^{-1} m_0\right)$, with $\Omega = \text{diag}(\omega_t)$ and $\kappa = (\kappa_t)$.
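A compact Python sketch of this two-step update for one state is given below. The Pólya-Gamma draw uses a truncated version of the infinite sum-of-gammas representation of PG(1, c) from Polson et al. (2013); this truncation is a rough illustrative approximation, and an exact sampler from a dedicated package should be preferred in practice. The hyperparameters m0 and Sigma0 follow the prior notation assumed above.

    import numpy as np

    rng = np.random.default_rng(1)

    def sample_pg1_approx(c, n_terms=200):
        # Truncated sum-of-gammas representation of PG(1, c):
        # omega = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
        # with g_k ~ Gamma(1, 1) i.i.d.
        k = np.arange(1, n_terms + 1)
        g = rng.gamma(1.0, size=n_terms)
        return np.sum(g / ((k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2)) / (2.0 * np.pi ** 2)

    def update_beta(beta, X, d, m0, Sigma0_inv):
        """One Polya-Gamma Gibbs sweep for the logistic coefficients of one state.

        X : (n, k) covariate rows for the times the chain visited the state.
        d : (n,)   indicators of staying in the same state at the next step.
        """
        psi = X @ beta
        omega = np.array([sample_pg1_approx(c) for c in psi])   # omega_t | beta
        kappa = d - 0.5
        V = np.linalg.inv(X.T @ (omega[:, None] * X) + Sigma0_inv)
        m = V @ (X.T @ kappa + Sigma0_inv @ m0)
        return rng.multivariate_normal(m, V)                    # beta | omega, z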

3.1.2 Simulation of the mean equation parameters using the Gibbs algorithm

We used conditionally conjugate prior distributions on the parameters of the mean predictive equation, i.e.

$$B_s \sim N(b_0, V_0), \qquad \sigma_s^2 \sim IG(a_0, d_0), \qquad s = 1, 2.$$

After some straightforward algebra we derive the conditional posterior distributions for the state-specific parameters $B_s$ and $\sigma_s^2$,

$$B_s \mid \sigma_s^2, z, y \sim N\left(\hat{b}_s, \hat{V}_s\right), \qquad \sigma_s^2 \mid B_s, z, y \sim IG\left(a_0 + \frac{n_s}{2},\; d_0 + \frac{1}{2} \sum_{t:\, z_t = s}\left(y_t - x_{t-1}^{(1)} B_s\right)^2\right),$$

with

$$\hat{V}_s = \left(V_0^{-1} + \sigma_s^{-2} X_s^{\prime} X_s\right)^{-1}, \qquad \hat{b}_s = \hat{V}_s \left(V_0^{-1} b_0 + \sigma_s^{-2} X_s^{\prime} y_s\right),$$

where $X_s$ and $y_s$ stack the covariate rows and observations allocated to state $s$.
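In code, the corresponding Gibbs update for one state reads as follows; this is a minimal sketch under the prior notation above, with the Inverted-Gamma draw obtained as the reciprocal of a Gamma draw.

    import numpy as np

    rng = np.random.default_rng(2)

    def update_mean_params(B, sigma2, y_s, X_s, b0, V0_inv, a0, d0):
        """Gibbs update of (B_s, sigma2_s) for one state; y_s and X_s collect
        the observations and covariate rows currently allocated to the state."""
        # B_s | sigma2_s, z, y ~ N(b_hat, V_hat)
        V_hat = np.linalg.inv(V0_inv + X_s.T @ X_s / sigma2)
        b_hat = V_hat @ (V0_inv @ b0 + X_s.T @ y_s / sigma2)
        B = rng.multivariate_normal(b_hat, V_hat)
        # sigma2_s | B_s, z, y ~ IG(a0 + n_s/2, d0 + RSS/2)
        resid = y_s - X_s @ B
        shape = a0 + 0.5 * len(y_s)
        rate = d0 + 0.5 * resid @ resid
        sigma2 = rate / rng.gamma(shape)   # 1/Gamma(shape, rate) draw
        return B, sigma2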

3.2 Inference under model uncertainty

Here, we consider the full model comparison problem, where the uncertainty about which predictors should be included in the mean regression model is taken into account together with the uncertainty about the predictors that should be included in the transition probability equation. The proposed model is flexible, since we do not decide a priori which covariates affect the observed or the unobserved process. Instead, we have a common pool of covariates and, within the MCMC algorithm, we gauge which covariates are included in the subset $X^{(1)}$ affecting the mean predictive equation of the observed process, and which covariates are included in the subset $X^{(2)}$ affecting the time-varying transition probabilities. It is worth mentioning that, within the framework of the proposed model, a pool of $k$ covariates implies that there are $2^k \times 2^k$ possible models, hence the model selection problem becomes complicated.

Different approaches have been used in the literature to cope with the model selection problem. One common approach to variable selection is the use of information criteria, such as Akaike's Information Criterion (AIC; Akaike et al., 1973), the Bayesian Information Criterion (BIC) of Schwarz (1978), the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002) or the Widely Applicable Bayesian Information Criterion (WBIC; Watanabe, 2013).

We propose a probabilistic approach to inference, which is based on the calculation of the posterior distribution of different NHHMMs, or equivalently on computing the posterior probabilities of different hidden Markov models. Posterior probabilities can be used either for selecting the most probable model (i.e. making inference using the model with the highest posterior probability) or for Bayesian model averaging (i.e. producing inferences averaged over different NHHMMs). Barbieri and Berger (2004) argue that the optimal predictive model is not necessarily the model with the highest posterior probability but the median probability model, which is defined as the model consisting of those covariates whose overall posterior probability of being included in the model (inclusion probability) is greater than or equal to 0.5. Our method allows us to calculate the posterior probabilities of the models as well as the inclusion probabilities. To this end, we develop a Reversible Jump Markov Chain Monte Carlo (RJMCMC) algorithm (Green (1995), Green and Hastie (2009)) which explores the model space by jumping between different hidden Markov models. A more applied tutorial, along the lines of Green (1995), can be found in Waagepetersen and Sorensen (2001). A comparison of variable selection methods is presented in O'Hara and Sillanpää (2009), whilst Dellaportas et al. (2002) study variable selection methods in the context of model choice.

As Green and Hastie (2009) note, reversible jump MCMC is in fact a Metropolis-Hastings algorithm, formulated to allow sampling from a distribution on a union of spaces of differing dimension and to permit state-dependent choice of move type. Suppose that a prior $p(m)$ is specified over models $m$ in a countable set $\mathcal{M}$, and for each $m$ we are given a prior distribution $p(\theta_m \mid m)$ along with a likelihood $L(y \mid m, \theta_m)$ for data $y$. The joint prior for $(m, \theta_m)$ is $p(m)\, p(\theta_m \mid m)$ and obviously the joint posterior distribution is

$$\pi(m, \theta_m \mid y) = \frac{p(m)\, p(\theta_m \mid m)\, L(y \mid m, \theta_m)}{\sum_{m' \in \mathcal{M}} \int p(m')\, p(\theta'_{m'} \mid m')\, L(y \mid m', \theta'_{m'})\, d\theta'_{m'}}.$$
The standard formulation of the Metropolis-Hastings algorithm relies on the construction of a time-reversible Markov chain that satisfies the detailed balance condition. This condition means that the probability that the state of the chain is in a general set $A$ and moves to a general set $B$ is the same as the probability that the state is in the set $B$ and moves to the set $A$. Note that this is a simple way to ensure that the limiting distribution of the chain is the desired target distribution. Define $x = (m, \theta_m)$ as the state and $\mathcal{X}$ as the state space of the chain. At the current state $x$ we generate $r$ random numbers $u$ from a known joint density $g$. The new state of the chain $x' = (m', \theta'_{m'})$ is constructed by some suitable deterministic function $h$ such that $(\theta'_{m'}, u') = h(\theta_m, u)$. The inverse move is made by generating $u'$ from a suitable known density $g'$ and then using the inverse function of $h$ to move from $x'$ to $x$. In practice, the construction of proposal moves between different models is achieved via the concept of 'dimension matching'. Specifically, if the dimensions of $\theta_m$ and $\theta'_{m'}$ are $d_m$ and $d_{m'}$ respectively, then $d_m + r = d_{m'} + r'$. If the move from $x$ to $x'$ is accepted with probability $\alpha(x, x')$, then the reverse move is accepted with probability $\alpha(x', x)$. Under this formulation the detailed balance condition is

$$\pi(x \mid y)\, g(u)\, \alpha(x, x') = \pi(x' \mid y)\, g'(u')\, \alpha(x', x).$$
If the transformation $h$ from $(\theta_m, u)$ to $(\theta'_{m'}, u')$ and its inverse are differentiable, then the equality holds if

$$\pi(x \mid y)\, g(u)\, \alpha(x, x') = \pi(x' \mid y)\, g'(u')\, \alpha(x', x) \left| \frac{\partial (\theta'_{m'}, u')}{\partial (\theta_m, u)} \right|,$$
where $\left| \partial(\theta'_{m'}, u') / \partial(\theta_m, u) \right|$ is the Jacobian of the transformation. Thus, the acceptance probability $\alpha(x, x')$ for moving from $x$ to $x'$ is

$$\alpha(x, x') = \min\left\{1,\; \frac{\pi(x' \mid y)\, g'(u')}{\pi(x \mid y)\, g(u)} \left| \frac{\partial (\theta'_{m'}, u')}{\partial (\theta_m, u)} \right| \right\}.$$
Holsclaw et al. (2017) recently introduced NHHMMs using the Pólya-Gamma latent data method. As a model selection strategy, they use BIC values to choose the best model among a prespecified set of models. We instead apply a reversible jump algorithm to account for model uncertainty. Because in the proposed model there is uncertainty both in the mean predictive equation and in the transition probability equation, we perform two reversible jump steps within our MCMC algorithm. Let us now assume that in a reversible jump step we propose a move from model $m$ to model $m'$. Let $j_m(m')$ denote the probability that a move to $m'$ is attempted at state $(m, \theta_m)$ and $j_{m'}(m)$ the probability of the reverse move attempted at state $(m', \theta'_{m'})$. We accept the proposed move with probability $\alpha = \min\{1, A\}$, where

$$A = \frac{\pi(m', \theta'_{m'} \mid y)\; j_{m'}(m)\; g'(u')}{\pi(m, \theta_m \mid y)\; j_m(m')\; g(u)} \left| \frac{\partial (\theta'_{m'}, u')}{\partial (\theta_m, u)} \right|.$$
We adopt the proposal of Meligkotsidou and Dellaportas (2011) to implement the above algorithm. In each step, we choose to add or remove one covariate with probability $1/2$ and then we randomly choose which covariate to add or remove. We propose a new value for the mean equation coefficients or for the logistic regression coefficients from the corresponding full conditional posterior density, conditionally on the other coefficients, so the Jacobian of the transformation is equal to unity. To be more specific, if we want to update the covariates in the mean equation, the proposal distribution is just the product of the two conditional posterior distributions for $(B_s, \sigma_s^2)$ derived in Subsection 3.1.2 given $z_{1:T}$, and if we want to update the covariates that affect the transition matrix, the proposal distribution is the product of the conditional normal distributions derived in Subsection 3.1.1 given $\omega$, for $s = 1, 2$. With some straightforward matrix algebra, the acceptance probability for the mean equation is $\alpha_1 = \min\{1, A_1\}$ and the acceptance probability for the transition matrix is $\alpha_2 = \min\{1, A_2\}$, where $A_1$ is the ratio, under the proposed and current sets of covariates, of the marginal densities of the observations (with the mean equation coefficients integrated out), and $A_2$ is the corresponding ratio of the marginal densities of the hidden chain, in both cases multiplied by the ratio of the model proposal probabilities.
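The following Python sketch illustrates the structure of one such reversible jump move on a set of included covariates. The function log_marg is a hypothetical helper assumed to return the log of the relevant marginal density (of the observations, or of the hidden chain) under a given covariate set; boundary cases where a move type is forced are handled only crudely. It is a structural sketch, not the paper's exact implementation.

    import numpy as np

    rng = np.random.default_rng(3)

    def rj_toggle_covariate(included, pool_size, log_marg):
        """Propose adding or removing one covariate and accept/reject.
        Coefficients are proposed from full conditionals, so the Jacobian is 1
        and the ratio reduces to marginal densities times proposal probabilities."""
        included = set(included)
        excluded = set(range(pool_size)) - included
        add = len(excluded) > 0 and (len(included) == 0 or rng.random() < 0.5)
        pool = sorted(excluded) if add else sorted(included)
        chosen = pool[rng.integers(len(pool))]
        proposal = (included | {chosen}) if add else (included - {chosen})
        # Uniform choice of covariate within the chosen move type.
        q_fwd = 0.5 / len(pool)
        q_rev = 0.5 / (len(included) + 1 if add else len(excluded) + 1)
        log_A = log_marg(proposal) - log_marg(included) + np.log(q_rev) - np.log(q_fwd)
        return proposal if np.log(rng.random()) < log_A else included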

3.3 MCMC Sampling Scheme

In the next lines, we summarize the predictive MCMC algorithm that we have constructed for joint inference on model specification and model parameters.

  1. Start with initial values of the unknown quantities $(B, \sigma^2, \beta, z_{1:T})$.

  2. Calculate the probabilities of the time-varying transition matrix for $t = 1, \ldots, T$.

  3. Run the Scaled Forward-Backward algorithm (Scott, 2002) to simulate the hidden states given the parameters of the model (a minimal sketch of this step is given after this list).

  4. Simulate the parameters of the regression model for the mean via a Gibbs sampling step.

  5. Simulate the logistic regression coefficients using the Pólya-Gamma representation of Polson et al. (2013).

  6. Perform the two reversible jump steps to determine which covariates affect the transition matrix and which covariates affect the mean regression model.

  7. Make one-step-ahead predictions conditional on the simulated unknown quantities.

Repeat steps 3-6 until convergence (burn-in period) and then repeat steps 3-7 to obtain samples from the posterior and posterior predictive distributions.
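As a concrete reference for step 3, the sketch below implements scaled forward filtering followed by backward sampling for a two-state chain; a flat initial distribution is assumed for simplicity.

    import numpy as np

    rng = np.random.default_rng(4)

    def ffbs(lik, P):
        """Scaled Forward-Backward draw of the hidden chain.

        lik : (T, 2)     state-specific observation densities, lik[t, s] = f(y_t | z_t = s).
        P   : (T-1, 2, 2) time-varying transition matrices, P[t, i, j] = Pr(z_{t+1} = j | z_t = i).
        """
        T = lik.shape[0]
        filt = np.zeros((T, 2))
        filt[0] = lik[0] / lik[0].sum()          # flat initial distribution
        for t in range(1, T):                     # forward pass, scaled
            pred = filt[t - 1] @ P[t - 1]
            filt[t] = pred * lik[t]
            filt[t] /= filt[t].sum()              # scaling avoids numerical underflow
        z = np.zeros(T, dtype=int)
        z[T - 1] = rng.choice(2, p=filt[T - 1])   # backward sampling pass
        for t in range(T - 2, -1, -1):
            w = filt[t] * P[t][:, z[t + 1]]
            z[t] = rng.choice(2, p=w / w.sum())
        return z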

4 Bayesian Forecasting and Scoring rules

4.1 One-step-ahead predictions

The proposed modeling and inferential approach is used for forecasting. The posterior predictive density cannot be found in closed form, but it can instead be evaluated numerically. Given model $m$, the predictive distribution of $y_{T+1}$ is

$$p(y_{T+1} \mid y_{1:T}, m) = \int p(y_{T+1} \mid \theta_m, z_{T+1})\, p(z_{T+1} \mid z_T, \theta_m)\, \pi(\theta_m, z_{1:T} \mid y_{1:T}, m)\, d(\theta_m, z_{1:T+1}),$$

where

$$p(y_{T+1} \mid \theta_m, z_{T+1} = s) = N\left(x_T^{(1)} B_s,\; \sigma_s^2\right) \quad \text{and} \quad p(z_{T+1} = s \mid z_T = s, \theta_m) = p^{(T)}_{ss}.$$

In practice, we follow an iterative procedure within the MCMC algorithm to draw a sample from the posterior predictive distribution. At the $j$-th iteration, the algorithm chooses model $m^{(j)}$; the hidden states $z_{1:T}^{(j)}$ and the unknown parameters $\theta^{(j)}$ are simulated as described before. Let $z_T^{(j)}$ be the simulated hidden state at time $T$. To make a one-step-ahead prediction (i.e. simulate $y_{T+1}^{(j)}$), we first simulate the hidden state for time $T+1$ from the discrete distribution based on the transition probabilities, and then, conditional on the hidden state $z_{T+1}^{(j)}$, we draw a value from

$$N\left(x_T^{(1)} B^{(j)}_{z_{T+1}^{(j)}},\; \sigma^{2\,(j)}_{z_{T+1}^{(j)}}\right).$$
For multi-step-ahead predictions we proceed sequentially: given $y_{T+h-1}$, the hidden state $z_{T+h-1}$ and the covariates $x_{T+h-1}$, we update the transition matrix, simulate $z_{T+h}$, and simulate the prediction $y_{T+h}$ from the predictive distribution, in sequence.
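A minimal sketch of one such predictive draw, using the simulation notation of Section 2, is given below; iterating it over the retained MCMC draws yields a sample from the posterior predictive distribution, and chaining it forward gives multi-step-ahead predictions.

    import numpy as np

    rng = np.random.default_rng(5)

    def one_step_ahead(z_T, x_T, B, sigma2, beta):
        """One draw of (y_{T+1}, z_{T+1}) given the current MCMC draw of
        (z_T, B, sigma2, beta) and the covariates observed at time T."""
        s = z_T
        p_stay = 1.0 / (1.0 + np.exp(-x_T @ beta[s]))    # p_ss at time T
        z_next = s if rng.random() < p_stay else 1 - s   # simulate z_{T+1}
        y_next = rng.normal(x_T @ B[z_next], np.sqrt(sigma2[z_next]))
        return y_next, z_next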

4.2 Forecasting criteria

One way to evaluate a model is through the accuracy of its forecasts (Geweke and Whiteman, 2006). As Gelman et al. (2014) observe, predictive accuracy is valued not only for its own sake but also for comparing different models. Advances in numerical integration via MCMC algorithms have made probabilistic forecasts possible, and these are in most cases preferable. Besides, having the posterior predictive distribution, one can obtain effective point forecasts using suitable scoring functions (Gneiting, 2011).

Scoring rules provide summary measures for the evaluation of probabilistic forecasts, by assigning a numerical score based on the forecast and on the event or value that materializes. We refer to Gneiting and Raftery (2007) for a review of the theory and properties of scoring rules. If the forecaster quotes the predictive distribution $F$ and the event $y$ materializes, then his reward is $S(F, y)$. If $G$ is the forecaster's judgment, then $S(F, G)$ denotes the expected value of his reward under $G$. A scoring rule is said to be proper if $S(G, G) \ge S(F, G)$ for all $F$ and $G$, and if the equality holds if and only if $F = G$, then the rule is strictly proper. Intuitively, proper scoring rules encourage a forecaster to report the truth about his judgment distribution, and they are strictly proper if the issued forecasts coincide with the forecaster's judgment.

There are various proper scoring rules for continuous variables, such as the probabilistic score, the Linear Score (LinS), the Quadratic Score (QS), the Logarithmic Score (LS) and the Continuous Ranked Probability Score (CRPS), among others (Machete (2013); Gneiting and Raftery (2007)). The probabilistic and Linear scores are often used; nevertheless, they are improper and can give misleading results. A widely used and quite powerful criterion is the Logarithmic Score. It is based on the posterior predictive density evaluated at the observed value and has a strong connection with the classical Kullback-Leibler divergence, which is one of the reasons it has been studied quite intensively (see for example Gelman et al. (2014), Gschlößl and Czado (2007)). The Logarithmic Score is defined as follows. Let $T$ be the total sample size and $n$ the number of out-of-sample forecasts. For $i = 1, \ldots, n$, let $y_i$ be the real observed values of the forecasts, $\tilde{y}_i$ the estimated forecasts and $\theta$ the unknown quantities. Finally, let $N$ be the number of draws of the simulation. Using the notation $p(\tilde{y}_i \mid y)$ for the posterior predictive density of the new data, $f$ for the distribution of the true model and $p(y \mid \theta)$ for the likelihood of the data, the predictive fit for the $i$-th observation would be

$$\log p(\tilde{y}_i \mid y) = \log \int p(\tilde{y}_i \mid \theta)\, \pi(\theta \mid y)\, d\theta.$$

Having in mind though that the new data are unknown, Gelman et al. (2014) defined the expected out-of-sample log predictive density for the $i$-th observation as

$$\mathrm{elpd}_i = \mathrm{E}_f\left[\log p(\tilde{y}_i \mid y)\right] = \int \log p(\tilde{y}_i \mid y)\, f(\tilde{y}_i)\, d\tilde{y}_i.$$

To compute the predictive density for the estimated forecasts in practice, since $f$ is unknown, we calculate

$$\mathrm{LS}_i = \log\left(\frac{1}{N} \sum_{j=1}^{N} p(y_i \mid \theta^{(j)})\right),$$

where $\theta^{(j)}$, $j = 1, \ldots, N$, are the posterior draws.
Although the Logarithmic Score is a strictly proper rule, it lacks robustness, as it involves a harsh penalty for low-probability events and thus is sensitive to extreme cases (Boero et al., 2011). Besides, comparing the entropies of the forecasts, Machete (2013) showed that the LS prefers the forecast density that is less informative. As Gneiting and Raftery (2007) note, measures which are not sensitive to distance give no credit for assigning high probabilities to values near but not identical to the one materializing. Sensitivity to distance seems desirable when predictive distributions tend to be multimodal, which is the case for our model. To deal with this, one can calculate the Continuous Ranked Probability Score, which is based on the cumulative predictive distribution. Recently, Boero et al. (2011) made a comparison between the LS, CRPS and QS, and argued that, when density forecasts are collected in histogram format, the ranked probability score has advantages over the other scoring rules. Note that when the forecast is deterministic, the CRPS reduces to the Mean Absolute Forecast Error, and therefore it is easily interpretable and provides a direct way of comparing various deterministic and probabilistic forecasts.

The posterior predictive cumulative distribution function is defined as $F(u) = P(\tilde{y} \le u \mid y)$, where $\tilde{y}$ is the forecast of interest and $y$ is the history of the predictive quantity. Using the same notation as for the Logarithmic Score, the CRPS for the real observed value $y_i$ is defined as

$$\mathrm{CRPS}(F, y_i) = -\int_{-\infty}^{\infty} \left(F(u) - H(u - y_i)\right)^2 du,$$

where $H(\cdot)$ denotes a step function along the real line that attains the value 1 if $u \ge y_i$ and the value 0 otherwise, that is, the cumulative distribution function of a point mass at the realized value, known as the Heaviside function (Hersbach, 2000).
As we described above, at each iteration of the MCMC algorithm we obtain a sample of length $N$ from the predictive distribution of $\tilde{y}_i$. Hence, we can evaluate both the posterior predictive probability and the cumulative distribution function. Instead, there is an easier way of evaluating the CRPS, using the identity of Székely and Rizzo (2005),

$$\mathrm{CRPS}(F, y_i) = \frac{1}{2}\, \mathrm{E}_F\left|\tilde{Y} - \tilde{Y}'\right| - \mathrm{E}_F\left|\tilde{Y} - y_i\right|,$$

where $\tilde{Y}, \tilde{Y}'$ are independent copies of a random variable with distribution function $F$ (see also Gschlößl and Czado (2007)).
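Computationally, this identity turns the CRPS into two Monte Carlo averages over the posterior predictive draws, as in the sketch below (the all-pairs average is O(N^2); a sort-based O(N log N) variant exists for large samples). The sign convention matches the definition above, so larger values indicate better forecasts.

    import numpy as np

    def crps_from_draws(draws, y_obs):
        """Monte Carlo CRPS via the Szekely-Rizzo identity:
        CRPS(F, y) = 0.5 * E|Y - Y'| - E|Y - y|, with Y, Y' ~ F independent."""
        draws = np.asarray(draws, dtype=float)
        e_abs_obs = np.abs(draws - y_obs).mean()                     # E|Y - y_obs|
        e_abs_pair = np.abs(draws[:, None] - draws[None, :]).mean()  # E|Y - Y'|
        return 0.5 * e_abs_pair - e_abs_obs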

Finally, for the sake of completeness, we estimate the mean, mode and median of the predictive distribution and compare them using two well-known point forecasting criteria, the Mean Square Forecast Error, $\mathrm{MSFE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, and the Mean Absolute Forecast Error, $\mathrm{MAFE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$. Based on the MCMC sample, the computed point forecasting criteria are just the average scores of the MSFE or MAFE over all the sample values.

5 Simulation Study

We have tested our method using a number of simulated data sets. We conducted a series of simulation experiments to validate the superiority of our method, testing our algorithms quite extensively, accounting for model uncertainty or not, using different sample sizes and assigning various values to the parameters. In all our experiments our model outperforms the competing models in forecasting the observed process. We compared our model with a Homogeneous Hidden Markov model (HHMM) and a linear model with autoregressive terms (qAR). Although these models are considered standard, for the sake of completeness we give their definitions. In the Homogeneous Hidden Markov model, covariates affect the mean equation but the transition probability matrix is constant. That is,

$$y_{t+1} \mid z_{t+1} = s \sim N\left(x_t^{(1)} B_s,\; \sigma_s^2\right), \qquad P(Z_{t+1} = j \mid Z_t = i) = p_{ij} \text{ for all } t.$$

In the linear regression with autoregressive terms, besides the covariates $x_t$, lagged values of the series are assumed to affect the mean equation. Specifically,

$$y_{t+1} = x_t \gamma + \sum_{l=1}^{q} \phi_l\, y_{t+1-l} + \varepsilon_{t+1}, \qquad \varepsilon_{t+1} \sim N(0, \sigma^2).$$
The data were generated either from a Homogeneous Hidden Markov model (HHMM) or from a NHHMM with independent, normally distributed covariates. We found that the mean equation parameters converge rapidly, whereas the logistic regression coefficients needed some burn-in period to converge. The hidden chain was well estimated. For each iteration we kept a replication of the hidden chain (thus the number of hidden-chain replications equals the MCMC sample size) and compared it with the true simulated hidden chain, using a 0-1 loss function. The reversible jump algorithm also behaved very well in all of our simulations. Even in the case where the data were homogeneous, no covariates were selected in the logistic regression equation, so the transition probability matrix remained constant through time. Furthermore, in order to test the predictive ability of our model, we kept out-of-sample observations and calculated the CRPS for every observation. We confirm that our forecasts were more accurate than those of the competing models. A summary of our comparisons is presented and discussed in the following subsections.

5.1 The fixed model

Below, we present two experiments with simulated data using the fixed NHHMM. For all our experiments we used non-informative priors for the unknown parameters. Inferences are based on an MCMC sample of 15,000 iterations after a burn-in period of 10,000 iterations. In the first experiment, from a common pool of covariates we chose a subset to affect the mean equation and a subset to affect the transition matrix, with fixed true values for the mean equation parameters and for the state-specific logistic regression coefficients. Secondly, we ran another experiment with a larger sample size, using 3 covariates affecting the mean equation and the transition matrix, again with fixed true parameter values for the two states. For both experiments, we kept 10 out-of-sample observations and computed a sequence of one-step-ahead forecasts of the real observed process. For completeness, we present in Section 7 various plots that confirm the estimation performance and convergence of our methodology, parameter estimates and summary statistics of these experiments (Tables 9 and 7), and a comparison of the three competing models in terms of forecasting (Tables 10 and 8).

Figure 1: Experiment 1: The upper panel shows a comparison between the true and the estimated transition probabilities. True probability values are marked with red dots, whereas the posterior mean estimated transition probabilities are marked with blue crosses. Lower panel: the difference between the true and the mean estimated probabilities.
Figure 2: Forecasts of experiment 1. Empirical continuous approximation of the posterior predictive distribution (based on a normal kernel function) using the NHHMM (blue continuous line) and the HHMM (red dotted line). True out-of-sample values are marked with yellow asterisks.
Figure 3: Experiment 1: Quantiles of the predictive distribution of the studied model. The blue dashed line presents the real observed out-of-sample values of interest. The gray bold line represents the median of the predictive distribution. The gray area around the median is bounded by lower and upper quantiles of the predictive distribution. Red circles represent the median values of the predictive distribution using the HHMM.
Figure 4: Experiment 2: The upper panel shows a comparison between the true and the estimated transition probabilities $p^{(t)}_{11}$ and $p^{(t)}_{22}$. True probability values are marked with red dots, whereas the posterior mean estimated transition probabilities are marked with blue crosses. Lower panel: the difference between the true and the mean estimated probabilities.
Figure 5: Forecasts of experiment 2. Empirical continuous approximation of the posterior predictive distribution (based on a normal kernel function) using the NHHMM (blue continuous line) and the HHMM (red dotted line). True out-of-sample values are marked with yellow asterisks.
Figure 6: Quantiles of the predictive distribution of the studied model for the second experiment. The blue dashed line presents the real observed out-of-sample values of interest. The gray bold line represents the median of the predictive distribution. The gray area around the median is bounded by lower and upper quantiles of the predictive distribution. Red circles represent the median values of the predictive distribution using the HHMM.

5.2 Model selection

In this section we present two case studies (experiments 3 and 4). We compared the outputs of the three models (NHHMM, HHMM, qAR) under model uncertainty, performing stochastic variable selection with reversible jump for all the competing models. As in Subsection 5.1, we used the same non-informative priors and burn-in period. For the first experiment we simulated data of size 1,200. We used the same two covariates affecting the mean equation and the transition matrix via the logistic regression. To check the variable selection ability of our method, we included three more covariates in the common pool of predictors. Next, we present a second, more complicated, experiment with data of size 1,500, in which we used 3 covariates affecting the mean equation and 5 covariates affecting the transition probability matrix. In Tables 3 and 1 we show that the most probable model and the median probability model coincide with the true one in both experiments. Additionally, as in the fixed-model case studies, we kept 10 out-of-sample observations for the forecasting evaluation exercise and confirmed that our model performed better than the other models in terms of CRPS, MAFE and MSFE (see Tables 4 and 2). Finally, for each MCMC iteration we kept a replicated chain of the hidden process and compared it with the true simulated chain. Using the 0-1 loss function, we computed the average number of misclassified states in each chain. On average, from the chain with 1,200 hidden states of the first experiment, our model failed to recognize 22 states per iteration, and from the chain of 1,500 hidden states of the second experiment we failed to recognize 8 states. We note that we did not encounter any label switching problems.

In the following pages we present the mean estimated probabilities of remaining in the same state for two consecutive time periods, that is $p^{(t)}_{11}$ and $p^{(t)}_{22}$, together with their differences from the true simulated probabilities, in Figure 7 for the first case study and in Figure 11 for the second case study. A realization of the observed process and of the hidden process, as well as one realization of the simulated hidden process, for the two case studies are presented in Figures 12 and 8, respectively. Lastly, for each experiment, the predictive distributions, approximated with a kernel density function for both the NHHMM and HHMM models, are presented in Figures 14 and 10, and the quantiles of the predictive distributions in Figures 13 and 9.

Figure 7: Experiment 3: True simulated transition probabilities vs. mean estimated transition probabilities. Upper panel: true probabilities are marked with blue crosses and mean estimated probabilities with red dots. Lower panel: the difference between the mean estimated probabilities for one estimated realized hidden process and the true simulated probabilities.
Figure 8: Observed process (black dotted line) and hidden process. True hidden states are marked with blue x's and realized simulated states with red dots.
Table 1: Experiment 3: Highest posterior probability model (posterior probability 0.9958) and median probability model, compared with the true model, in terms of the covariates included in the mean equation and in the transition matrix out of the total pool of covariates.
Figure 9: Experiment 3: Empirical posterior predictive distribution for the 10 out-of-sample forecasts using the NHHMM (blue line) and the HHMM (red dotted line). Yellow asterisks represent the true simulated values.
Figure 10: Experiment 3: Quantiles of the predictive distribution. Ten one-step-ahead observed values (as taken from the sample) are marked with a dashed blue line (and grey dots). The gray bold line represents the median of the predictive distribution. The gray area around the median is bounded by lower and upper quantiles of the predictive distribution. Red circles represent the median values of the predictive distribution using the HHMM.
Probabilistic Forecasting Criterion

              NHHMM      HHMM       qAR
              -0.2428    -0.2735    -6.0555
              -0.7669    -5.5975    -6.2297
              -0.9958    -2.6073    -6.0018
              -2.3435    -5.1726    -6.6054