Nonparametric Dynamic State Space Modeling of Observed Circular Time Series with Circular Latent States: A Bayesian Perspective

Satyaki Mazumder (Indian Institute of Science Education and Research Kolkata) and Sourabh Bhattacharya (Indian Statistical Institute, Kolkata)
Abstract

Circular time series have received relatively little attention in statistics, and modeling complex circular time series using the state space approach is non-existent in the literature. In this article we introduce a flexible Bayesian nonparametric approach to state space modeling of observed circular time series in which even the latent states are circular random variables. Crucially, we assume that the forms of the observational and evolutionary functions, both of which are circular in nature, are unknown and time-varying. We model these unknown circular functions by appropriate wrapped Gaussian processes having desirable properties.

We develop an effective Markov chain Monte Carlo strategy for implementing our Bayesian model by judiciously combining Gibbs sampling and Metropolis-Hastings methods. Validation of our ideas with a simulation study and two real bivariate circular time series data sets, where we assume one of the variables to be unobserved, revealed very encouraging performance of our model and methods.

We finally analyse a data set consisting of directions of whale migration, considering the unobserved ocean current direction as the latent circular process of interest. The results we obtain are encouraging, and the posterior predictive distribution of the observed process correctly predicts the observed whale movement.
Keywords: Circular time series; Latent circular process; Look-up table; Markov Chain Monte Carlo; State-space model; Wrapped Gaussian process.

1 Introduction

In contrast with traditional time series, where the variables take values on the real line, circular time series, with angular variables, have received much less attention in the statistical literature. Some of the relatively scarce undertakings on circular time series, all in the classical statistical framework, are Breckling (1989), Fisher and Lee (1994), Holzmann et al. (2006), Hughes (2007) and Di Marzio et al. (2012). The classical approach, however, is limited to models with relatively simple dependence structures, such as $p$-th order Markov dependence, to enable inference. Far more flexible and realistic models can be accommodated within the Bayesian paradigm without any compromise; the inference can be carried out arbitrarily accurately using powerful Markov chain Monte Carlo (MCMC) methods. In this article, we propose and develop a novel and highly flexible approach to modeling circular time series using a Bayesian nonparametric state space formulation, together with an effective MCMC methodology for Bayesian inference.

Indeed, the versatility of state space structures in modeling complex time series is well-appreciated in various scientific disciplines such as statistics, engineering, economics, biology, ecology and finance, and most existing time series models admit a state space representation. Notably, state space models consist of an observational equation for the observed part of the time series and an evolutionary equation that accounts for the latent, unobserved process that is presumed to affect the observed process. In this work, we concern ourselves with Bayesian state space modeling of complex circular time series data, where the latent process is also circular in nature, and where the observational and evolutionary functions are time-varying but unknown. We model both unknown functions nonparametrically using appropriate wrapped Gaussian process priors developed by Mazumder and Bhattacharya (2016a). For our state space model, the circular observations are conditionally independent given the latent states and the observational function, and the latent states have the Markov property conditionally on the evolutionary function. Unconditionally, however, the latent circular states are realizations of a complex, intractable stochastic process possessing a complex, intractable dependence structure in which all the latent states depend upon each other. The observed circular process is also intractable unconditionally, with a dependence structure that makes all the observational variables interdependent. However, the attractive properties of the wrapped Gaussian process prior ensure that the dependence structure, although complex, is very reasonable. Thus, our Bayesian state space modeling approach frees the circular time series paradigm from the limitations of the classical approach, while the MCMC procedure we propose takes care of the inference part.

It is worth noting that Mazumder and Bhattacharya (2016a) developed a Bayesian nonparametric state space approach to modeling observed linear time series, assuming a circular latent process. The obvious difference between our current work and theirs is that in our case even the observed time series is circular. This circular extension of the observed process, however, is far from trivial, and poses much more difficult challenges than the approach of Mazumder and Bhattacharya (2016a). In particular, because of our Gaussian process approach, the joint distribution of the observed data, conditional on the latent states and the other parameters, is a product of wrapped normal distributions, for which closed forms are not available, and as many discrete auxiliary variables as the data size need to be introduced to express the likelihood conditional on the auxiliary variables. To make things harder, these wrapped normal distributions, even conditionally on the auxiliary variables, involve the unknown observational function in such a way that it cannot be integrated out as in Mazumder and Bhattacharya (2016a). Indeed, thanks to the linear nature of their observed process, Mazumder and Bhattacharya (2016a) could marginalize over the unknown observational function to arrive at a multivariate normal distribution of the observed data conditionally on the latent states and other parameters.

The combination of our involved conditional distribution of the observed data with the heavily involved form of the joint distribution of the state variables yields a structure that is well beyond the scope of Mazumder and Bhattacharya (2016a). It is thus clear that, in comparison with Mazumder and Bhattacharya (2016a), a substantially different and more involved MCMC method is required for the implementation of our approach.

Once we build the necessary Bayesian nonparametric model and methods, we evaluate them on simulated data generated from a state space model with specific circular observational and circular evolutionary equations, fitted with our proposed model and methods. We also consider two real data sets for validation of our ideas. In one case, wind directions in the morning (am) and in the afternoon (pm) on each of 21 days are recorded. Pretending that the morning wind directions are unobserved, we obtain the posteriors of these circular latent states and compare them with the actual observed values. The second data set for our validation purpose consists of the spawning times of a particular fish and the low tide times, both measured on the 24-hour clock, for 86 days. In this case, we assume that the low tide times are unobserved, and fit our Bayesian model, viewing the times as circular variables. In all three cases, we obtained excellent predictions of the latent circular states and excellent forecasts of both latent and observed states.

We then apply our ideas to a real data set consisting of only the directions associated with the movements of a whale in the course of its migration. The directions of the ocean current, which are presumed to influence the movements of the whale, are not observed, and hence we consider them to be the latent circular states of our interest. Our Bayesian model and methods yielded quite sensible results on the latent states and correct prediction of the whale movement.

The rest of this article is structured as follows. In Section 2 we introduce our Bayesian state space model with circular observed data and circular latent states, discussing our wrapped Gaussian process prior and explicitly providing the form of the Bayesian hierarchical model and the forms of the priors on the parameters. In Section 3 we briefly discuss the main technical issues distinguishing our MCMC method from that of Mazumder and Bhattacharya (2016a), showing that our present MCMC methodology is far more involved. The details of our MCMC procedure are provided in the supplement, Mazumder and Bhattacharya (2016b). We then validate our model and methods with a simulation study in Section 4 and two real data sets on pairs of wind directions and spawning/low tide times in Section 5. We apply our ideas to the whale data in Section 6, obtaining the posteriors of the ocean current directions and forecasting whale movement. Finally, we summarize our work and provide concluding remarks in Section 7.

2 Dynamic state space model with circular latent and observed states

Our proposed state space model is as follows: for $t = 1, 2, \ldots$,

(1) $y_t = f(t, x_t) + \epsilon_t \;[\text{mod } 2\pi],$

(2) $x_t = g(t, x_{t-1}) + \eta_t \;[\text{mod } 2\pi],$

where $\{y_t\}$ is the observed circular time series; $\{x_t\}$ are the latent circular states; $\epsilon_t$ and $\eta_t$ are the error terms; and $f$ and $g$ are the unknown observational and evolutionary functions, with values on the circular manifold. In equations (1) and (2), $[\text{mod } 2\pi]$ stands for the modulo $2\pi$ operation. Note that

(3) $y_t = y^{\ell}_t \;[\text{mod } 2\pi]$

and

(4) $x_t = x^{\ell}_t \;[\text{mod } 2\pi],$

where $y^{\ell}_t$ and $x^{\ell}_t$ are the linear counterparts of $y_t$ and $x_t$; that is, $y^{\ell}_t$ is the linear random variable such that $y_t = y^{\ell}_t \;[\text{mod } 2\pi]$, and similarly for $x^{\ell}_t$. For convenience, we shall frequently use the representations (3) and (4). For more details on such representations the reader is referred to Mazumder and Bhattacharya (2016a). Indeed, to derive the distributions of $y_t$ and $x_t$, we shall first obtain the distributions of the linear random variables $y^{\ell}_t$ and $x^{\ell}_t$, and then apply the $[\text{mod } 2\pi]$ operation to these variables to derive the distributions of the circular variables $y_t$ and $x_t$.

Note that both the observational and evolutionary functions have the linear argument $t$ and an angular argument ($x_t$ or $x_{t-1}$, respectively). The linear argument guarantees that the functions are time-varying; that is, the functions are allowed to evolve freely with time.
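To make the $[\text{mod } 2\pi]$ representation concrete, the following minimal Python sketch (with a hypothetical numerical value of our own choosing) illustrates how a linear quantity determines its circular version and the integer wrapping number that links the two.

```python
import numpy as np

def wrap(linear_value):
    """Map a linear (real-valued) quantity to its circular version in [0, 2*pi)."""
    return np.mod(linear_value, 2 * np.pi)

def wrapping_number(linear_value):
    """Auxiliary integer N with linear_value = wrap(linear_value) + 2*pi*N."""
    return np.floor(linear_value / (2 * np.pi))

y_linear = 7.5                   # hypothetical linear counterpart y_t^l
y_circular = wrap(y_linear)      # the observed circular value y_t
N = wrapping_number(y_linear)    # the auxiliary wrapping variable
assert np.isclose(y_linear, y_circular + 2 * np.pi * N)
```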

2.1 Wrapped Gaussian process representations of the observational and evolutionary functions

We now provide a brief discussion on modeling the unknown circular functions $f$ and $g$. Mazumder and Bhattacharya (2016a) constructed a wrapped Gaussian process by convolving a suitable kernel with the standard Wiener process, as a function of time and direction. The resulting wrapped Gaussian process has desirable properties; for example, the covariance becomes 0 whenever the difference between the angles becomes $\pi/2$, implying orthogonality of the directions, irrespective of the value of the time difference. Similarly, as the time difference goes to $\infty$, the covariance converges to 0. Moreover, the desired continuity and smoothness properties hold for the constructed Gaussian process (the proofs are provided in the supplement of Mazumder and Bhattacharya (2016a)). While constructing such a wrapped Gaussian process the basic idea was to first construct a Gaussian process as a function of time and angle and then apply the mod $2\pi$ operation. Here also, to model $f$ and $g$, we first model $f^{\ell}$ and $g^{\ell}$, the linear parts of $f$ and $g$, using the Gaussian process created in Mazumder and Bhattacharya (2016a). The mean functions of $f^{\ell}$ and $g^{\ell}$ are of the forms $\mu_f(s) = \boldsymbol{\beta}'_f \tilde{s}$ and $\mu_g(s) = \boldsymbol{\beta}'_g \tilde{s}$, where $\tilde{s} = (1, t, \cos x, \sin x)'$; here $x$ is an angular quantity and $\boldsymbol{\beta}_f$, $\boldsymbol{\beta}_g$ are parameters in $\mathbb{R}^4$. For any fixed $s_1 = (t_1, x_1)$ and $s_2 = (t_2, x_2)$, where $t_1, t_2$ are linear quantities and $x_1, x_2$ are angular quantities, the covariance structures are given by $c_f(s_1, s_2) = \sigma^2_f \rho(s_1, s_2)$ and $c_g(s_1, s_2) = \sigma^2_g \rho(s_1, s_2)$, where $\sigma^2_f$ and $\sigma^2_g$ are positive, real-valued parameters and $\rho$ is the correlation function induced by the kernel convolution, which vanishes when $|x_1 - x_2| = \pi/2$ and as $|t_1 - t_2| \rightarrow \infty$. For details, see Mazumder and Bhattacharya (2016a). After modeling $f^{\ell}$ and $g^{\ell}$ by the aforementioned Gaussian process, we derive the wrapped Gaussian distributions of $f$ and $g$ using the mod $2\pi$ operation on the distributions of $f^{\ell}$ and $g^{\ell}$, respectively.
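As an illustration of the two covariance properties noted above, the sketch below uses an assumed kernel of the form $\sigma^2 \exp\{-(t_1 - t_2)^2\}\cos(x_1 - x_2)$; this particular functional form is our illustrative assumption, chosen only because it vanishes at angular separation $\pi/2$ and decays with the time lag, and the exact kernel is derived in Mazumder and Bhattacharya (2016a).

```python
import numpy as np

def gp_cov(t1, x1, t2, x2, sigma2=1.0):
    """Covariance with the two stated properties: it vanishes when the angular
    separation is pi/2 and decays to zero as the time lag grows.  This
    particular kernel is an illustrative assumption, not the paper's exact one."""
    return sigma2 * np.exp(-(t1 - t2) ** 2) * np.cos(x1 - x2)

print(gp_cov(1.0, 0.0, 1.0, np.pi / 2))   # ~0: orthogonal directions
print(gp_cov(1.0, 0.0, 50.0, 0.0))        # ~0: large time separation
```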

2.2 Bayesian hierarchical structure of our proposed model

Our modeling approach can be described succinctly by the following hierarchical Bayesian representation:

(5) $y_t = f(t, x_t) + \epsilon_t \;[\text{mod } 2\pi], \quad t = 1, \ldots, T;$

(6) $x_t = g(t, x_{t-1}) + \eta_t \;[\text{mod } 2\pi], \quad t = 1, \ldots, T;$

(7) $f(t, x) = f^{\ell}(t, x) \;[\text{mod } 2\pi], \quad f^{\ell} \sim \mathrm{GP}\left(\mu_f(\cdot), c_f(\cdot, \cdot)\right);$

(8) $g(t, x) = g^{\ell}(t, x) \;[\text{mod } 2\pi], \quad g^{\ell} \sim \mathrm{GP}\left(\mu_g(\cdot), c_g(\cdot, \cdot)\right);$

(9) $x_0 \sim \text{von Mises}(\nu, \kappa);$

where $\epsilon_t \stackrel{iid}{\sim} N\left(0, \sigma^2_{\epsilon}\right)$ and $\eta_t \stackrel{iid}{\sim} N\left(0, \sigma^2_{\eta}\right)$. In the above, GP stands for "Gaussian Process".

For obtaining the joint distribution of the latent circular state variables, we utilize the "look-up" table approach (see Bhattacharya (2007)), along the same lines as Mazumder and Bhattacharya (2016a). The idea and the complete details of the look-up table approach in the context of circular latent variables are discussed in Mazumder and Bhattacharya (2016a), and so we skip the details in the current article; a schematic sketch is nevertheless given below.
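Although the complete details are available in Mazumder and Bhattacharya (2016a), the essence of the look-up table idea of Bhattacharya (2007) is easy to convey: the (linear) Gaussian process is simulated once on a fixed linear-circular grid, and its value at any other input is then drawn from the conditional distribution given the simulated grid values, so that repeated evaluations of the random function within MCMC remain mutually consistent. A minimal Python sketch, under the illustrative kernel assumed earlier and a zero mean function (both our assumptions, not the paper's exact specification), is as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def cov_matrix(pts1, pts2, sigma2=1.0):
    """Cross-covariance between two sets of (time, angle) points, using the
    illustrative kernel sigma2 * exp(-(t1-t2)^2) * cos(x1-x2)."""
    C = np.empty((len(pts1), len(pts2)))
    for i, (t1, x1) in enumerate(pts1):
        for j, (t2, x2) in enumerate(pts2):
            C[i, j] = sigma2 * np.exp(-(t1 - t2) ** 2) * np.cos(x1 - x2)
    return C

# Step 1: simulate the linear GP once on a fixed linear-circular grid.
grid = [(u, v) for u in np.linspace(0.0, 10.0, 20)
               for v in np.linspace(0.0, 2 * np.pi, 20, endpoint=False)]
K = cov_matrix(grid, grid) + 1e-8 * np.eye(len(grid))   # jitter for stability
g_grid = rng.multivariate_normal(np.zeros(len(grid)), K)

# Step 2: for any new input, draw from the conditional given the grid values.
def draw_at(t, x):
    k = cov_matrix([(t, x)], grid)[0]
    w = np.linalg.solve(K, k)
    mean = w @ g_grid                                     # zero prior mean
    var = max(cov_matrix([(t, x)], [(t, x)])[0, 0] - k @ w, 0.0)
    return rng.normal(mean, np.sqrt(var))

g_val = draw_at(3.7, 1.2) % (2 * np.pi)   # wrap to obtain the circular value
```

In the next section we provide the forms of the prior distributions of the parameters associated with the above hierarchical structure.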

2.3 Prior specifications

Our prior distributions have the following forms:

(10) $\boldsymbol{\beta}_f \sim N_4\left(\boldsymbol{\mu}_{\beta_f}, \Sigma_{\beta_f}\right);$

(11) $\boldsymbol{\beta}_g \sim N_4\left(\boldsymbol{\mu}_{\beta_g}, \Sigma_{\beta_g}\right);$

together with priors on the variance parameters $\sigma^2_{\epsilon}$, $\sigma^2_{\eta}$, $\sigma^2_f$ and $\sigma^2_g$; as detailed in Section 4.2, in our implementations these variance parameters are fixed at their maximum likelihood estimates. Here, by $N_4$, we mean the 4-variate normal distribution. We discuss the choice of prior parameters in Sections 4, 5 and 6, in the contexts of applications to simulated and real data sets.

3 MCMC-based inference

The aim of our MCMC-based inference is to address the problem of forecasting and to learn about the latent circular states, given the observed data set $\boldsymbol{y}_T = (y_1, \ldots, y_T)'$. Unlike in Mazumder and Bhattacharya (2016a), here $y_t$ is circular, and hence dealing with the conditional distribution of the observed data is rendered far more difficult. As already remarked in Section 1, the distribution of the observed variables is wrapped normal, and does not admit any closed form expression. To deal with such a situation one needs to bring in the auxiliary variables $N_t = \left\lfloor y^{\ell}_t/(2\pi) \right\rfloor$, where $y^{\ell}_t$ is the linear part of the circular variable $y_t$ and $\lfloor u \rfloor$ denotes the largest integer not exceeding $u$. The wrapping variable $N_t$ can take values in $\mathbb{Z}$, the set of the integers. For details on wrapped variables in the context of circular data, see, for example, Ravindran and Ghosh (2011); see also Mazumder and Bhattacharya (2016a). However, even conditionally on the auxiliary variables, $f$ cannot be marginalized out of the model to simplify the proceedings, so that our MCMC method must include simulation of $f$, in addition to the auxiliary variables. In contrast, Mazumder and Bhattacharya (2016a) were able to work with a much simpler form thanks to the joint conditional normality of their linear observed data, which also enabled them to integrate out the unknown observational function, further simplifying their proceedings.
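For intuition, the wrapped normal density is an infinite sum of normal densities over the wrapping numbers, and this sum is exactly what the auxiliary variables $N_t$ unravel. A simple truncated-sum evaluation (the truncation level $K$ being an illustrative choice of ours) is sketched below.

```python
import numpy as np
from scipy.stats import norm

def wrapped_normal_pdf(theta, mu, sigma, K=10):
    """Density of a wrapped normal at theta in [0, 2*pi): the N(mu, sigma^2)
    density summed over wrapping numbers N in {-K, ..., K}.  Introducing N as
    an auxiliary variable replaces this intractable sum by a tractable joint."""
    Ns = np.arange(-K, K + 1)
    return norm.pdf(theta + 2 * np.pi * Ns, loc=mu, scale=sigma).sum()

print(wrapped_normal_pdf(1.0, mu=0.5, sigma=1.0))
```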

In our case, the posterior distribution of $y_{T+1}$ given $\boldsymbol{y}_T$ admits the following form after introducing the auxiliary variables $\boldsymbol{N}_{T+1} = (N_1, \ldots, N_{T+1})'$:

(17) $\left[y_{T+1} \mid \boldsymbol{y}_T\right] = \sum_{\boldsymbol{N}_{T+1}} \int \left[y_{T+1} \mid x_{T+1}, f, \boldsymbol{N}_{T+1}, \boldsymbol{\theta}\right]\left[x_{T+1}, f, \boldsymbol{N}_{T+1}, \boldsymbol{\theta} \mid \boldsymbol{y}_T\right] dx_{T+1}\, df\, d\boldsymbol{\theta},$

where $\boldsymbol{\theta}$ denotes the collection of the remaining parameters. Therefore, once we have a sample from the joint posterior $\left[x_{T+1}, f, \boldsymbol{N}_{T+1}, \boldsymbol{\theta} \mid \boldsymbol{y}_T\right]$, we can generate an observation from $\left[y_{T+1} \mid \boldsymbol{y}_T\right]$ by simply generating an observation from $\left[y_{T+1} \mid x_{T+1}, f, \boldsymbol{N}_{T+1}, \boldsymbol{\theta}\right]$. Further, we note that given a sample from the joint posterior, $y_{T+1}$ has a wrapped normal distribution with parameters $\left(f(T+1, x_{T+1}), \sigma^2_{\epsilon}\right)$, because $y_{T+1} = f(T+1, x_{T+1}) + \epsilon_{T+1} \;[\text{mod } 2\pi]$, with $\epsilon_{T+1} \sim N\left(0, \sigma^2_{\epsilon}\right)$.

Therefore, we need to obtain a sample from $\left[x_{T+1}, f, \boldsymbol{N}_{T+1}, \boldsymbol{\theta} \mid \boldsymbol{y}_T\right]$. Define $\boldsymbol{f}^{\ell}_T = \left(f^{\ell}(1, x_1), \ldots, f^{\ell}(T, x_T)\right)'$. Note that given $\boldsymbol{x}_T = (x_1, \ldots, x_T)'$, $\boldsymbol{f}^{\ell}_T$ follows a $T$-variate normal distribution with mean $\boldsymbol{\mu}_f = \left(\mu_f(1, x_1), \ldots, \mu_f(T, x_T)\right)'$ and covariance matrix $A_f$, where $A_f(i, j) = c_f\left((i, x_i), (j, x_j)\right)$. Therefore, given $\boldsymbol{x}_T$ and $\boldsymbol{f}^{\ell}_T$, the vector of linear observational variables $\boldsymbol{y}^{\ell}_T = \left(y^{\ell}_1, \ldots, y^{\ell}_T\right)'$ follows a $T$-variate normal with mean $\boldsymbol{f}^{\ell}_T$ and covariance $\sigma^2_{\epsilon} I_T$, where $I_T$ is the identity matrix of order $T$. Moreover, it is immediate that $\boldsymbol{y}^{\ell}_T = \boldsymbol{y}_T + 2\pi \boldsymbol{N}_T$, where $\boldsymbol{N}_T = (N_1, \ldots, N_T)'$, with $N_t = \left\lfloor y^{\ell}_t/(2\pi) \right\rfloor$ for each $t$.
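The observation that $y_{T+1}$ is conditionally wrapped normal suggests the following simple sketch of posterior predictive simulation: draw the linear variable from the corresponding normal distribution and wrap it onto $[0, 2\pi)$. The inputs below are hypothetical stand-ins for posterior draws.

```python
import numpy as np

rng = np.random.default_rng(1)

def forecast_y(f_val, sigma_eps, size=1000):
    """Posterior predictive draws of y_{T+1}: simulate the linear variable from
    N(f(T+1, x_{T+1}), sigma_eps^2) and wrap it onto [0, 2*pi).  Here f_val
    stands in for a posterior draw of f(T+1, x_{T+1})."""
    y_linear = rng.normal(f_val, sigma_eps, size=size)
    return np.mod(y_linear, 2 * np.pi)

draws = forecast_y(f_val=2.3, sigma_eps=0.4)
# A natural point forecast is the circular mean of the predictive draws.
point = np.mod(np.arctan2(np.sin(draws).mean(), np.cos(draws).mean()), 2 * np.pi)
```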

By introducing the auxiliary variables $\boldsymbol{N}_T$, the wrapping variables $\boldsymbol{M}_T = (M_1, \ldots, M_T)'$ for the latent circular variables $\boldsymbol{x}_T$, defined as $M_t = \left\lfloor x^{\ell}_t/(2\pi) \right\rfloor$, and the set of linear-circular grid points $G_z$ (defined below), we can write

(18) $\left[\boldsymbol{x}_T, \boldsymbol{M}_T, \boldsymbol{N}_T, x_0, f^{\ell}, g^{\ell}, \boldsymbol{\theta} \mid \boldsymbol{y}_T\right] \propto \left[\boldsymbol{\theta}\right]\left[x_0\right]\left[g^{\ell}\right]\prod_{t=1}^{T}\left[x_t, M_t \mid x_{t-1}, g^{\ell}, \boldsymbol{\theta}\right] \times \left[f^{\ell}\right]\prod_{t=1}^{T}\left[y_t, N_t \mid x_t, f^{\ell}, \boldsymbol{\theta}\right].$

We note that $x^{\ell}_t = x_t + 2\pi M_t$, and define $G_z = \left\{(z_{1i}, z_{2i});\; i = 1, \ldots, n\right\}$, where, for $i = 1, \ldots, n$, $(z_{1i}, z_{2i})$ is a linear-circular grid-point with $z_{1i}$ linear and $z_{2i} \in [0, 2\pi)$. This grid, which we denote by $G_z$, is associated with the look-up table; see Mazumder and Bhattacharya (2016a) for details. Hence, from (18), we obtain

(19) $\left[\boldsymbol{x}_T, \boldsymbol{M}_T, \boldsymbol{N}_T, x_0, f^{\ell}, g^{\ell}, \boldsymbol{\theta} \mid \boldsymbol{y}_T\right] \propto \left[\boldsymbol{\theta}\right]\left[x_0\right]\left[g^{\ell}\right]\left[f^{\ell}\right]\prod_{t=1}^{T}\frac{1}{\sigma_{\eta}}\,\phi\!\left(\frac{x_t + 2\pi M_t - g^{\ell}(t, x_{t-1})}{\sigma_{\eta}}\right)\prod_{t=1}^{T}\frac{1}{\sigma_{\epsilon}}\,\phi\!\left(\frac{y_t + 2\pi N_t - f^{\ell}(t, x_t)}{\sigma_{\epsilon}}\right),$

where $\phi(\cdot)$ denotes the standard normal density. In (18) and (19) the identity $x^{\ell} = \left(x^{\ell} \;[\text{mod } 2\pi]\right) + 2\pi\left\lfloor x^{\ell}/(2\pi) \right\rfloor$ is used.

The complete details of our MCMC algorithm are presented in the supplement.
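While the full algorithm is deferred to the supplement, the unnormalized log posterior corresponding to (19) is straightforward to evaluate once the function values are available. The sketch below does so under hypothetical stand-ins for the GP draws; the GP prior terms $[f^{\ell}]$, $[g^{\ell}]$ and the prior $[\boldsymbol{\theta}]$, which enter additively on the log scale, are omitted here, and in the actual algorithm the function values are handled through the look-up table.

```python
import numpy as np
from scipy.stats import norm, vonmises

def log_posterior_kernel(x, M, N, y, f, g, sigma_eps, sigma_eta,
                         nu=np.pi, kappa=1.0):
    """Unnormalised log of the data-dependent part of (19): the von Mises
    prior on x[0] (playing the role of x_0) plus the linear-scale evolution
    and observation terms obtained via x_t^l = x_t + 2*pi*M_t."""
    T = len(y)
    lp = vonmises.logpdf(x[0], kappa, loc=nu)
    for t in range(1, T + 1):
        lp += norm.logpdf(x[t] + 2*np.pi*M[t], loc=g(t, x[t-1]), scale=sigma_eta)
        lp += norm.logpdf(y[t-1] + 2*np.pi*N[t-1], loc=f(t, x[t]), scale=sigma_eps)
    return lp

# Hypothetical function draws and data, for illustration only.
f = lambda t, x: 1.0 + 0.1*t + 0.5*np.cos(x) + 0.5*np.sin(x)
g = lambda t, x: 0.5 + 0.05*t + 0.3*np.cos(x) + 0.3*np.sin(x)
rng = np.random.default_rng(7)
T = 5
x = rng.uniform(0, 2*np.pi, T + 1)    # x[0] is x_0
M = np.zeros(T + 1, dtype=int)        # wrapping numbers of the latent states
y = rng.uniform(0, 2*np.pi, T)
N = np.zeros(T, dtype=int)            # wrapping numbers of the observations
print(log_posterior_kernel(x, M, N, y, f, g, 0.5, 0.5))
```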

4 Simulation study

4.1 True model

We now illustrate the performance of our model and methodologies using a simulation study, considering the following dynamic nonlinear model:

$y_t = \frac{x^2_t}{20} + \epsilon_t \;[\text{mod } 2\pi],$

$x_t = \alpha x_{t-1} + \beta\,\frac{x_{t-1}}{1 + x^2_{t-1}} + \gamma \cos\left(1.2(t-1)\right) + \eta_t \;[\text{mod } 2\pi],$

for $t = 1, \ldots, T+1$, with a fixed initial value $x_0$. In the above, $\epsilon_t$ and $\eta_t$ are normally distributed with means zero and variances $\sigma^2_{\epsilon}$ and $\sigma^2_{\eta}$, respectively. We choose fixed values of $\alpha$, $\beta$ and $\gamma$, and fix $\sigma^2_{\epsilon}$ and $\sigma^2_{\eta}$ at a common value. The first $T$ observations of $\{y_t\}$ are assumed to be known, and the last observation, $y_{T+1}$, is set aside for the purpose of forecasting.

The true model can be thought of as the circularized version of a slightly generalized non-linear state-space model on the real line adopted in Example 3.2 of Carlin et al. (1992). A variant of such a nonlinear model is also considered in Ghosh et al. (2014) and in Mazumder and Bhattacharya (2016a).
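A minimal Python simulation of this circularized model is sketched below; the parameter values are illustrative placeholders, not the values used in our study.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical parameter values, for illustration only.
alpha, beta, gamma = 0.05, 0.5, 1.0
sigma_eps = sigma_eta = 0.1
T = 100

x = np.empty(T + 1)
y = np.empty(T + 1)
x_prev = 0.0                                  # fixed initial state x_0
for t in range(1, T + 2):
    xt = (alpha*x_prev + beta*x_prev/(1 + x_prev**2)
          + gamma*np.cos(1.2*(t - 1)) + rng.normal(0, sigma_eta)) % (2*np.pi)
    yt = (xt**2/20 + rng.normal(0, sigma_eps)) % (2*np.pi)
    x[t-1], y[t-1] = xt, yt                   # store x_t and y_t
    x_prev = xt
# y[:-1] plays the role of the observed series; y[-1] is held out for forecasting.
```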

4.2 Choices of prior parameters and the grid

For this simulation experiment we have chosen a 4-variate normal prior distribution for $\boldsymbol{\beta}_f$, with the identity matrix as the covariance matrix; a 4-variate normal prior is likewise adopted for $\boldsymbol{\beta}_g$. The choice of these prior parameters facilitated adequate mixing of our MCMC algorithm. As in Mazumder and Bhattacharya (2016a), we throughout fixed the third and fourth components of $\boldsymbol{\beta}_g$ to tackle identifiability problems; accordingly, we take the covariance matrix in the prior distribution of $\boldsymbol{\beta}_g$ to be a diagonal matrix whose diagonal elements corresponding to the fixed components are zero. Let us heuristically explain the identifiability problem and why fixing the values of the last two components of $\boldsymbol{\beta}_g$ is expected to solve the problem. First note that both the observational and evolutionary equations are non-identifiable, since the last two components of the unknown parameter $\boldsymbol{\beta}_f$ are multiplied with the random quantities $\cos(x_t)$ and $\sin(x_t)$ in the observational equation, and the last two components of the unknown parameter $\boldsymbol{\beta}_g$ are multiplied with the random quantities $\cos(x_{t-1})$ and $\sin(x_{t-1})$ in the evolutionary equation. In other words, multiplying the last two components of the two parameter vectors by arbitrary non-zero constants and dividing the corresponding multiplicative random quantities by the same constants does not change the equations, implying non-identifiability. In the case of the observational equation, the identifiability problem is less severe, considering the fact that $y_1, \ldots, y_T$, at least, are observed (known), and so, for $t = 1, \ldots, T$, the quantity $f(t, x_t)$ is identifiable and estimable from the observed data, although $\boldsymbol{\beta}_f$ itself is not identifiable. However, putting a prior on $\boldsymbol{\beta}_f$ which assigns low probabilities to undesirable regions essentially solves the identifiability problem of $\boldsymbol{\beta}_f$. As is evident from the simulation based posterior inference, our prior on $\boldsymbol{\beta}_f$ indeed ensures that the posteriors of all the components of $\boldsymbol{\beta}_f$ are unimodal, for all the simulated and real examples that we report, essentially indicating identifiability. However, for the evolutionary equation, the identifiability problem is more severe. This is because, unlike $y_t$, which are observed, $x_t$ are unobserved, and so the quantity $g(t, x_{t-1})$ cannot be directly estimated, and does not admit straightforward identifiability. This makes the task of choosing an appropriate "identifiability-ensuring" prior on $\boldsymbol{\beta}_g$ infeasible. The only solution available to us is to fix the last two components of $\boldsymbol{\beta}_g$, which of course solves the identifiability problem.

As in Mazumder and Bhattacharya (2016a), in our current paper impropriety of the posteriors of $\sigma^2_{\epsilon}$, $\sigma^2_{\eta}$, $\sigma^2_f$ and $\sigma^2_g$ results when they are all permitted to be random. To be precise, for any value of $N_t$ (or $M_t$), exactly the same value of the circular variable $y_t$ (or $x_t$) is obtained by the mod $2\pi$ transformation applied to $y^{\ell}_t = y_t + 2\pi N_t$ (or $x^{\ell}_t = x_t + 2\pi M_t$). Therefore, given $y_t$ (or $x_t$), it is not possible to constrain $y^{\ell}_t$ (or $x^{\ell}_t$) unless both $\sigma^2_{\epsilon}$ and $\sigma^2_f$ (or $\sigma^2_{\eta}$ and $\sigma^2_g$) are bounded. Boundedness of $\sigma^2_{\epsilon}$ and $\sigma^2_f$ (or $\sigma^2_{\eta}$ and $\sigma^2_g$) would ensure finite variance of $y^{\ell}_t$ (or $x^{\ell}_t$), which in turn would imply finite variability of $N_t$ (or $M_t$). Since it is not immediate how to select bounded priors for $\sigma^2_{\epsilon}$, $\sigma^2_{\eta}$, $\sigma^2_f$ and $\sigma^2_g$, we obtain the maximum likelihood estimates (MLEs) of these variances and plug those values into our model. For obtaining the MLEs we used the simulated annealing technique (see, for instance, Robert and Casella (2004), Liu (2001)), where at each iteration new values of these variances are proposed, and all the other parameters are integrated out by averaging over Monte Carlo simulations given the proposed values of $\sigma^2_{\epsilon}$, $\sigma^2_{\eta}$, $\sigma^2_f$ and $\sigma^2_g$, so that the integrated likelihood is obtained given the proposed variances. We then calculate the acceptance ratio, and finally decrease the temperature parameter of our simulated annealing algorithm before proceeding to the next iteration. The resulting MLEs of the four variances are then plugged into our model.
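The following sketch conveys the structure of such a simulated annealing search over the four variances. The integrated likelihood is replaced by a toy stand-in, since in practice it must be estimated by Monte Carlo averaging as described above; everything else follows the usual annealing recipe.

```python
import numpy as np

rng = np.random.default_rng(4)

def integrated_loglik(sig):
    """Stand-in for the Monte Carlo estimate of the likelihood with all other
    parameters integrated out; here a toy surface peaked at 0.1 in each
    coordinate, purely for illustration."""
    return -np.sum((np.log(sig) - np.log(0.1)) ** 2)

sig = np.full(4, 1.0)          # (sigma_eps^2, sigma_eta^2, sigma_f^2, sigma_g^2)
temp, cool = 1.0, 0.99
cur = integrated_loglik(sig)
for it in range(2000):
    prop = sig * np.exp(0.1 * rng.normal(size=4))   # positive random-walk proposal
    new = integrated_loglik(prop)
    if np.log(rng.uniform()) < (new - cur) / temp:  # annealing acceptance step
        sig, cur = prop, new
    temp *= cool               # decrease the temperature each iteration
print(sig)                     # approximate maximizers of the toy surface
```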

As regards the von Mises prior on $x_0$, we throughout set the prior mean $\nu = \pi$, the mean of the interval $[0, 2\pi)$, and choose a small value of the concentration parameter $\kappa$, which ensures moderately large variability.

Lastly, we divide the interval $[0, 2\pi)$ into 100 sub-intervals of the form $\left[2\pi(i-1)/100,\; 2\pi i/100\right)$, $i = 1, \ldots, 100$, and choose a number uniformly at random from each of the sub-intervals; these values constitute the circular second components of the two-dimensional grid $G_z$. For the linear first components of $G_z$, we choose a number uniformly at random from each of the 100 subintervals of equal length partitioning the relevant range of the time argument.
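A small sketch of this grid construction is given below; the range assumed for the linear component is an illustrative choice of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100

# Circular components: one uniform draw from each subinterval
# [2*pi*(i-1)/n, 2*pi*i/n) of [0, 2*pi).
v = rng.uniform(2*np.pi*np.arange(n)/n, 2*np.pi*np.arange(1, n + 1)/n)

# Linear components: one uniform draw from each of n equal subintervals;
# the range [0, T+1] used here is an assumption for illustration.
T = 100
u = rng.uniform((T + 1)*np.arange(n)/n, (T + 1)*np.arange(1, n + 1)/n)

G_z = list(zip(u, v))   # the linear-circular grid for the look-up table
```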

4.3 MCMC details

Our MCMC algorithm updates some parameters using Gibbs steps and the remaining ones using Metropolis-Hastings steps; see the supplement for details. We update the circular quantities using von Mises proposal distributions, with the mean of each proposal being the corresponding value accepted at the previous iteration.
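A sketch of such a Metropolis-Hastings update with a von Mises proposal centred at the current (last accepted) value is given below; the target density is a hypothetical stand-in, for illustration only.

```python
import numpy as np
from scipy.stats import vonmises

rng = np.random.default_rng(6)

def mh_step_circular(current, log_target, kappa=8.0):
    """One Metropolis-Hastings step on the circle: propose from a von Mises
    distribution whose mean is the current value.  The proposal density is
    symmetric in its two arguments, so it cancels in the acceptance ratio."""
    proposal = vonmises.rvs(kappa, loc=current, random_state=rng) % (2 * np.pi)
    if np.log(rng.uniform()) < log_target(proposal) - log_target(current):
        return proposal
    return current

# Toy illustration with a hypothetical circular target density.
log_target = lambda th: 2.0 * np.cos(th - 1.0)   # von Mises-shaped target
chain = [0.5]
for _ in range(1000):
    chain.append(mh_step_circular(chain[-1], log_target))
```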