Spectral Density-Based and Measure-Preserving ABC
for partially observed diffusion processes
An illustration on Hamiltonian SDEs
Approximate Bayesian Computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed into an established tool for modelling time-dependent, real-world phenomena with underlying random effects. When applying ABC to stochastic processes, two major difficulties arise. First, different realisations from the output process with the same choice of parameters may show a large variability due to the stochasticity of the model. Second, exact simulation schemes are rarely available, requiring the derivation of suitable numerical methods for the synthetic data generation. To reduce the randomness in the data coming from the SDE, we propose to base the statistical method (e.g., the choice of the summary statistics) on the underlying structural properties of the model. Here, we focus on the existence of an invariant measure and we map the data to their estimated invariant density and invariant spectral density. Then, to ensure that the model properties are kept in the synthetic data generation, we adopt a structure-preserving numerical scheme. The derived property-based and measure-preserving ABC method is illustrated on the broad class of partially observed Hamiltonian SDEs, both with simulated data and with real electroencephalography (EEG) data. The proposed ABC method can be directly applied to all SDEs that are characterised by an invariant distribution and for which a measure-preserving numerical method can be derived.
Approximate Bayesian Computation, Likelihood-free inference, Stochastic differential equations, Structure-preserving numerical schemes, Invariant measure, Neural mass models
This research was partially supported by the Austrian Science Fund (FWF): W1214-N15, project DK14.
Over the last decades, stochastic differential equations (SDEs) have become an established and powerful tool for modelling time-dependent, real-world phenomena subject to random effects. They have been successfully applied to a variety of scientific fields, ranging from biology and finance to physics, chemistry and neuroscience. Beyond the modelling itself, it is of primary interest to estimate the underlying model parameters. This is particularly difficult when the multivariate stochastic process is only partially observed through a function of its coordinates. Moreover, due to the increasing complexity of the SDEs needed to understand and reproduce the real data, the underlying likelihood is often unknown or intractable. Hence, commonly used likelihood-based inference techniques cannot be applied. The inferential task belongs to the class of intractable-likelihood estimation problems, requiring the investigation of new mathematical, numerical and statistical techniques to handle it. Among several likelihood-free inference methods, we focus on the Approximate Bayesian Computation (ABC) approach to estimate the relevant model parameters. We refer to Marin et al. (2012) and to the recently published book “Handbook of Approximate Bayesian Computation” for an exhaustive discussion (Sisson et al. 2018).
ABC has become one of the major tools for parameter inference in complex mathematical models in the last decade. The method was first introduced in the context of population genetics; see, e.g., Beaumont et al. (2002); Pritchard et al. (1999); Tavare et al. (1997). Since then, it has been successfully applied in a wide range of fields; see, e.g., Barnes et al. (2012); Blum (2010a); Boys et al. (2008); McKinley et al. (2017); Moores et al. (2015); Toni et al. (2009). Moreover, ABC has also been proposed to infer parameters of time series models (see, e.g., Drovandi et al. 2016; Jasra 2015), state space models (see, e.g., Martin et al. 2016; Tancredi 2019) and SDE models (see, e.g., Kypraios et al. 2017; Picchini 2014; Picchini and Forman 2016; Picchini and Samson 2018; Sun et al. 2015; Zhu et al. 2016). In the Bayesian framework, prior beliefs about the parameters of interest are updated based on new evidence coming from the observed data through the likelihood, yielding the so-called posterior distribution. ABC is a simulation-based statistical inference method built on the idea of deriving an approximate posterior density targeting the true (unavailable) posterior by running massive simulations from the model to replace the intractable likelihood. Several advanced ABC algorithms have been proposed in the literature, such as ABC-SMC, ABC-MCMC, sequential-annealing ABC and noisy ABC; see, e.g., Fan and Sisson (2018) and the references therein for a recent review. The goal of this paper is to illustrate how building the ABC method on the structural properties of the underlying SDE, and using a numerical method capable of preserving them in the generation of the data from the model, leads to successful inference even when applying ABC in its basic acceptance-rejection form. Certainly, our proposed quantities can be directly incorporated into more advanced ABC algorithms, but their investigation is beyond the scope of this paper.
The basic acceptance-rejection ABC algorithm is based on three steps: i. Sample a value from the prior; ii. Simulate new data, the so-called synthetic data, from the model; iii. Keep the sampled value as a realisation from the approximate posterior if the distance between the original and the synthetic data is smaller than some tolerance level. Quite often though, the practical implementation of ABC algorithms requires computations based on vectors of summary statistics instead of the full datasets. A key question is then how to derive low-dimensional summary statistics from the observed data with a minimal loss of information; see, e.g., Blum and François (2010); Blum (2010b); Blum et al. (2013); Prangle (2018) and the references therein, and Fearnhead and Prangle (2012) for a recent guideline. We refer to Blum et al. (2013) for a review of dimension reduction methods in ABC, and to Raynal et al. (2018) for an alternative approach, called ABC random forest, which requires no prior selection of the relevant components of the summary statistics and bypasses the derivation of the associated tolerance level. Another major issue with ABC is the choice of the distance criterion; see, e.g., Bernton et al. (2019) for a recent paper proposing the use of the Wasserstein distance for dependent time-series data.
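The three-step acceptance-rejection scheme can be sketched in a few lines of code. The following is a generic illustration on a toy Gaussian model (our own example, not one of the paper's SDE models), using the sample mean as summary statistic and the absolute difference as distance:

```python
import numpy as np

rng = np.random.default_rng(0)

def abc_rejection(y_obs, prior_sampler, simulate, summary, distance, eps, n_iter):
    """Basic acceptance-rejection ABC: keep sampled parameter values whose
    synthetic data lie within tolerance eps of the observed data."""
    accepted = []
    s_obs = summary(y_obs)
    for _ in range(n_iter):
        theta = prior_sampler()                  # i. sample from the prior
        y_syn = simulate(theta)                  # ii. simulate synthetic data
        if distance(s_obs, summary(y_syn)) < eps:
            accepted.append(theta)               # iii. accept if close enough
    return np.array(accepted)

# Toy illustration: infer the mean of a Gaussian with known unit variance.
y_obs = rng.normal(2.0, 1.0, size=200)
post = abc_rejection(
    y_obs,
    prior_sampler=lambda: rng.uniform(-5, 5),
    simulate=lambda th: rng.normal(th, 1.0, size=200),
    summary=np.mean,
    distance=lambda a, b: abs(a - b),
    eps=0.1,
    n_iter=5000,
)
```

The accepted values approximate the posterior of the mean; their average concentrates near the sample mean of the observed data.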
Up to now, ABC methods have focused on the choice of summary statistics, distances and tolerance levels mainly from a statistical perspective. We suggest switching the focus from viewing the data mainly as a collection of values to bearing in mind the underlying model and its structural properties through which the data have initially been generated. When performing statistical inference for SDEs via ABC, one of the major difficulties is that different realisations from the model with an identical choice of the parameters may show a large variability. What kind of data transformation, based on the structural properties of the underlying SDE, should be applied, such that different realisations obtained from the same parameter configuration are mapped into the same object, fully characterised by the underlying parameters? Answering this question enables us to almost eliminate the large variability in the data and, consequently, allows for successful parameter inference.
Here, we illustrate this assuming that the observed data come from a partially observed multivariate stochastic process admitting a unique (unknown) invariant distribution. In particular, we provide a new interpretation of the vector of summary statistics. The idea is to map the data to their invariant density and invariant spectral density, thereby also considering the frequency domain and thus taking the dependence structure of the dynamical model into account. This is the first key ingredient of the proposed ABC method. Based on one realisation from the model, we estimate the invariant density by a standard kernel estimator (Pons 2011) and the invariant spectral density with a smoothed periodogram estimator (Cadonna et al. 2017; Quinn et al. 2014). As a result, we succeed in “removing” the variability in the data, since different datasets show similar invariant and spectral densities only when generated from the same underlying parameters. Our specific proposal of summaries can be directly incorporated into any ABC algorithm used for inferring the parameters of any stochastic process which admits an invariant distribution. This is especially remarkable, since the choice of the summaries usually requires a careful tuning depending on the problem under consideration.
As with other simulation-based statistical methods, e.g., MCMC, SMC or machine learning algorithms, ABC relies on the ability of simulating data from the model (step ii). However, the exact simulation from complex stochastic models is rarely possible, and thus numerical methods to generate the data need to be applied; see, e.g., Kloeden and Platen (1992); Leimkuhler and Matthews (2015); Milstein and Tretyakov (2004). This introduces a new level of approximation into the ABC framework. When the statistical method is built upon the structural properties of the underlying model, successful inference can only be guaranteed when these properties are preserved in the synthetic data generated from the model. While this may seem obvious, we believe that this issue has not yet been sufficiently discussed across the communities of stochastic numerics and statistics. In fact, the issue of deriving a proper numerical method when applying ABC to SDEs is usually regarded as not relevant; see, e.g., Picchini (2014); Picchini and Forman (2016); Picchini and Samson (2018); Sun et al. (2015). Here, adopting an efficient and structure-preserving numerical method is the second key ingredient of the proposed ABC algorithm.
In this paper, we illustrate our proposed Spectral Density-Based and Measure-Preserving ABC method on the broad class of stochastic Hamiltonian equations. SDEs of this type are often ergodic (see, e.g., Ableidinger et al. 2017; Mattingly et al. 2002) and allow for the use of efficient numerical splitting schemes that, in contrast to commonly used schemes such as the Euler-Maruyama method, are able to preserve the underlying invariant distribution; see, e.g., Ableidinger et al. (2017); Leimkuhler and Matthews (2015); Leimkuhler et al. (2016); Milstein and Tretyakov (2004). Hamiltonian SDEs have been investigated in the field of molecular dynamics, where they are also referred to as Langevin equations. Recently, they have received remarkable attention in the field of neuroscience as so-called neural mass models. It is important to highlight that the power of the proposed ABC method does not depend on the underlying model (which is nevertheless quite general, being formulated for the class of stochastic Hamiltonian equations). Indeed, it only relies on the existence of a unique invariant measure and of a numerical method preserving it.
The paper is organised as follows. In Section 2, we recall the acceptance-rejection ABC setting (yielding Algorithm 1) and adapt it for general SDEs, taking advantage of the model properties and paying attention to the influence of the numerical method used to simulate the synthetic data. We give a new interpretation of the summary statistics in the case of an underlying invariant distribution, and we propose a distance measure between the observed and the synthetic data based on that. This yields the proposed Spectral Density-Based ABC Algorithm 2, which makes use of the invariant density and of the invariant spectral density. We then discuss the importance of considering measure-preserving numerical schemes for the synthetic data generation when exact simulation methods are not applicable. This leads to the Spectral Density-Based and Measure-Preserving ABC Algorithm 3. In Section 3, we introduce the broad class of Hamiltonian SDEs, for which a unique invariant distribution exists, and we recall two splitting integrators preserving the invariant measure of the underlying model. In Section 4, we validate the proposed method by investigating the stochastic harmonic oscillator, for which an exact simulation of the synthetic data exists. This enables us to apply Algorithm 2 in the absence of an error caused by the numerical scheme. Then, we show that the approximated posterior density derived from the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 targets the one derived using the exact simulation scheme, while the commonly used Euler-Maruyama method drastically fails. In Section 5, we apply the Spectral Density-Based and Measure-Preserving ABC method to the stochastic Jansen and Rit neural mass model (JR-NMM) (Ableidinger et al. 2017), for which the exact simulation of sample paths is not possible. This model has been reported to successfully reproduce electroencephalography (EEG) data (Jansen and Rit 1995).
We illustrate the performance of the proposed ABC method with both simulated and real EEG data. Final remarks, possible extensions and conclusions are reported in Section 6. Further supplementary material is available¹, and a sample code used to generate the results will be made available on GitHub upon publication.
¹Further illustrations of the proposed method are available in the provided supplementary material, here reported as Section 7. In particular, we illustrate the performance of: a) Algorithm 2 for the estimation of the parameters of the critically damped stochastic oscillator, for which the exact simulation is possible, and of Algorithm 2 applied to the critically and weakly damped stochastic harmonic oscillators (see Section 4) when estimating a smaller number of parameters; b) Algorithm 3 for the estimation of the parameters of a non-linear damped stochastic oscillator; c) Algorithm 3 for the estimation of the two parameters of the stochastic JR-NMM which are of specific interest.
2 Spectral Density-Based and Measure-Preserving ABC for partially observed SDEs with an invariant distribution
Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a complete probability space with the right-continuous and complete filtration $(\mathcal{F}_t)_{t \geq 0}$. Let $\theta \in \Theta \subseteq \mathbb{R}^p$, $p \in \mathbb{N}$, be a vector of relevant model parameters. We consider the following $d$-dimensional, $d \in \mathbb{N}$, non-autonomous SDE of Itô-type describing the time evolution of a system of interest
The initial value is either deterministic or an $\mathbb{R}^d$-valued random variable, measurable with respect to $\mathcal{F}_0$. Here, the driving noise is an $m$-dimensional, $m \in \mathbb{N}$, Wiener process with independent and $(\mathcal{F}_t)_{t \geq 0}$-adapted components. We further assume that the drift component and the diffusion component fulfil the necessary global Lipschitz and linear growth conditions, such that the existence and the pathwise uniqueness of an $(\mathcal{F}_t)_{t \geq 0}$-adapted strong solution process X of (1) is guaranteed; see, e.g., Arnold (1974).
We aim to infer the parameter vector $\theta$ inherent in the SDE (1), when the $d$-dimensional solution process X is only partially observed through the one-dimensional and parameter-dependent output process
where the observation mapping is a real-valued continuous function of the components of X.
When modelling physical or dynamical systems via SDE (1), the solution process X and thus the output process must obey some underlying structural properties, whose investigation and preservation is crucial. Structural model properties could be, for example, boundary properties, symmetries, the preservation of invariants, or qualitative behaviour such as ergodicity or energy preservation. In this paper, we focus on a specific structural property, namely the existence of a unique invariant measure on $\mathbb{R}$ of the output process. Under this measure, the output process has an invariant density and well-defined mean, autocovariance and variance, given by
If the solution process X of SDE (1) admits an invariant distribution on $\mathbb{R}^d$, then the output process inherits this structural property by means of the marginal invariant distributions of X. Furthermore, if the process is started in its invariant distribution, then the output process evolves according to that distribution for all times. Our goal is to perform statistical inference for the parameter vector $\theta$ of the SDE (1), when the solution process X is partially observed through discrete time measurements of the output process given in (2), by benefiting from the (in general unknown) invariant density.
2.1 The ABC setting for partially observed SDEs
For completeness, we provide a brief account of the acceptance-rejection ABC method for the setting where the underlying model X, solution of SDE (1), is only partially observed through the output process (2). We then adapt the method, taking into account the underlying model properties and the need for a numerical simulation scheme preserving them. Let the reference data correspond to discrete time observations of the output process. Here, we focus on the Bayesian framework, where we aim to estimate the posterior density of the parameters given the data, satisfying the relation
in terms of the likelihood function and the prior density. For multivariate complex SDEs as in (1), the underlying likelihood is often unknown or intractable. The idea of the ABC method is to derive an approximate posterior density for $\theta$ by replacing the unknown likelihood with possibly billions of synthetic dataset simulations generated from the underlying model (1) and mapped through (2). The basic acceptance-rejection ABC algorithm consists of three steps: i. Sample a value from the prior; ii. Simulate new data from the model (1) and derive the synthetic data from the output process given by (2); iii. Keep the sampled parameter value as a realisation from the posterior if the distance between the vectors of summary statistics of the original and the synthetic data is smaller than some threshold level.
2.1.1 Challenges of the ABC approach
When the threshold is zero and the summaries form a vector of sufficient statistics for $\theta$, the acceptance-rejection ABC practice (summarised in Algorithm 1) produces samples from the true posterior. Due to the complexity of the underlying SDE (1), we cannot derive non-trivial sufficient statistics for $\theta$. Moreover, due to the underlying stochasticity of the model, a distance of exactly zero is never attained. Thus, the threshold is required to be strictly positive. Hence, the acceptance-rejection ABC Algorithm 1 yields samples from an approximated posterior according to
The quality of the approximation improves as the threshold decreases, and it has been shown that, under certain conditions, the approximated ABC posterior converges to the true one as the threshold goes to zero (Jasra 2015). At the same time though, the computational cost increases as the threshold decreases. To guarantee a faster convergence, a possibility may be to consider more advanced ABC algorithms; see, e.g., Fan and Sisson (2018) for a recent overview. Another possibility is to use ad-hoc threshold selection procedures; see, e.g., Barber et al. (2015); Blum (2010b); Lintusaari et al. (2017); Prangle et al. (2014); Robert (2016). Here, we proceed differently and fix the tolerance level as a certain percentile of the calculated distances. This is a common practice used, for example, in Beaumont et al. (2002); Biau et al. (2015); Sun et al. (2015); Vo et al. (2015). We demonstrate that our subsequently discussed proposals already lead to successful inference even when combined with the basic acceptance-rejection ABC procedure and with a standard choice of the tolerance level.
Besides the tolerance level, the quality of the ABC method depends on the choice of “informative enough” summary statistics for the purpose of dimension reduction and on suitable distance measures. These are among the key ingredients of any ABC method, and their identification highly depends on the problem at hand. In Fearnhead and Prangle (2012), the authors give instructions for constructing nearly sufficient and informative summary statistics by introducing a “semi-automatic” linear regression approach. Typical choices for the distance are then the Euclidean metric or distances based on smoothing kernels; see, e.g., Picchini (2014) for the use of these procedures in the inference for SDEs via ABC. However, the semi-automatic choice (Fearnhead and Prangle 2012) does not benefit from the structural properties of the underlying SDE (1). We believe that it is crucial to use the properties of the model (1), inherited by the output process (2) and thus reflected in the synthetic data, as a guidance in the choice of the summary statistics and of a suitable distance. This has been done, for example, in Picchini and Forman (2016), where an underlying stationary distribution was used for the construction of summary statistics. Moreover, Jasra (2015) reviewed and developed a class of approximation procedures based upon the idea of ABC, but specifically maintaining the probabilistic structure of the original statistical model.
Another common procedure to avoid the information loss caused by using non-sufficient summary statistics is to work with the entire dataset; see, e.g., Jasra (2015); Sun et al. (2015). This requires the application of more sophisticated distances, such as the Wasserstein metric (Bernton et al. 2019; Muskulus and Verduyn-Lunel 2011) or other distances designed for time series; for an overview see, e.g., Mori et al. (2016) and the references therein. However, this may be computationally demanding for large datasets. Moreover, due to the internal randomness of the underlying SDE (1), the definition of a proper distance between the entire original and synthetic datasets, allowing for a successful inference of the parameters via ABC, is challenging. In the following, we discuss this and other challenges arising in the statistical inference for SDEs via ABC, and suggest making use of the underlying model properties to tackle them.
2.1.2 Challenges when applying ABC to SDEs
When applying ABC to SDEs, a new important statistical issue arises. Due to the internal randomness of the underlying SDE (1), repeated realisations of the output process (2) under the same parameter vector yield different trajectories. Hence, even when working with the entire dataset, the distance between the observed and the synthetic data may depend on how “representative” the simulated data are for the sampled parameter value, e.g., how far the sample mean is from the theoretical mean. A possibility to tackle this issue may be to use n repeated simulations of the synthetic dataset per sampled value, which does not necessarily guarantee a sufficient reduction of the internal variability. Indeed, whether or not this improves the quality of the estimation depends strongly on the underlying model, as recently discussed and reviewed in McKinley et al. (2017). In particular, under some conditions, Bornn et al. (2018) recently showed that for a simple rejection sampling ABC algorithm, a single synthetic dataset per sampled value (n = 1) is indeed optimal. Hence, the problem remains open and we propose to proceed differently.
Proposal 1: To interpret the vector of summaries as a model property-based data transformation tool, which maps the data into an object that is invariant across repeated simulations under the same parameter configuration, and to choose the distance measure according to the mapped data.
When applying ABC to SDEs, we suggest switching the focus from viewing the observations mainly as a set of available data to bearing in mind the underlying model and its structural properties through which the data have initially been generated. Instead of using summary statistics or working with the entire dataset, we propose to interpret the summaries as a data mapping tool based on the structural properties of the underlying SDE (1). The goal is to derive a data transformation that maps different realisations obtained from the same model parameters into the same object, fully characterised by the underlying parameters. This reduces the variability in the data, making the information about the parameters that is hidden by the internal randomness of the model accessible and, consequently, enabling their successful inference. In particular, in Subsection 2.2, we propose two data transformations based on the property that the output process evolves according to an invariant measure with a given autocovariance. The proposed method is reported in the ABC Algorithm 2.
Another crucial aspect of ABC, shared with all other simulation-based methods, is the ability of simulating from the model (Step ii. of the acceptance-rejection ABC Algorithm 1). However, exact simulation schemes are rarely available for general SDEs as (1), or for the observed process (2). Consider a discretised time grid with an equidistant time step and a realisation from the output process, obtained through a numerical method approximating it at the discrete data points. When exact simulation schemes are not available, the need for suitable numerical methods introduces a new level of approximation in the statistical inference. Indeed, Algorithm 1 then samples from an approximated posterior density of the form
As a consequence, the synthetic data in Step ii. of Algorithm 1 are replaced by their numerical approximation. While it may seem obvious that only efficient and structure-preserving numerical methods should be used to generate the synthetic data within the ABC framework, we believe that this issue has not yet been sufficiently discussed across the communities of stochastic numerics and statistics. Usually, it is recommended to use the Euler-Maruyama scheme or one of the higher order approximation methods described in Kloeden and Platen (1992); see, e.g., Picchini (2014); Picchini and Forman (2016); Picchini and Samson (2018); Sun et al. (2015). However, these methods may not preserve the underlying structural properties of the SDE (1); see, e.g., Ableidinger et al. (2017); Malham and Wiese (2013); Moro and Schurz (2007); Strømmen Melbø and Higham (2004) and the references therein.
Proposal 2: To adopt a numerical method for the synthetic data generation that is able to preserve the structural properties of the model.
The necessity of simulating synthetic datasets from the model makes the application of an efficient and property-preserving numerical scheme another key ingredient in ABC and in all simulation-based inference methods. When we incorporate structural properties of the underlying SDE (1) into the data mapping technique, the derived ABC method can be successful only if these properties are preserved in the numerical simulation of the synthetic data. In Subsection 2.3, we suggest using a measure-preserving numerical scheme for the data generation within the ABC approach, introducing the Spectral Density-Based and Measure-Preserving ABC Algorithm 3.
2.2 A new proposal of summaries and distances: Spectral Density-Based ABC for partially observed SDEs with an invariant measure
An illustration of the internal randomness of the underlying SDE is given in Figure 1 (top and middle panels), where we report two trajectories of the output process of the stochastic JR-NMM (25) generated with an identical parameter configuration. This model is a specific SDE of type (1), observed through an output process as in (2), and admitting an invariant distribution satisfying (3). See Section 5 for a description of the model. In the top panel, we visualise the full paths over the whole observed time horizon, while in the middle panel we provide a zoom, showing only the initial part.
2.2.1 A new interpretation of summaries based on the invariant measure
To reduce the internal variability, we take advantage of the structural model property, i.e., the existence of the invariant measure. Our idea is to switch the focus from the output process to its invariant density and its invariant spectral density. Both are deterministic functions characterised by the underlying parameters. The invariant spectral density is obtained from the Fourier transformation of the autocovariance function, and it is given by
for $\omega \in \mathbb{R}$. This relation is also known as the Wiener-Khinchin Theorem; see, e.g., Gorban (2017); Khasminskii (2012). The angular frequency $\omega$ relates to the ordinary frequency $\nu$ via $\omega = 2\pi\nu$.
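The Wiener-Khinchin relation (4) can be checked numerically on a process with known autocovariance. The following sketch (our own example, with conventions and parameter values chosen by us) uses an Ornstein-Uhlenbeck process, whose autocovariance $R(t) = \sigma^2/(2\gamma)\,e^{-\gamma|t|}$ has spectral density $S(\omega) = \sigma^2/(2\pi(\gamma^2+\omega^2))$:

```python
import numpy as np

# OU process: autocovariance R(t) = sigma^2/(2*gamma) * exp(-gamma*|t|),
# spectral density S(omega) = sigma^2 / (2*pi*(gamma^2 + omega^2)).
gamma, sigma = 1.5, 0.8
t = np.linspace(-60, 60, 400001)          # fine grid; R has decayed at the ends
dt = t[1] - t[0]
R = sigma**2 / (2 * gamma) * np.exp(-gamma * np.abs(t))

def S_numeric(omega):
    # (1/2pi) * integral of R(t) * exp(-i*omega*t) dt; R is even, so only
    # the cosine part contributes (Riemann sum on the uniform grid).
    return dt * np.sum(R * np.cos(omega * t)) / (2 * np.pi)

def S_exact(omega):
    return sigma**2 / (2 * np.pi * (gamma**2 + omega**2))

omegas = np.array([0.0, 0.5, 1.0, 2.0])
num = np.array([S_numeric(w) for w in omegas])
exact = S_exact(omegas)
```

The numerical Fourier transform of the autocovariance matches the closed-form spectral density to high accuracy on this grid.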
Since both densities are typically unknown, we estimate them from the dataset generated from the stochastic process.
First, we estimate the invariant density with a kernel density estimator, denoted by ; see, e.g., Pons (2011). Second, we estimate the invariant spectral density (4) with a periodogram estimator, denoted by , which is typically evaluated at Fourier frequencies. Since the periodogram is not a consistent estimator of the spectral density, we consider smoothed periodograms; see, e.g., Cadonna et al. (2017); Quinn et al. (2014) and the references therein.
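Both estimators are readily available in standard scientific software. The sketch below (using a stand-in stationary AR(1) series rather than an SDE path; all choices of functions and parameters are ours) computes a Gaussian kernel density estimate and a Welch periodogram, i.e., a smoothed periodogram that averages over overlapping segments to gain consistency:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.signal import welch

rng = np.random.default_rng(2)

# Simulate a stationary AR(1) series as a stand-in for a path of the
# output process (hypothetical data; the paper's SDE models are richer).
n, phi = 20000, 0.9
y = np.zeros(n)
eps = rng.normal(0.0, 1.0, size=n)
for k in range(1, n):
    y[k] = phi * y[k - 1] + eps[k]

# Estimated invariant density: standard Gaussian kernel density estimator.
grid = np.linspace(-10, 10, 512)
f_hat = gaussian_kde(y)(grid)

# Estimated invariant spectral density: smoothed (Welch) periodogram.
freqs, S_hat = welch(y, fs=1.0, nperseg=1024)
```

For this positively correlated series, the estimated spectral density concentrates its power at low frequencies, while the kernel estimate integrates to approximately one over the grid.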
In contrast to the invariant density, the invariant spectral density does not account for the mean, but it has the important feature of capturing the dependence structure of the data coming from the dynamical model by taking the autocovariance into account. For this reason, it is crucial to consider this type of transformation when working with non-independent data related to the underlying SDE (1). To combine the different information arising from both densities, we define the invariant measure-based transformation tool of a dataset as
Applying (5) enables us to significantly reduce the variability in the data. This, combined with the fact that both densities react sensitively to small changes in the parameters, allows for the successful inference of the identifiable parameters entering the two densities. Figure 1 shows the two estimated invariant densities (left lower panel) and invariant spectral densities (right lower panel), all derived from the full paths of the output process (top panel). Note that the variability in the two datasets is almost eliminated when mapping the data to their invariant density and when switching from the time to the frequency domain. Besides that, the data dimensionality is reduced compared to the entire dataset. Additionally, our choice of (5) allows for an easy handling of data sampled at different time grids and with different time steps. Periodogram-based metrics are described, for example, in Caiado et al. (2006), while in Fay et al. (2015), the authors use spectral graph metrics in the context of ABC. The invariant density has been used in distance calculations in the context of ABC (Bernton et al. 2019) as well as for constructing summary statistics (Picchini and Forman 2016), but not as the basis of a mapping tool that reduces the variability as we propose here.
2.2.2 The choice of the distance as the integrated absolute error between two functions
After the crucial step of applying the data transformations proposed in (5), the distance can be chosen among the distance measures between two real-valued functions.² (²Our considered examples suggest that, after performing the data mapping (5), the choice of the distance measure does not have a strong impact on the results; not shown.) Here, we consider the integrated absolute error (IAE) between two arbitrary real-valued functions, defined by
Another natural possibility could be a distance measure chosen among the so-called f-divergences (see, e.g., Sason and Verdú 2016), or the Wasserstein distance, recently proposed in the ABC context (Bernton et al. 2019). Within the ABC framework (see Step iii. in Algorithm 1), we suggest to use the following distance
where the weight assigned to the part related to the IAE of the invariant densities is chosen such that the two errors are of the same “order of magnitude”. An automatic choice of the weight or other possibilities of combining the two errors will be investigated in the future. Since the densities are estimated at discrete points, the IAE (6) is approximated by replacing the integral with the sum of the absolute differences of the functions evaluated at the discrete points, multiplied by the length of the interval between two consecutive discrete points.
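The discrete approximation of the IAE described above amounts to a Riemann sum over the evaluation grid, and the weighted combination (7) is a single line on top of it. A minimal sketch (grid, weight and function names are ours):

```python
import numpy as np

def iae(f, g, x):
    """Integrated absolute error between two functions evaluated on a
    uniform grid x: sum of |f - g| times the grid spacing."""
    return np.sum(np.abs(f - g)) * (x[1] - x[0])

def combined_distance(f_obs, f_syn, S_obs, S_syn, x_f, x_S, w):
    """Weighted sum of the IAE between the invariant densities and the IAE
    between the invariant spectral densities; the weight w rescales the
    density part so the two errors are of comparable magnitude."""
    return w * iae(f_obs, f_syn, x_f) + iae(S_obs, S_syn, x_S)

# Sanity check on a known case: the IAE (L1 distance) between the N(0,1)
# and N(0.5,1) densities equals 2*(2*Phi(0.25) - 1) ~ 0.3948.
x = np.linspace(-8, 8, 4001)
norm = lambda x, m: np.exp(-(x - m) ** 2 / 2) / np.sqrt(2 * np.pi)
d = iae(norm(x, 0.0), norm(x, 0.5), x)
```

The sanity check recovers the closed-form L1 distance between the two Gaussian densities up to discretisation error.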
2.2.3 The Spectral Density-Based ABC Algorithm 2
After having introduced our data transformation tool (5) and the distance measure (7), we incorporate them into an acceptance-rejection ABC algorithm, which is valid for the broad class of SDEs (1), partially observed through the output process (2) and admitting an invariant distribution satisfying (3). We call the algorithm Spectral Density-Based ABC to emphasise the crucial role played by the invariant spectral density in accounting for the dependence between the data.
In Algorithm 2, we assume that we observe datasets, referring to realisations of the output process sampled at discrete points in time, resulting in a matrix of observed data. The median (abbreviated as med) of the distances (7) computed for each of the datasets is then returned as a global distance in Step 7. One can interpret as a long-time trajectory (when using simulated observed reference data) or as a long-time recording of the modelled phenomenon (when using real observed reference data) that is cut into pieces. Alternatively, would consist of M independent repeated experiments or simulations when dealing with real or simulated data, respectively. As expected, having datasets improves the quality of the estimation due to the increased number of observations, leading to more available information about the parameters.
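The acceptance-rejection structure with a median global distance can be sketched as follows (a simplified Python illustration, not the authors' implementation; `prior_sampler`, `simulate`, `summarize`, and `distance` are placeholders for the model-specific ingredients, and the synthetic-data handling is deliberately reduced to one simulation per proposal):

```python
import numpy as np

def abc_rejection(obs_summaries, prior_sampler, simulate, summarize,
                  distance, n_iter, keep_frac):
    """Acceptance-rejection ABC sketch: for each proposed parameter, simulate
    synthetic data, map it to its summaries (here: estimated invariant and
    spectral densities), compute the distance to each of the M observed
    summaries, and record the median as the global distance. Proposals whose
    global distance falls below the keep_frac-percentile are kept."""
    thetas, dists = [], []
    for _ in range(n_iter):
        theta = prior_sampler()
        syn = summarize(simulate(theta))
        d = np.median([distance(syn, s) for s in obs_summaries])
        thetas.append(theta)
        dists.append(d)
    eps = np.quantile(dists, keep_frac)   # tolerance chosen as a percentile
    return np.array([t for t, d in zip(thetas, dists) if d <= eps])
```

The kept samples form an empirical approximation of the ABC posterior, from which marginal densities and point estimates can be derived.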
2.3 A new proposal of synthetic data generation: Measure-Preserving ABC for partially observed SDEs with an invariant distribution
As previously discussed, the ABC approach is based on the ability to simulate synthetic datasets from the model. Since exact simulation schemes are rarely available for general SDEs of type (1), we need to rely on numerical methods, yielding an approximation of .
2.3.1 The issue of preserving the invariant distribution in the synthetic data generation
When constructing a numerical method for the SDE (1), admitting an invariant measure satisfying (3), one of the major difficulties is to guarantee that the output process , obtained through a numerical scheme, preserves the underlying invariant measure of the true output process . Note that Algorithm 2 is based on this structural property. Therefore, this ABC scheme can only guarantee successful inference when the invariant distribution is preserved in the synthetic data generation step.
The commonly used Euler-Maruyama scheme yields discretised trajectories of the solution process X of the SDE (1) through
In Figure 2, we illustrate that the Euler-Maruyama scheme does not preserve the invariant measure of the weakly damped stochastic harmonic oscillator (22). This is a specific SDE of type (1), observed through as in (2) and with a known underlying invariant distribution . See Section 4 for a description of the model. Each subplot compares the true invariant density (blue solid lines) with the corresponding kernel estimate based on a path from the model, generated either with the measure-preserving numerical splitting scheme (21) (dashed orange lines) or with the Euler-Maruyama approach (dotted green lines). See Section 3 for a description of the numerical splitting scheme. The data are generated using and different values for the time step, namely , , . Note how only the measure-preserving numerical method generates synthetic data whose estimated density is close to the true one, independently of the choice of the time step . In contrast, the Euler-Maruyama scheme loses the structure more and more as increases. It has been proved that the numerical splitting scheme preserves the corresponding invariant measure (see, e.g., Ableidinger et al. 2017; Leimkuhler et al. 2016), while the Euler-Maruyama method (8) does not preserve second moment properties of linear stochastic oscillators (Strømmen Melbø and Higham 2004). Similar issues have also been detected for non-linear oscillator SDEs in higher dimensions; see, e.g., Ableidinger et al. (2017). This illustrates the importance of deriving a measure-preserving numerical method before running any type of ABC algorithm or other simulation-based method.
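For concreteness, the Euler-Maruyama discretisation can be sketched for a damped linear oscillator of the form dQ = P dt, dP = (−θ²Q − κP) dt + σ dW (a stand-in for SDE (1); this Python sketch and its parameter names are illustrative, not the authors' code):

```python
import numpy as np

def euler_maruyama(theta_sq, kappa, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama discretisation of the damped linear oscillator
    dQ = P dt,  dP = (-theta_sq*Q - kappa*P) dt + sigma*dW.
    Each step adds the drift times dt and a Gaussian increment with
    standard deviation sigma*sqrt(dt). Returns the (Q, P) trajectory."""
    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for i in range(n_steps):
        q, p = x[i]
        dw = rng.normal(0.0, np.sqrt(dt))       # Wiener increment
        x[i + 1, 0] = q + p * dt
        x[i + 1, 1] = p + (-theta_sq * q - kappa * p) * dt + sigma * dw
    return x
```

As discussed above, trajectories produced this way need not follow the invariant distribution of the underlying SDE, and for large time steps their amplitude can blow up.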
2.3.2 The Spectral Density-Based and Measure-Preserving ABC Algorithm 3
After having constructed a measure-preserving numerical simulation method, it can be incorporated into the ABC framework. This yields the Spectral Density-Based and Measure-Preserving ABC Algorithm 3. It can be applied to any SDE (1), partially observed through (2) and admitting an invariant measure fulfilling (3) that is preserved by the numerical scheme chosen to generate the synthetic data.
2.3.3 Implementation details
The Spectral Density-Based ABC Algorithm 2 and the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 are coded in the computing environment R (R Development Core Team 2011), using the package Rcpp (Eddelbuettel and François 2011), which offers a seamless integration of R and C++, drastically reducing the computational time of the algorithms.
The code is then parallelised using the R packages foreach and doParallel, taking advantage of the for-loop in each algorithm. All simulations are run on the HPC Cluster RADON1, a high-performing multiple-core cluster located at the Johannes Kepler University Linz. (Footnote 3: Further information about RADON1 can be found at https://www.ricam.oeaw.ac.at/hpc/.) To obtain smoothed periodogram estimates, we apply the R function spectrum. It requires the specification of a smoothing parameter span. In all our experiments, we use span . In addition, we avoid using a logarithmic scale by setting the log parameter to “no”. To obtain kernel estimates of the invariant density, we apply the R function density. Here, we use the default value for the smoothing bandwidth bw. Sample code will be made publicly available upon publication at https://github.com/massimilianotamborrino.
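The two estimators can also be written out directly. The following Python sketch gives rough analogues of R's spectrum (raw periodogram plus moving-average smoothing) and density (Gaussian kernel density estimate); it is a simplified illustration under the assumption of an evenly sampled path, not a faithful reimplementation of the R functions:

```python
import numpy as np

def periodogram(y, dt):
    """Raw periodogram of an evenly sampled path y with time step dt."""
    n = len(y)
    fft = np.fft.rfft(y - np.mean(y))            # remove the mean first
    freqs = np.fft.rfftfreq(n, d=dt)
    power = (np.abs(fft) ** 2) * dt / n
    return freqs, power

def smooth(power, span):
    """Moving-average smoothing over `span` neighbouring frequencies,
    loosely mimicking the Daniell-type smoothing behind spectrum(span=...)."""
    kernel = np.ones(span) / span
    return np.convolve(power, kernel, mode="same")

def kde(samples, grid, bw):
    """Gaussian kernel density estimate of the invariant density on `grid`."""
    z = (grid[:, None] - samples[None, :]) / bw
    return (np.exp(-0.5 * z ** 2).sum(axis=1)
            / (len(samples) * bw * np.sqrt(2 * np.pi)))
```

Both estimators map a trajectory to a function on a fixed grid, which is exactly the form required by the IAE-based distance (7).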
2.3.4 Evaluation of the ABC results
To evaluate the performance of the proposed ABC methods, we analyse the marginal posterior densities, denoted by , , obtained from the posterior density . It corresponds to , or , depending on whether we obtain it from the Spectral Density-Based ABC Algorithm 2, the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 or the Spectral Density-Based ABC algorithm using the non-preservative Euler-Maruyama method, respectively. Following this notation, we compute the credible probabilities of the credible sets , defined by
We consider the ABC posterior mean defined by
as a point estimator (Footnote 4: This is done to provide a link with classical statistics for scientists outside the statistics community; e.g., an estimated value of the connectivity parameter of the stochastic JR-NMM (25) from EEG data is of interest in neuroscience.) of the true underlying parameter, and evaluate its bias by considering the relative error defined by
Finally, we define the vector of credible probabilities by
and the vector of relative errors by
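The evaluation quantities above reduce to simple operations on the kept ABC posterior samples. A minimal Python sketch (function names are illustrative):

```python
import numpy as np

def posterior_mean(samples):
    """ABC posterior mean point estimate from the kept samples
    (rows: kept draws, columns: parameters)."""
    return samples.mean(axis=0)

def relative_error(estimate, true_value):
    """Relative error of the point estimate w.r.t. the true parameter."""
    return np.abs(estimate - true_value) / np.abs(true_value)

def credible_probability(samples, lower, upper):
    """Fraction of kept marginal posterior samples inside [lower, upper],
    i.e., the empirical credible probability of that credible set."""
    return np.mean((samples >= lower) & (samples <= upper))
```

Applying these to each marginal of the kept samples yields the vectors of credible probabilities and relative errors used to compare the algorithms.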
3 Spectral Density-Based and Measure-Preserving ABC for partially observed Hamiltonian SDEs
In this section, we focus on the class of Hamiltonian SDEs. They are often ergodic and thus admit an invariant measure fulfilling (3). Hence, the Spectral Density-Based ABC Algorithm 2 can be applied to this class of SDEs when an exact simulation scheme is available. If exact simulation is not possible, we propose to use a numerical splitting scheme that, unlike commonly applied schemes such as the Euler-Maruyama method, preserves the underlying invariant measure within Algorithm 3. In the subsequent sections, we illustrate the performance of our proposed methods by applying Algorithm 2 and Algorithm 3 to specific Hamiltonian SDEs.
Hamiltonian ordinary differential equations (ODEs) were originally introduced in the field of classical mechanics. Later, Hamiltonian SDEs have been intensively studied in molecular dynamics, where they are typically referred to as Langevin equations; see, e.g., Leimkuhler and Matthews (2015); Leimkuhler et al. (2016); Mattingly et al. (2002). They are of special interest for modelling oscillatory dynamics. Recently, they have also received considerable attention in the field of neuroscience as neural mass models, used to describe the mean electrical activity of entire cortical neural populations in the brain; see, e.g., Ableidinger et al. (2017).
3.1 Partially observed Hamiltonian SDEs
We define the -dimensional (, ) stochastic process
consisting of the two -dimensional components
where denotes the transpose.
The -dimensional SDE of Hamiltonian type with initial value and -dimensional () Wiener process W describes the time evolution of X by
Equation (12) is a specific case of the general SDE (1). We denote with the -dimensional zero matrix and with and the gradient with respect to and , respectively. The SDE (12) consists of parts, each representing a certain type of behaviour. In this configuration, the first part is the Hamiltonian part involving given by
where is a diagonal matrix. The second part corresponds to the linear damping part . The third part is the non-linear displacement part, consisting of the non-linear and globally Lipschitz continuous function . The fourth part describes the diffusion part . The requirement of a globally Lipschitz continuous function guarantees the existence of a path-wise unique solution; see, e.g., Arnold (1974).
The solution process X of (12) is assumed to be partially observed via the output process as described in (2). Under the requirement of non-degenerate matrices , and , i.e., strictly positive diagonal entries, Hamiltonian SDEs as in (12) are ergodic. (Footnote 5: In Ableidinger et al. (2017), the authors proved this result for the case and . The reason is that they were interested in a specific model (the stochastic JR-NMM (25)). However, this result is expected to extend directly to the general setting in higher dimensions.) As a consequence, the distribution of the solution process X (and thus of the output process ) converges exponentially fast towards a unique invariant measure on (and thus on ); see Ableidinger et al. (2017) and the references therein. In a few cases, we are able to simulate exactly from (12). Hence, we can apply the Spectral Density-Based ABC Algorithm 2 to infer the underlying parameter vector . In general though, explicit solutions of the SDE (12) are not available, requiring the investigation of a suitable measure-preserving numerical method for the synthetic data generation within the Spectral Density-Based and Measure-Preserving ABC Algorithm 3.
3.2 Measure-Preserving numerical splitting schemes for Hamiltonian SDEs
The underlying invariant measure of SDEs as in (12) can be preserved in the synthetic data generation by applying numerical splitting schemes. They usually have the same order of convergence (mean-square order one) as the Euler-Maruyama scheme; see, e.g., Ableidinger et al. (2017) and the references therein. However, in contrast to this and other commonly used numerical methods, it can be proved directly (under negligible restrictions on the time step ) that numerical splitting schemes preserve the structural ergodic property. There exists a considerable amount of literature on numerical splitting schemes for Hamiltonian SDEs; see, e.g., Ableidinger et al. (2017); Leimkuhler and Matthews (2015); Leimkuhler et al. (2016); Milstein and Tretyakov (2004); Misawa (2001). To give some insight, we recall two specific splitting schemes that can be directly applied to the broad class of Hamiltonian SDEs (12); see Ableidinger et al. (2017) and the references therein for a similar illustration.
where denotes the -dimensional zero vector. This results in a linear SDE with additive noise (13) and a non-linear ODE (14) that can be both explicitly solved. Since and , Subsystem (13) can be rewritten as
with and . Equation (15) with the specific and matrices allows for an exact simulation of the trajectories. This is a standard procedure that we recall for completeness; see, e.g., Arnold (1974). The exact path of System (15) is obtained through
where are -dimensional Gaussian vectors with null mean and variance , where the matrix follows the dynamics of the matrix-valued ODE
Moreover, since the non-linear term depends only on the component Q, the exact path of Subsystem (14) is given by
There exist different techniques for combining the exact solutions of the subsystems (13) and (14) to generate trajectories , approximating the solution process of (12). One possibility is the so-called Strang splitting, given by
where and denote the exact solutions of (13) and (14), respectively; see, e.g., Ableidinger et al. (2017) and the references therein. (Footnote 6: Another common procedure is the Lie-Trotter splitting given by .) In Ableidinger et al. (2017), the authors suggest the use of the Strang approach (18), which has also turned out to perform better in our experiments (results not shown). Note also that the choice of the two subsystems is not unique. For example, another possibility is to combine the stochastic term with the non-linear part, yielding the subsystems
The exact path of (19) is given by
while the exact path of (20) is obtained through
where are -dimensional Gaussian vectors with null mean and variance . A resulting numerical scheme for the original SDE (12) can be obtained by the Strang approach
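A Strang step composes a half step of one subsystem, a full step of the other, and another half step of the first. The following Python sketch illustrates this for the damped linear oscillator dQ = P dt, dP = (−θ²Q − κP) dt + σ dW, split (in the spirit of the second splitting above) into dQ = P dt with P frozen and an exactly solvable Ornstein-Uhlenbeck subsystem for P with Q frozen; it is an illustrative toy, not the schemes (18) or (21) themselves, and all names are assumptions:

```python
import numpy as np

def strang_splitting(theta_sq, kappa, sigma, x0, dt, n_steps, rng):
    """Strang splitting sketch for dQ = P dt,
    dP = (-theta_sq*Q - kappa*P) dt + sigma*dW, composing the exact flows
    of the two subsystems: half step in Q, full stochastic step in P,
    half step in Q."""
    def flow_q(q, p, h):                       # exact flow of dQ = P dt
        return q + p * h

    def flow_p(q, p, h):                       # exact OU flow with Q frozen
        level = -theta_sq * q / kappa          # mean-reversion level
        e = np.exp(-kappa * h)
        sd = sigma * np.sqrt((1 - e ** 2) / (2 * kappa))
        return level + (p - level) * e + sd * rng.normal()

    x = np.empty((n_steps + 1, 2))
    x[0] = x0
    for i in range(n_steps):
        q, p = x[i]
        q = flow_q(q, p, dt / 2)
        p = flow_p(q, p, dt)
        q = flow_q(q, p, dt / 2)
        x[i + 1] = (q, p)
    return x
```

Because every substep uses an exact flow (including the exact Gaussian increment of the OU part), the composed scheme stays stable even for comparatively large time steps, in contrast to the Euler-Maruyama discretisation.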
3.3 Spectral Density-Based and Measure-Preserving ABC for Hamiltonian SDEs
To obtain a realisation of the output process that preserves its structure, we suggest using the numerical splitting schemes (18) or (21) within the ABC framework. The synthetic data are then guaranteed to be distributed according to . Applying scheme (18) or (21) in Step 5 of Algorithm 3 leads to the Spectral Density-Based and Measure-Preserving ABC Algorithm 3, valid for the broad class of partially observed Hamiltonian SDEs.
4 Validation of the proposed ABC algorithms when exact simulation is possible
In this section, we illustrate the performance of the proposed Spectral Density-Based ABC Algorithm 2 and of the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 on a model problem (weakly damped stochastic harmonic oscillator) of Hamiltonian type (12) with vanishing non-linear displacement term . Linear SDEs of this type allow for an exact simulation of paths. As a consequence, we can apply the Spectral Density-Based ABC Algorithm 2 under the optimal condition of exact data generation. Its performance is illustrated in Subsection 4.2. Additionally, in Subsection 4.3, we investigate the error introduced in the ABC method by a measure-preserving numerical scheme for the synthetic data generation. The approximated posteriors obtained from Algorithm 3 are then compared with those obtained through Algorithm 2 under the exact scheme.
4.1 Weakly damped stochastic harmonic oscillator: The model and its properties
We investigate the -dimensional Hamiltonian SDE
with strictly positive parameters , and . Depending on the choice of and , (22) models different types of harmonic oscillators, which are common in nature and of great interest in classical mechanics. Linear SDEs such as (22) can be rewritten as (15), allowing for an exact simulation of the sample paths of the corresponding solution process through (16). Here, we focus on the weakly damped harmonic oscillator, satisfying the condition . Our goal is to estimate assuming that the solution process is partially observed through the first coordinate, i.e., . (Footnote 7: An illustration of the performance of Algorithm 2 for the critically damped case (satisfying ), when only the second coordinate is observed, is reported in the supplementary material (Section 7).) Figure 3 displays two paths of the output process obtained through the exact simulation method (16) for the same choice of parameters. The solution process X is normally distributed according to
Consequently, the structural property of the output process becomes
As discussed in Section 3, if , then the output process evolves according to the distribution for all . Therefore, the data points are identically distributed according to and their stationary dependency is captured by the autocovariance function
where . For , corresponds to the variance of the invariant distribution in (23).
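For a linear SDE of the form dX = F X dt + S dW with F stable, the covariance of the invariant Gaussian distribution solves the Lyapunov equation F P + P Fᵀ + S Sᵀ = 0, which can be solved numerically via Kronecker vectorisation. A small Python sketch (the specific matrices in the usage example correspond to a damped oscillator with illustrative parameter values, not the notation of (22)-(24)):

```python
import numpy as np

def stationary_covariance(F, S):
    """Solve the Lyapunov equation F P + P F^T + S S^T = 0 for the
    stationary covariance P of the linear SDE dX = F X dt + S dW,
    using (I kron F + F kron I) vec(P) = -vec(S S^T)."""
    n = F.shape[0]
    A = np.kron(np.eye(n), F) + np.kron(F, np.eye(n))
    vec_p = np.linalg.solve(A, -(S @ S.T).reshape(-1))
    return vec_p.reshape(n, n)
```

For a damped oscillator with drift matrix F = [[0, 1], [−θ², −κ]] and noise only in the second component, this recovers the familiar stationary variances σ²/(2κθ²) for the position and σ²/(2κ) for the velocity, consistent with the Gaussian invariant distribution described above.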
4.2 Validation of the Spectral Density-Based ABC Algorithm 2
In this subsection, we illustrate the performance of the Spectral Density-Based ABC Algorithm 2 under the optimal condition of exact data generation from the model, yielding . The exact simulation scheme (16) guarantees the preservation of in the generation of the time-discretised sample paths of the output process .
In all the considered examples (see also the supplementary material), the performance of the ABC algorithms for the estimation of the parameters of SDE (22) does not improve when incorporating the information of the invariant densities into the distance (7). This is because the mean of the invariant distribution (23) is zero. Hence, we set and base our distance only on the invariant spectral density, estimated by the periodogram. Throughout the current (and following) subsection, we consider observed paths simulated with the exact scheme (16), using a time step over a time interval of length . Hence, each trajectory has observation points. As true parameters for the simulation of the reference data, we choose
We use the exact simulation scheme (16) to generate synthetic datasets over the same time domain and with the same time step as the observed data. We choose independent uniform priors, in particular,
The tolerance level is chosen as the percentile of the calculated distances. Hence, we keep 250 of all the sampled values for .
Figure 4 (top panels) shows the marginal ABC posterior densities (blue lines) and their flat priors (red lines). The proposed Spectral Density-Based ABC Algorithm 2 provides marginal posterior densities centred around the true values , represented by the black vertical lines. The posterior mean point estimates (10) are given by
with relative errors (11) in percentage equal to
The marginal credible probabilities defined in (9) of the credible intervals ,
In the lower panels of Figure 4, we report the pairwise scatterplots of the kept ABC posterior samples. Note that, since the kept values of are uncorrelated with those of the other parameters, the obtained credible probability is approximately the same as when estimating only or (cf. supplementary material).
Vice versa, since the kept ABC posterior samples of the parameters and are correlated, is smaller than that obtained when estimating . Despite this correlation, Algorithm 2 allows for a successful inference of all three parameters of the weakly damped stochastic harmonic oscillator under the optimal condition of exact and thus measure-preserving data simulation from the underlying model.
4.3 Validation of the Spectral Density-Based and Measure-Preserving ABC Algorithm 3
How does the numerical error introduced by the numerical scheme in the synthetic data generation impinge on the ABC performance? To answer this question, we compare the posterior density , previously obtained from Algorithm 2 using an exact simulation scheme, with the posterior density , obtained from Algorithm 3 using the measure-preserving numerical splitting method (21), and with the posterior density , obtained using the non-preservative Euler-Maruyama method.
In Figure 5, we report the marginal posteriors (blue solid lines) and (orange dashed lines) obtained with the same priors, and as before, for different values of the time step . In particular, we choose (top panels), (middle panels) and (lower panels). The posteriors obtained from the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 successfully target , even for a time step as large as . In contrast, the Spectral Density-Based ABC method that generates the synthetic data with the Euler-Maruyama scheme is not even applicable. Indeed, the numerical scheme computationally pushes the amplitude of the oscillator towards infinity, resulting in data points . Thus, neither nor can be computed and the density cannot be derived.
As a further illustration of the poor performance of the Euler-Maruyama scheme, even for smaller choices of , we now consider the simplest possible scenario where we only estimate one parameter, namely . We set , , percentile and we choose a uniform prior . To be able to derive , we simulate the synthetic data using the Euler-Maruyama method with the time steps , and . Figure 6 shows the three ABC posterior densities (blue solid lines), (orange dashed lines) and (green dotted lines) for the different choices of . The horizontal red lines and the black vertical lines denote the uniform prior and the true parameter value, respectively. In all cases, the Spectral Density-Based ABC method using the non-preservative Euler-Maruyama scheme does not lead to successful inference. In addition, these results are not stable across the different choices of , and the derived ABC posterior density may not even cover the true parameter value . The performance of the ABC method based on the Euler-Maruyama scheme may improve for “small enough” . However, there is a trade-off between the running time and the acceptance performance of the Euler-Maruyama-based ABC algorithm. Indeed, simulating one trajectory with a time step takes approximately one hundred times longer than generating one trajectory with a time step . Hence, a runtime of a few hours would turn into months. In addition, even for “arbitrarily small” time steps , one cannot guarantee that the Euler-Maruyama scheme preserves the underlying invariant measure. Due to the large number of synthetic datasets that need to be generated, the runtime of the proposed ABC algorithms is reduced drastically when a large time step can be chosen. For these reasons, it is crucial to base our ABC method on the structure-preserving numerical scheme instead of the non-preservative Euler-Maruyama method.
5 Validation of the Spectral Density-Based and Measure-Preserving ABC Algorithm 3 on simulated and real data
We now illustrate the performance of Algorithm 3 by applying it to an SDE (12) with a non-vanishing displacement term . Due to the non-linearity in , an exact simulation of sample paths is not possible and we rely on an efficient numerical splitting scheme to guarantee the measure-preserving synthetic data generation within the ABC framework. In particular, here we consider the splitting scheme (18). We estimate the parameters of a stochastic version of the JR-NMM; see Jansen and Rit (1995) for the original model and Ableidinger et al. (2017) for its reformulation as a Hamiltonian SDE of type (12). After estimating the parameters from simulated data, we infer them from real EEG data. (Footnote 8: In the supplementary material (Section 7), we illustrate the performance of Algorithm 3 on another SDE, namely a non-linear damped stochastic oscillator that can be seen as an extension of the weakly damped harmonic oscillator discussed in Section 4.)
5.1 The Stochastic Jansen and Rit Neural Mass Model
Due to the oscillatory and noisy nature of neural firing activity, the Hamiltonian SDE (12) comprises a variety of mathematical models from the field of neuroscience. Here, we are interested in the so-called neural mass models. They provide a mathematical framework to describe the electrical activity of entire populations of neurons through their average properties. These models have been reported to successfully reproduce EEG data and are applied in research on neurological disorders such as epilepsy or schizophrenia; see, e.g., Jansen and Rit (1995); Wendling et al. (2000, 2002). Among others, we focus on the stochastic JR-NMM (Ableidinger et al. 2017). (Footnote 9: The original version of the JR-NMM (Jansen and Rit 1995) is treated as an ODE with a stochastic input function, denoted as random “white noise” input. Since the solution process of this model inherits the properties of this input function, e.g., its poor smoothness arising from the stochasticity needed to reproduce the real data, the mathematical and numerical tools for deterministic differential equations are not useful for the analysis of this model. Compared to the original model, the SDE reformulation (25), introduced in Ableidinger et al. (2017), can be studied and understood through its structural properties by applying the well-established tools of stochastic analysis. Moreover, it offers the possibility of a measure-preserving numerical simulation from the model, as discussed before. This is a key feature enabling us to successfully infer the model parameters through the proposed Spectral Density-Based and Measure-Preserving ABC Algorithm 3.) It describes the average membrane potential of a cortical neural population by modelling the interaction of the main pyramidal cells with the surrounding excitatory and inhibitory interneurons. See Jansen and Rit (1995) for a more detailed description of the model. The stochastic JR-NMM is a -dimensional SDE of the form