Bayesian nonparametric spectral density estimation using B-spline priors

Abstract

We present a new Bayesian nonparametric approach to estimating the spectral density of a stationary time series. A nonparametric prior based on a mixture of B-spline distributions is specified and can be regarded as a generalization of the Bernstein polynomial prior of [43] and [14]. Whittle's likelihood approximation is used to obtain the pseudo-posterior distribution. This method allows for a data-driven choice of the smoothing parameter as well as the number and the location of the knots. Posterior samples are obtained using a parallel tempered Metropolis-within-Gibbs Markov chain Monte Carlo algorithm. We conduct a simulation study to demonstrate that for complicated spectral densities, the B-spline prior provides more accurate Monte Carlo estimates in terms of $L_1$-error and uniform coverage probabilities than the Bernstein polynomial prior. Finally, we demonstrate the algorithm's ability to estimate a spectral density with sharp features, using real gravitational wave detector data from LIGO's sixth science run.

1 Introduction

Useful information about a stationary time series is encoded in its spectral density, sometimes called the power spectral density (PSD). This quantity describes the variance (or power) each individual frequency component contributes to the overall variance of the time series, and forms a Fourier transform pair with the autocovariance function. More formally, assuming an absolutely summable autocovariance function $\gamma(\cdot)$, the spectral density function of a zero-mean weakly stationary time series exists, is continuous and bounded, and is defined as:

$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k)\, e^{-ik\omega}, \qquad \omega \in [-\pi, \pi],$$

where $\omega$ is angular frequency.

Spectral density estimation methods can be broadly classified into two groups: parametric and nonparametric. Parametric approaches to spectral density estimation are primarily based on autoregressive moving average (ARMA) models ([9]), but they tend to give misleading inferences when the parametric model is poorly specified.

A large number of nonparametric estimation techniques are based on smoothing the periodogram, a process that randomly fluctuates around the true PSD. The periodogram, $I_n(\omega)$, is easily and efficiently computed as the (normalized) squared modulus of the Fourier coefficients using the fast Fourier transform. That is,

$$I_n(\omega) = \frac{1}{2\pi n} \left| \sum_{t=1}^{n} X_t\, e^{-it\omega} \right|^2,$$

where $\omega \in [-\pi, \pi]$ is angular frequency, and $X_1, \ldots, X_n$ is a stationary time series observed at $n$ discrete time points $t = 1, \ldots, n$.
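For reference, the periodogram at the positive Fourier frequencies can be computed directly with R's fast Fourier transform; the minimal sketch below assumes the $1/(2\pi n)$ normalization used in the display above, and the function name is illustrative.

```r
# Periodogram at the positive Fourier frequencies, using the 1/(2*pi*n)
# normalization consistent with the spectral density definition above.
periodogram <- function(x) {
  n <- length(x)
  x <- x - mean(x)                                 # mean-centre
  m <- floor((n - 1) / 2)                          # number of positive Fourier frequencies
  omega <- 2 * pi * (1:m) / n                      # angular Fourier frequencies
  I <- (Mod(fft(x))^2 / (2 * pi * n))[2:(m + 1)]   # drop the zero frequency
  list(omega = omega, I = I)
}

# Example: periodogram of Gaussian white noise
set.seed(1)
pgram <- periodogram(rnorm(256))
```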

Though the periodogram is an asymptotically unbiased estimator of the spectral density, it is not a consistent estimator ([9]). Smoothing techniques such as Bartlett’s method ([8]), Welch’s method ([57]), and the multitaper method ([53]) aim to reduce the variance of the periodogram by dividing a time series into (potentially overlapping) segments, calculating the periodogram for each segment, and averaging over all of these. Unfortunately, these techniques are sensitive to the choice of smoothing parameter (i.e., the number of segments), resulting in a variance/bias trade-off. Reducing the length of each segment also leads to lower frequency resolution.

Another common nonparametric approach to spectral estimation involves the use of splines. Smoothing spline techniques are not new to spectral estimation (see e.g., [16] for an early reference). [56] used splines to smooth the log-periodogram, with an automatic data-driven smoothing parameter, avoiding the difficult problem of having to choose this quantity. [31] used maximum likelihood and polynomial splines to approximate the log-spectral density function.

Bayesian nonparametric approaches to spectrum estimation have gained momentum in recent times. In the context of splines, [24] used a fixed low-order piecewise polynomial to estimate the log-spectral density of a stationary time series. They implemented the reversible jump Markov chain Monte Carlo (RJMCMC) algorithm of [27], placing priors on the number of knots and their locations, with the goal of estimating spectral densities with sharp features. [14] placed a Bernstein polynomial prior ([43]) on the spectral density. The Bernstein polynomial prior is essentially a finite mixture of beta densities with weights induced by a Dirichlet process. The number of mixture components acts as a smoothing parameter and is given a discrete prior. [59] generalized this and constructed a multi-dimensional Bernstein polynomial prior to estimate the spectral density function of a random field. Also extending the work of [14], [36] used informative priors to extract unobserved spectral components in a time series, and [37] generalized this to multiple time series.

Other interesting Bayesian nonparametric approaches include [12] inducing a prior on the log-spectral density using an integrated Wiener process, and [54] placing a Gaussian random field prior on the log-spectral density. [33], [47], and [13] used Bayesian nonparametric methods to estimate spectral densities from long memory time series, and [46] focused on time-varying spectra in nonstationary time series.

The majority of the Bayesian nonparametric methods (for short memory time series) mentioned here make use of Whittle's approximation to the Gaussian likelihood, often called the Whittle likelihood ([58]). The Whittle likelihood, $L_W(\boldsymbol{x} \mid f)$, for a mean-centered weakly stationary time series $\boldsymbol{x} = (X_1, \ldots, X_n)$ of length $n$ and spectral density $f$ has the following formulation:

$$L_W(\boldsymbol{x} \mid f) \propto \prod_{j=1}^{\lfloor (n-1)/2 \rfloor} \frac{1}{f(\omega_j)} \exp\left( -\frac{I_n(\omega_j)}{f(\omega_j)} \right),$$

where $\omega_j = 2\pi j / n$, $j = 1, \ldots, \lfloor (n-1)/2 \rfloor$, are the positive Fourier frequencies, $\lfloor x \rfloor$ is the greatest integer value less than or equal to $x$, and $I_n(\cdot)$ is the periodogram defined in Equation (2).
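A minimal R sketch of the corresponding Whittle log-likelihood (up to an additive constant), under the same normalization convention and with illustrative function names, is:

```r
# Whittle log-likelihood (up to an additive constant) of a mean-centred series x,
# given a spectral density function f evaluated at the positive Fourier frequencies.
whittle_loglik <- function(x, f) {
  n <- length(x)
  m <- floor((n - 1) / 2)
  omega <- 2 * pi * (1:m) / n
  I <- (Mod(fft(x - mean(x)))^2 / (2 * pi * n))[2:(m + 1)]
  fvals <- f(omega)
  -sum(log(fvals) + I / fvals)
}

# Example: evaluate at an AR(1)-shaped candidate spectral density
f_ar1 <- function(omega, sigma2 = 1, a = 0.9) {
  sigma2 / (2 * pi * Mod(1 - a * exp(-1i * omega))^2)
}
set.seed(1)
x <- arima.sim(list(ar = 0.9), n = 256)
whittle_loglik(x, f_ar1)
```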

The motivation for the work presented in this paper is to apply it in signal searches for gravitational waves (GWs) using data from Advanced LIGO ([1]) and Advanced Virgo ([6]). These interferometric GW detectors have time-varying spectra, and it will be important in future signal searches to be able to estimate the parameters describing the noise simultaneously with the parameters of a detected gravitational wave signal. In a previous study ([21]), we utilized the methodology of [14] to estimate the spectral density of simulated Advanced LIGO ([1]) noise, while simultaneously estimating the parameters of a rotating stellar core collapse GW signal. This method, based on the Bernstein polynomial prior, worked extremely well on simulated data, but we found that it was not well-equipped to detect the sharp and abrupt changes in an otherwise smooth spectral density present in real LIGO noise ([15]). Under default noninformative priors, the method tended to over-smooth the spectral density. As detailed in Section 2.2, this unsatisfactory performance is only partly due to the well-known slow convergence of Bernstein polynomial approximations, of order $k^{-1}$ in the degree $k$ of the polynomial ([42]), but mainly due to a lack of coverage of the space of spectral distributions by Bernstein polynomials. This can be overcome by using B-splines with variable knots instead of Bernstein polynomials, yielding a much improved approximation rate in the number of knots and adequate coverage of the space of spectral distributions.

The focus of this paper is to describe a new Bayesian nonparametric approach to modelling the spectral density of a stationary time series. Similar to [24], our goal is to estimate spectral density functions with sharp peaks. Here we present an alternative nonparametric prior using a mixture of B-spline densities, which we will call the B-spline prior.

Following [14], we induce the weights for each of the B-spline mixture densities using a Dirichlet process prior. Furthermore, in order to allow flexible, data-driven knot placements, a second (independent) Dirichlet process prior is put on the knot differences which, in turn, determines the shape and location of the B-spline densities and hence the structure of the spectral density. [17] applied a similar approach in the context of functional data analysis.

A noninformative prior on the number of knots allows for a data-driven choice of the smoothing parameter. The B-spline prior could naturally be interpreted as a generalization of the Bernstein polynomial prior, as Bernstein polynomials are indeed a special case of B-splines where there are no internal knots.

B-splines have the useful property of local support, where they are only non-zero between their end knots. We will demonstrate that if knots are close enough together, then the property of local support will generally allow us to model sharp and abrupt changes to a spectral density.

Samples from the pseudo-posterior distribution are obtained by updating the B-spline prior with the Whittle likelihood ([58]). This is implemented as a Metropolis-within-Gibbs Markov chain Monte Carlo (MCMC) sampler ([39]). To improve mixing and convergence, we use a parallel tempering scheme ([52]).

We will demonstrate that the B-spline prior is more flexible than the Bernstein polynomial prior and can better approximate sharp peaks in the spectral density. We will show that for complicated PSDs under noninformative priors, the B-spline prior gives sensible Monte Carlo estimates and outperforms the Bernstein polynomial prior in terms of integrated absolute error (IAE) and frequentist uniform coverage probabilities. Furthermore, the placement of these knots is based on the nonparametric Dirichlet process prior, meaning trans-dimensional methods such as RJMCMC ([27]) can be avoided. This is useful as RJMCMC is often fraught with implementation difficulties, such as finding an efficient jump proposal when there are indeed no natural choices for trans-dimensional jumps ([10]).

The paper is organized as follows. Section 2 sets the notation and defines B-splines and B-spline densities. After briefly reviewing the Bernstein polynomial prior, we explain the rationale for the B-spline prior before extending it to a prior for the spectral density of a stationary time series. We discuss the MCMC implementation in Section 3. Section 4 details the results of the simulation study, and in Section 5, we apply the method to real gravitational wave detector data. Concluding remarks are given in Section 6.

2 The B-spline prior

In this section, the B-spline prior for spectral densities of stationary time series will be defined. To this end, we first set the notation and define B-splines and B-spline densities. We review the Bernstein polynomial prior and extend this approach to the B-spline prior with variable knots.

2.1 B-splines and B-spline densities

A spline function of order $l+1$ is a piecewise polynomial of degree $l$ with so-called knots where the piecewise polynomials connect. A spline is continuous at the knots (or continuously differentiable to a certain order, depending on the multiplicity of the knots). The number of internal knots must be nonnegative. Any spline function of a given order defined on a certain partition can be uniquely represented as a linear combination of basis splines (B-splines) of the same order over the same partition ([45]). B-splines can be parametrized either recursively ([18]), or by using divided differences and truncated power functions ([45]). We will adopt the former convention. Without loss of generality, assume the global domain of interest is the unit interval $[0,1]$.

For a set of $k$ B-splines of degree $l$ (for some integer $l \geq 0$), define a nondecreasing knot sequence

$$\boldsymbol{\xi} = (\xi_1, \xi_2, \ldots, \xi_{k+l+1}), \qquad \xi_1 \leq \xi_2 \leq \cdots \leq \xi_{k+l+1},$$

of $k+l+1$ knots, comprised of $k-l-1$ internal knots and $2(l+1)$ external knots. The external knots outside or on the boundary of $[0,1]$ (i.e., $\xi_1, \ldots, \xi_{l+1}$ and $\xi_{k+1}, \ldots, \xi_{k+l+1}$) are required for the B-splines to constitute a basis of spline functions on $[0,1]$. Here we assume that the external knots are all exactly on the boundary, i.e., $\xi_1 = \cdots = \xi_{l+1} = 0$ and $\xi_{k+1} = \cdots = \xi_{k+l+1} = 1$. The knot sequence yields a partition of the interval $[0,1]$ into $k-l$ subsets.

For $j = 1, \ldots, k$, each individual B-spline of degree $l$, denoted $B_{j,l}(\cdot)$, depends on $\boldsymbol{\xi}$ only through the $l+2$ consecutive knots $\xi_j, \xi_{j+1}, \ldots, \xi_{j+l+1}$. The number of knots strictly between $\xi_j$ and $\xi_{j+l+1}$ is equal to the degree of the B-spline if there are no knot multiplicities. There can be a maximum of $l+1$ coincident knots for (right) continuity. These knots determine the shape and location of each B-spline.

A B-spline of degree 0 is the following indicator function:

$$B_{j,0}(\omega) = \begin{cases} 1 & \text{if } \xi_j \leq \omega < \xi_{j+1}, \\ 0 & \text{otherwise.} \end{cases}$$

Note that if $\xi_j = \xi_{j+1}$, then $B_{j,0} \equiv 0$.

Higher degree B-splines can then be defined recursively using

$$B_{j,l}(\omega) = \alpha_{j,l}(\omega)\, B_{j,l-1}(\omega) + \left(1 - \alpha_{j+1,l}(\omega)\right) B_{j+1,l-1}(\omega),$$

where $l \geq 1$ is the degree and

$$\alpha_{j,l}(\omega) = \begin{cases} \dfrac{\omega - \xi_j}{\xi_{j+l} - \xi_j} & \text{if } \xi_{j+l} \neq \xi_j, \\ 0 & \text{otherwise.} \end{cases}$$
B-spline densities are the usual B-spline basis functions, normalized so they each integrate to 1 ([11]). The recursive B-spline parametrization used in this paper allows us to easily analytically integrate each B-spline, $\int_0^1 B_{j,l}(\omega)\, d\omega = (\xi_{j+l+1} - \xi_j)/(l+1)$, which we then use as the normalization constant for the B-spline density, defined as

$$b_{j,l}(\omega) = \frac{(l+1)\, B_{j,l}(\omega)}{\xi_{j+l+1} - \xi_j}.$$
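To make the recursion concrete, the following R sketch evaluates B-splines of degree $l$ by the recursion given above and normalizes them into B-spline densities via the closed-form integral; it is a minimal illustration under our stated conventions, not the implementation used in the bsplinePsd package.

```r
# Evaluate the j-th B-spline of degree l at points x (recursive parametrization),
# for a nondecreasing knot sequence xi including the repeated boundary knots.
bspline <- function(x, j, l, xi) {
  if (l == 0) {
    # half-open indicator; the right endpoint x = 1 is handled separately in practice
    return(as.numeric(xi[j] <= x & x < xi[j + 1]))
  }
  a1 <- if (xi[j + l] > xi[j]) (x - xi[j]) / (xi[j + l] - xi[j]) else 0
  a2 <- if (xi[j + l + 1] > xi[j + 1]) (xi[j + l + 1] - x) / (xi[j + l + 1] - xi[j + 1]) else 0
  a1 * bspline(x, j, l - 1, xi) + a2 * bspline(x, j + 1, l - 1, xi)
}

# Normalize to a B-spline density: each B-spline integrates to (xi[j+l+1] - xi[j]) / (l + 1).
bspline_density <- function(x, j, l, xi) {
  (l + 1) * bspline(x, j, l, xi) / (xi[j + l + 1] - xi[j])
}

# Example: k = 5 cubic (l = 3) B-spline densities on [0, 1] with one internal knot at 0.5
xi   <- c(rep(0, 4), 0.5, rep(1, 4))   # k + l + 1 = 9 knots
x    <- seq(0, 1, length.out = 201)
dens <- sapply(1:5, function(j) bspline_density(x, j, 3, xi))
```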
2.2 Bernstein polynomial prior and B-spline prior

The Bernstein polynomial prior of [43] and [14] is based on the Weierstrass approximation theorem, which states that any continuous function on $[0,1]$ can be uniformly approximated to any desired accuracy by Bernstein polynomials. Let $G$ denote a cumulative distribution function (cdf) with continuous density on $[0,1]$; then the following mixture

$$\sum_{j=1}^{k} \left\{ G\!\left(\frac{j}{k}\right) - G\!\left(\frac{j-1}{k}\right) \right\} \mathrm{Be}(\omega; j, k-j+1)$$

converges uniformly to $G(\omega)$ as $k \to \infty$, where $\mathrm{Be}(\omega; j, k-j+1)$ and $\beta(\omega; j, k-j+1)$ denote the cdf and density of the beta distribution with parameters $j$ and $k-j+1$, respectively.

As shown by [42], the loss associated with approximating an arbitrary cdf on $[0,1]$ by the finite-dimensional space spanned by the beta cdfs $\mathrm{Be}(\cdot\,; j, k-j+1)$, $j = 1, \ldots, k$, measured in the supremum norm and taking the worst case over all cdfs, cannot be made arbitrarily small by increasing $k$. Thus the mixture of beta cdfs does not provide an adequate coverage of the space of cdfs on $[0,1]$. However, [42] showed that if one replaces the beta distributions by B-spline distributions of fixed order (shown for order 2, i.e., triangular distributions) but with variable knots, the loss can be made arbitrarily small by increasing the number of knots. This is the rationale for using a mixture of B-spline distributions with variable knots in the following specification of a sieve prior.

The B-spline prior has the following representation as a mixture of B-spline densities:

$$s(\omega; k, \boldsymbol{w}, \boldsymbol{\xi}) = \sum_{j=1}^{k} w_j\, b_{j,l}(\omega),$$

where $k$ is the number of B-splines of fixed degree $l$ in the mixture, $\boldsymbol{w} = (w_1, \ldots, w_k)$ the weight vector (with $w_j \geq 0$ and $\sum_{j=1}^{k} w_j = 1$), and $\boldsymbol{\xi}$ the knot sequence. Rather than putting a prior on the $w_j$'s, whose dimension changes with $k$, we follow the approach of [14] and assume that the weights are induced by a cdf $G$ on $[0,1]$, i.e., $w_j = G(j/k) - G((j-1)/k)$. Similarly, we assume that the internal knot differences $\xi_{l+1+j} - \xi_{l+j}$ for $j = 1, \ldots, k-l$ are induced by a cdf $H$ on $[0,1]$. Or equivalently, $\xi_{l+1+j} = H\!\left(\frac{j}{k-l}\right)$ for $j = 0, \ldots, k-l$, yielding the B-spline prior parametrized in terms of $G$ and $H$:

$$s(\omega; k, G, H) = \sum_{j=1}^{k} \left\{ G\!\left(\frac{j}{k}\right) - G\!\left(\frac{j-1}{k}\right) \right\} b_{j,l}(\omega).$$

Independent Dirichlet process priors are then put on $G$ and $H$, and a discrete prior on the number of mixture components $k$.

The B-spline prior is similar in nature to the Bernstein polynomial prior introduced by [43] and applied to spectral density estimation by [14]. The primary difference is that the B-spline prior is a mixture of B-spline densities with local support rather than beta densities with full support on the unit interval as shown in Figure 1. Moreover, this local support of the B-splines is determined by the data.

Figure 1: Top panel: Eight cubic B-spline densities with equidistant knots at $\omega = \{0, 0.2, 0.4, 0.6, 0.8, 1\}$. Notice the local support. Bottom panel: Eight beta densities with full support on the entire unit interval.

When there are no internal knots, the B-spline basis becomes a Bernstein polynomial basis. Bernstein polynomials are thus a special case of B-splines, and the B-spline prior could be regarded as a generalization of the Bernstein polynomial prior.

Figure 2 demonstrates that it is possible to construct curves (B-spline mixtures) with sharp peaks if knots are close enough together. The top panel shows a set of B-spline density functions and the bottom panel shows a mixture of these with random weights. The local support property of B-splines is the reason the B-spline prior will be instrumental in estimating a spectral density with sharp peaks.

Figure 2: Top panel: Cubic B-spline densities with many knots close to each of the locations $\omega = \{0.25, 0.5, 0.75\}$. Bottom panel: A random mixture of these B-spline densities. It is possible to construct a B-spline mixture with abrupt, sharp peaks.
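As an illustration of this local-support property (in the spirit of Figure 2), the R sketch below uses splineDesign from the base splines package to build cubic B-splines with knots clustered near 0.25, 0.5, and 0.75, normalizes them to densities, and forms a random mixture; the knot placements and weights here are purely illustrative.

```r
library(splines)

set.seed(42)
l <- 3                                              # cubic B-splines
internal <- sort(c(0.25 + c(-0.01, 0, 0.01),        # knots clustered near 0.25, 0.5, 0.75
                   0.50 + c(-0.01, 0, 0.01),
                   0.75 + c(-0.01, 0, 0.01)))
xi <- c(rep(0, l + 1), internal, rep(1, l + 1))     # full knot sequence
x  <- seq(0, 1 - 1e-9, length.out = 1000)

B <- splineDesign(knots = xi, x = x, ord = l + 1)   # B-spline basis matrix
norm_const <- (xi[(l + 2):length(xi)] - xi[1:(length(xi) - l - 1)]) / (l + 1)
dens <- sweep(B, 2, norm_const, "/")                # normalize columns to B-spline densities

w <- rexp(ncol(dens)); w <- w / sum(w)              # random mixture weights
mixture <- as.vector(dens %*% w)                    # B-spline mixture with sharp peaks
plot(x, mixture, type = "l", xlab = expression(omega), ylab = "mixture density")
```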

2.3 Prior for the spectral density

To place a prior on the spectral density $f(\omega)$ of a stationary time series, defined on the interval $[0, \pi]$, first transform to the unit interval $[0,1]$:

$$f(\pi \nu) = \tau\, s(\nu; k, G, H), \qquad \nu \in [0, 1],$$

where $\tau$ is the normalization constant and $s(\cdot\,; k, G, H)$ is the B-spline prior defined in Equation (9). The prior for $f$ therefore has the following hierarchical structure:

  • $G$ determines the weights (i.e., scale) for each of the B-spline densities. Let $G \sim \mathrm{DP}(M_G, G_0)$, where $M_G$ is the precision parameter and $G_0$ is the base probability distribution function with density $g_0$.

  • $H$ determines the location of the knots and hence the shape and location of the B-spline densities. Let $H \sim \mathrm{DP}(M_H, H_0)$, where $M_H$ is the precision parameter and $H_0$ is the base probability distribution function with density $h_0$.

  • $k$ is the number of B-splines in the mixture (i.e., smoothness) and has a discrete probability mass function $p(k)$ for $k \leq k_{\max}$. Here $k_{\max}$ is the largest possible value we allow $k$ to take. We limit the maximum value of $k$ for computational reasons and do pilot runs to ensure a larger $k_{\max}$ is not required. A smaller $k$ implies smoother spectral densities.

  • $\tau$ is the normalizing constant needed to reparameterize the spectral density from $[0, \pi]$ to a density on the unit interval. Let $\tau$ have an inverse-gamma prior (which is conditionally conjugate; see Section 3).

Assume all of these parameters are a priori independent.

3 Implementation using Markov chain Monte Carlo

As Dirichlet process priors have been placed on $G$ and $H$, we require an algorithm to sample from these distributions. To sample from a Dirichlet process, we use Sethuraman's stick-breaking construction ([50]), an infinite-dimensional mixture model. For computational purposes, the number of mixture components in the stick-breaking representations of $G$ and $H$ is truncated to large but finite positive integers ($L_G$ and $L_H$, respectively). Let $L = L_G = L_H$ for simplicity. A larger choice of $L$ will yield more accurate approximations, but at the expense of increased computation time.

To set up the stick-breaking process, reparameterize $G$ to $(V_1, \ldots, V_L, Z_1, \ldots, Z_L)$ such that

$$G = \sum_{j=1}^{L} p_j\, \delta_{Z_j}, \qquad p_1 = V_1, \quad p_j = V_j \prod_{i<j} (1 - V_i) \;\; (j = 2, \ldots, L),$$

with $V_j \sim \mathrm{Beta}(1, M_G)$ independently for $j < L$, $V_L = 1$, and $Z_j \sim G_0$ independently, and reparameterize $H$ to $(U_1, \ldots, U_L, W_1, \ldots, W_L)$ such that

$$H = \sum_{j=1}^{L} q_j\, \delta_{W_j}, \qquad q_1 = U_1, \quad q_j = U_j \prod_{i<j} (1 - U_i) \;\; (j = 2, \ldots, L),$$

with $U_j \sim \mathrm{Beta}(1, M_H)$ independently for $j < L$, $U_L = 1$, and $W_j \sim H_0$ independently, where $\delta_z$ is a probability density degenerate at $z$, i.e., 1 at $z$ and 0 otherwise.

Conditional on $k$, the above hierarchical structure provides a finite mixture prior for the spectral density of a stationary time series,

$$f(\pi \nu) = \tau \sum_{j=1}^{k} w_j\, b_{j,l}(\nu), \qquad \nu \in [0, 1],$$

with weights

$$w_j = \sum_{i=1}^{L} p_i\, \mathbb{1}\!\left\{ \frac{j-1}{k} < Z_i \leq \frac{j}{k} \right\}, \qquad j = 1, \ldots, k,$$

and knot differences

$$\xi_{l+1+j} - \xi_{l+j} \;\propto\; \sum_{i=1}^{L} q_i\, \mathbb{1}\!\left\{ \frac{j-1}{k-l} < W_i \leq \frac{j}{k-l} \right\}, \qquad j = 1, \ldots, k-l.$$

The normalizing denominator in the latter comes from assuming the exterior knots are the same as the boundary knots, so that the knot differences sum to one. Note also that we assume the lower internal boundary knot is $\xi_{l+1} = 0$, meaning the first knot difference is simply $\xi_{l+2}$. The subsequent knot placements are determined by taking the cumulative sum of the knot differences.
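A minimal R sketch of this construction, using the truncated stick-breaking representation and the proportional form of the knot differences given above (variable names are illustrative), is:

```r
# Truncated stick-breaking: convert (V, Z) into weights/atoms of G, then into
# the B-spline mixture weights via w_j = G(j/k) - G((j-1)/k).
set.seed(1)
L <- 20; MG <- 1; MH <- 1; k <- 10; l <- 3

V <- c(rbeta(L - 1, 1, MG), 1)          # stick-breaking proportions, V_L = 1
p <- V * cumprod(c(1, 1 - V[-L]))       # DP weights p_1, ..., p_L
Z <- runif(L)                           # atoms drawn from the base measure G0

G <- function(t) sum(p * (Z <= t))      # truncated DP cdf
w <- sapply(1:k, function(j) G(j / k) - G((j - 1) / k))

# Second, independent stick-breaking construction (U, W) for the knot differences,
# normalized so that the k - l subintervals span [0, 1].
U <- c(rbeta(L - 1, 1, MH), 1)
q <- U * cumprod(c(1, 1 - U[-L]))
W <- runif(L)
H <- function(t) sum(q * (W <= t))
d <- sapply(1:(k - l), function(j) H(j / (k - l)) - H((j - 1) / (k - l)))
d <- d / sum(d)                         # normalization (the denominator above)
internal_knots <- cumsum(d)[-(k - l)]   # internal knots via cumulative sums;
                                        # boundary knots 0 and 1 are then appended
                                        # with multiplicity l + 1
```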

Abbreviating the vector of parameters as $\boldsymbol{\theta} = (V_1, \ldots, V_{L-1}, Z_1, \ldots, Z_L, U_1, \ldots, U_{L-1}, W_1, \ldots, W_L, k, \tau)$, the joint prior is the product of the independent component priors described above:

$$p(\boldsymbol{\theta}) = \left[ \prod_{j=1}^{L-1} p(V_j)\, p(U_j) \right] \left[ \prod_{j=1}^{L} p(Z_j)\, p(W_j) \right] p(k)\, p(\tau).$$

To produce the unnormalized joint posterior, this joint prior is updated using the Whittle likelihood defined in Equation (3).

We implement a Metropolis-within-Gibbs algorithm to sample from the pseudo-posterior, using the same modular symmetric proposal distributions for the B-spline weight parameters $V_j$ and $Z_j$ as described by [14]. That is, say for $V_j$, propose a candidate uniformly on $(V_j - \epsilon, V_j + \epsilon)$, modulo the circular unit interval: if the candidate is greater than 1, take the decimal part only, and if the candidate is less than 0, add 1 to put it back into $[0, 1]$. This is done for all of the $V_j$ and $Z_j$ parameters. [14] found that a small fixed $\epsilon$ worked well for most cases, and we also adopt this. The same approach is used analogously for the B-spline knot location parameters $U_j$ and $W_j$. Parameter $\tau$ has a conjugate inverse-gamma prior and may be sampled directly. Smoothing parameter $k$ could be sampled directly from its discrete full conditional (as done by [14]), though this can be computationally expensive for large $k_{\max}$, so we instead use a Metropolis proposal centered on the previous value of $k$, such that there is a 75% chance of jumping according to a discrete uniform on a small neighbourhood of the current value, and a 25% chance of boldly jumping according to a discretized Cauchy random variable.
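For illustration, the circular symmetric proposal and the mixed proposal for $k$ could be sketched in R as follows; the step size, the size of the discrete-uniform neighbourhood, and the Cauchy scale are illustrative placeholders, as their values are not reproduced here.

```r
# Symmetric proposal on the circular unit interval: propose uniformly in
# (current - eps, current + eps) and wrap back into [0, 1) modulo 1.
propose_circular <- function(current, eps = 0.1) {   # eps is an illustrative step size
  (current + runif(1, -eps, eps)) %% 1
}

# Mixed proposal for the smoothing parameter k: with probability 0.75 a small
# discrete-uniform step, otherwise a bold discretized Cauchy jump.
propose_k <- function(k) {
  if (runif(1) < 0.75) {
    k + sample(c(-1, 0, 1), 1)          # neighbourhood size is illustrative
  } else {
    k + round(rcauchy(1, scale = 5))    # Cauchy scale is illustrative
  }
  # candidates outside the allowed range of k are simply rejected
  # in the Metropolis accept-reject step
}
```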

There is a common tendency towards multimodal posteriors in finite/infinite mixture models. If there are many isolated modes separated by low posterior density, it is important to use a sampling technique that mixes Markov chains efficiently, rather than relying on the random walk behaviour of the Metropolis sampler. In order to mitigate poor mixing and convergence of Markov chains, we use parallel tempering, or replica exchange ([52]) based on an idea borrowed from physical chemistry.

Parallel tempering involves introducing an auxiliary variable called the inverse-temperature, denoted $\beta_i$ for chains $i = 1, \ldots, N$. This variable becomes an exponent in the target distribution for each parallel chain. That is, chain $i$ targets $p_i(\boldsymbol{\theta} \mid \boldsymbol{x}) \propto p(\boldsymbol{\theta} \mid \boldsymbol{x})^{\beta_i}$, where $\boldsymbol{\theta}$ are the model parameters and $\boldsymbol{x}$ is the time series data vector. If $\beta_i = 1$, this is the posterior distribution of interest. All other inverse-temperature values produce tempered target distributions. Each chain moves on its own in parallel and occasionally swaps states between chains according to a Metropolis accept-reject rule.
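A sketch of the tempered target and the between-chain swap rule is given below; for illustration it assumes the whole unnormalized log posterior is raised to the power $\beta_i$, and the function names are not part of any package.

```r
# Tempered unnormalized log target for chain i: beta_i times the log posterior.
log_target <- function(theta, x, beta, log_posterior) {
  beta * log_posterior(theta, x)
}

# Metropolis accept-reject rule for swapping the states of adjacent chains i and j.
# lp_i, lp_j: unnormalized log posterior at the current states of chains i and j.
swap_accepted <- function(lp_i, lp_j, beta_i, beta_j) {
  log_alpha <- (beta_i - beta_j) * (lp_j - lp_i)
  log(runif(1)) < log_alpha
}
```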

We use cubic B-splines ($l = 3$) for all of the examples in the following sections. The serial version of the (cubic) B-spline prior algorithm is available as the function gibbs_bspline in the R package bsplinePsd [22], which is available on CRAN. The parallel tempered version is implemented in R using the Rmpi library but is not publicly available; please contact the first author for access to this code.

4 Simulation study

In this section, we run a simulation study on two autoregressive (AR) time series of different order: AR(1) and AR(4). For the first scenario, an AR(1) time series with a fixed first order autocorrelation (a relatively simple spectral density) is generated. In the second scenario, an AR(4) time series is generated, with parameters chosen such that the spectral density has two large peaks. Each time series is generated at three increasing sample sizes $n$, with unit variance Gaussian innovations.

We simulate 1,000 different realisations of AR(1) and AR(4) data and model the spectral densities by running the Bernstein polynomial prior algorithm of [14] and our B-spline prior algorithm on each of these. The MCMC algorithms (without parallel tempering as mixing is satisfactory for these toy examples) run for 400,000 iterations, with a burn-in period of 200,000 and thinning factor of 10.

For both spectral density estimation methods, we choose default noninformative priors; that is, for the B-spline prior, we use the default noninformative hyperparameter settings for the Dirichlet process parameters, $\tau$, and $k$. For comparability, we let the Bernstein polynomial prior have exactly the same prior set-up as the B-spline prior, but without the knot location parameters and the distribution $H$.

We set the same maximum number of mixture components $k_{\max}$ for both algorithms. This may seem unnecessarily large for the B-spline prior algorithm, as these simple cases converge to a low $k$. However, it is large enough to ensure the Bernstein polynomial algorithm converges to an appropriate $k$ without being truncated at $k_{\max}$.

Based on the suggestions of [14], we set the stick-breaking truncation parameter $L$ accordingly. This provides a reasonable balance between accuracy and computational efficiency.

The (cubic) B-spline prior algorithm is run using the gibbs_bspline function in the R package bsplinePsd [22]. The Bernstein polynomial prior algorithm is run using the gibbs_NP function in the R package beyondWhittle [30]. Both packages are available on CRAN.

An AR($p$) model has theoretical spectral density

$$f(\omega) = \frac{\sigma^2}{2\pi} \left| 1 - \sum_{j=1}^{p} a_j e^{-ij\omega} \right|^{-2},$$

where $\sigma^2$ is the variance of the white noise innovations and $a_1, \ldots, a_p$ are the model parameters. We can compare estimates to this true spectral density to measure the relative performance of the nonparametric priors. One measure of closeness and accuracy is the integrated absolute error (IAE), also known as the $L_1$-error, defined as

$$\mathrm{IAE} = \int_0^{\pi} \left| \hat{f}(\omega) - f(\omega) \right| \mathrm{d}\omega,$$

where $\hat{f}(\omega)$ is the Monte Carlo estimate (posterior median) of the spectral density $f(\omega)$. We calculate the IAE for each replication and then compare the median IAE over all 1,000 replications. The results are presented in Table 1.

Table 1: Median $L_1$-error of the estimated spectral densities using the B-spline prior and the Bernstein polynomial prior on simulated AR(1) and AR(4) data. Columns correspond to the three increasing sample sizes $n$ (left to right).

AR(1)
  B-spline     0.901   0.756   0.592
  Bernstein    0.830   0.706   0.518
AR(4)
  B-spline     3.242   2.371   1.886
  Bernstein    3.427   2.920   2.656

Table 1 compares the median IAE of the estimated spectral densities under the two different nonparametric priors. For the AR(1) cases, the median IAE is only marginally higher for the B-spline prior than for the Bernstein polynomial prior. As the AR(1) has a simple spectral structure, this is a case where the global support of the Bernstein polynomials makes sense. However, when estimating the more complicated AR(4) spectral density, we see that the B-spline prior yields more accurate estimates than the Bernstein polynomial prior in terms of IAE. We also see that for both priors, as $n$ increases, the median IAE decreases.
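For concreteness, the following R sketch evaluates the theoretical AR($p$) spectral density given above and approximates the IAE by the trapezoidal rule on a frequency grid; the AR(1) coefficient and the stand-in estimate in the example are illustrative only.

```r
# Theoretical spectral density of an AR(p) process with coefficients a and
# innovation variance sigma2, at angular frequencies omega in [0, pi].
ar_psd <- function(omega, a, sigma2 = 1) {
  p <- length(a)
  denom <- sapply(omega, function(w) Mod(1 - sum(a * exp(-1i * (1:p) * w)))^2)
  sigma2 / (2 * pi * denom)
}

# Integrated absolute error (L1-error) between an estimated and a true PSD on a
# grid of frequencies, approximated with the trapezoidal rule.
iae <- function(fhat, ftrue, omega) {
  err <- abs(fhat - ftrue)
  sum(diff(omega) * (head(err, -1) + tail(err, -1)) / 2)
}

# Example usage (the AR(1) coefficient and the stand-in estimate are illustrative)
omega  <- seq(0, pi, length.out = 500)
f_true <- ar_psd(omega, a = 0.9)
f_est  <- 1.1 * f_true                  # stand-in for a posterior median estimate
iae(f_est, f_true, omega)
```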

For each simulation, we calculate two different credible regions: the usual equal-tailed pointwise credible region, and the uniform (or simultaneous) credible band ([41]). Uniform credible bands are very useful as they allow the calculation of coverage levels for entire curves (spectral densities in this case) rather than pointwise intervals. To compute a $100(1-\alpha)\%$ uniform credible band, we use the following form:

$$\hat{f}(\omega) \pm C_\alpha\, \hat{\sigma}(\omega),$$

where $\hat{f}(\omega)$ is the pointwise posterior median spectral density, $\hat{\sigma}(\omega)$ is the median absolute deviation of the posterior samples kept after burn-in and thinning (used as the estimate of dispersion of the sampling distribution of $\hat{f}(\omega)$), and we choose the constant $C_\alpha$ such that

$$\frac{1}{M} \sum_{m=1}^{M} \mathbb{1}\!\left\{ \max_{\omega} \frac{\left| f^{(m)}(\omega) - \hat{f}(\omega) \right|}{\hat{\sigma}(\omega)} \leq C_\alpha \right\} = 1 - \alpha,$$

where $f^{(m)}$, $m = 1, \ldots, M$, are the retained posterior samples of the spectral density.

Based on these uniform credible bands, uniform coverage probabilities over all 1,000 replications of the simulation can be computed. That is, we calculate the proportion of times that the true spectral density is entirely encapsulated within the uniform credible band. Computed coverage probabilities are shown in Table 2.
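The band construction described above can be sketched in R as follows, given a matrix of posterior PSD draws (rows are posterior samples, columns are frequencies); note that R's mad() applies a default scaling constant, which is simply absorbed into the calibration of $C_\alpha$.

```r
# Uniform (simultaneous) credible band from posterior PSD samples.
# fsamples: M x J matrix of posterior draws of the spectral density on a frequency grid.
uniform_band <- function(fsamples, alpha = 0.1) {
  fhat  <- apply(fsamples, 2, median)    # pointwise posterior median
  sigma <- apply(fsamples, 2, mad)       # pointwise median absolute deviation
  # maximum standardized deviation of each posterior draw from the median curve
  dev <- apply(fsamples, 1, function(f) max(abs(f - fhat) / sigma))
  C <- quantile(dev, probs = 1 - alpha)  # (1 - alpha) of draws lie entirely within the band
  list(median = fhat, lower = fhat - C * sigma, upper = fhat + C * sigma)
}
```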

Table 2: Coverage probabilities based on 90% uniform credible bands. Columns correspond to the three increasing sample sizes $n$ (left to right).

AR(1)
  B-spline     1.000   1.000   0.998
  Bernstein    1.000   0.987   0.499
AR(4)
  B-spline     0.936   0.979   0.907
  Bernstein    0.000   0.000   0.000

It can be seen in Table 2 that the B-spline prior has higher coverage than the Bernstein polynomial prior in all examples (apart from the AR(1) at the smallest sample size, where the two are equal). The B-spline prior produces excellent coverage probabilities for the AR(1) cases. The Bernstein polynomial prior also performs well in this regard, apart from the largest sample size, where about half of the replications are not fully covered. An example from one of the 1,000 AR(1) replications is given in Figure 3. Here, the uniform credible band fully contains the true PSD for the B-spline prior but not for the Bernstein polynomial prior (the true PSD falls outside the uniform credible band at the highest frequencies). The pointwise credible region and posterior median log-PSD for both priors are also very accurate. This is not surprising, as the AR(1) has a relatively simple spectral structure.

Figure 3: Estimated log-spectral densities for an AR(1) time series using the B-spline prior (left) and Bernstein polynomial prior (right). The solid line is the true log-PSD; the dashed line is the posterior median log-PSD; the dark shaded region is the pointwise 90% credible region; and the light shaded region is the uniform 90% credible band.

Coverage of the AR(4) spectral density under the B-spline prior is above 90% for each sample size. However, the Bernstein polynomial prior has extremely poor coverage in the AR(4) case, where none of the 1,000 replications are fully covered by the uniform credible band for any sample size. An example of this performance can be seen in Figure 4. The Bernstein polynomial prior (under the noninformative prior set-up) tends to over-smooth the second large peak of the PSD, and introduces additional incorrect peaks and troughs throughout the rest of the estimate. These false peaks and troughs are present because the Bernstein polynomial prior algorithm converges to a large $k$ in an attempt to approximate the two large peaks of the AR(4) PSD. The B-spline prior gives a much more accurate Monte Carlo estimate: the posterior median log-PSD is close to the true AR(4) PSD, the 90% pointwise credible region mostly contains the true PSD, and the 90% uniform credible band fully contains it.

Figure 4: Estimated log-spectral densities for an AR(4) time series using the B-spline prior (left) and Bernstein polynomial prior (right). The solid line is the true log-PSD; the dashed line is the posterior median log-PSD; the dark shaded region is the pointwise 90% credible region; and the light shaded region is the uniform 90% credible band.

Of course, the Bernstein polynomial prior could perform better on spectral densities with sharp features if significant prior knowledge was known in advance. This can however be a formidable task, and is not very generalizable to other time series. A benefit of using the B-spline prior is its ability to estimate a variety of different spectral densities using the default noninformative priors used in this paper. We will see another example of this in the next section.

One slight drawback of the B-spline prior algorithm is its computational complexity relative to the Bernstein polynomial prior. B-spline densities must be re-evaluated many times per MCMC iteration (when sampling the knot-related parameters $U_j$ and $W_j$, and the smoothing parameter $k$) due to the variable knot placements, whereas the beta densities can be pre-computed and stored in memory, saving much computation time.

Table 3 shows the median run-time (over the 1,000 replications) for each of the six AR simulations.

Table 3: Median absolute run-times (hours) and their associated relative run-times (B-spline/Bernstein). Columns correspond to the three increasing sample sizes $n$ (left to right).

AR(1)
  B-spline             2.967   3.186   3.659
  Bernstein            1.423   1.572   1.844
  B-spline/Bernstein   2.086   2.026   1.985
AR(4)
  B-spline             4.044   4.422   5.174
  Bernstein            1.443   1.694   2.281
  B-spline/Bernstein   2.802   2.610   2.268

It can be seen in Table 3 that the B-spline prior algorithm is approximately 2–3 times slower than the Bernstein polynomial prior algorithm for these examples. Due to the noted advantages (accuracy and coverage) that the B-spline prior has over the Bernstein polynomial prior, particularly for PSDs with complicated structures, we believe the increased computation time is an acceptable trade-off; for simple spectral densities, however, the Bernstein polynomial prior should suffice.

5 Application to recoloured LIGO S6 data

Gravitational waves (GWs) are ripples in the fabric of spacetime, caused by the most exotic and cataclysmic astrophysical events in the cosmos, such as binary black hole or neutron star mergers, core collapse supernovae, pulsars, and even the Big Bang. They are a consequence of Einstein’s general theory of relativity ([23]).

On September 14, 2015, the breakthrough first direct detection of GWs was made using the Advanced LIGO detectors ([3]). The signal, GW150914, came from a pair of stellar mass black holes that coalesced approximately 1.3 billion light-years away. This was also the first direct observation of a black hole merger. Two subsequent detections of pairs of stellar mass black holes, GW151226 and GW170104, were made by the Advanced LIGO detectors on 26 December, 2015 ([4]), and 4 January, 2017 ([5]), respectively, signalling that a new era of astronomy is now upon us.

Advanced LIGO is a set of two GW interferometers in the United States (one in Hanford, Washington, and the other in Livingston, Louisiana) ([1]). Data collected by these observatories are dominated by instrument and background noise — primarily seismic, thermal, and photon shot noise. There are also high power, narrow band, spectral noise lines caused by the AC electrical supplies and mirror suspensions, among other phenomena. Though GW150914 had a large signal-to-noise ratio, signals detected by these observatories will generally be relatively weak. Improving the characterization of detector/background noise could therefore positively impact signal characterization and detection confidence.

The default noise model assumes instrument noise is Gaussian, stationary, and has a known PSD. Real data often depart from these assumptions, motivating the development of alternative statistical models for detector noise. In the literature, this includes Student-t likelihood generalizations by [49] and [48], introducing additional scale parameters and marginalization by [35] and [55], modelling the broadband PSD with a cubic spline and spectral lines with Cauchy random variables by [34], and the use of a Bernstein polynomial prior by [21].

We found that due to the undesirable properties of the Bernstein polynomial prior, it was not flexible enough to estimate sharp peaks in the spectral density of real LIGO noise. This, coupled with the fact that B-splines have local support, provided the rationale for implementing the B-spline prior instead.

In the following example, using the parallel tempered B-spline prior algorithm, we estimate the PSD of a 1 s stretch of real LIGO data collected during the sixth science run (S6), recoloured to match the target noise sensitivity of Advanced LIGO ([15]). LIGO has a sampling rate of 16384 Hz. To reduce the volume of data processed, a low-pass Butterworth filter (of order 10 and attenuation 0.25) is applied, then the data are downsampled to 4096 Hz. Prior to downsampling, the data are differenced once to become stationary, mean-centred, and then Hann windowed to mitigate spectral leakage. Though a 1 s stretch may seem small in the context of GW data analysis, this time scale is important for on-source characterization of noise during short-duration transient events, called bursts ([2]). This is particularly true since LIGO noise has a time-varying spectrum, and systematic biases could occur if off-source noise was used to estimate the power spectrum of on-source noise.
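A hedged R sketch of this preprocessing pipeline, using the signal package, is given below; the exact filter settings (in particular the interpretation of the 0.25 attenuation as a normalized cutoff), the order of operations, and all names are assumptions for illustration only.

```r
library(signal)

# Illustrative preprocessing of a raw strain segment sampled at 16384 Hz.
preprocess <- function(strain, fs_in = 16384, fs_out = 4096) {
  # low-pass Butterworth filter of order 10 (treating "attenuation 0.25" as a
  # normalized cutoff frequency is an assumption)
  bf <- butter(10, 0.25, type = "low")
  y  <- filtfilt(bf, strain)

  # prior to downsampling: difference once, mean-centre, and apply a Hann window
  y <- diff(y)
  y <- y - mean(y)
  n <- length(y)
  y <- y * 0.5 * (1 - cos(2 * pi * (0:(n - 1)) / (n - 1)))

  # downsample by keeping every (fs_in / fs_out)-th sample
  y[seq(1, length(y), by = fs_in / fs_out)]
}
```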

We run 16 parallel chains (each at a different temperature) of the MCMC algorithm for a large number of iterations, discarding a burn-in period and thinning by a factor of 5. We propose swaps (of all parameters blocked together) between adjacent chains on every tenth iteration. For each chain $i = 1, \ldots, N$, where $N = 16$ is the number of chains, we use an inverse-temperature ladder that decreases from $\beta_1 = 1$ down to a minimum allowed inverse-temperature $\beta_{\min}$; we found this scheme gave reasonable results. The stick-breaking truncation parameter and all of the other prior specifications are exactly the same as those used in the AR simulation study of the previous section.

Figure 5: Estimated $\mathrm{log}_{10}$ PSD for a 1 s segment of recoloured LIGO S6 data. The posterior median log-PSD (dashed black) along with the pointwise credible region (shaded blue) are overlaid with the log-periodogram (grey). The log transform is base 10 here.

As demonstrated in the previous section (e.g., Figure 4), the Bernstein polynomial approach would have struggled to estimate the abrupt changes of power present in real detector data. It can be seen in Figure 5 though, that the B-spline mixture approach estimates the log-spectral density very well. The estimated log-PSD follows the general broad-band shape of the log-periodogram well, and the primary sharp changes in power are also accurately estimated. The method, however, seems to be less sensitive to some of the smaller spikes.

6 Conclusions and Outlook

In this paper, we have presented a novel approach to spectral density estimation, using a nonparametric B-spline prior with variable number and location of knots. We have shown that for complicated PSDs, this method outperforms the Bernstein polynomial prior in terms of IAE and uniform coverage probabilities, and provides superior Monte Carlo estimates, particularly for spectral densities with sharp and abrupt changes in power. This is not surprising as B-splines have local support and better approximation properties than Bernstein polynomials. However, the favourable estimation qualities of the B-spline prior come at the expense of increased computation time.

The posterior distribution of the B-spline mixture parameters with variable number and location of knots could be sampled using the RJMCMC algorithm of [27], however RJMCMC methods are often fraught with implementation difficulties, such as finding efficient jump proposals when there are no natural choices for trans-dimensional jumps ([10]). We avoid this altogether by allowing for a data-driven choice of the smoothing parameter and knot locations using the nonparametric Dirichlet process prior. This yields a much more straightforward sampling method.

We have demonstrated that the B-spline prior provides a reasonable estimate of the spectral density of real instrument noise from the LIGO gravitational wave detectors. In a future paper, we will focus on characterizing this noise while simultaneously extracting a GW signal, similar to [21]. As the algorithm is computationally expensive, it will be well-suited towards the shorter burst-type signals (of order 1 s or less) like rotating core collapse supernovae. Using a large enough catalogue of waveforms, estimation of astrophysically meaningful parameters could be done by sampling from the posterior predictive distribution, similar to [20].

Though we have only presented the B-spline prior in terms of spectral density estimation, it could be used in a much broader context, such as density estimation. A paper using this approach for density estimation is in preparation and could yield a more flexible, alternative approach to the Triangular-Dirichlet prior function TDPdensity in the R package DPpackage ([29]).

Acknowledgements

We thank Claudia Kirch, Alexander Meier, and Thomas Yee for fruitful discussions, and Michael Coughlin for providing us with the recoloured LIGO data. We also thank the New Zealand eScience Infrastructure (NeSI) for their high performance computing facilities, and the Centre for eResearch at the University of Auckland for their technical support. NC’s work is supported by National Science Foundation grant PHY-1505373. All analysis was conducted in R, an open-source statistical software available on CRAN (cran.r-project.org). We acknowledge the following R packages: Rcpp, Rmpi, bsplinePsd, beyondWhittle, splines, signal, bspec, ggplot2, grid and gridExtra. This paper carries LIGO document number LIGO-P1600239.

References

  1. Advanced LIGO.
    J. Aasi et al. Classical and Quantum Gravity
  2. All-sky search for gravitational-wave bursts in the second joint LIGO-Virgo run.
    J. Abadie et al. Physical Review D
  3. Observation of gravitational waves from a binary black hole merger.
    B. P. Abbott et al. Physical Review Letters
  4. GW151226: Observation of gravitational waves from a 22-solar-mass binary black hole coalescence.
    B. P. Abbott et al. Physical Review Letters
  5. GW170104: Observation of a 50-solar-mass binary black hole coalescence at redshift 0.2.
    B. P. Abbott et al. Physical Review Letters
  6. Advanced Virgo: A second-generation interferometric gravitational wave detector.
    F. Acernese et al. Classical and Quantum Gravity
  7. Bayesian estimation of an autoregressive model using Markov chain Monte Carlo.
    G. Barnett, R. Kohn, and S. Sheather. Journal of Econometrics
  8. Periodogram analysis and continuous spectra.
    M. S. Bartlett. Biometrika
  9. Time Series: Theory and Methods.
    P. J. Brockwell and R. A. Davis.
  10. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions.
    S. P. Brooks, P. Giudici, and G. O. Roberts. Journal of the Royal Statistical Society: Series B (Methodological)
  11. Bayesian semiparametric modeling of survival data based on mixtures of B-spline distributions.
    B. Cai and R. Meyer. Computational Statistics and Data Analysis
  12. Semiparametric Bayesian inference for time series with mixed spectra.
    C. K. Carter and R. Kohn. Journal of the Royal Society, Series B
  13. Computational aspects of Bayesian spectral density estimation.
    N. Chopin, J. Rousseau, and B. Liseo. Journal of Computational and Graphical Statistics
  14. Bayesian estimation of the spectral density of a time series.
    N. Choudhuri, S. Ghosal, and A. Roy. Journal of the American Statistical Association
  15. LIGO S6 detector characterization studies.
    N. Christensen. Classical and Quantum Gravity
  16. Periodic splines and spectral estimation.
    R. Cogburn and H. R. Davis. The Annals of Statistics
  17. Posterior simulation across nonparametric models for functional clustering.
    J. L. Crandell and D. B. Dunson. Sankhya B
  18. B(asic)-spline basics. In: L. Piegl (Ed.), Fundamental Developments of Computer-Aided Geometric Modeling.
    C. de Boor.
  19. Parallel tempering: Theory, applications, and new perspectives.
    D. J. Earl and M. W. Deem. Physical Chemistry Chemical Physics
  20. Bayesian parameter estimation of core collapse supernovae using gravitational wave signals.
    M. C. Edwards, R. Meyer, and N. Christensen. Inverse Problems
  21. Bayesian semiparametric power spectral density estimation with applications in gravitational wave data analysis.
    M. C. Edwards, R. Meyer, and N. Christensen. Physical Review D
  22. bsplinePsd: Bayesian power spectral density estimation using B-spline priors.
    M. C. Edwards, R. Meyer, and N. Christensen. R package
  23. Approximative integration of the field equations of gravitation.
    A. Einstein. Sitzungsberichte Preußischen Akademie der Wissenschaften
  24. Estimation of the spectral density of a stationary time series via an asymptotic representation of the periodogram.
    A. K. Gangopadhyay, B. K. Mallick, and D. G. T. Denison. Jounal of Statistical Planning and Inference
  25. Bayesian Data Analysis.
    A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin.
  26. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.
    S. Geman and D. Geman. IEEE Trans. Patt. Anal. Mach. Intell.
  27. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination.
    P. J. Green. Biometrika
  28. Monte Carlo sampling methods using Markov chains and their applications.
    W. K. Hastings. Biometrika
  29. DPpackage: Bayesian semi- and nonparametric modeling in R.
    A. Jara, T. E. Hanson, F. A. Quintana, P. Müller, and G. L. Rosner. Journal of Statistical Software
  30. Beyond Whittle: Nonparametric correction of a parametric likelihood with a focus on Bayesian time series analysis.
    C. Kirch, M. C. Edwards, A. Meier, and R. Meyer. Pre-print
  31. Rate of convergence for logspline spectral density estimation.
    C. Kooperberg, C. J. Stone, and Y. K. Truong. Journal of Time Series Analysis
  32. Bootstrap prediction and confidence bands: A superior statistical method for the analysis of gait data.
    M. W. Lenhoff, T. J. Santner, J. C. Otis, M. G. Peterson, B. J. Williams, and S. I. Backus. Gait and Posture
  33. Bayesian semiparametric inference on long-range dependence.
    B. Liseo, D. Marinucci, and L. Petrella. Biometrika
  34. Bayesian inference for spectral estimation of gravitational wave detector noise.
    T. B. Littenberg and N. J. Cornish. Physical Review D
  35. Fortifying the characterization of binary mergers in LIGO data.
    T. B. Littenberg, M. Coughlin, B. Farr, and W. M. Farr. Physical Review D
  36. Bayesian non-parametric signal extraction for Gaussian time series.
    C. Macaro. Journal of Econometrics
  37. Spectral decompositions of multiple time series: A Bayesian non-parametric approach.
    C. Macaro and R. Prado. Psychometrika
  38. beyondWhittle: Bayesian spectral inference for stationary time series.
    A. Meier, C. Kirch, M. C. Edwards, and R. Meyer. R package
  39. Equation of state calculations by fast computing machines.
    N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. J. Chem. Phys.
  40. Regression-type inference in nonparametric regression.
    M. H. Neumann and J-P Kreiss. The Annals of Statistics
  41. Simultaneous bootstrap confidence bands in nonparametric regression.
    M. H. Neumann and J. Polzehl. Journal of Nonparametric Statistics
  42. Bayesian nonparametric modeling using mixtures of triangular distributions.
    F. Perron and K. Mengersen. Biometrics
  43. Random Bernstein polynomials.
    S. Petrone. Scandinavian Journal of Statistics
  44. Bayesian density estimation using Bernstein polynomials.
    S. Petrone. Canadian Journal of Statistics
  45. Approximation Theory and Methods.
    M. J. D. Powell.
  46. AdaptSpec: Adaptive spectral density estimation for nonstationary time series.
    O. Rosen, S. Wood, and A. Roy. Journal of the American Statistical Association
  47. Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian time series.
    J. Rousseau, N. Chopin, and B. Liseo. The Annals of Statistics
  48. Student-t based filter for robust signal detection.
    C. Röver. Physical Review D
  49. Modelling coloured residual noise in gravitational-wave signal processing.
    C. Röver, R. Meyer, and N. Christensen. Classical and Quantum Gravity
  50. A constructive definition of Dirichlet priors.
    J. Sethuraman. Statistica Sinica
  51. Confidence bands for linear regression and smoothing.
    J. Sun and C. R. Loader. The Annals of Statistics
  52. Replica Monte Carlo simulation of spin glasses.
    R. H. Swendsen and J. S. Wang. Physical Review Letters
  53. Spectrum estimation and harmonic analysis.
    D. J. Thomson. Proceedings of the IEEE
  54. Random field priors for spectral density functions.
    S. F. Tonellato. Journal of Statistical Planning and Inference
  55. Data series subtraction with unknown and unmodeled background noise.
    S. Vitale, G. Congedo, R. Dolesi, V. Ferroni, M. Hueller, D. Vetrugno, W. J. Weber, H. Audley, K. Danzmann, I. Diepholz, M. Hewitson, N. Korsakova, L. Ferraioli, F. Gibert, N. Karnesis, M. Nofrarias, H. Inchauspe, E. Plagnol, O. Jennrich, P. W. McNamara, M. Armano, J. I. Thorpe, and P. Wass. Physical Review D
  56. Automatic smoothing of the log periodogram.
    G. Wahba. Journal of the American Statistical Association
  57. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms.
    P. D. Welch. IEEE Transactions on Audio and Electroacoustics
  58. Curve and periodogram smoothing.
    P. Whittle. Journal of the Royal Statistical Society: Series B (Methodological)
  59. Nonparametric Bayesian inference for the spectral density function of a random field.
    Y. Zheng, J. Zhu, and A. Roy. Biometrika