Bayesian nonparametric spectral density estimation using Bspline priors
Abstract
We present a new Bayesian nonparametric approach to estimating the spectral density of a stationary time series. A nonparametric prior based on a mixture of Bspline distributions is specified and can be regarded as a generalization of the Bernstein polynomial prior of Petrone (1999a, b) and Choudhuri et al. (2004). Whittle’s likelihood approximation is used to obtain the pseudoposterior distribution. This method allows for a datadriven choice of the smoothing parameter as well as the number and the location of the knots. Posterior samples are obtained using a parallel tempered MetropoliswithinGibbs Markov chain Monte Carlo algorithm. We conduct a simulation study to demonstrate that for complicated spectral densities, the Bspline prior provides more accurate Monte Carlo estimates in terms of error and uniform coverage probabilities than the Bernstein polynomial prior. Finally, we demonstrate the algorithm’s ability to estimate a spectral density with sharp features, using real gravitational wave detector data from LIGO’s sixth science run.
Keywords:
Bspline prior Bernstein polynomial prior Whittle likelihood Spectral density estimation LIGO Gravitational waves1 Introduction
Useful information about a stationary time series is encoded in its spectral density, sometimes called the power spectral density (PSD). This quantity describes the variance (or power) each individual frequency component contributes to the overall variance of the time series, and forms a Fourier transform pair with the autocovariance function. More formally, assuming an absolutely summable autocovariance function (), the spectral density function of a zeromean weakly stationary time series exists, is continuous and bounded, and is defined as:
(1) 
where is angular frequency.
Spectral density estimation methods can be broadly classified into two groups: parametric and nonparametric. Parametric approaches to spectral density estimation are primarily based on autoregressive moving average (ARMA) models (Brockwell and Davis (1991); Barnett et al. (1996)), but they tend to give misleading inferences when the parametric model is poorly specified.
A large number of nonparametric estimation techniques are based on smoothing the periodogram, a process that randomly fluctuates around the true PSD. The periodogram, , is easily and efficiently computed as the (normalized) squared modulus of Fourier coefficients using the fast Fourier transform. That is,
(2) 
where is angular frequency, and is a stationary time series with discrete time points, .
Though the periodogram is an asymptotically unbiased estimator of the spectral density, it is not a consistent estimator (Brockwell and Davis (1991)). Smoothing techniques such as Bartlett’s method (Bartlett (1950)), Welch’s method (Welch (1967)), and the multitaper method (Thomson (1982)) aim to reduce the variance of the periodogram by dividing a time series into (potentially overlapping) segments, calculating the periodogram for each segment, and averaging over all of these. Unfortunately, these techniques are sensitive to the choice of smoothing parameter (i.e., the number of segments), resulting in a variance/bias tradeoff. Reducing the length of each segment also leads to lower frequency resolution.
Another common nonparametric approach to spectral estimation involves the use of splines. Smoothing spline techniques are not new to spectral estimation (see e.g., Cogburn and Davis (1974) for an early reference). Wahba (1980) used splines to smooth the logperiodogram, with an automatic datadriven smoothing parameter, avoiding the difficult problem of having to choose this quantity. Kooperberg et al. (1995) used maximum likelihood and polynomial splines to approximate the logspectral density function.
Bayesian nonparametric approaches to spectrum estimation have gained momentum in recent times. In the context of splines, Gangopadhyay et al. (1999) used a fixed loworder piecewise polynomial to estimate the logspectral density of a stationary time series. They implemented a reversible jump Markov chain Monte Carlo (RJMCMC) algorithm of Green (1995), placing priors on the number of knots and their locations, with the goal of estimating spectral densities with sharp features. Choudhuri et al. (2004) placed a Bernstein polynomial prior (Petrone (1999a, b)) on the spectral density. The Bernstein polynomial prior is essentially a finite mixture of beta densities with weights induced by a Dirichlet process. The number of mixtures is a smoothing parameter, chosen to have a discrete prior. Zheng et al. (2010) generalized this and constructed a multidimensional Bernstein polynomial prior to estimate the spectral density function of a random field. Also extending the work of Choudhuri et al. (2004), Macaro (2010) used informative priors to extract unobserved spectral components in a time series, and Macaro and Prado (2014) generalized this to multiple time series.
Other interesting Bayesian nonparametric approaches include Carter and Kohn (1997) inducing a prior on the logspectral density using an integrated Wiener process, and Tonellato (2007) placing a Gaussian random field prior on the logspectral density. Liseo et al. (2001), Rousseau et al. (2012), and Chopin et al. (2013) used Bayesian nonparametric methods to estimate spectral densities from long memory time series, and Rosen et al. (2012) focused on timevarying spectra in nonstationary time series.
The majority of the Bayesian nonparametric methods (for short memory time series) mentioned here make use of Whittle’s approximation to the Gaussian likelihood, often called the Whittle likelihood (Whittle (1957)). The Whittle likelihood, , for a meancentered weakly stationary time series of length and spectral density has the following formulation:
(3) 
where are the positive Fourier frequencies, is the greatest integer value less than or equal to , and is the periodogram defined in Equation (2).
The motivation for the work presented in this paper is to apply it in signal searches for gravitational waves (GWs) using data from Advanced LIGO (Aasi et al. (2015)) and Advanced Virgo (Acernese et al. (2015)). These interferometric GW detectors have timevarying spectra, and it will be important in future signal searches to be able to estimate the parameters describing the noise simultaneously with the parameters of a detected gravitational wave signal. In a previous study (Edwards et al. (2015)), we utilized the methodology of Choudhuri et al. (2004) to estimate the spectral density of simulated Advanced LIGO (Aasi et al. (2015)) noise, while simultaneously estimating the parameters of a rotating stellar core collapse GW signal. This method, based on the Bernstein polynomial prior, worked extremely well on simulated data, but we found that it was not wellequipped to detect the sharp and abrupt changes in an otherwise smooth spectral density present in real LIGO noise (Christensen (2010); Littenberg and Cornish (2015)). Under default noninformative priors, the method tended to oversmooth the spectral density. As detailed in Section 2.2, this unsatisfactory performance is only partly due to the wellknown slow convergence of order , where is the degree of the Bernstein polynomials (Perron and Mengersen (2001)), but mainly due to a lack of coverage of the space of spectral distributions by Bernstein polynomials. This can be overcome, though, by using Bsplines with variable knots instead of Bernstein polynomials, yielding a much improved approximation of order of in the number of knots and adequate coverage of the space of spectral distributions.
The focus of this paper is to describe a new Bayesian nonparametric approach to modelling the spectral density of a stationary time series. Similar to Gangopadhyay et al. (1999), our goal is to estimate spectral density functions with sharp peaks. Here we present an alternative nonparametric prior using a mixture of Bspline densities, which we will call the Bspline prior.
Following Choudhuri et al. (2004), we induce the weights for each of the Bspline mixture densities using a Dirichlet process prior. Furthermore, in order to allow flexible, datadriven knot placements, a second (independent) Dirichlet process prior is put on the knot differences which, in turn, determines the shape and location of the Bspline densities and hence the structure of the spectral density. Crandell and Dunson (2011) applied a similar approach in the context of functional data analysis.
A noninformative prior on the number of knots allows for a datadriven choice of the smoothing parameter. The Bspline prior could naturally be interpreted as a generalization of the Bernstein polynomial prior, as Bernstein polynomials are indeed a special case of Bsplines where there are no internal knots.
Bsplines have the useful property of local support, where they are only nonzero between their end knots. We will demonstrate that if knots are close enough together, then the property of local support will generally allow us to model sharp and abrupt changes to a spectral density.
Samples from the pseudoposterior distribution are obtained by updating the Bspline prior with the Whittle likelihood (Whittle (1957)). This is implemented as a MetropoliswithinGibbs Markov chain Monte Carlo (MCMC) sampler (Metropolis et al. (1953); Hastings (1970); Geman and Geman (1984); Gelman et al. (2013)). To improve mixing and convergence, we use a parallel tempering scheme (Swendsen and Wang (1986); Earl and Deem (2005)).
We will demonstrate that the Bspline prior is more flexible than the Bernstein polynomial prior and can better approximate sharp peaks in the spectral density. We will show that for complicated PSDs under noninformative priors, the Bspline prior gives sensible Monte Carlo estimates and outperforms the Bernstein polynomial prior in terms of integrated absolute error (IAE) and frequentist uniform coverage probabilities. Furthermore, the placement of these knots is based on the nonparametric Dirichlet process prior, meaning transdimensional methods such as RJMCMC (Green (1995)) can be avoided. This is useful as RJMCMC is often fraught with implementation difficulties, such as finding an efficient jump proposal when there are indeed no natural choices for transdimensional jumps (Brooks et al. (2003)).
The paper is organized as follows. Section 2 sets the notation and defines Bsplines and Bspline densities. After briefly reviewing the Bernstein polynomial prior, we explain the rationale for the Bspline prior before extending it to a prior for the spectral density of a stationary time series. We discuss the MCMC implementation in Section 3. Section 4 details the results of the simulation study, and in Section 5, we apply the method to real gravitational wave detector data. Concluding remarks are given in Section 6.
2 The Bspline prior
In this section, the Bspline prior for spectral densities of stationary time series will be defined. To this end, we first set the notation and define Bsplines and Bspline densities. We review the Bernstein polynomial prior and extend this approach to the Bspline prior with variable knots.
2.1 Bsplines and Bspline densities
A spline function of order is a piecewise polynomial of degree with socalled knots where the piecewise polynomials connect. A spline is continuous at the knots (or continuously differentiable to a certain order depending on the multiplicity of the knots). The number of internal knots must be . Any spline function of order defined on a certain partition can be uniquely represented as a linear combination of basis splines, Bsplines, of the same order over the same partition (Powell (1981); Cai and Meyer (2011)). Bsplines can be parametrized either recursively (de Boor (1993)), or by using divided differences and truncated power functions (Powell (1981); Cai and Meyer (2011)). We will adopt the former convention. Without loss of generality, assume the global domain of interest is the unit interval .
For a set of Bsplines of degree for some integer , define a nondecreasing knot sequence
of knots, comprised of internal knots and external knots. The external knots outside or on the boundary of (i.e., and ) are required for Bsplines to constitute a basis of spline functions on . Here we assume that the external knots are all exactly on the boundary. The knot sequence yields a partition of the interval into subsets.
For , each individual Bspline of degree , , depends on only through the consecutive knots . The number of internal knots is equal to the degree of the Bspline if there are no knot multiplicities. There can be a maximum of coincident knots for (right) continuity. These knots determine the shape and location of each Bspline.
A Bspline with degree 0 is the following indicator function
(4) 
Note that if , then .
Higher degree Bsplines can then be defined recursively using
(5) 
where is the degree and
(6) 
Bspline densities are the usual Bspline basis functions, normalized so they each integrate to 1 (Cai and Meyer (2011)). The recursive Bspline parametrization used in this paper allows us to easily analytically integrate each Bspline, which we then use as normalization constant for the Bspline density defined as
(7) 
2.2 Bernstein polynomial prior and Bspline prior
The Bernstein polynomial prior of Petrone (1999a, b) and Choudhuri et al. (2004) is based on the Weierstrass approximation theorem that states that any continuous function on can be uniformly approximated to any desired accuracy by Bernstein polynomials. Let denote a cumulative distribution function (cdf) with continuous density on , then the following mixture
converges uniformly to , where and and denote the cdf and density of the beta distribution with parameters and , respectively.
Define and . As shown by Perron and Mengersen (2001), the loss associated with the approximation of by the dimensional space with respect to loss function defined by
where , can not be made arbitrarily small. Thus the mixture of beta cdfs does not provide an adequate coverage of the space of cdfs on . However, Perron and Mengersen (2001) showed that if one replaces the beta distributions by Bspline distributions of fixed order (shown for order 2, i.e., triangular distributions) but with variable knots, the loss can be made arbitrarily small by increasing the number of knots. This is the rationale for using a mixture of Bspline distributions with variable knots in the following specification of a sieve prior.
The Bspline prior has the following representation as a mixture of Bspline densities:
(8) 
where is the number of Bsplines of fixed degree in the mixture, the weight vector, and the knot sequence. Rather than putting a prior on the ’s whose dimension changes with , we follow the approach of Choudhuri et al. (2004) and assume that the weights are induced by a cdf on . Similarly, we assume that the internal knot differences for are induced by a cdf on . Or equivalently, for , yielding the Bspline prior parametrized in terms of and :
(9) 
Independent Dirichlet process priors are then put on and and a discrete prior on the number of mixture components .
The Bspline prior is similar in nature to the Bernstein polynomial prior introduced by Petrone (1999a, b) and applied to spectral density estimation by Choudhuri et al. (2004). The primary difference is that the Bspline prior is a mixture of Bspline densities with local support rather than beta densities with full support on the unit interval as shown in Figure 1. Moreover, this local support of the Bsplines is determined by the data.
When there are no internal knots, the Bspline basis becomes a Bernstein polynomial basis. Bernstein polynomials are thus a special case of Bsplines, and the Bspline prior could be regarded as a generalization of the Bernstein polynomial prior.
Figure 2 demonstrates that it is possible to construct curves (Bspline mixtures) with sharp peaks if knots are close enough together. The top panel shows a set of Bspline density functions and the bottom panel shows a mixture of these with random weights. The local support property of Bsplines is the reason the Bspline prior will be instrumental in estimating a spectral density with sharp peaks.
2.3 Prior for the spectral density
To place a prior on the spectral density of a stationary time series defined on the interval , first transform to the interval :
(10) 
where is the normalization constant, and is the Bspline prior defined in Equation (9). The prior for therefore has the following hierarchical structure:

determines the weights (i.e., scale) for each of the Bspline densities. Let , where is the precision parameter and is the base probability distribution function with density .

determines the location of knots and hence the shape and location of the Bspline densities. Let , where is the precision parameter and is the base probability distribution function with density .

is the number of Bsplines in the mixture (i.e., smoothness) and has discrete probability mass function for . Here is the largest possible value we allow to take. We limit the maximum value of for computational reasons and do pilot runs to ensure a larger is not required. A smaller implies smoother spectral densities.

is the normalizing constant needed to reparameterize the spectral density from to a density on the unit interval. Let .
Assume all of these parameters are a priori independent.
3 Implementation using Markov chain Monte Carlo
As Dirichlet process priors have been placed on and , we require an algorithm to sample from these distributions. To sample from a Dirichlet process, we use Sethuraman’s stickbreaking construction (Sethuraman (1994)), an infinitedimensional mixture model. For computational purposes, the number of mixture distributions for Dirichlet process representations of and is truncated to large but finite positive integers ( and respectively). Let for simplicity. A larger choice of will yield more accurate approximations, but at the expense of increasing the computation time.
To set up the stickbreaking process, reparameterize to such that
(11)  
(12)  
(13)  
(14)  
(15)  
(16) 
and to such that
(17)  
(18)  
(19)  
(20)  
(21)  
(22) 
where is a probability density, degenerate at , i.e., at and 0 otherwise.
Conditional on , the above hierarchical structure provides a finite mixture prior for the spectral density of a stationary time series
(23) 
with weights
(24) 
and knot differences
(25)  
(26) 
for and . The denominator in the latter comes from assuming the exterior knots are the same as the boundary knots. Note also that we assume the lower internal boundary knot , meaning the first knot difference is . The subsequent knot placements are determined by taking the cumulative sum of the knot differences.
Abbreviating the vector of parameters as , the joint prior is
To produce the unnormalized joint posterior, this joint prior is updated using the Whittle likelihood defined in Equation (3).
We implement a MetropoliswithinGibbs algorithm to sample points from the pseudoposterior, using the same modular symmetric proposal distributions for Bspline weight parameters and as described by Choudhuri et al. (2004). That is, say for , propose a candidate from a uniform distribution with , modulo the circular unit interval. If the candidate is greater than 1, take the decimal part only, and if the candidate is less than 0, add 1 to put it back into [0, 1]. This is done for all of the and parameters. Choudhuri et al. (2004) found that worked well for most cases, and we also adopt this. The same approach is used analogously for the Bspline knot location parameters and . Parameter has a conjugate inversegamma prior and may be sampled directly. Smoothing parameter could be sampled directly from its discrete full conditional (as done by Choudhuri et al. (2004)), though this can be computationally expensive for large , so we use a Metropolis proposal centered on the previous value of , such that there is a 75% chance of jumping according to a discrete uniform on , and a 25% chance of boldly jumping according to a discretized Cauchy random variable.
There is a common tendency towards multimodal posteriors in finite/infinite mixture models. If there are many isolated modes separated by low posterior density, it is important to use a sampling technique that mixes Markov chains efficiently, rather than relying on the random walk behaviour of the Metropolis sampler. In order to mitigate poor mixing and convergence of Markov chains, we use parallel tempering, or replica exchange (Swendsen and Wang (1986); Earl and Deem (2005)) based on an idea borrowed from physical chemistry.
Parallel tempering involves introducing an auxiliary variable called inversetemperature, denoted for chains . This variable becomes an exponent in the target distribution for each parallel chain, . That is, , where are the model parameters, and is the time series data vector. If , this is the posterior distribution of interest. All other inversetemperature values produce tempered target distributions. Each chain moves on its own in parallel and occasionally swaps states between chains according to a Metropolis acceptreject rule.
We use cubic Bsplines () for all of the examples in the following sections. The serial version of the (cubic) Bspline prior algorithm is available as a function called gibbs_bspline in the R package bsplinePsd (Edwards et al., 2017). This is available on CRAN. The parallel tempered version is implemented in R using the Rmpi library but is not publicly available. Please contact the first author for access to this code.
4 Simulation study
In this section, we run a simulation study on two autoregressive (AR) time series of different order: AR(1) and AR(4). For the first scenario, an AR(1) with first order autocorrelation (a relatively simple spectral density) is generated. In the second scenario, an AR(4) with parameters , and is generated, such that the spectral density has two large peaks. Let each time series have lengths with unit variance Gaussian innovations.
We simulate 1,000 different realisations of AR(1) and AR(4) data and model the spectral densities by running the Bernstein polynomial prior algorithm of Choudhuri et al. (2004) and our Bspline prior algorithm on each of these. The MCMC algorithms (without parallel tempering as mixing is satisfactory for these toy examples) run for 400,000 iterations, with a burnin period of 200,000 and thinning factor of 10.
For both spectral density estimation methods, we choose default noninformative priors. That is, for the Bspline prior, let . For comparability, we let the Bernstein polynomial prior have exactly the same prior setup as the Bspline prior, but obviously without knot location parameter and distribution .
We set for both algorithms. This may seem unnecessarily large for the Bspline prior algorithm as these simple cases converge to a low . However, it is large enough to ensure the Bernstein polynomial algorithm converges to an appropriate , without being truncated at .
Based on the suggestions by Choudhuri et al. (2004), we set the stickbreaking truncation parameter to . This provides a reasonable balance between accuracy and computational efficiency.
The (cubic) Bspline prior algorithm is run using the gibbs_bspline function in the R package bsplinePsd (Edwards et al., 2017). The Bernstein polynomial prior algorithm is run using the gibbs_NP function in the R package beyondWhittle (Kirch et al., 2017; Meier et al., 2017). Both packages are available on CRAN.
An AR() model has theoretical spectral density,
(27) 
where is the variance of the white noise innovations and are the model parameters. We can compare estimates to this true spectral density to measure relative performance of the nonparametric priors. One measure of closeness and accuracy is the integrated absolute error (IAE), also known as the error. This is defined as:
(28) 
where is the Monte Carlo estimate (posterior median) of the spectral density . We calculate the IAE for each replication and then compare the average IAE over all 1,000 replications. The results are presented in Table 1.
AR(1)  

Bspline  0.901  0.756  0.592 
Bernstein  0.830  0.706  0.518 
AR(4)  
Bspline  3.242  2.371  1.886 
Bernstein  3.427  2.920  2.656 
Table 1 compares the median IAE of the estimated spectral densities under the two different nonparametric priors. For the AR(1) cases, the median IAE is only marginally higher for the Bspline prior than the Bernstein polynomial prior. As the AR(1) has a simple spectral structure, this is a case where the global support of the Bernstein polynomials makes sense. However, when estimating the more complicated AR(4) spectral density, we see that the Bspline prior yields more accurate estimates than the Bernstein polynomial prior in terms of IAE. We also see that for both priors, as increases, median IAE decreases.
For each simulation, we calculate two different credible regions: the usual equaltailed pointwise credible region, and the uniform (or simultaneous) credible band (Neumann and Polzehl (1998); Neumann and Kreiss (1998); Lenhoff et al. (1999); Sun and Loader (1994)). Uniform credible bands are very useful as they allow the calculation of coverage levels for entire curves (spectral densities in this case) rather than pointwise intervals. To compute a % uniform credible band, we use the following form:
(29) 
where is the pointwise posterior median spectral density, is the median absolute deviation of the posterior samples kept after burnin and thinning (which are used as the estimate of dispersion of the sampling distribution of ), and we choose the such that
(30) 
Based on these uniform credible bands, uniform coverage probabilities over all 1,000 replications of the simulation can be computed. That is, calculate the proportion of times that the true spectral density is entirely encapsulated within the uniform credible band. Computed coverage probabilities are shown in Table 2.
AR(1)  

Bspline  1.000  1.000  0.998 
Bernstein  1.000  0.987  0.499 
AR(4)  
Bspline  0.936  0.979  0.907 
Bernstein  0.000  0.000  0.000 
It can be seen in Table 2 that the Bspline prior has higher coverage than the Bernstein polynomial prior in all examples (apart from the AR(1) with , where it is the same). The Bspline prior produces excellent coverage probabilities for the AR(1) cases. The Bernstein polynomial prior also performs well in this regard, apart from the case, where half are not fully covered. An example from one of the 1,000 replications of the AR(1) with is given in Figure 3. Here, the uniform credible band fully contains the true PSD for the Bspline prior but not for the Bernstein polynomial prior (the true PSD falls outside the uniform credible band at the highest frequencies). The pointwise credible region and posterior median logPSD for both priors are also very accurate. This is not surprising as the AR(1) has a relatively simple spectral structure.
Coverage of the AR(4) spectral density under the Bspline prior is above 90% for each sample size. However, the Bernstein polynomial prior has extremely poor coverage in the AR(4) case, where none of the 1,000 replications are fully covered by the uniform credible band for each sample size. An example of this performance (for ) can be seen in Figure 4. The Bernstein polynomial prior (under the noninformative prior setup) tends to oversmooth the second large peak of the PSD, and introduces additional incorrect peaks and troughs throughout the rest of estimate. These false peaks and troughs are present due to the Bernstein polynomial prior algorithm converging to large in an attempt to approximate the two large peaks of the AR(4) PSD. The Bspline prior gives a much more accurate Monte Carlo estimate. The posterior median logPSD is close to the true AR(4) PSD, the 90% pointwise credible region mostly contains the true PSD, and the 90% uniform credible band fully contains it.
Of course, the Bernstein polynomial prior could perform better on spectral densities with sharp features if significant prior knowledge was known in advance. This can however be a formidable task, and is not very generalizable to other time series. A benefit of using the Bspline prior is its ability to estimate a variety of different spectral densities using the default noninformative priors used in this paper. We will see another example of this in the next section.
One slight drawback of the Bspline prior algorithm is its computational complexity relative to the Bernstein polynomial prior. Bspline densities must be evaluated many times per MCMC iteration (when sampling , and ) due to the variable knot placements, whereas beta densities can be precomputed and stored in memory, saving much computation time.
Table 3 shows the median runtime (over each 1,000 replication) for each of the six AR simulations.
AR(1)  

Bspline  2.967  3.186  3.659 
Bernstein  1.423  1.572  1.844 
Bspline/Bernstein  2.086  2.026  1.985 
AR(4)  
Bspline  4.044  4.422  5.174 
Bernstein  1.443  1.694  2.281 
Bspline/Bernstein  2.802  2.610  2.268 
It can be seen in Table 3 that the Bspline prior algorithm is approximately 2–3 times slower than the Bernstein polynomial prior algorithm for these examples. Due to the noted advantages (accuracy and coverage) that the Bspline prior has over the Bernstein polynomial prior, particularly for PSDs with complicated structures, we believe the increased computation time is an acceptable tradeoff, though for simple spectral densities, the Bernstein polynomial prior should suffice.
5 Application to recoloured LIGO S6 data
Gravitational waves (GWs) are ripples in the fabric of spacetime, caused by the most exotic and cataclysmic astrophysical events in the cosmos, such as binary black hole or neutron star mergers, core collapse supernovae, pulsars, and even the Big Bang. They are a consequence of Einstein’s general theory of relativity (Einstein (1916)).
On September 14, 2015, the breakthrough first direct detection of GWs was made using the Advanced LIGO detectors (Abbott et al. (2016a)). The signal, GW150914, came from a pair of stellar mass black holes that coalesced approximately 1.3 billion lightyears away. This was also the first direct observation of black hole mergers. Two subsequent detections of pairs of stellar mass black holes, GW151226 and GW170104, were respectively made by the Advanced LIGO detectors on 26 December, 2015 (Abbott et al. (2016b)), and 4 January, 2017 (Abbott et al. (2017)), signalling a new era of astronomy is now upon us.
Advanced LIGO is a set of two GW interferometers in the United States (one in Hanford, Washington, and the other in Livingston, Louisiana) (Aasi et al. (2015)). Data collected by these observatories are dominated by instrument and background noise — primarily seismic, thermal, and photon shot noise. There are also high power, narrow band, spectral noise lines caused by the AC electrical supplies and mirror suspensions, among other phenomena. Though GW150914 had a large signaltonoise ratio, signals detected by these observatories will generally be relatively weak. Improving the characterization of detector/background noise could therefore positively impact signal characterization and detection confidence.
The default noise model assumes instrument noise is Gaussian, stationary, and has a known PSD. Real data often depart from these assumptions, motivating the development of alternative statistical models for detector noise. In the literature, this includes Studentt likelihood generalizations by Röver et al. (2011) and Röver (2011), introducing additional scale parameters and marginalization by Littenberg et al. (2013) and Vitale et al. (2014), modelling the broadband PSD with a cubic spline and spectral lines with Cauchy random variables by Littenberg and Cornish (2015), and the use of a Bernstein polynomial prior by Edwards et al. (2015).
We found that due to the undesirable properties of the Bernstein polynomial prior, it was not flexible enough to estimate sharp peaks in the spectral density of real LIGO noise. This, coupled with the fact that Bsplines have local support, provided the rationale for implementing the Bspline prior instead.
In the following example, using the parallel tempered Bspline prior algorithm, we estimate the PSD of a 1 s stretch of real LIGO data collected during the sixth science run (S6), recoloured to match the target noise sensitivity of Advanced LIGO (Christensen (2010)). LIGO has a sampling rate of 16384 Hz. To reduce the volume of data processed, a lowpass Butterworth filter (of order 10 and attenuation 0.25) is applied, then the data are downsampled to 4096 Hz. Prior to downsampling, the data are differenced once to become stationary, meancentred, and then Hann windowed to mitigate spectral leakage. Though a 1 s stretch may seem small in the context of GW data analysis, this time scale is important for onsource characterization of noise during shortduration transient events, called bursts (Abadie et al. (2012)). This is particularly true since LIGO noise has a timevarying spectrum, and systematic biases could occur if offsource noise was used to estimate the power spectrum of onsource noise.
We run 16 parallel chains (each at different temperatures) of the MCMC algorithm for iterations, with a burnin of and thinning factor of 5. We propose swaps (of all parameters blocked together) between adjacent chains on every tenth iteration. For each chain , we found the following inversetemperature scheme gave reasonable results:
(31) 
where is the minimum inversetemperature allowed, , and is the number of chains. The stickbreaking truncation parameter is set to and all of the other prior specifications are exactly the same as used in the AR simulation study of the previous section.
As demonstrated in the previous section (e.g., Figure 4), the Bernstein polynomial approach would have struggled to estimate the abrupt changes of power present in real detector data. It can be seen in Figure 5 though, that the Bspline mixture approach estimates the logspectral density very well. The estimated logPSD follows the general broadband shape of the logperiodogram well, and the primary sharp changes in power are also accurately estimated. The method, however, seems to be less sensitive to some of the smaller spikes.
6 Conclusions and Outlook
In this paper, we have presented a novel approach to spectral density estimation, using a nonparametric Bspline prior with variable number and location of knots. We have shown that for complicated PSDs, this method outperforms the Bernstein polynomial prior in terms of IAE and uniform coverage probabilities, and provides superior Monte Carlo estimates, particularly for spectral densities with sharp and abrupt changes in power. This is not surprising as Bsplines have local support and better approximation properties than Bernstein polynomials. However, the favourable estimation qualities of the Bspline prior come at the expense of increased computation time.
The posterior distribution of the Bspline mixture parameters with variable number and location of knots could be sampled using the RJMCMC algorithm of Green (1995), however RJMCMC methods are often fraught with implementation difficulties, such as finding efficient jump proposals when there are no natural choices for transdimensional jumps (Brooks et al. (2003)). We avoid this altogether by allowing for a datadriven choice of the smoothing parameter and knot locations using the nonparametric Dirichlet process prior. This yields a much more straightforward sampling method.
We have demonstrated that the Bspline prior provides a reasonable estimate of the spectral density of real instrument noise from the LIGO gravitational wave detectors. In a future paper, we will focus on characterizing this noise while simultaneously extracting a GW signal, similar to Edwards et al. (2015). As the algorithm is computationally expensive, it will be wellsuited towards the shorter bursttype signals (of order 1 s or less) like rotating core collapse supernovae. Using a large enough catalogue of waveforms, estimation of astrophysically meaningful parameters could be done by sampling from the posterior predictive distribution, similar to Edwards et al. (2014).
Though we have only presented the Bspline prior in terms of spectral density estimation, it could be used in a much broader context, such as in density estimation. A paper using this approach for density estimation is in preparation and could yield a more flexible, alternative approach to the Triangular Dirichlet prior function TDPdensity in the R package DPpackage (Jara et al. (2011)).
Acknowledgements
We thank Claudia Kirch, Alexander Meier, and Thomas Yee for fruitful discussions, and Michael Coughlin for providing us with the recoloured LIGO data. We also thank the New Zealand eScience Infrastructure (NeSI) for their high performance computing facilities, and the Centre for eResearch at the University of Auckland for their technical support. NC’s work is supported by National Science Foundation grant PHY1505373. All analysis was conducted in R, an opensource statistical software available on CRAN (cran.rproject.org). We acknowledge the following R packages: Rcpp, Rmpi, bsplinePsd, beyondWhittle, splines, signal, bspec, ggplot2, grid and gridExtra. This paper carries LIGO document number LIGOP1600239.
References
 Aasi et al. [2015] J. Aasi et al. Advanced LIGO. Classical and Quantum Gravity, 32:074001, 2015.
 Abadie et al. [2012] J. Abadie et al. Allsky search for gravitationalwave bursts in the second joint LIGOVirgo run. Physical Review D, 85:122007, 2012.
 Abbott et al. [2016a] B. P. Abbott et al. Observation of gravitational waves from a binary black hole merger. Physical Review Letters, 116:061102, 2016a.
 Abbott et al. [2016b] B. P. Abbott et al. GW151226: Observation of gravitational waves from a 22solarmass binary black hole coalescence. Physical Review Letters, 116:241103, 2016b.
 Abbott et al. [2017] B. P. Abbott et al. Gw170104: Observation of a 50solarmass binary black hole coalescence at redshift 0.2. Physical Review Letters, 118:221101, 2017.
 Acernese et al. [2015] F. Acernese et al. Advanced Virgo: A secondgeneration interferometric gravitational wave detector. Classical and Quantum Gravity, 32(2):024001, 2015.
 Barnett et al. [1996] G. Barnett, R. Kohn, and S. Sheather. Bayesian estimation of an autoregressive model using Markov chain Monte Carlo. Journal of Econometrics, 74:237–254, 1996.
 Bartlett [1950] M. S. Bartlett. Periodogram analysis and continuous spectra. Biometrika, 37:1–16, 1950.
 Brockwell and Davis [1991] P. J. Brockwell and R. A Davis. Time Series: Theory and Methods. Springer, New York, 2nd edition, 1991.
 Brooks et al. [2003] S. P. Brooks, P. Giudici, and G. O. Roberts. Efficient construction of reversible jump Markov chain Monte Carlo proposal distributions. Journal of the Royal Statistical Society: Series B (Methodological), 65:3–55, 2003.
 Cai and Meyer [2011] B. Cai and R. Meyer. Bayesian semiparametric modeling of survival data based on mixtures of Bspline distributions. Computational Statistics and Data Analysis, 55:1260–1272, 2011.
 Carter and Kohn [1997] C. K. Carter and R. Kohn. Semiparametric Bayesian inference for time series with mixed spectra. Journal of the Royal Society, Series B, 59:255–268, 1997.
 Chopin et al. [2013] N. Chopin, J. Rousseau, and B. Liseo. Computational aspects of Bayesian spectral density estimation. Journal of Computational and Graphical Statistics, 22:533–557, 2013.
 Choudhuri et al. [2004] N. Choudhuri, S. Ghosal, and A. Roy. Bayesian estimation of the spectral density of a time series. Journal of the American Statistical Association, 99(468):1050–1059, 2004.
 Christensen [2010] N. Christensen. LIGO S6 detector characterization studies. Classical and Quantum Gravity, 27:194010, 2010.
 Cogburn and Davis [1974] R. Cogburn and H. R. Davis. Periodic splines and spectral estimation. The Annals of Statistics, 2:1108–1126, 1974.
 Crandell and Dunson [2011] J. L. Crandell and D. B. Dunson. Posterior simulation across nonparametric models for functional clustering. Sankhya B, 73:42–61, 2011.
 de Boor [1993] C. de Boor. B(asic)spline basics, in: L. Piegl (Ed.), Fundamental developments of computeraided geometric modeling. Academic Press, Washington, DC, 1993.
 Earl and Deem [2005] D. J. Earl and M. W. Deem. Parallel tempering: Theory, applications, and new perspectives. Physical Chemistry Chemical Physics, 7:3910–3916, 2005.
 Edwards et al. [2014] M. C. Edwards, R. Meyer, and N. Christensen. Bayesian parameter estimation of core collapse supernovae using gravitational wave signals. Inverse Problems, 30:114008, 2014.
 Edwards et al. [2015] M. C. Edwards, R. Meyer, and N. Christensen. Bayesian semiparametric power spectral density estimation with applications in gravitational wave data analysis. Physical Review D, 92:064011, 2015.
 Edwards et al. [2017] M. C. Edwards, R. Meyer, and N. Christensen. bsplinePsd: Bayesian power spectral density estimation using Bspline priors. R package, 2017.
 Einstein [1916] A. Einstein. Approximative integration of the field equations of gravitation. Sitzungsberichte Preußischen Akademie der Wissenschaften, 1916 (Part 1):688–696, 1916.
 Gangopadhyay et al. [1999] A. K. Gangopadhyay, B. K. Mallick, and D. G. T. Denison. Estimation of the spectral density of a stationary time series via an asymptotic representation of the periodogram. Jounal of Statistical Planning and Inference, 75:281–290, 1999.
 Gelman et al. [2013] A. Gelman, J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall / CRC, Boca Raton, FL, 3rd edition, 2013.
 Geman and Geman [1984] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Patt. Anal. Mach. Intell., 6:721–741, 1984.
 Green [1995] P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.
 Hastings [1970] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109, 1970.
 Jara et al. [2011] A. Jara, T. E. Hanson, F. A. Quintana, P. Müller, and G. L Rosner. DPpackage: Bayesian semi and nonparametric modeling in R. Journal of Statistical Software, 40:1–30, 2011.
 Kirch et al. [2017] C. Kirch, M. C. Edwards, A. Meier, and R. Meyer. Beyond Whittle: Nonparametric correction of a parametric likelihood with a focus on Bayesian time series analysis. Preprint, arXiv:1701.04846v1 [stat.ME], 2017.
 Kooperberg et al. [1995] C. Kooperberg, C. J. Stone, and Y. K. Truong. Rate of convergence for logspline spectral density estimation. Journal of Time Series Analysis, 16:389–401, 1995.
 Lenhoff et al. [1999] M. W. Lenhoff, T. J. Santner, J. C. Otis, M. G. Peterson, B. J. Williams, and S. I. Backus. Bootstrap prediction and confidence bands: A superior statistical method for the analysis of gait data. Gait and Posture, 9:10–17, 1999.
 Liseo et al. [2001] B. Liseo, D. Marinucci, and L. Petrella. Bayesian semiparametric inference on longrange dependence. Biometrika, 88:1089–1104, 2001.
 Littenberg and Cornish [2015] T. B. Littenberg and N. J. Cornish. Bayesian inference for spectral estimation of gravitational wave detector noise. Physical Review D, 91:084034, 2015.
 Littenberg et al. [2013] T. B. Littenberg, M. Coughlin, B. Farr, and Farr W. M. Fortifying the characterization of binary mergers in LIGO data. Physical Review D, 88:084044, 2013.
 Macaro [2010] C. Macaro. Bayesian nonparametric signal extraction for Gaussian time series. Journal of Econometrics, 157:381–395, 2010.
 Macaro and Prado [2014] C. Macaro and R. Prado. Spectral decompositions of multiple time series: A Bayesian nonparametric approach. Psychometrika, 79:105–129, 2014.
 Meier et al. [2017] A. Meier, C. Kirch, M. C. Edwards, and R. Meyer. beyondWhittle: Bayesian spectral inference for stationary time series. R package, 2017.
 Metropolis et al. [1953] N. Metropolis, A. W. Rosenbluth, MN Rosenbluth, A. H. Teller, and Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1092, 1953.
 Neumann and Kreiss [1998] M. H. Neumann and JP Kreiss. Regressiontype inference in nonparametric regression. The Annals of Statistics, 26:1570–1613, 1998.
 Neumann and Polzehl [1998] M. H. Neumann and J. Polzehl. Simultaneous bootstrap confidence bands in nonparametric regression. Journal of Nonparametric Statistics, 9:307–333, 1998.
 Perron and Mengersen [2001] F. Perron and K. Mengersen. Bayesian nonparametric modeling using mixtures of triangular distributions. Biometrics, 57:518–528, 2001.
 Petrone [1999a] S. Petrone. Random Bernstein polynomials. Scandanavian Journal of Statistics, 26:373–393, 1999a.
 Petrone [1999b] S. Petrone. Bayesian density estimation using Bernstein polynomials. Canadian Journal of Statistics, 27:105–126, 1999b.
 Powell [1981] M. J. D. Powell. Approximation theory and methods. Cambridge University Press, Cambridge, 1981.
 Rosen et al. [2012] O. Rosen, S. Wood, and A. Roy. AdaptSpec: Adaptive spectral density estimation for nonstationary time series. Journal of the American Statistical Association, 107:1575–1589, 2012.
 Rousseau et al. [2012] J. Rousseau, N. Chopin, and B. Liseo. Bayesian nonparametric estimation of the spectral density of a long or intermediate memory Gaussian time series. The Annals of Statistics, 40:964–995, 2012.
 Röver [2011] C. Röver. Studentt based filter for robust signal detection. Physical Review D, 84:122004, 2011.
 Röver et al. [2011] C. Röver, R. Meyer, and N. Christensen. Modelling coloured residual noise in gravitationalwave signal processing. Classical and Quantum Gravity, 28:015010, 2011.
 Sethuraman [1994] J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639–650, 1994.
 Sun and Loader [1994] J. Sun and C. R. Loader. Confidence bands for linear regression and smoothing. The Annals of Statistics, 22:1328–1345, 1994.
 Swendsen and Wang [1986] R. H. Swendsen and J. S. Wang. Replica Monte Carlo simulation of spin glasses. Physical Review Letters, 57:2607–2609, 1986.
 Thomson [1982] D. J. Thomson. Spectrum estimation and harmonic analysis. Proceedings of the IEEE, 70:1055–1096, 1982.
 Tonellato [2007] S. F. Tonellato. Random field priors for spectral density functions. Journal of Statistical Planning and Inference, 137:3164–3176, 2007.
 Vitale et al. [2014] S. Vitale, G. Congedo, R. Dolesi, V. Ferroni, M. Hueller, D. Vetrugno, W. J. Weber, H. Audley, K. Danzmann, I. Diepholz, M. Hewitson, N. Korsakova, L. Ferraioli, F. Gibert, N. Karnesis, M. Nofrarias, H. Inchauspe, E. Plagnol, O. Jennrich, P. W. McNamara, M. Armano, J. I. Thorpe, and P. Wass. Data series subtraction with unknown and unmodeled background noise. Physical Review D, 90:042003, 2014.
 Wahba [1980] G. Wahba. Automatic smoothing of the log periodogram. Journal of the American Statistical Association, 75:122–132, 1980.
 Welch [1967] P. D. Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15:70–73, 1967.
 Whittle [1957] P. Whittle. Curve and periodogram smoothing. Journal of the Royal Statistical Society: Series B (Methodological), 19:38–63, 1957.
 Zheng et al. [2010] Y. Zheng, J. Zhu, and A. Roy. Nonparametric Bayesian inference for the spectral density function of a random field. Biometrika, 97:238–245, 2010.