Fast and Accurate Sensitivity Estimation for Continuous-Gravitational-Wave Searches

Abstract

This paper presents an efficient numerical sensitivity-estimation method and implementation for continuous-gravitational-wave searches, extending and generalizing an earlier analytic approach by Wette (2012). This estimation framework applies to a broad class of $\mathcal{F}$-statistic-based search methods, namely (i) semi-coherent StackSlide $\mathcal{F}$-statistic (single-stage and hierarchical multi-stage), (ii) Hough number count on $\mathcal{F}$-statistics, as well as (iii) Bayesian upper limits on (coherent or semi-coherent) $\mathcal{F}$-statistic search results. We test this estimate against results from Monte-Carlo simulations assuming Gaussian noise. We find the agreement to be within a few percent at high (i.e. low false-alarm) detection thresholds, with increasing deviations at decreasing (i.e. higher false-alarm) detection thresholds, which can be understood in terms of the approximations used in the estimate. We also provide an extensive summary of sensitivity depths achieved in past continuous-gravitational-wave searches (derived from the published upper limits). For the $\mathcal{F}$-statistic-based searches where our sensitivity estimate is applicable, we find a small average relative deviation from the published upper limits, which in most cases lies within the systematic uncertainty of the noise-floor estimate used in the published upper limits.

I Introduction

The recent detections of gravitational waves from merging binary-black-hole and double neutron-star systems Abbott et al. (2016a, b, 2017a) have opened a whole new observational window for astronomy, allowing for new tests of general relativity Abbott et al. (2016c), new constraints on neutron star physics Abbott et al. (2018a) and new measurements of the Hubble constant Abbott et al. (2017b), to mention just a few highlights.

Continuous gravitational waves (CWs) from spinning non-axisymmetric neutron stars represent a different class of potentially-observable signals Prix (2009); Lasky (2015), which have yet to be detected Riles (2017). These signals are expected to be long-lasting (at least several days to years) and quasi-monochromatic, with slowly changing (intrinsic) frequency. The signal amplitude depends on the rich (and largely not yet well-understood) internal physics of neutron stars Johnson-McDaniel and Owen (2013), as well as their population characteristics Camenzind (2007); Knispel and Allen (2008). A detection (and even non-detection) of CWs could therefore help us better understand these fascinating astrophysical objects, and may allow for new tests of general relativity Isi et al. (2017); Abbott et al. (2018b).

Overview of search categories

We can categorize CW searches in two different ways: either based on the search method, or on the type of explored parameter space.

The search methods fall into two broad categories: coherent and semi-coherent (sometimes also referred to as incoherent). Roughly speaking, a coherent search is based on signal templates with coherent phase evolution over the whole observation time, while semi-coherent searches typically break the data into shorter coherent segments and combine the resulting statistics from these segments incoherently (i.e. without requiring a consistent phase evolution across segments). However, there are many different approaches and variations, which are beyond the scope of this paper; see e.g. Riles (2017) for a more detailed overview. Here we will exclusively focus on coherent and semi-coherent methods based on the $\mathcal{F}$-statistic, which will be introduced in Sec. II.

Coherent search methods are more sensitive in principle, but in practice they usually suffer from severe computing-cost limitations: for finite search parameter spaces the required number of signal templates typically grows as a steep power law of the observation time, making such searches infeasible except when the search region is sufficiently small. For larger signal parameter spaces the observation time needs to be kept short enough for the search to be computationally feasible, which limits the attainable coherent sensitivity. This is where semi-coherent searches tend to yield substantially better sensitivity at fixed computing cost (e.g. see Brady and Creighton (2000); Prix and Shaltev (2012)).

Based on the explored parameter space, we distinguish the following search categories (referencing a recent example for each case):

  1. Targeted searches for known pulsars Manchester et al. (2005) assume a perfect fixed relationship between the observed neutron-star spin frequency and the CW emission frequency. Therefore one only needs to search a single point in parameter space for each pulsar, allowing for optimal fully-coherent searches Abbott et al. (2017c).

  2. Narrow-band searches for known pulsars assume a small uncertainty in the relationship between CW frequency and the measured pulsar spin rates. This finite search parameter space requires a template bank with (typically) many millions of templates, still allowing for optimal fully-coherent search methods to be used Abbott et al. (2017d).

  3. Directed (isolated) searches aim at isolated neutron stars with known sky-position and unknown spin frequency. The search parameter space covers the unknown frequency and spindowns of the neutron star signal within an astrophysically-motivated range Aasi et al. (2015a); Zhu et al. (2016).

  4. (Directed) binary searches aim at binary systems with known sky-position and parameter-space uncertainties in the frequency and binary-orbital parameters. Typically these sources would be in low-mass X-ray binaries, with the most prominent example being Scorpius X-1 (Sco X-1) Abbott et al. (2017e, f).

  5. All-sky (isolated) searches search the whole sky over a large frequency (and spindown) band for unknown isolated neutron stars Abbott et al. (2017a, b).

  6. All-sky binary searches are the most extreme case, covering the whole sky for unknown neutron stars in binary systems Goetz and Riles (2011); Aasi et al. (2014a).

Sensitivity estimation

In this work we use the term sensitivity to refer to the upper limit on the signal amplitude $h_0$ (or equivalently the sensitivity depth $\mathcal{D}$, see Sec. II.5). This can be either the frequentist upper limit for a given detection probability at a fixed false-alarm level (p-value), or the Bayesian upper limit at a given credible level for the given data.

Sensitivity therefore only captures one aspect of a search, namely how “deep” into the noise-floor it can detect signals, without accounting for how “wide” a region in parameter space is covered, how much prior weight this region contains, or how robust the search is to deviations from the signal model. Comparing sensitivity depth therefore only makes sense for searches over very similar parameter spaces. A more complete measure characterizing searches would be their respective detection probability, which folds in sensitivity depth, breadth in parameter space, as well as the prior weight contained in that space Ming et al. (2016, 2018).

However, it is often useful to be able to reliably and cheaply estimate the sensitivity of a search setup without needing expensive Monte-Carlo simulations:

  • In order to determine optimal search parameters for a semi-coherent search (i.e. the number and length of semi-coherent segments and the template-bank mismatch parameters), it is important to be able to quickly assess the projected sensitivity for any given search-parameter combination (e.g. see Prix and Shaltev (2012); Leaci and Prix (2015); Ming et al. (2016, 2018)).

  • For setting upper limits for a given search, one typically has to repeatedly add software-generated CW signals to the data and perform a search, in order to measure how often these signals are recovered above a given threshold. By iterating this procedure one tries to find the weakest signal amplitude that can be recovered at the desired detection probability (or “confidence”). This can be very computationally expensive, and a quick and reasonably-reliable estimate for the expected upper-limit amplitude can therefore substantially cut down on the cost of this iterative process and improve the accuracy of the resulting upper limit.

  • The estimate can also serve as a sanity check for determining upper limits.
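The injection-and-recovery iteration described in the second point can be sketched as follows. This is a hypothetical toy model (the threshold and the mapping from $h_0$ to non-centrality are made-up illustrative values, not from the paper), assuming a single-template coherent search where $2\mathcal{F}$ follows a non-central $\chi^2$-distribution:

```python
# Toy model of the iterative upper-limit procedure: bisect on the injected
# amplitude h0 until the measured detection probability matches the target.
# The threshold and the h0 -> non-centrality scaling are made-up values.
import numpy as np
from scipy.stats import ncx2

rng = np.random.default_rng(0)

def detection_probability(h0, n_inj=5000, thresh=35.0, scale=25.0):
    """Fraction of simulated injections of amplitude h0 recovered above a
    fixed 2F threshold; 2F ~ chi^2_4 with non-centrality (h0*scale)^2."""
    rho2 = (h0 * scale) ** 2
    twoF = ncx2.rvs(df=4, nc=rho2, size=n_inj, random_state=rng)
    return np.mean(twoF > thresh)

def upper_limit(p_det=0.90, lo=0.0, hi=1.0, tol=1e-3):
    """Bisect on h0 until the measured detection probability reaches p_det."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if detection_probability(mid) < p_det:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Each bisection step requires a fresh batch of injections, which illustrates why a cheap analytic or numerical estimate of the starting point can substantially reduce the cost of this loop.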

A number of theoretical sensitivity estimates have been developed over the past decades. One of the first estimates was obtained for a coherent $\mathcal{F}$-statistic search Abbott et al. (2004), yielding

$h_0^{90\%} \approx 11.4\,\sqrt{S_{\mathrm{n}}/\mathcal{T}_{\mathrm{data}}}\,,$  (1)

for a 90%-confidence upper limit at 1% false-alarm level (per template). Here $S_{\mathrm{n}}$ denotes the (single-sided) noise power spectral density, and $\mathcal{T}_{\mathrm{data}}$ is the total amount of data. This was later generalized to the semi-coherent Hough Krishnan et al. (2004) and StackSlide method Mendell and Landry (2005); Abbott et al. (2008a), yielding an expression of the form

$h_0 \propto N^{1/4}\,\sqrt{S_{\mathrm{n}}/\mathcal{T}_{\mathrm{data}}}\,,$  (2)

for the same confidence and false-alarm level as Eq. (1), and where $N$ denotes the number of semi-coherent segments.

These latter results suggested the inaccurate idea that the sensitivity of semi-coherent searches follows an exact $N^{1/4}$ scaling. However, this was later shown Prix and Shaltev (2012); Wette (2012) to not generally be a good approximation, except asymptotically in the limit of a large number of segments ($N \to \infty$).

Furthermore, these past sensitivity estimates relied on the assumption of a “constant signal-to-noise ratio (SNR)” population of signals. While this approximation substantially simplifies the problem, it introduces a noticeable bias into the estimate, as discussed in more detail in Wette (2012). For example, the constant-SNR bias combined with the incorrect $N^{1/4}$ scaling in Eq. (2) would result in an overestimate by a factor of two of the sensitivity of the first Einstein@Home search on LIGO S5 data Aasi et al. (2013a).

These limitations of previous sensitivity estimates were eventually overcome by the analytic sensitivity-estimation method developed by Wette (2012) for semi-coherent StackSlide $\mathcal{F}$-statistic searches. In this work we simplify and extend this framework by employing a simpler direct numerical implementation. This further improves the estimation accuracy by requiring fewer approximations. It also allows us to generalize the framework to multi-stage hierarchical StackSlide-$\mathcal{F}$ searches, Hough-$\mathcal{F}$ searches (such as Aasi et al. (2013a)), as well as to Bayesian upper limits based on $\mathcal{F}$-statistic searches.

Plan of this paper

Sec. II provides a description of the CW signal model and introduces different -statistic-based search methods. In Sec. III we present the sensitivity-estimation framework and its implementation, for both frequentist and Bayesian upper limits. Section IV discusses how (frequentist) upper limits are typically measured using Monte-Carlo injection-recovery simulations. Section V provides comparisons of our sensitivity estimates to simulated upper limits in Gaussian noise, while in Sec. VI we provide a comprehensive summary of published sensitivities of past CW searches (translated into sensitivity depth), and a comparison to our sensitivity estimates where applicable. We summarize and discuss the results in Sec. VII. Further details on the referenced searches and upper limits are given in appendix A. More technical details on the signal model can be found in appendix B. Finally, appendix C contains a discussion of the distribution of the maximum -statistic over correlated templates.

II. $\mathcal{F}$-statistic-based search methods

This section provides an overview of the $\mathcal{F}$-statistic-based search methods for which sensitivity estimates are derived in Sec. III. Further technical details about the signal model and the $\mathcal{F}$-statistic are given in appendix B. For a broader review of the CW signal model, assumptions and search methods, see for example Prix (2009); Lasky (2015); Riles (2017).

II.1 Signal model

For the purpose of sensitivity estimation we assume the data timeseries $x^X(t)$ from each detector $X$ to be described by Gaussian noise $n^X(t)$, i.e. with zero mean and (single-sided) power-spectral density (PSD) $S_{\mathrm{n}}^X$. A gravitational-wave signal creates an additional strain $h^X(t)$ in the detector, resulting in a timeseries

$x^X(t) = n^X(t) + h^X(t)\,.$  (3)

For continuous gravitational waves the two polarization components can be written as

$h_+(\tau) = A_+\,\cos\phi(\tau)\,,\qquad h_\times(\tau) = A_\times\,\sin\phi(\tau)\,,$  (4)

where $\phi(\tau)$ describes the phase evolution of the signal in the source frame. For the typical quasi-periodic signals expected from rotating neutron stars, this can be expressed as a Taylor-series expansion around a chosen reference time $\tau_{\mathrm{ref}}$ (here $\tau_{\mathrm{ref}} = 0$ for simplicity) as

$\phi(\tau) = \phi_0 + 2\pi \sum_{k \ge 0} \frac{f^{(k)}}{(k+1)!}\,\tau^{k+1}\,,$  (5)

in terms of the derivatives $f^{(k)}$ of the slowly-varying intrinsic CW frequency $f(\tau)$. For a triaxial neutron star spinning about a principal axis, the two polarization amplitudes are given by

$A_+ = \frac{1}{2}\,h_0\,\left(1 + \cos^2\iota\right)\,,\qquad A_\times = h_0\,\cos\iota\,,$  (6)

in terms of the angle $\iota$ between the line of sight and the neutron-star rotation axis and the overall signal amplitude $h_0$. This definition uses the standard amplitude gauge convention. After translating the source-frame signal into the detector frame (see appendix B for details), the strain signal at each detector $X$ can be expressed in the factored form

$h^X(t) = \sum_{\mu=1}^{4} \mathcal{A}^\mu\, h^X_\mu(t;\lambda)\,,$  (7)

which was first shown in Jaranowski et al. (1998), and where the four amplitudes $\mathcal{A}^\mu$ depend on the amplitude parameters $\mathcal{A} \equiv \{h_0, \cos\iota, \psi, \phi_0\}$ as given in Eq. (64). The four basis functions $h^X_\mu(t;\lambda)$, which are given explicitly in Eq. (65), depend on the phase-evolution parameters $\lambda$, namely sky position $\hat{n}$, frequency and its derivatives $\{f^{(k)}\}$, and binary-orbital parameters in the case of a neutron star in a binary.

II.2 Coherent $\mathcal{F}$-statistic

For pure Gaussian-noise timeseries $x = n$ in all detectors $X$, the likelihood can be written as (e.g. see Finn (1992); Cutler and Schutz (2005); Prix (2007)):

$P(x\,|\,\mathrm{noise}) \propto e^{-\frac{1}{2}\langle x, x\rangle}\,,$  (8)

in terms of the multi-detector scalar product

$\langle x, y \rangle \equiv 4\,\Re \sum_X \int_0^\infty \frac{\tilde{x}^X(f)\,\tilde{y}^{X*}(f)}{S_{\mathrm{n}}^X(f)}\,\mathrm{d}f\,,$  (9)

where $\tilde{x}(f)$ denotes the Fourier transform of $x(t)$, and ${}^*$ denotes complex conjugation. Using the additivity of noise and signals (cf. Eq. (3)), we can express the likelihood for data $x$, assuming Gaussian noise plus a signal $h(\mathcal{A}, \lambda)$, as

$P(x\,|\,\mathcal{A}, \lambda) \propto e^{-\frac{1}{2}\langle x - h,\, x - h\rangle}\,.$  (10)

From this we obtain the log-likelihood ratio between the signal and noise hypotheses as

$\ln\Lambda(x;\,\mathcal{A}, \lambda) = \langle x, h \rangle - \frac{1}{2}\,\langle h, h \rangle\,.$  (11)

Analytically maximizing the log-likelihood ratio over $\mathcal{A}$ (cf. appendix B) yields the $\mathcal{F}$-statistic Jaranowski et al. (1998):

$2\mathcal{F}(x;\,\lambda) \equiv 2\,\max_{\mathcal{A}} \ln\Lambda(x;\,\mathcal{A}, \lambda)\,.$  (12)

The statistic $2\mathcal{F}$ follows a (non-central) $\chi^2$-distribution with four degrees of freedom and non-centrality $\rho^2$,

$2\mathcal{F} \sim \chi^2_{4}(\rho^2)\,,$  (13)

with expectation and variance

$E[2\mathcal{F}] = 4 + \rho^2\,,\qquad \operatorname{var}[2\mathcal{F}] = 8 + 4\rho^2\,,$  (14)

where $\rho$ corresponds to the signal-to-noise ratio (SNR) of coherent matched filtering.
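As a quick numerical check of Eq. (14), the moments of the non-central $\chi^2$-distribution with four degrees of freedom can be evaluated with scipy (the value of $\rho^2$ below is an arbitrary example):

```python
# Numerical check of Eq. (14): moments of 2F ~ chi^2_4(rho^2) via scipy.
# The value of rho^2 is an arbitrary example.
from scipy.stats import ncx2

rho2 = 30.0
mean, var = ncx2(df=4, nc=rho2).stats(moments="mv")
assert abs(mean - (4 + rho2)) < 1e-9     # E[2F]   = 4 + rho^2
assert abs(var - (8 + 4 * rho2)) < 1e-9  # var[2F] = 8 + 4*rho^2
```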

In the perfect-match case, where the template phase-evolution parameters $\lambda$ coincide with the parameters $\lambda_{\mathrm{s}}$ of a signal in the data $x$, the SNR can be explicitly expressed as

$\rho_0^2 = \frac{h_0^2\,\mathcal{T}_{\mathrm{data}}}{S_{\mathrm{n}}}\,R^2\,,$  (15)

where $\mathcal{T}_{\mathrm{data}}$ is the total amount (measured as time) of data over all detectors, and $S_{\mathrm{n}}$ denotes the multi-detector noise floor, defined as the harmonic mean over the per-detector PSDs $S_{\mathrm{n}}^X$, namely

$S_{\mathrm{n}}^{-1} \equiv \frac{1}{N_{\mathrm{det}}} \sum_{X=1}^{N_{\mathrm{det}}} \left(S_{\mathrm{n}}^{X}\right)^{-1}\,.$  (16)

Note that in practice the $\mathcal{F}$-statistic-based search implementations do not assume stationary noise over the whole observation time, but only over short durations of order $T_{\mathrm{SFT}}$, corresponding to the length of the Short Fourier Transforms (SFTs) that are typically used as input data. The present formalism can straightforwardly be extended to this case Prix (2015), where the relevant overall multi-detector noise-PSD definition generalizes as the harmonic mean over all SFTs, namely

$S_{\mathrm{n}}^{-1} \equiv \frac{1}{N_{\mathrm{SFT}}} \sum_{\alpha=1}^{N_{\mathrm{SFT}}} S_{\alpha}^{-1}\,,$  (17)

where $\alpha$ is an index enumerating all SFTs (over all detectors), and $S_\alpha$ is the corresponding noise PSD estimated for SFT $\alpha$.
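A minimal sketch of the harmonic-mean PSD of Eq. (17); the example PSD values are arbitrary:

```python
# Harmonic-mean noise PSD over SFTs, Eq. (17); example PSD values are arbitrary.
import numpy as np

def multi_detector_psd(per_sft_psd):
    """S_n^-1 = (1/N_sft) * sum_alpha S_alpha^-1."""
    s = np.asarray(per_sft_psd, dtype=float)
    return len(s) / np.sum(1.0 / s)

# The harmonic mean is dominated by the quietest (smallest-PSD) SFTs:
S_n = multi_detector_psd([1e-46, 2e-46, 4e-46])
```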

The response function $R^2$ (following the definition in Wette (2012)) depends on the subset of signal parameters

$\theta \equiv \{\alpha, \delta, \cos\iota, \psi\}\,,$  (18)

and can be explicitly expressed Prix (2010) as

$R^2 = \alpha_1\,A + \alpha_2\,B + 2\,\alpha_3\,C\,,$  (19)

with the sky-dependent antenna-pattern coefficients $\{A, B, C\}$ of Eq. (69), and

$\alpha_1 \equiv \frac{1}{4}\left(1+\cos^2\iota\right)^2\cos^2 2\psi + \cos^2\iota\,\sin^2 2\psi\,,$  (20)
$\alpha_2 \equiv \frac{1}{4}\left(1+\cos^2\iota\right)^2\sin^2 2\psi + \cos^2\iota\,\cos^2 2\psi\,,$  (21)
$\alpha_3 \equiv \left[\frac{1}{4}\left(1+\cos^2\iota\right)^2 - \cos^2\iota\right]\cos 2\psi\,\sin 2\psi\,.$  (22)

One can show that $R^2$ averaged over $\cos\iota$ and $\psi$ yields

$\langle R^2 \rangle_{\cos\iota,\psi} = \frac{2}{5}\,(A + B)\,,$  (23)

and further averaging isotropically over the sky yields

$\langle R^2 \rangle_{\theta} = \frac{4}{25}\,.$  (24)

Using this with Eq. (15) we can therefore recover the sky- and polarization-averaged squared-SNR expression (e.g. see Jaranowski et al. (1998)):

$\langle \rho_0^2 \rangle = \frac{4}{25}\,\frac{h_0^2\,\mathcal{T}_{\mathrm{data}}}{S_{\mathrm{n}}}\,.$  (25)

II.3 Semi-coherent $\mathcal{F}$-statistic methods

Semi-coherent methods Brady and Creighton (2000) typically divide the data into $N$ shorter segments of duration $\Delta T$. The segments are analyzed coherently, and the per-segment detection statistics are combined incoherently. Generally this yields lower sensitivity for the same amount of data analyzed than a fully-coherent search. However, the computational cost for a fully-coherent search over the same amount of data is often prohibitively large, while the semi-coherent cost can be tuned to be affordable, and such a search typically ends up being more sensitive at fixed computing cost Brady and Creighton (2000); Cutler et al. (2005); Prix and Shaltev (2012).

There are a number of different semi-coherent methods currently in use, such as PowerFlux, FrequencyHough, SkyHough, TwoSpect, CrossCorr, Viterbi, Sideband, loosely-coherent statistics and others (e.g. see Riles (2017) and references therein). Many of these methods work on short segments, typically of SFT length $T_{\mathrm{SFT}}$, and use the Fourier power in the frequency bins of these Short Fourier Transforms (SFTs) as the coherent base statistic.

In this work we focus exclusively on sensitivity estimation of $\mathcal{F}$-statistic-based methods, namely StackSlide-$\mathcal{F}$ (e.g. see Prix and Shaltev (2012)) and Hough-$\mathcal{F}$, introduced in Krishnan et al. (2004). Here the length of segments is only constrained by the available computing budget, and segments will typically span many hours to days, which yields better sensitivity but also requires higher computational cost. Therefore, many of the computationally expensive semi-coherent $\mathcal{F}$-statistic searches are run on the distributed Einstein@Home computing platform Ein ().

Note that these methods are not to be confused with the (albeit closely related) “classical” StackSlide and Hough methods, which use SFTs as coherent segments, as described for example in Abbott et al. (2008a).

StackSlide-$\mathcal{F}$: summing $\mathcal{F}$-statistics

The StackSlide-$\mathcal{F}$ method uses the sum of the coherent per-segment $\mathcal{F}$-statistic values in a given parameter-space point $\lambda$ as the detection statistic, namely

$\widehat{2\mathcal{F}}(x;\lambda) \equiv \sum_{\ell=1}^{N} 2\mathcal{F}_\ell(x;\lambda)\,,$  (26)

where $2\mathcal{F}_\ell$ is the coherent $\mathcal{F}$-statistic of Eq. (12) in segment $\ell$. This statistic follows a $\chi^2$-distribution with $4N$ degrees of freedom and non-centrality $\rho^2$, i.e.

$\widehat{2\mathcal{F}} \sim \chi^2_{4N}(\rho^2)\,,$  (27)

where the non-centrality $\rho^2$ is identical to the expression for the coherent squared SNR of Eq. (15), with $\mathcal{T}_{\mathrm{data}}$ referring to the whole data set used, and $S_{\mathrm{n}}$ the corresponding noise floor. However, the non-centrality in the semi-coherent case cannot be considered a “signal-to-noise ratio”, due to the larger ($N$-dependent) number of degrees of freedom of the distribution compared to Eq. (13), which increases the false-alarm level at fixed threshold and reduces the “effective” semi-coherent SNR (e.g. see [Eq. (14)] in Wette (2015)).

The expectation and variance for $\widehat{2\mathcal{F}}$ are

$E[\widehat{2\mathcal{F}}] = 4N + \rho^2\,,\qquad \operatorname{var}[\widehat{2\mathcal{F}}] = 8N + 4\rho^2\,.$  (28)

We note that in practice StackSlide-$\mathcal{F}$ searches often quote the average $\overline{2\mathcal{F}}$ over segments instead of the sum $\widehat{2\mathcal{F}}$, i.e.

$\overline{2\mathcal{F}} \equiv \frac{1}{N}\,\widehat{2\mathcal{F}}\,.$  (29)

Hough-$\mathcal{F}$: summing threshold crossings

The Hough-$\mathcal{F}$ method Krishnan et al. (2004) sets a threshold $2\mathcal{F}_{\mathrm{th}}$ on the per-segment coherent $\mathcal{F}$-statistics and uses the number of threshold crossings over $N$ segments as the detection statistic, the so-called Hough number count $n_{\mathrm{c}}$, i.e.

$n_{\mathrm{c}} \equiv \sum_{\ell=1}^{N} \Theta\left(2\mathcal{F}_\ell - 2\mathcal{F}_{\mathrm{th}}\right)\,,$  (30)

where $\Theta$ is the Heaviside step function.
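As a minimal illustration of Eq. (30), the Heaviside sum reduces to counting threshold crossings (the per-segment $2\mathcal{F}$ values and the threshold below are made-up examples):

```python
# The Heaviside sum of Eq. (30) is just a count of threshold crossings.
# The per-segment 2F values and the threshold are made-up examples.
import numpy as np

def hough_number_count(twoF_per_segment, twoF_th):
    """Hough number count n_c = sum_l Theta(2F_l - 2F_th)."""
    return int(np.sum(np.asarray(twoF_per_segment) > twoF_th))

n_c = hough_number_count([3.1, 6.0, 5.1, 9.4, 4.9], twoF_th=5.2)  # two crossings
```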

II.4 Mismatch and template banks

In wide-parameter-space searches the unknown signal parameters $\lambda_{\mathrm{s}}$ are assumed to fall somewhere within a given search space $\mathbb{T}$. In this case one needs to compute a statistic (such as those defined in the previous sections) over a whole “bank” of templates in $\mathbb{T}$. This template bank has to be chosen in such a way that any putative signal would suffer only an acceptable level of loss of SNR. This is typically quantified in terms of the so-called mismatch $\mu$, defined as the relative loss of $\rho^2$ in a template $\lambda$ with respect to the perfect-match $\rho_0^2$ of Eq. (15), namely

$\mu \equiv \frac{\rho_0^2 - \rho^2(\lambda)}{\rho_0^2} \in [0, 1]\,.$  (31)

We can therefore express the “effective” non-centrality parameter in a template point $\lambda$ in the $\mathcal{F}$-statistic $\chi^2$-distribution of Eqs. (13),(27) as

$\rho_{\mathrm{eff}}^2 = (1 - \mu)\,\rho_0^2\,.$  (32)

II.5 Sensitivity Depth

The $\mathcal{F}$-statistic non-centrality parameter depends on the signal amplitude $h_0$ and the overall noise floor $S_{\mathrm{n}}$ (cf. Eq. (17)) only through the combination $h_0^2/S_{\mathrm{n}}$, as seen in Eq. (15). The sensitivity of a search is therefore most naturally characterized in terms of the so-called sensitivity depth Behnke et al. (2015), defined as

$\mathcal{D} \equiv \frac{\sqrt{S_{\mathrm{n}}}}{h_0}\,,$  (33)

in terms of the overall noise PSD $S_{\mathrm{n}}$ defined as the harmonic mean over all SFTs used in the search, see Eq. (17).

A particular choice of search parameters ($N$, $\mathcal{T}_{\mathrm{data}}$, template bank) will generally yield a frequency-dependent upper limit $h_0^{\mathrm{UL}}(f)$, due to the frequency-dependent noise floor $S_{\mathrm{n}}(f)$. However, for fixed search parameters this will correspond to a constant sensitivity depth $\mathcal{D}$, which is therefore often a more practical and natural way to characterize the performance of a search, independently of the noise floor.
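A minimal numerical illustration of Eq. (33); the values of $\sqrt{S_{\mathrm{n}}}$ and $h_0$ below are arbitrary examples:

```python
# Sensitivity depth of Eq. (33); the numbers are arbitrary examples.
def sensitivity_depth(sqrt_Sn, h0):
    """D = sqrt(S_n) / h0, in units of 1/sqrt(Hz)."""
    return sqrt_Sn / h0

# E.g. sqrt(Sn) = 1e-23 / sqrt(Hz) and an upper limit h0 = 2e-25:
D = sensitivity_depth(1e-23, 2e-25)  # -> 50.0
```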

III. Sensitivity Estimate

As discussed in more detail in the introduction, by sensitivity we mean the (measured or expected) upper limit on $h_0$ for a given search (or equivalently, the sensitivity depth $\mathcal{D}$), which can refer either to the frequentist or the Bayesian upper limit.

III.1 Frequentist upper limits

The frequentist upper limit is defined as the weakest signal amplitude $h_0^{\mathrm{UL}}$ that can be detected at a given detection probability $p_{\mathrm{det}}$ (typically chosen as 90% or 95%) above a threshold $\mathcal{S}_{\mathrm{th}}$ on a statistic $\mathcal{S}$. The threshold can be chosen as the loudest candidate obtained in the search, or it can be set corresponding to a desired false-alarm level (or p-value), defined as

$p_{\mathrm{fa}}(\mathcal{S}_{\mathrm{th}}) \equiv P(\mathcal{S} > \mathcal{S}_{\mathrm{th}}\,|\,\mathrm{noise})\,,$  (34)

which can be inverted to yield $\mathcal{S}_{\mathrm{th}}(p_{\mathrm{fa}})$. The detection probability for signals of amplitude $h_0$ is

$p_{\mathrm{det}}(h_0;\,\mathcal{S}_{\mathrm{th}}) \equiv P(\mathcal{S} > \mathcal{S}_{\mathrm{th}}\,|\,h_0)\,,$  (35)

which can be inverted to yield the upper limit $h_0^{\mathrm{UL}}(p_{\mathrm{det}};\,\mathcal{S}_{\mathrm{th}})$.

We can identify the noise case with $h_0 = 0$, and so we can express both in terms of the general threshold-crossing probability $P(\mathcal{S} > \mathcal{S}_{\mathrm{th}}\,|\,h_0)$ as

$p_{\mathrm{fa}} = P(\mathcal{S} > \mathcal{S}_{\mathrm{th}}\,|\,h_0 = 0)\,,\qquad p_{\mathrm{det}} = P(\mathcal{S} > \mathcal{S}_{\mathrm{th}}\,|\,h_0 > 0)\,.$  (36)
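For a single-template coherent $\mathcal{F}$-statistic search, Eqs. (34)-(36) can be evaluated directly from the central and non-central $\chi^2_4$-distributions; a sketch with illustrative parameter values:

```python
# Single-template coherent example of Eqs. (34)-(36): invert the chi^2_4
# noise distribution for the threshold, then evaluate the detection
# probability from the non-central chi^2. Parameter values are illustrative.
from scipy.stats import chi2, ncx2

p_fa = 1e-10
twoF_th = chi2.isf(p_fa, df=4)           # threshold from inverting Eq. (34)
p_det = ncx2.sf(twoF_th, df=4, nc=60.0)  # Eq. (35) at rho^2 = 60
```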

III.2 Approximating wide-parameter-space statistics

As discussed in Sec. II.4, a wide-parameter-space search for unknown signals typically proceeds by computing a (single-template) statistic $\mathcal{S}(x;\lambda)$ over a bank of templates covering the parameter space $\mathbb{T}$. This results in a corresponding set of (single-template) statistic values, which need to be combined to form the overall wide-parameter-space statistic. This would naturally be obtained via marginalization (i.e. integrating the likelihood over $\mathbb{T}$), but in practice is mostly done by maximizing the single-template statistic over $\mathbb{T}$, i.e.

$\mathcal{S}_{\max}(x) \equiv \max_{\lambda \in \mathbb{T}} \mathcal{S}(x;\lambda)\,.$  (37)

Noise case: estimating the p-value

For the pure noise case of Eq. (34), it is difficult to determine a reliable expression for the distribution of $\mathcal{S}_{\max}$, even if the single-template statistic follows a known distribution (such as for the $\chi^2$-based statistics discussed in Sec. II). The reason for this difficulty lies in the correlations that generally exist between “nearby” templates in $\mathbb{T}$.

If all templates were strictly uncorrelated, one could use the well-known expression Eq. (71) Abadie et al. (2010); Wette (2012) for the distribution of the maximum. In this case one can also relate the single-trial p-value $p_1$ to the wide-parameter-space p-value, namely $p \approx \mathcal{N}\,p_1$ over $\mathcal{N}$ independent templates (for $p \ll 1$).

Although it is a common assumption in the literature, template correlations do not simply modify the “effective” number of independent templates to use in Eq. (71), but they generally also affect the functional form of the underlying distribution for the maximum $\mathcal{S}_{\max}$, as illustrated in appendix C with a simple toy model.

In this work we assume that the upper limit refers to a known detection threshold $\mathcal{S}_{\mathrm{th}}$ in Eq. (35). This can be obtained either from (i) the loudest observed candidate (the most common situation in real searches), or from (ii) setting a single-template p-value and inverting the known single-template distribution Eq. (34), or from (iii) a numerically-obtained relation between the p-value and the threshold $\mathcal{S}_{\mathrm{th}}$, e.g. via Monte-Carlo simulation.

Signal case: estimating the detection probability

In the signal case it is easier to estimate the maximum-likelihood statistic $\mathcal{S}_{\max}$ over the full template bank $\mathbb{T}$, provided we can assume that the highest value of $\mathcal{S}$ will be realized near the signal location, which should be true as long as the p-value is low and the signals have relatively high detection probability (typically 90% or 95%). This will typically be a good approximation, but in Sec. V we will also encounter situations where deviations from the predictions can be traced to violations of these assumptions. We therefore approximate

$P(\mathcal{S}_{\max} > \mathcal{S}_{\mathrm{th}}\,|\,h_0) \approx P\big(\mathcal{S}(\lambda_{\mathrm{cl}}) > \mathcal{S}_{\mathrm{th}}\,\big|\,h_0\big)\,,$  (38)

where $\lambda_{\mathrm{cl}}$ is the “closest” template to the signal location $\lambda_{\mathrm{s}}$, defined in terms of the mismatch Eq. (31), namely the template with the smallest mismatch $\mu_{\mathrm{cl}}$ from the signal. This template yields the highest effective non-centrality parameter over the template bank, namely

$\rho^2(\lambda_{\mathrm{cl}}) = \max_{\lambda \in \mathbb{T}} \rho^2(\lambda) = (1 - \mu_{\mathrm{cl}})\,\rho_0^2\,.$  (39)

III.3 StackSlide-$\mathcal{F}$ sensitivity

We first consider a semi-coherent StackSlide-$\mathcal{F}$ search using the summed $\mathcal{F}$-statistic of Eq. (26), i.e. $\mathcal{S} = \widehat{2\mathcal{F}}$. This case also includes fully-coherent $\mathcal{F}$-statistic searches, which simply correspond to the special case $N = 1$.

We see from Eq. (36) that in order to estimate the sensitivity, we need to know $P(\widehat{2\mathcal{F}} > \widehat{2\mathcal{F}}_{\mathrm{th}}\,|\,h_0)$. This can be obtained via marginalization (at fixed $h_0$) of the known distribution of Eq. (27), combined with the assumption Eq. (39) that the highest statistic value will occur in the “closest” template, with mismatch distribution $p(\mu)$:

$P(\widehat{2\mathcal{F}} > \widehat{2\mathcal{F}}_{\mathrm{th}}\,|\,h_0) = \int P\big(\widehat{2\mathcal{F}} > \widehat{2\mathcal{F}}_{\mathrm{th}}\,\big|\,\rho^2_{\mathrm{eff}}\big)\; p(\cos\iota)\,p(\psi)\,p(\hat{n})\,p(\mu)\;\mathrm{d}\cos\iota\,\mathrm{d}\psi\,\mathrm{d}^2\hat{n}\,\mathrm{d}\mu\,,$  (40)

where $\rho^2_{\mathrm{eff}} = (1 - \mu)\,\rho_0^2$ in terms of the perfect-match non-centrality $\rho_0^2$ defined in Eq. (15), and in the last step we used the fact that the distribution for $\widehat{2\mathcal{F}}$ is fully specified in terms of the non-centrality parameter of the $\chi^2$-distribution with $4N$ degrees of freedom, as given in Eq. (27).

Equation (40) requires five-dimensional integration for each sensitivity estimation, which would be slow and cumbersome. One of the key insights in Wette (2012) was to notice that the perfect-match SNR of Eq. (15) depends on the four parameters $\theta = \{\alpha, \delta, \cos\iota, \psi\}$ only through the scalar $R^2$, and we can therefore use a reparametrization

$p(R^2) = \int_{\theta(R^2)} p(\cos\iota)\,p(\psi)\,p(\hat{n})\;\mathrm{d}\cos\iota\,\mathrm{d}\psi\,\mathrm{d}^2\hat{n}\,,$  (41)

where $\theta(R^2)$ denotes the subspace of $\theta$ values yielding a particular $R^2$ from Eq. (19).

The one-dimensional distribution $p(R^2)$ can be obtained by Monte-Carlo sampling over the priors of sky position $\hat{n}$ (typically either isotropically over the whole sky, or a single sky position in the case of a directed search) and polarization angles $\psi$ (uniform in $[-\pi/4, \pi/4]$) and $\cos\iota$ (uniform in $[-1, 1]$). The resulting values of $R^2$ are histogrammed and used as an approximation for $p(R^2)$, which can be reused for repeated sensitivity estimations with the same $\theta$-priors. We can therefore rewrite Eq. (40) as

$P(\widehat{2\mathcal{F}} > \widehat{2\mathcal{F}}_{\mathrm{th}}\,|\,h_0) = \int P\big(\widehat{2\mathcal{F}} > \widehat{2\mathcal{F}}_{\mathrm{th}}\,\big|\,\rho^2_{\mathrm{eff}}\big)\; p(R^2)\,p(\mu)\;\mathrm{d}R^2\,\mathrm{d}\mu\,,$  (42)

with

$\rho_0^2 = \frac{h_0^2\,\mathcal{T}_{\mathrm{data}}}{S_{\mathrm{n}}}\,R^2\,,$  (43)
$\rho^2_{\mathrm{eff}} = (1 - \mu)\,\rho_0^2\,.$  (44)

The mismatch distribution $p(\mu)$ for any given search will typically be obtained via injection-recovery Monte-Carlo simulation, where signals are repeatedly generated (without noise) and searched for over the template bank, obtaining the corresponding mismatch for each injection. This is a common step in validating a search and template-bank setup. Alternatively, for some search methods pre-computed estimates for the mismatch distributions exist as a function of the template-bank parameters, e.g. for the Weave search code Wette et al. (2018).
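A toy illustration of measuring a mismatch distribution by injection-recovery: injections at random frequency offsets against a hypothetical uniform 1-D template grid, with the quadratic (metric) approximation $\mu \approx g_{ff}\,(\Delta f)^2$. The grid spacing and metric coefficient below are made-up values:

```python
# Toy injection-recovery measurement of a mismatch distribution p(mu):
# injections at random frequency offsets against a hypothetical uniform 1-D
# template grid, with the quadratic (metric) approximation mu ~ g_ff*(df)^2.
# Grid spacing and metric coefficient are made-up values.
import numpy as np

rng = np.random.default_rng(2)

def mismatch_samples(n_inj=10000, df_grid=1.0, g_ff=0.8):
    offsets = rng.uniform(0.0, df_grid, n_inj)        # injection within a cell
    nearest = np.minimum(offsets, df_grid - offsets)  # closest template offset
    return g_ff * nearest**2

mu = mismatch_samples()   # a histogram of these samples approximates p(mu)
```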

Inserting Eq. (42) into the detection probability of Eq. (36), we obtain

$p_{\mathrm{det}}(h_0) = \int Q\big(\widehat{2\mathcal{F}}_{\mathrm{th}};\,\rho^2_{\mathrm{eff}}\big)\; p(R^2)\,p(\mu)\;\mathrm{d}R^2\,\mathrm{d}\mu\,,$  (45)

where

$Q\big(\widehat{2\mathcal{F}}_{\mathrm{th}};\,\rho^2\big) \equiv P\big(\chi^2_{4N}(\rho^2) > \widehat{2\mathcal{F}}_{\mathrm{th}}\big)$  (46)

denotes the survival function of the non-central $\chi^2$-distribution with $4N$ degrees of freedom.

Equation (45) can be easily and efficiently computed numerically, and simple inversion (via 1-D root finding) yields the sensitivity (i.e. the upper limit $h_0^{\mathrm{UL}}$) for given detection probability $p_{\mathrm{det}}$ and threshold $\widehat{2\mathcal{F}}_{\mathrm{th}}$.
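The single-stage estimate of Eqs. (42)-(46) and its 1-D root-finding inversion can be sketched as follows. This is a simplified illustration, not the OctApps implementation: the $R^2$ distribution is replaced by a toy polarization-only sample (the sky-dependent antenna-pattern coefficients are ignored), the mismatch distribution is a point mass at a mean mismatch mu0, and all parameter values are illustrative. Depth here is in units where $\mathcal{T}_{\mathrm{data}}/\mathcal{D}^2$ is dimensionless ($\mathcal{T}_{\mathrm{data}}$ in seconds, $\mathcal{D}$ in $1/\sqrt{\mathrm{Hz}}$):

```python
# Simplified sketch of Eqs. (42)-(46) and the 1-D root-finding inversion.
# Assumptions (not the OctApps implementation): R^2 sampled from a toy
# polarization-only distribution (sky antenna-pattern factors ignored),
# a point-mass mismatch distribution at mu0, and illustrative parameters.
import numpy as np
from scipy.optimize import brentq
from scipy.stats import ncx2

rng = np.random.default_rng(1)

def rsq_samples(n=20000):
    """Toy R^2: 0.25*[((1+cos^2 i)/2)^2 + cos^2 i], cos(i) uniform in [-1,1]."""
    ci = rng.uniform(-1.0, 1.0, n)
    return 0.25 * (((1.0 + ci**2) / 2.0) ** 2 + ci**2)

def detection_probability(depth, Tdata, Nseg, twoF_th, mu0=0.2, rsq=None):
    """Eq. (45): average the ncx2 survival function over R^2 samples, with
    rho^2_eff = (1 - mu0) * (Tdata / depth^2) * R^2 (Eqs. (43),(44))."""
    rsq = rsq_samples() if rsq is None else rsq
    rho2_eff = (1.0 - mu0) * (Tdata / depth**2) * rsq
    return np.mean(ncx2.sf(twoF_th, df=4 * Nseg, nc=rho2_eff))

def estimate_depth(p_det, Tdata, Nseg, twoF_th):
    """Invert Eq. (45) for the sensitivity depth via 1-D root finding."""
    rsq = rsq_samples()
    f = lambda D: detection_probability(D, Tdata, Nseg, twoF_th, rsq=rsq) - p_det
    return brentq(f, 1e-2, 1e4)
```

For example, a coherent search (Nseg = 1) over Tdata = 1e6 s at threshold twoF_th = 40 yields a 90%-confidence depth of a few tens per root-Hz in this toy model.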

III.4 Multi-stage StackSlide-$\mathcal{F}$ sensitivity

The sensitivity estimate for a single StackSlide-$\mathcal{F}$ search can be generalized to hierarchical multi-stage searches, where threshold-crossing candidates of one search stage are followed up by deeper subsequent searches in order to increase the overall sensitivity (e.g. see Brady and Creighton (2000); Cutler et al. (2005); Miroslav Shaltev (2013); Papa et al. (2016); Abbott et al. (2017b)). We denote the stages with an index $i = 1, 2, \ldots$. Each stage is characterized by the number of segments $N_i$, the amount of data $\mathcal{T}_{\mathrm{data},i}$, the noise PSD $S_{\mathrm{n},i}$, a mismatch distribution $p_i(\mu)$, and a threshold $\mathcal{S}_{\mathrm{th},i}$ (corresponding to a false-alarm level $p_{\mathrm{fa},i}$ at that stage).

The initial wide-parameter-space search (stage $i = 1$) yields candidates that cross the threshold $\mathcal{S}_{\mathrm{th},1}$ in certain templates. The next stage follows up these candidates with a more sensitive search, which can be achieved by reducing the mismatch (choosing a finer template-bank grid), or by increasing the coherent segment length (and reducing the number of segments $N_i$). Often the final stage in such a follow-up hierarchy would be fully coherent, i.e. $N = 1$.

In order for any given candidate (which can be due either to noise or to a signal) to cross the final threshold, it has to cross all previous thresholds as well; in other words, Eqs. (34),(35) now generalize to

$p = P\Big(\textstyle\bigcap_i \{\mathcal{S}_i > \mathcal{S}_{\mathrm{th},i}\}\,\Big|\,h_0\Big)\,.$  (47)

In order to make progress at this point we need to assume that the threshold-crossing probabilities in different stages are independent of each other, so we assume

$P\Big(\textstyle\bigcap_i \{\mathcal{S}_i > \mathcal{S}_{\mathrm{th},i}\}\,\Big|\,h_0\Big) \approx \prod_i P\big(\mathcal{S}_i > \mathcal{S}_{\mathrm{th},i}\,\big|\,h_0\big)\,,$  (48)

which would be exactly true if the different stages used different data (see also Cutler et al. (2005)). In the case where the same data is used in different stages, this approximation corresponds to an uninformative approach, in the sense that we do not know how to quantify and take into account the correlations between the statistics in different stages. We proceed without using this potential information, which could in principle be used to improve the estimate. It is not clear if and how much of an overall bias this approximation would introduce. A detailed study of this question is beyond the scope of this work and will be left for future study.

Using the assumption of independent stages we write

$p_{\mathrm{det}}(h_0) = \int p(R^2)\,\prod_i p_{\mathrm{det},i}(R^2;\,h_0)\;\mathrm{d}R^2\,,$  (49)
$p_{\mathrm{det},i}(R^2;\,h_0) \equiv \int Q\big(\mathcal{S}_{\mathrm{th},i};\,\rho^2_{\mathrm{eff},i}\big)\,p_i(\mu)\;\mathrm{d}\mu\,,$  (50)

where now the $R^2$-marginalization needs to happen over all stages combined, as the signal parameters $\theta$ are intrinsic to the signal and therefore independent of the stage. On the other hand, the mismatch distribution depends on the stage, as each stage will typically use a different template grid, and so we have

$\rho^2_{\mathrm{eff},i} = (1 - \mu_i)\,\frac{h_0^2\,\mathcal{T}_{\mathrm{data},i}}{S_{\mathrm{n},i}}\,R^2\,,$  (51)

where $Q$ is given by Eq. (46) using the respective per-stage values.

Equation (49) can easily be solved numerically and inverted for the sensitivity at given $p_{\mathrm{det}}$ and a set of thresholds $\{\mathcal{S}_{\mathrm{th},i}\}$.

Note that in practice one would typically Papa et al. (2016) want to choose the thresholds in such a way that a signal that passed the first-stage threshold should have a very low probability of being discarded by subsequent stages, in other words $p_{\mathrm{det},i} \approx 1$ for $i > 1$, and therefore $p_{\mathrm{det}} \approx p_{\mathrm{det},1}$. Subsequent stages therefore mostly serve to reduce the total false-alarm level, allowing one to increase the first-stage false-alarm level $p_{\mathrm{fa},1}$ by lowering the corresponding threshold $\mathcal{S}_{\mathrm{th},1}$, resulting in an overall increased sensitivity.
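The independent-stages product of Eq. (48) can be sketched for a single fixed $R^2$ value (the full estimate of Eq. (49) marginalizes over $R^2$ jointly across stages); the stage parameters below are illustrative:

```python
# Independent-stages approximation of Eq. (48) at a single fixed R^2 value
# (the full Eq. (49) marginalizes over R^2 jointly). Stage tuples are
# (N_i, Tdata_i, threshold_i); all numbers are illustrative.
from scipy.stats import ncx2

def multistage_detection_probability(depth, stages, rsq=0.2, mu0=0.2):
    """Product of per-stage ncx2 threshold-crossing probabilities."""
    p = 1.0
    for Nseg, Tdata, thresh in stages:
        rho2_eff = (1.0 - mu0) * (Tdata / depth**2) * rsq
        p *= ncx2.sf(thresh, df=4 * Nseg, nc=rho2_eff)
    return p

# Semi-coherent first stage followed by a fully-coherent (N = 1) stage:
stages = [(100, 1e7, 550.0), (1, 1e7, 60.0)]
p = multistage_detection_probability(50.0, stages)
```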

III.5 Hough-$\mathcal{F}$ sensitivity

Here we apply the sensitivity-estimation framework to the Hough-$\mathcal{F}$ statistic introduced in Sec. II.3.2. The key approximation we use here is to assume that for a given signal, the coherent per-segment $\mathcal{F}$-statistic has the same threshold-crossing probability $\eta$ in each segment $\ell$, i.e. $\eta_\ell = \eta$ for all $\ell$, and

$\eta(h_0) = P\big(2\mathcal{F}_\ell > 2\mathcal{F}_{\mathrm{th}}\,\big|\,\rho^2_{\mathrm{eff},\ell}\big)\,,$  (52)

where the per-segment effective SNR $\rho^2_{\mathrm{eff},\ell}$ is given by Eq. (44) with $\mathcal{T}_{\mathrm{data}}$ and $S_{\mathrm{n}}$ referring to the respective per-segment quantities, and $\mu$ is the mismatch of the $\mathcal{F}$-statistic in a single segment.

Provided these quantities are reasonably constant across segments, for a fixed signal we can write the probability for the Hough number count of Eq. (30) as a binomial distribution, namely

$P(n_{\mathrm{c}}\,|\,\eta) = \binom{N}{n_{\mathrm{c}}}\,\eta^{n_{\mathrm{c}}}\,(1 - \eta)^{N - n_{\mathrm{c}}}\,,$  (53)

with $\eta$ given by Eq. (52). For a given threshold $n_{\mathrm{th}}$ on the number count we therefore have the detection probability

$p_{\mathrm{det}}(\eta;\,n_{\mathrm{th}}) = \sum_{n_{\mathrm{c}} = n_{\mathrm{th}}}^{N} \binom{N}{n_{\mathrm{c}}}\,\eta^{n_{\mathrm{c}}}\,(1 - \eta)^{N - n_{\mathrm{c}}}\,,$  (54)

and marginalization over $R^2$ and $\mu$ yields the corresponding detection probability at fixed amplitude $h_0$, namely

$p_{\mathrm{det}}(h_0) = \int p_{\mathrm{det}}(\eta;\,n_{\mathrm{th}})\; p(R^2)\,p(\mu)\;\mathrm{d}R^2\,\mathrm{d}\mu\,.$  (55)

We can numerically solve this for $h_0^{\mathrm{UL}}$ at given $p_{\mathrm{det}}$ and number-count threshold $n_{\mathrm{th}}$, which yields the desired sensitivity estimate.
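Eqs. (52)-(54) map directly onto scipy's non-central $\chi^2$ and binomial distributions; a sketch at a fixed per-segment $\rho^2_{\mathrm{eff}}$ (illustrative values), without the $R^2$ and mismatch marginalization of Eq. (55):

```python
# Eqs. (52)-(54) with scipy: per-segment crossing probability eta from the
# non-central chi^2 survival function, then P(n_c >= n_th) from the binomial.
# The fixed per-segment rho^2_eff and thresholds are illustrative; the full
# Eq. (55) additionally marginalizes over R^2 and mismatch.
from scipy.stats import binom, ncx2

def hough_detection_probability(n_th, Nseg, rho2_eff_per_seg, twoF_th):
    eta = ncx2.sf(twoF_th, df=4, nc=rho2_eff_per_seg)  # Eq. (52)
    return binom.sf(n_th - 1, Nseg, eta)               # Eq. (54)

p = hough_detection_probability(n_th=60, Nseg=100, rho2_eff_per_seg=10.0,
                                twoF_th=5.2)
```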

III.6 Bayesian Upper Limits

Bayesian upper limits are conceptually quite different Röver et al. (2011) from the frequentist ones discussed up to this point. A Bayesian upper limit $h_0^{C}$ at a given confidence (or “credible level”) $C$ corresponds to the interval $[0, h_0^{C}]$ that contains the true value of $h_0$ with probability $C$. We can compute this from the posterior distribution $P(h_0\,|\,x)$ for the signal amplitude given data $x$, namely

$C = \int_0^{h_0^{C}} P(h_0\,|\,x)\;\mathrm{d}h_0\,.$  (56)

The Bayesian targeted searches (here referred to as BayesPE) for known pulsars (see Table 5 and Sec. A.5) compute the posterior directly from the data $x$, using a time-domain method introduced in Dupuis and Woan (2005).

Here we focus instead on $\mathcal{F}$-statistic-based searches over a template bank. As discussed in Röver et al. (2011), to a very good approximation we can compute the posterior from the loudest candidate $\widehat{2\mathcal{F}}_{\max}$ found in such a search, using this as a proxy for the data $x$, i.e.

$P(h_0\,|\,x) \approx P\big(h_0\,\big|\,\widehat{2\mathcal{F}}_{\max}\big)$  (57)
$\propto P\big(\widehat{2\mathcal{F}}_{\max}\,\big|\,h_0\big)\,P(h_0)\,,$  (58)

where we used Bayes’ theorem, and the proportionality constant is determined by the normalization condition $\int_0^\infty P(h_0\,|\,\widehat{2\mathcal{F}}_{\max})\,\mathrm{d}h_0 = 1$.

We have already derived the expression for in Eq. (42), and for any choice of prior we can therefore easily compute the Bayesian upper limit for given loudest candidate by inverting Eq. (56).

It is common for Bayesian upper limits on the amplitude to choose a uniform (improper) prior in (e.g. see Abbott et al. (2017c)), which has the benefit of simplicity, and also puts relatively more weight on larger values of than might be physically expected (weaker signals should be more likely than stronger ones). This prior therefore results in larger, i.e. “more conservative”, upper limits than a more physical prior would.
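To illustrate the inversion of Eq. (56), here is a minimal numerical sketch assuming a uniform (improper) amplitude prior: the posterior is then proportional to the likelihood, and the upper limit is the amplitude enclosing the desired posterior mass. The Gaussian toy likelihood is purely a stand-in for the actual loudest-candidate likelihood of Eq. (42):

```python
import math

def bayesian_upper_limit(likelihood, h_max, confidence=0.95, n_grid=4000):
    """Bayesian UL with a uniform (improper) prior on the amplitude:
    normalize posterior ~ likelihood(h) on [0, h_max] (trapezoid rule)
    and find h_UL such that the enclosed posterior mass equals `confidence`."""
    dh = h_max / n_grid
    hs = [i * dh for i in range(n_grid + 1)]
    post = [likelihood(h) for h in hs]
    norm = sum(0.5 * (post[i] + post[i + 1]) * dh for i in range(n_grid))
    cum = 0.0
    for i in range(n_grid):
        step = 0.5 * (post[i] + post[i + 1]) * dh / norm
        if cum + step >= confidence:
            # linear interpolation within the grid cell
            frac = (confidence - cum) / step
            return hs[i] + frac * dh
        cum += step
    return h_max

# Toy stand-in for the loudest-candidate likelihood of Eq. (42):
# a Gaussian in the amplitude (hypothetical, for illustration only)
toy_like = lambda h: math.exp(-0.5 * ((h - 1.0) / 0.3) ** 2)
h95 = bayesian_upper_limit(toy_like, h_max=3.0)
```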

iii.7 Numerical implementation

The expressions for the various sensitivity estimates of the previous sections have been implemented in GNU Octave Eaton et al. (2015), and are available as part of the OctApps Wette et al. (2018) data-analysis package for continuous gravitational waves.

The function to estimate (and cache for later reuse) the distribution of Eq. (41) is implemented in SqrSNRGeometricFactorHist().

The sensitivity-depth estimate for StackSlide--searches is implemented in SensitivityDepthStackSlide(), both for the single-stage case of Eq. (45) and for the general multi-stage case of Eq. (49). For single-stage StackSlide- there is also a function DetectionProbabilityStackSlide() estimating the detection probability for a given signal depth and detection threshold.

The Hough- sensitivity estimate of Eq. (55) is implemented in SensitivityDepthHoughF(). An earlier version of this function had been used for the theoretical sensitivity comparison in Aasi et al. (2013a) (Sec. VB, and also Prix and Wette (2012)), where it was found to agree within an rms error of with the measured upper limits.

The Bayesian -based upper limit expression Eq. (56) is implemented in SensitivityDepthBayesian().

Typical input parameters are the number of segments , the total amount of data , the mismatch distribution , name of detectors used, single-template false-alarm level (or alternatively, the -statistic threshold), and the confidence level . The default prior on sky-position is isotropic (suitable for an all-sky search), but this can be restricted to any sky-region (suitable for directed or targeted searches).

The typical runtime on a 3 GHz Intel Xeon E3 for a sensitivity estimate including computing (which is the most expensive part) is about per detector. When reusing the same -prior on subsequent calls, a cached is used and the runtime is reduced to about total, independent of the number of detectors used.

IV Determining Frequentist Upper Limits

In order to determine the frequentist upper limit (UL) on the signal amplitude defined in Eq. (35), one needs to quantify the probability that a putative signal with fixed amplitude (and all other signal parameters drawn randomly from their priors) would produce a statistic value exceeding the threshold (corresponding to a certain false-alarm level, or p-value). The upper limit on is then defined as the value for which the detection probability is exactly , typically chosen as or , which is often referred to as the confidence level of the UL.

Note that here and in the following it will often be convenient to use the sensitivity depth introduced in Sec. II.5 instead of the amplitude . We denote as the sensitivity depth corresponding to the upper limit (note that this corresponds to a lower limit on depth).

The UL procedure is typically implemented via a Monte-Carlo injection-and-recovery method: a signal of fixed amplitude and randomly-drawn remaining parameters is generated in software and added to the data (either to real detector data or to simulated Gaussian noise). This step is referred to as a signal injection. A search is then performed on this data, and the loudest statistic value is recorded and compared against the detection threshold . Repeating this injection and recovery step many times and recording the fraction of times the threshold is exceeded yields an approximation for . By repeating this procedure over different values and interpolating one can find corresponding to the desired detection probability (and therefore also ).
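The injection-and-recovery loop just described can be sketched as follows, with a deliberately crude stand-in for the actual search: the "loudest statistic" is modeled as the maximum of unit Gaussian noise over a set of templates plus a signal term scaling inversely with the depth. All names and numbers here are hypothetical:

```python
import random

def detection_fraction(depth, threshold, n_trials=2000, n_templates=50,
                       snr_scale=80.0, seed=1):
    """Toy injection-and-recovery: per trial, the loudest statistic is the
    max over templates of unit Gaussian noise, plus a signal contribution
    that scales inversely with the sensitivity depth (hypothetical model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        signal = snr_scale / depth              # stronger signal at lower depth
        loudest = max(rng.gauss(0.0, 1.0) for _ in range(n_templates))
        if loudest + signal > threshold:
            hits += 1
    return hits / n_trials

# sample the detection probability as a function of depth at fixed threshold
depths = [20, 30, 40, 60, 80]
pdet = {d: detection_fraction(d, threshold=6.0) for d in depths}
```

Interpolating such samples to the target detection probability then yields the upper-limit depth, by either of the two approaches described below.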

We distinguish in the following between measured and simulated upper limits:

  • Measured ULs refer to the published UL results obtained on real detector data. These typically use the same search procedure for the ULs as in the actual search, often using the loudest candidate (over some range of the parameter space) from the original search as the corresponding detection threshold for setting the UL. The injections are done in real detector data, and typically the various vetoes, data-cleaning and follow-up procedures of the original search will also be applied in the UL procedure.

  • Simulated ULs are used in this work to verify the accuracy of the sensitivity estimates. They are obtained using injections in simulated Gaussian noise, and searching only a small box in parameter space around the injected signal locations. The box size is empirically determined to ensure that the loudest signal candidates are always recovered within the box. Only the original search statistic is used in the search without any further vetoes or cleaning.

A key difference between (most) published (measured) ULs and our simulated ULs concerns the method of interpolation used to obtain : in practice this is often obtained via a sigmoid -interpolation approach (Sec. IV.1), while we use (and advocate for) a (piecewise) linear threshold interpolation (Sec. IV.2) instead.

iv.1 Sigmoid interpolation

In this approach one fixes the detection threshold and determines the corresponding for any given fixed- injection set. The corresponding functional form of has a qualitative “sigmoid” shape as illustrated in Fig. 1. An actual sigmoid function of the form

(59)

is then fit to the data by adjusting the free parameters and , and from this one can obtain an interpolation value for .
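A minimal sketch of this fitting procedure, assuming a standard two-parameter logistic for the sigmoid of Eq. (59) (the exact functional form used in published searches may differ) and a simple brute-force least-squares fit over the two free parameters:

```python
import math

def sigmoid(D, a, b):
    """Assumed logistic model for p_det(depth), standing in for Eq. (59);
    decreasing in depth (weaker signals at larger depth)."""
    return 1.0 / (1.0 + math.exp((D - a) / b))

def fit_sigmoid(depths, pdets, a_range, b_range, n=200):
    """Brute-force least-squares fit of the two free parameters (a, b)."""
    best = None
    for i in range(n + 1):
        a = a_range[0] + (a_range[1] - a_range[0]) * i / n
        for j in range(1, n + 1):
            b = b_range[0] + (b_range[1] - b_range[0]) * j / n
            err = sum((sigmoid(D, a, b) - p) ** 2 for D, p in zip(depths, pdets))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

def depth_at(p_target, a, b):
    """Invert the fitted sigmoid: depth at which p_det = p_target."""
    return a + b * math.log(1.0 / p_target - 1.0)

# synthetic, noiseless p_det(depth) samples (illustrative numbers only)
depths = [20, 30, 40, 50, 60]
pdets = [sigmoid(D, 38.0, 6.0) for D in depths]
a, b = fit_sigmoid(depths, pdets, a_range=(20, 60), b_range=(1, 15))
D95 = depth_at(0.95, a, b)
```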

One problem with this method is that the actual functional form of is not analytically known, and does not actually seem to be well described by the sigmoid of Eq. (59), as seen in Fig. 1. In this particular example the true value at just so happens to lie very close to the sigmoid fit, but the deviation is quite noticeable at (see the zoomed inset in Fig. 1).

Figure 1: Detection probability versus sensitivity depth for the S6-CasA-StackSlide- search (cf. Table 2 and Sec. A.3), using a detection threshold of . The squares indicate the results from a simulation in Gaussian noise, while the solid line gives the best-fit sigmoid of Eq. (59).

Another problem with this method is that the range of depths required to sample the relation often needs to be quite wide, due to initial uncertainties about where the UL value would be found, which can compound the above-mentioned sigmoid-fitting problem. Furthermore, the injection-recovery step can be quite computationally expensive, limiting the number of trials and further increasing the statistical uncertainty on the measurements.

Both of these problems can be mitigated to some extent by using the sensitivity-estimation method described in this paper (Sec. III) to obtain a fairly accurate initial guess about the expected UL value, and then sample only in a small region around this estimate, in which case even a linear fit would probably yield good accuracy.

iv.2 Piecewise-linear threshold interpolation

An alternative approach is used in this work to obtain the simulated ULs: for each set of fixed- injections and recoveries, we determine the threshold on the statistic required in order to obtain the desired detection fraction . This is illustrated in Fig. 2, which shows a histogram of the observed loudest candidates obtained in each of injection and recovery runs at a fixed signal depth of , using the S6-CasA-StackSlide- search setup (cf. Sec. A.3). By integrating the probability density from until we reach the desired value , we find the detection threshold at this signal depth . Repeating this procedure at different depths therefore generates a sampling of the function , illustrated in Fig. 3. These points can be interpolated to the required detection threshold, which yields the desired upper-limit depth .
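The threshold-interpolation procedure can be sketched as follows, using a toy model in which the loudest statistic at each depth is Gaussian-distributed (hypothetical numbers throughout): the per-depth threshold is the (1 - p_det) quantile of the sampled loudest values, and the final depth is obtained by piecewise-linear interpolation:

```python
import random

def threshold_for_pdet(loudest_samples, p_det):
    """Threshold such that a fraction p_det of the loudest values exceed it:
    the (1 - p_det) quantile of the sampled loudest-statistic distribution."""
    s = sorted(loudest_samples)
    idx = int((1.0 - p_det) * len(s))
    return s[min(idx, len(s) - 1)]

def interp_depth(thresholds, depths, target):
    """Piecewise-linear interpolation of depth(threshold) at `target`.
    Assumes thresholds are sorted in decreasing order (depth increasing)."""
    for (t0, d0), (t1, d1) in zip(zip(thresholds, depths),
                                  zip(thresholds[1:], depths[1:])):
        if t1 <= target <= t0:
            w = (target - t0) / (t1 - t0)
            return d0 + w * (d1 - d0)
    raise ValueError("target threshold outside sampled range")

# toy loudest-statistic samples at each depth (hypothetical model:
# Gaussian noise around a mean that falls with increasing depth)
rng = random.Random(2)
depths = [20, 30, 40, 50]
thresholds = [threshold_for_pdet(
    [rng.gauss(200.0 / d, 1.0) for _ in range(5000)], p_det=0.95)
    for d in depths]
D_UL = interp_depth(thresholds, depths, target=5.5)
```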

Figure 2: Histogram of recovered loudest values for repeated searches on signal injections at fixed sensitivity depth (with all other signal parameters randomized), using the search setup of the S6-CasA-StackSlide- directed search. The vertical line indicates the resulting threshold value corresponding to for this injection set.
Figure 3: Sensitivity depth versus detection threshold. Boxes and solid lines indicate the piecewise-linear interpolation through the obtained thresholds at different depths of an injection-recovery simulation, using the S6-CasA-StackSlide- search setup (Zhu et al. (2016) and Sec. A.3).

We see in Fig. 3 that this function appears to be less “curvy” in the region of interest compared to shown in Fig. 1. This allows for easier fitting and interpolation; for example, a linear or quadratic fit should work quite well. In fact, here we have simply used piecewise-linear interpolation, which is sufficient given our relatively fine sampling of signal depths.

As already mentioned in the previous section, using the sensitivity estimate of Sec. III one can determine the most relevant region of interest beforehand and focus the Monte-Carlo injection-recoveries on this region, which will help ensure that any simple interpolation method will work well.

Alternatively, for either the - or the -sampling approach, one could also use an iterative root-finding method to approach the desired or , respectively.
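Such an iterative root-finding step can be as simple as bisection on the depth, given any monotone detection-probability model (the logistic toy model below is purely illustrative):

```python
import math

def depth_for_pdet(pdet_of_depth, p_target, d_lo, d_hi, tol=0.01):
    """Bisection on sensitivity depth: p_det decreases with depth, so find
    the depth at which pdet_of_depth(depth) crosses p_target."""
    while d_hi - d_lo > tol:
        mid = 0.5 * (d_lo + d_hi)
        if pdet_of_depth(mid) > p_target:
            d_lo = mid   # still above target: can afford a larger depth
        else:
            d_hi = mid
    return 0.5 * (d_lo + d_hi)

# analytic toy model of a monotone p_det(depth), for illustration only
toy_pdet = lambda D: 1.0 / (1.0 + math.exp((D - 40.0) / 5.0))
D95 = depth_for_pdet(toy_pdet, 0.95, d_lo=10.0, d_hi=80.0)
```

In practice each evaluation of the detection probability requires a fresh Monte-Carlo injection set, so the bisection tolerance should be balanced against the statistical uncertainty of those sets.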

V Comparing estimates against simulated upper limits

In this section we compare the sensitivity estimates from Sec. III against simulated ULs for two example cases (an all-sky search and a directed search), in order to quantify the accuracy and reliability of the estimation method and implementation. This comparison shows generally good agreement, and also some instructive deviations.

Both examples are wide-parameter-space searches using a template bank over the unknown signal parameter dimensions (namely, {sky, frequency and spindown} in the all-sky case, and {frequency and first and second derivatives} in the directed-search case).

The simulated-UL procedure (see Sec. IV) performs a template-bank search over a box in parameter space containing the injected signal (at a randomized location) in Gaussian noise. On the other hand, the sensitivity estimate (cf. Eq. (45)) uses the mismatch distribution obtained for this template bank via injection-recovery box searches on signals without noise. We refer to this in the following as the box search.

It will be instructive to also consider the (unrealistic) case of a perfectly-matched search, using only a single template that matches the signal parameters perfectly for every injection, corresponding to zero mismatch in Eq. (45). We refer to this as the zero-mismatch search.

v.1 Example: S6-AllSky-StackSlide- search

In this example we use the setup of the all-sky search S6-AllSky-StackSlide- Abbott et al. (2016d), which used the GCT implementation Pletsch and Allen (2009) of the StackSlide- statistic and was performed on the volunteer-computing project Einstein@Home Ein (), see Table 1 and Sec. A.2 for more details.

Figure 4 shows the comparison between simulated ULs and estimated sensitivity depths versus threshold , for the box search (squares and solid line), as well as for the zero-mismatch search (crosses and dashed line).

Figure 4: Comparison of estimated and simulated sensitivity depth as a function of threshold for the S6-AllSky-StackSlide- search Abbott et al. (2016d). The solid line shows the UL estimate for the box search, and the squares () show the corresponding simulated ULs. The dashed line indicates the estimate for the zero-mismatch case, and the crosses () are for the simulated zero-mismatch ULs. In the box search we observe an increasing divergence at decreasing thresholds due to noise effects, discussed in Sec. V.1.

We see excellent agreement between estimated and simulated ULs for the zero-mismatch search. We also find very good agreement for the box-search at higher thresholds, while we see an increasing divergence of the simulated ULs at decreasing thresholds, which is not captured by the estimate.

This discrepancy can be understood as the effect of noise fluctuations, which can enter in two different ways (that are not completely independent of each other):

  1. For decreasing thresholds the corresponding false-alarm level Eq. (34) grows, as it becomes increasingly likely that a “pure noise” candidate (i.e. unrelated to a signal) crosses the threshold. In the extreme case where approaches , the frequentist upper limit would tend to zero, corresponding to a diverging sensitivity depth. This is illustrated in Fig. 5 showing the distribution of the loudest in a box search on pure Gaussian noise, which can be compared to the diverging depth of the simulated box search around in Fig. 4.

    Figure 5: Distribution of the loudest for a box search on pure Gaussian noise, using the S6-AllSky-StackSlide- search setup.

    We note that in practice the procedures used for measured ULs in CW searches typically make sure that the detection threshold has a very small false-alarm level, and we therefore expect this effect to have a negligible impact in cases of practical interest.

  2. The sensitivity estimate for wide-parameter-space searches makes the assumption that the loudest candidate is always found in the closest template to the signal (i.e. with the smallest mismatch ), as discussed in Sec. III.2. However, while the closest template has the highest expected statistic value (by definition), other templates can actually produce the loudest statistic value in any given noise realization. How likely that is to happen depends on the details of the parameter space, the template bank and the threshold. It will typically be more likely at lower thresholds, as more templates further away from the signal are given a chance to cross the threshold (despite their larger mismatch).

    The true distribution of a box search will therefore be shifted to higher values compared to the approximate distribution used in Eq. (42). This implies that an actual search can have a higher detection probability than predicted by the estimate (corresponding to a larger sensitivity depth).
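Effect (ii) is easy to demonstrate with a toy simulation: the maximum over many noisy templates, only one of which contains a (weak) signal, is on average larger than the statistic value in the signal template alone (all numbers are illustrative):

```python
import random

def loudest_shift(n_templates=100, n_trials=4000, seed=3):
    """Toy demonstration of effect (ii): compare the mean statistic in the
    closest (signal-bearing) template against the mean loudest statistic
    over all templates, for unit Gaussian noise plus a weak signal."""
    rng = random.Random(seed)
    closest, loudest = 0.0, 0.0
    for _ in range(n_trials):
        vals = [rng.gauss(0.0, 1.0) for _ in range(n_templates)]
        vals[0] += 1.5      # closest template carries the (weak) signal
        closest += vals[0]
        loudest += max(vals)
    return closest / n_trials, loudest / n_trials

mean_closest, mean_loudest = loudest_shift()
```

The weaker the signal and the larger the template bank, the larger this upward shift of the loudest-statistic distribution, consistent with the discrepancy growing at lower thresholds.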

Both of these effects contribute to different extents to the box-search discrepancy in Fig. 4 at lower thresholds:

The sampling distribution for in the presence of relatively strong signals at is shown in Fig. 6, both for a simulated box search as well as for the assumed distribution in the estimate.

Figure 6: Loudest distribution for a box-search (using the S6-AllSky-StackSlide- setup) with signals at a depth of . The black histogram shows the assumed distribution for sensitivity estimation in Eq. (42), and the lighter color shows the histogram obtained in a Monte-Carlo simulation with signals injected in Gaussian noise.

We see that most of the loudest candidates obtained in the simulation are above , and are therefore extremely unlikely to be due to noise alone, as seen from Fig. 5. The difference between the two distributions in Fig. 6 is therefore solely due to effect (ii). However, we see in Fig. 4 that the resulting discrepancy in the sensitivity estimate at is still very small.

For weaker signals at , we see in Fig. 7 that the corresponding distribution now overlaps with the pure-noise distribution of Fig. 5. The sensitivity depth therefore increasingly diverges for thresholds in the range