Fast and Accurate Sensitivity Estimation for Continuous Gravitational-Wave Searches
Abstract
This paper presents an efficient numerical sensitivity-estimation method and implementation for continuous-gravitational-wave searches, extending and generalizing an earlier analytic approach by Wette (2012). This estimation framework applies to a broad class of F-statistic-based search methods, namely (i) the semicoherent StackSlide F-statistic (single-stage and hierarchical multistage), (ii) the Hough number count on F-statistics, as well as (iii) Bayesian upper limits on (coherent or semicoherent) F-statistic search results. We test this estimate against results from Monte-Carlo simulations assuming Gaussian noise. We find the agreement to be within a few percent at high (i.e. low false-alarm) detection thresholds, with increasing deviations at decreasing (i.e. higher false-alarm) detection thresholds, which can be understood in terms of the approximations used in the estimate. We also provide an extensive summary of sensitivity depths achieved in past continuous-gravitational-wave searches (derived from the published upper limits). For the F-statistic-based searches where our sensitivity estimate is applicable, we find a small average relative deviation from the published upper limits, which in most cases is within the systematic uncertainty of the noise-floor estimate used in the published upper limits.
I Introduction
The recent detections of gravitational waves from merging binary-black-hole and binary-neutron-star systems Abbott et al. (2016a, b, 2017a) have opened a whole new observational window for astronomy, allowing for new tests of general relativity Abbott et al. (2016c), new constraints on neutron-star physics Abbott et al. (2018a) and new measurements of the Hubble constant Abbott et al. (2017b), to mention just a few highlights.
Continuous gravitational waves (CWs) from spinning non-axisymmetric neutron stars represent a different class of potentially observable signals Prix (2009); Lasky (2015), which have yet to be detected Riles (2017). These signals are expected to be long-lasting (at least several days to years) and quasi-monochromatic, with slowly changing (intrinsic) frequency. The signal amplitude depends on the rich (and largely not yet well-understood) internal physics of neutron stars Johnson-McDaniel and Owen (2013), as well as their population characteristics Camenzind (2007); Knispel and Allen (2008). A detection (or even non-detection) of CWs could therefore help us better understand these fascinating astrophysical objects, and may allow for new tests of general relativity Isi et al. (2017); Abbott et al. (2018b).
Overview of search categories
We can categorize CW searches in two different ways: either based on the search method, or on the type of explored parameter space.
The search methods fall into two broad categories: coherent and semicoherent (sometimes also referred to as incoherent). Roughly speaking, a coherent search is based on signal templates with coherent phase evolution over the whole observation time, while semicoherent searches typically break the data into shorter coherent segments and combine the resulting statistics from these segments incoherently (i.e. without requiring a consistent phase evolution across segments). However, there are many different approaches and variations, which are beyond the scope of this paper; see e.g. Riles (2017) for a more detailed overview. Here we will exclusively focus on coherent and semicoherent methods based on the F-statistic, which will be introduced in Sec. II.
Coherent search methods are more sensitive in principle, but in practice they usually suffer from severe computing-cost limitations: for finite search parameter spaces the required number of signal templates typically grows as a steep power law of the observation time, making such searches infeasible except when the search region is sufficiently small. For larger signal parameter spaces the observation time needs to be kept short enough for the search to be computationally feasible, which limits the attainable coherent sensitivity. This is where semicoherent searches tend to yield substantially better sensitivity at fixed computing cost (e.g. see Brady and Creighton (2000); Prix and Shaltev (2012)).
Based on the explored parameter space, we distinguish the following search categories (referencing a recent example for each case):

Targeted searches for known pulsars Manchester et al. (2005) assume a perfect fixed relationship between the observed neutron-star spin frequency and the CW emission frequency. Therefore one only needs to search a single point in parameter space for each pulsar, allowing for optimal fully-coherent searches Abbott et al. (2017c).

Narrow-band searches for known pulsars assume a small uncertainty in the relationship between the CW frequency and the measured pulsar spin rates. This finite search parameter space requires a template bank with (typically) many millions of templates, while still allowing for optimal fully-coherent search methods to be used Abbott et al. (2017d).

(Directed) binary searches aim at binary systems with known sky position and parameter-space uncertainties in the frequency and binary-orbital parameters. Typically these sources would be in low-mass X-ray binaries, with the most prominent example being Scorpius X-1 (Sco X-1) Abbott et al. (2017e, f).
Sensitivity estimation
In this work we use the term sensitivity to refer to the upper limit on the signal amplitude h0 (or equivalently the sensitivity depth D, see Sec. II.5). This can be either the frequentist upper limit for a given detection probability at a fixed false-alarm level (p-value), or the Bayesian upper limit at a given credible level for the given data.
Sensitivity therefore only captures one aspect of a search, namely how “deep” into the noise floor it can detect signals, without accounting for how “wide” a region in parameter space is covered, how much prior weight this region contains, or how robust the search is to deviations from the signal model. Comparing sensitivity depths therefore only makes sense for searches over very similar parameter spaces. A more complete measure characterizing searches would be their respective detection probability, which folds in sensitivity depth, breadth in parameter space, as well as the prior weight contained in that space Ming et al. (2016, 2018).
However, it is often useful to be able to reliably and cheaply estimate the sensitivity of a search setup without needing expensive MonteCarlo simulations:

In order to determine optimal search parameters for a semicoherent search (i.e. the number and length of semicoherent segments and the template-bank mismatch parameters), it is important to be able to quickly assess the projected sensitivity for any given search-parameter combination (e.g. see Prix and Shaltev (2012); Leaci and Prix (2015); Ming et al. (2016, 2018)).

For setting upper limits for a given search, one typically has to repeatedly add software-generated CW signals to the data and perform a search, in order to measure how often these signals are recovered above a given threshold. By iterating this procedure one tries to find the weakest signal amplitude that can be recovered at the desired detection probability (or “confidence”). This can be very computationally expensive, and a quick and reasonably reliable estimate for the expected upper-limit amplitude can therefore substantially cut down on the cost of this iterative process, which can also improve the accuracy of the upper limit.

The estimate can also serve as a sanity check when determining upper limits.
A number of theoretical sensitivity estimates have been developed over the past decades. One of the first estimates was obtained for a coherent F-statistic search Abbott et al. (2004), yielding
(1) h0^UL ≈ 11.4 √(Sn / T_data)
for a 90% confidence upper limit at 1% false-alarm probability (per template). Sn denotes the (single-sided) noise power spectral density, and T_data is the total amount of data. This was later generalized to the semicoherent Hough Krishnan et al. (2004) and StackSlide methods Mendell and Landry (2005); Abbott et al. (2008a), yielding an expression of the form
(2) h0^UL ∝ N^{1/4} √(Sn / T_data)
for the same confidence and false-alarm level as Eq. (1), and where N denotes the number of semicoherent segments.
These latter results suggested the inaccurate idea that the sensitivity of semicoherent searches follows an exact N^{1/4} scaling. However, this was later shown Prix and Shaltev (2012); Wette (2012) to not be generally a good approximation except asymptotically in the limit of a large number of segments (N ≫ 1).
Furthermore, these past sensitivity estimates relied on the assumption of a “constant signal-to-noise ratio (SNR)” population of signals. While this approximation substantially simplifies the problem, it introduces a noticeable bias into the estimate, as discussed in more detail in Wette (2012). For example, the constant-SNR bias combined with the incorrect N^{1/4} scaling in Eq. (2) would result in an overestimate by a factor of two of the sensitivity of the first Einstein@Home search on LIGO S5 data Aasi et al. (2013a).
These limitations of previous sensitivity estimates were eventually overcome by the analytic sensitivity-estimation method developed by Wette (2012) for semicoherent StackSlide F-statistic searches. In this work we simplify and extend this framework by employing a direct numerical implementation, which further improves the estimation accuracy by requiring fewer approximations. It also allows us to generalize the framework to multistage hierarchical StackSlide searches, Hough searches (such as Aasi et al. (2013a)), as well as to Bayesian upper limits based on F-statistic searches.
Plan of this paper
Sec. II provides a description of the CW signal model and introduces different F-statistic-based search methods. In Sec. III we present the sensitivity-estimation framework and its implementation, for both frequentist and Bayesian upper limits. Section IV discusses how (frequentist) upper limits are typically measured using Monte-Carlo injection-recovery simulations. Section V provides comparisons of our sensitivity estimates to simulated upper limits in Gaussian noise, while in Sec. VI we provide a comprehensive summary of published sensitivities of past CW searches (translated into sensitivity depth), and a comparison to our sensitivity estimates where applicable. We summarize and discuss the results in Sec. VII. Further details on the referenced searches and upper limits are given in appendix A. More technical details on the signal model can be found in appendix B. Finally, appendix C contains a discussion of the distribution of the maximum F-statistic over correlated templates.
II F-statistic-based search methods
This section provides an overview of the F-statistic-based search methods for which sensitivity estimates are derived in Sec. III. Further technical details about the signal model and the F-statistic are given in appendix B. For a broader review of the CW signal model, assumptions and search methods, see for example Prix (2009); Lasky (2015); Riles (2017).
II.1 Signal model
For the purpose of sensitivity estimation we assume the data time series x(t) from each detector to be described by Gaussian noise n(t), i.e. with zero mean and (single-sided) power-spectral density (PSD) Sn. A gravitational-wave signal creates an additional strain h(t) in the detector, resulting in a time series
(3) x(t) = n(t) + h(t).
For continuous gravitational waves the two polarization components can be written as
(4) h_+(τ) = A_+ cos φ(τ),  h_×(τ) = A_× sin φ(τ),
where φ(τ) describes the phase evolution of the signal in the source frame. For the typical quasi-periodic signals expected from rotating neutron stars, this can be expressed as a Taylor-series expansion around a chosen reference time τ_ref (here τ_ref = 0 for simplicity) as
(5) φ(τ) = φ_0 + 2π Σ_{k≥0} f^{(k)} τ^{k+1} / (k+1)!
in terms of derivatives f^{(k)} of the slowly-varying intrinsic CW frequency f(τ). For a triaxial neutron star spinning about a principal axis, the two polarization amplitudes are given by
(6) A_+ = (h0/2) (1 + cos²ι),  A_× = h0 cos ι,
in terms of the angle ι between the line of sight and the neutron-star rotation axis, and the overall signal amplitude h0. This definition uses a common gauge condition. After translating the source-frame signal into the detector frame (see appendix B for details), the strain signal at each detector can be expressed in the factored form
(7) h(t; A, λ) = Σ_{μ=1}^{4} A^μ h_μ(t; λ),
which was first shown in Jaranowski et al. (1998), and where the four amplitudes A^μ depend on the amplitude parameters A ≡ {h0, cos ι, ψ, φ_0} as given in Eq. (64). The four basis functions h_μ(t; λ), which are given explicitly in Eq. (65), depend on the phase-evolution parameters λ, namely sky position, frequency and its derivatives {f^{(k)}}, and binary-orbital parameters in the case of a neutron star in a binary.
II.2 Coherent F-statistic
For pure Gaussian-noise time series x(t) = n(t) in all detectors, the likelihood can be written as (e.g. see Finn (1992); Cutler and Schutz (2005); Prix (2007)):
(8) P(x | h = 0) ∝ exp[−(1/2) ⟨x | x⟩],
in terms of the multi-detector scalar product
(9) ⟨x | y⟩ ≡ 4 Re Σ_X ∫_0^∞ [x̃^X(f) ỹ^X*(f) / S^X(f)] df,
where x̃(f) denotes the Fourier transform of x(t), and ỹ* denotes complex conjugation of ỹ. Using the additivity of noise and signals (cf. Eq. (3)), we can express the likelihood for data x, assuming Gaussian noise plus a signal h(t; A, λ), as
(10) P(x | A, λ) ∝ exp[−(1/2) ⟨x − h | x − h⟩].
From this we obtain the log-likelihood ratio between the signal and noise hypotheses as
(11) ln Λ(x; A, λ) = ⟨x | h⟩ − (1/2) ⟨h | h⟩.
Analytically maximizing the log-likelihood ratio over the four amplitudes A^μ (cf. appendix B) yields the F-statistic Jaranowski et al. (1998):
(12) 2F(x; λ) ≡ 2 max_A ln Λ(x; A, λ) = x_μ M^{μν} x_ν,
in terms of the projections x_μ ≡ ⟨x | h_μ⟩ and the inverse M^{μν} of the antenna-pattern matrix M_{μν} ≡ ⟨h_μ | h_ν⟩.
The statistic 2F follows a (noncentral) χ² distribution with four degrees of freedom and noncentrality ρ²,
(13) 2F ∼ χ²₄(ρ²),
with expectation and variance
(14) E[2F] = 4 + ρ²,  Var[2F] = 2 (4 + 2ρ²),
where ρ corresponds to the signal-to-noise ratio (SNR) of coherent matched filtering.
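As a hedged numerical illustration (a Python/numpy sketch, not part of the paper's implementation), the noncentral χ² character of 2F is easy to verify by direct sampling; the noncentrality value used below is an arbitrary example:

```python
import numpy as np

def sample_2F(rho2, size, rng):
    """Draw samples of the coherent 2F statistic: a noncentral chi^2
    with 4 degrees of freedom and noncentrality rho2, built directly
    as a sum of squared Gaussians (no scipy needed)."""
    z = rng.standard_normal((size, 4))
    z[:, 0] += np.sqrt(rho2)  # put the full noncentrality on one component
    return np.sum(z**2, axis=1)

rng = np.random.default_rng(42)
rho2 = 30.0
x = sample_2F(rho2, 200_000, rng)
mean, var = x.mean(), x.var()
# Should approximate E[2F] = 4 + rho2 and Var[2F] = 2*(4 + 2*rho2), cf. Eq. (14).
```

The same construction generalizes to any number of degrees of freedom, which is used again for the semicoherent statistics below.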
In the perfect-match case, where the template phase-evolution parameters λ coincide with the parameters λ_s of a signal in the data, the SNR can be explicitly expressed as
(15) ρ² = R² (h0² / S) T_data,
where T_data is the total amount (measured as time) of data over all detectors, and S denotes the multi-detector noise floor, defined as the harmonic mean over the per-detector PSDs S^X, namely
(16) S ≡ N_det [Σ_X (S^X)^{-1}]^{-1}.
Note that in practice the F-statistic-based search implementations do not assume stationary noise over the whole observation time, but only over short durations of order T_SFT, corresponding to the length of the Short Fourier Transforms (SFTs) that are typically used as input data. The present formalism can straightforwardly be extended to this case Prix (2015), where the relevant overall multi-detector noise-PSD definition generalizes as the harmonic mean over all SFTs, namely
(17) S ≡ N_SFT [Σ_α (S_α)^{-1}]^{-1},
where α is an index enumerating all SFTs (over all detectors), and S_α is the corresponding noise PSD estimated for SFT α.
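The harmonic-mean PSD of Eq. (17) can be sketched in a few lines; the PSD values and SFT counts below are made-up illustrative numbers:

```python
import numpy as np

def harmonic_mean_psd(sft_psds):
    """Overall multi-detector noise PSD S of Eq. (17): the harmonic
    mean of the per-SFT PSD estimates S_alpha over all SFTs and all
    detectors."""
    sft_psds = np.asarray(sft_psds, dtype=float)
    return sft_psds.size / np.sum(1.0 / sft_psds)

# Illustrative example: two detectors with stationary noise floors of
# 4e-46 and 9e-46 (strain^2/Hz), contributing 100 SFTs each:
psds = np.concatenate([np.full(100, 4e-46), np.full(100, 9e-46)])
S = harmonic_mean_psd(psds)  # harmonic mean of 4 and 9 is 72/13 ~ 5.54
```

Note that the harmonic mean weights the quieter detector more strongly than an arithmetic mean would, as appropriate for matched filtering.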
The response function R (following the definition in Wette (2012)) depends on the subset of signal parameters
(18) θ ≡ {sky position, polarization angle ψ, inclination cos ι},
and can be explicitly expressed Prix (2010) as
(19) 
with the sky-dependent antenna-pattern coefficients of Eq. (69), and
(20)  
(21)  
(22) 
One can show that R² averaged over ψ and cos ι yields a purely sky-dependent expression
(23)
and further averaging isotropically over the sky yields
(24) ⟨R²⟩ = 4/25.
Using this with Eq. (15) we can therefore recover the sky- and polarization-averaged squared-SNR expression (e.g. see Jaranowski et al. (1998)):
(25) ⟨ρ²⟩ = (4/25) (h0² / S) T_data.
II.3 Semicoherent F-statistic methods
Semicoherent methods Brady and Creighton (2000) typically divide the data into N shorter segments of duration T_seg. The segments are analyzed coherently, and the per-segment detection statistics are combined incoherently. Generally this yields lower sensitivity than a fully-coherent search over the same amount of data. However, the computational cost for a fully-coherent search over the same amount of data is often impossibly large, while the semicoherent cost can be tuned to be affordable, and a semicoherent search typically ends up being more sensitive at fixed computing cost Brady and Creighton (2000); Cutler et al. (2005); Prix and Shaltev (2012).
There are a number of different semicoherent methods currently in use, such as PowerFlux, FrequencyHough, SkyHough, TwoSpect, CrossCorr, Viterbi, Sideband, loosely-coherent statistics and others (e.g. see Riles (2017) and references therein). Many of these methods work on short segments of the SFT length T_SFT, and use the Fourier power in the frequency bins of these Short Fourier Transforms (SFTs) as the coherent base statistic.
In this work we focus exclusively on sensitivity estimation of F-statistic-based methods, namely StackSlide (e.g. see Prix and Shaltev (2012)) and the Hough method introduced in Krishnan et al. (2004). Here the length of segments is only constrained by the available computing budget, and segments will typically span many hours to days, which yields better sensitivity but also incurs higher computational cost. Therefore, many of the computationally expensive semicoherent F-statistic searches are run on the distributed Einstein@Home computing platform.
Note that these methods are not to be confused with the (albeit closely related) “classical” StackSlide and Hough methods, which use SFTs as coherent segments, as described for example in Abbott et al. (2008a).
II.3.1 StackSlide: summing F-statistics
The StackSlide method uses the sum of the coherent per-segment F-statistic values in a given parameter-space point as the detection statistic, namely
(26) 2F_sum ≡ Σ_{ℓ=1}^{N} 2F_ℓ,
where 2F_ℓ is the coherent F-statistic of Eq. (12) in segment ℓ. This statistic follows a (noncentral) χ² distribution with 4N degrees of freedom and noncentrality ρ², i.e.
(27) 2F_sum ∼ χ²_{4N}(ρ²),
where the noncentrality ρ² is identical to the expression for the coherent squared SNR of Eq. (15), with T_data referring to the whole data set used, and S the corresponding noise floor. However, the noncentrality in the semicoherent case cannot be considered a “signal-to-noise ratio”, due to the larger N-dependent number of degrees of freedom of the χ² distribution compared to Eq. (13), which increases the false-alarm level at fixed threshold and reduces the “effective” semicoherent SNR (e.g. see [Eq. (14)] in Wette (2015)).
The expectation and variance for 2F_sum are
(28) E[2F_sum] = 4N + ρ²,  Var[2F_sum] = 2 (4N + 2ρ²).
We note that in practice StackSlide searches often quote the average over segments instead of the sum, i.e.
(29) 2F̄ ≡ (1/N) Σ_{ℓ=1}^{N} 2F_ℓ = 2F_sum / N.
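As a hedged illustration (a numpy sketch with arbitrary example values, not the paper's implementation), the semicoherent statistic's χ²_{4N} distribution can be verified by sampling, exactly as in the coherent case but with 4N Gaussian components:

```python
import numpy as np

def sample_sum_2F(N, rho2, size, rng):
    """Draw samples of the semicoherent StackSlide statistic of
    Eq. (26): a noncentral chi^2 with 4*N degrees of freedom and
    total noncentrality rho2 (only the summed noncentrality matters
    for the distribution, not how it splits across segments)."""
    z = rng.standard_normal((size, 4 * N))
    z[:, 0] += np.sqrt(rho2)
    return np.sum(z**2, axis=1)

rng = np.random.default_rng(7)
N, rho2 = 50, 60.0
x = sample_sum_2F(N, rho2, 100_000, rng)
# Should approximate E = 4*N + rho2 = 260 and Var = 2*(4*N + 2*rho2) = 640,
# cf. Eq. (28).
```

The growing 4N noise floor in the expectation value is exactly the reason the noncentrality can no longer be read as an SNR, as noted above.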
II.3.2 Hough: summing threshold crossings
The Hough method Krishnan et al. (2004) sets a threshold 2F_th on the per-segment coherent F-statistics and uses the number of threshold crossings over the N segments as the detection statistic, the so-called Hough number count n_c, i.e.
(30) n_c ≡ Σ_{ℓ=1}^{N} θ(2F_ℓ − 2F_th),
where θ(x) is the Heaviside step function.
II.4 Mismatch and template banks
In wide-parameter-space searches the unknown signal parameters λ_s are assumed to fall somewhere within a given search space P. In this case one needs to compute a statistic (such as those defined in the previous sections) over a whole “bank” of templates {λ_i} covering P. This template bank has to be chosen in such a way that any putative signal would suffer only an acceptable level of loss of SNR. This is typically quantified in terms of the so-called mismatch μ, defined as the relative loss of ρ² in a template λ with respect to the perfect-match ρ²_s (of Eq. (15)), namely
(31) μ ≡ [ρ²_s − ρ²(λ)] / ρ²_s ∈ [0, 1].
We can therefore express the “effective” noncentrality parameter in a template point in the F-statistic distribution of Eqs. (13),(27) as
(32) ρ²_eff = (1 − μ) ρ²_s.
II.5 Sensitivity Depth
The F-statistic noncentrality parameter ρ² depends on the signal amplitude h0 and overall noise floor S (cf. Eq. (17)) only through the combination h0²/S, as seen in Eq. (15). The sensitivity of a search is therefore most naturally characterized in terms of the so-called sensitivity depth D Behnke et al. (2015), defined as
(33) D ≡ √S / h0,
in terms of the overall noise PSD S defined as the harmonic mean over all SFTs used in the search, see Eq. (17).
A particular choice of search parameters (N, T_data, template bank) will generally yield a frequency-dependent upper limit h0^UL(f), due to the frequency-dependent noise floor S(f). However, for fixed search parameters this will correspond to a constant sensitivity depth D^UL, which is therefore often a more practical and natural way to characterize the performance of a search, independently of the noise floor.
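The depth/amplitude conversion is trivial but worth making concrete; the noise-floor numbers below are made up for illustration (a Python sketch, not the OctApps code):

```python
import numpy as np

def depth(sqrt_Sn, h0):
    """Sensitivity depth D = sqrt(S)/h0 of Eq. (33), in 1/sqrt(Hz)
    for sqrt(S) in strain/sqrt(Hz)."""
    return sqrt_Sn / h0

def h0_upper_limit(sqrt_Sn, D):
    """Invert a (frequency-independent) depth D^UL into the
    frequency-dependent amplitude upper limit h0^UL(f)."""
    return sqrt_Sn / D

# Illustrative: a search with constant depth 50 /sqrt(Hz) over a
# frequency-dependent noise floor yields a frequency-dependent h0 UL:
sqrt_Sn = np.array([4e-23, 5e-23, 4.5e-23])  # strain/sqrt(Hz), made up
h0_UL = h0_upper_limit(sqrt_Sn, 50.0)
```

This is why quoting a single depth value summarizes an entire frequency-dependent upper-limit curve for fixed search parameters.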
III Sensitivity Estimate
As discussed in more detail in the introduction, by sensitivity we mean the (measured or expected) upper limit on for a given search (or equivalently, the sensitivity depth ), which can either refer to the frequentist or Bayesian upper limit.
III.1 Frequentist upper limits
The frequentist upper limit is defined as the weakest signal amplitude that can be detected at a given detection probability p_det, for a detection threshold on the statistic corresponding to a false-alarm probability in pure noise of
(34) p_fa = P(stat > threshold; h0 = 0),
which can be inverted to yield the threshold. The detection probability for signals of amplitude h0 is
(35) p_det(h0) = P(stat > threshold; h0),
which can be inverted to yield the upper limit h0^UL.
We can write p_fa = p_det(h0 = 0), and so we can express both in terms of the general threshold-crossing probability
(36) P(stat > threshold; h0).
III.2 Approximating wide-parameter-space statistics
As discussed in Sec. II.4, a wide-parameter-space search for unknown signals typically proceeds by computing a (single-template) statistic over a bank of templates {λ_i} covering the parameter space P. This results in a corresponding set of (single-template) statistic values, which need to be combined to form the overall wide-parameter-space statistic. This would naturally be obtained via marginalization (i.e. integrating the likelihood over P), but in practice is mostly done by maximizing the single-template statistic over the template bank, i.e.
(37) stat_max ≡ max_{λ_i ∈ P} stat(λ_i).
Noise case: estimating the p-value
For the pure noise case of Eq. (34), it is difficult to determine a reliable expression for p_fa, even if the single-template statistic follows a known distribution (such as for the F-statistic-based statistics discussed in Sec. II). The reason for this difficulty lies in the correlations that generally exist between “nearby” templates in parameter space.
If all templates were strictly uncorrelated, one could use the well-known expression Eq. (71) Abadie et al. (2010); Wette (2012) for the distribution of the maximum. In this case one can also relate the single-trial p-value to the wide-parameter-space p-value (for small p-values).
Although it is a common assumption in the literature, template correlations do not simply modify the “effective” number of independent templates to use in Eq. (71), but they generally also affect the functional form of the underlying distribution for the maximum statistic, as illustrated in appendix C with a simple toy model.
In this work we assume that the upper limit refers to a known detection threshold in Eq. (35). This can be obtained either from (i) the loudest observed candidate (the most common situation in real searches), or from (ii) setting a single-template p-value and inverting the known single-template distribution Eq. (34), or from (iii) a numerically-obtained relation between p_fa and the threshold, e.g. via Monte-Carlo simulation.
Signal case: estimating the detection probability
In the signal case it is easier to estimate the maximum statistic over the full template bank, provided we can assume that the highest value of the statistic will be realized near the signal location. This should be true as long as the p-value is low and the signals have relatively high detection probability. This will typically be a good approximation, but in Sec. V we will also encounter situations where deviations from the predictions can be traced to violations of these assumptions. We therefore approximate
(38) stat_max(x) ≈ stat(x; λ_cl),
where λ_cl is the “closest” template to the signal location λ_s, defined in terms of the mismatch of Eq. (31), namely the template with the smallest mismatch μ_min from the signal. This template yields the highest effective noncentrality parameter over the template bank, namely
(39) ρ²_eff = (1 − μ_min) ρ²_s.
III.3 StackSlide sensitivity
We first consider a semicoherent StackSlide search using the summed F-statistic of Eq. (26). This case also includes fully-coherent F-statistic searches, which simply correspond to the special case N = 1.
We see from Eq. (36) that in order to estimate the sensitivity, we need to know the threshold-crossing probability P(2F_sum > threshold; h0). This can be obtained via marginalization (at fixed h0) of the known distribution of Eq. (27), combined with the assumption Eq. (39) that the highest statistic value will occur in the “closest” template, with mismatch distribution p(μ):
(40) P(2F_sum > threshold; h0) = ∫ p(θ) dθ ∫ p(μ) dμ P(χ²_{4N}(ρ²_eff) > threshold),
where ρ²_eff = (1 − μ) ρ² in terms of the perfect-match noncentrality ρ² defined in Eq. (15), and in the last step we used the fact that the distribution for 2F_sum is fully specified in terms of the noncentrality parameter of the χ² distribution with 4N degrees of freedom, as given in Eq. (27).
Equation (40) requires five-dimensional integration for each sensitivity estimation, which would be slow and cumbersome. One of the key insights in Wette (2012) was to notice that the perfect-match SNR of Eq. (15) depends on the four parameters θ only through the scalar R², and we can therefore use a reparametrization
(41) p(R²) = ∫_{θ(R²)} p(θ) dθ,
where θ(R²) denotes the subspace of θ values yielding a particular R² from Eq. (19).
The one-dimensional distribution p(R²) can be obtained by Monte-Carlo sampling over the priors of sky position (typically either isotropically over the whole sky, or a single sky position in the case of a directed search) and the polarization parameters ψ (uniform in [−π/4, π/4]) and cos ι (uniform in [−1, 1]). The resulting values of R² are histogrammed and used as an approximation for p(R²), which can be reused for repeated sensitivity estimations with the same priors. We can therefore rewrite Eq. (40) as
(42) P(2F_sum > threshold; h0) = ∫ p(R²) dR² ∫ p(μ) dμ P(χ²_{4N}(ρ²_eff) > threshold),
with
(43) P(χ²_{4N}(ρ²_eff) > threshold) the survival function of the noncentral χ² distribution with 4N degrees of freedom, and
(44) ρ²_eff = (1 − μ) R² (h0² / S) T_data.
The mismatch distribution p(μ) for any given search will typically be obtained via injection-recovery Monte-Carlo simulation, where signals are repeatedly generated (without noise) and searched for over the template bank, obtaining the corresponding mismatch for each injection. This is a common step in validating a search and template-bank setup. Alternatively, for some search methods precomputed estimates for the mismatch distributions exist as a function of the template-bank parameters, e.g. for the Weave search code Wette et al. (2018).
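The marginalization of Eq. (42) can be sketched numerically with a simple Monte-Carlo sum; note that the uniform R² and mismatch samples below are crude toy stand-ins for the real p(R²) histogram and measured mismatch distribution, and all parameter values are arbitrary examples (a Python sketch, not the OctApps implementation):

```python
import numpy as np

def stackslide_pdet(depth, N, T_data, thresh, R2, mu, rng):
    """Monte-Carlo estimate of the threshold-crossing probability of
    Eq. (42): for each sample of the geometric factor R^2 and mismatch
    mu, draw one noncentral chi^2(4N) value with effective noncentrality
    rho2_eff = (1 - mu) * R^2 * T_data / depth^2 (since h0^2/S = 1/D^2
    for sensitivity depth D), and count threshold crossings."""
    rho2 = (1.0 - mu) * R2 * T_data / depth**2
    z = rng.standard_normal((rho2.size, 4 * N))
    z[:, 0] += np.sqrt(rho2)
    return np.mean(np.sum(z**2, axis=1) > thresh)

rng = np.random.default_rng(0)
n = 50_000
R2 = rng.uniform(0.05, 0.3, n)   # toy stand-in for the p(R^2) histogram
mu = rng.uniform(0.0, 0.4, n)    # toy stand-in for the mismatch distribution
p_shallow = stackslide_pdet(40.0, 10, 1e6, 100.0, R2, mu, rng)
p_deep = stackslide_pdet(100.0, 10, 1e6, 100.0, R2, mu, rng)
# Detection probability falls with increasing depth (weaker signals).
```

Inverting such a curve p_det(D) for the depth at a target detection probability then yields the sensitivity estimate.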
III.4 Multistage StackSlide sensitivity
The sensitivity estimate for a single StackSlide search can be generalized to hierarchical multistage searches, where threshold-crossing candidates of one search stage are followed up by deeper subsequent searches in order to increase the overall sensitivity (e.g. see Brady and Creighton (2000); Cutler et al. (2005); Miroslav Shaltev (2013); Papa et al. (2016); Abbott et al. (2017b)). We denote the stages with an index s = 1, 2, …. Each stage is characterized by the number of segments N_s, the amount of data T_s, the noise PSD S_s, a mismatch distribution p_s(μ), and a threshold (corresponding to a false-alarm level at that stage).
The initial wide-parameter-space search (stage s = 1) yields candidates that cross the threshold in certain templates. The next stage follows up these candidates with a more sensitive search, which can be achieved by reducing the mismatch (choosing a finer template-bank grid), or by increasing the coherent segment length (and reducing the number of segments N_s). Often the final stage in such a follow-up hierarchy would be fully coherent, i.e. N = 1.
In order for any given candidate (which can be either due to noise or a signal) to cross the final threshold, it has to cross all previous thresholds as well; in other words Eqs. (34),(35) now generalize to
(47) P(⋀_s stat_s > threshold_s; h0).
In order to make progress at this point we need to assume that the threshold-crossing probabilities in different stages are independent of each other, so we assume
(48) P(⋀_s stat_s > threshold_s; h0) ≈ ∏_s P(stat_s > threshold_s; h0),
which would be exactly true if the different stages used different data (see also Cutler et al. (2005)). In the case where the same data is used in different stages, this approximation corresponds to an uninformative approach, in the sense that we do not know how to quantify and take into account the correlations between the statistics in different stages. We proceed without using this potential information, which could in principle be used to improve the estimate. It is not clear if and how much of an overall bias this approximation would introduce. A detailed study of this question is beyond the scope of this work and will be left for future study.
Using the assumption of independent stages we write
(49)  
(50) 
where now the marginalization needs to happen over all stages combined, as the signal parameters θ are intrinsic to the signal and therefore independent of the stage. On the other hand, the mismatch distribution depends on the stage, as each stage will typically use a different template grid, and so we have
(51)
where the per-stage threshold-crossing probability is given by Eq. (46) using the respective per-stage values.
Equation (49) can easily be solved numerically and inverted for the sensitivity h0^UL at given p_det and a set of thresholds.
Note that in practice one would typically Papa et al. (2016) want to choose the thresholds in such a way that a signal that passed the first-stage threshold should have a very low probability of being discarded by subsequent stages; in other words, the per-stage detection probabilities for stages s > 1 should be close to unity, and therefore p_det ≈ p_det,1. Subsequent stages then mostly serve to reduce the total false-alarm level, allowing one to increase the first-stage false-alarm level by lowering the corresponding threshold, resulting in an overall increased sensitivity.
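The threshold-design logic above can be made concrete with the independence approximation of Eq. (48); the per-stage probabilities below are assumed illustrative numbers, not values from the paper:

```python
import numpy as np

def multistage_prob(per_stage):
    """Under the independence approximation of Eq. (48), the
    probability of crossing all stage thresholds is the product of
    the per-stage threshold-crossing probabilities."""
    return float(np.prod(per_stage))

# Assumed example: follow-up thresholds chosen so that stages s > 1
# almost never lose a real signal, keeping the total detection
# probability close to the first-stage value ...
p_det_stages = [0.90, 0.999, 0.998]
p_det_total = multistage_prob(p_det_stages)   # ~ 0.897
# ... while the per-stage false-alarm probabilities multiply down fast:
p_fa_stages = [1e-2, 1e-3, 1e-3]
p_fa_total = multistage_prob(p_fa_stages)     # ~ 1e-8
```

This is the quantitative sense in which follow-up stages "buy" a lower total false-alarm level at almost no cost in detection probability.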
III.5 Hough sensitivity
Here we apply the sensitivity-estimation framework to the Hough statistic introduced in Sec. II.3.2. The key approximation we use here is to assume that for a given signal, the coherent per-segment F-statistic has the same threshold-crossing probability η in each segment ℓ, i.e. η_ℓ = η for all ℓ, with
(52) η = P(χ²₄(ρ²_eff) > 2F_th),
where the per-segment effective SNR ρ²_eff is given by Eq. (44) with T_data and S referring to the respective per-segment quantities, and μ is the mismatch of the F-statistic in a single segment.
Provided these quantities are reasonably constant across segments, for a fixed signal we can write the probability for the Hough number count of Eq. (30) as a binomial distribution, namely
(53) P(n_c | η) = (N choose n_c) η^{n_c} (1 − η)^{N − n_c},
with η given by Eq. (52). For a given threshold n_th on the number count we therefore have the detection probability
(54) p_det(η) = Σ_{n = n_th}^{N} (N choose n) η^n (1 − η)^{N − n},
and marginalization over R² and μ yields the corresponding detection probability at fixed amplitude h0, namely
(55)
We can numerically solve this for h0^UL at given p_det and number-count threshold n_th, which yields the desired sensitivity estimate.
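The binomial building block of Eqs. (53),(54) is easy to sketch; the marginalization over R² and μ is omitted here, and the per-segment crossing probabilities below are arbitrary example values (a Python sketch, not the OctApps SensitivityDepthHoughF() code):

```python
from math import comb

def hough_pdet(N, eta, n_th):
    """Detection probability of Eq. (54) at fixed per-segment crossing
    probability eta: P(n_c >= n_th) for a binomial number count
    n_c ~ Binomial(N, eta), cf. Eq. (53)."""
    return sum(comb(N, n) * eta**n * (1.0 - eta)**(N - n)
               for n in range(n_th, N + 1))

# A stronger signal (larger eta) crosses a fixed number-count
# threshold with higher probability:
p_weak = hough_pdet(100, 0.10, 20)
p_strong = hough_pdet(100, 0.30, 20)
```

Solving hough_pdet for the η (and hence depth) that reaches a target detection probability mirrors the inversion described in the text.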
III.6 Bayesian Upper Limits
Bayesian upper limits are conceptually quite different Röver et al. (2011) from the frequentist ones discussed up to this point. A Bayesian upper limit h0^UL at a given confidence (or “credible level”) C corresponds to the interval [0, h0^UL] that contains the true value of h0 with probability C. We can compute this from the posterior distribution P(h0 | x) for the signal amplitude h0 given data x, namely
(56) C = ∫_0^{h0^UL} P(h0 | x) dh0.
The Bayesian targeted searches (here referred to as BayesPE) for known pulsars (see Table 5 and Sec. A.5) compute the posterior directly from the data x, using a time-domain method introduced in Dupuis and Woan (2005).
Here we focus instead on F-statistic-based searches over a template bank. As discussed in Röver et al. (2011), to a very good approximation we can compute the posterior from the loudest candidate 2F_max found in such a search, using this as a proxy for the data x, i.e.
(57) P(h0 | x) ≈ P(h0 | 2F_max)
(58) ∝ P(h0) P(2F_max | h0),
where we used Bayes’ theorem, and the proportionality constant is determined by the normalization condition ∫_0^∞ P(h0 | 2F_max) dh0 = 1.
We have already derived the expression for P(2F_max | h0) in Eq. (42), and for any choice of prior P(h0) we can therefore easily compute the Bayesian upper limit for a given loudest candidate by inverting Eq. (56).
It is common for Bayesian upper limits on the amplitude h0 to choose a uniform (improper) prior on h0 (e.g. see Abbott et al. (2017c)), which has the benefit of simplicity, but also puts relatively more weight on larger values of h0 than might be physically expected (weaker signals should be more likely than stronger ones). This prior therefore results in larger, i.e. “more conservative”, upper limits than a more physical prior would.
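The loudest-candidate posterior of Eqs. (56)-(58) can be sketched numerically; here a single coherent template (χ²₄ likelihood) with a toy, hypothetical scaling rho2(h0) stands in for Eq. (15), and all grid and candidate values are made up for illustration (a Python sketch, not the OctApps SensitivityDepthBayesian() code):

```python
import numpy as np
from math import lgamma, log, exp

def ncx2_pdf(x, k, lam, jmax=200):
    """pdf of the noncentral chi^2 distribution, written as a
    Poisson-weighted mixture of central chi^2 densities (no scipy)."""
    total = 0.0
    for j in range(jmax):
        log_w = -0.5 * lam + j * log(0.5 * lam + 1e-300) - lgamma(j + 1)
        kk = k + 2 * j
        log_f = (0.5 * kk - 1) * log(x) - 0.5 * x \
            - 0.5 * kk * log(2.0) - lgamma(0.5 * kk)
        total += exp(log_w + log_f)
    return total

def bayes_upper_limit(twoF_loudest, h0_grid, rho2_of_h0, credible=0.90):
    """Bayesian upper limit of Eq. (56): uniform prior on h0 times the
    likelihood of the loudest 2F candidate (Eqs. (57),(58)), normalized,
    then the 'credible' quantile of the posterior CDF."""
    like = np.array([ncx2_pdf(twoF_loudest, 4, rho2_of_h0(h)) for h in h0_grid])
    seg = 0.5 * (like[1:] + like[:-1]) * np.diff(h0_grid)  # trapezoid weights
    cdf = np.concatenate([[0.0], np.cumsum(seg)])
    cdf /= cdf[-1]
    return float(np.interp(credible, cdf, h0_grid))

def rho2_of_h0(h):
    return (h / 1e-25) ** 2   # hypothetical toy scaling, not Eq. (15)

grid = np.linspace(1e-27, 3e-24, 600)
ul90 = bayes_upper_limit(70.0, grid, rho2_of_h0, credible=0.90)
ul95 = bayes_upper_limit(70.0, grid, rho2_of_h0, credible=0.95)
```

As expected, a higher credible level yields a larger (more conservative) upper limit.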
III.7 Numerical implementation
The expressions for the various different sensitivity estimates of the previous sections have been implemented in GNU Octave Eaton et al. (2015), and are available as part of the OctApps Wette et al. (2018) dataanalysis package for continuous gravitational waves.
The function to estimate (and cache for later reuse) the distribution p(R²) of Eq. (41) is implemented in SqrSNRGeometricFactorHist().
The sensitivity-depth estimate for StackSlide searches is implemented in SensitivityDepthStackSlide(), both for the single-stage case of Eq. (45) and for the general multistage case of Eq. (49). For single-stage StackSlide there is also a function DetectionProbabilityStackSlide() estimating the detection probability for a given signal depth and detection threshold.
The Hough sensitivity estimate of Eq. (55) is implemented in SensitivityDepthHoughF(). An earlier version of this function had been used for the theoretical sensitivity comparison in Aasi et al. (2013a) (Sec. VB, see also Prix and Wette (2012)), where it was found to agree with the measured upper limits to within a small rms error.
The Bayesian upper-limit expression of Eq. (56) is implemented in SensitivityDepthBayesian().
Typical input parameters are the number of segments , the total amount of data , the mismatch distribution , the names of the detectors used, the single-template false-alarm level (or alternatively the statistic threshold), and the confidence level . The default prior on sky position is isotropic (suitable for an all-sky search), but this can be restricted to any sky region (suitable for directed or targeted searches).
The typical runtime on a 3 GHz Intel Xeon E3 for a sensitivity estimate including computing (which is the most expensive part) is about per detector. When reusing the same prior on subsequent calls, a cached is used and the runtime is reduced to about in total, independently of the number of detectors used.
IV Determining Frequentist Upper Limits
In order to determine the frequentist upper limit (UL) on the signal amplitude defined in Eq. (35), one needs to quantify the probability that a putative signal with fixed amplitude (and all other signal parameters drawn randomly from their priors) would produce a statistic value exceeding the threshold (corresponding to a certain false-alarm level, or p-value). The upper limit on is then defined as the value for which the detection probability is exactly , typically chosen as or , which is often referred to as the confidence level of the UL.
Note that here and in the following it will often be convenient to use the sensitivity depth introduced in Sec. II.5 instead of the amplitude . We denote as the sensitivity depth corresponding to the upper limit (note that this corresponds to a lower limit on depth).
The UL procedure is typically implemented via a Monte-Carlo injection-and-recovery method: a signal of fixed amplitude and randomly drawn remaining parameters is generated in software and added to the data (either to real detector data or to simulated Gaussian noise). This step is referred to as a signal injection. A search is then performed on this data, and the loudest statistic value is recorded and compared against the detection threshold . Repeating this injection-and-recovery step many times and recording the fraction of times the threshold is exceeded yields an approximation for . By repeating this procedure over different values and interpolating, one can find corresponding to the desired detection probability (and therefore also ).
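The core injection-and-recovery loop can be sketched as follows. This is an illustrative Python stand-in, not a real search pipeline: instead of generating strain data, it exploits the fact that the statistic of a matched template in Gaussian noise follows a noncentral chi-squared distribution with 4 degrees of freedom, with the noncentrality rho2 (the squared SNR) playing the role of the injected signal strength. All names are our own.

```python
import math
import random

def detection_probability(rho2, threshold, n_trials=20000, seed=42):
    """Monte-Carlo estimate of the detection probability: the fraction
    of injections whose detection statistic exceeds `threshold`.  As a
    toy stand-in for a full search, the statistic 2F of a matched
    template is drawn from a noncentral chi^2 distribution with 4
    degrees of freedom and noncentrality rho2 (the squared SNR of the
    injected signal)."""
    rng = random.Random(seed)
    mu = math.sqrt(rho2 / 4.0)  # spread the noncentrality over the 4 dof
    hits = 0
    for _ in range(n_trials):
        two_f = sum(rng.gauss(mu, 1.0) ** 2 for _ in range(4))
        if two_f > threshold:
            hits += 1
    return hits / n_trials
```

Sampling this function over a range of injected signal strengths and interpolating then yields the strength (and hence depth) at which the detection probability equals the desired confidence.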
We distinguish in the following between measured and simulated upper limits:

Measured ULs refer to the published UL results obtained on real detector data. These typically use the same search procedure for the ULs as in the actual search, often using the loudest candidate (over some range of the parameter space) from the original search as the detection threshold for setting the UL. The injections are done in real detector data, and typically the various vetoes, data-cleaning and follow-up procedures of the original search are also applied in the UL procedure.

Simulated ULs are used in this work to verify the accuracy of the sensitivity estimates. They are obtained using injections into simulated Gaussian noise, searching only a small box in parameter space around the injected signal locations. The box size is empirically determined to ensure that the loudest signal candidates are always recovered within the box. Only the original search statistic is used, without any further vetoes or cleaning.
A key difference between (most) published (measured) ULs and our simulated ULs concerns the method of interpolation used to obtain : in practice this is often obtained via a sigmoid interpolation approach (Sec. IV.1), while we use (and advocate for) a (piecewise) linear threshold interpolation (Sec. IV.2) instead.
IV.1 Sigmoid interpolation
In this approach one fixes the detection threshold and determines the corresponding for any given fixed injection set. The corresponding functional form of has a qualitative “sigmoid” shape as illustrated in Fig. 1. An actual sigmoid function of the form
(59) 
is then fit to the data by adjusting the free parameters and , and from this one can obtain an interpolation value for .
One problem with this method is that the actual functional form of is not analytically known, and does not actually seem to be well described by the sigmoid of Eq. (59), as seen in Fig. 1. In this particular example the true value at just so happens to lie very close to the sigmoid fit, but the deviation is quite noticeable at (see the zoomed inset in Fig. 1).
Another problem with this method is that the range of depths required to sample the relation often needs to be quite wide, due to initial uncertainty about where the UL value will be found, which can compound the above-mentioned sigmoid-fitting problem. Furthermore, the injection-recovery step can be quite computationally expensive, limiting the number of trials and further increasing the statistical uncertainty on the measurements.
Both of these problems can be mitigated to some extent by using the sensitivity-estimation method described in this paper (Sec. III) to obtain a fairly accurate initial guess for the expected UL value, and then sampling only in a small region around this estimate, in which case even a linear fit would likely yield good accuracy.
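The sigmoid-fitting step of Eq. (59) can be sketched as follows. This is an illustrative Python sketch: the grid search is a simple stand-in for a proper nonlinear least-squares routine, and the names D0 (midpoint) and w (width) are our own labels for the two free parameters of the sigmoid.

```python
import math

def fit_sigmoid(depths, p_dets):
    """Least-squares fit of p(D) = 1 / (1 + exp((D - D0)/w)) to sampled
    (depth, detection-probability) pairs, via a simple grid search over
    the midpoint D0 and width w.  Returns (D0, w)."""
    lo, hi = min(depths), max(depths)
    best = None
    for i in range(101):
        d0 = lo + (hi - lo) * i / 100.0
        for j in range(1, 101):
            w = (hi - lo) * j / 100.0
            sse = sum((1.0 / (1.0 + math.exp((d - d0) / w)) - p) ** 2
                      for d, p in zip(depths, p_dets))
            if best is None or sse < best[0]:
                best = (sse, d0, w)
    return best[1], best[2]

def invert_sigmoid(d0, w, p_target):
    """Depth at which the fitted sigmoid reaches p_target:
    solve p_target = 1 / (1 + exp((D - d0)/w)) for D."""
    return d0 + w * math.log(1.0 / p_target - 1.0)
```

Note that the inversion step inherits any systematic error of the sigmoid ansatz itself, which is exactly the first problem discussed above.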
IV.2 Piecewise-linear threshold interpolation
An alternative approach is used in this work to obtain the simulated ULs: for each set of injections and recoveries at fixed depth, we determine the threshold on the statistic required to obtain the desired detection fraction . This is illustrated in Fig. 2, which shows a histogram of the observed loudest candidates obtained in each of injection-and-recovery runs at a fixed signal depth of , using the S6CasAStackSlide search setup (cf. Sec. A.3). By integrating the probability density from until we reach the desired value , we find the detection threshold at this signal depth . Repeating this procedure at different depths generates a sampling of the function , illustrated in Fig. 3. These points can be interpolated to the required detection threshold, which yields the desired upper-limit depth .
We see in Fig. 3 that this function appears to be less “curvy” in the region of interest compared to shown in Fig. 1. This allows for easier fitting and interpolation; for example, a linear or quadratic fit should work quite well. In fact, here we have simply used piecewise-linear interpolation, which is sufficient given our relatively fine sampling of signal depths.
As already mentioned in the previous section, using the sensitivity estimate of Sec. III one can determine the most relevant region of interest beforehand and focus the Monte-Carlo injection-recoveries on this region, which will help ensure that any simple interpolation method works well.
Alternatively, for either the  or the sampling approach, one could also use an iterative rootfinding method to approach the desired or , respectively.
V Comparing Estimates Against Simulated Upper Limits
In this section we compare the sensitivity estimates from Sec. III against simulated ULs for two example cases (an allsky search and a directed search), in order to quantify the accuracy and reliability of the estimation method and implementation. This comparison shows generally good agreement, and also some instructive deviations.
Both examples are wide-parameter-space searches using a template bank over the unknown signal-parameter dimensions (namely {sky, frequency and spindown} in the all-sky case, and {frequency and its first and second derivatives} in the directed-search case).
The simulated-UL procedure (see Sec. IV) performs a template-bank search over a box in parameter space containing the injected signal (at a randomized location) in Gaussian noise. On the other hand, the sensitivity estimate (cf. Eq. (45)) uses the mismatch distribution obtained for this template bank via injection-recovery box searches on signals without noise. We refer to this in the following as the box search.
It will also be instructive to consider the (unrealistic) case of a perfectly matched search, using only a single template that matches the signal parameters exactly for every injection, corresponding to zero mismatch in Eq. (45). We refer to this as the zero-mismatch search.
V.1 Example: S6AllSkyStackSlide search
In this example we use the setup of the all-sky search S6AllSkyStackSlide Abbott et al. (2016d), which used the GCT implementation Pletsch and Allen (2009) of the StackSlide statistic and was performed on the volunteer-computing project Einstein@Home Ein (); see Table 1 and Sec. A.2 for more details.
Figure 4 shows the comparison between simulated ULs and estimated sensitivity depths versus threshold , for the box search (squares and solid line), as well as for the zeromismatch search (crosses and dashed line).
We see excellent agreement between estimated and simulated ULs for the zero-mismatch search. We also find very good agreement for the box search at higher thresholds, while the simulated ULs increasingly diverge from the estimate at decreasing thresholds, a behavior not captured by the estimate.
This discrepancy can be understood as the effect of noise fluctuations, which can enter in two different ways (that are not completely independent of each other):

For decreasing thresholds the corresponding false-alarm level of Eq. (34) grows, as it becomes increasingly likely that a “pure noise” candidate (i.e. unrelated to a signal) crosses the threshold. In the extreme case where approaches , the frequentist upper limit would tend to , corresponding to . This is illustrated in Fig. 5, showing the distribution of the loudest in a box search on pure Gaussian noise, which can be compared to the diverging depth of the simulated box search around in Fig. 4. We note that in practice the procedures used for measured ULs in CW searches typically ensure that the detection threshold has a very small false-alarm level, and we therefore expect this effect to have a negligible impact in cases of practical interest.

The sensitivity estimate for wide-parameter-space searches assumes that the loudest candidate is always found in the template closest to the signal (i.e. the one with the smallest mismatch ), as discussed in Sec. III.2. However, while the closest template has the highest expected statistic value (by definition), other templates can actually produce the loudest statistic value in any given noise realization. How likely this is to happen depends on the details of the parameter space, the template bank and the threshold. It will typically be more likely at lower thresholds, as more templates further away from the signal are given a chance to cross the threshold (despite their larger mismatch).
The true distribution of a box search will therefore be shifted to higher values compared to the approximate distribution used in Eq. (42). This implies that an actual search can have a higher detection probability than predicted by the estimate (corresponding to a larger sensitivity depth).
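Effect (i) can be quantified with a standard extreme-value argument. The sketch below assumes, as is standard for the F-statistic, that a single-template 2F value in pure Gaussian noise follows a central chi-squared distribution with 4 degrees of freedom, and it treats the templates as statistically independent, so the template count is really an effective number; both assumptions are idealizations, and the names are our own.

```python
import math

def chi2_4_sf(x):
    """Survival function of a chi^2 distribution with 4 degrees of
    freedom: P(2F > x) = exp(-x/2) * (1 + x/2) in Gaussian noise."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def loudest_false_alarm(threshold, n_templates):
    """Probability that the loudest of n_templates (effectively)
    independent pure-noise 2F values exceeds the threshold:
    1 - (1 - p_single)^N."""
    p_single = chi2_4_sf(threshold)
    # log1p/expm1 keep the result accurate when p_single is tiny
    return -math.expm1(n_templates * math.log1p(-p_single))
```

This makes explicit why the loudest pure-noise candidate concentrates at higher statistic values for larger template banks, and why lowering the threshold toward that regime drives the false-alarm level, and hence the simulated UL depth, up.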
Both of these effects contribute to different extents to the box-search discrepancy in Fig. 4 at lower thresholds:
The sampling distribution of in the presence of relatively strong signals at is shown in Fig. 6, both for a simulated box search and for the assumed distribution of the estimate.
We see that most of the loudest candidates obtained in the simulation lie above , and are therefore extremely unlikely to be due to noise alone, as seen from Fig. 5. The difference between the two distributions in Fig. 6 is therefore solely due to effect (ii). However, we see in Fig. 4 that the resulting discrepancy in the sensitivity estimate at is still very small.