NANOGrav Limits on Gravitational Waves from Individual Supermassive Black Hole Binaries in Circular Orbits

# NANOGrav Limits on Gravitational Waves from Individual Supermassive Black Hole Binaries in Circular Orbits

###### Abstract

The North American Nanohertz Observatory for Gravitational Waves (NANOGrav) project currently observes 43 pulsars using the Green Bank and Arecibo radio telescopes. In this work we use a subset of 17 pulsars timed for a span of roughly five years (2005–2010). We analyze these data using standard pulsar timing models, with the addition of time-variable dispersion measure and frequency-variable pulse shape terms. Within the timing data, we perform a search for continuous gravitational waves from individual supermassive black hole binaries in circular orbits using robust frequentist and Bayesian techniques. We find that there is no evidence for the presence of a detectable continuous gravitational wave; however, we can use these data to place the most constraining upper limits to date on the strength of such gravitational waves. Using the full 17 pulsar dataset we place a 95% upper limit on the sky-averaged strain amplitude of at a frequency of 10 nHz. Furthermore, we place 95% all sky lower limits on the luminosity distance to such gravitational wave sources finding that the Mpc for sources at a frequency of 10 nHz and chirp mass . We find that for gravitational wave sources near our best timed pulsars in the sky, the sensitivity of the pulsar timing array is increased by a factor of 4 over the sky-averaged sensitivity. Finally we place limits on the coalescence rate of the most massive supermassive black hole binaries.

\collaboration

NANOGrav Collaboration

## 1. Introduction

The direct detection of Gravitational Waves (GWs) is a major goal of experimental physics and astrophysics. One of the most promising means of detecting GWs is through the precise timing of an array of millisecond pulsars (MSPs). The concept of a pulsar timing array (PTA) was first conceived of over two decades ago (Sazhin, 1978; Detweiler, 1979; Hellings & Downs, 1983; Romani, 1989; Foster & Backer, 1990). Twenty years later three main PTAs are in full operation around the world: the North American Nanohertz Observatory for Gravitational waves (NANOGrav; Jenet et al., 2009), the Parkes Pulsar Timing Array (PPTA; Manchester, 2008), and the European Pulsar Timing Array (EPTA; Janssen et al., 2008). The three PTAs collaborate to form the International Pulsar Timing Array (IPTA; Hobbs et al., 2010) which will result in increased sensitivity to GWs through more data and longer time-spans than any single PTA.

PTAs are most sensitive to GWs with frequencies in the nanohertz regime (i.e., Hz – Hz). Potential sources of GWs in this frequency range include supermassive black hole binary systems (SMBHBs) (Sesana et al., 2008), cosmic (super)strings (Olmez et al., 2010), inflation (Starobinsky, 1979), and a first order phase transition at the QCD scale (Caprini et al., 2010). The community has thus far focused mostly on stochastic backgrounds produced by these sources; however, sufficiently nearby individual SMBHBs may produce detectable continuous waves with periods on the order of years for masses in the range (Wyithe & Loeb, 2003; Sesana et al., 2009; Sesana & Vecchio, 2010). Several upper limits have been placed on the strength of the stochastic background (Kaspi et al., 1994; Jenet et al., 2006; van Haasteren et al., 2011; Demorest et al., 2013; Shannon et al., 2013) and continuous waves (Jenet et al., 2004; Yardley et al., 2010) but no successful detection has yet been made.

In this paper we will use current-generation frequentist (Ellis et al., 2012) and Bayesian (Ellis, 2013) data analysis pipelines to compute upper limits on the strain amplitude of continuous GWs from SMBHBs in circular orbits. We make use of the 5-year, 17 pulsar data set obtained as part of the NANOGrav project (Demorest et al., 2013). In Section 2 we briefly review the radio observations and timing analysis. In Section 3 we describe the signal model used to describe the continuous GWs in the PTA band. In Section 4 we describe, in detail, the time domain likelihood function, the noise model, and the frequentist and Bayesian search pipelines. In Section 5 we apply our search and upper limit pipelines to the NANOGrav dataset and report our findings. In section 6 we summarize our results. In the Appendices we derive the form of the frequency evolution of SMBHBs, and give full details on the computational implementation of our Bayesian code.

## 2. Observations and Timing Analysis

The observational data used for this analysis are the same as those presented by Demorest et al. (2013); the reader is referred to that paper for a detailed description of the observations and timing analysis. Here we present a brief review of the relevant features of the data set. The timing data used here were acquired during 2005–2010 using two radio telescopes, the 305-m Arecibo telescope, and the 100-m Robert C. Byrd Green Bank Telescope (GBT). A total of 17 pulsars (8 at Arecibo, 10 at the GBT, with J17130747 observed by both telescopes) were monitored using a typical observational cadence of 4–6 weeks between sessions. At each observing epoch, every pulsar was observed using two separate receiver systems operating at widely separated radio frequencies ranging from 327 MHz to 2.3 GHz. The typical observation length was 30 minutes per pulsar per receiver. All data were recorded using the identical ASP (at Arecibo) and GASP (at the GBT) pulsar backend systems (Demorest, 2007). These systems processed a typical radio bandwidth of 64 MHz using real-time coherent dedispersion and pulse period folding, resulting in 2048-bin full-Stokes pulse profiles averaged over 1–3 minutes in 4 MHz channels.

Pulse profile calibration, integration, and time of arrival (TOA) determination was done using standard techniques via the PSRCHIVE software package (Hotan et al., 2004). For each pulsar all profiles in a given epoch were integrated in time to form a single set of profiles across radio frequency. From these, TOAs were measured separately in each 4 MHz radio frequency channel. This resulted in a set of 20–30 multi-frequency TOAs at each epoch, or 500–2000 TOAs total for each pulsar over the full data set. Before searching for the presence of GW in these data, the rotational, orbital, astrometric and interstellar medium properties specific to each pulsar – effects collectively known as the timing model – must first be determined from the TOA data. For this we analyzed the TOAs using both the TEMPO and TEMPO2 (Hobbs et al., 2006) timing software packages and obtained identical results with both. Notable features of the timing models used here include: Spin frequency and spin-down rate, but no higher frequency derivatives, were fit for all pulsars; all five astrometric parameters (sky position, proper motion and parallax) were fit for all pulsars444Parallax was not fit for in PSR J16402224.; time-variable dispersion measure (DM); was included by fitting for an independent DM value at each epoch, using the multi-frequency TOAs;555Models for pulsars J18531308, J19101256 and B195329 did not include DM variation measurement as only single-frequency data were available for these. intrinsic profile shape evolution with frequency; was included as a constant-in-time offset for each frequency channel, and Keplarian and relativistic orbital elements, as appropriate for pulsars in binary systems. All TOA data and final timing solutions for this data set are publicly available online.

## 3. GWs From Supermassive Black Hole Binaries

Pulsar timing residuals are defined as the difference between observed times-of-arrival (TOAs) of radio pulses and a deterministic timing model. In this section we review the form of the residuals induced by SMBHBs consisting of non-spinning black holes in a circular orbit (e.g., Wahlquist, 1987) and introduce our notation. Spin effects are not likely to play any measurable role in the orbital dynamics (Sesana & Vecchio, 2010) and eccentric systems (Roedig & Sesana, 2012; Sesana, 2013) will be addressed in a future work. The GW is a metric perturbation to flat space-time defined in terms of its two polarizations as

 hab(t,^Ω)=e+ab(^Ω)h+(t,^Ω)+e×ab(^Ω)h×(t,^Ω), (1)

where is the unit vector pointing from the GW source to the Solar System Barycenter (SSB), , and () are the polarization amplitudes and polarization tensors, respectively. The polarization tensors can be converted to the SSB by the following transformation. Following Wahlquist (1987) we write

 e+ab(^Ω) =^ma^mb−^na^nb, (2) e×ab(^Ω) =^ma^nb+^na^mb, (3)

where

 ^Ω =−(sinθcosφ)^x−(sinθsinφ)^y−(cosθ)^z, (4) ^m =−(sinφ)^x+(cosφ)^y, (5) ^n =−(cosθcosφ)^x−(cosθsinφ)^y+(sinθ)^z, (6)

where , , and are the usual Cartesian coordinate unit vectors. In this coordinate system, and are the polar and azimuthal angles of the source, respectively, where and are declination and right ascension in the usual equatorial coordinates, where the north celestial pole is in the direction and the vernal equinox is in the direction.

We will write our GW induced pulsar timing residuals in the following form:

 s(t,^Ω)=F+(^Ω)Δs+(t)+F×(^Ω)Δs×(t), (7)

where

 ΔsA(t)=sA(tp)−sA(te), (8)

and and are the times at which the GW passes the Earth777Technically, this is the time that the GW passes the SSB, however, following convention we will label this as the Earth time and will later refer to the Earth-term, keeping in mind that, in practice, all variables are referenced to the SSB. and pulsar, respectively, and the index labels polarizations. The functions are known as antenna pattern functions and are defined by

 F+(^Ω) =12(^m⋅^p)2−(^n⋅^p)21+^Ω⋅^p (9) F×(^Ω) =(^m⋅^p)(^n⋅^p)1+^Ω⋅^p, (10)

where is the unit vector pointing from the Earth to the pulsar. Also, from geometry we can write888Here we use geometrized units where .

 tp=te−L(1+^Ω⋅^p) (11)

where is the distance to the pulsar. Given these definitions, we can write the GW contributions to the timing residuals at 0-PN (Post-Newtonian) order as (Wahlquist, 1987; Corbin & Cornish, 2010)

 s+(t)=M5/3dLω(t)1/3[−sin[2Φ(t)](1+cos2ι)cos2ψ−2cos[2Φ(t)]cosιsin2ψ] (12) s×(t)=M5/3dLω(t)1/3[−sin[2Φ(t)](1+cos2ι)sin2ψ+2cos[2Φ(t)]cosιcos2ψ], (13)

where is the GW polarization angle and is the inclination angle of the SMBHB. The orbital phase and frequency of the SMBHB are

 Φ(t)=Φ0+132M5/3(ω−5/30−ω(t)−5/3) (14)

and

 ω(t)=ω0(1−2565M5/3ω8/30t)−3/8. (15)

where and are the initial values at the time of our first observation, the chirp mass is defined by , where and are the masses of the two SMBHs, and is the luminosity distance to the SMBHB source. See Appendix A for a more complete derivation of the frequency evolution of the binary including important approximations that can be made. For our purposes here we will just note that orbital frequency evolution over our observing time span is very unlikely while frequency evolution over the earth-pulsar light travel time is almost certain for sources with reasonably large chirp masses (i.e., ) (see e.g., Sesana & Vecchio, 2010). We can relate the GW frequency to the orbital frequency of the binary by for circular orbits. Note that we use the observed redshifted values. For example, the chirp mass and orbital angular frequency in the rest frame are and , respectively, where is the cosmological redshift.

From the signal model presented above, we see that our parameter space is 8 dimensional and the continuous wave parameter space vector is

 →λ0={θ,φ,Φ0,ψ,ι,M,dL,ω0}. (16)

However, because typical pulsar distance uncertainties are on the order of tens of percent (Verbiest et al., 2012), in order to attain phase coherence in our search algorithm, we must allow the pulsar distance to vary as a search parameter as well. Henceforth, we will adopt the notation that , where is the distance to the th pulsar, in order to denote the fact that the pulsar distance is a search parameter. The above parameter set represents the default parameters used in our search; however, when setting upper limits we wish to parameterize the upper limit in terms of the inclination averaged strain amplitude

 h0=4√25M5/3(πfgw)2/3dL. (17)

Since the luminosity distance, is only a scale parameter we use as a free parameter in the waveform instead of luminosity distance when computing upper limits.

## 4. Search Techniques

### 4.1. Likelihood Function for PTAs

Following van Haasteren & Levin (2013) and Ellis et al. (2013) we write the measured pulsar timing residuals in the linear approximation as

 δt=Mδξ+n+s, (18)

where is a vector of length representing the timing residuals for a single pulsar, is the design matrix (an matrix composed of the functional form of the linearized timing model (column) evaluated at each TOA (row)), is a vector of length representing the parameter offset between the true pulsar timing parameters and our best fit parameters, and are length vectors representing the noise present in the TOAs (radiometer noise, red noise, etc.) and our continuous GW signal, respectively. In practice, the noise in pulsar timing residuals is non-Gaussian due to interstellar medium scintillation effects which are manifest through a time varying pulse intensity. Nonetheless, the noise in each residual is modeled very well by a Gaussian with zero mean and standard deviation equal to the uncertainty on the TOA. In other words, the noise in the weighted (by the individual TOA errors) residuals is very well approximated as a Gaussian. As will be detailed in the next section, we include these error bar weights in our noise covariance matrix. Thus, assuming is Gaussian999Note that here Gaussian noise simply means that the data obey a multivariate Gaussian probability distribution function. This does not mean that we assume the data is white., we can write the likelihood function for a single pulsar as

where are parameters that describe the noise in the pulsar residuals, the parameters that characterize the continuous GW signal101010We have dropped the subscript on the parameter vector here as we are only considering a single pulsar. and is the covariance matrix of the noise. It was shown in van Haasteren & Levin (2013) that this likelihood function can be marginalized over the timing model parameters to obtain

 p(δt|→ϕ,→λ)=exp[−12(δt−s)TG(GTCG)−1GT(δt−s)]√(2π)(NTOA−m)det(GTCG), (20)

where is an matrix with the number of TOAs and the number of fitted parameters in the timing model. The derivation of the -matrix approach can be found in van Haasteren & Levin (2013) and will not be explored here. The matrix is a projection operator that projects our data onto the null space of , that is, it projects the data into a subspace orthogonal to the linearized timing model. In the timing analysis used here, DM variation and profile frequency evolution effects are part of the timing model, and these terms are included when constructing the matrix. In this way we have fully taken into account the timing model fitting procedure.

For this work we will assume that the residuals between pulsars are uncorrelated. In other words, we are assuming that the stochastic GW background will be negligible compared to the intrinsic noise in each pulsar. In general this is not likely to be a good assumption when we would expect a detection of a single GW source. Furthermore, terrestrial clock errors (Hobbs et al., 2012) and errors in solar system ephemerides (Champion et al., 2010)111111Note that current uncertainties in the ephemerides are small enough that they will likely not pose any problems for GW analyses. can also cause correlations between residuals from different pulsars, with however different angular correlation properties on the sky than are expected from GWs. The effects of omitting the correlations in the likelihood function are unknown and will be the subject of future work. Under these assumptions, the likelihood function for the full PTA can be written as

 p(δt|→λ)=Npsr∏α=1p(δtα|→λα), (21)

where and and the residuals and model parameters for the th pulsar, respectively and is the full CW parameter vector including pulsar distances for all pulsars. In cases where we fix the noise values, we can write the log-likelihood ratio of a model with a single continuous GW to a model with just noise as

 lnΛ=Npsr∑α[(δtα|s(→λα))−12(s(→λα)|s(→λα))], (22)

where the inner product between two time-series and is

 (x|y)=xTG(GTCG)−1GTy. (23)

In the remainder of the paper we will refer to the signal-to-noise ratio in the following form

 (24)

where the brackets denote the expectation value over many noise realizations.

### 4.2. Noise Model

In the above section we have derived the likelihood function used for our analysis; however, we have not specified the form of the noise covariance matrix . Previous Bayesian analysis schemes (van Haasteren et al., 2009; van Haasteren & Levin, 2010; van Haasteren et al., 2011; Ellis et al., 2012; van Haasteren & Levin, 2013; van Haasteren, 2013; Ellis et al., 2013; Ellis, 2013) have used a power-law red noise model and an EFAC (constant multiplier on the TOA uncertainties) and EQUAD (additional Gaussian white noise added in quadrature to EFAC noise) parameters to describe the white noise, with a covariance matrix of the form

 C=E2W+Q2I+Cred(Ared,γred), (25)

where is the EFAC parameter, , with the errorbar on the th TOA, is the EQUAD parameter and is an analytic expression of the red noise amplitude and spectral index . It is worth noting that we use no EFAC or EQUAD parameters in our pulsar timing model fit but instead include them directly in our noise model. The EFAC is simply a parameter that quantifies any additional uncertainty in the TOA uncertainties and the EQUAD parameter quantifies any additional white noise that is not related to the formal TOA uncertainties. In principle, a different EFAC value should be used for each pulsar timing backend as this parameter is related to intrinsic receiver noise; however, in this 5-year NANOGrav dataset, only one backend per telescope was used121212PSR J1713+0747 is observed at both Arecibo and GBT; however, we find that there is very little difference in the measured EFAC parameters for the two telescopes.. Therefore, we are justified in only using one EFAC parameter per pulsar. This noise model is quite general and works well for many pulsars; however, the size of the matrices is quite large (on the order of ) and inversion is a large bottleneck in the analysis pipelines. Furthermore, current NANOGrav observing schemes produce large sets of multifrequency observations that are essentially simultaneous. One may be tempted to simply perform a weighted average of the TOAs and work with the new reduced datasets but in the Bayesian scheme we must marginalize over the timing model parameters analytically and it is unclear how to carry out this process for epoch-averaged TOAs. Because of this, we have developed a framework to essentially work backward from the marginal likelihood to derive a nearly exact averaging scheme. First we re-write our noise covariance matrix

 C=N+U~CUT, (26)

where is a reduced covariance matrix with the number of epochs131313Here we have defined an epoch to be one day. in our dataset, is a white noise covariance matrix of the EFAC and EQUAD terms, and is the “exploder” matrix that maps epochs (columns) to the full set of TOAs (rows). If we now make use of this new formalism, the likelihood function is then

 p(δt|→ϕ,→λ)=exp[−12((δt−s)T~N−1(δt−s)−dTΣ−1d)]√(2π)n−mdet(~C)det(GTNG)det(Σ), (27)

where we have used the Woodbury Lemma141414 and to compute the inverse and determinant of , , , and . Note, that here are essentially daily averaged residuals. For NANOGrav datasets the number of epochs per pulsar is on the order of 30–100, while the total number of TOAs per pulsar is on the order of , thus the inversions (here is diagonal and can be pre-computed, thus the only dense matrix inversion is ) required in this likelihood function scale as as opposed to , resulting in computational speedups of several orders of magnitude. Furthermore, the epoch-averaged covariance matrix can take on several forms depending on the red noise model used; however, as long as it is a slowly varying function of the TOAs (i.e., a truly red process) then this formalism is completely valid.

In order to attain further computational speedups and to gain more control over the low frequency component of our noise model we make use of the methods described in Lentati et al. (2013), but now applied to a single pulsar instead of the full PTA. This method relies on explicitly splitting up the red and white components of the residuals, so that the residuals are now written as

 δt=Mδξ+nwhite+nred+s, (28)

where and are the white and red components of the residuals, respectively. It is possible to expand the red noise piece in a Fourier series

 nred=Nmode∑j=1[ajsin(2πjtT)+bjcos(2πjtT)]=Fa, (29)

where is a vector of the concatenated sine and cosine amplitudes, is the total time span of the data, and is a matrix with alternating sine and cosine terms with the number of frequencies used. Now, we assume that the underlying ensemble average red noise process is wide-sense stationary and can be completely described by a power-spectrum. Then, by orthogonality, the Fourier coefficients will be diagonal with components

 φij=⟨aaT⟩ij=diag({φi}), (30)

where the elements of , denoted are the coefficients of the theoretical power spectrum of the red noise process in the residuals. If the red noise process is wide-sense stationary, then this relation is always true irrespective of the sampling as all information about the uneven sampling here comes from the Fourier design matrix . Thus, we can write the covariance and epoch-averaged covariance matrices, respectively, as

 C =N+FφFT (31) ~C =~Fφ~FT, (32)

where is a matrix and is constructed in the same manner as but the epoch-averaged TOAs are used as opposed to the full set of TOAs. As is done in Lentati et al. (2013), it is possible to treat each diagonal element of as a free parameter; however, for this work we choose to parameterize it by a power-law

 φi=1TA2red12π2(fifyr)3−γredf−3i, (33)

where is the th Fourier frequency assuming Nyquist sampling. In general, any Fourier based method with finite length datasets and especially with irregular sampling will suffer from spectral leakage whereby power from the lowest frequencies will leak into the higher frequencies. In effect, this makes Fourier based methods sensitive to the low-frequency cutoff. However, it was shown in van Haasteren & Levin (2013) that by including the effects of the timing model (specifically the quadratic spin-down in this case) in our analysis acts as a window function that fully removes any sensitivity to the low-frequency cutoff, thereby also removing any spectral leakage. We have done extensive simulations to test this notion and have found no evidence for spectral leakage and no bias in red noise parameter estimation and waveform reconstruction.

In the course of our single pulsar noise analysis (Ellis et al., 2014) we found that the addition of an extra white noise parameter was needed to accurately describe the data. This new white noise term incorporates a correlation among frequency channels (within a given epoch) while still remaining independent of other epochs. In other words, this white noise term accounts for epoch-to-epoch fluctuations as opposed to fluctuations within an epoch. We defer to another paper the inclusion of pulse-jitter noise from pulsar magnetospheric activity but point out that our inferred extra term may be the same as jitter noise known to be present in all well-studied pulsars (Cordes & Shannon, 2010). This parameter is quite easy to incorporate as it is simply an EQUAD like parameter in the epoch-averaged sense, that is

 J=U~JUT=J2UIqUT, (34)

where is our frequency correlated EQUAD parameter and is the identity matrix in the epoch-averaged space. With this, we have our final noise model with a total covariance matrix of

 C=N+U(~Fφ~FT+J2Iq)UT, (35)

and noise parameter vector

 →ϕ={E,Q,J,Ared,γred}. (36)

Throughout the remainder of the paper, this noise model is always used for all pulsars.

### 4.3. Fp-Statistic

The -statistic was first derived from the likelihood function of Eq. (19) in Ellis et al. (2012) (hereafter ESC12) as a “total-power” frequentist detection statistic. We will not derive the full expression here (See Appendix B for an alternate derivation to ESC12), but rather we will explain its functional form and discuss its statistics. First we define the following harmonic basis functions:

 B1α(t) =1ω1/30sin(2ω0t) (37) B2α(t) =1ω1/30cos(2ω0t), (38)

where, again, is the orbital angular frequency of the SMBHB. Following ESC12, the -statistic is written as

 2Fp=M∑α=1PiαQαijPjα, (39)

where we have assumed Einstein Summation notation over latin indices, , and the formal sum is over all pulsars in the array. An intuitive way to think of this statistic is a weighted (by the noise power spectral density) sum of the power spectrum of the residual data done in the time domain by making use of a harmonic time domain basis. It was shown in ESC12 that follows a chi-squared distribution with degrees of freedom and non-centrality parameter such that

 ⟨2Fp⟩=2Npsr+ρ2. (40)

Figure 1 shows the distribution of the -statistic (top panel) for both real and simulated data as well as as a -value test (bottom panel) for each pulsar, where we compare the single-pulsar distribution to the expected chi-squared distribution. To compute the -statistic, we have used the maximum a-posteriori noise values obtained in a previous single-pulsar noise analysis to construct the noise covariance matrix. Since we do not have independent realizations of our data, we compute the -statistic for each independent151515Note that the frequencies are not completely independent since our data are irregularly sampled. The frequency bins were chosen here assuming a cadence of two observing sessions per month. frequency bin and then construct a histogram of the results. If our noise model is a good description of the true noise in our data and there is no GW present in the data then this distribution should follow the correct chi-squared distribution. We see from Figure 1 that the -statistic values do indeed follow a chi-squared distribution with degrees of freedom. The black(blue) curve in the top panel of Figure 1 shows the aforementioned histogram along with the chi-squared distribution in the dashed gray(red) line. The -value that results from a Kolmogorov-Smirnov (KS) test comparing the and chi-squared (with 34 degrees of freedom) distributions is 0.33 showing good agreement between our data and the expected chi-squared distribution. As a cross-check, we have also simulated 100,000 datasets with the measured noise parameters and have evaluated the -statistic for each. This distribution is plotted as a gray(green) histogram in the figure and it is obvious that this distribution follows a chi-squared distribution with 34 degrees of freedom nearly perfectly. We have also performed a similar test but for each pulsar separately. In the lower panel of Figure 1 we carry out the same KS-test mentioned above but now compute the -statistic values for each pulsar individually and then compare to a chi-squared with 2 degrees of freedom. The solid line corresponds to the p-value at which we should reject the null hypothesis that the two distributions are the same with 99.7% confidence. We see that with the exception of one pulsar, J16402224, all others lie above this threshold value. This indicates that our noise model for all pulsars except J16402224 provide a good description of the true noise in the dataset. Better noise models for this pulsar are currently being explored (Ellis et al., 2014) but since our full 17-pulsar -statistic distribution is totally consistent with the expected chi-squared distribution we just use the standard noise model described in Section 4.2.

For the detection problem, we are interested in the false-alarm-probability (FAP), that is, the probability that a measured value exceeds a given threshold when no signal is present. From ESC12, the probability distribution of when the signal is absent is

 p0(Fp)=Fn/2−1p(n/2−1)!exp(−Fp), (41)

where is the number of degrees of freedom ( in this case). The corresponding FAP is then written as

 PF(Fp,0)=∫∞Fp,0p0(Fp)dFp=exp(−Fp,0)n/2−1∑k=0Fkp,0k!. (42)

In a search over GW frequencies (the only free parameter in the -statistic) we will incur a trials factor such that the resulting FAP for the search is

 PTF(Fp,0)=1−[1−PF(Fp,0)]Nf, (43)

where is the number of independent frequencies. For this work we place our detection threshold on such that the corresponding FAP is less than . The results of performing this search on the 5-year NANOGrav dataset will be presented in the next section.

### 4.4. Bayesian Method

The Bayesian search pipeline in this work is very similar to that of Ellis (2013) (hereafter E13). Here we use an MPI enabled Parallel-Tempered Markov Chain Monte-Carlo (PTMCMC) sampler (See E13 and Appendix C.1 for details on the implementation). In this work we use two “modes” of operation for the Bayesian search. The first is the most general in which we evaluate the full likelihood function of Eq. (19) and allow both the GW parameters, , and the noise model parameters, to vary simultaneously. In principle, this is the more desirable setup as it allows the uncertainty in our noise model to propagate into the measured GW parameters and also accounts for any correlations between the noise and GW parameters. This mode does require significantly more computational power as the number of search parameters in the MCMC is quite large. The total parameter space consists of 8 GW parameters, pulsar distances, and noise parameters; this comes to 110 parameters for the full 17-pulsar array.

The second mode is when we fix the noise parameters to their maximum a-posteriori values obtained from a previous single pulsar analysis. All previous GW searches for single sources have been performed in this manner (Jenet et al., 2004; Yardley et al., 2010; Babak & Sesana, 2012; Ellis et al., 2012; Petiteau et al., 2013; Ellis, 2013) which is justified if the noise model only contains white noise and the GW signal present in any single dataset is weak. If the noise model contains only white noise, there will be little to no correlation between the GW parameters and the noise parameters, and if the signal is weak then it will not affect the single pulsar noise analysis. However, there is some evidence of red noise in our pulsars and because of the highly varying noise levels among pulsars, it is likely that a detectable source would be seen in the best timed pulsars individually. Therefore, this type of Bayesian analysis is not robust and could possibly lead to biased results; nonetheless, we will carry out this mode for comparison purposes in this study. Note that we will have the same problem with the -statistic. Possible methods to ameliorate this problem in fixed-noise searches are being explored and will be the subject of a future paper.

In a Bayesian sense, the detection problem is an application of model selection and the upper limit problem is an application of parameter estimation. The upper limit problem will be discussed in the next section. The model selection technique is very similar to that discussed in E13 and we will only review it here (See Appendix C.2 for more details). In the Bayesian framework, the data (here we use as opposed to the actual residual data to remain general) are assumed to be fixed and the parameters that parameterize a hypothesis (or model) are assumed to follow a given prior distribution. In this case, the data are used to update our prior knowledge of the hypothesis via Bayes theorem.

 p(Θ|d,H)=p(d|Θ,H)p(Θ|H)p(d|H), (44)

where is the posterior probability distribution, that is, the probability that the set of parameters for hypothesis could generate the given data . In the above expression is the likelihood function, the probability that this dataset is drawn from a random distribution described by hypothesis and parameterized by . Lastly, the prior encompasses any prior knowledge we have about the given hypothesis and is the marginalized likelihood or evidence

 p(d|H)=∫dΘp(d|Θ,H)p(Θ|H). (45)

For the purposes of parameter estimation we can safely ignore the evidence in Bayes theorem since it is just a normalizing factor that does not depend on the model parameters . However, if we want to perform model selection to claim a detection or compare different waveforms then the evidence is crucial. In this case we can make use of the Bayesian odds ratio between models “” and “

 O=p(d|HA)p(d|HB)p(HA)p(HB), (46)

where the first factor is known as the Bayes Factor which is our confidence in one model over the other based on the data (henceforth we will denote the Bayes factor as ) and the second term is the prior odds ratio for models and which describes our prior belief in both models. In this paper we consider only the Bayes factor, and assume the prior odds are even. (The choice of the prior odds will determine the false-alarm rate of a detection scheme based on the odds ratio (Vallisneri, 2012).

In this work we wish to compare the signal model parameterized by to the null-hypothesis model parameterized by . When we allow the noise and GW parameters to vary simultaneously we will need to compute the evidence for both models, and separately. On the other hand, when we fix the noise parameters we can simply evaluate the likelihood-ratio in (22) and use this to compute the evidence. In this case, since the null-hypothesis model has no free parameters, the Bayes factor is simply given by the evidence computed using the likelihood ratio. In practice, to compute the evidence we make use of thermodynamic integration as detailed in E13 and Appendix C.2. The results of our Bayesian search and verification on injections will be discussed in the next section.

#### 4.4.1 Priors

In a Bayesian analysis, especially when using parallel tempering and thermodynamic integration, it is very important to choose reasonable priors so that we are not exploring areas of parameter space that have been ruled out by previous experiments. We choose isotropic priors on all angular parameters and uniform priors in the log of the chirp mass with , luminosity distance with Mpc, and frequency of the GW with Hz. We impose an additional condition on the average strain amplitude such that , where and Hz. This value is chosen so that the maximum strain is well above the level of detection. Essentially this is a cheap way to impose a correlated prior on chirp mass, luminosity distance, and GW frequency. The normalization is computed through Monte Carlo integration. For the pulsar distance prior we use the current electromagnetic (EM) measurements either from timing parallax or Very Long Baseline Interferometry (VLBI) corresponding to the best measured values taken from Verbiest et al. (2012) (10 pulsars) if available, otherwise, we use the values from the Australia National Telescope Facility (ATNF) pulsar catalog (Manchester et al., 2005) which have distances based on dispersion measure and the NE2001 Galactic electron-density model (Cordes & Lazio, 2002, 2003). For pulsars without parallax distances we assume a 20% uncertainty on the distance. Using this information, we write the distance prior as follows

 p(→L)=Npsr∏α=11√2πσ2αexp(−(Lα−LEMα)22σ2α), (47)

where is the best measured distance for the th pulsar and is the 1-sigma uncertainty on that distance measurement. In principle it would be more correct to use a Gaussian prior for the parallax, which is proportional to . If the variance on the parallax is quite large then the corresponding prior on distance will differ significantly, namely it will have a long tail towards higher distances. However, for the pulsars used in this analysis, the distance uncertainty is small enough that the two prior distributions are effectively the same and we are safe in using a Gaussian prior on the pulsar distance itself; however, for future analyses we will move to Gaussian priors in .

For our noise parameters, we use priors that are uniform in the EFAC in the range , uniform in the log of the EQUAD with EQUAD s, uniform in the log of the jitter value with the same range as the EQUAD, uniform in the log of the red noise amplitude with , where the amplitude is in GW units, and uniform in the red noise spectral index with . We impose a further prior on the red noise such that the variance is less than the unweighted standard deviation of the pulsar timing residuals, where

 (48)

with the total observation time and the power spectrum of the red noise. This prior essentially restricts the model from considering red noise dominated residuals, which is a very good approximation (Perrodin et al., 2013; Ellis et al., 2014). This prior is chosen because it leads to much more computationally efficient runs by allowing us to run fewer high temperature chains in the Thermodynamic Integration scheme (See Appendix C.2 for more details). In principle this red noise prior is illegal in the sense that it uses the data (i.e., the variance of the residuals); however, this prior restricts access to an area of parameter space that is not supported by the likelihood function. Therefore, by omitting this area of parameter space the evidence calculation for each model, and , will be biased slightly low. However, the likelihood function evaluated at this area of parameter space is essentially zero, and this slight bias will be negligible.

## 5. Results

In this section we report the results of our frequentist and Bayesian searches, provide verification of the pipeline on injected signals and report on several upper limits.

### 5.1. Verification

First, it is interesting to determine how much each pulsar in the 17-pulsar array will contribute to the overall SNR (signal-to-noise ratio) when a GW is present. In Figure 2 we plot the fraction , where is the single pulsar SNR, for each pulsar in the array.

To compute this fraction we simulate 5000 GW realizations (with parameters drawn from isotropic distributions in all angles and distributions uniform in the log of chirp mass and frequency) and calculate the single pulsar and total PTA SNR from Eq. (24). The black(blue) points in the plot show the mean and standard deviation of the aforementioned ratio for each pulsar and the gray(green) curve is a simple naive scaling of , where is the weighted RMS of the th pulsar’s TOA uncertainties. It is obvious that J1713+0747 contributes more than 55% of the SNR on average, and PSRs J19093744, J0030+0451, and J06130200 contribute 10% on average. As we see from the gray(green) curve, this is very consistent with the overall scaling with the inverse of the variance of the noise; however, PSRs J0030+0451 and J06130200 carry a higher percentage because they are located opposite to the bulk of other pulsars on the sky, and therefore will contribute more to the SNR for GWs coming from that side of the sky due to the antenna pattern response. This calculation does not mean that we advocate only timing the pulsars with the highest timing precision. Although many of the lower timing precision pulsars do not help with continuous GW detection or parameter estimation, they are essential for detection and parameter estimation of a stochastic GW background (see e.g., Siemens et al., 2013).

The fact that one pulsar dominates the total SNR means that it will be harder to make a confident GW detection as we require the same GW signal (with quadrupolar correlations) to be present in all pulsars. In other words, if the GW is only “seen” in one or two pulsars then it is hard to distinguish it from some other effect due to the pulsar timing model, ISM effects or some other systematic effect. This also implies the need to run a Bayesian analysis where both the noise and GW parameters are allowed to vary simultaneously. This does not mean that a continuous GW would not be a valid interpretation of a loud sinusoidal signal in one pulsar, only that statistically we do not have enough information to confidently claim a detection. Furthermore, if we did have a loud detectable signal, parameter estimation would be quite poor with the current NANOGrav PTA as there would be large degeneracies in the sky location (due to the small effective number of detector baselines), making sky localization and binary orientation estimates very poor. However, NANOGrav is currently timing 43 pulsars with microsecond or better precision. Also, new ultra-wideband receivers (DuPlain et al., 2008) have increased timing precision by a factor of 1.7 for many of the pulsars in this 5-year. More pulsars and better timing precision could help ameliorate some of the limitations we have with the 5-year data set.

Despite the potential limitations discussed above, we verify the efficacy of our pipeline by running both the frequentist and Bayesian pipeline on a synthetic dataset with an injected GW source. To create the synthetic dataset we first compute the residuals of our 17 NANOGrav pulsars using the tempo2 (Hobbs et al., 2006) package. Next we subtract these residuals from the site arrival times, thereby producing a new set of arrival times that match our timing model perfectly. To each set of idealized TOAs we then add a Gaussian noise process with the same characteristics as those measured in the real data, and a GW signal using the fully evolving signal model. We then use these new TOAs to produce a set of synthetic residuals. For this simulation we have chosen to inject a signal with SNR 10 and parameters .

The -statistic pipeline was run on this synthetic dataset. Since we are treating this injection as if it were a true blind search, we must first run a single-pulsar noise analysis to determine the maximum a posteriori noise parameters; however, since a strong continuous GW and red noise will be covariant we have included a single frequency sinusoid as part of our noise analysis for each pulsar. This is implemented by simply adding a free amplitude and frequency parameter to the noise model discussed above. While this may appear to be special treatment for the injected signal, we have run the same noise model on the real data and find no evidence for any sinusoidal features. After we have obtained the maximum a-posteriori noise parameters (not including the sinusoid parameters), we use these values to construct the noise covariance matrix for use in the -statistic as well as the fixed-noise Bayesian search. By performing the noise search with an included sinusoid but not including it in our noise covariance matrix in the subsequent GW analysis we are sidestepping the problem of the GW being absorbed into red noise parameters.

We have carried out this analysis and the results are shown in Figure 3 where we plot vs. GW frequency when using the measured noise values (black) and the true injected noise values (grey). The vertical dashed line indicates the injected frequency and the horizontal dashed line represents our detection threshold corresponding to a FAP of . To compute the total FAP over all frequencies we make use of Eq. (43) and choose , resulting in a total FAP of which is a decisive detection. The number of independent frequencies is difficult to calculate when we are using many datasets with very irregular sampling. In this work we have chosen as this corresponds to and Hz. The upper limit on frequency was chosen because our approximate observing cadence is 2.5 weeks and the frequency spacing was chosen by imposing the condition that the autocorrelation function of when no signal is present drops to half of its maximum value at that frequency lag. This analysis shows us that we can indeed detect a continuous GW if it is present in our data by conducting a fully blind search; however, we also see that our results will not be conclusive as there are several frequencies at which the is above our threshold value. From comparison with true-noise case, we see that the uncertainty (and residual correlations between GW and noise parameters) in the noise parameters can lead to confusing results. This again, is mostly due to the fact that our sensitivity is dominated by a small number of pulsars. Because of this, we caution against using a fixed-noise method to make final detection statements but instead advocate these methods as a first round in a suite of analyses.

Both Bayesian pipelines (with and without varying noise parameters) were run on this synthetic dataset. For both runs we have used PTMCMC and thermodynamic integration as discussed in Appendix C.2. Due to the large parameter spaces when using the full GW and noise model, we have chosen to use only the pulsars that contribute more than 1% to the injected SNR, resulting in 6 pulsars, J17130747, J19093744, B185509, J00300451, J06130200, and J10125307. Here we use the same noise parameters as mentioned above for our fixed-noise search. Even though these estimates are different from the true noise parameters, we nonetheless achieve a log-Bayes factor of 27.4 for the fixed-noise search (a log-Bayes factor greater than 5 is considered decisive evidence). However, as we mentioned earlier, we should not totally trust this level of evidence as it does not fully incorporate our uncertainty in the noise model. When we run an analysis where we allow the noise and GW parameters to vary simultaneously we only achieve a log-Bayes factor of 5.35. While still decisive, this search is much less sensitive to the GW; nonetheless, this search is the most robust and will be the real test as to whether or not one can trust a real GW detection candidate. Of course these results could change depending on the noise realization or GW parameter combinations. A more detailed study of this is warranted but beyond the scope of this paper. Nonetheless, the large spread of overall noise levels in modern PTAs will most likely make the confident detection of a continuous wave GW very difficult.

### 5.2. Search Results

First we will discuss the results of the -statistic search on the real 17-pulsar NANOGrav data. To carry out the analysis we have computed for many frequencies with Hz. These frequencies were chosen based on the fact that the approximate cadence is 2.5 weeks. The results of this search are shown in Figure 4 where the solid black line is the value at each frequency, the dotted, dash-dotted, and dashed lines are the value of corresponding to a 1.0%, 0.5%, and 0.1% FAP, respectively, where these values are calculated from Eq. (42). Furthermore if we maximize over frequencies then the total FAP, accounting for the trials factor is very nearly 1, indicating that we should accept the null hypothesis (no visible GW signal) with very high confidence.

We will now briefly discuss the results of both Bayesian searches. To carry out this analysis we have run our PTMCMC and computed the Bayes factors for each case. In the first case we allow the noise parameters and GW parameters to vary and explicitly compute the Bayesian evidence via thermodynamic integration for a model with a GW and noise and a model with just noise. In the second case, we fix the noise parameters to the maximum a-posteriori obtained from single pulsar analyses and only allow the GW parameters to vary. As mentioned above, the second case is not reliable since there is likely to be correlations between the GW and noise parameters; however, we give the results of both searches for completeness. As above, in the case of a true continuous GW signal we can get very different results from a fixed-noise versus a varying noise search. However, in our case the log-Bayes factor for searches with and without varying noise parameters is and , respectively, both indicating that there is no evidence for a continuous GW and a model consisting of noise is preferred. We further note that this is completely consistent with our frequentist analysis.

### 5.3. Upper Limits

In this section we will outline the procedures used to compute both the frequentist and Bayesian upper limits on the strain amplitude, . First we wish to make state that an upper limit on the strain amplitude does not mean that we would have detected a signal with that amplitude with 95% confidence, it simply means that the true value of the amplitude is less than the upper limit with 95% probability (this probability measures the frequency of measuring that value of the amplitude in the frequentist case and the degree of belief in the true value of the amplitude in the Bayesian case). In the following sections we will discuss the mathematics of upper limit computation in the frequentist and Bayesian frameworks, and then we will lay out our computational procedure.

#### 5.3.1 Frequentist Approach

From a frequentist viewpoint, the data are random while the signal parameters are fixed but unknown (i.e., we construct probability distributions for the data, or rather some function of the data, given a set of signal parameters), whereas in the Bayesian framework the data are fixed and the signal parameters are uncertain (i.e., we construct probability distributions of the signal parameters given a dataset). From the above statement it then follows that frequentist upper limits are derived from integrating the probability distribution of some statistic of the data (the -statistic in this case) at a fixed value of the parameter of interest, and Bayesian upper limits are derived from integrating the probability distribution of the parameter of interest for the given data set.

More formally, the probability distribution of the -statistic given a value of the strain amplitude is

 p(Fp|h0)=∫p(Fp|h0,~λ,n)p(~λ)p(n)d~λdn, (49)

where is a reduced parameter space vector, is the sampling distribution of (these sampling distributions are identical to the prior probability distributions in the bayesian case), is a noise timeseries drawn from the distribution

 p(n)=1√det2πCexp(−12nTC−1n), (50)

with the covariance matrix of the noise in the pulsar timing residual timeseries, and is the probability distribution function for the statistic for given values of and and a given noise realization . An upper limit on at confidence level is then computed as

 α=∫∞Fp,0p(Fp|h0)dFp=⟨1NN∑i=1{1ifFp,i≥Fp,00otherwise}⟩, (51)

where the observables are drawn from the “signal distribution”, , and the average, , is over that distribution. In other words, we integrate the probability distribution of the -statistic over the so called “signal space” (i.e., from the measured value to infinity), that is, we count the number of signal realizations that gives an -statistic value larger than the one measured in the actual dataset. This integral can take on any value for a given ; therefore, the integral must be repeated with different values of until for a upper limit.

In practice, we carry out the following computational procedure:

1. Measure the value from the real 17-pulsar NANOGrav dataset as described in Section 4.3.

2. Simulate a synthetic noise vector for each pulsar, where is the Cholesky decomposition of the noise covariance matrix , and is a unit variance, zero-mean, vector.

3. Choose strain amplitude, and construct a GW waveform for each pulsar where the parameters, are drawn from the distribution .

4. Construct a new set of residuals for each pulsar , where is the so called fitting projection matrix introduced in Demorest et al. (2013) and Ellis et al. (2013).171717We choose to create residuals with the matrix rather than re-fitting the timing model with tempo2 in order to simulate many datasets quickly. We have done many tests to make sure that we get the same results using both the matrix and using a full tempo2 run.

5. Now measure the value for the simulated dataset.

6. Repeat steps 2–5 10,000 times and measure the number of realizations that result in .

7. Repeat steps 2–6 with different values of until 95% of simulations result in .

In the remainder of the paper we will choose to compute upper limits on the strain amplitude as a function of GW frequency or GW sky location at a fixed GW frequency. To facilitate such upper limits we simply fix the parameters (either GW frequency or sky location) when simulating waveforms in step 3.

#### 5.3.2 Bayesian Approach

As mentioned above, in the Bayesian framework we do not rely on simulations as we treat the data as fixed and integrate the posterior pdf of the parameter of interest to compute the upper limit. In principle, a Bayesian upper limit is much more simple and intuitive than a frequentist upper limit. To compute a Bayesian upper limit we compute an integral that is analogous to Eq. (51)

 α=∫hup0dh0∫d~λd→ϕp(δt|h0,~λ,→ϕ)p(h0)p(~λ)p(→ϕ)=∫hup0dh0p(δt|h0)p(h0), (52)

where is the likelihood function, , , are the prior probability distributions on , , and , respectively, where denotes the noise model parameters. In words, we simply integrate the marginalized posterior distribution of until the desired credible region corresponding to a probability of is reached at . As in the frequentist case, we want upper limits on the strain amplitude as a function of GW frequency or sky location. In this case we simply fix the parameters and then marginalize over the others. In practice, to compute the Bayesian upper limits we carry out a separate MCMC run for fixed values of frequency and/or fixed sky locations and then compute the 95% upper limit for each. The choice of prior on is very important and can lead to very different upper limits. Such a detailed analysis of priors is beyond the scope of this work but will be addressed in a future paper. In principle, our prior distribution should come from population synthesis models (Sesana, 2013); however, since we wish our upper limits to be informed by our data and not dominated by our prior distribution we use a very conservative181818On a logarithmic scale this prior prefers higher strain values a priori; however, it is conservative in the sense that the corresponding upper limit will not overestimate our sensitivity and the limit will not depend on the lower bound of the prior as is the case for logarithmic priors. prior that is uniform in with .

#### 5.3.3 Sky Averaged Strain Upper Limits

In Figure 5 we report the 95% upper limits on the strain amplitude, , as a function of GW frequency computed using the methods described above for the frequentist and Bayesian pipelines. The gray(red), thick black(blue) and thin black(purple) curves are the 95% upper limits on strain amplitude computed using the -statistic, Bayesian method with fixed-noise values, and Bayesian method with varying noise values, respectively. There are several features in the plot that require explanation. First, the decrease in sensitivity at and is due to the sky position and parallax fitting in the timing model, respectively. The upward trend at lower frequencies is due to the quadratic spin-down model fit. The noisiness of the frequentist upper limit is due to the fact that -statistic distribution at higher frequencies is indeed quite noisy when computed using the real data, and since our upper limits compare the value measured in real data to values measured in simulated data, this noisiness is to be expected.

If we compare our results to those of Yardley et al. (2010), we see that the upper limits using the 5-year NANOGrav datasets are a factor of 2 to 3 times more constraining. The main reason for this improvement is the higher timing precision of the NANOGrav dataset as compared to the older PPTA data sets (Verbiest et al., 2009). Although the procedures for setting frequentist and Bayesian upper limits is quite different, our results are very similar. In part, this is due to the fact that we have used a uniform prior on the strain amplitude, , making our Bayesian analysis very similar to a pure likelihood analysis. Since the -statistic is just the likelihood (ratio) maximized over amplitudes, we would expect a likelihood analysis to give similar results. Note that the Bayesian upper limits when varying the noise parameters are somewhat less constraining than the fixed-noise case. This is to be expected since at lower frequencies the GW amplitude and red noise amplitude are somewhat correlated and at higher frequencies the GW amplitude and jitter parameter are somewhat correlated. Both correlations will result in slightly worse upper limits on the GW amplitude when allowing the noise parameters to vary.

In Figure 5, the dashed curves indicate lines of constant chirp mass for a source with a distance to the Virgo cluster (16.5 Mpc) and chirp mass of and , respectively and the gray(green) squares are the strain amplitude of the loudest GW events in 1000 Monte Carlo realizations using an optimistic phenomenological model of Sesana (2013). The model used here produces a stochastic GW background with dimensionless strain amplitude of , just below the current upper limits presented in Shannon et al. (2013). astrophysically, these upper limits tell us that we can essentially rule out any source with at the distance to the Virgo cluster (16.5 Mpc); however, our horizon distance falls just short of the Virgo cluster for sources with . Furthermore, we see that all sources from monte-carlo realizations (gray(green) squares) have strain amplitudes below our upper limits indicating that it is very unlikely that we will see a resolvable source at the current sensitivity (consistent with our search results). It is important to note; however, that these strain amplitude upper limits are averaged over sky location and inclination angle (either through marginalization in the Bayesian case, or from Monte Carlo sampling in the frequentist case), both of which play a large part in the overall amplitude of the signal. Therefore, these results have the caveat that they make statements about the average sensitivity to such GW sources; however, it is still unlikely (i.e., probability of detection ) that we could detect even the loudest optimally oriented source shown in Figure 5. For face on systems (i.e., ) and sky location near the best timed pulsars, the overall amplitude of the GW can be 5 times larger than the averaged strain amplitudes reported here.

#### 5.3.4 Angular Upper Limits

In Figures 6 and 7 we report the 95% lower limit on the luminosity distance as a function of sky location computed using the frequentist and Bayesian techniques, respectively. We have chosen to present our results in terms of the luminosity distance instead of the strain amplitude as it is a true physical parameter and it gives a more intuitive feel as to what the data can constrain. To compute this lower limit we carry out the same procedure as above but we fix the frequency to Hz and compute an upper limit on the strain amplitude as a function of sky location; we can then use Eq. (17) to convert an upper limit on strain amplitude into a lower limit on luminosity distance.

The values in the color bar are calculated assuming a chirp mass of and a frequency of Hz but this can be scaled to determine the minimum luminosity distance for any chirp mass value and GW frequency. In Figures 6 and 7 the white diamonds represent the locations of the 17 NANOGrav pulsars used in the analysis and the black(white) stars are the sky locations of potential GW hotspots (Simon et al., 2013) and possible GW source candidates (Valtonen et al., 2008; Iguchi et al., 2010; Ju et al., 2013).

We will now discuss the features of this sky-dependent upper limit computed using the frequentist -statistic. Firstly, we notice that the overall distribution is quite similar to the antenna pattern response (i.e., ) as is to be expected in the case of no detection. Due to this, we are most sensitive (larger lower limit on luminosity distance) at sky locations near the best timed pulsars (i.e., J1713+0747, B1855+09, J1909-3744) and least sensitive in the opposite direction. More quantitatively, we note that in the most sensitive areas of the sky we can constrain the luminosity distance Mpc for . Furthermore, it is possible to constrain the luminosity distance Gpc in the most sensitive sky locations if we consider chirp mass sources. It should be noted that the Bayesian fixed-noise search gives nearly identical results to the fixed-noise frequentist search.

We now move to the sky-dependent upper limit computed using the full Bayesian technique where the GW and noise parameters are varied simultaneously. The first observation that we make is that the overall scale is about a factor of 2 lower than the fixed-noise frequentist or Bayesian upper limit. At first this may be surprising given the general agreement of the sky-averaged upper limits of Figure 5; however, full Bayesian sky-dependent upper limits exacerbate the problem of relatively few pulsars contributing to the overall PTA sensitivity as shown in Figure 2. Another difference in this upper limit, as opposed to the frequentist upper limit, is that it does not quite match the expected antenna pattern response function. These differences are due to the fact that we are simultaneously varying the GW and noise parameters, and when only one or a few pulsars contribute to the PTA sensitivity, there is a degeneracy between intrinsic red-noise processes in the pulsar and a common GW among all pulsars. In other words, it is very difficult to distinguish between a low-frequency continuous GW and a red noise process if only a small number of pulsars have sufficiently low noise levels to resolve the GW.

Because Bayesian upper limits marginalize or integrate over all parameters except the amplitude, the correlations between the GW and the red noise amplitude will broaden the 1-d pdf of the amplitude and thus will result in larger upper limits as opposed to the fixed-noise case. As is clear from Figure 7, the aforementioned effect is very strong for GW sky locations near our best timed pulsars. For example, we are not most sensitive to GWs around the sky location of PSR J17130747 because this pulsar contributes a very large percentage of the overall SNR of the GW in this case and thus results in a very large correlation between the GW and red noise amplitudes.

Since, at the moment, we have no way of measuring the noise properties of the pulsars independently of any GWs that may be present in the data, to perform a completely robust upper limit or search we must allow both to vary simultaneously. Given this reality, we must view any fixed-noise results with the caveat that they assume that the noise parameters are measured perfectly and are independent of any GWs in the data.

Unfortunately, many of the GW hotspots and potential SMBHB sources are located at insensitive sky locations, for both frequentist and Bayesian analyses, where our lower limit on distance only allows us to constrain sources. This fact is a great argument for aggressive pulsar search campaigns and the addition of new pulsars to the PTA at sky locations that are currently insensitive (Burt et al., 2011).

#### 5.3.5 Constraints on the SMBHB Coalescence Rate

A non-detection of continuous GW, as we have presented here, allows us to compute an upper limit on the rate of SMBH coalescences using methods presented in Wen et al. (2011). Since we have made no detections, we assume Poisson statistics for the probability of an event (i.e., a detectable signal) occurring, that is, the probability of no events is , where is the expected number of events. We use this probability distribution function to place a 95% upper limit on the expected number of events such that , telling us that . Therefore, if the expected number of events were greater than 3, at least one source would have been detected with 95% probability. Now, following Wen et al. (2011), the expected number of events is

 ⟨N⟩=∫d2Rdlog10(1+z)log10(Mr)(dfdt)−1×Pd(Mr,z,f)dlog10(1+z)dlog10(Mr)df, (53)

where is the probability of detecting an SMBHB with chirp mass , redshift , and observed GW frequency . Following the derivation in Wen et al. (2011) and making the assumption that the differential coalescence rate does not vary significantly over the range and , it is possible to show that

 d2Rdlog10(1+z)log10(Mr)<15∫(dfdt)−1Pd(Mr,z,f)df. (54)

In order to compute the detection probability , we make use of the statistic. We use the same method that we have used for the upper limits, except now we compare the the value of computed using simulated data with injections to a specified threshold based on a FAP of . We use 10,000 realizations at each value of and . After the probability of detection is computed, we numerically integrate the above expression to obtain a limit on the differential coalescence rate. It should be noted that we will be able to place more meaningful constraints on the coalescence rate using upper limits on the amplitude of a stochastic background of SMBHBs; however, this is beyond the scope of this work and will be addressed in a future paper.

In Figure 8 we plot our constraints on the differential coalescence rate as a function of redshift. Since we have made the assumption that this differential coalescence rate does not vary significantly over an order of magnitude in chirp mass, the results presented here are for the case. We are unable to place meaningful constraints on less massive systems. The light gray(red) shaded area is constructed using the model presented in Jaffe & Backer (2003) along with measurements from the Sloan Digital Sky Survey (Wen et al., 2009). The medium gray(blue) shaded area is constructed by considering the different galaxy merger rates based on observations (Sesana, 2013) along with the most recent MBH-sigma relation from McConnell & Ma (2013). The dashed line comes from an a posteriori implementation of the McConnell & Ma (2013) MBH-sigma relation into the semi-analytic model of Guo et al. (2011) assuming accretion onto both SMBHs before merger. The black(green) shaded region is constructed by using the observed evolution of the galaxy mass function combined with the MBH-M-stars relation from McConnell & Ma (2013) to calibrate an analytical model for evolving the mass function via mergers (McWilliams et al., 2012). The figure shows that the coalescence rate for MBHs of is poorly constrained. This is mostly because of the steepness of the galaxy mass function at such high masses: a small change in the slope results in a large variation in the sparse population of black holes. The intrinsic scattering (e.g. Gültekin et al., 2009) and poor knowledge of the behavior of the MBH-galaxy relations at the high mass end (e.g. Lauer et al., 2007) add further uncertainties, making the coalescence rate estimate problematic. As is clear from the figure, we are unable to place any constraints on the physical models mentioned above; however, as our GW sensitivity improves with time, we will begin to place meaningful constraints on physical models pertaining to the coalescence rate of SMBHBs.

## 6. Conclusions

In this paper we have performed various searches for continuous GWs from non-spinning SMBHBs in circular orbits using both frequentist and Bayesian techniques. Specifically, we have run a fixed-noise frequentist and Bayesian pipeline, as well as a varying noise Bayesian pipeline. In the absence of any detections we have placed upper limits on the strain amplitude of continuous GWs as a function of GW frequency. We have also computed a lower limit on the distance to such SMBHBs as a function of sky location, as well as placing constraints on the differential coalescence rate of such SMBHBs. Our sky-averaged upper limits on strain amplitude as a function of frequency are a factor of times more constraining than the previously published upper limits (Yardley et al., 2010) and we see good agreement between all three data analysis methods. Although improving, our limits still lie well above the amplitudes of individual sources produced from several realizations of an optimistic SMBHB population. We have shown that with good estimates of the intrinsic noise we can rule out any sources with luminosity distance Gpc and a chirp mass of . Unfortunately we are not yet able to place any constraints on predictions for the coalescence rate of SMBHBs obtained from both theory and observations.

Throughout the paper we have made several statements about what is needed for completely robust data analysis techniques and what will be required from future PTAs in order to secure a confident detection of a continuous GW. These statements can be summarized as follows:

1. Currently we have no way to confidently separate intrinsic noise in the residuals from any GW that may be present. Therefore, it is necessary to include both noise and GW parameters in any data analysis pipeline that aims to be truly robust. This is not to say that fixed-noise methods should not be used; instead we advocate a hierarchical approach where the faster fixed-noise methods are used as a first-pass and then followed up with a full GW plus noise search. Lastly, a signal with more information, such as that from an eccentric system, could help break this degeneracy between signal and noise models and will be the subject of a future paper.

2. Even with simultaneous noise and GW characterization, unless we have several well timed pulsars (with very similar timing precision on all) with decent sky coverage, a confident detection of a continuous GW is unlikely even if the signal is loud.

While not as likely as a detection of a stochastic GW background, with continually improving timing precision, the addition of new pulsars to PTAs and improved data analysis techniques, prospects are good for obtaining astrophysically constraining GW limits, or possibly even a detection of a continuous GW, over the next decade.

The authors would like to thank Neil Cornish and Stephen Taylor for many useful discussions regarding the data analysis methods presented in this work. The NANOGrav project receives support from the National Science Foundation (NSF) PIRE program award number 0968296. All computational work was performed on the Nemo cluster at UWM supported by NSF grant number 0923409. NANOGrav research at UBC is supported by an NSERC Discovery Grant and Discovery Accelerator Supplement. XS and JE were partially funded through an NSF CAREER award number 0955929. JE was partially funded through the Wisconsin Space Grant Consortium. MV was supported by the Jet Propulsion Laboratory RTD program. Parts of this work were performed at JPL under contract with the National Aeronautics and Space Administration. Data for this project were collected using the facilities of the National Radio Astronomy Observatory and the Arecibo Observatory. The National Radio Astronomy Observatory is a facility of the NSF operated under cooperative agreement by Associated Universities, Inc. The Arecibo Observatory is operated by SRI International under a cooperative agreement with the NSF (AST-1100968), and in alliance with Ana G. Méndez-Universidad Metropolitana, and the Universities Space Research Association. Work at Cornell University was supported by NSF PIRE award 0968126. R.vH. is supported by NASA Einstein Fellowship grant PF3-140116.

## Appendix A Gravitational Wave Frequency Evolution

Since GWs will radiate power away from a SMBHB source, to compensate for this loss of energy the orbital separation must decrease with time. Equivalently, though Kepler’s third law; GW radiation will cause the orbital frequency to increase with time. By setting the power radiated in GWs equal to the change of orbital energy due to increasing orbital frequency, , we obtain

 ˙ω=965M5/3ω11/3, (A1)

where is the chirp mass and is the orbital frequency of the binary system (note ). We can now use this expression to analytically solve for the orbital frequency as a function of time

 ∫tt0dt=596M−5/3∫ω(t)ω(t=t0)dωω−11/3t−t0=5256M−5/3(ω−8/30−ω(t)−8/3)∴ω(t)=ω0(1−2565M5/3ω8/30(t−t0))−3/8, (A2)

where is the time at which the first measurement was made on earth and is the initial orbital frequency. For a circular orbit, we define the phase to be

 dΦdt=ω. (A3)

We can solve this equation similarly

 ∫Φ(t)Φ(t=t0)dΦ=∫tt=t0dt′ω(t′)Φ(t)−Φ0=∫ω(t)ω(t=t0)dωω˙ω=596M−5/3∫ω(t)ω(t=t0)dωω−8/3∴Φ(t)=Φ0+132M5/3(ω−5/30−ω(t)−5/3), (A4)

where, again, .

Eqs. A2 and A4 are true in general and can be applied when the frequency evolves appreciably over the total observing time. However, it is very useful to work under the assumption of slowly evolving binaries where , with the observing time and

 Tchirp=ω0˙ω=3.2×105yr(M108M⊙)−5/3(f01×10−8Hz)−8/3. (A5)

Since typical PTA observations are on the order of 10 – 20 years and , this is a safe assumption for a broad range of masses and initial orbital frequencies of interest. With this approximation we can write the orbital frequency and phase for the earth term simply as

 Φe(t) =Φ0+ω0(t−t0