A Transdimensional Bayesian Approach to Pulsar Timing Noise Analysis
Abstract
The modeling of intrinsic noise in pulsar timing residual data is of crucial importance for Gravitational Wave (GW) detection and pulsar timing (astro)physics in general. The noise budget in pulsars is a collection of several well studied effects including radiometer noise, pulsephase jitter noise, dispersion measure (DM) variations, and low frequency spin noise. However, as pulsar timing data continues to improve, nonstationary and nonpowerlaw noise terms are beginning to manifest which are not well modeled by current noise analysis techniques. In this work we use a transdimensional approach to model these nonstationary and nonpowerlaw effects through the use of a wavelet basis and an interpolation based adaptive spectral modeling. In both cases, the number of wavelets and the number of control points in the interpolated spectrum are free parameters that are constrained by the data and then marginalized over in the final inferences, thus fully incorporating our ignorance of the noise model. We show that these new methods outperform standard techniques when nonstationary and nonpowerlaw noise is present. We also show that these methods return results consistent with the standard analyses when no such signals are present.
pacs:
I Introduction
Pulsar timing has been used as a tool to probe important aspects of astrophysics and fundamental physics including the indirect detection of gravitational waves (GWs) Hulse and Taylor (1975), the precise determination of neutron star masses (Demorest et al., 2010), precise tests of General Relativity and alternative theories of gravity Stairs (2003), constraints on equationofstate physics, and several other areas Lorimer (2008). Furthermore, the goal of modern PTAs is the direct detection of lowfrequency gravitational waves (GWs) which could usher in a new era of astronomy and astrophysics Lommen (2015).
As pulsar timing becomes more precise, increasingly robust and sophisticated noise modeling will be needed in order to make (astro)physical inferences from the data. In the last several years noise modeling has become an active area of research within the pulsar community (Cordes and Shannon, 2010; Shannon and Cordes, 2010; van Haasteren and Levin, 2013; Keith et al., 2013; Lentati et al., 2014; Cordes et al., 2015; Lam et al., 2015a, b) and has proven crucial to current timing and GW detection efforts (Arzoumanian et al., 2014; The NANOGrav Collaboration et al., 2015; Zhu et al., 2015; Lentati et al., 2015a; Arzoumanian et al., 2015; Babak et al., 2016; Reardon et al., 2016). However, even with current sophisticated methods there are still several areas in which our current noise models are inadequate:

We currently have no model for nonwhite nonstationary noise features which could result from a transient instrumental or astrophysical source.

We have no robust way of determining the validity or the goodness of fit of our lowfrequency rednoise modeling.
In this work we seek to address these two problems by using a transdimensional approach in which a variable number of wavelets model the nonwhite, nonstationary features and a variable number of interpolation control points determine the spectrum of the stationary red noise. Inferences on any parameters of interest (i.e. timing model, GW parameters, etc.) are then marginalized over all possible models (i.e. marginalized over the number of wavelets and/or control points) thereby incorporating our uncertainty in the noise model itself as opposed to the standard practice of choosing one model and incorporating the uncertainty from the parameters within that model. There is precedent for this type of analysis in other GW experiments, namely the BayesWave Cornish and Littenberg (2015) and BayesLine Littenberg and Cornish (2015) algorithms used in LIGO/Virgo analyses, both of which serve as a starting point for this work.
The paper is organized as follows: In Section II we review current Bayesian data analysis techniques use in pulsar timing. In Sections III and IV we introduce the nonstationary and adaptive spectral modeling techniques. In Section V we review the Bayesian algorithms used to implement our transdimensional models. In Section VI we test our analysis on simulated data and compare against traditional techniques. Finally we conclude in Section VII.
Ii Standard Pulsar Timing Data Analysis
The basic data analysis techniques used in this work are very similar to that presented in The NANOGrav Collaboration et al. (2015); Arzoumanian et al. (2015). Here we will briefly review the formalism and set up the likelihood function for this work.
ii.1 Likelihood
Pulsar timing data consists of a set of timesofarrival (TOAs) or the radio pulses and a timing model. Our timing residuals (difference of measured and modeled TOAs) for a single pulsar can be broken into individual components as follows:
(1) 
The term describes inaccuracies in the subtraction of the timing model, where is called the timing model design matrix, and is a vector of small timing model parameter offsets. The term describes all lowfrequency signals, including lowfrequency (“red”) noise, with a limited number of Fourier coefficients . Our harmonics are chosen as integer multiples of the harmonic base frequency , with the length of our dataset . The matrix then has alternating sine and cosine functions. The term describes noise that is fully correlated for simultaneous observations at different observing frequencies, but fully uncorrelated in time. The matrix is an matrix that maps the observation sessions to the TOAs. The vector describes the white noise per observation session that is fully correlated across all observing frequencies. The term is simply any other deterministic feature that we choose to include in our model. The last term, , describes Gaussian white noise that is assumed to remain in the data, after correcting for all known systematics. The white noise is assumed to be an uncorrelated and Gaussian as is described in The NANOGrav Collaboration et al. (2015); Arzoumanian et al. (2015).
We define the noisemitigated timing residuals as
(2) 
where is our best approximation of , given our knowledge of all the noise and signal parameters. The likelihood can now be written
(3) 
where is the covariance matrix of the white noise. We have collectively denoted all parameters not directly represented by , , and as . We group the linear signals as follows:
(4) 
which allows us to elegantly place a Gaussian prior on the coefficients of these random processes. The prior covariance is:
(5) 
resulting in a prior:
(6) 
where is a diagonal matrix of infinities, which effectively means we have a uniform unconstrained prior on the timing model parameters . As described in The NANOGrav Collaboration et al. (2015), this representation allows us to analytically marginalize Eq. (3) times Eq. (6) over the waveform coefficients resulting in the marginalized likelihood
(7) 
with . The Woodbury matrix identity (Woodbury, 1950) can be used to evaluate Eq. (7) efficiently. Notice that this is nearly identical to the likelihood of The NANOGrav Collaboration et al. (2015); Arzoumanian et al. (2015) except that now we are including the additional nonlinear deterministic feature .
The parameters that describe are the hyperparameters . The hyperparameters of the diagonal matrix are the perbackend TEMPO2 ECORR parameters. The matrix represents the spectrum of the lowfrequency noise Denoting frequency bin with , we can write:
(8) 
where is the noise power spectral density (PSD) at frequency .
ii.2 Bayesian analysis
Bayesian inference is a method of statistical inference in which Bayes rule of conditional probabilities is used to update one’s knowledge as observations are acquired. Given a model , model parameters , and observations , we write Bayes rule as:
(9) 
where is the posterior probability (probability of obtaining the model parameters given the data and model ), is the likelihood function (probability of obtaining the data given the model parameters and model ), is the prior probability of the model parameters given a model , and is the marginal likelihood or the Bayesian evidence, usually denoted by .
The lefthand side of Eq. (9) can be regarded as the “output” of the Bayesian analysis, and the righthand side is the “input” modulo the evidence term. Indeed, provided we have a generative model of our observations (meaning we can simulate data, given the model parameters), we know the likelihood and prior. However, for parameter estimation we would like to know the posterior, and for model selection we need the evidence.
For parameter estimation, the evidence is usually ignored, and one can use directly to map out the posterior distribution (up to a normalizing constant) and to estimate confidence intervals. When is higherdimensional, MonteCarlo sampling methods are typically used to perform this multidimensional integral. We use Markov Chain Monte Carlo methods in this work to sample the posterior distribution.
Model selection between two models and can be carried by calculating the “Bayes factor”: the ratio between the evidence for the two models. Assuming we have a prior degree of belief of how likely the two model are ( and ), we can write:
(10) 
where is the Bayes factor, and is the odds ratio. The odds ratio can be obtained by calculating the evidence for each model separately (e.g. with Nested Sampling or thermodynamic integration), or by calculating the Bayes factor between two models directly (e.g. with transdimensional markov chain Monte Carlo methods).
Iii Nonstationary Noise Model
To model nonstationary noise events we use a nonorthogonal frame of MorletGabor wavelets with a functional form
(11) 
where is the width of the gaussian envelope, is the amplitude, and are the central time and frequency of the wavelet and is a phase offset. This frame is simply a matter of choice and could be replaced by any other basis, such as shapelets (Refregier, 2003). However, this basis is well suited to modeling nonstationary transient events in our residual data because they are localized in both time and frequency and their shape in timefrequency space can adapt to fit the signal. Thus, to model nonstationary noise events we use a collection of independent wavelets where the number of wavelets, , is a free parameter that must be constrained by the data. In principle, this analysis is similar to that used in Lentati et al. (2015b); Lentati and Shannon (2015) where shapelets are used to model the pulse profile; however, in this case the number of wavelets is free to vary, where in the shapelet analysis separate runs must be done in order to find the number of shapelets with the highest Bayes factor.
As can be seen from Eq. (11), the MorletGabor wavelet is a function of five parameters, thus this analysis includes an additional over the standard gaussian noise model. Our RJMCMC implementation will therefore balance the goodnessoffit for the likelihood with the model complexity (i.e., prior volume) of the additional wavelets. Getting the correct balance of the two aforementioned effects is a conceptually and computationally difficult problem that depends heavily on the choice of priors on the wavelet parameters and on the number of wavelets.
iii.1 Wavelet Priors
As we will see, the priors on the wavelet amplitude are quite important in balancing goodnessoffit with the model complexity. While the priors on the other parameters will likely have some effect on our transdimensional model, for this work we use uniform priors on , , , and with ranges , with weeks, , , where , and are the start and end times of the residuals timeseries, and . Here we will only focus on choosing a good amplitude prior that in effect will limit the number of wavelets used in the RJMCMC. When doing standard fixed dimension noise analysis it is standard to use a loguniform prior on the amplitude of deterministic sources unless we have some apriori information; however, in this transdimensional model a loguniform prior will lead to many wavelets being used that essentially just explore the prior and contribute little to the signal being modeled. As is used in the BayesWave algorithm (Cornish and Littenberg, 2015), a more reasonable prior is one that is based on the SNR of the wavelet
(12) 
where is the wavelet waveform and is the full covariance matrix from Eq. (7). In practice, computing the SNR as defined above carries nearly the same computational cost as that of computing the likelihood. Furthermore, to use the SNR in a prior we will need to compute it for every wavelet used in the model. Therefore, it is computationally undesirable to use the full SNR but to instead use an SNR proxy that is computationally efficient. For this work we choose a simple white noise inner product
(13) 
where is the diagonal white noise matrix and is the amplitudefree waveform. In the simulations presented later in this paper we have found that this overestimates the true SNR by a factor of two in many cases; however this proxy is very fast to compute and scales similarly with the other wavelet parameters as the true SNR. Using this SNR proxy, we parameterized the signal in terms of as
(14) 
where is now the free parameter. As was done in Cornish and Littenberg (2015) we would like to use a parameterized prior on that peaks at some characteristically detectable value and slowly decreases as gets large; however, since we are using an SNRproxy the choice of the peak in the distribution is problem specific and is very difficult to determine without some apriori knowledge. For this reason we have chosen a simple uniform prior of for this work. Finally we choose a uniform prior on the number of active wavelets, . We have experimented with various exponential priors on but we have found that these have little effect and the choice of wavelet amplitude prior is the dominant factor in constructing a parsimonious model. Further consideration of these prior choices will be addressed in a future work.
Iv Adaptive Spectral Modeling
As mentioned in Section I, current noise analysis methods rely on specifying a model apriori and then performing Bayesian model selection to determine the best model. However, in many cases several different spectral models seem to describe the data equally well. This uncertainty in the model and the computational complexity of performing several different runs to determine the Bayesian evidence has led us to develop a modified version of the BayesLine algorithm (Littenberg and Cornish, 2015) to model the red noise in our residual data. Here we model the power spectral density (PSD)
(15) 
for . We have chosen to carry this interpolation out in the log frequencyPSD space in order to facilitate numerical stability and in order to fit the fewest parameters. In the course of this analysis we have tried other interpolation methods, including cubic splines as are used in BayesLine (Littenberg and Cornish, 2015) in both log and linear frequencyPSD space. It was found that our pulsar timing residual data does not have enough frequency evolution to warrant a more sophisticated interpolation technique. We note that even though we may not have a free control point at every frequency we do in fact include all frequencies in the analysis differentiating this method from those of Lentati et al. (2014) in which various combinations of frequencies were used in order to determine the socalled “optimal” model.
In principle, one could allow the frequencies at which we place the control points to be free parameters but this could lead to many ordering and double counting problems. Instead we place the frequencies on a regularly spaced grid with in the range , where is the number of frequencies. In our case we use unless stated otherwise. We assume uniform priors on both the logarithm of the PSD amplitude, , and on the number of active control points, The control points at the lowest and highest frequencies are always active, thus the simplest spectrum is parameterized by two parameters in this setup. Therefore, this analysis encapsulates both the standard powerlaw and free spectrum analyses which are parameterized by 2 and parameters, respectively. In practice, during the RJMCMC, control points can be subtracted and added with varying amplitudes. This means that the data decide the shape and complexity of the spectrum instead of prescribing a model apriori.
V Bayesian Techniques
v.1 Reverse Jump MCMC
The main driver of this transdimensional noise modeling technique is the RJMCMC algorithm, a variant of traditional MCMC where the model itself is included the explored parameter space. Traditionally RJMCMC has been used as a model selection tool by computing the Bayes factor of competing models as the ratio of the iterations spent in each model. In our analysis it is used more as a tool to perform a type of Bayesian model averaging. In other words, we are more interested in the ability of the RJMCMC to map out the transdimensional distribution and provide marginalized distributions over the full modelspace than we are interested in choosing a specific model via the resulting Bayes factors of the RJMCMC.
In the RJMCMC framework we must include a separate Metropolis Hasting step that proposes the transdimensional move. The parameters of model are drawn from the proposal distribution . Since our model is nested (i.e. there are shared parameters between models), we hold all other parameters fixed while the new parameters are drawn from the proposal distribution. The criteria for accepting this transdimensional jump is the Hastings ratio
(16) 
where the Jacobian accounts for any change in dimension across models. However, if our jump proposals yield parameter values directly instead of a set of random numbers that are then used to determine the new model parameters, then the Jacobian can be neglected. Choosing a good proposal distribution is notoriously difficult and is the main drawback in using RJMCMCs. A simple choice for a RJ proposal distribution is a random draw from prior. If the models are nested and the difference in dimension between models is relatively low then this prior draw usually provided adequate mixing.
As such, we only use prior draws for our RJ proposals in this work, for both the wavelet parameters and interpolation control points. Furthermore, we always propose to either add or subtract one component (i.e., one wavelet or one control point) and never more. In principle we could choose to add several wavelets or control points at one time but it is unlikely to be accepted. To add a new wavelet, we simply draw all 5 wavelet parameters from the prior (using either the SNR or the loguniform prior for the amplitude). To subtract a wavelet, we randomly choose between the active wavelets at that iteration. The technique is very similar for control points. To add a new control point, we first randomly choose a frequency at which to place the control point and then draw from the loguniform prior to choose the amplitude. To subtract a control point we randomly choose one from the active control points. When modeling both wavelets and control points simultaneously, we always hold one fixed and jump in the other, that is, we never propose to add or subtract wavelets and control points simultaneously. Of course, more complicated and more efficient proposals could be made for both the wavelet and spectral models and this will be addressed in the future; however, for this work we find that these random uniform draws provide quite adequate mixing.
v.2 Parallel Tempering
Although uniform transdimensional proposals provide adequate mixing, the transdimensional acceptance rates are quite low (typically 6% for the wavelet model). This means that we have to run for many iterations in order to adequately explore the full transdimensional model space. In order to ameliorate this problem we use a parallel tempering method introduced in Cornish and Littenberg (2015) that allows different chains to be in different models, thus jumps between different temperature chains are also transdimensional jumps. These parallel tempering moves greatly enhance the inter and intramodel mixing. Currently we use a geometric temperature spacing that is tuned to give 40% acceptance for temperature swaps to the coldest chain. In future versions of the code we will incorporate a more sophisticated adaptive temperature spacing as was introduced in Vousden et al. (2015).
Vi Results
Here we test our nonstationary and adaptive noise modeling techniques. In all cases we have used the time stamps and timing model for PSR B185509 from the NANOGrav 9year data release (The NANOGrav Collaboration
et al., 2015) and inject white gaussian noise consistent with the TOA uncertainties using libstempo
vi.1 BayesWavePTA
Here we test our nonstationary noise modeling techniques described in Section III. As mentioned above we test this technique on a simulated dataset of pulsar B185509 that contains an injected white noise burst with amplitude of ns and a width of 250 days. A white noise burst is simply a realization of noise from a white PSD with a gaussian window that is localized in time. We have analyzed these data with the nonstationary model using both a uniform SNR and a uniform logamplitude prior. In both cases we allow up to 30 wavelets with a uniform prior on the number of wavelets. The results of this analysis are shown in Figures 1 and 2.
In the top panels of Figures 1 and 2 we have the dailyaveraged residuals plotted in blue, the injected waveform in green, the MAP waveform (chosen from the model with the highest Bayes factor) in red, and the 90% uncertainty on the waveform (marginalized over all models) in gray. The bottom left plot we show the utility as a function of number of wavelets. Here utility is simply the ratio of the number of iterations spent in a model with a given number of wavelets to the total number of iterations in the RJMCMC run. In the bottom right, we plot the histogram of the normalized residuals (normalized by the TOA uncertainties) when including (blue) and not including (red) the MAP wavelet model. The green curve is a zeromean unit variance Gaussian distribution.
First, we point out that although this “event” seems quite large, it is quite comparable to a real event in the published data of PSR B185509 (see e.g. Figure 27 of The NANOGrav Collaboration et al., 2015). Next, we see that for both choices of amplitude prior we recover the injected waveform very well and recover the Gaussian nature of the underlying white noise shown in the bottom right panels. Most important, however, is the number of wavelets that the data prefer under the two different amplitude priors and how this effects the uncertainty in the recovered waveform. In the case where we have used a uniform prior on the SNR of the wavelet we see from the bottom left panel of Figure 1 that the data prefer 6 wavelets and can support up to 11 wavelets, albeit with a much lower utility (utility is proportional to the Bayesian evidence). Furthermore we see from the wavelet model uncertainty (marginalized over all wavelet models) in gray in the top panel of the figure that the wavelet model only contributes to modeling the white noise burst and not any other features in the data.
Alternatively, for the loguniform amplitude prior we see that the data prefer wavelets but can support resulting in very broad spread in the number of wavelets that are allowed by the data. Furthermore, we see from the waveform uncertainty at early times in the reconstructed waveform (gray shaded area of the top panel of Figure 2) that many wavelets are being placed far away from the white noise burst. The fact that this prior allows for many more wavelets that are effectively sitting below the white noise level can be understood by the fact that there is quite a large prior volume at low amplitudes since a uniform prior in is proportional to a prior on the amplitude of . In contrast, the uniform SNR prior is much more similar to a uniform amplitude prior. Thus, apriori the loguniform prior prefers low amplitude wavelets whereas the uniform SNR prior prefers high amplitude wavelets.
As mentioned above we want to choose a prior on the wavelet amplitude so that we use the minimum number of wavelets to model the signal. As we have seen from this example, a uniform SNR prior performs admirably in this respect whereas a loguniform amplitude prior fails this test.
In Figure 3, we perform the analysis without the wavelet model using standard noise models with a powerlaw (top panel) and freespectrum (bottom panel) red noise parameterization.
In comparison with Figure 1 we see that while the waveform is accurately recovered in both cases, the uncertainty in this inferred waveform is much larger than in our transdimensional wavelet model. Most importantly we note that the uncertainty is very large in the regions where the injected waveform is zero. This is mostly due to the fact that both the powerlaw and freespectral models are constrained to be a realization of a stationary gaussian process. Furthermore we point out that the large uncertainty and periodic structure seen in the uncertainty region for the freespectrum model is due to covariances between the power spectral amplitudes and the sky location and parallax terms in the timing model. This feature is unavoidable in such an unconstrained model. Finally, in these simulations we have only included white noise and the white noise burst, in such cases where there is steep red noise and transient behavior, it is likely that standard methods would perform much worse.
Lastly, we show that this kind of noise analysis could be crucial for placing tight constraints or making a detection of a stochastic GW background. In Figure 4 we plot the marginalized posterior distribution of the dimensionless GW strain amplitude using the wavelet model above and the standard powerlaw red noise model, plotted in blue and green, respectively. For this simulation we use the same data that was analyzed above.
In both cases we model the red noise via a powerlaw and include an additional noise term with a fixed spectral index (13/3) corresponding to a stochastic GW background of SMBHBs. For the wavelet model we allow up to 30 wavelets. As we see from Figure 4, we can constrain the amplitude of a potential stochastic GW background significantly better when we include the additional wavelets in the noise model. Specifically, the upper 95% upper limits on the GWB amplitude for the wavelet and nonwavelet model are and , – a factor of 3.5 improvement. As our PTA data becomes more precise over time, this additional noise term could prove critical in either detecting or constraining the stochastic GW background.
vi.2 BayesSpecPTA
Here we will test the adaptive spectral modeling techniques of Section IV on three cases that we call the null, intermediate, and extreme cases. For each simulation we will recover the spectrum using the adaptive technique, a standard powerlaw, and free spectral components.
Null Case
For the null case, we inject a realization of a gaussian process that follows a pure powerlaw distribution with a power spectral index of as one would expect from a gravitationally wave driven isotropic stochastic background from a population of SMBHBs in circular orbits. We have dubbed this the null case since the spectrum is fully described by two parameters and this is the simplest model that the adaptive spectral modeling technique can achieve. In Figure 5 we show the results of our analysis on this null case. The middle panel shows the posterior probability density of the recovered spectrum from our adaptive method. The solid black lines are the 90% credible intervals and the median value. The dashed line is the PSD of the injected noise process. The top panel shows the utility of the various control points as a function of frequency. Here we define the utility as the ratio of the number of iterations of the RJMCMC that a given control point was active to the total number of iterations. In this case we see that none of the additional control points was active for a significant number of iterations. The bottom panel again shows the posterior probability density of of the recovered spectrum using the powerlaw model.
The gray points and error bars are the median and the 90% confidence interval on each spectral component using the free spectral technique. From Figure 6 we can see that in this case the adaptive technique does indeed favor no additional control points (i.e. the spectrum is parameterized by two parameters). Furthermore we see from Figure 5 that the recovered spectrum is very similar for the adaptive and powerlaw models. The posterior is slightly broader in the adaptive case due to the fact that we allow for more than two spectral components and although the data favors only two, it is clear from Figure 6 that the data does not heavily disfavor more than two components. Furthermore, the free spectral model is less constraining in the lowest frequency component and does not constrain other frequencies significantly.
Intermediate Case
For the intermediate case we have injected a realization of a gaussian process that follows a broken powerlaw distribution. In this case we see in Figure 7 that the spectrum is well recovered with the adaptive technique and that the data strongly prefer an additional control point at the turnover frequency in the spectrum. Furthermore, we see from Figure 8 that, overall, the data prefers to describe the spectrum by three parameters. This is exactly the behavior that we wish to see in that the most parsimonious model to describe a broken powerlaw contains three parameters. Meanwhile, the bottom panels of Figure 7 shows that the powerlaw noise model significantly overestimates the low frequency noise and the free spectral model loosely models the power around the turnover in the spectrum. In fact, the powerlaw model does not even recover the spectrum within its 90% credible interval, thus failing basic Bayesian consistency. This case really demonstrates the power of the adaptive technique in which it correctly identifies the most parsimonious model containing three parameters while still incorporating the uncertainty associated with more or less components, all while producing an accurate reconstruction of the power spectral density.
Extreme Case
For the socalled extreme case we have injected a realization of a gaussian process that follows a distribution with two distinct peaks. In this case we see in Figure 9 that the spectrum is well recovered with the adaptive technique and that the high frequency peak is clearly distinguishable while the low frequency peak is not as constrained. In this case, as can be seen in both Figures 9 and 10, the model complexity is in strong competition with the goodnessoffit in that data nearly equally prefer either a shallow powerlaw (2 parameters) that models the power at the peaks of the PSD but also the frequencies in between, or the more complex spectrum that models both the peaks and valleys of the PSD. These features show the utility of this kind of analysis in that one does not have to apriori choose a model for the spectrum and try to find the best model from the data but instead we allow the data to constrain the model while marginalizing over our uncertainty in that model. For comparison, from the bottom panel of Figure 9 we see that, because of the rigidness of the powerlaw model, it overestimates the power at low frequencies and underestimates the power at the high frequency peak. Furthermore, we see that even the free spectral method does not significantly constrain the PSD at either peak.
vi.3 Combination of Both Methods
In this section we combine both the wavelet and adaptive signal modeling in one large RJMCMC where the number of wavelets and the number of control points are both free to vary. In principle, this method should be near optimal in that we allow the data to decide the complexity of the nonstationary and stationary noise processes simultaneously; however, in practice it can be difficult to distinguish between PSDs with significant high frequency power and transient signals. Therefore, in many cases, although we are able to recover the actual waveform in the data, it is difficult to separate out the stationary and nonstationary components. Nonetheless, because neither the wavelet modeling nor adaptive spectral modeling can optimally account for all noise sources alone, we recommend this combined approach.
To test this combined method we once again simulate a data set with low frequency red noise with a broken power spectrum (black dashed line in Figure 11) and a white noise burst. We have used a uniform SNR prior for the wavelet amplitudes and loguniform prior for the amplitude of the interpolation control points. The results of this analysis are shown in Figure 11. In the top panel we plot the daily averaged residuals (blue) along with the injected (green) and MAP (red) red noise and transient noise signal realization and the corresponding 90% credible region in gray. The middle panel shows the posterior probability of the recovered PSD along the injected PSD (dashed black line). The bottom panel shows the utility (same definition as above) of the number of wavelets and number of spectral components used in the model.
First, we see that the model recovers the injected waveform very well through a combination of 5 wavelets and a broken power spectrum. Although this is a different realization of the same white noise burst used above the complexity of the wavelet model is directly comparable to that of Figure 1. Furthermore, we see the spectrum is recovered fairly well but there is still a large spread in which many spectral components could be used. This is due to the covariances between the wavelet model and the spectral model and is likely unavoidable unless we have some aprior knowledge of the transient and/or the stationary noise spectrum.
Vii Discussion and Conclusions
As the precision of PTAs increases, sophisticated and robust noise modeling techniques are quickly becoming one of the most important aspects in the pulsar timing process. In this paper we have presented a transdimensional noise modeling approach that lets the data decide the complexity of the model as opposed to traditional methods of choosing a specific model prior to analysis. Furthermore, this analysis does not seek to determine the “best” model but instead captures our inherent uncertainty in the noise model by marginalizing over a large range of models. In this first analysis we have modified the transient signal modeling and adaptive spectral techniques first developed in the LIGO context (Cornish and Littenberg, 2015; Littenberg and Cornish, 2015) for use in the PTA regime. To model nonstationary transient noise events we use a sum of MorletGabor wavelets and to model our stationary red noise PSD we use a set of control points for linear interpolation on a fixed frequency grid. In both cases, the number of wavelets and control points are free to vary and final inferences are made by marginalizing over the full transdimensional model. Through simulations, we have shown that these methods perform better than standard methods both when applied separately and together. Furthermore we have shown that in the presence of strong nonstationary noise, the implementation of these transdimensional techniques could result in significantly more constraining upper limits on a stochastic GW background.
While the application in this work has been on achromatic (with to respect to radio frequency) single pulsar analysis, these methods could also be applied to dispersion measure (DM) modeling techniques and other radiofrequency dependent noise sources. Furthermore, this type of analysis could also be applied to multipulsar GW analysis both for noise modeling and for GW detection of unmodeled signals or as a cross check to other modeled GW searches.
Acknowledgements.
JAE acknowledges support by NASA through Einstein Fellowship grant PF4150120. NJC was supported by NSF Physics Frontiers Center Award PFC1430284. Portions of this research were carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. This work was supported in part by National Science Foundation Grant No. PHYS1066293 and by the hospitality of the Aspen Center for Physics where this work was initiated.Footnotes
 When we refer to the PSD here we are referring only to the rednoise component and not the white component of the spectrum.
 https://github.com/vallis/libstempo
References
 R. A. Hulse and J. H. Taylor, 195, L51 (1975).
 P. B. Demorest, T. Pennucci, S. M. Ransom, M. S. E. Roberts, and J. W. T. Hessels, Nature (London) 467, 1081 (2010), eprint 1010.5788.
 I. H. Stairs, Living Reviews in Relativity 6, 5 (2003), eprint astroph/0307536.
 D. R. Lorimer, Living Reviews in Relativity 11, 8 (2008), eprint 0811.0762.
 A. N. Lommen, Reports on Progress in Physics 78, 124901 (2015), URL http://stacks.iop.org/00344885/78/i=12/a=124901.
 J. M. Cordes and R. M. Shannon (2010), eprint 1010.3785.
 R. M. Shannon and J. M. Cordes, Astrophys. J. 725, 1607 (2010), eprint 1010.4794.
 R. van Haasteren and Y. Levin, 428, 1147 (2013), eprint 1202.5932.
 M. J. Keith, W. Coles, R. M. Shannon, G. B. Hobbs, R. N. Manchester, M. Bailes, N. D. R. Bhat, S. BurkeSpolaor, D. J. Champion, A. Chaudhary, et al., 429, 2161 (2013), eprint 1211.5887.
 L. Lentati, P. Alexander, M. P. Hobson, F. Feroz, R. van Haasteren, K. J. Lee, and R. M. Shannon, MNRAS 437, 3004 (2014), eprint 1310.2120.
 J. M. Cordes, R. M. Shannon, and D. R. Stinebring, ArXiv eprints (2015), eprint 1503.08491.
 M. T. Lam, J. M. Cordes, S. Chatterjee, and T. Dolch, Astrophys. J. 801, 130 (2015a), eprint 1411.1764.
 M. T. Lam, J. M. Cordes, S. Chatterjee, M. L. Jones, M. A. McLaughlin, and J. W. Armstrong, ArXiv eprints (2015b), eprint 1512.02203.
 Z. Arzoumanian, A. Brazier, S. BurkeSpolaor, S. J. Chamberlin, S. Chatterjee, J. M. Cordes, P. B. Demorest, X. Deng, T. Dolch, J. A. Ellis, et al., Astrophys. J. 794, 141 (2014), eprint 1404.1267.
 The NANOGrav Collaboration, Z. Arzoumanian, A. Brazier, S. BurkeSpolaor, S. Chamberlin, S. Chatterjee, B. Christy, J. M. Cordes, N. Cornish, K. Crowter, et al., Astrophys. J. 813, 65 (2015), eprint 1505.07540.
 W. W. Zhu, I. H. Stairs, P. B. Demorest, D. J. Nice, J. A. Ellis, S. M. Ransom, Z. Arzoumanian, K. Crowter, T. Dolch, R. D. Ferdman, et al., Astrophys. J. 809, 41 (2015), eprint 1504.00662.
 L. Lentati, S. R. Taylor, C. M. F. Mingarelli, A. Sesana, S. A. Sanidas, A. Vecchio, R. N. Caballero, K. J. Lee, R. van Haasteren, S. Babak, et al., ArXiv eprints (2015a), eprint 1504.03692.
 Z. Arzoumanian, A. Brazier, S. BurkeSpolaor, S. Chamberlin, S. Chatterjee, B. Christy, J. Cordes, N. Cornish, P. Demorest, X. Deng, et al., ArXiv eprints (2015), eprint 1508.03024.
 S. Babak, A. Petiteau, A. Sesana, P. Brem, P. A. Rosado, S. R. Taylor, A. Lassus, J. W. T. Hessels, C. G. Bassa, M. Burgay, et al., MNRAS 455, 1665 (2016), eprint 1509.02165.
 D. J. Reardon, G. Hobbs, W. Coles, Y. Levin, M. J. Keith, M. Bailes, N. D. R. Bhat, S. BurkeSpolaor, S. Dai, M. Kerr, et al., MNRAS 455, 1751 (2016), eprint 1510.04434.
 N. J. Cornish and T. B. Littenberg, Classical and Quantum Gravity 32, 135012 (2015), eprint 1410.3835.
 T. B. Littenberg and N. J. Cornish, Phys. Rev. D 91, 084034 (2015), eprint 1410.3852.
 M. A. Woodbury, Memorandum report 42, 106 (1950).
 A. Refregier, MNRAS 338, 35 (2003), eprint astroph/0105178.
 L. Lentati, P. Alexander, and M. P. Hobson, MNRAS 447, 2159 (2015b), eprint 1412.1427.
 L. Lentati and R. M. Shannon, MNRAS 454, 1058 (2015), eprint 1509.07505.
 W. Vousden, W. M. Farr, and I. Mandel, ArXiv eprints (2015), eprint 1501.05823.