# Cyclic spectroscopy of the millisecond pulsar B1937+21

###### Abstract

Cyclic spectroscopy is a signal processing technique that was originally developed for engineering applications and has recently been introduced into the field of pulsar astronomy. It is a powerful technique with many attractive features, not least of which is the explicit rendering of information about the relative phases in any filtering imposed on the signal, thus making holography a more straightforward proposition. Here we present methods for determining optimum estimates of both the filter itself and the statistics of the unfiltered signal, starting from a measured cyclic spectrum. In the context of radio pulsars these quantities tell us the impulse response of the interstellar medium and the intrinsic pulse profile. We demonstrate our techniques by application to 428 MHz Arecibo data on the millisecond pulsar B1937+21, obtaining the pulse profile free from the effects of interstellar scattering. As expected, the intrinsic profile exhibits main- and inter-pulse components that are narrower than they appear in the scattered profile; it also manifests some weak but sharp features that are revealed for the first time at low frequency. We determine the structure of the received electric-field envelope as a function of delay and Doppler-shift. Our delay-Doppler image has a high dynamic range and displays some pronounced, low-level power concentrations at large delays. These concentrations imply strong clumpiness in the ionized interstellar medium, on AU size-scales, which must adversely affect the timing of B1937+21.

###### Subject headings:

methods: data analysis — pulsars: general — pulsars: individual (PSR B1937+21) — ISM: general — scattering^{†}

^{†} Submitted to ApJ, 16th January 2013

## 1. Introduction

Although radio pulsars emit intrinsically broad-band radiation, spectroscopy of these sources often reveals a great deal of narrow-band structure (e.g. Rickett 1990). This structure arises during propagation of the signal through the interstellar medium (ISM), where it is scattered by inhomogeneities in the ionized gas — it is interference between the various scattered waves which creates the observed fringes. Consequently high resolution spectroscopy of pulsars has proved to be a powerful tool for investigating the ISM (Roberts and Ables 1982; Cordes and Wolszczan 1985; Stinebring et al 2001).

Traditionally pulsar spectroscopy is undertaken by forming the power-spectrum of the signal in a pulse-phase window where the flux is high (i.e. “on-pulse”), and subtracting the power-spectrum from a window where the flux is low (“off-pulse”), so as to remove the steady, background power level. Recently Demorest (2011) has drawn attention to an alternative approach, known as cyclic spectroscopy, in which one measures the modulation of the spectrum across the entire pulse profile. Cyclic spectroscopy was developed in engineering disciplines for studying signals whose statistics are periodically modulated (Gardner 1992; Antoni 2007). Signals of this type are common and are referred to as cyclo-stationary. The electric field received from a radio pulsar can be thought of as periodically-amplitude-modulated noise (Rickett 1975), so radio pulsars provide an example of a signal which is cyclo-stationary.

As described by Demorest (2011), cyclic spectroscopy has several advantages over the simpler method of differencing on-pulse and off-pulse power spectra. Periodic amplitude modulation of the pulsar’s radio-frequency noise, introduced by rotation of the pulsar beam, splits the received signal into upper- and lower-sidebands. By construction, the cyclic spectrum is the product of the lower sideband with the complex conjugate of the upper sideband. It is thus a complex quantity and as such it explicitly manifests information about the phase of any filtering which has occurred prior to reception. For radio pulsars observed at low frequencies the dominant filtering is due to the ISM — specifically, to dispersion and scattering of the waves. Thus a time-domain representation of the filter is, to a good approximation, just the impulse-response of the ISM.

In the present paper we show how to determine the filter given a measured cyclic spectrum. We also show how to determine the intrinsic cyclic spectrum of the signal — in other words the (Fourier Transform of the) pulse profile which would have been observed in the absence of any scattering or dispersion. These determinations are both made in the narrow-band approximation, appropriate to our data, where there is assumed to be no variation of the intrinsic cyclic spectrum across the observed radio-frequency band. Our main dataset is a 4 MHz bandwidth voltage recording, centered on 428 MHz, of the original millisecond pulsar, B1937+21 (Backer et al 1982), made with the Arecibo radio telescope.^{1}

^{1} The Arecibo Observatory is operated by SRI International under a cooperative agreement with the National Science Foundation (AST-1100968), and in alliance with Ana G. Méndez-Universidad Metropolitana, and the Universities Space Research Association.

As far as we are aware the methods presented in this paper are the first attempts to determine both the filter and the intrinsic cyclic spectrum for any astronomical signal. It is possible that our techniques may be useful in fields other than pulsar astronomy, but we do not attempt to identify appropriate fields. Rather we encourage readers to consider applications in other contexts. To aid that process we note here the requirements for validity of our approach: first, the signal must be cyclo-stationary – i.e. stationary at each phase in its cycle – in order for the cyclic spectrum to be well-defined. Secondly, our least-squares fitting assumes that the intrinsic cyclic spectrum is just white-noise that is periodically amplitude-modulated, so non-pulsar applications of our techniques are limited to signals which can be described in this or similar fashion. And, finally, the filter must not change significantly within the averaging time over which each cyclic spectrum is constructed. In addition to these requirements, the stopping criterion we employ for our optimizations is based on the assumption of Gaussian noise; but it would be straightforward to modify that criterion. We note that source code is freely available for all the software used herein (see §5), so readers are free to adapt our code to their purpose.

This paper is organised as follows. In the next section we give some background to the particular problems tackled in this paper. Then in §3 we show how to determine the filter function and the intrinsic cyclic spectrum by direct construction. In §4 we consider the issue of optimization — i.e. obtaining representations of these quantities which best fit the measured cyclic spectrum. In doing so we see that our direct estimate of the intrinsic profile, given in §3, is in fact the optimum estimate in a least-squares sense. But §4 does highlight deficiencies in our direct approach to the filter function; so for this quantity we utilize a large-scale optimization of the filter coefficients. Our implementation of this optimization is coded in the language “C” and is freely available; it is described in §5. In §6 we present results obtained by applying our methods to low-frequency data on PSR B1937+21; both filter functions and intrinsic pulse profiles are presented. Discussion (§7) and Conclusions (§8) round out the paper. Two Appendices detail (i) the results of various tests we used to evaluate the code, and (ii) an analysis of the uncertainties in best-fit parameters.

## 2. Background and general considerations

Procedures for constructing the cyclic spectrum itself, from a set of recorded voltages, are given by Demorest (2011). We begin our development by quoting the relationship between a signal, $x(t)$, a function of time, with Fourier Transform $X(\nu)$, and the cyclic spectrum of that signal, $S_x(\nu;\alpha)$. At modulation frequency $\alpha$ we have (Gardner 1992; Antoni 2007; Demorest 2011)

$$S_x(\nu;\alpha) = \left\langle X(\nu+\tfrac{\alpha}{2})\,X^*(\nu-\tfrac{\alpha}{2})\right\rangle_t \qquad (1)$$

where the time-average is taken over integer multiples of the period of the system. Thus if we apply a filter, $H(\nu)$, such that the filtered signal is $Y(\nu) = H(\nu)\,X(\nu)$, then the cyclic spectrum of the filtered signal is

$$S_y(\nu;\alpha) = H(\nu+\tfrac{\alpha}{2})\,H^*(\nu-\tfrac{\alpha}{2})\,S_x(\nu;\alpha). \qquad (2)$$

In the case of a radio pulsar the signals are just electric fields, and the frequency $\nu$ is the radio frequency. Filtering of the signal occurs as a result of propagation, notably dispersion and scattering in the ionized ISM, and in the process of reception, e.g. the bandpass filter. The filter resulting from interstellar propagation evolves on some time-scale, and the average in equation (1) must be restricted to times which are short compared to that evolution time-scale.
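The filtering relation above is an exact identity channel-by-channel, and can be checked numerically on a discrete grid. In the sketch below (all values are synthetic, and the integer indices `k` and `m` standing in for $\nu$ and $\alpha/2$ are a convention of this illustration, not of the paper) a single realization of the cyclic spectrum of a filtered signal is compared with the filtered form of the unfiltered cyclic spectrum:

```python
import cmath
import random
from math import pi

random.seed(1)
N = 16  # number of frequency channels

# Synthetic unfiltered spectrum X(nu_k) and filter H(nu_k): arbitrary values.
X = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
H = [random.uniform(0.2, 1.0) * cmath.exp(2j * pi * random.random())
     for _ in range(N)]
Y = [H[k] * X[k] for k in range(N)]  # filtered spectrum, Y = H X

def cyclic(S, k, m):
    """Single-realization cyclic spectrum, S(k + m) S*(k - m)."""
    return S[k + m] * S[k - m].conjugate()

# Check S_y(k; m) = H(k + m) H*(k - m) S_x(k; m) at every valid grid point.
for k in range(N):
    span = min(k, N - 1 - k)
    for m in range(-span, span + 1):
        lhs = cyclic(Y, k, m)
        rhs = H[k + m] * H[k - m].conjugate() * cyclic(X, k, m)
        assert abs(lhs - rhs) < 1e-12
print("filter relation holds on the full grid")
```

Because the identity holds realization-by-realization, it survives the time-averaging unchanged.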

Throughout this paper we confine attention to the case of small fractional radio bandwidths, for which we expect the intrinsic cyclic spectrum to be approximately independent of $\nu$:

$$S_x(\nu;\alpha) \simeq S_x(\alpha). \qquad (3)$$

The quantity $S_x(\alpha)$ is already familiar to astronomers from conventional analysis of radio pulsar signals: it is just the Fourier Transform of the pulse profile. But we emphasise that it is the transform of the intrinsic pulse profile, rather than the transform of the measured pulse profile — the difference being that the latter includes the influence of scattering and other contributions to the filter $H(\nu)$.

In general both $S_y(\nu;\alpha)$ and $S_x(\nu;\alpha)$ are complex quantities, but in the particular case $\alpha = 0$ we obtain the zero-modulation-frequency components of the filtered and unfiltered signals, respectively. As these are just the time-averaged power-spectra of the signals they are non-negative real numbers.

### 2.1. Degeneracies

Before extracting estimates from our data it is necessary to identify and eliminate any degeneracies in the model. Equation (2) shows that there are degeneracies which are multiplicative in form. Writing

$$H'(\nu) = G(\nu)\,H(\nu) \qquad (4)$$

we see that $S_{y'}(\nu;\alpha) = S_y(\nu;\alpha)$ if and only if

$$G(\nu+\tfrac{\alpha}{2})\,G^*(\nu-\tfrac{\alpha}{2})\,S_{x'}(\nu;\alpha) = S_x(\nu;\alpha). \qquad (5)$$

Thus if $G(\nu)$ and $S_{x'}(\nu;\alpha)$ are completely unconstrained then there may be a great deal of degeneracy between these quantities in our model of $S_y(\nu;\alpha)$: features seen in the data might be attributed to the intrinsic spectrum or to the effects of an imposed filter.

In circumstances where the intrinsic cyclic spectrum is independent of radio-frequency (equation 3), the scope of the degeneracy is limited to functions $G(\nu)$ such that $G(\nu+\alpha/2)\,G^*(\nu-\alpha/2)$ is independent of $\nu$. This condition should hold for all $\alpha$. In the case of small $\alpha$, by expanding to first order in $\alpha$, we see that the form of $G$ is restricted to those functions satisfying

$$\frac{\mathrm{d}|G|}{\mathrm{d}\nu} = 0 \qquad (6)$$

and

$$\frac{\mathrm{d}\,\arg(G)}{\mathrm{d}\nu} = \mathrm{constant}. \qquad (7)$$

Hence if we do not know the actual form of $S_x(\alpha)$, then the filter function can only be determined up to an arbitrary multiplicative factor of

$$G(\nu) = A\,\exp\left[\mathrm{i}\left(\phi_0 + 2\pi\nu t_0\right)\right] \qquad (8)$$

where $A$, $\phi_0$ and $t_0$ are real constants. In other words the overall normalization of $H(\nu)$, its phase and its phase gradient are all arbitrary, because the simultaneous transformation

$$H(\nu) \rightarrow A\,\mathrm{e}^{\mathrm{i}(\phi_0 + 2\pi\nu t_0)}\,H(\nu), \qquad S_x(\alpha) \rightarrow A^{-2}\,\mathrm{e}^{-2\pi\mathrm{i}\alpha t_0}\,S_x(\alpha) \qquad (9)$$

leaves $S_y(\nu;\alpha)$ unchanged.

If, however, $S_x(\alpha)$ is already known, from previous observations, then the only remaining degeneracy is the overall phase of $H(\nu)$. This phase is always arbitrary, as can be seen by noting that $\phi_0$ does not appear in the transformation of $S_x(\alpha)$ in equation (9).
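The degeneracy can be verified numerically: multiplying the filter by a constant amplitude and a linear phase, while compensating the intrinsic spectrum, leaves the model filtered cyclic spectrum unchanged at every point. A sketch follows; the toy two-path filter, the harmonic envelope, the constants, and the sign convention of the compensating phase are all arbitrary choices of this illustration:

```python
import cmath
from math import pi

# A toy two-path filter H(nu) and intrinsic spectrum S_x(alpha).
def H(nu):
    return 1.0 + 0.5 * cmath.exp(-2j * pi * nu * 3.0)  # direct + delayed path

def S(al):
    return cmath.exp(-(al / 4.0) ** 2)                 # smooth harmonic envelope

def S_y(Hf, Sf, nu, al):
    """Model filtered cyclic spectrum: H(nu+al/2) H*(nu-al/2) S_x(al)."""
    return Hf(nu + al / 2) * Hf(nu - al / 2).conjugate() * Sf(al)

# Degenerate transformation: H -> A e^{i(phi0 + 2 pi nu t0)} H together with
# S_x -> A^-2 e^{-2 pi i alpha t0} S_x  (constants chosen arbitrarily).
A, phi0, t0 = 1.7, 0.9, 0.25
H2 = lambda nu: A * cmath.exp(1j * (phi0 + 2 * pi * nu * t0)) * H(nu)
S2 = lambda al: S(al) * cmath.exp(-2j * pi * al * t0) / A ** 2

# S_y is unchanged at every (nu, alpha) sampled.
for knu in range(-10, 11):
    for kal in range(-5, 6):
        nu, al = knu * 0.1, kal * 0.5
        assert abs(S_y(H, S, nu, al) - S_y(H2, S2, nu, al)) < 1e-12
print("degenerate transformation leaves S_y unchanged")
```

The linear-phase part of the transformation is just a time shift of the impulse response, which is why a common timing reference must be imposed externally.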

### 2.2. Sampling

For a periodic modulation with period $P$, as is the case with signals from a radio pulsar, the cyclic spectrum is expected to be zero everywhere except at $\alpha_n = n/P$, where $n$ is an integer, so those are the only modulation frequencies which we sample. In practice the data are also sampled discretely in the radio-frequency dimension, so we have measurements on a grid, with spacing $\Delta\nu$, and which we are at liberty to choose. In choosing $\Delta\nu$ the primary consideration relates to structure in the filter function: if we wish to capture signal components which are delayed by times up to $\tau_{\rm max}$ then we need to have a resolution $\Delta\nu \le 1/(2\tau_{\rm max})$. One could choose the resolution to be finer than this but that would entail a greater computational load in constructing the cyclic spectrum.

There is a natural limit to the fineness of the spectral resolution set by $\Delta\nu = 1/P$, corresponding to delays $|\tau| \le P/2$, where $P$ is the pulse period. If there are signal components at delays greater than half the pulse period then the cyclic spectrum is intrinsically undersampled in $\nu$, because the modulation imposed by the filter function changes significantly on scales smaller than $1/P$.

On the other hand there is no difficulty in setting $\Delta\nu > 1/P$, providing that there are no significant signal components at delays greater than $1/(2\Delta\nu)$.

Although the cyclic spectrum is normally computed on a rectangular grid, values at large $\alpha$ and $\nu$ may not contain any information. If the voltage signal has a bandwidth $B$, sampled at the Nyquist rate, then the resulting cyclic spectrum is only valid within a diamond-shaped region around the origin, with $|\nu \pm \alpha/2| \le B/2$ (Demorest 2011). We also note that there cannot be more information in the cyclic spectrum than was present in the sampled voltage signal from which it was derived. Thus if the cyclic spectrum includes pulse harmonic numbers greater than the number of pulses averaged-over, then the pixels in the cyclic spectrum may not be statistically completely independent. Because of these limitations, the actual number of constraints provided by the data may be smaller than the number of grid points in the cyclic spectrum.

### 2.3. Noise and bias

The computed cyclic spectrum includes measurement noise which we can characterize in the following way. Suppose that the recorded voltage is $z(t) = y(t) + n(t)$, where $n(t)$ is the measurement noise; then we expect the measured cyclic spectrum to be

$$\left\langle S_z(\nu;\alpha)\right\rangle = S_y(\nu;\alpha) + S_n(\nu)\,\delta_{\alpha 0} \qquad (10)$$

where the delta-function appears because the measurement noise is stationary. Thus our measured cyclic spectrum is free of noise bias except at $\alpha = 0$.

Because modulation is the fundamental characteristic of pulsar radiation which allows it to be distinguished from measurement noise, estimating the unmodulated part of the cyclic spectrum, $S_y(\nu;0)$, from $S_z(\nu;0)$ is ambiguous. In this paper we therefore make no attempt to quantify $S_n(\nu)$, nor do we make direct use of $S_z(\nu;0)$ in our estimates of the signal properties $H(\nu)$ and $S_x(\alpha)$. In turn this means that we are giving up any possibility of determining $S_x(0)$, the zero-frequency term in the Fourier representation of the intrinsic pulse profile. We therefore adopt the convention $S_x(0) = 0$ in our models throughout the rest of this paper.

The actual data which we record, $D(\nu;\alpha)$, will differ from $\langle S_z(\nu;\alpha)\rangle$ because of measurement noise and because the signal itself is stochastic in nature. If there is no averaging (see discussion following equation 14) the variance of the measured cyclic spectrum is given by (Antoni 2007)

$$\mathrm{Var}\left[D(\nu;\alpha)\right] = S_z(\nu+\tfrac{\alpha}{2};0)\,S_z(\nu-\tfrac{\alpha}{2};0). \qquad (11)$$

At zero modulation frequency, we recover from equation 11 the familiar result for stationary signals that the variance of the unaveraged power is just the square of the mean power.

For observations of radio pulsars with current instrumentation, noise power is usually the dominant contribution to $S_z(\nu;0)$ and in this case we have

$$\mathrm{Var}\left[D(\nu;\alpha)\right] \simeq S_n(\nu+\tfrac{\alpha}{2})\,S_n(\nu-\tfrac{\alpha}{2}). \qquad (12)$$

If the measurement noise is white, as is often the case in practice, then equation 12 yields a uniform variance,

$$\mathrm{Var}\left[D(\nu;\alpha)\right] = \sigma^2 = S_n^2, \qquad (13)$$

over the entire cyclic spectrum. It is straightforward to estimate $S_n$, because at zero modulation frequency the cyclic spectrum is just a power spectrum. Thus the noise level is just

$$\sigma = \frac{S_{\rm sys}}{\sqrt{T\,\Delta\nu}} \qquad (14)$$

where $S_{\rm sys}$ is the system-equivalent-flux-density, $T$ is the integration time and $\Delta\nu$ the channel width. (Here we consider only a single polarization state, but clearly the results can be generalised to different combinations of polarization states.)

Equation 14 clarifies what is meant by the “no averaging” requirement immediately preceding equation 11. For cyclic spectroscopy of pulsars the natural choice of spectral resolution is $\Delta\nu = 1/P$, and we always have $T \ge P$, where $P$ is the pulse period. Thus for $T = P$ we have a time-bandwidth product of unity – a single sample of the signal – and $\sigma = S_{\rm sys}$. Equation 11 is then appropriate to a single pulse, and if the cyclic spectrum is averaged over $N_p$ pulses the variance is smaller by a factor $N_p$.

## 3. Direct construction of filter and profile

We now turn to the task of estimating the filter function (ISM impulse response) and the intrinsic (unscattered) pulse profile starting from a measured cyclic spectrum. We can approach both of these tasks by iteration, as we now describe.

### 3.1. Determining the filter function

Suppose we have a model for $S_x(\alpha)$, but we have incomplete knowledge of $H(\nu)$. If we know the value of $H$ at a single frequency, $\nu_1$, we can determine its value at nearby frequencies using the measured cyclic spectrum in the vicinity of $\nu_1$, thus:

$$H(\nu_1+\alpha) = \frac{D(\nu_1+\tfrac{\alpha}{2};\alpha)}{H^*(\nu_1)\,S_x(\alpha)}. \qquad (15)$$

We can make a better estimate of $H$ at a given frequency if we know several nearby values of $H$. Making the replacement $\nu \rightarrow \nu - \alpha/2$ in eq. (2), multiplying by $H(\nu-\alpha)\,S_x^*(\alpha)$ and summing yields

$$\hat{H}(\nu) = \frac{\sum_\alpha D(\nu-\tfrac{\alpha}{2};\alpha)\,H(\nu-\alpha)\,S_x^*(\alpha)}{\sum_\alpha \left|H(\nu-\alpha)\right|^2\left|S_x(\alpha)\right|^2} \qquad (16)$$

where we have used the data, $D(\nu;\alpha)$, as our estimate of $S_y(\nu;\alpha)$. This equation allows us to construct $H(\nu)$, in regions where it is unknown, from nearby regions where it has already been determined, providing only that we have formed an estimate of $S_x(\alpha)$. We note that equation (16) includes equation (15) as a special case where $H$ is known only at a single frequency.

Although the development in this section has focused on the idea of obtaining an estimate of $H(\nu)$ at frequencies where it is not known, it is clear that one could employ equation (16) even if we already have an estimate of $H$ for all frequencies, so it can also be viewed as a procedure for updating an existing model of $H(\nu)$. We will return to this idea in §3.3 and §4.
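A discrete sketch of the filter estimator just described follows. The indexing, in which the modulation frequency corresponds to an even number of channel offsets so that all arguments fall on the grid, is a convention of this illustration; the data are noiseless and synthetic, so the true filter should be an exact fixed point of the update:

```python
import random

random.seed(2)
N, Mh = 32, 4  # frequency channels; harmonics m = 1..Mh (alpha = 2m channels)

H_true = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
S = {m: complex(random.gauss(0, 1), random.gauss(0, 1))
     for m in range(-Mh, Mh + 1) if m != 0}

def data(k, m):
    """Noiseless D at channel k, harmonic m: H(k+m) H*(k-m) S(m)."""
    return H_true[k + m] * H_true[k - m].conjugate() * S[m]

def update_H(Hmod, k):
    """Direct estimate of H at channel k: a discrete sketch of the update."""
    num, den = 0j, 0.0
    for m in (mm for mm in range(-Mh, Mh + 1) if mm != 0):
        j = k - 2 * m                     # channel paired with k in D(k - m; m)
        if 0 <= j < N and 0 <= k - m < N:
            num += data(k - m, m) * Hmod[j] * S[m].conjugate()
            den += abs(Hmod[j]) ** 2 * abs(S[m]) ** 2
    return num / den

# With noiseless data and the true filter supplied on the right-hand side,
# the true filter is an exact fixed point of the update.
for k in range(N):
    assert abs(update_H(H_true, k) - H_true[k]) < 1e-9
print("true filter is a fixed point of the direct update")
```

With noisy data, or an imperfect model on the right-hand side, the same update moves the estimate rather than reproducing it, which is the iterative use made of it in §3.3.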

### 3.2. Determining the intrinsic spectrum

Now suppose that we have a model for $H(\nu)$, what then do the data tell us about $S_x(\alpha)$? Multiplying equation (2) by $H^*(\nu+\alpha/2)\,H(\nu-\alpha/2)$ and summing over $\nu$ gives

$$\hat{S}_x(\alpha) = \frac{\sum_\nu H^*(\nu+\tfrac{\alpha}{2})\,H(\nu-\tfrac{\alpha}{2})\,D(\nu;\alpha)}{\sum_\nu \left|H(\nu+\tfrac{\alpha}{2})\right|^2\left|H(\nu-\tfrac{\alpha}{2})\right|^2} \qquad (17)$$

where, again, we have used the data, $D(\nu;\alpha)$, as our estimate for $S_y(\nu;\alpha)$. Thus, given data and a model for the filter function, we can obtain an estimate of the intrinsic pulse profile implied by the observed cyclic spectrum. Notice that this formula implies a unique estimate of $S_x(\alpha)$ associated with any given pair $\{D, H\}$. We shall see in §4 that equation (17) is actually the optimum estimate of $S_x(\alpha)$, in a least-squares sense, given the data and the filter $H(\nu)$.
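In discrete form the profile estimator reads as below (again with my own indexing; with noiseless synthetic data and the true filter supplied, the intrinsic harmonics are recovered exactly):

```python
import random

random.seed(3)
N, Mh = 32, 4  # frequency channels; harmonics m = 1..Mh (alpha = 2m channels)

H = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
S_true = {m: complex(random.gauss(0, 1), random.gauss(0, 1))
          for m in range(1, Mh + 1)}

def data(k, m):
    """Noiseless D at channel k, harmonic m: H(k+m) H*(k-m) S(m)."""
    return H[k + m] * H[k - m].conjugate() * S_true[m]

def estimate_S(Hmod, m):
    """Estimate of the intrinsic spectrum at harmonic m, in discrete form."""
    num, den = 0j, 0.0
    for k in range(m, N - m):           # keep k +/- m on the grid
        num += Hmod[k + m].conjugate() * Hmod[k - m] * data(k, m)
        den += abs(Hmod[k + m]) ** 2 * abs(Hmod[k - m]) ** 2
    return num / den

for m in range(1, Mh + 1):
    assert abs(estimate_S(H, m) - S_true[m]) < 1e-9
print("estimator recovers the intrinsic harmonics")
```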

### 3.3. Bootstrap

From the foregoing we can see that it is straightforward to form an estimate of $H(\nu)$ given $S_x(\alpha)$, and vice versa. But initially we might not know either. In this situation it is natural to proceed iteratively, starting with crude estimates and then using equations (16) and (17) repeatedly to improve those estimates. One way of starting the process is to initialize the intrinsic cyclic spectrum to $S_x(\alpha) = \langle D(\nu;\alpha)\rangle_\nu$, i.e. the observed (scattered) pulse profile. This corresponds to the model $H(\nu) = 1$ and we could commence iteration of equations (16) and (17) using this approximation.

Alternatively, having specified our initial estimate of $S_x(\alpha)$ we can build up our estimate of $H(\nu)$ gradually, using equation (16), starting from an estimate of its value at a single frequency, $\nu_1$. Because the overall phase of $H$ is arbitrary (§2.1) we are free to choose the phase of $H(\nu_1)$, e.g. phase zero, so only $|H(\nu_1)|$ need be specified in order to start the iteration. One possible initialization is thus $H(\nu_1) = 1$, and from there we can gradually build $H(\nu)$ over the full range of radio frequencies, with information flowing outward from $\nu_1$ towards the edges of the band. In this approach one simply initializes $H(\nu)$ to zero for frequencies where no estimate has previously been made, so that those frequencies make no contribution to the estimator in equation (16).

Once this is done we can improve our estimate of the intrinsic cyclic spectrum, $S_x(\alpha)$, by application of equation (17), then we can get a better estimate of $H(\nu)$ by applying equation (16), and these iterations can be repeated. Thus, if we know neither $H(\nu)$ nor $S_x(\alpha)$, we can build bootstrap estimates for both of these quantities, given a measured cyclic spectrum.

The procedure just described is the method which we initially used to solve for and , from the first measured cyclic spectra of a radio pulsar (i.e. the data used in §5). Broadly speaking the method works: we found that it provided a good representation of much of the structure in the cyclic spectra, and the intrinsic profile was significantly narrower than the scattered profile (see figure 3 in Demorest, 2011). But it did also exhibit some deficiencies, as we describe below.

### 3.4. Deficiencies of the direct method

One problem which we anticipated is the difficulty of constructing $H(\nu)$ in regions where $|H|$ is small. In these regions the solution for $H$ is sensitive to noise in the data. In particular it is susceptible to phase jumps at points where $H(\nu) = 0$: the solutions on either side of the zero can be mutually inconsistent. There are two reasons why this problem arises. One is fundamental: a zero in $H$ is an absence of phase information at that particular point, and this cannot be overcome by using different methods of solution. The other reason is specific to the solution method we have presented: the summation in equation (16) includes information coming from both sides of the zero, so each side tries to rotate the phases of the other in order to bring about consistency, but neither side succeeds. In other words, phase discontinuities at zeros of $H(\nu)$ constitute traps for this method of solution. It is not necessary for $H$ to be precisely zero in order for a trap to form; it suffices for the signal-to-noise ratio to be low ($\lesssim 1$ on a per-channel basis). Trapping was indeed observed in the results we obtained using the approach described above, with significant residuals commonly occurring in the vicinity of points where $|H|$ is small.

It is clearly possible to modify the solution method so as to be less susceptible to these traps. Most obviously, one can restrict the summations in equation (16) to values of $\alpha$ with a single sign — so that we are only using the information from frequencies above (or below) $\nu$ in our estimate for $H(\nu)$. In this scheme information flows in only one direction across the zeros, so one side dictates phase to the other. In practice we observed that this modification did decrease the prevalence of trapping. However, in using only one sign of $\alpha$ we are ignoring half of the information available to constrain $H$ at any given value of $\nu$, so the resulting solution for $H(\nu)$ cannot be optimum. In the next section we present methods for obtaining the best fit solutions for $H(\nu)$ and $S_x(\alpha)$.

## 4. Optimum estimates of filter and profile

In estimating $H(\nu)$ and $S_x(\alpha)$ what we really want are the models which best fit the data, so we have an optimization problem. We introduce the residual between model and data:

$$r(\nu;\alpha) = D(\nu;\alpha) - H(\nu+\tfrac{\alpha}{2})\,H^*(\nu-\tfrac{\alpha}{2})\,S_x(\alpha) \qquad (18)$$

and we seek to minimize the magnitude of these residuals.

Suppose our data, $D(\nu;\alpha)$, have $N_\nu$ radio-frequency channels, and $N_\alpha$ modulation-frequency bins. In this case we are modelling a filter with $N_\nu$ complex unknowns, and an intrinsic cyclic spectrum with $N_\alpha$ complex unknowns. (The pulse profile is a real quantity, so the spectrum at negative modulation frequencies is simply the complex conjugate of that at positive frequencies.) Thus there are $N_\nu + N_\alpha$ complex unknowns and $N_\nu N_\alpha$ complex constraints provided by the data, so for $N_\nu, N_\alpha \gg 1$ the model is over-determined. In this situation we cannot make the residuals zero everywhere and we simply aim to make them small.

Here we follow the usual practice of minimizing the sum-of-squares of the residuals

$$M = \sum_{\nu,\alpha} \left|r(\nu;\alpha)\right|^2 \qquad (19)$$

with respect to all of the model parameters. We then have

$$\frac{\partial M}{\partial \theta} = 2\,\mathrm{Re}\sum_{\nu,\alpha} r^*(\nu;\alpha)\,\frac{\partial r(\nu;\alpha)}{\partial \theta} \qquad (20)$$

where $\theta$ represents any of the model parameters which define $H(\nu)$ and $S_x(\alpha)$, and minimization of $M$ implies

$$\frac{\partial M}{\partial \theta} = 0 \qquad (21)$$

for every parameter $\theta$.

We compute the derivative for each parameter in turn. Each value of $H(\nu)$ and $S_x(\alpha)$ is complex and thus involves two distinct real parameters. We take these to be the real and imaginary parts of the coefficients. For $H(\nu)$, $\theta \in \{\mathrm{Re}\,H(\nu),\ \mathrm{Im}\,H(\nu)\}$, we have

$$\frac{\partial M}{\partial\,\mathrm{Re}\,H(\nu)} = -2\,\mathrm{Re}\,\gamma(\nu), \qquad \frac{\partial M}{\partial\,\mathrm{Im}\,H(\nu)} = -2\,\mathrm{Im}\,\gamma(\nu) \qquad (22)$$

where

$$\gamma(\nu) \equiv \sum_\alpha r(\nu-\tfrac{\alpha}{2};\alpha)\,H(\nu-\alpha)\,S_x^*(\alpha). \qquad (23)$$

And for $S_x(\alpha)$, $\theta \in \{\mathrm{Re}\,S_x(\alpha),\ \mathrm{Im}\,S_x(\alpha)\}$, we have

$$\frac{\partial M}{\partial\,\mathrm{Re}\,S_x(\alpha)} = -2\,\mathrm{Re}\,\varsigma(\alpha), \qquad \frac{\partial M}{\partial\,\mathrm{Im}\,S_x(\alpha)} = -2\,\mathrm{Im}\,\varsigma(\alpha) \qquad (24)$$

where

$$\varsigma(\alpha) \equiv \sum_\nu r(\nu;\alpha)\,H^*(\nu+\tfrac{\alpha}{2})\,H(\nu-\tfrac{\alpha}{2}). \qquad (25)$$

Having determined a demerit function, $M$, and the gradient of $M$ with respect to each of the parameters of interest, we are in a position to apply one of various standard methods (e.g. Nocedal and Wright 1999) to the problem of optimizing our solutions. Before turning to the choice of method, and the details of its application, it is helpful to establish the relationship between our “direct” solutions of §3 and the optimum estimates which we are seeking.
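Gradient expressions of this kind can be validated against finite differences. The sketch below implements the demerit for a small synthetic problem and differentiates it with respect to the real part of one filter coefficient, both analytically (by the chain rule through each residual) and numerically; the discrete indexing and factor conventions are those of this illustration:

```python
import random

random.seed(4)
N, Mh = 16, 3  # channels; harmonics (alpha = 2m in channel units)

H = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
S = {m: complex(random.gauss(0, 1), random.gauss(0, 1))
     for m in range(-Mh, Mh + 1) if m != 0}
# Synthetic "data": model values plus perturbations, so residuals are non-zero.
D = {(k, m): H[k + m] * H[k - m].conjugate() * S[m]
             + complex(random.gauss(0, .1), random.gauss(0, .1))
     for m in S for k in range(abs(m), N - abs(m))}

def demerit(Hmod):
    """Sum over the grid of squared residual moduli."""
    return sum(abs(D[(k, m)] - Hmod[k + m] * Hmod[k - m].conjugate() * S[m]) ** 2
               for (k, m) in D)

def grad_reH(Hmod, q):
    """Analytic dM/d(Re H_q): contributions from every residual touching q."""
    acc = 0j
    for (k, m) in D:
        r = D[(k, m)] - Hmod[k + m] * Hmod[k - m].conjugate() * S[m]
        if k + m == q:                  # model depends on H_q here
            acc += r.conjugate() * Hmod[k - m].conjugate() * S[m]
        if k - m == q:                  # model depends on H*_q here
            acc += r.conjugate() * Hmod[k + m] * S[m]
    return -2 * acc.real

def grad_fd(q, eps=1e-6):
    Hp, Hm = H[:], H[:]
    Hp[q] += eps
    Hm[q] -= eps
    return (demerit(Hp) - demerit(Hm)) / (2 * eps)

for q in range(N):
    assert abs(grad_reH(H, q) - grad_fd(q)) < 1e-5 * (1 + abs(grad_fd(q)))
print("analytic gradient matches finite differences")
```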

### 4.1. Relationship of direct solution to least-squares

We have already noted (§3) that our “direct” procedure for constructing $H(\nu)$ – i.e. equation (16) – could also be regarded as an algorithm for updating $H(\nu)$, given an existing estimate. Explicitly, the update is $H(\nu) \rightarrow H(\nu) + \Delta H(\nu)$, where

$$\Delta H(\nu) = \frac{\sum_\alpha r(\nu-\tfrac{\alpha}{2};\alpha)\,H(\nu-\alpha)\,S_x^*(\alpha)}{\sum_\alpha \left|H(\nu-\alpha)\right|^2\left|S_x(\alpha)\right|^2}. \qquad (26)$$

We can also rewrite equation (17) as an update for the intrinsic spectrum, $S_x(\alpha) \rightarrow S_x(\alpha) + \Delta S_x(\alpha)$, with

$$\Delta S_x(\alpha) = \frac{\sum_\nu r(\nu;\alpha)\,H^*(\nu+\tfrac{\alpha}{2})\,H(\nu-\tfrac{\alpha}{2})}{\sum_\nu \left|H(\nu+\tfrac{\alpha}{2})\right|^2\left|H(\nu-\tfrac{\alpha}{2})\right|^2}. \qquad (27)$$

In both cases we recognise the numerator to be (up to a constant factor) just the gradient of $M$ with respect to the corresponding parameters. This is comforting because it suggests that our “direct” method is moving the estimates in a direction which will improve the model. To be confident that this is the case we need to gauge the step-size, not just its direction, and to achieve that it is helpful to evaluate the second derivatives of $M$.

The curvature of $M$ with respect to our various parameters is given by differentiating equations 23 and 25. The results are

$$\frac{\partial^2 M}{\partial\left(\mathrm{Re}\,S_x(\alpha)\right)^2} = \frac{\partial^2 M}{\partial\left(\mathrm{Im}\,S_x(\alpha)\right)^2} = 2\sum_\nu \left|H(\nu+\tfrac{\alpha}{2})\right|^2\left|H(\nu-\tfrac{\alpha}{2})\right|^2, \qquad (28)$$

$$\frac{\partial^2 M}{\partial\,\mathrm{Re}\,S_x(\alpha)\,\partial\,\mathrm{Im}\,S_x(\alpha)} = 0, \qquad (29)$$

and

$$\frac{\partial^2 M}{\partial\left(\mathrm{Re}\,H(\nu)\right)^2} = \frac{\partial^2 M}{\partial\left(\mathrm{Im}\,H(\nu)\right)^2} = 2\sum_\alpha \left|H(\nu-\alpha)\right|^2\left|S_x(\alpha)\right|^2, \qquad (30)$$

$$\frac{\partial^2 M}{\partial\,\mathrm{Re}\,H(\nu)\,\partial\,\mathrm{Im}\,H(\nu)} = 0. \qquad (31)$$

We can now see that for each of our real parameters, $\theta$, the “direct” estimate in §3 is an iteration with updates (equations 26, 27):

$$\theta \rightarrow \theta - \left(\frac{\partial M}{\partial \theta}\right)\Big/\left(\frac{\partial^2 M}{\partial \theta^2}\right). \qquad (32)$$

This form is just Newton’s method applied to each parameter separately. Equivalently: it is a simultaneous multi-parameter quasi-Newton method in which the off-diagonal elements of the Hessian are neglected.

We can check whether or not this is a good approximation to the actual Hessian by explicitly computing the off-diagonal terms. In the case where both parameters relate to $S_x(\alpha)$ these off-diagonal elements are all zero. Furthermore, because the diagonal terms (equation 28) are independent of $S_x$, all of the higher derivatives of $M$ with respect to $S_x$ are zero — the hypersurface of $M$ is quadratic in $S_x$ when $H$ is fixed. This is no surprise because the residual (equation 18) is linear in $S_x$, and $M$ is quadratic in the residual. It follows that Newton’s method yields an exact solution for $S_x(\alpha)$ in a single step. Thus we see that our direct estimate of $S_x(\alpha)$, given in equation (17), is also the least-squares solution appropriate to the filter $H(\nu)$ and the data $D(\nu;\alpha)$; no additional optimization steps are necessary.

Unfortunately this is not true of the filter function, $H(\nu)$: neither the off-diagonal elements of the Hessian nor the higher order derivatives are zero in this case. The fact that the off-diagonal terms of the Hessian are non-zero means that we should not expect the filter update (equations 16, 26) to yield a good model. We now turn to the problem of optimizing our model filter function.
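That single-step property is easy to verify numerically: with the filter held fixed, the estimate of equation (17) should sit at the exact minimum of the demerit for each harmonic. A sketch with synthetic noisy data (the indexing conventions are my own):

```python
import random

random.seed(7)
N, m = 24, 2  # channels; test a single harmonic (alpha = 2m channel units)

H = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
S_true = 0.8 - 0.3j
# Noisy data for this harmonic: model plus complex measurement noise.
D = {k: H[k + m] * H[k - m].conjugate() * S_true
        + complex(random.gauss(0, .2), random.gauss(0, .2))
     for k in range(m, N - m)}

def demerit(S):
    return sum(abs(D[k] - H[k + m] * H[k - m].conjugate() * S) ** 2 for k in D)

# Least-squares estimate of S_x at this harmonic, as in equation (17).
num = sum(H[k + m].conjugate() * H[k - m] * D[k] for k in D)
den = sum(abs(H[k + m]) ** 2 * abs(H[k - m]) ** 2 for k in D)
S_hat = num / den

# Any perturbation of S_hat increases the demerit: it is the exact minimizer.
for _ in range(50):
    dS = complex(random.gauss(0, .1), random.gauss(0, .1))
    assert demerit(S_hat + dS) >= demerit(S_hat)
print("the direct estimate minimizes the demerit in one step")
```

Because the demerit is exactly quadratic in each harmonic when the filter is fixed, no iteration is needed for the intrinsic spectrum; all of the optimization effort goes into the filter.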

### 4.2. Optimisation of the filter

To optimize our model filter we can employ one of the established quasi-Newton methods, in which an approximate (inverse-) Hessian is constructed at each iteration, based on the local properties of the hypersurface revealed in previous iterations (see, e.g., Nocedal and Wright 1999). A popular choice is the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update, and that is the method we employ in §5 and subsequently.

By utilizing a BFGS update to our estimate of $H(\nu)$ we expect to do significantly better than the update in equation 16. This is a general expectation, but it also applies to the particular problems noted in §3.4: by including the non-zero off-diagonal curvatures of $M$ we provide some of the information needed for the algorithm to escape the traps introduced by zeros in $H(\nu)$. However, we ought to be able to do better still if we do not actually seek to construct $H$ in frequency-space, where the traps are localised, but in the Fourier-space conjugate to frequency, i.e. lag-space.

We introduce the lag-space description of the filter, $h(\tau_j)$, which is related to the frequency-space description of the filter, $H(\nu_k)$, by the usual Fourier relationships for discretely sampled functions:

$$h(\tau_j) = \frac{1}{N_\nu}\sum_{k=0}^{N_\nu-1} H(\nu_k)\,\mathrm{e}^{2\pi\mathrm{i}jk/N_\nu} \qquad (33)$$

and

$$H(\nu_k) = \sum_{j=0}^{N_\nu-1} h(\tau_j)\,\mathrm{e}^{-2\pi\mathrm{i}jk/N_\nu}. \qquad (34)$$

To optimize our model filter in lag-space we need to know the gradient of $M$ with respect to the lag-space filter coefficients, $\mathrm{Re}\,h(\tau_j)$ and $\mathrm{Im}\,h(\tau_j)$. Noting that $h(\tau)$ and $H(\nu)$ are different representations of the same information we can write

$$\frac{\partial M}{\partial\,\mathrm{Re}\,h(\tau_j)} = \sum_k \left[\frac{\partial M}{\partial\,\mathrm{Re}\,H(\nu_k)}\,\frac{\partial\,\mathrm{Re}\,H(\nu_k)}{\partial\,\mathrm{Re}\,h(\tau_j)} + \frac{\partial M}{\partial\,\mathrm{Im}\,H(\nu_k)}\,\frac{\partial\,\mathrm{Im}\,H(\nu_k)}{\partial\,\mathrm{Re}\,h(\tau_j)}\right] \qquad (35)$$

and similarly for the derivative with respect to $\mathrm{Im}\,h(\tau_j)$. In this way we find

$$\frac{\partial M}{\partial\,\mathrm{Re}\,h(\tau_j)} + \mathrm{i}\,\frac{\partial M}{\partial\,\mathrm{Im}\,h(\tau_j)} = \sum_k G(\nu_k)\,\mathrm{e}^{2\pi\mathrm{i}jk/N_\nu} \qquad (36)$$

where

$$G(\nu_k) \equiv \frac{\partial M}{\partial\,\mathrm{Re}\,H(\nu_k)} + \mathrm{i}\,\frac{\partial M}{\partial\,\mathrm{Im}\,H(\nu_k)}. \qquad (37)$$

Similarly one can show that

$$G(\nu_k) = \frac{1}{N_\nu}\sum_j \left[\frac{\partial M}{\partial\,\mathrm{Re}\,h(\tau_j)} + \mathrm{i}\,\frac{\partial M}{\partial\,\mathrm{Im}\,h(\tau_j)}\right]\mathrm{e}^{-2\pi\mathrm{i}jk/N_\nu}. \qquad (38)$$

So as an alternative to computing the frequency-space derivatives and determining the lag-space derivatives from them, we can compute the lag-space derivatives first and then determine the frequency-space derivatives. Formally the two different paths to either frequency-space or lag-space derivatives are equivalent. In practice we computed the lag-space derivatives as our primary quantities, using

$$\frac{\partial M}{\partial\,\mathrm{Re}\,h(\tau_j)} + \mathrm{i}\,\frac{\partial M}{\partial\,\mathrm{Im}\,h(\tau_j)} = -2\sum_k \mathrm{e}^{2\pi\mathrm{i}jk/N_\nu}\sum_\alpha r(\nu_k-\tfrac{\alpha}{2};\alpha)\,H(\nu_k-\alpha)\,S_x^*(\alpha) \qquad (39)$$

and determined the frequency-space derivatives, if required, using equation 36. There appears to be no significant difference in computation time between the two approaches.
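The transformation between frequency-space and lag-space gradients is a generic chain-rule identity, and can be checked for any smooth function of the filter coefficients. The sketch below does so with finite differences; the DFT sign and normalization conventions are choices of this illustration, and the packed complex gradients are an encoding of the two real derivatives, not Wirtinger derivatives:

```python
import cmath
import random
from math import pi

random.seed(5)
N = 8

# An arbitrary smooth real function of the frequency-space coefficients H_k.
C = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
def M_of_H(H):
    return sum(abs(H[k] - C[k]) ** 2 + 0.3 * abs(H[k]) ** 4 for k in range(N))

def H_of_h(h):
    """H_k = sum_j h_j exp(-2 pi i j k / N), one DFT convention."""
    return [sum(h[j] * cmath.exp(-2j * pi * j * k / N) for j in range(N))
            for k in range(N)]

def fd(f, z, j, dz, eps=1e-6):
    """Finite-difference derivative of f along direction dz in coordinate j."""
    zp, zm = z[:], z[:]
    zp[j] = z[j] + eps * dz
    zm[j] = z[j] - eps * dz
    return (f(zp) - f(zm)) / (2 * eps)

h = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
H = H_of_h(h)

# Frequency-space gradient, packed as G_k = dM/dRe H_k + i dM/dIm H_k.
G = [fd(M_of_H, H, k, 1) + 1j * fd(M_of_H, H, k, 1j) for k in range(N)]

# Chain rule: dM/dRe h_j + i dM/dIm h_j = sum_k G_k exp(+2 pi i j k / N).
for j in range(N):
    direct = (fd(lambda z: M_of_H(H_of_h(z)), h, j, 1)
              + 1j * fd(lambda z: M_of_H(H_of_h(z)), h, j, 1j))
    chained = sum(G[k] * cmath.exp(2j * pi * j * k / N) for k in range(N))
    assert abs(direct - chained) < 1e-3 * (1 + abs(chained))
print("lag-space gradient equals the transformed frequency-space gradient")
```

In the production setting the same transform is, of course, performed with an FFT rather than an explicit double sum.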

### 4.3. Uncertainties in best-fit parameters

Suppose that we have obtained our best-fit model; the question then arises: “how accurate is that model?” To address this issue we need a description of the behaviour of the demerit, $M$, in the vicinity of the best fit.

At the best-fit point in parameter space, which we denote by $\hat{\theta}$, $\partial M/\partial\theta = 0$ for every parameter, and either $\partial^2 M/\partial\theta^2 > 0$ or $\partial^2 M/\partial\theta^2 = 0$. If $\partial^2 M/\partial\theta^2 > 0$ at the best-fit point, then in the immediate neighbourhood of this point the variation of $M$ can be approximated by

$$M \simeq M(\hat{\theta}) + \frac{1}{2}\,\frac{\partial^2 M}{\partial\theta^2}\left(\theta-\hat{\theta}\right)^2. \qquad (40)$$

For Gaussian noise the normalized demerit, $M/\sigma^2$, is distributed like $\chi^2$ with $N_f$ degrees of freedom, and we expect $M/\sigma^2 \simeq N_f$. The fit becomes significantly worse if we move away from the optimum point to any other point such that $M - M(\hat{\theta}) > \sigma^2$ (Avni 1976), and this contour of $M$ delineates the range of uncertainties in our fit.

Uncertainties in the individual fit parameters can be readily determined if the Hessian, $\partial^2 M/\partial\theta_i\,\partial\theta_j$, is diagonal so that the parameters are all independent of each other. In this case the standard deviation, $\sigma_\theta$, is given by

$$\sigma_\theta = \left(\frac{2\sigma^2}{\partial^2 M/\partial\theta^2}\right)^{1/2}. \qquad (41)$$

If the Hessian is not diagonal then the parameters are covariant and it is a much more difficult task to describe the uncertainties in the fit. Because we know how $M$ depends on each of the various parameters, we can evaluate the elements of the Hessian explicitly. Doing so we find that the Hessian is indeed diagonal with respect to the set of parameters describing $S_x(\alpha)$, so equation 41 correctly describes the constraints which our model places on those parameters. However, the Hessian is not diagonal with respect to either $H(\nu)$ or $h(\tau)$. The standard errors as given by equation 41 are evaluated in an Appendix, while in the next section we discuss parameter covariance.

#### 4.3.1 Covariances of the filter coefficients

Unfortunately the curvatures given in equation A8 are not the whole story when it comes to describing the uncertainties in the impulse-response function, because there are non-zero off-diagonal elements of the Hessian in respect of these parameters. It is beyond the scope of this paper to give a detailed description of the effect of these mixed curvature terms; here we only draw attention to their significance, deferring a thorough treatment to a later paper.

To illustrate the importance of the off-diagonal elements of the Hessian we employ the simplest filter model, $H(\nu) = 1$, i.e. $h(\tau_j) = \delta_{j0}$. In this case we find by direct calculation that in addition to the leading diagonal (described by equation A8), there is a single reverse-diagonal on which the curvatures are non-zero. This reverse-diagonal cuts the leading diagonal at $\tau = 0$, and for lags $|\tau| \lesssim W$, the pulse-width, the mixed curvatures are comparable in size to the diagonal elements. The upshot of this is that the combination of complex coefficients $h(\tau_j) + h^*(-\tau_j)$ is tightly constrained, whereas the combination $h(\tau_j) - h^*(-\tau_j)$ is poorly constrained. The former combination can be thought of as a pure-amplitude modification of the filter $H(\nu)$, whereas the latter is a pure-phase modification. And the fact that these particular combinations of parameters are well-constrained/poorly-constrained for $|\tau| \lesssim W$ is directly attributable to the (in)sensitivity of $S_y(\nu;\alpha)$ to these types of modification.

## 5. Implementation of filter optimization

Having already established that the simple quasi-Newton method of §3 works tolerably well for our optimization problem, even though all the off-diagonal elements of the Hessian are neglected, our next step is to implement a more sophisticated quasi-Newton method, the BFGS algorithm, to optimize our filter coefficients. More precisely, because of the large number of parameters needed to describe the filter coefficients, we utilize a “limited memory” algorithm, known as L-BFGS, in which the full inverse-Hessian is not constructed (Nocedal 1980; Liu and Nocedal 1989).

We employed the L-BFGS algorithm coded in the NLopt library^{2}^{2}2http://ab-initio.mit.edu/nlopt (Steven G. Johnson, “The NLopt nonlinear-optimization package”). The NLopt package was chosen because it is free, portable and offers a wide variety of optimization algorithms (see §5.3.1). In addition we utilized the FFTW Fourier Transform package^{3}^{3}3http://www.fftw.org from the same group (Frigo and Johnson 2005). Our code is written in C and is freely available.^{4}^{4}4https://github.com/demorest/Cyclic-Modelling It makes use of the PSRCHIVE library^{5}^{5}5http://psrchive.sourceforge.net (Hotan et al 2004; van Straten et al 2012) for file input and therefore can accept data in a variety of formats, including the standard PSRFITS pulsar data format (Hotan et al 2004).

The first point to make here is that we have chosen to optimize the filter coefficients separately from the parameters which describe our model of the intrinsic cyclic spectrum, . There are several reasons for this choice. The strongest motivation is that it allows us to enforce a common timing reference on all our filter solutions, by using the same intrinsic cyclic spectrum throughout. A common timing reference is of paramount importance for all astrophysical studies which rely on pulse arrival-time measurements. Furthermore, by using a common timing (pulse-phase) reference, we can obtain a high signal-to-noise ratio measurement of the intrinsic spectrum by averaging over all our data.

The degeneracies discussed in §2.1 provide further, minor motivations for separate optimization of filter and intrinsic cyclic spectrum models, as these degeneracies must be eliminated in order for any algorithm to identify the best fit solution. For the overall normalization and phase of the filter that is fairly straightforward, but controlling the degeneracy in phase-gradient is not so easy if both and are simultaneously adjusted. By contrast, there is no degeneracy in phase-gradient if is fixed.

We noted in §2.1 that the overall phase of is always arbitrary, and this degeneracy must be eliminated before we can determine the model filter which best fits the data. We remove this freedom by forcing the imaginary part of (or ) to be zero at the point where (or ) attains its largest value. Because this choice is arbitrary, once an optimized filter is obtained we are free to rotate its overall phase to any preferred value. If we have a temporal sequence of filters (see §7.5), the appropriate choice of phase for a given filter is the one which yields the closest match between the current and the previous (or subsequent) filter, leaving only a single, arbitrary phase for the whole temporal sequence.
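These two phase conventions can be sketched minimally as follows, assuming the filter is held as a complex array; the function names are ours for illustration, not those of the Cyclic-Modelling code.

```python
import numpy as np

def fix_overall_phase(h):
    """Rotate a complex filter so Im(h) = 0 (and Re(h) > 0) at the point
    where |h| is largest -- an arbitrary but reproducible convention."""
    k = np.argmax(np.abs(h))
    return h * np.exp(-1j * np.angle(h[k]))

def match_phase(prev, cur):
    """Uniform phase rotation of `cur` minimizing |prev - cur*e^{i*phi}|^2,
    used to tie a temporal sequence of filters to one overall phase."""
    phi = np.angle(np.sum(np.conj(cur) * prev))
    return cur * np.exp(1j * phi)

# toy demonstration: a filter with increasing modulus and winding phase
h = np.exp(1j * np.linspace(0.0, 3.0, 32)) * (1.0 + np.arange(32) / 32.0)
hf = fix_overall_phase(h)
matched = match_phase(h, h * np.exp(0.7j))   # undo a uniform phase offset
```

The least-squares rotation in `match_phase` follows from maximizing the real part of the inner product between the two filters, which is the standard closed-form solution for a single phase parameter.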

### 5.1. Initialisation

We make use of two different initializations, which we refer to as “Unit” and “Proximate”. In the case of Unit initialization we begin with , for all radio frequencies, and a constant phase gradient in , chosen to match the mean phases seen in the data at . For lag-space optimization this initialization corresponds to a delta-function model for . Naturally, Unit initialization is only sensible if the overall normalization of our model is consistent with that of the data, , and we therefore also normalize appropriately.
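A sketch of Unit initialization under these assumptions; the FFT sign convention relating frequency to lag is a choice we make here, and a constant phase gradient corresponds to a pure delay, i.e. a delta-function impulse response.

```python
import numpy as np

def unit_init(nchan, gradient):
    """Unit initialization: |H| = 1 at every radio frequency, with a
    constant phase gradient across the band."""
    nu = np.arange(nchan)
    return np.exp(2j * np.pi * gradient * nu)

# a gradient of 10 cycles across 64 channels is a pure delay of 10 lag bins
H0 = unit_init(64, 10 / 64)
h0 = np.fft.fft(H0) / 64        # lag-space model: a delta function at lag 10
```
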

Unit initialization is appropriate if we have no prior information on the actual structure which is present in the filter function at the time the cyclic spectrum was recorded. Usually there are many cyclic spectra recorded during a single epoch of observation — e.g. in §6 we present data from three separate epochs of observation, totalling several hundred cyclic spectra. In such cases the averaging time for each spectrum is chosen to be small enough that the changes in the filter function between adjacent cyclic spectra are small. Consequently, if we have already optimized the filter appropriate to one cyclic spectrum then that model provides us with a good starting point for modelling the next filter function: that scheme is what we refer to as Proximate initialization.

### 5.2. Stopping criterion

At what point should we stop the optimization? The NLopt algorithms include various criteria which may be used to recognise that the optimization is complete. Our aim is to find the minimum of , but we do not know ahead of time the precise value of that minimum, so a natural choice of stopping criterion is that should change by less than a certain, small fractional value during a single iteration of the algorithm. We can determine what that fractional tolerance should be as follows.

In §2.3 we gave expressions for the variance of . In particular we noted that , a constant, is usually a good approximation in practice. Furthermore, at large modulation frequencies, , the noise is usually much larger than the signal we are interested in, so it is straightforward to get an estimate of directly from the data.

For Gaussian noise, which is appropriate to the thermal noise component, we expect the best-fit value of to conform to a distribution, with degrees of freedom. In this case the minimum demerit is expected to be , and is a significant change in (Avni 1976), so it is appropriate to stop the optimization once the changes in are small compared to . This translates directly into the requirement that fractional changes in should be small compared to . Therefore in this paper the usual stopping criterion is that the fractional change in should be less than .

If the noise is not uniform – e.g. at the edges of the band, where the instrumental response rolls off, or because of strong Radio Frequency Interference – one can determine the variance at each point in the cyclic spectrum using equation 11. In this case the residuals (equation 19) should be normalized by the variance at each point prior to summation. The resulting figure of merit will then be distributed like . It is straightforward to measure the noise variation across the band, as per §6.1 (see the top panel in figure 1).
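The variance-weighted figure of merit and the resulting stopping rule can be sketched as follows. This is a simplified illustration: the `safety` factor is an arbitrary choice of ours, standing in for the requirement that the change in demerit be small compared to unity.

```python
import numpy as np

def demerit(model, data, var):
    """Figure of merit with per-point noise variances; when the variances
    are correct this is distributed like chi^2 with ~N degrees of freedom."""
    return np.sum(np.abs(data - model)**2 / var)

def converged(m_prev, m_cur, n_free, safety=0.1):
    """Stop when the change in demerit is small compared to 1: since the
    expected minimum is ~ n_free, the fractional change must be small
    compared to 1/n_free."""
    return abs(m_prev - m_cur) < safety * m_cur / n_free
```
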

### 5.3. Choice of optimization approach

The various tests described in the Appendix demonstrate that, of the various optimization approaches we tried, the best method for this problem is L-BFGS in lag-space from Proximate initial conditions; we therefore utilize that method.

## 6. Observations of PSR B1937+21

All of the data utilized in this paper are Arecibo observations of PSR B1937+21 (Backer et al 1982), at radio frequencies close to 428 MHz. The ATNF Pulsar Catalogue^{6}^{6}6www.atnf.csiro.au/people/pulsar/psrcat (Manchester et al 2005) reports the following characteristics for this pulsar: a period of 1.558 ms, a dispersion measure of (Cognard et al 1995), and a mean 400 MHz flux of 240 mJy (Foster, Fairhead and Backer 1991). Most of the data we use come from a single 4 MHz band centered on 428 MHz, with the exception being an additional 4 MHz chunk, centered on 432 MHz, that we use exclusively in §6.3 (intrinsic pulse profile determination). We observed at three different epochs: MJD53791, MJD53847 and MJD53873. Dual-polarization voltages were recorded for intervals of order an hour at each epoch, using the Arecibo Signal Processor baseband recorder (ASP; Demorest 2007), with digitisation at 8-bits per sample. This high dynamic range sampling proved valuable in mitigating the effects of Radio Frequency Interference (§6.4). We did not attempt any polarization calibration for our data; all the results reported here are based on summing the two polarizations (i.e. the orthogonal feeds of the telescope), as an approximation to Stokes-I.

Individual cyclic spectra were generated from the recorded voltages, using the method described by Demorest (2011). In our first processing of the data we constructed cyclic spectra, averaged over 15 seconds, with 6230 radio-frequency channels and 511 pulse-phase bins. These values were chosen so as to make as nearly equal to as possible, because our first attempts at modelling and (using the method described in §3), avoided interpolations. However, the improved fitting method described in §§4,5 employs precise interpolation, so it is no longer necessary to match the resolutions in this way. Nor is it preferred, as array sizes which are integer powers of two are better matched to the Fast Fourier Transform algorithm, which we utilize. All the tests of our optimization software, reported in an Appendix, were conducted on the cyclic spectra obtained in our first processing of the data.

Analysis of the cyclic spectra from our first processing revealed some leakage at the edges of the bandpass filter. This is undesirable, particularly because any out-of-band signal is aliased by MHz, and will thus appear delayed by approximately ms due to incorrect dedispersion. In turn this leaked signal may introduce low-level contamination into our profile estimates or our filter models, or both. We therefore decided to completely reprocess our data, to deal with the leakage and to correct some other minor defects which we were aware of.

In the second processing we produced cyclic spectra averaged over 15 seconds, with 4608 channels and 1024 pulse-phase bins. This reprocessing utilized the cyclic spectrum implementation now freely available as part of the DSPSR software package^{7}^{7}7http://dspsr.sourceforge.net (van Straten and Bailes 2011). To eliminate the aliased (leakage) signals we then trimmed the spectral array down to 4096 channels, so the final bandwidth was approximately MHz. With the exception of §6.1 and §6.2, all of the results presented in this section were obtained using the trimmed cyclic spectra from the second processing of our data.

### 6.1. Bandpass Filter

If we want to know the profile of the bandpass filter of our instrument there are two methods available to us: we can measure the average total power as a function of radio-frequency, or we can make use of the filter functions, , obtained from our fitting. (One can also inject artificial pulsed power, of known spectral shape, into the signal chain, but we did not record such data.)

Our estimates of incorporate all of the filtering imposed on the signal. We expect there to be contributions from the ISM, the Solar wind, the Earth’s ionosphere, and from our instrument (telescope, front-end and back-end). Of these various contributions, only the receiver system is expected to be stable over long time-scales. As is a complex quantity, averaging it will yield zero, but we can instead form , which we take as an estimate of the bandpass filter, . Averaging over all filter solutions for all three epochs we obtain the result shown in figure 1 for . Also shown in figure 1 is the result of estimating the bandpass in a more traditional way, using the square-root of the average total power: . (The square-root appears here because the power is a quadratic function of the filter response.)
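The distinction between averaging the complex filter, which tends to zero, and averaging its squared modulus is easy to demonstrate on synthetic scintillating filters; a hedged sketch with a toy bandpass:

```python
import numpy as np

rng = np.random.default_rng(2)
nchan, nsol = 256, 500
bandpass = 1.0 + 0.2 * np.sin(np.linspace(0, np.pi, nchan))   # toy |H0(nu)|
# toy scintillating filters: unit-variance complex Gaussian speckle
# multiplying the fixed instrumental bandpass
H = bandpass * (rng.normal(size=(nsol, nchan)) +
                1j * rng.normal(size=(nsol, nchan))) / np.sqrt(2)

# a straight average of the complex filters tends to zero ...
naive = np.abs(H.mean(axis=0))
# ... so average the squared modulus instead and take the square root
est = np.sqrt(np.mean(np.abs(H)**2, axis=0))
```

With 500 independent speckle realizations the squared-modulus average recovers the bandpass to a few per cent per channel, consistent with the scaling argued in the text.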

Although fluctuates quite rapidly, the amplitude of those fluctuations is large, so a long total observation time is required in order to form an accurate estimate of . With our three epochs combined we have approximately 4.5 hours of data, and the scintillation time-scale is of order a minute so we expect our estimate of the filter response to be accurate to %. That is approximately the level of fluctuation seen in our estimate of across most of the band. Thus the only clearly significant structure we find in is the roll-off of the filter at the band edges. A cause for concern is the abrupt rise in the estimated filter response at both extremes of the frequency range. These upturns indicate that there is some leakage of signal from outside the nominal band of the filter.

By contrast with , the estimate shows evidence of an upturn only at one end of the band. The reason for this difference is unclear. The other points of distinction between the two results are (i) that the noise on the traditional estimate is much smaller, even though only a third as much data was used, and (ii) Radio Frequency Interference (RFI) is manifest in the traditional estimate. To some extent the effect of the RFI could be mitigated by averaging using the median estimator, rather than the mean, but this would not help for steady interference. The reason for the lower noise-level on the traditional bandpass estimate can be seen from equation (10). Our solutions for – whence the estimate – are based on the pulsed power, i.e. , whereas the zero-modulation-frequency data, , are dominated by the system noise, , which is both large and unpulsed.

As mentioned at the start of §6, the leakage at the band edges, most evident in the lower panel of figure 1, can introduce low-level artifacts into our filter or pulse-profile estimates. Consequently we decided to fully reprocess our data, trimming off the edges of the band as we did so. The results described in §6.3 and later sections of this paper were obtained from the second processing in which the spectral band was trimmed.

### 6.2. Bootstrap approach to the intrinsic profile

Lacking prior knowledge of the intrinsic pulse profile we are obliged, as in §3.3, to commence our modelling using the observed, scattered pulse profile as an approximation to the intrinsic profile. We then obtain our first model of the filter function, for each sample cyclic spectrum, by fitting to the data in the way described in §§4,5. The filters obtained in this way are then used to obtain a better estimate of the intrinsic pulse profile, and the whole process is iterated, obtaining better approximations to , and the various , on each pass through the data.
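Schematically, the bootstrap alternates filter fitting with profile re-estimation. In this sketch `fit_filter` and `estimate_profile` are placeholders for the optimization and averaging steps of §§4,5, not the actual code interface of Cyclic-Modelling.

```python
def bootstrap(cyclic_spectra, n_iter, fit_filter, estimate_profile):
    """Schematic bootstrap: begin from the scattered profile, then alternate
    filter fits and profile re-estimation.  Each fit starts from the previous
    iteration's solution for that spectrum (or from scratch on pass one)."""
    profile = estimate_profile(cyclic_spectra, filters=None)  # scattered profile
    filters = [None] * len(cyclic_spectra)
    for _ in range(n_iter):
        filters = [fit_filter(cs, profile, init=f)
                   for cs, f in zip(cyclic_spectra, filters)]
        profile = estimate_profile(cyclic_spectra, filters)
    return profile, filters
```

Re-using the previous iteration's filters as starting points is what produces the rapid fall in the number of optimization steps per iteration reported below.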

Once an accurate model of the intrinsic profile is obtained, other data-sets for the same pulsar taken with the same instrumental configuration can use that profile to obtain model filters in a single pass through the data. But new instrumental configurations – e.g. different observing frequencies – may force a return to the bootstrap approach.

Because it requires multiple passes through the data, a bootstrap can be slow. We can, however, speed things up to some degree because at the second and subsequent profile-iterations we already have available a set of model filters appropriate to each recorded cyclic spectrum. These filters can be used to initialize subsequent models prior to optimization. As our model of the intrinsic pulse profile approaches the true intrinsic profile we expect the model filters to change very little between successive iterations, so this procedure should accelerate the optimization substantially. This expectation was borne out in practice, as we now describe.

To enable a rapid approach to the intrinsic profile we initially used a subset of the data (roughly 20 minutes of observation) from one epoch (MJD53873), iterating several times on this subset, and then adding in the rest of the data from this epoch in order to improve the signal-to-noise ratio of our profile estimate. For the first set of filter solutions, using Proximate initialization, we found that on average 289 NLopt steps were required to fit each cyclic spectrum in the data subset. Subsequently, using the filter models obtained at the previous iteration as our starting point, the number of NLopt steps declined to 222, 17 and 14 for the second, third and fourth iterations, respectively.^{8}^{8}8These step-counts refer to the first processing of the data. The small decrease in the required number of steps between the third and fourth iterations, contrasting with the large decrease between the second and third iterations, suggested that we had reached the noise floor of the data subset, so for subsequent iterations we utilized all of the data from MJD53873 — a total of approximately 2 hours.

For iteration five we needed to obtain the first filter solutions for the bulk of the data from this epoch, using Proximate initialization, which required on average NLopt steps per cyclic spectrum. But for all subsequent iterations we were able to initialize our models using the previous set of filter solutions. We found that iterations six and seven required only 12 and 3 NLopt steps per cyclic spectrum, respectively, indicating very rapid convergence of our estimate of the intrinsic pulse profile.

Separately we have observed, when using an existing set of optimized filter models as our starting point, that our code requires a minimum of 3 NLopt steps to return an optimized solution, even when the same reference profile is used for both solutions. We therefore conclude that our intrinsic profile estimates for B1937+21 do not differ significantly between iterations six and seven, and further iterations are unwarranted.

Use of the previous set of filter solutions to initialize our models clearly leads to a substantial saving in computation time. Using Proximate initialization we expect that the bootstrap would have required a total of 10 days of CPU time, whereas the sequence just described used only a third of that time. In fact our procedure needed only one quarter more time than a single pass through the same data using a given reference pulse profile.

### 6.3. Intrinsic versus scattered profile

In figure 2 we show our estimate of the intrinsic profile, together with the scattered profile, using all the data from MJD53873. This epoch was chosen because we obtained significantly more data on that date than on either of the other epochs. As expected, the intrinsic modulation profile of the signal is much sharper than the apparent modulation, because of the contribution of the scattered (delayed) waves to the apparent profile. The “scattered tail” of the pulse is absent from our estimate of the intrinsic profile.

Figure 2 (lower panel) also reveals the presence of several low-level (a fraction of 1% of the peak height), but sharp features in the “baseline” of the intrinsic pulse. These features are difficult to recognise in the scattered profile for two reasons. First, interstellar scattering broadens them, while decreasing the peak amplitude of each. Secondly, the features that are present immediately after the main-pulse or the inter-pulse are swamped by the delayed signal from those two, very strong components of the pulse profile.

An equivalent description of the pulse modulation is available by Fourier-transforming the scattered and intrinsic profiles. The resulting harmonic powers are shown in figure 3, demonstrating that the high harmonics of the intrinsic profile contain a great deal more power than those of the scattered profile. This is just as expected. The scattered profile is a convolution of the intrinsic profile with the impulse response function, so in the Fourier domain the relationship is multiplicative, and the multiplier declines from near unity at low harmonic numbers to very small values at high harmonic numbers.
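The multiplicative suppression of the high harmonics is easily illustrated with a toy intrinsic profile and a normalized one-sided exponential standing in for the intensity response of the scattering:

```python
import numpy as np

nbin = 1024
phase = np.arange(nbin)
intrinsic = np.exp(-0.5 * ((phase - 200) / 3.0)**2)   # sharp toy pulse
kernel = np.exp(-phase / 40.0)                        # toy scattering tail
kernel /= kernel.sum()                                # unit area: flux preserved
# scattered profile = circular convolution with the kernel, so in the
# harmonic domain the relationship is multiplicative
scattered = np.fft.ifft(np.fft.fft(intrinsic) * np.fft.fft(kernel)).real

harm_int = np.abs(np.fft.rfft(intrinsic))**2
harm_sca = np.abs(np.fft.rfft(scattered))**2
```

Because the kernel is non-negative with unit sum, its transfer function has modulus at most one: the zeroth harmonics agree while the high harmonics of the scattered profile are strongly suppressed.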

Because the low-level features evident in figure 2 are seen here for the first time at these radio frequencies, and the signal-processing we have used to reveal these structures is itself novel, we would like to have some confirmation of their reality. We have therefore undertaken a completely independent bootstrap estimate of the intrinsic profile for another epoch, MJD53791. In this comparison we are not interested in any timing (pulse-phase) offset between the two epochs, so in comparing the two intrinsic profiles we have applied a pulse-phase shift and a scaling, chosen so as to minimize the difference between the profiles.
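A minimal version of the shift-and-scale matching, assuming integer-bin cyclic shifts suffice (the real comparison would allow sub-bin shifts):

```python
import numpy as np

def align(ref, prof):
    """Cyclically shift (integer bins) and scale `prof` so as to minimize
    the squared difference from `ref`."""
    # circular cross-correlation via the FFT; its peak gives the best shift
    xc = np.fft.irfft(np.fft.rfft(ref) * np.conj(np.fft.rfft(prof)), n=len(ref))
    shift = int(np.argmax(xc))
    shifted = np.roll(prof, shift)
    scale = np.dot(ref, shifted) / np.dot(shifted, shifted)  # least-squares scale
    return scale * shifted, shift

# toy demonstration: recover a known shift and scaling
x = np.arange(256)
ref = np.exp(-0.5 * ((x - 100) / 5.0)**2)
aligned, shift = align(ref, 2.5 * np.roll(ref, -37))
```
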

The result of our two independent bootstrap solutions can be seen in figure 4, where we show the mean of the intrinsic profiles and their difference. The latter curve appears noise-like, without any clearly significant differences between the two, independently derived intrinsic profiles. In particular we note that the largest differences occur underneath the main-pulse and inter-pulse components, where the signal is very strong and the noise is therefore greater than at other pulse-phases. There is no apparent systematic difference between the two profiles at those pulse-phases where the weak, low-level features are seen.

As a final check on the reality of the features revealed in figures 2, 4, we have also compared the intrinsic pulse profiles obtained from independent bootstrap estimates at two different frequencies, 428 MHz and 432 MHz, for the epoch MJD53791 — this comparison is shown in figure 5. Although the 432 MHz data exhibit more system noise than the 428 MHz profile, because the integration time for the latter is larger by a factor of 1.5, the two profiles appear otherwise very similar in respect of the low-level features which are revealed by construction of the intrinsic profile.

An important aspect of the inter-band comparison in figure 5 is that it excludes signal leakage (§6.1) as a possible origin for the low-level structures which we see in the intrinsic profile. Even though we have trimmed the band edges, which should eliminate the bulk of that problem, it is possible that some traces of leakage remain. This concern is heightened by the fact that the sharp feature at a pulse-phase of lies close to the expected location of the aliased main pulse component, for signals leaking across the low-frequency edge of the 428 MHz band (upper red curve in figure 5). The inter-band comparison makes it plain that this is not a viable explanation for that feature, because at 432 MHz the corresponding alias should lie at 1,200, where no profile feature is seen – yet the observed peak appears very similar in the two bands. We also note that interpreting the feature as an alias of the main-pulse implies that there should be a counterpart feature from the inter-pulse, roughly half a turn later, whereas no such feature is observed in either band. Overall, the aliased signals from the band edges do not correspond with the low-level features we see in the pulse profiles in either band, and we conclude that they are not due to out-of-band signals.

In fact residual out-of-band signals are expected to appear as broad structures in the time-domain, because the dispersive delay is a strong function of frequency. The sharpness of the features shown in the red and blue curves in figure 5 is due to the fact that only frequencies immediately adjacent to the band edges have been considered. The red and blue curves are simply calculated as delayed (and scaled) versions of the mean pulse profile, with the delay/advance equal to the difference in dispersive delay between the upper and lower edges of the band. At MJD53791 the dispersion measure of PSR B1937+21 was 71.023 pc cm^{-3}, and the period was 1.5577 ms, so the aliased signals appear at ms (428 MHz band) and ms (432 MHz band). Modulo the pulse period these become, respectively, and milliseconds.
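The arithmetic can be checked with the standard cold-plasma dispersion formula. The dispersion constant below is the conventional value; the exact band edges (426 and 430 MHz for the 428 MHz band) are our assumptions, while the DM and period are the values quoted in the text.

```python
# Hedged check of the dispersive-delay arithmetic (no expected values quoted).
K = 4.148808            # ms GHz^2 / (pc cm^-3), standard dispersion constant
dm = 71.023             # pc cm^-3 (value quoted in the text)
period = 1.5577         # ms (value quoted in the text)

def dispersive_delay(f_ghz):
    """Cold-plasma dispersive delay, t = K * DM / f^2, in milliseconds."""
    return K * dm / f_ghz**2

# delay difference across an assumed 4 MHz band centred on 428 MHz,
# and where the aliased signal lands modulo the pulse period
d428 = dispersive_delay(0.426) - dispersive_delay(0.430)
d428_mod = d428 % period
```
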

Some of the “new” structure that we see in the intrinsic pulse profile corresponds well with features of B1937+21 which have been found by others, as follows. The distinct peaks seen immediately after the main- and inter-pulse have previously been observed by a number of authors at higher radio-frequencies, where the delayed, scattered signal is much weaker — see, particularly, figure 1 of Thorsett and Stinebring (1990). Here we are presumably seeing the emission regions which are responsible for the giant pulses of B1937+21 (Cognard et al 1996), and the consequently high modulation index at these pulse-phases (Jenet, Anderson and Prince 2001). The sharp feature we see at a pulse-phase of ( turns) has a counterpart which was noted in L-band observations by Yan et al (2011). Residual dispersion smearing in the Yan et al (2011) data is significant, so it is not surprising that their feature appears broader than the one we observe. Finally, the gradual rise we see in the turns immediately preceding the main-pulse is also manifest in the Yan et al (2011) data.

The consistency of our intrinsic profile estimates across different epochs and spectral sub-bands, and the connections we can make between individual features and previous observations of B1937+21 at other frequencies, give confidence that the statistically-significant features we see in our intrinsic profile are indeed real.

### 6.4. Dynamic Spectra

A measured cyclic spectrum quantifies the power spectrum of the signal as the zero-modulation-frequency array (see §2). We compute our cyclic spectra with a cadence of 15 seconds, and thus we can trivially obtain a dynamic spectrum from the temporal sequence of . This dynamic spectrum is a simple time-average, not a difference of on-pulse and off-pulse power, so it includes all power contributions: noise from the receiver and the sky, the pulsar signal, and any terrestrial signals reaching the receiver, i.e. RFI. Because RFI can cause severe problems for some types of radio astronomical investigations, it is useful to examine the dynamic spectrum in order to gauge its impact.
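Constructing the dynamic spectrum is then simply a matter of stacking these total-power spectra row by row; a toy sketch:

```python
import numpy as np

# Toy: one total-power (zero-modulation-frequency) spectrum per 15-second
# cyclic spectrum, stacked into a dynamic spectrum
rng = np.random.default_rng(3)
nt, nchan = 240, 512                      # ~1 hour at 15 s cadence (toy sizes)
spectra = [1.0 + 0.1 * rng.normal(size=nchan) for _ in range(nt)]
dynamic_spectrum = np.vstack(spectra)     # shape: (time, radio frequency)
```
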

Figure 6 shows the dynamic spectrum for a 512-channel spectral segment recorded on MJD53791; RFI is manifest in this segment as narrow spectral lines. None of these lines is so strong that the voltage signal exceeds the dynamic range of the sampler, nor is any impulsive RFI evident in figure 6. These aspects of the data reassure us that the observations were taken under relatively benign RFI conditions, and in this circumstance we can reasonably expect a high level of immunity from RFI in our models of and . In particular, if the RFI is both accurately captured and not modulated at the frequency , or its harmonics, then cyclic spectra will be free of RFI contamination.

To demonstrate that the observed RFI does not propagate into our model filters we also show in figure 6 the squared-modulus of the dynamic filter, i.e. . This quantity is our estimate of the contribution of the pulsar to the dynamic spectrum; the spectral structure can also be seen in the total power signal. It is evident that the RFI present in the total power signal is absent from the dynamic filter. We emphasise that the specific, small fraction of the spectrum shown in figure 6 was chosen at random: it was not selected because it displays good immunity from RFI.

### 6.5. Dynamic fields

Whereas the dynamic spectrum is a quantity which pulsar astronomers routinely measure, it has been much more difficult to get at the dynamic electric field because the latter requires information on the phases, and that information is usually not explicit in the measured intensities. The requisite phases can sometimes be retrieved – e.g. if the field is sparse in some representation – but to date this has been successfully demonstrated for only one dynamic spectrum (Walker et al 2008). By contrast, cyclic spectroscopy provides us with access to the electric field envelope, including the phase information; as such it is an intrinsically holographic method.

There are various possible representations of the dynamic fields because they may be described in terms of frequency-space (filter) or lag-space (impulse-response) coefficients, and the dynamic nature of the field can be represented either as a temporal sequence or in terms of the conjugate Fourier variable, i.e. a frequency.

#### 6.5.1 Impulse-response functions

Figure 7 (top panel) shows one possible representation of the field: the real part of the impulse-response function, , determined from the first cyclic spectrum we observed on MJD53873. This function spans a lag range of , and we see that the amplitude of the response falls off on lag-scale . There is, however, a low-level tail to the response, extending to lags that are a substantial fraction of the pulse period. To bring out these low-level features we took the modulus of the impulse response, and then averaged it over all the data at this epoch of observation. The lower panel of figure 7 presents the resulting , which demonstrates that the low-level tail of continues out to delays of at least relative to the peak of the response.

At extreme negative lags there is an obvious rise in . The origin of this feature is not completely clear; however, a preliminary analysis suggests that parameter covariances in (see Appendix) may give rise to increased noise near the lag limits of the cyclic spectra, and we therefore consider this to be an artifact.

On the other hand the features seen in the vicinity of appear to be bona fide structure in . The delay-Doppler image, which we present in the next section, gives more information on these features.

#### 6.5.2 Delay-Doppler field images

Finally we present our results in the Fourier domain conjugate to . The conjugate variables have immediate physical meaning as the delay and Doppler-shift, respectively, that accumulate during propagation of the wave (Harmon and Coles 1983; Cordes et al 2006). The Fourier Transform, , of the dynamic electric field, , is therefore a quantity of particular interest; we call it the “delay-Doppler image”. Figure 8 shows the squared-amplitude of the delay-Doppler image for our data taken on MJD53873.
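A toy dynamic field makes the construction concrete: a direct ray plus one weaker, delayed, Doppler-shifted ray, with a two-dimensional Fourier transform taking time to Doppler-shift and radio frequency to delay (the sign conventions are our choice):

```python
import numpy as np

nt, nchan = 128, 256
t = np.arange(nt)[:, None]
nu = np.arange(nchan)[None, :]
# toy dynamic field H(t, nu): a direct ray plus one weaker ray carrying an
# integer Doppler shift (5 cycles over the observation) and an integer delay
H = 1.0 + 0.3 * np.exp(2j * np.pi * (5 * t / nt - 20 * nu / nchan))
# delay-Doppler power: under numpy's FFT sign convention the weaker ray
# appears at Doppler bin 5 and delay bin nchan - 20
image = np.abs(np.fft.fft2(H))**2
```

The direct ray dominates at the origin, while the scattered ray appears as an isolated power concentration away from it, the toy analogue of the discrete features discussed below.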

The lower-half of figure 8 is largely free of signal, as expected for negative lags (which are acausal). The only signals that can be recognised at negative lags are the band of scattered power running horizontally across the figure (discussed later), and a handful of thin, faint, vertical streaks in the region mHz, . We are uncertain as to the cause of these streaks, but we suspect that they may be sidelobes caused by the sharp truncation of the spectrum which we introduced by trimming the band (§6.1). These streaks were not seen in the delay-Doppler image that we obtained in our first processing of the data. The enhanced noise at extreme negative lags, plainly seen in the average signal in figure 7, is also present in figure 8 but is difficult to discern without averaging.

By contrast, in the upper half-plane of figure 8 there is an abundance of structure. Most of the power is concentrated in a broad distribution centered on zero-Doppler-shift. And the overall distribution appears to have an approximately parabolic envelope, as is now familiar for many pulsars (Stinebring 2001; Cordes et al 2006). But there are also some discrete concentrations of power. Most evident of these are the concentrations in the range on the right-hand-side of the figure. These concentrations indicate that there are particular regions, within a few milli-arcseconds of the direct line-of-sight to B1937+21, which are strongly diffracting, or refracting signals from this pulsar into our telescope. Apparently similar features were discovered by Hill et al (2005), in a multi-epoch study of PSR B0834+06, who found that their features appear to move through the delay-Doppler plane at constant velocity, consistent with the observed proper-motion of the pulsar. At present we do not know whether that property also holds for the features seen in figure 8.

In addition to the real structure just discussed, a strong artifact is plain in figure 8: around zero delay there is a broad, horizontal stripe in the image. The nature of this feature is clear: it is “scattered power” caused by discontinuities between successive values of (or ) in our temporal sequence. These discontinuities might arise in several ways, for example: inadequate sampling of the evolving ; amplitude fluctuations in the pulsar; arbitrary phase rotations between successive filter solutions (per the degeneracy in overall phase, §2.1); or gaps in the data record. We have considered each of the above possibilities, but none provides a satisfactory explanation, as we now detail.

First, the evolution of the filter is well sampled by our 15-second cadence, as can be seen from the upper panel of figure 6. Secondly, there are pulses within each of our cyclic spectra, so the variations in average intensity between samples will be small, %. In fact even this variation is irrelevant to figure 8 as we have normalized each filter solution such that it has a root-mean-square value of unity. Thirdly, the arbitrary phase of each filter (see §2.1) has been chosen so that each solution matches the previous solution as closely as possible, in a least-squares sense. Finally, although there is indeed a gap of 30 seconds in our temporal coverage (caused by a change of hard-disk during observing), we have interpolated across this gap before constructing figure 8. For these reasons we do not expect any of these effects to be responsible for the high levels of scattered power seen in figure 8.

A clue to the origin of the scattered power can be found in figure 9, which shows the “secondary spectrum” – i.e. the power-spectrum of the dynamic spectrum – for our data. By contrast with figure 8, this quantity shows quite low levels of power scattered to large Doppler-shifts. In forming the dynamic spectrum we erase all information on the phase of the filter, so the difference in scattered power levels between figures 8 and 9 indicates that the source of the scattered power in figure 8 is phase-discontinuities between adjacent filters. As noted above, we have matched the phases of adjacent filters, to the extent that this can be done with a uniform phase rotation of each solution. Our filter solutions must therefore contain non-uniform phase structure that is discontinuous between adjacent samples. In §4.3.1 we noted that our filter solutions may exhibit covariance between pairs of lag coefficients, for lag separations small compared to the pulse-width, and that the poorly constrained combination modifies only the phase of the filter. We therefore attribute the scattered power evident in figure 8 to these parameter covariances. We defer a detailed treatment of these issues to a later paper.

As figures 8 and 9 both display the response of the interstellar medium in the delay-Doppler coordinate frame, it is worth clarifying the relationship between them. Recall that the delay-Doppler image is just the Fourier transform, over time, of our sequence of impulse-response functions. The Fourier transform of the dynamic spectrum – which is the Fourier transform of the product of the filter with its own complex conjugate – is therefore just the convolution of the delay-Doppler image with a conjugated, reflected copy of itself. Consequently the arc that appears around the origin in figure 8 is echoed in a series of inverted arclets in figure 9; each of these arclets is centered on one of the power concentrations visible in figure 8 — cf. figure 5 of Walker et al (2004). Because the “secondary spectrum” (figure 9) is equivalent to a self-convolution of the delay-Doppler image (figure 8), the latter is more fundamental and will typically be the more useful quantity, for two reasons. First, the delay-Doppler image exhibits the scattered field with greater clarity: in the secondary spectrum the scattered field is tangled up with itself. Secondly, convolution is a smoothing operation, so faint power concentrations are more easily seen in the delay-Doppler image. These points are well demonstrated by comparing figures 8 and 9.
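The Fourier relationship just described is easy to verify numerically on toy data: transforming the dynamic spectrum to the delay-Doppler frame gives the same result as circularly self-correlating the delay-Doppler image. A small numpy check, using arbitrary random data and numpy's FFT conventions (all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
n_lag, n_t = 16, 8

# Toy temporal sequence of impulse responses h(tau, t)
h = rng.normal(size=(n_lag, n_t)) + 1j * rng.normal(size=(n_lag, n_t))

# Filter H(nu, t): Fourier transform of h over the lag axis
H = np.fft.fft(h, axis=0)
dyn = np.abs(H) ** 2                       # dynamic spectrum |H(nu, t)|^2

# Route 1: transform the dynamic spectrum to the delay-Doppler frame
route1 = np.fft.fft(np.fft.ifft(dyn, axis=0), axis=1)

# Route 2: delay-Doppler image (FT of h over time), then its 2-D circular
# self-correlation, computed via the autocorrelation theorem
h_dd = np.fft.fft(h, axis=1)
route2 = np.fft.ifft2(np.abs(np.fft.fft2(h_dd)) ** 2) / n_t

assert np.allclose(route1, route2)
```

The agreement of the two routes is exact (to rounding error), illustrating why the secondary spectrum is a "tangled" version of the more fundamental delay-Doppler image.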

Despite the fact that figure 9 is derived from the dynamic spectrum, it could not have been obtained by conventional spectroscopic methods, in which the on-pulse power-spectrum is determined within a window of width comparable to the width of the main-pulse (or inter-pulse) component. The reason is simply that windowing restricts the lag-range of the resulting secondary spectrum to the width of that window. Referring to figure 2 we see that the main-pulse would be completely contained within a narrow window, so the resulting lag range would be a tiny fraction of the actual lag range of figure 9.
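The effect of windowing can be demonstrated directly: the lag-domain representation of a windowed series has support only within the window width. A toy numpy illustration (the window width and data are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1024
x = rng.normal(size=n)

w = 32                          # window width, in samples
windowed = np.zeros(n)
windowed[:w] = x[:w]            # keep only the "on-pulse" window

spec = np.abs(np.fft.fft(windowed)) ** 2   # power spectrum of windowed data
acf = np.fft.ifft(spec)                    # lag-domain representation

# All power lies within +/- (w - 1) lags of zero: the window width sets
# the maximum lag that the spectrum can constrain.
assert np.allclose(acf[w : n - w + 1], 0.0)
```

However fine the spectral resolution, no information survives beyond the lag range set by the window.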

## 7. Discussion and future directions

Because cyclic spectroscopy has not previously been applied to radio pulsar signals, there are many related issues that deserve consideration. Here we confine ourselves to a brief discussion of three aspects to which the present study calls attention.

### 7.1. Precision timing of PSR B1937+21

It is well known that the small-scale structure of the ISM can have a significant effect on the measured arrival times of radio pulses, in consequence of the delays (geometric and wave-speed) associated with signal propagation (e.g. Foster and Cordes 1990). These effects are of particular importance if they are epoch dependent, which is the case if the scattering properties of the medium are not statistically uniform transverse to the line-of-sight. It is plain from figure 8 that some of the scattering material towards B1937+21 is indeed very clumpy, with several flux concentrations appearing far from the origin, albeit at low power levels. Previous studies of the dispersion and scattering on this line-of-sight (Cordes et al 1990; Ramachandran et al 2006) preferred a near-Kolmogorov model of the structure, but the clumpiness we see is quite different from the expectations of a uniform Kolmogorov model (Cordes et al 2006; Walker et al 2004). As B1937+21 is routinely used for precision timing experiments (e.g. Verbiest et al 2009), a better understanding of this scattering material is desirable.

It has previously been reported (Cognard et al 1993; Lestrade, Rickett and Cognard 1998) that B1937+21 occasionally exhibits timing fluctuations, correlated with flux variations, whose properties are suggestive of “Extreme Scattering Events” — that is, plasma-lensing events (Fiedler et al 1987, 1994; Romani, Blandford and Cordes 1987). Such events require close alignment between the observer, plasma-lens and pulsar, and these events are consequently rare. If the alignment is not so close then the lens will cause smaller flux changes, but may still have a significant effect on the pulse arrival time because the extra path-length traversed by the faint images may be large. Furthermore these poorly aligned lens configurations should be relatively common. It is possible that plasma-lensing is responsible for the discrete flux concentrations that we see at large delays (figures 7 and 8), with each concentration being due to one or more additional faint images. We note that at this epoch (MJD53873) the features appear at such large delays that the scattered pulse has little overlap with the unscattered signal, so the pulse arrival time estimate should not be greatly affected. But at later epochs, when the scattering structures are closer to the line-of-sight to the pulsar, the scattered signals may appear at delays where they can exert a substantial influence on the measured arrival time. We defer a quantitative examination of pulse arrival time variations to a later paper.

Depending on the electron column-density structure, and the pulsar-lens-observer configuration, several additional images may arise from one plasma lens, so it is possible that all of the flux concentrations we see at large delays in figures 7 and 8 are due to a single lens. Under that hypothesis, the observed range of delays tells us something about the size of the lens. Assuming that the lens is near the midpoint of the path to the pulsar, one finds that the lens diameter is a few AU. This is comparable to the dimensions that have previously been inferred for the lenses responsible for Extreme Scattering Events (e.g. Romani, Blandford and Cordes 1987).
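As a rough illustration of the geometry: for a lens at the midpoint of the path, an image offset by transverse distance x at the lens plane acquires an extra delay tau = 2 x^2 / (c D), so a range of observed delays maps to a transverse extent of the lens. The numbers below (pulsar distance and delay range) are purely illustrative assumptions of ours, not values taken from this paper:

```python
import numpy as np

C = 299_792_458.0          # speed of light, m/s
KPC = 3.0857e19            # kiloparsec, m
AU = 1.496e11              # astronomical unit, m

def offset_at_lens(delay_s, d_psr_m):
    """Transverse offset x at a midpoint lens producing extra delay tau.
    For observer-lens and lens-pulsar distances both D/2:
    tau = 2 x^2 / (c D)  =>  x = sqrt(tau c D / 2)."""
    return np.sqrt(delay_s * C * d_psr_m / 2.0)

D = 5.0 * KPC                        # ASSUMED pulsar distance (illustrative)
tau_lo, tau_hi = 100e-6, 300e-6      # ASSUMED delay range (illustrative)

size = (offset_at_lens(tau_hi, D) - offset_at_lens(tau_lo, D)) / AU
print(f"implied lens extent: {size:.1f} AU")   # ~7 AU for these inputs
```

With inputs of this order the implied lens extent is indeed a few AU, consistent with the Extreme Scattering Event lens sizes quoted in the text.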

Unfortunately, with the techniques currently available to us, it is not possible to distinguish between lens-like, refractive behaviour and diffractive scattering as the cause of the observed power concentrations at large delays. The clearest way to distinguish between these possibilities would be to undertake rigorous, quantitative physical modelling of the particular wave-propagation paths for this line-of-sight at the epoch(s) of observation. Such modelling would also tell us the relationship between the pulse arrival times actually observed, and those that would have been observed in the absence of the scattering medium. Physical modelling is, however, beyond the scope of this paper.

### 7.2. Cyclic spectropolarimetry

We have seen how cyclic spectroscopy gives access to the intrinsic modulation (pulse) profile of the signal, and that this can reveal new structure (figure 2) which is otherwise masked by the effects of scattering. It is the sharp features of the profile – those which include a large fraction of high-modulation-frequency Fourier components – which are most affected by the scattering. All the results shown in this paper are based on a signal combination which approximates Stokes-I (recall that our data have not been polarization calibrated). But many pulsars exhibit highly polarized radio emission, and the polarized pulse profiles may be quite complex (van Straten 2006; Johnston et al 2008). For example, there are pulsars where the profile shows rapid transitions between orthogonal, elliptically-polarized states — usually referred to as “orthogonal mode jumps”. Such transitions will be strongly affected by any filtering (temporal smearing) of the signal (Karastergiou 2009). More generally, it is clear that interstellar scattering can have a profound effect on the apparent polarization properties of pulsars at low frequencies (Li and Han 2003; Kramer and Johnston 2008), and we therefore expect the fidelity of polarization profiles to improve substantially when intrinsic profiles, rather than scattered profiles, are used.

Furthermore, it has been emphasised by van Straten (2006) that the most accurate pulse-timing requires accurate polarimetry. These are strong motivations to further develop the methods of this paper to encompass cyclic spectropolarimetry.

### 7.3. Covariance of filter coefficients

In §4.3.1 we drew attention to the issue of covariance amongst the parameters describing the filter coefficients (or, equivalently, the impulse-response coefficients). The effects of these covariances are not easy to quantify because (i) the total number of parameters needed to describe the filter is very large (roughly 13,000 in the present case), and (ii) the covariances depend on the properties of both the filter and the pulse profile – neither of which is known a priori. What is clear, though, is the qualitative point that the actual uncertainty in the filter coefficients can be much larger than the standard deviation for a single parameter taken in isolation.

We have argued that there are two aspects of the impulse-response functions, seen in figures 7 and 8, that are probably due to parameter covariances. And one of these – the power near zero delay, scattered to large Doppler-shifts – is a very strong feature indeed, being evidently well above the noise floor and potentially masking real features of the impulse response. In other respects cyclic spectroscopy seems to be a near-ideal tool for studying the propagation of radio-pulsar signals, and the issue of parameter covariance consequently deserves further study.

We can identify two aspects that merit particular attention. The first is a thorough understanding of the origin of parameter covariance, and thus how it manifests itself in different representations of the data. Our preliminary analysis (§4.3.1) suggests that strong covariance can be traced to pure-phase modifications of the filter. That analysis was only carried through for the simplest possible filter model, and needs to be revisited using more general models. In cases where the data cannot constrain pure-phase modifications of the filter to be small compared to 1 radian, the problem is akin to one of phase-retrieval. Such problems are notoriously difficult, and the difficulty is associated with non-convexity of the target set (Bauschke, Combettes and Luke 2002).

With an understanding of the origin of the covariances one would be in a good position to tackle the key question of how to mitigate their effects on the filter models. For example, in §4.3.1 we noted that the well-determined/poorly-determined parameter combinations are sum/difference terms of pairs of coefficients, so one might think of enforcing causality in the solutions, i.e. requiring the impulse response to vanish at all negative delays.
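As an illustration, a causality constraint of this kind can be applied as a simple projection: transform the filter to lag-space, zero the negative lags, and transform back. A numpy sketch (not the paper's code; the lag ordering is numpy's FFT convention):

```python
import numpy as np

def project_causal(H):
    """Project a frequency-domain filter onto the causal set: zero the
    impulse response at negative lags (the upper half of the FFT lag axis,
    in numpy's ordering) and transform back."""
    h = np.fft.ifft(H)
    h[h.size // 2 + 1:] = 0.0
    return np.fft.fft(h)

# A causal filter is left unchanged; projecting twice is the same as once.
rng = np.random.default_rng(2)
n = 16
h0 = np.zeros(n, dtype=complex)
h0[: n // 2 + 1] = rng.normal(size=n // 2 + 1) + 1j * rng.normal(size=n // 2 + 1)
H0 = np.fft.fft(h0)
P = project_causal(rng.normal(size=n) + 1j * rng.normal(size=n))
```

A projection of this kind could, for instance, be interleaved with optimization steps, in the spirit of alternating-projection phase-retrieval methods.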

## 8. Summary and conclusions

Cyclic spectroscopy of PSR B1937+21 was undertaken with a 15 second cadence over a 4 MHz band at 428 MHz, starting from voltages recorded with the Arecibo radio telescope. By least-squares fitting we determined the impulse response function of the ISM for each cyclic spectrum separately, and the intrinsic pulse-profile averaged over the whole observation. In this way we obtained the 428 MHz pulse-profile of B1937+21 free of the influence of interstellar scattering, revealing some weak, but sharp features that had not previously been seen at low radio-frequencies.

From our temporal sequence of impulse response functions we derive the delay-Doppler field image. This image exhibits a low noise floor relative to the peak power, and we are thus able to see faint features in the angular structure of the received field. Several power concentrations are visible at large delays. These concentrations can plausibly be attributed to a single plasma-lens, a few AU in diameter, but alternative interpretations are possible. Regardless of their physical origin, the scattered power concentrations are expected to have a deleterious effect on the pulse-timing experiments that are utilizing this pulsar. To accurately describe and remove these effects it is necessary to have a physical model of the various propagation paths by which the signal reaches the telescope. We did not attempt any physical modelling, but we have shown that cyclic spectroscopy provides a large quantity of information on these paths, and thus facilitates that process.

We caution that our fitting procedure is adversely affected by covariance amongst some combinations of the fit parameters. These covariances were identified as the origin of the scattered power artifact in our delay-Doppler image. Parameter covariance appears to be the main challenge currently facing widespread application of cyclic spectroscopy.

## References

- Antoni (2007) Antoni J., 2007, Mechanical Systems and Signal Processing, 21, 597
- Avni (1976) Avni Y., 1976, ApJ, 210, 642
- Backer, Kulkarni, Heiles, Davis & Goss (1982) Backer D.C., Kulkarni S.R., Heiles C., Davis M.M., Goss W.M., 1982, Nature, 300, 615
- Bauschke, Combettes & Luke (2002) Bauschke H.H., Combettes P.L., Luke D.R., 2002, J. Opt. Soc. Am. A, 19 (7), 1334
- Cognard et al. (1993) Cognard I., Bourgois G., Lestrade J.-F., Biraud F., Aubry D., Drouhin J.-P., 1993, Nature, 366, 320
- Cognard et al. (1995) Cognard I., Bourgois G., Lestrade J.-F., Biraud F., Aubry D., Darchy B., Drouhin J.-P., 1995, A&A, 296, 169
- Cognard et al. (1996) Cognard I., Shrauner J.A., Taylor J.H., Thorsett S.E., 1996, ApJ, 457, L81
- Cordes & Wolszczan (1986) Cordes J.M., Wolszczan A., 1986, ApJ, 307, L27
- Cordes et al. (1990) Cordes J.M., Wolszczan A., Dewey R.J., Blaskiewicz M., Stinebring D.R., 1990, ApJ, 349, 245
- Cordes et al. (2006) Cordes J.M., Rickett B.J., Stinebring D.R., Coles W.A., 2006, ApJ, 637, 346
- Dembo & Steihaug (1982) Dembo R.S., Steihaug T., 1982, Math. Programming 26, 190
- Demorest (2007) Demorest P. B., 2007, PhD thesis, University of California, Berkeley
- Demorest (2011) Demorest P.B., 2011, MNRAS, 416, 2821
- Fiedler et al. (1987) Fiedler R.L., Dennison B., Johnston K.J., Hewish A., 1987, Nature, 326, 675
- Fiedler et al. (1994) Fiedler R.L., Dennison B., Johnston K.J., Waltman E.B., Simon R.S., 1994, ApJ, 430, 581
- Frigo & Johnson (2005) Frigo M., Johnson S.G., 2005, Proc. IEEE 93 (2), 216
- Foster & Cordes (1990) Foster R.S., Cordes J.M., 1990, ApJ, 364, 123
- Foster, Fairhead & Backer (1991) Foster R.S., Fairhead L., Backer D.C., 1991, ApJ, 378, 687
- Gardner (1992) Gardner W.A., 1992, Statistical Signal Analysis: A Non-probabilistic Theory
- Harmon & Coles (1983) Harmon J.K., Coles W.A., 1983, ApJ, 270, 748
- Hill et al. (2005) Hill A.S., Stinebring D.R., Asplund C.T., Berwick D.E., Everett W.B., Hinkel N.R., 2005, ApJ, 619, L171
- Hotan, van Straten & Manchester (2004) Hotan A.W., van Straten W., Manchester R.N. 2004, PASA, 21, 302
- Jenet et al. (2001) Jenet F.A., Anderson S.B., Prince T.A., 2001, ApJ, 546, 394
- Johnston et al. (2008) Johnston S., Karastergiou A., Mitra D., Gupta Y., 2008, MNRAS, 388, 261
- Karastergiou (2009) Karastergiou A., 2009, MNRAS, 392, L60
- Kraft (1994) Kraft D., 1994, ACM Trans. Math. Software, 20 (3), 262
- Kramer & Johnston (2008) Kramer M., Johnston S., 2008, MNRAS, 390, 87
- Lestrade, Rickett & Cognard (1998) Lestrade J.-F., Rickett B.J., Cognard I., 1998, A&A, 334, 1068
- Li & Han (2003) Li X.H., Han J.L., 2003, A&A, 410, 253
- Liu & Nocedal (1989) Liu D.C., Nocedal J., 1989, Math. Programming, 45, 503
- Manchester et al. (2005) Manchester R.N., Hobbs G.B., Teoh A., Hobbs M., 2005, AJ, 129, 1993
- Nocedal (1980) Nocedal J., 1980, Math. Comput., 35, 773
- Nocedal & Wright (1999) Nocedal J., Wright S.J. 1999 “Numerical Optimization” (Springer: New York)
- Ramachandran et al. (2006) Ramachandran R., Demorest P.B., Backer D.C., Cognard I., Lommen A., 2006, ApJ, 645, 303
- Rickett (1975) Rickett B.J., 1975, ApJ, 197, 185
- Rickett (1990) Rickett B.J., 1990, ARA&A, 28, 561
- Roberts & Ables (1982) Roberts J.A., Ables J.G., 1982, MNRAS, 201, 1119
- Romani, Blandford & Cordes (1987) Romani R.W., Blandford R.D., Cordes J.M., 1987, Nature, 328, 324
- Stinebring et al (2001) Stinebring, D.R., McLaughlin, M.A., Cordes, J.M., Becker, K.M., Espinoza-Goodman, J., Kramer, M.A., Sheckard, J.L., Smith, C.T., 2001, ApJ, 549, L97
- Svanberg (2002) Svanberg K., 2002, SIAM J. Optim. 12 (2), 555
- Thorsett & Stinebring (1990) Thorsett S.E., Stinebring D.R., 1990, ApJ, 361, 644
- van Straten (2006) van Straten W., 2006, ApJ, 642, 1004
- van Straten & Bailes (2011) van Straten W., Bailes M., 2011, PASA, 28, 1
- van Straten, Demorest & Oslowski (2012) van Straten W., Demorest P.B., Oslowski S., 2012, Astron. Res. Tech., 9, 237
- Verbiest et al (2009) Verbiest J.P.W., et al 2009, MNRAS, 400, 951
- Vlček & Lukšan (2006) Vlček J., Lukšan L., 2006, J. Comp. Appl. Math. 186, 365
- Walker et al. (2004) Walker M.A., Melrose D.B., Stinebring D.R., Zhang C.M., 2004, MNRAS, 354, 43
- Walker et al. (2008) Walker M.A., Koopmans L.V.E., Stinebring D.R., van Straten W., 2008, MNRAS, 388, 1214
- Yan et al. (2011) Yan W.M., et al, 2011, MNRAS, 414, 2087

## Appendix A Tests of the filter optimization code

Here we describe tests which we have undertaken to evaluate the performance of our software. Three different aspects of the optimization were compared: L-BFGS versus other algorithms; lag-space versus frequency-space optimization; and Unit versus Proximate initializations. All of these comparisons were made using cyclic spectrum samples #2-11 of PSR B1937+21 recorded at Arecibo on MJD53873 (first processing of the data: see §6).

We do not expect that our conclusions regarding the relative merits of the different optimization paths are machine dependent. But for reference: the machine used for these tests was a MacBook Pro with a dual core 2.7 GHz Intel processor and 8 GB RAM installed. With this machine almost all algorithms required approximately 2 seconds to complete a single iteration, so run-times for the various approaches can be compared directly from the number of steps required to complete the optimization.

Table 1 sets out the results of our tests. The first three columns show the NLopt algorithm used, the space in which the filter was optimized, and the initialization conditions. Column four shows the average number of steps (rounded to the nearest integer) required to find the best-fit model for the ten sample cyclic spectra. Column five shows the number of sample cyclic spectra in which a particular configuration yielded the best result (i.e. the lowest value of the demerit function) out of all of the configurations tested. And the final column shows the average value of the demerit, relative to the best-performing configuration (rounded to the nearest integer). The ordering of the outcomes in the table was dictated by the results given in the last column, because a high-quality fit is our main objective. In the following sections we consider the outcomes presented in table 1, and their implications for the choice of optimization approach.

| NLopt Algorithm | Lag or Freq. | Unit or Prox. | Avg. Steps | Best # | Avg. rel. demerit |
|---|---|---|---|---|---|
| L-BFGS | Lag | Prox | 227 | 3 | 0 |
| L-BFGS | Freq | Prox | 262 | 3 | 3 |
| L-BFGS | Lag | Unit | 514 | 3 | 8 |
| VarMetric2 | Lag | Prox | 237 | | 9 |
| L-BFGS | Freq | Unit | 680 | | 10 |
| VarMetric2 | Freq | Prox | 223 | | 16 |
| VarMetric1 | Lag | Prox | 191 | | 17 |
| VarMetric1 | Freq | Prox | 221 | 1 | 17 |
| VarMetric2 | Lag | Unit | 499 | | 29 |
| VarMetric2 | Freq | Unit | 500 | | 30 |
| VarMetric1 | Freq | Unit | 497 | | 33 |
| VarMetric1 | Lag | Unit | 471 | | 39 |
| MMA | Lag | Prox | 1082 | | 309 |
| TNewtonPR | Lag | Unit | 2505 | | 405 |
| MMA | Lag | Unit | 3394 | | 600 |
| MMA | Freq | Prox | 318 | | 616 |
| MMA | Freq | Unit | 1009 | | 1119 |

Note. – This table summarizes the results of the tests described in this Appendix. Each line represents the outcomes from least-squares modelling of the filter, in lag- or frequency-space, for 10 sample cyclic spectra of B1937+21. In each case the number of degrees of freedom is large, and the total number of parameters in the model is roughly 13,000.

#### a.0.1 L-BFGS vs other algorithms

By design the NLopt package makes it possible to switch easily between a variety of different optimization algorithms, and thus to select the best one for the task at hand: to change algorithms is simply a matter of altering one line of code. The algorithms available within NLopt include both global and local methods. Global methods are not practical for our problem because of the large-scale nature of the optimization: it would be necessary to thoroughly search a space of roughly 13,000 dimensions in order to find the global minimum.

Of the local methods, there are algorithms which require derivatives of the demerit function to be supplied, and those which do not. As we are able to supply derivatives, and this is a major advantage in exploring the demerit hypersurface, we restrict ourselves to those algorithms which make use of the gradient; there are five such algorithms available in NLopt. One of these, SLSQP (“Sequential Least Squares Quadratic Programming”; Kraft 1994), had not completed a single step after more than an hour of run-time, at which point we terminated the optimization by force. The failure of SLSQP on our optimization problem is not surprising: it uses dense-matrix methods which, for our problem, require far more storage space and run-time than a limited-memory algorithm.

Results for the remaining four algorithms are given in table 1. We can see a clear division between these four: the Method of Moving Asymptotes (MMA; Svanberg 2002) and the Truncated Newton method (TNewtonPR; Dembo and Steihaug 1982) both performed poorly on our optimization task, in terms of the quality of fit and run-time, when compared to the Variable Metric (in either rank 1 or rank 2 forms: VarMetric1,2; Vlček and Lukšan 2006) and L-BFGS algorithms. We note the failure of TNewtonPR to complete the optimization task from Proximate initialization, or from Unit initialization in frequency-space, hence the omission of those results. It is clear that MMA and TNewtonPR are uncompetitive for our optimization task and we do not consider them further.

It is not surprising that the VarMetric and L-BFGS algorithms yield similar results, as they are similar algorithms. Nevertheless, our tests do show a clear preference for L-BFGS over either of the variable metric methods, with L-BFGS providing the three top-performing configurations, as gauged by the average demerit, and 9/10 of the best individual fits (column 5 of table 1).
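For readers who want to experiment with this style of optimization without NLopt, SciPy's L-BFGS-B implementation is a close analogue. The sketch below fits a deliberately simple quadratic toy demerit (not the cyclic-spectrum merit function of this paper) with thousands of parameters, supplying the analytic gradient as we do:

```python
import numpy as np
from scipy.optimize import minimize

# Toy large-scale least-squares problem, standing in for the real demerit.
rng = np.random.default_rng(0)
n = 2000                               # thousands of free parameters
target = rng.normal(size=n)

def demerit_and_grad(x):
    """Return the toy demerit and its analytic gradient together."""
    r = x - target                     # residuals of the toy model
    return 0.5 * np.dot(r, r), r

# jac=True tells SciPy that the objective returns (value, gradient).
res = minimize(demerit_and_grad, np.zeros(n), jac=True, method="L-BFGS-B")
```

Limited-memory quasi-Newton methods like this store only a handful of gradient-difference pairs, which is what makes problems of this dimensionality tractable at all.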

#### a.0.2 Lag-space vs. frequency-space

We have already noted (§4.2) that lag-space optimization is expected to be superior to a frequency-space approach, because of the traps present in the latter space. This expectation is borne out in practice, with lag-space optimization yielding better fits than the corresponding frequency-space optimization in almost every case in table 1. However, the difference is not very great. We interpret this as meaning that L-BFGS and the VarMetric algorithms obtain enough information on the demerit hypersurface to allow them to avoid most of the traps.

One potential problem which we noticed during our tests is that L-BFGS, when used in frequency-space, would sometimes oscillate as it progressed towards the minimum. This phenomenon was most noticeable with Unit initialization; it appears to be responsible for the 30% extra steps required for L-BFGS-Freq-Unit relative to L-BFGS-Lag-Unit.

We note that the cyclic spectra used for these tests (see §6) have typical signal-to-noise ratio greater than unity, for low harmonic numbers, on individual channels. It remains to be seen whether frequency-space optimization remains competitive for cyclic spectra which exhibit low signal-to-noise ratio at all harmonic numbers.

#### a.0.3 Variation of initialization

The algorithms tested here are local methods. That is, they locate a minimum of the demerit function in the vicinity of the starting point, but this minimum is not guaranteed to be the global minimum. The local nature of our solutions is something that readers should be aware of. However, reliably finding the true, global minimum in a space of roughly 13,000 dimensions is a difficult problem which does not seem tractable with the computational technologies currently available. Given the difficulty of finding the true minimum, it behoves us to examine the sensitivity of our results to the starting point from which the optimization proceeds.

Unsurprisingly, table 1 shows that optimization from a Proximate initialization is roughly a factor of two quicker than from Unit initialization. And Proximate initialization always yields a significantly better fit, for a given choice of algorithm and optimization-space. Bearing in mind the large-scale nature of the optimization, with roughly 13,000 parameters, some sensitivity to the initialization conditions is not surprising.

The fact that there are significant differences between Unit and Proximate initializations suggests the specific question “how far are our best results from the corresponding global minima?” As a partial answer to that question we can compare the results of different Proximate initializations, because each of the 10 sample cyclic spectra used in our tests has cyclic spectra taken immediately before and immediately after, and we can step through this sequence in either direction. Referring to the L-BFGS-Lag-Proximate results in table 1 as “Forward” initialization, we find that the corresponding “Backward” initialization typically gives worse results, with a larger average demerit, and needing 31 more steps per cyclic spectrum, on average, to complete. Forward initialization produced a better fit than Backward for eight of the ten spectra (this level of asymmetry is slightly surprising, being expected only once in 18 trials, but we have no explanation other than as a random occurrence), and there is a significant root-mean-square difference between the corresponding demerit values. Clearly the variations of the L-BFGS-Lag-Proximate outcomes, relative to the true minimum for each spectrum, must be at least as large as that difference, indicating that there is room for some significant improvement.
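The "once in 18 trials" figure quoted for the Forward/Backward asymmetry is just the binomial probability of at least 8 successes in 10 fair trials, which can be checked directly:

```python
from math import comb

# Probability of at least 8 "Forward wins" in 10 trials of a fair
# comparison, i.e. the chance of an asymmetry this strong arising at random.
p = sum(comb(10, k) for k in range(8, 11)) / 2 ** 10
print(f"P(>=8 of 10) = {p:.4f}  (about 1 in {1 / p:.0f})")   # about 1 in 18
```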

This point was confirmed by the following: we ran the whole suite of optimization tests again, but with a much tighter fractional tolerance on the demerit for the stopping criterion. For each of the ten sample cyclic spectra, we took the lowest value of the demerit (regardless of the configuration which achieved that result) as a reference point. Compared to that reference point, we find that the best-performing configuration of the standard-precision tests (i.e. L-BFGS-Lag-Prox; table 1) is measurably worse, on average, for each cyclic spectrum.

In the high-precision suite of tests we observed that none of the consistent outcomes of table 1 – i.e. L-BFGS better than other algorithms, Prox better than Unit, Lag better than Freq – were reproduced. Not surprisingly, the differences in demerit amongst the 12 tested configurations were considerably smaller than shown in table 1, with the worst-performing configuration being only marginally above the best (cf. the large spread in table 1). These facts suggest that in the high-precision tests all configurations have penetrated well into the noise-limited region of the optimization. The penalty for doing so, of course, is that many more steps are required to achieve that outcome — 784 steps, on average, for L-BFGS-Lag-Prox, which is more than 3 times the number of steps required to satisfy our usual stopping criterion (see table 1).

## Appendix B Estimation of model uncertainties

We have already determined the curvature of the demerit function with respect to the coefficients describing the pulse profile and the filter (equations 28 and 29). For the parameters describing the lag-space representation of the filter, the curvatures can be obtained by taking the real and imaginary parts of the relations

(B1) |

and

(B2) |

where the two matrices are given by

(B3) |

and

(B4) |

Here we have used a compact notation for the model coefficients and their indices; and we have neglected the contribution from a sum over the residuals, whose expectation is zero.

### b.1. Noise levels for the simplest filter

It is clear that the uncertainties in our parameter estimates depend on the filter coefficients and the intrinsic pulse profile. But for our purposes here it suffices to determine rough estimates of the parameter uncertainties. To proceed we therefore consider the particular case of a constant (unit) filter. For this circumstance we obtain

(B5) |

and

(B6) |

where the normalizing factor is a measure of the total pulsed flux, with

(B7) |

For the lag representation of the filter we find

(B8) |

and for the zero-lag coefficient the curvature with respect to the real part is twice this value, whereas there is no curvature with respect to the imaginary part. This last point, which implies a formally infinite uncertainty, should not cause concern, because the overall phase of the filter is completely arbitrary.

Using equation 39 we can immediately translate these curvatures into standard deviations. The results are

(B9) |

(B10) |

and

(B11) |

In all these cases the coefficients are complex; the quoted uncertainty is the uncertainty in the real part of the coefficient, which is equal to the uncertainty in the imaginary part. With the exception of one coefficient of the filter, the standard deviation is uniform across each set of coefficients.

In practice the system noise in each channel depends on the total number of radio-frequency channels, because we have a fixed total bandwidth for the instrument. With that scaling, equation 14 can be written

(B12) |

A further simplification is appropriate. For cyclic spectroscopy of a pulsar, the pulsar’s rotation frequency is necessarily equal to the spacing in modulation frequency, and in turn this is the natural choice for the channelisation. For this natural configuration we obtain

(B13) |

(B14) |

and

(B15) |
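To give a feel for the natural channelisation, one can combine the 4 MHz band of these observations with the spin frequency of B1937+21 (roughly 642 Hz; a catalogue value assumed here, not stated in this paper):

```python
# Natural channelisation for cyclic spectroscopy: the channel spacing
# equals the pulsar's rotation frequency. The band is from this paper;
# the spin frequency of B1937+21 (~642 Hz) is an ASSUMED catalogue value.
bandwidth_hz = 4e6
spin_freq_hz = 642.0
n_chan = bandwidth_hz / spin_freq_hz
print(f"natural number of channels: {n_chan:.0f}")   # roughly 6000 channels
```

This back-of-envelope count is consistent with the "roughly 13,000" total model parameters quoted elsewhere in this paper, since each complex filter coefficient contributes two real parameters.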

### b.2. Noise levels for more general filters

The curvature of the demerit function with respect to the various model parameters depends on the structure in the filter functions, as manifest in equations 28, 29, B1 and B2, but we have so far considered only the simplest filter. We now consider how structure in the filter affects the noise level on the various parameters.

It is, of course, possible to concoct bizarre examples of filters which imply correspondingly unusual noise properties. But we shall ignore such possibilities, as our purpose here is to describe what one might normally expect to encounter in practice. To that end we will restrict our discussion to filters of unit root-mean-square amplitude, and we will characterize the impulse response function by a typical scattering time, corresponding to a filter decorrelation bandwidth given by its reciprocal.

Consider first the noise level for the pulse harmonic coefficients. For low harmonics the summation in equation 28 is close to its maximum value. But at higher harmonics, where the harmonic frequency exceeds the decorrelation bandwidth, there is some decorrelation between the filter samples entering the sum, and the sum declines. In the limit of complete decorrelation the summation is reduced by a modest factor. Providing that both second- and fourth-order expectation values are of order unity, this is not a big effect. For example, in the random-phasor picture for the electric field the intensity statistics are exponential, so the mean-square intensity is twice the squared mean intensity, yielding a noise level for high harmonics which is somewhat larger than for low harmonics. In this picture, the noise level for high harmonics coincides with the value quoted in equation B13, in the limit of complete decorrelation.
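The exponential intensity statistics invoked here are easy to verify by Monte Carlo: for a circular complex Gaussian (random-phasor) field, the mean-square intensity is twice the squared mean intensity. A quick numpy check:

```python
import numpy as np

# Random-phasor field: independent Gaussian real and imaginary parts,
# scaled so that the mean intensity <|E|^2> is unity.
rng = np.random.default_rng(3)
n = 1_000_000
field = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2.0)
intensity = np.abs(field) ** 2

m1 = intensity.mean()
m2 = (intensity ** 2).mean()
print(m1, m2 / m1 ** 2)     # close to 1 and 2: exponential statistics
```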

Quite a different situation arises for the filter coefficients. It is evident that the curvatures given in equation 29 may be much reduced in regions where the filter function is small, with correspondingly large errors on those coefficients. As with the noise on the pulse harmonics, there are two different limiting cases, relating to the value of the typical scattering time. Most of the pulsed flux is contributed by harmonics up to the reciprocal of the temporal width of the pulse. If the scattering time is small compared to the pulse width then the filter function is almost constant over the range of harmonics which contributes most of the pulsed flux, so the curvature in equation 29 scales with the local filter power. Clearly this curvature could be very large (small) in comparison with the estimate given in equation B6, leading to correspondingly small (large) errors in the estimates. In the opposite limit, where the scattering time is large compared to the pulse width, the filter coefficient changes rapidly with harmonic number and we obtain a curvature estimate comparable to that given in equation B6.

Finally we consider the effect of a structured filter on the errors associated with the lag-space filter coefficients. The curvatures of the merit function with respect to the real and imaginary parts (equations B1 and B2) are made up of two terms. The first term is the same in both cases, and we expect it to be close to the value obtained for the simplest filter. The second term differs in sign between the real and imaginary parts of the coefficients; it is the real part of a sum of complex numbers. In normal circumstances those complex numbers bear no particular phase relationship to each other, so the second term is typically small in comparison with the first. We therefore neglect it, and we conclude that in normal circumstances the curvatures given in equation B8 are appropriate to all lag-space filter coefficients.