Microcalorimeter Spectroscopy at High Pulse Rates:
a MultiPulse
Fitting Technique
Abstract
Transition edge sensor microcalorimeters can measure xray and gammaray energies with very high energy resolution and high photoncollection efficiency. For this technology to reach its full potential in future xray observatories, each sensor must be able to measure hundreds or even thousands of photon energies per second. Current “optimal filtering” approaches to achieve the best possible energy resolution work only for photons well isolated in time, a requirement in direct conflict with the need for highrate measurements. We describe a new analysis procedure to allow fitting for the pulse height of all photons even in the presence of heavy pulse pileup. In the limit of isolated pulses, the technique reduces to the standard optimal filtering with long records. We employ reasonable approximations to the noise covariance function in order to render multipulse fitting computationally viable even for very long data records. The technique is employed to analyze xray emission spectra at 600 eV and 6 keV at rates up to 250 counts per second in microcalorimeters having exponential signal decay times of approximately 1.2 ms.
1 Motivation and goals
The Transition Edge Sensor (TES) is an excellent calorimetric spectrometer with applications in physical chemistry, materials analysis, gammaray spectrometry for nuclear forensics, and xray astronomy, among many other uses. A variety of satellites and sounding rockets are under development or have been proposed as platforms for TESs or other cryogenic microcalorimeters (to which our results also apply). Currently, an array of nearly 4000 TESs is planned as the focal plane for the Xray Integral Field Unit (XIFU) on the recently selected European xray satellite Athena (Ravera et al., 2014).
Absorption of a single xray photon produces a current pulse lasting for a fraction of a millisecond to a few milliseconds under typical TES operating conditions. The current stateoftheart “optimal filtering” techniques for pulse height estimation work well only when pulses are isolated from all earlier and later pulses over an isolation period several times longer than this. Such techniques can deal with pulse pile up only by discarding nonisolated pulses. Hence, optimal filtering is a good approach in the limit of low photon rates. When the product of the exponential decay time of the sensors and the photon count rate approaches and exceeds 0.05 to 0.10, however, optimal filtering forces increasingly difficult compromises between photon throughput and energy resolution. Relieving this tension through faster time constants is difficult; faster pulses generally require alterations to the sensor designs and—more critically—additional readout bandwidth. To preserve resolution for sensors with a decay time of 150 microseconds (as the Athena plan specifies), pileup rejection becomes increasingly wasteful of photons at a pulse rate per sensor of 100 counts per second or more.
Instruments such as the Athena XIFU have a wide range of science targets (Barcons et al., 2012). Some extended targets such as galaxy clusters and emission from filaments of the Warm Hot Interstellar Medium require highresolution spectroscopy across the full field of view of the XIFU. These targets are particularly well matched to TES sensors. However, the same instrument must also perform spectroscopy on compact objects whose spatial extent at the focal plane is set by the pointspreadfunction of the optics, objects such as neutron star binaries and stellar mass black holes. In these cases, the full xray flux of the source will be incident on one or just a few TESs and the count rate capabilities of the individual sensors will strongly affect measurement quality. The XIFU specifications require at least 80 % of photons to be measured with the highest resolution for 1 milliCrab source intensities. The count rate from a milliCrab source will depend on parameters that are still under discussion, such as mirror area, but this requirement corresponds approximately to achieving output count rate of 40 counts per second (cps) per TES at an input count rate of 50 cps. For sources that exceed 1 milliCrab, the fraction of lost photons will be even higher than 20 %. One viewing strategy for bright compact objects is to insert a diffuser that spreads the xrays across a larger area of the focal plane. Alternatively, improved signalprocessing strategies may increase photon throughput. We describe here a pulseanalysis technique that can dramatically advance the goal of preserving spectral resolution at high count rates relative to existing methods.
To perform any given observation as quickly as possible and make efficient use of the highest photon fluxes, we must be able to estimate pulse heights in the presence of some modest amount of pileup with resolution comparable to that achieved on isolated pulses. Previous xray telescopes such as the ASTROE (Boyce et al., 1999) and ASTROH missions (Takahashi et al., 2012) address the problem by “event grading.” Pulses not isolated enough to use the best optimal filter use a shorter “midgrade” filter; pulses not suited even for the shorter filter are simply averaged over several successive samples near the pulse peak, for “lowgrade” filtering. Even this method, though, is hampered by having minimum pulseisolation requirements.
We present here a different approach based on linear superposition of pulses. Its primary goal is to break the tight link between photon throughput and energy resolution that the isolatedphoton requirement imposes. This requirement arises in the traditional analysis because optimal filtering is designed to consider only one pulse at a time; our method relaxes this rule. Our secondary goal is to create a method simple enough and fast enough to implement in firmware or hardware, so that it can be performed on a spacecraft. It should handle pulse pileup in a way that works at low pulse rates, too, and not rely on different analysis modes for high and low rates. It is also important that the method not require substantially more data to be sent back from space than a traditional method does, given the limited bandwidth for communication from space; the method should yield a pulse height and arrival time, and not more than a few additional quantities for each xray pulse.
We first describe the standard technique of optimal filtering and show how it generalizes naturally to the estimation of multiple pulse heights simultaneously in a long record of sensor data, by a form of weighted leastsquares fitting (Section 2). A straightforward application of this idea to long data records would be computationally quite expensive, the cost scaling quadratically with the samples per data segment. This high computational cost was noted by an earlier group that applied this idea to astronomical microcalorimeter data from a brief suborbital flight (Crowder et al., 2012). One possible compromise—to make an implicit assumption of white noise—has been used to create FPGAbased hardware performing simultaneous pulseheight fits for very fast silicon drift diodes (Scoullar et al., 2011). We present a method to reduce the computational complexity of the fitting procedure even in the presence of nonwhite noise by approximating the noise autocorrelation function as the result of a loworder autoregressive moving average (ARMA) process (Section 3). This powerful approximation has not previously been used with xray microcalorimeter data.
The technique is then applied to two terrestrial TES measurements. The data were recorded by microcalorimeters designed for xrays up to 10 keV. The first data set (Section 4) employs an array of forty sensors at energies below 1 keV. Because the xray energy is far below the saturation energy, the energyresolving power is modest but the detectors are highly linear. The xray source is the beamline U7a at the Brookhaven National Laboratory’s National Synchrotron Light Source (NSLS). This example application is chosen to have both high photon rates and minimal nonlinearity.
The other data set (Section 5) consists of the 5.9 keV and 6.4 keV fluorescence emission of a manganese target illuminated by an xray tube source. The emission is measured by a single detector over a range of photon rates from 9 to 100 photons per second. Figure 1 shows 100 ms of observation from each of the five data rates, giving a sense of the problem being addressed. The TES calorimeters are less linear at these energies than they are below 1 keV, while detector linearity is an assumption intrinsic to our method. Still, the multipulse fitting method shows impressive performance even at high rates with only the simplest of corrections for nonlinearity. Highrate astronomical spectroscopy will present problems due to sensor nonlinearity that are beyond the scope of this work, though in Section 7 we offer some suggestions on ways to move beyond the technique featured here.
2 Pulse height estimation from optimal filtering to multipulse fitting
The data used for estimating microcalorimeter pulse amplitudes consist of the regularly sampled and digitized values of the timevarying bias current through the sensor. The sample rate is typically to samples per second per TES. In the Athena XIFU—an array of 4000 sensors—1 GB of raw data will be generated in each second of operation, or nearly 100 TB per day. While this high data rate demands fast analysis procedures, the extremely high resolving power of the microcalorimeters also means that one must take great care not to lose information or to introduce new systematic errors. Balancing the competing requirements of computational and statistical efficiency is among the central challenges in analyzing TES data.
We distinguish here between estimating the amplitude of the TES current pulse and the energy of the absorbed photon, not only because the scale factor between the two is not known a priori, but also because the relationship is generally not a linear one. Understanding this nonlinear mapping is an important concern in analyzing TES data, but it is beyond the scope of the current paper. We will assume for now that precise and accurate estimates of pulse amplitudes are a necessary and sufficient condition for making similarly good estimates of pulse energy and focus on the problem of amplitude estimation.
2.1 Optimal filtering of isolated pulses
The standard technique for estimating microcalorimeter pulse amplitudes begins with the identification of pulses and their arrival times from a long stream of regular TES current samples (“triggering”), and the subsequent selection of a predetermined number of samples both before and after the pulse onset (generating a “pulse record” surrounding the trigger time). The amplitude estimate for each pulse is then a certain weighted sum of the values in the pulse record. The best set of weights is the socalled optimal filter (Szymkowiak et al., 1993; Boyce et al., 1999; Lindeman, 2000). This filter can be derived in various ways—as a minimumvariance unbiased estimator, or as a maximumlikelihood estimator—starting from these assumptions:

Pulses always arrive long after any energy deposited by earlier xrays has fully dissipated. The sensor has returned to its steadystate electrical and thermal conditions.

Pulses are all identical in shape, regardless of energy. Only a single scale factor—the pulse amplitude—varies from one pulse to another.

The pulse shape is known ahead of time, perhaps by averaging together many pulses from an initial training portion of the data.

Noise is additive and follows a multivariate Gaussian distribution.

The noise is stationary (in particular, it does not vary as the current in the sensor evolves), and its autocorrelation function (or equivalently, its power spectral density) is known.
Not one of these assumptions is strictly correct, yet the procedure yields splendid results in a wide variety of conditions, at least for timeisolated pulses. In this paper, we consider violations of the first condition: cases where some measurable amount of residual energy remains in a sensor when a new pulse arrives. This problem is known as pulse pileup. Our approach to the problem is to assume strict sensor linearity, so that the current in the sensor is simply the sum of the decay to equilibrium that would have been seen in the absence of the new pulse, plus the new pulse as it would have been measured in isolation. This assumption, of course, is also incorrect; we will explore how far we can go with it, nevertheless.
Returning to the standard analysis, let be the covariance matrix of the noise. That is, its elements are expectation values (here, ) of twopoint products for a pulsefree data sequence ,
Because the noise is assumed to be stationary, the symmetric matrix is Toeplitz, i.e., for some sequence , the noise autocovariance function. A symmetric Toeplitz matrix is fully specified by its first row or column.
Let the assumed pulse shape, called the pulse model, be the vector^{1}^{1}1We treat all vectors as column vectors. and the measured data be the vector . We choose the length of both vectors in advance (typically by fixing the record length at dataacquisition and triggering time). In the simplest case, there is nothing expected in the data but the pulse of unknown amplitude plus zeromean noise, i.e.
and it can be shown that the best (minimumvariance, unbiased) estimate of the pulse height is
Here we have introduced , the optimal filter for estimation of a single amplitude. It is a weighting vector, which lets us estimate by taking an inner product with the data vector.
Most cases are not so simple, and one must account for additional terms in the data other than noise and the pulse itself. One specific term that must be included in nearly every analysis is an additive constant, called the baseline. This constant arises because systems for measuring the tiny currents through microcalorimeters generally have unknown (and slowly varying) DC offsets. Removing the DC term specifically is straightforward (Doriese et al., 2009), but we prefer a more general treatment. We have previously introduced a framework for constrained optimal filtering (Alpert et al., 2013), in which the filter is the minimumvariance filter chosen not from the set of all possible weights, but only from those that are strictly insensitive to one or more additive “nuisance terms”. In this picture, we replace the vector with a model matrix whose first column is (the pulse shape); other columns contain all anticipated nuisance terms. One such column might be a column of ones, if DCinsensitivity is required; another might be a decaying exponential chosen to render the filter insensitive to the tail of an earlier pulse (provided its decay time is known). In this approach, the best estimate of the pulse height becomes
(1) 
Here we use the unit vector to select only the term that corresponds to the amplitude of the pulse shape component and to discard all other terms.
It can be shown that each new component introduced into the model increases the variance of the estimator , but in practice the signaltonoise price paid is often quite small, particularly when the records are long or the nuisance terms are very different from the pulse shape .
These results use the noise covariance matrix rather than the noise power spectrum. Use of the Fourier space representation of noise is both possible and convenient, but it introduces an additional wrong assumption of periodicity; in fact, neither the finitelength pulse nor the noise in a record are periodic. The signaltonoise cost for filtering in the Fourier domain may be small in some cases (Alpert et al., 2012), but we prefer to avoid paying unnecessary costs, however small.
2.2 Filtering as fitting for pulse amplitude
A different and productive view of the optimal filter (Equation 1) is also possible. Suppose, as before, that we are given the noise model (the matrix ), the pulse shape (), and a complete set of components in a model of the pulse records (the columns of matrix ). If we also knew the components’ true amplitudes , we could compute the likelihood, the probability under the linear model that the data would be . Ignoring normalization factors, the likelihood is
For a fixed measurement and model, the logarithm of is a quadratic function of the unknown parameters . If we maximize the likelihood with respect to the vector by setting
then we find a closedform expression for the maximumlikelihood estimates of the parameters:
(2) 
of which Equation 1 is one component. Note that the estimate is unbiased:
because . This establishes that the minimumvariance estimator for the pulse height quoted in the previous section is in fact also the maximumlikelihood estimate of the pulse height, provided that one fits simultaneously for the amplitudes of all components of the model (including the baseline level, unwanted pulse tails, and so forth).
The covariance of our parameter estimates can also be computed:
The notation is simplified if we designate the system design matrix as
(3) 
Then
(4)  
(5) 
This estimate of the parameter covariance will be useful in what follows.^{2}^{2}2Equation 4 can also be useful in the event that we wish to perform filtering with an approximate noise covariance matrix , but we believe the true covariance to be . See Section 3.4.
2.3 Firstorder treatment of the variation in xray pulse arrival times
One source of systematic error in pulseheight analysis is the undesirable dependence of the estimated pulse heights on the exact arrival time of an xray photon. Photons arriving at different times with respect to the TES currentsampling clock will appear to have slightly different pulse shapes, and this effect is often large enough to degrade the energy resolution if left uncorrected.
A full cancellation of this arrivaltime effect presents a major unsolved challenge to developers of pulseprocessing techniques. Nevertheless, we find that the effect can be suppressed substantially by a firstorder treatment. We assume that the pulse model is the result of regularly sampling an unknown, smooth “pulse shape” function with a sampling period of :
where is the pulse amplitude, and is the unknown exact arrival time of the photon. We can treat as a parameter of the pulse model; unfortunately, the model is nonlinear in this parameter, invalidating the analysis of the previous section. If the function is expanded in a Taylor series about and only the linearorder term is kept, however, then the result is necessarily linear:
In this approximation, we can treat and as the two coefficients of a linear model and solve for them simultaneously by using the model pulse and its time derivative as two of the columns of model matrix and solving Equation 2. For this purpose, the time derivative of a pulse is not known exactly, but it can be approximated by the first finite difference of the pulse model .
2.4 Fitting for multiple pulse heights simultaneously
Within the assumptions of the linear model, is the maximumlikelihood estimator for the amplitudes of all components in the model simultaneously. To this point, we have supposed that the columns of consist of one pulse shape and multiple nuisance terms such as a constant baseline and possibly the pulse’s time derivative. We need not restrict the model to contain only one pulse, though. We can instead analyze a segment of data known to contain two or more pulses. Equation 2 will handle this case equally well; in such situations, more than one parameter of will estimate a pulse height of interest.
This approach, which we call multipulse fitting, allows much greater flexibility in handling pulses that are not cleanly separated in time. Rather than simply rejecting both members of a closely spaced pair, we can determine the linear combination of two pulses that best fits the measurements. How close in time two pulses can be without invalidating the estimates is now a decision that can be made at analysis time instead of at dataacquisition time. In any event this approach is much more tolerant of modest pileup.
Figure 2 illustrates the procedure. The example data (labeled “Raw” in the figure) depict a 73 ms segment (5700 samples) from an observation of the fluorescence of NHNO illuminated by a monochromatic 542 eV xray beam. Xray photons struck this particular detector at an average rate of 215 counts per second (the arraywide rate was over 8000 cps); 15.6 are expected and 14 are seen during this example. Assuming that the noise covariance and the expected pulse shape have already been determined during a “training pass” through some or all of the data, the first step in analyzing this record is to locate the pulse arrival times. For this purpose, the data are convolved with a filter having weights . This has the effect of finding the difference between each data sample and the extrapolation of the leastsquares fit of a line to the five previous samples. This convolution (“Trig”) is checked for large positive excursions, which define the trigger times. In this record, 14 photons are found. The matrix is constructed with 5700 rows (the record length) and 29 columns. One column is the constant value 1 (representing the unknown baseline level); 14 are the standard pulse model, each offset in time by the appropriate trigger time; and 14 are the finitedifference approximation to the pulse model’s first time derivative (used as described in the previous section to allow for a firstorder correction to the exact pulse arrival time). For convenience, the standard pulse model is scaled to have exactly unit height, which ensures that the estimate of the amplitude has the same arbitrary units as the raw data. Given this and an estimate of the noise , Equation 2 can be solved for the estimated parameters, yielding 14 pulseheight estimates, 14 arrivaltime corrections, and one constant baseline level. We find that all photons are 525 eV oxygen fluorescence except for the sixth, which is a 397 eV nitrogen fluorescence xray. The best fit to the data, given by , is shown (“Model”), followed by the dataminusmodel difference, scaled by a factor of 10.
The parameteruncertainty estimates (Equation 5) predict for most pulses a pulseheight uncertainty of 0.9 eV (or 2.0 eV fullwidth at halfmaximum, FWHM). When pulses are piled up, these uncertainties increase, but only slightly. Pulses 8 and 9 have 2.1 eV predicted FWHM, while pulses 10 and 11, although separated by only 0.4 ms, have 2.4 eV. The matrix also predicts correlations among the estimates of . The Pearson correlation coefficient between well separated pulses is typically , though closely spaced pulses can have strong anticorrelations. Correlations of and are found between pulses 8 and 9 and between 10 and 11, respectively. All pulse heights are anticorrelated with the baseline estimate, with in all cases.
It should be apparent that multipulse fitting requires a different strategy for selecting segments of data to store for later processing. Instead of implementing a software trigger and recording fixedlength records appropriate for the duration of a single pulse, the dataacquisition software needs to allow for longer records whenever pulses pile up on one another. For the test cases explored here, we have taken this approach to its extreme limit and simply recorded all data samples without interruption. This choice consumes the maximum possible storage space but has the advantage of simplicity during data acquisition.
When the online system does not separate the data naturally into distinct records, the offline analysis system must do so. The approach used here is to choose a default record length, typically 10 to 50 ms. When a pulse occurs too close to the end of such a record, the record is shortened (and the next record lengthened) as needed to ensure that the pulses in any one record have minimal coupling with the next record. At extremely high pulse rates, no gaps of satisfactory size would occur, and a different solution would be required.
3 Through the noisewhitening bottleneck
Important practical difficulties can arise when Equation 2 is used for multipulse fitting. Data sets can contain very many records per second, each of length or . Noise covariance matrices of size are costly to invert; for , it can be impossible even to store or . Because of the Toeplitz symmetry, the Levinson algorithm (Press et al., 2007) can be used to solve in time with only storage. This is a great improvement over general matrix solutions, which require times as much of both time and storage, yet Levinson is too slow for general use in applications involving large TES arrays. Socalled “super fast Toeplitz solvers” requiring only operations have been proposed in recent years (Chandrasekaran et al., 2008). Unfortunately, they have very high breakeven points, and midsize problems like the current one are unlikely to reap the benefits larger problems will from the reduced asymptotic complexity.
The model matrix , whose columns are the components of the data model, is different for every segment of data. While two segments might have the same number of pulses, their pulses will be at different times; for all practical purposes, these are entirely different data models. Since each data segment will have its own , evaluating the maximumlikelihood estimates of the parameters requires computing , or one solution of for each model component for each data segment.
Of course, a standard (Levinson) solution of the full noise covariance matrix will be practical for smaller problems. Readers concerned only with such cases may skip over the remainder of Section 3, which discusses approximations to for more demanding situations.
3.1 Noise whitening the data and the model
The expression for the best estimate of the model parameters, Equation 2, twice uses the inverse of the noise covariance matrix. The true noisecovariance matrix of any random process is positive definite.^{3}^{3}3An empirical estimate of is not guaranteed to be positive definite if the covariances are computed from data in the usual, unbiased way. A different estimator of can be employed that is guaranteed to be positive definite, though it is also biased (Brockwell & Davis, 2009). In practice, we have never found it necessary to use the biased estimator for the noise covariance. It follows that is invertible, that its inverse is also positive definite, that both admit a Cholesky decomposition, and that both have strictly positive eigenvalues.
It is therefore possible to construct matrix , which is a linear noisewhitening operation in the following sense: the covariance of a whitened data vector () is
If satisfies
(6) 
or equivalently
(7) 
then the whitened data vector is linear in the data and has uncorrelated noise of unit variance.
We write the maximumlikelihood fit for the parameters and the design matrix with tildes indicating whitened vectors or matrices:
(8)  
(9) 
The effect of whitening the data and the model components is to reduce the problem from one of nonwhite noise (requiring correlated weights) to the simpler problem of an unweighted leastsquares fit of a linear model. We stress, however, that these two equations are completely equivalent to Equations 2 and 3.
At least three straightforward constructions of the whitener satisfy Equations 6 and 7:

Let be the eigendecomposition of , where and is the diagonal matrix of eigenvalues. Then is a whitener.

Let be the Cholesky decomposition of , so that is an upper triangular matrix. Then is an uppertriangular whitener.

Let be the Cholesky decomposition of , so that is a lower triangular matrix. Then is a lowertriangular whitener.
If we order the components of the data vector as running from earlier to later times, then this last has the appealing property of being strictly causal, in that depends on only for .
3.2 ARMA representations of the noisecovariance function
Instead of pursuing superfast Toeplitz solvers for this work, we have chosen to make approximations to the noisecovariance function that defines . The extreme approximation is simply to pretend that the noise is strictly white, and therefore is diagonal. Though convenient, this approximation is inappropriate for typical microcalorimeter data. We have found in practice that making even an apparently very rough approximation to can yield performance that is much closer to the ideal than the naïve assumption of white noise does.
Specifically, we approximate as the sum of a small number of decaying exponentials. These exponentials can be complex, allowing oscillatory as well as decaying behavior in . A simple and very widely studied class of stochastic models yields exactly this form for the noisecovariance function. In the signalprocessing literature, these models are known as autoregressive movingaverage (ARMA) models. The theory of ARMA models is described at booklength in Box et al. (1994) and Brockwell & Davis (2009). Briefly, the outlook is that some “input” process of independent zeromean, unitvariance Gaussian deviates is passed through an infinite impulse response filter to produce the correlated noise of the “output” . When the filter consists of an order moving average filter and an order autoregressive filter, it is called an ARMA() process:
A random sequence generated in this way has a noiseautocovariance function that obeys
(10) 
where the exponential bases are the inverse of the roots of the polynomial . Stability of the ARMA process requires that . For to be real, it is also required that each and be either real, or a member of a complexconjugate pair such that and . We have assumed for simplicity that the roots are distinct. In Equation 10, we see that if , the sumofexponentials form applies only after one or more exceptional values for small . In the current work, we have used throughout, so that is the only exception to the general form. This choice works well when a large part of the noise variance is white, that is, concentrated in the component.
The appeal of approximating the noise as the result of an ARMA() process is that its noisecovariance matrix can be factored as in time, the result can be stored in memory, and the whitener can be applied in time. This approximation thus gives us a way to evaluate the maximumlikelihood pulse amplitudes (Equation 2) and all other nuisance components in a time strictly linear in the length of the data record.
It is beyond the scope of this paper to explain, in general, how to work from an observed noise covariance and choose the order of an ARMA model or its coefficients. The work of De Groen & De Moor (1987) explains how one can estimate the exponential bases of Equation 10, though little guidance is given on the choice of the model order . Once the bases are chosen, estimating the amplitudes is straightforward. Moving from these numbers to a fast solver of the approximate is the topic of a future manuscript, now in preparation.
Figure 3 shows ten example ARMA(3,3) noise models along with the residual difference between measured and modeled noise. Although visible structure is apparent in the residuals, calculations in Section 3.4 show that the discrepancies between measurement and model are small enough to have only negligible effect on the precision of parameter estimates.
3.3 Noise as an ARMA(1,1) process
In some situations, it might suffice to approximate the noise as an ARMA(1,1) process. For such cases, and to sketch the form of the solution when , we show explicitly how to whiten data with ARMA(1,1) noise. The result summarized here is completed in Appendix A.
We seek the whitener of type 3, the inverse of the lower Cholesky factor of , which is also lower triangular. First, write the noisecovariance function as
(11) 
We require , and all parameters to be real. The amplitude of the decaying term is , and the additional white noise term is .
Let the matrix be the banded lowertriangular Toeplitz matrix with 1 on the diagonal and on the first subdiagonal, and zeros elsewhere. The product is a symmetric tridiagonal matrix, which can be factored in time as . We can rewrite this as . The desired type3 whitener is therefore the lowertriangular matrix
(12) 
Expressions for elements of , recursions to compute elements of , and the solution to apply for ARMA(1,1) noise are all given in Appendix A.
We expect that the ARMA(1,1) approximation (Equation 11) improves on the pure whitenoise approximation enough to be a valuable first step in many applications. If not, and if computational costs are not too high, multipulse fitting (Equation 2) with a direct Toeplitz solver for an arbitrary Toeplitz matrix is an alternative to approximating the noise as an ARMA process.
3.4 Estimating the effect of approximate noise whitening
In this section, we have proposed a fast noise whitening where the noise covariance matrix is only approximated as . That is, , but the equality is inexact. What is the effect of this approximation on the estimates of pulse heights and other parameters when we compute them using the approximate optimal filter given in Equation 2? Returning to Equation 4 for the expected parameter covariance, we see that arises in the middle of that equation as the covariance of the data vector: . If we believe the true covariance of the data to be some matrix , then the estimate of parameter covariances becomes
(13) 
This equation can be used to assess whether any particular approximation to the true noise covariance will appreciably degrade the statistical power of a multipulse fit or of a traditional or constrained optimal filter. For example, we can use it to evaluate what value of is sufficient when approximating the noise as an ARMA() process. In the example analyses below, we have used Equation 13 to determine that an ARMA(3,3) model suffices for the noise examples shown in Figure 3.
4 SubkeV xray fluorescence: an example application
We have performed measurements at a synchrotron using beam energies less than 1 keV, establishing the viability of multipulse fitting at very high rates in the closetolinear regime of the TES sensors^{4}^{4}4The specific TES design employed here works up to 10 keV, so the sensors are quite linear below 1 keV. The twopeaked nitrogen Kline emission from NHNO samples allows us to estimate the energy resolution at 390 eV and to compare the performance of multipulse fitting against standard optimal filtering at that energy and a mean photon rate of 46 cps. Measurements of the beam scattered from a Nylon target demonstrate that good energy resolution is achieved even at 260 cps while retaining over 80 % of the photons.
4.1 The NIST spectrometer at the NSLS
We developed a TES array of 60 sensors and installed it on the soft NIST U7A beamline at the NSLS, where it was in use from 2011 until the NSLS permanently ceased operation in late 2014. The U7A beamline source is a bending magnet with an energy range of 180 to 1200 eV, which is optimized for operation to study the K lines of boron, carbon, nitrogen, oxygen, and fluorine; and the L lines of elements potassium to gallium. The NIST spectrometer (Ullom et al., 2014) used three timedivision multiplexer channels (Reintsema et al., 2003) each reading out twenty TES sensors apiece. The multiplexing electronics switched between sensors every 640 ns, so the sampling rate for any given sensor was one sample per 12.8 s. In the measurements described here, taken on June 14, 2012, some 40 to 45 sensors were operating.
The TES detectors used in the measurements described in this and the next section have pulses with a characteristic exponential decay time of 1.0 to 1.2 ms. The Athena mission’s XIFU focal plane will employ TESs approximately six to eight times faster. Therefore, pulseanalysis techniques which we show to work with the current sensors at 100 to 200 cps per detector might scale to 1000 cps with each Athena TES.
4.2 Demonstration spectra
Figure 4 shows the xray fluorescence spectra from seven measurements over an energy range including the K lines of carbon, nitrogen, and oxygen. A multipulse fitting analysis was used to obtain each spectrum. Two spectra show the emission from NHNO when the probe beam was tuned to 423 eV and to 542 eV. In the former case, the nitrogen emission “line” is split into a twopeaked complex. Chemical effects cause the splitting, which results from the very different bonding environments of the nitrogen atoms in the ammonium and in the nitrate ions (Vila et al., 2011). The split nitrogen line offers an excellent opportunity to assess the energy resolution resulting from different analysis methods.
To achieve the best possible energy resolution, a few small corrections are required after the basic multipulse fit described in Section 2.4. One is a small gaindrift correction. The gain of each TES sensor is found to be anticorrelated with its steadystate baseline level. Although each gain varies by only a few parts per thousand, the derived energy resolution improves somewhat when fitted pulse heights are scaled by a number linear in the baseline level. The correction has a single variable parameter per sensor, which is chosen by minimization of the Shannon entropy of the corrected pulseheight spectrum. Lower entropy corresponds to our general sense of “sharper” spectral features, so this heuristic is appropriate in a spectrum dominated by unresolved or marginally resolved lines.
A similar oneparameter correction is applied to remove a small, quadratic dependence of pulse height on the pulse arrival time. (The arrival time is found via a firstorder linear model for the effect of time offset, as described in Section 2.3.) Again the parameter is chosen to minimize the spectral entropy. Both the gaindrift and the arrivaltime corrections are generally employed in our standard optimalfilter analysis, and are not unique to the multipulse fitting technique.
The third correction is unique to multipulse fitting, but only because this technique permits analysis of pulses with far more pileup than is possible in other analysis approaches. We make a small correction quadratic in the “residual current” flowing through the sensor just before the present pulse arrives. As with the others, this correction requires only one parameter per sensor, is not more than a few eV in the most extreme cases, and is selected by entropy minimization of the corrected spectrum. At low pulse rates, this correction has no measurable effect on the data quality, but at the highest rates, it can improve the resolution from 5 or 6 eV to 4 eV.
Multipulse fitting is intended as a method to improve on the either the photon efficiency or the energy resolution (or both) offered by standard optimal filtering. Using the NHNO data with 423 eV xray beam (Figure 4, top panel), we have made a direct comparison of MPF and standard filtering. The perdetector average photon rate in these data is 46 cps. We compare the MPF results against the results from a suite of six different optimal filters. Four are of the standard type, in which the filter is explicitly insensitive only to addition of a constant level. These filters are 20, 10, 5, and 2.5 ms long (the shortest is 195 samples long, given the 12.8 s sample time). The other two filters are constrained optimal filters (Section 2.1) of length 5 and 2.5 ms, with the constraint being that they are insensitive not only to the addition of a constant, but also to decaying exponentials with time constants 0.64 and 1.024 ms (50 and 80 samples). These time constants correspond to the typical sensor’s signal decay at a few ms and at a few tens of ms after the peak of a pulse. Multiple decay time constants are not unusual in microcalorimeter pulses.
Each of the six standard or constrained optimal filters has a corresponding “veto window”—effectively a dead time imposed offline—inside of which no other pulse is permitted. Each filter is therefore applied to a unique subset of the full 1,038,000 pulses observed in this measurement. For the four standard filters, it was required that the previous pulse arrive at least 4 ms before the start of the filtered record, and that the next arrive at least 0.1 ms after the end of the record. The veto windows are thus 24.1, 14.1, 9.1, and 6.6 ms long. As the constrained filters are designed specifically for insensitivity to the tails of any prior pulses, their requirement was relaxed so that the prior pulse could arrive as little as 0.5 ms before the start of the filter. This yields windows of length 5.6 and 3.1 ms. At the mean data rate of 46 cps, the expected efficiency of the timing cuts ranges from 33 % for the longest standard filter to 87 % for the shorter of the constrained filters.
Figure 5 shows spectra generated with the six optimal filters and with MPF. As described above, the optimal filters can increase photon efficiency by employing shorter data records, by building in insensitivity to the exponential tails of previous pulses, or both. The increased efficiency always comes at the price of poorer energy resolution, as shown by the family of spectra; the more photons in a spectrum, the less distinct the two nitrogen fluorescence peaks become. The MPF results, on the other hand, demonstrate energy resolution as good as the longest standard filter’s while achieving 96 % photon throughput, well above that permitted by any optimal filter’s corresponding timing cuts.
4.3 Energy resolution at various count rates with MPF
To understand how the energy resolution changes with the photon rate, we use the measurements made of fluorescence and scattered xrays from the Nylon target with a 573 eV beam (Figure 4, bottom panel). For this purpose, we study the width of the peak corresponding to elastically scattered beam photons. They make up only some 1 % of the total photons, but their value is that the natural line width (set by the beamline’s monochromator) is approximately 0.6 eV FWHM, which is too narrow to be resolved with the TES spectrometer. Therefore, the measured peak width serves as a direct measurement of the instrumental resolution.
We varied the intensity of the scattered and fluorescence xrays by moving the spectrometer. The detector array was placed approximately 2 cm from the Nylon target to achieve the highest intensity and 11 cm for the lowest intensity. At the higher rates, the sensors in one of the three multiplexer columns were much closer to the fluorescenceemission spot on the target, and the xray pulse rate was much higher for these sensors. Consequently, we treat the five spectrometer positions as measurements at ten distinct data rates, handling the higherrate column (18 of the TESs) separately from the other sensors.
We estimate the spectrometer resolution by fitting a Gaussian peak (plus a line to represent background photons), always using data from 568 to 578 eV. The parameters are chosen to maximize the full Poisson likelihood, in order to reduce the expected parameter bias (Fowler, 2014). Figure 6 shows the results. The resolution is found to degrade only a small amount even as the xray count rate is increased from 10 to 262 cps. Pulses are cut only if the prior pulse falls within 0.55 ms or the next pulse falls within 0.32 ms. At the highest rate, 80 % of the pulses survive these cuts, yielding a 4.2 eV resolution at an input rate of 260 cps per sensor.
The bright oxygen K line does not have a known line shape that we can use as a primary tool for estimating energy resolution. Nevertheless, we can use the line to corroborate the resolution found using the scattered beam. A KolmogorovSmirnov test (Press et al., 2007) shows complete consistency between the O line shapes for the two lowestrate measurements (roughly 10 and 25 cps), so we combine these and assume them to be the results of smearing the true shape by some unknown amount. The higherrate data are then fit to find the additional Gaussian smearing required to make the best agreement. If we assume that the resolution of the two lowestrate data sets is 3.0 eV FWHM, as established from the scattered beam, then the total resolution implied by the oxygen line shape is also consistent with the scattered beam results at higher photon rates: 3.6 eV at 150 cps and 4.5 eV at 260 cps.
Figure 7 shows the rate of photons passing all timing and other cuts as a function of the incident photon rate. Each sensor appears seven times, once for the MPF analysis and for each of six standard optimal filter analyses. Although cuts other than timing are applied, the throughput is largely governed by timing cuts, and so the measured data all lie close to the theoretical lines.
In summary, the Nylon fluorescence results show that of approximately 4 eV is possible at rates exceeding 250 cps, a resolution consistent with that provided by a traditional filter of length 10 ms. Yet the MPF can achieve this using a vastly higher fraction of the photons than the traditional filter can. Furthermore, we have established that the technique is workable with dozens of separate detectors and over a very wide range of photon rates.
5 Multipulse fitting in the nonlinear regime: a second example
To complement the lowenergy limit, discussed in the previous section, we also made measurements at 5900 eV. These data consist of the Kline fluorescence emission of manganese metal. For this purpose, electrons are accelerated in a standard commercial xray tube source onto a rhodium target, which emits brehmsstrahlung photons that illuminate the manganese secondary target. By varying the electron current, we are able to produce xray count rates on a TES microcalorimeter up to hundreds of counts per second. We performed the measurement using a single TES in a nonstandard nonmultiplexed readout system. In this system, the single detector’s sample rate was one sample per 640 ns. Examples of the raw TES current appear in Figure 1.
Multipulse fitting was performed on the manganese fluorescence data in the same manner described in the previous section. The same small corrections detailed in Section 4.2 were applied here: gaindrift, arrivaltime, and a leadingorder nonlinearity correction. The sensor resolution was assessed via the twopeaked shape of the Mn K line complex, assuming the natural line shape given by Hölzer et al. (1997) as a sum of seven Lorentzian profiles.
The measured energy spectrum at 6 keV is not simply a convolution of the ideal energy spectrum with a Gaussian, as we could safely assume in the previous section. Observations indicate that an exponentially decaying tail towards low energies is an important component of the energy response. Such a tail is discussed quantitatively in Bortels & Collaers (1987), for example. In most Mn K data, we find that approximately 10 % of all pulses must be attributed to the exponential tail, which has a typical scale length of 20 to 30 eV. Both the fraction and the scale length are allowed to vary when fitting for the instrument response. The remaining 90 % of pulses are unaffected by the exponential tail. The distribution of all pulses—tail or otherwise—is then convolved with a Gaussian, and it is the full width at halfmaximum of this Gaussian which we label “the energy resolution.” When the fraction of pulses in the tail exceeds about 20 %, however, it is not at all clear that the Gaussian width still represents the quantity of interest. Whether the lowenergy tail represents physical effects in the detector or systematic effects of the data analysis is unknown, but we expect that both causes play important roles. We fit for the lowenergy component in analyzing all Mn K spectra and indicate cases where it appears to comprise at least 20 % of all pulses.
Input rate (cps)  9  14  27  50  100 cps 
MPF pulse rates:  
Loose  8.4  14.0  25.2  44.9  78.7 cps 
Medium  7.8  12.4  20.4  30.5  36.5 cps 
Strict  7.2  10.9  16.3  20.0  14.7 cps 
Energy Resolution:  
Loose  2.35  2.83  3.61  4.53  4.88 eV 
Medium  2.25  2.43  2.94  3.75  4.16 eV 
Strict  2.12  2.28  2.61  3.55  3.55 eV 
OptFilt pulse rates:  
2.5 ms, exp  8.0  13.2  23.3  40.1  68.3 cps 
5 ms, exp  7.9  12.9  22.2  36.6  56.4 cps 
2.5 ms  7.7  12.2  20.4  31.6  41.2 cps 
5 ms  7.5  11.9  19.3  28.0  32.4 cps 
10 ms  7.3  11.3  17.3  23.0  21.9 cps 
20 ms  6.8  9.9  13.7  14.9  9.1 cps 
Energy Resolution:  
2.5 ms, exp  3.93  4.07  4.82  6.13  6.41 eV 
5 ms, exp  3.21  3.36  4.02  5.43  5.56 eV 
2.5 ms  2.95  3.18  3.40  4.56  4.52 eV 
5 ms  2.82  3.10  3.32  4.50  4.27 eV 
10 ms  2.58  2.75  3.20  3.89  4.54 eV 
20 ms  2.45  2.59  3.02  3.77  5.00 eV 
Figure 8 shows the line shape and the bestfit energy resolution (Gaussian FWHM) at five different photon rates up to 100 cps, after we performed multipulse fitting and imposed a strict cut primarily against pulses that are too closely piled up on the tail of the prior pulse. This cut is chosen to achieve the best possible energy resolution, though it passes only 40 % of pulses in the 50 cps data set. To show that much higher photon efficiency is possible at only modest cost in energy resolution, we also apply two other cuts that are less selective against pulse pileup. The upper half of Table 1 states the resolutions and usable pulse rate for each input rate and for the three cuts. The data at the highest rate of 100 cps fit the model with a large fraction of events (40 % to 50 %) attributed to the lowenergy exponential tail with either of the two less restrictive cuts. For this reason, we are reluctant to call the width of the Gaussian component in the 100 cps data its “energy resolution”.
The usable subset of Mn K pulses can be selected either with aggressive cuts to minimize systematics (and optimize energy resolution) or with more relaxed cuts to favor the highest possible pulse efficiency. In the points connected by solid lines in Figure 9, we show the sort of balance this allows; the points from left to right show the energy resolution as cuts vary from stricter to more permissive. The main variable among the three choices of cut criteria is how much “leftover energy” is allowed in the TES when a given pulse arrives—that is, how much prior signal a pulse can be piled upon. Pulses suffering the most from nonlinear detector effects can thus be eliminated. A similar resolutionefficiency tradeoff is also possible in standard optimal filtering (points on the same plot connected by dotted lines). Here, six different filters are chosen, having the same record lengths, pretrigger fractions, and constraints as described in Section 4.2.
The multipulse fitting technique offers both a higher maximum possible pulse efficiency than optimal filtering and superior energy resolution at any given level of throughput. It accomplishes this because the assumption of pulse linearity allows us to relax the pulseisolation requirements. Analysis of any given pulse uses more data than would be possible with standard optimal filtering. This matters because pulse heights are meaningful only as a height relative to a varying baseline level. One way to view the excellent performance of multipulse fitting is that one can use the extra data to make superior estimates of this baseline.
The data at the lowest rates establish that this sensor is capable of 2.1 eV resolution at 5900 eV. The resolution degrades as the photon rate is increased. Importantly, the degradation appears equally in the multipulse fitting and in the optimally filtered data. It appears to be a nonlinearity effect, inadequately addressed by the simple leadingorder correction. This degradation with increased rate might seem to be a weakness of multipulse fitting, but we do not believe that it is intrinsic to the technique. More sophisticated methods for handling the nonlinear evolution of pulse shapes and sizes with increasing “residual sensor energy” are needed, but they are not incompatible with the basic concept of fitting the data records as a sum of modeled components. Even without them, MPF achieves energy resolutions of 4 eV at 50 to 100 cps with a slow ( ms) TES detector.
6 Prospects and possible extensions
The prospects for the use of multipulse fitting in future xray satellite missions depend on controlling both the computational costs of the method and the systematic errors that arise because the technique assumes linearity from an inherently nonlinear sensor technology.
We have argued that when one approximates the noise as an ARMA process, the cost of multipulse fitting to a data set of length scales linearly in , just as with optimal filtering. Though true, this claim ignores two problems: first, that the overall scale factor grows rapidly with the number of pulses being fit at once. If pulses are fit in the typical data section, and parameters are fit per pulse,then multipulse fitting requires, among other computations, the inversion of a square matrix of size . This cost can grow quickly, so multipulse fitting should be done on the shortest reasonable segments of data. Future work will investigate alternatives for selecting data segment lengths, in order to control the number of floatingpoint operations without compromising energy resolution.
Another costly feature is that multipulse fitting requires computations to be done on all samples in a long data sequence, instead of creating fixedlength records from only a fraction of the data. In the limit of high pulse rates, though, most data samples are incorporated into the fixedlength records anyway, and the difference becomes smaller.^{5}^{5}5Indeed, if records were allowed to overlap, then the optimal filter technique would need to use substantially more than 100 % of the samples at high rates. Furthermore, even with optimal filtering of short records, one must convolve the complete raw data stream with some kernel in order to identify trigger points.
Aside from practical concerns of computational costs, sensor nonlinearity is the most likely source of trouble with the fully linear multipulse fitting technique as presented here. How to address nonlinearity in this context—or in the more familiar context of optimal filtering—remains a major open issue for the entire field of microcalorimetery. We are not prepared to settle the question now, but one approach is to begin by expressing pulse records as linear combinations of basis vectors. Such a basis might be found empirically, through Principal Component Analysis of data vectors, or by imposing an a priori model. Either way, multipulse fitting means precisely the approximation of pulse records as a linear combination of specified vectors; it is therefore fully compatible with a such a decomposition. Although a complete nonlinearityaware analysis may estimate pulse energies as some nonlinear function of the amplitudes in the decomposition, one can still benefit from the fitting algorithm we have described in this work to perform the linear decomposition in the first place.
One possible approach to accelerate further computation is to express basis vectors (after noisewhitening) as the sum of a small number of decaying (potentially complex) exponentials. If this can be done for all basis vectors, then inner products of the form can be computed analytically, in a time independent of the data length (the height of the matrix). How amenable realistic basis sets are to such an expansion is unknown and is likely to vary considerably over the range of TES designs, noise environments, and spectrometer applications.
Another extension to the basic MPF approach that might be useful in some instances would be handling of nonstationary noise. In full generality, nonstationary noise would defeat the simplifications that the pure ARMA noise model introduces. Still, it is possible to combine the ARMA model with certain forms of nonstationarity and retain the former’s speed advantages. One such form is the case of a noisecovariance matrix , where is a diagonal matrix determined by the TES current, and is representable as a stationary ARMA noise process.
7 Conclusions
Multipulse fitting is a technique to extend the virtues of optimal filtering into the regime of high photon rates. It fits simultaneously for the pulse heights of multiple piledup pulses in an extended period longer than the usual optimalfilter length. Because pulses are permitted to overlap in time, the technique softens the usual requirement for long periods of isolation before and after any single valid pulse.
Just as optimal filtering does, multipulse fitting takes advantage of inversenoise weighting to achieve the maximum possible signaltonoise ratio consistent with the data and the model assumptions. While this weighting is computationally expensive in the general case, we find success in replacing the full estimated noisecovariance function with a simplified approximation to it, a loworder ARMA model.
The multipulse fitting technique has been applied to measurements from a single TES microcalorimeter operating at 6 keV and to an array operating at 400 to 600 eV. In the former case, the energy resolution remains better than 5 eV at the Mn K line for rates up to 100 cps. At the lower energy, the resolution of an array of 2.5 eV devices remains as low as 4.2 eV even up to 262 cps per sensor, and using a full 80 % of the pulses. The nitrogen fluorescence line complex of an ammonium nitrate target exhibits 2.5 eV resolution at 46 cps and 96 % pulse efficiency.
We have shown results using a multipulse fitting technique on microcalorimeter measurements in the regime of very high photoncount rates and sensor linearity. Although nonlinearity and computational costs are not fully solved problems, we expect that this concept can be usefully adapted to a wide range of microcalorimeter data sets and believe it to be a promising step toward an analysis chain optimized for the demanding requirements of future xray space missions.
Appendix A Whitening ARMA(1,1) noise
Section 3.3 gives partial results for whitening data when Equation 11 gives the noise covariance function. That is, it covers the case when the noise is described by an ARMA(1,1) process and its covariance is a decaying exponential plus a delta function at time 0. The product is a symmetric tridiagonal matrix with values
The Cholesky factorization is straightforward. Label the values on the diagonal of and on the subdiagonal . They can be computed from the recursion:
and stored for later use.
Equation 12 shows that whitening the vector to yield requires solving . A recursion for and is
This gives the whitened data in terms of the raw data and the elements of the Cholesky factor of .
The procedure for whitening (“inverting”) an ARMA() process is similar, though necessarily more complicated than the case given in this Appendix.
References
 Alpert et al. (2012) Alpert, B. K., Doriese, W. B., Fowler, J. W., & Ullom, J. N. 2012, Journal of Low Temperature Physics, 167, 582
 Alpert et al. (2013) Alpert, B. K., Horansky, R. D., Bennett, D. A., Doriese, W. B., Fowler, J. W., Hoover, A. S., Rabin, M. W., & Ullom, J. N. 2013, Review of Scientific Instruments, 84, 056107
 Barcons et al. (2012) Barcons, X., et al. 2012, ArXiv eprints http://arxiv.org/1207.2745
 Bortels & Collaers (1987) Bortels, G. & Collaers, P. 1987, International Journal of Radiation Applications and Instrumentation. Part A. Applied Radiation and Isotopes, 38, 831
 Box et al. (1994) Box, G. E. P., Jenkins, G. M., & Reinsel, G. C. 1994, Time Series Analysis: Forecasting and Control, Third Edition (PrenticeHall)
 Boyce et al. (1999) Boyce, K. R., et al. 1999, SPIE’s International Symposium on Optical Science, Engineering, and Instrumentation, 3765, 741
 Brockwell & Davis (2009) Brockwell, P. J. & Davis, R. A. 2009, Time Series: Theory and Methods (Springer)
 Chandrasekaran et al. (2008) Chandrasekaran, S., Gu, M., Sun, X., Xia, J., & Zhu, J. 2008, SIAM Journal on Matrix Analysis and Applications, 29, 1247
 Crowder et al. (2012) Crowder, S. G., et al. 2012, The Astrophysical Journal, 758, 143
 De Groen & De Moor (1987) De Groen, P. & De Moor, B. 1987, Journal of Computational and Applied Mathematics, 20, 175
 Doriese et al. (2009) Doriese, W. B., Adams, J. S., Hilton, G. C., Irwin, K. D., Kilbourne, C. A., Schima, F. J., & Ullom, J. N. 2009, in American Institute of Physics Conference Series, Vol. 1185, American Institute of Physics Conference Series, ed. B. Young, B. Cabrera, & A. Miller, 450–453
 Fowler (2014) Fowler, J. W. 2014, Journal of Low Temperature Physics, 176, 414
 Hölzer et al. (1997) Hölzer, G., Fritsch, M., Deutsch, M., & Härtwig, J. 1997, Physical Review A, 56, 4554
 Lindeman (2000) Lindeman, M. 2000, Ph.D. thesis, UC Davis
 Press et al. (2007) Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. 2007, Numerical recipes : the art of scientific computing, Third Edition (Cambridge)
 Ravera et al. (2014) Ravera, L., et al. 2014, SPIE Astronomical Telescopes + Instrumentation, 9144, 91442L
 Reintsema et al. (2003) Reintsema, C. D., et al. 2003, Review of Scientific Instruments, 74, 4500
 Scoullar et al. (2011) Scoullar, P. A. B., McLean, C. C., & Evans, R. J. 2011, Applications of Nuclear Technology, 1412, 270
 Szymkowiak et al. (1993) Szymkowiak, A. E., Kelley, R. L., Moseley, S. H., & Stahle, C. K. 1993, Journal of Low Temperature Physics, 93, 281
 Takahashi et al. (2012) Takahashi, T., et al. 2012, in SPIE Astronomical Telescopes + Instrumentation, ed. T. Takahashi, S. S. Murray, & J.W. A. den Herder (SPIE), 84431Z–84431Z–22
 Ullom et al. (2014) Ullom, J. N., et al. 2014, Synchrotron Radiation News, 27, 24
 Vila et al. (2011) Vila, F., Jach, T., Elam, W., & Rehr, J. 2011, The Journal of Physical Chemistry A, 115, 3243