Cosmic Microwave Background Mapmaking with a Messenger Field

Abstract
We apply a messenger field method to solve the linear minimum-variance mapmaking equation in the context of Cosmic Microwave Background (CMB) observations. In simulations, the method produces sky maps that converge significantly faster than those from a conjugate gradient descent algorithm with a diagonal preconditioner, even though the computational cost per iteration is similar. The messenger method recovers large scales in the map better than conjugate gradient descent, and yields a lower overall χ². In the single, pencil beam approximation, each iteration of the messenger mapmaking procedure produces an unbiased map, and the iterations become more optimal as they proceed. A variant of the method can handle differential data or perform deconvolution mapmaking. The messenger method requires no preconditioner, but a high-quality solution needs a cooling parameter to control the convergence. We study the convergence properties of this new method, and discuss how the algorithm is feasible for the large data sets of current and future CMB experiments.
Subject headings:
methods: data analysis – methods: statistical – cosmic background radiation
1. Introduction
Cosmic Microwave Background (CMB) experiments measure the temperature and polarization of radiation from the surface of last scattering. To reduce systematics, a CMB telescope scans the sky in constant motion. As an important step in the analysis, we must assemble the scanning data into sky maps, but doing so is computationally expensive. Janssen & Gulkis (1992) suggested a linear, minimum-variance solution to the mapmaking problem for COBE, and Tegmark (1997) showed that this solution preserves the information content of the data. While there are other methods that also preserve information, this solution makes few assumptions and produces an unbiased map with minimized errors, which are appealing features.
As CMB experiments achieved higher resolution, the computational scaling and increased number of map pixels made brute force approaches to the minimum variance solution impractical, and researchers turned to iterative linear system solvers that could find the solution without explicitly performing an intractable matrix inversion (Wright et al., 1996; Doré et al., 2001; Natoli et al., 2001; Dupac & Giard, 2002; Stompor et al., 2002; de Gasperis et al., 2005). Preconditioned conjugate-gradient descent is the currently preferred method and these solvers have subsequently been optimized for massively-parallel high-performance computing environments (e.g. Cantalupo et al., 2010).
Preconditioners speed up the convergence to the solution. These are approximate matrix inverses whose purpose is to lower the condition number of the matrix in the linear system. With simple preconditioners, these methods typically converge in a few hundred iterations for CMB data sets. With a more carefully designed preconditioner, the number of iterations can be greatly reduced (Næss & Louis, 2014; Szydlarski et al., 2014), but the preconditioner must be tuned carefully to fit the characteristics of the CMB survey (using e.g. a typical scanning pattern). Such computations can be much more costly per iteration.
In this paper we introduce a new algorithm for mapmaking, based on a messenger field method. Messenger fields were introduced by Elsner & Wandelt (2013) to circumvent the inversion of a dense covariance matrix in the computation of the linear Wiener filter. The messenger method is iterative, and requires no preconditioner. Along with the map, it solves for an additional field (the “messenger”), but only requires inversions of sparse matrices. Beyond solving the linear system for the Wiener filter, the messenger field can be used to generate samples of the signal and signal covariance via Gibbs sampling. Several subsequent works have applied the messenger method to problems in cosmology, including CMB, gravitational lensing, and large scale structure (Anderes et al., 2015; Jasche & Lavaux, 2015; Lavaux & Jasche, 2016; Alsing et al., 2016). As a relatively new approach, it is still being refined to improve the convergence (Kodi Ramanah et al., 2017) and to treat wider varieties of covariance matrix structures (Huffenberger, 2017).
The approach to the mapmaking problem that we present here uses messenger fields, but differs from the above works in two important respects. First, we solve a different linear system in the mapmaking problem than in the Wiener filter. Mapmaking makes no assumption about the signal’s covariance. Second, in all the above works, the data and the signal vectors have the same dimensionality, that is, the same number of pixels. This is not true in mapmaking, since the time-ordered data for all detectors contains vastly more samples than the number of pixels in the temperature and polarization components of the resulting maps. The time samples are tied to positions on the sky map by the pointing of the telescope’s detectors. Many time samples refer to the same map pixel, and the cross-linking of the scans is important to optimize the measurement of the sky.
The tests we report here indicate that our new messenger mapmaking algorithm can converge much faster than the preconditioned conjugate gradient descent method, at a similar computational cost per iteration. Furthermore, the method is feasible for large data sets, an important feature for modern CMB experiments that utilize many thousands of detectors and multi-year campaigns.
This paper is organized as follows. In Section 2 we introduce the messenger mapmaking method. In Section 3, we present the results of the method applied to simple simulations that allow a careful examination of the convergence properties. In Section 4, we discuss scaling up the method to real data sets. In Section 5, we conclude with a brief description of a preliminary application to ACTPol data. Three appendices provide derivations of the method for single pencil beam and for composite (e.g. differential) beam experiments, and also present an extension that can treat complications in the noise properties, but at increased computational cost.
2. Methods
The standard model for a vector d of time-ordered data is
(1)  d = Ps + n,
where P is the pointing matrix that scans the sky (vector s), and n is zero-mean noise in the timestream. The pointing matrix contains the weight that a particular time sample provides to a map pixel (in temperature and polarization). The minimum-variance, unbiased estimate for the sky is given by the mapmaking equation,
(2)  m = (P^T N^-1 P)^-1 P^T N^-1 d,
where N = ⟨nn^T⟩ is the covariance matrix of the noise. This solution minimizes
(3)  χ² = (d − Pm)^T N^-1 (d − Pm),
and therefore maximizes the Gaussian probability P(d|m) ∝ exp(−χ²/2). It is also unbiased, ⟨m⟩ = s. The noise covariance for data timestreams is approximately diagonal in the frequency domain, which makes the multiplication of the data by the inverse noise matrix tractable. However, the matrix inverse (P^T N^-1 P)^-1 is not directly computable for such correlated noise and large numbers of pixels, and the problem requires a linear system solver. Conjugate gradient descent methods, for example, move P^T N^-1 P to the left hand side of equation (2) and prescribe iterative updates to m to find the solution.
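As a concrete check of equations (1) and (2), the following minimal sketch builds a tiny pencil-beam problem and solves the mapmaking equation by brute force. All sizes, the noise level, and the random scan are hypothetical; with white (diagonal) noise, the minimum-variance map reduces to the simple binned coadd.

```python
import numpy as np

rng = np.random.default_rng(0)
npix, nsamp = 8, 256

# Pencil-beam pointing matrix P: each time sample hits exactly one pixel.
hits = rng.integers(0, npix, size=nsamp)
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), hits] = 1.0

sky = rng.normal(size=npix)                    # hypothetical input map s
sigma = 0.5                                    # white noise level
d = P @ sky + rng.normal(scale=sigma, size=nsamp)   # data d = P s + n

# Minimum-variance map, equation (2): m = (P^T N^-1 P)^-1 P^T N^-1 d.
Ninv = np.eye(nsamp) / sigma**2
m = np.linalg.solve(P.T @ Ninv @ P, P.T @ Ninv @ d)

# For white noise this equals the coadded (binned) map.
coadd = np.bincount(hits, weights=d, minlength=npix) / np.bincount(hits, minlength=npix)
print(np.allclose(m, coadd))   # True
```

For correlated noise the matrix P^T N^-1 P is no longer diagonal, and this dense solve becomes intractable at realistic pixel counts, which is what motivates the iterative solvers discussed in the text.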
2.1. Messenger field mapmaking
In an appendix, we show that iterating the following equations provides an alternative solution to the mapmaking equation:
(4)  t ← (T_λ^-1 + N̄^-1)^-1 (T_λ^-1 Ps + N̄^-1 d),
     s ← (P^T T^-1 P)^-1 P^T T^-1 t,
for N̄ = N − T and T_λ = λT. This approach is inspired by the messenger field method of Elsner & Wandelt (2013). The messenger field t is a weighted average of the data and the map (projected into the timestream by the pointing matrix).
We are free to choose any T that leaves N̄ positive definite. In the simplest implementations, we let T = τI. In this specific case, T drops out of the second equation. Since the identity matrix is diagonal in any basis, this choice makes all the matrix inversions trivial so long as time samples are each associated in P with one sky position only (the single, pencil beam approximation). For λ = 1, the messenger field t is our statistical best estimate for a portion of the timestream that contains the signal and uniform, uncorrelated noise only, while the full timestream contains correlated noise as well. For that case, the estimated map is a simple coaddition of t. The value τ is the variance of the uniform noise in t, and so cannot be larger than the global minimum of noise in the data. We will discuss λ, the “cooling parameter,” shortly.
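A minimal sketch of this iteration for T = τI at λ = 1, on a hypothetical one-dimensional periodic scan with 1/f-type noise that is diagonal in the Fourier domain. The sizes, spectrum, and iteration count are illustrative, not taken from the simulations described later.

```python
import numpy as np

rng = np.random.default_rng(1)
npix, nsamp = 16, 512
hits = np.tile(np.arange(npix), nsamp // npix)   # regular raster-like scan
sky = rng.normal(size=npix)

# Correlated (1/f-type) noise, diagonal in the Fourier domain.
f = np.fft.rfftfreq(nsamp)
Nf = 1.0 + 0.02 / np.maximum(f, f[1])            # illustrative noise power
noise = np.fft.irfft(np.fft.rfft(rng.normal(size=nsamp)) * np.sqrt(Nf), n=nsamp)
d = sky[hits] + noise

tau = Nf.min()          # uniform level: tau <= global minimum of the noise
Nbar = Nf - tau         # remaining correlated part, N = Nbar + tau*I
lam = 1.0               # cooling parameter; lam = 1 targets the optimal map

s = np.zeros(npix)
for _ in range(60):
    # Messenger update, equation (4), written per Fourier mode:
    # t = a*d + (1-a)*(P s), with a = lam*tau / (lam*tau + Nbar).
    a = lam * tau / (lam * tau + Nbar)
    t = np.fft.irfft(a * np.fft.rfft(d) + (1 - a) * np.fft.rfft(s[hits]), n=nsamp)
    # Map update: with T = tau*I, s = (P^T P)^-1 P^T t is a coadd of t.
    s = np.bincount(hits, weights=t, minlength=npix) / np.bincount(hits, minlength=npix)

rms = float(np.sqrt(np.mean((s - sky) ** 2)))    # map residual vs. input sky
print(rms)
```

Every matrix applied here is diagonal in either the map, time, or frequency domain, which is the point of the method: no dense inversion ever appears.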
Although the messenger field has this nice interpretation, it is large for real data sets, the same size as the time-ordered data. It is preferable to precompute smaller data objects and discard the time-ordered data and similar objects from memory. For the second equation we do not actually need t, just P^T T^-1 t, and so we can combine the equations, rewriting the iteration using map-sized objects:
(5)  s ← F_λ s + d̄_λ.
Here d̄_λ is a binned map of filtered data,
(6)  d̄_λ = (P^T T^-1 P)^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 N̄^-1 d,
and F_λ is a matrix filter applied to the current iteration,
(7)  F_λ = (P^T T^-1 P)^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 T_λ^-1 P = (P^T T^-1 P)^-1 P^T T^-1 (I + T_λ N̄^-1)^-1 P.
The Woodbury formula leads to the last equation. Together these yield another compact form for the mapmaking algorithm, which is
(8)  s ← s + (P^T T^-1 P)^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 N̄^-1 (d − Ps),
although this form requires the full time-ordered data.
For practical mapmaking, the main computational costs per iteration are the projection of the map into a timestream, the Fourier transform to the frequency domain for filtering by sparse matrices, the inverse Fourier transform back to the time domain, and the projection back to the map domain. These costs per iteration are comparable to conjugate gradient descent with a diagonal preconditioner, but in our examples we find the messenger field method can converge to a higher-quality solution in many fewer iterations.
For general values of the cooling parameter λ, the iterations instead minimize
(9)  χ²_λ = (d − Ps)^T N_λ^-1 (d − Ps),
where N_λ = N̄ + T_λ. This minimization leads to the λ-dependent, unbiased map solution
(10)  s_λ = (P^T N_λ^-1 P)^-1 P^T N_λ^-1 d,
which is optimal only for λ = 1.
For very large λ, F_λ → 0 and the next iteration gets no contribution from the current one. Then the map is an unbiased weighted map of the data: s = (P^T T^-1 P)^-1 P^T T^-1 d. For T = τI, this is a coaddition of d. Regardless of λ, if an iteration is unbiased (⟨s⟩ equals the true sky), the messenger field has mean ⟨t⟩ = Ps, and the next iteration will be unbiased as well.
The residual in a map s^(i) compared to the solution s_λ at a particular λ is
(11)  s^(i+1) − s_λ = F_λ (s^(i) − s_λ).
Thus the convergence of the remaining residual to zero depends on the eigenvalues of matrix F_λ. If an eigenvalue is much less than unity, the corresponding eigenmode will converge fast. If it is close to unity, the mode will converge slowly. Looking at the structure of F_λ, we may expect that this transition point will occur (for frequency domain noise) around N̄(f) ≈ λτ, so that modes less noisy than that will rapidly converge to their proper values for s_λ.
For modes that correspond to frequencies that are much more noisy than λτ, the solution s_λ and our desired s_{λ=1} are the same. Following Elsner & Wandelt (2013), we thus establish a “cooling schedule” for λ that takes it from large values to λ = 1 as we iterate. We must lower the value of λ so that less-and-less noisy modes converge to their final values, but not too quickly, or noisy modes will get stuck with eigenvalues close to unity before converging. A proper cooling schedule will permit noisy modes to converge quickly at large λ, and then let less noisy modes converge to their proper values as λ → 1, speeding the convergence to the ultimate solution.
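To make the transition concrete: with T = τI and frequency-diagonal noise, the per-mode eigenvalue of F_λ is N̄(f)/(N̄(f) + λτ), so the fast-convergence boundary sits near N̄(f) ≈ λτ. The sketch below tabulates this factor for a hypothetical 1/f² spectrum at a few λ values; the numbers are purely illustrative.

```python
import numpy as np

f = np.linspace(1e-3, 0.5, 1000)     # frequencies (arbitrary units)
Nbar = (0.05 / f) ** 2               # hypothetical 1/f^2 correlated noise power
tau = 1.0

fast = []
for lam in (100.0, 10.0, 1.0):       # a cooling sequence, high lambda first
    eig = Nbar / (Nbar + lam * tau)  # convergence factor of F_lambda per mode
    fast.append(np.mean(eig < 0.5))  # fraction of modes with Nbar < lam*tau

# As lambda cools, fewer modes converge fast, but the newly slow modes are
# exactly the noisy ones whose s_lambda target barely changes.
print(fast)
```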
2.2. Composite pointing matrix
The above discussion of messenger mapmaking supposes that the matrix P^T T^-1 P is simple to invert for suitable T. This is true in the single pencil beam approximation, which has only one pixel entry (or polarized pixel block) in the pointing matrix per time sample. However, it is not true for differential experiments like COBE-DMR or WMAP, which sample and difference widely-separated directions on the sky. In the differential case the pointing matrix has a +1 and a −1 pixel entry per time sample in the pencil beam approximation, corresponding to the pointing of the two sides of the differencing assembly (Wright et al., 1996; Hinshaw et al., 2003). The pointing matrix has even more entries when attempting deconvolution mapmaking, where a portion of the beam is placed in the pointing matrix and deconvolved in a regularized way (Armitage & Wandelt, 2004; Armitage-Caplan & Wandelt, 2009; Cantalupo et al., 2010).
The case with two components has the form
(12)  P = P_A + P_B,
but we could generalize this to an arbitrary number of components. In the differential case the P_A component could have +1 entries and the P_B component could have −1 entries. We also split the messenger covariance as T = T_A + T_B. This form allows us to make progress because matrices like P_A^T T_A^-1 P_A, and sums of such matrices, can be made simple to invert.
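For intuition, the sketch below builds a toy differential pointing matrix of the form of equation (12), with +1 entries in P_A and −1 entries in P_B (the pixel count and beam offset are hypothetical), and verifies two properties discussed later: a constant sky is annihilated, so the map mean is unconstrained, and for this connected scan it is the only unconstrained mode.

```python
import numpy as np

npix, nsamp = 6, 60
# Each time sample differences two pixels: d_i = s[pixA[i]] - s[pixB[i]].
pixA = np.arange(nsamp) % npix
pixB = (pixA + 1) % npix                 # hypothetical fixed beam separation
P_A = np.zeros((nsamp, npix))
P_A[np.arange(nsamp), pixA] = 1.0        # +1 entries
P_B = np.zeros((nsamp, npix))
P_B[np.arange(nsamp), pixB] = -1.0       # -1 entries
P = P_A + P_B                            # composite pointing, equation (12)

# A constant sky is invisible to a differencing experiment...
print(np.allclose(P @ np.ones(npix), 0.0))            # True
# ...and here the mean is the only unconstrained mode of P^T P.
print(npix - np.linalg.matrix_rank(P.T @ P))          # 1
```

Whether more modes than the mean are lost depends on the cross-linking of the differencing pattern, which is exactly the question probed by the eigenmode analysis in Section 3.2.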
In an appendix, we show that the messenger mapmaking solution in this case still has the form s ← F_λ s + d̄_λ, with slightly modified terms (compare equations 6 and 7):
(13)  d̄_λ = C^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 N̄^-1 d,
      F_λ = I − C^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 N̄^-1 P.
The matrix C = Σ_i P_i^T T_i^-1 P_i is a weighted sum of weight maps, and as before can be made (block) diagonal for simple T_i matrices. Another compact form for the mapmaking solution is:
(14)  s ← s + C^-1 P^T T^-1 (T_λ^-1 + N̄^-1)^-1 N̄^-1 (d − Ps),
which recalls equation (8).
There are some important differences to the single beam case. Here F_λ → I − C^-1 P^T T^-1 P ≠ 0 for large λ, so even in that case, the initial condition matters. Unlike before, large λ does not automatically ensure an unbiased map. In some cases, for example when the T_i are proportional to the identity and the component beams never strike the same pixel simultaneously, then C is proportional to the diagonal part of P^T T^-1 P. Then, in the limit of large λ, equation (14) expresses a weighted Jacobi method that will converge toward the unbiased binned map solution (P^T P)^-1 P^T d. At other λ the map will converge toward an unbiased but non-optimal s_λ (see equation 10). If the method achieves an unbiased map at any λ, or is initialized with an unbiased map, subsequent iterations will stay unbiased.
2.3. Simulations
To validate and study the messenger mapmaking method, we prepared mock time-ordered data based on cross-linked raster scans of model instruments over a simulated sky. The scanned area is square and two degrees across. The statistics of the sky signal are not important for the mapmaking solution, which depends only on the noise covariance and the pointing. For the signal in the test case, we used a collection of compact and extended axisymmetric polarized and unpolarized sources. This lets us observe how different scales converge. The source amplitudes range from a few hundred microkelvin to a few millikelvin and their signals also vary within pixels. All the mapmaking algorithms we tested here were subject to the same subpixel signal bias, so it does not figure into the relative comparisons.
The single beam instrument model uses an orthogonal pair of raster scanning patterns, and has nine polarization sensitive detectors, at various orientations, that are offset from the telescope boresight by up to 1.8 arcmin. The noise for each detector is uncorrelated and has a power spectrum with power-law and white noise components,
(15)  P(f) = σ² [1 + (f_knee/f)^α],
where σ² is the white noise level (in μK²) and f_knee is the knee frequency (in Hz). Each raster scan covers the bulk of the scanning area. The scans are unphysical: the boresight moves in one direction only without turnarounds, and jumps back to the start with a frequency of 0.46 Hz (every 2.17 seconds), while stepping ahead monotonically in the cross-scan direction at the end of each scan.
We made a second set of simulations to test the differential mapmaking case. In this test we simplified the differential model instrument to have a single temperature-only detector (with two beams). We offset the A-side and the B-side beams from the boresight in opposite directions, each by 9 arcminutes. We scanned over the same model sky as before, fixing the angle of the boresight so that the separation between the beams is perpendicular to the scanning direction. As a simplifying assumption, we discarded subpixel structure. At the mapmaking resolution we used for the single-beam model, we find that the differential scenario is not well cross-linked for the same pair of orthogonal scans. Therefore we rotated the raster pattern to nine different orientations, shortening the angular length of each scan so that they still fit within the two degree square of the model. We adjusted the noise level of the detectors so that we spend the same total effort for both the differential and single-beam cases, although the single-beam scan pattern distributes it more evenly.
3. Results
3.1. Single beam
In Figure 1, we compare the results of the messenger mapmaking procedure to a conjugate gradient mapmaker on the same time-ordered data. The two algorithms converge to nearly the same endpoint, but in strikingly different ways. The messenger method starts after one iteration with an unbiased binned map full of scanning stripes. As the iterations proceed, the maps remain unbiased and become more optimal. The stripes fade away, in this case after a few tens of iterations, and the map in the cross-linked region resembles the input map (Figure 2). This map used a cooling schedule that we discuss below.
The conjugate gradient maps converge much more slowly on large scales. Initialized to zero, these maps start highly filtered, showing only the minima and maxima positions of the final map. As the iterations proceed, the maps gradually become less biased. A pulse of signal propagates from each pixel (most obviously from the bright extrema), filling out the rest of the map. Even at 500 iterations, however, the map has large scale distortions compared to the messenger map and the input map. These tend to emanate from the regions without cross-linking. There are also other narrow glitches at the map edges that are not present in the messenger map at 40 iterations.
The mapmaking procedure is supposed to minimize χ² (equation 3). In Figure 3, we show that χ² decreases monotonically for iterations of both the messenger and conjugate gradient mapmakers. We evaluate χ² in the Fourier domain as a sum of contributions from every Fourier mode for all detector timestreams for all scans. The conjugate gradient maps have a χ² that drops rapidly, but reaches a plateau.
The convergence for the messenger method depends strongly on the cooling schedule. With no cooling schedule, at constant λ = 1, the χ² falls quickly, but the noisy modes are slow to converge, and it would take hundreds or thousands of iterations to achieve a quality solution.
We implement cooling schedules where λ is set very high for one iteration, which starts us with an unbiased binned map. Therefore subsequent iterations are also unbiased. Then λ is set so that the levels of λτ are logarithmically distributed between the maximum and minimum noise power, and we repeat each level for a fixed number of iterations. This naturally concludes with λ = 1 for several iterations to finish the solution. Thus the schedule with eight levels of cooling, for five iterations each, we denote as 8×5. At a particular value of λ, the messenger maps will reach a plateau in χ² after a few iterations as the low noise modes rapidly converge. If there are noisy modes left to converge, they will proceed slower, and χ² falls more and more slowly. When we lower λ, the χ² rapidly falls toward the new equilibrium. Cooling schemes can lead to better overall χ² values than the conjugate gradient solution, in fewer iterations, and lead to better maps, particularly on the large scales that converge at high λ.
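Such a schedule is easy to tabulate in advance. The sketch below builds log-spaced λ levels between the maximum and minimum noise power in the style of the 8×5 scheme; the noise spectrum here is hypothetical.

```python
import numpy as np

def cooling_schedule(Nf, tau, nlevels=8, iters_per_level=5):
    """Return one lambda per iteration: levels log-spaced so that lam*tau
    runs from the maximum noise power down to tau itself (lam = 1)."""
    lam_levels = np.geomspace(Nf.max() / tau, 1.0, nlevels)
    return np.repeat(lam_levels, iters_per_level)

# Hypothetical 1/f^2 + white noise power spectrum.
f = np.linspace(1e-3, 0.5, 256)
Nf = 1.0 + (0.05 / f) ** 2
tau = Nf.min()                        # tau bounded by the minimum noise power

schedule = cooling_schedule(Nf, tau)  # the "8x5" style scheme: 40 iterations
```

In practice one extra iteration at a very large λ precedes the schedule to seed the unbiased binned map, and extra iterations at λ = 1 can be appended to polish the solution.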
For this simulation, we argue that our longest cooling scheme is well-converged, based on comparisons among several cooling schemes and the fact that the plateaus in χ² are very flat, indicating that noisy modes were well converged at previous levels. In the quicker 8×5 cooling scheme, the plateaus are not flat, so more convergence could be achieved at each level. Even so, the 8×5 cooling scheme after 40 or so iterations achieves a lower final χ² than the conjugate gradient algorithm after 500 iterations.
Although χ² is a good diagnostic, it does not tell us what parts of the solution are converging. It is instructive to examine where the contributions to χ² are coming from. In Figures 4 and 5, we examine χ² per Fourier mode for a single detector during a single scan. In the ideal limit, where the scan is completely in the cross-linked area and there is so much data that the map uncertainty is negligible, the χ² per mode for a segment of time-ordered data should be distributed as a χ² with two degrees of freedom (for the real and imaginary portions of the Fourier coefficient) but rescaled to have unit mean. We are not in that limit here, and the distribution of χ² values per mode is a complicated interplay of the noise and the cross-linking of the scanning pattern. It varies as a function of frequency, and has a substantial contribution from subpixel bias.
There is also no simple relationship between the frequencies in the time-ordered data and the eigenmodes of F_λ that converge in the map, except for a rough correspondence that high frequencies in the timestream tend to produce small scale features in the map, and low frequencies tend to produce large scale features (but not exclusively).
In Figure 4, for messenger iterations with constant λ = 1 (no cooling), we can see that the high frequencies find their final values first. Lower and lower frequencies follow, but progress slows down as we are digging into noisier and noisier modes. These plots are binned in frequency to average over nearby values, giving a rough idea of the mean of the distribution of χ² as a function of frequency. They are compared to the well-converged cooling scheme.
In another panel we show iterations at a different fixed value for λ, chosen so that λτ matches the noise power at an intermediate frequency, and the map converges toward the s_λ solution for that λ. For that solution, the χ² per mode approaches its final value for very low frequencies. At intermediate frequencies the χ² per mode is lower than in the fully converged solution, while it is higher at high frequencies. Essentially, this solution overfits the intermediate-frequency component of the noise and allows the high frequency part of the map to be too noisy compared to the ultimate (λ = 1) solution.
We also compare the χ² per mode for a cooling schedule and a conjugate gradient mapmaker. In the cooling schedule solution, the early, higher-λ iterations also have low χ² at low frequency, but as λ approaches unity the iterations force the χ² of the low frequency modes up to their final values. This balances the penalty in χ² between low and high frequencies. Compared to the others, the conjugate gradient solver converges very quickly at high frequencies, but even after 500 iterations, it does not converge at frequencies around 1 Hz.
In Figure 5 we compare the final χ² per mode for the messenger and conjugate gradient algorithms, extending to even lower frequencies. We can see more clearly that the conjugate gradient solution’s average χ² per mode is above those for the pair of messenger solutions, and that even the relatively quick cooling schedule provides a nearly converged map by comparison. We note that the worst frequencies for the conjugate gradient method are near to the telescope scanning frequency (0.46 Hz), and also at the very lowest frequencies in the data.
We repeated this exercise with another simulation where all nine detector pointings coincided with the telescope boresight, and sampled pixel centers exactly, eliminating the effect of subpixel bias. The messenger maps here are indeed unbiased, with no apparent residuals from the sky signal. Without subpixel bias, the overall χ² values are substantially lower. The comparisons to the conjugate gradient maps lead to the same conclusions as before: conjugate gradient maps show large scale features that are not present in the messenger maps with cooling. Like before, the longest cooling scheme has the lowest χ². Unlike before, the quick 8×5 cooling scheme has a slightly higher χ² than the conjugate gradient maps (at 500 iterations), but if the messenger algorithm is allowed to continue at λ = 1, it achieves a lower χ² after 46 total iterations.
3.2. Differential beams
To verify our solution for a differential experiment, we implemented the composite-pointing iteration using messenger covariance components proportional to the identity, and applied it to our differential beam simulation. We solve for the map and examine the residuals compared to the input model. (Recall that we discarded subpixel structure for this case.) For the solution in Fig. 6, we take one step at very high λ, then proceed with a cooling schedule.
In these examples, the effect of the differential beams is visible as a bias in the first iteration of the map. Bright objects are surrounded by a ring of shadow images (of opposite sign). These show the positions of the other beam when one beam has scanned over the bright object. This example has nine scans of two beams, so there are eighteen shadows from each bright object. All structures in the map are surrounded by similar features, which for extended objects blend together into doughnut-like shapes.
There are two distinct phases in the iterative solution. First, in the early iterations, the shadow features fade away, and this debiasing dominates the reduction of χ². In this example, the shadows have mostly disappeared on small scales after a dozen iterations, and the map becomes nearly unbiased. Second, the subsequent iterations mostly serve to make the reconstructed map more optimal. After 160 iterations, our solution in Fig. 6 has eliminated much of the noise. The reconstruction errors that remain on medium to large scales appear to be noise, and change when varying the noise realization.
A test that narrows the separation between the differential beam pair from 18 arcminutes to 4.8 arcminutes provides a solution that is unbiased on small scales, but has a large scale bias that persists after a cooling schedule, and does not appear to depend on the particular noise realization. Running for more iterations at high λ before starting the cooling schedule helps to reduce that bias, and so we conclude that the bias at least in part comes from a schedule that cools prematurely, before the map has become unbiased on large scales, and erroneously freezes in the beam shadows of the large scale features.
Another possibility we need to consider is missing modes. Depending on the cross-linking, differential experiments may struggle to represent certain modes on the sky. The mean of the map is always unconstrained, for example, and other modes could be unconstrained also, particularly on scales larger than the beam separation. Equivalently, this means the matrix P^T N^-1 P has less than full rank and so is not invertible. To probe particular modes of the differential solution more carefully, we wanted to examine a brute force solution, but this is only practical at very low resolution. We downsampled the model from the original map resolution to a much coarser pixelization. We then generated time-ordered data from that map (with no subpixel structure) using the same scan pattern and noise realization. Fig. 6 shows the messenger solution at low resolution with the same cooling scheme. The messenger solution displays reconstruction errors that are similar to the high resolution solution.
At this lower resolution, we computed the brute force solution by tabulating the entries of P^T N^-1 P explicitly. From that matrix we computed eigenvectors, eigenvalues, and the Penrose pseudoinverse via singular value decomposition. We find that 1794 eigenmodes have eigenvalues much larger than the others. Large eigenvalues indicate that modes are well constrained. Since 1795 pixels have one or more observations, it appears that only one signal mode, the mean, is unconstrained in the map. For the case with the narrower beam separation, at low resolution again only the mean appears to be unconstrained, which supports the hypothesis that the bias we saw there was due to insufficient iterations at high λ to debias the map before cooling.
We keep only the singular values of the well-constrained modes in our computation of the pseudoinverse. The brute force solution in Fig. 6 shows smaller reconstruction errors than the messenger solution. In Fig. 6 we also show the two eigenmodes of P^T N^-1 P with the largest eigenvalues and the two modes with the smallest eigenvalues (of the well-constrained modes). The most constrained modes characterize the central region with the most cross-linking. The least constrained modes are large scale gradients. When we compute the solution with the pseudoinverse, we see that the remaining reconstruction error does indeed have a large scale gradient.
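The eigenanalysis can be sketched as follows: build P^T N^-1 P for a toy differential scan, form the Penrose pseudoinverse from its well-constrained singular values, and check that the reconstruction matches the input up to the unconstrained mean. The white noise, pixel count, and scan here are illustrative, far smaller than the low-resolution test in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
npix, nsamp = 6, 60
pixA = np.arange(nsamp) % npix
pixB = (pixA + 1) % npix                 # toy differential beam pair
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), pixA] += 1.0
P[np.arange(nsamp), pixB] -= 1.0
Ninv = np.eye(nsamp)                     # white noise for simplicity

M = P.T @ Ninv @ P                       # the matrix we tabulate explicitly
U, w, Vt = np.linalg.svd(M)
well = w > 1e-8 * w.max()                # keep well-constrained modes only
Minv = (Vt.T[:, well] / w[well]) @ U.T[well, :]   # Penrose pseudoinverse

sky = rng.normal(size=npix)
m = Minv @ (P.T @ Ninv @ (P @ sky))      # noiseless data for the check
# The recovered map agrees with the input up to the unconstrained mean.
print(np.allclose(m - m.mean(), sky - sky.mean()))   # True
```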
In the low resolution case, we also see that the messenger method with a cooling schedule in Fig. 6 has not achieved reconstruction errors as small as the brute force solution. On the other hand, at 160 iterations it has a χ² that is only a small fraction above the brute force solution (comparable to the conjugate gradient mapmaker’s quality in the single beam case), so the large scale feature is not very significant. Letting it continue to iterate at λ = 1 until 320 iterations lowers the χ² steadily to a smaller fraction above the brute force value. We tested several fixed and adaptive cooling schedules. The quickest we achieved a χ² within a small fraction of the brute force value is about 110 iterations, although that solution has larger large scale noise residuals than what we see in Fig. 6. A still smaller excess above the brute force value we can achieve after 670 iterations or so. The slowest and most stringent case we tried reached the smallest excess above the brute force value after about 2700 iterations.
For many applications, a small number of messenger iterations may produce a map of sufficient quality. It is also possible that a more clever approach to the cooling schedule could improve these results.
4. Discussion of large surveys
We discuss two points that are important in applying this method to the large detector arrays in current and future CMB experiments. First, a practical algorithm should avoid storing the time-ordered data in memory, and second, it must treat the diversity of detector noise properties inherent in a multi-season ground-based CMB observing campaign.
4.1. Projecting the data to map-sized objects
To minimize its memory footprint, a mapmaking code should ideally avoid storing the full time-ordered data from iteration to iteration, and the computer should not read the data in more than once to avoid costly input/output operations.
In equation (5) we showed that the messenger algorithm depends on the data only through the map-sized object d̄_λ, which is far smaller than the full data. If we establish a cooling schedule with fixed λ levels in advance, we can load the time-ordered data once from disk (in chunks), precompute the set of d̄_λ maps, and then flush the data from memory. (In parallelization schemes where segments of the time-ordered data are distributed among computational nodes, each node needs only the parts of the maps that the data touch.) This limitation to fixed levels constrains the on-the-fly adaptability of our cooling schemes, but not the number of iterations per level. We have shown already that simple cooling schemes can work well.
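A sketch of this one-pass precomputation, using the T = τI, frequency-diagonal form of the filter in equation (6); the sizes, spectrum, and schedule levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
npix, nsamp = 16, 512
hits = np.tile(np.arange(npix), nsamp // npix)
d = rng.normal(size=nsamp)              # stand-in time-ordered data

f = np.fft.rfftfreq(nsamp)
Nbar = 0.02 / np.maximum(f, f[1])       # correlated noise power (illustrative)
tau = 1.0
lam_levels = np.geomspace(Nbar.max() / tau, 1.0, 4)   # fixed schedule levels

# One pass over the data: precompute the map-sized dbar_lambda for every
# level, then the full time-ordered data can be flushed from memory.
d_f = np.fft.rfft(d)
weights = np.bincount(hits, minlength=npix)
dbar = {}
for lam in lam_levels:
    a = lam * tau / (lam * tau + Nbar)  # per-mode filter from equation (6)
    filt = np.fft.irfft(a * d_f, n=nsamp)
    dbar[lam] = np.bincount(hits, weights=filt, minlength=npix) / weights
del d, d_f                              # the TOD is no longer needed
```

Each subsequent iteration at level λ then uses only the cached map dbar[λ] and the current map, never the timestream itself.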
For real data sets, we do not want to compute χ² per iteration because it too requires the full data. However, in our single beam test case we have shown that the behavior of χ² per mode for individual detectors can be indicative of the convergence overall. The χ² per mode for representative detector timestreams thus can be a useful online diagnostic.
4.2. Heterogeneous data
Over the course of a multi-season campaign, differing observing conditions and loading will alter the minimum white noise level for detectors. If T = τI, then τ must represent the global minimum noise for all timestreams going into the map. This same argument applies to mixing timestreams from observatories with heterogeneous experimental designs, as in the current planning for the Simons Observatory and CMB-S4. This single value will be too low for most detectors, and cause the convergence for them to go slower than it could have. A better choice for T is piecewise proportional to the identity matrix, and uses appropriate values τ_i for each detector and each uncorrelated time-segment i of the data:
T = diag(τ_1 I_1, τ_2 I_2, …),
where I_i is the identity on the samples of segment i.
In general, good choices for can be best understood in terms of a hierarchical forward model for the timeordered data:

Map is the sky signal.

Messenger field is plus noise drawn from .

Data is plus noise drawn from , so that the total amount of noise in is .
The more noise that we can put into , the better the method will work (while keeping positive definite). In the limit that all the noise is in , then and the method “converges” in one single step. In practice we cannot do that because we cannot invert when for realistic cases, as it reverts to the standard mapmaking problem.
So our goal for $T$ is that it should contain as much of the noise detail as possible, while still allowing for simple matrix inversions of $T$, $\bar N$, $(T^{-1} + \bar N^{-1})$, and $P^T T^{-1} P$. Uniform noise per detector timestream in $T$ fulfills this goal.
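The hierarchy above can be checked numerically; in this sketch (our own, with scalar noise levels and a blank sky for simplicity) the variance of the simulated data matches the total noise $N = T + \bar N$:

```python
import numpy as np

rng = np.random.default_rng(1)
nsamp = 200_000
tau, nbar = 1.0, 3.0                            # T = tau*1 and Nbar = N - T = nbar*1
Pm = np.zeros(nsamp)                            # a blank sky, for simplicity
t = Pm + rng.normal(0.0, np.sqrt(tau), nsamp)   # messenger field: Pm plus T-noise
d = t + rng.normal(0.0, np.sqrt(nbar), nsamp)   # data: t plus Nbar-noise
print(d.var())                                  # close to the total noise tau + nbar
```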
5. Conclusions
We have presented a mapmaking algorithm based on messenger fields. The procedure is faster and, by several indications, produces higher quality maps than preconditioned conjugate gradient descent mapmakers, at least in our example. It requires no preconditioner, but it does require a cooling schedule for rapid convergence. We showed in our test cases that simple cooling schedules suffice. A straightforward modification to the method allows treatment of differential data and deconvolution mapmaking.
We discussed ways in which the algorithm can scale up to handle real data sets. We have successfully applied the messenger mapmaking method to polarization data from the Atacama Cosmology Telescope’s ACTPol receiver (Niemack et al., 2010), adapting some of the existing mapmaking infrastructure to map one of the sky regions from Næss et al. (2014). We will discuss it more fully in future work, but our early efforts indicate that the map quality can be similar to conjugate gradient descent methods in a fraction of the iterations. We have not worked hard yet to optimize the cooling schedule for the ACTPol data. Like our simulation example here, the preliminary map of the real data appears more fully converged on large scales than its conjugate gradient counterpart. If these preliminary indications bear out, messenger field mapmaking may be a promising and powerful new addition to our CMB analysis toolbox.
Acknowledgments
We thank Thibaut Louis, Arthur Kosowsky, Lyman Page, and Suzanne Staggs for useful comments and encouragement. KMH acknowledges support from NASA grant NNX17AF87G. The Flatiron Institute is supported by the Simons Foundation.
References
 Alsing et al. (2016) Alsing, J., Heavens, A., Jaffe, A. H., Kiessling, A., Wandelt, B., & Hoffmann, T. 2016, MNRAS, 455, 4452, 1505.07840
 Anderes et al. (2015) Anderes, E., Wandelt, B. D., & Lavaux, G. 2015, ApJ, 808, 152, 1412.4079
 Armitage & Wandelt (2004) Armitage, C., & Wandelt, B. D. 2004, Phys. Rev. D, 70, 123007, astro-ph/0410092
 Armitage-Caplan & Wandelt (2009) Armitage-Caplan, C., & Wandelt, B. D. 2009, ApJS, 181, 533, 0807.4179
 Cantalupo et al. (2010) Cantalupo, C. M., Borrill, J. D., Jaffe, A. H., Kisner, T. S., & Stompor, R. 2010, ApJS, 187, 212, 0906.1775
 de Gasperis et al. (2005) de Gasperis, G., Balbi, A., Cabella, P., Natoli, P., & Vittorio, N. 2005, A&A, 436, 1159, astro-ph/0502142
 Doré et al. (2001) Doré, O., Teyssier, R., Bouchet, F. R., Vibert, D., & Prunet, S. 2001, A&A, 374, 358, astro-ph/0101112
 Dupac & Giard (2002) Dupac, X., & Giard, M. 2002, MNRAS, 330, 497, astro-ph/0110407
 Elsner & Wandelt (2013) Elsner, F., & Wandelt, B. D. 2013, A&A, 549, A111, 1210.4931
 Hinshaw et al. (2003) Hinshaw, G. et al. 2003, ApJS, 148, 63, astro-ph/0302222
 Huffenberger (2017) Huffenberger, K. M. 2017, arXiv e-prints, 1704.00865
 Janssen & Gulkis (1992) Janssen, M. A., & Gulkis, S. 1992, in NATO Advanced Science Institutes (ASI) Series C, ed. M. Signore & C. Dupraz, Vol. 359, 391–408
 Jasche & Lavaux (2015) Jasche, J., & Lavaux, G. 2015, MNRAS, 447, 1204, 1402.1763
 Kodi Ramanah et al. (2017) Kodi Ramanah, D., Lavaux, G., & Wandelt, B. D. 2017, arXiv e-prints, 1702.08852
 Lavaux & Jasche (2016) Lavaux, G., & Jasche, J. 2016, MNRAS, 455, 3169, 1509.05040
 Næss et al. (2014) Næss, S. et al. 2014, J. Cosmology Astropart. Phys, 10, 007, 1405.5524
 Næss & Louis (2014) Næss, S. K., & Louis, T. 2014, J. Cosmology Astropart. Phys, 8, 045, 1309.7473
 Natoli et al. (2001) Natoli, P., de Gasperis, G., Gheller, C., & Vittorio, N. 2001, A&A, 372, 346, astro-ph/0101252
 Niemack et al. (2010) Niemack, M. D. et al. 2010, in Proc. SPIE, Vol. 7741, Millimeter, Submillimeter, and Far-Infrared Detectors and Instrumentation for Astronomy V, 77411S, 1006.5049
 Stompor et al. (2002) Stompor, R. et al. 2002, Phys. Rev. D, 65, 022003, astro-ph/0106451
 Szydlarski et al. (2014) Szydlarski, M., Grigori, L., & Stompor, R. 2014, A&A, 572, A39, 1408.3048
 Tegmark (1997) Tegmark, M. 1997, ApJ, 480, L87, astro-ph/9611130
 Wright et al. (1996) Wright, E. L., Hinshaw, G., & Bennett, C. L. 1996, ApJ, 458, L53, astro-ph/9510102
Appendix A Derivation of messenger mapmaking
The mapmaking equation minimizes
(A1)  $\chi^2 = (d - P m)^T N^{-1} (d - P m)$
and maximizes the Gaussian probability $P(m|d) \propto \exp(-\chi^2/2)$. We introduce the augmented $\chi^2$ with a messenger field $t$,
(A2)  $\chi^2_{\rm aug} = (d - t)^T \bar N^{-1} (d - t) + (t - P m)^T T^{-1} (t - P m)$
(with $\bar N = N - T$) that corresponds to the Gaussian probability $P(m, t|d) \propto \exp(-\chi^2_{\rm aug}/2)$ and has the property that
(A3)  $\exp(-\chi^2/2) \propto \int dt\, \exp(-\chi^2_{\rm aug}/2)$
when $t$ is marginalized. The field $t$ is called the messenger field because it appears both with the map and the data in $\chi^2_{\rm aug}$, but the map and the data never appear together. Thus $t$ communicates between the two. The value of $m$ at the maximum is the same in both distributions $P(m|d)$ and $P(m, t|d)$. By iteratively maximizing the conditional distributions $P(t|m, d)$ and $P(m|t, d)$, we can work our way up to the peak of $P(m, t|d)$, and find the $m$ that solves the mapmaking equation.
We can write down the conditional distributions by completing the square in $t$ and $m$ for $P(t|m, d)$ and $P(m|t, d)$ respectively, and find the maximum probability points (the means) of the conditional distributions at
(A4)  $t = \left(T^{-1} + \bar N^{-1}\right)^{-1}\left(T^{-1} P m + \bar N^{-1} d\right),$
(A5)  $m = \left(P^T T^{-1} P\right)^{-1} P^T T^{-1} t.$
Evaluating (A4) at the current map $m_i$ to obtain $t_{i+1}$, and then (A5) at $t_{i+1}$ to obtain $m_{i+1}$, leads to the iterative solution. We are free to choose $T$ so long as $\bar N = N - T$ is positive definite. Smart choices make the matrix inverses in the means of the conditional distributions easy to calculate.
If $T = \tau \mathbb{1}$, then the matrix inverses are easy in the upper equation and the $\tau$ cancels from the lower equation. Alternatively, $T$ can be made diagonal so that pieces of the time-ordered data each get their own optimized $\tau$ values, in which case the map in the next iteration is a $\tau^{-1}$-weighted average of those pieces of the messenger field.
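As a toy numerical sketch of this iteration (our own illustration; dense matrix inverses stand in for the structured inverses used in practice), the fixed point matches the brute-force minimum-variance map:

```python
import numpy as np

rng = np.random.default_rng(2)
npix, nsamp = 4, 200
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), rng.integers(0, npix, nsamp)] = 1.0  # one pixel per sample

A = rng.normal(size=(nsamp, nsamp))
N = A @ A.T / nsamp + np.eye(nsamp)           # a correlated noise covariance
tau = 0.9 * np.linalg.eigvalsh(N).min()       # keep Nbar = N - tau*1 positive definite
Nbar_inv = np.linalg.inv(N - tau * np.eye(nsamp))

d = P @ rng.normal(size=npix) + np.linalg.cholesky(N) @ rng.normal(size=nsamp)

m = np.zeros(npix)
W = np.linalg.inv(np.eye(nsamp) / tau + Nbar_inv)
for _ in range(200):
    t = W @ (P @ m / tau + Nbar_inv @ d)      # equation (A4)
    m = (P.T @ t) / P.sum(axis=0)             # equation (A5); tau cancels

Ninv = np.linalg.inv(N)
m_exact = np.linalg.solve(P.T @ Ninv @ P, P.T @ Ninv @ d)
print(np.abs(m - m_exact).max())              # the iterates converge to the exact map
```

In a real pipeline $\bar N^{-1}$ is applied in the Fourier domain and $P^T$ is a binning operation, so no dense matrix is ever formed.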
Appendix B Mapmaking with a composite pointing matrix
We can treat cases with a nontrivial pointing matrix by using a modified formalism where the full pointing matrix is a sum of simpler pointing matrices, each of which includes only a single pixel entry per time sample. The case with two components has the form
(B1)  $P = P_1 + P_2,$
but this could be generalized to an arbitrary number of components. In the differential case the $P_1$ component could have entries $+1$ and the $P_2$ component could have entries $-1$. The messenger field covariance will also be split, $T = T_1 + T_2$. Matrices like $P_i^T T_i^{-1} P_i$ can be made simple to invert.
The time-ordered data has the form
(B2)  $d = P m + n = (P_1 + P_2)\, m + n.$
As before, we write down an augmented $\chi^2$, but this time with two messenger fields:
(B3)  $\chi^2_{\rm aug} = (d - t_1 - t_2)^T \bar N^{-1} (d - t_1 - t_2) + \sum_{i=1}^{2} (t_i - P_i m)^T T_i^{-1} (t_i - P_i m).$
Each messenger field $t_i$ has an associated covariance matrix $T_i$. With few restrictions, we are able to choose these matrices to make subsequent operations simpler. Again we have $\bar N = N - T$.
As before we complete the square to find the means of the conditional distributions,
(B4)  $t_1 = \left(T_1^{-1} + \bar N^{-1}\right)^{-1}\left[T_1^{-1} P_1 m + \bar N^{-1}(d - t_2)\right],$
$t_2 = \left(T_2^{-1} + \bar N^{-1}\right)^{-1}\left[T_2^{-1} P_2 m + \bar N^{-1}(d - t_1)\right],$
and find
(B5)  $m = \left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1} \sum_i P_i^T T_i^{-1} t_i.$
The iterative solution follows by evaluating these means in turn for $t_1$, $t_2$, and then $m$. These equations require inversions only of trivial matrices and are sufficient to find the mapmaking solution. The term $\sum_i P_i^T T_i^{-1} P_i$ is a weighted sum of weight maps (summed over pointing component) when we choose $T_i$ that are piecewise proportional to the identity matrix. The generalization to more pointing components is straightforward.
Despite the appearance of $t_1$ and $t_2$, we can write the solution in terms of a single messenger field $t = t_1 + t_2$. Then $t_2 = t - t_1$ and vice versa. In that case
(B6)  $t_1 = \left(T_1^{-1} + \bar N^{-1}\right)^{-1}\left[T_1^{-1} P_1 m + \bar N^{-1}\left(d - t + t_1\right)\right]$
and similar for $t_2$. Putting all the $t_1$ terms on the left hand side, and multiplying both sides by $T_1$, yields
(B7)  $t_1 = P_1 m + T_1 \bar N^{-1} (d - t)$
and similar for $t_2$. If we add the two messenger fields together we find that
(B8)  $t = P m + T \bar N^{-1} (d - t),$
which can be rearranged to give
(B9)  $t = \left(T^{-1} + \bar N^{-1}\right)^{-1}\left(T^{-1} P m + \bar N^{-1} d\right),$
the same as the case with a simple pointing matrix. Thus we conclude that there is only one real messenger field, namely $t$, and that $t_1$ and $t_2$ are simply projections of it (using equation B7).
We can plug those projections into the final term in the expression for the map in equation (B5), and work to express the right-hand side in terms of the data and map only:
(B10)  $\sum_i P_i^T T_i^{-1} t_i = \sum_i P_i^T T_i^{-1} P_i m + P^T \bar N^{-1} (d - t)$
$= \sum_i P_i^T T_i^{-1} P_i m + P^T \left[\bar N^{-1} - \bar N^{-1}\left(T^{-1} + \bar N^{-1}\right)^{-1} \bar N^{-1}\right] d - P^T \bar N^{-1}\left(T^{-1} + \bar N^{-1}\right)^{-1} T^{-1} P m$
$= \sum_i P_i^T T_i^{-1} P_i m + P^T N^{-1} d - P^T N^{-1} P m.$
We used the Woodbury formula in the first term of the last step.
Multiplying by $\left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1}$, we arrive at a compact expression for updating the map iterations:
(B11)  $m_{i+1} = m_i + \left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1} P^T N^{-1} \left(d - P m_i\right).$
This expresses the mapmaking solution in a way that requires only map-sized objects be stored in memory. The expressions
(B12)  $F = \mathbb{1} - \left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1} P^T N^{-1} P, \qquad x = \left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1} P^T N^{-1} d,$
which define the update $m_{i+1} = F m_i + x$, recall similar ones in equations (6) and (7) for the single beam case. Including the $\lambda$ parameters, we have
(B13)  $m_{i+1} = m_i + \lambda \left(\sum_i P_i^T T_i^{-1} P_i\right)^{-1} P^T \left(\lambda T + \bar N\right)^{-1} \left(d - P m_i\right).$
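A toy check of the two-messenger iteration for differential data (our own sketch, with white noise and random differencing pairs; since purely differential data leave the overall map offset unconstrained, we compare mean-subtracted maps):

```python
import numpy as np

rng = np.random.default_rng(3)
npix, nsamp = 5, 400
a = rng.integers(0, npix, nsamp)
b = (a + 1 + rng.integers(0, npix - 1, nsamp)) % npix   # second pixel, b != a
P1 = np.zeros((nsamp, npix)); P1[np.arange(nsamp), a] = 1.0
P2 = np.zeros((nsamp, npix)); P2[np.arange(nsamp), b] = -1.0
P = P1 + P2                                             # differential pointing

sigma2 = 1.0                        # white noise, N = sigma2 * identity
tau1 = tau2 = 0.45 * sigma2         # T1, T2 proportional to the identity
nbar = sigma2 - tau1 - tau2         # Nbar = N - T1 - T2, kept positive

d = P @ rng.normal(size=npix) + rng.normal(0.0, np.sqrt(sigma2), nsamp)

m = np.zeros(npix); t1 = np.zeros(nsamp); t2 = np.zeros(nsamp)
D = (P1**2).sum(axis=0) / tau1 + (P2**2).sum(axis=0) / tau2  # sum_i Pi^T Ti^-1 Pi
for _ in range(1000):
    t1 = (P1 @ m / tau1 + (d - t2) / nbar) / (1 / tau1 + 1 / nbar)  # (B4)
    t2 = (P2 @ m / tau2 + (d - t1) / nbar) / (1 / tau2 + 1 / nbar)
    m = (P1.T @ t1 / tau1 + P2.T @ t2 / tau2) / D                   # (B5)

m_exact = np.linalg.pinv(P.T @ P) @ (P.T @ d)    # mean-free minimum-variance map
print(np.abs((m - m.mean()) - m_exact).max())    # small: the iteration agrees
```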
Appendix C Mapmaking with composite timestream noise
The timestream noise covariance could be complicated and composed of several pieces, like $N = N_1 + N_2$, so that it by itself is difficult to invert in any easily accessible basis. If the subcomponents are invertible in separate and convenient bases, Huffenberger (2017) showed that the messenger field method can be extended to handle such cases. That work examined Wiener filters, but the same idea applies for mapmaking. For the $N = N_1 + N_2$ case we need additional messenger fields ($u$ and $v$) and two $T$ matrices ($T_1$ and $T_2$). (More complicated noise will require even more additional fields.) Iterating the following equations will yield the mapmaking solution:
(C1)  $t = \left(T_1^{-1} + \bar N_1^{-1}\right)^{-1}\left(T_1^{-1} P m + \bar N_1^{-1} u\right),$
$u = \left(\bar N_1^{-1} + T_2^{-1}\right)^{-1}\left(\bar N_1^{-1} t + T_2^{-1} v\right),$
$v = \left(T_2^{-1} + \bar N_2^{-1}\right)^{-1}\left(T_2^{-1} u + \bar N_2^{-1} d\right),$
$m = \left(P^T T_1^{-1} P\right)^{-1} P^T T_1^{-1} t,$
where $\bar N_1 = N_1 - T_1$ and similarly for $\bar N_2$.
Gap filling, for example, can be naturally incorporated into this scheme. We can let $N_1$ be the timestream noise (diagonal in the Fourier domain) and let $N_2$ have infinite variance during gaps in the timestream (diagonal in the time domain) but minimal other noise. This is similar to the treatment of spatial masks in Elsner & Wandelt (2013), Huffenberger (2017), and many other works. Under this assumption, the messenger field adjacent to the data will be an optimized estimate of the gap-filled timestream, and there are no ambiguities about how to fill gaps. However, this method may not be practical because of the need to carry around additional messenger fields. Each of these is large for real data sets, the same size as the time-ordered data. The additional fields will also make the system slower to converge.
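As an illustration of this gap-filling behavior (our own toy sketch; all covariances are kept as diagonal arrays here, whereas a real code would apply the stationary piece in the Fourier domain), the block updates never increase the augmented $\chi^2$, and in gaps the field adjacent to the data reverts to its prior estimate:

```python
import numpy as np

rng = np.random.default_rng(4)
npix, nsamp = 4, 240
P = np.zeros((nsamp, npix))
P[np.arange(nsamp), rng.integers(0, npix, nsamp)] = 1.0

n1 = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(nsamp) / nsamp)  # stand-in for N1
gap = np.zeros(nsamp, bool); gap[100:120] = True
n2 = np.where(gap, np.inf, 0.5)               # N2: infinite variance in gaps

tau1, tau2 = 0.9 * n1.min(), 0.9 * 0.5        # keep Nbar1, Nbar2 positive
nbar1, nbar2 = n1 - tau1, n2 - tau2
w2 = 1.0 / nbar2                              # Nbar2^-1; zero inside the gaps

d = P @ rng.normal(size=npix) + rng.normal(0.0, 1.0, nsamp)
d[gap] = 0.0                                  # gap samples carry no information

m = np.zeros(npix)
t = np.zeros(nsamp); u = np.zeros(nsamp); v = np.zeros(nsamp)

def chi2():  # the augmented chi^2 for the chain m -> t -> u -> v -> d
    return (w2 * (d - v)**2).sum() + ((v - u)**2 / tau2).sum() \
         + ((u - t)**2 / nbar1).sum() + ((t - P @ m)**2 / tau1).sum()

prev = chi2()
for _ in range(300):
    t = (P @ m / tau1 + u / nbar1) / (1 / tau1 + 1 / nbar1)
    u = (t / nbar1 + v / tau2) / (1 / nbar1 + 1 / tau2)
    v = (u / tau2 + w2 * d) / (1 / tau2 + w2)
    m = (P.T @ t) / P.sum(axis=0)             # tau1 cancels for the pencil beam
    assert chi2() <= prev + 1e-6              # block updates never increase chi^2
    prev = chi2()
print(np.abs(v[gap] - u[gap]).max())          # ~0: in gaps v reverts to the prior u
```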