Wiener filter reloaded: fast signal reconstruction without preconditioning
Abstract
We present a high performance solution to the Wiener filtering problem via a formulation that is dual to the recently developed messenger technique. This new dual messenger algorithm, like its predecessor, efficiently calculates the Wiener filter solution of large and complex data sets without preconditioning and can account for inhomogeneous noise distributions and arbitrary mask geometries. We demonstrate the capabilities of this scheme in signal reconstruction by applying it to a simulated cosmic microwave background (CMB) temperature data set. The performance of this new method is compared to that of the standard messenger algorithm and the preconditioned conjugate gradient (PCG) approach, using a series of well-known convergence diagnostics and their processing times, for the particular problem under consideration. This variant of the messenger algorithm matches the performance of the PCG method in terms of the effectiveness of reconstruction of the input angular power spectrum and converges smoothly to the final solution. The dual messenger algorithm outperforms the standard messenger and PCG methods in terms of execution time, as it runs to completion around 2 and 3 times faster than the respective methods, for the specific problem considered.
keywords:
methods: data analysis – methods: statistical – cosmology: observations – cosmic background radiation
1 Introduction
In an era of precision cosmology, the data analysis of state-of-the-art galaxy redshift surveys and CMB experiments with unprecedented levels of sensitivity and resolution poses complex numerical challenges. Fast and robust methods are therefore required to render the data analysis of these large and complex data sets computationally tractable.
One of the most frequently encountered and ubiquitous problems in astrophysics and cosmology (and many other fields in science) is signal reconstruction from noisy data. Solutions to this problem have therefore been researched extensively for the last two centuries (e.g., Gauss, 1809; Jaynes, 1957; Kalman, 1960; Elsner & Wandelt, 2013). The Wiener filter (Wiener, 1949) has emerged as a standard tool for the analysis of large data sets for the inference of high dimensional signals, such as in large-scale structure and CMB problems. It has therefore been employed for large-scale structure analysis problems such as inferring the three-dimensional density field from observations (e.g., Zaroubi et al., 1999; Zaroubi, 2002; Erdoǧdu et al., 2004, 2006; Kitaura & Enßlin, 2008; Kitaura et al., 2009; Jasche et al., 2010; Jasche & Lavaux, 2015). In the analysis of CMB data sets, the Wiener filter has been applied to a range of problems, such as the joint inference of temperature fluctuations and power spectra (e.g., Wandelt et al., 2004; Eriksen et al., 2004; Jewell et al., 2004; O’Dwyer et al., 2004; Smith et al., 2007; Larson et al., 2007; Elsner & Wandelt, 2013; Bunn & Wandelt, 2016).
If we assume a linear model where the data d is the sum of the signal s and noise n, i.e.,
d = s + n, (1)
the Wiener filter is defined as the solution to the following equation:
(S^{-1} + N^{-1}) s_WF = N^{-1} d, (2)
where S and N are the signal and noise covariance matrices, respectively, and s_WF is the Wiener filter solution. As is evident from Equation (2), the direct numerical implementation of the Wiener filter requires the inversion of dense matrices. This task is rendered intractable by the size of modern data sets from state-of-the-art experiments, and therefore represents a computational bottleneck. While it would be extremely convenient if there existed a common basis set, easily accessible by fast transforms, in which both S and N are sparse, this is often not the case: for instance, the signal and noise covariances may be sparse in Fourier or pixel space, respectively. Some previous approaches relied on the assumption of a homogeneous and isotropic noise distribution to find approximate solutions to the Wiener filter equation (e.g., Hirata et al., 2004; Komatsu et al., 2005; Mangilli & Verde, 2009). Alternative approaches for the exact solution of the Wiener filter equation involve complex numerical algorithms like Krylov space methods, such as preconditioned conjugate gradient (PCG) techniques (e.g., Wandelt et al., 2004; Eriksen et al., 2004; Kitaura & Enßlin, 2008, and references therein), for matrix inversion and for solving high dimensional systems of linear equations. However, this procedure requires intricate and costly numerical schemes, such as preconditioning the linear system with suitable matrices, and subsequently demands significant investment in software development. Moreover, this approach involves a further inherent stumbling block, as finding efficient preconditioners is a complicated task in itself (Oh et al., 1999), since matrices are often extremely ill-conditioned in typical CMB problems (Eriksen et al., 2004). The inclusion of polarisation data in the analysis further exacerbates this predicament (Larson et al., 2007).
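To make Equations (1) and (2) concrete, the following minimal sketch (a hypothetical 1D toy problem, not the setup of this paper; the variable names and the toy power spectrum are our assumptions) builds a dense signal covariance from a Fourier-space power spectrum and solves the Wiener filter equation by brute force, which is only feasible because the toy is tiny:

```python
import numpy as np

rng = np.random.default_rng(0)
npix = 16

# Signal covariance: diagonal in Fourier space (a toy power spectrum).
k = np.fft.fftfreq(npix, d=1.0)
power = 1.0 / (1.0 + (npix * np.abs(k))**2)   # hypothetical P(k), strictly positive

# Noise covariance: diagonal in pixel space, inhomogeneous.
noise_var = 0.1 + 0.9 * rng.random(npix)
N = np.diag(noise_var)

# Dense S in the pixel basis: S = F^H diag(P) F, with a unitary DFT matrix F.
F = np.fft.fft(np.eye(npix), norm="ortho")
S = ((F.conj().T * power) @ F).real           # real for a real, symmetric P(k)

# Simulate data d = s + n and solve (S^-1 + N^-1) s_wf = N^-1 d directly.
s_true = rng.multivariate_normal(np.zeros(npix), S)
d = s_true + rng.normal(0.0, np.sqrt(noise_var))
s_wf = np.linalg.solve(np.linalg.inv(S) + np.linalg.inv(N),
                       np.linalg.inv(N) @ d)
```

The dense matrices S and N, of size n_pix × n_pix, are exactly what becomes intractable at survey resolution, motivating the iterative schemes discussed in the following sections.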
Some previous schemes based on the PCG method involved the use of a combination of block and diagonal preconditioners on large and small angular scales, respectively (Eriksen et al., 2004), or were based on a recursive algorithm, where the conjugate gradient solution on a coarse grid is adopted as the preconditioner on a finer grid (Smith et al., 2007). Although both of these approaches yielded satisfactory performance with the analysis of WMAP data (e.g., Eriksen et al., 2008), they were found to be too computationally intensive for the high resolution and sensitivity analysis of Planck data. Seljebotn et al. (2014) recently proposed a multilevel solver for Gaussian constrained realisations of the CMB, which is fast once a suitable preconditioner is found but requires careful tuning and costly precomputations in terms of both computing power and memory requirements. This approach is therefore less attractive when we have to solve many different systems, each requiring a specific preconditioner.
Elsner & Wandelt (2013) presented an alternative approach by devising the messenger algorithm which bypasses the need for a preconditioner, as described in the following section. The evaluation of the Wiener filter from Equation (2) via both the standard and a new formulation that is dual to the messenger algorithm constitutes the crux of this work.
The paper is organised as follows. In Section 2, the underlying principles of the standard messenger method are outlined, followed by a description of the new dual messenger algorithm. We then test our new technique on an artificially generated CMB data set in Section 3, and follow up by investigating its performance in terms of convergence, computation time and stability, and draw comparisons to the standard messenger scheme and the popular PCG method in Section 4. Finally, in Section 5, we summarise the main aspects of our findings and discuss the areas of applications where the potential of our new algorithm can be fully exploited. In Appendices A and B, we illustrate the rationale behind the schemes implemented in this work for fast convergence, followed by a brief review of the PCG method in Appendix C.
2 The Messenger algorithms
2.1 The standard messenger algorithm
Elsner & Wandelt (2013) proposed a high precision, iterative algorithm for the solution of the full Wiener filter equation that is numerically efficient and straightforward to implement. Conceptually, the key idea is to introduce a stochastic auxiliary field t, the so-called "messenger" field, with covariance T = τ1, where T is proportional to the identity matrix. Taking advantage of the useful property that the identity matrix is invariant under any orthogonal basis transformation, the messenger field acts as an intermediate in transformations between different preferred orthogonal bases, in which the signal and noise covariance matrices are expressed conveniently, i.e., are sparse. As a result, although directly applying combinations like (S^{-1} + N^{-1})^{-1} may not be possible, we can always apply expressions like (S^{-1} + T^{-1})^{-1} and (N̄^{-1} + T^{-1})^{-1}, irrespective of the basis chosen to render S and N sparse. Under such a scheme, the information from the data is transmitted to the signal via the messenger field, which can be transformed efficiently from one basis representation to another, thereby obviating the requirement to apply the inverse Wiener covariance matrix to the data.
With the introduction of the messenger field t, the modified χ², where the posterior probability distribution of s is proportional to exp(−χ²/2), is as follows:
χ²(s, t) = (d − t)† N̄^{-1} (d − t) + (t − s)† T^{-1} (t − s) + s† S^{-1} s, (3)
where we defined N̄ ≡ N − T, and we choose the covariance matrix of the auxiliary field according to T = τ1, where τ ≡ min(diag(N)). Minimising χ² with respect to t and s leads to the following two equations:
t^{(i+1)} = (N̄^{-1} + (λT)^{-1})^{-1} (N̄^{-1} d + (λT)^{-1} s^{(i)}), (4)
s^{(i+1)} = (S^{-1} + (λT)^{-1})^{-1} (λT)^{-1} t^{(i+1)}, (5)
where we also introduced a scalar parameter λ whose purpose is to accelerate convergence. In the limit λ → 1, the above system of Equations (4) and (5) reduces to the usual Wiener filter Equation (2), as shown in Appendix A. The messenger algorithm basically involves the following steps, as outlined in Algorithm 1. We initialise the vectors s and t with zeros, and choose an initial high value of λ. We first solve Equation (4) for the messenger field t in the basis defined by the noise covariance matrix N. Then, we change to a basis where the signal covariance matrix S has a sparse representation, such as Fourier space. Next, we solve for s using Equation (5) and the t resulting from the previous step. Finally, we transform the result back to the original basis. The signal reconstruction converges to the Wiener filter solution, i.e., s → s_WF, as λ → 1.
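The iteration of Equations (4) and (5) can be sketched on a 1D toy problem (our assumed setup, not the paper's; a unitary FFT keeps T = τ1 diagonal in both bases, and λ is held fixed at 1, i.e., no cooling):

```python
import numpy as np

rng = np.random.default_rng(1)
npix = 16

k = np.fft.fftfreq(npix)
power = 1.0 / (1.0 + (npix * np.abs(k))**2)   # toy signal power spectrum
noise_var = 0.1 + 0.9 * rng.random(npix)      # inhomogeneous pixel noise
d = rng.normal(size=npix)                     # toy data vector (stands in for d = s + n)

tau = noise_var.min()                         # T = tau * I
nbar = noise_var - tau                        # Nbar = N - T, diagonal in pixel space

s = np.zeros(npix)
for _ in range(2000):
    # Pixel space: t = (Nbar^-1 + T^-1)^-1 (Nbar^-1 d + T^-1 s), written in an
    # algebraically equivalent form that stays finite in pixels where Nbar = 0.
    t = (tau * d + nbar * s) / (tau + nbar)
    # Fourier space: s = (S^-1 + T^-1)^-1 T^-1 t, i.e. multiply by P/(P + tau).
    t_hat = np.fft.fft(t, norm="ortho")
    s_new = np.fft.ifft(power / (power + tau) * t_hat, norm="ortho").real
    if np.linalg.norm(s_new - s) < 1e-12 * np.linalg.norm(s_new):
        s = s_new
        break
    s = s_new
```

At the fixed point of this iteration, s satisfies (S + N) S^{-1} s = d, i.e., it is the exact Wiener filter solution of the toy problem.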
If ε^{(i)} ≡ s^{(i)} − s_WF is the residual at the i-th step, then at the following (i+1)-th step, the corresponding residual is
ε^{(i+1)} = S (S + λT)^{-1} N̄ (N̄ + λT)^{-1} ε^{(i)} (6)
 = [1 − λT (S + λT)^{-1}] [1 − λT (N̄ + λT)^{-1}] ε^{(i)}. (7)
For the case of homogeneous noise, i.e., N̄ = 0, the system is solved exactly in a single step. A key observation is that ‖ε^{(i+1)}‖ < ‖ε^{(i)}‖ for all i, since the terms in brackets are less than unity, resulting in unconditional convergence of the signal reconstruction to the Wiener filter solution s_WF. A careful inspection of Equation (7) yields the following insight: convergence is fast for low noise pixels and modes with low signal prior variance, while the converse is also true, i.e., the system converges slowly for high prior variance and high noise pixels. In the latter regimes, a high value of λ would speed up convergence. Elsner & Wandelt (2013) have shown that it is possible to find a cooling scheme for λ, where λ ≫ 1 initially, to smoothly bring the algorithm to the final solution (λ = 1). Here, we reduce λ by a constant factor η, where 0 < η < 1. The rationale behind the cooling scheme adopted in this work is laid out in Appendix A.
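The cooling scheme just described can be sketched as a small helper (the starting value and factor in the usage example are illustrative assumptions; the convergence-criterion relaxation used later in the paper is not included):

```python
def lambda_schedule(lmbda0, eta):
    """Cooling sequence for the messenger covariance scaling: repeatedly
    reduce lambda by the constant factor eta, terminating at lambda = 1."""
    assert lmbda0 >= 1.0 and 0.0 < eta < 1.0
    lmbda = lmbda0
    while lmbda > 1.0:
        yield lmbda
        lmbda = max(eta * lmbda, 1.0)
    yield 1.0

# Illustrative usage: start high, cool towards the Wiener filter limit.
schedule = list(lambda_schedule(100.0, 0.5))
```

Each value of λ in the schedule would drive one (or several) iterations of Equations (4) and (5) before cooling further.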
The standard messenger method has enjoyed considerable success, thereby establishing its credentials as a reliable method for fast Wiener filtering, without having recourse to a preconditioner. Essentially, it speeds up the computation of the Wiener filter when the noise is strongly inhomogeneous and/or the data is masked. Elsner & Wandelt (2012); Elsner & Wandelt (2013) applied the messenger algorithm to CMB data from the WMAP satellite and found the final map to agree to high accuracy with that from standard conjugate gradient solvers. Even with the inclusion of polarisation data in the analysis, the messenger algorithm maintained its efficiency. Mangilli et al. (2013) implemented the messenger algorithm to produce Wiener-filtered simulations of non-Gaussian CMB maps with the lensing-integrated Sachs–Wolfe bispectrum signal. The messenger method was also employed in Planck data analysis, specifically for inverse covariance filtering of CMB maps at high angular resolutions (Planck Collaboration et al., 2014a, b). Jasche & Lavaux (2015) implemented a Gibbs sampling adaptation of the messenger algorithm in a simple, easy to implement but efficient algorithm for Bayesian large-scale structure inference, specifically aiming at the joint inference of cosmological density fields and power spectra for linear data models. Anderes et al. (2015) adopted a similar approach in their Bayesian hierarchical modelling of CMB gravitational lensing. Alsing et al. (2016a) also implemented the Gibbs-messenger sampling adaptation developed by Jasche & Lavaux (2015) for Bayesian hierarchical modelling of cosmic shear power spectrum inference and eventually for cosmological parameter inference (Alsing et al., 2016b). The works of Jasche & Lavaux (2015), Anderes et al. (2015) and Alsing et al. 
(2016a) show that the messenger algorithm can be successfully adapted for high resolution conditional Gaussian sampling using Markov Chain Monte Carlo techniques, demonstrating the flexibility of the method.
2.2 The dual messenger algorithm
In the above messenger algorithm, we made use of the relation N = N̄ + T to trivially split the noise covariance into two components: one with a trivial covariance matrix and the other a fluctuating component over the sky. An alternative approach is to introduce the auxiliary field at the level of the signal, in a complementary formalism to the standard messenger framework. As the two schemes are complementary to each other, the new algorithm is referred to as the dual messenger algorithm. The corresponding log-posterior to be optimised then becomes
χ²(s, u) = (d − u)† N^{-1} (d − u) + (u − s)† U^{-1} (u − s) + s† S̄^{-1} s, (8)
where, analogous to the standard approach, S̄ ≡ S − U, with U = μ1 the covariance of the auxiliary field u. We derive the corresponding equations that must be satisfied by s and u at the minimum of χ²:
u^{(i+1)} = (N^{-1} + U^{-1})^{-1} (N^{-1} d + U^{-1} s^{(i)}), (9)
s^{(i+1)} = (S̄^{-1} + U^{-1})^{-1} U^{-1} u^{(i+1)}. (10)
The dual messenger algorithm has interesting convergence properties. The amount of reduction of the residual at each iteration is given by
ε^{(i+1)} = S̄ (S̄ + U)^{-1} N (N + U)^{-1} ε^{(i)} (11)
 = [1 − U (S̄ + U)^{-1}] [1 − U (N + U)^{-1}] ε^{(i)}. (12)
This provides the basis for the following mechanism: we artificially truncate the signal power spectrum at some high initial value of μ, which corresponds to a truncated covariance S̄, and bring μ slowly down to zero, corresponding to our final covariance S. So, essentially, we vary the covariance S̄ via a cooling scheme to bring μ → 0, where, in the limit μ → 0, the above system of Equations (9) and (10) reduces to the usual Wiener filter Equation (2), as shown in Appendix B. This results in a redefinition of S̄ using the Heaviside function as S̄ = (S − U) θ(S − U), as described quantitatively in Appendix B (cf. Equations (32) and (33)). For μ > 0, the ratio given by Equation (12) is always convergent. The algorithm consists of the following steps. We initialise the vectors s and u with zeros. We begin iterations with an initial value of μ and the corresponding S̄, and iterate until the signal reconstruction stabilises, in accordance with a chosen convergence criterion for each value of μ. In analogy with the messenger technique, we need to adopt a cooling scheme for μ. The idea is to reduce μ by a constant factor η, where 0 < η < 1, which matches the convergence speed of one iteration with the modification in S̄. The rationale behind the cooling scheme tailored for the dual messenger algorithm is illustrated in Appendix B.
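A minimal sketch of this scheme on a 1D toy problem (our assumed setup, not the paper's; the toy spectrum is strictly positive, so the cooling can stop at μ = min(S) instead of μ → 0, sidestepping the zero-eigenvalue issue). Note that it is the auxiliary field u, whose marginal covariance is S̄ + U, that converges to the Wiener filter solution here:

```python
import numpy as np

rng = np.random.default_rng(2)
npix = 16

k = np.fft.fftfreq(npix)
power = 1.0 / (1.0 + (npix * np.abs(k))**2)   # toy spectrum, strictly positive
noise_var = 0.1 + 0.9 * rng.random(npix)      # inhomogeneous pixel noise
d = rng.normal(size=npix)                     # toy data vector

mu = power.max()          # initial (large) mu: heavy truncation of S
mu_final = power.min()    # positive here, so we can stop at mu = min(S)
eta = 0.7                 # cooling factor, illustrative

s_hat = np.zeros(npix, dtype=complex)
u = np.zeros(npix)
while True:
    sbar = np.clip(power - mu, 0.0, None)     # truncated covariance S-bar
    for _ in range(2000):
        # Pixel space: u = (N^-1 + U^-1)^-1 (N^-1 d + U^-1 s), with U = mu * I.
        s = np.fft.ifft(s_hat, norm="ortho").real
        u = (mu * d + noise_var * s) / (mu + noise_var)
        # Fourier space: s = (Sbar^-1 + U^-1)^-1 U^-1 u.
        s_hat_new = sbar / (sbar + mu) * np.fft.fft(u, norm="ortho")
        converged = (np.linalg.norm(s_hat_new - s_hat)
                     <= 1e-12 * max(np.linalg.norm(s_hat_new), 1e-30))
        s_hat = s_hat_new
        if converged:
            break
    if mu <= mu_final:
        break
    mu = max(eta * mu, mu_final)              # cooling: reduce mu by the factor eta
```

At the final stage, S̄ + U = S exactly, and the fixed point gives u = S (S + N)^{-1} d, the Wiener filter of the toy data.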
However, numerically, the above algorithm does not result in the correct final solution due to the continuous mode of the signal, i.e., the zero eigenvalue in the signal covariance S. We therefore require μ → 0 to obtain the proper solution at the end, which cannot be accommodated by the dual messenger scheme above. To remedy this numerical predicament, we introduce an extra degree of freedom, T = τ1, into the system, thereby incorporating aspects of the standard messenger method into the dual messenger scheme. This leads to the following χ²:
χ²(s, u) = (d − u)† N̄^{-1} (d − u) + (u − s)† (T + U)^{-1} (u − s) + s† S̄^{-1} s, (13)
where N̄ ≡ N − T, with τ ≡ min(diag(N)), and S̄ and U inheriting their previous definitions. The corresponding set of equations to be solved iteratively is then:
u^{(i+1)} = (N̄^{-1} + (T + U)^{-1})^{-1} (N̄^{-1} d + (T + U)^{-1} s^{(i)}), (14)
s^{(i+1)} = (S̄^{-1} + (T + U)^{-1})^{-1} (T + U)^{-1} u^{(i+1)}. (15)
If T = 0, we recover the usual dual messenger scheme, while setting U = 0 yields the standard messenger algorithm. The definition of S̄ implies that the cooling scheme described above still applies to this hybrid method. As outlined in Algorithm 2, we proceed in similar steps as described above for the previous scheme, except that here we reduce the norm of U by the factor η and iterate until μ → 0, at which point S̄ → S and we obtain the proper solution as desired.
3 Application to the Cosmic Microwave Background
We generated artificial CMB data, in a realistic scenario, by drawing Gaussian random fields on a 2D flat sky of given grid resolution and angular extent. This true simulated map is contaminated with Gaussian white noise of covariance matrix N, via the linear model given in Equation (1). We made use of CAMB (http://camb.info; Lewis et al., 2000) to generate the input angular power spectrum from which the CMB signals are drawn. We assume a standard ΛCDM cosmology with the set of cosmological parameters from Planck (Planck Collaboration et al., 2016). A portion of the flat sky is masked by imposing a central square patch with a correspondingly large noise covariance. The three different methods, messenger, dual messenger and PCG, are applied to this setup to investigate their effectiveness and efficiency in signal reconstruction. We implement the same “weak” Cauchy convergence criterion, ‖s^{(i+1)} − s^{(i)}‖ < ε ‖s^{(i)}‖, in all the algorithms, to ensure unbiased results. As a reference, we make use of the PCG method with a more stringent value of ε to provide results against which the other methods can be compared. Currently regarded as the standard Wiener filtering technique by the scientific community, the PCG method is the natural choice for providing a reference solution in the absence of an exact analytic solution. By imposing a more stringent convergence criterion, we ensure that the PCG solution is close to the ground truth.
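The "weak" Cauchy criterion used here can be written as a one-line helper (the relative-norm form and the name are our assumptions; the paper's exact norm and threshold values are not reproduced):

```python
import numpy as np

def cauchy_converged(s_new, s_old, eps):
    """Weak Cauchy criterion: stop when successive iterates differ by
    less than a fraction eps of the current solution norm."""
    return np.linalg.norm(s_new - s_old) < eps * np.linalg.norm(s_new)
```

The same function, with different values of eps, serves both for the common stopping rule of the three algorithms and for the more stringent reference PCG run.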
The true simulated CMB signal and the reconstructed map obtained using the dual messenger algorithm are displayed in Figure 1. The characteristic feature of Wiener filtering, i.e., the extrapolation of the signal into the mask, can be distinctly observed in the reconstructed map. At a cursory glance, the reconstructed maps from all three methods appear rather similar, and hence only the dual messenger reconstruction is displayed in Figure 1. This observation holds for the low noise regions, but under scrutiny the reconstruction in the masked regions shows some slight differences. The residual maps depicted in Figure 3, generated by computing the difference, over the full sky, between our reference map and the corresponding reconstructed map obtained using each method, help to discern these differences. The messenger technique provides the most accurate reconstruction in the masked region, as illustrated by its residual map, which shows the smallest residuals. The dual messenger and PCG reconstructions yield somewhat larger residuals, with those obtained via the former algorithm being concentrated on the edges of the mask.
The effectiveness of the reconstruction by the dual messenger technique on both small and large scales is also manifest from Figure 2, which shows the reconstructed power spectra from the different algorithms. The corresponding power spectra recovered via the messenger, dual messenger and PCG methods are all in good agreement, on all scales, with the reference one, computed using the PCG scheme with the more stringent convergence criterion. The characteristic feature of Wiener filtering, whereby the recovered power on small scales in the low signal-to-noise regime is suppressed, is also observed in Figure 2. Standard Wiener filtering algorithms usually encounter difficulties in dealing with masked regions having effectively infinite noise. By masking the simulated map with noise far higher than in the unmasked region, we are probing the worst-case performance of the dual messenger algorithm. Hence, we do not expect the algorithm to encounter any difficulties in Wiener filtering maps with high levels of inhomogeneous noise.
The dual messenger algorithm is therefore relevant for current and future high resolution CMB experiments such as the South Pole Telescope, Advanced ACTPol, the Simons Observatory and CMB-S4. The application of the algorithm can be extended to problems involving the polarisation of the CMB. The formalism remains unchanged for spin-2 field reconstruction, although the numerical implementation is less trivial due to the correlation between the temperature and polarisation components of the signal covariance. The Wiener filtering of polarised CMB data with more complex noise models will be the subject of future investigation, to further showcase the efficiency of the dual messenger algorithm in treating complex CMB problems.
4 Numerical Convergence and Stability
We now investigate the convergence properties of the dual messenger algorithm and also provide a more in-depth study of the performance of the standard messenger scheme. We quote the usual statistics for convergence and the change in χ² of the posterior probability density between successive iterations for each method, so that unbiased comparisons of their efficiency and effectiveness can be drawn.
Figure 4 shows the residual error at each iteration for the different methods, as a function of the number of iterations. For the messenger algorithm, we relax the convergence criterion for high values of λ and subsequently tighten it in a three-step procedure, as depicted in Figure 4 in dashed red lines. Here, we adopt a cooling scheme where λ is reduced by a constant factor η until λ → 1. This consequently ensures that the χ² decreases rapidly, thereby bringing us closer to the final solution within a smaller number of iterations, resulting in faster convergence.
For the dual messenger algorithm, we implemented a cooling scheme for μ, which results in fast convergence while providing accurate results. So, we pick an initial value of μ, corresponding to an initial truncated covariance S̄, and iterate until convergence, as dictated by a given Cauchy criterion. We repeat this iterative procedure, in accordance with the aforementioned cooling scheme, until μ → 0, at which point we have the Wiener filter solution. Again, we impose a less stringent convergence criterion at higher values of μ, as shown in Figure 4 in dashed blue lines.
Figure 4 essentially illustrates the convergence of the different methods. The dual messenger algorithm requires slightly fewer iterations to converge to the final solution than the PCG approach, while the messenger technique requires nearly twice as many iterations as its dual counterpart for convergence. In terms of wall-clock times, the dual messenger has the definite upper hand, as it runs to completion in around 258 seconds, whereas the messenger and PCG methods have corresponding wall-clock times of roughly 435 seconds and 663 seconds, i.e., reconstruction via the dual messenger algorithm is nearly three and two times faster than the PCG and messenger methods, respectively. All computations were run on a single core of an Intel Core i5-4690 CPU (3.50 GHz). Both messenger algorithms possess the same algorithmic complexity and memory requirements: they require two Fourier transforms and two scalar multiplications per iteration, and two map-sized vectors must be temporarily stored in memory. In comparison, the PCG method requires three Fourier transforms and ten scalar multiplications per iteration, and the temporary storage of eight such vectors in memory.
We also performed a consistency check by verifying the variation of the residual error ‖(S^{-1} + N^{-1}) s − N^{-1} d‖ / ‖N^{-1} d‖. This is usually adopted as a convergence criterion for PCG computations. The variation of this residual error, analogous to Figure 4, is illustrated in Figure 5. The corresponding residual errors obtained via the different algorithms all drop below the respective Cauchy convergence thresholds implemented. The oscillatory behaviour of the PCG solution, displayed in Figures 4 and 5, is mainly due to the reinitialisation step after every two hundred iterations in the algorithm, as described in Appendix C. However, there are also some oscillations in the residual errors due to the PCG method being susceptible to instabilities sourced by numerical noise. The oscillations in the residual errors of the two messenger solutions in Figure 4 are, in contrast, due to their respective cooling schemes, which cause transitions in the systems of equations as the covariances of the auxiliary fields vary (cf. Equations (20) and (37)), with the peaks produced coinciding with these transitions. It is important to note that the residual errors always drop sharply after the peaks, thereby demonstrating the unconditional stability of the messenger algorithms. From numerical experiments, the messenger techniques have proven to be far more stable than the PCG method for nearly degenerate systems.
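This PCG-style residual check can be sketched for a diagonal-basis toy setup (assumed names and inputs; S^{-1} is applied in Fourier space and N^{-1} in pixel space, so no dense matrix is needed):

```python
import numpy as np

def wf_residual_error(s, d, power, noise_var):
    """Relative residual ||(S^-1 + N^-1) s - N^-1 d|| / ||N^-1 d||,
    with S diagonal in Fourier space and N diagonal in pixel space."""
    s_hat = np.fft.fft(s, norm="ortho")
    As = np.fft.ifft(s_hat / power, norm="ortho").real + s / noise_var
    b = d / noise_var
    return np.linalg.norm(As - b) / np.linalg.norm(b)
```

The same two fast transforms per evaluation suffice at any resolution, which is what makes this diagnostic cheap enough to monitor at every iteration.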
As the Wiener filter is the maximum a posteriori solution, the χ² of the intermediate solution can be regarded as a useful convergence diagnostic. The χ² variation as a function of the number of iterations is displayed in Figure 6, with the left and right panels correspondingly showing the convergence of the messenger and dual messenger algorithms. The cooling scheme for λ implemented in the standard messenger technique causes the χ² to drop rapidly with each change in λ. Intuitively, this decrease of χ² via a series of such steps seems reasonable, since the χ² decreases sharply with each change in λ and then reaches a plateau for a given λ until the latter decreases further, as evidenced in Figure 2 of Elsner & Wandelt (2012). The χ² of the dual messenger algorithm behaves similarly. In both cases, the χ² finally attains that of the reference PCG method with the more stringent convergence criterion.
Another important convergence diagnostic is the variation of the relative error of the reconstruction, computed over the full sky, as a function of scale, depicted in Figures 7, 8 and 9 for the messenger, dual messenger and PCG algorithms, respectively. The messenger technique displays smooth and nearly uniform convergence on small and intermediate scales, with the relative error remaining somewhat larger on the largest scales. For the dual messenger scheme, the corresponding convergence rate highlights the hierarchical fashion in which the solution is computed, while yielding a similar final relative error across all scales as the messenger algorithm. The PCG method has the lowest relative error on the smallest scales, although this may be biased by the fact that the reference method is also a PCG, but remains inferior to both messenger methods on the largest scales. This is consistent with the significant residuals resulting from the PCG reconstruction, as observed in the previous section (cf. Figure 3). We stress that all relative errors above are computed with respect to the reference PCG method with the more stringent convergence criterion.
We also carry out a series of additional runs with various values of the convergence criterion ε, to investigate the wall-clock times required for convergence in each case, thereby providing a more in-depth picture of the performance of the different algorithms. As illustrated in Figure 10, the dual messenger technique converges faster than the other methods, except for the most extreme values of ε. It is also interesting to note that we can further reduce its execution time by lowering the factor η of the cooling scheme without significantly degrading the accuracy of the results, since a smaller η reduces the number of iterations required, and therefore the computation time.
5 Conclusions
We presented a new formulation that is dual to the recently developed messenger algorithm, where, unlike in the standard approach, the auxiliary field is introduced at the level of the signal and is consequently associated with the signal reconstruction instead of the noise. This new iterative solver provides another pathway to solving the ill-posed problem of Wiener filtering, frequently encountered in several applications in cosmology and astrophysics.
We tested our new method on a simulated CMB data set and evaluated its performance in terms of effectiveness of reconstruction, convergence properties, processing time and stability. The dual messenger scheme is shown to match the accuracy of reconstruction of the standard messenger and PCG methods on all scales. Regarding the convergence of the algorithm, it is shown to perform smoothly over all scales, with the relative error being sufficiently low even on the largest scales. For the specific problem under consideration, the dual messenger algorithm has a definite edge over the standard messenger scheme and the PCG approach in terms of computation time. The efficiency and effectiveness of this new technique in calculating the Wiener filter solution of general data sets has therefore been demonstrated. We also provided further insight into the mechanism of the standard messenger method so that it can be optimised for data analysis by the scientific community.
The dual messenger algorithm, like its predecessor, does not require an ingenious choice of preconditioner and is straightforward to implement, robust and flexible. It is also capable of taking into account inhomogeneous noise distributions and arbitrary mask geometries. A key aspect of this technique is that it computes the Wiener filter solution in a hierarchical manner due to the thresholding of the signal covariance matrix inherent in the algorithm. Given the success of the standard messenger method, this new dual messenger technique is highly promising and may therefore be adapted for high precision largescale data analysis methods in cosmology and astrophysics.
Other performance improvements to this algorithm can be obtained by adapting the working resolution such that the Nyquist frequency is always slightly higher than the current truncation scale considered in the dual messenger method. This would consequently reduce the number of operations required for the Fourier transforms. We are also currently working on a generalisation of both the messenger and dual messenger algorithms in a combined approach that may further optimise the performance and efficiency compared to traditional techniques for solving the Wiener filter problem.
Acknowledgements
The authors thank the anonymous reviewer for his constructive comments which helped to improve the manuscript. The authors also thank Franz Elsner for his comments on a draft version of the paper. The authors acknowledge financial support from the ILP LABEX (under reference ANR-10-LABX-63), which is financed by French state funds managed by the ANR within the Investissements d’Avenir programme under reference ANR-11-IDEX-0004-02. GL also acknowledges financial support from the “Programme National de Cosmologie et Galaxies” (PNCG) of CNRS/INSU, France, and the INPHYNITI programme of the Interdisciplinary division of CNRS.
References
Alsing et al. (2016a) Alsing J., Heavens A., Jaffe A. H., Kiessling A., Wandelt B., Hoffmann T., 2016a, MNRAS, 455, 4452, arXiv:1505.07840
Alsing et al. (2016b) Alsing J., Heavens A. F., Jaffe A. H., 2016b, arXiv e-prints, arXiv:1607.00008
Anderes et al. (2015) Anderes E., Wandelt B. D., Lavaux G., 2015, ApJ, 808, 152, arXiv:1412.4079
Bunn & Wandelt (2016) Bunn E. F., Wandelt B., 2016, arXiv e-prints, arXiv:1610.03345
Elsner & Wandelt (2012) Elsner F., Wandelt B. D., 2012, Proc. Big Bang, Big Data, Big Computers, arXiv:1211.0585
Elsner & Wandelt (2013) Elsner F., Wandelt B. D., 2013, A&A, 549, A111, arXiv:1210.4931
Erdoǧdu et al. (2006) Erdoǧdu P., Lahav O., Huchra J. P., Colless M., et al. 2006, MNRAS, 373, 45, arXiv:astro-ph/0610005
Erdoǧdu et al. (2004) Erdoǧdu P., Lahav O., Zaroubi S., Efstathiou G., et al. 2004, MNRAS, 352, 939, arXiv:astro-ph/0312546
Eriksen et al. (2008) Eriksen H. K., Dickinson C., Jewell J. B., Banday A. J., Górski K. M., Lawrence C. R., 2008, ApJ, 672, L87, arXiv:0709.1037
Eriksen et al. (2004) Eriksen H. K., O'Dwyer I. J., Jewell J. B., Wandelt B. D., et al. 2004, ApJS, 155, 227, arXiv:astro-ph/0407028
Gauss (1809) Gauss C. F., 1809, Theoria motus corporum coelestium in sectionibus conicis solem ambientium auctore Carolo Friderico Gauss. sumtibus Frid. Perthes et IH Besser
Golub & Van Loan (1996) Golub G. H., Van Loan C. F., 1996, Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore, MD, USA, pp 374–426
Hirata et al. (2004) Hirata C. M., Padmanabhan N., Seljak U., Schlegel D., Brinkmann J., 2004, Phys. Rev. D, 70, 103501, arXiv:astro-ph/0406004
Jasche et al. (2010) Jasche J., Kitaura F. S., Wandelt B. D., Enßlin T. A., 2010, MNRAS, 406, 60, arXiv:0911.2493
Jasche & Lavaux (2015) Jasche J., Lavaux G., 2015, MNRAS, 447, 1204, arXiv:1402.1763
Jaynes (1957) Jaynes E. T., 1957, Physical Review, 106, 620
Jewell et al. (2004) Jewell J., Levin S., Anderson C. H., 2004, ApJ, 609, 1, arXiv:astro-ph/0209560
Kalman (1960) Kalman R. E., 1960, Journal of Basic Engineering, 82, 35
Kitaura & Enßlin (2008) Kitaura F. S., Enßlin T. A., 2008, MNRAS, 389, 497, arXiv:0705.0429
Kitaura et al. (2009) Kitaura F. S., Jasche J., Li C., Enßlin T. A., Metcalf R. B., Wandelt B. D., Lemson G., White S. D. M., 2009, MNRAS, 400, 183, arXiv:0906.3978
Komatsu et al. (2005) Komatsu E., Spergel D. N., Wandelt B. D., 2005, ApJ, 634, 14, arXiv:astro-ph/0305189
Larson et al. (2007) Larson D. L., Eriksen H. K., Wandelt B. D., Górski K. M., Huey G., Jewell J. B., O'Dwyer I. J., 2007, ApJ, 656, 653, arXiv:astro-ph/0608007
Lewis et al. (2000) Lewis A., Challinor A., Lasenby A., 2000, ApJ, 538, 473, arXiv:astro-ph/9911177
Mangilli & Verde (2009) Mangilli A., Verde L., 2009, Phys. Rev. D, 80, 123007, arXiv:0906.2317
Mangilli et al. (2013) Mangilli A., Wandelt B., Elsner F., Liguori M., 2013, A&A, 555, A82, arXiv:1303.1722
O'Dwyer et al. (2004) O'Dwyer I. J., Eriksen H. K., Wandelt B. D., Jewell J. B., et al. 2004, ApJ, 617, L99, arXiv:astro-ph/0407027
Oh et al. (1999) Oh S. P., Spergel D. N., Hinshaw G., 1999, ApJ, 510, 551, arXiv:astro-ph/9805339
Planck Collaboration et al. (2014a) Planck Collaboration Ade P. A. R., Aghanim N., Armitage-Caplan C., et al. 2014a, A&A, 571, A23, arXiv:1303.5083
Planck Collaboration et al. (2014b) Planck Collaboration Ade P. A. R., Aghanim N., Armitage-Caplan C., et al. 2014b, A&A, 571, A24, arXiv:1303.5084
Planck Collaboration et al. (2016) Planck Collaboration Ade P. A. R., Aghanim N., Arnaud M., et al. 2016, A&A, 594, A13, arXiv:1502.01589
Seljebotn et al. (2014) Seljebotn D. S., Mardal K.-A., Jewell J. B., Eriksen H. K., Bull P., 2014, ApJS, 210, 24, arXiv:1308.5299
Shewchuk (1994) Shewchuk J. R., 1994, An introduction to the conjugate gradient method without the agonizing pain. Carnegie Mellon University, Department of Computer Science
Smith et al. (2007) Smith K. M., Zahn O., Doré O., 2007, Phys. Rev. D, 76, 043510, arXiv:0705.3980
Wandelt et al. (2004) Wandelt B. D., Larson D. L., Lakshminarayanan A., 2004, Phys. Rev. D, 70, 083511, arXiv:astro-ph/0310080
Wiener (1949) Wiener N., 1949, Extrapolation, interpolation, and smoothing of stationary time series. Vol. 2, MIT Press, Cambridge
Zaroubi (2002) Zaroubi S., 2002, MNRAS, 331, 901, arXiv:astro-ph/0010561
Zaroubi et al. (1999) Zaroubi S., Hoffman Y., Dekel A., 1999, ApJ, 520, 413, arXiv:astro-ph/9810279
Appendix A Cooling Scheme for Messenger Algorithm
In this section, we illustrate the rationale behind the cooling scheme for the scalar parameter $\lambda$ adopted for the standard messenger technique. In the computations below, we assume invertibility of matrices everywhere. Using $N = \bar{N} + T$, where $T = \tau \mathbb{I}$, the $\chi^2$ for the messenger scheme from Equation (3) can be written as
(16) $\chi^2_{\lambda} = (d - s)^{\dagger} (\bar{N} + \lambda T)^{-1} (d - s) + s^{\dagger} S^{-1} s$
Hence, in the messenger approach, we have $N \rightarrow \bar{N} + \lambda T$ and plugging this into the Wiener filter given by Equation (2) yields
(17) $s_{\lambda} = \left[ S^{-1} + (\bar{N} + \lambda T)^{-1} \right]^{-1} (\bar{N} + \lambda T)^{-1} d$
where we assume that all matrices are invertible. As a consistency check, for $\lambda = 1$, Equation (17) reduces to Equation (2), the usual Wiener filter equation. The difference between the solutions at two consecutive values of $\lambda$, with $\lambda' < \lambda$, is given by
(18) $\delta s \equiv s_{\lambda'} - s_{\lambda} = \left\{ \left[ S^{-1} + (\bar{N} + \lambda' T)^{-1} \right]^{-1} (\bar{N} + \lambda' T)^{-1} - \left[ S^{-1} + (\bar{N} + \lambda T)^{-1} \right]^{-1} (\bar{N} + \lambda T)^{-1} \right\} d$
Now, we consider limiting cases of Equation (18). We assume a homogeneous noise distribution, i.e., $\bar{N} = 0$, so that $N = T = \tau \mathbb{I}$. From Equation (17), we have
(19) $s_{\lambda} = S \left( S + \lambda \tau \mathbb{I} \right)^{-1} d$
Using Equations (18) and (19), the relative error can then be computed as follows:
(20) $\left. \dfrac{\delta s}{s_{\lambda'}} \right|_{\mathcal{K}} = \left. \dfrac{(\lambda - \lambda')\, \tau}{S + \lambda \tau} \right|_{\mathcal{K}}$
where, for an arbitrary matrix $A$, $\left. A \right|_{\mathcal{K}}$ is the subset of $A$ over subspace $\mathcal{K}$. From Equation (6), we have the fractional error reduction, after one iteration, given by
(21) $\dfrac{\epsilon_{i+1}}{\epsilon_{i}} = \left( \dfrac{\lambda \tau}{\bar{N} + \lambda \tau} \right) \dfrac{S}{S + \lambda \tau}$
where the term in parentheses will always favour rapid convergence. The other term is strongly convergent on small scales, but to give bounds to the maximum convergence speed of “slow modes”, we focus on the latter, leading to
(22) $\dfrac{\epsilon_{i+1}}{\epsilon_{i}} \simeq \left. \dfrac{S}{S + \lambda \tau} \right|_{\mathcal{K}}$
To favour convergence, from Equations (20) and (22), we require the two conditions:
(23) $\dfrac{(\lambda - \lambda')\, \tau}{S + \lambda \tau} \geq 0 \quad \mathrm{and} \quad \dfrac{(\lambda - \lambda')\, \tau}{S + \lambda \tau} \leq \dfrac{\lambda \tau}{S + \lambda \tau}$
so that the error reduction in one iteration matches the change in solution arising from two consecutive values of $\lambda$. This results in the following two constraints: $\lambda' \leq \lambda$ and $\lambda' \geq 0$. The former is automatically satisfied by any cooling scheme, while the latter, combined with the requirement that the final value be $\lambda = 1$, leads to the following bounds: $1 \leq \lambda' \leq \lambda$. Therefore, for the cooling scheme, we can write $\lambda' = \lambda \times \eta$, where $0 < \eta < 1$, to improve convergence on all scales. This is the motivation behind our choice of $\eta$ in this work.
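As a concrete illustration of the messenger iteration with such a cooling scheme, here is a minimal self-contained 1D toy in Python. The flat 1D geometry, power-law spectrum, noise levels, mask and the value of the cooling factor are all illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

# Toy 1D messenger-field Wiener filter with geometric cooling of lambda.
# All concrete choices (spectrum, noise, mask, eta) are illustrative only.
n = 256
k = np.fft.fftfreq(n) * n
S = 1.0 / (1.0 + np.abs(k)) ** 2           # signal power spectrum, Fourier-diagonal
noise_var = np.full(n, 0.1)                # homogeneous noise ...
noise_var[100:130] = 1e6                   # ... except a masked (very noisy) region
tau = noise_var.min()                      # T = tau * I
Nbar = noise_var - tau                     # N = Nbar + T

rng = np.random.default_rng(0)
d = rng.standard_normal(n)                 # mock data vector

s = np.zeros(n)
lam = noise_var.max() / tau                # initial lambda: lambda*T covers N
eta = 0.75                                 # cooling factor, lambda' = eta * lambda
for _ in range(2000):
    T = lam * tau
    # messenger field, pixel space: t = (T^-1 + Nbar^-1)^-1 (Nbar^-1 d + T^-1 s)
    t = (T * d + Nbar * s) / (Nbar + T)
    # signal, Fourier space: s = (S^-1 + T^-1)^-1 T^-1 t
    s = np.fft.ifft(S / (S + T) * np.fft.fft(t)).real
    lam = max(1.0, eta * lam)              # cool lambda down to 1
```

At lambda = 1 the fixed point of the two updates is exactly the Wiener filter solution, so the final iterations refine s towards it.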
Appendix B Cooling Scheme for Dual Messenger Algorithm
As in the previous section, we wish to analytically determine the cooling scheme that would improve convergence for the dual messenger algorithm. We first derive the Wiener filter solution for the dual messenger, i.e., the analogue of Equation (17) for this scheme. We begin by writing down Equations (9) and (10), showing the dependence on the power spectrum truncation $\mu$, with $U = \mu \mathbb{I}$:
(24) $s = \left( U^{-1} + N^{-1} \right)^{-1} \left( U^{-1} u + N^{-1} d \right)$
(25) $u = \left( \bar{S}_{\mu}^{-1} + U^{-1} \right)^{-1} U^{-1} s$
Solving for $s$ via the following steps:
(26) $\left( U^{-1} + N^{-1} \right) s = U^{-1} \left( \bar{S}_{\mu}^{-1} + U^{-1} \right)^{-1} U^{-1} s + N^{-1} d, \quad U^{-1} - U^{-1} \left( \bar{S}_{\mu}^{-1} + U^{-1} \right)^{-1} U^{-1} = \left( \bar{S}_{\mu} + U \right)^{-1}$
so that finally, we have
(27) $s_{\mu} = \left[ \left( \bar{S}_{\mu} + U \right)^{-1} + N^{-1} \right]^{-1} N^{-1} d$
In the limit $\mu \rightarrow \bar{\mu}$, where $\bar{\mu} \equiv \min(S)$, $\bar{S}_{\mu} + U \rightarrow S$, reducing Equation (27) to the usual Wiener filter Equation (2), implying consistency. When we change the truncation of the spectrum, $\mu \rightarrow \mu'$, for $\mu' < \mu$, we obtain a new solution $s_{\mu'}$. The counterpart of Equation (20) is then
(28) $\delta s \equiv s_{\mu'} - s_{\mu} = \left\{ \left[ \left( \bar{S}_{\mu'} + U' \right)^{-1} + N^{-1} \right]^{-1} - \left[ \left( \bar{S}_{\mu} + U \right)^{-1} + N^{-1} \right]^{-1} \right\} N^{-1} d$
with $U' = \mu' \mathbb{I}$. Here, we consider Fourier space, so that $S$ is diagonal and, for homogeneous noise, $N$ is diagonal as well. We can write the truncated signal covariance matrix as
(29) $\bar{S}_{\mu'} = \bar{S}_{\mu} + S_{\delta}$
where $S_{\delta}$ is the portion of the power spectrum bounded by $\mu$ and $\mu'$, while the corresponding truncated signal covariances can be represented by Heaviside functions as follows:
(30) $\bar{S}_{\mu} = \left( S - \mu \mathbb{I} \right) \theta\left( S - \mu \mathbb{I} \right)$
(31) $\bar{S}_{\mu'} = \left( S - \mu' \mathbb{I} \right) \theta\left( S - \mu' \mathbb{I} \right)$
where, for a matrix $A$, after applying a basis transformation, with $A$ being diagonal,
(32) $\theta(A) = \mathrm{diag}\left( \theta(a_1), \theta(a_2), \ldots, \theta(a_n) \right)$
and
(33) $\theta(a_i) = \begin{cases} 1 & a_i > 0 \\ 0 & a_i \leq 0 \end{cases}$
for the eigenvalues $a_i$ of $A$.
Using Equation (29) in Equation (28) results in
(34) $\delta s = \left[ \dfrac{\bar{S}_{\mu} + S_{\delta} + \mu'}{\bar{S}_{\mu} + S_{\delta} + \mu' + N} - \dfrac{\bar{S}_{\mu} + \mu}{\bar{S}_{\mu} + \mu + N} \right] d$
Using Equation (29) and substituting the following form of Equation (27),
(35) $s_{\mu'} = \dfrac{\bar{S}_{\mu} + S_{\delta} + \mu'}{\bar{S}_{\mu} + S_{\delta} + \mu' + N}\, d$
in Equation (34) leads to
(36) $\delta s = \left[ 1 - \dfrac{\left( \bar{S}_{\mu} + \mu \right) \left( \bar{S}_{\mu'} + \mu' + N \right)}{\left( \bar{S}_{\mu} + \mu + N \right) \left( \bar{S}_{\mu'} + \mu' \right)} \right] s_{\mu'}$
such that finally we obtain the following equation for the relative error:
(37) $\dfrac{\delta s}{s_{\mu'}} = \dfrac{N S_{\delta}}{\left( \bar{S}_{\mu} + \mu + N \right) \left( \bar{S}_{\mu'} + \mu' \right)} - \dfrac{N \left( \mu - \mu' \right)}{\left( \bar{S}_{\mu} + \mu + N \right) \left( \bar{S}_{\mu'} + \mu' \right)}$
Now, for $\mu' < \mu$, there are three distinct regimes of relevance and we consider each case below. To investigate the convergence behaviour in the different regimes, we make use of the following equation, obtained by plugging Equations (30) and (31) in Equation (29),
(38) $S_{\delta} = \left( S - \mu' \mathbb{I} \right) \theta\left( S - \mu' \mathbb{I} \right) - \left( S - \mu \mathbb{I} \right) \theta\left( S - \mu \mathbb{I} \right)$
The three regimes are:

$S > \mu > \mu'$,
where $\bar{S}_{\mu} = S - \mu \mathbb{I}$, and $S_{\delta} = \left( \mu - \mu' \right) \mathbb{I}$, so that $\bar{S}_{\mu'} + \mu' = \bar{S}_{\mu} + \mu = S$, and this causes the two terms in Equation (37) to cancel, implying that the relative error is not affected by our choice of $\mu'$. This shows that the Wiener filter solution is naturally computed in a hierarchical fashion using the dual messenger algorithm.
$\mu > S > \mu'$,
where $\bar{S}_{\mu} = 0$, and $S_{\delta} = S - \mu' \mathbb{I}$, so that $\bar{S}_{\mu'} + \mu' = S$, leading to the following relative error:
(39) $\dfrac{\delta s}{s_{\mu'}} = \dfrac{N \left( S - \mu \right)}{\left( \mu + N \right) S}$
The first term behaves as a constant, i.e., $N / \left( \mu + N \right) \simeq \mathrm{const}$, so that the relative error can be approximated as
(40) $\dfrac{\delta s}{s_{\mu'}} \approx - \dfrac{N \mu}{\left( \mu + N \right) S}$
since $\mu > S$ in this regime. Also, the magnitude of this change grows with the ratio $\mu / S$, and to favour convergence, we want this change to be as large as possible but sufficiently small such that iterating the solution results in rapid decay of the error, i.e., the change in the solution due to changing $\mu$ should be matched to the change when iterating the solution. If $\mu' < \mu$, choosing $\mu' = \mu \times \beta$, with a constant factor $0 < \beta < 1$, would therefore improve convergence by avoiding the early freeze of modes in the iteration scheme. This served as the basis for our choice of $\beta$ in this work.

$\mu > \mu' \geq S$,
where $\bar{S}_{\mu} = \bar{S}_{\mu'} = S_{\delta} = 0$, so that $\bar{S}_{\mu'} + \mu' = \mu' \mathbb{I}$, and again using the approximation $N / \left( \mu + N \right) \simeq \mathrm{const}$ yields
(41) $\dfrac{\delta s}{s_{\mu'}} \approx - \dfrac{N \left( \mu - \mu' \right)}{\left( \mu + N \right) \mu'}$
So, the overall convergence behaviour can be quantitatively described by:
(42) $\dfrac{\delta s}{s_{\mu'}} \approx \underbrace{0}_{S > \mu} \; - \; \underbrace{\dfrac{N \mu}{\left( \mu + N \right) S}}_{\mu > S > \mu'} \; - \; \underbrace{\dfrac{N \left( \mu - \mu' \right)}{\left( \mu + N \right) \mu'}}_{\mu > \mu' \geq S}$
and this leads to the following interpretation: For higher values of $\mu$ on large scales, the third term in Equation (42) dominates since $\mu - \mu'$ is large. But for the final truncations, $\mu \rightarrow \bar{\mu}$, so this regime is saturated, while on small scales, the second term dominates as $S < \mu$, so the relative error continues to drop till the final truncation. The cooling scheme described above applies naturally to the hybrid version of the dual messenger method.
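A minimal 1D toy sketch of the dual messenger iteration with a cooled truncation may help fix ideas; the spectrum, noise model and geometric cooling factor below are illustrative assumptions, not the paper's configuration. The auxiliary field u carries the truncated signal covariance, and mu is lowered towards the smallest eigenvalue of S.

```python
import numpy as np

# Toy 1D dual messenger iteration with a cooled truncation mu.
# All concrete choices (spectrum, noise, mask, cooling factor) are
# illustrative only.
n = 256
k = np.fft.fftfreq(n) * n
S = 1.0 / (1.0 + np.abs(k))                # signal power spectrum, Fourier-diagonal
noise_var = np.full(n, 0.1)
noise_var[100:130] = 1e6                   # masked region modelled as very noisy
rng = np.random.default_rng(0)
d = rng.standard_normal(n)                 # mock data vector

mu_final = S.min()                         # U = mu*I must end with mu <= min(S)
mu = S.max()                               # start with a strong truncation
beta = 0.75                                # cooling factor mu' = beta * mu
s = np.zeros(n)
for _ in range(3000):
    S_bar = np.maximum(S - mu, 0.0)        # truncated spectrum (S - mu*I) theta(...)
    # u-update, Fourier space: u = (S_bar^-1 + U^-1)^-1 U^-1 s
    u = np.fft.ifft(S_bar / (S_bar + mu) * np.fft.fft(s)).real
    # s-update, pixel space: s = (U^-1 + N^-1)^-1 (U^-1 u + N^-1 d)
    s = (noise_var * u + mu * d) / (noise_var + mu)
    mu = max(mu_final, beta * mu)          # cool mu down to min(S)
```

At mu = min(S) the truncated covariance satisfies S_bar + mu*I = S, so the fixed point of the two updates is the usual Wiener filter solution.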
Appendix C Brief Review of Preconditioned Conjugate Gradient Method
In general, the PCG approach consists of solving the following set of linear equations:
(43) $A x = y$
where $A$ is usually a very large matrix. We wish to avoid computing the inverse of this dense matrix and to this end, in the traditional PCG scheme, we make use of a sparse matrix known as the preconditioner $M$, which is the approximate inverse of $A$, i.e., $M \approx A^{-1}$, as follows:
(44) $M A x = M y$
The Wiener filter Equation (2) can be rewritten as
(45) $\left( S^{-1} + N^{-1} \right) s_{\mathrm{WF}} = N^{-1} d$
such that $A = S^{-1} + N^{-1}$, $x = s_{\mathrm{WF}}$ and $y = N^{-1} d$, in accordance with Equation (43). For the case considered in this work, we make use of a suitable preconditioner, which is diagonal in Fourier space, as follows:
(46) $M = \left( S^{-1} + \dfrac{1}{n} \sum_{i} \lambda_{i}\, \mathbb{I} \right)^{-1}$
where $\lambda_{i}$ are the eigenvalues of the inverse noise covariance matrix, $N^{-1}$, and $n$ is their number. We then implement this preconditioner in a PCG algorithm (e.g., Golub & Van Loan, 1996). We make an initial guess $x_0$ and iterate until the residual satisfies $\lVert y - A x_i \rVert / \lVert y \rVert < \epsilon$, resulting in an $x$ that corresponds to the Wiener filter solution $s_{\mathrm{WF}}$. If convergence is not yet achieved, we reinitialise all parameters every 200 iterations and resume iterations, in order to facilitate convergence. An in-depth review of the PCG algorithm is provided in Shewchuk (1994). Again, we stress the fact that finding a suitable preconditioner is the key factor when making use of the PCG method (e.g., Oh et al., 1999) and this consequently is the major stumbling block for state-of-the-art CMB data analysis.
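For concreteness, a simplified 1D sketch of such a PCG scheme is given below; the toy spectrum, the noise model, and in particular the stand-in preconditioner built from the mean of the eigenvalues of the inverse noise covariance are illustrative assumptions rather than the exact configuration used in this work.

```python
import numpy as np

# PCG solution of (S^-1 + N^-1) x = N^-1 d for a 1D toy model, with a
# preconditioner diagonal in Fourier space. The choice of preconditioner,
# spectrum and noise below are illustrative assumptions only.
n = 256
k = np.fft.fftfreq(n) * n
S = 1.0 / (1.0 + np.abs(k)) ** 2           # signal power spectrum, Fourier-diagonal
noise_var = np.full(n, 0.1)
noise_var[100:130] = 1e6                   # masked region modelled as huge noise
rng = np.random.default_rng(0)
d = rng.standard_normal(n)                 # mock data vector

def apply_A(x):
    # A x = S^-1 x (applied in Fourier space) + N^-1 x (pixel space)
    return np.fft.ifft(np.fft.fft(x) / S).real + x / noise_var

alpha = np.mean(1.0 / noise_var)           # mean eigenvalue of N^-1 (stand-in)

def apply_M(r):
    # M r = (S^-1 + alpha I)^-1 r, diagonal in Fourier space
    return np.fft.ifft(np.fft.fft(r) / (1.0 / S + alpha)).real

y = d / noise_var                          # right-hand side N^-1 d
x = np.zeros(n)
r = y - apply_A(x)
z = apply_M(r)
p = z.copy()
rz = r @ z
for _ in range(1000):
    Ap = apply_A(p)
    a = rz / (p @ Ap)                      # step length
    x = x + a * p
    r = r - a * Ap
    if np.linalg.norm(r) < 1e-10 * np.linalg.norm(y):
        break
    z = apply_M(r)
    rz_new = r @ z
    p = z + (rz_new / rz) * p              # new search direction
    rz = rz_new
s_wf = x                                   # Wiener filter solution
```

Because A is symmetric positive definite and M is a cheap, Fourier-diagonal approximation to A^-1, each iteration only requires FFTs and elementwise operations, never an explicit dense inverse.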