Highprecision simulations of the weak lensing effect on cosmic microwave background polarization
Key Words.:
cosmic background radiation  largescale structure of Universe  gravitational lensing: weak  methods: numericalWe studied the accuracy, robustness and selfconsistency of pixeldomain simulations of the gravitational lensing effect on the primordial cosmic microwave background (CMB) anisotropies due to the largescale structure of the Universe. In particular, we investigated the dependence of the precision of the results precision on some crucial parameters of these techniques and propose a semianalytic framework to determine their values so that the required precision is a priori assured and the numerical workload simultaneously optimized. Our focus was on the mode signal but we also discuss other CMB observables, such as the total intensity, , and mode polarization, emphasizing differences and similarities between all these cases. Our semianalytic considerations are backed up by extensive numerical results. Those are obtained using a code, nicknamed lenSHAT – for lensing using scalable spherical harmonic transforms (SHAT) – which we have developed in the course of this work. The code implements a version of the previously described pixeldomain approach and permits performing the simulations at very high resolutions and data volumes, thanks to its efficient parallelization provided by the SHAT library – a parallel library for calculating of the spherical harmonic transforms. The code is made publicly available.
1 Introduction
The cosmic microwave background (CMB) anisotropies in both temperature and polarization are one of the most studied signals in cosmology and one of the major available sources of constraints of the earlyUniverse physics.
After having decoupled from matter and set free at the time of recombination, CMB photons propagated nearly unperturbed throughout the Universe. The largescale structures (LSS) emerging in the Universe in the postrecombination period have however left their imprint on them which are referred to as secondary anisotropies. In particular, the gravitational pull of the growing matter inhomogeneities has deviated the paths of primordial CMB photons, modifying somewhat the pattern of the CMB anisotropies observed today. This weak lensing effect on the CMB (see Lewis & Challinor (2006) for an extensive review) therefore offers a unique probe of the matter distribution at intermediate redshift where the forming LSS were still in the nearlylinear regime. Because this depends on the cumulative matter distribution in the Universe, it is expected to be particularly efficient in constraining the properties of all the parameters affecting the growth of LSS, such as neutrino masses and dark energy physics (de Putter et al. 2009; Das & Linder 2012; Hall & Challinor 2012).
The first observational evidence of the CMB lensing signal had been indirect and obtained through crosscorrelation of the CMB maps with highredshift mass tracers (Smith et al. 2007; Hirata et al. 2008). More recently, more direct measurements have become available, thanks to the latest generation of highprecisionandresolution groundbased CMB temperature experiments, which have collected highquality data and made possible a direct reconstruction of the power spectra of this deviation using CMB alone (Das et al. 2011; van Engelen et al. 2012). Even more recently, this has been further elaborated on by the Planck results based on the first 15 months of the total intensity data collected by the mission (Planck Collaboration 2013).
The forthcoming next generation of lownoise CMB polarization experiments such as EBEX (Oxley et al. 2004), POLARBEAR (Kermish et al. 2012), SPTpol (McMahon et al. 2009), and ACTpol (Niemack et al. 2010) and their future upgrades (e.g, POLARBEARII, Tomaru et al. 2012) will be able to target a CMB observable most affected by weak lensing – the Bmode polarization. Indeed, primordial CMB gradientlike polarization (modes) is converted into curllike polarization (modes) by gravitational lensing
(Zaldarriaga & Seljak 1998) and is expected to completely dominate the primordial signal at least at small angular scales. The lensinggenerated modes are interesting because of their sensitivity to the largescale structure distribution, but also because they are the main contaminant of any primordial modes signal, which is expected in many models of the very early Universe, and which is one of the major goals of the current and future CMB observations. Since sensitivities of the CMB polarization arrays are rapidly improving, the experiments aiming at setting constraints on values of the tensortoscalar ratio parameter are expected to be ultimately limited by the lensing signal (e.g., Errard & Stompor 2012). This acts as an extra noise source with a white spectrum shape on large scales and an amplitude of approximately arcmin, which could in principle be separated from the primordial signal with the help of an accurate delensing procedure (Kesden et al. 2002; Seljak & Hirata 2004; Smith et al. 2012).
The high quality of forthcoming datasets requires the development, testing and validation through simulations of dataanalysis tools capable of fully exploiting the amount of information present there. An important part of this effort involves simulating very accurate, highresolution maps of the CMB total intensity and polarization, covering a large fraction of the sky and with lensing effects included. The relevant approaches have been studied in the past (e.g. Lewis 2005; Basak et al. 2009; Lavaux & Wandelt 2010) and resulted in devising and demonstrating an overall framework for such simulations, as well as in two publicly available numerical codes (Lewis 2005; Basak et al. 2009). Because the computations involved in such a procedure are inherently very timeconsuming, the proposed implementations of those ideas unavoidably involve tradeoffs between calculation precision and their feasibility, giving rise to a number of problems, practical and more fundamental, which need to be carefully resolved to ensure that these techniques produce highquality, reliable results. The main objective of this paper is to provide comprehensive answers to some of these problems, with special emphasis on those arising in the context of highprecision andreliability simulations of the Bmode component of the CMB polarization signal.
2 Simulating weak lensing of the CMB
2.1 Algebraic background
The CMB radiation is completely described by its brightness temperature and polarization fields on the sky, and . Since both fields are (nearly) Gaussian, they are characterized by their power spectra after their harmonic expansion in a proper basis. Temperature is a scalar field and can be conveniently expanded in terms of scalar spherical harmonics,
(1) 
while polarization is described by the Stokes parameters Q and U, which are coordinatedependent objects, that behave like a spin2 field on the sphere under rotations (Zaldarriaga & Seljak 1997; Kamionkowski et al. 1997). The polarization field must therefore be expanded in terms of spin2 spherical harmonics, ,
where and are the gradient and curl harmonic components of a spin2 field, whose general definitions for and arbitrary spins field are
(3)  
Weak gravitational lensing shifts the light rays coming from an original direction on the last scattering surface to the observed direction , inducing a mapping between the two directions through the socalled displacement field , i.e., for a CMB observable
(4) 
Hereafter, we use a tilde to denote a lensed quantity, we also use a tilde over a multipole number of a lensed quantity, i.e., , to distinguish it from a multipole number of its unlensed counterpart.
The displacement field is a vector field on the sphere and can be decomposed into a gradientfree and a curlfree component. In most cases we can neglect the gradientfree component and consider the displacement field as the gradient of the socalled lensing potential , the projection of the 3D gravitational potential on the 2D unit sphere. This quantity can be computed with Boltzmann codes (e.g. CAMB^{1}^{1}1http://camb.info or CLASS^{2}^{2}2http://lesgourg.web.cern.ch/lesgourg/class.php), from galaxy surveys or Nbody simulations (Carbone et al. 2008; Das & Bode 2008),
(5) 
Here is the comoving distance to the last scattering surface, is the comoving distance, is the comoving angular diameter distance. The lensing potential is expected to be correlated on a large scale with temperature anisotropies and modes of polarization through the integrated SachsWolfe effect; this correlation mainly affects the large angular scales and is of the order of at and will thus be neglected in the following analysis.
Since the lensing potential is a scalar function and can be expanded into canonical spherical harmonics, its gradient (a spin1 curlfree field) can be easily computed in the harmonic domain with a spin1 spherical harmonic transform (SHT):
(6) 
2.2 Pixeldomain simulations
2.2.1 Basics
Because typical deviations of CMB photons are on the order of few arcminutes (although coherent over the degree scale), we can work in the Born approximation, i.e., considering this deviation as constant between and , and evaluate the displaced field along the unperturbed direction.
In practice this means that to compute the lensed CMB at a given point it is sufficient to compute the unlensed CMB at another position on the sky. This observation provides
the basis for the pixelbased approaches to simulating lensing effects of the CMB maps. For every direction on the sky corresponding to a pixel center these methods
first identify the displaced direction and then compute the corresponding sky signal value, which is used to replace the original value at the pixel center.
The implementations of this approach typically involve the following main steps (Lewis 2005; Basak et al. 2009; Lavaux & Wandelt 2010):

Generating a random realization of the harmonic coefficients of the unlensed CMB map and its synthesis.

Generating a random realization of the harmonic coefficients of the lensing potential and then of the spin1 displacement field in the harmonic domain. Synthesizing the displacement field.

Sampling the displacement field at pixel centers and, for each of them, computing the coordinates of a displaced direction on the sky using the spherical triangle identities on the sphere.
Defining as the angle between the displacement vector and the versor, such that , the value of a lensed field, i.e., , and , in a direction is given by the unlensed field at where,
(7) (8) 
Computing temperature and polarization fields at displaced positions.

Reassigning the temperature and polarization from the displaced to new positions to create the simulated lensed map sampled on the original grid. For the polarization, we need also to multiply the lensed field by an extra factor taking into account the different orientation of the basis vector at the two points. Calling the difference between the angles between and the geodesic connecting the two points, and defining
(9) (10) the lensed polarization field becomes
(11) 
Smoothing and, potentially, repixelizing the lensed map to match a particular experimental resolution, if needed.
2.2.2 Challenges and goals
There are two main, closely intertwined challenges involved in implementing the approach detailed in the previous section. The first one is related to the bandwidths of fields used in, or produced as a result of, the calculation, and in particular to the need of imposing those on the fields, which are either naturally not bandlimited or are bandlimited but have too high bandwidths to make them acceptable from the computational efficiency point of view. The other challenge arises from step 4 of the algorithm: the displaced directions do not correspond in general to pixel centers of any isolatitudinal grid on the sphere, and thus the lensed values of the CMB signal cannot be computed with the aid of a fast SHT algorithm and a more elaborated, and computationally costly approach is needed.
We emphasize that both these problems should be looked at from the perspective of the efficiency of the numerical calculations as well as accuracy of the produced results. We discuss them in some detail below.
Signal bandwidths.
Because the lensing procedure needs to be applied prior to any instrumental response function convolution, the relevant sky signals on all but the last steps above require using a resolution sufficient to support the signal all the way to its intrinsic bandwidth, where is either for the total intensity, or for the polarization, or – for the gravitational potential. However, because mathematically the lensing effects can be seen as a convolution in the harmonic domain (Hu 2000; Okamoto & Hu 2003; Hu & Okamoto 2002) of the CMB signal – either the total intensity, , or the polarization, , – and of the potential, , the bandwidth of the resulting lensed field will be broader than that of any unlensed fields and is given roughly by . Consequently, the lensed map produced in step 5 should have its resolution appropriately increased to eliminate potential power aliasing effects. The resolution of the unlensed maps produced in steps 15 should then coincide with that of the lensed signal but with the number of harmonic modes set by and respectively .
One of the problems arising in this context is related to the fact that the unlensed sky signals, , and , considered here are not truly bandlimited even if their power at the small scales decays quite abruptly as a result of Silk damping. Picking an appropriate value for the bandwidth is therefore a matter of a compromise between the precision of the final products and the calculation cost, with both these quantities being quite sensitive to the chosen value, and which will depend in general on a specific application. We emphasize that the presence of the high power decay plays a dual role in our considerations here. On the one hand, it ensures that the lensing effect at sufficiently large scale can be computed with an arbitrary precision by simply choosing the bandwidth values sufficiently high. On the other hand it does introduce an extra complexity in defining a set of sufficient conditions, which ensure required precision, because these will be typically different in the regime of the high signal power and that of the damping tail. In either case, though, it is clear that whatever the selected bandwidths, the amplitudes of the harmonic modes of the lensed signal close to the highest value of supported by the employed pixelization, i.e., , will generally be unavoidably misestimated, and satisfactory precision can only be achieved for harmonic modes lower than some . From the practitioner’s perspective the main problem is therefore, given some precision criterion, , which we wish to be fulfilled by the harmonic modes of the lensed signal up to some value of , how to determine the required bandwidths of the unlensed signals, where and can be the same, e.g., in the case of the or signal lensing, or different, e.g., for the potential field or modes.
One effect of these considerations is that if these are maps of the lensed signals, which are of interest as the final product of the calculation, then the biased high modes should either be filtered out or suppressed before the map is synthesized from its harmonic coefficients. To ensure that this does not adversely affect the resolution of the final map, the bias should affect only angular scales much smaller than the expected final resolution of the map as produced in step 6 of the algorithm. If the latter is defined by the experimental beam resolution, one therefore needs to ensure that no bias is present for , where is an experimental beam width.
Interpolations.
Interpolation is the most popular workaround of the need to directly calculate values of the unlensed fields for every displaced directions, which typically will not
correspond to grid points of any isolatitudinal pixelization.
Three interpolation schemes have been considered to date in the context of the polarized signals. Lewis (2005) proposed
a generic modified bicubic interpolation and demonstrated that it seems to work satisfactorily in a number of cases. This approach together
with the direct summation are both implemented in the publicly available code LensPix^{3}^{3}3http://cosmologist.info/lenspix.
Two other methods have been proposed more recently. Basak et al. (2009) implemented the general interpolation scheme, which
recasts a bandlimited function on the sphere as a bandlimited function on the 2D torus where a nonequispaced fast Fourier (NFFT) transform algorithm is used to compute the field at the displaced positions. This method would be arbitrarily precise if the sky signals were strictly bandlimited. However, the choice of NFFT can become a bottleneck for this algorithm since its numerical workload scales with the number of pixels squared, and its memory requirements are huge. As it is, the NFFT software can be run only on sharedmemory architectures, making it more difficult to resolve both these problems.
Consequently, the issue of the bandwidth values is becoming of crucial importance for the performance and applicability of the method, and its relevance
in particular in the context of simulations of upcoming and future highresolution experiments needs to be investigated in more detail.
Lavaux & Wandelt (2010) proposed a fast pixelbased method using the spectral characteristics of the field to be lensed to compute the weighting coefficient for the interpolation of this field, without using any spherical harmonic algorithm. Its accuracy is set by the number of neighboring pixels used to interpolate the field at a given point.
In addition, Hirata et al. (2004) used in their work a polynomial interpolation scheme of arbitrary order and precision, which has been shown to successfully produce temperature maps (Hirata et al. 2004; Das & Bode 2008)
but has not been tested for the polarized case.
Any interpolation in this context is not without its dangers because interpolations tend to smooth the underlying signals. For a genuinely bandlimited function this
could in principle be avoided as in, e.g., Basak et al. (2009). However, for the
actual CMB signals the bandwidth is only approximate and is a function of the required precision and specific application; the sampling density and interpolation scheme
therefore need to be chosen very carefully to render reliable results. Again, the choice of appropriate bandwidth values is therefore central for a successful resolution of this problem.
Numerical workload
Numerical cost of the direct calculation per direction is given by and corresponds to the cost of calculating an entire set of all and modes of associated, scalar, or spinweighted, Legendre functions. For directions the overall cost about and is therefore prohibitive for any values of and of interest. Here, we assumed a relation, , typically fulfilled for the fullsky pixelization with a proportionality coefficient on the order of a few, e.g., for the HEALPix^{4}^{4}4http://healpix.sf.net/ pixelization (Górski et al. 2005) we have , while for ECP, . The interpolations can cut on this load, trimming it to the one needed to compute a representation of the signals on an isolatitudinal grid, with complexity plus the interpolation with the complexity , or in the case of NFFT, in both cases with a potentially large prefactor. Nevertheless, this is clearly a more favorable scaling than the one of the direct method and, as has been shown in the past, makes such calculations feasible in practice. We note, however, that for the sake of the precision of the interpolation one may need to overpixelize the sky, meaning using a higher value of than what would normally be needed to support the harmonic modes all the way to . Hereafter, we denote the overpixelization factor in each of the two directions, and , as . Consequently, the number of pixels used is given by , where is the standard fullsky number of pixels as determined by the selected value of .
Goals and methodology.
This paper has two main goals. One is to study internal consistency and convergence of the pixeldomain simulations in the context of the currently viable cosmologies. The other is to study the dependence of the precision of these simulations on some of its most important parameters.
In previous works, analyses of this sort have usually been restricted to comparisons of power spectra of the lensed maps derived by a lensing simulation code and the theoretical predictions computed via an integration of the Boltzmann equation, as implemented in the publicly available codes, CAMB and CLASS. In these works, the effort has been made to find a set of the code parameters for which the resulting spectrum is consistent with the theoretical expectations. Such comparisons are without doubt an important part of a code and method validation. However, they are limited to the cases of the gravitational potentials, , derived in a linear theory, and are not applicable in some other cases where the potential is obtained by some other means such as, Nbody simulations. In addition, they may on occasion be misleading because the numerical effects can easily conspire to deliver a spectrum tantalizingly close to the desired one, without any reassurance that the map of the lensed sky characterized by it has correct other statistical properties, such as higherorder statistics. That this is particularly likely and consequential for the modes spectrum given its low amplitude and the lack of characteristic, finescale features. An example of such a conspiracy is shown in Fig. 1, where the power deficit at the high end caused by the oversmoothing due to the interpolation nearly perfectly compensates the extra power aliased into the range of interest as a consequence of too crude a resolution of the final map.
We therefore propose to study the robustness of the simulated results by demonstrating their convergence and internal stability with respect to sky sampling and bandlimit changes, as expressed by two parameters introduced earlier: the upper value of the signal band, , and the overpixelization factor, . Only once the convergence is reached we compare the results to those computed by other means, if any are available. We note that the convergence tests do not have to, and should not in general, be restricted to the power spectra comparison only and could instead involve other metrics more directly relevant to the simulated maps themselves. In all such tests it is typically required to consider maps with extreme resolutions, which has been traditionally prohibitive for numerical reasons. We overcome this problem with the help of a highperformance lensing code, lenSHAT, which we have developed for this purpose.
Our second goal, i.e., to study the dependence of the calculation precision on the two crucial parameters, and , is complementary and is aimed at providing meaningful and practically useful guidelines of how to select the values of these parameters prior to performing any numerical tests given some predefined precision targets. In this context, we present an indepth semianalytical analysis of the impact of these parameters on the lensed signal recovery. Though ultimately they may need to be confirmed numerically casebycase, e.g., using the convergence tests as discussed earlier, they could be of significant help in providing a reasonable starting point for such tests.
At last we also present a simple, highperformance parallel implementation of the pixeldomain approach, lenSHAT, which is capable of reaching extremely high sample density on the sphere thanks to its efficient parallelization and numerical implementation, and which has been instrumental in accomplishing all the other goals of this work.
3 Exploring the bandlimits
3.1 CMB lensing in the harmonic domain
This section addresses the second of goals, as stated above, and describes a semianalytic study of the impact of the assumed bandwidth values on the precision of the lensed signal. Our discussion is based on the model of Hu (2000) and focuses on the lensed mode signal that is obtained obtained as a result of the lensing acting upon the primordial mode signal, and is the main target of this paper. Similar considerations can be made, however, for other CMB observable spectra and we present some relevant results calculated for these cases (see Sect. 3.2 for some more details). Using the results of Hu (2000), we represent the lensed mode signal as
(12) 
where is a spin coupling kernel (see Hu (2000) for a full expression), and and denote the unlensed power spectra of the mode polarization and of the gravitational potential, respectively. This formula can be obtained by a secondorder series expansion around undisplaced direction, which is expected to be accurate to within 1% for multipoles and then for , where the CMB amplitude is small and can be modeled by its gradient only, while in the intermediate scales its precision degrades to nearly 5%. The reliability of this analytical model is discussed later in Sec. 4.3.1. We can now introduce 1D kernels, , defined as
(13) 
Summed over for a fixed , these give the lensed mode power contained in the mode , Eq. 12, while for a fixed they define the power spectrum of the lensed modes signal, generated via lensing from the polarization signal that contains nonzero power in a single mode , and with its amplitude as given by . The kernels are displayed in Fig. 2 together with their analogs for the total intensity and polarization signals. We find that the kernels computed for different values of are similar, just shifted with respect to each other accordingly. The change in the amplitude simply reflects the change in the assumed power of the signal, which in turn follows that of the actual power spectrum. The kernels are flat for values and decay as a power law for , displaying a sharp dip at . Similar observations can be made for the and kernels, with the exception that unlike their and counterparts, the kernels are not peaked around the dip. This behavior is related to the fact that the lensed modes signal we discuss here, described by Eq. 12, is generated by the polarization, while the main effect of the lensing on and is imprinted on these signals themselves. A direct consequence of this is that for any lensed modes spectrum mode a contribution from local unlensed multipoles will be less dominant, as is the case for the and signals, and nonlocal contributions will be relatively more important and therefore required to be accounted for in highprecision calculations.
Indeed, owing to the flat plateau of the kernels at the low end, in principle all high unlensed modes contribute to the lensed power at the low end. The magnitude of their contribution is modulated by the shape of the unlensed spectrum and therefore eventually becomes negligible only because of the Silk damping, i.e., lack of the power at small angular scales in the unlensed fields. Nevertheless, we can expect that nearly all the modes of the unlensed spectrum up to the damping scale have to be included in the calculation of the lensed spectrum to ensure highprecision recovery of the lensed modes spectrum with . Given some specific target precision, we could and should finetune the required spectrum bandwidth, and whatever is the value selected here, the bandwidth for the potential field will have to be at least the same.
For high modes of the lensed modes spectrum, , the nonlocality of the power transfer due to lensing is even more striking, as due to the low amplitudes of the spectrum the local contributions are additionally suppressed, and the long powerlaw tails of the contributions from large and intermediate angular scales, are evidently dominant. Less evident is the fact that also the power from even smaller angular scales, , may be relevant. The contributions from each of these modes may appear small, Fig. 2, but are potentially nonnegligible due to the large number of those modes. A highprecision recovery of the high tail of the lensed modes spectrum will therefore need a careful assessment of the importance of all these contributions, nevertheless, a generic expectation would be that the bandwidth of the unlensed modes spectrum will have to be higher than the highest value of the lensed modes signal multipole for which high precision is required, and potentially higher than the scale of Silk damping. Because these very high multipoles of the lensed spectrum are expected to have a significant contribution from relatively low multipoles of the unlensed signal, i.e., for which given the triangular relations, Eq. 16, and the definition of the kernels, Eq. 13, we can conclude that the bandwidth of the potential field used in the simulations will have to be at least as large as .
There are two main conclusions to be drawn here. First, it is clear that a highfidelity simulation of the polarization power spectrum even in a restricted range of angular scales will require broad bandwidths, potentially all the way up to the scale of Silk damping, for both the unlensed mode polarization signal and the gravitational potential. However, these bandwidth values are not expected to depend very strongly on the maximal mode multipole that we want to recover, at least as long as it is in the range . Second, because the expected bandwidths are broad, it is important to optimize them to ensure efficiency of the numerical codes without affecting precision of the results.
Thanks to the peaked character of the respective kernels, the lensed modes for the lensed and spectra are typically dominated by a local contribution coming from the immediate vicinity of the mode. This in general permits setting the bandwidth for the potential shorter than the mode of the lensed spectrum to be computed. By contrast, the unlensed and spectrum have to be known at least up to the multiple of interest of the lensed spectrum, or , augmented by the assumed bandwidth of the potential. These observations reflect the usual rule of thumb, (e.g., Lewis 2005), indicating that lower bandwidth values can be used in these two cases for the same required accuracy.
3.2 Accuracy
In this section we aim at turning the consideration presented above into more quantitative prescriptions concerning the bandwidths of the input fields used in the simulations. For this reason we introduce 2D kernels, , defined as,
(14) 
These define for a given value of a contribution of the power at and the power at to the amplitude of the lensed modes spectrum at that , which can then be computed by summing over and , i.e.,
(15) 
The sum in this equation involves in principle an infinite number of terms and therefore would have to be truncated in any numerical work, either explicitly, e.g., by setting finite limits in the formula above, or implicitly, e.g., by selecting the bandwidths, pixel sizes, etc, in the pixeldomain codes. We therefore used these kernels to study the precision problems involved in this type of calculations. As the expressions for the kernels are approximate, so will be our conclusions. However, as our goal is to provide guidelines on how to select the correct values for the simulations codes, this should not pose any problems. We will return to this point later in this section.
We show a sample of the kernels, in Fig. 3. These are computed for selected values of for which the approximations involved in their computation are expected to be valid. We note that all elements of the kernel, , vanish if the quantity , defined in the previous section, is even, as do those for which the triangular relation
(16) 
is not satisfied. This last fact is a consequence of the Wigner 3j symbols in the expressions for , (Hu 2000). Within these restrictions it is apparent from Fig. 3 that each multipole of the lensed modes spectra receives contributions from a wide range of harmonic modes of both E and spectra, extending to values of and significantly higher than and roughly independent of the latter value at least for . For its higher values a nonnegligible fraction of the contribution starts to come from progressively higher multipoles of both and . Clearly, these trends are consistent with what we have inferred earlier with help of the 1dim kernels.
As also observed earlier, we find the modes kernels qualitatively different from those computed for the lensed total intensity and modes polarization signals, Fig. 3, and they are more localized in the harmonic space with the bulk of power coming mainly from scales for which both are relatively close to the considered lensed multipole, .
We note that all the 2D kernels are positive^{5}^{5}5This is not true for the kernels, which we comment about later. and therefore including more terms in the sum, Eq. 15, will always improve the precision of the result. From the efficiency point of view one may want to include in the sum preferably the terms corresponding to the largest 2D kernel amplitudes because they provide the largest contribution to the final lensed result before adding those with progressively smaller kernel amplitudes until the required precision is reached. This approach would in principle ensure that the best accuracy is achieved with the smallest number of included terms. This may therefore look as a potentially attractive option from the perspective of optimizing the calculations. However, in practice, as the recurrence formulae are usually employed in the calculations, e.g., either those needed to compute spherical harmonics in the case of the pixeldomain codes or those needed to calculate the 3j symbols as in a direct application of Eq. 15, and therefore all the terms up to a given bandwidth are at our disposal at any time, and it therefore seems efficient and useful to capitalize on those by including all of them in the calculation. Consequently, we estimated what degree of precision can be achieved by such calculations by including all the contributions up to some specific bandwidth values for the and multipoles.
For the modes spectrum we therefore hereafter express the precision of the calculations as
(17) 
where the sums in the denominator should in principle extend over the infinite range of values of , but for practical reasons are truncated to , which for the range of lensed multipoles of interest in this work, , should be sufficient.
This expression can be generalized for all lensed CMB spectra, but in this case our model has to take into account that the main effect due to lensing is to reshuffle the power of the signal and not to convert it into some other component. Therefore the total variance of the signal has to be conserved (e.g. Blanchard & Schneider 1987). In this case the lensed power spectra of or can be written as
(18)  
(19) 
where is an integer that is different for each CMB spectra

for X=E

for X=T

for X=TE.
We note that the factor is a smooth function of the cutoff value of the sum over , which quickly becomes nearly constant for , Fig. 5. Hereafter, we therefore precompute it once assuming and use it in all subsequent calculations. The generalized expression for the accuracy function in Eq. 17 would then be
(20) 
where for shortness we have introduced
We note that for cosmological models of the current interest, the factor is typically found to be on the order of and thus the term is expected to be
negative for most of the values of in the range of interest here, see Fig. 5.
In Fig. 3 black solid lines represent the expected error estimates, as expressed by the accuracy function, , for a number of selected values ranging from % to %. We note that for the shown range of only the subpercent values of the accuracy are likely to be somewhat biased due to the assumed cutoff in the denominator of Eqs. 17 or 20,
an effect, which is therefore largely irrelevant for our considerations here.
The fact that our accuracy
definition is based on an approximate formula is also not a problem because any potential (and small, Challinor & Lewis (2005)) discrepancy would affect both the numerator and denominator of Eqs. 17 an 20 in
the same way. It can therefore be shown that to the first order in the discrepancies amplitude, precision of our accuracy criterion improves progressively when the estimated level of the accuracy, , tends to
and is degraded to the percent level when , i.e., when it is well outside of the region of any interest for the highprecision simulations considered here (see Appendix A).
The differences in the shape of the lensing kernels result in differences in the accuracy contours for different lensed signals and their multipoles as shown in Figs. 3 and 4. In particular, for lensed modes, the contribution of largescale power of the CMB to the lensed signal is more significant. In spite of these differences, we, however, find that the overall contours seem to share a similar shape made of two lines nearly aligned with the plot axes which meets at a right angle. Consequently, if one of the two bandwidths is fixed, then the accuracy, which can be reached by such a computation, will be limited and, moreover, starting from some value of the other bandwidth, nearly independent on its value. This has two consequences. First, if the attainable precision is not satisfactory given our goals, it can be improved only by increasing the value of the first bandwidth appropriately. Second, the value of the second bandwidth can be tuned to ensure nearly the best possible accuracy, given the fixed value of the first bandwidth, while keeping it much lower than what the triangular relation, Eq. 16, would imply. This could lead to a tangible gain in terms of the numerical workload needed to reach some specific accuracy. Turning this reasoning around, we could think of optimizing both bandwidths to minimize the cost of the computation for a desired precision. From this perspective, taking the turnaround point of the contour for a given accuracy may look as the optimal choice. However, this choice would merely minimize the sum of both bandwidths, (or some monotonic function of each of them) for the given accuracy, which may or may not be relevant for a specific case at hand. Instead we may rather select the bandwidths to minimize explicitly actual computational cost of whatever code we plan on using. We present specialized considerations of this sort in the next section.
On a more general level, we find that the standard rule of thumb, interpreting the effects of lensing as a convolution of the unlensed CMB signal with a relatively narrow, , convolution kernel due to the lensing potential, applies only for and signals and even in these cases only to low and intermediate values of and only as long as a computation precision on the order of % is sufficient. For higher values of the lensed spectrum multipoles or higher levels of the desired accuracy in the case of and and for all multipoles of the polarization signal, the required bandwidths of both the respective, unlensed CMB signal and the gravitational potential are more similar and indeed the latter bandwidth is often found to be broader.
We note that an analysis of this sort is somewhat more prone to problems in the case of the power spectrum since the lensing kernels are not always positive because they contain the products of two different Wigner 3j coefficients and power spectra, which may be nonpositive, rendering the corresponding accuracy function not strictly monotonic. Hereafter, we excluded this spectrum from our analysis, noting that any band limits prescriptions derived for and will also apply directly to .
4 Numerical analysis
In this section, we present results of simulations of lensed polarized maps of the CMB anisotropies and their spectra. We address two aspects here. First, we numerically study selfconsistency of the pixeldomain approach to simulating the lensing effect. Second, we demonstrate how the consideration from the previous section can be used to optimize numerical calculations involved in these simulations.
We start this section by introducing a new implementation of the pixeldomain algorithm, which we refer to as lenSHAT.
4.1 lenSHat
lenSHAT is a simple implementation of the pixeldomain algorithm for simulating effects of lensing on the CMB anisotropies. The hallmark of the code is its algorithmic simplicity and robustness, with its performance rooted in efficient, memorydistributed parallelization. The code is therefore particularly welladapted to massively parallel supercomputers. Its implementation follows the blueprint described in Lewis (2005) that summarized in Sect. 2.2.1. The main features of the code are listed below.
Grids.
The code can produce lensed maps in a number of pixelizations used in cosmological applications, but internally it uses grids based on the equidistant cylindrical projection (ECP) pixelization where grid points, or pixel centers, are arranged in a number of equidistant isolatitudinal rings, with points along each ring assumed to be equidistant. This pixelization supports a perfect quadrature for bandlimited functions, which in the context of this work permits minimizing undesirable leakages that typically plague codes of this type. It can be shown, Driscoll & Healy (1994), that an ECP grid made of isolatitudinal rings, each with points and a weight, as given by
(21) 
is required and sufficient to ensure a perfect quadrature for any function with a band not larger than .
Interpolation.
For the interpolation, the code employs the nearest grid point (NGP) assignment, e.g., we assign to every deflected direction a value of the sky signal computed at the nearest center of a pixel of the assumed pixelization scheme, therefore the respective sky signal values are calculable at the fast spherical harmonic speed. The NGP assignment is extremely quick and simple, but it requires the computations to be performed at a very high resolution to ensure that the results are reliable. The sufficient resolution required for this will in general depend on the intrinsic sky signal prior to the lensing procedure, as well as the resolution of the final maps to be produced, as is discussed in Sect. 4.2. As discussed above, in a typical case these are expected to be very high and the computations involved in the problem may quickly become very expensive. Nevertheless, as we show in Sect. 4.8, the overall computational time in this case is only somewhat longer than that involved in some other interpolation schemes, while the memory requirement can be significantly lower. However, the major advantage of this scheme for the purpose of this work is its simplicity and in particular the fact that its precision is driven by a single parameter defining the grid resolution.
Spherical harmonic transforms.
To sidestep the problem of computing spherical harmonic transforms with a huge number of grid points and a very high band limit, lenSHAT resorts to parallel computers and massively parallel numerical applications. With these becoming quickly more ubiquitous and affordable this solution is becoming progressively more attractive.
Parallelization of the fast spherical harmonic transforms is difficult due to the character of
the input and output objects and the involved computations, where a calculation of each output datum requires knowledge of, and access to, all input data.
This is clearly not straightforward to achieve without extensive data redundancy, as done e.g., in LensPix or parallel routines of HEALPix, or complex data exchanges between
the CPUs involved in the computation.
To avoid such problems in our implementation we used the publicly available
scalable spherical harmonic transform (SHAT) library^{6}^{6}6http://www.apc.univparis7.fr/APC_CS/Recherche/Adamis/MIDAS09/software/s2hat/s2hat.html.
This library provides a set of routines designed to perform harmonic analysis of arbitrary spin fields on the sphere on distributed memory architectures (though it has an efficient performance even when working in the serial case). It has a nearly perfect memory scalability obtained via a memory distribution of all main pixel and harmonic domain objects (i.e., maps and harmonic coefficients), and ensures very good load balance from the memory
and calculation points of view.
It is a very flexible tool that allows a simultaneous, multimap analysis of any isolatitude pixelization, symmetric with respect to the equator, with pixels equally distributed in the azimuthal angle, and provides support for
a number of pixelization schemes, including the above mentioned ECP; see Szydlarski et al. (2011) for more details. The core of the library is written in F90 with a C interface and it uses the message passing interface (MPI) to institute distributed memory communication, which ensures its portability. The latest release of the library also includes routines suitable for general purpose graphic processing units (GPGPUs) coded in CUDA (Hupca et al. 2012; Szydlarski et al. 2011; Fabbian et al. 2012).
We emphasize that if a sufficient resolution can be indeed attained, the approach implemented here can produce results with essentially arbitrary precision. In the following
we demonstrate that thi is indeed the case for the described code.
4.2 Code parameters
4.2.1 Overview
In this section we describe how we fixed the essential parameters of the code. We first emphasize important relations between them. A detailed description of the procedures used to assign specific values to them, is given in the following sections.

We start by defining a target value in terms of the highest value of the harmonic mode, , that we aim to recover and its desired precision, . We then use the reasoning from Sect. 3.2 to translate this requirement into corresponding bandwidths, and , of the relevant unlensed signals, and . These ensure that the precision of all modes of the lensed signal up to will be not lower than , barring any unaccountedfor, numerical inaccuracies. The values of and are then used to estimate the bandwidth of the output, lensed map, .

We then simulate two unlensed maps, and , of the signal and potential field, , with their band limits set to and , as estimated earlier. The number of pixels of the displacement map, , is equal to that in the output map of the lensed signal, and for the ECP grid, equal therefore to . The number of pixels in the signal map, is then given by , where is an overpixelization factor introduced in Sect. 2.2.1 and discussed in detail below, Sect. 4.2.4. For simplicity, we assume that the grid for which the unlensed field is computed is a subgrid of the grid used for .

The reassignment procedure (step 5 of the algorithm, Sect. 2.2.1) is then straightforwardly performed, leading to the map containing power potentially up to , which maybe needed to be filtered down to the band limit of , as initially required.
4.2.2 Intrinsic bandwidths
We employ the procedure described earlier in this work in Sect. 3.2 to set the intrinsic band limits. Instead of using generic predictions, we aim at optimizing their values to ensure the lowest possible computational overhead. To do so we need to provide a model of the cost of numerical calculations involved in lenSHAT. This is dominated by large spherical harmonic transforms, one needed to calculate the map of and the other to calculate that of signal . Given the parameters introduced above and because the total cost of a spherical harmonic transform is proportional to we therefore obtain
(22)  
Here stands for the number of signal maps, that we aim to produce and is equal – only, – and , or – , , and , while for the field the prefactor is fixed and equal to
, reflecting the number of components of a vector field on the sphere. In deriving the last equation above we have assumed that . This is justified below, as are the values that should be adopted for and . The expression above includes neither the cost of the interpolation nor reshuffling, but because in both these cases the number of involved operations is proportional to , their cost is negligible with respect to that of the transforms.
Solving for the optimized values of the bandwidths, which simultaneously ensure the desired precision, , at a selected multipole, , involves minimizing the cost function in Eq. 22, with a constraint, , Eqs. 17 and 20. This is implemented as follows. First, we define a grid of levels of the cost function and for each level calculate the best accuracy achievable on its corresponding contour. If this accuracy for some of the levels is close to our target, we find a corresponding pair of bandwidth values, , which then defines our optimized solution. If none of the accuracies is sufficiently close to the required precision, we take two levels for which the assigned accuracies bracket the target value and insert an intermediate level for which we calculate the corresponding best accuracy. We repeat this procedure until the best accuracy found for the newly added contour is sufficiently close to the target one. We then use it to find the pair of the optimized bandwidths as above. As mentioned before, in general, the two optimized bandwidth values will not be equal. This appears to be particularly the case when simulating the CMB spectra at very high multipoles and especially in the cases involving the modes, which have broader kernels and are more demanding in terms of bandwidth requirements. The procedure allows one to gain a factor of nearly % in terms of runtime inthea range of accuracy of interest for lensed multipoles close to , especially if high oversampling is required. For temperature and mode polarization, where less extra power is required in to obtain an accurate result, the gain can be quantified to be nearly %  %. We report in Figure 7 the dependence of the optimized bandwidth parameters as a function of the required accuracy imposed at different lensed multipoles of , , and spectra, in the right column, and contrast them with the bandwidths obtained in the case when both of them are assumed to be equal. In Figure 6 we show typical runtime gains as a function of the required accuracy.
We note that here that whether we choose to optimize the bandwidths or just assume that they are equal, we find that imposing a certain accuracy level at some multipole, , ensures that the same accuracy requirement will be fulfilled for all .
4.2.3 Lensed map bandlimit
For the resolution of the final map, we note that in an absence of numerical effects, such as those due to the pixelization and interpolation, the lensing procedure would be described by Eq. 12 and the bandwidth of the lensed map would be simply given by . In the presence of the numerical effects, the output map will have an effective bandwidth typically higher than that, which will lead to some poweraliasing at the high end if this theoretical band limit is imposed. We find this to be indeed the case in our numerical calculations. However, we also find that once the overpixelization factor is set correctly, the aliasing is localized to at most % of the bandwidth and therefore easy to deal with in postprocessing, e.g., step 6 of the algorithm outline in Sect. 2.2.1. Consequently, we used in our numerical simulations, with as the band limit.
It is important to emphasize that NGP is one of the sources of the aliasing, because it does not preserve the bandwidth of the interpolated function, like some of the other, ad hoc procedures proposed in this context. Clearly, an interpolation that preserves the function bandwidth would be a significant improvement for this type of algorithms, if it comes without prohibitive numerical cost. We leave such an investigation to future work.
4.2.4 Overpixelization factor
As we explained already our interpolation procedure consists of two steps: an overpixelization that is followed by an NGP assignment. The overpixelization involves producing maps with the sky signal sampled at significantly higher rate than is necessary given from the signal’s band limit. For the ECP grids used internally by lenSHAT, this is implemented by using times more points in each of the and directions. The remaining problem is then to fix the appropriate value of . To do so, we first observe that for the overpixelized grid, the NGP assignment can be seen in two ways. Either as approximating the true value of the sky signal, which needs to be calculated in one of the displaced directions, which are precisely computed in turn, which is the standard perspective and the only one available if a more sophisticated interpolation scheme is applied. Or it can be seen as approximating each displaced direction by a direction pointing toward the nearest grid point, with a correct sky signal value assigned to it. This second viewpoint provides us with an independent test to check if the density of our overpixelized grid is sufficiently high. The involved procedure involves first calculating the approximate displacement field and its power spectrum, which is then compared with the input power spectrum for the gravitational potential, . We note that the approximation used here can in general generate a nonzero curl and therefore there will be two nonvanishing spectra of the approximate displacement field, corresponding to its (gradient) and (curl) components. We then require that the recovered spectrum is significantly smaller than that of , and that both the recovered spectrum and the input one agree sufficiently well up to the angular scales, which are of interest given the range of the lensed spectrum we are after and its precision. These latter two are turned into the range requirement using one of the 2D kernels.
Examples of such comparisons are shown in Fig. 8 for a number of values of the oversampling factor ranging from up to . We see that for the latter value the approximate spectrum is consistent over the entire shown range of values and the recovered is there significantly smaller. We therefore continue to use this value in the runs discussed later in this work, even if, as noted in the next section, could be sufficient at least for . We also point out that, as it might have been expected, the departures of the recovered spectrum for the displacement from the input one are consistent with the presence of the nonzero type mode in the approximated displacement field with an amplitude similar to that of its mode spectrum, which renders our two criteria redundant. In addition, if only and CMB spectra are of interest, then is usually sufficient to obtain accurate results on the scales of interest because the longtails of the displacement spectrum are less relevant in these cases.
For completeness, in Fig. 9 we show the relevant CMB spectra computed with the same values as shown in Fig. 8 and aiming at a highprecision reconstruction for , demonstrating that both overpixelization rates, as inferred above, ensure a satisfactory recovery of this spectrum in the targeted range of . We provide more details about this Figure in Sect. 4.3.2.
4.3 Validation and tests
4.3.1 Simulated kernels
As a first step of validation of our code, we investigated whether its results agree with the prediction of the semianalytical approach used to model convolution in the harmonic domain. We focus here on numerically feasible studies of the 1D kernels, as defined in Eq. 13. For this purpose we assumed that the unlensed CMB signal, i.e., modes polarization in the case of the lensed modes spectrum, contains power only in a single harmonic mode, i.e., and computed
the resulting lensed modes spectrum for several values of using lenSHAT. We compared them with the analytic results obtained for the same multipole and displayed in Fig 2.
The results of this calculation are shown in Fig. 10, where we see that in a range where the analytic model is more reliable the agreement between the two curves is excellent if only a sufficient resolution for the unlensed grid is used.
On the other hand, in the region where the analytic approximation we used is not accurate anymore because amplitudes of the CMB signal and its gradient are comparable and therefore the truncation in the series expansion introduces a nonnegligible error, the discrepancy between our analytical model and the simulated 1D kernels becomes more evident. Such an approximation tends to overestimate the contribution of each single mode to its neighboring angular scales of a factor of nearly 50% with respect to simulated kernels and to slightly underestimate the contribution of each mode to the kernel tails, i.e., to the multipoles higher than the one in exam.
Nevertheless, the analyticallyapproximated and simulated kernels are found to be qualitatively quite similar, which validates therefore our semianalytic bandwidth requirements presented earlier.
4.3.2 Simulated spectra
Another batch of performed tests consisted in comparing the spectra obtained from lenSHAT and those derived with Boltzmann codes such as CAMB or CLASS.
In particular, the black solid line in Fig. 9 shows an example of the result obtained for a simulation of lensed modes designed to reach an accuracy of up to 0.1% at . Because no bandlimit optimization is performed, and it is therefore assumed that , the latter value has to be at least , Fig. 4. The lensing convolution of signals with such a band limit leads to polarized maps with power up to , which means that to avoid any aliasing, we would need a grid for the lensed sky and the displacement field with at least rings with pixels per ring, i.e., , where we have rounded the number of rings and pixels per rings to a power of .
Once the band limit of the signals and the respective grid for the lensed sky is set, we still need to define the overpixelization rate as required by our interpolation.
As noted in the previous section, there seem
to be a general
reasoning based on the discretized displacement spectra, which points toward as a sufficient value. Because calculating the overpixelized map, albeit with a restricted bandlimit,
is the most timeconsuming part of the code, there may be an interest on occasion to tune to be as small as possible.
In this context we find, as illustrated in Figs. 8 and 9, that if the extra power introduced by discretization of displacement field does not exceed % of the power in the nondiscretized displacement field on scales , an oversampling factor of is sufficient to render a power spectrum on scales with an accuracy as determined by the assumed bandwidth. However, the factor should be treated as a lower bound and be used with care because there will typically be a significant amount of extra power in the mode spectrum for , which may need to be efficiently filtered out before the respective map can be further used. In contrast, if the extra power found in the discretized displacement does not exceed % of the original power for all angular scales up to , then no overshooting takes place and the results remain highly accurate also beyond the scale of interest .
In Fig. 11 we present the spectra for the two polarized components and as well as the displacement field, , computed in a run aiming at recovering of these signals in a band up to with precision better than to %. For this run we assumed the value of the required bandlimits to be . These values were extrapolated from Fig. 7, where to obtain a % accuracy on modes on similar angular scales (e.g. ) we needed to include power up to . Following the same prescription as given for the previously detailed case of Fig. 9, we set the resolution of the unlensed sky and displacement field to while, to ensure the highest possible reliability of the result, we pushed the oversampling factor to 16. The discretization errors introduced by this setup are found to stay under the % level on all the angular scales involved in the calculation and no significant overshooting is shown (see Fig 11).
Though the band limit and resolution involved may look exaggerated from the practical point of view, they simultaneously demonstrate the capability of the numerical code while illustrating our conclusions concerning the precision of these calculations.
In general, we find that a simple algorithm as proposed in lenSHAT is capable of simulating effects of lensing on CMB over the range of angular scales of interest for current and foreseeable experimental efforts. Moreover, if used properly, it does so with an accuracy that on very small scales is limited rather by the precision of the input power spectrum of unlensed CMB than by the employed numerical algorithm.
4.4 Convergence tests
To investigate the precision and reliability of our approach it is interesting to investigate the numerical convergence of the results without relying on a direct comparison to an external Boltzmann code. Since several experiments in the future will be able to target nonGaussianities in CMB polarization, i.e., the statistical moments beyond the power spectrum, we decided to study the convergence of the results not only on the power spectrum level, but also in the real domain, i.e., on the map level.
4.5 Power spectrum convergence
We first investigate the convergence of the power spectrum up to a given scale of interest as a function of the bandwidths. This procedure allows us to simultaneously show the precision of our code and also to indirectly prove the validity of the bandwidth requirements given in Sect. 4.1. For this purpose we assumed the bandwidths for CMB and fields to be equal and then fixed the resolution of the grid following the prescriptions of Sect. 4.3.2 assuming . We simulated CMB maps off all three Stokes parameters , , and and then computed the precision of the amplitude of the power in some multipole of interest, , recovered from the simulation. The precision is defined as the fractional difference between the amplitudes obtained from two simulations performed for two considered values of . For these specific tests we verified that the random realization of the harmonic coefficients used in the simulation is the same when changing the value of the bandwidth from to a value for . We report the result of the numerical convergence for in Table 1. We note that the results agree with the analytic calculation of Sect. 4.1, where we saw that extending the band limit has no visible effect on the recovered results on the scale of interest if a proper amount of power has already been convolved. As expected, a significant fraction of modes power is converted into modes for angular scales but no significant improvement is seen if power beyond is included. We also performed a test case for , i.e., in the regime where the gradient approximation is expected to be less accurate, Table 2. The modes accuracies are consistent with those derived in Sect. 4.3.2 except for the last set of bandwidth parameters, where the fractional difference between simulated spectra seems to saturate at a level of %. This may be related to a small residual aliasing due to an underestimated oversampling parameter.
TT  43%  0.04%  0.02%  0.003% 
EE  31%  0.01%  0.01%  0.005% 
BB  35%  3%  0.02%  0.004% 
TE  32%  0.04%  0.01%  0.002% 
TT  31%  0.2%  0.09%  0.1% 
EE  32%  0.2%  0.05%  0.03% 
BB  7%  4.6%  0.1%  0.09% 
TE  21%  0.7%  0.2%  0.2% 
4.6 Map convergence
After showing the convergence on the power spectrum level, which provides information on the overall variance of the simulated maps, we investigated if the convergence of our numerical result is also realized in the real domain. For this purpose we first defined an error map obtained as a difference of two maps computed assuming two different bandwidths rescaled by the rms value of one of the two maps, i.e.,
(23) 
where is a simulated map of the field obtained assuming as the bandwidth. After deriving the harmonic coefficients with the procedure outlined in the previous section, we filtered out all modes on angular scales and resampled the signal on a grid that properly samples the signal up to multipole . To take advantages of the HEALPix visualization tools, we use for this purpose an HEALPix grid having () for (). After resampling the harmonic coefficients we computed the power spectrum of the error field , which demonstrates the precision obtained on the map level. In Fig. 12 we report the result of this analysis for the test case . The errorfield power spectrum is found to be very similar to a white noise spectra with r.m.s below %. If the power is properly resolved, an accuracy equivalent to on the map level can indeed be obtained, while the error slightly increases to for polarization (see Fig. 13). However, this test case does not include the effect of any realistic experiment setup; in a reallife case the criterion for the convergence is set by the noise level on the pixel, if instrumental noise has to be added to the simulated maps.
The results presented above show that the systematic uncertainties inherent to the pixeldomain simulation method can be controlled with high accuracy, demonstrating that this method can provide a sufficiently precisely framework within which to compare and study different physical assumptions entering such calculations and in particular to investigate the impact of cosmological models on the mode lensing predictions. We emphasize that the pixeldomain method is sufficiently general to be applicable to a range of diverse physical contexts of this kind. Even more importantly, the applicability of the considerations presented here goes beyond the pixeldomain method and can be straightforwardly extended to, for instance, raytracing approaches, which do not involve Bornapproximation.
4.7 Monte Carlo simulations
To test whether our method produces any significant bias in the power spectrum we produced independent realizations of , , and maps that were required to reach accuracy up to and investigated the statistical properties of the power spectra averaged over these realizations. The latter is expected to be nearly Gaussiandistributed since the nonGaussian correlations in the lensed power spectrum covariance induced by lensing itself are negligible for , and spectra. However, for all the power spectra including the field, we expect the latter statement to be only partially correct since the the covariance of this spectrum is nonGaussian, especially on small angular scale. Identifying the expected scatter of the averaged spectrum with the theoretical Gaussian sample variance therefore tends to underestimate the scatter itself.
For each pair of the Stokes parameters, and , we define a quantity
(24) 
where the bar denotes a power spectrum averaged over realizations. The ensemble of all values of is expected to be Gaussiandistributed with mean and variance , which can be tested by means of a KolmogorovSmirnov (KS) test. In addition, we define a reduced statistics, Eq. (24) and following Basak et al. (2009), as
(25) 
We report in Table 3 the results of these two tests expressed as the significance level probability of the null hypothesis. We found that the method does not produce any significant biases on and cross spectra either; these were not shown in the previous analysis but are of potential interest, because they are a sensitive test of any artificially induced correlation. Moreover, the precision and accuracy of the result can be tested quite independently of analytical models by devising a custom convergence procedure as explained in the previous section.
Significance  Significance  

TT  0.19  0.92 
EE  0.97  0.65 
BB  0.79  0.14 
TE  0.58  0.84 
TB  0.20  0.34 
EB  0.71  0.67 
4.8 Numerical performance and requirements
In this section we evaluate the strong scaling relations for numerical cost and memory requirements of lenSHAT, i.e., we run the code with the same parameters and test its scalability as a function of the number of MPI processes used in the calculation. For this benchmark test we used and a grid for lensed sky and displacement field of pixels.
The main data volume involved in the calculations is given by harmonic coefficients and maps that are evenly distributed between processors through the SHAT library. Their distribution is optimized for all spherical harmonics transform steps involved. The remapping method itself only depends on structures that are also distributed between processors, allowing us to preserve the scalability features inherited from the SHAT library. The overall memory requirements per processors for a lenS