Sparse Bayesian mass-mapping with uncertainties: peak statistics and feature locations
Weak lensing convergence maps – upon which higher order statistics can be calculated – can be recovered from observations of the shear field by solving the lensing inverse problem. For typical surveys this inverse problem is ill-posed (often seriously) leading to substantial uncertainty on the recovered convergence maps. In this paper we propose novel methods for quantifying the Bayesian uncertainty in the location of recovered features and the uncertainty in the cumulative peak statistic – the peak count as a function of signal to noise ratio (SNR). We adopt the sparse hierarchical Bayesian mass-mapping framework developed in previous work, which provides robust reconstructions and principled statistical interpretation of reconstructed convergence maps without the need to assume or impose Gaussianity. We demonstrate our uncertainty quantification techniques on both Bolshoi N-body (cluster scale) and Buzzard V-1.6 (large scale structure) N-body simulations. For the first time, this methodology allows one to recover approximate Bayesian upper and lower limits on the cumulative peak statistic at well defined confidence levels.
keywords:gravitational lensing: weak – methods: statistical – techniques: image processing – (cosmology:) dark matter – (cosmology:) cosmological parameters
In an empty universe the null geodesics along which photons travel correspond directly to straight lines. However, in the presence of a non-uniform distribution of matter the null geodesics are perturbed via gravitational interaction with the local matter over or under density i.e. the photons are gravitationally lensed (Grimm & Yoo, 2018; Schneider, 2005; Munshi et al., 2008; Heavens, 2009). As this gravitational interaction is sensitive only to the total matter distribution, and the overwhelming majority of matter is typically dark, gravitational lensing provides a natural probe of dark matter itself (Clowe et al., 2006).
Collections of associated photons emitted from a distant object travel along separate geodesics which are perturbed in different ways by the intervening matter distribution, e.g. photons traveling closer to matter over densities will interact more strongly and therefore be perturbed more than those farther away. As such the geometry of a distant object is warped (Bartelmann & Schneider, 2001) – i.e. colloquially the object is lensed.
Provided the propagating photons at no time come closer than one Einstein radius to the intervening matter over and under densities, the object is weakly lensed. Weak gravitational lensing of distant galaxies manifests itself at first order into two quantities; the spin-0 convergence which is a magnification, and the spin-2 shear which is a perturbation to the galaxy ellipticity (third-flattening).
Both the shear and the convergence have dominant intrinsic (i.e. in the absence of lensing effects) underlying values which makes measurements of the lensing effect difficult. In fact, there is no a priori way to know the intrinsic brightness of a galaxy (resulting in an inherent degeneracy – the mass-sheet degeneracy) and so the convergence is not an observable quantity (Grimm & Yoo, 2018). However, as the intrinsic ellipticity distribution of galaxies has zero mean one can average to recover the shearing contribution, hence the shear is an observable quantity. As such, measurements of the shear field are taken and inverted to form estimators of the convergence. Typically this inverse problem is seriously ill-posed and so substantial uncertainty on the reconstructed convergence map is introduced.
A wealth of information may be calculated directly from the shear field (often in the form of second order statistics (Kilbinger, 2015) – such as the power spectrum as in Alsing et al., 2016; Taylor et al., 2018) though recently there is increasing interested in extracting Non-Gaussian information from the convergence field, e.g. peak statistics, Minkowski functionals, extreme value statistics (Munshi & Coles, 2017; Coles & Chiang, 2000; Fluri et al., 2018; Peel et al., 2018, 2017a).
Primarily, the interest has arisen as higher-order statistics of the convergence field have been shown to provide complementary constraints on dark matter cosmological parameters which are typically poorly constrained by second-order statistics (Pires et al., 2010).
However, to make principled statistical inferences from the convergence field, the inversion from to must be treated in a principled statistical manner – something which until recently was missing from convergence reconstruction algorithms which were either not framed in a statistical framework (e.g. VanderPlas et al., 2011; Kaiser & Squires, 1993; Lanusse et al., 2016; Wallis et al., 2017; Jeffrey et al., 2018) or made assumptions of Gaussianity (e.g. Corless et al., 2009; Alsing et al., 2016; Schneider et al., 2015). As the information of interest in higher-order convergence statistics is non-Gaussian, assumptions of Gaussianity in the reconstruction process severely degrade the quality of the cosmological information.
Recently, a mass-mapping framework was developed (see Price et al., 2018a) which addressed precisely this issue. This new sparse hierarchical Bayesian mass-mapping formalism can be rapidly computed, can be extended to big data, and provides a principled statistical framework for quantifying uncertainties on reconstructed convergence maps. Notably, it has been shown to accurately reconstruct very high dimensional Bayesian estimators many orders of magnitude faster than state-of-the-art proximal MCMC algorithms – it was specifically benchmarked against Px-MALA (Pereyra, 2013; Durmus et al., 2018) in Price et al. (2018b).
In this paper, we propose two novel uncertainty quantification techniques, aimed to answer two frequently asked questions of the recovered convergence map. The first of these questions asks where a feature of interest in the reconstructed convergence map could have been observed – typically this has been addressed by bootstrapping; however we can now infer it directly in a Bayesian manner. The second question asks given a magnitude threshold what is the maximum and minimum number of peaks which could have been observed, within some well defined confidence.
The structure of this article is as follows. To begin, in section 2 we provide cursory introduction to weak lensing from a mathematical perspective, with emphasis on the weak lensing planar forward model in subsection 2.2. Following this we provide a brief overview of Bayesian inference and the previously developed (Price et al., 2018a) sparse hierarchical Bayesian mass-mapping algorithm in section 3. An introduction to Bayesian credible regions, specifically the highest posterior density credible region is provided in section 3.1. In section 4 we develop a novel Bayesian inference approach to quantifying the uncertainty in reconstructed feature location, which we then showcase on illustrative N-body cluster simulation data in section 5. We then introduce a novel Bayesian inference approach for recovery of principled uncertainties on the aggregate peak count statistic in section 6. Following this we showcase this Bayesian inference approach to quantify uncertainty in the aggregate peak statistic in section 7 on N-body large scale structure (LSS) illustrative simulation data. Finally we draw conclusions in section 8.
2 Weak Gravitational Lensing
In this section we provide a brief introduction to weak gravitational lensing, with emphasis on how this effect manifests itself into observable quantities. For a detailed background review of the field see Bartelmann & Schneider (2001); Schneider (2005). For a more mathematical background of the field, with emphasis on statistical methods see Grimm & Yoo (2018); Munshi et al. (2008); Heavens (2009). For a background of specifically the peak statistics see Lin (2016).
2.1 Mathematical Background
In a non-uniform distribution of matter the null geodesics along which photons travel are no longer straight lines, instead they are now sensitive to the local matter distribution. When many photons are propagating from a distant object to us here and now, the local matter distribution adjusts the geometry of the object we observe – the object is gravitationally lensed.
Provided the trajectory of the propagating photons at no time comes closer than one Einstein radius to the intervening matter over densities then the lens equation,
remains effectively singular and we are in the weak lensing regime. Equivalently one can define the weak lensing regime to be convergence fields for which – ensuring that the shear signal remains linear. Here the Einstein radius is given by,
where is the gravitational constant, is lensing mass, is the speed of light in vacuo and is the angular diameter distance defined as:
where is the comoving distance and is the curvature of the universe, which has been observed to be consistent with by Planck Collaboration et al. (2018).
As galaxies are naturally sparsely distributed across the sky, most observations fall within the weak lensing regime. The weak gravitational lensing effect can be described by a lensing potential which is the integrated Newtonian potential along the line of sight
where are angular spherical co-ordinates. A further constraint exists, such that the local Newtonian potential must satisfy the Poisson equation:
for matter density parameter , Hubble constant and scale parameter . Combined, equations (4) and (5) allow one to make inferences of cosmological parameters from observations of the lensing potential .
At linear order, gravitational lensing manifests itself as two quantities: the spin-0 convergence field (magnification) and the spin-2 shear field (perturbation to ellipticity). It can be shown that (Bartelmann & Schneider, 2001; Schneider, 2005) these quantities can be related to the lensing potential by,
where and are the spin raising and lowering operators respectively and are in general defined to be:
2.2 Lensing Planar Forward Model
Often second order statistics (Kilbinger, 2015) related to the shear are computed (e.g. the shear power spectrum as in Taylor et al., 2018; Alsing et al., 2016), though increasingly there is interest in extracting weak lensing information from the convergence directly – typically higher-order non-Gaussian information.
Unfortunately, due to an inherent degeneracy the convergence is not an observable quantity (Bartelmann & Schneider, 2001; Schneider, 2005; Grimm & Yoo, 2018) – this effect is colloquially referred to as the mass-sheet degeneracy. However, as the intrinsic ellipticity distribution of galaxies has zero mean, averaging many ellipticity observations within a given pixel provides a good estimate of the shear field.
In fact, there exists a further degeneracy between and such that the true observable is the reduced shear but for the context of the paper we will assume – see Mediavilla et al. pg.153, Price et al. (2018a), or Wallis et al. (2017) for details on how to account for the non-linear reduced shear.
As both and are related to a relation between and can be formed. Therefore, typically measurements of the shear field are taken and inverted to form estimates of the underlying convergence field. For small fields of view the flat sky approximation can be made, which reduces the spin-raising and lowering operators to (Bunn et al., 2003):
with the mapping operator being
where we have dropped the spin subscripts for clarity. To recover an estimator of the convergence one must invert this forward model.
The most naive inversion technique is that of Kaiser-Squires (KS) inversion (Kaiser & Squires, 1993), which is direct inversion in Fourier space, i.e.
where we have again dropped function arguments and subscripts for clarity. This approach attempts to mitigate the effect of noise by convolving the recovered convergence estimate with a broad Gaussian smoothing kernel, which severely degrades the quality of non-Gaussian information. This poses a somewhat serious issue as non-Gaussian information is precisely the information that is to be extracted from the convergence field. Therefore for higher-order convergence statistics the KS estimator is patently sub-optimal.
Moreover, decomposition of spin fields on bounded manifolds is well known to be degenerate (Bunn et al., 2003) and so inversion of shear to convergence for masked fields is inherently ill-defined – in particular the KS estimator is known to break down for non-trivial masking. In fact the lensing inverse problem is often seriously ill-posed, therefore motivating methods regularized by prior information.
3 Sparse Bayesian Mass-mapping
Many mass-mapping algorithms have been considered (e.g. VanderPlas et al., 2011; Kaiser & Squires, 1993; Lanusse et al., 2016; Wallis et al., 2017; Jeffrey et al., 2018; Jee et al., 2016; Chang et al., 2018), however in the context of this paper we wish to conduct principled statistical analysis of the reconstructed convergence map, and so we opt for the sparse hierarchical Bayesian algorithm presented in Price et al. (2018a) and benchmarked against MCMC algorithms in Price et al. (2018b).
Recently a sparse hierarchical Bayesian framework for convergence reconstruction was presented (Price et al., 2018a) which is not limited to Gaussian priors – in fact the prior need not even be differentiable. In this work we adopt this mass-mapping algorithm, which we briefly describe below.
First, by Bayes’ theorem the posterior distribution of the convergence reads
which shows how one should infer the posterior from the likelihood function (data fidelity term) and the prior (regularization term) (see e.g. Trotta, 2017, for a clear introduction to Bayesian inference in a cosmological setting). In the scope of this paper we do not consider the Bayesian evidence as it acts as a normalization term and so does not effect the recovered solution. Typically the Bayesian evidence is used for model comparison which is an avenue of study in of itself.
In a Bayesian inference problem one often wishes to find the solution which maximizes the posterior given data and some model. From the monotonicity of the logarithm function, maximization of the posterior is equivalent to minimization of the log-posterior such that
where the convergence which maximizes the posterior is given by , where MAP stands for maximum a posteriori. Provided the posterior is log-concave the minimization of the log-posterior takes the form of a convex optimization problem, which can be rapidly computed even in high dimensional settings.
Let be the discretized complex convergence field and be the discretized complex shear field, where is the number of binned shear measurements and is the dimensionality of the recovered convergence field. Suppose we can define a measurement operator which maps onto . On the plane, the measurement operator is given by (e.g. Price et al., 2018a)
where is the planar forward model in Fourier space as defined in equation (12), and is the forward (inverse) fast Fourier transform.
Now suppose that some noise is contaminating our measurements, then measurements are obtained through the measurement equation
Within this article we will consider the typical case in which the noise , i.e. i.i.d. (independent and identically distributed) zero mean Gaussian noise of variance . In this setting the likelihood function is given by a multivariate Gaussian, which for diagonal covariance is given compactly as
which (as in Price et al., 2018a) is regularized by a sparsity promoting Laplace-type -norm wavelet prior
where is the jointly inferred MAP regularization parameter (Pereyra et al., 2015) – the derivation and implementation of which may be found in Price et al. (2018a). Sparsity promoting priors in wavelet dictionaries have explicitly been shown to be effective in the weak lensing setting (Jeffrey et al., 2018; Lanusse et al., 2016; Peel et al., 2017b; Price et al., 2018a).
Using these terms, the minimization of the log-posterior becomes
which is then solved iteratively by implementing a proximal forward-backward splitting algorithm (e.g. Combettes & Pesquet, 2009).
Note that one can choose any convex log-priors e.g. an -norm prior from which one essentially recovers Weiner filtering (see Padmanabhan et al., 2003; Horowitz et al., 2018, for alternate iterative Weiner filtering approaches).
3.1 Bayesian credible regions
In Bayesian analysis a posterior credible region at confidence is a set which satisfies:
where is an indicator function defined such that,
There are in general a large number of posterior regions (hyper-volumes) which satisfy this integral. The decision-theoretically optimal region – in the sense of minimum enclosed volume – is called the highest posterior density (HPD) credible region (Robert, 2001) and is defined to be:
where is the log-prior term and is our data fidelity term (log-likelihood function). Here defines a level-set (i.e. isocontour) of the log-posterior set such that equation (21) is satisfied. In practice the true HPD credible region is difficult to compute due to the high dimensional integral in equation (21), motivating computationally efficient approximate techniques.
Recently a conservative approximation of the HPD credible set was proposed by Pereyra (2017) which exploits developments in probability concentration theory. The approximate HPD credible set is given by:
with approximate level-set threshold
where the bounding term which in turn is constrained to confidence . The error of this approximation is bounded above by
where . This upper-bound is typically conservative, meaning the isocontour is at all times larger than the true isocontour (i.e. this estimator will never produce an underestimate). In Price et al. (2018b) the bound on recovered local error bars was found to be to larger than the true MCMC – yet could be computed times faster. A similar comparison was done by Cai et al. (2017a) in a radio interferometric setting.
The concept of approximate HPD credible regions is particularly useful as it allows us to explore high dimensional posteriors – many orders of magnitude larger than state-of-the-art MCMC techniques are currently able to accomodate – in a computationally efficient manner.
4 Bayesian peak locations
Often one wishes to know the location of a feature of interest within the reconstructed convergence . Typically, this uncertainty is assessed via bootstrapping of the recovered image for a large number of simulated noise fields (as in e.g. Peel et al., 2017b).
With the concept of approximate HPD credible regions in mind, we propose a novel Bayesian approach to quantifying uncertainty in the peak location which we will refer to as the ‘Bayesian location’.
In essence the Bayesian location is computed as follows: A feature of interest is removed from the recovered convergence map, this feature is then inserted back into the convergence map at a new position to create a surrogate convergence map, if this surrogate map is within the approximate credible set then the position at which the feature was inserted cannot be rejected, if the surrogate is not in the approximate credible set then the position can be rejected. This process is computed for a sample of the total posible insertion positions, eventually providing an isocontour of ‘acceptable’ positions. This isocontour, at a well-defined confidence level, is the Bayesian location.
4.1 Bayesian Location
Suppose we recover a (MAP) convergence field via optimization of the objective function defined in equation (20) which contains a feature of interest (e.g. a large peak). Let us define the sub-set of pixels which contain this feature to be , where is the entire image domain.
To begin with, extract the feature , i.e. a convergence field which contains only the feature of interest. Now we adopt the process of segmentation inpainting (Cai et al., 2017b, a; Price et al., 2018a) to create a convergence field realization without the feature of interest but with background signal replaced.
Mathematically segmentation inpainting is represented by the iterations
where is an appropriately selected dictionary – for this purpose we simply use the Daubechies 8 (DB8) wavelet dictionary with 8-levels and is the soft-thresholding parameter.
Following the wavelet inpainting, in order to separate the true feature from the background residual convergence the signal which was inpainted into the region is subtracted from the extracted feature – effectively accounting for the residual background signal which would likely have been present even in the absence of the feature . At this junction the surrogate convergence is hypothesis tested for physicality (Cai et al., 2017a; Price et al., 2018a).
If a feature is not found to be physical, the algorithm terminates at this point as, fundamentally, it is illogical to evaluate the uncertainty in position of an object of which you cannot statistically determine the existence.
Now that we have successfully isolated we can insert it back into the surrogate field at a perturbed position. It is then sufficient to check if
where represents the surrogate with the feature inserted at a perturbed location.
If the inequality does hold, then the conclusion is that at confidence we cannot say that the feature could not be found at this location. If the equality does not hold then in its observed form could not have been found at the new location at confidence. The question then becomes, which perturbed positions are acceptable and which are not.
With the above in mind, we propose a typical inverse nested iterative scheme to determines the pixel-space isocontour for a given feature in the reconstructed convergence field. Schematically this iterative process is outlined in Figure 1. Essentially, inverse nesting is: start in a ring 1-pixel from the MAP peak location in the first iteration, moving the ring one pixel outwards after each iteration.
4.2 N-splitting Circular Bisection
Inverse nested iterations are sufficiently fast for low-dimensional reconstructions , however as the dimensionality of the reconstructed domain grows it becomes increasingly beneficial to adopt more advanced algorithms to compute the Bayesian location in an efficient manner.
We propose a novel iterative algorithm for computing the pixel-space position isocontour which we call N-splitting Circular Bisection (N-splitting), the full details of which can be found in appendix A. A brief outline of N-splitting is given below.
Suppose we wish to compute positions on the Bayesian location isocontour at equiangular intervals (defined clockwise about the peak location) relative to the -axis. Then we require sampling angles which are trivially given by,
where is an iterative factor which sets the angle for a given direction .
Once is defined for a single direction, the distance along direction such that the objective function saturates the level-set threshold is found by bisection. Mathematically, this is formally defined to be,
where is the sub-set of the real domain which lie on the line of infinite extent along a given direction sourced at the peak location, and is the surrogate convergence map constructed by inserting the feature of interest into a perturbed location .
Once a representative set of positions on the location isocontour are computed, the convex hull is taken – the convex hull is simply the smallest convex set which contains all samples of the location isocontour. The boundary of this closed convex set of acceptable pixels is taken as the Bayesian location.
5 Illustrative example of the Bayesian Location
In this section we perform sparse Bayesian reconstructions of a large cluster extracted from the Bolshoi N-body simulation (Klypin et al., 2011; Lanusse et al., 2016), upon which we construct and assess the performance of Bayesian locations for each of the four largest sub-halos. In line with previous work of Price et al. (2018a) and in the related article of Lanusse et al. (2016) we refer to this extracted cluster as Bolshoi-3.
We grid the Bolshoi convergence field onto a discretized complex map of dimension so as to demonstrate that the sparse Bayesian approach can construct Bayesian estimators efficiently even when the dimension of the problem is of or larger – dimensions at which MCMC techniques can become highly computationally challenging.
First, we construct a complex discretized set of artificial shear measurements by,
where is the input Bolshoi-3 convergence map. We then contaminate these mock measurements with noise , which for simplicity we select to be i.i.d. Gaussian noise of zero mean and variance . The variance is selected such that the SNR of the noisy artificial shear maps can be varied, and is therefore set to be,
The MAP convergence field is recovered via the sparse Bayesian mass-mapping algorithm using DB10 wavelets (10-levels), and the Bayesian location for the set of 4 peaks is constructed. For a detailed discussion of how noise levels in dB translate to physical quantities such as galaxy number density see Price et al. (2018a).
5.2 Analysis and computational efficiency
To demonstrate this uncertainty quantification technique we construct confidence Bayesian locations for the 4 largest sub-halos in the Bolshoi-3 cluster, for input SNR in decibels (dB) of .
In Figures 2 and 3 it is apparent that, as expected, the positional uncertainty isocontour at confidence is smaller for less noisy data, growing in proportion to the noise. In our analysis 32 N-splitting directions (pointings) were used, though as can be seen in Figures 2 and 3 as few as 16 directions would easily have been sufficient given the smoothness of the isocontour.
Computationally, reconstruction of the MAP convergence field and computation of the Bayesian location for the complete set of peaks took minutes on a standard 2016 MacBook Air. Notably, this is a conservative (Pereyra, 2017) and tight (Price et al., 2018b) approximate Bayesian inference in an over -dimensional space on a personal laptop in minutes. Further to this, the sparse Bayesian algorithmic structure can be easily parallelizable and so this computational efficiency can be optimizerd further.
6 Aggregate uncertainty in Peak Counts
Building on the notion of an approximate HPD credible region presented in section 3.1 we now ask the question: given a reconstructed convergence field , and at a given SNR threshold , what is the maximum and minimum peak count at confidence.
In this article we choose to define a peak in by a pixel which is larger than the 8 pixels which surround it (Lin, 2016). A point of the peak statistic is computed as follows: A threshold is taken on , and the peak count (number of peaks which have intensity larger than ) is taken on the sub-set of pixels larger than the threshold.
Formally we define the excursion set as,
where is the complete set of recovered pixels. Define a further sub-set as the set of peaks in :
where represents the set of immediately surrounding pixels.
Note that this definition is not valid for pixels on the boundary of the field, and so these pixels are necessarily not considered. This not only excludes the outer edge of but also any pixels surrounding masked regions (of which there are typically many).
Conceptually, we would then like to know at a given threshold what is the maximum and minimum number of peaks which could exist such that the surrogate solution still belongs to the approximate HPD credible set .
Let be the upper bound on the number of peaks, and be the lower bound on the number of peaks, for a given threshold , at confidence. Further let be the number of peaks calculated from the MAP solution at threshold . Formally these quantities are given by,
where is the cardinality of the peak set of a convergence field .
It is not all obvious how to locate the extremum of optimization problems given in equations (38) and (39) as they are inherently non-linear, non-convex problems. We can, however, propose a logical iterative approach to calculate well motivated approximations to the upper and lower peak count limits and .
6.1 Approximate Bayesian Lower Bound on Peak Counts
It is perhaps conceptually more straightforward to minimize the cardinality of the peak set and so we will first describe this process.
To calculate an approximate bound on we begin by creating the initial peak set from the recovered convergence . The peak in with lowest magnitude is located. The shortest distance from the pixel location to a pixel such that (where is some magnitude at which it is assumed the peaks influence is sufficiently small) is computed in Euclidean space as – within this paper we simply set .
Let us define the region of interest to be a circular aperture centered on the peak pixel location with radius . Conceptually, this acts as a proxy for the area effected by a large over-density sourced at the location of the peak.
Now, define a down-scaling kernel which has the action of scaling the magnitude of the sub-set of pixels belonging to the region of interest onto . Application of the down-scaling operator returns a surrogate solution . Mathematically this is,
Application of to an isolated region conserves the local topology of the field – which is precisely what we want as it means we are making no assumptions about the halo profile around a peak. Removing a peak by application of creates a surrogate solution which is likely to minimize the increase in the objective function.
As such is a good strategy for excluding peaks from as it will likely maximize the number of peaks which can be removed from before the level-set threshold is saturated. Thus, it will likely be near decision-theoretically optimal at minimizing equation (39), which is precisely what we want.
A schematic of the iterative process proposed to find the Bayesian lower bound on the peak statistic can be seen in Figure 4. In words, the process is as follows. Within each iteration, the lowest intensity peak within the peak set is removed forming a new surrogate convergence field , the objective function is recalculated and if the objective function is below the approximate level-set threshold then the lowest peak within is now removed, so on and so forth until the objective function rises above , at which the iterations are terminated and the minimum number of peaks is recovered.
6.2 Approximate Bayesian Upper Bound on Peak Counts
Now we invert our perspective in order to approximate the maximum number of peaks which could be observed at a given threshold at confidence. Here we will be considering the non-linear maximization problem constructed in equation (38).
First, we introduce the notion of the inclusion set , conjugate to such that and ,
With this in mind, we can now cast the maximization problem into a minimization problem analogous to that used before.
We now wish to minimize the number of peaks that belong to the inclusion set which is by definition equivalent to maximizing the number of peaks which belong to the excursion set – which is precisely what we want.
i.e. the sub-set of peaks below a threshold .
In contrast to section 6.1 we now locate the largest peak in . Suppose that this peak is found at , we now construct a circular aperture about with radius as defined before. Let this circular aperture set of pixels be .
Now we define an up-scaling kernel which has action,
which is very slightly different to the down-scaling operator in the numerator of the second term. Here is an infinitesimal quantity added such that the re-scaled peak within falls infinitesimally above the threshold and is therefore counted as a peak. In practice we set to be and find that adjusting this quantity by has negligible effect on the recovered uncertainties.
Finally we return the tuple which is in the form minimum, most likely, maximum at confidence.
6.3 Limitations of Re-scaling
Suppose the SNR threshold is large enough such that during iterations in schematic of Figure 4 the cardinality of the excursion peak set . In this situation even though the approximate level-set threshold is not saturated, the algorithm is forced to stop as there are simply no more peaks to exclude (push down). At this point the strategy for removing peaks becomes locally ill-defined. Effectively this is a clipping artifact. To avoid this effect entirely, if at any point within the iterations at a given threshold, the lower bound at threshold is set to , i.e. we are infinitely uncertain by construction.
Analogously, consider the case when is small enough that during the iterations in schematic 5 the cardinality of the inclusion peak set . In this situation there are simply no more peaks to include (pull up). Again we remove this clipping effect by setting at threshold is set to .
Typically these clipping effects only occur for very small or very large thresholds, and so a wealth of information can be extracted from the intervening scales. Low thresholds clip the upper limit as the cardinality of the peak set drops to 0 quickly, but the objective function rises comparatively slowly, as this SNR range is statistically dominated by noise. High threshold clip the lower limit simply due to the inherently low count of peaks at high SNR thresholds.
Further to this, the decision-theory approach adopted here for locating the maximal and minimal values of the cumulative peak statistic is based on several assumption: removing lower peaks increases the objective function by less than larger peaks; the extent of a peak (dark matter over-density) is approximated by a circular aperture; and removal of a peak has little to no effect on locations in the image domain which are outside of this aperture. All three of these assumptions are very reasonable.
Although further computational optimizations are not an immediate concern since our approach is already highly computationally efficient, we acknowledge that this iterative approach for removing peaks can easily be formulated as a bisection style problem which is likely to drastically reduce the computation time further – particularly for low thresholds, as it mitigates the number of trivial noise peak removal recalculations which are done in the brute force approach presented above. In future, should computational efficiency become of primary interest this speed up will be considered.
7 Illustrative example of Peak Uncertainties
In this section we apply the sparse Bayesian mass-mapping pipe-line to high resolution convergence maps extracted from the Buzzard V-1.6 N-body111Obtained due to our affiliation with the LSST-DESC collaboration. simulation, upon which we construct the cumulative peak statistic (number of peaks above a threshold as a function of the threshold). Additionally, we recover the approximate Bayesian constraints on the peak count at each threshold, from which we infer the constraint so as to aid the reader in comparison to typical error-bars quoted in related literature.
7.1 Simulated Data-sets
The Buzzard V-1.6 N-body simulation convergence catalog (DeRose et al., 2018; Wechsler, 2018) is extracted by full ray-tracing with the origin located at the box corner – leading to a quarter sky coverage. For wide-fields the flat sky approximation breaks down (Wallis et al., 2017) and so this quarter sky coverage was reduced to smaller planar patches.
The complete quarter sky convergence catalog was projected into a coarse HEALPix222http://healpix.sourceforge.net/documentation.php(Gorski et al., 2005) pixelisation (). Inside of each pixel, we further tessellated the largest square region which we then project into a grid. These gridded convergence maps formed our ground truth, discretized convergence fields.
As HEALPix samples in such a way as to provide equal area pixels, and the Buzzard simulation galaxy density is fairly uniform, each extracted square region contained galaxies leading to galaxies per pixel.
Due to a comparatively low density of samples, Poisson noise is prevalent, and thus extracted planar regions were passed through a multi-scale Poisson denoising algorithm. This consisted of a forward Anscombe transform (in order to Gaussianise the Poisson noise), several TV-norm (total-variation) denoising optimizations of differing scale, followed by an inverse Anscombe transform (as in Price et al., 2018b; Lanusse et al., 2016). A more involved treatment could be applied, but this approach is sufficient to demonstrate our peak reconstructions.
7.2 Application to Buzzard V-1.6
We select at random one of many planar patches produced for the following application. Following the methodology presented in section 5.1 we generate an artificial shear catalog which we then contaminate with independent and identically distributed (i.i.d.) Gaussian noise such that the SNR of mock shear measurements is 30 dB – i.e. an idealized noise-level simply for illustrative purposes.
The MAP convergence estimator is recovered from these noisy mock shear measurements via our sparse Bayesian mass-mapping framework. From we then calculate which we then use as a measure of the noise-level in the reconstructed convergence field. Implementing the uncertainty quantification technique presented in section 6 we then construct the cumulative peak statistic for SNR thresholds at increments of with upper and lower approximate Bayesian confidence limits.
Figure 7 displays the recovered cumulative peak statistic in both a linear and logarithmic scale. Typically, similar figures in the literature will quote error-bars, and so for comparisons sake we convert the Bayesian confidence limits into the confidence limits which are comparable to constraints ( in Figure 7 we provide both confidence limits for clarity).
Complete reconstruction of the peak statistics for 24 threshold bins, each with approximate Bayesian upper and lower bounds, for a resolution convergence map, with DB11 (11-level) wavelets, took hours on a 2016 MacBook Air. This is a non-trivial Bayesian inference in over dimensions, and so hours is a very reasonable computation time – though further speedups are possible, e.g. we can trivially parallelize the calculations for each threshold leading to an increase in computational efficiency by a factor of the number of thresholds taken (in our case ).
Additionally, the computational bottleneck is for lower thresholds as many low-intensity peaks must be removed, and thus an adaptive scheme could be implemented as discussed previously to avoid unnecessary sampling of these thresholds. With the aforementioned speed-ups, computation of the complete peak statistic is likely to take on a personal laptop.
Following this initial analysis we reduce the SNR to investigate the effect of increased noise on shear measurements to the cumulative peak statistics within our Bayesian framework. We first decrease the SNR to 25 dB, seen in Figure 8. Following which, we then reduce the input SNR futher to 20 dB, the corresponding results being plotting in Figure 9. This higher noise level of 20 dB is still a very optimistic (somewhat unrealistic) estimate of what upcoming surveys may reach; however in this paper we are primarily focused on demonstrating the methodology and leave detailed realistic simulations and forecasting for future work. A detailed description of how these noise levels in dB translate into observation contraints (e.g. galaxy number density e.t.c.) can be found in (Price et al., 2018a).
7.3 Analysis of Peak statistic
Figures 7, 8 and 9 clearly show that as the noise level in the discretized complex shear field increases the isocontours of the cumulative peak statistic at and loosen noticeably. Therefore this, unsurprisingly, indicates that cleaner measurements are likely to give tighter constraints on cosmological parameters – though it should be noted that increasing the number of data-points (i.e. pixels) would have a similar effect to reducing the noise level per pixel.
For an SNR of 20 dB (Figure 9) the first feature of note is the shaded blue region which indicates that for high thresholds the lower bound on the number of peaks at confidence is consistent (and clipped) at 0 – this is saying that at confidence the true number of peaks at a threshold in the blue shaded region could be 0. Note that in the blue region the Bayesian upper bound is still entirely valid, it is only the Bayesian lower bound which within our novel approach is no longer well defined.
Clearly the upper and lower bounds on the peak count statistic is dependent on the threshold one is considering and the total area over which observations are made – for wide-field surveys, more data is collected which is likely to reduce the variance of the statistic. In a general sense we summarize the mean (over all considered thresholds ) order of magnitude percentage spread on the peak statistic for the considered SNR thresholds below.
At input SNR of 20 dB, for thresholds on a single planar patch the upper and lower bounds exist and are of at confidence and of at .
At input SNR of 25 dB, for thresholds on a single planar patch the upper and lower bounds exist and are of at confidence and of at .
At input SNR of 30 dB, for thresholds on a single planar patch the upper and lower bounds exist and are of at confidence and of at .
These illustrative examples imply that for the Bayesian peak statistic to tightly constrain the cumulative peak statistic comparitively larger and or cleaner data-sets may be required – or, of course, a more informative prior (though this must be well justified). However, to reduce the shot noise introduced via intrinsic ellipticities more galaxies must be observed within a given pixel.
One way to increase this is to simply increase the observed number density of galaxy observations, however to do so one must observe galaxies at lower magnitude (for a fixed redshift), which inherently leads to more bright distant galaxies being detected which results in galaxy blending. Hence, increasing the number density significantly above gals/arcmin is typically quite difficult in practice.
Alternatively, the pixelisation can be adjusted to ensure that the mean galaxy count per pixel is above a given threshold – though for weak lensing the majority of non-Gaussian information is stored at fine-scales, which require small pixels, and so using larger pixels to reduce the noise level is sub-optimal for information extraction.
Within the definition of the up and down-scaling kernels (see sections 6.1 and 6.2) we define a circular aperture around a selected peak which we define to be the extent of the peak. These regions are roughly equivalent to super-pixel regions as described in Cai et al. (2017a). In previous work it was shown (Price et al., 2018b) that for local credible intervals (c.f. pixel level error bars) the typical error in the approximate HPD credible region is of , and is conservative – note that the quoted mean RMSE error is split approximately equally between the upper and lower bounds, therefore this roughly corresponds to an mean error of on both. Therefore the bounds drawn on the peak static here are likely to be less tight than the true Bayesian bounds – which could be formed if one were to reconstruct the dimensional posterior via MCMC.
In this paper (particularly the second half) we are primarily concerned with demonstrating how one may recover principled uncertainties on aggregate statistics of the convergence map – such as, but not limited to, the peak statistics. Hence we do not provide detailed analysis of how these Bayesian uncertainties may effect cosmological constraints derived from such statistics – this is saved for future work. However it is worth mentioning that one could either; leverage these uncertainties to define the data covariance in a Bayesian manner (as opposed to MC which is fast but may not necessarily be fully principled, or MCMC which is times slower than our MAP approach) before then running a standard likelihood analysis ; or perform a grid search in parameter space using these uncertainties again as the data covariance. Correctly accounting the uncertainties introduced during mass-mapping has been shown to be an important consideration for the future prospects of statistics such as this (Lin & Kilbinger, 2018).
Using the sparse Bayesian mass-mapping framework previously developed (Price et al., 2018a, b) we have presented two novel Bayesian uncertainty quantification techniques which can be performed directly on weak lensing convergence maps.
The first of these techniques recovers the uncertainty in the location of a feature of interest within a reconstructed convergence map – e.g. a large peak – at some well defined confidence. We call this locational uncertainty the ‘Bayesian location’.
Additionally, for computational efficiency we develop a novel sampling scheme of the position isocontour of a given feature which we call ‘N-splitting circular bisection’. We find that sampling the position isocontour in this way could be many orders of magnitude faster in high dimensions than typical inverse nesting approaches.
To evaluate this technique, we perform sparse Bayesian reconstructions of convergence maps extracted from Bolshoi N-body simulation datasets upon which we compute the Bayesian location of the four largest sub-halos for a range of noise-levels.
The second of theses techniques quantifies the uncertainty in the cumulative peak statistic of a recovered convergence map. With this technique we can for the first time provide principled Bayesian lower and upper bounds on the number of observed peaks at a given signal to noise threshold, for a single observation, at well defined confidence.
We extract convergence maps from the Buzzard V-1.6 N-body simulation, upon which we calculate the cumulative peak statistic with Bayesian upper and lower bounds at for a range of input noise-levels. We also provide the confidence bounds which we infer from the bounds to aid comparison to typical bootstrapping (MC) approaches.
For upcoming wide-field surveys convergence reconstruction will likely be done natively on the sphere (a single collective sample) to avoid projection effects, making bootstrapping approaches difficult and at worst infeasible due to the fact that they are only asymptotically exact.
Bayesian approaches require only a single set of observations to make exact inferences, and so extend trivially to the more complex spherical setting. Moreover the novel uncertainty quantification techniques presented in this paper and those presented previously in Price et al. (2018a, b); Cai et al. (2017a) can be rapidly computed and support algorithmic structure which can be highly parallelized, making them the ideal tools for principled analysis of convergence maps.
Author contributions are summarised as follows. MAP: conceptualisation, methodology, data curation, investigation, software, visualisation, writing - original draft; JDM: conceptualisation, methodology, project administration, supervision, writing - review & editing; XC: methodology, investigation, writing - review & editing; TDK: methodology, supervision, writing - review & editing.
This paper has undergone internal review in the LSST Dark Energy Science Collaboration. The internal reviewers were Chihway Chang, Tim Eifler, and François Lanusse. MAP is supported by the Science and Technology Facilities Council (STFC). TDK is supported by a Royal Society University Research Fellowship (URF). This work was also supported by the Engineering and Physical Sciences Research Council (EPSRC) through grant EP/M0110891 and by the Leverhulme Trust. The DESC acknowledges ongoing support from the Institut National de Physique Nucléaire et de Physique des Particules in France; the Science & Technology Facilities Council in the United Kingdom; and the Department of Energy, the National Science Foundation, and the LSST Corporation in the United States. DESC uses resources of the IN2P3 Computing Center (CC-IN2P3–Lyon/Villeurbanne - France) funded by the Centre National de la Recherche Scientifique; the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231; STFC DiRAC HPC Facilities, funded by UK BIS National E-infrastructure capital grants; and the UK particle physics grid, supported by the GridPP Collaboration. This work was performed in part under DOE Contract DE-AC02-76SF00515.
- Alsing et al. (2016) Alsing J., Heavens A., Jaffe A. H., Kiessling A., Wandelt B., Hoffmann T., 2016, MNRAS, 455, 4452
- Bartelmann & Schneider (2001) Bartelmann M., Schneider P., 2001, Phys. Rep., 340, 291
- Bunn et al. (2003) Bunn E. F., Zaldarriaga M., Tegmark M., de Oliveira-Costa A., 2003, Phys. Rev. D, 67, 023501
- Cai et al. (2017b) Cai X., Pereyra M., McEwen J. D., MNRAS in press, 2017b, preprint, (arXiv:1711.04818)
- Cai et al. (2017a) Cai X., Pereyra M., McEwen J. D., MNRAS in press, 2017a, preprint, (arXiv:1711.04819)
- Chang et al. (2018) Chang C., et al., 2018, MNRAS, 475, 3165
- Clowe et al. (2006) Clowe D., Brada M., Gonzalez A. H., Markevitch M., Randall S. W., Jones C., Zaritsky D., 2006, ApJ, 648, L109
- Coles & Chiang (2000) Coles P., Chiang L.-Y., 2000, Nature, 406, 376
- Combettes & Pesquet (2009) Combettes P. L., Pesquet J.-C., 2009, preprint, (arXiv:0912.3522)
- Corless et al. (2009) Corless V. L., King L. J., Clowe D., 2009, MNRAS, 393, 1235
- DeRose et al. (2018) DeRose J., Wechsler R., Rykoff E., 2018, in prep
- Durmus et al. (2018) Durmus A., Moulines E., Pereyra M., 2018, SIAM Journal on Imaging Sciences, 11, 473
- Fluri et al. (2018) Fluri J., Kacprzak T., Sgier R., Réfrégier A., Amara A., 2018, preprint, (arXiv:1803.08461)
- Gorski et al. (2005) Gorski K. M., Hivon E., Banday A., Wandelt B. D., Hansen F. K., Reinecke M., Bartelmann M., 2005, The Astrophysical Journal, 622, 759
- Grimm & Yoo (2018) Grimm N., Yoo J., 2018, preprint, (arXiv:1806.00017)
- Heavens (2009) Heavens A., 2009, Nuclear Physics B Proceedings Supplements, 194, 76
- Horowitz et al. (2018) Horowitz B., Seljak U., Aslanyan G., 2018, arXiv preprint arXiv:1810.00503
- Jee et al. (2016) Jee M. J., Dawson W. A., Stroe A., Wittman D., van Weeren R. J., Braggen M., Brada M., Rattgering H., 2016, The Astrophysical Journal, 817, 179
- Jeffrey et al. (2018) Jeffrey N., et al., 2018, preprint, (arXiv:1801.08945)
- Kaiser & Squires (1993) Kaiser N., Squires G., 1993, ApJ, 404, 441
- Kilbinger (2015) Kilbinger M., 2015, Reports on Progress in Physics, 78, 086901
- Klypin et al. (2011) Klypin A. A., Trujillo-Gomez S., Primack J., 2011, ApJ, 740, 102
- Lanusse et al. (2016) Lanusse F., Starck J.-L., Leonard A., Pires S., 2016, A&A, 591, A2
- Lin (2016) Lin C.-A., 2016, preprint, (arXiv:1612.04041)
- Lin & Kilbinger (2018) Lin C.-A., Kilbinger M., 2018, A&A, 614, A36
- Mediavilla et al. (2016) Mediavilla E., Muñoz J. A., Garzón F., Mahoney T. J., 2016, Astrophysical Applications of Gravitational Lensing
- Munshi & Coles (2017) Munshi D., Coles P., 2017, J. Cosmology Astropart. Phys., 2, 010
- Munshi et al. (2008) Munshi D., Valageas P., van Waerbeke L., Heavens A., 2008, Phys. Rep., 462, 67
- Padmanabhan et al. (2003) Padmanabhan N., Seljak U., Pen U. L., 2003, New Astron., 8, 581
- Peel et al. (2017a) Peel A., Lin C.-A., Lanusse F., Leonard A., Starck J.-L., Kilbinger M., 2017a, A&A, 599, A79
- Peel et al. (2017b) Peel A., Lanusse F., Starck J.-L., 2017b, ApJ, 847, 23
- Peel et al. (2018) Peel A., Pettorino V., Giocoli C., Starck J.-L., Baldi M., 2018, preprint, (arXiv:1805.05146)
- Pereyra (2013) Pereyra M., 2013, preprint, (arXiv:1306.0187)
- Pereyra (2017) Pereyra M., 2017, SIAM Journal on Imaging Sciences, 10, 285
- Pereyra et al. (2015) Pereyra M., Bioucas-Dias J., Figueiredo M., 2015, Maximum-a-posteriori estimation with unknown regularisation parameters. pp 230–234, doi:10.1109/EUSIPCO.2015.7362379
- Pires et al. (2010) Pires S., Starck J.-L., Amara A., Réfrégier A., Teyssier R., 2010, in Alimi J.-M., Fuözfa A., eds, American Institute of Physics Conference Series Vol. 1241, American Institute of Physics Conference Series. pp 1118–1127 (arXiv:0904.2995), doi:10.1063/1.3462608
- Planck Collaboration et al. (2018) Planck Collaboration et al., 2018, preprint, (arXiv:1807.06209)
- Price et al. (2018a) Price M. A., Cai X., McEwen J. D., Kitching T. D., Wallis C. G. R., 2018a, submitted to MNRAS
- Price et al. (2018b) Price M. A., Cai X., McEwen J. D., Pereyra M., Kitching T. D., 2018b, submitted to MNRAS
- Robert (2001) Robert C.-P., 2001, The Bayesian Choice, doi:https://doi.org/10.1007/0-387-71599-1.
- Schneider (2005) Schneider P., 2005, ArXiv Astrophysics e-prints,
- Schneider et al. (2015) Schneider M. D., Hogg D. W., Marshall P. J., Dawson W. A., Meyers J., Bard D. J., Lang D., 2015, ApJ, 807, 87
- Taylor et al. (2018) Taylor P. L., Kitching T. D., McEwen J. D., Tram T., 2018, preprint, (arXiv:1804.03668)
- Trotta (2017) Trotta R., 2017, preprint, (arXiv:1701.01467)
- VanderPlas et al. (2011) VanderPlas J. T., Connolly A. J., Jain B., Jarvis M., 2011, ApJ, 727, 118
- Wallis et al. (2017) Wallis C. G. R., McEwen J. D., Kitching T. D., Leistedt B., Plouviez A., 2017, preprint, (arXiv:1703.09233)
- Wechsler (2018) Wechsler R., 2018, in prep
Appendix A N-splitting Circular Bisection Details
In this appendix we consider the N-splitting Circular Bisection (N-splitting) algorithm for iteratively sampling the Bayesian confidence isocontour of the position of a feature in a reconstructed convergence map – or the Bayesian Location at confidence.
As in the text, we begin by defining the number of directions to sample from which we then form the angular increment . Starting from a vector oriented along the positive -axis define the pointing to be the vector,
and where is rotation by angle clockwise on 2D Euclidean space – a irreducible representation of which is the standard clockwise rotation matrix,
Now we know the direction along which we wish to sample we construct the bisection problem which is
where is a surrogate convergence map with the feature of interest inserted into perturbed location and is sub-set of the real domain which lie on the directional line centered at the original peak location with unit vector i.e.
A pictoral representation of how the problem is set up is provided in Figure 10.
For bisection we must first make an initial guess which we define to be square root of the number of pixels contained within the mask, as this is a typical measure of the length of a masked region. This choice is particularly logical as, if a feature of interest can be removed entirely from its masked location without saturating the level-set threshold then it by definition must be inconclusive, i.e. the data is insufficient evidence to say that the peak is physical.
To optimize the convergence of this algorithm further (for high sampling rates, low angular increments ) we also propagate information between pointing’s. For bisection problems associated with pointing the initial guess is now set to be twice the previous optimal value . This increases the computational efficiency by , in most cases.
Propagating information in this way relies on the assumption that the isocontour we are searching for is somewhat smooth and continuous, which is the case for most convergence reconstructions. If there is uncertainty as to the smoothness of the isocontour it is recommended that information is not propagated and the number of pointings is increased to correctly map the isocontour structure.
a.1 Convergence Properties
Standard inverse nesting algorithms iteratively sample the entire sub-space of the reconstructed domain bounded by the isocontour at confidence, making them inefficient when one is only interested in the boundary.
Consider the case where the isocontour of a reconstructed convergence map is a circular region of radius . Here inverse nesting will have to sample a square region out to , which is to say the total number of samples will at least be , where 1 is removed for the central location.
For our N-splitting algorithm we define pointings, and assume that the isocontour is relatively smooth. As the first bisection problem makes a large first guess it typically takes iterations to converge with a single pixel accuracy. The subsequent bisection problems converge within iterations. Therefore the total number of calculations is conservatively,
which is essentially independent from . There is in fact a small inverse dependence which is incorporated in the number of iterations needed for convergence, though this dependence is found to be small.
Comparing the computational efficiency of the two algorithms where,
Typically, we find an angular separation between pointings of (i.e. 16 pointings) is sufficient to accurately recover the isocontour. Additionally, the circular radius is typically pixels which indicates that,
i.e. N-splitting circular bisection on dimensional reconstructions is times faster than inverse nesting.
However, in the future we will be interested in recovering high dimensional convergence maps. In this setting the number of iterations for N-splitting to converge is assumed to change by 1-2, and the number of pointings to faithfully recover the isocontour will be increase by a factor of . Additionally, the radius of the circle increases by a factor of 4. Thus,
i.e. the conservative increase in computational efficiency of N-splitting over inverse nesting for becomes a factor of .
Further optimizations are possible, such as trivially parallelizing the bisection problems of each pointing. Doing so removes the scaling with the number of pointings, but now information about starting positions cannot be propagated.