Metacalibration: Direct SelfCalibration of Biases in Shear Measurement
Abstract
One of the primary limiting sources of systematic uncertainty in forthcoming weak lensing measurements is systematic uncertainty in the quantitative relationship between the distortions due to gravitational lensing and the measurable properties of galaxy images. We present a statistically principled, general solution to this problem. Our technique infers multiplicative shear calibration parameters by modifying the actual survey data to simulate the effects of a known shear. It can be applied to any shear estimation method based on weighted averages of galaxy shape measurements, which includes all methods used to date for shear estimation with real data. Use of the real images mitigates uncertainty due to unknown galaxy morphology, which is a serious concern for calibration of shear estimates based on image simulations. We test our results on simulated images from the GREAT3 challenge, and show that the method eliminates calibration biases for several different shape measurement techniques at the level of precision measurable with the GREAT3 simulations (a few tenths of a percent).
Subject headings:
cosmology: observations — gravitational lensing: weak — methods: observational1. Introduction
Accurate measurement of weak gravitational lensing offers the most direct probe of the dark sector of the universe (e.g., Bartelmann & Schneider 2001; Refregier 2003; Schneider 2006; Hoekstra & Jain 2008; Massey et al. 2010; Weinberg et al. 2013). Weak lensing measurements are thus a core part of the international cosmology program, and a key science driver for several widefield astronomical imaging cameras and associated surveys – the Kilo Degree Survey^{1}^{1}1http://kids.strw.leidenuniv.nl (de Jong et al. 2015), the Dark Energy Survey (Flaugher 2005), the Hyper SuprimeCam and associated survey^{2}^{2}2http://hsc.mtk.nao.ac.jp/ssp/ (Miyazaki et al. 2012), LSST^{3}^{3}3http://www.lsst.org/lsst/ (LSST Science Collaborations & LSST Project 2009), Euclid^{4}^{4}4http://sci.esa.int/euclid/http://www.euclidec.org (Laureijs et al. 2011), and WFIRST^{5}^{5}5http://wfirst.gsfc.nasa.gov (Spergel et al. 2015).
Despite this investment, the weak lensing community has more work to do in order to ensure that the algorithms for inferring shear are unbiased at the required levels to avoid systematic errors from dominating over the statistical errors. One of the largest such systematic error sources is the shear calibration bias, the quantitative relationship between the true gravitational lensing shear and its observables as estimated from the ensemble of galaxies in the survey.
In the weak shear limit that is most relevant for widefield cosmology, the gravitational lensing signal can be described as a linear transformation between the lensed and unlensed image coordinates, parameterized by two shears and a convergence
(1) 
. The major observable effect of weak lensing is to perturb the measured ellipticities of galaxies. At large separations, these shapes no know preferred direction, so the mean should vanish over a wide enough field. Weak lensing studies exploit this intrinsic symmetry, and search for spatially coherent anisotropies in the ensemble of observed galaxy shapes arising from lensing distortions produced by foreground matter.
The effects of the shear and convergence on this observable cannot be straightforwardly distinguished, so the fundamental quantity constrained by lensing is the reduced shear
(2) 
The responses of individual galaxy images to vary depending on the choice of ellipticity measure and the intrinsic shape and orientation of each galaxy. Lensing studies rely on ensemble averages of galaxy ellipticities, and the shears are weak enough that these ensemble averages usually respond linearly to an applied (reduced) shear, so it is conventional to define the multiplicative shear calibration and additive bias parameters as
(3) 
where and are ensembleaveraged shears and ellipticity measures, respectively. Generally, is a result of measurement biases (such as an incomplete correction for the pointspread function) that introduce a preferred direction in the image plane. It can in principle be known or removed with sufficient knowledge of the experiment. depends in part on the ensemble of (unobserved) galaxy properties, so it is impossible in principle to know exactly a priori (though Bernstein & Armstrong 2014 show how to derive this information for their proposed shear estimator, which does not make use of an ensemble average over ellipticities, from deeper calibration fields).
In practice, a nonlinear response generically introduced by the algorithms used for measurement of can introduce both multiplicative and additive biases in a manner that interacts with the unknown true ensemble properties of galaxies (Massey et al. 2007b; Zhang & Komatsu 2011), and are very difficult to predict from first principles. For this reason, the weak lensing community has organized a series of blind measurement challenges, where participants attempted to extract an unkown lensing signal from simulated images. The earliest of these were the first two Shear TEsting Programmes (Heymans et al. 2006; Massey et al. 2007a, STEP1, STEP2). The results made two things clear: that lensing measurement algorithms needed to improve in order to avoid being systematicsdominated, and that shear measurement was sufficiently complex that successive simulation challenges should focus on a subset of the issues.
The next round of simulation challenges – GREAT08, GREAT10, and GREAT3 (Bridle et al. 2009; Kitching et al. 2013; Mandelbaum et al. 2015) – embraced a narrower focus and saw significant performance improvements. They also drove improvements in our understanding of various sources of bias in shear estimation, which is of use in future algorithmic development. The bestperforming algorithms from the most recent challenge, GREAT3, reduced and to levels approaching those needed for the most ambitious planned lensing measurements, albeit with simulations that did not include all the features of real data.
While this was certainly good news, the narrowed focus of the GREAT challenges necessarily left some of the most important sources of lensing calibration bias untouched. Remaining issues of significant concern include biases resulting from:

object detection and selection

deblending

wavelengthdependent effects

instrumental defects and nonlinearities

stargalaxy separation

nonwhite pixel noise

cosmic rays and other image artifacts

redshiftdependent calibration biases

shear estimation for lowresolution and/or low signaltonoise ratio () galaxies
The impact of these factors depends strongly on the specifics of the experiment. For this reason, shear calibration in current and future experiments relies heavily on simulations designed to match the properties of each experiment (Hildebrandt et al. 2016; Jarvis et al. 2016). Such external simulations are always limited in their realism: accurately modeling everything relevant about the experiment turns out to be extremely difficult. Showing that a given simulation suite is adequate for calibrating a lensing measurement is a formidable challenge in its own right (c.f. the Ultra Fast Image Generation simulations described in Bergé et al. 2013, or the calibration simulations used for the KiDS weak lensing cosmology in Fenech Conti et al. 2016).
The method outlined in this paper is motivated by the observation that introducing a synthetic shear signal into real data is much easier than building a realistic comprehensive firstprinciples simulation suite. While in practice the need for accurate simulations of the ensemble of galaxy images is sometimes met by relying on images from external deep fields like the Hubble Space Telescope’s COSMOS survey (Koekemoer et al. 2007; Scoville et al. 2007b, a), the deeper fields needed for calibration of future surveys like LSST and WFIRST may not be available in the volume necessary.
Perturbing the actual data automatically incorporates features present in real images (e.g., image artifacts, selection biases, unusual highredshift galaxy morphologies) that are otherwise difficult to accurately simulate. It enables the determination of how the real galaxy population in the data responds to a shear directly and empirically.
We have implemented this concept, which we call metacalibration, using the public GalSim (Rowe et al. 2015) image simulation package, and designed our algorithm to wrap an arbitrary external shear estimation module, provided that it functions by estimation of pergalaxy ellipticities and then estimates the ensemble shear through weighted averages. We test our technique on simulated GREAT3 image data, and find that it successfully calibrates several older shear estimation methods to a level of accuracy comparable to the bestperforming algorithms from the GREAT3 challenge. We also demonstrate that our algorithm can detrend additive biases resulting from incomplete pointspread function (PSF) corrections by introducing synthetic PSF ellipticity. We make our metacalibration scripts available for general use.
2. Method
There are three layers to the shear calibration method we propose here. The first is the generation of the modified images using a procedure similar to one proposed in Kaiser (2000). We use the GalSim package (Rowe et al. 2015) to modify real astronomical images by adding synthetic shear and PSF distortions of known amplitude. These modified images are counterfactuals; they are a model for what would have been observed under (nearly) the same image quality conditions, on the same galaxies, with a different shear. If the measurement process is repeated on the counterfactual images, the result gives an accurate estimate of the response of the galaxy population to a shear.
The second layer is the choice of ellipticity measure used to estimate perobject shapes. This step is the primary focus of most studies that address shear calibration biases. Here we are agnostic about the choice of measurement algorithm; as long as the algorithm is sufficiently wellbehaved (in a manner that we will describe in Sec. 2.2), the image manipulation step can be used to generate an accurate shear responsivity.
The final layer is the choice of averaging mechanism to estimate the response of the ensemble shear estimate to an applied shear. Noise properties of shape measurements can vary widely depending on the shape measurement method, which entails similar variation in the metacalibration estimates for shear responsivity. For the cases we describe below, an optimal strategy for ensemble averaging produces significant gains over more straightforward averaging schemes.
2.1. Generating a Counterfactual Image
Fortunately, for the weak shears under consideration in most cosmological survey applications, the relationship between the shear and the galaxy shapes (or related observables) is very close to linear, so accurate shear calibration requires only the first derivative of the galaxy properties with respect to the shear. What follows is a method for estimating this derivative directly from the images. Throughout we will assume that the observed image is equal to the unsmeared galaxy image convolved with some pointspread function (including the atmospheric seeing, the optical PSF, and the pixel response function) .
In an ideal world, we would calibrate our measurement algorithm by making measurements while varying the gravitational shear experienced by the preseeing image, constructing the counterfactual image :
(4) 
where is the shear operator that produces the reduced shear , as in e.g. Bernstein & Jarvis (2002). The shear sensitivity of the image would then be a straightforward numerical derivative of with respect to , and the shear sensitivity of an ellipticity measure can be calculated from measurements on multiple counterfactual images. We can even write down a formal procedure for producing from if we know :
(5) 
The convolutions become products in Fourier space, where we can write
(6) 
Noise in the original image generally has power at Fourier modes where is small or vanishing. The power in these modes will thus be formally large or infinite. Because of the shear operation, this power is not subsequently cancelled by multiplication by . We must choose a new PSF for the final convolution step to suppress this deconvolutionamplified noise.
If is monotonically decreasing with , this condition can be achieved without introducing additional PSF anisotropy by choosing
(7) 
This does not always work, however. If crosses zero (as in cases with a strongly undersampled PSF) the ratio of and the sheared, deconvolved image will still be formally large or infinite, as power from values beyond the zero crossing will be dragged by the shear operation into the region where the dilated PSF does not vanish.
Other, implementationspecific considerations may be important when choosing . When choosing a target PSF, it may prove convenient to design one which is wellsuited to the shear estimator in hand. We defer exploration of this topic to future work.
Our chosen procedure for producing a sheared counterfactual image is
(8) 
This procedure clearly requires a good model for , but so do all shear measurements. PSF model errors enter at the same order in measurements on the resulting image that they would in an unmodified image.
Once the counterfactual image with has been created, the galaxy detection and shear measurement pipeline should be rerun. This provides a measure of the shear sensitivity – not for the original image, but for an image with the PSF . This requires that the full measurement – not just the sensitivity analysis – be run on an additional counterfactual image , so that the numerical derivative is welldefined.
This procedure introduces anisotropic correlated noise, which can produce a systematic multiplicative shear bias. If the noise properties of the initial image are known, the noise anistropy can be removed with the addition of further anisotropic correlated noise (with power spectrum carefully chosen). As we describe below, we have not found noise isotropization to be a necessary step for the images that we used for testing. These have an effective limit of , and the mode of the distribution is . Concurrent work (Sheldon & Huff 2017) investigates the effects of the anisotropic correlated noise at lower signaltonoise ratios, and describe effective mitigation procedures.
Metacalibration can be used to mitigate other systematics as well. Even those measurement methods with the highest scores in the GREAT3 lensing challenge were unable to completely remove the effects of PSF ellipticity on the inferred shear. We can introduce an artifical PSF anisotropy by replace with a PSF containing the desired synthetic distortion. We show below that reconstructing images with added PSF ellipticity, rather than added shear, allows us to detrend some of the bias due to PSF anisotropy. A similar approach could be used to measure additive or multiplicative calibration biases arising from any effect – signal or systematic error – that can be simulated by perturbing the images as above.
2.2. Shape Measurement Algorithms
Accurate ensemble shears can only be derived through measurement of the counterfactual images described above if the shape measurement algorithm is sufficiently wellbehaved. Here, that entails the requirement that the quantity reported by the shape measurement algorithm be sufficiently linear in the underlying shear in the regime relevant for the measurement that the ensemble response is truly linear.
We test a variety of shape algorithms below that make use of differing definitions of ellipticity. As we are attempting to construct a shear calibration procedure that is agnostic about the choice of perobject shape measurement algorithm, and which only requires that we use a measured galaxy property with approximately linear sensitivity to shear (called a shape measure), we will use below to signify all of the shape measures discussed in this paper, regardless of their precise definition.
2.3. Ensemble Shear Inference
Counterfactual images can in principle be used to derive a perobject shear response for a modified version of the original image. However, the quantities of interest in Eq. 3 are ensemble responses. Hence it is necessary to run the shear inference step on the counterfactual images, in order to get the shear responsivity of the ensemble shear estimate.
The distribution of measured shear responses determines the nature of the ensemble inference procedure. The distribution of MetaCal responses, especially for shape measures that involve ratios of noisy quantities, can make simple averaging schemes problematic; powerlaw tails result in a very high variance and slow convergence of the mean response. We develop and implement a simple technique below which deals adequately with the large noise in the regaussianized moment responses. If the measurement algorithm is nonlinear, however, then our averaging scheme will require some additional correction, beyond what we develop here, for the resulting nonlinear ensemble response.
2.4. Algorithmic Limitations
The fundamental assumptions of the image processing steps are frequently violated in real data. The image manipulation step assumes that the image is linearly related to the true surface brightness on the sky. This is not valid for common image processing artifacts like cosmic rays, for saturated pixels, or when chargedeflection effects produce a fluxdependent PSF (e.g., Gruen et al. 2015).
The presumption of a single shear, with a single response factor, can also be problematic. The lensing signal varies with redshift along a single line of sight, so blended images of multiple galaxies at appreciably different redshifts involve at least two different shears, and the relationship of the metacalibrated response to the underlying shear field is not straightforward. Nevertheless, these issues are generic, and will if unaddressed will cause problems for any shear inference method; we require our images to be reliable representations of the sky, and that the shear field be in some sense singlevalued where it can be measured.
Linearity in the ensemble inference is also vital if the image processing algorithm only computes first derivatives of the image with respect to the shear, as in the implementation we describe here. More finite difference steps could in princple be used to calibrate a nonlinear shear response, but this significantly increases the noise and the computation cost of the method. Here we will perform enough finite difference steps to allow for an estimate of the linearity of each estimator.
3. Implementation
3.1. Image Modification
We use GalSim^{6}^{6}6https://github.com/GalSimdevelopers/GalSim to manipulate the images and to generate simulations for validation. For each galaxy, we create nine modified images: two for each of the two shear components, two for each of the two PSF ellipticity components, and one for the final measurement using the enlarged PSF (twosided derivatives with respect to shear and PSF ellipticity were found to be less noisy than onesided derivatives). We run the provided shape measurement pipeline on each of these images, and the results are used to construct a set of finite difference estimates of shear calibration and additive PSF biases.
This sort of image manipulation is trivial to carry out using GalSim; we rely on the rigorous testing of the image convolution, interpolation, and resampling algorithms that the development team performed to enable the GREAT3 shear testing simulations. From the perspective of numerical validation, the tests in section 9 of Rowe et al. (2015) illustrate that GalSim can accurately render sheared images of quite complex galaxy and PSF light profiles with its default interpolants and settings that control numerical accuracy.
For each galaxy and PSF postage stamp, we first create an InterpolatedImage object. This object is deconvolved by the PSF model (including the pixel response). For the shear finite differences, we apply a small shear (typically 1%) to the resulting deconvolved image. The original PSF is dilated by twice the shear distortion to produce , and then reconvolved with the sheared deconvolved image. This reconvolved, sheared image is then passed to the shape measurement routine, along with the image representation of the new, enlarged PSF . For the PSF sensitivity, we follow a similar procedure, but shear the dilated PSF image rather than the deconvolved galaxy image. Finally, we create a reconvolved image with no added shear but with the PSF , on which we perform the final shape measurement.
3.2. Shear Estimation Algorithms
Since the metacalibration method can in principle be used to calibrate shears from any shear estimation algorithm derived using an average of perobject shapes, we chose three easily available shear estimation methods, all of which are implemented in GalSim. Two of these methods are traditional shear estimation methods that have somewhat different assumptions but are both based on object moments. One method is not a standard shear estimation method at all: we use linear combinations of the directly observed second moments without any correction for the PSF. In principle, the information about how those respond to shear should be determined by metacalibration to correctly infer the shear. The difference in this case is that instead of providing a small correction to the outputs of a PSF correction method, we rely on metacalibration to do the entirety of the PSF correction, which is a very stringent test that may at least partially violate some of the assumptions about the ensemble average of the measured quantities having a linear response to shear. The three methods are described below.
3.2.1 Regaussianization
ReGaussianization (Hirata & Seljak 2003) is a PSF correction method based on the use of the moments of the image and of the PSF to correct for the effects of the PSF on the galaxy shapes. It includes corrections for the nonGaussianity of the galaxy profile (Bernstein & Jarvis 2002; Hirata & Seljak 2003) and of the PSF (to first order in the PSF nonGaussianity). The performance of this algorithm has been extensively studied in real data and simulations (e.g., Mandelbaum et al. 2005, 2012, 2013, 2015).
The outputs of the reGaussianization algorithm are PSFcorrected “distortions”, which for an object with purely elliptical isophotes with minortomajor axis ratio and position angle with respect to the axis in pixel coordinates are defined as
(9) 
As discussed in Bernstein & Jarvis (2002), the response of a distribution of galaxies with some intrinsic distribution of distortions to a shear depends on the itself. Conceptually, we can think of an ensemble shear estimator using reGaussianization outputs as
(10) 
where the denominator gives the response of the ensemble average distortion to a shear (often called the responsivity). Estimators of this shear responsivity use the observed galaxy and its moments, and for typical , the denominator is around – in terms of the percomponent RMS distortion. As this implementation was meant to be a simple and fast example, its intrinsic calibration correction is a simple one that does not include all known systematics.
3.2.2 Ksb
The KSB method (Kaiser et al. 1995) parametrises galaxies and stars according to their weighted quadrupole moments. The main assumption of the KSB method is that the PSF can be described as a small but highly anisotropic distortion convolved with a large circularly symmetric function. With that assumption, the shear can be recovered to firstorder from the observed ellipticity of each galaxy via
(11) 
where asterisks indicate quantities that should be measured from the PSF model at that galaxy position, is the smear polarisability (see Heymans et al. 2006 for definitions) and is the correction to the shear polarisability that includes the smearing with the isotropic component of the PSF. The ellipticities are constructed from weighted quadrupole moments, and the other quantities involve higher order moments. A circular Gaussian weight of scale length is used, where is galaxy size, as determined by the second moment of the surfacebrightness profile.
The KSB method returns a perobject estimate of the shears . We can use metacalibration to remove multiplicative and additive biases that come from averaging the perobject KSB shear estimates.
3.2.3 Linear Moments
As mentioned previously, the third method we use does not involve PSFcorrected galaxy shapes. Instead, we use linear combinations of the second moments of galaxy images. The motivation behind this choice is as follows. One way to estimate the distortion is via combinations of the second moments of the light profile,
(12) 
for ,
(13) 
for , and finally
(14) 
One source of noise (and noise bias) in traditional momentsbased methods is the division of two noisy quantities in Eq. 14, typically followed by further division by other noisy quantities to remove the dilution of the galaxy shape by the PSF. Thus, as a final example of a statistic that we will attempt to use as a calibrated shear estimator with metacalibration, we define the following linear combinations of moments:
(15) 
Clearly these moments are sensitive to a number of nuisance quantities, like the galaxy flux and size, and the PSF size and shape. In principle, metacalibration should be able to nonetheless determine the response of this statistic to shear, , and produce a reliable shear estimate from the ensembleaveraged values, provided that the linear model for the signal and dominant sources of systematic error is correct. This is a quite stringent test of the metacalibration method, as it is unclear whether that purely linear model will be valid in this case.
3.3. PerObject Responsivity
Shape measurements on the set of modified images can be used to derive noisy shear and PSF responsivities for individual galaxies. In the case where the measured ellipticity is thought to depend linearly on the lensing shear and the PSF ellipticity, the shapes measured from the sheared, reconvolved images with positive and negative applied shears, and , can be used for a straightforward finitedifference estimate of the perobject shear response
(16)  
(17) 
Additive biases introduced by the shape measurement can be extracted from the sum of these two quantities
(18) 
where the final on the righthand side is the ellipticity measured using the reconvolved PSF , not that measured from the unmodified original image.
If the shape measurement algorithm does not perfectly remove PSF ellipticity, then the shapes measured from shearing the PSF ( and ) permit calculation of at least the linearorder residual PSF ellipticity biases:
(19) 
The result of this process is a catalog of perobject shear and PSF responsivities every galaxy. A histogram of these quantities for one of the measurement algorithms examined in this paper is shown in figure 1.
Accurate calibration depends on characterising the ensemble response, however, not the perobject responses. We argue that the powerlaw wings of the distribution make estimation of the ensemble response by simple averaging problematic, and describe next a general scheme for regularizing the ensemble response estimation. These wings are a consequence of the fact that the regaussianization shapes are themselves ratios of noisy quantities; other methods with more compact support may not require this sort of regularization scheme.
3.4. Ensemble Responsivity
We test the metacalibration procedure on two different shear estimation methods – regaussianization and KSB – each of which is known to have calibration biases that depend on the , galaxy size and morphology, and so on. For both methods, we use the entire ensemble of measured shapes to build a model of the unlensed shape distribution, . There is no guarantee that the average shear (or PSF ellipticity) over the ensemble is actually small enough that the effective mean ellipticity is zero, however. Before proceeding with the inference, we subtract from each galaxy the expected contribution from the PSF for its field, . We then symmetrize the resulting distribution by averaging it with its reflection about (note that the resulting model for the null shape distribution will be somewhat broader than the true distribution of ellipticities at zero shear). The newly symmetrized model unlensed distribution is . We perform the same PSF subtraction from the measured catalog prior to performing inference in the procedure described below.
If the measured shape distribution is linear in the shear, then we can write
(20) 
It will be convenient to discretize this distribution into a histogram. If the probability of a galaxy ending up in the shape histogram bin is , and each galaxy’s shape can be considered an independent draw from some underlying distribution, then the likelihood function for an observed histogram is the multinomial likelihood
(21) 
where is the total number of samples in the histogram. The covariance matrix for the bin amplitudes of this histogram is
(22) 
To make the notation for what follows less cumbersome, let the normalized histogram be , and its first derivative with respect to the shear be .
Given a measured shape histogram with some unknown shear and a fiducial, unlensed shape histogram, the (componentwise) minimumvariance estimator for is
(23) 
with variance
(24) 
This method for shear inference has as its tunable parameter only the histogram binning scheme. Once we have chosen a suitable scheme, we then bin the prior into equalnumber bins and calculate its shear derivative by adding a small shear using the perobject responses, then rebinning. This provides the inputs for the perfield shear estimation. To carry out the perfield shear estimation, we calculate a shape histogram with these bins for each separate field, and evaluate equations 23 and 24 using the globallyderived .
3.5. Validating the Shear Inference
If we have a poor model for the unlensed histogram, , then the results for shear inference will be biased. Once we have derived a mean shear for each field, we can quantify the degree to which the shape histogram resembles our model of the unlensed prior distribution.
To do this, we subtract from each galaxy in each field the product of its individual responsivity and the estimated shear (the PSF contribution having already been removed, as above). We then define the distance between each field and the unlensed prior using equation 21, taking the probabilities from the unlensed model and the histogram amplitudes from the current field, after correcting for the estimated shear. If the shear response measured for the unlensed prior is correct, then the performance of the estimator will depend only on the similarity of the model to the measurement field. The likelihood can then be used as an objective criterion for the quality of the inference.
In practice, we only apply a cut on the likelihood in cases where we have an a priori reason to believe that the prior constructed as we describe above will not be an adequate description of all of the simulation fields. In our tests below, we apply this cut in simulation branches where there is large expected variation in the PSF properties. In these cases, we exclude from our analysis those fields with the worst loglikelihood cuts.
We can also devise a simple test for the linearity (in the shear) of the ensemble estimator. The estimator we use to infer the shear response relies on twosided finitedifference numerical derivatives, but it is also possible to get a noisier measure the nonlinear ensemble response by testing for consistency between the forward and backwards finitedifference estimates for the ensemble response. We quantify this with the parameter , defined as
(25) 
where the and subscripts refer to the forward and backwards finite difference histogram derivative estimates.
3.6. Relationship to previous implementations
As shown in the GREAT3 results paper (Mandelbaum et al. 2015), an early version of metacalibration was used in the GREAT3 challenge. That implementation differs from the one presented here and released publicly in association with this paper in two important ways. First, the model for systematics was simpler than the one presented below; it neglected additive systematics entirely, focusing exclusively on multiplicative systematics. Second, the method for inferring shears from an ensemble of objects was entirely different. These differences are of sufficient importance that the GREAT3 results (especially the ones for additive systematics) are not relevant to the implementation described here.
4. Testing Framework
We test the performance of our algorithm on simulated image sets. Our baseline simulations are drawn from the GREAT3 simulation suite. We run additional simulations in order to distinguish between potential biases arising separately from the three steps in our inference framework.
4.1. Simulated Images
We use the GREAT3 simulation framework as the source of simulated images that we use for testing purposes. For more detail about that simulation framework, see the GREAT3 handbook (Mandelbaum et al. 2014) and results paper (Mandelbaum et al. 2015), or the publicly available software^{7}^{7}7https://github.com/barnabytprowe/great3public.
In brief, we use simulated “branches” containing 200 “subfields”. Each subfield contains galaxies placed on a grid; the galaxies in a given subfield all have the same (unknown) shear and the same known PSF. The galaxy population within a subfield follows a distribution of signaltonoise ratio, size, ellipticity, and morphology based on that in the Hubble Space Telescope (HST) COSMOS survey (Koekemoer et al. 2007; Scoville et al. 2007b, a), roughly approximating a galaxy sample with a depth of . To ensure that most methods will be able to measure all galaxies, an effective signaltonoise cut^{8}^{8}8This was initially advertised as a cut at 20, however the GREAT3 results paper shows that for a more realistic signaltonoise estimator, the effective cut is around 12. of and a minimal resolution cut was imposed (resulting in different sets of galaxies in subfields that have differentsized PSFs). 90 rotated pairs of galaxies were included, to cancel out shape noise (Massey et al. 2007a). The PSF in the simulations comes from the combination of an optics model from a groundbased telescope, along with a Kolmogorov PSF with a typical ellipticity variance. Thus, the galaxy and PSF properties are nontrivially complicated. The noise is stationary Gaussian noise. The ultimate goal is to estimate the average shear in each subfield in an unbiased way, without any multiplicative bias or correlations with the persubfield PSF ellipticity.
We consider two sets of galaxy populations. One comes directly from HST images, and includes a process to remove the HST PSF before shearing (both operations being carried out in Fourier space) and convolving with the final target PSF (Mandelbaum et al. 2012). The other galaxy population consists of simple parametric representations of those HST images. These populations have the same effective distributions of size, ellipticity, and so on, but one includes realistic galaxy morphology while the other only includes such realism as can be captured by the sum of two Sérsic profiles. In the language of GREAT3, we use simulations corresponding to controlgroundconstant (describing the parametric galaxy sample, groundbased simulated data, with a constant persubfield shear) and realgalaxygroundconstant (the realistic galaxy sample), denoted CGC and RGC.
4.2. Simulation Branches
The simplest of our simulation tests was performed on a newlygenerated set of simulations that is closely analogous to the GREAT3 CGC branch (parametric galaxy profiles), but with one modification to avoid a problem raised in the results paper (Mandelbaum et al. 2015). There, it was noted that the CGC branch has a small fraction of outlier fields related to unusually large optical PSF aberrations, specifically defocus and trefoil. Thus, our first simulated dataset is designed exactly like CGC but with all aberrations in the optical PSF set to zero, to ensure reasonable consistency of data quality. Note that the atmospheric PSF is still drawn from a distribution of seeing values for each subfield. For this branch, it is not necessary to use a likelihood cut to remove fields with aberrant PSF behavior in defining the null ellipticity distribution, and so we include all branches in our analysis. In the accompanying figures and table, this branch is denoted by “CGCnoaberregauss.”
The next branch we analyze is similar to the previous one, but with realistic galaxies. This lets us separate the effects of realistic galaxy morphology from the problems inherent in correcting a complex PSF. Just as with the previous branch, is not necessary to use a likelihood cut to remove fields with aberrant PSF behavior in defining the null ellipticity distribution, and so we include all the generated fields in our analysis. The results from this branch are labelled “RGCnoaberregauss.”
We use the GREAT3 CGC simulation branch as a baseline, and report the performance of metacalibration on this branch for all three of our chosen shape measurement algorithms; our results here are denoted in the accompanying figures and tables with the labels “CGCregauss”, “CGCmoments”, and “CGCksb”, as appropriate. Tests on this branch allow straightforward comparison between the performance of our chosen procedure and the variety of other algorithms tested in the GREAT3 challenge.
We next report results for the GREAT3 RGC branch (“RGCregauss”), incorporating more realistic galaxy morphologies along with the aberrant PSFs introduced in the CGC branch.
One issue of concern is how to understand outlier fields. In GREAT3, there was a concern that some outliers were due to failure of our model for interpreting the perobject shapes in fields that had large aberrations. As a way to understand this, we generated a version of RGC that had quite large aberrations that were identical in each field: specifically defocus of waves and one component of trefoil of wave. An example of a PSF in one subfield is shown in Fig. 7; they do not all look identical, since the atmospheric component was still allowed to vary stochastically according to a distribution of seeing FWHM. This removes the difficulties in building a null ellipticity distribution, isolating the impact of a complex PSF. The results from this simulation are labelled “RGCFixedAberregauss”.
Finally, we investigate the effects of increasing the noise in the images. We rerun the “CGCregauss” analysis five times, increasing the noise in the initial image each time. The correlated noise produced by the image modification procedure may affect the derived calibration, and we expect this number of realizations to demonstrate whether correlated noise biases is consistent with the expected signaltonoise scaling.
Simulation branch/algorithm pairs shown in order from top to bottom are RGCregauss, CGCKSB, and CGCmoments. The combination of the metacalibration algorithm with our maximumlikelihood averaging procedure makes accurate corrections when the PSF ellipticities are small or comparable to the magnitude of the shear signal. It is clear that a large fraction of the trend remaining after correction is driven by remaining unmasked highPSFellipticity outlier fields. While these were not rejected by our likelihood criterion, they would typically not pass the image quality requirements in a realistic experiment.
5. Results
Metacalibration reduces the shear calibration biases in every branch that we have tested in, for all methods, and nearly in all cases to a level that is consistent with zero within the errors. In the one branch (CGCMoments) where it does not eliminate multiplicative biases for both shear components within our ability to measure them, it reduces by a factor of eighty. Our objective criterion for the quality of perfield mean shear inference generally succeeds in flagging problem fields, and (as is visible in figure 2), the metacalibrated shears tend to be less noisy than the unmetacalibrated shears.
Our PSF detrending algorithm is less successful, probably for reasons we discuss in section 6. Even here, most fields see a significant reduction in the impact of the PSF anisotropy on the inferred shears.
The remainder of section draws conclusions about the performance of the metacalibration algorithm by comparing the results across simulation branches. We discuss the impact of realistic galaxy morphologies, correlated noise, a heterogeneous PSF, and shape measurement algorithms.
The additive and multiplicative calibration biases for each simulation branch, with and without metacalibration, are reported in full in Table 1. Before/after trends for the multiplicative calibration biases and the trends with PSF ellipticity are shown in figures 2 and 3, respectively. Visualizations of the overall impact of metacalibration and PSF ellipticity detrending on the ensembleaverage quantities in each branch are provided in Figures 5 and 6.
5.1. Galaxy Morphology
We can compare the performance of the metacalibration algorithm on model and realistic galaxy morphologies in two cases (RGCregauss vs. CGCregauss, and RGCnoaberregauss vs. CGCnoaberregauss). In neither case does the introduction of realistic morphology have any impact on the multiplicative shear calibration biases: metacalibrating both pairs of branches results in multiplicative biases that are consistent with zero. The most notable difference between the model and real morphologies is apparent in the PSF trends. Our perturbative PSF detrending scheme reduces the residual additive bias in most cases, but it does not completely correct the model morphologies, whereas there is no evidence of any residual PSF effects in any of the realistic morphology branches. This should not surprise us: we have chosen to perturb and detrend the linear effects of PSF ellipticity, but other PSF properties may be more significant and hence our model is not a complete description of additive systematics. The coupling between PSF morphology and inferred shear will in general depend on the details of both, as well as the galaxy morphology and measurement algorithm. We defer further development of methods to select appropriate control variables for PSF detrending to future work.
5.2. Correlated Noise
The image reconvolution procedure described in section 2.1 also modifies the noise field in the original image: the isotropic white noise typical of real images will be transformed in the counterfactual images into a correlated noise field with preferred direction and scale.
This effect was first documented in the context of shear measurement in the work described in Sheldon & Huff (2017) in simulations conducted at much lower signaltonoise ratios than those of the Great3 challenge. A fuller exploration of the impact of correlated noise on metacalibration, along with a comprehensive mitigation strategy, is presented in that work.
We investigate the impact of this noise by successively adding noise to the CGC images and rerunning the metacalibration and shear estimation algorithms. In the process we gradually increase the noise relative to its initial level by up to a factor of .
The additive biases resulting from varying the noise level are shown in Figure 4. For the typical signaltonoise ratios seen in the GREAT3 simulations, biases resulting from correlated noise appear to scale roughly as the variance of the initial noise field. We note that these effects do not become significant until the overall signaltonoise ratio has been reduced by a factor of two compared to the fiducial GREAT3 values.
5.3. Heterogeneous PSF
Large variations in the PSF properties can impact our measurement algorithm via two channels: first, a heterogeneous PSF can result in individual fields deviating from the zeroshear distribution constructed from the ensemble of measured shapes, potentially biasing the histogram estimator; and second (as discussed above) via the usual mechanism of incomplete PSF correction and detrending.
Eliminating outlier fields in those branches with large variations in the PSF via the rejection mechanism described in section 3.5 tends to significantly improve the calibration bias after metacalibration. The rejected fields (red squares in Figure 3) tend to have substantially larger residual shear calibration biases than the mean field in each branch, and without the outlier rejection, we see fewpercent level calibration biases in each of these branches. With the likelihoodbased rejection mechanism in place, the multiplicative and additive biases in the CGC and RGCregauss branches are consistent with those in the CGCnoaber and RGCnoaber branches. It should be noted that we arrived at the cut by choosing a level that typically eliminated outliers from the compact core of the distribution of perfield likelihoods. There are certainly more objective ways to choose this cut – one can eliminate fields that are much lower than the values generated by sampling from equation (21), for instance – but we defer calibration of the outlierrejection technique to future work.
Finally, we compare the performance of metacalibration in the RGCnoaber, and RGCFixedAber branches. These branches have a relatively homogenous PSF, differing only in the complexity of the PSF morphology (see Figure 7). We see only marginal evidence () for residual calibration biases in this case (though this is the only significant bias seen among the six regauss fields we report on, so a single detection is to be expected even in the absence of any real calibration biases). It is also notable that the RGCFixedAber branch sees no significant trend between PSF anisotropy and inferred shear.
Given these results, it would seem that the primary channel through which a heterogeneous PSF impacts our shear inference is via the first channel; the ensemble response of those fields with outlier PSF properties is sufficiently different from that estimated from the global null ellipticity distribution that identification of these fields is essential.
5.4. Shape Measurement Algorithms
We investigate the effects of the shape measurement algorithm by comparing the performance of the regauss, KSB, and linear moments measurement algorithms on the CGC simulation branch. We see large improvements in performance for each method after using metacalibration. There is weak evidence for residual KSB calibration biases, and strong evidence for residual biases in the linear moments. In the latter case, however, the improvement relative to the unmetacalibrated case is large: the multiplicative and additive biases are reduced by factors of and , respectively, resulting in performance for metacalibrated linear moments that is quantitatively superior to that of the unmetacalibrated KSB. This is remarkable given that simple linear moments involve no PSF correction, leaving the entirety of the PSF correction process to metacalibration.
One potential reason for the difference in performance for different shape measurement methods is the linearity of the ensemble estimates in the resulting shear. We find values for of , , and , for the regauss, KSB, and linear moments algorithms, respectively (where the errors are bootstrap estimates). The quadratic correction to the shear response is much larger for KSB than for the other methods, suggesting that any problems with the KSB shapes may arise from its more nonlinear shear response (see Viola et al. 2011 for a detailed investigation of this issue).
Another potential driver of residual additive systematics is incomplete PSF detrending. The reasons for this are discussed in the previous section. The linear moments see significant PSF residuals, and Figure 3 suggests that these are primarily driven by the fields with large PSF ellipticities.
The values and the residual additive and multiplicative biases for these methods suggest that the small residual calibration bias in KSB is driven by the nonlinear response of that shape measurement algorithm, while the residual linear moments calibration bias is primarily driven by imperfect PSF correction.
6. Applicability to real data
Several implementation issues need to be solved before this method can be deployed on real survey data. We have made no attempt to deal with the effects of masked pixels or blending, and while it seems clear that our proposed algorithm has the potential to deal with selection biases, we have not demonstrated that capability here. We have also demonstrated the presence of a calibration bias which scales with the variance of the pixel noise, which may be a result of the correlations imposed on the noise field during the construction of the counterfactual image.
In this work, we have attempted to remove PSF systematics by measuring the response of the shape measure to PSF ellipticity. There is no guarantee, however, that the PSF ellipticity is the correct parameter to use for this detrending. In a realistic measurement, it would be best to first determine which modes of PSF variation are most likely to impact the chosen shape measure, and then use the image modification and detrending technique described here to remove those effects in the data.
Finally, our shear inference procedure is designed to extract the mean shear from a constantshear field. While this procedure may be applicable to galaxygalaxy lensing (i.e., projecting the shapes onto the tangent to the appropriate lens), it is not suitable for measurements like cosmic shear that typically rely on second or higher moments of the shear field. While a similar histogram estimation procedure could be implemented to model the responsivity of the distribution of ellipticity products (as would be needed for twopoint shear correlation functions), we leave design and implementation of this generalization to future work.
At the time of this writing, metacalibration is being actively adapted to realistic measurements in the Dark Energy Survey. Concurrent work (Sheldon & Huff 2017) will demonstrate algorithmic improvements that allow this technique to be used on Dark Energy Survey data with stateoftheart shear estimation algorithms.
7. Conclusions
We have proposed and implemented the first method for selfcalibration of shear measurements that does not rely on deeper calibration fields or simulations. Our method can be wrapped around any sufficiently wellbehaved shape measurement algorithm. We use GREAT3 and related simulations to demonstrate that metacalibration reduces or eliminates shear calibration biases across a variety of galaxy morphologies, PSF properties, and for several otherwise biased shape measurement algorithms. We have argued that our technique works because it takes advantage of the fundamental linearity of astronomical images in the weak lensing shear signal, in combination with the fact that the effects of shear on an image with a known PSF are modelindependent.
Those cases we have examined where the initial biases are large or not linear are not completely corrected by our linear detrending scheme, though in every case we have studied the algorithm appears to substantially improve biases resulting from faulty additive PSF correction and multiplicative shear calibration biases. Even the nearly informationfree linear moments algorithm appears to be calibrated by our procedure to a level superior to uncorrected KSB, a widely used traditional shear estimation algorithm.
We expect future work to extend this method to deal with the complexities inherent in real data.
Acknowledgments
We are grateful to Erin Sheldon and to Mike Jarvis for discussion, advice, and for checking the results reported here against their own independent image simulation work.
We also thank Klaus Honscheid and Peter Melchior for useful discussions and advice, and Neha Bagchi for proofreading and editing support.
RM acknowledges the support of the Department of Energy Early Career Award program.
EMH was supported for part of the work on this proposal by the US Department of Energyâs Office of High Energy Physics (DEAC0205CH11231), and from the Center for Cosmology and AstroParticle Physics at the Ohio State University.
Part of the research was carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
References
 Bartelmann & Schneider (2001) Bartelmann, M. & Schneider, P. 2001, Phys. Rep., 340, 291
 Bergé et al. (2013) Bergé, J., Gamper, L., Réfrégier, A., & Amara, A. 2013, Astronomy and Computing, 1, 23
 Bernstein & Armstrong (2014) Bernstein, G. M. & Armstrong, R. 2014, MNRAS, 438, 1880
 Bernstein & Jarvis (2002) Bernstein, G. M. & Jarvis, M. 2002, AJ, 123, 583
 Bridle et al. (2009) Bridle, S., ShaweTaylor, J., Amara, A., Applegate, D., Balan, S. T., Berge, J., Bernstein, G., Dahle, H., Erben, T., & et al. 2009, Annals of Applied Statistics, 3, 6
 de Jong et al. (2015) de Jong, J. T. A., Verdoes Kleijn, G. A., Boxhoorn, D. R., Buddelmeijer, H., Capaccioli, M., Getman, F., Grado, A., Helmich, E., Huang, Z., Irisarri, N., Kuijken, K., La Barbera, F., McFarland, J. P., Napolitano, N. R., Radovich, M., Sikkema, G., Valentijn, E. A., Begeman, K. G., Brescia, M., Cavuoti, S., Choi, A., Cordes, O.M., Covone, G., Dall’Ora, M., Hildebrandt, H., Longo, G., Nakajima, R., Paolillo, M., Puddu, E., Rifatto, A., Tortora, C., van Uitert, E., Buddendiek, A., HarnoisDéraps, J., Erben, T., Eriksen, M. B., Heymans, C., Hoekstra, H., Joachimi, B., Kitching, T. D., Klaes, D., Koopmans, L. V. E., Köhlinger, F., Roy, N., Sifón, C., Schneider, P., Sutherland, W. J., Viola, M., & Vriend, W.J. 2015, A&A, 582, A62
 Fenech Conti et al. (2016) Fenech Conti, I., Herbonnet, R., Hoekstra, H., Merten, J., Miller, L., & Viola, M. 2016, ArXiv eprints
 Flaugher (2005) Flaugher, B. 2005, International Journal of Modern Physics A, 20, 3121
 Gruen et al. (2015) Gruen, D., Bernstein, G. M., Jarvis, M., Rowe, B., Vikram, V., Plazas, A. A., & Seitz, S. 2015, Journal of Instrumentation, 10, C05032
 Heymans et al. (2006) Heymans, C., Van Waerbeke, L., Bacon, D., Berge, J., Bernstein, G., Bertin, E., Bridle, S., & et al. 2006, MNRAS, 368, 1323
 Hildebrandt et al. (2016) Hildebrandt, H., Viola, M., Heymans, C., Joudaki, S., Kuijken, K., Blake, C., Erben, T., Joachimi, B., Klaes, D., Miller, L., Morrison, C. B., Nakajima, R., Verdoes Kleijn, G., Amon, A., Choi, A., Covone, G., de Jong, J. T. A., Dvornik, A., Fenech Conti, I., Grado, A., HarnoisDéraps, J., Herbonnet, R., Hoekstra, H., Köhlinger, F., McFarland, J., Mead, A., Merten, J., Napolitano, N., Peacock, J. A., Radovich, M., Schneider, P., Simon, P., Valentijn, E. A., van den Busch, J. L., van Uitert, E., & Van Waerbeke, L. 2016, ArXiv eprints
 Hirata & Seljak (2003) Hirata, C. & Seljak, U. 2003, MNRAS, 343, 459
 Hoekstra & Jain (2008) Hoekstra, H. & Jain, B. 2008, Annual Review of Nuclear and Particle Science, 58, 99
 Jarvis et al. (2016) Jarvis, M., Sheldon, E., Zuntz, J., Kacprzak, T., Bridle, S. L., Amara, A., Armstrong, R., Becker, M. R., Bernstein, G. M., Bonnett, C., Chang, C., Das, R., Dietrich, J. P., DrlicaWagner, A., Eifler, T. F., Gangkofner, C., Gruen, D., Hirsch, M., Huff, E. M., Jain, B., Kent, S., Kirk, D., MacCrann, N., Melchior, P., Plazas, A. A., Refregier, A., Rowe, B., Rykoff, E. S., Samuroff, S., Sánchez, C., Suchyta, E., Troxel, M. A., Vikram, V., Abbott, T., Abdalla, F. B., Allam, S., Annis, J., BenoitLévy, A., Bertin, E., Brooks, D., BuckleyGeer, E., Burke, D. L., Capozzi, D., Rosell, A. C., Kind, M. C., Carretero, J., Castander, F. J., Clampitt, J., Crocce, M., Cunha, C. E., D’Andrea, C. B., da Costa, L. N., DePoy, D. L., Desai, S., Diehl, H. T., Doel, P., Neto, A. F., Flaugher, B., Fosalba, P., Frieman, J., Gaztanaga, E., Gerdes, D. W., Gruendl, R. A., Gutierrez, G., Honscheid, K., James, D. J., Kuehn, K., Kuropatkin, N., Lahav, O., Li, T. S., Lima, M., March, M., Martini, P., Miquel, R., Mohr, J. J., Neilsen, E., Nord, B., Ogando, R., Reil, K., Romer, A. K., Roodman, A., Sako, M., Sanchez, E., Scarpine, V., Schubnell, M., SevillaNoarbe, I., Smith, R. C., SoaresSantos, M., Sobreira, F., Swanson, M. E. C., Tarle, G., Thaler, J., Thomas, D., Walker, A. R., & Wechsler, R. H. 2016, MNRAS
 Kaiser (2000) Kaiser, N. 2000, ApJ, 537, 555
 Kaiser et al. (1995) Kaiser, N., Squires, G., & Broadhurst, T. 1995, ApJ, 449, 460
 Kitching et al. (2013) Kitching, T. D., Rowe, B., Gill, M., Heymans, C., Massey, R., Witherick, D., Courbin, F., Georgatzis, K., Gentile, M., Gruen, D., Kilbinger, M., Li, G. L., Mariglis, A. P., Meylan, G., Storkey, A., & Xin, B. 2013, ApJS, 205, 12
 Koekemoer et al. (2007) Koekemoer, A. M., Aussel, H., Calzetti, D., Capak, P., Giavalisco, M., Kneib, J., Leauthaud, A., Le Fèvre, O., & et al. 2007, ApJS, 172, 196
 Laureijs et al. (2011) Laureijs, R., Amiaux, J., Arduini, S., Auguères, J. ., Brinchmann, J., Cole, R., Cropper, M., Dabin, C., Duvet, L., Ealet, A., & et al. 2011, ArXiv eprints (1110.3193)
 LSST Science Collaborations & LSST Project (2009) LSST Science Collaborations & LSST Project. 2009, ArXiv eprints (0912.0201), http://www.lsst.org/lsst/scibook
 Mandelbaum et al. (2012) Mandelbaum, R., Hirata, C. M., Leauthaud, A., Massey, R. J., & Rhodes, J. 2012, MNRAS, 420, 1518
 Mandelbaum et al. (2005) Mandelbaum, R., Hirata, C. M., Seljak, U., Guzik, J., Padmanabhan, N., Blake, C., Blanton, M. R., Lupton, R., & Brinkmann, J. 2005, MNRAS, 361, 1287
 Mandelbaum et al. (2015) Mandelbaum, R., Rowe, B., Armstrong, R., Bard, D., Bertin, E., Bosch, J., Boutigny, D., Courbin, F., Dawson, W. A., Donnarumma, A., Fenech Conti, I., Gavazzi, R., Gentile, M., Gill, M. S. S., Hogg, D. W., Huff, E. M., Jee, M. J., Kacprzak, T., Kilbinger, M., Kuntzer, T., Lang, D., Luo, W., March, M. C., Marshall, P. J., Meyers, J. E., Miller, L., Miyatake, H., Nakajima, R., Ngolé Mboula, F. M., Nurbaeva, G., Okura, Y., PaulinHenriksson, S., Rhodes, J., Schneider, M. D., Shan, H., Sheldon, E. S., Simet, M., Starck, J.L., Sureau, F., Tewes, M., Zarb Adami, K., Zhang, J., & Zuntz, J. 2015, MNRAS, 450, 2963
 Mandelbaum et al. (2014) Mandelbaum, R., Rowe, B., Bosch, J., Chang, C., Courbin, F., Gill, M., Jarvis, M., Kannawadi, A., Kacprzak, T., Lackner, C., & et al. 2014, ApJS, 212, 5
 Mandelbaum et al. (2013) Mandelbaum, R., Slosar, A., Baldauf, T., Seljak, U., Hirata, C. M., Nakajima, R., Reyes, R., & Smith, R. E. 2013, MNRAS, 432, 1544
 Massey et al. (2007a) Massey, R., Heymans, C., Bergé, J., Bernstein, G., Bridle, S., Clowe, D., Dahle, H., Ellis, R., & et al. 2007a, MNRAS, 376, 13
 Massey et al. (2010) Massey, R., Kitching, T., & Richard, J. 2010, Reports on Progress in Physics, 73, 086901
 Massey et al. (2007b) Massey, R., Rowe, B., Refregier, A., Bacon, D. J., & Bergé, J. 2007b, MNRAS, 380, 229
 Miyazaki et al. (2012) Miyazaki, S., Komiyama, Y., Nakaya, H., Kamata, Y., Doi, Y., Hamana, T., Karoji, H., Furusawa, H., Kawanomoto, S., Morokuma, T., Ishizuka, Y., Nariai, K., Tanaka, Y., Uraguchi, F., Utsumi, Y., Obuchi, Y., Okura, Y., Oguri, M., Takata, T., Tomono, D., Kurakami, T., Namikawa, K., Usuda, T., Yamanoi, H., Terai, T., Uekiyo, H., Yamada, Y., Koike, M., Aihara, H., Fujimori, Y., Mineo, S., Miyatake, H., Yasuda, N., Nishizawa, J., Saito, T., Tanaka, M., Uchida, T., Katayama, N., Wang, S.Y., Chen, H.Y., Lupton, R., Loomis, C., Bickerton, S., Price, P., Gunn, J., Suzuki, H., Miyazaki, Y., Muramatsu, M., Yamamoto, K., Endo, M., Ezaki, Y., Itoh, N., Miwa, Y., Yokota, H., Matsuda, T., Ebinuma, R., & Takeshi, K. 2012, in Proc. SPIE, Vol. 8446, Groundbased and Airborne Instrumentation for Astronomy IV, 84460Z
 Refregier (2003) Refregier, A. 2003, ARA&A, 41, 645
 Rowe et al. (2015) Rowe, B. T. P., Jarvis, M., Mandelbaum, R., Bernstein, G. M., Bosch, J., Simet, M., Meyers, J. E., Kacprzak, T., Nakajima, R., Zuntz, J., Miyatake, H., Dietrich, J. P., Armstrong, R., Melchior, P., & Gill, M. S. S. 2015, Astronomy and Computing, 10, 121
 Schneider (2006) Schneider, P. 2006, in SaasFee Advanced Course 33: Gravitational Lensing: Strong, Weak and Micro, ed. G. Meylan, P. Jetzer, P. North, P. Schneider, C. S. Kochanek, & J. Wambsganss, 269–451
 Scoville et al. (2007a) Scoville, N., Abraham, R. G., Aussel, H., Barnes, J. E., Benson, A., Blain, A. W., Calzetti, D., Comastri, A., Capak, P., & et al. 2007a, ApJS, 172, 38
 Scoville et al. (2007b) Scoville, N., Aussel, H., Brusa, M., Capak, P., Carollo, C. M., Elvis, M., Giavalisco, M., Guzzo, L., Hasinger, G., & et al. 2007b, ApJS, 172, 1
 Sheldon & Huff (2017) Sheldon, E. S. & Huff, E. M. 2017, ApJ, submitted
 Spergel et al. (2015) Spergel, D., Gehrels, N., Baltay, C., Bennett, D., Breckinridge, J., Donahue, M., Dressler, A., Gaudi, B. S., Greene, T., Guyon, O., Hirata, C., Kalirai, J., Kasdin, N. J., Macintosh, B., Moos, W., Perlmutter, S., Postman, M., Rauscher, B., Rhodes, J., Wang, Y., Weinberg, D., Benford, D., Hudson, M., Jeong, W.S., Mellier, Y., Traub, W., Yamada, T., Capak, P., Colbert, J., Masters, D., Penny, M., Savransky, D., Stern, D., Zimmerman, N., Barry, R., Bartusek, L., Carpenter, K., Cheng, E., Content, D., Dekens, F., Demers, R., Grady, K., Jackson, C., Kuan, G., Kruk, J., Melton, M., Nemati, B., Parvin, B., Poberezhskiy, I., Peddie, C., Ruffa, J., Wallace, J. K., Whipple, A., Wollack, E., & Zhao, F. 2015, ArXiv eprints
 Viola et al. (2011) Viola, M., Melchior, P., & Bartelmann, M. 2011, MNRAS, 410, 2156
 Weinberg et al. (2013) Weinberg, D. H., Mortonson, M. J., Eisenstein, D. J., Hirata, C., Riess, A. G., & Rozo, E. 2013, Phys. Rep., 530, 87
 Zhang & Komatsu (2011) Zhang, J. & Komatsu, E. 2011, MNRAS, 414, 1047