Photo-z self-calibration

# Self-calibration of photometric redshift scatter in weak lensing surveys

Pengjie Zhang, Ue-Li Pen, Gary Bernstein
Key Laboratory for Research in Galaxies and Cosmology, Shanghai Astronomical Observatory, Nandan Road 80, Shanghai, 200030, China;
pjzhang@shao.ac.cn
Canadian Institute for Theoretical Astrophysics, University of Toronto, 60 St. George Street, Toronto, ON M5S 3H8, Canada
Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
###### Abstract

Photo-z errors, especially catastrophic errors, are a major uncertainty for precision weak lensing cosmology. We find that the shear-(galaxy number) density and density-density cross correlation measurements between photo-z bins, available from the same lensing surveys, contain valuable information for self-calibration of the scattering probabilities between the true-z and photo-z bins. The self-calibration technique we propose relies on neither cosmological priors nor a parameterization of the photo-z probability distribution function, and preserves all of the cosmological information available from shear-shear measurement. We estimate the calibration accuracy through the Fisher matrix formalism. We find that, for advanced lensing surveys such as the planned stage IV surveys, the rate of photo-z outliers can be determined with statistical uncertainties of - for galaxies. Among the several sources of calibration error that we identify and investigate, the galaxy distribution bias is likely the dominant systematic error, whereby photo-z outliers have different redshift distributions and/or bias than non-outliers from the same bin. This bias affects all photo-z calibration techniques based on correlation measurements. Galaxy bias variations of produce biases in photo-z outlier rates similar to the statistical errors of our method, so this galaxy distribution bias may bias the reconstructed scatters at several- level, but is unlikely to completely invalidate the self-calibration technique.

###### keywords:
(cosmology:) large-scale structure of Universe: gravitational lensing: theory: observations

## 1 Introduction

Weak gravitational lensing is emerging as one of the most powerful probes of dark matter, dark energy [1] and the nature of gravity at cosmological scales [18]. Less than a decade after the first detections [3, 22, 36, 38], the accuracy and dynamical range of lensing measurements have improved dramatically (e.g. Fu et al. [7]). Future weak lensing surveys have the potential to measure the lensing power spectrum with sub- statistical accuracy for many multipole bins. However, whether we can fully exploit this capability depends on how well we control various systematic errors. These can arise from uncertainties in theoretical modeling, including the non-linear evolution of the universe [8, 9, 10] and the influence of baryons [37, 40, 19, 30]. They can also arise from uncertainties in the lensing measurement itself; an incomplete list includes the galaxy intrinsic alignment [12, 24, 13, 27, 28], the influence of the telescope PSF [11, 25] and photometric redshift (photo-z) calibration errors [23, 6]. Precision lensing cosmology puts stringent requirements on calibrating these errors [15].

Weak lensing surveys are rich in physics and contain information beyond the cosmic shear power spectrum [5, 45]. This bonus allows for self-calibration of weak lensing systematic errors, such as the galaxy intrinsic alignment [45, 20]. In the present paper, we utilize the galaxy density-shear cross-correlation and density-density correlations in photometric survey data to self-calibrate the photo-z scatters between redshift bins. Namely, for a given photo-z bin, we want to figure out (reconstruct) the fraction of galaxies which are actually located in a distinct redshift bin. These scatters quantitatively describe the photo-z outliers or catastrophic errors. The effect of photo-z outliers on cosmological inference from the shear-shear power spectrum is discussed by Bernstein & Huterer [6], which also quantifies the task of calibrating these outliers by direct spectroscopic sampling of the galaxy population. Since spectroscopic sampling of faint galaxies at completeness is an expensive or infeasible task for current ground-based capabilities, Newman [26] proposes a technique based on cross-correlation between the photo-z sample and an incomplete spectro-z sample.

Here we ask a complementary question: how well can the outlier rate be determined using purely photometric data from the original lensing survey? As pointed out by Schneider et al. [32], the spurious cross correlation between the galaxy density in two photo-z bins can be exploited to calibrate photo-z errors. They found that, for a photo-z bin of the size in an LSST-like survey, level scatters can be identified and the mean redshift can be calibrated within the accuracy of . This result is impressive. Unfortunately, a factor of improvement is still required to meet the statistical accuracy of those ambitious “stage IV” projects (, Huterer et al. [15]). In combination with baryon acoustic oscillation and weak lensing measurements, the constraints can be significantly improved [42]. However, these procedures adopt a number of priors/parameterizations, which may bias the calibration.

A successful self-calibration should not correlate cosmological uncertainties with astrophysical uncertainties (in our case, photo-z errors). To meet this requirement, the self-calibration should adopt as few cosmological priors as possible, preferably none. At the same time, it should not result in loss of cosmological information. In this paper, we propose to combine the galaxy-galaxy clustering measurement and the shear-galaxy cross correlation measurement to perform the photo-z self-calibration, strictly reserving the shear-shear measurement for cosmology. Finally, it must be able to reach sufficiently high statistical accuracy and have controllable systematics, if any. As we will show, this self-calibration meets all three requirements and is able to detect scatters as low as .

There are a number of differences between our self-calibration method and the method proposed by Schneider et al. [32]. (1) The inclusion of the shear-galaxy cross correlation measurement breaks a severe degeneracy in the previous method and thus significantly improves the calibration accuracy. (2) We do not adopt any parameterization of the photo-z probability distribution function (PDF) and are thus free of possible bias induced by improper parameterizations. (3) This method is a true self-calibration, in the sense that the photo-z scatters are reconstructed solely from the given weak lensing survey; no external measurements or priors on cosmology and galaxy bias are needed. (Footnote 1: Of course, the calibration accuracy depends on the fiducial photo-z PDF or the actual photo-z PDF in the given survey.) (4) We argue that the shot noise is the only relevant noise term for the likelihood analysis. Sample variance and non-Gaussianity do not affect the reconstruction. This allows us to go deep into the nonlinear regime, gain many more independent modes for the reconstruction, and significantly improve the reconstruction accuracy.

Like most of these predecessor papers, we analyze an idealized survey: apparent density fluctuations induced by gravitational lensing are ignored; galaxy biasing is assumed to be common to all galaxies at a given redshift; and shear measurement errors are ignored. Incorporation of these and other effects can substantially degrade the correlation-based photo-z calibration methods [6]. A framework for comprehensive analysis of combined photometric and spectroscopic lensing survey data is presented in Bernstein [5], and is necessary to make a final judgment on the efficacy of photo-z self-calibration. However, this complicated analysis has not yet been applied to the problem of photo-z outlier calibration. The simpler analysis presented here will demonstrate that the galaxy-shear and galaxy-galaxy correlations contain sufficient information to measure the outlier rate to useful precision, and we will then examine the possibility of degradation by the non-ideal effects.

This paper is organized in the following way. In §2, we describe our self-calibration technique and target a fiducial “Stage IV” lensing survey for the error forecast. We discuss in §3 possible systematic errors which can be incorporated into our technique and will not bias the photo-z PDF reconstruction. Other systematic errors cannot be self-calibrated without strong priors or external information. For these, we quantify the induced bias in §4. We further discuss uncertainties in the error forecast due to uncertainties in the fiducial model and the robustness of our self-calibration technique (§5). We discuss possibilities to improve the calibration accuracy (§6). We also include two appendices (§A & B) for technical details of the Fisher matrix analysis and bias estimation.

## 2 Photo-z self-calibration

We first define several key notations used throughout the paper.

• The superscript “P” denotes the property in the photo-z bin.

• The superscript “R” denotes the corresponding property in the true-z bin.

• The capital “G” denotes gravitational lensing; more specifically, the lensing convergence converted from the more direct observable, cosmic shear.

• The lowercase “g” denotes galaxy number density (or over-density).

We split galaxies into $n$ photo-z bins. The $i$-th photo-z bin has the range $z_i\pm\Delta z_i/2$. Our notation is that larger $i$ means higher photo-z. $N^P_i$, the total number of galaxies in the $i$-th photo-z bin, is an observable. We also have $n$ true-z bins, with redshift ranges identical to those of the photo-z bins. We denote by $N_{i\to j}$ the total number of galaxies in the $j$-th photo-z bin which also belong to the $i$-th true-z bin, namely, whose true redshift falls within $z_i\pm\Delta z_i/2$. The process is similar to scatters (transitions) between different quantum states, so we often refer to it as photo-z scatter. The scatter rate is related to the photo-z probability distribution function, $p(z|z^P)$, by the following relation

$$N_{i\to j}=\int_{z_j-\Delta z_j/2}^{z_j+\Delta z_j/2} n(z^P)\,dz^P\left[\int_{z_i-\Delta z_i/2}^{z_i+\Delta z_i/2} p(z|z^P)\,dz\right]\ . \tag{1}$$

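Here, $n(z^P)$ is the number distribution of galaxies in photo-z space. We define $p_{i\to j}\equiv N_{i\to j}/N^P_j$, where $N^P_j$ is the total number of galaxies in the $j$-th photo-z bin; $p_{i\to j}$ represents $p(z|z^P)$ averaged over the relevant redshift ranges, i.e. the binned photo-z PDF. In the limit that $\Delta z_i,\Delta z_j\to 0$, $p_{i\to j}\to p(z_i|z^P_j)\,\Delta z_i$. Since $\sum_i p_{i\to j}=1$, with $n$ redshift bins we have $n(n-1)$ independent $p_{i\to j}$. (Footnote 2: In general, and are independent and .) These together represent a non-parametric description of the photo-z PDF and completely describe the scattering probabilities between redshift bins, namely the rate of photo-z outliers, or leakage rate. In Bernstein & Huterer [6], they are called “contamination” coefficients.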
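As a concrete illustration of this definition, the Python sketch below estimates the binned scattering matrix $p_{i\to j}=N_{i\to j}/N^P_j$ from a catalogue of (true redshift, photo-z) pairs, such as a simulated catalogue of the kind used for the fiducial model. It is a minimal sketch under stated assumptions: the bin edges are supplied by the user, galaxies whose photo-z falls outside the bins are dropped, and galaxies whose true redshift falls outside the bins are assigned to the nearest end bin (for the high-z end this mirrors the convention adopted later for galaxies beyond the highest bin). The function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def bin_index(z, edges):
    """Index i with edges[i] <= z < edges[i+1], or None if z lies outside all bins."""
    i = bisect_right(edges, z) - 1
    return i if 0 <= i < len(edges) - 1 else None


def scattering_matrix(z_true, z_photo, edges):
    """Return p[i][j] = N_{i->j} / N^P_j.

    Row i labels the true-z bin and column j the photo-z bin, so every
    non-empty column sums to one (sum_i p_{i->j} = 1).
    """
    n = len(edges) - 1
    n_ij = Counter()      # n_ij[(i, j)] = N_{i->j}
    n_photo = [0] * n     # n_photo[j] = N^P_j
    for zt, zp in zip(z_true, z_photo):
        j = bin_index(zp, edges)
        if j is None:
            continue  # photo-z outside the analysed range: galaxy discarded
        i = bin_index(zt, edges)
        if i is None:
            # Assumption: true redshifts outside the binned range are
            # assigned to the nearest end bin (highest bin for high z).
            i = n - 1 if zt >= edges[-1] else 0
        n_ij[(i, j)] += 1
        n_photo[j] += 1
    return [[n_ij[(i, j)] / n_photo[j] if n_photo[j] else 0.0
             for j in range(n)] for i in range(n)]


# Toy catalogue with two bins, [0, 0.5) and [0.5, 1.0).
z_true = [0.1, 0.2, 0.6, 0.7, 1.5, 0.15]
z_photo = [0.1, 0.6, 0.6, 0.2, 0.7, 0.15]
p = scattering_matrix(z_true, z_photo, [0.0, 0.5, 1.0])
# One third of each photo-z bin is scattered in from the other true-z bin.
```

Each column of `p` is one photo-z bin, so the normalization $\sum_i p_{i\to j}=1$ is built in, leaving $n(n-1)$ free entries.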

Scatters in photo-z, especially catastrophic photo-z errors, cause a number of spurious correlations between photo-z bins. Our proposal is to reconstruct all $p_{i\to j}$ from these cross correlations. For convenience, we will work with the corresponding cross power spectra throughout the paper, instead of the cross correlation functions.

We use fiducial power spectra, fiducial leakage and the survey specifications to perform the error forecast. To generate the fiducial power spectra, we adopt a flat ΛCDM cosmology with , , , and . The transfer function is obtained using CMBFAST [33]. The nonlinear matter power spectrum is calculated using the fitting formula of Smith et al. [35]. Unless otherwise specified, we adopt a fiducial galaxy bias . We are then able to calculate the galaxy power spectra and lensing-galaxy power spectra. The calculation of these power spectra is by no means precise, due to the simplification of . However, this simple bias model suffices for the purpose of this paper, namely, to demonstrate the feasibility of our self-calibration technique. We will further investigate the impact of scale-dependent bias in §5, where we find that our self-calibration technique remains applicable.

We also need the galaxy distribution and the input of fiducial $p_{i\to j}$. Due to uncertainties in survey specifications, we will not focus on any specific survey. Instead, we will target a fiducial lensing survey with some characteristics of “Stage IV” lensing surveys like LSST (Footnote 3: Large Synoptic Survey Telescope, http://www.lsst.org/lsst), Euclid (Footnote 4: http://www.astro.ljmu.ac.uk/ airs2008/docs/euclid_astronet%202.pdf) and JDEM. The analysis can be redone straightforwardly to incorporate changes in the survey specifications. We follow Huterer et al. [15] and Zhan & Knox [43] and adopt the photo-z distribution , where . For such a distribution, the median redshift is . We adopt , the mean galaxy surface density per arcmin², and the fractional sky coverage . The rms dispersion in the shear measurement induced by the galaxy intrinsic ellipticities is adopted as .

The fiducial and are calculated from the simulated data of Bernstein & Huterer [6], which were produced using the method described in Jouvel et al. [21]. The - distribution in this data is shown in Fig. 1, where is the spectroscopic redshift, which can be approximated as the true redshift. This simulated data is for SNAP-like space weak lensing surveys, conducted in 8 broad bands spanning -m wavelength. These surveys have near-IR bands and hence likely smaller photo-z errors than LSST. Since we find that the statistical accuracy of the photo-z reconstruction is not very sensitive to the choice of fiducial photo-z PDF and since photo-z calibration has room to improve, the adopted photo-z PDF should be a good representative case to illustrate our method.

Furthermore, the accuracy of the self-calibration is sensitive to the choice of redshift bins. Unless otherwise specified, we will adopt eight redshift bins, , , , , , , and . For comparison, we also investigate the case of much finer redshift bins, , , , , , , , . We call these coarse bins and fine bins, respectively. We have utilized the freedom in the data analysis and disregarded the galaxies with . These galaxies only account for of the total galaxies. Neglecting them does not result in significant loss of information. On the other hand, these galaxies may have large photo-z catastrophic errors (Fig. 1). Neglecting them helps to reduce systematic errors.

To close the scattering process, we need to assume that no galaxies with come from . Under this assumption, what our self-calibration technique actually does is to assign those galaxies to the true-z bin, instead of randomly assigning them elsewhere. (Footnote 5: These galaxies do not lens other galaxies, so they have to be put in the highest redshift bin.) More details on how the self-calibration technique ranks galaxies are given in §2.3. We argue that this approximation is sufficiently accurate, for two reasons. First, in the simulated data we used, the total fraction of galaxies with and is only . Second, about of them leak into the photo-z bin. Although the contamination rate in this redshift bin is high (), the induced error in lensing modelling is small, since the lensing weighting kernel varies slowly at . Contaminations to other photo-z bins are much smaller. These galaxies account for of total galaxies in the photo-z bin, in the and photo-z bins. In the adopted simulated data, we detect no such galaxies in photo-z bins . Due to these tiny fractions, mis-assignment of these galaxies is not a limiting factor of our self-calibration technique.

### 2.1 Galaxy-galaxy clustering

Photo-z errors, especially catastrophic errors, induce non-zero galaxy-galaxy correlations between different redshift bins, which would otherwise vanish at sufficiently small scales, where the Limber approximation holds. This set of correlations has been exploited to perform the photo-z self-calibration [32]. In the presence of photo-z scatters, the measured galaxy surface density in a given photo-z bin is a combination of the corresponding ones in the true redshift bins,

$$\delta^{\Sigma,P}_i=\sum_k p_{k\to i}\,\delta^{\Sigma,R}_k\ . \tag{2}$$

The galaxy power spectrum between the $i$-th and $j$-th photo-z bins is

$$C^{gg,P}_{ij}=\sum_{km}p_{k\to i}\,p_{m\to j}\,C^{gg,R}_{km}\simeq\sum_k p_{k\to i}\,p_{k\to j}\,C^{gg,R}_{kk}\ . \tag{3}$$

The last step approximates $C^{gg,R}_{km}\simeq 0$ for $k\neq m$, which holds under the Limber approximation. At sufficiently large angular scales, the Limber approximation fails and the intrinsic cross correlation $C^{gg,R}_{km}$ ($k\neq m$) no longer vanishes. For this reason, we exclude the modes with . We will further discuss this issue later in the paper (§3.3).
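Equation 3 is simple enough to evaluate directly. The sketch below (Python, standard library only) computes the photo-z galaxy power spectra at one multipole from a scattering matrix and the true-z auto spectra, assuming the Limber limit $C^{gg,R}_{km}\simeq 0$ for $k\neq m$. The layout `p[k][i]` $=p_{k\to i}$ and the example numbers are hypothetical choices for illustration, not values from the fiducial model.

```python
def photo_gg_spectra(p, c_gg_true):
    """C^{gg,P}_{ij} = sum_k p_{k->i} p_{k->j} C^{gg,R}_{kk}  (Eq. 3, Limber limit).

    p[k][i] is the fraction of photo-z bin i whose true redshift lies in bin k;
    c_gg_true[k] is C^{gg,R}_{kk} at the multipole of interest.
    """
    n = len(c_gg_true)
    return [[sum(p[k][i] * p[k][j] * c_gg_true[k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Two bins: 10% of photo-z bin 0 truly lies in bin 1, 5% of photo-z bin 1 in bin 0.
p = [[0.90, 0.05],
     [0.10, 0.95]]
c_photo = photo_gg_spectra(p, [2.0, 1.0])
# c_photo[0][1] = 0.9*0.05*2.0 + 0.1*0.95*1.0, a spurious cross spectrum
```

With no scatter (`p` equal to the identity) the off-diagonal entries vanish, which is why any measured cross spectrum between distinct photo-z bins carries information on $p_{i\to j}$.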

If the relevant photo-z scatters are sufficiently large, $C^{gg,P}_{ij}$ ($i\neq j$) will dominate the associated shot noise. In this case, the relevant photo-z scatters become detectable. For adjacent bins ($j=i+1$), the dominant contribution to $C^{gg,P}_{ij}$ obviously comes from $C^{gg,R}_{ii}$ and $C^{gg,R}_{jj}$, unless the corresponding $p_{i\to j}$ and $p_{j\to i}$ are tiny, which is unlikely. For correlations between non-adjacent photo-z bins, this may not be true. In Fig. 2, we show the result of . In this case, the dominant contribution comes from , through the scatters and . Due to the huge sky coverage of the fiducial stage IV lensing survey, even though the signal is suppressed by a huge factor () relative to , it still overwhelms the shot noise and becomes observable.

However, an intrinsic degeneracy encoded in the galaxy-galaxy clustering significantly limits the accuracy of this approach. From Eq. 3, it is clear that the correlation $C^{gg,P}_{ij}$ ($i\neq j$) can be induced by a nonzero $p_{i\to j}$ or by a nonzero $p_{j\to i}$. In other words, if scatters from other bins are negligible, $C^{gg,P}_{ij}$ only measures the combination $p_{i\to i}\,p_{i\to j}\,C^{gg,R}_{ii}+p_{j\to i}\,p_{j\to j}\,C^{gg,R}_{jj}$. Thus, with a measurement at a single multipole bin, the galaxy clustering measurement alone cannot break this degeneracy between up and down scatters. Adding more multipole bins can break this degeneracy if the ratio $C^{gg,R}_{ii}/C^{gg,R}_{jj}$ varies with $\ell$. If, for example, the 3D galaxy power spectrum is a strict power law over the relevant scales and redshift range, with a power index that does not vary with redshift (Footnote 6: This condition is necessary. If the power index of the 3D power spectrum varies with redshift, the shape of $C^{gg,R}_{kk}(\ell)$ will vary with redshift.), then all $C^{gg,R}_{kk}(\ell)$ are self-similar and the solution remains degenerate. Observationally, departures from a power law have been found in the galaxy correlation function [39], so galaxy clustering measurements at many multipole bins can break the above degeneracy. Nevertheless, observed galaxy correlation functions are close to power laws, which degrades the reconstruction accuracy. Furthermore, the slope depends weakly on galaxy type. If one relies on the slight deviations from a power law, small changes in the slope of the leaking population could lead to large systematic reconstruction errors. This is an example of the galaxy distribution bias that we will scrutinize in §4.1. We will further investigate this issue in §5.
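The up/down degeneracy can be seen in a toy two-bin calculation (bins labelled 0 and 1 from low to high redshift). The sketch below asks whether a pure up-scatter $p_{0\to 1}$ can mimic a pure down-scatter $p_{1\to 0}$ in the cross spectrum $C^{gg,P}_{01}$ of Eq. 3 at several multipoles. It assumes hypothetical power-law true-z spectra $C^{gg,R}_{kk}\propto\ell^{-s_k}$ with the amplitudes held fixed; all amplitudes, slopes and scatter values are illustrative, and only the cross spectrum is examined.

```python
def cross_spectrum(p_up, p_down, c00, c11):
    """Two-bin C^{gg,P}_{01} from Eq. 3.

    p_up   = p_{0->1}: true bin 0 scattered up into photo-z bin 1.
    p_down = p_{1->0}: true bin 1 scattered down into photo-z bin 0.
    """
    p00 = 1.0 - p_down  # each column of p sums to one
    p11 = 1.0 - p_up
    return p00 * p_up * c00 + p_down * p11 * c11


def power_law(amp, slope, ell):
    return amp * ell ** (-slope)


def best_mimic_mismatch(slope0, slope1, p_down=0.02, a0=2.0, a1=1.0,
                        ells=(100, 300, 1000, 3000)):
    """Match a pure up-scatter to a pure down-scatter at the first multipole,
    then return the worst relative mismatch of C^{gg,P}_{01} over all ells."""
    spectra = [(power_law(a0, slope0, ell), power_law(a1, slope1, ell)) for ell in ells]
    target = [cross_spectrum(0.0, p_down, c00, c11) for c00, c11 in spectra]
    p_up = target[0] / spectra[0][0]  # pure up-scatter gives C_01 = p_up * C_00
    mimic = [cross_spectrum(p_up, 0.0, c00, c11) for c00, c11 in spectra]
    return max(abs(m - t) / t for m, t in zip(mimic, target))


print(best_mimic_mismatch(1.0, 1.0))  # essentially zero: same slopes, degenerate
print(best_mimic_mismatch(1.0, 1.2))  # of order unity: different shapes break it
```

When both true-z spectra share the same slope, one up-scatter reproduces the down-scatter at every multipole; only a change of shape between bins lifts the degeneracy, which is why near-power-law clustering degrades this approach.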

In the next section, we will show that, due to the unique geometry dependence of gravitational lensing, the degeneracy between up and down scatters is broken naturally by adding the lensing-galaxy correlation, resulting in significant improvement in the reconstruction.

### 2.2 Lensing-galaxy correlations

Galaxy-galaxy lensing brings new observables for each angular scale, which could be used to break the previous degeneracy. The scatter can render the otherwise vanishing foreground lensing-background galaxy cross correlation non-zero (). More importantly, lensing, due to its geometry dependence, distinguishes up-scatters from down-scatters. This new piece of information is the key to significantly improve the photo-z self-calibration.

Without loss of generality, we will work with the lensing convergence $\kappa$ instead of the more direct observable, the cosmic shear $\gamma$; the two are locally equivalent in Fourier space. (Footnote 7: Cosmic shear measurement directly measures the reduced shear $\gamma/(1-\kappa)$, so the above statement only holds to first order. However, this complexity does not affect our self-calibration, in which we do not rely on a theory to predict or .) In the presence of photo-z scatters, the measured lensing convergence in a given photo-z bin is a linear combination of those in the true-z bins, weighted by the scatter probabilities,

$$\kappa^P_i=\sum_k p_{k\to i}\,\kappa^R_k\ . \tag{4}$$

The cross correlation power spectrum between the lensing convergence in the $i$-th photo-z bin and the galaxy number density in the $j$-th photo-z bin is given by

$$C^{Gg,P}_{ij}=\sum_{k\ge m}p_{k\to i}\,p_{m\to j}\,C^{Gg,R}_{km}\ . \tag{5}$$

In the absence of lensing magnification bias, $C^{Gg,R}_{km}\neq 0$ only when the source redshifts are higher than the galaxy redshifts (namely, $k\ge m$). Discussion of magnification bias and other errors is postponed to §3 and §4.

We show in Fig. 2 the case of as an example. It is mainly contributed by through the scatters and , by from the scatters and by from the scattering . Given the strength of and , the resulting is sufficiently large to overwhelm the shot noise at . We notice that, depending on the values of the relevant and the size of the redshift bin, the dominant contribution can come from the term, despite the heavy suppression in its amplitude due to the low amplitude of the lensing kernel over the relevant redshift range.

In most of the cases, can not be measured with comparable accuracy to that of (however, refer to Fig. 2 for one exception). However, we point out that it is valuable to include this piece of information in the photo-z self-calibration. It turns out to be the key to breaking the strong degeneracy between up and down scatters. Look at the configuration . The scatter contributes to , while the scatter does not. For this reason, it can break the degeneracy between and , encountered in the self-calibration based on galaxy clustering alone. The discriminating power relies on the intrinsic asymmetry between up and down scatters in generating the lensing effect. So it remains efficient, even in the case of self-similar , where the self-calibration based on galaxy clustering alone blows up (§5).
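This asymmetry can be checked numerically. The sketch below (Python, standard library only) evaluates Eqs. 3 and 5 for three bins (labelled 0, 1, 2 from low to high redshift) with hypothetical true-z spectra, and compares a down-scatter $p_{2\to 0}$ with an up-scatter $p_{0\to 2}$ tuned to give exactly the same galaxy clustering cross spectrum $C^{gg,P}_{02}$. The numbers are illustrative; the only physical input is that $C^{Gg,R}_{km}$ vanishes for $k<m$, because galaxies can only be lensed by foreground structure.

```python
def gal_gal_photo(p, c_gg):
    """Eq. 3 (Limber limit): C^{gg,P}_{ij} = sum_k p_{k->i} p_{k->j} C^{gg,R}_{kk}."""
    n = len(c_gg)
    return [[sum(p[k][i] * p[k][j] * c_gg[k] for k in range(n))
             for j in range(n)] for i in range(n)]


def shear_gal_photo(p, c_Gg):
    """Eq. 5: C^{Gg,P}_{ij} = sum_{k>=m} p_{k->i} p_{m->j} C^{Gg,R}_{km}.

    i is the photo-z bin of the lensing (source) field, j that of the galaxies;
    c_Gg[k][m] is only used for k >= m (sources behind, or inside, the lens bin).
    """
    n = len(p)
    return [[sum(p[k][i] * p[m][j] * c_Gg[k][m]
                 for k in range(n) for m in range(k + 1))
             for j in range(n)] for i in range(n)]


def single_scatter(n, true_bin, photo_bin, frac):
    """Identity scattering matrix, except that a fraction `frac` of photo-z bin
    `photo_bin` truly lies in bin `true_bin` (columns still sum to one)."""
    p = [[1.0 if k == i else 0.0 for i in range(n)] for k in range(n)]
    p[true_bin][photo_bin] = frac
    p[photo_bin][photo_bin] = 1.0 - frac
    return p


c_gg = [3.0, 2.0, 1.0]            # hypothetical C^{gg,R}_{kk}
c_Gg = [[0.1, 0.0, 0.0],          # hypothetical C^{Gg,R}_{km}, k >= m
        [1.0, 0.2, 0.0],
        [2.0, 1.5, 0.3]]

down = single_scatter(3, true_bin=2, photo_bin=0, frac=0.01)                   # p_{2->0}
up = single_scatter(3, true_bin=0, photo_bin=2, frac=0.01 * c_gg[2] / c_gg[0])  # p_{0->2}

# Same galaxy clustering cross spectrum between photo-z bins 0 and 2 ...
print(gal_gal_photo(down, c_gg)[0][2], gal_gal_photo(up, c_gg)[0][2])
# ... but only the down-scatter correlates shear in photo-z bin 0 with galaxies in bin 1.
print(shear_gal_photo(down, c_Gg)[0][1], shear_gal_photo(up, c_Gg)[0][1])
```

Galaxies of true bin 2 placed in photo-z bin 0 lie behind the galaxies of bin 1 and so pick up their lensing signal, whereas galaxies of true bin 0 placed in photo-z bin 2 lie in front of everything else and cannot.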

### 2.3 The photo-z self-calibration combining the galaxy-galaxy and lensing-galaxy measurements

Counting degrees of freedom suggests that we should be able to perform a rather model-independent self-calibration without any priors. Mathematically, we need to solve Eq. 3 & 5 for all $p_{i\to j}$, $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ simultaneously. In this process, $C^{gg,P}_{ij}$ and $C^{Gg,P}_{ij}$ are replaced with the corresponding measurements. The reconstructed power spectra $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ contain valuable information on cosmology and can be further exploited. The reconstructed $p_{i\to j}$ can then be applied to the shear-shear correlation measurement to correct for the bias induced by photo-z scatters and infer the correct cosmology. In the present paper, we focus only on $p_{i\to j}$. Thus we treat $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ as nuisance parameters to be marginalized over.

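The unknown parameters to be determined simultaneously by our self-calibration technique are $p_{i\to j}$ (with $i\neq j$), $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ (with $k\ge m$). For $N_\ell$ multipole bins and $n$ redshift bins, we have $n(n-1)+N_\ell\,n(n+3)/2$ quantities to solve: $n(n-1)$ for $p_{i\to j}$, $N_\ell\,n(n+1)/2$ from $C^{Gg,R}_{km}$ ($k\ge m$) and $N_\ell\,n$ for $C^{gg,R}_{kk}$. On the other hand, we have $N_\ell\,n(3n+1)/2$ independent measurements of correlations: $N_\ell\,n(n+1)/2$ of them come from $C^{gg,P}_{ij}$ and $N_\ell\,n^2$ from $C^{Gg,P}_{ij}$. (Footnote 8: The measurement $C^{gg,P}_{ij}$ is identical to the measurement $C^{gg,P}_{ji}$.)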

The equations to solve are quadratic in $p_{i\to j}$ and linear in $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ (Eq. 3 & 5). To guarantee a unique solution for $p_{i\to j}$, the number of measurements should be at least as large as the number of unknowns. This condition is satisfied when $N_\ell\ge 1$. If all the equations (Eq. 3 & 5) are independent, then $N_\ell=1$ is the minimum requirement. If some of the equations are linear combinations of the others, we will need $N_\ell>1$. For example, if the galaxy power spectra are strict power laws in , then even perfect measurements of at all and redshift bins cannot break the degeneracies in and the self-calibration fails. In reality, none of the galaxy power spectra or shear-galaxy power spectra is a strict power law, and we expect a valid self-calibration barring another unforeseen degeneracy. Furthermore, the baryon oscillations leave features in $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$, which help to improve the reconstruction accuracy [41].
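The bookkeeping behind this counting argument fits in a few lines. The Python sketch below assumes the counting conventions of this section: $n$ photo-z and $n$ true-z bins, $N_\ell$ multipole bins, one normalization $\sum_i p_{i\to j}=1$ per photo-z bin, $C^{gg,R}_{km}=0$ for $k\neq m$, $C^{Gg,R}_{km}$ kept only for $k\ge m$, and $C^{gg,P}_{ij}=C^{gg,P}_{ji}$. The function names are hypothetical.

```python
def n_unknowns(n, n_ell):
    """Unknowns: independent p_{i->j}, C^{gg,R}_{kk} and C^{Gg,R}_{km} (k >= m)."""
    n_p = n * (n - 1)                  # each column of p sums to one
    n_cgg = n_ell * n                  # one auto spectrum per true-z bin and multipole bin
    n_cGg = n_ell * n * (n + 1) // 2   # lensing-galaxy pairs with k >= m
    return n_p + n_cgg + n_cGg


def n_measurements(n, n_ell):
    """Measurements: C^{gg,P}_{ij} (symmetric in i, j) and C^{Gg,P}_{ij} (ordered pairs)."""
    n_gg = n_ell * n * (n + 1) // 2
    n_Gg = n_ell * n * n
    return n_gg + n_Gg


for n_ell in (1, 2, 10):
    surplus = n_measurements(8, n_ell) - n_unknowns(8, n_ell)
    print(n_ell, surplus)  # equals (n_ell - 1) * n * (n - 1); zero for n_ell = 1
```

The surplus $(N_\ell-1)\,n(n-1)$ is zero for a single multipole bin and grows linearly with $N_\ell$. A non-negative surplus is necessary but not sufficient, since self-similar spectra make some equations redundant.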

To better understand the self-calibration process, we recast it as a mathematical problem: assigning galaxies with photo-z labels to the correct subsets (true-z bins), based on the cross correlation measurements. The condition $C^{gg,R}_{km}=0$ ($k\neq m$) tells us that these subsets do not overlap with each other in true redshift. The condition $C^{Gg,R}_{km}=0$ ($k<m$) sets the correct order of these true-z bins, ranking from low to high redshift, since only galaxies behind a lens can be lensed. Furthermore, the measured power spectra depend differently on the true-z power spectra and on $p_{i\to j}$ (linearly on the former, quadratically on the latter). This makes it possible to separate $p_{i\to j}$ from the power spectra.

However, careful readers may have already noticed a puzzling behavior. Eq. 3 & 5 are invariant under the scaling

$$p_{k\to i}\to f_k^{-1}\,p_{k\to i}\,,\qquad C^{gg,R}_{kk}\to f_k^2\,C^{gg,R}_{kk}\,,\qquad C^{Gg,R}_{km}\to f_k f_m\,C^{Gg,R}_{km}\,, \tag{6}$$

where $f_k$ are arbitrary constants. This seems to imply that we are not able to determine $p_{k\to i}$ without knowing $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$. However, this is not a real degeneracy, because we have the conditions $\sum_k p_{k\to i}=1$. These constraints uniquely fix the freedom of $f_k$. In the Fisher matrix analysis carried out in the next section, we enforce the conditions by explicitly setting $p_{i\to i}=1-\sum_{k\neq i}p_{k\to i}$.

The above solution to this puzzle also answers another question: can we choose more true-z bins than photo-z bins in the self-calibration technique? The answer is no. In such a case, the number of unknown constants $f_k$ is larger than the number of constraints $\sum_k p_{k\to i}=1$. Thus we are not able to uniquely fix $f_k$, and hence $p_{k\to i}$.
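The invariance of Eq. 6 and the role of the normalization conditions can be verified numerically. The Python sketch below (standard library only, hypothetical three-bin numbers) rescales a scattering matrix and the true-z spectra by arbitrary $f_k$, confirms that the photo-z spectra of Eqs. 3 and 5 are unchanged, and then solves the normalization conditions $\sum_k p_{k\to i}/f_k=1$ for $g_k=1/f_k$. The small Gaussian-elimination helper only keeps the example self-contained.

```python
def gal_gal_photo(p, c_gg):  # Eq. 3, Limber limit
    n = len(c_gg)
    return [[sum(p[k][i] * p[k][j] * c_gg[k] for k in range(n))
             for j in range(n)] for i in range(n)]


def shear_gal_photo(p, c_Gg):  # Eq. 5, only k >= m contributes
    n = len(p)
    return [[sum(p[k][i] * p[m][j] * c_Gg[k][m] for k in range(n) for m in range(k + 1))
             for j in range(n)] for i in range(n)]


def rescale(p, c_gg, c_Gg, f):
    """Eq. 6: p_{k->i} / f_k, f_k^2 C^{gg,R}_{kk}, f_k f_m C^{Gg,R}_{km}."""
    n = len(f)
    p2 = [[p[k][i] / f[k] for i in range(n)] for k in range(n)]
    c_gg2 = [f[k] ** 2 * c_gg[k] for k in range(n)]
    c_Gg2 = [[f[k] * f[m] * c_Gg[k][m] for m in range(n)] for k in range(n)]
    return p2, c_gg2, c_Gg2


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular system: the f_k are not uniquely fixed")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def max_abs_diff(x, y):
    return max(abs(u - v) for ru, rv in zip(x, y) for u, v in zip(ru, rv))


p = [[0.90, 0.05, 0.00],   # columns (photo-z bins) sum to one
     [0.10, 0.90, 0.10],
     [0.00, 0.05, 0.90]]
c_gg = [3.0, 2.0, 1.0]
c_Gg = [[0.1, 0.0, 0.0], [1.0, 0.2, 0.0], [2.0, 1.5, 0.3]]
f = [1.0, 1.25, 0.8]       # arbitrary rescaling

p2, c_gg2, c_Gg2 = rescale(p, c_gg, c_Gg, f)
print(max_abs_diff(gal_gal_photo(p, c_gg), gal_gal_photo(p2, c_gg2)))     # tiny
print(max_abs_diff(shear_gal_photo(p, c_Gg), shear_gal_photo(p2, c_Gg2)))  # tiny
print([sum(p2[k][i] for k in range(3)) for i in range(3)])  # no longer all one

# Normalization: sum_k g_k p_{k->i} = 1 for each photo-z bin i.
g = solve_linear([[p[k][i] for k in range(3)] for i in range(3)], [1.0] * 3)
print(g)  # all ones up to rounding, so f_k = 1 is the only admissible rescaling
```

With more true-z bins than photo-z bins, this linear system would have more unknowns $g_k$ than equations, which is the counting behind the answer "no" above.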

The absence of degeneracy in the self-calibration is ultimately confirmed by the stability of our Fisher matrix inversion (below) for all the configurations that we have checked, including both the coarse bins and fine bins, various and . (Footnote 9: Strictly speaking, the stability of the Fisher matrix inversion means that the solution is a local maximum in the likelihood space, since the Fisher matrix is based on a Taylor expansion around the given solution. For the solution to be unique, in principle we would have to go through the whole likelihood space and prove that it is the global maximum. This is beyond the scope of this paper.)

We thus believe that our photo-z self-calibration works. It does not rely on any assumption about the underlying photo-z PDF. Furthermore, it does not rely on cosmological priors, since all cosmology-dependent quantities (e.g. the galaxy-galaxy and lensing-galaxy power spectra) are self-calibrated simultaneously. So the reconstructed $p_{i\to j}$ is independent of uncertainties in cosmology. In the next section, we will quantify the reconstruction error and show that the proposed self-calibration is indeed powerful. For these reasons, it can and should be applied to ongoing and proposed weak lensing surveys such as CFHTLS, DES, LSST, JDEM and Euclid.

Before quantifying the reconstruction accuracy, we want to address a fundamental limitation of this self-calibration technique. It is designed to diagnose scatters between redshift bins. It is thus completely blind to photo-z errors which do not cause such scatter. Any one-to-one mapping between photo-z and true-z preserves and cannot be discriminated using galaxy-galaxy correlations. Some such photo-z errors can cause , so we still have some discriminating power left. If, however, the mapping between photo-z and true-z is monotonically increasing, then we have and , and our self-calibration technique completely lacks the capability to detect such a photo-z error. One simple example is , where is a constant. We must rely on spectroscopic redshift measurements to diagnose such errors. In other words, our method can determine only the scattering matrix of photo-z’s, and is insensitive to any recalibration of the mean photo-z’s. In this paper, we assume no such mean photo-z error exists. This is certainly a crucial point of further investigation.

### 2.4 Error estimation

We derive the likelihood function and adopt the Fisher matrix formalism to estimate the capability of our self-calibration technique. The details are presented in Appendices A & B. We want to highlight that the error estimation here is distinctly different from that in routine exercises of cosmological parameter constraints and from that in Schneider et al. [32]. In those cases, the theory predicts the ensemble average power spectra, which are then compared to the data. An inevitable consequence of any power-spectrum determination is uncertainty due to cosmic (sample) variance. But in our self-calibration, cosmic variance does not work in this way, because the true-z power spectra $C^{gg,R}_{kk}$ and $C^{Gg,R}_{km}$ are fitted parameters, not theoretical predictions. What enters into the key equations 3 and 5 is not the ensemble average of the power spectra, but their actual values in the observed cosmic volume, i.e. the sums of their ensemble averages and the cosmic-variance fluctuations within the observed volume. Galaxies in the same true-z bin (but different photo-z bins) share the same cosmic volume, thus their power spectra share the same sample variance (however, see §4.1 for complexities), as do their cross power spectra with the matter. Such coherence has been pointed out by Pen [29] and has been applied to improve the weak lensing measurement [29] and primordial non-Gaussianity measurement through two-point galaxy clustering [34]. Furthermore, we do not rely on a cosmological theory to predict these power spectra. Instead, our self-calibration reconstructs the actual power spectra in the observed survey volume. For this reason, the only relevant source of noise in writing down the likelihood function is the shot noise (footnote 12).

Footnote 12: We also want to stress that this does not mean that the influence of cosmic variance magically vanishes in the Fisher matrix error forecast. In fact, it enters the fiducial power spectra, since the fiducial ones should be those measured in a given cosmic volume instead of the ensemble average.
To carry out the Fisher matrix analysis more robustly, we would need to generate many realizations of the fiducial power spectra, do the Fisher matrix analysis for each, and weight the error forecasts by the probability of each realization. Fortunately, the cosmic variance of each power spectrum is usually much smaller than the ensemble average (), given the large sky coverage of the fiducial stage IV lensing survey. The reconstruction error in our self-calibration is not sensitive to such small fluctuations in the fiducial power spectra. Thus, we can safely skip the full process and just use the ensemble average as the fiducial power spectra. For forecasts of surveys with much smaller sky coverage, such as CFHTLS, we may need to go through the full process. Furthermore, if we want to infer cosmology from the reconstructed power spectra, cosmic variance definitely enters.

This point is of crucial importance for our error analysis. (1) It allows us to derive the likelihood function robustly over essentially the entire $\ell$ range. It is simply Gaussian, thanks to the central limit theorem and the stochasticity of shot noise. This is true even in the high-$\ell$ regime, where the underlying density fields are highly nonlinear and non-Gaussian, but the shot noise remains Gaussian over many independent modes. (2) Since we do not rely upon any theoretical model for the power spectra, we do not need a theory capable of predictions at small scales where non-linear and baryonic physics are important. (3) For these reasons, we do not need to disregard measurements in the highly nonlinear regime, as Schneider et al. [32] did. The inclusion of these measurements significantly improves the reconstruction, especially for low redshift bins. This explains much of the difference between the reconstruction errors of this paper and those of Schneider et al. [32] when using the galaxy clustering measurement alone. The high-$\ell$ limit of this analysis will ultimately depend upon other limitations such as the applicability of the weak-lensing approximation at small scales, which can in principle render the adopted fiducial power spectra, and thus the error forecast, unrealistic. However, in our exercise, we expect and have numerically confirmed that the contribution of high $\ell$ is highly suppressed by the shot noise (as can be seen from Fig. 2), so the reconstruction accuracy is not sensitive to the high-$\ell$ limit. Throughout this paper, the results shown are based on the choice of . (4) It significantly simplifies the matrix inversion and improves the numerical accuracy. For and , the Fisher matrix to invert is several thousand by several thousand. However, a dominant portion of this Fisher matrix is block diagonal, due to the shot-noise nature of the error sources. This allows us to significantly reduce the work on matrix inversion. The details are explained in the appendix.
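The block structure mentioned in point (4) can be exploited with the standard block-elimination (Schur complement) identity. The sketch below is not the appendix's procedure, only an illustration under one stated assumption: the data in multipole bin $\ell$ depend on the shared $p_{i\to j}$ and on that bin's own nuisance spectra, so the nuisance part of the Fisher matrix is block diagonal across multipole bins. Marginalizing over the nuisance spectra then needs only small per-$\ell$ inverses, $F^{\rm marg}_{pp}=F_{pp}-\sum_\ell F_{p\ell}F_{\ell\ell}^{-1}F_{\ell p}$. The toy Fisher matrix is built from random Jacobians and all sizes are hypothetical; plain Python keeps it self-contained, whereas a production code would use a linear-algebra library.

```python
import random


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    m = [row[:] + [1.0 if r == c else 0.0 for c in range(n)] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                fac = m[r][col]
                m[r] = [vr - fac * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sub_block(f, rows, cols):
    return [[f[r][c] for c in cols] for r in rows]


def toy_fisher(n_p, n_c, n_ell, n_data, rng):
    """F = sum_ell J_ell^T J_ell: data of multipole bin ell depend only on the
    n_p shared parameters and on the n_c nuisance parameters of that bin."""
    size = n_p + n_ell * n_c
    fisher = [[0.0] * size for _ in range(size)]
    for ell in range(n_ell):
        idx = list(range(n_p)) + list(range(n_p + ell * n_c, n_p + (ell + 1) * n_c))
        jac = [[rng.gauss(0.0, 1.0) for _ in idx] for _ in range(n_data)]
        jtj = matmul(transpose(jac), jac)
        for a, ia in enumerate(idx):
            for b, ib in enumerate(idx):
                fisher[ia][ib] += jtj[a][b]
    return fisher


def marginalized_fisher(fisher, n_p, n_c, n_ell):
    """Schur complement over the per-ell nuisance blocks (only n_c x n_c inverses)."""
    p_idx = list(range(n_p))
    out = sub_block(fisher, p_idx, p_idx)
    for ell in range(n_ell):
        c_idx = list(range(n_p + ell * n_c, n_p + (ell + 1) * n_c))
        f_pc = sub_block(fisher, p_idx, c_idx)
        corr = matmul(matmul(f_pc, inverse(sub_block(fisher, c_idx, c_idx))), transpose(f_pc))
        out = [[o - c for o, c in zip(ro, rc)] for ro, rc in zip(out, corr)]
    return out


rng = random.Random(1)
n_p, n_c, n_ell = 3, 2, 5
fisher = toy_fisher(n_p, n_c, n_ell, n_data=8, rng=rng)
cov_fast = inverse(marginalized_fisher(fisher, n_p, n_c, n_ell))
cov_full = [row[:n_p] for row in inverse(fisher)[:n_p]]
print(max(abs(x - y) for rx, ry in zip(cov_fast, cov_full) for x, y in zip(rx, ry)))
```

Both routes give the same marginalized covariance for the shared parameters, but the block route never inverts the full matrix, whose size grows with the number of multipole bins.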

We show the error forecast in Figs. 3, 4, 5 & 6. For most bins at , $p_{i\to j}$ can be reconstructed to an accuracy of , for either coarse or fine bins. The improvement from adding the shear-galaxy measurement is often better than , up to a factor of a few. We can also compress the errors in $p_{i\to j}$ into a single number: the statistical error in the mean true redshift of each photo-z bin. This number by no means captures all the information in the reconstruction error, but it is a convenient reference. The result for coarse bins is shown in Fig. 4 and that for fine bins in Fig. 5. For the coarse bins, it can reach for . The improvement from adding the lensing-galaxy measurement is a factor of at , mainly through better constraints on scatters from high-redshift bins. The error in the mean redshift is larger for fine bins (Fig. 5). However, these errors are tightly anti-correlated, as can be inferred from Figs. 5 and 6. This is why we see a large improvement when choosing a larger bin size.
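To make the compression into a single number concrete, here is a minimal sketch that propagates the covariance of the scatters into the error on the mean true redshift of one photo-z bin. It assumes the definition $\langle z\rangle_i=\sum_k z_k N_k p_{k\to i}/\sum_k N_k p_{k\to i}$, where $N_k$ is the galaxy count and $z_k$ the central redshift of true-z bin $k$. This definition, the function name and the toy numbers are illustrative assumptions. The example also shows how correlated errors can cancel in the mean.

```python
import numpy as np


def mean_z_and_error(z_true, n_true, p_into, cov_p):
    """Mean true redshift of photo-z bin i and its linearly propagated error.

    Assumed definition: <z>_i = sum_k z_k N_k p_{k->i} / sum_k N_k p_{k->i}.
    z_true[k], n_true[k]: central redshift and galaxy count of true-z bin k;
    p_into[k] = p_{k->i}; cov_p: covariance matrix of the errors in p_into.
    """
    z_true, n_true, p_into = (np.asarray(v, float) for v in (z_true, n_true, p_into))
    norm = np.sum(n_true * p_into)
    mean_z = np.sum(z_true * n_true * p_into) / norm
    jac = n_true * (z_true - mean_z) / norm        # d<z>_i / d p_{k->i}
    var = jac @ np.asarray(cov_p, float) @ jac
    return mean_z, float(np.sqrt(max(var, 0.0)))   # clip tiny negative round-off


z_true, n_true, p_into = [0.5, 1.0, 1.5], [1.0, 1.0, 1.0], [0.02, 0.96, 0.02]
cov_uncorrelated = np.diag([1e-4, 0.0, 1e-4])
cov_correlated = np.array([[1e-4, 0.0, 1e-4], [0.0, 0.0, 0.0], [1e-4, 0.0, 1e-4]])
_, sigma_u = mean_z_and_error(z_true, n_true, p_into, cov_uncorrelated)
_, sigma_c = mean_z_and_error(z_true, n_true, p_into, cov_correlated)
print(sigma_u, sigma_c)  # about 7.1e-3 and about 0
```

In the correlated case, outliers from below pull the mean down and outliers from above pull it up by the same amount, so their correlated errors cancel. The mean-redshift error alone can therefore hide structure that the full covariance of $p_{i\to j}$ retains.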

We present an order-of-magnitude estimate to understand these reconstruction errors. For example, many $p_{i\to j}$ can be determined to an accuracy of . We take $p_{2\to 1}$ as an example. The scatter $p_{2\to 1}$ causes a cross correlation $C^{gg,P}_{12}\simeq p_{2\to 1}C^{gg,R}_{22}$. Ignoring other scatters, the detection threshold of $p_{2\to 1}$ is roughly set by requiring the accumulated signal-to-noise of this measurement to be 1,

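$$\left(\frac{S}{N}\right)^2_{12} \sim \sum_\ell \left(\frac{C^{gg,P}_{12}}{\sigma^{gg}_{12}}\right)^2 = p^2_{2\to 1}\sum_\ell \left(\frac{C^{gg,R}_{22}}{\sigma^{gg}_{12}}\right)^2 = 1 . \tag{7}$$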

Here, is the associated shot noise power spectrum. We find that the threshold inferred from the above equation is even smaller than , the statistical error in $p_{2\to 1}$ (Fig. 3). This is expected. Because of the simplifications made in the derivation, mainly the neglect of error propagation from the other parameters, the threshold obtained from this approximation is necessarily a lower limit on the statistical error in $p_{2\to 1}$.
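Eq. 7 can be evaluated directly. The sketch below returns the $p_{2\to1}$ at which the accumulated signal-to-noise reaches 1. It assumes the standard Gaussian variance of a cross spectrum per multipole, $\sigma^2_{12}\simeq(C_{11}+1/\bar n_1)(C_{22}+1/\bar n_2)/[(2\ell+1)f_{\rm sky}]$, with the small $(C^{gg,P}_{12})^2$ term dropped. The toy power spectrum, number densities and sky fraction are hypothetical.

```python
import numpy as np

SR_TO_ARCMIN2 = (180.0 * 60.0 / np.pi) ** 2      # square arcminutes per steradian


def p21_threshold(ell, c11, c22, n1_arcmin2, n2_arcmin2, f_sky):
    """p_{2->1} at which the accumulated S/N of Eq. 7 reaches 1 (other scatters ignored)."""
    shot1 = 1.0 / (n1_arcmin2 * SR_TO_ARCMIN2)   # shot-noise power 1/n, n per steradian
    shot2 = 1.0 / (n2_arcmin2 * SR_TO_ARCMIN2)
    # Assumed Gaussian variance of the cross spectrum at each multipole.
    var12 = (c11 + shot1) * (c22 + shot2) / ((2.0 * ell + 1.0) * f_sky)
    sn2_per_unit_p2 = np.sum(c22 ** 2 / var12)   # sum_ell (C^{gg,R}_22 / sigma_12)^2
    return 1.0 / np.sqrt(sn2_per_unit_p2)


ell = np.arange(20, 3001, dtype=float)
c_toy = 1e-4 * (ell / 100.0) ** -1.2             # hypothetical C^{gg}_{ii}(ell), both bins
p_min = p21_threshold(ell, c_toy, c_toy, 5.0, 5.0, 0.5)
p_min_quarter_sky = p21_threshold(ell, c_toy, c_toy, 5.0, 5.0, 0.25)
print(p_min, p_min_quarter_sky / p_min)          # ratio is sqrt(2): threshold scales as f_sky^{-1/2}
```

As the text notes, a threshold computed this way ignores error propagation from all other parameters. It is therefore only a lower bound on the forecast error.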

From similar arguments, we also expect that the statistical errors for are large for the high-redshift bins (e.g. ), because the number of galaxies in these bins is small and thus the shot noise is large. However, as explained earlier, $C^{gg,P}_{ij}$ is not the only source of information for the reconstruction; $C^{Gg,P}_{ij}$ can also play an important role. For example, the reconstruction of can reach an accuracy of . The reason is that the scatter causes . The combined G-g measurements have even for . The measurement also contributes, but since the number of galaxies in the -th bin is only 1% of all galaxies, the associated shot noise is large, so its contribution is overwhelmed by that from (). This explains the factor of 10 improvement in the reconstruction when the G-g measurements are added. Finally, we caution that, although we can qualitatively explain some results of Fig. 3, the error propagation is complicated, and the above estimate serves only as a convenient tool for understanding Fig. 3.

The errors in $p_{i\to j}$ are correlated, and the correlations show rich structure. To better demonstrate these features, we adopt the fine redshift bins ( in total). We define the cross correlation coefficient between the errors $\delta p_{i\to j}$ and $\delta p_{m\to n}$ as $r\equiv\langle\delta p_{i\to j}\,\delta p_{m\to n}\rangle/[\langle\delta p_{i\to j}^2\rangle\langle\delta p_{m\to n}^2\rangle]^{1/2}$. The resulting $r$ is shown in Fig. 6. Strong positive and negative correlations exist between the errors of many scatters into the same photo-z bins ( and , regions around the diagonal of Fig. 6). These scatters are coupled since both reduce ( and ) or ( and ). Scatters and are coupled, too, since they contribute to . This explains some strong (both positive and negative) correlations of the off-diagonal elements.
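In a Fisher forecast, the covariance of the errors is the inverse Fisher matrix, so the correlation coefficient follows by normalizing $F^{-1}$ by its diagonal. Below is a minimal sketch with a hypothetical two-parameter Fisher matrix. Two scatters whose derivatives push the observables in the same direction give a positive Fisher off-diagonal element and hence anti-correlated errors, the kind of coupling described above.

```python
import numpy as np


def correlation_from_fisher(fisher):
    """r_ab = C_ab / sqrt(C_aa C_bb), with C = F^{-1} the forecast error covariance."""
    cov = np.linalg.inv(fisher)
    sigma = np.sqrt(np.diag(cov))
    return cov / np.outer(sigma, sigma)


# Hypothetical Fisher matrix for two scatters into the same photo-z bin whose
# derivatives have the same sign, hence a positive off-diagonal element.
fisher = np.array([[1.0, 0.9], [0.9, 1.0]])
r = correlation_from_fisher(fisher)
print(r[0, 1])  # approximately -0.9: anti-correlated errors
```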

## 3 Extra sources of statistical errors

Our self-calibration technique does not rely on priors on cosmology or on the photo-z distribution. In this sense, it is robust. However, there are still several sources of systematic error. Some of them, if handled properly, can be turned into statistical errors without resorting to external information, and will then not bias our reconstruction of $p_{i\to j}$. We discuss them in this section. The remaining ones cannot be incorporated into the self-calibration without strong priors and are discussed in §4.

We find that galaxy intrinsic alignment (§3.1), the magnification and size bias (§3.2) and the intrinsic cross correlation between different galaxy bins (§3.3), can in principle be incorporated into our self-calibration technique and thus do not bias the reconstruction. Furthermore, we argue that the inclusion of these complexities is unlikely to significantly degrade the accuracy of our self-calibration technique.

### 3.1 The intrinsic alignment

Surprisingly, galaxy intrinsic alignments do not bias the $p_{i\to j}$ reconstructed through our self-calibration technique for , although they definitely bias the inferred values. In the presence of the intrinsic alignment , Eq. 5 becomes

$$C^{Gg,P}_{ij} = \sum_{k>m} p_{k\to i}\,p_{m\to j}\,C^{Gg,R}_{km} + \sum_k p_{k\to i}\,p_{k\to j}\left[C^{Ig,R}_{kk}+C^{Gg,R}_{kk}\right] . \tag{8}$$

Since we make no assumption about $C^{Gg,R}_{kk}$, our self-calibration technique automatically takes the intrinsic alignment into account and measures the sum of $C^{Ig,R}_{kk}$ and $C^{Gg,R}_{kk}$. Clearly, this does not bias the reconstruction of $p_{i\to j}$. On the other hand, it does affect the statistical accuracy of the reconstruction of . Unless , its existence does not affect the error forecast significantly.
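The absorption of the intrinsic alignment can be checked with a direct implementation of Eq. 8. In this sketch, $p[k,i]=p_{k\to i}$, and $C^{Gg,R}_{km}$ is stored as a matrix with the source bin $k$ as the row index. The random scattering matrix and spectra are hypothetical values at a single multipole. Because $C^{Ig,R}_{kk}$ enters only through the $k=m$ terms, folding it into $C^{Gg,R}_{kk}$ leaves every $C^{Gg,P}_{ij}$ unchanged.

```python
import numpy as np


def c_Gg_photo(p, c_Gg_true, c_Ig_true=None):
    """Eq. 8: C^{Gg,P}_ij = sum_{k>=m} p_{k->i} p_{m->j} X_km at one multipole.

    p[k, i] = p_{k->i}; c_Gg_true[k, m] = C^{Gg,R}_km (shear of true bin k, galaxies
    of true bin m); the intrinsic alignment C^{Ig,R}_kk adds to the k = m terms only.
    """
    x = np.tril(c_Gg_true)                      # keep the k >= m terms
    if c_Ig_true is not None:
        x = x + np.diag(c_Ig_true)
    return p.T @ x @ p                          # sum_k sum_m p[k,i] x[k,m] p[m,j]


rng = np.random.default_rng(1)
n = 4
p = np.eye(n) + 0.02 * rng.random((n, n))       # hypothetical, mostly diagonal
p /= p.sum(axis=1, keepdims=True)               # each true-z bin's probabilities sum to 1
c_Gg = rng.random((n, n))                       # hypothetical C^{Gg,R}_km
c_Ig = -0.3 * rng.random(n)                     # hypothetical C^{Ig,R}_kk

with_ia = c_Gg_photo(p, c_Gg, c_Ig)
absorbed = c_Gg_photo(p, c_Gg + np.diag(c_Ig))  # fold C^{Ig,R}_kk into C^{Gg,R}_kk
print(np.allclose(with_ia, absorbed))           # True: only the sum is constrained
```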

If the intrinsic alignment depends upon galaxy properties, this term may differ between outlier galaxies and those correctly assigned to photo-z bin . In this case, a bias in $p_{i\to j}$ may result. This behavior is similar to the systematics from variation of that are discussed in more detail in §4.1.

### 3.2 Magnification and size bias

In reality, the measured galaxy distribution is lensed by the foreground matter distribution, so the measured galaxy over-density has an extra contribution from lensing. Besides the well-known magnification bias due to lensing magnification of galaxy flux, there is also a size bias due to lensing magnification of galaxy size [16, 31]. Both can be incorporated into a function , determined by the flux and size distribution of galaxies in the given redshift bin. The lensed galaxy over-density then takes the form .

The existence of this extra term induces non-vanishing and . If not taken into account, it will certainly bias the reconstruction. Fortunately, at least in principle, the same weak lensing surveys contain the information needed to correct for this effect. Given a lensing survey, we can split galaxies into bins of flux and size. Since the prefactor is determined by the flux and size distribution and is a measurable quantity, we can separate its effect from the others. Alternatively, we can design an estimator such that , averaged over all flux and size bins. The price to pay is reduced statistical accuracy of the correlation measurement. For galaxy clustering, the shot noise increases by a factor , with respect to the clustering signal. Here, is the galaxy bias. Robust modeling of this factor requires information on galaxies to high redshifts and faint luminosities. Furthermore, we need to evaluate the effect of measurement errors in galaxy flux and size. None of these exercises is trivial, so we postpone such studies to future work. However, we argue qualitatively that the degradation in statistical accuracy is unlikely to be dramatic. Since is always positive and changes sign from the bright end to the faint end of the galaxy luminosity function, we do not expect a large loss of statistical accuracy from such weighting. However, the success of the self-calibration will depend upon the accuracy of methods to estimate (see also Bernstein & Huterer [6]).
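One way to build such an estimator, sketched here under stated assumptions, is to weight flux/size sub-bins $a$ so that $\sum_a w_a n_a g_a=0$. Here $n_a$ is the galaxy number in sub-bin $a$, and its lensed over-density is written as $\delta_a+g_a\kappa$. If all sub-bins share the same galaxy bias (an assumption), minimizing the shot noise of the weighted field subject to this constraint gives weights linear in $g_a$: $w_a\propto1-cg_a$ with $c=\sum_a n_ag_a/\sum_a n_ag_a^2$. The notation $g_a$, the function names and the numbers are hypothetical.

```python
import numpy as np


def nulling_weights(n, g):
    """Weights w_a with sum_a w_a n_a g_a = 0, of the minimum-shot-noise form w_a = 1 - c g_a.

    n[a]: galaxy number in flux/size sub-bin a; g[a]: its lensing prefactor
    (lensed over-density delta_a + g_a kappa). Assumes equal galaxy bias in all sub-bins.
    """
    n, g = np.asarray(n, float), np.asarray(g, float)
    c = np.sum(n * g) / np.sum(n * g ** 2)
    return 1.0 - c * g


def shot_noise_penalty(n, w):
    """Shot noise of the weighted over-density relative to uniform weights."""
    n, w = np.asarray(n, float), np.asarray(w, float)
    return np.sum(w ** 2 * n) * np.sum(n) / np.sum(w * n) ** 2


n = np.array([2.0, 1.0, 1.0])        # hypothetical sub-bin counts
g = np.array([-0.5, 0.5, 1.5])       # hypothetical prefactors, changing sign
w = nulling_weights(n, g)
print(w, np.sum(w * n * g), shot_noise_penalty(n, w))  # penalty = 12/11
```

In this toy example, where $g_a$ changes sign across sub-bins, the shot-noise penalty is only $12/11$. The actual penalty depends on the true distribution of $g_a$, which must be measured.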

### 3.3 The intrinsic galaxy cross correlation between non-overlapping redshift bins

Under the Limber approximation, the galaxy cross correlation between non-overlapping redshift bins vanishes. However, the Limber approximation is not exact. In reality, there is a non-vanishing intrinsic galaxy cross correlation $C^{gg,R}_{ij}$ ($i\neq j$), especially at large scales. As correctly pointed out by Schneider et al. [32], this intrinsic cross correlation biases the reconstruction of $p_{i\to j}$. Eq. 3 should now be replaced by

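$$C^{gg,P}_{ij} = \sum_k p_{k\to i}\,p_{k\to j}\,C^{gg,R}_{kk} + \sum_{k\neq m} p_{k\to i}\,p_{m\to j}\,C^{gg,R}_{km} \simeq \sum_k p_{k\to i}\,p_{k\to j}\,C^{gg,R}_{kk} + \sum_{k-m=\pm1} p_{k\to i}\,p_{m\to j}\,C^{gg,R}_{km} .$$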

In the last expression, we only keep the correlation between two adjacent redshift bins and neglect the correlations between non-adjacent redshift bins ($|k-m|\geq2$). This approximation should be sufficiently accurate in practical applications. Note that even if no photo-z error is present, there is still an intrinsic (real) correlation between two different (especially adjacent) redshift bins. If not accounted for, these non-zero $C^{gg,R}_{km}$ will be misinterpreted as photo-z errors and thus bias the reconstruction of $p_{i\to j}$.

We first attempt to quantify $C^{gg,R}_{ij}$ ($i\neq j$). Since we only need to evaluate it at large scales, where it is relevant, we can adopt linear theory and calculate it through the following well-known formula

$$C^{gg,R}_{ij} = \int_0^\infty \Delta^2_m(k,z=0)\,\frac{dk}{k}\,Q_i(k,\ell)\,Q_j(k,\ell) , \tag{10}$$

where

$$Q_i(k,\ell) = \frac{\int_{z_i-\Delta z_i/2}^{z_i+\Delta z_i/2} D(z)\,b_g(z)\,j_\ell(k\chi)\,n(z)\,dz}{\int_{z_i-\Delta z_i/2}^{z_i+\Delta z_i/2} n(z)\,dz} . \tag{11}$$

Unfortunately, since $j_\ell(k\chi)$ oscillates around zero and the positive and negative contributions to the integral largely cancel, the numerical integration is very sensitive to numerical errors and thus highly unstable. Nevertheless, from Monte Carlo numerical integration, we estimate that, at , the cross power spectrum between redshift bins and falls below of the geometric mean of the corresponding two auto power spectra, confirming the findings of Schneider et al. [32]. A more accurate evaluation of the intrinsic cross correlation may be performed in real space, where the highly oscillating integrand encountered in multipole space can be avoided. This issue will be investigated further.
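For reference, here is a hedged sketch of the direct (non-Limber) quadrature of Eqs. 10 and 11 for two adjacent top-hat bins at one multipole. Every ingredient is an assumption:

- a toy shape for $\Delta^2_m(k,z=0)$;
- $b_g=1$ and $D(z)=1/(1+z)$;
- a toy $n(z)$;
- a flat $\Lambda$CDM distance with $\Omega_m=0.3$.

To stay NumPy-only, $j_\ell$ is tabulated from the integral representation $j_\ell(x)=\frac12\int_{-1}^{1}\cos(xt-\ell\pi/2)P_\ell(t)\,dt$ with Gauss-Legendre quadrature. As the text warns, the cross term comes from heavy cancellation, so the printed ratio should be checked for convergence in the $k$ and $z$ resolution before being trusted.

```python
import numpy as np
from numpy.polynomial import legendre

C_KMS = 299792.458  # speed of light [km/s]


def trapz_last(y, x):
    """Trapezoidal rule along the last axis of y (all quadrature weights positive)."""
    return np.sum(0.5 * (y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)


def comoving_distance(z, omega_m=0.3):
    """Comoving distance [Mpc/h] in an assumed flat LCDM model."""
    z = np.atleast_1d(np.asarray(z, float))
    zg = np.linspace(0.0, z.max(), 4000)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zg) ** 3 + 1.0 - omega_m)
    chi_g = np.concatenate([[0.0], np.cumsum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zg))])
    return (C_KMS / 100.0) * np.interp(z, zg, chi_g)


def jl_table(ell, x_max, n_x=6000, n_gauss=300):
    """Tabulate j_ell(x) on [0, x_max] via j_ell(x) = 1/2 int cos(x t - ell pi/2) P_ell(t) dt."""
    t, w = legendre.leggauss(n_gauss)
    p_ell = legendre.legval(t, np.eye(ell + 1)[ell])   # Legendre polynomial P_ell at the nodes
    x = np.linspace(0.0, x_max, n_x)
    return x, 0.5 * ((np.cos(np.outer(x, t) - 0.5 * np.pi * ell) * p_ell) @ w)


def delta2_m(k):
    """Toy z = 0 matter variance; only its shape matters for the ratio below."""
    return k ** 4 / (1.0 + (k / 0.02) ** 2) ** 2


def n_of_z(z, z0=0.7):
    """Toy (unnormalized) redshift distribution."""
    return z ** 2 * np.exp(-(z / z0) ** 1.5)


def q_kernel(k, z_lo, z_hi, x_tab, jl_tab, nz=400):
    """Eq. 11 for a top-hat bin, assuming b_g = 1 and D(z) = 1/(1+z)."""
    z = np.linspace(z_lo, z_hi, nz)
    chi = comoving_distance(z)
    jl = np.interp(np.outer(k, chi), x_tab, jl_tab)     # j_ell(k chi), shape (nk, nz)
    return trapz_last(jl * (n_of_z(z) / (1.0 + z)), z) / trapz_last(n_of_z(z), z)


ell = 20
k = np.linspace(1e-4, 0.1, 2000)      # [h/Mpc]; fine enough for the ~2 pi/chi oscillation
bins = [(0.6, 0.8), (0.8, 1.0)]       # adjacent, non-overlapping top-hat bins
x_tab, jl_tab = jl_table(ell, 1.01 * k.max() * comoving_distance(1.0)[0])
q1, q2 = (q_kernel(k, lo, hi, x_tab, jl_tab) for lo, hi in bins)
weight = delta2_m(k) / k               # Delta^2_m(k, z=0) dk/k of Eq. 10
c11, c22, c12 = (trapz_last(weight * a * b, k) for a, b in [(q1, q1), (q2, q2), (q1, q2)])
r12 = c12 / np.sqrt(c11 * c22)
print(f"ell = {ell}: C_12 / sqrt(C_11 C_22) = {r12:.3e}")
```

Using the same positive-weight quadrature for all three spectra guarantees $|C_{12}|\leq\sqrt{C_{11}C_{22}}$ even when the cross term is poorly resolved. That inequality is a useful sanity check, but it is no substitute for a convergence test.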

The induced bias is roughly or . Depending on the low-$\ell$ cut, this bias may become comparable to the statistical accuracy of the self-calibration method. However, there are several possible ways to eliminate or reduce this bias.

One way is to start with the last approximation of the generalized Eq. 3 in §3.3, treat the adjacent-bin $C^{gg,R}_{k,k\pm1}$ as free parameters, and fit them simultaneously with the other parameters. This can eliminate virtually all the associated bias, at the expense of larger statistical errors. It is easy to check that there are still many more measurements than unknowns, so this remedy is feasible. Furthermore, the degeneracy between and is weak. For example, induced by the scattering has the property . On the other hand, the intrinsic cross correlation induced by the deviation from the Limber approximation decreases quickly with and thus decreases quickly with . These distinctive behaviors help to distinguish the intrinsic cross correlation from that induced by photo-z errors. The characteristic behavior of allows us to adopt weak priors that still help to discriminate between and . For example, we can set when , thus reducing the number of extra unknowns. Alternatively, we can model $C^{gg,R}_{k,k\pm1}$ as a power law decreasing with $\ell$.
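The distinct $\ell$ dependence can be exploited with an ordinary weighted least-squares fit. The sketch below fits one cross spectrum as $C^{gg,P}_{12}(\ell)=p\,C^{gg,R}_{22}(\ell)+A(\ell/\ell_0)^{-\alpha}$, with the power-law template and its fixed index $\alpha$ standing in for the prior discussed above. The template, $\ell_0$ and the noiseless synthetic data are illustrative assumptions. A real analysis would fit all bins and parameters jointly.

```python
import numpy as np


def fit_scatter_and_intrinsic(ell, c12_obs, c22_true, sigma12, alpha, ell0=100.0):
    """Weighted least squares for C^{gg,P}_12(ell) = p C^{gg,R}_22(ell) + A (ell/ell0)^(-alpha).

    The power-law template with fixed alpha is an illustrative prior for the
    intrinsic adjacent-bin correlation; it is an assumption, not the paper's model.
    """
    design = np.column_stack([c22_true, (ell / ell0) ** (-alpha)]) / sigma12[:, None]
    coeffs, *_ = np.linalg.lstsq(design, c12_obs / sigma12, rcond=None)
    return coeffs                                  # [p, A]


ell = np.arange(20, 2001, dtype=float)
c22 = 1e-4 * (ell / 100.0) ** -1.2                 # hypothetical true-z auto spectrum
intrinsic = 5e-7 * (ell / 100.0) ** -3.0           # hypothetical large-scale intrinsic term
c12 = 0.01 * c22 + intrinsic                       # synthetic, noiseless "observation"
sigma = np.full_like(ell, 1e-6)
p_fit, a_fit = fit_scatter_and_intrinsic(ell, c12, c22, sigma, alpha=3.0)
print(p_fit, a_fit)                                # approximately 0.01 and 5e-7
```

If the intrinsic template had the same $\ell$ shape as $C^{gg,R}_{22}$, the two design columns would be proportional and the fit would be degenerate. The separation relies entirely on their differing shapes.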

The inclusion of the lensing-galaxy cross correlation measurement also helps to break the degeneracy between and . The photo-z scatters induce both non-zero and . On the other hand, the failure of the Limber approximation does not cause , since the lensing kernel vanishes.
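The geometric argument can be made explicit with the lensing efficiency in a flat universe. For a thin lens at comoving distance $\chi_l$ and a thin source at $\chi_s$, the efficiency is $(\chi_s-\chi_l)/\chi_s$ for $\chi_s>\chi_l$ and zero otherwise. In this minimal sketch (thin bins and hypothetical distances are assumptions), the efficiency matrix has source bins as rows and lens bins as columns. It is strictly lower triangular, so a source bin receives no lensing signal from a lens bin behind it.

```python
import numpy as np


def lensing_efficiency(chi_lens, chi_source):
    """Flat-universe lensing weight (chi_s - chi_l) / chi_s, zero unless the source is behind the lens."""
    chi_lens, chi_source = np.asarray(chi_lens, float), np.asarray(chi_source, float)
    return np.maximum(0.0, 1.0 - chi_lens / chi_source)


chi = np.array([500.0, 1000.0, 1500.0, 2000.0])       # hypothetical thin-bin distances [Mpc/h]
eff = lensing_efficiency(chi[None, :], chi[:, None])  # rows: source bin k, columns: lens bin m
print(np.round(eff, 3))
```

With finite-width bins, the $k=m$ term becomes small but nonzero, consistent with the $k\geq m$ sum in Eq. 14.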

We then conclude that the intrinsic galaxy cross correlation between non-overlapping redshift bins may be non-negligible for stage IV lensing surveys. However, our self-calibration technique has the capability to take this complexity into account. Further investigation is required to quantify its influence on the self-calibration.

## 4 Possible systematics

There are some error sources that cannot be incorporated into our self-calibration technique without strong priors or external information. They will thus bias the reconstructed photo-z scatters. We discuss the influence of the galaxy distribution bias in §4.1 and of the multiplicative error in cosmic shear measurements in §4.2.

### 4.1 The galaxy distribution bias

A crucial assumption in the existing self-calibration technique is that galaxies scattering out of their true redshift bin have the same spatial distribution as those that remain in it. This implicit assumption can be seen from Eq. 2. By straightforward math, we find that the true-z overdensity appearing in this equation is actually

$$\delta^{\Sigma,R}_{k,i} = \frac{\int_k n_i(z)\,\delta_{g,i}(z)\,dz}{\int_k n_i(z)\,dz} , \tag{12}$$

where the integral is over the $k$-th redshift bin, $n_i(z)$ is the true redshift distribution of the $i$-th photo-z bin, and $\delta_{g,i}(z)$ is the overdensity of galaxies in this photo-z bin. Eq. 2 thus implicitly approximates $\delta^{\Sigma,R}_{k,i}$ as the same for all photo-z bins $i$.

This assumption is likely problematic. Galaxies scattering from true redshift bin $k$ to photo-z bin $i$ could have a different redshift distribution ($n_i(z)\neq n_k(z)$ within bin $k$), different clustering ($\delta_{g,i}\neq\delta_{g,k}$), or both, compared to galaxies correctly identified in photo-z bin $k$. Furthermore, the difference in $n(z)$ means that these subcategories of galaxies do not sample the cosmic volume with identical weighting (despite sharing the same true-z bin), so they do not share exactly the same cosmic variance and thus do not have identical clustering patterns. We refer to all these complexities collectively as the galaxy distribution bias.

This galaxy distribution bias certainly biases the reconstruction. For example, if a subcategory of galaxies that did not cluster at all were to scatter, a galaxy-galaxy cross correlation would not detect them, resulting in an incorrect leakage reconstruction. Interestingly, even for this extreme case of galaxy distribution bias, the galaxy-lensing correlation brings hope. For this subcategory of galaxies that scattered but did not cluster at all, galaxies that appear to lie behind them may still lens them, causing a detectable spurious correlation between foreground shear and background galaxy density. This example further demonstrates the gain from adding galaxy-lensing cross correlation measurements to the self-calibration.

Unfortunately, even with the aid of the galaxy-lensing correlation measurement, the self-calibration still fails if no priors are adopted. The galaxy distribution bias has not only a deterministic component but also a stochastic component, i.e. the noise induced by the different cosmic (sample) variance realizations for different photo-z bins. We now show that, even if the stochasticity can be neglected, the extra degrees of freedom in the galaxy distribution bias defeat the self-calibration.

In this limit, the galaxy distribution bias can be completely described by the relative bias parameter $\tilde b_{k\to i}$, namely the ratio of the bias of galaxies scattered into the $i$-th photo-z bin to that of galaxies remaining in the $k$-th redshift bin, weighted by the difference in $n(z)$. In the presence of the deterministic galaxy distribution bias, Eq. 3 and Eq. 5 become

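$$C^{gg,P}_{ij} \simeq \sum_k p_{k\to i}\,p_{k\to j}\,\tilde b_{k\to i}\,\tilde b_{k\to j}\,C^{gg}_{kk} , \tag{13}$$
$$C^{Gg,P}_{ij} = \sum_{k\geq m} p_{k\to i}\,p_{m\to j}\,\tilde b_{m\to j}\,C^{Gg}_{km} . \tag{14}$$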

First of all, we notice a degeneracy in Eq. 13 of the form $p_{k\to i}\to\lambda p_{k\to i}$, $\tilde b_{k\to i}\to\tilde b_{k\to i}/\lambda$. The argument that helps to break the scaling invariance of Eq. 2.3 does not apply here, simply because many more free parameters are involved. The galaxy-lensing correlation measurements do help, since this scaling invariance of Eq. 13 does not hold for Eq. 14. Unfortunately, the $\tilde b_{k\to i}$ are in general scale dependent, and there is one for every scattering pair. This nearly triples the number of unknown parameters, making the number of unknowns larger than the number of independent measurements and thus ruining the self-calibration.
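Eq. 13 depends on $p_{k\to i}$ and $\tilde b_{k\to i}$ only through their product, whereas in Eq. 14 $p_{k\to i}$ also appears without its own $\tilde b_{k\to i}$. This is easy to verify numerically. In the sketch below, $p[k,i]=p_{k\to i}$ and $\tilde b[k,i]=\tilde b_{k\to i}$, and all matrices are hypothetical single-multipole values. Rescaling one $p_{k\to i}$ by $\lambda$ and the matching $\tilde b_{k\to i}$ by $1/\lambda$ leaves the galaxy-galaxy spectra unchanged but not the shear-galaxy ones.

```python
import numpy as np


def c_gg_photo(p, btil, c_gg_true_diag):
    """Eq. 13: C^{gg,P}_ij = sum_k p_{k->i} p_{k->j} b_{k->i} b_{k->j} C^{gg}_kk."""
    q = p * btil
    return q.T @ np.diag(c_gg_true_diag) @ q


def c_Gg_photo(p, btil, c_Gg_true):
    """Eq. 14: C^{Gg,P}_ij = sum_{k>=m} p_{k->i} p_{m->j} b_{m->j} C^{Gg}_km."""
    return p.T @ np.tril(c_Gg_true) @ (p * btil)


rng = np.random.default_rng(2)
n = 3
p = 0.8 * np.eye(n) + 0.1 + 0.05 * rng.random((n, n))  # hypothetical p[k, i], sizeable scatter
btil = np.ones((n, n))                                  # relative biases b_{k->i}
c_gg = rng.random(n) + 0.5                              # hypothetical C^{gg}_kk
c_Gg = rng.random((n, n)) + 0.5                         # hypothetical C^{Gg}_km

lam, k, i = 2.0, 0, 1                                   # rescale the single pair (k, i) = (0, 1)
p2, btil2 = p.copy(), btil.copy()
p2[k, i] *= lam
btil2[k, i] /= lam

gg_before, gg_after = c_gg_photo(p, btil, c_gg), c_gg_photo(p2, btil2, c_gg)
Gg_before, Gg_after = c_Gg_photo(p, btil, c_Gg), c_Gg_photo(p2, btil2, c_Gg)
print(np.allclose(gg_before, gg_after))  # True: Eq. 13 is invariant
print(np.allclose(Gg_before, Gg_after))  # False: Eq. 14 breaks the invariance
```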

In reality, given the origins of the galaxy distribution bias, we expect it to be scale dependent and unlikely to be deterministic. Thus, we cannot render it as a statistical error. Instead, we accept it and quantify the induced bias in the reconstructed $p_{k\to i}$.

If we were to neglect this effect (namely by assuming $\tilde b_{k\to i}=1$), the reconstructed $p_{k\to i}$ would be biased. To robustly quantify the induced bias in $p_{k\to i}$, we need a robust measurement or model of $\tilde b_{k\to i}$, which we lack. To proceed, we adopt a toy model, . The details of this calculation are shown in the appendix. The resulting bias in scales as . For the case of , the result is shown in Figs. 7, 8 & 9. We find that the most significant biases in indeed satisfy this relation. For those whose values are small, the dominant bias is induced by propagation from other parameters, so they do not follow this relation.

Depending on its actual amplitude, the galaxy distribution bias may be the dominant systematic error. It may also be non-negligible, or even dominant, compared to the statistical errors in the reconstruction (Figs. 7, 8 & 9). There are possible ways to reduce it. By choosing a finer bin size, we can reduce the galaxy distribution bias caused by the difference in $n(z)$, at the expense of more parameters to constrain. With high-quality imaging or photometry, we could further split galaxies into sub-samples of morphological or spectral types to reduce the difference in clustering strength. If we can eventually reach , the galaxy distribution bias will not be catastrophic, but still significant (Fig. 7).

We caution that this galaxy distribution bias also exists in the calibration technique based on cross correlations between photo-z and spec-z samples [26, 6]. In principle, direct spec-z sampling of the photo-z galaxies [6] allows for direct measurement of the galaxy distribution bias including its stochasticity. Since the galaxy distribution bias has its own sample variance, the spec-z sampling must be sufficiently wide in sky coverage, deep in redshift and reach high completeness. These are crucial issues for further investigation.

### 4.2 The multiplicative error bias

Due to incomplete PSF correction, shear measurements can have multiplicative and additive errors. The additive errors do not bias the self-calibration results, since they do not correlate with galaxies. However, the multiplicative errors, which turn $\gamma$ into $(1+f)\gamma$, can. If $f$ is the same for galaxies whose photo-z remains in the true-z bin and for galaxies that scatter out of it, it does not induce bias in the reconstruction. In principle, however, these galaxies could have different multiplicative errors. The multiplicative error could, for example, depend on galaxy size. If the photo-z error depends on some intrinsic galaxy properties that correlate with galaxy size, then the multiplicative error would vary across different photo-z samples with the same true redshift. In this case, Eqs. 5 and 14 no longer hold. Eq. 14 should be replaced by

$$C^{Gg,P}_{ij} = \sum_{k\geq m} p_{k\to i}\,p_{m\to j}\,(1+\Delta f_{k\to i})\,\tilde b_{m\to j}\,\tilde C^{Gg,R}_{km} . \tag{15}$$

Here, the parameter $\Delta f_{k\to i}$ describes the relative difference in the multiplicative errors of the two galaxy samples in the $k$-th true redshift bin (one scattering to the $i$-th photo-z bin and the other remaining in the $k$-th photo-z bin). Clearly, if , it does not bias the reconstruction, since we just need to redefine . If , the bias induced in is . Current shape measurement algorithms control to levels with potentially larger redshift or size dependence [25], but cosmic-shear analysis of future large surveys will require if induced systematics are to be subdominant to statistical errors [15, 2]. Anticipating future progress in shape measurement, we can expect an induced error , which is sub-dominant to that induced by the galaxy distribution bias and is thus likely negligible. However, if shape measurement fails to reach the required accuracy, the bias induced by the relative multiplicative error must be taken into account carefully.

## 5 Dependence on the fiducial model

The error forecast depends on the fiducial model we adopt, including the galaxy properties and the survey specifications. A thorough analysis over all uncertainties in the fiducial model is beyond the scope of the current paper. Instead, we will present brief discussions on several key issues.

### 5.1 Dependence on the galaxy clustering properties

In the above analysis, we have made a number of simplifications. (1) We have adopted a scale independent and redshift independent galaxy bias for the error forecast. In reality, the galaxy bias is both redshift and scale dependent. As we have seen, the reconstruction does not require assumptions on the actual cosmology or clustering properties, so a variable bias is much like changing the background cosmology, which in principle is independent of the inferred scattering. (2) All the power spectra in Eq. 3, 5, etc. are the ones in the observed cosmic volume. Due to the cosmic variance, they can differ from the ensemble averages that we adopt for the fiducial power spectra. As explained before, these uncertainties also affect the error forecast. Eventually we will apply this self-calibration technique to real data and thus will completely avoid the ambiguity in the fiducial model.

Here we address the impact of scale-dependent bias. As mentioned in §2, the self-calibration with only galaxy-galaxy clustering relies heavily on the shape differences between different $C^{gg,R}_{ii}$. If the 3D galaxy clustering is close to a power law over a large range of scales with a redshift-independent power index, the resulting $C^{gg,R}_{ii}$ are close to self-similar, and the self-calibration with only galaxy-galaxy clustering will degrade significantly. The question is whether the full self-calibration, with the aid of the galaxy-lensing cross correlation measurement, can avoid this potential degradation.

To investigate this issue, we adopt an ad-hoc model for galaxy clustering, in which the galaxy bias is scale dependent such that 3D galaxy power spectrum (variance) takes the form

$$\Delta^2_g(k,z) = 10^{1.25}\,k^{1.85}\,(1+z)^{-2} \quad \text{when } k \tag{16}$$

and when . If Mpc, at and Mpc Mpc, it is close to the matter power spectrum (variance) and thus close to the galaxy clustering with . But it deviates significantly from the matter power spectrum at other and other redshifts. In the limit that , this galaxy power spectrum becomes a strict power law, and we expect the self-calibration based on galaxy clustering alone to fail. Numerically, we find that when Mpc, the Fisher matrix inversion based on galaxy clustering alone blows up, indicating its failure.

Fig. 10 shows the degradation when increasing . Despite the better galaxy clustering measurement resulting from the stronger clustering, the self-calibration accuracy degrades, since the galaxy power spectrum is closer to a strict power law and the degeneracy between up and down photo-z scatters becomes more severe.
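The loss of shape information can be seen in a thin-bin Limber sketch. For a thin bin at comoving distance $\chi_i$, $C_\ell\propto\ell^{-3}\Delta^2_g(\ell/\chi_i)$ up to an $\ell$-independent prefactor, so a pure power law in $k$ gives the same $\ell$ dependence in every bin. The code compares the normalized templates for the power-law form of $\Delta^2_g$ adopted above and for a version with a hypothetical turnover. The bin distances and redshifts, the turnover scale and the units ($k$ in $h$/Mpc, $\chi$ in Mpc/$h$) are assumptions. A second-to-first singular value ratio near zero means the templates are proportional and cannot be distinguished by shape.

```python
import numpy as np


def thin_bin_templates(ell, chi, z, delta2_g):
    """Thin-bin Limber C_ell up to ell-independent factors: C_ell propto ell^-3 Delta^2_g(ell/chi, z)."""
    ell = np.asarray(ell, float)[None, :]
    return ell ** -3 * delta2_g(ell / chi[:, None], z[:, None])


def shape_degeneracy(templates):
    """Second-to-first singular value of the unit-normalized templates (0 means identical shapes)."""
    t = templates / np.linalg.norm(templates, axis=1, keepdims=True)
    s = np.linalg.svd(t, compute_uv=False)
    return s[1] / s[0]


def power_law(k, z):
    return 10 ** 1.25 * k ** 1.85 * (1.0 + z) ** -2        # power-law form adopted in the text


def with_turnover(k, z, k_t=0.3):
    return power_law(k, z) / (1.0 + (k / k_t) ** 2)        # hypothetical turnover at k_t


ell = np.arange(10, 3001)
chi = np.array([800.0, 1600.0, 2400.0])   # hypothetical bin distances [Mpc/h]
z = np.array([0.3, 0.6, 1.0])             # hypothetical bin redshifts
deg_power = shape_degeneracy(thin_bin_templates(ell, chi, z, power_law))
deg_turnover = shape_degeneracy(thin_bin_templates(ell, chi, z, with_turnover))
print(deg_power, deg_turnover)            # round-off level versus clearly nonzero
```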

We expect the full self-calibration to be basically free of this problem, since it mainly relies on the geometry dependence of lensing to break the degeneracy between up and down scatters: a lens can only lens a galaxy behind it. This is indeed what we find numerically. To demonstrate this point, we assume the galaxy bias with respect to the matter density to be deterministic, so that the 3D matter-galaxy cross correlation power spectrum (variance) is given by

$$\Delta^2_{mg}(k,z) = \sqrt{\Delta^2_g(k,z)\,\Delta^2_m(k,z)} . \tag{17}$$

This quantity determines the galaxy-galaxy lensing power spectrum. The accuracy of the full self-calibration is shown in Fig. 10. The reconstruction accuracy only degrades slightly even when the galaxy power spectrum is very close to a strict power-law (e.g. Mpc).

This extreme example confirms our expectation that scale dependence of the galaxy bias is unlikely to alter the major conclusions of this paper, namely the feasibility of our self-calibration technique.

### 5.2 Dependence on other specifications

The scaling of reconstruction accuracy with fiducial quantities can be roughly understood as follows. The observed density-density correlations and the shear-density correlations , where is the matter power spectrum (variance). These are the signals. For the noise in the correlation measurements, we have shown that the shot noise in the shear and galaxy number density measurements is the only relevant source, scaling as and respectively.

Then if relying on galaxy-galaxy lensing measurement alone, the reconstruction error . In combination with galaxy-galaxy clustering measurement, the dependence becomes weaker and we expect that , where . If we adopt a fiducial value instead of , the reconstruction accuracy will degrade by a factor of .

For fixed galaxy distribution, matter power spectrum , and , following similar argument above, we find that

$$\sigma_p \propto f_{\rm sky}^{-1/2}\,\bar n_g^{-1}\,b_g^{-d}\,\gamma_{\rm rms}^{e} . \tag{18}$$

Here, the bias dependence and, as a reminder, .

It is interesting to quantify the performance of the self-calibration technique for surveys like CFHTLS and DES. Based on the above scalings, we can make an order-of-magnitude estimate. (1) One of the major differences between these surveys and the fiducial stage IV survey is the sky coverage. From the $f_{\rm sky}^{-1/2}$ dependence alone, we expect reconstruction errors a factor of 10 larger for CFHTLS, since its sky coverage is a factor of 100 smaller. For DES, we expect the degradation to be a factor of 2. Since the statistical accuracy of these surveys scales in exactly the same way with the sky coverage, the self-calibration technique works equally well for these surveys from this viewpoint. (2) Another difference is that the number density of source galaxies in CFHTLS and DES is likely a factor of 2 smaller. This results in another factor of 2 degradation in the reconstruction accuracy. Since the lensing measurement at is not completely shot-noise dominated, the statistical accuracy has a weaker dependence on . From this viewpoint, the self-calibration technique works better for surveys with higher galaxy number density.
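The arithmetic in (1) and (2) follows from Eq. 18 at fixed bias and $\gamma_{\rm rms}$. This small sketch combines the two factors. The survey-to-fiducial ratios (1/100 and 1/4 in sky coverage, 1/2 in source density) are read off the text above; the DES sky ratio of 1/4 is inferred from its stated factor-of-2 degradation.

```python
def degradation_factor(fsky_ratio, density_ratio):
    """Growth of sigma_p relative to the fiducial survey from Eq. 18, at fixed b_g and gamma_rms.

    sigma_p propto f_sky^{-1/2} nbar_g^{-1}; both ratios are (survey / fiducial).
    """
    return fsky_ratio ** -0.5 * density_ratio ** -1.0


cfhtls = degradation_factor(1.0 / 100.0, 0.5)   # 10 (sky) x 2 (density)
des = degradation_factor(1.0 / 4.0, 0.5)        # 2 (sky) x 2 (density)
print(cfhtls, des)                               # about 20 and 4
```

These factors neglect the shallower depth and worse photo-z of these surveys, as cautioned in the next paragraph.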

We caution that the above estimate neglects many complexities. For example, CFHTLS, DES and Pan-STARRS are shallower than the fiducial survey, so their galaxy number densities at high redshift are likely much smaller, while those at low redshift are comparable. This implies that the reconstruction in these surveys is more affected at high z than at low z. Furthermore, the above estimate neglects the difference in photo-z errors, which are likely considerably worse for CFHTLS and DES. This will definitely affect the reconstruction. However, as we discussed in §2.4, the detection threshold of the contamination rate is mainly determined by the noise-to-signal ratio of the corresponding true-z bins. In this sense, the self-calibration technique works better for worse photo-z estimation.

For a robust forecast of the performance of our self-calibration technique in each specific survey, we need a detailed fiducial model of the galaxy distribution, clustering and photo-z error distribution. Although this task is beyond the scope of this paper, the above estimates imply that the self-calibration technique will be applicable.

## 6 Discussions

It is possible to further improve the statistical accuracy of the self-calibration technique. For example, so far we have treated and as independent quantities. Improvement can be made by utilizing their internal connection: both are determined by the same mass-galaxy cross correlation over the same redshift range. For this reason, they are connected by a simple scaling relation (in the absence of intrinsic alignments), as pointed out by Jain & Taylor [17], Zhang et al. [44] and Bernstein [4].

In weak lensing cosmology, the lensing power spectrum measurement at - is often disregarded, because theoretical predictions at such scales are highly uncertain. However, the shear-shear measurement at such scales contains useful information for improving the photo-z reconstruction, as well as shear-ratio information that is useful for cosmology. For example, it allows better handling of the correlations induced by magnification and size bias, mainly in the background density-foreground shear correlation measurement. We could use this information, usually disregarded in cosmological applications, to improve the photo-z calibration.

So far we have focused only on the reconstructed $p_{i\to j}$. The reconstructed and contain valuable information on cosmology and can be explored further. Since these reconstructed power spectra do not suffer from photo-z scatter, they can be utilized for purposes beyond cosmology. For example, the self-calibration of galaxy intrinsic alignment proposed by Zhang [45] relies on the measurement of to infer the GI correlation that contaminates the weak lensing measurement. This quantity is contaminated by photo-z scatters. Since we can now quantify the photo-z scatters and simultaneously, we can quantify and correct for the effect of photo-z scatters in the self-calibration of galaxy intrinsic alignment.

One key issue missing in this paper is to propagate the errors or biases on the reconstructed into errors on cosmology inferred from shear-shear data. This will tell us whether the errors in the self-calibration are small enough to avoid significant biases or inflated errors in a shear-shear cosmology measurement. Although many of the ’s are uncertain by more than the that is needed to make the biases negligible [6], there are many correlations between these errors which complicate the estimation. This issue definitely deserves further investigation.

We emphasize that our purely photometric self-calibration technique is complementary to techniques based on cross correlations between photo-z and spec-z samples [26, 5], which we call cross-calibration. An advantage of cross-calibration is that it can identify a special type of photo-z error, namely when the mean photo-z is a monotonically increasing function of true-z other than the correct identity function. As explained earlier, the self-calibration technique fails completely for this type of photo-z error. The cross-calibration technique will also likely be better able to infer the galaxy distribution bias that arises when photo-z outliers have a different than other galaxies in the same true-z bin. On the other hand, the self-calibration has a number of advantages over cross-calibration. Since the total number of photo-z galaxies is much larger than that of the spec-z sample, it can reach higher statistical accuracy. Since it measures the photo-z scatters in the whole survey volume, it avoids possible cosmic variance in the photo-z scatters, which could bias the cross-calibration. And since the spec-z targets may be a very different population from the photo-z galaxies, the cross-correlation method will be more susceptible to biases from varying among subpopulations.

The photo-z scatter self-calibration method described here has many attractive aspects: it does not depend on any cosmological priors or on models of the power spectrum; it is unaffected by intrinsic alignments; its errors are determined by shot noise, not sample variance, so that higher source densities are exploited if observed; it remains Gaussian and tractable to high $\ell$; and it is not significantly affected by shear measurement errors. And of course it can be conducted with the same imaging data used for the shear-shear correlation measurement without degrading the shear-shear information content. The Fisher analysis suggests that photo-z outlier rates can be determined with statistical errors of 0.01–1% for bins at . It will be necessary to correct data for lensing magnification bias, and in principle this can be done with little statistical penalty, but the magnification bias factor must be determined to sufficient accuracy. The biggest issue is “galaxy distribution bias,” whereby photo-z outlier galaxies might have a different or bias than the non-outlying galaxies in the same true-z bin. We find that the effective bias of galaxies must vary by in order to avoid systematic errors in scattering rates that exceed the expected statistical errors. This is an area deserving of more detailed attention.

In this paper, we have presented a concept study of the proposed self-calibration. For an idealized survey with stage IV specifications, we have shown that it can in principle reconstruct the photo-z error distribution to useful precision. A more robust forecast should take into account all the extra sources of error listed in §3 and §4, a more realistic fiducial model (§5), and possibly further uncertainties.

## 7 Acknowledgment

We thank Hu Zhan, Sarah Bridle and Jun Pan for many useful discussions and the anonymous referee for many useful suggestions. PJZ thanks the hospitality of the UPenn physics and astronomy department and the Aspen center for physics, where part of the work was done. PJZ thanks the support of the one-hundred talents program of the Chinese academy of science, the national science foundation of China (grant No. 10533030, 10543004, 10821302 & 10973027), the CAS grant KJCX3-SYW-N2 and the 973 program grant No. 2007CB815401. GMB acknowledges support from grant AST-0607667 from the National Science Foundation and Department of Energy grant DOE-DE-FG02-95ER40893.

## References

• Albrecht et al. [2006] Albrecht, A., et al. 2006, Report of the Dark Energy Task Force, arXiv:astro-ph/0609591
• Amara & Réfrégier [2008] Amara, A., & Réfrégier, A. 2008, MNRAS, 391, 228
• Bacon et al. [2000] Bacon, D. J., Refregier, A. R., & Ellis, R. S. 2000, MNRAS, 318, 625
• Bernstein [2006] Bernstein, G. 2006, ApJ, 637, 598
• Bernstein [2009] Bernstein, G. M. 2009, ApJ, 695, 652
• Bernstein & Huterer [2009] Bernstein, G., & Huterer, D. 2009, arXiv:0902.2782
• Fu et al. [2008] Fu, L., et al. 2008, A&A, 479, 9
• Heitmann et al. [2005] Heitmann, K., Ricker, P. M., Warren, M. S., & Habib, S. 2005, ApJS, 160, 28
• Heitmann et al. [2008] Heitmann, K., White, M., Wagner, C., Habib, S., & Higdon, D. 2008, arXiv:0812.1052
• Heitmann et al. [2009] Heitmann, K., Higdon, D., White, M., Habib, S., Williams, B. J., & Wagner, C. 2009, arXiv:0902.0429
• Heymans et al. [2006] Heymans, C., et al. 2006, MNRAS, 368, 1323
• Hirata et al. [2004] Hirata, C. M., et al. 2004, MNRAS, 353, 529 PRD, 70, 063526
• Hirata et al. [2007] Hirata, C. M., Mandelbaum, R., Ishak, M., Seljak, U., Nichol, R., Pimbblet, K. A., Ross, N. P., & Wake, D. 2007, MNRAS, 381, 1197
• Huterer & Takada [2005] Huterer, D., & Takada, M. 2005, Astroparticle Physics, 23, 369
• Huterer et al. [2006] Huterer, D., Takada, M., Bernstein, G., & Jain, B. 2006, MNRAS, 366, 101
• Jain [2002] Jain, B. 2002, ApJ, 580, L3
• Jain & Taylor [2003] Jain, B., & Taylor, A. 2003, PRL, 91, 141302
• Jain & Zhang [2008] Jain, B., & Zhang, P. 2008, PRD, 78, 063503
• Jing et al. [2006] Jing, Y. P., Zhang, P., Lin, W. P., Gao, L., & Springel, V. 2006, ApJ, 640, L119
• Joachimi & Bridle [2009] Joachimi, B., & Bridle, S. L. 2009, arXiv:0911.2454
• Jouvel et al. [2009] Jouvel, S., et al. 2009, arXiv:0902.0625
• Kaiser et al. [2000] Kaiser, N., Wilson, G., & Luppino, G. A. 2000, arXiv:astro-ph/0003338
• Ma, Hu, & Huterer [2006] Ma, Z., Hu, W., & Huterer, D. 2006, ApJ, 636, 21
• Mandelbaum et al. [2006] Mandelbaum, R., Hirata, C. M., Ishak, M., Seljak, U., & Brinkmann, J. 2006, MNRAS, 367, 611
• Massey et al. [2007] Massey, R., et al. 2007, MNRAS, 376, 13
• Newman [2008] Newman, J. A. 2008, ApJ, 684, 88
• Okumura et al. [2009] Okumura, T., Jing, Y. P., & Li, C. 2009, ApJ, 694, 214
• Okumura & Jing [2009] Okumura, T., & Jing, Y. P. 2009, ApJ, 694, L83
• Pen [2004] Pen, U.-L. 2004, MNRAS, 350, 1445
• Rudd et al. [2008] Rudd, D. H., Zentner, A. R., & Kravtsov, A. V. 2008, ApJ, 672, 19
• Schmidt et al. [2009] Schmidt, F., Rozo, E., Dodelson, S., Hui, L., & Sheldon, E. 2009, arXiv:0904.4702
• Schneider et al. [2006] Schneider, M., Knox, L., Zhan, H., & Connolly, A. 2006, ApJ, 651, 14
• Seljak & Zaldarriaga [1996] Seljak, U., & Zaldarriaga, M. 1996, ApJ, 469, 437
• Seljak [2009] Seljak, U. 2009, PRL, 102, 021302
• Smith et al. [2003] Smith, R. E., et al. 2003, MNRAS, 341, 1311
• Van Waerbeke et al. [2000] Van Waerbeke, L., et al. 2000, A&A, 358, 30
• White [2004] White, M. 2004, Astroparticle Physics, 22, 211
• Wittman et al. [2000] Wittman, D. M., Tyson, J. A., Kirkman, D., Dell’Antonio, I., & Bernstein, G. 2000, Nature, 405, 143
• Zehavi et al. [2004] Zehavi, I., et al. 2004, ApJ, 608, 16
• Zhan & Knox [2004] Zhan, H., & Knox, L. 2004, ApJ, 616, L75
• Zhan & Knox [2006] Zhan, H., & Knox, L. 2006, ApJ, 644, 663
• Zhan [2006] Zhan, H. 2006, JCAP, 8, 8
• Zhan & Knox [2006] Zhan, H., & Knox, L. 2006, arXiv:astro-ph/0611159
• Zhang et al. [2005] Zhang, J., Hui, L., & Stebbins, A. 2005, ApJ, 635, 806
• Zhang [2008] Zhang, P. 2008, arXiv:0811.0613

## Appendix A The likelihood analysis and the Fisher matrix layout

We quantify the self-calibration error through the likelihood of the band angular power spectra. As explained in §2, this error estimation differs distinctly from routine cosmological parameter forecasts: the only relevant noise term is shot noise, which makes the estimate much simpler. Here we recast this argument more formally.
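To see why shot-noise-only errors simplify matters: for a Gaussian likelihood, the Fisher matrix generally contains a term built from derivatives of the covariance with respect to the parameters. If the noise is pure shot noise that does not depend on the scattering probabilities, that term vanishes and F_ab = Σ_k (∂C_k/∂p_a)(∂C_k/∂p_b)/σ_k² for a diagonal covariance. The sketch below evaluates this numerically for a hypothetical two-bin toy model (`toy_photoz_spectra`): two photo-z bins, true-z spectra assumed known and mutually uncorrelated, and all numbers made up. The function names are illustrative, and in the full method the true-z spectra would be fit together with the scattering probabilities.

```python
import math


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting, for small dense matrices."""
    n = len(matrix)
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("Fisher matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fisher_matrix(model, params, variances, step=1e-6):
    """F_ab = sum_k (dC_k/dp_a)(dC_k/dp_b) / var_k.

    Valid for a Gaussian likelihood whose diagonal covariance does not depend
    on the parameters (assumed shot noise), so no covariance-derivative term
    appears. Derivatives are taken by central finite differences.
    """
    jac = []  # jac[a][k] = dC_k / dp_a
    for a in range(len(params)):
        up, down = list(params), list(params)
        up[a] += step
        down[a] -= step
        jac.append([(u - d) / (2.0 * step)
                    for u, d in zip(model(up), model(down))])
    return [[sum(ja[k] * jb[k] / variances[k] for k in range(len(variances)))
             for jb in jac] for ja in jac]


def marginalized_errors(fisher):
    """1-sigma marginalized errors: sqrt of the diagonal of F^-1."""
    cov = invert(fisher)
    return [math.sqrt(cov[a][a]) for a in range(len(cov))]


def toy_photoz_spectra(params, true_spectra):
    """Hypothetical two-bin toy model, not this paper's equations verbatim.

    params = (p12, p21): p12 is the fraction of photo-z bin 1 galaxies that
    truly sit in true-z bin 2, p21 the reverse. true_spectra holds (C1, C2),
    the true-z auto spectra per band, assumed known and mutually uncorrelated.
    Returns [C11, C12, C22] of the photo-z bins for every band, flattened.
    """
    p12, p21 = params
    out = []
    for c1, c2 in true_spectra:
        out += [(1 - p12) ** 2 * c1 + p12 ** 2 * c2,
                (1 - p12) * p21 * c1 + p12 * (1 - p21) * c2,
                p21 ** 2 * c1 + (1 - p21) ** 2 * c2]
    return out


if __name__ == "__main__":
    bands = [(1.0e-4, 5.0e-5), (6.0e-5, 4.0e-5), (3.0e-5, 2.5e-5)]  # made up
    fiducial = [0.02, 0.05]                   # made-up outlier fractions
    variances = [1.0e-12] * (3 * len(bands))  # made-up shot-noise variances
    F = fisher_matrix(lambda p: toy_photoz_spectra(p, bands), fiducial, variances)
    print("sigma(p12), sigma(p21) =", marginalized_errors(F))
```

Because the covariance here is fixed, doubling the source density (halving each shot-noise variance) simply doubles F and shrinks every marginalized error by a factor of √2 in this toy, with no sample-variance floor.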

The data we have are a set of measured power spectra contaminated by shot noise, . Here, the superscript “D” denotes the data. is the fluctuation in shot noise. are the power spectra in photo-z bins, which are related to by a set of through Eq. 3 & 5. We want to know the likelihood function . Since the distribution of is completely determined by