How to coadd images? I. Optimal source detection and photometry using ensembles of images
Abstract
Stacks of digital astronomical images are combined in order to increase image depth. The variable seeing conditions, sky background and transparency of groundbased observations make the coaddition process nontrivial. We present image coaddition methods optimized for source detection and flux measurement, that maximize the signaltonoise ratio (). We show that for these purposes the best way to combine images is to apply a matched filter to each image using its own point spread function (PSF) and only then to sum the images with the appropriate weights. Methods that either match filter after coaddition, or perform PSF homogenization prior to coaddition will result in loss of sensitivity. We argue that our method provides an increase of between a few and 25 percent in the survey speed of deep groundbased imaging surveys compared with weighted coaddition techniques. We demonstrate this claim using simulated data as well as data from the Palomar Transient Factory data release 2. We present a variant of this coaddition method which is optimal for PSF or aperture photometry. We also provide an analytic formula for calculating the for PSF photometry on single or multiple observations. In the next paper in this series we present a method for image coaddition in the limit of backgrounddominated noise which is optimal for any statistical test or measurement on the constantintime image (e.g., source detection, shape or flux measurement or stargalaxy separation), making the original data redundant. We provide an implementation of this algorithm in MATLAB.
Subject headings:
techniques: image processing — techniques: photometric1. Introduction
Coaddition of observations is one of the most basic operations on astronomical images. The main objectives of coaddition are to increase the sensitivity (depth) of observations, to remove signal from high energy particle hits on the detector (sometimes called cosmic rays), and to decrease the point spread function (PSF) width (e.g., deconvolution from wavefront sensing, Primot et al., 1990; lucky imaging, Law et al., 2006). One question that is common for all these tasks is what is the best method to coadd the images. This of course requires a definition of what ”best” means.
Here, we assume that all observations are registered and resampled to the same grid (e.g., using Swarp; Bertin et al., 2002), and focus on the combination operation of the images. There are several common complications for image coaddition. Groundbased images are often taken under variable seeing, background and transparency conditions. Even some spacebased observations may suffer from such problems. For example, Xray images commonly have nonuniform PSF accross the field of view. These complications make the coaddition operation less trivial than simple (weighted) scalar addition.
Coaddition is playing, and will play, a major role in ongoing and future surveys (e.g., SDSS, York et al., 2000; PanSTARRS, Kaiser et al., 2002; PTF, Law et al., 2009, DES, The Dark Energy Survey Collaboration, 2005; LSST, Ivezic et al., 2008). There are many approaches for image coaddition after the registration step. One prestep common to many methods is to reject some of the images with the worst seeing, background or transparency. There is no sensible prescription to which images one should reject, and it is straightforward to show that this approach leads to a loss of sensitivity compared with the maximum possible. One extreme example for this process is lucky imaging (Law et al., 2006), where one often discards up to 99% of the data frames. The next coaddition steps involves either weighted summation of the images, or applying partial filtering (or deconvolution) to each image in order to homogenize the PSFs of all the images followed by coaddition.
For example, Annis et al. (2014) recommended coadding the SDSS Stripe 82 images by weighted summation of the images, where the weights are of the form:
(1) 
and , , and . Here is the weight applied to the image , is proportional to the product of the telescope effective area, detector sensitivity and atmospheric transparency (i.e., fluxbased photometric zero point), is the width of the PSF, and is the variance of all the background noise sources (e.g., background, readout noise). On the other hand Jiang et al. (2014) adopted a similar formula but with , , . Another example for such a weighted summation was given by Fischer & Kochanski (1994) who solved a numerical optimization scheme for the optimal weights in which the images should be added with.
Other approaches are also used. For example, some authors perform PSF homogenization on all the images prior to coaddition. This is done by convolving the frames with some kernel in order to bring all the observations to have the same PSF. For example, Darnell et al. (2009) and Sevilla et al. (2011) define a median seeing PSF, and then for each image find the convolution kernel that will transform the image (via convolution) to a medianseeing image, applying for all images, and then sum the images. This convolution kernel can be found using techniques outlined in Phillips & Davis (1995), Alard & Lupton (1998) and Bramich (2008). However, for images with seeing which is worse than the target seeing, the PSF homogenization operation becomes a deconvolution operation, which amplifies the noise in high spatial frequencies and creates long range correlations in the noise. Another issue of the PSF homogenization approach is the artificial insertion of correlations between neighboring pixels. As a result, any measurement on the data becomes more involved. For example, applying a naive matched filter with the coaddimage PSF will result in an additional loss of information.
In a series of several papers we will tackle the problem of optimal image coaddition. In this paper (paper I), we will lay down the statistical formalism and assumptions we use for the coaddition problem. We construct a coaddition method that achieves the maximum possible signaltonoise ratio () for detection of nonblended faint pointlike sources in the coadded image under the assumption of constant variance Gaussian noise^{1}^{1}1The constant noise requirement can be relaxed as one can solve the problem locally to a region in which the noise variance is roughly constant.. In addition, we present a coaddition method that is optimal for photometric measurements of both faint and bright objects (i.e., even for sources in which the source noise is nonnegligible). These coaddition products are the analogues of linear matched filter for ensemble of images. We note that this coaddition method, optimized for source detection, was used by Masci & Fowler (2009) to reduce the WISE satellite data. However, in order to recover the sharpness of the original images they applied deconvolution after the coaddition process.
It is noteworthy that linear matched filter images are smeared and their pixels are correlated. Therefore, general hypothesis testing and statistical measurements on them requires knowledge and consideration of the spatial covariance matrix. This makes them nonattractive for visualization and non intuitive for performing measurements other than source detection and flux measurement (e.g., resolving binary stars or measuring galaxy shapes).
In Paper II (Zackay & Ofek, 2015) we build on the results of this paper to construct a coaddition method for the special (but common) case of backgrounddominated noise. Equipped with the backgrounddominated noise assumption we find a simple transformation that removes the pixel correlation induced by the matchedfiltering step, and makes the additive noise in all pixels uncorrelated and with equal variance. Most importantly, this technique provides a sufficient statistic^{2}^{2}2A statistic is sufficient with respect to a model and its associated parameter if no other statistic that can be calculated from the same sample provides any additional information as to the value of the models parameter. for any statistical measurement or decision on the data (e.g., source detection, flux measurement, shape measurement). Moreover, this method conserves the information of all the spatial frequencies, and provides a PSF which is typically sharper than the PSF of the best image in the stack we coadd. In addition, this image is practically indistinguishable from a regular astronomical observation, and any image analysis code can be applied to it unchanged. Combining all these properties together, it allows a major reduction in the amount of data products users will need in order to perform any further processing on data (e.g., galaxy shape measurements for weak lensing). Last, this coaddition algorithm is fast and its implementation is trivial.
In future papers (Zackay et al., in prep.) we apply the algorithm from Paper II for the case of rapid imaging through the turbulent atmosphere and where the PSF is either known (through observing a reference star or using a wavefront sensor) or unknown. We show that this method potentially provides better results than the existing approaches for the coaddition of speckle images (e.g., speckle interferometry, Labeyrie, 1970; lucky imaging, Law et al., 2006, and its suggested improvements, e.g., Garrel et al., 2012).
The structure of this paper is as follows: In §2, we review the statistical formalism of image coaddition and refer to some basic lemmas regarding how to combine random variables. In §3, we discuss the optimal coaddition image for faint source detection. In §4 we describe how to obtain the optimal coaddition image for photometric measurements, not restricted to faint sources. In §5, we give a simple to use formula to calculate the expected signaltonoise ratio for source detection and PSF photometry for both single images and ensembles of images. In §6, we suggest possible variants of this method, that albeit suboptimal for source detection and photometry, are resilient to particle hits, and generalize the concept of median for images. In §7, we demonstrate our detection algorithm on simulated and real images and show it surpasses the sensitivity of current popular methods. In §8 we briefly describe the code we use. Finally, we summarize the important formulae and conclude in §9.
2. Statistical formalism and background
Denote the ’th measured image, in a series of backgroundsubtracted observations of the same position by . Further, denote by the point spread function (PSF) of the ’th observation.
The statistical model for the ’th image is given by:
(2) 
where denotes the true (background subtracted) sky image, is a scalar, representing the transparency (same as in §1), represents convolution, and the noise term has variance
(3) 
where is the variance of all the position independent noise – the sum of the sky background variance, the read noise variance and the dark current variance. For simplicity, we assume that the gain is 1 and that the counts are measured in electrons.
In the following subsections we derive several well known important results of signal detection using hypothesis testing. The optimal detection of an attenuated signal in the presence of varying noise is reviewed in §2.1. In §2.2 we derive a general rule for weighted addition of random variables, while in §2.3 we derive the well known linear matched filter solution for source detection.
2.1. Detection and measurement of an attenuated signal in the presence of varying noise
In this section we review the simple problem of how to detect or measure a signal, with a known template, in data given by an ensemble of observations. We strictly assume that the signal in all observations is attenuated linearly with known attenuation coefficients, and that the noise in all the observations is Gaussian. Albeit simple, the solution to this problem is the key towards constructing an optimal algorithm for image coaddition.
Let be a set of independent Gaussian variables with variance
(4) 
Given the null hypothesis, which states that there is no signal, and is denoted by , we assume:
(5) 
where denotes the expectancy of the variable given that the hypothesis is true. Given the alternative hypothesis, that states there is a signal, and is denoted by , we assume:
(6) 
Here are known scaling factors of each observation. The loglikelihood ratio test statistic (which is proven to be the most powerful test at any given size, and therefore optimal. See Neyman & Pearson, 1933), is simply given by:
(7) 
For detection purposes, i.e., to reject the null hypothesis, the term can be ignored, as it does not depend on the data. However, this term is still relevant for flux measurements. Simplifying and absorbing the factor of 2 into the likelihood, we get:
(8) 
The loglikelihood ratio is linear in , hence we can identify a sufficient statistic
(9) 
with a distribution that is independent of , and therefore can be used to reject the null hypothesis against the alternative hypotheses for all values of simultaneously. Therefore, the optimal strategy for detection of a statistically significant signal, is to calculate , and check if it is above or below a certain threshold . The value of is fixed by the desired false alarm probability . To evaluate this threshold, we need to calculate the expectancy and variance of given :
(10) 
Given the independence of and the fact that , the variance of given is:
(11) 
And the threshold is:^{3}^{3}3This threshold was calculated under the convenient approximation that , which is a good approximation for , which is the reasonable significance range for practical surveys.
(12) 
Under the same definitions, if we wish to get the most accurate estimator of , we can use the fact that the likelihood of the data can be calculated for all using (Eq. 9) and (Eq. 11). This is because we can express the log likelihood of given by:
(13) 
where does not depend on and therefore cannot influence any measurement. No matter if the Bayesian or frequentist approaches are used, the likelihood of the data as a function of is sufficient for any further measurement or detection process. Therefore, and are sufficient for any measurement or statistical decision on the set of statistics .
2.2. Weighted addition of random variables
The statistics and we have derived in Equations 9 and 11 naturally arise in a much more general setup, which is known as weighted addition of random variables. Assume we have an arbitrary set of independent random variables and two hypotheses that satisfy
(14) 
We can define the signal to noise ratio () of a statistic with regard to the decision between and by:
(15) 
Next, we can ask: what is the statistic, linear in , that maximizes this ? The solution to this problem is the following weighted mean:
(16) 
which coincides with the previously defined (Eq. 9). Moreover, the maximal is . The detailed solution of this proplem is given in Appendices A and C.
Weighted addition extends also to the case of adding estimators. If we are given a sequence of independent measurements
(17) 
in which we assume only that , The same set of statistics and arise, and the maximum S/N estimator for is :
(18) 
This simple and rigorous fact is derived in Appendix B.
The fact that this same solution repeats under many different general contexts and assumptions, allows us to derive optimal statistics to realworld astronomical problems. The first of them is the well known matched filter for source detection in single astronomical images.
2.3. The matched filter solution for source detection
Now that we are equipped with the simple yet robust tools for adding random variables, we can derive the standard matched filter solution for point source detection in a single image. To allow rigorous statistical analysis of source detection, we assume (throughout the paper) that the source we are trying to detect is well separated from other sources, which might interfere with its detection. Let be an image such that:
(19) 
where is the true, backgroundsubtracted sky image, is the image PSF, and is an additive white Gaussian noise term.
We would like to detect, or measure, a single point source at position (,). Each pixel coordinates (,) in the image is an independent^{4}^{4}4In practice the pixels maybe slightly correlated due to charge repulsion and charge diffusion in a CCD. information source, where the PSF scales the amount of flux each pixel would get. That is:
(20) 
Using equation 9, we can write the optimal statistic for source detection at position for a single image:
(21)  
(22) 
where . Thus, we obtained the well known optimal statistic for the detection of a single source in an image with additive white Gaussian noise. This solution is to convolve the image with it’s reversed PSF (i.e., cross correlation of the image with it’s PSF). This statistic is called the linear matched filter image.
3. Image coaddition for source detection
Now that we have shown that using the tools outlined in §2 we get the optimal statistic for source detection in single images, we can apply these methods for source detection in an ensemble of images. Before doing that it is worth while to develop an intuition on how the solution should look like. Expanding the linear matched filter solution of one image to multiple images we expect the solution to be somewhat like a three dimensional linear matched filter summed over the image index axis, to get a statistic with the dimensions of an image. Therefore, we expect that the solution will be simply matched filtering each image with its own PSF followed by weighted summation.
For source detection, we assume that the noise is approximately Gaussian and that the variance in each pixel, excluding the variance induced by the source itself, is independent of pixel position. Since our null hypothesis is that there is no source, the noise induced by the source should be ignored. However, noise induced by neighboring sources can not be ignored. Therefore, our assumption that the variance is constant (at least locally) in the image is true only if the image is not crowded^{5}^{5}5For detecting a source in the vicinity of other sources, our estimator is obviously biased by the nearby source, which is not accounted for in the statistical model presented in this work. Correctly resolving and measuring blended sources requires a more elaborate technique in general. In paper II in this series, we show a coaddition process that is optimal for all hypothesis testing cases, including resolving blended sources. (i.e., confusion noise can be neglected). Using these assumptions, we denote the variance of the ’th image by:
(23) 
Here is the background (or more formally the spatially invariant variance component) of the image. We can now apply the machinery for coaddition of random variables developed in §2 to the problem of image coaddition.
Lets assume that we would like to detect a point source at position . In this case the model for such a point source is , where is the point source’s flux and is a two dimensional array with at position and zero otherwise. Inserting this to the model of we get:
(24) 
Using the result from Appendix A, we can combine all the pixels from all the images as a list of independent statistics. Therefore the optimal detection statistic for a source at position is:
(25) 
For simplicity represents the coordinate. Using the fact that searching a source at position on a set of images, is exactly as searching for a source at position on the input images shifted by we get:
(26) 
This can be written in vectorial notation as:
(27) 
where . We note that this could be easily computed (using the fact that is a scalar) in Fourier space using the fast Fourier transform:
(28) 
where the sign represents Fourier transform, and the sign denotes the complex conjugate operation. We note that by putting the bar sign over the hat sign we mean that the complex conjugate operation follows the Fourier transform operation.
It is worth while to note that , the coaddition of filtered images is a log likelihood ratio image, where is the optimal test statistic for the existence of a source at . In order to find sources using , we therefore have to search for local maxima. To decide if the source candidate is statistically significant, it is more convenient to normalize by it’s standard deviation
(29) 
The value of each local maximum divided by the standard deviation of the noise (Eq. 29) is the source significance in units of standard deviations. To get in units of flux (best estimate only in the context of backgroundnoise dominated source), we need to normalize differently (see Eq. 18), to get:
(30) 
Analyzing our solution we can see that if we have only one image then the well known linearmatched filter is recovered. However, when we have more than one image, the optimal detection statistic is equivalent to applying a matched filter to each image separately, weighting the images by , and accumulating. This is different than the other solutions found in the literature, where you first coadd and then match filter, or perform partial filtering for homogenization, then coadd, and then match filter. The method we developed is optimal by construction, hence it will have better performance relative to other coaddition schemes. We note that with this method at hand, any additional image, no matter how poor the conditions are, will increase the sensitivity. Therefore, the rejection of images becomes a damaging operation. Because of the robustness of our optimality arguments, whenever the assumptions that we have made on the noise are true we expect this method to surpass the performance of all popular methods found in the literature.
An important question is by how much our method improves source detection relative to other methods. The answer to this question depends on the details, like the distribution of observing conditions (e.g., seeing) and the exact method we would like to compare with. We can quantify the gain of using our method, by answering a simple question: how much survey speed is being lost when a nonoptimal filter is used in the case of a single image? A single matched filter cannot be adequate for multiple images with different PSFs. No matter what weights are later applied in the combination of the different matched filtered images (all with the same filter), the of the other methods cannot surpass the sum of the in their individual pixels.
Given a single image with a Gaussian PSF with width , the best filter is a Gaussian with width . In Appendix D we calculate the loss factor, when one uses a Gaussian filter with a width . We find that the sensitivity ratio of these two cases is:
(31) 
This means that the reduction in survey speed (or the detection information) is:
(32) 
Figure 1 shows this survey speed reduction as a function of k. As expected, this function has a maximum of unity at . Using a width which is a factor of () different than the actual PSF width will result in survey speed loss of about 6% (40%). Therefore, given the typical distribution of seeing conditions, one should expect to obtain an improvement of between a few percents and about 25 percents in survey speed using our method, relative to e.g., the Annis et al. (2014) weighted summation.
4. Optimal photometric measurements from a set of observations
Another important problem in the context of astronomical image processing is measuring the flux of sources. The two leading methods for performing flux measurements are aperture photometry and PSF photometry. Both methods are well defined on single images. When the PSF is exactly known, the method of PSF photometry is the most exact since it applies an appropriate weight to each pixel before summation. Aperture photometry is a good approximation that is often used when either the star is bright, the PSF is unknown, or the measurement is done for sources that are not point like. In this section, we present the correct way to extend photometric measurements to ensembles of observations. In §4.1 we discuss the extension of PSF photometry to ensembles of images and in §4.2 we cover aperture photometry. We note that in the case of spatially homogeneous noise variance (i.e., faint source) and known PSF, it is both easier and more accurate, to use the coaddition method presented in paper II, and then perform aperture/PSF photometry.
4.1. Extending PSF photometry to ensembles of images
We define the optimal photometric measurement as a statistic such that the expectancy of the statistic is the true flux of the source and its variance is minimal. That is, and is minimal. This is equivalent to finding a statistic that maximizes the signaltonoise ratio, and normalizing it to units of flux. Because in the case of photometry we need to take into account the source noise, in this section we will work under the assumption that:
(33) 
where the distribution of is Gaussian with variance
(34) 
which does not distribute uniformly across the image, because bright stars have larger variance components. We note that we still assume that the measured source is well separated from other sources.
Following the same procedure we did for source detection, we get the maximum for the flux measurement. When measuring a point source at position , with true flux we can write:
(35) 
Again, we assume every pixel is statistically independent from the others, and therefore we can apply the previously developed machinery (Eq. 18) pixel by pixel, and get the maximal estimator:
(36) 
However, in order to compute this statistic, we need to know the true flux that we are looking for. This problem could be avoided by either:

Solving numerically for the true in iterations: Guess , compute , and then update , and compute again with the resulting flux from the first iteration. Since the first estimate of is an excellent first guess, the latter is practically converged, and more iterations are unnecessary. This solution is source specific, and therefore should be implemented only if photometry of a single source is needed.

Scanning the space logarithmically  Using a slightly wrong does not lead to significant errors in the measurement. Since the guess of influences only weights that are independent of the data itself, it does not introduce a bias. The standard deviation of a statistic that is calculated with a slightly wrong is generally very close to the optimal value. For example, using Gaussian PSFs with width pixels, and , we find that using which is wrong by a factor of 2, leads to reduction of at most 1% in survey speed. This solution requires the coaddition process to produce several images (, as the relevant range is from the background level up to the maximal dynamic range value. These numbers are typically only up to a factor of from one another).

Aim for a specific . If one knows that the targets have a specific magnitude range, it is possible to use the most relevant .
We want to stress, that the whole discussion on is only relevant for flux estimation of bright sources. For the detection of sources, we are not required to know apriori, as demonstrated in §3.
The calculation of this measurement statistic can be done fast, by noticing that once is assumed, the calculation of the measurement statistic on all image positions can be done by convolution using the fast Fourier transform (FFT)^{6}^{6}6Calculation of convolution directly (without FFT) could also be very fast if the convolution kernel is small..
To gain intuitive interpretation of the PSF photometry measurement statistic, it is useful to inspect the behavior of the statistic on one image. In the expression for , we can identify the two general approximations astronomers often make in the extreme limits of the relation between the brightness of the target and the background. On the one hand, if a target is brighter than the background on many pixels, we can see that the optimal measurement statistic we obtained is similar to aperture photometry with the optimal aperture, as in the center the weights of the pixels converge to 1. On the other hand, we see that if a target is much weaker than the background variance, the statistic converges to a matched filter on the image. When extending this to multiple images we can see that in the statistic , the combination of different images is rather nontrivial.
4.2. Extending aperture photometry for ensembles of images
Even when the PSF is unknown, or when the object shape for which we perform flux measurements is poorly known (e.g., a galaxy), it is still important to combine the flux measurements from different images correctly. This is because observations in the ensemble will generally have different noise properties. For each image, depending on the seeing conditions, source flux and its PSF, one can choose the best aperture radius that maximizes the in the image (see below).
We can calculate from each image the aperture photometry measurement
(37) 
where is given by:
(38)  
Substituting Equation 2 into Equation 37:
(39) 
Denote the source intrinsic light distribution by . satisfies:
(40) 
and . Next, we define
(41) 
to be the fraction of the source flux within the measurement aperture. Now, we can write:
(42) 
The variance of is:
(43) 
Now that the problem is cast into the same form as solved in §2, we can use the maximum measurement statistic (Eq.18) to find the best estimator for aperture photometry:
(44) 
Note that if the source is very strong, the term might dominate the noise variance . In that case, one needs to have a reasonable guess of the source flux prior to the coaddition process (see §4.1).
Moreover, one needs to solve for the best aperture radius (that depends on ). This solution can be obtained separately for each image by maximizing the squared signal to noise of :
(45) 
where denotes the value of the parameter for which the function obtains its maximum. Maximizing this could be done if one has a model for , which could be deduced from the intrinsic size of the object measured and a rough PSF model.
5. Estimating the limiting flux and the expected S/N of sources
For several reasons, including calculation and optimizations, it is worth while to have a formula for the of PSF photometry over multiple images. Albeit this formula is easy to derive, we are not aware of a simple analytic expression for the of PSF photometry in the literature for either single or multiple observations.
Following the previous section, we do not necessarily assume that the sources are weak compared to the sky, and therefore our measurement model is:
(46) 
where is Gaussian with mean and variance
(47) 
where is the background noise, and is the Poisson noise of the source.
In order to calculate the we refer to the fact that is an additive quantity. That is, for any two uncorrelated statistics , we can build a statistic
(48) 
such that the of satisfies
(49) 
This fact is proven in Appendix C. Using this, all that we need to do in order to calculate the total of the optimal coaddition of a source with flux in an ensemble of images is to sum the over all the pixels in all the images.
Therefore, the for flux measurement (the source noise term in the variance is irrelevant for detection purposes) is given by:
(50)  
For the sake of signal to noise calculation, it is a good enough approximation to assume that the PSF is a two dimensional symmetric Gaussian. Rewriting the above with a Gaussian PSF, with width , and replacing the and coordinates by the radial coordinate this can be written as
(51)  
This integral has an analytic solution given by:
(52) 
In the sourcenoise dominated case (), the formula simplifies to
(53) 
In the backgroundnoise dominated case (), the becomes:
(54) 
We stress that this equation is correct only if the numbers and are in units of electrons. Another remark is that Equation 54 is actually the exact (no approximation) for the detection of a source with flux , because in this situation, the relevant variance should be exactly . Therefore, by isolating in Equation 54 we can get a formula for the limiting magnitude for detection at a given :
(55) 
6. Robust coaddition
A drawback of the method presented so far is that it uses the summation operator which is not resilient against outliers. Outliers can be a major source of noise and often require a special treatment. The typical way outliers are dealt with is either using median coaddition, min/max rejection, or sigma clipping.
The summation in either Equation 26 or 27 includes weights, and in either case we may want to choose an operation which is more robust. Because of the weights in Equations 26 and 27, a simple median operation is not possible. Instead, one can use the weighted median estimator to replace the weighted average. The weighted median of a set of estimators
(56) 
is simply defined as the 50% weighted percentile of the normalized estimators , where each of the estimators is given the weight of the inverse standard deviation^{7}^{7}7Inverse standard deviation and not inverse variance. A way to derive this is to define a symmetrical exponential loss function for the error in each image and find the value that minimizes the global loss function. This value is given analytically by the weighted median., . In a similar manner, one could apply sigma clipping to the estimators, removing estimators that deviate from the average estimated value of by more than 3 of its own standard deviation.
It is important to note, that these operations make sense only when we measure the flux of unblended objects using the matched filtered images^{8}^{8}8Performing median on regular images will yield weird artifacts when performed on images with different PSFs. Examples for such are seeing distribution dependent bias, flux dependent PSF, and the effective removal of any observation with a PSF which is different (better or worse) from the majority of the images.. Note that since our method uses the matched filter image as the estimator, sharp outliers like cosmic rays or hot pixels will tend to reduce in significance (as we are using the wrong filter for their detection).
Given the problems involved in using robust estimators, we would like to suggest a different approach, which is to identify and remove sharp outliers using image subtraction, given an artifact free image subtraction statistic. Such an image subtraction algorithm can be found in Zackay, Ofek & GalYam (2015). In the situation where a large number of images is coadded (), any outlier that will influence the final measurement must be easily visible in the subtraction image of any two images. Therefore, the detection and removal of any significant outliers prior to coaddition should be possible, and probably constitutes the most accurate approach for dealing with artifacts. We will further discuss this in Zackay, Ofek & GalYam (2015).
7. Tests
To verify that our method indeed works properly and to try and identify possible problems we tested it on simulated data (§7.1) and real data (§7.2).
7.1. Simulated data
The actual improvement of the coaddition technique depends on the distribution of the observing conditions (e.g., seeing, background and transparency). Therefore, we use parameters that roughly mimic typical observing conditions. Obviously, for different surveys and use cases these parameters could vary significantly.
We repeated the following simulation 100 times: We simulated 20 pix images, each with 2048 stars in an equally spaced grid. For simplification, we assumed all the images have equal transparency. The seeing distribution was taken as uniform with full width at half maximum (FWHM) between 1.5 and 6 pixels. The background distribution was uniform between 500 to 1900 e pix. The point spread functions were assumed to be circular two dimensional Gaussians. We adjusted the brightness of the sources to be close to the detection limit in the coadd image (100 photons per image, see §5). The images were created aligned, so no further registration was applied.
Next, we summed the images using various coaddition schemes. The techniques we tested are: (i) Weighted summation of the images using the Annis et al. (2014) scheme (see Equation 1); (ii) Weighted summation of the images using the Jiang et al. (2014) method (see Equation 1); (iii) Our method. We note that unlike Annis et al. (2014) or Jiang et al. (2014) we did not remove any images^{9}^{9}9Given the fact that weights (if calculated correctly) should take care of low quality images, we regard this step as damaging rather than beneficial.. To evaluate the detection of each method, we averaged the of all sources at their correct positions in the coaddition image (using Eq. 29). Given the number of sources simulated this average is accurate to the level of about . We then calculated, using the specific seeing and brightness parameters of the images, the theoretically maximal for source detection (Eq. 54).
In Figure 2, we plot the distribution of the ratio of the obtained with the various methods and the theoretically calculated . It is clear that the method presented in this paper achieves the maximum possible , and that the symmetric scatter around it’s value is the measurement error of the achieved . It is further clear that for this specific case, our method provides a higher than the method of Jiang et al. (2014), and a higher than Annis et al. (2014). These translate to a survey speed increase of about 10% over Jiang et al. (2014), and 20% over Annis et al. (2014).
7.2. Real data
We performed several comparisons of our method with other techniques, using real data. Our test data originates from the Palomar Transient Factory (PTF^{10}^{10}10http://www.ptf.caltech.edu/iptf; Law et al. 2009; Rau et al. 2009) data release 2. The images were obtained under various atmospheric conditions. All the images were reduced using the PTF/IPAC pipeline (Laher et al. 2014), and were photometrically calibrated (Ofek et al. 2012).
We used four sets of images. For each set we prepared a deep reference image. The reference image is based on coaddition of a subset of the best images using the Annis et al. (2014) weights and no filtering. We also selected a subset of images which we coadded using the various techniques including: equal weights, Annis et al. (2014) weighting, Jiang et al. (2014) weighting, our optimaldetection coaddition and the optimal coaddition method described in paper II. In order to minimize nonlinear registration effects we used a section of pixels near the center of each field. The four sets of images, along with their selection criteria, are listed in Table 1. Small cutouts of the coadd images are presented in Figure 3. We note that it is very difficult to spot the fine differences between these images by eye and quantitative tests are required.
Set  Field  CCD  #im  Type  Criteria 

1  100031  6  425  Ref.  Variance1000 e & FWHM4” & PSF stars 
45  coadd  All images taken on Oct, Nov, Dec 2012 & PSF stars  
2  100031  6  425  Ref.  Variance1000 e & FWHM4” & PSF stars 
48  coadd  All images taken on the first 9 days of each Month in 2011 & PSF stars  
3  100031  4  263  Ref.  Variance1000 e & FWHM4” & PSF stars 
39  coadd  All images taken on Oct, Nov, Dec 2012 & PSF stars  
4  100031  4  263  Ref.  Variance1000 e & FWHM4” & PSF stars 
17  coadd  All images taken on the first 9 days of each Month in 2011 & PSF stars 
Note. – Selection criteria for reference images and coadd images in the various sets used for testing the coaddition methods. Field indicate the PTF field ID, while CCDID is the CCD number in the mosaic (see Law et al. 2009; Laher et al. 2014). Type is either Ref. for the deep reference image, or Coadd for the subset we coadd using the various methods. Note that field 100031 is in the vicinity of M51.
We multiply each image by its CCD gain (to work in electron units). The images were coadded using sim_coadd.m (Ofek 2014). This function first registers the images, using sim_align_shift.m, subtracts the background from all the images (sim_back_std.m), calculates the weights (for the various methods, weights4coadd.m), and estimates the PSF for each image (build_psf.m), and finally coadds the images. The program optionally filters each image with its PSF prior to coaddition (sim_filter.m). All these functions are available^{11}^{11}11http://webhome.weizmann.ac.il/home/eofek/matlab/ as part of the astronomy and astrophysics toolbox for MATLAB (Ofek, 2014) (see additional details in §8). We note that the coaddition technique described in paper II, is implemented in sim_coadd_proper.m.
In order to compare between the various techniques, we first run our source extraction code mextractor.m. Like SExtractor (Bertin & Arnouts, 1996), this code uses linear matched filter to search for sources. However, while SExtractor thresholds the filtered image relative to the standard deviation of the unfiltered image, our code does the thresholding relative to the filtered image. This small, but important, difference means that our thresholding is always done in units of standard deviations, so images with different PSFs can be compared easily. We set the detection threshold to . We note that the coadded images were always filtered using their PSF.
We matched the sources found in each coadded image, against sources found in the deep reference image. Table 2 lists the number of sources detected in each image (), the number of sources in the reference that are matched in the coadd (), the number of sources in the reference that are unmatched in the coadd image (), and the number of false detections in the coadd (). Our method consistently finds more (3% to 12%) real sources than other weightedsummation methods. We note that the slight increase in the number of false sources using our method is due to the effective better PSF that our method has relative to regular weighted addition (after matchedfiltering; see paper II). An image with better seeing, once matched filtered, has a larger number of uncorrelated scores, and therefore an increased amount of sources with score that is larger than some threshold. This effect is reproduced in simulations, and could be easily accounted for in the survey design by slightly increasing the threshold above which a source is declared statistically significant. It is further important to note that even if we fix the number of false positives to some constant, our method still detects more sources than the other techniques. In Table 2 we also list the results based on the method presented in paper II. For source detection, the technique described in paper II is identical (up to small numerical errors) to the method discussed in this paper. However, the method described in paper II has several important advantages, and hence should be used whenever adequate.
Set  Method  

1  Deep  0  2433  
Equal weights  0.65  1136  1087  1346  49  
Annis et al. (2014)  0.91  1332  1255  1178  77  
Jiang et al. (2014)  0.89  1299  1229  1204  70  
This work  1  1417  1324  1109  93  
paper II  1  1419  1325  1108  94  
2  Deep  0  2433  
Equal weights  0.59  1183  1085  1348  98  
Annis et al. (2014)  0.93  1481  1343  1090  138  
Jiang et al. (2014)  0.90  1421  1301  1132  120  
This work  1  1541  1390  1043  151  
paper II  1  1542  1391  1042  151  
3  Deep  0  4062  
Equal weights  0.53  1645  1539  2523  106  
Annis et al. (2014)  0.84  1955  1796  2266  159  
Jiang et al. (2014)  0.81  1912  1759  2303  153  
This work  1  2206  1976  2086  230  
paper II  1  2206  1976  2086  230  
4  Deep  0  4062  
Equal weights  0.83  2319  2007  2055  312  
Annis et al. (2014)  0.82  2144  1981  2081  163  
Jiang et al. (2014)  0.92  2272  2062  2000  210  
This work  1  2401  2125  1937  276  
paper II  0.99  2400  2124  1938  276 
Note. – Quantitive comparison between images coadd using various technique matched against a deep reference image. The four blocks, separated by horizontal lines correspond to the four data sets in Table 1. is the approximate survey speed ratio between the image compared and our optimal coaddition method (see text for details). is the number of detected sources to detection threshold of 4. is the number of real sources (i.e., detected sources which have a counterpart within in the deep reference image. is the number of real undetected sources (i.e., sources detected in the reference which are not detected in the coadd image). is the number of false detections.
Next, we compared the detection of the sources in the various images^{12}^{12}12Note that the detection is defined without the sourcenoise term in the denominator (see Eq. 54).. The reason for using is that it is an additive quantity, that is proportional to the survey speed and to the detection information. In order to do the comparison, we run mextractor.m again in forceddetection mode. In this mode, we provide the code with a list of positions of the sources detected in the deep reference image, and the code measures the detection significance and at these locations. For each one of the different coaddition methods on the subset of images we calculated the following quantity: For each star detected in the reference image, we divide its measured^{13}^{13}13The is measured by filtering the image with its PSF, normalizing the filtered image by its own standard deviation, and reading the value at the local maximum that corresponds to the source. in a coaddition image by its S/N measured in our optimally coadded image. Then, we sum the squares of these ratios, and calculate their median. This roughly gives the survey speed in each coadd image, relative to our coaddition method, and is listed in Table 2 (). In these specific examples, our method provide a factor of 17% to 47% improvement over unweighted coaddition, and 8% to 18% improvement over the weighted addition methods. We note that this is sensitive to the exact seeing distribution of the observations and we estimate that the typical improvement will be between a few percents to 25%, relative to Annis et al. (2014). Finally, in Figure 4 we present, for data set 3, the relative ratio between detection of individual sources in the coadded images relative to the detection in the optimal coadd image. Again, our coaddition provides better both at the faint and bright ends. We note that using the techniques described in §4, the flux measurement in the bright end can be further improved.
8. Code and implantation details
The formulae we present in this paper are straight forward to code. However, most of the attention in the implementation should be given to preprocessing steps and measurements of the properties of the images. This includes estimating the background mean, the variance, the PSF and cosmic ray removal.
Code that performs the coaddition technique suggested in this paper is available as part of the Astronomy and Astrophysics toolbox for MATLAB (Ofek, 2014). This package can be used for all the image processing steps, including the debias, flat field correction and cosmicray removal. However, here we cover only the functions that are closely related to the coaddition step.
Table 3 lists the highlevel functions related to coaddition that we provide in the Astronomy and Astrophysics toolbox for MATLAB (Ofek, 2014). Many other lowlevel functions are documented in the code. These functions are under constant improvement, and we expect that versions with better performances will be available in the near future. Each function has a detailed help section with examples. Furthermore, a manual of the Astronomy and Astrophysics toolbox for MATLAB is also available.
We note that usual implementation details like registration, resampling to the same grid, measuring the background and variance levels, and measuring/interpolating the PSF, as always, require attention. However, the attention to the details required by this method is not different than that required for the successful application of other methods. Below we comment on several important implementation details:
Background and variance estimation: The background and variance in real wide field of view astronomical images cannot be treated as constants over the entire field of view. Therefore, we suggest to estimate them locally and interpolate. To estimate the background and variance one needs to make sure that the estimators are not biased by stars or small galaxies. Our suggestion is to fit a Gaussian to the histogram of the image pixels in small regions^{14}^{14}14We are currently using arcsec blocks., and to reject from the fitting process pixels with high values (e.g., the upper 10percentile of pixel values).
Estimating the transparency: The transparency of each image is simply its fluxbased photometric zero point^{15}^{15}15See appendix A of Ofek et al. (2011) for a method to calculate this zeropoints.. However, one has to make sure that these zero points are measured using PSF photometry rather than aperture photometry, otherwise the zeropoints may depend on the seeing.
Estimating the PSF: Among the complications that may affect the PSF measurement are pixelization, interpolation and resampling. Furthermore, the PSF is likely not constant spatially and it also may change with intensity due to charge self repulsion. This specifically may lead to the brighterfatter effect (e.g., Walter 2015). We note that the fact that one needs to estimate the PSF in order to run this method should not be viewed as a drawback, as any decent method that finds sources in the image requires this step anyway.
Name  Description 

sim_coadd.m  Coadd a list of images, using various weighting schemes. 
The function also allows for filtering the images prior to the coaddition.  
The function can also align the images, calculate the weights and PSFs.  
sim_coadd_proper.m  Proper coaddition of images (see (Zackay & Ofek, 2015)). 
The function can also align the images, calculate the weights and PSFs.  
sim_align_shift.m  Register a set of images against a reference image. 
The function assumes the images can be registered  
using an arbitrary large shift, but only a small rotation term.  
psf_builder.m  Construct a PSF template by resampling the pixels around 
selected bright/isolated stars.  
weights4coadd.m  Calculate parameters required for calculation of weights for 
coaddition. Including the background, its variance, estimate of the  
fluxbased zero points (i.e., transparency), and measure the PSF.  
sim_back_std.m  Estimate the spatialydependent background and variance of images. 
Note. – Highlevel functions relevant for coaddition, which are part of the Astronomy and Astrophysics toolbox for MATLAB (Ofek 2014).
9. Summary
We argue that popular image coaddition methods do not achieve the maximal possible for source detection and photometry. We derive the optimal statistic for source detection and photometry under the assumptions that the noise is approximately Gaussian and that the target is well separated from other bright sources. We show that the optimal way to coadd images for source detection is by filtering (i.e., crosscorrelating) each image with its PSF, and then sum with weights. This method is summarized by Equation 27 (or equivalently, Eq. 28). In order to find sources we need to find local maxima in the calculated score map. The significance of these sources in units of standard deviations is given by Equation 29. We note that the computational requirements of applying the method are low, as the crosscorrelation operation can be computed using the fast Fourier transform (FFT)^{16}^{16}16The run time of FFT on an image is typically an order of magnitude faster than reading/writing an image from the hard disk.. We also derive optimal statistics for PSF photometry (Eq. 36) and aperture photometry (Eq. 44) for an ensemble of images. Finally we derive a formula to calculate the for PSF photometry in an ensemble of images with a symmetric Gaussian PSF (Eq. 52). For statistical tasks other than source detection or photometric measurements, we refer readers to paper II in this series, in which we provide a coaddition method that is optimal for any statistical measurement or decision, under the more restrictive (but common) assumption of background dominated noise.
We demonstrate our coaddition method for source detection on simulated images as well as on real images. Our method increases the survey speed (for faint source detection and photometry) by between a few percents to 25% over traditional methods, both in theory and in practice.
Appendix A Weighted addition of independent random variables for hypothesis testing
Here, we will show that the same statistic that was derived in §2.1 as an exact solution for detection of an attenuated signal in the presence of varying Gaussian noise, is the solution of the much more robust question – maximizing the of the weighted addition of independent random variables. Given a set of statistical variables, with two hypotheses and , with expectencies:^{17}^{17}17We note that even if , it is always possible to transfer the problem to hypothesis testing on , in which case the expectancy given the null hypothesis is again zero.
(A1) 
Further, we assume that the variance of is equal under the assumption of both hypotheses. i.e,
(A2) 
If we know the exact distributions of the variables given both hypotheses, we can use the NeymanPearson lemma (Neyman & Pearson, 1933), which states that the optimal test statistic for the decision between and is the loglikelihood ratio test. Here, we want to construct a test statistic that we can apply even when the exact distributions are unknown, but their first and second moments are known.
Given the fact that all the statistics are assumed to be independent, it is reasonable to assume that the correct way to combine the variables is via a linear sum (if the distributions are known we can simply sum the difference in the log of the probablity of observing for each hypothesis). That is, we are interested in a linear statistic of the form:
(A3) 
If the set of variables we have is large, so that the distribution of a weighted sum of the variables can be approximated by a Gaussian distribution for both hypotheses, then the only important properties of the resulting sum are the expectancy of given both hypotheses, and the variance of given both hypotheses. Moreover, if the statistic has equal variance under both hypotheses, then there is one number that characterizes our ability to distinguish between the hypotheses, which is the squared signaltonoise ratio:
(A4) 
Thus, we want to find the linear combination that maximizes the . i.e., find a set of weights such that the score in Equation A3 will have maximal (Equation A4).
We assume that , , and are complex variables (this will become useful in future papers in the series). Substituting {0,} as the expectancies of given the two hypotheses, we get:
(A5) 
Given the hypothesis , where , the variance of is:
(A6) 
In order to maximize the squared with respect to , we equalize all the partial derivatives to zero:
(A7) 
which gives us:
(A8) 
where the accent denotes complex conjugation. Simplifying, we get the equation:
(A9) 
Noticing that multiplying all the weights by a constant factor does not change the , we can simplify further and get:
(A10) 
Thus, we now have a general formula for weighted addition of random variables.
We note that if the exact distributions of the random variables are known, then the optimal combination is the loglikelihood ratio test statistic. This statistic could be composed of some function of the variables themselves that is not linear. This means that the best score will be of the form
(A11) 
What we have derived in this appendix, is the best linear approximation to this score, that could be derived under much less exact knowledge on the variables themselves.
Appendix B Weighted addition of independent estimators
Here, we will show that the same statistic that was derived in §2.1 and Appendix A is the solution (up to a multiplicative constant) of another general question: What is the maximum linear combination of independent estimators?
Given a set of statistics such that
(B1) 
where are known complex variables,
(B2) 
We want to find the best weighted linear combination estimator
(B3) 
such that and is minimal. Calculating the expectancy of we get:
(B4) 
while the variance of is:
(B5) 
Maximizing the of the estimator:
(B6) 
is equivalent to minimizing the variance of with respect to a fixed . Moreover, is invariant to scalar multiplication of because:
(B7) 
Therefore, we can maximize with respect to all . After normalization the minimum variance estimator is:
(B8) 
To maximize with respect to , we equalize all the partial derivatives to zero:
(B9) 
which gives us:
(B10) 
Where the accent denotes complex conjugation. Simplifying, we get the equation:
(B11) 
Because is invariant under multiplication by a constant factor, we get:
(B12) 
Adjusting the estimator to match our first requirement that we can finally write:
(B13) 
Appendix C of the addition of optimally weighted statistics
In Appendix A we find the best way to add statistics even when we do not know their exact distributions . Here we would like to derive a closed formula for the of the resulting statistic. Denote by
(C1) 
Writing the expression for the statistic , the maximum weighted addition of the variables defined in Appendix A:
(C2) 
Calculating the expected difference in given the two different hypotheses we get:
(C3) 
while the variance of is:
(C4) 
Therefore, the squared of is simply given by:
(C5) 
This means that the squared of a statistic which is the optimal weighted addition of a set of statistics is the sum of squared ’s of the individual statistics. This property now allows us to easily estimate the sensitivity of observations ahead of time, with an intuitive closed formula. This (almost) same calculation applies also for optimal weighted addition of estimators, and we omit it to prevent cumbersome repetitions.
Appendix D How much survey speed is lost when we use the wrong PSF?
Here we derive an analytic solution to the question how much sensitivity is lost in the detection process when the wrong linear matched filter is used. Following previous sections, we define the measured image by:
(D1) 
and assume that the source we are trying to detect is well separated from other sources. We also assume that the source we are looking for is faint relative to the background, leading to a spatially invariant noise variance:
(D2) 
For simplification, we assume that the PSF,