Optimal matched filter in the low-number count Poisson noise regime and implications for X-ray source detection
Detection of templates (e.g., sources) embedded in low-number count Poisson noise is a common problem in astrophysics. Examples include source detection in X-ray images, -rays, UV, neutrinos, and search for clusters of galaxies and stellar streams. However, the solutions in the X-ray-related literature are sub-optimal – in some cases by considerable factors. Using the lemma of Neyman-Pearson we derive the optimal statistics for template detection in the presence of Poisson noise. We demonstrate that this method provides higher completeness, for a fixed false-alarm probability value, compared with filtering the image with the point-spread function (PSF). In turn, we find that filtering by the PSF is better than filtering the image using the Mexican-hat wavelet (used by wavedetect). For some background levels, our method improves the sensitivity of source detection by more than a factor of two over the popular Mexican-hat wavelet filtering. This filtering technique can also be used also for fast PSF photometry and flare detection, and it is efficient, as well as straight forward to implement. We provide an implementation in MATLAB.
Subject headings:methods: statistical — techniques: image processing — techniques: photometric
The detection of a signal for which the template is roughly known, in the presence of Poisson noise, is considered to be a notorious problem. In astronomy, this problem appears regularly in the context of source detection in low-number count imaging like X-ray, UV, -ray and neutrino detectors, as well as detection of clustering of objects (e.g., cluster of galaxies identification).
Given the importance of this problem for X-ray waveband astronomy,
several techniques for source detection, in the presence of Poisson
noise, were developed. However, as we show here,
previously available solutions are sub-optimal
In the presence of additive white Gaussian noise, it is well known that the best way to find a signal with a roughly known template, is to cross-correlate the data with the expected signal template. This operation is called matched filtering, and it stands at the base of many detection algorithms such as the LIGO gravitational wave detection (e.g., Owen & Sathyaprakash 1999), source extraction in astronomical images (e.g., Bertin & Arnouts 1996) and many more applications in virtually any field of science and engineering. This simple, but powerful result is the consequence of the Neyman-Pearson lemma (Neyman & Pearson 1933). The derivation of the Gaussian-noise matched filter from the lemma of Neyman Pearson (e.g., §2) proves that the matched filter is the best solution and not only the best linear solution.
The two most common source detection techniques in X-ray astronomy are wavedetect (Freeman et al. 2002) and the sliding cell method (Harnden et al. 1984). Both methods are in fact filtering techniques, while wavedetect filters (i.e., cross-correlates) the image with a wavelet of some form, in the sliding-cell method we filter the image with a top-hat function. In these previous algorithms, the filter (e.g., wavelet) was selected without optimality proof or rigorous justification.
Here, we use the lemma of Neyman-Pearson to derive the optimal source detection technique for the case of Poisson noise. Not surprisingly, when the count-rate is high, this result converges to the well known Gaussian-noise matched filter. We also discuss extensions of this algorithm to flare detection and point-spread function (PSF) photometry. We demonstrate that this new technique provides better results than previously used methods, and we provide an implementation in MATLAB.
The structure of this paper is as follows. In §2 we derive the new algorithm, and in §3 we present extensions of this method. In §4 we discuss some implementation considerations. A step-by-step description of the algorithm is outlined in §5, while in §6 we test and compare the algorithm with other methods using simulations. We briefly present our code in §7, and we conclude in §8.
2. Derivation of the Poisson-noise matched filter
Our goal is to derive the optimal template-detection method when the
noise is Poisson.
Our derivation applies for a signal with any number of dimensions.
The starting point is the lemma of Neyman-Pearson that states:
When performing a hypothesis test between two
where the probability ,
is the most powerful
We denote the measured data by . For example, could be a two-dimensional image in raw counts (e.g., number of photons, electrons or events), in which we would like to find sources with some known point spread function (PSF) or template denoted by . We assume that the pixels in are independent Poisson random variables. Given the data , we would like to make a decision between two hypotheses. The null hypothesis () that there is no source at position in the data, and the alternative hypothesis () that there is a source at position in the data. The measurements (), in our case, are drawn from the Poisson distribution. The Poisson probability density function to get the measurement given the expectancy is
The model for the null hypothesis is:
and for the alternative hypothesis is:
Here is a Poisson random variable with expectancy , is the measured data with multi-dimensional coordinate (e.g., coordinates for an image), is the expectancy value for the background level (e.g., as estimated from a region around the position), is the unknown flux of the source we would like to detect, and is the PSF (or more generally a template) we would like to search in our data. The PSF is normalized to unity
However, is a composite hypothesis as it has a free parameter, . This will be treated later and for the time being we will assume that is known.
Following the directive of the Neyman-Pearson lemma, we write the -likelihood difference at position
Some of the terms in this expression cancel out. Furthermore, we are allowed to remove any term that does not depend on the data (i.e., ) – such terms contribute a fixed constant which we can absorb into the threshold (). Denoting the log-likelihood difference by and simplifying, we get
Equation 9 can be identified with the cross-correlation operation and therefore can be re-written, simultanously for all positions in the image, as
where denotes convolution, denotes coordinates reversal (e.g., ), and is the Poisson-noise optimal filter
This can also be written in Fourier space as
Here the symbol denote the Fourier transform, the bar symbol denotes the complex conjugation. The fact that the bar sign is over the hat symbol means that the complex conjugate operation follows the Fourier transform.
In the language of matched filtering we can identify as the filter. We note that our convention is to subtract the background expectation value from the image prior to filtering, and not to renormalize to unity. Other conventions are valid as long as they are applied consistently.
It is interesting to note that Taylor expansion of Equation 9 leads to the well known matched filter in the case of Gaussian noise (i.e., in this case the filter is ).
However, we are not done yet. The flux of the source we would like to find () is unknown apriori, and Equation 12 is non-linear in . More generally, is not a simple hypothesis as it has a free parameter . Moreover, in this case, it is possible to show that there is no value of which is uniformly most powerful (see Appendix A). A solution for this problem is to set a flux threshold and to filter the image with several filters, each with a different value (). However, as we demonstrate in Appendix A for any practical application it is enough to filter the image with a single value of . Although this is not uniformly most powerful, we show that the loss of sensitivity due to this approximation is of the order of %. Therefore, for detection purposes we have to set in Equation 12 to our preferred flux threshold.
However, there are two additional complications: (i) We do not know apriori what is the flux threshold associated with our desired false-alarm probability; (ii) In order to find sources in we need to look for local maxima in and calculate the probability to get this, or larger, local maxima value given the probability distribution of values in . However, is the result of summation of many Poisson random variables with weights, and the resulting probability distribution is complicated.
These problems can be solved numerically using the following scheme: Given the background () and PSF (), and a desired false alarm probability , we would like to find a self-consistent flux threshold () such that
and is the normalization factor that transforms the units of the score statistics to flux-count units . Note that has the flux units of the measured data prior to background subtraction, while is measured in the background subtracted image. This normalization is simply the summation over a PSF with unit flux multiplied by its filter:
Finally, is the probability distribution function of the background pixels in . We note that in Equation 14 we subtract from as our convention is to subtract the background from the images. While refers to the flux in the original image, refers to the flux in the background subtracted image.
A simple method to find is using numerical Monte-Carlo simulations. In such simulations we need to generate Poisson random images with expectancy , subtract the expectancy , and filter them with our matched filter
where is a function that generates Poisson random variables with
expectancy over an image of size .
Now, is given by the probability distribution (histogram) of values in .
Finally, a good first-iteration guess for is given
where is the width of the PSF (e.g., of a Gaussian PSF). In practice we find that only a few iterations are required in order to find the self-consistent and .
An interesting question is by how much this method improves over a naive matched filter (i.e., cross correlating the image with its PSF)? or other filters, like the one advocated by the wavedetect method (Freeman et al. 2002)? We address these questions, using simulations, in §6.3.
Finally, we would like to emphasize that other filters can be designed such that they will have the same false-alarm probability (). However, detection based on such filters will always have lower completeness (i.e., higher ) compared with the method presented in this paper – this is a consequence of the Neyman-Pearson lemma.
3. Additional applications
3.1. PSF photometry
The statistics can be used to perform PSF photometry in the case of Poisson noise. The prescription is similar to the one suggested in Zackay & Ofek (2017a) for the Gaussian noise case. In a nutshell, each pixel in S contains the PSF-weighted sum of its neighboring pixels. Therefore, providing a PSF photometry estimator.
In order to measure the flux of a source one needs to convert to flux units. The flux estimator, is given by dividing by (Eq. 15)
The estimator for the flux of the source is simply at the position of the source. However, this equation is not implicit and therefore we need to solve it iteratively for . Alternatively, a good approximation is to calculate this equation for several values, on the right-hand side, in some logarithmic steps. Another minor problem with this approach is that it ignores the exact (sub-pixel) position of the source (see additional discussion and details in Zackay & Ofek 2017a).
3.2. Flare detection
We note that the method presented in this paper can be extended to the problem of flare detection in the Poisson-noise regime (e.g., Scargle et al. 2013). In this case all we need to do is to match filter the image also in the time dimension, not only in the position dimensions. Since the position (PSF) and time (flare template) dimensions are independent, the expression for the filter is simply:
Here is the background per unit time, and is the flare template as a function of time (counts per unit time normalized to unity). Note that in this case the cross-correlation operation is performed in both position and time.
This simple extension allows us to filter the image in both the temporal and position dimensions, and to optimally detect flares which have some known template. Even if the flare template is unknown, one may use a bank of top-hat or fast-rise exponential decay functions with various time scales. The main advantage over other methods (e.g., Scargle et al. 2013) is that it will have increased sensitivity due to the additional filtering in the position dimensions. We may further discuss and demonstrate this idea in a future publication.
4. Implementation considerations
4.1. Flat fielding / exposure maps
In some cases, our photon-count image has to be flat fielded. A simple solution is to use the flat field or exposure map to estimate the background locally, and to apply our filter locally with the appropriate background. This is reasonable as long as the background varies slowly compared to the PSF size.
4.2. Background estimation
It is straight forward to estimate the expectancy of a Poisson random variable, by taking the mean value of the counts. However, any decent background estimator should ignore regions that contains bright sources or artifacts. We suggest that a good way to estimate the background is to do this iteratively. I.e., detect sources, estimate the background in source-free regions, and run the source detection again.
A less robust, but somewhat simpler approach, that may work in some cases, is to ignore pixels with events and to use the ratio between the number of pixels with 1 event and 0 events as a robust estimator for the background:
It is important to emphasize that a perfect knowledge of the PSF is not required. This point is further demonstrated in §6.3 where we show that small differences between the optimal filter of the image and the used filter have negligible effect on the source-detection results. This indicates that a perfect knowledge of the PSF is not a requirement (see also a discussion regarding the Gaussian-noise case in Zackay & Ofek 2017a).
4.4. PSF variations
In X-ray images the PSF may vary substantially across the image. A way to deal with this problem is to partition the image to regions in which the PSF (and background) are roughly constant and apply the filter for each such sub image separately. We note that it is possible to code an efficient convolution algorithm that allows the PSF to vary smoothly across the image.
4.5. Energy dependent PSF and background
In many cases in high-energy astronomy, the instrument is sensitive to a wide range of energies, and the PSF and background are energy dependent. Extending our optimal filter to an energy dependent PSF and background is straight forward. One needs to break the image into multiple energy channels, and filter each energy-channel image with its own filter. This approach is valid as long as the energy range in which the PSF and background change is larger than the uncertainty in the photons energy. The result is a score image per energy channel (). Finally, since the energy channels provide independent information, the optimal statistics for source detection is given by the summation over all
We note that in this approach one can also account for the predicted spectrum of the source. In this case can be derived based on simulations that take into account the expected value of the background, PSF, and source flux, as a function of energy.
5. Step by step algorithm
Our Poisson-noise filtering algorithm can be summarized by the following steps: For each image, or section of the image, in which the properties of the PSF and background are uniform:
Estimate the local background (; see §4.2).
Subtract the background expectancy from the image.
Estimate or use the known PSF ().
Select a false alarm probability .
Given solve Equations 13, 14, and 15 for .
This can be done either by interpolating Table 1, or direct calculation using the following steps:
(i). Set to the guess value using Equation 17.
(ii). Simulate a background image (), subtract the background expectancy, and filter (Equation 16).
(iii). Calculate , which is given by the upper quantile of the values in .
(iv). Calculate the flux normalization (Equation 15).
(v). Set , and go to step (ii), until convergence.
(vi). In the final iteration you have and .
Calculate the Poisson-noise filter (Equation 11) by setting from step 5.
Filter the image. This can be done in real space (Equation 10) or using FFT (Equation 12). If using FFT, make sure to FFT shift the filter before the cross correlation operation, such that the filter peak will be at the image origin (corners). Otherwise the convolution operation will introduce a shift to the result.
Find local maxima in the filtered image . If the value of the local maxima is larger than then declare a source detection in the pixel of the local maxima.
We note that the threshold should be adjusted to the look-elsewhere effect – the number of trials per image is , where is the number of pixels in the image and is the Gaussian PSF (in pixels).
Furthermore, we note again that our convention is to normalize P to unity, but not to renormalize to unity, and to subtract the background expectation value from the image prior to filtering. Other conventions are valid as long as they are applied consistently.
In §6.1 we discuss the properties of the probability density function based on simulations, and in §6.2 we present tabulated values of the detection threshold as a function of the background, PSF width, and the false-alarm probability. In §6.3 we compare the Poisson Matched Filter with filtering with the PSF and the Mexican-hat (Ricker) wavelet.
6.1. The properties of
In general is a complicated distribution. Inspection of a simulated shows that it is not a smooth function, but instead looks like peaks and dips over a smooth envelope. This behavior is expected, as originates from a linear combination of discrete Poisson distributions. To emphasize this point, Figure 1 shows for counts pix, Gaussian PSF with pix and . Although, this is a complicated function, it can be calculated.
A problem with our approach for calculating is that it is not an efficient method for calculating the threshold for very small , as in this case a very large number of simulations is required. In Figure 2 we show the false alarm probability as a function of , where the false alarm probability is defined as . This plot suggests that the false alarm probability can be approximated (to within a factor of 2, for large values of ) as a power law with slope of about .
An alternative analytical method for calculating (almost) exactly and without the need for a large number of simulations is to sum the probability over all possible combinations of photons in pixels. If the background is low (e.g., ) and the PSF is circularly symmetric, then the number of possible permutations (e.g., all possibilities, except those with negligible probability, of the number of counts in each pixel in the PSF) is not very large and can be computed by finite number of summations.
6.2. Tabulated flux thresholds
In §7, we provide tools for calculating and as a function of the PSF, , and . However, sometimes it is useful (or more efficient) to derive these thresholds from interpolation of a pre-calculated grid.
In Table 1 we list and , for a Gaussian PSF,
as a function of the PSF width , the background ,
and the false alarm probability .
We note that can be calculated from ,
but here we list both for convenience.
The table is also available electronically
Note. – and , for a Gaussian PSF, as a function of the PSF width , the background , and the false alarm probability . Here we present the first few lines. The full table is available in the electronic version of the paper. In addition it is available online (§7).
6.3. Comparison with wavelets and PSF filtering
Figure 3 shows a 1-D cut through a Gaussian PSF, the Mexican-hat wavelet, and the normalized Poisson filters (i.e., Equation 11), for various background levels. In all cases we assumed a Gaussian PSF with width and a false alarm probability of . For the Mexican hat we selected the support (width) that roughly maximizes the completeness (see below).
From this figure it is apparent that the Poisson-noise matched filter converges to the PSF when the background is high. In fact this plot suggests that when the background is larger than a few counts per PSF area, the Gaussian-noise matched filter (i.e., the PSF) is an excellent approximation to the Poisson-noise optimal filter. Next we see that for low background level the Mexican hat, Gaussian-noise, and Poisson-noise matched filters are very different. Since we know by construction that the Poisson-noise matched filter is optimal, we are led to suspect that using the Gaussian-noise or Mexican-hat filters leads to a loss of information. In other words, we expect that if we will use a sub-optimal filter (e.g., PSF or wavelet), for a fixed false alarm probability flux threshold, we will have lower completeness compared with the optimal filter.
In order to compare the various methods we calculate, using simulations, the detection completeness as a function of the source flux when the false-alarm probability is fixed. We present two sets of simulations. In the first simulation we used counts pix, a Gaussian PSF with width of pix, and false alarm probability , while in the second simulation we used counts pix, and the other parameters remained the same. For each filter, we translated the false-alarm probability to . Then we simulated small images, each with a single source that follows the Gaussian PSF, with a known flux. We cross-correlated these images with each filter we test and check if the source position in the filtered images is above the detection threshold. These simulations were used to estimate the source-detection completeness, as a function of flux.
Figure 4 shows the completeness as a function of the source flux, for the two simulated background levels, for three filters: (i) the Poisson-noise filter (Equation 11); (ii) the PSF itself; and (iii) a Mexican-hat wavelet filter. Since with the Mexican-hat filter we are free to choose the filter width (so called support), we used the support that maximizes the completeness. For our counts pix we find this support to be to pix, while for counts pix it changed to to pix.
In order to better appreciate also the completeness at high fluxes, in Figure 5 we show, for the same parameters, the 1 minus completeness in logarithmic scale.
These figures demonstrate that our new filtering technique provides better results than filtering the image by the PSF or a Mexican-hat wavelet. For very low background levels our new method is better than PSF and wavelet filtering. For example, for the counts pix case, at a flux of 5 counts the completeness are 0.81, 0.75, and 0.71 for the Poisson-noise, PSF, and Mexican-hat filters, respectively. For 50% completeness the flux threshold is 3.0 3.3, and 3.6 counts, respectively. Therefore, in this case, the Poisson-noise filter improves upon the Mexican-hat filter, by about 20% in depth for a fixed completeness.
For intermediate flux levels (i.e., counts pix; lower panel in Figs. 4-5) the PSF filtering converges with the Poisson-noise filtering. However, in this case the Mexican-hat (Ricker) wavelet provides poor results. For example, for sources with flux of 10 counts, the Mexican-hat wavelet filtering completeness is factor of 2.5 times lower than that of the optimal filter. We conclude that applying our new algorithm for e.g., X-ray images, may result in a considerable improvement in source-detection completeness and false-alarm probability.
We developed a code for Poisson-noise optimal source detection.
The code is available as part of the
MATLAB Astronomy & Astrophysics
Given that this code will undergo improvements,
for further information we refer the reader
to the online documentation
We derive the optimal source detection algorithm in the presence of Poisson noise. We demonstrate that this filter improves the completeness, at a fixed false alarm probability, compared with other popular techniques. The algorithm is straightforward to implement and we provide an implementation in MATLAB.
The new algorithm can improve the completeness and purity of source detection in X-ray images, as well as UV, EUV, -ray, and particle detection (e.g., TeV-photons detectors). Furthermore, this algorithm can be adapted for other applications such as flare detection and detection of stars or galaxies clustering in the sky. Following the approach of Zackay & Ofek (2017a,b) and Zackay, Ofek, & Gal-Yam (2016), the new technique can likely be extended for image coaddition and subtraction.
Appendix A An almost optimal filter
A problem with our suggested filter is that it is optimal (by construction) only when the flux of the source is known. We note that in the Gaussian-noise case, it is possible to show that the matched filter solution is uniformly most powerful (UMP). However, in the Poisson noise case this is not true (see Figure 6).
A simple solution to this problem is to filter the image with multiple filters (Eq. 2), each having a different flux () value. Since the problem is not very sensitive to the exact shape of the filter, there is no need to run it for many values (e.g., several logarithmically spaced flux values). However, in practice, using simulations we found that, at a given false alarm probability, the value of has small effect on the completeness. Therefore, using a single value may be sufficient for most applications.
In order to test this we conducted the following simulations with counts pix and Gaussian PSF with . For each source flux level we estimated by applying the Poisson-noise filter (Eq. 2) with . For each we calculated the value of that corresponds to the false alarm probability of . Next, we generated simulated images with a source with flux and Poisson noise, than we applied the Poisson-noise matched filter (Eq. 2) with , for , and calculated the detection completeness (i.e., the fraction of simulations in which ). Figure 6 presents the as a function the the source flux , where is the completeness at flux level , and . Figure 7 shows the same but for counts pix.
These simulations demonstrate that in many cases, the loss of sensitivity due to the use of the wrong in Equation 11, is below %. Therefore, we suggest that our filter with a single value is nearly optimal, and using a single value of is good enough for most practical applications.
- affiliation: Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot, Israel
- affiliation: Benoziyo Center for Astrophysics, Weizmann Institute of Science, 76100 Rehovot, Israel
- slugcomment: Submitted to ApJ
- By sub-optimal we mean that information is being lost. One of the consequences of sub-optimality is that for a fixed false-alarm probability the source-finding completeness is lower compared with the optimal solution.
- Simple hypothesis specifies the population distribution completely. In such an hypothesis, there are no free parameters.
- The power of a hypothesis test is the probability that the test correctly rejects the null hypothesis () when the alternative hypothesis () is true.
- For example, in source detection where the noise is additive white gaussian noise, the matched filter solution is optimal regardless of the flux of the source, which is a free parameter.
- Can be calculated using, e.g., poissinv in MATLAB.
- Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393
- Casella, G., Berger, R. L. 2008, Statistical Inference, Brooks/Cole. ISBN 0-495-39187-5 (Theorem 8.3.17)
- Freeman, P. E., Kashyap, V., Rosner, R., & Lamb, D. Q. 2002, ApJS, 138, 185
- Harnden, F. R., Jr., Fabricant, D. G., Harris, D. E., & Schwarz, J. 1984, SAO Special Report, 393,
- Neyman, J., Pearson, E. S. 1933, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences. 231 (694â706): 289â337.
- Owen, B. J., & Sathyaprakash, B. S. 1999, Phys. Rev. D, 60, 022002
- Scargle, J. D., Norris, J. P., Jackson, B., & Chiang, J. 2013, ApJ, 764, 167