Multi-target Detection with
an Arbitrary Spacing Distribution
Motivated by the structure reconstruction problem in cryo-electron microscopy, we consider the multi-target detection model, where multiple copies of a target signal occur at unknown locations in a long measurement, further corrupted by additive Gaussian noise. At low noise levels, one can easily detect the signal occurrences and estimate the signal by averaging. However, in the presence of high noise, which is the focus of this paper, detection is impossible. Here, we propose two approaches—autocorrelation analysis and an approximate expectation maximization algorithm—to reconstruct the signal without the need to detect signal occurrences in the measurement. In particular, our methods apply to an arbitrary spacing distribution of signal occurrences. We demonstrate reconstructions with synthetic data and empirically show that the sample complexity of both methods scales as SNR in the low SNR regime.
We consider the multi-target detection (MTD) problem  to estimate a signal from a long, noisy measurement
where is the linear convolution of an unknown binary sequence with the signal, further corrupted by additive white Gaussian noise with zero mean and variance , and we assume . Both and are treated as deterministic variables. The non-zero entries of indicate the starting positions of the signal occurrences in . We require the signal occurrences not to overlap, so consecutive non-zero entries of are separated by at least positions. Figure 1 gives an example of the measurement that contains three signal occurrences at different noise levels.
MTD belongs to the wider class of blind deconvolution problems , which have applications in astronomy [19, 26], microscopy [25, 23], system identification , and motion deblurring [21, 12], among others. The main difference is that we are only interested in estimating the signal : we treat as a nuisance variable, while most literature on blind deconvolution focuses on estimating both of them. This distinction allows us to estimate the signal at higher noise levels, which are usually not addressed in the literature on blind deconvolution, specifically because cannot be accurately estimated in such regimes, as we argue momentarily.
This high-noise MTD model has been studied in [6, 7] under the assumption that the signal occurrences either are well separated or follow a Poisson distribution. In applications, however, the signal occurrences in the measurement typically follow an arbitrary distribution. In this study, we extend the framework of [6, 7] to allow arbitrary spacing distribution of signal occurrences by simultaneously estimating the signal and the distance distribution between consecutive signal occurrences.
The solution of the MTD problem is straightforward at high signal-to-noise ratio (SNR), such as the case shown in Figure 1(b). The signal can be estimated by first detecting the signal occurrences and then averaging. In the low SNR regime, however, this simple method becomes problematic due to the difficulty of detection, as illustrated in Figure 1(c). We address this issue using two different approaches—autocorrelation analysis and an approximate version of the expectation maximization (EM) algorithm .
Autocorrelation analysis relates the autocorrelations calculated from the noisy measurement to the signal. The signal is then estimated by fitting the autocorrelations through least squares. This approach is efficient as it requires only one pass over the data to calculate the autocorrelations.
In the second approach, the approximate EM algorithm, the signal is reconstructed by iteratively maximizing the data likelihood, which marginalizes over but does not estimate it explicitly. In contrast to autocorrelation analysis, the approximate EM algorithm scans through the whole dataset in each iteration, and hence requires much longer computational time.
In this study, we demonstrate the reconstruction of the underlying signal from the noisy measurement using the two proposed approaches. Our numerical experiments show that the approximate EM algorithm provides slightly more accurate estimates of the signal in the low SNR regime, whereas autocorrelation analysis is considerably faster than the EM approach, especially at low SNR. It is empirically shown that the sample complexity of both approaches scales as SNR at low SNR, with details discussed later in the text. In the high SNR regime, the sample complexity of both methods scales as SNR, the same as the sample complexity of the simple method that estimates the signal by first detecting the signal occurrences and then averaging.
The main contributions of this work are as follows.
We formalize MTD for arbitrary signal spacing distribution, making it a more realistic model for applications.
We propose two algorithms to solve the MTD problem. In particular, our algorithm based on autocorrelations illustrates why, to recover , we need not estimate all of , but rather only a low-dimensional summary of it; and why it is possible, in principle at least, to solve this problem for arbitrary noise levels (given a sufficiently long observation). For our second algorithm, we note that the popular EM method is intractable, but we show how to implement an approximation of it, which performs well in practice.
Ii Autocorrelation analysis
In what follows, we discuss autocorrelations of both (of length ) and (of length ). To keep notation general, we here consider a sequence of length , and define its autocorrelations of order for any integer shifts as
where is zero-padded for indices out of the range . We have when represents the signal and when represents the measurement . Since the autocorrelations only depend on the differences of the integer shifts, for the second- and third-order autocorrelations we have the symmetries
In this section, we describe how the autocorrelations of the noisy measurement are related to the underlying signal . These relations are later used to estimate without the need to identify the locations of signal occurrences, which are nuisance variables and difficult, if not impossible, to determine reliably at high noise levels. For completeness, we include a brief discussion of the special case where the signals are well separated and refer the reader to [6, 7] for details. The generalization to arbitrary signal spacing distribution follows.
Ii-a Well-separated signals
The signals are said to be well separated when the consecutive non-zero entries of are separated by at least positions. Under this condition, the autocorrelations of with integer shifts within the range are unaffected by the relative positions of signal occurrences. As a result, these autocorrelations of provide fairly direct information about those of , and therefore about itself. Due to the presence of Gaussian noise, the entries of are stochastic. Taking the expectations of the first three autocorrelations of with respect to the distribution of Gaussian noise, we obtain these relations (see Appendix A):
where and . Here, denotes the signal density, where is the number of signal occurrences in . The delta functions, defined by and , are due to the autocorrelations of the white Gaussian noise. As indicated by the studies in phase retrieval [11, 4, 8], in general a 1D signal cannot be uniquely determined by its first two autocorrelations. It is thus necessary (and generically sufficient ) to include the third-order autocorrelations to uniquely determine .
The autocorrelations of are estimated by averaging over the noisy measurement. This data reduction requires only one pass over the data, which is a great computational advantage. As shown in Appendix A, the average over noisy entries of results in standard deviations that scale as for autocorrelations of order at high noise levels. Therefore, we need in order for the calculated from the noisy measurement to be a good estimator for . Since the SNR is proportional to , the sample complexity therefore scales as SNR. We also expect the error of the reconstructed signal to depend on the errors of the highest-order autocorrelations used in the analysis at high noise levels.
We estimate the signal density and signal by fitting the first three autocorrelations of via non-linear least squares:
The weights are chosen as the inverse of the number of terms in the respective sums. Since the autocorrelations have symmetries, as indicated in (2), the summations above are restricted to the non-redundant shifts. Due to the errors in estimating with , we expect the root-mean-square error of the reconstructed signal ,
to scale as in the low SNR regime.
Ii-B Arbitrary spacing distribution
The condition of well-separated signals can be further relaxed to allow arbitrary spacing distribution by assuming that the signal occurrences in any subset of follow the same spatial distribution. To this end, we define the pair separation function as follows.
For a given binary sequence identifying starting positions of signal occurrences (that is, ), the pair separation function is the number of consecutive 1’s in separated by exactly positions, divided by so that .
In particular, since we exclude overlapping occurrences; then is the fraction of pairs of consecutive signal occurrences that occur right next to each other (no spacing at all), is the fraction of pairs of consecutive signal occurrences that occur with one signal-free entry in between them, etc.
In contrast to the well-separated model, autocorrelations of may now involve correlating distinct occurrences of , which may be in various relative positions. The crucial observation is that these autocorrelations depend only weakly on , namely, through the much smaller-dimensional unknown . We show the explicit relations here (see Appendix A for details):
After calculating the first three autocorrelations of from the noisy measurement, we estimate the signal and the parameters and by fitting the autocorrelations of through the non-linear least squares
Numerically, we find that this problem can often be solved, meaning that, even though cannot be estimated, we can still estimate and the summarizing statistics and . As discussed in Section II-A, the RMSE of the reconstructed signal is expected to scale as in the low SNR regime owing to the errors in estimating with .
Ii-C Frequency marching
To minimize the least squares in (II-A) and (II-B), we ues the trust-regions method in Manopt . However, as the least squares problems are inherently non-convex, we observe that the iterates of the trust-regions method used for minimization are liable to stagnate in local minima. To alleviate this issue, we adopt the frequency marching scheme  in our optimization. The idea behind frequency marching is based on the following heuristics:
The coarse-grained (low-resolution) version of the original problem has a smoother optimization landscape so that it is empirically more likely for the iterates of the optimization algorithm to reach the global optimum of the coarse-grained problem.
Intuitively, the global optimum of the original problem can be reached more easily by following the path along the global optima of a series of coarse-grained problems with incremental resolution.
Our goal is to guide the iterates of the optimization algorithm to reach the global optimum of the original problem by successively solving the coarse-grained problems, which are warm-started with the solution from the previous stage.
The coarse-grained problems are characterized by the order of the Fourier series, , used to express :
where . Instead of the entries of , the least squares are minimized with respect to the Fourier coefficients in our frequency marching scheme. The order is related to the spatial resolution by Nyquist rate:
The spatial resolution subdivides the signal into units and represents the “step size” of the shifts for the coarse-grained autocorrelations.
We define the coarse-grained autocorrelations of of order for integer shifts as
where rounds the argument to the nearest integer. The coarse-grained autocorrelations of are given by sub-sampling the original autocorrelations calculated from the full measurement. With denoting the bin centered at , where , we estimate the coarse-grained autocorrelations of by
where , , and and represent the number of terms in the respective sums.
Above, we define as the product of the signal density and the coarse-grained pair separation function:
where . In each stage of frequency marching, we estimate the Fourier coefficients and the parameters by fitting the coarse-grained autocorrelations of through the non-linear least squares
Our frequency marching scheme increments the order from to , and the computed solution of each stage is used to initialize optimization in the next stage.
Iii Expectation maximization
In this section, as an alternative to the autocorrelations approach, we describe an approximate EM algorithm to address both the cases of well-separated signals and arbitrary spacing distribution. A frequency marching scheme is also designed to help the iterates of the EM algorithm converge to the global maximum of the data likelihood.
Iii-a Well-separated signals
Given the measurement that follows the MTD model (1), the maximum marginal likelihood estimator (MMLE) for the signal is the maximizer of the likelihood function . Within the EM framework , the nuisance variable is treated as a random variable drawn from some distribution under the condition of non-overlapping signal occurrences. The EM algorithm estimates the MMLE by iteratively applying the expectation (E) and maximization (M) steps. Specifically, given the current signal estimate , the E-step constructs the expected log-likelihood function
where the summation runs over all admissible configurations of the binary sequence . The signal estimate is then updated in the M-step by maximizing with respect to . The major drawback of this approach is that the number of admissible configurations for grows exponentially with the problem size. Therefore, the direct application of the EM algorithm is computationally intractable.
In our framework of the approximate EM algorithm, we first partition the measurement into non-overlapping segments, each of length , and denote the segment by . Overall, the signal can occur in different ways when it is present in a segment. The signal is estimated by the maximizer of the approximate likelihood function
where we ignore the dependencies between segments. Our approximate EM algorithm works by applying the EM algorithm to estimate the MMLE of (13), without any prior on the signal. As we will see in Section IV, the validity of the approximation is corroborated by the results of our numerical experiments.
Depending on the position of signal occurrences, the segment can be modeled by
Here, first zero-pads entries to the left of , and circularly shifts the zero-padded sequence by positions, that is,
where . The operator then crops the first entries of the circularly shifted sequence, which are further corrupted by additive white Gaussian noise. In this generative model, represents no signal occurrence in , and enumerate the different ways a signal can appear in a segment.
In the E-step, our algorithm constructs the expected log-likelihood function
given the current signal estimate , where
with the normalization . From Bayes’ rule, we have
which is the normalized likelihood function , weighted by the prior distribution . In general, the prior distribution is independent of the model and can be estimated simultaneously with the signal.
Denoting the prior distribution by , we rewrite (14) as (up to an irrelevant constant)
The M-step updates the signal estimate and the priors by maximizing under the constraint that the priors lie on the simplex :
As shown in Appendix B, we obtain the update rules
where , and
where . We repeat the iterations of the EM algorithm until the estimates stop improving, as judged within some tolerance.
Iii-B Arbitrary spacing distribution
We extend the approximate EM approach to address arbitrary spacing distribution of signal occurrences by reformulating the probability model: each segment can now contain up to two signal occurrences. In this case, the two signals can appear in different combinations in a segment, which is explicitly modeled by
where and . Given the signal estimate , the probability that and is
and we have the normalization condition
Incorporating the terms with two signal occurrences, the E-step constructs the expected log-likelihood function as
where the prior is denoted by .
Under the assumption that the signal occurrences in any subset of follow the same spatial distribution, we can parametrize the priors with the pair separation function (see Definition 1). Recall that is the number of pairs of consecutive signal occurrences whose starting positions are separated by exactly positions. The priors can be related to the probability that two signal occurrences appear in the combination specified by in a segment of length selected from the measurement , which is estimated by
where and . Here, is the number of segments that realize the configuration of signal occurrences specified by , and indicates the total number of segment choices. The priors can similarly be related to the probability that a signal occurs in the way specified by in a segment of length :
where . An interesting observation is that the number of signal occurrences can be estimated by
since the signal occurrences are required not to overlap. The value of the prior is determined by the normalization
In the special case of well-separated signals, where for , we have and the normalization .
In the M-step, we update the signal estimate and the parameters by maximizing under the constraint that (24) is satisfied:
As shown in (III-B), is additively separable for and , so the constrained maximization (III-B) can be reached by maximizing with respect to and the parameters separately. Maximizing with respect to yields the update rule (see Appendix C for derivation)
where . To update the parameters , we note that the function is concave with respect to these parameters and the constraint (24) forms a compact convex set. Therefore, the constrained maximization is achieved using the Frank-Wolfe algorithm .
Iii-C Frequency marching
Because the iterates of the EM algorithm are not guaranteed to converge to the maximum likelihood, we develop a frequency marching scheme to help the iterates converge to the global maximum. Recall that frequency marching converts the original optimization problem into a series of coarse-grained problems with gradually increasing resolution. The coarse-grained version of the EM algorithm is characterized by the spatial resolution
where . This spatial resolution models the signal shifts in the coarse-grained problem, with the corresponding expected likelihood function given by
where . This likelihood function is equal to the original one shown in (III-B) restricted to the rounded shifts, so the signal can be updated with (III-B) by ignoring the terms with irrelevant shifts.
where , , , and denotes the coarse-grained pair separation function. Similarly, the priors are uniquely specified by the positive parameters , where
We update the priors by maximizing with respect to the parameters under the constraint that the parameters lie on the simplex defined by
Incrementing from to , the estimated signal and priors in each stage of frequency marching are used to initialize the optimization problem in the next stage.
Iv Numerical Experiments
This section describes the construction of our synthetic data and the performance of the two proposed methods. A measurement is generated by first sampling integers from , with the constraint that any two samples differ by at least entries. The sampling is done by generating the random integers uniformly one by one, rejecting any integer that would violate the constraint with respect to previously accepted samples. The integers indicate the starting positions of the signal occurrences, which are recorded as the non-zero entries of the binary sequence . The noisy measurement is given by the linear convolution of with the signal , with each entry of further corrupted by additive white Gaussian noise of zero mean and variance . Our synthetic data all have length , and are constructed with the signal shown in Figure 2(a), which has length . Also shown in Figure 2(b) and 2(c) are examples of the corrupted signals by different levels of noise. The number of signal occurrences is adjusted so that the signal density is equal to 0.3 and 0.5 for the cases of well-separated signals and arbitrary spacing distribution, respectively. We use for the case of well-separated signals, while is chosen to test our methods for arbitrary spacing distribution of signal occurrences, with the resulting pair separation function shown in Figure 3. The code for all experiments is publicly available at https://github.com/tl578/multi-target-detection.
For both cases of signal spacing distribution, we reconstruct the signal from synthetic data using the two proposed methods with the frequency marching scheme over a wide range of . When the frequency marching scheme is not implemented, we observe that the algorithms usually suffer from large errors due to the non-convexity of the problems. A total of 20 instances of synthetic data are generated for each value of , and each instance is solved by the two methods with 10 different random initializations. The computed solutions that minimize the cost functions defined in (II-A) or (II-B) are chosen as the reconstruction for each instance using autocorrelation analysis. For the EM approach, the computed solutions with the maximum data likelihood are selected as the reconstruction for each instance. Given the signal estimate , we approximate the data likelihood for the cases of well-separated signals and arbitrary spacing distribution by
The RMSEs of the reconstructed signal for the case of well-separated signals are shown in Figure 4. As predicted in Section II-A, the RMSE for autocorrelation analysis scales with at high SNR and at low SNR. Interestingly, the EM algorithm exhibits the same behavior, which empirically shows that the two approaches share the same scaling of sample complexity in the two extremes of noise levels. The same scaling is not observed for the reconstructed signal density (not shown), although the relative errors are generally well below a few percents even in the noisiest cases.
Figure 5 shows the RMSEs of the reconstructed signal for the case of arbitrary spacing distribution. As a comparison, we also implemented the algorithms that assume well-separated signals on the same datasets. We first see that autocorrelation analysis assuming well-separated signals obtains poor reconstruction at all noise levels. Although the EM approach that assumes well-separated signals produces more accurate estimates, the nearly constant RMSE at high SNR indicates the systematic error due to model misspecification. By contrast, the algorithms that assume arbitrary spacing distribution achieve much better reconstruction, and the resulting RMSEs have the same scaling behaviors at the two extremes of SNR as shown in Figure 4. The RMSEs of the reconstructed values of , defined in (10), are shown in Figure 6. Although our methods are not able to reconstruct to high precision, the results shown in Figure 5 indicate the necessity to include the pair separation function in the model to achieve good reconstruction of the signal.
Figure 7 shows the average run time to solve an instance of synthetic data for the two approaches. The lower speed of the EM approach results from the need to cross-correlate all the observed segments with the signal estimate in each iteration. This issue becomes especially prominent in the case of arbitrary spacing distribution because the configuration of two signals has to be considered for each observed segment. On the other hand, autocorrelation analysis requires shorter computation time by summarizing the data as autocorrelation statistics with one pass over the data. We note that the distinct computational speed for autocorrelation analysis and the EM algorithm is also observed in the related problem of multi-reference alignment [5, 10]. The difference in run time and the similar reconstruction quality of the two methods at low SNR (see Figure