MTD with rotations

# Multi-target detection with rotations

## Abstract.

We consider the multi-target detection problem of estimating a two-dimensional target image from a large noisy measurement image that contains many randomly rotated and translated copies of the target image. Motivated by single-particle cryo-electron microscopy, we focus on the low signal-to-noise regime, where it is difficult to estimate the locations and orientations of the target images in the measurement. Our approach uses autocorrelation analysis to estimate rotationally and translationally invariant features of the target image. We demonstrate that, regardless of the level of noise, our technique can be used to recover the target image when the measurement is sufficiently large.

###### Key words and phrases:
Multi-target detection, autocorrelation analysis, bispectrum, single-particle reconstruction, cryo-EM

## 1. Introduction

Let be a noisy measurement image that contains randomly rotated and translated copies of a target image . More precisely, suppose that is supported on the unit disc, and is the rotation of by angle about the origin. Further, let be the discretization of defined by for a fixed integer . We assume that the measurement has the form

 (1) M(x)=p∑j=1Fϕj(x−xj)+ε(x),

where are uniformly random rotations; are arbitrary translations; and is i.i.d. Gaussian noise on with mean zero and variance , see the example in Figure 1.

We further impose a separation condition for , which ensures that the targets in the measurement are separated by at least the diameter of their support. We also assume a density condition so that the targets appear in the measurement at some minimal density.

Given the measurement , the objective is to recover the function . This problem is called multi-target detection (MTD) with rotations [7, 17]. Motivated by single-particle cryo-electron microscopy (cryo-EM), we focus on the low SNR regime, see Figure 1(c), where estimating the unknown translations and rotations is challenging [3, 6, 15]. We pose the following question.

###### Question.

Suppose that is a measurement of the form described in (1) for fixed signal radius and density . If the variance of the noise  is fixed (but might be arbitrarily high), can the function be estimated from  to any fixed level of accuracy when is sufficiently large?

In this paper, we develop a mathematical and computational framework for MTD with rotations, and show empirically that the answer to the above question is affirmative. In particular, we describe an algorithm for recovering the function  from the measurement and demonstrate its effectiveness and numerical stability. Additionally, in Section 3, we consider a simplified version of this statistical estimation problem in one dimension, where we are able to establish a theoretical foundation for this algorithm.

## 2. Motivation and related work

### 2.1. Motivation

Our interest in the MTD model arises from the structure determination problem for biological molecules. In the past decade, cryo-EM has emerged as a potent alternative to X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy to resolve the structures of proteins that either cannot be crystallized or are too complex for NMR. In cryo-EM, a solution that contains many copies of the target particle is rapidly cooled to form thin vitreous ice sheets whose thickness is comparable to the single molecule size. These sheets are then imaged with an electron microscope. The measurements in cryo-EM can be modeled as two-dimensional tomographic projections of identical biomolecules at unknown locations and orientations followed by some image distortion due to the imaging system. The projection images are embedded in a large, noisy image, called a micrograph. The crux of single-particle cryo-EM reconstruction is that, with sufficiently many micrographs, projection images of similar molecule orientations can be combined to improve the SNR and, in turn, reconstruct the high-resolution three-dimensional structure of the molecule.

The current computational pipeline for cryo-EM requires particle picking, the extraction of the biomolecule projection images from the micrographs [10, 11, 14, 27, 32, 31]. Then, the three-dimensional structure is built from the extracted images using a variety of algorithms [5, 12, 13, 21, 26, 29]. This approach is problematic for small particles where the SNR of micrographs is low, and detection becomes impossible [6, 15]. As such, the difficulty of detection sets a lower bound on the usable molecule size in the current analysis workflow of cryo-EM data.

Interest in signal recovery beyond the detection limit has prompted the realization that the locations of the signal in the measurement are nuisance parameters; the emerging claim is that signal recovery can be achieved directly from the measurement [6]. Methodologies for direct image estimation have been inspired by Zvi Kam’s introduction of autocorrelation analysis to the structure reconstruction problem dealing with randomly oriented biomolecule projections [16]. The process involves accumulating the “spatial correlations”, or autocorrelations, of signal density in the measurements in order to average out the noise without estimating the rotations. These averages are then used for the reconstruction of the target image. Following Kam’s seminal paper on autocorrelations, several procedures based on correlations and moments have been proposed for cryo-EM and related modalities, e.g.,  [4, 8, 1, 18, 20, 24, 25, 28].

### 2.2. Related work

The problem addressed in this paper—with rotated and translated iterations of  within —extends previous works on the MTD model [7, 17]. In particular, we extend [19] by providing new theoretical understanding of a 1-dimensional model, and demonstrating empirically that reconstruction is possible from a measurement of the form (1). This is an important step toward the reconstruction of molecules in the undetectable domain. More generally, it attests to the possibility of direct image estimation from measurements so that limitations on particle picking do not necessarily translate to limitations on structure determination.

## 3. One-Dimensional Problem

Before considering the two-dimensional problem (1), we introduce an analogous problem in one dimension. This simplified version will allow us to derive solid theoretical justification for the autocorrelation framework we devise for the two-dimensional case.

### 3.1. Measurement

Let be a one-dimensional target signal supported on , and be the result of cyclically rotating the support of . That is, for , where we consider an integer modulo to be an element of , and when . Here, we will work with an one-dimensional analogue of the two-dimensional measurement defined in (1), where the measurement  is given by

 (2) M(x)=p∑j=1Fτj(x−xj)+ε(x),

where are uniformly random cyclic shifts; are arbitrary translations; and is i.i.d. Gaussian noise on with mean zero and variance . We plot an examples of the 1-dimensional measure with three different levels of noise in Figure 2.

The circular shifts are the one-dimensional analogues of the two-dimensional rotations in (1). To extend the previous assumptions about the target separation and bounded density to this one-dimensional formulation, we assume that for all and that . In Theorem 3.2, we will additionally impose that the discrete Fourier transform (DFT) of is non-vanishing.

As above, our objective is to estimate the function from the measurement  in the low SNR regime. In particular, we would like to show that can be reliably and accurately estimated from at any fixed level of noise, which might be arbitrarily high, as long as the size of the measurement is sufficiently large. The reliability and accuracy of this estimate will be quantified below. The approach is based on seeking features of that determine the function and are invariant to translations and circular shifts of the support . The construction of these invariant features is based on autocorrelation analysis.

### 3.2. Invariant features

Recall that is supported on and is a rotated version of . We can define features of that are invariant to rotations and translations. The most direct example is the mean of the function

 (3) TF=12nn−1∑x=−nF(x).

Motivated by autocorrelation analysis, the mean above can also be interpreted as the first-order autocorrelation. The rotationally-averaged second-order autocorrelation is defined by

 UF(x1)=12nn−1∑τ=−n12nn−1∑x=−nFτ(x)Fτ(x+x1).

Considering the sum geometrically, we observe that is only a function of the magnitude , and so, cannot contain sufficient information to recover . Thus, the critical invariant is the rotationally-averaged third-order autocorrelation , which is defined by

 (4) VF(x1,x2)=12nn−1∑τ=−n12nn−1∑x=−nFτ(x)Fτ(x+x1)Fτ(x+x2).

By construction, both and are invariant under translations and rotations . That is, and when for fixed and .

### 3.3. Estimation from measurement

The function can be estimated from a measurement of the form described in §3.1. For simplicity, let us extend the separation condition so that it also holds periodically in the sense that . We define the third-order autocorrelation of the measurement by

 AM(x1,x2)=1mm∑x=1M(x)M(x+x1modm)M(x+x2modm),

where an integer modulo is taken as an element of .

The following lemma shows that can be estimated from (namely, from the data) if is much larger than . Information theoretic results that were derived for a closely related model called multi-reference alignment indicate that this is the optimal estimation rate in the low SNR regime where while and are fixed [2, 4, 20].

###### Lemma 3.1.

Suppose that everywhere for some constant . Under the one-dimensional model (2), we have:

 E(AM(x1,x2))=γnVF(x1,x2)+2γTFσ2(δ0(x1−x2)+δ0(x1)+δ0(x2)),

and

 Var(AM(x1,x2))=O(nm(γF6max+σ6)),

where the expectation and variance are taken with respect to the random cyclic shifts and the Gaussian noise, and when and otherwise.

###### Proof.

See Appendix A. ∎

The retrieval of from , combined with Lemma 3.1, would result in the extraction of from the measurement up to a rotation, given a sufficiently large measurement.

### 3.4. Recovery from invariant features

The DFT of the function considered as a function on its support is defined by

 (5) ak:=n−1∑x=−nF(x)e−2πikx/(2n),k∈{−n,…,n−1}.

We now show that determines via a closed form when its DFT is non-vanishing. We remark that such a non-vanishing condition is standard for problems related to autocorrelation inversion, see for example [8, 20].

###### Theorem 3.2.

Suppose that the DFT of expressed in (5) is non-vanishing. Then, can be determined from via a closed form expression such that for some . That is, can be recovered up to a circular shift.

###### Proof.

Let designate the third-order autocorrelation

 (6) AF(x1,x2)=12n2n−1∑x=−2nF(x)F((x+x1)mod4n)F((x+x2)mod4n),

where an integer modulo is taken to be element of . Observe that we have

 VF(x1,x2)=12nn−1∑τ=−nAFτ(x1,x2).

Indeed, since is supported on , taking it as a periodic function in (6) does not change the result. Let denote the Fourier coefficients of considered as a periodic function on ; that is,

 bm:=n−1∑x=−nF(x)e−2πimx/(4n).

By Fourier inversion on the interval , we have

 F(x)=14n2n−1∑m=−2nbme2πimx/(4n).

Substituting the representation of in terms of the coefficients into and summing over yields

 AF(x1,x2)=12n1(4n)32n−1∑m1,m2=−2nbm1bm2b−m1−m2e2πi(m1x1+m2x2)/(4n).

Next, by taking the two-dimensional DFT of on , we can recover for :

 bm=n−1∑j=−n(12nn−1∑k=−nake2πikj/(2n))e−2πimj/(4n)=n−1∑k=−nγm,kak,

where

 γm,k:=12nn−1∑j=−ne2πikj/(2n)e−2πimj/(4n).

Observe that if we have and if . It follows that

 b2k1b2k2b−2k1−2k2=ak1ak2a−k1−k2

for . The quantity , called the bispectrum of the function , is invariant under cyclic shifts of the underlying function and determines uniquely, up to a global cyclic shift  [8, 23, 30]. Thus, given , we can determine for some , as desired. ∎

## 4. Two-dimensional problem

After having established the theoretical foundation for the one-dimensional problem above, we aim to extend the recovery of the underlying function to two dimensions. In order to make this estimation tractable, it is necessary to make regularity assumptions on the function ; we build the foundation for these assumptions below.

### 4.1. Invariant features in the continuous setting

As in §3.2, we define the continuous two-dimensional analogues for the features of that are invariant under translations and rotations. As before, the first invariant is the mean of the function

 qf:=∫R2f(x)dx.

Letting be the rotation of by angle about the origin, we define rotationally-averaged second-order autocorrelation by

 rf(x1):=12π∫2π0∫R2fϕ(x)fϕ(x+x1)dxdϕ.

Finally, to gain enough information for the recovery of , the rotationally-averaged third-order autocorrelation is

 sf(x1,x2):=12π∫2π0∫R2fϕ(x)fϕ(x+x1)fϕ(x+x2)dxdϕ.

In this case, observe that is a function of and the angle between  and . Geometrically, as a function of three variables, potentially contains enough information to recover .

### 4.2. Invariant features in the discrete setting

As in the one-dimensional case, we will focus on the recovery of some discretization of from some discretization of  under similar assumptions to those in Theorem 3.2. Restricting our attention to this problem is consistent with the fact that actual measurements are discretized over a pixel grid. Conveniently, this also considerably simplifies the presentation of the method.

We define the discretization of by

 Fϕ(x)=fϕ(x/n),forx∈Z2,

where is a fixed integer that determines the sampling resolution. We define the discrete rotationally-averaged third-order autocorrelation by

 (7) SF(x1,x2):=12π∫2π014n2∑x∈Z2Fϕ(x)Fϕ(x+x1)Fϕ(x+x2)dϕ.

Since is supported on the open unit disc , it follows that is supported on , and is supported on

 X:={−2n,…,2n−1}2⊂Z2,

which contains points.

### 4.3. Estimation from measurement

Suppose that , a measurement of the form in (1), is given. We define the third-order autocorrelation of as by

 AM(x1,x2):=1m2∑x∈Z2M(x)M(x+x1)M(x+x2).

Recall that the measurement is defined by

 M(x)=p∑j=1Fϕj(x−xj)+ε(x),

where are random angles, are translations, and is noise (see §1). As before, we assume that images in are separated by at least one image diameter according to

 |xj1−xj2|≥4n,forj1≠j2,

and that the density of the target images in the measurement is for a fixed constant . Under these assumptions, it is straightforward to show that for any fixed level of noise , fixed signal radius and fixed ,

 (8) AM(x1,x2)→γ2πSF(x1,x2)+γ2πσ2μF(δ(x1)+δ(x2)+δ(x1−x2)),

as (see for example [6]), where is the discrete mean of defined by

 μF=14n2∑x∈Z2Fϕ(x).

As such, (8) relates the third-order autocorrelation of the measurement to the invariant features and . In practice, and can be estimated from : can be estimated by the variance of the pixel values of in the low SNR regime, while can be estimated by the empirical mean of . As a result, , a feature of the image, can be estimated from , a feature of the measurement, up to a constant factor.

### 4.4. Band-limited functions on the unit disc

The Dirichlet Laplacian eigenfunctions on the unit disc are solutions to the eigenvalue problem

 {−Δψ=λψin Dψ=0on ∂D,

where is the Laplacian, and is the boundary of the unit disc. In polar coordinates , these eigenfunctions are of the form

 (9) ψν,q(r,θ)=Jν(λν,qr)eiνθ,

where , is the -th order Bessel function of the first kind, and is the -th positive root of . Recall that is a solution to the differential equation

 y′′(r)+1ry′(r)+(1−ν2r2)y(r)=0.

Therefore, by writing the Laplacian in polar coordinates, we have

 −Δψν,q(r,θ)=−(∂rr+1r∂r+1r2∂θθ)ψν,q(r,θ)=λ2ν,qψν,q(r,θ),

and, as such, is the eigenvalue corresponding to the eigenfunction . Therefore, the projection operator

 Pλf=∑(ν,q):λν,q≤λ⟨f,ψν,q⟩∥ψν,q∥22ψν,q

can be viewed as a low-pass filter for functions on the unit disc; we call functions that are invariant under this projection operator band-limited.

### 4.5. Steerable bases

Recall that is supported on the unit disc. Using the notation from §4.4, the assumption that is band-limited on its support can be written as

 (10) f(r,θ)=∑(ν,q):λν,q≤λαν,qψν,q(r,θ),for r≤1,

where is the band-limit frequency, and are expansion coefficients. For each , we define

 gν(r,θ)=∑q:λν,q≤λαν,qψν,q(r,θ)=⎛⎝∑q:λν,q≤λαν,qJν(λν,qr)⎞⎠eiνθ,

so that we can write by

 (11) f(r,θ)=N∑ν=−Ngν(r,θ),

where .

The advantage of expressing a function in terms of Dirichlet Laplacian eigenfunctions is that the basis is steerable—the effect of rotations on expansion coefficients of the images are expressed as phase modulation. Specifically, a steerable basis diagonalizes the rotation operator so that the rotation of about the origin by angle can be computed by multiplying each term in the sum in (11) by :

 (12) fϕ(r,θ)=N∑ν=−Ngν(r,θ)eiνϕ.

From this point forward, we will switch between considering functions in polar coordinates or Cartesian coordinates , where , depending on which is more convenient.

### 4.6. Using the band-limited assumption

We now take advantage of the assumption that is band-limited on the unit disc. Let be the discretization of the Dirichlet Laplacian eigenfunctions

 Ψν,q(x)=ψν,q(x/n),

where is supported on the unit disc, as in §4.4. With this notation,

 Fϕ(x)=∑(ν,q):λν,q≤λαν,qΨν,q(x)eiνϕ.

By considering and as functions on , we can express their DFT by

 ^Fϕ(k)=∑x∈XFϕ(x)e−2πix⋅k/(4n).

Finally, we let be the DFT of

 ^Ψν,q(k)=∑x∈XΨν,q(x)e−2πix⋅k/(4n).

Then, by the linearity of the DFT it follows from the previous section that

 ^Fϕ(k)=∑(ν,q)∈Vαν,q^Ψν,q(k)eiνϕ,

where .

### 4.7. Discrete Fourier transform of invariant features

The Fourier transform defined in the previous section can now be related to the Fourier transform of . We define by

 (13) ^SF(k1,k2):=∑x1∈X∑x2∈XSF(x1,x2)e−2πi(k1⋅x1+k2⋅x2)/4n,

where addition is considered modulo with as the representatives of the different equivalence classes. Substituting (7) into (13) and simplifying gives

 ^SF(k1,k2)=∫2π0^Fϕ(k1)^Fϕ(k2)^Fϕ(−k1−k2)dϕ.

This integral over can be replaced by a summation over the rotations at the Nyquist rate so that the expression becomes:

 (14) ^SF(k1,k2)=6N−1∑j=0^Fϕj(k1)^Fϕj(k2)^Fϕj(−k1−k2),

where . If , then the products that appear in (14) are band-limited by with respect to . Also, note that the summand on the right hand side of (14) is the DFT of the third autocorrelation of a function and is called the bispectrum [8, 23, 30]. We encountered the one-dimensional analogue of the bispectrum at end of the proof to Proposition 3.2.

### 4.8. Vector notation

Recall the enumeration from §4.6, which consists of such that

 ^Fϕ(k)=d∑j=1ανj,qj^Ψνj,qj(k)eiνjϕ,

for . For a fixed angle and , we define the vector by

 uj(ϕ,k)=^Ψνj,qjeiνjϕ.

Thus, each vector defines the DFT of a band-limited function by

 ^Fv,ϕ(k)=n∑j=1vjuj(ϕ,k)=v⊤u(ϕ,k),

where is the transpose of . The following lemma is immediate from (14) and the product rule.

###### Lemma 4.1.

We have

 ^SFv(k1,k2)=6N−1∑j=0v⊤u(ϕj,k1)v⊤u(ϕj,k2)v⊤u(ϕj,−k1−k2)

where . Moreover, the -dimensional gradient satisfies

 ∇v^SFv(k1,k2)=6N−1∑j=0(v⊤u(ϕj,k1)v⊤u(ϕj,k2)u(ϕj,−k1−k2)+v⊤u(ϕj,k1)v⊤u(ϕj,−k1−k2)u(ϕj,k2)+v⊤u(ϕj,k2)v⊤u(ϕj,−k1−k2)u(ϕj,k1)).

In the next section, we describe how to estimate from a measurement and form an optimization problem to recover the target image from the measurement .

### 4.9. Computational complexity

For some intuition regarding the computational complexity of computing and , recall that so that, crudely, the image has pixels. In the following, we make the assumption that the number of eigenfunctions used to expand the function should not exceed the number of pixels in the image. Notationally, .

###### Proposition 4.2.

We can compute and for all  in  operations.

###### Proof.

First, we compute for and . Each inner product resolves to operations for total evaluations. Eigenvalue asymptotics show that is of the order of so this computation involves operations.

After this pre-computation, it is straightforward to calculate for all in operations. For , the key observation is that the gradient is a linear combination of vectors of length . First, we compute the coefficients in operations and then sum the vectors in operations.

## 5. Numerical results

### 5.1. Optimization problem

We now delineate an optimization problem for the estimation of from . Recall that each vector defines the Fourier transform of a band-limited function on the disc by

 ^Fv,ϕ(k)=v⊤u(ϕ,k),

as in §4.8. We define the least squares cost function by

 g(v)=12∑(k1,k2)∈X2(^SFv(k1,k2)−^SF(k1,k2))2.

Using the chain rule, the gradient of is

 ∇g(v)=∑(k1,k2)∈X2(^SFv(k1,k2)−^SF(k1,k2))∇v^SFv(k1,k2),

where and can be computed via the formulas in Lemma 4.1.

### 5.2. Recovery from invariant features

Given a cost function and a gradient, there are a variety of optimization methods that can be used. For simplicity, we use the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, which is a popular gradient-based optimization method.

First, we consider the problem of recovering from in the absence of noise. We generate a band-limited image by projecting a image of a tiger onto the span of the first Dirichlet Laplacian eigenfunctions as in Figure 3.

Using the BFGS optimization algorithm, the image in Figure 3 can be recovered with reconstruction error . This optimization takes seconds parallelized over 100 CPUs in total.

### 5.3. Using symmetry to average noise

Recall that is a discrete version of by

 sf(x1,x2):=12π∫2π0∫R2fϕ(x)fϕ(x+x1)fϕ(x+x2)dxdϕ,

which only depends on the three parameters: the magnitudes , and the angle between and . Moreover, the Fourier transform of will have these same symmetries. So, it follows that , which is a discrete version of , will also approximately exhibit these symmetries.

However, since is sampled on a grid, the symmetry will not be exact. In order to still take advantage of the expected symmetry when is estimated from a noisy measurement , we introduce a “binning” function. Let be defined by

for fixed parameters and be the range of . The corresponding cost function is then

 gb(v)=12∑T∈T⎛⎝∑(k1,k2)∈IT(^SFv(k1,k2)−^SF(k1,k2))⎞⎠2,

such that

 ∇gb(v)=∑T∈T⎛⎝∑(k1,k2)∈IT^SFv(k1,k2)−^SF(k1,k2)⎞⎠∑(k1,k2)∈IT∇^SFv(k1,k2),

where is the pre-image of under . This is the same as the cost function above except that the elements are now summed within the same symmetric bin . For the numerical results reconstructing from a noisy measure, we report errors with binning.

### 5.4. Recovery from noisy micrographs

Returning to the original problem presented in §1, we recall the problem of recovering a band-limited image from a measurement as the size of the measurement tends to infinity. To approximate the behavior practically, for both computational purposes and reflecting the applications to cryo-EM, we fix and call a measurement a micrograph.

We report numerical results in terms of the number of independent micrographs that are used. In particular, we report both the relative error in the invariant and the recovery of the image . The error for is summed over the bins while the relative error in recovering is calculated after running BFGS optimization using the cost and gradient described in §5.3. In numerical experiments, we take a target image with and assume that the image is band-limited in the first Dirichlet Laplacian eigenfunctions. The relative errors from these experiments are displayed in Figure 4.

Note that the relative error in the binned invariant and reconstruction both decrease at a consistent rate of one over the square root of the number of micrographs (the expected estimation rate if the locations and rotations of the images were known). Thus, with enough micrographs, these results indicate that the recovery is possible regardless of the level of noise. Moreover, this conclusion resolves the initial question of this paper: we have presented an algorithm for the recovery of an image from a measurement of the form (1) that gives predictable results in terms of the error in the invariant.

## 6. Discussion

This paper contributes to a series of works whose goal is to understand the limits of image recovery using an invariant-based approach to solve multi-target detection problems. After the challenges of particle picking were identified and a detection limit was proven, a promising direction recalled autocorrelation analysis for image recovery that did not rely on first identifying signal location within a measurement. This paper shows that the estimation of the target image is possible theoretically in one-dimension and builds on this intuition to empirically demonstrate that direct recovery is possible regardless of noise level in two-dimensional settings with in-plane translations and rotations of the target signal.

Future work includes extending Theorem 3.2 to the two-dimensional case, studying the high-dimensional regime where the size of the target image is large (see for example [22]), and exploring super-resolution limits in the MTD model [9]. Our ultimate goal is to complete the program outlined in [6] and devise a computational framework to recover a three-dimensional molecular structure directly from micrographs.

### Acknowledgments

NFM was supported in part by NSF DMS-1903015. TYL and AS were supported in part by Award Numbers FA9550-17-1-0291 and FA9550-20-1-0266 from AFOSR, Simons Foundation Math+X Investigator Award, the Moore Foundation Data-Driven Discovery Investigator Award, NSF BIGDATA Award IIS-1837992, NSF DMS-2009753 and NIH/NIGMS 1R01GM136780-01 . TB was supported in part by NSF-BSF grant number 2019752, and the Zimin Institute for Engineering Solutions Advancing Better Lives.

## Appendix A Technical lemmas

### Proof of Lemma 3.1 (Expectation)

By the definition of and linearity of expectation, we have

 E(AM(y1,y2))=1mm∑x=1E(M(x)M(x+y1)M(x+y2)).

Let for notational purposes. By the definition of , see (2), we have