Resolution analysis of imaging with optimization
We study array imaging of a sparse scene of point-like sources or scatterers in a homogeneous medium. For source imaging the sensors in the array are receivers that collect measurements of the wave field. For imaging scatterers the array probes the medium with waves and records the echoes. In either case the image formation is stated as a sparsity promoting optimization problem, and the goal of the paper is to quantify the resolution. We consider both narrow-band and broad-band imaging, and a geometric setup with a small array. We take first the case of the unknowns lying on the imaging grid, and derive resolution limits that depend on the sparsity of the scene. Then we consider the general case with the unknowns at arbitrary locations. The analysis is based on estimates of the cumulative mutual coherence and a related concept, which we call interaction coefficient. It complements recent results in compressed sensing by deriving deterministic resolution limits that account for worse case scenarios in terms of locations of the unknowns in the imaging region, and also by interpreting the results in some cases where uniqueness of the solution does not hold. We demonstrate the theoretical predictions with numerical simulations.
today Key words. array imaging, sparse, optimization, cumulative mutual coherence.
Array imaging is an inverse problem for the wave equation, where the goal is to determine remote sources or scatterers from measurements of the wave field at a collection of nearby sensors, called the array. The problem has applications in medical imaging, nondestructive evaluation of materials, oil prospecting, seismic imaging, radar imaging, ocean acoustics and so on. There is extensive literature on various imaging approaches such as reverse time migration and its high frequency version called Kirchhoff migration [2, 3, 14], matched field imaging , Multiple Signal Classification (MUSIC) [30, 23], the linear sampling method , and the factorization method . In this paper we consider array imaging using optimization, which is appropriate for sparse scenes of unknown sources or scatterers that have small support in the imaging region.
Imaging with sparsity promoting optimization has received much attention recently, specially in the context of compressed sensing [21, 19, 29], where a random set of sensors collect data from a sparse scene. Such studies use the restricted isometry property of the sensing matrix  or its mutual coherence  to derive probability bounds on the event that the imaging scene is recovered exactly for noiseless data, or with small error that scales linearly with the noise level. The array does not play an essential role in these studies, aside from its aperture bounding the random sample of locations of the sensors, and for justifying the scaling that leads to models of wave propagation like the paraxial one .
A different approach proposed in [11, 12] images a sparse scattering scene using illuminations derived from the singular value decomposition (SVD) of the response matrix measured by probing sequentially the medium with pulses emitted by one sensor at a time and recording the echoes. Iluminations derived from the SVD are known to be useful in imaging [27, 6, 4, 5, 23] and they may mitigate noise. The setup in , which is typical in array imaging, lets the sensors be closely spaced so that sums over them can be approximated by integrals over the array aperture. We consider the same continuous aperture setup here and study the resolution of the images produced by optimization, also known as basis pursuit and penalty. We address two questions: (1) How should we chose the discretization of the imaging region so that we can guarantee unique recovery of the sparse scene, at least when the unknowns lie on the grid? (2) If the imaging region is discretized on a finer grid, for which uniqueness does not hold, are there cases where the solution of the optimization is still useful?
By studying question (1) we complement the existing results with deterministic resolution limits that account for worse case scenarios, and guarantee unique recovery of the scene for a given sparsity . This is defined as the number of non-zero entries of the vector of unknowns or, equivalently, the number of grid points in the support of the sources/scatterers that we image. We consider a geometric setup with a small array, where wave propagation can be modeled by the paraxial approximation. We have a more general paraxial model than in , which takes into consideration sources/scatterers at different ranges from the array. This turns out to be important in narrow-band regimes. We also consider broad-band regimes and show that the additional multi-frequency data improves the resolution.
It is typical in imaging with sparsity promoting optimization to assume that the unknown sources or scatterers lie on the discretization grid, meaning that they can be modeled by a sparse complex vector , where is the number of grid points. If the unknowns lie off-grid the results deteriorate. We refer to [24, 18] for a perturbation analysis of compressed sensing with small off-grid displacements. General tight error bounds can be found in . They may be quite large and increase with . Thus, there is a trade-off in imaging with optimization: on one hand we need a coarse enough discretization of the imaging region to ensure unique recovery of the solution, and on the other hand finer discretization to minimize modeling errors due to off-grid placement of the unknowns. This trade-off is particularly relevant in the narrow-band paraxial regime, where the resolution limits may grow significantly with the sparsity of the scene.
At question (2) we consider fine discretizations of the imaging region, to mitigate the modeling error. The problem is then how to interpret the result of the minimization, which is no longer guaranteed to be unique. We show that there are cases where the minimization may be useful. Specifically, we prove that when the unknown sources/scatterers are located at points or clusters of points that are sufficiently well separated, an minimizer is supported in the vicinity of these points. While the entries of may not be close in the point-wise sense to those of , their average over such vicinities are close to the true values in in the case of well separated points, or the averages of the true values in the case of clusters of points. That is to say, optimization gives an effective vector of source/scatterer amplitudes averaged locally around the points in its support. Note that question (2) was also investigated in , where novel algorithms for imaging well separated sources have been introduced and analyzed. Our study complements the results in  by analyzing directly the performance of the minimization and -penalty, and also considering clusters of sources/scatterers.
The paper is organized as follows. In section LABEL:sect:formulate we formulate the problem, introduce notation, and describe the relation between imaging sources vs. scatterers. Question (1) is studied in section LABEL:sect:parax. We describe the paraxial scaling regime and derive resolution bounds that depend on the sparsity of the imaging scene. In section LABEL:sect:cont we study question (2). In both sections we begin with the statement of results and numerical illustrations, and end with the proofs. A summary is in section LABEL:sect:sum.
2 Formulation of the imaging problem
We formulate first the basic problem of imaging point-like sources with a remote array of sensors that record the incoming sound waves. The generalization to the inverse scattering problem is described in section LABEL:sect:F1.2.
Suppose that there are unknown sources located at points in the imaging region , emiting signals at frequency , for . The hat stands for Fourier transform with respect to time, and reminds us that we work in the frequency domain. The receivers are at locations , for , where is a set on the measuring surface, called the array aperture. The sound pressure wave measured at and frequency is modeled by
where is the outgoing Green’s function of Helmholtz’s equation. The propagation is through a homogeneous medium with sound speed , and the Green’s function is
where is the wavenumber. The inverse source problem is to determine from the measurements (LABEL:eq:F1) at one or more frequencies .
2.1 Imaging sources with optimization
To state the inverse problem as an optimization we discretize with a regular grid of rectangular prisms, and let be the set of grid points denoted by . The lengths of the edges of the rectangular prisms are the components of the vector , called the mesh size. The sources may be on or off the grid. If they are on the grid, as assumed in section LABEL:sect:parax, we denote by the set of indexes of the grid points that support them. Explicitly, we define the bijective map , such that , for , and . When the sources are not on the grid, we let index the nearest points to each source, as explained in more detail in section LABEL:sect:cont. We assume henceforth that , meaning that the imaging scene is sparse.
Let be the data vector, with components , for and . The number of measurements is . Let also be the sensing matrix, with entries defined by , where normalizes the columns of , denoted by . Absorbing the normalization constants in the vector of unknowns, we obtain the linear system
For single frequency measurements at the normalization constant is
so that when the sources lie on the grid there are non-zero entries in , equal to
Here is the inverse of the mapping . For multiple frequency measurements we simplify the problem by letting
so that all the sources emit the same known signal multiplied by an unknown complex amplitude . This simplification is motivated by the inverse scattering problem described in section LABEL:sect:F1.2. It keeps the same number of unknowns as in the single frequency case, although we have more measurements. The normalization constants are
and the non-zero entries of equal
Note that we could have written the multiple frequency problem for unknowns, the Fourier coefficients of the signals emitted by the sources. However, at each frequency these have the same spatial support, so another optimization approach, known as Multiple Measurement Vector (MMV)  would be more appropriate. For the purpose of this paper it suffices to consider the simpler model (LABEL:eq:F6).
The optimization (basis pursuit) formulation of the inverse source problem is
where Our goal in section LABEL:sect:parax is to determine bounds on the mesh size so that (LABEL:eq:F7) has a unique sparse solution, equal to the true defined in (LABEL:eq:F4) and (LABEL:eq:F6). The analysis is based on the next lemma, following from [31, 32, 17].
Suppose that (LABEL:eq:F3) has an sparse solution and that the cumulative mutual coherence of matrix with columns of Euclidian length equal to one satisfies
where denotes the usual inner product in , and is a set of cardinality . Then is the unique sparse solution of (LABEL:eq:F3) and the unique minimizer of (LABEL:eq:F7).
To deal with noise and modeling (discretization) error, we also consider in section LABEL:sect:cont the penalty problem 
with parameter accounting for the trade-off between the approximation error and the sparsity of the unknown vector .
2.2 Imaging point scatterers
The problem of imaging a sparse scene of point scatterers at for can be written as one of imaging sources, as we now explain.
Let the array probe the medium with a signal emitted from the sensor at . Using the Foldy-Lax model [22, 26] we write the scattered wave at in the form (LABEL:eq:F1), with effective sources at emitting signals
Here is the reflectivity of the th scatterer, and is the wave that illuminates it. It is given by the sum of the incident wave and the wave scattered at the other points
where is the Kronecker delta.
In the Born approximation we neglect the sum in (LABEL:eq:F10), and simplify (LABEL:eq:F9) as
This can be written in the form (LABEL:eq:F3) with entries of like in (LABEL:eq:F6), and slightly redefined matrix and normalization constant
Multiple scattering effects can be included by solving the Foldy-Lax equations (LABEL:eq:F10) or, equivalently, the linear system
with the vector with components , and the matrix with entries
Note that is a perturbation of the identity matrix, and depending on the magnitude of the reflectivities and the distance between the scatterers, it is invertible. Again we can write the problem in the form (LABEL:eq:F3) with entries of like in (LABEL:eq:F6), except that now are more complicated and depend on the unknown reflectivity. An elegant solution of this nonlinear problem is in . It amounts to solving a source imaging problem like (LABEL:eq:F7), to determine the locations , for Because there are multiple emitters the authors use an MMV approach. Then the reflectivities are estimated using the Foldy-Lax model.
Given the relation between inverse scattering and source problems describe above, we focus attention henceforth on imaging sparse scenes of sources using (LABEL:eq:F7) or (LABEL:eq:F7b).
3 Imaging with small arrays
The setup is illustrated in Figure LABEL:fig:parax. We consider a planar square array of aperture size , and a coordinate system with origin at the center of the array and range axis orthogonal to it. The locations of the receivers are , with and , for . The imaging region is a rectangular prism with center on the range axis, at distance from the array. It has a square side in the cross-range plane, parallel to the array, and length in the range direction. The discretization of has grid points , with cross-range vector and range , and the mesh size is . Our goal in this section is to estimate so that the optimization problem (LABEL:eq:F7) determines exactly the unknown sources supported at for , a set of cardinality .
We begin in section LABEL:sect:parax_scaling with the scaling regime and the paraxial model of wave propagation. The resolution limits are stated and illustrated with numerical simulations in sections LABEL:sect:parax_sf and LABEL:sect:parax_bb. The setup of the simulations is described in appendix LABEL:ap:numeric. The proofs are in section LABEL:sect:parax_proofs.
3.1 Scaling regime and the paraxial model
The scaling regime is defined by the relation between the important length scales: the wavelength , the range scale , the aperture , and the size and of the imaging region. We assume for now a single frequency , and refer to section LABEL:sect:parax_bb for the multi frequency case where another important scale arises, the bandwidth .
The scales are ordered as
and satisfy the following assumptions
Roughly, conditions (LABEL:eq:P4) say that the imaging region is small enough and far enough from the array, so that we can linearize phases of the Green’s functions in , for all Physically, this means that when viewed from the imaging region, the wave fronts appear planar. The array aperture is small with respect to the distance of propagation of the waves, but equations (LABEL:eq:P2) say that the Fresnel number is large, so we have diffraction effects.
We show in appendix LABEL:ap:parax_deriv that
where the bar denotes complex conjugate.
The terms in (LABEL:eq:PGG) are highly oscillatory when is large, but can be absorbed in the vector of unknowns. This is convenient because it implies that when two points and are close to each other, the inner product of the th and th columns of the scaled sensing matrix is close to one. Let be the diagonal matrix with entry , for , and rewrite the linear system (LABEL:eq:F3) as . To simplify the presentation we denote henceforth by the new sensing matrix and by the new unknown vector , so that the scaled system looks the same as (LABEL:eq:F3). The Green’s function with the large phase removed is still denoted by , and satisfies
Assuming that the receivers are on a square grid of spacing , satisfying the scaling relations
we see that the exponential in (LABEL:eq:P8) is approximately constant in each grid cell in . This allows us to use the continuous aperture approximation in the analysis, where the sum over can be replaced by the integral over ,
With our discretization the number of unknowns is and the problem is underdetermined when the number of measurements satisfies or, equivalently,
which is consistent with (LABEL:eq:P9) for and .
3.2 Single frequency resolution limits
Using the paraxial model for our sensing matrix , we obtain from (LABEL:eq:P10) and the definition of the cumulative coherence that
and is the absolute value of the Fresnel integral
The search set of cardinality is denoted by , to distinguish it from the set of indexes of the true support points of , called calygraphic .
As stated in Lemma LABEL:lem.1, unique recovery of a sparse with the minimization (LABEL:eq:F7) is guaranteed when . This criterion allows us to estimate the resolution limit stated in the next two theorems.
If the mesh size satisfies
optimization recovers exactly two sources located at any distinct grid points.
We call the estimates and the “base resolution”. They are the same, up to order one constants, as the well known resolution limits in array imaging . We verified numerically the estimates (LABEL:eq:P16) as follows. To check the value of , we solved the optimization problem (LABEL:eq:F7), as explained in appendix LABEL:ap:numeric, for data corresponding to a vector supported at any two grid points offset in cross-range. We determined the smallest so that the relative error between and its numerical reconstruction was less than . A similar estimation was done for , with supported at points offset in range. The results were close to those in Theorem LABEL:thm.1: for and for .
The next result states that when there are more sources to estimate, the resolution limits deteriorate. This is also illustrated in Figure LABEL:fig:IllRes, where we display images discretized at base resolution. The reconstruction is perfect for sources (top line), but not for sources (bottom line). How the resolution limits deteriorate with depends on the distribution of the sources in the imaging region. Our estimate in the next theorem accounts for worse case scenarios.
There exists a constant of order one such that if the mesh size satisfies the conditions
optimization recovers exactly sources located at any distinct grid points.
The isotropic dilation of the mesh in (LABEL:eq:P17) is for convenience, but the result generalizes to anisotropic dilations, where the mesh is stretched much more in one direction than the others. We refer to the proof in section LABEL:sect:parax_pf_sf for details on the generalization. The resolution decrease with predicted by Theorem LABEL:thm.2 may be traced to the slow decay with range offsets of the terms summed in . This is also why
Sources at different ranges may have strong interaction, hence they must be further apart in order to get . In the next section we show that if the base range resolution improves, as it does with broad band data, then there is almost no resolution loss with the sparsity .
3.3 Broad band resolution limits
When we have frequency measurements, the vector of unknowns is defined as in (LABEL:eq:F6), and the rows of the sensing matrix are indexed by the receiver-frequency pair . Let be the central frequency, so that
where is the bandwidth assumed to satisfy the scaling relation
The lower bound says that , so that is the scale of all the measured frequencies , for . The upper bound implies , where we recall from the previous section that is, up to a factor of order one, the base range resolution for single frequency measurements at . The next theorem states that the base range resolution for multi frequency measurements is of order , so (LABEL:eq:BB2) implies a gain in range resolution.
Let us assume, for simplicity of the calculations, a Gaussian signal
The results should extend to any signal with bandwidth , with modifications of the constants in the bounds. We show in appendix LABEL:ap:BBparax_deriv that
where is the central wavenumber, and we proceeded as in Remark LABEL:rem:z3 to absorb the large phases in the vector of unknowns. Assuming that the array is discretized on a mesh with spacing satisfying (LABEL:eq:P9), we approximate the sum over by an integral over the aperture , as in the previous section. We also suppose that the frequencies are spaced at intervals satisfying so that we can write the sum over the frequencies as an integral over the bandwidth. Equation (LABEL:eq:BB3) becomes
and we can simplify it further by neglecting the quadratic phase in the integral over the aperture. This is because
for range offsets in the support of order of the Gaussian factor in (LABEL:eq:BB6). Integrating over the aperture and normalizing, we arrive at the following model of the products of the columns of the sensing matrix ,
These are the terms in the cumulative coherence , and the resolution limits are as stated next.
Assume a sensing matrix with inner products of the columns defined by (LABEL:eq:BB7). If the mesh size satisfies
optimization recovers exactly any two sources on the grid. Moreover, there exists an order one constant , independent of , such that if
optimization recovers exactly any sources on the grid.
We call and the base resolution, as in the previous section. While the cross-range resolution is the same as in the single frequency case, the base range resolution is significantly better. Moreover, there is little loss of resolution at large . The mesh size grows at most logarithmically with , as opposed to in the single frequency case. This agrees with the known fact in array imaging that bandwidth improves the quality of images.
The proofs of Theorems LABEL:thm.1 and LABEL:thm.2 which estimate the resolution in the single frequency case are in section LABEL:sect:parax_pf_sf. The proof of the broad band result in Theorem LABEL:thm.3 is in section LABEL:sect:parax_pf_sf.
3.4.1 Single frequency
We begin with some basic bounds on the Fresnel integral (LABEL:eq:P16). The simplest estimate is for , in which case
For we can change variables and rewrite (LABEL:eq:P16) in the form
for any and , where we used that
The final estimate
follows from contour integration, as shown in appendix LABEL:ap:contour.
Here we used that the sources are on the grid and denote by vectors with integer components. We display in Figure LABEL:fig:Fresnel the Fresnel integral , and note that it is bounded above by , and larger than for near the origin. When we get from (LABEL:eq:PP1) that
because at least one of and is not equal to zero. If we have by (LABEL:eq:PP2)
Thus, is guaranteed to be less than if and . This concludes the proof of Theorem LABEL:thm.1.
Proof of Theorem LABEL:thm.2: Note from the expression (LABEL:eq:cumC) of the cumulative coherence that it is translation invariant in . Thus, we can fix the origin at one source location and rewrite (LABEL:eq:cumC) as
where is a set of cardinality . The proof of the theorem follows from the bound on stated in the next lemma and the relation and established above. The constant in the lemma is the same as in (LABEL:eq:P17).
There exists constants , and of order one such that
The assumption in Theorem LABEL:thm.2 that the mesh is dilated in an isotropic fashion, and the relations and established above imply
Thus, the first term in the right hand side of (LABEL:eq:lem2) dominates the others for large and and the result stated in Theorem LABEL:thm.2 follows. For anisotropic dilations of the mesh the logarithmic terms in (LABEL:eq:lem2) may become important. For example, when , the last term in (LABEL:eq:lem2) dominates the bound, and we can get a unique solution for a modest mesh stretch in the cross-range direction at the expense of a very large stretch in range .
Proof of Lemma LABEL:lem.2: Let be the set on which the maximum in (LABEL:eq:PP4) is achieved. It is difficult to determine explicitly, so we construct another set , which allows us to bound the cumulative coherence. We use the behavior of the Fresnel integral , shown in Figure LABEL:fig:Fresnel, to guide us in the construction. We write as the union of two sets and . The first set contains the left and right cones (in Figure LABEL:fig:Fresnel they are defined by the diagonal lines ), where displays a slower decay, as well as a vicinity of the origin
The second set is for the points with ,
The parameter is used to control the volume of , so that it contains at least grid points,
The proof consists of two steps. First we derive the bound
Then we estimate the sum in the right hand side of (LABEL:eq:PP8). To prove (LABEL:eq:PP8) we show:
contains at least grid points.
For any , we have
For any ,