Convex recovery from interferometric measurements

# Convex recovery from interferometric measurements

## Abstract

This paper discusses some questions that arise when a linear inverse problem involving is reformulated in the interferometric framework, where quadratic combinations of are considered as data in place of .

First, we show a deterministic recovery result for vectors from measurements of the form for some left-invertible . Recovery is exact, or stable in the noisy case, when the couples are chosen as edges of a well-connected graph. One possible way of obtaining the solution is as a feasible point of a simple semidefinite program. Furthermore, we show how the proportionality constant in the error estimate depends on the spectral gap of a data-weighted graph Laplacian.

Second, we present a new application of this formulation to interferometric waveform inversion, where products of the form in frequency encode generalized cross-correlations in time. We present numerical evidence that interferometric inversion does not suffer from the loss of resolution generally associated with interferometric imaging, and can provide added robustness with respect to specific kinds of kinematic uncertainties in the forward model .

Acknowledgments. The authors would like to thank Amit Singer, George Papanicolaou, and Liliana Borcea for interesting discussions. Some of the results in this paper were reported in the conference proceedings of the 2013 SEG annual meeting [18]. This work was supported by the Earth Resources Laboratory at MIT, AFOSR grants FA9550-12-1-0328 and FA9550-15-1-0078, ONR grant N00014-16-1-2122, NSF grant DMS-1255203, the Alfred P. Sloan Foundation, and Total SA.

## 1 Introduction

Throughout this paper, we consider complex quadratic measurements of of the form

 Bij=(Ax)i¯¯¯¯¯¯¯¯¯¯¯¯¯(Ax)j,(i,j)∈E, (1)

for certain well-chosen couples of indices , a scenario that we qualify as “interferometric”. This combination is special in that it is symmetric in , and of rank 1 with respect to the indices and .

The regime that interests us is when the number of measurements, i.e., of couples in , is comparable to the number of unknowns. While phaseless measurements only admit recovery when has very special structure – such as, being a tall random matrix with Gaussian i.i.d. entries [7, 11] – products for correspond to the idea of phase differences, hence encode much more information. As a consequence, stable recovery occurs under very general conditions: left-invertibility of and “connectedness” of set of couples . These conditions suffices to allow for to be on the order of . Various algorithms return accurately up to a global phase; we mosty discuss variants of lifting with semidefinite relaxation in this paper. In contrast to other recovery results in matrix completion [8, 24], no randomness is needed in the data model, and our proof technique involves elementary spectral graph theory rather than dual certification or uncertainty principles.

The mathematical content of this paper is the formulation of a quadratic analogue of the well-known relative error bound

 ∥x−x0∥∥x0∥≤κ(A)∥e∥∥b∥ (2)

for the least-squares solution of the overdetermined linear system with , and where is the condition number of . Our result is that an inequality of the form (2) still holds in the quadratic case, but with the square root of the spectral gap of a data-weighted graph Laplacian in place of in the right-hand side. This spectral gap quantifies the connectedness of , and has the proper homogeneity with respect to .

The numerical results mostly concern the case when is a forward model that involves solving a linearized fixed-frequency wave equation, as in seismic or radar imaging. In case is a reflectivity, is the wavefield that results from an incoming wave being scattered by , and has a meaning in the context of interferometry, as we explain in the next section. Our claims are that

• Stable recovery holds, and the choice of measurement set for which this is the case matches the theory in this paper;

• Interferometric inversion yields no apparent loss of resolution when compared against the classical imaging methods; and

• Some degree of robustness to specific model inaccuracies is observed in practice, although it is not explained by the theory in this paper.

### 1.1 Physical context: interferometry

In optical imaging, an interference fringe of two (possibly complex-valued) wavefields and , where is either a time or a space variable, is any combination of the form . The sum is a result of the linearity of amplitudes in the fundamental equations of physics (such as Maxwell or Schrödinger), while the modulus squared is simply the result of a detector measuring intensities. The cross term in the expansion of the squared modulus manifestly carries the information of destructive vs. constructive interference, hence is a continuous version of what we referred to earlier as an “interferometric measurement”.

In particular, when the two signals are sinusoidal at the same frequency, the interferometric combination highlights a phase difference. In astronomical interferometry, the delay is for instance chosen so that the two signals interfere constructively, yielding better resolution. Interferometric synthetic aperture radar (InSAR) is a remote sensing technique that uses the fringe from two datasets taken at different times to infer small displacements. In X-ray ptychograhy [25], imaging is done by undoing the interferometric combinations that the diffracted X-rays undergo from encoding masks. These are but three examples in a long list of applications.

Interferometry is also playing an increasingly important role in geophysical imaging, i.e., inversion of the elastic parameters of the portions of the Earth’s upper crust from scattered seismic waves. In this context however, the signals are more often impulsive than monochromatic, and interferometry is done as part of the computational processing rather than the physical measurements1. An interesting combination of two seismogram traces and at nearby receivers is then , i.e., the Fourier transform of their cross-correlation. It highlights a time lag in the case when and are impulses. More generally, it will be important to also consider the cross-ambiguities where .

Cross-correlations have been shown to play an important role in geophysical imaging, mostly because of their stability to statistical fluctuations of a scattering medium [3] or an incoherent source [14, 20]. Though seismic interferometry is a vast research area, both in the exploration and global contexts [4, 13, 16, 28, 29, 32, 35], explicit inversion of reflectivity parameters from interferometric data has to our knowledge only been considered in [12, 18]. Interferometric inversion offers great promise for model-robust imaging, i.e., recovery of reflectivity maps in a less-sensitive way on specific kinds of errors in the forward model.

Finally, interferometric measurements also play an important role in quantum optical imaging. See [27] for a nice solution to the inverse problem of recovering a scattering dielectric susceptibility from measurements of two-point correlation functions relative to two-photon entangled states.

As we revise this paper, we also note the recent success of interferometric inversion for passive synthetic-aperture radar (SAR) imaging, under the name low-rank matrix recovery [23].

### 1.2 Mathematical context and related work

The setting of this paper is discrete, hence we let and in place of either a time of frequency variable. We also specialize to , and we let to possibly allow an explanation of the signal by a linear forward model2 .

The link between products of the form and squared measurements goes both ways, as shown by the polarization identity

 fi¯¯¯¯¯fj=144∑k=1e−iπk/2|fi+eiπk/2fj|2.

Hence any result of robust recovery of , or , from couples , implies the same result for recovery from phaseless measurements of the form . This latter setting was precisely considerered by Candès et al. in [7], where an application to X-ray diffraction imaging with a specific choice of masks is discussed. In [1], Alexeev et al. use the same polarization identity to design good measurements for phase retrieval, such that recovery is possible with .

Recovery of from for some when (interferometric phase retrieval) can be seen a special case of the problem of angular synchronization considered by Singer [30]. There, rotation matrices are to be recovered (up to a global rotation) from measurements of relative rotations for some . This problem has an important application to cryo-electron microscopy, where the measurements of relative rotations are further corrupted in an a priori unknown fashion (i.e., the set is to be recovered as well). An impressive recovery result under a Bernoulli model of gross corruption, with a characterization of the critical probability, were recently obtained by Wang and Singer [33]. The spectrum of an adapted graph Laplacian plays an important role in their analysis [2], much as it does in this paper. Singer and Cucuringu also considered the angular synchronization problem from the viewpoint of rigidity theory [31]. For the similar problem of recovery of positions from relative distance, with applications to sensor network localization, see for instance [17].

The algorithmic approach considered in this paper for solving interferometric inversion problems is to formulate them via lifting and semidefinite relaxation. This idea was considered by many groups in recent years [7, 9, 17, 30, 34], and finds its origin in theoretical computer science [15].

## 2 Mathematical results

### 2.1 Recovery of unknown phases

Let us start by describing the simpler problem of interferometric phase recovery, when and we furthermore assume . Given a vector such that , a set of pairs , and noisy interferometric data , find a vector such that

 |xi|=1,∑(i,j)∈E|xi¯¯¯¯¯xj−Bij|≤σ, (3)

for some . Here and below, if no heuristic is provided for , we may cast the problem as a minimization problem for the misfit and obtain a posteriori.

The choice of the elementwise norm over is arbitrary, but convenient for the analysis in the sequel3. We aim to find situations in which this problem has a solution close to , up to a global phase. Notice that is feasible for (3), hence a solution exists, as soon as .

The relaxation by lifting of this problem is to find (a proxy for ) such that

 Xii=1,∑(i,j)∈E|Xij−Bij|≤σ,X⪰0, then let x be the top eigenvector of X with ∥x∥2=n. (4)

The notation means that is symmetric and positive semi-definite. Again, the feasibility problem (2.1) has at least one solution () as soon as .

The set generates edges of a graph , where the nodes in are indexed by . Without loss of generality, we consider to be symmetric. By convention, does not contain loops, i.e., the diagonal is not part of . (Measurements on the diagonal are not informative for the phase recovery problem, since .)

The graph Laplacian on is

 Lij=⎧⎪⎨⎪⎩diif i=j;−1if (i,j)∈E;0otherwise,

where is the node degree . Observe that is symmetric and by Gershgorin. Denote by the eigenvalues of sorted in increasing order. Then with the constant eigenvector . The second eigenvalue is zero if and only if has two or more disconnected components. When , its value is a measure of connectedness of the graph. Note that by Gershgorin again, where is the maximum degree.

Since , the second eigenvalue is called the spectral gap. It is a central quantity in the study of expander graphs: it relates to

• the edge expansion (Cheeger constant, large if is large);

• the degree of separation between any two nodes (small if is large); and

• the speed of mixing of a random walk on the graph (fast if is large).

More information about spectral graph theory can be found, e.g., in the lecture notes by Lovasz [21]. It is easy to show with interlacing theorems that adding an edge to , or removing a node from , both increase . The spectral gap plays an important role in the following stability result.

In the sequel, we denote the componentwise norm on the set by .

###### Theorem 1.

Assume , where is the second eigenvalue of the graph Laplacian on . Any solution of (3) or (2.1) obeys

 ∥x−eiαx0∥≤4√∥ε∥1+σλ2,

for some .

The proof is in the appendix. Manifestly, recovery is exact (up to the global phase ambiguity encoded by the parameter , because the algorithm could return another vector multiplied by some over which there is no control.) as soon as and , provided , i.e., the graph is connected. The easiest way to construct expander graphs (graphs with large ) is to set up a probabilistic model with a Bernoulli distribution for each edge in an i.i.d. fashion, a model known as the Erdős-Rényi random graph. It can be shown that such graphs have a spectral gap bounded away from zero, independently of and with high probability, with edges.

A stronger result is available when the noise originates at the level of , i.e., has the form .

###### Corollary 2.

Assume and , where is the second eigenvalue of the graph Laplacian on . Any solution of (3) or (2.1) obeys

 ∥x−eiαx0∥≤4√σλ2+∥e∥,

for some .

###### Proof.

Apply theorem 1 with , in place of , then use the triangle inequality. ∎

In the setting of the corollary, problem (3) always has as a solution, hence is feasible even when .

Let us briefly review the eigenvector method for interferometric recovery. In [30], Singer proposed to use the first eigenvector of the (noisy) data-weighted graph Laplacian as an estimator of the vector of phases. A similar idea appears in the work of Montanari et al. as the first step of their OptSpace algorithm [19], and in the work of Chatterjee on universal thresholding [10]. In our setting, this means defining

 (˜L)ij=⎧⎪⎨⎪⎩diif i=j;−Bijif (i,j)∈E;0otherwise,

and letting where is the unit-norm eigenvector of with smallest eigenvalue. Denote by the eigenvalues of . The following result is known from [2], but we provide an elementary proof (in the appendix) for completeness of the exposition.

###### Theorem 3.

Assume . Then the result of the eigenvector method obeys

 ∥x−eiαx0∥≤√2n∥ε∥˜λ2,

for some .

Alternatively, we may express the inequality in terms of , the spectral gap of the noise-free Laplacian defined earlier, by noticing4 that . Both and are computationally accessible. In the case when , we have , hence is (slightly) greater than the spectral gap of . Note that the scaling appears to be sharp in view of the numerical experiments reported in section 3. The inverse square root scaling of theorem 1 is stronger in the presence of small spectral gaps, but the noise scaling is weaker in theorem 1 than in theorem 3.

### 2.2 Interferometric recovery

The more general version of the interferometric recovery problem is to consider a left-invertible tall matrix , linear measurements for some vector (without condition on the modulus of either or ), noisy interferometric measurements for in some set , and find subject to

 ∑(i,j)∈E∪D|(Ax)i¯¯¯¯¯¯¯¯¯¯¯¯¯(Ax)j−Bij|≤σ. (5)

Notice that we now take the union of the diagonal with . Without loss of generality we assume that , which can be achieved by symmetrizing the measurements, i.e., substituting for .

Since we no longer have a unit-modulus condition, the relevant notion of graph Laplacian is now data-dependent. It reads

 (L|b|)ij=⎧⎪ ⎪⎨⎪ ⎪⎩∑k:(i,k)∈E|bk|2if i=j;−|bi||bj|if (i,j)∈E;0otherwise.

The connectedness properties of the underlying graph now depend on the size of : the edge carries valuable information if and only if both and are large.

A few different recovery formulations arise naturally in the context of lifting and semidefinite relaxation.

• The basic lifted formulation is to find some such that

 ∑(i,j)∈E∪D|(AXA∗)ij−Bij|≤σ,X⪰0, then let x=x1√η1, where (η1,x1) is the top eigen-pair of X. (6)

Our main result is as follows, see the appendix for the proof.

###### Theorem 4.

Assume , where is the second eigenvalue of the data-weighted graph Laplacian . Any solution of (2.2) obeys

 ∥x−eiαx0∥∥x0∥≤15κ(A)2√∥ε∥1+σλ2,

for some , and where is the condition number of .

The quadratic dependence on is necessary5. In section 3, we numerically verify the inverse square root scaling in terms of .

If the noise originates from rather than , the error bound is again improved to

 ∥x−eiαx0∥∥x0∥≤15κ(A)2√σλ2+κ(A)∥e∥∥b∥,

for the same reason as earlier.

• An alternative, two-step lifting formulation is to find through such that

 ∑(i,j)∈E∪D|Yij−Bij|≤σ,Y⪰0, then let x=A+y1√η1, where (η1,y1) is the top eigen-pair of Y. (7)

The dependence on the condition number of is more favorable than for the basic lifting formulation.

###### Theorem 5.

Assume , where is the second eigenvalue of the data-weighted graph Laplacian . Any solution of (5) or (2.2) obeys

 ∥x−eiαx0∥∥x0∥≤15κ(A)√∥ε∥1+σλ2,

for some .

However, this formulation may be more computationally expensive than the one-step variant if data () space is much larger than model () space.

The quantity is not computationally accessible in general, but it can be related to the second eigenvalue of the noisy data-weighted Laplacian,

 (˜LB)ij=⎧⎪ ⎪⎨⎪ ⎪⎩∑k:(i,k)∈EBkkif i=j;−Bijif (i,j)∈E;0otherwise.

It is straightforward to show that , where is the elementwise maximum on , is the spectral norm, and is the maximum node degree.

### 2.3 Influence of the spectral gap

In this section, we numerically confirm the tightness of the error bound for phase recovery given by theorem 1 on toy examples (), with respect to the spectral gap. We perform the corresponding experiment for the situation of theorem 3. In order to span a wide range of spectral gaps, three types of graphs are considered:

• the cycle graph which is proven to be the connected graph with the smallest spectral gap6;

• graphs obtained by adding randomly edges to with ranging from 1 to 50;

• Erdős-Rényi random graphs with probability ranging from 0.03 to 0.05, conditioned on connectedness (positive specrtal gap).

A realization of the two latter types of graphs is given in figure 1.

To study the eigenvector method, we draw one realization of a symmetric error matrix with , with . The spectral norm of the noise (used in theorem 3) is .
For different realizations of the aforementioned graphs, we estimate the solution with the eigenvector method and plot the recovery error as a function of . See figure 2.

To study the feasibility method, we consider the case of an approximate fit () in the noiseless case (). The feasibility problem (2.1) is solved using the Matlab toolbox cvx which calls the toolbox SeDuMi. An interior point algorithm (centering predictor-corrector) is used. The recovery error as a function of the spectral gap is illustrated in figure 3. The square root scaling of the theorem seems to be a good match for the numerical experiments.

## 3 Numerical results: interferometric inverse scattering

### 3.1 Setup and context

An important application of interferometric inversion is seismic imaging, where the (simplest) question is to recover of a medium’s wave velocity map from recordings of waves scattered by that medium. Linear inverse problems often arise in this context. For instance, inverse source problems arise when locating microseismic events, and linear inverse scattering in the Born regime yield model updates for subsurface imaging. These linear problems all take the form

 Fm=d, (8)

where is the forward or modeling operator, describing the wave propagation and the acquisition, is the reflectivity model, and are the observed data. (We make this choice of notation from now on; it is standard and more handy than for what follows.) The classical approach is to use the data (seismograms) directly, to produce an image either

• by migrating the data (Reverse-time migration, RTM),

 IRTM=˜F∗d

where is the simulation forward operator and stands for the adjoint ;

• or by finding a model that best fits the data, in a least-squares sense (Least-squares migration, LSRTM),

 mLSM=argminm12||˜Fm−d||22. (9)

It is important to note that the physical forward operator and the one used in simulation can be different due to modeling errors or uncertainty. Such errors can happen at different levels:

1. background velocity,

3. sources time profiles.

This list is non-exhaustive and these modeling errors have a very different effect from additive Gaussian white noise, in the sense that they induce a coherent perturbation in the data. As a result, the classical approaches (RTM, LSM) may fail in the presence of such uncertainties.

The idea of using interferometry (i.e. products of pairs of data) to make migration robust to modeling uncertainties has already been proposed in the literature [4], [26], [29], producing remarkable results. In their 2005 paper [4], Borcea et al. developed a comprehensive framework for interferometric migration, in which they proposed the Coherent INTerfermetic imaging functional (CINT), which can be recast in our notation as

 ICINT=diag{˜F∗(E∘[dd∗])˜F},

where is the matrix of all data pairs products, is a selector, that is, a sparse matrix with ones for the considered pairs and zeros elsewhere, is the entrywise (Hadamard) product, and diag is the operation of extracting the diagonal of a matrix. The CINT functional involves , and , (implicitly through ), so cancellation of model errors can still occur, even when and are different. Through a careful analysis of wave propagation in the particular case where the uncertainty consists of random fluctuations of the background velocity, Borcea et al. derive conditions on under which CINT will be robust.

The previous sections of this paper do not inform the robustness mentioned above, but they explain how the power of interferometry can be extended to inversion. In a noiseless setting, the proposal is to perform inversion using selected data pair products,

 find m s.t. E∘(˜Fmm∗˜F∗)=E∘[dd∗], (10)

i.e., we look for a model that explains the data pair products selected by . (The notation has not changed from the previous sections.) Here, and are meta-indices in data space. For example, in an inverse source problem in the frequency domain, and , and for inverse Born scattering, and .

A straightforward and noise-aware version of this idea is to fit products in a least-squares sense,

 ˆmLS,pairs=argminm||E∘(˜Fmm∗˜F∗−dd∗)||2F. (11)

While the problem in (10) is quadratic, the least-squares cost in (11) is quartic and nonconvex. The introduction of local minima is a highly undesired feature, and numerical experiments show that gradient descent can indeed converge to a completely wrong local minimizer.

A particular instance of the interferometric inversion problem is inversion from cross-correlograms. In that case,

 Ei,j=1 ⇔ωi=ωj.

This means that considers data pairs from different sources and receivers at the same frequency,

 di¯¯¯¯¯dj=d(ri,si,ω)¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯d(rj,sj,ω),

where the overline stands for the complex conjugation. This expression is the Fourier transform at frequency of the cross-correlogram between trace and trace .

The choice of selector is an important concern. As explained earlier, it should describe a connected graph for inversion to be possible. In the extreme case where is the identity matrix, the problem reduces to estimating the model from intensity-only (phaseless) measurements, which does not in general have a unique solution for the kind of we consider. The same problem plagues inversion from cross-correlograms only: does not correspond to a connected graph in that case either, and numerical inversion typically fails. On the other hand, it is undesirable to consider too many pairs, both from a computational point of view and for robustness to model errors. For instance, in the limit when is the complete graph of all pairs , it is easy to see that the quartic cost function reduces to the square of the least-squares cost function. Hence, there is a trade-off between robustness to uncertainties and quantity of information available to ensure invertibility. It is important to stress that theorem 4 gives sufficient conditions on E for recovery to be possible and stable to additive noise , but not to modeling error (in the theorem, ). It does not provide an explanation of the robust behavior of interferometric inversion; see however the numerical experiments.

In the previous section, we showed that lifting convexifies problems such as (10) and (11) in a useful way. In the context of wave-based imaging, this idea was first proposed in [9] for intensity-only measurements. In this section’s notations, we replace the optimization variable by the symmetric matrix , for which the data match becomes a linear constraint. Incorporating the knowledge we have on the solution, the problem becomes

 find M s. t. E∘[˜FM˜F∗]=E∘[dd∗],M⪰0,rank(M)=1.

The first two constraints (data fit and positive semi-definiteness) are convex, but the rank constraint is not and would in principle lead to a combinatorially hard problem. However, as the theoretical results of this paper make clear, the rank constraint can often be dropped. We also relax the data pairs fit – an exact fit is ill-advised because of noise and modeling errors – to obtain the following feasibility problem equivalent to (2.2),

 find M s. t. ,||˜FM˜F∗−dd∗||ℓ1(E)≤σ,M⪰0. (12)

The approximate fit is expressed in an entry-wise sense. This feasibility problem is a convex program, for which there exist simple converging iterative methods. Once is solved for, we have already seen that a model estimate can be obtained by extracting the leading eigenvector of .

### 3.2 A practical algorithm

The convex formulation in (12) is too costly to solve at the scale of even toy problems. Let be the total number of degrees of freedom of the unknown model ; then the variable of (12) is a matrix, on which we want to impose positive semi-definiteness and approximate fit. To our knowledge, there is no time-efficient and memory-efficient algorithm to solve this type of semi-definite program when ranges from to .

We consider instead a non-convex relaxation of the feasibility problem (12), in which we limit the numerical rank of to , as in [6]. We may then write where is and . We replace the approximate fit by Frobenius minimization. Regularization is also added to handle noise and uncertainty, yielding

 ˆR=argminR||E∘(˜FRR∗˜F∗−dd∗)||2F+λ||R||2F. (13)

An estimate of is obtained from by extracting the leading eigenvector of . Note that the Frobenius regularization on is equivalent to a trace regularization on , which is known to promote the low-rank character of the matrix. The rank- relaxation (13) can be seen as a generalization of the straightforward least-squares formulation (11). The two formulations coincide in the limit case . The strength of (13) is that the optimization variable is in a slightly bigger space than formulation (11).

The rank- relaxation (13) is still nonconvex, but in practice, no local minimum has been observed even for , whereas the issue often arises for the least-squares approach (11). It should also be noted that the success of inversion is simple to test a posteriori, by comparing the sizes of the largest and second largest eigenvalues of .

In practice, the memory requirement of rank- relaxation is simply times than that of non-lifted least-squares. The runtime per iteration is typically lower than times that of a least-squares step however, because the bottleneck in the computation of and is usually the LU factorization of a Helmholtz operator, which needs to be done only once per iteration. Empirically, the number of iterations needed for convergence does not vary much as a function of .

### 3.3 Examples

Our examples fall into three categories.

1. Linear inverse source problem. Here we consider a set of receiver locations , and a constant density acoustics inverse source problem, which reads in the Fourier domain

 −(Δ+ω2m0(x))ˆus(x,ω)=ˆw(ω)m(x)
 Fm=d(xr,ω)=ˆus(xr,ω)
 m0(x)=1c0(x)2(squared slowness)

Waves are propagating from a source term with known time signature . The problem is to reconstruct the spatial distribution .

2. Linearized inverse scattering problem. Here we consider a set of receivers , waves generated by sources , and a constant density acoustic inverse problem for the reflectivity perturbation ,

 −(Δ+ω2m0(x))ˆu0,s(x,ω)=ˆw(ω)δ(x−xs)−(Δ+ω2m0(x))ˆu1,s(x,ω)=ω2ˆu0,s(x,ω)m1(x)
 Fm1=d(xr,xs,ω)=ˆu1,s(xr,ω).

The isolation of the Born scattered wavefield (primary reflections) from the full scattered field, although a very difficult task in practice, is assumed to be performed perfectly in this paper.

3. Full waveform inversion (FWI). We again consider a set of receivers , waves generated by sources , and a constant density acoustic inverse problem for the reflectivity , with

 −(Δ+ω2m(x))ˆus(x,ω)=ˆw(ω)δ(x−xs)
 F(m)=d(xr,xs,ω)=ˆus(xr,ω).

In a first numerical experiment (Figure 4), we consider a linearized inverse scattering problem where is the Marmousi2 p-wave velocity model [22], is a smoothed version of , and is the model perturbation used to generated data in the linearized forward model. The sources and receivers are placed at the top of the image in an equispaced fashion, with 30 sources and 300 receivers. The frequency sampling is uniform between 3 and 10 Hz, with 10 frequency samples. The Helmholtz equation is discretized with fourth-order finite differences at about 20 points per wavelength for data modeling, while the simulations for the inversion are with second-order finite differences. The noise model is taken to be Gaussian,

 ˜di=di+ηiηi∼CN(0,σ2)

where , so that (10% additive noise). Figure 4 shows stable recovery of from noisy , both by least-squares and by interferometric inversion. In this case the graph is taken to be an Erdős-Rényi random graph with to ensure connectedness. (Instabilities occur if is smaller; larger values of do not substantially help.) Note that if were chosen as a disconnected graph that only forms cross-correlations (same for the data indices and ), then interferometric inversion is not well-posed and does not result in good images (not shown). The optimization method for interferometric inversion is the rank-2 relaxation scheme mentioned earlier. The message of this numerical example is twofold: it shows that stable recovery is possible in the interferometric regime, when is properly chosen; and a comparison of Figures 4 bottom and middle shows that it does not come with the loss of resolution that would be expected from CINT [4]. This observation was also a key conclusion in the recent paper by Mason et al. [23].

In a second numerical experiment (Figure 5), we consider full waveform inversion where is the Marmousi2 p-wave velocity model, and the initial model is the same as used in the previous example. The setup for the acquisition, the Helmholtz equation, the noise, and the mask are the same as before. Figure 5 shows the result of quasi-Newton (LBFGS) iterations with a frequency sweep, and the corresponding interferometric inversion result. To produce this latter result, we simply replace the classical adjoint-state gradient step by interferometric inversion applied to the data residual. In all the numerical experiments so far (Figures 4 and 5), the values of the model misfits are not meaningfully different in the least-squares and interferometric cases. The message of this numerical example is that there seems to be no loss of resolution in the full waveform case either, when comparing Figure 5 (bottom) to its least-squares counterpart (second from bottom).

Possible limitations of the interferometric approach are the slightly higher computational and memory cost (as discussed in the previous section); the need to properly choose the data mask ; and the fact that the data match is now quartic rather than quadratic in the original data . Quartic objective functions can be problematic in the presence of outliers, such as when the noise is heavy-tailed, because they increase the relative importance of those corrupted measurements. We did not attempt to deal with heavy-tailed noise in this paper.

It is also worth noting that “backprojecting the cross-correlations” (or the more general quadratic combinations we consider here) can be shown to be related to the first iteration of gradient descent in the lifted semidefinite formulation of interferometric inversion.

So far, our numerical experiments merely confirm the theoretical prediction that interferometric inversion can be accurate and stable under minimal assumptions on the mask/graph of data pair products. In the next section, we show that there is also an important rationale for switching to the interferometric formulation: its results display robustness vis-a-vis some uncertainties in the forward model .

### 3.4 Model robustness

In this section we continue to consider wave-based imaging scenarios, but more constrained in the sense that the receivers and/or sources completely surround the object to be imaged, i.e., provide a full aperture. The robustness claims below are only for this case. On the other hand, we now relax the requirement that the forward model used for inversion be identical or very close to the forward map used for data modeling. The interferometric mask is also different, and more physically meaningful than in the previous section: it indicates a small band around the diagonal , i.e., it selects nearby receiver positions and frequencies, as in [4]. The parameters of this selector were tuned for best results; they physically correspond to the idea that two data points and should only be compared if they are within one cycle of one another for the most oscillatory phase in . In all our experiments with model errors, it is crucial that the data misfit parameter be positive nonzero – it would be a mistake to try and explain the data perfectly with a wrong model.

In the next numerical experiment (Figure 6), we consider the inverse source problem, where the source distribution is the Shepp-Logan phantom. This community model is of interest in ultrasound medical imaging, where it represents a horizontal cross-section of different organs in the torso. The receivers densely surround the phantom and are depicted as white crosses. Equispaced frequencies are considered on the bandwidth of . A small modeling error is assumed to have been made on the background velocity: in the experiment, the waves propagated with unit speed , but in the simulation, the waves propagate more slowly, . As shown in Figure 6 (middle), least-squares inversion does not properly handle this type of uncertainty and produces a defocused image. (Least-squares inversion is regularized with the norm of the model, a.k.a. Tykhonov. A large range of values of the regularization parameter was tested, and the best empirical results are reported here.) In contrast, interferometric inversion, shown in Figure 6 (bottom), enjoys a better resolution. The price to pay for focusing is positioning: although we do not have a mathematical proof of this fact, the interferometric reconstruction is near a shrunk version of the true source distribution.

In the last numerical experiment (Figure 7), we consider the linearized inverse scattering problem, where the phantom is now the reflectivity perturbation . As the figure shows, sources and receivers surround the phantom, with a denser sampling for the receivers than for the sources. The wave speed profile is uniform (). In this example, the modeling error is assumed to be on the receiver positions: they have a smooth random deviation from a circle, as shown in Figure 7 (top). Again, least-squares inversion produces a poor result regardless of the Tykhonov regularization parameter (Figure 7, middle), where the features of the phantom are not clearly reconstructed, and the strong outer layer is affected by a significant oscillatory error. Interferometric inversion produces a somewhat more focused reconstruction (Figure 7, bottom), where more features of the phantom are recovered and the outer layer is well-resolved.

Model robustness is heuristically plausible in the scenario when data are of the form for some traveltimes which are themselves function of a velocity through . In that case, the combination has a phase that encodes the idea of a traveltime difference. When is near in data space (because they correspond, say, to nearby receivers), it is clear that depends in a milder fashion on errors in , or on correlated errors in , than the individual traveltimes themselves. This results in some degree of stability of selected with respect to those types of errors, which in turn results in more focused imaging. This phenomenon is not unlike the celebrated statistical stability of under some models of randomness on either the velocity or the sources. For random media, this behavior was leveraged in the context of coherent interferometric imaging (CINT) in the 2011 work of Borcea et al. [5]

## 4 Discussion

This paper explores a recent optimization idea, convex relaxation by lifting, for dealing with the quadratic data combinations that occur in the context of interferometry. The role of the mask that selects the quadratic measurements is explained: it is shown that it should encode a well-connected graph for robustness of the recovery. The recovery theorems do not assume a random model on this graph; instead, they involve the Laplacian’s spectral gap.

Numerical experiments in the context of imaging are shown to confirm that recovery is stable when the assumptions of the theorem are obeyed. Developing methods to solve the lifted problem at interesting scales is a difficult problem that we circumvent via an ad-hoc rank-2 relaxation scheme. It is observed empirically that there is no noticeable loss of resolution from switching to an interferometric optimization formulation. It is also observed that interferometric inversion for imaging can display a puzzling degree of robustness to very specific model uncertainties.

We leave numerous important questions unanswered, such as how to optimize the choice of the selector when there is a trade-off between connectivity and robustness to model uncertainties; how to select the data misfit parameter and whether there is a phase transition in for model robustness vs. lack thereof; how to justify the gain in focusing in some reconstructions; and whether this gain in focusing is part of a tradeoff with the geometric faithfulness of the recovered image.

## Appendix A Proofs

### a.1 Proof of theorem 1.

Observe that if is feasible for (3), then is feasible for (2.1), and has as leading eigenvector. Hence we focus without loss of generality on (2.1).

As in [30], consider the Laplacian matrix weighted with the unknown phases,

 L=ΛLΛ∗,

with . In other words with . We still have and , but now . Here and below, and refer to , and has unit norm.

The idea of the proof is to compare with the rank-1 spectral projectors of . Let be the Frobenius inner product. Any obeying (3) can be written as with . We have

 ⟨X,L⟩=⟨X0,L⟩+⟨˜ε,L⟩

A short computation shows that

 ⟨X0,L⟩ =∑i(X0)ii¯¯¯¯¯¯¯Lii+∑(i,j)∈E(X0)ij¯¯¯¯¯¯¯Lij =−∑idi+∑(i,j)∈E(x0)i¯¯¯¯¯¯¯¯¯¯¯(x0)j¯¯¯¯¯¯¯¯¯¯¯(x0)i(x0)j =∑i⎡⎣−di+∑j:(i,j)∈E1⎤⎦ =0.

Since on , the error term is simply bounded as

 |⟨˜ε,L⟩|≤∥˜ε∥1

On the other hand the Laplacian expands as

 L=∑jvjλjv∗j,

so we can introduce a convenient normalization factor and write

 ⟨Xn,L⟩=∑jcjλj, (14)

with

 cj=⟨Xn,vjv∗j⟩=v∗jXvjn.

Notice that since we require . Their sum is

 ∑jcj=⟨Xn,∑jvjv∗j⟩=⟨Xn,I⟩=tr(X)n=1.

Hence (14) is a convex combination of the eigenvalues of , bounded by . The smaller this bound, the more lopsided the convex combination toward , i.e., the larger . The following lemma makes this observation precise.

###### Lemma 1.

Let with , , and . If , then

 c1≥1−μλ2.
###### Proof of lemma 1..
 μ=∑i≥2cjλj≥λ2∑j≥2cj=λ2(1−c1),

then isolate . ∎

Assuming , we now have

 ⟨Xn,v1v∗1⟩≥1−∥˜ε∥1nλ2.

We can further bound

 ∥Xn−v1v∗1∥2F =tr[(Xn−v1v∗1)2] =tr((v1v∗1)2)+tr(X2)n2−2tr(Xnv1v∗1).

The first term is 1. The second term is less than 1, since for positive semidefinite matrices. Therefore,

 ∥Xn−v1v∗1∥2F≤2−2tr(Xnv1v∗1)≤2∥˜ε∥1nλ2.

We can now control the departure of the top eigenvector of from by the following lemma. It is analogous to the sin theta theorem of Davis-Kahan, except for the choice of normalization of the vectors. (It is also a generalization of a lemma used by one of us in [11] (section 4.2).) The proof is only given for completeness.

###### Lemma 2.

Consider any Hermitian , and any , such that . Let be the leading eigenvalue of , and the corresponding unit-norm eigenvector. Let be defined either as (a) , or as (b) . Then

 ∥x∥x∥−eiαv∥v∥∥≤2√2∥X−vv∗∥,

for some .

###### Proof of Lemma 2.

Let . Notice that . Decompose with eigenvalues sorted in decreasing order. By perturbation theory for symmetric matrices (Weyl’s inequalities),

 max{|∥v∥2−η1|,|η2|,…,|ηn|}≤δ, (15)

so it is clear that , and that the eigenspace of is one-dimensional, as soon as .

Let us deal first with the case (a) when . Consider

 vv∗−xx∗=vv∗−X+Y,

where

 Y=x1(∥v∥2−η1)x∗1+n∑j=2xjηjx∗j.

From (15), it is clear that . Let . We get

 ∥vv∗−xx∗∥≤∥vv∗−X∥+∥Y∥≤2δ.

Pick so that . Then

 ∥v∥v∥−e−iαx∥x∥∥2 =∥v∥4+∥x∥4−2∥v∥∥x∥Re−iαv∗x =∥v∥4+∥x∥4−2∥v∥∥x∥|v∗x|by def.% of α ≤∥v∥4+∥x∥4−2|v∗x|2by Cauchy-Schwarz =∥vv∗−xx∗∥2F ≤2∥vv∗−xx∗∥2since vv∗−xx∗ has % rank 2 ≤8δ2.

The case (b) when is treated analogously. The only difference is now that

 Y=n∑j=2xjηjx∗j.

A fortiori, as well.

Part of lemma 2 is applied with in place of , and in place of . In that case, . We conclude the proof by noticing that , and that the output of the lifting method is normalized so that .

### a.2 Proof of theorem 3

The proof is a simple argument of perturbation of eigenvectors. We either assume or enforce it by symmetrizing the measurements. Define as previously, and notice that . Consider the eigen-decompositions

 Lvj=λjvj,˜L˜vj=˜λj˜vj,

with . Form

 ˜Lvj=λjvj+rj,

with . Take the dot product of the equation above with to obtain

 ⟨˜vk,rj⟩=(˜λk−λj)⟨˜vk,vj⟩.

Let , and use . We get

 ∑k≥2|⟨˜vk,v1⟩|2≤∑k≥2|⟨˜vk,r1⟩|2maxk≥2|˜λk|2≤∥ε∥2˜λ22.

As a result,

 |⟨˜v1,v1⟩|2≥1−∥ε∥