Simultaneous Phase Retrieval and Blind Deconvolution via Convex Programming

Simultaneous Phase Retrieval and Blind Deconvolution via Convex Programming

Ali Ahmed
Department of Electrical Engineering, Information Technology University, Lahore, Pakistan; Email: ali.ahmed@itu.edu.pk
   Alireza Aghasi
Department of Business Analytics, Georgia State University, Atlanta, GA; Email: aaghasi@gsu.edu
   Paul Hand
Department of Mathematics and Khoury College of Computer Sciences, Northeastern University, Boston, MA; Email: p.hand@northeastern.edu
Abstract

We consider the task of recovering two real or complex -vectors from phaseless Fourier measurements of their circular convolution. Our method is a novel convex relaxation that is based on a lifted matrix recovery formulation that allows a nontrivial convex relaxation of the bilinear measurements from convolution. We prove that if the two signals belong to known random subspaces of dimensions and , then they can be recovered up to the inherent scaling ambiguity with phaseless measurements. Our method provides the first theoretical recovery guarantee for this problem by a computationally efficient algorithm and does not require a solution estimate to be computed for initialization. Our proof is based on Rademacher complexity estimates. Additionally, we provide an alternating direction method of multipliers (ADMM) implementation and provide numerical experiments that verify the theory.

Keywords: Hyperbolic Constraints, Blind Deconvolution, Phase Retrieval, Convex Analysis, Rademacher Complexity

1 Introduction

This paper considers recovery of two unknown signals (real- or complex-valued) from the magnitude only measurements of their convolution. Let , and be vectors residing in , where denotes either , or . Moreover, denote by the DFT matrix with entries We observe the phaseless Fourier coefficients of the circular convolution of , and

(1)

where returns the element wise absolute value of the vector . We use to denote noiseless measurements, and reserve the notation for more general noisy measurements. We are interested in recovering and from the phaseless measurements or of their circular convolution. In other words, the problem concerns blind deconvolution of two signals from phaseless measurements. The problem can also be viewed as identifying the structural properties on such that its convolution with the signal/image of interest makes the phase retrieval of a signal well-posed. Since and are both unknown, and in addition, the measurements are phaseless, the inverse problem becomes severly ill-posed as many pairs of and correspond to the same . We show that this non-linear problem can be efficiently solved, under Gaussian measurements, using a semidefinite program and also theoretically prove this assertion. We also propose a heuristic approach to solve the proposed semidefinite program computationally efficiently. Numerical experiments show that, using this algorithm, one can successfully recover a blurred image from the magnitude only measurements of its Fourier spectrum.

Phase retrieval has been of continued interest in the fields of signal processing, imaging, physics, computational science, etc. Perhaps, the single most important context in which phase retrieval arises is the X-ray crystallography [Harrison, 1993, Millane, 1990], where the far-field pattern of X-rays scattered from a crystal form a Fourier transform of its image, and it is only possible to measure the intensities of the electromagnetic radiation. However, with the advancement of imaging technologies, the phase retrieval problem continues to arise in several other imaging modalities such as diffraction imaging [Bunk et al., 2007], microscopy [Miao et al., 2008], and astronomical imaging [Fienup and Dainty, 1987]. In the imaging context, the result in this paper would mean that if rays are convolved with a generic pattern (either man made or naturally arising due to propagation of light through some unknown media) prior to being scattered/reflected from the object, the image of the object can be recovered from the Fourier intensity measurements later on. As is well known from Fourier optics [Goodman, 2008], the convolution of a visible light with a generic pattern can be implemented using a lens-grating-lens setup.

Despite recent advances in theoretical understanding of phase retrieval [Candes et al., 2013, Candes et al., 2015b], the application to actual problems such as crystallography remains challenging owing partly to the simplistic mathematical models that may not fully capture the actual physical problem at hand. Our comparatively more complex model in (1) more elaborately encompasses structure in actual physical problems, for example, crystallography, where due to the natural periodic arrangement of a crystal structural unit, the observed electron density function of the crystal exactly takes the form (1); for details, see, Section 2 of [Elser et al., 2017].

Blind deconvolution is a fundamental problem in signal processing, communications, and in general system theory. Visible light communication has been proposed as a standard in 5G communications for local area networks [Azhar et al., 2013, Retamal et al., 2015, Azhar et al., 2010]. Propagation of information carrying light through an unknown communication medium is modeled as a convolution. The channel is unknown and at the receiver it is generally difficult to measure the phase information in the propagated light. The result in this paper says that the transmitted signal can be blindly deconvolved from the unknown channel using the Fourier intensity measurements of the light only. The reader is referred to the first section of the supplementary note for a detailed description of the visible light communication and its connection to our formulation.

Main Contributions. In this paper, we study the combination of two important and notoriously challenging signal recovery problems: phase retrieval and blind deconvolution. We introduce a novel convex formulation that is possible because the algebraic structure from lifting resolves the bilinear ambiguity just enough to permit a nontrivial convex relaxation of the measurements. The strengths of our approach are that it allows a novel convex program that is the first to provably permit recovery guarantees with optimal sample complexity for the joint task of phase retrieval and blind deconvolution when the signals belong to known random subspaces. Additionally, unlike many recent convex relaxations and nonconvex approaches, our approach does not require an initialization or estimate of the true solution in order to be stated or solved. While our convex formulation is presented in a lifted domain (with increased dimensionality), in implementing the convex problem, we have been able to use some recent results in Burer-Monteiro-type approaches and perform the optimization in a factored space (solving a series of nonconvex programs which are guaranteed to land on the global minima).

Finally, an earlier version of this paper with only the exact recovery result form noiseless measurements appeared in [Ahmed et al., 2018] by the same authors. This paper extends the previous result to more general noisy measurements with a significantly modified proof. Moreover, the implementation in [Ahmed et al., 2018] was performed in a lifted domain and the proposed scheme required iterative projections onto the positive semidefinite cone, which was computationally prohibitive for large scale problems. By considering a different way of modeling the optimization problem, in Section 2 we present a more efficient algorithm, which is solved in a factored space using a Burer-Monteiro-type approach. This makes our implementation applicable to a much larger class of problems.

1.1 Observations in Matrix Form

The phase retrieval, and blind deconvolution problem has been extensively studied in signal processing community in recent years [Candes et al., 2015a, Ahmed et al., 2014] by lifting the unknown vectors to a higher dimensional matrix space formed by their outer products. The resulting rank-1 matrix is recovered using nuclear norm as a convex relaxation of the non-convex rank constraint. Recently, other forms of convex relaxations have been proposed [Bahmani and Romberg, 2017b, Goldstein and Studer, 2018, Aghasi et al., 2017a, Aghasi et al., 2017b, Aghasi et al., 2018] that solve both the problems in the native (unlifted) space leading to computationally efficiently solvable convex programs. This paper handles the non-linear convolutional phase retrieval problem by lifting it into a bilinear problem. The resulting problem, though still non-convex, gives way to an effective convex relaxation that provably recovers and exactly.

We consider the problem of recovering from measurements of the form (1). It is clear that uniquely recovering and , even up to the global bilinear abiguity, is not possible without extra knowledge or information about the problem. We will address the problem under the broad and generally applicable structural assumptions that both and are members of known subspaces of . This means that and can be parameterized in terms of unknown lower dimensional vectors and , respectively, as follows

(2)

where , and are known matrices whose columns span the subspaces in which and belong, respectively. Since the circular convolution operator diagonalizes in the Fourier domain, noiseless measurements become

(3)

where , , and represents the the Hadamard product. Denoting by and the rows of and , respectively, the entries of the noiseless measurements can be expressed as

This problem is non-linear in both unknowns; however, it reduces to a bilinear problem in the lifted variables and , taking the form

(4)

where and . Treating the lifted variables and as unknowns makes the measurements bilinear in the unknowns; a structure that will help us formulate an effective convex relaxation.

In the case of noisy measurements, we will write without loss of generality that

(5)
(6)

The noiseless case is given by .

1.2 Novel Convex Relaxation

The task of recovering and from the noiseless measurements in (5) can be naturally posed as an optimization program

(7)

Both the measurement and the rank constraints are non-convex. Further, the immediate convex relaxation of each measurement constraint is trivial, as the convex hull of the set of satisfying is the set of all possible .

To derive our convex relaxation, recall that the true , and are also positive semidefinite (PSD). This means that incorporating the PSD constraint in the optimization program translates into the fact that the variables and are necessarily non-negative. That is,

where the implication follows by the definition of PSD matrices. This observation restricts the hyperbolic constraint set in Figure 1 to the first quadrant only. For a fixed , we propose replacing the non-convex hyperbolic set with its convex hull In short, our convex relaxation is possible because the PSD constraint from lifting happens to select a specific branch of the hyperbola given by any particular bilinear measurement, and this single branch has a nontrivial convex hull.

The rest of the convex relaxation is standard, as the rank constraint in (7) is then relaxed with a nuclear-norm minimization, which reduces to trace minimization in the PSD case:

In the noiseless or noisy cases, we will study the following program, which only differs in that the noiseless observations are substituted by the possibly noisy ones given from (5):

(8)

The convexity of the optimization program above is established in the lemma below. A formal proof of he lemma can be found in Appendix A.

Lemma 1

The optimization problem (8) is a convex program.

\setkeys

Gin width=0.57height=0.3tics=10\OVP@calc       

Figure 1: Left: The restriction of the hyperbolic constraint to the first quadrant; Right: Abstract illustration of the geometry of the convex relaxation. The PSD cone (blue) and the surface of the hyperbolic set (red) formed by two intersecting hyperbolas . Evidently, there are multiple points on the surface and also in the convex hull of the hyperbolic set that lie on the PSD cone. The minimizer of the optimization program (8) picks the one with minimum trace that happens to lie at the intersection of hyperbolic ridge and the PSD cone (pointed out by an arrow).

1.3 Main Results

We consider the case of i.i.d. Gaussian measurements,

(9)

We show that with this choice, (8) recovers a global scaling of The exact value of the unknown scalar multiple can be characterized for the solution of (8). Observe that the solution of the convex optimization program in (8) obeys . We aim to show that the solution of the optimization program recovers the following scaling of the true solution :

(10)

It is worth noting that , , and .

We show that if and are random, and is sufficiently large with respect to , then the convex program (8) stably recovers the true solution up to the global bilinear scaling, with high probability.

Theorem 1 (Stable Recovery)

Given the magnitude only Fourier measurements (5) of the convolution of two unknown vectors , and in contaminated with additive noise in . Suppose that , and are generated as in (2), where , and are known standard Gaussian matrices as in (9). Assume without loss of generality that noise components for every . Then for any , when , with probability at least , the solution of the convex optimization program in (8) obeys

where , and is an absolute constant.

As a straightforward special case, for noiseless measurements, solving the proposed convex program would identify the true signals exactly, up to the global bilinear ambiguity, with high probability.

Corollary 1 (Exact Recovery)

Consider the magnitude-only Fourier measurements in (3) and a similar setting as Theorem 1. Fixing , the convex optimization in (8) uniquely recovers for with probability at least whenever , where is an absolute constant.

Both Theorem 1 and Corollary 1 establish high probability recovery for phaseless blinear inversion within random subspaces, provided that on the order of . Except for log factors, this sample complexity is optimal. Proof for the theorem is in the appendix and is based on Rademacher complexity estimates of descent directions objective.

2 Implementing the Convex Program

A conference paper by the authors [Ahmed et al., 2018] presented an ADMM scheme to address the central convex program (8). One of the main computational challenges with that proposed scheme is that it uses a projection onto the positive semi-definite cone at every ADMM iteration. Such an operation makes the algorithm prohibitively expensive for large problem sizes. In this section, we consider an alternative ADMM scheme which uses a Burer-Monteiro low-rank factorization [Burer and Monteiro, 2003, Burer and Monteiro, 2005, Bhojanapalli et al., 2018] to bypasses the PSD projection and speed up the algorithm convergence111An implementation of our solver is publicly available at: https://github.com/branchhull/BDPR.

To proceed, consider our central convex program

(11)

Note that complex-valued positive semidefinite matrices are necessarily Hermitian. For a simpler notation, we define the convex set

(12)

An alternative way of formulating program (11) is

(13)

where

Defining the dual vectors , the augmented Lagrangian for (13) takes the form

(14)

To set up an ADMM scheme, each variable update at the -th iteration is performed by minimizing with respect to that variable while fixing the others. More specifically, using the superscript to denote the iteration, for we have the primal updates

(15)
(16)

along with the dual updates

In the sequel we outline a computational procedure for each step of the proposed ADMM scheme.

2.1 Performing the -update

Central to the ADMM step (15), in this section we focus on addressing the convex program

(17)

One of the most successful heuristics to address (17), which was brought into attention by [Burer and Monteiro, 2003], is to consider the PSD factorization and to address the non-convex program

(18)

For a large class of objectives, there have been theoretical arguments that local minimizers to (18) can form the global minimizer to (17). Specifically, for the objective form (17), [Bhojanapalli et al., 2018] have recently shown that for almost all objectives of this form, if is a second-order stationary solution to (18) and , then is a global minimizer to (17) (see Corollary 2 in the aforementioned reference).

Finding solutions to (18) can be performed via standard optimization toolboxes. In particular, we use quasi-Newton methods with cubic line search as implemented in [Schmidt, 2005], which only need the gradient of the objective in (18), calculated as

It is noteworthy that the gradient calculation only requires a series of matrix-vector multiplications.

With the proposed computational scheme, to update at each ADMM iteration, another iterative scheme needs to be carried out to solve (18). Despite the nested nature of this framework, a very good initialization for at the start of each ADMM update is the optimal from the previous ADMM step. Aside from the factorization technique, such choice of initialization further contributes to fast solutions of (17).

2.2 Performing the -update

The -update in (16) is a standard projection problem onto the set . It is straightforward to see that program

(19)

decouples into distinct programs of the form

(20)

In the sequel we focus on addressing (20), as solving (20) for each component would deliver the solution to (19). We proceed by forming the Lagrangian for the constrained problem (20)

Along with the primal constraints, the Karush-Kuhn-Tucker optimality conditions are

(21)
(22)

We now proceed with the possible cases.

Case 1. :
In this case we have and this result would only be acceptable when and .

Case 2. , :
In this case the first feasibility constraint of (20) requires that , which is not a possiblity.

Case 3. , :
Similar to the previous case, this cannot happen when .

Case 4. , :
In this case we have , combining which with (22) yields , or

(23)

Similarly, (21) yields

(24)

Since the condition requires that , can be eliminated between (23) and (24) to generate the following forth order polynomial equation in terms of :

After solving this fourth order polynomial equation, we pick the real root which obeys

(25)

Note that the second inequality in (25) warrants nonnegative values for thanks to (23). After picking the right root, we can explicitly obtain using (24) and calculate the using (22). The resulting pair presents the solution to (20), and finding such pair for every provides the solution to (19). Thanks to the decoupling of the projection step in , the -update can enjoy a parallel computing framework.

3 Experiments and Application

We now present numerical experiments that verify the recovery guarantee for bilinear inversion from phaseless Fourier measurements by program (8). We consider the noiseless case with i.i.d. Gaussian matrices and . In Figure 2 we present the phase portrait associated with the proposed convex framework. To obtain the diagram on the left panel, for each fixed value of , we run the algorithm for 100 different combinations of and , each time using independently generated Gaussian matrices and . If the algorithm converges to a sufficiently close neighborhood of the ground-truth solution (a relative error of less than 1% with respect to the norm), we label the experiment as successful. Figure 2 shows the collected success frequencies, where solid black corresponds to 100% success and solid white corresponds to 0% success. For an empirically selected constant , the success region almost perfectly stands on the left side of the line . The results indicate that the constants in the Theorem are not unreasonably large in practice.

While the analysis in this paper is specifically focused on the Gaussian subspace embeddings for and , we additionally consider the case where is deterministic and is Gaussian. Specifically will be an equispaced sampling of the columns of the identity matrix. On the right panel of Figure 2, we have plotted the phase diagram for this case of deterministic and random . These results hint that the convex framework is applicable to more realistic deterministic subspace models.

\setkeys

Gin height=.24tics=10\OVP@calc \setkeysGin height=.24tics=10\OVP@calc

Figure 2: Phase portraits highlighting the frequency of successful recoveries of the proposed convex program for random and deterministic channel subspaces (see the text for the experiment details).

We do not want to give the reader the impression that the present paper solves the problem of blind deconvolutional phase retrieval in practice. The numerical experiments we perform do indeed show excellent agreement with the theorem in the case of random subspaces. Such subspaces are unlikely to appear in practice, and typically appropriate subspaces would both be deterministic, including partial Discrete Cosine Transforms or partial Discrete Wavelet Transforms. Numerical experiments, not shown, indicate that our convex relaxation is less effective for the cases of these doubly deterministic subspaces. We suspect this is due to the fact that the subspaces for both measurements should be mutually incoherent, in addition to both being incoherent with respect to the Fourier basis given by the measurements. As with the initial recovery theory for the problems of compressed sensing and phase retrieval, we have studied the random case in order to show information theoretically optimal sample complexity is possible by efficient algorithms. Based on this work, it is clear that blind deconvolutional phase retrieval is still a very challenging problem in the presence of deterministic matrices, and one for which development of convex or nonconvex methods may provide substantial progress in applications.

3.1 Related Real-World Applications

As discussed earlier, the proposed framework addresses a general version of the phase retrieval, where as a result of the light propagation through a medium, the rays are convolved with an unknown kernel. Aside from this general setup, in this section we will point out two specific physical problems, solving which requires simultanuously addressing variants of the phase retrieval and blind deconvolution problems.

3.1.1 Stylized Application in Visible Light Communications

As discussed in the body of the paper, an important application domain where blind deconvolution from phaseless Fourier measurements arises is the visible light communication (VLC). A stylized VLC setup is shown in Figure 3. A message is to be transmitted using visible light. The message is first coded by multiplying it with a tall coding matrix and the resultant information is modulated on a light wave. The light wave propagates through an unknown media. This propagation can be modeled as a convolution of the information signal with unknown channel . The vector contains channel taps, and frequently in realistic applications has only few significant taps. In this case, one can model

where is a short vector, and in this case is a subset of the columns of an identity matrix. Generally, the multipath channels are well modeled with non-zero taps in top locations of . In that case, is exactly known to be the top few columns of the identity matrix.

In visible light communication, there is always a difficulty associated with measuring phase information in the received light. Figure 3 shows a setup, where we measure the phaseless Fourier transform (light through the lens) of this signal. The measurements are therefore

and one wants to recover , and given the knowledge of and the coding matrix . Since we chose to be random Gaussian, and is the columns of identity. As mentioned at the end of the numerics section that with this subspace model, we obtain similar recovery results as one would have for both , and being random Gaussians. The proposed convex program solves this difficult inverse problem and recovers the true solution with these subspace models.

\setkeys

Gin width=0.57height=0.3tics=10\OVP@calcsignal beamlensemedia

Figure 3: Visible light communication optical setup; the media block normally consists of phosphor, filter and a linear polarizer. The lens takes the Fourier transform of the light and one can only measure the intensity only measurements of this transformed light source signal.

3.1.2 Crystallography

In crystallography, the lattice structural information is carried in the electron density function of the crystal, which may be represented as

(26)

Here, is a compactly supported central motif, and is a finite, but large compact set of translation vectors. In a sense, the electron density function is the result of convolving the central motif with the indicator of the set .

Denoting the Fourier transforms of and by and , similar to the other phase retrieval problems, X-ray experiments measure the magnitude of the Fourier transform of , which can be written as

Identifying the motif and the set , using measurements of the form would be a problem which involves simultaneously addressing a phase retrieval and blind deconvolution problem. The reader is referred to [Elser et al., 2017] and the references therein for more details of the underlying physics and measurement system.

4 Proof of Theorem 1

As shown in Appendix Appendix A. Proof of Lemma 1, the hyperbolic feasible set is convex in , however, the corresponding constraint function222We will abuse the notation by specifying the same using as parameters, i.e., , where recall that , and is a non-convex function of . In the analysis later, it is easier to work with convex constraint functions instead; therefore, we replace the function above with a convex counterpart whose 0-level-set is the same under the additional constraints that and :

(27)

where

(28)

is a scalar chosen to normalize the gradients computed below. Recall that . It is now easy to check that feasible sets drawn by the convex and non-convex functions are equal under the additional constraint of , and for every , and , i.e., , and , respectively. Mathematically,

for any . Note that automatically constrains and . It is easy to check that is a convex function. Since and imply that , and , respectively, and since , we can write the above conclusion in the matrix space as

In the sequel, we will refer to

which is same as except the measurements is now replaced by corresponding noiseless measurements . Define a convex indicator function for the positive semidefinite cone:

Introduce the convex regularizer

For analysis purposes, we will work with the following optimization program

(29)

where is given by (4).The optimization program (29) is equivalent to (8) as the objective and constraint set remain unchanged. In the analysis later, we will also need the subdifferential , evaluated at , which are given by (10). One can verify that

(30)

To see this, refer to a brief derivation below

where the last equality follows by using .

We now build some preliminaries required to characterize the set of descent directions for the objective function of the optimization program (29). Let , and be the set of symmetric matrices of the form

and denote the orthogonal complements by , and , respectively. Note that iff both the row and column spaces of are perpendicular to . denotes the orthogonal projection onto the set , and a matrix of appropriate dimensions can be projected into as

Similarly, define the projection operator . The projection onto orthogonal complements are then simply , and , where is the identity operator. We use as a shorthand for . The subgradient of the objective at the proposed solution is

(31)

for details; see, Section 8.6 in [Tropp, 2015], and references therein.

Given the measurements (5), one can only identify the true solution up to the blinear scaling ambiguity. To formalize this, begin by defining a set

(32)

and denote by a set shifted by a point . Mathematically,

(33)

We will refer to this set as the linearized global scaling of .

The main argument of stable recovery in the noisy case is summarized as follows: Let

be the orthogonal complement of the subspace . The first step consists of showing that any feasible perturbation about the linearized scaling ambiguity cannot be too large. This only shows that a large perturbation in the direction is not allowed, however, the movement away from the ground truth along the line can still be arbitraritly large. In the second step, we note that the straight line touches (in the noiseless case) the hyperbolic feasible set at , and diverges away from it as we deviate away (large ) from the point along the line . However, moving too far away along the line would make it impossible to jump back into the hyperbolic feasible region whi