Phase Retrieval with One or Two Diffraction Patterns by Alternating Projection with the Null Initialization

Phase Retrieval with One or Two Diffraction Patterns by Alternating Projection with the Null Initialization

Pengwen Chen Department of Applied Mathematics, National Chung Hsing University, Taichung 402, Taiwan. Research is supported in part by the grant 103-2115-M-005-006-MY2 from Ministry of Science and Technology, Taiwan, and US NIH grant U01-HL-114494    Albert Fannjiang Corresponding author. Department of Mathematics, University of California, Davis, CA 95616, USA. Research is supported in part by US National Science Foundation grant DMS-1413373 and Simons Foundation grant 275037.    Gi-Ren Liu Department of Mathematics, University of California, Davis, CA 95616, USA
Oct 24, 2015

Alternating projection (AP) of various forms, including the Parallel AP (PAP), Real-constrained AP (RAP) and the Serial AP (SAP), are proposed to solve phase retrieval with at most two coded diffraction patterns. The proofs of geometric convergence are given with sharp bounds on the rates of convergence in terms of a spectral gap condition.

To compensate for the local nature of convergence, the null initialization is proposed for initial guess and proved to produce asymptotically accurate initialization for the case of Gaussian random measurement. Numerical experiments show that the null initialization produces more accurate initial guess than the spectral initialization and that AP converges faster to the true object than other iterative schemes for non-convex optimization such as the Wirtinger Flow. In numerical experiments AP with the null initialization converges globally to the true object.

Key words. Phase retrieval, coded diffraction patterns, alternating projection, null initialization, geometric convergence, spectral gap

AMS subject classifications. 49K35, 05C70, 90C08

1 Introduction

With wide-ranging applications in science and technology, phase retrieval has recently attracted a flurry of activities in the mathematics community (see a recent review [55] and references therein). Chief among these applications is the coherent X-ray diffractive imaging of a single particle using a coherent, high-intensity source such as synchrotrons and free-electron lasers.

In the so-called diffract-before-destruct approach, the structural information of the sample particle is captured by an ultra-short and ultra-bright X-ray pulse and recorded by a CCD camera [18, 17, 56]. To this end, reducing the radiation exposure and damage is crucial. Due to the high frequency of the illumination field, the recorded data are the intensity of the diffracted field whose phase needs to be recovered by mathematical and algorithmic techniques. This gives rise to the problem of phase retrieval with non-crystalline structures.

The earliest algorithm of phase retrieval for a non-periodic object (such as a single molecule) is the Gerchberg-Saxton algorithm [33] and its variant, Error Reduction [31]. The basic idea is Alternating Projection (AP), going back all the way to the works of von Neuman, Kaczmarz and Cimmino in the 1930s [21, 38, 49]. And these further trace the history back to Schwarz [54] who in 1870 used AP to solve the Dirichlet problem on a region given as a union of regions each having a simple to solve Dirichlet problem.

For any vector let be the vector such that . In a nutshell, phase retrieval is to solve the equation of the form where represents the unknown object, the diffraction/propagation process and the diffraction pattern(s). The subset represents all prior constraints on the object. Also, the number of data is typically greater than the number of components in .

Phase retrieval can be formulated as the following feasibility problem


From the object is estimated via pseudo-inverse


Let be the projection onto and the projection onto defined as

where denotes the Hadamard product and the componentwise division. Where vanishes, is chosen to be 1 by convention. Then AP is simply the iteration of the composite map


starting with an initial guess .

The main structural difference between AP in the classical setting [21, 38, 49] and the current setting is the non-convexity of the set , rendering the latter much more difficult to analyze. Moreover, AP for phase retrieval is well known to have stagnation problems in practice, resulting in poor reconstruction [31, 32, 44].

In our view, numerical stagnation has more to do with the measurement scheme than non-convexity: the existence of multiple solutions when only one (uncoded) diffraction pattern is measured even if additional positivity constraint is imposed on the object. However, if the diffraction pattern is measured with a random mask (a coded diffraction pattern), then the uniqueness of solution under the real-valuedness constraint is restored with probability one [28]. In addition, if two independently coded diffraction patterns are measured, then the uniqueness of solution, up to a global phase factor, holds almost surely without any additional prior constraint [28] (see Proposition LABEL:prop:unique).

The main goal of the present work is to show by analysis and numerics that under the uniqueness framework for phase retrieval with coded diffraction patterns of [28], AP has a significantly sized basin of attraction at and that this basin of attraction can be reached by an effective initialization scheme, called the null initialization. In practice, numerical stagnation disappears under the uniqueness measurement schemes of [28].

Specifically, our goal is two-fold: i) prove the local convergence of various versions of AP under the uniqueness framework of [28] (Theorems LABEL:thm1, LABEL:thm2 and LABEL:thm3) and ii) propose a novel method of initialization, the null initialization, that compensates for the local nature of convergence and results in global convergence in practice. In addition, we prove that for Gaussian random measurements the null initialization alone produces an initialization of arbitrary accuracy as the sample size increases (Theorem LABEL:Gaussian). In practice AP with the null initialization converges globally to the true object.

1.1 Set-up

Let us recall the measurement schemes of [28].

Let be a discrete object function with . Consider the object space consisting of all functions supported in

We assume .

Only the intensities of the Fourier transform, called the diffraction pattern, are measured

which is the Fourier transform of the autocorrelation

Here and below the over-line means complex conjugacy.

Note that is defined on the enlarged grid

whose cardinality is roughly times that of . Hence by sampling the diffraction pattern on the grid

we can recover the autocorrelation function by the inverse Fourier transform. This is the standard oversampling with which the diffraction pattern and the autocorrelation function become equivalent via the Fourier transform [45, 46].

A coded diffraction pattern is measured with a mask whose effect is multiplicative and results in a masked object of the form where is an array of random variables representing the mask. In other words, a coded diffraction pattern is just the plain diffraction pattern of a masked object.

We will focus on the effect of random phases in the mask function where are independent, continuous real-valued random variables and (i.e. the mask is transparent).

For simplicity we assume which gives rise to a phase mask and an isometric propagation matrix


i.e. (with a proper choice of the normalizing constant ), where is the oversampled -dimensional discrete Fourier transform (DFT). Specifically is the sub-column matrix of the standard DFT on the extended grid where is the cardinality of .

If the non-vanishing mask does not have a uniform transparency, i.e. then we can define a new object vector and a new isometric propagation matrix

with which to recover the new object first.

When two phase masks are deployed, the propagation matrix is the stacked coded DFTs, i.e.


With proper normalization, is isometric.

We convert the -dimensional () grid into an ordered set of index. Let and the total number of measured data. In other words, .

Let be a nonempty closed convex set in and let


denote the projection onto .
Phase retrieval is to find a solution to the equation


We focus on the following two cases.

1) One-pattern case: is given by (LABEL:one), or .

2) Two-pattern case: is given by (LABEL:two), (i.e. ).

For the two-pattern case, AP for the formulation (LABEL:feas) shall be called the Parallel AP (PAP) as the rows of and the diffraction data are treated equally and simultaneously, in contrast to the Serial AP (SAP) which splits the diffraction data into two blocks according to the masks and treated alternately.

The main property of the true object is the rank- property: is rank- if the convex hull of in is -dimensional.

Now we recall the uniqueness theorem of phase retrieval with coded diffraction patterns.

Proposition 1.1

[28] (Uniqueness of Fourier phase retrieval) Let be a rank, object and a solution of the phase retrieval problem (LABEL:0) for either the one-pattern or two-pattern case. Then for some constant with probability one.

Remark 1.1

The main improvement over the classical uniqueness theorem [36] is that while the classical result works with generic (thus random) objects Proposition LABEL:prop:unique deals with a given deterministic object. By definition, deterministic objects belong to the measure zero set excluded in the classical setting of [36]. It is crucial to endow the probability measure on the ensemble of random masks, which we can manipulate, instead of the space of unknown objects, which we can not control.

The proof of Proposition LABEL:prop:unique is given in [28] where more general uniqueness theorems can be found, including the -mask case. Phase retrieval solution is unique only up to a constant of modulus one no matter how many coded diffraction patterns are measured. Thus a reasonable error metric for an estimate of the true solution is given by


Our framework and methods can be extended to more general, non-isometric measurement matrix as follows. Let be the QR-decomposition of where is isometric and is upper-triangular. We have


if (and hence ) is full-rank. Now we can define a new object vector and a new isometric measurement matrix with which to recover first.

1.2 Other literature

Much of recent mathematical literature on phase retrieval focuses on generic frames and random measurements, see e.g. [1, 2, 3, 4, 5, 11, 15, 22, 24, 27, 35, 43, 48, 52, 55, 58, 60, 55]. Among the mathematical works on Fourier phase retrieval e.g. [7, 12, 13, 14, 19, 26, 28, 29, 30, 36, 37, 39, 40, 44, 47, 51, 16, 53, 59], only a few focus on analysis and development of efficient algorithms.

There is also vast literature on AP. We only mention the most relevant literature and refer the reader to the reviews [6, 25] for a more complete list of references. Von Neumann’s convergence theorem [49] for AP with two closed subspaces is extended to the setting of closed convex sets in [20, 10] and, starting with [33], the application of AP to the non-convex setting of phase retrieval has been extensively studied [31, 32, 7, 8, 44].

In [42] in particular, local convergence theorems were developed for AP for non-convex problems. However, the technical challenge in applying the theory in [42] to phase retrieval lies precisely in verifying the main assumption of linear regular intersection therein.

In contrast, in the present work, what guarantees the geometric convergence and gives an often sharp bound on the convergence rate is the spectral gap condition which can be readily verified under the uniqueness framework of [28] (see Propositions LABEL:cor5.2 and LABEL:prop4.8 below).

As pointed out above, there are more than one way of formulating phase retrieval, especially with two (or more) diffraction patterns, as a feasibility problem. While PAP is analogous to Cimmino’s approach to AP [21], SAP is closer in spirit to Kaczmarz’s [38]. Surprisingly, SAP performs significantly better than PAP in our simulations (Section LABEL:sec:num). In Sections LABEL:sec:loc and LABEL:sec:SAP we prove that both schemes are locally convergent to the true solution with bounds on rates of convergence. measurement local convergence for PAP was proved in [48].

Despite the theoretical appeal of a convex minimization approach to phase retrieval [12, 15, 14, 16], the tremendous increase in dimension results in impractically slow computation. Recently, new non-convex approaches become popular again because of their computational efficiency among other benefits [13, 47, 48].

One purpose of the present work is to compare these newer approaches with AP, arguably the simplest of all non-convex approaches. An important difference of the measurement schemes in these papers from ours is that their coded diffraction patterns are not oversampled. In this connection, we emphasize that reducing the number of coded diffraction patterns is crucial for the diffract-before-destruct approach and it is better to oversample than to increase the number of coded diffraction patterns. Another difference is that these newer iterative schemes such as the Wirtinger Flow (WF) [13] are not of the projective type. In Section LABEL:sec:num, we provide a detailed numerical comparison between AP of various forms and WF.

Recently we proved local convergence of the Douglas-Rachford (DR) algorithm for coded-aperture phase retrieval [19]. The present work extends the method of [19] to AP. In addition to convergence analysis of AP, we also characterize the limit points and the fixed points of AP in the present work.

More important, to compensate for the local nature of convergence we develop a novel procedure, the null initialization, for finding a sufficiently close initial guess. We prove that the null initialization with the Gaussian random measurement matrix asymptotically approaches the true object (Section LABEL:sec:null). The analogous result for coded diffraction patterns remains open. The null initialization is significantly different from the spectral initialization proposed in [48, 13, 11]. In Section LABEL:sec:Gaussian we give a theoretical comparison and in Section LABEL:sec:num a numerical comparison between these initialization methods. We will see that the initialization with the null initialization is more accurate than with the spectral initialization and SAP with the null initialization converges faster than the Fourier-domain Douglas-Rachford algorithm proposed in [19].

During the review process, the two references [51, 37] were brought to our attention by the referees.

Theorem 3.10 of [37] asserts global convergence to some critical point of a proximal-regularized alternating minimization formulation of (LABEL:feas) provided that the iterates are bounded (among other assumptions). However, neither (global or local) convergence to the true solution nor the geometric sense of convergence is established in [37]. In contrast, we prove that the AP iterates are always bounded, their accumulation points must be fixed points (Proposition LABEL:Cauchy) and the true solution is a stable fixed point. Moreover, any fixed point that shares the same 2-norm with the true object is the true object itself (Proposition LABEL:prop2.21).

On the other hand, Corollary 12 of [51] asserts the existence of a local basin of attraction of the feasible set (LABEL:feas) which includes AP in the one-pattern case and PAP in the two-pattern case (but not SAP). From this and the uniqueness theorem (Proposition LABEL:prop:unique) convergence to the true solution, up to a global phase factor, follows (i.e. a singleton with an arbitrary global phase factor). However, Corollary 12 of [51] asserts a sublinear power-law convergence with an unspecified power. In contrast, we prove a linear convergence and give a spectral gap bound on the convergence rate for AP, including SAP which is emphatically not covered by [51] and arguably the best performer among the tested algorithms.

The paper proceeds as follows. In Section LABEL:sec:null, we discuss the null initialization and prove global convergence to the true object of the null initialization for the complex Gaussian random measurement. In Section LABEL:sec:pap, we formulate AP of various forms and in Section LABEL:sec:fixed we discuss the limit points and the fixed points of AP. We prove local convergence to the true solution for the Parallel AP in Section LABEL:sec:loc and for the real-constraint AP in Section LABEL:sec:one-pattern. In Section LABEL:sec:SAP we prove local convergence for the Serial AP. In Section LABEL:sec:num, we present numerical experiments and compare our approach with the Wirtinger Flow and its truncated version [13, 11].

2 The null initialization

For a nonconvex minimization problem such as phase retrieval, the accuracy of the initialization as the estimate of the object has a great impact on the performance of any iterative schemes.

The following observation motivates our approach to effective initialization. Let be a subset of and its complement such that for all . In other words, are the “weaker” signals and the “stronger” signals. Let be the cardinality of the set . Then is a set of sensing vectors nearly orthogonal to if is sufficiently small (see Remark LABEL:rmk5.2). This suggests the following constrained least squares solution

may be a reasonable initialization. Note that is not uniquely defined as , with is also a null vector. Hence we should consider the global phase adjustment for a given null vector

In what follows, we assume to be optimally adjusted so that


We pause to emphasize that the constraint is introduced in order to simplify the error bound below (Theorem LABEL:Gaussian) and is completely irrelevant to initialization since the AP map (see (LABEL:papf) below for definition) is scaling-invariant in the sense that , for any . Also, in many imaging problems, the norm of the true object, like the constant phase factor, is either recoverable by other prior information or irrelevant to the quality of reconstruction.

Denote the sub-column matrices consisting of and by and , respectively, and, by reordering the row index, write

Define the dual vector


whose phase factor is optimally adjusted as .

2.1 Isometric

For isometric ,


We have

and hence


i.e. the null vector is self-dual in the case of isometric . Eq. (LABEL:51') can be used to construct the null vector from by the power method.

Let be the characteristic function of the complementary index with . The default choice for is the median value .

1 Random initialization:
2 Loop:
3 for  do
4       ;
6 end for
Output: .
Algorithm 1 The null initialization

For isometric , it is natural to define


where is the output of Algorithm 1. As shown in Section LABEL:sec:num (Fig. LABEL:fig:noise0), the null vector is remarkably stable with respect to noise in .

2.2 Non-isometric

When is non-isometric such as the standard Gaussian random matrix (see below), the power method is still applicable with the following modification.

For a full rank , let be the QR-decomposition of where is isometric and is a full-rank, upper-triangular square matrix. Let , and . Clearly, is the null vector for the isometric phase retrieval problem in the sense of (LABEL:2.3).

Let and be the index sets as above. Let



where is the optimal phase factor and

may be an unknown parameter in the non-isometric case. As pointed out above, when with an arbitrary parameter is used as initialization of phase retrieval, the first iteration of AP would recover the true value of as AP is totally independent of any real constant factor.

2.3 The spectral initialization

Here we compare the null initialization with the spectral initialization used in [13] and the truncated spectral initialization used in [11].

1 Random initialization:
2 Loop:
3 for  do
5       ;
6 end for
Output: .
Algorithm 2 The spectral initialization

The key difference between Algorithms 1 and 2 is the different weights used in step 4 where the null initialization uses and the spectral vector method uses (Algorithm 2). The truncated spectral initialization uses a still different weighting


where is the characteristic function of the set

with an adjustable parameter . Both of Algorithm 1 and of (LABEL:truncated_version) can be optimized by tracking and minimizing the residual .

As shown in the numerical experiments in Section LABEL:sec:num (Fig. LABEL:fig:initials and LABEL:fig:initials_2masks), the choice of weight significantly affects the quality of initialization, with the null initialization as the best performer (cf. Remark LABEL:rmk5.2).

Moreover, because the null initialization depends only on the choice of the index set and not explicitly on , the method is noise-tolerant and performs well with noisy data (Fig. LABEL:fig:noise0).

2.4 Gaussian random measurement

Although we are unable to provide a rigorous justification of the null initialization in the Fourier case, we shall do so for the complex Gaussian case , where the entries of are i.i.d. standard normal random variables. The following error bound is in terms of the closely related error metric


which has the advantage of being independent of the global phase factor.

Theorem 2.1

Let be an i.i.d. complex standard Gaussian matrix. Suppose


Then for any and the following error bound


holds with probability at least


where has the asymptotic upper bound


with an absolute constant .

Remark 2.2

To unpack the implications of Theorem LABEL:Gaussian, consider the following asymptotic: With and fixed, let

We have


with probability at least

for moderate constants .

To compare with the asymptotic regimes of [13] and [11] let us set to be a constant and with a sufficiently large constant . Then (LABEL:5.8.1) becomes


which is arbitrarily small with a sufficiently large constant , with probability close to 1 exponentially in .

In comparison, the performance guarantee for the spectral initialization ([13], Theorem 3.3) assumes for the same level of accuracy guarantee with a success probability less than . On the other hand, the performance guarantee for the truncated spectral vector is comparable to Theorem LABEL:Gaussian in the sense that error bound like (LABEL:5.8.2) holds true for the truncated spectral vector with and probability exponentially close to 1 ([11], Proposition 3).

We mention by passing that the initialization by Resampled Wirtinger Flow ([13], Theorem 5.1) requires in practice a large number of coded diffraction patterns and does not apply to the present set-up, so we do not consider it further.

The proof of Theorem LABEL:Gaussian is given in Appendix A.

3 Ap

First we introduce some notation and convention that are frequently used in the subsequent analysis.

The vector space is isomorphic to via the map


and endowed with the real inner product

We say that and are (real-)orthogonal to each other (denoted by ) iff . The same isomorphism exists between and .

Let and be the component-wise multiplication and division between two vectors , respectively. For any define the phase vector with where . When the phase can be assigned any value in . For simplicity, we set the default value whenever the denominator vanishes.

It is important to note that for the measurement schemes (LABEL:one) and (LABEL:two), the mask function by assumption is an array of independent, continuous random variables and so is . Therefore almost surely vanishes nowhere. However, we will develop the AP method without assuming this fact and without specifically appealing to the structure of the measurement schemes (LABEL:one) and (LABEL:two) unless stated otherwise.

Let be any matrix, and



As noted above, for our measurement schemes (LABEL:one) and (LABEL:two), almost surely.

In view of (LABEL:3'), the only possible hinderance to differentiability for is the sum-over- term. Indeed, we have the following result.

Proposition 3.1

The function is infinitely differentiable in the open set


In particular, for an isometric , is infinitely differentiable in the neighborhood of defined by


Proof. Observe that

and hence if . The proof is complete.     

Consider the smooth function


where and


We can write


which has many minimizers if for some . We select by convention the minimizer


Define the complex gradient


and consider the alternating minimization procedure


each of which is a least squares problem.

By (LABEL:3.6) and (LABEL:3.6'), the minimizer (LABEL:ERstep2) is given by



is the pseudo-inverse.

Eq. (LABEL:3.9) can be written as the fixed point iteration


In the one-pattern case, (LABEL:papf) is exactly Fienup’s Error Reduction algorithm [31].

The AP map (LABEL:papf) can be formulated as the projected gradient method [34, 41]. In the small neighborhood of where is smooth (Proposition LABEL:C2), we have


and hence


Where is not differentiable, (LABEL:subdiff) is an element of the subdifferential of . Therefore, the AP map (LABEL:papf) can be viewed as the generalization of the projected gradient method to the non-smooth setting.

The object domain formulation (LABEL:papf) is equivalent to the Fourier domain formulation (LABEL:fap) by the change of variables and letting

We shall study the following three versions of AP. The first is the Parallel AP (PAP)


to be applied to the two-pattern case. The second is the real-constrained AP (RAP)


to be applied to the one-pattern case.

The third is the Serial AP defined as follows. Following [29] in the spirit of Kaczmarz, we partition the measurement matrix and the data vector into parts and treat them sequentially.

Let be a partition of the measurement matrix and data, respectively, as


Let be written as . Instead of (LABEL:feas), we now formulate the phase retrieval problem as the following feasibility problem


As the projection onto the non-convex set is not explicitly known, we use the approximation instead


and consider the Serial AP (SAP) map


In contrast, the PAP map (LABEL:3.11)


is the sum of and . Note that is a fixed point of both and .

4 Fixed points

Next we study the fixed points of PAP and RAP. Our analysis does not extend to the case of SAP.

Following [29] we consider the the generalized AP (PAP) map




We call a fixed point of AP if there exists

satisfying (LABEL:fix2) and


[29]. In other words, the definition (LABEL:fixed) allows flexibility of phase where vanishes.

First we identify any limit point of the AP iterates with a fixed point of AP.

Proposition 4.1

The AP iterates with any starting point , where is given by (LABEL:3.11) or (LABEL:3.12), is bounded and every limit point is a fixed point of AP in the sense (LABEL:fix2)-(LABEL:fixed).

Proof. Due to (LABEL:5) , (LABEL:ERstep1) and (LABEL:ERstep2),


and hence AP yields a non-increasing sequence .

For an isometric ,




Now by the convex projection theorem (Prop. B.11 of [9]).


Setting in Eq. (LABEL:21) we have


Furthermore, the descent lemma (Proposition A.24,  [9]) yields


From Eq. (LABEL:11), Eq. (LABEL:20) and Eq. (LABEL:22), we have

As a nonnegative and non-increasing sequence, converges and then (LABEL:3.23) implies


By the definition of and the isometry of , we have

and hence is bounded. Let be a convergent subsequence and its limit. Eq. (LABEL:23) implies that

If vanishes nowhere, then is continuous at . Passing to the limit in we get . Namely, is a fixed point of .

Suppose for some . By the compactness of the unit circle and further selecting a subsequence, still denoted by , we have

for some satisfying (LABEL:fix2). Now passing to the limit in we have

implying that is a fixed point of AP.     

Since the true object is unknown, the following norm criterion is useful for distinguishing the phase retrieval solutions from the non-solutions among many coexisting fixed points.

Proposition 4.2

Let be the AP map (LABEL:papf) with isometric . If a fixed point of AP in the sense (LABEL:fix2)-(LABEL:fixed) satisfies , then is a phase retrieval solution almost surely. On the other hand, if is not a phase retrieval solution, then .

Remark 4.3

If the isometric is specifically given by (LABEL:two) or (LABEL:one), then we can identify any fixed point satisfying the norm criterion with the unique phase retrieval solution in view of Proposition LABEL:prop:unique.

Proof. By the convex projection theorem (Prop. B.11 of [9])


where the equality holds if and only if . Hence

Clearly holds if and only if both inequalities in Eq. (LABEL:317) are equalities. The second inequality is an equality only when