Algorithms and error bounds for noisy phase retrievalwith low-redundancy frames

Algorithms and error bounds for noisy phase retrieval with low-redundancy frames

Bernhard G. Bodmann  and Nathaniel Hammen
651 Philip G. Hoffman Hall, Department of Mathematics, University of Houston
Houston, TX 77204-3008
Email: bgb@math.uh.edu, Tel.: 713 743 3581
Abstract

The main objective of this paper is to find algorithms accompanied by explicit error bounds for phase retrieval from noisy magnitudes of frame coefficients when the underlying frame has a low redundancy. We achieve these goals with frames consisting of vectors spanning a -dimensional complex Hilbert space. The two algorithms we use, phase propagation or the kernel method, are polynomial time in the dimension . To ensure a successful approximate recovery, we assume that the noise is sufficiently small compared to the squared norm of the vector to be recovered. In this regime, the error bound is inverse proportional to the signal-to-noise ratio. Upper and lower bounds on the sample values of trigonometric polynomials are a central technique in our error estimates.

1 Introduction

When optical, electromagnetic or acoustic signals are measured, often the measurement apparatus records an intensity, the magnitude of the signal amplitude, while discarding phase information. This is the case for X-ray crystallography [Patterson, Millane_90], many optical and acoustic systems [Walther, Rabiner], and also an intrinsic feature of quantum measurements [Finkelstein_QuantumCom04, Gross_2014]. Phase retrieval is the procedure of determining missing phase information from suitably chosen intensity measurements, possibly with the use of additional signal characteristics [Oppenheim_Phase_80, Fienup, Walther, Heinosaari_QuantumTom_13, MV13, Marchesini_AltProj14]. Many of these instances of phase retrieval are related to the Fourier transform [Akutowicz1956, Akutowicz1957, Fienup_93], but it is also of interest to study this problem from an abstract point of view, using the magnitudes of any linear measurements to recover the missing information. Next to infinite dimensional signal models [PYB_JFAA14], the finite dimensional case has received considerable attention in the past years [Balan_RecWithoutPhase_06, Balan_Painless_09, Bandeira_4NConj, CandesEldar_PhaseRetrieval, Candes_PhaseLift, Demanet_PhaselessLinMeas13, CandesLi_FCM13, Waldspurger_PR14]. In this case, the signals are vectors in a finite dimensional Hilbert space and one chooses a frame to obtain for each the magnitudes of the inner products with the frame vectors, . When recovering signals, we allow for a remaining undetermined global phase factor, meaning we identify vectors in the Hilbert space that differ by a unimodular factor , , in the real or complex complex case. Accordingly, we associate the equivalence class with a representative and consider the quotient space as the domain of the magnitude measurement map . The map is well defined, because does not depend on the choice of . The metric on relevant for the accuracy of signal recovery is the quotient metric , which assigns to elements and with representatives the distance .

The case of signals in real Hilbert spaces is now fairly well understood [Balan_RecWithoutPhase_06, BCE07, Bandeira_4NConj], while complex signals still pose many open problems. When the number of measured magnitudes is allowed to grow at a sufficient rate, then techniques from low-rank matrix completion are applicable to phase retrieval [CandesEldar_PhaseRetrieval, Candes_PhaseLift, Demanet_PhaselessLinMeas13, CandesLi_FCM13, Waldspurger_PR14], providing stable recovery from noisy measurements. Other methods also achieve stability by a method that locally patches the phase information together [Alexeev_PhaseRetrieval13, Yang_SampTA13, PYB_JFAA14]. Recently, it was shown that for a vector in a complex -dimensional Hilbert space, a generic choice of linear measurements is sufficient to recover the vector up to a unimodular factor from the magnitudes [Conca_Algebraic13], complementing an earlier result on a deterministic choice of vectors [BodmannHammen]. Nevertheless, fully quantitative stability estimates were missing in this case of lowest redundancy known to be sufficient for recovery.

A main objective of this paper is to find frames for the -dimensional complex Hilbert space such that is small and the magnitude measurement map is injective on with explicit error bounds for the approximate recovery when the magnitude measurements are affected by noise. More precisely, we find a left inverse of which extends to a neighborhood of the range of and is Lipschitz continuous for all input signals whose signal-to-noise ratio is sufficiently large. We show that the recovery is implemented with an explicit algorithm that restores the signal from measurements to a given accuracy in a number of operations that is polynomial in the dimension of the Hilbert space. The algorithm can be chosen to be either phase propagation or what we call the kernel method, a special case of semidefinite programming. The smallest number of frame vectors for which we could provide an algorithm with explicit error bounds is , as presented here.

To formulate the main result, it is convenient to take the Hilbert space as a space of polynomials of maximal degree equipped with the standard inner product, see Section 2.2 for details. With this choice of Hilbert space, the magnitude measurements we use are expressed in terms of point evaluations. We let denote the primitive -st root of unity and the primitive -th root of unity. For a polynomial , the noiseless magnitude measurements are

The noisy magnitude measurements are obtained from perturbing the noiseless magnitudes with a vector , , .

Our main theorem states that for all measurement errors with a sufficiently small maximum noise component , the noisy magnitude measurements determine an approximate reconstruction of with an accuracy . To state the theorem precisely involves several auxiliary quantities that all depend solely on the dimension . We let and choose a slack variable as well as .

Theorem 1.1.

Let and be as above. For any nonzero analytic polynomial and with , an approximation can be constructed from the perturbed magnitude measurements , such that if then the recovery error is bounded by

 ρ([p],[~p])≤⎛⎜ ⎜ ⎜ ⎜⎝2+√2β2(1−α)d−d~C−1+~Cd1−~C√d+1−~Cd2β√1√d(1−α)⎞⎟ ⎟ ⎟ ⎟⎠d(2d−1)(1−~C)∥ϵ∥∞∥p∥2.

The proof and the construction of approximate recovery proceeds in several steps:

Step 1.

First, we augment the finite number of magnitude measurements to an infinite family of such measurements. To this end, the Dirichlet Kernel is used to interpolate the perturbed measurements to functions on the entire unit circle. In the noiseless case, the magnitude measurements determine the values , , and for each , because these are trigonometric polynomials of degree at most . In the noisy case, the interpolation using values from , yields trigonometric polynomials that differ from the unperturbed ones by at most , uniformly on the unit circle.

Step 2.

We select a suitable set of non-zero magnitude measurements from the infinite family. A lemma will show that there exists a on the unit circle such that the distance between any element of and any roots of any non-zero truncation of the polynomial is at least . The reason why we need to consider all non-zero truncations of the polynomial is that the influence of the noise prevents us from determining the true degree of . However, when the coefficients of leading powers are sufficiently small compared to the noise, we can replace with a truncated polynomial without losing the order of approximation accuracy. As a consequence, we show that for this , with some that only depends on the dimension and the norm of . Thus, if the noise is sufficiently small compared to the norm of the vector, then there is a similar lower bound on the real trigonometric polynomials that interpolate the noisy magnitude measurements.

Step 3.

In the last step, the reconstruction evaluates the trigonometric approximations at the sample points and recovers an approximation to the equivalence class . It is essential for this step that the sample values are bounded away from zero in order to achieve a unique reconstruction. There are two algorithms considered for this, phase propagation, which recovers the phase iteratively using the phase relation between sample points, and the kernel method, which computes a vector in the kernel of a matrix determined by the magnitude measurements. The error bound is first derived for phase propagation and then related to that of the kernel method. Both algorithms are known to be polynomial time, either from the explicit description, or from results in numerical analysis [GuEisenstat].

The nature of the main result has also been observed in simulations; assuming an a priori bound on the magnitude of the noise results in a worst-case recovery error that grows at most inverse proportional to the signal-to-noise ratio. Outside of this regime, the error is not controlled in a linear fashion. To illustrate this, we include two plots of the typical behavior for the recovery error for . The range of the plots is chosen to show the behavior of the worst-case error in the linear regime and also for errors where this linear behavior breaks down.

We tested the algorithm on more than 4.5 million randomly generated polynomials with norm 1. When errors were graphed for a fixed polynomial, the linear bound for the worst-case error was confirmed, although the observed errors were many orders of magnitude less than the error bound given in this paper. A small number of polynomials we found exhibited a max-min value that is an order of magnitude smaller than that of all the other randomly generated polynomials. We chose the polynomial with the worst max-min value out of the 4.5 million that had been tried, and applied a random walk to its coefficients, with steps of decreasing size that were accepted only if the max-min value decreased. The random walk terminated at a polynomial which provided an error bound that is an order of magnitude worse than any other polynomials that had been tested before. This numerically found, local worst-case polynomial is given by . The accuracy of the coefficients displayed here is sufficient to reproduce the results initially obtained with floating point coefficients of double precision. The errors resulting for this polynomial in the linear and transition regimes are shown in Figures 1 and 2.

2 Noiseless recovery

It is instructive to follow the construction of the magnitude measurements and the recovery strategy in the absence of noise.

2.1 Recovery algorithms for full vectors

To motivate and prepare the recovery strategy, we compare two recovery methods, phase propagation and the kernel method, a simple form of semidefinite programming, in the absence of noise and under additional non-orthogonality conditions on the input vector.

Definition 2.1.

If is a basis for , and such that for all from to , then we call full with respect to .

We recall a well known result concerning recovery of full vectors [Finkelstein_QuantumCom04, Flammia_PureStates05]. Let be full with respect to an orthonormal basis . For any from to , we define the measurement vector as

 fj=⎧⎪ ⎪⎨⎪ ⎪⎩ejif 1≤j≤dej−d−ej−d+1if d+1≤j≤2d−1ej−(2d−1)−iej−(2d−1)+1if 2d≤j≤3d−2

The set of measurement vectors is a frame for because it contains a basis. Define the magnitude measurement map by .

Recovery of full vectors with measurements has been shown in [Finkelstein_QuantumCom04], and was proven to be minimal in [Flammia_PureStates05]. We show recovery of full vectors with measurements using the measurement map with two different recovery methods. The first is called phase propagation, the second is a special case of semidefinite programming.

2.1.1 Phase propagation

The phase propagation method sequentially recovers the components of the vector, similar to the approach outlined in [Balan_Painless_09], see also [Yang_SampTA13, Pohl_ICASSP14, PYB_JFAA14].

Proposition 2.2.

For any vector , if is an orthonormal basis with respect to which is full, then the vector may be obtained by induction on the components of , using the values of .

Proof.

Without loss of generality, we assume that is the standard basis, drop the subscript from and abbreviate the components of the vector by for each , and similarly for . To initialize, we let so that .

For the th inductive step with , we assume that we have constructed with the given information. We then let

 yk+1=12A(x)k((1−i)A(x)k+(1−i)A(x)k+1−A(x)k+d+iA(x)k+2d+1)yk.

Inserting the values for the magnitude measurements and by a fact similar to the polarization identity,

 yk+1= 12|xk|2((1−i)|xk|2+(1−i)|xk+1|2−|xk−xk+1|2+i|xk−ixk+1|2)yk = ¯¯¯¯¯xkxk+1|xk|2yk=ykxk+1xk=¯¯¯¯¯x1|x1|xkxk+1xk=¯¯¯¯¯x1|x1|xk+1.

Iterating this, we obtain . ∎

2.1.2 Kernel method

Recovery by the kernel method minimizes the values of a quadratic form subject to a norm constraint, or equivalently, computes an extremal eigenvector for an operator associated with the quadratic form. The operator we use is , with

 Tx=d−1∑j=1(|⟨x,ej⟩|2ej⊗e∗j+1−⟨ej,x⟩⟨x,ej+1⟩ej⊗e∗j)

where each denotes the linear functional which is associated with the basis vector . The operator is indeed determined by the magnitude measurements. In particular, the second term in the series is computed via the polarization-like identity as in the proof of the preceding theorem,

 2⟨ej,x⟩⟨x,ej+1⟩=(1−i)A{ej}(x)k+(1−i)A{ej}(x)k+1−A{ej}(x)k+d+iA{ej}(x)k+2d+1

for any integer from to .

By construction, the rank of is at most equal to , because the range of is in the span of . In the next theorem, we show that indeed the kernel of , or equivalently, the kernel of , is one dimensional, consisting of all multiples of .

Proposition 2.3.

For any vector , if is a basis with respect to which is full, then the null space of the operator is given by all complex multiples of .

Proof.

As in the preceding proof, we let denote the standard basis. Thus, using the measurements provided, we may obtain the quantity . With respect to the basis , let be the left shift operator where we extend the vector with the convention . We also define the multiplication operator by the map , where again by convention we let . Similarly, we define the multiplication operator by the map . Note that is invertible if and only if for all from to , which is true by assumption. In terms of these operators, the operator is expressed as . Then for any and ,

 Tx(cx)= M|x|2S(cx)−M¯¯¯xSx(cx) = M|x|2((cxj+1)dj=1)−M¯¯¯xSx((cxj)dj=1) = (|xj|2cxj+1)dj=1−(¯¯¯¯¯xjxj+1cxj)dj=1 = 0

so any complex multiple of is in the null space of this operator.

Conversely, assume that is in the null space of . We use an inductive argument to show that for any from to , . The base case is trivial. For the inductive step, note that for any from to ,

Thus, , and because and , we obtain

 yj+1xj+1=1xj+1¯¯¯¯¯xjxj+1|xj|2yj=yjxj.

We conclude that for any from to , , so the vector is a complex multiple of . ∎

Since the frame vectors used for the magnitude measurements contain an orthonormal basis, determines the norm of . This is sufficient to recover .

Corollary 2.4.

If the vector is full with respect to the orthonormal basis , then the equivalence class is the solution of the problem

 argmin{∥Txy∥2:y∈H,∥y∥2=d∑i=1A(x)i}.

Because the solution to the phase retrieval problem is obtained from the kernel of the linear operator , or equivalently of , we may use methods from numerical linear algebra such as a rank-revealing QR factorization [GuEisenstat] to recover the equivalence class .

2.2 Augmentation and subselection of magnitude measurements

One of the main tools for the recovery procedure is that an entire family of magnitude measurements is determined from the initial choice. We call this an augmentation of the measured values. From this family a suitable subset is chosen which corresponds to a measurement of the form related to an orthonormal basis as explained in the previous section.

To describe the augmentation procedure we represent the -dimensional vector to be recovered as an element of , the space of complex analytic polynomials of degree at most on the unit circle. This space is used to represent the vector because is a reproducing kernel Hilbert space, and the magnitude squared of any element of is an element of , the space of trigonometric polynomials of degree at most on the unit circle, which is itself a reproducing kernel Hilbert space. The space is equipped with the scaled inner product on the unit circle such that for any ,

 ⟨p,q⟩=12π∫[0,2π]p(eit)¯¯¯¯¯¯¯¯¯¯¯¯q(eit)dt.

For any in , let be the vector of coefficients of . Then by orthogonality, the norm induced by the inner product satisfies

 ∥p∥2=12π∫[0,2π]p(eit)¯¯¯¯¯¯¯¯¯¯¯¯p(eit)dt=12π∫[0,2π]d−1∑j=0|cj|2dt=∥c∥22.

If is defined such that , then for any and any on the unit circle,

 p(z0)=d−1∑j=0cjzj0=⟨p,Kz0⟩.

Thus, these polynomials correspond to point evaluations, and linear combinations of these polynomials may be used as measurement vectors for the recovery procedure. If is the -st root of unity and is the -th root of unity, then for any from to , we define the measurement vector as

 ηj=⎧⎪⎨⎪⎩Kωjif 1≤j≤2d−1Kωj−Kωjνif 2d≤j≤4d−2Kωj−iKωjνif 4d−1≤j≤6d−3

Then the magnitude measurement map defined in the Introduction satisfies .

The measurements are grouped into three subsets, corresponding to magnitudes of point evaluations, magnitudes of differences, and magnitudes of differences between complex multiples of point values. Each of these subsets can be interpolated to a family of measurements from which suitable representatives are chosen. In order to simplify the recovery, we recall that for any with and , and are orthogonal because the series given by the inner product simply sums all the -th roots of unity.

Theorem 2.5.

For any polynomial , the measurements determine the values of with such that is an orthonormal basis with respect to which is full.

Proof.

Let be the normalized Dirichlet kernel of degree , so that for any in the unit circle . Then the set of functions is orthonormal with respect to the inner product on the unit circle, and any can be interpolated as . Note that if , then . Thus, each of the functions , , and , are in , and using the Dirichlet Kernel these functions may be interpolated from the values of . So the values of each of these functions are known at all points on the unit circle, not just the points that were measured explicitly.

Let be the zeros of the polynomial . Then the set has finitely many elements, so we may choose a point on the unit circle such that . Then for all from to , . Additionally, the set is an orthonormal basis for , so is a full vector with respect to this basis and the values of , , and at the points correspond to the measurements . ∎

Because the recovery procedure for full vectors has been established in Section 2.1, the phase-retrieval problem may now be solved using either procedure outlined there to obtain the equivalence class .

Corollary 2.6.

For any polynomial , the measurements determine .

Moreover, the number of points that need to be tested in order to find such that for all is at most , quadratic in . This means, in the noiseless case either of the two equivalent methods for recovery requires a number of steps that is polynomial in the dimension .

3 Recovery from noisy measurements

For the purpose of recovery from noisy measurements, we consider the perturbed magnitude measurement map by .

3.1 A max-min principle for magnitude samples

In the presence of noise, choosing a basis such that is full with respect to that basis is not sufficient to establish a bound on the error of the recovered polynomial. A lower bound on the magnitude of the entries of in a such a basis is needed. Once such a lower bound is obtained, recovery may proceed as in the noiseless case, by reduction of the phase retrieval problem to full vector recovery, with quantitative bounds on stability added for each step. Obtaining the needed lower bound on the magnitude requires a few lemmas and definitions.

Lemma 3.1.

If is represented as , then for any , there exists at least one between and such that .

Proof.

By way of contradiction, let for all from to . Then

 ∥a∥1=d−1∑j=0|aj|

This is a contradiction, so the claim holds. ∎

Definition 3.2.

If such that then we define for the -th truncation of to be the polynomial such that .

According to this definition, the -th truncation of a polynomial is the polynomial itself. We also note that even if a polynomial is nonzero, it may have truncations that are zero polynomials.

To obtain a lower bound on the entries of a full vector originating from a polynomial, a lower bound on the distance between any basis element and any roots of any nonzero truncations of the base polynomial is needed.

Lemma 3.3.

For any polynomial , there exists a on the unit circle such that the linear distance between any element of and any roots of any nonzero truncations of is at least .

Proof.

For any from to , let be the number of distinct roots of the -th truncation of if that truncation is nonzero, and let if the -th truncation of is a zero polynomial. Then the number of distinct roots of all nonzero truncations of is

 N≤d∑n=1Nn≤d∑n=1(n−1)=(d−1)d2

Then for the set , we know . If the elements of are ordered by their angle around the unit circle, then the average angle between adjacent elements is , and so there is at least one pair of adjacent elements that is separated by at least this amount. Thus, if we let be the midpoint between these two maximally separated elements on the unit circle, then the angle between and any element of is at least . Thus the linear distance between and any element of the set is at least . ∎

With a lower bound on the distance, a lower bound on the minimum magnitude of any component of the reduced vector can be obtained.

Lemma 3.4.

Let . For any polynomial , if there exists a on the unit circle such that the linear distance between any element of and any zeros of any nonzero truncations of is at least , then for all from to

 |p(νjz0)|≥r(d−1)d2(d−12d)d2d−1(∏d−1k=0(rk+1))∥^p∥1,

where , .

Proof.

Let be the smallest obtained by applying Lemma 3.1 to and . Then if we let , so that , we have

and for all

 |cj|<∥^p∥1(d−12d)d−j((d−12d)−1−1)=∥^p∥1(d−12d)d−jd+1d−1.

Let

We prove, by induction on from to , that for the -th truncation , and for all from to .

For the base case , we know that

 |cn0|−n0−1∑j=0|cj|≥ ∥^p∥1((d−12d)d−n0−n0−1∑j=0(d−12d)d−j)d+1d−1 = ∥^p∥1⎛⎝(d−12d)d−n0−d∑l=d−n0+1(d−12d)l⎞⎠d+1d−1 ≥ ∥^p∥1⎛⎜ ⎜⎝(d−12d)d−n0−(d−12d)d−n0+11−(d−12d)⎞⎟ ⎟⎠d+1d−1 = ∥^p∥1(d−12d)d−n02d−1

and equality only holds if . Then and so

 |pn0(νjz0)|≥ |cn0|−n0−1∑j=0|cj| ≥ ∥^p∥1(d−12d)d−n02d−1 ≥ r(n0−1)n02∥^p∥1(d−12d)d−n02d−1(∏n0−1k=n0(rk+1)) = m(n0,n0).

For the inductive step, assume that we have proven that . Then we choose a threshold . If the leading coefficient of satisfies , then is clearly a nonzero truncation of , so by using the factored form of , for all from to ,

 |pn+1(νjz0)|≥|cn|rn>τnrn=m(n0,n)rnrn+1.

Otherwise, if the leading coefficient satisfies , then for all from to

 |pn+1(νjz0)|≥m(n0,n)−|cn|≥m(n0,n)−τn=m(n0,n)−m(n0,n)rn+1=m(n0,n)rnrn+1.

Either way, for all from to ,

 |pn+1(νjz0)|≥m(n0,n)rnrn+1=r(n−1)n2∥^p∥1(d−12d)d−n02d−1rn⎛⎝n∏k=n0(rk+1)⎞⎠=m(n0,n+1).

Thus, for all from to ,

 |p(νjz0)|=|pd(νjz0)|≥m(n0,d)=r(d−1)d2∥^p∥1(d−12d)d−n02d−1(∏d−1k=n0(rk+1))≥r(d−1)d2∥^p∥1(d−12d)d2d−1(∏d−1k=0(rk+1))

Using the from Lemma 3.3 in the bound in the equation in Lemma 3.4 gives us the desired lower bound. Note that the bound in Lemma 3.3 was obtained by showing a worst case of equally spaced roots and the bound in Lemma 3.4 was obtained by showing a worst case of roots that are bunched together. Thus, the lower bound on the minimum magnitude obtained by combining Lemma 3.3 and Lemma 3.4 will not be achieved for any polynomial of degree greater than and is thus not the greatest lower bound for higher dimensions.

3.2 Recovery algorithms for full vectors in the presence of noise

We also need to be able to estimate the error in the full vector recovery procedure if we know a lower bound for the magnitude measurements.

Lemma 3.5.

Let . For any vectors and , if is an orthonormal basis such that for all from to , , and , then a vector may be obtained such that for all from to ,

 ∣∣∣yk−¯¯¯¯¯x1|x1|xk∣∣∣≤(2+√2m1−Ck−11−C+Ck−12√m)∥ϵ∥∞∥x∥∞

by using the values of