Measurement Matrix Design for Phase Retrieval Based on Mutual Information
In phase retrieval problems, a signal of interest (SOI) is reconstructed based on the magnitude of a linear transformation of the SOI observed with additive noise. The linear transform is typically referred to as a measurement matrix. Many works on phase retrieval assume that the measurement matrix is a random Gaussian matrix, which, in the noiseless scenario with sufficiently many measurements, guarantees invertability of the transformation between the SOI and the observations, up to an inherent phase ambiguity. However, in many practical applications, the measurement matrix corresponds to an underlying physical setup, and is therefore deterministic, possibly with structural constraints. In this work we study the design of deterministic measurement matrices, based on maximizing the mutual information between the SOI and the observations. We characterize necessary conditions for the optimality of a measurement matrix, and analytically obtain the optimal matrix in the low signal-to-noise ratio regime. Practical methods for designing general measurement matrices and masked Fourier measurements are proposed. Simulation tests demonstrate the performance gain achieved by the suggested techniques compared to random Gaussian measurements for various phase recovery algorithms.
In a wide range of practical scenarios, including X-ray crystallography , diffraction imaging , astronomical imaging , and microscopy , a signal of interest (SOI) needs to be reconstructed from observations which consist of the magnitudes of its linear transformation with additive noise. This class of signal recovery problems is commonly referred to as phase retrieval . In a typical phase retrieval setup, the SOI is first projected using a measurement matrix specifically designed for the considered setup. The observations are then obtained as noisy versions of the magnitudes of these projections. Recovery algorithms for phase retrieval received much research attention in recent years. Major approaches for designing phase retrieval algorithms include alternating minimization techniques [6, 7], methods based on convex relaxation, such as phaselift  and phasecut , and non-convex algorithms with a suitable initialization, such as Wirtinger flow , and truncated amplitude flow (TAF) .
The problem of designing the measurement matrix received considerably less attention compared to the design of phase retrieval algorithms. An important desirable property that measurement matrices should satisfy is a unique relationship between the signal and the magnitudes of its projections, up to an inherent phase ambiguity. In many works, particularly in theoretical performance analysis of phase retrieval algorithms [8, 10, 12], the matrices are assumed to be random, commonly with i.i.d. Gaussian entries. However, in practical applications, the measurement matrix corresponds to a fixed physical setup, so that it is typically a deterministic matrix, with possibly structural constraints. For example, in optical imaging, lenses are modeled using discrete Fourier transform (DFT) matrices and optical masks correspond to diagonal matrices . Measurements based on oversampled DFT matrices were studied in , measurement matrices which correspond to the parallel application of several DFTs to modulated versions of the SOI were proposed in , and  studied phase recovery using fixed binary measurement matrices, representing hardware limitations in optical imaging systems.
All the works above considered noiseless observations, hence, the focus was on obtaining uniqueness of the magnitudes of the projections in order to guarantee recovery, though the recovery method may be intractable . When noise is present, such uniqueness no longer guarantees recovery, thus a different design criterion should be considered. Recovery algorithms as well as specialized deterministic measurement matrices were considered in several works. In particular, [17, 18] studied phase recovery from short-time Fourier transform measurements,  proposed a recovery algorithm and measurement matrix design based on sparse graph codes for sparse SOIs taking values on a finite set,  suggested an algorithm using correlation based measurements for flat SOIs, i.e., strictly non-sparse SOIs, and  studied recovery methods and the corresponding measurement matrix design for the noisy phase retrieval setup by representing the projections as complex polynomials.
A natural optimality condition for the noisy setup, without focusing on a specific recovery algorithm, is to design the measurement matrix to minimize the achievable mean-squared error (MSE) in estimating the SOI from the observations. However, in phase retrieval, the SOI and observations are not jointly Gaussian, which makes computing the minimum MSE (MMSE) for a given measurement matrix in the vector setting very difficult. Furthermore, even in the linear non-Gaussian setting, a closed-form expression for the derivative of the MMSE exists only for the scalar case , which corresponds to a single observation. Therefore, gradient-based approaches for MMSE optimization are difficult to apply as well.
In this work we propose an alternative design criterion for the measurement matrix based on maximizing the mutual information (MI) between the observations and the SOI. MI is a statistical measure which quantifies the “amount of information” that one random variable (RV) “contains” about another RV [23, Ch. 2.3]. Thus, maximizing the MI essentially maximizes the statistical dependence between the observations and the SOI, which is desirable in recovery problems. MI is also related to MMSE estimation in Gaussian noise via its derivative , and has been used as the design criterion in several problems, including the design of projection matrices in compressed sensing  and the construction of radar waveforms [26, 27].
In order to rigorously express the MI between the observations and the SOI, we adopt a Bayesian framework for the phase retrieval setup, similar to the approach in . Computing the MI between the observations and the SOI is a difficult task. Therefore, to facilitate the analysis, we first restate the phase retrieval setup as a linear multiple input-multiple output (MIMO) channel of extended dimensions with an additive Gaussian noise. In the resulting MIMO setup, the channel matrix is given by the row-wise Khatri-Rao product (KRP)  of the measurement matrix and its conjugate, while the channel input is the Kronecker product of the SOI and its conjugate, and is thus non-Gaussian for any SOI distribution. We show that the MI between the observations and the SOI of the original phase retrieval problem is equal to the MI between the input and the output of this MIMO channel. Then, we use that fact that for MIMO channels with additive Gaussian noise, the gradient of the MI can be obtained in closed-form  for any arbitrary input distribution. We note that a similar derivation cannot be carried out with the MMSE design criterion since: Differently from the MI, the MMSE for the estimation of the SOI based on the original observations is not equal to the MMSE for the estimation of the MIMO channel input based on the output; For the MIMO setup, a closed-form expression for the gradient of the MMSE exists only when the input is Gaussian, yet, the input is non-Gaussian for any SOI distribution due its Kronecker product structure.
Using the equivalent MIMO channel with non-Gaussian input, we derive necessary conditions on the measurement matrix to maximize the MI. We then obtain a closed-form expression for the optimal measurement matrix in the low signal-to-noise ratio (SNR) regime when the SOI distribution satisfies a symmetry property, we refer to as Kronecker symmetry, exhibited by, e.g., the zero-mean proper-complex (PC) Gaussian distribution. Next, we propose a practical measurement matrix design by approximating the matrix which maximizes the MI for any arbitrary SNR. In our approach, we first maximize the MI of a MIMO channel, derived from the phase retrieval setup, after relaxing the structure restrictions on the channel matrix imposed by the phase retrieval problem. We then find the measurement matrix for which the resulting MIMO channel matrix (i.e., the channel matrix which satisfies the row-wise KRP structure) is closest to the MI maximizing channel matrix obtained without the structure restriction. With this approach, we obtain closed-form expressions for general (i.e., structureless) measurement matrices, as well as for constrained settings corresponding to masked Fourier matrices, representing, e.g., optical lenses and masks. The substantial benefits of the proposed design framework are clearly illustrated in a simulations study. In particular, we show that our suggested practical design improves the performance of various recovery algorithms compared to using random measurement matrices.
The rest of this paper is organized as follows: Section II formulates the problem. Section III characterizes necessary conditions on the measurement matrix which maximizes the MI, and studies its design in the low SNR regime. Section IV presents the proposed approach for designing practical measurement matrices, and Section V illustrates the performance of our design in simulation examples. Finally, Section VI concludes the paper. Proofs of the results stated in the paper are provided in the appendix.
Ii Problem Formulation
We use upper-case letters to denote RVs, e.g., , lower-case letters for deterministic variables, e.g., , and calligraphic letters to denote sets, e.g., . We denote column vectors with boldface letters, e.g., for a deterministic vector and for a random vector; the -th element of is written as . Matrices are represented by double-stroke letters, e.g., , is the -th element of , and is the identity matrix. Hermitian transpose, transpose, complex conjugate, real part, imaginary part, stochastic expectation, and MI are denoted by , , , , , , and , respectively. denotes the trace operator, is the Euclidean norm when applied to vectors and the Frobenius norm when applied to matrices, denotes the Kronecker product, is the Kronecker delta function, i.e., when and otherwise, and . For an vector , is the diagonal matrix whose diagonal entries are the elements of , i.e., . The sets of real and of complex numbers are denoted by and , respectively. Finally, for an matrix , is the column vector obtained by stacking the columns of one below the other. The matrix is recovered from via .
Ii-B The Phase Retrieval Setup
We consider the recovery of a random SOI , from an observation vector . Let be the measurement matrix and be the additive noise, modeled as a zero-mean real-valued Gaussian vector with covariance matrix , . As in [12, Eq. (1.5)], [14, Eq. (1)], and [16, Eq. (1.1)], the relationship between and is given by:
where denotes the element-wise squared magnitude. Since for every , the vectors and result in the same , the vector can be recovered only up to a global phase.
In this work we study the design of aimed at maximizing the MI between the SOI and the observations. Letting be the joint probability density function (PDF) of and , the PDF of , and the PDF of , the MI between the SOI and the observations is given by [23, Ch. 8.5]
Specifically, we study the measurement matrix which maximizes111The optimal matrix is not unique since, for example, for any real , the matrices and result in the same MI . the MI for a fixed arbitrary distribution of , subject to a Frobenious norm constraint , namely,
where and are related via (1). In the noiseless non-Bayesian phase retrieval setup, it has been shown that a necessary and sufficient condition for the existence of a bijective mapping from to is that the number of observations, , is linearly related to the dimensions of the SOI222Specifically, was shown to be sufficient and was shown to be necessary., , see [31, 32]. Therefore, we focus on values of satisfying .
As discussed in the introduction, in practical scenarios, the structure of the measurement matrix is often constrained. One type of structural constraint commonly encountered in practice is the masked Fourier structure, which arises, for example, when the measurement matrix represents an optical setup consisting of lenses and masks [13, 19]. In this case, is obtained by projecting via optical masks, each modeled as an diagonal matrix , , followed by an optical lens, modeled as a DFT matrix of size , denoted [19, Sec. 3]. Consequently, and is obtained as
Since , we focus on . In the following sections we study the optimal design of general (unconstrained) measurement matrices, and propose a practical algorithm for designing both general measurement matrices as well as masked Fourier measurement matrices.
Iii Optimal Measurement Matrix
In this section we first show that the relationship (1) can be equivalently represented (in the sense of having the same MI) as a MIMO channel with PC Gaussian noise. Then, we use the equivalent representation to study the design of measurement matrices for two cases: The first considers an arbitrary SOI distribution, for which we characterize a necessary condition on the optimal measurement matrix. The second case treats an SOI distribution satisfying a symmetry property (exhibited by, e.g., zero-mean PC Gaussian distributions) focusing on the low SNR regime, for which we obtain the optimal measurement matrix in closed-form.
Iii-a Gaussian Mimo Channel Interpretation
Next, define , and the matrix such that
We note that the transformation from to is bijective333The transformation from to is bijective up to a global phase. However, the global phase can be set to an arbitrary value, as (1) is not affected by this global phase. Therefore, bijection up to a global phase is sufficient for establishing equivalence of the two representations in the present setup., since can be obtained from the singular value decomposition (SVD) of the rank one matrix [33, Ch. 2.4]. We also note that corresponds to the row-wise KRP of and [33, Ch. 12.3], namely, the rows of are obtained as the Kronecker product of the corresponding rows of and . Defining to be the selection matrix such that , we can write as [29, Sec. 2.2]
The relationship (7) formulates the phase retrieval setup as a MIMO channel with complex channel input , complex channel matrix , real additive Gaussian noise , and real channel output . We note that is non-Gaussian for any distribution of , since, e.g., is non-negative. In order to identify the measurement matrix which maximizes the MI, we wish to apply the gradient of the MI with respect to the measurement matrix, stated in [30, Thm. 1]. To facilitate this application, we next formulate the phase retrieval setup as a complex MIMO channel with additive PC Gaussian noise. To that aim, let be a random vector, distributed identically to and independent of both and , and also let . The relationship between and corresponds to a complex MIMO channel with additive zero-mean PC Gaussian noise, , with covariance matrix :
As the mapping from to is bijective, it follows from [23, Corollary after Eq. (2.121)] that
The MIMO channel interpretation represents the non-linear phase retrieval setup (1) as a linear problem (9) without modifying the MI. This presents an advantage of using MI as a design criterion over the MMSE, as, unlike MI, MMSE is not invariant to the linear representation, i.e., the error covariance matrices of the MMSE estimator of from and of the MMSE estimator of from are in general not the same.
Iii-B Conditions on for Arbitrary Soi Distribution
Let be the error covariance matrix of the MMSE estimator of from (referred to henceforth as the MMSE matrix) for a fixed measurement matrix , i.e.,
Theorem 1 (Necessary condition).
Let be the -th column of , , and define the matrix
Then, that solves (3) satisfies:
where is selected such that .
Proof: See Appendix -A.
It follows from (12) that the -th row of , , is an eigenvector of the Hermitian positive semi-definite matrix , which depends on . As the optimization problem in (3) is generally non-concave, condition (12) does not uniquely identify the optimal measurement matrix in general. Furthermore, in order to explicitly obtain from (12), the MMSE matrix must be derived, which is not a simple task. As an example, let the entries of be zero-mean i.i.d. PC Gaussian RVs. Then, obeys a singular Wishart distribution , and does not seem to have a tractable analytic expression. Despite this general situation, when the SNR is sufficiently low, we can explicitly characterize in certain scenarios, as discussed in the next subsection.
Iii-C Low SNR Regime
We next show that in the low SNR regime, it is possible to obtain an expression for the optimal measurement matrix which does not depend on . Let and denote the covariance matrices of the SOI, , and of , respectively. In the low SNR regime, i.e., when , the MI satisfies [30, Eq. (41)]:
where is given by (8).
Next, we introduce a new concept we refer to as Kronecker symmetric random vectors:
Definition 1 (Kronecker symmetry).
A random vector with covariance matrix is said to be Kronecker symmetric if the covariance matrix of is equal to .
In particular, zero-mean PC Gaussian distributions satisfy Def. 1, as stated in the following lemma:
Any zero-mean PC Gaussian random vector is Kronecker symmetric.
Proof: See Appendix -B.
We now obtain a closed-form solution to (14) when is a Kronecker symmetric random vector. The optimal for this setup is stated in the following theorem:
Let be the -th column of , , and let be the eigenvector of corresponding to its maximal eigenvalue. If is a Kronecker symmetric random vector with covariance matrix , then, for every with , setting for all solves (14). Thus,
Proof: See Appendix -C.
The result of Theorem 2 is quite non-intuitive from an estimation perspective, as it suggests using a rank-one measurement matrix. This implies that the optimal measurement matrix projects the multivariate SOI onto a single eigenvector corresponding to the largest spread. Consequently, there are infinitely many realizations of which result in the same . The optimality of rank-one measurements can be explained by noting that the selected scalar projection is, in fact, the least noisy of all possible scalar projections, as it corresponds to the largest eigenvalue of the covariance matrix of the SOI. Hence, when the additive noise is dominant, the optimal strategy is to design the measurement matrix such that it keeps only the least noisy spatial dimension of the signal, and eliminates all other spatial dimensions which are very noisy. From an information theoretic perspective, this concept is not new, and the strategy of using a single spatial dimension which corresponds to the largest eigenvalue of the channel matrix in memoryless MIMO channels was shown to be optimal in the low SNR regime, e.g., in the design of the optimal precoding matrix for MIMO Gaussian channels [35, Sec II-B]. However, while in [35, Sec II-B] the problem was to optimize the input covariance (using the precoding matrix) for a given channel, in our case we optimize over the “channel” (represented by the measurement matrix) for a given SOI covariance matrix.
Finally, we show that the optimal measurement matrix in Theorem 2 satisfies the necessary condition for optimality in Theorem 1: In the low SNR regime the MMSE matrix (11) satisfies , see, e.g., [35, Eq. (15)]. The Kronecker symmetry of the SOI implies that . Plugging this into the definition of in Theorem 1 results in . Theorem 1 thus states that for every , the vector must be a complex conjugate of an eigenvector of . Consequently, the optimal matrix in Theorem 2 satisfies the necessary condition in Theorem 1.
Iv Practical Design of the Measurement Matrix
As can be concluded from the discussion following Theorem 1, the fact that (12) does not generally have a unique solution combined with the fact that it is often difficult to analytically compute the MMSE matrix, make the characterization of the optimal measurement matrix from condition (12) a very difficult task. Therefore, in this section we propose a practical approach for designing measurement matrices based on Theorem 1, while circumventing the difficulties discussed above by applying appropriate approximations. We note that while the practical design approach proposed in this section assumes that the observations are corrupted by an additive Gaussian noise, the suggested approach can also be used as an ad hoc method for designing measurement matrices for phase retrieval setups with non-Gaussian noise, e.g., Poisson noise [8, Sec. 2.3]. The practical design is performed via the following steps: First, we find the matrix which maximizes the MI without restricting to satisfy the row-wise KRP structure (8). Ignoring the structural constraints on facilitates characterizing via a set of fixed point equations. Then, we obtain a closed-form approximation of by using the covariance matrix of the linear MMSE (LMMSE) estimator instead of the actual MMSE matrix. We denote the resulting matrix by . Next, noting that the MI is invariant to unitary transformations, we obtain the final measurement matrix by finding which minimizes the Frobenious norm between and a given unitary transformation of , also designed to minimize the Frobenious norm. Using this procedure we obtain closed-form expressions for general measurement matrices as well as for masked Fourier measurement matrices. In the following we elaborate on these steps.
Iv-a Optimizing without Structure Constraints
In the first step we replace the maximization of the MI in (3) with respect to the measurement matrix , with a maximization with respect to , which denotes the row-wise KRP of and . Specifically, we look for the matrix which maximizes , without constraining the structure of , while satisfying the trace constraint in (3).
We now formulate a constraint on which guarantees that the trace constraint in (3) is satisfied. Letting be the -th column of , , we have that
where follows since for all . Next, it follows from (8) that
Note that without constraining to satisfy the structure (8), can be complex, and the MI between the input and the output of the transformed MIMO channel, , may not be equal to the MI between the SOI and the observations of the original phase retrieval setup, .
The solution to (18) is given in the following lemma:
[25, Thm. 4.2], [36, Thm. 1], [37, Prop. 2]: Let be the covariance matrix of the MMSE estimate of from for a given , and let be the eigenvalue decomposition of , in which is unitary and is a diagonal matrix whose diagonal entries are the eigenvalues of in descending order. Let be an diagonal matrix whose entries satisfy
where is selected such that . The matrix which solves (18) is given by the solution to
Lemma 2 characterizes via a set of fixed point equations444The solution in [25, Thm. 4.2] includes a permutation matrix which performs mode alignment. However, for white noise mode alignment is not needed, and the permutation matrix can be set to [36, Sec. III].. Note that the matrix is constructed such that which solves (20) induces a covariance matrix of the MMSE estimate of from , denoted , whose eigenvalues satisfy (19).
Iv-B Replacing the Mmse Matrix with the Lmmse Matrix
In order to obtain from Lemma 2, we need the error covariance matrix of the MMSE estimator of from , , which in turn depends on . As is difficult to compute, we propose to replace the error covariance matrix of the MMSE estimate with that of the LMMSE estimate555An inspiration for this approximation stems from the fact that for parallel Gaussian MIMO scenarios, the covariance matrices of the MMSE estimate and of the LMMSE estimate coincide at high SNRs . of from . The LMMSE matrix is given by [30, Sec. IV-C]
Replacing with in Lemma 2, we obtain the matrix stated in the following corollary:
Let be the eigenvalue decomposition of , in which is unitary and is a diagonal matrix whose diagonal entries are the eigenvalues of arranged in descending order. Let be an diagonal matrix such that
where is selected such that . Finally, let
Then, satisfies the conditions in Lemma 2, computed with replaced by .
Proof: See Appendix -D.
While Lemma 2 corresponds to a generalized mercury waterfilling solution [25, Thm. 4.2], Corollary 1 is reminiscent of the conventional waterfilling solution for the optimal when is Gaussian [25, Thm. 4.1]. However, as noted in Subsection III-A, is non-Gaussian for any distribution of , thus, the resulting has no claim of optimality.
Iv-C Nearest Row-Wise Khatri-Rao Product Representation
The choice of in (22) does not necessarily correspond to a row-wise KRP structure (8). In this case, it is not possible to find a matrix such that , which implies that the matrix does not correspond to the model (1). Furthermore, we note that MI is invariant to unitary transformations, and specifically, for any unitary and for any we have that
To solve (24), let be the column vector corresponding to the -th column of and be the Hermitian part666The Hermitian part of a matrix is given by . of , . The solution to (24) can be analytically obtained as stated in the following proposition:
Let be the vector corresponding to the -th column of , . Let be the largest eigenvalue of , and let be the corresponding eigenvector, when the eigenvector matrix is unitary. Then, the columns of which solves (24) are given by
Proof: See Appendix -E.
The matrix derived in Proposition 1 does not necessarily satisfy the Frobenius norm constraint . Thus, if the squared norm of is larger than , then it is scaled down to satisfy the norm constraint. Moreover, since is monotonically non-decreasing w.r.t. [24, Thm. 2] for any distribution of , if the squared norm of is smaller than , then it is scaled up to the maximal norm to maximize the MI. Consequently, the final measurement matrix is given by .
Next, we show that when is Kronecker symmetric, then, in the low SNR regime, coincides with the optimal matrix characterized in Theorem 2, for any unitary transformation matrix . Let be an vector such that , and let be the eigenvalue decomposition of . For a Kronecker symmetric , we have that , and thus and [33, Ch. 12.3.1]. In the low SNR regime, due to the ”waterfilling” in (22), the measurement matrix extracts only the least noisy spatial dimension of the SOI, resulting in , where is the eigenvector corresponding to the maximal eigenvalue of the SOI covariance matrix, . Therefore, letting denote the leftmost column of , we have that , which results in [40, Ch. 9.2] and . Consequently, for every , and thus is a rank-one matrix of the form , which coincides with stated in Theorem 2. For example, setting results in .
Iv-D Masked Fourier Measurement Matrix
As mentioned in Subsection II-B, in many phase retrieval setups, the measurement matrix represents masked Fourier measurements and is constrained to the structure of (4). In the context of phase retrieval, the design goal is to find the set of masks in (4) which result in optimal recovery performance. To that aim, define the vectors , , to contain the diagonal elements of , , . With this definition, we can write
Since does not necessarily represent a masked Fourier structure, based on the rationale detailed in Subsection IV-C, we suggest to use the masks that minimize the distance between the resulting measurement matrix and a unitary transformation of :
Let be an diagonal matrix such that , . For all , let be the largest eigenvalue of the Hermitian matrix , where is the Hermitian part of , and let be its corresponding eigenvector, when the eigenvector matrix is unitary. Then, the set of mask coefficients which solves (27) is obtained as
Proof: See Appendix -F.
The masked Fourier measurement matrix is obtained from the coefficient vectors via
Applying the same reasoning used in determining the scaling of in Subsection IV-C, we conclude that the MI is maximized, subject to the trace constraint, by normalizing to obtain .
Let us again consider a Kronecker symmetric in the low SNR regime. For simplicity, we set . As discussed in the previous subsection, for this setting we have that , where is the vector such that , and thus is non-zero only for . Therefore, is zero for all , while is the largest eigenvalue of