Blind Gain and Phase Calibration via Sparse Spectral Methods
Abstract
Blind gain and phase calibration (BGPC) is a bilinear inverse problem involving the joint determination of the unknown gains and phases of the sensing system and the unknown signal. BGPC arises in numerous applications, e.g., blind albedo estimation in inverse rendering, synthetic aperture radar autofocus, and sensor array autocalibration. In some cases, sparse structure in the unknown signal alleviates the ill-posedness of BGPC. Recently there has been renewed interest in solutions to BGPC with careful analysis of error bounds. In this paper, we formulate BGPC as an eigenvalue/eigenvector problem, and propose to solve it via power iteration, or, in the sparsity or joint sparsity case, via truncated power iteration. Under certain assumptions, the unknown gains, phases, and the unknown signal can be recovered simultaneously. Numerical experiments show that the power iteration algorithms work not only in the regime predicted by our main results, but also in regimes where theoretical analysis is limited. We also show that our power iteration algorithms for BGPC compare favorably with competing algorithms in adversarial conditions, e.g., with noisy measurements or a bad initial estimate.
1 Introduction
Blind gain and phase calibration (BGPC), the joint recovery of the unknown gains and phases in the sensing system and the unknown signal, is a bilinear inverse problem that arises in many applications: the joint estimation of albedo and lighting conditions in inverse rendering [2]; the joint recovery of phase error and radar image in synthetic aperture radar (SAR) autofocus [3]; and autocalibration of sensor gains and phases in array processing [4]. There is a long line of research on solutions for each of these applications. However, theoretical analysis of the problem and error bounds for its solutions have been established only recently [5].
In this paper, we reformulate the BGPC problem as an eigenvalue/eigenvector problem. In the subspace case, we use algorithms that find principal eigenvectors such as the power iteration algorithm (also known as the power method) [9], to find the concatenation of the gain and phase vector and the vectorized signal matrix in the form of the principal component of a structured matrix. In the sparsity case, the problem resembles sparse principal component analysis (sparse PCA) [10]. We then propose to solve the sparse eigenvector problem using truncated power iteration [11]. The main contribution of this paper is the theoretical analysis of the error bounds of power iteration and truncated power iteration for BGPC in the subspace and joint sparsity cases, respectively. When the measurement matrix is random, and the signals and the noise are adversarial, our algorithms stably recover the unknown gains and phases, and the unknown signals with high probability under near optimal sample complexities. Since truncated power iteration relies on a good initial estimate, we also propose a simple initialization algorithm, and prove that the output is sufficiently good under certain technical conditions.
We complement the theoretical results with numerical experiments, which show that the algorithms can indeed solve BGPC in the optimal regime. We also demonstrate that the algorithms are robust against noise and an inaccurate initial estimate. Experiments with different initialization schemes show that our initialization algorithm significantly outperforms the baseline. Then we apply the power iteration algorithm to inverse rendering, and showcase its effectiveness in real-world applications.
The rest of the paper is organized as follows. In this section, we introduce the formulation of the BGPC problem and related work. We then introduce the power iteration algorithms and our main theoretical results in Sections 2 and 3, respectively. Sections 4 and 5 give some fundamental estimates regarding the structured matrix in our BGPC formulation, and the proofs of our main results. We conduct numerical experiments in Section 6, and conclude the paper with some discussion in Section 7.
1.1 Notations
We use , , and to denote the transpose, the complex conjugate, and the conjugate transpose of a matrix , respectively. The th entry of a vector is denoted by . The th column, the th row (in a column vector form), and the th entry of a matrix are denoted by , , and , respectively. A superscript on a vector denotes the iteration number in an iterative algorithm. We use to denote the identity matrix of size , and and to denote the matrices of all ones and all zeros of size , respectively. The th standard basis vector is denoted by , whose ambient dimension will be clear from the context. The norm and “norm” of a vector are denoted by and , respectively. The Frobenius norm and the spectral norm of a matrix are denoted by and , respectively. The support of a sparse vector is denoted by . The vector denotes the concatenation of the columns of , i.e., . A diagonal matrix with the entries of vector on the diagonal is denoted by . The Kronecker product is denoted by . We use to denote the relation “greater than up to log factors”. We use to denote the set . For an index set , the projection operator onto is denoted by , and the operator that restricts onto is denoted by . We use these operator notations for different spaces, and the ambient dimensions will be clarified in the context.
1.2 Problem Formulation
In this section, we introduce the BGPC problem with a subspace constraint or a sparsity constraint. Suppose is the known measurement matrix, and is the vector of unknown gains and phases, the th entry of which is . Here, and denote the gain and phase of the th sensor, respectively. The BGPC problem is the simultaneous recovery of and the unknown signal matrix from the following measurement:
where is the measurement noise. The th entry in the measurement has the following expression:
Clearly, BGPC is a bilinear inverse problem. The solution suffers from scaling ambiguity, i.e., generates the same measurements as , and therefore cannot be distinguished from it. Despite the fact that the solution can have other ambiguity issues, in this paper, we consider the generic setting where the solution suffers only from scaling ambiguity [6].
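As a concrete illustration of the bilinear measurement model and the scaling ambiguity, here is a minimal NumPy sketch; all variable names and sizes are ours, chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 32, 8, 4  # sensors, snapshots, subspace dimension (illustrative sizes)

# Known measurement matrix, unknown signal matrix, unknown gains and phases.
A = (rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))) / np.sqrt(2 * N)
X = rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K))
gamma = np.exp(1j * rng.uniform(0, 2 * np.pi, N)) * (1 + 0.1 * rng.standard_normal(N))

Y = np.diag(gamma) @ A @ X  # noiseless bilinear measurement

# Scaling ambiguity: (gamma / s, s * X) produces exactly the same measurement.
s = 2.0 * np.exp(1j * 0.7)
Y_scaled = np.diag(gamma / s) @ A @ (s * X)
ambiguity_gap = np.linalg.norm(Y - Y_scaled)
```

The gap is zero up to floating-point error, which is why the solution can only ever be recovered up to a scalar.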
1) Subspace case: Suppose that the known matrix is tall () and has full column rank. Then the columns of reside in the low-dimensional subspace spanned by the columns of . The problem is effectively unconstrained with respect to .
2) Sparsity case: Suppose that is a known dictionary with , while the columns of are sparse, i.e., for all . A variation of this setting is that the columns of are jointly sparse, i.e., there are at most nonzero rows in . In this case, the subspace constraint on no longer applies, and one must solve the problem with a sparsity (or joint sparsity) constraint.
The BGPC problem arises in applications including inverse rendering, sensor array processing, multichannel blind deconvolution, and SAR autofocus. We refer the reader to our previous work [6] for a detailed account of applications of BGPC. For consistency, from now on, we use the convention in sensor array processing, and refer to and as the numbers of sensors and snapshots, respectively.
Table 1. Comparison of sample complexities.

                      Subspace    Joint Sparsity    Sparsity
  Unique Recovery                                   –
  Least Squares                   –                 –
  Minimization        –           –
  This Paper                                        –

Note: , , and represent the number of sensors, the number of snapshots, the subspace dimension, and the sparsity level, respectively.
1.3 Our Contributions
We reformulate BGPC as the problem of finding the principal eigenvector of a matrix (or operator). In the subspace case, this can be solved using any eigensolver, e.g., power iteration (Algorithm ?). In the sparsity case, we propose to solve this problem using truncated power iteration (Algorithm ?). Our main results can be summarized as follows:
In Table 1, we compare the above results with the sample complexities for unique recovery in BGPC [6], and previous guaranteed algorithms for BGPC in the subspace and sparsity cases [7]. In the subspace case, power iteration solves BGPC using optimal (up to log factors) numbers of sensors and snapshots. These sample complexities are comparable to those of the least squares method in [7]. Moreover, we show that power iteration is empirically more robust against noise than least squares.
Truncated power iteration solves BGPC with a joint sparsity structure, with an optimal (up to log factors) number of sensors, and a slightly suboptimal (within a factor of and log factors) number of snapshots. In comparison, the minimization method for the sparsity case of BGPC uses a similar number of sensors, but a much larger number of snapshots. Numerical experiments show that truncated power iteration empirically succeeds, in both the joint sparsity case and the more general sparsity case, in the optimal regime.
The success of truncated power iteration relies on a good initial estimate of and . We propose a simple initialization algorithm (Algorithm ?) with the following guarantee:
Despite the above scaling law predicted by theory, numerical experiments suggest that our initialization scheme is effective when .
1.4 Related Work
BGPC arises in many real-world scenarios, and previous solutions have mostly been tailored to specific applications such as sensor array processing [4], sensor network calibration [14], synthetic aperture radar autofocus [3], and computational relighting [2]. A special case of BGPC is multichannel blind deconvolution (MBD) with a circular convolution model. Most previous works on MBD consider linear convolution with a finite impulse response (FIR) filter model (see [16], and a recent stabilized method [20]). In comparison, the BGPC problem discussed in this paper involves a more general subspace or sparsity model.
The idea of solving BGPC by reformulating it into a linear inverse problem, which is a key idea in this paper, has been proposed in many prior works [14]. In particular, Bilen et al. [22] provided a solution to BGPC with high-dimensional but sparse signals using minimization. However, such methods had not been carefully analyzed until recently. Ling and Strohmer [7] derived an error bound for the least squares solution in the subspace case of BGPC. In this paper, the power iteration method has sample complexities comparable to those of the least squares method [7], and is empirically more robust to noise than the latter. Wang and Chi [8] gave a theoretical guarantee for minimization that solves BGPC in the sparsity case, where they assumed that is the discrete Fourier transform (DFT) matrix and is random following a Bernoulli-Subgaussian model. In this paper, we give a guarantee for truncated power iteration under the assumption that is a complex Gaussian random matrix, and is jointly sparse, well-conditioned, and deterministic. In this sense, we consider an adversarial scenario for the signal . Our sample complexity results require a near optimal number of sensors, and a much smaller number of snapshots. Moreover, truncated power iteration is more robust against noise and an inaccurate initial estimate of the phases. Very recently, Eldar et al. [23] proposed new methods for BGPC with signals whose sparse components may lie off the grid. Similar to earlier work on blind calibration of sensor arrays [4], these methods rely on empirical covariance matrices of the measurements and therefore need a relatively large number of snapshots.
In a broader context, BGPC is a special bilinear inverse problem [5], which in turn is a special case of low-rank matrix recovery from incomplete measurements [24]. A resurgence of interest in bilinear inverse problems was pioneered by recent studies in single-channel blind deconvolution of signals with subspace or sparsity structures, where both the signal and the filter are structured [28].
Another related bilinear inverse problem is blind calibration via repeated measurements from multiple sensing operators [33]. Since blind calibration with repeated measurements is in principle an easier problem than BGPC [7], we believe our methods for BGPC and our theoretical analysis can be extended to this scenario.
Also related is the phase retrieval problem [39], where there only exists uncertainty in the phases (and not the gains) of the sensing system. An active line of work solves phase retrieval with guaranteed algorithms (see [40] and [46] for a recent review).
The error bounds of power iteration and truncated power iteration have been analyzed in general settings, e.g., in [9] and [11]. These previous results hinge on spectral properties of matrices such as gaps between eigenvalues, which do not translate directly to sample complexity requirements. This paper undertakes analysis specific to BGPC. We relate spectral properties in BGPC to some technical conditions on , , , and , and derive recovery error under near optimal sample complexities. We also adapt the analysis of sparse PCA [11] to accommodate a structured sparsity constraint in BGPC.
BGPC and our proposed methods are nonconvex in nature. In particular, our truncated power iteration algorithm can be interpreted as projected gradient descent for a nonconvex optimization problem. There have been rapid developments in guaranteed nonconvex methods [47] in a variety of domains such as matrix completion [48], dictionary learning [51], blind deconvolution [31], and phase retrieval [42]. It is a common theme that carefully crafted nonconvex methods have better theoretical guarantees in terms of sample complexity than their convex counterparts, and often have faster implementations and better empirical performance. This paper is a new example of such superiority of nonconvex methods.
2 Power Iteration Algorithms for BGPC
Next, we describe the algorithms we use to solve BGPC. In Section 2.1, we introduce a simple trick that turns the bilinear inverse problem in BGPC into a linear inverse problem. In Sections 2.2 and 2.3, we introduce the power iteration algorithm we use to solve BGPC with a subspace structure, and the truncated (or sparse) power iteration algorithm we use to solve BGPC with sparsity, respectively.
2.1 From Bilinearity to Linearity
We use a simple trick to turn BGPC into a linear inverse problem [14]. Without loss of generality, assume that for . Indeed, if any sensor has zero gain, then the corresponding row in is all zero or contains only noise, and we can simply remove the corresponding row in . Let denote the entrywise inverse of , i.e., for . We have
where is the noiseless measurement. Equation is linear in all the entries of and . The bilinear inverse problem in now becomes a linear inverse problem in . In practice, since only the noisy measurement is available, one can solve .
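The linearization trick can be checked numerically. In the NumPy sketch below (sizes and names are ours, for illustration), lam denotes the entrywise inverse of the gains; on noiseless data the residual of the linearized equation vanishes, confirming that the measurement is linear in the pair (lam, X):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, D = 32, 8, 4  # illustrative sizes
A = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
X = rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K))
gamma = np.exp(1j * rng.uniform(0, 2 * np.pi, N))  # nonzero gains
lam = 1.0 / gamma                                  # entrywise inverse of the gains

Y = np.diag(gamma) @ A @ X           # noiseless measurement
residual = np.diag(lam) @ Y - A @ X  # linear in (lam, X) jointly
lin_gap = np.linalg.norm(residual)
```

The bilinear unknown (gamma, X) has been traded for the linear unknown (lam, X), at the cost of excluding zero gains.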
This technique has been widely used to solve BGPC with a subspace structure, in applications such as sensor network calibration [14], synthetic aperture radar autofocus [3], and computational relighting [2]. Recently, Ling and Strohmer [7] analyzed the least squares solution to . Wang and Chi [8] considered a special case where is the DFT matrix, and analyzed the solution of a sparse by minimizing the norm of .
We use the same trick in our algorithms. Define
We can decompose into , where
Define also
where is a nonzero constant specified later.
Clearly, can be rewritten as
where . Equivalently, is a null vector of . When certain sufficient conditions are satisfied, is the unique null vector of . For example, if , , and are in general positions in , , and , respectively, then snapshots are sufficient to guarantee uniqueness of the solution to BGPC in the subspace case. We refer readers to our work on the identifiability in BGPC for more details [5].
Since only the noisy matrix is accessible in practice, one can instead find the minor eigenvector, i.e., the eigenvector corresponding to the smallest eigenvalue of . The rest of this section focuses on algorithms that find such an eigenvector of , with no constraint (in the subspace case), or with a sparsity constraint (in the sparsity case).
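Under one natural ordering of the unknowns (an assumption of ours; the paper's exact stacking may differ), the K per-snapshot linear equations can be stacked into a structured matrix E whose null space contains the concatenated unknowns, as the following NumPy sketch verifies:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, D = 16, 6, 3  # illustrative sizes
A = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
X = rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K))
gamma = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
lam = 1.0 / gamma
Y = np.diag(gamma) @ A @ X  # noiseless measurement

# Stack the K equations  A x_k - diag(y_k) lam = 0  into E z = 0,
# with z = [vec(X); lam] (columns of X first, then lam).
E = np.hstack([np.kron(np.eye(K), A),
               -np.vstack([np.diag(Y[:, k]) for k in range(K)])])
z = np.concatenate([X.flatten(order="F"), lam])

null_gap = np.linalg.norm(E @ z)  # z lies in the null space of E
```

When the uniqueness conditions cited above hold, this null space is one-dimensional, so z is (up to scale) the only null vector.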
2.2 Power Iteration for BGPC with a Subspace Structure
In the subspace case (), we solve for the minor eigenvector of the positive definite matrix . In Section 3, we derive an upper bound on the error between this eigenvector and the true solution .
The minor eigenvector of can be computed by a variety of methods. Here, we propose an algorithm that remains computationally efficient for large scale problems. By eigenvalue decomposition, the null vector of is identical to the principal eigenvector of
for a large enough constant . This eigenvector can be computed using the power iteration algorithm (see Algorithm ?).
The size of is . An advantage of Algorithm ? over an eigensolver that decomposes , is that one does not need to explicitly compute the entries of to iteratively apply it to a vector. Furthermore, rather than , by the structure of and , the per iteration time complexity of applying the operator to a vector is only . This can be further reduced if and are linear operators with implementations faster than .
The rule of thumb for selecting parameter is that the norms of the columns of be close to those of so that in exhibits good spectral properties for power iterations. A safe choice for is , which may be conservatively large in some cases, but works well in practice. In Section 3, we discuss our choice of parameters under certain normalization assumptions (see Remark ?).
Algorithm ? converges to the principal eigenvector of , as long as the initial estimate is not orthogonal to that eigenvector. This insensitivity to initialization is a privilege not shared by the sparsity case (see Section 2.3).
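A minimal matrix-free power iteration in the spirit of this section (NumPy sketch; the stacking order of the unknowns and our choice of the shift c are assumptions, and the explicit E below is built only to pick c and to validate the operator on this toy problem):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, D = 12, 4, 2  # illustrative sizes
A = rng.standard_normal((N, D)) + 1j * rng.standard_normal((N, D))
X = rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K))
gamma = np.exp(1j * rng.uniform(0, 2 * np.pi, N))
Y = np.diag(gamma) @ A @ X  # noiseless measurement

def apply_EhE(w):
    """Matrix-free application of E^H E in O(NDK) flops, never forming E."""
    Xw = w[: D * K].reshape(D, K, order="F")
    lam_w = w[D * K :]
    R = A @ Xw - Y * lam_w[:, None]            # residual E w, as an N-by-K array
    top = (A.conj().T @ R).flatten(order="F")  # adjoint through the A blocks
    bot = -(Y.conj() * R).sum(axis=1)          # adjoint through the diag(y_k) blocks
    return np.concatenate([top, bot])

# Explicit E, used here only to pick the shift c and to check the operator.
E = np.hstack([np.kron(np.eye(K), A),
               -np.vstack([np.diag(Y[:, k]) for k in range(K)])])
c = np.linalg.norm(E, ord=2) ** 2  # c at least the top eigenvalue of E^H E

w = rng.standard_normal(D * K + N) + 1j * rng.standard_normal(D * K + N)
w /= np.linalg.norm(w)
for _ in range(10000):       # power iteration on  c I - E^H E
    w = c * w - apply_EhE(w)
    w /= np.linalg.norm(w)

z = np.concatenate([X.flatten(order="F"), 1.0 / gamma])
z /= np.linalg.norm(z)
align = abs(np.vdot(z, w))   # = 1 iff recovery is exact up to a global phase
```

In a large-scale implementation only apply_EhE would be used; c can be estimated by a few power iterations on E^H E itself.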
2.3 Truncated Power Iteration for BGPC with Sparsity
When , is a fat matrix, and the null space of has dimension at least . Therefore, there exist at least two linearly independent eigenvectors corresponding to the largest eigenvalue of . To overcome the ill-posedness, one can leverage the sparsity structure in to make the solution to the eigenvector problem unique.
Let denote the projection of a vector onto the set of sparse vectors. It is computed by setting to zero all but the entries of of the largest absolute values. Let denote the projection of a matrix onto the set of matrices whose columns are jointly sparse. This projection is computed by setting to zero all but the rows of of the largest norms. We define two projection operators on that will be used repeatedly in the rest of this paper:
For the sparsity case of BGPC, we adapt the eigenvector problem in Section 2.2 by adding a sparsity constraint:
This nonconvex optimization is very similar to the sparse PCA problem. The only difference lies in the structure of the sparsity constraint. In sparse PCA, the principal component is sparse. In , the vector consists of sparse vectors , and a dense vector .
To solve , we adopt a sparse PCA algorithm called truncated power iteration [11], and revise it to adapt to the sparsity structure of BGPC (see Algorithm ?). One can choose parameters and using the same rules as in Section 2.2. Note that we use a sparsity level in this algorithm, for two reasons: (a) in practice, it is easier to obtain an upper bound on the sparsity level instead of the exact number of nonzero entries in the signal; and (b) the ratio is an important constant in the main results, controlling the tradeoff between the number of measurements and the rate of convergence.
For the joint sparsity case, we use essentially the same algorithm, with replaced by .
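The structured projection used in each truncated iteration, which truncates the signal block but never the dense gain block, can be sketched as follows (the stacking order and variable names are ours):

```python
import numpy as np

def project_structured(w, D, K, s, joint=False):
    """Project w = [vec(X); lam]: truncate the X block to the sparsity model,
    keep the dense lam block untouched."""
    Xw = w[: D * K].reshape(D, K, order="F").copy()
    lam_w = w[D * K :]
    if joint:
        # Joint sparsity: keep the s rows of X with the largest l2 norms.
        keep = np.argsort(np.linalg.norm(Xw, axis=1))[-s:]
        mask = np.zeros(D, dtype=bool)
        mask[keep] = True
        Xw[~mask, :] = 0
    else:
        # Column-wise sparsity: keep the s largest-magnitude entries per column.
        for k in range(K):
            drop = np.argsort(np.abs(Xw[:, k]))[:-s]
            Xw[drop, k] = 0
    return np.concatenate([Xw.flatten(order="F"), lam_w])
```

Each truncated power iteration then multiplies the current iterate by the shifted operator, applies this projection, and renormalizes to the unit sphere.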
Since is a nonconvex optimization problem, a good initialization is crucial to the success of Algorithm ?. Algorithm ? outlines one such initialization. We denote by the projection onto the support set , which sets to zero all rows of but the rows of the largest norms in each block. (The th block of consists of contiguous rows indexed by .) Then the normalized left and right singular vectors and of are computed as initial estimates for and . We use to denote the entrywise inverse of except for zero entries, which are kept zero. In Section 3, we further comment on how to choose a proper initial estimate (see Remark ?).
2.4 Alternative Interpretation as Projected Gradient Descent
Algorithms ? and ? can be interpreted as gradient descent and projected gradient descent, respectively. Next, we explain such equivalence using the sparsity case as an example.
Recall that BGPC is linearized as . Relaxing the sparsity level from to , the optimization in is equivalent to:
The gradient of the objective function at is
Each iteration of projected gradient descent consists of two steps:
(i) Gradient descent with a step size of :
(ii) Projection onto the constraint set, i.e., the intersection of a cone () and a sphere ():
Clearly, the two steps are identical to those in each truncated power iteration except for a different scaling in Step (i), which, due to the normalization in Step (ii), is insignificant.
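This equivalence can be verified numerically: with step size 1/(2c), one gradient step on the squared residual followed by renormalization coincides exactly with one power-iteration step on the shifted operator, since the two pre-normalization vectors differ only by the positive factor 1/c (a generic matrix E is used below purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 20, 8  # illustrative sizes
E = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
G = E.conj().T @ E
c = np.linalg.norm(E, ord=2) ** 2 + 1.0  # shift exceeding the top eigenvalue of G

z = rng.standard_normal(n) + 1j * rng.standard_normal(n)
z /= np.linalg.norm(z)

# (i) gradient step on f(z) = ||E z||^2, gradient 2 G z, step size 1/(2c)
z_grad = z - (1.0 / (2.0 * c)) * (2.0 * G @ z)
z_grad /= np.linalg.norm(z_grad)  # (ii) project back onto the unit sphere

# one power-iteration step on  c I - G, then normalization
z_pow = c * z - G @ z
z_pow /= np.linalg.norm(z_pow)

step_gap = np.linalg.norm(z_grad - z_pow)
```

The gap is zero up to floating-point error, so the two viewpoints generate identical iterates.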
3 Main Results
In this section, we give theoretical guarantees for Algorithms ? and ? in the subspace case and in the joint sparsity case, respectively. We also give a guarantee for the initialization by Algorithm ?.
3.1 Main Assumptions
We start by stating the assumptions on , , and , which we use throughout this section.
Assumptions ? – ? can be relaxed in practice.
The complex Gaussian distribution in Assumption ? can be relaxed to for any . We choose the particular scaling , because then satisfies the restricted isometry property (RIP) [54], i.e., for some , when is large compared to the number of nonzero entries in .
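The CN(0, 1/N) scaling makes a tall random matrix approximately norm-preserving on sparse vectors, which is the substance of the restricted isometry claim; a quick numerical check (illustrative sizes, our variable names):

```python
import numpy as np

rng = np.random.default_rng(6)
N, s = 4000, 5
# CN(0, 1/N) entries: real and imaginary parts each N(0, 1/(2N)).
A = (rng.standard_normal((N, 2 * s)) + 1j * rng.standard_normal((N, 2 * s))) / np.sqrt(2 * N)

x = rng.standard_normal(2 * s) + 1j * rng.standard_normal(2 * s)
x[s:] = 0  # an s-sparse test vector
ratio = np.linalg.norm(A @ x) / np.linalg.norm(x)  # concentrates near 1
```

The deviation of the ratio from 1 shrinks like 1/sqrt(N), matching the stated RIP behavior when N is large relative to the sparsity.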
The gains can center around any , i.e., . Due to bilinearity, we may assume that ’s center around without loss of generality by solving for .
The Frobenius norm of matrix can be any positive number. If is known, one can scale to have unit Frobenius norm before solving BGPC. In practice, the norm of is generally unknown. However, due to Assumptions ? (RIP) and ? (“flat” gains), we have
Hence is a good surrogate for in noiseless or low noise settings, and one can scale by to achieve the desired scaling. The slight deviation of from does not have any significant impact on our theoretical analysis. Therefore, we assume to simplify the constants in our derivation.
The conditioning of can also be relaxed. When is large, one can choose a subset of columns in , such that the matrix formed from the corresponding columns of has good conditioning. When noise amplification is not of concern (noiseless or low noise settings), one can choose a preconditioning matrix such that is well conditioned, and then solve the BGPC with .
In summary, we can manipulate the BGPC problem and make it approximately satisfy our assumptions. For example, can be rewritten as:
We can run Algorithms ? and ? with input and , and solve for and . The above manipulations do not have any significant impact on the solution, or on our theoretical analysis. However, by making these assumptions, we eliminate some tedious and unnecessary discussions.
We also need an assumption on the noise level.
In the subspace case, the assumption on the noise level is very mild. Because under Assumptions ? – ?, , the noise term , which satisfies , can be on the same order in terms of Frobenius norm as the clean signal .
Finally, the following assumption is required for a theoretical guarantee of the initialization.
Assumption ? says that the support of can be partitioned into two subsets. The absolute values of the entries in the first subset are sufficiently large. Moreover, the total energy (sum of squares of the entries) in the second subset is small compared to the squared norm of . For example, the assumption is satisfied in the following special case: (therefore for ), and the absolute values of the nonzero entries are all comparable, e.g., .
Before introducing our main results, we disclose the choice of parameters and for our theoretical analysis of Algorithms ? and ?.
3.2 A Perturbation Bound for the Eigenvector Problem
Next, we introduce a key result, a perturbation bound for the eigenvector problem, which is used to derive error bounds for power iteration algorithms.
Let denote subsets of , such that and . We define and as follows:
Recall that restricts a vector to the support , and hence is the projection operator onto the support . Clearly, we have , and . In the subspace case discussed in Theorem ?, we have , , , and . In the joint sparsity case, we have . We set , which we justify later in the analysis of truncated power iteration.
Let
denote the normalized version of , which is the eigenvector of and corresponding to eigenvalue . Let denote the principal eigenvector of . In the joint sparsity case, let denote the principal eigenvector of , where , , and the support of is a subset of defined in .
In Algorithms ? and ? and in our analysis, vectors , , and are normalized to unit norm. However, multiplication by a scalar of unit modulus is a remaining ambiguity, i.e., the set is an equivalence class for . Our main results use to denote the distance between and , which is a metric on the set of such equivalence classes.
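The distance modulo a global phase admits a closed form: for unit vectors, the minimum over phi of the norm of z1 - exp(i*phi) z2 equals sqrt(2 - 2|<z1, z2>|). A small sketch (our function name):

```python
import numpy as np

def dist_up_to_phase(z1, z2):
    """min over phi of ||z1 - exp(i*phi) z2|| for unit vectors (closed form)."""
    inner = abs(np.vdot(z1, z2))  # |<z1, z2>|, conjugating the first argument
    return np.sqrt(max(2.0 - 2.0 * inner, 0.0))

rng = np.random.default_rng(7)
z = rng.standard_normal(10) + 1j * rng.standard_normal(10)
z /= np.linalg.norm(z)
d_same = dist_up_to_phase(z, np.exp(1j * 1.23) * z)  # same equivalence class: 0
```

This is the metric used in the error bounds below: it vanishes exactly on the equivalence class of a vector under multiplication by a unit-modulus scalar.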
When is large (e.g., ), does not hold, hence the perturbation bound of the eigenvector of in Theorem ? is no longer true. We can, however, bound the perturbation of the eigenvector of a submatrix of .
The error bounds for Algorithms ? and ? in the next section rely on Theorems ? and ?, and existing analysis of power iteration [9] and truncated power iteration [11]. Additionally, the perturbation bounds in this section are of independent interest. In particular, Theorem ? shows that if the assumptions and the prescribed sample complexities in are satisfied, then with high probability the principal eigenvector of is an accurate estimate of the vector that concatenates the unknown variables. It gives an error bound for any algorithm that finds the principal eigenvector of . Similarly, we believe that Theorem ? can be used to analyze other algorithms that find the sparse principal component of .
3.3 Error Bounds for the Power Iteration Algorithms
In this section, we give performance guarantees for Algorithms ? and ? under the assumptions stated in Section 3.1.
Theorem ? shows that the power iteration algorithm requires sensors and snapshots to successfully recover and . This agrees, up to log factors, with the sample complexity required for the uniqueness of in the subspace case, which is and [6].
Next, we compare Theorem ? with a similar error bound for the least squares approach by Ling and Strohmer [7]. The sample complexity in Theorem ? matches the numbers required by the least squares approach and (up to one log factor). One caveat in the least squares approach is that, apart from the linear equation , it needs an extra linear constraint to avoid the trivial solution , . Unfortunately, in the noisy setting, the recovery error by the least squares approach is sensitive to this extra linear constraint. Our numerical experiments (Section 6) show that power iteration outperforms least squares in the noisy setting.
Theorem ? is only valid when . With the choice , when approaches , and approaches , the convergence rate is roughly . We discuss a more realistic scenario next.
Theorem ? states that for Algorithm ? to recover and a jointly sparse , it is sufficient to have sensors and snapshots. In comparison, the (up to a factor of ) optimal sample complexity for unique recovery in the joint sparsity case is and [6]. Hence, the number of sensors required in Theorem ? is (up to log factors) optimal, but the number of snapshots required is suboptimal. Another drawback is that these results apply only to the joint sparsity case, and not to the more general sparsity case. However, we believe these drawbacks are artifacts of our analysis. For both the joint sparsity case and the sparsity case, we have complex-valued measurements, and complex-valued unknowns. One may expect successful recovery when and are (up to log factors) on the order of and , respectively. In fact, numerical experiments in Section 6 confirm that truncated power iteration successfully recovers and in this regime for the more general sparsity case.
Wang and Chi [8] analyzed the performance of minimization for BGPC in the sparsity case, where they assumed that is the DFT matrix, and is a Bernoulli-Subgaussian random matrix. Their sample complexity for minimization is and . The success of their algorithm relies on a restrictive assumption that , which is analogous to the dependence of our algorithm on a good initialization of . In the next section, we show that such dependence can be relaxed under some additional conditions using the initialization provided by Algorithm ?.
3.4 A Theoretical Guarantee of the Initialization
The next theorem shows that, under certain conditions, Algorithm ? recovers the locations of the large entries in correctly, and yields an initial estimate that satisfies (close to ).
By Theorem ?, the constant in Theorem ? can be set to in a low noise setting. For , this constant is larger than the one in Remark ?, and allows for more choices of .
Our guarantee for the initialization requires that the number of sensors scale quadratically (up to log factors) in the sparsity , which seems suboptimal. Since similar suboptimal sample complexities show up in sparse PCA [55] and sparse phase retrieval [41], we conjecture that such a scaling law is intrinsic to the problem of sparse BGPC.
In the joint sparsity case, instead of estimating the supports of separately, one can estimate the row support of directly by sorting for and finding the largest. In this case, Assumption ? can be changed to: There exists a subset of large rows (in terms of norm), such that for all ,
and
In this case, the subset can be identified and an initialization can be computed under the same conditions as in Theorem ?, which can be proved using the same arguments.
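The row-support estimate described above reduces to sorting row norms and keeping the largest; a hedged sketch on a planted joint-sparse signal (the noise level and sizes are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(8)
D, K, s = 40, 6, 5  # illustrative sizes
X = np.zeros((D, K), dtype=complex)
support = rng.choice(D, size=s, replace=False)  # planted joint row support
X[support, :] = rng.standard_normal((s, K)) + 1j * rng.standard_normal((s, K))
X_noisy = X + 0.05 * (rng.standard_normal((D, K)) + 1j * rng.standard_normal((D, K)))

# Estimate the row support directly: sort rows by l2 norm, keep the s largest.
row_norms = np.linalg.norm(X_noisy, axis=1)
est_support = set(np.argsort(row_norms)[-s:].tolist())
recovered = est_support == set(support.tolist())
```

As long as every true nonzero row carries noticeably more energy than the noise floor, as Assumption ? requires, the sorting step recovers the support exactly.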
4 Fundamental Estimates
To prove the main results, we must first establish some fundamental estimates specific to BGPC. Proofs of some lemmas in this section can be found in the appendix.
4.1 A Gap in Eigenvalues
A key component in establishing a perturbation bound for an eigenvector problem (e.g., Theorem ?) is bounding the gap between eigenvalues. Lemma ? gives us such a bound.
4.2 Perturbation Due to Randomness in
Next, we show that , whose randomness comes from , is close to its mean under certain conditions.
4.3 Perturbation Due to Noise
We established some fundamental estimates regarding in Sections 4.1 and 4.2. In this section, we turn to the perturbation caused by noise. By the definitions of , , , , and , we have
where
Therefore,
Lemma ? gives an upper bound on the spectral norm of the perturbation from noise.
4.4 Scalar Concentration
We now introduce a few scalar concentration bounds that are useful in the proof of Theorem ?.
5 Proofs of the Main Results
5.1 Proof of the Perturbation Bound for the Eigenvector Problem
In this section, we prove Theorem ?. Theorem ? can be proved similarly.
One can prove Theorem ? using the same steps as in the proof of Theorem ?, by restricting rows and columns of matrices to the support and applying the corresponding bounds on submatrices.
5.2 Proof of the Error Bound for Algorithm
5.3 Proof of the Error Bound for Algorithm
5.4 Proof of the Guarantee for Algorithm
6 Numerical Experiments
In this section, we test the empirical performance of Algorithm ? and Algorithm ?.
6.1 Subspace Case: Power Iteration vs. Least Squares
In Algorithm ?, we choose , and (computed using another power iteration on ). We compare Algorithm ? with the least squares approach in [7], where is used to avoid the trivial solution.
We generate as a complex Gaussian random matrix, whose entries are drawn independently from , i.e., the real and imaginary parts are drawn independently from . The unknown gains and phases are generated as follows:
such that is on a small circle of radius centered at a point on the unit circle, and and are drawn independently from a uniform distribution on . Figure 1 visualizes one such synthesized in the complex plane. We set in all the numerical experiments.
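The gain-and-phase synthesis just described can be sketched as follows; the radius value `r=0.5` and the function name are placeholders, since the actual symbols and constants were lost in extraction:

```python
import numpy as np

def synth_gains(n, r=0.5, seed=0):
    """Each entry lies on a circle of radius r centered at a point on the
    unit circle: lam_i = exp(1j*phi_i) + r*exp(1j*theta_i), with phi_i and
    theta_i drawn independently and uniformly from [0, 2*pi)."""
    rng = np.random.default_rng(seed)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)    # center on the unit circle
    theta = rng.uniform(0.0, 2.0 * np.pi, n)  # position on the small circle
    return np.exp(1j * phi) + r * np.exp(1j * theta)
```

By the triangle inequality, every synthesized entry has magnitude between 1 - r and 1 + r, matching the geometry illustrated in Figure 1.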
The entries of are drawn independently from , so that the Frobenius norm of is approximately . In the noisy setting, we generate complex white Gaussian noise , whose entries are drawn from . We define measurement signal-to-noise ratio (MSNR) and recovery signal-to-noise ratio (RSNR) as:
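The MSNR and RSNR formulas themselves did not survive extraction; assuming the standard definition (20·log10 of a ratio of Frobenius norms), they can be computed as in this sketch:

```python
import numpy as np

def snr_db(reference, perturbation):
    """SNR in dB: 20*log10(||reference||_F / ||perturbation||_F)."""
    return 20.0 * np.log10(np.linalg.norm(reference)
                           / np.linalg.norm(perturbation))

# MSNR compares the clean measurements against the additive noise;
# RSNR compares the true signal against the recovery error.
```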
We test the two approaches at four noise levels: , , , and , which roughly correspond to MSNR of , dB, dB, and dB. At these noise levels, we say the recovery is successful if the RSNR exceeds dB, dB, dB, dB, respectively. The success rates do not change dramatically as functions of these thresholds. In the experiments, we set , , and . For each , we repeat the experiments times and compute the empirical success rates, which are shown in Figure ?.
As seen in Figure , both power iteration and least squares achieve perfect recovery in the noiseless setting. However, as seen in Figures – , power iteration is clearly more robust against noise than least squares, whose performance degrades more severely in the noisy settings.
The empirical phase transitions of power iteration are shown in Figure ?. We fix and plot the phase transition with respect to and (Figure ); we then fix and plot the phase transition with respect to and (Figure ). Clearly, to achieve successful recovery, must scale linearly with , but can be small compared to and . This confirms the sample complexity in Theorem ?, of and . Careful readers may notice in Figure that for the success rates at are worse than those at . This seemingly peculiar phenomenon is caused by a small , which does not belong to the large number regime associated with a high probability.
6.2 Sparsity Case: Truncated Power Iteration vs. ℓ1 Minimization
In the sparsity case, we use the same setup described in the previous section, except for the signal . The supports of the sparse columns of are chosen uniformly at random, and the nonzero entries follow . This unstructured sparsity case is more challenging than the joint sparsity case in Theorem ?.
In Algorithm ?, we choose , and . In all the experiments, we assume that the sparsity level is known, and set for convenience. A more sophisticated scheme that decreases as the iteration number increases may lead to better empirical performance [11].
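The truncation step underlying truncated power iteration [11] keeps only the s largest-magnitude entries of the current iterate. A minimal sketch (the names `truncate` and `s` are ours, not the paper's notation):

```python
import numpy as np

def truncate(x, s):
    """Keep the s largest-magnitude entries of x and zero out the rest."""
    out = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-s:]   # indices of the s largest magnitudes
    out[keep] = x[keep]
    return out

# One step of truncated power iteration then reads:
#   x = truncate(A @ x, s); x /= np.linalg.norm(x)
```

The more sophisticated schedule mentioned above would simply pass a different, gradually decreasing `s` at each call.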
For the experiment we suppose that the phases in are available, and let
denote the initial estimate of , which is close to but different from the true , i.e., the entrywise inverse of in . See Figure 1 for an illustration of , , and . Then we initialize Algorithm ? with .
We compare Algorithm ? with an ℓ1-minimization approach. Wang and Chi [8] adopted an approach tailored for the case where is the DFT matrix and . They use a linear constraint to avoid the trivial solution of all zeros. For a fair comparison, we revise their approach to accommodate arbitrary and . The revised approach uses the alternating direction method of multipliers (ADMM) [58] to solve the following convex optimization problem:
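The revised constrained problem itself was lost in extraction; as an illustration of the ADMM machinery it uses, here is a sketch for the canonical basis-pursuit form min ‖x‖₁ s.t. Ax = b (cf. Boyd et al. [58], Sec. 6.2). The function name and parameters are ours:

```python
import numpy as np

def admm_basis_pursuit(A, b, rho=1.0, num_iters=1000):
    """ADMM for min ||x||_1 s.t. Ax = b: alternate a projection onto the
    affine constraint set with elementwise soft-thresholding."""
    n = A.shape[1]
    AAt_inv = np.linalg.inv(A @ A.T)   # cached for the affine projection
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    for _ in range(num_iters):
        v = z - u
        x = v - A.T @ (AAt_inv @ (A @ v - b))   # project onto {x : Ax = b}
        w = x + u
        z = np.sign(w) * np.maximum(np.abs(w) - 1.0 / rho, 0.0)  # shrink
        u = u + x - z                            # dual update
    return x

# Example: recover a 5-sparse vector from 40 Gaussian measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[rng.choice(100, 5, replace=False)] = rng.choice([-1.0, 1.0], 5)
x_hat = admm_basis_pursuit(A, A @ x_true)
```

Because the x-update is an exact projection, the returned iterate is always feasible; only the sparsity-inducing shrinkage needs to converge.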
Here, is the initial estimate of defined in , which is also used to initialize our Algorithm ? in this comparison.
We conduct numerical experiments with the same four noise levels and criterion for successful recovery as in Section 6.1. In the experiments, we set , , , and . For each , we repeat the experiments times and compute the empirical success rates, which are shown in Figure ?. In the noiseless case (Figure ), ℓ1 minimization achieves a slightly higher success rate near the phase transition. However, truncated power iteration is more robust against noise than ℓ1 minimization, which breaks down completely at the higher noise levels (Figures – ).
Figure clearly shows that truncated power iteration recovers successfully when , , and . This suggests that truncated power iteration may succeed when and are (up to log factors) on the order of and , respectively. However, while the scaling with the number of sensors agrees with Theorem ?, success with such a small number of snapshots is not guaranteed by our current theoretical analysis.
Next, we assume that only a subset of the phases are available, and examine to what extent Algorithm ? and ℓ1 minimization depend on a good initial estimate of . In the numerical results shown in Figure ?, we consider only the noiseless setting of BGPC with sparsity, and set . In Figures and , we replace and of with random phases, respectively, and use the resulting bad estimate in Algorithm ? and ℓ1 minimization. As seen in Figure ?, truncated power iteration is less dependent on an accurate initial estimate of .
We repeat the above experiments for the joint sparsity case, where we replace in Algorithm ? with . We also replace the ℓ1 norm in the competing approach with a mixed ℓ1/ℓ2 norm:
which is a well-known convex method for the recovery of jointly sparse signals. The results for different noise levels and for inaccurate are shown in Figures ? and ?, respectively. In the joint sparsity case, truncated power iteration is robust against noise, but seems less robust against errors in the initial phase estimate. We conjecture that the failure of Algorithm ? in the joint sparsity case is due to the restriction of . By projecting onto jointly sparse supports, the algorithm is likely to converge prematurely to an incorrect support. When compared to the results in Figures and , Figures and show that using instead of in the first half of the iterations indeed improves the performance of Algorithm ? in the joint sparsity case. In the rest of the experiments, we use during the first half of the iterations in Algorithm ? for the joint sparsity case.
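The projection onto jointly sparse supports discussed above selects entire rows of the signal matrix rather than individual entries; it keeps the rows with the largest ℓ2 norms. A sketch, with hypothetical names:

```python
import numpy as np

def truncate_rows(X, s):
    """Project onto jointly s-sparse supports: keep the s rows of X with
    the largest l2 norms and zero out the remaining rows."""
    row_norms = np.linalg.norm(X, axis=1)
    keep = np.argsort(row_norms)[-s:]   # indices of the s largest rows
    out = np.zeros_like(X)
    out[keep] = X[keep]
    return out
```

This is the structured counterpart of the entrywise truncation used in the unstructured sparsity case, and it is exactly the step we defer to the second half of the iterations in the joint sparsity experiments.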
Next, we plot the phase transitions for truncated power iteration. We fix and and plot the empirical phase transition with respect to and (sparsity case in Figure , and joint sparsity case in Figure ); we then fix and and plot the empirical phase transition with respect to and (sparsity case in Figure , and joint sparsity case in Figure ). It is seen that, to achieve successful recovery, must scale linearly with , but can be small compared to and . On the one hand, the scaling law in Theorem ? is confirmed by Figure ?; on the other hand, seems conservative and might be an artifact of our proof techniques. We have yet to come up with a theoretical guarantee that covers the more general sparsity case, or requires a less demanding sample complexity . In Figures and , the success rates at smaller are lower than those at a larger , because the number of sensors is too small to yield a high probability.
6.3 Sparsity Case: Initialization
In this section, we examine the quality of the initialization produced by Algorithm ? by comparing it with two different initializations: (i) the good initialization aided by side information on the phase in Section 6.2; and (ii) a baseline initialization . We use the same setting as in Section 6.2, except that . We let , and claim the recovery is successful if the RSNR exceeds dB. In the experiment for the joint sparsity case, for the reason mentioned in Section 6.2, we ignore the joint sparsity structure and estimate the support of different columns of independently in the initialization and during the first half of the iterations. Only in the second half of the iterations, we use the projection onto jointly sparse supports.
Figure ? shows that, although the initialization provided by Algorithm ? is not as good as the accurate initialization with side information, it is far better than the baseline. Figure ? shows the empirical phase transition with respect to and , when Algorithm ? is used to initialize truncated power iteration (sparsity case in Figure , and joint sparsity case in Figure ). The results suggest that when scales linearly with , Algorithm ? can provide a sufficiently good initialization for truncated power iteration. For example, in , the success rate is 1 when and . Therefore, the sample complexity in Theorem ? could be overly conservative and an artifact of our analysis.
6.4 Application: Inverse Rendering
In this section, we apply the power iteration algorithm to the inverse rendering problem in computational relighting: given images of an object under different lighting conditions (Figure ), and the surface normals of the object (Figure ), the goal is to recover the albedos (also known as reflection coefficients) of the object surface and the lighting conditions. In this problem, the columns of represent images under different lighting conditions, which are the products of the unknown albedo map and the intensity maps of incident light under different conditions . For Lambertian surfaces, it is reasonable to assume that the intensity of incident light resides in a subspace spanned by the first spherical harmonics computed from the surface normals [2], which we denote by the columns of . Then the columns of are the coordinates of the spherical harmonic expansion, which parameterize the lighting conditions. We can solve for and using Algorithm ?. Our approach is similar to that of Nguyen et al. [2], which also formulates inverse rendering as an eigenvector problem. Although the two approaches solve for the eigenvectors of different matrices, they yield identical solutions in the ideal scenario where the model is exact and the solution is unique.
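Under the Lambertian model above, each image column is the elementwise product of the albedo map with an intensity map lying in the span of the spherical harmonics. A minimal forward-model sketch (all names — `albedo`, `B`, `Lam` — are placeholders for the paper's lost symbols):

```python
import numpy as np

def render(albedo, B, Lam):
    """Forward model: column j of the output image matrix equals the albedo
    map multiplied elementwise by the intensity map B @ Lam[:, j], where the
    columns of B span the low-order spherical harmonics."""
    return albedo[:, None] * (B @ Lam)
```

Inverting this bilinear map, i.e., recovering `albedo` and `Lam` jointly from the rendered images and `B`, is precisely the BGPC problem that our power iteration algorithm solves.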
In our experiment, we obtain color images and the surface normals of an object under different lighting conditions,
7 Conclusion
We formulate the BGPC problem as an eigenvector problem, and propose to solve it via power iteration, or, in the presence of a sparsity structure, via truncated power iteration. We give theoretical guarantees for the subspace case with a near optimal sample complexity, and for the joint sparsity case with a suboptimal sample complexity. Numerical experiments show that both power iteration and truncated power iteration can recover the unknown gains and phases, and the unknown signal, using a near optimal number of samples. It remains an open problem to obtain theoretical guarantees with optimal sample complexities for truncated power iteration applied to BGPC with joint sparsity or sparsity constraints.
Footnotes
 An example of another ambiguity is a shift ambiguity when is the discrete Fourier transform matrix [5]. For a generic matrix , the solution to BGPC does not suffer from shift ambiguity.
In the noisy setting, one could replace the linear constraint with an ellipsoid constraint . However, the parameter needs to be adjusted with the noise level. For a fair comparison of robustness to noise, we use the linearly constrained ℓ1 minimization in the noisy setting (similar to [8]).
The images are downloaded from https://courses.cs.washington.edu/courses/csep576/05wi/projects/project3/project3.htm on September 16, 2017. The surface normals are computed using the method described in the same webpage.
References
Y. Li, K. Lee, and Y. Bresler, “Blind gain and phase calibration for low-dimensional or sparse signal sensing via power iteration,” in Int. Conf. Sampling Theory and Applications (SampTA), July 2017, pp. 119–123.
H. Q. Nguyen, S. Liu, and M. N. Do, “Subspace methods for computational relighting,” in Computational Imaging XI, C. A. Bouman, I. Pollak, and P. J. Wolfe, Eds., Feb 2013.
R. Morrison, M. Do, and D. C. Munson, Jr., “MCA: A multichannel approach to SAR autofocus,” IEEE Trans. Image Process., vol. 18, no. 4, pp. 840–853, Apr 2009.
A. Paulraj and T. Kailath, “Direction of arrival estimation by eigenstructure methods with unknown sensor gain and phase,” in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), vol. 10. IEEE, Apr 1985, pp. 640–643.
Y. Li, K. Lee, and Y. Bresler, “Identifiability in bilinear inverse problems with applications to subspace or sparsity-constrained blind gain and phase calibration,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 822–842, 2017.
 ——, “Optimal sample complexity for blind gain and phase calibration,” IEEE Trans. Signal Process., vol. 64, no. 21, pp. 5549–5556, Nov 2016.
S. Ling and T. Strohmer, “Self-calibration via linear least squares,” arXiv preprint arXiv:1611.04196, 2016.
 L. Wang and Y. Chi, “Blind deconvolution from multiple sparse inputs,” IEEE Signal Process. Lett., vol. 23, no. 10, pp. 1384–1388, Oct 2016.
G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. JHU Press, 1996.
B. Moghaddam, Y. Weiss, and S. Avidan, “Spectral bounds for sparse PCA: Exact and greedy algorithms,” in Advances in Neural Inform. Process. Syst. (NIPS 2006), Y. Weiss, P. B. Schölkopf, and J. C. Platt, Eds. MIT Press, 2006, pp. 915–922.
 X.T. Yuan and T. Zhang, “Truncated power method for sparse eigenvalue problems,” Journal of Machine Learning Research, vol. 14, no. 4, pp. 899–925, Apr 2013.
 Y. Rockah, H. Messer, and P. M. Schultheiss, “Localization performance of arrays subject to phase errors,” IEEE Trans. Aerosp. Electron. Syst., vol. 24, no. 4, pp. 402–410, Jul 1988.
A. J. Weiss and B. Friedlander, “Eigenstructure methods for direction finding with sensor gain and phase uncertainties,” Circuits, Syst. and Signal Process., vol. 9, no. 3, pp. 271–300, Sep 1990.
L. Balzano and R. Nowak, “Blind calibration of sensor networks,” in Proc. 6th Int. Conf. Inform. Process. in Sensor Networks (IPSN 2007). Association for Computing Machinery (ACM), 2007.
 J. Lipor and L. Balzano, “Robust blind calibration via total least squares,” in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), May 2014, pp. 4244–4248.
L. Tong, G. Xu, and T. Kailath, “A new approach to blind identification and equalization of multipath channels,” in Conf. Record of the Twenty-Fifth Asilomar Conf. Signals, Systems and Computers, Nov 1991, pp. 856–860, vol. 2.
E. Moulines, P. Duhamel, J. Cardoso, and S. Mayrargue, “Subspace methods for the blind identification of multichannel FIR filters,” IEEE Trans. Signal Process., vol. 43, no. 2, pp. 516–525, Feb 1995.
G. Harikumar and Y. Bresler, “FIR perfect signal reconstruction from multiple convolutions: minimum deconvolver orders,” IEEE Trans. Signal Process., vol. 46, no. 1, pp. 215–218, Jan 1998.
——, “Perfect blind restoration of images blurred by multiple filters: theory and efficient algorithms,” IEEE Trans. Image Process., vol. 8, no. 2, pp. 202–219, Feb 1999.
K. Lee, F. Krahmer, and J. Romberg, “Spectral methods for passive imaging: Non-asymptotic performance and robustness,” arXiv preprint arXiv:1708.04343, 2017.
 K. Lee, N. Tian, and J. Romberg, “Fast and guaranteed blind multichannel deconvolution under a bilinear system model,” arXiv preprint arXiv:1610.06469, 2016.
 C. Bilen, G. Puy, R. Gribonval, and L. Daudet, “Convex optimization approaches for blind sensor calibration using sparsity,” IEEE Trans. Signal Process., vol. 62, no. 18, pp. 4847–4856, Sep 2014.
Y. C. Eldar, W. Liao, and S. Tang, “Sensor calibration for off-the-grid spectral estimation,” arXiv preprint arXiv:1707.03378, 2017.
M. A. Davenport and J. Romberg, “An overview of low-rank matrix recovery from incomplete observations,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 608–622, Jun 2016.
 Y. Li, K. Lee, and Y. Bresler, “Optimal sample complexity for stable matrix recovery,” in Proc. Int. Symp. Inform. Theory (ISIT), Jul 2016, pp. 81–85.
M. Kech and F. Krahmer, “Optimal injectivity conditions for bilinear inverse problems with applications to identifiability of deconvolution problems,” SIAM J. Applied Algebra and Geometry, vol. 1, no. 1, pp. 20–37, 2017.
K. Lee, Y. Wu, and Y. Bresler, “Near optimal compressed sensing of sparse low-rank matrices via sparse power factorization,” arXiv preprint arXiv:1312.0525, 2013.
A. Ahmed, B. Recht, and J. Romberg, “Blind deconvolution using convex programming,” IEEE Trans. Inf. Theory, vol. 60, no. 3, pp. 1711–1732, Mar 2014.
S. Ling and T. Strohmer, “Self-calibration and biconvex compressive sensing,” Inverse Problems, vol. 31, no. 11, p. 115002, 2015.
 Y. Chi, “Guaranteed blind sparse spikes deconvolution via lifting and convex optimization,” IEEE J. Sel. Topics Signal Process., vol. 10, no. 4, pp. 782–794, Jun 2016.
K. Lee, Y. Li, M. Junge, and Y. Bresler, “Blind recovery of sparse signals from subsampled convolution,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 802–821, 2017.
X. Li, S. Ling, T. Strohmer, and K. Wei, “Rapid, robust, and reliable blind deconvolution via nonconvex optimization,” arXiv preprint arXiv:1606.04933, 2016.
S. Bahmani and J. Romberg, “Lifting for blind deconvolution in random mask imaging: Identifiability and convex relaxation,” SIAM J. Imaging Sci., vol. 8, no. 4, pp. 2203–2238, 2015.
G. Tang and B. Recht, “Convex blind deconvolution with random masks,” in Classical Optics 2014. Optical Society of America, 2014, p. CW4C.1.
 V. Cambareri and L. Jacques, “A nonconvex blind calibration method for randomised sensing strategies,” in Int. Workshop on Compressed Sensing Theory and its Applications to Radar, Sonar and Remote Sensing (CoSeRa), Sep 2016, pp. 16–20.
 V. Cambareri, A. Moshtaghpour, and L. Jacques, “A greedy blind calibration method for compressed sensing with unknown sensor gains,” in Proc. Int. Symp. Inform. Theory (ISIT), Jun 2017, pp. 1132–1136.
A. Ahmed and L. Demanet, “Leveraging diversity and sparsity in blind deconvolution,” arXiv preprint arXiv:1610.06098, 2016.
A. Cosse, “From blind deconvolution to blind super-resolution through convex programming,” arXiv preprint arXiv:1709.09279, 2017.
J. R. Fienup, “Phase retrieval algorithms: a comparison,” Appl. Opt., vol. 21, no. 15, pp. 2758–2769, Aug 1982.
E. J. Candès, T. Strohmer, and V. Voroninski, “Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming,” Commun. Pure Appl. Math., vol. 66, no. 8, pp. 1241–1274, 2013.
P. Netrapalli, P. Jain, and S. Sanghavi, “Phase retrieval using alternating minimization,” in Advances in Neural Inform. Process. Syst. (NIPS 2013), C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2013, pp. 2796–2804.
 E. J. Candès, X. Li, and M. Soltanolkotabi, “Phase retrieval via Wirtinger flow: Theory and algorithms,” IEEE Trans. Inf. Theory, vol. 61, no. 4, pp. 1985–2007, Apr 2015.
 T. T. Cai, X. Li, and Z. Ma, “Optimal rates of convergence for noisy sparse phase retrieval via thresholded Wirtinger flow,” Ann. Stat., vol. 44, no. 5, pp. 2221–2251, Oct 2016.
S. Bahmani and J. Romberg, “Phase retrieval meets statistical learning theory: A flexible convex relaxation,” arXiv preprint arXiv:1610.04210, 2016.
T. Goldstein and C. Studer, “Phasemax: Convex phase retrieval via basis pursuit,” arXiv preprint arXiv:1610.07531, 2016.
 Y. Shechtman, Y. C. Eldar, O. Cohen, H. N. Chapman, J. Miao, and M. Segev, “Phase retrieval with application to optical imaging: A contemporary overview,” IEEE Signal Process. Mag., vol. 32, no. 3, pp. 87–109, May 2015.
J. Sun, Q. Qu, and J. Wright, “When are nonconvex problems not scary?” arXiv preprint arXiv:1510.06096, 2015.
 R. H. Keshavan, A. Montanari, and S. Oh, “Matrix completion from a few entries,” IEEE Trans. Inf. Theory, vol. 56, no. 6, pp. 2980–2998, Jun 2010.
P. Jain, P. Netrapalli, and S. Sanghavi, “Low-rank matrix completion using alternating minimization,” in Proc. Ann. ACM Symp. Theory of Computing (STOC). ACM, 2013, pp. 665–674.
 R. Sun and Z. Q. Luo, “Guaranteed matrix completion via nonconvex factorization,” IEEE Trans. Inf. Theory, vol. 62, no. 11, pp. 6535–6579, Nov 2016.
 J. Sun, Q. Qu, and J. Wright, “Complete dictionary recovery over the sphere I: Overview and the geometric picture,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 853–884, Feb 2017.
——, “Complete dictionary recovery over the sphere II: Recovery by Riemannian trust-region method,” IEEE Trans. Inf. Theory, vol. 63, no. 2, pp. 885–914, Feb 2017.
 ——, “A geometric analysis of phase retrieval,” in Proc. Int. Symp. Inform. Theory (ISIT), Jul 2016, pp. 2379–2383.
 E. Candès and T. Tao, “Decoding by linear programming,” IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec 2005.
Q. Berthet and P. Rigollet, “Computational lower bounds for sparse PCA,” arXiv preprint arXiv:1304.0828, 2013.
 K. Jaganathan, S. Oymak, and B. Hassibi, “Sparse phase retrieval: Uniqueness guarantees and recovery algorithms,” IEEE Trans. Signal Process., vol. 65, no. 9, pp. 2402–2410, May 2017.
 C. Davis and W. M. Kahan, “The rotation of eigenvectors by a perturbation. III,” SIAM J. Numerical Anal., vol. 7, no. 1, pp. 1–46, Mar 1970.
S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends in Machine Learning, vol. 3, no. 1, pp. 1–122, 2010.
K. R. Davidson and S. J. Szarek, “Local operator theory, random matrices and Banach spaces,” in Handbook of the Geometry of Banach Spaces. Elsevier BV, 2001, pp. 317–366.
M. Rudelson and R. Vershynin, “Hanson-Wright inequality and sub-gaussian concentration,” Electronic Communications in Probability, vol. 18, no. 82, 2013.
J. A. Tropp, “User-friendly tail bounds for sums of random matrices,” Found. Comput. Math., vol. 12, no. 4, pp. 389–434, Aug 2011.
K. Lee and M. Junge, “RIP-like properties in subsampled blind deconvolution,” arXiv preprint arXiv:1511.06146, 2015.
F. Krahmer, S. Mendelson, and H. Rauhut, “Suprema of chaos processes and the restricted isometry property,” Communications on Pure and Applied Mathematics, vol. 67, no. 11, pp. 1877–1904, Jan 2014.
M. Ledoux and M. Talagrand, Probability in Banach Spaces: Isoperimetry and Processes. Springer Science & Business Media, 2013.
S. Artstein, V. Milman, and S. J. Szarek, “Duality of metric entropy,” Annals of Mathematics, vol. 159, no. 3, pp. 1313–1328, 2004.
B. Carl, “Inequalities of Bernstein-Jackson-type and the degree of compactness of operators in Banach spaces,” Annales de l’institut Fourier, vol. 35, no. 3, pp. 79–118, 1985.
 M. Junge and K. Lee, “Generalized notions of sparsity and restricted isometry property. part I: a unified framework,” In preparation, 2017.
S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing. Springer Basel AG, 2013.