Enhancing Pure-Pixel Identification Performance via Preconditioning
In this paper, we analyze different preconditionings designed to enhance the robustness of pure-pixel search algorithms, which are used for blind hyperspectral unmixing and are equivalent to near-separable nonnegative matrix factorization algorithms. Our analysis focuses on the successive projection algorithm (SPA), a simple, efficient and provably robust algorithm in the pure-pixel algorithm class. Recently, a provably robust preconditioning was proposed by Gillis and Vavasis (arXiv:1310.2273); it requires the resolution of a semidefinite program (SDP) to find the minimum volume ellipsoid enclosing the data points. Since solving the SDP to high precision can be time consuming, we generalize the robustness analysis to approximate solutions of the SDP, that is, solutions whose objective function values are some multiplicative factor away from the optimal value. It is shown that a high-accuracy solution is not crucial for robustness, which paves the way for faster preconditionings (e.g., based on first-order optimization methods). This first contribution also allows us to provide a robustness analysis for two other preconditionings. The first one is pre-whitening, which can be interpreted as an optimal solution of the same SDP with additional constraints. We analyze the robustness of pre-whitening, which allows us to characterize situations in which it performs competitively with the SDP-based preconditioning. The second one is based on SPA itself and can be interpreted as an optimal solution of a relaxation of the SDP. It is extremely fast while competing with the SDP-based preconditioning on several synthetic data sets.
Keywords. hyperspectral unmixing, pure-pixel search, preconditioning, pre-whitening, successive projection algorithm, near-separable NMF, robustness to noise, semidefinite programming
Given a hyperspectral image, blind hyperspectral unmixing (blind HU) aims at recovering the spectral signatures of the constitutive materials present in the image, called endmembers, along with their abundances in each pixel. Under the linear mixing model, the spectral signature of a pixel is equal to a linear combination of the spectral signatures of the endmembers, where the weights correspond to the abundances. More formally, letting $M \in \mathbb{R}^{m \times n}$ represent a hyperspectral image with $m$ wavelengths and $n$ pixels, we have, in the noiseless case,
$$M(:,j) \;=\; \sum_{k=1}^{r} W(:,k)\, H(k,j) \quad \text{for all } j,$$
where $M(:,j)$ is the spectral signature of the $j$th pixel, $W(:,k)$ the spectral signature of the $k$th endmember, and $H(k,j)$ is the abundance of the $k$th endmember in the $j$th pixel, so that $H \geq 0$ and $\sum_{k=1}^{r} H(k,j) = 1$ for all $j$ (abundance sum-to-one constraint). Note that blind HU is equivalent to nonnegative matrix factorization (NMF), which aims at finding the best possible factorization $M \approx WH$ of a nonnegative matrix $M$, where $W$ and $H$ are nonnegative matrices.
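As a concrete illustration, the linear mixing model above can be simulated in a few lines of numpy; the dimensions and random signatures below are ours, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 50, 4, 1000  # wavelengths, endmembers, pixels (illustrative sizes)

W = rng.random((m, r))                      # endmember spectral signatures (columns)
H = rng.dirichlet(np.ones(r), size=n).T     # abundances: H >= 0, columns sum to one
M = W @ H                                   # noiseless linear mixing model

# Each pixel is a convex combination of the endmember signatures.
assert np.allclose(H.sum(axis=0), 1.0)
assert H.min() >= 0
```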
In blind HU, the so-called pure-pixel assumption plays a significant role. It is defined as follows: if for each endmember there exists a pixel containing only that endmember, that is, if for all $k$ there exists $j$ such that $H(:,j) = e_k$ (the $k$th unit vector), then the pure-pixel assumption holds. In that case, the matrix $H$ has the following form:
$$H \;=\; \big[\, I_r \;\; H' \,\big]\, \Pi,$$
with $I_r$ the $r$-by-$r$ identity matrix, $H' \geq 0$, and $\Pi$ a permutation matrix. This implies that the columns of $M$ are convex combinations of the columns of $W$, and hence blind HU under the linear mixing model and the pure-pixel assumption reduces to identifying the vertices of the convex hull of the columns of $M$; see, e.g., [5, 18] and the references therein. This problem is known to be efficiently solvable. In the presence of noise, the problem becomes more difficult and several provably robust algorithms have been proposed recently; for example, the successive projection algorithm described in Section 1.1. Note that, in the NMF literature, the pure-pixel assumption is referred to as the separability assumption, and NMF under the separability assumption in the presence of noise is referred to as near-separable NMF; see the references therein. Therefore, in this paper, we will assume that the matrix $M$ corresponding to the hyperspectral image has the following form (in the noiseless case):
Assumption 1 (Separable Matrix).
The matrix $M$ is separable if $M = W \big[\, I_r \;\; H' \,\big] \Pi$, where $W$ has full column rank, $H' \geq 0$ with the sum of the entries of each column of $H'$ being at most one, that is, $\|H'(:,j)\|_1 \leq 1$ for all $j$, and $\Pi$ is a permutation matrix.
Note that we have relaxed the assumption $\sum_k H(k,j) = 1$ for all $j$ to $\sum_k H(k,j) \leq 1$ for all $j$; this allows, for example, for different illumination conditions among the pixels in the image.
1.1 Successive Projection Algorithm
The successive projection algorithm (SPA) is a simple but fast and robust pure-pixel search algorithm; see Alg. SPA.
At each step of the algorithm, the column of the input matrix with maximum $\ell_2$ norm is selected, and then the matrix is updated by projecting each column onto the orthogonal complement of the columns selected so far. SPA is extremely fast as it can be implemented in $O(mnr)$ operations. SPA was first introduced in the analytical chemistry literature, and is closely related to other algorithms such as the automatic target generation process (ATGP), successive simplex volume maximization (SVMAX) and vertex component analysis (VCA). What makes SPA particularly interesting is that it is provably robust against noise:
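A minimal numpy sketch of SPA as just described (the function name `spa` and the implementation details are ours; this is the textbook selection-and-projection loop, not the authors' reference code):

```python
import numpy as np

def spa(M, r):
    """Successive projection algorithm: return r column indices of M.

    At each step, pick the column with the largest l2 norm, then project
    all columns onto the orthogonal complement of the selected column.
    """
    R = M.astype(float).copy()
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        indices.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R = R - np.outer(u, u @ R)   # project out the selected direction
    return indices
```

On a noiseless separable matrix with a well-conditioned $W$, the selected indices are exactly the pure pixels.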
If a matrix $M$ satisfying Assumption 1 is premultiplied by a matrix $Q$, the product $QM$ still satisfies Assumption 1 with $W$ replaced by $QW$. Since pure-pixel search algorithms are sensitive to the conditioning of the matrix $W$, it is beneficial to find a matrix $Q$ that reduces the condition number $\kappa(QW)$. In particular, the robustness result of SPA (Th. 1) can be adapted to the case when the input matrix is premultiplied by a matrix $Q$:
Let $M = WH + N$, where $WH$ satisfies Assumption 1, $W$ has full column rank and $N$ is noise with $\max_j \|N(:,j)\|_2 \leq \epsilon$; and let $Q \in \mathbb{R}^{d \times m}$ ($d \geq r$). If $QW$ has full column rank, and
$$\epsilon\, \|Q\|_2 \;\leq\; O\!\left( \frac{\sigma_{\min}(QW)}{\sqrt{r}\; \kappa^2(QW)} \right),$$
then SPA applied to the matrix $QM$ identifies indices corresponding to the columns of $W$ up to error $O\!\left(\epsilon\, \|Q\|_2\, \kappa^2(QW)\right)$.
Note that Corollary 1 does not simply amount to replacing $W$ by $QW$ in Theorem 1 because the noise is also premultiplied by $Q$. Note also that, in view of Theorem 1, preconditioning is beneficial for any $Q$ such that $\kappa(QW) < \kappa(W)$.
1.2.1 SDP-based Preconditioning
Assume that $m = r$ (the problem can be reduced to this case using noise filtering; see Section 3). An optimal preconditioning would be $Q = W^{-1}$, so that $QW$ would be perfectly conditioned, that is, $\kappa(QW) = 1$. In particular, applying Corollary 1 with $Q = W^{-1}$ gives the following result: if $\epsilon \leq O\!\left(\frac{\sigma_{\min}(W)}{\sqrt{r}}\right)$, then SPA applied to the matrix $QM$ identifies indices corresponding to the columns of $W$ up to error $O(\epsilon\, \kappa(W))$. This is a significant improvement compared to Theorem 1, especially for the upper bound on the noise level: the factor $\kappa^2(W)$ disappears from the denominator. For hyperspectral images, $\kappa(W)$ can be rather large as spectral signatures often share similar patterns. Note that the bound on the noise level is essentially optimal for SPA, since $\epsilon \geq \sigma_{\min}(W)$ would allow the noise to make the matrix $W$ rank deficient.
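The effect of the ideal preconditioner $Q = W^{-1}$ is easy to verify numerically; the matrix $W$ below is a synthetic stand-in, chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
r = 5
W = rng.random((r, r)) + np.eye(r)   # square mixing matrix, m = r

Q = np.linalg.inv(W)                 # the ideal (but unknown) preconditioner
print(np.linalg.cond(W))             # typically noticeably larger than 1
print(np.linalg.cond(Q @ W))         # 1.0 up to rounding: perfectly conditioned
```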
Of course, $W^{-1}$ is unknown, otherwise the problem would be solved. However, it turns out that it is possible to compute $W^{-1}$ approximately (up to orthogonal transformations, which do not influence the conditioning), even in the presence of noise, using the minimum volume ellipsoid centered at the origin containing all columns of $M$. An ellipsoid $\mathcal{E}$ centered at the origin in $\mathbb{R}^r$ is described via a positive definite matrix $A \in \mathbb{S}^r_{++}$: $\mathcal{E} = \{ x \in \mathbb{R}^r \mid x^T A x \leq 1 \}$. The volume of $\mathcal{E}$ is equal to $\det(A)^{-1/2}$ times the volume of the unit ball in dimension $r$. Therefore, given a matrix $M$ of rank $r$, we can formulate the minimum volume ellipsoid centered at the origin and containing the columns $m_j$ of the matrix $M$ as follows:
$$\min_{A \in \mathbb{S}^r_{++}} \; -\log\det(A) \quad \text{such that} \quad m_j^T A\, m_j \leq 1 \;\text{ for } j = 1, \dots, n. \tag{1}$$
This problem is SDP representable [6, p.222]. It was shown by Gillis and Vavasis (arXiv:1310.2273) that (i) in the noiseless case (that is, $N = 0$), the optimal solution of (1) is given by $A^* = (WW^T)^{-1}$, and hence factoring $A^* = Q^T Q$ allows one to recover $W^{-1}$ (up to orthogonal transformations); and that (ii) in the noisy case, the optimal solution of (1) remains close to $(WW^T)^{-1}$ and hence leads to a good preconditioning for SPA. More precisely, the following robustness result was proved:
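Problem (1) need not be handed to an off-the-shelf SDP solver. In the spirit of the approximate solutions studied later in the paper, a simple multiplicative-update scheme (a Khachiyan-style first-order method; this sketch is ours, not the authors' algorithm) already produces a feasible, near-optimal ellipsoid:

```python
import numpy as np

def min_vol_ellipsoid_origin(M, tol=1e-3, max_iter=2000):
    """Approximate minimum-volume ellipsoid {x : x^T A x <= 1} centered at
    the origin containing the columns of M.

    Multiplicative-update (Khachiyan-style) scheme: maintain weights p on the
    data points; at optimality the leverages g_j = m_j^T S^{-1} m_j (with
    S = M diag(p) M^T) all equal r. Returns a feasible A, assuming the
    tolerance is reached within max_iter iterations.
    """
    r, n = M.shape
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        S = (M * p) @ M.T                                     # S = M diag(p) M^T
        g = np.einsum('ij,ji->i', M.T @ np.linalg.inv(S), M)  # g_j = m_j^T S^{-1} m_j
        if g.max() <= r * (1 + tol):                          # near-optimal
            break
        p *= g / r                                            # keeps sum(p) = 1
    return np.linalg.inv(r * (1 + tol) * S)                   # m_j^T A m_j <= 1 for all j
```

The returned matrix is only a modestly accurate solution, which is precisely the regime analyzed in Section 2.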
Theorem 2 (arXiv:1310.2273, Th. 3).
Let $M = WH + N$, where $WH$ satisfies Assumption 1 with $m = r$, $W$ has full column rank and $N$ is noise with $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon \leq O\!\left(\frac{\sigma_{\min}(W)}{\sqrt{r}}\right)$, then SDP-based preconditioned SPA identifies a subset of column indices of $M$ that approximates the columns of $W$ up to error $O(\epsilon\, \kappa(W))$.
In case $m > r$, it was proposed to first replace the data points by their projections onto the $r$-dimensional linear subspace obtained with the truncated SVD (that is, to use a linear dimensionality reduction technique for noise filtering; see also Section 3); see Alg. SDP-Prec.
Alg. SDP-Prec first requires the rank-$r$ truncated SVD, which can be computed in $O(mnr)$ operations.
It then requires the solution of the SDP (1), which has $O(r^2)$ variables and $n$ constraints; this is computationally demanding for standard interior-point methods when $n$ is large. However, effective active-set methods can be used to solve large-scale problems: in fact, one can keep only $O(r^2)$ constraints from (1) to obtain an equivalent problem, since the minimum volume ellipsoid is determined by at most $r(r+1)/2$ of the data points.
Note that Mizutani solves the same SDP, but for another purpose, namely to preprocess the input matrix by removing the columns that are not on the boundary of the minimum volume ellipsoid.
1.3 Motivation and Contribution of the Paper
The SDP-based preconditioning described in Section 1.2 is appealing in the sense that it builds an approximation of $W^{-1}$ whose error is provably bounded by the noise level. This is a somewhat ideal solution, but it can be computationally expensive to obtain since it requires the resolution of an SDP. Hence, a natural move is to consider computationally cheaper preconditioning alternatives.
The focus of this paper is on a theoretical analysis of the robustness to noise of several preconditionings. The contribution of this paper is threefold:
Let $M = WH + N$, where $WH$ satisfies Assumption 1 with $m = r$, $W$ has full rank, and $N$ is the noise and satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. Let also $A$ be a feasible solution of (1) whose objective function value is some multiplicative factor away from the optimal value, where $A^*$ denotes an optimal solution of (1). If $\epsilon$ is sufficiently small, then SPA applied to the matrix $QM$, where $Q^T Q = A$, identifies indices corresponding to the columns of $W$, up to an error proportional to $\epsilon$.
The above result suggests that any good approximate solution of (1) provides a reasonable preconditioning. This gives a theoretical motivation for developing less accurate but faster solvers for (1); for example, one could use a first-order method such as a proximal point algorithm. We should mention that this paper focuses on the theoretical analysis only; developing fast solvers is a different subject and is left as a future research topic.
In Section 3, we analyze the robustness of pre-whitening, a standard preconditioning technique in blind source separation. We try to understand under which conditions pre-whitening can be as good as the SDP-based preconditioning; in fact, pre-whitening was shown to perform very similarly to the SDP-based preconditioning on some synthetic data sets. Pre-whitening corresponds to a solution of (1) with additional constraints, and hence the robustness of pre-whitened SPA follows from the result above (Section 3.2). However, this result is not tight, and we provide a tight robustness analysis of pre-whitening (Section 3.3). We also provide a robustness analysis of pre-whitening under a standard generative model (Section 3.4).
In Section 4, we analyze a preconditioning based on SPA itself. The idea was proposed in earlier work, where the resulting method was found to be extremely fast; as opposed to pre-whitening, it performs perfectly in the noiseless case and is not affected by the abundances of the different endmembers. Moreover, we are able to improve the theoretical bound on the noise level tolerated by SPA using this preconditioning.
Finally, in Section 5, we illustrate these results on synthetic data sets. In particular, we show that pre-whitening and the SPA-based preconditioning perform competitively with the SDP-based preconditioning while exhibiting much better runtime performance in practice.
2 Analysis of Approximate SDP Preconditioning
In this section, we analyze the effect of using an approximate solution of (1), instead of the optimal one, for preconditioning the matrix $M$. We will say that $A$ is an $\eta$-approximate solution of (1) for some $\eta \geq 1$ if $A$ is a feasible solution of (1), that is, $A \succ 0$ and $m_j^T A\, m_j \leq 1$ for all $j$, and
$$\det(A)^{-1/2} \;\leq\; \eta\, \det(A^*)^{-1/2}, \tag{2}$$
where $A^*$ is the optimal solution of (1). Letting $A = Q^T Q$, analyzing the effect of $Q$ as a preconditioning reduces to showing that $QW$ is well conditioned, that is, to upper bounding $\kappa(QW)$, which is equivalent to bounding $\kappa(W^T A\, W)$ since $\kappa(W^T A\, W) = \kappa\big((QW)^T(QW)\big) = \kappa^2(QW)$. In fact, if $\kappa(QW)$ can be bounded, the robustness of preconditioned SPA follows from Corollary 1.
Since $\kappa(W^T A\, W) = \kappa^2(QW)$, our goal of bounding $\kappa(QW)$ is equivalent to bounding the largest and smallest eigenvalues of $W^T A\, W$. It was shown in earlier work that, for a sufficiently small noise level $\epsilon$, $W^T A^* W$ is well conditioned, which implies the robustness of SDP-preconditioned SPA; see Theorem 2. In this section, we analyze (2) directly and will use the following assumption:
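The two defining conditions of an approximate solution can be checked directly in code; the volume-ratio reading of the multiplicative factor below is our assumption about the normalization, and the function name is ours:

```python
import numpy as np

def is_approx_solution(A, A_star, M, eta):
    """Check whether A is an eta-approximate solution of (1): A must be
    feasible (positive definite with m_j^T A m_j <= 1 for every column m_j
    of M), and its ellipsoid volume must be within a factor eta of the
    optimal one, i.e., det(A)^{-1/2} <= eta * det(A_star)^{-1/2}.
    """
    if not np.all(np.linalg.eigvalsh(A) > 0):
        return False
    feasible = np.all(np.einsum('ij,jk,ki->i', M.T, A, M) <= 1 + 1e-9)
    vol_ratio = np.sqrt(np.linalg.det(A_star) / np.linalg.det(A))
    return bool(feasible and vol_ratio <= eta)
```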
It is interesting to notice that the case $r = 1$ is trivial, since $W^T A\, W$ is then a scalar and hence has condition number one (in fact, all columns of $M$ are multiples of the unique column of $W$). Otherwise, the smallest eigenvalue of $W^T A\, W$ can be bounded from below, so we only need to provide an upper bound for its largest eigenvalue. The steps of the proof are the following:
The required lower and upper bounds follow directly from results in prior work.
See Lemma 2 and proof of Lemma 3 in . ∎
The optimal value
where and is given by
The proof is given in Appendix A. ∎
By Lemma 1, we have
where the second inequality above is obtained by the facts that is nonincreasing in and its limit is given by
and that . Finally, since the function is increasing for and for all , we have
2.2 Robustness of SPA Preconditioned with an Approximate SDP Solution
The upper bound on $\kappa(W^T A\, W)$ (Theorem 3) proves that the preconditioning generates a well-conditioned near-separable matrix for $\eta$ sufficiently close to one. Hence any good approximate solution of (1) allows us to obtain more robust near-separable NMF algorithms. In particular, we have the following result for SPA.
Using for all , and by Theorem 3 for which we need to assume that , we further obtain
Applying the above bound in Corollary 1 leads to the desired result: the error is proportional to $\epsilon$. ∎
Theorem 4 shows that the SDP-based preconditioning does not require a high-accuracy solution of the SDP: compared to the optimal solution, any $\eta$-approximate solution with $\eta$ close to one essentially does not change the upper bound on the noise level, while it increases the error by a factor smaller than three. This paves the way to the design of much faster preconditionings, solving (1) only approximately; this is a topic for further research. Moreover, this analysis will allow us to better understand two other preconditionings: pre-whitening (Section 3) and the SPA-based preconditioning (Section 4).
Noise filtering and pre-whitening are standard techniques in blind source separation; they were used as a preconditioning for SPA in earlier work. In this section, in light of the results from the previous section, we analyze pre-whitening as a preconditioning. We bound the condition number of $QW$, where $Q$ is the preconditioner obtained from pre-whitening. In the worst case, $\kappa(QW)$ can be large, as it depends on the number of pixels $n$. However, under a standard generative model, pre-whitening can be shown to be much more robust.
Let $M_r = U_r \Sigma_r V_r^T$ be the rank-$r$ truncated SVD of $M$, so that $M_r$ is the best rank-$r$ approximation of $M$ with respect to the Frobenius norm. Assuming that the noiseless data matrix lives in an $r$-dimensional linear subspace and that the noise is Gaussian, replacing $M$ with $M_r$ allows noise filtering. Given $M_r = U_r \Sigma_r V_r^T$, pre-whitening amounts to keeping only the matrix $V_r^T$. Equivalently, it amounts to premultiplying $M$ (or $M_r$) with $\Sigma_r^{-1} U_r^T$ since
$$\Sigma_r^{-1} U_r^T M_r \;=\; V_r^T,$$
where $M = U \Sigma V^T$ is the full SVD of $M$ (recall that $\Sigma_r$ contains the $r$ largest singular values of $M$, and that the columns of $U$ and $V$ are orthonormal); see Alg. NF-PW.
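Alg. NF-PW can be sketched in a few lines of numpy (the function name `nf_pw` is ours):

```python
import numpy as np

def nf_pw(M, r):
    """Noise filtering + pre-whitening (sketch of Alg. NF-PW).

    The rank-r truncated SVD M_r = U_r S_r V_r^T filters the noise, and
    premultiplying by Q = S_r^{-1} U_r^T whitens the data: Q @ M_r = V_r^T.
    Returns the pre-whitened data and the preconditioner Q.
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    Ur, sr = U[:, :r], s[:r]
    Q = Ur.T / sr[:, None]   # Q = S_r^{-1} U_r^T
    return Q @ M, Q
```

When $M$ has exact rank $r$, the rows of the pre-whitened matrix are orthonormal.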
3.2 Link with the SDP-based Preconditioning
For simplicity, let us consider the case $m = r$ (or, equivalently, assume that noise filtering has already been applied to the data matrix). In that case, the preconditioner given by pre-whitening is
$$Q \;=\; (M M^T)^{-1/2},$$
where $B^{-1/2}$ denotes the inverse of the square root of a positive definite matrix $B$, that is, $B^{-1/2} B^{-1/2} = B^{-1}$ (which is unique up to orthogonal transformations). In fact, for $M = U \Sigma V^T$, $MM^T = U \Sigma^2 U^T$, hence $(MM^T)^{-1/2} = U \Sigma^{-1} U^T$, which coincides with $\Sigma^{-1} U^T$ up to an orthogonal transformation. We have the following well-known result:
Let $M \in \mathbb{R}^{r \times n}$ be of rank $r$. Then $A_{pw} = \frac{n}{r} (MM^T)^{-1}$ is the optimal solution of
$$\min_{A \in \mathbb{S}^r_{++}} \; -\log\det(A) \quad \text{such that} \quad \sum_{j=1}^n m_j^T A\, m_j \leq n.$$
Moreover, $(MM^T)^{-1}$ satisfies $m_j^T (MM^T)^{-1} m_j \leq 1$ for all $j$, that is, it is feasible for (1).
Observing that the objective can equivalently be replaced with $\log\det(A^{-1})$, and that the constraint will be active at optimality (otherwise $A$ can be multiplied by a scalar larger than one, which decreases the objective), any optimal solution has to satisfy the following first-order optimality conditions:
$$A^{-1} \;=\; \lambda \sum_{j=1}^n m_j m_j^T \;=\; \lambda\, M M^T,$$
where $\lambda$ is the Lagrangian multiplier. This implies that $A = \lambda^{-1} (MM^T)^{-1}$. We have
$$\sum_{j=1}^n m_j^T A\, m_j \;=\; \operatorname{trace}(A\, MM^T) \;=\; \lambda^{-1} r.$$
The above equation, together with the condition $\sum_j m_j^T A\, m_j = n$, implies $\lambda = r/n$, which proves $A_{pw} = \frac{n}{r}(MM^T)^{-1}$.
Denoting $M = U \Sigma V^T$ the compact SVD of $M$ ($U, \Sigma \in \mathbb{R}^{r \times r}$, $V \in \mathbb{R}^{n \times r}$), we obtain $m_j^T (MM^T)^{-1} m_j = \|V(j,:)\|_2^2 \leq 1$, since the columns of $V$ are orthonormal and hence its rows have norm at most one. ∎
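The claims above are easy to verify numerically on random data. The sketch below assumes the trace-constrained formulation stated in the lemma: the candidate $\frac{n}{r}(MM^T)^{-1}$ activates the constraint and maximizes the determinant among feasible matrices, and the unscaled $(MM^T)^{-1}$ has all leverage scores at most one:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 4, 50
M = rng.random((r, n))          # rank-r data (almost surely)
S = M @ M.T

A_pw = (n / r) * np.linalg.inv(S)          # candidate optimum of the relaxed problem
assert np.isclose(np.trace(A_pw @ S), n)   # the relaxed constraint is active

# Any positive definite matrix rescaled so that trace(A S) = n has a smaller
# determinant (larger ellipsoid volume) than A_pw.
for _ in range(100):
    B = rng.random((r, r))
    B = B @ B.T + np.eye(r)                # random positive definite matrix
    B *= n / np.trace(B @ S)               # rescale to activate the constraint
    assert np.linalg.det(B) <= np.linalg.det(A_pw) * (1 + 1e-9)

# The unscaled (M M^T)^{-1} is feasible for (1): all leverage scores <= 1.
lev = np.einsum('ij,jk,ki->i', M.T, np.linalg.inv(S), M)
assert lev.max() <= 1 + 1e-9
```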
A robustness analysis of pre-whitening follows directly from Theorem 4. In fact, by Lemma 4, the matrix $(MM^T)^{-1}$ is a feasible solution of the minimum volume ellipsoid problem (1). Moreover, it is optimal up to a factor depending on $n/r$: by Lemma 4, the optimal solution of
$$\min_{A \in \mathbb{S}^r_{++}} \; -\log\det(A) \quad \text{such that} \quad \sum_{j=1}^n m_j^T A\, m_j \leq n$$
is given by $\frac{n}{r}(MM^T)^{-1}$, while the optimal solution $A^*$ of (1) is a feasible solution of this problem (since $\sum_j m_j^T A^* m_j \leq n$), so that
$$\det\!\left( \tfrac{n}{r}\, (MM^T)^{-1} \right) \;\geq\; \det(A^*).$$
Let $M = WH + N$, where $WH$ satisfies Assumption 1 with $m = r$, $W$ is of full rank and the noise satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon$ is sufficiently small (with an upper bound that deteriorates as $n/r$ grows), pre-whitened SPA identifies indices corresponding to the columns of $W$, up to an error that also grows with $n/r$.
This bound is rather bad, as $n$ is often large compared to $r$; for the hyperspectral unmixing application, $n$ is typically several orders of magnitude larger than $r$. In the next subsections, we provide a tight robustness analysis of pre-whitening, and analyze pre-whitening under a standard generative model.
3.3 Tight Robustness Analysis
In this subsection, we provide a better robustness analysis of pre-whitening. More precisely, we provide a tight upper bound for the condition number of the pre-whitened matrix. As before, we only consider the case $m = r$. Under Assumption 1, we have
$$M \;=\; W \tilde{H} + N,$$
where we denote $\tilde{H} = [\, I_r \;\; H' \,]\, \Pi$. Recall that the preconditioner given by pre-whitening is $Q = (MM^T)^{-1/2}$. Hence, the condition number of $QW$ will be equal to the square root of the condition number of $W^T (MM^T)^{-1} W$. In fact,
$$(QW)^T (QW) \;=\; W^T (MM^T)^{-1} W,$$
so that $\kappa^2(QW) = \kappa\big(W^T (MM^T)^{-1} W\big)$. Therefore, to provide a robustness analysis of pre-whitening, it is sufficient to bound $\kappa\big(W^T (MM^T)^{-1} W\big)$. In the next lemma, we bound the singular values of $\tilde{H}$, which will lead to an error bound that is much smaller than the one derived in the previous subsection.
Let $H'$ be a nonnegative matrix with $\|H'(:,j)\|_1 \leq 1$ for all $j$, and let $\tilde{H} = [\, I_r \;\; H' \,]\, \Pi$. Then,
Let us first show that
where or , and note that
Using , one easily gets
Also, we have
since . Finally, using the singular value perturbation theorem (Weyl; see, e.g., ), we have
for or , and since , we obtain
This bound allows us to provide a robustness analysis of pre-whitening.
Let $M = WH + N$, where $WH$ satisfies Assumption 1 with $m = r$, $W$ has full rank and the noise satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon$ is sufficiently small, pre-whitened SPA identifies the columns of $W$ up to an error proportional to $\epsilon$.
The bounds of Theorem 5 are tight. In fact, if, except for the pure pixels, all pixels contain the same endmember, say the $k$th, then all columns of $H'$ are equal to the $k$th column of the identity matrix, that is, $H'(:,j) = e_k$ for all $j$. Therefore,
$$\kappa\big(\tilde{H}\tilde{H}^T\big) \;=\; n - r + 1,$$
since $\tilde{H}\tilde{H}^T = I_r + (n-r)\, e_k e_k^T$ is a diagonal matrix with $(\tilde{H}\tilde{H}^T)_{kk} = n - r + 1$ and all other diagonal entries equal to one.
This indicates that pre-whitening should perform worst when one endmember dominates most pixels, as this matches the upper bound of Theorem 5. However, if the pixels are relatively well spread within the convex hull of the endmembers, then pre-whitening may perform well. This will be proven in the next subsection, wherein the robustness of pre-whitening under a standard generative model is analyzed.
3.4 Robustness under a Standard Generative Model
We continue our analysis by considering a standard generative model in hyperspectral unmixing. We again consider $m = r$, and the generative model is described as follows.
The near-separable matrix $M = WH + N$ is such that
$W$ is of full rank.
$h_j = H(:,j)$ is i.i.d. following a Dirichlet distribution with parameter $\gamma \in \mathbb{R}^r_{++}$, for all $j$.
$n_j = N(:,j)$ is i.i.d. with mean zero and covariance $\sigma^2 I_r$, for all $j$ (Gaussian noise).
The number of samples goes to infinity, that is, $n \to \infty$.
Note that assumption (ii), which models the abundances as being Dirichlet distributed, is a popular assumption in the HU context. In particular, the parameter $\gamma$ characterizes how the pixels are spread. To describe this, let us consider the simplified case where $\gamma = \alpha \mathbf{1}$ for some $\alpha > 0$, i.e., a symmetric Dirichlet distribution. We have the following phenomena: if $\alpha = 1$, then the $h_j$'s are uniformly distributed over the unit simplex; as $\alpha < 1$ decreases, the $h_j$'s concentrate around the vertices (or pure pixels) of the simplex; as $\alpha > 1$ increases, the $h_j$'s concentrate around the center of the simplex. In fact, $\alpha \to 0$ means that the $h_j$'s contain only pure pixels, in the same proportions. It should also be noted that we do not assume the separability or pure-pixel assumption, although the latter is implicitly implied by Assumption 2: under assumptions (ii) and (iv), for every endmember there exist pixels that are arbitrarily close to the corresponding pure pixel with probability one.
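The effect of the symmetric Dirichlet parameter on pixel spread can be illustrated with a quick Monte Carlo experiment (the threshold 0.95 and the sample sizes below are ours, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 3, 100000

# Small alpha pushes mass toward the simplex vertices (near-pure pixels),
# large alpha toward its center; alpha = 1 is uniform over the simplex.
for alpha in (0.1, 1.0, 10.0):
    H = rng.dirichlet(alpha * np.ones(r), size=n)
    frac_near_pure = np.mean(H.max(axis=1) > 0.95)
    print(alpha, frac_near_pure)   # fraction of near-pure pixels shrinks as alpha grows
```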
Now, our task is to prove a bound on $\kappa(QW)$ under the above statistical assumptions, thereby obtaining implications on how pre-whitened SPA may perform with respect to the abundances' distribution (rather than in the worst-case scenario). To proceed, we formulate the pre-whitening preconditioner as
$$Q \;=\; \left( \tfrac{1}{n}\, M M^T \right)^{-1/2}.$$
For $n \to \infty$, we have
$$\tfrac{1}{n}\, M M^T \;\longrightarrow\; \mathbb{E}\big[ m_j m_j^T \big].$$
Also, under Assumption 2, the above correlation matrix can be shown to be
$$\mathbb{E}\big[ m_j m_j^T \big] \;=\; W\, \mathbb{E}\big[ h_j h_j^T \big]\, W^T + \sigma^2 I_r.$$
We have the following lemma.
Under Assumption 2, the matrix $R = \mathbb{E}[h_j h_j^T]$ is given by
$$R \;=\; \frac{\operatorname{diag}(\gamma) + \gamma \gamma^T}{\gamma_0 (\gamma_0 + 1)},$$
where $\gamma_0 = \sum_{k=1}^r \gamma_k$ and $\operatorname{diag}(\gamma)$ is the diagonal matrix with $\gamma$ on its diagonal. Also, the largest and smallest eigenvalues of $R$ are bounded by
$$\lambda_{\max}(R) \;\leq\; \frac{\max_k \gamma_k + \|\gamma\|_2^2}{\gamma_0(\gamma_0+1)}, \qquad \lambda_{\min}(R) \;\geq\; \frac{\min_k \gamma_k}{\gamma_0(\gamma_0+1)}.$$
It is known that for a random vector