Enhancing Pure-Pixel Identification Performance via Preconditioning
Abstract
In this paper, we analyze different preconditionings designed to enhance the robustness of pure-pixel search algorithms, which are used for blind hyperspectral unmixing and are equivalent to near-separable nonnegative matrix factorization algorithms. Our analysis focuses on the successive projection algorithm (SPA), a simple, efficient and provably robust algorithm in the pure-pixel algorithm class. Recently, a provably robust preconditioning was proposed by Gillis and Vavasis (arXiv:1310.2273); it requires the resolution of a semidefinite program (SDP) to find the minimum volume ellipsoid enclosing the data points. Since solving the SDP to high precision can be time consuming, we generalize the robustness analysis to approximate solutions of the SDP, that is, solutions whose objective function values are within some multiplicative factor of the optimal value. It is shown that a highly accurate solution is not crucial for robustness, which paves the way for faster preconditionings (e.g., based on first-order optimization methods). This first contribution also allows us to provide a robustness analysis for two other preconditionings. The first one is pre-whitening, which can be interpreted as an optimal solution of the same SDP with additional constraints. Our analysis of the robustness of pre-whitening allows us to characterize situations in which it performs competitively with the SDP-based preconditioning. The second one is based on SPA itself and can be interpreted as an optimal solution of a relaxation of the SDP. It is extremely fast while competing with the SDP-based preconditioning on several synthetic data sets.
Keywords. hyperspectral unmixing, pure-pixel search, preconditioning, pre-whitening, successive projection algorithm, near-separable NMF, robustness to noise, semidefinite programming
1 Introduction
Given a hyperspectral image, blind hyperspectral unmixing (blind HU) aims at recovering the spectral signatures of the constitutive materials present in the image, called endmembers, along with their abundances in each pixel. Under the linear mixing model, the spectral signature of a pixel is equal to a linear combination of the spectral signatures of the endmembers where the weights correspond to the abundances. More formally, letting $M \in \mathbb{R}^{m \times n}$ represent a hyperspectral image with $m$ wavelengths and $n$ pixels, we have, in the noiseless case,
$$ M(:,j) \;=\; \sum_{k=1}^{r} W(:,k)\, H(k,j) \quad \text{for all } j, $$
where $M(:,j)$ is the spectral signature of the $j$th pixel, $W(:,k)$ the spectral signature of the $k$th endmember, and $H(k,j)$ is the abundance of the $k$th endmember in the $j$th pixel so that $H(k,j) \geq 0$ and $\sum_{k=1}^{r} H(k,j) = 1$ for all $j$ (abundance sum-to-one constraint). Note that blind HU is equivalent to nonnegative matrix factorization (NMF) which aims at finding the best possible factorization $M \approx WH$ of a nonnegative matrix $M$ where $W$ and $H$ are nonnegative matrices.
In blind HU, the so-called pure-pixel assumption plays a significant role. It is defined as follows. If for each endmember there exists a pixel containing only that endmember, that is, if for all $k$ there exists $j$ such that $M(:,j) = W(:,k)$, then the pure-pixel assumption holds. In that case, the matrix $M$ has the following form:
$$ M \;=\; W\,[\, I_r, \; H' \,]\,\Pi, $$
with $H' \geq 0$ and $\Pi$ being a permutation. This implies that the columns of $M$ are convex combinations of the columns of $W$, and hence blind HU under the linear mixing model and the pure-pixel assumption reduces to identifying the vertices of the convex hull of the columns of $M$; see, e.g., [5, 18] and the references therein. This problem is known to be efficiently solvable [3]. In the presence of noise, the problem becomes more difficult and several provably robust algorithms have been proposed recently; for example the successive projection algorithm to be described in Section 1.1. Note that, in the NMF literature, the pure-pixel assumption is referred to as the separability assumption [3] and NMF under the separability assumption in the presence of noise is referred to as near-separable NMF; see, e.g., [10] and the references therein. Therefore, in this paper, we will assume that the matrix corresponding to the hyperspectral image has the following form (in the noiseless case):
Assumption 1 (Separable Matrix).
The matrix $M \in \mathbb{R}^{m \times n}$ is separable if $M = W\,[\, I_r, \; H' \,]\,\Pi$ where $W \in \mathbb{R}^{m \times r}$, $H' \geq 0$ with the sum of the entries of each column of $H'$ being at most one, that is, $\|H'(:,j)\|_1 \leq 1$ for all $j$, and $\Pi$ is a permutation.
Note that we have relaxed the assumption $\sum_k H(k,j) = 1$ for all $j$ to $\sum_k H(k,j) \leq 1$ for all $j$; this allows for example different illumination conditions among the pixels in the image.
1.1 Successive Projection Algorithm
The successive projection algorithm (SPA) is a simple but fast and robust pure-pixel search algorithm; see Alg. SPA.
At each step of the algorithm, the column of the input matrix with maximum $\ell_2$ norm is selected, and then the matrix is updated by projecting each column onto the orthogonal complement of the columns selected so far. SPA is extremely fast as it can be implemented in $O(mnr)$ operations [13]. SPA was first introduced in [1], and is closely related to other algorithms such as the automatic target generation process (ATGP), successive simplex volume maximization (SVMAX) and vertex component analysis (VCA); see the discussion in [18]. What makes SPA particularly interesting is that it is provably robust against noise (Theorem 1, [13]).
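As an illustration, here is a minimal numpy sketch of the selection-and-projection loop described above (a simplified rendition for intuition, not the exact implementation analyzed in this paper):

```python
import numpy as np

def spa(M, r):
    """Minimal sketch of the successive projection algorithm:
    repeatedly pick the column with the largest l2 norm, then
    project all columns onto the orthogonal complement of the
    selected column."""
    R = np.array(M, dtype=float)
    indices = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        indices.append(j)
        u = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(u, u @ R)  # orthogonal projection step
    return indices
```

On a noiseless separable matrix $M = W[I_r, H']$, the selected indices correspond to the pure pixels, i.e., the columns of $W$.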
1.2 Preconditioning
If a matrix $M$ satisfying Assumption 1 is premultiplied by a nonsingular matrix $Q$, it still satisfies Assumption 1 with $W$ replaced by $QW$. Since pure-pixel search algorithms are sensitive to the conditioning of the matrix $W$, it would be beneficial to find a matrix $Q$ that reduces the condition number of $QW$ compared to that of $W$. In particular, the robustness result of SPA (Theorem 1) can be adapted when the input matrix is premultiplied by a matrix $Q$:
Corollary 1.
Let $\tilde{M} = M + N$ where $M$ satisfies Assumption 1, $W$ has full column rank and $N$ is noise with $\max_j \|N(:,j)\|_2 \leq \epsilon$, and let $Q$ be a nonsingular matrix. If $QW$ has full column rank, and $\epsilon$ is sufficiently small (satisfying the bound of Theorem 1 applied to $QW$ and the premultiplied noise $QN$),
then SPA applied on the matrix $Q\tilde{M}$ identifies indices corresponding to the columns of $W$ up to an error proportional to $\epsilon$ and to the conditioning of $QW$.
Proof.
Note that Corollary 1 does not simply amount to replacing $W$ by $QW$ in Theorem 1 because the noise $N$ is also premultiplied by $Q$. Note also that, in view of Theorem 1, preconditioning is beneficial for any $Q$ such that $\kappa(QW) < \kappa(W)$, where $\kappa(\cdot)$ denotes the condition number.
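To make the benefit concrete, here is a small numerical illustration (with a hypothetical ill-conditioned $W$, chosen for this example only): when $W$ is known, the preconditioner $Q = (WW^T)^{-1/2}$ equals $W^{-1}$ up to an orthogonal transformation and makes $QW$ perfectly conditioned.

```python
import numpy as np

# Hypothetical ill-conditioned endmember matrix (m = r = 3):
# spectral signatures sharing similar patterns.
W = np.array([[1.0, 0.9, 0.8],
              [0.9, 1.0, 0.9],
              [0.8, 0.9, 1.0]])

# Ideal preconditioner Q = (W W^T)^{-1/2}: then (QW)(QW)^T = I,
# so all singular values of QW are equal to one.
U, s, _ = np.linalg.svd(W @ W.T)
Q = U @ np.diag(s ** -0.5) @ U.T

print(np.linalg.cond(W))      # large
print(np.linalg.cond(Q @ W))  # 1 (up to rounding)
```

In practice $W$ is of course unknown; the point of the preconditionings studied below is to approximate such a $Q$ from the data.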
1.2.1 SDP-based Preconditioning
Assume that $m = r$ (the problem can be reduced to this case using noise filtering; see Section 3). An optimal preconditioning would be $Q = W^{-1}$ so that $QW$ would be perfectly conditioned, that is, $\kappa(QW) = 1$. In particular, applying Corollary 1 with $Q = W^{-1}$ gives the following result: if the noise level $\epsilon$ is sufficiently small, then SPA applied on the matrix $Q\tilde{M}$ identifies indices corresponding to the columns of $W$ up to an error proportional to $\epsilon$. This is a significant improvement compared to Theorem 1, especially for the upper bound on the noise level: the term $\kappa(W)$ disappears from the denominator. For hyperspectral images, $\kappa(W)$ can be rather large as spectral signatures often share similar patterns. Note that the bound on the noise level is essentially optimal for SPA since noise of the order of $\sigma_r(W)$ would allow the noise to make the matrix $W$ rank deficient [12].
Of course, $W$ is unknown, otherwise the problem would be solved. However, it turns out that it is possible to compute $W$ approximately (up to orthogonal transformations, which do not influence the conditioning) even in the presence of noise, using the minimum volume ellipsoid centered at the origin containing all columns of $\tilde{M}$ [12]. An ellipsoid $\mathcal{E}$ centered at the origin in $\mathbb{R}^r$ is described via a positive definite matrix $A$: $\mathcal{E} = \{ x \in \mathbb{R}^r \mid x^T A x \leq 1 \}$. The volume of $\mathcal{E}$ is equal to $\det(A)^{-1/2}$ times the volume of the unit ball in dimension $r$. Therefore, given a matrix $\tilde{M}$ of rank $r$, we can formulate the minimum volume ellipsoid centered at the origin and containing the columns $\tilde{m}_j$ of the matrix $\tilde{M}$ as follows:
$$ \min_{A \succ 0} \; -\log\det(A) \quad \text{such that} \quad \tilde{m}_j^T A\, \tilde{m}_j \leq 1 \;\; \text{for all } j. \tag{1} $$
This problem is SDP representable [6, p.222]. It was shown in [12] that (i) in the noiseless case (that is, $\tilde{M} = M$), the optimal solution of (1) is given by $A^* = (WW^T)^{-1}$ and hence factoring $A^*$ allows one to recover $W^{-1}$ (up to orthogonal transformations); and that (ii) in the noisy case, the optimal solution of (1) is close to $(WW^T)^{-1}$ and hence leads to a good preconditioning for SPA. More precisely, the following robustness result was proved:
Theorem 2 ([12], Th. 3).
Let $\tilde{M} = M + N$ where $M$ satisfies Assumption 1 with $m = r$, $W$ has full column rank and $N$ is noise with $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon$ is sufficiently small, then SDP-based preconditioned SPA identifies a subset of column indices of $\tilde{M}$ that approximates the columns of $W$ up to an error proportional to $\epsilon$.
In case $m > r$, it was proposed to first replace the data points by their projections onto the $r$-dimensional linear subspace obtained with the truncated SVD (that is, to use a linear dimensionality reduction technique for noise filtering; see also Section 3); see Alg. SDPPrec.
Alg. SDPPrec first requires the truncated SVD, which can be computed in $O(mnr)$ operations. It then requires the solution of the SDP (1), with $O(r^2)$ variables and $n$ constraints, which can be expensive to compute if standard interior-point methods are used. However, effective active set methods can be used to solve large-scale problems (see [12]): in fact, one can keep only $r(r+1)/2$ constraints from (1) to obtain an equivalent problem [15].
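As an illustration of cheaper approximate solvers for (1), the following sketch uses a simple first-order scheme: multiplicative weight updates on the dual D-optimal design problem, in the spirit of Khachiyan's algorithm for minimum volume ellipsoids. This is a hypothetical simplification for intuition, not the interior-point or active-set solvers discussed above.

```python
import numpy as np

def centered_mvee(X, tol=1e-6, max_iter=20000):
    """Approximate minimum-volume ellipsoid {x : x^T A x <= 1}
    centered at the origin containing the columns of X (full row
    rank r), via multiplicative updates on dual weights u."""
    r, n = X.shape
    u = np.full(n, 1.0 / n)               # weights on the data points
    for _ in range(max_iter):
        S = (X * u) @ X.T                 # S = X diag(u) X^T
        g = np.einsum('ij,ji->i', X.T @ np.linalg.inv(S), X)
        if g.max() <= r * (1 + tol):      # optimality: max_j g_j = r
            break
        u *= g / r                        # multiplicative update
    S = (X * u) @ X.T
    g = np.einsum('ij,ji->i', X.T @ np.linalg.inv(S), X)
    return np.linalg.inv(S) / g.max()     # rescaled to be feasible

# Columns: two pure pixels (vertices) and one interior point;
# the optimal centered ellipsoid is the unit ball (A = I).
X = np.array([[1.0, 0.0, 0.3],
              [0.0, 1.0, 0.3]])
A = centered_mvee(X)
```

Early termination with a loose tolerance yields exactly the kind of approximate feasible solution whose effect on robustness is analyzed in Section 2.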
Note that Mizutani [19] solves the same SDP, but for another purpose, namely to preprocess the input matrix by removing the columns which are not on the boundary of the minimum volume ellipsoid.
1.3 Motivation and Contribution of the Paper
The SDP-based preconditioning described in Section 1.2 is appealing in the sense that it builds an approximation of $W^{-1}$ whose error is provably bounded by the noise level. This is a somewhat ideal solution but can be computationally expensive to obtain since it requires solving an SDP. Hence, a natural move is to consider computationally cheaper preconditioning alternatives.
The focus of this paper is on a theoretical analysis of the robustness to noise of several preconditionings. The contribution of this paper is threefold:

In Section 2, we analyze the robustness of preconditionings obtained using approximate solutions of (1) and prove the following (see Theorem 4):
Let $\tilde{M} = M + N$ where $M$ satisfies Assumption 1 with $m = r$, $W$ has full rank, and $N$ is the noise and satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. Let also $A$ be a feasible solution of (1) whose objective function value is within some multiplicative factor $\eta$ of the optimal value of (1). If $\epsilon$ is sufficiently small (with a bound depending on $\eta$), then SPA applied on the matrix preconditioned with $A$ identifies indices corresponding to the columns of $W$ up to an error proportional to $\epsilon$.
The result stated above suggests that any good approximate solution of (1) provides a reasonable preconditioning. This gives a theoretical motivation for developing less accurate but faster solvers for (1); for example, one could use the proximal point algorithm proposed in [23]. We should mention that this paper focuses on the theoretical analysis only; developing fast solvers is a different subject and is left as a topic for future research.

In Section 3, we analyze the robustness of pre-whitening, a standard preconditioning technique in blind source separation. We try to understand under which conditions pre-whitening can be as good as the SDP-based preconditioning; in fact, pre-whitening was shown to perform very similarly to the SDP-based preconditioning on some synthetic data sets [12]. Pre-whitening corresponds to a solution of (1) with additional constraints, and hence the robustness of pre-whitened SPA follows from the result above (Section 3.2). However, this result is not tight and we provide a tight robustness analysis of pre-whitening (Section 3.3). We also provide a robustness analysis of pre-whitening under a standard generative model (Section 3.4).

In Section 4, we analyze a preconditioning based on SPA itself. The idea was proposed in [11], where the resulting method was found to be extremely fast and, as opposed to pre-whitening, to perform perfectly in the noiseless case and to be unaffected by the abundances of the different endmembers. Moreover, we are able to improve the theoretical bound on the noise level tolerated by SPA using this preconditioning.
Finally, in Section 5, we illustrate these results on synthetic data sets. In particular, we show that pre-whitening and the SPA-based preconditioning perform competitively with the SDP-based preconditioning while showing much better runtime performance in practice.
2 Analysis of Approximate SDP Preconditioning
In this section, we analyze the effect of using an approximate solution of (1) instead of the optimal one for preconditioning the matrix $\tilde{M}$. We will say that $A$ is an $\eta$-approximate solution of (1) for some $\eta \geq 1$ if $A$ is a feasible solution of (1), that is, $A \succ 0$ and $\tilde{m}_j^T A\, \tilde{m}_j \leq 1$ for all $j$, and its objective function value is within the factor $\eta$ of that of the optimal solution $A^*$ of (1). Letting $A = Q^T Q$, analyzing the effect of $Q$ as a preconditioning reduces to showing that $QW$ is well-conditioned, that is, to upper bounding $\kappa(QW)$, which is equivalent to bounding $\kappa(W^T A W)$ since $\kappa(QW)^2 = \kappa\big((QW)^T (QW)\big) = \kappa(W^T A W)$. In fact, if $\kappa(W^T A W)$ can be bounded, the robustness of preconditioned SPA follows from Corollary 1.
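The identity $\kappa(QW)^2 = \kappa\big((QW)^T(QW)\big) = \kappa(W^T A W)$ with $A = Q^T Q$, which underlies the whole analysis, is easy to check numerically (random matrices, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))   # full-rank factor (illustrative)
Qf = rng.standard_normal((4, 4))  # any nonsingular preconditioner Q
A = Qf.T @ Qf                     # A = Q^T Q is positive definite

# (QW)^T (QW) = W^T A W, so the squared condition number of QW
# equals the condition number of W^T A W.
k1 = np.linalg.cond(Qf @ W)
k2 = np.linalg.cond(W.T @ A @ W)
print(k1 ** 2, k2)
```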
In [12], a change of variable is performed on the SDP (1) using $A = W^{-T} C\, W^{-1}$ to obtain the following equivalent problem
$$ \min_{C \succ 0} \; -\log\det(C) \quad \text{such that} \quad \tilde{h}_j^T C\, \tilde{h}_j \leq 1 \;\; \text{for all } j, \tag{2} $$
where $\tilde{H} = H + W^{-1}N$ so that $\tilde{M} = W\tilde{H}$ (recall $m = r$).
Since our goal is to bound $\kappa(W^T A W)$ and $W^T A W = C$, it is equivalent to bound $\kappa(C)$. It was shown in [12] that $\kappa(C^*)$ is close to one for a sufficiently small noise level $\epsilon$, which implies the robustness of SDP-preconditioned SPA; see Theorem 2. In this section, we analyze (2) directly and will use the following assumption:
Definition 1.
Note that $C$ is an $\eta$-approximate solution of (2) if and only if $A = W^{-T} C\, W^{-1}$ is an $\eta$-approximate solution of (1). In fact, $\det(A) = \det(C)\,\det(WW^T)^{-1}$, so that the objective functions of (1) and (2) differ only by a constant independent of $C$.
The main result of this section is therefore to show that any $\eta$-approximate solution of (2) is well-conditioned. More precisely, we show in Theorem 3 that, for a sufficiently small noise level $\epsilon$, $\kappa(C)$ is bounded by a quantity close to one.
This will imply, by Corollary 1, that preconditioned SPA using an approximate solution of (1) is robust to noise, given that $\eta$ is sufficiently close to one; see Theorem 4.
2.1 Bounding $\kappa(C)$
It is interesting to notice that the case $r = 1$ is trivial since $C$ is a scalar and thus $\kappa(C) = 1$ (in fact, all columns of $M$ are multiples of the unique column of $W$). Otherwise, since $\kappa(C) = \lambda_{\max}(C)/\lambda_{\min}(C)$, we only need to provide an upper bound for $\lambda_{\max}(C)$ and a lower bound for $\lambda_{\min}(C)$. The steps of the proof are the following:
The lower bound for $\lambda_{\min}(C)$ and the upper bound for $\lambda_{\max}(C)$ follow directly from results in [12].
Proof.
Proof.
See Lemma 2 and proof of Lemma 3 in [13]. ∎
Lemma 3.
The optimal value
where and is given by
(4) 
Proof.
The proof is given in Appendix A. ∎
Theorem 3.
Proof.
By Lemma 1, we have
where the third inequality above is obtained by the fact that is increasing in for . Also, by Lemma 2, we have . Combining these results with Lemma 3 (via setting and ) yields
where the second inequality above is obtained by the facts that is nonincreasing in and its limit is given by
and that . Finally, since the function is increasing for and for all , we have
∎
2.2 Robustness of SPA Preconditioned with an Approximate SDP Solution
The upper bound on $\kappa(C)$ (Theorem 3) proves that the preconditioning generates a well-conditioned near-separable matrix for $\eta$ sufficiently close to one. Hence any good approximate solution of (1) allows us to obtain more robust near-separable NMF algorithms. In particular, we have the following result for SPA.
Theorem 4.
Proof.
This follows from Corollary 1 and Theorem 3. Let $Q$ be such that $Q^T Q = A$ where $A$ is an $\eta$-approximate solution of (1). Since $C = W^T A W$ is an $\eta$-approximate solution of (2), we have that
Using for all , and by Theorem 3 for which we need to assume that , we further obtain
Applying the above bound to Corollary 1 leads to the desired result: leads to an error proportional to . ∎
Theorem 4 shows that the SDP-based preconditioning does not require a highly accurate solution of the SDP. For example, compared to the optimal solution, any sufficiently accurate approximate solution would not significantly change the upper bound on the noise level while it would increase the error by a factor smaller than three. This paves the way for the design of much faster preconditionings solving (1) only approximately: this is a topic for further research. Moreover, this analysis will allow us to better understand two other preconditionings: pre-whitening (Section 3) and the SPA-based preconditioning (Section 4).
3 Pre-Whitening
Noise filtering and pre-whitening are standard techniques in blind source separation; see, e.g., [8]. They were used in [12] as a preconditioning for SPA. In this section, in light of the results from the previous section, we analyze pre-whitening as a preconditioning. We bound the condition number of $QW$ where $Q$ is the preconditioner obtained from pre-whitening. In the worst case, this condition number can be large as it depends on the number of pixels $n$. However, under some standard generative model, pre-whitening can be shown to be much more robust.
3.1 Description
Let $U_r \Sigma_r V_r^T$ be the rank-$r$ truncated SVD of $\tilde{M}$, so that $U_r \Sigma_r V_r^T$ is the best rank-$r$ approximation of $\tilde{M}$ with respect to the Frobenius norm. Assuming that the noiseless data matrix lives in an $r$-dimensional linear space and assuming Gaussian noise, replacing $\tilde{M}$ with $U_r \Sigma_r V_r^T$ allows noise filtering. Given $U_r \Sigma_r V_r^T$, pre-whitening amounts to keeping only the matrix $V_r^T$. Equivalently, it amounts to premultiplying $\tilde{M}$ (or $U_r \Sigma_r V_r^T$) with $\Sigma_r^{-1} U_r^T$ since
$$ \Sigma_r^{-1} U_r^T\, \tilde{M} \;=\; \Sigma_r^{-1} U_r^T\, U \Sigma V^T \;=\; V_r^T, $$
where $U \Sigma V^T$ is the full SVD of $\tilde{M}$ (recall that the columns of $U$ and $V$ are orthonormal); see Alg. NFPW.
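A compact numpy sketch of this noise-filtering-plus-pre-whitening step (a plain truncated-SVD rendition of the idea; the paper's Alg. NFPW may differ in details):

```python
import numpy as np

def prewhiten(M_tilde, r):
    """Noise filtering + pre-whitening via the truncated SVD:
    returns the whitened data V_r^T together with the linear map
    Sigma_r^{-1} U_r^T that produces it."""
    U, s, Vt = np.linalg.svd(np.asarray(M_tilde, dtype=float),
                             full_matrices=False)
    Q = np.diag(1.0 / s[:r]) @ U[:, :r].T  # pre-whitening map
    return Vt[:r, :], Q
```

The whitened data has orthonormal rows: if `X, Q = prewhiten(M, r)`, then $XX^T = I_r$ and $QM = X$.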
3.2 Link with the SDP-based Preconditioning
For simplicity, let us consider the case $m = r$ (or, equivalently, assume that noise filtering has already been applied to the data matrix). In that case, the preconditioner given by pre-whitening is
$$ Q \;=\; \big(\tilde{M}\tilde{M}^T\big)^{-1/2}, $$
where $B^{-1/2}$ denotes the inverse of the square root of a positive definite matrix $B$ (which is unique up to orthogonal transformations). In fact, for $m = r$, $\tilde{M}\tilde{M}^T = U_r \Sigma_r^2 U_r^T$, hence $(\tilde{M}\tilde{M}^T)^{-1/2} = U_r \Sigma_r^{-1} U_r^T$, which coincides with $\Sigma_r^{-1} U_r^T$ up to an orthogonal transformation. We have the following well-known result:
Lemma 4.
Let $X \in \mathbb{R}^{r \times n}$ be of rank $r$ with columns $x_j$. Then $A^* = \frac{n}{r}\,(XX^T)^{-1}$ is the optimal solution of
$$ \min_{A \succ 0} \; -\log\det(A) \quad \text{such that} \quad \sum_{j=1}^n x_j^T A\, x_j \leq n. \tag{5} $$
Moreover, $(XX^T)^{-1} = \frac{r}{n}\,A^*$ satisfies $x_j^T (XX^T)^{-1} x_j \leq 1$ for all $j$.
Proof.
Observing that the constraint will be active at optimality (otherwise $A$ can be multiplied by a scalar larger than one, decreasing the objective), any optimal solution has to satisfy the following first-order optimality conditions:
$$ A^{-1} \;=\; \lambda\, XX^T, $$
where $\lambda$ is the Lagrangian multiplier. This implies that $A = \frac{1}{\lambda}(XX^T)^{-1}$. We have
$$ \sum_{j=1}^n x_j^T A\, x_j \;=\; \mathrm{tr}\big(A\, XX^T\big) \;=\; \frac{1}{\lambda}\,\mathrm{tr}(I_r) \;=\; \frac{r}{\lambda}. $$
The above equation, together with the active constraint $\sum_j x_j^T A\, x_j = n$, implies $\lambda = r/n$, which proves $A^* = \frac{n}{r}(XX^T)^{-1}$.
Denoting $U\Sigma V^T$ the compact SVD of $X$ (so that $x_j = U\Sigma V(j,:)^T$), we obtain $x_j^T (XX^T)^{-1} x_j = \|V(j,:)\|_2^2 \leq 1$ since $V$ has orthonormal columns, hence rows of norm at most one. ∎
A robustness analysis of pre-whitening follows directly from Theorem 4. In fact, by Lemma 4, the matrix $(\tilde{M}\tilde{M}^T)^{-1}$ is a feasible solution of the minimum volume ellipsoid problem (1). Moreover, it is optimal up to a multiplicative factor: by Lemma 4, the optimal solution of (5) with $X = \tilde{M}$
is given by $\frac{n}{r}(\tilde{M}\tilde{M}^T)^{-1}$, while the optimal solution $A^*$ of (1) is a feasible solution of this problem (since $\tilde{m}_j^T A^* \tilde{m}_j \leq 1$ for all $j$ implies $\sum_j \tilde{m}_j^T A^* \tilde{m}_j \leq n$) so that
$$ \det(A^*)^{-1/2} \;\geq\; \det\!\Big(\tfrac{n}{r}\,\big(\tilde{M}\tilde{M}^T\big)^{-1}\Big)^{-1/2}, $$
and hence
$$ \det\!\big((\tilde{M}\tilde{M}^T)^{-1}\big)^{-1/2} \;\leq\; \Big(\tfrac{n}{r}\Big)^{r/2} \det(A^*)^{-1/2}. $$
In other words, $(\tilde{M}\tilde{M}^T)^{-1}$ is a $\big(\tfrac{n}{r}\big)^{r/2}$-approximate solution of (1). Combining this result with Theorem 4, we obtain
Corollary 2.
Let $\tilde{M} = M + N$ where $M$ satisfies Assumption 1 with $m = r$, $W$ is of full rank and the noise satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon$ is sufficiently small, then
pre-whitened SPA identifies indices corresponding to the columns of $W$ up to an error depending on the approximation factor $(n/r)^{r/2}$.
This bound is rather bad as $n$ is often large compared to $r$; for the hyperspectral unmixing application, the number of pixels $n$ is typically several orders of magnitude larger than the number of endmembers $r$. In the next subsections, we provide a tight robustness analysis of pre-whitening, and analyze pre-whitening under a standard generative model.
3.3 Tight Robustness Analysis
In this subsection, we provide a better robustness analysis of pre-whitening. More precisely, we provide a tight upper bound for $\kappa(QW)$. As before, we only consider the case $m = r$. Under Assumption 1, we have
$$ \tilde{M} \;=\; WH + N \;=\; W\big(H + W^{-1}N\big) \;=\; W\tilde{H}, $$
where we denote $H = [\,I_r, H'\,]\,\Pi$ and $\tilde{H} = H + W^{-1}N$. Recall that the preconditioner given by pre-whitening is $Q = (\tilde{M}\tilde{M}^T)^{-1/2}$. Hence, the condition number of $QW$ will be equal to the square root of the condition number of $\tilde{H}\tilde{H}^T$. In fact,
$$ (QW)^T (QW) \;=\; W^T \big(\tilde{M}\tilde{M}^T\big)^{-1} W \;=\; W^T \big(W\tilde{H}\tilde{H}^T W^T\big)^{-1} W \;=\; \big(\tilde{H}\tilde{H}^T\big)^{-1}, $$
so that $\kappa(QW)^2 = \kappa(\tilde{H}\tilde{H}^T)$. Therefore, to provide a robustness analysis of pre-whitening, it is sufficient to bound $\kappa(\tilde{H}\tilde{H}^T)$. In the next lemma, we show that $\kappa(\tilde{H}\tilde{H}^T)$ grows at most proportionally with $n$, which will lead to an error bound that is much smaller than that derived in the previous subsection.
Lemma 5.
Let $H = [\,I_r, H'\,]\,\Pi$ be a nonnegative matrix with $\|H(:,j)\|_1 \leq 1$ for all $j$, and let $\tilde{H} = H + N'$ where $N'$ satisfies $\max_j \|N'(:,j)\|_2 \leq \epsilon'$. Then,
$$ \sigma_1(\tilde{H}) \;\leq\; \sqrt{n}\,(1 + \epsilon') \qquad \text{and} \qquad \sigma_r(\tilde{H}) \;\geq\; 1 - \sqrt{n}\,\epsilon'. $$
Proof.
Let us first show that
$$ \sigma_1(H) \leq \sqrt{n} \qquad \text{and} \qquad \sigma_r(H) \geq 1. $$
We have
$$ \sigma_i(H)^2 \;\leq\; \|H\|_F^2 \;=\; \sum_{j=1}^n \|H(:,j)\|_2^2 \;\leq\; n \quad \text{for } i = 1 \text{ or } r, $$
and note that $\|H(:,j)\|_2 \leq \|H(:,j)\|_1 \leq 1$ for all $j$. Using
$$ HH^T \;=\; [\,I_r, H'\,][\,I_r, H'\,]^T \;=\; I_r + H'H'^T \;\succeq\; I_r $$
(the permutation $\Pi$ does not modify the singular values), one easily gets
$$ \sigma_r(H)^2 \;=\; \lambda_{\min}(HH^T) \;\geq\; 1. $$
Also, we have
$$ \sigma_1(N') \;\leq\; \|N'\|_F \;\leq\; \sqrt{n}\,\epsilon', $$
as $\|N'(:,j)\|_2 \leq \epsilon'$ for all $j$. Finally, using the singular value perturbation theorem (Weyl; see, e.g., [14]), we have
$$ \big|\sigma_i(\tilde{H}) - \sigma_i(H)\big| \;\leq\; \sigma_1(N') \;\leq\; \sqrt{n}\,\epsilon' $$
for $i = 1$ or $r$, and since $\tilde{H} = H + N'$, we obtain
$$ \sigma_1(\tilde{H}) \;\leq\; \sqrt{n}\,(1 + \epsilon') \qquad \text{and} \qquad \sigma_r(\tilde{H}) \;\geq\; 1 - \sqrt{n}\,\epsilon'. $$
∎
This bound allows us to provide a robustness analysis of pre-whitening.
Theorem 5.
Let $\tilde{M} = M + N$ where $M$ satisfies Assumption 1 with $m = r$, $W$ has full rank and the noise satisfies $\max_j \|N(:,j)\|_2 \leq \epsilon$. If $\epsilon$ is sufficiently small, pre-whitened SPA identifies the columns of $W$ up to an error proportional to $\epsilon$ and growing with $\sqrt{n}$.
Proof.
The bounds of Theorem 5 are tight. In fact, if, except for the pure pixels, all pixels contain the same endmember, say the $k$th, then all columns of $H'$ are equal to the $k$th column of the identity matrix, that is, $H'(:,j) = e_k$ for all $j$. Therefore,
$$ \kappa\big(HH^T\big) \;=\; n - r + 1, $$
since $HH^T = I_r + H'H'^T$ is a diagonal matrix with $(HH^T)_{kk} = n - r + 1$ and $(HH^T)_{ii} = 1$ for all $i \neq k$.
This indicates that pre-whitening should perform worst when one endmember contains most of the pixels, as this matches the upper bound of Theorem 5. However, if the pixels are relatively well-spread in the convex hull of the endmembers, then pre-whitening may perform well. This will be proven in the next subsection, wherein the robustness of pre-whitening under a standard generative model is analyzed.
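This contrast is easy to reproduce numerically. The sketch below compares the worst case of the example above with a hypothetical well-spread case (abundances drawn uniformly from the simplex):

```python
import numpy as np

r, n = 3, 100

# Worst case: apart from the r pure pixels, every pixel contains
# only endmember 0, i.e. H = [I_r, H'] with H'(:, j) = e_0.
Hp = np.zeros((r, n - r))
Hp[0, :] = 1.0
H = np.hstack([np.eye(r), Hp])
print(np.linalg.cond(H @ H.T))    # = n - r + 1 = 98

# Well-spread case: abundances drawn uniformly from the simplex.
rng = np.random.default_rng(1)
H2 = np.hstack([np.eye(r), rng.dirichlet(np.ones(r), size=n - r).T])
print(np.linalg.cond(H2 @ H2.T))  # much smaller
```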
3.4 Robustness under a Standard Generative Model
We continue our analysis by considering a standard generative model in hyperspectral unmixing. We again consider $m = r$, and the generative model is described as follows.
Assumption 2.
The near-separable matrix $\tilde{M} = WH + N$ is such that
(i) $W$ is of full rank;
(ii) $H(:,j)$ is i.i.d. following a Dirichlet distribution with parameter $\gamma \in \mathbb{R}^r_{++}$, for all $j$;
(iii) $N(:,j)$ is i.i.d. with mean zero and covariance $\sigma^2 I_r$, for all $j$ (Gaussian noise);
(iv) the number of samples goes to infinity, that is, $n \to \infty$.
Note that assumption (ii), which models the abundances as being Dirichlet distributed, is a popular assumption in the HU context; see, e.g., [21]. In particular, the parameter $\gamma$ characterizes how the pixels are spread. To describe this, let us consider the simplified case where $\gamma = \alpha\,\mathbf{1}$ for some $\alpha > 0$, i.e., a symmetric Dirichlet distribution. We have the following phenomena: if $\alpha = 1$, then the $H(:,j)$'s are uniformly distributed over the unit simplex; if $\alpha < 1$ and decreases, then the $H(:,j)$'s are more concentrated around the vertices (or pure pixels) of the simplex; if $\alpha > 1$ and increases, then the $H(:,j)$'s are more concentrated around the center of the simplex. In fact, $\alpha \to 0$ means that the $H(:,j)$'s contain only pure pixels in the same proportions. It should also be noted that we do not assume the separability or pure-pixel assumption, although the latter is implicitly implied by Assumption 2. Specifically, under assumptions (ii) and (iv), for every endmember there exist pixels that are arbitrarily close to the pure pixel in a probability-one sense.
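The effect of the symmetric Dirichlet parameter $\alpha$ on the spread of the pixels can be observed numerically; in this illustrative simulation, the average gap between an abundance vector and its nearest simplex vertex grows with $\alpha$:

```python
import numpy as np

rng = np.random.default_rng(0)
r, n = 3, 5000
mean_gap = {}
for alpha in (0.1, 1.0, 10.0):
    H = rng.dirichlet(alpha * np.ones(r), size=n)  # n abundance vectors
    # Gap to the nearest vertex e_k of the simplex:
    # a small gap means a nearly pure pixel.
    mean_gap[alpha] = float((1.0 - H.max(axis=1)).mean())
print(mean_gap)  # increases with alpha
```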
Now, our task is to prove a bound on $\kappa(QW)$ under the above statistical assumptions, thereby obtaining implications on how pre-whitened SPA may perform with respect to the abundances' distribution (rather than in the worst-case scenario). To proceed, we formulate the pre-whitening preconditioner as
$$ Q \;=\; \Big(\tfrac{1}{n}\,\tilde{M}\tilde{M}^T\Big)^{-1/2}. $$
For $n \to \infty$, we have
$$ \tfrac{1}{n}\,\tilde{M}\tilde{M}^T \;\to\; \mathbb{E}\big[\tilde{m}\tilde{m}^T\big] \quad \text{(almost surely).} $$
Also, under Assumption 2, the above correlation matrix can be shown to be
$$ \mathbb{E}\big[\tilde{m}\tilde{m}^T\big] \;=\; W\,\mathbb{E}\big[hh^T\big]\,W^T + \sigma^2 I_r. $$
We have the following lemma.
Lemma 6.
Under Assumption 2, the matrix $\mathbb{E}[hh^T]$ is given by
$$ \mathbb{E}\big[hh^T\big] \;=\; \frac{1}{\bar{\gamma}\,(\bar{\gamma}+1)}\,\Big(\mathrm{diag}(\gamma) + \gamma\gamma^T\Big), \tag{6} $$
where $\bar{\gamma} = \sum_{i=1}^r \gamma_i$ and $\mathrm{diag}(\gamma)$ is the diagonal matrix with the entries of $\gamma$ on its diagonal. Also, the largest and smallest eigenvalues of $\mathbb{E}[hh^T]$ are bounded by
$$ \lambda_{\max} \;\leq\; \frac{\max_i \gamma_i + \|\gamma\|_2^2}{\bar{\gamma}\,(\bar{\gamma}+1)} \qquad \text{and} \qquad \lambda_{\min} \;\geq\; \frac{\min_i \gamma_i}{\bar{\gamma}\,(\bar{\gamma}+1)}, $$
respectively.
Proof.
It is known that for a random vector