Enhancing Pure-Pixel Identification Performance via Preconditioning


Nicolas Gillis
Department of Mathematics and Operational Research
Faculté Polytechnique, Université de Mons
Rue de Houdain 9, 7000 Mons, Belgium
nicolas.gillis@umons.ac.be
   Wing-Kin Ma
Department of Electronic Engineering
The Chinese University of Hong Kong
wkma@ee.cuhk.edu.hk
Abstract

In this paper, we analyze different preconditionings designed to enhance the robustness of pure-pixel search algorithms, which are used for blind hyperspectral unmixing and are equivalent to near-separable nonnegative matrix factorization algorithms. Our analysis focuses on the successive projection algorithm (SPA), a simple, efficient and provably robust algorithm in the pure-pixel algorithm class. Recently, a provably robust preconditioning was proposed by Gillis and Vavasis (arXiv:1310.2273); it requires solving a semidefinite program (SDP) to find the minimum-volume ellipsoid enclosing the data points. Since solving the SDP to high precision can be time consuming, we generalize the robustness analysis to approximate solutions of the SDP, that is, solutions whose objective function values are within some multiplicative factor of the optimal value. It is shown that a highly accurate solution is not crucial for robustness, which paves the way for faster preconditionings (e.g., based on first-order optimization methods). This first contribution also allows us to provide a robustness analysis for two other preconditionings. The first one is pre-whitening, which can be interpreted as an optimal solution of the same SDP with additional constraints. Our analysis of the robustness of pre-whitening characterizes situations in which it performs competitively with the SDP-based preconditioning. The second one is based on SPA itself and can be interpreted as an optimal solution of a relaxation of the SDP. It is extremely fast while competing with the SDP-based preconditioning on several synthetic data sets.

Keywords. hyperspectral unmixing, pure-pixel search, preconditioning, pre-whitening, successive projection algorithm, near-separable NMF, robustness to noise, semidefinite programming

1 Introduction

Given a hyperspectral image, blind hyperspectral unmixing (blind HU) aims at recovering the spectral signatures of the constitutive materials present in the image, called endmembers, along with their abundances in each pixel. Under the linear mixing model, the spectral signature of a pixel is equal to a linear combination of the spectral signatures of the endmembers where the weights correspond to the abundances. More formally, letting represent a hyperspectral image with wavelengths and pixels, we have, in the noiseless case,

where is the spectral signature of the th pixel, the spectral signature of the th endmember, and is the abundance of the th endmember in the th pixel so that and for all (abundance sum-to-one constraint). Note that blind HU is equivalent to nonnegative matrix factorization (NMF) which aims at finding the best possible factorization of a nonnegative matrix where and are nonnegative matrices.

In blind HU, the so-called pure-pixel assumption plays a significant role. It is defined as follows. If for each endmember there exists a pixel containing only that endmember, that is, if for all there exists such that , then the pure-pixel assumption holds. In that case, the matrix has the following form

and being a permutation. This implies that the columns of are convex combinations of the columns of , and hence blind HU under the linear mixing model and the pure-pixel assumption reduces to identifying the vertices of the convex hull of the columns of ; see, e.g., [5, 18] and the references therein. This problem is known to be efficiently solvable [3]. In the presence of noise, the problem becomes more difficult and several provably robust algorithms have been proposed recently; for example the successive projection algorithm to be described in Section 1.1. Note that, in the NMF literature, the pure-pixel assumption is referred to as the separability assumption [3] and NMF under the separability assumption in the presence of noise is referred to as near-separable NMF; see, e.g., [10] and the references therein. Therefore, in this paper, we will assume that the matrix corresponding to the hyperspectral image has the following form (in the noiseless case):

Assumption 1 (Separable Matrix).

The matrix is separable if where , with the sum of the entries of each column of being at most one, that is, for all , and is a permutation.

Note that we have relaxed the assumption for all to for all ; this allows, for example, for different illumination conditions among the pixels in the image.
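To make Assumption 1 concrete, the following minimal numpy sketch generates a synthetic near-separable matrix of the kind considered throughout the paper; the symbol names (X, W, H) and the sizes are our own illustrative choices, not the paper's notation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n = 50, 5, 200                     # number of wavelengths, endmembers, pixels

W = rng.random((m, r))                   # nonnegative endmember signatures

# Mixed pixels: abundance columns are nonnegative with sum at most one
# (columns of a Dirichlet sample, rescaled to mimic varying illumination).
Hmix = rng.dirichlet(np.ones(r), size=n - r).T * rng.uniform(0.5, 1.0, size=n - r)
H = np.hstack([np.eye(r), Hmix])         # the identity block provides the pure pixels
H = H[:, rng.permutation(n)]             # pixels appear in an arbitrary (unknown) order

N = 0.01 * rng.standard_normal((m, n))   # additive noise
X = W @ H + N                            # observed near-separable data matrix
```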

1.1 Successive Projection Algorithm

The successive projection algorithm (SPA) is a simple yet fast and robust pure-pixel search algorithm; see Alg. SPA.

0:  Matrix X = WH + N with W, H satisfying Assumption 1, rank r.
0:  Set of r indices K such that X(:, K) ≈ W (up to permutation and error).
1:  Let R = X, K = {}, k = 1.
2:  while R ≠ 0 and k ≤ r do
3:        p = argmax_j ||R(:, j)||_2.
4:        R = (I − R(:, p) R(:, p)^T / ||R(:, p)||_2^2) R.
5:        K = K ∪ {p}.
6:        k = k + 1.
7:  end while
Algorithm SPA – Successive Projection Algorithm [1]

At each step of the algorithm, the column of the input matrix with maximum norm is selected, and then is updated by projecting each column onto the orthogonal complement of the columns selected so far. SPA is extremely fast as it can be implemented in operations [13]. SPA was first introduced in [1], and is closely related to other algorithms such as the automatic target generation process (ATGP), successive simplex volume maximization (SVMAX) and vertex component analysis (VCA); see the discussion in [18]. What makes SPA particularly interesting is that it is provably robust to noise [13]:
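For concreteness, here is a minimal numpy sketch of SPA following the two steps described above (selection of the column with maximum norm, then projection onto the orthogonal complement of the selected column); the function name and variable names are ours, and standard numerical safeguards are omitted.

```python
import numpy as np

def spa(X, r):
    """Return r column indices of X approximating the pure pixels."""
    R = np.array(X, dtype=float, copy=True)        # residual matrix, updated in place
    K = []
    for _ in range(r):
        p = int(np.argmax(np.sum(R**2, axis=0)))   # column with maximum l2 norm
        K.append(p)
        u = R[:, p] / np.linalg.norm(R[:, p])      # unit vector of the selected column
        R -= np.outer(u, u @ R)                    # project onto its orthogonal complement
    return K
```

On the synthetic matrix generated above, spa(X, r) should return (a permutation of) the indices of the pure pixels when the noise level is small enough.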

Theorem 1 ([13], Th. 3).

Let where satisfies Assumption 1, has full column rank and is noise with . If , then SPA identifies the columns of up to error , that is, the index set identified by SPA satisfies

where is the condition number of , and for .

1.2 Preconditioning

If a matrix satisfying Assumption 1 is premultiplied by a matrix , it still satisfies Assumption 1 where is replaced with . Since pure-pixel search algorithms are sensitive to the conditioning of the matrix , it would be beneficial to find a matrix that reduces the condition number of . In particular, the robustness result of SPA (Th. 1) can be adapted when the input matrix is premultiplied by a matrix :

Corollary 1.

Let where satisfies Assumption 1, has full column rank and is noise with ; and let (). If has full column rank, and

then SPA applied on matrix identifies indices corresponding to the columns of up to error .

Proof.

This result follows directly from [12, Cor. 1]. In fact, in [12, Cor. 1], the result is proved for

with error up to . Since,

and

the proof is complete. ∎

Note that Corollary 1 does not simply amount to replacing by in Theorem 1 because the noise is also premultiplied by . Note also that, in view of Theorem 1, preconditioning is beneficial for any such that .

1.2.1 SDP-based Preconditioning

Assume that (the problem can be reduced to this case using noise filtering; see Section 3). An optimal preconditioning would be so that would be perfectly conditioned, that is, . In particular, applying Corollary 1 with gives the following result: if , then SPA applied on matrix identifies indices corresponding to the columns of up to error . This is a significant improvement compared to Theorem 1, especially for the upper bound on the noise level: the term disappears from the denominator. For hyperspectral images, can be rather large as spectral signatures often share similar patterns. Note that the bound on the noise level is essentially optimal for SPA since would allow the noise to make the matrix rank deficient [12].
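In the illustrative notation of the synthetic example above (where W is tall rather than square), this ideal, but of course unavailable, preconditioner amounts to a pseudo-inverse of the endmember matrix:

```python
import numpy as np

Q = np.linalg.pinv(W)          # ideal preconditioner (requires knowing W)
print(np.linalg.cond(Q @ W))   # equals 1: the preconditioned endmember matrix is perfectly conditioned
```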

Of course, is unknown otherwise the problem would be solved. However, it turns out that it is possible to compute approximately (up to orthogonal transformations, which do not influence the conditioning) even in the presence of noise using the minimum volume ellipsoid centered at the origin containing all columns of [12]. An ellipsoid centered at the origin in is described via a positive definite matrix : . The volume of is equal to times the volume of the unit ball in dimension . Therefore, given a matrix of rank , we can formulate the minimum volume ellipsoid centered at the origin and containing the columns of matrix as follows

(1)

This problem is SDP representable [6, p.222]. It was shown in [12] that (i) in the noiseless case (that is, ), the optimal solution of (1) is given by and hence factoring allows one to recover (up to orthogonal transformations); and that (ii) in the noisy case, the optimal solution of (1) is close to and hence leads to a good preconditioning for SPA. More precisely, the following robustness result was proved:
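As an illustration, problem (1) can be stated almost verbatim in CVXPY; this is a sketch under the assumption that the (possibly dimension-reduced) data points are stored as the columns of a matrix Y, and it ignores the active-set acceleration discussed below.

```python
import cvxpy as cp
import numpy as np

def min_volume_ellipsoid(Y):
    """Minimum-volume origin-centered ellipsoid containing the columns of Y:
    minimize -log det(A)  subject to  y_i^T A y_i <= 1 for every column y_i, A psd."""
    r, n = Y.shape
    A = cp.Variable((r, r), PSD=True)
    constraints = [Y[:, i] @ A @ Y[:, i] <= 1 for i in range(n)]
    prob = cp.Problem(cp.Minimize(-cp.log_det(A)), constraints)
    prob.solve()                       # any SDP-capable conic solver (e.g., SCS) will do
    return A.value
```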

Theorem 2 ([12], Th. 3).

Let where satisfies Assumption 1 with , has full column rank and is noise with . If , then SDP-based preconditioned SPA identifies a subset so that approximates the columns of up to error .

In case , it was proposed to first replace the data points by their projections onto the -dimensional linear subspace obtained with the SVD (that is, use a linear dimensionality reduction technique for noise filtering; see also Section 3); see Alg. SDP-Prec.

0:  Matrix X = WH + N with W, H satisfying Assumption 1, rank r.
0:  Preconditioner Q.
1:  (U_r, Σ_r, V_r) = rank-r truncated SVD(X).
2:  Let Y = U_r^T X and solve (1) to get A*.
3:  Factorize A* = B^T B (e.g., Cholesky decomposition).
4:  Q = B U_r^T.
Algorithm SDP-Prec – SDP-based Preconditioning [12]

Alg. SDP-Prec first requires the truncated SVD, which can be computed in operations. It then requires the solution of the SDP with variables and constraints, which takes operations per iteration if standard interior-point methods are used. However, effective active-set methods can be used to solve large-scale problems (see [12]): in fact, one can keep only constraints from (1) to obtain an equivalent problem [15].
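Putting the pieces together, the following sketch chains the steps of Alg. SDP-Prec with SPA, reusing the illustrative helpers spa and min_volume_ellipsoid defined above; it is a simplified reading of the algorithm (truncated SVD, ellipsoid fit in the reduced space, Cholesky factor, preconditioned SPA), not a faithful reproduction of the implementation of [12].

```python
import numpy as np

def sdp_preconditioned_spa(X, r):
    # Rank-r truncated SVD: dimensionality reduction / noise filtering.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Ur = U[:, :r]
    Y = Ur.T @ X                       # r-dimensional representation of the pixels

    A = min_volume_ellipsoid(Y)        # solve the SDP (1) on the reduced data
    A = 0.5 * (A + A.T) + 1e-9 * np.eye(r)   # guard against solver round-off
    B = np.linalg.cholesky(A).T        # A = B^T B, with B upper triangular

    return spa(B @ Y, r)               # run SPA on the preconditioned data
```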

Note that Mizutani [19] solves the same SDP, but for another purpose, namely to preprocess the input matrix by removing the columns which are not on the boundary of the minimum volume ellipsoid.

1.3 Motivation and Contribution of the Paper

The SDP-based preconditioning described in Section 1.2 is appealing in the sense that it builds an approximation of whose error is provably bounded by the noise level. This is a somewhat ideal solution but can be computationally expensive to obtain since it requires solving an SDP. Hence, a natural move is to consider computationally cheaper preconditioning alternatives.

The focus of this paper is on a theoretical analysis of the robustness to noise of several preconditionings. The contribution of this paper is threefold:

  1. In Section 2, we analyze robustness of preconditionings obtained using approximate solutions of (1) and prove the following (see Theorem 4):

    Let where satisfies Assumption 1 with , has full rank, and the noise satisfies . Let also be a feasible solution of (1) whose objective function value is some multiplicative factor away from the optimal value; that is, for some where is an optimal solution to (1). If , then SPA applied on matrix identifies indices corresponding to the columns of up to error .

    The result stated above suggests that any good approximate solution of (1) provides a reasonable preconditioning. This gives a theoretical motivation for developing less accurate but faster solvers for (1); for example, one could use the proximal point algorithm proposed in [23]. We should mention that this paper focuses on the theoretical analysis only; developing fast solvers is a different subject and is left as a topic for future research.

  2. In Section 3, we analyze the robustness of pre-whitening, a standard preconditioning technique in blind source separation. We try to understand under which conditions pre-whitening can be as good as the SDP-based preconditioning; in fact, pre-whitening was shown to perform very similarly to the SDP-based preconditioning on some synthetic data sets [12]. Pre-whitening corresponds to a solution of (1) with additional constraints, and hence the robustness of pre-whitened SPA follows from the result above (Section 3.2). However, this result is not tight, and we provide a tight robustness analysis of pre-whitening (Section 3.3). We also provide a robustness analysis of pre-whitening under a standard generative model (Section 3.4).

  3. In Section 4, we analyze a preconditioning based on SPA itself. The idea was proposed in [11], where the resulting method was found to be extremely fast and, as opposed to pre-whitening, to perform perfectly in the noiseless case and to be unaffected by the abundances of the different endmembers. Moreover, we are able to improve the theoretical bound on the noise level allowed by SPA by a factor using this preconditioning.

Finally, in Section 5, we illustrate these results on synthetic data sets. In particular, we show that pre-whitening and the SPA-based preconditioning perform competitively with the SDP-based preconditioning while showing much better runtime performance in practice.

2 Analysis of Approximate SDP Preconditioning

In this section, we analyze the effect of using an approximate solution of (1) instead of the optimal one for preconditioning matrix . We will say that is an -approximate solution of (1) for some if is a feasible solution of (1), that is, and for all , and

where is the optimal solution of (1). Letting , analyzing the effect of as a preconditioning reduces to showing that is well-conditioned, that is, to upper bounding , which is equivalent to bounding since . In fact, if can be bounded, the robustness of preconditioned SPA follows from Corollary 1.

In [12], a change of variable is performed on the SDP (1) using to obtain the following equivalent problem

(2)

Since our goal is to bound and , it is equivalent to bound . It was shown in [12] that, for sufficiently small noise level , which implies robustness of SDP-preconditioned SPA; see Theorem 2. In this section, we analyze (2) directly and will use the following assumption:

Definition 1.

The matrix is an -approximate solution of (2) for some , that is, is a feasible solution of (2) and

Note that is an -approximate solution of (2) if and only if is an -approximate solution of (1). In fact, .

The main result of this section is therefore to show that any -approximate solution of (2) is well-conditioned. More precisely, we show in Theorem 3 that, for a sufficiently small noise level ,

This will imply, by Corollary 1, that preconditioned SPA using an approximate solution of (1) is robust to noise, given that is sufficiently close to one; see Theorem 4.

2.1 Bounding

It is interesting to note that the case is trivial since is a scalar and thus has (in fact, all columns of are multiples of the unique column of ). Otherwise, since , we only need to provide an upper bound for . The steps of the proof are as follows:

  • Derive a lower bound for (Lemma 1).

  • Provide an upper bound of (Lemma 2).

  • Combine the two bounds above to bound . In fact, we prove in Lemma 3 that the condition number of an -by- matrix with and can be bounded above; see Equation (4).

The lower bound for and the upper bound for follow directly from results in [12].

Lemma 1.

If where satisfies Assumption 1 with , then any -approximate solution of (2) satisfies

(3)
Proof.

In [12, Lemma 1], it was proved that the optimal solution of (2) satisfies

Hence, the result follows directly from Definition 1. ∎

Lemma 2.

If where satisfies Assumption 1 with , and

then any feasible solution of (2) satisfies .

Proof.

See Lemma 2 and proof of Lemma 3 in [13]. ∎

Lemma 3.

The optimal value

where and is given by

(4)
Proof.

The proof is given in Appendix A. ∎

Theorem 3.

If where satisfies Assumption 1 with and , then

where is an -approximate solution of (2).

Proof.

By Lemma 1, we have

where the third inequality above is obtained by the fact that is increasing in for . Also, by Lemma 2, we have . Combining these results with Lemma 3 (via setting and ) yields

where the second inequality above is obtained by the facts that is nonincreasing in and its limit is given by

and that . Finally, since the function is increasing for and for all , we have

2.2 Robustness of SPA Preconditioned with an Approximate SDP Solution

The upper bound on (Theorem 3) proves that the preconditioning generates a well-conditioned near-separable matrix for sufficiently close to one. Hence any good approximation of (1) allows us to obtain more robust near-separable NMF algorithms. In particular, we have the following result for SPA.

Theorem 4.

Let where satisfies Assumption 1 with , has full rank and the noise satisfies . Let also be such that where is an -approximate solution of (1). If

then SPA applied on matrix identifies indices corresponding to the columns of up to error .

Proof.

This follows from Corollary 1 and Theorem 3. Let be such that where is an -approximate solution of (1). Since is an -approximate solution of (2), we have that

Using for all , and by Theorem 3 for which we need to assume that , we further obtain

Applying the above bound to Corollary 1 leads to the desired result: leads to an error proportional to . ∎

Theorem 4 shows that the SDP-based preconditioning does not require a high accuracy solution of the SDP. For example, compared to the optimal solution, any -approximate solution would not change the upper bound on the noise level for (since ) while it would increase the error by a factor smaller than three. This paves the way to the design of much faster preconditionings, solving (1) only approximately: this is a topic for further research. Moreover, this analysis will allow us to understand better two other preconditionings: pre-whitening (Section 3) and SPA-based preconditioning (Section 4).

3 Pre-Whitening

Noise filtering and pre-whitening are standard techniques in blind source separation; see, e.g., [8]. They were used in [12] as a preconditioning for SPA. In this section, in light of the results from the previous section, we analyze pre-whitening as a preconditioning. We bound the condition number of where is the preconditioner obtained from pre-whitening. In the worst case, can be large as it depends on the number of pixels . However, under some standard generative model, pre-whitening can be shown to be much more robust.

3.1 Description

Let be the rank- truncated SVD of , so that is the best rank- approximation of with respect to the Frobenius norm. Assuming that the noiseless data matrix lives in a -dimensional linear space and assuming Gaussian noise, replacing with allows noise filtering. Given , pre-whitening amounts to keeping only the matrix . Equivalently, it amounts to premultiplying (or ) with since

where is the full SVD of (recall that , and that the columns of and are orthonormal); see Alg. NF-PW.

0:  Matrix X = WH + N with W, H satisfying Assumption 1, rank r.
0:  Preconditioner Q.
1:  (U_r, Σ_r, V_r) = rank-r truncated SVD(X).
2:  Q = Σ_r^{-1} U_r^T.
Algorithm NF-PW – Noise Filtering and Pre-Whitening

In [12], Alg. NF-PW is used as a heuristic preconditioning, and performs similarly to the SDP-based preconditioning. Alg. NF-PW requires operations to compute the rank- truncated SVD of .
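In the same illustrative notation, Alg. NF-PW reduces to a few lines of numpy: the truncated SVD provides the noise filtering, and scaling by the inverse singular values provides the whitening.

```python
import numpy as np

def prewhitening_preconditioner(X, r):
    """Return Q such that the rows of Q @ X are approximately orthonormal."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return np.diag(1.0 / s[:r]) @ U[:, :r].T

# Pre-whitened SPA simply runs SPA on the preconditioned data:
# K = spa(prewhitening_preconditioner(X, r) @ X, r)
```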

3.2 Link with the SDP-based Preconditioning

For simplicity, let us consider the case (or, equivalently, assume that noise filtering has already been applied to the data matrix). In that case, the preconditioner given by pre-whitening is

where denotes the inverse of the square root of a positive definite matrix, that is, (which is unique up to orthogonal transformations). In fact, for , hence . We have the following well-known result:

Lemma 4.

Let be of rank . Then is the optimal solution of

(5)

Moreover, satisfies .

Proof.

Observing that the objective can be replaced with , and that the constraint will be active at optimality (otherwise can be multiplied by a scalar larger than one), any optimal solution has to satisfy the following first-order optimality conditions:

where is the Lagrangian multiplier. This implies that . We have

The above equation, together with the condition , implies , which proves .

Denoting the compact SVD of (), we obtain since has orthogonal columns and . ∎

A robustness analysis of pre-whitening follows directly from Theorem 4. In fact, by Lemma 4, the matrix is a feasible solution of the minimum volume ellipsoid problem (1). Moreover, it is optimal up to a factor : by Lemma 4, the optimal solution of

is given by while the optimal solution of (1) is a feasible solution of this problem (since ) so that

and hence

In other words, is a -approximate solution of (1). Combining this result with Theorem 4, we obtain

Corollary 2.

Let where satisfies Assumption 1 with , is of full rank and the noise satisfies . If

pre-whitened SPA identifies indices corresponding to the columns of up to error .

This bound is rather weak since is often large compared to ; for the hyperspectral unmixing application, we typically have and . In the next subsections, we provide a tight robustness analysis of pre-whitening, and analyze pre-whitening under a standard generative model.

3.3 Tight Robustness Analysis

In this subsection, we provide a better robustness analysis of pre-whitening. More precisely, we provide a tight upper bound for . As before, we only consider the case . Under Assumption 1, we have

where we denote and . Recall that the preconditioner given by pre-whitening is . Hence, the condition number of will be equal to the square root of the condition number of . In fact,

so that . Therefore, to provide a robustness analysis of pre-whitening, it is sufficient to bound . In the next lemma, we show that , which implies and will lead to an error bound that is much smaller than that derived in the previous subsection.

Lemma 5.

Let be a nonnegative matrix with for all , and let satisfy . Then,

Proof.

Let us first show that

We have

where or , and note that

Using , one easily gets

Also, we have

as and

since . Finally, using the singular value perturbation theorem (Weyl; see, e.g., [14]), we have

for or , and since , we obtain

This bound allows us to provide a robustness analysis of pre-whitening.

Theorem 5.

Let where satisfies Assumption 1 with , has full rank and the noise satisfies . If , pre-whitened SPA identifies the columns of up to error .

Proof.

This follows from Corollary 1 and Lemma 5. We have

and as a result Lemma 5 can be applied to obtain for pre-whitening. By plugging the above bound into Corollary 1, Theorem 5 is obtained. ∎

The bounds of Theorem 5 are tight. In fact, if, except for the pure pixels, all pixels contain the same endmember, say the th, then all columns of are equal to the th column of the identity matrix, that is, for all . Therefore,

since is a diagonal matrix with for all and .

This indicates that pre-whitening should perform worst when a single endmember dominates most of the pixels, as this matches the upper bound of Theorem 5. However, if the pixels are relatively well spread in the convex hull of the endmembers, then pre-whitening may perform well. This will be proven in the next subsection, wherein the robustness of pre-whitening under a standard generative model is analyzed.
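This worst case is easy to reproduce numerically. In the sketch below (our notation), all mixed pixels are copies of the first pure pixel, and the condition number of the pre-whitened endmember matrix indeed grows like the square root of the number of pixels, as predicted.

```python
import numpy as np

rng = np.random.default_rng(1)
m, r, n = 30, 4, 10000
W = rng.random((m, r))                                              # endmember signatures
H = np.hstack([np.eye(r), np.tile(np.eye(r)[:, :1], (1, n - r))])  # all mixed pixels = endmember 1
X = W @ H                                                           # noiseless data for clarity

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Q = np.diag(1.0 / s[:r]) @ U[:, :r].T                               # pre-whitening preconditioner

print(np.linalg.cond(Q @ W), np.sqrt(n - r + 1))                    # the two values nearly coincide
```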

3.4 Robustness under a Standard Generative Model

We continue our analysis by considering a standard generative model in hyperspectral unmixing. We again consider , and the generative model is described as follows.

Assumption 2.

The near-separable matrix is such that

  1. is of full rank.

  2. is i.i.d. following a Dirichlet distribution with parameter , for all . Also, without loss of generality, it will be assumed that .

  3. is i.i.d. with mean zero and covariance , for all (Gaussian noise).

  4. The number of samples goes to infinity, that is, .

Note that assumption (ii), which models the abundances as being Dirichlet distributed, is a popular assumption in the HU context; see, e.g., [21]. In particular, the parameter characterizes how the pixels are spread. To describe this, let us consider the simplified case where ; i.e., a symmetric Dirichlet distribution. We have the following phenomena: if , then the ’s are uniformly distributed over the unit simplex; if and decreases, then the ’s are more concentrated around the vertices (or pure pixels) of the simplex; if and increases, then the ’s are more concentrated around the center of the simplex. In fact, means that the ’s contain only pure pixels in the same proportions. It should also be noted that we do not explicitly assume separability (the pure-pixel assumption), although it is implicitly implied by Assumption 2: under assumptions (ii) and (iv), for every endmember there exist pixels that are arbitrarily close to the corresponding pure pixel with probability one.
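The effect of the Dirichlet parameter can be visualized with a short simulation (our notation: gamma denotes the common value of the symmetric Dirichlet parameter); smaller values push the abundance vectors toward the vertices of the simplex, larger values toward its center.

```python
import numpy as np

rng = np.random.default_rng(2)
r, n = 4, 100000
for gamma in [0.1, 1.0, 10.0]:
    H = rng.dirichlet(gamma * np.ones(r), size=n)       # each row is an abundance vector
    nearest_vertex = np.eye(r)[H.argmax(axis=1)]         # closest vertex of the unit simplex
    print(gamma, np.linalg.norm(H - nearest_vertex, axis=1).mean())  # mean distance to a pure pixel
```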

Now, our task is to prove a bound on under the above statistical assumptions, thereby obtaining implications on how pre-whitened SPA may perform with respect to the abundances’ distribution (rather than in the worst-case scenario). To proceed, we formulate the pre-whitening preconditioner as

For , we have

Also, under Assumption 2, the above correlation matrix can be shown to be

We have the following lemma.

Lemma 6.

Under Assumption 2, the matrix is given by

(6)

where and . Also, the largest and smallest eigenvalues of are bounded by

respectively.

Proof.

It is known that for a random vector