Hyperspectral Image Unmixing with Endmember Bundles and Group Sparsity Inducing Mixed Norms
Abstract
Hyperspectral images provide much more information than conventional imaging techniques, allowing a precise identification of the materials in the observed scene, but because of the limited spatial resolution, the observations are usually mixtures of the contributions of several materials. The spectral unmixing problem aims at recovering the spectra of the pure materials of the scene (endmembers), along with their proportions (abundances) in each pixel. In order to deal with the intraclass variability of the materials and the induced spectral variability of the endmembers, several spectra per material, constituting endmember bundles, can be considered. However, the usual abundance estimation techniques do not take advantage of the particular structure of these bundles, organized into groups of spectra. In this paper, we propose to use group sparsity by introducing mixed norms in the abundance estimation optimization problem. In particular, we propose a new penalty which simultaneously enforces group and within group sparsity, to the cost of being nonconvex. All the proposed penalties are compatible with the abundance sumtoone constraint, which is not the case with traditional sparse regression. We show on simulated and real datasets that well chosen penalties can significantly improve the unmixing performance compared to the naive bundle approach.
I Introduction
Hyperspectral imaging, also known as imaging spectroscopy, is a technique which allows to acquire information in each pixel under the form of a spectrum of reflectance or radiance values for many –typically hundreds of– narrow and contiguous wavelengths of the electromagnetic spectrum, usually (but not exclusively) in the visible and infrared domains [Bioucas2013]. The fine spectral resolution of these images allow an accurate identification of the materials present in the scene, since two materials usually have distinct spectral profiles. However, this identification is made harder by the relatively low spatial resolution (significantly lower than panchromatic, color or even multispectral images). Therefore, many pixels are acquired with several materials in the field of view of the sensor, and the resulting observed signature is a mixture of the contributions of these materials. Spectral Unmixing (SU) is then a source separation problem whose goal is to recover the signatures of the pure materials of the scene (called endmembers), and to estimate their relative proportions (called fractional abundances) in each pixel of the image [Keshava2002].
Usually, since the abundances are meant to be interpreted as proportions, they are required to be positive and to sum to one in each pixel. Another classical assumption made for SU is that the mixture of the contributions of the materials is linear, leading to the standard Linear Mixing Model (LMM) [bioucas2012, 6678258]. Let us denote a hyperspectral image by , gathering the pixels () in its columns, where is the number of spectral bands, and is the number of pixels in the image. The signatures of the endmembers considered for the unmixing are gathered in the columns of a matrix . The abundance coefficients for each pixel and material are stored in the matrix . With these notations, the LMM writes:
(1) 
where is an additive noise (usually assumed to be Gaussian distributed). Eq. (1) can be rewritten in a matrix form for the whole image:
(2) 
where comprises all the noise values. We keep in mind the constraints on the abundances: Abundance Nonnegativity Constraint (ANC) and the Abundance Sumtoone Constraint (ASC) . This model is considered physically realistic when each ray of light reaching the sensor has interacted with no more than one material on the ground (the socalled checkerboard configuration) [bioucas2012]. A nice property of the LMM is that, combined with the constraints on the abundances, it provides a strong geometrical structure to the problem: indeed the ANC and ASC force the abundances to lie in the probability simplex with vertices, which we denote by . Since the data are obtained via a linear transformation of the abundances, they are also constrained to lie in a simplex, whose vertices are the endmembers. The actual signal subspace is then a dimensional subspace, embedded in the ambient space.
The typical blind SU chain comprises three steps:

Estimating the number of endmembers to consider. This is a very hard and illposed problem in itself (because there is no such thing as an optimal number of endmembers in real data, among other reasons) and many algorithms have been considered in the community to try to obtain a good estimate [Robin2015].

Extracting the spectra corresponding to the endmembers, a procedure referred to as endmember extraction. Then again, many Endmember Extraction Algorithms (EEAs) exist in the litterature to tackle this problem, with various assumptions, the main one being the presence in the data of pure pixels, i.e. pixels in which only one material of interest is present [Plaza2004]. These algorithms try to exploit the geometry of the problem by looking for extreme pixels in the data, which are the endmembers if the LMM holds. A popular EEA using the pure pixel assumption is the Vertex Component Analysis (VCA) [Nascimento2005].

Finally, estimating the abundances using the data and the extracted endmembers. This step is usually carried out by solving a constrained optimization problem:
(3) where denotes a vector of ones whose size is given in index, and is the Frobenius norm. Solving this problem is often referred to as Fully Constrained Least Squares Unmixing (FCLSU) [Heinz2001].
This unmixing chain is now standard and has been used with success over the last two decades. However, the main two limitations of this approach have been identified as nonlinear effects, and endmember variability. Nonlinear interactions may occur if light reaching the sensor has interacted with more than one material on the ground (e.g. in tree canopies, or in particulate media). These physical considerations have spanned a number of approaches taking some nonlinear effects into consideration; see [heylen2014, 6678284] for reviews on that subject. The other limitation, termed spectral variability, is simply based on the consideration that all materials have a certain intraclass variability, and therefore cannot be represented by a single signature, as in the conventional LMM [zare2014, drumetz2016endmember]. Note that even though both phenomena are usually considered separately, some approaches attempt to address both simultaneously [7508937]. Recently, a formal link between models addressing these two limitations was derived in [8022979].
Causes of variability of the materials from one pixel to the other include changing illumination conditions locally in the scene, especially because of a nonflat topography, which will change the local incident angle of the light and the viewing angle of the sensor. Also, physicochemical changes in the composition of the materials induce modifications on their spectra. For example, the concentration of chlorophyll in grass or soil moisture content can change the signatures of these two materials. Several approaches have been designed for variability aware unmixing, that is to correct the abundances due to the intraclass variations of the endmembers. The existing methods could be basically divided into two groups: dictionary or bundlebased approaches [bateson2000endmember, somers2012, xu2015image], which try to model an endmember by a certain number of instances of each material, and modelbased approaches, which define a specific model for the variation of the endmembers, be it computational [PLMM, 8264812, 7217817] or more physicsinspired [drumetztip]. Even though most methods dealing with spectral variability in the spatial domain, within a single image, studies dealing with temporal endmember variability have also been recently developed [henrot2016, thouvenin2016online].
This paper focuses on variability in the spatial domain using bundlebased approaches. Even though they lack the interpretability of their model based counterparts, they can still fit the framework of the LMM, while correcting the abundances with less assumptions on the data. The idea behind spectral bundles is to find a way to extract a dictionary of different instances of the materials directly from the data. Then one can simply replace the matrix in Eq. (2) by a dictionary made of spectral bundles , where is the total number of endmember candidates. Since this dictionary (and the abundance coefficients as well) is organized into groups, and since can be relatively large, it makes sense to consider that only a few atoms of the bundle dictionary are going to be used in each pixel. Moreover, a now common assumption in SU is that a few materials are active in each pixel, out of the considered materials. These two assumptions, though similar, are not equivalent. In any case, they justify the use of sparsity in the SU problem. For instance, sparse regression using libraries of endmembers (i.e. semisupervised unmixing) has been investigated in [Iordache2011] and exploited by many authors thereafter. It is interesting to note that sparse unmixing, when performed using convex optimization, and more precisely norm regularization, is not compatible with the ASC. We will come back to this issue in the next section. When dealing with coefficients organized in groups, a small number of which are supposed to be active, one may resort to approaches similar to the socalled group LASSO (for Least Absolute Shrinkage and Selection Operator) [meier2008group]. In SU, both approaches have been combined in semisupervised unmixing in [iordache2011hyperspectral], when the dictionary can be organized into clusters of signatures.
The contributions of this paper are threefold. We first show that when using the classical FCLSU to estimate the abundances of the materials in the context of endmember bundles, it is possible to recover pixelwise endmembers representing variability effects more subtly than with the signatures of the bundle dictionary, and we provide a simple geometrical interpretation for them. Second, further elaborating on the considerations and results of the conference paper [7532746], we improve the abundance estimation when bundles are used by considering three “social” sparsity inducing penalties [kowalski2009sparsity, kowalski2013social], whose interest we show by comparing their performance on simulated and real data to the classical naive approach. Third, while two of those penalties have already been used in various signal processing problems, the last one (a mixed fractional quasinorm with integers) is new to the best of our knowledge, and presents remarkable properties, while raising interesting technical challenges in its optimization.
The remainder of this paper is organized as follows: Section II presents in more detail the basic procedure to obtain a bundle dictionary from the image data, based on [somers2012]. While this is not the core of our method, it still serves as a basis for blind SU with bundles, and we present it here for the reader’s convenience. We also propose a new geometric interpretation to SU using spectral bundles with FCLSU. Section III introduces the proposed sparsity inducing penalties, and details how to handle the resulting optimization problem for abundance estimation in each case. Section IV tests the different penalties on synthetic and real data, and compares them to the standard bundle unmixing approach. Finally, Section V gathers some concluding remarks and possible future research avenues.
Ii Dealing with spectral variability with endmember bundles
Iia Automated Endmember Bundles
Extracting endmember bundles from the image data is a procedure whose goal is to obtain various instances of each endmember, still in a blind unmixing setting. We present and will use the socalled Automated Endmember Bundles (AEB) approach of [somers2012]. It is conceptually very simple and based on the following idea: since running a classical EEA once on the data provides the extreme points of the dataset, several instances of each material could be obtained by running an EEA on several subsets of the data. Then, one can randomly sample subsets of the data of size (pixels), and run an EEA on each subset in order to obtain endmember candidates, and therefore signatures in total. It is possible that for a given run, an EEA extract several signatures corresponding to the same material. Thus, it is not guaranteed that each endmember is represented by exactly instances. Some rare endmembers could also appear. Usually, however, the number of desired bundles is set as the estimated number of endmember for the whole image. In addition, even in a perfect case, the instances for each endmember are not aligned, i.e. the ordering of the extracted endmembers in different subsets is not the same, due to the stochasticity of the EEAs. Thus, a clustering step is required to group the signatures into bundles. In [somers2012], the clustering is performed using the kmeans algorithm, with the spectral angle as a similarity measure. The spectral angle has the nice property of being insensitive to scalings of the input vectors, which is a wellknown effect of spectral variability related to changing illumination conditions [drumetztip].
This idea of extracting several endmember candidates in different parts of the image is related to the recent Local Spectral Unmixing approaches, which handle variability by assuming it is less important in local (as opposed to random here) subsets of the image. In this type of methods, the whole unmixing process is entirely carried out in each (connected) subset. Some examples of local unmixing include [Goenaga2013, canham2011, Veganzones2014, tochon2016]. The main difference here is that we extract endmembers in random subsets, before gathering all those signatures into a common pool organized into groups, from which we estimate abundances for the whole image, not in local subsets.
In any case, the result of the clustering is a dictionary of endmember candidates. The clustering step defines a group structure^{1}^{1}1This denomination should not be understood in the sense of the algebraic structure. Instead, this simply means that the endmembers (and hence the abundance coefficients) are grouped into clusters, which we call groups here. on the abundance coefficients and on the dictionary. We denote this group structure by , and each group , contains signatures, so that representative of group in the dictionary is denoted as . Then there are columns in .
Let us note that obviously, the quality of the unmixing is going to be very dependent on the quality of the bundle itself. In some cases, a spectral library of endmembers incorporating some endmember variability may be available directly. In such a case, the methods we describe next still apply, and the unmixing paradigm becomes semisupervised, as in the sparse unmixing approaches [Iordache2011].
IiB Abundance Estimation
Now that the bundles are available, one has to estimate the abundances of the different materials, which can be done in multiple ways. The Fisher Discriminant Nullspace (FDN) approach [Jin2010] is a dimension reduction technique based on finding a linear transformation which maps the data to a subspace where the intra class variability of the bundles is minimized, while their inter class variability is maximized. The unmixing is then carried out in this new subspace, e.g. using the centroids of each class a the endmembers.
Multiple Endmember Spectral Mixture Analysis (MESMA) [Roberts1998] is a technique which aims at selecting in each pixel of the HSI the best endmember candidate for each material in terms of data fit. To do that, Problem (3) is solved using all possible combinations of (a small number of) candidate endmembers as the endmember matrix. However, the problem thus becomes combinatorial and is computationally expensive.
Machine learninginspired approaches have also been designed, using the fact that “training data” can be constructed by simulating mixtures of the different instances of the materials in various proportions [mianji2011, uezato2016].
Last, but not least, one can simply replace the endmember matrix with the bundle matrix in Eq. (3) and solve for the new abundance matrix in the exact same way, via constrained least squares. This approach is the most straightworward, but differs from the ones mentioned above insofar as several endmembers from each bundle can be simultaneously active, and summing the individual contributions of each material in each pixel is required to recover the abundances of the materials. This can be seen as a drawback, which makes the interpretation of the results harder. Of course, when the dictionary is not extracted from the data, and each instance represents a particular physical configuration of a material, this could indeed be an issue if the goal is to associate each pixel with a given physical parameter (though it is technically possible that several configurations of the same material can be present at the same time in a given pixel). Here, we show that this solution is actually theoretically more powerful than considering one instance per class, because it allows to define pixelwise endmembers which can be derived from the representatives of each material, and interpreted geometrically.
Indeed, we can write the LMM in the context of FCLSU with spectral bundles in one pixel (we drop the pixel index here to keep the notation uncluttered) in two different ways:
(4) 
where is the column of this dictionary, and is the abundance coefficient associated to . Now, if we want the global abundance of material to be , then we have to rewrite Eq. (4) as:
(5) 
with the total abundance coefficient for material in the considered pixel, and . This vector actually contains a new “equivalent” endmember for this pixel and material , associated with the global “intuitive” abundance coefficients. As a matter of fact, this new endmember is a weighted mean of all the available instances of this material, where the weights are the abundances extracted by FCLSU. Therefore, the normalized coefficients of this weighted mean are nonnegative and sum to one. This means that is a convex combination of the instances of the corresponding endmember. Geometrically, each equivalent endmember belongs to the convex hull of the elements of the bundle for this material. In this sense, finding abundances with FCLSU rather than MESMA allows more freedom in terms of spectral variability: the latter constrains each pixel to come from a combination of the extracted sources, while the former theoretically allows any point inside the convex hull of the each bundle to be a local endmember. This geometric interpretation is shown in Fig. 1. The perpixel equivalent endmember is of course only defined if at least one instance of group is active in this pixel. Otherwise, it makes no sense trying to extract spectral variability in a pixel from a material which is not present.
Similarly to what is done in the semisupervised sparse unmixing paradigm, it makes sense to assume that only a few atoms of the bundle dictionary are going to be active in each pixel. Hence, one can add a regularization term to the formulation (3) so as to enforce sparse abundance vectors:
(6) 
where denotes the sparsityinducing norm of the matrix, i.e. the sum of the absolute values of all its entries, and is a regularization term, weighting the sparsity related term w.r.t. the data fit term. Note that the ASC has been dropped here; we explain below why.
This approach is useful to induce sparsity in the estimated abundances, but lacks two important features:

The compatibility with the sumtoone constraint. Indeed, let us assume that the ASC if enforced in problem (6). Since the abundances are constrained to be nonnegative, a straightforward computation shows that . The norm of the abundance matrix is then constant, and then minimizing this quantity will result in breaking the constraint. This phenomenon is known and has been reported for instance in [6776522], where the nonconvex quasinorm instead is used to induce sparsity in the solutions.

The structure of the bundles is not taken into account during the unmixing. Each abundance coefficient is treated independently w.r.t. sparsity. The first reason why sparsity was brought to hyperspectral image unmixing is the assumption that a few materials, rather than atoms of a dictionary, are active in each pixel.
Iii Proposed abundance estimation method
Iiia Group sparsity inducing mixed norms
All the penalties we are going to detail in the following are based on applying a mixed norm on the abundance vector (in each pixel), which is endowed with a group structure , which partitions the endmembers extracted by the runs of VCA on random subsets into groups (as many as the number of materials to unmix). We drop the pixel index for simplicity of the notation. In the most general form, the group twolevel mixed norm is defined, for any two positive real numbers and as [kowalski2013social]:
(7) 
where is a subvector of the abundance vector () comprising all the abundance coefficients associated to the endmembers of group . This equation only defines a true norm for (and also for or , by taking limits). As explained in Fig. 2, the idea is to take the norm of each of the subvectors of coefficient defined by the groups, to store the results in a dimensional vector, for which we are going to compute the norm.
We will see that with smart choices of and , different types of sparsity can be obtained when this mixed norm is used as a regularizer in the unmixing with bundles. The definition of the norm can easily be extended to a matrix , using the same expression, operating columnwise, and summing the results on all pixels:
(8) 
Here, we are interested in norms which can handle any group structure, while enforcing several types of sparsity. For instance, the use of sparsity in SU is based on the assumption that a few materials are active in each pixel. If the dictionary of endmembers has a group structure, it makes sense to enforce sparsity on the number of groups, rather than on the total number of signatures. This rationale is the basis of the socalled group LASSO [meier2008group]. This method uses the norm, which enforces sparsity on the vector whose entries are the . This means that when one of these entries is zero, the whole group is discarded entirely since the vector has a zero norm. Within each group, there is no sparsity and thus most or all signatures are likely to be active. The effect of this penalty on a matrix is shown in Fig. 3.
In some cases, for example when we deal with a small number of groups, and we have reasons to believe that there is only one or few instances of each group which are active in a pixel, we may expect withingroup sparsity, without group sparsity. In this case the elitist LASSO penalty [kowalski2009sparsity] is suited to the problem, since it uses the norm, which promotes a small value of the norms of each . The effect of this penalty is shown in Fig. 4.
Using a penalty which enforces both group sparsity and global sparsity (on the total number of active signatures) also seems appealing. A seemingly natural way to do so with the proposed framework is to take both , to enfore sparsity on both norms that are computed in sequence. However, it turns out that the norm actually is the same as the regular norm, i.e. using these values for and actually breaks the group structure of the coefficients, and is of course still incompatible with the ASC. The sparse group LASSO [simon2013sparse] uses a combination of the group lasso penalty and a classical norm to benefit from both properties. It was recently used in a sparse SU context in [iordache2011hyperspectral]. However, in this case, benefiting from both penalties comes at the cost of having two regularization parameters to tune. In our case, recall that this penalty is also still at odds with the ASC due to the presence of the norm in the objective. The ASC can also be contradictory with sparsity in some other configurations: for instance, if each material has only one representative, the group LASSO reduces to the regular LASSO and the ASC conflicts with the objective. In order to avoid this issue, we are also using a fractional case, with the “norm”, with ( and ) and . This penalty is no longer a norm, because we lose convexity due to the fact that , but it has the advantage of enforcing both group sparsity and withingroup sparsity in a compact formulation, without conflicting with the ASC anymore. In addition, the quasinorm is a better approximation of the norm than the norm. The effect of the penalty on the abundance matrix is shown in Fig. 5. The choice of the fraction is an additional parameter, but we will see that it turns out to be relatively easy to tune.
Geometrically, the group lasso encourages a small number of materials (i.e. groups) to be active in each pixel, but tends to prefer a dense mixture within each group. Hence, the pixelwise endmembers tend to lie around the center of mass of each bundle. For the elitist penalty, a large number of groups are active in each pixel. However, within each bundle, a small number of endmember candidates will be active, which translates geometrically into the fact that the pixelwise endmbers will lie inside polytopes with a few number of vertices. The fractional penalty has the same effect as the elitist one, except than in addition only a few groups are going to be active in each pixel.
IiiB Optimization
With either of those penalties, the optimization problem to solve is:
(9) 
where, is the indicator function of the probability simplex with vertices, applied independently to every column of . Note that after solving this problem, in order to obtain the global abundances, one simply has to sum the abundances within each instance of each group, as described in section II. The optimization problem (9) is convex for both the group and elitist penalties, but not for the fractional one.
In any case, since the problem is not differentiable, we are going to use the Alternating Direction Method of Multipliers (ADMM) [boyd2011] to solve it. Convergence to the global minimum is automatically guaranteed for the group and elitist penalties. For the nonconvex case, as we will see, the situation is more complex. The ADMM was designed to tackle convex problems, but it has been more and more (successfully) used for nonconvex problems as well, and recent works [wang2015global] show that if the nonconvex function satisfies some conditions, the ADMM is proven to converge to a stationary point in the nonconvex case. One of these cases includes the quasinorm for . This results remains to be shown in the mixed norm with case, (but it is likely to satisfy the same conditions, being “less nonconvex” than the quasinorm, because the unit ball of such a norm will have some nonconvex facets, but not all since some of them will be similar as the facets of the ball).
The next two sections lay out the algorithm to solve the optimization problem with the group and elitist penalties, while the last one shows how to handle the fractional case.
IiiB1 Group and elitist cases
The optimization problem to solve in both cases is convex, and can be handled in the same fashion. In order to use the ADMM, we need to introduce auxilary split variables, which will decompose problem (9) into easier subproblems. Let us then rewrite this problem as:
(10) 
The ADMM consists in expressing the constrained problem defined in Eq. (10) in an unconstrained way using an Augmented Lagrangian (AL), and then minimizing it iteratively and alternatively for each of the variables introduced, before finally maximizing it w.r.t. the Lagrange Multipliers appearing in the AL (the socalled dual update). is the barrier parameter weighting the AL terms (which we will set to for all the algorithms). Here, the Augmented Lagrangian writes:
(11) 
where and are the set of dual variables.
Each of the updates is supposed to be straightforward. The update w.r.t. only involves quadratic terms. and the updates w.r.t. and can be readily obtained, provided the proximal operators [combettes2011proximal] of the mixed norm, and of the indicator function of the simplex can be easily computed. The latter can indeed be easily obtained with the algorithm of [condat2014fast]. For both the group and elitist penalties, the proximal operators have closed forms, which we provide now.
The proximal operator of the group penalty is simply a group version of the socalled block soft thresholding operator, i.e. the proximal operator of the norm:
(12) 
The block soft thresholding is denoted by , where is the scale parameter of this operator:
(13) 
where (and we have ).
The proximal operator for the elitist norm is a bit more complex, but has a closed form (derived in [kowalski2009sparse]), which involves the regular soft thresholding operator, the proximal operator of the norm. We recall that
(14) 
With this definition, the proximal operator of the elitist penalty is given as:
(15) 
where the soft thresholding is applied entrywise, and
(16) 
The ADMM procedure for any case is summarized in Algorithm 1, in which denotes the proximal operator of the function in index, and denotes the proximal operator of the indicator function of the simplex, a simple projection. denotes the identity matrix whose size is in index (we only provide one dimension for brevity since the matrix is square). Note that the proximal operators are carried out columnwise.
IiiB2 Fractional case
The problem is more complex for the fractional mixed norm. As we have pointed out above, there is no proof that the mixed norm with satisfies the required properties for the ADMM to converge to a local minimum. However, in our problem, with an appropriate variable splitting scheme, we can express the fractional case for problem (9) as a regularized constrained least squares problem.
Let us suppose for simplicity (and without loss of generality), that the rows of and the columns of have been sorted so that, in each pixel, the abundance vector has the following form:
Recall that is the number of instances of one of the groups, in this case .
The problem we want to solve is:
(17) 
with the total number of signatures in the bundle . With the following variable splitting scheme, Problem . (17) is equivalent to:
(18) 
with
(19) 
where , and the row having consecutive ones.
This way, we have reduced the optimization problem to a regularized least squares problem, where the variable on which the fractional norm is applied is a vector whose entries are the norms of the abundance coefficients in each group. Note that the new problem is equivalent to the original one only thanks to the nonnegativity constraint, which allowed us to turn the norm into linear constraints.
This way, the convergence of the ADMM for our nonconvex problem is, in theory, guaranteed [wang2015global], should we be able to compute exact updates for all the subproblems of the ADMM.
However, even after this simplification of the problem, an issue remains: there is no closed form expression or known algorithm (to the best of our knowledge) to compute exactly the shrinkage operator of the quasinorm (to the power ) when , except when or [cao2013fast]. Here, we prefer the term “shrinkage operator” to the term “proximal operator” because the latter is usually defined for convex functions only. In addition, this operator is a discontinuous function, because of the nonconvexity of the quasinorm [yukawa2013lp]. This limits the applicability of proximal methods to solve this type of problems, and other types of algorithms (or of nonconvex sparsity inducing penalties) have been investigated in remote sensing (see e.g. [tuia2016nonconvex] and references therein).
In our case, in order to be able to apply ADMM nonetheless, we need an explicit shrinkage operator. We resort to an approximate shrinkage operator , as defined in [woodworth2016compressed]:
(20) 
This operator reduces to the soft thresholding operator when and to the hard thresholding operator when . The hard thresholding operator is closely related to the shrinkage operator of the norm [woodworth2016compressed]. In addition, it can be shown ([woodworth2016compressed], theorem II.4) that the operator of Eq. (20) is actually the exact shrinkage operator of a nonconvex function (which we will denote as ) with desirable properties: it is separable w.r.t. each entry of , with , and the function is even, continuous, strictly increasing and concave for , differentiable everywhere except in 0, and satisfies the triangle inequality. This function behaves in a way similar to the absolute value for small values of its argument, and more like the absolute value taken to the power for larger arguments (see [woodworth2016compressed] for a graphical representation). For , this penalty function is the absolute value, but in general, unfortunately, there is no analytical expression for it. This result is interesting because we have an explicit shrinkage operator with nice properties to use with any proximal algorithm, to the cost of having a regularizer without an explicit expression. Nevertheless, the convergence of the ADMM in the nonconvex case remains to be proven, since we replaced the norm with another nonconvex penalty, which should itself satisfy the required properties of [wang2015global] in order to guarantee convergence.
Finally, the optimization problem we solve is:
(21) 
The AL writes:
(22) 
and we can use the ADMM algorithm to minimize it. The optimization procedure is described in Algorithm 2 (the approximate shrinkage is performed coordinatewise).
IiiC Computational complexity
For the three proposed algorithms, the most costly operations involved are matrix products and solving linear systems. Hence the computational complexity of the algorithms for the group and elitist penalties is (assuming ). The complexity for FCLSU is the same but a much faster convergence is expected since the constraints are less complex and much less iterations will be required to reach convergence. Since the fractional penalty involves additional matrix products using matrix , then the complexity for the corresponding algorithm is slightly higher and is . A significant increase in running time is then expected for the fractional penalty when the number of endmembers gets large.
Iv Results
Iva Synthetic data
First, we test the three proposed penalties in a semisupervised scenario, where the possible instances of each endmember are known beforehand. This case is similar to the now wellknown and widespread sparse unmixing approaches [Iordache2011, 7293641]. The main difference with respect to regular sparse unmixing is that here the spectral library incorporates several variants of each endmember in order to account for the variability of the scene. The group structure is then known a priori as well. The objective of this dataset is to compare the performance of the proposed penalties in terms of abundance estimation and variability retrieval, i.e. the core of the contribution of this paper, when the bundle extraction is assumed to have been carried out efficiently.
In order to show the interest of the proposed penalties, we designed a synthetic dataset corresponding to a challenging unmixing scenario, with 20 different materials present in the scene. We then selected randomly signatures from the United States Geological Survey (USGS) spectral library [kokaly2017usgs], comprising 224 spectral bands in the visible and near infrared parts of the electromagnetic spectrum. For each of those 20 signatures, spectral variants were generated by modifiying the spectra using several effects: scaling variations, accounting for local changes in illumination conditions [drumetztip], a nonlinear quadratic (in the original signature) perturbation to mimick intrinsic variability effets, and i.i.d white Gaussian noise. 20 such variants were randomly generated for each material, providing a dictionary of spectral signatures, divided into groups. In each pixel, only one signature is active for each endmember.
sparse (between 1 and 3 active materials in each pixel) abundance maps were computed using Gaussian Random Fields, so as to give them a spatial coherence (which is not used by the algorithm but is helpful to visualize the results).
We tested the performance of 4 algorithms on this dataset: the FCLSU algorithm, which does not take the group structure into account, and the three proposed penalties: group penalty, elitist penalty, and fractional penalty. In each case, we optimize the regularization parameter via a grid search to obtain the best abundance estimation performance (, see below). The FCLSU algorithm was implemented using ADMM as well, to allow a fair comparison for the running times of the various algorithms. We ran all the algorithms for 1000 iterations or stopped them when the relative variations (in norms) of the abundance matrix went below . For the fractional penalty, the grid search is also carried out w.r.t. the choice of the fraction (we tested values between and , with steps of ). To quantitatively assess the abundance estimation performance, we use the abundance Root Mean Squared Error between the estimated abundances for each material (summing the abundances of all the representatives for each group) and the true ones in each pixel:
(23) 
where the abundance matrix comprises only the global abundance coefficients.
To assess the precision of the abundance estimation within each group, we also use this metric for each individual atom of the dictionary, which we call in that case :
(24) 
where is the number of instances in each group, and here the abundance matrix used is the full bundle dictionary.
To compare the performance in terms of material variability retrieval, we also use the Root Mean Squared Error between the true signatures and the estimated ones:
(25) 
and the overall spectral angle mapper (SAM) between the true signature used for material in pixel and the estimated ones (using Eq. (5)):
(26) 
All the quantitative results are gathered in Table I.
\backslashboxMetricAlgorithm  FCLSU  Group  Elitist  Fractional 

0.0103  0.0072  0.0158  0.0064  
0.0166  0.0183  0.0169  0.0209  

0.0342  0.0339  0.0382  0.0337 
SAM (degrees)  2.026  1.749  2.148  1.738 
Running time (s)  16  73  74  95 
0.006  0.01  0.02 
From this table, we see that the elitist algorithm obtains the worst performance in terms of abundance estimation, followed by the FCLSU algorithm, while the group penalty largely outperforms those two when it comes to global abundances. The fractional penalty is able to refine these results slightly and obtains the best performance. If we look at what happens within groups, then the fractional penalty is outperformed by the others. We will see why it is so when we look at the abundance maps. The group and fractional penalties obtain the best endmember RMSE and SAM values, meaning that they recover the closest endmembers to the ground truth. In this configuration, the intergroup sparsity property of the abundances for those two penalties is crucial. The reason the fractional penalty further improves the estimation is that it does not encourage a too harsh within group sparsity, while enforcing inter group sparsity. On the other hand, this sparsity comes at the expense of a precise identification of which endmember candidates were actually used in each pixel. In terms of running time, except for FCLSU which is the fastest, the other approaches are notably slower. The fractional penalty is without surprise the slowest because the constraints in the optimization problem are more complex than for the other two penalties.
The optimal value of , among those we tested, for the fractional penalty is possibly because this makes the sparsity inducing term closer to the group norm. This shows the interest of the proposed approximation scheme to optimize this mixed fractional norm for any value of , instead of simply settling with the cases where or . The sensitivity to this parameter is not very critical, as can be seen in Fig. 6, where the optimal (over ) is reported for the different fraction values.
Fig. 7 shows the abundance maps for 5 of the 20 materials to unmix, for all the tested algorithms. The abundances for the FCLSU algorithm and the group penalty look very similar, with a slight improvement in terms of “noisiness” of the abundance maps fot the latter (e.g. for material 3). The abundances for the elitist penalty are slightly worse than the other methods, because although the elitist penalty encourages within group sparsity, it also tends to favor dense mixtures over the groups. The fractional penalty obtains the best visual results, by eliminating most of the noise in the abundances, and getting the support of each material right. However, it seems that it tends to locally make the abundances slightly too close to 1.
We also show in Fig. 8 a part of the abundance matrices obtained as images, to show the structure that each penalty induces on the coefficients. We confirm that FCLSU and the group penalty obtain similar performance, while the elitist penalty tends to incorporate spurious endmembers because of the between group density of the coefficients. The fractional penalty obtains a more accurate support of the endmembers than the other two, but at the price of sometimes overestimating the abundances because of a too violent sparse behavior, which is necessary to get rid of the noise but too harsh for a high precision abundance estimation.
IvB Real data
In this section, we test the proposed algorithms on a real hyperspectral image. The dataset is a subset of an image acquired over the University of Houston campus, Texas, USA, in June 2012. It was used in the 2013 IEEE GRSS Data Fusion Contest (DFC) [Debes2014]. The image comprises 144 spectral bands in the 380 nm to 1050 nm region, and comes with a LiDAR dataset acquired a day before over the same area, with the same spatial resolution (2.5 m). We are interested here in a subset of this image, acquired over Robertson stadium on the Houston Campus and its surroundings. Fig. 9 shows a RGB representation of the observed scene, as well as a high spatial resolution RGB image of the scene, and a LiDARderived digital elevation model. This dataset was used in several works on spectral unmixing and has been shown to exhibit significant variability effects [drumetztip]. It comprises several structures with locally changing geometries, namely the red metallic roofs which are pyramidal and whose facets correspond to different incidence and emergence angles from the light. The concrete stands also show different orientations with respect to the sun. Vegetation also incorporates variability, since it can be divided into two subclasses: trees and grass (which can in addition can be mixed with soil, which is interpreted as variability in the absence of a soil endmember).
Since a library of reference endmembers is unavalaible for this image (let alone incorporating variability), the unmixing chain has to be completely blind and we resort the the AEB method to extract bundles. We sample without replacement 10 subsets comprising 10% of the data pixels each, and extract endmembers on each subset using the VCA. Then we cluster the resulting dictionary into classes using the kmeans algorithm (with the cosine similarity measure). Therefore, in this case, we have . We then compare the performance of the different algorithms in several ways. We use for the fractional algorithm. The values of the regularization parameters are reported in Table II.
\backslashboxMetricAlgorithm  FCLSU  Group  Elitist  Fractional 
RMSE  0.0029  0.010  0.016  0.008 
SAM (degrees)  0.7240  2.7891  5.1944  2.3684 
Running time (s)  3  8  51  63 
2  0.4  0.5 
We first show the abundance maps obtained for each algorithm in Fig. 10. From those, we see that all algorithms seem to be able to reasonably explain the material variability present within the image, as opposed to using classical linear unmixing algorithms [drumetztip]. The group penalty is able to sparsify the abundance maps more than FCLSU does, providing clearer abundance maps for the red roofs and the painted structures which are next the the left and right stands of the stadium. However, the detection of the concrete stands themselves is slightly less accurate. The elitist penalty is able to identify the different structures in the image, but fails to consider them as pure because it favors a large number of groups to be simultaneously active in each pixel. The fractional penalty logically provides sparser abundance maps because it penalizes dense mixtures over the groups more aggressively than the group penalty. The structures are considered as purer, which makes sense for an urban scenario such as this one. However, the abundance maps are slightly noisier, with some neighboring pixels in homogeneous regions which do not share the same sparsity properties. This could be mitigated by using spatial regularizations on the abundance maps.
Then we show in more detail the abundance maps of 5 of the 10 representatives of the bundles corresponding to vegetation (Fig. 11), red roofs (Fig. 12), and concrete (Fig. 13). The idea is to find out if the various algorithms are able to provide a meaning to the different representatives within each bundle. First, for all materials, it seems that the group penalty tends to favor dense mixtures within each group, as expected, making it hard to give an interpretation to each endmember candidate. On the other hand, the elitist penaly obtains much sparser abundance maps within one group, and some endmember candidates can be interpreted (the football field is assigned to a single endmember candidate in the vegetation class, for example). The same goes for FCLSU, which seems to naturally enforce a certain sparsity within each group. Finally, the fractional penalty provides the most conclusive results. For each class, only the predominant signatures are retained and are interpretable, whereas the others are very sparse and some are almost entirely discarded (because they could be spurious, or redundant). For the vegetation class, three subclasses are identified by the algorithm (only two are clearly identifiable for FCLSU): football field, trees, and the area within the field which is mixed with soil. This is interpreted as variability by the model. The other less explicative abundance maps are made more sparse by the algorithm. For the red roof class, the facets of each roof are assigned to two different categories, depending on their orientation w.r.t. the sun, and a third map detects the other red roofs of the image, which are subject to other illumination conditions. The concrete stands are assigned to two different signatures for similar reasons. Note that the abundances of these subclasses are rarely exactly one; this means that the algorithms prefers to identify as a local endmember a convex combination of these two signatures, thus taking into account variability effects which are not sufficiently described by the endmember candidates only.
Finally, for completeness, we show in Table II the overall RMSE and SAM values for each algorithm (computed on the reconstruction of the pixels, in the absence of ground truth), the running times as well as the chosen values for the regularization parameter, when applicable. Without surprise, the RMSE and SAM values decrease when either of the three proposed penalties are used, since we compromise between accurate reconstruction of the data and structured sparsity of the abundance maps. We still note that apart from FCLSU the reconstruction errors are the best when the fractional penalty is used. Since is rather small here, the running times are more comparable for the different algorithms, except that FCLSU and the group penalty converge after fewer iterations.
V Conclusion
In this paper, we have proposed and compared three different sparsityinducing penalties, specifically aimed at tackling endmember variability in hyperspectral image unmixing, when a dictionary of endmember candidates is available, or when endmember bundles are extracted from the data. We provided a new geometric interpretation of unmixing hyperspectral data within this paradigm. Among the three proposed penalties, the group one enforces a natural interclass sparsity property of the abundance maps, but is hampered by the fact that it prefers dense mixtures within each group. Conversely, the elitist penalty tries to select the best endmember candidates within each group, but provide inconclusive results because it prefers dense mixtures over the groups. The new fractional penalty we propose leads to the best results: it allows to enforce both within and inter group sparsity at the same time in a single mixed quasinorm, and is compatible with the sumtoone constraint on the abundances. The interest of allowing to use any fraction is to provide more flexibility and better results than using the two cases the shrinkage operators have a closed form (, or ). It finally provides purer global abundance maps while giving clearer interpretations to subclasses than the other tested methods. Future work could include the combination of this penalty to spatial regularizations on the abundances, as well as an automatic way to tune the regularization parameter providing a compromise between sparsity and data fit. Finding a proof of convergence of the Alternating Direction Method of Multipliers (ADMM) in the approximation framework we use for the optimization would also be an interesting perspective.