WaveletBased Semantic Features
for Hyperspectral Signature Discrimination
Abstract
Hyperspectral signature classification is a quantitative analysis approach for hyperspectral imagery which performs detection and classification of the constituent materials at the pixel level in the scene. The classification procedure can be operated directly on hyperspectral data or performed by using some features extracted from the corresponding hyperspectral signatures containing information like the signature’s energy or shape. In this paper, we describe a technique that applies nonhomogeneous hidden Markov chain (NHMC) models to hyperspectral signature classification. The basic idea is to use statistical models (such as NHMC) to characterize wavelet coefficients which capture the spectrum semantics (i.e., structural information) at multiple levels. Experimental results show that the approach based on NHMC models can outperform existing approaches relevant in classification tasks.
I Introduction
Hyperspectral remote sensors collect reflected image data simultaneously in hundreds of narrow, adjacent spectral bands that make it possible to derive a continuous spectrum curve for each image cell. Such hyperspectral reflectance curves provide insight into the onground (or near ground) constituent materials in a single remotely sensed pixel.
The identification of ground materials from hyperspectral images often requires comparing the reflectance spectra of the image pixels, extracted endmembers, or ground cover exemplars to a training library of spectra obtained in the laboratory from well characterized samples. There is a rich literature on hyperspectral image classification (see [5] for a recent survey); however, classification methods emphasizing matching to a spectral library and material identification have received less attention [6, 7, 8]. On the one hand, many methods rely on nearest neighbor classification schemes based on one of many possible spectral similarity measures to match the observed test spectra with training library spectra. On the other hand, practitioners have designed feature extraction schemes that capture relevant information, in conjunction with appropriate similarity metrics, in order to discriminate between different materials.
Classification methods based on spectral similarity measures can provide researchers with simple implementation and relatively small computational requirements; however, there is a tradeoff with the amount of storage required for the training spectra as well as with the uneven performance of nearest neighbor methods. For example, in some cases taking the whole spectrum into consideration brings a large amount of redundant information to practitioners, while the role of relevant structural features is weakened.
Practitioners recognize several structural features in the spectral curves of each material as “diagnostic” or characteristic of its chemical makeup, such as the position and shape of absorption bands. Several approaches like the Tetracorder [9] have been proposed to encode such characteristics. However, such techniques require the construction of adhoc rules to characterize instances of each material while new rules must be created when spectral species which were not previously analyzed are added. Parente et al. [10] proposed an approach using parametric models to represent the absorption features. However, it still requires the construction of specific rules to match observations to a training library.
In this paper, we consider the formulation of an information extraction process from hyperspectral signatures via the use of mathematical models for hyperspectral signals. Our goal is to encode the signature’s scientifically meaningful structural features into numerical features, which are referred to as semantic features, without adhoc rules for the spectra of any material type. Our proposed method provides automated extraction of semantic information from the hyperspectral signature, in contrast with the aforementioned diagnostic characteristics designed by hand by expert practitioners. Furthermore, no new rules should need to be constructed when mineral species which were not analyzed before are added.
Mathematical signal models have been used to represent reflectance spectra. More specifically, models leveraging wavelet decompositions are of particular interest because they enable the representation of structural features at different scales. The wavelet transform is a popular tool in many signal processing applications due to the capability of wavelet coefficients to characterize signal discontinuities at different scales and offsets. As mentioned above, the semantic information utilized by researchers is heavily related to the shape of reflectance spectra, which is succinctly represented in the magnitudes of its wavelet coefficients. A coefficient with large magnitude generally indicates a rapid change in its support while a small wavelet coefficient generally implies a smooth region. Existing wavelet approaches are limited to filtering techniques but do not extract features [6, 7].
In this paper, we apply hidden Markov models (HMMs) to the wavelet coefficients derived from the observed hyperspectral signals so that the correlations between wavelet coefficients at adjacent scales can be captured by the models. The HMMs allow us to identify significant vs. nonsignificant portions of the hyperspectral signatures with respect to the database used for training. The applications of HMMs for this purpose is inspired by the hidden Markov tree (HMT) model proposed in [11]. As for the wavelet transform, we use an undecimated wavelet transform (UWT) in order to obtain maximum flexibility on the set of scales and offsets (spectral bands or wavelengths^{1}^{1}1We use these three equivalent terms interchangeably in the sequel.) considered.
Our model for a spectrum encompassing spectral bands takes the form of a collection of nonhomogeneous hidden Markov chains (NHMCs), each corresponding to a particular spectral band. Such a model provides a map from each signal spectrum to a binary space that encodes the structural features at different scales and wavelengths, effectively representing the semantic features that allow for the discrimination of spectra. To the best of our knowledge, the application of statistical wavelet models to the automatic selection of semantically meaningful features in hyperspectral signatures has not been proposed previously.
This paper is organized as follows. Section II introduces the mathematical background behind our hyperspectral signature classification system and reviews relevant existing approaches for the hyperspectral classification task. Section III provides an overview of the proposed feature extraction method, including details about the choice of mother wavelet, statistical model training, and label computing; we also show examples of the semantic information in hyperspectral signatures captured by the proposed features. Section IV describes our experimental test setup as well as the corresponding results. Some conclusions are provided in Section V. Finally, proofs of our theoretical results are presented in the appendix.
Ii Background and Related Work
In this section, we begin by discussing several existing spectral matching approaches. Then, we review the theoretical background for our proposed hyperspectral signature classification system, including wavelet analysis, hidden Markov chain models, and the Viterbi algorithm.
Iia Spectral Matching Measures
A direct comparison of spectral similarity measures taken on the observed hyperspectral signals is the easiest and the most direct way to do spectral matching. Generally speaking, spectral similarity measures can be combined with nearest neighbor classifiers. In this paper we use four commonly used spectral similarity measures. To present these measures, we use and to denote the reflectance or radiance signatures of two hyperspectral image pixel vectors
IiA1 Spectral Angle Measure
The spectral angle measure (SAM) [12] between two reflectance spectra is defined as
A smaller spectral angle indicates larger similarity between the spectra.
IiA2 Euclidean Distance Measure
The Euclidean distance measure (ED) [13] between two reflectance spectra is defined as . As with SAM, smaller ED implies larger similarity between two vectors. The ED measure takes the intensity of two reflectance spectra into account, while the former is invariant to intensity.
IiA3 Spectral Correlation Measure
The spectral correlation measure (SCM) [14] between two reflectance spectra is defined as
where is the mean of the values of all the elements in a reflectance spectrum vector . The SCM can take both positive or negative values; larger positive values are indicative of similarity between spectra.
IiA4 Spectral Information Divergence Measure
The spectral information divergence measure (SID) [15] between two reflectance spectra is defined as , where is regarded as the relative entropy (or KullbackLeibler divergence) of with respect to , which is defined as
Here corresponds to a normalized version of the spectrum at the spectral band, which is interpreted in the relative entropy formulation as a probability distribution.
IiB Wavelet Analysis
The wavelet transform of a signal provides a multiscale analysis of a signal’s content which effectively encodes the locations and scales at which the signal structure is present in a compact fashion [16]. To date, several hyperspectral classification methods based on wavelet transform have been proposed. Most of these classification approaches (e.g. [8, 17, 18]) employ a dyadic/decimated wavelet transform (DWT) as the preprocessing step. Compared with UWT, the DWT provides a more concise representation because it minimizes the amount of redundancy in the coefficients. However, the tradeoff for such redundancy is that UWT provides maximum flexibility on the choice of scales and offsets used in the multiscale analysis, which is desired because it allows for a simple characterization of the spectrum structure at each individual spectral band.
A onedimensional realvalued UWT of an sample signal is composed of wavelet coefficients , each labeled by a scale and offset , where . The coefficients are defined using inner products as , where denotes a sampled version of the mother wavelet function dilated to scale and translated to offset :
To improve the interpretability of the notation, we will change our notation for scales in the sequel from to (i.e., we reverse the ordering of the scales). With this change, small values of correspond to coarse scales while large values of correspond to fine scales. All the coefficients can be organized into a twodimensional matrix of size , where rows represent scales and columns represent wavelengths. In this case, each coefficient , where , has a child coefficient at scale . Similarly, each coefficient at scale has one parent at scale . Such a structure in the wavelet coefficients enables the representation of fluctuations in a spectral signature by chains of large coefficients appearing within the columns of the wavelet coefficient matrix .
IiC Advantages of Haar Wavelet
The Haar wavelet is the simplest possible compact wavelet which has the properties of squarelike shape and discontinuity. These properties makes the Haar wavelet sensitive to a larger range of fluctuations than other mother wavelets and provides it with a lower discriminative power. Thus, the Haar wavelet enables the detection of both slowvarying fluctuations and sudden changes in a signal [16], while not particularly sensitive to small discontinuities (i.e., noise) on a signal, in effect averaging them out over the wavelet support.
Consider the example in Fig. 1, where the figure at the top represents an example hyperspectral signature, while the figures in the middle and at the bottom show the undecimated wavelet coefficient matrix of the spectrum under the Haar and Daubechies4 wavelets, respectively. The middle figure in Fig. 1 shows the capability of Haar wavelets to capture both rapid changes and gently sloping fluctuations in the sample reflectance spectrum.
Similarly, the bottom figure shows that the Daubechies4 wavelet is sensitive to compact and drastic discontinuities (i.e., higher order fluctuations that are often due to noise). Thus, the Daubechies4 wavelet does not provide a good match to semantic information extraction for this example reflectance spectrum. Intuitively, these issues will also be present for other higherorder wavelets, which provide good analytical matches to functions with fast, highorder fluctuations.
In general, wavelet representations of spectral absorption bands are less emphasized under Haar wavelet than under other higher order wavelets. However, this drawback can be alleviated using discretization, which will be described in the next subsection.
IiD Statistical Modeling of Wavelet Coefficients
Crouse et al. [11] proposed the use of hidden Markov models (HMM) to capture the statistics of DWT coefficients. In that paper, the dyadic nature of DWT coefficients gives rise to a hidden Markov tree (HMT) model that characterizes the clustering and persistence properties of wavelet coefficients. The statistical model is constructed based on the wavelet representation of spectra in a training library.
The statistical model is motivated by the compression property of the DWT, which states that the wavelet transform of a piecewise smooth signal generally features a small number of large coefficients and a large number of small coefficients. This property motivates the use of a zeromean Gaussian mixture model (GMM) with two Gaussian components to capture the compression property, where one Gaussian component with a highvariance characterizes the small number of “large” coefficients (labeled with a state ), while a second Gaussian component with a lowvariance characterizes the large number of “small” wavelet coefficients (labeled with a state ). The state of a wavelet coefficient^{1}^{1}1Since the same model is used for each chain of coefficients , , we remove the index from the subscript for simplicity in this sequel whenever possible. is said to be hidden because its value is not explicitly observed. The likelihoods of the two Gaussian components and should meet the condition that . The conditional probability of a particular wavelet coefficient given the value of the state can be written as , where , and the distribution of the same wavelet coefficient can be written as .
In cases where a UWT is used, the persistence property of wavelet coefficients [19, 20] (which implies the high probability of a chain of wavelet coefficients to be consistently small or large across adjacent scales) can be accurately modeled by a nonhomogeneous hidden Markov chain (NHMC) that links the states of wavelet coefficients in the same offset. This means the state of a coefficient is only affected by the state of its parent (if it exists) and by the value of its coefficient . The Markov chain is completely determined by the likelihoods for the first state and the set of state transition matrices for the different parentchild label pairs for :
(1) 
where for .
The training process of an HMM is based on the expectation maximization (EM) algorithm which generates a set of HMM parameters
including the probabilities for the first hidden states, the state transition matrices, and Gaussian variances for each of the states.
We define the matrix containing the collection of state values for all scales and spectral bands.
The iterative parts of the algorithm can be briefly described as follows:

E step: Perform maximum likelihood estimation of the state labels using a forwardbackward algorithm [21]:
this joint conditional probability mass function (PMF) will be used in the M step.

M step: Update model parameters to maximize the expected value of the joint likelihood of the wavelet coefficients and state estimates [11]:

Set . If converged, then stop; otherwise, repeat.
IiE Waveletbased Spectral Matching
Many hyperspectral signature classification approaches have been proposed in the literature, with a subset of them involving wavelet analysis [6, 7, 8, 22, 23]. In this paper, we review two approaches that are particularly close in scope to our proposed method, which will be used for comparison in our numerical experiments. Since our focus in this paper is on hyperspectral classification for individual pixels, we limit our comparison to methods that rely exclusively on the spectral of a given pixel or on features obtained from the pixel’s spectra. More specifically, we do not compare to other methods that use additional information (e.g. spatial information for a HSI) or that assume prior knowledge of the location of semantic information, which is usually obtained from an expert practitioner.
First, Rivard et al. [6] propose a method based on the wavelet decomposition of the spectral data. The obtained wavelet coefficients are separated into two categories: lowscale components of power (LCP) capturing mineral spectral features (corresponding to the first fine scales), and highscale components of power (HCP) containing the overall continuum (corresponding to coarser scales). The coefficients for the LCP spectrum, which capture detailed structural features, are summed across scales at each spectral band. This process can conceptually be described as a filtering approach, since the division into LCP and HCP effectively acts as a highpass filter that preserves only the finescale detailed portion of the spectrum.
A second waveletbased classification approach is proposed in [7]. This second approach applies an UWT on the entire database. The set of wavelet coefficients for each separate wavelength is considered as a separate feature vector. Linear discriminant analysis (LDA) is performed on each one of these vectors for dimensionality reduction purposes. The outputs are grouped into classes, corresponding to the elements of interest, to train either a single multivariate Gaussian distribution or a GMM for each of the classes, where a classification label or score is obtained for each wavelength. Finally, decision fusion is performed among the wavelengths to obtain a single classification label for the spectrum. It is implicitly expected by this method that the number of training samples for each one of the classes is sufficiently large so that the classspecific Gaussian (mixture) models can be accurately constructed.
Iii NHMCBased Feature Extraction and Classification
In this section, we introduce a feature extraction scheme for hyperspectral signatures that exploits a Markov model for the signature’s wavelet coefficients. A wavelet analysis is used in an UWT to capture information on the fluctuations of the spectra. The state labels extracted from the Markov model represent the semantic information relevant for hyperspectral signal processing.
Iiia MultiState Hidden Markov Chain Model
In our system, we choose to use the NHMC model described in Section IID applied to the UWT via the Haar wavelet. We select the Haar wavelet due to its special shape, which allows for the magnitude of the wavelet coefficients to be proportional to the slope of the spectra across the wavelet’s support. Furthermore, the signs of these coefficients are indicative of the slope orientation (increasing or decreasing for negative and positive, respectively).
In contrast to the prior work of [11], we design our NHMC to feature state GMMs for the wavelet coefficients. We increase the number of states from 2 to because a twostate zeromean GMM provides an overly coarse distinction between sharper absorption bands (fluctuations) and flatter regions in a hyperspectral signature, which are usually assigned large and small state labels, respectively. In our cases of interest, spectrum classification requires a labeling granularity for the signature fluctuations that is finer than that achieved by binary labels.
We associate each wavelet coefficient with an unobserved hidden state , where the states have prior probabilities for . Here the state represents smooth regions of the spectral signature, in a fashion similar to the small () state for binary GMMs, while represent a more finely grained set of states for spectral signature fluctuations, similarly to the large () state in binary GMMs. All the weights should meet the condition . Each state is characterized by a zeromean Gaussian distribution for the wavelet coefficient with variance . The value of determines which of the components of the mixture model is used to generate the probability distribution for the wavelet coefficient : . We can then infer that . In analogy with the binary GMM case, we can also define a transition probability matrix
where . Note that the probabilities in the diagonal of are expected to be larger than those in the offdiagonal elements due to the persistence property of wavelet transforms. Note also that all state probabilities for can be derived from the matrices and .
The training of the GMM NHMC is also performed via an EM algorithm. Because of the overlap between wavelet functions at a fixed scale and neighboring offsets, adjacent coefficients may have correlations in relative magnitudes [24]. However, for computational reasons, in this paper we only consider the parentchild relationship of the wavelet coefficients in the same offset. Namely, we train an NHMC separately on each of the wavelengths sampled by the hyperspectral acquisition device. The set of NHMC parameters of a certain spectral band include the probabilities for the first hidden states , the state transition matrices , and the Gaussian variances . In the sequel, we remove from the parameters the dependence on the wavelength index whenever possible.
IiiB Label Computation
Given the model parameters , the state label values for a given observation are obtained using a Viterbi algorithm [21, 11]. For a particular wavelet coefficient , a dimensional conditional probability vector is defined with elements being the conditional PMF of the wavelet coefficient
under each possible state value . A variable is defined as the “best score” that ends in a particular state at scale from its previous state, while the variable is the most likely state at a particular scale to have children with state . The definitions of the two variables are
(2)  
(3)  
(4)  
(5) 
for and . The algorithm also returns the likelihood of a wavelet coefficient matrix under the model as a byproduct. We propose the use of the state label array as classification features for the original hyperspectral signal . It is easy to identify the presence of such features simply by inspecting the labels obtained from the NHMC.
IiiC Additional Modifications to NHMC
As mentioned above, because of the shape of the Haar wavelet function, the signs of Haar wavelet coefficients of a reflectance spectrum capture whether the slopes increase or decrease as a function of wavelength. This characteristic of Haar wavelet coefficients can be utilized to design state labels that capture the slope orientations of the corresponding reflectance spectra. Thus, we make a simple modification by adding the sign of a Haar wavelet coefficient to its counterpart in the corresponding state label matrix. Fig. 2 shows the effect of adding signs to state label matrices. The top two figures represent the reflectance spectrum of a sample material and the corresponding Haar wavelet coefficient matrix, while the bottom two show the corresponding state label matrices with and without being added wavelet coefficient signs, respectively. The figure shows that the fluctuations in the region are predominantly not detected by state labels. Furthermore, one can see many narrow chains of “large” state labels starting at . Increasing the number of GMM state enables a finerscale quantization of spectral signature fluctuations, which is somewhat analogous to increasing the quantization resolution for our wavelet analysis. This is quite important when the Haar wavelet is used due to its sensitivity to a large range of fluctuation orders, which implies a relatively low discriminative power when compared with higherorder wavelet transforms.
Unfortunately, a large number of GMM states might also have negative influence on classification results. The GMM state of a particular wavelet coefficient is determined by the coefficient’s magnitude with respect to those for the rest of the NHMC training spectra, the state label of its parent , and the transition probability matrix . In practice, this dependence causes different maps between coefficient value ranges and GMM states across scales and offsets . This variance often makes it difficult to assess the semantic information in the label array of a spectral signature. In practice, this variance may sometimes affect the interpretability of features obtained from GMM labels. Furthermore, the likelihood of such variability in the valuetostate mappings could increase when we use multistate GMM. Additionally, such variance may have a particularly negative influence on classification schemes based on NN classifiers that act on GMM state label vectors. Thus, we desire a modification to the model that features the simplicity of a binarystate GMM (to preclude mismatch in coefficienttostate mappings across wavelengths and states) and the spectral fluctuation characterization capability of a multistate GMM (providing finer fluctuation characterization than a binarystate GMM).
We propose a solution that combines the advantages of a binarystate GMM and a state GMM, where . Our modified wavelet coefficient statistical model consists of a binarystate NHMC with a “small” state () modeled by a standard zeromean Gaussian distribution and a “large” state () modeled by a mixture of 1 Gaussian distributions. Note that we use numbers here instead of letters for the state labels to distinguish between the state GMM NHMC and the state MOG NHMC. We denote this modified model mixture of Gaussians (MOG) NHMC in the sequel. As desired, this modified model maintains the discriminability between smooth regions and absorption bands in spectral signatures, while providing classification features (binary labels, in this case) that decrease the likelihood of the variability stated above.
In order to obtain a MOG NHMC model, the first step is to train a state GMM NHMC model that yields state labels . After that, all the states are quantized into two states so that we can get a MOG NHMC that yields state labels with probabilities , . One can show that the change of models lead to the following mapping for labels:
(6) 
Similar to (1), we can define a transition probability matrix
for the MOG NHMC, where for and . We have the following pair of intuitive results, whose proves are presented in Appendices A and B.
Lemma 1.
Denote the vector of state probabilities for a wavelet coefficient under the state GMM NHMC as . The corresponding vector of probabilities for the MOG NHMC states can be written as follows:
Lemma 2.
The elements of the MOG NHMC transition matrix can be written in terms of the elements of the GMM NHMC transition matrix as follows:
(7)  
(8)  
(9)  
(10) 
Here and represent state labels ranging from to .
Below is an example of the transform of a state probability vector and transition probability matrix, respectively, where the original number of state is 4:
Correspondingly, we also make small modifications to the label computation scheme from Section IIIB. For the MOG NHMC, equations (2–5) become
respectively, for and . The required conditional probabilities involving can be written as given in the following lemma.
Lemma 3.
The stateconditional probabilities for the MOG NHMC can be given in terms of the stateconditional probabilities for the GMM NHMC as follows:
where denotes a state label ranging from to .
We provide an example comparison between labels obtained from the GMM NHMC and the MOG NHMC in Fig. 3.
IiiD Illustration of Extracted Semantic Information
The state label arrays obtained from the NHMC model characterize four important semantic features of the corresponding hyperspectral signatures: () the orientations of the signature slope, which is reflected in the state label values; () the extent of the signature slope, which is reflected in the duration of corresponding state label values through different wavelengths; () the intensity of the signature slope, which is reflected on the depth of the corresponding state label values through the scales (when GMMs are used); and () the locations of the absorption bands, which are reflected in the locations at which the labels switch from to . In order to showcase the semantic information captured by our designed features, we illustrate these four types of semantic features in several example reflectance spectra. For convenience of illustration, we only use state label arrays based on MOG due to its binary property, which only reflects the orientation of slopes regardless of the intensities. To begin, we calculate the mean of each column in a state label array and then transform it to an integer by using round. In this way, we obtain what we call a state label mean vector of the same length as the corresponding reflectance spectrum whose possible element values are and . Figure 4 shows four example reflectance spectra with the corresponding extracted semantic information based on an NHMC using an MOG with 2 states as well as the corresponding state label arrays. We plot the reflectance spectral curve by using three different colors to encode the value of the state label mean vector: green, red, and blue portions represent wavelengths for which the state label mean vector elements are , , and , respectively. Finally, we calculate all the middle points between the end of a ’s series and the beginning of a ’s series, and mark those points on the plotted reflectance spectra to find the locations of absorption bands. As expected, spectral curves in Fig. 4 have blue increasing slopes, red decreasing slopes, and green flat regions.
IiiE Classification System Overview
We provide an overview of the NHMCbased hyperspectral classification system in Fig. 5. The system consists of two modules: an NHMC model training module and a classification module. While the figure assumes a binarystate Gaussian mixture model (GMM) in the NHMC, as described in Section IID, one can easily formulate a ary GMM state variant, , as described in Section III. The training stage uses a training library of spectra containing samples from the classes of interest to train the NHMC model, which is then used to compute state estimates for each of the training spectra using a Viterbi Algorithm. The state arrays obtained from the NHMC model will then be used as classification features coupled with a classification scheme, e.g., nearestneighbor (NN) or support vector machine (SVM) classification. The testing module considers a spectrum under test and computes the state estimates under the trained NHMC model using the parameters obtained during training. The module then applies the classification scheme being tested, returning the class label of the selected training spectrum.
Iv Classification Experiments and Result Analysis
In this section, we present multiple experimental results that assess the performance of the proposed features in hyperspectral signature classification. We also study the effect of NHMC parameter selections on the classification performance from the corresponding extracted features.
Iva Study Data and Performance Evaluation
The dataset used in this paper is a part of the RELAB spectral database with 26 mineral reflectance spectrum classes. Since the spectra in the original database have different wavelength ranges, we only use the spectral region from 0.35 to 2.6 (if applicable) which contains almost all of the visible and nearinfrared region of the electromagnetic spectrum. We only use the spectra with spectral resolution being 5 nm to eliminate the differences in spectral resolution in different sources. A different number of samples is present in each mineral class. Thus, in order to ensure the same weight of each class in the training process, we use the Hapke mixing model [25] to generate additional mixtures of existing spectra in a given class until all classes have the same number of samples. We do this to prevent different mineral types from having different contributions to the model obtained and influencing the final classification accuracy. The final dataset contains 1690 reflectance spectra with each class including 65 reflectance spectra. Additionally, in order to eliminate the influences caused by illumination conditions, we perform normalization on the whole database by dividing each reflectance spectrum by its maximum value.
We compare different NHMC models (both GMM and MOG with different number of mixed Gaussian components and with/without assigning Haar wavelet coefficient signs to state labels). We first randomly separate the dataset into a training library (including 1352 samples with each class containing 52 reflectance spectra) and a test set (including 338 samples with each class containing 13 reflectance spectra). In order to evaluate the performance of different NHMCbased features, we train these NHMCbased features on the training library. Then we use the Viterbi algorithm to obtain the corresponding state labels for both the training library and test set and use both linear and nonlinear classifiers (nearest neighbor (NN) classifier, support vector machine (SVM) classifier) on the test set to evaluate the classification accuracy of different models.
Unfortunately, the resulting dataset features a significant separation between the different classes, and so it is difficult to differentiate the performances of the different proposed methods, which are very high. In order to discriminate among the methods, we introduced mixing into the database as an attempt to increase the variability among reflectance spectra in each given class. Our mixing methodology is designed to resemble the image blurring process common in hyperspectral imaging. First, we randomly order the reflectance spectra in the database into a 3D array (a socalled datacube) with two spatial dimensions and one spectral dimension. We then perform identical spatial blurring on each wavelength using a pixel Gaussian smoothing operator. Finally, we build a new library from the blurred pixels’ spectra while retaining the original labels. By performing this imagebased blurring, each spectrum in the resulting database exhibits a mixture of structural features from spectra in multiple classes, which provides a more challenging spectrum classification setup. We vary the Gaussian blurring kernel variance among a range of values to adjust the amount of mixing performed: the dominant material percentage (DMP) of the original pixel in the corresponding blurred pixel is obtained as the normalized weight of the central element in a Gaussian smoothing operator. In our experiment, we vary the DMP from to with a step of .
IvB Feature Comparison
For this study, classification performance is evaluated by using NN and SVM classification accuracies. For the NN classifier, three distance metrics are employed: distance, Euclidean () distance, and cosine similarity measure. For the SVM classifier, we use radial basis function (RBF) as the kernel and perform a grid search for the corresponding parameter values (cost and Gaussian variance) that provide best performance for each NHMC model. Both the NHMC model (if applicable) and the classifier (NN or SVM) are trained using the aforementioned training set, and the performance is measured on the aforementioned test set.
Figure 6 shows the classification rates for different NHMC models under different dominant material percentages using the aforementioned NN and SVM classifiers. Additionally, the figure also includes the classification accuracies of the related approaches described in Section IIE. In the figure, different classification features are identified as follows: “Rivard” denotes the approach proposed in [6];^{2}^{2}2Note that “Rivard” only appears in the bottom left figure of Fig. 6 because it is defined specifically in terms of a NN classifier with cosine distance [6]. “Wavelet Coefficient” denotes the classification scheme of using wavelet coefficients as classification features; “Spectral Similarity” denotes spectral similarity matching classification scheme (i.e., the spectra themselves are the input to each NN classifier); “GMM” denotes an NHMC featuring Gaussian mixture models; “MOG” denotes an NHMC featuring mixtures of Gaussians; and “GMM+Sign” and “MOG+Sign” denotes the previous two approaches where Haar wavelet coefficient signs being added to state labels. Our NHMC tests involve NHMC models containing different numbers of mixed Gaussian components; Fig. 6 shows the highest performance among all tested values for the number of mixed Gaussian components, and Tables IIV list the bestperforming values for each DMP.
70%  75%  80%  85%  90%  95%  100%  
GMM  10  6  10  2  710  810  310 
GMM+sign  10  6  4,5,10  2  79  210  210 
MOG  4,7,8  5  7  3,5,79  57,9  610  310 
MOG+sign  4  10  3  10  9  310  310 
70%  75%  80%  85%  90%  95%  100%  

GMM  5,10  5  10  2  9  68,10  310 
GMM+sign  6  3  4,9,10  5  6,9  210  210 
MOG  7,8  5  7  3,5,79  57,9  610  310 
MOG+sign  4,9  4,10  3  710  9  310  310 
70%  75%  80%  85%  90%  95%  100%  
GMM  6  6  5,9,10  4  5,9  6,7,9,10  310 
GMM+sign  9  6,9  3,5  5  5,7  210  210 
MOG  4  4  7  8  57,9  610  310 
MOG+sign  3,5  5  3  68  9  310  310 
70%  75%  80%  85%  90%  95%  100%  
GMM  8  10  4,710  3  410  310  210 
GMM+sign  2  10  2  3,5,8,9  410  210  210 
MOG  6  7  6,7,9,10  9  4  510  310 
MOG+sign  3  3,57  4,5  310  310  310  310 
We highlight some features of the obtained results:

In most cases, the use of signs in the NHMC features improves performance with respect to their original counterparts.

In the NN classifiers, GMM performs better than MOG for lower DMPs, which are more challenging settings, while MOG with additional signs outperforms GMM for DMPs closer to 100%. Nonetheless, in most cases MOG without wavelet coefficient signs provides the worst performance.

While the performance of NHMC methods with SVM classifiers is higher than that obtained with NN classifiers, they are outperformed by the wavelet coefficient approach. We conjecture that this is due to the discrete nature of NHMC labels, which are not as easily leveraged in the SVM’s search for a separation boundary from support vectors.
We also attempted to implement the approach proposed in [7]. However, because of the lack of sufficient data for individual classes, we obtained several illconditioned covariance matrices when constructing multivariate GMMs. Thus, we do not include the comparison with this approach in this paper.
IvC NHMC Parameters
Next, we evaluate the effect of the number of states included in the NHMC model on the performance of the tested classifiers. We set the DMP to 85% for concreteness, and evaluate the classification performance of all proposed NHMC features with NN and SVM classifiers as a function of the number of states, which varies between 2 and 10 for GMM and between 3 and 10 for MOG. Fig. 7 shows the variation tendency of classification accuracy with increasing number of mixed Gaussian components using different classifiers and similarity metrics.
From these four figures, we see that MOG with additional wavelet coefficient signs provides relatively consistent performance compared with other NHMCbased models. Additionally, in terms of classification accuracy, the two model configurations using MOG provide two performance extremes: by adding wavelet coefficient signs we obtain the highest classification performance, while MOG without signs provides the lowest one. As mentioned earlier, MOG combines the simplicity of a binarystate GMM and the spectral fluctuation characterization capability of a multistate GMM. In that case, if we do not consider the signs of the wavelet coefficient, spectra that have approximately matching locations for their fluctuations while exhibiting differing magnitudes and orientations will be matched to similar MOG label vectors. The reason is that a binarystate GMM form could assign the same state labels to several fluctuations of different levels and orientations. However, if Haar wavelet coefficient signs are added, the state labels better reflect the spectral fluctuation orientation information.
V Conclusion
In this paper, we proposed the design of a feature extraction scheme for hyperspectral signatures that preserves the semantic information used by practitioners in signature discrimination (i.e., location of distinguishing fluctuations and discontinuities). Our approach is automated thanks to the use of statistical models for wavelet transform coefficients, which succinctly capture the location and magnitude of fluctuations in the spectra observed. Furthermore, the statistical model also enables a segmentation of the spectra into informative and noninformative portions. The success of statistical modeling is mostly dependent on the availability of a largescale database for training containing representative examples of the spectra that are observed by the sensing system.
We also tested the quality of the preservation of semantic information in our proposed features by using a simple example hyperspectral classification system based on nearest neighbor search. We also compared our feature extraction method with three existing feature extraction approaches for classification; the first approach is spectral matching, which performs classification directly on the hyperspectral signature; the second approach performs classification directly on wavelet coefficients, and the third approach computes features as the sum of wavelet coefficients of certain scales. We showed that the performance of our proposed features meets or exceeds that of baselines relying on spectral matching and wavelet coefficient representations, in particular for high DMP.
While the performance of each method we tested decreases as the DMP is reduced, the reduction is stronger for the MOG and GMM methods in comparison with some of their counterparts (in particular, to the case where NN with cosine similarity is applied directly on the spectra). We believe that this effect is due to the additional difficulty of modeling signal classes of increased variability (as the DMP decreases) using the extracted binary features. Nonetheless, we note that even with this handicap the performance of the best combinations of NHMC features and NN classifiers exceeds the performance of the comparison baseline methods when the DMP is sufficiently large. Furthermore, we believe that the size of the datasets we use here, while much larger than that of our previous results [1, 2, 3], may still be insufficient to fully exploit the power of the statistical models leveraged here. Thus, further work will focus on expanding the size of database and investigating additional modifications to the feature extraction scheme and the underlying statistical models. As an example, NHMC models based on nonzeromean GMM are an attractive alternative to be pursued in the future, as in certain cases the histogram of wavelet coefficients cannot be accurately modeled by zeromean Gaussian mixture models.
Acknowledgements
We thank Mr. Ping Fung for providing an efficient implementation of our NHMC training code that largely decreased the time of running experiments.
Appendix A Proof of Lemma 1
Appendix B Proof of Lemma 2
The relationship between the original state labels , and the combined state labels , can be characterized by a directed graphical model shown in Fig. 8. By considering all possible transitions from to through the state transitions to and the map above, and denoting
we appeal to the law of total probability to write
(11) 
From the map in equation (6), we can also infer that , , , , and
where . After combining the equalities above with for , we can get the four elements in new matrices expressed in , proving the lemma.
References
 [1] M. Parente and M. F. Duarte, “A new semantic waveletbased spectral representation,” in IEEE Workshop on Hyperspectral Image and Signal Proc.: Evolution in Remote Sensing (WHISPERS), Gainesville, FL, June 2013.
 [2] M. F. Duarte and M. Parente, “Nonhomogeneous hidden Markov chain models for waveletbased hyperspectral image,” in Allerton Conf. Communication, Control, and Computing (Allerton), Monticello, IL, Oct. 2013, pp. 154–159.
 [3] S. Feng, Y. Itoh, M. Parente, and M. F. Duarte, “Tailoring nonhomogeneous Markov chain wavelet models for hyperspectral signature classification,” in IEEE Int. Conf. Image Processing (ICIP), Paris, France, Oct. 2014, pp. 5073–5077.
 [4] S. Feng, M. Duarte, and M. Parente, “Universality of waveletbased nonhomogeneous hidden Markov chain model features for hyperspectral signatures,” in IEEE Conf. Computer Vision and Pattern Recognition Workshops (CVPRW), 2015, pp. 19–27.
 [5] G. CampsValls, D. Tuia, L. Bruzzone, and J. Atli Benediktsson, “Advances in hyperspectral image classification: Earth monitoring with statistical learning methods,” IEEE Signal Processing Magazine, vol. 31, no. 1, pp. 45–54, Jan. 2014.
 [6] B. Rivard, J. Feng, A. Gallie, and A. SanchezAzofeifa, “Continuous wavelets for the improved use of spectral libraries and hyperspectral data,” Remote Sensing of Environment, vol. 112, no. 6, pp. 2850–2862, 2008.
 [7] S. Prasad, W. Li, J. E. Fowler, and L. M. Bruce, “Information fusion in the redundantwavelettransform domain for noiserobust hyperspectral classification,” IEEE Trans. Geoscience and Remote Sensing, vol. 50, no. 9, pp. 33 473–3486, 2012.
 [8] X. Zhang, N. H. Younan, and C. G. O. Hara, “Wavelet domain statistical hyperspectral soil texture classification,” IEEE Trans. Geoscience and Remote Sensing, vol. 43, no. 3, pp. 615–618, Mar. 2005.
 [9] R. Clark, G. A. Swayze, K. Livo, S. Sutley, J. Dalton, R. McDougal, and C. Gent, “Imaging spectroscopy: Search and planetary remote sensing with the USGS Tetracorder and expert systems,” Journal of Geophysical Research: Planets, vol. 108, no. E12, Dec. 2003.
 [10] M. Parente, H. D. Makarewicz, and J. L. Bishop, “Decomposition of mineral absorption bands using nonlinear least squares curve fittening: application to martian meteorites and CRISM data,” Planetary and Space Science, vol. 59, no. 5–6, pp. 423–442, 2011.
 [11] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, “Waveletbased statistical signal processing using hidden Markov models,” IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886–902, 1998.
 [12] F. A. Kruse, A. B. Lefkoff, J. W. Boardman, K. B. Heidebrecht, A. T. Shapiro, P. J. Barloon, and F. H. Goetz, “The Spectral Image Processing System (SIPS) – Interactive visulization and analysis of imaging spectrometer data,” Remote Sensing of Environment, vol. 44, no. 2–3, pp. 145–163, May 1993.
 [13] J. N. Sweet, “The spectral similarity scale and its application to the classification of hyperspectral remote sensing data,” IEEE Workshop on Advances in Techniques for Analysis of Remotely Sensed Data, pp. 92–99, 1993.
 [14] F. van der Meer and W. Bakker, “CCSM: Cross correlogram spectral matching,” International Journal of Remote Sensing, vol. 18, no. 3, pp. 1197–1201, 1997.
 [15] C. I. Chang, “An informationtheoretic approach to spectral variability, similarity, and discrmination for hyperspectral image analysis,” IEEE Trans. Information Theory, vol. 46, no. 5, pp. 1927–1932, May 2000.
 [16] S. Mallat, A wavelet tour of signal processing. San Diego, CA: Academic Press, 1999.
 [17] K. Masood, “Hyperspectral imaging with wavelet transform for classification of colon tissue biopsy samples,” in Proc. SPIE 7073, Applications of Digital Image Processing XXXI, ser. Proc. SPIE, vol. 7073, San Diego, CA, Aug. 2008.
 [18] T. West, S. Prasad, and L. M. Bruce, “Multiclassifiers and decision fusion in the wavelet domain for exploitation of hyperspectral data,” in IEEE Int. Sym. Geoscience and Remote Sensing (IGARSS), 2007, pp. 4850–4853.
 [19] S. Mallat and W. Hwang, “Singularity detection and processing with wavelets,” IEEE Trans. Information Theory, vol. 38, no. 2, pp. 617–643, Mar. 1992.
 [20] S. Mallat and S. Zhong, “Characterization of signals from multiscale edges,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 14, no. 7, pp. 710–732, Jul. 1992.
 [21] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, 1989.
 [22] Y. Qian, M. Ye, and J. Zhou, “Hyperspectral image classification based on structured sparse logistic regression and threedimensional wavelet texture features,” IEEE Trans. Geoscience and Remote Sensing, vol. 51, no. 4, pp. 2276–2291, 2013.
 [23] L. Shen and S. Jia, “Threedimensional gabor wavelets for pixelbased hyperspectral imagery classification,” IEEE Trans. Geoscience and Remote Sensing, vol. 49, no. 12, pp. 5039–5046, 2011.
 [24] M. T. Orchard and K. Ramchandran, “An investigation of waveletbased image coding using an engropyconstrained quantization framework,” in Data Compression Conf. (DCC), Snowbird, UT, Mar. 1994, pp. 341–350.
 [25] B. Hapke, Theory of reflectance and emittance spectroscopy. Cambridge University Press, 2012.