HyperSpectral Image Analysis with PartiallyLatent Regression and Spatial Markov Dependencies
Abstract
Hyperspectral data can be analyzed to recover physical properties at large planetary scales. This involves resolving inverse problems which can be addressed within machine learning, with the advantage that, once a relationship between physical parameters and spectra has been established in a datadriven fashion, the learned relationship can be used to estimate physical parameters for new hyperspectral observations. Within this framework, we propose a spatiallyconstrained and partiallylatent regression method which maps highdimensional inputs (hyperspectral images) onto lowdimensional responses (physical parameters such as the local chemical composition of the soil). The proposed regression model comprises two key features. Firstly, it combines a Gaussian mixture of locallylinear mappings (GLLiM) with a partiallylatent response model. While the former makes highdimensional regression tractable, the latter enables to deal with physical parameters that cannot be observed or, more generally, with data contaminated by experimental artifacts that cannot be explained with noise models. Secondly, spatial constraints are introduced in the model through a Markov random field (MRF) prior which provides a spatial structure to the Gaussianmixture hidden variables. Experiments conducted on a database composed of remotely sensed observations collected from the Mars planet by the Mars Express orbiter demonstrate the effectiveness of the proposed model.
I Introduction
In modern geosciences, the use of remotely sensed data acquired from airborne, spacecraft, or satellite sensors allows to acquire multimodal observations at planetary levels [1, 2, 3, 4]. Nowadays, hyperspectral observations are used to monitor geophysical and environmental phenomena characterized by complex spectral signatures covering wide spectrum ranges.
Usually, recovering physical parameters such as the local chemical composition of the soil from hyperspectral images involves solving inverse problems. This typically requires building complex highdimensional to lowdimensional mappings relating the hyperspectral space to the physical parameter spaces. Many approaches can be considered such as physicallybased models, which build hightolow dimensional mappings from geophysical models [5, 6, 7, 8, 9]. These models follow the laws of physical phenomena and are driven by causetoeffect relationships. In [5], vegetation reflectance models are built to extract vegetation variables from satellite sensors directional and spectral information. [6, 7] integrate hyperspectral observations to a canopy radiative transfer model for forest species classification, and forest biophysical and biochemical properties estimation. [8, 9] use multispectral observations from multiple sensors, combined with physical radiative transfer models, to estimate ice surface characteristics such as temperature, concentration, and thickness. There are issues related to physicallybased models, namely: expert knowledge is required for their construction. Moreover, due to strong nonlinear interactions between a large number of variables, their inversion is not straightforward. Indeed, inverse problems are in general illposed in the presence of real data. Thus, every model requires a carefully designed inversion procedure.
Another family of approaches learn inputoutput mappings in a data driven fashion. These approaches have several advantages. Firstly, they do not require complete knowledge of the underlying physical phenomena. Therefore, they can be applied whenever training data and reliable models are available. Secondly, learned models can be readily applied to newly sensed data [10]. Furthermore, in situations where expert knowledge is available, learning based approaches can benefit from physicsbased models, e.g., by exploiting training data produced by radiative transfer models [11]. Machine learning methods are very popular for hyperspectral data classification in the case of discretevalued output variables. The reader is referred to [12] for an up to date review of stateoftheart classification models applied to hyperspectral data. Recently, [13] proposed a deep learning approach for hyperspectral data classification. When the output variables are continuous, existing regression methods are mainly concerned with hyperspectral data unmixing where the goal is to identify endmembers constituents together with their corresponding abundance from hyperspectral observed mixtures [14, 15].
In the machine learning literature, regression models have been extensively studied. When dealing with highdimensional input data, such as hyperspectral observations, the vast majority of methods start by reducing the dimensionality followed by regression [16, 17, 18]. Such twostep approaches cannot be conveniently expressed in terms of a single optimization problem. Moreover, this presents the risk to map the input onto an intermediate lowdimensional space that does not necessarily contain the information needed to correctly predict the output. To estimate nonlinear mappings, mixtures of locally linear models were proposed [19, 20, 21], but not in the case of highdimensional inputs and in the presence of partiallyobserved responses. An alternative popular approach consists in the use of kernel functions, with the drawbacks that these functions cannot be appropriately chosen automatically, and that the learned mappings cannot be inverted [22, 23, 17, 24].
In this paper we propose a learning based regression model that maps highdimensional hyperspectral observations onto lowdimensional geophysical parameters. The proposed model generalizes the Gaussian mixture of locallylinear mapping (GLLiM) model recently proposed [25]. The design principles of GLLiM consist of first, learning the parameters of an inverse lowtohigh dimensional probabilistic mixture of locallylinear affine transformations, and second, deriving the forward posterior conditional density of the response, given an observed input and the learned model parameters. Exchanging the roles of the input and of the response variables during training, makes highdimensional regression tractable and the conditional density of the response has a closedform solution. Moreover, the model is designed such that it can deal with a partiallylatent response: the vectorvalued response variable is composed of both observed entries and latent (unobserved) entries. This enables to handle in a principled fashion the presence, in the training data, of physical parameters and experimental artifacts that cannot be explicitly measured.
With respect to [25], the main novelty of the model proposed in this paper is the introduction of a Markov random field (MRF) prior on the hidden (assignment) variables of the mixture model. This prior explicitly enforces spatial regularity of the reconstructed geophysical fields. Our model integrates highdimensional to lowdimensional regression and MRF spatial regularization into a unified Bayesian framework. Because model parameter learning is intractable due to the complex Markov interactions between the variables, in particular because of the MRF prior, the parameters are estimated via an approximate variational Bayesian learning scheme. We provide a detailed description of a variational EM (VEM) procedure that is used for learning the model parameters from training data. Our modeling is quite original as in general MRF priors are enforced onto response variable fields [26, 14]. Furthermore, because the mixture’s hidden variables are discrete and finite, while response variables are continuous, our setting is computationally more tractable. Another interesting feature is that the Markov field is directly included in the generative modeling of the hightolow dimensional mapping. In general, such dimension reduction aims at representing spatial vectors in a lower more tractable space in which noise can be better characterized. Indeed, the dimension of the space spanned by spectra from an image is generally much lower than the available number of bands. When the goal is to characterize observed spectra with a limited number of features (such as abundances or physical characteristics), identifying appropriate subspaces can then significantly improve performance. This reduction step is then often an intermediate step before additional refinement such as the inclusion of contextual information is performed. Typically, in [27], dimension reduction and Markov modeling are carried out in two successive steps and assuming a linear mixing model. This is quite different from our approach which aims at modeling and identifying nonlinear mappings and not necessarily in the unmixing context only. In our work, the hightolow mapping is a target while it is only a convenient intermediate step in most studies. Of course, considering the high level of activity in hyperspectral image analysis, it would be challenging to present an exhaustive view of the subject but we believe that methods using subspace reduction in this field correspond to a different use of dimensionality reduction. We refer to [28] for one recent review on both subspace identification and spatial modeling.
The model that we propose is also related to hyperspectral data unmixing [28, 29, 14] since in our model the constituting components of the hyperspectral data are segregated with a locallyaffine regression model. However and in contrast to existing methods, our model, through the presence of the partiallylatent output, is able to handle the presence of non identifiable components in the input data. This particular property could be interpreted as an ability to partially unmix hyperspectral observations while letting a remaining part unexplained.
The effectiveness of the proposed model is assessed using datasets of simulated and real hyperspectral images of the OMEGA imaging spectrometer on board of the Mars Express spacecraft [1]. Thorough quantitative and qualitative results demonstrate that our model outperforms several stateoftheart regression models. Furthermore, the fact that we integrate an MRF prior produces spatiallyregular parameter maps.
The remainder of this paper is organized as follows. Section II summarizes the highdimensional partiallylatent regression model [25]. Section III presents the novel MRF spatiallyconstrained regression model. Section IV applies the proposed method to hyperspectral images collected from Mars with the OMEGA instrument. Finally, Section V draws conclusions and discusses possible future research challenges.
Ii Inverse Regression with PartiallyLatent Response Variables
In this section, we summarize the inverse regression model proposed in [25]. This model maps lowdimensional variables onto highdimensional variables . To each low dimensional variable corresponds a high dimensional variable with . We consider the hybrid^{1}^{1}1hybrid refers to the observed and latent structure of the response variable . Gaussian locallylinear mapping (hGLLiM) framework proposed in [25] that treats each as a partially latent variable, namely:
(1) 
where is observed and is latent, with . Let us consider and , where is a realization of , such that is the noisy image of obtained from a component locallyaffine transformation. This is modeled by introducing the latent variable such that:
(2) 
where is the indicator function, matrix and vector define an affine transformation and is an error term capturing both the observation noise in and the reconstruction error due to the local affine approximation. Under the assumption that is a zeromean Gaussian variable with covariance matrix that does not depend on , , and , we obtain:
(3) 
where is the set of all the model parameters. It has to be noticed that the roles of the input and of the output (or response) variables are interchanged, i.e., the lowdimensional variable is the regressor.
To complete the definition and enforce the affine transformations to be local, each is assumed to follow a mixture of Gaussians defined by
(4) 
with means and covariance matrices . In the standard hGLLiM model, a multinomial prior
(5) 
is used on the hidden variable . An efficient closedform EM algorithm [25], provides maximum likelihood estimates of the parameters given the observed training data .
Iia Inverse Regression Strategy
Once the parameter vector has been estimated, one obtains an inverse regression from (lowdimensional space) to (highdimensional space), using the following inverse conditional density:
(6) 
The forward regression of interest, i.e., from (the high dimension) to (the low dimension), is obtained from the forward conditional density:
(7) 
The latter involves the forward regression parameters:
(8) 
that can be analytically derived from the inverse regression parameters that were previously learned:
(9)  
(10)  
(11)  
(12)  
(13)  
(14) 
Learning an inverse regression model to obtain the desired forward mapping is referred to as inverse regression strategy. This is motivated by a drastic reduction of the model’s size, making tractable its estimation. Indeed, if we consider that the error vectors are modeled by isotropic Gaussian noise with equal variances, the dimension of the learned parameter vector is , while it would be using a forward model^{2}^{2}2Recall that ..
IiB Estimating the Latent Dimension with BIC
The proposed model involves latent response variables of dimensionality . The Bayesian information criterion (BIC) can be used to select a value for the latent dimension. Once a set of model parameters has been learned, BIC can be computed as follows:
(15) 
where denotes the observeddata loglikelihood and denotes the dimension of the complete parameter vector . Assuming isotropic and equal noise covariance matrices , we have:
(16)  
(17) 
A natural way to define a value for is, for a given value of , to train the model with different values of , and select the value minimizing BIC. We will refer to the corresponding method as hGLLiMBIC. While being computationally demanding, the BIC parameter selection procedure has the advantage of not requiring manually setting .
Iii Variational EM for the Spatial Markov Model
In this section we propose and derive an extension of hGLLiM that accounts for spatially located variables, i.e., is defined at the sites of hyperspectral data, where each site typically corresponds to a graph vertex or to a pixel. The novelty of the model proposed in this paper with respect to [25] lies in the explicit spatial dependencies which are modeled by introducing the assumption that variables are distributed according to a discrete Markov random field (MRF) on with pairwise potentials. The multinomial prior (5) is therefore replaced by:
(18) 
with the energy function and the normalizing constant . In these expressions, denotes the neighbors of site in (typically the vertices linked to by an edge), , is a vector of external field parameters and is a scalar interaction parameter. In practice, a constraint on is required to avoid identifiability issues. Thus, we assume . We also assume that variables from different site locations are conditionally independent given . Therefore, the parameters of this model are defined by:
(19)  
(20) 
Because defined in (1) is a partiallylatent variable, the parameter estimation procedure exploits observed pairs while being constrained by the presence of the latent variables . The decomposition of into observed and latent components implies that the model parameters , and must be decomposed as well. Assuming the independence of and given , this gives:
Considering the complete data, with observed variables and hidden variables , the corresponding expectationmaximisation (EM) algorithm is intractable due to the complex Markovian dependencies between variables . Therefore, we resort to a variational approximation which consists of approximating the true posterior by a distribution which factorizes as
From this factorization follows a variational EM (VEM) parameter learning procedure which consists of successive Esteps denoted by E for , followed by variational Msteps.
Iiia Variational E step
Given current parameters values and distribution , the E step consists of computing the probability distribution which is proportional to:
(21) 
where (resp. ) denotes the variables (resp. ).
IiiB Variational MSteps
The variational Msteps consist in estimating the parameters that maximize the expectation:
(26) 
It follows that the update of parameters in is identical to that of the standard hGLLiM EM [25], where the posterior densities are now replaced by the variational distributions . As in [25], the update of noise covariance matrices depends on the way they are constrained. In practice, we enforce isotropic covariances for all in order to avoid overparameterization.
Once has been updated, the proposed MRF model induces an additional Mstep for the update of . This step does not admit an explicit closedform expression but can be solved numerically using gradient descent schemes. It is straightforward to show that the maximization of (26) in admits a unique solution. Indeed, it is equivalent to solve
(27)  
Denoting the gradient vector and Hessian matrix respectively by and , it comes
(28)  
(29)  
The last expectations in (28) and (29) are taken over the Potts prior (18). It follows that whenever is linear in , is zero, the Hessian matrix is a semidefinite negative matrix and the function to optimize is concave. Unfortunately, due to the intractable normalizing constant , expressions (28) and (29) are not directly available. It is necessary to approximate the terms involving the true MRF prior using an approximation. A natural approach is to use:
(30) 
with defined by:
(31) 
This MRF prior approximation induced by the posterior variational approximation has been proposed in [30] and also exploited in [31]. It follows that the gradient in (28) can be approximated by (with ):
(32) 
IiiC Inverse Regression Strategy with Markov Random Fields
With this proposed MRF extension of hGLLiM, the derivation of the desired forward conditional density (7) is less straightforward. Indeed, while equations (7)(8) remain valid, they now involve the marginal of the MRF prior which is known to be hard to compute (see (18)). Approximations are required and a natural candidate is which has to be computed iteratively using the variational EM algorithm with (or ) held fixed. Note that could also either be set fixed to the learned values or reestimated. At iteration (i), for each , given current values for and , is updated using:
(33) 
Although the update formula for is synchronous, an asynchronous update is known to be more efficient. Then parameters are obtained by solving (27) as explained in Section IIIB. At algorithm convergence, is set as:
(34) 
When required, a response corresponding to an input can be obtained using the expectation of that can be approximated using:
(35) 
IiiD Prediction with spatial constraints: MRFhGLLiM
To complete this section, we provide an overview of the proposed procedure implementing the inverse regression strategy while accounting for spatial interactions. It divides into two steps, a learning step followed by a prediction step. Each step is summarized below:

Spatial inverseregression learning:

Training observed data: ;

Missing variables: ;

Parameters: and ;

Estimation procedure: VEM (section III);


Spatial forwardprediction:

Observed data: (no observed );

Missing variables: where ;

Model: MRFhGLLiM with fixed to ;

Output:

,

Estimated predictions for , i.e., computed using approximation (35).


Iv Retrieving Mars Physical Properties from Hyperspectral Images
In order to evaluate the proposed model, we use a database composed of synthetic spectra with their associated physical parameter values. This database was generated using the radiative transfer model presented in [11] in order to investigate hyperspectral images collected from the imaging spectrometer OMEGA [1] onboard of the Mars Express spacecraft. The synthetic dataset is composed of 15,407 spectra associated with five physical parameter values, namely:

proportion of water ice (Prop. HO),

proportion of CO ice (Prop. CO),

proportion of dust (Prop. Dust),

grain size of water ice (Size HO), and

grain size of CO ice (Size CO).
Each spectrum is made of 184 wavelengths. Compared to other available hyperspectral image databases, e.g. [32, 33], this one combines several desirable features, making it suitable for the evaluation of the proposed model for the following reasons. First, the radiative transfer model can be used to generate as much data as required to reliably estimate the model parameters. Second, generated spectra present a highly nonlinear relationship with underlying physical parameters. Third, the associated real hyperspectral images of Mars surface contain spatial dependencies between neighboring pixels.
Iva HybridGLLiM Evaluation on Synthetic Individual Spectra
In order to numerically evaluate the performances of MRFhGLLiM on the synthetic data, we adopted a crossvalidation strategy: one part of the synthetic database is used for training and the remaining part is used for testing. Note that because all these spectra were generated independently, there are no spatial dependencies present in the training dataset. Therefore, the value of the MRF interaction parameter was set to 0 in that case, i.e., no neighborhood dependencies are accounted for during training. With this setting, the proposed MRFhGLLiM and hGLLiM models are equivalent. The synthetic data are used, firstly to learn an inverse lowdimensional to highdimensional regression function between physical parameters and spectra from the database, and secondly to estimate the unknown physical parameters corresponding to a newly observed individual spectrum and using the forward mapping (35), where is replaced by , or equivalently .
Method  Prop. HO  Prop. CO  Prop. Dust  Size HO  Size CO 

JGMM  
SIR1  
SIR2  
RVM  
MLE (hGLLiM0)  
hGLLiM1  
hGLLiM2 
Method  Proportion of CO ice  Proportion of dust  Grain size of HO ice 

JGMM  
SIR1  
SIR2  
RVM  
MLE (hGLLiM0)  
hGLLiM1  
hGLLiM2  
hGLLiM3  
hGLLiM4  
hGLLiM5  
hGLLiM20  
hGLLiMBIC 
The hGLLiM algorithm was compared to JGMM (joint Gaussian mixture model) [20], MLE (mixture of linear experts) [19], SIR (sliced inverse regression) [16] and RVM (multivariate relevance vector machine) [24]. When a value is used for training, we denote by hGLLiM the hybrid GLLiM algorithm. As shown in [25], JGMM and MLE are equivalent to hGLLiM0 with covariance matrices being respectively unconstrained and diagonal. SIR is used with one (SIR1) or two (SIR2) principal axes for dimensionality reduction, 20 slices^{3}^{3}3The number of slices is known to have very little influence on the results., and polynomial regression of order three^{4}^{4}4Higher orders did not show significant improvements in our experiments.. SIR quantizes the lowdimensional data into slices or clusters which in turn induces a quantization of the space. Each slice (all points that map to the same slice) is then replaced with its mean and PCA is carried out on these means. The resulting dimensionality reduction is then informed by values through the preliminary slicing. RVM may be viewed as a multivariate probabilistic formulation of support vector regression [34]. As with any kernel method, RVM critically depends on the choice of a kernel function. Using the authors’ publicly available software^{5}^{5}5http://www.mvrvm.com/Multivariate_Relevance_Vector , we ran preliminary tests to determine an optimal kernel choice for each dataset under consideration. We tested 14 kernels with 10 different scales, ranging from 1 to 30, resulting in a total of 140 kernels for each dataset.
As already mentioned an objective evaluation is performed using crossvalidation. A total of 10,000 training inputoutput couples are randomly selected twenty times as the training set, and the 5,407 remaining spectra are used for testing. For all models, the training data are normalized to zero mean and unit variance. Normalization is then reversed on test data and estimated output to obtain the final results. This normalization showed to significantly improve the performances of all methods. We use for MLE, hGLLiM and JGMM. MLE and JGMM are constrained with isotropic covariance matrices as this parametrization achieves the best results. For RVM, a third degree polynomial kernel with scale 6 is selected as the top performing among 140 tested kernels using crossvalidation.
As a quality measure of the estimated parameters, we compute the normalized root meansquare error (NRMSE) which quantifies the difference between estimated and groundtruth parameter values and :
(36) 
NRMSE is normalized thus enabling direct comparison between the parameters which are of very different range: closer to zero, more accurate the predicted values.
Fully Observed Output
Using these different regression models, from 184dimensional spectra, we start by retrieving the five parameter values, namely, proportion of water ice (Prop. HO), proportion of CO ice (Prop. CO), proportion of dust (Prop. Dust), grain size of water ice (Size HO), and grain size of CO ice (Size CO). The training is done with synthetic spectra annotated with the five parameters, and hence output variables are fully observed. Results obtained with the five methods are summarized in Table I. MLE and hGLLiM perform similarly and outperform the three other methods in estimating each parameter. The average training time of MLE was 12.6 seconds and the average training time of hGLLiM was 18.9 seconds using our Matlab implementation. In this task, as expected, using 1 or 2 additional latent components in hGLLiM only shows a slight improvement compared to MLE, since the output is fully observed during training. Notice that the obtained mean NRMSE with the proportion of water (Table I, column 2) and the grain size of CO (Table I, column 6) parameters are very large, i.e., above for all methods. This suggests that the relationship between these parameters and observed spectra is complex and hard to learn.
Partially Latent Output
In order to fully illustrate the potential of hybrid GLLiM, we deliberately ignore the values of two of the five parameters in the database and treat these two parameters as latent variables. We chose to ignore the proportion of water ice and the grain size of CO ice since they yield the worst reconstruction errors when they are fully observed (see Table I). A previous study [10] reveals that these two parameters are sensitive to the same wavelengths and are suspected to mix with the other three parameters in the radiative transfer model used to generate the synthetic data. Therefore, not only that they are difficult to estimate, but they also seem to affect the estimation of the three other parameters.
Method  Prop. HO  Prop. CO  Prop. Dust  Size HO  Size CO 

hGLLiM0  
MRFhGLLiM0 
Table II shows NRMSE values obtained for the three remaining parameters. The groundtruth latent variable dimension is , and accordingly, the empirically best latent dimension for hGLLiM is . hGLLiM2 outperforms all the other methods on that task, more precisely the error is lower than the second bestperforming method, RVM, closely followed by MLE. No significant difference is observed between hGLLiM2 and hGLLiM3. The average training time of MLE was 12.2 seconds and the average training time of hGLLiM was 18.8 seconds using our Matlab implementation. Due to the computation of the kernel matrix, the training time of RVM was about 10 times larger.
We also tested the selection of the latent dimension () based on BIC (Section IIB). For each training set, the hGLLiMBIC method minimizes BIC with the latent dimension in the range , and used the corresponding model to perform the regression. Interestingly, hGLLiMBIC performed very well on these large training sets () as it correctly selects for the 20 training sets used by BIC (the BIC selection could differ with the training set), yielding the same results as those obtained with hGLLiM2. Interestingly, notice how hGLLiM and almost all the methods benefit from removing the “faulty” parameters that are difficult to estimate. In all cases, a slight performance increase for the reconstruction of the remaining three others parameters is observed when comparing to the results of Table I.
IvB MRFhGLLIM Evaluation on Synthetic Hyperspectral Images
The data used in the previous section is not spatial, in the sense that there is no dependency between the different vectors of observations. In order to assess the advantage of the proposed MRF extension of hGLLiM when spatial dependencies exist, we synthesized 50 hyperspectral images from the synthetic spectra dataset presented in the previous section. We generated pixel images consisting of 12 () square regions (see Figure 1), where each region corresponds to a set of similar underlying physical parameters. More specifically, for each region, a vector of physical parameters in was picked, and the region was filled with this vector as well as its 15 nearest neighbors in the dataset. A synthetic hyperspectral image was then generated from associated spectra, and corrupted with additive white Gaussian noise with a signaltonoiseratio of 6 dB.
We first learned the model parameters using hGLLiM0 (equivalent to MLE [19]) and 10,000 individual spectra associated to all 5 physical parameters, as explained in the previous section. We then compared the performance of MRFhGLLiM0 when was automatically estimated from the test images (section IIIC) and when was set to 0, i.e. no spatial dependency, which corresponds to hGLLiM0. The spectra used in the 50 test synthetic images were outside the training set. The MRF prior was incorporated using a 8pixel neighborhood and MRFhGLLiM0 estimated values comprised between 2.8 et 3.1 for these synthetic images. Table III shows the NRMSE obtained with both methods. The use of spatial dependencies significantly reduced the estimation errors for all concentrations, in all 50 test images. For each image, the averaged across pixels errors were computed for both hGLLiM0 and MRFhGLLiM0 and the corresponding two samples of size 50 were compared using a paired ttest. The test confirmed a highly signficant gain in favor of MRFhGLLiM0. The average computational time for a test image is 220s using hGLLiM0 and 550 using MRFhGLLiM0 with our Matlab implementation.
IvC Real Mars Hyperspectral Images
We now investigate the potential of the proposed MRFhGLLiM method using a dataset of real hyperspectral images collected from the imaging spectrometer OMEGA instrument [1] on board of the Mars Express spacecraft. We adopt the evaluation protocol proposed in [10]. An adequately selected subset of 6,983 spectra of the synthetic database is used for model training. Then, trained models are applied to the real data made of observed spectra acquired from satellite hyperspectral sensors. For evaluation, we focus on data from Mars’s South polar cap. Since groundtruth for the physical properties of Mars’s South pole regions is not currently available, we consider a qualitative evaluation by comparing five models: hGLLiM2, MRFhGLLiM2, and the three best performing methods, among the tested ones, namely RVM, MLE and JGMM.
First, hGLLiM2, RVM, MLE and JGMM are used to extract physical parameters from two hyperspectral images of the same area but seen from two slightly different view points (orbit 41 and orbit 61). Since responses are proportions with values between 0 and 1, values smaller than 0 or higher than 1 are not admissible and hence they are set to either one of the closest bounds. In the real data setting, the best performance is achieved when no data meanvariance normalization is performed, and using for hGLLiM2, MLE and JGMM. As it can be seen in Fig. 2, hGLLiM2 estimates proportion maps with similar characteristics for the two view points, which suggests a good modelresponse consistency. Such a consistency is not observed with the other three tested methods. Furthermore, RVM and MLE result in a much larger number of values outside the admissible interval . Moreover, hGLLiM2 is the only method featuring less dust at the South pole cap center and higher proportions of dust at the boundaries of the CO2 ice, which matches expected results from planetology [35]. Finally, note that the proportions of CO2 ice and dust clearly seem to be complementary using hGLLiM2, while this complementarity is not obviously noticed when using the other methods.
Despite these satisfying qualitative results, hGLLiM2 produces less regular maps than the other methods. This is particularly true for orbit 41, as can be seen in the top rows of Fig. 2(a) and (b). This does not match physical expectations since chemical proportions should vary rather smoothly in space. To address this issue, we use MRFHGLLiM2 by exploiting the MRFforward mapping (35).
The MRFhGLLiM2 model is trained using synthetic data without accounting for the spatial dependencies ( set to ). This training results in a model equivalent to hGLLiM2. However, during testing, the MRF prior is enforced using an 8pixel neighborhood. For illustration purpose, the parameter is empirically set to control spatial smoothness of the resulting physical parameter map. According to the MRF prior model, the larger the value of , the smoother the resulting parameter field.
We compare the outputs of hGLLiM2 and MRFhGLLiM2 with different interaction parameter values . The top rows of Fig. 3 and Fig. 4 color each pixel according to the class maximizing . By progressively increasing from 0 to 500, isolated points are removed, thus creating larger and larger image regions associated to a single class. As it can be seen in Fig. 3, this has the desired smoothing effect on the resulting proportion maps. Interestingly, MRFhGLLiM2 yields smoother maps than hGLLiM2 while preserving its desirable qualitative performances, e.g., higher proportions of dust at boundaries and complementarity of CO2 and dust proportions. Notice that has a less significant effect on proportions obtained at orbit 61, e.g., Fig. 4, since hGLLiM2 already yields quite smooth maps for this orbit.
V Conclusions
In this paper, we proposed a Gaussian mixture of locallyaffine regression model that maps a highdimensional space of hyperspectral observations onto a lowdimensional space of physical parameters. Spatial regularity is enforced and integrated into the model through an MRF prior on the Gaussianmixture hidden variables. Model learning is achieved via a variational expectationmaximization procedure that is fully derived and described in detail. Model evaluation is conducted using both simulated data and real hyperspectral images from the Mars Express satellite sensors. The proposed approach outperforms four stateoftheart regression methods. Furthermore, when applied to actual images of the Mars South pole, our model produces spatiallyregular and smooth maps in accordance with the MRF constraints.
As a first model extension, we will investigate its application to the hyperspectral unmixing problem. In this context, we will design experimental setups to study the ability of the model to partially estimate hyperspectral mixture components together with their corresponding proportions. Further research, in line with dynamical geophysical dataassimilation research [36, 37], will be to investigate generalizations of the model in order to process temporal sequences of observations. To this end, relevant dynamical models of physical phenomena driving hyperspectral observation evolution will be investigated and integrated. Equipped with an MRF spatial prior and a temporal dynamical model, our formulation could be able to address missingdata interpolation problems which are ubiquitous in remote sensing [38].
References
 [1] J.P. Bibring et al., “OMEGA: Observatoire pour la Minéralogie, l’Eau, les Glaces et l’Activité,” in Mars Express: The Scientific Payload, vol. 1240, 2004, pp. 37–49.
 [2] G. Chang et al., “The new age of hyperspectral oceanography,” Oceanography, vol. 17, no. 2, pp. 16–23, 2004.
 [3] E. Cloutis, B. Vila, A. Bell, and M. Lamothe, “Hyperspectral and luminescence observer (HALO) Mars mission concept  innovative data triage, compression, processing and analysis for the hyperspectral imager,” IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, vol. 93, no. 3, pp. 1–4, 2009.
 [4] F. Hilton et al., “Hyperspectral earth observation from IASI: Five years of accomplishments,” Bulletin of the American Meteorological Society, vol. 93, no. 3, pp. 347–370, 2012.
 [5] D. S. Kimes, Y. Knyazikhin, J. L. Privette, A. A. Abuelgasim, and F. Gao, “Inversion methods for physically based models,” Remote Sensing Reviews, vol. 18, no. 24, pp. 381–439, 2000.
 [6] D. Goodenough, B. Victoria, J. Li, G. Asner, M. Schaepman, L. Ustin, and A. Dyk, “Combining hyperspectral remote sensing and physical modeling for applications in land ecosystems,” in IEEE International Conference on Geoscience and Remote Sensing, 2006, pp. 2000–2004.
 [7] P. Stenberg, M. Mottus, and M. Rautiainen, “Modeling the spectral signature of forests: Application of remote sensing models to coniferous canopies,” Springer Advances in Land Remote Sensing, pp. 147–171, 2008.
 [8] M. Shokr, “A physicsbased remote sensing data fusion approach,” in International Conference on Geoscience and Remote Sensing Symposium. IEEE, 2003, pp. 1050–1052.
 [9] L. Foster, B. Brock, M. Cutler, and F. Diotri, “A physically based method for estimating supraglacial debris thickness from thermal band remotesensing data,” Journal of Glaciology, vol. 58, no. 210, pp. 677–691, 2012.
 [10] C. BernardMichel, S. Douté, M. Fauvel, L. Gardes, and S. Girard, “Retrieval of Mars surface physical properties from OMEGA hyperspectral images using regularized sliced inverse regression,” Journal of Geophysical Research: Planets, vol. 114, no. E6, 2009.
 [11] S. Douté, E. Deforas, F. Schmidt, R. Oliva, and B. Schmitt, “A comprehensive numerical package for the modeling of Mars hyperspectral images,” in Lunar and Planetary Science XXXVIII, March 2007.
 [12] M. Fauvel, Y. Tarabalka, J. A. Benediktsson, J. Chanussot, and J. C. Tilton, “Advances in spectralspatial classification of hyperspectral images,” Proceedings of the IEEE, vol. 101, no. 3, pp. 652–675, 2013.
 [13] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learningbased classification of hyperspectral data,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 3, pp. 2094–2107, 2014.
 [14] J. Bhatt, M. Joshi, and M. Raval, “A datadriven stochastic approach for unmixing hyperspectral imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 1936–1946, 2014.
 [15] R. Heylen, M. Parente, and P. Gader, “A review of nonlinear hyperspectral unmixing methods,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 1844–1868, 2014.
 [16] K. C. Li, “Sliced inverse regression for dimension reduction,” Journal of the American Statistical Association, vol. 86, no. 414, pp. 316–327, 1991.
 [17] H. Wu, “Kernel sliced inverse regression with applications to classification,” Journal of Computational and Graphical Statistics, vol. 17, no. 3, pp. 590–610, 2008.
 [18] K. P. Adragni and R. D. Cook, “Sufficient dimension reduction and prediction in regression,” Philosophical Transactions of the Royal Society A, vol. 367, no. 1906, pp. 4385–4405, 2009.
 [19] L. Xu, M. I. Jordan, and G. E. Hinton, “An alternative model for mixtures of experts,” in Advances in Neural Information Processing Systems. MIT Press, Cambridge, MA, USA, December 1995, pp. 633–640.
 [20] Y. Qiao and N. Minematsu, “Mixture of probabilistic linear regressions: A unified view of GMMbased mapping techiques,” in IEEE International Conference on Acoustics, Speech, and Signal Processing, April 2009, pp. 3913–3916.
 [21] S. Ingrassia, S. C. Minotti, and G. Vittadini, “Local statistical modeling via a clusterweighted approach with elliptical distributions,” Journal of classification, vol. 29, no. 3, pp. 363–401, 2012.
 [22] M. Tipping, “Sparse Bayesian learning and the relevance vector machine,” The Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
 [23] N. Lawrence, “Probabilistic nonlinear principal component analysis with gaussian process latent variable models,” The Journal of Machine Learning Research, vol. 6, pp. 1783–1816, 2005.
 [24] A. Thayananthan, R. Navaratnam, B. Stenger, P. Torr, and R. Cipolla, “Multivariate relevance vector machines for tracking,” in European Conference on Computer Vision. Springer, 2006, pp. 124–138.
 [25] A. Deleforge, F. Forbes, and R. Horaud, “HighDimensional Regression with Gaussian Mixtures and PartiallyLatent Response Variables,” Statistics and Computing, 2014.
 [26] Y. Tarabalka, M. Fauvel, J. Chanussot, and J. Benediktsson, “SVM and MRFbased method for accurate classification of hyperspectral images,” IEEE Geoscience and Remote Sensing Letters, vol. 7, no. 4, pp. 736–740, 2010.
 [27] J. Li, J. M. BioucasDias, and A. Plaza, “SpectralSpatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields,” IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 3, pp. 809–823, 2012.
 [28] J. M. BioucasDias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, “Hyperspectral unmixing overview: Geometrical, statistical, and sparse regressionbased approaches,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, pp. 1–4, 2012.
 [29] Y. Altmann, N. Dobigeon, S. McLaughlin, and J.Y. Tourneret, “Residual component analysis of hyperspectral images: Application to joint nonlinear unmixing and nonlinearity detection,” IEEE Transactions on Image Processing, vol. 23, no. 5, pp. 2148–2158, 2014.
 [30] G. Celeux, F. Forbes, and N. Peyrard, “EM procedures using mean fieldlike approximations for Markov modelbased image segmentation,” Pattern Recognition, vol. 36, no. 1, pp. 131–144, 2003.
 [31] L. Chaari, T. Vincent, F. Forbes, M. Dojat, and P. Ciuciu, “Fast joint detectionestimation of evoked brain activity in eventrelated fMRI using a variational approach,” IEEE Transactions on Medical Imaging, vol. 32, no. 5, pp. 821–837, May 2013.
 [32] J. Leenaars, “Africa soil profiles database, version 1.1. a compilation of georeferenced and standardized legacy soil profile data for SubSaharian Africa (with dataset),” Africa Soil Information Service (AfSIS) project, Wageningen, the Netherlands, Tech. Rep. ISRIC report 2013/03, 2013.
 [33] L. Tits, B. Somers, J. Stuckens, and P. Coppin, “Validating nonlinear mixing models: benchmark datasets from vegetated areas,” in 6th Workshop on Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), June 2014.
 [34] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and computing, vol. 14, no. 3, pp. 199–222, 2004.
 [35] S. Douté et al., “Nature and composition of the icy terrains of the south pole of Mars from MEX OMEGA observations,” in Lunar and Planetary Science XXXVI, March 2005.
 [36] Q. Sixian, M. Jianwen, and W. Xuanji, “Construction and experiment of hierarchical Bayesian network in data assimilation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 2, pp. 1036–1047, 2013.
 [37] Y. Dong, J. Wang, C. Li, G. Yang, Q. Wang, F. Liu, J. Zhao, H. Wang, and W. Huang, “Comparison and analysis of data assimilation algorithms for predicting the leaf area index of crop canopies,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 6, no. 1, pp. 188–201, 2013.
 [38] A.B. Salberg, “Land cover classification of cloudcontaminated multitemporal highresolution images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 1, pp. 377–387, 2011.
Antoine Deleforge received the B.Sc. (2008) and M.Sc. (2010) engineering degrees in computer science and mathematics from the Ecole Nationale Supérieure dÕInformatique et Mathématiques Appliquées de Grenoble (ENSIMAG), France, as well as the specialized M.Sc. (2010) research degree in computer graphics, vision, robotics from the Université Joseph Fourier, Grenoble. In 2013, he received the Ph.D. degree in computer science and applied mathematics from the University of Grenoble. Since January 2014, he has as a postdoctoral fellow appointment with the chair of Multimedia Communication and Signal Processing of the ErlangenNuremberg University, Germany. His research interests include machine learning for signal processing, Bayesian statistics, computational auditory scene analysis, and robot audition. 
Florence Forbes received the B.Sc. and M.Sc. degrees in computer science and applied mathematics from the Ecole Nationale Supérieure dÕInformatique et Mathématiques Appliquées de Grenoble (ENSIMAG), France, and the PhD degree in applied probabilities from the University Joseph Fourier, Grenoble, France. Since 1998, she has been a research scientist with the Institut National de Recherche en Informatique et Automatique (INRIA), Grenoble RhôneAlpes, Montbonnot, France, where she founded the MISTIS team and has been the team head since 2003. Her research activities include Bayesian analysis, Markov and graphical models, and hidden structure models. 
Silèye Ba received the M.Sc. (2000) in applied mathematics and signal processing from University of Dakar, Senegal, and the M.Sc. (2002) in mathematics, computer vision, and machine learning from Ecole Normale Supérieure de Cachan, Paris, France. From 2003 to 2009 he was a PhD student and then a postdoctoral researcher at IDIAP Research Institute, Martigny, Switzerland where he worked on probabilistic models for object tracking and human activity recognition. From 2009 to 2013, he was a researcher at Telecom Bretagne, Brest, France working on variational models for multimodal geophysical data processing. From 2013 to 2014 he worked at RN3D Innovation Lab, Marseille, France, as a research engineer, where he used computer vision and machine learning principles and methods to develop humancomputer interaction software tools. Since June 2014 he has been a researcher in the Perception team at INRIA Grenoble RhôneAlpes working on machine learning and computer vision models for natural humanrobot interaction. 
Radu Horaud received the B.Sc. (1977) degree in electrical engineering, the M.Sc. (1979) degree in control engineering, and the Ph.D. (1981) degree in computer science from the Institut National Polytechnique de Grenoble, France. Currently he holds a position of director of research with the Institut National de Recherche en Informatique et Automatique (INRIA), Grenoble RhôneAlpes, Montbonnot, France, where he is the founder and head of the PERCEPTION team. His research interests include computer vision, machine learning, audio signal processing, audiovisual analysis, and robotics and humanrobot interaction. He is an area editor of the Computer Vision and Image Understanding, a member of the advisory board of the International Journal of Robotics Research, and an associate editor of the International Journal of Computer Vision. In 2013, Radu Horaud was awarded a five year ERC Advanced Grant for his project Vision and Hearing in Action (VHIA, 20142019). 