Entropy-Based Statistical Analysis of PolSAR Data
Abstract
Images obtained from coherent illumination processes are contaminated with speckle noise, with polarimetric synthetic aperture radar (PolSAR) imagery as a prominent example. With an adequacy widely attested in the literature, the scaled complex Wishart distribution is an acceptable model for PolSAR data. In this perspective, we derive analytic expressions for the Shannon, Rényi, and restricted Tsallis entropies under this model. Relationships between the derived measures and the parameters of the scaled Wishart law (i.e., the equivalent number of looks and the covariance matrix) are discussed. In addition, we obtain the asymptotic variances of the Shannon and Rényi entropies when replacing distribution parameters by maximum likelihood estimators. As a consequence, confidence intervals based on these two entropies are also derived and proposed as new ways of capturing contrast. New hypothesis tests are additionally proposed using these results, and their performance is assessed using simulated and real data. In general terms, the test based on the Shannon entropy outperforms those based on Rényi’s.
Information theory, SAR polarimetry, contrast measures.
1 Introduction
Polarimetric synthetic aperture radar (PolSAR) has been used to describe earth surface phenomena [1]. This technology uses coherent illumination, which causes the interference pattern called ‘speckle’ [2]; speckle is multiplicative by nature and, in the format considered here, non-Gaussian. This precludes the use of conventional tools in PolSAR image analysis and calls for specialized techniques.
The scaled complex Wishart distribution has been successfully employed as a statistical model for homogeneous regions in PolSAR images. This law is at the core of segmentation [3], classification [4], and boundary detection [5] techniques.
The concepts of “information” and “entropy” were given formal mathematical definitions in the context of data communications by Shannon in 1948 [6]. Thenceforth, the proposition and application of information and entropy measures have become an active research field in several areas. Zografos and Nadarajah derived closed expressions for Shannon and Rényi entropies for several univariate [7], bivariate [8], and multivariate [9] distributions.
Exploring relationships associated with the log-likelihood function, Zong [10] applied the Rényi entropy to several univariate distributions. In fact, this entropy measure has been applied to image processing problems such as data mining, detection, segmentation, and classification [11, 12, 13]. Another prominent entropy measure is the restricted Tsallis entropy. This tool was introduced by Tsallis in [14, 15] and is related to the Rényi entropy. The restricted Tsallis entropy has found applications in statistical physics [16].
Among these information theoretical tools, the Shannon entropy has been applied to PolSAR imagery. Morio et al. [17] analyzed such entropy for the characterization of polarimetric targets under the complex, circular, and multidimensional Gaussian distribution.
Stochastic distances are also derived within the framework of information theory. A comprehensive examination of these measures is presented and applied to intensity SAR data in [18, 19], and to PolSAR models in [20].
In this paper, we derive analytic expressions for the Shannon, Rényi, and restricted Tsallis entropies under the scaled complex Wishart law. These measures are analyzed as particular cases of the $(h,\phi)$-entropy proposed by Salicrú et al. [21]. When parameters are replaced by maximum likelihood estimators, these entropies become random variables; expressions for the asymptotic variances of the Shannon and Rényi entropies are derived (the variance of the Tsallis entropy is analytically intractable and, thus, is not considered further).
Novel methodologies for testing hypotheses and constructing confidence intervals are proposed for quantifying contrast in PolSAR imagery using these results. Such measures can be used in PolSAR segmentation [3], classification [4], boundary detection [22, 23], and change detection [24]. Monte Carlo experiments are performed for assessing the performance of the discussed measures in synthetic data, and an application to real PolSAR data is performed.
The remainder of this paper is organized as follows. Section 2 recalls the scaled Wishart law. Selected information theoretic tools are summarized in Section 3. Section 4 presents the proposed entropies and the asymptotic variances of their estimators. Section 5 applies the resulting tests to simulated and real data. Section 6 concludes the paper.
2 The Complex Wishart distribution
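Full-polarimetric SAR sensors record the complex scattering coefficient for the four combinations of the received and transmitted complex linear polarizations: $S_{\mathrm{HH}}$ (horizontal-horizontal), $S_{\mathrm{HV}}$ (horizontal-vertical), $S_{\mathrm{VH}}$ (vertical-horizontal), and $S_{\mathrm{VV}}$ (vertical-vertical). When natural targets are considered, the conditions of the reciprocity theorem [2, 25, 26] are satisfied and it can be assumed that $S_{\mathrm{HV}} = S_{\mathrm{VH}}$.
In general, we may consider systems with $p$ polarization elements, which constitute a complex random vector denoted by:
$$\boldsymbol{y} = [S_1\; S_2\; \cdots\; S_p]^{t}, \tag{1}$$
where $(\cdot)^{t}$ is the transposition operator. It is commonly assumed that the scattering vector $\boldsymbol{y}$ follows a circular complex Gaussian law [27].
Multilook PolSAR data are usually formed in order to enhance the signal-to-noise ratio (SNR):
$$\boldsymbol{Z} = \frac{1}{L} \sum_{i=1}^{L} \boldsymbol{y}_i \boldsymbol{y}_i^{\mathrm{H}},$$
where $(\cdot)^{\mathrm{H}}$ is the Hermitian operator, $\boldsymbol{y}_i$ represents the scattering vector in the $i$th look, and $L$ is the number of looks, a parameter related to the noise effect in SAR imagery. Matrix $\boldsymbol{Z}$ is defined over the set of positive-definite Hermitian matrices $\mathcal{A}$. Moreover, Goodman showed that $L\boldsymbol{Z}$ follows the ordinary complex Wishart law [28] and, therefore, the density of the scaled random matrix $\boldsymbol{Z}$ is
$$f_{\boldsymbol{Z}}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L) = \frac{L^{pL}\, |\boldsymbol{Z}|^{L-p}}{|\boldsymbol{\Sigma}|^{L}\, \Gamma_p(L)} \exp\bigl\{-L \operatorname{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{Z})\bigr\}, \tag{2}$$
where $p$ is the order of $\boldsymbol{Z}$, $\Gamma_p(L) = \pi^{p(p-1)/2} \prod_{i=0}^{p-1} \Gamma(L-i)$, $L \geq p$, $\Gamma(\cdot)$ is the gamma function, $\boldsymbol{\Sigma} = \operatorname{E}(\boldsymbol{Z})$, and $\operatorname{E}(\cdot)$ is the expectation operator. This is denoted as $\boldsymbol{Z} \sim \mathcal{W}(\boldsymbol{\Sigma}, L)$.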
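The model above can be explored numerically. The following is a minimal sketch, assuming the density takes the standard scaled complex Wishart form (factor $L^{pL}|\boldsymbol{Z}|^{L-p} / (|\boldsymbol{\Sigma}|^{L}\Gamma_p(L))$ times $\exp\{-L\operatorname{tr}(\boldsymbol{\Sigma}^{-1}\boldsymbol{Z})\}$) and an integer number of looks for simulation. The function names `log_gamma_p`, `scaled_wishart_logpdf`, and `sample_scaled_wishart` are illustrative and do not come from the paper.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma


def log_gamma_p(L, p):
    """ln Gamma_p(L) = (p(p-1)/2) ln(pi) + sum_{i=0}^{p-1} ln Gamma(L - i)."""
    return 0.5 * p * (p - 1) * np.log(np.pi) + gammaln(L - np.arange(p)).sum()


def scaled_wishart_logpdf(Z, Sigma, L):
    """Log-density of one Hermitian positive-definite Z (assumed form of the law)."""
    p = Sigma.shape[0]
    logdet_Z = np.linalg.slogdet(Z)[1]
    logdet_Sigma = np.linalg.slogdet(Sigma)[1]
    trace_term = np.trace(np.linalg.solve(Sigma, Z)).real  # tr(Sigma^{-1} Z)
    return (p * L * np.log(L) + (L - p) * logdet_Z - L * logdet_Sigma
            - log_gamma_p(L, p) - L * trace_term)


def sample_scaled_wishart(Sigma, L, N, rng):
    """Draw N multilook matrices Z = (1/L) sum_i y_i y_i^H, with integer L."""
    p = Sigma.shape[0]
    C = np.linalg.cholesky(Sigma)  # Sigma = C C^H
    shape = (N, L, p)
    # Circular complex Gaussian vectors with identity covariance.
    W = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    Y = W @ C.T  # each row of Y[n] is a transposed scattering vector (C w)^t
    return np.einsum("nlp,nlq->npq", Y, Y.conj()) / L


# Sanity check for p = 1: the law reduces to a gamma distribution
# with shape L and scale sigma^2 / L.
sigma2, L, z = 2.0, 4, 1.3
ours = scaled_wishart_logpdf(np.array([[z]]), np.array([[sigma2]]), L)
reference = gamma.logpdf(z, a=L, scale=sigma2 / L)
print(ours, reference)
```

For $p = 1$ the two printed values should agree, and the sample mean of simulated matrices should approach $\boldsymbol{\Sigma}$ as the number of draws grows.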
3 Information Theory
Information theory provides important tools for statistical inference [29], data compression [30], and image processing [17], to name a few applications. In particular, entropy is a fundamental concept related to the notion of disorder in statistical mechanics [31]. Salicrú et al. [21] proposed the $(h,\phi)$-entropy class, which generalizes the original concept. In the following, we recall this definition and derive entropies for positive-definite Hermitian random matrices.
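Let $f(\boldsymbol{Z}; \boldsymbol{\theta})$ be a probability density function with parameter vector $\boldsymbol{\theta}$ which characterizes the distribution of the random matrix $\boldsymbol{Z}$. The $(h,\phi)$-entropy relative to $\boldsymbol{Z}$ is defined by
$$H_{\phi}^{h}(\boldsymbol{\theta}) = h\left( \int_{\mathcal{A}} \phi\bigl(f(\boldsymbol{Z}; \boldsymbol{\theta})\bigr)\, \mathrm{d}\boldsymbol{Z} \right),$$
where the integral is taken over the set $\mathcal{A}$ of positive-definite Hermitian matrices, and either $\phi\colon [0,\infty) \to \mathbb{R}$ is concave and $h\colon \mathbb{R} \to \mathbb{R}$ is increasing, or $\phi$ is convex and $h$ is decreasing. The differential element $\mathrm{d}\boldsymbol{Z}$ is given by
$$\mathrm{d}\boldsymbol{Z} = \prod_{i=1}^{p} \mathrm{d}Z_{ii} \prod_{i<j} \mathrm{d}\Re\{Z_{ij}\}\, \mathrm{d}\Im\{Z_{ij}\},$$
where $Z_{ij}$ is the $(i,j)$th entry of matrix $\boldsymbol{Z}$, and $\Re$ and $\Im$ denote the real and imaginary parts, respectively [28]. Table 1 shows the specification of $h$ and $\phi$ for the three entropies we use in this article: Shannon, Rényi, and restricted Tsallis.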
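$(h,\phi)$-entropy  $h(y)$  $\phi(x)$

Shannon [21]  $y$  $-x \ln x$
Restricted Tsallis (order $\beta \in \mathbb{R}_+$, $\beta \neq 1$) [32]  $y$  $(x^{\beta} - x)/(1-\beta)$
Rényi (order $\beta \in \mathbb{R}_+$, $\beta \neq 1$) [33]  $\ln y/(1-\beta)$  $x^{\beta}$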
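To make the role of $h$ and $\phi$ concrete, the sketch below evaluates the three entropies by numerical integration for a univariate density, which is a simplification of the matrix-variate setting used in this paper. It assumes the usual specifications: Shannon with $h(y)=y$, $\phi(x)=-x\ln x$; restricted Tsallis with $h(y)=y$, $\phi(x)=(x^{\beta}-x)/(1-\beta)$; Rényi with $h(y)=\ln y/(1-\beta)$, $\phi(x)=x^{\beta}$. All function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad


def h_phi_entropy(pdf, h, phi, lower, upper):
    """Generic (h, phi)-entropy: h applied to the integral of phi(pdf(x))."""
    integral, _ = quad(lambda x: phi(pdf(x)), lower, upper)
    return h(integral)


def shannon(pdf, lower, upper):
    # phi(x) = -x ln x, with the convention 0 ln 0 = 0.
    phi = lambda x: -x * np.log(x) if x > 0 else 0.0
    return h_phi_entropy(pdf, lambda y: y, phi, lower, upper)


def tsallis(pdf, beta, lower, upper):
    phi = lambda x: (x**beta - x) / (1.0 - beta)
    return h_phi_entropy(pdf, lambda y: y, phi, lower, upper)


def renyi(pdf, beta, lower, upper):
    h = lambda y: np.log(y) / (1.0 - beta)
    return h_phi_entropy(pdf, h, lambda x: x**beta, lower, upper)


# Unit-rate exponential density, used here only as a simple test case.
pdf = lambda x: np.exp(-x)
H_S = shannon(pdf, 0.0, np.inf)
H_T = tsallis(pdf, 0.5, 0.0, np.inf)
H_R = renyi(pdf, 0.5, 0.0, np.inf)
H_R_near_one = renyi(pdf, 0.999, 0.0, np.inf)
print(H_S, H_T, H_R, H_R_near_one)
```

For the unit-rate exponential density the Shannon entropy equals 1, and the Rényi value at $\beta = 0.999$ is already very close to it, which previews the limiting behaviour discussed later in the paper.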
The following result, derived by Pardo et al. [34], paves the way for the proposal of asymptotic statistical inference methods based on entropy.
Lemma 1
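Let $\widehat{\boldsymbol{\theta}} = [\widehat{\theta}_1\; \widehat{\theta}_2\; \cdots\; \widehat{\theta}_M]^{t}$ be the ML estimate of the parameter vector $\boldsymbol{\theta}$ based on a random sample of size $N$ under the model $f(\boldsymbol{Z}; \boldsymbol{\theta})$. Then
$$\sqrt{N}\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}) - H_{\phi}^{h}(\boldsymbol{\theta}) \bigr] \xrightarrow{\;\mathcal{D}\;} \mathcal{N}\bigl(0, \sigma_H^2(\boldsymbol{\theta})\bigr),$$
where $\mathcal{N}(\mu, \sigma^2)$ is the Gaussian distribution with mean $\mu$ and variance $\sigma^2$, ‘$\xrightarrow{\mathcal{D}}$’ denotes convergence in distribution,
$$\sigma_H^2(\boldsymbol{\theta}) = \boldsymbol{\delta}^{t}\, \mathcal{K}(\boldsymbol{\theta})^{-1}\, \boldsymbol{\delta}, \tag{3}$$
$\mathcal{K}(\boldsymbol{\theta}) = \operatorname{E}\{-\partial^2 \ln f(\boldsymbol{Z}; \boldsymbol{\theta}) / \partial \boldsymbol{\theta}^2\}$ is the Fisher information matrix, and $\boldsymbol{\delta} = [\delta_1\; \delta_2\; \cdots\; \delta_M]^{t}$ such that $\delta_i = \partial H_{\phi}^{h}(\boldsymbol{\theta}) / \partial \theta_i$ for $i = 1, 2, \ldots, M$.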
In the following we introduce a methodology for hypothesis tests and confidence intervals based on entropy.
3.1 Hypothesis test
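Let $\boldsymbol{Z}$ be a positive-definite random matrix with probability density function $f(\boldsymbol{Z}; \boldsymbol{\theta}_i)$ defined over the set $\mathcal{A}$ of positive-definite Hermitian matrices with parameter vector $\boldsymbol{\theta}_i$ for $i = 1, 2, \ldots, r$, where $r$ is the number of populations to be assessed. We are interested in testing the following hypotheses:
$$\begin{cases} \mathcal{H}_0\colon H_{\phi}^{h}(\boldsymbol{\theta}_1) = H_{\phi}^{h}(\boldsymbol{\theta}_2) = \cdots = H_{\phi}^{h}(\boldsymbol{\theta}_r) = v, \\ \mathcal{H}_1\colon H_{\phi}^{h}(\boldsymbol{\theta}_i) \neq H_{\phi}^{h}(\boldsymbol{\theta}_j) \text{ for some } i \neq j. \end{cases}$$
In other words, statistical evidence is sought for assessing whether at least one of the $r$ regions of a PolSAR image has different entropy when compared to the remaining regions.
Let $\widehat{\boldsymbol{\theta}}_i$ be the ML estimate for $\boldsymbol{\theta}_i$ based on a random sample of size $N_i$ under $f(\boldsymbol{Z}; \boldsymbol{\theta}_i)$, for $i = 1, 2, \ldots, r$. From Lemma 1 we have that, under $\mathcal{H}_0$,
$$\frac{\sqrt{N_i}\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - v \bigr]}{\sigma_H(\widehat{\boldsymbol{\theta}}_i)} \xrightarrow{\;\mathcal{D}\;} \mathcal{N}(0, 1),$$
for $i = 1, 2, \ldots, r$, where $\sigma_H^2(\cdot)$ is the asymptotic variance of Equation (3). Therefore,
$$\sum_{i=1}^{r} \frac{N_i\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - v \bigr]^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)} \xrightarrow{\;\mathcal{D}\;} \chi^2_{r}. \tag{4}$$
Since $v$ is, in practice, unknown, in the following we modify this test statistic in order to take this into account. Considering an application of Cochran’s theorem [35], we obtain:
$$\sum_{i=1}^{r} \frac{N_i\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - v \bigr]^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)} = \sum_{i=1}^{r} \frac{N_i\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - \overline{v} \bigr]^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)} + \sum_{i=1}^{r} \frac{N_i\, (\overline{v} - v)^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)}, \tag{5}$$
where
$$\overline{v} = \left[ \sum_{i=1}^{r} \frac{N_i}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)} \right]^{-1} \sum_{i=1}^{r} \frac{N_i\, H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i)}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)}.$$
Salicrú et al. [21] showed that the second summation in the right-hand side of Equation (5) is chi-square distributed with one degree of freedom. Since the left-hand side of (5) is chi-square distributed with $r$ degrees of freedom (cf. Equation (4)), we conclude that:
$$\sum_{i=1}^{r} \frac{N_i\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - \overline{v} \bigr]^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)} \xrightarrow{\;\mathcal{D}\;} \chi^2_{r-1}.$$
In particular, consider the following test statistic:
$$S_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_1, \widehat{\boldsymbol{\theta}}_2, \ldots, \widehat{\boldsymbol{\theta}}_r) = \sum_{i=1}^{r} \frac{N_i\, \bigl[ H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_i) - \overline{v} \bigr]^2}{\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)}. \tag{6}$$
We are now in the position to state the following result.
Proposition 1
Let $N_i$, $i = 1, 2, \ldots, r$, be sufficiently large. If $S_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_1, \widehat{\boldsymbol{\theta}}_2, \ldots, \widehat{\boldsymbol{\theta}}_r) = s$, then the null hypothesis $\mathcal{H}_0$ can be rejected at a level $\eta$ if $\Pr(\chi^2_{r-1} > s) \leq \eta$.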
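The decision rule of Proposition 1 can be sketched in a few lines. The code assumes the test statistic is the weighted sum of squared deviations of the estimated entropies from their precision-weighted mean, with weights $N_i/\sigma_H^2(\widehat{\boldsymbol{\theta}}_i)$, compared against a chi-square law with $r-1$ degrees of freedom. The function name `entropy_test` and the numerical inputs are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def entropy_test(entropies, variances, sizes):
    """Return the statistic S and its asymptotic p-value.

    entropies: estimated entropies, one per population
    variances: estimated asymptotic variances sigma_H^2, one per population
    sizes:     sample sizes N_i
    """
    H = np.asarray(entropies, dtype=float)
    s2 = np.asarray(variances, dtype=float)
    N = np.asarray(sizes, dtype=float)
    w = N / s2                          # precision weights N_i / sigma_i^2
    v_bar = np.sum(w * H) / np.sum(w)   # weighted mean entropy
    S = np.sum(w * (H - v_bar) ** 2)    # test statistic
    p_value = chi2.sf(S, df=H.size - 1)
    return S, p_value


# Hypothetical two-region example: reject H0 at level eta if p_value <= eta.
S, p_value = entropy_test([1.0, 1.2], [0.5, 0.5], [100, 100])
print(S, p_value)
```

In this made-up example the statistic equals 4 and the p-value is slightly below 0.05, so the null hypothesis of equal entropies would be rejected at the 5% level but not at the 1% level.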
3.2 Confidence intervals
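Let $\widehat{\boldsymbol{\theta}}$ be the ML estimate of $\boldsymbol{\theta}$ for a sufficiently large sample of size $N$. An approximate confidence interval for $H_{\phi}^{h}(\boldsymbol{\theta})$ at nominal level $\eta$ is
$$H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}) \pm z_{\eta/2}\, \frac{\sigma_H(\widehat{\boldsymbol{\theta}})}{\sqrt{N}}, \tag{7}$$
where $z_{\eta/2}$ is the $\eta/2$ quantile of the standard Gaussian distribution and $\sigma_H^2(\cdot)$ is the asymptotic variance of Equation (3).
Consider now $\widehat{\boldsymbol{\theta}}_1$ and $\widehat{\boldsymbol{\theta}}_2$, ML estimates based on large samples of sizes $N_1$ and $N_2$, respectively. An approximate confidence interval for $H_{\phi}^{h}(\boldsymbol{\theta}_1) - H_{\phi}^{h}(\boldsymbol{\theta}_2)$ is given by [34]
$$H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_1) - H_{\phi}^{h}(\widehat{\boldsymbol{\theta}}_2) \pm z_{\eta/2} \sqrt{\frac{\sigma_H^2(\widehat{\boldsymbol{\theta}}_1)}{N_1} + \frac{\sigma_H^2(\widehat{\boldsymbol{\theta}}_2)}{N_2}}.$$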
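A minimal sketch of these intervals follows, assuming the usual Wald form: estimate plus or minus a Gaussian quantile times the estimated standard error. The helper names `entropy_ci` and `entropy_diff_ci` and the input values are hypothetical; the code uses the upper $1-\eta/2$ quantile, which gives the same interval as the $\eta/2$ quantile by symmetry.

```python
import numpy as np
from scipy.stats import norm


def entropy_ci(H_hat, var_hat, N, eta=0.05):
    """Approximate interval for one entropy at nominal level eta."""
    half_width = norm.ppf(1.0 - eta / 2.0) * np.sqrt(var_hat / N)
    return H_hat - half_width, H_hat + half_width


def entropy_diff_ci(H1, var1, N1, H2, var2, N2, eta=0.05):
    """Approximate interval for the difference between two entropies."""
    half_width = norm.ppf(1.0 - eta / 2.0) * np.sqrt(var1 / N1 + var2 / N2)
    diff = H1 - H2
    return diff - half_width, diff + half_width


# Hypothetical values: entropy estimate 10 with asymptotic variance 4, N = 100.
lower, upper = entropy_ci(10.0, 4.0, 100)
d_lower, d_upper = entropy_diff_ci(10.0, 4.0, 100, 9.0, 4.0, 100)
print(lower, upper, d_lower, d_upper)
```

Two regions whose single-entropy intervals are disjoint, or whose difference interval excludes zero, show contrast at the chosen level.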
4 Results
In the following, we derive results for the Shannon, Rényi, and restricted Tsallis entropies, denoted as $H_S$, $H_R$, and $H_T$, respectively, under the scaled complex Wishart law. In particular, these measures are algebraically expressed, numerically evaluated, and assessed. We adopt $\boldsymbol{\theta} = [L, \operatorname{vec}(\boldsymbol{\Sigma})^{t}]^{t}$, where $\operatorname{vec}(\cdot)$ is the vectorisation operator, as the working parameter vector. Additionally, asymptotic results for these measures are computed. To that end, we derive analytic expressions for the variance of the considered entropies. The entropies were evaluated at the ML estimate values. Subsequently, hypothesis tests and confidence intervals are proposed based on Shannon and Rényi entropies.
4.1 Expressions for the complex Wishart distribution
We now present three schemes concerning the derivations of Shannon, restricted Tsallis, and Rényi entropies.
Shannon entropy
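Using the expression of the Shannon entropy obtained applying $h(y) = y$ and $\phi(x) = -x \ln x$ from Table 1 to the density given in Equation (2), we have that
$$H_S(\boldsymbol{\theta}) = -\int_{\mathcal{A}} f_{\boldsymbol{Z}}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L) \ln f_{\boldsymbol{Z}}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L)\, \mathrm{d}\boldsymbol{Z} = -\operatorname{E}\{\ln f_{\boldsymbol{Z}}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L)\}.$$
Minor manipulations yield the following result:
$$H_S(\boldsymbol{\theta}) = -pL \ln L - (L-p) \operatorname{E}\{\ln |\boldsymbol{Z}|\} + L \ln |\boldsymbol{\Sigma}| + \ln \Gamma_p(L) + L \operatorname{E}\{\operatorname{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{Z})\}.$$
In [36], Anfinsen et al. obtain the following identity:
$$\operatorname{E}\{\ln |\boldsymbol{Z}|\} = \ln |\boldsymbol{\Sigma}| + \psi_p^{(0)}(L) - p \ln L,$$
where $\psi_p^{(0)}(\cdot)$ is the term of order zero of the $v$th-order multivariate polygamma function given by
$$\psi_p^{(v)}(L) = \sum_{i=0}^{p-1} \psi^{(v)}(L - i),$$
and $\psi^{(v)}(\cdot)$ is the ordinary polygamma function expressed by
$$\psi^{(v)}(L) = \frac{\partial^{v+1} \ln \Gamma(L)}{\partial L^{v+1}},$$
for $v \geq 0$ (for $v = 0$, $\psi^{(0)}$ is known as the digamma function). By the linearity of the expectation operator, the following holds true:
$$\operatorname{E}\{\operatorname{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{Z})\} = \operatorname{tr}\bigl(\boldsymbol{\Sigma}^{-1} \operatorname{E}\{\boldsymbol{Z}\}\bigr) = p.$$
Thus, the Shannon entropy relative to the random variable $\boldsymbol{Z}$ is expressed by
$$H_S(\boldsymbol{\theta}) = \frac{p(p-1)}{2} \ln \pi - p^2 \ln L + p \ln |\boldsymbol{\Sigma}| + pL + (p - L)\, \psi_p^{(0)}(L) + \sum_{k=0}^{p-1} \ln \Gamma(L - k). \tag{8}$$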
Restricted Tsallis entropy
Rényi entropy
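From Table 1, the Rényi entropy is given by
$$H_R(\boldsymbol{\theta}) = \frac{1}{1-\beta} \ln \int_{\mathcal{A}} f_{\boldsymbol{Z}}^{\beta}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L)\, \mathrm{d}\boldsymbol{Z} = \frac{1}{1-\beta} \ln \operatorname{E}\bigl\{ f_{\boldsymbol{Z}}^{\beta-1}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L) \bigr\}.$$
Notice that this measure also depends on $\operatorname{E}\{ f_{\boldsymbol{Z}}^{\beta-1}(\boldsymbol{Z}; \boldsymbol{\Sigma}, L) \}$, which was already computed in Equation (12).
Therefore, denoting $q = L + (1-\beta)(p - L)$, the Rényi entropy is expressed by
$$H_R(\boldsymbol{\theta}) = \frac{p(p-1)}{2} \ln \pi - p^2 \ln L + p \ln |\boldsymbol{\Sigma}| + \frac{1}{1-\beta} \left[ \sum_{i=0}^{p-1} \ln \Gamma(q - i) - \beta \sum_{i=0}^{p-1} \ln \Gamma(L - i) - pq \ln \beta \right]. \tag{13}$$
It is known that, as $\beta \to 1$, both the Rényi [38, p. 676] and Tsallis [39] entropies converge to the Shannon entropy. Thus:
$$\lim_{\beta \to 1} H_R(\boldsymbol{\theta}) = \lim_{\beta \to 1} H_T(\boldsymbol{\theta}) = H_S(\boldsymbol{\theta}). \tag{14}$$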
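The limiting behaviour can be illustrated numerically. The sketch below assumes the Rényi entropy has the closed form obtained by integrating the $\beta$th power of the scaled complex Wishart density, written with $q = L + (1-\beta)(p-L)$; this expression is a reconstruction and should be checked against the published formula before use. For $p = 1$ it is compared with brute-force integration of a gamma density, and with the gamma Shannon entropy when $\beta$ is close to one. Function names are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln
from scipy.stats import gamma


def renyi_entropy_scaled_wishart(Sigma, L, beta):
    """Renyi entropy of order beta (assumed closed form, beta > 0, beta != 1)."""
    p = Sigma.shape[0]
    i = np.arange(p)
    q = L + (1.0 - beta) * (p - L)
    logdet_Sigma = np.linalg.slogdet(Sigma)[1]
    bracket = (gammaln(q - i).sum() - beta * gammaln(L - i).sum()
               - p * q * np.log(beta))
    return (0.5 * p * (p - 1) * np.log(np.pi) - p**2 * np.log(L)
            + p * logdet_Sigma + bracket / (1.0 - beta))


def renyi_entropy_numeric_p1(sigma2, L, beta):
    """Brute-force Renyi entropy for p = 1, where Z ~ Gamma(L, scale=sigma2/L)."""
    integrand = lambda z: gamma.pdf(z, a=L, scale=sigma2 / L) ** beta
    integral, _ = quad(integrand, 0.0, np.inf)
    return np.log(integral) / (1.0 - beta)


sigma2, L = 2.0, 4
closed = renyi_entropy_scaled_wishart(np.array([[sigma2]]), L, beta=0.8)
numeric = renyi_entropy_numeric_p1(sigma2, L, beta=0.8)
near_one = renyi_entropy_scaled_wishart(np.array([[sigma2]]), L, beta=0.9999)
shannon_p1 = gamma(a=L, scale=sigma2 / L).entropy()
print(closed, numeric, near_one, shannon_p1)
```

The first two printed values should agree to numerical precision, and the last two should differ only in the fourth decimal place or beyond.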
These convergences hold true regardless of the number of looks. Moreover, it is important to emphasize that the derived expressions can be related to the eigenvalues of the covariance matrix. This approach results in new expressions for $H_S$, $H_R$, and $H_T$ in terms of the geometric and arithmetic means of these eigenvalues as follows. Let $\lambda_1, \lambda_2, \ldots, \lambda_p$ be the eigenvalues of the covariance matrix $\boldsymbol{\Sigma}$. Following Mardia et al. [40], $|\boldsymbol{\Sigma}| = \prod_{i=1}^{p} \lambda_i$ and $\operatorname{tr}(\boldsymbol{\Sigma}) = \sum_{i=1}^{p} \lambda_i$, then
Thus, setting and adopting as the covariance matrix in (8) and (13), we have new expressions that can be used in place of the ones proposed by Cloude and Pottier [41] and Yan et al. [42].
In the following, we examine the behavior of $H_S$, $H_R$, and $H_T$ in terms of $L$ and $\boldsymbol{\Sigma}$.
Case study
Frery et al. [22] observed the following covariance matrix on an urban area from the ESAR image of Weßling (Bavaria, Germany):
only the diagonal and the upper triangle values are shown. Fig. a depicts plots of the discussed entropies for , , and . Considering the same interval for the number of looks, Figs. b, c, and d show the Shannon, Tsallis (of order ), and Rényi (of order ) entropies for the covariance matrix , , respectively.
Figs. a and e illustrate the property stated in (14). In the case shown here, the convergences are from above, i.e., $H_S$ is always smaller than $H_R$ and $H_T$.
Figs. b and d suggest that multiplying the covariance matrix by a constant greater than one, and hence increasing its determinant, also increases both Shannon and Rényi entropies. As expected, increasing the number of looks leads to smaller entropy values due to the increased SNR.
Although the restricted Tsallis entropy can be used in several fields of image processing [17], the derivation of its variance does not lead to a mathematically tractable expression. Thus, henceforth we focus our attention on the Shannon and Rényi entropies, which allow the algebraic manipulations required by the method described in Section 3. In the next section, we derive asymptotic variances for the Shannon and Rényi entropy estimates.
4.2 Asymptotic variances
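Let $\boldsymbol{Z}$ be a random matrix which follows a scaled complex Wishart distribution with parameter $\boldsymbol{\theta}$ as already defined. Its log-likelihood is
$$\ell(\boldsymbol{\theta}) = pL \ln L + (L - p) \ln |\boldsymbol{Z}| - L \ln |\boldsymbol{\Sigma}| - \frac{p(p-1)}{2} \ln \pi - \sum_{k=0}^{p-1} \ln \Gamma(L - k) - L \operatorname{tr}(\boldsymbol{\Sigma}^{-1} \boldsymbol{Z}).$$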
The Hessian matrix, the Fisher information matrix, and the biased version (according to Anfinsen et al. [36]) of the Cramér-Rao lower bound are necessary to obtain closed-form expressions for the asymptotic entropy variance in Equation (3). In particular, the following quantity plays a central role:
where represents complex conjugation. Anfinsen et al. [36] showed that
Moreover, it is known that [43]
Thus, we have that
(15) 
From Equation (4.2) we obtain:
(16) 
The analytical expressions for , , and are, thus,
where is the null square matrix of order .
Anfinsen et al. [36] derived the Fisher information matrix for the unscaled complex Wishart law and found that the parameters of that distribution are not orthogonal. Based on the Fisher information matrix obtained here, we conclude that $L$ and $\boldsymbol{\Sigma}$ become orthogonal under the scaling of the complex Wishart law. Among other benefits, such scaling makes the likelihood equations separable, as shown in the following.
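Consider a random sample $\boldsymbol{Z}_1, \boldsymbol{Z}_2, \ldots, \boldsymbol{Z}_N$ of size $N$ obtained from $\boldsymbol{Z} \sim \mathcal{W}(\boldsymbol{\Sigma}, L)$. Setting the derivatives of the sample log-likelihood to zero, we have that
$$\widehat{\boldsymbol{\Sigma}} = \overline{\boldsymbol{Z}} = \frac{1}{N} \sum_{i=1}^{N} \boldsymbol{Z}_i$$
and
$$p \ln \widehat{L} + \frac{1}{N} \sum_{i=1}^{N} \ln |\boldsymbol{Z}_i| - \ln |\overline{\boldsymbol{Z}}| - \psi_p^{(0)}(\widehat{L}) = 0. \tag{17}$$
Thus, the ML estimator of $\boldsymbol{\Sigma}$ is the sample mean, while $\widehat{L}$ is obtained by solving the nonlinear equation shown in Equation (17). The Newton-Raphson iterative method [44] can be used to solve it.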
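The estimation procedure can be sketched as follows. The code assumes the likelihood equation for the number of looks is $p\ln L + N^{-1}\sum_i \ln|\boldsymbol{Z}_i| - \ln|\overline{\boldsymbol{Z}}| - \psi_p^{(0)}(L) = 0$, and it swaps the Newton-Raphson iteration mentioned in the text for SciPy's bracketing root finder `brentq`, which needs no derivative or starting point. The names `ml_estimates` and `simulate`, the bracket endpoints, and the simulated covariance matrix are all illustrative choices.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma


def ml_estimates(Z):
    """ML estimates (Sigma_hat, L_hat) from an array Z of shape (N, p, p)."""
    p = Z.shape[1]
    Sigma_hat = Z.mean(axis=0)                       # sample mean
    mean_logdet = np.linalg.slogdet(Z)[1].mean()     # average of ln|Z_i|
    logdet_mean = np.linalg.slogdet(Sigma_hat)[1]    # ln|Z_bar|
    i = np.arange(p)

    def score(L):
        return p * np.log(L) + mean_logdet - logdet_mean - digamma(L - i).sum()

    # The score is large and positive near L = p - 1 and negative for large L
    # (by Jensen's inequality), so a sign change is bracketed for typical data.
    L_hat = brentq(score, p - 1 + 1e-8, 1e6)
    return Sigma_hat, L_hat


def simulate(Sigma, L, N, rng):
    """Simulate N multilook matrices with integer L from circular Gaussian vectors."""
    p = Sigma.shape[0]
    C = np.linalg.cholesky(Sigma)
    shape = (N, L, p)
    W = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)
    Y = W @ C.T
    return np.einsum("nlp,nlq->npq", Y, Y.conj()) / L


rng = np.random.default_rng(1)
Sigma_true = np.array([[2.0, 0.5 + 0.5j], [0.5 - 0.5j, 1.0]])
Z = simulate(Sigma_true, 4, 5000, rng)
Sigma_hat, L_hat = ml_estimates(Z)
print(L_hat)
```

With 5000 simulated four-look matrices, `L_hat` should land close to 4 and `Sigma_hat` close to the covariance used for simulation. `brentq` raises a `ValueError` if the bracket does not contain a sign change, which can happen for degenerate samples.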
The asymptotic variance given by Equation (3) is determined by and by the term [34]
We denote the resulting $\boldsymbol{\delta}$ as $\boldsymbol{\delta}_S$ and $\boldsymbol{\delta}_R$, when $H_S$ and $H_R$ are considered, respectively. We could not find a closed expression for the variance of the restricted Tsallis entropy, so it will not be further considered in the remainder of this work. Analogously, entropy variances are denoted as $\sigma_S^2$ and $\sigma_R^2$. These quantities are given by expressions (18)-(21).

Shannon entropy:
(18) and
(19) 
Rényi entropy:
(20) and
(21)
Note that, as expected from Equation (14),
The entropies, along with their variances, can be used as alternative goodness-of-fit tests to the one proposed in [45] for the Wishart distribution, specifying $r = 1$ in Equation (4). Cintra et al. [19] showed that these statistics outperform the Kolmogorov-Smirnov nonparametric test in intensity SAR imagery. In this paper, we aim to quantify contrast in PolSAR images and situations with $r \geq 2$. In the following, the proposed tests are applied to PolSAR imagery.
5 Applications
In this section, we combine the entropies and variances derived in the previous section to form statistical tests, whose performance with respect to test size and power is assessed by Monte Carlo experiments. Finally, the discussed methodology is applied to real data.
5.1 Entropy as a feature for discrimination
Fig. a shows the HH band of a PolSAR L-band image over Weßling, Germany. This image was obtained by the ESAR sensor [46] with equivalent number of looks (ENL), which is a PolSAR parameter associated with the degree of averaging of SAR measurements during data formation [36]. Three regions were selected: , , and , corresponding to areas with strong, moderate, and weak return, respectively. The image has approximately 3 resolution.
Table 2 shows the sample size and the ML estimates of the scaled Wishart distribution parameters for each region. The ENL estimates are close, suggesting similar levels of radar texture in the three regions; such levels are probably low, since the regions appear to be cropland. Notice that the smallest estimated number of looks and the highest value of determinant for the sample covariance matrix belong to region , where the texture is more pronounced. Figs. b-d present histograms and fitted densities from considered data for two distributions of the scaled complex Wishart family: and . Additionally, Table 2 also presents the Akaike information criterion (AIC), a measure of the goodness of these fits [47]. In all cases, the distribution presented the best fit.
Regions  AIC  

3708  355494.500  1.361  50769.93  49856.90  
2088  3321.241  1.657  18353.35  17931.51  
1079  274.189  2.557  6749.56  6629.15 
Table 3 presents the asymptotic lower and upper bounds for the Shannon and Rényi entropies at the 95% level of confidence, as computed according to Equation (7). The intervals are disjoint, suggesting that the entropy can be used as a feature for region discrimination. The smallest entropies are associated with regions whose estimated covariance matrices have small determinants, in accordance with the case study presented in Section 4.1.4.
Noting that (i) entropy is a measure of randomness, which is associated with variability, and that (ii) PolSAR areas with high reflectance are more affected by speckle noise, even those with negligible texture, it is intuitive that the determinant of the covariance matrix and the entropy are positively correlated. Moreover, the Shannon and Rényi entropy expressions increase with the determinant: both depend on $\boldsymbol{\Sigma}$ only through the additive term $p \ln |\boldsymbol{\Sigma}|$.
Goodman [48] defined the determinant of the stochastic covariance matrix as a generalized variance associated with speckle variability, that is, the effect of the speckle noise resulting from multipath interference. Texture variability, when present, stems from the spatial variability in reflectance and is associated with “heterogeneity”, the usual measure of the texture level. This last source of variability can be captured by, for instance, the roughness parameter of the polarimetric law [49, 50].
Region 


Lower  Upper  Lower  Upper  Lower  Upper  
37.979  38.432  61.083  61.332  44.045  44.364  
30.079  30.541  45.563  45.867  36.124  37.049  
19.611  19.949  35.000  35.346  20.901  21.230 
5.2 Synthetic Data
We quantify the performance of hypothesis tests based on Shannon and Rényi entropies with synthetic data following the scaled complex Wishart distribution, generated as suggested in [51].
The simulation employed the following parameters: (i) ; (ii) the estimated covariance matrices from regions , , and , denoted as , and , respectively; (iii) sample sizes (i.e., square windows of side pixels); and (iv) significance levels . After generating two samples of size from and , we tested the null hypothesis for . Following [18], we ran replicas of the Monte Carlo simulation for every case. Empirical test size and power were employed as figures of merit. These quantities are defined as the rejection rates of when this hypothesis is true and is false (i.e., ), respectively; they are also called Type I error and true positive rates.
Table 4 presents the resulting empirical test sizes. From a purely statistical viewpoint, the ideal test is the one whose empirical size is exactly the nominal one for simulated data. Therefore, we conclude that the Rényi entropy for is generally outperformed by the Shannon and Rényi of order , except for . The two last hypothesis tests have empirical test sizes close to the nominal levels even for small sample sizes.
Shannon entropy  
9  1.35  6.15  11.76  1.36  6.16  12.02  1.44  6.20  12.05 
49  1.38  5.53  10.44  1.38  5.65  10.35  1.35  5.47  10.44 
81  1.33  5.51  9.75  1.35  5.47  9.84  1.35  5.36  10.07 
121  1.18  5.56  10.58  1.15  5.51  10.51  1.15  5.49  10.44 
400  1.35  5.38  10.33  1.38  5.47  10.40  1.40  5.51  10.22 
Rényi entropy with  
9  1.38  6.22  11.65  1.44  6.18  11.93  1.45  6.09  11.78 
49  1.27  5.62  10.36  1.31  5.65  10.31  1.33  5.38  10.55 
81  1.31  5.47  9.76  1.20  5.47  9.91  1.29  5.36  9.95 
121  1.05  5.35  10.45  1.15  5.44  10.44  1.13  5.53  10.55 
400  1.35  5.27  10.31  1.36  5.29  10.29  1.27  5.33  10.42 
Rényi entropy with  
9  0.69  4.04  8.80  0.76  4.15  8.65  0.82  4.04  8.55 
49  0.71  3.62  7.82  0.75  3.87  7.89  0.69  3.45  7.73 
81  0.53  3.69  7.51  0.56  3.84  7.49  0.55  3.62  7.51 
121  0.60  3.73  7.58  0.60  3.40  7.47  0.53  3.58  7.64 
400  0.56  3.84  7.65  0.67  3.82  7.64  0.67  3.62  7.65 
Samples from different covariance matrices always led to rejecting the null hypothesis with the proposed statistical tests, thus leading to tests with unitary empirical test power. Table 5 shows the mean value of the statistics
where is or as given in (6) and are the ML estimates based on generated data from population at the th Monte Carlo replica. The table also shows the coefficient of variation (CV) of the test statistics under alternative hypotheses:
      
Shannon entropy  
9  20.59  108.16  13.83  248.53  38.42  29.89 
49  8.65  576.23  5.82  1329.19  16.38  156.21 
81  6.70  949.51  4.52  2191.84  12.71  256.54 
121  5.51  1418.22  3.71  3274.03  10.43  383.17 
400  3.01  4687.29  2.02  10817.80  5.72  1265.69 
Rényi entropy with  
9  20.56  106.72  13.72  245.22  38.45  29.50 
49  8.63  569.69  5.76  1314.11  16.40  154.43 
81  6.67  938.93  4.47  2167.39  12.72  253.68 
121  5.48  1402.43  3.66  3237.61  10.42  378.93 
400  3.00  4635.65  2.00  10698.61  5.71  1251.74 
Rényi entropy with  
9  21.70  78.42  14.27  180.12  40.60  21.79 
49  9.11  423.29  6.00  976.32  17.44  114.77 
81  7.02  698.47  4.63  1612.06  13.50  188.81 
121  5.73  1043.26  3.77  2408.63  11.00  282.06 
400  3.15  3450.84  2.07  7963.98  6.03  931.79 
The largest means correspond to the largest absolute values , leading to the following relations:
(22) 
The test based on the Shannon entropy produced the largest statistic values when contrasting samples from different areas.
5.3 Real Data
Applying the methodology described above to real data, we obtained the results shown in Fig. 3. To that end, the following steps were followed:
where is the Type II error rate; i.e., the estimate for the probability of not rejecting when the null hypothesis is false. We considered the number of observations in square windows of size .
Fig. 3 shows the empirical test sizes, with the nominal test size in dotted line. The behavior of is consistent across the three values of : the empirical test size is larger in Shannon than in Rényi which, in turn, is larger than Rényi . The fastest convergence to with respect to the sample size is consistently observed in area . No test attains the nominal size in area . This is probably due to its pronounced brightness and roughness, as noted in Table 2.
In the following we select visually similar regions, but with distinct statistical properties. To that end, Fig. a shows an EMISAR image of Foulum (Denmark) obtained with eight nominal looks over agricultural areas and sub 10 resolution; three regions were selected for our study. Table 6 presents the sample sizes and the ML estimates, while Figs. b, c and d show empirical densities of different areas and channels. The empirical densities of B and B are the closest for all polarization channels, as well as their estimates of .
Regions  

3192  6.925  
1408  11.937  
1848  10.752 
We quantify the discrimination between selected regions by means of and performing steps 1–7 with the Foulum samples. Since Rényi entropy converges to Shannon when the order tends to , we omit the analysis of because it is similar to , as illustrated in Fig. 5.
Figs. a, b and c exhibit the empirical test power. In general terms, the estimated power increases quickly as the sample size increases. The test based on the Shannon entropy outperforms the one based on Rényi’s.
These results suggest that the test statistic based on the Shannon entropy is a more appropriate tool for change and boundary detection in PolSAR images than the measure based on the Rényi entropy.
6 Conclusions
In this paper, closed expressions for the Shannon, Rényi, and Tsallis entropies have been derived for data distributed according to the scaled complex Wishart distribution. The variances of the first two entropies were also derived when parameters are replaced by ML estimators, leading to the derivation of two test statistics with known asymptotic distributions. The statistics based on the Tsallis entropy were not derived since its variance could not be expressed in closed form.
New hypothesis tests and confidence intervals based on the ()family of entropies were derived for the scaled complex Wishart distribution. These new statistical tools provide means for contrasting samples and verifying if they come from the same distribution. Such tools were derived by obtaining orthogonal maximum likelihood estimators for the scaled complex Wishart law, and by characterizing their asymptotic behavior.
Monte Carlo experiments were employed for quantifying the performance of the proposed hypothesis tests. The results provided evidence that the statistic based on the Shannon entropy is the most efficient in terms of empirical test size and power.
An application to actual data was also considered in order to assess the performance of the proposed hypothesis tests. The empirical test sizes observed with real data are relatively high when dealing with small samples, but they decrease as the sample size increases. As expected, the tests presented the worst results in the most heterogeneous situations. Nevertheless, they perform correctly when the sample size is increased.
The hypothesis test based on the Shannon entropy presented the smallest empirical test size. All test statistics detected differences among different regions at the specified levels. This is important for PolSAR image analysis, such as in boundary detection [22] and change detection [24].
As an overall conclusion, the Shannon entropy can be safely used for discriminating areas in PolSAR imagery. Moreover, the Shannon entropy is not challenged in terms of simplicity either, which consolidates its position as the preferred entropy measure. However, care must be taken when small samples are analyzed. Indeed, in this case, the proposed tests are prone to classifying regions of similar nature as distinct, i.e., to incurring Type I errors. In practice, however, this issue is not severe, since PolSAR image processing often handles data with a large number of pixels.
Alejandro C. Frery graduated in Electronic and Electrical Engineering from the Universidad de Mendoza, Argentina. His M.Sc. degree was in Applied Mathematics (Statistics) from the Instituto de Matemática Pura e Aplicada (Rio de Janeiro) and his Ph.D. degree was in Applied Computing from the Instituto Nacional de Pesquisas Espaciais (São José dos Campos, Brazil). He is currently with the Instituto de Computação, Universidade Federal de Alagoas, Maceió, Brazil. His research interests are statistical computing and stochastic modelling.
Renato J. Cintra earned his B.Sc., M.Sc., and D.Sc. degrees in Electrical Engineering from Universidade Federal de Pernambuco, Brazil, in 1999, 2001, and 2005, respectively. In 2005, he joined the Department of Statistics at UFPE. During 2008-2009, he worked at the University of Calgary, Canada, as a visiting research fellow. He is also a graduate faculty member of the Department of Electrical and Computer Engineering, University of Akron, OH. His long-term topics of research include theory and methods for digital