A Nonlocal Denoising Algorithm for Manifold-Valued Images Using Second Order Statistics

Friederike Laus¹, Mila Nikolova², Johannes Persch¹, and Gabriele Steidl¹

¹ Department of Mathematics, Technische Universität Kaiserslautern, Paul-Ehrlich-Str. 31, 67663 Kaiserslautern, Germany, {friederike.laus, persch, steidl}@mathematik.uni-kl.de
² CMLA – CNRS, ENS Cachan, 61 av. President Wilson, 94235 Cachan Cedex, France, nikolova@cmla.ens-cachan.fr

July 20, 2019
Abstract

Nonlocal patch-based methods, in particular the Bayes' approach of Lebrun, Buades and Morel [41], are considered state-of-the-art methods for denoising (color) images corrupted by white Gaussian noise of moderate variance. This paper is the first attempt to generalize this technique to manifold-valued images. Such images, for example images with phase or directional entries or with values in the manifold of symmetric positive definite matrices, are frequently encountered in real-world applications. Generalizing the normal law to manifolds is not canonical, and different attempts have been considered. Here we focus on a straightforward intrinsic model and discuss its relation to other approaches for specific manifolds. We reinterpret the Bayesian approach of Lebrun et al. [41] in terms of minimum mean squared error estimation, which motivates our definition of a corresponding estimator on the manifold. With this estimator at hand we present a nonlocal patch-based method for the restoration of manifold-valued images. Various proof-of-concept examples demonstrate the potential of the proposed algorithm.

Extended Version

1 Introduction

In many situations where measurements are taken, the obtained data are corrupted by noise, and typically one uses a stochastic model to describe the recorded data. If there are several independent factors that may influence the data acquisition, the central limit theorem suggests modeling the noise as additive white Gaussian noise. This is also the standard noise model one encounters in image analysis, see, e.g., [31]. One might of course wonder whether this noise model is realistic, and in fact, in many situations the image formation process already suggests a non-Gaussian model, e.g., Poisson noise in the case where images are obtained by photon counting with a CCD device. But also in these cases, in order to benefit from the rich knowledge and all the appealing properties of the normal distribution, one often tries to transform the image in such a way that the assumption of Gaussian white noise is at least approximately fulfilled. For instance, for Poisson noise this can be achieved by the so-called Anscombe transform [4].
Much effort has been spent on the denoising of images corrupted by white Gaussian noise, and a huge number of methods has been proposed in the literature. Among others we mention variational models with total variation regularizers [60] and many extensions thereof, denoising based on sparse representations over learned dictionaries [26], nonlocal means [30, 63, 80, 77] and their generalizations [17, 23, 38, 63], the piecewise linear estimator from Gaussian mixture models (PLE, E-PLE) [80, 73] and SURE guided Gaussian mixture models [74], patch-ordering based wavelet methods [56], the expected patch log-likelihood (EPLL) algorithm [81] or better its multiscale variant [52], BM3D [21] and BM3D-SAPCA [22], and the nonlocal Bayes' algorithm of Lebrun et al. [40, 41]. The latter can be viewed as an optimized reinterpretation of the two step image denoising method (TSID) [64, 78] in a Bayesian framework. For a recent review of the denoising problem and the different denoising principles we refer to [42]. Currently, nonlocal patch-based methods achieve the best results, and the quality of the denoised images has become excellent for moderate noise levels. Even more, based on experiments with a set of 20,000 images, the authors of [47] conjecture that for natural images the recent patch-based denoising methods might already be close to optimality. Their conjecture points in the same direction as the paper of Chatterjee et al. [16], who raised the question "Is denoising dead?".
The situation described above changes completely when dealing with manifold-valued images instead of real-valued ones, a situation frequently encountered in applications. For instance, images with values on the circle (periodic data) appear in interferometric synthetic aperture radar [13, 24], in applications involving the phase of Fourier transformed data [10], or when working with the hue component of an image in the HSV color space. Spherical data play a role when dealing with 3D directional information [39, 70] or in the representation of a color image in the chromaticity-brightness (CB) color space [15]. SO(3)-valued data appear in electron backscatter diffraction [8, 9]. Finally, to mention only a few examples, images with symmetric positive definite matrices as values are handled in diffusion tensor magnetic resonance imaging (DT-MRI) [18, 28, 69, 75, 77] or when covariance matrices are associated with image pixels [68]. Recently, some methods for the denoising of manifold-valued images have been suggested, among them variational approaches using embeddings into higher dimensional spaces [59] or based on (generalized) TV regularization [7, 11, 46, 66, 76].
In this paper, we aim at generalizing the nonlocal patch-based denoising of Lebrun et al. [40, 41] to manifold-valued images. However, for general manifolds, already the question of how to define Gaussian white noise (or, more generally, a normal distribution) has no canonical answer. Different approaches have been proposed in the literature, either by making use of characterizing properties of the real-valued normal distribution, as for instance in [51, 53], or by restricting attention to particular manifolds such as spheres, see, e.g., [48], the simplex [49], or symmetric positive definite matrices [61]. In this paper, we adopt a simple model for a normal distribution, and in particular for Gaussian white noise, on a manifold and discuss its relationship to existing models. We review the minimum mean squared error estimator in the Euclidean setting, which coincides with that of the Bayesian approach in [41] under the normal distribution assumption. This motivates our definition of a corresponding estimator on the manifold and gives rise to a nonlocal patch-based method for the restoration of manifold-valued images.
The outline of this paper is as follows: in Section 2 we reinterpret the nonlocal Bayes algorithm of Lebrun et al. [40, 41] in a minimum mean square error estimation setting. This review in the Euclidean setting is necessary to understand its generalization to manifold-valued images. In Section 3 we introduce the notation on manifolds. Then, in Section 4 we detail the nonlocal patch-based denoising algorithm for manifold-valued images. This requires making precise what we mean by the normal law on the manifolds of interest. We discuss the relation between this model and other existing ones. In Section 5 we provide several numerical examples to demonstrate that our denoising approach is indeed computationally manageable. The examples, though academic, for cyclic and directional data and for images with values in the manifold of symmetric positive definite matrices show the potential of nonlocal techniques for manifold-valued images. Specific real-world applications are not within the scope of this paper. Finally, we draw conclusions and point out further directions of research in Section 6.

2 Nonlocal Patch-Based Denoising of Real-Valued Images

In this section we consider the nonlocal Bayesian image denoising method of Lebrun et al. [40, 41]. In contrast to these authors we prefer to motivate the method by a minimum mean square error estimation approach. One reason is that the best linear unbiased estimator in (9) has a form similar to the MMSE, but does not rely on the assumption that the random variables are jointly normally distributed. This leaves potential for future work, e.g., when extending the model to distributions other than the normal distribution.

2.1 Minimum Mean-Square Estimator

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $X \colon \Omega \to \mathbb{R}^d$ and $Y \colon \Omega \to \mathbb{R}^D$ two random vectors. We wish to estimate $X$ given $Y$, i.e., we seek an estimator $\hat{X}(Y)$ such that $\hat{X}(Y)$ approximates $X$. A common quality measure for this task is the mean square error $\mathbb{E}\bigl[\lVert X - \hat{X}(Y)\rVert_2^2\bigr]$, which gives rise to the definition of the minimum mean square estimator (MMSE)

$$\hat{X}(Y) \in \operatorname*{arg\,min}_{Z \in L^2(\Omega, \sigma(Y), \mathbb{P})} \mathbb{E}\bigl[\lVert X - Z\rVert_2^2\bigr],$$

where $\sigma(Y)$ denotes the $\sigma$-algebra generated by $Y$ and $L^2(\Omega, \sigma(Y), \mathbb{P})$ stands for all square-integrable $\sigma(Y)$-measurable random vectors $Z \colon \Omega \to \mathbb{R}^d$, see, e.g., [43]. Under weak additional regularity assumptions on the estimator, the Lehmann-Scheffé theorem [44, 45] states that the solution of the minimization problem is given by the conditional expectation

$$\hat{X}(Y) = \mathbb{E}\bigl[X \,|\, Y\bigr].$$

In general it is not possible to give an analytical expression for the MMSE. One exception is the normal distribution. Recall that a random vector $X$ is normally distributed with mean $\mu \in \mathbb{R}^d$ and covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, written $X \sim \mathcal{N}(\mu, \Sigma)$, if and only if there exist a random vector $\varepsilon = (\varepsilon_1, \dots, \varepsilon_r)^\mathrm{T}$, whose components are independent real-valued standard normally distributed random variables, and a matrix $A \in \mathbb{R}^{d \times r}$ with $A A^\mathrm{T} = \Sigma$, such that $X = A\varepsilon + \mu$, where $r$ is the rank of the covariance matrix $\Sigma$. If $\Sigma$ has full rank, then the probability density function (pdf) of $X$ with respect to the Lebesgue measure is given by

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\Bigl(-\tfrac{1}{2}(x - \mu)^\mathrm{T} \Sigma^{-1} (x - \mu)\Bigr), \qquad (1)$$

where $|\Sigma|$ denotes the determinant of $\Sigma$. In view of the next section it is useful to recall some properties of the normal distribution.

Remark 2.1 (Properties of the Gaussian distribution on $\mathbb{R}^d$).

  1. The Gaussian density function in (1) maximizes the entropy

     $$H(p) = -\int_{\mathbb{R}^d} p(x)\,\log p(x)\,\mathrm{d}x$$

     over all density functions on $\mathbb{R}^d$ with fixed mean $\mu$ and covariance matrix $\Sigma$.

  2. Let $x_i$, $i = 1, \dots, n$, be i.i.d. realizations of an absolutely continuous distribution having first and second moments, denoted by $\mu$ and $\Sigma$. Then the likelihood function reads as $\mathcal{L}(\mu, \Sigma \,|\, x_1, \dots, x_n) = \prod_{i=1}^n p(x_i)$, and the maximum likelihood (ML) estimator of the mean is defined as a maximizer of $\mathcal{L}$ with respect to $\mu$. It holds that

     $$\hat{\mu} = \frac{1}{n} \sum_{i=1}^n x_i \qquad (2)$$

     if and only if the density function is of the form (1), see, e.g., [25, 65]. For the normal distribution the ML estimator of the covariance matrix reads as

     $$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \bigl(x_i - \hat{\mu}\bigr)\bigl(x_i - \hat{\mu}\bigr)^\mathrm{T}. \qquad (3)$$

  3. The density function of the normal distribution $\mathcal{N}(0, t I_d)$ with the identity matrix $I_d$, considered as a function of $x$ and $t$, is (up to a rescaling of time) the kernel of the heat equation.

In order to compute the MMSE estimator for Gaussian random vectors we need to determine the conditional distribution of $X$ given $Y$. It is well known that, if $X$ and $Y$ are jointly normally distributed, i.e.,

$$\begin{pmatrix} X \\ Y \end{pmatrix} \sim \mathcal{N}\Biggl( \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix} \Biggr),$$

then the conditional distribution of $X$ given $Y = y$ is normally distributed as well and reads as

$$(X \,|\, Y = y) \sim \mathcal{N}\bigl(\mu_{X|Y=y}, \Sigma_{X|Y}\bigr),$$

where

$$\mu_{X|Y=y} = \mu_X + \Sigma_{XY}\,\Sigma_{YY}^{-1}(y - \mu_Y), \qquad \Sigma_{X|Y} = \Sigma_{XX} - \Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{YX}.$$

As a consequence we obtain for jointly normally distributed random vectors the MMSE estimator

$$\hat{X}(y) = \mathbb{E}[X \,|\, Y = y] = \mu_X + \Sigma_{XY}\,\Sigma_{YY}^{-1}(y - \mu_Y). \qquad (4)$$

Our situation (denoising) fits into the above framework if we set

$$Y = X + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_d), \qquad (5)$$

where we assume that $X$ and $\varepsilon$ are independent and $\sigma^2$ is known. Then $\mu_Y = \mu_X$, and by the independence of $X$ and $\varepsilon$ further $\Sigma_{XY} = \Sigma_{XX}$ and

$$\Sigma_{YY} = \Sigma_{XX} + \sigma^2 I_d. \qquad (6)$$

Now, the MMSE of $X$ given $Y = y$ in (4) becomes

$$\hat{X}(y) = \mu_X + \Sigma_{XX}\bigl(\Sigma_{XX} + \sigma^2 I_d\bigr)^{-1}(y - \mu_X) \qquad (7)$$
$$\phantom{\hat{X}(y)} = \mu_Y + \bigl(\Sigma_{YY} - \sigma^2 I_d\bigr)\,\Sigma_{YY}^{-1}(y - \mu_Y). \qquad (8)$$
Two remarks may be useful to see the relation to other estimators.

Remark 2.2 (Relation between MMSE and BLUE).

The estimator of the general form

$$\hat{X}(y) = \mu_X + \Sigma_{XY}\,\Sigma_{YY}^{-1}(y - \mu_Y) \qquad (9)$$

also makes sense for more general distributions. It is known as the best linear unbiased estimator (BLUE), since it is an unbiased estimator which has minimum mean-square error among all affine estimators. For jointly normally distributed $X$ and $Y$ it coincides with the MMSE estimator (4).

Remark 2.3 (Relation between MMSE and MAP).

The MMSE can also be derived in a Bayesian framework under a Gaussian prior (see, e.g., [27]), which is detailed in the following. Let $Y = X + \varepsilon$ with $X \sim \mathcal{N}(\mu_X, \Sigma_{XX})$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_d)$, where $X$ and $\varepsilon$ are independent. This implies $(Y \,|\, X = x) \sim \mathcal{N}(x, \sigma^2 I_d)$, so that the respective densities are given by

$$p(y \,|\, x) = \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\Bigl(-\frac{\lVert y - x\rVert_2^2}{2\sigma^2}\Bigr)$$

and

$$p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma_{XX}|^{1/2}} \exp\Bigl(-\tfrac{1}{2}(x - \mu_X)^\mathrm{T} \Sigma_{XX}^{-1} (x - \mu_X)\Bigr).$$

By Bayes' formula we have

$$p(x \,|\, y) = \frac{p(y \,|\, x)\, p(x)}{p(y)} \propto p(y \,|\, x)\, p(x),$$

and therewith the maximum a posteriori (MAP) estimate reads as

$$\hat{x}_{\mathrm{MAP}}(y) = \operatorname*{arg\,min}_{x \in \mathbb{R}^d} \Bigl\{ \frac{1}{2\sigma^2} \lVert y - x\rVert_2^2 + \frac{1}{2} (x - \mu_X)^\mathrm{T} \Sigma_{XX}^{-1} (x - \mu_X) \Bigr\}.$$

Setting the gradient to zero results in

$$\hat{x}_{\mathrm{MAP}}(y) = \bigl(\sigma^{-2} I_d + \Sigma_{XX}^{-1}\bigr)^{-1} \bigl(\sigma^{-2} y + \Sigma_{XX}^{-1} \mu_X\bigr).$$

Observing that $\bigl(\sigma^{-2} I_d + \Sigma_{XX}^{-1}\bigr)^{-1} \sigma^{-2} = \Sigma_{XX}\bigl(\Sigma_{XX} + \sigma^2 I_d\bigr)^{-1}$ and $\bigl(\sigma^{-2} I_d + \Sigma_{XX}^{-1}\bigr)^{-1} \Sigma_{XX}^{-1} = I_d - \Sigma_{XX}\bigl(\Sigma_{XX} + \sigma^2 I_d\bigr)^{-1}$, we finally obtain

$$\hat{x}_{\mathrm{MAP}}(y) = \mu_X + \Sigma_{XX}\bigl(\Sigma_{XX} + \sigma^2 I_d\bigr)^{-1}(y - \mu_X),$$

i.e., for a Gaussian prior the MAP estimate coincides with the MMSE estimator (7).
In practical applications, the parameters $\mu_Y$ and $\Sigma_{YY}$ are unknown and need to be estimated using realizations (observations) $y_1, \dots, y_n$ of $Y$. Here we use the ML estimators given in (2) and (3). Note that the ML estimator for the covariance matrix is slightly biased. Instead we could also deal with an unbiased estimator by replacing the averaging factor $\frac{1}{n}$ by $\frac{1}{n-1}$. However, the numerical difference is negligible for large $n$.

Summarizing our findings, we obtain the following empirical estimator:

$$\hat{x}(y) = \hat{\mu} + \bigl(\hat{\Sigma} - \sigma^2 I_d\bigr)\,\hat{\Sigma}^{-1}\bigl(y - \hat{\mu}\bigr), \qquad (10)$$

where $\hat{\mu}$ and $\hat{\Sigma}$ denote the empirical mean and covariance matrix of the observations $y_1, \dots, y_n$ of $Y$.
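To make (10) concrete, here is a minimal NumPy sketch of the estimator applied to a group of similar patches; it is our illustration of the formula, not the authors' reference implementation, and all function and variable names are hypothetical.

import numpy as np

def empirical_mmse(patches, sigma):
    # Restore a group of similar patches with the empirical MMSE
    # estimator (10). `patches` is an (n, d) array of n vectorized
    # patches, `sigma` the noise standard deviation.
    mu = patches.mean(axis=0)                   # empirical mean (2)
    centered = patches - mu
    cov = centered.T @ centered / len(patches)  # empirical covariance (3)
    filt = cov - sigma**2 * np.eye(cov.shape[0])
    # (Sigma - sigma^2 I) Sigma^{-1} (y_j - mu) for all patches at once;
    # we solve a linear system instead of forming the inverse.
    return mu + (filt @ np.linalg.solve(cov, centered.T)).T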
Remark 2.4 (Positive definiteness of the empirical covariance matrix).

Equation (10) contains via (6) the hidden assumption that $\hat{\Sigma} - \sigma^2 I_d$ is a valid covariance matrix. However, for the empirical covariance matrix it is not necessarily ensured that $\hat{\Sigma} - \sigma^2 I_d$ is positive semi-definite. There are different ways to overcome this problem, e.g., replacing negative eigenvalues by a small positive value, as for instance proposed in [57]; compare also the discussion in [40, Section 3.5] or [79, page 406]. In our numerical experiments we did not observe that this issue had a negative impact on the results.
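A minimal sketch of the eigenvalue clipping remedy just mentioned (the threshold value is an illustrative choice of ours, not the one from [57]):

def clip_to_psd(mat, floor=1e-8):
    # Replace negative eigenvalues of a symmetric matrix by a small
    # positive floor, cf. the remedy mentioned in Remark 2.4.
    w, v = np.linalg.eigh(mat)
    return (v * np.maximum(w, floor)) @ v.T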

2.2 Denoising Using the MMSE Approach

Next we describe how the results of the previous section can be used for image denoising. To this aim, let $x \in \mathbb{R}^{N_1 \times N_2}$ be a discrete gray-value image, defined on a grid $\mathcal{G} = \{1, \dots, N_1\} \times \{1, \dots, N_2\}$. By a slight abuse of notation we also write $x \in \mathbb{R}^N$, where $N = N_1 N_2$, for the columnwise reshaped version of the image. It will always be clear from the context which notation we refer to. We assume that the image is corrupted with white Gaussian noise, i.e.,

$$y = x + \varepsilon,$$

where $\varepsilon$ is now a realization of $\mathcal{N}(0, \sigma^2 I_N)$. Based on $y$ we wish to reconstruct the original image $x$.
We use the fact that natural images are to some extent self-similar, i.e., small similar patches may be found several times in the image, and that for these patches locally a normality assumption holds approximately true, see, e.g., [81]. To formalize this idea, consider an $s \times s$ neighborhood (patch) $y_i$ centered at $i = (i_1, i_2) \in \mathcal{G}$. After vectorization this corresponds to a realization of a $d$-dimensional normally distributed random vector, where $d = s^2$. This patch is referred to as a reference patch in the following. Similar patches are interpreted as further realizations of this random vector. There are several strategies to define similar patches; take, for example, for a fixed $n \in \mathbb{N}$, the $n$ nearest patches with respect to the Euclidean distance in a $w \times w$ search window around $i$. Let $\mathcal{I}(i)$ denote the set of centers of patches similar to $y_i$. Then the estimates of the expectation value (2) and the covariance matrix (3) become

$$\hat{\mu}_i = \frac{1}{|\mathcal{I}(i)|} \sum_{j \in \mathcal{I}(i)} y_j, \qquad \hat{\Sigma}_i = \frac{1}{|\mathcal{I}(i)|} \sum_{j \in \mathcal{I}(i)} \bigl(y_j - \hat{\mu}_i\bigr)\bigl(y_j - \hat{\mu}_i\bigr)^\mathrm{T}.$$
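One possible patch-grouping routine, sketched under the simplifying assumption that the whole search window lies inside the image (names and the brute-force search are our illustrative choices):

def find_similar(img, i, j, s, w, n):
    # Centers of the n patches of size s x s closest in Euclidean
    # distance to the reference patch centered at (i, j), searched
    # in a w x w window around (i, j).
    r, h = s // 2, w // 2
    ref = img[i - r:i + r + 1, j - r:j + r + 1]
    cands = []
    for a in range(i - h, i + h + 1):
        for b in range(j - h, j + h + 1):
            patch = img[a - r:a + r + 1, b - r:b + r + 1]
            cands.append((np.sum((patch - ref) ** 2), (a, b)))
    cands.sort(key=lambda t: t[0])
    return [c for _, c in cands[:n]]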
The obtained estimates are then used to restore the reference patch and all its similar patches with (10) as

$$\hat{x}_j = \hat{\mu}_i + \bigl(\hat{\Sigma}_i - \sigma^2 I_d\bigr)\,\hat{\Sigma}_i^{-1}\bigl(y_j - \hat{\mu}_i\bigr), \qquad j \in \mathcal{I}(i). \qquad (11)$$

Proceeding as above for all pixels yields a variable number of estimates for each pixel. Therewith, the final estimate at pixel is obtained as an average over all patches containing the pixel (aggregation). There are some fine-tuning steps that were partly also considered in [40, 41]. This is summarized in the following remark.

Remark 2.5 (Fine-tuning steps).

  1. Boundaries: Special attention has to be paid to patches at the boundaries of an image. There are at least two possibilities: either one extends the image, e.g., by mirroring, or one considers only patches lying completely inside the image together with appropriately smaller search windows. In our opinion the second strategy is preferable since it does not introduce artificial information. However, it leads to fewer estimates at the boundaries of the image, but we observed that this does not yield visible artifacts in practice.

  2. Flat areas: Flat areas, where differences between patches are only caused by noise, require special consideration, as it is very likely that the estimated covariance matrix will not have full rank. In this case, the patches are better denoised by only using their mean. Flat areas might be detected using the empirical variance of the patches, which is close to $\sigma^2$ in such regions.

  3. Second step: The similarity of patches and the covariance structure of the patches can be better estimated using the first step denoised image as an oracle image for a second step.

  4. Acceleration: To speed up the denoising procedure, each patch that has been used (and therefore denoised at least once) in a group of similar patches is not considered as reference patch anymore. Nevertheless, it may be denoised several times by being potentially chosen in other groups.

The whole denoising procedure is given in Algorithm 1. We would like to point out the differences between the two steps, which at first glance look very similar: Step 2 uses the denoised image from Step 1 in order to find similar patches and to estimate the covariance matrix, but reuses the original noisy image for the other computations, i.e., for the mean patch and the restored image.

Input: noisy image $y$, variance $\sigma^2$ of the noise
Output: first step denoised image $\hat{x}^{(1)}$ and final image $\hat{x}^{(2)}$
Parameters: sizes $s_1, s_2$ of the patches, numbers $n_1, n_2$ of similar patches, homogeneous area parameter $\gamma$, sizes $w_1, w_2$ of the search areas
Step 1:
for all patches $y_i$ of the noisy image not considered before do
     Determine the set $\mathcal{I}(i)$ of centers of patches similar to $y_i$ in a $w_1 \times w_1$ window around $i$
     Compute the empirical mean patch $\hat{\mu}_i = \frac{1}{|\mathcal{I}(i)|} \sum_{j \in \mathcal{I}(i)} y_j$
     Homogeneous area test: Compute the mean value $m_i$ of $\hat{\mu}_i$
     and the empirical variance $v_i$ of the patches $y_j$, $j \in \mathcal{I}(i)$
     if $v_i \leq \gamma \sigma^2$ then
         Compute the restored patches $\hat{x}_j = m_i \mathbf{1}$, $j \in \mathcal{I}(i)$
     else
         Compute the empirical covariance matrix $\hat{\Sigma}_i = \frac{1}{|\mathcal{I}(i)|} \sum_{j \in \mathcal{I}(i)} (y_j - \hat{\mu}_i)(y_j - \hat{\mu}_i)^\mathrm{T}$
         Compute the restored patches $\hat{x}_j$ by (11), $j \in \mathcal{I}(i)$
     Aggregation: Obtain the first estimate $\hat{x}^{(1)}$ at each pixel by computing the average over all restored patches containing the pixel
Algorithm 1: Nonlocal MMSE Denoising Algorithm on $\mathbb{R}$, Step 1
Step 2:
for all patches $y_i$ of the noisy image not considered before do
     Determine in a $w_2 \times w_2$ window around $i$ the set $\mathcal{I}(i)$ of centers of patches which are similar to the patch $\hat{x}^{(1)}_i$ of the image denoised in Step 1
     Compute the empirical mean patch $\hat{\mu}_i = \frac{1}{|\mathcal{I}(i)|} \sum_{j \in \mathcal{I}(i)} y_j$
     Homogeneous area test: Compute the mean value $m_i$ of $\hat{\mu}_i$
     and the empirical variance $v_i$ of the patches $\hat{x}^{(1)}_j$, $j \in \mathcal{I}(i)$
     if $v_i \leq \gamma \sigma^2$ then
         Compute the restored patches $\hat{x}_j = m_i \mathbf{1}$, $j \in \mathcal{I}(i)$
     else
         Compute the empirical covariance matrix $\hat{\Sigma}_i$ from the patches $\hat{x}^{(1)}_j$, $j \in \mathcal{I}(i)$, of the Step 1 image
         Compute the restored patches $\hat{x}_j = \hat{\mu}_i + \hat{\Sigma}_i \bigl(\hat{\Sigma}_i + \sigma^2 I_d\bigr)^{-1} (y_j - \hat{\mu}_i)$, $j \in \mathcal{I}(i)$, cf. (7)
     Aggregation: Obtain the final estimate $\hat{x}^{(2)}$ at each pixel by computing the average over all restored patches containing the pixel
Algorithm 2: Nonlocal MMSE Denoising Algorithm on $\mathbb{R}$, Step 2
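To show how the pieces fit together, the following sketch wires the helper routines from above into a bare-bones version of Step 1 (no second step, no acceleration, no boundary handling; the parameter values are illustrative, not the tuned ones of [40, 41]):

def nl_mmse_step1(noisy, sigma, s=5, w=21, n=30, gamma=1.0):
    # Group similar patches, restore them by (11) (or by their mean
    # on flat areas, cf. Remark 2.5) and aggregate the estimates.
    r = s // 2
    acc = np.zeros_like(noisy, dtype=float)  # sum of estimates per pixel
    cnt = np.zeros_like(noisy, dtype=float)  # number of estimates per pixel
    m = s // 2 + w // 2
    for i in range(m, noisy.shape[0] - m):
        for j in range(m, noisy.shape[1] - m):
            centers = find_similar(noisy, i, j, s, w, n)
            patches = np.stack([noisy[a - r:a + r + 1, b - r:b + r + 1].ravel()
                                for a, b in centers])
            if patches.var() <= gamma * sigma**2:    # homogeneous area test
                restored = np.full_like(patches, patches.mean())
            else:
                restored = empirical_mmse(patches, sigma)
            for (a, b), p in zip(centers, restored):  # aggregation
                acc[a - r:a + r + 1, b - r:b + r + 1] += p.reshape(s, s)
                cnt[a - r:a + r + 1, b - r:b + r + 1] += 1
    return np.where(cnt > 0, acc / np.maximum(cnt, 1), noisy)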

The overall approach can be generalized to images with values in $\mathbb{R}^c$, $c \geq 1$, in a straightforward way, dealing now with $d$-dimensional random vectors, where $d = c s^2$. In particular, RGB color images ($c = 3$) can be denoised in this way. At this point, considering the three color channels independently usually does not yield good results, as there is a significant correlation between the red, the green, and the blue color channel. This correlation is correctly taken into account in the three-variate setting. As an alternative, Lebrun et al. [41] suggested working in the so-called $Y_oU_oV_o$ color space [50], a variant of the YUV space whose transform from the RGB space is orthogonal and thus does not change the noise statistics. This color system separates geometric from chromatic information and thereby decorrelates the color channels, so that treating them independently does not create noticeable color artifacts, as would be the case in the RGB space.
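For reference, one standard orthogonal transform of this kind reads (the exact normalization convention in [50] may differ; the matrix below is our illustration)

$$\begin{pmatrix} Y_o \\ U_o \\ V_o \end{pmatrix} = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{3} & 1/\sqrt{3} \\ 1/\sqrt{2} & 0 & -1/\sqrt{2} \\ 1/\sqrt{6} & -2/\sqrt{6} & 1/\sqrt{6} \end{pmatrix} \begin{pmatrix} R \\ G \\ B \end{pmatrix}.$$

Since the matrix is orthogonal, white Gaussian noise with standard deviation $\sigma$ in the RGB channels remains white Gaussian noise with the same $\sigma$ in the transformed channels.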

3 Random Points on Manifolds

Instead of $\mathbb{R}^d$-valued images we are now interested in images having values in an $m$-dimensional manifold $\mathcal{M}$. We start by introducing the necessary notation for Riemannian manifolds. In our numerical examples we will deal with images having components on the spheres $\mathbb{S}^d$ equipped with the metric induced by the Euclidean metric of the embedding spaces $\mathbb{R}^{d+1}$, $d \in \mathbb{N}$, and on the manifold $\mathcal{P}(r)$ of symmetric positive definite $r \times r$ matrices, $r \in \mathbb{N}$, with the affine invariant metric. For these manifolds the specific expressions of the following quantities are given in Appendix B. Further, we will consider the open probability simplex $\Delta_m$ with the Rao-Fisher metric obtained from the categorical distribution, and the hyperbolic manifold $\mathbb{H}^m$ equipped with the Minkowski metric. Besides many textbooks on differential geometry, the reader may have a look into Pennec's paper [53] to get an overview. We adapted our notation to this paper.

Manifolds

If not stated otherwise, let $\mathcal{M}$ be a complete, connected $m$-dimensional Riemannian manifold. All of the previously mentioned manifolds are complete, except for the probability simplex. Observe that we will work with patches of manifold-valued images, so that we finally deal with product manifolds $\mathcal{M}^d$ of dimension $md$ with the usual product metric. By $T_x\mathcal{M}$ we denote the tangent space of $\mathcal{M}$ at $x \in \mathcal{M}$ and by $\langle \cdot, \cdot \rangle_x$ the Riemannian metric. Let $\gamma_{x,\xi} \colon [0, \infty) \to \mathcal{M}$, $\gamma_{x,\xi}(0) = x$, $\dot{\gamma}_{x,\xi}(0) = \xi \in T_x\mathcal{M}$, be the geodesic starting from $x$ with tangent vector $\xi$. Since $\mathcal{M}$ is complete, the exponential map $\exp_x \colon T_x\mathcal{M} \to \mathcal{M}$ with

$$\exp_x(\xi) := \gamma_{x,\xi}(1)$$

is well-defined for every $x \in \mathcal{M}$. The exponential map realizes a local diffeomorphism (exponential chart) from a "sufficiently small neighborhood" of the origin of $T_x\mathcal{M}$ into a neighborhood of $x$. To make precise how large this "small neighborhood" can be chosen, we follow the geodesic $\gamma_{x,\xi}$ from $x$ to infinity. It is either minimizing all along or up to a finite time $t_0$ and not any longer afterwards. In the latter case, $\gamma_{x,\xi}(t_0)$ is called a cut point and the corresponding tangent vector $t_0 \xi$ a tangential cut point. The set $C(x)$ of all cut points of all geodesics starting from $x$ is the cut locus, and the set of corresponding tangent vectors the tangential cut locus. Then the open domain $D(x) \subset T_x\mathcal{M}$ around the origin bounded by the tangential cut locus is the maximal domain for which the exponential chart at $x$ is injective. It is connected and star-shaped with respect to the origin, and $\exp_x\bigl(D(x)\bigr) = \mathcal{M} \setminus C(x)$. This allows to define the inverse exponential map

$$\log_x := \bigl(\exp_x\bigr)^{-1} \colon \mathcal{M} \setminus C(x) \to D(x).$$

For the $d$-sphere $\mathbb{S}^d$ the cut locus of $x$ is just its antipodal point $-x$. Thus $D(x)$ is the open ball of radius $\pi$ around the origin of $T_x\mathbb{S}^d$, and the tangential cut locus is its boundary. For Hadamard manifolds, which are complete, simply-connected manifolds with non-positive sectional curvature [6], such as $\mathcal{P}(r)$ or $\mathbb{H}^m$, we have $C(x) = \emptyset$ and $D(x) = T_x\mathcal{M}$.

The Riemannian metric yields a distance function on the manifold by $\mathrm{d}(x, y) := \lVert \log_x y \rVert_x$ and a measure which in local coordinates reads as $\mathrm{d}\mathcal{M}(x) = \sqrt{|G(x)|}\,\mathrm{d}x$, where $G(x) = \bigl(g_{ij}(x)\bigr)_{i,j=1}^m$ is the metric tensor and $|G(x)|$ its determinant.
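For concreteness (the paper collects the corresponding closed-form expressions for its example manifolds in Appendix B), here are the standard exponential and logarithmic maps of the unit sphere, written as NumPy sketches and reused in the code fragments below:

def sphere_exp(x, xi, eps=1e-12):
    # Exponential map on the unit sphere: follow the great circle from
    # x in tangent direction xi (with <x, xi> = 0) for length ||xi||.
    t = np.linalg.norm(xi)
    if t < eps:
        return x
    return np.cos(t) * x + np.sin(t) * xi / t

def sphere_log(x, y, eps=1e-12):
    # Inverse exponential map on the sphere, defined for y not antipodal
    # to x: the tangent vector at x pointing towards y whose length is
    # the geodesic distance arccos(<x, y>).
    v = y - np.dot(x, y) * x  # project y onto the tangent space at x
    nv = np.linalg.norm(v)
    if nv < eps:
        return np.zeros_like(x)
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * v / nv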

Random Points

Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space and $\mathcal{B}(\mathcal{M})$ the Borel $\sigma$-algebra on $\mathcal{M}$ (with respect to the topology induced by the distance $\mathrm{d}$). A measurable map $X \colon \Omega \to \mathcal{M}$ is called a random point on $\mathcal{M}$. We consider only absolutely continuous random points with probability density $p$, i.e., $\mathbb{P}(X \in B) = \int_B p(x)\,\mathrm{d}\mathcal{M}(x)$ for all $B \in \mathcal{B}(\mathcal{M})$ and $\int_{\mathcal{M}} p(x)\,\mathrm{d}\mathcal{M}(x) = 1$. The variance of $X$ with respect to a given point $a \in \mathcal{M}$ is defined as

$$\sigma_X^2(a) := \mathbb{E}\bigl[\mathrm{d}^2(X, a)\bigr] = \int_{\mathcal{M}} \mathrm{d}^2(x, a)\, p(x)\,\mathrm{d}\mathcal{M}(x), \qquad (12)$$

and local minimizers of $\sigma_X^2$ are called Riemannian centers of mass [36]. For a discussion of the existence and uniqueness of global minimizers, known as Fréchet expectations or means of $X$, see, e.g., [1, 36, 37]. For Hadamard manifolds with curvature bounded from below, the Riemannian center of mass exists and is unique. For the spheres $\mathbb{S}^d$, if the support of $p$ is contained in a geodesic ball of radius $\pi/2$, then the Riemannian center of mass is unique within this ball and it is the global minimizer of (12). In the following we assume that the variance is finite and that the cut locus has probability measure zero at any point, i.e., $\mathbb{P}\bigl(X \in C(a)\bigr) = 0$ for all $a \in \mathcal{M}$. Then a necessary condition for $a \in \mathcal{M}$ to be a Riemannian center of mass is

$$\mathbb{E}\bigl[\log_a X\bigr] = \int_{\mathcal{M} \setminus C(a)} \log_a(x)\, p(x)\,\mathrm{d}\mathcal{M}(x) = 0. \qquad (13)$$

For Hadamard manifolds with curvature bounded from below this condition is also sufficient. Assuming that the mean $a$ is known, we define the covariance matrix of $X$ (with respect to a fixed orthonormal basis of $T_a\mathcal{M}$) by

$$\Sigma := \mathbb{E}\bigl[\log_a X\,(\log_a X)^\mathrm{T}\bigr] = \int_{\mathcal{M} \setminus C(a)} \log_a(x)\,\log_a(x)^\mathrm{T}\, p(x)\,\mathrm{d}\mathcal{M}(x), \qquad (14)$$

where $\log_a x$ is identified with its coordinate vector. In practice, typically $a$ and $\Sigma$ are unknown and need to be estimated. Given observations $x_1, \dots, x_n$ of a random point $X$, we estimate the mean point by the empirical Karcher mean

$$\hat{a} \in \operatorname*{arg\,min}_{a \in \mathcal{M}} \frac{1}{n} \sum_{i=1}^n \mathrm{d}^2(x_i, a), \qquad (15)$$

which is according to [12] a consistent estimator of $a$ and can be computed by a gradient descent algorithm, see, e.g., [2]. An estimator for the covariance matrix reads as

$$\hat{\Sigma} = \frac{1}{n} \sum_{i=1}^n \log_{\hat{a}}(x_i)\,\log_{\hat{a}}(x_i)^\mathrm{T}. \qquad (16)$$
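On the sphere, (15) and (16) can be computed with a few lines built on the maps sketched above; this is a minimal gradient descent of our own, not the tuned solver of [2], and the step size and tolerance are illustrative choices:

def karcher_mean(points, iters=50, tol=1e-10):
    # Empirical Karcher mean (15) on the sphere by gradient descent:
    # average the log-mapped samples in the tangent space, step via exp.
    a = points[0].copy()
    for _ in range(iters):
        g = np.mean([sphere_log(a, x) for x in points], axis=0)
        if np.linalg.norm(g) < tol:
            break
        a = sphere_exp(a, g)  # step size 1
    return a

def tangent_covariance(points, a, basis):
    # Empirical covariance (16): coordinates of log_a(x_i) with respect
    # to an orthonormal basis of T_a M, stored as the rows of `basis`.
    coords = np.stack([basis @ sphere_log(a, x) for x in points])
    return coords.T @ coords / len(points)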

4 Nonlocal Patch-Based Denoising of Manifold-Valued Images

In this section we propose an NL-MMSE denoising algorithm for manifold-valued images. To this end, we have to specify what we mean by "normally distributed" random points on manifolds. In contrast to the vector space setting, there exists no canonical definition of a normally distributed random point on a manifold, since various properties characterizing the normal distribution on $\mathbb{R}^d$, such as those in Remark 2.1, cannot be generalized to the manifold setting in a straightforward way. Here we rely on a simple approach which transfers normally distributed zero mean random vectors on tangent spaces via the exponential map to the manifold. Based on this definition we will see how the NL-MMSE from Section 2 carries over to manifold-valued images.

4.1 Gaussian Random Points

In the following, we describe the Gaussian model used in this paper for Hadamard manifolds and spheres and discuss its relation to other models for small variances.

For each tangent space $T_x\mathcal{M}$ with a fixed orthonormal basis $(e_1, \dots, e_m)$ we can identify the element $\xi = \sum_{i=1}^m \xi_i e_i \in T_x\mathcal{M}$ with its local coordinate vector $(\xi_1, \dots, \xi_m)^\mathrm{T}$, which establishes an isometry between $T_x\mathcal{M}$ and $\mathbb{R}^m$. Note that also the expressions in (14) and (16) are basis dependent, but assuming a fixed basis this dependence is skipped for simplicity of notation. Now, let $a \in \mathcal{M}$ and let $\varphi_a$ be the linear isometric mapping

$$\varphi_a \colon \mathbb{R}^m \to T_a\mathcal{M}, \qquad \varphi_a(\xi) := \sum_{i=1}^m \xi_i e_i. \qquad (17)$$

Let $\phi_a := \exp_a \circ\, \varphi_a \colon \mathbb{R}^m \to \mathcal{M}$. Since $\phi_a$ is continuous, we have for any $B \in \mathcal{B}(\mathcal{M})$ that $\phi_a^{-1}(B)$ is a Borel set, and for any integrable function $f \colon \mathcal{M} \to \mathbb{R}$ it holds

$$\int_{B \setminus C(a)} f(x)\,\mathrm{d}\mathcal{M}(x) = \int_{\phi_a^{-1}(B \setminus C(a))} f\bigl(\phi_a(\xi)\bigr)\,\theta_a(\xi)\,\mathrm{d}\xi, \qquad (18)$$

where $\theta_a(\xi) := \sqrt{\bigl|G\bigl(\phi_a(\xi)\bigr)\bigr|}$ is the density of the Riemannian measure in the exponential chart at $a$. Conversely, for any Borel set $A \subseteq \phi_a^{-1}\bigl(\mathcal{M} \setminus C(a)\bigr)$ and any integrable function $g \colon \mathbb{R}^m \to \mathbb{R}$ we see that $\phi_a(A) \in \mathcal{B}(\mathcal{M})$ and

$$\int_A g(\xi)\,\mathrm{d}\xi = \int_{\phi_a(A)} g\bigl(\phi_a^{-1}(x)\bigr)\,\tilde{\theta}_a(x)\,\mathrm{d}\mathcal{M}(x), \qquad (19)$$

where $\tilde{\theta}_a(x) := \theta_a\bigl(\phi_a^{-1}(x)\bigr)^{-1}$. If $\varepsilon$ is standard normally distributed on $\mathbb{R}^m$ with pdf $p_\varepsilon(\xi) = (2\pi)^{-m/2} \exp\bigl(-\lVert\xi\rVert_2^2/2\bigr)$, then

$$X := \exp_a\bigl(\varphi_a(\varepsilon)\bigr) \qquad (20)$$

is a random point on $\mathcal{M}$. For Hadamard manifolds we have $C(a) = \emptyset$, so that for $B \in \mathcal{B}(\mathcal{M})$,

$$\mathbb{P}(X \in B) = \int_{\phi_a^{-1}(B)} p_\varepsilon(\xi)\,\mathrm{d}\xi. \qquad (21)$$

Thus, by (19), $X$ has the pdf (with respect to $\mathrm{d}\mathcal{M}$)

$$p_X(x) = \frac{1}{(2\pi)^{m/2}} \exp\Bigl(-\frac{\mathrm{d}^2(x, a)}{2}\Bigr)\,\tilde{\theta}_a(x). \qquad (22)$$

Note that by incorporating the factor $\tilde{\theta}_a$ into the density function we avoid problems as discussed in [35]. By construction and (13) it follows directly that the mean of $X$ is $a$ and that the covariance (14) is $I_m$. We consider $X$ as standard normally distributed on $\mathcal{M}$ and write $X \sim \mathcal{N}_{\mathcal{M}}(a, I_m)$. In other words, $X$ is normally distributed on $\mathcal{M}$ with mean $a$ and covariance $\Sigma$ if $\Sigma^{-1/2}\varphi_a^{-1}\bigl(\log_a X\bigr)$ is standard normally distributed on $\mathbb{R}^m$.

If $C(a) \neq \emptyset$, as is the case for the $d$-spheres, we assume that, up to a set of Lebesgue measure zero, $\mathbb{R}^m = \dot{\bigcup}_{k \in \mathcal{K}} A_k$, where $\mathcal{K}$ is a countable index set and $A_0 = D(a)$. Further, we suppose that there are diffeomorphisms $\psi_k \colon A_k \to A_0$ such that for $\xi \in A_k$ it holds $\phi_a(\xi) = \phi_a\bigl(\psi_k(\xi)\bigr)$. Then, in order to obtain the pdf of $X$ in (20), we have to replace $p_\varepsilon$ in (22) by the wrapped function

$$\tilde{p}_\varepsilon(\xi) := \sum_{k \in \mathcal{K}} p_\varepsilon\bigl(\psi_k^{-1}(\xi)\bigr)\,\bigl|\det D\psi_k^{-1}(\xi)\bigr|, \qquad \xi \in A_0. \qquad (23)$$

Now, we follow the same lines as in the Euclidean setting and agree that $X$ is normally distributed with mean $a$ and positive definite covariance matrix $\Sigma$ if $\varepsilon \sim \mathcal{N}(0, \Sigma)$ in (20), respectively,

$$X = \exp_a\Bigl(\varphi_a\bigl(\Sigma^{1/2}\varepsilon\bigr)\Bigr), \qquad \varepsilon \sim \mathcal{N}(0, I_m), \qquad (24)$$

and write $X \sim \mathcal{N}_{\mathcal{M}}(a, \Sigma)$.

The following proposition shows how the pdf of a normally distributed random point (20) looks for various one-dimensional manifolds.

Proposition 4.1.

The pdf of a random point $X \sim \mathcal{N}_{\mathcal{M}}(a, \sigma^2)$ is given by

  1. the log-normal distribution for $\mathcal{M} = \mathcal{P}(1) = \mathbb{R}_{>0}$,

     $$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\Bigl(-\frac{(\log x - \log a)^2}{2\sigma^2}\Bigr)$$

     with respect to the measure $\mathrm{d}\mathcal{M}(x) = x^{-1}\,\mathrm{d}x$ on $\mathbb{R}_{>0}$;

  2. the $2\pi$-wrapped Gaussian distribution for $\mathcal{M} = \mathbb{S}^1$,

     $$p(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \sum_{k \in \mathbb{Z}} \exp\Bigl(-\frac{(t - t_a + 2\pi k)^2}{2\sigma^2}\Bigr), \qquad (25)$$

     with respect to the parameterization $x(t) = (\cos t, \sin t)^\mathrm{T}$, $t \in [-\pi, \pi)$, where $a = x(t_a)$, and the Lebesgue measure $\mathrm{d}t$;

  3. the wrapped, evenly shifted Gaussian distribution for $\mathcal{M} = \Delta_1$,

     (26)

     with respect to the parameterization of $\Delta_1$ used in the proof and the Lebesgue measure.

The proof of the proposition is given in Appendix A.

The above definition (24) of normally distributed random points has the advantage that it inherits, via the tangent space, the affine invariance of the Gaussian distribution known from the Euclidean setting. Moreover, it is easy to sample from the distribution.

Remark 4.2 (Sampling from $\mathcal{N}_{\mathcal{M}}(a, \Sigma)$).

Sampling of an $\mathcal{N}_{\mathcal{M}}(a, \Sigma)$-distributed random point can be performed as follows: i) sample $\varepsilon$ from $\mathcal{N}(0, \Sigma)$ in $\mathbb{R}^m$, ii) apply $\varphi_a$, which by (17) requires only the knowledge of an orthonormal basis of $T_a\mathcal{M}$, and iii) map the result by $\exp_a$ to $\mathcal{M}$.
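For the sphere, the three steps of Remark 4.2 translate directly into code (a sketch of ours reusing the maps from Section 3; names are hypothetical):

def sample_manifold_normal(a, cov, basis, size=1, rng=None):
    # Sample from N_M(a, cov) following Remark 4.2 (sphere version):
    # (i) draw eps ~ N(0, cov) in R^m, (ii) lift it to T_a M via the
    # orthonormal basis (rows of `basis`), (iii) push it to M by exp_a.
    rng = rng or np.random.default_rng()
    eps = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size)
    return np.stack([sphere_exp(a, basis.T @ e) for e in eps])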

For the one-dimensional manifolds in Proposition 4.1, these pdfs are the kernels of the heat equations with the corresponding Laplace-Beltrami operators. This is in general not true in higher dimensions [33]. However, numerical experiments show that samples from the Gauss-Weierstrass kernel [29, p. 112] and from the heat kernel [67, p. 107] are very similar. For kernel density estimation on special Hadamard spaces we refer also to [19], and for kernels in connection with dithering on the sphere we refer to [32].

Neither the maximum entropy property nor the ML estimation property from Remark 2.1 generalizes to the above setting. In [53], Pennec showed that, under certain conditions, the pdf of a random point on $\mathcal{M}$ that maximizes the entropy given a prescribed mean value and covariance is of the form $p(x) = c \exp\bigl(-\tfrac{1}{2}\log_a(x)^\mathrm{T}\,\Gamma\,\log_a(x)\bigr)$ with a concentration matrix $\Gamma$ and a normalization constant $c$. Said and co-workers made use of the ML estimator property in order to generalize the (isotropic) normal distribution to $\mathcal{P}(r)$ in [61] and to symmetric spaces of non-compact type in [62]. They proposed the following density function for a normal distribution with mean $a$ and dispersion parameter $\sigma$:

$$p(x) = \frac{1}{\zeta(\sigma)} \exp\Bigl(-\frac{\mathrm{d}^2(x, a)}{2\sigma^2}\Bigr), \qquad (27)$$

with a normalizing constant $\zeta(\sigma)$. For the concrete definition of $\zeta(\sigma)$ and the noise simulation according to this model in the case of $\mathcal{P}(2)$, see [61] and Appendix C. For $\mathcal{P}(1) = \mathbb{R}_{>0}$ the models coincide by Proposition 4.1. For this distribution it is clear that the Riemannian center of mass is given by $a$ and that the ML estimator of $a$ is the empirical Karcher mean. Figure 1, left, shows an image of realizations of normally distributed noise on $\mathcal{P}(2)$ according to our model. For comparison, Figure 1, right, shows the same for realizations of the model of Said et al. with the same $\sigma$. The noise looks visually very similar, which is also confirmed by Table 1. The first two rows of Table 1 present the estimated mean based on (15), the covariance matrix based on (16), and the estimated standard deviation for our noise model and that of Said et al.

Table 1: Estimated parameters for the noise in Figure 1.
Figure 1: Sampling with respect to our noise model (left) and that of Said et al. [61] (right), with the same standard deviation $\sigma$.

Below we give an example of the above pdf (27) and of the pdf (22) for the hyperbolic manifold $\mathbb{H}^2$.

Example 4.3.

Let $\mathbb{H}^2 := \{x \in \mathbb{R}^3 : \langle x, x\rangle_{\mathbb{H}} = -1,\ x_3 > 0\}$ be the hyperbolic manifold equipped with the Minkowski inner product $\langle x, y\rangle_{\mathbb{H}} := x_1 y_1 + x_2 y_2 - x_3 y_3$. The distance reads as $\mathrm{d}_{\mathbb{H}^2}(x, y) = \operatorname{arccosh}\bigl(-\langle x, y\rangle_{\mathbb{H}}\bigr)$ and

$$\exp_x(\xi) = \cosh\bigl(\lVert\xi\rVert_{\mathbb{H}}\bigr)\,x + \sinh\bigl(\lVert\xi\rVert_{\mathbb{H}}\bigr)\,\frac{\xi}{\lVert\xi\rVert_{\mathbb{H}}}, \qquad (28)$$
$$\log_x(y) = \frac{\operatorname{arccosh}\bigl(-\langle x, y\rangle_{\mathbb{H}}\bigr)}{\sqrt{\langle x, y\rangle_{\mathbb{H}}^2 - 1}}\,\bigl(y + \langle x, y\rangle_{\mathbb{H}}\,x\bigr). \qquad (29)$$

We parametrize $\mathbb{H}^2$ as

$$x(t_1, t_2) = \bigl(\sinh t_1,\ \cosh t_1 \sinh t_2,\ \cosh t_1 \cosh t_2\bigr), \qquad (t_1, t_2) \in \mathbb{R}^2. \qquad (30)$$

First, we compute the pdf (22) of an